CN114565663A - Positioning method and device - Google Patents
Positioning method and device
- Publication number
- CN114565663A (application number CN202011271315.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- images
- information
- coordinate system
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/70 — Physics; Computing; Image data processing or generation, in general; Image analysis; Determining position or orientation of objects or cameras
- G06F18/22 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F18/23 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Clustering techniques
- G06T2207/10016 — Physics; Computing; Image data processing or generation, in general; Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; image sequence
Abstract
The application provides a positioning method and device. In an embodiment, at least one second image matching a first image shot by a first terminal is determined in an image database, the poses of the first image and of the at least one second image in a first local coordinate system of the first terminal are determined, and the pose of the first image in a world coordinate system is output according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system. The relation between the local coordinate system and the world coordinate system can thus be obtained from the poses of a subset of images in both coordinate systems, so that the first image is positioned in the world coordinate system without acquiring a 3D point cloud for each image. The application can be applied to fields such as virtual reality (VR) and augmented reality (AR).
Description
Technical Field
The present application relates to the field of Augmented Reality (AR) technology, and more particularly, to a method and apparatus for positioning.
Background
With the maturing of fifth-generation (5G) communication technology and advances in mobile phone camera hardware and computing power, intelligent application services based on visual AR technology are becoming increasingly rich. AR technology fuses virtual information with the real world. It draws on technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing to simulate computer-generated virtual information such as text, images, three-dimensional models, music, and video and apply it to the real world, so that the virtual information and the real world complement each other and the real world is enhanced.
To realize wide-coverage AR applications, a map feature library needs to be constructed for pose acquisition. For example, in an offline library construction stage, a map feature library may be built by collecting three-dimensional (3D) point cloud (x, y, z) information of the surrounding environment using professional collection devices such as road measuring vehicles, drones, survey-grade laser scanners, and optical cameras. Then, in an online positioning stage, a picture of the environment can be shot, two-dimensional (2D) feature points of the picture are extracted and matched against the map feature library constructed offline, a series of 2D-3D matching point pairs is retrieved, and the pose of the terminal is obtained through a pose calculation algorithm such as Perspective-n-Point (PnP).
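A minimal sketch of this conventional pipeline, for illustration only: the intrinsic matrix and the 2D-3D matches below are synthetic stand-ins for a real map feature library and matcher, and OpenCV's solvePnPRansac plays the role of the pose calculation algorithm.

```python
import cv2
import numpy as np

# Synthetic stand-in for 2D-3D matching point pairs retrieved from a map
# feature library; in the real pipeline these come from matching 2D feature
# points of the shot picture against the library.
rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, (50, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[800.0, 0.0, 320.0],      # assumed pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec_true = np.array([[0.10], [0.20], [0.05]])
tvec_true = np.array([[0.30], [-0.10], [0.50]])
points_2d, _ = cv2.projectPoints(points_3d, rvec_true, tvec_true, K, None)

# Perspective-n-Point with RANSAC recovers the terminal pose from the pairs.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
print(ok, rvec.ravel(), tvec.ravel())
```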
However, the above scheme relies on constructing a map feature library whose point cloud information is complete and highly consistent with the environment. Building such a library requires professionals and dedicated acquisition equipment, and is time-consuming, labor-intensive, and costly. An efficient, low-cost positioning scheme is therefore needed.
Disclosure of Invention
The application provides a positioning method and a positioning device that can obtain the relation between a local coordinate system and a world coordinate system based on the poses of a subset of images in the local coordinate system and in the world coordinate system, so that an image to be positioned is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
In a first aspect, a positioning method is provided, where the method may be applied to a terminal or to a cloud. In the method, a first image shot by a first terminal is obtained, and at least one second image matching the first image is determined in an image database according to first information of the first image and first information of the images in the image database, where the first information indicates global features of an image, the image database includes first information of multiple frames of images, second information of the multiple frames of images, and poses of the multiple frames of images in a world coordinate system, and the second information indicates local features of an image. Then, the poses of the first image and of the at least one second image in a first local coordinate system of the first terminal may be determined according to the second information of the first image and the second information of the at least one second image. Finally, the pose of the first image in the world coordinate system may be output according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
In this way, the first image is positioned by determining, in the image database, at least one second image matching the first image shot by the first terminal, determining the poses of the first image and of the at least one second image in the local coordinate system of the first terminal, and outputting the pose of the first image in the world coordinate system from those poses together with the pose of the at least one second image in the world coordinate system. The relation between the local coordinate system and the world coordinate system can thus be obtained from the poses of a subset of images in both coordinate systems, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
In the embodiment of the application, the pose of the first image in the world coordinate system is the pose of the first terminal in the world coordinate system when the first terminal shoots the first image, and may also be referred to as the pose of the camera.
In some embodiments, the pose of the at least one second image in the world coordinate system and the second information may be obtained from an image database, which is not limited in this application.
As an example, in this embodiment, the first terminal may be a mobile device such as a mobile phone or an autonomous vehicle, which is not limited in this application.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (containing global 3D point cloud features) and compute the pose based on feature point matching, the embodiment of the application uses the poses of images rather than 3D point cloud features for positioning. Building a 3D point cloud feature library of the environment depends on professionals and acquisition equipment, and is time-consuming, labor-intensive, and costly. Because no 3D point cloud features need to be acquired here, the image database can, on the one hand, be constructed with low-cost equipment, for example a camera with low resolution and a small field of view (such as a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of constructing it; on the other hand, the time for constructing the database can be shortened, which helps build the image database efficiently. A scheme that positions based on image poses and such an image database therefore enables efficient, low-cost positioning.
In some embodiments, the at least one second image may constitute a set of images, which may be referred to as a set of similar images of the first image.
In some embodiments, the first image and the at least one second image may form an image set, which may be referred to as a local image set. The first local coordinate system, that is, the camera coordinate system of the first terminal, may be a relative coordinate system whose origin is defined by any one frame of image in the local image set.
As a possible implementation, a mapping relationship (which may also be referred to as a conversion relationship) between the first local coordinate system and the world coordinate system may be determined according to the pose of the at least one second image in the first local coordinate system and its pose in the world coordinate system. Then, according to the mapping relationship, the pose of the first image in the first local coordinate system can be converted to determine the pose of the first image in the world coordinate system, which is then output.
By way of example, the mapping relationship between the world coordinate system and the first local coordinate system may be described by a rotation matrix or a translation vector, which is not limited in this application.
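The patent leaves the estimation of this mapping relationship open. A minimal sketch of one possibility follows, assuming a least-squares similarity transform (Umeyama-style, with a scale factor, since a monocular local frame is in general only defined up to scale) fitted to the positions of the at least one second image in both coordinate systems; the function names and synthetic data are illustrative.

```python
import numpy as np

def fit_similarity(local_pts, world_pts):
    """Least-squares similarity transform: world = s * R @ local + t.

    local_pts / world_pts are (N, 3) arrays of corresponding camera
    positions in the first local coordinate system and the world
    coordinate system (Umeyama-style closed form)."""
    mu_l, mu_w = local_pts.mean(0), world_pts.mean(0)
    L, W = local_pts - mu_l, world_pts - mu_w
    U, S, Vt = np.linalg.svd(W.T @ L / len(local_pts))
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:       # enforce a proper rotation
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / L.var(0).sum()
    t = mu_w - s * R @ mu_l
    return s, R, t

# Synthetic positions of the second images in both coordinate systems.
rng = np.random.default_rng(1)
local = rng.uniform(-2.0, 2.0, (20, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = Q if np.linalg.det(Q) > 0 else -Q
world = 2.5 * local @ R_true.T + np.array([10.0, 4.0, 0.0])

s, R, t = fit_similarity(local, world)
first_local = np.array([0.5, -0.3, 1.2])      # position of the first image
first_world = s * R @ first_local + t         # its position in the world frame
```

The same rotation R can be applied to the orientation part of the first image's pose to express the full pose in the world coordinate system.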
In the embodiment of the present application, the world coordinate system may also be referred to as an absolute coordinate system, a global coordinate system, or the like. As an example, the world coordinate system may serve as a reference coordinate system describing the position of a camera (or of a terminal containing the camera) in the environment, and may also be used to describe the position of any object in the environment.
As a possible implementation manner, the image database may include a plurality of mapping relationships, where one mapping relationship may indicate a correspondence between an identifier of one image, first information of the image, second information of the image, and a pose of the image in a world coordinate system, and the application is not limited thereto.
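For illustration, one possible in-memory layout of such a mapping relationship is sketched below; the field names and types are assumptions, since the patent only requires the listed items to be associated with one another.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    """One mapping relationship in the image database."""
    image_id: str                 # identifier of the image
    global_feature: np.ndarray    # first information, e.g. one descriptor vector
    local_features: np.ndarray    # second information, e.g. keypoint descriptors
    pose_world: np.ndarray        # 4x4 pose of the image in the world frame

image_database: dict = {}         # image_id -> ImageRecord
```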
With reference to the first aspect, in certain implementations of the first aspect, the method may further include establishing the image database.
With reference to the first aspect, in some implementations of the first aspect, a video shot by a second terminal may be acquired, together with the poses of multiple frames of images in the video in a second local coordinate system of the second terminal. The poses of these images in the world coordinate system can then be determined according to third information and their poses in the second local coordinate system, where the third information indicates the position of at least part of the images in the video in a map associated with the world coordinate system. After that, the first information and the second information of the multiple frames of images in the video may be acquired. In this way, the image database can be established.
Therefore, the embodiment of the application can establish the image database by acquiring the poses, in the second local coordinate system of the second terminal, of multiple frames of images in a video shot by the second terminal, determining their poses in the world coordinate system with the help of the map positions associated with the video, and acquiring the first information and second information of the images. When the image database is constructed, only the pose of each image in the world coordinate system and the first and second information of the image need to be acquired; 3D point cloud features of the environment (such as global 3D point cloud features) are not needed.
It should be noted that, in this embodiment of the application, the first terminal and the second terminal may be the same terminal device or different terminal devices, and are not limited.
In some embodiments, after the pose of the first image in the world coordinate system is acquired, the first information of the first image, the second information of the first image, and the pose of the first image in the world coordinate system may be added to the image database, so as to update the image database.
As an example, the third information may indicate the position in the map of the first frame of image in the video, and/or the position in the map of the last frame of image. For example, the map may be obtained through the Global Positioning System (GPS) or through another global navigation satellite system (GNSS), which is not limited in this application.
As a possible implementation, a video may be captured with a camera of the second terminal, and the pose of each image frame in the video in the second local coordinate system of the second terminal may be obtained through a simultaneous localization and mapping (SLAM) algorithm. The pose of each image in the video in the world coordinate system can then be calculated by further combining the position in the map of the first frame of image in the video and the position in the map of the last frame. Here, the second local coordinate system, that is, the camera coordinate system of the second terminal, may be a relative coordinate system whose origin is defined by any one frame of image in the video.
For example, the video may be captured along a planned capture route: the user may hold the terminal, start from the starting point of the route, shoot the environment continuously along the route, and stop shooting at the end point, thereby acquiring the video corresponding to the route. For some or all of the images in the video, the pose of the image in the local coordinate system of the camera can be obtained with the SLAM algorithm. Meanwhile, the terminal can also acquire the position of the starting point of the route in the map and/or the position of the end point of the route in the map, and use them to calculate the poses of the images in the video in the world coordinate system.
With reference to the first aspect, in certain implementations of the first aspect, a first position of the first image may also be obtained, and at least one image is determined in the image database according to the first position. The first position comes from a GPS module or a wireless fidelity (WiFi) module of the first terminal, and the distance between the position of each of the at least one image and the first position of the first image is less than a first threshold.
As a specific implementation of determining the at least one second image matching the first image in the image database, the first information of the first image may then be matched only with the first information of the at least one image, and the at least one second image is acquired from among the at least one image.
It should be noted that the first position is a "coarse positioning" position, which has a low accuracy, and the error may be, for example, 3 to 10 meters.
Therefore, in the embodiment of the application, at least one image is first selected in the image database according to the first position of the first image, and the first information of the first image is then matched only against the first information of these images to obtain the at least one second image. The first information of the first image thus does not need to be matched against the first information of all images in the image database, which reduces the computation at the terminal, shortens the matching time, and improves the efficiency of obtaining the second image.
As an example, the at least one image acquired in the image database according to the first position may form an image set, which may be referred to as an image candidate set.
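A minimal sketch of this coarse filtering step follows; the threshold value, the Euclidean distance metric, and the function name are assumptions, since the patent fixes neither the metric nor the threshold.

```python
import numpy as np

FIRST_THRESHOLD_M = 30.0   # assumed value; the patent does not fix it

def candidate_set(first_position, db_positions):
    """Indices of database images whose stored position lies within the
    first threshold of the coarse (GPS/WiFi) position of the first image."""
    d = np.linalg.norm(db_positions - first_position, axis=1)
    return np.flatnonzero(d < FIRST_THRESHOLD_M)
```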
As one possible implementation manner of determining at least one second image according to the first information of the first image and the first information of the images in the image database or the image candidate set, a similarity between the first information of the first image and the first information of the images in the image database or the image candidate set may be calculated, and then the at least one second image is determined according to the similarity, which is not limited in this application.
As a possible implementation, after the similarities between the first image and the images in the image database or the image candidate set are obtained, they may be ranked and the top m images with the highest similarity taken as the second images, where m is an integer greater than 1. As one example, m may be 20. Alternatively, in some other implementations, images whose similarity with the first image is greater than a preset threshold may be used as the second images, which is not limited in this application.
Here, the process of calculating the similarity between the first information of the first image and the first information of the images in the image database or the image candidate set may be referred to as matching the first image with the images in the image database or the image candidate set, and the present application is not limited thereto.
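A minimal sketch of this retrieval step follows, assuming cosine similarity over global feature vectors (the patent does not prescribe the similarity measure) and the top-m selection with m = 20 mentioned above.

```python
import numpy as np

def retrieve_similar(query_feature, db_features, m=20):
    """Rank images by cosine similarity of global features (first
    information) and return the indices of the top m matches."""
    q = query_feature / np.linalg.norm(query_feature)
    db = db_features / np.linalg.norm(db_features, axis=1, keepdims=True)
    similarities = db @ q
    return np.argsort(-similarities)[:m]
```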
With reference to the first aspect, in some implementations of the first aspect, as one way of determining the at least one second image, a plurality of third images matching the first image may first be determined in the image database according to the first information of the first image and the first information of the images in the image database; then, images among the plurality of third images whose distance from a cluster center of the plurality of third images is greater than a second threshold may be deleted to obtain the at least one second image. The cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
For example, an image whose distance from the cluster center of the third images is greater than the second threshold is an outlier image among the third images.
Here, the image set composed of the plurality of third images may also be referred to as a similar image set, and it contains the set composed of the at least one second image. When there is no outlier image among the plurality of third images, the deletion operation may be skipped; in that case the third images are the second images.
Since images at different positions may have very similar textures, the similarity calculation may contain errors, which can introduce outliers into the similar image set. Eliminating the outlier images therefore removes such errors and yields a more accurate similar image set. A more accurate similar image set helps make the subsequently acquired poses of the second images in the first local coordinate system more accurate, which in turn helps improve the accuracy of the pose of the first image in the world coordinate system.
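A minimal sketch of the outlier elimination follows, assuming the cluster center is simply the mean of the world positions of the third images; the patent leaves both the clustering method and the second threshold open.

```python
import numpy as np

SECOND_THRESHOLD_M = 15.0   # assumed value

def drop_outliers(world_positions):
    """Keep only the third images whose world position is close to the
    cluster center (taken here as the mean of all positions)."""
    center = world_positions.mean(axis=0)
    d = np.linalg.norm(world_positions - center, axis=1)
    return np.flatnonzero(d <= SECOND_THRESHOLD_M)
```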
With reference to the first aspect, in some implementations of the first aspect, as another way of determining the at least one second image, a plurality of fourth images matching the first image may first be determined in the image database according to the first information of the first image and the first information of the images in the image database; then, images among the plurality of fourth images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold may be deleted to obtain the at least one second image. The angle is the difference between the orientations of the poses of at least two images in the world coordinate system, and the distance is the difference between the positions of the poses of at least two images in the world coordinate system. As an example, the angle may be a difference in pitch, yaw, or roll of the poses of the at least two images in the world coordinate system.
For example, an image among the fourth images whose angle is smaller than the third threshold and/or whose distance is smaller than the fourth threshold is a redundant image among the fourth images.
Here, the image set composed of the plurality of fourth images may also be referred to as a similar image set, and it contains the set composed of the at least one second image. When there is no redundant image among the plurality of fourth images, the deletion operation may be skipped; in that case the fourth images are the second images.
Because images whose mutual angle is smaller than a preset threshold and/or whose mutual distance is smaller than a preset threshold overlap heavily in their spatial distribution, they are redundant within the similar image set. Deleting such images leaves second images with a moderate degree of overlap that cover the surrounding space uniformly, giving a more refined similar image set. A more refined similar image set helps reduce the computation needed to acquire the poses of the second images in the first local coordinate system, and thus helps improve the efficiency of acquiring the pose of the first image in the world coordinate system.
In some embodiments, after the outlier images are deleted from the matched images, the redundant images may further be deleted, which is not limited in this application. A sketch of the redundancy filtering is given below.
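A minimal sketch of this redundancy filtering, assuming yaw as the compared angle and a greedy "keep only if sufficiently different from everything kept so far" rule; the threshold values and the interpretation of the and/or condition are assumptions.

```python
import numpy as np

THIRD_THRESHOLD_DEG = 10.0   # assumed angle threshold
FOURTH_THRESHOLD_M = 1.0     # assumed distance threshold

def drop_redundant(world_positions, yaws_deg):
    """Greedily keep an image only if it differs from every kept image in
    angle or in position, so the kept images cover the surrounding space
    without near-duplicates (using the 'and' reading of and/or)."""
    kept = []
    for i in range(len(world_positions)):
        redundant = any(
            abs(yaws_deg[i] - yaws_deg[j]) < THIRD_THRESHOLD_DEG
            and np.linalg.norm(world_positions[i] - world_positions[j])
                < FOURTH_THRESHOLD_M
            for j in kept)
        if not redundant:
            kept.append(i)
    return kept
```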
In a second aspect, an embodiment of the present application provides an apparatus for performing the method in the first aspect or any possible implementation manner of the first aspect. Specifically, the apparatus includes modules for performing that method, and may include an acquisition unit, a processing unit, and an output unit.
The acquisition unit is configured to acquire a first image shot by a first terminal.
The processing unit is used for determining at least one second image matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, wherein the first information is used for indicating the global features of the images, the image database comprises the first information of multi-frame images, the second information of the multi-frame images and the poses of the multi-frame images in a world coordinate system, and the second information is used for indicating the local features of the images.
The processing unit is further configured to determine, according to the second information of the first image and the second information of the at least one second image, poses of the first image and the at least one second image in a first local coordinate system of the first terminal.
An output unit configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
With reference to the second aspect, in some implementations of the second aspect, the apparatus further includes an establishing unit configured to establish the image database.
With reference to the second aspect, in some implementations of the second aspect, the establishing unit is specifically configured to acquire a video captured by a second terminal, acquire a pose of a multi-frame image in the video in a second local coordinate system of the second terminal, and determine the pose of the multi-frame image in the video in a world coordinate system according to third information and the pose of the multi-frame image in the video in the second local coordinate system. Wherein the third information is used to indicate a position of at least a portion of the image in the video in a map, the map being associated with the world coordinate system.
The establishing unit is further specifically configured to acquire the first information and the second information of the multiple frames of images in the video.
With reference to the second aspect, in certain implementations of the second aspect, the acquiring unit is further configured to acquire a first location of the first image, where the first location is from a GPS module or a WiFi module of the first terminal.
The acquisition unit may receive data sent by the GPS module or the WiFi module, such as the first location.
Optionally, the acquisition unit may also send a request message to the GPS module or the WiFi module, requesting the first position of the first image. In response to the request message, the GPS module or the WiFi module may send the first position to the acquisition unit.
The processing unit is further configured to determine at least one image in the image database according to the first position of the first image, and a distance between the position of each image of the at least one image and the first position of the first image is smaller than a first threshold.
The processing unit is further configured to match the first information of the first image with the first information of the at least one image, and acquire at least one second image matching the first image in the at least one image.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to: according to the first information of the first image and the first information of the images in an image database, determining a plurality of third images matched with the first image in the image database, and deleting the images, of the plurality of third images, with the distance from a cluster center of the plurality of third images being larger than a second threshold value so as to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in a world coordinate system.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to: determining a plurality of fourth images matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, and deleting the images of which the angle is smaller than a third threshold and/or the distance is smaller than a fourth threshold in the plurality of fourth images to obtain the at least one second image, wherein the angle is the difference value of the poses of at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of at least two images in the world coordinate system.
In a third aspect, an embodiment of the present application provides a positioning apparatus, including: one or more processors and a memory for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method of the first aspect or any possible implementation manner of the first aspect.
It should be understood that, for the beneficial effects obtained by the descriptions of the second to fifth aspects and the corresponding implementations of the present application, reference is made to the beneficial effects obtained by the first aspect and the corresponding implementations of the present application, and details are not repeated.
Drawings
Fig. 1 is a schematic block diagram of a positioning apparatus provided in an embodiment of the present application;
Fig. 2 is a schematic block diagram of a system architecture to which the solutions of the embodiments of the present application are applicable;
Fig. 3 is a schematic flowchart of a positioning method according to an embodiment of the present application;
Fig. 4 is an example of an indoor collection route according to an embodiment of the present application;
Fig. 5 is an example of frames extracted from a video according to an embodiment of the present application;
Fig. 6 is an example of a visualization of an image database provided by an embodiment of the present application;
Fig. 7 is an example of retrieving a second image according to an embodiment of the present application;
Fig. 8 is a specific example of region clustering of multiple frames of second images;
Fig. 9 is an example of further deleting redundant images after deleting outlier images among the plurality of second images;
Fig. 10 is a specific example of pose solving provided by an embodiment of the present application;
Fig. 11 is a specific example of a real-time positioning result according to an embodiment of the present application;
Fig. 12 is a schematic flowchart of another positioning method provided by an embodiment of the present application;
Fig. 13 is a schematic block diagram of another positioning apparatus according to an embodiment of the present application;
Fig. 14 is a schematic diagram of another positioning apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
First, related terms to which the present application relates will be briefly described.
Pose: information indicating the position and orientation of an image or of some content in an image. The content may be an object, a person, a building, an animal, or the like, which is not limited in this application. As an example, the pose may have six degrees of freedom (6DoF): the position can be represented by coordinates (X, Y, Z) in Euclidean space, and the orientation by the rotation angles pitch, yaw, and roll.
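As an illustration only, such a 6DoF pose can be held in a simple structure; the field layout is an assumption, not something the patent prescribes.

```python
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    """Position in Euclidean space plus three rotation angles."""
    x: float
    y: float
    z: float
    pitch: float   # rotation about the lateral axis
    yaw: float     # rotation about the vertical axis
    roll: float    # rotation about the longitudinal axis
```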
Simultaneous localization and mapping (SLAM): the method is characterized in that the method starts to move from an unknown position in an unknown environment, self-positioning is carried out according to position estimation and a map in the moving process, and meanwhile, an incremental map is built on the basis of self-positioning to realize autonomous positioning and navigation. By SLAM tracking localization, the pose in the relative space (i.e., relative coordinate system) of the camera can be obtained.
Fig. 1 is a schematic block diagram of an apparatus 100 for positioning according to an embodiment of the present disclosure. The apparatus 100 may be applied to a terminal, such as a mobile phone, a wearable device, a Virtual Reality (VR) device, an AR device, a vehicle-mounted smart terminal, and the like, and the apparatus 100 may also be applied to a cloud, such as a server, which is not limited in this embodiment. As an example, the apparatus 100 may be used for visual positioning.
As shown in fig. 1, the apparatus 100 includes an image retrieval module 110 and a pose resolving module 120.
The image retrieval module 110 is configured to obtain a first image (which may also be referred to as an image to be positioned) shot by the first terminal, and determine, in the image database, at least one second image matching the first image according to first information of the first image and first information of the images in the image database.
The first information is used for indicating a global feature of the image, and may also be referred to as global feature information. The image database comprises first information of the multi-frame images, second information of the multi-frame images and poses of the multi-frame images in a world coordinate system. The second information is used to indicate a local feature of the image, and may also be referred to as local feature information.
In some embodiments, the at least one second image may constitute a set of images, which may be referred to as a set of similar images of the first image.
In the embodiment of the present application, the image database may also be referred to as an image pose library, an image pose feature library, a pose feature library, and the like, which are not limited herein.
The pose calculation module 120 may determine poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image. Then, the pose of the first image in the world coordinate system may be output according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
As a possible implementation, a mapping relationship (which may also be referred to as a transformation relationship) between the first local coordinate system and the world coordinate system may be determined according to the pose of the at least one second image in the first local coordinate system and its pose in the world coordinate system. Then, according to the mapping relationship, the pose of the first image in the first local coordinate system can be converted to determine the pose of the first image in the world coordinate system, which is then output.
By way of example, the mapping relationship between the world coordinate system and the first local coordinate system may be described by a rotation matrix or a translation vector, which is not limited in this application.
In the embodiment of the present application, the world coordinate system may also be referred to as an absolute coordinate system. As an example, the world coordinate system may serve as a reference coordinate system describing the position of a camera (or of a terminal containing the camera) in the environment, and may also be used to describe the position of any object in the environment.
As a possible implementation manner, the image database may include a plurality of mapping relationships, where one mapping relationship may indicate a correspondence between an identifier of one image, first information of the image, second information of the image, and a pose of the image in a world coordinate system, and the application is not limited thereto.
As an example, the pose calculating module 120 may obtain the pose of the at least one second image in the world coordinate system and the second information from the image database, which is not limited in this application.
In the embodiment of the application, the pose of the first image in the world coordinate system is the pose of the first terminal in the world coordinate system when the first terminal shoots the first image, and may also be referred to as the pose of the camera.
In some embodiments, the first image and the at least one second image may form an image set, which may be referred to as a local image set. The first local coordinate system, that is, the camera coordinate system of the first terminal, may be a relative coordinate system whose origin is defined by any one frame of image in the local image set. By way of example, the first local coordinate system may be constructed using SLAM, which is not limited in this application.
Therefore, the embodiment of the application positions the first image by determining, in the image database, at least one second image matching the first image shot by the first terminal, determining the poses of the first image and of the at least one second image in the local coordinate system of the first terminal, and outputting the pose of the first image in the world coordinate system from those poses together with the pose of the at least one second image in the world coordinate system. The relation between the local coordinate system and the world coordinate system can thus be obtained from the poses of a subset of images in both coordinate systems, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (containing global 3D point cloud features) and compute the pose based on feature point matching, the embodiment of the application uses the poses of images rather than 3D point cloud features for positioning. Building a 3D point cloud feature library of the environment depends on professionals and acquisition equipment, and is time-consuming, labor-intensive, and costly. Because no 3D point cloud features need to be acquired here, the image database can, on the one hand, be constructed with low-cost equipment, for example a camera with low resolution and a small field of view (such as a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of constructing it; on the other hand, the time for constructing the database can be shortened, which helps build the image database efficiently. A scheme that positions based on image poses and such an image database therefore enables efficient, low-cost positioning.
Fig. 2 is a schematic block diagram of a system architecture 200 that is applicable to aspects of embodiments of the present application. As shown in fig. 2, the system architecture 200 may include a hardware abstraction layer data interface 210, a pose acquisition module 220, an application service 230, and a data processing module 240. The pose acquisition module 220 may include an image retrieval module 221 and a pose resolving module 222. The image retrieval module 221 may further include a feature extraction unit 2211 and an image retrieval unit 2212, and the pose resolving module 222 may further include a local pose resolving unit 2221 and a coordinate transformation unit 2222. The data processing module 240 may include an image database construction module 241 and a video 242.
It should be understood that fig. 2 shows modules or units of one system architecture suitable for the embodiments of the present application, but these modules or units are merely examples, and the embodiments of the present application may also include other parts or variations of the parts in fig. 2, or may not include all of the modules or units in fig. 2.
In fig. 2, the pose acquisition module 220 may serve as an example of the apparatus 100, the image retrieval module 221 may serve as an example of the image retrieval module 110, and the pose resolving module 222 may serve as an example of the pose resolving module 120, which is not limited in this application.
In some embodiments, the pose acquisition module 220 and the image database construction module 241 may exist in the form of binary software packages. In addition, the pose acquisition module 220 may be deployed in a framework layer of an operating system of the terminal, and provide positioning information for a service application in an application layer through an interface, for example, an Application Programming Interface (API).
It should be noted that the embodiments of the present application may be implemented based on hardware built into the terminal, such as a Global Positioning System (GPS) module/magnetometer, a gyroscope, a wireless fidelity (WiFi) chip, and a camera chip. The hardware drivers or data read-write modules corresponding to this hardware can exchange data and control with the upper-layer positioning software through the hardware abstraction layer data interface 210 according to standard system interfaces.
In some embodiments, the positioning scheme provided by the embodiments of the present application can implement pose solving through a framework with an offline library construction phase and an online positioning phase.
In the offline library construction stage, the image database construction module 241 may construct an image database, for example based on the video 242. As an example, the image database construction module 241 may obtain, through a SLAM algorithm, the pose of each image frame in the video 242 in a local coordinate system of the camera, and fuse this with positions of the video 242 in a map to obtain a sequence of image frames with poses in the world coordinate system. Further, the image database construction module 241 may extract global features (also referred to as global feature information, or first information) and local features (also referred to as local feature information, or second information) of the image frames in the video 242. Here, the local coordinate system may be a relative coordinate system whose origin is defined by any one image in the video 242.
The video 242 may be referred to as a data source for constructing an image database, and may be a mobile phone video, for example.
As an example, the image database construction module 241 may output the image feature library after the image database is constructed. Alternatively, the image database may be stored in a memory, which is not limited in this application.
Therefore, when the image database is constructed, only the pose of each image in the world coordinate system and the global and local feature information of the image need to be acquired; 3D point cloud features highly consistent with the environment (such as global 3D point cloud features) are not needed. On the one hand, the image database can therefore be constructed with low-cost equipment, for example a camera with low resolution and a small field of view (such as a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the construction cost; on the other hand, the time for constructing the database can be shortened, which helps build the image database efficiently. A scheme that positions based on image poses and such an image database therefore enables efficient, low-cost positioning.
In the on-line positioning stage, the image can be acquired in real time, and the pose of the image is acquired according to the image.
As an example, the hardware abstraction layer data interface 210 may be used to acquire image data acquired by a camera, such as to acquire an image to be positioned, a first image, and so on. Optionally, the hardware abstraction layer data interface 210 may also be used to obtain sensor signals (such as magnetometer, gyroscope parameters, etc.), WiFi chip parameters, without limitation. As an example, the hardware abstraction layer data interface 210 may obtain the above information from a standard API extracted from a hardware abstraction layer of an operating system of the terminal, which is not limited in this application.
After acquiring the first image, the hardware abstraction layer data interface 210 may send the first image to the pose acquisition module 220. After the pose acquisition module 220 acquires the first image, the feature extraction unit 2211 may extract global feature information of the first image, and the image retrieval unit 2212 may determine at least one second image matching the first image in the image database according to the global feature information of the first image and the global feature information of the images in the image database. As an example, the set of second images may be referred to as a similar image set.
The local pose calculation unit 2221 may determine the poses of the first image and the at least one second image in the local coordinate system according to the local feature information of the first image and the local feature information of the at least one second image. Here, the local coordinate system may be, for example, the above first local coordinate system, and specifically, refer to the description of the first local coordinate system, which is not described again. Thereafter, the coordinate transformation unit 2222 may determine a mapping relationship between the local coordinate system and the world coordinate system according to the pose of the at least one second image in the world coordinate system and the pose of the at least one second image in the local coordinate system, and then transform the pose of the first image in the local coordinate system according to the mapping relationship to obtain the pose of the first image in the world coordinate system.
In some optional embodiments, the pose acquisition module 220 may send the pose of the first image to the application service 230 after acquiring the pose. The application service 230 may utilize the pose to provide pose location services for the user.
In some embodiments, the application service 230 may also initiate a pose positioning service request to the pose acquisition module 220. In response to the request, the pose acquisition module 220 may acquire an image from the hardware abstraction layer data interface 210 and position it, acquiring the pose of the image in the world coordinate system.
By way of example, the application service 230 may be any of various location based services (LBS) or AR/VR application services, including, without limitation, applications requiring precise positioning such as dedicated positioning applications, e-commerce shopping applications, social communication applications, in-car applications, online-to-offline (O2O) door-to-door service applications, exhibition guide applications, family anti-loss applications, emergency rescue service applications, video entertainment applications, and gaming applications. A typical application scenario is AR navigation, in which navigation icons are superimposed on the real world captured by the camera of a terminal to provide more indicative navigation information.
In the system architecture 200 shown in fig. 2, the pose acquisition module 220 may be implemented on a client (e.g., a terminal) to support various types of application services on the client. As an example, in response to a positioning service request initiated by the application service 230, the pose acquisition module 220, as the positioning software service of the system, starts to operate and acquires the image pose in real time on a smartphone using the positioning method provided in the embodiments of the present application.
As one possible implementation, the system architecture 200 may support a client-server mode. As an example, the data processing module 240 may reside on a server side (e.g., a cloud server), and video data acquired by a client (e.g., a terminal) may be uploaded to the server side through a network connection. After the server side completes construction of the image database, the image database or part of its data can be downloaded to the client through the network connection.
As another possible implementation, the system architecture 200 may support a pure client mode, that is, positioning can be processed directly offline. In this case, the data processing module 240 may be stored on the client, forming an offline pure client mode.
Fig. 3 shows a schematic flowchart of a positioning method 300 according to an embodiment of the present application. By way of example, the method 300 is described as including an offline library construction phase and an online positioning phase. The method 300 is described below in conjunction with the system architecture 200 of fig. 2.
It should be understood that fig. 3 shows steps or operations of the positioning method, but these steps or operations are merely examples, and embodiments of the present application may also perform other operations or variations of the operations in fig. 3. Moreover, the steps in fig. 3 may be performed in a different order than presented in fig. 3, and possibly not all of the operations in fig. 3 need to be performed.
As shown in fig. 3, the method 300 may include steps 301 to 310, where the offline library construction phase may include steps 301 to 304, and the online positioning phase may include steps 305 to 310.
Step 301: video capture. The captured video serves as the input to the offline library construction stage.
As an example, video capture may be performed indoors. As a possible implementation, the capture route may be planned according to the indoor floor plan; for example, multiple capture routes may be planned, and the routes may be located on one floor or on multiple floors, without limitation.
Fig. 4 shows an example of an indoor capture route. For example, the user may carry the terminal (e.g., a mobile phone), start from the starting point of the route, shoot the environment continuously along the route, and end shooting at the end point of the route.
While the video is collected, the terminal can also acquire the position of the route in the map. For example, when starting to shoot a video, the user may manually start the positioning module in the terminal to acquire the position of the starting point of the route in the map, and when reaching the end point of the route, the user may manually start the positioning module to acquire the position of the end point of the route in the map. Or, when the video starts to be shot, the positioning module in the terminal may automatically acquire the position of the starting point of the route in the map, and when the ending point of the route is reached, the positioning module may automatically acquire the position of the ending point of the route in the map. Here, the position of the start point of the route in the map may be referred to as the position of the first frame image in the captured video in the map, and the position of the end point of the route in the map may be referred to as the position of the last frame image in the captured video in the map.
In the embodiment of the application, the position information of the starting point and/or the ending point in the map can be used for assisting calculation, for example, for calculating the pose of an image in a video in a world coordinate system. Illustratively, the positioning module may be at least one of a GPS/magnetometer, a gyroscope, a WiFi chip, and the like, for example.
It should be noted that, here, an indoor scene is taken as an example for description, and the scheme is also applicable to an outdoor scene, that is, in step 301, video acquisition may also be performed outdoors, for example, an outdoor acquisition route may be planned, which is not limited in this application.
As an example, in the process of capturing and acquiring a video image by using a terminal, the terminal may also automatically perform frame extraction processing on the video image to acquire a multi-frame image. Fig. 5 shows an example of a plurality of frame images decimated for one video.
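As an illustration of this frame-extraction step, the following is a minimal sketch assuming OpenCV is available; the fixed stride and output file layout are assumptions for illustration, not part of the application.

```python
import cv2

def extract_frames(video_path: str, out_dir: str, stride: int = 30) -> int:
    """Save every `stride`-th frame of the captured video; return the count saved."""
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if index % stride == 0:
            cv2.imwrite(f"{out_dir}/{saved:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```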
In some embodiments, when a client-server mode is employed, the terminal may upload the acquired video to the server. When the client mode is employed, the terminal may store the video and process the video.
As an example, step 303 may be performed by the image database construction module 241 in the data processing module 240 in fig. 2 described above. Referring to fig. 3, step 303 may further include step 3031 and step 3032.
As an example, for each frame of video image, the pose of the image in the local coordinate system of the camera (i.e., the terminal) can be acquired using the SLAM algorithm. Here, the local coordinate system is a relative coordinate system of the camera in the process of constructing the image database.
In other embodiments, an AR engine algorithm may also be used to obtain the pose of each frame of image in the local coordinate system of the camera, which is not limited in this application.
After the pose of each frame of image in the local coordinate system of the camera is obtained, the image coordinate registration can be carried out by combining the position of the image in the map, and the pose of the image in the camera coordinate system is converted into the world coordinate system. For example, image coordinate registration may be performed in conjunction with the position of the start point of the video capture route in the map (i.e., the position of the first frame image in the video in the map), and/or the position of the end point of the video capture route in the map (i.e., the position of the last frame image in the video in the map).
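By way of a hedged sketch of this registration, the following estimates a 2D similarity transformation (scale, heading, translation) from only the start- and end-point correspondences of a route; the function and variable names are illustrative, and a real implementation would also handle the vertical axis and noisy anchor positions.

```python
import numpy as np

def register_route(start_local, end_local, start_map, end_map):
    """Return (scale, 2x2 rotation, translation) mapping local XY to map XY."""
    v_local = np.asarray(end_local, float) - np.asarray(start_local, float)
    v_map = np.asarray(end_map, float) - np.asarray(start_map, float)
    scale = np.linalg.norm(v_map) / np.linalg.norm(v_local)
    # Heading correction: angle between the two route vectors.
    theta = np.arctan2(v_map[1], v_map[0]) - np.arctan2(v_local[1], v_local[0])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.asarray(start_map, float) - scale * R @ np.asarray(start_local, float)
    return scale, R, t

# Each position p of the SLAM trajectory then maps to scale * R @ p + t.
```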
Fig. 6 shows an example of a visualization of an image database. As shown in fig. 6, an acquisition route may be selected on the left side of the interface. When an acquisition route is selected, it may be presented on the right side of the interface, e.g., the location of the acquisition route in a map may be presented. As an example, fig. 6 shows 3 acquisition routes acquired on the 4th floor of building A. For example, the name of acquisition route 1 may be 0a1c55c4-9303-46a5-b785-3ad77d80d809_20200415-…
Optionally, a folder in which the image data corresponding to the acquisition route is located may also be opened through a visual display interface, for example, the folder in which the image data corresponding to the selected acquisition route is located may be opened through an "open folder" button at the upper right corner in fig. 6.
Illustratively, after the track "building A - floor 4 - track 0a1c55c4-9303-46a5-b785-3ad77d80d809_20200415-…" is selected in the track tree, the data shown in table 1 below may be displayed, where pos_x, pos_y, and pos_z represent position information, and quat_x, quat_y, quat_z, and quat_w represent the orientation as a quaternion, which can be converted into attitude information, e.g., yaw, roll, pitch.
TABLE 1

| FileName | pos_x | pos_y | pos_z | quat_x | quat_y | quat_z | quat_w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 000001 | 1.8536 | 0.0189 | 76.6551 | 0.4322 | -0.5769 | -0.5329 | 0.4430 |
| 000002 | 2.8263 | 0.0043 | 76.6297 | 0.4190 | -0.5702 | -0.5573 | 0.4341 |
| 000003 | 3.3308 | 0.0002 | 76.5465 | 0.4543 | -0.5904 | -0.53556 | 0.3975 |
| … | | | | | | | |
Optionally, the image frames, the global feature information, or the local feature information acquired on each acquisition route may also be displayed in the visualization result of the image database, which is not limited in the present application.
Specifically, global feature information and local feature information may be extracted for each frame of image collected by the terminal. The global feature information is image retrieval feature information, such as NetVLAD or bag-of-words (BoW) features, without limitation. Examples of the local feature information include, but are not limited to, scale-invariant feature transform (SIFT), ORB, SuperPoint, and D2-Net features.
As an example, the image database may include the pose in the world coordinate system, the global features, and the local features of each frame of image in the video captured along the planned route. As an example, an image in the image database may be represented as (R, T, F_l, F_g), where R represents position information, T represents angle information, F_l represents the local feature, and F_g represents the global feature.
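A minimal sketch of one such database record follows, assuming Python; the field names and shapes are illustrative only.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    position: np.ndarray        # R: (3,) position in the world coordinate system
    orientation: np.ndarray     # T: (4,) quaternion (quat_x, quat_y, quat_z, quat_w)
    global_feature: np.ndarray  # F_g: e.g. a NetVLAD-style global descriptor
    keypoints: np.ndarray       # F_l: (N, 2) keypoint coordinates
    descriptors: np.ndarray     # F_l: (N, D) local descriptors (SIFT, SuperPoint, ...)
```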
Step 305, image acquisition. Here, the acquired image is the first image, and may also be referred to as an image to be positioned or the like, which is not limited in this application.
As an example, a user may take an image using a terminal. Accordingly, the hardware abstraction layer data interface 210 in the smart phone may acquire image data acquired by the camera and send the image data to the pose acquisition module 220.
It should be noted that the terminal in step 305 and the terminal in step 301 may be the same terminal or different terminals, and this application is not limited thereto.
Optionally, step 306, initial position acquisition.
As an example, when the image is acquired in step 305, the hardware abstraction layer data interface 210 may further acquire positioning information acquired by a WiFi chip or a GPS module and send the positioning information to the pose acquisition module 220. The positioning information may be referred to as an initial position of the first image.
It should be noted that the position provided by the WiFi chip or the GPS module is a "coarse positioning" position, which has a low accuracy and an error of, for example, 3 to 10 meters. This position may be used as the initial position of the first image.
In addition, the hardware abstraction layer data interface 210 may further send information of the gyroscope and the magnetometer to the pose acquisition module 220, so that the pose acquisition module 220 may jointly determine the initial position of the first image according to the positioning information acquired by the WiFi chip or the GPS module and the acquisition information of the gyroscope and the magnetometer.
Optionally, step 307, floor identification. As an example, when the initial position of the first image acquired in step 306 includes altitude information, a floor corresponding to the first image may be identified based on the initial position.
Step 308, image retrieval.
As an example, the image retrieval module 221 in fig. 2 may determine at least one second image matching the first image in the image database based on the global feature information of the first image acquired in step 305 and the global feature information of the images in the image database. In the embodiment of the present application, determining at least one second image matching the first image in the image database may also be described as retrieving at least one second image matching the first image from the image database.
As an example, the set of at least one second image may be referred to as a similar image set.
In some possible implementations, the image retrieval module 221 may perform image retrieval based on the first image and the initial position acquired in step 306.
In some possible implementations, the image retrieval module 221 may perform image retrieval based on the first image, the initial location obtained in step 306, and the floor obtained in step 307.
Continuing with FIG. 3, as an example, step 308 may include the following steps 3081 to 3083. As an example, step 3081 may be performed by the feature extraction unit 2211, and steps 3082 and 3083 may be performed by the image retrieval unit 2212.
As an example, global feature information of the first image may be extracted using a deep learning algorithm. For example, a NetVLAD layer can be added after the convolutional neural network framework to extract a global descriptor (an example of global feature information). The global feature of the image is an overall attribute of the image, and may include, for example and without limitation, a color feature, a texture feature, a shape feature, and the like.
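As a simplified stand-in for such a global-descriptor extractor (the application appends a NetVLAD layer; the globally pooled CNN embedding below is an assumption used only to keep the sketch short):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the embedding
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def global_descriptor(image) -> torch.Tensor:
    """Return an L2-normalized global feature vector for a PIL image."""
    with torch.no_grad():
        x = preprocess(image).unsqueeze(0)
        f = backbone(x).squeeze(0)
    return f / f.norm()
```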
As a possible implementation manner, at least one second image matching the first image may be retrieved from the image database according to the global feature information obtained in step 3081, so as to obtain a similar image set.
As another possible implementation manner, at least one image may be determined in the image database according to the initial position obtained in step 306, wherein a distance between the position of each image in the at least one image and the initial position of the first image is smaller than a preset threshold. Then, at least one second image matching the first image is retrieved from the at least one image, resulting in a set of similar images.
As a specific example, a region (which may be referred to as a buffer, for example) may be constructed based on the initial position of the first image. For example, a circular buffer may be constructed with a radius of 5 meters (or 10 meters, or 30 meters, but not limited to) from the initial position of the first image. Then, at least one second image matching the first image is retrieved from the images in the image database located in the buffer area, and a similar image set is obtained. As an example, the set of images located within the buffer may be referred to as a set of image candidates.
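A minimal sketch of this buffer-based candidate filtering, reusing the `ImageRecord` fields sketched earlier; the 5 m radius follows the example above.

```python
import numpy as np

def candidate_set(records, initial_position, radius_m: float = 5.0):
    """Keep only database images whose position lies within the buffer."""
    p0 = np.asarray(initial_position, float)
    return [r for r in records if np.linalg.norm(r.position - p0) < radius_m]
```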
Therefore, in the embodiment of the application, at least one image is first determined in the image database according to the first position of the first image, and the first information of the first image is then matched against the first information of only this at least one image to obtain the at least one second image matching the first image. Since the first information of the first image does not need to be matched against the first information of all images in the image database, the calculation amount of the terminal is reduced, the matching time is shortened, and the efficiency of obtaining the second image is improved.
As a possible implementation manner, the similarity between the global feature information of the first image and the global feature information of the images in the image database or the image candidate set may be calculated, and the second image may be determined according to the similarity.
For example, the similarity may be calculated by the following formula (1):

$$\mathrm{similarity}(X,Y)=\frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}\tag{1}$$

where similarity(X, Y) represents the similarity between X and Y, X represents the global feature information of the first image, Y represents the global feature information of the image to be matched in the feature library, x_i represents each component in the feature code of X, y_i represents each component in the feature code of Y, and n represents the total number of components in one feature code, n being a positive integer.
After the similarity between each frame of image and the first image is obtained, the calculated similarities may be sorted to obtain the top m images with the highest similarity as the second images, where m is an integer greater than 1. As one example, m may be 20. Alternatively, in some other implementations, images whose similarity to the first image is greater than a preset threshold may be used as the second images.
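A minimal sketch of this retrieval step, assuming formula (1) above is the cosine similarity over the feature components and taking m = 20 as in the example:

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    # Formula (1): inner product normalized by the two vector norms.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def retrieve(query_feature, candidates, m: int = 20):
    """Return the m candidate records most similar to the query's global feature."""
    scored = sorted(candidates,
                    key=lambda r: cosine_similarity(query_feature, r.global_feature),
                    reverse=True)
    return scored[:m]
```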
Fig. 7 shows an example of retrieving the second image. As shown in fig. 7, "×" represents the global feature information of the cluster-center image of one cluster region (which may be denoted as C_k^VLAD), and the first image falls within this cluster region. All images in the cluster region may be searched to acquire the second image. The global feature information of the first image is also marked in the figure; "o" represents the global feature information of the image closest to the first image in the cluster region (which may be referred to as the nearest-neighbor image), and "●" represents the global feature information of the image second closest to the first image in the buffer (which may be referred to as the next-nearest image). When the ratio of the distance between the nearest-neighbor image and the first image to the distance between the next-nearest image and the first image is greater than a threshold, the nearest-neighbor image is matched with the first image. In this way, for example, 20 images that match the first image may be determined in turn in the buffer as the second images.
Optionally, step 3083, pose-aware image refinement.
As an example, when the number of the acquired second images is plural, the plural second images may be further subjected to image refinement. For example, the plurality of second images may be refined based on their poses in the world coordinate system.
As one possible implementation, outlier images in the second images (i.e., the similar image set) may be deleted based on the poses of the second images in the world coordinate system. Here, an outlier image refers to an image whose distance from the cluster center of the plurality of second images is greater than a preset threshold, where the cluster center is determined according to the positions of the plurality of second images in the world coordinate system.
For example, a position of each second image in the world coordinate system may be obtained from the image database, and according to the position, a cluster center of the plurality of second images is determined, and an image having a distance from the cluster center greater than a preset threshold value is determined as an outlier image in the plurality of second images.
It should be noted that the process of determining the image whose distance from the cluster center is greater than the second threshold (i.e., determining the outlier image) is a process of determining whether an image whose distance from the cluster center is greater than the second threshold (i.e., whether the outlier image exists) exists in the plurality of second images. As a possible determination result, there may be no outliers in the plurality of second images. As another possible determination result, at least one frame of outlier image may exist in the plurality of second images.
In some embodiments, when there is no outlier image in the plurality of second images, the operation of deleting the image may not be performed.
Since images at different positions may have very similar textures, there may be errors in the similarity calculation process, which may result in outlier images in the similar image set. Therefore, the images in the similar image set can be subjected to error elimination by eliminating the outlier images, and a more accurate similar image set can be obtained. Here, acquiring a more accurate set of similar images may help to make the pose of the subsequently acquired second image in the first local coordinate system more accurate, thereby helping to improve the accuracy of the pose of the first image in the world coordinate system.
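A hedged sketch of this outlier-elimination step follows; here the mean position of the retrieved images stands in for the cluster center, which is an assumption, and a clustering algorithm could equally supply the centers.

```python
import numpy as np

def drop_outliers(images, threshold_m: float):
    """Remove images farther from the cluster center than the preset threshold."""
    positions = np.stack([img.position for img in images])
    center = positions.mean(axis=0)  # stand-in for the cluster center
    return [img for img, p in zip(images, positions)
            if np.linalg.norm(p - center) <= threshold_m]
```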
Fig. 8 shows a specific example of region clustering of the plurality of frames of second images. Clustering calculation is performed on the plurality of frames of second images in the similar image set, showing that most of the second images are gathered in regions 1 and 2, while only a small number of second images are scattered in regions 3 and 4. In this case, the second images corresponding to the cluster points in regions 3 and 4 may be deleted, and the second images corresponding to the cluster points in regions 1 and 2 may be used as the second images in the updated similar image set.
As another possible implementation, redundant images in the second images (i.e., the similar image set) may be deleted based on the poses of the second images in the world coordinate system. As an example, the pose of each second image may be obtained from the image database, and images whose angle is smaller than a preset angle threshold and/or whose distance is smaller than a preset distance threshold may then be determined, according to the poses, as redundant images among the plurality of frames of second images. Here, the angle is the difference between the poses (orientations) of at least two images in the world coordinate system, and the distance is the difference between the positions of the poses of at least two images in the world coordinate system.
As an example, the angle may be a difference of an elevation angle (pitch), a yaw angle (yaw), or a roll angle (roll) of the poses of the at least two images in the world coordinate system.
It should be noted that the process of determining the image with the angle smaller than the preset angle threshold and/or the distance smaller than the preset distance threshold (i.e., determining the redundant image) is a process of determining whether an image with the angle smaller than the preset angle threshold and/or the distance smaller than the preset distance threshold exists in the plurality of second images (i.e., whether the redundant image exists). As a possible determination result, there may be no redundant picture in the plurality of second pictures. As another possible determination result, at least one frame of redundant images may exist in the plurality of second images.
In some embodiments, when there is no redundant picture in the plurality of second pictures, the operation of deleting the picture may not be performed.
Because images whose angle is smaller than the preset threshold and/or whose distance is smaller than the preset threshold have a high degree of overlap in their spatial distribution, redundant images exist in the similar image set. Therefore, by deleting the images whose angle is smaller than the preset value and/or whose distance is smaller than the preset value from the similar image set, the overlap among the second images retained in the similar image set is moderate and they can evenly cover the surrounding space, which is favorable for obtaining a more refined similar image set. Here, acquiring a more refined set of similar images may help reduce the calculation amount of the subsequently acquired poses of the second images in the first local coordinate system, thereby helping to improve the efficiency of acquiring the pose of the first image in the world coordinate system.
In some embodiments, after the outlier images are deleted from the plurality of second images, redundant images may further be deleted. Fig. 9 shows an example of further deleting redundant images after deleting outlier images among the plurality of second images, where the position of a circle represents the position of a second image and the arrow represents the orientation of that second image. As a specific example, the angle threshold may be set to 15° and the distance threshold to 0.5 m. Accordingly, when the difference between the poses (i.e., the angle) of any two or more of the second images is less than 15° and/or their positional distance is less than 0.5 m, these second images can be considered redundant images (e.g., corresponding to the images at the positions of the white circles in fig. 9). Correspondingly, redundant images among the second images are removed through angle and position filtering. At this point, the angles between the remaining second images are greater than or equal to 15° and/or their positional distances are greater than or equal to 0.5 m, so they can evenly cover the surrounding space with a moderate degree of overlap.
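A hedged sketch of this redundancy filter with the example thresholds (15°, 0.5 m) follows. The text allows either criterion ("and/or"); this sketch treats an image as redundant only when both the angle and the distance to an already kept image fall below their thresholds, and the greedy keep order is an assumption.

```python
import numpy as np

def quat_angle_deg(q1, q2) -> float:
    """Angular difference between two unit quaternions, in degrees."""
    d = abs(float(np.dot(q1, q2)))
    return float(np.degrees(2.0 * np.arccos(min(d, 1.0))))

def drop_redundant(images, angle_deg: float = 15.0, dist_m: float = 0.5):
    kept = []
    for img in images:
        redundant = any(
            quat_angle_deg(img.orientation, k.orientation) < angle_deg
            and np.linalg.norm(img.position - k.position) < dist_m
            for k in kept)
        if not redundant:
            kept.append(img)
    return kept
```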
As a specific example, through steps 3082 and 3083, the 20 frames of second images in the original similar image set can be reduced to 8 frames of second images. Illustratively, the 8 frames of images may be represented as (R_1, T_1), (R_2, T_2), (R_3, T_3), (R_4, T_4), (R_5, T_5), (R_6, T_6), (R_7, T_7), and (R_8, T_8), and the first image may be represented as (R_q, T_q).
With continued reference to fig. 3, after the image retrieval module 221 acquires the set of similar images, the first image and the at least one second image may be sent to the pose solution module 222.
Step 309, camera pose solving.
Illustratively, after the pose solution module 222 acquires the first image and the at least one second image, a camera pose solution may be performed. Referring to fig. 3, step 309 may further include steps 3091 and 3092.
Step 3091, local pose solving.
As an example, the local pose solution unit 2221 in the pose solution module 222 may perform local pose solving based on the first image and the at least one second image.
Specifically, the local pose calculating unit 2221 may determine the poses of the first image and the at least one second image in the local coordinate system of the first terminal according to the local feature information of the first image and the local feature information of the at least one second image. For example, the local pose calculating unit 2221 may use the first image and the at least one second image as a local image set, and obtain the relative pose of each image in the local image set in the local coordinate system according to the local feature information of each image in the local image set. Here, the local coordinate system may be a relative coordinate system of a camera constructed with any one of the images in the local image set as an origin of the local coordinate system. The local feature of the image is a local expression of the image feature, and can reflect the local characteristic of the image.
As a possible implementation manner, the local pose calculating unit 2221 may perform matching on each image in the local image set according to the local feature information in the local image set, to obtain the pose of each image in the local image set in the local coordinate system, and the point cloud information of each image mapped to the three-dimensional space. Here, the pose of each image in the local image set in the local coordinate system and the point cloud information mapped into the three-dimensional space may be referred to as local structure information of the local image set. In some embodiments, the process of acquiring the local structure information of the local image set may be referred to as a process of recovering the local structure information of each image in the local image set.
In the embodiment of the application, by recovering the local structure information of the images in the local image set, on one hand, the construction of a global 3D point cloud feature library can be avoided, which saves library construction cost; on the other hand, errors caused by aligning a large number of images with a global feature library can be avoided, so that a more accurate pose of each image in the local image set can be obtained.
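A minimal sketch of one pairwise step inside such an SFM-style recovery (descriptor matching, essential-matrix estimation, and relative pose recovery with OpenCV) follows, assuming calibrated intrinsics K; the full incremental pipeline with triangulation and bundle adjustment is omitted.

```python
import cv2
import numpy as np

def relative_pose(kp1, desc1, kp2, desc2, K):
    """kp1, kp2: (N, 2) keypoint coordinates; desc1, desc2: float32 descriptors."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    pts1 = np.float32([kp1[m.queryIdx] for m in matches])
    pts2 = np.float32([kp2[m.trainIdx] for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation and unit-scale translation of image 2 w.r.t. image 1
```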
For example, the coordinate conversion unit 2222 may acquire the poses of the second images in the similar image set in the world coordinate system, and determine the mapping relationship (i.e., transformation relationship) between the local coordinate system and the world coordinate system according to the poses of the second images in the local coordinate system acquired in step 3091 above. Then, according to the mapping relationship, coordinate conversion is performed on the pose of the first image in the local coordinate system to obtain the pose of the first image in the world coordinate system.
As a possible implementation manner, the coordinate conversion unit 2222 may acquire the pose of the second image in the world coordinate system from the image database output in step 304.
Fig. 10 shows a specific example of pose solving. Panel (a) represents the local image set input to the pose solving, including the n frames of second images in the similar image set, represented as (R_1, T_1) … (R_n, T_n), and the first image, represented as (R_q, T_q). As an example, n is 8 in fig. 10, but the present application is not limited thereto.
Meanwhile, the poses in the world coordinate system and the local feature information of the n frames of second images in the similar image set are also input for pose solving. As an example, these may be obtained from the image database. Panel (e) shows an example of the image trajectory distribution of the 8 frames of second images in the world coordinate system, in which the 8 boxes constituting the image trajectory correspond to the poses of the 8 frames of second images, respectively.
As an example, for an input local image set, the SFM algorithm may be used to estimate the pose of each image in the local image set in the local coordinate system.
First, image feature points of the first image in the local image set may be extracted to obtain the local feature information of the first image. The local feature information of the first image is then matched with the local feature information of the second images. Panel (b) of fig. 10 illustrates a specific example of three feature points matched across three frames of images.
Incremental camera parameter solving may then be performed on the matched feature points between the images in the local image set. By way of example, panel (c) of fig. 10 illustrates an example of the incremental camera parameters corresponding to three images in the local image set, where the upper image in (c) contains 7 feature points. The lower-left image in (c) is one frame of image in the local image set and contains 4 of those feature points (for example, the left 4 of the above 7 feature points); the incremental parameter of this left image is P_0 = K[I | 0], where K denotes the camera intrinsic matrix, I denotes the initial orientation (0, 0, 0, 1), and 0 denotes the initial position (0, 0, 0). That is, the local coordinate system of the local image set may be constructed with the left image as the origin. The lower-middle image in (c) is one frame of image in the local image set and contains 4 of those feature points (for example, the front 4 of the 7 feature points); the incremental parameter of this middle image is P_1 = K[R_1 | t_1], where R_1 denotes the orientation of the middle image and t_1 denotes its position. The lower-right image in (c) is one frame of image in the local image set and contains 4 of those feature points (for example, the right 4 of the above 7 feature points); the incremental parameter of this right image is P_i = K[R_i | t_i], where R_i denotes the orientation of the right image and t_i denotes its position.
The pose and 3D point cloud of each image in the local image set in the local coordinate system are then acquired according to the obtained incremental camera parameters. Panel (d) illustrates a specific example of the image trajectory distribution and 3D point cloud of the 9 frames of images in the local coordinate system, in which the 9 boxes constituting the image trajectory correspond to the poses of the 9 frames of images, respectively. The black box represents the pose of the first image in the local coordinate system, and the white boxes represent the poses of the 8 frames of second images in the local coordinate system.
Thereafter, the pose (or image trajectory distribution) of each image in the local image set in the local coordinate system and the pose (or image trajectory distribution) of the second images in the similar image set in the world coordinate system may be input to the coordinate conversion unit 2222, which performs the coordinate conversion. Panel (f) illustrates a specific example of the coordinate conversion. The mapping relationship between the local coordinate system and the world coordinate system is obtained from the poses of the 8 frames of second images in the world coordinate system and their poses in the local coordinate system. As an example, the mapping relationship may be a similarity transformation matrix (R_i, t_i, α_i) from the local coordinate system to the world coordinate system. Further, the pose of the first image in the local coordinate system is multiplied by the similarity transformation matrix (R_i, t_i, α_i) to obtain the pose of the first image in the world coordinate system.
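A hedged sketch of this conversion step follows: the similarity transformation (scale α, rotation R, translation t) can be estimated from the second images' positions in the local and world coordinate systems with a closed-form (Umeyama-style) alignment, then applied to the first image. The names below are illustrative, not the application's own implementation.

```python
import numpy as np

def similarity_transform(local_pts: np.ndarray, world_pts: np.ndarray):
    """Estimate (alpha, R, t) such that world ≈ alpha * R @ local + t.

    local_pts, world_pts: (n, 3) corresponding camera positions.
    """
    mu_l, mu_w = local_pts.mean(axis=0), world_pts.mean(axis=0)
    X, Y = local_pts - mu_l, world_pts - mu_w
    U, S, Vt = np.linalg.svd(Y.T @ X / len(local_pts))
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # guard against a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    alpha = len(local_pts) * np.trace(np.diag(S) @ D) / (X ** 2).sum()
    t = mu_w - alpha * R @ mu_l
    return alpha, R, t

# The first image's world position then follows as alpha * R @ p_local + t,
# and its world orientation as R composed with its local orientation.
```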
Step 310, outputting the positioning result.
Specifically, after the pose of the first image in the world coordinate system is obtained, the pose may be output, that is, the positioning result of the first image is output, so that the positioning of the first image is completed.
Fig. 11 shows a specific example of a real-time positioning result. The real scene in the figure is the current position of the terminal; the small circle in the lower map represents the visualization of the current position in the two-dimensional map, and the arrow represents the direction information.
Therefore, the embodiment of the application realizes the positioning of the first image by determining at least one second image matched with the first image shot by the first terminal and the pose of the first image and the at least one second image in the local coordinate system of the first terminal in the image database, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. Therefore, the embodiment of the application can obtain the relation between the local coordinate system and the world coordinate system based on the pose of a part of images in the local coordinate system and the pose in the world coordinate system, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Based on the above positioning scheme, the embodiment of the application can make full use of the existing terminal device, for example, a user-level receiver device (such as a mobile phone) or a navigation-level receiver device (such as a vehicle), to perform image acquisition, image feature extraction, and robust image retrieval and matching, so as to implement positioning (e.g., 6DoF positioning). In the image retrieval process, on the basis of image retrieval according to global feature information, the method and the device can further consider the spatial distribution of the retrieved images, construct a dual-threshold image refinement algorithm by combining the pose, delete the outlier images and the redundant images (also called secondary retrieval), and obtain a refined similar image set. For the local image set, the SFM algorithm can be executed, the relative pose of the images in the local image set in the local coordinate system is recovered based on the local structure information, and the relative pose of the images to be positioned is converted into the world coordinate system by combining the poses of the images in the pose feature library in the world coordinate system, so that the solution of the camera pose is realized.
The method and the device can solve the problems that a point cloud database in LBS/AR/VR application service requirements is difficult to construct and cannot be popularized on a large scale. Specifically, the solution of indoor and outdoor positioning without depending on a 3D point cloud feature library is provided in the embodiments of the present application. Compared with a positioning scheme needing to construct a point cloud database, the method and the device have the advantages of high positioning accuracy and low cost, and break the high-cost barrier of traditional global 3D point cloud feature library construction. In addition, the embodiment of the application does not need to align each image to construct a global 3D point cloud database, but only needs to acquire the pose of the image and extract the global features and the local features of the image to construct an image pose database. In the on-line positioning stage of the embodiment of the application, firstly, the image to be positioned is searched in the database, then the local structure information of the searched image and the image to be positioned is recovered, the image to be positioned and the relative pose of the searched image in the relative coordinate system are obtained, finally, the pose feature library is combined to realize the conversion from the relative coordinate system to the world coordinate system, and the solution of the camera pose is realized.
Based on the above embodiments, an example of the positioning accuracy of the present application at place A (e.g., Nanjing) and place B (e.g., Xi'an) is shown in table 2 below. The position error is at centimeter level, the proportion of angle positioning results accurate to within 3 degrees is 48.1% to 67.7%, and the proportion accurate to within 10 degrees is more than 90%.
TABLE 2
| | <1° | <3° | <10° | X (m) | Y (m) | Z (m) |
| --- | --- | --- | --- | --- | --- | --- |
| Place A | 10% | 48.1% | 91.9% | 0.17 | 0.01 | 0.13 |
| Place B | 7.5% | 67.7% | 93.7% | -0.01 | 0.004 | -0.019 |
The reason why these technical effects can be achieved is that the image pose library (one model), the pose-aware image retrieval algorithm, and the camera pose solving algorithm based on local structure information (two algorithms) provided by the application have a solid theoretical and practical basis. The theoretical basis is that an image set uniformly distributed in space can be retrieved from the image database based on the global feature information and the position of the image to be positioned, providing stable input data for SFM to recover local structure information, so that the image to be positioned participates in reconstruction as much as possible without generating redundant information. Then, by combining the global prior information provided by the image pose library, the transformation relationship (i.e., mapping relationship) from the local coordinate system to the global coordinate system can be obtained, realizing accurate pose calculation. The practical basis is that a terminal (e.g., a mobile phone) serves as an easily obtained image acquisition device and, combined with the easily operated acquisition mode of capturing videos, can greatly accelerate data acquisition and simplify the data acquisition process and labor cost.
Fig. 12 shows a schematic flow chart of a method 1200 for positioning according to an embodiment of the present application. The method 1200 can be applied to a terminal or a cloud. As an example, method 1200 may be performed by device 100 of the positioning in fig. 1, or by system 200 in fig. 2. As shown in fig. 12, the method 1200 includes steps 1210 through 1240.
1210, a first image captured by a first terminal is acquired.
And 1220, determining at least one second image matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, wherein the first information is used for indicating the global features of the images, and the image database comprises the first information of multi-frame images, the second information of the multi-frame images and the poses of the multi-frame images in a world coordinate system, wherein the second information is used for indicating the local features of the images.
1230, determining the poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image.
1240, outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system.
Therefore, the embodiment of the application realizes the positioning of the first image by determining at least one second image matched with the first image shot by the first terminal and the pose of the first image and the at least one second image in the local coordinate system of the first terminal in the image database, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. Therefore, the embodiment of the application can obtain the relation between the local coordinate system and the world coordinate system based on the pose of a part of images in the local coordinate system and the pose in the world coordinate system, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (including global 3D point cloud features) and compute the pose based on feature point matching, the embodiment of the application does not use the 3D point cloud features of the image, but uses the pose of the image for positioning. Constructing a 3D point cloud feature library of the environment depends on professionals and dedicated acquisition equipment, and is therefore time-consuming, labor-intensive, and costly. Since 3D point cloud features do not need to be acquired, on the one hand, low-cost equipment, such as a camera with low resolution and a small field angle (e.g., a mobile phone camera), can be used to construct the image database without depending on professionals and professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of constructing the image database; on the other hand, the time for constructing the database can be shortened, which is favorable for efficiently constructing the image database. Therefore, the scheme of positioning based on the image pose and the image database can be beneficial to positioning with high efficiency and low cost.
In some possible implementations, the method 1200 further includes establishing an image database.
In some possible implementation manners, a video shot by the second terminal may be acquired, and a pose of the multi-frame image in the video in the second local coordinate system of the second terminal may be acquired. Then, the pose of the multi-frame image in the video in the world coordinate system can be determined according to the third information and the pose of the multi-frame image in the video in the second local coordinate system. Wherein the third information is used to indicate a position of at least a portion of the image in the video in a map, the map being associated with the world coordinate system. After that, the first information and the second information of the multi-frame image in the video may be acquired. In this way, the establishment of the image database can be achieved.
In some possible implementations, the method 1200 may further include obtaining a first position of the first image, and determining at least one image in the image database according to the first position of the first image, wherein the distance between the position of each image of the at least one image and the first position of the first image is less than a first threshold, the first position coming from a GPS module or a WiFi module of the first terminal.
As a specific implementation manner of matching the first information of the first image with the first information of the images in the image database and determining at least one second image matching the first image in the image database, the first information of the first image may be matched with the first information of the at least one image, and the at least one second image matching the first image is obtained from the at least one image.
In some possible implementations, a plurality of third images matching the first image may be determined in the image database according to the first information of the first image and the first information of the images in the image database, and then, an image of the plurality of third images whose distance from a cluster center of the plurality of third images is greater than a second threshold may be deleted to obtain the at least one second image. Wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
Here, an image whose distance from the cluster center of the plurality of third images is greater than the second threshold value may be referred to as an outlier. That is, the third image may include the outlier image and the at least one second image. In some possible descriptions, the third image may be described as a plurality of second images in the image database that match the first image and have no outlier images deleted.
In some possible implementations, a plurality of fourth images matching the first image may be determined in the image database according to the first information of the first image and the first information of the images in the image database, and then, an image with an angle smaller than a third threshold and/or a distance smaller than a fourth threshold in the plurality of fourth images may be deleted to obtain the at least one second image. The angle is the difference value of the poses of the at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of the at least two images in the world coordinate system. As an example, the angle may be a difference of an elevation angle (pitch), a yaw angle (yaw), or a roll angle (roll) of the poses of the at least two images in the world coordinate system.
Here, images having an angle smaller than the third threshold and/or a distance smaller than the fourth threshold may be referred to as redundant images. That is, the fourth picture may include the redundant picture and the at least one second picture. In some possible descriptions, the fourth image may be described as a plurality of second images in the image database that match the first image and have no redundant images deleted.
Specifically, all relevant contents of the steps involved in the method 1200 for positioning shown in fig. 12 may refer to the relevant functions of the modules in fig. 1 or fig. 2, or the description of the method 300 for positioning shown in fig. 3, and are not described again here.
The method for positioning provided by the embodiment of the present application is described in detail above with reference to fig. 1 to 12, and the apparatus for positioning provided by the embodiment of the present application is described below with reference to fig. 13 and 14. It should be understood that the positioning apparatus in fig. 13 and 14 can perform each step in the positioning method in the embodiment of the present application, and in order to avoid redundancy, the redundant description is appropriately omitted when the positioning apparatus in fig. 13 and 14 is described below.
Fig. 13 is a schematic block diagram of an apparatus 1300 for positioning of an embodiment of the present application. The apparatus 1300 includes an acquisition unit 1310, a processing unit 1320, and an output unit 1330.
Specifically, when the positioning apparatus 1300 performs the positioning method, the obtaining unit 1310 is configured to obtain a first image captured by the first terminal.
A processing unit 1320, configured to determine, in the image database, at least one second image matching the first image according to the first information of the first image and the first information of the images in the image database, where the first information is used to indicate a global feature of the image, and the image database includes the first information of the multi-frame image, the second information of the multi-frame image, and a pose of the multi-frame image in a world coordinate system, where the second information is used to indicate a local feature of the image.
The processing unit 1320 is further configured to determine the poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image.
An output unit 1330 configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
In some possible implementations, the apparatus 1300 further includes a building unit configured to build the image database.
In some possible implementations, the establishing unit is specifically configured to:
acquiring a video shot by a second terminal; acquiring the pose of a multi-frame image in the video under a second local coordinate system of the second terminal; determining the pose of the multi-frame image in the video in a world coordinate system according to third information and the pose of the multi-frame image in the video in the second local coordinate system, wherein the third information is used for indicating the position of at least part of the image in a map, and the map is associated with the world coordinate system; and acquiring the first information and the second information of a plurality of frames of images in the video.
In some possible implementations, the obtaining unit 1310 is further configured to obtain a first location of the first image, where the first location is from a GPS module or a WiFi module of the first terminal.
The acquisition unit may receive data sent by the GPS module or the WiFi module, such as the first location.
Optionally, the obtaining unit may further send a request message to the GPS module or the WiFi module, where the request message is used to request the GPS module or the WiFi module to send the first position of the first image acquired by the obtaining unit. In response to the request message, the GPS module or the WiFi module may transmit the first location to the acquisition unit.
The processing unit 1320 is further configured to determine at least one image in the image database according to the first position of the first image, where a distance between the position of each of the at least one image and the first position of the first image is smaller than a first threshold.
The processing unit 1320 is further configured to match the first information of the first image with the first information of the at least one image, and obtain at least one second image matching the first image in the at least one image.
In some possible implementations, the processing unit 1320 is specifically configured to:
determining a plurality of third images matched with the first image in an image database according to the first information of the first image and the first information of the images in the image database; deleting the images, of the plurality of third images, of which the distance from the cluster center of the plurality of third images is greater than a second threshold value, to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
In some possible implementations, the processing unit 1320 is specifically configured to:
determining a plurality of fourth images matched with the first image in an image database according to the first information of the first image and the first information of the images in the image database; and deleting the images with the angle smaller than the third threshold and/or the distance smaller than the fourth threshold from the plurality of fourth images to obtain the at least one second image. The angle is the difference value of the poses of the at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of the at least two images in the world coordinate system.
Specifically, all relevant contents (for example, implementation examples or technical effects) of the units involved in the positioning apparatus 1300 shown in fig. 13 may refer to the relevant functions of the respective modules in fig. 1 or fig. 2 above, or the relevant description of the positioning method 300 shown in fig. 3, and are not described again here.
Fig. 14 is a schematic structural diagram of an apparatus 1400 for positioning according to an embodiment of the present application. As shown in fig. 14, the device 1400 includes a communication module 1410, sensors 1420, a user input module 1430, an output module 1440, a processor 1450, an audio-visual input module 1460, a memory 1470, and a power supply 1480.
The communication module 1410 may include at least one module that enables communication between the apparatus 1400 and other apparatuses (e.g., other computer systems or mobile terminals). For example, the communication module 1410 may include one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless internet module, a local area communication module, and a location (or position) information module, etc. The various modules are implemented in various ways in the prior art, and are not described in the application.
The sensors 1420 may sense the current state of the device 1400, such as location, presence or absence of contact with a user, direction, acceleration/deceleration, and the like. For example, the sensor 1420 may transmit the sensed current state of the device 1400 to the GPS module or the WiFi module.
The user input module 1430 is configured to receive input of digital information, character information, or contact touch operation/non-contact gesture, and to receive signal input related to user setting and function control of the apparatus. User input module 1430 includes a touch panel and/or other input devices.
The output module 1440 includes a display panel for displaying information input by a user, information provided to the user, various menu interfaces of the system, and the like. Alternatively, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. In other embodiments, the touch panel can be overlaid on the display panel to form a touch display screen. In addition, the output module 1440 may further include an audio output module, an alarm, a haptic module, and the like.
For example, the output module 1440 may implement the related functions of the output unit 1330 in the apparatus 1300, for example, may be used to output the pose of the first image in the world coordinate system. As a specific example, the display panel of the output module 1440 may display the visualization result of the current location of the terminal in a map.
The audio/video input module 1460 is used for inputting audio signals or video signals. The audio/video input module 1460 may include a camera and a microphone. For example, the camera may be used to capture the first image, and the present application is not limited thereto.
A power supply 1480 may receive external power and internal power under the control of the processor 1450 and provide power required for operation of the various components of the system.
Illustratively, the processor 1450 may implement the functionality of the processing unit 1320 in the apparatus 1300. Optionally, the processor 1450 may also implement the functions of the building units in the apparatus 1300, which is not limited in this application.
For example, the processor 1450 may also be used to implement the function of the obtaining unit 1310 in the apparatus 1300, such as obtaining a first image captured by the terminal from a camera unit (e.g., a video camera), and/or obtaining a first position of the first image from a GPS module or a WiFi module, and the like, which is not limited in this application.
The memory 1470 stores computer programs, including an operating system program 1472 and application programs 1471. Typical operating systems include those for desktop or notebook computers, such as Windows from Microsoft Corporation and macOS from Apple Inc., and those for mobile terminals, such as the Android system developed by Google Inc. The methods provided by the foregoing embodiments may be implemented in software, which may be considered the specific implementation of the application programs 1471 and/or the operating system program 1472.
The processor 1450 serves to read the computer programs in the memory 1470 and then execute computer program-defined methods, such as the processor 1450 reads the operating system program 1472 to run an operating system on the system and implement various functions of the operating system, or reads one or more application programs 1471 to run applications on the system.
The memory 1470 also stores other data 1473 than computer programs, such as an image database and the like referred to in this application.
The connection relationship of the modules in fig. 14 is only an example, and the method provided by any embodiment of the present application may also be applied to devices located in other connection manners, for example, all modules are connected through a bus.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
The embodiments of the present application also provide a computer-readable medium on which a computer program is stored; when executed by a computer, the computer program implements the steps of the positioning method in any of the above embodiments.
The embodiments of the present application further provide a computer program product which, when executed by a computer, implements the steps of the positioning method in any of the above embodiments.
In addition, various aspects or features of the present application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or magnetic strips), optical disks (e.g., compact discs (CDs) or digital versatile discs (DVDs)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives). In addition, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
In the embodiments provided in the present application, there is no fixed temporal relationship among the steps; each step may stand as a scheme on its own or be combined with one or more other steps to form a scheme, which is not limited in the present application.
The embodiments in the present application may be used independently or jointly; for example, one or more steps from different embodiments may be combined to form a separate embodiment, which is not limited herein.
It should be understood that, in the above embodiments, "first" and "second" are used only to distinguish different objects conveniently and should not constitute any limitation on the present application.
It should also be understood that, in the embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It should also be understood that "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. "At least one" means one or more. "At least one of A and B", like "A and/or B", describes an association relationship between associated objects and means that three relationships may exist; for example, at least one of A and B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (13)
1. A method of positioning, comprising:
acquiring a first image shot by a first terminal;
determining, in an image database, at least one second image that matches the first image according to first information of the first image and first information of images in the image database, wherein the first information is used for indicating global features of an image, the image database comprises first information of multi-frame images, second information of the multi-frame images, and poses of the multi-frame images in a world coordinate system, and the second information is used for indicating local features of an image;
determining the poses of the first image and the at least one second image in a first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image;
and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system.
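(Illustrative note, not part of the claims: the flow of claim 1 can be sketched in a few lines of NumPy. The sketch assumes that global features are vectors compared by cosine similarity, that poses are 4x4 homogeneous matrices, and that the transform from the first local coordinate system to the world coordinate system is estimated by averaging one candidate per matched second image and re-orthogonalizing the rotation, a simple stand-in for a robust estimator such as RANSAC. All names are illustrative.)

```python
import numpy as np

def retrieve_second_images(query_global, db_globals, top_k=3):
    """Rank database images by cosine similarity of global features."""
    q = query_global / np.linalg.norm(query_global)
    db = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:top_k]

def first_image_world_pose(pose_first_local,
                           poses_second_local, poses_second_world):
    """Chain the first image's local pose through T_world<-local.

    Each matched second image yields one candidate transform
    T_world<-local = T_world_i @ inv(T_local_i); the candidates are
    averaged and the rotation block re-orthogonalized via SVD.
    """
    candidates = [Tw @ np.linalg.inv(Tl)
                  for Tl, Tw in zip(poses_second_local, poses_second_world)]
    T = np.mean(candidates, axis=0)
    U, _, Vt = np.linalg.svd(T[:3, :3])
    T[:3, :3] = U @ Vt          # nearest rotation to the averaged block
    T[3] = [0.0, 0.0, 0.0, 1.0]
    return T @ pose_first_local
```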
2. The method of claim 1, further comprising:
and establishing the image database.
3. The method of claim 2, wherein the establishing of the image database comprises:
acquiring a video shot by a second terminal;
acquiring poses of multi-frame images in the video in a second local coordinate system of the second terminal;
determining poses of the multi-frame images in the video in the world coordinate system according to third information and the poses of the multi-frame images in the second local coordinate system, wherein the third information is used for indicating positions of at least some of the images in a map, and the map is associated with the world coordinate system;
and acquiring the first information and the second information of the multi-frame images in the video.
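(Illustrative note, not part of the claims: the conversion from the second local coordinate system to the world coordinate system in claim 3 can be sketched as a Kabsch-style rigid alignment between the local positions of the frames covered by the third information and their map positions. The sketch assumes at least three non-collinear anchor frames, 4x4 homogeneous poses, and no scale ambiguity; a monocular system would additionally estimate a scale factor, for example with Umeyama's method. All names are illustrative.)

```python
import numpy as np

def align_to_world(anchor_local, anchor_world):
    """Kabsch-style fit of R, t such that world ~= R @ local + t."""
    cl, cw = anchor_local.mean(axis=0), anchor_world.mean(axis=0)
    H = (anchor_local - cl).T @ (anchor_world - cw)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cw - R @ cl

def video_poses_to_world(poses_local, anchor_idx, anchor_world):
    """Re-express every frame pose of the video in the world frame."""
    pts = np.array([poses_local[i][:3, 3] for i in anchor_idx])
    R, t = align_to_world(pts, anchor_world)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return [T @ P for P in poses_local]
```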
4. The method according to any one of claims 1-3, further comprising:
acquiring a first position of the first image, wherein the first position is from a Global Positioning System (GPS) module or a wireless fidelity (WiFi) module of the first terminal;
determining at least one image in the image database according to the first position of the first image, wherein a distance between the position of each of the at least one image and the first position of the first image is less than a first threshold;
wherein the determining, in the image database, at least one second image that matches the first image according to the first information of the first image and the first information of the images in the image database comprises:
matching the first information of the first image with the first information of the at least one image, and acquiring, from the at least one image, the at least one second image that matches the first image.
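(Illustrative note, not part of the claims: the pre-filtering of claim 4 amounts to a coarse geo-fence applied before global-feature matching, so that retrieval only considers images stored near the GPS/WiFi fix. A minimal sketch under that reading; positions, units, and the threshold are illustrative.)

```python
import numpy as np

def prefilter_candidates(first_position, db_positions, first_threshold):
    """Indices of database images within first_threshold of the fix."""
    dists = np.linalg.norm(db_positions - first_position, axis=1)
    return np.flatnonzero(dists < first_threshold)

# Example: a 60 m geo-fence around a 2-D position fix.
db_positions = np.array([[0.0, 0.0], [30.0, 40.0], [300.0, 400.0]])
near = prefilter_candidates(np.array([0.0, 0.0]), db_positions, 60.0)
# near -> array([0, 1]); global-feature matching then runs on these only.
```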
5. The method according to any one of claims 1-4, wherein the determining, in the image database, at least one second image that matches the first image according to the first information of the first image and the first information of the images in the image database comprises:
determining, in the image database, a plurality of third images that match the first image according to the first information of the first image and the first information of the images in the image database;
and deleting, from the plurality of third images, images whose distance from a cluster center of the plurality of third images is greater than a second threshold, to obtain the at least one second image, wherein the cluster center is determined according to positions of the plurality of third images in the world coordinate system.
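(Illustrative note, not part of the claims: one plausible reading of the elimination step of claim 5 takes the mean of the candidates' positions in the world coordinate system as the cluster center; the claim only requires that the center be determined from those positions. Names and the threshold are illustrative.)

```python
import numpy as np

def drop_far_from_cluster(world_positions, second_threshold):
    """Keep candidates near the cluster center of all candidates."""
    center = world_positions.mean(axis=0)   # one choice of cluster center
    dists = np.linalg.norm(world_positions - center, axis=1)
    return np.flatnonzero(dists <= second_threshold)
```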
6. The method according to any one of claims 1-5, wherein the determining, in the image database, at least one second image that matches the first image according to the first information of the first image and the first information of the images in the image database comprises:
determining, in the image database, a plurality of fourth images that match the first image according to the first information of the first image and the first information of the images in the image database;
and deleting, from the plurality of fourth images, images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold, to obtain the at least one second image, wherein an angle is the difference between the orientations of the poses of at least two of the images in the world coordinate system, and a distance is the difference between the positions of the poses of at least two of the images in the world coordinate system.
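(Illustrative note, not part of the claims: claim 6 keeps the retained second images geometrically diverse by discarding near-duplicate viewpoints. A minimal greedy sketch, assuming 4x4 homogeneous world poses and reading the claim's "and/or" conjunctively; the rotation angle between two poses is recovered from the trace of the relative rotation. Names and thresholds are illustrative.)

```python
import numpy as np

def keep_diverse_views(poses_world, third_threshold_deg, fourth_threshold):
    """Greedily drop views too close in both angle and position."""
    kept = []
    for i, Ti in enumerate(poses_world):
        duplicate = False
        for j in kept:
            Tj = poses_world[j]
            R_rel = Ti[:3, :3].T @ Tj[:3, :3]
            cos_a = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
            angle = np.degrees(np.arccos(cos_a))
            dist = np.linalg.norm(Ti[:3, 3] - Tj[:3, 3])
            if angle < third_threshold_deg and dist < fourth_threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append(i)
    return kept
```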
7. An apparatus for positioning, comprising:
an acquisition unit, a processing unit, and an output unit, wherein the acquisition unit is configured to acquire a first image shot by a first terminal;
the processing unit is configured to determine, in an image database, at least one second image that matches the first image according to first information of the first image and first information of images in the image database, wherein the first information is used for indicating global features of an image, the image database comprises first information of multi-frame images, second information of the multi-frame images, and poses of the multi-frame images in a world coordinate system, and the second information is used for indicating local features of an image;
the processing unit is further configured to determine, according to the second information of the first image and the second information of the at least one second image, poses of the first image and the at least one second image in a first local coordinate system of the first terminal;
an output unit configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
8. The apparatus according to claim 7, further comprising an establishing unit configured to establish the image database.
9. The apparatus according to claim 8, wherein the establishing unit is specifically configured to:
acquire a video shot by a second terminal;
acquire poses of multi-frame images in the video in a second local coordinate system of the second terminal;
determine poses of the multi-frame images in the video in the world coordinate system according to third information and the poses of the multi-frame images in the second local coordinate system, wherein the third information is used for indicating positions of at least some of the images in a map, and the map is associated with the world coordinate system;
and acquire the first information and the second information of the multi-frame images in the video.
10. The apparatus according to any one of claims 7 to 9,
the acquisition unit is further configured to acquire a first position of the first image, where the first position is from a Global Positioning System (GPS) module or a wireless fidelity (WiFi) module of the first terminal;
the processing unit is further configured to determine at least one image in the image database according to the first position of the first image, wherein a distance between the position of each image of the at least one image and the first position of the first image is smaller than a first threshold;
the processing unit is further configured to match the first information of the first image with the first information of the at least one image, and acquire at least one second image matched with the first image in the at least one image.
11. The apparatus according to any one of claims 7 to 10, wherein the processing unit is specifically configured to:
determine, in the image database, a plurality of third images that match the first image according to the first information of the first image and the first information of the images in the image database;
and delete, from the plurality of third images, images whose distance from a cluster center of the plurality of third images is greater than a second threshold, to obtain the at least one second image, wherein the cluster center is determined according to positions of the plurality of third images in the world coordinate system.
12. The apparatus according to any one of claims 7 to 11, wherein the processing unit is specifically configured to:
determine, in the image database, a plurality of fourth images that match the first image according to the first information of the first image and the first information of the images in the image database;
and delete, from the plurality of fourth images, images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold, to obtain the at least one second image, wherein an angle is the difference between the orientations of the poses of at least two of the images in the world coordinate system, and a distance is the difference between the positions of the poses of at least two of the images in the world coordinate system.
13. A terminal device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011271315.1A CN114565663A (en) | 2020-11-13 | 2020-11-13 | Positioning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011271315.1A CN114565663A (en) | 2020-11-13 | 2020-11-13 | Positioning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114565663A true CN114565663A (en) | 2022-05-31 |
Family
ID=81712068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011271315.1A Pending CN114565663A (en) | 2020-11-13 | 2020-11-13 | Positioning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114565663A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704821A (en) * | 2017-09-29 | 2018-02-16 | 河北工业大学 | A kind of vehicle pose computational methods of bend |
CN109029442A (en) * | 2018-06-07 | 2018-12-18 | 武汉理工大学 | Based on the matched positioning device of multi-angle of view and method |
CN110705574A (en) * | 2019-09-27 | 2020-01-17 | Oppo广东移动通信有限公司 | Positioning method and device, equipment and storage medium |
CN111046125A (en) * | 2019-12-16 | 2020-04-21 | 视辰信息科技(上海)有限公司 | Visual positioning method, system and computer readable storage medium |
CN111724438A (en) * | 2019-03-18 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Data processing method and device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704821A (en) * | 2017-09-29 | 2018-02-16 | 河北工业大学 | A kind of vehicle pose computational methods of bend |
CN109029442A (en) * | 2018-06-07 | 2018-12-18 | 武汉理工大学 | Based on the matched positioning device of multi-angle of view and method |
CN111724438A (en) * | 2019-03-18 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN110705574A (en) * | 2019-09-27 | 2020-01-17 | Oppo广东移动通信有限公司 | Positioning method and device, equipment and storage medium |
CN111046125A (en) * | 2019-12-16 | 2020-04-21 | 视辰信息科技(上海)有限公司 | Visual positioning method, system and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11729245B2 (en) | Platform for constructing and consuming realm and object feature clouds | |
US11113882B2 (en) | Generating immersive trip photograph visualizations | |
US9558559B2 (en) | Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system | |
US9699375B2 (en) | Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system | |
CN107133325B (en) | Internet photo geographic space positioning method based on street view map | |
US9269196B1 (en) | Photo-image-based 3D modeling system on a mobile device | |
Chen et al. | Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing | |
CN111046125A (en) | Visual positioning method, system and computer readable storage medium | |
JP5799521B2 (en) | Information processing apparatus, authoring method, and program | |
Braud et al. | Scaling-up ar: University campus as a physical-digital metaverse | |
CN112750203B (en) | Model reconstruction method, device, equipment and storage medium | |
KR20160003553A (en) | Electroninc device for providing map information | |
KR101545138B1 (en) | Method for Providing Advertisement by Using Augmented Reality, System, Apparatus, Server And Terminal Therefor | |
CN103761539B (en) | Indoor locating method based on environment characteristic objects | |
CN110926478B (en) | AR navigation route deviation rectifying method and system and computer readable storage medium | |
Bao et al. | Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations | |
CN113298871A (en) | Map generation method, positioning method, system thereof, and computer-readable storage medium | |
US9188444B2 (en) | 3D object positioning in street view | |
Brata et al. | An Enhancement of Outdoor Location-Based Augmented Reality Anchor Precision through VSLAM and Google Street View | |
CN115468568A (en) | Indoor navigation method, device and system, server equipment and storage medium | |
CN114565663A (en) | Positioning method and device | |
CN112700546A (en) | System and method for constructing outdoor large-scale three-dimensional map | |
CN108062786B (en) | Comprehensive perception positioning technology application system based on three-dimensional information model | |
Porzi et al. | An automatic image-to-DEM alignment approach for annotating mountains pictures on a smartphone | |
Chen et al. | To Know Where We Are: Vision-Based Positioning in Outdoor Environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||