CN114565663A - Positioning method and device - Google Patents

Positioning method and device

Info

Publication number
CN114565663A
CN114565663A (application CN202011271315.1A)
Authority
CN
China
Prior art keywords
image
images
information
coordinate system
pose
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202011271315.1A
Other languages
Chinese (zh)
Inventor
周妍
李威
王永亮
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202011271315.1A
Publication of CN114565663A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/23: Clustering techniques
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a positioning method and device. An embodiment of the application positions a first image shot by a first terminal by determining, in an image database, at least one second image matching the first image; determining the poses of the first image and the at least one second image in a first local coordinate system of the first terminal; and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system. The embodiment can thus obtain the relation between the local coordinate system and the world coordinate system from the poses of a subset of images in both coordinate systems, so that the first image is positioned in the world coordinate system without acquiring a 3D point cloud for each image. The application can be applied to fields such as virtual reality (VR) and augmented reality (AR).

Description

Positioning method and device
Technical Field
The present application relates to the field of augmented reality (AR) technology, and more particularly, to a positioning method and apparatus.
Background
With the maturing of fifth-generation (5G) communication technology and the development of mobile phone camera hardware and computing power, intelligent application services based on visual AR technology are becoming increasingly rich. AR technology fuses virtual information with the real world: it widely applies technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing, and applies computer-generated virtual information such as text, images, three-dimensional models, music, and video to the real world after simulation, so that the virtual information and the real world complement each other and the real world is thereby augmented.
To realize wide-coverage AR applications, a map feature library needs to be constructed for pose acquisition. For example, in an offline library construction stage, a map feature library may first be built by collecting three-dimensional (3D) point cloud (x, y, z) information of the surrounding environment with professional acquisition equipment such as road measuring vehicles, drones, survey-grade laser scanners, and optical cameras. Then, in an online positioning stage, a picture of the environment is shot, two-dimensional (2D) feature points of the picture are extracted and matched against the map feature library built in the offline stage, a series of 2D-3D matching point pairs are obtained by retrieval, and the pose of the terminal is then obtained through a pose calculation algorithm such as PnP (perspective-n-point).
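To make this prior-art online step concrete, the following Python sketch illustrates 2D-3D pose solving with OpenCV's solvePnPRansac. It is illustrative only and not part of the method claimed by this application: the intrinsics and point data are synthetic, whereas a real system would obtain the 2D-3D pairs by feature matching against the map feature library.

```python
# A sketch of the prior-art online step described above. All data are
# synthetic: 3D map points are projected with a known pose to simulate
# retrieved 2D-3D matching point pairs.
import numpy as np
import cv2

K = np.array([[800.0, 0.0, 320.0],     # assumed pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec_gt = np.array([0.1, -0.2, 0.05])  # "true" pose used to synthesize data
tvec_gt = np.array([0.3, -0.1, 4.0])
pts_3d = np.random.uniform(-1.0, 1.0, (30, 3))
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)

# Pose calculation from 2D-3D matching point pairs via PnP with RANSAC.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d.astype(np.float64), pts_2d.reshape(-1, 2), K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)       # world-to-camera rotation
    cam_pos = -R.T @ tvec.ravel()    # terminal (camera) position in the world
```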
However, the above scheme relies on a map feature library whose point cloud information is complete and highly consistent with the environment, and building such a library requires professionals and acquisition equipment; it is time-consuming, labor-intensive, and costly. An efficient, low-cost positioning scheme is therefore needed.
Disclosure of Invention
The application provides a positioning method and a positioning apparatus, which can obtain the relation between a local coordinate system and a world coordinate system based on the poses of a subset of images in the local coordinate system and in the world coordinate system, so that an image to be positioned can be positioned in the world coordinate system without acquiring a 3D point cloud for each image.
In a first aspect, a positioning method is provided, which may be applied to a terminal or a cloud. In the method, a first image shot by a first terminal is obtained, and at least one second image matching the first image is determined in an image database according to first information of the first image and first information of the images in the image database. The first information is used to indicate global features of an image; the image database comprises first information of multiple frames of images, second information of the multiple frames of images, and poses of the multiple frames of images in a world coordinate system; the second information is used to indicate local features of an image. Then, the poses of the first image and the at least one second image in a first local coordinate system of the first terminal may be determined based on the second information of the first image and the second information of the at least one second image. Finally, the pose of the first image in the world coordinate system may be output according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
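The application does not fix a particular solver for placing two images in a shared local coordinate system from their local features. One common choice, shown in the hedged Python sketch below, is two-view epipolar geometry over matched keypoints; the function and variable names are illustrative assumptions.

```python
# A sketch, not the claimed method: estimating the pose of one image
# relative to another from matched 2D keypoints via the essential matrix.
import numpy as np
import cv2

def relative_pose(kpts1, kpts2, K):
    """kpts1, kpts2: (N, 2) float arrays of matched keypoints; K: 3x3 intrinsics."""
    E, mask = cv2.findEssentialMat(kpts1, kpts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, kpts1, kpts2, K, mask=mask)
    # In a pure two-view setup, t is known only up to scale; a multi-view
    # (SLAM-style) solver over the whole local image set resolves this.
    return R, t
```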
Therefore, the embodiment of the application positions the first image by determining, in the image database, at least one second image matching the first image shot by the first terminal, determining the poses of the first image and the at least one second image in the local coordinate system of the first terminal, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system. The embodiment can thus obtain the relation between the local coordinate system and the world coordinate system from the poses of a subset of images in both coordinate systems, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
In the embodiment of the application, the pose of the first image in the world coordinate system is the pose of the first terminal in the world coordinate system when the first terminal shoots the first image, and may also be referred to as the pose of the camera.
In some embodiments, the pose of the at least one second image in the world coordinate system and the second information may be obtained from an image database, which is not limited in this application.
As an example, in this embodiment, the first terminal may be a mobile device such as a mobile phone or an autonomous vehicle, which is not limited in this application.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (containing global 3D point cloud features) and solve the pose based on feature point matching, the embodiment of the application does not use 3D point cloud features of images, but uses image poses for positioning. Building a 3D point cloud feature library of the environment depends on professionals and acquisition equipment, and is time-consuming, labor-intensive, and costly. Since the embodiment does not need to acquire 3D point cloud features, on the one hand, the image database can be built with low-cost equipment, such as a camera with low resolution and a small field angle (for example, a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of building the image database; on the other hand, the time for building the database can be shortened, which helps build the image database efficiently. Therefore, this scheme of positioning based on image poses and an image database is conducive to efficient, low-cost positioning.
In some embodiments, the at least one second image may constitute a set of images, which may be referred to as a set of similar images of the first image.
In some embodiments, the first image and the at least one second image may form an image set, which may be referred to as a local image set. The first local coordinate system, that is, the camera coordinate system of the first terminal, may be a relative coordinate system constructed by taking any frame image in the local image set as its origin.
As a possible implementation manner, a mapping relationship (or may also be referred to as a conversion relationship) between the first local coordinate system and the world coordinate system may be determined according to the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. Then, according to the mapping relationship, the pose of the first image in the first local coordinate system can be converted, the pose of the first image in the world coordinate system is determined, and the pose is output.
By way of example, the mapping relationship between the world coordinate system and the first local coordinate system may be described by a rotation matrix or a translation vector, which is not limited in this application.
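As a concrete illustration of one standard way to estimate such a mapping (not mandated by the application), the Python sketch below fits a rigid local-to-world transform from the camera positions of the second images in both coordinate systems using a Kabsch/Umeyama-style alignment; all names are illustrative.

```python
# A sketch of estimating the local-to-world mapping from the camera
# positions of the at least one second image in both coordinate systems
# (Kabsch/Umeyama-style rigid alignment; one standard choice, not a
# requirement of the application).
import numpy as np

def fit_local_to_world(p_local, p_world):
    """p_local, p_world: (N, 3) positions of the same images in each system."""
    mu_l, mu_w = p_local.mean(axis=0), p_world.mean(axis=0)
    H = (p_local - mu_l).T @ (p_world - mu_w)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T          # rotation matrix of the mapping
    t = mu_w - R @ mu_l         # translation vector of the mapping
    return R, t

# The position of the first image is then converted as p_world = R @ p_local + t.
# If the local coordinate system is not metric, a scale factor can be
# estimated in the same least-squares fit.
```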
In the embodiment of the present application, the world coordinate system may also be referred to as an absolute coordinate system, a global coordinate system, or the like. As an example, the world coordinate system may describe the position of a camera (or a terminal containing the camera) as a reference coordinate system in the environment, and may also be used to describe the position of any object in the environment.
As a possible implementation manner, the image database may include a plurality of mapping relationships, where one mapping relationship may indicate a correspondence between an identifier of one image, first information of the image, second information of the image, and a pose of the image in a world coordinate system, and the application is not limited thereto.
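For illustration, one possible record layout realizing such a mapping relationship is sketched below; all field names are assumptions, not taken from the application.

```python
# One possible record layout for the mapping relationships in the image
# database (illustrative only; all field names are assumptions).
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    image_id: str               # identifier of the image
    global_feature: np.ndarray  # first information: retrieval descriptor
    local_features: np.ndarray  # second information: keypoint descriptors
    pose_world: np.ndarray      # 4x4 pose of the image in the world system

# The database can then simply map identifiers to records:
image_database = {}  # dict[str, ImageRecord]
```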
With reference to the first aspect, in certain implementations of the first aspect, the method may further include establishing the image database.
With reference to the first aspect, in some implementations of the first aspect, a video captured by a second terminal may be acquired, and the poses of multiple frames of images in the video in a second local coordinate system of the second terminal may be acquired. Then, the poses of the multiple frames of images in the world coordinate system can be determined according to third information and their poses in the second local coordinate system, where the third information is used to indicate a position of at least a part of the images in the video in a map, the map being associated with the world coordinate system. After that, the first information and the second information of the multiple frames of images in the video may be acquired. In this way, the image database can be established.
Therefore, the embodiment of the application can establish the image database by acquiring the poses of the multiple frames of images in the video shot by the second terminal in the second local coordinate system of the second terminal, determining their poses in the world coordinate system in combination with the positions of the relevant images in the map, and acquiring the first information and the second information of the images. Thus, when the image database is constructed, only the pose of each image in the world coordinate system and the first information and second information of the image need to be acquired; the 3D point cloud features of the environment (such as global 3D point cloud features) do not need to be acquired.
It should be noted that, in this embodiment of the application, the first terminal and the second terminal may be the same terminal device or different terminal devices, and are not limited.
In some embodiments, after the pose of the first image in the world coordinate system is acquired, the first information of the first image, the second information of the first image, and the pose of the first image in the world coordinate system may be added to the image database, so as to update the image database.
As an example, the third information may be used to indicate a position of a first frame image in the video in a map, and/or a position of a last frame image in the map. For example, the map may be a map obtained through the Global Positioning System (GPS) or through another global navigation satellite system (GNSS), which is not limited in this application.
As a possible implementation manner, a video may be acquired by a camera of the second terminal, and the pose of each image frame in the video in the second local coordinate system of the second terminal may be obtained through a simultaneous localization and mapping (SLAM) algorithm. Further, the pose of each image in the video in the world coordinate system can be calculated in combination with the position of the first frame image in the map and the position of the last frame image in the map. Here, the second local coordinate system, that is, the camera coordinate system of the second terminal, may be a relative coordinate system constructed by taking any frame image in the video as its origin.
For example, the video may be captured according to a planned capture route. For example, the user may hold the terminal by hand, start from the starting point of the route, continuously shoot along the route facing the environment, and end shooting to the end point of the route to acquire the video corresponding to the route. For part or all of the video images in the video, the pose of the image in the local coordinate system of the camera can be obtained by utilizing the SLAM algorithm. Meanwhile, the terminal can also acquire the position of the starting point of the route in the map and/or the position of the ending point of the route in the map, so as to calculate the pose of the image in the video in the world coordinate system.
With reference to the first aspect, in certain implementations of the first aspect, a first position of the first image may further be obtained, and at least one image is determined in the image database according to the first position. The first position comes from a GPS module or a wireless fidelity (WiFi) module of the first terminal, and the distance between the position of each of the at least one image and the first position of the first image is less than a first threshold.
As a specific implementation manner of matching the first information of the first image with the first information of the images in the image database and determining at least one second image matching the first image in the image database, the first information of the first image may be matched with the first information of the at least one image, and the at least one second image matching the first image is acquired from the at least one image.
It should be noted that the first position is a coarse position with low accuracy; its error may be, for example, 3 to 10 meters.
Therefore, according to the embodiment of the application, at least one image is determined in the image database according to the first position of the first image, the first information of the first image is matched with the first information of the at least one image, and at least one second image matching the first image is obtained from the at least one image. In this way, the first information of the first image does not need to be matched against the first information of all images in the image database, which reduces the calculation amount of the terminal, shortens the matching time, and improves the efficiency of obtaining the second image.
As an example, the at least one image acquired in the image database according to the first position may form a set of images, which may be referred to as an image candidate set.
As one possible implementation manner of determining at least one second image according to the first information of the first image and the first information of the images in the image database or the image candidate set, a similarity between the first information of the first image and the first information of the images in the image database or the image candidate set may be calculated, and then the at least one second image is determined according to the similarity, which is not limited in this application.
As a possible implementation manner, after the similarities between the first image and the images in the image database or the image candidate set are obtained, the calculated similarities may be ranked to take the top m images with the highest similarity as the second images, where m is an integer greater than 1. As one example, m may be 20. Alternatively, in some other implementations, an image whose similarity to the first image is greater than a preset threshold may be used as a second image, which is not limited in this application.
Here, the process of calculating the similarity between the first information of the first image and the first information of the images in the image database or the image candidate set may be referred to as matching the first image with the images in the image database or the image candidate set, and the present application is not limited thereto.
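A minimal sketch of this similarity matching, assuming cosine similarity over the global features and the top-m selection described above (the application leaves the similarity measure open):

```python
# A sketch of global-feature retrieval: cosine similarity against the
# database (or the image candidate set), keeping the top-m matches.
import numpy as np

def retrieve_top_m(query_feat, db_feats, m=20):
    """query_feat: (D,); db_feats: (N, D) global features (first information)."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                    # one similarity score per database image
    return np.argsort(-sims)[:m]     # indices of the m most similar images
```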
With reference to the first aspect, in some implementations of the first aspect, as one way of determining at least one second image matching the first image in the image database according to the first information of the first image and the first information of the images in the image database, a plurality of third images matching the first image may first be determined in the image database according to that first information; then, the images among the plurality of third images whose distance from a cluster center of the plurality of third images is greater than a second threshold may be deleted to obtain the at least one second image. The cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
For example, the image having a distance from the cluster center of the third images larger than the second threshold is an outlier image in the third images.
Here, the image set composed of the plurality of third images may also be referred to as a similar image set, and the similar image set includes the similar image set composed of the at least one second image. In addition, when there is no outlier image among the plurality of third images, the operation of deleting the image may not be performed. At this time, the third image is the second image.
Since images at different positions may have very similar textures, errors may occur in the similarity calculation, which can introduce outliers into the similar image set. By eliminating the outlier images, errors in the similar image set are removed and a more accurate similar image set is obtained. A more accurate similar image set helps make the subsequently acquired poses of the second images in the first local coordinate system more accurate, thereby helping to improve the accuracy of the pose of the first image in the world coordinate system.
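A minimal sketch of the outlier elimination, assuming the cluster center is the mean of the third images' positions in the world coordinate system (one simple choice consistent with the description above):

```python
# A sketch of outlier elimination over the plurality of third images; the
# cluster center is taken here as the mean of their world positions.
import numpy as np

def drop_outliers(positions, second_threshold):
    """positions: (N, 3) world positions of the third images."""
    center = positions.mean(axis=0)                   # cluster center
    dists = np.linalg.norm(positions - center, axis=1)
    return np.where(dists <= second_threshold)[0]     # kept as second images
```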
With reference to the first aspect, in some implementations of the first aspect, as one way of determining at least one second image matching the first image in the image database according to the first information of the first image and the first information of the images in the image database, a plurality of fourth images matching the first image may first be determined in the image database according to that first information; then, the images among the plurality of fourth images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold may be deleted to obtain the at least one second image. The angle is the difference between the orientations of the poses of at least two images in the world coordinate system, and the distance is the difference between the positions of the poses of at least two images in the world coordinate system. As an example, the angle may be the difference in pitch, yaw, or roll of the poses of the at least two images in the world coordinate system.
For example, an image among the fourth images whose angle is smaller than the third threshold and/or whose distance is smaller than the fourth threshold is a redundant image among the plurality of fourth images.
Here, the image set composed of the plurality of fourth images may also be referred to as a similar image set, and the similar image set includes the similar image set composed of the at least one second image. In addition, when there is no redundant picture among the plurality of fourth pictures, the operation of deleting the picture may not be performed. At this time, the fourth image is the second image.
Because images whose angles are smaller than a preset threshold and/or whose distances are smaller than a preset threshold overlap heavily in spatial distribution, redundant images exist in the similar image set. By deleting such images from the similar image set, the second images retained in the set have moderate overlap and can cover the surrounding space uniformly, yielding a more refined similar image set. A more refined similar image set helps reduce the amount of calculation for the subsequently acquired poses of the second images in the first local coordinate system, thereby helping to improve the efficiency of acquiring the pose of the first image in the world coordinate system.
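A minimal sketch of the redundancy filtering, assuming a greedy pass that treats an image as redundant when it is close to an already kept image in both angle (yaw is used here for illustration) and position; the application allows either criterion alone:

```python
# A sketch of redundancy filtering over the plurality of fourth images.
# An image is dropped when it is close to an already kept image in both
# angle and position (thresholds and the yaw-based angle are assumptions).
import numpy as np

def drop_redundant(positions, yaws, angle_threshold, dist_threshold):
    """positions: (N, 3) world positions; yaws: (N,) yaw angles in degrees."""
    kept = []
    for i in range(len(positions)):
        redundant = any(
            abs(yaws[i] - yaws[j]) < angle_threshold
            and np.linalg.norm(positions[i] - positions[j]) < dist_threshold
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept  # indices of the retained second images
```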
In some embodiments, after the outlier images are deleted, the redundant images may be further deleted, which is not limited in this application.
In a second aspect, an embodiment of the present application provides an apparatus for performing a method in the first aspect or any possible implementation manner of the first aspect, and specifically, the apparatus includes a module for performing a method in the first aspect or any possible implementation manner of the first aspect. The apparatus may include an acquisition unit, a processing unit, and an output unit.
The acquisition unit is configured to acquire a first image shot by a first terminal.
The processing unit is used for determining at least one second image matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, wherein the first information is used for indicating the global features of the images, the image database comprises the first information of multi-frame images, the second information of the multi-frame images and the poses of the multi-frame images in a world coordinate system, and the second information is used for indicating the local features of the images.
The processing unit is further configured to determine, according to the second information of the first image and the second information of the at least one second image, poses of the first image and the at least one second image in a first local coordinate system of the first terminal.
An output unit configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
With reference to the second aspect, in some implementations of the second aspect, the apparatus further includes an establishing unit configured to establish the image database.
With reference to the second aspect, in some implementations of the second aspect, the establishing unit is specifically configured to acquire a video captured by a second terminal, acquire a pose of a multi-frame image in the video in a second local coordinate system of the second terminal, and determine the pose of the multi-frame image in the video in a world coordinate system according to third information and the pose of the multi-frame image in the video in the second local coordinate system. Wherein the third information is used to indicate a position of at least a portion of the image in the video in a map, the map being associated with the world coordinate system.
The establishing unit is further specifically configured to acquire the first information and the second information of the multiple frames of images in the video.
With reference to the second aspect, in certain implementations of the second aspect, the acquisition unit is further configured to acquire a first position of the first image, where the first position comes from a GPS module or a WiFi module of the first terminal.
The acquisition unit may receive data sent by the GPS module or the WiFi module, such as the first position.
Optionally, the acquisition unit may further send a request message to the GPS module or the WiFi module, where the request message is used to request that the module send the first position of the first image to the acquisition unit. In response to the request message, the GPS module or the WiFi module may send the first position to the acquisition unit.
The processing unit is further configured to determine at least one image in the image database according to the first position of the first image, and a distance between the position of each image of the at least one image and the first position of the first image is smaller than a first threshold.
The processing unit is further configured to match the first information of the first image with the first information of the at least one image, and acquire at least one second image matching the first image in the at least one image.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to: according to the first information of the first image and the first information of the images in an image database, determining a plurality of third images matched with the first image in the image database, and deleting the images, of the plurality of third images, with the distance from a cluster center of the plurality of third images being larger than a second threshold value so as to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in a world coordinate system.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to: determining a plurality of fourth images matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, and deleting the images of which the angle is smaller than a third threshold and/or the distance is smaller than a fourth threshold in the plurality of fourth images to obtain the at least one second image, wherein the angle is the difference value of the poses of at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of at least two images in the world coordinate system.
In a third aspect, an embodiment of the present application provides a positioning apparatus, including: one or more processors; and a memory for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method of the first aspect or any possible implementation manner of the first aspect.
It should be understood that, for the beneficial effects obtained by the descriptions of the second to fifth aspects and the corresponding implementations of the present application, reference is made to the beneficial effects obtained by the first aspect and the corresponding implementations of the present application, and details are not repeated.
Drawings
Fig. 1 is a schematic block diagram of a positioning apparatus provided in an embodiment of the present application;
Fig. 2 is a schematic block diagram of a system architecture to which the solutions of the embodiments of the present application are applicable;
Fig. 3 is a schematic flow chart of a positioning method according to an embodiment of the present application;
Fig. 4 is an example of an indoor acquisition route according to an embodiment of the present application;
Fig. 5 is an example of frames extracted from a video according to an embodiment of the present application;
Fig. 6 is an example of a visualization of an image database provided by an embodiment of the present application;
Fig. 7 is an example of retrieving a second image according to an embodiment of the present application;
Fig. 8 is a specific example of region clustering of multiple frames of second images;
Fig. 9 is an example of further deleting redundant images after deleting outlier images among the plurality of second images;
Fig. 10 is a specific example of pose solving provided by an embodiment of the present application;
Fig. 11 is a specific example of a real-time positioning result according to an embodiment of the present application;
Fig. 12 is a schematic flow chart of another positioning method provided by an embodiment of the present application;
Fig. 13 is a schematic block diagram of another positioning apparatus according to an embodiment of the present application;
Fig. 14 is a schematic diagram of another positioning apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
First, related terms to which the present application relates will be briefly described.
Pose: information indicating the position and orientation of an image, or of certain content in an image. The content may be an object, a person, a building, an animal, and the like, which is not limited in this application. As an example, the pose may have six degrees of freedom (6DoF): the position can be represented by coordinates (X, Y, Z) in Euclidean space, and the attitude by the rotation angles pitch, yaw, and roll.
Simultaneous localization and mapping (SLAM): starting from an unknown position in an unknown environment, a device localizes itself during movement according to position estimates and a map, while incrementally building the map on the basis of its own localization, thereby achieving autonomous localization and navigation. Through SLAM tracking and localization, the pose of the camera in a relative space (i.e., a relative coordinate system) can be obtained.
Fig. 1 is a schematic block diagram of a positioning apparatus 100 according to an embodiment of the present application. The apparatus 100 may be applied to a terminal, such as a mobile phone, a wearable device, a virtual reality (VR) device, an AR device, or a vehicle-mounted smart terminal, and may also be applied to a cloud, such as a server, which is not limited in this embodiment. As an example, the apparatus 100 may be used for visual positioning.
As shown in fig. 1, the apparatus 100 includes an image retrieval module 110 and a pose resolving module 120.
The image retrieving module 110 is configured to obtain a first image (which may also be referred to as an image to be located, or the like) shot by the first terminal, and determine, in the image database, at least one second image that matches the first image according to first information of the first image and first information of images in the image database.
The first information is used for indicating a global feature of the image, and may also be referred to as global feature information. The image database comprises first information of the multi-frame images, second information of the multi-frame images and poses of the multi-frame images in a world coordinate system. The second information is used to indicate a local feature of the image, and may also be referred to as local feature information.
In some embodiments, the at least one second image may constitute a set of images, which may be referred to as a set of similar images of the first image.
In the embodiment of the present application, the image database may also be referred to as an image pose library, an image pose feature library, a pose feature library, and the like, which are not limited herein.
The pose calculation module 120 may determine poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image. Then, the pose of the first image in the world coordinate system may be output according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
As a possible implementation manner, a mapping relationship (or may also be referred to as a transformation relationship) between the first local coordinate system and the world coordinate system may be determined according to the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. And then, according to the mapping relation, the pose of the first image in the first local coordinate system can be converted, the pose of the first image in the world coordinate system is determined, and the pose is output.
By way of example, the mapping relationship between the world coordinate system and the first local coordinate system may be described by a rotation matrix or a translation vector, which is not limited in this application.
In the embodiment of the present application, the world coordinate system may also be referred to as an absolute coordinate system. As an example, the world coordinate system may describe the position of a camera (or a terminal containing the camera) as a reference coordinate system in the environment, and may also be used to describe the position of any object in the environment.
As a possible implementation manner, the image database may include a plurality of mapping relationships, where one mapping relationship may indicate a correspondence between an identifier of one image, first information of the image, second information of the image, and a pose of the image in a world coordinate system, and the application is not limited thereto.
As an example, the pose calculating module 120 may obtain the pose of the at least one second image in the world coordinate system and the second information from the image database, which is not limited in this application.
In the embodiment of the application, the pose of the first image in the world coordinate system is the pose of the first terminal in the world coordinate system when the first terminal shoots the first image, and may also be referred to as the pose of the camera.
In some embodiments, the first image and the at least one second image may form an image set, which may be referred to as a local image set. The first local coordinate system, that is, the camera coordinate system of the first terminal, may be a relative coordinate system constructed by taking any frame image in the local image set as its origin. By way of example, the first local coordinate system may be constructed using SLAM, which is not limited in this application.
Therefore, the embodiment of the application positions the first image by determining, in the image database, at least one second image matching the first image shot by the first terminal, determining the poses of the first image and the at least one second image in the local coordinate system of the first terminal, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system. The embodiment can thus obtain the relation between the local coordinate system and the world coordinate system from the poses of a subset of images in both coordinate systems, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (containing global 3D point cloud features) and solve the pose based on feature point matching, the embodiment of the application does not use 3D point cloud features of images, but uses image poses for positioning. Building a 3D point cloud feature library of the environment depends on professionals and acquisition equipment, and is time-consuming, labor-intensive, and costly. Since the embodiment does not need to acquire 3D point cloud features, on the one hand, the image database can be built with low-cost equipment, such as a camera with low resolution and a small field angle (for example, a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of building the image database; on the other hand, the time for building the database can be shortened, which helps build the image database efficiently. Therefore, this scheme of positioning based on image poses and an image database is conducive to efficient, low-cost positioning.
Fig. 2 is a schematic block diagram of a system architecture 200 to which the solutions of the embodiments of the present application are applicable. As shown in fig. 2, the system architecture 200 may include a hardware abstraction layer data interface 210, a pose acquisition module 220, an application service 230, and a data processing module 240. The pose acquisition module 220 may include an image retrieval module 221 and a pose resolving module 222. The image retrieval module 221 may further include a feature extraction unit 2211 and an image retrieval unit 2212, and the pose resolving module 222 may further include a local pose resolving unit 2221 and a coordinate transformation unit 2222. The data processing module 240 may include an image database construction module 241 and a video 242.
It should be understood that fig. 2 shows modules or units of one system architecture suitable for the embodiments of the present application, but these modules or units are merely examples, and the embodiments of the present application may also include other parts or variations of the parts in fig. 2, or may not include all of the modules or units in fig. 2.
In fig. 2, the pose acquisition module 220 may serve as an example of the apparatus 100, the image retrieval module 221 may serve as an example of the image retrieval module 110, and the pose resolving module 222 may serve as an example of the pose resolving module 120, which is not limited in this application.
In some embodiments, the pose acquisition module 220 and the image database construction module 241 may exist in the form of binary software packages. In addition, the pose acquisition module 220 may be deployed in a framework layer of an operating system of the terminal, and provide positioning information for a service application in an application layer through an interface, for example, an Application Programming Interface (API).
It should be noted that the embodiments of the present application may be implemented based on hardware built into the terminal, such as a Global Positioning System (GPS) module/magnetometer, a gyroscope, a wireless fidelity (WiFi) chip, and a camera chip. The hardware drivers or data read/write modules corresponding to these chips can exchange data and control messages with the upper-layer positioning software through the hardware abstraction layer data interface 210 according to standard system interfaces.
In some embodiments, the positioning scheme provided by the embodiments of the present application can implement pose solving through a framework with an offline library construction stage and an online positioning stage.
In the offline library construction stage, the image database building module 241 may build an image database, for example, based on the video 242. As an example, the image database building module 241 may obtain, through a SLAM algorithm, the pose of each image frame in the video 242 in the local coordinate system of the camera, and fuse the positions of the video 242 in a map to obtain a sequence of image frames with poses in the world coordinate system. Further, the image database construction module 241 may also extract global features (which may also be referred to as global feature information, or first information) and local features (which may also be referred to as local feature information, or second information) of the image frames in the video 242. Here, the local coordinate system may be a coordinate system constructed by taking any one of the images in the video 242 as the origin of the second local coordinate system.
The video 242 may be referred to as a data source for constructing an image database, and may be a mobile phone video, for example.
As an example, the image database construction module 241 may output the image feature library after the image database is constructed. Alternatively, the image database may be stored in a memory, which is not limited in this application.
Therefore, when the image database is constructed, only the pose of each image in the world coordinate system and the global and local feature information of the image need to be acquired; 3D point cloud features that are highly consistent with the environment (such as global 3D point cloud features) do not need to be acquired. Thus, on the one hand, the image database can be built with low-cost equipment, for example a camera with low resolution and a small field angle (such as a mobile phone camera), without relying on professionals or professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of building it; on the other hand, the time to build the database can be shortened, which is conducive to building the image database efficiently. Therefore, positioning based on image poses and an image database is conducive to efficient, low-cost positioning.
In the on-line positioning stage, the image can be acquired in real time, and the pose of the image is acquired according to the image.
As an example, the hardware abstraction layer data interface 210 may be used to acquire image data acquired by a camera, such as to acquire an image to be positioned, a first image, and so on. Optionally, the hardware abstraction layer data interface 210 may also be used to obtain sensor signals (such as magnetometer, gyroscope parameters, etc.), WiFi chip parameters, without limitation. As an example, the hardware abstraction layer data interface 210 may obtain the above information from a standard API extracted from a hardware abstraction layer of an operating system of the terminal, which is not limited in this application.
After acquiring the first image, the hardware abstraction layer data interface 210 may send the first image to the pose acquisition module 220. After the pose acquisition module 220 acquires the first image, the feature extraction unit 2211 may extract global feature information of the first image, and the image retrieval unit 2212 may determine at least one second image matching the first image in the image database according to the global feature information of the first image and the global feature information of the images in the image database. As an example, the set of second images may be referred to as a similar image set.
The local pose calculation unit 2221 may determine the poses of the first image and the at least one second image in the local coordinate system according to the local feature information of the first image and the local feature information of the at least one second image. Here, the local coordinate system may be, for example, the above first local coordinate system, and specifically, refer to the description of the first local coordinate system, which is not described again. Thereafter, the coordinate transformation unit 2222 may determine a mapping relationship between the local coordinate system and the world coordinate system according to the pose of the at least one second image in the world coordinate system and the pose of the at least one second image in the local coordinate system, and then transform the pose of the first image in the local coordinate system according to the mapping relationship to obtain the pose of the first image in the world coordinate system.
In some optional embodiments, the pose acquisition module 220 may send the pose of the first image to the application service 230 after acquiring the pose. The application service 230 may utilize the pose to provide pose location services for the user.
In some embodiments, the application service 230 may also initiate a request for pose positioning service to the pose acquisition module 220. In response to the request, the pose acquisition module 220 may acquire an image from the hardware abstraction layer data interface 210 and position the image to acquire its pose in the world coordinate system.
By way of example, the application services 230 may be various types of location based services (LBS), AR/VR application services, and the like, including, for example and without limitation, applications requiring precise positioning, various e-commerce shopping applications, social communication applications, in-car applications, online-to-offline (O2O) door-to-door service applications, exhibition guide applications, family anti-loss applications, emergency rescue service applications, video entertainment applications, and gaming applications. A typical application scenario is, for example, AR navigation, in which navigation icons are superimposed on the real world captured by the camera of a terminal to provide more intuitive navigation information.
In the system architecture 200 shown in fig. 2, the pose acquisition module 220 may be implemented on a client (e.g., a terminal) to support various types of application services on the client. As an example, in response to a positioning service request initiated by the application service 230, the pose acquisition module 220, as the positioning software service of the system, starts to operate and acquires the image pose in real time on a smartphone using the positioning method provided in the embodiments of the present application.
As one possible implementation, the system architecture 200 may support a client-server mode. As an example, the data processing module 240 may be deployed on a server side (e.g., a cloud server), and the video data collected by a client (e.g., a terminal) may be uploaded to the server side through a network connection. After the server side completes the construction of the image database, the image database, or part of the data in it, can be downloaded to the client through the network connection.
As another possible implementation, the system architecture 200 may support a pure client mode, i.e., positioning can be processed directly offline. In this case, the data processing module 240 may be stored on the client, forming an offline pure client mode.
Fig. 3 shows a schematic flow chart of a positioning method 300 according to an embodiment of the present application. By way of example, the method 300 is described as including an offline library construction stage and an online positioning stage. The method 300 will be described below in conjunction with the system architecture 200 of fig. 2.
It should be understood that fig. 3 shows steps or operations of a positioning method, but these steps or operations are merely examples; other operations, or variations of the operations in fig. 3, may also be performed by embodiments of the present application. Moreover, the steps in fig. 3 may be performed in a different order than presented in fig. 3, and possibly not all of the operations in fig. 3 need to be performed.
As shown in fig. 3, method 300 may include steps 301 to 310. The offline library construction stage may include steps 301 to 304, and the online positioning stage may include steps 305 to 310.
Step 301, video acquisition. Here, the captured video is used as the input of the offline library construction stage.
As an example, video capture may be performed indoors. As a possible implementation manner, the collection route may be planned according to the indoor plan, for example, multiple collection routes may be planned, and the collection routes may be located on one floor or multiple floors, without limitation.
Fig. 4 shows an example of an indoor acquisition route. For example, the user may start from the start point of the route with the terminal (e.g., a mobile phone), continue shooting in the environment along the route, and end shooting to the end point of the route.
While the video is collected, the terminal can also acquire the position of the route in the map. For example, when starting to shoot a video, the user may manually start the positioning module in the terminal to acquire the position of the starting point of the route in the map, and when reaching the end point of the route, the user may manually start the positioning module to acquire the position of the end point of the route in the map. Or, when the video starts to be shot, the positioning module in the terminal may automatically acquire the position of the starting point of the route in the map, and when the ending point of the route is reached, the positioning module may automatically acquire the position of the ending point of the route in the map. Here, the position of the start point of the route in the map may be referred to as the position of the first frame image in the captured video in the map, and the position of the end point of the route in the map may be referred to as the position of the last frame image in the captured video in the map.
In the embodiment of the application, the position information of the starting point and/or the ending point in the map can be used for assisting calculation, for example, for calculating the pose of an image in a video in a world coordinate system. Illustratively, the positioning module may be at least one of a GPS/magnetometer, a gyroscope, a WiFi chip, and the like, for example.
It should be noted that, here, an indoor scene is taken as an example for description, and the scheme is also applicable to an outdoor scene, that is, in step 301, video acquisition may also be performed outdoors, for example, an outdoor acquisition route may be planned, which is not limited in this application.
Step 302, video image frame extraction.
As an example, while capturing the video with the terminal, the terminal may also automatically perform frame extraction on the video to obtain multiple frames of images. Fig. 5 shows an example of frames extracted from one video.
In some embodiments, when a client-server mode is employed, the terminal may upload the acquired video to the server. When the client mode is employed, the terminal may store the video and process the video.
Step 303, construct an image database.
As an example, step 303 may be performed by the image database construction module 241 in the data processing module 240 in fig. 2 described above. Referring to fig. 3, step 303 may further include steps 3031 and 3032.
Step 3031, SLAM path tracking.
As an example, for each frame of video image, the pose of the image in the local coordinate system of the camera (i.e., the terminal) can be acquired using the SLAM algorithm. Here, the local coordinate system is a relative coordinate system of the camera in the process of constructing the image database.
In other embodiments, an AR Engine (AREngine) algorithm may also be used to obtain the pose of each frame of image in the local coordinate system of the camera, which is not limited in this application.
After the pose of each frame of image in the local coordinate system of the camera is obtained, the image coordinate registration can be carried out by combining the position of the image in the map, and the pose of the image in the camera coordinate system is converted into the world coordinate system. For example, image coordinate registration may be performed in conjunction with the position of the start point of the video capture route in the map (i.e., the position of the first frame image in the video in the map), and/or the position of the end point of the video capture route in the map (i.e., the position of the last frame image in the video in the map).
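As an illustrative aid only, the following is a minimal sketch of one way such a registration could work, assuming the SLAM trajectory and the map are related by a 2D similarity transform (rotation, uniform scale, and translation in the ground plane) anchored by the known map positions of the route's start and end points. The patent does not prescribe this particular method, and all names are illustrative.

```python
# A minimal sketch (not the patent's prescribed algorithm) of registering a
# SLAM trajectory to the world frame using the map positions of the route's
# start and end points, assuming a 2D similarity alignment in the ground
# plane. Function and variable names are illustrative.
import numpy as np

def register_trajectory(local_xy, start_world, end_world):
    """Map local (x, y) SLAM positions into world coordinates.

    local_xy:    (N, 2) array of positions in the SLAM local frame
    start_world: (2,) map position of the first frame
    end_world:   (2,) map position of the last frame
    """
    v_local = local_xy[-1] - local_xy[0]   # route vector in the local frame
    v_world = end_world - start_world      # route vector in the world frame
    scale = np.linalg.norm(v_world) / np.linalg.norm(v_local)
    # Rotation angle between the two route vectors.
    ang = np.arctan2(v_world[1], v_world[0]) - np.arctan2(v_local[1], v_local[0])
    R = np.array([[np.cos(ang), -np.sin(ang)],
                  [np.sin(ang),  np.cos(ang)]])
    # Rotate and scale about the local start point, then translate to the
    # world start point.
    return scale * (local_xy - local_xy[0]) @ R.T + start_world
```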
Fig. 6 shows an example of a visualization of an image database. As shown in fig. 6, an acquisition route may be selected on the left side of the interface. When an acquisition route is selected, it may be presented on the right side of the interface, e.g., the location of the acquisition route in a map may be presented. As an example, fig. 6 shows 3 acquisition routes acquired on floor 4 of building A. For example, the name of acquisition route 1 may be 0a1c55c4-9303-46a5-b785-3ad77d80d809_20200415-.
Optionally, a folder in which the image data corresponding to the acquisition route is located may also be opened through a visual display interface, for example, the folder in which the image data corresponding to the selected acquisition route is located may be opened through an "open folder" button at the upper right corner in fig. 6.
Illustratively, after selecting the track tree entry building A - floor 4 - track - 0a1c55c4-9303-46a5-b785-3ad77d80d809_20200415-, the data shown in table 1 below may be displayed, where pos_x, pos_y, and pos_z represent position information, and quat_x, quat_y, quat_z, and quat_w represent the orientation as a quaternion, which can be converted into attitude information, e.g., yaw, roll, and pitch.
TABLE 1
FileName  pos_x   pos_y   pos_z    quat_x  quat_y   quat_z    quat_w
000001    1.8536  0.0189  76.6551  0.4322  -0.5769  -0.5329   0.4430
000002    2.8263  0.0043  76.6297  0.4190  -0.5702  -0.5573   0.4341
000003    3.3308  0.0002  76.5465  0.4543  -0.5904  -0.53556  0.3975
Optionally, the image frames, the global feature information, or the local feature information acquired on each acquisition route may also be displayed in the visualization result of the image database, which is not limited in the present application.
Step 3032, global/local feature extraction.
Specifically, the global feature information and the local feature information may be extracted for each frame of image collected by the terminal. The global feature information is image retrieval feature information, such as NetVLAD or BoW (bag of words), without limitation. Examples of the local feature information include, but are not limited to, scale-invariant feature transform (SIFT), ORB, SuperPoint, and D2-Net.
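As an illustrative aid, the following minimal sketch shows what per-frame extraction could look like with off-the-shelf OpenCV primitives. ORB stands in for the local features listed above, and a simple color histogram stands in for a learned global descriptor such as NetVLAD, which would require a trained network; all function names are illustrative.

```python
# A minimal sketch of per-frame feature extraction for the image database,
# using off-the-shelf OpenCV primitives. ORB stands in for the local features
# listed above; an L2-normalized color histogram stands in for a learned
# global descriptor such as NetVLAD, which would require a trained network.
import cv2
import numpy as np

def extract_features(image_bgr):
    """Return (global feature F_g, keypoints, local descriptors F_l)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Local features: ORB keypoints and binary descriptors.
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, local_desc = orb.detectAndCompute(gray, None)
    # Global feature: a coarse color histogram as a simple placeholder
    # for an image-retrieval descriptor.
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256]).flatten()
    global_desc = hist / (np.linalg.norm(hist) + 1e-12)
    return global_desc, keypoints, local_desc
```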
Step 304, the image database is output.
As an example, the image database may include, for each frame of image in the video captured along the planned route, the pose of the image in the world coordinate system, its global features, and its local features. As an example, an image in the image database may be represented as (R, T, F_l, F_g), where R represents position information, T represents angle information, F_l represents the local features, and F_g represents the global features.
Step 305, image acquisition. Here, the acquired image is the first image, and may also be referred to as an image to be positioned or the like, which is not limited in this application.
As an example, a user may take an image using a terminal. Accordingly, the hardware abstraction layer data interface 210 in the smart phone may acquire image data acquired by the camera and send the image data to the pose acquisition module 220.
It should be noted that the terminal in step 305 and the terminal in step 301 may be the same terminal or different terminals, and this application is not limited thereto.
Optionally, step 306, initial position acquisition.
As an example, when the image is acquired in step 305, the hardware abstraction layer data interface 210 may further acquire positioning information acquired by a WiFi chip or a GPS module and send the positioning information to the pose acquisition module 220. The positioning information may be referred to as an initial position of the first image.
It should be noted that the position provided by the WiFi chip or the GPS module is a "coarse positioning" position, which has a low accuracy and an error of, for example, 3 to 10 meters. This position may be used as the initial position of the first image.
In addition, the hardware abstraction layer data interface 210 may further send information of the gyroscope and the magnetometer to the pose acquisition module 220, so that the pose acquisition module 220 may jointly determine the initial position of the first image according to the positioning information acquired by the WiFi chip or the GPS module and the acquisition information of the gyroscope and the magnetometer.
Optionally, step 307, floor identification. As an example, when the initial position of the first image acquired in step 306 includes altitude information, a floor corresponding to the first image may be identified based on the initial position.
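As an illustrative aid, the following is a minimal sketch of how altitude could map to a floor index, assuming a known reference altitude for floor 1 and a uniform storey height; both constants are hypothetical and the patent does not prescribe this formula.

```python
# A minimal sketch of floor identification from the altitude component of the
# initial position, assuming a known reference altitude for floor 1 and a
# uniform storey height. Both constants below are illustrative assumptions.
FLOOR1_ALTITUDE_M = 76.0  # hypothetical altitude of floor 1 in this building
STOREY_HEIGHT_M = 3.2     # hypothetical storey height

def identify_floor(altitude_m: float) -> int:
    """Map an altitude (meters) to a floor number, floor 1 at the reference."""
    return round((altitude_m - FLOOR1_ALTITUDE_M) / STOREY_HEIGHT_M) + 1
```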
And step 308, retrieving the image.
As an example, the image retrieval module 221 in fig. 2 may determine at least one second image matching the first image in the image database based on the global feature information of the first image acquired in step 305 and the global feature information of the images in the image database. In the embodiment of the present application, determining at least one second image matching the first image in the image database may also be described as retrieving at least one second image matching the first image from the image database.
As an example, the set of at least one second image may be referred to as a similar image set.
In some possible implementations, the image retrieval module 221 may perform image retrieval based on the first image and the initial position acquired in step 306.
In some possible implementations, the image retrieval module 221 may perform image retrieval based on the first image, the initial location obtained in step 306, and the floor obtained in step 307.
Continuing with FIG. 3, as an example, step 308 may include the following steps 3081 to 3083. As an example, step 3081 may be performed by the feature extraction unit 2211, and steps 3082 and 3083 may be performed by the image retrieval unit 2212.
Step 3081, global feature extraction.
As an example, global feature information of the first image may be extracted using a deep learning algorithm. For example, a NetVLAD layer can be added after the convolutional neural network framework to extract a global descriptor (an example of global feature information). The global feature of the image is an overall attribute of the image, and may include, for example and without limitation, a color feature, a texture feature, a shape feature, and the like.
Step 3082, image preliminary retrieval.
As a possible implementation manner, at least one second image matching the first image may be retrieved from the image database according to the global feature information obtained in step 3081, so as to obtain a similar image set.
As another possible implementation manner, at least one image may be determined in the image database according to the initial position obtained in step 306, wherein a distance between the position of each image in the at least one image and the initial position of the first image is smaller than a preset threshold. Then, at least one second image matching the first image is retrieved from the at least one image, resulting in a set of similar images.
As a specific example, a region (which may be referred to as a buffer, for example) may be constructed based on the initial position of the first image. For example, a circular buffer may be constructed centered on the initial position of the first image with a radius of 5 meters (or 10 meters, or 30 meters, without limitation). Then, at least one second image matching the first image is retrieved from the images in the image database that are located in the buffer, obtaining the similar image set. As an example, the set of images located within the buffer may be referred to as the image candidate set.
In this way, in the embodiment of the application, at least one image is first determined in the image database according to the first position of the first image, and the first information of the first image is then matched only against the first information of the at least one image to obtain the at least one second image matching the first image. Because the first information of the first image does not need to be matched against the first information of all images in the image database, the amount of calculation on the terminal is reduced, the matching time is shortened, and the efficiency of obtaining the second image is improved.
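A minimal sketch of constructing this candidate set follows; the 10 m radius is one of the example values mentioned above, and the names and array shapes are illustrative assumptions.

```python
# A minimal sketch of constructing the candidate set: restrict retrieval to
# database images whose world position lies inside a circular buffer around
# the coarse initial position of the first image.
import numpy as np

def candidate_set(db_positions, initial_pos, radius_m=10.0):
    """db_positions: (N, 2) or (N, 3) world positions of database images;
    initial_pos: coarse GPS/WiFi position of the first image (same dim)."""
    dist = np.linalg.norm(db_positions - np.asarray(initial_pos), axis=1)
    return np.flatnonzero(dist < radius_m)
```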
As a possible implementation manner, the similarity between the global feature information of the first image and the global feature information of the images in the image database or the image candidate set may be calculated, and the second image may be determined according to the similarity.
For example, the similarity may be calculated by the following formula (1):
similarity(X, Y) = ( Σ_{i=1}^{n} x_i·y_i ) / ( √(Σ_{i=1}^{n} x_i²) · √(Σ_{i=1}^{n} y_i²) )    (1)

where similarity(X, Y) represents the similarity between X and Y, X represents the global feature information of the first image, Y represents the global feature information of an image to be matched in the feature library, x_i represents each component in the feature code of X, y_i represents each component in the feature code of Y, and n represents the total number of components in one feature code, n being a positive integer.
After the similarity between each frame of image and the first image is obtained, the calculated similarities may be sorted, and the top m images with the highest similarity may be taken as the second images, where m is an integer greater than 1. As one example, m may be 20. Alternatively, in some other implementations, images whose similarity with the first image is greater than a preset threshold may be used as the second images.
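A minimal sketch of this retrieval step follows, assuming formula (1) is the cosine similarity between global feature vectors as reconstructed above; the names and shapes are illustrative.

```python
# A minimal sketch of the retrieval step above: score every candidate against
# the first image with cosine similarity (formula (1)) and keep the top m
# frames as the second images.
import numpy as np

def top_m_matches(query_desc, candidate_descs, m=20):
    """query_desc: (n,) global feature of the first image;
    candidate_descs: (N, n) global features of the candidate images."""
    q = query_desc / np.linalg.norm(query_desc)
    c = candidate_descs / np.linalg.norm(candidate_descs, axis=1, keepdims=True)
    sims = c @ q                   # cosine similarity per candidate
    order = np.argsort(-sims)[:m]  # indices of the m most similar frames
    return order, sims[order]
```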
Fig. 7 shows an example of retrieving the second image. As shown in fig. 7, "×" represents the global feature information of the image at the cluster center of one cluster region (which may be denoted as C_k^VLAD), and the first image falls within the cluster region. All images in the cluster region may be retrieved to acquire the second image. A further marker represents the global feature information of the first image, "o" represents the global feature information of the image closest to the first image in the cluster region (which may be referred to as the nearest-neighbor image), and "●" represents the global feature information of the image second closest to the first image in the buffer (which may be referred to as the second-nearest image). When the ratio of the distance between the nearest-neighbor image and the first image to the distance between the second-nearest image and the first image is greater than a threshold, the nearest-neighbor image matches the first image. Then, according to this method, 20 images that can match the first image are determined in the buffer in sequence as the second images.
Optionally, step 3083, pose-aware image refinement.
As an example, when the number of the acquired second images is plural, the plural second images may be further subjected to image refinement. For example, the plurality of second images may be refined based on their poses in the world coordinate system.
As one possible implementation, outlier images in the second images (i.e., the similar image set) may be deleted based on the poses of the second images in the world coordinate system. Here, an outlier image refers to an image whose distance from the cluster center of the plurality of second images is greater than a preset threshold, where the cluster center is determined according to the positions of the plurality of second images in the world coordinate system.
For example, a position of each second image in the world coordinate system may be obtained from the image database, and according to the position, a cluster center of the plurality of second images is determined, and an image having a distance from the cluster center greater than a preset threshold value is determined as an outlier image in the plurality of second images.
It should be noted that the process of determining the images whose distance from the cluster center is greater than the second threshold (i.e., determining the outlier images) is in effect a process of determining whether any image whose distance from the cluster center is greater than the second threshold (i.e., any outlier image) exists among the plurality of second images. As one possible determination result, no outlier image exists in the plurality of second images. As another possible determination result, at least one frame of outlier image exists in the plurality of second images.
In some embodiments, when there is no outlier image in the plurality of second images, the operation of deleting the image may not be performed.
Since images at different positions may have very similar textures, errors may occur in the similarity calculation, which may result in outlier images in the similar image set. Therefore, by eliminating the outlier images, errors in the similar image set can be removed, and a more accurate similar image set can be obtained. Here, acquiring a more accurate similar image set helps make the subsequently acquired poses of the second images in the first local coordinate system more accurate, thereby helping to improve the accuracy of the pose of the first image in the world coordinate system.
Fig. 8 shows one specific example of region clustering of the plurality of frames of second images. Clustering calculation is performed on the plurality of frames of second images in the similar image set, showing that most of the second images are gathered in region 1 and region 2, and only a small number of second images are scattered in region 3 and region 4. At this time, the second images corresponding to the cluster points in region 3 and region 4 may be deleted, and the second images corresponding to the cluster points in region 1 and region 2 may be used as the second images in the updated similar image set.
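A minimal sketch of this outlier deletion follows. The patent leaves the clustering method open; taking the cluster center as the mean of the second images' world positions, as well as the threshold value, are assumptions for illustration.

```python
# A minimal sketch of the outlier deletion illustrated in fig. 8: take the
# cluster center as the mean of the second images' world positions and drop
# frames farther from it than a threshold. The mean-center form and the
# threshold value are illustrative assumptions.
import numpy as np

def drop_outliers(world_positions, threshold_m=10.0):
    """world_positions: (N, 3) world positions of the second images.
    Returns the indices of the frames that are kept."""
    center = world_positions.mean(axis=0)
    dist = np.linalg.norm(world_positions - center, axis=1)
    return np.flatnonzero(dist <= threshold_m)
```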
As another possible implementation, redundant images in the second images (i.e., the similar image set) may be deleted based on the poses of the second images in the world coordinate system. As an example, the pose of each second image may be obtained from the image database, and then images whose angle is smaller than a preset angle threshold and/or whose distance is smaller than a preset distance threshold may be determined as redundant images among the plurality of second images according to the poses. Here, the angle is the difference between the orientations of the poses of at least two images in the world coordinate system, and the distance is the difference between the positions of the poses of at least two images in the world coordinate system.
As an example, the angle may be a difference of an elevation angle (pitch), a yaw angle (yaw), or a roll angle (roll) of the poses of the at least two images in the world coordinate system.
It should be noted that the process of determining the images whose angle is smaller than the preset angle threshold and/or whose distance is smaller than the preset distance threshold (i.e., determining the redundant images) is in effect a process of determining whether any such image (i.e., any redundant image) exists among the plurality of second images. As one possible determination result, no redundant image exists in the plurality of second images. As another possible determination result, at least one frame of redundant image exists in the plurality of second images.
In some embodiments, when there is no redundant picture in the plurality of second pictures, the operation of deleting the picture may not be performed.
Because images whose angle is smaller than the preset threshold and/or whose distance is smaller than the preset threshold have a high degree of overlap in their spatial distribution, redundant images exist in the similar image set. Therefore, by deleting the images whose angle is smaller than the preset value and/or whose distance is smaller than the preset value from the similar image set, the second images retained in the similar image set have a moderate degree of overlap and can uniformly cover the surrounding space, which is favorable for obtaining a more refined similar image set. Here, acquiring a more refined similar image set helps reduce the amount of calculation for the subsequently acquired poses of the second images in the first local coordinate system, thereby helping to improve the efficiency of acquiring the pose of the first image in the world coordinate system.
In some embodiments, after the outlier images are deleted from the plurality of second images, the redundant images may further be deleted. Fig. 9 shows an example of further deleting redundant images after the outlier images have been deleted from the plurality of second images, where the position of a circle represents the position of a second image and the arrow represents its pose. As a specific example, the angle threshold may be set to 15° and the distance threshold to 0.5 m. Accordingly, when the difference between the orientations (i.e., the angle) of two or more of the second images is less than 15° and/or the distance between their positions is less than 0.5 m, those second images can be considered redundant images (e.g., corresponding to the white circles in fig. 9). Redundant images among the second images are thus removed through angle and position filtering. At this time, the angles between the remaining second images are greater than or equal to 15° and/or their position distances are greater than or equal to 0.5 m, so that they can uniformly cover the surrounding space with a moderate degree of overlap.
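A minimal sketch of this angle/distance filter follows, with the example thresholds above (15°, 0.5 m). A frame is treated as redundant when it is within both thresholds of an already-kept frame, so every kept pair differs by at least one threshold; yaw stands in for the angle difference, which the patent allows to be pitch, yaw, or roll, and all names are illustrative.

```python
# A minimal sketch of the angle/distance redundancy filter with the example
# thresholds above (15 degrees, 0.5 m). A frame is redundant when it is within
# both thresholds of an already-kept frame.
import numpy as np

def drop_redundant(positions, yaws_deg, angle_thr=15.0, dist_thr=0.5):
    """positions: (N, 3) world positions; yaws_deg: (N,) yaw angles in degrees.
    Returns the indices of the frames that are kept."""
    kept = []
    for i in range(len(positions)):
        for j in kept:
            # Wrapped angular difference in [-180, 180) degrees.
            d_ang = abs((yaws_deg[i] - yaws_deg[j] + 180.0) % 360.0 - 180.0)
            d_pos = np.linalg.norm(positions[i] - positions[j])
            if d_ang < angle_thr and d_pos < dist_thr:
                break  # frame i is redundant with kept frame j
        else:
            kept.append(i)
    return kept
```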
As a specific example, through steps 3082 and 3083, the 20 frames of second images in the original similar image set can be reduced to 8 frames of second images. Illustratively, the 8 frames of images may be respectively represented as (R_1, T_1), (R_2, T_2), (R_3, T_3), (R_4, T_4), (R_5, T_5), (R_6, T_6), (R_7, T_7), and (R_8, T_8). At this time, the first image may be represented as (R_q, T_q).
With continued reference to fig. 3, after the image retrieval module 221 acquires the set of similar images, the first image and the at least one second image may be sent to the pose solution module 222.
Step 309, camera pose solving.
Illustratively, after the pose solution module 222 acquires the first image and the at least one second image, a camera pose solution may be performed. Referring to fig. 3, step 309 may further include steps 3091 and 3092.
Step 3091, local pose solving.
As an example, the local pose solving unit 2221 in the pose solving module 222 may perform local pose solving based on the first image and the at least one second image.
Specifically, the local pose calculating unit 2221 may determine the poses of the first image and the at least one second image in the local coordinate system of the first terminal according to the local feature information of the first image and the local feature information of the at least one second image. For example, the local pose calculating unit 2221 may use the first image and the at least one second image as a local image set, and obtain the relative pose of each image in the local image set in the local coordinate system according to the local feature information of each image in the local image set. Here, the local coordinate system may be a relative coordinate system of a camera constructed with any one of the images in the local image set as an origin of the local coordinate system. The local feature of the image is a local expression of the image feature, and can reflect the local characteristic of the image.
As a possible implementation manner, the local pose calculating unit 2221 may perform matching on each image in the local image set according to the local feature information in the local image set, to obtain the pose of each image in the local image set in the local coordinate system, and the point cloud information of each image mapped to the three-dimensional space. Here, the pose of each image in the local image set in the local coordinate system and the point cloud information mapped into the three-dimensional space may be referred to as local structure information of the local image set. In some embodiments, the process of acquiring the local structure information of the local image set may be referred to as a process of recovering the local structure information of each image in the local image set.
In the embodiment of the application, the local structure information of the images in the local image set is recovered, which, on one hand, avoids constructing a global 3D point cloud feature library and thus saves library construction cost, and, on the other hand, avoids errors caused by aligning a large number of images with a global feature library, so that a more accurate pose of each image in the local image set can be obtained.
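As an illustrative aid, the following is a minimal two-view sketch of recovering relative pose from matched local features with OpenCV. The patent's local pose solving runs an incremental SFM over the whole local image set; this pared-down version recovers only the relative pose between one image pair, assumes a known camera intrinsic matrix K, and uses illustrative names throughout.

```python
# A minimal two-view sketch of local pose recovery from matched ORB features.
# Full local structure recovery in the patent is an incremental SFM over the
# whole local image set; this shows only one image pair.
import cv2
import numpy as np

def relative_pose(kps_a, desc_a, kps_b, desc_b, K):
    """Estimate the pose of image b relative to image a from ORB matches."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)
    pts_a = np.float32([kps_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kps_b[m.trainIdx].pt for m in matches])
    # Essential matrix with RANSAC, then cheirality check to recover R, t.
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t  # rotation and unit-scale translation of b w.r.t. a
```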
Step 3092, coordinate transformation.
For example, the coordinate transformation unit 2222 may acquire the pose of the second image in the world coordinate system in the similar image set, and determine the mapping relationship (i.e., transformation relationship) between the local coordinate system and the world coordinate system according to the pose of the second image in the local coordinate system acquired in step 3091 above. And then, according to the mapping relation, performing coordinate transformation on the pose of the first image in the local coordinate system to obtain the pose of the first image in the world coordinate system.
As a possible implementation manner, the coordinate conversion unit 2222 may acquire the pose of the second image in the world coordinate system from the image database output in step 304.
Fig. 10 shows one specific example of performing the pose solving. Panel (a) represents the local image set input to the pose solving, including the n frames of second images in the similar image set, represented as (R_1, T_1) … (R_n, T_n), and the first image, represented as (R_q, T_q). As an example, n is 8 in fig. 10, but the present application is not limited thereto.
Meanwhile, the poses of the n frames of second images in the similar image set in the world coordinate system, together with their local feature information, are also input to the pose solving. As an example, the poses of the n frames of second images in the world coordinate system and their local feature information may be obtained from the image database. Panel (e) shows an example of the image trajectory distribution of the 8 frames of second images in the world coordinate system, in which the 8 boxes constituting the image trajectory respectively correspond to the poses of the 8 frames of second images.
As an example, for the input local image set, the SFM (structure from motion) algorithm may be used to estimate the pose of each image in the local image set in the local coordinate system.
First, the image feature points of the first image in the local image set may be extracted to obtain the local feature information of the first image. Then, the local feature information of the first image is matched with the local feature information of the second images. Panel (b) of fig. 10 illustrates a specific example of three feature points matched across three frames of images.
Incremental camera parameter solving may then be performed on the matched feature points between the images in the local image set. By way of example, panel (c) of fig. 10 illustrates an example of the incremental camera parameters corresponding to three images in a local image set, where the upper image in panel (c) contains 7 feature points. The lower-left image in panel (c) is one frame of image in the local image set and contains 4 feature points (for example, the left 4 of the above 7 feature points); the incremental parameter of this left image is P_0 = K[I|0], where K denotes the camera intrinsic matrix, I denotes the initial orientation (0, 0, 0, 1), and 0 denotes the initial position (0, 0, 0). That is, the local coordinate system of the local image set may be constructed with the left image as the origin. The lower-middle image in panel (c) is another frame of image in the local image set and contains 4 of the feature points (for example, the first 4 of the 7 feature points); the incremental parameter of the middle image is P_1 = K[R_1|t_1], where R_1 denotes the orientation of the middle image and t_1 denotes the position of the middle image. The lower-right image in panel (c) is a further frame of image in the local image set and contains 4 of the feature points (for example, the right 4 of the 7 feature points); the incremental parameter of the right image is P_i = K[R_i|t_i], where R_i denotes the orientation of the right image and t_i denotes the position of the right image.
Then, the pose of each image in the local image set in the local coordinate system and the 3D point cloud are acquired according to the obtained incremental camera parameters. Panel (d) illustrates a specific example of the image trajectory distribution and 3D point cloud of the 9 frames of images in the local coordinate system, in which the 9 boxes constituting the image trajectory respectively correspond to the poses of the 9 frames of images: the black box represents the pose of the first image in the local coordinate system, and the white boxes represent the poses of the 8 frames of second images in the local coordinate system.
Thereafter, the poses (or image trajectory distribution) of the images in the local image set in the local coordinate system and the poses (or image trajectory distribution) of the second images in the similar image set in the world coordinate system may be input to the coordinate conversion unit 2222, and coordinate conversion may be performed by the coordinate conversion unit 2222. Panel (f) illustrates a specific example of the coordinate transformation: the mapping relationship between the local coordinate system and the world coordinate system is obtained according to the poses of the 8 frames of second images in the world coordinate system and their poses in the local coordinate system. As an example, the mapping relationship may be a similarity transformation matrix (R_i, t_i, α_i) from the local coordinate system to the world coordinate system. Further, the pose of the first image in the local coordinate system is multiplied by the similarity transformation matrix (R_i, t_i, α_i) to obtain the pose of the first image in the world coordinate system.
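A minimal sketch of this coordinate transformation step follows, fitting a similarity transform (scale s, rotation R, translation t), in the spirit of the (R_i, t_i, α_i) matrix above, from the second images' local positions to their known world positions using the closed-form Umeyama method, and then mapping the first image's local position into the world frame. Orientation transfer and refinement are omitted, and all names are illustrative.

```python
# A minimal sketch of fitting a similarity transform from local to world
# coordinates (Umeyama-style closed form) from paired second-image positions.
import numpy as np

def fit_similarity(local_pts, world_pts):
    """local_pts, world_pts: (N, 3) paired positions. Returns (s, R, t)."""
    mu_l, mu_w = local_pts.mean(axis=0), world_pts.mean(axis=0)
    L, W = local_pts - mu_l, world_pts - mu_w
    U, S, Vt = np.linalg.svd(W.T @ L)  # unscaled cross-covariance
    # Sign correction so R is a proper rotation (no reflection).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (L ** 2).sum()
    t = mu_w - s * R @ mu_l
    return s, R, t

# Applying it to the first image: p_world = s * R @ p_local + t
```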
Step 310, output the positioning result.
Specifically, after the pose of the first image in the world coordinate system is obtained, the pose may be output, that is, the positioning result of the first image is output, so that the positioning of the first image is completed.
Fig. 11 shows a specific example of a real-time positioning result. The real-scene image shows the current position of the terminal, the small circle in the lower map represents the visualization result of the current position in the two-dimensional map, and the arrow represents the direction information.
Therefore, the embodiment of the application realizes the positioning of the first image by determining at least one second image matched with the first image shot by the first terminal and the pose of the first image and the at least one second image in the local coordinate system of the first terminal in the image database, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. Therefore, the embodiment of the application can obtain the relation between the local coordinate system and the world coordinate system based on the pose of a part of images in the local coordinate system and the pose in the world coordinate system, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Based on the above positioning scheme, the embodiment of the application can make full use of the existing terminal device, for example, a user-level receiver device (such as a mobile phone) or a navigation-level receiver device (such as a vehicle), to perform image acquisition, image feature extraction, and robust image retrieval and matching, so as to implement positioning (e.g., 6DoF positioning). In the image retrieval process, on the basis of image retrieval according to global feature information, the method and the device can further consider the spatial distribution of the retrieved images, construct a dual-threshold image refinement algorithm by combining the pose, delete the outlier images and the redundant images (also called secondary retrieval), and obtain a refined similar image set. For the local image set, the SFM algorithm can be executed, the relative pose of the images in the local image set in the local coordinate system is recovered based on the local structure information, and the relative pose of the images to be positioned is converted into the world coordinate system by combining the poses of the images in the pose feature library in the world coordinate system, so that the solution of the camera pose is realized.
The method and the device can solve the problems that a point cloud database in LBS/AR/VR application service requirements is difficult to construct and cannot be popularized on a large scale. Specifically, the solution of indoor and outdoor positioning without depending on a 3D point cloud feature library is provided in the embodiments of the present application. Compared with a positioning scheme needing to construct a point cloud database, the method and the device have the advantages of high positioning accuracy and low cost, and break the high-cost barrier of traditional global 3D point cloud feature library construction. In addition, the embodiment of the application does not need to align each image to construct a global 3D point cloud database, but only needs to acquire the pose of the image and extract the global features and the local features of the image to construct an image pose database. In the on-line positioning stage of the embodiment of the application, firstly, the image to be positioned is searched in the database, then the local structure information of the searched image and the image to be positioned is recovered, the image to be positioned and the relative pose of the searched image in the relative coordinate system are obtained, finally, the pose feature library is combined to realize the conversion from the relative coordinate system to the world coordinate system, and the solution of the camera pose is realized.
According to the above embodiments, an example of the positioning accuracy of the present application at place A (e.g., Nanjing) and place B (e.g., Xi'an) is shown in table 2 below. The position error is at the centimeter level, the proportion of angle positioning accuracy within 3° is 48.1% to 67.7%, and the proportion within 10° is more than 90%.
TABLE 2
         <1°    <3°     <10°    X(m)    Y(m)    Z(m)
Place A  10%    48.1%   91.9%   0.17    0.01    0.13
Place B  7.5%   67.7%   93.7%   -0.01   0.004   -0.019
The reason these technical effects can be achieved is that the image pose library (one model), and the pose-aware image retrieval algorithm and the camera pose solving algorithm based on local structure information (two algorithms) provided by the application have a solid theoretical and practical basis. The theoretical basis is that, based on the global feature information and the position of the image to be positioned, uniformly distributed image data sets can be retrieved from the image database, providing stable input data for the SFM to recover local structure information, so that the retrieved images participate in reconstruction together with the image to be positioned as much as possible without generating redundant information. Then, by combining the global prior information provided by the image pose library, the transformation relationship (i.e., mapping relationship) from the local coordinate system to the global coordinate system can be obtained, realizing accurate pose calculation. The practical basis is that a terminal (such as a mobile phone) serves as an easily obtained image acquisition device and, combined with the easily operated acquisition mode of capturing videos, can greatly accelerate data acquisition and simplify the acquisition process and labor cost.
Fig. 12 shows a schematic flow chart of a method 1200 for positioning according to an embodiment of the present application. The method 1200 can be applied to a terminal or a cloud. As an example, method 1200 may be performed by device 100 of the positioning in fig. 1, or by system 200 in fig. 2. As shown in fig. 12, the method 1200 includes steps 1210 through 1240.
1210, a first image captured by a first terminal is acquired.
And 1220, determining at least one second image matched with the first image in the image database according to the first information of the first image and the first information of the images in the image database, wherein the first information is used for indicating the global features of the images, and the image database comprises the first information of multi-frame images, the second information of the multi-frame images and the poses of the multi-frame images in a world coordinate system, wherein the second information is used for indicating the local features of the images.
1230, determining the poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image.
1240, outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system.
Therefore, the embodiment of the application realizes the positioning of the first image by determining at least one second image matched with the first image shot by the first terminal and the pose of the first image and the at least one second image in the local coordinate system of the first terminal in the image database, and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system. Therefore, the embodiment of the application can obtain the relation between the local coordinate system and the world coordinate system based on the pose of a part of images in the local coordinate system and the pose in the world coordinate system, so that the image to be positioned (such as the first image) is positioned in the world coordinate system without acquiring a 3D point cloud for each image.
Compared with existing schemes that construct a 3D point cloud feature library of the environment (including global 3D point cloud features) and realize pose calculation based on feature point matching, the embodiment of the application does not use 3D point cloud features of images but uses the poses of images for positioning. Constructing a 3D point cloud feature library of the environment depends on professionals and professional acquisition equipment, is time- and labor-consuming, and is costly. Because the 3D point cloud features do not need to be acquired, on one hand, low-cost equipment, such as a camera with low resolution and a small field angle (e.g., a mobile phone camera), can be used to construct the image database without depending on professionals and professional equipment, and the data volume of the image database can be reduced, thereby reducing the cost of constructing the image database; on the other hand, the time for constructing the database can be shortened, which is favorable for efficiently constructing the image database. Therefore, the scheme of positioning based on image poses and the image database is favorable for high-efficiency, low-cost positioning.
In some possible implementations, the method 1200 further includes establishing an image database.
In some possible implementation manners, a video shot by the second terminal may be acquired, and a pose of the multi-frame image in the video in the second local coordinate system of the second terminal may be acquired. Then, the pose of the multi-frame image in the video in the world coordinate system can be determined according to the third information and the pose of the multi-frame image in the video in the second local coordinate system. Wherein the third information is used to indicate a position of at least a portion of the image in the video in a map, the map being associated with the world coordinate system. After that, the first information and the second information of the multi-frame image in the video may be acquired. In this way, the establishment of the image database can be achieved.
In some possible implementations, the method 1200 may further include obtaining a first location of the first image and determining at least one image in the image database according to the first location of the first image, where the distance between the position of each image of the at least one image and the first location of the first image is less than a first threshold, the first location coming from a GPS module or a WiFi module of the first terminal.
As a specific implementation manner of matching the first information of the first image with the first information of the images in the image database and determining at least one second image matching the first image in the image database, the first information of the first image may be matched with the first information of the at least one image, and the at least one second image matching the first image is obtained from the at least one image.
In some possible implementations, a plurality of third images matching the first image may be determined in the image database according to the first information of the first image and the first information of the images in the image database, and then, an image of the plurality of third images whose distance from a cluster center of the plurality of third images is greater than a second threshold may be deleted to obtain the at least one second image. Wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
Here, an image whose distance from the cluster center of the plurality of third images is greater than the second threshold value may be referred to as an outlier. That is, the third image may include the outlier image and the at least one second image. In some possible descriptions, the third image may be described as a plurality of second images in the image database that match the first image and have no outlier images deleted.
In some possible implementations, a plurality of fourth images matching the first image may be determined in the image database according to the first information of the first image and the first information of the images in the image database, and then, an image with an angle smaller than a third threshold and/or a distance smaller than a fourth threshold in the plurality of fourth images may be deleted to obtain the at least one second image. The angle is the difference value of the poses of the at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of the at least two images in the world coordinate system. As an example, the angle may be a difference of an elevation angle (pitch), a yaw angle (yaw), or a roll angle (roll) of the poses of the at least two images in the world coordinate system.
Here, images having an angle smaller than the third threshold and/or a distance smaller than the fourth threshold may be referred to as redundant images. That is, the fourth picture may include the redundant picture and the at least one second picture. In some possible descriptions, the fourth image may be described as a plurality of second images in the image database that match the first image and have no redundant images deleted.
Specifically, all relevant contents of the steps involved in the method 1200 for positioning shown in fig. 12 may refer to the relevant functions of the modules in fig. 1 or fig. 2, or the description of the method 300 for positioning shown in fig. 3, and are not described again here.
The method for positioning provided by the embodiment of the present application is described in detail above with reference to fig. 1 to 12, and the apparatus for positioning provided by the embodiment of the present application is described below with reference to fig. 13 and 14. It should be understood that the positioning apparatus in fig. 13 and 14 can perform each step in the positioning method in the embodiment of the present application, and in order to avoid redundancy, the redundant description is appropriately omitted when the positioning apparatus in fig. 13 and 14 is described below.
Fig. 13 is a schematic block diagram of an apparatus 1300 for positioning of an embodiment of the present application. The apparatus 1300 includes an acquisition unit 1310, a processing unit 1320, and an output unit 1330.
Specifically, when the positioning apparatus 1300 performs the positioning method, the obtaining unit 1310 is configured to obtain a first image captured by the first terminal.
A processing unit 1320, configured to determine, in the image database, at least one second image matching the first image according to the first information of the first image and the first information of the images in the image database, where the first information is used to indicate a global feature of the image, and the image database includes the first information of the multi-frame image, the second information of the multi-frame image, and a pose of the multi-frame image in a world coordinate system, where the second information is used to indicate a local feature of the image.
The processing unit 1320 is further configured to determine the poses of the first image and the at least one second image in the first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image.
An output unit 1330 configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
In some possible implementations, the apparatus 1300 further includes a building unit configured to build the image database.
In some possible implementations, the establishing unit is specifically configured to:
acquiring a video shot by a second terminal; acquiring the pose of a multi-frame image in the video under a second local coordinate system of the second terminal; determining the pose of the multi-frame image in the video in a world coordinate system according to third information and the pose of the multi-frame image in the video in the second local coordinate system, wherein the third information is used for indicating the position of at least part of the image in a map, and the map is associated with the world coordinate system; and acquiring the first information and the second information of a plurality of frames of images in the video.
In some possible implementations, the obtaining unit 1310 is further configured to obtain a first location of the first image, where the first location is from a GPS module or a WiFi module of the first terminal.
The acquisition unit may receive data sent by the GPS module or the WiFi module, such as the first location.
Optionally, the obtaining unit may further send a request message to the GPS module or the WiFi module, where the request message is used to request the GPS module or the WiFi module to send the first position of the first image to the obtaining unit. In response to the request message, the GPS module or the WiFi module may send the first position to the obtaining unit.
The processing unit 1320 is further configured to determine at least one image in the image database according to the first position of the first image, where a distance between the position of each of the at least one image and the first position of the first image is smaller than a first threshold.
The processing unit 1320 is further configured to match the first information of the first image with the first information of the at least one image, and obtain at least one second image matching the first image in the at least one image.
In some possible implementations, the processing unit 1320 is specifically configured to:
determining a plurality of third images matched with the first image in an image database according to the first information of the first image and the first information of the images in the image database; deleting the images, of the plurality of third images, of which the distance from the cluster center of the plurality of third images is greater than a second threshold value, to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
In some possible implementations, the processing unit 1320 is specifically configured to:
determining a plurality of fourth images matched with the first image in an image database according to the first information of the first image and the first information of the images in the image database; and deleting the images with the angle smaller than the third threshold and/or the distance smaller than the fourth threshold from the plurality of fourth images to obtain the at least one second image. The angle is the difference value of the poses of the at least two images in the world coordinate system, and the distance is the difference value of the positions of the poses of the at least two images in the world coordinate system.
Specifically, all relevant contents (for example, implementation examples or technical effects) of the units involved in the positioning apparatus 1300 shown in fig. 13 may refer to the relevant functions of the respective modules in fig. 1 or fig. 2 above, or the relevant description of the positioning method 300 shown in fig. 3, and are not described again here.
Fig. 14 is a schematic structural diagram of an apparatus 1400 for positioning according to an embodiment of the present application. As shown in fig. 14, the device 1400 includes a communication module 1410, sensors 1420, a user input module 1430, an output module 1440, a processor 1450, an audio-visual input module 1460, a memory 1470, and a power supply 1480.
The communication module 1410 may include at least one module that enables communication between the apparatus 1400 and other apparatuses (e.g., other computer systems or mobile terminals). For example, the communication module 1410 may include one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless internet module, a local area communication module, and a location (or position) information module, etc. The various modules are implemented in various ways in the prior art, and are not described in the application.
The sensors 1420 may sense the current state of the device 1400, such as location, presence or absence of contact with a user, direction, acceleration/deceleration, and the like. For example, the sensor 1420 may transmit the sensed current state of the device 1400 to the GPS module or the WiFi module.
The user input module 1430 is configured to receive input of digital information, character information, or contact touch operation/non-contact gesture, and to receive signal input related to user setting and function control of the apparatus. User input module 1430 includes a touch panel and/or other input devices.
The output module 1440 includes a display panel for displaying information input by a user, information provided to the user, various menu interfaces of the system, and the like. Alternatively, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. In other embodiments, the touch panel can be overlaid on the display panel to form a touch display screen. In addition, the output module 1440 may further include an audio output module, an alarm, a haptic module, and the like.
For example, the output module 1440 may implement the related functions of the output unit 1330 in the apparatus 1300, for example, may be used to output the pose of the first image in the world coordinate system. As a specific example, the display panel of the output module 1440 may display the visualization result of the current location of the terminal in a map.
The audio/video input module 1460 is used for inputting audio signals or video signals. The audio/video input module 1460 may include a camera and a microphone. For example, the camera may be used to capture the first image, and the present application is not limited thereto.
A power supply 1480 may receive external power and internal power under the control of the processor 1450 and provide power required for operation of the various components of the system.
Processor 1450 may refer to one or more processors, for example, processor 1450 may include one or more central processors, or include a central processor and a graphics processor, or include an application processor and a coprocessor (e.g., a microcontroller unit). When processor 1450 includes multiple processors, the multiple processors may be integrated on the same chip or may each be separate chips. A processor may include one or more physical cores, where a physical core is the smallest processing module.
Illustratively, the processor 1450 may implement the functionality of the processing unit 1320 in the apparatus 1300. Optionally, the processor 1450 may also implement the functions of the building units in the apparatus 1300, which is not limited in this application.
For example, the processor 1450 may also be used to implement the function of the obtaining unit 1310 in the apparatus 1300, such as obtaining a first image captured by the terminal from a camera unit (e.g., a video camera), and/or obtaining a first position of the first image from a GPS module or a WiFi module, and the like, which is not limited in this application.
The memory 1470 stores computer programs including an operating system program 1472 and application programs 1471, among others. Typical operating systems include those used in desktop or notebook computers, such as Windows from Microsoft Corporation and macOS from Apple Inc., and those used in mobile terminals, such as the Android system developed by Google Inc. The methods provided by the foregoing embodiments may be implemented in software, which may be considered a specific implementation of the application programs 1471 and/or the operating system program 1472.
Memory 1470 may be one or more of the following types: flash (flash) memory, hard disk type memory, micro multimedia card type memory, card type memory (e.g., SD or XD memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disk. In other embodiments, the memory 1470 may also be a network storage device on the internet, and the system may perform operations such as updating or reading the memory 1470 on the internet.
The processor 1450 serves to read the computer programs in the memory 1470 and then execute computer program-defined methods, such as the processor 1450 reads the operating system program 1472 to run an operating system on the system and implement various functions of the operating system, or reads one or more application programs 1471 to run applications on the system.
The memory 1470 also stores other data 1473 than computer programs, such as an image database and the like referred to in this application.
The connection relationship of the modules in fig. 14 is only an example, and the method provided by any embodiment of the present application may also be applied to devices located in other connection manners, for example, all modules are connected through a bus.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
The embodiments of the present application also provide a computer-readable medium on which a computer program is stored; when the computer program is executed by a computer, the steps of the positioning method in any of the above embodiments are implemented.
The embodiments of the present application further provide a computer program product; when the computer program product is executed by a computer, the steps of the positioning method in any of the above embodiments are implemented.
In addition, various aspects or features of the present application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or magnetic strips), optical disks (e.g., compact disks (CDs) and digital versatile disks (DVDs)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives). In addition, various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
In the embodiments provided in the present application, there is no fixed chronological relationship among the steps; each step may be taken as a scheme on its own, or may be combined with one or more other steps to form a scheme, which is not limited in this application.
The embodiments in the present application may be used independently or jointly; for example, one or more steps from different embodiments may be combined to form a separate embodiment, which is not limited herein.
It should be understood that, in the above illustrated embodiments, "first" and "second" are only used to distinguish different objects for convenience and should not constitute any limitation to the present application.
It should also be understood that, in the embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
It should also be understood that "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one" means one or more; "at least one of A and B", similar to "A and/or B", describes an association relationship between associated objects and indicates that three relationships may exist; for example, at least one of A and B may indicate: A exists alone, both A and B exist, or B exists alone.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of positioning, comprising:
acquiring a first image shot by a first terminal;
determining, in an image database, at least one second image matched with the first image according to first information of the first image and first information of images in the image database, wherein the first information is used for indicating a global feature of an image, the image database comprises first information of multiple frames of images, second information of the multiple frames of images, and poses of the multiple frames of images in a world coordinate system, and the second information is used for indicating local features of an image;
determining the poses of the first image and the at least one second image in a first local coordinate system of the first terminal according to the second information of the first image and the second information of the at least one second image;
and outputting the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system and the pose of the at least one second image in the world coordinate system.
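By way of illustration, the last step of claim 1 can be sketched in a few lines of Python. The sketch below is a minimal, non-authoritative example, not the claimed implementation: it assumes every pose is a 4x4 homogeneous matrix mapping camera coordinates into the named frame, and it combines the per-match estimates by a naive average.

import numpy as np

def pose_of_first_in_world(T_local_first, T_local_seconds, T_world_seconds):
    # For each matched second image, the transform from the first local
    # coordinate system to the world coordinate system is
    #     T_world_local = T_world_second @ inv(T_local_second);
    # applying it to the first image's local pose gives one estimate of
    # the first image's pose in the world coordinate system.
    estimates = [T_ws @ np.linalg.inv(T_ls) @ T_local_first
                 for T_ls, T_ws in zip(T_local_seconds, T_world_seconds)]
    # Naive average of the estimates; the rotation block is projected back
    # onto a valid rotation with an SVD (illustrative only).
    T = np.mean(estimates, axis=0)
    U, _, Vt = np.linalg.svd(T[:3, :3])
    T[:3, :3] = U @ Vt
    return T

In practice a robust combination, for example discarding estimates far from the consensus, would replace the naive mean.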
2. The method of claim 1, further comprising:
and establishing the image database.
3. The method of claim 2, wherein the establishing the image database comprises:
acquiring a video shot by a second terminal;
acquiring poses of multiple frames of images in the video in a second local coordinate system of the second terminal;
determining poses of the multiple frames of images in the video in the world coordinate system according to third information and the poses of the multiple frames of images in the video in the second local coordinate system, wherein the third information is used for indicating positions of at least some of the images in a map, and the map is associated with the world coordinate system;
and acquiring the first information and the second information of the multiple frames of images in the video.
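To make the flow of claim 3 concrete, the following Python sketch outlines one possible database-building pipeline. It is illustrative only; the four callables passed in are hypothetical stand-ins (they are not named in this application), and the poses are assumed to be 4x4 numpy matrices.

def build_image_database(video_frames, third_info, local_pose_of,
                         align_to_world, extract_global, extract_local):
    # Hypothetical callables:
    #   local_pose_of(frame)  -> frame pose in the second local coordinate
    #                            system, e.g. from the terminal's SLAM
    #   align_to_world(poses, third_info) -> 4x4 local-to-world transform
    #                            solved from the map positions that the
    #                            third information indicates
    #   extract_global(frame) -> global feature (first information)
    #   extract_local(frame)  -> local features (second information)
    local_poses = [local_pose_of(f) for f in video_frames]
    T_world_local = align_to_world(local_poses, third_info)
    return [{
        "first_info": extract_global(f),
        "second_info": extract_local(f),
        "pose_world": T_world_local @ T_local,  # pose in the world frame
    } for f, T_local in zip(video_frames, local_poses)]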
4. The method according to any one of claims 1-3, further comprising:
acquiring a first position of the first image, wherein the first position is from a Global Positioning System (GPS) module or a wireless fidelity (WiFi) module of the first terminal;
determining at least one image in the image database according to the first position of the first image, wherein the distance between the position of each image in the at least one image and the first position of the first image is less than a first threshold value;
wherein the determining, in the image database, at least one second image that matches the first image according to the first information of the first image and the first information of the images in the image database comprises:
matching the first information of the first image with the first information of the at least one image, and acquiring, in the at least one image, at least one second image matched with the first image.
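The position-based screening of claim 4 admits a very direct sketch. The fragment below is illustrative only; it assumes each database entry stores a (latitude, longitude) pair under the hypothetical key "position", that the first threshold is given in metres, and that the haversine great-circle distance is an acceptable metric.

import math

def prefilter_by_position(first_pos, database, first_threshold_m):
    def haversine(p, q):
        # great-circle distance between two (lat, lon) pairs in metres
        R = 6371000.0  # mean Earth radius
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2)
             * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * R * math.asin(math.sqrt(a))
    # keep only images close enough to the first position, so that
    # global-feature matching runs on a small candidate set
    return [img for img in database
            if haversine(first_pos, img["position"]) < first_threshold_m]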
5. The method of any of claims 1-4, wherein determining, in the image database, at least one second image that matches the first image based on the first information of the first image and first information of images in an image database comprises:
determining, in the image database, a plurality of third images matched with the first image according to the first information of the first image and the first information of the images in the image database;
deleting, from the plurality of third images, images whose distance from a cluster center of the plurality of third images is greater than a second threshold, to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
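The outlier removal of claim 5 can be illustrated as follows. The sketch takes the cluster center to be the mean of the matched images' world positions, which is one way of determining it from those positions; the key "pose_world" and the 4x4 pose layout are assumptions carried over from the sketches above.

import numpy as np

def filter_by_cluster(third_images, second_threshold):
    # world position = translation column of the 4x4 world pose
    positions = np.array([img["pose_world"][:3, 3] for img in third_images])
    center = positions.mean(axis=0)  # cluster center as the mean position
    keep = np.linalg.norm(positions - center, axis=1) <= second_threshold
    return [img for img, ok in zip(third_images, keep) if ok]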
6. The method of any one of claims 1-5, wherein determining, in the image database, at least one second image that matches the first image based on the first information of the first image and first information of images in an image database comprises:
determining, in the image database, a plurality of fourth images matched with the first image according to the first information of the first image and the first information of the images in the image database;
deleting, from the plurality of fourth images, images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold, to obtain the at least one second image, wherein the angle is a difference between the orientations of the poses of at least two of the images in the world coordinate system, and the distance is a difference between the positions of the poses of at least two of the images in the world coordinate system.
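One possible reading of claim 6 is that matches too close to one another in viewpoint are redundant, so near-duplicate views are dropped greedily. The sketch below illustrates that reading only; it tests both thresholds jointly, while the claim also allows either alone, and it reuses the assumed "pose_world" layout. The angle between two poses is recovered from the trace of their relative rotation.

import numpy as np

def filter_similar_views(fourth_images, third_thresh_deg, fourth_thresh):
    kept = []
    for img in fourth_images:
        T = img["pose_world"]
        redundant = False
        for ref in kept:
            Tr = ref["pose_world"]
            # rotation angle between the two orientations
            R_rel = Tr[:3, :3].T @ T[:3, :3]
            cos_a = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
            angle = np.degrees(np.arccos(cos_a))
            # positional distance between the two poses
            dist = np.linalg.norm(T[:3, 3] - Tr[:3, 3])
            if angle < third_thresh_deg and dist < fourth_thresh:
                redundant = True
                break
        if not redundant:
            kept.append(img)
    return kept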
7. An apparatus for positioning, comprising:
an acquisition unit, configured to acquire a first image shot by a first terminal;
a processing unit, configured to determine, in an image database, at least one second image matched with the first image according to first information of the first image and first information of images in the image database, wherein the first information is used for indicating a global feature of an image, the image database comprises first information of multiple frames of images, second information of the multiple frames of images, and poses of the multiple frames of images in a world coordinate system, and the second information is used for indicating local features of an image;
the processing unit is further configured to determine, according to the second information of the first image and the second information of the at least one second image, poses of the first image and the at least one second image in a first local coordinate system of the first terminal;
an output unit configured to output the pose of the first image in the world coordinate system according to the pose of the first image in the first local coordinate system, the pose of the at least one second image in the first local coordinate system, and the pose of the at least one second image in the world coordinate system.
8. The apparatus according to claim 7, further comprising a building unit configured to build the image database.
9. The apparatus according to claim 8, wherein the establishing unit is specifically configured to:
acquire a video shot by a second terminal;
acquire poses of multiple frames of images in the video in a second local coordinate system of the second terminal;
determine poses of the multiple frames of images in the video in the world coordinate system according to third information and the poses of the multiple frames of images in the video in the second local coordinate system, wherein the third information is used for indicating positions of at least some of the images in a map, and the map is associated with the world coordinate system;
and acquire the first information and the second information of the multiple frames of images in the video.
10. The apparatus according to any one of claims 7 to 9,
the acquisition unit is further configured to acquire a first position of the first image, where the first position is from a Global Positioning System (GPS) module or a wireless fidelity (WiFi) module of the first terminal;
the processing unit is further configured to determine at least one image in the image database according to the first position of the first image, wherein a distance between the position of each image of the at least one image and the first position of the first image is smaller than a first threshold;
the processing unit is further configured to match the first information of the first image with the first information of the at least one image, and acquire at least one second image matched with the first image in the at least one image.
11. The apparatus according to any one of claims 7 to 10, wherein the processing unit is specifically configured to:
determine, in the image database, a plurality of third images matched with the first image according to the first information of the first image and the first information of the images in the image database;
delete, from the plurality of third images, images whose distance from a cluster center of the plurality of third images is greater than a second threshold, to obtain the at least one second image, wherein the cluster center is determined according to the positions of the plurality of third images in the world coordinate system.
12. The apparatus according to any one of claims 7 to 11, wherein the processing unit is specifically configured to:
determine, in the image database, a plurality of fourth images matched with the first image according to the first information of the first image and the first information of the images in the image database;
delete, from the plurality of fourth images, images whose angle is smaller than a third threshold and/or whose distance is smaller than a fourth threshold, to obtain the at least one second image, wherein the angle is a difference between the orientations of the poses of at least two of the images in the world coordinate system, and the distance is a difference between the positions of the poses of at least two of the images in the world coordinate system.
13. A terminal device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
CN202011271315.1A 2020-11-13 2020-11-13 Positioning method and device Pending CN114565663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011271315.1A CN114565663A (en) 2020-11-13 2020-11-13 Positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011271315.1A CN114565663A (en) 2020-11-13 2020-11-13 Positioning method and device

Publications (1)

Publication Number Publication Date
CN114565663A true CN114565663A (en) 2022-05-31

Family

ID=81712068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011271315.1A Pending CN114565663A (en) 2020-11-13 2020-11-13 Positioning method and device

Country Status (1)

Country Link
CN (1) CN114565663A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704821A (en) * 2017-09-29 2018-02-16 河北工业大学 A kind of vehicle pose computational methods of bend
CN109029442A (en) * 2018-06-07 2018-12-18 武汉理工大学 Based on the matched positioning device of multi-angle of view and method
CN111724438A (en) * 2019-03-18 2020-09-29 阿里巴巴集团控股有限公司 Data processing method and device
CN110705574A (en) * 2019-09-27 2020-01-17 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN111046125A (en) * 2019-12-16 2020-04-21 视辰信息科技(上海)有限公司 Visual positioning method, system and computer readable storage medium

Similar Documents

Publication Publication Date Title
US11729245B2 (en) Platform for constructing and consuming realm and object feature clouds
US11113882B2 (en) Generating immersive trip photograph visualizations
US9558559B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
CN107133325B (en) Internet photo geographic space positioning method based on street view map
US9269196B1 (en) Photo-image-based 3D modeling system on a mobile device
Chen et al. Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing
CN111046125A (en) Visual positioning method, system and computer readable storage medium
JP5799521B2 (en) Information processing apparatus, authoring method, and program
Braud et al. Scaling-up ar: University campus as a physical-digital metaverse
CN112750203B (en) Model reconstruction method, device, equipment and storage medium
KR20160003553A (en) Electroninc device for providing map information
KR101545138B1 (en) Method for Providing Advertisement by Using Augmented Reality, System, Apparatus, Server And Terminal Therefor
CN103761539B (en) Indoor locating method based on environment characteristic objects
CN110926478B (en) AR navigation route deviation rectifying method and system and computer readable storage medium
Bao et al. Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations
CN113298871A (en) Map generation method, positioning method, system thereof, and computer-readable storage medium
US9188444B2 (en) 3D object positioning in street view
Brata et al. An Enhancement of Outdoor Location-Based Augmented Reality Anchor Precision through VSLAM and Google Street View
CN115468568A (en) Indoor navigation method, device and system, server equipment and storage medium
CN114565663A (en) Positioning method and device
CN112700546A (en) System and method for constructing outdoor large-scale three-dimensional map
CN108062786B (en) Comprehensive perception positioning technology application system based on three-dimensional information model
Porzi et al. An automatic image-to-DEM alignment approach for annotating mountains pictures on a smartphone
Chen et al. To Know Where We Are: Vision-Based Positioning in Outdoor Environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination