CN114814872A - Pose determination method and device, electronic equipment and storage medium - Google Patents

Pose determination method and device, electronic equipment and storage medium

Info

Publication number
CN114814872A
Authority
CN
China
Prior art keywords
data
global
map
terminal
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210363072.7A
Other languages
Chinese (zh)
Inventor
刘浩敏
杭蒙
张壮
章国锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd filed Critical Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202210363072.7A
Publication of CN114814872A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G01 — MEASURING; TESTING
    • G01S — RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 — Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/88 — Lidar systems specially adapted for specific applications
    • G01S 17/89 — Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S 17/86 — Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01C — MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 — Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 — Navigation by using measurements of speed or acceleration
    • G01C 21/12 — Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 — Navigation by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 — Inertial navigation combined with non-inertial navigation instruments
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/70 — Determining position or orientation of objects or cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The disclosure relates to a pose determination method and apparatus, an electronic device, and a storage medium. The method includes: acquiring collected data acquired by a first terminal in a target scene; acquiring a global map containing the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data acquisition on a global scene containing the target scene, and the global map satisfies a precision condition; and determining at least one first pose of the first terminal in the acquisition process according to a feature correspondence between the collected data and the global map. Through this process, a large amount of first pose data can be collected at scale, the need for additional equipment deployed in the target scene or additional calibration and synchronization among multiple devices is reduced, and the first pose data also has high precision.

Description

Pose determination method and device, electronic equipment and storage medium
The present application is a divisional application of the Chinese patent application entitled "Pose determination method and device, electronic equipment and storage medium", filed with the China National Intellectual Property Administration on August 17, 2020 under application number 202010826704.X.
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a pose determination method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of mobile sensors, network infrastructure, and cloud computing, the scale of augmented reality application scenarios is expanding from small environments to large ones. Localization in large-scale environments is a key requirement for augmented reality applications. Most commonly used positioning techniques require a large amount of motion truth data (such as pose data of a device during movement) and background information in order to perform algorithm benchmarking or model training. Therefore, how to obtain a large amount of motion truth data at low cost is a problem to be solved.
Disclosure of Invention
The disclosure provides a pose determination technical scheme.
According to an aspect of the present disclosure, there is provided a pose determination method including:
acquiring collected data acquired by a first terminal in a target scene; acquiring a global map containing the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data acquisition on a global scene containing the target scene, and the global map satisfies a precision condition; and determining at least one first pose of the first terminal in the acquisition process according to a feature correspondence between the collected data and the global map.
In one possible implementation, the global map includes at least one frame of visual point cloud, the visual point cloud including at least one three-dimensional feature point in the global scene; the collected data includes a first captured image; and the determining at least one first pose of the first terminal in the acquisition process according to the feature correspondence between the collected data and the global map includes: performing feature matching on the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result.
In one possible implementation, the global map includes at least one frame of visual point cloud in the target scene; the collected data includes at least two frames of first captured images; and the determining at least one first pose of the first terminal in the acquisition process according to the feature correspondence between the collected data and the global map includes: performing feature matching on the first captured images and the at least one frame of visual point cloud to obtain a global feature matching result; performing feature matching among the at least two frames of first captured images to obtain a local feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
In one possible implementation, the collected data further includes first inertial measurement unit (IMU) data; and the determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result includes: acquiring first constraint information according to the global feature matching result and/or the local feature matching result; acquiring second constraint information according to the first IMU data; and processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information to obtain at least one first pose of the first terminal in the acquisition process.
In a possible implementation manner, the processing the global feature matching result and the local feature matching result includes: processing the global feature matching result and the local feature matching result through bundle adjustment.
In a possible implementation manner, the performing feature matching on the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result includes: matching two-dimensional feature points in the first captured image with three-dimensional feature points included in the at least one frame of visual point cloud to obtain the global feature matching result.
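For illustration only, the following Python sketch shows one way such 2D-3D matching and pose recovery could be realized; it is not the patent's prescribed implementation, and the descriptor matcher, the ratio test, and the use of RANSAC PnP are assumptions:

```python
import numpy as np
import cv2

def estimate_pose_from_map(kp_2d, desc_2d, pts_3d, desc_3d, K):
    """Match 2D image features against 3D map features and solve a PnP problem.

    kp_2d:   (N, 2) pixel coordinates of keypoints in the first captured image
    desc_2d: (N, D) descriptors of those keypoints
    pts_3d:  (M, 3) coordinates of 3D feature points in the visual point cloud
    desc_3d: (M, D) descriptors stored with the 3D points
    K:       (3, 3) camera intrinsic matrix of the first terminal
    """
    # Global feature matching: nearest-neighbour descriptor matching with a ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(desc_2d.astype(np.float32),
                               desc_3d.astype(np.float32), k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
    if len(good) < 6:
        return None  # not enough 2D-3D correspondences

    obj_pts = np.float32([pts_3d[m.trainIdx] for m in good])
    img_pts = np.float32([kp_2d[m.queryIdx] for m in good])

    # Recover one first pose (rotation + translation) from the 2D-3D matches.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None,
                                                 reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # transform taking map coordinates into the camera frame
```

In the terms used above, the descriptor matching step produces the global feature matching result, and the PnP solution corresponds to one first pose of the first terminal.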
In one possible implementation, the method further includes: and determining motion truth value data according to at least one first pose of the first terminal in the acquisition process.
In a possible implementation manner, the determining motion truth value data according to at least one first pose of the first terminal in the acquisition process includes: taking at least one first pose of the first terminal in the acquisition process as the motion truth value data; and/or taking at least one item of the collected data together with at least one first pose of the first terminal in the acquisition process as the motion truth value data, wherein the collected data includes one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, the first captured image, and the first IMU data.
In one possible implementation, the motion truth data is used for at least one of the following operations: judging the precision of the positioning result, training the neural network and carrying out information fusion with the global map.
In one possible implementation, the map data includes: laser point clouds, a second acquired image and second IMU data in the global scene; the method further comprises the following steps: acquiring map data of the global scene acquired by a second terminal; and according to the map data, performing off-line reconstruction on the global scene to generate a global map of the global scene.
In a possible implementation manner, the reconstructing the global scene offline according to the map data to generate the global map of the global scene includes: determining at least one second pose of the second terminal in the data acquisition process according to the second IMU data and the laser point cloud; performing visual map reconstruction on the global scene by combining the second collected image according to the at least one second pose to obtain at least one frame of visual point cloud, wherein the visual point cloud comprises at least one three-dimensional feature point in the global scene; and obtaining a global map of the global scene according to the at least one frame of visual point cloud.
In a possible implementation manner, the performing, according to the at least one second pose and in combination with the second captured image, a visual map reconstruction on the global scene to obtain at least one frame of visual point cloud includes: according to the at least one second pose and the second collected image, performing visual map reconstruction on the global scene to obtain at least one frame of initial visual point cloud; acquiring third constraint information in the process of reconstructing the visual map according to the laser point cloud and/or the second acquired image; and optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud.
In one possible implementation manner, the second terminal includes: the radar is used for acquiring laser point clouds in the global scene; a vision sensor for acquiring a second captured image in the global scene; and the IMU sensor is used for acquiring second IMU data in the global scene.
In a possible implementation manner, before the performing offline reconstruction on the global scene according to the map data to generate the global map of the global scene, the method further includes: calibrating the coordinate transformation relation between the vision sensor and the IMU sensor to obtain a first calibration result; calibrating the coordinate transformation relation between the radar and the vision sensor to obtain a second calibration result; and performing joint calibration of the coordinate transformation relations among the vision sensor, the IMU sensor and the radar according to the first calibration result and the second calibration result.
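Purely as an illustrative sketch (not part of the patent), the two pairwise calibration results could be chained into a consistent camera-IMU-radar extrinsic set as follows, assuming 4x4 homogeneous transforms and the hypothetical naming convention T_a_b for "the transform taking points from frame b to frame a":

```python
import numpy as np

def invert_se3(T):
    """Invert a 4x4 homogeneous rigid-body transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def joint_extrinsics(T_cam_imu, T_radar_cam):
    """Chain the two pairwise calibration results into a full extrinsic set.

    T_cam_imu:   first calibration result (IMU frame -> camera frame)
    T_radar_cam: second calibration result (camera frame -> radar frame)
    """
    T_radar_imu = T_radar_cam @ T_cam_imu  # IMU frame -> radar frame
    return {
        "T_cam_imu": T_cam_imu,
        "T_radar_cam": T_radar_cam,
        "T_radar_imu": T_radar_imu,
        "T_imu_radar": invert_se3(T_radar_imu),
    }
```

A joint calibration as described above would typically further refine these chained extrinsics rather than simply compose them; the sketch only shows the bookkeeping.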
In one possible implementation, the method further includes: in the process of collecting map data by the second terminal, reconstructing the global scene in real time according to the map data to generate a real-time map of the global scene; and sending the map data and/or the real-time map to a target device, wherein the target device is used to display the geographic range over which data acquisition of the global scene has been completed.
According to an aspect of the present disclosure, there is provided a pose determination apparatus including:
a collected data acquisition module, configured to acquire the collected data acquired by a first terminal in a target scene; a global map acquisition module, configured to acquire a global map containing the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data acquisition on a global scene containing the target scene, and the global map satisfies a precision condition; and a pose determination module, configured to determine at least one first pose of the first terminal in the acquisition process according to a feature correspondence between the collected data and the global map.
In one possible implementation, the global map includes at least one frame of visual point cloud including at least one three-dimensional feature point in the global scene; the acquisition data comprises a first acquisition image; the pose determination module is to: performing feature matching on the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result.
In one possible implementation, the global map includes at least one frame of a visual point cloud in the target scene; the acquired data comprises at least two frames of first acquired images; the pose determination module is to: performing feature matching on the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result; performing feature matching according to the at least two frames of first collected images to obtain a local feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
In one possible implementation, the collected data further includes first inertial measurement IMU data; the pose determination module is further to: acquiring first constraint information according to the global feature matching result and/or the local feature matching result; acquiring second constraint information according to the first IMU data; and processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information to obtain at least one first pose of the first terminal in the acquisition process.
In one possible implementation, the pose determination module is further configured to: process the global feature matching result and the local feature matching result through bundle adjustment.
In one possible implementation, the pose determination module is further configured to: and matching the two-dimensional feature points in the first acquired image with the three-dimensional feature points included in the at least one frame of visual point cloud to obtain a global feature matching result.
In one possible implementation, the apparatus further includes: and the motion truth value data acquisition module is used for determining motion truth value data according to at least one first pose of the first terminal in the acquisition process.
In one possible implementation manner, the motion truth value data acquisition module is configured to: take at least one first pose of the first terminal in the acquisition process as the motion truth value data; and/or take at least one item of the collected data together with at least one first pose of the first terminal in the acquisition process as the motion truth value data, wherein the collected data includes one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, the first captured image, and the first IMU data.
In one possible implementation, the motion truth data is used for at least one of the following operations: judging the precision of the positioning result, training the neural network and carrying out information fusion with the global map.
In one possible implementation, the map data includes: laser point cloud, a second acquired image and second IMU data in the global scene; the device further comprises: the map data acquisition module is used for acquiring the map data of the global scene acquired by the second terminal; and the global map generation module is used for performing off-line reconstruction on the global scene according to the map data to generate a global map of the global scene.
In one possible implementation, the global map generation module is configured to: determining at least one second pose of the second terminal in the data acquisition process according to the second IMU data and the laser point cloud; performing visual map reconstruction on the global scene by combining the second collected image according to the at least one second pose to obtain at least one frame of visual point cloud, wherein the visual point cloud comprises at least one three-dimensional feature point in the global scene; and obtaining a global map of the global scene according to the at least one frame of visual point cloud.
In one possible implementation, the global map generation module is further configured to: according to the at least one second pose and the second collected image, performing visual map reconstruction on the global scene to obtain at least one frame of initial visual point cloud; acquiring third constraint information in the process of reconstructing the visual map according to the laser point cloud and/or the second acquired image; and optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud.
In one possible implementation manner, the second terminal includes: the radar is used for acquiring laser point clouds in the global scene; a vision sensor for acquiring a second captured image in the global scene; and the IMU sensor is used for acquiring second IMU data in the global scene.
In one possible implementation, the apparatus is further configured to: calibrate the coordinate transformation relation between the vision sensor and the IMU sensor to obtain a first calibration result; calibrate the coordinate transformation relation between the radar and the vision sensor to obtain a second calibration result; and perform joint calibration of the coordinate transformation relations among the vision sensor, the IMU sensor and the radar according to the first calibration result and the second calibration result.
In one possible implementation, the apparatus is further configured to: in the process of collecting map data by a second terminal, reconstructing the global scene in real time according to the map data to generate a real-time map of the global scene; and sending the map data and/or the real-time map to target equipment, wherein the target equipment is used for displaying the geographic range of completing data acquisition of the global scene.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the pose determination method described above.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above pose determination method.
In the embodiment of the disclosure, the collected data acquired by the first terminal in the target scene is acquired, the global map containing the target scene is acquired, and at least one first pose of the first terminal in the acquisition process is determined according to the feature correspondence between the collected data and the global map. Through this process, the global map of the global scene can be reused: after the global map is generated, a large amount of first pose data can be collected at scale through the first terminal. The collection of the data used to generate the first poses is simple and can be completed by the first terminal alone, which reduces the need for additional equipment deployed in the target scene and for additional calibration and synchronization among multiple devices. In addition, because the global map satisfies the precision condition, the first pose data obtained based on the feature correspondence between the collected data and the global map also has high precision.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a pose determination method according to an embodiment of the present disclosure.
Fig. 2 shows a comparison schematic before and after visual point cloud optimization according to an embodiment of the disclosure.
Fig. 3 shows a schematic structural diagram of a second terminal according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of an application example according to the present disclosure.
Fig. 5 shows a block diagram of a pose determination apparatus according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Fig. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of a pose determination method according to an embodiment of the present disclosure. The method may be applied to a pose determination apparatus, and the pose determination apparatus may be a terminal device, a server, or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the pose determination method may be implemented by a processor invoking computer readable instructions stored in a memory.
As shown in fig. 1, the pose determination method may include:
and step S11, acquiring data acquired by the first terminal in the target scene.
Step S12, a global map including the target scene is obtained, where the global map is generated based on map data obtained by the second terminal performing data acquisition on the global scene including the target scene, and the global map satisfies a precision condition.
And step S13, determining at least one first pose of the first terminal in the acquisition process according to the characteristic corresponding relation between the acquired data and the global map.
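The three steps can be summarized by a minimal Python driver; this is an illustrative sketch only, reusing the hypothetical estimate_pose_from_map helper sketched earlier and an assumed layout for the collected data and the global map:

```python
def determine_first_poses(collected_frames, global_map, K):
    """Schematic form of steps S11-S13.

    collected_frames: iterable of (keypoints, descriptors) pairs extracted from
                      the first captured images of the first terminal (step S11)
    global_map:       object holding the 3D feature points and descriptors of the
                      global scene, assumed to satisfy the precision condition (step S12)
    K:                camera intrinsic matrix of the first terminal
    """
    first_poses = []
    for kp_2d, desc_2d in collected_frames:
        # Step S13: exploit the feature correspondence between the collected
        # data and the global map to recover the pose of this frame.
        pose = estimate_pose_from_map(kp_2d, desc_2d,
                                      global_map.points_3d,
                                      global_map.descriptors, K)
        if pose is not None:
            first_poses.append(pose)
    return first_poses
```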
The target scene may be any scene where the first terminal acquires the acquired data, and the implementation form of the target scene may be flexibly determined according to actual needs, which is not limited in the embodiment of the present disclosure. In one possible implementation, the target scene may include an outdoor scene, such as a square, a street, or an open place; in one possible implementation, the target scene may include an indoor scene, such as a classroom, an office building, a residential building, or the like; in one possible implementation, the target scene may include both an outdoor scene and an indoor scene, etc.
The first terminal may be a mobile terminal having a data acquisition function, an implementation manner of the first terminal is not limited in the embodiment of the present disclosure, and any device having a mobile and data acquisition function may be used as an implementation manner of the first terminal. In one possible implementation, the first terminal may be an Augmented Reality (AR) device, such as a mobile phone or AR glasses.
The collected data may be data collected by the first terminal in a target scene, and an implementation form of the collected data and data content included in the collected data may be flexibly determined according to a data collection manner of the first terminal and an actual implementation form of the first terminal, and is not limited to the following disclosure embodiments. In a possible implementation manner, in a case that the first terminal is an AR device, the acquired data may include a first acquired image obtained by the first terminal performing image acquisition on the target scene, and the like; in a possible implementation manner, in a case that the first terminal is an AR device, the collected data may further include first Inertial Measurement Unit (IMU) data obtained by collecting target scene data by an IMU in the first terminal, and the like.
The method for acquiring the collected data by the first terminal is not limited in the embodiment of the present disclosure and may be flexibly selected according to the actual conditions of the first terminal and the target scene; for example, the first terminal may collect data while moving in the target scene, and the specific moving process and moving mode may be flexibly selected according to actual conditions.
In step S11, the manner of acquiring the collected data is not limited in the embodiment of the present disclosure, and in a possible implementation manner, the collected data may be acquired by reading the collected data from the first terminal or receiving the collected data transmitted by the first terminal; in a possible implementation manner, the pose determination method provided in the embodiment of the present disclosure may also be applied to the first terminal, and in this case, the collected data acquired by moving the first terminal in the target scene may be directly acquired.
As can be seen from the foregoing disclosure, in a possible implementation manner, a global map containing the target scene may further be obtained in step S12, where the global map may be generated based on the map data and satisfies the precision condition, and the map data may be obtained by the second terminal performing data acquisition on the global scene containing the target scene.
The global scene may be any scene including a target scene, and a range of the scene included in the global scene may be flexibly determined according to an actual situation, which is not limited in the embodiment of the present disclosure. As described in the above-mentioned embodiments, the target scene may include an outdoor scene and/or an indoor scene, and thus, the global scene may also include an outdoor scene and/or an indoor scene according to the actual situation of the target scene. For example, in a possible implementation manner, when the target scene is an outdoor scene including a certain open space or square, the global scene may be a scene of a district or a city area where the open space or square is located, and the global scene may include an outdoor scene in the district or the city area, or an indoor scene in the district or the city area.
The map data may be corresponding data obtained by the second terminal performing data acquisition on the global scene, and the map data may be used to generate a global map corresponding to the global scene. The data content contained in the map data can be flexibly determined according to actual requirements, and in a possible implementation manner, the map data can comprise a second acquired image obtained by acquiring an image of a global scene; in one possible implementation, the map data may include second IMU data obtained by performing IMU data acquisition on the global scene; in a possible implementation manner, the map data may further include laser point cloud data obtained by performing radar scanning on the global scene, and the like. Specific implementation forms of the map data can refer to the following disclosed embodiments, and are not expanded here.
The second terminal can be a mobile terminal with a map data acquisition function, and the implementation mode of the second terminal can be flexibly selected according to data contents required to be contained in the map data. For example, the second terminal may include a visual sensor for image acquisition in the case where the map data includes the second captured image, an IMU sensor for acquiring IMU data in the case where the map data includes the second IMU data, a radar for acquiring laser point cloud in the case where the map data includes the laser point cloud data, and the like. The hardware structure and the connection mode of the second terminal can be seen in the following embodiments, which are not expanded herein.
The global map may be a map generated based on the map data, and its implementation form may be determined according to the actual situation of the global scene and the data content of the map data. In a possible implementation manner, the global map may include related information of each three-dimensional feature point in the global scene; which information is specifically included, and how that information is represented, may be flexibly determined according to the actual situation. In a possible implementation manner, the three-dimensional feature points in the global scene may be observed in the form of images, and the related information of a three-dimensional feature point may include, for example, the coordinates of the three-dimensional feature point and feature information of the three-dimensional feature point, where the feature information may include one or more of a feature descriptor corresponding to the three-dimensional feature point, a communication signal fingerprint corresponding to the three-dimensional feature point, or semantic information. Which related information is contained, how the related information is contained in the global map, and the specific meaning of each item of related information are described in detail in the following disclosed embodiments and are not expanded here.
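Purely as an illustration of the kinds of related information listed above (not a structure defined by the patent; all field names are assumptions), a per-point record of a global map could look like:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import numpy as np

@dataclass
class MapPoint:
    """One three-dimensional feature point of the global map."""
    xyz: np.ndarray                       # coordinates in the global scene frame
    descriptor: np.ndarray                # visual feature descriptor of the point
    signal_fingerprint: Dict[str, float] = field(default_factory=dict)  # e.g. WiFi/Bluetooth/UWB signal strengths
    semantic_label: Optional[str] = None  # optional semantic information
    observed_in: List[int] = field(default_factory=list)  # indices of visual point cloud frames observing the point
```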
The precision of the global map may be the position precision of each three-dimensional feature point in the global map, and may be, for example, a position difference between coordinates of the three-dimensional feature point included in the global map and an actual position of the three-dimensional feature point in the global scene. Therefore, the precision condition of the global map can reflect whether the position of each three-dimensional feature point in the global map meets the precision requirement, and the specific value of the precision condition can be flexibly set according to the actual situation.
In a possible implementation manner, it may be difficult to directly judge the position difference between the coordinates of the three-dimensional feature points in the global map and their actual positions; therefore, whether the global map satisfies the precision condition may be judged indirectly, for example by judging whether the data acquisition amount of the map data reaches a certain value, or whether the precision of the method used to generate the global map meets the requirement. For example, in one example, whether the global map satisfies the precision condition may be indirectly inferred by judging whether the ratio between the geographic range covered by the collected map data and the geographic range covered by the global scene reaches a preset threshold.
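The indirect coverage-based check in the example above could be sketched as follows; the grid-based coverage measure, the cell size, and the threshold value are assumptions for illustration:

```python
import numpy as np

def meets_precision_condition(collected_xy, scene_bounds, cell=5.0, threshold=0.9):
    """Indirectly judge the precision condition via geographic coverage.

    collected_xy: (N, 2) horizontal positions at which map data was collected
    scene_bounds: (x_min, y_min, x_max, y_max) of the global scene
    cell:         grid cell size in metres used to rasterize coverage
    threshold:    required ratio of covered cells to total cells
    """
    x_min, y_min, x_max, y_max = scene_bounds
    nx = int(np.ceil((x_max - x_min) / cell))
    ny = int(np.ceil((y_max - y_min) / cell))
    covered = np.zeros((nx, ny), dtype=bool)
    ix = np.clip(((collected_xy[:, 0] - x_min) // cell).astype(int), 0, nx - 1)
    iy = np.clip(((collected_xy[:, 1] - y_min) // cell).astype(int), 0, ny - 1)
    covered[ix, iy] = True
    return covered.sum() / covered.size >= threshold
```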
The method for acquiring the global map can be flexibly selected according to actual conditions, and is not limited to the following disclosure embodiments. In one possible implementation manner, a global map may be generated in the pose determination apparatus according to map data by acquiring the map data acquired by the second terminal; in a possible implementation manner, the global map may also be generated in other devices or apparatuses, in which case, the manner of acquiring the global map may be to directly read the global map from the device storing or generating the global map.
The process of acquiring the map data by the second terminal through data acquisition of the global scene is not limited in the embodiment of the disclosure, and can be flexibly determined according to actual conditions. In a possible implementation manner, the second terminal may move in the global scene, so as to collect corresponding map data, specifically how to move, and how to collect the map data, which can be described in detail in the following disclosure embodiments, and is not expanded first.
Further, how to generate the global map of the global scene according to the map data can be flexibly determined according to the data condition contained in the map data, and the generation process is detailed in the following disclosed embodiments and is not expanded first.
In the above-mentioned disclosed embodiment, the implementation order of step S11 and step S12 is not limited in the disclosed embodiment, and in a possible implementation manner, step S11 and step S12 may be sequentially executed in a certain order, and in a possible implementation manner, step S11 and step S12 may also be executed simultaneously.
After acquiring the collected data in the target scene and the global map containing the target scene, at least one first pose of the first terminal in the acquisition process can be determined according to the feature correspondence between the collected data and the global map.
As described in the foregoing disclosed embodiments, the collected data may be obtained by collecting data on the target scene and can therefore reflect features of the target scene; because the global scene contains the target scene, the global map corresponding to the global scene also contains features of the target scene. Therefore, in a possible implementation manner, the feature correspondence between the collected data and the global map may include a correspondence between features in the collected data and features in the global map. In addition, as the first terminal moves in the target scene, a large amount of collected data can be collected, and the features of the target scene are also reflected among these collected data; therefore, in a possible implementation manner, the feature correspondence may also include feature correspondences among the data contained in the collected data. Which feature correspondences are specifically used to determine the at least one first pose can be flexibly selected according to actual conditions and is described in detail in the following disclosed embodiments.
The first pose may be one or more poses of the first terminal at the moments of data acquisition while it moves in the target scene, and the number of determined first poses may be flexibly determined according to the actual situation. In one possible implementation, the first poses may correspond to the collected data; that is, each determined first pose may be the pose of the first terminal at the time the corresponding collected data was acquired. How the first poses are determined according to the feature correspondence can also be flexibly selected according to actual conditions and is described in detail in the following disclosed embodiments.
In the embodiment of the disclosure, the collected data acquired by the first terminal in the target scene is acquired, the global map containing the target scene is acquired, and at least one first pose of the first terminal in the acquisition process is determined according to the feature correspondence between the collected data and the global map. Through this process, the global map of the global scene can be reused: after the global map is generated, a large amount of first pose data can be collected at scale through the first terminal. The collection of the data used to generate the first poses is simple and can be completed by the first terminal alone, which reduces the need for additional equipment deployed in the target scene and for additional calibration and synchronization among multiple devices. In addition, because the global map satisfies the precision condition, the first pose data obtained based on the feature correspondence between the collected data and the global map also has high precision.
As described in the foregoing embodiments, the implementation form of the map data may be flexibly determined according to the actual situation, and the manner of generating the global map based on the map data may be flexibly determined according to the actual situation of the map data. Thus, in one possible implementation, the map data may include: laser point cloud, a second acquired image and second IMU data in the global scene;
the pose determination method provided in the embodiment of the present disclosure further includes:
acquiring map data of a global scene acquired by a second terminal;
and according to the map data, performing off-line reconstruction on the global scene to generate a global map of the global scene.
The laser point cloud may be a point cloud formed by a plurality of laser points obtained by radar scanning of the global scene by the second terminal, and the number of the laser points included in the laser point cloud may be flexibly determined according to the radar scanning condition of the second terminal and the moving track of the second terminal in the global scene, which is not limited in the embodiment of the present disclosure.
The second captured image may be a plurality of images captured during the process that the second terminal moves in the global scene, and the number of the second captured images may be determined jointly according to the moving condition of the second terminal in the global scene and the number of hardware devices included in the second terminal and used for capturing the images, which is not limited in the embodiment of the present disclosure.
The second IMU data may be related inertial measurement data acquired during the process that the second terminal moves in the global scene, and the quantity of the second IMU data may also be determined according to the movement condition of the second terminal in the global scene and the quantity of hardware devices included in the second terminal and used for acquiring the IMU data, which is not limited in the embodiment of the present disclosure.
As can be seen from the foregoing disclosure, in a possible implementation manner, the method for determining a pose provided in the disclosure may further include performing offline reconstruction on the global scene according to the acquired map data of the global scene acquired by the second terminal, so as to generate a global map of the global scene.
The off-line reconstruction may be a process of reconstructing the global map of the global scene according to the collected map data after the second terminal has completed collecting the map data in the global scene. How to perform off-line reconstruction of the global scene according to map data including the laser point cloud, the second acquired image and the second IMU data can be flexibly determined according to the actual situation and is described in detail in the following disclosed embodiments.
In the embodiment of the disclosure, the global map of the global scene is generated by acquiring map data including the laser point cloud, the second acquired image and the second IMU data and performing off-line reconstruction of the global scene according to the acquired map data. Through this process, after relatively comprehensive map data acquisition of the global scene is completed, the large amount of acquired map data is integrated and the global scene is comprehensively reconstructed off line, so that the generated global map has relatively high precision, and the at least one first pose determined based on the global map and the collected data is therefore relatively accurate. Meanwhile, because the map data includes the laser point cloud, the second acquired image and the second IMU data, these data are easy to acquire and the acquisition process is subject to few spatial restrictions.
As described in the above-mentioned embodiments, the off-line reconstruction process can be flexibly determined according to actual situations. In one possible implementation, the off-line reconstructing a global scene according to map data to generate a global map of the global scene includes:
determining at least one second pose of the second terminal in the data acquisition process according to the second IMU data and the laser point cloud;
according to at least one second pose and in combination with a second collected image, performing visual map reconstruction on the global scene to obtain at least one frame of visual point cloud, wherein the visual point cloud comprises at least one three-dimensional feature point in the global scene;
and obtaining a global map of the global scene according to the at least one frame of visual point cloud.
The manner of determining at least one second pose of the second terminal in the data acquisition process according to the second IMU data and the laser point cloud may be flexibly determined according to actual conditions; any method that can recover the pose of the second terminal based on the laser point cloud and IMU data may be used as an implementation in the embodiments of the present disclosure, and the implementation is not limited to the following embodiments. In a possible implementation manner, the acquired laser points may be projected into the lidar frames at different times during the data acquisition process according to the second IMU data, so that the second poses of the second terminal at those different times may be estimated based on the projection results.
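One common way to realize such IMU-aided projection is to motion-compensate (de-skew) each lidar sweep with the IMU-predicted motion before scan matching. The sketch below illustrates this idea under simplifying assumptions (constant motion over the sweep, axis-angle interpolation); it is not the patent's specific algorithm:

```python
import numpy as np

def deskew_scan(points, timestamps, T_rel, t_start, t_end):
    """Project the points of one lidar sweep into the sensor frame at the sweep end.

    points:     (N, 3) raw laser points, each in the sensor frame at its capture time
    timestamps: (N,) capture time of each point
    T_rel:      4x4 transform from the sensor frame at t_start to the sensor frame
                at t_end, predicted by integrating the second IMU data
    """
    R_rel, t_rel = T_rel[:3, :3], T_rel[:3, 3]
    # Express the relative rotation as axis-angle so it can be interpolated.
    angle = np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R_rel[2, 1] - R_rel[1, 2],
                     R_rel[0, 2] - R_rel[2, 0],
                     R_rel[1, 0] - R_rel[0, 1]])
    axis = axis / (np.linalg.norm(axis) + 1e-12)
    Kx = np.array([[0, -axis[2], axis[1]],
                   [axis[2], 0, -axis[0]],
                   [-axis[1], axis[0], 0]])

    out = np.empty_like(points)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        s = (t_end - t) / (t_end - t_start)  # fraction of the sweep motion remaining
        # Interpolated rotation (Rodrigues formula) and translation for this point.
        R_s = np.eye(3) + np.sin(s * angle) * Kx + (1 - np.cos(s * angle)) * (Kx @ Kx)
        out[i] = R_s @ p + s * t_rel
    # The motion-compensated scans can then be aligned (e.g. by scan matching)
    # to estimate the second poses of the second terminal during acquisition.
    return out
```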
After at least one second pose of the second terminal in the data acquisition process is determined, visual map reconstruction can be performed on the global scene according to the at least one second pose and the second acquired image to obtain at least one frame of visual point cloud. The visual point cloud may include at least one three-dimensional feature point in the global scene, and the number of visual point clouds and the number of included three-dimensional feature points are not limited in the embodiments of the present disclosure, and in one possible implementation, the global map may include one or more frames of visual point clouds. As described in the foregoing embodiments, the global map may include information related to three-dimensional feature points in the global scene, and in one possible implementation, the three-dimensional feature points in the global scene may be observed in the form of images, so that, in one possible implementation, the visual point cloud may be observed through a visual image, in which case, the global map may also include at least one or more visual images for observing the visual point cloud.
Further, the visual point cloud includes three-dimensional feature points, and feature information of the three-dimensional feature points may also be stored in the global map, so that the visual point cloud may also correspond to the feature information of the three-dimensional feature points, for example, in a possible implementation manner, a feature descriptor of the three-dimensional feature points may be determined according to features extracted from the second acquired image, and thus, the visual point cloud may correspond to the feature descriptor of the three-dimensional feature points; in a possible implementation manner, the acquired map data may further include signal data related to communication, such as a WiFi signal, a bluetooth signal, or an Ultra Wide Band (UWB) signal, and the like, and these signals may serve as signal fingerprints and correspond to the three-dimensional feature points, so as to serve as feature information of the three-dimensional feature points, and thus, the visual point cloud may correspond to the communication signal fingerprints of the three-dimensional feature points; in a possible implementation manner, the second captured image may further include some semantic information, and the semantic information may also establish a corresponding relationship with the three-dimensional feature point, so as to serve as feature information of the three-dimensional feature point, in this case, the visual point cloud may establish a corresponding relationship with the semantic information, and the like.
The process of reconstructing the visual map of the global scene according to the at least one second pose in combination with the second acquired image can be implemented using related techniques, and the specific approach can be flexibly selected according to actual conditions, without being limited to the following disclosed embodiments. In a possible implementation manner, feature extraction and matching may be performed on the second acquired images through scale-invariant feature transform (SIFT) to generate at least one frame of visual point cloud; further, according to the at least one second pose determined from the laser point cloud and the second IMU data, information such as the coordinates of each three-dimensional feature point observed in the at least one frame of visual point cloud can be obtained.
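As an illustration of this step, the sketch below extracts and matches SIFT features in two second captured images and triangulates the matches into three-dimensional feature points, assuming the projection matrices are built from the known second poses; OpenCV's SIFT implementation and the linear triangulation are assumptions, not the patent's prescribed pipeline:

```python
import numpy as np
import cv2

def triangulate_pair(img1, img2, P1, P2):
    """Extract and match SIFT features in two captured images and triangulate them.

    img1, img2: two second captured images (grayscale)
    P1, P2:     3x4 projection matrices K @ [R | t] built from the known second poses
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).T  # 2 x N
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).T  # 2 x N

    # Triangulated 3D feature points form one frame of the (initial) visual point cloud.
    pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
    pts3d = (pts4d[:3] / pts4d[3]).T                         # N x 3
    descriptors = np.float32([des1[m.queryIdx] for m in good])
    return pts3d, descriptors
```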
After at least one frame of visual point cloud is obtained through visual map reconstruction, a global map of the global scene can be obtained according to the at least one frame of visual point cloud. In a possible implementation manner, all obtained visual point clouds, the feature information of three-dimensional feature points corresponding to the visual point clouds and the like can be used as a global map together; in a possible implementation manner, one or more frames may also be selected from the obtained visual point cloud, and the selected frames or frames may be collectively used as the global map according to the feature information of the three-dimensional feature points corresponding to the one or more frames of visual point clouds and the like.
In the embodiment of the disclosure, at least one second pose of the second terminal in the data acquisition process is determined according to the second IMU data and the laser point cloud, the visual map of the global scene is reconstructed according to the at least one second pose in combination with the second acquired image to obtain at least one frame of visual point cloud, and the global map of the global scene is then obtained according to the at least one frame of visual point cloud. Through this process, the laser point cloud, the second IMU data and the second acquired image can be comprehensively utilized, and information such as the positions and features of the three-dimensional feature points in the global scene is represented through the visual point cloud; the reconstruction of the global map can thus be realized with data that are easy to obtain, the reconstruction result is accurate, and both the convenience and the precision of the whole pose determination process are improved.
In a possible implementation manner, performing visual map reconstruction on the global scene by combining the second collected image according to at least one second pose to obtain at least one frame of visual point cloud, including:
according to at least one second pose and in combination with a second collected image, performing visual map reconstruction on the global scene to obtain at least one frame of initial visual point cloud;
acquiring third constraint information in the visual map reconstruction process according to the laser point cloud and/or the second acquired image;
and optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud.
In one possible implementation, the second pose determined from the laser point cloud may have relatively low accuracy. In this case, if the determined second pose is used directly, the visual point cloud obtained by reconstructing the visual map in combination with the second acquired image may contain large noise. Therefore, in a possible implementation manner, after the visual map of the global scene is reconstructed according to the second pose and the second acquired image, the result of the reconstruction may be used as the initial visual point cloud, and the initial visual point cloud may be further optimized according to the third constraint information generated from the laser point cloud and/or the second acquired image, so as to reduce the noise in the initial visual point cloud and obtain a visual point cloud with higher precision.
The process of reconstructing the visual map according to the second pose and the second captured image to obtain at least one frame of initial visual point cloud may refer to the above-described embodiments, and is not described herein again.
The third constraint information may be constraint information determined according to the laser point cloud and/or the second collected image, and specifically how to obtain the third constraint information according to the laser point cloud and/or the second collected image, and this process may be flexibly selected according to actual situations, and is not limited to the following disclosure embodiments.
In one possible implementation, obtaining the third constraint information in the visual map reconstruction process according to the laser point cloud may include:
performing feature extraction on the laser point cloud using the LOAM (Lidar Odometry and Mapping in real-time) method, and determining plane features and edge features of the laser point cloud;
determining plane constraint of the laser point cloud in the process of reconstructing the visual map according to the plane characteristics of the laser point cloud;
determining the edge constraint of the laser point cloud in the process of reconstructing the visual map according to the edge characteristics of the laser point cloud;
and acquiring third constraint information in the process of reconstructing the visual map according to the plane constraint of the laser point cloud and/or the edge constraint of the laser point cloud.
The plane features of the laser point cloud can be flexibly determined according to the actual situation of the laser point cloud, and the specific form of the plane constraint determined based on these plane features can also be flexibly selected. In one example, the plane constraint may be expressed by the following formula (1):

$$ r_p \;=\; \big\| \, {}^{m}n^{\top} \big( {}^{m}T_{n}\, {}^{n}p \;-\; {}^{m}q \big) \big\|^{2}_{\Sigma_p} \tag{1} $$

where n and m are two different laser point cloud coordinate systems, ${}^{m}q$ is a feature point in the coordinate system m, ${}^{m}n$ is the normal vector of the plane feature at ${}^{m}q$ and ${}^{m}n^{\top}$ is its transpose, ${}^{m}T_{n}$ is the transformation between the coordinate systems n and m, ${}^{n}p$ is a feature point in the coordinate system n, ${}^{m}T_{n}\,{}^{n}p$ is the result of transforming ${}^{n}p$ into the coordinate system m according to this transformation, and $\Sigma_p$ is the covariance matrix of the plane features of the laser point cloud. The value of $\Sigma_p$ can be flexibly set according to the actual situation; in one example, $\Sigma_p$ may be set to 0.2 m².
Similarly, the edge features of the laser point cloud can also be flexibly determined according to the actual situation of the laser point cloud, and the specific form of the edge constraint determined based on these edge features can be flexibly selected. In one example, the edge constraint may be expressed by the following formula (2):

$$ r_e \;=\; \big\| \, {}^{m}i \times \big( {}^{m}T_{n}\, {}^{n}p \;-\; {}^{m}q \big) \big\|^{2}_{\Sigma_e} \tag{2} $$

where ${}^{m}i$ is the direction vector of the edge feature at the feature point ${}^{m}q$ in the coordinate system m, $\Sigma_e$ is the covariance matrix of the edge features of the laser point cloud, and the meaning of the remaining parameters can be referred to formula (1). The value of $\Sigma_e$ can be flexibly set according to the actual situation; in one example, $\Sigma_e$ may be set to 0.5 m².
After the plane constraint of the laser point cloud and the edge constraint of the laser point cloud are respectively determined, both of them may be used as the third constraint information, or only one of them may be used as the third constraint information; which one is selected can be flexibly determined according to the actual situation.
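For purposes of illustration only, the following sketch (Python with NumPy; the function names, variable names and the 4x4 matrix convention are assumptions made for this sketch and are not part of the disclosed embodiments) shows how the point-to-plane residual of formula (1) and the point-to-line residual of formula (2) might be evaluated for a single feature point. The LOAM-style extraction of the plane and edge features themselves, as well as the whitening by the covariance matrices Σp and Σe, is omitted:

import numpy as np

def plane_residual(T_mn, p_n, q_m, n_m):
    """Point-to-plane residual of formula (1): distance along the plane
    normal n_m between the transformed point T_mn * p_n and the plane
    anchor point q_m (all quantities expressed in frame m)."""
    R, t = T_mn[:3, :3], T_mn[:3, 3]
    p_m = R @ p_n + t                      # transform p from frame n into frame m
    return float(n_m @ (p_m - q_m))        # scalar point-to-plane distance

def edge_residual(T_mn, p_n, q_m, i_m):
    """Point-to-line residual of formula (2): cross product of the edge
    direction i_m with the offset of the transformed point from q_m."""
    R, t = T_mn[:3, :3], T_mn[:3, 3]
    p_m = R @ p_n + t
    return np.cross(i_m, p_m - q_m)        # 3-vector, zero when p lies on the edge

# Illustrative usage with an identity transform between frames n and m
T = np.eye(4)
print(plane_residual(T, np.array([1.0, 0.0, 0.5]),
                     np.array([0.0, 0.0, 0.0]),
                     np.array([0.0, 0.0, 1.0])))   # 0.5 above the z = 0 plane

In a full implementation, each residual would additionally be weighted by the corresponding covariance before being accumulated into the optimization of formula (4).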
In a possible implementation manner, obtaining the third constraint information in the visual map reconstruction process according to the second acquired image may include:
projecting the three-dimensional characteristic points corresponding to the initial visual point cloud to obtain a projection result;
acquiring visual constraints in the process of reconstructing a visual map according to errors between the projection result and two-dimensional feature points in the initial visual point cloud, wherein the two-dimensional feature points are two-dimensional feature points corresponding to the three-dimensional feature points in the initial visual point cloud;
and acquiring third constraint information in the visual map reconstruction process according to the visual constraint.
The specific way of obtaining the visual constraint in the visual map reconstruction process, according to the error between the projection result and the two-dimensional feature points corresponding to the three-dimensional feature points in the initial visual point cloud, can be flexibly selected according to the actual situation. In one example, the visual constraint may be expressed by the following formula (3):

$$ r_v \;=\; \sum_{i,j} \big\| \, f\big( {}^{W}T_{i}, X_{j} \big) - x_{ij} \big\|^{2}_{\Sigma_v} \tag{3} $$

where $X_j$ is the j-th three-dimensional feature point corresponding to the visual point cloud, $x_{ij}$ is the two-dimensional feature point in the i-th frame of the initial visual point cloud corresponding to the three-dimensional feature point $X_j$, $f({}^{W}T_i, X_j)$ is the projection result of projecting the three-dimensional feature point $X_j$ onto the i-th frame of the initial visual point cloud, and $\Sigma_v$ is the covariance matrix of the image feature constraint. The value of $\Sigma_v$ can be flexibly set according to the actual situation; in one example, $\Sigma_v$ may be set to 2 square pixels.
Which information the third constraint information specifically includes can be flexibly selected according to the actual situation. As can be seen from the above-disclosed embodiments, the third constraint information may include one or more of the plane constraint of the laser point cloud, the edge constraint of the laser point cloud, and the visual constraint. In one example, the third constraint information may include the plane constraint of the laser point cloud, the edge constraint of the laser point cloud, and the visual constraint; in this case, the process of optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain the at least one frame of visual point cloud may be expressed by the following formula (4):

$$ \min_{\{{}^{W}T_i\},\,\{X_j\}} \; \sum_{i,j} \big\| f\big({}^{W}T_i, X_j\big) - x_{ij} \big\|^{2}_{\Sigma_v} \;+\; \sum_{L_p \in \mathbb{L}_p} \sum_{{}^{n}p \in L_p} \big\| {}^{m}n^{\top}\big({}^{m}T_n\,{}^{n}p - {}^{m}q\big) \big\|^{2}_{\Sigma_p} \;+\; \sum_{L_e \in \mathbb{L}_e} \sum_{{}^{n}p \in L_e} \big\| {}^{m}i \times \big({}^{m}T_n\,{}^{n}p - {}^{m}q\big) \big\|^{2}_{\Sigma_e} \tag{4} $$

where $L_p$ is a point cloud formed by the points belonging to planes in the laser point cloud and $\mathbb{L}_p$ is the set of such plane point clouds $L_p$, $L_e$ is a point cloud formed by the points belonging to edges in the laser point cloud and $\mathbb{L}_e$ is the set of such edge point clouds $L_e$, and the meanings of the remaining parameters can be referred to the above-disclosed embodiments.
As can be seen from formula (4), in a possible implementation manner, optimizing the at least one frame of initial visual point cloud according to the third constraint information may include optimizing the three-dimensional feature points included in the initial visual point cloud, and may also include optimizing the pose of the device in the second terminal that acquires the second captured image. When the pose of that device is optimized, the second pose corresponding to the second terminal may be optimized accordingly, thereby reducing the noise introduced into the visual point cloud by a low-accuracy second pose. Further, after the visual point cloud is optimized, the third constraint information of the visual map reconstruction process can be obtained again based on the optimization result, so that the visual point cloud is further iteratively optimized; the number of iterations can be flexibly selected according to the actual situation and is not limited in the embodiments of the present disclosure. Fig. 2 shows a comparison between a visual point cloud before and after optimization according to an embodiment of the present disclosure: for the same scene, the upper box is the visual image corresponding to the visual point cloud before optimization, and the lower box is the visual image corresponding to the visual point cloud after optimization. As can be seen from the figure, the noise points in the optimized visual point cloud are reduced, so the optimized visual point cloud has higher precision, and correspondingly the precision of the three-dimensional feature points corresponding to the optimized visual point cloud is also improved.
As described in the foregoing disclosure embodiments, the map data collected by the second terminal may include the second collected image, the second IMU data, and the laser point cloud, and accordingly, the second terminal may also include a hardware structure having the data collection function. Therefore, in one possible implementation, the second terminal may include:
the radar is used for acquiring laser point cloud in a global scene;
the vision sensor is used for acquiring a second acquisition image in the global scene;
and the IMU sensor is used for acquiring second IMU data in the global scene.
The radar may be any radar having a laser point cloud acquisition function, and its implementation form is not limited in the embodiments of the present disclosure; in one possible implementation, the radar may be a 3D radar. The vision sensor may be any sensor with an image capturing function, such as a camera, and its specific implementation form may also be flexibly determined; in one possible implementation, the second terminal may include a four-camera array that together provides 360° image capture. The implementation form of the IMU sensor can also be flexibly determined according to the actual situation and is not limited in the embodiments of the present disclosure.
The arrangement positions and the connection relations among the radar, the vision sensor and the IMU sensor in the second terminal can be flexibly selected according to actual conditions, and are not limited to the following disclosed embodiments. In a possible implementation manner, the radar, the vision sensor and the IMU sensor may be rigidly connected, and a specific connection sequence may be flexibly selected according to actual conditions. In one possible implementation, the vision sensor and the IMU sensor may be fixedly connected and packaged as one fixed structural unit, and the radar may be disposed above the fixed structural unit. In a possible implementation manner, the vision sensor, the IMU sensor and the radar may also be fixedly disposed in a backpack, and fig. 3 illustrates a structural schematic diagram of a second terminal according to an embodiment of the present disclosure, and as can be seen from the diagram, in an example, the vision sensor and the IMU sensor may be fixedly connected and packaged as a fixed structural unit, a lower end of the fixed structural unit may be disposed in the backpack so as to be convenient for carrying, and the radar may be disposed above the fixed structural unit.
By including the radar, the vision sensor and the IMU sensor in the second terminal, the map data in the global scene can be comprehensively collected by the second terminal, which facilitates the generation of the subsequent global map. With this structure, the global map can be generated using relatively simple, low-cost hardware, so that the plurality of first poses of the first terminal can further be acquired based on the global map, greatly reducing the hardware cost and difficulty of acquiring the first pose data.
Since the second terminal may include hardware devices such as the radar, the vision sensor and the IMU sensor, these hardware devices may need to be calibrated, or their measurement timestamps synchronized, before use. Furthermore, while each piece of hardware is calibrated, the coordinate transformation relationships between the different pieces of hardware can also be calibrated, so as to improve the precision of the generated global map. Therefore, in a possible implementation manner, before performing offline reconstruction on the global scene according to the map data and generating the global map of the global scene, the method may further include:
calibrating a coordinate transformation relation between the visual sensor and the IMU sensor to obtain a first calibration result;
calibrating the coordinate transformation relation between the radar and the vision sensor to obtain a second calibration result;
and performing combined calibration on the coordinate transformation relation among the visual sensor, the IMU sensor and the radar according to the first calibration result and the second calibration result.
The method for calibrating the coordinate transformation relationship between the vision sensor and the IMU sensor can be flexibly selected according to the actual situation; in one example, the calibration of the vision sensor and the IMU sensor can be performed with the Kalibr tool. The method for calibrating the coordinate transformation relationship between the radar and the vision sensor can likewise be flexibly selected; in one example, the calibration of the radar and the vision sensor can be performed with the AutoWare framework. Further, because there may be errors in the calibration process, in a possible implementation manner the coordinate transformation relationships among the vision sensor, the IMU sensor and the radar may be jointly calibrated and optimized according to the first calibration result and the second calibration result, so that the coordinate transformation relationships among the different hardware devices are more accurate.
The implementation form of the joint calibration can be flexibly determined according to the actual situation and is not limited to the following disclosed embodiments. In one possible implementation, the joint calibration may be expressed by the following formula (5):

$$ \min_{\{{}^{I}T_{C_i}\},\; {}^{I}T_{L}} \; \sum_{i} \big\| \, {}^{I}T_{C_i} \ominus {}^{I}\hat{T}_{C_i} \big\|^{2}_{\Sigma_c} \;+\; \sum_{i} \big\| \, {}^{I}T_{L}^{-1}\,{}^{I}T_{C_i} \ominus {}^{L}\hat{T}_{C_i} \big\|^{2}_{\Sigma_L} \tag{5} $$

where $C_i$ is the i-th vision sensor in the second terminal, I is the IMU sensor, L is the radar, ${}^{I}T_{C_i}$ is the coordinate transformation relationship between the i-th vision sensor and the IMU sensor, ${}^{I}T_{L}$ is the coordinate transformation relationship between the radar and the IMU sensor, ${}^{L}T_{C_i}$ is the coordinate transformation relationship between the radar and the i-th vision sensor, ${}^{I}\hat{T}_{C_i}$ and ${}^{L}\hat{T}_{C_i}$ denote the first calibration result and the second calibration result respectively, and $\ominus$ denotes the relative pose error between two transformations. The covariance matrices $\Sigma_c$ and $\Sigma_L$ respectively represent the errors of the calibration with respect to the IMU sensor and the radar; their values can be flexibly set according to the actual situation. In one example, all rotational components on the diagonals of $\Sigma_c$ and $\Sigma_L$ may be set to 0.01 rad², all translational components of $\Sigma_c$ may be set to 0.03 m², and the translational components of $\Sigma_L$ may be set to (0.03, 0.03, 0.15) m².
It can be seen from formula (5) and the above calibration process that the coordinate transformation relationship between the vision sensor and the IMU sensor and the coordinate transformation relationship between the radar and the IMU sensor obtained through the joint calibration keep the overall calibration error small. Generating the global map after this calibration can therefore greatly improve the precision of the global map, and accordingly the precision of the whole pose determination process.
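By way of a hedged illustration only, the joint calibration of formula (5) might be prototyped as a nonlinear least-squares problem as sketched below (Python with NumPy and SciPy; the rotation-vector parameterization, the residual form, the identity initial guess and all names are assumptions made for this sketch, and the anisotropic translation covariance of $\Sigma_L$ is simplified to a single scalar):

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def pose_to_mat(x):
    """6-vector (rotation vector + translation) -> 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(x[:3]).as_matrix()
    T[:3, 3] = x[3:]
    return T

def pose_error(T_est, T_meas, var_rot=0.01, var_trans=0.03):
    """Whitened discrepancy between an estimated and a measured transform."""
    dT = np.linalg.inv(T_meas) @ T_est
    r_rot = R.from_matrix(dT[:3, :3]).as_rotvec() / np.sqrt(var_rot)
    r_trans = dT[:3, 3] / np.sqrt(var_trans)
    return np.concatenate([r_rot, r_trans])

def joint_calibration(T_IC_meas, T_LC_meas):
    """Jointly refine the camera-IMU extrinsics T_IC[i] and the radar-IMU
    extrinsic T_IL so that both pairwise calibration results stay consistent."""
    n_cam = len(T_IC_meas)

    def residuals(x):
        T_IL = pose_to_mat(x[:6])
        res = []
        for i in range(n_cam):
            T_IC = pose_to_mat(x[6 + 6 * i: 12 + 6 * i])
            # consistency with the camera-IMU calibration (first calibration result)
            res.append(pose_error(T_IC, T_IC_meas[i]))
            # consistency with the radar-camera calibration (second calibration result)
            T_LC = np.linalg.inv(T_IL) @ T_IC
            res.append(pose_error(T_LC, T_LC_meas[i]))
        return np.concatenate(res)

    # in practice the initial guess would be seeded from the pairwise results
    x0 = np.zeros(6 + 6 * n_cam)
    return least_squares(residuals, x0).x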
In a possible implementation manner, the pose determination method provided by the embodiment of the present disclosure may further include:
and in the process of acquiring the map data by the second terminal, reconstructing the global scene in real time according to the map data to generate a real-time map of the global scene.
And sending map data and/or a real-time map to target equipment, wherein the target equipment is used for displaying the geographic range for completing data acquisition of the global scene.
In the foregoing embodiments, it is mentioned that, in a possible implementation manner, the global scene may be reconstructed offline according to the map data to generate a global map of the global scene.
In a possible implementation manner, if the collected map data is not comprehensive enough, for example if map data is missing for some scenes within the global scene, the accuracy of the global map established offline is easily reduced; and if map data is collected for the global scene again, extra manual labor is required and the amount of computation increases. In some possible implementations, repeated acquisition may also occur during the collection of the map data.
Therefore, in a possible implementation manner, in order to make it easier to keep track of the acquisition status of the map data, the global scene may be reconstructed in real time according to the map data while the second terminal is acquiring the map data, so as to generate a real-time map of the global scene. The implementation of the real-time map may refer to that of the global map and is not repeated here; in one example, the real-time map may cover every scene in the global scene corresponding to the map data already acquired by the second terminal.
The real-time reconstruction process can refer to the offline reconstruction process in the above-disclosed embodiments and is not described in detail here. In a possible implementation manner, the real-time reconstruction may be performed based on the currently acquired map data; compared with the offline reconstruction performed on the large amount of map data obtained after acquisition is completed, the amount of data to reconstruct is small, so the reconstruction speed can be higher. In one possible implementation, the real-time reconstruction may omit some optimization steps of the offline reconstruction to increase the reconstruction speed; in one example, the real-time reconstruction may omit the steps of acquiring the third constraint information and optimizing the visual point cloud according to the third constraint information. In a possible implementation manner, the real-time reconstruction may be implemented by a 3D radar Simultaneous Localization and Mapping (SLAM, also called Concurrent Mapping and Localization, CML) system; in one example, the global scene may be reconstructed in real time using the open-source Cartographer library to generate the real-time map of the global scene.
In a possible implementation manner, as can be seen from the foregoing disclosure, the pose determination method provided by the present disclosure may further include sending the map data and/or the real-time map to the target device.
The target device may be configured to display the geographic range of the global scene for which data acquisition has been completed; that is, in a possible implementation manner, the target device may display the geographic range covered by the map data already acquired by the second terminal, so as to indicate the subsequent moving direction of the second terminal in the global scene and the remaining map data acquisition requirements. The implementation form of the target device is not limited in the embodiments of the present disclosure. In a possible implementation manner, the target device may be a handheld device, such as a tablet computer or a mobile phone, held by the operator performing the map data acquisition as shown in Fig. 3; in another possible implementation, if the second terminal is placed on a mobile device (such as an autonomous robot) for collecting the map data, the target device may be a controller or a display screen of that mobile device.
The specific data sent to the target device may also be flexibly selected according to the situation, and as described in the above-described embodiment, in one possible implementation manner, the collected map data may be sent to the target device, in one possible implementation manner, the real-time map may also be sent to the target device, and in one possible implementation manner, the map data, the real-time map, and the like may also be sent to the target device at the same time.
By reconstructing the global scene in real time according to the map data while the second terminal is acquiring the map data, and sending the map data and/or the real-time map to the target device, the acquisition status of the map data in the global scene can be monitored during collection.
The global map may be generated by various combinations of the above-disclosed embodiments, so that the global map of step S12 can be obtained in a variety of ways. After the collected data and the global map are acquired, as described in the above-disclosed embodiments, at least one first pose of the first terminal during the acquisition process may be determined through step S13.
The implementation of step S13 may be determined flexibly, and in one possible implementation, the global map may include at least one frame of visual point cloud, where the visual point cloud includes at least one three-dimensional feature point in the global scene; the acquired data comprises a first acquired image; in this case, step S13 may include:
performing feature matching on the first collected image and at least one frame of visual point cloud to obtain a global feature matching result;
and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result.
The first collected image may be an image collected by the first terminal in a data collection process in a target scene, the number of the first collected images may be flexibly selected according to an actual situation, and may be one frame or multiple frames, and the number of the first collected images is determined according to the actual situation, which is not limited in the embodiment of the present disclosure.
The global feature matching result may be a three-dimensional feature point in at least one frame of visual point cloud, which is matched with a two-dimensional feature point in the first captured image, how to obtain the global feature matching result may be flexibly selected according to actual conditions, and details are shown in the following disclosure embodiments.
The implementation form of the visual point cloud can refer to the above disclosed embodiments, and is not described herein again.
As can be seen from the above-disclosed embodiments, in a possible implementation manner, the first captured image and the at least one frame of visual point cloud may be subjected to feature matching to obtain the global feature matching result. The feature matching method between the first captured image and the visual point cloud can be flexibly selected according to the actual situation; any method capable of realizing feature matching between images can be used, and the method is not limited to the following disclosed embodiments. In one possible implementation, feature matching may be implemented using SIFT features and/or sparse optical flow tracking (KLT).
In one possible implementation, the performing feature matching on the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result may include:
and matching the two-dimensional feature points in the first acquired image with the three-dimensional feature points included in at least one frame of visual point cloud to obtain a global feature matching result.
As described in the foregoing disclosure embodiments, the visual point cloud may include at least one three-dimensional feature point in the global scene, and various types of feature information corresponding to the three-dimensional feature points may also be in the global map, so in a possible implementation manner, the two-dimensional feature point in the first captured image may be feature-matched with the three-dimensional feature point included in the at least one frame of visual point cloud to obtain a global matching result. The feature information used for feature matching may be one or more of various kinds of feature information such as a feature descriptor, a communication signal fingerprint, or semantic information mentioned in the above disclosed embodiments, specifically which feature information is included, and how to perform matching, which can be flexibly selected according to actual situations, and is not limited to the following disclosed embodiments.
In a possible implementation manner, matching the two-dimensional feature points in the first captured image with the three-dimensional feature points included in the at least one frame of visual point cloud to obtain the global feature matching result may be implemented by approximate nearest neighbor (ANN) search. In one example, for each feature included in the first captured image, the K features closest to it may be found in the global map (the value of K can be flexibly set according to the actual situation). These K features then vote for the frames of visual point cloud in the global map to which they belong, indicating whether each frame corresponds to the first captured image. If the number of votes for one or more frames of visual point cloud exceeds a set threshold, the visual images corresponding to those frames can be regarded as co-visible images of the first captured image, and each three-dimensional feature point in the co-visible images that matches a two-dimensional feature point in the first captured image can be taken as part of the global feature matching result.
By matching the two-dimensional feature points in the first captured image with the three-dimensional feature points corresponding to the at least one frame of visual point cloud through ANN search to obtain the global feature matching result, mismatches in the feature matching process can be reduced, thereby improving the precision of the global feature matching result and, in turn, the precision of pose determination.
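The following sketch illustrates one way such ANN-based matching with per-frame voting could look (Python; scikit-learn's NearestNeighbors is used here purely as a stand-in for whatever ANN index is actually employed, and all names, parameters and thresholds are illustrative assumptions):

import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def ann_global_matching(query_desc, map_desc, map_frame_id, map_point_id,
                        k=4, vote_threshold=20):
    """ANN-based 2D-3D matching sketch: each query descriptor retrieves its
    k nearest map descriptors, the retrieved descriptors vote for the visual
    point cloud frame they belong to, and frames with enough votes contribute
    their matched 3D points to the global feature matching result."""
    nn = NearestNeighbors(n_neighbors=k).fit(map_desc)
    _, idx = nn.kneighbors(query_desc)           # (n_query, k) indices into the map

    votes = Counter(map_frame_id[idx].ravel())   # votes per visual point cloud frame
    covisible = {f for f, v in votes.items() if v >= vote_threshold}

    matches = []                                  # (query feature index, map 3D point id) pairs
    for qi, row in enumerate(idx):
        for mi in row:
            if map_frame_id[mi] in covisible:
                matches.append((qi, map_point_id[mi]))
                break
    return matches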
After the global feature matching result is obtained, at least one first pose of the first terminal in the acquisition process can be determined according to the global feature matching result. The implementation of this process can be flexibly selected according to the actual situation and is not limited to the following disclosed embodiments. In a possible implementation manner, pose estimation may be performed on the global feature matching result by RANSAC (Random Sample Consensus), Perspective-n-Point (PnP) and similar methods, and the estimated pose may be optimized by minimizing the reprojection error, so as to obtain at least one first pose of the first terminal in the acquisition process.
Through the process, the characteristics corresponding to the visual point cloud in the global map can be matched with the characteristics between the first collected images, so that the pose of the first terminal is estimated by using the characteristics matched in the first collected images, and at least one pose of the first terminal is obtained.
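As an illustrative sketch of this pose estimation step (assuming OpenCV is available; the threshold value and the absence of lens distortion are assumptions of the sketch, not of the disclosed embodiments):

import cv2
import numpy as np

def estimate_first_pose(points_3d, points_2d, K):
    """Estimate a camera pose from the global feature matching result with
    RANSAC + PnP, then refine it on the inliers by minimizing the
    reprojection error.
    points_3d: Nx3 matched three-dimensional feature points (global map)
    points_2d: Nx2 matched two-dimensional feature points (first captured image)
    K:         3x3 camera intrinsic matrix."""
    pts3 = np.ascontiguousarray(points_3d, dtype=np.float64)
    pts2 = np.ascontiguousarray(points_2d, dtype=np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3, pts2, K, None,
                                                 reprojectionError=3.0)
    if not ok:
        return None
    rvec, tvec = cv2.solvePnPRefineLM(pts3[inliers[:, 0]], pts2[inliers[:, 0]],
                                      K, None, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)             # rotation matrix of the estimated pose
    return R, tvec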
In one possible implementation, the acquisition data may include at least two frames of the first acquired image, and step S13 may include:
step S131, performing feature matching on the first collected image and at least one frame of visual point cloud to obtain a global feature matching result;
step S132, performing feature matching according to at least two frames of first collected images to obtain a local feature matching result;
and S133, determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
The manner of performing feature matching on the first captured image and the at least one frame of visual point cloud to obtain the global feature matching result may refer to the above-mentioned disclosed embodiments, and is not described herein again.
In a possible implementation manner, because the generated global map may not completely cover the global scene, determining the first pose only from the global feature matching result obtained by matching the first captured image against the visual point cloud may be inaccurate, or may even fail, for example when the three-dimensional feature points included in the visual point cloud are incomplete or too few. Therefore, in a possible implementation manner, when the collected data includes at least two frames of the first captured image, a local feature matching result can additionally be obtained from the feature matching relationship between different first captured images, so that at least one first pose of the first terminal in the acquisition process is determined jointly from the global feature matching result and the local feature matching result.
The local feature matching result may consist of two-dimensional feature points matched between different frames of the first captured image. The process of performing feature matching between at least two first captured images can be flexibly selected according to the actual situation, and any method capable of matching features between different images can be used; it is not limited to the following disclosed embodiments. In one possible implementation, the local feature matching result may be obtained by tracking optical flow features between different first captured images with the KLT method mentioned in the above-disclosed embodiments.
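A minimal sketch of such KLT-based local feature matching between two grayscale frames of the first captured image might look as follows (Python with OpenCV; the corner-detection parameters are illustrative assumptions):

import cv2
import numpy as np

def klt_local_matching(prev_img, next_img, max_corners=500):
    """Local feature matching between two consecutive first captured images
    via sparse optical flow (KLT): detect corners in the previous grayscale
    frame and track them into the next frame; the successfully tracked pairs
    form the local feature matching result."""
    prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=max_corners,
                                       qualityLevel=0.01, minDistance=7)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, next_img,
                                                   prev_pts, None)
    good = status.ravel() == 1
    return prev_pts[good].reshape(-1, 2), next_pts[good].reshape(-1, 2)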
After the global feature matching result and the local feature matching result are obtained, at least one first pose of the first terminal in the acquisition process can be determined jointly through step S133 according to the global feature matching result and the local feature matching result; the implementation of step S133 can be flexibly determined according to the actual situation. In a possible implementation manner, step S133 may follow the manner of determining the first pose based on the global feature matching result described in the above-disclosed embodiments; for example, the pose may be estimated from the global feature matching result and the local feature matching result by RANSAC and PnP and then further optimized.
The method comprises the steps of carrying out feature matching on a first collected image and at least one frame of visual point cloud to obtain a global feature matching result, obtaining a local feature matching result according to at least two frames of the first collected image, and determining at least one first pose of a first terminal in the collecting process according to the global feature matching result and the local feature matching result. Through the process, the global feature matching result can be assisted through the local feature matching result, so that the influence of the global map on the pose determination result caused by incomplete global scene coverage is reduced, and the accuracy of the first pose is improved.
In one possible implementation, the acquiring data may further include the first IMU data, in which case step S133 may include:
acquiring first constraint information according to the global feature matching result and/or the local feature matching result;
acquiring second constraint information according to the first IMU data;
and processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information to obtain at least one first pose of the first terminal in the acquisition process.
The first IMU data may be inertial measurement data acquired by the first terminal in a data acquisition process in a target scene.
It can be seen from the foregoing disclosure that, in a possible implementation manner, in the process of determining the first pose through the global feature matching result and the local feature matching result, the first constraint information and the second constraint information may also be acquired to add a constraint to the process of obtaining the first pose.
The first constraint information may be constraint information obtained according to the global feature matching result and/or the local feature matching result. How the first constraint information is specifically obtained can be flexibly selected according to the actual situation, and is not limited to the following disclosed embodiments.
In a possible implementation manner, the first constraint information may be obtained by using information of the three-dimensional feature point and the two-dimensional feature point that are matched in the global feature matching result. In one example, according to the global feature matching result, the process of obtaining the first constraint information may be expressed by the following formula (6):
$$ E_{\mathrm{global}} \;=\; \sum_{i,j} \big\| \, f\big( {}^{W}T_{i}, \tilde{X}_{j} \big) - \tilde{x}_{ij} \big\|^{2} \tag{6} $$

where ${}^{W}T_{i}$ is the pose of the device in the first terminal that acquires the first captured image when the i-th frame of the first captured image is acquired, $\tilde{X}_j$ is the j-th three-dimensional feature point matched in the global feature matching result, $\tilde{x}_{ij}$ is the two-dimensional feature point matched with $\tilde{X}_j$ in the global feature matching result, and $f({}^{W}T_i, \tilde{X}_j)$ is the projection result of projecting the three-dimensional feature point $\tilde{X}_j$ onto the i-th frame of the first captured image.
In one possible implementation manner, the first constraint information may be obtained by using information of the three-dimensional feature point and the two-dimensional feature point that are matched in the local feature matching result. In one example, according to the local feature matching result, the process of obtaining the first constraint information may be expressed by the following formula (7):
$$ E_{\mathrm{local}} \;=\; \sum_{i,j} \big\| \, f\big( {}^{W}T_{i}, X_{j} \big) - x_{ij} \big\|^{2} \tag{7} $$

where $x_{ij}$ is the j-th two-dimensional feature point matched in the local feature matching result in the i-th frame of the first captured image, $X_j$ is the three-dimensional feature point in the target scene to which $x_{ij}$ in the local feature matching result is mapped, $f({}^{W}T_i, X_j)$ is the projection result of projecting the three-dimensional feature point $X_j$ onto the i-th frame of the first captured image, and the meaning of the remaining parameters can be referred to the above-disclosed embodiments.
The calculation result of the formula (6) or the formula (7) may be used as the first constraint information. In a possible implementation manner, the first constraint information may also be obtained jointly according to the global feature matching result and the local feature matching result, and in this case, the first constraint information may be obtained by combining the manner of obtaining the first constraint information in the formula (6) and the manner of obtaining the first constraint information in the formula (7).
Similarly, the second constraint information may be constraint information derived from the first IMU data. How the second constraint information is specifically obtained can be flexibly selected according to the actual situation, and is not limited to the following disclosed embodiments.
In one possible implementation manner, the second constraint information may be obtained by using relevant parameters of a device in the first terminal, which acquires the first acquired image and the first IMU data. In one example, the process of obtaining the second constraint information according to the first IMU data may be expressed by the following formula (8):
$$ E_{\mathrm{IMU}} \;=\; \sum_{i} \big\| \, h\big( C_{i}, C_{i+1} \big) \big\|^{2} \tag{8} $$

where $C_i = \big( {}^{W}T_i,\; {}^{W}v_i,\; b_a,\; b_g \big)$ denotes the parameters of the first terminal when the i-th frame of the first captured image is acquired, ${}^{W}v_i$ is the velocity of the first terminal, $b_a$ is the accelerometer bias of the device in the first terminal that measures the first IMU data, $b_g$ is the gyroscope bias of that device, $h(\cdot)$ is the IMU cost function, and the meaning of the remaining parameters can be referred to the above-disclosed embodiments.
The calculation result of equation (8) may be used as the second constraint information. As can be seen from the foregoing disclosure embodiments, in a possible implementation manner, the second constraint information may be determined according to a change condition of the first IMU data in the process of acquiring the first acquired image by the first terminal.
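Since the concrete form of the IMU cost function h(·) is not limited by the embodiments of the present disclosure, the following sketch merely illustrates one commonly used preintegration-style residual between consecutive frames (Python with NumPy; the preintegrated increments delta_p and delta_v are assumed to be given and already corrected for the current bias estimates, and all names are illustrative):

import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])

def imu_residual(R_i, p_i, v_i, p_j, v_j, dt, delta_p, delta_v):
    """One possible form of h(C_i, C_{i+1}): compare the preintegrated
    position/velocity increments with the increments predicted from the
    states of frames i and j, expressed in the frame-i body coordinates."""
    r_p = R_i.T @ (p_j - p_i - v_i * dt - 0.5 * GRAVITY * dt**2) - delta_p
    r_v = R_i.T @ (v_j - v_i - GRAVITY * dt) - delta_v
    return np.concatenate([r_p, r_v])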
After the first constraint information and the second constraint information are obtained, the global feature matching result and the local feature matching result can be processed according to at least one of the first constraint information and the second constraint information, and at least one first pose of the first terminal in the acquisition process is obtained.
The manner of processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information is not limited in the embodiment of the present disclosure, and any calculation manner that can process the feature matching result based on the constraint information to obtain the pose can be used as an implementation manner in the embodiment of the present disclosure.
In a possible implementation manner, processing the global feature matching result and the local feature matching result may include: processing the global feature matching result and the local feature matching result through bundle adjustment.
Bundle Adjustment (BA) is one way of solving for the pose, and the specific process of solving the pose through BA can be flexibly determined according to the actual situation. In one example, at least one of formulas (6) to (8) may be used as constraint information in the process of solving the first pose, and the first pose under the minimum error may be calculated by solving with BA. Which data is used as the constraint information is not limited in the embodiments of the present disclosure; in a possible implementation manner, the first constraint information and the second constraint information may be used together as the constraint information, in which case the process of solving with BA may be expressed by the following formula (9):
$$ \min_{\{C_i\},\,\{X_j\}} \; E_{\mathrm{global}} + E_{\mathrm{local}} + E_{\mathrm{IMU}} \tag{9} $$

where $E_{\mathrm{global}}$, $E_{\mathrm{local}}$ and $E_{\mathrm{IMU}}$ are the terms defined in formulas (6) to (8), and the meaning of each parameter can be referred to the above-disclosed embodiments and is not described herein again.
The specific process of determining the at least one first pose by solving formula (9) with BA is not limited in the embodiments of the present disclosure, and any BA-based solving method may be applied. In one possible implementation, formula (9) may be solved using key-frame selection and incremental bundle adjustment (ICE-BA), so as to determine at least one first pose of the first terminal in the acquisition process.
The global feature matching result and the local feature matching result are processed through bundle adjustment based on at least one of the first constraint information and the second constraint information to obtain at least one first pose of the first terminal in the acquisition process. Through this process, the obtained first pose can be optimized using at least one of the first constraint information and the second constraint information, so that the finally determined first poses are smoother overall and jitter is reduced.
Furthermore, the first pose is solved by using key frames, ICE-BA and the like, so that the calculated amount in the first pose determining process can be effectively reduced, and the efficiency of the pose determining process is improved.
As described in the foregoing disclosure embodiments, the accuracy of the first pose determined in the disclosure embodiment is high, so that the method provided in the disclosure embodiment may be applied to various scenes in the field of mobile positioning, and specifically, which scene may be selected according to an actual situation.
In a possible implementation manner, the pose determination method provided in the embodiments of the present disclosure may be used to determine the pose of a device offline. In a possible implementation manner, the pose determined by the pose determination method provided in the embodiments of the present disclosure may be used as motion truth data (ground truth) to evaluate the result accuracy of neural network algorithms related to mobile positioning, and the like. Therefore, in a possible implementation manner, the pose determination method provided by the embodiments of the present disclosure further includes:
determining motion truth value data according to at least one first pose of the first terminal in the acquisition process, wherein the motion truth value data is used for at least one of the following operations:
judging the precision of the positioning result, training the neural network and fusing information with the global map.
The motion truth data may be data that is treated as the true value in neural network training, that is, the ground-truth data of a neural network algorithm. Because the first pose determined in the embodiments of the present disclosure is the pose data of the first terminal during the data acquisition process and has high accuracy, in a possible implementation manner the determined first pose can be regarded as the real pose and can therefore be used as motion truth data.
The implementation manner of the process of determining the motion truth value data in the embodiments of the present disclosure may be flexibly determined according to the actual situation according to at least one first pose of the first terminal in the acquisition process, and is not limited to the following embodiments of the present disclosure.
In a possible implementation manner, determining motion truth value data according to at least one first pose of the first terminal in the acquisition process may include:
taking at least one first pose of the first terminal in the acquisition process as the motion truth data; and/or,
taking at least one type of the collected data, together with at least one first pose of the first terminal in the acquisition process, as the motion truth data, wherein the collected data includes:
one or more of wireless network WiFi data, Bluetooth data, geomagnetic data, ultra-wideband UWB data, a first captured image, and first IMU data.
As can be seen from the above disclosure embodiments, in a possible implementation manner, the determined at least one first pose may be directly used as the motion truth data. As the number of the determined first poses is not limited in the embodiment of the present disclosure, the number of the obtained motion truth value data is also not limited in the embodiment of the present disclosure, and in a possible implementation manner, each of the determined first poses may be used as motion truth value data; in a possible implementation manner, one or more first poses from the determined plurality of first poses can be selected randomly or according to a certain mode as the motion truth data.
In one possible implementation, in addition to using the first pose as the motion truth data in any of the above manners, at least one type of the collected data may also be used as motion truth data. The implementation of the collected data is not limited in the embodiments of the present disclosure; as described in the foregoing embodiments, in one possible implementation the collected data may include the first captured image and/or the first IMU data. Since the implementation form of the first terminal is not limited, the types of data collected by the first terminal may also be changed and extended flexibly; in a possible implementation manner, the collected data may further include one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, and UWB data.
The different types of collected data can be collected by the first terminal, so that the collected data can have corresponding relation with the determined first pose, and corresponding constraint can be provided in the pose determination process to assist in pose determination. In one possible implementation, therefore, several types of acquired data can also be used as motion truth data.
By taking at least one of the first pose and the collected data as the motion truth value data, the data volume of the motion truth value data can be further increased, so that the application of the motion truth value data in different scenes has a better effect.
It can be seen from the foregoing disclosure that the motion truth data can be used in different scenarios. In a possible implementation manner, the motion truth data can be used to judge the precision of a positioning result; how the precision is judged is not limited in the embodiments of the present disclosure. In a possible implementation manner, the motion truth data can be used as data in a benchmark data set for accuracy evaluation, so as to judge the precision of positioning results.
In a possible implementation manner, the motion truth data may also be used for training the neural network, and how to apply the motion truth data in the training process is not limited in the embodiment of the present disclosure. In one possible implementation, the motion truth data may be used as training data and/or test data in the neural network, and thus may be applied in the training process of the neural network.
In a possible implementation manner, the motion truth value data may further be information-fused with the global map, as described in the foregoing disclosure embodiments, in a possible implementation manner, the motion truth value data may further include collected data such as WiFi data, bluetooth data, geomagnetic data, or UWB data, and the collected data and the first pose have a corresponding relationship, so in a possible implementation manner, the collected data may be used as additional auxiliary data, and the collected data may be further fused into the global map according to the corresponding relationship between the first pose and the global map, so as to further improve the data accuracy and the data comprehensiveness of the global map, and also further improve the accuracy of determining the remaining poses by using the fused global map.
Application scenario example
With the rapid development of deep learning technology, the requirements on the quality, quantity and scene diversity of motion truth data are becoming increasingly demanding, and in the related art expensive equipment or additional on-site arrangement is often required to collect motion truth data. How to obtain a large amount of motion truth data at low cost has therefore become an urgent problem to be solved.
Fig. 4 is a schematic diagram illustrating an application example according to the present disclosure, and as shown in the drawing, an embodiment of the present disclosure provides a method for acquiring motion truth data, where a specific process may include:
as shown in the figure, in a possible implementation manner, the process of acquiring the motion truth data proposed in the application example of the present disclosure may be divided into two parts, i.e., global map reconstruction and motion truth data localization.
Wherein, the global map reconstruction may include the following processes:
and scanning the global scene through the second terminal to obtain the map data in the global scene. The structure of the second terminal is shown in the figure, and the second terminal can be composed of a radar sensor, a vision sensor and an IMU sensor and can be carried by a backpack. As shown, in one example, the second terminal may be moved in the global scene by an operator in a back of the second terminal to capture the laser point cloud in the global scene with the radar, capture the second captured image in the global scene with the visual sensor, and capture the second IMU data in the global scene with the IMU sensor.
And in the process of scanning the global scene by the second terminal, reconstructing the global map in real time by using the acquired laser point cloud, the second acquired image and the second IMU data to obtain the real-time map. The process of establishing the real-time map may refer to the above disclosed embodiments, and is not described herein again. In one possible implementation, the real-time map may reflect the range of the operator that has performed the map data collection in the global scene, and thus may be sent to a target device, which may be a handheld device in the hand of the operator shown in the figure in one example.
After the second terminal has scanned the global scene, the global map can be reconstructed offline using the laser point cloud, the second captured image and the second IMU data acquired in the global scene. As shown in the figure, in one example, the laser point cloud and the second IMU data may be processed by a radar SLAM system to determine at least one pose of the radar during map data acquisition, and the pose of the radar may be transformed into the pose of the vision sensor through the coordinate transformation relationship between the radar and the vision sensor, so as to obtain at least one second pose of the second terminal. Meanwhile, the second captured image can be subjected to visual map reconstruction by means of feature matching to obtain at least one frame of initial visual point cloud. Further, the determined at least one second pose may be used as an initial pose, and the features in the second captured image may provide third constraint information for the visual map reconstruction process, so that visual-radar joint optimization is performed on the obtained initial visual point cloud; the optimization process may refer to formula (4) in the above-disclosed embodiments. Through this process, the optimized visual point cloud, together with the positions and feature information of the three-dimensional feature points included in the visual point cloud, can be obtained. The visual point cloud and the three-dimensional feature points can then be used as the global map, thereby realizing the reconstruction of the global map.
After the global map reconstruction is completed, a process of locating the motion truth value data may be entered, and in one possible implementation, the process of locating the motion truth value data is as follows:
as shown in the figure, in a possible implementation manner, the collected data may be obtained by moving the first terminal, such as a mobile phone or AR glasses, within a certain target scene in the global scene. Wherein the acquired data may include the first acquired image and the first IMU data.
On one hand, the first collected image can be subjected to feature matching with the global map, so that visual positioning is realized, and a global feature matching result (namely global feature tracking in the map) is obtained. For the feature matching process between the first captured image and the global map, reference may be made to the above-mentioned embodiments, which are not described herein again.
On the other hand, feature matching can be performed between different frame images in the first acquired image, so as to obtain a local feature matching result (i.e. local feature tracking in the image). For the feature matching process between different frame images in the first captured image, reference may be made to the above-mentioned embodiments, which are not described herein again.
After the global feature matching result and the local feature matching result are obtained, visual-inertial joint optimization may be performed according to the global feature matching result, the local feature matching result, and the collected first IMU data through the formula (9) mentioned in the above disclosed embodiments, so as to determine at least one first pose of the first terminal in the moving process of the target scene, and the joint optimization process may be detailed in the above disclosed embodiments, and is not described herein again.
Further, after at least one first pose is obtained, the obtained first pose can be used as motion truth data and stored in a benchmark database for evaluating the performance of neural network algorithms.
With the method for acquiring motion truth data proposed in the application example of the present disclosure, the equipment cost mainly lies in the high-precision map acquisition device integrating a laser radar, cameras and an IMU, so the overall cost is low. The global scene and the target scene do not need to be arranged in advance, so the scalability is clearly better than related schemes that require the scene to be arranged in advance; the upper limit of the scale is mainly determined by offline computing power, and existing algorithms and computing power can already handle hundreds of thousands of scenes, so the method can be used for large-scale scenes. Meanwhile, the global map of the same global scene can be reused: once the global map has been acquired and reconstructed, massive mobile-terminal data can be collected at scale. The collection of the mobile data relies only on the built-in sensors of the mobile device, so no additional calibration, synchronization or other operations with external devices, which would limit large-scale collection, are needed before each collection. In addition, the method is not limited by the application scene and can be applied to both indoor and outdoor scenes.
It should be noted that the motion true value obtained in the embodiment of the present disclosure is not limited to be used in evaluation or training of a neural network, and may also be extended to be applied to other scenarios, which is not limited in the present disclosure.
It is understood that the above-mentioned embodiments of the method of the present disclosure can be combined with each other to form a combined embodiment without departing from the principle logic, which is limited by the space, and the detailed description of the present disclosure is omitted. Those skilled in the art will appreciate that in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possibly their inherent logic.
In addition, the present disclosure also provides a pose determination apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any of the pose determination methods provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here for brevity.
Fig. 5 shows a block diagram of a pose determination apparatus according to an embodiment of the present disclosure. The pose determination device can be a terminal device, a server or other processing equipment and the like. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the image processing apparatus may be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 5, the pose determination apparatus 20 may include:
and the collected data acquiring module 21 is configured to acquire collected data collected by a first terminal in a target scene.
And the global map obtaining module 22 is configured to obtain a global map including a target scene, where the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map meets the accuracy condition.
And the pose determining module 23 is configured to determine at least one first pose of the first terminal in the acquisition process according to the acquired data and the feature corresponding relationship between the global maps.
In one possible implementation, the global map includes at least one frame of visual point cloud, the visual point cloud including at least one three-dimensional feature point in the global scene; the acquired data comprises a first acquired image; the pose determination module is to: performing feature matching on the first collected image and at least one frame of visual point cloud to obtain a global feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result.
In one possible implementation, the global map includes at least one frame of visual point cloud in the target scene; the acquired data comprises at least two frames of first acquired images; the pose determination module is to: performing feature matching on the first collected image and at least one frame of visual point cloud to obtain a global feature matching result; performing feature matching according to at least two frames of first collected images to obtain a local feature matching result; and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
In one possible implementation, the collected data further includes first inertial measurement unit (IMU) data; the pose determination module is further configured to: acquire first constraint information according to the global feature matching result and/or the local feature matching result; acquire second constraint information according to the first IMU data; and process the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information, to obtain at least one first pose of the first terminal in the acquisition process.
In one possible implementation, the pose determination module is further configured to process the global feature matching result and the local feature matching result through bundle adjustment.
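For illustration only, the sketch below performs a pose-only refinement in the spirit of bundle adjustment, combining reprojection residuals (standing in for the visual constraint) with a crude IMU-derived relative-displacement residual (standing in for IMU pre-integration). It relies on NumPy and SciPy, keeps the 3D points fixed, and all names, weights, and the simplified IMU term are assumptions rather than the implementation of the disclosure.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def camera_center(pose):
        """pose = (rotvec(3), t(3)) of the world-to-camera transform x_cam = R x + t."""
        rotvec, t = pose[:3], pose[3:]
        return -Rotation.from_rotvec(rotvec).apply(t, inverse=True)

    def refine_window(poses0, observations, imu_deltas, K, w_imu=10.0):
        """Refine a short window of first poses. observations[i] = (points3d (M,3),
        pixels (M,2)) gives the reprojection term for frame i; imu_deltas[i] (3,) is
        an assumed world-frame displacement between frames i and i+1 derived from IMU."""
        n = len(poses0)
        x0 = np.asarray(poses0, dtype=float).reshape(-1)

        def residuals(x):
            poses = x.reshape(n, 6)
            terms = []
            for i, (pts3d, pixels) in enumerate(observations):
                R = Rotation.from_rotvec(poses[i, :3])
                cam = R.apply(pts3d) + poses[i, 3:]
                proj = (K @ cam.T).T
                proj = proj[:, :2] / proj[:, 2:3]
                terms.append((proj - pixels).ravel())        # visual constraint
            for i, delta in enumerate(imu_deltas):            # IMU constraint
                motion = camera_center(poses[i + 1]) - camera_center(poses[i])
                terms.append(w_imu * (motion - delta))
            return np.concatenate(terms)

        result = least_squares(residuals, x0)
        return result.x.reshape(n, 6)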
In one possible implementation, the pose determination module is further configured to match two-dimensional feature points in the first collected image with the three-dimensional feature points included in the at least one frame of visual point cloud, to obtain the global feature matching result.
In one possible implementation, the apparatus further includes a motion truth value data acquisition module, configured to determine motion truth value data according to the at least one first pose of the first terminal in the acquisition process.
In one possible implementation, the motion truth value data acquisition module is configured to: take at least one first pose of the first terminal in the acquisition process as the motion truth value data; and/or take at least one item of the collected data and at least one first pose of the first terminal in the acquisition process as the motion truth value data, where the collected data includes one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, the first collected image, and the first IMU data.
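The following dataclass is a purely illustrative sketch of how one motion truth value record might bundle a first pose with the accompanying collected data; the field names, types, and layout are assumptions and are not part of the disclosure.

    from dataclasses import dataclass, field
    from typing import List, Optional
    import numpy as np

    @dataclass
    class MotionTruthRecord:
        """One motion truth value sample: a first pose of the first terminal plus
        the raw collected data captured at (or near) the same timestamp."""
        timestamp: float
        pose: np.ndarray                      # 4x4 homogeneous transform in the map frame
        image: Optional[np.ndarray] = None    # first collected image
        imu: Optional[np.ndarray] = None      # first IMU data (e.g. accel + gyro samples)
        wifi: List[dict] = field(default_factory=list)       # WiFi scans (BSSID, RSSI)
        bluetooth: List[dict] = field(default_factory=list)  # Bluetooth beacon scans
        geomagnetic: Optional[np.ndarray] = None              # magnetometer reading
        uwb: List[dict] = field(default_factory=list)         # UWB ranging measurements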
In one possible implementation, the motion truth value data is used for at least one of the following operations: evaluating the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
In one possible implementation, the map data includes a laser point cloud, a second collected image, and second IMU data in the global scene; the apparatus further includes: a map data acquisition module, configured to acquire the map data of the global scene collected by the second terminal; and a global map generation module, configured to perform off-line reconstruction of the global scene according to the map data to generate the global map of the global scene.
In one possible implementation, the global map generation module is configured to: determine at least one second pose of the second terminal in the data collection process according to the second IMU data and the laser point cloud; perform visual map reconstruction of the global scene according to the at least one second pose in combination with the second collected image, to obtain at least one frame of visual point cloud, the visual point cloud corresponding to a plurality of three-dimensional feature points in the global scene; and obtain the global map of the global scene according to the at least one frame of visual point cloud.
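As a simplified illustration of building one frame of visual point cloud from known second poses, the sketch below triangulates matched pixel observations from two second collected images with OpenCV; it assumes world-to-camera [R|t] matrices and pre-matched pixel coordinates, which are simplifications of the disclosed pipeline rather than its actual steps.

    import cv2
    import numpy as np

    def triangulate_map_points(pose_a, pose_b, pts_a, pts_b, K):
        """pose_a, pose_b: 3x4 world-to-camera [R|t] matrices (e.g. derived from
        lidar + IMU); pts_a, pts_b: (N, 2) float32 matched pixel coordinates in the
        corresponding second collected images. Returns N x 3 world-frame points."""
        P_a = K @ pose_a
        P_b = K @ pose_b
        pts_h = cv2.triangulatePoints(P_a, P_b, pts_a.T, pts_b.T)  # 4xN homogeneous
        return (pts_h[:3] / pts_h[3]).T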
In one possible implementation, the global map generation module is further configured to: perform visual map reconstruction of the global scene according to the at least one second pose in combination with the second collected image, to obtain at least one frame of initial visual point cloud; acquire third constraint information in the visual map reconstruction process according to the laser point cloud and/or the second collected image; and optimize the at least one frame of initial visual point cloud according to the third constraint information, to obtain the at least one frame of visual point cloud.
In one possible implementation, the second terminal includes: a radar, configured to acquire the laser point cloud in the global scene; a vision sensor, configured to acquire the second collected image in the global scene; and an IMU sensor, configured to acquire the second IMU data in the global scene.
In one possible implementation, the apparatus is further configured to: calibrate the coordinate transformation relation between the vision sensor and the IMU sensor to obtain a first calibration result; calibrate the coordinate transformation relation between the radar and the vision sensor to obtain a second calibration result; and jointly calibrate the coordinate transformation relations among the vision sensor, the IMU sensor, and the radar according to the first calibration result and the second calibration result.
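The joint calibration step chains pairwise extrinsics. The sketch below only illustrates how 4x4 homogeneous transforms compose and how a consistency residual could be formed from the first and second calibration results; it is not the calibration algorithm itself, and the identity placeholders are obviously hypothetical values.

    import numpy as np

    def compose(T_ab, T_bc):
        """Chain two 4x4 homogeneous extrinsics: (a <- b) composed with (b <- c) gives (a <- c)."""
        return T_ab @ T_bc

    # First calibration result: camera <- IMU; second calibration result: lidar <- camera.
    T_cam_imu = np.eye(4)     # placeholder; real extrinsics come from the calibration step
    T_lidar_cam = np.eye(4)   # placeholder

    # The lidar <- IMU transform implied by the two pairwise results.
    T_lidar_imu = compose(T_lidar_cam, T_cam_imu)

    def consistency_error(T_lidar_imu_direct, T_lidar_cam, T_cam_imu):
        """Residual transform that a joint calibration would drive towards identity."""
        return np.linalg.inv(T_lidar_imu_direct) @ (T_lidar_cam @ T_cam_imu)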
In one possible implementation, the apparatus is further configured to: during the collection of the map data by the second terminal, reconstruct the global scene in real time according to the map data to generate a real-time map of the global scene; and send the map data and/or the real-time map to a target device, where the target device is configured to display the geographic range over which data collection of the global scene has been completed.
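One simple way a target device could display the geographic range already covered is a 2D coverage grid updated from the second terminal's trajectory. The sketch below is an assumed minimal scheme (grid extent, cell size, and coverage radius are arbitrary) and is not the mechanism described by the disclosure.

    import numpy as np

    def update_coverage(grid, origin, cell_size, positions, radius=5.0):
        """Mark grid cells within `radius` metres of each collected position so a
        target device can display which part of the global scene has been covered."""
        ys, xs = np.indices(grid.shape)
        centers_x = origin[0] + (xs + 0.5) * cell_size
        centers_y = origin[1] + (ys + 0.5) * cell_size
        for px, py in positions:
            mask = (centers_x - px) ** 2 + (centers_y - py) ** 2 <= radius ** 2
            grid[mask] = 1
        return grid

    coverage = np.zeros((200, 200), dtype=np.uint8)   # 200 x 200 cells of 1 m each
    coverage = update_coverage(coverage, origin=(0.0, 0.0), cell_size=1.0,
                               positions=[(10.0, 12.0), (14.0, 12.5)])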
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, which includes computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the pose determination method provided in any one of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the pose determination method provided in any one of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (16)

1. A pose determination method, characterized in that the method comprises:
acquiring collected data collected by a first terminal in a target scene;
acquiring a global map containing the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data acquisition on the global scene containing the target scene, and the global map meets a precision condition;
determining at least one first pose of the first terminal in an acquisition process according to a feature correspondence between the collected data and the global map;
the method further comprises the following steps:
determining motion truth value data according to at least one first pose of the first terminal in the acquisition process;
the determining motion truth value data according to at least one first pose of the first terminal in an acquisition process includes:
taking at least one first pose of the first terminal in the acquisition process as the motion truth value data; and/or,
taking at least one of the collected data and at least one first pose of the first terminal in the acquisition process as the motion truth value data, wherein the collected data comprises:
one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, a first collected image, and first IMU data.
2. The method of claim 1, wherein the global map comprises at least one frame of a visual point cloud comprising at least one three-dimensional feature point in the global scene, and the collected data comprises a first collected image;
the determining at least one first pose of the first terminal in the acquisition process according to the feature correspondence between the collected data and the global map comprises:
performing feature matching on the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result;
and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result.
3. The method of claim 1 or 2, wherein the global map comprises at least one frame of a visual point cloud in the target scene, and the collected data comprises at least two frames of first collected images;
the determining at least one first pose of the first terminal in the acquisition process according to the feature correspondence between the collected data and the global map comprises:
performing feature matching on the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result;
performing feature matching according to the at least two frames of first collected images to obtain a local feature matching result;
and determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
4. The method of claim 3, wherein the collected data further comprises first inertial measurement unit (IMU) data;
determining at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result, including:
acquiring first constraint information according to the global feature matching result and/or the local feature matching result;
acquiring second constraint information according to the first IMU data;
processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information to obtain at least one first pose of the first terminal in the acquisition process;
wherein the first constraint information is a visual constraint and the second constraint information is an IMU constraint.
5. The method of claim 4, wherein the processing the global feature matching result and the local feature matching result comprises:
and processing the global feature matching result and the local feature matching result through bundle adjustment.
6. The method of claim 2, wherein the performing feature matching on the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result comprises:
and matching two-dimensional feature points in the first collected image with the three-dimensional feature points included in the at least one frame of visual point cloud to obtain the global feature matching result.
7. The method of claim 1, wherein the motion truth data is used for at least one of:
evaluating the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
8. The method according to claim 1 or 2, wherein the map data comprises: a laser point cloud, a second collected image, and second IMU data in the global scene;
the method further comprises the following steps:
acquiring map data of the global scene acquired by a second terminal;
and according to the map data, performing off-line reconstruction on the global scene to generate a global map of the global scene.
9. The method of claim 8, wherein the reconstructing the global scene offline from the map data to generate the global map of the global scene comprises:
determining at least one second pose of the second terminal in the data acquisition process according to the second IMU data and the laser point cloud;
performing visual map reconstruction on the global scene by combining the second collected image according to the at least one second pose to obtain at least one frame of visual point cloud, wherein the visual point cloud comprises at least one three-dimensional feature point in the global scene;
and obtaining a global map of the global scene according to the at least one frame of visual point cloud.
10. The method of claim 9, wherein the performing visual map reconstruction on the global scene by combining the second collected image according to the at least one second pose to obtain at least one frame of visual point cloud comprises:
according to the at least one second pose and the second collected image, performing visual map reconstruction on the global scene to obtain at least one frame of initial visual point cloud;
acquiring third constraint information in the process of reconstructing the visual map according to the laser point cloud and/or the second collected image;
optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud;
wherein the third constraint information includes at least one of: plane constraints of the laser point cloud, edge constraints of the laser point cloud, and visual constraints.
11. The method of claim 8, wherein the second terminal comprises:
the radar is used for acquiring laser point clouds in the global scene;
a vision sensor for acquiring a second collected image in the global scene;
and the IMU sensor is used for acquiring second IMU data in the global scene.
12. The method of claim 11, wherein before the performing off-line reconstruction on the global scene according to the map data to generate the global map of the global scene, the method further comprises:
calibrating a coordinate transformation relation between the vision sensor and the IMU sensor to obtain a first calibration result;
calibrating the coordinate transformation relation between the radar and the vision sensor to obtain a second calibration result;
and carrying out combined calibration on the coordinate transformation relation among the vision sensor, the IMU sensor and the radar according to the first calibration result and the second calibration result.
13. The method of claim 8, further comprising:
in the process of collecting map data by a second terminal, reconstructing the global scene in real time according to the map data to generate a real-time map of the global scene;
and sending the map data and/or the real-time map to a target device, wherein the target device is used for displaying the geographic range over which data acquisition of the global scene has been completed.
14. A pose determination apparatus, characterized by comprising:
a collected data acquisition module, configured to acquire collected data collected by a first terminal in a target scene;
a global map acquisition module, configured to acquire a global map containing the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data acquisition on a global scene containing the target scene, and the global map meets a precision condition;
a pose determination module, configured to determine at least one first pose of the first terminal in an acquisition process according to a feature correspondence between the collected data and the global map;
the device further comprises:
the motion truth value data acquisition module is used for determining motion truth value data according to at least one first pose of the first terminal in the acquisition process;
the motion truth value data acquisition module is configured to: take at least one first pose of the first terminal in the acquisition process as the motion truth value data; and/or take at least one of the collected data and at least one first pose of the first terminal in the acquisition process as the motion truth value data, wherein the collected data comprises: one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, a first collected image, and first IMU data.
15. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 13.
16. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 13.
CN202210363072.7A 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium Pending CN114814872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210363072.7A CN114814872A (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210363072.7A CN114814872A (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium
CN202010826704.XA CN111983635B (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010826704.XA Division CN111983635B (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114814872A true CN114814872A (en) 2022-07-29

Family

ID=73435659

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210363072.7A Pending CN114814872A (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium
CN202010826704.XA Active CN111983635B (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010826704.XA Active CN111983635B (en) 2020-08-17 2020-08-17 Pose determination method and device, electronic equipment and storage medium

Country Status (5)

Country Link
JP (1) JP7236565B2 (en)
KR (1) KR20220028042A (en)
CN (2) CN114814872A (en)
TW (1) TW202208879A (en)
WO (1) WO2022036980A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116518961A (en) * 2023-06-29 2023-08-01 煤炭科学研究总院有限公司 Method and device for determining global pose of large-scale fixed vision sensor

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114814872A (en) * 2020-08-17 2022-07-29 浙江商汤科技开发有限公司 Pose determination method and device, electronic equipment and storage medium
CN112433211B (en) * 2020-11-27 2022-11-29 浙江商汤科技开发有限公司 Pose determination method and device, electronic equipment and storage medium
WO2022133986A1 (en) * 2020-12-25 2022-06-30 SZ DJI Technology Co., Ltd. Accuracy estimation method and system
CN113108792A (en) * 2021-03-16 2021-07-13 中山大学 Wi-Fi fingerprint map reconstruction method and device, terminal equipment and medium
CN112948411B (en) * 2021-04-15 2022-10-18 深圳市慧鲤科技有限公司 Pose data processing method, interface, device, system, equipment and medium
CN114827727B (en) * 2022-04-25 2024-05-07 深圳创维-Rgb电子有限公司 Television control method, television control device, television and computer readable storage medium
CN115439536B (en) * 2022-08-18 2023-09-26 北京百度网讯科技有限公司 Visual map updating method and device and electronic equipment
WO2024085266A1 (en) * 2022-10-17 2024-04-25 삼성전자 주식회사 Method and device for detecting gesture using ultrawide band communication signal
CN115497087B (en) * 2022-11-18 2024-04-19 广州煌牌自动设备有限公司 Tableware gesture recognition system and method
CN116202511B (en) * 2023-05-06 2023-07-07 煤炭科学研究总院有限公司 Method and device for determining pose of mobile equipment under long roadway ultra-wideband one-dimensional constraint
CN117636251B (en) * 2023-11-30 2024-05-17 交通运输部公路科学研究所 Disaster damage detection method and system based on robot
CN118089743A (en) * 2024-04-24 2024-05-28 广州中科智云科技有限公司 Unmanned aerial vehicle intelligent navigation and camera system special for converter station

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3078935A1 (en) * 2015-04-10 2016-10-12 The European Atomic Energy Community (EURATOM), represented by the European Commission Method and device for real-time mapping and localization
CN106940186A (en) * 2017-02-16 2017-07-11 华中科技大学 A kind of robot autonomous localization and air navigation aid and system
WO2019157925A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Visual-inertial odometry implementation method and system
CN110246182A (en) * 2019-05-29 2019-09-17 深圳前海达闼云端智能科技有限公司 Vision-based global map positioning method and device, storage medium and equipment
CN110849374A (en) * 2019-12-03 2020-02-28 中南大学 Underground environment positioning method, device, equipment and storage medium
CN111442722A (en) * 2020-03-26 2020-07-24 达闼科技成都有限公司 Positioning method, positioning device, storage medium and electronic equipment
CN111983635A (en) * 2020-08-17 2020-11-24 浙江商汤科技开发有限公司 Pose determination method and device, electronic equipment and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101192825B1 (en) * 2011-06-30 2012-10-18 서울시립대학교 산학협력단 Apparatus and method for lidar georeferencing based on integration of gps, ins and image at
JP6354120B2 (en) * 2013-05-21 2018-07-11 株式会社デンソー Road information transmission device, map generation device, road information collection system
JP6768416B2 (en) * 2015-09-08 2020-10-14 キヤノン株式会社 Image processing device, image compositing device, image processing system, image processing method, and program
JP2017097402A (en) * 2015-11-18 2017-06-01 株式会社明電舎 Surrounding map preparation method, self-location estimation method and self-location estimation device
CN108475433B (en) * 2015-11-20 2021-12-14 奇跃公司 Method and system for large scale determination of RGBD camera poses
US10670416B2 (en) * 2016-12-30 2020-06-02 DeepMap Inc. Traffic sign feature creation for high definition maps used for navigating autonomous vehicles
US10600206B2 (en) * 2017-04-28 2020-03-24 Htc Corporation Tracking system and method thereof
JP2019074532A (en) * 2017-10-17 2019-05-16 有限会社ネットライズ Method for giving real dimensions to slam data and position measurement using the same
US10636198B2 (en) * 2017-12-28 2020-04-28 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monocular simultaneous localization and mapping
CN108765487B (en) * 2018-06-04 2022-07-22 百度在线网络技术(北京)有限公司 Method, device, equipment and computer readable storage medium for reconstructing three-dimensional scene
CN109084732B (en) * 2018-06-29 2021-01-12 北京旷视科技有限公司 Positioning and navigation method, device and processing equipment
CN109737983B (en) * 2019-01-25 2022-02-22 北京百度网讯科技有限公司 Method and device for generating a travel route
CN110118554B (en) * 2019-05-16 2021-07-16 达闼机器人有限公司 SLAM method, apparatus, storage medium and device based on visual inertia
CN110335316B (en) * 2019-06-28 2023-04-18 Oppo广东移动通信有限公司 Depth information-based pose determination method, device, medium and electronic equipment
CN110389348B (en) * 2019-07-30 2020-06-23 四川大学 Positioning and navigation method and device based on laser radar and binocular camera

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3078935A1 (en) * 2015-04-10 2016-10-12 The European Atomic Energy Community (EURATOM), represented by the European Commission Method and device for real-time mapping and localization
CN106940186A (en) * 2017-02-16 2017-07-11 华中科技大学 A kind of robot autonomous localization and air navigation aid and system
WO2019157925A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Visual-inertial odometry implementation method and system
CN110246182A (en) * 2019-05-29 2019-09-17 深圳前海达闼云端智能科技有限公司 Vision-based global map positioning method and device, storage medium and equipment
CN110849374A (en) * 2019-12-03 2020-02-28 中南大学 Underground environment positioning method, device, equipment and storage medium
CN111442722A (en) * 2020-03-26 2020-07-24 达闼科技成都有限公司 Positioning method, positioning device, storage medium and electronic equipment
CN111983635A (en) * 2020-08-17 2020-11-24 浙江商汤科技开发有限公司 Pose determination method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Haibiao; Shi Jun; Tian Chunyue: "A laser-map-assisted visual localization method in outdoor environments based on deep features", Science Technology and Engineering, no. 13, 8 May 2020 (2020-05-08) *
Hu Lingyan; Cao Lu; Xiong Pengwen; Xin Yong; Xie Zekun: "Research on 3D simultaneous localization and mapping based on RGB-D images", Journal of System Simulation, no. 11, 8 November 2017 (2017-11-08) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116518961A (en) * 2023-06-29 2023-08-01 煤炭科学研究总院有限公司 Method and device for determining global pose of large-scale fixed vision sensor
CN116518961B (en) * 2023-06-29 2023-09-01 煤炭科学研究总院有限公司 Method and device for determining global pose of large-scale fixed vision sensor

Also Published As

Publication number Publication date
TW202208879A (en) 2022-03-01
WO2022036980A1 (en) 2022-02-24
JP2022548441A (en) 2022-11-21
CN111983635A (en) 2020-11-24
KR20220028042A (en) 2022-03-08
JP7236565B2 (en) 2023-03-09
CN111983635B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN111983635B (en) Pose determination method and device, electronic equipment and storage medium
CN111815675B (en) Target object tracking method and device, electronic equipment and storage medium
TWI753348B (en) Pose determination method, pose determination device, electronic device and computer readable storage medium
CN110503689B (en) Pose prediction method, model training method and model training device
CN111340766A (en) Target object detection method, device, equipment and storage medium
CN113066086B (en) Road disease detection method and device, electronic equipment and storage medium
CN113205549B (en) Depth estimation method and device, electronic equipment and storage medium
CN111881827B (en) Target detection method and device, electronic equipment and storage medium
WO2023103377A1 (en) Calibration method and apparatus, electronic device, storage medium, and computer program product
US20220084249A1 (en) Method for information processing, electronic equipment, and storage medium
WO2022110776A1 (en) Positioning method and apparatus, electronic device, storage medium, computer program product, and computer program
CN113066135A (en) Calibration method and device of image acquisition equipment, electronic equipment and storage medium
KR20220123218A (en) Target positioning method, apparatus, electronic device, storage medium and program
KR20210125888A (en) Method and apparatus for synthesizing omni-directional parallax view, and storage medium
CN112149659B (en) Positioning method and device, electronic equipment and storage medium
CN114140536A (en) Pose data processing method and device, electronic equipment and storage medium
CN112432636B (en) Positioning method and device, electronic equipment and storage medium
CN113345000A (en) Depth detection method and device, electronic equipment and storage medium
CN112837372A (en) Data generation method and device, electronic equipment and storage medium
WO2022237071A1 (en) Locating method and apparatus, and electronic device, storage medium and computer program
WO2022110801A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN112330721B (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
WO2022110777A1 (en) Positioning method and apparatus, electronic device, storage medium, computer program product, and computer program
CN114550086A (en) Crowd positioning method and device, electronic equipment and storage medium
CN114445778A (en) Counting method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination