WO2022036980A1 - Pose determination method and apparatus, electronic device, storage medium and program - Google Patents

Pose determination method and apparatus, electronic device, storage medium and program

Info

Publication number
WO2022036980A1
Authority
WO
WIPO (PCT)
Prior art keywords
global
data
map
point cloud
scene
Prior art date
Application number
PCT/CN2020/140274
Other languages
English (en)
French (fr)
Inventor
刘浩敏
杭蒙
张壮
章国锋
Original Assignee
浙江商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江商汤科技开发有限公司
Priority to JP2021568700A (JP7236565B2)
Priority to KR1020227003200A (KR20220028042A)
Publication of WO2022036980A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras

Definitions

  • the present disclosure relates to the technical field of computer vision, and relates to, but is not limited to, a pose determination method, apparatus, electronic device, storage medium and computer program.
  • the embodiments of the present disclosure provide a pose determination method, apparatus, electronic device, storage medium, and computer program.
  • An embodiment of the present disclosure provides a method for determining a pose, the method includes:
  • At least one first pose of the first terminal during the collection process is determined.
  • An embodiment of the present disclosure also provides a device for determining a pose, the device comprising:
  • a collected-data acquisition module, configured to acquire the data collected by the first terminal in the target scene;
  • a global map obtaining module, configured to obtain a global map including the target scene, where the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene, and the global map satisfies an accuracy condition;
  • a pose determination module, configured to determine at least one first pose of the first terminal during the collection process according to the feature correspondence between the collected data and the global map.
  • the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene;
  • the collected data includes a first collected image;
  • An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the pose determination method described in any one of the foregoing embodiments.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the pose determination method described in any one of the foregoing embodiments is implemented.
  • Embodiments of the present disclosure further provide a computer program, the computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor of the electronic device executes code for implementing the pose determination method described in any one of the foregoing embodiments.
  • In the embodiments of the present disclosure, the data collected by the first terminal in the target scene and a global map including the target scene are obtained, and at least one first pose of the first terminal during the collection process is determined according to the feature correspondence between the collected data and the global map.
  • In this way, the global map of the global scene can be reused: once the global map has been generated, a large amount of first pose data can be collected on a large scale through first terminals, and the way in which the collected data used to generate the first pose is acquired is also relatively simple, since the collection can be completed with the first terminal alone. This avoids installing additional devices in the target scene or performing additional calibration and synchronization between multiple devices, thereby reducing the acquisition cost of the first pose.
  • Moreover, because the global map satisfies the accuracy condition, the first pose data obtained based on the feature correspondence between the collected data and the global map also has high accuracy.
  • FIG. 1 is a flowchart of a pose determination method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of comparison before and after visual point cloud optimization provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a second terminal according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of motion truth data acquisition provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of a first electronic device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a second electronic device according to an embodiment of the present disclosure.
  • Mobile positioning is a key technology in applications such as augmented reality, autonomous driving, and mobile robots.
  • Augmented reality relies on real-time localization results to seamlessly integrate virtual objects with the real environment, while autonomous driving and mobile robots rely on real-time localization to realize path planning.
  • Early mobile positioning mainly relied on dedicated hardware equipment such as laser equipment, differential Global Positioning System (GPS) equipment, and high-precision inertial navigation equipment.
  • In terms of augmented reality, with the launch of Simultaneous Localization and Mapping (SLAM)-based augmented reality platforms on smart terminals, smart terminals have entered the Augmented Reality (AR) era. Providing centimeter-level positioning, such as pose determination, in earth-scale scenes by reconstructing high-precision maps of large-scale scenes has become a trend. However, the related art provides no solution for realizing high-precision pose determination with low-cost equipment.
  • FIG. 1 is a flowchart of a pose determination method provided by an embodiment of the present disclosure, and the method can be applied to a pose determination apparatus.
  • the pose determination apparatus may be a terminal device, a server, or other processing devices, or the like.
  • The terminal device can be a User Equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the pose determination method provided by the embodiments of the present disclosure may be implemented by a processor calling computer-readable instructions stored in a memory.
  • the pose determination method may include steps S11 to S13:
  • Step S11: acquiring the data collected by the first terminal in the target scene.
  • Step S12: acquiring a global map including the target scene.
  • the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map satisfies the accuracy condition.
  • Step S13: determining at least one first pose of the first terminal during the collection process according to the feature correspondence between the collected data and the global map.
  • the target scene may be any scene in which the first terminal acquires the collected data, and its implementation form may be flexibly determined according to actual needs, which is not limited in the embodiments of the present disclosure.
  • the target scene may include an outdoor scene, such as a square, a street, or an open space.
  • the target scene may include an indoor scene, such as a classroom, an office building, or a residential building.
  • the target scene may include both an outdoor scene and an indoor scene.
  • the first terminal may be a mobile terminal with a data collection function, and any device with movement and data collection functions may be used as the first terminal.
  • the first terminal may be an AR device, such as a mobile phone or AR glasses.
  • The collected data may be the data collected by the first terminal in the target scene. The implementation form of the collected data and the data content it contains may be flexibly determined according to the data collection method of the first terminal and the actual implementation form of the first terminal's data collection, which is not limited in the embodiments of the present disclosure.
  • In the case where the first terminal is an AR device, the collected data may include a first captured image obtained by the AR device capturing images of the target scene; the collected data may further include first Inertial Measurement Unit (IMU) data obtained by the IMU in the AR device collecting data in the target scene, and the like.
  • the first terminal may move in the target scene to realize the collection of collected data, wherein the specific moving process and manner of the first terminal may be flexibly selected according to actual conditions.
  • The collected data may be acquired by reading it from the first terminal, or by receiving the collected data transmitted by the first terminal. In some embodiments of the present disclosure, the pose determination method provided by the embodiments of the present disclosure may also be applied to the first terminal itself; in this case, the data collected by the first terminal in the target scene can be acquired directly.
  • For example, when the target scene is an outdoor scene such as an open space or a square, the global scene may be a suburban or urban scene that includes the target scene; the global scene may include outdoor scenes in the suburb or urban area, and may also include indoor scenes in the suburb or urban area.
  • The map data may include a second captured image obtained by performing image capture on the global scene; the map data may include second IMU data obtained by performing IMU data collection on the global scene; the map data may also include laser point cloud data obtained by radar scanning of the global scene, etc.
  • In the case where the second terminal includes a visual sensor for image acquisition, the map data may include the second captured image; in the case where the second terminal includes an IMU sensor for acquiring IMU data, the map data may include the second IMU data; in the case where the second terminal includes a radar for collecting laser point clouds, the map data may include laser point cloud data.
  • the hardware structure and connection method included in the second terminal can also refer to the subsequent disclosed embodiments for details, and will not be expanded here.
  • the implementation form of the global map may be jointly determined according to the actual situation of the global scene and the data content of the map data.
  • the global map may include relevant information of each three-dimensional feature point in the global scene.
  • the global map may include relevant information of each 3D feature point in the global scene, wherein the 3D feature point in the global scene may be displayed in the form of an image, and the information content included in the relevant information of the 3D feature point It can be flexibly determined according to the actual situation, such as including the coordinates of the 3D feature points and the feature information of the 3D feature points.
  • The feature information of the 3D feature points may include one or more of the feature descriptors corresponding to the 3D feature points, the communication signal fingerprints corresponding to the 3D feature points, semantic information, and other feature-related information.
  • The accuracy of the global map may be the position accuracy of each 3D feature point in the global map, for example, the position difference between the coordinates of the 3D feature points included in the global map and the actual positions of those feature points in the global scene. Therefore, the accuracy condition of the global map can be used to determine whether the position of each three-dimensional feature point in the global map meets the accuracy requirement, and the specific content of the accuracy condition can be flexibly set according to the actual situation.
  • Whether the global map satisfies the accuracy condition can be indirectly inferred by judging whether the ratio of the geographic range covered by the collected map data to the geographic range covered by the global scene reaches a preset threshold.
  • The global map may be generated in the pose determination apparatus by acquiring the map data collected by the second terminal and building the map from it; the global map may also be generated on another device or apparatus, in which case the global map may be obtained by reading it directly from the device that stores or generates the global map.
  • the second terminal may move in the global scene to collect corresponding map data.
  • The execution order of step S11 and step S12 is not limited in the embodiments of the present disclosure. Exemplarily, step S11 and step S12 may be performed in a certain order, or may be performed at the same time.
  • The collected data is obtained by collecting data in the target scene, so the collected data can reflect the features of the target scene. Since the global scene corresponding to the global map includes the target scene, the global map also contains the features of the target scene; therefore, the feature correspondence between the collected data and the global map may include correspondences between features of the collected data and features of the global map. Moreover, since the first terminal moves in the target scene, a large amount of data can be collected, each piece of which reflects features of the target scene; thus, in the embodiments of the present disclosure, the feature correspondence between the collected data and the global map may also include feature correspondences among the pieces of data contained in the collected data itself.
  • The first pose may be one or more poses corresponding to the moments at which the first terminal performs data collection while moving in the target scene; the quantity can be flexibly determined according to the actual situation.
  • the first pose may correspond to the collected data, that is, the first pose may be the pose corresponding to the moment when the first terminal collects each collected data.
  • In the embodiments of the present disclosure, by obtaining the data collected by the first terminal in the target scene and the global map including the target scene, at least one first pose of the first terminal during the collection process can be determined according to the feature correspondence between the collected data and the global map.
  • In this way, the global map of the global scene can be reused. Once the global map has been generated, a large number of first poses can be collected on a large scale through first terminals, and the way in which the collected data used to generate the first pose is acquired is also relatively simple.
  • The collection can be completed with the first terminal alone, which avoids installing additional devices in the target scene or performing additional calibration and synchronization between multiple devices, thereby reducing the cost of obtaining the first pose.
  • Moreover, because the global map satisfies the accuracy condition, the first pose obtained based on the feature correspondence between the collected data and the global map also has high accuracy.
  • the obtaining form of the map data can be flexibly determined according to the actual situation, and the method of generating the global map based on the map data can be flexibly determined according to the actual situation of the map data. Therefore, in some embodiments of the present disclosure, the map data may include: the laser point cloud in the global scene, the second acquired image, and the second IMU data.
  • offline reconstruction of the global scene is performed to generate a global map of the global scene.
  • The laser point cloud may be a point cloud composed of multiple laser points obtained by scanning the global scene with the radar of the second terminal. The number of laser points contained in the laser point cloud may be flexibly determined by the radar scanning conditions of the second terminal together with the movement trajectory of the second terminal in the global scene, and is not limited in the embodiments of the present disclosure.
  • The second captured images may be multiple images collected while the second terminal moves in the global scene. The number of second captured images is determined jointly by the movement of the second terminal in the global scene and the number of image-capturing hardware devices included in the second terminal, and is not limited in the embodiments of the present disclosure.
  • The second IMU data may be inertial measurement data collected while the second terminal moves in the global scene. The quantity of second IMU data is likewise determined jointly by the movement of the second terminal in the global scene and the number of hardware devices in the second terminal used for collecting IMU data, and is not limited in the embodiments of the present disclosure.
  • a global map of the global scene is generated.
  • In this way, the at least one first pose determined based on the global map and the collected data is relatively accurate. At the same time, since the map data consists of the laser point cloud, the second captured image and the second IMU data, these data are relatively easy to acquire and the acquisition process is rarely limited by space constraints. Therefore, the pose determination method proposed in the embodiments of the present disclosure makes it less difficult to acquire the map data and the global map, which reduces the dependence on the environment and/or equipment and allows the pose determination method to be applied in various scenarios.
  • offline reconstruction is performed on the global scene according to the map data to generate a global map of the global scene, including:
  • a global map of the global scene is obtained.
  • The acquired laser points can be projected onto the lidar frame at the corresponding moment, so that the projection result of the laser points can be used to estimate the second pose of the second terminal at different moments during the data collection process.
  • A visual map reconstruction of the global scene may then be performed according to the at least one second pose, combined with the second captured image, to obtain at least one frame of visual point cloud.
  • the visual point cloud may include at least one three-dimensional feature point in the global scene, and the number of the visual point cloud and the number of included three-dimensional feature points are not limited in the embodiments of the present disclosure.
  • the global map may include one or more frames of visual point clouds. As described in the above disclosed embodiments, the global map may include relevant information of each three-dimensional feature point in the global scene. In some embodiments of the present disclosure, the visual point cloud may be obtained through a visual image. In this case, the global map may further include at least one frame of visual image for observing the visual point cloud.
  • the three-dimensional feature points included in the visual point cloud can also be stored in the global map, so the visual point cloud can also correspond to the feature information of the three-dimensional feature points.
  • the feature descriptors of the three-dimensional feature points may be determined according to the features extracted from the second captured image, so the visual point cloud may correspond to the feature descriptors of the three-dimensional feature points.
  • the map data may also include signal data related to communication, such as WiFi signals, Bluetooth signals, or UWB signals, etc.
  • In this case, the visual point cloud can correspond to the communication signal fingerprints of the three-dimensional feature points. In some embodiments of the present disclosure, the second captured image may also contain semantic information, and a correspondence may be established between this semantic information and the three-dimensional feature points so that it serves as feature information of the three-dimensional feature points; in this case, the visual point cloud can establish a correspondence with the semantic information.
  • Feature extraction and matching may be performed on the second captured images through the Scale-Invariant Feature Transform (SIFT), thereby generating at least one frame of visual point cloud.
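  • As a concrete illustration of SIFT-based matching between two of the second captured images, a minimal sketch using the OpenCV API is given below; this is not the patent's implementation, and the ratio-test threshold is an assumed, illustrative value.

```python
import cv2

def sift_matches(img_a, img_b, ratio=0.75):
    """Match SIFT features between two captured images (OpenCV >= 4.4).

    img_a, img_b: grayscale images (numpy arrays). The ratio threshold is an
    illustrative choice, not a value taken from the patent.
    """
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Lowe's ratio test to discard ambiguous matches.
    good = [m for m, n in matcher.knnMatch(des_a, des_b, k=2)
            if m.distance < ratio * n.distance]
    return kp_a, kp_b, good
```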
  • All the obtained visual point clouds, together with the feature information of the three-dimensional feature points corresponding to these visual point clouds, can be taken as the global map. In some embodiments of the present disclosure, it is also possible to select one or more frames from the obtained visual point clouds and use the selected frames together with the feature information of their corresponding three-dimensional feature points as the global map.
  • In this way, the laser point cloud, the second IMU data and the second captured image can be used together, and information such as the position and features of each 3D feature point in the global scene can be represented by the visual point cloud. Reconstruction of the global map can thus be realized with data that is relatively easy to obtain, and the reconstruction result is more accurate, which improves the convenience and accuracy of the entire pose determination process.
  • a visual map is reconstructed on the global scene to obtain at least one frame of visual point cloud, including:
  • At least one frame of the initial visual point cloud is optimized to obtain at least one frame of the visual point cloud.
  • The accuracy of the second pose determined from the laser point cloud may be relatively low.
  • The visual point cloud obtained by directly reconstructing the visual map from the determined second pose and the second captured image may therefore contain relatively large noise. Hence, in the embodiments of the present disclosure, after the visual map of the global scene is reconstructed according to the second pose and the second captured image, the reconstruction result may be used as an initial visual point cloud, and third constraint information generated from the laser point cloud and/or the second captured image may be used to further optimize the initial visual point cloud, thereby reducing the noise in the initial visual point cloud and obtaining a visual point cloud with higher precision.
  • the process of reconstructing the visual map according to the second pose and the second captured image to obtain at least one frame of the initial visual point cloud may refer to the above disclosed embodiments, which will not be repeated here.
  • the third constraint information may be constraint information determined according to the laser point cloud and/or the second captured image.
  • obtaining the third constraint information in the visual map reconstruction process may include:
  • determining, according to the plane feature information of the laser point cloud, the plane constraint information of the laser point cloud in the visual map reconstruction process;
  • determining, according to the edge feature information of the laser point cloud, the edge constraint information of the laser point cloud in the visual map reconstruction process;
  • the third constraint information in the visual map reconstruction process is acquired.
  • the plane feature information of the laser point cloud can be flexibly determined according to the actual situation of the laser point cloud, and the specific form of the plane constraint information determined based on the plane feature information of the laser point cloud can be flexibly selected according to the actual situation, for example,
  • the plane constraint information can be calculated by formula (1):
  • n and m denote two different laser point cloud coordinate systems;
  • ᵐn is the plane-feature normal vector at the feature point ᵐq in coordinate system m, and ᵐnᵀ is its transpose;
  • ⁿp is a feature point in coordinate system n;
  • ᵐq is the feature point in coordinate system m that is matched with ⁿp after ⁿp is transformed into coordinate system m according to the coordinate transformation relationship between the two coordinate systems;
  • Σ_p is the covariance matrix of the plane features of the laser point cloud; the value of Σ_p can be flexibly set according to the actual situation, and for example can be set to 0.2 m².
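  • The original formula (1) is rendered as an image in the patent and is not reproduced in this text. Based on the parameter definitions above, a plausible point-to-plane form (an assumption, not the patent's exact equation) is:

```latex
% Plausible reconstruction of formula (1): a Mahalanobis-weighted point-to-plane
% residual; {}^{m}T_{n} denotes the (assumed) coordinate transformation from n to m.
e_p = \sum \left\| {}^{m}\mathbf{n}^{\mathsf{T}} \left( {}^{m}T_{n}\,{}^{n}\mathbf{p} - {}^{m}\mathbf{q} \right) \right\|^{2}_{\Sigma_p}
```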
  • the edge feature information of the laser point cloud can also be flexibly determined according to the actual situation of the laser point cloud.
  • the specific form of the edge constraint information determined based on the edge feature information of the laser point cloud can be flexibly selected according to the actual situation.
  • the edge constraint information can be calculated by formula (2):
  • In Equation (2), ᵐl is the edge-feature direction vector at the feature point ᵐq in coordinate system m, Σ_e is the covariance matrix of the edge features of the laser point cloud, and the remaining parameters have the same meanings as the corresponding parameters in Equation (1).
  • The value of Σ_e can be flexibly set according to the actual situation; for example, Σ_e can be set to 0.5 m².
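  • Formula (2) is likewise an image in the patent; by analogy with formula (1), a plausible point-to-line form (again an assumption) replaces the normal vector with the edge direction vector:

```latex
% Plausible reconstruction of formula (2): a point-to-line residual using
% the edge direction vector {}^{m}l at the matched feature point.
e_e = \sum \left\| {}^{m}\mathbf{l} \times \left( {}^{m}T_{n}\,{}^{n}\mathbf{p} - {}^{m}\mathbf{q} \right) \right\|^{2}_{\Sigma_e}
```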
  • both the plane constraint information and the edge constraint information can be used as the third constraint information, or one of the plane constraint information or the edge constraint information can be used as the third constraint information.
  • Which of them is used as the third constraint information can be flexibly determined according to the actual situation.
  • acquiring the third constraint information in the process of reconstructing the visual map according to the second captured image may include:
  • the visual constraint information in the visual map reconstruction process is obtained, where the two-dimensional feature points are the two-dimensional feature points corresponding to the three-dimensional feature points in the initial visual point cloud;
  • the third constraint information in the visual map reconstruction process is obtained.
  • the specific process of obtaining the visual constraint information in the visual map reconstruction process can be flexibly selected according to the actual situation.
  • the visual constraint information can be calculated by formula (3):
  • X_j is the j-th three-dimensional feature point corresponding to the visual point cloud;
  • x_ij is the two-dimensional feature point corresponding to the three-dimensional feature point X_j in the i-th frame of the initial visual point cloud;
  • f(ᵂT_i, X_j) is the projection result of projecting the three-dimensional feature point X_j onto the i-th frame of the initial visual point cloud;
  • Σ_v is the covariance matrix of the image feature constraint; the value of Σ_v can be flexibly set according to the actual situation, and exemplarily, Σ_v can be set to 2 pixels squared.
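  • Formula (3) is also an image in the patent; from the parameters above it is evidently a reprojection-error term, for which a standard form (assumed here, not quoted from the patent) is:

```latex
% Plausible reconstruction of formula (3): visual (reprojection) constraint
% over frames i and 3D feature points j, weighted by \Sigma_v.
e_v = \sum_{i}\sum_{j} \left\| f\!\left({}^{W}T_i, X_j\right) - x_{ij} \right\|^{2}_{\Sigma_v}
```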
  • The third constraint information may include one or more of the plane constraint information of the laser point cloud, the edge constraint information of the laser point cloud, and the visual constraint information. In some embodiments of the present disclosure, the third constraint information may simultaneously include the plane constraint information, the edge constraint information and the visual constraint information; in this case, the process of optimizing at least one frame of the initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud can be realized by formula (4):
  • L_p is the point cloud composed of the points in the laser point cloud that belong to planes;
  • L′_p is the set of L_p;
  • L_e is the point cloud composed of the points in the laser point cloud that belong to edges;
  • L′_e is the set of L_e.
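  • The patent's formula (4) is not reproduced here either; given the parameters above, a plausible form of the joint optimization (an assumption) sums the plane, edge and visual constraint terms over the corresponding point sets and minimizes them with respect to the visual point cloud and the second poses:

```latex
% Plausible reconstruction of formula (4): joint minimization of the third
% constraint information (plane + edge + visual terms) over the poses and
% 3D feature points of the visual map.
\min_{\{{}^{W}T_i\},\,\{X_j\}} \;\; \sum_{L_p \in L'_p} e_p \;+\; \sum_{L_e \in L'_e} e_e \;+\; e_v
```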
  • Optimizing at least one frame of the initial visual point cloud may include optimizing the three-dimensional feature points contained in the initial visual point cloud, and may also include optimizing the pose of the device in the second terminal that collects the second captured image. When the pose of that device is optimized, the second pose corresponding to the second terminal can be optimized accordingly, thereby reducing the noise introduced into the visual point cloud by the lower accuracy of the second pose.
  • the third constraint information of the visual map reconstruction process can be obtained again based on the optimization result of the visual point cloud, and based on the third constraint information, the visual point cloud can be further iteratively optimized.
  • the number of iterations can be flexibly selected according to the actual situation, which is not limited in this embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of comparison before and after visual point cloud optimization according to an embodiment of the present disclosure.
  • boxes 201 and 202 are the visual images corresponding to the visual point cloud before optimization
  • boxes 203 and 204 are the visual images corresponding to the visual point cloud after optimization.
  • As can be seen from FIG. 2, the optimized visual point cloud has higher accuracy.
  • The accuracy of the three-dimensional feature points corresponding to the optimized visual point cloud is also improved.
  • the second terminal may include:
  • a vision sensor for acquiring the second captured image in the global scene
  • the IMU sensor is used to obtain the second IMU data in the global scene.
  • the radar may be any radar with a laser point cloud collection function, for example, the radar may be a three-dimensional (Three Dimension, 3D) radar.
  • the visual sensor can be any sensor with image acquisition function, such as a camera.
  • For example, the second terminal may include a four-camera array capable of 360° image acquisition.
  • the implementation form of the IMU sensor can also be flexibly determined according to the actual situation.
  • the setting position and connection relationship between the radar, the visual sensor and the IMU sensor in the second terminal can be flexibly selected according to the actual situation.
  • the radar, the vision sensor, and the IMU sensor may be rigidly connected, and the specific connection sequence may be flexibly selected according to the actual situation.
  • the vision sensor and the IMU sensor may be fixedly connected and packaged as a fixed structural unit, and the radar may be disposed above the fixed structural unit.
  • the vision sensor, the IMU sensor and the radar may also be fixedly arranged in a backpack.
  • FIG. 3 is a schematic structural diagram of a second terminal according to an embodiment of the present disclosure.
  • the visual sensor and the IMU sensor can be fixedly connected and packaged as a fixed structural unit 301 .
  • The lower end of the fixed structural unit 301 can be set in the backpack 302 for easy portability, and the radar 303 can be set above the fixed structural unit 301.
  • the map data in the global scene can be comprehensively collected, thereby facilitating the subsequent generation of the global map.
  • Collecting map data with the simple, low-cost second terminal shown in FIG. 3 can reduce the equipment cost of acquiring map data, thereby reducing the hardware cost and difficulty of determining the first pose data.
  • Before the offline reconstruction of the global scene according to the map data to generate the global map of the global scene, the method may further include:
  • the coordinate transformation relationship between the vision sensor and the IMU sensor is calibrated to obtain the first calibration result
  • the coordinate transformation relationship between the radar and the vision sensor is calibrated to obtain the second calibration result
  • the coordinate transformation relationship among the vision sensor, the IMU sensor and the radar is jointly calibrated.
  • the method of calibrating the coordinate transformation relationship between the vision sensor and the IMU sensor can be selected flexibly according to the actual situation.
  • the calibration of the vision sensor and the IMU sensor can be realized by the Kalibr tool;
  • the way of calibrating the coordinate transformation relationship between vision sensors can also be flexibly selected according to the actual situation;
  • the calibration of radar and vision sensors can also be realized through the AutoWare framework.
  • The coordinate transformation relationships among the vision sensor, the IMU sensor and the radar may further be jointly calibrated and optimized according to the first calibration result and the second calibration result, so that the coordinate transformation relationships between the different hardware devices are more accurate.
  • joint calibration can be achieved by formula (5):
  • C_i is the i-th visual sensor in the second terminal;
  • I is the IMU sensor;
  • L is the radar;
  • ᴵT_L is the coordinate transformation relationship between the radar and the IMU sensor;
  • the covariances Σ_c and Σ_L represent the errors of the calibration processes involving the IMU sensor and the radar, respectively; the values of these errors can be flexibly set according to the actual situation;
  • all rotation components in the diagonal matrices of Σ_c and Σ_L can be set to 0.01 rad²;
  • all translation components of Σ_c can be set to 0.03 m²;
  • all translation components of Σ_L can be set to (0.03, 0.03, 0.15) m².
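  • Formula (5) is an image in the patent; one plausible reading of the joint calibration (purely an assumption based on the parameters above) is a least-squares refinement of the extrinsics that stays consistent with the pairwise calibration results, weighted by the stated covariances:

```latex
% Plausible reconstruction of formula (5): jointly refine the camera-IMU and
% radar-IMU extrinsics against the pairwise calibration results \hat{T}
% (e.g., from Kalibr / AutoWare), weighted by \Sigma_c and \Sigma_L.
% This is an assumed form, not the patent's exact equation.
\min_{\{{}^{I}T_{C_i}\},\,{}^{I}T_{L}} \;
\sum_{i} \left\| \log\!\left( {}^{I}\hat{T}_{C_i}^{-1}\, {}^{I}T_{C_i} \right) \right\|^{2}_{\Sigma_c}
+ \left\| \log\!\left( {}^{I}\hat{T}_{L}^{-1}\, {}^{I}T_{L} \right) \right\|^{2}_{\Sigma_L}
```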
  • In this way, the overall calibration error can be made smaller. Therefore, generating the global map after the above calibration is completed can greatly improve the accuracy of the global map, thereby improving the accuracy of the entire pose determination process.
  • the global scene is reconstructed in real time according to the map data, and a real-time map of the global scene is generated.
  • the target device is used to display the geographic scope of data collection for the global scene.
  • the second terminal may also perform real-time reconstruction of the global scene according to the map data during the process of collecting the map data to generate a real-time map of the global scene.
  • the implementation form of the real-time map may refer to the global map, which will not be repeated here.
  • the real-time map may cover each scene corresponding to the map data collected by the second terminal in the global scene.
  • some optimization processes in the offline reconstruction may be omitted to improve the reconstruction speed.
  • the acquisition of the third constraint information and the visual adjustment according to the third constraint information may be omitted.
  • The real-time reconstruction can be implemented by a 3D-lidar real-time Simultaneous Localization And Mapping (SLAM, also known as Concurrent Mapping and Localization, CML) system; exemplarily, the open-source Cartographer library can be used to reconstruct the global scene in real time and generate a real-time map of the global scene.
  • the target device may be used to display the geographic scope of data collection for the global scene, that is, the target device may display the geographic scope covered by the map data collected by the second terminal, thereby indicating the second terminal Subsequent movement directions and map data collection requirements in the global scene.
  • The target device may be a handheld device that can be flexibly operated by the map data collector, such as a tablet computer or a mobile phone. In some embodiments of the present disclosure, when the second terminal is mounted on a mobile device (such as an autonomous robot) to collect map data, the target device may be the controller or display screen of that mobile device.
  • the collected map data may be sent to the target device, or the real-time map may be sent to the target device, or the map data and the real-time map may be sent to the target device at the same time.
  • the global scene is reconstructed in real time according to the map data to generate a real-time map, and the map data and/or the real-time map are sent to the target device.
  • In this way, the area of the global scene for which map data has already been collected can be previewed in real time, and the reconstruction quality of the map can be checked at any time, thereby improving the collection efficiency and success rate of map data and reducing the risk of missed or repeated collection of map data.
  • a global map can be generated through various combinations of the above disclosed embodiments, so that it is possible to obtain the global map through step S12. After the acquisition data and the global map are acquired, as described in the above disclosed embodiments, step S13 may be used to determine at least one first pose of the first terminal in the acquisition process.
  • The implementation of step S13 can be flexibly determined.
  • the global map may include at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene; the collected data includes the first collected image;
  • step S13 may include:
  • At least one first pose of the first terminal in the acquisition process is determined.
  • The first captured image may be an image collected by the first terminal in the target scene; the number of first captured images may be one frame or multiple frames, determined according to the actual situation, which is not limited in the embodiments of the present disclosure.
  • the global feature matching result may be three-dimensional feature points in at least one frame of visual point cloud that match the two-dimensional feature points in the first captured image.
  • The way of establishing the feature matching relationship between the first captured image and the visual point cloud can be flexibly selected according to the actual situation, and any method capable of feature matching between images can be used to match the first captured image with the visual point cloud. For example, SIFT and/or the sparse optical flow tracking method (Kanade-Lucas-Tomasi, KLT) can be used to perform feature matching between the first captured image and at least one frame of visual point cloud.
  • feature matching is performed between the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result, which may include:
  • two-dimensional feature points in the first captured image may be feature-matched with three-dimensional feature points included in at least one frame of visual point cloud to obtain a global matching result.
  • the feature information used for feature matching may be one or more of various types of feature information such as feature descriptors, communication signal fingerprints, or semantic information.
  • The global feature matching may be implemented by approximate nearest neighbor (ANN) search. For example, for each feature in the first captured image, the K features closest to it can be found in the global map (the value of K can be flexibly set according to the actual situation). These K features then vote for the visual point cloud frames in the global map in order to determine which frames correspond to the first captured image.
  • When the number of votes received by one or several frames of visual point cloud reaches a threshold value, the visual images corresponding to those frames can be regarded as co-view images of the first captured image, and the 3D feature points in these co-view frames that match the 2D feature points in the first captured image can be used as the global feature matching result.
  • Obtaining the global feature matching result by matching the two-dimensional feature points in the first captured image with the three-dimensional feature points of at least one frame of visual point cloud through ANN search can reduce the number of mismatches in the feature matching process and improve the accuracy of the global feature matching result, thereby improving the accuracy of pose determination.
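  • The following sketch illustrates the voting scheme described above; it is an illustration only, not the patent's implementation. A SciPy kd-tree stands in for a true ANN index, and the parameter values (k, vote threshold) are assumed.

```python
import numpy as np
from scipy.spatial import cKDTree
from collections import Counter

def find_coview_frames(map_descriptors, map_frame_ids, query_descriptors,
                       k=4, vote_threshold=20):
    """Vote for co-view visual point cloud frames via nearest-neighbor search.

    map_descriptors:   (M, D) descriptors of the 3D feature points in the global map.
    map_frame_ids:     length-M frame index of the visual point cloud observing each 3D point.
    query_descriptors: (N, D) descriptors of 2D features in the first captured image.
    """
    tree = cKDTree(np.asarray(map_descriptors))
    _, nn_idx = tree.query(np.asarray(query_descriptors), k=k)  # (N, k) nearest map features
    votes = Counter(map_frame_ids[i] for i in nn_idx.ravel())   # each neighbor votes for its frame
    # Frames whose vote count reaches the threshold are treated as co-view frames.
    return [frame for frame, count in votes.items() if count >= vote_threshold]
```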
  • Based on the global feature matching result, the pose can be estimated by the Random Sample Consensus (RANSAC) method together with the Perspective-n-Point (PnP) method, and the estimated pose can then be refined by minimizing the reprojection error, so as to obtain at least one first pose of the first terminal during the collection process.
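  • A minimal sketch of the RANSAC + PnP step using OpenCV is given below; it illustrates the idea only (not the patent's implementation), and it assumes undistorted image points and known camera intrinsics.

```python
import cv2
import numpy as np

def estimate_first_pose(points_3d, points_2d, camera_matrix):
    """Estimate a pose from matched 3D map points and 2D image points.

    points_3d (N x 3), points_2d (N x 2) and camera_matrix (3 x 3 intrinsics)
    are assumed inputs; distCoeffs=None assumes undistorted image points.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix, None)
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)  # 3 x 3 rotation from map frame to camera frame
    # The estimate can then be refined by minimizing reprojection error over the inliers.
    return rotation, tvec, inliers
```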
  • In this way, the features corresponding to the visual point cloud in the global map can be matched with the features of the first captured image, and the matched features in the first captured image can be used to estimate the pose of the first terminal, thereby obtaining at least one first pose of the first terminal. Since the global map satisfies the accuracy condition, the first pose determined from the matching result with the global map features also has high accuracy, which improves the accuracy of the pose determination process.
  • the global map includes at least one frame of visual point cloud in the target scene; the collected data may include at least two frames of first collected images, and step S13 may include:
  • Step S131 performing feature matching between the first captured image and at least one frame of visual point cloud to obtain a global feature matching result
  • Step S132 performing feature matching on at least two frames of the first captured images to obtain a local feature matching result
  • Step S133 Determine at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
  • the method of performing feature matching between the first captured image and at least one frame of visual point cloud to obtain a global feature matching result may refer to the above disclosed embodiments, which will not be repeated here.
  • When the first pose is determined only from the global feature matching result obtained by matching the first captured image with the visual point cloud, the 3D feature points contained in the point cloud may be incomplete or too few, which can make the determined first pose inaccurate or make it impossible to determine the first pose at all. Therefore, in some embodiments of the present disclosure, when the collected data includes at least two frames of first captured images, a local feature matching result may additionally be obtained from the feature matching relationship between different first captured images, and at least one first pose of the first terminal during the collection process may then be determined jointly from the global feature matching result and the local feature matching result.
  • the local feature matching result may be two-dimensional feature points that match each other between different first captured image frames, and the process of performing feature matching according to at least two frames of the first captured image may be flexibly selected according to the actual situation.
  • the KLT method may be used to perform feature matching using optical flow features between different first captured images, thereby obtaining a local feature matching result.
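  • A minimal KLT sketch using OpenCV is shown below; it is illustrative only, and the detector parameters are assumed values rather than ones taken from the patent.

```python
import cv2

def klt_local_matches(prev_img, next_img, max_corners=500):
    """Track 2D feature points between two consecutive first captured images.

    prev_img, next_img: grayscale images (numpy arrays). Parameter values are
    illustrative; the result is a list of (previous point, tracked point) pairs.
    """
    prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=max_corners,
                                       qualityLevel=0.01, minDistance=8)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, next_img, prev_pts, None)
    # Keep only the points that were tracked successfully.
    return [(p.ravel(), q.ravel())
            for p, q, s in zip(prev_pts, next_pts, status) if s[0] == 1]
```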
  • In step S133, the first pose can be determined by using RANSAC and PnP to estimate the pose from the global feature matching result and the local feature matching result, and then further optimizing the estimate.
  • In this way, the local feature matching result can assist the global feature matching result, which reduces the influence of incomplete coverage of the global scene by the global map on the pose determination result and improves the accuracy of the first pose.
  • the collected data may further include first IMU data, in this case, step S133 may include:
  • the global feature matching result and the local feature matching result are processed to obtain at least one first pose of the first terminal during the acquisition process.
  • the first IMU data may be inertial measurement data collected during the data collection process in the target scene by the first terminal.
  • the first constraint information and the second constraint information may also be obtained, so as to obtain the first pose.
  • The first constraint information may be constraint information obtained according to the global feature matching result and/or the local feature matching result; how the first constraint information is obtained can be flexibly determined according to the actual situation.
  • the first constraint information may be obtained by using the information of the matched three-dimensional feature points and two-dimensional feature points in the global feature matching result.
  • the process of obtaining the first constraint information can be implemented by formula (6):
  • In formula (6), ᵂT_i is the pose of the device in the first terminal that collects the first captured image at the moment the i-th frame of the first captured image is collected; the remaining parameters are the j-th matched three-dimensional feature point in the global feature matching result, the two-dimensional feature point matched with it in the global feature matching result, and the projection result of projecting that three-dimensional feature point onto the i-th frame of the first captured image.
  • the first constraint information may be acquired by using the information of the matched three-dimensional feature points and two-dimensional feature points in the local feature matching result.
  • the process of obtaining the first constraint information can be realized by formula (7):
  • x_ij is the j-th two-dimensional feature point matched in the local feature matching result;
  • X_j is the three-dimensional feature point in the target scene onto which x_ij is mapped in the local feature matching result;
  • f(ᵂT_i, X_j) is the projection result of projecting the three-dimensional feature point X_j onto the i-th frame of the first captured image;
  • the meanings of the remaining parameters may refer to the foregoing embodiments.
  • Equation (6) or Equation (7) can be used as the first constraint information.
  • the first constraint information can also be jointly obtained according to the global feature matching result and the local feature matching result.
  • For example, the constraint information given by formulas (6) and (7) can be combined to obtain the first constraint information.
  • the second constraint information may be constraint information obtained according to the first IMU data.
  • the second constraint information may be obtained by using the relevant parameters of the device in the first terminal that collects the first captured image and the first IMU data.
  • the process of acquiring the second constraint information can be implemented by formula (8):
  • C_i(ᵂT_i, ᵂv_i, b_a, b_g) denotes the parameters of the first terminal when the i-th frame of the first captured image is collected;
  • ᵂv_i is the velocity of the first terminal;
  • b_a is the acceleration bias of the device in the first terminal that measures the first IMU data;
  • b_g is the gyroscope measurement bias of the device in the first terminal that measures the first IMU data;
  • h(·) is the IMU cost function, and the meanings of the other parameters may refer to the foregoing embodiments.
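  • Formula (8) is an image in the patent; given that h(·) is described as the IMU cost function, a plausible preintegration-style form between consecutive image frames (an assumption, including the covariance Σ_I) is:

```latex
% Plausible reconstruction of formula (8): IMU constraint between the states
% of consecutive first captured images; \Sigma_I is an assumed IMU covariance.
e_I = \sum_{i} \left\| h\!\left( \mathcal{C}_i({}^{W}T_i, {}^{W}v_i, b_a, b_g),\; \mathcal{C}_{i+1} \right) \right\|^{2}_{\Sigma_I}
```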
  • the second constraint information may be determined according to changes in the first IMU data during the process of collecting the first captured image by the first terminal.
  • The processing of the global feature matching result and the local feature matching result may include: processing the global feature matching result and the local feature matching result through bundle adjustment.
  • Bundle Adjustment (BA) is one implementation of the pose solving process.
  • the constraint information can be solved through BA to calculate the first pose under the minimum error.
  • the first constraint information and the second constraint information can be used together as the constraint information.
  • the process of solving the constraint information by BA can be represented by the following formula (9):
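  • Formula (9) itself is an image in the patent; a plausible overall BA objective (assumed here, combining the first constraint information from formulas (6) and (7) with the second constraint information from formula (8)) is:

```latex
% Plausible reconstruction of formula (9): bundle adjustment over the first
% poses (and 3D points), where e_first denotes the first constraint information
% (formulas (6)-(7)) and e_I the second constraint information (formula (8)).
\{{}^{W}T_i\}^{*} = \arg\min_{\{{}^{W}T_i\},\,\{X_j\}} \left( e_{\mathrm{first}} + e_{I} \right)
```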
  • Formula (9) can be solved by using keyframes together with the Incremental, Consistent and Efficient Bundle Adjustment (ICE-BA) method, thereby determining at least one first pose.
  • At least one of the first constraint information and the second constraint information can be used to optimize the obtained first pose, making the finally determined first pose smoother overall and reducing jitter. Moreover, using keyframes and ICE-BA to solve for the first pose can effectively reduce the amount of computation in the first pose determination process, thereby improving the efficiency of the pose determination process.
  • The accuracy of the first pose determined in the embodiments of the present disclosure is relatively high. Therefore, the methods proposed in the embodiments of the present disclosure can be applied to various scenarios in the field of mobile positioning, and the specific scenario can be selected according to the actual situation.
  • the pose determination method proposed in the embodiments of the present disclosure can be used to determine the device pose offline.
  • the first pose determined by the pose determination method proposed in the embodiments of the present disclosure can be used to evaluate the result accuracy of some neural network algorithms related to mobile positioning.
  • Data sets carrying motion ground-truth values are an important condition for the development of SLAM technology.
  • The motion ground truth can be used to evaluate and compare the accuracy of SLAM algorithms, and can also serve as a standard for improving the accuracy of SLAM algorithms in extreme cases, such as images with motion blur, severe illumination changes, or few feature points, thereby improving the ability of SLAM algorithms to deal with extreme scenarios.
  • In outdoor application scenarios, the motion ground truth is mainly obtained through GPS; in indoor application scenarios, it is mainly obtained by deploying high-precision motion capture systems such as VICON or Lighthouse in the indoor environment.
  • Such a system is a reflection-based capture system, which requires custom-made reflective balls to be attached to the captured object as signal receivers; when the capture camera emits light of a specific wavelength, the reflective balls reflect a light signal of the same wavelength back to the camera.
  • This method requires installing and calibrating motion capture equipment such as VICON in advance in the environment where the trajectory ground truth is to be collected. Therefore, both the equipment cost and the deployment cost are very high; the equipment cost for a small room is close to one million, and the approach is difficult to scale to large scenes.
  • In addition, each mobile device used to collect the ground truth needs to have a signal receiver installed and calibrated, and before each set of data is collected the received signal must be synchronized with the sensors on the mobile device, which is time-consuming and labor-intensive and difficult to scale to the collection of massive data.
  • Real-time positioning can also be achieved based on external signals such as Bluetooth and geomagnetic signals, but these methods usually rely on a signal fingerprint map constructed in advance that matches the positioning environment, and the positioning accuracy varies with the strength of the signal that can be collected at each point in the environment.
  • To obtain the motion ground truth of each point, an operator needs to make on-site measurements in the positioning environment with measurement tools, which incurs high time and labor costs; therefore, massive amounts of motion ground truth cannot be obtained by this method.
  • the embodiment of the present disclosure further provides a method for acquiring motion truth data.
  • the pose determination method provided by the embodiments of the present disclosure further includes:
  • determining motion ground-truth data according to the at least one first pose of the first terminal during the acquisition process, wherein the motion ground-truth data is used for at least one of the following operations: judging the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
  • The motion ground-truth data can be data whose values are taken as true in neural network training, that is, the ground-truth data of a neural network algorithm. Since the first pose determined in the embodiments of the present disclosure is the pose data of the first terminal during the movement in which the data is collected, and its accuracy is high, the first pose can be used as motion ground-truth data.
  • the implementation manner of the process of determining the motion truth data in the embodiments of the present disclosure can be flexibly determined according to the actual situation, and is not limited to the following disclosed embodiments.
  • determining motion truth data according to at least one first pose of the first terminal during the acquisition process may include:
  • using the at least one first pose of the first terminal during the collection process as the motion ground-truth data; and/or using at least one type of the collected data, together with the at least one first pose of the first terminal during the collection process, as the motion ground-truth data, where the collected data includes:
  • one or more of wireless network WiFi data, Bluetooth data, geomagnetic data, ultra-wideband UWB data, the first captured image, and the first IMU data.
  • The determined at least one first pose may be used directly as motion ground-truth data. Since the number of determined first poses is not limited in the embodiments of the present disclosure, the amount of motion ground-truth data obtained is likewise not limited. In some embodiments of the present disclosure, each of the determined first poses is used as motion ground-truth data, or one or more first poses are randomly selected from the plurality of first poses as motion ground-truth data.
  • the collected data may also be used as motion truth data.
  • The collected data may include the first captured image and/or the first IMU data; in some embodiments of the present disclosure, since the implementation of the first terminal is not limited, the types of data it collects may also change and expand flexibly, so the collected data may further include one or more of wireless network WiFi data, Bluetooth data, geomagnetic data, and UWB data.
  • Since the different types of collected data can all be collected by the first terminal, they can have a corresponding relationship with the determined first poses and can also provide corresponding constraints to assist the pose determination process. Therefore, in some embodiments of the present disclosure, the various types of collected data may also be used as motion ground-truth data.
  • By using the at least one first pose together with at least one type of collected data as motion ground-truth data, the amount of motion ground-truth data can be further increased, so that the motion ground-truth data performs better when applied in different scenarios.
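  • As a small illustrative sketch (the field names and types below are assumptions, not a format defined by this disclosure), one motion ground-truth record that pairs a first pose with optional collected data might look like:
```python
# Minimal sketch (assumed field names): one motion ground-truth record pairing a
# timestamped first pose with the optional collected data (image, IMU samples,
# WiFi / Bluetooth / geomagnetic / UWB signals).
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
import numpy as np

@dataclass
class GroundTruthRecord:
    timestamp: float                          # acquisition time of this first pose
    rotation: np.ndarray                      # 3x3 rotation of the first terminal
    translation: np.ndarray                   # 3-vector position in the global map frame
    image: Optional[np.ndarray] = None        # first captured image, if stored
    imu: Optional[np.ndarray] = None          # first IMU samples for this interval
    signals: Dict[str, Any] = field(default_factory=dict)  # WiFi / Bluetooth / geomagnetic / UWB
```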
  • the motion truth data may be used to determine the accuracy of the positioning result, and the specific determination is not limited in the embodiments of the present disclosure.
  • For example, the motion ground-truth data can be used as data in a benchmark data set for judging algorithm accuracy when evaluating neural networks, and thus for judging the accuracy of positioning results.
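  • As a non-authoritative illustration of such a benchmark use (the alignment step and error metric below are assumptions), an estimated trajectory can be scored against the ground-truth poses with a simple absolute trajectory error:
```python
# Minimal sketch (assumptions, not a benchmark defined by this disclosure):
# RMSE-style absolute trajectory error between estimated and ground-truth
# positions, after removing the mean translation offset between the two.
import numpy as np

def absolute_trajectory_error(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: Nx3 arrays of time-aligned positions."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    est_aligned = est - est.mean(axis=0) + gt.mean(axis=0)   # crude translation-only alignment
    return float(np.sqrt(np.mean(np.sum((est_aligned - gt) ** 2, axis=1))))
```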
  • the ground-truth motion data can also be used to train the neural network, and the specific application in the training process is not limited in the embodiments of the present disclosure.
  • the ground-truth motion data can be used as training data and/or test data in the neural network, so as to be applied in the training process of the neural network.
  • In some embodiments of the present disclosure, the motion ground-truth data may also be fused with the global map. For example, the motion ground-truth data may include collected data such as WiFi data, Bluetooth data, geomagnetic data, or UWB data; since such data correspond to the first poses, they can be fused into the global map as additional auxiliary data, further improving the precision and comprehensiveness of the global map.
  • FIG. 4 is a schematic flowchart of the acquisition of true motion data provided by an embodiment of the present disclosure.
  • The acquisition of motion ground-truth data may include two stages: global map reconstruction 401 and motion ground-truth data positioning 402.
  • The global map reconstruction stage 401 is used to reconstruct the global map; a global map 4014 can be obtained through three sub-stages: radar SLAM 4011, feature matching 4012, and vision-radar joint optimization 4013.
  • The second terminal carried by an operator moves through the global scene, so that the radar SLAM 4011 collects the laser point cloud of the global scene, the vision sensor collects the second captured images of the global scene, and the IMU sensor collects the second IMU data of the global scene.
  • While the second terminal scans the global scene, the map can be reconstructed in real time from the laser point cloud, second captured images, and second IMU data acquired so far, so as to obtain a real-time map.
  • the real-time map can reflect the range in which the operator has collected map data in the global scene, so the real-time map can be sent to the target device.
  • After the second terminal finishes scanning the global scene, the global map may be reconstructed offline using the acquired laser point cloud, second captured images, and second IMU data of the global scene.
  • The laser point cloud and the second IMU data can be processed by the radar SLAM 4011 to determine at least one pose of the radar during map data collection, and the radar pose can be converted into the pose of the vision sensor through the coordinate transformation relationship between the radar and the vision sensor, thereby obtaining at least one second pose of the second terminal.
  • Meanwhile, a visual map can be reconstructed from the second captured images by means of feature matching 4012 to obtain at least one frame of initial visual point cloud; the determined at least one second pose serves as the initial pose, and features in the second captured images provide third constraint information for the visual map reconstruction process, so that vision-radar joint optimization 4013 can be performed on the obtained initial visual point cloud.
  • The motion ground-truth data positioning 402 is realized by means of a first terminal such as the AR glasses 4021 or the mobile phone 4022, and may include four sub-stages: local feature tracking 4023, global feature tracking 4024, visual-inertial joint optimization 4025, and motion ground-truth data storage 4026.
  • The collected data is acquired by moving the first terminal, such as the AR glasses 4021 or the mobile phone 4022, within a target scene inside the global scene.
  • the collected data may include a first collected image and first IMU data.
  • the first captured image may be matched with a global map for global feature matching 4024, thereby realizing visual positioning and obtaining a global feature matching result.
  • Local feature tracking 4023 may also be performed between different frame images in the first captured image, so as to obtain a local feature matching result.
  • After the global and local feature matching results are obtained, visual-inertial joint optimization 4025 may be performed according to the global feature matching results, the local feature matching results, and the collected first IMU data, thereby determining at least one first pose of the first terminal during its movement in the target scene; the obtained first poses can then be stored as motion ground-truth data in the motion ground-truth data storage 4026, for example in a database.
  • In the method for acquiring motion ground-truth data provided by the embodiments of the present disclosure, the equipment used is mainly a high-precision map acquisition device integrating a lidar, cameras, and an IMU, so the overall equipment cost is low; moreover, neither the global scene nor the target scene needs to be arranged in advance, so the scalability in scale is obviously better than that of related schemes that require the scene to be pre-arranged.
  • The upper limit of scale mainly depends on the offline computing power, and existing algorithms and computing power can already handle scenes of hundreds of thousands of square meters, so the method for acquiring motion ground-truth data provided by the embodiments can be used in large-scale scenes. At the same time, the global map of the same global scene can be reused: once the global map has been collected and reconstructed, massive data from mobile terminals can be collected at scale.
  • The acquisition relies only on the built-in sensors of the mobile device, so before each acquisition there is no need for additional operations, such as calibration and synchronization with other external devices, that would limit large-scale acquisition; in addition, the method for acquiring motion ground-truth data provided by the embodiments of the present disclosure is not restricted by the application scenario and can be applied to both indoor and outdoor scenes.
  • the motion truth value obtained in the embodiments of the present disclosure is not limited to being used in the evaluation or training of the neural network, but can also be extended to other scenarios, which is not limited in the present disclosure.
  • The present disclosure also provides a pose determination apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the pose determination methods provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions of the method, which will not be repeated here.
  • FIG. 5 is a schematic structural diagram of a pose determination apparatus 5 according to an embodiment of the present disclosure.
  • the pose determination apparatus may be a terminal device, a server, or other processing devices.
  • the terminal device may be a UE, a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a PDA, a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like.
  • the pose determination apparatus may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • the pose determination device 5 may include:
  • the acquisition data acquisition module 501 is configured to acquire acquisition data acquired by the first terminal in the target scene.
  • the global map obtaining module 502 is configured to: obtain a global map including the target scene; wherein, the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map satisfies the accuracy condition .
  • the pose determination module 503 is configured to: determine at least one first pose of the first terminal during the collection process according to the feature correspondence between the collected data and the global map.
  • the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene;
  • the collected data includes a first collected image;
  • the pose determination module 503 is configured to: perform feature matching between the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result; and determine at least one first pose of the first terminal during the capture process according to the global feature matching result.
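  • As a rough sketch of how a pose might be estimated from such a global feature matching result: the description elsewhere in this disclosure mentions pose estimation via RANSAC and PnP with reprojection-error optimization for this step; the OpenCV calls and parameter values below are illustrative assumptions, not the implementation of this disclosure.
```python
# Minimal sketch (assumptions): estimate one first pose from 2D-3D correspondences
# between a first captured image and the visual point cloud of the global map,
# using RANSAC + PnP followed by a reprojection-error refinement.
import numpy as np
import cv2

def estimate_first_pose(pts_3d, pts_2d, K, dist=None):
    """pts_3d: Nx3 matched map points; pts_2d: Nx2 matched image keypoints."""
    pts_3d = np.asarray(pts_3d, dtype=np.float64).reshape(-1, 1, 3)
    pts_2d = np.asarray(pts_2d, dtype=np.float64).reshape(-1, 1, 2)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, dist,
        reprojectionError=3.0, iterationsCount=100, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    rvec, tvec = cv2.solvePnPRefineLM(                      # refine on RANSAC inliers
        pts_3d[inliers[:, 0]], pts_2d[inliers[:, 0]], K, dist, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)                              # world -> camera rotation
    return R, tvec
```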
  • the global map includes at least one frame of visual point cloud of the target scene, and the collected data includes at least two frames of first captured images; the pose determination module 503 is configured to: perform feature matching between the first captured images and the at least one frame of visual point cloud to obtain a global feature matching result; perform feature matching between the at least two frames of first captured images to obtain a local feature matching result; and determine, according to the global feature matching result and the local feature matching result, at least one first pose of the first terminal during the collection process.
  • the collected data further includes first inertial measurement unit (IMU) data; the pose determination module 503 is configured to: obtain first constraint information according to the global feature matching result and/or the local feature matching result; obtain second constraint information according to the first IMU data; and process the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information, to obtain at least one first pose of the first terminal during the acquisition process.
  • the pose determination module 503 is configured to process the global feature matching result and the local feature matching result through bundle adjustment.
  • the pose determination module is configured to: match two-dimensional feature points in the first captured image with three-dimensional feature points included in at least one frame of visual point cloud to obtain a global feature matching result.
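  • As a non-authoritative sketch of this 2D-to-3D matching step (the disclosure elsewhere describes approximate nearest-neighbour search over feature descriptors; the descriptor format, KD-tree choice, and ratio threshold below are assumptions):
```python
# Minimal sketch (assumptions): match 2D keypoint descriptors of a first captured
# image against descriptors of the 3D feature points stored with the visual point
# cloud, using nearest-neighbour search plus a ratio test to reject ambiguous matches.
import numpy as np
from scipy.spatial import cKDTree

def match_2d_to_3d(desc_2d, desc_3d, ratio=0.8):
    """desc_2d: MxD image descriptors; desc_3d: NxD map-point descriptors.
    Returns (image_index, map_point_index) pairs forming a global feature matching result."""
    tree = cKDTree(np.asarray(desc_3d, dtype=np.float32))
    dists, idx = tree.query(np.asarray(desc_2d, dtype=np.float32), k=2)
    matches = []
    for i, (d, j) in enumerate(zip(dists, idx)):
        if d[0] < ratio * d[1]:          # best match clearly better than the second best
            matches.append((i, int(j[0])))
    return matches
```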
  • the apparatus further includes: a motion truth data acquisition module; the motion truth data acquisition module is configured to: determine motion truth data according to at least one first pose of the first terminal during the acquisition process .
  • the motion truth data acquisition module is configured to: use at least one first pose of the first terminal in the acquisition process as motion truth data; and/or, take at least one of the collected data and at least one first pose of the first terminal during the acquisition process, as the motion truth data; wherein the acquisition data includes: wireless network WiFi data, Bluetooth data, geomagnetic data, ultra-wideband UWB data, first acquired image and one or more of the first IMU data.
  • the motion ground truth data is used for at least one of the following operations: judging the accuracy of the positioning result, training the neural network, and performing information fusion with the global map.
  • the map data includes the laser point cloud of the global scene, the second captured image, and the second IMU data; the apparatus further includes a map data acquisition module and a global map generation module, wherein the map data acquisition module is configured to obtain the map data of the global scene collected by the second terminal, and the global map generation module is configured to reconstruct the global scene offline according to the map data and generate the global map of the global scene.
  • the global map generation module is configured to: determine at least one second pose of the second terminal during the data collection process according to the second IMU data and the laser point cloud; reconstruct a visual map of the global scene according to the at least one second pose in combination with the second captured image, to obtain at least one frame of visual point cloud, where the visual point cloud corresponds to a plurality of three-dimensional feature points in the global scene; and obtain the global map of the global scene according to the at least one frame of visual point cloud.
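  • For illustration only (a 4x4 homogeneous-matrix convention and the transform names are assumptions), converting the radar poses determined from the laser point cloud and second IMU data into vision-sensor poses through the calibrated radar-camera extrinsic could look like:
```python
# Minimal sketch (assumed convention: T_a_b maps points from frame b to frame a):
# convert radar poses estimated during map data collection into vision-sensor
# poses, giving the second poses used to seed the visual map reconstruction.
import numpy as np

def lidar_poses_to_camera_poses(T_world_lidar_list, T_lidar_camera):
    """Each pose is a 4x4 homogeneous transform; returns T_world_camera poses."""
    return [T_world_lidar @ T_lidar_camera for T_world_lidar in T_world_lidar_list]
```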
  • the global map generation module is configured to: perform visual map reconstruction of the global scene according to the at least one second pose in combination with the second captured image, to obtain at least one frame of initial visual point cloud; obtain third constraint information for the visual map reconstruction process according to the laser point cloud and/or the second captured image; and optimize the at least one frame of initial visual point cloud according to the third constraint information to obtain the at least one frame of visual point cloud, where the third constraint information includes one or more of plane constraint information of the laser point cloud, edge constraint information of the laser point cloud, and visual constraint information.
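  • The sketch below illustrates, under assumed conventions, the typical form of the plane, edge, and visual residuals behind such constraint information (standard point-to-plane, point-to-line, and reprojection terms); it is a sketch of the general technique, not the exact formulation of this disclosure:
```python
# Minimal residual sketches (assumptions, standard formulations): plane / edge /
# visual terms of the kind that can serve as third constraint information when
# optimizing the initial visual point cloud.
import numpy as np

def plane_residual(T_m_n, p_n, q_m, n_m):
    """Point-to-plane: distance from the transformed point p_n to the plane at q_m
    with unit normal n_m (both expressed in frame m)."""
    p_m = T_m_n[:3, :3] @ p_n + T_m_n[:3, 3]
    return float(n_m @ (p_m - q_m))

def edge_residual(T_m_n, p_n, q_m, l_m):
    """Point-to-line: deviation of the transformed point p_n from the edge line at
    q_m with unit direction l_m (zero iff the point lies on the line)."""
    p_m = T_m_n[:3, :3] @ p_n + T_m_n[:3, 3]
    return np.cross(l_m, p_m - q_m)

def visual_residual(project, T_w_i, X_j, x_ij):
    """Reprojection error of map point X_j observed as x_ij in frame i;
    `project` is an assumed camera projection function."""
    return project(T_w_i, X_j) - x_ij
```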
  • the second terminal includes: a radar configured to obtain the laser point cloud of the global scene; a vision sensor configured to obtain the second captured image of the global scene; and an IMU sensor configured to obtain the second IMU data of the global scene.
  • the pose determination apparatus 5 is configured to: calibrate the coordinate transformation relationship between the vision sensor and the IMU sensor to obtain a first calibration result; calibrate the coordinate transformation relationship between the radar and the vision sensor to obtain a second calibration result; and jointly calibrate the coordinate transformation relationships among the vision sensor, the IMU sensor, and the radar according to the first calibration result and the second calibration result.
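  • For illustration (the matrix convention and names are assumptions), the first and second calibration results can be chained to seed the joint calibration of all three sensors:
```python
# Minimal sketch (assumed convention: T_a_b maps frame-b points into frame a):
# compose the camera-IMU extrinsic (first calibration result) with the
# lidar-camera extrinsic (second calibration result) to obtain an initial
# lidar-IMU extrinsic for the joint calibration / refinement step.
import numpy as np

def compose_lidar_imu_extrinsic(T_imu_camera, T_camera_lidar):
    return T_imu_camera @ T_camera_lidar      # T_imu_lidar

def invert_rigid(T):
    """Invert a 4x4 rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti
```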
  • the pose determination apparatus 5 is configured to: during the process in which the second terminal collects map data, reconstruct the global scene in real time according to the map data and generate a real-time map of the global scene; and send the map data and/or the real-time map to a target device, where the target device is configured to display the geographic extent of the global scene for which data collection has been completed.
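  • As an illustrative aside (the grid resolution and input format are assumptions), the covered geographic extent that such a target device would display could be derived by binning the collection trajectory into a simple 2D occupancy grid:
```python
# Minimal sketch (assumptions): mark which grid cells of the global scene have
# been visited by the second terminal's trajectory, so a target device can
# display the extent for which map data collection is complete.
import numpy as np

def coverage_grid(positions_xy, cell_size=2.0, extent=((0.0, 100.0), (0.0, 100.0))):
    """positions_xy: Nx2 trajectory positions in metres; returns a boolean grid."""
    (x0, x1), (y0, y1) = extent
    nx = int(np.ceil((x1 - x0) / cell_size))
    ny = int(np.ceil((y1 - y0) / cell_size))
    grid = np.zeros((ny, nx), dtype=bool)
    for x, y in positions_xy:
        j = int((x - x0) // cell_size)
        i = int((y - y0) // cell_size)
        if 0 <= i < ny and 0 <= j < nx:
            grid[i, j] = True
    return grid
```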
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • The embodiments of the present disclosure also provide a computer program, the computer program including computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes it to implement the pose determination method provided by any of the above embodiments.
  • Embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, which, when executed, cause the computer to perform the operations of the pose determination method provided by any of the foregoing embodiments.
  • the electronic device may be provided as a terminal, server or other form of device.
  • FIG. 6 shows a block diagram of an electronic device 6 according to an embodiment of the present disclosure.
  • the electronic device 6 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another such terminal.
  • the electronic device 6 may include one or more of the following components: a processor 601 , a first memory 602 , a first power supply component 603 , a multimedia component 604 , an audio component 605 , a first input/output interface 606 , and a sensor component 607 , and the communication component 608 .
  • the processor 601 generally controls the overall operation of the electronic device 6, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the number of processors 601 may be one or more, and the processor 601 may include one or more modules to facilitate interaction between the processor 601 and other components.
  • the processor 601 may include a multimedia module to facilitate its interaction with the multimedia component 604 .
  • the first memory 602 is configured to store various types of data to support operation at the electronic device 6 . Examples of such data include instructions for any application or method operating on the electronic device 6, contact data, phonebook data, messages, pictures, videos, and the like.
  • the first memory 602 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
  • the first power supply assembly 603 provides electrical power to various components of the electronic device 6 .
  • the first power supply component 603 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device 6 .
  • Multimedia component 604 includes a screen that provides an output interface between the electronic device 6 and the user.
  • the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP).
  • the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel.
  • a touch sensor can not only sense the boundaries of a touch or swipe action, but also the duration and pressure associated with the touch or swipe action.
  • Multimedia component 604 includes a front-facing camera and/or a rear-facing camera. When the electronic device 6 is in an operation mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data.
  • Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 605 is configured to output and/or input audio signals.
  • the audio component 605 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 6 is in an operating mode, such as a calling mode, a recording mode, or a voice recognition mode.
  • the received audio signal may be further stored in the first memory 602 or transmitted via the communication component 608 .
  • Audio component 605 also includes a speaker for outputting audio signals.
  • the first input/output interface 606 provides an interface between the processor 601 and a peripheral interface module, and the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 607 includes one or more sensors for providing electronic device 6 with various aspects of status assessment.
  • the sensor assembly 607 can detect the open/closed state of the electronic device 6, the relative positioning of the components, such as the display and the keypad of the electronic device 6, the sensor assembly 607 can also detect the position change of the electronic device 6, or Changes in the position of a component of the electronic device 6 , presence or absence of user contact with the electronic device 6 , orientation or acceleration/deceleration of the electronic device 6 and changes in the temperature of the electronic device 6 .
  • Sensor assembly 607 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor assembly 607 may also include a light sensor, such as a Complementary Metal-Oxide-Semiconductor (CMOS) or Charge-Coupled Device (CCD) image sensor, for use in imaging applications.
  • the sensor assembly 607 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • Communication component 608 is configured to facilitate wired or wireless communication between electronic device 6 and other devices.
  • the electronic device 6 can access a wireless network based on a communication standard, such as WiFi, a second generation wireless communication technology (The 2nd Generation, 2G) or a third generation mobile communication technology (The 3rd Generation, 3G), or a combination thereof.
  • the communication component 608 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 608 also includes a Near Field Communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (Radio Frequency Identification, RFID) technology, infrared data association (Infrared Data Association, IrDA) technology, UWB technology, Bluetooth (Blue-Tooth, BT) technology and other technologies.
  • the electronic device 6 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, configured to perform the above method.
  • a non-volatile computer-readable storage medium is also provided, such as the first memory 602 including computer program instructions, which can be executed by the processor 601 of the electronic device 6 to complete the pose determination method described in the foregoing embodiments.
  • FIG. 7 is a schematic structural diagram of a second electronic device 6 according to an embodiment of the disclosure.
  • the electronic device 6 may be provided as a server, as shown in FIG. 7.
  • the electronic device 6 includes a processing component 701, where the processing component 701 may include one or more processors 601; the electronic device 6 further includes a memory resource represented by a second memory 702, and the second memory 702 is configured to store instructions executable by the processing component 701, for example, application programs.
  • the application program stored in the second memory 702 may include at least one set of instructions.
  • the processing component 701 is configured to execute instructions to perform the above-described pose determination method.
  • the electronic device 6 may also include a second power supply component 703, a network interface 704 configured to connect the electronic device 6 to a network, and a second input/output interface 705.
  • the second power supply component 703 is configured to perform power management of the electronic device 6 .
  • the electronic device 6 can operate an operating system stored in the second memory 702, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • Embodiments of the present disclosure also provide a non-volatile computer-readable storage medium, where computer program instructions are stored in the storage medium.
  • For example, the computer program instructions may be stored in the first memory 602 or the second memory 702, and the above-mentioned computer program instructions can be executed by the processing component 701 of the electronic device 6 to complete the above-mentioned pose determination method.
  • Embodiments of the present disclosure also provide a computer program, where the computer program includes computer-readable code, and when the computer-readable code runs in an electronic device, the processor of the electronic device performs the pose determination method provided in any of the previous embodiments.
  • the present disclosure may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random-access memory (RAM), ROM, EPROM or flash memory, static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, and a mechanically encoded device, such as a punched card or a raised structure in a groove with instructions stored thereon, as well as any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • the computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In some embodiments, electronic circuits, such as programmable logic circuits, FPGAs, or programmable logic arrays (PLAs), can execute the computer-readable program instructions to implement various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • The computer-readable program instructions can also be loaded onto a computer, another programmable data processing apparatus, or other equipment, so that a series of operational steps are performed on the computer, the other programmable data processing apparatus, or the other equipment to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, a segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • The embodiments of the present application disclose a pose determination method and apparatus, an electronic device, a storage medium, and a program.
  • The method includes: acquiring collected data acquired by a first terminal in a target scene; acquiring a global map including the target scene, where the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene, and the global map satisfies an accuracy condition; and determining at least one first pose of the first terminal during the acquisition process according to the feature correspondence between the collected data and the global map.
  • the pose determination method provided by the embodiments of the present application can reduce the cost of obtaining the first pose, and can also improve the accuracy of the first pose.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

本公开涉及一种位姿确定方法、装置、电子设备、存储介质及程序。所述方法包括:获取目标场景中的第一终端采集的采集数据;获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。通过上述方法,可以规模化采集地采集精度较高的第一位姿数据,减少对目标场景的额外设备设置或是多个设备之间的额外标定同步产生的运算量。

Description

位姿确定方法、装置、电子设备、存储介质及程序
相关申请的交叉引用
本公开基于申请号为202010826704.X、申请日为2020年8月17日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及计算机视觉技术领域,涉及但不限于一种位姿确定方法、装置、电子设备、存储介质及计算机程序。
背景技术
随着移动传感器、网络基础设施和云计算的快速发展,增强现实应用场景的规模已经从中小型扩展到大规模环境,大规模环境下的定位是增强现实应用的关键需求。相关技术中的定位技术,需要借助于大量的运动真值数据、比如设备在移动过程中的位姿数据才能实现,并且,在进行算法基准测试或模型训练时,也需要借助于大量的包括位姿数据在内的运动真值数据才能实现。因此,如何以较低的成本获取精度较高的运动真值数据,成为目前一个亟待解决的问题。
发明内容
本公开实施例提出了一种位姿确定方法、装置、电子设备、存储介质以及计算机程序。
本公开实施例提供了一种位姿确定方法,所述方法包括:
获取目标场景中的第一终端采集的采集数据;获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;
根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。
本公开实施例还提供了一种位姿确定装置,所述装置包括:
采集数据获取模块配置为:获取目标场景中的第一终端采集的采集数据;
全局地图获取模块配置为:获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;
位姿确定模块配置为:根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中,所述全局地图包括至少一帧视觉点云,所述视觉点云包括所述全局场景中的至少一个三维特征点;所述采集数据包括第一采集图像;
本公开实施例还提供了一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行如前任一所述的位姿确定方法。
本公开实施例还提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现如前任一所述的位姿确定方法。
本公开实施例还提供了一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行用于实现如前任一所述的位姿确定方法。
在本公开实施例中,通过获取目标场景中第一终端采集的采集数据,以及获取包含目标场景的全局地图,并根据采集数据以及全局地图之间的特征对应关系,来确定第一终端在采集过程中的至少一个第一位姿。通过上述过程,可以重复利用全局场景的全局地图,在生成全局地图后即可规模化通过第一终端采集大量的第一位姿数据,而且获取用于生成第一位姿的采集数据的方式也较为简单,仅通 过第一终端即可实现采集,减小了对目标场景的额外设备设置或是多个设备之间的额外标定同步等,从而降低了第一位姿的获取成本;并且,由于全局地图满足精度条件,因此基于采集数据以及全局地图之间的特征对应关系所得到的第一位姿的数据,也具有较高精度。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1为本公开实施例提供的位姿确定方法的流程图;
图2为本公开实施例提供的视觉点云优化前后的对比示意图;
图3为本公开实施例提供的第二终端的结构示意图;
图4为本公开实施例提供的运动真值数据获取的流程示意图;
图5为本公开实施例提供的位姿确定装置的结构示意图;
图6为本公开实施例提供的第一种电子设备的结构示意图;
图7为本公开实施例提供的第二种电子设备的结构示意图。
具体实施方式
以下将参考附图详细说明本公开的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
另外,为了更好地说明本公开,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本公开同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本公开的主旨。
移动定位是增强现实、自动驾驶、移动机器人等应用领域中的关键技术。增强现实,用于根据实时定位结果将虚拟物体与真实环境无缝融合,以实现对车辆或移动机器人的路径规划。早期的移动定位主要依靠专用硬件设备例如激光设备、差分全球定位系统(Global Positioning System,GPS)设备、高精度惯导设备实现,但这些设备的成本高且灵活性差,因此难以广泛应用。随着配置摄像头且计算能力明显改善的移动设备的普及,基于低成本的视觉传感器和IMU的(Simultaneous Localization And Mapping,SLAM)取得了重大突破,且已经能够在较小范围内实现实时定位。在增强显示方面,随着智能终端中配置的基于SLAM的增强现实平台的推出,智能终端进入了增强现实(Augmented Reality,AR)时代。通过重建大规模场景的高精地图提供地球级场景中的厘米级别的定位,比如对位姿的确定,成为了一种趋势。然而,在相关技术中尚未出现基于低成本的设备实现高精度的位姿确定的方案。
图1为本公开实施例提供的位姿确定方法的流程图,该方法可以应用于位姿确定装置。其中,位姿确定装置可以为终端设备、服务器或者其他处理设备等。终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。
在本公开的一些实施例中,本公开实施例提供的位姿确定方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
如图1所示,所述位姿确定方法可以包括步骤S11至步骤S13:
步骤S11、获取目标场景中的第一终端采集的采集数据。
步骤S12、获取包含目标场景的全局地图。
其中,全局地图,是基于第二终端对包含目标场景的全局场景进行数据采集所获得的地图数据生成的,且全局地图满足精度条件。
步骤S13、根据采集数据以及全局地图之间的特征对应关系,确定第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中,目标场景可以是第一终端获取采集数据的任意场景,其实现形式可以根据实际需求进行灵活决定,在本公开实施例中不做限制。
在本公开的一些实施例中,目标场景可以包括室外场景,比如广场、街道或是空地等。
在本公开的一些实施例中,目标场景可以包括室内场景,比如教室、办公楼或是住宅楼等。
在本公开的一些实施例中,目标场景可以同时包含室外场景和室内场景。
在本公开的一些实施例中,第一终端可以是具有数据采集功能的移动终端,任何具有移动以及数据采集功能的设备,均可以作为第一终端。
在本公开的一些实施例中,第一终端可以是AR设备,比如手机或是AR眼镜等。
在本公开的一些实施例中,采集数据可以是第一终端在目标场景中采集的数据,采集数据的实现形式及其包含的数据内容,均可以根据第一终端的数据采集方式、或第一终端的数据采集的实际实现形式灵活决定,本公开实施例对此不做限定。
在本公开的一些实施例中,在第一终端为AR设备的情况下,采集数据可以包括AR设备对目标场景进行图像采集所得到的第一采集图像等;在第一终端为AR设备的情况下,采集数据还可以包括AR设备中的IMU对目标场景数据采集所得到的第一IMU数据等。
在本公开的一些实施例中,第一终端可以通过在目标场景中移动,以实现采集数据的采集,其中,第一终端的具体移动过程和方式均可以根据实际情况灵活选择。
在本公开的一些实施例中,可以通过从第一终端中读取采集数据、或是接收第一终端传输的采集数据的方式,获取采集数据;在本公开的一些实施例中,本公开实施例中提供的位姿确定方法也可以应用于第一终端中,在这种情况下,可以直接获取第一终端在目标场景中所采集的采集数据。
在本公开的一些实施例中,在目标场景为包含某一空地或广场的室外场景的情况下,全局场景可以是包含目标场景的郊区或市区的场景,同时该全局场景既可以包括该郊区或是市区中的室外场景,也可以包括该郊区或是市区中的室内场景等。
在本公开的一些实施例中,地图数据可以包括对全局场景进行图像采集得到的第二采集图像;地图数据可以包括对全局场景进行IMU数据采集所得到的第二IMU数据;地图数据还可以包括对全局场景进行雷达扫描所得到的激光点云数据等。
在本公开的一些实施例中,在第二终端包括用于图像采集的视觉传感器的情况下,地图数据可以包含第二采集图像;在第二终端包括用于采集IMU数据的IMU传感器的情况下,地图数据可以包含第二IMU数据;在第二终端包括用于采集激光点云的雷达的情况下,地图数据可以包含激光点云数据。第二终端包含的硬件结构以及连接方式同样可以详见后续各公开实施例,在此也先不做展开。
在本公开的一些实施例中,全局地图的实现形式可以根据全局场景的实际情况,以及地图数据的数据内容所共同决定。在本公开的一些实施例中,全局地图可以包含全局场景中各三维特征点的相关信息。在本公开的一些实施例中,全局地图可以包含全局场景中各三维特征点的相关信息,其中,全局场景中的三维特征点可以以图像的形式展示,三维特征点的相关信息包含的信息内容可以根据实际情况灵活决定,比如包含三维特征点的坐标以及三维特征点的特征信息,其中三维特征点的特征信息可以包含有三维特征点对应的特征描述子、三维特征点对应的通信信号指纹、或是语义信息中的一种或多种等与特征相关的信息。
在本公开的一些实施例中,全局地图的精度,可以是全局地图中各三维特征点的位置精度,比如可以是全局地图中包含的三维特征点的坐标,与三维特征点在全局场景中的实际位置之间的位置差值。因此,全局地图的精度条件,可以用于确定全局地图中各三维特征点的位置是否达到精度要求, 精度条件的具体内容可以根据实际情况灵活设定。
在本公开的一些实施例中,直接判断全局地图中三维特征点的坐标、与其实际位置之间的位置差值的难度可能较高,因此,可以通过地图数据的数据采集量是否达到一定的数据值,或是生成全局地图的方法精度是否达到要求等方式,来间接判断全局地图是否满足精度条件。举例来说,可以通过判断采集的地图数据所对应的地理范围,与全局场景所覆盖的地理范围之间的比值是否达到预设阈值的方式,来间接推断全局地图是否满足精度条件。
在本公开的一些实施例中,可以通过获取第二终端采集的地图数据,从而根据地图数据在位姿确定装置内生成全局地图;全局地图也可以在其他的装置或设备内进行生成,在这种情况下,获取全局地图的方式可以为直接从存储或生成全局地图的装置中,读取全局地图。
在本公开的一些实施例中,第二终端可以在全局场景中移动,从而采集相应的地图数据。
在公开实施例中,步骤S11和步骤S12的实现顺序在本公开实施例中不做限制,示例性地,步骤S11与步骤S12可以按照一定的先后顺序依次执行,步骤S11与步骤S12也可以同时执行。
在本公开实施例中,采集数据可以是对目标场景进行采集所得到的数据,因此,采集数据可以反应目标场景的特征;全局地图对应的全局场景由于包含目标场景,因此全局地图中也可以包含目标场景的特征,如此,根据采集数据以及全局地图之间的特征对应关系,可以包括采集数据与全局地图之间的特征对应关系。并且,由于第一终端在目标场景中移动可以采集大量的采集数据,采集数据之间也可以反应目标场景的特征,因此,在本公开实施例中,采集数据以及全局地图之间的特征对应关系,也可以包括采集数据自身包含的各个数据内部之间的特征对应关系。
在本公开的一些实施例中,第一位姿,可以是第一终端在目标场景的移动过程中,执行数据采集操作的时刻所对应的一个或多个位姿;其中,第一位姿的数量可以根据实际情况灵活决定。在本公开的一些实施例中,第一位姿可以与采集数据相对应,即第一位姿可以是第一终端在采集各采集数据的时刻所对应的位姿。
在本公开实施例中,通过获取目标场景中第一终端采集的采集数据,以及获取包含目标场景的全局地图,并根据采集数据以及全局地图之间的特征对应关系,能够确定第一终端在采集过程中的至少一个第一位姿。通过上述过程,可以重复利用全局场景的全局地图,在生成全局地图后即可规模化通过第一终端采集大量的第一位姿,而获取用于生成第一位姿的采集数据的方式也较为简单,仅通过第一终端即可实现采集,减小了对目标场景的额外设备设置或是多个设备之间的额外标定同步,从而降低了第一位姿获取的成本;并且,由于全局地图满足精度条件,因此基于采集数据以及全局地图之间的特征对应关系所得到的第一位姿,也具有较高精度。
如上述各公开实施例所述,地图数据的获得形式可以根据实际情况灵活决定,而基于地图数据生成全局地图的方式可以根据地图数据的实际情况灵活决定。因此,在本公开的一些实施方式中,地图数据可以包括:全局场景中的激光点云、第二采集图像以及第二IMU数据。
本公开实施例中提出的位姿确定方法还包括:
获取通过第二终端采集的全局场景的地图数据;
根据地图数据,对全局场景进行离线重建,生成全局场景的全局地图。
在本公开的一些实施例中,激光点云,可以是通过第二终端对全局场景进行雷达扫描所得到的多个激光点构成的点云,激光点云中包含的激光点数量可以根据第二终端的雷达扫描情况,以及第二终端在全局场景中的移动轨迹所共同灵活确定,在本公开实施例中不做限制。
在本公开的一些实施例中,第二采集图像,可以是第二终端在全局场景内移动的过程中所采集到的多个图像,第二采集图像的数量可以根据第二终端在全局场景中的移动情况,以及第二终端包含的用于采集图像的硬件设备的数量所共同决定,在本公开实施例中不做限制。
在本公开的一些实施例中,第二IMU数据,可以是第二终端在全局场景内移动的过程中所采集到的相关惯性测量数据,第二IMU数据的数量同样可以根据第二终端在全局场景中的移动情况,以及第二终端包含的用于采集IMU数据的硬件设备的数量所共同决定,在本公开实施例中不做限制。
在本公开实施例中,通过获取包括激光点云、第二采集图像以及第二IMU数据的地图数据,并根 据获取的地图数据,对全局场景进行离线重建,生成全局场景的全局地图。通过上述过程,可以在对全局场景完成较为全面的地图数据采集以后,再综合采集到的大量地图数据,对全局场景进行全面地离线重建,从而使得生成的全局地图具有较高的精度,进而使得基于全局地图和采集数据确定的至少一个第一位姿的结果较为准确;同时,由于地图数据包含激光点云、第二采集图像以及第二IMU数据,这些数据的获取方式较为容易且获取过程受空间制约的情况较少,因此,本公开实施例提出的姿态确定方法,获取地图数据以及全局地图的难度较小,从而降低了对环境和/或设备的依赖,进而使得该位姿确定方法能够应用在各种场景中。
如上述公开实施例所述,离线重建的过程可以根据实际情况灵活决定。在本公开的一些实施方式中,根据地图数据,对全局场景进行离线重建,生成全局场景的全局地图,包括:
根据第二IMU数据和激光点云,确定第二终端在数据采集过程中的至少一个第二位姿;
根据至少一个第二位姿、结合第二采集图像,对全局场景进行视觉地图重建,得到至少一帧视觉点云;其中,视觉点云包括全局场景中的至少一个三维特征点;
根据至少一帧视觉点云,得到全局场景的全局地图。
在本公开的一些实施例中,可以根据第二IMU数据,在第二终端在数据采集过程中的不同的时刻,将获取的激光点投影至该时刻下的激光雷达帧上,从而可以基于激光点的投影结果,对第二终端在数据采集过程中不同时刻的第二位姿进行估算。
在本公开的一些实施例中,在确定第二终端在数据采集过程中的至少一个第二位姿以后,可以根据至少一个第二位姿,结合第二采集图像,对全局场景进行视觉地图重建,来得到至少一帧视觉点云。视觉点云可以包括全局场景中的至少一个三维特征点,视觉点云的数量以及包含的三维特征点的数量在本公开实施例中不做限制。
在本公开的一些实施例中,全局地图可以包括一帧或多帧视觉点云。如上述各公开实施例所述,全局地图可以包含全局场景中各三维特征点的相关信息。在本公开的一些实施例中,视觉点云可以通过视觉图像得到,在这种情况下,全局地图还可以包括至少一帧用于观测视觉点云的视觉图像。
在本公开的一些实施例中,视觉点云包括的三维特征点,由于三维特征点也可以存储在全局地图中,因此视觉点云也可以与三维特征点的特征信息进行对应。在本公开的一些实施例中,三维特征点的特征描述子可以根据第二采集图像中提取的特征确定,因此,视觉点云可以与三维特征点的特征描述子进行对应。在本公开的一些实施例中,地图数据中还可以包含与通信相关的信号数据,比如WiFi信号、蓝牙信号或是UWB信号等,这些信号可以作为信号指纹,与三维特征点对应,从而作为三维特征点的特征信息,因此,视觉点云可以与三维特征点的通信信号指纹进行对应;在本公开的一些实施例中,第二采集图像中还可以包含一些语义信息,这些语义信息也可以与三维特征点之间建立对应关系,从而作为三维特征点的特征信息,在这种情况下,视觉点云可以与语义信息建立对应关系。
在本公开的一些实施例中,可以通过尺度不变特征变换(Scale-Invariant Feature Transform,SIFT)对第二采集图像进行特征提取和匹配,从而生成至少一帧视觉点云,示例性地,根据通过激光点云和第二IMU数据所确定的至少一个第二位姿之后,还可以进一步从至少一帧视觉点云中,观测到的各三维特征点的坐标等信息。
在本公开的一些实施例中,可以将得到的全部视觉点云,以及这些视觉点云所对应的三维特征点的特征信息等,共同作为全局地图;在本公开的一些实施例中,也可以从得到的视觉点云中选定一帧或多帧,并根据这一帧或多帧视觉点云对应的三维特征点的特征信息等,共同作为全局地图。
在本公开实施例中,可以综合利用激光点云、第二IMU数据和第二采集图像,通过视觉点云来表征全局场景中各三维特征点的位置和特征等信息,利用较易获取的数据即可实现全局地图的重建,且重建的结果较为准确,提升了整个姿态确定过程的便捷性和确定精度。
在本公开的一些实施方式中,根据至少一个第二位姿,结合第二采集图像,对全局场景进行视觉地图重建,得到至少一帧视觉点云,包括:
根据至少一个第二位姿、结合第二采集图像,对全局场景进行视觉地图重建,得到至少一帧初始视觉点云;
根据激光点云和/或第二采集图像,获取视觉地图重建过程中的第三约束信息;
根据第三约束信息,对至少一帧初始视觉点云进行优化,得到至少一帧视觉点云。
由于根据激光点云确定的第二位姿,精度可能较低。在这种情况下,直接利用确定的第二位姿,结合第二采集图像进行视觉地图重建得到的视觉点云,可能包含较大的噪声。因此,在本公开实施例中,在根据第二位姿和第二采集图像,对全局场景进行视觉地图重建后,可以将视觉地图重建得到的图像作为初始视觉点云,并根据激光点云和/或第二采集图像所产生的第三约束信息,对初始视觉点云进行进一步优化,从而降低初始视觉点云中的噪声,来得到具有较高精度的视觉点云。
其中,根据第二位姿以及第二采集图像进行视觉地图重建,得到至少一帧初始视觉点云的过程,可以参考上述公开实施例,在此不再赘述。
在本公开实施例中,第三约束信息可以为根据激光点云和/或第二采集图像所确定的约束信息。
在本公开的一些实施方式中,根据激光点云,获取视觉地图重建过程中的第三约束信息可以包括:
通过实时激光里程计与建图(Lidar Odometry and Mapping in real-time,LOAM)方法,对激光点云进行特征提取,确定激光点云的平面特征信息以及边缘特征信息;
根据激光点云的平面特征信息,确定视觉地图重建过程中激光点云的平面约束信息;
根据激光点云的边缘特征信息,确定视觉地图重建过程中激光点云的边缘约束信息;
根据激光点云的平面约束信息和/或激光点云的边缘约束信息,获取视觉地图重建过程中的第三约束信息。
其中,激光点云的平面特征信息可以根据激光点云的实际情况灵活确定,基于激光点云的平面特征信息所确定的平面约束信息的具体形式,可以根据实际情况进行灵活选择,示例性地,平面约束信息可以通过式(1)计算得到:
Figure PCTCN2020140274-appb-000001
在式(1)中,n与m为两个不同的激光点云坐标系, mn为坐标系m中特征点 mq处的平面特征法向量, mn Tmn的转置,
Figure PCTCN2020140274-appb-000002
为坐标系n与m之间的变换关系, np为坐标系n中的特征点, mq为坐标系m中的特征点,
Figure PCTCN2020140274-appb-000003
为依据
Figure PCTCN2020140274-appb-000004
这一坐标变换关系对 np执行的坐标变换,∑ p为激光点云平面特征的协方差矩阵,其中,∑ p的数值可以根据实际情况灵活设置,比如,∑ p可以设置为0.2m 2
同理,激光点云的边缘特征信息也可以根据激光点云的实际情况灵活确定,基于激光点云的边缘特征信息所确定的边缘约束信息的具体形式,可以根据实际情况进行灵活选择,示例性地,边缘约束信息可以通过式(2)计算得到:
Figure PCTCN2020140274-appb-000005
在式(2)中, mI为坐标系m中特征点 mq处的边缘特征方向向量,∑ e为激光点云边缘特征的协方差矩阵,其余参数与式(1)中对应参数的含义相同,其中,∑ e的数值可以根据实际情况灵活设置,比如,∑ e可以设置为0.5m 2
在分别确定激光点云的平面约束信息以及激光点云的边缘约束信息以后,可以将平面约束信息和边缘约束信息均作为第三约束信息,也可以将平面约束信息或是边缘约束信息中的一种作为第三约束信息,具体如何选择可以根据实际情况灵活确定。
在本公开的一些实施例中,根据第二采集图像,获取视觉地图重建过程中的第三约束信息可以包括:
将与初始视觉点云对应的三维特征点投影至初始视觉点云,得到投影结果;
根据投影结果与初始视觉点云中二维特征点之间的误差,获取视觉地图重建过程中的视觉约束信息;其中,二维特征点是初始视觉点云中与三维特征点对应的二维特征点;
根据视觉约束信息,获取视觉地图重建过程中的第三约束信息。
根据投影结果与初始视觉点云中与三维特征点对应的二维特征点之间的误差,获取视觉地图重建过程中的视觉约束信息的具体过程,可以根据实际情况灵活选择。示例性地,视觉约束信息可以通过式(3)计算得到:
Figure PCTCN2020140274-appb-000006
在式(3)中,X j为与视觉点云对应的第j个三维特征点,x ij为第i帧初始视觉点云中与三维特征点X j对应的二维特征点,f( WT i,X j)为将三维特征点X j投影至第i帧初始视觉点云的投影结果,∑ v为图像特征约束的协方差矩阵,其中,∑ v的数值可以根据实际情况灵活设定,示例性地,∑ v可以设置为2像素平方。
在本公开的一些实施例中,第三约束信息可以包括激光点云的平面约束信息、激光点云的边缘约束信息以及视觉约束信息中的一种或多种。在本公开的一些实施例中,第三约束信息可以同时包含激光点云的平面约束信息、激光点云的边缘约束信息和视觉约束信息,在这种情况下,根据第三约束信息,对至少一帧初始视觉点云进行优化,得到至少一帧视觉点云的过程可以通过式(4)实现:
Figure PCTCN2020140274-appb-000007
在式(4)中,L p为激光点云中属于平面的点所构成的点云,L'p为L p的集合,L e为激光点云中属于边缘的点所构成的点云,L'e为L e的集合,其余各参数的含义可以参考上述各公开实施例。
在本公开的一些实施例中,根据第三约束信息,对至少一帧初始视觉点云进行优化,可以包括对初始视觉点云包括的三维特征点进行优化,还可以包括对第二终端中采集第二采集图像的设备的位姿进行优化,在对第二终端中采集第二采集图像的设备的位姿进行优化的情况下,相应地,也可以对第二终端对应的第二位姿进行优化,从而减少了由于第二位姿的精确度较低所导致的视觉点云中包含的噪声。并且,在对视觉点云进行优化后,还可以基于视觉点云的优化结果,再次获取视觉地图重建过程的第三约束信息,并基于第三约束信息,对视觉点云进行进一步的迭代优化,迭代的次数可以根据实际情况灵活选择,在本公开实施例中不做限制。
图2为本公开实施例提供的视觉点云优化前后的对比示意图。在图2中针对同一场景,方框201和方框202中为优化前的视觉点云对应的视觉图像,方框203和方框204中为优化后的视觉点云对应的视 觉图像,从图2中可以看出,优化后视觉点云中的噪声点有所减少、且清晰度有明显改善,优化后的视觉点云具有更高的精度,相应的,优化后的视觉点云对应的三维特征点的精度也有所改善。
因此,在本公开实施例中,第二终端可以包括:
雷达,用于获取全局场景中的激光点云;
视觉传感器,用于获取全局场景中的第二采集图像;
IMU传感器,用于获取全局场景中的第二IMU数据。
在本公开的一些实施例中,雷达可以是具有激光点云采集功能的任意雷达,示例性地,雷达可以为三维(Three Dimension,3D)雷达。视觉传感器可以是具有图像采集功能的任意传感器,比如相机等。在本申请的一些实施例中,第二终端可以同时包括具有360°图像采集功能的4阵列相机。IMU传感器的实现形式同样可以根据实际情况灵活决定。第二终端中雷达、视觉传感器以及IMU传感器之间的设置位置和连接关系可以根据实际情况灵活选择。
在本公开的一些实施例中,雷达、视觉传感器与IMU传感器之间可以刚性连接,具体的连接顺序可以根据实际情况灵活选择。在本公开的一些实施例中,视觉传感器和IMU传感器可以固定连接并封装为一个固定结构单元,雷达可以设置在固定结构单元的上方。在本公开的一些实施例中,视觉传感器、IMU传感器和雷达还可以固定设置在一个背包中。
图3为本公开实施例提供的第二终端的结构示意图。从图3中可以看出,视觉传感器和IMU传感器可以固定连接并封装为固定结构单元301,该固定结构单元301的下端可以设置在背包302内从而便于携带,雷达303可以设置在固定结构单元301的上方。
在本公开实施例中,通过第二终端中包含雷达、视觉传感器和IMU传感器,可以对全局场景中的地图数据进行全面采集,从而便于后续全局地图的生成。通过图3所示的简单且成本低的硬件设备第二终端采集地图数据,能够降低获取地图数据的设备成本,从而降低了确定第一位姿数据的硬件实现成本和难度。
由于第二终端可以包括雷达、视觉传感器以及IMU传感器等硬件设备,这些硬件设备在使用前可能需要进行标定或测量数据时间校准,并且,在对各硬件进行标定的同时,还可以对不同硬件之间的坐标变换关系进行标定,以提高生成的全局地图的精度。因此,在本公开的一些实施例中,根据地图数据,对全局场景进行离线重建,生成全局场景的全局地图之前,还可以包括:
对视觉传感器与IMU传感器之间的坐标变换关系进行标定,得到第一标定结果;
对雷达与视觉传感器之间的坐标变换关系进行标定,得到第二标定结果;
根据第一标定结果和第二标定结果,对视觉传感器、IMU传感器以及雷达之间的坐标变换关系进行联合标定。
在本公开实施例中,对视觉传感器与IMU传感器之间的坐标变换关系进行标定的方式可以根据实际情况灵活选择,示例性地,可以通过Kalibr工具实现视觉传感器和IMU传感器的标定;对雷达与视觉传感器之间的坐标变换关系进行标定的方式同样可以根据实际情况灵活选择;还可以通过AutoWare框架实现雷达与视觉传感器的标定。示例性地,由于在标定过程中还可能存在误差,因此在一种可能的实现方式中,还可以根据第一标定结果和第二标定结果,对视觉传感器、IMU传感器以及雷达之间的坐标变换关系进行联合标定与优化,以使得不同硬件设备之间的坐标变换关系更加准确。
在本公开的一些实施例中,联合标定可以通过式(5)实现:
Figure PCTCN2020140274-appb-000008
在式(5)中,C i为第二终端中的第i个视觉传感器,I为IMU传感器,L为雷达,
Figure PCTCN2020140274-appb-000009
为第i个视觉传感器与IMU传感器之间的坐标变换关系, IT L为雷达与IMU传感器之间的坐标变换关系,
Figure PCTCN2020140274-appb-000010
为 雷达与第i个视觉传感器之间的坐标变换关系,协方差∑ c/∑ L分别代表IMU传感器和雷达各自标定过程中的误差,该误差的值可以根据实际情况进行灵活设定,示例性地,∑ c和∑ L的对角矩阵中所有旋转分量均可以设置为0.01rad 2,∑ c的所有转换分量均可以设置为0.03m 2,∑ L的所有转换分类可以设置为(0.03,0.03,0.15)m 2
通过式(5)中所示的基于联合标定得到的视觉传感器和IMU传感器之间的坐标变换关系、以及雷达与IMU传感器之间的坐标变换关系,可以使得整体的标定误差较小,因此,在上述标定结束之后再进行全局地图的生成,就可以大大提升全局地图的精度,从而提升整个位姿确定过程的精度。
本公开实施例提出的位姿确定方法还可以包括:
在第二终端采集地图数据的过程中,根据地图数据对全局场景进行实时重建,生成全局场景的实时地图。
向目标设备发送地图数据和/或实时地图;其中,目标设备,用于显示对全局场景完成数据采集的地理范围。
在本公开的一些实施例中,为了便于掌握地图数据的采集情况,还可以在第二终端采集地图数据的过程中,根据地图数据对全局场景进行实时重建,生成全局场景的实时地图。实时地图的实现形式可以参考全局地图,在此不再赘述,在一个示例中,实时地图中可以覆盖全局场景中,第二终端已经采集到的地图数据所对应的各场景。
在实际应用中,由于实时重建可以基于当前已采集的地图数据进行重建,相对于离线重建中基于采集完成后得到的大量地图数据进行重建来说,重建的数据量较小,因此可以具有更高的重建速度。在本公开的一些实施例中,实时重建过程中,可以省略离线重建中的一些优化过程来提高重建速度,比如,实时重建过程中,可以省略获取第三约束信息以及根据第三约束信息对视觉点云进行优化的过程。在本公开的一些实施例中,实时重建可以通过一些特定的3D雷达即时定位与地图构建SLAM,又称为同步建图与定位(Concurrent Mapping and Localization,CML)系统来实现,示例性地,还可以通过开源的Cartographer库,来对全局场景进行实时重建,生成全局场景的实时地图。
在本公开的一些实施例中,目标设备可以用于显示对全局场景完成数据采集的地理范围,即目标设备可以显示第二终端已采集到的地图数据所覆盖的地理范围,从而指示第二终端在全局场景中的后续移动方向和地图数据的采集需求。在本公开的一些实施例中,目标设备可以是地图数据采集人员能够灵活控制的手持设备,比如平板电脑或是手机等;在本公开的一些实施例中,在将第二终端设置在移动设备上(比如自动机器人等)进行地图数据的采集的条件下,目标设备可以是移动设备的控制器或是显示屏等。
在本公开的一些实施例中,可以向目标设备发送已采集的地图数据,或者向目标设备发送实时地图,或者向目标设备同时发送地图数据和实时地图等。
在实际应用中,如果第二设备采集的地图数据不够全面,比如漏掉对全局场景中部分场景内地图数据的采集,将容易导致离线建立的全局地图精度降低,如果重新对全局场景进行地图数据的采集,则会产生额外的人力成本以及计算成本;另外,在实际应用中,在地图数据的采集过程中,也可能会发生重复采集的情况。而在本公开实施例中,在第二终端采集地图数据的过程中,根据地图数据对全局场景进行实时重建生成实时地图,并向目标设备发送地图数据和/或实时地图,就可以基于实时地图,对全局场景中已进行地图数据采集的区域进行实时预览,并可以随时把控地图的重建质量,从而提升地图数据的采集效率和成功率,也能够降低地图数据遗漏采集或重复采集的风险。
通过上述各公开实施例的各种组合形式可以生成全局地图,从而使得通过步骤S12获取全局地图具有实现的可能性。在获取到采集数据以及全局地图以后,如上述各公开实施例所述,可以通过步骤S13,来确定第一终端在采集过程中的至少一个第一位姿。
步骤S13的实现方式可以灵活确定,在本公开实施例中,全局地图可以包括至少一帧视觉点云,视觉点云包括全局场景中的至少一个三维特征点;采集数据包括第一采集图像;在这种情况下,步骤S13可以包括:
将第一采集图像与至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;
根据全局特征匹配结果,确定第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中,第一采集图像,可以是第一终端在目标场景中所采集到的图像,第一采集图像的数量可以可以为一帧,也可以为多帧,根据实际情况进行确定即可,在本公开实施例中不做限定。
在本公开的一些实施例中,全局特征匹配结果可以是至少一帧视觉点云中、与第一采集图像中的二维特征点相匹配的三维特征点。
视觉点云的实现形式可以参考上述各公开实施例,在此不再赘述。
在本公开的一些实施例中,第一采集图像与视觉点云之间的特征匹配关系,可以根据实际情况灵活选择,任何可以实现图像之间特征匹配的方法,均可以作为第一采集图像和视觉点云之间的特征匹配方式,示例性地,可以采用SIFT,和/或采用稀疏光流跟踪方法(Kanade-Lucas-Tomasi Tracking Method,KLT),对第一采集图像以及至少一帧视觉点云进行特征匹配。
在一种可能的实现方式中,将第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果,可以包括:
将第一采集图像中的二维特征点,与至少一帧视觉点云包括的三维特征点进行匹配,得到全局特征匹配结果。
在本公开的一些实施例中,可以将第一采集图像中的二维特征点,与至少一帧视觉点云包括的三维特征点进行特征匹配,来得到全局匹配结果。其中,用于特征匹配的特征信息可以是特征描述子、通信信号指纹或是语义信息等各类特征信息的一种或多种。
在本公开的一些实施例中,全局特征匹配结果,可以通过近似最近邻搜索(Approximate Nearest Neighbor,ANN)的方式进行实现。比如,对于第一采集图像所包含的特征,可以在全局地图中寻找与该特征最接近的K个特征(K的数量可以根据实际情况进行灵活设定)。然后这K个特征可以对全局地图中的各帧视觉点云进行投票,以确定视觉点云是否与第一采集图像相对应,如果某帧或某几帧视觉点云的投票数超过设定的阈值,则可以认为某帧或某几帧视觉点云对应的视觉图像为第一采集图像的共视图像,在共视图像中,与第一采集图像中的二维特征点匹配的各三维特征点,可以作为全局特征匹配结果。
在本公开实施例中,通过ANN将第一采集图像中的二维特征点,与至少一帧视觉点云对应的三维特征点进行匹配,得到全局特征匹配结果的操作,可以减少特征匹配过程中误匹配的次数,提高全局特征匹配结果的精度,从而提升位姿确定的精度。
在得到全局特征匹配结果以后,可以根据全局特征匹配结果,确定第一终端在采集过程中的至少一个第一位姿,这一过程的实现方式同样可以根据实际情况灵活选择,不局限于下述各公开实施例。在一种可能的实现方式中,可以将全局特征匹配结果,通过随机一致性采样(Random Sample Consensus,RANSAC)方法和透视N点定位(Perspective n Points,PnP)等方法进行位姿估算,并通过重投影误差的优化方式对估算的位姿进行优化,从而得到第一终端在采集过程中的至少一个第一位姿。
通过上述过程,可以利用全局地图中视觉点云所对应的特征,与第一采集图像之间的特征进行匹配,从而利用第一采集图像中匹配到的特征对第一终端的位姿进行估算,来获取第一终端的至少一个位姿,由于全局地图的精度满足精度条件,因此基于与全局地图特征匹配的结果所确定的第一位姿,也具有较高的精度,也能够提升第一位姿确定过程的精度。
在本公开的一些实施例中,全局地图包括目标场景中的至少一帧视觉点云;采集数据可以包括至少两帧第一采集图像,步骤S13可以包括:
步骤S131,将第一采集图像与至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;
步骤S132,将至少两帧第一采集图像进行特征匹配,得到本地特征匹配结果;
步骤S133,根据全局特征匹配结果和本地特征匹配结果,确定第一终端在采集过程中的至少一个第一位姿。
其中,将第一采集图像与至少一帧视觉点云进行特征匹配,得到全局特征匹配结果的方式,可以参考上述各公开实施例,在此不再赘述。
由于生成的全局地图可能无法实现对全局场景的完全覆盖,仅根据第一采集图像与视觉点云之间进行特征匹配所得到的全局特征匹配结果,确定第一位姿的方式,可能会由于视觉点云包括的三维特征点不全或是数量较少等原因,导致确定第一位姿的结果不准确或是无法确定第一位姿。因此,在本公开的一些实施例中,可以在采集数据包括至少两帧第一采集图像的情况下,根据不同第一采集图像之间的特征匹配关系,进一步得到本地特征匹配结果,再根据全局特征匹配结果和本地特征匹配结果,共同确定第一终端在采集过程中的至少一个第一位姿。
本地特征匹配结果可以是不同第一采集图像帧之间相互匹配的二维特征点,根据至少两帧第一采集图像进行特征匹配的过程可以根据实际情况灵活选择。在本公开的一些实施例中,可以通过KLT方法,利用不同的第一采集图像之间的光流特征,进行特征匹配,从而得到本地特征匹配结果。
在本公开的一些实施例中,步骤S133中基于全局特征匹配结果确定第一位姿的方式,可以通过RANSAC和PnP对全局特征匹配结果和本地特征匹配结果,进行位姿的估算以及进一步地优化实现的。
在本公开实施例中,通过步骤S131至步骤S133,可以基于本地特征匹配结果,对全局特征匹配结果进行辅助,从而减小由于全局地图对全局场景覆盖不全面对位姿确定结果的影响,提高第一位姿的精度。
在本公开的一些实施例中,采集数据还可以包括第一IMU数据,在这种情况下,步骤S133可以包括:
根据全局特征匹配结果和/或本地特征匹配结果,获取第一约束信息;
根据第一IMU数据,获取第二约束信息;
根据第一约束信息和第二约束信息中的至少一种,对全局特征匹配结果和本地特征匹配结果进行处理,得到第一终端在采集过程中的至少一个第一位姿。
其中,第一IMU数据可以是第一终端在目标场景中进行数据采集的过程中,采集到的惯性测量数据。
在本公开的一些实施例中,在通过全局特征匹配结果和本地特征匹配结果确定第一位姿的过程中,还可以获取第一约束信息和第二约束信息,来对求取第一位姿的过程添加约束。其中,第一约束信息可以是根据全局特征匹配结果和/或本地特征匹配结果所得到的约束信息。具体如何获取第一约束信息。
在本公开的一些实施例中,可以利用全局特征匹配结果中匹配的三维特征点和二维特征点的信息,来获取第一约束信息。在一个示例中,根据全局特征匹配结果,获取第一约束信息的过程可以通过式(6)实现:
Figure PCTCN2020140274-appb-000011
在式(6)中, WT i为第一终端中用于采集第一采集图像的设备在采集第i帧第一采集图像情况下的位姿,
Figure PCTCN2020140274-appb-000012
为全局特征匹配结果中匹配的第j个三维特征点,
Figure PCTCN2020140274-appb-000013
为全局特征匹配结果中与
Figure PCTCN2020140274-appb-000014
匹配的二维特征点,
Figure PCTCN2020140274-appb-000015
为将三维特征点
Figure PCTCN2020140274-appb-000016
投影至第i帧第一采集图像上的投影结果。
在本公开的一些实施例中,可以利用本地特征匹配结果中匹配的三维特征点和二维特征点的信息,来获取第一约束信息。在一个示例中,根据本地特征匹配结果,获取第一约束信息的过程可以通 过式(7)实现:
Figure PCTCN2020140274-appb-000017
在式(7)中,x ij为本地特征匹配结果中匹配的第j个二维特征点,X j为本地特征匹配结果中x ij在目标场景中映射的三维特征点,f( WT i,X j)为将三维特征点X j投影至第i帧第一采集图像上的投影结果,其余参数的含义可以参考前述公开实施例。
式(6)或者式(7)的计算结果均可作为第一约束信息。在本公开的一些实施例中,还可以根据全局特征匹配结果和本地特征匹配结果,共同获取第一约束信息,在这种情况下,可以将式(6)与式(7)中获取第一约束信息的方式结合,以得到第一约束信息。
在本公开的一些实施例中,第二约束信息可以是根据第一IMU数据所得到的约束信息。
在本公开的一些实施例中,可以利用第一终端中采集第一采集图像以及采集第一IMU数据的设备的相关参数,来获取第二约束信息。示例性地,根据第一IMU数据,获取第二约束信息的过程可以通过式(8)实现:
Figure PCTCN2020140274-appb-000018
在式(8)中,C i=( WT i, Wv i,b a,b g)为采集第i帧第一采集图像的情况下第一终端的参数, Wv i为第一终端的速度,b a为第一终端中测量第一IMU数据的设备的加速度偏置,b g为第一终端中测量第一IMU数据的设备的陀螺仪测量偏置,h()为IMU成本函数,其余参数的含义可以参考上述各公开实施例。
在本公开的一些实施例中,可以根据第一终端在采集第一采集图像的过程中,第一IMU数据的变化情况,来确定第二约束信息。
在本公开的一些实施例中,对全局特征匹配结果和本地特征匹配结果进行处理,可以包括:通过光束法平差,对全局特征匹配结果和本地特征匹配结果进行处理。
其中,光束法平差(Bundle Adjustment,BA)是一种位姿求解的实现方式。在本公开的一些实施例中,可以通过BA对约束信息进行求解,计算最小误差下的第一位姿。示例性地,可以将第一约束信息和第二约束信息共同作为约束信息,在这种情况下,通过BA对约束信息进行求解的过程可以通过下述式(9)进行表示:
Figure PCTCN2020140274-appb-000019
其中各参数的含义可以参考前述各公开实施例,在此不再赘述。
在本公开的一些实施例中,可以利用关键帧求解以及增量BA(Incremental Consistent and Efficient Bundle Adjustment,ICE-BA)的求解方法,对式(9)进行计算,从而确定至少一个第一位姿。
通过上述过程,可以利用第一约束信息以及第二约束信息中的至少一种,对得到的第一位姿进行优化,从而使得最终确定的第一位姿整体更加平滑,减小抖动性;并且,利用关键帧以及ICE-BA等方式对第一位姿进行求解,可以有效减小第一位姿确定过程中的计算量,从而提高位姿确定过程的效率。如上述各公开实施例所述,本公开实施例中确定的第一位姿的精度较高,因此本公开实施例中提出的方法,可以应用于移动定位领域中的各类场景,具体应用于何种场景可以根据实际情况进行选择。
在本公开的一些实施例中,本公开实施例中提出的位姿确定方法,可以用于离线确定设备位姿。 在本公开的一些实施例中,通过本公开实施例中提出的位姿确定方法确定的第一位姿,可以用于对一些与移动定位相关的神经网络算法进行结果准确性的评估等。
携带有运动真值的数据集,是研发SLAM技术的重要条件。其中,运动中真值可以用于对SLAM算法的精度进行评价和对比,也可以在对一些极端情况如针对运动模糊、光照变化剧烈、特征点稀少的图片进行处理时,作为SLAM算法精度的提升标准,从而能够提高SLAM算法应对极端场景的能力。在实际应用中,在室外应用场景中,运动真值主要通过GPS获取;在室内应用场景中,运动真值主要通过在室内环境中搭建高精度运动捕捉系统如VICON、lighthouse等实现。
然而,GPS定位精度只有米级别,因此无法实现高精度的运动真值获取,而差分GPS目前可以达到较高的定位精度,但这种方法的成本过高;并且,GPS的精度和定位成功率容易收到建筑物遮挡的影响,且无法在室内使用。对于室内应用场景而言,以VICON为例,该系统是一种基于反射式的捕捉系统,它需要在被捕捉的物体上贴附一种定制的反光球作为信号接收器,当捕捉摄像机发射特定光线时,反光球会反射同样波长的光信号给摄像机,通过多个捕捉摄像机采集到的光信号,就可以计算得到精确的被捕捉物体的定位结果。这种方法需要提前在需要采集的轨迹真值的周围环境中、布置安装并标定VICON等运动捕捉系统的设备,因此无论设备还是部署成本都很高,一个小房间的设备成本就接近百万,更难以扩展至大尺度场景。此外,每个待采集真值的移动设备都需要安装并标定信号接收器,采集每组数据前都需要将接收信号与移动设备上的传感器做时间同步,费时费力,难以扩展至海量数据的采集。
在相关技术中,基于外部信号例如蓝牙、地磁等信号也能够实现实时定位,但这些方法通常需要依赖于先行构建的与定位环境匹配的信号指纹地图,并且,定位精度可以随着在环境中采集到的每个点位的信号的密集程度变化。为了获取每个点位的运动真值,就需要操作人员在定位环境中使用测量工具实地测量,这会产生较高的时间成本和人力成本,因此,无法通过这种方法获得海量的运动真值。
随着深度学习技术的快速发展，许多基于样本数据驱动的定位方法的优越性得以体现。比如，在视觉定位领域，由于深度神经网络、卷积神经网络的快速发展，通过提取大量图像样本数据中的特征点描述子并对其进行匹配的效果，甚至优于传统技术。在步行者航位推算(Pedestrian Dead Reckoning,PDR)领域，基于深度神经网络的行人轨迹恢复方法，也被证明优于传统的基于计步器的方法，甚至在简单条件下已接近视觉惯性SLAM的跟踪精度。然而，这些数据驱动方法的最终表现严重依赖于样本数据，因此对样本数据的质量、数量、场景多样性等方面的需求愈发旺盛，现有的运动真值获取方法无法满足这样的要求。为此，本公开实施例在位姿确定方法中，还提供了运动真值数据的获取方法。
在本公开的一些实施例中,本公开实施例提出的位姿确定方法还包括:
根据第一终端在采集过程中的至少一个第一位姿,确定运动真值数据,其中,运动真值数据用于以下操作中的至少一种:
判断定位结果的精度、对神经网络进行训练以及与全局地图进行信息融合。
其中,运动真值数据可以是神经网络训练中,认定其结果为真实值的数据,即神经网络算法中的Ground Truth数据。由于本公开实施例中确定的第一位姿为第一终端在数据采集这一运动过程中的位姿数据,且精度较高,因此可以将第一位姿作为运动真值数据。
根据第一终端在采集过程中的至少一个第一位姿,确定运动真值数据的过程在本公开实施例中的实现方式可以根据实际情况灵活决定,不局限于下述各公开实施例。
在本公开的一些实施例中,根据第一终端在采集过程中的至少一个第一位姿,确定运动真值数据,可以包括:
将第一终端在采集过程中的至少一个第一位姿作为所述运动真值数据;和/或,
将采集数据中的至少一种,以及第一终端在采集过程中的至少一个第一位姿,作为运动真值数据,其中,采集数据包括:
无线网络WiFi数据、蓝牙数据、地磁数据、超宽带UWB数据、第一采集图像以及第一IMU数据中的一种或多种。
在本公开的一些实施例中，可以直接将确定的至少一个第一位姿，作为运动真值数据。由于确定的第一位姿的数量在本公开实施例中不做限定，因此得到的运动真值数据的数量在本公开实施例中也不做限定，在本公开的一些实施例中，可以将确定的各第一位姿均作为运动真值数据，或者随机地从多个第一位姿中选定一个或多个第一位姿来作为运动真值数据。
在本公开的一些实施例中,还可以将采集数据中的至少一种来作为运动真值数据。在本公开的一些实施例中,采集数据可以包括第一采集图像和/或第一IMU数据;在本公开的一些实施例中,由于第一终端的实现方式不受限定,其采集的数据类型也可能灵活发生变化与扩展,因此采集数据还可以包括无线网络WiFi数据、蓝牙数据、地磁数据以及UWB数据中的一种或多种等。
由于不同类型的采集数据均可以由第一终端进行采集，因此这些采集数据均可以与确定的第一位姿之间具有相应的对应关系，也可以在位姿确定的过程中提供相应的约束，来辅助进行位姿确定。因此在本公开的一些实施例中，可以将多种类型的采集数据也作为运动真值数据。
通过将至少一个第一位姿,以及采集数据中的至少一种作为运动真值数据,可以进一步增加运动真值数据的数据量,从而使得运动真值数据在不同场景下的应用具有更好的效果。
在本公开的一些实施例中,运动真值数据可以用于判断定位结果的精度,具体如何判断在本公开实施例中不做限制。比如,可以将运动真值数据作为神经网络评价算法中用于评判算法准确度的benchmark数据集中的数据,从而用于对定位结果精度的判断。
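作为用运动真值数据判断定位结果精度的一个极简示意，下述代码计算估计轨迹与真值轨迹平移部分的均方根误差（ATE RMSE），并假设两条轨迹已完成时间对齐与坐标系对齐（均为示例性假设）：

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """绝对轨迹误差的RMSE：est_positions与gt_positions为逐时刻对应的Nx3位置数组。"""
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    err = np.linalg.norm(est - gt, axis=1)     # 每个时刻的位置误差
    return float(np.sqrt(np.mean(err ** 2)))   # 均方根误差
```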
在本公开的一些实施例中,运动真值数据也可以用于对神经网络进行训练,具体如何应用在训练过程中在本公开实施例中不做限制。比如,可以将运动真值数据作为神经网络中的训练数据和/或测试数据等,从而应用于神经网络的训练过程中。
在本公开的一些实施例中,运动真值数据还可以与全局地图进行信息融合,比如运动真值数据还可以包括如WiFi数据、蓝牙数据、地磁数据或是UWB数据等采集数据,而这些采集数据与第一位姿之间存在对应关系,因此,可以将这些采集数据作为额外的辅助数据,通过第一位姿与全局地图之间的对应关系,将这些采集数据也融合进全局地图中,从而进一步提升全局地图的数据精度和数据全面性,也可以进一步提升利用融合后的全局地图,进行其余的位姿确定的准确性。
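下面是把与第一位姿对应的WiFi、蓝牙、地磁或UWB等采集数据挂接到全局地图的一个极简示意；其中global_map.add_fingerprint为假设的接口，poses与signal_records按时间戳一一对应，均为示例性假设：

```python
def fuse_signals_into_map(global_map, poses, signal_records):
    """利用第一位姿与全局地图的对应关系，把辅助信号数据融合为地图中的信号指纹层。"""
    for pose, record in zip(poses, signal_records):
        position = pose[:3, 3]                         # 取4x4位姿矩阵的平移部分作为采集点位置
        global_map.add_fingerprint(position, record)   # 假设的地图接口
    return global_map
```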
图4为本公开实施例提供的运动真值数据获取的流程示意图,如图4所示,该运动真值数据获取流程,可以包括全局地图重建401以及运动真值数据定位402两个环节。
其中,全局地图重建401环节,用于重建全局地图。如图4所示,全局地图重建401环节,可以基于雷达SLAM4011、特征匹配4012以及视觉-雷达联合优化4013三个子环节得到全局地图4014。
通过操作人员背负的第二终端在全局场景中移动,从而利用雷达SLAM4011对全局场景中的激光点云进行采集,利用视觉传感器对全局场景中的第二采集图像进行采集,以及利用IMU传感器对全局场景中的第二IMU数据进行采集。
在第二终端对全局场景进行扫描的过程中，可以利用已获取的激光点云、第二采集图像以及第二IMU数据对全局地图进行实时重建，得到实时地图。在本公开实施例中，实时地图可以反映操作人员在全局场景中已经进行地图数据采集的范围，因此可以将实时地图发送至目标设备中。
在第二终端对全局场景进行扫描完成后，可以利用获取的全局场景中的激光点云、第二采集图像以及第二IMU数据对全局地图进行离线重建，得到全局地图。激光点云和第二IMU数据可以通过雷达SLAM4011进行计算，从而确定雷达在地图数据采集过程中的至少一个位姿，而且可以通过雷达与视觉传感器之间的坐标变换关系，将雷达的位姿转换为视觉传感器的位姿，从而得到第二终端的至少一个第二位姿；同时，第二采集图像可以通过特征匹配4012的方式进行视觉地图重建得到至少一帧初始视觉点云；可以利用确定的至少一个第二位姿作为初始位姿，以及第二采集图像中的特征为视觉地图重建过程提供第三约束信息，从而对得到的初始视觉点云进行视觉-雷达联合优化4013。通过上述过程，可以得到优化后的视觉点云，以及视觉点云中包括的三维特征点的位置与特征信息。进一步地，可以将视觉点云以及三维特征点作为全局地图4014，从而实现全局地图的重建。
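对于上述将雷达位姿转换为视觉传感器位姿的步骤，下面给出一个极简示意：利用雷达与视觉传感器之间标定得到的外参做齐次变换链乘，矩阵均为4x4齐次变换，变量命名与坐标系约定为示例性假设：

```python
import numpy as np

def lidar_pose_to_camera_pose(T_world_lidar, T_lidar_camera):
    """由雷达SLAM的雷达位姿与雷达-相机外参，得到视觉传感器的第二位姿。"""
    # 世界<-雷达 与 雷达<-相机 链乘，得到 世界<-相机
    return T_world_lidar @ T_lidar_camera
```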
在完成全局地图重建后，可以进入到运动真值数据定位402的过程，运动真值数据定位402环节需要借助于包括AR眼镜4021或手机4022在内的第一终端实现，其中，运动真值数据定位402可以包括本地特征跟踪4023、全局特征跟踪4024、视觉-惯性联合优化4025以及运动真值数据存储4026四个子环节。
在图4中,通过包括AR眼镜4021或手机4022在内的第一终端,在全局场景中的某个目标场景内进行移动,来获取采集数据。其中,采集数据可以包括第一采集图像以及第一IMU数据。
第一采集图像可以与全局地图进行特征匹配，即全局特征跟踪4024，从而实现视觉定位，得到全局特征匹配结果。第一采集图像中的不同帧图像之间还可以进行本地特征跟踪4023，从而得到本地特征匹配结果。在得到全局特征匹配结果以及本地特征匹配结果以后，可以根据全局特征匹配结果、本地特征匹配结果以及采集的第一IMU数据，进行视觉-惯性联合优化4025，从而确定第一终端在目标场景的移动过程中的至少一个第一位姿。在得到至少一个第一位姿以后，可以将得到的第一位姿作为运动真值数据，并执行运动真值数据存储4026，示例性地，可以将运动真值数据存储在数据库中。
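以下是图4中运动真值数据定位402环节的一个流程示意；match_local、match_global、optimize、store为调用方注入的回调，分别对应本地特征跟踪4023、全局特征跟踪4024、视觉-惯性联合优化4025与运动真值数据存储4026，其接口形式均为示例性假设：

```python
def ground_truth_localization(frames, imu_data, match_local, match_global, optimize, store):
    """串联各子环节，返回第一终端在目标场景移动过程中的各个第一位姿。"""
    poses, prev = [], None
    for frame, imu in zip(frames, imu_data):
        local = match_local(prev, frame) if prev is not None else None  # 本地特征跟踪
        pose = optimize(local, match_global(frame), imu)                # 全局匹配 + 视觉-惯性联合优化
        poses.append(pose)
        prev = frame
    store(poses)                                                        # 第一位姿作为运动真值数据存储
    return poses
```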
本公开实施例提供的运动真值数据的获取方法，所采用的设备主要为集成激光雷达、相机和IMU的高精地图采集设备，因此，设备总体成本较低；且全局场景以及目标场景无需预先布置，因此，尺度扩展性明显优于需预先布置场景的相关方案，其尺度上限主要取决于离线算力，在现有算法和算力已可满足数十万平方米的场景的情况下，本公开实施例提供的运动真值数据获取方法，可用于大尺度场景；同时，同一全局场景中的全局地图可重用，在采集和重建全局地图后即可规模化采集移动终端的海量数据，移动数据的采集只依赖移动设备的内置传感器，因此每次采集前无需进行与其他外部设备的标定、同步等限制规模化采集的额外操作；另外，本公开实施例提供的运动真值数据获取方法，还不受应用场景的限制，可以同时适用于室内外场景。
需要说明的是,本公开实施例获取的运动真值不仅限于用于神经网络的评价或训练中,也可以扩展应用于其他场景,本公开对此不作限定。
可以理解,本公开提及的上述各个方法实施例,在不违背原理逻辑的情况下,均可以彼此相互结合形成结合后的实施例,限于篇幅,本公开不再赘述。本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
此外，本公开还提供了位姿确定装置、电子设备、计算机可读存储介质、程序，上述均可用来实现本公开提供的任一种位姿确定方法，相应技术方案和描述可参见方法部分的相应记载，不再赘述。
图5为本公开实施例提供的位姿确定装置5的结构示意图。该位姿确定装置可以为终端设备、服务器或者其他处理设备等。其中,终端设备可以为UE、移动设备、用户终端、终端、蜂窝电话、无绳电话、PDA、手持设备、计算设备、车载设备、可穿戴设备等。
在本公开的一些实施例中,该位姿确定装置可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
如图5所示,所述位姿确定装置5可以包括:
采集数据获取模块501配置为:获取目标场景中的第一终端采集的采集数据。全局地图获取模块502配置为:获取包含目标场景的全局地图;其中,全局地图,是基于第二终端对包含目标场景的全局场景进行数据采集所获得的地图数据生成的,且全局地图满足精度条件。
位姿确定模块503配置为:根据采集数据以及全局地图之间的特征对应关系,确定第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中,全局地图包括至少一帧视觉点云,视觉点云包括全局场景中的至少一个三维特征点;采集数据包括第一采集图像;位姿确定模块503配置为:将第一采集图像与至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;根据全局特征匹配结果,确定第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中，全局地图包括目标场景中的至少一帧视觉点云；采集数据包括至少两帧第一采集图像；位姿确定模块503配置为：将第一采集图像与至少一帧视觉点云进行特征匹配，得到全局特征匹配结果；对至少两帧第一采集图像进行特征匹配，得到本地特征匹配结果；根据全局特征匹配结果和本地特征匹配结果，确定第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中，采集数据还包括第一惯性测量IMU数据；位姿确定模块503配置为：根据全局特征匹配结果和/或本地特征匹配结果，获取第一约束信息；根据第一IMU数据，获取第二约束信息；根据第一约束信息和第二约束信息中的至少一种，对全局特征匹配结果和本地特征匹配结果进行处理，得到第一终端在采集过程中的至少一个第一位姿。
在本公开的一些实施例中,位姿确定模块503配置为:通过光束法平差,对全局特征匹配结果和本地特征匹配结果进行处理。
在本公开的一些实施例中,位姿确定模块配置为:将第一采集图像中的二维特征点,与至少一帧视觉点云包括的三维特征点进行匹配,得到全局特征匹配结果。
在本公开的一些实施例中,装置还包括:运动真值数据获取模块;运动真值数据获取模块配置为:根据第一终端在采集过程中的至少一个第一位姿,确定运动真值数据。
在本公开的一些实施例中,运动真值数据获取模块配置为:将第一终端在采集过程中的至少一个第一位姿作为运动真值数据;和/或,将采集数据中的至少一种,以及第一终端在采集过程中的至少一个第一位姿,作为运动真值数据;其中,采集数据包括:无线网络WiFi数据、蓝牙数据、地磁数据、超宽带UWB数据、第一采集图像以及第一IMU数据中的一种或多种。
在本公开的一些实施例中,运动真值数据用于以下操作中的至少一种:判断定位结果的精度、对神经网络进行训练以及与全局地图进行信息融合。
在本公开的一些实施例中,地图数据包括:全局场景中的激光点云、第二采集图像以及第二IMU数据;装置还包括:地图数据获取模块和全局地图生成模块;其中,地图数据获取模块配置为:获取通过第二终端采集的全局场景的地图数据;全局地图生成模块配置为:根据地图数据,对全局场景进行离线重建,生成全局场景的全局地图。
在本公开的一些实施例中,全局地图生成模块配置为:根据第二IMU数据和激光点云,确定第二终端在数据采集过程中的至少一个第二位姿;根据至少一个第二位姿、结合第二采集图像,对全局场景进行视觉地图重建,得到至少一帧视觉点云,其中,视觉点云与全局场景中的多个三维特征点对应;根据至少一帧视觉点云,得到全局场景的全局地图。
在本公开的一些实施例中,全局地图生成模块配置为:根据至少一个第二位姿,结合第二采集图像,对全局场景进行视觉地图重建,得到至少一帧初始视觉点云;根据激光点云和/或第二采集图像,获取视觉地图重建过程中的第三约束信息;根据第三约束信息,对至少一帧初始视觉点云进行优化,得到至少一帧视觉点云;其中,第三约束信息,包括激光点云的平面约束信息、激光点云的边缘约束信息以及视觉约束信息中的一种或多种。
在本公开的一些实施例中，第二终端包括：雷达，配置为获取全局场景中的激光点云；视觉传感器，配置为获取全局场景中的第二采集图像；IMU传感器，配置为获取全局场景中的第二IMU数据。
在本公开的一些实施例中,位姿确定装置5配置为:对视觉传感器与IMU传感器之间的坐标变换关系进行标定,得到第一标定结果;对雷达与视觉传感器之间的坐标变换关系进行标定,得到第二标定结果;根据第一标定结果和第二标定结果,对视觉传感器、IMU传感器以及雷达之间的坐标变换关系进行联合标定。
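对于上述联合标定，下面给出一个极简示意：由视觉传感器-IMU之间的第一标定结果与雷达-视觉传感器之间的第二标定结果，链乘得到雷达-IMU之间的坐标变换，从而把三种传感器统一到同一组坐标变换关系下；矩阵为4x4齐次变换，命名为示例性假设：

```python
import numpy as np

def chain_extrinsics(T_camera_imu, T_lidar_camera):
    """由第一标定结果(相机<-IMU)与第二标定结果(雷达<-相机)得到 雷达<-IMU 的变换。"""
    T_lidar_imu = T_lidar_camera @ T_camera_imu   # 链乘两个外参
    return T_lidar_imu
```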
在本公开的一些实施例中,位姿确定装置5配置为:在第二终端采集地图数据的过程中,根据地图数据对全局场景进行实时重建,生成全局场景的实时地图;向目标设备发送地图数据和/或实时地图,其中,目标设备,配置为显示对全局场景完成数据采集的地理范围。
本公开实施例还提出一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。计算机可读存储介质可以是非易失性计算机可读存储介质。
本公开实施例还提出一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行上述方法。
本公开实施例还提供了一种计算机程序产品，该计算机程序产品包括计算机可读代码，当计算机可读代码在电子设备中运行时，电子设备中的处理器执行用于实现如上任一实施例提供的位姿确定方法的操作。
本公开实施例还提供了另一种计算机程序产品,用于存储计算机可读指令,指令被执行时使得计算机执行上述任一实施例提供的位姿确定方法的操作。
电子设备可以被提供为终端、服务器或其它形态的设备。
图6示出根据本公开实施例的一种电子设备6的框图。例如,电子设备6可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等终端。
参照图6,电子设备6可以包括以下一个或多个组件:处理器601,第一存储器602,第一电源组件603,多媒体组件604,音频组件605,第一输入/输出接口606,传感器组件607,以及通信组件608。
处理器601通常控制电子设备6的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理器601的数量可以为一个或多个,处理器601可以包括一个或多个模块,便于处理器601和其他组件之间的交互。例如,处理器601可以包括多媒体模块,以方便其与多媒体组件604之间的交互。
第一存储器602被配置为存储各种类型的数据以支持在电子设备6的操作。这些数据的示例包括用于在电子设备6上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。第一存储器602可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(Static Random-Access Memory,SRAM),电可擦除可编程只读存储器(Electrically Erasable Programmable read only memory,EEPROM),可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM),可编程只读存储器(Programmable Read-Only Memory,PROM),只读存储器(Read-Only Memory,ROM),磁存储器,快闪存储器,磁盘或光盘。
第一电源组件603为电子设备6的各种组件提供电能。第一电源组件603可以包括电源管理系统,一个或多个电源,及其他与为电子设备6生成、管理和分配电力相关联的组件。
多媒体组件604包括在所述电子设备6和用户之间提供一个输出接口的屏幕。在一些实施例中，屏幕可以包括液晶显示器(Liquid Crystal Display,LCD)和触摸面板(Touch Panel,TP)。如果屏幕包括触摸面板，屏幕可以被实现为触摸屏，以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器不仅可以感测触摸或滑动动作的边界，而且还可以检测与所述触摸或滑动操作相关的持续时间和压力。多媒体组件604包括前置摄像头和/或后置摄像头。当电子设备6处于操作模式如拍摄模式或视频模式的情况下，前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统，或者是具有焦距和光学变焦能力的光学透镜系统。
音频组件605被配置为输出和/或输入音频信号。例如，音频组件605包括一个麦克风(Microphone,MIC)，当电子设备6处于操作模式，如呼叫模式、记录模式和语音识别模式时，麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在第一存储器602或经由通信组件608发送。音频组件605还包括一个扬声器，用于输出音频信号。
第一输入/输出接口606为处理器601与外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。
传感器组件607包括一个或多个传感器，用于为电子设备6提供各个方面的状态评估。例如，传感器组件607可以检测到电子设备6的打开/关闭状态，组件的相对定位，例如所述组件为电子设备6的显示器和小键盘，传感器组件607还可以检测电子设备6的位置改变、或电子设备6某个组件的位置改变、用户与电子设备6接触的存在或不存在、电子设备6方位或加速/减速和电子设备6的温度变化。传感器组件607可以包括接近传感器，被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件607还可以包括光传感器，如互补金属氧化物半导体(Complementary Metal Oxide Semiconductor,CMOS)图像传感器或电荷耦合器件(Charge-coupled Device,CCD)图像传感器，用于在成像应用中使用。该传感器组件607还可以包括加速度传感器，陀螺仪传感器，磁传感器，压力传感器或温度传感器。
通信组件608被配置为便于电子设备6和其他设备之间有线或无线方式的通信。电子设备6可以接入基于通信标准的无线网络,如WiFi,第二代无线通信技术(The 2nd Generation,2G)或第三代移动通信技术(The 3rd Generation,3G),或它们的组合。在一个示例性实施例中,通信组件608经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件608还包括近场通信(Near Field Communication,NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(Radio Frequency Identification,RFID)技术,红外数据协会(Infrared Data Association, IrDA)技术,UWB技术,蓝牙(Blue-Tooth,BT)技术和其他技术来实现。
在示例性实施例中,电子设备6可以被一个或多个应用专用集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理设备(Digital Signal Processing Device,DSPD)、可编程逻辑器件(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。
在示例性实施例中,还提供了一种非易失性计算机可读存储介质,例如包括计算机程序指令的第一存储器602,上述计算机程序指令可由电子设备6的处理器601执行以完成前述实施例所述的位姿确定方法。
图7为本公开实施例的第二种电子设备6的结构示意图。例如,电子设备6可以被提供为一服务器。参照图7,电子设备6包括处理组件701,其中,处理组件701可以包括一个或多个处理器601;电子设备6还包括由第二存储器702所代表的存储器资源,第二存储器702被配置为存储处理组件701的执行的指令,例如应用程序。第二存储器702中存储的应用程序可以包括至少一组指令。此外,处理组件701被配置为执行指令,以执行上述位姿确定方法。
电子设备6还可以包括第二电源组件703、被配置为将电子设备6连接到网络的网络接口704，以及第二输入/输出接口705。其中，第二电源组件703被配置为执行电子设备6的电源管理。电子设备6可以操作存储在第二存储器702的操作系统，例如Windows ServerTM，Mac OS XTM，UnixTM，LinuxTM，FreeBSDTM或类似。
本公开实施例还提供了一种非易失性计算机可读存储介质，该存储介质中存储有计算机程序指令，例如包括计算机程序指令的第一存储器602或第二存储器702，上述计算机程序指令可由电子设备6的处理组件701执行，以完成上述位姿确定方法。
本公开实施例还提供了一种计算机程序,计算机程序包括计算机可读代码,在计算机可读代码在电子设备中运行的情况下,电子设备的处理器执行如前任一实施例提供的位姿确定方法。
本公开可以是系统、方法和/或计算机程序产品。计算机程序产品可以包括计算机可读存储介质,其上载有用于使处理器实现本公开的各个方面的计算机可读程序指令。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(Random Access Memory,RAM)、ROM、EPROM或闪存、静态随机存取存储器(Static Random-Access Memory,SRAM)、便携式压缩盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、数字多功能盘(Digital Video Disc,DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本公开操作的计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任 意种类的网络—包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、FPGA或可编程逻辑阵列(Programmable logic arrays,PLA),该电子电路可以执行计算机可读程序指令,从而实现本公开的各个方面。
这里参照根据本公开实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本公开的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。
工业实用性
本申请实施例公开了一种位姿确定方法、装置、电子设备、存储介质及程序,所述方法包括:获取目标场景中的第一终端采集的采集数据;获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。本申请实施例提供的位姿确定方法,能够降低第一位姿的获取成本,且还能改善第一位姿的精度。

Claims (33)

  1. 一种位姿确定方法,所述方法包括:
    获取目标场景中的第一终端采集的采集数据;
    获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;
    根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。
  2. 根据权利要求1所述的方法,其中,所述全局地图包括至少一帧视觉点云,所述视觉点云包括所述全局场景中的至少一个三维特征点;所述采集数据包括第一采集图像;
    所述根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿,包括:
    将所述第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;
    根据所述全局特征匹配结果,确定所述第一终端在采集过程中的至少一个所述第一位姿。
  3. 根据权利要求1所述的方法,其中,所述全局地图包括所述目标场景中的至少一帧视觉点云;所述采集数据包括至少两帧第一采集图像;
    所述根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿,包括:
    将所述第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;
    将所述至少两帧第一采集图像进行特征匹配,得到本地特征匹配结果;
    根据所述全局特征匹配结果和所述本地特征匹配结果,确定所述第一终端在采集过程中的至少一个所述第一位姿。
  4. 根据权利要求3所述的方法,其中,所述采集数据还包括第一惯性测量IMU数据;
    所述根据所述全局特征匹配结果和所述本地特征匹配结果,确定所述第一终端在采集过程中的至少一个第一位姿,包括:
    根据所述全局特征匹配结果和/或所述本地特征匹配结果,获取第一约束信息;
    根据所述第一IMU数据,获取第二约束信息;
    根据所述第一约束信息和所述第二约束信息中的至少一种,对所述全局特征匹配结果和所述本地特征匹配结果进行处理,得到所述第一终端在采集过程中的至少一个所述第一位姿。
  5. 根据权利要求4所述的方法,其中,所述对所述全局特征匹配结果和所述本地特征匹配结果进行处理,包括:
    通过光束法平差,对所述全局特征匹配结果和所述本地特征匹配结果进行处理。
  6. 根据权利要求2至5中任意一项所述的方法,其中,所述将所述第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果,包括:
    将所述第一采集图像中的二维特征点,与所述至少一帧视觉点云包括的三维特征点进行匹配,得到所述全局特征匹配结果。
  7. 根据权利要求1所述的方法,其中,所述方法还包括:
    根据所述第一终端在采集过程中的至少一个所述第一位姿,确定运动真值数据。
  8. 根据权利要求7所述的方法,其中,所述根据所述第一终端在采集过程中的至少一个所述第一位姿,确定运动真值数据,包括:
    将所述第一终端在采集过程中的至少一个所述第一位姿作为所述运动真值数据;
    和/或,
    将所述采集数据中的至少一种,以及所述第一终端在采集过程中的至少一个所述第一位姿,作为 所述运动真值数据,其中,所述采集数据包括:
    无线网络WiFi数据、蓝牙数据、地磁数据、超宽带UWB数据、第一采集图像以及第一IMU数据中的一种或多种。
  9. 根据权利要求7或8任一所述的方法,其中,所述运动真值数据用于以下操作中的至少一种:
    判断定位结果的精度、对神经网络进行训练以及与所述全局地图进行信息融合。
  10. 根据权利要求1所述的方法,其中,所述地图数据包括:所述全局场景中的激光点云、第二采集图像以及第二IMU数据;
    所述方法还包括:
    获取通过所述第二终端采集的所述全局场景的地图数据;
    根据所述地图数据,对所述全局场景进行离线重建,生成所述全局场景的全局地图。
  11. 根据权利要求10所述的方法,其中,所述根据所述地图数据,对所述全局场景进行离线重建,生成所述全局场景的全局地图,包括:
    根据所述第二IMU数据以及所述激光点云,确定所述第二终端在数据采集过程中的至少一个第二位姿;
    根据至少一个所述第二位姿、结合所述第二采集图像,对所述全局场景进行视觉地图重建,得到至少一帧视觉点云;其中,所述视觉点云包括所述全局场景中的至少一个三维特征点;
    根据所述至少一帧视觉点云,得到所述全局场景的全局地图。
  12. 根据权利要求11所述的方法,其中,所述根据所述至少一个所述第二位姿、结合所述第二采集图像,对所述全局场景进行视觉地图重建,得到至少一帧视觉点云,包括:
    根据所述至少一个所述第二位姿、结合所述第二采集图像,对所述全局场景进行视觉地图重建,得到至少一帧初始视觉点云;
    根据所述激光点云和/或所述第二采集图像,获取视觉地图重建过程中的第三约束信息;其中,所述第三约束信息,包括所述激光点云的平面约束信息、所述激光点云的边缘约束信息以及视觉约束信息中的一种或多种;
    根据所述第三约束信息,对所述至少一帧初始视觉点云进行优化,得到至少一帧视觉点云。
  13. 根据权利要求10所述的方法,其中,所述第二终端包括:
    雷达,用于获取所述全局场景中的激光点云;
    视觉传感器,用于获取所述全局场景中的第二采集图像;
    IMU传感器,用于获取所述全局场景中的第二IMU数据。
  14. 根据权利要求13所述的方法,其中,所述根据所述地图数据,对所述全局场景进行离线重建,生成所述全局场景的全局地图之前,还包括:
    对所述视觉传感器与所述IMU传感器之间的坐标变换关系进行标定,得到第一标定结果;
    对所述雷达与所述视觉传感器之间的坐标变换关系进行标定,得到第二标定结果;
    根据所述第一标定结果和所述第二标定结果,对所述视觉传感器、IMU传感器以及雷达之间的坐标变换关系进行联合标定。
  15. 根据权利要求10至14中任意一项所述的方法,其中,所述方法还包括:
    在所述第二终端采集所述地图数据的过程中,根据所述地图数据对所述全局场景进行实时重建,生成所述全局场景的实时地图;
    向目标设备发送所述地图数据和/或所述实时地图;其中,所述目标设备,用于显示对所述全局场景完成数据采集的地理范围。
  16. 一种位姿确定装置,包括:
    采集数据获取模块配置为:获取目标场景中的第一终端采集的采集数据;
    全局地图获取模块配置为:获取包含所述目标场景的全局地图;其中,所述全局地图,是基于第二终端对包含所述目标场景的全局场景进行数据采集所获得的地图数据生成的,且所述全局地图满足精度条件;
    位姿确定模块配置为:根据所述采集数据以及所述全局地图之间的特征对应关系,确定所述第一终端在采集过程中的至少一个第一位姿。
  17. 根据权利要求16所述的装置,其中,所述全局地图包括至少一帧视觉点云,所述视觉点云包括所述全局场景中的至少一个三维特征点;所述采集数据包括第一采集图像;
    所述位姿确定模块配置为:将所述第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;根据所述全局特征匹配结果,确定所述第一终端在采集过程中的至少一个所述第一位姿。
  18. 根据权利要求16所述的装置,其中,所述全局地图包括所述目标场景中的至少一帧视觉点云;所述采集数据包括至少两帧第一采集图像;
    所述位姿确定模块配置为:将所述第一采集图像与所述至少一帧视觉点云进行特征匹配,得到全局特征匹配结果;将所述至少两帧第一采集图像进行特征匹配,得到本地特征匹配结果;根据所述全局特征匹配结果和所述本地特征匹配结果,确定所述第一终端在采集过程中的至少一个所述第一位姿。
  19. 根据权利要求18所述的装置,其中,所述采集数据还包括第一惯性测量IMU数据;
    所述位姿确定模块配置为:根据所述全局特征匹配结果和/或所述本地特征匹配结果,获取第一约束信息;根据所述第一IMU数据,获取第二约束信息;根据所述第一约束信息和所述第二约束信息中的至少一种,对所述全局特征匹配结果和所述本地特征匹配结果进行处理,得到所述第一终端在采集过程中的至少一个所述第一位姿。
  20. 根据权利要求19所述的装置,其中,
    所述位姿确定模块配置为:通过光束法平差,对所述全局特征匹配结果和所述本地特征匹配结果进行处理。
  21. 根据权利要求17至20任一所述的装置,其中:
    所述位姿确定模块配置为:将所述第一采集图像中的二维特征点,与所述至少一帧视觉点云包括的三维特征点进行匹配,得到所述全局特征匹配结果。
  22. 根据权利要求16所述的装置,所述装置还包括运动真值获取模块,其中,
    所述运动真值获取模块配置为:根据所述第一终端在采集过程中的至少一个所述第一位姿,确定运动真值数据。
  23. 根据权利要求22所述的装置,其中,
    所述运动真值获取模块配置为：将所述第一终端在采集过程中的至少一个所述第一位姿作为所述运动真值数据；
    和/或,
    将所述采集数据中的至少一种,以及所述第一终端在采集过程中的至少一个所述第一位姿,作为所述运动真值数据,其中,所述采集数据包括:无线网络WiFi数据、蓝牙数据、地磁数据、超宽带UWB数据、第一采集图像以及第一IMU数据中的一种或多种。
  24. 根据权利要求22或23任一所述的装置,其中,
    所述运动真值数据用于以下操作中的至少一种:
    判断定位结果的精度、对神经网络进行训练以及与所述全局地图进行信息融合。
  25. 根据权利要求16所述的装置,其中,所述地图数据包括:所述全局场景中的激光点云、第二采集图像以及第二IMU数据;所述装置还包括地图数据获取模块以及全局地图生成模块;
    所述地图数据获取模块配置为:获取通过所述第二终端采集的所述全局场景的地图数据;
    所述全局地图生成模块配置为:根据所述地图数据,对所述全局场景进行离线重建,生成所述全局场景的全局地图。
  26. 根据权利要求25所述的装置,其中,
    所述全局地图生成模块配置为:根据所述第二IMU数据以及所述激光点云,确定所述第二终端在数据采集过程中的至少一个第二位姿;根据所述至少一个所述第二位姿、结合所述第二采集图像,对 所述全局场景进行视觉地图重建,得到至少一帧视觉点云;根据所述至少一帧视觉点云,得到所述全局场景的全局地图;其中,所述视觉点云包括所述全局场景中的至少一个三维特征点。
  27. 根据权利要求26所述的装置,其中,
    所述全局地图生成模块配置为：根据所述至少一个所述第二位姿、结合所述第二采集图像，对所述全局场景进行视觉地图重建，得到至少一帧初始视觉点云；根据所述激光点云和/或所述第二采集图像，获取视觉地图重建过程中的第三约束信息；根据所述第三约束信息，对所述至少一帧初始视觉点云进行优化，得到至少一帧视觉点云；其中，所述第三约束信息，包括所述激光点云的平面约束信息、所述激光点云的边缘约束信息以及视觉约束信息中的一种或多种。
  28. 根据权利要求25所述的装置,其中,所述第二终端包括:
    雷达配置为:获取所述全局场景中的激光点云;
    视觉传感器配置为:获取所述全局场景中的第二采集图像;
    IMU传感器配置为:获取所述全局场景中的第二IMU数据。
  29. 根据权利要求28所述的装置,所述装置配置为:对所述视觉传感器与所述IMU传感器之间的坐标变换关系进行标定,得到第一标定结果;对所述雷达与所述视觉传感器之间的坐标变换关系进行标定,得到第二标定结果;根据所述第一标定结果和所述第二标定结果,对所述视觉传感器、IMU传感器以及雷达之间的坐标变换关系进行联合标定。
  30. 根据权利要求25至29任一所述的装置,其中,
    所述装置配置为:在所述第二终端采集所述地图数据的过程中,根据所述地图数据对所述全局场景进行实时重建,生成所述全局场景的实时地图;向目标设备发送所述地图数据和/或所述实时地图,其中,所述目标设备用于显示对所述全局场景完成数据采集的地理范围。
  31. 一种电子设备,包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为调用所述存储器存储的指令,以执行权利要求1至15中任意一项所述的位姿确定方法。
  32. 一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现权利要求1至15中任意一项所述的位姿确定方法。
  33. 一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行用于实现如权利要求1至15任一项所述的位姿确定方法。
PCT/CN2020/140274 2020-08-17 2020-12-28 位姿确定方法、装置、电子设备、存储介质及程序 WO2022036980A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021568700A JP7236565B2 (ja) 2020-08-17 2020-12-28 位置姿勢決定方法、装置、電子機器、記憶媒体及びコンピュータプログラム
KR1020227003200A KR20220028042A (ko) 2020-08-17 2020-12-28 포즈 결정 방법, 장치, 전자 기기, 저장 매체 및 프로그램

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010826704.X 2020-08-17
CN202010826704.XA CN111983635B (zh) 2020-08-17 2020-08-17 位姿确定方法及装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022036980A1 true WO2022036980A1 (zh) 2022-02-24

Family

ID=73435659

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140274 WO2022036980A1 (zh) 2020-08-17 2020-12-28 位姿确定方法、装置、电子设备、存储介质及程序

Country Status (5)

Country Link
JP (1) JP7236565B2 (zh)
KR (1) KR20220028042A (zh)
CN (2) CN111983635B (zh)
TW (1) TW202208879A (zh)
WO (1) WO2022036980A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439536A (zh) * 2022-08-18 2022-12-06 北京百度网讯科技有限公司 视觉地图更新方法、装置及电子设备
CN115497087A (zh) * 2022-11-18 2022-12-20 广州煌牌自动设备有限公司 一种餐具姿态的识别系统及其方法
CN117636251A (zh) * 2023-11-30 2024-03-01 交通运输部公路科学研究所 一种基于机器人的灾损检测方法和系统
CN118089743A (zh) * 2024-04-24 2024-05-28 广州中科智云科技有限公司 一种换流站专用无人机智能导航与摄像系统

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111983635B (zh) * 2020-08-17 2022-03-29 浙江商汤科技开发有限公司 位姿确定方法及装置、电子设备和存储介质
CN112433211B (zh) * 2020-11-27 2022-11-29 浙江商汤科技开发有限公司 一种位姿确定方法及装置、电子设备和存储介质
WO2022133986A1 (en) * 2020-12-25 2022-06-30 SZ DJI Technology Co., Ltd. Accuracy estimation method and system
CN113108792A (zh) * 2021-03-16 2021-07-13 中山大学 Wi-Fi指纹地图重建方法、装置、终端设备及介质
CN112948411B (zh) * 2021-04-15 2022-10-18 深圳市慧鲤科技有限公司 位姿数据的处理方法及接口、装置、系统、设备和介质
CN114827727B (zh) * 2022-04-25 2024-05-07 深圳创维-Rgb电子有限公司 电视控制方法、装置、电视及计算机可读存储介质
WO2024085266A1 (ko) * 2022-10-17 2024-04-25 삼성전자 주식회사 초광대역 통신 신호를 이용하여 제스처를 검출하기 위한 방법 및 장치
CN116202511B (zh) * 2023-05-06 2023-07-07 煤炭科学研究总院有限公司 长巷道超宽带一维约束下的移动装备位姿确定方法及装置
CN116518961B (zh) * 2023-06-29 2023-09-01 煤炭科学研究总院有限公司 大规模固定视觉传感器全局位姿的确定方法和装置

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017097402A (ja) * 2015-11-18 2017-06-01 株式会社明電舎 周辺地図作成方法、自己位置推定方法および自己位置推定装置
CN106940186A (zh) * 2017-02-16 2017-07-11 华中科技大学 一种机器人自主定位与导航方法及系统
CN108801243A (zh) * 2017-04-28 2018-11-13 宏达国际电子股份有限公司 追踪系统及方法
CN109084732A (zh) * 2018-06-29 2018-12-25 北京旷视科技有限公司 定位与导航方法、装置及处理设备
CN109727288A (zh) * 2017-12-28 2019-05-07 北京京东尚科信息技术有限公司 用于单目同时定位与地图构建的系统和方法
CN110335316A (zh) * 2019-06-28 2019-10-15 Oppo广东移动通信有限公司 基于深度信息的位姿确定方法、装置、介质与电子设备
CN110389348A (zh) * 2019-07-30 2019-10-29 四川大学 基于激光雷达与双目相机的定位与导航方法及装置
CN110849374A (zh) * 2019-12-03 2020-02-28 中南大学 地下环境定位方法、装置、设备及存储介质
CN111983635A (zh) * 2020-08-17 2020-11-24 浙江商汤科技开发有限公司 位姿确定方法及装置、电子设备和存储介质

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101192825B1 (ko) * 2011-06-30 2012-10-18 서울시립대학교 산학협력단 Gps/ins/영상at를 통합한 라이다 지오레퍼린싱 장치 및 방법
JP6354120B2 (ja) * 2013-05-21 2018-07-11 株式会社デンソー 道路情報送信装置、地図生成装置、道路情報収集システム
EP3078935A1 (en) * 2015-04-10 2016-10-12 The European Atomic Energy Community (EURATOM), represented by the European Commission Method and device for real-time mapping and localization
JP6768416B2 (ja) * 2015-09-08 2020-10-14 キヤノン株式会社 画像処理装置、画像合成装置、画像処理システム、画像処理方法、及びプログラム
CN108475433B (zh) * 2015-11-20 2021-12-14 奇跃公司 用于大规模确定rgbd相机姿势的方法和系统
WO2018126228A1 (en) * 2016-12-30 2018-07-05 DeepMap Inc. Sign and lane creation for high definition maps used for autonomous vehicles
JP2019074532A (ja) * 2017-10-17 2019-05-16 有限会社ネットライズ Slamデータに実寸法を付与する方法とそれを用いた位置測定
CN108489482B (zh) * 2018-02-13 2019-02-26 视辰信息科技(上海)有限公司 视觉惯性里程计的实现方法及系统
CN108765487B (zh) * 2018-06-04 2022-07-22 百度在线网络技术(北京)有限公司 重建三维场景的方法、装置、设备和计算机可读存储介质
CN109737983B (zh) * 2019-01-25 2022-02-22 北京百度网讯科技有限公司 用于生成行驶路径的方法和装置
CN110118554B (zh) * 2019-05-16 2021-07-16 达闼机器人有限公司 基于视觉惯性的slam方法、装置、存储介质和设备
CN110246182B (zh) * 2019-05-29 2021-07-30 达闼机器人有限公司 基于视觉的全局地图定位方法、装置、存储介质和设备
CN111442722B (zh) * 2020-03-26 2022-05-17 达闼机器人股份有限公司 定位方法、装置、存储介质及电子设备

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017097402A (ja) * 2015-11-18 2017-06-01 株式会社明電舎 周辺地図作成方法、自己位置推定方法および自己位置推定装置
CN106940186A (zh) * 2017-02-16 2017-07-11 华中科技大学 一种机器人自主定位与导航方法及系统
CN108801243A (zh) * 2017-04-28 2018-11-13 宏达国际电子股份有限公司 追踪系统及方法
CN109727288A (zh) * 2017-12-28 2019-05-07 北京京东尚科信息技术有限公司 用于单目同时定位与地图构建的系统和方法
CN109084732A (zh) * 2018-06-29 2018-12-25 北京旷视科技有限公司 定位与导航方法、装置及处理设备
CN110335316A (zh) * 2019-06-28 2019-10-15 Oppo广东移动通信有限公司 基于深度信息的位姿确定方法、装置、介质与电子设备
CN110389348A (zh) * 2019-07-30 2019-10-29 四川大学 基于激光雷达与双目相机的定位与导航方法及装置
CN110849374A (zh) * 2019-12-03 2020-02-28 中南大学 地下环境定位方法、装置、设备及存储介质
CN111983635A (zh) * 2020-08-17 2020-11-24 浙江商汤科技开发有限公司 位姿确定方法及装置、电子设备和存储介质

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439536A (zh) * 2022-08-18 2022-12-06 北京百度网讯科技有限公司 视觉地图更新方法、装置及电子设备
CN115439536B (zh) * 2022-08-18 2023-09-26 北京百度网讯科技有限公司 视觉地图更新方法、装置及电子设备
CN115497087A (zh) * 2022-11-18 2022-12-20 广州煌牌自动设备有限公司 一种餐具姿态的识别系统及其方法
CN115497087B (zh) * 2022-11-18 2024-04-19 广州煌牌自动设备有限公司 一种餐具姿态的识别系统及其方法
CN117636251A (zh) * 2023-11-30 2024-03-01 交通运输部公路科学研究所 一种基于机器人的灾损检测方法和系统
CN117636251B (zh) * 2023-11-30 2024-05-17 交通运输部公路科学研究所 一种基于机器人的灾损检测方法和系统
CN118089743A (zh) * 2024-04-24 2024-05-28 广州中科智云科技有限公司 一种换流站专用无人机智能导航与摄像系统

Also Published As

Publication number Publication date
JP7236565B2 (ja) 2023-03-09
TW202208879A (zh) 2022-03-01
CN111983635B (zh) 2022-03-29
CN111983635A (zh) 2020-11-24
KR20220028042A (ko) 2022-03-08
JP2022548441A (ja) 2022-11-21
CN114814872A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2022036980A1 (zh) 位姿确定方法、装置、电子设备、存储介质及程序
US11165959B2 (en) Connecting and using building data acquired from mobile devices
WO2022262152A1 (zh) 地图构建方法及装置、电子设备、存储介质和计算机程序产品
US9646384B2 (en) 3D feature descriptors with camera pose information
CN112020855B (zh) 用于稳定视频以减少相机和人脸移动的方法、系统和设备
TWI753348B (zh) 位姿確定方法、位姿確定裝置、電子設備和電腦可讀儲存媒介
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US10545215B2 (en) 4D camera tracking and optical stabilization
WO2022077296A1 (zh) 三维重建方法、云台负载、可移动平台以及计算机可读存储介质
US20220084249A1 (en) Method for information processing, electronic equipment, and storage medium
WO2023103377A1 (zh) 标定方法及装置、电子设备、存储介质及计算机程序产品
CN112432637B (zh) 定位方法及装置、电子设备和存储介质
CN113066135A (zh) 图像采集设备的标定方法及装置、电子设备和存储介质
CA3069813C (en) Capturing, connecting and using building interior data from mobile devices
WO2021088497A1 (zh) 虚拟物体显示方法、全局地图更新方法以及设备
CN113063421A (zh) 导航方法及相关装置、移动终端、计算机可读存储介质
CN112700468A (zh) 位姿确定方法及装置、电子设备和存储介质
WO2022110801A1 (zh) 数据处理方法及装置、电子设备和存储介质
KR20220169472A (ko) 센서 캘리브레이트 방법 및 장치, 전자 기기와 저장 매체
US20220345621A1 (en) Scene lock mode for capturing camera images
CN117115244A (zh) 云端重定位方法、装置及存储介质
CN116664887A (zh) 定位精度确定方法、装置、电子设备及可读存储介质
CN112308878A (zh) 一种信息处理方法、装置、电子设备和存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021568700

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227003200

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20950180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20950180

Country of ref document: EP

Kind code of ref document: A1
