WO2022036980A1 - Pose determination method and apparatus, electronic device, storage medium and program - Google Patents
Pose determination method and apparatus, electronic device, storage medium and program
- Publication number
- WO2022036980A1 · PCT/CN2020/140274
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- global
- data
- map
- point cloud
- scene
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 225
- 238000003860 storage Methods 0.000 title claims abstract description 31
- 230000008569 process Effects 0.000 claims abstract description 117
- 230000000007 visual effect Effects 0.000 claims description 179
- 230000033001 locomotion Effects 0.000 claims description 83
- 238000013480 data collection Methods 0.000 claims description 33
- 238000004590 computer program Methods 0.000 claims description 27
- 230000009466 transformation Effects 0.000 claims description 27
- 238000012545 processing Methods 0.000 claims description 22
- 238000013528 artificial neural network Methods 0.000 claims description 14
- 238000005259 measurement Methods 0.000 claims description 8
- 238000004364 calculation method Methods 0.000 abstract description 3
- 238000010586 diagram Methods 0.000 description 20
- 238000004891 communication Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 15
- 238000005516 engineering process Methods 0.000 description 13
- 238000005457 optimization Methods 0.000 description 12
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000012549 training Methods 0.000 description 8
- 230000003190 augmentative effect Effects 0.000 description 7
- 230000004807 localization Effects 0.000 description 6
- 230000003287 optical effect Effects 0.000 description 6
- 230000005540 biological transmission Effects 0.000 description 4
- 238000011161 development Methods 0.000 description 4
- 238000013507 mapping Methods 0.000 description 4
- 230000005236 sound signal Effects 0.000 description 4
- 230000001133 acceleration Effects 0.000 description 3
- 230000009471 action Effects 0.000 description 3
- 239000011521 glass Substances 0.000 description 3
- 238000007726 management method Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 230000003068 static effect Effects 0.000 description 3
- 230000001413 cellular effect Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 239000000835 fiber Substances 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 230000001902 propagating effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
Definitions
- the present disclosure relates to the technical field of computer vision, and relates to, but is not limited to, a pose determination method, apparatus, electronic device, storage medium and computer program.
- the embodiments of the present disclosure provide a pose determination method, apparatus, electronic device, storage medium, and computer program.
- An embodiment of the present disclosure provides a method for determining a pose, the method including: acquiring collection data collected by a first terminal in a target scene; acquiring a global map including the target scene, where the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene and satisfies an accuracy condition; and determining, according to a feature correspondence between the collection data and the global map, at least one first pose of the first terminal during the collection process.
- An embodiment of the present disclosure also provides a device for determining a pose, the device comprising:
- the collected-data acquisition module is configured to acquire the collection data collected by the first terminal in the target scene;
- the global map obtaining module is configured to: obtain a global map including the target scene; wherein, the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map satisfies the accuracy condition;
- the pose determination module is configured to: determine at least one first pose of the first terminal in the collection process according to the feature correspondence between the collected data and the global map.
- the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene;
- the collected data includes a first collected image;
- An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the instructions stored in the memory to execute the pose determination method described in any of the preceding embodiments.
- Embodiments of the present disclosure further provide a computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the pose determination method described in any of the preceding embodiments.
- Embodiments of the present disclosure further provide a computer program, the computer program including computer-readable code, where, when the computer-readable code runs in an electronic device, a processor of the electronic device executes the pose determination method described in any of the preceding embodiments.
- In the embodiments of the present disclosure, by obtaining the collection data collected by the first terminal in the target scene and obtaining the global map including the target scene, at least one first pose of the first terminal during the collection process is determined according to the feature correspondence between the collected data and the global map.
- In this way, the global map of the global scene can be reused: after the global map is generated, a large amount of first pose data can be collected on a large scale through the first terminal. The way of acquiring the collected data used to generate the first pose is also relatively simple and can be realized with the first terminal alone, which avoids additional device installation in the target scene and additional calibration and synchronization between multiple devices, thereby reducing the acquisition cost of the first pose;
- moreover, the global map satisfies the accuracy condition, so the first pose data obtained based on the feature correspondence between the collected data and the global map also has high accuracy.
- FIG. 1 is a flowchart of a pose determination method provided by an embodiment of the present disclosure
- FIG. 2 is a schematic diagram of comparison before and after visual point cloud optimization provided by an embodiment of the present disclosure
- FIG. 3 is a schematic structural diagram of a second terminal according to an embodiment of the present disclosure.
- FIG. 4 is a schematic flowchart of motion ground truth data acquisition provided by an embodiment of the present disclosure.
- FIG. 5 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present disclosure
- FIG. 6 is a schematic structural diagram of a first electronic device according to an embodiment of the present disclosure.
- FIG. 7 is a schematic structural diagram of a second electronic device according to an embodiment of the present disclosure.
- Mobile positioning is a key technology in applications such as augmented reality, autonomous driving, and mobile robots.
- Augmented reality seamlessly integrates virtual objects with the real environment based on real-time localization results, while autonomous driving and mobile robots rely on localization results for path planning.
- Early mobile positioning mainly relied on dedicated hardware equipment such as laser equipment, differential Global Positioning System (GPS) equipment, and high-precision inertial navigation equipment.
- In terms of augmented reality, with the launch of Simultaneous Localization And Mapping (SLAM)-based augmented reality platforms on smart terminals, smart terminals have entered the Augmented Reality (AR) era. Providing centimeter-level positioning in earth-scale scenes by reconstructing high-precision maps of large-scale scenes, for example for pose determination, has become a trend. However, the related art provides no solution for high-precision pose determination based on low-cost equipment.
- FIG. 1 is a flowchart of a pose determination method provided by an embodiment of the present disclosure, and the method can be applied to a pose determination apparatus.
- the pose determination apparatus may be a terminal device, a server, or other processing devices, or the like.
- The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
- the pose determination method provided by the embodiments of the present disclosure may be implemented by a processor calling computer-readable instructions stored in a memory.
- the pose determination method may include steps S11 to S13:
- Step S11 acquiring the collection data collected by the first terminal in the target scene.
- Step S12 acquiring a global map including the target scene.
- the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map satisfies the accuracy condition.
- Step S13 Determine at least one first pose of the first terminal during the collection process according to the feature correspondence between the collected data and the global map.
- the target scene may be any scene in which the first terminal acquires the collected data, and its implementation form may be flexibly determined according to actual needs, which is not limited in the embodiments of the present disclosure.
- the target scene may include an outdoor scene, such as a square, a street, or an open space.
- the target scene may include an indoor scene, such as a classroom, an office building, or a residential building.
- the target scene may include both an outdoor scene and an indoor scene.
- the first terminal may be a mobile terminal with a data collection function, and any device with movement and data collection functions may be used as the first terminal.
- the first terminal may be an AR device, such as a mobile phone or AR glasses.
- The collected data may be data collected by the first terminal in the target scene, and the implementation form of the collected data and the data content it contains may be flexibly determined according to the data collection manner and the actual implementation form of the first terminal, which is not limited in this embodiment of the present disclosure.
- In the case where the first terminal is an AR device, the collected data may include a first captured image obtained by the AR device capturing images of the target scene, and the like; in this case, the collected data may further include first Inertial Measurement Unit (IMU) data obtained by the IMU in the AR device performing data collection in the target scene, and the like.
- the first terminal may move in the target scene to realize the collection of collected data, wherein the specific moving process and manner of the first terminal may be flexibly selected according to actual conditions.
- The collected data may be acquired by reading the collected data from the first terminal, or by receiving the collected data transmitted by the first terminal; in some embodiments of the present disclosure, the pose determination method provided by the embodiments of the present disclosure can also be applied to the first terminal itself, in which case the collection data collected by the first terminal in the target scene can be acquired directly.
- When the target scene is an outdoor scene including a certain open space or a square, the global scene may be a suburban or urban scene including the target scene, and the global scene may include outdoor scenes in the suburb or urban area, and may also include indoor scenes in the suburb or urban area.
- the map data may include a second captured image obtained by performing image capture on the global scene; the map data may include second IMU data obtained by performing IMU data collection on the global scene; the map data may also include Laser point cloud data obtained by radar scanning of the global scene, etc.
- the map data in the case where the second terminal includes a visual sensor for image acquisition, the map data may include the second acquired image; in the case where the second terminal includes an IMU sensor for acquiring IMU data , the map data may include second IMU data; in the case that the second terminal includes a radar for collecting laser point clouds, the map data may include laser point cloud data.
- For the hardware structure of the second terminal and the connection manner of its components, reference may be made to the subsequent disclosed embodiments, which will not be expanded here.
- the implementation form of the global map may be jointly determined according to the actual situation of the global scene and the data content of the map data.
- the global map may include relevant information of each three-dimensional feature point in the global scene.
- The global map may include relevant information of each 3D feature point in the global scene, where the 3D feature points in the global scene may be displayed in the form of an image, and the information content included in the relevant information of a 3D feature point can be flexibly determined according to the actual situation, such as the coordinates of the 3D feature point and the feature information of the 3D feature point.
- The feature information of a 3D feature point may include one or more of the feature descriptor corresponding to the 3D feature point, the communication signal fingerprint corresponding to the 3D feature point, semantic information, and other feature-related information.
- The accuracy of the global map may be the position accuracy of each 3D feature point in the global map, for example, the position difference between the coordinates of the 3D feature points included in the global map and the actual positions of those feature points in the global scene. Therefore, the accuracy condition of the global map can be used to determine whether the position of each three-dimensional feature point in the global map meets the accuracy requirement, and the specific content of the accuracy condition can be flexibly set according to the actual situation.
- Whether the global map satisfies the accuracy condition can also be indirectly inferred by judging whether the ratio between the geographic range corresponding to the collected map data and the geographic range covered by the global scene reaches a preset threshold.
- In some embodiments, the global map may be generated in the pose determination apparatus according to the map data after the map data collected by the second terminal is acquired; the global map may also be generated in another apparatus or device, in which case the global map may be acquired by directly reading it from the apparatus or device that stores or generates the global map.
- the second terminal may move in the global scene to collect corresponding map data.
- The execution order of step S11 and step S12 is not limited in the embodiments of the present disclosure. Exemplarily, step S11 and step S12 may be performed in a certain order, or may be performed at the same time.
- The collected data is obtained by collecting the target scene and therefore reflects the characteristics of the target scene. Since the global scene corresponding to the global map includes the target scene, the global map also includes the features of the target scene; thus, a feature correspondence exists between the collected data and the global map. Moreover, since the first terminal moves in the target scene, a large amount of collection data can be collected, and the collected data can also reflect the characteristics of the target scene. Therefore, in this embodiment of the present disclosure, the feature correspondence between the collected data and the global map may also include the feature correspondence between the individual pieces of data contained in the collected data itself.
- The first pose may be one or more poses corresponding to the moments at which the first terminal performs the data collection operation while moving in the target scene; the number of first poses can be flexibly determined according to the actual situation.
- the first pose may correspond to the collected data, that is, the first pose may be the pose corresponding to the moment when the first terminal collects each collected data.
- In the embodiments of the present disclosure, by obtaining the collection data collected by the first terminal in the target scene and obtaining the global map including the target scene, at least one first pose of the first terminal during the collection process can be determined according to the feature correspondence between the collected data and the global map.
- In this way, the global map of the global scene can be reused. After the global map is generated, a large number of first poses can be collected on a large scale through the first terminal, and the way of acquiring the collected data used to generate the first pose is also relatively simple.
- The acquisition can be realized with the first terminal alone, which avoids additional device installation in the target scene and additional calibration and synchronization between multiple devices, thereby reducing the cost of obtaining the first pose. Moreover, since the global map satisfies the accuracy condition, the first pose obtained based on the feature correspondence between the collected data and the global map also has high accuracy.
- the obtaining form of the map data can be flexibly determined according to the actual situation, and the method of generating the global map based on the map data can be flexibly determined according to the actual situation of the map data. Therefore, in some embodiments of the present disclosure, the map data may include: the laser point cloud in the global scene, the second acquired image, and the second IMU data.
- offline reconstruction of the global scene is performed to generate a global map of the global scene.
- The laser point cloud may be a point cloud composed of multiple laser points obtained by radar scanning of the global scene through the second terminal, and the number of laser points contained in the laser point cloud may be flexibly determined jointly by the radar scanning conditions of the second terminal and the movement trajectory of the second terminal in the global scene, which is not limited in the embodiments of the present disclosure.
- The second collected images may be multiple images collected during the movement of the second terminal in the global scene, and the number of second collected images may be determined jointly by the movement of the second terminal in the global scene and the number of image-capturing hardware devices included in the second terminal, which is not limited in this embodiment of the present disclosure.
- The second IMU data may be inertial measurement data collected during the movement of the second terminal in the global scene, and the quantity of second IMU data may likewise be determined jointly by the movement of the second terminal in the global scene and the number of hardware devices included in the second terminal for collecting IMU data, which is not limited in the embodiments of the present disclosure.
- a global map of the global scene is generated.
- In this way, the result of the at least one first pose determined based on the global map and the collected data is relatively accurate. At the same time, since the map data includes the laser point cloud, the second collected image and the second IMU data, these data are relatively easy to acquire and the acquisition process is rarely restricted by space constraints. Therefore, the pose determination method proposed in the embodiments of the present disclosure involves little difficulty in acquiring the map data and the global map, which reduces the dependence on the environment and/or equipment and enables the pose determination method to be applied in various scenarios.
- offline reconstruction is performed on the global scene according to the map data to generate a global map of the global scene, including:
- a global map of the global scene is obtained.
- For each moment in the data collection process, the acquired laser points can be projected onto the lidar frame at that moment, so that the projection results of the laser points can be used to estimate the second pose of the second terminal at different moments during the data collection process.
- After the at least one second pose is obtained, a visual map reconstruction of the global scene may be performed according to the at least one second pose in combination with the second collected image, to obtain at least one frame of visual point cloud.
- the visual point cloud may include at least one three-dimensional feature point in the global scene, and the number of the visual point cloud and the number of included three-dimensional feature points are not limited in the embodiments of the present disclosure.
- the global map may include one or more frames of visual point clouds. As described in the above disclosed embodiments, the global map may include relevant information of each three-dimensional feature point in the global scene. In some embodiments of the present disclosure, the visual point cloud may be obtained through a visual image. In this case, the global map may further include at least one frame of visual image for observing the visual point cloud.
- The feature information of the three-dimensional feature points included in the visual point cloud can also be stored in the global map, so the visual point cloud can also correspond to the feature information of the three-dimensional feature points.
- the feature descriptors of the three-dimensional feature points may be determined according to the features extracted from the second captured image, so the visual point cloud may correspond to the feature descriptors of the three-dimensional feature points.
- the map data may also include signal data related to communication, such as WiFi signals, Bluetooth signals, or UWB signals, etc.
- In this case, the visual point cloud can correspond to the communication signal fingerprints of the three-dimensional feature points; in some embodiments of the present disclosure, the second captured image may also include some semantic information, and a correspondence may also be established between such semantic information and the three-dimensional feature points so that it serves as the feature information of the three-dimensional feature points, in which case the visual point cloud can establish a correspondence with the semantic information.
- In some embodiments of the present disclosure, feature extraction and matching may be performed on the second captured image through the Scale-Invariant Feature Transform (SIFT), thereby generating at least one frame of visual point cloud. A minimal sketch of such SIFT-based matching is given below.
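- The following Python sketch illustrates SIFT feature extraction and descriptor matching between two captured images, as could be used to establish the 2D-2D correspondences that are triangulated into a visual point cloud. It assumes OpenCV is available; the image file names and the 0.75 ratio-test threshold are illustrative assumptions rather than values taken from this disclosure.

```python
# Sketch: SIFT feature extraction and matching between two captured images.
# Assumes OpenCV (cv2) is installed; paths and thresholds are illustrative.
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical captured image 1
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical captured image 2

sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# k-NN descriptor matching with Lowe's ratio test to suppress mismatches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(desc1, desc2, k=2)
good = [m[0] for m in knn if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

print(f"{len(good)} putative 2D-2D correspondences for triangulation")
```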
- All of the obtained visual point clouds, together with the feature information of the three-dimensional feature points corresponding to these visual point clouds, can be taken together as the global map; in some embodiments of the present disclosure, one or more frames may also be selected from the obtained visual point clouds and, together with the feature information of the three-dimensional feature points corresponding to the selected frame or frames, be used as the global map.
- In this way, the laser point cloud, the second IMU data and the second collected image can be used comprehensively, and information such as the position and features of each 3D feature point in the global scene can be represented by the visual point cloud. Using data that are relatively easy to obtain, the reconstruction of the global map can be realized with accurate results, which improves the convenience and accuracy of the entire pose determination process.
- a visual map is reconstructed on the global scene to obtain at least one frame of visual point cloud, including:
- At least one frame of the initial visual point cloud is optimized to obtain at least one frame of the visual point cloud.
- The second pose determined from the laser point cloud may have relatively low accuracy.
- Therefore, the visual point cloud obtained by directly using the determined second pose in combination with the second captured image to reconstruct the visual map may contain relatively large noise. In this embodiment of the present disclosure, after the visual map of the global scene is reconstructed according to the second pose and the second captured image, the reconstruction result may be used as the initial visual point cloud, and the third constraint information generated from the laser point cloud and/or the second captured image may be used to further optimize the initial visual point cloud, thereby reducing the noise in the initial visual point cloud and obtaining a visual point cloud with higher precision.
- the process of reconstructing the visual map according to the second pose and the second captured image to obtain at least one frame of the initial visual point cloud may refer to the above disclosed embodiments, which will not be repeated here.
- the third constraint information may be constraint information determined according to the laser point cloud and/or the second captured image.
- obtaining the third constraint information in the visual map reconstruction process may include:
- according to the plane feature information of the laser point cloud, determining the plane constraint information of the laser point cloud in the visual map reconstruction process; and/or,
- according to the edge feature information of the laser point cloud, determining the edge constraint information of the laser point cloud in the visual map reconstruction process;
- according to the plane constraint information and/or the edge constraint information, the third constraint information in the visual map reconstruction process is acquired.
- the plane feature information of the laser point cloud can be flexibly determined according to the actual situation of the laser point cloud, and the specific form of the plane constraint information determined based on the plane feature information of the laser point cloud can be flexibly selected according to the actual situation, for example,
- the plane constraint information can be calculated by formula (1):
- where n and m denote two different laser point cloud coordinate systems;
- ${}^{m}n$ is the plane feature normal vector at the feature point ${}^{m}q$ in coordinate system m;
- ${}^{m}n^{T}$ is the transpose of ${}^{m}n$;
- ${}^{n}p$ is a feature point in coordinate system n;
- ${}^{m}q$ is the feature point in coordinate system m that is matched against the result of applying the coordinate transformation relationship between the two coordinate systems to ${}^{n}p$;
- $\Sigma_p$ is the covariance matrix of the plane features of the laser point cloud; the value of $\Sigma_p$ can be flexibly set according to the actual situation, for example, $\Sigma_p$ can be set to 0.2 m².
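- The body of formula (1) is not reproduced in this text. A plausible reconstruction (an assumption, based on a standard point-to-plane residual consistent with the parameters listed above, writing ${}^{m}T_{n}$ for the coordinate transformation from coordinate system n to coordinate system m and $\|e\|^{2}_{\Sigma} = e^{T}\Sigma^{-1}e$) is:

$$ r_{p} = \left\| {}^{m}n^{T}\left({}^{m}T_{n}\,{}^{n}p - {}^{m}q\right) \right\|^{2}_{\Sigma_{p}} $$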
- the edge feature information of the laser point cloud can also be flexibly determined according to the actual situation of the laser point cloud.
- the specific form of the edge constraint information determined based on the edge feature information of the laser point cloud can be flexibly selected according to the actual situation.
- the edge constraint information can be calculated by formula (2):
- In formula (2), ${}^{m}l$ is the edge feature direction vector at the feature point ${}^{m}q$ in coordinate system m, $\Sigma_e$ is the covariance matrix of the edge features of the laser point cloud, and the remaining parameters have the same meanings as the corresponding parameters in formula (1).
- The value of $\Sigma_e$ can be flexibly set according to the actual situation; for example, $\Sigma_e$ can be set to 0.5 m².
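- Formula (2) is likewise not reproduced here. Assuming a standard point-to-line residual that penalises the component of the transformed point's offset perpendicular to the edge direction ${}^{m}l$ (an assumed reconstruction, using the same notation as above), it may take a form such as:

$$ r_{e} = \left\| \left(I - {}^{m}l\,{}^{m}l^{T}\right)\left({}^{m}T_{n}\,{}^{n}p - {}^{m}q\right) \right\|^{2}_{\Sigma_{e}} $$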
- both the plane constraint information and the edge constraint information can be used as the third constraint information, or one of the plane constraint information or the edge constraint information can be used as the third constraint information.
- Which of them is used as the third constraint information can be flexibly determined according to the actual situation.
- acquiring the third constraint information in the process of reconstructing the visual map according to the second captured image may include:
- the visual constraint information in the visual map reconstruction process is obtained according to two-dimensional feature points in the second captured image; where the two-dimensional feature points are the two-dimensional feature points corresponding to the three-dimensional feature points in the initial visual point cloud;
- the third constraint information in the visual map reconstruction process is obtained.
- the specific process of obtaining the visual constraint information in the visual map reconstruction process can be flexibly selected according to the actual situation.
- the visual constraint information can be calculated by formula (3):
- $X_j$ is the j-th three-dimensional feature point corresponding to the visual point cloud;
- $x_{ij}$ is the two-dimensional feature point corresponding to the three-dimensional feature point $X_j$ in the i-th frame of the initial visual point cloud;
- $f({}^{W}T_{i}, X_{j})$ is the result of projecting the three-dimensional feature point $X_j$ onto the i-th frame of the initial visual point cloud;
- $\Sigma_v$ is the covariance matrix of the image feature constraint, and the value of $\Sigma_v$ can be flexibly set according to the actual situation; exemplarily, $\Sigma_v$ can be set to 2 pixels squared.
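- The body of formula (3) is not reproduced in this text. Given the parameters above, a plausible reconstruction (offered as an assumption) is the usual sum of Mahalanobis-weighted reprojection errors over all observations:

$$ r_{v} = \sum_{i}\sum_{j} \left\| x_{ij} - f\!\left({}^{W}T_{i}, X_{j}\right) \right\|^{2}_{\Sigma_{v}} $$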
- The third constraint information may include one or more of the plane constraint information of the laser point cloud, the edge constraint information of the laser point cloud, and the visual constraint information. In some embodiments of the present disclosure, the third constraint information may simultaneously include the plane constraint information of the laser point cloud, the edge constraint information of the laser point cloud, and the visual constraint information. In this case, the process of optimizing at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud can be realized by formula (4):
- $L_p$ is the point cloud composed of the points belonging to planes in the laser point cloud;
- $L'_p$ is the set of $L_p$;
- $L_e$ is the point cloud composed of the points belonging to edges in the laser point cloud;
- $L'_e$ is the set of $L_e$.
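- Formula (4) itself is not reproduced here. A plausible reconstruction (an assumption, combining the three constraint terms described above into one joint minimisation over the poses and 3D feature points) is:

$$ \min \; \sum_{{}^{n}p \,\in\, L'_{p}} r_{p} \;+\; \sum_{{}^{n}p \,\in\, L'_{e}} r_{e} \;+\; \sum_{i}\sum_{j} \left\| x_{ij} - f\!\left({}^{W}T_{i}, X_{j}\right) \right\|^{2}_{\Sigma_{v}} $$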
- Optimizing at least one frame of the initial visual point cloud may include optimizing the three-dimensional feature points included in the initial visual point cloud, and may also include optimizing the pose of the device in the second terminal that collects the second captured image. In the case where the pose of the device that collects the second captured image in the second terminal is optimized, the second pose corresponding to the second terminal can correspondingly also be optimized, thereby reducing the noise contained in the visual point cloud due to the lower accuracy of the second pose.
- the third constraint information of the visual map reconstruction process can be obtained again based on the optimization result of the visual point cloud, and based on the third constraint information, the visual point cloud can be further iteratively optimized.
- the number of iterations can be flexibly selected according to the actual situation, which is not limited in this embodiment of the present disclosure.
- FIG. 2 is a schematic diagram of comparison before and after visual point cloud optimization according to an embodiment of the present disclosure.
- In FIG. 2, boxes 201 and 202 are the visual images corresponding to the visual point cloud before optimization, and boxes 203 and 204 are the visual images corresponding to the optimized visual point cloud.
- As can be seen from FIG. 2, the optimized visual point cloud has higher accuracy, and the accuracy of the three-dimensional feature points corresponding to the optimized visual point cloud is also improved.
- In some embodiments of the present disclosure, the second terminal may include:
- a radar, configured to collect the laser point cloud in the global scene;
- a vision sensor, configured to collect the second captured image in the global scene; and
- an IMU sensor, configured to collect the second IMU data in the global scene.
- the radar may be any radar with a laser point cloud collection function, for example, the radar may be a three-dimensional (Three Dimension, 3D) radar.
- the visual sensor can be any sensor with image acquisition function, such as a camera.
- The second terminal may also include a four-camera array providing a 360° image acquisition capability.
- the implementation form of the IMU sensor can also be flexibly determined according to the actual situation.
- the setting position and connection relationship between the radar, the visual sensor and the IMU sensor in the second terminal can be flexibly selected according to the actual situation.
- the radar, the vision sensor, and the IMU sensor may be rigidly connected, and the specific connection sequence may be flexibly selected according to the actual situation.
- the vision sensor and the IMU sensor may be fixedly connected and packaged as a fixed structural unit, and the radar may be disposed above the fixed structural unit.
- the vision sensor, the IMU sensor and the radar may also be fixedly arranged in a backpack.
- FIG. 3 is a schematic structural diagram of a second terminal according to an embodiment of the present disclosure.
- the visual sensor and the IMU sensor can be fixedly connected and packaged as a fixed structural unit 301 .
- The lower end of the fixed structural unit 301 can be placed in the backpack 302 for easy portability, and the radar 303 can be arranged above the fixed structural unit 301.
- the map data in the global scene can be comprehensively collected, thereby facilitating the subsequent generation of the global map.
- Collecting map data through a simple, low-cost hardware device such as the second terminal shown in FIG. 3 can reduce the equipment cost of acquiring map data, thereby reducing the hardware cost and difficulty of determining the first pose data.
- In some embodiments of the present disclosure, before the offline reconstruction of the global scene according to the map data to generate the global map of the global scene, the method may further include:
- the coordinate transformation relationship between the vision sensor and the IMU sensor is calibrated to obtain the first calibration result
- the coordinate transformation relationship between the radar and the vision sensor is calibrated to obtain the second calibration result
- the coordinate transformation relationship among the vision sensor, the IMU sensor and the radar is jointly calibrated.
- the method of calibrating the coordinate transformation relationship between the vision sensor and the IMU sensor can be selected flexibly according to the actual situation.
- the calibration of the vision sensor and the IMU sensor can be realized by the Kalibr tool;
- The way of calibrating the coordinate transformation relationship between the radar and the vision sensor can also be flexibly selected according to the actual situation;
- the calibration of radar and vision sensors can also be realized through the AutoWare framework.
- In some embodiments of the present disclosure, the coordinate transformation relationships among the vision sensor, the IMU sensor, and the radar may further be jointly calibrated and optimized according to the first calibration result and the second calibration result, so as to make the coordinate transformation relationships between the different hardware devices more accurate.
- joint calibration can be achieved by formula (5):
- $C_i$ is the i-th vision sensor in the second terminal;
- $I$ is the IMU sensor;
- $L$ is the radar;
- ${}^{I}T_{L}$ is the coordinate transformation relationship between the radar and the IMU sensor;
- the covariances $\Sigma_C$ and $\Sigma_L$ represent the errors in the calibration processes of the IMU sensor and of the radar, respectively; the values of these errors can be flexibly set according to the actual situation.
- Exemplarily, all rotation components in the diagonal matrices of $\Sigma_C$ and $\Sigma_L$ can be set to 0.01 rad²,
- all translation components of $\Sigma_C$ can be set to 0.03 m²,
- and all translation components of $\Sigma_L$ can be set to (0.03, 0.03, 0.15) m².
- Through joint calibration, the overall calibration error can be made smaller. Therefore, generating the global map after the above calibration is completed can greatly improve the accuracy of the global map, thereby improving the accuracy of the entire pose determination process.
- the global scene is reconstructed in real time according to the map data, and a real-time map of the global scene is generated.
- the target device is used to display the geographic scope of data collection for the global scene.
- the second terminal may also perform real-time reconstruction of the global scene according to the map data during the process of collecting the map data to generate a real-time map of the global scene.
- the implementation form of the real-time map may refer to the global map, which will not be repeated here.
- the real-time map may cover each scene corresponding to the map data collected by the second terminal in the global scene.
- some optimization processes in the offline reconstruction may be omitted to improve the reconstruction speed.
- the acquisition of the third constraint information and the visual adjustment according to the third constraint information may be omitted.
- The real-time reconstruction can be implemented by a 3D-radar-based real-time Simultaneous Localization And Mapping (SLAM, also known as Concurrent Mapping and Localization, CML) system; exemplarily, the open-source Cartographer library can be used to reconstruct the global scene in real time and generate a real-time map of the global scene.
- the target device may be used to display the geographic scope of data collection for the global scene, that is, the target device may display the geographic scope covered by the map data collected by the second terminal, thereby indicating the second terminal Subsequent movement directions and map data collection requirements in the global scene.
- In some embodiments of the present disclosure, the target device may be a handheld device that can be flexibly controlled by the map data collector, such as a tablet computer or a mobile phone; in other embodiments, in the case where the second terminal is mounted on a mobile device (such as an automatic robot) to collect map data, the target device may be a controller or a display screen of the mobile device.
- the collected map data may be sent to the target device, or the real-time map may be sent to the target device, or the map data and the real-time map may be sent to the target device at the same time.
- the global scene is reconstructed in real time according to the map data to generate a real-time map, and the map data and/or the real-time map are sent to the target device.
- In this way, the area of the global scene where map data has already been collected can be previewed in real time, and the reconstruction quality of the map can be checked at any time, thereby improving the collection efficiency and success rate of map data and reducing the risk of missed or repeated collection of map data.
- a global map can be generated through various combinations of the above disclosed embodiments, so that it is possible to obtain the global map through step S12. After the acquisition data and the global map are acquired, as described in the above disclosed embodiments, step S13 may be used to determine at least one first pose of the first terminal in the acquisition process.
- The implementation manner of step S13 can be flexibly determined.
- the global map may include at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene; the collected data includes the first collected image;
- step S13 may include:
- performing feature matching between the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result, and determining, according to the global feature matching result, at least one first pose of the first terminal during the acquisition process.
- The first collected image may be an image collected by the first terminal in the target scene, and the number of first collected images may be one frame or multiple frames, which may be determined according to the actual situation and is not limited in the embodiments of the present disclosure.
- the global feature matching result may be three-dimensional feature points in at least one frame of visual point cloud that match the two-dimensional feature points in the first captured image.
- The manner of feature matching between the first captured image and the visual point cloud can be flexibly selected according to the actual situation, and any method capable of performing feature matching between images can be used. For example, SIFT may be used, and/or the sparse optical flow tracking method (Kanade-Lucas-Tomasi Tracking Method, KLT) may be used, to perform feature matching between the first collected image and at least one frame of visual point cloud.
- feature matching is performed between the first captured image and the at least one frame of visual point cloud to obtain a global feature matching result, which may include:
- two-dimensional feature points in the first captured image may be feature-matched with three-dimensional feature points included in at least one frame of visual point cloud to obtain a global matching result.
- the feature information used for feature matching may be one or more of various types of feature information such as feature descriptors, communication signal fingerprints, or semantic information.
- The global feature matching may be implemented by an approximate nearest neighbor search (Approximate Nearest Neighbor, ANN). For example, for a feature included in the first captured image, the K features closest to it can be found in the global map (the value of K can be flexibly set according to the actual situation); these K features then vote for the visual point cloud frames in the global map to determine whether a given frame of visual point cloud corresponds to the first captured image.
- When the number of votes received by a frame reaches a threshold, the visual image corresponding to that frame (or those frames) of visual point cloud can be regarded as a co-visible image of the first captured image, and the 3D feature points in the co-visible image that match the 2D feature points in the first captured image can be used as the global feature matching result.
- Obtaining the global feature matching result by matching the two-dimensional feature points in the first captured image with the three-dimensional feature points corresponding to at least one frame of visual point cloud through ANN can reduce the number of mismatches in the feature matching process and improve the accuracy of the global feature matching result, thereby improving the accuracy of pose determination.
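- As a hedged illustration of this retrieval step, the Python sketch below performs approximate nearest-neighbour descriptor matching against the map frames followed by per-frame voting. The per-frame descriptor layout, the value of K, and the vote threshold are illustrative assumptions, not values specified by this disclosure.

```python
# Sketch: find co-visible map frames via approximate nearest-neighbour (FLANN)
# matching of query descriptors against per-frame map descriptors, then voting.
import numpy as np
import cv2

def find_covisible_frames(query_desc, map_frames, k=4, min_votes=30):
    """map_frames: list of dicts, each with key 'desc' holding an (N_f, 128) float32 array."""
    all_desc = np.vstack([f["desc"] for f in map_frames]).astype(np.float32)
    frame_ids = np.concatenate(
        [np.full(len(f["desc"]), i) for i, f in enumerate(map_frames)])

    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4), dict(checks=64))
    matches = flann.knnMatch(query_desc.astype(np.float32), all_desc, k=k)

    votes = np.zeros(len(map_frames), dtype=int)
    for knn in matches:
        for m in knn:                      # each of the K neighbours votes for its map frame
            votes[frame_ids[m.trainIdx]] += 1

    # Frames whose vote count reaches the threshold are treated as co-visible frames.
    return [i for i, v in enumerate(votes) if v >= min_votes]
```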
- Based on the global feature matching result, the pose can be estimated by the Random Sample Consensus (RANSAC) method and the Perspective-n-Point (PnP) method, and the estimated pose can then be optimized by minimizing the reprojection error, so as to obtain at least one first pose of the first terminal during the acquisition process.
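- A minimal sketch of RANSAC + PnP pose estimation with OpenCV is shown below; the function name, camera intrinsics and thresholds are assumptions for illustration only.

```python
# Sketch: estimate a pose from matched 3D map points and 2D image points
# with RANSAC + PnP, then refine by minimising the reprojection error.
import numpy as np
import cv2

def estimate_first_pose(pts_3d, pts_2d, K):
    """pts_3d: (N, 3) matched 3D feature points; pts_2d: (N, 2) matched 2D points; K: 3x3 intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None,
        reprojectionError=3.0, iterationsCount=200, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    idx = inliers[:, 0]
    # Levenberg-Marquardt refinement over the RANSAC inliers (reprojection-error optimisation).
    rvec, tvec = cv2.solvePnPRefineLM(
        pts_3d[idx].astype(np.float64), pts_2d[idx].astype(np.float64), K, None, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # world-to-camera rotation and translation
```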
- In this way, the features corresponding to the visual point cloud in the global map can be matched against the features of the first captured image, so that the pose of the first terminal can be estimated from the matched features in the first captured image to obtain at least one pose of the first terminal. Since the global map satisfies the accuracy condition, the first pose determined based on the result of matching against the global map features also has high accuracy, which improves the accuracy of the first pose determination process.
- the global map includes at least one frame of visual point cloud in the target scene; the collected data may include at least two frames of first collected images, and step S13 may include:
- Step S131 performing feature matching between the first captured image and at least one frame of visual point cloud to obtain a global feature matching result
- Step S132 performing feature matching on at least two frames of the first captured images to obtain a local feature matching result
- Step S133 Determine at least one first pose of the first terminal in the acquisition process according to the global feature matching result and the local feature matching result.
- the method of performing feature matching between the first captured image and at least one frame of visual point cloud to obtain a global feature matching result may refer to the above disclosed embodiments, which will not be repeated here.
- When the first pose is determined only from the global feature matching result obtained by matching the first captured image against the visual point cloud, the 3D feature points included in the visual point cloud may be incomplete or few in number, which may make the determined first pose inaccurate or even impossible to determine.
- Therefore, in some embodiments of the present disclosure, when the collected data includes at least two frames of first captured images, a local feature matching result may further be obtained according to the feature matching relationship between the different first captured images, and at least one first pose of the first terminal during the acquisition process is then determined jointly from the global feature matching result and the local feature matching result.
- the local feature matching result may be two-dimensional feature points that match each other between different first captured image frames, and the process of performing feature matching according to at least two frames of the first captured image may be flexibly selected according to the actual situation.
- the KLT method may be used to perform feature matching using optical flow features between different first captured images, thereby obtaining a local feature matching result.
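- The sketch below shows one way such KLT-based local matching could look with OpenCV's sparse optical flow tracker; the tracker parameters and the corner-seeding step are illustrative assumptions.

```python
# Sketch: track 2D feature points between two consecutive first captured images
# with the sparse KLT optical flow tracker, yielding 2D-2D local matches.
import cv2

def klt_track(prev_img, next_img, prev_pts):
    """prev_pts: (N, 1, 2) float32 point locations detected in prev_img."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_img, next_img, prev_pts, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    ok = status.ravel() == 1
    return prev_pts[ok], next_pts[ok]   # matched point pairs (the local feature matching result)

# Typical seeding of the tracker with corners from the previous frame:
# prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=500, qualityLevel=0.01, minDistance=8)
```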
- In step S133, the first pose may be determined by using RANSAC and PnP to estimate the pose from the global feature matching result and the local feature matching result, and then further optimizing the estimate.
- In this way, the global feature matching result can be supplemented by the local feature matching result, thereby reducing the influence of incomplete coverage of the global scene by the global map on the pose determination result and improving the accuracy of the first pose.
- In some embodiments of the present disclosure, the collected data may further include first IMU data; in this case, step S133 may include:
- the global feature matching result and the local feature matching result are processed to obtain at least one first pose of the first terminal during the acquisition process.
- the first IMU data may be inertial measurement data collected during the data collection process in the target scene by the first terminal.
- the first constraint information and the second constraint information may also be obtained, so as to obtain the first pose.
- The first constraint information may be constraint information obtained according to the global feature matching result and/or the local feature matching result; how the first constraint information is obtained can be flexibly determined according to the actual situation.
- the first constraint information may be obtained by using the information of the matched three-dimensional feature points and two-dimensional feature points in the global feature matching result.
- the process of obtaining the first constraint information can be implemented by formula (6):
- where ${}^{W}T_{i}$ is the pose of the device in the first terminal that collects the first captured image at the moment the i-th frame of the first captured image is collected; the remaining terms are, respectively, the j-th matched three-dimensional feature point in the global feature matching result, the corresponding matched two-dimensional feature point in the global feature matching result, and the result of projecting that three-dimensional feature point onto the i-th frame of the first captured image.
- the first constraint information may be acquired by using the information of the matched three-dimensional feature points and two-dimensional feature points in the local feature matching result.
- the process of obtaining the first constraint information can be realized by formula (7):
- $x_{ij}$ is the j-th matched two-dimensional feature point in the local feature matching result;
- $X_j$ is the three-dimensional feature point in the target scene onto which $x_{ij}$ is mapped in the local feature matching result;
- $f({}^{W}T_{i}, X_{j})$ is the result of projecting the three-dimensional feature point $X_j$ onto the i-th frame of the first captured image;
- the meanings of the remaining parameters may refer to the aforementioned disclosed embodiments.
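- The bodies of formulas (6) and (7) are not reproduced in this text. Assuming they follow the same Mahalanobis-weighted reprojection-error form as formula (3), and writing $\tilde X_j$ and $\tilde x_{ij}$ for the matched 3D and 2D feature points of the global feature matching result (symbols introduced here as an assumption for readability), plausible reconstructions are:

$$ \text{(6)}\quad \sum_{i}\sum_{j} \left\| \tilde x_{ij} - f\!\left({}^{W}T_{i}, \tilde X_{j}\right) \right\|^{2}_{\Sigma_{v}} \qquad\qquad \text{(7)}\quad \sum_{i}\sum_{j} \left\| x_{ij} - f\!\left({}^{W}T_{i}, X_{j}\right) \right\|^{2}_{\Sigma_{v}} $$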
- Equation (6) or Equation (7) can be used as the first constraint information.
- the first constraint information can also be jointly obtained according to the global feature matching result and the local feature matching result.
- For example, the constraint information obtained from formula (6) and the constraint information obtained from formula (7) can be combined to obtain the first constraint information.
- the second constraint information may be constraint information obtained according to the first IMU data.
- the second constraint information may be obtained by using the relevant parameters of the device in the first terminal that collects the first captured image and the first IMU data.
- the process of acquiring the second constraint information can be implemented by formula (8):
- $C_i({}^{W}T_{i}, {}^{W}v_{i}, b_{a}, b_{g})$ denotes the parameters of the first terminal at the time the i-th frame of the first captured image is collected;
- ${}^{W}v_{i}$ is the velocity of the first terminal;
- $b_a$ is the acceleration bias of the device that measures the first IMU data in the first terminal;
- $b_g$ is the gyroscope measurement bias of the device that measures the first IMU data in the first terminal;
- $h(\cdot)$ is the IMU cost function; the meanings of the other parameters may refer to the above disclosed embodiments.
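- Formula (8) itself is not reproduced here. One common shape for such an IMU constraint in visual-inertial optimization, given purely as an assumed illustration, evaluates the cost function over the states of consecutive frames against the IMU measurements $z_{i,i+1}$ accumulated between them:

$$ C_{i} = \left\| h\!\left({}^{W}T_{i}, {}^{W}v_{i}, b_{a}, b_{g},\, {}^{W}T_{i+1}, {}^{W}v_{i+1},\, z_{i,i+1}\right) \right\|^{2}_{\Sigma_{\mathrm{IMU}}} $$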
- the second constraint information may be determined according to changes in the first IMU data during the process of collecting the first captured image by the first terminal.
- In some embodiments of the present disclosure, processing the global feature matching result and the local feature matching result may include: processing the global feature matching result and the local feature matching result by bundle adjustment.
- Bundle Adjustment (BA) is one implementation of solving for the pose.
- the constraint information can be solved through BA to calculate the first pose under the minimum error.
- the first constraint information and the second constraint information can be used together as the constraint information.
- the process of solving the constraint information by BA can be represented by the following formula (9):
- Exemplarily, formula (9) can be solved by using key frames and the incremental BA (Incremental Consistent and Efficient Bundle Adjustment, ICE-BA) solution method, thereby determining at least one first pose.
- At least one of the first constraint information and the second constraint information can be used to optimize the obtained first pose, making the finally determined first poses smoother overall and reducing jitter; in addition, using keyframes and ICE-BA to solve the first pose can effectively reduce the amount of calculation in the first-pose determination process, thereby improving the efficiency of pose determination.
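To make the joint solve concrete, the toy example below stacks a visual-style block and an IMU-style block into one residual vector and minimizes it with a generic least-squares solver. This is a positions-only illustration of how the two kinds of constraint information enter one bundle-adjustment-style problem, not ICE-BA itself; the names and the toy data are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def joint_residuals(x, visual_terms, imu_terms):
    """Stack first-constraint (visual-style) and second-constraint (IMU-style)
    residuals into one vector, as a BA-style solver would; x holds the poses
    being optimised (toy parameterisation: positions only)."""
    poses = x.reshape(-1, 3)
    res = []
    for i, landmark, measured_offset in visual_terms:
        res.append((landmark - poses[i]) - measured_offset)   # "reprojection-like" term
    for i, j, delta_p in imu_terms:
        res.append((poses[j] - poses[i]) - delta_p)           # "preintegration-like" term
    return np.concatenate(res)

# toy data: two keyframes, one landmark observation each, one IMU-predicted displacement
visual_terms = [(0, np.array([1., 0., 0.]), np.array([1., 0., 0.])),
                (1, np.array([2., 0., 0.]), np.array([1., 0., 0.]))]
imu_terms = [(0, 1, np.array([1., 0., 0.]))]
x0 = np.zeros(6)
sol = least_squares(joint_residuals, x0, args=(visual_terms, imu_terms))
poses = sol.x.reshape(-1, 3)   # optimised keyframe positions
```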
- the accuracy of the first pose determined in the embodiments of the present disclosure is relatively high. Therefore, the methods proposed in the embodiments of the present disclosure can be applied to various scenarios in the field of mobile positioning, and the specific scenario to which they are applied can be selected according to the actual situation.
- the pose determination method proposed in the embodiments of the present disclosure can be used to determine the device pose offline.
- the first pose determined by the pose determination method proposed in the embodiments of the present disclosure can be used to evaluate the result accuracy of some neural network algorithms related to mobile positioning.
- Data sets carrying motion ground truth are an important condition for the development of SLAM technology.
- the motion ground truth can be used to evaluate and compare the accuracy of SLAM algorithms, and can also serve as a standard for improving the accuracy of SLAM algorithms when processing extreme cases, such as images with motion blur, severe illumination changes, or few feature points, thereby improving the ability of SLAM algorithms to handle such extreme scenarios.
- in outdoor application scenarios, the motion ground truth is mainly obtained through GPS; in indoor application scenarios, it is mainly obtained by building high-precision motion capture systems, such as VICON and Lighthouse, in the indoor environment.
- such a system is a reflection-based capture system, which requires custom-made reflective balls to be attached to the captured object as signal receivers; when the capture camera emits light of a specific wavelength, the reflective balls reflect a light signal of the same wavelength back to the camera.
- this method requires installing and calibrating the equipment of a motion capture system such as VICON in advance in the surrounding environment where the trajectory ground truth is to be collected; therefore, both the equipment cost and the deployment cost are very high (the equipment cost for even a small room is close to one million), and it is difficult to scale to large scenes.
- in addition, each mobile device used to collect the ground truth needs to be fitted with a calibrated signal receiver, and before each set of data is collected the received signal must be synchronized with the sensors on the mobile device, which is time-consuming and labor-intensive and difficult to extend to massive data collection.
- real-time positioning can also be achieved based on external signals such as Bluetooth and geomagnetism, but these methods usually rely on a signal fingerprint map, constructed in advance, that matches the positioning environment, and the positioning accuracy depends on the signal intensity collected at each point in the environment.
- to construct such a fingerprint map, the operator needs to measure on site with measurement tools in the positioning environment, which results in high time and labor costs; therefore, a large amount of motion ground truth cannot be obtained by this method.
- the embodiment of the present disclosure further provides a method for acquiring motion truth data.
- the pose determination method provided by the embodiments of the present disclosure further includes:
- according to at least one first pose of the first terminal during the acquisition process, the motion truth data is determined, wherein the motion truth data is used for at least one of the following operations: judging the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
- the motion truth data may be the data whose result is taken as the true value in neural network training, that is, the Ground Truth data in neural network algorithms. Since the first pose determined in the embodiments of the present disclosure is the pose data of the first terminal during the data-collection movement, and its accuracy is high, the first pose can be used as the motion truth data.
- the implementation manner of the process of determining the motion truth data in the embodiments of the present disclosure can be flexibly determined according to the actual situation, and is not limited to the following disclosed embodiments.
- determining motion truth data according to at least one first pose of the first terminal during the acquisition process may include:
- at least one of the collected data, together with at least one first pose of the first terminal during the collection process, is used as the motion truth data, where the collected data includes:
- one or more of wireless network WiFi data, Bluetooth data, geomagnetic data, ultra-wideband UWB data, the first captured image, and the first IMU data.
- the determined at least one first pose may be directly used as motion truth data. Since the number of determined first poses is not limited in the embodiments of the present disclosure, the number of obtained motion truth data is also not limited. In some embodiments of the present disclosure, each of the determined first poses is used as motion truth data, or one or more first poses are randomly selected from a plurality of first poses as motion truth data.
- the collected data may also be used as motion truth data.
- the collected data may include the first collected image and/or the first IMU data; in some embodiments of the present disclosure, since the implementation of the first terminal is not limited, the types of data collected by the first terminal may also be flexibly changed and expanded, so the collected data may also include one or more of wireless network WiFi data, Bluetooth data, geomagnetic data, and UWB data.
- these collected data can have a corresponding relationship with the determined first poses, and can also provide corresponding constraints in the pose determination process to assist pose determination. Therefore, in some embodiments of the present disclosure, various types of collected data may also be used as motion truth data.
- in this way, the amount of motion truth data can be further increased, so that applying the motion truth data in different scenarios achieves a better effect.
- the motion truth data may be used to determine the accuracy of the positioning result, and the specific determination is not limited in the embodiments of the present disclosure.
- the motion truth data can be used as data in a benchmark data set for judging the accuracy of an algorithm, for example when evaluating neural network algorithms, so as to judge the accuracy of a positioning result.
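One common way such a benchmark comparison could be carried out is to compute the root-mean-square absolute trajectory error of an algorithm's estimated trajectory against the motion-truth poses. The helper below is a minimal sketch that assumes the two trajectories are already time-aligned and expressed in the same coordinate frame; the function name is illustrative.

```python
import numpy as np

def absolute_trajectory_error(gt_positions, est_positions):
    """RMSE of the position error between a ground-truth (motion truth) trajectory
    and an estimated trajectory, both given as N x 3 arrays of positions."""
    gt = np.asarray(gt_positions)
    est = np.asarray(est_positions)
    return float(np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1))))
```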
- the ground-truth motion data can also be used to train the neural network, and the specific application in the training process is not limited in the embodiments of the present disclosure.
- the ground-truth motion data can be used as training data and/or test data in the neural network, so as to be applied in the training process of the neural network.
- the ground-truth motion data may also be information fused with the global map.
- the ground-truth motion data may also include collected data such as WiFi data, Bluetooth data, geomagnetic data, or UWB data.
- FIG. 4 is a schematic flowchart of the acquisition of true motion data provided by an embodiment of the present disclosure.
- the acquisition process of true motion data may include two links: global map reconstruction 401 and true motion data location 402 .
- the global map reconstruction 401 link is used to reconstruct the global map.
- a global map 4014 can be obtained based on the three sub-links of radar SLAM 4011 , feature matching 4012 and vision-radar joint optimization 4013 .
- in the global map reconstruction 401 link, the second terminal carried by the operator moves in the global scene, so as to collect the laser point cloud in the global scene for radar SLAM 4011, collect the second captured image in the global scene with the vision sensor, and collect the second IMU data in the global scene with the IMU sensor.
- in the process of scanning the global scene by the second terminal, a map can be reconstructed in real time using the acquired laser point cloud, the second collected image, and the second IMU data, so as to obtain a real-time map.
- the real-time map can reflect the range in which the operator has collected map data in the global scene, so the real-time map can be sent to the target device.
- the global map may be reconstructed offline by using the acquired laser point cloud, the second captured image and the second IMU data in the global scene to obtain the global map.
- the laser point cloud and the second IMU data can be processed by radar SLAM 4011, so as to determine at least one pose of the radar during the map data collection process; the position and orientation of the radar can then be aligned to the vision sensor through the coordinate transformation relationship between the radar and the vision sensor.
- a visual map can be reconstructed from the second captured image by means of feature matching 4012 to obtain at least one frame of initial visual point cloud; the at least one second pose serves as the initial pose, and the laser point cloud and the features in the second captured image provide third constraint information for the visual map reconstruction process, so that vision-radar joint optimization 4013 is performed on the obtained initial visual point cloud to obtain the global map 4014.
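As a small illustration of the alignment step, a radar-SLAM pose can be carried over to the vision sensor by right-multiplying with the calibrated lidar-camera extrinsic. The snippet below is a sketch that assumes 4x4 homogeneous transforms and illustrative names such as `T_lidar_camera`.

```python
import numpy as np

def lidar_pose_to_camera_pose(T_world_lidar, T_lidar_camera):
    """Convert a radar-SLAM pose (world <- lidar) into the vision sensor's pose
    (world <- camera) using the calibrated lidar-camera extrinsic transform.
    Both inputs are 4x4 homogeneous matrices."""
    return T_world_lidar @ T_lidar_camera

# e.g. initial poses for visual map reconstruction:
# cam_poses = [lidar_pose_to_camera_pose(T, T_lidar_camera) for T in lidar_slam_poses]
```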
- the motion truth data locating 402 link is realized by means of the first terminal, such as the AR glasses 4021 or the mobile phone 4022, wherein the motion truth data locating 402 may include four sub-links: local feature tracking 4023, global feature tracking 4024, visual-inertial joint optimization 4025, and motion truth data storage 4026.
- the collection data is acquired by moving within a certain target scene in the global scene through the first terminal including the AR glasses 4021 or the mobile phone 4022 .
- the collected data may include a first collected image and first IMU data.
- the first captured image may be matched with a global map for global feature matching 4024, thereby realizing visual positioning and obtaining a global feature matching result.
- Local feature tracking 4023 may also be performed between different frame images in the first captured image, so as to obtain a local feature matching result.
- visual-inertial joint optimization 4025 may be performed according to the global feature matching result, the local feature matching result, and the collected first IMU data, thereby determining the pose of the first terminal in the target scene.
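A common way to realize global positioning from 2D-3D matches is a RANSAC-based PnP solve. The sketch below uses OpenCV for illustration and is only one possible way to obtain a global pose from a global feature matching result, not necessarily the exact procedure of the embodiments; the function name and argument layout are assumptions.

```python
import numpy as np
import cv2

def global_localization(pts3d_map, pts2d_image, K):
    """Estimate a camera pose in the global-map frame from 2D-3D matches.
    pts3d_map: N x 3 map points, pts2d_image: N x 2 matched image points,
    K: 3 x 3 camera intrinsics. Returns a 4x4 world-to-camera transform or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d_map, dtype=np.float32),
        np.asarray(pts2d_image, dtype=np.float32),
        np.asarray(K, dtype=np.float64), None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)      # world -> camera rotation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T
```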
- in the method for acquiring motion truth data provided by the embodiments of the present disclosure, the equipment used is mainly a high-precision map acquisition device integrating lidar, camera, and IMU, so the overall equipment cost is low; moreover, the global scene and the target scene do not need to be arranged with equipment in advance, so the scalability in scale is clearly better than that of related schemes that require pre-arrangement of the scene.
- the upper limit of the scale mainly depends on the offline computing power, and existing algorithms and computing power can already handle scenes of hundreds of thousands of square meters.
- therefore, the method for acquiring motion truth data provided by the embodiments of the present disclosure can be used in large-scale scenes; at the same time, the global map of the same global scene can be reused.
- the acquisition relies only on the built-in sensors of the mobile device, so there is no need to perform additional operations, such as calibration and synchronization with other external devices, before each acquisition, which would otherwise limit large-scale acquisition; in addition, the method for acquiring motion truth data provided by the embodiments of the present disclosure is not restricted by the application scenario and can be applied to both indoor and outdoor scenarios.
- the motion truth value obtained in the embodiments of the present disclosure is not limited to being used in the evaluation or training of the neural network, but can also be extended to other scenarios, which is not limited in the present disclosure.
- the present disclosure also provides a pose determination apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the pose determination methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which will not be repeated here.
- FIG. 5 is a schematic structural diagram of a pose determination apparatus 5 according to an embodiment of the present disclosure.
- the pose determination apparatus may be a terminal device, a server, or other processing devices.
- the terminal device may be a UE, a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a PDA, a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like.
- the pose determination apparatus may be implemented by a processor invoking computer-readable instructions stored in a memory.
- the pose determination device 5 may include:
- the acquisition data acquisition module 501 is configured to acquire acquisition data acquired by the first terminal in the target scene.
- the global map obtaining module 502 is configured to: obtain a global map including the target scene; wherein, the global map is generated based on map data obtained by the second terminal performing data collection on the global scene including the target scene, and the global map satisfies the accuracy condition .
- the pose determination module 503 is configured to: determine at least one first pose of the first terminal during the collection process according to the feature correspondence between the collected data and the global map.
- the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene;
- the collected data includes a first collected image;
- the pose determination module 503 is configured to: perform feature matching between the first captured image and at least one frame of visual point cloud to obtain a global feature matching result; and determine, according to the global feature matching result, at least one first pose of the first terminal during the capturing process.
- the global map includes at least one frame of visual point cloud in the target scene; the collected data includes at least two frames of first collected images; the pose determination module 503 is configured to: perform feature matching between the first collected images and the at least one frame of visual point cloud to obtain a global feature matching result; perform feature matching on the at least two frames of first collected images to obtain a local feature matching result; and determine, according to the global feature matching result and the local feature matching result, at least one first pose of the first terminal during the collection process.
- the collected data further includes first inertial measurement (IMU) data; the pose determination module 503 is configured to: obtain first constraint information according to the global feature matching result and/or the local feature matching result; obtain second constraint information according to the first IMU data; and process the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information, to obtain at least one first pose of the first terminal during the acquisition process.
- the pose determination module 503 is configured to process the global feature matching result and the local feature matching result through bundle adjustment.
- the pose determination module is configured to: match two-dimensional feature points in the first captured image with three-dimensional feature points included in at least one frame of visual point cloud to obtain a global feature matching result.
- the apparatus further includes: a motion truth data acquisition module; the motion truth data acquisition module is configured to: determine motion truth data according to at least one first pose of the first terminal during the acquisition process .
- the motion truth data acquisition module is configured to: use at least one first pose of the first terminal in the acquisition process as motion truth data; and/or, take at least one of the collected data and at least one first pose of the first terminal during the acquisition process, as the motion truth data; wherein the acquisition data includes: wireless network WiFi data, Bluetooth data, geomagnetic data, ultra-wideband UWB data, first acquired image and one or more of the first IMU data.
- the motion ground truth data is used for at least one of the following operations: judging the accuracy of the positioning result, training the neural network, and performing information fusion with the global map.
- the map data includes: the laser point cloud in the global scene, the second acquired image, and the second IMU data; the apparatus further includes a map data acquisition module and a global map generation module, wherein the map data acquisition module is configured to: obtain the map data of the global scene collected by the second terminal; and the global map generation module is configured to: reconstruct the global scene offline according to the map data, and generate a global map of the global scene.
- the global map generation module is configured to: determine at least one second pose of the second terminal during the data collection process according to the second IMU data and the laser point cloud; perform visual map reconstruction on the global scene according to the at least one second pose, in combination with the second captured image, to obtain at least one frame of visual point cloud, wherein the visual point cloud corresponds to a plurality of three-dimensional feature points in the global scene; and obtain a global map of the global scene according to the at least one frame of visual point cloud.
- the global map generation module is configured to: perform visual map reconstruction on the global scene according to the at least one second pose, in combination with the second captured image, to obtain at least one frame of initial visual point cloud; obtain third constraint information in the visual map reconstruction process according to the laser point cloud and/or the second captured image; and optimize the at least one frame of initial visual point cloud according to the third constraint information to obtain at least one frame of visual point cloud, wherein the third constraint information includes one or more of plane constraint information of the laser point cloud, edge constraint information of the laser point cloud, and visual constraint information.
- the second terminal includes: the radar is configured to obtain the laser point cloud in the global scene; the vision sensor is configured to obtain the second captured image in the global scene; the IMU sensor is configured to obtain the global scene Second IMU data in the scene.
- the pose determination device 5 is configured to: calibrate the coordinate transformation relationship between the vision sensor and the IMU sensor to obtain a first calibration result; calibrate the coordinate transformation relationship between the radar and the vision sensor to obtain a second calibration result; and jointly calibrate the coordinate transformation relationships among the vision sensor, the IMU sensor, and the radar according to the first calibration result and the second calibration result.
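As a small illustration of how the two pairwise calibration results can be combined, the lidar-IMU extrinsic can be obtained by chaining the camera-IMU and lidar-camera transforms; the snippet below is a sketch that assumes 4x4 homogeneous matrices and illustrative variable names.

```python
import numpy as np

def joint_extrinsics(T_camera_imu, T_lidar_camera):
    """Given the camera-IMU extrinsic (first calibration result) and the
    lidar-camera extrinsic (second calibration result), derive the lidar-IMU
    extrinsic by chaining, tying the coordinate frames of all three sensors together."""
    T_lidar_imu = T_lidar_camera @ T_camera_imu
    return T_lidar_imu
```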
- the pose determination device 5 is configured to: in the process of the second terminal collecting the map data, reconstruct the global scene in real time according to the map data, and generate a real-time map of the global scene; and send the map data and/or the real-time map to a target device, where the target device is configured to display the geographic range over which data collection for the global scene has been completed.
- Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
- the computer-readable storage medium may be a non-volatile computer-readable storage medium.
- An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
- the embodiments of the present disclosure also provide a computer program, the computer program including computer-readable codes; when the computer-readable codes are executed in an electronic device, the processor in the electronic device executes them to implement the pose determination method provided by any of the above embodiments.
- Embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, which, when executed, cause the computer to perform the operations of the pose determination method provided by any of the foregoing embodiments.
- the electronic device may be provided as a terminal, server or other form of device.
- FIG. 6 shows a block diagram of an electronic device 6 according to an embodiment of the present disclosure.
- the electronic device 6 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
- the electronic device 6 may include one or more of the following components: a processor 601 , a first memory 602 , a first power supply component 603 , a multimedia component 604 , an audio component 605 , a first input/output interface 606 , and a sensor component 607 , and the communication component 608 .
- the processor 601 generally controls the overall operation of the electronic device 6, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the number of processors 601 may be one or more, and the processor 601 may include one or more modules to facilitate interaction between the processor 601 and other components.
- the processor 601 may include a multimedia module to facilitate its interaction with the multimedia component 604 .
- the first memory 602 is configured to store various types of data to support operation at the electronic device 6 . Examples of such data include instructions for any application or method operating on the electronic device 6, contact data, phonebook data, messages, pictures, videos, and the like.
- the first memory 602 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable Programmable read only memory, EEPROM), Erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (Read- Only Memory, ROM), magnetic memory, flash memory, magnetic disk or optical disk.
- the first power supply assembly 603 provides electrical power to various components of the electronic device 6 .
- the first power supply component 603 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device 6 .
- Multimedia component 604 includes a screen that provides an output interface between the electronic device 6 and the user.
- the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP).
- the screen may be implemented as a touch screen to receive input signals from a user.
- the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel.
- a touch sensor can not only sense the boundaries of a touch or swipe action, but also the duration and pressure associated with the touch or swipe action.
- Multimedia component 604 includes a front-facing camera and/or a rear-facing camera. When the electronic device 6 is in an operation mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data.
- Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
- Audio component 605 is configured to output and/or input audio signals.
- the audio component 605 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 6 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode.
- the received audio signal may be further stored in the first memory 602 or transmitted via the communication component 608 .
- Audio component 605 also includes a speaker for outputting audio signals.
- the first input/output interface 606 provides an interface between the processor 601 and a peripheral interface module, and the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
- Sensor assembly 607 includes one or more sensors for providing electronic device 6 with various aspects of status assessment.
- the sensor assembly 607 can detect the open/closed state of the electronic device 6 and the relative positioning of components, such as the display and keypad of the electronic device 6; the sensor assembly 607 can also detect a change in position of the electronic device 6 or of a component of the electronic device 6, the presence or absence of user contact with the electronic device 6, the orientation or acceleration/deceleration of the electronic device 6, and changes in the temperature of the electronic device 6.
- Sensor assembly 607 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
- the sensor assembly 607 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge-Coupled Device (CCD) image sensor, for use in imaging applications.
- the sensor assembly 607 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
- Communication component 608 is configured to facilitate wired or wireless communication between electronic device 6 and other devices.
- the electronic device 6 can access a wireless network based on a communication standard, such as WiFi, a second generation wireless communication technology (The 2nd Generation, 2G) or a third generation mobile communication technology (The 3rd Generation, 3G), or a combination thereof.
- the communication component 608 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 608 also includes a Near Field Communication (NFC) module to facilitate short-range communication.
- the NFC module may be implemented based on radio frequency identification (Radio Frequency Identification, RFID) technology, infrared data association (Infrared Data Association, IrDA) technology, UWB technology, Bluetooth (Blue-Tooth, BT) technology and other technologies.
- the electronic device 6 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
- a non-volatile computer-readable storage medium is also provided, such as the first memory 602 including computer program instructions, which can be executed by the processor 601 of the electronic device 6 to complete the pose determination method described in the foregoing embodiments.
- FIG. 7 is a schematic structural diagram of a second electronic device 6 according to an embodiment of the disclosure.
- for example, the electronic device 6 may be provided as a server.
- the electronic device 6 includes a processing component 701, wherein the processing component 701 may include one or more processors 601; the electronic device 6 further includes a memory resource represented by a second memory 702, and the second memory 702 is configured to store instructions executable by the processing component 701, for example, application programs.
- the application program stored in the second memory 702 may include at least one set of instructions.
- the processing component 701 is configured to execute instructions to perform the above-described pose determination method.
- the electronic device 6 may also include a second power supply assembly 703, a network interface 704 configured to connect the electronic device 6 to a network, and a second input/output interface 705.
- the second power supply component 703 is configured to perform power management of the electronic device 6 .
- the electronic device 6 can operate an operating system stored in the second memory 702, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
- Embodiments of the present disclosure also provide a non-volatile computer-readable storage medium, where computer program instructions are stored in the storage medium.
- for example, the computer program instructions may be stored in the first memory 602 or the second memory 702, and can be executed by the processor 601 or the processing component 701 of the electronic device 6 to complete the above-mentioned pose determination method.
- Embodiments of the present disclosure also provide a computer program, where the computer program includes computer-readable codes, and when the computer-readable codes run in an electronic device, the processor of the electronic device performs the pose determination method provided in any of the previous embodiments.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the present disclosure.
- a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a ROM, an EPROM or flash memory, a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punched card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above.
- computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
- the computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
- computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement.
- the remote computer can be connected to the user's computer through any kind of network—including a Local Area Network (LAN) or a Wide Area Network (WAN)—or, can be connected to an external computer (e.g. use an internet service provider to connect via the internet).
- in some embodiments, electronic circuits, such as programmable logic circuits, FPGAs, or programmable logic arrays (PLAs), can execute the computer-readable program instructions to implement various aspects of the present disclosure.
- these computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- these computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture including instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.
- the computer program product can be specifically implemented by hardware, software or a combination thereof.
- the computer program product may be embodied as a computer storage medium; in another optional embodiment, the computer program product may be embodied as a software product, such as a Software Development Kit (SDK), and the like.
- the embodiments of the present application disclose a pose determination method, apparatus, electronic device, storage medium, and program.
- the method includes: acquiring collected data collected by a first terminal in a target scene; acquiring a global map including the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene, and the global map satisfies an accuracy condition; and determining, according to the feature correspondence between the collected data and the global map, at least one first pose of the first terminal during the collection process.
- the pose determination method provided by the embodiments of the present application can reduce the cost of obtaining the first pose and can also improve the accuracy of the first pose.
Claims (33)
- 1. A pose determination method, the method comprising: acquiring collected data collected by a first terminal in a target scene; acquiring a global map including the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene, and the global map satisfies an accuracy condition; and determining, according to a feature correspondence between the collected data and the global map, at least one first pose of the first terminal during the collection process.
- 2. The method according to claim 1, wherein the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene; the collected data includes a first collected image; and determining, according to the feature correspondence between the collected data and the global map, the at least one first pose of the first terminal during the collection process comprises: performing feature matching between the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result; and determining, according to the global feature matching result, the at least one first pose of the first terminal during the collection process.
- 3. The method according to claim 1, wherein the global map includes at least one frame of visual point cloud in the target scene; the collected data includes at least two frames of first collected images; and determining, according to the feature correspondence between the collected data and the global map, the at least one first pose of the first terminal during the collection process comprises: performing feature matching between the first collected images and the at least one frame of visual point cloud to obtain a global feature matching result; performing feature matching on the at least two frames of first collected images to obtain a local feature matching result; and determining, according to the global feature matching result and the local feature matching result, the at least one first pose of the first terminal during the collection process.
- 4. The method according to claim 3, wherein the collected data further includes first inertial measurement unit (IMU) data; and determining, according to the global feature matching result and the local feature matching result, the at least one first pose of the first terminal during the collection process comprises: acquiring first constraint information according to the global feature matching result and/or the local feature matching result; acquiring second constraint information according to the first IMU data; and processing the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information, to obtain the at least one first pose of the first terminal during the collection process.
- 5. The method according to claim 4, wherein processing the global feature matching result and the local feature matching result comprises: processing the global feature matching result and the local feature matching result through bundle adjustment.
- 6. The method according to any one of claims 2 to 5, wherein performing feature matching between the first collected image and the at least one frame of visual point cloud to obtain the global feature matching result comprises: matching two-dimensional feature points in the first collected image with three-dimensional feature points included in the at least one frame of visual point cloud to obtain the global feature matching result.
- 7. The method according to claim 1, further comprising: determining motion truth data according to the at least one first pose of the first terminal during the collection process.
- 8. The method according to claim 7, wherein determining the motion truth data according to the at least one first pose of the first terminal during the collection process comprises: using the at least one first pose of the first terminal during the collection process as the motion truth data; and/or using at least one of the collected data and the at least one first pose of the first terminal during the collection process as the motion truth data, wherein the collected data includes one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, the first collected image, and the first IMU data.
- 9. The method according to claim 7 or 8, wherein the motion truth data is used for at least one of the following operations: judging the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
- 10. The method according to claim 1, wherein the map data includes: a laser point cloud, a second collected image, and second IMU data in the global scene; and the method further comprises: acquiring the map data of the global scene collected by the second terminal; and reconstructing the global scene offline according to the map data to generate the global map of the global scene.
- 11. The method according to claim 10, wherein reconstructing the global scene offline according to the map data to generate the global map of the global scene comprises: determining at least one second pose of the second terminal during the data collection process according to the second IMU data and the laser point cloud; performing visual map reconstruction on the global scene according to the at least one second pose, in combination with the second collected image, to obtain at least one frame of visual point cloud, wherein the visual point cloud includes at least one three-dimensional feature point in the global scene; and obtaining the global map of the global scene according to the at least one frame of visual point cloud.
- 12. The method according to claim 11, wherein performing visual map reconstruction on the global scene according to the at least one second pose, in combination with the second collected image, to obtain the at least one frame of visual point cloud comprises: performing visual map reconstruction on the global scene according to the at least one second pose, in combination with the second collected image, to obtain at least one frame of initial visual point cloud; acquiring third constraint information in the visual map reconstruction process according to the laser point cloud and/or the second collected image, wherein the third constraint information includes one or more of plane constraint information of the laser point cloud, edge constraint information of the laser point cloud, and visual constraint information; and optimizing the at least one frame of initial visual point cloud according to the third constraint information to obtain the at least one frame of visual point cloud.
- 13. The method according to claim 10, wherein the second terminal includes: a radar configured to acquire the laser point cloud in the global scene; a vision sensor configured to acquire the second collected image in the global scene; and an IMU sensor configured to acquire the second IMU data in the global scene.
- 14. The method according to claim 13, wherein before reconstructing the global scene offline according to the map data to generate the global map of the global scene, the method further comprises: calibrating a coordinate transformation relationship between the vision sensor and the IMU sensor to obtain a first calibration result; calibrating a coordinate transformation relationship between the radar and the vision sensor to obtain a second calibration result; and jointly calibrating coordinate transformation relationships among the vision sensor, the IMU sensor, and the radar according to the first calibration result and the second calibration result.
- 15. The method according to any one of claims 10 to 14, further comprising: in the process of the second terminal collecting the map data, reconstructing the global scene in real time according to the map data to generate a real-time map of the global scene; and sending the map data and/or the real-time map to a target device, wherein the target device is configured to display a geographic range over which data collection of the global scene has been completed.
- 16. A pose determination apparatus, comprising: a collected data acquisition module configured to acquire collected data collected by a first terminal in a target scene; a global map acquisition module configured to acquire a global map including the target scene, wherein the global map is generated based on map data obtained by a second terminal performing data collection on a global scene including the target scene, and the global map satisfies an accuracy condition; and a pose determination module configured to determine, according to a feature correspondence between the collected data and the global map, at least one first pose of the first terminal during the collection process.
- 17. The apparatus according to claim 16, wherein the global map includes at least one frame of visual point cloud, and the visual point cloud includes at least one three-dimensional feature point in the global scene; the collected data includes a first collected image; and the pose determination module is configured to: perform feature matching between the first collected image and the at least one frame of visual point cloud to obtain a global feature matching result; and determine, according to the global feature matching result, the at least one first pose of the first terminal during the collection process.
- 18. The apparatus according to claim 16, wherein the global map includes at least one frame of visual point cloud in the target scene; the collected data includes at least two frames of first collected images; and the pose determination module is configured to: perform feature matching between the first collected images and the at least one frame of visual point cloud to obtain a global feature matching result; perform feature matching on the at least two frames of first collected images to obtain a local feature matching result; and determine, according to the global feature matching result and the local feature matching result, the at least one first pose of the first terminal during the collection process.
- 19. The apparatus according to claim 18, wherein the collected data further includes first inertial measurement unit (IMU) data; and the pose determination module is configured to: acquire first constraint information according to the global feature matching result and/or the local feature matching result; acquire second constraint information according to the first IMU data; and process the global feature matching result and the local feature matching result according to at least one of the first constraint information and the second constraint information, to obtain the at least one first pose of the first terminal during the collection process.
- 20. The apparatus according to claim 19, wherein the pose determination module is configured to process the global feature matching result and the local feature matching result through bundle adjustment.
- 21. The apparatus according to any one of claims 17 to 20, wherein the pose determination module is configured to match two-dimensional feature points in the first collected image with three-dimensional feature points included in the at least one frame of visual point cloud to obtain the global feature matching result.
- 22. The apparatus according to claim 16, further comprising a motion truth acquisition module, wherein the motion truth acquisition module is configured to determine motion truth data according to the at least one first pose of the first terminal during the collection process.
- 23. The apparatus according to claim 22, wherein the motion truth acquisition module is configured to: use the at least one first pose of the first terminal during the collection process as the motion truth data; and/or use at least one of the collected data and the at least one first pose of the first terminal during the collection process as the motion truth data, wherein the collected data includes one or more of wireless network (WiFi) data, Bluetooth data, geomagnetic data, ultra-wideband (UWB) data, the first collected image, and the first IMU data.
- 24. The apparatus according to claim 22 or 23, wherein the motion truth data is used for at least one of the following operations: judging the accuracy of a positioning result, training a neural network, and performing information fusion with the global map.
- 25. The apparatus according to claim 16, wherein the map data includes: a laser point cloud, a second collected image, and second IMU data in the global scene; and the apparatus further comprises a map data acquisition module and a global map generation module, wherein the map data acquisition module is configured to acquire the map data of the global scene collected by the second terminal, and the global map generation module is configured to reconstruct the global scene offline according to the map data to generate the global map of the global scene.
- 26. The apparatus according to claim 25, wherein the global map generation module is configured to: determine at least one second pose of the second terminal during the data collection process according to the second IMU data and the laser point cloud; perform visual map reconstruction on the global scene according to the at least one second pose, in combination with the second collected image, to obtain at least one frame of visual point cloud; and obtain the global map of the global scene according to the at least one frame of visual point cloud, wherein the visual point cloud includes at least one three-dimensional feature point in the global scene.
- 27. The apparatus according to claim 26, wherein the global map generation module is configured to: perform visual map reconstruction on the global scene according to the at least one second pose, in combination with the second collected image, to obtain at least one frame of initial visual point cloud; acquire third constraint information in the visual map reconstruction process according to the laser point cloud and/or the second collected image; and optimize the at least one frame of initial visual point cloud according to the third constraint information to obtain the at least one frame of visual point cloud, wherein the third constraint information includes one or more of plane constraint information of the laser point cloud, edge constraint information of the laser point cloud, and visual constraint information.
- 28. The apparatus according to claim 25, wherein the second terminal includes: a radar configured to acquire the laser point cloud in the global scene; a vision sensor configured to acquire the second collected image in the global scene; and an IMU sensor configured to acquire the second IMU data in the global scene.
- 29. The apparatus according to claim 28, wherein the apparatus is configured to: calibrate a coordinate transformation relationship between the vision sensor and the IMU sensor to obtain a first calibration result; calibrate a coordinate transformation relationship between the radar and the vision sensor to obtain a second calibration result; and jointly calibrate coordinate transformation relationships among the vision sensor, the IMU sensor, and the radar according to the first calibration result and the second calibration result.
- 30. The apparatus according to any one of claims 25 to 29, wherein the apparatus is configured to: in the process of the second terminal collecting the map data, reconstruct the global scene in real time according to the map data to generate a real-time map of the global scene; and send the map data and/or the real-time map to a target device, wherein the target device is configured to display a geographic range over which data collection of the global scene has been completed.
- 31. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to invoke the instructions stored in the memory to execute the pose determination method according to any one of claims 1 to 15.
- 32. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the pose determination method according to any one of claims 1 to 15.
- 33. A computer program, comprising computer-readable codes, wherein when the computer-readable codes run in an electronic device, a processor of the electronic device executes instructions for implementing the pose determination method according to any one of claims 1 to 15.