WO2024087066A1 - Image positioning method and apparatus, electronic device and storage medium - Google Patents
Image positioning method and apparatus, electronic device and storage medium
- Publication number
- WO2024087066A1 (PCT/CN2022/127768)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- imu
- visual
- positioning map
- value corresponding
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 230000004807 localization Effects 0.000 title abstract description 10
- 230000000007 visual effect Effects 0.000 claims abstract description 149
- 230000005484 gravity Effects 0.000 claims abstract description 27
- 239000011159 matrix material Substances 0.000 claims description 11
- 238000004590 computer program Methods 0.000 claims description 2
- 238000005516 engineering process Methods 0.000 abstract description 17
- 238000004891 communication Methods 0.000 description 10
- 238000012545 processing Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 8
- 230000003287 optical effect Effects 0.000 description 5
- 238000013473 artificial intelligence Methods 0.000 description 4
- 230000005236 sound signal Effects 0.000 description 4
- 230000003190 augmentative effect Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
Definitions
- the present disclosure relates to the technical field of image positioning, and in particular to an image positioning method, device, electronic device and storage medium.
- the embodiments of the present disclosure provide an image positioning method, device, electronic device and storage medium to solve the defects in the related art.
- an image positioning method comprising:
- the target pose of the image to be positioned is determined according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- the method further includes:
- the world coordinate system of the visual positioning map is rotated according to the rotation matrix, so that the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction.
- obtaining the IMU value corresponding to the image to be positioned includes:
- the IMU value collected by the IMU sensor of the image acquisition device at the shooting time is obtained, wherein the shooting time is the time when the image acquisition device collects the image to be positioned.
- obtaining the IMU value corresponding to the image to be positioned includes:
- the IMU values collected by the IMU sensor of the image acquisition device within a preset time period containing the shooting time are obtained, and their average is determined as the IMU value corresponding to the image to be positioned; the shooting time is the midpoint of the preset time period.
- determining the target pose of the image to be positioned according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned includes:
- the target pose of the image to be positioned is determined according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
- determining the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map includes:
- the initial pose is optimized by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
- the method further includes: obtaining multiple visual images in the navigation space and the IMU value corresponding to each visual image, and constructing the visual positioning map according to the multiple visual images.
- the image to be positioned is captured by the navigation device; the method further includes:
- the position of the navigation device in the navigation space is determined according to the target pose of the image to be positioned and the visual positioning map.
- an image positioning device comprising:
- a first acquisition module is used to acquire a visual positioning map, wherein an IMU value of a world coordinate system of the visual positioning map is aligned with a gravity direction, and the visual positioning map has a plurality of visual images;
- a second acquisition module is used to acquire the image to be positioned and the IMU value corresponding to the image to be positioned;
- a pose module is used to determine the target pose of the image to be positioned according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- an alignment module is further included, for:
- the world coordinate system of the visual positioning map is rotated according to the rotation matrix, so that the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction.
- the second acquisition module is specifically used for:
- the IMU value collected by the IMU sensor of the image acquisition device at the shooting time is obtained, wherein the shooting time is the time when the image acquisition device collects the image to be positioned.
- the second acquisition module is specifically used for:
- the IMU values collected by the IMU sensor of the image acquisition device within a preset time period containing the shooting time are obtained, and their average is determined as the IMU value corresponding to the image to be positioned; the shooting time is the midpoint of the preset time period.
- the pose module is specifically used for:
- the target pose of the image to be positioned is determined according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
- when the pose module is used to determine the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map, it is specifically used for:
- the initial pose is optimized by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
- a construction module is further included, which is used for: obtaining multiple visual images in the navigation space and the IMU value corresponding to each visual image, and constructing the visual positioning map according to the multiple visual images.
- the image to be positioned is captured by the navigation device; the apparatus further includes a navigation module for:
- the position of the navigation device in the navigation space is determined according to the target pose of the image to be positioned and the visual positioning map.
- an electronic device comprising a memory and a processor, wherein the memory is used to store computer instructions executable on the processor, and the processor is used to implement the image positioning method described in the first aspect when executing the computer instructions.
- a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in the first aspect is implemented.
- the image positioning method provided by the embodiments of the present disclosure can, by acquiring a visual positioning map, an image to be positioned, and the IMU value corresponding to the image to be positioned, determine the target pose of the image to be positioned according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned. Since the IMU value of the world coordinate system of the visual positioning map is aligned with the direction of gravity, and the IMU value of the image to be positioned is also combined when determining the target pose, the target pose can be constrained in at least two degrees of freedom, namely the roll angle and the yaw angle, thereby improving the accuracy of the target pose and reducing the relocalization error in positioning technologies such as SLAM.
- FIG. 1 is a flowchart of an image positioning method according to an exemplary embodiment of the present disclosure;
- FIG. 2 is a schematic structural diagram of an image positioning apparatus according to an exemplary embodiment of the present disclosure;
- FIG. 3 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
- although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another.
- for example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
- depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
- in the related art, the accuracy of image pose solving in the relocalization process is low, especially when the image contains little texture or the scene depth is large; this makes the relocalization error in positioning technologies such as SLAM large. Specifically, errors exist in both the translational and rotational degrees of freedom of the three coordinate axes.
- At least one embodiment of the present disclosure provides an image positioning method. Please refer to FIG. 1, which shows the process of the method, including steps S101 to S103.
- this method can be applied to the re-positioning process of positioning technologies such as SLAM, and is used to match the collected image (i.e., the image to be positioned) with the map data, thereby calculating the image pose.
- the reduction of the re-positioning error will reduce the positioning error in the positioning process based on the SLAM positioning technology.
- This method can be applied to the positioning functions of AR (Augmented Reality), VR (Virtual Reality), robots and other devices (hereinafter collectively referred to as navigation devices), which have sensors such as cameras, GPS (Global Positioning System), IMU (inertial measurement unit), lidar, depth cameras, etc.
- in step S101, a visual positioning map is obtained, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images.
- the visual positioning map is a map corresponding to the navigation space, which is the activity space of the navigation device running the method.
- the visual positioning map can be stored in the navigation device or in a server connected to the navigation device, so in this step, the visual positioning map can be obtained from the memory space of the navigation device or from the server.
- the visual positioning map includes multiple visual images, three-dimensional key points, and a mesh topological structure between three-dimensional key points.
- the visual positioning map can be constructed in advance in the following manner: multiple visual images in the navigation space and the IMU value corresponding to each of the visual images are obtained, and the visual positioning map is constructed according to the multiple visual images.
- the visual images can be collected by the navigation device using a camera during the activity in the navigation space, and the IMU value can be collected by the IMU sensor of the navigation device.
- the IMU value corresponding to the first frame of the visual image is used as the IMU value of the world coordinate system of the visual positioning map.
- the IMU value of the world coordinate system of the visual positioning map can be pre-aligned with the gravity direction in the following manner: first, the IMU value corresponding to the first frame of the visual image of the visual positioning map is obtained as the IMU value of the world coordinate system of the visual positioning map; next, the rotation matrix is determined according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction (0, 0, g); finally, the world coordinate system of the visual positioning map is rotated according to the rotation matrix to align the IMU value of the world coordinate system of the visual positioning map with the gravity direction.
- in step S102, an image to be positioned and the IMU value corresponding to the image to be positioned are obtained.
- the camera can be used to collect the image to be positioned, and the IMU sensor can be used to collect the IMU value.
- the IMU value collected by the IMU sensor of the image acquisition device at the shooting time can be obtained, wherein the shooting time is the time when the image acquisition device collects the image to be positioned, thereby improving the accuracy of the IMU value.
- alternatively, the IMU values collected by the IMU sensor within a preset time period containing the shooting time may be obtained and averaged to give the IMU value corresponding to the image to be positioned; the shooting time may be the midpoint of the preset time period.
- before averaging, the IMU values collected within the preset time period may be filtered, that is, an upper limit and a lower limit are set for the IMU values, and IMU values above the upper limit or below the lower limit are removed, thereby preventing erroneous data from interfering with the IMU value corresponding to the image to be positioned.
- in step S103, the target pose of the image to be positioned is determined according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- At least one candidate image is determined from the multiple visual images according to the image to be positioned.
- using the visual images as key data, a bag-of-words model can be used to retrieve the candidate images from the multiple visual images.
- a feature matching result between each candidate image in the at least one candidate image and the image to be positioned is determined.
- feature points of the image to be positioned and of each candidate image are first extracted; each candidate image is then combined with the image to be positioned into an image pair, and feature matching is performed on the feature points of the two images in each image pair, thereby obtaining a feature matching result, that is, a plurality of mutually matching feature point pairs.
- the target pose of the image to be positioned is determined according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
- the initial pose of the image to be positioned can be determined according to the at least one feature matching result and the three-dimensional key points of the visual positioning map, for example by solving the initial pose through a PnP algorithm or an ICP algorithm.
- the IMU error is determined according to the IMU value corresponding to the image to be positioned and the gravity direction, and the reprojection error between the image to be positioned and the visual positioning map is determined.
- the reprojection error e_img is calculated according to the formula e_img = u − K·T_cw·P_w, and the IMU error e_imu is determined from the gravity direction g_w = (0, 0, g), the camera extrinsic parameter T_ci and the IMU value g_c, wherein:
- T_ci is the camera extrinsic parameter used to characterize the relative position between the IMU sensor and the camera
- g_c is the IMU value corresponding to the image to be positioned
- u is a feature point in the image to be positioned
- K is the camera intrinsic matrix
- T_cw is the transformation matrix from the world coordinate system to the camera coordinate system
- P_w is the three-dimensional key point corresponding to u.
- the initial pose is optimized by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
- the IMU error and the reprojection error can be summed to obtain a total error, and then the initial pose is optimized by minimizing the total error.
- the position of the navigation device in the navigation space can also be determined according to the target pose of the image to be positioned and the visual positioning map, that is, the positioning and navigation of the navigation device are completed.
- the image positioning method provided by the embodiment of the present disclosure can determine the target posture of the image to be positioned according to multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned by acquiring a visual positioning map, an image to be positioned, and an IMU value corresponding to the image to be positioned.
- the target posture can be constrained in at least two degrees of freedom, namely the roll angle and the yaw angle, so that the error in six degrees of freedom is reduced to the error in four degrees of freedom, thereby improving the accuracy of the target posture, reducing the error of repositioning in positioning technologies such as SLAM, and improving the accuracy and robustness of repositioning.
- an image positioning device is provided. Please refer to FIG. 2 .
- the device includes:
- a first acquisition module 201 is used to acquire a visual positioning map, wherein an IMU value of a world coordinate system of the visual positioning map is aligned with a gravity direction, and the visual positioning map has a plurality of visual images;
- the second acquisition module 202 is used to acquire the image to be positioned and the IMU value corresponding to the image to be positioned;
- the pose module 203 is used to determine the target pose of the image to be positioned according to the multiple visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- an alignment module is further included, which is used to:
- the world coordinate system of the visual positioning map is rotated according to the rotation matrix, so that the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction.
- the second acquisition module is specifically used to:
- the IMU value collected by the IMU sensor of the image acquisition device at the shooting time is obtained, wherein the shooting time is the time when the image acquisition device collects the image to be positioned.
- the second acquisition module is specifically used to:
- an IMU value collected by an IMU sensor of the image acquisition device within a preset time period including a shooting time, wherein the shooting time is the time when the image acquisition device collects the image to be positioned;
- the shooting moment is the midpoint of the preset duration.
- the pose module is specifically used to:
- the target pose of the image to be positioned is determined according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
- when the pose module is used to determine the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map, it is specifically used to:
- the initial pose is optimized by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
- a construction module is further included, which is used for: obtaining multiple visual images in the navigation space and the IMU value corresponding to each visual image, and constructing the visual positioning map according to the multiple visual images.
- the image to be positioned is captured by the navigation device; the apparatus further includes a navigation module for:
- the position of the navigation device in the navigation space is determined according to the target pose of the image to be positioned and the visual positioning map.
- the device 300 can be a mobile phone, a computer, a digital broadcast terminal, a message transceiver device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
- the device 300 may include one or more of the following components: a processing component 302 , a memory 304 , a power component 306 , a multimedia component 308 , an audio component 310 , an input/output (I/O) interface 312 , a sensor component 314 , and a communication component 316 .
- the processing component 302 generally controls the overall operation of the device 300, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
- the processing component 302 may include one or more processors 320 to execute instructions to complete all or part of the steps of the above-mentioned method.
- the processing component 302 may include one or more modules to facilitate the interaction between the processing component 302 and other components.
- the processing component 302 may include a multimedia module to facilitate the interaction between the multimedia component 308 and the processing component 302.
- the memory 304 is configured to store various types of data to support operations on the device 300. Examples of such data include instructions for any application or method operating on the device 300, contact data, phone book data, messages, pictures, videos, etc.
- the memory 304 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
- the power component 306 provides power to the various components of the device 300.
- the power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 300.
- the multimedia component 308 includes a screen that provides an output interface between the device 300 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundaries of the touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
- the multimedia component 308 includes a front camera and/or a rear camera. When the device 300 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
- the audio component 310 is configured to output and/or input audio signals.
- the audio component 310 includes a microphone (MIC), and when the device 300 is in an operating mode, such as a call mode, a recording mode, and a speech recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal can be further stored in the memory 304 or sent via the communication component 316.
- the audio component 310 also includes a speaker for outputting audio signals.
- I/O interface 312 provides an interface between processing component 302 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include but are not limited to: a home button, a volume button, a start button, and a lock button.
- the sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300.
- the sensor assembly 314 can detect the open/closed state of the device 300 and the relative positioning of components (for example, the display and keypad of the device 300); it can also detect a change in position of the device 300 or of one of its components, the presence or absence of user contact with the device 300, the orientation or acceleration/deceleration of the device 300, and a change in the temperature of the device 300.
- the sensor assembly 314 can also include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
- the sensor assembly 314 can also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor assembly 314 can also include an accelerometer, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
- the communication component 316 is configured to facilitate wired or wireless communication between the device 300 and other devices.
- the device 300 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G or 5G or a combination thereof.
- the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
- the communication component 316 also includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the apparatus 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above image positioning method.
- in an exemplary embodiment, the present disclosure further provides a non-transitory computer-readable storage medium including instructions, such as the memory 304 including instructions, which can be executed by the processor 320 of the device 300 to perform the above image positioning method.
- the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Automation & Control Theory (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Navigation (AREA)
Abstract
The present disclosure relates to an image positioning method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images; acquiring an image to be positioned and the IMU value corresponding to the image to be positioned; and determining the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned. Since the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the IMU value of the image to be positioned is also combined when determining the target pose, the target pose can be constrained in at least two degrees of freedom, namely the roll angle and the yaw angle, thereby improving the accuracy of the target pose and reducing the relocalization error in positioning technologies such as SLAM.
Description
The present disclosure relates to the technical field of image positioning, and in particular to an image positioning method and apparatus, an electronic device, and a storage medium.
In recent years, artificial intelligence has advanced considerably and has gradually driven technological innovation in various fields. For example, artificial intelligence has made the positioning functions of devices such as AR (Augmented Reality), VR (Virtual Reality) and robots more accurate and efficient, enabling positioning technologies such as SLAM (Simultaneous Localization and Mapping). In positioning based on SLAM positioning technology, the positioning error grows as the duration and distance of motion increase; therefore, to guarantee positioning accuracy and precision at any moment, the image captured at the current moment must be matched against the map data to solve the image pose and complete relocalization. In the related art, however, the accuracy of image pose solving during relocalization is low.
SUMMARY OF THE INVENTION
To overcome the problems in the related art, embodiments of the present disclosure provide an image positioning method and apparatus, an electronic device, and a storage medium, so as to solve the defects in the related art.
According to a first aspect of the embodiments of the present disclosure, an image positioning method is provided, the method comprising:
acquiring a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images;
acquiring an image to be positioned and the IMU value corresponding to the image to be positioned;
determining the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
In one embodiment, the method further comprises:
acquiring the IMU value corresponding to the first frame of visual image of the visual positioning map as the IMU value of the world coordinate system of the visual positioning map;
determining a rotation matrix according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction;
rotating the world coordinate system of the visual positioning map according to the rotation matrix, so as to align the IMU value of the world coordinate system of the visual positioning map with the gravity direction.
In one embodiment, acquiring the IMU value corresponding to the image to be positioned comprises:
acquiring the IMU value collected by an IMU sensor of the image acquisition device at the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned.
In one embodiment, acquiring the IMU value corresponding to the image to be positioned comprises:
acquiring the IMU values collected by an IMU sensor of the image acquisition device within a preset time period containing the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned;
determining the average of the IMU values collected within the preset time period as the IMU value corresponding to the image to be positioned.
In one embodiment, the shooting moment is the midpoint of the preset time period.
In one embodiment, determining the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned comprises:
determining at least one candidate image from the plurality of visual images according to the image to be positioned;
determining a feature matching result between each candidate image in the at least one candidate image and the image to be positioned;
determining the target pose of the image to be positioned according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
In one embodiment, determining the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map comprises:
determining the initial pose of the image to be positioned according to the at least one feature matching result and the three-dimensional key points of the visual positioning map;
determining an IMU error according to the IMU value corresponding to the image to be positioned and the gravity direction, and determining a reprojection error between the image to be positioned and the visual positioning map;
optimizing the initial pose by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
In one embodiment, the method further comprises:
acquiring a plurality of visual images in the navigation space and the IMU value corresponding to each of the visual images, and constructing the visual positioning map according to the plurality of visual images.
In one embodiment, the image to be positioned is captured by a navigation device, and the method further comprises:
determining the position of the navigation device in the navigation space according to the target pose of the image to be positioned and the visual positioning map.
According to a second aspect of the embodiments of the present disclosure, an image positioning apparatus is provided, the apparatus comprising:
a first acquisition module, configured to acquire a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images;
a second acquisition module, configured to acquire an image to be positioned and the IMU value corresponding to the image to be positioned;
a pose module, configured to determine the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
In one embodiment, the apparatus further comprises an alignment module, configured to:
acquire the IMU value corresponding to the first frame of visual image of the visual positioning map as the IMU value of the world coordinate system of the visual positioning map;
determine a rotation matrix according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction;
rotate the world coordinate system of the visual positioning map according to the rotation matrix, so as to align the IMU value of the world coordinate system of the visual positioning map with the gravity direction.
In one embodiment, the second acquisition module is specifically configured to:
acquire the IMU value collected by an IMU sensor of the image acquisition device at the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned.
In one embodiment, the second acquisition module is specifically configured to:
acquire the IMU values collected by an IMU sensor of the image acquisition device within a preset time period containing the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned;
determine the average of the IMU values collected within the preset time period as the IMU value corresponding to the image to be positioned.
In one embodiment, the shooting moment is the midpoint of the preset time period.
In one embodiment, the pose module is specifically configured to:
determine at least one candidate image from the plurality of visual images according to the image to be positioned;
determine a feature matching result between each candidate image in the at least one candidate image and the image to be positioned;
determine the target pose of the image to be positioned according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
In one embodiment, when determining the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map, the pose module is specifically configured to:
determine the initial pose of the image to be positioned according to the at least one feature matching result and the three-dimensional key points of the visual positioning map;
determine an IMU error according to the IMU value corresponding to the image to be positioned and the gravity direction, and determine a reprojection error between the image to be positioned and the visual positioning map;
optimize the initial pose by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
In one embodiment, the apparatus further comprises a construction module, configured to:
acquire a plurality of visual images in the navigation space and the IMU value corresponding to each of the visual images, and construct the visual positioning map according to the plurality of visual images.
In one embodiment, the image to be positioned is captured by a navigation device, and the apparatus further comprises a navigation module, configured to:
determine the position of the navigation device in the navigation space according to the target pose of the image to be positioned and the visual positioning map.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, the electronic device comprising a memory and a processor, wherein the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the image positioning method according to the first aspect when executing the computer instructions.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to the first aspect.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
With the image positioning method provided by the embodiments of the present disclosure, by acquiring a visual positioning map, an image to be positioned, and the IMU value corresponding to the image to be positioned, the target pose of the image to be positioned can be determined according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned. Since the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the IMU value of the image to be positioned is also combined when determining the target pose, the target pose can be constrained in at least two degrees of freedom, namely the roll angle and the yaw angle, thereby improving the accuracy of the target pose and reducing the relocalization error in positioning technologies such as SLAM.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
FIG. 1 is a flowchart of an image positioning method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an image positioning apparatus according to an exemplary embodiment of the present disclosure;
FIG. 3 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
In recent years, artificial intelligence has advanced considerably and has gradually driven technological innovation in various fields. For example, artificial intelligence has made the positioning functions of devices such as AR (Augmented Reality), VR (Virtual Reality) and robots more accurate and efficient, enabling positioning technologies such as SLAM (Simultaneous Localization and Mapping). In positioning based on SLAM positioning technology, the positioning error grows as the duration and distance of motion increase; therefore, to guarantee positioning accuracy and precision at any moment, the image captured at the current moment must be matched against the map data to solve the image pose and complete relocalization.
In the related art, however, the accuracy of image pose solving during relocalization is low, especially when the image contains little texture or the scene depth is large; this makes the relocalization error of positioning technologies such as SLAM large. Specifically, errors exist in both the translational and rotational degrees of freedom of the three coordinate axes.
On this basis, in a first aspect, at least one embodiment of the present disclosure provides an image positioning method. Please refer to FIG. 1, which shows the flow of the method, comprising steps S101 to S103.
The method can be applied to the relocalization process of positioning technologies such as SLAM, to match a captured image (i.e., the image to be positioned) against the map data and thereby solve the image pose. A reduction of the relocalization error will reduce the positioning error of the positioning process based on SLAM positioning technology.
The method can be applied to the positioning functions of devices such as AR (Augmented Reality), VR (Virtual Reality) and robots (hereinafter collectively referred to as navigation devices), which are equipped with sensors such as cameras, GPS (Global Positioning System), IMU (Inertial Measurement Unit), lidar and depth cameras.
In step S101, a visual positioning map is acquired, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images.
The visual positioning map is a map corresponding to the navigation space, which is the activity space of the navigation device running the method. The visual positioning map can be stored in the navigation device or in a server connected to the navigation device, so in this step the visual positioning map can be acquired from the memory of the navigation device or from the server.
The visual positioning map includes a plurality of visual images, three-dimensional key points, a mesh topology between the three-dimensional key points, and the like.
In a possible embodiment, the visual positioning map can be constructed in advance in the following manner: a plurality of visual images in the navigation space and the IMU value corresponding to each of the visual images are acquired, and the visual positioning map is constructed according to the plurality of visual images. The visual images can be collected by a camera of the navigation device while it moves through the navigation space, and the IMU values can be collected by the IMU sensor of the navigation device.
Exemplarily, features are extracted from each visual image, matching relationships between different visual images are computed based on the features, and the matching relationships are finally used to solve the image poses and generate the three-dimensional key points, the mesh topology and the like, as sketched in the example below. It can be understood that, when constructing the visual positioning map, the IMU value corresponding to the first frame of visual image is used as the IMU value of the world coordinate system of the visual positioning map.
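The following is a minimal two-view sketch of this map-building flow, assuming OpenCV ORB features and essential-matrix triangulation; the function name build_visual_map and the consecutive-pair-only matching are illustrative assumptions, and a production pipeline would add loop closure and bundle adjustment (note that the two-view translation is recovered only up to scale):

```python
import cv2
import numpy as np

def build_visual_map(images, K):
    """Extract features, match consecutive image pairs, solve relative poses
    and triangulate 3D key points (a toy stand-in for the full map builder)."""
    orb = cv2.ORB_create()
    feats = [orb.detectAndCompute(img, None) for img in images]       # per-image features
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    map_points = []
    for i in range(len(images) - 1):
        (ki, di), (kj, dj) = feats[i], feats[i + 1]
        matches = matcher.match(di, dj)                               # inter-image matching
        pi = np.float64([ki[m.queryIdx].pt for m in matches])
        pj = np.float64([kj[m.trainIdx].pt for m in matches])
        E, mask = cv2.findEssentialMat(pi, pj, K)                     # relative geometry
        _, R, t, mask = cv2.recoverPose(E, pi, pj, K, mask=mask)      # solve image pose
        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P1 = K @ np.hstack([R, t])
        X = cv2.triangulatePoints(P0, P1, pi.T, pj.T)                 # homogeneous 3D points
        map_points.append((X[:3] / X[3]).T)                           # 3D key points
    return feats, map_points
```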
In another possible embodiment, the IMU value of the world coordinate system of the visual positioning map can be aligned with the gravity direction in advance in the following manner: first, the IMU value corresponding to the first frame of visual image of the visual positioning map is acquired as the IMU value of the world coordinate system of the visual positioning map; next, a rotation matrix is determined according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction (0, 0, g); finally, the world coordinate system of the visual positioning map is rotated according to the rotation matrix, so that the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction.
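A minimal sketch of determining that rotation matrix, assuming the world-frame IMU value is an accelerometer reading that measures gravity (the patent does not prescribe a particular solver):

```python
import numpy as np
from scipy.spatial.transform import Rotation

G = 9.81  # magnitude used for the gravity direction (0, 0, g)

def gravity_alignment_rotation(imu_world):
    """Rotation matrix that takes the map-frame gravity estimate onto (0, 0, g).

    A single vector pair leaves the rotation about the gravity axis
    unconstrained, so only two rotational degrees of freedom are fixed here.
    """
    target = np.array([[0.0, 0.0, G]])
    source = np.asarray(imu_world, dtype=float).reshape(1, 3)
    rot, _ = Rotation.align_vectors(target, source)  # R minimizing |R @ source - target|
    return rot.as_matrix()

# Rotating every pose and 3D key point of the map by this matrix aligns
# the map's world coordinate system with the gravity direction.
R = gravity_alignment_rotation([0.3, -0.2, 9.76])
```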
In step S102, an image to be positioned and the IMU value corresponding to the image to be positioned are acquired.
While moving through the navigation space, the navigation device can collect the image to be positioned with its camera and collect IMU values with its IMU sensor.
Exemplarily, when acquiring the IMU value corresponding to the image to be positioned, the IMU value collected by the IMU sensor of the image acquisition device at the shooting moment can be acquired, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned, thereby improving the accuracy of the IMU value.
As another example, when acquiring the IMU value corresponding to the image to be positioned, the IMU values collected by the IMU sensor of the image acquisition device within a preset time period containing the shooting moment can be acquired first, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned; the average of the IMU values collected within the preset time period is then determined as the IMU value corresponding to the image to be positioned.
The shooting moment may be the midpoint of the preset time period. In addition, before the average of the IMU values collected within the preset time period is determined, the IMU values collected within the preset time period may be filtered; that is, an upper limit and a lower limit are set for the IMU values, and IMU values above the upper limit or below the lower limit are removed, thereby preventing erroneous data from interfering with the IMU value corresponding to the image to be positioned.
In this example, collecting and averaging the IMU values within the preset time period avoids noise interference in the IMU value corresponding to the image to be positioned and improves the accuracy of the IMU value.
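A minimal sketch of this windowed filter-and-average step; applying the upper and lower limits to the acceleration magnitude is an assumption, since the patent only states that out-of-range IMU values are removed:

```python
import numpy as np

def imu_value_for_image(samples, timestamps, t_shot, half_window, lo, hi):
    """Average the IMU samples inside [t_shot - half_window, t_shot + half_window].

    The shooting moment t_shot is the midpoint of the window; samples whose
    magnitude falls outside [lo, hi] are discarded before averaging.
    """
    samples = np.asarray(samples, dtype=float)        # shape (N, 3)
    timestamps = np.asarray(timestamps, dtype=float)
    window = samples[np.abs(timestamps - t_shot) <= half_window]
    norms = np.linalg.norm(window, axis=1)
    kept = window[(norms >= lo) & (norms <= hi)]      # drop values above/below the limits
    return kept.mean(axis=0)                          # IMU value for the image
```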
In step S103, the target pose of the image to be positioned is determined according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
Optionally, this step can be performed in the following manner.
First, at least one candidate image is determined from the plurality of visual images according to the image to be positioned. Exemplarily, using the visual images as key data, a bag-of-words model can be used to retrieve the candidate images from the plurality of visual images.
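A minimal bag-of-visual-words retrieval sketch; the pre-trained vocabulary (e.g. k-means cluster centres) and cosine scoring are illustrative assumptions, since the patent only names a bag-of-words model (production systems typically use DBoW-style vocabularies with TF-IDF weighting):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize descriptors against the visual vocabulary into a normalized histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                         # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def top_candidates(query_desc, map_descs, vocabulary, k=5):
    """Return indices of the k visual images most similar to the query image."""
    q = bow_histogram(query_desc, vocabulary)
    scores = [q @ bow_histogram(d, vocabulary) for d in map_descs]  # cosine similarity
    return np.argsort(scores)[::-1][:k]
```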
Next, a feature matching result between each candidate image in the at least one candidate image and the image to be positioned is determined. Exemplarily, feature points are first extracted from the image to be positioned and from each candidate image; each candidate image is then combined with the image to be positioned into an image pair, and feature matching is performed on the feature points of the two images in each image pair, thereby obtaining a feature matching result, that is, a plurality of mutually matching feature point pairs.
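A minimal sketch of matching one image pair; ORB descriptors and the Lowe ratio test are assumptions, as the patent does not name a specific feature or matcher:

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)

def match_pair(img_query, img_candidate):
    """Return mutually matching feature point pairs for one image pair."""
    kq, dq = orb.detectAndCompute(img_query, None)
    kc, dc = orb.detectAndCompute(img_candidate, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = [p for p in matcher.knnMatch(dq, dc, k=2) if len(p) == 2]
    good = [m for m, n in pairs if m.distance < 0.75 * n.distance]  # ratio test
    return [(kq[m.queryIdx].pt, kc[m.trainIdx].pt) for m in good]
```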
Finally, the target pose of the image to be positioned is determined according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
Exemplarily, the initial pose of the image to be positioned can first be determined according to the at least one feature matching result and the three-dimensional key points of the visual positioning map, for example by solving the initial pose through a PnP algorithm or an ICP algorithm.
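A minimal PnP sketch using OpenCV's RANSAC solver, assuming the matched feature points have already been associated with the map's three-dimensional key points:

```python
import cv2
import numpy as np

def initial_pose_pnp(pts_3d, pts_2d, K):
    """Solve the initial pose T_cw from 2D-3D correspondences."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts_3d, dtype=np.float64),
        np.asarray(pts_2d, dtype=np.float64),
        K, None)
    if not ok:
        raise RuntimeError("PnP failed to find an initial pose")
    R_cw, _ = cv2.Rodrigues(rvec)   # rotation part of T_cw
    return R_cw, tvec.reshape(3)    # translation part of T_cw
```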
Then, the IMU error is determined according to the IMU value corresponding to the image to be positioned and the gravity direction, and the reprojection error between the image to be positioned and the visual positioning map is determined. For example, the reprojection error e_img is calculated according to the following formula, and the IMU error e_imu is determined from the gravity direction g_w, the camera extrinsic parameter T_ci and the IMU value g_c:
e_img = u − K·T_cw·P_w
where g_w is the gravity direction (0, 0, g), T_ci is the camera extrinsic parameter used to characterize the relative position between the IMU sensor and the camera, g_c is the IMU value corresponding to the image to be positioned, u is a feature point in the image to be positioned, K is the camera intrinsic matrix, T_cw is the transformation matrix from the world coordinate system to the camera coordinate system, and P_w is the three-dimensional key point corresponding to u.
Finally, the initial pose is optimized by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned. For example, the IMU error and the reprojection error can be summed to obtain a total error, and the initial pose is then optimized by minimizing the total error.
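A minimal sketch of this joint minimization; since the exact form of the IMU error is not fully reproduced above, the gravity-alignment residual used here (the IMU value mapped into the world frame through the extrinsic rotation R_ci, compared against g_w) is an assumption, as are the residual weighting and the function names:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

G_W = np.array([0.0, 0.0, 9.81])  # gravity direction g_w

def total_residual(x, pts_3d, pts_2d, K, R_ci, g_c, w_imu=1.0):
    """x = [rotation vector (3), translation (3)] parameterizing T_cw."""
    R_cw = Rotation.from_rotvec(x[:3]).as_matrix()
    t_cw = x[3:]
    p_cam = pts_3d @ R_cw.T + t_cw                           # 3D key points in camera frame
    proj = p_cam @ K.T
    e_img = (pts_2d - proj[:, :2] / proj[:, 2:3]).ravel()    # reprojection residuals
    e_imu = G_W - R_cw.T @ (R_ci @ g_c)                      # assumed IMU (gravity) residual
    return np.concatenate([e_img, w_imu * e_imu])

def refine_pose(x0, pts_3d, pts_2d, K, R_ci, g_c):
    """Minimize the summed total error starting from the initial pose x0."""
    res = least_squares(total_residual, x0, args=(pts_3d, pts_2d, K, R_ci, g_c))
    return res.x  # optimized target pose parameters
```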
After this step is performed, the position of the navigation device in the navigation space can also be determined according to the target pose of the image to be positioned and the visual positioning map; that is, the positioning and navigation of the navigation device are completed.
With the image positioning method provided by the embodiments of the present disclosure, by acquiring a visual positioning map, an image to be positioned, and the IMU value corresponding to the image to be positioned, the target pose of the image to be positioned can be determined according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned. Since the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the IMU value of the image to be positioned is also combined when determining the target pose, the target pose can be constrained in at least two degrees of freedom, namely the roll angle and the yaw angle, so that errors in six degrees of freedom are reduced to errors in four degrees of freedom, improving the accuracy of the target pose, reducing the relocalization error of positioning technologies such as SLAM, and improving the accuracy and robustness of relocalization.
According to a second aspect of the embodiments of the present disclosure, an image positioning apparatus is provided. Please refer to FIG. 2. The apparatus comprises:
a first acquisition module 201, configured to acquire a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images;
a second acquisition module 202, configured to acquire an image to be positioned and the IMU value corresponding to the image to be positioned;
a pose module 203, configured to determine the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
In some embodiments of the present disclosure, the apparatus further comprises an alignment module, configured to:
acquire the IMU value corresponding to the first frame of visual image of the visual positioning map as the IMU value of the world coordinate system of the visual positioning map;
determine a rotation matrix according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction;
rotate the world coordinate system of the visual positioning map according to the rotation matrix, so as to align the IMU value of the world coordinate system of the visual positioning map with the gravity direction.
In some embodiments of the present disclosure, the second acquisition module is specifically configured to:
acquire the IMU value collected by an IMU sensor of the image acquisition device at the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned.
In some embodiments of the present disclosure, the second acquisition module is specifically configured to:
acquire the IMU values collected by an IMU sensor of the image acquisition device within a preset time period containing the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned;
determine the average of the IMU values collected within the preset time period as the IMU value corresponding to the image to be positioned.
In some embodiments of the present disclosure, the shooting moment is the midpoint of the preset time period.
In some embodiments of the present disclosure, the pose module is specifically configured to:
determine at least one candidate image from the plurality of visual images according to the image to be positioned;
determine a feature matching result between each candidate image in the at least one candidate image and the image to be positioned;
determine the target pose of the image to be positioned according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
In some embodiments of the present disclosure, when determining the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map, the pose module is specifically configured to:
determine the initial pose of the image to be positioned according to the at least one feature matching result and the three-dimensional key points of the visual positioning map;
determine an IMU error according to the IMU value corresponding to the image to be positioned and the gravity direction, and determine a reprojection error between the image to be positioned and the visual positioning map;
optimize the initial pose by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
In some embodiments of the present disclosure, the apparatus further comprises a construction module, configured to:
acquire a plurality of visual images in the navigation space and the IMU value corresponding to each of the visual images, and construct the visual positioning map according to the plurality of visual images.
In some embodiments of the present disclosure, the image to be positioned is captured by a navigation device, and the apparatus further comprises a navigation module, configured to:
determine the position of the navigation device in the navigation space according to the target pose of the image to be positioned and the visual positioning map.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method in the first aspect, and will not be elaborated here.
In correspondence with the third aspect of the embodiments of the present disclosure, please refer to FIG. 3, which exemplarily shows a block diagram of an electronic device. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 3, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls the overall operation of the apparatus 300, such as operations associated with display, telephone calls, data communication, camera operation and recording operation. The processing component 302 may include one or more processors 320 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 302 may include one or more modules to facilitate interaction between the processing component 302 and the other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operation on the apparatus 300. Examples of such data include instructions for any application or method operating on the apparatus 300, contact data, phone book data, messages, pictures, videos, and the like. The memory 304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The power component 306 provides power to the various components of the apparatus 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the apparatus 300.
The multimedia component 308 includes a screen providing an output interface between the apparatus 300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front camera and/or a rear camera. When the apparatus 300 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a microphone (MIC); when the apparatus 300 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 304 or transmitted via the communication component 316. In some embodiments, the audio component 310 further includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button and a lock button.
The sensor component 314 includes one or more sensors for providing status assessments of various aspects of the apparatus 300. For example, the sensor component 314 can detect the open/closed state of the apparatus 300 and the relative positioning of components (for example, the display and keypad of the apparatus 300); the sensor component 314 can also detect a change in position of the apparatus 300 or of one of its components, the presence or absence of user contact with the apparatus 300, the orientation or acceleration/deceleration of the apparatus 300, and a change in the temperature of the apparatus 300. The sensor component 314 may also include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 314 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The apparatus 300 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G or 5G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above image positioning method.
In a fourth aspect, in an exemplary embodiment, the present disclosure further provides a non-transitory computer-readable storage medium including instructions, such as the memory 304 including instructions, which can be executed by the processor 320 of the apparatus 300 to perform the above image positioning method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. The present application is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
- An image positioning method, characterized in that the method comprises: acquiring a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images; acquiring an image to be positioned and the IMU value corresponding to the image to be positioned; and determining the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- The image positioning method according to claim 1, characterized by further comprising: acquiring the IMU value corresponding to the first frame of visual image of the visual positioning map as the IMU value of the world coordinate system of the visual positioning map; determining a rotation matrix according to the IMU value of the world coordinate system of the visual positioning map and the gravity direction; and rotating the world coordinate system of the visual positioning map according to the rotation matrix, so as to align the IMU value of the world coordinate system of the visual positioning map with the gravity direction.
- The image positioning method according to claim 1, characterized in that acquiring the IMU value corresponding to the image to be positioned comprises: acquiring the IMU value collected by an IMU sensor of the image acquisition device at the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned.
- The image positioning method according to claim 1, characterized in that acquiring the IMU value corresponding to the image to be positioned comprises: acquiring the IMU values collected by an IMU sensor of the image acquisition device within a preset time period containing the shooting moment, wherein the shooting moment is the moment at which the image acquisition device collects the image to be positioned; and determining the average of the IMU values collected within the preset time period as the IMU value corresponding to the image to be positioned.
- The image positioning method according to claim 4, characterized in that the shooting moment is the midpoint of the preset time period.
- The image positioning method according to claim 1, characterized in that determining the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned comprises: determining at least one candidate image from the plurality of visual images according to the image to be positioned; determining a feature matching result between each candidate image in the at least one candidate image and the image to be positioned; and determining the target pose of the image to be positioned according to at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map.
- The image positioning method according to claim 6, characterized in that determining the target pose of the image to be positioned according to the at least one feature matching result, the IMU value corresponding to the image to be positioned, and the three-dimensional key points of the visual positioning map comprises: determining the initial pose of the image to be positioned according to the at least one feature matching result and the three-dimensional key points of the visual positioning map; determining an IMU error according to the IMU value corresponding to the image to be positioned and the gravity direction, and determining a reprojection error between the image to be positioned and the visual positioning map; and optimizing the initial pose by minimizing the IMU error and the reprojection error to obtain the target pose of the image to be positioned.
- The image positioning method according to any one of claims 1 to 7, characterized by further comprising: acquiring a plurality of visual images in the navigation space and the IMU value corresponding to each of the visual images, and constructing the visual positioning map according to the plurality of visual images.
- The image positioning method according to claim 1, characterized in that the image to be positioned is captured by a navigation device, and the method further comprises: determining the position of the navigation device in the navigation space according to the target pose of the image to be positioned and the visual positioning map.
- An image positioning apparatus, characterized in that the apparatus comprises: a first acquisition module, configured to acquire a visual positioning map, wherein the IMU value of the world coordinate system of the visual positioning map is aligned with the gravity direction, and the visual positioning map has a plurality of visual images; a second acquisition module, configured to acquire an image to be positioned and the IMU value corresponding to the image to be positioned; and a pose module, configured to determine the target pose of the image to be positioned according to the plurality of visual images of the visual positioning map, the image to be positioned, and the IMU value corresponding to the image to be positioned.
- A terminal device, characterized in that the terminal device comprises a memory and a processor, wherein the memory is configured to store computer instructions executable on the processor, and the processor is configured to perform the image positioning method according to any one of claims 1 to 9 when executing the computer instructions.
- A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/127768 WO2024087066A1 (zh) | 2022-10-26 | 2022-10-26 | Image positioning method and apparatus, electronic device and storage medium |
CN202280085785.XA CN118525296A (zh) | 2022-10-26 | 2022-10-26 | Image positioning method and apparatus, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/127768 WO2024087066A1 (zh) | 2022-10-26 | 2022-10-26 | Image positioning method and apparatus, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024087066A1 (zh) | 2024-05-02 |
Family
ID=90829802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/127768 WO2024087066A1 (zh) | 2022-10-26 | 2022-10-26 | Image positioning method and apparatus, electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118525296A (zh) |
WO (1) | WO2024087066A1 (zh) |
2022
- 2022-10-26 CN CN202280085785.XA patent/CN118525296A/zh active Pending
- 2022-10-26 WO PCT/CN2022/127768 patent/WO2024087066A1/zh unknown
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109084732A (zh) * | 2018-06-29 | 2018-12-25 | 北京旷视科技有限公司 | Positioning and navigation method, apparatus and processing device |
US20200160479A1 (en) * | 2018-10-08 | 2020-05-21 | R-Go Robotics Ltd. | System and method for geometrical user interactions via three-dimensional mapping |
CN110118554A (zh) * | 2019-05-16 | 2019-08-13 | 深圳前海达闼云端智能科技有限公司 | Visual-inertial-based SLAM method and apparatus, storage medium and device |
US20210241523A1 (en) * | 2020-02-04 | 2021-08-05 | Naver Corporation | Electronic device for providing visual localization based on outdoor three-dimension map information and operating method thereof |
CN111928847A (zh) * | 2020-09-22 | 2020-11-13 | 蘑菇车联信息科技有限公司 | Method and apparatus for optimizing pose data of an inertial measurement unit, and electronic device |
CN112197764A (zh) * | 2020-12-07 | 2021-01-08 | 广州极飞科技有限公司 | Real-time pose determination method and apparatus, and electronic device |
CN115063480A (zh) * | 2022-06-24 | 2022-09-16 | 咪咕动漫有限公司 | Pose determination method and apparatus, electronic device and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN118525296A (zh) | 2024-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11176687B2 (en) | Method and apparatus for detecting moving target, and electronic equipment | |
US20210158560A1 (en) | Method and device for obtaining localization information and storage medium | |
CN105450736B (zh) | Method and apparatus for connecting with virtual reality | |
EP3287745B1 (en) | Information interaction method and device | |
US10885682B2 | Method and device for creating indoor environment map | |
CN110853095B (zh) | Camera positioning method and apparatus, electronic device and storage medium | |
CN112348933B (zh) | Animation generation method and apparatus, electronic device and storage medium | |
US10248855B2 | Method and apparatus for identifying gesture | |
JP2021520540A (ja) | Camera positioning method and apparatus, terminal and computer program | |
US20200402321A1 | Method, electronic device and storage medium for image generation | |
CN114092655A (zh) | Map construction method, apparatus, device and storage medium | |
WO2023273498A1 (zh) | Depth detection method and apparatus, electronic device and storage medium | |
WO2023273499A1 (zh) | Depth detection method and apparatus, electronic device and storage medium | |
CN112700468A (zh) | Pose determination method and apparatus, electronic device and storage medium | |
CN114140536A (zh) | Pose data processing method and apparatus, electronic device and storage medium | |
CN112767541A (zh) | Three-dimensional reconstruction method and apparatus, electronic device and storage medium | |
CN111984755A (zh) | Method and apparatus for determining a target parking point, electronic device and storage medium | |
WO2024087066A1 (zh) | Image positioning method and apparatus, electronic device and storage medium | |
CN113642551A (zh) | Nail key point detection method and apparatus, electronic device and storage medium | |
WO2022110801A1 (zh) | Data processing method and apparatus, electronic device and storage medium | |
CN117115244A (zh) | Cloud relocalization method and apparatus, and storage medium | |
WO2022237071A1 (zh) | Positioning method and apparatus, electronic device, storage medium and computer program | |
WO2022110777A1 (zh) | Positioning method and apparatus, electronic device, storage medium, computer program product and computer program | |
WO2024087067A1 (zh) | Image annotation method and apparatus, and neural network training method and apparatus | |
CN112308878A (zh) | Information processing method and apparatus, electronic device and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22963050; Country of ref document: EP; Kind code of ref document: A1 |