WO2024087927A1 - Posture determination method and device, computer-readable storage medium, and electronic device - Google Patents

Posture determination method and device, computer-readable storage medium, and electronic device

Info

Publication number
WO2024087927A1
WO2024087927A1 PCT/CN2023/118752 CN2023118752W WO2024087927A1 WO 2024087927 A1 WO2024087927 A1 WO 2024087927A1 CN 2023118752 W CN2023118752 W CN 2023118752W WO 2024087927 A1 WO2024087927 A1 WO 2024087927A1
Authority
WO
WIPO (PCT)
Prior art keywords
color image
camera
feature points
coordinate system
posture
Prior art date
Application number
PCT/CN2023/118752
Other languages
English (en)
French (fr)
Inventor
尹赫
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2024087927A1

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a posture determination method, a posture determination device, a computer-readable storage medium, and an electronic device.
  • visual positioning is a technology that uses images taken by a camera to determine the camera's position in the real world. It has important application value in augmented reality, virtual reality, robotics, intelligent transportation and other fields.
  • the present disclosure provides a posture determination method, a posture determination device, a computer-readable storage medium and an electronic device, thereby overcoming the problem of poor visual positioning accuracy at least to a certain extent.
  • a posture determination method is provided, which is applied to a terminal device, wherein the terminal device is configured with a first camera and at least one second camera, and the posture determination method includes: obtaining matching feature points between two images in a first color image set, and determining a reprojection error of the matching feature points between two images in the first color image set, wherein the first color image set is composed of a current frame color image acquired by the first camera and previous n frames of color images acquired by the first camera; obtaining matching feature points between two images in a second color image set, and determining a reprojection error of the matching feature points between two images in the second color image set in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, wherein the second color image set is composed of a current frame color image acquired by the second camera and previous n frames of color images acquired by the second camera; optimizing a posture to be optimized when the first camera acquires the current frame color image based on the repro
  • a posture determination apparatus is provided, which is configured in a terminal device that is also configured with a first camera and at least one second camera, the posture determination apparatus including: a first error determination module, configured to obtain matching feature points between two images in a first color image set and to determine the reprojection error of the matching feature points between two images in the first color image set, wherein the first color image set is composed of a current frame color image captured by the first camera and the previous n frames of color images captured by the first camera;
  • a second error determination module, configured to obtain matching feature points between two images in a second color image set and to determine the reprojection error of the matching feature points between two images in the second color image set in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, wherein the second color image set is composed of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera;
  • and a target posture determination module, configured to optimize the posture to be optimized when the first camera acquires the current frame color image, based on the reprojection error of the matching feature points between each pair of images in the first color image set and the reprojection error of the matching feature points between each pair of images in the second color image set, so as to determine the target posture when the first camera acquires the current frame color image; wherein n is a positive integer.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the above-mentioned posture determination method is implemented.
  • an electronic device comprising a processor; and a memory for storing one or more programs, wherein when the one or more programs are executed by the processor, the processor implements the above-mentioned posture determination method.
  • FIG. 1 is a schematic diagram showing a system architecture of a posture determination system according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram showing a placement of dual cameras on a terminal device according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram showing the placement angles of the dual cameras according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram showing various processing stages involved in the posture determination solution of an embodiment of the present disclosure.
  • FIG. 5 schematically shows a flow chart of a method for determining a posture according to an exemplary embodiment of the present disclosure.
  • FIG. 6 schematically shows a flow chart of a method for determining a posture to be optimized according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram showing dual-camera point pair matching according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart showing a positioning initialization process according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram showing a method of determining two planes according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram showing a method of determining a ground plane according to an embodiment of the present disclosure.
  • FIG. 11 is a flowchart showing a process of determining a transformation matrix between a first camera coordinate system and a world coordinate system according to an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram showing a sliding window according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram showing a method of determining a reprojection error between two frames according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram showing the walking route of the robot dog during testing of the scheme of the present disclosure.
  • FIG. 15 schematically shows a block diagram of a posture determination apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 16 schematically shows a block diagram of a posture determination apparatus according to another exemplary embodiment of the present disclosure.
  • FIG. 17 schematically shows a block diagram of a posture determination apparatus according to another exemplary embodiment of the present disclosure.
  • FIG. 18 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • computer devices can autonomously perceive their own position in the environment, so as to perform any tasks proposed by the user, such as tracking, monitoring, interaction, displaying images, playing audio, etc.
  • the accuracy of positioning greatly affects the realization of computer device functions.
  • the embodiments of the present disclosure provide a new positioning solution.
  • FIG. 1 is a schematic diagram showing the system architecture of the posture determination system according to an embodiment of the present disclosure.
  • the terminal device 1 may include a processor 100, a first camera 110, and at least one second camera 120.
  • the terminal device 1 may include, for example, a robot, an intelligent monitoring device, an intelligent tracking device, etc. It may be a whole device, or a device system composed of multiple entity units.
  • the terminal device 1 may be a robot dog.
  • a robot dog is a robot form with advantages such as flexibility and strong mobility, and can perform tasks such as security patrol, transporting items, and emotional companionship.
  • the first camera 110 and the at least one second camera 120 serve as input sensors of the posture determination solution of the embodiment of the present disclosure, and can transmit the sensed color images and depth images to the processor 100.
  • the first camera 110 and the second camera 120 may be Realsense D455 cameras.
  • the Realsense D455 camera consists of an RGB camera, two IR (infrared) cameras, and an IR transmitter.
  • the RGB camera outputs a color image
  • the two IR cameras may output a dense depth map aligned with the color image.
  • the FOV (field of view) of the Realsense D455 camera is 90° horizontally and 65° vertically.
  • the terminal device 1 includes a first camera 110 and a second camera 120
  • the first camera 110 may be a left camera
  • the second camera 120 may be a right camera.
  • the left camera involved may be understood as the first camera 110
  • the right camera involved may be understood as the second camera 120.
  • “left”, “right”, “first”, and “second” are merely exemplary descriptions for distinction.
  • the first camera 110 may be a right camera
  • the second camera 120 may be a left camera, and the present disclosure does not limit this.
  • FIG. 2 shows a schematic diagram of the placement of the dual cameras on a terminal device according to an embodiment of the present disclosure. It should be understood that the placement shown in FIG. 2 is only an exemplary description; other placements are possible depending on the type of terminal device and the space available for the cameras, and the present disclosure does not limit this.
  • FIG. 3 shows a schematic diagram of the placement angles of the dual cameras of the embodiment of the present disclosure.
  • when the first camera 110 and the second camera 120 are both placed vertically, the viewing angle of each camera in the horizontal direction is 65°, corresponding to angles A and B in FIG. 3, respectively.
  • the leftmost line of sight of the first camera 110 can be parallel to the rightmost line of sight of the second camera 120.
  • in this case the two cameras obtain the maximum combined field of view, that is, 130°, corresponding to angle C in FIG. 3.
  • the first camera 110 and the second camera 120 are placed vertically side by side at an angle of 115°, so that the combined field of view of the two cameras is 130° in the horizontal direction and 90° in the vertical direction. This maximizes the combined field of view of the two cameras, effectively enlarges the field of view of the terminal device 1, and helps the subsequent positioning algorithm reach sufficient accuracy.
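  • the arithmetic behind these angles can be checked with a short sketch. The function names are hypothetical, and the reading of the 115° figure as the supplement of the angle between the two optical axes is an assumption that the source does not state explicitly.

```python
def combined_fov(native_h_fov, native_v_fov, rotated=True):
    """Combined (horizontal, vertical) field of view, in degrees, of two
    identical cameras whose adjacent boundary rays are parallel."""
    # Mounting a camera vertically (rotated by 90 degrees) swaps its
    # horizontal and vertical fields of view.
    if rotated:
        h, v = native_v_fov, native_h_fov
    else:
        h, v = native_h_fov, native_v_fov
    # With neither overlap nor gap, the horizontal fields simply add up.
    return 2 * h, v


def axis_and_mount_angle(per_camera_h_fov):
    """Angle between the optical axes and, as an assumed reading of the
    115 degree figure, the supplementary angle between the camera bodies."""
    axis_angle = per_camera_h_fov
    return axis_angle, 180 - axis_angle


print(combined_fov(90, 65))      # (130, 90)
print(axis_and_mount_angle(65))  # (65, 115)
```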
  • the first camera 110 and the second camera 120 support multi-camera hardware synchronization.
  • the first camera 110 and the second camera 120 can be connected by a wire, and the same pulse signal is used to trigger the two cameras to expose simultaneously, thereby realizing hardware synchronization of multiple cameras.
  • the image input into the subsequent positioning algorithm is the image taken at the same time. In this way, additional errors caused by inconsistent shooting time of multiple cameras are avoided.
  • the internal and external parameters of the two cameras can be calibrated respectively for use by subsequent algorithms.
  • the present disclosure does not limit the calibration process.
  • the current frame color image captured by the first camera 110 and the previous n frames of color images captured by the first camera 110 are recorded as a first color image set.
  • the processor 100 can obtain matching feature points between two images in the first color image set, and determine the reprojection error of the matching feature points between two images in the first color image set, where n is a positive integer.
  • the current frame color image captured by the second camera 120 and the previous n frames of color images captured by the second camera 120 are recorded as a second color image set.
  • the processor 100 can obtain matching feature points between two images in the second color image set, and determine the reprojection errors of the matching feature points between two images in the second color image set in combination with the transformation matrix between the first camera coordinate system of the first camera 110 and the second camera coordinate system of the second camera 120.
  • the processor 100 can optimize the pose to be optimized when the first camera captures the current frame color image based on the reprojection error of the matching feature points between each pair of images in the first color image set and the reprojection error of the matching feature points between each pair of images in the second color image set, so as to determine the target pose when the first camera captures the current frame color image.
  • the pose to be optimized when the first camera captures the current frame color image can be understood as a pre-determined rough pose.
  • the target pose obtained by optimizing the reprojection error is a fine pose corresponding to the rough pose.
  • the pose accuracy of the target pose is better than the pose accuracy of the pose to be optimized.
  • the reprojection error of the matching feature points between the two images in the second color image set is determined for each second camera 120.
  • the pose optimization is performed based on the reprojection errors corresponding to the first camera 110 and all the second cameras 120.
  • the placement positions of the first camera 110 and the second camera 120 on the terminal device 1 are fixed, and when the current posture of the first camera 110 is determined, the current posture of the second camera 120 and the current posture of the terminal device 1 can be obtained.
  • any one of the cameras may be designated as the first camera 110 in the algorithm implementation, and the remaining cameras may be designated as second cameras 120.
  • the determined posture can be further optimized to obtain a more accurate target posture.
  • the present disclosure scheme establishes error constraints for different cameras respectively, and then combines these error constraints to achieve posture optimization, which can improve the accuracy and robustness of positioning.
  • the establishment of the error constraints of the present disclosure relies on the matching feature points between the current frame and the previous n frames. In other words, the scheme takes into account the correlation between adjacent frames, further improving the accuracy and robustness of positioning.
  • the processing stages involved include but are not limited to the coordinate system alignment stage, the positioning initialization stage, the real-time positioning stage and the posture optimization stage.
  • the coordinate system alignment stage, the positioning initialization stage and the real-time positioning stage are configured to determine the posture to be optimized when the first camera captures the current frame color image
  • the posture optimization stage is configured to optimize the posture to be optimized to determine the target posture when the first camera captures the current frame color image, that is, the optimized posture.
  • the terminal device determines the transformation matrix between the first camera coordinate system and the world coordinate system.
  • the terminal device can construct a point cloud using the depth image output by the first camera and the depth image output by the second camera, wherein the three-dimensional space points corresponding to the two depth images can be merged to obtain a point cloud of three-dimensional feature points.
  • the terminal device uses a plane detection algorithm to extract plane information from the point cloud, and selects a specified plane (such as the ground plane) based on the extracted plane information.
  • the terminal device may calculate a transformation matrix according to the normal vector and the gravity vector of the specified plane to align the first camera coordinate system with the world coordinate system.
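  • as an illustration of how a rotation can be derived from a plane normal and a gravity-related direction, the following minimal sketch uses Rodrigues' formula to build the rotation that maps one direction onto another. The function names, the choice of a world z-axis pointing up (opposite to gravity), and the example normal are assumptions for illustration only; the source does not specify how the transformation matrix is computed.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_aligning(src, dst):
    """3x3 rotation R such that R applied to src points along dst."""
    a, b = normalize(src), normalize(dst)
    v = cross(a, b)
    c = sum(x * y for x, y in zip(a, b))  # cosine of the angle
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if sum(x * x for x in v) < 1e-12:
        if c > 0:
            return I  # already aligned
        # Opposite directions: rotate 180 degrees about an axis
        # perpendicular to a, i.e. R = 2 u u^T - I.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        u = normalize(cross(a, helper))
        return [[2 * u[i] * u[j] - I[i][j] for j in range(3)]
                for i in range(3)]
    # Rodrigues: R = I + K + K^2 / (1 + c), with K the skew matrix of v.
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = mat_mul(K, K)
    f = 1.0 / (1.0 + c)
    return [[I[i][j] + K[i][j] + f * K2[i][j] for j in range(3)]
            for i in range(3)]


# Hypothetical ground normal measured in the first camera coordinate system,
# mapped onto an assumed world "up" axis (opposite to gravity).
ground_normal_cam = [0.0, -1.0, 0.0]
world_up = [0.0, 0.0, 1.0]
R_world_cam = rotation_aligning(ground_normal_cam, world_up)
aligned = mat_vec(R_world_cam, ground_normal_cam)
```

  • a normal and a single reference direction fix the rotation only up to a rotation about that direction, so a complete alignment would also need a convention for heading and for the translation, which this sketch leaves out.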
  • the transformation matrix between the first camera coordinate system and the second camera coordinate system can be obtained.
  • the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained to achieve alignment among the first camera coordinate system, the second camera coordinate system, and the world coordinate system.
  • the terminal device can determine the position and posture of the first camera when initially capturing a color image. It should be understood that the position and posture of the camera when capturing an image in the present disclosure refers to the position and posture in the world coordinate system.
  • the terminal device can determine the three-dimensional feature points corresponding to the initial frame color image captured by the first camera, and the three-dimensional feature points are feature points in the first camera coordinate system.
  • the initial rotation matrix and the initial translation vector can be set.
  • the initial rotation matrix is the identity matrix
  • the initial translation vector is [0,0,0].
  • the positioning initialization in the first camera coordinate system is completed.
  • the positioning initialization result in the first camera coordinate system can be converted into the positioning initialization result in the world coordinate system, that is, the position and posture of the first camera when capturing the initial frame color image is determined.
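  • the initialization described above can be sketched as follows. Representing a pose as a rotation matrix plus a translation vector, converting to the world coordinate system by left-multiplying with the camera-to-world alignment transform, and the numeric values of that transform are all assumptions made for illustration.

```python
IDENTITY_ROTATION = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def compose(pose_a, pose_b):
    """Compose two rigid transforms given as (R, t): apply pose_b first,
    then pose_a."""
    Ra, ta = pose_a
    Rb, tb = pose_b
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [sum(Ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return R, t


# Initial pose in the first camera coordinate system: identity rotation,
# zero translation, as stated in the text.
initial_pose_cam = (IDENTITY_ROTATION, [0.0, 0.0, 0.0])

# Hypothetical result of the coordinate system alignment stage
# (first camera coordinate system -> world coordinate system).
T_world_cam = ([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0]],
               [0.0, 0.0, 0.4])

# Under these assumptions the initial pose in the world coordinate system
# is simply the alignment transform itself.
initial_pose_world = compose(T_world_cam, initial_pose_cam)
```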
  • the terminal device can obtain the pose of the current frame in real time in combination with the initial pose determined in the positioning initialization stage.
  • the features of the second camera can be transferred to the coordinate system of the first camera, and the pose can be solved in conjunction with the features of the first camera to complete the pose prediction of the current frame and obtain the pose to be optimized when the first camera captures the color image of the current frame.
  • the terminal device can optimize the posture to be optimized when the first camera captures the current frame color image to obtain the target posture.
  • the embodiments of the present disclosure provide a posture determination method for the processing in the posture optimization stage.
  • FIG. 5 schematically shows a flow chart of a method for determining a posture according to an exemplary embodiment of the present disclosure.
  • the method for determining a posture may include the following operations:
  • a set consisting of a current frame color image captured by a first camera and the previous n frames of color images captured by the first camera is recorded as a first color image set.
  • the previous n frames of color images are consecutive frames immediately preceding the current frame color image, wherein n is a positive integer, and the present disclosure does not limit the specific value of n.
  • the value of n can be determined based on the required accuracy, time consumption, and processing power of the device.
  • for example, n is set to 5 and the current frame color image is the 10th image frame
  • in this case, the previous n color image frames include the 9th image frame, the 8th image frame, the 7th image frame, the 6th image frame and the 5th image frame.
  • the terminal device can obtain matching feature points between two images in the first color image set.
  • the matching feature points between two images in the first color image set refer to matching feature points between all image pairs in the first color image set. It should be understood that the image pairs are not limited to adjacent frames, and any two color images in the first color image set constitute an image pair.
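  • the following minimal sketch enumerates the image pairs of such a set. The function names are hypothetical, and the frame numbering follows the example with n = 5 and a current frame numbered 10.

```python
from itertools import combinations


def sliding_window(current_index, n):
    """Frame indices of the current frame and the previous n frames,
    oldest first."""
    return list(range(current_index - n, current_index + 1))


def image_pairs(frame_indices):
    """All unordered pairs of frames, not only adjacent ones."""
    return list(combinations(frame_indices, 2))


window = sliding_window(10, 5)
pairs = image_pairs(window)
print(window)      # [5, 6, 7, 8, 9, 10]
print(len(pairs))  # 15
```

  • a set of n + 1 images yields (n + 1) * n / 2 image pairs, so the number of error terms grows quadratically with n, which is one reason the value of n trades accuracy against time consumption.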
  • the matching feature points between two images are 2D-2D (two-dimensional-two-dimensional) feature points.
  • the terminal device can use a feature point matching algorithm to determine the matching feature points between two images in the first color image set.
  • the present disclosure does not impose any restrictions on this.
  • the terminal device may determine the reprojection errors of the matching feature points between any two images in the first color image set.
  • the following takes the first color image and the second color image included in the first color image set as an example to illustrate the process of determining the reprojection error of the matching feature points between the two.
  • the terminal device can obtain feature points on the first color image that match the second color image, which are recorded as first matching feature points.
  • the terminal device can obtain depth information of the first matching feature points, which can be output by the first camera or can be sensed by other depth cameras equipped with the terminal device, and the present disclosure does not limit this.
  • the terminal device may determine the three-dimensional feature point of the first matching feature point in the world coordinate system by using the first matching feature point, the depth information of the first matching feature point, and the posture of the first camera when capturing the first color image.
  • the first matching feature point, the depth information of the first matching feature point, and the position and posture of the first camera when acquiring the first color image may be multiplied to obtain the three-dimensional feature point of the first matching feature point in the world coordinate system.
  • the feature points on the second color image that match the first color image are recorded as second matching feature points. Then, when the second matching feature points are obtained, the terminal device can use the second matching feature points, the three-dimensional feature points of the first matching feature points in the world coordinate system, and the position and posture of the first camera when capturing the second color image to determine the reprojection error of the matching feature points between the first color image and the second color image.
  • the three-dimensional feature point of the first matching feature point in the world coordinate system can be multiplied by the inverse of the posture of the first camera when capturing the second color image, the multiplication result can be normalized, and then the normalized result can be subtracted from the second matching feature point to construct a reprojection error of the matching feature point between the first color image and the second color image.
  • first matching feature point and the second matching feature point can both be normalized feature points.
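  • the computation described above can be sketched as follows. The sketch assumes that a pose is a camera-to-world rigid transform stored as a rotation matrix and a translation vector, that feature points are given in normalized image coordinates, and that the error is the observed point minus the projected point; these conventions and all names are illustrative and are not fixed by the source.

```python
def to_world(pose, p_cam):
    """Map a point from camera coordinates to world coordinates: R p + t."""
    R, t = pose
    return [sum(R[i][k] * p_cam[k] for k in range(3)) + t[i]
            for i in range(3)]


def to_camera(pose, p_world):
    """Apply the inverse pose: R^T (p - t)."""
    R, t = pose
    d = [p_world[i] - t[i] for i in range(3)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]


def reprojection_error(x1, depth1, pose1, x2, pose2):
    """2D reprojection error of one matched point pair.

    x1, x2: matched normalized feature points (x, y) on images 1 and 2.
    depth1: depth of x1; pose1, pose2: first-camera poses for the images.
    """
    # Back-project the first matching feature point using its depth.
    p_cam1 = [x1[0] * depth1, x1[1] * depth1, depth1]
    p_world = to_world(pose1, p_cam1)
    # Bring the 3D point into the second image's camera frame and normalize.
    p_cam2 = to_camera(pose2, p_world)
    projected = (p_cam2[0] / p_cam2[2], p_cam2[1] / p_cam2[2])
    return (x2[0] - projected[0], x2[1] - projected[1])


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_a = (I3, [0.0, 0.0, 0.0])
pose_b = (I3, [0.1, 0.0, 0.0])  # camera moved 0.1 along x between frames
error = reprojection_error((0.1, 0.2), 2.0, pose_a, (0.05, 0.2), pose_b)
```

  • with consistent poses and a correct match the error is close to zero; an inaccurate pose shifts the projected point and produces a non-zero error, which is what the optimization later reduces.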
  • a set consisting of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera is recorded as a second color image set, where n is the same as n in operation S52.
  • the terminal device can obtain matching feature points between two images in the second color image set.
  • the two images in the second color image set mentioned in the present disclosure are not limited to adjacent frames. Any two color images in the set constitute an image pair.
  • the present disclosure does not limit the method of determining matching feature points.
  • the terminal device can determine the reprojection error of the matching feature points between each pair of images in the second color image set by combining the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
  • the following takes the third color image and the fourth color image included in the second color image set as an example to illustrate the process of determining the reprojection error of the matching feature points between the two.
  • the terminal device can obtain feature points on the third color image that match the fourth color image, which are recorded as third matching feature points.
  • the terminal device can obtain depth information of the third matching feature points, which can be output by the second camera or can be sensed by other depth cameras equipped with the terminal device, and the present disclosure does not limit this.
  • the terminal device can use the third matching feature point, the depth information of the third matching feature point, the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the posture of the first camera when the second camera captures the third color image to determine the three-dimensional feature point of the third matching feature point in the world coordinate system.
  • the third matching feature point, the depth information of the third matching feature point, the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the posture of the first camera when the second camera captures the third color image can be multiplied to obtain the three-dimensional feature point of the third matching feature point in the world coordinate system.
  • the feature points on the fourth color image that match the third color image are recorded as fourth matching feature points.
  • the terminal device can use the fourth matching feature points, the three-dimensional feature points of the third matching feature points in the world coordinate system, the posture of the first camera when the second camera captures the fourth color image, and the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to determine the reprojection error of the matching feature points between the third color image and the fourth color image.
  • the three-dimensional feature point of the third matching feature point in the world coordinate system, the inverse of the posture of the first camera when the second camera captures the fourth color image, and the inverse of the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera can be multiplied, the multiplication result can be normalized, and then the normalized result can be subtracted from the fourth matching feature point to construct a reprojection error of the matching feature point between the third color image and the fourth color image.
  • the third matching feature point and the fourth matching feature point can both be normalized feature points.
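  • the second-camera case differs only in the extra transformation between the two camera coordinate systems. In the sketch below, the extrinsic transform is assumed to map second camera coordinates into first camera coordinates, poses are camera-to-world transforms of the first camera, and the error is the observed point minus the projected point; these conventions and all names are illustrative.

```python
def apply(transform, p):
    """Apply a rigid transform (R, t) to a 3D point: R p + t."""
    R, t = transform
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def apply_inverse(transform, p):
    """Apply the inverse of a rigid transform (R, t): R^T (p - t)."""
    R, t = transform
    d = [p[i] - t[i] for i in range(3)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]


def second_camera_reprojection_error(x3, depth3, pose_c1_i,
                                     x4, pose_c1_j, T_c1_c2):
    """Reprojection error of one matched pair seen by the second camera,
    expressed through the poses of the first camera.

    pose_c1_i, pose_c1_j: first-camera poses at the two capture times.
    T_c1_c2: assumed to map second camera coordinates to first camera
    coordinates.
    """
    # Back-project in the second camera frame, then go to the world frame
    # via the first camera frame.
    p_c2 = [x3[0] * depth3, x3[1] * depth3, depth3]
    p_world = apply(pose_c1_i, apply(T_c1_c2, p_c2))
    # Undo the first-camera pose, then the extrinsic, and normalize.
    q_c2 = apply_inverse(T_c1_c2, apply_inverse(pose_c1_j, p_world))
    projected = (q_c2[0] / q_c2[2], q_c2[1] / q_c2[2])
    return (x4[0] - projected[0], x4[1] - projected[1])


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
extrinsic = (I3, [0.05, 0.0, 0.0])   # hypothetical calibration result
pose_i = (I3, [0.0, 0.0, 0.0])
pose_j = (I3, [0.1, 0.0, 0.0])
err = second_camera_reprojection_error((0.1, 0.2), 2.0, pose_i,
                                       (0.05, 0.2), pose_j, extrinsic)
```

  • because only first-camera poses appear as unknowns, error terms from both cameras constrain the same set of variables, which is what allows them to be combined in a single optimization.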
  • the terminal device may accumulate the reprojection errors determined in operation S52 and operation S54 to obtain a total error function.
  • the total error function is a nonlinear function with the pose when the first camera captures each color image in the first color image set as a variable.
  • the pose when the first camera captures each color image in the first color image set includes the pose to be optimized when the first camera captures the current frame color image.
  • the terminal device can use iterative processing to minimize the total error function.
  • the target posture when the first camera captures the current frame color image can be determined, that is, the posture after the posture to be optimized is optimized.
  • the Jacobian matrix of each error in the total error function with respect to the optimization variable can be solved, and iterative optimization can be performed using a nonlinear optimization method to minimize the total error function, thereby ultimately determining the precise position and posture of the first camera when capturing the current frame color image.
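  • the iterative minimization can be illustrated with a deliberately simplified Gauss-Newton sketch. It optimizes only the translation of a single camera with the rotation fixed to identity, and it approximates the Jacobian by forward differences instead of deriving it analytically; a full implementation would optimize rotation and translation for every pose in the window. All names and data are hypothetical.

```python
def residuals(t, points_world, observations):
    """Stacked reprojection errors for a camera at translation t
    (identity rotation), observed point minus projected point."""
    r = []
    for (X, Y, Z), (u, v) in zip(points_world, observations):
        x, y, z = X - t[0], Y - t[1], Z - t[2]
        r += [u - x / z, v - y / z]
    return r


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (M[i][3] - s) / M[i][i]
    return x


def gauss_newton(t0, points_world, observations, iterations=10, eps=1e-6):
    t = list(t0)
    for _ in range(iterations):
        r = residuals(t, points_world, observations)
        # Columns of the Jacobian d r / d t, by forward differences.
        cols = []
        for j in range(3):
            tp = t[:]
            tp[j] += eps
            rp = residuals(tp, points_world, observations)
            cols.append([(a - b) / eps for a, b in zip(rp, r)])
        # Normal equations: (J^T J) delta = -J^T r.
        JTJ = [[sum(a * b for a, b in zip(cols[i], cols[j]))
                for j in range(3)] for i in range(3)]
        JTr = [sum(a * b for a, b in zip(cols[i], r)) for i in range(3)]
        delta = solve3(JTJ, [-g for g in JTr])
        t = [a + d for a, d in zip(t, delta)]
    return t


points = [(1.0, 0.5, 3.0), (-1.0, 0.5, 4.0),
          (0.5, -1.0, 2.5), (-0.5, -0.8, 3.5)]
true_t = [0.1, -0.05, 0.02]
obs = [((X - true_t[0]) / (Z - true_t[2]), (Y - true_t[1]) / (Z - true_t[2]))
       for X, Y, Z in points]
# Start from a rough pose (here the origin) and refine it.
estimate = gauss_newton([0.0, 0.0, 0.0], points, obs)
```

  • the rough starting value plays the role of the posture to be optimized, and the refined estimate plays the role of the target posture.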
  • the pose to be optimized when the first camera captures the current frame color image is involved.
  • the first color image may be the current frame color image.
  • the pose when the first camera captures the first color image is the pose to be optimized when the first camera captures the current frame color image.
  • the posture to be optimized may be a pre-determined posture, and the present disclosure further provides a method for determining the posture to be optimized. This process will be described below with reference to FIG. 6.
  • S602. Obtain a current frame color image acquired by a first camera, and determine a first two-dimensional feature point on the current frame color image acquired by the first camera that matches a previous frame color image acquired by the first camera.
  • the terminal device may extract feature points of the current color image captured by the first camera.
  • the feature extraction algorithm used in the exemplary embodiments of the present disclosure may include but is not limited to the FAST feature point detection algorithm, the DOG feature point detection algorithm, the Harris feature point detection algorithm, the SIFT feature point detection algorithm, the SURF feature point detection algorithm, etc.
  • the feature descriptor may include but is not limited to the BRIEF feature point descriptor, the BRISK feature point descriptor, the FREAK feature point descriptor, etc.
  • the combination of the feature extraction algorithm and the feature descriptor may be a FAST feature point detection algorithm and a BRIEF feature point descriptor. According to other embodiments of the present disclosure, the combination of the feature extraction algorithm and the feature descriptor may be a DOG feature point detection algorithm and a FREAK feature point descriptor.
  • for example, for scenes with rich texture, the FAST feature point detection algorithm and the BRIEF feature point descriptor can be used for feature extraction; for weak texture scenes, the DOG feature point detection algorithm and the FREAK feature point descriptor can be used for feature extraction.
  • the terminal device can use the feature points of the current color image frame captured by the first camera and the feature points of the previous color image frame captured by the first camera to determine the two-dimensional feature points that match between the two images, that is, the first two-dimensional feature points mentioned in the present disclosure.
  • the optical flow method can be used to determine the matching relationship of the feature points, that is, the feature points of the current frame color image captured by the first camera and the feature points of the previous frame color image captured by the first camera are used for optical flow tracking to determine the first two-dimensional feature points.
  • other image matching methods can also be used to determine 2D-2D feature point pairs, which is not limited in the present disclosure.
  • after acquiring the current frame color image captured by the second camera, the terminal device can extract feature points of the current frame color image captured by the second camera.
  • the feature point extraction method can be the same as the feature point extraction method in operation S602, which is not repeated here.
  • the terminal device may perform optical flow tracking using the feature points of the current frame color image captured by the second camera and the feature points of the previous frame color image captured by the second camera to determine the second two-dimensional feature points.
  • a camera coordinate system of the first camera is recorded as a first camera coordinate system
  • a camera coordinate system of the second camera is recorded as a second camera coordinate system
  • the first camera and the second camera are calibrated with internal and external parameters in advance, and the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera can be determined from the calibration results.
  • the terminal device can obtain the transformation matrix between the first camera coordinate system and the second camera coordinate system and the depth information of the second two-dimensional feature points, and determine the third two-dimensional feature points according to the transformation matrix, the depth information of the second two-dimensional feature points and the second two-dimensional feature points.
  • the third two-dimensional feature point is the two-dimensional feature point converted from the second two-dimensional feature point to the first camera coordinate system.
  • the transformation matrix, the depth information of the second two-dimensional feature points, and the second two-dimensional feature points can be multiplied, and the multiplication result can be normalized to determine the third two-dimensional feature points.
  • the second two-dimensional feature points in the multiplication operation refer to the position coordinate information of these feature points.
  • the third two-dimensional feature points can be determined using formula 1
  • T lr is the transformation matrix between the first camera coordinate system and the second camera coordinate system
  • d j is the depth value of the second two-dimensional feature point
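  • the conversion described above (multiply the transformation matrix, the depth value and the point coordinates, then normalize) can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: it assumes the second two-dimensional feature point is given on the normalized image plane of the second camera, and that T_lr is a 4x4 homogeneous matrix mapping second-camera coordinates to first-camera coordinates. The function names are hypothetical.

```python
def mat_vec(T, v):
    """Multiply a 4x4 matrix (list of rows) by a length-4 vector."""
    return [sum(T[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_first_camera(T_lr, depth, point):
    """Map a second-camera feature point onto the first camera's normalized plane.

    point is (x, y) on the second camera's normalized image plane (assumption).
    """
    x, y = point
    # Scale the homogeneous point by its depth: a 3D point in the second camera frame.
    p_second = [depth * x, depth * y, depth, 1.0]
    # Move the 3D point into the first camera coordinate system.
    X, Y, Z, _ = mat_vec(T_lr, p_second)
    # Normalize by the third component to get back a two-dimensional point.
    return (X / Z, Y / Z)


# Hypothetical extrinsics: pure translation of 0.1 m along x.
T_lr = [
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
third_point = to_first_camera(T_lr, 2.0, (0.0, 0.0))
```

  • with these example values, a point on the optical axis of the second camera at a depth of 2 m lands slightly off-axis in the first camera, which reflects the baseline between the two cameras.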
  • S608. Determine the posture to be optimized when the first camera captures the current frame color image according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image acquired by the second camera in the world coordinate system.
  • the first two-dimensional feature points and the third two-dimensional feature points constitute two-dimensional coordinate information
  • the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system constitute three-dimensional coordinate information
  • the terminal device can associate the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point pair information, use the point pair information to solve the Perspective-n-Point (PnP) problem, and determine the posture to be optimized when the first camera captures the current frame color image based on the solution result.
  • PnP is a method in the field of machine vision, which can determine the relative position of the camera based on n feature points in the scene. Specifically, the rotation matrix and translation vector of the camera can be determined based on the n feature points on the scene.
  • the process of determining the three-dimensional feature points of the previous frame color image in the world coordinate system in the present disclosure can be performed during the processing of the current frame or during the processing of the previous frame, and the present disclosure does not impose any limitation on this.
  • the terminal device may obtain the last frame of color image captured by the first camera, and extract feature points of the last frame of color image captured by the first camera.
  • the process of extracting feature points is the same as the process in operation S602, and will not be repeated here.
  • the terminal device can use the previous frame depth image aligned with the previous frame color image captured by the first camera to perform spatial projection on the feature points of the previous frame color image captured by the first camera to obtain the three-dimensional feature points of the previous frame color image captured by the first camera in the first camera coordinate system.
  • the previous frame depth image can be output by the first camera, or can be obtained by other depth cameras equipped by the terminal device, and the present disclosure does not limit this.
  • the spatial projection process can also be constrained.
  • the terminal device can use the previous frame of depth image aligned with the previous frame of color image captured by the first camera to perform spatial projection on the feature points within a predetermined depth range among the feature points of the previous frame of color image captured by the first camera, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the first camera in the first camera coordinate system.
  • the predetermined depth range is determined based on the range of the depth measurement.
  • the value of the predetermined depth range may vary depending on the type and model of the depth camera.
  • the present disclosure does not limit the specific value of the predetermined depth range. For example, feature points with a depth value greater than 0.5m and less than 6m are spatially projected.
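  • the depth-range constraint can be illustrated with a short sketch. The bounds 0.5 m and 6 m come from the example above; the function name and the data layout (parallel lists of pixel coordinates and depth values) are assumptions made for illustration.

```python
def filter_by_depth(points, depths, d_min=0.5, d_max=6.0):
    """Keep only feature points whose depth lies strictly inside (d_min, d_max)."""
    return [(p, d) for p, d in zip(points, depths) if d_min < d < d_max]


# Three hypothetical feature points (pixel coordinates) with their depth values in meters.
points = [(10, 20), (30, 40), (50, 60)]
depths = [0.2, 1.5, 7.0]

# Only the second point falls inside the reliable measurement range.
kept = filter_by_depth(points, depths)
```

  • only the retained points would then be spatially projected, so that unreliable depth readings near the limits of the sensor do not produce three-dimensional feature points.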
  • the terminal device can transform the three-dimensional feature points in the first camera coordinate system according to the posture when the first camera captured the last frame of color image, so as to obtain the three-dimensional feature points in the world coordinate system of the last frame of color image captured by the first camera.
  • T w_last is the position and posture of the first camera when capturing the last frame of color image.
  • the position and posture of the first camera when capturing the previous color image can be determined during the processing of the previous image, that is, during the processing of the current frame, the position and posture corresponding to the previous frame is known.
  • the initial position and posture are explained in the process of positioning initialization of the present disclosure.
  • the terminal device can obtain the last frame of color image captured by the second camera, and extract feature points of the last frame of color image captured by the second camera.
  • the process of extracting feature points is the same as the process in operation S602, which will not be repeated here.
  • the terminal device can use the previous frame depth image aligned with the previous frame color image captured by the second camera to perform spatial projection on the feature points of the previous frame color image captured by the second camera to obtain the three-dimensional feature points of the previous frame color image captured by the second camera in the second camera coordinate system.
  • the previous frame depth image can be output by the second camera, or can be obtained by other depth cameras equipped by the terminal device, and the present disclosure does not limit this.
  • the spatial projection process can also be constrained.
  • the terminal device can use the previous frame of depth image aligned with the previous frame of color image captured by the second camera to perform spatial projection on the feature points within a predetermined depth range among the feature points of the previous frame of color image captured by the second camera, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the second camera coordinate system.
  • the predetermined depth range is determined based on the range of the depth measurement.
  • the value of the predetermined depth range may vary depending on the type and model of the depth camera.
  • the present disclosure does not limit the specific value of the predetermined depth range. For example, feature points with a depth value greater than 0.5m and less than 6m are spatially projected.
  • the terminal device may use the transformation matrix between the first camera coordinate system and the second camera coordinate system to transform the three-dimensional feature points of the previous frame color image captured by the second camera in the second camera coordinate system into the three-dimensional feature points in the first camera coordinate system.
  • the terminal device can convert the three-dimensional feature points in the converted first camera coordinate system again according to the posture of the first camera when capturing the previous frame of color image, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.
  • T w_last is the position and posture when the first camera captured the last frame color image
  • T lr is the transformation matrix between the first camera coordinate system and the second camera coordinate system.
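  • the two-step conversion (second camera coordinate system to first camera coordinate system, then to the world coordinate system) can be sketched as a chain of homogeneous transforms. The sketch assumes T_lr maps second-camera coordinates to first-camera coordinates and T_w_last maps first-camera coordinates to world coordinates; the helper names and numeric values are hypothetical.

```python
def mat_mul(A, B):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(T[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


def translation(tx, ty, tz):
    """Build a pure-translation transform (used only for this example)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


T_lr = translation(0.1, 0.0, 0.0)       # second camera -> first camera (assumed)
T_w_last = translation(0.0, 0.0, 1.0)   # first camera -> world at the previous frame

# Compose once, then apply to every point from the second camera.
T_w_second = mat_mul(T_w_last, T_lr)
p_world = transform_point(T_w_second, (0.0, 0.0, 2.0))
```

  • composing the two matrices once is a design convenience: the same combined transform applies to all three-dimensional feature points of the previous frame captured by the second camera.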
  • FIG7 shows a schematic diagram of point pair matching between the first camera and the second camera to achieve PnP pose solution, which involves the matching relationship between 2D-2D feature points of the current frame and the matching relationship between 3D-2D feature points.
  • the position and posture of the first camera when capturing the previous color image is used.
  • the process of determining the initial position and posture of the first camera is described below.
  • the terminal device may obtain an initial frame color image captured by the first camera, and extract feature points of the initial frame color image captured by the first camera.
  • the process of extracting feature points is the same as the process in operation S602, and will not be repeated here.
  • the terminal device can use the initial frame depth image aligned with the initial frame color image captured by the first camera to spatially project the feature points of the initial frame color image captured by the first camera to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.
  • the spatial projection process can also be constrained.
  • the terminal device can use the feature points in the initial frame color image captured by the first camera that are within a predetermined depth range for spatial projection to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.
  • the predetermined depth range is determined based on the range of the depth measurement.
  • the value of the predetermined depth range may vary depending on the type and model of the depth camera.
  • the present disclosure does not limit the specific value of the predetermined depth range. For example, feature points with a depth value greater than 0.5m and less than 6m are spatially projected.
  • the terminal device may determine an initial positioning result of the first camera in the first camera coordinate system based on the three-dimensional feature points, the initial rotation matrix and the initial translation vector of the initial frame color image captured by the first camera in the first camera coordinate system.
  • the initial rotation matrix may be set to the identity matrix, and the translation vector may be set to [0, 0, 0].
  • the terminal device may transform the initial positioning result of the first camera in the first camera coordinate system using the transformation matrix between the first camera coordinate system and the world coordinate system, so as to determine the posture of the first camera when capturing the initial frame color image.
  • the process of determining the initial position and posture of the first camera may also be combined with feature data of the second camera, and this process is described below.
  • the terminal device can determine the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.
  • the terminal device can obtain the initial frame color image captured by the second camera, and extract feature points of the initial frame color image captured by the second camera.
  • the process of extracting feature points is the same as the process in operation S602, which will not be repeated here.
  • the terminal device can use the initial frame depth image aligned with the initial frame color image captured by the second camera to perform spatial projection on the feature points of the initial frame color image captured by the second camera to obtain the three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system.
  • the spatial projection process can also be constrained.
  • the terminal device can perform spatial projection using feature points within a predetermined depth range among feature points of the initial frame color image captured by the second camera to obtain three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system.
  • the predetermined depth range is determined based on the range of the depth measurement.
  • the value of the predetermined depth range may vary depending on the type and model of the depth camera.
  • the present disclosure does not limit the specific value of the predetermined depth range. For example, feature points with a depth value greater than 0.5m and less than 6m are spatially projected.
  • the terminal device can use the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to transform the three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system into the three-dimensional feature points in the first camera coordinate system.
  • the converted 3D feature points and the 3D feature points of the initial frame color image captured by the first camera in the first camera coordinate system can be combined to obtain combined 3D feature points. It can be understood that the combined 3D feature points are 3D feature points in the first camera coordinate system.
  • the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system according to the combined three-dimensional feature points, the initial rotation matrix and the initial translation vector.
  • the initial rotation matrix can be set to the identity matrix and the initial translation vector can be set to [0, 0, 0].
  • the terminal device can use the transformation matrix between the first camera coordinate system and the world coordinate system to transform the initial positioning result of the first camera in the first camera coordinate system to determine the posture of the first camera when capturing the initial frame color image.
  • the terminal device may acquire an initial frame color image acquired by a first camera, and extract feature points of the initial frame color image acquired by the first camera.
  • the terminal device may perform spatial projection in combination with the depth image aligned with the initial frame color image captured by the first camera to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.
  • the three-dimensional feature points determined in operation S804 may also include three-dimensional feature points corresponding to the initial frame color image captured by the second camera.
  • the terminal device may determine an initial positioning result of the first camera in the first camera coordinate system according to the three-dimensional feature points, the initial rotation matrix, and the initial translation vector determined in operation S804.
  • the terminal device may transform the initial positioning result using a transformation matrix between the first camera coordinate system and the world coordinate system to determine the position and posture of the first camera when capturing the initial frame color image, thereby completing the positioning initialization.
  • the transformation matrix between the first camera coordinate system and the world coordinate system is used.
  • the embodiment of the present disclosure provides a coordinate system alignment solution. Specifically, the coordinate system alignment is achieved in combination with the depth information.
  • in the description of the coordinate system alignment process, the depth image used is referred to as the reference depth image.
  • the terminal device can obtain a reference depth image output by the first camera.
  • the terminal device can determine the transformation matrix between the first camera coordinate system and the world coordinate system according to the normal vector and gravity vector of the designated plane.
  • the gravity vector can be Ng (0,0,1), in which case the designated plane is usually the ground plane to match the scenario where the terminal device is, for example, a robot dog.
  • the designated plane can also be a plane manually designated in a specific scenario, such as a wall, a desktop, etc., and the present disclosure does not limit this.
  • R wc is the transformation matrix between the first camera coordinate system and the world coordinate system
  • the rotation angle θ of R wc can be obtained by multiplying N g and n c , as shown in Formula 5:
  • the rotation axis ω and the rotation angle θ constitute the rotation vector between the first camera coordinate system and the world coordinate system.
  • the terminal device may calculate a transformation matrix R wc between the first camera coordinate system and the world coordinate system, thereby completing the coordinate system alignment process.
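  • one common way to turn a plane normal and a gravity vector into a rotation matrix is the axis-angle construction with the Rodrigues formula, sketched below. This is an illustration under assumptions, not the disclosure's exact formulas: the axis is taken as the cross product of the two unit vectors, the angle from their dot product, and the resulting matrix rotates the plane normal in the camera frame onto the gravity vector. All function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def rotation_aligning(n_c, n_g):
    """Return a 3x3 rotation R such that R applied to n_c points along n_g."""
    a, b = unit(n_c), unit(n_g)
    axis = cross(a, b)
    s, c = math.sqrt(dot(axis, axis)), dot(a, b)
    theta = math.atan2(s, c)            # rotation angle
    if s < 1e-12:
        # Parallel or anti-parallel vectors: pick any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = cross(a, helper)
    k = unit(axis)                      # rotation axis
    K = [[0.0, -k[2], k[1]],
         [k[2], 0.0, -k[0]],
         [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [[(1.0 if i == j else 0.0) + sin_t * K[i][j] + (1.0 - cos_t) * K2[i][j]
             for j in range(3)] for i in range(3)]


# Hypothetical ground normal seen by the camera, and the gravity vector (0, 0, 1).
n_c = (0.0, 1.0, 0.0)
N_g = (0.0, 0.0, 1.0)
R_wc = rotation_aligning(n_c, N_g)
rotated = tuple(sum(R_wc[i][j] * n_c[j] for j in range(3)) for i in range(3))
```

  • the special case for parallel and anti-parallel vectors is included because the cross product vanishes there and the axis would otherwise be undefined.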
  • the terminal device can return to the operation of obtaining the reference depth image, re-acquire the reference depth image, and perform a judgment process on whether the specified plane exists.
  • the terminal device can determine the point cloud corresponding to the first camera in combination with the reference depth image output by the first camera, which is recorded as the reference point cloud.
  • the terminal device determines the three-dimensional space point of each pixel on the reference depth image output by the first camera according to the pixel, the depth value of the pixel and the camera internal parameter of the first camera.
  • P represents the three-dimensional space point projected into the space
  • z represents the depth value of the pixel point
  • K⁻¹ represents the inverse of the camera intrinsic parameter matrix
  • p represents the coordinate position of the pixel point.
  • a reference point cloud corresponding to the first camera may be constructed from the three-dimensional space points obtained through this process.
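  • the back-projection of a depth image into a point cloud, using the symbols defined above (P, z, K⁻¹, p), can be sketched as follows. The sketch assumes a pinhole intrinsic matrix without skew, so that multiplying by K⁻¹ reduces to subtracting the principal point (cx, cy) and dividing by the focal lengths (fx, fy). These parameter names, and the rule that a zero depth means "no measurement", are assumptions.

```python
def back_project(u, v, z, fx, fy, cx, cy):
    """Compute P = z * K^-1 * p for pixel p = (u, v, 1) and depth z."""
    return (z * (u - cx) / fx, z * (v - cy) / fy, z)


def depth_to_cloud(depth, fx, fy, cx, cy):
    """Back-project every pixel with a valid (positive) depth value."""
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels without a depth measurement (assumption)
                cloud.append(back_project(u, v, z, fx, fy, cx, cy))
    return cloud


# A tiny 2x2 depth image in meters; zeros mark missing measurements.
depth = [[0.0, 1.0],
         [2.0, 0.0]]
cloud = depth_to_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

  • in practice the intrinsic parameters would come from the calibration of the first camera rather than the toy values used here.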
  • the terminal device determines, for each pixel point on the reference depth image output by the first camera, the three-dimensional spatial point of each pixel point on the reference depth image according to the pixel point, the depth value of the pixel point and the camera intrinsic parameters of the first camera.
  • the terminal device can obtain the reference depth image output by the second camera, and determine the three-dimensional space point of each pixel on the reference depth image output by the second camera in combination with the above formula 6.
  • the terminal device can transform the three-dimensional space point of each pixel on the reference depth image output by the second camera according to the transformation matrix between the first camera coordinate system and the second camera coordinate system to obtain the transformed three-dimensional space point.
  • PC_mixture is the determined reference point cloud
  • PC_right is the three-dimensional space point of each pixel on the reference depth image output by the second camera
  • PC_left is the three-dimensional space point of each pixel on the reference depth image output by the first camera
  • Tlr is the transformation matrix between the first camera coordinate system and the second camera coordinate system.
  • the construction of the reference point cloud incorporates information of the depth image output by the second camera, thereby making the spatial feature points more comprehensive and improving the accuracy of the algorithm.
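  • the construction of the combined reference point cloud from PC_left, PC_right and T lr can be sketched as below. It assumes T lr is a 4x4 homogeneous matrix mapping second-camera coordinates to first-camera coordinates and that a point cloud is a plain list of (x, y, z) tuples; the function names are hypothetical.

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(T[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


def merge_clouds(pc_left, pc_right, T_lr):
    """Express the second camera's points in the first camera frame and append them."""
    return list(pc_left) + [transform_point(T_lr, p) for p in pc_right]


# Hypothetical extrinsics: the second camera is offset by 0.1 m along x.
T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

PC_left = [(0.0, 0.0, 1.0)]
PC_right = [(0.0, 0.0, 2.0)]
PC_mixture = merge_clouds(PC_left, PC_right, T_lr)
```

  • after merging, every point of PC_mixture is expressed in the first camera coordinate system, so plane extraction can run on a single cloud.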
  • the terminal device can extract the plane information of the reference point cloud.
  • the present disclosure does not limit the plane extraction method, and can adopt the RANSAC fitting method, the normal vector region growing method, the hierarchical clustering method, etc., as long as the plane information in the scene can be extracted.
  • Some embodiments of the present disclosure adopt the plane extraction algorithm PEAC based on hierarchical clustering. Referring to Figure 9, two planes can be extracted using this algorithm. Figure 9 is only an example. All planes in the scene can be extracted using the above algorithm.
  • the extracted plane information includes but is not limited to the plane ID, the normal vector of the plane, the distance between the plane and the camera, etc.
  • the terminal device may filter the designated plane according to the plane information of the reference point cloud. Specifically, the terminal device may filter the designated plane according to the distance information of the plane from the first camera included in the plane information of the reference point cloud.
  • the terminal device may determine a candidate plane corresponding to the distance, and in this case, the number of the determined candidate planes is one or more.
  • the terminal device may determine the candidate plane as the designated plane.
  • the terminal device may determine the candidate plane whose distance from the first camera is closest to a distance threshold as the designated plane, wherein the distance threshold is within the above-mentioned predetermined distance range.
  • FIG. 10 is a schematic diagram showing the screening of the ground plane. Compared with the result of plane detection, planes such as the ceiling are eliminated through the above distance-based screening process.
  • the terminal device is a robot dog.
  • the terminal device is equipped with a first camera and a second camera.
  • the two cameras are arranged in fixed positions.
  • the robot dog is controlled to move for a short period of time and only moves on the ground plane.
  • the position of the ground plane in the coordinate system of the first camera is basically fixed.
  • the height of the ground plane from the camera is equivalent to the height of the robot dog, which is about 0.3 m. Therefore, the above predetermined distance range is set to 0.25 m to 0.35 m, and a plane whose distance from the first camera falls within this range is taken as the ground plane. If multiple candidate planes are screened out, the plane whose distance is closest to 0.3 m is used as the ground plane.
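  • the distance-based screening in this robot dog example can be sketched as follows. The range 0.25 m to 0.35 m and the target of 0.3 m come from the example above; the dictionary layout of the plane information (ID, normal vector, distance) and the function name are assumptions for illustration.

```python
def select_ground_plane(planes, d_min=0.25, d_max=0.35, target=0.3):
    """Return the candidate plane closest to the target distance, or None."""
    candidates = [p for p in planes if d_min <= p["distance"] <= d_max]
    if not candidates:
        return None  # caller should acquire a new reference depth image and retry
    return min(candidates, key=lambda p: abs(p["distance"] - target))


# Hypothetical plane extraction result: a ceiling and two near-ground planes.
planes = [
    {"id": 0, "normal": (0.0, 0.0, -1.0), "distance": 2.40},
    {"id": 1, "normal": (0.0, 0.0, 1.0), "distance": 0.33},
    {"id": 2, "normal": (0.0, 0.0, 1.0), "distance": 0.29},
]
ground = select_ground_plane(planes)
```

  • returning None when no plane qualifies mirrors the retry behaviour described in the text, where the device re-acquires the reference depth image until a ground plane is detected.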
  • the terminal device is controlled to continuously repeat the above-mentioned process of determining the plane using the depth image and plane screening until the terminal device detects the ground plane.
  • the terminal device obtains a reference depth image output by the first camera, and back-projects the reference depth image to obtain a three-dimensional space point in the space.
  • the terminal device obtains a reference depth image output by the second camera, and back-projects the reference depth image to obtain a three-dimensional space point in the space.
  • the terminal device converts the three-dimensional space point obtained in operation S1104 to a three-dimensional space point in the first camera coordinate system.
  • the terminal device combines the three-dimensional space point obtained in operation S1102 with the three-dimensional space point obtained in operation S1106 to obtain a reference point cloud corresponding to the first camera.
  • the terminal device may extract plane information based on the reference point cloud.
  • the terminal device may screen the extracted planes to determine a ground plane
  • the terminal device may determine a transformation matrix between the first camera coordinate system and the world coordinate system by using the normal vector of the ground plane and the gravity vector to complete alignment of the first camera coordinate system and the world coordinate system.
  • the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained to achieve alignment of the first camera coordinate system, the second camera coordinate system, and the world coordinate system. Therefore, the coordinate system alignment result can be applied to the above-mentioned posture determination process of the present disclosure.
  • the present disclosure converts the feature points collected by the second camera to the coordinate system of the first camera to calculate the posture together with the feature points collected by the first camera. Since the feature points come from at least two cameras and the coordinate system is unified, more feature points are collected, that is, the feature points involved in the unified processing are more comprehensive, and the determined posture is more accurate, which improves the accuracy of positioning.
  • the posture determination process of the present disclosure takes into account the correlation between frames, combines the feature information of the previous frame image, and uses the data of the previous frame for constraints, which further improves the accuracy of positioning.
  • the above process of determining the posture to be optimized is only an exemplary description.
  • the posture to be optimized can also be determined in combination with the inertial data sensed by the IMU.
  • the present disclosure does not impose any restrictions on this.
  • a sliding window can also be maintained to implement the above-mentioned posture determination method.
  • the terminal device can add the current frame color image group to the sliding window, so as to determine the target posture when the first camera captures the current frame color image in combination with the first color image set and the second color image set contained in the sliding window.
  • the current frame color image group includes the current frame color image captured by the first camera and the current frame color image captured by the second camera.
  • an array can be used to implement the sliding window.
  • the sliding window mentioned in the present disclosure may be a sliding window of fixed size, and the size of the sliding window is characterized by the maximum number of color image groups that can be contained. For example, if the sliding window is configured to contain a maximum of 10 color image groups, the size of the sliding window is 10.
  • the color image group that was first added to the sliding window is removed from the sliding window. For example, if the size of the sliding window is 10, when the color image group of the current frame is added to the sliding window, the first color image group in the sliding window is removed so that after the color image group of the current frame is added to the sliding window, the sliding window still includes 10 color image groups.
  • the following describes the posture determination method of the embodiment of the present disclosure by taking the sliding window size configured as 10 as an example.
  • when the current frame color image group is added to the sliding window, if the sliding window is full, the color image group that was added to the sliding window earliest is removed.
  • the sliding window removes the first color image group in the window.
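  • a fixed-size sliding window with this first-in, first-out behaviour can be sketched with a bounded double-ended queue. The text mentions that an array can be used; `collections.deque` with `maxlen` is used here only as a convenient stand-in, and the contents of each color image group are placeholder strings.

```python
from collections import deque

WINDOW_SIZE = 10  # maximum number of color image groups, as in the example above

# A deque with maxlen drops its oldest element automatically when full.
window = deque(maxlen=WINDOW_SIZE)

for frame_id in range(1, 13):
    group = {
        "frame": frame_id,
        "first": f"left_{frame_id}",    # placeholder for the first camera's color image
        "second": f"right_{frame_id}",  # placeholder for the second camera's color image
    }
    window.append(group)

# After 12 frames the window holds frames 3 to 12; the last one is the current frame.
frames = [g["frame"] for g in window]
```

  • the current frame color image group is always the last element, so its posture corresponds to the final entry among the variables being optimized.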
  • each color image group in the sliding window includes a color image captured by the first camera and a color image captured by the second camera.
  • the following describes the process of constructing error constraints by taking the 9th color image group and the 10th color image group in the sliding window as examples. For each group, the following processing can be performed.
  • the feature points in the color image captured by the first camera in the 9th color image group that match the color image captured by the first camera in the 10th color image group are recorded as
  • the feature points in the color image taken by the first camera in the 10th color image group that match the color image taken by the first camera in the 9th color image group are recorded as
  • the feature points in the color image taken by the second camera in the 9th color image group that match the color image taken by the second camera in the 10th color image group are recorded as Correspondingly, the feature points in the color image taken by the second camera in the 10th color image group that match the color image taken by the second camera in the 9th color image group are recorded as
  • d i is the depth value of the feature point, and T w9 is the posture of the first camera when the 9th color image group is collected. It can be understood that the posture is the posture of the first camera relative to the world coordinate system.
  • norm() represents normalization of the three-dimensional feature points
  • T w10 is the position and posture of the first camera when collecting the 10th color image group.
  • d j is the depth value of the feature point, and T lr is the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
  • the above processing is performed to obtain a plurality of reprojection errors for the first camera and a plurality of reprojection errors for the second camera.
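  • a reprojection error for the first camera can be sketched as follows. This is an illustration under assumptions, since the formulas are not reproduced here: feature points are given on the normalized image plane, each posture is a 4x4 camera-to-world matrix, and the error is the observed point minus the normalized projection. The function names are hypothetical. For the second camera, the transformation matrix T lr would additionally be applied.

```python
def invert_pose(T):
    """Invert a rigid 4x4 transform: rotation R^T, translation -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(T[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


def reprojection_error(p_i, d_i, T_w_a, T_w_b, p_j):
    """Error between p_j observed in frame b and p_i from frame a reprojected into b."""
    P_cam_a = (d_i * p_i[0], d_i * p_i[1], d_i)          # back-project with depth
    P_world = transform_point(T_w_a, P_cam_a)            # frame a -> world
    X, Y, Z = transform_point(invert_pose(T_w_b), P_world)  # world -> frame b
    return (p_j[0] - X / Z, p_j[1] - Y / Z)              # normalize and compare


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# Hypothetical motion: the camera moved 0.2 m along x between the two frames.
T_w9 = identity
T_w10 = [[1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

error = reprojection_error((0.0, 0.0), 2.0, T_w9, T_w10, (-0.1, 0.0))
```

  • with consistent postures and a perfect match the error vanishes, which is why minimizing the sum of such errors pulls the postures toward consistency with the observations.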
  • the variables to be optimized are the 10 postures T w1 ...T w10 of the first camera when capturing the color images in the sliding window.
  • as for the depth value of a feature point, since it is a known quantity read from the depth map and its error is very small, it is not used as a variable to be optimized in this process.
  • the goal of optimization is to minimize the total error function e total .
  • the optimization of the postures T w1 ...T w10 ends.
  • the optimized T w10 is the target posture when the first camera captures the current frame color image as described in the present disclosure.
  • the nonlinear function can be linearized, as shown in Formula 13: f(x+Δx) ≈ f(x) + JΔx (Formula 13)
  • J is the derivative of f(x) with respect to Δx.
  • Equation 14
  • Δx can be obtained and the optimized variables can be updated.
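  • the linearize-solve-update loop can be illustrated on a toy problem. The sketch below uses the Gauss-Newton update obtained from the normal equations JᵀJ Δx = −Jᵀ f(x); that this is the form of the update used in the disclosure is an assumption, and the scalar curve-fitting problem stands in for the posture variables only to keep the example short.

```python
import math


def gauss_newton(ts, ys, x0, iterations=50):
    """Fit x in the model y = exp(x * t) by repeated linearization."""
    x = x0
    for _ in range(iterations):
        # Residual vector f(x) and its Jacobian J with respect to x.
        residuals = [math.exp(x * t) - y for t, y in zip(ts, ys)]
        jacobian = [t * math.exp(x * t) for t in ts]
        # Normal equations (scalar case): (J^T J) dx = -J^T f
        JtJ = sum(j * j for j in jacobian)
        Jtf = sum(j * r for j, r in zip(jacobian, residuals))
        dx = -Jtf / JtJ
        x += dx  # update the optimized variable
        if abs(dx) < 1e-12:
            break
    return x


# Synthetic, noise-free data generated with the true value x = 0.5.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * t) for t in ts]
x_opt = gauss_newton(ts, ys, x0=0.0)
```

  • in the posture optimization the same loop would run over all reprojection errors in the sliding window, with Δx stacking the increments of the postures being optimized.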
  • the present disclosure also provides a testing method and gives the test results.
  • this exemplary embodiment also provides a posture determination device, which is configured in a terminal device, and the terminal device is also configured with a first camera and at least one second camera.
  • Fig. 15 schematically shows a block diagram of a posture determination apparatus according to an exemplary embodiment of the present disclosure.
  • the posture determination apparatus 15 may include a first error determination module 151, a second error determination module 153 and a target posture determination module 155.
  • the first error determination module 151 can be used to obtain matching feature points between two images in a first color image set, and determine the reprojection errors of the matching feature points between two images in the first color image set, where the first color image set consists of a current frame color image captured by a first camera and the previous n frames of color images captured by the first camera;
  • the second error determination module 153 can be used to obtain matching feature points between two images in a second color image set, and determine the reprojection errors of the matching feature points between two images in the second color image set in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, where the second color image set consists of the current frame color image captured by the second camera and the previous n frames of color images captured by the second camera;
  • the target posture determination module 155 can be used to optimize the posture to be optimized when the first camera captures the current frame color image, based on the reprojection errors of the matching feature points between two images in the first color image set and the reprojection errors of the matching feature points between two images in the second color image set, so as to determine the target posture when the first camera captures the current frame color image.
  • the first color image set includes a first color image and a second color image.
  • the process of determining the reprojection error of the matching feature points between the first color image and the second color image by the first error determination module 151 can be configured to perform: obtaining the first matching feature point on the first color image that matches the second color image; determining the three-dimensional feature point of the first matching feature point in the world coordinate system using the first matching feature point, the depth information of the first matching feature point, and the position and posture of the first camera when capturing the first color image; obtaining the second matching feature point on the second color image that matches the first color image; determining the reprojection error of the matching feature point between the first color image and the second color image using the second matching feature point, the three-dimensional feature point of the first matching feature point in the world coordinate system, and the position and posture of the first camera when capturing the second color image.
  • the first color image is the current frame color image
  • the posture when the first camera acquires the first color image is the posture to be optimized when the first camera acquires the current frame color image.
  • the posture determination device 16 may also include a module 161 for estimating the posture to be optimized (the "pose estimation module 161 to be optimized").
  • the pose estimation module 161 to be optimized can be configured to execute: obtaining the current frame color image captured by the first camera, and determining the first two-dimensional feature points on the current frame color image captured by the first camera that match the previous frame color image captured by the first camera; obtaining the current frame color image captured by the second camera, and determining the second two-dimensional feature points on the current frame color image captured by the second camera that match the previous frame color image captured by the second camera; using the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system; determining the pose to be optimized when the first camera captures the current frame color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system.
  • the pose estimation module 161 to be optimized can also be configured to execute: obtaining the previous frame of color image captured by the first camera, and extracting the feature points of the previous frame of color image captured by the first camera; using the previous frame of depth image aligned with the previous frame of color image captured by the first camera, spatially projecting the feature points of the previous frame of color image captured by the first camera to obtain the three-dimensional feature points of the previous frame of color image captured by the first camera in the first camera coordinate system; according to the pose of the first camera when capturing the previous frame of color image, converting the three-dimensional feature points in the first camera coordinate system to obtain the three-dimensional feature points of the previous frame of color image captured by the first camera in the world coordinate system.
  • the second color image set includes a third color image and a fourth color image.
  • the process of determining the reprojection error of the matching feature points between the third color image and the fourth color image by the second error determination module 153 in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera can be configured to perform: obtaining a third matching feature point on the third color image that matches the fourth color image; determining the three-dimensional feature point of the third matching feature point in the world coordinate system by using the third matching feature point, the depth information of the third matching feature point, the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the pose of the first camera when the second camera captures the third color image; obtaining a fourth matching feature point on the fourth color image that matches the third color image; determining the reprojection error of the matching feature point between the third color image and the fourth color image by using the fourth matching feature point, the three-dimensional feature point of the third matching feature point in the world coordinate system, the pose of the first camera when the second camera captures the fourth color image, and the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
  • the target pose determination module 155 may be configured to perform: The reprojection errors of the matching feature points between the two images in the first color image set and the reprojection errors of the matching feature points between the two images in the second color image set are accumulated to determine a total error function, where the total error function is a nonlinear function with the posture when the first camera acquires each color image in the first color image set as a variable, and the posture when the first camera acquires each color image in the first color image set includes the posture to be optimized when the first camera acquires the current frame color image; the total error function is minimized by iterative processing to determine the target posture when the first camera acquires the current frame color image.
  • the posture determination device 17 may further include a sliding window operation module 171 .
  • the sliding window operation module 171 can be configured to perform: when determining the posture to be optimized when the first camera captures the current frame color image, adding the current frame color image group to the sliding window, so as to determine the target posture when the first camera captures the current frame color image in combination with the first color image set and the second color image set contained in the sliding window; wherein the current frame color image group includes the current frame color image captured by the first camera and the current frame color image captured by the second camera.
  • the sliding window operation module 171 can also be configured to perform: when adding the current frame color image group into the sliding window, if the number of color image groups contained in the sliding window is equal to the maximum value of the color image groups that the sliding window can contain, then the color image group that was first added to the sliding window is removed from the sliding window.
  • FIG. 18 shows a schematic diagram of an electronic device suitable for implementing an exemplary embodiment of the present disclosure.
  • the terminal device of the exemplary embodiment of the present disclosure may be configured as shown in FIG. 18. It should be noted that the electronic device shown in FIG. 18 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device of the present disclosure includes at least a processor and a memory, and the memory is used to store one or more programs.
  • the processor can implement the posture determination method of the exemplary embodiment of the present disclosure.
  • the electronic device 180 at least includes: a processor 1810, an internal memory 1821, an external memory interface 1822, a Universal Serial Bus (USB) interface 1830, a charging management module 1840, a power management module 1841, a battery 1842, an antenna, a wireless communication module 1850, an audio module 1860, a display screen 1870, a sensor module 1880, a camera module 1890, etc.
  • the sensor module 1880 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
  • the structure illustrated in the embodiment of the present disclosure does not constitute a specific limitation on the electronic device 180.
  • the electronic device 180 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 1810 may include one or more processing units, for example, the processor 1810 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and/or a neural-network processing unit (NPU), etc. Among them, different processing units may be independent devices or integrated in one or more processors. In addition, a memory may be provided in the processor 1810 for storing instructions and data.
  • the electronic device 180 can implement the shooting function through the ISP, the camera module 1890, the video codec, the GPU, the display screen 1870 and the application processor.
  • the electronic device 180 may include at least two camera modules 1890.
  • one camera module is determined as the reference camera, and the feature data collected by the other camera modules is transferred to the coordinate system of the reference camera for processing.
  • the electronic device 180 is configured with two Realsense D455 cameras.
  • the internal memory 1821 can be used to store computer executable program codes, which include instructions.
  • the internal memory 1821 can include a program storage area and a data storage area.
  • the external memory interface 1822 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 180.
  • the present disclosure also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may exist independently without being assembled into the electronic device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • Computer-readable storage media can send, propagate or transmit programs for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer-readable storage medium can be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer-readable storage medium carries one or more programs.
  • the electronic device implements the method described in the embodiments of the present disclosure.
  • each box in the flow chart or block diagram can represent a module, a program segment, or a part of a code, and the above-mentioned module, program segment, or a part of a code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram or flow chart, and the combination of the boxes in the block diagram or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware, and the units described may also be arranged in a processor.
  • the names of these units do not constitute limitations on the units themselves in some cases.
  • the technical solution according to the implementation of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, including several instructions to enable a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the implementation of the present disclosure.
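The sliding-window bookkeeping described above for module 171 (add the current color image group; if the window is already full, drop the group that was added first) can be sketched as follows. This is a minimal illustration rather than the disclosure's implementation: the class name `SlidingWindow`, the tuple layout of a group, and the default capacity of 10 (taken from the T_w1 ... T_w10 example) are assumptions.

```python
from collections import deque


class SlidingWindow:
    """Fixed-capacity window of color image groups (hypothetical helper)."""

    def __init__(self, max_groups=10):
        # deque(maxlen=...) discards the oldest entry automatically on overflow.
        self.groups = deque(maxlen=max_groups)

    def add(self, frame_id, first_image, second_image):
        """Add the current group; return the evicted oldest group, if any."""
        full = len(self.groups) == self.groups.maxlen
        evicted = self.groups[0] if full else None
        self.groups.append((frame_id, first_image, second_image))
        return evicted

    def frame_ids(self):
        return [group[0] for group in self.groups]


window = SlidingWindow(max_groups=3)
for frame_id in range(1, 5):
    window.add(frame_id, f"left_{frame_id}", f"right_{frame_id}")
print(window.frame_ids())  # [2, 3, 4]
```

Using a bounded `deque` keeps eviction O(1) and guarantees that the number of poses being optimized never grows beyond the window size.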

Abstract

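The present disclosure provides a pose determination method, a pose determination apparatus, a computer-readable storage medium, and an electronic device, and relates to the field of computer vision technology. The pose determination method includes: determining reprojection errors of matching feature points between every two images in a first color image set, the first color image set consisting of a current-frame color image captured by a first camera and the previous n frames of color images captured by the first camera; determining, in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of a second camera, reprojection errors of matching feature points between every two images in a second color image set, the second color image set consisting of a current-frame color image captured by the second camera and the previous n frames of color images captured by the second camera; and determining, based on the determined reprojection errors, a target pose of the first camera when capturing the current-frame color image. The present disclosure can improve the accuracy and robustness of positioning.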

Description

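Pose determination method and apparatus, computer-readable storage medium, and electronic device
This application claims priority to Chinese patent application No. 202211336046.1, entitled "Pose determination method and apparatus, computer-readable storage medium, and electronic device", filed with the China Patent Office on October 28, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to a pose determination method, a pose determination apparatus, a computer-readable storage medium, and an electronic device.
Background
In the field of computer vision, visual positioning is a technology that uses images captured by a camera to determine the camera's pose in the real world. It has important application value in augmented reality, virtual reality, robotics, intelligent transportation, and other fields.
In scenarios where multiple cameras perform visual positioning, positioning accuracy may be poor.
Summary
The present disclosure provides a pose determination method, a pose determination apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem of poor visual positioning accuracy.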
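According to a first aspect of the present disclosure, a pose determination method is provided, applied to a terminal device configured with a first camera and at least one second camera. The pose determination method includes: obtaining matching feature points between every two images in a first color image set, and determining reprojection errors of the matching feature points between every two images in the first color image set, the first color image set consisting of a current-frame color image captured by the first camera and the previous n frames of color images captured by the first camera; obtaining matching feature points between every two images in a second color image set, and determining, in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, reprojection errors of the matching feature points between every two images in the second color image set, the second color image set consisting of a current-frame color image captured by the second camera and the previous n frames of color images captured by the second camera; and optimizing, based on the reprojection errors of the matching feature points between every two images in the first color image set and the reprojection errors of the matching feature points between every two images in the second color image set, a pose to be optimized of the first camera when capturing the current-frame color image, so as to determine a target pose of the first camera when capturing the current-frame color image; where n is a positive integer.
According to a second aspect of the present disclosure, a pose determination apparatus is provided, configured in a terminal device that is also configured with a first camera and at least one second camera. The pose determination apparatus includes: a first error determination module, configured to obtain matching feature points between every two images in a first color image set, and determine reprojection errors of the matching feature points between every two images in the first color image set, the first color image set consisting of a current-frame color image captured by the first camera and the previous n frames of color images captured by the first camera; a second error determination module, configured to obtain matching feature points between every two images in a second color image set, and determine, in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, reprojection errors of the matching feature points between every two images in the second color image set, the second color image set consisting of a current-frame color image captured by the second camera and the previous n frames of color images captured by the second camera; and a target pose determination module, configured to optimize, based on the reprojection errors of the matching feature points between every two images in the first color image set and the reprojection errors of the matching feature points between every two images in the second color image set, a pose to be optimized of the first camera when capturing the current-frame color image, so as to determine a target pose of the first camera when capturing the current-frame color image; where n is a positive integer.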
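According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the pose determination method described above.
According to a fourth aspect of the present disclosure, an electronic device is provided, including a processor and a memory for storing one or more programs; when the one or more programs are executed by the processor, the processor implements the pose determination method described above.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.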
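Brief Description of the Drawings
The accompanying drawings are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings described below are only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of the system architecture of a pose determination system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of how the dual cameras are placed on the terminal device according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the placement angle of the dual cameras according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the processing stages involved in the pose determination scheme according to an embodiment of the present disclosure;
FIG. 5 schematically shows a flowchart of a pose determination method according to an exemplary embodiment of the present disclosure;
FIG. 6 schematically shows a flowchart of a method for determining the pose to be optimized according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of dual-camera point-pair matching according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of the positioning initialization process according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of two determined planes according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of determining the ground plane according to an embodiment of the present disclosure;
FIG. 11 is a flowchart of the process of determining the transformation matrix between the first camera coordinate system and the world coordinate system according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of the sliding window according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of determining the reprojection error between two frames according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of the walking route of the robot dog when testing the scheme of the present disclosure;
FIG. 15 schematically shows a block diagram of a pose determination apparatus according to an exemplary embodiment of the present disclosure;
FIG. 16 schematically shows a block diagram of a pose determination apparatus according to another exemplary embodiment of the present disclosure;
FIG. 17 schematically shows a block diagram of a pose determination apparatus according to yet another exemplary embodiment of the present disclosure;
FIG. 18 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.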
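Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will recognize that the technical solutions of the present disclosure may be practiced while omitting one or more of these specific details, or other methods, components, devices, operations, and so on may be adopted. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are only illustrative and do not necessarily include all operations. For example, some operations can be decomposed and others can be merged or partially merged, so the actual execution order may change according to the actual situation. In addition, all of the terms "first", "second", "third", "fourth", and so on below are used only for distinction and should not be taken as limiting the present disclosure.
Visual positioning technology enables a computer device to autonomously perceive its own pose in the environment, so that it can perform any user-requested task such as tracking, monitoring, interaction, displaying images, or playing audio. The accuracy of positioning greatly affects how well the device's functions can be realized.
To improve the accuracy of a device's visual positioning, embodiments of the present disclosure provide a new positioning scheme.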
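FIG. 1 is a schematic diagram of the system architecture of a pose determination system according to an embodiment of the present disclosure. Referring to FIG. 1, the terminal device 1 may include a processor 100, a first camera 110, and at least one second camera 120.
The terminal device 1 may include, for example, a robot, an intelligent monitoring device, or an intelligent tracking device. It may be a single integrated device or a device system composed of multiple physical units.
For example, the terminal device 1 may be a robot dog. A robot dog is a form of robot that is agile and highly mobile, and can perform tasks such as security patrol, item delivery, and emotional companionship.
The first camera 110 and the at least one second camera 120 serve as the input sensors of the pose determination scheme of the embodiments of the present disclosure, and can transmit the sensed color images and depth images to the processor 100.
For example, the first camera 110 and the second camera 120 may be Realsense D455 cameras. A Realsense D455 camera consists of one RGB camera, two IR (infrared) cameras, and one IR emitter. The RGB camera outputs color images, and the two IR cameras can output a dense depth map aligned with the color image. The FOV (field of view) of the Realsense D455 camera is 90° horizontally and 65° vertically.
Where the terminal device 1 includes the first camera 110 and one second camera 120, the first camera 110 may be the left camera and the second camera 120 may be the right camera. In the following embodiments, the left camera can be understood as the first camera 110 and the right camera as the second camera 120. However, it should be understood that "left", "right", "first", and "second" are only exemplary descriptions for distinction; in other embodiments of the present disclosure, the first camera 110 may be the right camera and the second camera 120 may be the left camera, and the present disclosure does not limit this.
Taking two cameras, the first camera 110 and one second camera 120, as an example, FIG. 2 is a schematic diagram of how the dual cameras are placed on the terminal device according to an embodiment of the present disclosure. It should be understood that the placement shown in FIG. 2 is only an exemplary illustration; depending on the type of terminal device and the space available for cameras, many other placements are possible, and the present disclosure does not limit this.
FIG. 3 is a schematic diagram of the placement angle of the dual cameras according to an embodiment of the present disclosure. The first camera 110 and the second camera 120 are both mounted vertically, so each has a viewing angle of 65°, corresponding to angle A and angle B in FIG. 3. When placing them, the leftmost line of sight of the first camera 110 can be made parallel to the rightmost line of sight of the second camera 120, in which case the two cameras obtain the maximum field of view, namely 130°, corresponding to angle C in FIG. 3. There is a narrow co-visible region between the first camera 110 and the second camera 120. With this angular design, the angle at which the first camera 110 and the second camera 120 are placed relative to each other can be determined to be 115°, corresponding to angle D in FIG. 3.
Thus, with the first camera 110 and the second camera 120 placed vertically side by side at an angle of 115°, the combined field of view of the two cameras is 130° horizontally and 90° vertically. This maximizes the combined field of view of the two cameras, effectively enlarges the field of view of the terminal device 1, and provides more sufficient information for the subsequent positioning algorithm.
In addition, the first camera 110 and the second camera 120 support multi-camera hardware synchronization: the two cameras can be connected by a wire, and the same pulse signal triggers both cameras to expose simultaneously, achieving hardware synchronization of multiple cameras. After hardware synchronization is set up, the images fed into the subsequent positioning algorithm are images captured at the same instant. This avoids the additional error caused by inconsistent capture times across cameras.
After the first camera and the second camera are placed in the above manner, the intrinsic and extrinsic parameters of the two cameras can be calibrated for use by subsequent algorithms. The present disclosure does not limit the calibration process.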
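In the pose determination scheme of the embodiments of the present disclosure, the current-frame color image captured by the first camera 110 and the previous n frames of color images captured by the first camera 110 are denoted as the first color image set. The processor 100 can obtain matching feature points between every two images in the first color image set and determine the reprojection errors of those matching feature points, where n is a positive integer.
The current-frame color image captured by the second camera 120 and the previous n frames of color images captured by the second camera 120 are denoted as the second color image set. The processor 100 can obtain matching feature points between every two images in the second color image set and, in combination with the transformation matrix between the first camera coordinate system of the first camera 110 and the second camera coordinate system of the second camera 120, determine the reprojection errors of those matching feature points.
Next, based on the reprojection errors of the matching feature points between every two images in the first color image set and those in the second color image set, the processor 100 can optimize the pose to be optimized of the first camera when capturing the current-frame color image, so as to determine the target pose of the first camera when capturing the current-frame color image.
Here, the pose to be optimized of the first camera when capturing the current-frame color image can be understood as a coarse pose determined in advance, and the target pose obtained by optimizing with the reprojection errors is the refined pose corresponding to that coarse pose. The target pose is more accurate than the pose to be optimized.
Where the terminal device 1 is configured with multiple second cameras 120, the reprojection errors of the matching feature points between every two images in the second color image set are determined for each second camera 120. When determining the target pose, the pose is optimized based on the reprojection errors corresponding to the first camera 110 and all of the second cameras 120.
It can be understood that the first camera 110 and the second camera 120 are mounted at fixed positions on the terminal device 1, so once the current pose of the first camera 110 is determined, the current pose of the second camera 120 and the current pose of the terminal device 1 can be obtained.
In addition, where the terminal device 1 is configured with more than two cameras, any one of them can be designated as the first camera 110 for the purposes of the algorithm, with the remaining cameras designated as second cameras 120.
Based on the pose determination scheme of the embodiments of the present disclosure, an already determined pose can be further optimized to obtain a more accurate target pose. The scheme establishes error constraints separately for the different cameras and then combines these constraints to optimize the pose, which can improve the accuracy and robustness of positioning. Moreover, the error constraints rely on matching feature points between every two frames among the current frame and the previous n frames; that is, the scheme takes into account the correlation between neighboring frames, further improving the accuracy and robustness of positioning.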
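Implementing the pose determination process of the embodiments of the present disclosure involves multiple processing stages. Referring to FIG. 4, the stages include, but are not limited to, a coordinate system alignment stage, a positioning initialization stage, a real-time positioning stage, and a pose optimization stage. The coordinate system alignment, positioning initialization, and real-time positioning stages are configured to determine the pose to be optimized of the first camera when capturing the current-frame color image; the pose optimization stage is configured to optimize that pose to determine the target pose of the first camera when capturing the current-frame color image, that is, the optimized pose.
In the coordinate system alignment stage, the terminal device determines the transformation matrix between the first camera coordinate system and the world coordinate system.
First, the terminal device can use the depth image output by the first camera and the depth image output by the second camera to build a point cloud. The three-dimensional space points corresponding to the two depth images can be merged to obtain a point cloud of three-dimensional feature points.
Next, the terminal device uses a plane detection algorithm to extract plane information from the point cloud, and selects a designated plane (such as the ground plane) according to the extracted plane information.
Then, the terminal device can compute the transformation matrix from the normal vector of the designated plane and the gravity vector, so as to align the first camera coordinate system with the world coordinate system.
In addition, it can be understood that the transformation matrix between the first camera coordinate system and the second camera coordinate system is known from the previously obtained intrinsic and extrinsic calibration results. In this case, the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained, aligning the first camera coordinate system, the second camera coordinate system, and the world coordinate system with one another.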
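In the positioning initialization stage, the terminal device can determine the pose of the first camera when it initially captures a color image. It should be understood that the pose of a camera when capturing an image, as referred to in the present disclosure, is the pose in the world coordinate system.
On the one hand, the terminal device can determine the three-dimensional feature points corresponding to the initial-frame color image captured by the first camera; these are feature points in the first camera coordinate system.
On the other hand, an initial rotation matrix and an initial translation vector can be set. For example, the initial rotation matrix is the identity matrix and the initial translation vector is [0,0,0].
Once the three-dimensional feature points of the initial-frame color image and the initial rotation matrix and translation vector are determined, positioning initialization in the first camera coordinate system is complete.
Next, using the transformation matrix between the first camera coordinate system and the world coordinate system determined in the coordinate system alignment stage, the positioning initialization result in the first camera coordinate system can be converted into a result in the world coordinate system; that is, the pose of the first camera when capturing the initial-frame color image is determined.
In the real-time positioning stage, the terminal device can obtain the pose of the current frame in real time, starting from the initial pose determined in the positioning initialization stage. In this process, the features of the second camera can be transferred into the first camera coordinate system and combined with the features of the first camera to solve for the pose, completing the pose prediction for the current frame and yielding the pose to be optimized of the first camera when capturing the current-frame color image.
In the pose optimization stage, the terminal device can optimize the pose to be optimized of the first camera when capturing the current-frame color image, to obtain the target pose.
For the processing of the pose optimization stage, embodiments of the present disclosure provide a pose determination method.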
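FIG. 5 schematically shows a flowchart of a pose determination method according to an exemplary embodiment of the present disclosure. Referring to FIG. 5, the pose determination method may include the following operations:
S52. Obtain matching feature points between every two images in the first color image set, and determine the reprojection errors of the matching feature points between every two images in the first color image set, the first color image set consisting of the current-frame color image captured by the first camera and the previous n frames of color images captured by the first camera.
In an exemplary embodiment of the present disclosure, the set consisting of the current-frame color image captured by the first camera and the previous n frames of color images captured by the first camera is denoted as the first color image set. It can be understood that the previous n frames are consecutive frames preceding the current frame, where n is a positive integer; the present disclosure does not limit the specific value of n. The larger n is, the higher the accuracy of the algorithm and the more computing resources it consumes; the smaller n is, the lower the accuracy and the faster the processing. In practice, n can be chosen by weighing the required accuracy, the acceptable latency, and the processing capability of the device.
For example, with n set to 5, if the current-frame color image is the 10th frame, the previous n frames are the 9th, 8th, 7th, 6th, and 5th frames.
Once the first color image set is determined, the terminal device can obtain the matching feature points between every two images in the set, that is, the matching feature points between all image pairs in the first color image set. It should be understood that image pairs are not limited to adjacent frames; any two color images in the first color image set form an image pair.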
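To make the pairing rule concrete, the following sketch enumerates the frames in an image set and every image pair within it, using the n = 5 example above. The function names are hypothetical and frames are identified simply by their index.

```python
from itertools import combinations


def window_frames(current_frame: int, n: int) -> list[int]:
    """Frame indices in the image set: the previous n frames plus the current one."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return list(range(current_frame - n, current_frame + 1))


def image_pairs(current_frame: int, n: int) -> list[tuple[int, int]]:
    """Every unordered pair of frames in the set, not only adjacent frames."""
    return list(combinations(window_frames(current_frame, n), 2))


pairs = image_pairs(current_frame=10, n=5)
print(len(pairs))  # 15 pairs for a set of 6 images
```

Because the number of pairs grows quadratically with n, this also illustrates why a larger n costs more computation.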
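It should be noted that the matching feature points between every two images are 2D-2D (two-dimensional to two-dimensional) feature points. The terminal device can use a feature point matching algorithm to determine the matching feature points between every two images in the first color image set; the present disclosure does not limit this.
After determining the matching feature points between every two images in the first color image set, the terminal device can determine the reprojection errors of those matching feature points.
The process of determining the reprojection error of the matching feature points is described below, taking a first color image and a second color image contained in the first color image set as an example.
First, the terminal device can obtain the feature point on the first color image that matches the second color image, denoted as the first matching feature point. The terminal device can obtain the depth information of the first matching feature point; this depth information may be output by the first camera, or sensed by another depth camera with which the terminal device is equipped, and the present disclosure does not limit this.
Next, the terminal device can use the first matching feature point, the depth information of the first matching feature point, and the pose of the first camera when capturing the first color image to determine the three-dimensional feature point of the first matching feature point in the world coordinate system.
Specifically, the first matching feature point, the depth information of the first matching feature point, and the pose of the first camera when capturing the first color image can be multiplied together to obtain the three-dimensional feature point of the first matching feature point in the world coordinate system.
In the embodiments of the present disclosure, the feature point on the second color image that matches the first color image is denoted as the second matching feature point. Once the second matching feature point is obtained, the terminal device can use the second matching feature point, the three-dimensional feature point of the first matching feature point in the world coordinate system, and the pose of the first camera when capturing the second color image to determine the reprojection error of the matching feature points between the first color image and the second color image.
Specifically, the three-dimensional feature point of the first matching feature point in the world coordinate system can be multiplied by the inverse of the pose of the first camera when capturing the second color image, the product normalized, and the second matching feature point subtracted from the normalized result, to construct the reprojection error of the matching feature points between the first color image and the second color image.
It can be understood that the first matching feature point and the second matching feature point may both be normalized feature points.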
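The construction above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the disclosure's code: a pose is represented as a pair `(R, t)` mapping camera coordinates to world coordinates, feature points are normalized image coordinates `(x, y)` with an implicit third component of 1, and all function names are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def to_world(pose, p_cam):
    """Camera coordinates -> world coordinates: R * p + t."""
    R, t = pose
    return [a + b for a, b in zip(mat_vec(R, p_cam), t)]


def to_camera(pose, p_world):
    """World coordinates -> camera coordinates (inverse pose): R^T * (p - t)."""
    R, t = pose
    Rt = [list(col) for col in zip(*R)]
    return mat_vec(Rt, [a - b for a, b in zip(p_world, t)])


def reprojection_error(p_i, depth_i, pose_i, p_j, pose_j):
    """Error of one match between image i and image j of the first camera."""
    p_cam_i = [depth_i * p_i[0], depth_i * p_i[1], depth_i]  # depth * (x, y, 1)
    p_world = to_world(pose_i, p_cam_i)                      # 3D point in the world
    x, y, z = to_camera(pose_j, p_world)                     # into camera j
    return (x / z - p_j[0], y / z - p_j[1])                  # norm(...) minus observation


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_i = (identity, [0.0, 0.0, 0.0])
pose_j = (identity, [0.1, 0.0, 0.0])  # camera moved 0.1 m along x
error = reprojection_error((0.2, -0.1), 2.0, pose_i, (0.15, -0.1), pose_j)
```

In this example the observation in image j is consistent with the two poses, so the error is zero up to floating-point rounding; an inaccurate `pose_j` would produce a non-zero error, which is what the optimization drives down.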
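S54. Obtain matching feature points between every two images in the second color image set, and determine, in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, the reprojection errors of the matching feature points between every two images in the second color image set, the second color image set consisting of the current-frame color image captured by the second camera and the previous n frames of color images captured by the second camera.
In an exemplary embodiment of the present disclosure, the set consisting of the current-frame color image captured by the second camera and the previous n frames of color images captured by the second camera is denoted as the second color image set. The n here is the same as the n in operation S52.
Once the second color image set is determined, the terminal device can obtain the matching feature points between every two images in the set. Similarly, image pairs in the second color image set are not limited to adjacent frames; any two color images in the second color image set form such a pair. The present disclosure likewise does not limit how the matching feature points are determined.
After determining the matching feature points between every two images in the second color image set, the terminal device can determine their reprojection errors in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
The process of determining the reprojection error of the matching feature points is described below, taking a third color image and a fourth color image contained in the second color image set as an example.
First, the terminal device can obtain the feature point on the third color image that matches the fourth color image, denoted as the third matching feature point. The terminal device can obtain the depth information of the third matching feature point; this depth information may be output by the second camera, or sensed by another depth camera with which the terminal device is equipped, and the present disclosure does not limit this.
Next, the terminal device can use the third matching feature point, the depth information of the third matching feature point, the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the pose of the first camera at the time the second camera captures the third color image, to determine the three-dimensional feature point of the third matching feature point in the world coordinate system.
Specifically, the third matching feature point, the depth information of the third matching feature point, the transformation matrix between the first camera coordinate system and the second camera coordinate system, and the pose of the first camera at the time the second camera captures the third color image can be multiplied together to obtain the three-dimensional feature point of the third matching feature point in the world coordinate system.
In the embodiments of the present disclosure, the feature point on the fourth color image that matches the third color image is denoted as the fourth matching feature point. Once the fourth matching feature point is obtained, the terminal device can use the fourth matching feature point, the three-dimensional feature point of the third matching feature point in the world coordinate system, the pose of the first camera at the time the second camera captures the fourth color image, and the transformation matrix between the first camera coordinate system and the second camera coordinate system, to determine the reprojection error of the matching feature points between the third color image and the fourth color image.
Specifically, the three-dimensional feature point of the third matching feature point in the world coordinate system, the inverse of the pose of the first camera at the time the second camera captures the fourth color image, and the inverse of the transformation matrix between the first camera coordinate system and the second camera coordinate system can be multiplied together, the product normalized, and the fourth matching feature point subtracted from the normalized result, to construct the reprojection error of the matching feature points between the third color image and the fourth color image.
It can be understood that the third matching feature point and the fourth matching feature point may both be normalized feature points.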
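The second-camera error differs from the first-camera one only in that the extrinsic transform enters on both sides, so the error is still expressed in terms of the first camera's poses. A sketch using 4x4 homogeneous matrices follows; it assumes that `T_lr` maps second-camera coordinates into first-camera coordinates and that each `T_w` maps first-camera coordinates into world coordinates. The helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def inv_rigid(T):
    """Inverse of a rigid transform [R t; 0 1], which is [R^T  -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]


def apply(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(T[i][k] * p[k] for k in range(3)) + T[i][3] for i in range(3)]


def second_camera_error(p3, depth3, T_w_i, p4, T_w_j, T_lr):
    """Reprojection error of one match between two second-camera images."""
    P_r = [depth3 * p3[0], depth3 * p3[1], depth3]        # point in second camera
    P_w = apply(matmul(T_w_i, T_lr), P_r)                 # world = T_w_i * T_lr * P_r
    back = matmul(inv_rigid(T_lr), inv_rigid(T_w_j))      # T_lr^-1 * T_w_j^-1
    x, y, z = apply(back, P_w)
    return (x / z - p4[0], y / z - p4[1])


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


T_lr = translation(0.05, 0.0, 0.0)   # made-up extrinsics for the example
err = second_camera_error((0.2, -0.1), 2.0, translation(0.0, 0.0, 0.0),
                          (0.15, -0.1), translation(0.1, 0.0, 0.0), T_lr)
```

Because `T_lr` is a fixed calibration result, the only unknowns in this error are the first camera's poses, which is what lets both cameras constrain the same set of variables.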
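S56. Based on the reprojection errors of the matching feature points between every two images in the first color image set and the reprojection errors of the matching feature points between every two images in the second color image set, optimize the pose to be optimized of the first camera when capturing the current-frame color image, so as to determine the target pose of the first camera when capturing the current-frame color image.
In an exemplary embodiment of the present disclosure, the terminal device can accumulate the reprojection errors determined in operations S52 and S54 to obtain a total error function. It can be understood that the total error function is a nonlinear function whose variables are the poses of the first camera when capturing each color image in the first color image set. Since the first color image set includes the current-frame color image, these poses include the pose to be optimized of the first camera when capturing the current-frame color image.
The terminal device can minimize the total error function through iterative processing. When the total error function reaches its minimum, the target pose of the first camera when capturing the current-frame color image is determined; this is the pose to be optimized after optimization.
Specifically, the Jacobian matrix of each error term in the total error function with respect to the optimization variables can be computed, and nonlinear optimization applied iteratively to minimize the total error function, finally determining the accurate pose of the first camera when capturing the current-frame color image.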
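As an illustration of this kind of iterative minimization, the sketch below runs a plain Gauss-Newton loop on a deliberately small problem: recovering a camera translation from reprojection residuals with the rotation held fixed. Several things here are assumptions rather than details from the disclosure: the update solves the standard normal equations JᵀJ·Δx = −Jᵀf(x) that follow from the linearization f(x + Δx) ≈ f(x) + JΔx, the Jacobian is approximated by central differences instead of being derived analytically, and all names and data are made up.

```python
def residuals(t, points, observations):
    """Stacked reprojection residuals for a camera at position t (identity rotation)."""
    r = []
    for (X, Y, Z), (u, v) in zip(points, observations):
        x, y, z = X - t[0], Y - t[1], Z - t[2]
        r += [x / z - u, y / z - v]
    return r


def jacobian_transposed(f, x, eps=1e-6):
    """Central-difference J^T: one row per variable, one column per residual."""
    rows = []
    for k in range(len(x)):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        rows.append([(a - b) / (2 * eps) for a, b in zip(f(hi), f(lo))])
    return rows


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            M[r] = [a - k * m for a, m in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gauss_newton(f, x0, iterations=10):
    x = list(x0)
    for _ in range(iterations):
        r = f(x)
        JT = jacobian_transposed(f, x)
        H = [[sum(a * b for a, b in zip(ci, cj)) for cj in JT] for ci in JT]  # J^T J
        g = [-sum(a * b for a, b in zip(ci, r)) for ci in JT]                # -J^T f(x)
        dx = solve(H, g)
        x = [xi + di for xi, di in zip(x, dx)]
    return x


points = [(0.5, 0.2, 2.0), (-0.4, 0.3, 3.0), (0.1, -0.6, 2.5), (-0.3, -0.2, 4.0)]
true_t = [0.1, -0.05, 0.2]
obs = [((X - true_t[0]) / (Z - true_t[2]), (Y - true_t[1]) / (Z - true_t[2]))
       for X, Y, Z in points]
estimate = gauss_newton(lambda t: residuals(t, points, obs), [0.0, 0.0, 0.0])
```

A full implementation would optimize all poses in the window, including rotation, and would typically use a dedicated nonlinear least-squares solver; the toy problem only shows the linearize, solve, update cycle.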
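The above processing involves the pose to be optimized of the first camera when capturing the current-frame color image. Specifically, the first color image mentioned above may be the current-frame color image, in which case the pose of the first camera when capturing the first color image is the pose to be optimized of the first camera when capturing the current-frame color image.
It should be noted that the pose to be optimized may be a pose determined in advance. The present disclosure also provides a method for determining the pose to be optimized, which is described below with reference to FIG. 6.
S602. Obtain the current-frame color image captured by the first camera, and determine the first two-dimensional feature points on it that match the previous-frame color image captured by the first camera.
After obtaining the current-frame color image captured by the first camera, the terminal device can extract the feature points of that image.
The feature extraction algorithms used in exemplary embodiments of the present disclosure may include, but are not limited to, the FAST, DOG, Harris, SIFT, and SURF feature point detection algorithms. The feature descriptors may include, but are not limited to, the BRIEF, BRISK, and FREAK feature point descriptors.
According to one embodiment of the present disclosure, the combination of feature extraction algorithm and feature descriptor may be the FAST feature point detection algorithm with the BRIEF feature point descriptor. According to other embodiments of the present disclosure, the combination may be the DOG feature point detection algorithm with the FREAK feature point descriptor.
It should be understood that different combinations can also be used for scenes with different texture. For example, for strongly textured scenes, the FAST detector with the BRIEF descriptor can be used for feature extraction; for weakly textured scenes, the DOG detector with the FREAK descriptor can be used.
Feature points are likewise extracted when the previous-frame color image is processed. The terminal device can therefore use the feature points of the current-frame color image captured by the first camera and the feature points of the previous-frame color image captured by the first camera to determine the two-dimensional feature points that match between the two images, namely the first two-dimensional feature points referred to in the present disclosure.
Specifically, the optical flow method can be used to determine the matching relationship of the feature points; that is, optical flow tracking is performed using the feature points of the current-frame and previous-frame color images captured by the first camera to determine the first two-dimensional feature points. Other image matching methods can also be used to determine 2D-2D feature point pairs, and the present disclosure does not limit this.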
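S604. Obtain the current-frame color image captured by the second camera, and determine the second two-dimensional feature points on it that match the previous-frame color image captured by the second camera.
It should be understood that, although operation S602 also refers to a current-frame color image and a previous-frame color image, those in operation S602 are captured by the first camera, whereas those in operation S604 are captured by the second camera.
After obtaining the current-frame color image captured by the second camera, the terminal device can extract the feature points of that image. The feature points can be extracted in the same way as in operation S602, which is not repeated here.
The terminal device can perform optical flow tracking using the feature points of the current-frame color image captured by the second camera and the feature points of the previous-frame color image captured by the second camera, to determine the second two-dimensional feature points.
S606. Use the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system.
In exemplary embodiments of the present disclosure, for distinction, the camera coordinate system of the first camera is denoted as the first camera coordinate system, and the camera coordinate system of the second camera is denoted as the second camera coordinate system.
With the first camera and the second camera mounted at fixed positions on the terminal device, their intrinsic and extrinsic parameters are calibrated in advance, and the transformation matrix between the first camera coordinate system and the second camera coordinate system can be determined from the calibration results.
The terminal device can obtain the transformation matrix between the first camera coordinate system and the second camera coordinate system and the depth information of the second two-dimensional feature points, and determine the third two-dimensional feature points from the transformation matrix, the depth information of the second two-dimensional feature points, and the second two-dimensional feature points. The third two-dimensional feature points are the second two-dimensional feature points converted into the first camera coordinate system.
Specifically, the transformation matrix, the depth information of the second two-dimensional feature points, and the second two-dimensional feature points can be multiplied together and the product normalized to determine the third two-dimensional feature points. In this multiplication, the second two-dimensional feature points refer to the position coordinates of those feature points. The third two-dimensional feature points can be determined using Formula 1: p_j^l = norm(T_lr * d_j * p_j^r) (Formula 1)
where T_lr is the transformation matrix between the first camera coordinate system and the second camera coordinate system, d_j is the depth value of the second two-dimensional feature point, p_j^r is the second two-dimensional feature point, and p_j^l is the resulting third two-dimensional feature point.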
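The conversion in operation S606 can be sketched as follows. The sketch assumes that feature points are given in normalized image coordinates, that `T_lr` is a 4x4 rigid transform mapping second-camera coordinates into first-camera coordinates, and that `norm()` divides by the third component; the function name and the example numbers are hypothetical.

```python
def to_first_camera(p_right, depth, T_lr):
    """Convert a normalized second-camera feature point into the first camera.

    p_right: (x, y) normalized coordinates in the second camera.
    depth:   depth value of that feature point, in metres.
    T_lr:    4x4 rigid transform, second camera -> first camera.
    """
    P = [depth * p_right[0], depth * p_right[1], depth]           # lift to 3D
    x, y, z = (sum(T_lr[i][k] * P[k] for k in range(3)) + T_lr[i][3]
               for i in range(3))                                 # change of frame
    return (x / z, y / z)                                         # normalize


# Example: the second camera sits 0.1 m from the first along x, with no rotation.
T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
p_left = to_first_camera((0.2, 0.1), 2.0, T_lr)   # approximately (0.25, 0.1)
```

The depth value is needed because a pure 2D point does not determine how far the parallax between the two cameras shifts it.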
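S608. Determine the pose to be optimized of the first camera when capturing the current-frame color image, according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the first camera, and the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the second camera.
In exemplary embodiments of the present disclosure, the first and third two-dimensional feature points form the two-dimensional coordinate information, and the three-dimensional feature points in the world coordinate system of the previous-frame color images captured by the first camera and by the second camera form the three-dimensional coordinate information.
The terminal device can associate the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point-pair information, use the point-pair information to solve the Perspective-n-Point (PnP) problem, and determine from the solution the pose to be optimized of the first camera when capturing the current-frame color image.
PnP is a method in machine vision that determines the relative pose of a camera from n feature points in the scene; specifically, it determines the camera's rotation matrix and translation vector from n feature points in the scene.
It should be noted that the three-dimensional feature points of the previous-frame color image in the world coordinate system may be determined during the processing of the current frame or during the processing of the previous frame; the present disclosure does not limit this.
The process of determining the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the first camera is described below.
First, the terminal device can obtain the previous-frame color image captured by the first camera and extract its feature points. The feature point extraction is the same as in operation S602 and is not repeated here.
Next, the terminal device can use the previous-frame depth image aligned with the previous-frame color image captured by the first camera to project the feature points of that color image into space, obtaining the three-dimensional feature points of the previous-frame color image in the first camera coordinate system. The previous-frame depth image may be output by the first camera or obtained by another depth camera with which the terminal device is equipped; the present disclosure does not limit this.
In addition, to further improve positioning accuracy, the spatial projection can be constrained. Specifically, the terminal device can use the previous-frame depth image aligned with the previous-frame color image captured by the first camera to project into space only those feature points of the color image that lie within a predetermined depth range, obtaining the three-dimensional feature points of the previous-frame color image in the first camera coordinate system.
The predetermined depth range is determined from the measurement range of the depth sensor; its value may differ across depth camera types and models, and the present disclosure does not limit its specific value. For example, feature points with depth values greater than 0.5 m and less than 6 m are projected into space.
Then, the terminal device can transform the three-dimensional feature points in the first camera coordinate system according to the pose of the first camera when capturing the previous-frame color image, to obtain the three-dimensional feature points of that image in the world coordinate system. See Formula 2: P_w = T_w_last * P_c (Formula 2)
where P_w denotes the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the first camera, P_c denotes the three-dimensional feature points of that image in the first camera coordinate system, and T_w_last is the pose of the first camera when capturing the previous-frame color image.
It should be noted that the pose of the first camera when capturing the previous-frame color image can be determined during the processing of the previous frame; that is, during the processing of the current frame, the pose corresponding to the previous frame is known. The initial pose is described in the positioning initialization process of the present disclosure.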
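The depth-gated projection and the transformation into the world coordinate system can be sketched together. The sketch assumes feature points in normalized image coordinates paired with a depth value, and a 4x4 pose matrix mapping first-camera coordinates to world coordinates; the 0.5 m and 6 m limits come from the example above, while the function name and data are hypothetical.

```python
def lift_to_world(features, T_w_last, min_depth=0.5, max_depth=6.0):
    """Project features with a usable depth into 3D and move them to the world frame.

    features: iterable of (x, y, depth), with (x, y) normalized image coordinates.
    T_w_last: 4x4 pose of the first camera for the previous frame (camera -> world).
    """
    world_points = []
    for x, y, depth in features:
        if not (min_depth < depth < max_depth):
            continue  # outside the predetermined depth range: skip this feature
        P_c = (depth * x, depth * y, depth)                     # camera coordinates
        P_w = tuple(sum(T_w_last[i][k] * P_c[k] for k in range(3)) + T_w_last[i][3]
                    for i in range(3))                          # world coordinates
        world_points.append(P_w)
    return world_points


T_w_last = [[1.0, 0.0, 0.0, 1.0],
            [0.0, 1.0, 0.0, 2.0],
            [0.0, 0.0, 1.0, 3.0],
            [0.0, 0.0, 0.0, 1.0]]
features = [(0.1, 0.2, 1.0),   # kept
            (0.0, 0.0, 0.2),   # too close, dropped
            (0.3, 0.1, 8.0)]   # too far, dropped
world = lift_to_world(features, T_w_last)
```

Dropping features outside the sensor's reliable range keeps poorly measured depths from contaminating the 3D points later used for PnP.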
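The process of determining the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the second camera is described below.
First, the terminal device can obtain the previous-frame color image captured by the second camera and extract its feature points. The feature point extraction is the same as in operation S602 and is not repeated here.
Next, the terminal device can use the previous-frame depth image aligned with the previous-frame color image captured by the second camera to project the feature points of that color image into space, obtaining the three-dimensional feature points of the previous-frame color image in the second camera coordinate system. The previous-frame depth image may be output by the second camera or obtained by another depth camera with which the terminal device is equipped; the present disclosure does not limit this.
Similarly, to further improve positioning accuracy, the spatial projection can be constrained. Specifically, the terminal device can use the previous-frame depth image aligned with the previous-frame color image captured by the second camera to project into space only those feature points of the color image that lie within a predetermined depth range, obtaining the three-dimensional feature points of the previous-frame color image in the second camera coordinate system.
The predetermined depth range is determined from the measurement range of the depth sensor; its value may differ across depth camera types and models, and the present disclosure does not limit its specific value. For example, feature points with depth values greater than 0.5 m and less than 6 m are projected into space.
Subsequently, the terminal device can use the transformation matrix between the first camera coordinate system and the second camera coordinate system to convert the three-dimensional feature points, in the second camera coordinate system, of the previous-frame color image captured by the second camera into three-dimensional feature points in the first camera coordinate system.
Then, the terminal device can transform these converted three-dimensional feature points in the first camera coordinate system once more, according to the pose of the first camera when capturing the previous-frame color image, to obtain the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the second camera.
This process is described below with reference to Formula 3: P_w = T_w_last * T_lr * P_r (Formula 3)
where P_w denotes the three-dimensional feature points in the world coordinate system of the previous-frame color image captured by the second camera, P_r denotes the three-dimensional feature points of that image in the second camera coordinate system, T_w_last is the pose of the first camera when capturing the previous-frame color image, and T_lr is the transformation matrix between the first camera coordinate system and the second camera coordinate system.
Combining the above point-pair matching relationships, FIG. 7 is a schematic diagram of point-pair matching between the first camera and the second camera and the resulting PnP pose solution, which involves the 2D-2D feature point matching of the current frame and the 3D-2D feature point matching.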
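The above process of determining the three-dimensional feature points of the previous-frame color image in the world coordinate system uses the pose of the first camera when capturing the previous-frame color image. The process of determining the initial pose of the first camera is described below.
According to some embodiments of the present disclosure, first, the terminal device can obtain the initial-frame color image captured by the first camera and extract its feature points. The feature point extraction is the same as in operation S602 and is not repeated here.
Next, the terminal device can use the initial-frame depth image aligned with the initial-frame color image captured by the first camera to project the feature points of that color image into space, obtaining the three-dimensional feature points of the initial-frame color image in the first camera coordinate system.
Similarly, to further improve positioning accuracy, the spatial projection can be constrained. Specifically, the terminal device can use the initial-frame depth image aligned with the initial-frame color image captured by the first camera to project into space only those feature points of the color image that lie within a predetermined depth range, obtaining the three-dimensional feature points of the initial-frame color image in the first camera coordinate system.
The predetermined depth range is determined from the measurement range of the depth sensor; its value may differ across depth camera types and models, and the present disclosure does not limit its specific value. For example, feature points with depth values greater than 0.5 m and less than 6 m are projected into space.
Subsequently, the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system from the three-dimensional feature points of the initial-frame color image in the first camera coordinate system, the initial rotation matrix, and the initial translation vector.
In one embodiment of the present disclosure, the initial rotation matrix can be set to the identity matrix and the translation vector to [0,0,0].
It should be noted that, given the three-dimensional feature points of the initial-frame color image in the first camera coordinate system, the initial rotation matrix, and the initial translation vector, what has been determined at this point is only the pose of the first camera in the first camera coordinate system. To obtain a pose that can be used in the subsequent processing of the current frame, this pose must be converted into the pose of the first camera in the world coordinate system.
Specifically, the terminal device can use the transformation matrix between the first camera coordinate system and the world coordinate system to convert the initial positioning result of the first camera in the first camera coordinate system, thereby determining the pose of the first camera when capturing the initial-frame color image.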
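According to other embodiments of the present disclosure, the determination of the initial pose of the first camera can also incorporate feature data from the second camera. This process is described below.
On the one hand, the terminal device can determine the three-dimensional feature points, in the first camera coordinate system, of the initial-frame color image captured by the first camera.
On the other hand, the terminal device can obtain the initial-frame color image captured by the second camera and extract its feature points. The feature point extraction is the same as in operation S602 and is not repeated here.
The terminal device can use the initial-frame depth image aligned with the initial-frame color image captured by the second camera to project the feature points of that color image into space, obtaining the three-dimensional feature points of the initial-frame color image in the second camera coordinate system.
Similarly, the spatial projection can be constrained. Specifically, the terminal device can use the initial-frame depth image aligned with the initial-frame color image captured by the second camera to project into space only those feature points of the color image that lie within a predetermined depth range, obtaining the three-dimensional feature points of the initial-frame color image in the second camera coordinate system.
The predetermined depth range is determined from the measurement range of the depth sensor; its value may differ across depth camera types and models, and the present disclosure does not limit its specific value. For example, feature points with depth values greater than 0.5 m and less than 6 m are projected into space.
Next, the terminal device can use the transformation matrix between the first camera coordinate system and the second camera coordinate system to convert the three-dimensional feature points, in the second camera coordinate system, of the initial-frame color image captured by the second camera into three-dimensional feature points in the first camera coordinate system.
The converted three-dimensional feature points can be merged with the three-dimensional feature points, in the first camera coordinate system, of the initial-frame color image captured by the first camera. It can be understood that the merged three-dimensional feature points are in the first camera coordinate system.
Subsequently, the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system from the merged three-dimensional feature points, the initial rotation matrix, and the initial translation vector. For example, the initial rotation matrix can be set to the identity matrix and the translation vector to [0,0,0].
Then, the terminal device can use the transformation matrix between the first camera coordinate system and the world coordinate system to convert the initial positioning result of the first camera in the first camera coordinate system, thereby determining the pose of the first camera when capturing the initial-frame color image.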
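The positioning initialization process of an embodiment of the present disclosure is described below with reference to FIG. 8.
In operation S802, the terminal device can obtain the initial-frame color image captured by the first camera and extract its feature points.
In operation S804, the terminal device can perform spatial projection using the depth image aligned with the initial-frame color image captured by the first camera, to obtain the three-dimensional feature points of the initial-frame color image in the first camera coordinate system. As explained in the above embodiments, the three-dimensional feature points determined in operation S804 may also include the three-dimensional feature points corresponding to the initial-frame color image captured by the second camera.
In operation S806, the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system from the three-dimensional feature points determined in operation S804, the initial rotation matrix, and the initial translation vector.
In operation S808, the terminal device can use the transformation matrix between the first camera coordinate system and the world coordinate system to convert the initial positioning result, thereby determining the pose of the first camera when capturing the initial-frame color image and completing positioning initialization.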
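The above processing uses the transformation matrix between the first camera coordinate system and the world coordinate system. For this predetermined transformation matrix, the embodiments of the present disclosure provide a coordinate system alignment scheme. Specifically, coordinate system alignment is achieved with the help of depth information. For distinction, the term reference depth image is used in the following embodiments to describe the alignment process.
First, the terminal device may obtain a reference depth image output by the first camera.
Next, when it is determined from the reference depth image output by the first camera that a specified plane exists in the scene, the terminal device may determine the transformation matrix between the first camera coordinate system and the world coordinate system according to the normal vector of the specified plane and the gravity vector.
The gravity vector may be N_g = (0, 0, 1). In this case the specified plane is usually the ground plane, which matches scenarios where the terminal device is, for example, a robot dog. However, it can be understood that the specified plane may also be a plane designated manually for a particular scene, such as a wall or a tabletop, and the present disclosure does not limit this.
If the normal vector of the specified plane is denoted n_c, then after n_c is rotated by R_wc it coincides with N_g, which aligns the first camera coordinate system with the world coordinate system. Here, R_wc is the transformation matrix between the first camera coordinate system and the world coordinate system, and the rotation axis ω of R_wc can be obtained from the cross product of N_g and n_c, as shown in Formula 4:
$\omega = N_g \times n_c$        (Formula 4)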
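The rotation angle θ of R_wc can be obtained from the dot product of N_g and n_c, as shown in Formula 5:
$\theta = \arccos\left(\dfrac{N_g \cdot n_c}{\lVert N_g \rVert \, \lVert n_c \rVert}\right)$        (Formula 5)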
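The rotation axis ω and the rotation angle θ form the rotation vector between the first camera coordinate system and the world coordinate system. According to the Rodrigues formula, the terminal device can calculate the transformation matrix R_wc between the first camera coordinate system and the world coordinate system. At this point, the coordinate system alignment thread ends.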
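The axis-angle construction can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function name `rotation_aligning` is hypothetical, and the sketch takes the axis as n_c × N_g so that the resulting matrix maps n_c onto N_g. With the operand order N_g × n_c, the same construction yields the inverse rotation.

```python
import numpy as np

def rotation_aligning(n_c, n_g=(0.0, 0.0, 1.0)):
    """Return a rotation matrix R such that R @ n_c points along n_g."""
    a = np.array(n_c, dtype=float)
    a /= np.linalg.norm(a)
    b = np.array(n_g, dtype=float)
    b /= np.linalg.norm(b)
    axis = np.cross(a, b)              # direction of the rotation axis
    s = np.linalg.norm(axis)           # sin(theta)
    c = float(np.dot(a, b))            # cos(theta)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)           # already aligned
        # Opposite directions: rotate by pi about any axis perpendicular to a
        helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        k = np.cross(a, helper)
        k /= np.linalg.norm(k)
        return 2.0 * np.outer(k, k) - np.eye(3)
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric matrix of the unit axis
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

n_plane = np.array([0.1, -0.2, 0.97])   # made-up plane normal in the camera frame
R_wc = rotation_aligning(n_plane)
aligned = R_wc @ (n_plane / np.linalg.norm(n_plane))
```

The two degenerate cases (normal already parallel or exactly opposite to the gravity vector) are handled separately because the cross product vanishes there.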
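In the above processing, if no specified plane exists in the scene, the terminal device may return to the operation of obtaining the reference depth image, obtain a new reference depth image, and again determine whether a specified plane exists.
The determination of the specified plane is described below.
First, the terminal device may use the reference depth image output by the first camera to determine the point cloud corresponding to the first camera, referred to as the reference point cloud.
According to some embodiments of the present disclosure, for each pixel of the reference depth image output by the first camera, the terminal device determines the corresponding three-dimensional spatial point according to the pixel, its depth value, and the camera intrinsics of the first camera. Formula 6 gives the way the three-dimensional spatial point is determined here:
$P = z \cdot K^{-1} \cdot p$        (Formula 6)
where P denotes the three-dimensional spatial point projected into space, z denotes the depth value of the pixel, $K^{-1}$ denotes the inverse of the camera intrinsic matrix, and p denotes the coordinate position of the pixel.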
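Formula 6 can be sketched for a whole depth image as follows. The sketch assumes a pinhole intrinsic matrix, pixel coordinates written in homogeneous form (u, v, 1), and that a depth of 0 marks a missing measurement; the function name `back_project` and the toy intrinsics are hypothetical.

```python
import numpy as np

def back_project(depth_map, K):
    """Back-project every valid pixel of a depth image: P = z * K^-1 * p."""
    H, W = depth_map.shape
    v, u = np.mgrid[0:H, 0:W]                       # v: row index, u: column index
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)  # 3 x N homogeneous pixels
    z = depth_map.ravel()
    valid = z > 0                                   # assumption: 0 means "no depth"
    rays = np.linalg.inv(K) @ pix[:, valid]         # K^-1 * p for each valid pixel
    return (rays * z[valid]).T                      # N x 3 points in the camera frame

K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
cloud = back_project(depth, K)
```

Each row of `cloud` is one three-dimensional spatial point; invalid pixels are simply skipped.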
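In these embodiments, the reference point cloud corresponding to the first camera can be constructed from the three-dimensional spatial points obtained through this process.
According to some other embodiments of the present disclosure, on the one hand, for each pixel of the reference depth image output by the first camera, the terminal device determines the corresponding three-dimensional spatial point according to the pixel, its depth value, and the camera intrinsics of the first camera.
On the other hand, the terminal device may obtain the reference depth image output by the second camera and use Formula 6 above to determine the three-dimensional spatial point of each pixel of that image.
The terminal device may convert the three-dimensional spatial points of the pixels of the reference depth image output by the second camera according to the transformation matrix between the first camera coordinate system and the second camera coordinate system, so as to obtain converted three-dimensional spatial points.
The three-dimensional spatial points of the pixels of the reference depth image output by the first camera are then merged with the converted three-dimensional spatial points to construct the reference point cloud corresponding to the first camera, as in Formula 7:
$PC_{mixture} = PC_{left} + T_{lr} \cdot PC_{right}$        (Formula 7)
where $PC_{mixture}$ is the resulting reference point cloud, $PC_{right}$ is the set of three-dimensional spatial points of the pixels of the reference depth image output by the second camera, $PC_{left}$ is the set of three-dimensional spatial points of the pixels of the reference depth image output by the first camera, $T_{lr}$ is the transformation matrix between the first camera coordinate system and the second camera coordinate system, and the plus sign denotes merging the two point sets.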
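A minimal sketch of Formula 7 follows. It assumes T_lr is a 4x4 homogeneous matrix that maps points from the second camera coordinate system into the first, and that point clouds are stored as (N, 3) arrays; the function name and sample values are hypothetical.

```python
import numpy as np

def merge_point_clouds(pc_left, pc_right, T_lr):
    """Express pc_right in the first camera frame and concatenate it with pc_left."""
    pc_right_h = np.hstack([pc_right, np.ones((len(pc_right), 1))])  # homogeneous coordinates
    pc_right_in_left = (T_lr @ pc_right_h.T).T[:, :3]
    return np.vstack([pc_left, pc_right_in_left])

T_lr = np.eye(4)
T_lr[:3, 3] = [0.1, 0.0, 0.0]          # made-up 10 cm baseline along x
pc_left = np.array([[1.0, 1.0, 1.0]])
pc_right = np.array([[0.0, 0.0, 1.0]])
pc_mixture = merge_point_clouds(pc_left, pc_right, T_lr)
```

The merge is a concatenation of two point sets, not an element-wise sum, which is how the plus sign in Formula 7 is read here.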
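In these embodiments, the construction of the reference point cloud incorporates information from the depth image output by the second camera, so the spatial feature points are more comprehensive, which improves the accuracy of the algorithm.
After the reference point cloud corresponding to the first camera is determined, the terminal device may extract plane information from it. The present disclosure does not limit the plane extraction method; RANSAC fitting, normal-vector region growing, hierarchical clustering and the like may be used, as long as the plane information in the scene can be extracted. Some embodiments of the present disclosure use PEAC, a plane extraction algorithm based on hierarchical clustering. FIG. 9 shows two planes extracted with this algorithm; FIG. 9 is only an example, and the algorithm can extract all planes in the scene.
It can be understood that the extracted plane information includes, but is not limited to, the plane id, the normal vector of the plane, and the distance from the plane to the camera.
After planes are extracted from the reference point cloud, the terminal device may screen for the specified plane according to the plane information of the reference point cloud. Specifically, the terminal device may screen for the specified plane according to the distance from each plane to the first camera contained in the plane information.
When the distance information contains a distance within a predetermined distance range, the terminal device may determine the candidate plane corresponding to that distance. The number of candidate planes determined in this way is one or more.
When there is one candidate plane, the terminal device may determine that candidate plane as the specified plane.
When there are multiple candidate planes, the terminal device may determine, as the specified plane, the candidate plane whose distance to the first camera is closest to a distance threshold. The distance threshold lies within the predetermined distance range.
FIG. 10 is a schematic diagram of the ground plane obtained by screening. Compared with the plane detection result, the distance-based screening removes planes such as the ceiling.
Taking a robot dog as the terminal device, the terminal device is equipped with a first camera and one second camera whose mounting positions are fixed. When the scheme is implemented, the robot dog is controlled to move for a short period of time, and only on the ground plane. Under this prior condition, the position of the ground plane in the first camera coordinate system is essentially fixed. The height of the camera above the ground plane is comparable to the height of the robot dog, about 0.3 m. Therefore, the predetermined distance range may be set to 0.25 m to 0.35 m, and a plane whose distance falls within this range is treated as the ground plane. If multiple candidate planes are selected, the plane whose distance is closest to 0.3 m is taken as the ground plane.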
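The distance-based screening can be sketched as below. The `Plane` record and the function name `select_plane` are hypothetical; the numeric defaults are the robot-dog example values (0.25 m to 0.35 m, target 0.3 m), and the sample planes are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class Plane:
    plane_id: int
    normal: Tuple[float, float, float]
    distance: float                     # distance from the plane to the first camera, in metres

def select_plane(planes: Sequence[Plane], d_min: float = 0.25,
                 d_max: float = 0.35, d_target: float = 0.30) -> Optional[Plane]:
    """Pick the plane inside [d_min, d_max] whose distance is closest to d_target."""
    candidates = [p for p in planes if d_min <= p.distance <= d_max]
    if not candidates:
        return None                     # no specified plane: the caller re-acquires a depth image
    return min(candidates, key=lambda p: abs(p.distance - d_target))

planes = [
    Plane(0, (0.0, 0.0, -1.0), 2.40),   # e.g. a ceiling, rejected by the range check
    Plane(1, (0.0, 0.0, 1.0), 0.27),
    Plane(2, (0.0, 0.0, 1.0), 0.31),
]
ground = select_plane(planes)
```

Returning `None` when no candidate exists mirrors the retry behaviour described for the case where no specified plane is found.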
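It should be understood that if no ground plane is detected in this process, the terminal device is controlled to keep repeating the above process of determining planes from the depth image and screening them until the ground plane is detected.
The coordinate system alignment process of an embodiment of the present disclosure is described below with reference to FIG. 11.
In operation S1102, the terminal device obtains the reference depth image output by the first camera and back-projects it to obtain three-dimensional spatial points.
In operation S1104, the terminal device obtains the reference depth image output by the second camera and back-projects it to obtain three-dimensional spatial points.
In operation S1106, the terminal device converts the three-dimensional spatial points obtained in operation S1104 into three-dimensional spatial points in the first camera coordinate system.
In operation S1108, the terminal device merges the three-dimensional spatial points obtained in operation S1102 with those obtained in operation S1106 to obtain the reference point cloud corresponding to the first camera.
In operation S1110, the terminal device may extract plane information from the reference point cloud.
In operation S1112, the terminal device may screen the extracted planes to determine the ground plane.
In operation S1114, the terminal device may use the normal vector of the ground plane and the gravity vector to determine the transformation matrix between the first camera coordinate system and the world coordinate system, thereby completing the alignment of the first camera coordinate system with the world coordinate system.
In addition, since the relationship between the first camera coordinate system and the second camera coordinate system has already been determined by calibration, the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained, so that the first camera coordinate system, the second camera coordinate system, and the world coordinate system are all aligned. The coordinate system alignment result can then be applied in the posture determination process of the present disclosure described above.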
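The chaining of the two transforms can be sketched as follows, assuming both are stored as 4x4 homogeneous matrices, that T_lr maps second-camera coordinates into first-camera coordinates, and that the alignment rotation is embedded with zero translation. All names and numeric values are hypothetical.

```python
import numpy as np

def second_camera_to_world(T_w_first, T_lr):
    """Compose world<-first and first<-second to obtain world<-second."""
    return T_w_first @ T_lr

# Made-up alignment result: 90-degree rotation about x, no translation
T_w_first = np.eye(4)
T_w_first[:3, :3] = [[1.0, 0.0, 0.0],
                     [0.0, 0.0, -1.0],
                     [0.0, 1.0, 0.0]]
# Made-up extrinsic calibration: second camera offset 10 cm along x
T_lr = np.eye(4)
T_lr[:3, 3] = [0.1, 0.0, 0.0]

T_w_second = second_camera_to_world(T_w_first, T_lr)
p_world = (T_w_second @ np.array([0.0, 0.0, 1.0, 1.0]))[:3]
```

The order of multiplication matters: the point is first moved into the first camera frame and only then into the world frame.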
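Although the process described above determines only the posture to be optimized, the present disclosure converts the feature points acquired by the second camera into the first camera coordinate system so that they take part in the posture calculation together with the feature points acquired by the first camera. Because the feature points come from at least two cameras and are unified into one coordinate system, more feature points are collected, that is, the feature points processed together are more comprehensive, the determined posture is more accurate, and positioning accuracy is improved. In addition, the posture determination process takes inter-frame correlation into account: it incorporates the feature information of the previous frame image and uses the previous frame's data as a constraint, which further improves positioning accuracy.
It should be noted that the above process of determining the posture to be optimized is only an exemplary description. When the terminal device is equipped with an IMU (Inertial Measurement Unit), the posture to be optimized may also be determined in combination with the inertial data sensed by the IMU, and the present disclosure does not limit this.
In addition, the embodiments of the present disclosure may maintain a sliding window to implement the above posture determination method.
Specifically, when the posture to be optimized of the first camera when acquiring the current frame color image has been determined, the terminal device may add the current frame color image group to the sliding window, so that the target posture of the first camera when acquiring the current frame color image is determined in combination with the first color image set and the second color image set contained in the sliding window. The current frame color image group includes the current frame color image acquired by the first camera and the current frame color image acquired by the second camera. In implementation, an array may be used to implement the sliding window.
That is, after the current frame color image group is added to the sliding window, the processing of operations S52 to S56 described above may be performed.
The sliding window in the present disclosure may be of fixed size, where the size is characterized by the maximum number of color image groups the window can contain. For example, if the sliding window is configured to contain at most 10 color image groups, its size is 10.
When the current frame color image group is added to the sliding window, if the number of color image groups in the window equals the maximum number the window can contain, the color image group that was added to the window earliest is removed. For example, with a window size of 10, when the current frame color image group is added, the first color image group in the window is removed, so that the window still contains 10 color image groups after the addition.
When the current frame color image group is added to the sliding window, if the number of color image groups in the window is less than the maximum number the window can contain, all color image groups in the window are used directly to perform the above posture determination process based on the first color image set and the second color image set.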
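The window behaviour can be sketched with a double-ended queue. The text mentions that an array may be used; `collections.deque` is chosen here only for convenience, and the class name `SlidingWindow` is hypothetical. Integers stand in for color image groups.

```python
from collections import deque

class SlidingWindow:
    """Fixed-size window of color image groups; the oldest group leaves first."""

    def __init__(self, max_size=10):
        self.max_size = max_size
        self._groups = deque()

    def add(self, group):
        """Add a group; return the group that was removed, or None if the window was not full."""
        removed = None
        if len(self._groups) == self.max_size:
            removed = self._groups.popleft()   # drop the earliest-added group
        self._groups.append(group)
        return removed

    def groups(self):
        return list(self._groups)

window = SlidingWindow(max_size=10)
removed_log = [window.add(frame_id) for frame_id in range(12)]
```

After twelve additions the window holds the ten most recent groups, and the two earliest have been removed in order.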
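The posture determination method of the embodiments of the present disclosure is described below taking a sliding window of size 10 as an example.
Referring to FIG. 12, when the current frame color image group is added to the sliding window, if the window is full, the image group that has been in the window longest is removed. As color image groups are added to the sliding window in time order, when the current frame color image group is input to the algorithm, the sliding window removes the first color image group in the window.
After the current frame color image group is added to the sliding window, the constraints provided by the 10 color image groups in the window can be used to solve the posture optimization. It should be understood that each color image group in the sliding window includes a color image acquired by the first camera and a color image acquired by the second camera.
The construction of the error constraints is described below taking the 9th and 10th color image groups in the sliding window as an example. The following processing can be performed for every group.
Referring to FIG. 13, the feature points in the color image captured by the first camera in the 9th color image group that match the color image captured by the first camera in the 10th color image group are recorded as one set of matched feature points; correspondingly, the feature points in the color image captured by the first camera in the 10th color image group that match the color image captured by the first camera in the 9th color image group are recorded as the other set.
Likewise, the feature points in the color image captured by the second camera in the 9th color image group that match the color image captured by the second camera in the 10th color image group are recorded as one set of matched feature points; correspondingly, the feature points in the color image captured by the second camera in the 10th color image group that match the color image captured by the second camera in the 9th color image group are recorded as the other set.
For the first camera, the three-dimensional feature point in the world coordinate system corresponding to a matched feature point of the 9th color image group can be expressed as Formula 8:
where d_i is the depth value of the feature point and T_w9 is the posture of the first camera when the 9th color image group was acquired. It can be understood that this is the posture of the first camera relative to the world coordinate system.
From this, the reprojection error e_left between the matched feature points can be constructed, as shown in Formula 9:
where norm() denotes normalization of the three-dimensional feature point and T_w10 is the posture of the first camera when the 10th color image group was acquired.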
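One plausible way to compute such a reprojection error is sketched below. It is an assumption-laden illustration rather than a reproduction of Formulas 8 and 9: it assumes a pinhole model with intrinsic matrix K, postures stored as 4x4 camera-to-world matrices, and normalization by the z coordinate. The function names and all numeric values are hypothetical.

```python
import numpy as np

def to_world(p_px, depth, K, T_w_src):
    """Lift a pixel with known depth into the world frame using the source-frame posture."""
    ray = np.linalg.inv(K) @ np.array([p_px[0], p_px[1], 1.0])
    P_c = depth * ray                                  # point in the source camera frame
    return (T_w_src @ np.append(P_c, 1.0))[:3]

def reprojection_error(P_w, q_px, K, T_w_dst):
    """Project a world point into the destination frame and compare with the matched pixel."""
    P_c = (np.linalg.inv(T_w_dst) @ np.append(P_w, 1.0))[:3]
    proj = K @ (P_c / P_c[2])                          # normalise by depth, then apply intrinsics
    return np.asarray(q_px, dtype=float) - proj[:2]

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
T_w9 = np.eye(4)                                       # posture for the 9th group
T_w10 = np.eye(4)
T_w10[:3, 3] = [0.1, 0.0, 0.0]                         # camera moved 10 cm along x

P_w = to_world((50, 50), 2.0, K, T_w9)
e_exact = reprojection_error(P_w, (45.0, 50.0), K, T_w10)
e_off = reprojection_error(P_w, (46.0, 50.0), K, T_w10)
```

When the two postures and the depth are consistent with the match, the error vanishes; a one-pixel mismatch shows up directly as a one-pixel residual.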
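For the second camera, the three-dimensional feature point in the world coordinate system corresponding to a matched feature point of the 9th color image group can be expressed as Formula 10:
where d_j is the depth value of the feature point and T_lr is the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
From this, the reprojection error e_right between the matched feature points can be constructed, as shown in Formula 11: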
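For the second camera, the error can still be written in terms of the first camera's postures by chaining them with T_lr. The sketch below is one plausible form under stated assumptions, not a reproduction of Formulas 10 and 11: a pinhole model, 4x4 camera-to-world postures for the first camera, and T_lr mapping second-camera coordinates into first-camera coordinates. The function name and numbers are hypothetical.

```python
import numpy as np

def second_camera_error(p_px, depth, q_px, K_r, T_w9, T_w10, T_lr):
    """Reprojection error of a second-camera match, expressed via first-camera postures."""
    ray = np.linalg.inv(K_r) @ np.array([p_px[0], p_px[1], 1.0])
    P_r9 = np.append(depth * ray, 1.0)                 # point in the second camera frame (9th group)
    P_w = T_w9 @ T_lr @ P_r9                           # second camera -> first camera -> world
    P_r10 = np.linalg.inv(T_w10 @ T_lr) @ P_w          # world -> second camera frame (10th group)
    proj = K_r @ (P_r10[:3] / P_r10[2])
    return np.asarray(q_px, dtype=float) - proj[:2]

K_r = np.array([[100.0, 0.0, 50.0],
                [0.0, 100.0, 50.0],
                [0.0, 0.0, 1.0]])
T_lr = np.eye(4)
T_lr[:3, 3] = [0.05, 0.0, 0.0]                         # made-up extrinsic offset
T_w9 = np.eye(4)
T_w10 = np.eye(4)
T_w10[:3, 3] = [0.1, 0.0, 0.0]

e_right = second_camera_error((50, 50), 2.0, (45.0, 50.0), K_r, T_w9, T_w10, T_lr)
e_static = second_camera_error((50, 50), 2.0, (50.0, 50.0), K_r, T_w9, T_w9, T_lr)
```

Because only the first camera's postures appear as unknowns, residuals from both cameras constrain the same set of variables.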
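The above processing is performed for every color image group in the sliding window, yielding multiple reprojection errors for the first camera and multiple reprojection errors for the second camera.
Next, these error constraints are accumulated to construct the total error function e_total for posture optimization, as shown in Formula 12:
$e_{total} = \sum_{i \in \mathrm{left\_factor}} \lVert e_i \rVert^{2} + \sum_{j \in \mathrm{right\_factor}} \lVert e_j \rVert^{2}$        (Formula 12)
where the variables to be optimized are the 10 postures T_w1 ... T_w10 of the first camera corresponding to the color images in the sliding window. The depth values of the feature points are known quantities read from the depth map with very small error, so they are not treated as variables to be optimized in this process.
The goal of the optimization is to minimize the total error function e_total. When e_total reaches its minimum, the optimization of the postures T_w1 ... T_w10 ends. For the input current frame color image group, the optimized T_w10 is the target posture of the first camera when acquiring the current frame color image as referred to in the present disclosure.
As can be seen from the above formula, the total error function e_total is a nonlinear function of the variables T_w1 ... T_w10. Accordingly, e = f(x) can be defined, where x denotes the variables to be optimized.
First, the nonlinear function can be linearized, as shown in Formula 13:
$f(x + \Delta x) \approx f(x) + J \Delta x$        (Formula 13)
where J is the derivative (Jacobian) of f(x) with respect to x.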
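Next, the problem can be converted into a least-squares problem, as shown in Formula 14:
$\min_{\Delta x} F(x + \Delta x) = \frac{1}{2} \lVert f(x) + J \Delta x \rVert^{2}$        (Formula 14)
Taking the derivative of the above expression with respect to Δx and setting it to 0 gives Formula 15 and Formula 16:
$J^{T} \left( f(x) + J \Delta x \right) = 0$        (Formula 15)
$J^{T} J \, \Delta x = -J^{T} f$        (Formula 16)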
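Δx can be solved from the above equation and used to update the variables to be optimized. The updated variables can be expressed as Formula 17:
$x = x + \Delta x$        (Formula 17)
If substituting the updated x into the least-squares function F(x) reduces its value, the update is valid. The above calculation is repeated until the value of F(x) falls below a certain threshold, at which point updating stops; the x obtained at this point is the optimized posture result.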
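The iterate-solve-update loop can be sketched generically as below. This is a plain Gauss-Newton iteration on a toy two-variable residual, not the posture optimization itself: the residual, its Jacobian, the starting point, and the threshold are all made up, and F(x) is assumed to be half the squared norm of f(x).

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-12, max_iter=50):
    """Minimise F(x) = 0.5 * ||f(x)||^2 by repeatedly solving J^T J dx = -J^T f."""
    x = np.array(x0, dtype=float)
    f = residual(x)
    cost = 0.5 * float(f @ f)
    for _ in range(max_iter):
        J = jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ f)   # normal equations
        x_new = x + dx                            # candidate update
        f_new = residual(x_new)
        new_cost = 0.5 * float(f_new @ f_new)
        if new_cost >= cost:
            break                                 # update did not reduce F(x): reject and stop
        x, f, cost = x_new, f_new, new_cost
        if cost < tol:
            break                                 # F(x) is below the threshold
    return x, cost

# Toy nonlinear residual with its minimum at x = (2, 1)
def toy_residual(x):
    return np.array([x[0] ** 2 - 4.0, x[1] - 1.0])

def toy_jacobian(x):
    return np.array([[2.0 * x[0], 0.0],
                     [0.0, 1.0]])

x_opt, final_cost = gauss_newton(toy_residual, toy_jacobian, [3.0, 0.0])
```

In the actual problem the residual vector would stack all reprojection errors in the window and x would parameterize the postures; practical solvers typically add damping or a robust loss, which this sketch leaves out.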
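To verify the effect of the posture determination scheme of the embodiments of the present disclosure, the present disclosure also provides a test method and gives test results.
Taking a robot dog as an example and referring to FIG. 14, the robot dog is controlled to walk one lap around a rectangle 6 m long and 5 m wide and return to the origin. Three landmarks, landmark 1, landmark 2 and landmark 3, are placed on the rectangular route, and the distance and angle of each landmark relative to the origin are measured as ground truth for evaluating the positioning scheme.
It can be understood that by recording the posture output by the positioning algorithm when the robot dog passes each of the 3 landmarks and comparing it with the ground truth, the accuracy of the positioning scheme of the present disclosure can be evaluated.
Four sets of test data were recorded, and the test results are shown in Table 1.
Table 1
As can be seen from Table 1, with the posture determination scheme of the embodiments of the present disclosure, the positioning distance accuracy is within 0.5% and the rotation accuracy is within 5° in all cases. The scheme of the embodiments of the present disclosure can therefore achieve high positioning accuracy.
It should be noted that although the operations of the method of the present disclosure are described in a particular order in the accompanying drawings, this does not require or imply that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some operations may be omitted, multiple operations may be combined into one, and/or one operation may be split into multiple operations.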
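Further, this example embodiment also provides a posture determination apparatus. The posture determination apparatus is configured in a terminal device, and the terminal device is further equipped with a first camera and at least one second camera.
FIG. 15 schematically shows a block diagram of a posture determination apparatus according to an exemplary embodiment of the present disclosure. Referring to FIG. 15, the posture determination apparatus 15 according to an exemplary embodiment of the present disclosure may include a first error determination module 151, a second error determination module 153, and a target posture determination module 155.
Specifically, the first error determination module 151 may be configured to obtain matching feature points between every two images in a first color image set and determine the reprojection error of the matching feature points between every two images in the first color image set, where the first color image set consists of the current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera. The second error determination module 153 may be configured to obtain matching feature points between every two images in a second color image set and, in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, determine the reprojection error of the matching feature points between every two images in the second color image set, where the second color image set consists of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera. The target posture determination module 155 may be configured to optimize, based on the reprojection error of the matching feature points between every two images in the first color image set and the reprojection error of the matching feature points between every two images in the second color image set, the posture to be optimized of the first camera when acquiring the current frame color image, so as to determine the target posture of the first camera when acquiring the current frame color image. Here, n is a positive integer.
According to an exemplary embodiment of the present disclosure, the first color image set includes a first color image and a second color image. In this case, to determine the reprojection error of the matching feature points between the first color image and the second color image, the first error determination module 151 may be configured to: obtain first matching feature points on the first color image that match the second color image; determine the three-dimensional feature points of the first matching feature points in the world coordinate system using the first matching feature points, their depth information, and the posture of the first camera when acquiring the first color image; obtain second matching feature points on the second color image that match the first color image; and determine the reprojection error of the matching feature points between the first color image and the second color image using the second matching feature points, the three-dimensional feature points of the first matching feature points in the world coordinate system, and the posture of the first camera when acquiring the second color image.
According to an exemplary embodiment of the present disclosure, the first color image is the current frame color image, and the posture of the first camera when acquiring the first color image is the posture to be optimized of the first camera when acquiring the current frame color image. In this case, referring to FIG. 16, compared with the posture determination apparatus 15, the posture determination apparatus 16 may further include a to-be-optimized posture estimation module 161.
Specifically, the to-be-optimized posture estimation module 161 may be configured to: obtain the current frame color image acquired by the first camera, and determine first two-dimensional feature points on it that match the previous frame color image acquired by the first camera; obtain the current frame color image acquired by the second camera, and determine second two-dimensional feature points on it that match the previous frame color image acquired by the second camera; convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system using the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera; and determine the posture to be optimized of the first camera when acquiring the current frame color image according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points in the world coordinate system of the previous frame color image acquired by the first camera, and the three-dimensional feature points in the world coordinate system of the previous frame color image acquired by the second camera.
According to an exemplary embodiment of the present disclosure, the to-be-optimized posture estimation module 161 may be further configured to: obtain the previous frame color image acquired by the first camera and extract its feature points; spatially project the feature points of the previous frame color image acquired by the first camera using the previous frame depth image aligned with it, so as to obtain the three-dimensional feature points of the previous frame color image acquired by the first camera in the first camera coordinate system; and convert the three-dimensional feature points in the first camera coordinate system according to the posture of the first camera when acquiring the previous frame color image, so as to obtain the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system.
According to an exemplary embodiment of the present disclosure, the second color image set includes a third color image and a fourth color image. In this case, to determine the reprojection error of the matching feature points between the third color image and the fourth color image in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, the second error determination module 153 may be configured to: obtain third matching feature points on the third color image that match the fourth color image; determine the three-dimensional feature points of the third matching feature points in the world coordinate system using the third matching feature points, their depth information, the transformation matrix between the first camera coordinate system and the second camera coordinate system, and the posture of the first camera at the time the second camera acquires the third color image; obtain fourth matching feature points on the fourth color image that match the third color image; and determine the reprojection error of the matching feature points between the third color image and the fourth color image using the fourth matching feature points, the three-dimensional feature points of the third matching feature points in the world coordinate system, the posture of the first camera at the time the second camera acquires the fourth color image, and the transformation matrix between the first camera coordinate system and the second camera coordinate system.
According to an exemplary embodiment of the present disclosure, the target posture determination module 155 may be configured to: accumulate the reprojection errors of the matching feature points between every two images in the first color image set and the reprojection errors of the matching feature points between every two images in the second color image set to determine a total error function, where the total error function is a nonlinear function whose variables are the postures of the first camera when acquiring the color images in the first color image set, and those postures include the posture to be optimized of the first camera when acquiring the current frame color image; and minimize the total error function by iterative processing to determine the target posture of the first camera when acquiring the current frame color image.
According to an exemplary embodiment of the present disclosure, referring to FIG. 17, compared with the posture determination apparatus 15, the posture determination apparatus 17 may further include a sliding window operation module 171.
Specifically, the sliding window operation module 171 may be configured to: when the posture to be optimized of the first camera when acquiring the current frame color image has been determined, add the current frame color image group to the sliding window, so that the target posture of the first camera when acquiring the current frame color image is determined in combination with the first color image set and the second color image set contained in the sliding window; where the current frame color image group includes the current frame color image acquired by the first camera and the current frame color image acquired by the second camera.
According to an exemplary embodiment of the present disclosure, the sliding window operation module 171 may be further configured to: when the current frame color image group is added to the sliding window, if the number of color image groups in the sliding window equals the maximum number of color image groups the window can contain, remove from the sliding window the color image group that was added earliest.
Since the functional modules of the posture determination apparatus of the embodiments of the present disclosure are the same as in the method embodiments described above, they are not described again here.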
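FIG. 18 shows a schematic diagram of an electronic device suitable for implementing exemplary embodiments of the present disclosure. The terminal device of the exemplary embodiments of the present disclosure may be configured in the form shown in FIG. 18. It should be noted that the electronic device shown in FIG. 18 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory. The memory is used to store one or more programs which, when executed by the processor, enable the processor to implement the posture determination method of the exemplary embodiments of the present disclosure.
Specifically, as shown in FIG. 18, the electronic device 180 includes at least: a processor 1810, an internal memory 1821, an external memory interface 1822, a Universal Serial Bus (USB) interface 1830, a charging management module 1840, a power management module 1841, a battery 1842, an antenna, a wireless communication module 1850, an audio module 1860, a display screen 1870, a sensor module 1880, a camera module 1890, and the like. The sensor module 1880 may include a depth sensor, a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It can be understood that the structure illustrated in the embodiments of the present disclosure does not constitute a specific limitation on the electronic device 180. In other embodiments of the present disclosure, the electronic device 180 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 1810 may include one or more processing units. For example, the processor 1810 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-network Processing Unit (NPU). Different processing units may be separate devices or may be integrated into one or more processors. In addition, a memory may be provided in the processor 1810 for storing instructions and data.
The electronic device 180 may implement the shooting function through the ISP, the camera module 1890, the video codec, the GPU, the display screen 1870, the application processor, and the like. In some embodiments, the electronic device 180 may include at least two camera modules 1890. When the scheme of the present disclosure is implemented, one camera module is designated as the reference camera, and the feature data collected by the other camera modules is transferred into the coordinate system of the reference camera for processing. For example, the electronic device 180 is equipped with two RealSense D455 cameras.
The internal memory 1821 may be used to store computer-executable program code, which includes instructions. The internal memory 1821 may include a program storage area and a data storage area. The external memory interface 1822 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 180.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device.
The computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device.
The computer-readable storage medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted over any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the embodiments of the present disclosure.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (such as a personal computer, a server, a terminal apparatus, or a network device) to execute the method according to the embodiments of the present disclosure.
In addition, the above drawings are only schematic illustrations of the processing included in the methods according to exemplary embodiments of the present disclosure and are not intended to be limiting. It is readily understood that the processing shown in the drawings does not indicate or limit the time order of the processing. It is also readily understood that the processing may be performed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing what is disclosed here. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.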

Claims (11)

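  1. A posture determination method, characterized in that it is applied to a terminal device equipped with a first camera and at least one second camera, the posture determination method comprising:
    obtaining matching feature points between every two images in a first color image set, and determining the reprojection error of the matching feature points between every two images in the first color image set, the first color image set consisting of the current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera;
    obtaining matching feature points between every two images in a second color image set, and determining, in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, the reprojection error of the matching feature points between every two images in the second color image set, the second color image set consisting of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera;
    optimizing, based on the reprojection error of the matching feature points between every two images in the first color image set and the reprojection error of the matching feature points between every two images in the second color image set, the posture to be optimized of the first camera when acquiring the current frame color image, so as to determine the target posture of the first camera when acquiring the current frame color image;
    wherein n is a positive integer.
  2. The posture determination method according to claim 1, characterized in that the first color image set includes a first color image and a second color image; wherein determining the reprojection error of the matching feature points between the first color image and the second color image comprises:
    obtaining first matching feature points on the first color image that match the second color image;
    determining the three-dimensional feature points of the first matching feature points in the world coordinate system using the first matching feature points, the depth information of the first matching feature points, and the posture of the first camera when acquiring the first color image;
    obtaining second matching feature points on the second color image that match the first color image;
    determining the reprojection error of the matching feature points between the first color image and the second color image using the second matching feature points, the three-dimensional feature points of the first matching feature points in the world coordinate system, and the posture of the first camera when acquiring the second color image.
  3. The posture determination method according to claim 2, characterized in that the first color image is the current frame color image, and the posture of the first camera when acquiring the first color image is the posture to be optimized of the first camera when acquiring the current frame color image; wherein the posture determination method further comprises:
    obtaining the current frame color image acquired by the first camera, and determining first two-dimensional feature points on the current frame color image acquired by the first camera that match the previous frame color image acquired by the first camera;
    obtaining the current frame color image acquired by the second camera, and determining second two-dimensional feature points on the current frame color image acquired by the second camera that match the previous frame color image acquired by the second camera;
    converting the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system using the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera;
    determining the posture to be optimized of the first camera when acquiring the current frame color image according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points in the world coordinate system of the previous frame color image acquired by the first camera, and the three-dimensional feature points in the world coordinate system of the previous frame color image acquired by the second camera.
  4. The posture determination method according to claim 3, characterized in that the posture determination method further comprises:
    obtaining the previous frame color image acquired by the first camera, and extracting the feature points of the previous frame color image acquired by the first camera;
    spatially projecting the feature points of the previous frame color image acquired by the first camera using the previous frame depth image aligned with the previous frame color image acquired by the first camera, so as to obtain the three-dimensional feature points of the previous frame color image acquired by the first camera in the first camera coordinate system;
    converting the three-dimensional feature points in the first camera coordinate system according to the posture of the first camera when acquiring the previous frame color image, so as to obtain the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system.
  5. The posture determination method according to claim 1, characterized in that the second color image set includes a third color image and a fourth color image; wherein determining, in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, the reprojection error of the matching feature points between the third color image and the fourth color image comprises:
    obtaining third matching feature points on the third color image that match the fourth color image;
    determining the three-dimensional feature points of the third matching feature points in the world coordinate system using the third matching feature points, the depth information of the third matching feature points, the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the posture of the first camera at the time the second camera acquires the third color image;
    obtaining fourth matching feature points on the fourth color image that match the third color image;
    determining the reprojection error of the matching feature points between the third color image and the fourth color image using the fourth matching feature points, the three-dimensional feature points of the third matching feature points in the world coordinate system, the posture of the first camera at the time the second camera acquires the fourth color image, and the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
  6. The posture determination method according to claim 1, characterized in that optimizing, based on the reprojection error of the matching feature points between every two images in the first color image set and the reprojection error of the matching feature points between every two images in the second color image set, the posture to be optimized of the first camera when acquiring the current frame color image, so as to determine the target posture of the first camera when acquiring the current frame color image, comprises:
    accumulating the reprojection errors of the matching feature points between every two images in the first color image set and the reprojection errors of the matching feature points between every two images in the second color image set to determine a total error function, the total error function being a nonlinear function whose variables are the postures of the first camera when acquiring the color images in the first color image set, and the postures of the first camera when acquiring the color images in the first color image set including the posture to be optimized of the first camera when acquiring the current frame color image;
    minimizing the total error function by iterative processing to determine the target posture of the first camera when acquiring the current frame color image.
  7. The posture determination method according to any one of claims 1 to 6, characterized in that the posture determination method further comprises:
    when the posture to be optimized of the first camera when acquiring the current frame color image has been determined, adding the current frame color image group to a sliding window, so that the target posture of the first camera when acquiring the current frame color image is determined in combination with the first color image set and the second color image set contained in the sliding window;
    wherein the current frame color image group includes the current frame color image acquired by the first camera and the current frame color image acquired by the second camera.
  8. The posture determination method according to claim 7, characterized in that the posture determination method further comprises:
    when the current frame color image group is added to the sliding window, if the number of color image groups in the sliding window equals the maximum number of color image groups the sliding window can contain, removing from the sliding window the color image group that was added to the sliding window earliest.
  9. A posture determination apparatus, characterized in that it is applied to a terminal device equipped with a first camera and at least one second camera, the posture determination apparatus comprising:
    a first error determination module, configured to obtain matching feature points between every two images in a first color image set, and determine the reprojection error of the matching feature points between every two images in the first color image set, the first color image set consisting of the current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera;
    a second error determination module, configured to obtain matching feature points between every two images in a second color image set, and determine, in combination with the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, the reprojection error of the matching feature points between every two images in the second color image set, the second color image set consisting of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera;
    a target posture determination module, configured to optimize, based on the reprojection error of the matching feature points between every two images in the first color image set and the reprojection error of the matching feature points between every two images in the second color image set, the posture to be optimized of the first camera when acquiring the current frame color image, so as to determine the target posture of the first camera when acquiring the current frame color image;
    wherein n is a positive integer.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the posture determination method according to any one of claims 1 to 8.
  11. An electronic device, characterized by comprising:
    a processor;
    a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the posture determination method according to any one of claims 1 to 8.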
PCT/CN2023/118752 2022-10-28 2023-09-14 Posture determination method and apparatus, computer-readable storage medium, and electronic device WO2024087927A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211336046.1 2022-10-28
CN202211336046.1A CN117994332A (zh) 2022-10-28 2022-10-28 Posture determination method and apparatus, computer-readable storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2024087927A1 true WO2024087927A1 (zh) 2024-05-02

Family

ID=90829971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118752 WO2024087927A1 (zh) 2022-10-28 2023-09-14 Posture determination method and apparatus, computer-readable storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN117994332A (zh)
WO (1) WO2024087927A1 (zh)

Also Published As

Publication number Publication date
CN117994332A (zh) 2024-05-07
