WO2023005457A1 - Pose calculation method and apparatus, electronic device, and readable storage medium - Google Patents

Pose calculation method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2023005457A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
rgb image
target
image
transformation matrix
Prior art date
Application number
PCT/CN2022/098295
Other languages
French (fr)
Chinese (zh)
Inventor
尹赫
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Publication of WO2023005457A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10024: Color image
    • G06T 2207/10028: Range image; Depth image; 3D point clouds

Definitions

  • The present application relates to the technical field of computer vision, and in particular to a pose calculation method and apparatus, an electronic device, and a readable storage medium.
  • Determining the position and attitude of an electronic device in an unknown environment is one of the key technologies in industries such as augmented reality, virtual reality, mobile robots, and unmanned driving. With the rapid development of these industries, increasingly high requirements are placed on the accuracy with which electronic devices locate objects in the surrounding environment.
  • VIO: Visual-Inertial Odometry
  • Embodiments of the present application provide a pose calculation method and apparatus, an electronic device, and a readable storage medium, which can reduce the waiting time before the pose is output in real time.
  • a pose calculation method comprising:
  • if a depth image of the current environment is collected for the first time within a preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
  • according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined.
  • a pose calculation apparatus, comprising:
  • an initial pose determination module, configured to determine the pose of the electronic device when the depth image is collected as the initial pose if a depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
  • a pose determination module, configured to determine, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
  • An electronic device includes a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the operation of the pose calculation method as described above.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the operation of the pose calculation method as described above is realized.
  • Fig. 1 is an application environment diagram of a pose calculation method in an embodiment
  • Fig. 2 is the flow chart of pose computing method in an embodiment
  • Fig. 3 is the flow chart of pose calculation method in another embodiment
  • Fig. 4 is a schematic diagram of constructing a target three-dimensional map of the current environment if the target RGB image is not the first frame image in the preset initialization sliding window in one embodiment
  • Fig. 5 is a flowchart of a method for calculating the first pose transformation matrix between the next frame RGB image and the target RGB image using a preset perspective projection PnP algorithm in one embodiment;
  • FIG. 6 is a schematic diagram of calculating a pose transformation matrix using a preset perspective projection PnP algorithm in an embodiment
  • Fig. 7 is a flowchart of the method in Fig. 5 for calculating the first translation transformation matrix between the next frame RGB image and the target RGB image;
  • FIG. 8 is a flowchart of a method for constructing a target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image in one embodiment
  • Fig. 9 is a flowchart of the method in Fig. 3 for calculating, according to the target three-dimensional map, the pose of the electronic device when collecting the RGB images after the next frame of RGB image within the preset initialization sliding window;
  • Fig. 10 is a schematic diagram of the pose and three-dimensional map calculation for the RGB images collected after the next frame of RGB image in one embodiment;
  • Fig. 11 is a flowchart of a pose calculation method in another embodiment
  • Fig. 12 is a schematic diagram of a pose calculation method in yet another embodiment
  • Fig. 13 is a flowchart of a pose calculation method in a specific embodiment
  • Fig. 14 is a structural block diagram of a pose calculation device in an embodiment
  • Fig. 15 is a structural block diagram of a pose calculation device in an embodiment
  • Fig. 16 is a structural block diagram of a pose calculation device in another embodiment
  • Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment.
  • VINS: visual-inertial system
  • VINS-MONO: a monocular visual-inertial system
  • The specific operation of the VINS-MONO algorithm is as follows, assuming that there are 10 frames of images in the preset initialization sliding window (the size of the preset initialization sliding window is not specifically limited in this application).
  • In the first step, RGB images are collected through the camera on the electronic device. After 10 frames of RGB images have accumulated in the preset initialization sliding window, two frames whose parallax satisfies a condition (for example, the L frame and the R frame) are selected from the 10 frames of RGB images, and the pose between these two frames is calculated using epipolar geometric constraints.
  • In the second step, this pose is used to recover, by triangulation, the map points co-viewed by the two frames.
  • In the third step, these map points are projected onto any frame in the above 10 frames of RGB images except the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error.
  • In the fourth step, triangulation between that frame and the L frame and the R frame is used to recover the map points co-viewed by that frame and the L and R frames.
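  • As an illustration of the two-view initialization described above, the following is a minimal Python sketch using OpenCV; the function name, its inputs (matched pixel coordinates pts_l and pts_r, and the intrinsic matrix K), and the parameter values are assumptions for this example, not the patent's implementation.

```python
import cv2
import numpy as np

def two_view_init(pts_l, pts_r, K):
    # Essential matrix from the epipolar constraint; RANSAC rejects outliers.
    E, inliers = cv2.findEssentialMat(pts_l, pts_r, K,
                                      method=cv2.RANSAC, prob=0.999,
                                      threshold=1.0)
    # Decompose E into the rotation and unit-scale translation L -> R.
    _, R, t, _ = cv2.recoverPose(E, pts_l, pts_r, K, mask=inliers)

    # Projection matrices; the L frame is the reference (identity pose).
    P_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_r = K @ np.hstack([R, t])

    # Triangulate co-viewed map points (homogeneous -> Euclidean).
    pts4d = cv2.triangulatePoints(P_l, P_r, pts_l.T, pts_r.T)
    map_points = (pts4d[:3] / pts4d[3]).T
    return R, t, map_points
```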
  • Alternatively, the VINS-RGBD algorithm may be used when determining the pose of the electronic device at the time each frame of image is captured.
  • The specific operations of the VINS-RGBD algorithm include: assuming that there are 10 frames of images in the preset initialization sliding window, then, in the first step, the camera on the electronic device collects RGB images and depth images at the same time, and it must be ensured that each frame of RGB image has a corresponding depth image.
  • the correspondence means that the RGB image is aligned with the depth image in time and space.
  • two frames of images (such as L frame and R frame) are screened out from these 10 frames of RGB images.
  • The traditional PnP algorithm is then used to calculate the pose of the L frame; combined with the depth image corresponding to the L frame, valid map points are filtered by requiring the reprojection error to meet a preset threshold, and the map points are then recovered.
  • In the third step, these map points are projected onto any frame in the above 10 frames of RGB images except the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error.
  • In the fourth step, the depth images of that frame and of the L frame are used to recover the map points that are co-viewed and whose reprojection error meets the threshold requirement.
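  • The reprojection-error check used to keep only valid depth-derived map points can be sketched as follows; this is an illustrative sketch under assumed conventions (points_3d in the source camera frame, a relative pose (R, t) to the other frame, intrinsics K), not the VINS-RGBD source code.

```python
import numpy as np

def filter_map_points(points_3d, points_2d, R, t, K, max_err_px=2.0):
    # Transform candidate 3D points into the other frame and project them.
    cam = (R @ points_3d.T + t.reshape(3, 1)).T   # N x 3, camera coordinates
    proj = (K @ cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]             # perspective division
    # Reprojection error against the observed 2D features; keep small errors.
    err = np.linalg.norm(proj - points_2d, axis=1)
    keep = err < max_err_px
    return points_3d[keep], keep
```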
  • In view of this, a pose calculation method is proposed in the embodiments of the present application. The electronic device does not need to wait until the preset initialization sliding window is full of RGB images before it can output the pose of the first frame of RGB image collected outside the preset initialization sliding window. Instead, when a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; then, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined. Obviously, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
  • Fig. 1 is an application scene diagram of a pose calculation method in an embodiment.
  • the application environment includes an electronic device 120, and the electronic device 120 includes a first camera and a second camera.
  • the first camera is an RGB camera
  • The second camera is a camera for collecting depth images, for example, a TOF (time-of-flight) camera or a structured light camera, which is not limited in this application.
  • The electronic device 120 collects an RGB image and a depth image of the current environment through the first camera and the second camera, respectively. If the electronic device 120 collects a depth image of the current environment for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window. According to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined.
  • The electronic device 120 can be a mobile phone, a tablet computer, a PDA (personal digital assistant), a wearable device (smart bracelet, smart watch, smart glasses, smart gloves, smart socks, smart belt, etc.), a VR (virtual reality) device, a smart home device, a driverless car, or any other terminal device.
  • Fig. 2 is a flowchart of a pose calculation method in an embodiment.
  • the pose calculation method in this embodiment is described by taking the operation on the electronic device 120 in FIG. 1 as an example, and the electronic device 120 includes a first camera and a second camera.
  • the first camera is an RGB camera
  • the second camera is a camera for collecting depth images.
  • the pose calculation method includes operations 220 to 240, wherein:
  • Operation 220: if the depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image within the preset initialization sliding window.
  • The size of the preset initialization sliding window is equal to the duration of collecting a certain number of image frames. For example, if the preset initialization sliding window is set to include 10 frames of RGB images, then after 10 frames of RGB images are collected, they fill the preset initialization sliding window; at this time, the size of the preset initialization sliding window equals the duration of collecting 10 frames of RGB images.
  • In the embodiments of the present application, the electronic device does not need to fill the preset initialization sliding window with RGB images before it can output the pose when collecting the first frame of RGB image outside the preset initialization sliding window.
  • Specifically, RGB images are collected through the first camera of the electronic device while depth images are collected through the second camera. If a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the image coordinate system of the target RGB image is used as the world coordinate system, which realizes the visual initialization of the electronic device. Moreover, there is no need to select two frames whose parallax satisfies a condition from the 10 frames of RGB images, so the adaptability is wider.
  • Since the acquisition frequency at which the second camera collects depth images may be lower than the acquisition frequency at which the first camera collects RGB images, a depth image corresponding to every frame of RGB image cannot always be acquired; that is, relative to the RGB images, some depth images will be missing.
  • Operation 240: according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.
  • Specifically, the perspective projection PnP algorithm can be used: first, determine the matching 2D-2D feature point pairs between the target RGB image and the next frame RGB image; secondly, combine these 2D-2D feature point pairs with the 3D points from the depth image to obtain matching 3D-2D feature point pairs; finally, based on the 3D-2D feature point pairs, calculate the pose transformation matrix between the next frame RGB image and the target RGB image.
  • pose refers to position and attitude
  • a pose is a six-dimensional vector, including three position components (X, Y, Z) and three attitude angles (heading, pitch, roll).
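  • For illustration, a six-dimensional pose can be packed into a 4x4 homogeneous transformation matrix as in the following sketch; the 'ZYX' Euler convention for heading/pitch/roll and the use of SciPy are assumptions of this example.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(x, y, z, heading, pitch, roll):
    # Rotation from the three attitude angles (assumed ZYX convention).
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler('ZYX', [heading, pitch, roll],
                                    degrees=True).as_matrix()
    T[:3, 3] = [x, y, z]          # the three position components
    return T
```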
  • The perspective projection PnP algorithm here can be the traditional perspective projection PnP algorithm; that is, the rotation transformation matrix and the translation transformation matrix between two frames are calculated simultaneously based on the 3D-2D feature point pairs, and together they constitute the pose transformation matrix.
  • the perspective projection PnP algorithm here may be a new perspective projection PnP algorithm, that is, the rotation transformation matrix and the translation transformation matrix between two frames are calculated step by step based on the 3D-2D feature point pairs.
  • That is, the electronic device can output the pose when collecting the first frame of RGB image outside the preset initialization sliding window without first filling the preset initialization sliding window with RGB images. Instead, when the depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; then, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined.
  • In this way, the time of the first pose output is greatly advanced: from after the preset initialization sliding window to within the sliding window, at the frame following the first received depth image. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
  • Moreover, the depth map can be collected at a frequency lower than that of the RGB images, or at a variable frequency; as long as a depth map is collected within the sliding window, initialization can start, and the pose is output in real time after initialization. Since the depth map can be collected at a lower or variable frequency, initialization and real-time pose output can be based on a low-frequency depth map. This avoids collecting and processing a large amount of data and further reduces the power consumption of the electronic device.
  • In one embodiment, a pose calculation method is provided, which further includes:
  • Operation 260: construct a target three-dimensional map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image.
  • After the pose of the electronic device when capturing the next frame of RGB image has been calculated, the target 3D map of the current environment can be constructed based on the depth image of the target RGB image and the pose of the next frame of RGB image.
  • When constructing the target three-dimensional map of the current environment, two situations can be distinguished.
  • In the first situation, the target RGB image is the first frame image in the preset initialization sliding window; that is, the corresponding depth image was collected for the first frame of RGB image in the preset initialization sliding window. Because there is no RGB image before the target RGB image, constructing the target 3D map of the current environment only requires calculating the first pose transformation matrix between the next frame RGB image and the target RGB image; according to the first pose transformation matrix and the depth image, the target 3D map of the current environment can then be constructed.
  • the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly.
  • In the second situation, the target RGB image is not the first frame of image in the preset initialization sliding window; that is, the corresponding depth image was not collected for the first frame of RGB image in the sliding window. In this case, because RGB images were also collected before the target RGB image, map points can be recovered not only from the target RGB image but also from the RGB images collected before it, so that more map points are recovered to enrich the 3D map of the current environment.
  • the first pose transformation matrix between the next frame of RGB image and the target RGB image is calculated similarly, and the initial three-dimensional map of the current environment can be constructed according to the first pose transformation matrix and the depth image.
  • the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly.
  • Then, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • Operation 280: according to the target three-dimensional map, calculate the pose of the electronic device when collecting the other RGB images located after the next frame of RGB image within the preset initialization sliding window.
  • The pose of the electronic device when collecting the other RGB images after the next frame of RGB image within the sliding window can be calculated according to the target three-dimensional map. Since the target 3D map is constructed based on the RGB images from the first frame in the sliding window up to the next frame of RGB image after the target RGB image, combined with the depth image of the target RGB image, the map points on the target three-dimensional map are clearly more comprehensive than those obtained from the depth image alone.
  • the poses of other RGB images can be directly calculated according to the target three-dimensional map.
  • In the embodiment of the present application, the target three-dimensional map of the current environment is constructed according to the pose of the electronic device when collecting the next frame of RGB image and the depth image. Since the target 3D map is constructed based on the RGB images from the first frame in the sliding window up to the next frame of RGB image, combined with the depth image of the target RGB image, the map points on the target 3D map are more comprehensive than those of the depth image alone. Therefore, calculating, according to the target three-dimensional map, the poses of the other RGB images located after the next frame of RGB image within the preset initialization sliding window greatly improves the accuracy of those calculated poses.
  • operation 260 is to construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image, including:
  • If the target RGB image is the first frame image in the preset initialization sliding window, there is no RGB image before the target RGB image, so it is only necessary to calculate the first pose transformation matrix between the next frame RGB image and the target RGB image. Then, according to the first pose transformation matrix and the depth image, the target 3D map of the current environment is constructed.
  • If the target RGB image is not the first frame image within the preset initialization sliding window, there are RGB images before the target RGB image. Therefore, first calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, and construct the initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image. Secondly, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • In the embodiment of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, only the first pose transformation matrix between the next frame RGB image and the target RGB image needs to be calculated, and the target 3D map of the current environment is directly constructed. If the target RGB image is not the first frame image in the preset initialization sliding window, first calculate the first pose transformation matrix between the next frame RGB image and the target RGB image; then, according to the first pose transformation matrix and the depth image, construct an initial 3D map of the current environment.
  • On this basis, the electronic device calculates the poses of the other RGB images located after the next frame of RGB image within the preset initialization sliding window, which greatly improves the accuracy of the poses calculated for those RGB images.
  • a target three-dimensional map of the current environment is constructed, including:
  • FIG. 4 is a schematic diagram of constructing the target three-dimensional map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window, in one embodiment. For example, suppose there are 10 frames of images in the preset initialization sliding window and the depth image is collected at the 4th frame; that is, the target RGB image is the 4th frame of RGB image. Constructing the target 3D map of the current environment then specifically includes two operations:
  • First, an initial 3D map of the current environment is constructed. Specifically, calculate the first pose transformation matrix between the next frame RGB image (frame 5) and the target RGB image (frame 4), and construct the initial three-dimensional map of the current environment based on the first pose transformation matrix and the depth image.
  • The method of calculating the initial three-dimensional map of the current environment here is the same as the method, described under operation 260 for the case where the target RGB image is the first frame image in the preset initialization sliding window, of constructing the target 3D map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image; it will not be repeated here.
  • Second, the initial 3D map is updated to generate the target 3D map. Specifically, calculate the second pose transformation matrix between the RGB images before the target RGB image and the target RGB image, and supplement the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • In the embodiment of the present application, if the target RGB image is not the first frame image in the preset initialization sliding window, constructing the target three-dimensional map of the current environment includes: calculating the first pose transformation matrix between the next frame RGB image and the target RGB image, and constructing an initial 3D map of the current environment according to the first pose transformation matrix and the depth image; then calculating the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and updating the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.
  • the integrity and accuracy of the target three-dimensional map obtained at this time are greatly improved.
  • a pose calculation method further comprising:
  • a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix.
  • The preset perspective projection PnP algorithm is used to calculate, step by step, the rotation transformation matrix and translation transformation matrix between a relevant RGB image and the target RGB image, where the relevant RGB image is either an RGB image before the target RGB image or the next frame of RGB image.
  • calculating the first pose transformation matrix between the next frame RGB image and the target RGB image includes:
  • The preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix (also referred to as the pose) between the next frame of RGB image and the target RGB image; here, the preset perspective projection PnP algorithm calculates, step by step, the rotation transformation matrix and the translation transformation matrix between the next frame of RGB image and the target RGB image.
  • the preset perspective projection PnP algorithm is relative to the traditional perspective projection PnP algorithm.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames.
  • the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between two frames step by step.
  • When using the preset perspective projection PnP algorithm to calculate the first pose transformation matrix between the next frame RGB image and the target RGB image: first, calculate the first rotation transformation matrix between the next frame RGB image and the target RGB image; secondly, calculate the first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix. The first rotation transformation matrix and the first translation transformation matrix constitute the first pose transformation matrix.
  • calculating the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • Similarly, the preset perspective projection PnP algorithm is used to calculate the second pose transformation matrix (also referred to as the pose) between the RGB image before the target RGB image and the target RGB image; here, the preset perspective projection PnP algorithm calculates, step by step, the second rotation transformation matrix and the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • The second rotation transformation matrix and the second translation transformation matrix constitute the second pose transformation matrix.
  • In the embodiment of the present application, the processes of calculating the rotation transformation matrix and the translation transformation matrix are decoupled, which avoids superimposing the error generated when calculating the rotation transformation matrix on the error generated when calculating the translation transformation matrix. Therefore, the step-by-step calculation realized by the preset perspective projection PnP algorithm improves the accuracy of the first or second pose transformation matrix finally obtained.
  • In one embodiment, a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix; that is, a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image and the target RGB image is generated. As shown in Fig. 5, calculating the first pose transformation matrix between the next frame RGB image and the target RGB image includes:
  • Operation 520: calculate a first rotation transformation matrix between the next frame RGB image and the target RGB image.
  • Matched 2D-2D feature point pairs are generally determined between two frames of RGB images.
  • FIG. 6 is a schematic diagram of calculating the pose transformation matrix using the preset perspective projection PnP algorithm in an embodiment.
  • image i is the target RGB image
  • image j is the next frame RGB image of the target RGB image
  • the oval frame represents the depth image corresponding to the target RGB image.
  • The pair of images i and j on the left side of FIG. 6 illustrates determining the mutually matching 2D-2D feature point pairs in image i and image j and calculating the first rotation transformation matrix R_ij.
  • Matched 2D-2D feature point pairs can be determined between the next frame RGB image and the target RGB image.
  • Specifically, the optical flow method or other image matching methods can be used to determine the matching 2D-2D feature point pairs between the two frames. The first rotation transformation matrix R_ij between the next frame of RGB image and the target RGB image is then calculated from these matching 2D-2D feature point pairs, combined with epipolar geometric constraints.
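  • The rotation-only step can be sketched as follows with OpenCV; tracking with optical flow, the use of the essential matrix, and discarding its unit-norm translation are assumptions of this illustration, not the patent's exact procedure.

```python
import cv2

def first_rotation(pts_i, pts_j, K):
    # pts_i, pts_j: matched 2D-2D pairs from, e.g., Lucas-Kanade optical flow.
    E, mask = cv2.findEssentialMat(pts_j, pts_i, K, method=cv2.RANSAC)
    # Keep only the rotation; the translation is recomputed later from
    # 3D-2D pairs once depth is available.
    _, R_ij, _t_discarded, _ = cv2.recoverPose(E, pts_j, pts_i, K, mask=mask)
    return R_ij, mask
```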
  • Operation 540: calculate a first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix.
  • Specifically, the matched 2D-2D feature point pairs on the next frame RGB image and the target RGB image are culled according to the first rotation transformation matrix, to obtain the culled 2D-2D feature point pairs.
  • the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image.
  • Operation 560: based on the first rotation transformation matrix and the first translation transformation matrix, generate the first pose transformation matrix between the next frame of RGB image and the target RGB image.
  • In the embodiment of the present application, the first rotation transformation matrix between the next frame RGB image and the target RGB image is calculated first; then, based on the first rotation transformation matrix and the first translation transformation matrix, the first pose transformation matrix between the next frame of RGB image and the target RGB image is generated.
  • That is, the preset perspective projection PnP algorithm realizes the step-by-step calculation of the rotation transformation matrix and the translation transformation matrix, avoiding the superposition of the errors generated in the two calculation processes, and thus improving the accuracy of the final first pose transformation matrix.
  • operation 540 is to calculate the first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix, including:
  • Operation 542: cull the matching 2D-2D feature point pairs on the next frame RGB image and the target RGB image according to the first rotation transformation matrix, and obtain the culled 2D-2D feature point pairs.
  • image i is the target RGB image
  • image j is the next frame of RGB image of the target RGB image
  • the oval frame represents the depth image corresponding to the target RGB image.
  • As shown in FIG. 6, the matching 2D-2D feature point pairs on image i and image j are culled according to the first rotation transformation matrix, to obtain the culled 2D-2D feature point pairs.
  • Operation 544: convert the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
  • Specifically, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image, and the RANSAC algorithm is used to remove abnormal point pairs from the 3D-2D feature point pairs, generating the final 3D-2D feature point pairs.
  • Operation 546: calculate the first translation transformation matrix between the next frame RGB image and the target RGB image according to the 3D-2D feature point pairs.
  • Specifically, the first translation transformation matrix between the next frame RGB image and the target RGB image can be calculated based on the translation transformation matrix calculation formula.
  • the formula for calculating the translation transformation matrix is as follows:

        t_ij* = argmin_{t_ij} Σ_k || p_k - π( R_ij · π^{-1}(q_k) + t_ij ) ||²    (1-1)

  • where, for each matched 3D-2D feature point pair, π^{-1}(q_k) is the 3D point obtained by back-projecting the 2D feature point q_k using the depth image, and p_k is the matched 2D feature point in the other image; R_ij is the rotation transformation matrix from image j to image i; t_ij is the translation transformation matrix from image j to image i; and π(·)^{-1} is the transformation that back-projects 2D points into 3D points.
  • The above formula (1-1) is used to construct the least squares problem, and the optimal variable t_ij is calculated as the first translation transformation matrix between the next frame RGB image and the target RGB image.
  • In the embodiment of the present application, when calculating the first translation transformation matrix between the next frame of RGB image and the target RGB image: first, the mutually matching 2D-2D feature point pairs of the next frame RGB image and the target RGB image are culled according to the first rotation transformation matrix, obtaining the culled 2D-2D feature point pairs; secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image; finally, the first translation transformation matrix between the next frame RGB image and the target RGB image is calculated according to the 3D-2D feature point pairs. The feature point pairs are culled multiple times, and the least squares method is used to calculate the first translation transformation matrix, which improves the accuracy of the calculated first translation transformation matrix.
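  • Under the assumption that, with the rotation R_ij fixed, each 3D-2D pair gives linear constraints on t_ij in normalized image coordinates, the least-squares solve of formula (1-1) can be sketched as follows; the input conventions are assumptions of this illustration.

```python
import numpy as np

def solve_translation(P, p_norm, R):
    # P: N x 3 back-projected 3D points; p_norm: N x 2 matched points in
    # normalized image coordinates; R: the fixed rotation matrix.
    Q = (R @ P.T).T                        # rotated 3D points
    A_rows, b_rows = [], []
    for (X, Y, Z), (u, v) in zip(Q, p_norm):
        # u = (X + tx)/(Z + tz) and v = (Y + ty)/(Z + tz) are linear in t:
        A_rows.append([1.0, 0.0, -u]); b_rows.append(u * Z - X)
        A_rows.append([0.0, 1.0, -v]); b_rows.append(v * Z - Y)
    # Closed-form least-squares solution for the optimal translation t.
    t, *_ = np.linalg.lstsq(np.array(A_rows), np.array(b_rows), rcond=None)
    return t
```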
  • In one embodiment, if the target RGB image is the first frame image in the preset initialization sliding window, then, as shown in Figure 8, constructing the target 3D map of the current environment according to the first pose transformation matrix between the next frame RGB image and the target RGB image and the depth image includes:
  • Operation 820: according to the first pose transformation matrix, project the 3D feature points on the depth image onto the next frame of RGB image to generate projected 2D feature points.
  • the first pose transformation matrix is the pose transformation matrix between the next frame RGB image and the target RGB image.
  • Specifically, the 3D-2D matching point pairs can be determined based on the target RGB image and its corresponding depth image; then, based on the first pose transformation matrix, the 3D feature points on the depth image can be projected onto the next frame of RGB image to generate projected 2D feature points.
  • Operation 840: calculate the reprojection error between the projected 2D feature points and the 2D feature points on the next frame of RGB image.
  • Specifically, the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the reprojection error is smaller than a preset error threshold, the corresponding 3D feature point is considered a credible map point, and the 3D feature point on the depth image is used as a target map point.
  • the target 3D map of the current environment can be constructed based on these target map points.
  • In the embodiment of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, the 3D feature points on the depth image are projected onto the next frame RGB image to generate projected 2D feature points; if the reprojection error is less than the preset error threshold, the 3D feature points on the depth image are used as target map points, and the target 3D map of the current environment is constructed according to the target map points. This realizes calculating the target three-dimensional map of the current environment when the target RGB image is the first frame image in the preset initialization sliding window. Then, when calculating the pose of the electronic device when collecting the other RGB images after the next frame of RGB image within the preset initialization sliding window, the target three-dimensional map can be used directly.
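  • Building the map points themselves amounts to back-projecting valid depth pixels into world coordinates; the following is a minimal sketch under assumed conventions (a pinhole intrinsic matrix K and a 4x4 camera-to-world transform T_world_cam), not the patent's implementation.

```python
import numpy as np

def depth_to_map_points(depth, K, T_world_cam, stride=8):
    # Back-project every stride-th valid depth pixel into world coordinates.
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = []
    h, w = depth.shape
    for v in range(0, h, stride):
        for u in range(0, w, stride):
            z = depth[v, u]
            if z <= 0:                        # skip missing depth
                continue
            p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
            pts.append(T_world_cam @ p_cam)   # camera -> world
    return np.array(pts)[:, :3]
```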
  • In one embodiment, the preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix; that is, a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image and the target RGB image is generated. When the target RGB image is not the first frame image in the preset initialization sliding window, the preset perspective projection PnP algorithm needs to calculate both the first pose transformation matrix and the second pose transformation matrix.
  • the process of calculating the first pose transformation matrix is not repeated here, and the calculation of the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • a preset perspective projection PnP algorithm is used to calculate a second pose transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • the preset perspective projection PnP algorithm is relative to the traditional perspective projection PnP algorithm.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames.
  • the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between two frames step by step.
  • When using the preset perspective projection PnP algorithm to calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image: first, calculate the second rotation transformation matrix between the RGB image before the target RGB image and the target RGB image; secondly, calculate the second translation transformation matrix between them according to the depth image and the second rotation transformation matrix. The second rotation transformation matrix and the second translation transformation matrix constitute the second pose transformation matrix.
  • In the embodiment of the present application, the processes of calculating the second rotation transformation matrix and the second translation transformation matrix are decoupled, which avoids superimposing the error generated when calculating the second rotation transformation matrix on the error generated when calculating the second translation transformation matrix. Calculating the second pose transformation matrix step by step with the preset perspective projection PnP algorithm therefore improves its accuracy.
  • calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image includes:
  • Operation 1: cull the mutually matching 2D-2D feature point pairs on the RGB image before the target RGB image and the target RGB image according to the second rotation transformation matrix, and obtain the culled 2D-2D feature point pairs.
  • image i is the target RGB image
  • image j is the RGB image before the target RGB image
  • the oval frame represents the depth image corresponding to the target RGB image.
  • As shown in FIG. 6, the matching 2D-2D feature point pairs on image i and image j are culled according to the second rotation transformation matrix, to obtain the culled 2D-2D feature point pairs.
  • Operation 2: convert the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
  • Specifically, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image, and the RANSAC algorithm is used to remove abnormal point pairs from the 3D-2D feature point pairs, generating the final 3D-2D feature point pairs.
  • Operation 3: calculate the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image according to the 3D-2D feature point pairs.
  • Specifically, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image can likewise be calculated based on the translation transformation matrix calculation formula:

        t_ij* = argmin_{t_ij} Σ_k || p_k - π( R_ij · π^{-1}(q_k) + t_ij ) ||²    (1-1)

  • where R_ij is the rotation transformation matrix from image j to image i; t_ij is the translation transformation matrix from image j to image i; and π(·)^{-1} is the transformation that back-projects 2D points into 3D points.
  • The above formula (1-1) is used to construct the least squares problem, and the optimal variable t_ij is calculated as the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • In the embodiment of the present application, when calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image: first, the mutually matching 2D-2D feature point pairs of the RGB image before the target RGB image and the target RGB image are culled according to the second rotation transformation matrix, obtaining the culled 2D-2D feature point pairs; secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image; finally, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated according to the 3D-2D feature point pairs. The feature point pairs are culled multiple times, and the least squares method is used to calculate the second translation transformation matrix, which improves the accuracy of the calculated second translation transformation matrix.
  • the initial three-dimensional map of the current environment is updated according to the second pose transformation matrix and the depth image to generate a target three-dimensional map, including:
  • according to the second pose transformation matrix, the 3D feature points on the depth image are respectively projected onto the RGB images before the target RGB image to generate projected 2D feature points;
  • if the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image before the target RGB image is less than the preset error threshold, the 3D feature points on the depth image are used as new target map points.
  • the second pose transformation matrix is a pose transformation matrix between the RGB image before the target RGB image and the target RGB image.
  • Specifically, the 3D-2D matching point pairs can be determined based on the target RGB image and its corresponding depth image; then, based on the second pose transformation matrix, the 3D feature points on the depth image can be projected onto the RGB image before the target RGB image, generating projected 2D feature points.
  • the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the reprojection error is smaller than the preset error threshold, the 3D feature point corresponding to the reprojection error smaller than the preset error threshold is considered to be a credible map point, and the 3D feature point is used as the target map point.
  • the 3D feature points that satisfy the conditions are used as the target map points, and the target 3D map of the current environment can be constructed based on these target map points.
  • In the embodiment of the present application, the 3D feature points on the depth image are projected onto the RGB image before the target RGB image, generating projected 2D feature points; the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image before the target RGB image is then calculated. If the reprojection error is less than the preset error threshold, the 3D feature points on the depth image are used as target map points, and the target 3D map of the current environment is constructed according to the target map points. This realizes calculating the target 3D map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window. Then, when calculating the pose of the electronic device when collecting the other RGB images after the next frame of RGB image within the preset initialization sliding window, the target three-dimensional map can be used directly.
  • operation 280 calculates the pose of the electronic device when the RGB image after the next frame of RGB image is collected within the preset initialization sliding window, including:
  • Operation 282: use the frame after the next frame of RGB image as the current frame, and perform the following target operations:
  • Operation 284: according to the target three-dimensional map, the current frame, and the RGB images before the current frame, generate the mutually matching 3D-2D feature point pairs between the target three-dimensional map and the current frame;
  • Operation 286: calculate the pose of the current frame based on the 3D-2D feature point pairs;
  • Operation 288: update the target three-dimensional map according to the current frame, and use the updated three-dimensional map as the new target three-dimensional map.
  • the next frame of the current frame is used as the new current frame, and the target operation is executed cyclically until the pose of the last RGB image in the preset initialization sliding window is calculated.
  • FIG. 10 is a schematic diagram of calculating the poses and the three-dimensional map for the RGB images collected after the next frame of RGB image, in one embodiment. For example, suppose there are 10 frames of images in the preset initialization sliding window and the depth image is collected at the 4th frame; that is, the target RGB image is the 4th frame, and the next frame of RGB image after the target RGB image is the 5th frame. Calculating the poses of the RGB images collected after the next frame of RGB image then means calculating the poses when the 6th to 10th frames are collected.
  • First, the frame (frame 6) after the next frame of RGB image (frame 5) is used as the current frame, and the pose of the current frame is calculated.
  • a 3D-2D feature point pair matching between the target 3D map and the current frame (6th frame) is generated.
  • the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs.
  • Then, the 3D feature points matching the 2D feature point pairs are obtained from the map points in the target 3D map, and the mutually matching 3D-2D feature point pairs between the target 3D map and the current frame are generated based on the matched 3D feature points and 2D feature point pairs.
  • the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame.
  • the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames. Therefore, when the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame, the rotation transformation matrix and translation transformation matrix between the RGB image before the current frame and the current frame are respectively calculated based on the 3D-2D feature point pairs.
  • the pose transformation matrix is obtained based on the rotation transformation matrix and the translation transformation matrix, and then the pose of the current frame is obtained based on the pose transformation matrix and the pose of the RGB image before the current frame respectively.
  • the pose transformation matrix may be obtained based on the multiplication of the rotation transformation matrix and the translation transformation matrix, and this calculation method is not limited in this application.
  • the target 3D map is updated according to the current frame, and the updated 3D map is used as a new target 3D map.
  • The target operations (operations 284 to 288) are cyclically executed until the pose of the last frame of RGB image in the preset initialization sliding window has been calculated.
  • That is, a loop is adopted. First, the pose of the current frame is calculated based on the target 3D map, the current frame, and the RGB images located before the current frame. Secondly, the target three-dimensional map is updated based on the current frame, and the updated three-dimensional map is used as the new target three-dimensional map. Then the pose of the new current frame is calculated based on the new target 3D map, the new current frame, and the RGB images before it, and the target three-dimensional map is updated again. This loops until the pose of the last frame of RGB image within the preset initialization sliding window is calculated.
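  • Estimating the current frame's pose against the map is a classic PnP-against-map step; the following sketch uses OpenCV's RANSAC-based solver, with the inlier threshold and input layout as assumptions of this example rather than the patent's specification.

```python
import cv2
import numpy as np

def pose_from_map(map_points_3d, kps_2d, K):
    # map_points_3d: N x 3 map points matched to the N x 2 keypoints kps_2d.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d.astype(np.float64), kps_2d.astype(np.float64),
        K, None, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> rotation matrix
    return R, tvec                  # pose of the current frame
```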
  • a pair of 3D-2D feature points matching each other between the target three-dimensional map and the current frame is generated, including:
  • the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs. Since the RGB images before the current frame provide some map points when calculating the target 3D map, the 3D feature points matching the 2D feature point pair can be obtained from the map points in the target 3D map. Then, based on the 3D feature point and the 2D feature point of the current frame in the 2D feature point pair, a 3D-2D feature point pair matching between the target 3D map and the current frame is generated. Therefore, based on the 3D-2D feature point pair, the matching relationship between the 3D feature points on the target 3D map and the 2D feature points on the current frame is obtained.
  • The current frame is an image frame after the next frame image of the target RGB image within the sliding window. If the current frame has a corresponding depth image, then operation 288, updating the target three-dimensional map according to the current frame to generate the updated three-dimensional map, includes:
  • first, the target 3D map is updated to generate an intermediate 3D map;
  • secondly, the intermediate 3D map is updated by the triangulation method to generate the updated 3D map.
  • Specifically, first calculate the pose transformation matrix between the current frame and the RGB images before the current frame within the preset sliding window; secondly, according to the pose transformation matrix, project the 3D feature points on the depth image of the current frame onto the RGB images before the current frame to generate projected 2D feature points.
  • Then, calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images before the current frame; if the reprojection error is less than the preset error threshold, use the 3D feature points on the depth image of the current frame as target map points.
  • the target map is updated according to the target map points to generate an intermediate three-dimensional map.
  • a triangulation method is used to update the intermediate three-dimensional map to generate an updated three-dimensional map.
  • In triangulation, the camera observes the same spatial point from two positions, and the three-dimensional coordinates of that point are obtained from the two camera poses and the image observation coordinates. Depth information missing from some depth images can be recovered by triangulation.
  • In the embodiment of the present application, the target three-dimensional map is first updated according to the depth image corresponding to the current frame and the pose transformation matrices between the current frame and the RGB images before the current frame within the preset sliding window, generating an intermediate 3D map. Then, the triangulation method is used to update the intermediate three-dimensional map, generating the updated three-dimensional map. If the current frame has a corresponding depth image, the target 3D map is updated based on the 3D feature points on the depth image of the current frame, combined with the target 3D map constructed from the previous image frames; finally, triangulation recovers the depth information missing from some depth images. Therefore, the integrity and accuracy of the three-dimensional map constructed at this time are greatly improved.
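  • Triangulating one such missing-depth point from two observations with known poses can be sketched as follows; K is the intrinsic matrix, and the 3x4 world-to-camera matrices T1 and T2 are assumed input conventions for this illustration.

```python
import cv2
import numpy as np

def triangulate_point(K, T1, T2, uv1, uv2):
    # Projection matrices for the two observing frames.
    P1, P2 = K @ T1, K @ T2
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(uv1, float).reshape(2, 1),
                                np.asarray(uv2, float).reshape(2, 1))
    return (X_h[:3] / X_h[3]).ravel()   # homogeneous -> Euclidean 3D point
```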
  • the target 3D map is updated according to the current frame to generate an updated 3D map, including:
  • the target 3D map is updated by the triangulation method to generate the updated 3D map.
  • That is, when the current frame has no corresponding depth image, the triangulation method is used to update the target 3D map to generate the updated 3D map. Since the triangulation method can recover depth information missing from some depth images, the completeness and accuracy of the 3D map constructed at this point are likewise improved to a certain extent.
  • Operation 1120 acquire the IMU data collected within the preset sliding window.
  • Since the IMU data acquisition frequency of the electronic device is greater than the RGB image acquisition frequency, when 10 frames of RGB images are included in the sliding window, more than 10 sets of IMU data are generally collected.
  • As for the initialization information of the IMU, it is calculated based on all the IMU data collected in the preset sliding window; therefore, it is necessary to acquire all the IMU data collected within the preset sliding window.
  • Operation 1140 calculate the initialization information of the IMU; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU.
  • the rotation transformation matrix in the pose transformation matrix of each RGB image can be used for rotation constraints,
  • and the translation transformation matrix of each RGB image can be used for translation constraints, so as to calculate the initialization information of the IMU.
  • the initialization information of the IMU includes the initial velocity of the electronic device, the zero bias (Bias) of the IMU, and the gravity vector of the IMU.
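  • As one illustration of how the visual poses constrain the IMU, the sketch below estimates the gyroscope zero bias with a small-angle simplification: the residual rotation between the gyro-integrated rotation and the vision-derived rotation over each interval is attributed to a constant bias. A production system (e.g. the VINS family) solves this with preintegration Jacobians; the simplification and all input conventions here are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def estimate_gyro_bias(vis_rel_rots, gyro_batches, dt):
    """vis_rel_rots: per-interval relative rotations from the visual poses
    (3x3 matrices); gyro_batches: per-interval angular-velocity samples,
    each (M, 3) in rad/s; dt: gyro sample spacing in seconds."""
    per_interval = []
    for R_vis, w in zip(vis_rel_rots, gyro_batches):
        R_gyro = R.identity()
        for wk in w:                          # naive gyro integration
            R_gyro = R_gyro * R.from_rotvec(wk * dt)
        resid = (R.from_matrix(R_vis).inv() * R_gyro).as_rotvec()
        per_interval.append(resid / (len(w) * dt))  # rad/s charged to bias
    return np.mean(per_interval, axis=0)
```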
  • Operation 1160 according to the initial pose, the target three-dimensional map and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.
  • the pose of the RGB image collected after the preset sliding window can be calculated according to the initial pose, the target 3D map, and the initialization information of the IMU.
  • the initial pose is the pose of the target RGB image corresponding to the depth image of the current environment collected for the first time.
  • the target 3D map at this time is the 3D map constructed based on all the image frames in the sliding window.
  • When calculating the poses of the RGB images collected after the preset sliding window, the IMU is first initialized, so that these poses can be calculated by combining not only the image data collected by the camera but also the IMU data.
  • The adaptability and robustness of the pose calculation are thereby improved from the two dimensions of vision and IMU.
  • a pose calculation method further comprising:
  • if the depth image of the current environment is not collected within the preset initialization sliding window, the initial pose and the initial 3D map are calculated according to the RGB images collected in the preset initialization sliding window;
  • according to the initial pose and the initial 3D map, the pose of the electronic device when collecting RGB images after the preset initialization sliding window is calculated.
  • That is, if the depth image of the current environment is not collected within the preset initialization sliding window, the traditional VINS-MONO algorithm is used to calculate the initial pose and the initial 3D map according to the RGB images collected in the preset initialization sliding window,
  • and then to calculate the pose of the electronic device when collecting RGB images after the preset initialization sliding window. This guarantees that, even when no depth image of the current environment is collected in the preset initialization sliding window, the pose of the electronic device when collecting RGB images can still be output in real time after 10 frames of RGB images have been accumulated in the sliding window.
  • a pose calculation method further comprising:
  • the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, then calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window;
  • according to the initial pose and the initial 3D map, calculate the pose of the electronic device when collecting RGB images after the preset sliding window.
  • As shown in FIG. 12, which is a schematic diagram of pose calculation when the depth image of the current environment is not collected in the preset initialization sliding window, or when the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window.
  • Otherwise, the pose of the electronic device when the depth image is collected is determined as the initial pose, and according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined. For the RGB images collected after the sliding window, the time needed to calculate the pose with the pose calculation method of this application is basically the same
  • as with the traditional VINS-MONO algorithm.
  • For those images, therefore, either the traditional VINS-MONO algorithm or the pose calculation method of this application can be used for the calculation, so that a variety of pose calculation approaches are provided, which is more flexible.
  • a pose calculation method is provided, illustrated with the case where the RGB image corresponding to the first collected depth image is the second frame image in the preset initialization sliding window, or an image located after the second frame image. The method includes:
  • Operation 130 collecting RGB images, IMU data and depth images
  • Operation 1306, performing de-distortion and alignment processing on the collected RGB image and depth image
  • Operation 1310, judging whether the currently collected RGB image has a corresponding depth image (Depth map) and whether it is the Depth map of the current environment collected for the first time; if so, enter operation 1312; if not, enter operation 1318;
  • Operation 1312, the image coordinate system of the RGB image is set as the world coordinate system, the pose of the RGB image is used as the initial pose, and the initial pose is set to 0; the frame number frame_count of this RGB image is recorded as first_depth;
  • Operation 1318 judge whether first_depth is smaller than the sliding window size windowsize (10 frames); if so, then enter operation 1320; if not, then enter operation 1354;
  • Operation 1320, judge whether the frame number frame_count+1 of the current frame is equal to first_depth+1; if so, enter operation 1322; if not, enter operation 1334;
  • Operation 1322, using the preset perspective projection PnP algorithm to calculate the first pose transformation matrix between the first_depth frame and the first_depth+1 frame;
  • Operation 1324, according to the first pose transformation matrix, project the 3D feature points on the Depth map onto the first_depth+1 frame to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the first_depth+1 frame;
  • Operation 1326 if the reprojection error is smaller than the preset error threshold, use the 3D feature point on the Depth map as the target map point; construct an initial 3D map of the current environment according to the target map point.
  • Operation 1330, respectively project the 3D feature points on the Depth map onto the RGB images before the first_depth frame to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images before the first_depth frame;
  • Operation 1332 if the reprojection error is less than the preset error threshold, then use the 3D feature point on the Depth map as a new target map point; add the new target map point to the initial 3D map to generate the target 3D map;
  • Operation 1334 according to the target 3D map, the current frame, and the RGB image before the current frame, generate 3D-2D feature point pairs that match each other between the target 3D map and the current frame;
  • Operation 1336 based on the 3D-2D feature point pair, the traditional PnP algorithm is used to calculate the pose of the current frame; and enter operation 1354 to output the pose;
  • Operation 1338 judging whether there is a corresponding Depth map in the current frame; if yes, then enter operation 1340, if not, then enter operation 1344;
  • Operation 1340, according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images before the current frame in the preset sliding window, update the target 3D map to generate an intermediate 3D map;
  • Operation 1342 using the triangulation method to update the intermediate three-dimensional map to generate an updated three-dimensional map
  • Operation 1344 using the triangulation method to update the target three-dimensional map to generate an updated three-dimensional map
  • Operation 1346 judge whether frame_count is equal to windowsize (10 frames); if so, then enter operation 1348;
  • Operation 1348, performing BA (bundle adjustment) optimization on the poses corresponding to the calculated 10 frames of images;
  • Operation 1350 based on the poses corresponding to the 10 frames of images after BA optimization, perform IMU initialization;
  • Operation 1352 according to the initial pose, the target three-dimensional map and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.
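  • For the traditional PnP step in operation 1336 (and the pose solve in operation 1322), a typical OpenCV call looks like the sketch below; the intrinsics and the stand-in point data are illustrative assumptions, not values from the original text.

```python
import numpy as np
import cv2

# Stand-in 3D-2D pairs and intrinsics; real values come from the map and
# the feature matching described above.
pts3d = np.random.rand(50, 3).astype(np.float32)
pts2d = (np.random.rand(50, 2) * 480).astype(np.float32)
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
if ok:
    R_cam, _ = cv2.Rodrigues(rvec)            # rotation of the current frame
    pose = np.eye(4)
    pose[:3, :3], pose[:3, 3] = R_cam, tvec.ravel()  # 4x4 camera pose
```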
  • the VINS-MONO algorithm is used to calculate the pose.
  • In summary, the electronic device can output the pose of the first frame of RGB image collected outside the preset initialization sliding window without first filling the window with RGB images. Instead, when a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image.
  • The time of the first pose output is thus greatly advanced, from after the preset initialization sliding window to inside the sliding window, namely to the frame following the first received depth image. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
  • If the depth image of the current environment is not collected within the preset initialization sliding window, the traditional VINS-MONO algorithm is used to calculate, according to the RGB images collected in the preset initialization sliding window, the pose of the electronic device when collecting RGB images after the window. This guarantees that, even in that case, the pose of the electronic device when collecting RGB images can still be output in real time after 10 frames of RGB images have been accumulated in the sliding window.
  • Moreover, the depth map can be collected at a frequency lower than that of the RGB images, or at a variable frequency; as long as a depth map is collected within the sliding window, initialization can start, after which the pose is output in real time. Since the depth map can be collected at a lower or variable frequency, initialization and real-time pose output can be based on the low-frequency depth map. This avoids collecting and processing a large amount of data and, further, reduces the power consumption of the electronic device.
  • a pose calculation device 1400 is provided, and the device includes:
  • the initial pose determination module 1420 is configured to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein, the target corresponding to the depth image The RGB image is not the last frame image in the preset initialization sliding window;
  • the next frame RGB image pose determination module 1440 is configured to determine the pose of the electronic device when the next frame of RGB image is collected according to the initial pose, the target RGB image, the depth image, and the next frame RGB image of the target RGB image.
  • a pose calculation device 1400 is provided, and the device further includes:
  • the target three-dimensional map construction module 1460 is used to construct the target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image;
  • the other RGB image pose determination module 1480 is configured to calculate the pose of the electronic device when collecting other RGB images located after the next frame of RGB images within the preset initialization sliding window according to the target three-dimensional map.
  • the target three-dimensional map construction module 1460 also includes:
  • An initial three-dimensional map construction unit configured to construct an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image;
  • the target 3D map construction unit is used to calculate the second pose transformation matrix between the RGB images before the target RGB image and the target RGB image, and to update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image, so as to construct the target 3D map of the current environment.
  • a pose calculation device 1400 is provided, and the device further includes:
  • the pose transformation matrix calculation unit is also used to calculate the first pose transformation matrix or the second pose transformation matrix by using the preset perspective projection PnP algorithm;
  • the preset perspective projection PnP algorithm is used to calculate, step by step, the rotation transformation matrix and the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image; the relevant RGB image of the target RGB image is an RGB image before the target RGB image or the next frame of RGB image.
  • the pose transformation matrix calculation unit further includes:
  • the rotation transformation matrix calculation subunit is used to calculate the rotation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image;
  • the translation transformation matrix calculation subunit is used to calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix;
  • the pose transformation matrix calculation subunit is used to generate a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
  • the translation transformation matrix calculation subunit is further used to: eliminate outlier pairs from the mutually matched 2D-2D feature point pairs between the relevant RGB image of the target RGB image and the target RGB image according to the rotation transformation matrix, obtaining the eliminated 2D-2D feature point pairs; convert the eliminated 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image; and calculate, according to the 3D-2D feature point pairs, the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image.
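  • Once the rotation and translation have been computed step by step, assembling the pose transformation matrix is a matter of packing them into a 4x4 homogeneous transform, as in this sketch (the chaining convention in the final comment is an assumption):

```python
import numpy as np

def compose_pose_transform(R_mat, t_vec):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4
    pose transformation matrix."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R_mat)
    T[:3, 3] = np.asarray(t_vec).ravel()
    return T

# e.g. chaining the first pose transformation matrix with the initial pose
# to obtain the pose of the next frame of RGB image:
# pose_next = compose_pose_transform(R1, t1) @ pose_initial
```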
  • the initial 3D map construction unit is also used to project the 3D feature points on the depth image onto the next frame of RGB image according to the first pose transformation matrix to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the next frame of RGB image; if the reprojection error is less than the preset error threshold, use the 3D feature points on the depth image as target map points; and construct the initial 3D map of the current environment according to the target map points.
  • the target 3D map construction unit is further configured to respectively project the 3D feature points on the depth image onto the RGB images before the target RGB image according to the second pose transformation matrix to generate projected 2D feature points; calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images before the target RGB image; if the reprojection error is less than the preset error threshold, use the 3D feature points on the depth image as new target map points; and add the new target map points to the initial 3D map to construct the target 3D map of the current environment.
  • other RGB image pose determination module 1480 includes:
  • the current frame definition unit is used to use the next frame of the next frame RGB image as the current frame, and perform the following target operations:
  • the target operation unit is used to generate a 3D-2D feature point pair matching between the target 3D map and the current frame according to the target 3D map, the current frame and the RGB image before the current frame; calculate the current frame based on the 3D-2D feature point pair The pose of the frame; update the target 3D map according to the current frame, and use the updated 3D map as the new target 3D map;
  • the loop unit is used to use the next frame of the current frame as a new current frame, and execute the target operation in a loop until the pose of the last frame of the RGB image in the preset initialization sliding window is calculated.
  • the target operation unit is further configured to obtain matched 2D feature point pairs from the current frame and the RGB images before the current frame; obtain, from the map points in the target 3D map, the 3D feature points matching the 2D feature point pairs; and generate, according to the matched 3D feature points and the 2D feature point pairs, the 3D-2D feature point pairs matching each other between the target 3D map and the current frame.
  • the target operation unit is also used to update the target 3D map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images before the current frame in the preset sliding window, generating an intermediate 3D map;
  • and to update the intermediate 3D map by the triangulation method, generating the updated 3D map.
  • the target operation unit is further configured to update the target 3D map by using a triangulation method to generate an updated 3D map.
  • a pose calculation device 1600 is provided, and the device further includes:
  • IMU data acquisition module 1620 for acquiring the IMU data collected in the preset sliding window
  • the IMU initialization module 1640 is used to calculate the initialization information of the IMU according to the pose and IMU data of each frame of the RGB image in the preset sliding window; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU;
  • the first pose calculation module 1660 is configured to calculate the pose of the RGB image collected after the preset sliding window according to the initial pose, the target 3D map and the initialization information of the IMU.
  • a pose calculation device is provided, the device also includes:
  • the second pose calculation module is used to calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window if the depth image of the current environment is not collected in the preset initialization sliding window;
  • and to calculate, according to the initial pose and the initial 3D map, the pose of the electronic device when collecting RGB images after the preset initialization sliding window.
  • a pose calculation device is provided, the device also includes:
  • the third pose calculation module is used to calculate the initial pose and the initial 3D map according to the RGB images collected in the preset initialization sliding window if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window;
  • and to calculate, according to the initial pose and the initial 3D map, the pose of the electronic device when collecting RGB images after the preset sliding window.
  • each module in the pose calculation device is only for illustration. In other embodiments, the pose calculation device can be divided into different modules according to needs, so as to complete all or part of the functions of the pose calculation device.
  • Each module in the pose calculation device can be fully or partially realized by software, hardware, or a combination thereof.
  • Each module can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can call and execute the corresponding operations of the above modules.
  • an electronic device is provided, including a memory and a processor, where a computer program is stored in the memory;
  • when the computer program is executed by the processor, the processor performs the operations of the pose calculation method provided by each of the above embodiments.
  • Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment.
  • the electronic device includes a processor and a memory connected through a system bus.
  • the processor is used to provide calculation and control capabilities to support the operation of the entire electronic device.
  • the memory may include non-volatile storage media and internal memory.
  • Nonvolatile storage media store operating systems and computer programs.
  • the computer program can be executed by a processor, so as to implement a pose calculation method provided in the above embodiments.
  • the internal memory provides a high-speed running environment for the operating system and the computer program in the non-volatile storage medium.
  • the electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant, a personal digital assistant), a POS (Point of Sales, a sales terminal), a vehicle-mounted computer, or a wearable device.
  • each module in the pose calculation device provided in the embodiment of the present application may be in the form of a computer program.
  • the computer program can run on the electronic device.
  • the program modules constituted by the computer program can be stored in the memory of the electronic device.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform operations of the pose calculation method.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Abstract

The present application relates to a pose calculation method and apparatus, an electronic device, and a computer readable storage medium. The method comprises: if a depth image of a current environment is collected in a preset initialization sliding window for the first time, determining the pose of the electronic device in collecting the depth image as an initial pose, wherein a target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window (220); and determining, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device in collecting the next frame of RGB image (240).

Description

Pose calculation method and device, electronic device, readable storage medium
This application claims the priority of the Chinese patent application with application number 202110866966.3, filed with the China Patent Office on July 29, 2021 and entitled "Pose calculation method and device, electronic device, readable storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
The present application relates to the technical field of computer vision, and in particular to a pose calculation method and device, an electronic device, and a readable storage medium.
Background Technique
The position and attitude (pose for short) of an electronic device in an unknown environment is one of the key technologies in industries such as augmented reality, virtual reality, mobile robots, and unmanned driving. With the rapid development of these industries, ever higher requirements are put forward for the accuracy with which electronic devices locate objects in the surrounding environment. When a VIO (Visual-Inertial Odometry) system is used to locate objects in the surrounding environment, one of its core operations is to determine the pose of the electronic device when capturing each frame of image.
In the traditional method, when determining the pose of the electronic device when capturing each frame of image, a certain number of RGB images or depth images of the surrounding environment must be collected with the camera on the electronic device before initialization can be performed based on them, after which the pose of the electronic device when capturing each frame of image is output in real time. Obviously, collecting a certain number of RGB images and/or depth images takes a long time, which in turn makes the waiting time for real-time pose output long.
Contents of the Invention
Embodiments of the present application provide a pose calculation method and device, an electronic device, and a readable storage medium, which can reduce the waiting time for real-time pose output.
A pose calculation method, the method comprising:
if a depth image of the current environment is collected for the first time within a preset initialization sliding window, determining the pose of the electronic device when the depth image is collected as an initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
determining, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
A pose calculation device, the device comprising:
an initial pose determination module, configured to determine the pose of the electronic device when the depth image is collected as the initial pose if a depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window;
a pose determination module, configured to determine, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
An electronic device, including a memory and a processor, where a computer program is stored in the memory; when the computer program is executed by the processor, the processor performs the operations of the pose calculation method described above.
A computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the operations of the pose calculation method described above are implemented.
Description of Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
Fig. 1 is an application environment diagram of a pose calculation method in an embodiment;
Fig. 2 is a flowchart of a pose calculation method in an embodiment;
Fig. 3 is a flowchart of a pose calculation method in another embodiment;
Fig. 4 is a schematic diagram of constructing a target 3D map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window, in an embodiment;
Fig. 5 is a flowchart of a method for calculating the first pose transformation matrix between the next frame of RGB image and the target RGB image using the preset perspective projection PnP algorithm, in an embodiment;
Fig. 6 is a schematic diagram of calculating a pose transformation matrix using the preset perspective projection PnP algorithm in an embodiment;
Fig. 7 is a flowchart of the method in Fig. 5 for calculating the first translation transformation matrix between the next frame of RGB image and the target RGB image;
Fig. 8 is a flowchart of a method for constructing a target 3D map of the current environment according to the first pose transformation matrix and the depth image, in an embodiment;
Fig. 9 is a flowchart of the method in Fig. 3 for calculating, according to the target 3D map, the poses of the electronic device when collecting the RGB images located after the next frame of RGB image within the preset initialization sliding window;
Fig. 10 is a schematic diagram of calculating the poses of the RGB images collected after the next frame of RGB image, and of the 3D map calculation, in an embodiment;
Fig. 11 is a flowchart of a pose calculation method in yet another embodiment;
Fig. 12 is a schematic diagram of a pose calculation method in still another embodiment;
Fig. 13 is a flowchart of a pose calculation method in a specific embodiment;
Fig. 14 is a structural block diagram of a pose calculation device in an embodiment;
Fig. 15 is a structural block diagram of a pose calculation device in an embodiment;
Fig. 16 is a structural block diagram of a pose calculation device in another embodiment;
Fig. 17 is a schematic diagram of the internal structure of an electronic device in an embodiment.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The position and attitude (pose for short) of an electronic device in an unknown environment is one of the key technologies in industries such as augmented reality, virtual reality, mobile robots, and unmanned driving. With the rapid development of these industries, ever higher requirements are put forward for the accuracy with which electronic devices locate objects in the surrounding environment. When a VIO (Visual-Inertial Odometry) system is used to locate objects in the surrounding environment, one of its core operations is to determine the pose of the electronic device when capturing each frame of image.
One traditional method uses the VINS (visual-inertial system)-MONO algorithm when determining the pose of the electronic device when capturing each frame of image. The specific operations of the VINS-MONO algorithm are as follows; it is assumed that the preset initialization sliding window contains 10 frames of images, although the size of the preset initialization sliding window is not specifically limited in this application. In the first step, RGB images are collected by the camera on the electronic device; after 10 frames of RGB images have been accumulated in the preset initialization sliding window, two frames whose parallax satisfies a condition (for example, an L frame and an R frame) are selected from these 10 frames, and the epipolar geometric constraint is used to calculate the pose between these two frames. In the second step, this pose is used to recover, by triangulation, the map points co-viewed by the two frames. In the third step, these map points are projected onto any one of the above 10 frames of RGB images other than the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error. In the fourth step, triangulation between that frame and the L frame and R frame is used to recover the map points co-viewed by that frame and the L frame and R frame. By repeating the third and fourth steps, the poses of the 10 frames of RGB images in the preset initialization sliding window, as well as the map points corresponding to these 10 frames, can be calculated.
Another traditional method uses the VINS-RGBD algorithm when determining the pose of the electronic device when capturing each frame of image. The specific operations of the VINS-RGBD algorithm are as follows; it is again assumed that the preset initialization sliding window contains 10 frames of images. In the first step, the camera on the electronic device collects RGB images and depth images simultaneously, and it must be ensured that each frame of RGB image has a corresponding depth image, where "corresponding" means that the RGB image is aligned with the depth image in both time and space. In the second step, after 10 frames of RGB images and the corresponding depth images have been accumulated in the preset initialization sliding window, two frames whose parallax satisfies a condition (for example, an L frame and an R frame) are selected from these 10 frames. The traditional PnP algorithm is used to calculate the pose of the L frame; then, combined with the depth image corresponding to the L frame, valid map points are screened by requiring the reprojection error to satisfy a preset threshold, and the map points are recovered. In the third step, these map points are projected onto any one of the above 10 frames of RGB images other than the L frame and the R frame, and the pose of that frame is calculated by minimizing the reprojection error. In the fourth step, the depth image of that frame and the L frame are used to recover the map points that can be co-viewed and whose reprojection error satisfies the threshold requirement. By repeating the third and fourth steps, the poses of the 10 frames of RGB images in the preset initialization sliding window, as well as the map points corresponding to these 10 frames, can be calculated.
When outputting poses in real time, both of the above traditional methods need to collect a certain number of RGB images or depth images of the surrounding environment with the camera on the electronic device before the pose of the electronic device when capturing each frame of image can be output in real time based on those images. Obviously, collecting a certain number of RGB images and/or depth images takes a long time, which in turn makes the waiting time for real-time pose output long.
Therefore, in order to solve the problem of the long waiting time for real-time pose output, the embodiments of the present application propose a pose calculation method in which the electronic device does not need to fill the preset initialization sliding window with RGB images before it can output the pose of the first frame of RGB image collected outside the preset initialization sliding window. Instead, when a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected can be determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image can be determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image. Obviously, the pose can then be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
Fig. 1 is an application scene diagram of a pose calculation method in an embodiment. As shown in Fig. 1, the application environment includes an electronic device 120, which includes a first camera and a second camera. The first camera is an RGB camera, and the second camera is a camera that collects depth images, for example, a TOF (Time-of-Flight) camera or a structured light camera, which is not limited in this application. The electronic device 120 collects RGB images and depth images of the current environment through the first camera and the second camera, respectively. If the electronic device 120 collects a depth image of the current environment for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window. According to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined. The electronic device 120 may be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a wearable device (smart bracelet, smart watch, smart glasses, smart gloves, smart socks, smart belt, etc.), a VR (virtual reality) device, a smart home device, an unmanned vehicle, or any other terminal device.
Fig. 2 is a flowchart of a pose calculation method in an embodiment. The pose calculation method in this embodiment is described by taking its operation on the electronic device 120 in Fig. 1 as an example; the electronic device 120 includes a first camera and a second camera, where the first camera is an RGB camera and the second camera is a camera that collects depth images. The pose calculation method includes operations 220-240, wherein:
Operation 220, if a depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window.
In the traditional method, a certain number of image frames must be collected before the pose of the electronic device when capturing each frame of image can be output in real time. This certain number of image frames fills the preset initialization sliding window (hereinafter referred to as the sliding window); that is, the size of the preset initialization sliding window is equal to the duration of collecting that certain number of image frames. For example, if the preset initialization sliding window is set to include 10 frames of RGB images, then after 10 frames of RGB images have been collected, they fill the preset initialization sliding window; at this time, the size of the preset initialization sliding window is equal to the duration of collecting 10 frames of RGB images.
In the embodiment of the present application, the electronic device does not need to fill the preset initialization sliding window with RGB images before it can output the pose of the first frame of RGB image collected outside the preset initialization sliding window. Instead, RGB images are collected by the first camera of the electronic device while depth images are collected by the second camera. If a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the image coordinate system of the target RGB image is taken as the world coordinate system, realizing the visual initialization of the electronic device. Moreover, there is no need to select two frames whose parallax satisfies a condition from the 10 frames of RGB images; therefore, the adaptability is wider.
First, since the acquisition frequency of the second camera when collecting depth images may be lower than that of the first camera when collecting RGB images, a depth image corresponding to every frame of RGB image may not be collected; that is, some depth images will be missing relative to the RGB images. Second, in electronic devices, abnormalities of the second camera and the related software and hardware also frequently lead to missing depth images relative to the RGB images, or to abnormal alignment between the collected RGB images and depth images. All of the above mean that a corresponding depth image cannot be collected for every frame of RGB image.
Operation 240, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.
After the pose of the electronic device when the depth image is collected has been determined as the initial pose and the image coordinate system of the target RGB image has been taken as the world coordinate system, realizing the visual initialization of the electronic device, the pose when collecting the next frame of RGB image can be output in real time.
Specifically, after the initial pose and the world coordinate system have been determined, the pose of the electronic device when collecting the next frame of RGB image is determined according to the target RGB image, the depth image, and the next frame of RGB image of the target RGB image. First, the perspective projection PnP algorithm can be used, based on the mutually matched 2D-2D feature point pairs between the target RGB image and its next frame of RGB image; secondly, these 2D-2D feature point pairs are combined with the 3D points on the depth image to obtain matched 3D-2D feature point pairs; thirdly, based on the 3D-2D feature point pairs, the pose transformation matrix between the next frame of RGB image and the target RGB image is calculated. Finally, based on this pose transformation matrix and the initial pose, the pose of the next frame of RGB image can be generated. Here, pose refers to position and attitude; a pose is six-dimensional, comprising three positions (X, Y, Z) and three attitude angles (heading, pitch, roll).
The perspective projection PnP algorithm here may be the traditional perspective projection PnP algorithm, in which the rotation transformation matrix and the translation transformation matrix between two frames are calculated simultaneously based on the 3D-2D feature point pairs; the rotation transformation matrix and the translation transformation matrix constitute the pose transformation matrix. Alternatively, the perspective projection PnP algorithm here may be a new perspective projection PnP algorithm, in which the rotation transformation matrix and the translation transformation matrix between two frames are calculated step by step based on the 3D-2D feature point pairs.
In the embodiment of the present application, the electronic device does not need to fill the preset initialization sliding window with RGB images before it can output the pose of the first frame of RGB image collected outside the window. Instead, when a depth image of the current environment is collected for the first time within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the window, the pose of the electronic device when the depth image is collected is determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image is determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image. The time of the first pose output is thus greatly advanced, from after the preset initialization sliding window to inside the sliding window, namely to the frame following the first received depth image. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
At the same time, it is not necessary to wait until a depth image has been collected for every frame of RGB image before the pose can be output in real time. Instead, the depth map can be collected at a frequency lower than that of the RGB images, or at a variable frequency; as long as a depth map is collected within the sliding window, initialization can start, after which the pose is output in real time. Since the depth map can be collected at a lower or variable frequency, initialization and real-time pose output can be based on the low-frequency depth map. This avoids collecting and processing a large amount of data and, further, reduces the power consumption of the electronic device.
接上一个实施例,在计算出电子设备采集下一帧RGB图像时的位姿之后,依次计算电子设备在预设初始化滑窗内采集位于下一帧RGB图像之后的其他RGB图像时的位姿。那么,如图3所示,提供了一种位姿计算方法,还包括:Continuing from the previous embodiment, after calculating the pose of the electronic device when it captures the next frame of RGB image, sequentially calculate the pose of the electronic device when it collects other RGB images after the next frame of RGB image within the preset initialization sliding window . Then, as shown in Figure 3, a pose calculation method is provided, which also includes:
操作260,根据电子设备采集下一帧RGB图像时的位姿以及深度图像,构建当前环境的目标三维地图。In operation 260, construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image.
在操作240中计算出了电子设备采集下一帧RGB图像时的位姿,此时就可以基于目标RGB图像的深度图像以及该下一帧RGB图像时的位姿,构建当前环境的目标三维地图。其中,在构建当前环境的目标三维地图时,分为以下两种情况。In operation 240, the pose of the electronic device when capturing the next frame of RGB image is calculated, and at this time, the target 3D map of the current environment can be constructed based on the depth image of the target RGB image and the pose of the next frame of RGB image . Wherein, when constructing the target three-dimensional map of the current environment, it can be divided into the following two situations.
In the first case, the target RGB image is the first frame within the preset initialization sliding window; that is, the depth image was collected for the very first RGB frame of the window. Since no RGB image precedes the target RGB image, constructing the target three-dimensional map only requires the first pose transformation matrix between the next frame of RGB image and the target RGB image; from this matrix and the depth image, the target three-dimensional map of the current environment can be constructed. If the first pose transformation matrix was already computed while calculating the pose of the next frame of RGB image, it can simply be reused.
In the second case, the target RGB image is not the first frame within the preset initialization sliding window; that is, the depth image was not collected for the first RGB frame of the window. Because RGB images were collected before the target RGB image, map points can be recovered not only from the target RGB image but also from those earlier RGB images, so that more map points are recovered and the three-dimensional map of the current environment is enriched.
In this case, the first pose transformation matrix between the next frame of RGB image and the target RGB image is computed in the same way, and an initial three-dimensional map of the current environment is constructed from this matrix and the depth image (again, if the matrix was already computed while calculating the pose of the next frame, it is simply reused). Then, a second pose transformation matrix between each RGB image located before the target RGB image and the target RGB image is computed, and the initial three-dimensional map is updated according to the second pose transformation matrix and the depth image to generate the target three-dimensional map.
Operation 280: according to the target three-dimensional map, calculate the poses of the electronic device when capturing, within the preset initialization sliding window, the other RGB images located after the next frame of RGB image.
Once the target three-dimensional map has been constructed, it can be used to calculate the poses of the electronic device when capturing the remaining RGB images in the window. Because the target three-dimensional map is built from all RGB frames from the first frame of the window up to the frame after the target RGB image, combined with the depth image of the target RGB image, its map points are jointly recovered from all of these images and are therefore more comprehensive than the depth image alone.
Therefore, the poses of the other RGB images located after the next frame of RGB image can be calculated directly from the target three-dimensional map.
In this embodiment of the present application, the target three-dimensional map of the current environment is constructed from the depth image and the pose of the electronic device when capturing the next frame of RGB image. Since the map is built from the first frame of the window up to the frame after the target RGB image, combined with the depth image, its map points are more comprehensive than the depth image alone. Calculating the poses of the subsequent RGB images in the window from this map therefore greatly improves the accuracy of those poses.
In one embodiment, operation 260, constructing the target three-dimensional map of the current environment according to the depth image and the pose of the electronic device when capturing the next frame of RGB image, includes:
calculating the first pose transformation matrix between the next frame of RGB image and the target RGB image, and constructing the target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image.
In one implementation, if the target RGB image is the first frame within the preset initialization sliding window, no RGB image precedes it, so only the first pose transformation matrix between the next frame of RGB image and the target RGB image needs to be calculated. The target three-dimensional map of the current environment is then constructed from this matrix and the depth image.
In another implementation, if the target RGB image is not the first frame within the preset initialization sliding window, RGB images exist before it. In that case, the first pose transformation matrix between the next frame of RGB image and the target RGB image is calculated first, and an initial three-dimensional map of the current environment is constructed from this matrix and the depth image. Then, the second pose transformation matrix between each RGB image located before the target RGB image and the target RGB image is calculated, and the initial three-dimensional map is updated according to the second pose transformation matrix and the depth image to generate the target three-dimensional map.
In this embodiment of the present application, if the target RGB image is the first frame within the preset initialization sliding window, only the first pose transformation matrix between the next frame of RGB image and the target RGB image needs to be calculated, and the target three-dimensional map is constructed directly from this matrix and the depth image. If the target RGB image is not the first frame, the first pose transformation matrix is calculated first and used, together with the depth image, to construct an initial three-dimensional map; the second pose transformation matrix between each preceding RGB image and the target RGB image is then calculated, and the initial map is updated to generate the target three-dimensional map. Since the target map is built from the first frame of the window up to the frame after the target RGB image, combined with the depth image, its map points are more comprehensive than the depth image alone, and the poses of the subsequent RGB images in the window calculated from it are correspondingly more accurate.
In one embodiment, constructing the target three-dimensional map of the current environment according to the depth image and the pose of the electronic device when capturing the next frame of RGB image includes:
constructing an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image; and
calculating the second pose transformation matrix between each RGB image located before the target RGB image and the target RGB image, and updating the initial three-dimensional map according to the second pose transformation matrix and the depth image to generate the target three-dimensional map.
Figure 4 is a schematic diagram, in one embodiment, of constructing the target three-dimensional map when the target RGB image is not the first frame within the preset initialization sliding window. For example, suppose the window holds 10 frames and the depth image is collected at the 4th frame, so the target RGB image is the 4th RGB frame. Constructing the target three-dimensional map then involves two steps:
In the first step, the initial three-dimensional map of the current environment is constructed. Specifically, the first pose transformation matrix between the next frame of RGB image (the 5th frame) and the target RGB image (the 4th frame) is calculated, and the initial map is constructed from this matrix and the depth image. This is done in the same way as in operation 260 for the case where the target RGB image is the first frame of the window, and is not repeated here.
In the second step, the initial three-dimensional map is updated to generate the target three-dimensional map. Specifically, the second pose transformation matrix between each RGB image located before the target RGB image and the target RGB image is calculated, and the initial map is supplemented according to the second pose transformation matrix and the depth image to generate the target three-dimensional map.
The RGB images located before the target RGB image are frames 1, 2, and 3, so the second pose transformation matrices are those between the 4th frame and each of frames 1, 2, and 3. The initial three-dimensional map is then updated according to these matrices and the depth image to generate the target three-dimensional map.
Specifically, the second pose transformation matrix between the 4th and 1st frames is calculated first, and the initial map is updated according to this matrix and the depth image, yielding an updated map. Next, the second pose transformation matrix between the 4th and 2nd frames is calculated and the map is updated again. Finally, the second pose transformation matrix between the 4th and 3rd frames is calculated and the map is updated a third time, yielding the target three-dimensional map.
In this embodiment of the present application, when the target RGB image is not the first frame within the preset initialization sliding window, constructing the target three-dimensional map includes: calculating the first pose transformation matrix between the next frame of RGB image and the target RGB image and constructing an initial map from it and the depth image; then calculating the second pose transformation matrix between each preceding RGB image and the target RGB image and updating the initial map accordingly. The resulting target three-dimensional map thus recovers not only the map points corresponding to the frame after the target RGB image, but also the map points of every frame before it, which greatly improves the completeness and accuracy of the map.
In one embodiment, the pose calculation method further includes:
calculating the first pose transformation matrix or the second pose transformation matrix using a preset perspective projection PnP algorithm. The preset perspective projection PnP algorithm computes, in separate steps, the rotation transformation matrix and the translation transformation matrix between the target RGB image and a related RGB image of the target RGB image, the related RGB image being either an RGB image located before the target RGB image or the next frame of RGB image.
Calculating the first pose transformation matrix between the next frame of RGB image and the target RGB image includes:
using the preset perspective projection PnP algorithm to calculate the first pose transformation matrix (which may also simply be called the pose) between the next frame of RGB image and the target RGB image, the algorithm computing the rotation transformation matrix and the translation transformation matrix between the two frames in separate steps.
The preset perspective projection PnP algorithm is so named in contrast to the traditional perspective projection PnP algorithm: the traditional algorithm computes the rotation transformation matrix and the translation transformation matrix between two frames simultaneously from 3D-2D feature point pairs, whereas the preset algorithm computes the two matrices in separate steps.
Specifically, when the preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix between the next frame of RGB image and the target RGB image, the first rotation transformation matrix between the two frames is computed first, and the first translation transformation matrix is computed second. The first rotation transformation matrix and the first translation transformation matrix together constitute the first pose transformation matrix.
Similarly, calculating the second pose transformation matrix between an RGB image located before the target RGB image and the target RGB image includes:
using the preset perspective projection PnP algorithm to calculate the second pose transformation matrix (which may also simply be called the pose), the algorithm computing, in separate steps, the second rotation transformation matrix and the second translation transformation matrix between the preceding RGB image and the target RGB image. The second rotation transformation matrix and the second translation transformation matrix together constitute the second pose transformation matrix.
In this embodiment of the present application, when the pose transformation matrix between the target RGB image and either a preceding RGB image or the next frame of RGB image is calculated, the computation of the rotation transformation matrix is decoupled from that of the translation transformation matrix. This prevents the error incurred in estimating the rotation from being compounded with the error incurred in estimating the translation, and thus improves the accuracy of the first or second pose transformation matrix obtained by this stepwise computation with the preset perspective projection PnP algorithm.
In one embodiment, calculating the first pose transformation matrix or the second pose transformation matrix using the preset perspective projection PnP algorithm includes:
calculating the rotation transformation matrix between the related RGB image of the target RGB image and the target RGB image;
calculating, according to the depth image and the rotation transformation matrix, the translation transformation matrix between the related RGB image and the target RGB image; and
generating the first or second pose transformation matrix between the related RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
If the related RGB image of the target RGB image is the next frame of RGB image, i.e. no RGB image exists before the target RGB image, then as shown in Figure 5, only the first pose transformation matrix needs to be calculated with the preset perspective projection PnP algorithm, as follows:
Operation 520: calculate the first rotation transformation matrix between the next frame of RGB image and the target RGB image.
Because some regions of a collected depth image may lack depth information or carry inaccurate depth, the 2D-2D feature point pairs matched between two RGB frames are generally more numerous and more complete than the 3D-2D feature point pairs matched between the depth image of one frame and the RGB image of the other.
Figure 6 is a schematic diagram of calculating the pose transformation matrix with the preset perspective projection PnP algorithm in one embodiment. In Figure 6, image i is the target RGB image, image j is the next frame of RGB image, and the elliptical box corresponds to the depth image of the target RGB image. The pair of images i and j on the left illustrates determining the mutually matched 2D-2D feature point pairs in images i and j and calculating the first rotation transformation matrix R_ij.
Specifically, mutually matched 2D-2D feature point pairs are determined between the next frame of RGB image and the target RGB image, for example with the optical flow method or another image matching method. From these 2D-2D pairs, the first rotation transformation matrix R_ij between the next frame of RGB image and the target RGB image is calculated using the epipolar geometry constraint.
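As a minimal sketch of this rotation-only step (assuming OpenCV, which the application does not name): matched points satisfy the epipolar constraint, so the essential matrix can be estimated and decomposed to obtain R_ij, while the unit-scale translation from the decomposition is discarded because operation 540 re-estimates it from the depth image.

```python
# Hypothetical helper: pts_i / pts_j are Nx2 pixel arrays of matched 2D-2D
# features in image i (target RGB image) and image j (next frame); K is the
# 3x3 camera intrinsic matrix.
import cv2
import numpy as np

def estimate_rotation(pts_i: np.ndarray, pts_j: np.ndarray, K: np.ndarray) -> np.ndarray:
    # Essential matrix from the epipolar constraint, with RANSAC rejection.
    E, _ = cv2.findEssentialMat(pts_i, pts_j, K, method=cv2.RANSAC, threshold=1.0)
    # Decompose E; only the rotation is kept here, since the translation is
    # re-estimated from the depth image in operation 540.
    _, R_ij, _, _ = cv2.recoverPose(E, pts_i, pts_j, K)
    return R_ij
```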
Operation 540: calculate the first translation transformation matrix between the next frame of RGB image and the target RGB image according to the depth image and the first rotation transformation matrix.
Specifically, the mutually matched 2D-2D feature point pairs of the next frame of RGB image and the target RGB image are first filtered according to the first rotation transformation matrix, yielding the filtered 2D-2D feature point pairs. The filtered 2D-2D pairs are then converted into 3D-2D feature point pairs according to the depth image. Finally, the first translation transformation matrix t_ij between the next frame of RGB image and the target RGB image is calculated from the 3D-2D pairs.
Operation 560: generate the first pose transformation matrix between the next frame of RGB image and the target RGB image based on the first rotation transformation matrix and the first translation transformation matrix.
Combining the first rotation transformation matrix R_ij and the first translation transformation matrix t_ij computed above yields the first pose transformation matrix between the next frame of RGB image and the target RGB image.
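As a small illustration of this combination step (a 4x4 homogeneous layout is an assumption; the application does not fix a matrix convention):

```python
import numpy as np

def compose_pose(R_ij: np.ndarray, t_ij: np.ndarray) -> np.ndarray:
    """Stack a 3x3 rotation and a 3-vector translation into a 4x4 pose matrix."""
    T_ij = np.eye(4)
    T_ij[:3, :3] = R_ij
    T_ij[:3, 3] = t_ij.ravel()
    return T_ij
```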
In this embodiment of the present application, the first rotation transformation matrix between the next frame of RGB image and the target RGB image is calculated; the first translation transformation matrix is then calculated from the depth image and the first rotation transformation matrix; and the first pose transformation matrix is generated from the two. Computing the rotation and translation in separate steps with the preset perspective projection PnP algorithm prevents the errors of the two computations from being compounded, and thus improves the accuracy of the resulting first pose transformation matrix.
Continuing from the previous embodiment, as shown in Figure 7, operation 540, calculating the first translation transformation matrix between the next frame of RGB image and the target RGB image according to the depth image and the first rotation transformation matrix, includes:
Operation 542: filter the mutually matched 2D-2D feature point pairs of the next frame of RGB image and the target RGB image according to the first rotation transformation matrix, obtaining the filtered 2D-2D feature point pairs.
As shown in Figure 6, image i is the target RGB image, image j is the next frame of RGB image, and the elliptical box corresponds to the depth image of the target RGB image. The pair of images i and j on the right illustrates filtering the mutually matched 2D-2D feature point pairs of images i and j according to the first rotation transformation matrix to obtain the filtered 2D-2D pairs.
Operation 544: convert the filtered 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
Using the depth image shown in the elliptical box of Figure 6, the filtered 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. The RANSAC algorithm is then applied to reject outlier pairs among the 3D-2D pairs, producing the filtered 3D-2D feature point pairs.
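The 2D-to-3D conversion itself is the back-projection π^{-1}; a minimal sketch under a pinhole camera model (the helper name and metric-depth assumption are illustrative):

```python
import numpy as np

def backproject(pts_2d: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """pi^{-1}: lift Nx2 pixel coordinates with their depth values to Nx3
    camera-frame 3D points under a pinhole model with intrinsics K."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (pts_2d[:, 0] - cx) / fx * depths
    y = (pts_2d[:, 1] - cy) / fy * depths
    return np.stack([x, y, depths], axis=1)
```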
Operation 546: calculate the first translation transformation matrix between the next frame of RGB image and the target RGB image according to the 3D-2D feature point pairs.
Once the 3D-2D feature point pairs between the target RGB image and the next frame of RGB image have been obtained, the first translation transformation matrix between the two frames can be calculated from the translation transformation matrix formula, which is as follows:
$$t_{ij}^{*}=\arg\min_{t_{ij}}\sum_{k}\left\|\,p_{j}^{k}-\pi\!\left(R_{ij}\,\pi^{-1}\!\left(p_{i}^{k},d_{i}^{k}\right)+t_{ij}\right)\right\|^{2}\qquad(1\text{-}1)$$

where $\left(\pi^{-1}(p_{i}^{k},d_{i}^{k}),\,p_{j}^{k}\right)$ are the 3D-2D matching point pairs between image $i$ and image $j$, $R_{ij}$ is the rotation transformation matrix from image $j$ to image $i$, $t_{ij}$ is the translation transformation matrix from image $j$ to image $i$, $d_{i}^{k}$ is the depth value corresponding to $p_{i}^{k}$, and $\pi(\cdot)^{-1}$ is a transformation that back-projects a 2D point into a 3D point.
For multiple 3D-2D feature point pairs, a least-squares problem is built from formula (1-1), and the optimal variable t_ij is solved for as the first translation transformation matrix between the next frame of RGB image and the target RGB image.
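One standard way to realize this solve, sketched below under assumptions the application does not specify, is the cross-product linearization: with the rotation fixed, each pair satisfies λ·x_j = R_ij·P_i + t_ij for normalized coordinates x_j = K^{-1}(u, v, 1)^T, and taking the cross product with x_j eliminates the unknown depth λ, leaving equations linear in t_ij (this minimizes an algebraic form of the residual in (1-1) rather than the pixel reprojection error itself).

```python
import numpy as np

def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def estimate_translation(P_i: np.ndarray, x_j: np.ndarray, R_ij: np.ndarray) -> np.ndarray:
    """Solve for t_ij given Nx3 3D points P_i (back-projected from the depth
    image) and Nx3 normalized homogeneous observations x_j in the other frame.
    Each pair contributes the linear equation skew(x) @ t = -skew(x) @ (R @ P)."""
    A = np.vstack([skew(x) for x in x_j])
    b = np.concatenate([-skew(x) @ (R_ij @ P) for x, P in zip(x_j, P_i)])
    t_ij, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t_ij
```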
In this embodiment of the present application, when the first translation transformation matrix between the next frame of RGB image and the target RGB image is calculated, the mutually matched 2D-2D feature point pairs are first filtered according to the first rotation transformation matrix; the filtered 2D-2D pairs are then converted into 3D-2D pairs according to the depth image; and finally the first translation transformation matrix is calculated from the 3D-2D pairs. Rejecting outlier point pairs several times and solving for the first translation transformation matrix by least squares improves the accuracy of the result.
In one embodiment, if the target RGB image is the first frame within the preset initialization sliding window, the first pose transformation matrix between the next frame of RGB image and the target RGB image is calculated and, as shown in Figure 8, constructing the target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image includes:
Operation 820: project the 3D feature points of the depth image onto the next frame of RGB image according to the first pose transformation matrix, generating projected 2D feature points.
The first pose transformation matrix is the pose transformation matrix between the next frame of RGB image and the target RGB image. 3D-2D matching point pairs can be determined from the target RGB image and its corresponding depth image; the 3D feature points of the depth image can then be projected onto the next frame of RGB image via the first pose transformation matrix, generating the projected 2D feature points.
Operation 840: calculate the reprojection error between each projected 2D feature point and the corresponding 2D feature point on the next frame of RGB image.
Operation 860: if the reprojection error is smaller than a preset error threshold, take the corresponding 3D feature point of the depth image as a target map point.
Specifically, the next frame of RGB image already contains many original 2D feature points, and the reprojection error between each projected 2D feature point and the position of its original 2D feature point is calculated. If the reprojection error is smaller than the preset error threshold, the corresponding 3D feature point is considered a reliable map point, and the 3D feature point of the depth image is taken as a target map point.
Operation 880: construct the target three-dimensional map of the current environment from the target map points.
After the qualifying 3D feature points of the depth image have been taken as target map points, the target three-dimensional map of the current environment can be constructed from them.
In this embodiment of the present application, if the target RGB image is the first frame within the preset initialization sliding window, the 3D feature points of the depth image are projected onto the next frame of RGB image according to the first pose transformation matrix, generating projected 2D feature points; the reprojection error between each projected 2D feature point and the corresponding 2D feature point on the next frame is calculated; and, if the error is smaller than the preset threshold, the 3D feature point is taken as a target map point, from which the target three-dimensional map is constructed. This realizes construction of the target three-dimensional map when the target RGB image is the first frame of the window; the map can then be used directly to calculate the poses of the other RGB images captured after the next frame within the window.
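Operations 820 through 860 can be sketched as one vectorized filter (the 4x4 pose layout, intrinsics K, and the 2-pixel threshold are illustrative assumptions):

```python
import numpy as np

def select_map_points(P_3d: np.ndarray, obs_2d: np.ndarray, T: np.ndarray,
                      K: np.ndarray, err_thresh: float = 2.0) -> np.ndarray:
    """Keep the Nx3 candidate 3D points whose reprojection into the next RGB
    frame lies within err_thresh pixels of the Nx2 observed feature positions."""
    P_h = np.hstack([P_3d, np.ones((len(P_3d), 1))])
    P_cam = (T @ P_h.T).T[:, :3]                 # operation 820: transform points
    uvw = (K @ P_cam.T).T
    proj = uvw[:, :2] / uvw[:, 2:3]              # projected 2D feature points
    err = np.linalg.norm(proj - obs_2d, axis=1)  # operation 840: reprojection error
    return P_3d[err < err_thresh]                # operation 860: target map points
```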
In one embodiment, calculating the first pose transformation matrix or the second pose transformation matrix using the preset perspective projection PnP algorithm includes:
calculating the rotation transformation matrix between the related RGB image of the target RGB image and the target RGB image;
calculating, according to the depth image and the rotation transformation matrix, the translation transformation matrix between the related RGB image and the target RGB image; and
generating the first or second pose transformation matrix between the related RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
If the related RGB images of the target RGB image are the RGB images located before the target RGB image as well as the next frame of RGB image, i.e. RGB images exist before the target RGB image, then both the first and the second pose transformation matrices need to be calculated with the preset perspective projection PnP algorithm. The calculation of the first pose transformation matrix is not repeated here; calculating the second pose transformation matrix between an RGB image located before the target RGB image and the target RGB image includes:
calculating the second pose transformation matrix between the RGB image located before the target RGB image and the target RGB image using the preset perspective projection PnP algorithm.
The preset perspective projection PnP algorithm is so named in contrast to the traditional perspective projection PnP algorithm: the traditional algorithm computes the rotation transformation matrix and the translation transformation matrix between two frames simultaneously from 3D-2D feature point pairs, whereas the preset algorithm computes the two matrices in separate steps.
Specifically, when the preset perspective projection PnP algorithm is used to calculate the second pose transformation matrix between a preceding RGB image and the target RGB image, the second rotation transformation matrix between the two frames is computed first; the second translation transformation matrix is then computed from the depth image and the second rotation transformation matrix. The second rotation transformation matrix and the second translation transformation matrix together constitute the second pose transformation matrix.
In this embodiment of the present application, when the second pose transformation matrix between a preceding RGB image and the target RGB image is calculated, the computation of the second rotation transformation matrix is decoupled from that of the second translation transformation matrix. This prevents the errors of the two computations from being compounded, realizes stepwise calculation of the second pose transformation matrix with the preset perspective projection PnP algorithm, and thus improves its accuracy.
In one embodiment, calculating the second translation transformation matrix between the RGB image located before the target RGB image and the target RGB image according to the depth image and the second rotation transformation matrix includes:
Operation one: filter the mutually matched 2D-2D feature point pairs of the preceding RGB image and the target RGB image according to the second rotation transformation matrix, obtaining the filtered 2D-2D feature point pairs.
As shown in Figure 6, suppose image i is the target RGB image, image j is an RGB image located before the target RGB image, and the elliptical box corresponds to the depth image of the target RGB image. The pair of images i and j on the right illustrates filtering the mutually matched 2D-2D feature point pairs of images i and j according to the second rotation transformation matrix to obtain the filtered 2D-2D pairs.
Operation two: convert the filtered 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.
Using the depth image shown in the elliptical box of Figure 6, the filtered 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. The RANSAC algorithm is then applied to reject outlier pairs among the 3D-2D pairs, producing the filtered 3D-2D feature point pairs.
Operation three: calculate the second translation transformation matrix between the preceding RGB image and the target RGB image according to the 3D-2D feature point pairs.
Once the 3D-2D feature point pairs between the target RGB image and the preceding RGB image have been obtained, the second translation transformation matrix between the two frames can be calculated from the translation transformation matrix formula, which is as follows:
$$t_{ij}^{*}=\arg\min_{t_{ij}}\sum_{k}\left\|\,p_{j}^{k}-\pi\!\left(R_{ij}\,\pi^{-1}\!\left(p_{i}^{k},d_{i}^{k}\right)+t_{ij}\right)\right\|^{2}\qquad(1\text{-}1)$$

where $\left(\pi^{-1}(p_{i}^{k},d_{i}^{k}),\,p_{j}^{k}\right)$ are the 3D-2D matching point pairs between image $i$ and image $j$, $R_{ij}$ is the rotation transformation matrix from image $j$ to image $i$, $t_{ij}$ is the translation transformation matrix from image $j$ to image $i$, $d_{i}^{k}$ is the depth value corresponding to $p_{i}^{k}$, and $\pi(\cdot)^{-1}$ is a transformation that back-projects a 2D point into a 3D point.
For multiple 3D-2D feature point pairs, a least-squares problem is built from formula (1-1), and the optimal variable t_ij is solved for as the second translation transformation matrix between the preceding RGB image and the target RGB image.
In this embodiment of the present application, when the second translation transformation matrix between the preceding RGB image and the target RGB image is calculated, the mutually matched 2D-2D feature point pairs are first filtered according to the second rotation transformation matrix; the filtered 2D-2D pairs are then converted into 3D-2D pairs according to the depth image; and finally the second translation transformation matrix is calculated from the 3D-2D pairs. Rejecting outlier point pairs several times and solving for the second translation transformation matrix by least squares improves the accuracy of the result.
In one embodiment, updating the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image to generate the target three-dimensional map includes:
projecting the 3D feature points of the depth image onto each RGB image located before the target RGB image according to the second pose transformation matrix, generating projected 2D feature points;
calculating the reprojection error between each projected 2D feature point and the corresponding 2D feature point on the preceding RGB image;
if the reprojection error is smaller than the preset error threshold, taking the corresponding 3D feature point of the depth image as a new target map point; and
adding the new target map points to the initial three-dimensional map to generate the target three-dimensional map.
Specifically, the second pose transformation matrix is the pose transformation matrix between a preceding RGB image and the target RGB image. 3D-2D matching point pairs can be determined from the target RGB image and its corresponding depth image; the 3D feature points of the depth image can then be projected onto the preceding RGB image via the second pose transformation matrix, generating projected 2D feature points.
The preceding RGB image already contains many original 2D feature points, and the reprojection error between each projected 2D feature point and the position of its original 2D feature point is calculated. If the reprojection error is smaller than the preset error threshold, the corresponding 3D feature point is considered a reliable map point and is taken as a target map point.
After the qualifying 3D feature points of the depth image have been taken as target map points, the target three-dimensional map of the current environment can be constructed from them.
In this embodiment of the present application, if the target RGB image is not the first frame within the preset initialization sliding window, the 3D feature points of the depth image are projected onto the RGB images located before the target RGB image according to the second pose transformation matrix, generating projected 2D feature points; the reprojection errors between the projected and original 2D feature points are calculated; and the 3D feature points whose errors fall below the preset threshold are taken as target map points, from which the target three-dimensional map is constructed. This realizes construction of the target three-dimensional map when the target RGB image is not the first frame of the window; the map can then be used directly to calculate the poses of the other RGB images captured after the next frame within the window.
In one embodiment, as shown in Figure 9, operation 280, calculating, according to the target three-dimensional map, the poses of the electronic device when capturing the RGB images located after the next frame of RGB image within the preset initialization sliding window, includes:
Operation 282: take the frame after the next frame of RGB image as the current frame, and perform the following target operations:
Operation 284: generate mutually matched 3D-2D feature point pairs between the target three-dimensional map and the current frame, according to the target three-dimensional map, the current frame, and the RGB images located before the current frame;
Operation 286: calculate the pose of the current frame based on the 3D-2D feature point pairs;
Operation 288: update the target three-dimensional map according to the current frame, and take the updated three-dimensional map as the new target three-dimensional map.
The frame after the current frame is then taken as the new current frame, and the target operations are executed in a loop until the pose of the last frame of RGB image within the preset initialization sliding window has been calculated.
Figure 10 is a schematic diagram, in one embodiment, of calculating the poses and the three-dimensional map for the RGB images captured after the next frame of RGB image. For example, suppose the preset initialization sliding window holds 10 frames and the depth image is collected at the 4th frame; the target RGB image is then the 4th frame and the next frame of RGB image is the 5th frame. Calculating the poses of the RGB images after the next frame therefore means calculating the poses when capturing frames 6 through 10.
Starting from the pose of the 6th frame: first, the frame after the next frame of RGB image (the 5th frame), i.e. the 6th frame, is taken as the current frame, and its pose is calculated.
Second, mutually matched 3D-2D feature point pairs between the target three-dimensional map and the current frame (the 6th frame) are generated from the target map, the RGB images located before the current frame, and the current frame. Specifically, mutually matched 2D feature point pairs are obtained between the current frame and the preceding RGB images with the optical flow method or another image matching method; the 3D feature points matching these 2D pairs are retrieved from the map points of the target map; and the matched 3D feature points and 2D pairs together yield the 3D-2D feature point pairs between the target map and the current frame.
Third, the pose of the current frame is calculated from the 3D-2D feature point pairs using the traditional perspective projection PnP algorithm, which computes the rotation and translation transformation matrices between two frames simultaneously from 3D-2D pairs. Applied here, the rotation and translation transformation matrices between the preceding RGB image and the current frame are computed together from the 3D-2D pairs; a pose transformation matrix is obtained from them; and the pose of the current frame is then obtained from the pose transformation matrix and the pose of the preceding RGB image. The pose transformation matrix may, for example, be obtained by multiplication based on the rotation and translation transformation matrices; this application does not limit the calculation method.
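The classic PnP step of operation 286 might look like the following with OpenCV (an assumption; pts_3d are the matched map points and pts_2d their observations in the current frame):

```python
import cv2
import numpy as np

def current_frame_pose(pts_3d: np.ndarray, pts_2d: np.ndarray, K: np.ndarray):
    """Classic PnP: estimate rotation and translation together from 3D-2D pairs."""
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d.astype(np.float64),
                                           pts_2d.astype(np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)            # rotation vector -> 3x3 matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T                              # map-frame-to-camera pose of the current frame
```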
Finally, the target three-dimensional map is updated according to the current frame, and the updated map is taken as the new target three-dimensional map. The next frame of RGB image (the 7th frame) is then taken as the new current frame, and target operations 284 through 288 are executed in a loop until the pose of the last frame of RGB image within the preset initialization sliding window has been calculated.
In this embodiment of the present application, the poses of the RGB images captured after the next frame within the preset initialization sliding window are calculated in a loop. First, the pose of the current frame is calculated from the target three-dimensional map, the current frame, and the RGB images located before it. Second, the target map is updated according to the current frame and taken as the new target map. The pose of the next current frame is then calculated from the new target map and its preceding frames, the map is updated again, and so on, until the pose of the last frame of RGB image within the window has been calculated.
Because the pose of each later frame is calculated from the three-dimensional map built from all preceding frames, the accuracy of the calculated pose of each later frame is improved.
In one embodiment, generating the mutually matched 3D-2D feature point pairs between the target three-dimensional map and the current frame, according to the target three-dimensional map, the current frame, and the RGB images located before the current frame, includes:
obtaining mutually matched 2D feature point pairs from the current frame and the RGB images located before it; and
retrieving the 3D feature points matching the 2D feature point pairs from the map points of the target three-dimensional map, and generating the 3D-2D feature point pairs between the target map and the current frame from the matched 3D feature points and 2D feature point pairs.
In this embodiment of the present application, mutually matched 2D feature point pairs are obtained from the current frame and the preceding RGB images with the optical flow method or another image matching method. Because each RGB image located before the current frame contributed map points when the target three-dimensional map was calculated, the 3D feature points matching these 2D pairs can be retrieved from the map points of the target map. The 3D feature point and the current frame's 2D feature point of each pair then form a 3D-2D feature point pair between the target map and the current frame, which establishes the matching relationship between the 3D feature points of the target map and the 2D feature points of the current frame.
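A sketch of this matching step, assuming pyramidal Lucas-Kanade optical flow and an illustrative bookkeeping dict point_to_map_id that records which tracked feature indices already have map points (the data layout is hypothetical):

```python
import cv2
import numpy as np

def match_3d_2d(prev_img, cur_img, prev_pts, point_to_map_id, map_points):
    """Track the previous frame's 2D features into the current frame, then pair
    each successfully tracked feature with its map point, if it has one."""
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, prev_pts.astype(np.float32), None)
    pts_3d, pts_2d = [], []
    for k, ok in enumerate(status.ravel()):
        map_id = point_to_map_id.get(k)   # illustrative feature-to-map lookup
        if ok and map_id is not None:
            pts_3d.append(map_points[map_id])
            pts_2d.append(cur_pts[k])
    return np.asarray(pts_3d), np.asarray(pts_2d)
```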
In one embodiment, the current frame is an image frame after the next frame of the target RGB image within the sliding window. If the current frame has a corresponding depth image, operation 288, updating the target three-dimensional map according to the current frame to generate an updated three-dimensional map, includes:
updating the target three-dimensional map to generate an intermediate three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images preceding the current frame within the preset sliding window; and
updating the intermediate three-dimensional map by the triangulation method to generate the updated three-dimensional map.
Assume the current frame (the 6th frame) has a corresponding depth image. When updating the target three-dimensional map according to the current frame to generate the updated three-dimensional map, the target three-dimensional map is first updated into an intermediate three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images preceding it within the preset sliding window.
Specifically, first, the pose transformation matrix between the current frame and an RGB image preceding it within the preset sliding window is computed. Second, according to this pose transformation matrix, the 3D feature points on the depth image of the current frame are projected onto the preceding RGB image to generate projected 2D feature points. Third, the reprojection error between each projected 2D feature point and the corresponding 2D feature point on the preceding RGB image is computed; if the reprojection error is smaller than a preset error threshold, the 3D feature point on the depth image of the current frame is taken as a target map point. Finally, the target three-dimensional map is updated according to the target map points to generate the intermediate three-dimensional map.
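A minimal sketch of this filtering step is given below, under assumed conventions (K is the camera intrinsic matrix; R and t map points from the current frame's camera coordinates into those of the earlier RGB image). It illustrates the reprojection-error test, not the exact disclosed procedure:

```python
import cv2
import numpy as np

def keep_consistent_depth_points(pts3d_curr, pts2d_prev, R, t, K, thresh_px=2.0):
    """Keep 3D points (from the current frame's depth image) whose projection
    into the earlier frame lies within thresh_px of the matched 2D feature."""
    rvec, _ = cv2.Rodrigues(R)                     # rotation matrix -> vector
    proj, _ = cv2.projectPoints(pts3d_curr.reshape(-1, 1, 3),
                                rvec, t, K, None)  # project into earlier frame
    err = np.linalg.norm(proj.reshape(-1, 2) - pts2d_prev, axis=1)
    return pts3d_curr[err < thresh_px]             # survivors become target map points
```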
The intermediate three-dimensional map is then updated by the triangulation method to generate the updated three-dimensional map. In multi-view geometry, when the camera observes the same spatial point from two positions, the coordinates of the three-dimensional point can be solved from the two camera poses and the image observation coordinates; this is the computation performed by the triangulation method. Triangulation can recover depth information that is missing from some depth images.
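The two-view triangulation described here can be illustrated with a short sketch, assuming the intrinsic matrix K and the two world-to-camera poses are known; this mirrors the standard multi-view-geometry computation rather than any implementation detail of the disclosure:

```python
import cv2
import numpy as np

def triangulate_points(K, R1, t1, R2, t2, pts1, pts2):
    """Recover Nx3 space points from their Nx2 observations in two views."""
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])   # 3x4 projection matrix, view 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])   # 3x4 projection matrix, view 2
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # homogeneous, 4xN
    return (pts4d[:3] / pts4d[3]).T              # dehomogenize to Nx3
```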
In the embodiment of the present application, if the current frame has a corresponding depth image, the target three-dimensional map is updated into an intermediate three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images preceding it within the preset sliding window, and the intermediate three-dimensional map is then updated by the triangulation method to generate the updated three-dimensional map. In other words, when the current frame has a depth image, the target three-dimensional map built from the preceding image frames is refined with the 3D feature points on the depth image of the current frame, and triangulation then recovers depth information missing from some depth images. The completeness and accuracy of the three-dimensional map constructed at this point are thereby greatly improved.
Continuing from the previous embodiment, if the current frame has no corresponding depth image, operation 288, updating the target three-dimensional map according to the current frame to generate an updated three-dimensional map, includes:
updating the target three-dimensional map by the triangulation method to generate the updated three-dimensional map.
In the embodiment of the present application, if the current frame has no corresponding depth image, the target three-dimensional map is updated by the triangulation method to generate the updated three-dimensional map. Because triangulation can recover depth information that is missing from some depth images, the completeness and accuracy of the three-dimensional map constructed at this point are also improved to a considerable extent.
In one embodiment, as shown in FIG. 11, after computing, according to the target three-dimensional map, the poses of the electronic device when collecting the RGB images after the next frame of RGB image within the preset initialization sliding window, the method further includes:
Operation 1120: acquiring the IMU data collected within the preset sliding window.
A VIO (Visual-Inertial Odometry) system fuses the image data collected by the camera with IMU (Inertial Measurement Unit) data to achieve more accurate positioning.
Because the IMU in the electronic device collects IMU data at a higher frequency than RGB images are collected, when the sliding window contains 10 frames of RGB images, more than 10 sets of IMU data are generally collected. The initialization information of the IMU is computed based on all IMU data collected within the preset sliding window, so all IMU data collected within the preset sliding window needs to be acquired.
Operation 1140: computing the initialization information of the IMU according to the poses of the frames of RGB images within the preset sliding window and the IMU data; the initialization information includes the initial velocity, the zero bias of the IMU, and the gravity vector of the IMU.
The initialization information of the IMU can then be computed from the poses of the RGB images and all IMU data collected within the preset sliding window: the rotation transformation matrices in the pose transformation matrices of the RGB images provide rotation constraints, and the translation transformation matrices of the RGB images provide translation constraints. The initialization information of the IMU includes the initial velocity of the electronic device, the zero bias (bias) of the IMU, and the gravity vector of the IMU.
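As one concrete illustration of the rotation constraint, the sketch below estimates the gyroscope zero bias by comparing integrated gyroscope rotations against the relative rotations from the visual poses. It is a first-order simplification (camera and IMU frames assumed aligned, plain Euler integration); the full initialization in the embodiment additionally recovers the initial velocity and the gravity vector from the translation constraint:

```python
import cv2
import numpy as np

def estimate_gyro_bias(gyro_segments, cam_rotations):
    """gyro_segments: per frame interval, a tuple (omega, dt) with omega an
    Nx3 array of angular rates and dt the N sample periods.
    cam_rotations: matching list of 3x3 relative rotations from visual poses."""
    A_rows, b_rows = [], []
    for (omega, dt), R_cam in zip(gyro_segments, cam_rotations):
        R_gyro = np.eye(3)
        for w, step in zip(omega, dt):  # integrate the raw (biased) rates
            rvec = np.asarray(w, dtype=np.float64).reshape(3, 1) * step
            R_gyro = R_gyro @ cv2.Rodrigues(rvec)[0]
        # Residual rotation between gyro integration and the visual estimate;
        # to first order it equals Exp(-bias * T) over the interval length T.
        theta, _ = cv2.Rodrigues(R_gyro.T @ R_cam)
        T = float(np.sum(dt))
        A_rows.append(np.eye(3) * T)
        b_rows.append(-theta.ravel())
    bias, *_ = np.linalg.lstsq(np.vstack(A_rows), np.hstack(b_rows), rcond=None)
    return bias  # 3-vector estimate of the gyroscope zero bias
```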
Operation 1160: computing the poses of the RGB images collected after the preset sliding window according to the initial pose, the target three-dimensional map, and the initialization information of the IMU.
After the initialization information of the IMU has been computed, the poses of the RGB images collected after the preset sliding window can be computed according to the initial pose, the target three-dimensional map, and the initialization information of the IMU; for example, the pose of the 11th frame of image may be computed, although the present application is of course not limited thereto. Here, the initial pose is the pose of the target RGB image corresponding to the first collected depth image of the current environment, and the target three-dimensional map at this point is the three-dimensional map constructed from all image frames within the sliding window.
In the embodiment of the present application, when computing the poses of the RGB images collected after the preset sliding window, the IMU is first initialized, after which the poses can be computed by combining not only the image data collected by the camera but also the IMU data. The adaptability and robustness of the pose computation are thereby improved in both the visual and the IMU dimensions.
In one embodiment, a pose calculation method is provided, further including:
if no depth image of the current environment is collected within the preset initialization sliding window, computing the initial pose and an initial three-dimensional map according to the RGB images collected within the preset initialization sliding window; and
computing, according to the initial pose and the initial three-dimensional map, the poses of the electronic device when collecting RGB images after the preset initialization sliding window.
In the embodiment of the present application, if no depth image of the current environment is collected within the preset initialization sliding window, the conventional VINS-MONO algorithm is used to compute the initial pose and the initial three-dimensional map from the RGB images collected within the preset initialization sliding window, and the poses of the electronic device when collecting RGB images after the preset initialization sliding window are then computed from the initial pose and the initial three-dimensional map. This guarantees that, even when no depth image of the current environment is collected within the preset initialization sliding window, the poses of the electronic device when collecting RGB images can still be output in real time once 10 frames of RGB images have accumulated within the sliding window.
In one embodiment, a pose calculation method is provided, further including:
if the target RGB image corresponding to the depth image is the last frame of image within the preset initialization sliding window, computing the initial pose and an initial three-dimensional map according to the RGB images collected within the preset initialization sliding window; and
computing, according to the initial pose and the initial three-dimensional map, the poses of the electronic device when collecting RGB images after the preset sliding window.
FIG. 12 illustrates the pose computation when no depth image of the current environment is collected within the preset initialization sliding window, or when the target RGB image corresponding to the depth image is the last frame of image within the preset initialization sliding window.
In the embodiment of the present application, if the target RGB image corresponding to the depth image is the last frame of image within the preset initialization sliding window, the method of the present application could still determine the pose of the electronic device when collecting the depth image as the initial pose, and determine, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image, the pose of the electronic device when collecting that next frame. In this case, however, the pose calculation method of the present application can likewise only compute the poses of the electronic device when collecting RGB images after the sliding window, which takes essentially as long as computing those poses with the conventional VINS-MONO algorithm. Therefore, if the target RGB image corresponding to the depth image is the last frame of image within the preset initialization sliding window, either the conventional VINS-MONO algorithm or the pose calculation method of the present application may be used, which provides diversified and more flexible pose computation.
In a specific embodiment, as shown in FIG. 13, a pose calculation method is provided, described by taking as an example the case where the RGB image for which a depth image is first collected is the second frame of image, or a frame after the second frame, within the preset initialization sliding window. The method includes:
Operation 1302: the VIO system starts and initializes;
Operation 1304: collecting RGB images, IMU data, and depth images;
Operation 1306: performing de-distortion and alignment processing on the collected RGB images and depth images;
Operation 1308: counting the frame number of each collected RGB image as frame_count;
Operation 1310: judging whether the currently collected RGB image has a corresponding depth image (Depth map) and the Depth map is the first one collected for the current environment; if so, proceeding to operation 1312; if not, proceeding to operation 1318;
Operation 1312: setting the flag indicating that the Depth map will be used for pose computation, start_depth_init = true;
Operation 1314: setting the image coordinate system of this RGB image as the world coordinate system, taking the pose of this RGB image as the initial pose and setting the initial pose to 0, and recording the frame_count of this RGB image as first_depth;
Operation 1316: when the next frame of RGB image is collected, updating frame_count to frame_count + 1;
Operation 1318: judging whether first_depth is smaller than the sliding window size windowsize (10 frames); if so, proceeding to operation 1320; if not, proceeding to operation 1354;
Operation 1320: judging whether the frame number frame_count + 1 of the current frame equals first_depth + 1; if so, proceeding to operation 1322; if not, proceeding to operation 1334;
Operation 1322: computing the first pose between the first_depth frame and the first_depth + 1 frame using the preset perspective projection PnP algorithm;
Operation 1324: projecting the 3D feature points on the Depth map onto the first_depth + 1 frame according to the first pose to generate projected 2D feature points, and computing the reprojection error between the projected 2D feature points and the 2D feature points on the first_depth + 1 frame;
Operation 1326: if the reprojection error is smaller than the preset error threshold, taking the 3D feature points on the Depth map as target map points, and constructing an initial three-dimensional map of the current environment according to the target map points;
Operation 1328: computing the second pose transformation matrix between an RGB image preceding the first_depth frame and the first_depth frame;
Operation 1330: projecting the 3D feature points on the Depth map onto the RGB images preceding the first_depth frame according to the second pose transformation matrix to generate projected 2D feature points, and computing the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images preceding the first_depth frame;
Operation 1332: if the reprojection error is smaller than the preset error threshold, taking the 3D feature points on the Depth map as new target map points, and adding the new target map points to the initial three-dimensional map to generate the target three-dimensional map;
Operation 1334: generating 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame according to the target three-dimensional map, the current frame, and the RGB images preceding the current frame;
Operation 1336: computing the pose of the current frame with the conventional PnP algorithm based on the 3D-2D feature point pairs, and proceeding to operation 1354 to output the pose;
Operation 1338: judging whether the current frame has a corresponding Depth map; if so, proceeding to operation 1340; if not, proceeding to operation 1344;
Operation 1340: updating the target three-dimensional map to generate an intermediate three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images preceding the current frame within the preset sliding window;
Operation 1342: updating the intermediate three-dimensional map by the triangulation method to generate an updated three-dimensional map;
Operation 1344: updating the target three-dimensional map by the triangulation method to generate an updated three-dimensional map;
Operation 1346: judging whether frame_count equals windowsize (10 frames); if so, proceeding to operation 1348;
Operation 1348: performing BA optimization on the computed poses corresponding to the 10 frames of images;
Operation 1350: performing IMU initialization based on the BA-optimized poses corresponding to the 10 frames of images;
Operation 1352: computing the poses of the RGB images collected after the preset sliding window according to the initial pose, the target three-dimensional map, and the initialization information of the IMU;
Operation 1354: computing poses with the VINS-MONO algorithm.
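The branching of operations 1310 through 1354 above can be condensed into the following sketch (the variable names mirror the flowchart's flags, and the returned strings merely name which branch of the flow would run; this is a reading aid, not the disclosed control flow):

```python
def dispatch(frame_count, first_depth, is_first_depth_map, window_size=10):
    # Operation 1310: first depth image of the current environment?
    if is_first_depth_map:
        return "1312-1316: set flag, world frame, initial pose = 0, first_depth"
    # Operation 1318: did the first depth image arrive late (or never)?
    if first_depth is None or first_depth >= window_size:
        return "1354: fall back to VINS-MONO"
    # Operation 1320: is this the frame right after the first depth frame?
    if frame_count + 1 == first_depth + 1:
        return "1322-1332: stepwise PnP, build initial and target 3D maps"
    return "1334-1344: PnP against the target map, then update the map"
```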
In the embodiment of the present application, the electronic device does not have to wait until the preset initialization sliding window is filled with RGB images before it can output the pose for the first frame of RGB image collected outside the window. Instead, as soon as a depth image of the current environment is first collected within the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame of image within the window, the pose of the electronic device when collecting the depth image can be determined as the initial pose, and the pose of the electronic device when collecting the next frame of RGB image can be determined according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image. The moment of the first pose output is thus moved greatly forward, from after the preset initialization sliding window to within it, namely to the frame following the first received depth image. Poses can therefore be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time pose output.
If no depth image of the current environment is collected within the preset initialization sliding window, or the target RGB image corresponding to the depth image is the last frame of image within the window, the conventional VINS-MONO algorithm is used to compute, from the RGB images collected within the preset initialization sliding window, the poses of the electronic device when collecting RGB images after the window. This guarantees that, even when no depth image of the current environment is collected within the preset initialization sliding window, the poses of the electronic device when collecting RGB images can still be output in real time once 10 frames of RGB images have accumulated within the sliding window.
Meanwhile, it is not necessary to collect a depth image for every frame of RGB image before poses can be output in real time. The depth maps can be collected at a lower frequency than the RGB images, or at a variable frequency, and initialization can start as soon as a depth map is collected within the sliding window, with poses output in real time thereafter. Because the depth maps can be collected at a lower or variable frequency, initialization and real-time pose output can be based on low-frequency depth maps, which avoids collecting and processing large amounts of data and thereby reduces the power consumption of the electronic device.
In one embodiment, as shown in FIG. 14, a pose calculation apparatus 1400 is provided. The apparatus includes:
an initial pose determination module 1420, configured to determine, if a depth image of the current environment is collected for the first time within a preset initialization sliding window, the pose of the electronic device when collecting the depth image as an initial pose, wherein the target RGB image corresponding to the depth image is not the last frame of image within the preset initialization sliding window; and
a next-frame RGB image pose determination module 1440, configured to determine the pose of the electronic device when collecting the next frame of RGB image according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image after the target RGB image.
In one embodiment, as shown in FIG. 15, a pose calculation apparatus 1400 is provided. The apparatus further includes:
a target three-dimensional map construction module 1460, configured to construct a target three-dimensional map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image; and
an other-RGB-image pose determination module 1480, configured to compute, according to the target three-dimensional map, the poses of the electronic device when collecting other RGB images located after the next frame of RGB image within the preset initialization sliding window.
In one embodiment, the target three-dimensional map construction module 1460 further includes:
an initial three-dimensional map construction unit, configured to construct an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image; and
a target three-dimensional map construction unit, configured to compute a second pose transformation matrix between an RGB image preceding the target RGB image and the target RGB image, and to update the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image to construct the target three-dimensional map of the current environment.
In one embodiment, a pose calculation apparatus 1400 is provided. The apparatus further includes:
a pose transformation matrix computation unit, further configured to compute the first pose transformation matrix or the second pose transformation matrix using a preset perspective projection PnP algorithm,
wherein the preset perspective projection PnP algorithm is used to compute, step by step, the rotation transformation matrix and the translation transformation matrix between a related RGB image of the target RGB image and the target RGB image, the related RGB image of the target RGB image being the RGB image preceding the target RGB image or the next frame of RGB image.
In one embodiment, the pose transformation matrix computation unit further includes:
a rotation transformation matrix computation subunit, configured to compute the rotation transformation matrix between the related RGB image of the target RGB image and the target RGB image;
a translation transformation matrix computation subunit, configured to compute the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix; and
a pose transformation matrix computation subunit, configured to generate the first pose transformation matrix or the second pose transformation matrix between the related RGB image of the target RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
In one embodiment, the translation transformation matrix computation subunit is further configured to: cull, according to the rotation transformation matrix, the mutually matched 2D-2D feature point pairs between the related RGB image of the target RGB image and the target RGB image, obtaining culled 2D-2D feature point pairs; convert the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image; and compute the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image according to the 3D-2D feature point pairs.
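A hedged sketch of such a stepwise computation is given below: the rotation is estimated from 2D-2D correspondences via the essential matrix, rotation-inconsistent pairs are culled using the RANSAC inlier mask (an assumed culling rule), the survivors are lifted to 3D-2D pairs through their depth-derived 3D points, and the translation is then solved linearly with the rotation held fixed. It is an illustration in the spirit of this unit, not the disclosed algorithm:

```python
import cv2
import numpy as np

def stepwise_pose(pts_prev, pts_curr, pts3d_prev, K):
    """pts_prev/pts_curr: Nx2 matched 2D points in the two RGB images;
    pts3d_prev: Nx3 depth-derived 3D points aligned with pts_prev."""
    # Step 1: rotation from 2D-2D epipolar geometry.
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, _, mask = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    keep = mask.ravel().astype(bool)   # cull rotation-inconsistent pairs
    # Step 2: translation from 3D-2D pairs with R fixed. For a normalized
    # observation x and 3D point X: [x]_x (R X + t) = 0, linear in t.
    Kinv = np.linalg.inv(K)
    A_rows, b_rows = [], []
    for X, uv in zip(pts3d_prev[keep], pts_curr[keep]):
        x = Kinv @ np.array([uv[0], uv[1], 1.0])
        skew = np.array([[0, -x[2], x[1]],
                         [x[2], 0, -x[0]],
                         [-x[1], x[0], 0]])  # skew-symmetric [x]_x
        A_rows.append(skew)
        b_rows.append(-skew @ (R @ X))
    t, *_ = np.linalg.lstsq(np.vstack(A_rows), np.hstack(b_rows), rcond=None)
    return R, t
```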
In one embodiment, the initial three-dimensional map construction unit is further configured to: project the 3D feature points on the depth image onto the next frame of RGB image according to the first pose transformation matrix to generate projected 2D feature points; compute the reprojection error between the projected 2D feature points and the 2D feature points on the next frame of RGB image; take the 3D feature points on the depth image as target map points if the reprojection error is smaller than a preset error threshold; and construct the initial three-dimensional map of the current environment according to the target map points.
In one embodiment, the target three-dimensional map construction unit is further configured to: project the 3D feature points on the depth image respectively onto the RGB images preceding the target RGB image according to the second pose transformation matrix to generate projected 2D feature points; compute the reprojection error between the projected 2D feature points and the 2D feature points on the RGB images preceding the target RGB image; take the 3D feature points on the depth image as new target map points if the reprojection error is smaller than a preset error threshold; and add the new target map points to the initial three-dimensional map to construct the target three-dimensional map of the current environment.
In one embodiment, the other-RGB-image pose determination module 1480 includes:
a current frame definition unit, configured to take the frame after the next frame of RGB image as the current frame and perform the following target operations;
a target operation unit, configured to generate 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame according to the target three-dimensional map, the current frame, and the RGB images preceding the current frame, compute the pose of the current frame based on the 3D-2D feature point pairs, update the target three-dimensional map according to the current frame, and take the updated three-dimensional map as the new target three-dimensional map; and
a loop unit, configured to take the frame after the current frame as the new current frame and perform the target operations in a loop until the pose of the last frame of RGB image within the preset initialization sliding window has been computed.
In one embodiment, the target operation unit is further configured to: obtain mutually matched 2D feature point pairs from the current frame and the RGB images preceding the current frame; obtain, from the map points in the target three-dimensional map, the 3D feature points that match the 2D feature point pairs; and generate, according to the matched 3D feature points and the 2D feature point pairs, the 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame.
In one embodiment, if the current frame has a corresponding depth image, the target operation unit is further configured to: update the target three-dimensional map to generate an intermediate three-dimensional map according to the depth image corresponding to the current frame and the pose transformation matrix between the current frame and the RGB images preceding the current frame within the preset sliding window; and update the intermediate three-dimensional map by the triangulation method to generate the updated three-dimensional map.
In one embodiment, if the current frame has no corresponding depth image, the target operation unit is further configured to update the target three-dimensional map by the triangulation method to generate the updated three-dimensional map.
In one embodiment, as shown in FIG. 16, a pose calculation apparatus 1600 is provided. The apparatus further includes:
an IMU data acquisition module 1620, configured to acquire the IMU data collected within the preset sliding window;
an IMU initialization module 1640, configured to compute the initialization information of the IMU according to the poses of the frames of RGB images within the preset sliding window and the IMU data, the initialization information including the initial velocity, the zero bias of the IMU, and the gravity vector of the IMU; and
a first pose computation module 1660, configured to compute the poses of the RGB images collected after the preset sliding window according to the initial pose, the target three-dimensional map, and the initialization information of the IMU.
In one embodiment, a pose calculation apparatus is provided. The apparatus further includes:
a second pose computation module, configured to compute, if no depth image of the current environment is collected within the preset initialization sliding window, the initial pose and an initial three-dimensional map according to the RGB images collected within the preset initialization sliding window, and
to compute, according to the initial pose and the initial three-dimensional map, the poses of the electronic device when collecting RGB images after the preset initialization sliding window.
In one embodiment, a pose calculation apparatus is provided. The apparatus further includes:
a third pose computation module, configured to compute, if the target RGB image corresponding to the depth image is the last frame of image within the preset initialization sliding window, the initial pose and an initial three-dimensional map according to the RGB images collected within the preset initialization sliding window, and
to compute, according to the initial pose and the initial three-dimensional map, the poses of the electronic device when collecting RGB images after the preset sliding window.
It should be understood that although the operations in the flowcharts in the figures are displayed sequentially as indicated by the arrows, these operations are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these operations is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the operations in the figures may include multiple sub-operations or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
The division of the modules in the pose calculation apparatus is only for illustration. In other embodiments, the pose calculation apparatus may be divided into different modules as required to complete all or part of the functions of the pose calculation apparatus.
For the specific limitations of the pose calculation apparatus, reference may be made to the limitations of the pose calculation method above, which are not repeated here. Each module in the pose calculation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in or independent of a processor in a computer device in the form of hardware, or stored in a memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, an electronic device is further provided, including a memory and a processor. A computer program is stored in the memory, and when the computer program is executed by the processor, the processor performs the operations of the pose calculation method provided in each of the above embodiments.
FIG. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in FIG. 17, the electronic device includes a processor and a memory connected through a system bus. The processor provides computing and control capabilities to support the operation of the entire electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the pose calculation method provided in each of the above embodiments. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, or a wearable device.
The modules in the pose calculation apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on an electronic device. The program modules constituted by the computer program may be stored in the memory of the electronic device. When the computer program is executed by the processor, the operations of the methods described in the embodiments of the present application are implemented.
The embodiments of the present application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions which, when executed by one or more processors, cause the processors to perform the operations of the pose calculation method.
A computer program product containing instructions which, when run on a computer, causes the computer to perform the pose calculation method.
Any reference to a memory, storage, database, or other medium used in the embodiments of the present application may include a non-volatile and/or volatile memory. A suitable non-volatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. A volatile memory may include a random access memory (RAM), which serves as an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the patent of the present application shall be subject to the appended claims.

Claims (19)

  1. A pose calculation method, characterized in that the method comprises:
    if a depth image of a current environment is collected for the first time within a preset initialization sliding window, determining a pose of an electronic device when collecting the depth image as an initial pose, wherein a target RGB image corresponding to the depth image is not a last frame of image within the preset initialization sliding window; and
    determining, according to the initial pose, the target RGB image, the depth image, and a next frame of RGB image after the target RGB image, a pose of the electronic device when collecting the next frame of RGB image.
  2. The pose calculation method according to claim 1, characterized in that the method further comprises:
    constructing a target three-dimensional map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image; and
    computing, according to the target three-dimensional map, poses of the electronic device when collecting other RGB images located after the next frame of RGB image within the preset initialization sliding window.
  3. The pose calculation method according to claim 2, characterized in that the constructing the target three-dimensional map of the current environment according to the pose of the electronic device when collecting the next frame of RGB image and the depth image comprises:
    computing a first pose transformation matrix between the next frame of RGB image and the target RGB image, and constructing the target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image.
  4. The pose calculation method according to claim 3, characterized in that the constructing the target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image comprises:
    constructing an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image; and
    computing a second pose transformation matrix between an RGB image located before the target RGB image and the target RGB image, and updating the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image to construct the target three-dimensional map of the current environment.
  5. The pose calculation method according to claim 4, characterized in that the method further comprises:
    computing the first pose transformation matrix or the second pose transformation matrix by using a preset perspective projection PnP algorithm,
    wherein the preset perspective projection PnP algorithm is used to compute, step by step, a rotation transformation matrix and a translation transformation matrix between a related RGB image of the target RGB image and the target RGB image, the related RGB image of the target RGB image being the RGB image located before the target RGB image or the next frame of RGB image.
  6. The pose calculation method according to claim 5, characterized in that the computing the first pose transformation matrix or the second pose transformation matrix by using the preset perspective projection PnP algorithm comprises:
    computing the rotation transformation matrix between the related RGB image of the target RGB image and the target RGB image;
    computing the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix; and
    generating the first pose transformation matrix or the second pose transformation matrix between the related RGB image of the target RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.
  7. The pose calculation method according to claim 6, characterized in that the computing the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix comprises:
    culling, according to the rotation transformation matrix, mutually matched 2D-2D feature point pairs between the related RGB image of the target RGB image and the target RGB image to obtain culled 2D-2D feature point pairs;
    converting the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image; and
    computing the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image according to the 3D-2D feature point pairs.
  8. The pose calculation method according to claim 4, characterized in that the constructing the initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image comprises:
    projecting 3D feature points on the depth image onto the next frame of RGB image according to the first pose transformation matrix to generate projected 2D feature points;
    computing a reprojection error between the projected 2D feature points and 2D feature points on the next frame of RGB image;
    taking the 3D feature points on the depth image as target map points if the reprojection error is smaller than a preset error threshold; and
    constructing the initial three-dimensional map of the current environment according to the target map points.
  9. The pose calculation method according to any one of claims 4 to 8, characterized in that the updating the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image to construct the target three-dimensional map of the current environment comprises:
    projecting the 3D feature points on the depth image respectively onto the RGB images located before the target RGB image according to the second pose transformation matrix to generate projected 2D feature points;
    computing a reprojection error between the projected 2D feature points and 2D feature points on the RGB images located before the target RGB image;
    taking the 3D feature points on the depth image as new target map points if the reprojection error is smaller than a preset error threshold; and
    adding the new target map points to the initial three-dimensional map to construct the target three-dimensional map of the current environment.
  10. The pose calculation method according to claim 2, characterized in that the computing, according to the target three-dimensional map, the poses of the electronic device when collecting the other RGB images located after the next frame of RGB image within the preset initialization sliding window comprises:
    taking a frame after the next frame of RGB image as a current frame, and performing the following target operations:
    generating 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame according to the target three-dimensional map, the current frame, and RGB images located before the current frame;
    computing a pose of the current frame based on the 3D-2D feature point pairs; and
    updating the target three-dimensional map according to the current frame, and taking the updated three-dimensional map as a new target three-dimensional map; and
    taking a frame after the current frame as a new current frame, and performing the target operations in a loop until a pose of a last frame of RGB image within the preset initialization sliding window is computed.
  11. The pose calculation method according to claim 10, characterized in that the generating the 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame according to the target three-dimensional map, the current frame, and the RGB images located before the current frame comprises:
    obtaining mutually matched 2D feature point pairs from the current frame and the RGB images located before the current frame; and
    obtaining, from map points in the target three-dimensional map, 3D feature points matched with the 2D feature point pairs, and generating, according to the matched 3D feature points and the 2D feature point pairs, the 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame.
  12. The pose calculation method according to claim 10, characterized in that, if the current frame has a corresponding depth image, the updating the target three-dimensional map according to the current frame to generate the updated three-dimensional map comprises:
    根据所述当前帧对应的深度图像、所述当前帧与预设滑窗内所述当前帧之前的RGB图像之间的位姿变换矩阵,对所述目标三维地图进行更新生成中间三维地图;updating the target three-dimensional map to generate an intermediate three-dimensional map according to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window;
    采用三角化法对所述中间三维地图进行更新,生成所述更新后的三维地图。The intermediate three-dimensional map is updated by using a triangulation method to generate the updated three-dimensional map.
  13. 根据权利要求10所述的位姿计算方法,其特征在于,若所述当前帧不存在对应的深度图像,则所述根据所述当前帧对所述目标三维地图进行更新,生成更新后的三维地图,包括:The pose calculation method according to claim 10, wherein if there is no corresponding depth image in the current frame, the target 3D map is updated according to the current frame to generate an updated 3D map Maps, including:
    采用三角化法对所述目标三维地图进行更新,生成更新后的三维地图。The three-dimensional map of the target is updated by using a triangulation method to generate an updated three-dimensional map.
  14. 根据权利要求2所述的位姿计算方法,其特征在于,在所述根据所述目标三维地图,计算所述电子设备在所述预设初始化滑窗内采集位于所述下一帧RGB图像之后的其他RGB图像时的位姿之后,还包括:The pose calculation method according to claim 2, characterized in that, according to the three-dimensional map of the target, the calculation is performed by the electronic device after the acquisition of the next frame of RGB image in the preset initialization sliding window After the pose of the other RGB images, also include:
    获取在所述预设滑窗内所采集的IMU数据;Obtain the IMU data collected within the preset sliding window;
    根据所述预设滑窗内各帧RGB图像的位姿、所述IMU数据,计算所述IMU的初始化信息;所述初始化信息包括初始速度、所述IMU的零偏及所述IMU的重力向量;Calculate the initialization information of the IMU according to the pose of each frame of the RGB image in the preset sliding window and the IMU data; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU ;
    根据所述初始位姿、所述目标三维地图及所述IMU的初始化信息,计算所述预设滑窗之后所采集的RGB图像的位姿。According to the initial pose, the target three-dimensional map, and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.
  15. 根据权利要求1所述的位姿计算方法,其特征在于,所述方法还包括:The pose calculation method according to claim 1, wherein the method further comprises:
    若在所述预设初始化滑窗内未采集到当前环境的深度图像,则根据所述预设初始化滑窗内所采集的RGB图像,计算所述初始位姿及初始三维地图;If the depth image of the current environment is not collected in the preset initialization sliding window, then calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window;
    根据所述初始位姿及所述初始三维地图,计算所述预设初始化滑窗之后所述电子设备采集RGB图像时的位姿。According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset initialization sliding window.
  16. 根据权利要求1所述的位姿计算方法,其特征在于,所述方法还包括:The pose calculation method according to claim 1, wherein the method further comprises:
    若所述深度图像对应的目标RGB图像为所述预设初始化滑窗内的最后一帧图像,则根据所述预设初始化滑窗内所采集的所述RGB图像,计算所述初始位姿及初始三维地图;If the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, then calculate the initial pose and pose according to the RGB image collected in the preset initialization sliding window initial 3D map;
    根据所述初始位姿及所述初始三维地图,计算所述预设滑窗之后所述电子设备采集RGB图像时的位姿。According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset sliding window.
  17. A pose calculation apparatus, wherein the apparatus comprises:
    an initial pose determination module, configured to: if a depth image of the current environment is collected for the first time within a preset initialization sliding window, determine the pose of the electronic device when collecting the depth image as an initial pose, wherein the target RGB image corresponding to the depth image is not the last frame image within the preset initialization sliding window; and
    a next-frame RGB image pose determination module, configured to determine, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image.
  18. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, causes the processor to perform the operations of the pose calculation method according to any one of claims 1 to 16.
  19. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the operations of the pose calculation method according to any one of claims 1 to 16.
PCT/CN2022/098295 2021-07-29 2022-06-13 Pose calculation method and apparatus, electronic device, and readable storage medium WO2023005457A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110866966.3A CN113610918A (en) 2021-07-29 2021-07-29 Pose calculation method and device, electronic equipment and readable storage medium
CN202110866966.3 2021-07-29

Publications (1)

Publication Number Publication Date
WO2023005457A1 true WO2023005457A1 (en) 2023-02-02

Family

ID=78306042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098295 WO2023005457A1 (en) 2021-07-29 2022-06-13 Pose calculation method and apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113610918A (en)
WO (1) WO2023005457A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610918A (en) * 2021-07-29 2021-11-05 Oppo广东移动通信有限公司 Pose calculation method and device, electronic equipment and readable storage medium


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190221000A1 (en) * 2017-01-16 2019-07-18 Shapetrace Inc. Depth camera 3d pose estimation using 3d cad models
CN107610175A (en) * 2017-08-04 2018-01-19 华南理工大学 The monocular vision SLAM algorithms optimized based on semi-direct method and sliding window
CN111160298A (en) * 2019-12-31 2020-05-15 深圳市优必选科技股份有限公司 Robot and pose estimation method and device thereof
CN112164117A (en) * 2020-09-30 2021-01-01 武汉科技大学 V-SLAM pose estimation method based on Kinect camera
CN112435206A (en) * 2020-11-24 2021-03-02 北京交通大学 Method for reconstructing three-dimensional information of object by using depth camera
CN112907620A (en) * 2021-01-25 2021-06-04 北京地平线机器人技术研发有限公司 Camera pose estimation method and device, readable storage medium and electronic equipment
CN112819860A (en) * 2021-02-18 2021-05-18 Oppo广东移动通信有限公司 Visual inertial system initialization method and device, medium and electronic equipment
CN113610918A (en) * 2021-07-29 2021-11-05 Oppo广东移动通信有限公司 Pose calculation method and device, electronic equipment and readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237553A (en) * 2023-09-14 2023-12-15 广东省核工业地质局测绘院 Three-dimensional map mapping system based on point cloud image fusion
CN117419690A (en) * 2023-12-13 2024-01-19 陕西欧卡电子智能科技有限公司 Pose estimation method, device and medium of unmanned ship
CN117419690B (en) * 2023-12-13 2024-03-12 陕西欧卡电子智能科技有限公司 Pose estimation method, device and medium of unmanned ship

Also Published As

Publication number Publication date
CN113610918A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN109727288B (en) System and method for monocular simultaneous localization and mapping
US11668571B2 (en) Simultaneous localization and mapping (SLAM) using dual event cameras
WO2023005457A1 (en) Pose calculation method and apparatus, electronic device, and readable storage medium
CN107990899B (en) Positioning method and system based on SLAM
CN110246147B (en) Visual inertial odometer method, visual inertial odometer device and mobile equipment
CN107747941B (en) Binocular vision positioning method, device and system
US10371529B2 (en) Computational budget estimation for vision-aided inertial navigation systems
CN109506642B (en) Robot multi-camera visual inertia real-time positioning method and device
WO2019157925A1 (en) Visual-inertial odometry implementation method and system
WO2020206903A1 (en) Image matching method and device, and computer readable storage medium
Saurer et al. Homography based visual odometry with known vertical direction and weak manhattan world assumption
WO2019104571A1 (en) Image processing method and device
US11380078B2 (en) 3-D reconstruction using augmented reality frameworks
US20210183100A1 (en) Data processing method and apparatus
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
Honegger et al. Embedded real-time multi-baseline stereo
CN111623773B (en) Target positioning method and device based on fisheye vision and inertial measurement
CN114013449A (en) Data processing method and device for automatic driving vehicle and automatic driving vehicle
CN113190120B (en) Pose acquisition method and device, electronic equipment and storage medium
CN113610702B (en) Picture construction method and device, electronic equipment and storage medium
US20120093393A1 (en) Camera translation using rotation from device
CN113570716A (en) Cloud three-dimensional map construction method, system and equipment
CN113034347A (en) Oblique photographic image processing method, device, processing equipment and storage medium
CN114092564B (en) External parameter calibration method, system, terminal and medium for non-overlapping vision multi-camera system
CN117710591A (en) Space map processing method, device, electronic equipment, medium and program product

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22848066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE