WO2023005457A1

WO2023005457A1 - Pose calculation method and apparatus, electronic device, and readable storage medium

Info

Publication number: WO2023005457A1
Application number: PCT/CN2022/098295
Authority: WO
Inventors: 尹赫
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-07-29
Filing date: 2022-06-13
Publication date: 2023-02-02
Also published as: CN113610918A

Abstract

The present application relates to a pose calculation method and apparatus, an electronic device, and a computer readable storage medium. The method comprises: if a depth image of a current environment is collected in a preset initialization sliding window for the first time, determining the pose of the electronic device in collecting the depth image as an initial pose, wherein a target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window (220); and determining, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, the pose of the electronic device in collecting the next frame of RGB image (240).

Description

Pose calculation method and device, electronic device, readable storage medium

This application claims the priority of the Chinese patent application submitted to the China Patent Office on July 29, 2021 with the application number 202110866966.3. Incorporated in this application by reference.

technical field

The present application relates to the technical field of computer vision, in particular to a pose calculation method and device, electronic equipment, and a readable storage medium.

Background technique

The position and attitude of electronic devices in an unknown environment (referred to as pose) is one of the key technologies in industries such as augmented reality, virtual reality, mobile robots, and unmanned driving. And with the rapid development of these industries, higher and higher requirements are put forward for the accuracy of the positioning of objects in the surrounding environment by electronic devices. When using the VIO (Visual-Inertial Odometry, visual-inertial odometer) system to locate objects in the surrounding environment, one of its core operations is to determine the pose of the electronic device when capturing each frame of image.

In the traditional method, when determining the pose of the electronic device when capturing each frame of image, it is necessary to collect a certain number of RGB images or depth images of the surrounding environment based on the camera on the electronic device, so as to be able to The image is initialized, and then the pose of the electronic device is output in real time when each frame of image is captured. Obviously, it takes a long time to collect a certain number of RGB images and/or depth images, which in turn leads to a long waiting time for real-time output poses.

Contents of the invention

Embodiments of the present application provide a pose calculation method and device, electronic equipment, and a readable storage medium, which can reduce the waiting time for real-time output pose.

A pose calculation method, the method comprising:

If the depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein, the target RGB image corresponding to the depth image is not the Default initialization of the last frame image in the sliding window;

According to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.

A pose computing device, the device comprising:

The initial pose determination module is used to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein, the depth image The corresponding target RGB image is not the last frame image in the preset initialization sliding window;

A pose determination module, configured to determine that the electronic device acquires the next frame of RGB images according to the initial pose, the target RGB image, the depth image, and the next frame of RGB images of the target RGB image time pose.

An electronic device includes a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the operation of the pose calculation method as described above.

A computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the operation of the pose calculation method as described above is realized.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present application. Those skilled in the art can also obtain other drawings based on these drawings without creative work.

Fig. 1 is an application environment diagram of a pose calculation method in an embodiment;

Fig. 2 is the flow chart of pose computing method in an embodiment;

Fig. 3 is the flow chart of pose calculation method in another embodiment;

Fig. 4 is a schematic diagram of constructing a target three-dimensional map of the current environment if the target RGB image is not the first frame image in the preset initialization sliding window in one embodiment;

Fig. 5 adopts preset perspective projection PnP algorithm in one embodiment, calculates the flow chart of the first pose transformation matrix method between next frame RGB image and target RGB image;

FIG. 6 is a schematic diagram of calculating a pose transformation matrix using a preset perspective projection PnP algorithm in an embodiment;

Fig. 7 is the flowchart of calculating the first translation transformation matrix method between the next frame RGB image and the target RGB image in Fig. 5;

FIG. 8 is a flowchart of a method for constructing a target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image in one embodiment;

Fig. 9 is a flow chart of the method for calculating the pose of the electronic device when the RGB image after the next frame of RGB image is collected in the preset initialization sliding window according to the three-dimensional map of the target in Fig. 3;

Fig. 10 is a schematic diagram of the pose and three-dimensional map calculation when calculating the RGB image after collecting the next frame of RGB image in one embodiment;

Fig. 11 is a flowchart of a pose calculation method in another embodiment;

Fig. 12 is a schematic diagram of a pose calculation method in yet another embodiment;

Fig. 13 is a flowchart of a pose calculation method in a specific embodiment;

Fig. 14 is a structural block diagram of a pose calculation device in an embodiment;

Fig. 15 is a structural block diagram of a pose calculation device in an embodiment;

Fig. 16 is a structural block diagram of a pose calculation device in another embodiment;

Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment.

Detailed ways

In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

One of the traditional methods uses the VINS (visual-inertial system, visual-inertial system)-MONO algorithm when determining the pose of the electronic device when capturing each frame of image. The specific operation of the VINS-MONO algorithm includes: assuming that there are 10 frames of images in the preset initialization sliding window. Of course, the size of the preset initialization sliding window is not specifically limited in this application. Then, the first step is to collect RGB images through the camera on the electronic device. After accumulating 10 frames of RGB images in the preset initialization sliding window, two frames of images with parallax satisfying the conditions are selected from the 10 frames of RGB images (for example, L frame and R frame), and then use the epipolar geometric constraints to calculate the pose between these two frames. In the second step, the pose is used to restore the map points that are co-viewed between the two frames using the triangulation method. The third step is to project these map points onto any frame in the above 10 frames of RGB images except L frame and R frame, and calculate the pose of any frame by minimizing the reprojection error . The fourth step is to use the triangulation method between the arbitrary frame and the L frame and the R frame to restore the map points that are co-viewed between the arbitrary frame and the L frame and the R frame. By repeating the third and fourth steps, the poses of the 10 frames of RGB images in the preset initialization sliding window and the map points corresponding to these 10 frames of RGB images can be calculated.

There is another traditional method. When determining the pose of the electronic device when capturing each frame of image, the VINS-RGBD algorithm is used. The specific operations of the VINS-RGBD algorithm include: assuming that there are 10 frames of images in the preset initialization sliding window, Then, in the first step, the camera on the electronic device collects the RGB image and the depth image at the same time, and it is necessary to ensure that each frame of the RGB image has a corresponding depth image. The correspondence here means that the RGB image is aligned with the depth image in time and space. In the second step, after accumulating 10 frames of RGB images and corresponding depth images in the preset initialization sliding window, two frames of images (such as L frame and R frame) are screened out from these 10 frames of RGB images. The traditional PnP algorithm is used to calculate the pose of the L frame at the same time, and then combined with the depth image corresponding to the L frame, the reprojection error meets the preset threshold to filter the effective map points, and then restore the map points. The third step is to project these map points onto any frame in the above 10 frames of RGB images except L frame and R frame, and calculate the pose of any frame by minimizing the reprojection error . The fourth step is to use the depth image of any frame and the L frame to recover the map points that can be co-viewed and the reprojection error meets the threshold requirement. By repeating the third and fourth steps, the poses of the 10 frames of RGB images in the preset initialization sliding window and the map points corresponding to these 10 frames of RGB images can be calculated.

In the above two traditional methods, when outputting the pose in real time, it is necessary to collect a certain number of RGB images or depth images of the surrounding environment based on the camera on the electronic device, so that the real-time output can be based on the certain number of RGB images or depth images. The pose of the electronic device when capturing each frame of image. Obviously, it takes a long time to collect a certain number of RGB images and/or depth images, which in turn leads to a long waiting time for real-time output poses.

Therefore, in order to solve the problem of long waiting time for real-time output pose, a pose calculation method is proposed in the embodiment of the present application. It is not necessary to collect full RGB images in the preset initialization sliding window before the electronic device can output the collected images. Preset the pose when initializing the first frame of RGB image outside the sliding window. Instead, when the depth image of the current environment is collected for the first time in the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the electronic device's The pose is determined as the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, and the pose of the electronic device when the next frame of RGB image is collected is determined. Obviously, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time output pose.

Fig. 1 is an application scene diagram of a pose calculation method in an embodiment. As shown in FIG. 1 , the application environment includes an electronic device 120, and the electronic device 120 includes a first camera and a second camera. Wherein, the first camera is an RGB camera, and the second camera is a camera for collecting depth images, for example, a TOF (Time-offlight, time-of-flight) camera or a structured light camera, which is not limited in this application. The electronic device 120 collects the RGB image and the depth image of the current environment respectively through the first camera and the second camera. If the electronic device 120 collects the depth image of the current environment for the first time within the preset initialization sliding window, the electronic device will collect the depth image when the depth image is collected. The pose of is determined as the initial pose; wherein, the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window. According to the initial pose, the target RGB image, the depth image and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined. The electronic device 120 can be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant, a personal digital assistant), a wearable device (smart bracelet, smart watch, smart glasses, smart gloves, smart socks, smart belt, etc.), VR (virtual reality , virtual reality) devices, smart homes, driverless cars and other arbitrary terminal devices.

Fig. 2 is a flowchart of a pose calculation method in an embodiment. The pose calculation method in this embodiment is described by taking the operation on the electronic device 120 in FIG. 1 as an example, and the electronic device 120 includes a first camera and a second camera. Wherein, the first camera is an RGB camera, and the second camera is a camera for collecting depth images. The pose calculation method includes operation 220-operation 240, wherein,

Operation 220, if the depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein, the target RGB image corresponding to the depth image is not the preset initialization The last image frame within the sliding window.

In the traditional method, it is necessary to collect a certain number of image frames before the pose of the electronic device can be output in real time when each frame of image is captured. These certain number of image frames fill up the preset initialization sliding window (hereinafter referred to as the sliding window), that is, the size of the preset initialization sliding window is equal to the duration of collecting a certain number of image frames. For example, if the default initialization sliding window is set to include 10 frames of RGB images, then after collecting 10 frames of RGB images, the 10 frames of RGB images will fill the preset initialization sliding window. At this time, the size of the preset initialization sliding window is the same as The duration of collecting 10 frames of RGB images is equal.

In the embodiment of the present application, the electronic device does not need to collect full RGB images within the preset initialization sliding window before the electronic device can output the pose when collecting the first frame of RGB images outside the preset initialization sliding window. Instead, the RGB image is collected through the first camera in the electronic device, and the depth image is collected through the second camera at the same time. If the depth image of the current environment is collected for the first time in the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the position of the electronic device when the depth image is collected will be The pose is determined as the initial pose, and the image coordinate system of the target RGB image is used as the world coordinate system, which realizes the visual initialization of the electronic device. And there is no need to select two frames of images whose parallax meets the condition from the 10 frames of RGB images, so the adaptability is wider.

First, since the acquisition frequency of the second camera to acquire the depth image may be lower than the acquisition frequency of the first camera to acquire the RGB image, it will not be able to acquire the depth image corresponding to each frame of the RGB image, that is, relative to the RGB image Part of the depth image will be missing. Second, in electronic equipment, often due to the abnormality of the second camera and related software and hardware, some depth images are missing relative to the RGB image or the alignment of the collected RGB image and the depth image is abnormal. All of the above will lead to the inability to collect the corresponding depth image for each frame of RGB image.

Operation 240, according to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.

Determine the pose of the electronic device when collecting the depth image as the initial pose, and use the image coordinate system of the target RGB image as the world coordinate system. After realizing the visual initialization of the electronic device, you can output and collect the next frame of RGB in real time. pose of the image.

Specifically, after determining the initial pose and the world coordinate system, according to the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image. First, the perspective projection PnP algorithm can be used, based on the matching 2D-2D feature point pairs between the target RGB image and the next frame RGB image of the target RGB image; secondly, according to these 2D-2D feature point pairs combined with the depth image. 3D points to obtain matching 3D-2D feature point pairs; again, based on the 3D-2D feature point pairs, calculate the pose transformation matrix between the next frame RGB image of the target RGB image and the target RGB image. Finally, based on the pose transformation matrix and the initial pose, the pose of the next frame of RGB image can be generated. Among them, pose refers to position and attitude, and pose is a six-dimensional matrix, including three positions (X, Y, Z) and three attitude angles (heading, pitch, roll).

Among them, the perspective projection PnP algorithm here can be the traditional perspective projection PnP algorithm, that is, the rotation transformation matrix and the translation transformation matrix between two frames are simultaneously calculated based on the 3D-2D feature point pairs, and the rotation transformation matrix and the translation transformation matrix constitute Pose transformation matrix. Of course, the perspective projection PnP algorithm here may be a new perspective projection PnP algorithm, that is, the rotation transformation matrix and the translation transformation matrix between two frames are calculated step by step based on the 3D-2D feature point pairs.

In the embodiment of the present application, the electronic device can output the pose when collecting the first frame of RGB images outside the preset initialization sliding window without collecting full RGB images within the preset initialization sliding window. Instead, when the depth image of the current environment is collected for the first time in the preset initialization sliding window, and the target RGB image corresponding to the depth image is not the last frame image in the preset initialization sliding window, the electronic device's The pose is determined as the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, and the pose of the electronic device when the next frame of RGB image is collected is determined. The time of the first output pose is greatly advanced from the preset initialization sliding window to the inside of the sliding window and to the next frame when the depth image is first received. Therefore, the pose can be output in real time within the preset initialization sliding window, which reduces the waiting time for real-time output of the pose.

At the same time, it is not necessary to output the pose in real time after the corresponding depth image is collected for each frame of RGB image. Instead, the depth map can be collected at a frequency lower than the frequency of collecting RGB images, or a variable frequency can be used to collect the depth map, and as long as the depth map is collected within the sliding window, the initialization can be started, and in the initialization Then the pose is output in real time. Since the depth map can be collected at a lower frequency or variable frequency, it can be initialized and the pose can be output in real time based on the low-frequency depth map. Therefore, it avoids collecting a large amount of data and avoids processing a large amount of data. Further, Reduced power consumption of electronic equipment.

Continuing from the previous embodiment, after calculating the pose of the electronic device when it captures the next frame of RGB image, sequentially calculate the pose of the electronic device when it collects other RGB images after the next frame of RGB image within the preset initialization sliding window . Then, as shown in Figure 3, a pose calculation method is provided, which also includes:

In operation 260, construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image.

In operation 240, the pose of the electronic device when capturing the next frame of RGB image is calculated, and at this time, the target 3D map of the current environment can be constructed based on the depth image of the target RGB image and the pose of the next frame of RGB image . Wherein, when constructing the target three-dimensional map of the current environment, it can be divided into the following two situations.

One of the cases is assuming that the target RGB image is the first frame image in the preset initialization sliding window, that is, the corresponding depth image is collected for the first frame RGB image in the preset initialization sliding window. Then, because there is no RGB image before the target RGB image, when constructing the target 3D map of the current environment, only the first pose transformation matrix between the next frame RGB image and the target RGB image needs to be calculated, according to With a pose transformation matrix and a depth image, a target 3D map of the current environment can be constructed. Here, if the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly.

In another case, assuming that the target RGB image is not the first frame of image in the preset initialization sliding window, that is, the corresponding depth image is not collected from the first frame of RGB image in the sliding window. Then, when constructing the target 3D map of the current environment, because the target RGB image has also collected RGB images before, the map points can not only be restored based on the target RGB image, but also based on the RGB images collected before the target RGB image To restore the map points, so as to restore more map points to enrich the 3D map of the current environment.

At this time, the first pose transformation matrix between the next frame of RGB image and the target RGB image is calculated similarly, and the initial three-dimensional map of the current environment can be constructed according to the first pose transformation matrix and the depth image. Here, if the first pose transformation matrix has been calculated in the process of calculating the pose of the next frame of RGB image, it only needs to be obtained directly. Then, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map .

Operation 280, according to the target three-dimensional map, calculate the pose of the electronic device when collecting other RGB images located after the next frame of RGB images within the preset initialization sliding window.

After the target three-dimensional map is constructed, the position and orientation of the electronic device when collecting other RGB images after the next RGB image frame within the sliding window can be calculated according to the target three-dimensional map. Since the target 3D map is constructed based on the first frame of RGB image in the sliding window to the next frame of RGB image of the target RGB image, combined with the depth image of the target RGB image, it is based on these RGB images and the depth image. Obviously, the map points on the target three-dimensional map are more comprehensive than the depth image.

Therefore, by calculating the poses of other RGB images located after the next frame of RGB images when the electronic device collects the other RGB images within the preset initialization sliding window, the poses of other RGB images can be directly calculated according to the target three-dimensional map.

In the embodiment of the present application, the target three-dimensional map of the current environment is constructed according to the pose and depth image when the electronic device collects the next frame of RGB image. Since the target 3D map is constructed based on the first frame of RGB image in the sliding window to the next frame of RGB image of the target RGB image, combined with the depth image of the target RGB image, it is obvious that the map points on the target 3D map are relative to the depth image. Said more comprehensively. Therefore, according to the target three-dimensional map, the electronic device calculates the pose of other RGB images located after the next frame of RGB images in the preset initialization sliding window. It greatly improves the calculated pose accuracy of other RGB images after the next RGB image.

In one embodiment, operation 260 is to construct a target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image, including:

Calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, and construct the target 3D map of the current environment according to the first pose transformation matrix and the depth image.

In one embodiment, if the target RGB image is the first frame image in the preset initialization sliding window, there is no RGB image before the target RGB image, so it is only necessary to calculate the first frame image between the next frame RGB image and the target RGB image. A pose transformation matrix. Then, according to the first pose transformation matrix and the depth image, a target 3D map of the current environment is constructed.

In another embodiment, if the target RGB image is not the first frame image within the preset initialization sliding window, there is an RGB image before the target RGB image. Therefore, first, calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, and construct the initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image. Secondly, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map .

In the embodiment of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, only the first pose transformation matrix between the next frame RGB image and the target RGB image needs to be calculated. Then, according to the first pose transformation matrix and the depth image, the target 3D map of the current environment is directly constructed. If the target RGB image is not the first frame image in the preset initialization sliding window, first calculate the first pose transformation matrix between the next frame RGB image and the target RGB image. Then, according to the first pose transformation matrix and the depth image, an initial 3D map of the current environment is constructed. Finally, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map . Since the target 3D map is constructed based on the first frame of RGB image in the sliding window to the next frame of RGB image of the target RGB image, combined with the depth image of the target RGB image, it is obvious that the map points on the target 3D map are relative to the depth image. Said more comprehensively. Therefore, according to the target three-dimensional map, the electronic device calculates the pose of other RGB images located after the next frame of RGB images in the preset initialization sliding window. It greatly improves the calculated pose accuracy of other RGB images after the next RGB image.

In one embodiment, according to the pose and depth image when the electronic device collects the next frame of RGB images, a target three-dimensional map of the current environment is constructed, including:

Construct an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image;

Calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.

As shown in FIG. 4 , it is a schematic diagram of constructing a target three-dimensional map of the current environment if the target RGB image is not the first frame image in the preset initialization sliding window in one embodiment. For example, suppose there are 10 frames of images in the default initialization sliding window, and the depth image is collected at the 4th frame, that is, the target RGB image is the 4th frame of RGB image. Then, when constructing the target 3D map of the current environment at this time, it specifically includes two operations:

In the first step, an initial 3D map of the current environment is constructed. Specifically, calculate the first pose transformation matrix between the next RGB image (frame 5) and the target RGB image (frame 4), and construct the initial three-dimensional image of the current environment based on the first pose transformation matrix and the depth image map. The method of calculating the initial three-dimensional map of the current environment here is the same as if the target RGB image is the first frame image in the preset initialization sliding window, then in operation 260, the pose and depth image of the next frame RGB image are collected according to the electronic device, The method of constructing the target 3D map of the current environment is the same, and will not be repeated here.

In the second step, the initial 3D map is updated to generate the target 3D map. Specifically, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and supplement the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map.

For the RGB image before the target RGB image, that is, for the first, second, and third frames, calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, that is, calculate the fourth frame and The second pose transformation matrix between

frames

1, 2, and 3. Then update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate a target 3D map.

Specifically, first calculate the second pose transformation matrix between the fourth frame and the first frame, and then update the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image to generate an updated three-dimensional map . Next, calculate the second pose transformation matrix between the fourth frame and the second frame, and then update the updated 3D map again according to the second pose transformation matrix and the depth image to generate a reupdated 3D map. Calculate the second pose transformation matrix between the 4th frame and the 3rd frame, and then update the updated 3D map of the current environment for the third time according to the second pose transformation matrix and the depth image to generate the target 3D map .

In the embodiment of the present application, if the target RGB image is not the first frame image in the preset initialization sliding window, when constructing the target three-dimensional map of the current environment, it includes: calculating the first frame image between the next frame RGB image and the target RGB image A pose transformation matrix, constructing an initial 3D map of the current environment according to the first pose transformation matrix and the depth image. Then calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and update the initial 3D map of the current environment according to the second pose transformation matrix and the depth image to generate the target 3D map. In the finally obtained target three-dimensional map, not only the map points corresponding to the next frame of RGB image of the target RGB image are restored, but also the map points of each frame of images before the target RGB image are restored. Therefore, the integrity and accuracy of the target three-dimensional map obtained at this time are greatly improved.

In one embodiment, a pose calculation method is provided, further comprising:

A preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix. Among them, the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image step by step, and the relevant RGB image of the target RGB image is the RGB image before the target RGB image image or the next frame of RGB image.

Wherein, the first pose transformation matrix between the next frame RGB image and the target RGB image is calculated, including:

The preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix (also referred to as pose) between the next frame of RGB image and the target RGB image; wherein, the preset perspective projection PnP algorithm is used for step-by-step calculation Rotation transformation matrix and translation transformation matrix between a frame of RGB image and target RGB image.

Among them, the preset perspective projection PnP algorithm is relative to the traditional perspective projection PnP algorithm. The traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames. . The preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between two frames step by step.

Specifically, when using the preset perspective projection PnP algorithm to calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, first, calculate the first pose transformation matrix between the next frame RGB image and the target RGB image Rotation transformation matrix; secondly, calculate the first translation transformation matrix between the RGB image of the next frame and the target RGB image. And the first rotation transformation matrix and the first translation transformation matrix constitute the first pose transformation matrix.

Similarly, calculate the first pose transformation matrix between the RGB image before the target RGB image and the target RGB image, including:

The preset perspective projection PnP algorithm is used to calculate the second pose transformation matrix (also referred to as pose) between the RGB image before the target RGB image and the target RGB image; wherein, the preset perspective projection PnP algorithm is used to analyze The step is to calculate a second rotation transformation matrix and a second translation transformation matrix between the RGB image before the target RGB image and the target RGB image. The second rotation transformation matrix and the second translation transformation matrix constitute a second pose transformation matrix.

In the embodiment of the present application, when calculating the pose transformation matrix between the RGB image before the target RGB image or between the RGB image of the next frame and the target RGB image, the process of calculating the rotation transformation matrix and the translation transformation matrix is decoupled. It is possible to avoid superimposing the error generated when calculating the rotation transformation matrix with the error generated when calculating the translation transformation matrix. Therefore, the accuracy of the first or second pose transformation matrix finally obtained by adopting the preset perspective projection PnP algorithm to realize the step-by-step calculation is improved.

In one embodiment, a preset perspective projection PnP algorithm is used to calculate the first pose transformation matrix or the second pose transformation matrix, including:

Calculate the rotation transformation matrix between the related RGB image of the target RGB image and the target RGB image;

Calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix;

Based on the rotation transformation matrix and the translation transformation matrix, a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image is generated.

Among them, if the relevant RGB image of the target RGB image is the next frame of RGB image, that is, there is no RGB image before the target RGB image, as shown in Figure 5, using the preset perspective projection PnP algorithm, only the first pose transformation needs to be calculated Matrix, including:

Operation 520, calculating a first rotation transformation matrix between the RGB image of the next frame and the target RGB image.

Because some areas in the collected depth images have no depth information, or some areas have inaccurate depth information, therefore, the matched 2D-2D feature point pairs between two frames of RGB images are generally better than those of one frame. There are more and more comprehensive 3D-2D feature point pairs matched between the depth image and another RGB image.

In conjunction with FIG. 6 , it is a schematic diagram of calculating a pose transformation matrix using a preset perspective projection PnP algorithm in an embodiment. In Fig. 6, image i is the target RGB image, image j is the next frame RGB image of the target RGB image, and the oval frame corresponds to the depth image corresponding to the target RGB image. A pair of image i and image j on the left side of FIG. 6 is a schematic diagram of determining mutually matching 2D-2D feature point pairs in image i and image j and calculating the first rotation transformation matrix R _ij .

Specifically, the matched 2D-2D feature point pairs can be determined between the two frames of the next RGB image and the target RGB image. Specifically, the optical flow method or other image matching methods can be used to determine the 2D-2D feature points between the two frames. Pairs of feature points. Calculate the first rotation transformation matrix R _ij between the next frame of RGB image and the target RGB image through the matching 2D-2D feature point pairs between the next frame of RGB image and the target RGB image, combined with epipolar geometric constraints .

Operation 540: Calculate a first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix.

Specifically, firstly, the matched 2D-2D feature point pairs on the next frame RGB image and the target RGB image are eliminated according to the first rotation transformation matrix, to obtain the eliminated 2D-2D feature point pairs. Secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. Finally, according to the 3D-2D feature point pair, calculate the first translation transformation matrix t _ij between the RGB image of the next frame and the target RGB image.

Operation 560, based on the first rotation transformation matrix and the first translation transformation matrix, generate a first pose transformation matrix between the next frame of RGB image and the target RGB image.

Combining the first rotation transformation matrix R _ij and the first translation transformation matrix t _ij calculated above, the first pose transformation matrix between the next frame of RGB image and the target RGB image is generated.

In the embodiment of the present application, the first rotation transformation matrix between the next frame RGB image and the target RGB image is calculated. According to the depth image and the first rotation transformation matrix, calculate the first translation transformation matrix between the RGB image of the next frame and the target RGB image. Based on the first rotation transformation matrix and the first translation transformation matrix, a first pose transformation matrix between the next frame of RGB image and the target RGB image is generated. The preset perspective projection PnP algorithm is used to realize the step-by-step calculation of the rotation transformation matrix and the translation transformation matrix, avoiding the superposition of the errors generated in the calculation process of the two, and thus improving the accuracy of the final first pose transformation matrix sex.

Continuing from the previous embodiment, as shown in FIG. 7, operation 540 is to calculate the first translation transformation matrix between the next frame RGB image and the target RGB image according to the depth image and the first rotation transformation matrix, including:

Operation 542: Eliminate the matching 2D-2D feature point pairs on the RGB image of the next frame and the target RGB image according to the first rotation transformation matrix, and obtain the 2D-2D feature point pairs after elimination.

As shown in FIG. 6 , image i is the target RGB image, image j is the next frame of RGB image of the target RGB image, and the oval frame corresponds to the depth image corresponding to the target RGB image. As shown in the pair of image i and image j on the right side of Figure 6, in image i and image j, the matching 2D-2D feature point pairs on image i and image j are eliminated according to the first rotation transformation matrix , to get the 2D-2D feature point pairs after elimination.

Operation 544, converting the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.

Combined with the depth image shown in the oval box in FIG. 6, the 2D-2D feature point pairs after elimination are converted into 3D-2D feature point pairs according to the depth image. And use the Ransanc algorithm to eliminate the abnormal point pairs in the 3D-2D feature point pairs, and generate the 3D-2D feature point pairs after elimination.

Operation 546, according to the 3D-2D feature point pair, calculate the first translation transformation matrix between the RGB image of the next frame and the target RGB image.

After calculating the 3D-2D feature point pairs between the target RGB image and the next frame RGB image, the first translation transformation between the next frame RGB image and the target RGB image can be calculated based on the translation transformation matrix calculation formula matrix. The formula for calculating the translation transformation matrix is as follows:

in,

and

is the 3D-2D matching point pair between image i and image j, R _ij is the rotation transformation matrix from image j to image i, t _ij is the translation transformation matrix from image j to image i,

yes

The corresponding depth value, π() ^-1 is a transformation matrix that back-projects 2D points into 3D points.

For multiple 3D-2D feature point pairs, the above formula (1-1) is used to construct the least squares formula, and the optimal variable t _ij can be calculated as the first variable between the next frame RGB image and the target RGB image. The translation transformation matrix.

In the embodiment of the present application, when calculating the first translation transformation matrix between the next frame of RGB image and the target RGB image, first, according to the first rotation transformation matrix, the mutual matching of the next frame of RGB image and the target RGB image The 2D-2D feature point pairs are eliminated to obtain the eliminated 2D-2D feature point pairs. Secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. Finally, according to the 3D-2D feature point pair, the first translation transformation matrix between the next frame RGB image and the target RGB image is calculated. The characteristic point pairs are eliminated for many times, and the least square method is used to calculate the first translation transformation matrix, which improves the accuracy of the calculated first translation transformation matrix.

In one embodiment, if the target RGB image is the first frame image in the preset initialization sliding window, then calculate the first pose transformation matrix between the next frame RGB image and the target RGB image, as shown in Figure 8, According to the first pose transformation matrix and the depth image, construct the target 3D map of the current environment, including:

Operation 820, according to the first pose transformation matrix, project the 3D feature points on the depth image onto the next frame of RGB image to generate projected 2D feature points.

Among them, the first pose transformation matrix is the pose transformation matrix between the next frame RGB image and the target RGB image. The 3D-2D matching point pairs can be determined based on the depth image and target RGB image corresponding to the target RGB image, and then based on the first pose transformation matrix, the 3D feature points on the depth image can be projected onto the next frame of RGB image to generate Project 2D feature points.

Operation 840, calculating a reprojection error between the projected 2D feature point and the 2D feature point on the RGB image of the next frame.

In operation 860, if the reprojection error is smaller than the preset error threshold, use the 3D feature point on the depth image as the target map point.

Specifically, there are many original 2D feature points on the next frame of RGB image, and at this time, the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the re-projection error is smaller than the preset error threshold, the 3D feature point corresponding to the re-projection error smaller than the preset error threshold is considered to be a credible map point, and the 3D feature point on the depth image is used as the target map point.

In operation 880, construct a target three-dimensional map of the current environment according to the target map points.

After using the 3D feature points that meet the conditions in the 3D feature points on the depth image as the target map points, the target 3D map of the current environment can be constructed based on these target map points.

In the embodiment of the present application, if the target RGB image is the first frame image in the preset initialization sliding window, then according to the first pose transformation matrix, the 3D feature points on the depth image are projected onto the next frame RGB image to generate Project 2D feature points. Calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image of the next frame. If the reprojection error is less than the preset error threshold, the 3D feature points on the depth image are used as target map points, and a target 3D map of the current environment is constructed according to the target map points. When the target RGB image is the first frame image in the preset initialization sliding window, the target three-dimensional map of the current environment can be calculated. Then, when calculating the pose of the electronic device when it collects other RGB images after the next RGB image within the preset initialization sliding window, it can directly use the target three-dimensional map for calculation.

Among them, if the relevant RGB image of the target RGB image is the RGB image before the target RGB image and the next frame of RGB image, that is, there is an RGB image before the target RGB image, then the preset perspective projection PnP algorithm needs to be calculated at the same time pose transformation matrix and the second pose transformation matrix. Wherein, the process of calculating the first pose transformation matrix is not repeated here, and the calculation of the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image includes:

A preset perspective projection PnP algorithm is used to calculate a second pose transformation matrix between the RGB image before the target RGB image and the target RGB image.

Specifically, when using the preset perspective projection PnP algorithm to calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, first, calculate the difference between the RGB image before the target RGB image and the target RGB image The second rotation transformation matrix between; secondly, according to the depth image and the second rotation transformation matrix, calculate the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image. And the second rotation transformation matrix and the second translation transformation matrix constitute the second pose transformation matrix.

In the embodiment of the present application, when calculating the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, the process of calculating the second rotation transformation matrix and the second translation transformation matrix is decoupled. It is possible to avoid superimposing the error generated when calculating the second rotation transformation matrix with the error generated when calculating the second translation transformation matrix. Therefore, the preset perspective projection PnP algorithm is improved, and the second pose transformation matrix is calculated step by step, thereby improving the accuracy of the second pose transformation matrix.

In one embodiment, according to the depth image and the second rotation transformation matrix, calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image includes:

Operation 1: Eliminate the 2D-2D feature point pairs that match each other on the RGB image before the target RGB image and the target RGB image according to the second rotation transformation matrix, and obtain the 2D-2D feature point pairs after elimination.

As shown in FIG. 6 , it is assumed that image i is the target RGB image, image j is the RGB image before the target RGB image of the target RGB image, and the oval frame corresponds to the depth image corresponding to the target RGB image. As shown in the pair of image i and image j on the right side of Figure 6, in image i and image j, the matching 2D-2D feature point pairs on image i and image j are eliminated according to the second rotation transformation matrix , to get the 2D-2D feature point pairs after elimination.

Operation 2, convert the culled 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image.

Operation three, according to the 3D-2D feature point pair, calculate the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image.

After calculating the 3D-2D feature point pairs between the target RGB image and the RGB image before the target RGB image, the distance between the RGB image before the target RGB image and the target RGB image can be calculated based on the translation transformation matrix calculation formula The second translation transformation matrix. The formula for calculating the translation transformation matrix is as follows:

in,

and

yes

For multiple 3D-2D feature point pairs, the above formula (1-1) is used to construct the least squares formula, and the optimal variable t _ij can be calculated as the difference between the RGB image before the target RGB image and the target RGB image. The second translation transformation matrix.

In the embodiment of the present application, when calculating the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image, first, according to the second rotation transformation matrix, the RGB image before the target RGB image and the target RGB image The 2D-2D feature point pairs that match each other are eliminated, and the 2D-2D feature point pairs after elimination are obtained. Secondly, the culled 2D-2D feature point pairs are converted into 3D-2D feature point pairs according to the depth image. Finally, according to the 3D-2D feature point pair, the second translation transformation matrix between the RGB image before the target RGB image and the target RGB image is calculated. The characteristic point pairs are eliminated multiple times, and the least square method is used to calculate the second translation transformation matrix, which improves the accuracy of the calculated second translation transformation matrix.

In one embodiment, the initial three-dimensional map of the current environment is updated according to the second pose transformation matrix and the depth image to generate a target three-dimensional map, including:

According to the second pose transformation matrix, the 3D feature points on the depth image are respectively projected onto the RGB image before the target RGB image to generate projected 2D feature points;

Calculate the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image preceding the target RGB image;

If the reprojection error is less than the preset error threshold, the 3D feature point on the depth image is used as a new target map point;

Add new target map points to the initial 3D map to generate the target 3D map.

Specifically, the second pose transformation matrix is a pose transformation matrix between the RGB image before the target RGB image and the target RGB image. The 3D-2D matching point pairs can be determined based on the depth image and the target RGB image corresponding to the target RGB image, and then based on the second pose transformation matrix, the 3D feature points on the depth image can be projected onto the RGB image before the target RGB image , generating projected 2D feature points.

However, there are many original 2D feature points on the RGB image before the target RGB image. At this time, the reprojection error between these projected 2D feature points and the original 2D feature point positions is calculated. If the reprojection error is smaller than the preset error threshold, the 3D feature point corresponding to the reprojection error smaller than the preset error threshold is considered to be a credible map point, and the 3D feature point is used as the target map point.

Then, among the 3D feature points on the depth image, the 3D feature points that satisfy the conditions are used as the target map points, and the target 3D map of the current environment can be constructed based on these target map points.

In the embodiment of the present application, if the target RGB image is not the first frame image in the preset initialization sliding window, then according to the second pose transformation matrix, the 3D feature points on the depth image are projected onto the RGB image before the target RGB image , generating projected 2D feature points. Computes the reprojection error between the projected 2D feature points and the 2D feature points on the RGB image preceding the target RGB image. If the reprojection error is less than the preset error threshold, the 3D feature points on the depth image are used as target map points, and a target 3D map of the current environment is constructed according to the target map points. It realizes calculating the target 3D map of the current environment when the target RGB image is not the first frame image in the preset initialization sliding window. Then, when calculating the pose of the electronic device when it collects other RGB images after the next frame of RGB images within the preset initialization sliding window, it can directly use the target three-dimensional map for calculation.

In one embodiment, as shown in FIG. 9, operation 280, according to the target three-dimensional map, calculates the pose of the electronic device when the RGB image after the next frame of RGB image is collected within the preset initialization sliding window, including:

Operation 282, using the next frame of the next RGB image as the current frame, and performing the following target operations:

Operation 284, according to the target three-dimensional map, the current frame and the RGB image before the current frame, generate a pair of 3D-2D feature points that match each other between the target three-dimensional map and the current frame;

Operation 286, calculating the pose of the current frame based on the 3D-2D feature point pair;

Operation 288, update the target three-dimensional map according to the current frame, and use the updated three-dimensional map as a new target three-dimensional map.

The next frame of the current frame is used as the new current frame, and the target operation is executed cyclically until the pose of the last RGB image in the preset initialization sliding window is calculated.

In conjunction with FIG. 10 , it is a schematic diagram of calculating the pose and three-dimensional map calculation of the RGB image after the acquisition of the next frame of RGB image in one embodiment. For example, suppose there are 10 frames of images in the default initialization sliding window, and the depth image is collected at the 4th frame, that is, the target RGB image is the 4th frame, and the next RGB image frame of the target RGB image is the 5th frame. At this time, the calculation of the pose when the RGB image after the next frame of RGB image is collected is to calculate the pose when the 6th-10th frame of image is collected.

Starting from the calculation of the pose when the sixth frame of image is collected, first, the next frame (frame 6) of the next RGB image (frame 5) is used as the current frame to calculate the pose of the current frame.

Secondly, according to the target 3D map, the RGB image before the current frame and the current frame (6th frame), a 3D-2D feature point pair matching between the target 3D map and the current frame (6th frame) is generated. Specifically, from the RGB image located before the current frame and the current frame, the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs. Obtain 3D feature points that match 2D feature point pairs from the map points in the target 3D map, and generate 3D-2D feature points that match each other between the target 3D map and the current frame based on the matched 3D feature points and 2D feature point pairs right.

Again, based on the 3D-2D feature point pairs, the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame. Among them, the traditional perspective projection PnP algorithm is based on 3D-2D feature point pairs to simultaneously calculate the rotation transformation matrix and translation transformation matrix between two frames. Therefore, when the traditional perspective projection PnP algorithm is used to calculate the pose of the current frame, the rotation transformation matrix and translation transformation matrix between the RGB image before the current frame and the current frame are respectively calculated based on the 3D-2D feature point pairs. Furthermore, the pose transformation matrix is obtained based on the rotation transformation matrix and the translation transformation matrix, and then the pose of the current frame is obtained based on the pose transformation matrix and the pose of the RGB image before the current frame respectively. Specifically, the pose transformation matrix may be obtained based on the multiplication of the rotation transformation matrix and the translation transformation matrix, and this calculation method is not limited in this application.

Finally, the target 3D map is updated according to the current frame, and the updated 3D map is used as a new target 3D map. Taking the next frame of RGB image (frame 7) of the current frame as the new current frame, the target operation operations 284-286 are cyclically executed until the pose of the last frame of RGB image in the preset initialization sliding window is calculated.

In the embodiment of the present application, in the process of calculating the pose of the electronic device when the RGB image after the next frame of RGB image is collected in the preset initialization sliding window, a loop method is adopted. First, based on the target 3D map and the current frame and RGB image located before the current frame, calculate the pose of the current frame. Secondly, the target three-dimensional map is updated based on the current frame, and the updated three-dimensional map is used as a new target three-dimensional map. Finally, the pose of the new current frame is calculated based on the new target 3D map, the next group of current frames and the RGB images before the current frame. Secondly, the new target three-dimensional map is updated again based on the new current frame, and the updated three-dimensional map is used as the new target three-dimensional map. This loops until the pose of the last frame of RGB image within the preset initialization sliding window is calculated.

Since the three-dimensional map calculated based on all previous frames is used when calculating the pose of the next frame, the accuracy of the calculated pose of the next frame is improved.

In one embodiment, according to the target three-dimensional map, the current frame, and the RGB image before the current frame, a pair of 3D-2D feature points matching each other between the target three-dimensional map and the current frame is generated, including:

Obtain matching 2D feature point pairs from the current frame and the RGB image before the current frame;

Obtain 3D feature points that match 2D feature point pairs from the map points in the target 3D map, and generate 3D-2D feature points that match each other between the target 3D map and the current frame based on the matched 3D feature points and 2D feature point pairs right.

In the embodiment of the present application, from the current frame and the RGB image before the current frame, the optical flow method or other image matching methods are used to obtain mutually matched 2D feature point pairs. Since the RGB images before the current frame provide some map points when calculating the target 3D map, the 3D feature points matching the 2D feature point pair can be obtained from the map points in the target 3D map. Then, based on the 3D feature point and the 2D feature point of the current frame in the 2D feature point pair, a 3D-2D feature point pair matching between the target 3D map and the current frame is generated. Therefore, based on the 3D-2D feature point pair, the matching relationship between the 3D feature points on the target 3D map and the 2D feature points on the current frame is obtained.

In one embodiment, the current frame is the image frame after the next frame image of the target RGB image in the sliding window. If there is a corresponding depth image in the current frame, then operation 288 is to update the target three-dimensional map according to the current frame to generate an updated The final 3D map includes:

According to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window, the target 3D map is updated to generate an intermediate 3D map;

The intermediate 3D map is updated by triangulation method to generate the updated 3D map.

Assuming that there is a corresponding depth image in the current frame (frame 6), when updating the target 3D map according to the current frame to generate the updated 3D map, firstly, according to the depth image corresponding to the current frame, the current frame and the preset sliding window The pose transformation matrix between the RGB images before the current frame is used to update the target 3D map to generate an intermediate 3D map.

Specifically, first, calculate the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window; secondly, according to the pose transformation matrix, project the 3D feature points on the depth image of the current frame to the current On the RGB image before the frame, generate projected 2D feature points. Again, calculate the reprojection error between the projected 2D feature point and the 2D feature point on the RGB image before the current frame, if the reprojection error is less than the preset error threshold, then use the 3D feature point on the depth image of the current frame as the target map point. Finally, the target map is updated according to the target map points to generate an intermediate three-dimensional map.

Furthermore, a triangulation method is used to update the intermediate three-dimensional map to generate an updated three-dimensional map. Among them, in multi-view geometry, the camera observes the same space point at two positions, and the three-dimensional space point coordinates are obtained through two camera poses and image observation point coordinates. This process is the calculation process of the triangulation method. Depth information missing in some depth images can be recovered by using triangulation.

In the embodiment of the present application, if there is a corresponding depth image in the current frame, the target three-dimensional map is calculated according to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window Update generates intermediate 3D maps. Then, the triangulation method is used to update the intermediate three-dimensional map to generate an updated three-dimensional map. If there is a corresponding depth image in the current frame, then the target 3D map is updated based on the 3D feature points on the depth image of the current frame in combination with the target 3D map constructed from the previous image frame. Finally, the triangulation method can recover the missing depth information in some depth images. Therefore, the integrity and accuracy of the three-dimensional map constructed at this time are greatly improved.

Continuing from the previous embodiment, if there is no corresponding depth image in the current frame, then in operation 288, the target 3D map is updated according to the current frame to generate an updated 3D map, including:

The three-dimensional map of the target is updated by the triangulation method, and an updated three-dimensional map is generated.

In the embodiment of the present application, if there is no corresponding depth image in the current frame, a triangulation method is used to update the target three-dimensional map to generate an updated three-dimensional map. Since the depth information missing in some depth images can be recovered by using the keratinization method, the integrity and accuracy of the three-dimensional map constructed at this time are also greatly improved to a certain extent.

In one embodiment, as shown in FIG. 11 , after calculating the pose of the electronic device when acquiring the RGB image after the next frame of RGB image within the preset initialization sliding window according to the target three-dimensional map, it further includes:

Operation 1120, acquire the IMU data collected within the preset sliding window.

Using the VIO (Visual-Inertial Odometry, visual-inertial odometer) system can fuse the image data collected by the camera with the IMU (Inertial measurement unit, inertial measurement unit) data to achieve more accurate positioning.

Since the IMU data acquisition frequency of the electronic device is greater than the acquisition frequency of RGB images, then, when 10 frames of RGB images are included in the sliding window, generally more than 10 sets of IMU data are collected. When calculating the initialization information of the IMU, it is calculated based on all the IMU data collected in the preset sliding window. Therefore, it is necessary to acquire all the IMU data collected within the preset sliding window.

Operation 1140, according to the pose and IMU data of each frame of RGB images in the preset sliding window, calculate the initialization information of the IMU; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU.

Then, according to the pose of the RGB image and all the IMU data collected in the preset sliding window, the rotation transformation matrix in the pose transformation matrix of the RGB image can be used for rotation constraints, and the translation transformation matrix of the RGB image can be used for translation constraints , so as to calculate the initialization information of the IMU. The initialization information of the IMU includes the initial velocity of the electronic device, the bias of the IMU (Bias) and the gravity vector of the IMU.

Operation 1160, according to the initial pose, the target three-dimensional map and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.

After the initialization information of the IMU is calculated, the pose of the RGB image collected after the preset sliding window can be calculated according to the initial pose, the target 3D map, and the initialization information of the IMU. For example, calculate the pose of the 11th frame image, of course, this application is not limited to this. Among them, the initial pose is the pose of the target RGB image corresponding to the depth image of the current environment collected for the first time. The target 3D map at this time is the 3D map constructed based on all the image frames in the sliding window.

In the embodiment of the present application, when calculating the pose of the RGB image collected after the preset sliding window, the IMU is first initialized, and then the preset sliding window can be calculated not only by combining the image data collected by the camera, but also by combining the IMU data. The pose of the RGB image collected behind the window. The adaptability and robustness of pose calculation are improved from two dimensions of vision and IMU.

In one embodiment, a pose calculation method is provided, further comprising:

If the depth image of the current environment is not collected in the preset initialization sliding window, the initial pose and initial three-dimensional map are calculated according to the RGB image collected in the preset initialization sliding window;

According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset sliding window is initialized.

In the embodiment of the present application, if the depth image of the current environment is not collected in the preset initialization sliding window, the traditional VINS-MONO algorithm is used to calculate the initial pose and position according to the RGB images collected in the preset initialization sliding window. Initial 3D map. Then, according to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset sliding window is initialized. It is guaranteed that in the case that the depth image of the current environment is not collected in the preset initialization sliding window, it is still possible to output the pose of the electronic device when the RGB image is collected in real time after accumulating 10 frames of RGB images in the sliding window.

In one embodiment, a pose calculation method is provided, further comprising:

If the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, then calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window;

According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset sliding window.

As shown in Figure 12, it is a schematic diagram of calculating the pose when the depth image of the current environment is not collected in the preset initialization sliding window, or the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window .

In the embodiment of the present application, if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, then if the method in this application is adopted, the pose of the electronic device when the depth image is collected is determined as the initial pose. According to the initial pose, the target RGB image, the depth image and the next frame of RGB image of the target RGB image, the pose of the electronic device when collecting the next frame of RGB image is determined. Therefore, using the pose calculation method in this application can only calculate the pose of the electronic device when the RGB image is collected after the sliding window, which is different from when the electronic device collects the RGB image after the sliding window is calculated by using the traditional VINS-MONO algorithm. The durations of the poses are basically the same. Therefore, if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, the traditional VINS-MONO algorithm or the pose calculation method in this application can be used for calculation. Therefore, a variety of pose calculation methods are provided, which is more flexible.

In a specific embodiment, as shown in Figure 13, a pose calculation method is provided, with the RGB image of the depth image collected for the first time as the preset initialization of the second frame image in the sliding window and after the second frame image To illustrate the image of , the method includes:

Operation 1302, the VIO system starts and initializes;

Operation 1304, collecting RGB images, IMU data and depth images;

Operation 1306, performing de-distortion and alignment processing on the collected RGB image and depth image;

Operation 1308, counting the frame number of the collected RGB image as frame_count;

Operation 1310, judging whether there is a corresponding depth image (Depth image) in the RGB image currently collected, and it is the Depth image of the current environment collected for the first time; if so, enter operation 1312; if not, enter operation 1318;

Operation 1312, use the Depth graph to calculate the flag bit start_depth_init=true;

Operation 1314, the image coordinate system of the RGB image is set as the world coordinate system, the pose of the RGB image is used as the initial pose, and the initial pose is set to 0; and the frame_count of the RGB image is recorded as first_depth;

Operation 1316, when the next frame of RGB image is collected, frame_count is updated to frame_count+1;

Operation 1318, judge whether first_depth is smaller than the sliding window size windowsize (10 frames); if so, then enter operation 1320; if not, then enter operation 1354;

Operation 1320, judge whether the frame number frame_count+1 of current frame is equal to first_depth+1; If so, then enter operation 1322; If not, then enter operation 1334;

Operation 1322, using the preset perspective projection PnP algorithm to calculate the first pose between the first_depth frame and the first_depth+1 frame;

Operation 1324, according to the first pose, project the 3D feature points on the Depth map onto the first_depth+1 frame to generate projected 2D feature points; calculate the weight between the projected 2D feature points and the 2D feature points on the first_depth+1 frame projection error;

Operation 1326, if the reprojection error is smaller than the preset error threshold, use the 3D feature point on the Depth map as the target map point; construct an initial 3D map of the current environment according to the target map point.

Operation 1328, calculating the second pose transformation matrix between the RGB image before the first_depth frame and the first_depth frame;

Operation 1330, according to the second pose transformation matrix, respectively project the 3D feature points on the Depth map onto the RGB image before the first_depth frame to generate projected 2D feature points; calculate the projected 2D feature points and the RGB image before the first_depth frame The reprojection error between the 2D feature points on ;

Operation 1332, if the reprojection error is less than the preset error threshold, then use the 3D feature point on the Depth map as a new target map point; add the new target map point to the initial 3D map to generate the target 3D map;

Operation 1334, according to the target 3D map, the current frame, and the RGB image before the current frame, generate 3D-2D feature point pairs that match each other between the target 3D map and the current frame;

Operation 1336, based on the 3D-2D feature point pair, the traditional PnP algorithm is used to calculate the pose of the current frame; and enter operation 1354 to output the pose;

Operation 1338, judging whether there is a corresponding Depth map in the current frame; if yes, then enter operation 1340, if not, then enter operation 1344;

Operation 1340, update the target 3D map to generate an intermediate 3D map according to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window;

Operation 1342, using the triangulation method to update the intermediate three-dimensional map to generate an updated three-dimensional map;

Operation 1344, using the triangulation method to update the target three-dimensional map to generate an updated three-dimensional map;

Operation 1346, judge whether frame_count is equal to windowsize (10 frames); if so, then enter operation 1348;

Operation 1348, performing BA optimization on the poses corresponding to the calculated 10 frames of images;

Operation 1350, based on the poses corresponding to the 10 frames of images after BA optimization, perform IMU initialization;

Operation 1352, according to the initial pose, the target three-dimensional map and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.

In operation 1354, the VINS-MONO algorithm is used to calculate the pose.

If the depth image of the current environment is not collected in the preset initialization sliding window or the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, the traditional VINS-MONO algorithm is used to initialize according to the preset For the RGB image collected in the sliding window, calculate the pose when the electronic device collects the RGB image after the sliding window is preset and initialized. It is guaranteed that in the case that the depth image of the current environment is not collected in the preset initialization sliding window, it is still possible to output the pose of the electronic device when the RGB image is collected in real time after accumulating 10 frames of RGB images in the sliding window.

In one embodiment, as shown in FIG. 14 , a pose calculation device 1400 is provided, and the device includes:

The initial pose determination module 1420 is configured to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein, the target corresponding to the depth image The RGB image is not the last frame image in the preset initialization sliding window;

The next frame RGB image pose determination module 1440 is configured to determine the pose of the electronic device when the next frame of RGB image is collected according to the initial pose, the target RGB image, the depth image, and the next frame RGB image of the target RGB image.

In one embodiment, as shown in FIG. 15 , a pose calculation device 1400 is provided, and the device further includes:

The target three-dimensional map construction module 1460 is used to construct the target three-dimensional map of the current environment according to the pose and depth image when the electronic device collects the next frame of RGB image;

The other RGB image pose determination module 1480 is configured to calculate the pose of the electronic device when collecting other RGB images located after the next frame of RGB images within the preset initialization sliding window according to the target three-dimensional map.

In one embodiment, the target three-dimensional map construction module 1460 also includes:

An initial three-dimensional map construction unit, configured to construct an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image;

The target three-dimensional map construction unit is used to calculate the second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and perform the initial three-dimensional map of the current environment according to the second pose transformation matrix and the depth image Update to build a target 3D map of the current environment.

In one embodiment, a pose calculation device 1400 is provided, and the device further includes:

The pose transformation matrix calculation unit is also used to calculate the first pose transformation matrix or the second pose transformation matrix by using the preset perspective projection PnP algorithm;

Among them, the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image step by step, and the relevant RGB image of the target RGB image is the RGB image before the target RGB image image or the next frame of RGB image.

In one embodiment, the pose transformation matrix calculation unit is also used to further include:

The rotation transformation matrix calculation subunit is used to calculate the rotation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image;

The translation transformation matrix calculation subunit is used to calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image according to the depth image and the rotation transformation matrix;

The pose transformation matrix calculation subunit is used to generate a first pose transformation matrix or a second pose transformation matrix between the relevant RGB image of the target RGB image and the target RGB image based on the rotation transformation matrix and the translation transformation matrix.

In one embodiment, the translation transformation matrix calculation subunit is also used to eliminate the relevant RGB image of the target RGB image and the matching 2D-2D feature point pairs on the target RGB image according to the rotation transformation matrix, and obtain the 2D feature point after elimination. -2D feature point pair; according to the depth image, convert the 2D-2D feature point pair after elimination into a 3D-2D feature point pair; according to the 3D-2D feature point pair, calculate the relationship between the relevant RGB image of the target RGB image and the target RGB image The translation transformation matrix between .

In one embodiment, the initial three-dimensional map construction unit is also used to project the 3D feature points on the depth image to the next frame of RGB image according to the first pose transformation matrix to generate projected 2D feature points; calculate projected 2D feature points The reprojection error between the point and the 2D feature point on the next RGB image; if the reprojection error is less than the preset error threshold, the 3D feature point on the depth image is used as the target map point; the current environment is constructed according to the target map point The initial 3D map of .

In one embodiment, the target three-dimensional map construction unit is further configured to respectively project the 3D feature points on the depth image onto the RGB image before the target RGB image according to the second pose transformation matrix to generate projected 2D feature points; Calculate the reprojection error between the projected 2D feature point and the 2D feature point on the RGB image before the target RGB image; if the reprojection error is less than the preset error threshold, use the 3D feature point on the depth image as the new target map point; add new target map points to the initial 3D map to construct the target 3D map of the current environment.

In one embodiment, other RGB image pose determination module 1480 includes:

The current frame definition unit is used to use the next frame of the next frame RGB image as the current frame, and perform the following target operations:

The target operation unit is used to generate a 3D-2D feature point pair matching between the target 3D map and the current frame according to the target 3D map, the current frame and the RGB image before the current frame; calculate the current frame based on the 3D-2D feature point pair The pose of the frame; update the target 3D map according to the current frame, and use the updated 3D map as the new target 3D map;

The loop unit is used to use the next frame of the current frame as a new current frame, and execute the target operation in a loop until the pose of the last frame of the RGB image in the preset initialization sliding window is calculated.

In one embodiment, the target operation unit is further configured to obtain matching 2D feature point pairs from the current frame and the RGB image before the current frame; obtain the 2D feature point pairs from the map points in the target three-dimensional map The matched 3D feature points, according to the matched 3D feature points and 2D feature point pairs, generate 3D-2D feature point pairs that match each other between the target 3D map and the current frame.

In one embodiment, if there is a corresponding depth image in the current frame, the target operation unit is also used to calculate the current frame according to the depth image corresponding to the current frame, the pose between the current frame and the RGB image before the current frame in the preset sliding window The transformation matrix is used to update the target 3D map to generate an intermediate 3D map; the triangulation method is used to update the intermediate 3D map to generate an updated 3D map.

In one embodiment, if there is no corresponding depth image in the current frame, the target operation unit is further configured to update the target 3D map by using a triangulation method to generate an updated 3D map.

In one embodiment, as shown in FIG. 16 , a pose calculation device 1600 is provided, and the device further includes:

IMU data acquisition module 1620, for acquiring the IMU data collected in the preset sliding window;

The IMU initialization module 1640 is used to calculate the initialization information of the IMU according to the pose and IMU data of each frame of the RGB image in the preset sliding window; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU;

The first pose calculation module 1660 is configured to calculate the pose of the RGB image collected after the preset sliding window according to the initial pose, the target 3D map and the initialization information of the IMU.

In one embodiment, a pose calculation device is provided, the device also includes:

The second pose calculation module is used to calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window if the depth image of the current environment is not collected in the preset initialization sliding window;

The third pose calculation module is used to calculate the initial pose and initial position according to the RGB images collected in the preset initialization sliding window if the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window. 3D map;

It should be understood that although the various operations in the flow charts in the figure are displayed sequentially according to the arrows, these operations are not necessarily executed sequentially in the order indicated by the arrows. Unless otherwise specified herein, there is no strict order restriction on the execution of these operations, and these operations can be executed in other orders. Moreover, at least some of the operations in the figure may include multiple sub-operations or multiple stages. These sub-operations or stages are not necessarily executed at the same time, but may be executed at different times. The execution order of these sub-operations or stages It is not necessarily performed sequentially, but may be performed in turn or alternately with at least a part of other operations or sub-operations or stages of other operations.

The division of each module in the pose calculation device is only for illustration. In other embodiments, the pose calculation device can be divided into different modules according to needs, so as to complete all or part of the functions of the pose calculation device.

For specific limitations on the pose calculation device, refer to the above-mentioned limitations on the pose calculation method, which will not be repeated here. Each module in the pose calculation device can be fully or partially realized by software, hardware and a combination thereof. Each module can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can call and execute the corresponding operations of the above modules.

In one embodiment, an electronic device is also provided, including a memory and a processor, and a computer program is stored in the memory. When the computer program is executed by the processor, the processor executes a pose provided by each of the above embodiments. The operation of the calculation method.

Fig. 17 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in FIG. 17, the electronic device includes a processor and a memory connected through a system bus. Wherein, the processor is used to provide calculation and control capabilities to support the operation of the entire electronic device. The memory may include non-volatile storage media and internal memory. Nonvolatile storage media store operating systems and computer programs. The computer program can be executed by a processor, so as to implement a pose calculation method provided in the above embodiments. The internal memory provides a high-speed running environment for the operating system computer program in the non-volatile storage medium. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant, a personal digital assistant), a POS (Point of Sales, a sales terminal), a vehicle-mounted computer, or a wearable device.

The implementation of each module in the pose calculation device provided in the embodiment of the present application may be in the form of a computer program. The computer program can run on or on an electronic device. The program modules constituted by the computer program can be stored in the electronic device or the memory of the electronic device. When the computer program is executed by the processor, the operations of the methods described in the embodiments of the present application are realized.

The embodiment of the present application also provides a computer-readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform operations of the pose calculation method.

A computer program product containing instructions, when run on a computer, causes the computer to execute the pose calculation method.

Any reference to memory, storage, database, or other media used in embodiments of the present application may include non-volatile and/or volatile memory. Suitable nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The initialization examples above only represent several implementation modes of the present application, and the descriptions thereof are relatively specific and detailed, but should not be construed as limiting the patent scope of the present application. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the scope of protection of the patent application should be based on the appended claims.

Claims

A pose calculation method, characterized in that the method comprises:

If the depth image of the current environment is collected for the first time within the preset initialization sliding window, the pose of the electronic device when the depth image is collected is determined as the initial pose; wherein, the target RGB image corresponding to the depth image is not the Default initialization of the last frame image in the sliding window;

According to the initial pose, the target RGB image, the depth image, and the next frame of RGB image of the target RGB image, determine the pose of the electronic device when collecting the next frame of RGB image.
The pose calculation method according to claim 1, wherein the method further comprises:

Constructing a target three-dimensional map of the current environment according to the pose and the depth image when the electronic device collects the next frame of RGB images;

According to the target three-dimensional map, calculate the pose of the electronic device when collecting other RGB images located after the next RGB image within the preset initialization sliding window.
The pose calculation method according to claim 2, characterized in that, according to the pose and the depth image when the next frame of RGB image is collected by the electronic device, the target three-dimensional map of the current environment is constructed ,include:

Calculate a first pose transformation matrix between the next frame of RGB image and the target RGB image, and construct a target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image.
The pose calculation method according to claim 3, wherein said constructing the target three-dimensional map of the current environment according to the first pose transformation matrix and the depth image comprises:

Constructing an initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image;

calculating a second pose transformation matrix between the RGB image before the target RGB image and the target RGB image, and performing an initial three-dimensional analysis of the current environment according to the second pose transformation matrix and the depth image The map is updated to construct a target three-dimensional map of the current environment.
The pose calculation method according to claim 4, wherein the method further comprises:

calculating the first pose transformation matrix or the second pose transformation matrix by using a preset perspective projection PnP algorithm;

Wherein, the preset perspective projection PnP algorithm is used to calculate the rotation transformation matrix and translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image step by step, and the relevant RGB image of the target RGB image is the RGB image before the target RGB image or the next frame of RGB image.
The pose calculation method according to claim 5, wherein the calculation of the first pose transformation matrix or the second pose transformation matrix using a preset perspective projection PnP algorithm includes:

calculating a rotation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image;

According to the depth image and the rotation transformation matrix, calculate the translation transformation matrix between the relevant RGB image of the target RGB image and the target RGB image;

Based on the rotation transformation matrix and the translation transformation matrix, generate the first pose transformation matrix or the second pose transformation matrix between a related RGB image of the target RGB image and the target RGB image.
The pose calculation method according to claim 6, wherein, according to the depth image and the rotation transformation matrix, the translation between the relevant RGB image of the target RGB image and the target RGB image is calculated Transformation matrix, including:

Eliminating the 2D-2D feature point pairs matched to each other on the relevant RGB image of the target RGB image and the target RGB image according to the rotation transformation matrix, to obtain the 2D-2D feature point pairs after elimination;

converting the removed 2D-2D feature point pairs into 3D-2D feature point pairs according to the depth image;

According to the 3D-2D feature point pair, calculate the translation transformation matrix between the related RGB image of the target RGB image and the target RGB image.
The pose calculation method according to claim 4, wherein said constructing the initial three-dimensional map of the current environment according to the first pose transformation matrix and the depth image comprises:

According to the first pose transformation matrix, project the 3D feature points on the depth image onto the next frame of RGB image to generate projected 2D feature points;

Calculate the reprojection error between the projected 2D feature points and the 2D feature points on the next frame RGB image;

If the reprojection error is less than a preset error threshold, using the 3D feature point on the depth image as a target map point;

An initial three-dimensional map of the current environment is constructed according to the target map points.
The pose calculation method according to any one of claims 4-8, wherein the initial three-dimensional map of the current environment is updated according to the second pose transformation matrix and the depth image, Constructing the target three-dimensional map of the current environment, including:

According to the second pose transformation matrix, respectively project the 3D feature points on the depth image onto the RGB image before the target RGB image to generate projected 2D feature points;

calculating a reprojection error between the projected 2D feature points and the 2D feature points on the RGB image preceding the target RGB image;

If the reprojection error is less than a preset error threshold, using the 3D feature point on the depth image as a new target map point;

The new target map point is added to the initial three-dimensional map to construct the target three-dimensional map of the current environment.
The pose calculation method according to claim 2, wherein, according to the three-dimensional map of the target, calculate other positions after the next frame of RGB image is collected by the electronic device in the preset initialization sliding window. The pose of the RGB image, including:

Using the next frame of the next frame RGB image as the current frame, perform the following target operations:

generating 3D-2D feature point pairs that match each other between the target three-dimensional map and the current frame according to the target three-dimensional map, the current frame, and the RGB image before the current frame;

calculating the pose of the current frame based on the 3D-2D feature point pair;

updating the target three-dimensional map according to the current frame, and using the updated three-dimensional map as a new target three-dimensional map;

The next frame of the current frame is used as a new current frame, and the target operation is cyclically executed until the pose of the last frame of the RGB image in the preset initialization sliding window is calculated.
The pose calculation method according to claim 10, wherein the target three-dimensional map and the current 3D-2D feature point pairs that match each other between frames, including:

Obtaining matching 2D feature point pairs from the current frame and the RGB image before the current frame;

Obtaining 3D feature points matched with 2D feature point pairs from map points in the target 3D map, and generating the target 3D map and the current frame according to the matched 3D feature points and the 2D feature point pairs. 3D-2D feature point pairs that match each other.
The pose calculation method according to claim 10, wherein if there is a corresponding depth image in the current frame, the target three-dimensional map is updated according to the current frame to generate an updated three-dimensional map ,include:

updating the target three-dimensional map to generate an intermediate three-dimensional map according to the depth image corresponding to the current frame, the pose transformation matrix between the current frame and the RGB image before the current frame in the preset sliding window;

The intermediate three-dimensional map is updated by using a triangulation method to generate the updated three-dimensional map.
The pose calculation method according to claim 10, wherein if there is no corresponding depth image in the current frame, the target 3D map is updated according to the current frame to generate an updated 3D map Maps, including:

The three-dimensional map of the target is updated by using a triangulation method to generate an updated three-dimensional map.
The pose calculation method according to claim 2, characterized in that, according to the three-dimensional map of the target, the calculation is performed by the electronic device after the acquisition of the next frame of RGB image in the preset initialization sliding window After the pose of the other RGB images, also include:

Obtain the IMU data collected within the preset sliding window;

Calculate the initialization information of the IMU according to the pose of each frame of the RGB image in the preset sliding window and the IMU data; the initialization information includes the initial velocity, the zero bias of the IMU and the gravity vector of the IMU ;

According to the initial pose, the target three-dimensional map, and the initialization information of the IMU, calculate the pose of the RGB image collected after the preset sliding window.
The pose calculation method according to claim 1, wherein the method further comprises:

If the depth image of the current environment is not collected in the preset initialization sliding window, then calculate the initial pose and initial three-dimensional map according to the RGB image collected in the preset initialization sliding window;

According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset initialization sliding window.
The pose calculation method according to claim 1, wherein the method further comprises:

If the target RGB image corresponding to the depth image is the last frame image in the preset initialization sliding window, then calculate the initial pose and pose according to the RGB image collected in the preset initialization sliding window initial 3D map;

According to the initial pose and the initial three-dimensional map, calculate the pose of the electronic device when the RGB image is collected after the preset sliding window.
A pose computing device, characterized in that the device comprises:

The initial pose determination module is used to determine the pose of the electronic device when the depth image is collected as the initial pose if the depth image of the current environment is collected for the first time within the preset initialization sliding window; wherein, the depth image The corresponding target RGB image is not the last frame image in the preset initialization sliding window;

The next frame RGB image pose determination module is used to determine the electronic device to collect the The pose of the next frame of RGB image.
An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, wherein when the computer program is executed by the processor, the processor is made to execute any one of claims 1 to 16 An operation of the described pose calculation method.
A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the operation of the pose calculation method according to any one of claims 1 to 16 is realized.