CN111462179B - Three-dimensional object tracking method and device and electronic equipment

Publication number: CN111462179B
Authority: CN (China)
Legal status: Active (granted)
Application number: CN202010222181.8A
Other languages: Chinese (zh)
Other versions: CN111462179A (application)
Inventors: 刘赵梁, 陈思利
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd; priority to CN202010222181.8A; application published as CN111462179A; application granted and published as CN111462179B

Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G Physics; G06 Computing, calculating or counting; G06T Image data processing or generation, in general)
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures (G06F Electric digital data processing)
    • G06T 2207/10016, G06T 2207/10021: Indexing scheme for image analysis; image acquisition modality; video or image sequence; stereoscopic video or stereoscopic image sequence


Abstract

The application provides a three-dimensional object tracking method and device and electronic equipment, belonging to the technical field of computer vision. The method includes the following steps: when a scene map contains reference data of a target object, determining the first two-dimensional features contained in the target image that match reference two-dimensional features in the scene map, and a first 6DoF (Six Degrees of Freedom) pose of the target object in the target image; if the target image is a key frame image, performing three-dimensional object pose estimation on the target image and determining a second 6DoF pose of the target object in the target image; and if the tracking deviation of the target image, determined from the first 6DoF pose and the second 6DoF pose, is greater than or equal to a first threshold, replacing the existing reference data in the scene map with the matching relationship between the second two-dimensional features and three-dimensional space points together with the second 6DoF pose. The three-dimensional object tracking method thereby improves the accuracy and generality of moving-object tracking.

Description

Three-dimensional object tracking method and device and electronic equipment
Technical Field
The application relates to the technical field of image processing, in particular to the technical field of computer vision, and provides a three-dimensional object tracking method, a three-dimensional object tracking device and electronic equipment.
Background
The three-dimensional object tracking technique is a technique for obtaining the 6DoF (Six Degrees of Freedom) pose of an actual object in each image frame according to the features of the actual object itself in the image.
In the related art, a three-dimensional object tracking technology calculates the 6DoF pose of an actual object from the correspondence between three-dimensional space points of a three-dimensional model and two-dimensional coordinates in a two-dimensional image. In practice, however, the actual object often occupies only a small area of the scene image or lacks rich texture, so that not enough features can be extracted from it, which makes tracking difficult.
Disclosure of Invention
The three-dimensional object tracking method, the three-dimensional object tracking device and the electronic equipment are intended to solve the problem in the related art that, because the actual object occupies a small area in the scene image or lacks rich texture, not enough features can be extracted from it, making tracking difficult.
An embodiment of an aspect of the present application provides a three-dimensional object tracking method, including: detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relationship between reference two-dimensional features and three-dimensional space points; if so, determining a matching relationship between first two-dimensional features contained in the target image and three-dimensional space points, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features; judging whether the target image is a key frame image according to a preset rule; if so, performing three-dimensional object pose estimation on the target image, and determining a matching relationship between second two-dimensional features contained in the target image and three-dimensional space points, and a second 6DoF pose of the target object in the target image; determining a tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose; and if the tracking deviation is greater than or equal to a first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
In another aspect, a three-dimensional object tracking device provided in an embodiment of the present application includes: a detection module, configured to detect whether a scene map contains a reference 6DoF pose of a target object and a matching relationship between reference two-dimensional features and three-dimensional space points; a first determining module, configured to, if so, determine a matching relationship between first two-dimensional features contained in the target image and three-dimensional space points, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features; a judging module, configured to judge whether the target image is a key frame image according to a preset rule; a second determining module, configured to, if so, perform three-dimensional object pose estimation on the target image, and determine a matching relationship between second two-dimensional features contained in the target image and three-dimensional space points, and a second 6DoF pose of the target object in the target image; a third determining module, configured to determine a tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose; and a replacing module, configured to, if the tracking deviation is greater than or equal to the first threshold, replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
In still another aspect, an electronic device provided in an embodiment of the present application includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a three-dimensional object tracking method as previously described.
A further aspect of the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the three-dimensional object tracking method as described above.
Any of the above embodiments of the application has the following advantages or benefits: the target object in the target image is tracked through a scene map containing reference data of the target object, and its first 6DoF pose is determined; when the target image is a key frame image, pose estimation is performed on the target image through a three-dimensional object pose estimation algorithm, and a second 6DoF pose is determined; and when the tracking deviation determined from the first 6DoF pose and the second 6DoF pose is large, the scene map is re-initialized from the three-dimensional object pose estimation result of the target image. Environmental features are thus fused into the scene map, and the scene map is re-initialized whenever the object tracking information changes greatly, which improves the accuracy and generality of moving-object tracking. Because the first 6DoF pose is determined from the reference data in the scene map, the second 6DoF pose is determined by three-dimensional object pose estimation when the target image is a key frame image, the tracking deviation of the target image is determined from the matching degree of the two poses, and the reference data in the scene map is replaced when the tracking deviation is greater than or equal to the first threshold, the technical problem in the related art that an actual object occupying a small area in the scene image, or lacking rich texture, does not yield enough features for tracking is overcome, achieving the technical effect of improving the accuracy and generality of moving-object tracking.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
fig. 1 is a schematic flow chart of a three-dimensional object tracking method according to an embodiment of the present application;
fig. 2 is a flow chart of another three-dimensional object tracking method according to an embodiment of the present application;
fig. 3 is a flow chart of another three-dimensional object tracking method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a three-dimensional object tracking device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the embodiments of the application, a three-dimensional object tracking method is provided for the problem in the related art that, because the actual object occupies a small area in the scene image or lacks rich texture, not enough features can be extracted from it, making tracking difficult.
It should be noted that, the first two-dimensional feature, the second two-dimensional feature, and the third two-dimensional feature in the present application refer to two-dimensional features of a target object in a target image determined in different manners under different situations, respectively; the first 6DoF pose, the second 6DoF pose and the third 6DoF pose refer to 6DoF poses of the target object in the target image, which are determined in different manners under different conditions.
The three-dimensional object tracking method, the three-dimensional object tracking device, the electronic equipment and the storage medium provided by the application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a three-dimensional object tracking method according to an embodiment of the present application.
As shown in fig. 1, the three-dimensional object tracking method includes the steps of:
step 101, detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relation between a reference two-dimensional feature and a three-dimensional space point.
The target object is a three-dimensional object which needs to track the 6DoF pose currently.
The reference 6DoF pose of the target object refers to the 6DoF pose information of the target object generated by tracking the target object in each reference image acquired before the current moment.
The matching relationship between the reference two-dimensional features of the target object and three-dimensional space points refers to the matching relationship, generated by tracking the target object in each reference image acquired before the current moment, between the target object's two-dimensional features in the reference image and the three-dimensional space points of the target object's three-dimensional model.
When a three-dimensional object is tracked by computer vision techniques, if the texture of the target object in the acquired image is weak or the target object occupies too small an area in the image, not enough information about the target object can be extracted from the image, so that the generated tracking information is inaccurate or the target object cannot be tracked at all. Therefore, in the embodiments of the application, a scene map can be generated from the tracking result of the target object in each frame image, and the matching relationships between two-dimensional features of the background in each frame image and three-dimensional space points are integrated into the scene map, so that tracking of the target object can be assisted by the scene map, improving tracking accuracy.
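To make the role of the scene map concrete, below is a minimal Python sketch of one way such a map could be organized. The class and field names are illustrative assumptions; the patent does not prescribe a concrete data structure.

```python
from dataclasses import dataclass, field

@dataclass
class SceneMap:
    # Reference 6DoF poses of the target object, one 3x4 [R|t] matrix per
    # reference image tracked before the current moment.
    reference_poses: list = field(default_factory=list)
    # Reference features stored as (uv, descriptor, point3d) tuples: a 2D
    # coordinate in a reference image, its binary descriptor (e.g. ORB),
    # and the matched 3D space point (on the object model or background).
    reference_features: list = field(default_factory=list)

    def is_initialized(self) -> bool:
        # Step 101 checks exactly this: does the map already hold prior
        # knowledge (a reference pose plus 2D-3D matching relationships)?
        return bool(self.reference_poses) and bool(self.reference_features)

    def reset(self, pose, features):
        # Re-initialization: discard existing reference data and start over
        # from a single frame's pose-estimation result.
        self.reference_poses = [pose]
        self.reference_features = list(features)
```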
In the embodiment of the present application, when the target image in which the target object is to be tracked is acquired, it may first be detected whether the scene map has been initialized, that is, whether the data in the scene map is empty or whether it contains prior knowledge such as the reference 6DoF pose of the target object and the matching relationship between reference two-dimensional features and three-dimensional space points. If it does, the scene map can be used to track the target object in the target image; if it does not, pose estimation of the target object in the target image can be performed with a three-dimensional object pose estimation algorithm so as to initialize the scene map. That is, in one possible implementation manner of the embodiment of the present application, after the step 101, the method may further include:
if not contained, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between third two-dimensional features and three-dimensional space points, and the third 6DoF pose of the target object in the target image.
And adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
The third two-dimensional feature refers to the two-dimensional coordinates of the target object in the target image currently in the target image, which are determined by the three-dimensional object pose estimation algorithm when the scene map is empty.
The third 6DoF pose refers to 6DoF pose information of the target object currently in the target image, which is determined by the three-dimensional object pose estimation algorithm when the scene map is empty.
In this embodiment of the present application, if the scene map does not contain the reference 6DoF pose of the target object and the matching relationship between reference two-dimensional features and three-dimensional space points, that is, if the scene map is empty, the three-dimensional space point matching each third two-dimensional feature of the target object in the target image, and the third 6DoF pose of the target object, may be determined with a three-dimensional object pose estimation algorithm according to the three-dimensional mesh model corresponding to the target object; the matching relationship between the third two-dimensional features and the three-dimensional space points, and the third 6DoF pose, are then added to the scene map to complete its initialization.
As a possible implementation manner, to avoid the situation where the third two-dimensional features determined by the three-dimensional object pose estimation algorithm are too few or unreasonably placed, after the third 6DoF pose of the target object is determined by the three-dimensional object pose estimation algorithm, the two-dimensional coordinates corresponding to the target object in the target image may instead be determined with a feature point extraction algorithm such as ORB (Oriented FAST and Rotated BRIEF) or BRISK (Binary Robust Invariant Scalable Keypoints) and taken as the third two-dimensional features of the target object. The three-dimensional space point matching each third two-dimensional feature is then determined from these extracted third two-dimensional features and the third 6DoF pose, and the newly determined matching relationship between the third two-dimensional features and the three-dimensional space points, together with the third 6DoF pose of the target object, is added to the scene map to complete its initialization.
In practical use, an appropriate feature point extraction algorithm and three-dimensional object pose estimation algorithm can be selected according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application. For example, the feature point extraction algorithm may be ORB, BRISK, or the like, and the three-dimensional object pose estimation algorithm may be the color-segmentation-based PWP3D (Real-time Segmentation and Tracking of 3D Objects) algorithm, a moving-edge algorithm, a real-time monocular pose estimation algorithm for 3D objects using temporally consistent local color histograms, or the like.
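As a hedged illustration of this initialization path, the sketch below combines a three-dimensional pose estimator (e.g. PWP3D) with ORB feature extraction. Here estimate_pose_3d and backproject_to_model are hypothetical stand-ins for the pose-estimation algorithm and the ray-mesh intersection step, neither of which the patent spells out, and SceneMap is the sketch structure from above.

```python
import cv2
import numpy as np

def initialize_scene_map(scene_map, target_image, K, mesh,
                         estimate_pose_3d, backproject_to_model):
    # Hypothetical callables: estimate_pose_3d wraps a 3D pose estimator
    # such as PWP3D; backproject_to_model intersects the camera ray through
    # pixel (u, v) with the object's 3D mesh model, or returns None.
    pose_3x4 = estimate_pose_3d(target_image, mesh, K)   # third 6DoF pose

    # Extract the third two-dimensional features with ORB (BRISK would do too).
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(target_image, None)

    features = []
    for kp, desc in zip(keypoints, descriptors):
        pt3d = backproject_to_model(kp.pt, pose_3x4, K, mesh)
        if pt3d is not None:   # the feature actually lies on the object
            features.append((np.array(kp.pt), desc, pt3d))

    # The third 6DoF pose plus the 2D-3D matches become the reference data.
    scene_map.reset(pose_3x4, features)
```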
Step 102, if so, determining a matching relationship between the first two-dimensional features contained in the target image and three-dimensional space points, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features.
The first two-dimensional features refer to the two-dimensional coordinates currently corresponding to the target object that are extracted from the target image with a feature point extraction algorithm.
The first 6DoF pose refers to 6DoF pose information of a target object currently in a target image, which is determined through a scene map.
In the embodiment of the application, if it is detected that the scene map contains data such as the reference 6DoF pose of the target object and the matching relationship between reference two-dimensional features and three-dimensional space points, it can be determined that the scene map has been initialized, so the prior knowledge about the target object in the scene map can be used to track the target object in the target image.
Specifically, each first two-dimensional feature contained in the target image can be checked against the scene map to judge whether the scene map contains a reference two-dimensional feature matching it, so as to determine which first two-dimensional features match three-dimensional space points; the first 6DoF pose of the target object in the target image is then determined with a PnP (Perspective-n-Point) algorithm according to the matching relationship between those first two-dimensional features and the three-dimensional space points.
For example, the target image includes 100 first two-dimensional features, wherein 80 first two-dimensional features match with reference two-dimensional features in the scene map, so that a three-dimensional spatial point corresponding to the reference two-dimensional features matching the 80 first two-dimensional features can be determined as a three-dimensional spatial point matching the 80 first two-dimensional features.
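Assuming binary descriptors (ORB/BRISK) and OpenCV, the matching-plus-PnP step described above could look roughly as follows; SceneMap is the sketch structure from earlier and all parameter choices are illustrative.

```python
import cv2
import numpy as np

def track_with_scene_map(first_kps, first_descs, scene_map, K):
    # Match first 2D features against the map's reference 2D features by
    # descriptor distance (Hamming, since ORB/BRISK descriptors are binary).
    ref_descs = np.array([f[1] for f in scene_map.reference_features])
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(first_descs, ref_descs)

    # Each matched reference feature carries a 3D point: e.g. 80 of 100
    # first features may match, yielding 80 2D-3D correspondences.
    pts2d = np.float32([first_kps[m.queryIdx].pt for m in matches])
    pts3d = np.float32([scene_map.reference_features[m.trainIdx][2]
                        for m in matches])

    # PnP recovers the first 6DoF pose from the 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec) if ok else None
```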
Step 103, judging whether the target image is a key frame image according to a preset rule.
A key frame image refers to a target image for which, compared with each previous frame image, the lighting conditions have changed greatly or the position of the target object it contains may have changed greatly.
In the embodiment of the application, after the first 6DoF pose of the target object in the target image is determined, it can further be judged whether the target image is a key frame image in which the position of the target object has changed greatly, so that when the target image is determined to be a key frame image, the 6DoF pose of the target object in the target image is determined anew.
In the embodiment of the present application, whether the target image is a key frame image may be determined by the following preset rules.
Rule one
And judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is smaller than a second threshold value.
It can be understood that, because the matching relationship between three-dimensional space points and reference two-dimensional features in the scene map is determined from the position of the target object in one or more images preceding the target image, the smaller the number of first two-dimensional features in the target image that match three-dimensional space points, the more the position of the target object has changed relative to those preceding images, or the more the illumination conditions have changed when the target image was acquired, so the first 6DoF pose of the target object determined from the scene map is likely to be inaccurate. Therefore, in the embodiment of the present application, if the number of first two-dimensional features in the target image that match three-dimensional space points is smaller than the second threshold, it can be determined that the position of the target object in the target image, or the illumination conditions at acquisition time, have changed greatly, and therefore that the target image is a key frame image.
Optionally, whether the target image is a key frame image may be further determined according to a ratio of the number of first two-dimensional features matched with the three-dimensional space point contained in the target image to the number of all first two-dimensional features in the target image. Specifically, when the ratio of the number of first two-dimensional features matched with the three-dimensional space point contained in the target image to the number of all the first two-dimensional features in the target image is smaller than the proportion threshold value, the target image can be determined to be the key frame image.
Rule two
And judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
The acquisition position of the target image refers to the position of the image acquisition device for acquiring the target image when the image acquisition device acquires the target image.
In this embodiment of the present application, the greater the distance between the acquisition position of the target image and that of the adjacent previous key frame, the more the position of the target object in the target image may have changed compared with the adjacent previous key frame, so the target image can be determined to be a key frame image when the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is greater than a third threshold.
Rule III
And judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is larger than a fourth threshold value.
The acquisition time of the target image refers to the time at which the image acquisition device that acquires the target image captured it.
In this embodiment of the present application, the larger the time interval between the acquisition time of the target image and that of the adjacent previous key frame, the more the position of the target object in the target image may have changed compared with the adjacent previous key frame, so the target image can be determined to be a key frame image when that time interval is greater than a fourth threshold.
It should be noted that, the preset rule according to which the target image is determined whether to be the key frame image may include, but is not limited to, the above-listed cases. In actual use, an appropriate rule and a specific value of each related threshold value can be determined according to actual needs and specific application scenarios, which is not limited in the embodiment of the present application.
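A combined sketch of the three rules; every threshold value here is chosen purely for illustration, since the patent deliberately leaves them open.

```python
import numpy as np

def is_key_frame(n_matched, n_total, cam_pos, prev_kf_pos, t, prev_kf_t,
                 second_threshold=30, ratio_threshold=0.5,
                 third_threshold=0.2, fourth_threshold=1.0):
    # Rule one: too few first 2D features matched to 3D points, by absolute
    # count (second threshold) or by ratio to all first 2D features.
    if n_matched < second_threshold or n_matched / max(n_total, 1) < ratio_threshold:
        return True
    # Rule two: the acquisition position moved too far (e.g. in metres) from
    # that of the adjacent previous key frame (third threshold).
    if np.linalg.norm(np.asarray(cam_pos) - np.asarray(prev_kf_pos)) > third_threshold:
        return True
    # Rule three: too much time (e.g. in seconds) has passed since the
    # adjacent previous key frame was acquired (fourth threshold).
    return (t - prev_kf_t) > fourth_threshold
```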
Further, if it is determined that the target image is not a key frame image, it may be determined that the illumination condition does not significantly change or the position of the target object in the target image does not significantly change when the target image is acquired. That is, in one possible implementation manner of the embodiment of the present application, after the step 103, the method may further include:
If not, the first 6DoF pose is determined as the current tracking information of the target object.
In the embodiment of the present application, if it is determined that the target image is not a key frame image, it can be determined that the position of the target object in the target image has not changed greatly, and therefore that the first 6DoF pose of the target object determined from the scene map is sufficiently accurate. The first 6DoF pose can thus be taken as the current tracking information of the target object, completing the tracking process for the target image without performing the subsequent steps of this embodiment.
Step 104, if so, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between the second two-dimensional features contained in the target image and three-dimensional space points, and the second 6DoF pose of the target object in the target image.
The second two-dimensional feature refers to two-dimensional coordinates of the target object, which are determined by the three-dimensional object pose estimation algorithm and are corresponding to the target object in the target image when the target image is a key frame image.
The second 6DoF pose refers to the 6DoF pose information of the target object currently in the target image, determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image.
In this embodiment of the present application, if it is determined that the target image is a key frame image, three-dimensional object pose estimation can further be performed on the target image with the three-dimensional object pose estimation algorithm according to the three-dimensional mesh model of the target object, so as to determine the matching relationship between each second two-dimensional feature corresponding to the target object in the target image and three-dimensional space points, and the second 6DoF pose of the target object in the target image, the second 6DoF pose serving as a measure of the actual pose of the target object.
Step 105, determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
In the embodiment of the application, since the second 6DoF pose of the target object may be used to represent the real pose of the target object, the tracking deviation of the target image may be determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, so as to determine whether the first 6DoF pose of the target object is accurate according to the tracking deviation of the target image.
As a possible implementation manner, two-dimensional coordinates corresponding to the target object may be determined according to the first 6DoF pose and the second 6DoF pose of the target object, and the matching degree of the first 6DoF pose and the second 6DoF pose may be determined according to the distance between the two-dimensional coordinates. That is, in one possible implementation manner of the embodiment of the present application, before the step 105, the method may further include:
According to the first 6DoF pose, determining a first two-dimensional coordinate corresponding to each three-dimensional space point;
determining a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose;
and determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
In the embodiment of the application, the first two-dimensional coordinate corresponding to each three-dimensional space point is determined from the first 6DoF pose, the coordinates of each three-dimensional space point in the three-dimensional mesh model of the target object, and the intrinsic parameters of the image acquisition device that acquired the target image; that is, the three-dimensional mesh model of the target object is projected, using the first 6DoF pose, onto the plane of the target image, and the corresponding projected two-dimensional coordinates are generated. Likewise, the second two-dimensional coordinate corresponding to each three-dimensional space point is determined from the second 6DoF pose, the coordinates of each three-dimensional space point in the model, and the same intrinsic parameters; that is, the model is projected onto the image plane using the second 6DoF pose.
Specifically, a 6DoF pose can be represented as a 3×4 matrix; denote the first 6DoF pose by P1 and the second 6DoF pose by P2. The coordinates of a three-dimensional space point can be expressed in homogeneous form as a 4×1 column vector X (so that the 3×4 pose matrix can act on it), and the intrinsic parameters of the image acquisition device as a 3×3 matrix K. The first two-dimensional coordinate corresponding to each three-dimensional space point is then x1 = K·P1·X, and the second two-dimensional coordinate is x2 = K·P2·X. The Euclidean distance between the first and second two-dimensional coordinates corresponding to each three-dimensional space point is then determined, and the matching degree of the first 6DoF pose and the second 6DoF pose is determined from these distances.
Optionally, the larger the Euclidean distance between the first and second two-dimensional coordinates corresponding to a three-dimensional space point, the larger the difference between them, that is, the larger the difference between the first 6DoF pose and the second 6DoF pose. The mean of the Euclidean distances over all three-dimensional space points can therefore be computed, and its reciprocal taken as the matching degree of the first 6DoF pose and the second 6DoF pose.
In the embodiment of the application, the larger the matching degree of the first and second 6DoF poses, the smaller the tracking deviation of the target image; the smaller the matching degree, the larger the tracking deviation. That is, the tracking deviation of the target image is positively correlated with the Euclidean distances between the corresponding first and second two-dimensional coordinates, so the mean of those Euclidean distances over all three-dimensional space points can be taken directly as the tracking deviation of the target image.
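A sketch of this deviation computation with OpenCV, assuming the two poses are given as 3×4 [R|t] matrices and points3d is an N×3 float array of model points:

```python
import cv2
import numpy as np

def tracking_deviation(points3d, pose1, pose2, K):
    # Project the model's 3D points with each pose: x_i = K * P_i * X in
    # homogeneous coordinates, which cv2.projectPoints performs given
    # a rotation vector and a translation vector.
    def project(pose_3x4):
        rvec, _ = cv2.Rodrigues(pose_3x4[:, :3])
        tvec = pose_3x4[:, 3].reshape(3, 1)
        pts, _ = cv2.projectPoints(points3d, rvec, tvec, K, None)
        return pts.reshape(-1, 2)

    x1, x2 = project(pose1), project(pose2)
    # Mean Euclidean distance between the two projections: a larger mean
    # means a lower matching degree, so it is taken as the deviation.
    return float(np.mean(np.linalg.norm(x1 - x2, axis=1)))
```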
Step 106, if the tracking deviation is greater than or equal to the first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
In the embodiment of the present application, if the tracking deviation of the target image is greater than or equal to the first threshold, it can be determined that the pose of the target object in the target image has changed greatly, that is, the first 6DoF pose determined from the scene map is inaccurate. The second 6DoF pose of the target object can therefore be taken as the current tracking information of the target object, and the scene map can be emptied and re-initialized with the target image as the initial frame; that is, the reference 6DoF pose and the matching relationship between reference two-dimensional features and three-dimensional space points in the scene map are replaced with the matching relationship between the second two-dimensional features and three-dimensional space points and the second 6DoF pose.
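The resulting update logic in sketch form; the first-threshold value is an assumption, SceneMap is the structure sketched earlier, and the enrichment branch corresponds to step 209 of the next embodiment.

```python
def update_after_key_frame(scene_map, deviation, pose2, features2,
                           first_threshold=5.0):
    # first_threshold is an illustrative pixel value; the patent leaves it open.
    if deviation >= first_threshold:
        # The pose changed too much: the second 6DoF pose becomes the current
        # tracking information and the map is re-initialized from this frame.
        scene_map.reset(pose2, features2)
    else:
        # Small deviation: keep the map and enrich it with this key frame's
        # pose-estimation result instead.
        scene_map.reference_poses.append(pose2)
        scene_map.reference_features.extend(features2)
```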
According to the technical scheme, when the scene map contains the reference 6DoF pose of the target object and the matching relation between the reference two-dimensional feature and the three-dimensional space point, the first two-dimensional feature which is contained in the target image and matches with the reference two-dimensional feature and the first 6DoF pose of the target object in the target image are determined, when the target image is a key frame image, three-dimensional object pose estimation is carried out on the target image, the matching relation between the second two-dimensional feature contained in the target image and the three-dimensional space point and the second 6DoF pose of the target object in the target image are determined, then tracking deviation of the target image is determined according to the matching relation between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is larger than or equal to a first threshold value, the matching relation between the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose are utilized to replace the matching relation between the reference 6DoF pose and the reference two-dimensional feature in the scene map and the three-dimensional space point. Therefore, the target object in the target image is tracked through the scene map comprising the target object reference data and the environmental characteristics, and the scene map is initialized when the object tracking information is changed greatly, so that the accuracy and the universality of moving object tracking are improved.
In one possible implementation form of the present application, if the tracking deviation of the key frame image is small, the three-dimensional object pose estimation result and the feature extraction result of the key frame image may be added to the scene map to enrich its reference data.
The three-dimensional object tracking method provided in the embodiment of the present application is further described below with reference to fig. 2.
Fig. 2 is a flow chart of another three-dimensional object tracking method according to an embodiment of the present application.
As shown in fig. 2, the three-dimensional object tracking method includes the steps of:
step 201, detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relationship between a reference two-dimensional feature and a three-dimensional space point.
Step 202, if so, determining a matching relationship between the first two-dimensional features contained in the target image and three-dimensional space points, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features.
Step 203, judging whether the target image is a key frame image according to a preset rule, if so, executing step 204 and step 205; otherwise, step 210 is performed.
The specific implementation and principles of the steps 201 to 203 may refer to the detailed description of the embodiments, and are not repeated here.
Step 204, extracting a fourth two-dimensional feature in the target image, and constructing a three-dimensional space position corresponding to the fourth two-dimensional feature; and adding the fourth two-dimensional feature and the corresponding three-dimensional space position into the scene map.
In the embodiment of the application, if the target image is determined to be a key frame image, the features in the target image can be used to enrich the scene map, further improving the accuracy of target object tracking.
As a possible implementation form, feature extraction may be performed on the target image with a feature point extraction algorithm to determine the feature points it contains; the fourth two-dimensional features of the target image are the two-dimensional coordinates of these extracted feature points in the target image. The three-dimensional mesh model corresponding to the target object is then projected, using the first 6DoF pose of the target object, onto the plane of the target image to generate the projected contour of the target object in the target image, and each fourth two-dimensional feature within that projected contour is determined to be a two-dimensional coordinate corresponding to the target object in the target image. Finally, the three-dimensional space point corresponding to each fourth two-dimensional feature within the projected contour is determined using the first 6DoF pose, those fourth two-dimensional features, and the three-dimensional mesh model; that is, the three-dimensional space positions corresponding to the fourth two-dimensional features of the target object are constructed.
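One plausible way to realize the contour test is to approximate the projected contour by the convex hull of the projected mesh vertices; this is a simplification, since the patent only requires the projected contour, not how it is computed.

```python
import cv2
import numpy as np

def split_features_by_contour(fourth_kps, mesh_vertices, pose1, K):
    # Project the mesh vertices (Nx3 float array) with the first 6DoF pose
    # and approximate the projected contour by their convex hull.
    rvec, _ = cv2.Rodrigues(pose1[:, :3])
    tvec = pose1[:, 3].reshape(3, 1)
    proj, _ = cv2.projectPoints(mesh_vertices, rvec, tvec, K, None)
    hull = cv2.convexHull(proj.reshape(-1, 2).astype(np.float32))

    on_object, background = [], []
    for kp in fourth_kps:
        # >= 0 means inside or on the contour: a feature on the target object.
        if cv2.pointPolygonTest(hull, kp.pt, measureDist=False) >= 0:
            on_object.append(kp)
        else:
            background.append(kp)
    return on_object, background
```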
Correspondingly, each fourth two-dimensional feature outside the projected contour can be taken as a two-dimensional coordinate corresponding to the background environment in the target image. For each such feature, its descriptor (e.g., the ORB feature) can be matched against the descriptors at each position in the previous frame image. If the descriptor of a background fourth two-dimensional feature matches that of some position in the previous frame image, the two can be considered to share the same visual feature, i.e., to represent the same point in the scene; the three-dimensional space position of that background feature can then be constructed by triangulation from the acquisition position of the target image and that of the previous frame image. In this way the three-dimensional space position of every fourth two-dimensional feature corresponding to the background environment can be constructed.
In the embodiment of the application, after the three-dimensional space position of each fourth two-dimensional feature in the target image is constructed, each fourth two-dimensional feature in each target image and the corresponding three-dimensional space position can be added into the scene map to enrich the reference data in the scene map.
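A sketch of the triangulation step for background features, assuming matched pixel coordinates between the previous frame and the target image and known 3×4 projection poses for the two acquisition positions:

```python
import cv2
import numpy as np

def triangulate_background(K, pose_prev, pose_curr, pts_prev, pts_curr):
    # pts_prev / pts_curr: Nx2 float arrays of matched background features
    # whose descriptors (e.g. ORB) agreed between the two frames.
    P1 = K @ pose_prev   # 3x4 projection matrix at the previous acquisition position
    P2 = K @ pose_curr   # 3x4 projection matrix at the target image's position
    pts4d = cv2.triangulatePoints(P1, P2, pts_prev.T, pts_curr.T)
    return (pts4d[:3] / pts4d[3]).T   # de-homogenize to Nx3 space positions
```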
Step 205, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between the second two-dimensional features contained in the target image and three-dimensional space points, and the second 6DoF pose of the target object in the target image.
And 206, determining tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
Step 207, judging whether the tracking deviation of the target image is greater than or equal to a first threshold, if yes, executing step 208; otherwise, step 209 is performed.
Step 208, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
The specific implementation and principles of the steps 205-208 may refer to the detailed description of the embodiments, which is not repeated here.
Step 209, adding the matching relationship between the second two-dimensional feature and the three-dimensional space point contained in the target image and the second 6DoF pose to the scene map.
In the embodiment of the present application, if the tracking deviation of the target image is smaller than the first threshold, it can be determined that the position of the target object in the target image has not changed greatly and that the first 6DoF pose determined from the scene map is sufficiently accurate, so either the first 6DoF pose or the second 6DoF pose can be taken as the current tracking information of the target object; the result of three-dimensional object pose estimation on the target image is added to the scene map to enrich its reference data, without re-initializing the scene map.
Step 210, determining the first 6DoF pose as current tracking information of the target object.
The specific implementation process and principle of the step 210 may refer to the detailed description of the foregoing embodiments, which is not repeated herein.
According to the technical scheme, the target object in the target image is tracked through the scene map comprising the target object reference data, the first 6DoF pose of the target object is determined, when the target image is a key frame image, pose estimation is performed on the target image through the three-dimensional object pose estimation algorithm, the second 6DoF pose of the target object is determined, when tracking deviation determined according to the first 6DoF pose and the second 6DoF pose is greater than or equal to a first threshold value, the scene map is initialized through the three-dimensional object pose estimation result of the target image, and when the tracking deviation is smaller than the first threshold value, the three-dimensional object pose estimation result of the key frame and the three-dimensional space position of each fourth two-dimensional feature in the constructed key frame image are utilized, so that reference information in the scene map is enriched. Therefore, the scene map is updated through the environment features integrated in the scene map and the information of the key frame images, so that the accuracy and the universality of moving object tracking are further improved.
In one possible implementation form of the method, when the first 6DoF pose of the target object in the target image is determined by using the scene map, different weights can be given to different two-dimensional features, so that the accuracy of tracking the moving object is further improved.
The three-dimensional object tracking method provided in the embodiment of the present application is further described below with reference to fig. 3.
Fig. 3 is a flowchart of another three-dimensional object tracking method according to an embodiment of the present application.
As shown in fig. 3, the three-dimensional object tracking method includes the steps of:
step 301, detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relationship between a reference two-dimensional feature and a three-dimensional space point.
The specific implementation process and principle of the above step 301 may refer to the detailed description of the above embodiments, which is not repeated herein.
Step 302, if so, determining the matching relationship between the first two-dimensional features contained in the target image and three-dimensional space points according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features, and calculating the first 6DoF pose of the target object in the target image according to the matching relationship between each first two-dimensional feature and its three-dimensional space point and the weight of the reference two-dimensional feature matching each first two-dimensional feature.
In the embodiment of the application, after the matching relation between each first two-dimensional feature contained in the target image and the three-dimensional space point is determined, the weight of the reference two-dimensional feature matched with each first two-dimensional feature can be fused when the first 6DoF pose of the target object in the target image is determined, so that the accuracy of the determined first 6DoF pose is further improved.
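Weighted pose computation is not a single OpenCV call, so one way to realize it is to refine an initial PnP solution by minimizing a weighted reprojection error. The sketch below is written under that assumption and is not the patent's prescribed solver.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def weighted_pnp(pts3d, pts2d, weights, K, rvec0, tvec0):
    # Refine an initial pose (e.g. from solvePnPRansac) by minimizing the
    # weighted reprojection error; square-root weights scale the residuals
    # so each reference feature's weight enters the squared cost linearly.
    sw = np.sqrt(np.asarray(weights, dtype=np.float64)).repeat(2)

    def residuals(x):
        proj, _ = cv2.projectPoints(pts3d, x[:3].reshape(3, 1),
                                    x[3:].reshape(3, 1), K, None)
        return sw * (proj.reshape(-1, 2) - pts2d).ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0)
    return sol.x[:3], sol.x[3:]   # refined rvec and tvec
```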
In embodiments of the present application, the weights of the reference two-dimensional features in the scene map may be determined in the following manners.
Mode one
And determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image.
In the embodiment of the present application, there are three ways in which reference data in the scene map is acquired. The first acquisition mode: when the target image is a key frame image and the tracking deviation is smaller than the first threshold, features are extracted from the target image and the matching relationship between the fourth two-dimensional features and three-dimensional space positions is constructed. The second acquisition mode: when the target image is a key frame image, the matching relationship between the second two-dimensional features and three-dimensional space points, and the second 6DoF pose, are determined by three-dimensional object pose estimation on the target image. The third acquisition mode: when the scene map is initialized, that is, when the target image is the initial frame, the matching relationship between the third two-dimensional features and three-dimensional space points, and the third 6DoF pose, are determined by pose estimation on the target image.
A reference two-dimensional feature acquired through the first acquisition mode may be a coordinate corresponding to the target object in the reference image or a coordinate corresponding to the background environment in the reference image. Therefore, when the matching relationship between each reference two-dimensional feature and its three-dimensional space point is determined by feature extraction on a key frame image and added to the scene map, the weight of each reference two-dimensional feature can be determined according to its position in the reference image in which it lies.
Optionally, since the number of two-dimensional coordinates corresponding to the background environment in the target image may be far greater than the number of two-dimensional coordinates corresponding to the target object, in order to balance the effect of each reference two-dimensional feature in determining the first 6DoF pose of the target object, a greater weight may be given to each reference two-dimensional feature corresponding to the target object in the target image.
That is, if the position of the reference two-dimensional feature in the reference image in which it is located is the background environment in the reference image, the weight of the reference two-dimensional feature may be determined to be a smaller value; if the position of the reference two-dimensional feature in the reference image in which it is located is the target object in the reference image, the weight of the reference two-dimensional feature can be determined to be a larger value.
Optionally, the weight of each reference two-dimensional feature may be determined according to the ratio between the number of reference two-dimensional features corresponding to the background environment and the number corresponding to the target object in the reference image. Specifically, the smaller the area of the target object in the image, the less information about it can be extracted, so its features contribute less to determining its pose. Therefore, for the reference two-dimensional features obtained from one reference image, the ratio of the number of background reference two-dimensional features to the number of object reference two-dimensional features can be taken as the weight of each reference two-dimensional feature corresponding to the target object, and the ratio of the number of object features to the number of background features can be taken as the weight of each reference two-dimensional feature corresponding to the background environment.
Mode two
Determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first type of key two-dimensional feature according to the time interval between the reference image and the target image where each first type of key two-dimensional feature is located.
The first type of key two-dimensional features refers to reference two-dimensional features acquired through the second and third acquisition modes, namely the second and third two-dimensional features. It should be noted that, because the second and third two-dimensional features are determined with the three-dimensional object pose estimation algorithm, the first type of key two-dimensional features are all two-dimensional features corresponding to the target object in the reference image in which they lie.
In this embodiment of the present application, the matching relationships between first-type key two-dimensional features and three-dimensional space points in the scene map may come from different reference images, and the larger the time interval between a reference image and the target image, the more the position of the target object in that reference image may differ from its position in the target image. The weight of each first-type key two-dimensional feature can therefore be determined according to the time interval between the reference image in which it lies and the target image.
Specifically, the larger the time interval between the reference image where a first-type key two-dimensional feature is located and the target image, the smaller its weight may be set; the smaller that time interval, the larger its weight may be set.
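The embodiment does not fix the decay function, only that the weight shrinks as the interval grows; an exponential decay is one plausible choice (the constant tau below is an assumption):

```python
import numpy as np

def mode_two_weights(frame_gaps, tau=10.0):
    """Weights for first-type key 2D features: the larger the interval
    between a feature's reference image and the target image (frame_gaps,
    e.g. in frames or seconds), the smaller the weight. tau is an assumed
    decay constant; the embodiment only requires monotonic decrease."""
    return np.exp(-np.asarray(frame_gaps, dtype=float) / tau)
```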
Mode three
Determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-class key two-dimensional feature according to the acquisition mode of each second-class key two-dimensional feature.
The second kind of key two-dimensional features refer to two-dimensional features corresponding to the target object in the reference image where the target object is located.
In this embodiment of the present application, a second-type key two-dimensional feature may have been acquired through the first acquisition mode, or through the second or third acquisition mode. Because the second-type key two-dimensional features acquired through the second and third acquisition modes are determined by the three-dimensional object pose estimation algorithm, they can be regarded as genuine two-dimensional features of the target object in the image. Therefore, if a second-type key two-dimensional feature was acquired through the first acquisition mode, its weight may be set to a smaller value; if it was acquired through the second or third acquisition mode, its weight may be set to a larger value.
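A minimal sketch of this rule follows; the two weight values are illustrative, since the embodiment only requires their ordering:

```python
def mode_three_weight(acquired_by_pose_estimation: bool,
                      high: float = 1.0, low: float = 0.3) -> float:
    """Weight for a second-type key 2D feature (one lying on the target
    object): larger if it entered the scene map via the second or third
    acquisition mode (confirmed by 3D object pose estimation), smaller if
    it entered via plain feature matching (first acquisition mode).
    The values 1.0 and 0.3 are assumptions; only their order is required."""
    return high if acquired_by_pose_estimation else low
```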
It should be noted that the manner of determining the weights of the reference two-dimensional features may include, but is not limited to, the cases listed above. In actual use, the manner of determining the weight of the reference two-dimensional feature can be selected according to actual needs, which is not limited in the embodiment of the present application.
Step 303, judging whether the target image is a key frame image according to a preset rule.
Step 304, if so, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and the second 6DoF pose of the target object in the target image.
Step 305, determining tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
Step 306, if the tracking deviation is greater than or equal to the first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
The specific implementation and principles of steps 303 to 306 may refer to the detailed description of the foregoing embodiments and are not repeated here.
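For orientation, the control flow of steps 303 to 306 can be sketched as follows; track_pose, is_key_frame, estimate_object_pose, tracking_deviation and the scene_map methods are placeholder names for the algorithms described in the embodiments, not identifiers defined by this application:

```python
def process_frame(target_image, scene_map, first_threshold):
    """Control-flow sketch of steps 303-306 under assumed helper functions."""
    pose1 = track_pose(target_image, scene_map)            # first 6DoF pose from the scene map
    if not is_key_frame(target_image, scene_map):          # step 303
        return pose1                                       # not a key frame: keep the first pose
    pose2, matches2 = estimate_object_pose(target_image)   # step 304: second 6DoF pose
    if tracking_deviation(pose1, pose2) >= first_threshold:
        scene_map.replace_reference(matches2, pose2)       # step 306: re-initialize reference data
        return pose2
    scene_map.add(matches2, pose2)                         # small deviation: extend the map instead
    return pose1
```

The branch for a tracking deviation below the first threshold follows the variant described later, in which the second two-dimensional features are added to the scene map rather than replacing its reference data.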
According to the technical scheme of this embodiment, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image that match the reference two-dimensional features are determined, and the first 6DoF pose of the target object in the target image is calculated from the matching relationship between the first two-dimensional features and the three-dimensional space points and the weights of the reference two-dimensional features that the first two-dimensional features respectively match. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and the second 6DoF pose of the target object in the target image. The tracking deviation of the target image is then determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map are replaced with the second 6DoF pose and the matching relationship between the second two-dimensional features and the three-dimensional space points. In this way, the target object in the target image is tracked through a scene map that includes both reference data of the target object and environmental features, and different weights are given to different reference two-dimensional features, further improving the accuracy and universality of moving object tracking.
In order to implement the above embodiment, the present application further proposes a three-dimensional object tracking device.
Fig. 4 is a schematic structural diagram of a three-dimensional object tracking device according to an embodiment of the present application.
As shown in fig. 4, the three-dimensional object tracking device 40 includes:
the detection module 41 is configured to detect whether the scene map includes a reference 6DoF pose of the target object, and a matching relationship between the reference two-dimensional feature and the three-dimensional space point.
a first determining module 42, configured to, if the scene map contains them, determine a matching relationship between the first two-dimensional features contained in the target image and the three-dimensional space points, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features contained in the target image and the reference two-dimensional features;
a judging module 43, configured to judge whether the target image is a key frame image according to a preset rule;
a second determining module 44, configured to, if the target image is a key frame image, perform three-dimensional object pose estimation on the target image, and determine a matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and a second 6DoF pose of the target object in the target image;
a third determining module 45, configured to determine a tracking deviation of the target image according to a matching degree of the first 6DoF pose and the second 6DoF pose;
and a replacing module 46, configured to, if the tracking deviation is greater than or equal to the first threshold, replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
In practical use, the three-dimensional object tracking device provided by the embodiment of the application can be configured in any electronic equipment to execute the three-dimensional object tracking method.
According to the technical scheme of this embodiment, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image that match the reference two-dimensional features, and the first 6DoF pose of the target object in the target image, are determined. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and the second 6DoF pose of the target object in the target image. The tracking deviation of the target image is then determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map are replaced with the second 6DoF pose and the matching relationship between the second two-dimensional features and the three-dimensional space points. In this way, the target object in the target image is tracked through a scene map that includes both reference data of the target object and environmental features, and the scene map is re-initialized when the object tracking information changes significantly, thereby improving the accuracy and universality of moving object tracking.
In one possible implementation form of the present application, the three-dimensional object tracking device 40 further includes:
a fourth determining module, configured to, if the scene map does not contain them, perform three-dimensional object pose estimation on the target image, and determine the matching relationship between the third two-dimensional features contained in the target image and the three-dimensional space points, and the third 6DoF pose of the target object in the target image;
the first adding module is used for adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
Further, in another possible implementation form of the present application, the three-dimensional object tracking device 40 further includes:
and a fifth determining module, configured to determine the first 6DoF pose as the current tracking information of the target object if the target image is not a key frame image.
Further, in still another possible implementation form of the present application, the three-dimensional object tracking device 40 further includes:
the sixth determining module is used for determining a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
a seventh determining module, configured to determine a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose;
and the eighth determining module is used for determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
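Under the reading that this matching degree is scored by reprojection distance, a sketch using OpenCV might look as follows (the rvec/tvec pose convention and the function name are assumptions):

```python
import numpy as np
import cv2

def pose_matching_degree(points_3d, rvec1, tvec1, rvec2, tvec2, K):
    """Project the scene map's 3D points (N x 3 float array) with each 6DoF
    pose (OpenCV rvec/tvec convention, camera intrinsics K) and score the
    agreement by the mean distance between the two sets of 2D coordinates:
    the smaller the mean distance, the higher the matching degree and the
    lower the tracking deviation."""
    pts1, _ = cv2.projectPoints(points_3d, rvec1, tvec1, K, None)
    pts2, _ = cv2.projectPoints(points_3d, rvec2, tvec2, K, None)
    d = np.linalg.norm(pts1.reshape(-1, 2) - pts2.reshape(-1, 2), axis=1)
    return float(d.mean())
```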
Further, in still another possible implementation form of the present application, the three-dimensional object tracking device 40 further includes:
and the second adding module is used for adding the second two-dimensional features which are contained in the target image and matched with the three-dimensional space points to the scene map if the tracking deviation is smaller than the first threshold value.
Further, in still another possible implementation form of the present application, the three-dimensional object tracking device 40 further includes:
an extraction module, configured to extract fourth two-dimensional features from the target image if the target image is a key frame image;
the construction module is used for constructing a three-dimensional space position corresponding to the fourth two-dimensional feature;
and the third adding module is used for adding all the features in the target image and the corresponding three-dimensional space positions into the scene map.
In one possible implementation manner of the present application, the above-mentioned judging module 43 is specifically configured to:
and judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is smaller than a second threshold value.
Further, in another possible implementation manner of the present application, the judging module 43 is further configured to:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
Further, in still another possible implementation manner of the present application, the judging module 43 is further configured to:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is larger than a fourth threshold value.
In one possible implementation manner of the present application, the first determining module 42 is specifically configured to:
and calculating a first 6DoF pose of the target object in the target image according to the matching relation between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature matched with each first two-dimensional feature.
Further, in another possible implementation manner of the present application, the first determining module 42 is further configured to:
and determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image.
Further, in still another possible implementation form of the present application, the first determining module 42 is further configured to:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first type of key two-dimensional feature according to the time interval between the reference image and the target image where each first type of key two-dimensional feature is located.
Further, in still another possible implementation form of the present application, the first determining module 42 is further configured to:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-class key two-dimensional feature according to the acquisition mode of each second-class key two-dimensional feature.
It should be noted that the foregoing explanation of the embodiment of the three-dimensional object tracking method shown in fig. 1, 2 and 3 is also applicable to the three-dimensional object tracking device 40 of this embodiment, and will not be repeated here.
According to the technical scheme of this embodiment, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image that match the reference two-dimensional features are determined, and the first 6DoF pose of the target object in the target image is calculated from the matching relationship between the first two-dimensional features and the three-dimensional space points and the weights of the reference two-dimensional features that the first two-dimensional features respectively match. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and the second 6DoF pose of the target object in the target image. The tracking deviation of the target image is then determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map are replaced with the second 6DoF pose and the matching relationship between the second two-dimensional features and the three-dimensional space points. In this way, the target object in the target image is tracked through a scene map that includes both reference data of the target object and environmental features, and different weights are given to different reference two-dimensional features, further improving the accuracy and universality of moving object tracking.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for the three-dimensional object tracking method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is taken as an example in fig. 5.
The memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor performs the three-dimensional object tracking method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the three-dimensional object tracking method provided herein.
The memory 502 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the detection module 41, the first determination module 42, the judgment module 43, the second determination module 44, the third determination module 45, and the replacement module 46 shown in fig. 4) corresponding to the three-dimensional object tracking method in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, i.e., implements the three-dimensional object tracking method in the method embodiments described above.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device of the three-dimensional object tracking method, or the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory remotely located relative to processor 501, which may be connected to the electronic device of the three-dimensional object tracking method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the three-dimensional object tracking method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the three-dimensional object tracking method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of this embodiment, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image that match the reference two-dimensional features, and the first 6DoF pose of the target object in the target image, are determined. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, and the second 6DoF pose of the target object in the target image. The tracking deviation of the target image is then determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map are replaced with the second 6DoF pose and the matching relationship between the second two-dimensional features and the three-dimensional space points. In this way, the target object in the target image is tracked through a scene map that includes both reference data of the target object and environmental features, and the scene map is re-initialized when the object tracking information changes significantly, thereby improving the accuracy and universality of moving object tracking.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (28)

1. A method of three-dimensional object tracking, comprising:
detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relation between a reference two-dimensional feature and a three-dimensional space point;
if contained, determining a matching relation between a first two-dimensional feature contained in a target image and a three-dimensional space point, and a first 6DoF pose of the target object in the target image, according to a matching relation between the first two-dimensional feature contained in the target image and the reference two-dimensional feature, wherein the first two-dimensional feature refers to a two-dimensional coordinate currently corresponding to the target object extracted from the target image by a feature point extraction algorithm, and the first 6DoF pose refers to 6DoF pose information of the target object in the target image currently determined through the scene map;
judging whether the target image is a key frame image according to a preset rule, so as to re-determine the 6DoF pose of the target object in the target image when the target image is determined to be a key frame image;
if so, carrying out three-dimensional object pose estimation on the target image, and determining a matching relation between a second two-dimensional feature contained in the target image and a three-dimensional space point, and a second 6DoF pose of the target object in the target image, wherein the second two-dimensional feature is a two-dimensional coordinate currently corresponding to the target object in the target image determined by a three-dimensional object pose estimation algorithm when the target image is a key frame image, and the second 6DoF pose is 6DoF pose information of the target object in the target image currently determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image;
determining tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose;
and if the tracking deviation is greater than or equal to a first threshold value, replacing a reference 6DoF pose in the scene map and a matching relationship between the reference two-dimensional feature and the three-dimensional space point by using the matching relationship between the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose, and determining the second 6DoF pose of the target object as current tracking information of the target object.
2. The method of claim 1, wherein after detecting whether the scene map includes the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional feature and the three-dimensional space point, the method further comprises:
if not contained, carrying out three-dimensional object pose estimation on the target image, and determining a matching relation between a third two-dimensional feature contained in the target image and a three-dimensional space point, and a third 6DoF pose of the target object in the target image;
and adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
3. The method of claim 1, wherein determining whether the target image is a key frame image according to a predetermined rule comprises:
and judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is smaller than a second threshold value.
4. The method of claim 1, wherein determining whether the target image is a key frame image according to a predetermined rule comprises:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
5. The method of claim 1, wherein determining whether the target image is a key frame image according to a predetermined rule comprises:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is larger than a fourth threshold value.
6. The method according to any one of claims 1-5, wherein after determining whether the target image is a key frame image according to a preset rule, further comprising:
and if not, determining the first 6DoF pose as the current tracking information of the target object.
7. The method according to any one of claims 2-5, wherein determining a matching relationship between a first two-dimensional feature contained in the target image and a three-dimensional space point, and a first 6DoF pose of the target object in the target image, comprises:
and calculating a first 6DoF pose of the target object in the target image according to the matching relation between the first two-dimensional features and the three-dimensional space points and the weights of the reference two-dimensional features respectively matched with the first two-dimensional features.
8. The method of claim 7, wherein the computing the first 6DoF pose of the target object in the target image further comprises:
And determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image.
9. The method of claim 7, wherein the computing the first 6DoF pose of the target object in the target image further comprises:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first type of key two-dimensional feature according to the time interval between the reference image where each first type of key two-dimensional feature is located and the target image.
10. The method of claim 7, wherein the computing the first 6DoF pose of the target object in the target image further comprises:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-class key two-dimensional feature according to the acquisition mode of each second-class key two-dimensional feature.
11. The method according to any one of claims 1-5, wherein before determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose, further comprises:
Determining a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
determining a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose;
and determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
12. The method of any of claims 1-5, wherein after determining the tracking bias for the target image, further comprising:
and if the tracking deviation is smaller than a first threshold value, adding the matching relation between the second two-dimensional feature and the three-dimensional space point contained in the target image and the second 6DoF pose into the scene map.
13. The method according to any one of claims 1-5, wherein after determining whether the target image is a key frame image according to a preset rule, further comprising:
if yes, extracting a fourth two-dimensional feature in the target image;
constructing a three-dimensional space position corresponding to the fourth two-dimensional feature;
and adding the fourth two-dimensional feature and the corresponding three-dimensional space position to the scene map.
14. A three-dimensional object tracking device, comprising:
the detection module is used for detecting whether the scene map contains a reference 6DoF pose of a target object and a matching relation between a reference two-dimensional feature and a three-dimensional space point;
the first determining module is configured to, if contained, determine a matching relationship between a first two-dimensional feature contained in a target image and a three-dimensional space point, and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional feature contained in the target image and the reference two-dimensional feature, wherein the first two-dimensional feature refers to a two-dimensional coordinate currently corresponding to the target object extracted from the target image by a feature point extraction algorithm, and the first 6DoF pose refers to 6DoF pose information of the target object in the target image currently determined through the scene map;
the judging module is used for judging whether the target image is a key frame image according to a preset rule so as to redetermine the 6DoF pose of the target object in the target image when the target image is determined to be the key frame image;
the second determining module is configured to, if the target image is a key frame image, perform three-dimensional object pose estimation on the target image, and determine a matching relationship between a second two-dimensional feature contained in the target image and a three-dimensional space point, and a second 6DoF pose of the target object in the target image, wherein the second two-dimensional feature is a two-dimensional coordinate currently corresponding to the target object in the target image determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image, and the second 6DoF pose is 6DoF pose information of the target object in the target image currently determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image;
The third determining module is used for determining tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose;
and the replacement module is used for replacing the reference 6DoF pose in the scene map and the matching relation between the reference two-dimensional feature and the three-dimensional space point by using the matching relation between the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose if the tracking deviation is larger than or equal to the first threshold value, and determining the second 6DoF pose of the target object as the current tracking information of the target object.
15. The apparatus as recited in claim 14, further comprising:
a fourth determining module, configured to, if not contained, perform three-dimensional object pose estimation on the target image, and determine a matching relationship between a third two-dimensional feature contained in the target image and a three-dimensional space point, and a third 6DoF pose of the target object in the target image;
and the first adding module is used for adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
16. The apparatus of claim 14, wherein the judging module is specifically configured to:
And judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is smaller than a second threshold value.
17. The apparatus of claim 14, wherein the judging module is further configured to:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
18. The apparatus of claim 14, wherein the judging module is further configured to:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is larger than a fourth threshold value.
19. The apparatus of any one of claims 14-18, further comprising:
and a fifth determining module, configured to determine the first 6DoF pose as current tracking information of the target object if not.
20. The apparatus according to any of the claims 15-18, wherein the first determining module is specifically configured to:
and calculating a first 6DoF pose of the target object in the target image according to the matching relation between the first two-dimensional features and the three-dimensional space points and the weights of the reference two-dimensional features respectively matched with the first two-dimensional features.
21. The apparatus of claim 20, wherein the first determination module is further for:
and determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image.
22. The apparatus of claim 20, wherein the first determination module is further for:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first type of key two-dimensional feature according to the time interval between the reference image where each first type of key two-dimensional feature is located and the target image.
23. The apparatus of claim 20, wherein the first determination module is further for:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-class key two-dimensional feature according to the acquisition mode of each second-class key two-dimensional feature.
24. The apparatus of any one of claims 14-18, further comprising:
a sixth determining module, configured to determine a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
A seventh determining module, configured to determine, according to the second 6DoF pose, a second two-dimensional coordinate corresponding to each three-dimensional spatial point;
and an eighth determining module, configured to determine, according to a distance between the first two-dimensional coordinate and the second two-dimensional coordinate, a matching degree between the first 6DoF pose and the second 6DoF pose.
25. The apparatus of any one of claims 14-18, further comprising:
and the second adding module is used for adding the matching relation between the second two-dimensional feature and the three-dimensional space point contained in the target image and the second 6DoF pose to the scene map if the tracking deviation is smaller than a first threshold value.
26. The apparatus of any one of claims 14-18, further comprising:
the extraction module is used for extracting a fourth two-dimensional feature from the target image if the target image is a key frame image;
the construction module is used for constructing a three-dimensional space position corresponding to the fourth two-dimensional feature;
and the third adding module is used for adding all the features in the target image and the corresponding three-dimensional space positions into the scene map.
27. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
CN202010222181.8A 2020-03-26 2020-03-26 Three-dimensional object tracking method and device and electronic equipment Active CN111462179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010222181.8A CN111462179B (en) 2020-03-26 2020-03-26 Three-dimensional object tracking method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010222181.8A CN111462179B (en) 2020-03-26 2020-03-26 Three-dimensional object tracking method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111462179A CN111462179A (en) 2020-07-28
CN111462179B true CN111462179B (en) 2023-06-27

Family

ID=71685048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010222181.8A Active CN111462179B (en) 2020-03-26 2020-03-26 Three-dimensional object tracking method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111462179B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111968180B (en) * 2020-08-18 2023-12-05 维数谷智能科技(嘉兴)有限公司 High-precision object multi-degree-of-freedom attitude estimation method and system based on reference plane
CN112070068A (en) * 2020-10-13 2020-12-11 上海美迪索科电子科技有限公司 Map construction method, device, medium and equipment
CN115330943B (en) * 2022-08-11 2023-03-28 北京城市网邻信息技术有限公司 Multilayer space three-dimensional modeling method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10697754B2 (en) * 2017-12-07 2020-06-30 Faro Technologies, Inc. Three-dimensional coordinates of two-dimensional edge lines obtained with a tracker camera

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107980150A (en) * 2015-05-27 2018-05-01 帝国科技及医学学院 Modeling a three-dimensional space
CN107646126A (en) * 2015-07-16 2018-01-30 谷歌有限责任公司 Camera Attitude estimation for mobile device
CN105953796A (en) * 2016-05-23 2016-09-21 北京暴风魔镜科技有限公司 Stable motion tracking method and stable motion tracking device based on integration of simple camera and IMU (inertial measurement unit) of smart cellphone
AU2016273872A1 (en) * 2016-12-13 2018-06-28 Canon Kabushiki Kaisha Method, system and apparatus for determining a pose of an object
CN108062776A (en) * 2018-01-03 2018-05-22 百度在线网络技术(北京)有限公司 Camera Attitude Tracking method and apparatus
CN108986037A (en) * 2018-05-25 2018-12-11 重庆大学 Monocular vision odometer localization method and positioning system based on semi-direct method
CN109325444A (en) * 2018-09-19 2019-02-12 山东大学 A kind of texture-free three-dimension object Attitude Tracking method of monocular based on 3-D geometric model
CN109636854A (en) * 2018-12-18 2019-04-16 重庆邮电大学 A kind of augmented reality three-dimensional Tracing Registration method based on LINE-MOD template matching
CN110310326A (en) * 2019-06-28 2019-10-08 北京百度网讯科技有限公司 A kind of pose data processing method, device, terminal and computer readable storage medium
CN110349250A (en) * 2019-06-28 2019-10-18 浙江大学 A kind of three-dimensional rebuilding method of the indoor dynamic scene based on RGBD camera
CN110782492A (en) * 2019-10-08 2020-02-11 三星(中国)半导体有限公司 Pose tracking method and device
CN110853075A (en) * 2019-11-05 2020-02-28 北京理工大学 Visual tracking positioning method based on dense point cloud and synthetic view

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
D. Liu et al. Tracking Algorithms Aided by the Pose of Target. IEEE, 2019, vol. 7, 9627-9633. *
Juqing Yang et al. Online absolute pose compensation and steering control of industrial robot based on six degrees of freedom laser measurement. Optical Engineering, 2017, vol. 56, no. 06, 034111. *
产思贤. Research on intelligent target tracking algorithms based on vision systems. China Doctoral Dissertations Full-text Database (Information Science and Technology), 2019, no. 09, I138-26. *
迟金鑫. Research on robot semantic SLAM based on inter-frame object tracking. China Master's Theses Full-text Database (Information Science and Technology), 2019, no. 09, I140-322. *

Also Published As

Publication number Publication date
CN111462179A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111488824B (en) Motion prompting method, device, electronic equipment and storage medium
CN112270669B (en) Human body 3D key point detection method, model training method and related devices
CN111523468B (en) Human body key point identification method and device
CN111462179B (en) Three-dimensional object tracking method and device and electronic equipment
CN112150551B (en) Object pose acquisition method and device and electronic equipment
CN111709973B (en) Target tracking method, device, equipment and storage medium
CN111722245B (en) Positioning method, positioning device and electronic equipment
CN110659600B (en) Object detection method, device and equipment
CN111767853B (en) Lane line detection method and device
CN111462174B (en) Multi-target tracking method and device and electronic equipment
CN111539347B (en) Method and device for detecting target
CN111612852A (en) Method and apparatus for verifying camera parameters
CN112001248B (en) Active interaction method, device, electronic equipment and readable storage medium
CN111275827B (en) Edge-based augmented reality three-dimensional tracking registration method and device and electronic equipment
CN111858996B (en) Indoor positioning method and device, electronic equipment and storage medium
CN112241716B (en) Training sample generation method and device
CN111784834A (en) Point cloud map generation method and device and electronic equipment
CN111652103B (en) Indoor positioning method, device, equipment and storage medium
CN111597987B (en) Method, apparatus, device and storage medium for generating information
CN111324746B (en) Visual positioning method and device, electronic equipment and computer readable storage medium
CN111949816B (en) Positioning processing method, device, electronic equipment and storage medium
CN111191619A (en) Method, device and equipment for detecting virtual line segment of lane line and readable storage medium
CN111369571B (en) Three-dimensional object pose accuracy judging method and device and electronic equipment
CN111832611B (en) Training method, device, equipment and storage medium for animal identification model
CN111768485B (en) Method and device for marking key points of three-dimensional image, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant