CN111462179A - Three-dimensional object tracking method and device and electronic equipment

Info

Publication number
CN111462179A
CN111462179A (application number CN202010222181.8A; granted publication CN111462179B)
Authority
CN
China
Prior art keywords
dimensional
target image
dimensional feature
6DoF pose
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010222181.8A
Other languages
Chinese (zh)
Other versions
CN111462179B (en)
Inventor
刘赵梁
陈思利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010222181.8A priority Critical patent/CN111462179B/en
Publication of CN111462179A publication Critical patent/CN111462179A/en
Application granted granted Critical
Publication of CN111462179B publication Critical patent/CN111462179B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a three-dimensional object tracking method and device and an electronic device, belonging to the technical field of computer vision. The method comprises the following steps: when a scene map contains reference data for a target object, determining first two-dimensional features contained in the target image that match reference two-dimensional features in the scene map, and a first 6DoF (Six Degrees of Freedom) pose of the target object in the target image; if the target image is a key frame image, performing three-dimensional object pose estimation on the target image and determining a second 6DoF pose of the target object in the target image; and if the tracking deviation of the target image, determined according to the first 6DoF pose and the second 6DoF pose, is greater than or equal to a first threshold, replacing the existing reference data in the scene map with the matching relationship between second two-dimensional features and three-dimensional space points together with the second 6DoF pose. The three-dimensional object tracking method thereby improves the accuracy and generality of moving-object tracking.

Description

Three-dimensional object tracking method and device and electronic equipment
Technical Field
The application relates to the technical field of image processing, and in particular to the technical field of computer vision, and provides a three-dimensional object tracking method and device and an electronic device.
Background
Three-dimensional object tracking is a technique for obtaining the 6DoF (Six Degrees of Freedom) pose of an actual object in each frame according to the features of the actual object in the image.
In the related art, three-dimensional object tracking computes the 6DoF pose of an actual object from three-dimensional space points in a three-dimensional model and the corresponding two-dimensional coordinates in a two-dimensional image. In practice, however, the actual object often occupies only a small area of the scene image or has little texture, so that sufficient features cannot be extracted from the object itself, which makes tracking difficult to sustain.
Disclosure of Invention
The three-dimensional object tracking method, the three-dimensional object tracking device and the electronic device of the present application are intended to solve the problem in the related art that, when the area of an actual object in a scene image is small or its texture is not rich, the object itself cannot provide sufficient features and tracking becomes difficult.
An embodiment of one aspect of the present application provides a three-dimensional object tracking method, including: detecting whether a scene map contains a reference 6DoF pose of a target object and a matching relationship between reference two-dimensional features and three-dimensional space points; if so, determining, according to the matching relationship between first two-dimensional features contained in the target image and the reference two-dimensional features, the matching relationship between the first two-dimensional features and three-dimensional space points and a first 6DoF pose of the target object in the target image; judging whether the target image is a key frame image according to a preset rule; if so, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between second two-dimensional features contained in the target image and three-dimensional space points and a second 6DoF pose of the target object in the target image; determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose; and if the tracking deviation is greater than or equal to a first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
An embodiment of another aspect of the present application provides a three-dimensional object tracking apparatus, including: a detection module, configured to detect whether the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points; a first determining module, configured to, if so, determine the matching relationship between the first two-dimensional features contained in the target image and the three-dimensional space points and the first 6DoF pose of the target object in the target image according to the matching relationship between the first two-dimensional features and the reference two-dimensional features; a judging module, configured to judge whether the target image is a key frame image according to a preset rule; a second determining module, configured to, if so, perform three-dimensional object pose estimation on the target image, and determine the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points and the second 6DoF pose of the target object in the target image; a third determining module, configured to determine the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose; and a replacing module, configured to, if the tracking deviation is greater than or equal to the first threshold, replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
An embodiment of another aspect of the present application provides an electronic device, which includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the three-dimensional object tracking method as previously described.
A non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are configured to cause the computer to execute the three-dimensional object tracking method as described above.
Any of the above embodiments has the following advantages or benefits. The target object in the target image is tracked through a scene map that includes reference data for the target object, and a first 6DoF pose of the target object is determined. When the target image is a key frame image, pose estimation is performed on the target image by a three-dimensional object pose estimation algorithm to determine a second 6DoF pose of the target object. When the tracking deviation determined from the first 6DoF pose and the second 6DoF pose is large, the scene map is re-initialized using the three-dimensional object pose estimation result of the target image. Environment features are thereby fused into the scene map, and the scene map is re-initialized whenever the object tracking information changes significantly, which improves the accuracy and generality of moving-object tracking. Specifically, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between reference two-dimensional features and three-dimensional space points, the matching relationship between the first two-dimensional features contained in the target image that match the reference two-dimensional features and three-dimensional space points, and the first 6DoF pose of the target object in the target image, are determined. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features contained in the target image and three-dimensional space points, and the second 6DoF pose of the target object in the target image. The tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map. This solves the problem that, when the area of the actual object in the scene image is small or its texture is not rich, sufficient features cannot be extracted from the actual object itself and tracking becomes difficult, thereby achieving the technical effect of improving the accuracy and generality of moving-object tracking.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a three-dimensional object tracking method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of another three-dimensional object tracking method provided in the embodiments of the present application;
fig. 3 is a schematic flowchart of another three-dimensional object tracking method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a three-dimensional object tracking apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the present application provide a three-dimensional object tracking method aimed at the problem in the related art that, when the area of the actual object in the scene image is small or its texture is not rich, sufficient features cannot be extracted from the actual object and tracking becomes difficult.
It should be noted that the first two-dimensional feature, the second two-dimensional feature, and the third two-dimensional feature referred to in this application respectively refer to two-dimensional features of a target object in a target image determined in different manners under different situations; the first 6DoF pose, the second 6DoF pose and the third 6DoF pose respectively refer to the 6DoF poses of the target object in the target image determined in different modes under different conditions.
The following describes in detail a three-dimensional object tracking method, an apparatus, an electronic device, and a storage medium provided in the present application with reference to the drawings.
Fig. 1 is a schematic flowchart of a three-dimensional object tracking method according to an embodiment of the present disclosure.
As shown in fig. 1, the three-dimensional object tracking method includes the following steps:
Step 101, detecting whether the scene map contains a reference 6DoF pose of a target object and a matching relationship between reference two-dimensional features and three-dimensional space points.
The target object is a three-dimensional object which needs to track the 6DoF pose currently.
The reference 6DoF pose of the target object refers to one or more pieces of 6DoF pose information of the target object generated by tracking the target object in the reference images acquired before the current time.
The matching relationship between the reference two-dimensional features of the target object and the three-dimensional space points refers to the matching relationship, generated by tracking the target object in the reference images acquired before the current time, between the two-dimensional features of the target object in the reference images and the three-dimensional space points of the three-dimensional model of the target object.
When a three-dimensional object is tracked by using a computer vision technique, if the texture of the target object in the acquired image is weak or the occupied area of the target object in the image is too small, sufficient information of the target object cannot be extracted from the image, so that the generated tracking information of the target object is inaccurate or the target object cannot be tracked. Therefore, in the embodiment of the present application, a scene map may be generated according to the tracking result of the target object in each frame of image, and the matching relationship between the two-dimensional feature of the background information in each frame of image and the three-dimensional space point is incorporated into the scene map, so that the tracking of the target object may be assisted according to the scene map, and the accuracy of the tracking of the target object may be improved.
In the embodiment of the application, when a target image which needs to be tracked by a target object is acquired, whether a scene map has been initialized or not can be detected, that is, whether data in the scene map is empty or not and whether prior knowledge such as a reference 6DoF pose of the target object, a matching relation between a reference two-dimensional feature and a three-dimensional space point is included or not can be detected. If the target object is included, the target object can be tracked by using the scene map; if not, the pose of the target object in the target image can be estimated through a three-dimensional object pose estimation algorithm so as to initialize the scene map. That is, in a possible implementation form of the embodiment of the present application, after the step 101, the method may further include:
and if not, estimating the pose of the three-dimensional object on the target image, and determining that the target image comprises the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose of the target object in the target image.
And adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose to the scene map.
The third two-dimensional features refer to the two-dimensional coordinates of the target object in the target image, determined by the three-dimensional object pose estimation algorithm when the scene map is empty.
The third 6DoF pose refers to the 6DoF pose information of the target object in the target image, determined by the three-dimensional object pose estimation algorithm when the scene map is empty.
In the embodiment of the application, if the scene map does not include the reference 6DoF pose of the target object, the matching relationship between the reference two-dimensional feature and the three-dimensional space point, that is, the scene map is empty, the three-dimensional space point matching each third two-dimensional feature of the target object in the target image and the third 6DoF pose of the target object may be determined according to the three-dimensional grid model corresponding to the target object by using a three-dimensional object pose estimation algorithm, and then the matching relationship between the three-dimensional space point and the third two-dimensional feature and the third 6DoF pose are added to the scene map, so as to complete initialization of the scene map.
As a possible implementation manner, in order to avoid the situation that the third two-dimensional features of the target object in the target image determined by the three-dimensional object pose estimation algorithm are too few or unreasonably located in the target image, after the third 6DoF pose of the target object is determined by the three-dimensional object pose estimation algorithm, the two-dimensional coordinates corresponding to the target object in the target image may be determined as the third two-dimensional features of the target object by using a feature point extraction algorithm such as the ORB (Oriented FAST and Rotated BRIEF) algorithm or the BRISK (Binary Robust Invariant Scalable Keypoints) algorithm; the three-dimensional space point matching each third two-dimensional feature is then determined according to the third two-dimensional features determined by the feature extraction algorithm and the third 6DoF pose of the target object, and the matching relationship between the newly determined third two-dimensional features and the three-dimensional space points, together with the third 6DoF pose of the target object, is added to the scene map to complete the initialization of the scene map.
For example, the feature point extraction algorithm may be the ORB algorithm, the BRISK algorithm, or the like, and the three-dimensional object pose estimation algorithm may be PWP3D (a method for real-time segmentation and tracking of 3D objects based on color segmentation), a moving-edge method, a method performing real-time monocular pose estimation of 3D objects using temporally consistent local color histograms, or the like.
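As a rough sketch of this initialization flow, assuming OpenCV and NumPy: the call `estimate_object_pose` stands in for a pose estimator such as PWP3D and is hypothetical, and pairing each extracted feature with the nearest projected model point is an assumed association strategy, since the embodiment leaves the exact pairing method open.

```python
import cv2
import numpy as np

def initialize_scene_map(image, mesh_points, K):
    # Third 6DoF pose from a three-dimensional object pose estimation
    # algorithm (e.g. a PWP3D-style method); hypothetical helper.
    R, t = estimate_object_pose(image, mesh_points, K)

    # Re-extract third two-dimensional features with ORB so the feature
    # set is dense enough, as the embodiment suggests.
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(image, None)

    # Project all model points once under the third 6DoF pose.
    proj = (K @ (R @ mesh_points.T + t.reshape(3, 1))).T
    proj = proj[:, :2] / proj[:, 2:3]  # perspective division

    scene_map = {"pose": (R, t), "features": [], "points3d": []}
    for kp, desc in zip(keypoints, descriptors):
        # Associate the 2D feature with the nearest projected model point.
        d = np.linalg.norm(proj - np.asarray(kp.pt), axis=1)
        i = int(np.argmin(d))
        if d[i] < 2.0:  # assumed pixel tolerance
            scene_map["features"].append((kp.pt, desc))
            scene_map["points3d"].append(mesh_points[i])
    return scene_map
```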
Step 102, if so, determining the matching relationship between first two-dimensional features contained in the target image and three-dimensional space points and a first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features and the reference two-dimensional features.
The first two-dimensional feature is a two-dimensional coordinate currently corresponding to a target object extracted from a target image by using a feature point extraction algorithm.
The first 6DoF pose is 6DoF pose information of the target object in the target image determined by the scene map.
In the embodiments of the present application, if it is detected that the scene map contains data on the target object, such as the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points, it can be determined that the scene map has been initialized, so that the target object in the target image can be tracked using the prior knowledge about the target object in the scene map.
Specifically, it can be judged, for each first two-dimensional feature contained in the target image, whether the scene map contains a reference two-dimensional feature matching it, so that the first two-dimensional features matched with three-dimensional space points in the scene map can be determined; the first 6DoF pose of the target object in the target image can then be determined from the matching relationship between these first two-dimensional features and the three-dimensional space points by using a PnP (Perspective-n-Point) algorithm.
For example, the target image includes 100 first two-dimensional features, wherein 80 first two-dimensional features match with reference two-dimensional features in the scene map, so that a three-dimensional space point corresponding to the reference two-dimensional features matching with the 80 first two-dimensional features can be determined as the three-dimensional space point matching with the 80 first two-dimensional features.
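As a minimal sketch of this step, assuming OpenCV: given the first two-dimensional features matched to three-dimensional space points through the scene map, the first 6DoF pose can be solved as a PnP problem (using the RANSAC variant here is an assumption; the text only requires a PnP algorithm).

```python
import cv2
import numpy as np

def first_6dof_pose(pts_2d, pts_3d, K):
    # pts_2d: Nx2 first two-dimensional features matched via the scene map
    # pts_3d: Nx3 matching three-dimensional space points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts_3d, dtype=np.float32),
        np.asarray(pts_2d, dtype=np.float32),
        K, distCoeffs=None)
    if not ok:
        return None  # not enough consistent matches
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix from Rodrigues vector
    return R, tvec
```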
Step 103, judging whether the target image is a key frame image according to a preset rule.
A key frame image refers to a target image in which, compared with the previous frame image, the illumination conditions or the position of the contained target object may have changed significantly.
In the embodiment of the application, after the first 6DoF pose of the target object in the target image is determined, whether the target image is a key frame image with a large position change of the target object or not can be continuously judged, so that when the target image is determined to be the key frame image, the 6DoF pose of the target object in the target image is determined again.
In the embodiment of the present application, whether the target image is a key frame image may be determined according to the following preset rules.
Rule one
And judging whether the number of first two-dimensional features contained in the target image that match three-dimensional space points is less than a second threshold.
It can be understood that, since the matching relationship between the three-dimensional space points in the scene map and the reference two-dimensional features is determined according to the position of the target object in one or more frames of images preceding the target image, the smaller the number of first two-dimensional features in the target image that match three-dimensional space points, the larger the change in the position of the target object relative to those preceding frames, or the larger the change in the illumination conditions when the target image was acquired, and the more likely it is that the first 6DoF pose of the target object determined from the scene map is inaccurate. Therefore, in the embodiments of the present application, if the number of first two-dimensional features matching three-dimensional space points contained in the target image is smaller than the second threshold, it may be determined that the position of the target object in the target image or the illumination conditions at acquisition time have changed significantly, so that the target image may be determined to be a key frame image.
Optionally, whether the target image is a key frame image may be determined according to a ratio of the number of the first two-dimensional features, which are included in the target image and are matched with the three-dimensional space point, to the number of all the first two-dimensional features in the target image. Specifically, when the ratio of the number of first two-dimensional features, which are included in the target image and are matched with the three-dimensional spatial point, to all the number of first two-dimensional features in the target image is smaller than a scale threshold, the target image is determined to be the key frame image.
Rule two
And judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
The acquisition position of the target image refers to a position where an image acquisition device acquiring the target image acquires the target image.
In the embodiments of the present application, the greater the distance between the acquisition position of the target image and that of the adjacent previous key frame, the greater the possible change in the position of the target object relative to that key frame; therefore, if the distance between the two acquisition positions is greater than the third threshold, the target image may be determined to be a key frame image.
Rule three
And judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is greater than a fourth threshold value or not.
The acquisition time of the target image refers to the time when the image acquisition equipment acquiring the target image acquires the target image.
In the embodiments of the present application, the greater the time interval between the acquisition time of the target image and that of the adjacent previous key frame, the greater the possible change in the position of the target object relative to that key frame; therefore, if the time interval between the two acquisition times is greater than the fourth threshold, the target image is determined to be a key frame image.
It should be noted that the preset rules for determining whether the target image is a key frame image may include, but are not limited to, the cases listed above. In actual use, suitable rules and the specific values of the related thresholds can be chosen according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application.
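Combining the three rules, a hedged sketch of the key-frame decision could look as follows; the threshold values and the choice to trigger on any single rule are assumptions, since the text leaves both to the application scenario.

```python
import numpy as np

def is_key_frame(n_matched, n_total, cam_pos, prev_kf_pos, t, prev_kf_t,
                 match_thresh=30, ratio_thresh=0.3,
                 dist_thresh=0.1, time_thresh=1.0):
    # Rule 1: too few first 2D features matched to 3D points, or a low ratio.
    rule1 = (n_matched < match_thresh
             or n_matched / max(n_total, 1) < ratio_thresh)
    # Rule 2: acquisition position far from the previous key frame's.
    rule2 = np.linalg.norm(np.asarray(cam_pos)
                           - np.asarray(prev_kf_pos)) > dist_thresh
    # Rule 3: acquisition time far from the previous key frame's.
    rule3 = (t - prev_kf_t) > time_thresh
    return rule1 or rule2 or rule3
```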
Further, if it is determined that the target image is not the key frame image, it may be determined that the illumination condition does not significantly change or the position of the target object in the target image does not significantly change when the target image is acquired. That is, in a possible implementation form of the embodiment of the present application, after the step 103, the method may further include:
and if not, determining the first 6DoF pose as the current tracking information of the target object.
In the embodiment of the present application, if it is determined that the target image is not the key frame image, it may be determined that the position of the target object in the target image does not change greatly, and thus it may be determined that the first 6DoF pose of the target object determined according to the scene map is more accurate, and therefore, the first 6DoF pose of the target object may be determined as the current tracking information of the target object, so as to complete the tracking process of the target image without performing subsequent steps of the embodiment of the present application.
Step 104, if so, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between second two-dimensional features contained in the target image and three-dimensional space points and a second 6DoF pose of the target object in the target image.
The second two-dimensional features refer to the two-dimensional coordinates of the target object in the target image, determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image.
The second 6DoF pose refers to the 6DoF pose information of the target object in the target image, determined by the three-dimensional object pose estimation algorithm when the target image is a key frame image.
In this embodiment of the present application, if it is determined that the target image is the keyframe image, a three-dimensional object pose estimation algorithm may be further used to perform three-dimensional object pose estimation on the target image according to the three-dimensional grid model of the target object, so as to determine a matching relationship between each corresponding second two-dimensional feature of the target object in the target image and the three-dimensional space point, and a second 6DoF pose of the target object in the target image, and use the second 6DoF pose as a basis for measuring an actual pose of the target object.
Step 105, determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
In the embodiment of the application, the second 6DoF pose of the target object can be used for representing the real pose of the target object, so that the tracking deviation of the target image can be determined according to the matching degree of the first 6DoF pose and the second 6DoF pose, and whether the first 6DoF pose of the target object is accurate or not can be judged according to the tracking deviation of the target image.
As a possible implementation manner, two-dimensional coordinates corresponding to the target object may be respectively determined according to the first 6DoF pose and the second 6DoF pose of the target object, and the matching degree between the first 6DoF pose and the second 6DoF pose may be determined according to the distance between the two-dimensional coordinates. That is, in a possible implementation form of the embodiment of the present application, before the step 105, the method may further include:
determining a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
determining a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose;
and determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
In the embodiment of the application, a first two-dimensional coordinate corresponding to each three-dimensional space point can be determined according to a first 6DoF pose, the coordinate of each three-dimensional space point in the three-dimensional grid model of the target object and the internal parameters of the image acquisition equipment for acquiring the target image, that is, the first 6DoF pose is used for projecting the three-dimensional grid model of the target object to the plane where the target image is located, and the generated two-dimensional coordinate corresponds to the projection; and determining a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose, the coordinates of each three-dimensional space point in the three-dimensional grid model of the target object and the internal parameters of the image acquisition equipment for acquiring the target image, namely projecting the three-dimensional grid model of the target object to the plane where the target image is located by using the second 6DoF pose, and generating the two-dimensional coordinate corresponding to the projection.
Specifically, a 6DoF pose may be represented as a 3 × 4 matrix: denote the first 6DoF pose as P1 and the second 6DoF pose as P2. The coordinates of a three-dimensional space point may be represented as a 3 × 1 column vector X, and the internal parameters of the image acquisition device as a 3 × 3 matrix K. The first two-dimensional coordinate corresponding to each three-dimensional space point is then x1 = K P1 X, and the second two-dimensional coordinate corresponding to each three-dimensional space point is x2 = K P2 X. The Euclidean distance between the first two-dimensional coordinate and the second two-dimensional coordinate corresponding to each three-dimensional space point is then determined, and the matching degree of the first 6DoF pose and the second 6DoF pose is determined according to these distances.
Optionally, the larger the euclidean distance between the first two-dimensional coordinate and the second two-dimensional coordinate corresponding to the three-dimensional space point is, the larger the difference between the first two-dimensional coordinate and the second two-dimensional coordinate is, that is, the larger the difference between the first 6DoF pose and the second 6DoF pose is, so that an average value of the euclidean distances between the first two-dimensional coordinate and the second two-dimensional coordinate corresponding to each three-dimensional space point can be determined, and then the reciprocal of the average value is determined as the matching degree of the first 6DoF pose and the second 6DoF pose.
In the embodiment of the application, the greater the matching degree of the first 6DoF pose and the second 6DoF pose is, the smaller the tracking deviation of the target image is; the smaller the matching degree of the first 6DoF pose and the second 6DoF pose is, the larger the tracking deviation of the target image is. That is to say, the tracking deviation of the target image is in a positive correlation with the euclidean distance between the first two-dimensional coordinate and the second two-dimensional coordinate corresponding to the three-dimensional space point, so that the average value of the euclidean distances between the first two-dimensional coordinate and the second two-dimensional coordinate corresponding to each three-dimensional space point can be determined as the tracking deviation of the target image.
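A small sketch of this deviation computation, assuming NumPy; writing the projection as x = K P X in homogeneous coordinates followed by perspective division is an assumption consistent with the formulas above.

```python
import numpy as np

def tracking_deviation(points_3d, P1, P2, K):
    # points_3d: Nx3; P1, P2: 3x4 pose matrices; K: 3x3 intrinsics
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # homogeneous
    x1 = (K @ P1 @ X.T).T  # x1 = K P1 X
    x2 = (K @ P2 @ X.T).T  # x2 = K P2 X
    x1 = x1[:, :2] / x1[:, 2:3]  # perspective division
    x2 = x2[:, :2] / x2[:, 2:3]
    # Mean Euclidean distance; its reciprocal is the matching degree.
    return float(np.mean(np.linalg.norm(x1 - x2, axis=1)))
```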
Step 106, if the tracking deviation is greater than or equal to the first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
In the embodiments of the present application, if the tracking deviation of the target image is greater than or equal to the first threshold, it may be determined that the pose of the target object in the target image has changed significantly, that is, the first 6DoF pose determined from the scene map is inaccurate. The second 6DoF pose of the target object may therefore be determined as the current tracking information of the target object, and the scene map is cleared so that the target image is used as an initial frame to re-initialize the scene map; that is, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map.
According to the technical solution of this embodiment, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image that match the reference two-dimensional features and the first 6DoF pose of the target object in the target image are determined; when the target image is a key frame image, three-dimensional object pose estimation is performed on the target image to determine the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose of the target object in the target image; the tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points. In this way, the target object in the target image is tracked through a scene map that includes both target object reference data and environment features, and the scene map is re-initialized when the object tracking information changes significantly, which improves the accuracy and generality of moving-object tracking.
In a possible implementation form of the method, if the tracking deviation of the key frame image is small, the three-dimensional object pose estimation result and the feature extraction result of the key frame image can be added to the scene map to enrich the reference data in the scene map.
The following further describes the three-dimensional object tracking method provided in the embodiment of the present application with reference to fig. 2.
Fig. 2 is a schematic flowchart of another three-dimensional object tracking method according to an embodiment of the present disclosure.
As shown in fig. 2, the three-dimensional object tracking method includes the following steps:
Step 201, detecting whether the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points.
Step 202, if so, determining the matching relationship between the first two-dimensional features contained in the target image and the three-dimensional space points and the first 6DoF pose of the target object in the target image, according to the matching relationship between the first two-dimensional features and the reference two-dimensional features.
Step 203, judging whether the target image is a key frame image according to a preset rule, if so, executing step 204 and step 205; otherwise, step 210 is performed.
For the detailed implementation process and principle of steps 201 to 203, reference may be made to the detailed description of the above embodiments, which is not repeated here.
Step 204, extracting a fourth two-dimensional feature in the target image, and constructing a three-dimensional space position corresponding to the fourth two-dimensional feature; and adding the fourth two-dimensional feature and the corresponding three-dimensional space position into the scene map.
In the embodiment of the application, if the target image is determined to be the key frame image, the scene map can be enriched by using the features in the target image, so as to further improve the accuracy of tracking the target object.
As a possible implementation form, feature extraction may be performed on the target image by using a feature point extraction algorithm to determine feature points included in the target image, where the fourth two-dimensional feature of the target image is a two-dimensional coordinate of each extracted feature point in the target image. Then, projecting the three-dimensional grid model corresponding to the target object to a plane where the target image is located by using the first 6DoF pose corresponding to the target object to generate a projected outline of the target object in the target image, and determining each fourth two-dimensional feature within the range of the projected outline as a corresponding two-dimensional coordinate of the target object in the target image; and finally, determining a three-dimensional space point corresponding to each fourth two-dimensional feature within the projection outline range by using the first 6DoF pose corresponding to the target object, each fourth two-dimensional feature within the projection outline range and the three-dimensional grid model corresponding to the target object, namely constructing a three-dimensional space position corresponding to each fourth two-dimensional feature corresponding to the target object.
Correspondingly, each fourth two-dimensional feature located outside the projection contour range can be determined as the corresponding two-dimensional coordinate of the background environment in the target image. For each fourth two-dimensional feature corresponding to the background environment, a feature (e.g., ORB feature) at each fourth two-dimensional feature corresponding to the background environment may be matched with a feature (also, ORB feature) corresponding to each position in the previous frame image. If the feature of a fourth two-dimensional feature corresponding to the background environment matches the feature of a certain position in the previous frame image, it can be considered that the fourth two-dimensional feature corresponding to the background environment and the position in the previous frame image matching therewith have the same visual feature, that is, represent the same point in the scene, so that the three-dimensional space position of the fourth two-dimensional feature corresponding to the background environment can be constructed by triangulation according to the acquisition position of the target image and the acquisition position of the previous frame image, thereby constructing the three-dimensional space position of each fourth two-dimensional feature corresponding to the background environment.
In the embodiment of the application, after the three-dimensional space position of each fourth two-dimensional feature in the target image is constructed, each fourth two-dimensional feature and the corresponding three-dimensional space position in each target image may be added to the scene map to enrich the reference data in the scene map.
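For the fourth two-dimensional features that fall outside the projected contour, the triangulation against the previous frame can be sketched as follows, assuming OpenCV and known 3 × 4 camera poses at the two acquisition positions; the descriptor matching between the two frames is assumed to have been done already, as described above.

```python
import cv2
import numpy as np

def triangulate_background(pts_cur, pts_prev, P_cur, P_prev, K):
    # pts_cur / pts_prev: Nx2 matched fourth 2D features in the target
    # image and the previous frame; P_cur / P_prev: 3x4 camera poses.
    pts4d = cv2.triangulatePoints(
        K @ P_prev, K @ P_cur,
        np.asarray(pts_prev, dtype=np.float32).T,
        np.asarray(pts_cur, dtype=np.float32).T)
    return (pts4d[:3] / pts4d[3]).T  # de-homogenize to Nx3 positions
```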
Step 205, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points and the second 6DoF pose of the target object in the target image.
Step 206, determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
Step 207, determining whether the tracking deviation of the target image is greater than or equal to a first threshold, if so, executing step 208; otherwise, step 209 is performed.
Step 208, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
For the detailed implementation process and principle of steps 205 to 208, reference may be made to the detailed description of the above embodiments, which is not repeated here.
Step 209, adding the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points, together with the second 6DoF pose, to the scene map.
In the embodiment of the application, if the tracking deviation of the target image is smaller than the first threshold, it can be determined that the position of the target object in the target image does not change greatly, and the first 6DoF pose of the target object determined according to the scene map is also more accurate, so that the first 6DoF pose or the second 6DoF pose can be determined as the current tracking information of the target object; and adding the result of the three-dimensional object pose estimation of the target image into the scene map so as to enrich the reference data in the scene map without initializing the scene map.
Step 210, determining the first 6DoF pose as the current tracking information of the target object.
For the detailed implementation process and principle of step 210, reference may be made to the detailed description of the above embodiments, which is not repeated here.
According to the technical solution of this embodiment, the target object in the target image is tracked through a scene map that includes target object reference data, and the first 6DoF pose of the target object is determined; when the target image is a key frame image, pose estimation is performed on the target image by a three-dimensional object pose estimation algorithm to determine the second 6DoF pose of the target object; when the tracking deviation determined according to the first 6DoF pose and the second 6DoF pose is greater than or equal to the first threshold, the scene map is re-initialized with the three-dimensional object pose estimation result of the target image, and when the tracking deviation is smaller than the first threshold, the reference information in the scene map is enriched with the three-dimensional object pose estimation result of the key frame and the constructed three-dimensional space positions of the fourth two-dimensional features in the key frame image. In this way, environment features are fused into the scene map and the scene map is updated with the information of key frame images, which further improves the accuracy and generality of moving-object tracking.
In a possible implementation form of the method, when the first 6DoF pose of the target object in the target image is determined by using the scene map, different weights can be given to different two-dimensional features, so that the tracking accuracy of the moving object is further improved.
The following further describes the three-dimensional object tracking method provided in the embodiment of the present application with reference to fig. 3.
Fig. 3 is a schematic flowchart of another three-dimensional object tracking method according to an embodiment of the present disclosure.
As shown in fig. 3, the three-dimensional object tracking method includes the following steps:
Step 301, detecting whether the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points.
For the detailed implementation process and principle of step 301, reference may be made to the detailed description of the above embodiments, which is not repeated here.
Step 302, if so, determining the matching relationship between the first two-dimensional features contained in the target image and the three-dimensional space points according to the matching relationship between the first two-dimensional features and the reference two-dimensional features, and calculating the first 6DoF pose of the target object in the target image according to the matching relationship between each first two-dimensional feature and its three-dimensional space point and the weight of the reference two-dimensional feature matched with each first two-dimensional feature.
In the embodiments of the present application, after the matching relationship between each first two-dimensional feature contained in the target image and the three-dimensional space points is determined, the weight of the reference two-dimensional feature matched with each first two-dimensional feature is taken into account when determining the first 6DoF pose of the target object in the target image, so that the accuracy of the determined first 6DoF pose is further improved.
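One way to fold the weights into the pose computation is a weighted reprojection-error minimization. The sketch below, using OpenCV and SciPy, is only an assumed realization, since the text states that the weights are used but not how; `rvec0` and `tvec0` are an initial pose guess, e.g. from the previous frame.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def weighted_pnp(pts_2d, pts_3d, weights, K, rvec0, tvec0):
    pts_2d = np.asarray(pts_2d, dtype=np.float64)
    pts_3d = np.asarray(pts_3d, dtype=np.float64)
    w = np.sqrt(np.asarray(weights, dtype=np.float64))

    def residuals(params):
        rvec, tvec = params[:3], params[3:]
        proj, _ = cv2.projectPoints(pts_3d, rvec, tvec, K, None)
        # Weighted reprojection residuals: heavier features pull harder.
        return ((proj.reshape(-1, 2) - pts_2d) * w[:, None]).ravel()

    sol = least_squares(residuals, np.hstack([rvec0, tvec0]))
    return sol.x[:3], sol.x[3:]  # refined rvec, tvec
```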
In the embodiments of the present application, the weight of each reference two-dimensional feature in the scene map may be determined in the following ways.
Mode one
And determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image where the reference two-dimensional feature is located.
In the embodiments of the present application, reference data in the scene map may be acquired in three ways. Acquisition mode one: when the target image is a key frame image and the tracking deviation is smaller than the first threshold, performing feature extraction on the target image to establish the matching relationship between the fourth two-dimensional features and the three-dimensional space positions. Acquisition mode two: when the target image is a key frame image, using the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose determined by performing three-dimensional object pose estimation on the target image. Acquisition mode three: when the scene map is initialized, that is, when the target image is an initial frame, performing three-dimensional object pose estimation on the target image to determine the matching relationship between the third two-dimensional features and the three-dimensional space points and the third 6DoF pose.
The reference two-dimensional features obtained in the first obtaining mode may be coordinates corresponding to a target object in a reference image or coordinates corresponding to a background environment in the reference image, so that when the matching relationship between each reference two-dimensional feature and a three-dimensional space point is determined by performing feature extraction on the key frame image so as to add the key frame image to a scene map, the weight of each reference two-dimensional feature can be determined according to the position of each reference two-dimensional feature in the reference image where the reference two-dimensional feature is located.
Optionally, since the number of two-dimensional coordinates corresponding to the background environment in the target image may be much larger than the number of two-dimensional coordinates corresponding to the target object, a larger weight may be given to each reference two-dimensional feature corresponding to the target object in the target image, so as to balance the role of each reference two-dimensional feature in determining the first 6DoF pose of the target object.
That is, if the position of the reference two-dimensional feature in the reference image where the reference two-dimensional feature is located is the background environment in the reference image, the weight of the reference two-dimensional feature may be determined to be a smaller value; if the position of the reference two-dimensional feature in the reference image is the target object in the reference image, the weight of the reference two-dimensional feature can be determined as a larger value.
Optionally, the weight of each reference two-dimensional feature may be determined according to a ratio of the number of reference two-dimensional features corresponding to the background environment in the reference image to the number of reference two-dimensional features corresponding to the target object. Specifically, the smaller the area of the target object in the image is, the less the information of the target object itself can be extracted from the image, so that the smaller the role of the feature of the target object itself in determining the pose of the target object is. Therefore, for a reference two-dimensional feature obtained from one reference image, the ratio of the number of reference two-dimensional features corresponding to the background environment in the reference image to the number of reference two-dimensional features corresponding to the target object may be determined as the weight of the reference two-dimensional feature corresponding to the target object; and determining the ratio of the reference two-dimensional feature quantity corresponding to the target object to the reference two-dimensional feature quantity corresponding to the background environment as the weight of the reference two-dimensional feature corresponding to the background environment.
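A minimal sketch of this ratio-based weighting; guarding against an empty feature group with max(..., 1) is an assumption.

```python
def ratio_weights(n_object, n_background):
    # Features on the target object are up-weighted by the background/object
    # count ratio, and background features are down-weighted symmetrically,
    # so both groups contribute comparably to the first 6DoF pose.
    w_object = n_background / max(n_object, 1)
    w_background = n_object / max(n_background, 1)
    return w_object, w_background
```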
Mode two
Determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first-class key two-dimensional feature according to the time interval between the reference image and the target image in which each first-class key two-dimensional feature is positioned.
The first-class key two-dimensional features refer to the reference two-dimensional features acquired by acquisition modes two and three, namely the second two-dimensional features and the third two-dimensional features. It should be noted that, because the second and third two-dimensional features are determined by a three-dimensional object pose estimation algorithm, the first-class key two-dimensional features are all two-dimensional features of the target object in the reference images where they are located.
In this embodiment of the present application, since the matching relationship between the first-type key two-dimensional features in the scene map and the three-dimensional space point may be obtained from different reference images, and the larger the time interval between the reference image and the target image is, the larger the difference between the position of the target object in the reference image and the position of the target object in the target image may be, so that the weight of each first-type key two-dimensional feature may be determined according to the time interval between the reference image and the target image where each first-type key two-dimensional feature is located.
Specifically, the larger the time interval between the reference image where a first-class key two-dimensional feature is located and the target image, the smaller the weight of the first-class key two-dimensional feature may be determined to be; the smaller the time interval, the larger the weight may be determined to be.
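The application only prescribes this monotonic relationship, not a concrete formula; one possible sketch, assuming an exponential decay with a placeholder time constant tau, is:

    import math

    def time_interval_weight(t_reference, t_target, tau=1.0):
        """Mode-two weighting sketch: the weight of a first-class key
        two-dimensional feature decays as the time interval between its
        reference image and the current target image grows."""
        interval = abs(t_target - t_reference)  # seconds between the frames
        return math.exp(-interval / tau)        # larger interval, smaller weight

    # A feature from a reference frame captured 0.2 s before the target image
    # keeps weight ~0.82, while one captured 2.2 s before drops to ~0.11.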
Mode three
Determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-type key two-dimensional feature according to the acquisition mode of each second-type key two-dimensional feature.
The second type of key two-dimensional features refer to the two-dimensional features corresponding to the target object in the reference images where they are located.
In the embodiment of the present application, the second type of key two-dimensional features may be acquired by the first acquisition mode, or by the second and third acquisition modes. Those acquired by the second and third acquisition modes are determined by the three-dimensional object pose estimation algorithm, and can therefore be regarded as real two-dimensional features corresponding to the target object in the image. Accordingly, if a second-type key two-dimensional feature is acquired by the first acquisition mode, its weight may be determined as a smaller value; if it is acquired by the second or third acquisition mode, its weight may be determined as a larger value.
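A minimal sketch of this rule, assuming the acquisition mode is recorded per feature and using placeholder weight values (the application only requires that features confirmed by the pose estimation algorithm weigh more):

    from enum import Enum

    class AcquisitionMode(Enum):
        FEATURE_MATCHING = 1   # first acquisition mode
        POSE_ESTIMATION = 2    # second and third acquisition modes

    def acquisition_mode_weight(mode, low=0.5, high=2.0):
        """Mode-three weighting sketch: second-type key two-dimensional
        features determined by the three-dimensional object pose estimation
        algorithm are treated as real object features and weighted higher."""
        return high if mode is AcquisitionMode.POSE_ESTIMATION else low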
It should be noted that the manner of determining the weight of the reference two-dimensional features may include, but is not limited to, the cases listed above. In actual use, the manner of determining the weight of the reference two-dimensional features may be selected according to actual needs, which is not limited in the embodiment of the present application.
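For orientation, the following sketch shows one conventional way such per-feature weights can enter the computation of the first 6DoF pose, namely by minimizing a weighted reprojection error over the matched 2D-3D pairs; the pinhole projection, the axis-angle parameterization, and the optimizer are assumptions of this illustration rather than details recited by the application:

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def project(points_3d, rvec, tvec, K):
        """Project 3D points given an axis-angle rotation rvec, a translation
        tvec, and a 3x3 intrinsic matrix K."""
        R = Rotation.from_rotvec(rvec).as_matrix()
        cam = points_3d @ R.T + tvec           # points in camera coordinates
        uv = cam[:, :2] / cam[:, 2:3]          # perspective division
        return uv @ K[:2, :2].T + K[:2, 2]     # apply focal lengths and center

    def weighted_6dof_pose(points_3d, points_2d, weights, K, pose0):
        """Estimate the first 6DoF pose [rx, ry, rz, tx, ty, tz] by weighted
        least squares: residuals of heavily weighted reference features pull
        the solution harder than lightly weighted ones."""
        sqrt_w = np.sqrt(np.asarray(weights, dtype=float))

        def residuals(pose):
            uv = project(points_3d, pose[:3], pose[3:], K)
            return ((uv - points_2d) * sqrt_w[:, None]).ravel()

        return least_squares(residuals, pose0).x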
Step 303, judging whether the target image is a key frame image according to a preset rule.
Step 304, if so, performing three-dimensional object pose estimation on the target image, and determining the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points and the second 6DoF pose of the target object in the target image.
Step 305, determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose.
Step 306, if the tracking deviation is greater than or equal to the first threshold, replacing the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map with the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose.
The detailed implementation process and principle of steps 303 to 306 may refer to the detailed description of the above embodiments, and are not repeated here.
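As a hedged illustration of step 305, consistent with the coordinate-distance criterion recited later in this application, the tracking deviation can be sketched as the mean distance between the projections of the matched three-dimensional space points under the two poses; this reuses the project helper from the sketch above, and the mean aggregation is an assumption of this example:

    import numpy as np

    def tracking_deviation(points_3d, first_pose, second_pose, K):
        """Project the matched three-dimensional space points with the first
        and second 6DoF poses and measure how far apart the projections fall;
        a large mean distance indicates the frame-to-frame tracking drifted."""
        uv1 = project(points_3d, first_pose[:3], first_pose[3:], K)
        uv2 = project(points_3d, second_pose[:3], second_pose[3:], K)
        return float(np.mean(np.linalg.norm(uv1 - uv2, axis=1)))

    # If tracking_deviation(...) >= first_threshold, the reference data in the
    # scene map is replaced by the second two-dimensional features, their
    # matching relationship with the three-dimensional space points, and the
    # second 6DoF pose.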
According to the technical scheme of the embodiment of the application, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image and matched with the reference two-dimensional features are determined, and the first 6DoF pose of the target object in the target image is calculated according to the matching relationship between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature matched with each first two-dimensional feature. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image, and the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points and the second 6DoF pose of the target object in the target image are determined. The tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map. Therefore, the target object in the target image is tracked through a scene map that includes both the reference data of the target object and the environmental features, and different weights are given to different reference two-dimensional features, thereby further improving the accuracy and universality of moving object tracking.
In order to implement the above embodiments, the present application also provides a three-dimensional object tracking device.
Fig. 4 is a schematic structural diagram of a three-dimensional object tracking apparatus according to an embodiment of the present application.
As shown in fig. 4, the three-dimensional object tracking apparatus 40 includes:
the detecting module 41 is configured to detect whether a scene map includes a matching relationship between a reference 6DoF pose of the target object, a reference two-dimensional feature, and a three-dimensional space point.
A first determining module 42, configured to, if the scene map contains the matching relationship, determine a matching relationship between a first two-dimensional feature contained in the target image and a three-dimensional space point and a first 6DoF pose of the target object in the target image according to the matching relationship between the first two-dimensional feature contained in the target image and the reference two-dimensional feature;
a judging module 43, configured to judge whether the target image is a key frame image according to a preset rule;
a second determining module 44, configured to perform three-dimensional object pose estimation on the target image if the target image is a key frame image, and determine a matching relationship between a second two-dimensional feature contained in the target image and a three-dimensional space point and a second 6DoF pose of the target object in the target image;
a third determining module 45, configured to determine a tracking deviation of the target image according to a matching degree of the first 6DoF pose and the second 6DoF pose;
and a replacing module 46, configured to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional feature and the three-dimensional space point in the scene map with the matching relationship between the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose if the tracking deviation is greater than or equal to the first threshold.
In practical use, the three-dimensional object tracking apparatus provided by the embodiment of the present application may be configured in any electronic device to execute the aforementioned three-dimensional object tracking method.
According to the technical scheme of the embodiment of the application, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image and matched with the reference two-dimensional features and the first 6DoF pose of the target object in the target image are determined. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image, and the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose of the target object in the target image are determined. The tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map. Therefore, the target object in the target image is tracked through a scene map that includes both the reference data of the target object and the environmental features, and the scene map is reinitialized when the object tracking information changes greatly, thereby improving the accuracy and universality of moving object tracking.
In one possible implementation form of the present application, the three-dimensional object tracking apparatus 40 further includes:
the fourth determination module is used for estimating the pose of the three-dimensional object on the target image if the target image is not included, and determining the matching relation between the third two-dimensional feature and the three-dimensional space point included in the target image and the third 6DoF pose of the target object in the target image;
and the first adding module is used for adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose to the scene map.
Further, in another possible implementation form of the present application, the three-dimensional object tracking apparatus 40 further includes:
a fifth determining module, configured to determine the first 6DoF pose as the current tracking information of the target object if the target image is not a key frame image.
Further, in another possible implementation form of the present application, the three-dimensional object tracking apparatus 40 further includes:
the sixth determining module is used for determining a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
a seventh determining module, configured to determine, according to the second 6DoF pose, a second two-dimensional coordinate corresponding to each three-dimensional space point;
and the eighth determining module is used for determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
Further, in another possible implementation form of the present application, the three-dimensional object tracking apparatus 40 further includes:
and the second adding module is used for adding a second two-dimensional feature which is contained in the target image and matched with the three-dimensional space point into the scene map if the tracking deviation is smaller than the first threshold value.
Further, in another possible implementation form of the present application, the three-dimensional object tracking apparatus 40 further includes:
the extraction module is used for extracting a fourth two-dimensional feature in the target image if the target image is the second two-dimensional feature;
the building module is used for building a three-dimensional space position corresponding to the fourth two-dimensional feature;
and the third adding module is used for adding the fourth two-dimensional feature and the corresponding three-dimensional space position into the scene map.
In a possible implementation form of the present application, the determining module 43 is specifically configured to:
and judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is less than a second threshold value.
Further, in another possible implementation form of the present application, the determining module 43 is further configured to:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
Further, in another possible implementation form of the present application, the determining module 43 is further configured to:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is greater than a fourth threshold value or not.
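Taken together, the three judgment rules handled by the judging module can be combined disjunctively; the sketch below assumes that any one rule firing marks the target image as a key frame, with placeholder threshold values:

    def is_key_frame(num_matched_first_features, distance_to_prev_key_frame,
                     time_since_prev_key_frame, second_threshold=50,
                     third_threshold=0.3, fourth_threshold=1.0):
        """Key-frame test sketch: a target image is treated as a key frame
        when matched first two-dimensional features are scarce, or the
        acquisition position moved far, or much time elapsed since the
        adjacent previous key frame."""
        return (num_matched_first_features < second_threshold
                or distance_to_prev_key_frame > third_threshold   # meters (assumed)
                or time_since_prev_key_frame > fourth_threshold)  # seconds (assumed)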
In a possible implementation form of the present application, the first determining module 42 is specifically configured to:
and calculating the first 6DoF pose of the target object in the target image according to the matching relation between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature respectively matched with each first two-dimensional feature.
Further, in another possible implementation form of the present application, the first determining module 42 is further configured to:
and determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image where the reference two-dimensional feature is located.
Further, in another possible implementation form of the present application, the first determining module 42 is further configured to:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first-class key two-dimensional feature according to the time interval between the reference image where each first-class key two-dimensional feature is located and the target image.
Further, in another possible implementation form of the present application, the first determining module 42 is further configured to:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-type key two-dimensional feature according to the acquisition mode of each second-type key two-dimensional feature.
It should be noted that the foregoing explanation of the embodiment of the three-dimensional object tracking method shown in fig. 1, fig. 2, and fig. 3 is also applicable to the three-dimensional object tracking apparatus 40 of this embodiment, and details thereof are not repeated here.
According to the technical scheme of the embodiment of the application, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image and matched with the reference two-dimensional features are determined, and the first 6DoF pose of the target object in the target image is calculated according to the matching relationship between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature matched with each first two-dimensional feature. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image, and the matching relationship between the second two-dimensional features contained in the target image and the three-dimensional space points and the second 6DoF pose of the target object in the target image are determined. The tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map. Therefore, the target object in the target image is tracked through a scene map that includes both the reference data of the target object and the environmental features, and different weights are given to different reference two-dimensional features, thereby further improving the accuracy and universality of moving object tracking.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each electronic device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the three-dimensional object tracking method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the three-dimensional object tracking method provided by the present application.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the three-dimensional object tracking method in the embodiments of the present application (for example, the detection module 41, the first determination module 42, the judgment module 43, the second determination module 44, the third determination module 45, and the replacement module 46 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the three-dimensional object tracking method in the above-described method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the three-dimensional object tracking method, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to the electronics of the three-dimensional object tracking method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the three-dimensional object tracking method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the three-dimensional object tracking method, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 504 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
To provide for interaction with the user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, when the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional features and the three-dimensional space points, the first two-dimensional features contained in the target image and matched with the reference two-dimensional features and the first 6DoF pose of the target object in the target image are determined. When the target image is a key frame image, three-dimensional object pose estimation is performed on the target image, and the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose of the target object in the target image are determined. The tracking deviation of the target image is then determined according to the matching degree between the first 6DoF pose and the second 6DoF pose, and when the tracking deviation is greater than or equal to the first threshold, the matching relationship between the second two-dimensional features and the three-dimensional space points and the second 6DoF pose are used to replace the reference 6DoF pose and the matching relationship between the reference two-dimensional features and the three-dimensional space points in the scene map. Therefore, the target object in the target image is tracked through a scene map that includes both the reference data of the target object and the environmental features, and the scene map is reinitialized when the object tracking information changes greatly, thereby improving the accuracy and universality of moving object tracking.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (28)

1. A method for tracking a three-dimensional object, comprising:
detecting whether a scene map contains a reference 6DoF pose of a target object, a reference two-dimensional feature and a matching relation of three-dimensional space points;
if so, determining the matching relation between the first two-dimensional feature contained in the target image and the three-dimensional space point and the first 6DoF pose of the target object in the target image according to the matching relation between the first two-dimensional feature contained in the target image and the reference two-dimensional feature;
judging whether the target image is a key frame image or not according to a preset rule;
if so, performing three-dimensional object pose estimation on the target image, and determining a matching relation between a second two-dimensional feature and a three-dimensional space point contained in the target image and a second 6DoF pose of the target object in the target image;
determining the tracking deviation of the target image according to the matching degree of the first 6DoF pose and the second 6DoF pose;
and if the tracking deviation is greater than or equal to a first threshold value, replacing the reference 6DoF pose, the reference two-dimensional feature and the matching relationship of the three-dimensional space point in the scene map by using the matching relationship of the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose.
2. The method as claimed in claim 1, wherein after detecting whether the scene map contains the reference 6DoF pose of the target object and the matching relationship between the reference two-dimensional feature and the three-dimensional space point, the method further comprises:
if not, performing three-dimensional object pose estimation on the target image, and determining a matching relation between a third two-dimensional feature contained in the target image and a three-dimensional space point and a third 6DoF pose of the target object in the target image;
and adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
3. The method according to claim 1, wherein said determining whether the target image is a key frame image according to a preset rule comprises:
and judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is less than a second threshold value.
4. The method according to claim 1, wherein said determining whether the target image is a key frame image according to a preset rule comprises:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
5. The method according to claim 1, wherein said determining whether the target image is a key frame image according to a preset rule comprises:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is greater than a fourth threshold value or not.
6. The method according to any one of claims 1 to 5, wherein after determining whether the target image is a key frame image according to a preset rule, the method further comprises:
if not, determining the first 6DoF pose as the current tracking information of the target object.
7. The method of any one of claims 2-5, wherein said determining a matching relationship of a first two-dimensional feature contained in the target image to a three-dimensional spatial point and a first 6DoF pose of the target object in the target image comprises:
and calculating a first 6DoF pose of the target object in the target image according to the matching relation between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature respectively matched with each first two-dimensional feature.
8. The method of claim 7, wherein prior to the calculating the first 6DoF pose of the target object in the target image, further comprising:
and determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image where the reference two-dimensional feature is located.
9. The method of claim 7, wherein prior to the calculating the first 6DoF pose of the target object in the target image, further comprising:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first-class key two-dimensional feature according to the time interval between the reference image where each first-class key two-dimensional feature is located and the target image.
10. The method of claim 7, wherein prior to the calculating the first 6DoF pose of the target object in the target image, further comprising:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-type key two-dimensional feature according to the acquisition mode of each second-type key two-dimensional feature.
11. The method of any of claims 1-5, wherein prior to determining the tracking deviation for the target image based on the degree of match between the first 6DoF pose and the second 6DoF pose, further comprising:
determining a first two-dimensional coordinate corresponding to each three-dimensional space point according to the first 6DoF pose;
determining a second two-dimensional coordinate corresponding to each three-dimensional space point according to the second 6DoF pose;
and determining the matching degree of the first 6DoF pose and the second 6DoF pose according to the distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
12. The method of any of claims 1-5, wherein after determining the tracking offset for the target image, further comprising:
and if the tracking deviation is smaller than a first threshold value, adding the matching relation between the second two-dimensional feature and the three-dimensional space point contained in the target image and the second 6DoF pose to the scene map.
13. The method according to any one of claims 1 to 5, wherein after said judging whether the target image is a key frame image according to a preset rule, the method further comprises:
if so, extracting a fourth two-dimensional feature in the target image;
constructing a three-dimensional space position corresponding to the fourth two-dimensional feature;
and adding the fourth two-dimensional feature and the corresponding three-dimensional space position into the scene map.
14. A three-dimensional object tracking device, comprising:
the detection module is used for detecting whether the scene map contains the matching relation between the reference 6DoF pose of the target object, the reference two-dimensional feature and the three-dimensional space point;
the first determining module is used for determining, if the scene map contains the matching relation, the matching relation between the first two-dimensional feature contained in the target image and the three-dimensional space point and the first 6DoF pose of the target object in the target image according to the matching relation between the first two-dimensional feature contained in the target image and the reference two-dimensional feature;
the judging module is used for judging whether the target image is a key frame image or not according to a preset rule;
a second determining module, configured to perform three-dimensional object pose estimation on the target image if the target image is a key frame image, and determine a matching relationship between a second two-dimensional feature contained in the target image and a three-dimensional space point and a second 6DoF pose of the target object in the target image;
a third determining module, configured to determine a tracking deviation of the target image according to a matching degree of the first 6DoF pose and the second 6DoF pose;
and the replacing module is used for replacing the reference 6DoF pose and the matching relation between the reference two-dimensional feature and the three-dimensional space point in the scene map by using the matching relation between the second two-dimensional feature and the three-dimensional space point and the second 6DoF pose if the tracking deviation is greater than or equal to a first threshold value.
15. The apparatus of claim 14, further comprising:
a fourth determining module, configured to perform three-dimensional object pose estimation on the target image if the scene map does not contain the matching relation, and determine a matching relation between a third two-dimensional feature contained in the target image and a three-dimensional space point and a third 6DoF pose of the target object in the target image;
and the first adding module is used for adding the matching relation between the third two-dimensional feature and the three-dimensional space point and the third 6DoF pose into the scene map.
16. The apparatus of claim 14, wherein the determining module is specifically configured to:
and judging whether the number of the first two-dimensional features matched with the three-dimensional space points contained in the target image is less than a second threshold value.
17. The apparatus of claim 14, wherein the determining module is further configured to:
and judging whether the distance between the acquisition position of the target image and the acquisition position of the adjacent previous key frame is larger than a third threshold value.
18. The apparatus of claim 14, wherein the determining module is further configured to:
and judging whether the time interval between the acquisition time of the target image and the acquisition time of the adjacent previous key frame is greater than a fourth threshold value or not.
19. The apparatus of any of claims 14-18, further comprising:
a fifth determining module, configured to determine the first 6DoF pose as the current tracking information of the target object if the target image is not a key frame image.
20. The apparatus of any one of claims 15-18, wherein the first determining module is specifically configured to:
and calculating a first 6DoF pose of the target object in the target image according to the matching relation between each first two-dimensional feature and the three-dimensional space point and the weight of the reference two-dimensional feature respectively matched with each first two-dimensional feature.
21. The apparatus of claim 20, wherein the first determining module is further configured to:
and determining the weight of each reference two-dimensional feature according to the position of each reference two-dimensional feature in the reference image where the reference two-dimensional feature is located.
22. The apparatus of claim 20, wherein the first determining module is further configured to:
determining a first type of key two-dimensional features contained in the reference two-dimensional features according to the acquisition mode of each reference two-dimensional feature;
and determining the weight of each first-class key two-dimensional feature according to the time interval between the reference image where each first-class key two-dimensional feature is located and the target image.
23. The apparatus of claim 20, wherein the first determining module is further configured to:
determining a second type of key two-dimensional features contained in the reference two-dimensional features according to the positions of the reference two-dimensional features in the reference image;
and determining the weight of each second-type key two-dimensional feature according to the acquisition mode of each second-type key two-dimensional feature.
24. The apparatus of any of claims 14-18, further comprising:
a sixth determining module, configured to determine, according to the first 6DoF pose, a first two-dimensional coordinate corresponding to each three-dimensional space point;
a seventh determining module, configured to determine, according to the second 6DoF pose, a second two-dimensional coordinate corresponding to each three-dimensional space point;
an eighth determining module, configured to determine a matching degree of the first 6DoF pose and the second 6DoF pose according to a distance between the first two-dimensional coordinate and the second two-dimensional coordinate.
25. The apparatus of any of claims 14-18, further comprising:
and the second adding module is used for adding the matching relation between the second two-dimensional features and the three-dimensional space points contained in the target image and the second 6DoF pose to the scene map if the tracking deviation is smaller than a first threshold value.
26. The apparatus of any of claims 14-18, further comprising:
the extraction module is used for extracting a fourth two-dimensional feature from the target image if the target image is a key frame image;
the building module is used for building a three-dimensional space position corresponding to the fourth two-dimensional feature;
and the third adding module is used for adding the fourth two-dimensional feature and the corresponding three-dimensional space position into the scene map.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.
Priority Applications (1)

Application number CN202010222181.8A, priority date 2020-03-26, filing date 2020-03-26: Three-dimensional object tracking method and device and electronic equipment

Publications (2)

CN111462179A (application), published 2020-07-28
CN111462179B (granted), published 2023-06-27

Family ID: 71685048



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant