CN108010055B - Tracking system and tracking method for three-dimensional object - Google Patents

Tracking system and tracking method for three-dimensional object

Info

Publication number
CN108010055B
CN108010055B (application CN201711183555.4A)
Authority
CN
China
Prior art keywords
frame
data
module
feature point
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711183555.4A
Other languages
Chinese (zh)
Other versions
CN108010055A (en)
Inventor
康大智 (Kang Dazhi)
吕国云 (Lyu Guoyun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tapuyihai Shanghai Intelligent Technology Co ltd
Original Assignee
Tapuyihai Shanghai Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tapuyihai Shanghai Intelligent Technology Co ltd filed Critical Tapuyihai Shanghai Intelligent Technology Co ltd
Priority to CN201711183555.4A
Publication of CN108010055A
Application granted
Publication of CN108010055B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tracking system for a three-dimensional object and a tracking method thereof. The tracking system comprises a key frame forming unit, a video frame extrinsic parameter analysis unit, and a tracking judgment unit. The key frame forming unit forms key frame data, including the extrinsic parameters of the key frame, by analyzing the data of a template frame and the data of a video frame. The video frame extrinsic parameter analysis unit is communicatively connected to the key frame forming unit; it obtains the extrinsic parameters of the key frame and calculates the extrinsic parameters of the video frame from them. The tracking judgment unit is communicatively connected to the video frame extrinsic parameter analysis unit; after obtaining the data of the key frame and the data of the video frame, it calculates the corresponding pose of the tracked object in the video frame.

Description

Tracking system and tracking method for three-dimensional object
Technical Field
The present invention relates to a tracking system and a tracking method for a three-dimensional object, and more particularly to a tracking system and a tracking method that can track a three-dimensional object without placing any marker in the real scene.
Background
In the traditional tracking method for three-dimensional objects, a preset marker must be placed in the real scene in advance. Tracking is achieved by detecting this marker: the parameters of the camera are determined, the three-dimensional data model is rendered, and the extrinsic parameters of the camera are recalculated as the marker is scaled, translated, or rotated. In real life, placing a marker is troublesome. On one hand, the placed marker must not occlude the tracked object; on the other hand, some tracked objects are moving, so the video frame acquisition device moves with them while the placed marker stays still. If the video frame acquisition device moves quickly, the marker placed in the real scene is likely to leave the field of view that the device can capture, which can cause the whole tracking process to fail. To mitigate such problems, conventional three-dimensional object tracking may use a planar image as the marker; but because the marker is then a two-dimensional planar image, the direction of motion of the tracked object is generally restricted, and so is the freedom of movement of any three-dimensional model superimposed on it.
As modern technology has developed, markerless tracking based on visual features has found a much wider range of applications. However, current related techniques place high demands on the CAD model of the tracked object, the tracking process is very complex, and the required computing equipment is also demanding. With the wide adoption of mobile AR applications, real-time performance has become a key problem in algorithm research, and the crucial point is the stability of the tracking method. The traditional tracking method for three-dimensional objects tracks objects with complex textures slowly, and cannot extract a sufficient number of effective features from simple objects; moreover, when extracting features from the tracked object, it is very sensitive to illumination intensity. This degrades the continuity of the whole tracking process to a certain extent, and the superimposed virtual three-dimensional object often moves violently in the scene, without coherent visual perception.
Disclosure of Invention
An object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system does not need specific physical markers or planar markers placed in the real scene when tracking an object.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system for a three-dimensional object can stably track objects having different texture richness.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system for a three-dimensional object can quickly track an object with a complex texture.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system for a three-dimensional object can stably track an object with a complex texture.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system for a three-dimensional object can extract sufficient feature points for a simple object, thereby performing effective and stable tracking.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system is robust to illumination intensity when tracking the three-dimensional object.
Another object of the present invention is to provide a tracking system for a three-dimensional object and a tracking method thereof, wherein the tracking system tracks the object in real time by updating the key template frame, so that tracking can continue even when the tracked object leaves and later returns to the field of view of the video frame acquisition device. That is, after tracking fails because the tracked object goes out of bounds, the tracking system can automatically re-track it once it returns to the field of view.
To achieve at least one of the above objects, the present invention provides a tracking system for a three-dimensional object, comprising:
a key frame forming unit, wherein the key frame forming unit forms a key frame data by analyzing data of a template frame and data of a video frame, wherein the key frame data comprises extrinsic parameters of the key frame;
a video frame extrinsic parameter analysis unit, wherein the video frame extrinsic parameter analysis unit is communicatively connected to the key frame forming unit, and is capable of acquiring the extrinsic parameters of the key frame and calculating the extrinsic parameters of the video frame according to them; and
a tracking judgment unit, wherein the tracking judgment unit is communicatively connected to the video frame extrinsic parameter analysis unit, and, after acquiring the data of the key frame and the data of the video frame, calculates the corresponding pose of the tracked object in the video frame.
According to an embodiment of the present invention, the tracking system of the three-dimensional object further comprises a data obtaining unit, wherein the data obtaining unit further comprises a template frame obtaining module and a video frame obtaining module, wherein the template frame obtaining module and the video frame obtaining module are respectively connected to the key frame forming unit in a communication manner.
According to an embodiment of the present invention, the data acquisition unit further includes a feature point processing module, wherein the feature point processing module includes a feature point judgment module and a feature point extraction module, wherein the feature point judgment module is communicatively connected to the template frame acquisition module and the video frame acquisition module, and wherein the feature point extraction module is communicatively connected to the feature point judgment module, the key frame forming unit, and the video frame extrinsic parameter analysis unit.
According to an embodiment of the present invention, the feature point processing module further includes a texture richness determination module, wherein the texture richness determination module includes a feature point analysis module, wherein the feature point analysis module is communicatively connected to the feature point determination module.
According to an embodiment of the present invention, the texture richness judgment module further includes a feature point change judgment module and a homogenization module, wherein the feature point change judgment module is communicatively connected to the feature point extraction module, and wherein the homogenization module is communicatively connected to the feature point judgment module.
According to an embodiment of the present invention, the feature point processing module further includes a feature point matching judgment module, wherein the feature point matching judgment module is communicatively connected to the feature point extraction module.
To achieve at least one of the above objects, the present invention provides a tracking method of a three-dimensional object, comprising the steps of:
(A) acquiring, according to the data of a tracked object, the data of a key frame matching a planar image of the tracked object from the video frames, wherein the data of the key frame comprises the extrinsic parameters of the key frame;
(B) analyzing the acquired data of the key frame and the acquired data of a video frame, and calculating the extrinsic parameters corresponding to the current video frame; and
(C) calculating the pose between the tracked object in the video frame and the tracked object in the template frame according to the extrinsic parameters of the video frame.
According to an embodiment of the present invention, the tracking method of a three-dimensional object further includes:
(D) comparing whether the difference between the extrinsic parameters of the video frame and the extrinsic parameters of the key frame meets a specific threshold; and
(E) when the difference does not meet the threshold, replacing the template frame with the template frame used last time.
According to an embodiment of the present invention, wherein the step (a) comprises:
(F) judging whether a point in the current image data is a characteristic point according to a characteristic point judgment threshold;
(G) extracting feature points according to the judgment result;
(H) comparing the feature points extracted in step (G) with the feature points in reference data to judge whether they meet the requirements the reference data places on feature points; and
(I) if so, taking the feature points extracted in step (G) as the finally extracted feature points; if not, changing the feature point judgment threshold in step (F) and executing step (F) again.
According to an embodiment of the present invention, wherein the step (a) comprises:
(J) judging whether the number of extracted feature points is within a feature point number threshold range;
(K) when the number of extracted feature points is not within the threshold range, homogenizing the video frame; and
(L) extracting the feature points of the homogenized video frame.
Drawings
FIG. 1 is a schematic diagram of a three-dimensional object tracking system of the present invention.
FIG. 2 is a tracking schematic of a tracking system for a three-dimensional object according to the present invention.
FIG. 3 is a schematic diagram of feature point extraction of a three-dimensional object tracking system according to the present invention.
FIG. 4 is a flow chart of a method for tracking a three-dimensional object according to the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments provided in the following description are only intended as examples and modifications obvious to a person skilled in the art, and do not constitute a limitation of the scope of the invention. The general principles defined in the following description may be applied to other embodiments, alternatives, modifications, equivalent implementations, and applications without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for ease of description and simplicity of description, and do not indicate or imply that the referenced devices or components must be constructed and operated in a particular orientation and thus are not to be considered limiting.
The tracking system for a three-dimensional object and its tracking method disclosed by the invention are explained in detail below. The tracking system comprises a data acquisition unit 10, a key frame forming unit 20, a video frame extrinsic parameter analysis unit 30, and a tracking judgment unit 40. The data acquisition unit 10 obtains tracking data of a tracked object. The key frame forming unit 20 is communicatively connected to the data acquisition unit 10 to obtain the tracking data, and forms the initial data related to the key frame accordingly, where the initial data includes the extrinsic and intrinsic parameters of the key frame. The video frame extrinsic parameter analysis unit 30 is communicatively connected to the key frame forming unit 20 to obtain the extrinsic parameters M_i of the current video frame from the initial data formed by the key frame forming unit 20. The tracking judgment unit 40 is communicatively connected to the video frame extrinsic parameter analysis unit 30 and the key frame forming unit 20; it compares the extrinsic parameters M_i of the video frame with the extrinsic parameters M_0 of the key frame, and when the difference between M_i and M_0 exceeds a certain range, the tracking judgment unit 40 forms update data, which is transmitted to the key frame forming unit 20 to replace the initial data of the key frame with the data of the previous template frame, so that tracking can be re-established.
It will be appreciated by those skilled in the art that, by updating the key frames in real time, this CAD-model-based tracking system for three-dimensional objects can keep tracking an object even when the tracked object moves out of the field of view of the video frame acquisition device and later returns into it.
In the present invention, the data acquiring unit 10 includes a template frame acquiring module 11 and a video frame acquiring module 12, wherein the template frame acquiring module 11 and the video frame acquiring module 12 are communicatively connected to the key frame forming unit 20, and the key frame forming unit 20 determines whether the video frame can be used as a key template frame by analyzing and comparing data of a template frame formed by the template frame acquiring module 11 with data of a video frame formed by the video frame acquiring module 12.
Specifically, in an embodiment of the present invention, the template frame acquisition module 11 can acquire the CAD model data of a tracked object and parse it for storage, where the format of the CAD model data includes, but is not limited to, OBJ or 3DS, and the parsed CAD model data includes triangle facet data, vertex data, normal vector data, texture data, material data, illumination information data, and the like.
Those skilled in the art can understand that the template frame acquisition module 11 can at the same time perform vertex redundancy processing on the CAD model data to reduce its redundant information, thereby reducing the time needed for subsequent processing of the CAD model data.
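For illustration, a minimal Python sketch of such vertex-redundancy removal, assuming the parsed model is held as NumPy arrays (the array and function names are hypothetical, not from the patent):

```python
import numpy as np

def deduplicate_vertices(vertices, faces):
    """Remove duplicate vertices and remap the triangle indices.

    vertices: (N, 3) float array of vertex coordinates.
    faces:    (M, 3) int array of triangle vertex indices.
    """
    # np.unique returns the distinct rows plus, for every original row,
    # the index of its representative in the deduplicated array.
    unique_vertices, inverse = np.unique(vertices, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    remapped_faces = inverse[faces]  # faces now index the unique vertices
    return unique_vertices, remapped_faces

# Example: a quad split into two triangles with duplicated corner rows.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                  [0, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
tris = np.array([[0, 1, 2], [3, 4, 5]])
v, f = deduplicate_vertices(verts, tris)
print(v.shape)  # (4, 3) -- two duplicate vertices removed
```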
Further, the template frame acquisition module 11 can render the CAD model data with OpenGL to form the template frame, so as to match the data characteristics used for subsequent tracking. Those skilled in the art can understand that, in the present invention, other methods may be used to render the tracked object, and the invention is not limited in this respect.
The video frame acquiring module 12 can be communicatively connected to the video frame acquiring device, such as a monocular camera, wherein the video frame acquiring module 12 can acquire at least one video frame, wherein the key frame forming unit 20 can acquire the data of the template frame and the data of the video frame from the template frame acquiring module 11 and the video frame acquiring module 12, respectively, to determine whether the current video frame data can be used as the key frame, and if so, the video frame can be used as the key frame, and the data of the template frame can be used as the data of the key frame.
In the present invention, the initial data of the template frame is implemented as an image T of the tracked object rendered by OpenGL, and the data of the template frame includes the extrinsic parameters M_0 = [R_0 t_0] given when the image was rendered by OpenGL. The key frame forming unit 20 compares whether the current video frame matches the data of the template frame; if so, the video frame is regarded as the key frame, and the extrinsic parameters of the video frame acquisition device corresponding to this key frame are recorded as the extrinsic parameters of the template frame, M_0 = [R_0 t_0].
Specifically, a two-dimensional projection image T of the CAD model under the current pose is obtained by rendering the CAD model of the tracked object with OpenGL, with given intrinsic parameters K and extrinsic parameters M of the video frame acquisition device, such as a monocular camera, while video frames are acquired in real time by the device. The key frame forming unit 20 performs a fast initial matching of the template frame and the video frame; a successful match means the tracked object is contained in the current frame and the extrinsic parameters of the current frame equal those of the rendered image, i.e., M ≈ M′, and the current video frame is taken as the key frame.
The specific process of the rapid initial matching in the embodiment of the invention is as follows:
First, the intrinsic parameters K and extrinsic parameters M of the CAD model image rendered by OpenGL are given; then the contour features of the rendered image and of the video frame are extracted and binarized with an adaptive threshold. The key frame forming unit 20 matches the processed images using a similarity matrix obtained by the conventional least-square-error method, where D(i, j) is the similarity matrix and l(i, j) is the coordinate of the top-left corner of the region with the highest similarity:
d_ij = Σ_m Σ_n [S(i+m, j+n) − T(m, n)]² (the conventional least-square-error similarity, with S the binarized video frame image and T the binarized rendered image)
The best matching position in D(i, j) is its minimum element, ε_1 = min{d_ij}. Experiments show that the probability of the matching point falling in the range ε_1 to 2ε_1 is as high as 98%, so ε_2 = 2ε_1 is used as the upper bound of similarity for matching points, and the corrected similarity matrix has elements d′_ij:
[Equation image not reproduced in the source: the corrected matrix elements d′_ij, defined using the similarity upper bound ε_2.]
The corrected similarity matrix speeds up matching: when the template frame is close to the video frame image being matched, the value converges quickly; when the two differ greatly, the value grows quickly. A coarse scan can therefore use this distribution to quickly determine the approximate region, after which the search step is shortened for a fine scan, significantly reducing the matching time. It can be understood by those skilled in the art that, in this embodiment, this method increases the matching speed between the template frame and the video frame, so that a fast tracking rate can be maintained while an object is tracked.
After the template frame and the video frame pass this coarse-to-fine fast initial matching, the video frame is saved as the key frame, and the camera extrinsic parameters used to render the CAD model are taken as the extrinsic parameters M_0 = [R_0 t_0] of the key frame.
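A minimal sketch of this coarse-to-fine initial matching, assuming the similarity d_ij is the sum of squared differences over adaptively binarized images; the function names, strides, and threshold parameters are illustrative, and the bound ε_2 = 2ε_1 is applied here to the coarse-scan minimum:

```python
import cv2
import numpy as np

def binarize_contours(img, block=11, c=2):
    """Adaptive-threshold binarization of the contour structure (BGR input)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, block, c)

def ssd(patch, templ):
    d = patch.astype(np.float32) - templ.astype(np.float32)
    return float((d * d).sum())

def coarse_to_fine_match(frame_bin, templ_bin, coarse_step=8):
    th, tw = templ_bin.shape
    fh, fw = frame_bin.shape
    # Coarse scan: a large stride locates the approximate region quickly.
    best, best_ij = None, (0, 0)
    for i in range(0, fh - th + 1, coarse_step):
        for j in range(0, fw - tw + 1, coarse_step):
            d = ssd(frame_bin[i:i+th, j:j+tw], templ_bin)
            if best is None or d < best:
                best, best_ij = d, (i, j)
    eps2 = 2.0 * best  # upper similarity bound eps_2 = 2 * eps_1
    # Fine scan: unit stride around the coarse minimum, skipping
    # candidates whose dissimilarity exceeds the bound.
    i0, j0 = best_ij
    for i in range(max(0, i0 - coarse_step), min(fh - th, i0 + coarse_step) + 1):
        for j in range(max(0, j0 - coarse_step), min(fw - tw, j0 + coarse_step) + 1):
            d = ssd(frame_bin[i:i+th, j:j+tw], templ_bin)
            if d <= eps2 and d < best:
                best, best_ij = d, (i, j)
    return best_ij, best  # top-left corner l(i, j) and its dissimilarity
```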
Further, the data obtaining unit 10 further includes a feature point processing module 13, wherein the feature point processing module 13 is communicatively connected to the template frame obtaining module 11 and the video frame obtaining module 12, wherein the feature point processing module 13 is capable of extracting feature points of the template frame and feature points of the video frame to form corresponding feature point data, respectively.
Specifically, the feature point processing module 13 extracts feature descriptors with scale and rotation invariance from the template frame and the video frame by the ORB feature extraction method, and matches the two using a distance metric to obtain a coarse set of matching points. Mismatches are then eliminated from the obtained matching point set to reduce the influence of false correspondences on subsequent calculations, specifically as follows:
ORB feature points are extracted from the template frame and the video frame, corresponding feature point sets are constructed to obtain the descriptors of the feature points, and the feature points of the template frame are matched against those of the video frame using the Hamming distance, as follows:
for a feature point p on the template frame, let its descriptor be F_P, and let the descriptors of the feature points on the video frame I be F_I = {F_1, F_2, … F_n};
the Hamming distances between F_P and all feature points in F_I are calculated: D = {d_1, d_2, … d_n}, where:
d_n = F_P ⊕ F_n (⊕ denotes bitwise XOR; the Hamming distance is the number of differing bits);
the minimum Hamming distance D_min = min{d_1, d_2, … d_n} identifies the nearest point to p. With t the matching threshold, if D_min < t, p is judged to match that point; otherwise p has no matching point;
the same operation is performed on the remaining points of the template frame P to obtain the full set of feature point matching pairs.
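A minimal sketch of this ORB extraction and Hamming-distance matching using OpenCV (the feature count and the threshold t are illustrative values; the patent does not fix them):

```python
import cv2

def match_orb(template_img, frame_img, t=64):
    """ORB extraction plus brute-force Hamming matching with threshold t."""
    orb = cv2.ORB_create(nfeatures=500)
    kp_t, des_t = orb.detectAndCompute(template_img, None)
    kp_f, des_f = orb.detectAndCompute(frame_img, None)
    if des_t is None or des_f is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    # One nearest neighbour per template point; keep p only when the
    # distance D_min to its nearest point is below the threshold t.
    matches = matcher.match(des_t, des_f)
    return [m for m in matches if m.distance < t]
```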
In an embodiment of the present invention, the feature point processing module 13 further includes a feature point judgment module 131, a feature point extraction module 132, and a texture richness judgment module 133. The feature point extraction module 132 is communicatively connected to the feature point judgment module 131 and the texture richness judgment module 133. The feature point judgment module 131 is communicatively connected to the template frame acquisition module 11 and the video frame acquisition module 12 of the data acquisition unit 10, so that it can acquire the corresponding data of the template frame and of the video frame; it judges whether the vertex data in the template frame data and in the video frame data satisfy the feature point requirement. The feature point extraction module 132 extracts the points meeting the requirement to form a template frame feature point set and a video frame feature point set respectively. The texture richness judgment module 133 is communicatively connected to the feature point extraction module 132 so as to acquire the two feature point sets from it and analyze whether the number of feature points in the video frame feature point set meets the requirement; if not, the texture richness judgment module 133 forms replacement data, and the feature point judgment module 131 acquires this replacement data and changes the feature point requirement accordingly.
Specifically, in the embodiment of the present invention, the feature point judgment module 131 is preset with a feature point judgment threshold δ. Preferably, the criterion compares the pixel values of the 16 pixels on a circle of radius R around a candidate point with the pixel value of the center point: if more than det(P_num) pixels on the circle differ from the center point by more than the judgment threshold δ, the center point is taken as a feature point. The judgment condition is:
N_i = Σ|gray(p) − gray(x_i)|.
the feature point extraction module 132 can correspondingly extract feature points meeting the above requirements to form each of the video feature point sets.
In the present invention, the texture richness judgment module 133 includes a feature point analysis module 1331, which is communicatively connected to the feature point extraction module 132 and the feature point judgment module 131. The feature point analysis module 1331 obtains the video feature point set from the feature point extraction module 132 and compares the number of feature points in it with a preset reference range (between δ_1 and δ_2). When the number of feature points in the analyzed set is not within this range, the feature point analysis module 1331 forms the replacement data; after obtaining the replacement data, the feature point judgment module 131 executes it to adjust the feature point extraction condition.
From the above description, those skilled in the art can understand that, in this embodiment, by means of the feature point analysis module 1331 of the texture richness judgment module 133, the tracking system can extract sufficient video feature point sets from objects of different texture richness, so that it can be applied to objects with different texture richness.
Further, the texture richness judgment module 133 includes a feature point change judgment module 1332, communicatively connected to the feature point extraction module 132 to monitor the difference between the numbers of feature points extracted at two adjacent times; the feature point change judgment module 1332 is also communicatively connected to the feature point analysis module 1331. If the difference in number is less than a given change threshold, the current state is normal and feature points continue to be extracted from the video frames; if the difference is greater than the change threshold, the feature points have changed drastically, indicating that the currently tracked object is out of bounds or occluded, and the image then needs to be homogenized.
More specifically, the feature point processing module 13 further includes a homogenization module 134, communicatively connected to the feature point change judgment module 1332 and the feature point judgment module 131. When the feature point analysis module 1331 finds that the difference in number is greater than the change threshold, the homogenization module 134 forms homogenization data; the feature point judgment module 131 executes this data and divides the video frame into equal regions accordingly, so that the feature point extraction module 132 can extract feature points from the equally divided video frame.
As can be understood by those skilled in the art, with the feature point change judgment module 1332 and the homogenization module 134, the system can extract an appropriate number of feature points even when the object in the analyzed image is occluded or out of bounds.
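A sketch of such homogenization, dividing the frame into an equal grid and extracting features per cell (the grid size and per-cell budget are illustrative values):

```python
import cv2

def extract_uniform(gray, rows=4, cols=4, per_cell=30):
    """Split the frame into equal cells and detect up to per_cell ORB
    features in each, keeping the points evenly distributed even when
    part of the object is occluded or out of view."""
    orb = cv2.ORB_create(nfeatures=per_cell)
    h, w = gray.shape
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            for kp in orb.detect(gray[y0:y1, x0:x1], None):
                kp.pt = (kp.pt[0] + x0, kp.pt[1] + y0)  # back to full-frame coords
                keypoints.append(kp)
    return keypoints
```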
The feature point processing module 13 further comprises a feature point matching judgment module 135, communicatively connected to the feature point extraction module 132 and the video frame extrinsic parameter analysis unit 30. After the feature point extraction module 132 extracts the feature points of the video frame, the feature point matching judgment module 135 calculates the Hamming distances D = {d_1, d_2, …, d_n} between each feature point of the template frame and the feature points of each video frame; the minimum Hamming distance D_min = min{d_1, d_2, …, d_n} identifies the nearest neighbor of p, and if D_min < t, the two points are judged to match, otherwise p has no matching point.
The feature point matching judgment module 135 further rearranges the feature points of the video frame according to the matching result to form the matched feature point set. Specifically, a ratio test is performed with the template frame image as reference and the video frame image as target. First, a 2-nearest-neighbor query is performed for each feature point of the template frame, giving the distance from it to the video frame feature points and, correspondingly, its nearest neighbor bn and next-nearest neighbor bn′ among them, with distances d_n and d_n′ respectively. A threshold test is then applied with the matching threshold t: if d_n > t, the matched pair is rejected. Finally, the ratio test is performed with a ratio threshold ε: if
d_n / d_n′ > ε,
then both bn and bn′ are considered possible matching points for the query point, so the matched pair is eliminated as ambiguous. A cross test is then performed on the resulting point sets: let the feature point sets of the matching pairs in the query set and the target set be {s_n} and {p_n} respectively, n = 1, 2, 3, …; the query set and target set are swapped, the matching points {s′_n} of {p_n} are solved for, and the query set corresponding to the correct matching pairs is {s_n} ∩ {s′_n}.
As can be understood by those skilled in the art, since the matching point set obtained by the first matching contains a large number of mismatches, which would introduce large errors into subsequent results and operations, the feature point matching judgment module 135 eliminates the mismatched pairs after obtaining the matching point pairs.
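A sketch combining the threshold, ratio, and cross tests with OpenCV's brute-force matcher (the threshold t and ratio ε values are illustrative; the patent does not specify them):

```python
import cv2

def filter_matches(des_t, des_f, t=64, ratio=0.8):
    """Ratio test plus cross test over ORB descriptors."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    # 2-nearest-neighbour query: template (query set) -> frame (target set).
    kept = []
    for pair in bf.knnMatch(des_t, des_f, k=2):
        if len(pair) < 2:
            continue
        bn, bn2 = pair                       # nearest and next-nearest neighbour
        if bn.distance > t:                  # threshold test: d_n > t -> reject
            continue
        if bn.distance > ratio * bn2.distance:
            continue                         # ambiguous: both are plausible matches
        kept.append(bn)
    # Cross test: the reverse query must map back to the same pair.
    back = {(m.trainIdx, m.queryIdx) for m in bf.match(des_f, des_t)}
    return [m for m in kept if (m.queryIdx, m.trainIdx) in back]
```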
Further, the video frame extrinsic parameter analysis unit 30 is communicatively connected to the key frame forming unit 20 and the feature point matching judgment module 135, and obtains the data of the template frame and the related data of the video frame. From the known initial extrinsic parameters M_0 = [R_0 t_0], the intrinsic parameters K, and the correspondences of the current matching points, it obtains the extrinsic parameters M_i = [R_i t_i] of the video frame, using Kalman filter prediction to ensure the stability of the parameters, as follows:
the initial extrinsic parameters M_0 = [R_0 t_0] of the video frame acquisition device, together with the intrinsic parameters K, relate a set of non-coplanar 3D points p_i = (x_i, y_i, z_i)^T in the world coordinate system to their corresponding points g_i = (u′_i, v′_i, 1)^T on the two-dimensional image plane:
g_i = K M p_i;
in the subsequent feature-based tracking, the set of matching points between the feature points g_0 of the template frame I_0 and the feature points g_i of the video frame I_i is obtained, i.e., the 2D-2D correspondence:
g_i = H_0i g_0;
a new 2D-3D correspondence is then obtained from the initial extrinsic parameters and the current 2D-2D correspondence, and the extrinsic parameters M_i of the device currently acquiring the video frame are calculated from g_i = H_0i K M_0 p_0; the estimated result is then optimized with Kalman filtering.
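A minimal sketch of this step, assuming the 2D-2D matches have already been filtered as above; it uses RANSAC and OpenCV's PnP solver in place of the patent's unspecified solver, and omits the Kalman filtering applied afterwards:

```python
import cv2
import numpy as np

def estimate_extrinsics(pts3d_0, pts2d_0, pts2d_i, K):
    """Recover the current extrinsics M_i = [R_i t_i] by chaining the key
    frame's 2D-3D correspondences through the 2D-2D matches.

    pts3d_0: (N, 3) model points p_0 visible in the key frame (N >= 4)
    pts2d_0: (N, 2) their projections g_0 in the key frame
    pts2d_i: (N, 2) the matched points g_i in the current video frame
    """
    pts3d_0 = np.asarray(pts3d_0, dtype=np.float32)
    pts2d_0 = np.asarray(pts2d_0, dtype=np.float32)
    pts2d_i = np.asarray(pts2d_i, dtype=np.float32)
    # H_0i maps key-frame image points onto current-frame image points;
    # RANSAC is used here only to reject outlier matches.
    H_0i, mask = cv2.findHomography(pts2d_0, pts2d_i, cv2.RANSAC, 3.0)
    inl = mask.ravel().astype(bool)
    # New 2D-3D correspondences: the 3D points p_0 now observed at g_i.
    ok, rvec, tvec = cv2.solvePnP(pts3d_0[inl], pts2d_i[inl],
                                  np.asarray(K, dtype=np.float32), None)
    R, _ = cv2.Rodrigues(rvec)
    return np.hstack([R, tvec])  # 3x4 extrinsic matrix [R_i t_i]
```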
The tracking judgment unit 40 acquires the extrinsic parameters of the video frame calculated by the video frame extrinsic parameter analysis unit 30 and calculates from them the spatial pose of the tracked object in the video frame, thereby realizing tracking of the tracked object.
It should be noted that the video frame extrinsic parameter analysis unit 30 is communicatively connected to the feature point processing module 13 and the feature point matching judgment module 135, so as to obtain the feature point data, calculate the extrinsic parameters M_i of the video frame from it, and form the corresponding analysis data. The tracking judgment unit 40 is communicatively connected to the video frame extrinsic parameter analysis unit 30 to judge from the analysis data whether the tracked object is present in the current video frame, i.e., whether the current tracking is successful. If tracking has failed, the tracking judgment unit 40 forms update data; the tracking judgment unit 40 is communicatively connected to the key frame forming unit 20, so that the relevant data in the template frame can be updated.
Specifically, the tracking judgment unit 40 obtains the analysis result formed by the video frame extrinsic parameter analysis unit 30. A threshold is set in the tracking judgment unit 40, which compares it with the difference between the extrinsic parameters of the current video frame and those of the template frame. When the difference is greater than the threshold, the tracked object is not present in the current video frame, and the tracking judgment unit 40 forms update data; the key frame forming unit 20 obtains the update data and forms the template frame data from it. When the difference is smaller than the threshold, the tracked object is present in the current video frame; to guarantee stable tracking of the three-dimensional object under different viewing angles, the current tracking state is judged in real time and the key template image is updated. Through the above steps, stable tracking of the three-dimensional object is accomplished and the extrinsic parameters of the camera are output in real time, specifically including the following steps:
judging whether the number of matching points between the current frame and the key template frame is below a given threshold; and
judging whether the camera extrinsic parameters obtained for the current frame exceed a certain range, i.e., whether the camera's range of movement has grown. If either of the two conditions is met, a suitable video frame is stored as a new key template frame for subsequent tracking; the strategy for selecting that video frame is as follows:
the pose information of the frame obtained by matrix calculation is:
{r_x, r_y, r_z, t_x, t_y, t_z}
the pose information predicted after removing singular values and applying Kalman filtering is:
{r′_x, r′_y, r′_z, t′_x, t′_y, t′_z}
the frame tracking score is:
g = (r′_x − r_x)² + (r′_y − r_y)² + (r′_z − r_z)² + (t′_x − t_x)² + (t′_y − t_y)² + (t′_z − t_z)²
If g < G, where G is a given score threshold, the frame is judged to be a candidate frame for template replacement.
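A minimal sketch of this score test (the threshold value G is illustrative; the patent does not specify it):

```python
import numpy as np

def tracking_score(pose, pose_pred):
    """Squared distance between the measured pose {r, t} and the pose
    predicted by Kalman filtering {r', t'}, each a 6-vector
    (r_x, r_y, r_z, t_x, t_y, t_z)."""
    d = np.asarray(pose_pred, dtype=float) - np.asarray(pose, dtype=float)
    return float(d @ d)

def is_template_candidate(pose, pose_pred, G=0.05):
    """A frame whose score g stays below the threshold G is kept as a
    candidate new key template frame."""
    return tracking_score(pose, pose_pred) < G
```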
According to another aspect of the present invention, there is provided a method of tracking a three-dimensional object, wherein the method comprises the steps of:
(A) acquiring, according to the data of a tracked object, the data of a key frame matching a planar image of the tracked object from the video frames, wherein the data of the key frame comprises the extrinsic parameters of the key frame;
(B) analyzing the acquired data of the key frame and the acquired data of a video frame, and calculating the extrinsic parameters corresponding to the current video frame; and
(C) calculating the pose between the tracked object in the video frame and the tracked object in the template frame according to the extrinsic parameters of the video frame.
According to an embodiment of the present invention, the tracking method of a three-dimensional object further includes:
(D) comparing whether the difference between the extrinsic parameters of the video frame and the extrinsic parameters of the key frame meets a specific threshold; and
(E) when the difference does not meet the threshold, replacing the template frame with the template frame used last time.
In the present invention, the step (a) comprises:
(F) judging whether a point in the current image data is a characteristic point according to a characteristic point judgment threshold;
(G) extracting feature points according to the judgment result;
(H) comparing the feature points extracted in step (G) with the feature points in reference data to judge whether they meet the requirements the reference data places on feature points; and
(I) if so, taking the feature points extracted in step (G) as the finally extracted feature points; if not, changing the feature point judgment threshold in step (F) and executing step (F) again.
Further, the step (a) includes:
(J) judging whether the number of extracted feature points is within a feature point number threshold range;
(K) when the number of extracted feature points is not within the threshold range, homogenizing the video frame; and
(L) extracting the feature points of the homogenized video frame.
It can thus be seen that the objectives of the invention are effectively attained. The embodiments explaining the functional and structural principles of the present invention have been fully illustrated and described, and the invention is not limited by changes based on the principles of these embodiments. Accordingly, this invention includes all modifications encompassed within the scope and spirit of the following claims.

Claims (8)

1. A system for tracking a three-dimensional object, comprising:
a key frame forming unit, wherein the key frame forming unit forms a key frame data by analyzing data of a template frame and data of a video frame, wherein the key frame data comprises extrinsic parameters of the key frame;
a video frame extrinsic parameter analysis unit, wherein the video frame extrinsic parameter analysis unit is communicatively connected to the key frame forming unit, and is capable of acquiring the extrinsic parameters of the key frame and calculating the extrinsic parameters of the video frame according to them; and
a tracking judgment unit, wherein the tracking judgment unit is communicatively connected to the video frame extrinsic parameter analysis unit, and, after acquiring the data of the key frame and the data of the video frame, calculates the corresponding pose of the tracked object in the video frame;
a data acquisition unit, wherein the data acquisition unit further comprises a template frame acquisition module and a video frame acquisition module, wherein the template frame acquisition module and the video frame acquisition module are respectively communicatively connected to the key frame forming unit;
wherein the video frame acquisition module acquires at least one video frame, and wherein the key frame forming unit can acquire the data of the template frame and the data of the video frame from the template frame acquisition module and the video frame acquisition module respectively, and judge whether the current video frame data can be used as the key frame; if so, the video frame is used as the key frame, and the data of the template frame is used as the data of the key frame.
2. The system for tracking a three-dimensional object according to claim 1, wherein the data acquisition unit further comprises a feature point processing module, wherein the feature point processing module comprises a feature point judgment module and a feature point extraction module, wherein the feature point judgment module is communicatively connected to the template frame acquisition module and the video frame acquisition module, and wherein the feature point extraction module is communicatively connected to the feature point judgment module, the key frame forming unit, and the video frame extrinsic parameter analysis unit.
3. The system for tracking a three-dimensional object according to claim 2, wherein the feature point processing module further comprises a texture richness determination module, wherein the texture richness determination module comprises a feature point analysis module, wherein the feature point analysis module is communicatively connected to the feature point determination module.
4. The system for tracking a three-dimensional object according to claim 3, wherein the texture richness judging module further comprises a feature point change judging module and a homogenization processing module, wherein the feature point change judging module is communicatively connected to the feature point extracting module, wherein the homogenization processing module is communicatively connected to the feature point judging module.
5. The system for tracking a three-dimensional object according to any one of claims 2 to 4, wherein the feature point processing module further comprises a feature point matching judgment module, wherein the feature point matching judgment module is communicatively connected to the feature point extraction module.
6. A method of tracking a three-dimensional object, comprising the steps of:
(A) acquiring, according to the data of a tracked object, a video frame matched with a template frame from among the video frames as a key frame, and acquiring the data of the key frame, wherein the data of the key frame comprises the extrinsic parameters of the key frame;
(B) analyzing the acquired data of the key frame and the acquired data of a video frame, and calculating the extrinsic parameters corresponding to the current video frame; and
(C) calculating the pose between the tracked object in the video frame and the tracked object in the template frame according to the extrinsic parameters of the video frame;
(D) comparing whether the difference between the extrinsic parameters of the video frame and the extrinsic parameters of the key frame meets a specific threshold; and
(E) when the difference does not meet the threshold, replacing the template frame with the template frame used last time.
7. The tracking method of a three-dimensional object according to claim 6, wherein the step (A) includes:
(F) judging whether a point in the current image data is a characteristic point according to a characteristic point judgment threshold;
(G) extracting feature points according to the judgment result;
(H) comparing the feature points extracted in step (G) with the feature points in reference data to judge whether they meet the requirements the reference data places on feature points; and
(I) if so, taking the feature points extracted in step (G) as the finally extracted feature points; if not, changing the feature point judgment threshold in step (F) and executing step (F) again.
8. The tracking method of a three-dimensional object according to claim 7, wherein the step (a) includes:
(J) judging whether the change in the number of extracted feature points is within a feature point number change threshold range;
(K) when the change in the number of extracted feature points is not within the threshold range, homogenizing the video frame; and
(L) extracting the feature points of the homogenized video frame.
CN201711183555.4A 2017-11-23 2017-11-23 Tracking system and tracking method for three-dimensional object Active CN108010055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711183555.4A CN108010055B (en) 2017-11-23 2017-11-23 Tracking system and tracking method for three-dimensional object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711183555.4A CN108010055B (en) 2017-11-23 2017-11-23 Tracking system and tracking method for three-dimensional object

Publications (2)

Publication Number Publication Date
CN108010055A CN108010055A (en) 2018-05-08
CN108010055B true CN108010055B (en) 2022-07-12

Family

ID=62053343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711183555.4A Active CN108010055B (en) 2017-11-23 2017-11-23 Tracking system and tracking method for three-dimensional object

Country Status (1)

Country Link
CN (1) CN108010055B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2017769A2 (en) * 2007-07-19 2009-01-21 Honeywell International Inc. Multi-pose face tracking using multiple appearance models
CN106342332B (en) * 2008-07-04 2012-10-03 中国航空工业集团公司洛阳电光设备研究所 Target following keeping method when switch visual field under airborne moving condition
CN103198488A (en) * 2013-04-16 2013-07-10 北京天睿空间科技有限公司 PTZ surveillance camera realtime posture rapid estimation method
CN103839277A (en) * 2014-02-21 2014-06-04 北京理工大学 Mobile augmented reality registration method of outdoor wide-range natural scene
CN104145294A (en) * 2012-03-02 2014-11-12 高通股份有限公司 Scene structure-based self-pose estimation
CN105578034A (en) * 2015-12-10 2016-05-11 深圳市道通智能航空技术有限公司 Control method, control device and system for carrying out tracking shooting for object
CN107122770A (en) * 2017-06-13 2017-09-01 驭势(上海)汽车科技有限公司 Many mesh camera systems, intelligent driving system, automobile, method and storage medium
CN107122782A (en) * 2017-03-16 2017-09-01 成都通甲优博科技有限责任公司 A kind of half intensive solid matching method in a balanced way
CN107248169A (en) * 2016-03-29 2017-10-13 中兴通讯股份有限公司 Image position method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001155164A (en) * 1999-11-26 2001-06-08 Ntt Communications Kk Device for tracing mobile object
US7536030B2 (en) * 2005-11-30 2009-05-19 Microsoft Corporation Real-time Bayesian 3D pose tracking
CN101739686B (en) * 2009-02-11 2012-05-30 北京智安邦科技有限公司 Moving object tracking method and system thereof
CN102073864B (en) * 2010-12-01 2015-04-22 北京邮电大学 Football item detecting system with four-layer structure in sports video and realization method thereof
RU2013106357A (en) * 2013-02-13 2014-08-20 ЭлЭсАй Корпорейшн THREE-DIMENSIONAL TRACKING OF AREA OF INTEREST, BASED ON COMPARISON OF KEY FRAMES
US9373174B2 (en) * 2014-10-21 2016-06-21 The United States Of America As Represented By The Secretary Of The Air Force Cloud based video detection and tracking system
CN104778697B (en) * 2015-04-13 2017-07-28 清华大学 Based on Quick positioning map as yardstick and the three-dimensional tracking and system in region

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2017769A2 (en) * 2007-07-19 2009-01-21 Honeywell International Inc. Multi-pose face tracking using multiple appearance models
CN106342332B (en) * 2008-07-04 2012-10-03 中国航空工业集团公司洛阳电光设备研究所 Target following keeping method when switch visual field under airborne moving condition
CN104145294A (en) * 2012-03-02 2014-11-12 高通股份有限公司 Scene structure-based self-pose estimation
CN103198488A (en) * 2013-04-16 2013-07-10 北京天睿空间科技有限公司 PTZ surveillance camera realtime posture rapid estimation method
CN103839277A (en) * 2014-02-21 2014-06-04 北京理工大学 Mobile augmented reality registration method of outdoor wide-range natural scene
CN105578034A (en) * 2015-12-10 2016-05-11 深圳市道通智能航空技术有限公司 Control method, control device and system for carrying out tracking shooting for object
CN107248169A (en) * 2016-03-29 2017-10-13 中兴通讯股份有限公司 Image position method and device
CN107122782A (en) * 2017-03-16 2017-09-01 成都通甲优博科技有限责任公司 A kind of half intensive solid matching method in a balanced way
CN107122770A (en) * 2017-06-13 2017-09-01 驭势(上海)汽车科技有限公司 Many mesh camera systems, intelligent driving system, automobile, method and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
3D Pose Tracking With Multitemplate Warping and SIFT Correspondences; Shu Chen et al.; IEEE Transactions on Circuits and Systems for Video Technology; Nov. 2016; Vol. 26, No. 11; pp. 2043-2055 *
Design and analysis of a calibration method for stereo-optical motion tracking in MRI using a virtual calibration phantom; Martin Hoßbach et al.; Medical Imaging 2013: Physics of Medical Imaging; 2013; Vol. 8668 *
Wide-baseline image matching algorithm based on singular value decomposition; Yue Sicong et al.; Computer Science; Mar. 2009; Vol. 36, No. 3; pp. 223-225, 265 *
A monocular visual simultaneous localization and mapping method based on partial inertial sensor information; Gu Zhaopeng, Dong Qiulei; Journal of Computer-Aided Design & Computer Graphics; Feb. 2012; Vol. 24, No. 2; pp. 155-160 *

Also Published As

Publication number Publication date
CN108010055A (en) 2018-05-08

Similar Documents

Publication Publication Date Title
CN105701820B (en) A kind of point cloud registration method based on matching area
Revaud et al. Epicflow: Edge-preserving interpolation of correspondences for optical flow
CN104573614B (en) Apparatus and method for tracking human face
JP6295645B2 (en) Object detection method and object detection apparatus
EP2707834B1 (en) Silhouette-based pose estimation
CN110378997B (en) ORB-SLAM 2-based dynamic scene mapping and positioning method
CN108090435B (en) Parking available area identification method, system and medium
US11037325B2 (en) Information processing apparatus and method of controlling the same
CN109472820B (en) Monocular RGB-D camera real-time face reconstruction method and device
CN110310320A (en) A kind of binocular vision matching cost optimizing polymerization method
GB2520338A (en) Automatic scene parsing
CN103106659A (en) Open area target detection and tracking method based on binocular vision sparse point matching
KR20130073812A (en) Device and method for object pose estimation
CN111998862B (en) BNN-based dense binocular SLAM method
CN111696133B (en) Real-time target tracking method and system
Zhu et al. Handling occlusions in video‐based augmented reality using depth information
WO2018129794A1 (en) Method and system for real-time three-dimensional scan modeling for large-scale scene
CN113744315B (en) Semi-direct vision odometer based on binocular vision
CN111046856A (en) Parallel pose tracking and map creating method based on dynamic and static feature extraction
JP2018113021A (en) Information processing apparatus and method for controlling the same, and program
CN111784775A (en) Identification-assisted visual inertia augmented reality registration method
CN105740751A (en) Object detection and identification method and system
KR100574227B1 (en) Apparatus and method for separating object motion from camera motion
Wang et al. Hand posture recognition from disparity cost map
CN108010055B (en) Tracking system and tracking method for three-dimensional object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 202177 room 493-61, building 3, No. 2111, Beiyan highway, Chongming District, Shanghai

Applicant after: TAPUYIHAI (SHANGHAI) INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 201802 room 412, building 5, No. 1082, Huyi Road, Jiading District, Shanghai

Applicant before: TAPUYIHAI (SHANGHAI) INTELLIGENT TECHNOLOGY Co.,Ltd.

GR01 Patent grant