CN101493889B - Method and apparatus for tracking video object - Google Patents


Info

Publication number
CN101493889B
Authority
CN
China
Prior art keywords
contour
video object
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008100005828A
Other languages
Chinese (zh)
Other versions
CN101493889A (en)
Inventor
赵光耀
于纪征
孔晓东
曾贵华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huawei Technologies Co Ltd
Priority to CN2008100005828A
Publication of CN101493889A
Application granted
Publication of CN101493889B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method and an apparatus for tracking a video object, relating to the technical field of image processing and designed to achieve accurate tracking of the video object. The method comprises the steps of: extracting feature points of the contour of the video object in the current frame image; finding matched feature points in the next frame image; detecting at least one candidate contour of the video object in the next frame image; calculating the contour feature value of the video object in the current frame image; calculating the contour feature value of each candidate contour; and comparing the contour feature value of the candidate contour with that of the video object in the current frame image, and taking the candidate contour as the contour of the video object in the next frame image if the two contour feature values match. The method and the apparatus improve the accuracy of tracking the video object.

Description

Method and device for tracking video object
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for tracking a video object.
Background
Computer vision uses cameras, computers and other equipment in place of human eyes to identify, track and measure a target. Real-time tracking of video objects is an important subject in the field of computer vision, and is the basis of work such as video analysis, video understanding, video object recognition and video object behavior analysis.
Currently, there are many methods for video object tracking. The method of tracking video objects may be classified into a method of tracking video objects based on detection and a method of tracking video objects based on recognition according to whether pattern matching is required between frames of an image.
A detection-based method directly extracts the contour of the video object in each frame of the image according to some feature of the video object, without passing object motion state parameters or matching contours between image frames. Detection-based methods include frame-difference detection and the like. A recognition-based method usually first extracts some feature of the video object and then searches each frame of the image for the region that best matches this feature; the best-matching region is the video object.
Of these two approaches, the detection-based method has a simple algorithm and is easy to implement, but its tracking effect is not ideal. The main research direction of current video object tracking technology has therefore moved to recognition-based methods.
Among recognition-based methods, the Condensation (conditional density propagation) tracking algorithm, that is, the conditional probability density propagation algorithm, is one of the most widely used contour tracking methods.
The Condensation tracking algorithm is a particle-filter-based tracking algorithm. Particle filtering, also known as Sequential Monte Carlo (SMC), implements Bayesian recursive filtering by the Monte Carlo method. It uses a set of weighted samples to represent the posterior probability density $p(x_k \mid z_{1:k})$ of the system state vector; when the number of samples is large enough, this estimate is equivalent to the posterior probability density function.
In the Condensation tracking algorithm, the video object is represented by an active contour model together with a shape space: the contour curve of the video object is represented by B-Snake control points, and possible changes of the contour curve, such as translation and rotation, are represented in the shape space.
The motion state parameter T of the video object contour can be expressed as $T = (TX, TY, \theta, SX, SY)$, where TX and TY are the coordinates of the center point of the video object in the x and y directions respectively, $\theta$ is the angle by which the contour of the video object is rotated, and SX and SY are the scales of the video object in the x and y directions respectively. The shape space parameter S of the video object in the shape space is represented as $S = (TX, TY, SX\cos\theta - 1, SY\cos\theta - 1, -SY\sin\theta, SX\sin\theta)$.
The process of video object tracking using the Condensation tracking algorithm is as follows.
1) Obtain the initial motion state $T_0$ of the video object from the initial frame image and initialize $N_s$ particles, each with initial weight $w_0^i = 1/N_s$; the motion state and shape space parameters of each particle are $T^i$ and $S^i$ respectively $(i = 1, 2, \ldots, N_s)$. In the k-th frame, a state transition is performed on the state of each particle. The state transition equations are shown in equation (11):

$TX_k^i = TX_{k-1}^i + B_1 \times \xi_{1,k}^i$

$TY_k^i = TY_{k-1}^i + B_2 \times \xi_{2,k}^i$

$\theta_k^i = \theta_{k-1}^i + B_3 \times \xi_{3,k}^i$    (11)

$SX_k^i = SX_{k-1}^i + B_4 \times \xi_{4,k}^i$

$SY_k^i = SY_{k-1}^i + B_5 \times \xi_{5,k}^i$

where $B_1, B_2, B_3, B_4, B_5$ are constants and $\xi$ is a random number in $[-1, 1]$.
2) Each candidate particle is evaluated by using the observed value (motion state parameter T, shape space parameter S, etc.) of the current frame image, and the weight value of each particle is calculated.
The specific process is as follows:
21) For particle $N_i$, calculate the motion state parameter $T^i$ and the shape space parameter $S^i$ according to the method of formula (1).

22) From the motion state parameter $T^i$ and the shape space parameter $S^i$, obtain the B-Snake control points of particle $N_i$ and fit the contour curve of the video object from these control points.

23) Sample N points on the contour curve of the video object, and for each sampling point find the pixel with the maximum gradient along its normal direction.

24) Find the distance $DIS_i(n)$ $(n = 1, 2, \ldots, N)$ between each sampling point on the contour curve and the maximum-gradient pixel on its normal; use these distances as the measurement to obtain the observation probability density of particle $N_i$ and to update its weight $w_k^i$.

The motion state parameters T of the particles, weighted by their weights $w_k^i$ and summed, give the expected motion state parameter, from which the expected shape space parameter $S_k$, the B-Snake control points and the contour curve of the video object are obtained. This completes the tracking of the video object contour for the current frame.
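For illustration, a minimal sketch of one such Condensation iteration is given below (Python; the resampling step and the Gaussian form of the weighting are assumptions made for illustration, not the exact formulas of the algorithm):

```python
import numpy as np

def condensation_step(particles, weights, B, measure_distances, sigma=5.0):
    """One Condensation iteration over particle states (TX, TY, theta, SX, SY).

    particles: (Ns, 5) array of motion-state parameters T^i
    weights:   (Ns,) array of particle weights
    B:         (5,) array of the constants B1..B5 in equation (11)
    measure_distances: callable mapping a state T^i to the distances DIS_i(n)
                       between contour sample points and the maximum-gradient
                       pixels found along their normals
    """
    Ns = len(particles)
    # Resample particles in proportion to their previous weights.
    idx = np.random.choice(Ns, size=Ns, p=weights / weights.sum())
    particles = particles[idx]

    # State transition of equation (11): add bounded random perturbations.
    xi = np.random.uniform(-1.0, 1.0, size=particles.shape)
    particles = particles + B * xi

    # Evaluate each candidate particle against the current frame observation.
    new_weights = np.empty(Ns)
    for i, state in enumerate(particles):
        dis = measure_distances(state)  # DIS_i(n), n = 1..N
        # Gaussian weighting from the distances (an assumed form; the patent's
        # exact weight formula is only given as an image and is not reproduced).
        new_weights[i] = np.exp(-np.sum(dis ** 2) / (2 * sigma ** 2))
    new_weights /= new_weights.sum()

    # Expected motion state = weighted sum of particle states.
    expected_state = (new_weights[:, None] * particles).sum(axis=0)
    return particles, new_weights, expected_state
```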
In the process of implementing the invention, the inventor finds that the prior art has the following problems:
the Condensation tracking algorithm can realize real-time tracking of the video object contour with affine change (such as rotation, translation, scaling and the like). For example, when the video object is a rigid body, the rigid body is not separated from its components during the motion process, so that the rigid body can be accurately tracked by the convergence tracking algorithm. However, for video objects with non-affine changes, such as the situation that the arm is bent during the walking process of the human body, the Condensation tracking algorithm cannot accurately track the video objects. In addition, the computational complexity of the convergence tracking algorithm is complex, so the convergence tracking algorithm tracks the video object, and the tracking speed is low.
Disclosure of Invention
In order to solve the problems of low tracking speed and poor accuracy of a video object in the prior art, the embodiment of the invention provides a method and a device for tracking the video object.
In one aspect, an embodiment of the present invention provides a method for tracking a video object, where the method includes the following steps:
extracting feature points of the video object contour in the current frame image;
finding matched feature points matched with the feature points in the next frame of image;
detecting at least one candidate contour of the video object in a next frame of image according to the matched feature points;
calculating a contour characteristic value of a video object in the current frame image;
calculating contour characteristic values of the candidate contours;
and comparing the contour feature value of the candidate contour with the contour feature value of the video object in the current frame image, and if the two contour feature values match, determining that the candidate contour is the contour of the video object in the next frame image.
According to the method provided by the embodiment of the invention, the matching feature point of the video object of the current frame is first determined in the next frame image, and the candidate contours of the video object in the next frame image are then detected according to the matching feature point. The contour feature values of the video object in the two successive frame images are then matched; if they match, the candidate contour is the contour of the video object in the next frame image. Even when the video object undergoes non-affine changes, the contour feature value of the video object in the next frame image can still be extracted and matched against the contour feature value of the video object in the current frame image to find the best match, so that the contour of the video object in the next frame image can be accurately described. The method of the embodiment of the invention thus overcomes the defect of the prior art that video objects with non-affine changes cannot be accurately tracked. In addition, the method reduces the amount of computation in tracking the video object and improves the tracking speed.
Therefore, the method for tracking the video object in the embodiment of the invention not only can accurately track the video object with affine change, but also can track the video object with non-affine change, thereby improving the accuracy of tracking the video object.
In another aspect, an embodiment of the present invention provides an apparatus for tracking a video object, the apparatus including:
the first positioning unit is used for acquiring the characteristic points of the video object outline in the current frame image;
the second positioning unit is used for finding matched feature points matched with the feature points in the next frame of image;
the contour detection unit is used for detecting at least one candidate contour of the video object in the next frame of image according to the matched feature points, and comprises: the region prediction module is used for obtaining the appearance region of the video object contour in the next frame image through linear transformation by taking the matched feature point as a center; a contour selection module for detecting at least one candidate contour of the video object within the occurrence region;
the first calculating unit is used for calculating the contour characteristic value of the video object in the current frame image;
a second calculation unit, configured to calculate a contour feature value of the candidate contour;
and the contour matching unit is used for comparing the contour feature value of the candidate contour with the contour feature value of the video object in the current frame image; if the two contour feature values match, the candidate contour is the contour of the video object in the next frame image.
With the apparatus provided by the embodiment of the invention, the contour detection unit determines the candidate contours of the video object in the next frame image, the first and second calculating units respectively calculate the contour feature values of the video object in the two successive frame images, and the contour matching unit matches the two contour feature values. Even when the video object undergoes non-affine changes, the contour feature value of the video object in the next frame image can still be extracted and matched against the contour feature value of the video object in the current frame image to find the best match, so that the contour of the video object in the next frame image can be accurately described. The apparatus of the embodiment of the invention thus overcomes the defect of the prior art that video objects with non-affine changes cannot be accurately tracked. In addition, the apparatus reduces the amount of computation in tracking the video object and improves the tracking speed.
Therefore, the device for tracking the video object in the embodiment of the invention not only can accurately track the video object with affine change, but also can track the video object with non-affine change, thereby improving the accuracy of tracking the video object.
Drawings
FIG. 1 is a flow diagram of a method of tracking a video object according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a method of tracking a video object according to an embodiment of the invention;
FIG. 3 is a diagram of a first embodiment of a method for tracking video objects according to an embodiment of the present invention;
FIG. 4 is a diagram showing the result of a Haar wavelet transform in a method of tracking a video object according to an embodiment of the present invention;
FIG. 5 is a diagram of wavelet contour descriptors at different resolutions in a method for tracking a video object according to an embodiment of the present invention;
FIG. 6 is a graph of experimental results of a method of tracking a video object using an embodiment of the present invention;
FIG. 7 is a graph of yet another experimental result of a method of tracking a video object using an embodiment of the present invention;
FIG. 8 is a schematic diagram of an apparatus for tracking video objects according to an embodiment of the present invention;
fig. 9 is a schematic diagram of an apparatus for tracking video objects according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings according to these drawings without inventive exercise.
In order to accurately track a video object with affine change and non-affine change, the method for tracking the video object in the embodiment of the invention comprises the steps of firstly obtaining a contour characteristic value of the video object in a current frame image; then, obtaining the characteristic points of the video object in the current frame image by using a mean shift method, and obtaining the matching characteristic points of the video object in the next frame image; then according to the matching feature points of the video object in the next frame image, obtaining the candidate contour of the video object in the appearance area, and solving the contour feature value of the video object in the next frame image; and finally, matching the contour characteristic value of the video object in the current frame image with the contour characteristic value of the video object in the next frame image to obtain the contour of the video object in the next frame image.
In order to make the technical advantages of the embodiments of the present invention clearer, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a method for tracking a video object according to an embodiment of the present invention includes the following steps:
S1: extracting feature points of the video object contour in the current frame image;
S2: finding matched feature points matching the feature points in the next frame image;
S3: detecting at least one candidate contour of the video object in the next frame image according to the matched feature points;
S4: calculating the contour feature value of the video object in the current frame image;
S5: calculating the contour feature value of each candidate contour;
S6: comparing the contour feature value of the candidate contour with the contour feature value of the video object in the current frame image; if the two match, the candidate contour is the contour of the video object in the next frame image.
According to the method provided by the embodiment of the invention, firstly, the matching feature points of the video object in the current frame in the next frame image are determined, then the appearance area of the video object is predicted according to the matching feature points, and the candidate outline of the video object is detected in the predicted appearance area. And then matching the contour characteristic values of the video object in the front frame image and the rear frame image, wherein if the contour characteristic values of the video object in the front frame image and the rear frame image are matched, the candidate contour is the contour of the video object in the next frame image.
When the video object is subjected to non-affine change, the contour characteristic value of the video object in the next frame image can be extracted, and the contour characteristic value which is most matched with the contour characteristic value of the video object in the current frame image is obtained by matching the contour characteristic value of the video object in the current frame image, so that the contour of the video object in the next frame image can be accurately described. The method of the embodiment of the invention avoids the defect that the video object with non-affine change can not be accurately tracked in the prior art. Therefore, the method for tracking the video object in the embodiment of the invention not only can accurately track the video object with affine change, but also can track the video object with non-affine change, thereby improving the accuracy of tracking the video object. In addition, because the algorithm of the embodiment of the invention is simple, compared with the prior art, the method of the embodiment of the invention can improve the speed of tracking the video object.
As shown in fig. 2, the step S3 of detecting at least one candidate contour of the video object in the next frame of image according to the matching feature points includes:
s31: predicting the appearance area of the video object outline in the next frame image according to the matched feature points;
s32: at least one candidate contour of the video object is detected within the predicted region of occurrence.
Because the appearance area of the video object in the next frame of image is predicted at first, the operation amount of contour matching of the video object is reduced, and the efficiency of tracking the video object is improved.
In step S1, the feature point of the video object contour in the current frame image may be a central point of the video object in the current frame image; accordingly, the matching feature point in step S2 is the matching center point of the video object in the next frame image.
In addition, there are many ways to describe the contour of a video object, such as invariant moments, eccentricity, aspect ratio of the video object, form factor and wavelet contour descriptors. The wavelet contour descriptor has a clear physical meaning, good retrieval performance, and invariance to rotation and scaling, and can describe the contour feature value of the video object accurately. Therefore, the embodiment of the present invention adopts the wavelet contour descriptor as the contour feature value describing the contour of the video object.
The following describes a specific implementation process of the method for tracking a video object according to the embodiment of the present invention in detail with reference to fig. 3.
T1: and in the current frame image, carrying out contour detection on the video object to obtain contour points of the video object.
The various calculations performed in the current frame image are based on the previous frame of the current frame image. The calculations performed in the next frame of image are based on the current frame. Therefore, the principles of various calculations in the current frame and the next frame image are the same, and only the reference standard is different.
The method for detecting the contour points is as follows: in the current frame image, check all points of the connected bitmap $V_k$ within the range defined by the object index $M_k^j$; if the gray value of any of the four neighbors (upper, lower, left or right) of a point is 0, the point is marked as a contour point.
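For illustration, a minimal sketch of this four-neighbour contour-point test on a binary object mask (the array layout and the 0 = background convention are assumptions):

```python
import numpy as np

def detect_contour_points(mask):
    """Mark object pixels that have at least one background (gray value 0)
    pixel among their upper, lower, left or right neighbours."""
    h, w = mask.shape
    contour = np.zeros_like(mask, dtype=bool)
    for y in range(h):
        for x in range(w):
            if mask[y, x] == 0:
                continue  # background pixels cannot be contour points
            neighbours = [
                mask[y - 1, x] if y > 0 else 0,
                mask[y + 1, x] if y < h - 1 else 0,
                mask[y, x - 1] if x > 0 else 0,
                mask[y, x + 1] if x < w - 1 else 0,
            ]
            if min(neighbours) == 0:
                contour[y, x] = True
    return contour
```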
T2: and obtaining the contour vector of the video object from the contour points.
Assuming that the video object has $N_p$ contour points in the current frame image, the contour vector is defined as $P_k^j = (P_0, P_1, \ldots, P_{N_p-1})$.

After all contour points are found, they are sorted. The sorting method is as follows: starting from the upper edge of the region defined by the object index $M_k^j$, the first contour point found by horizontal search is the first contour point $P_0$. Then, with the first contour point $P_0$ as the center, the contour point found counterclockwise using a 3 × 3 search template is the second contour point $P_1$. Then, with the second contour point $P_1$ as the center, the search template is used to find the third contour point $P_2$ counterclockwise. By analogy, the last contour point found is $P_{N_p-1}$. Searching again with $P_{N_p-1}$ as the center using the 3 × 3 search template, the first contour point found should be $P_0$. This search method ignores the inner contour points of the video object; the output contour vector contains only the outer contour points of the video object.

Sorting the contour points in this way gives the contour vector $P_k^j = (P_0, \ldots, P_{N_p-1})$, where $P_n = (Px_n, Py_n)$, $n = 0, \ldots, N_p-1$.

After the contour vector is obtained, the centroid coordinates $(TX_k^j, TY_k^j)$ of the contour are calculated according to formulas (1) and (2):

$TX_k^j = \dfrac{1}{N_p} \displaystyle\sum_{n=0}^{N_p-1} x_n$    (1)

$TY_k^j = \dfrac{1}{N_p} \displaystyle\sum_{n=0}^{N_p-1} y_n$    (2)

where $(x_n, y_n)$ are the coordinates of each contour point, $n = 0, 1, \ldots, N_p-1$.
T3: calculating a normalized track vector with constant translation, rotation and scaling according to the contour vector
Figure GDA0000057822220000091
The calculation formulas are shown in the following (3), (4) and (5):
r n = ( x n - TX k j ) 2 + ( y n - TY k j ) 2 - - - ( 3 )
rmax=Max(r0,r1,...rN-1) (4)
Un=rn/rmax; (5)
wherein r isnFor the distance of each contour point to the centroid, rmaxN is the maximum of the distances of each contour point to the center of mass, N being 0, 1p-1。
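For illustration, a minimal sketch of formulas (1)–(5), computing the centroid and the normalized track vector from an ordered list of contour points (function and variable names are illustrative):

```python
import numpy as np

def normalized_track_vector(contour_points):
    """contour_points: (Np, 2) array of ordered contour coordinates (x_n, y_n).
    Returns the centroid (TX, TY) and the normalized track vector U."""
    pts = np.asarray(contour_points, dtype=float)
    tx, ty = pts.mean(axis=0)                        # formulas (1) and (2)
    r = np.hypot(pts[:, 0] - tx, pts[:, 1] - ty)     # formula (3)
    u = r / r.max()                                  # formulas (4) and (5)
    return (tx, ty), u
```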
T4: the obtained normalized wheel track vector
Figure GDA0000057822220000094
Reordering to obtain directional track vector
The method of reordering the normalized track vectors is as follows:
from said normalized track vector
Figure GDA0000057822220000096
Is/are as follows
Figure GDA0000057822220000097
In (3), all the maximum and minimum values are found. Assuming that J maxima and K minima are found, J x K "maximum-minimum value pairs" can be formed between these maxima and minima. And finding a pair of maximum-minimum value pairs with maximum subscript interval between the maximum value and the minimum value from the J × K maximum-minimum value pairs. Since the first term and the last term in the normalized track distance vector are adjacent on the outline of the video object, the interval between any two vectors can be kept at NpWithin/2. Therefore, if the distance d between two maximum and minimum values is larger than NpAnd/2, making d equal to Np/2。
If there is only one "max-min pair", then the minimum value of the "max-min pair" is the most appropriate for the directional track vectorThe first term q in0And such that the maximum is in the first N of the directional track vectorpWithin/2, sorting the normalized track distance vector according to the direction from the minimum value to the maximum value to obtain an oriented track distance vector
If there are multiple "max-min value pairs," then a comparison of the neighbors of the minimum or maximum values is used to determine which "max-min value pair" to select as a basis for calculating the directional track vector. For example, in a first pair of "max-min value pairs", the neighbors of the maxima are greater than the neighbors of the maxima in a second pair of "max-min value pairs", then the first pair of "max-min value pairs" will be the basis for calculating the directional track vector. If the maximum values of the adjacent terms are equal, the outline of the video object is symmetrical.
T5: vector the directional track
Figure GDA00000578222200000910
Length normalization is performed to form a length normalized directional track vector having a fixed length (e.g., length M1024)
Figure GDA0000057822220000101
The calculation formula is as follows:
a = [ i M N p ] ; - - - ( 6 )
b=[a+1]; (7)
c = i M N p - a ; - - - ( 8 )
Li=(1-c)×qa+c×qb,(i=0,1,......M-1);(9)
wherein a and b are integers, and c is a floating point number.
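For illustration, a minimal sketch of the length normalization of formulas (6)–(9), resampling the directional track vector q to a fixed length M by linear interpolation (the clamping of index b at the last element is an added assumption for the boundary case):

```python
import numpy as np

def length_normalize(q, M=1024):
    """Resample the directional track vector q (length Np) to fixed length M."""
    q = np.asarray(q, dtype=float)
    Np = len(q)
    L = np.empty(M)
    for i in range(M):
        pos = i * Np / M               # (i / M) * Np
        a = int(pos)                   # formula (6)
        b = min(a + 1, Np - 1)         # formula (7), clamped at the last index
        c = pos - a                    # formula (8)
        L[i] = (1 - c) * q[a] + c * q[b]   # formula (9)
    return L
```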
T6: normalizing the directional track vector by said length
Figure GDA0000057822220000104
Calculating to obtain a wavelet contour descriptor B of the video object in the current frame imagek={b0,b1,...bN-1}。
Normalizing said length oriented track vector
Figure GDA0000057822220000105
Harr wavelet transform is carried out to obtain Harr wavelet transform result
Figure GDA0000057822220000106
The specific Harr wavelet transform is implemented as follows:
a one-dimensional array L with a length m is provided, and m is a power of 2, the Harr wavelet transform for the array can be implemented by the following pseudo-code method:
Figure GDA0000057822220000107
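A minimal sketch of a standard 1-D Haar wavelet transform for a power-of-two-length array, consistent with the description above (an illustrative routine with an averaging/differencing normalization chosen for simplicity, not the patent's own pseudo-code):

```python
import numpy as np

def haar_transform(data):
    """Full Haar wavelet decomposition of a 1-D array whose length is a power of 2.
    Each pass replaces the current low-pass band with pairwise averages and
    stores pairwise differences as detail coefficients."""
    w = np.asarray(data, dtype=float).copy()
    m = len(w)
    assert m & (m - 1) == 0, "length must be a power of 2"
    length = m
    while length > 1:
        half = length // 2
        tmp = w[:length].copy()
        w[:half] = (tmp[0::2] + tmp[1::2]) / 2.0        # approximation (averages)
        w[half:length] = (tmp[0::2] - tmp[1::2]) / 2.0  # detail (differences)
        length = half
    return w

# The wavelet contour descriptor of resolution N is then simply the first N
# coefficients of the transform result, as in equation (10):
# descriptor = haar_transform(L)[:N]
```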
By the method described above, the length-normalized directional track vector $L_k^j = \{L_0, L_1, \ldots, L_{M-1}\}$ is converted into the Haar wavelet transform result $W = \{w_0, w_1, \ldots, w_{M-1}\}$; both are equal in length. A schematic diagram of the result of the Haar wavelet transform is shown in Fig. 4.

Depending on the image resolution N, the wavelet contour descriptor $B^n = \{b_0^n, b_1^n, \ldots, b_{N-1}^n\}$ can be obtained from equation (10):

$B^n = \{b_0^n, b_1^n, \ldots, b_{N-1}^n\} = \{w_0^n, w_1^n, \ldots, w_{N-1}^n\}$    (10)

As can be seen from equation (10), the wavelet contour descriptor $B^n$ is obtained by truncating the Haar wavelet transform result W to its first N coefficients.
Figure 5 shows the wavelet contour descriptors computed at resolutions 256, 64 and 16, respectively. In practical applications, to reduce the computation required for comparing video object contours, the resolution can be taken as 16.
After the wavelet contour descriptor $B^n = \{b_0^n, b_1^n, \ldots, b_{N-1}^n\}$ of the video object in the current frame image has been obtained, the occurrence region of the video object in the next frame image must first be determined and the candidate contours of the video object within that region detected; then the wavelet contour descriptor $B^{n+1} = \{b_0^{n+1}, b_1^{n+1}, \ldots, b_{N-1}^{n+1}\}$ of each candidate contour of the video object in the next frame image is calculated.
The following describes the above calculation process in detail.
T7: and obtaining the appearance area of the video object in the next frame of image.
T71: and obtaining the central point of the video object in the current frame image.
T72: and calculating the matching center point of the video object in the next frame image according to the center point of the obtained video object in the current frame image.
In the embodiment of the invention, the matching central point of the video object in the next frame of image is calculated by adopting a mean shift method. Then, when calculating the center point of the video object in the current frame image, the center point is calculated by using the previous frame image of the current frame image as a reference, and the calculation principle is the same as the calculation process described below.
Suppose that $\{x_i^*\}_{i=1,\ldots,n}$ denotes the normalized pixel positions of the video object model, with center point O; the color gray values of the video object are further quantized into m levels, and b(x) is the mapping from the pixel at position x to its color index. The probability of color u occurring is defined as:

$\bar{q}_u = \alpha \displaystyle\sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)\,\delta\left[b(x_i^*) - u\right]$    (11)

where k(x) is a kernel function that gives a smaller weight to pixels farther from the center point, and $\alpha$ is a constant whose expression is:

$\alpha = \dfrac{1}{\displaystyle\sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)}$    (12)

The video object model is then represented as:

$\bar{q} = \{\bar{q}_u\}_{u=1,\ldots,m}, \quad \displaystyle\sum_{u=1}^{m} \bar{q}_u = 1$    (13)

Suppose that $\{x_i\}_{i=1,\ldots,n_h}$ are the pixel positions of the candidate video object in the current frame, with center point C; applying the same kernel function k(x) within the radius h of the center point, the probability of color u occurring in the candidate video object can be expressed as:

$\bar{p}_u(C) = \alpha_h \displaystyle\sum_{i=1}^{n_h} k\left(\left\|\dfrac{C - x_i}{h}\right\|^2\right)\delta\left[b(x_i) - u\right]$    (14)

where $\alpha_h$ is a constant whose expression is:

$\alpha_h = \dfrac{1}{\displaystyle\sum_{i=1}^{n_h} k\left(\left\|\dfrac{C - x_i}{h}\right\|^2\right)}$    (15)

The candidate video object model is then represented as:

$\bar{p}(C) = \{\bar{p}_u(C)\}_{u=1,\ldots,m}, \quad \displaystyle\sum_{u=1}^{m} \bar{p}_u = 1$    (16)

From the video object model and the candidate video object model defined above, the distance d(C) between them can be calculated:

$d(C) = \sqrt{1 - \rho\left[\bar{p}(C), \bar{q}\right]}$    (17)

where $\rho\left[\bar{p}(C), \bar{q}\right] = \displaystyle\sum_{u=1}^{m} \sqrt{\bar{p}_u(C)\,\bar{q}_u}$.
from the above analysis, it can be seen that the best candidate video object of the video objects in the current image frame is the candidate video object closest to the video object model, i.e. the candidate region that minimizes the distance d (c). Therefore, the minimum value of d (C) is obtained, and the matching central point of the video object in the next frame image can be determined.
The minimum of d(C) can be found with the iterative formula (18):

$\bar{C}_1 = \dfrac{\displaystyle\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\dfrac{\bar{C}_0 - x_i}{h}\right\|^2\right)}{\displaystyle\sum_{i=1}^{n_h} w_i\, g\left(\left\|\dfrac{\bar{C}_0 - x_i}{h}\right\|^2\right)}$    (18)

where $\bar{C}_0$ is the current center point of the video object, $\bar{C}_1$ is the matching center point of the video object in the next frame image, and the weight $w_i$ is given by:

$w_i = \displaystyle\sum_{u=1}^{m} \sqrt{\dfrac{\bar{q}_u}{\bar{p}_u(\bar{C}_0)}}\;\delta\left[b(x_i) - u\right]$    (19)

By applying the iterative formula (18) to each frame of the image, the candidate video object for which d(C) is minimal, together with its center point, is obtained as the best candidate for the video object; this gives the matching center point of the video object in the next frame image. Using the mean shift method increases the speed of determining the matched feature points and improves the efficiency of the whole tracking process. Of course, the mean shift method need not be used when calculating the matching feature points of the video object in the next frame image: after the feature point of the video object in the current frame image is determined, the region where the feature point is likely to appear in the next frame image can be determined first, and pixels in that region matched one by one until the best-matching feature point is obtained.
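For illustration, a minimal sketch of the mean shift center search of formulas (11)–(19) on a gray-level image (the rectangular window, the Epanechnikov-style kernel and the 16-bin quantization are assumptions; with this kernel the derivative g(·) in formula (18) is constant and cancels out):

```python
import numpy as np

def color_histogram(image, center, half_size, m=16):
    """Kernel-weighted color histogram (q_u or p_u(C)) around `center` (cy, cx)."""
    cy, cx = int(center[0]), int(center[1])
    h, w = half_size
    hist = np.zeros(m)
    for y in range(max(cy - h, 0), min(cy + h + 1, image.shape[0])):
        for x in range(max(cx - w, 0), min(cx + w + 1, image.shape[1])):
            r2 = ((y - cy) / h) ** 2 + ((x - cx) / w) ** 2
            if r2 > 1.0:
                continue
            k = 1.0 - r2                     # kernel: less weight far from center
            u = int(image[y, x]) * m // 256  # b(x): gray value -> color index
            hist[u] += k
    return hist / hist.sum()

def mean_shift_center(image, q_model, center, half_size, m=16, iters=20):
    """Iterate formula (18) to find the matching center point in the next frame."""
    cy, cx = int(center[0]), int(center[1])
    h, w = half_size
    for _ in range(iters):
        p = color_histogram(image, (cy, cx), half_size, m)
        num_y = num_x = den = 0.0
        for y in range(max(cy - h, 0), min(cy + h + 1, image.shape[0])):
            for x in range(max(cx - w, 0), min(cx + w + 1, image.shape[1])):
                r2 = ((y - cy) / h) ** 2 + ((x - cx) / w) ** 2
                if r2 > 1.0:
                    continue
                u = int(image[y, x]) * m // 256
                wi = np.sqrt(q_model[u] / p[u]) if p[u] > 0 else 0.0  # formula (19)
                num_y += y * wi
                num_x += x * wi
                den += wi
        if den == 0:
            break
        ny, nx = int(round(num_y / den)), int(round(num_x / den))     # formula (18)
        if (ny, nx) == (cy, cx):
            break
        cy, cx = ny, nx
    return cy, cx
```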
T73: after the matching central point of the video object in the next frame image is obtained, the occurrence area of the video object in the next frame image is predicted by using a linear method and taking the matching central point as the center. The linear method may include translation, rotation, and the like.
For example, let $(Left_k, Top_k, Right_k, Bottom_k)$ denote the bounding range of the video object contour obtained from the current frame image; the bounding range of the video object contour in the next frame image, i.e. the occurrence region, is denoted $(Left_{k+1}, Top_{k+1}, Right_{k+1}, Bottom_{k+1})$.

The prediction formula for the contour range of the video object in the next frame image is:

$Left_{k+1} = Left_k - w_k/2$

$Top_{k+1} = Top_k - h_k/2$

$Right_{k+1} = Right_k + w_k/2$    (20)

$Bottom_{k+1} = Bottom_k + h_k/2$

where $(CX_k, CY_k)$ are the coordinates of the center point of the video object in the current frame image, and $(CX_{k+1}, CY_{k+1})$ are the coordinates of the matching center point of the video object in the next frame image, obtained by the mean shift calculation above.

In formula (20),

$w_k = (Right_k - Left_k)\,\dfrac{speed_k - speed\_min}{speed\_max - speed\_min}$

$h_k = (Bottom_k - Top_k)\,\dfrac{speed_k - speed\_min}{speed\_max - speed\_min}$    (21)

where speed_min is the minimum moving speed of the video object (typically 0), speed_max is the maximum moving speed of the video object (typically 1), and $speed_k$ is the actual speed of the object in the previous frame, calculated as:

$speed_k = \dfrac{\sqrt{(CX_k - CX_{k-N})^2 + (CY_k - CY_{k-N})^2}}{N\,(Right_{k-N} - Left_{k-N})}$    (22)

where N is the number of frames (typically 10) between the two frames used to calculate the speed.
It should be noted that the method for obtaining the matching center point of the video object in the next frame image is not limited to the mean shift method mentioned in the present embodiment. Any method capable of obtaining the center point of the video object in the image can be applied to the embodiment of the present invention.
T8: After the matching center point of the video object in the next frame image is obtained, contour detection is performed on the video object in the next frame image within the occurrence region to obtain the candidate contours of the video object, and the wavelet contour descriptor $B^{n+1} = \{b_0^{n+1}, b_1^{n+1}, \ldots, b_{N-1}^{n+1}\}$ of each candidate contour in the next frame image is calculated.

In this step, the process of calculating the wavelet contour descriptor of the video object in the next frame image follows the same principle as steps T1–T6 described above and is not repeated here.
T9: after the wavelet contour descriptors of the video objects in the current image frame and the next image frame are obtained, contour matching is carried out on the video objects to obtain the contour of the video objects in the next image frame, and therefore tracking of the video objects is completed.
The method for performing contour matching on the video object comprises the following steps:
T91: Compare the wavelet contour descriptor $B^{n+1} = \{b_0^{n+1}, b_1^{n+1}, \ldots, b_{N-1}^{n+1}\}$ of each candidate contour in the occurrence region with the wavelet contour descriptor $B^n = \{b_0^n, b_1^n, \ldots, b_{N-1}^n\}$ in the current frame image, and calculate the similarity between the two. The similarity is calculated according to the following formula:
$Similarity = 1 - \dfrac{1}{N}\displaystyle\sum_{i=0}^{N-1} \left(b_i^{n+1} - b_i^n\right)^2$    (23)

If the similarity between the wavelet contour descriptors in the two successive frame images exceeds the similarity threshold, the candidate in $B^{n+1}$ with the highest similarity value is selected as the tracking result corresponding to $B^n$.

The similarity threshold may be freely defined; in this embodiment, to ensure the accuracy of tracking the video object, the similarity threshold is set to 80%.

If several candidate contours match $B^n$, the most similar one is selected as the tracking result for $B^n$.

If tracking of $B^n$ fails, the video object corresponding to $B^n$ is occluded or has disappeared; if no source tracking object is found for $B^{n+1}$, the video object corresponding to $B^{n+1}$ is a video object that newly appears in the next frame image.

Finally, the video object contour corresponding to the best-matching wavelet contour descriptor is matched with the contour of the video object in the current frame image; if their similarity exceeds the threshold, the video object in the current frame image is tracked successfully, otherwise the tracking fails.
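For illustration, a minimal sketch of the descriptor matching of formula (23) with the 80% similarity threshold (function and variable names are illustrative):

```python
import numpy as np

def descriptor_similarity(b_current, b_candidate):
    """Formula (23): similarity between two wavelet contour descriptors of length N."""
    b_current = np.asarray(b_current, dtype=float)
    b_candidate = np.asarray(b_candidate, dtype=float)
    return 1.0 - np.mean((b_candidate - b_current) ** 2)

def match_contour(b_current, candidate_descriptors, threshold=0.8):
    """Return the index of the best-matching candidate contour, or None when all
    candidates fall below the similarity threshold (tracking failure)."""
    best_idx, best_sim = None, threshold
    for idx, b_cand in enumerate(candidate_descriptors):
        sim = descriptor_similarity(b_current, b_cand)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```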
By using the method, even if the tracked video object is deformed in the motion process, the method provided by the embodiment of the invention can still accurately track the video object by using a mode of combining a mean shift method and contour matching.
The results of tracking the video object by using the method for tracking the video object according to the embodiment of the present invention are shown in fig. 6 and 7. As can be seen from the tracking result, the method for tracking a video object according to the embodiment of the present invention has a relatively ideal tracking effect, can track the contour of the video object relatively accurately, and even when the legs and arms of the pedestrian are bent or the like, the curve of the candidate contour and the real contour of the video object are relatively matched, for example, c) and d) in fig. 6.
The tracking process of the method for tracking the video object is stable, and the contour tracking can be stably carried out even if the movement speed of the object is changed greatly. As shown in fig. 7, when a car in the car video drives into the parking lot from fast to slow, the algorithm realizes stable tracking.
In addition, compared with the Condensation tracking algorithm, the video object tracking method provided by the embodiment of the invention has the advantages that the calculated amount is small, and the tracking speed is greatly improved. The tracking speed values when tracking is performed using the two tracking algorithms, respectively, are listed in table 1.
TABLE 1
It can be seen from the above experiments that the method for tracking a video object according to the embodiment of the present invention can not only accurately track a video object with affine change or non-affine change, but also improve the tracking speed of the video object compared with the prior art because the algorithm of the embodiment of the present invention is simple.
Corresponding to the method for tracking the video object in the embodiment of the invention, the embodiment of the invention also provides a device for tracking the video object.
As shown in fig. 8, an apparatus for tracking a video object according to an embodiment of the present invention includes:
a first positioning unit 801, configured to acquire feature points of the video object contour in the current frame image;
a second positioning unit 802, configured to find a matching feature point matching the feature point in a next frame of image;
a contour detection unit 803, configured to detect at least one candidate contour of the video object in a next frame image according to the matching feature points;
a first calculating unit 804, configured to calculate the contour feature value of the video object in the current frame image;
a second calculating unit 805, configured to calculate the contour feature value of the candidate contour;
a contour matching unit 806, configured to compare the contour feature value of the candidate contour with the contour feature value of the video object in the current frame image, and if the two contour feature values match, the candidate contour is the contour of the video object in the next frame image.
With the apparatus according to the embodiment of the present invention, the contour detection unit 803 first determines the candidate contours of the video object in the next frame image, the first calculating unit 804 and the second calculating unit 805 calculate the contour feature values of the video object in the current and next frame images respectively, and the contour matching unit 806 matches the two contour feature values. Even when the video object undergoes non-affine changes, the contour feature value of the video object in the next frame image can still be extracted and matched against the contour feature value of the video object in the current frame image to find the best match, so that the contour of the video object in the next frame image can be accurately described. The apparatus of the embodiment of the present invention thus overcomes the defect of the prior art that video objects with non-affine changes cannot be accurately tracked.
Therefore, the device for tracking the video object in the embodiment of the invention not only can accurately track the video object with affine change, but also can track the video object with non-affine change, thereby improving the accuracy of tracking the video object.
As noted above, there are many ways to describe the contour of a video object, such as invariant moments, eccentricity, aspect ratio of the video object, shape factor and wavelet contour descriptors. The wavelet contour descriptor has a clear physical meaning, good retrieval performance, and invariance to rotation and scaling, and can describe the contour feature value of a video object accurately. Therefore, in the apparatus for tracking a video object in the embodiment of the present invention, the wavelet contour descriptor is used as the contour feature value describing the contour of the video object.
As shown in fig. 9, the contour detection unit 803 includes:
a region prediction module 8031, configured to predict, according to the matching feature points, an occurrence region of the video object contour in a next frame image;
a contour extraction module 8032, configured to detect at least one candidate contour of the video object in the occurrence area.
The method has the advantages that the predicted occurrence area of the video object in the next frame of image is predicted, the contour of the video object can be selected in a targeted manner, the calculated amount of contour matching of the video object is reduced, and the speed and the efficiency of tracking the video object are improved.
The first calculating unit 804 includes:
a first contour detection module 8041, configured to perform contour detection on the video object in the current frame image to obtain the contour points of the video object;
a first normalized track vector calculation module 8042, configured to obtain the normalized track vector of the video object from the contour points;
a first directional track vector calculation module 8043, configured to calculate the directional track vector of the video object from the normalized track vector;
a first length-normalized directional track vector calculation module 8044, configured to length-normalize the directional track vector to obtain the length-normalized directional track vector;
a first contour feature value calculation module 8045, configured to obtain the wavelet contour descriptor of the video object from the length-normalized directional track vector.
The second calculating unit 805 comprises:
an area prediction module 8051, configured to obtain the occurrence region of the video object in the next frame image;
a second contour detection module 8052, configured to perform contour detection on the video object within the occurrence region to obtain the contour points of the video object;
a second normalized track vector calculation module 8053, configured to obtain the normalized track vector of the video object from the contour points;
a second directional track vector calculation module 8054, configured to calculate the directional track vector of the video object from the normalized track vector;
a second length-normalized directional track vector calculation module 8055, configured to length-normalize the directional track vector to obtain the length-normalized directional track vector;
a second contour feature value calculation module 8056, configured to obtain the wavelet contour descriptor of the video object from the length-normalized directional track vector.
The algorithms used in the calculation process of the modules of the first calculation unit 804 and the second calculation unit 805 are the same as those used in the embodiment of the method for tracking a video object, and are not described herein again.
In summary, the method and the device for tracking the video object according to the embodiments of the present invention can not only improve the accuracy of tracking the video object, but also improve the speed of tracking the video object due to the simple algorithm of the embodiments of the present invention.
There are, of course, many possible embodiments of this invention and it is intended that all such other embodiments as may be obtained by those skilled in the art without departing from the spirit and scope of the invention and without any inventive step are deemed to be covered by the present invention and all such modifications, equivalents and alternatives falling within the scope and spirit of the invention.

Claims (21)

1. A method of tracking a video object, the method comprising the steps of:
extracting feature points of the video object contour in a current frame image;
finding matched feature points matched with the feature points in the next frame of image;
detecting at least one candidate contour of the video object in a next frame of image according to the matched feature points, wherein the method comprises the following steps: taking the matched feature point as a center, obtaining an appearance region of the video object contour in a next frame image through linear transformation, and detecting at least one candidate contour of the video object in the appearance region;
calculating a contour characteristic value of a video object in the current frame image;
calculating contour characteristic values of the candidate contours;
and comparing the contour characteristic value of the candidate contour with the contour characteristic value of the video object in the current frame image, and if the two contour characteristic values are matched, determining that the candidate contour is the contour of the video object in the next frame image.
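The "linear transformation" used in claim 1 to obtain the occurrence region is not spelled out; one plausible reading, sketched below, is to take the bounding box of the current contour, enlarge it by a fixed factor, and re-center it at the matched feature point. The scale factor and the helper name are assumptions, not part of the claim.

    import numpy as np

    def predict_occurrence_region(contour, matched_point, scale=1.5):
        """Assumed reading of the claim: scale the current contour's bounding box
        and re-center it at the matched feature point to get the search region."""
        contour = np.asarray(contour, dtype=np.float64)
        xs, ys = contour[:, 0], contour[:, 1]
        w = (xs.max() - xs.min()) * scale
        h = (ys.max() - ys.min()) * scale
        cx, cy = matched_point
        # Region returned as (x_min, y_min, x_max, y_max).
        return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)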
2. A method for tracking a video object as claimed in claim 1, wherein the contour feature value is a wavelet contour descriptor, or an invariant moment of the contour, or eccentricity, or form factor.
3. The method according to claim 1, wherein the process of finding the matching feature point matching the feature point in the next frame image specifically comprises:
and finding matched feature points matched with the feature points in the next frame of image by using a mean shift method.
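Claim 3 only names the mean shift method; as an illustration, the sketch below uses OpenCV's standard histogram back-projection together with cv2.meanShift to move a small window around a feature point from the current frame to the next one. The window size and histogram settings are arbitrary assumptions.

    import cv2

    def mean_shift_match(curr_bgr, next_bgr, feature_point, win=21):
        """Track one feature point from curr_bgr to next_bgr with mean shift.
        Returns the matched point (x, y) in the next frame."""
        x, y = int(feature_point[0]), int(feature_point[1])
        x0, y0 = max(x - win // 2, 0), max(y - win // 2, 0)
        window = (x0, y0, win, win)                              # (x, y, w, h)

        # Hue histogram of the patch around the feature point in the current frame.
        hsv_curr = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV)
        patch = hsv_curr[y0:y0 + win, x0:x0 + win]
        hist = cv2.calcHist([patch], [0], None, [32], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

        # Back-project the histogram into the next frame and let mean shift
        # re-locate the window there.
        hsv_next = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv_next], [0], hist, [0, 180], 1)
        criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
        _, window = cv2.meanShift(back_proj, window, criteria)

        return (window[0] + win // 2, window[1] + win // 2)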
4. A method for tracking a video object as claimed in claim 1, wherein the process of detecting at least one candidate contour of the video object in the occurrence region is specifically:
in the appearance area, carrying out contour detection on the video object to obtain contour points of the video object;
sequencing the contour points to obtain a contour vector $P_{k+1}^i = (P_0, \ldots, P_{N_p-1})$ of the video object, wherein $P_0$ is the first contour point, $P_{N_p-1}$ is the $N_p$-th contour point, and $N_p$ is an integer greater than 0.
5. The method for tracking a video object according to claim 4, wherein the process of sequencing the contour points to obtain the contour vector $P_{k+1}^i = (P_0, \ldots, P_{N_p-1})$ of the video object specifically comprises:
starting from the upper edge of the range defined by the video object index, taking the first contour point found by searching in the horizontal direction as the first contour point $P_0$;
with the first contour point $P_0$ as the center of a search template, searching within the range determined by the search template in the counterclockwise direction to obtain a second contour point $P_1$;
obtaining each subsequent contour point in the same way as the second contour point is obtained from the first contour point, until the $N_p$-th contour point $P_{N_p-1}$ is found;
obtaining the contour vector $P_{k+1}^i = (P_0, \ldots, P_{N_p-1})$ of the video object from (the first contour point $P_0$, ..., the $N_p$-th contour point $P_{N_p-1}$).
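Claims 4 and 5 describe tracing the contour counterclockwise with a search template starting from the topmost boundary point. The sketch below does not reimplement that exact template search; it substitutes OpenCV's cv2.findContours on a binary mask of the occurrence region and then rotates the point list so that the topmost point comes first, which yields an ordered contour vector of the same form.

    import cv2
    import numpy as np

    def ordered_contour_points(binary_mask):
        """Return an ordered contour vector (an Np x 2 array of (x, y) points)
        for the largest object in a binary mask, starting at the topmost point."""
        # findContours returns 2 or 3 values depending on the OpenCV version;
        # the contour list is always the second-to-last element.
        result = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        contours = result[-2]
        if not contours:
            return np.empty((0, 2), dtype=np.int32)
        pts = max(contours, key=cv2.contourArea).reshape(-1, 2)   # one (x, y) per row

        # Start the sequence at the topmost (then leftmost) point, mirroring the
        # claim's "search from the upper edge in the horizontal direction".
        start = np.lexsort((pts[:, 0], pts[:, 1]))[0]
        return np.roll(pts, -start, axis=0)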
6. The method for tracking a video object according to claim 5, wherein the process of calculating the contour feature value of the candidate contour specifically comprises:
obtaining a normalized track vector $U_{k+1}^i = (U_0, U_1, \ldots, U_{N_p-1})$ of the video object from the contour vector $P_{k+1}^i$, wherein $U_0$ is the quotient of the distance from the first contour point to the centroid of the contour of the video object in the current frame image and the maximum of the distances from the contour points to said centroid, $U_1$ is the quotient of the distance from the second contour point to said centroid and the maximum of the distances from the contour points to said centroid, and $U_{N_p-1}$ is the quotient of the distance from the $N_p$-th contour point to said centroid and the maximum of the distances from the contour points to said centroid;
calculating a directional track vector $Q_{k+1}^i = (q_0, q_1, \ldots, q_{N_p-1})$ of the video object from the normalized track vector $U_{k+1}^i$, wherein $q_0, \ldots, q_{N_p-1}$ are the result of reordering $U_0, \ldots, U_{N_p-1}$;
performing length normalization on the directional track vector $Q_{k+1}^i$ to obtain a length-normalized directional track vector $L_{k+1}^i = (L_0, L_1, \ldots, L_{M-1})$ of the video object, wherein $L_0, \ldots, L_{M-1}$ are the result of performing length normalization on $Q_{k+1}^i$;
obtaining a wavelet contour descriptor $B_{k+1} = \{b_0, b_1, \ldots, b_{N-1}\}$ of the video object from the length-normalized directional track vector $L_{k+1}^i$, wherein $b_0, \ldots, b_{N-1}$ represent the truncated wavelet transform result of $L_0, L_1, \ldots, L_{M-1}$ and $N$ represents the coefficient length of the truncated wavelet transform result;
wherein $N_p$ is the number of contour points constituting the contour vector $P_{k+1}^i$, and $M$ is the length coefficient of the length-normalized directional track vector $L_{k+1}^i$.
7. The method for tracking a video object according to claim 6, wherein the process of obtaining the normalized track vector $U_{k+1}^i = (U_0, U_1, \ldots, U_{N_p-1})$ of the video object from the contour vector $P_{k+1}^i$ specifically comprises:
calculating the centroid coordinates of the contour from the contour vector, the calculation formula of the centroid coordinates being:
$TX_{k+1}^i = \frac{1}{N_p} \sum_{n=0}^{N_p-1} x_n$, $TY_{k+1}^i = \frac{1}{N_p} \sum_{n=0}^{N_p-1} y_n$,
wherein $(x_n, y_n)$ are the coordinates of each contour point, $n = 0, 1, \ldots, N_p-1$;
calculating the normalized track vector $U_{k+1}^i = (U_0, U_1, \ldots, U_{N_p-1})$:
$r_n = \sqrt{(x_n - TX_{k+1}^i)^2 + (y_n - TY_{k+1}^i)^2}$,
$r_{\max} = \mathrm{Max}(r_0, r_1, \ldots, r_{N_p-1})$,
$U_n = r_n / r_{\max}$, $n = 0, \ldots, N_p-1$;
wherein $r_n$ is the distance from each contour point to the centroid and $r_{\max}$ is the maximum of the distances from the contour points to the centroid.
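As an illustration of claim 7 (the same formulas reappear in claim 13), a minimal numpy sketch is given below; `contour` is assumed to be an Np x 2 array of (x, y) contour point coordinates.

    import numpy as np

    def normalized_track_vector(contour):
        """Compute U = (U_0, ..., U_{Np-1}) as in claim 7: the distance of each
        contour point to the contour centroid, divided by the maximum distance."""
        contour = np.asarray(contour, dtype=np.float64)
        tx, ty = contour.mean(axis=0)                  # centroid (TX, TY)
        r = np.hypot(contour[:, 0] - tx, contour[:, 1] - ty)
        return r / r.max()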
8. The method for tracking a video object according to claim 6, wherein the process of calculating the directional track vector $Q_{k+1}^i = (q_0, q_1, \ldots, q_{N_p-1})$ of the video object from the normalized track vector $U_{k+1}^i$ specifically comprises:
finding the maximum values and minimum values in the normalized track vector $U_{k+1}^i$ to form maximum-minimum value pairs;
finding, among the maximum-minimum value pairs, the pair whose maximum-value subscript and minimum-value subscript have the largest interval;
reordering according to the maximum-minimum value pair with the largest subscript interval, in the order from the minimum value to the maximum value, to obtain the directional track vector $Q_{k+1}^i = (q_0, q_1, \ldots, q_{N_p-1})$;
wherein the order "from the minimum value to the maximum value" means ordering with the minimum value as the starting term such that the maximum value appears within the first $N_p/2$ entries.
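Claim 8 reorders the normalized track vector so that it starts at a minimum and the corresponding maximum falls within the first Np/2 entries; the exact pairing rule leaves some room for interpretation. The sketch below takes one plausible reading: start the circular sequence at the global minimum, and reverse the traversal direction if the global maximum would otherwise fall in the second half.

    import numpy as np

    def directional_track_vector(u):
        """One possible reading of claim 8: rotate the normalized track vector u
        so it starts at its minimum, walking in whichever circular direction puts
        the maximum within the first half of the sequence."""
        u = np.asarray(u, dtype=np.float64)
        n = len(u)
        i_min = int(np.argmin(u))
        i_max = int(np.argmax(u))

        forward = np.roll(u, -i_min)                   # start at the minimum
        pos_max = (i_max - i_min) % n                  # position of the maximum after rotation
        if pos_max <= n // 2:
            return forward
        # Otherwise walk the contour the other way round, still starting at the minimum.
        return np.roll(u[::-1], -(n - 1 - i_min))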
9. The method for tracking a video object according to claim 6, wherein the process of performing length normalization on the directional track vector $Q_{k+1}^i$ to obtain the length-normalized directional track vector $L_{k+1}^i = (L_0, L_1, \ldots, L_{M-1})$ of the video object specifically comprises:
$a = \left[ \frac{i}{M} N_p \right]$;
$b = [a + 1]$;
$c = \frac{i}{M} N_p - a$;
$L_i = (1 - c) \times q_a + c \times q_b$, $(i = 0, 1, \ldots, M-1)$;
wherein $a$, $b$ and $c$ are constants.
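A numpy sketch of the length normalization of claim 9 (and claim 15), read as linear interpolation of the Np-point directional track vector onto a fixed length M; taking the bracket [·] as the floor operation and the default M = 128 are assumptions about the claim's notation, not part of the claim.

    import numpy as np

    def length_normalize(q, m=128):
        """Resample the directional track vector q (length Np) to length m
        by linear interpolation, following the a/b/c construction of claim 9."""
        q = np.asarray(q, dtype=np.float64)
        np_len = len(q)
        out = np.empty(m, dtype=np.float64)
        for i in range(m):
            pos = i * np_len / m            # fractional source index (i/M)*Np
            a = int(np.floor(pos))          # a = [(i/M)*Np]
            b = min(a + 1, np_len - 1)      # b = a + 1, clamped at the last entry
            c = pos - a
            out[i] = (1.0 - c) * q[a] + c * q[b]
        return out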
10. The method for tracking a video object according to claim 6, wherein the process of obtaining the wavelet contour descriptor $B_{k+1} = \{b_0, b_1, \ldots, b_{N-1}\}$ of the video object from the length-normalized directional track vector $L_{k+1}^i$ specifically comprises:
performing a wavelet transform on the length-normalized directional track vector $L_{k+1}^i$ to obtain a wavelet transform result $(w_0, w_1, \ldots, w_{M-1})$, wherein $w_0, w_1, \ldots, w_{M-1}$ are the result of performing the wavelet transform on $L_0, L_1, \ldots, L_{M-1}$;
truncating the transform result $(w_0, w_1, \ldots, w_{M-1})$ according to the image resolution to obtain $B_{k+1} = \{b_0, b_1, \ldots, b_{N-1}\} = (w_0, w_1, \ldots, w_{N-1})$, the number of truncated coefficients being the same as the value of the resolution.
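An illustrative sketch of the wavelet contour descriptor of claim 10 (and claim 16), using the PyWavelets package as a stand-in for the unspecified wavelet transform; the choice of the Haar wavelet, the way coefficients are concatenated before truncation, and the default of 32 kept coefficients are all assumptions.

    import numpy as np
    import pywt

    def wavelet_contour_descriptor(l_vec, n_coeffs=32, wavelet="haar"):
        """Wavelet-transform the length-normalized directional track vector and
        keep the first n_coeffs coefficients (claim 10 ties this number to the
        image resolution; here it is simply a parameter)."""
        coeffs = pywt.wavedec(np.asarray(l_vec, dtype=np.float64), wavelet)
        flat = np.concatenate(coeffs)          # (w_0, w_1, ...) up to coefficient ordering
        return flat[:n_coeffs]                 # B = (b_0, ..., b_{N-1})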
11. The method for tracking a video object according to claim 2, wherein when the contour feature value is a wavelet contour descriptor, the process of calculating the contour feature value of the video object in the current frame image specifically comprises:
performing contour detection on the video object in the current frame image to obtain a contour vector $P_k^i = (P_0, \ldots, P_{N_p-1})$ of the video object and the coordinates of each contour point;
obtaining a normalized track vector $U_k^i = (U_0, U_1, \ldots, U_{N_p-1})$ of the video object from the contour vector $P_k^i$, wherein $U_0$ is the quotient of the distance from the first contour point to the centroid of the contour of the video object in the current frame image and the maximum of the distances from the contour points to the centroid, and $U_{N_p-1}$ is the quotient of the distance from the $N_p$-th contour point to the centroid of the contour of the video object in the current frame image and the maximum of the distances from the contour points to the centroid;
calculating a directional track vector $Q_k^i = (q_0, q_1, \ldots, q_{N_p-1})$ of the video object from the normalized track vector $U_k^i$, wherein $q_0, \ldots, q_{N_p-1}$ are the result of reordering $U_0, \ldots, U_{N_p-1}$;
performing length normalization on the directional track vector $Q_k^i$ to obtain a length-normalized directional track vector $L_k^i = (L_0, L_1, \ldots, L_{M-1})$ of the video object, wherein $L_0, \ldots, L_{M-1}$ are the result of performing length normalization on $Q_k^i$;
obtaining a wavelet contour descriptor $B_k = \{b_0, b_1, \ldots, b_{N-1}\}$ of the video object from the length-normalized directional track vector $L_k^i$, wherein $b_0, \ldots, b_{N-1}$ are the wavelet contour descriptor coefficients of $L_0, L_1, \ldots, L_{M-1}$;
wherein $N_p$ is the number of contour points constituting the contour vector $P_k^i$, $M$ is the length coefficient of the length-normalized directional track vector $L_k^i$, and $N$ denotes the coefficient length of the truncated wavelet transform result.
12. The method for tracking a video object according to claim 11, wherein the process of obtaining the contour vector $P_k^i = (P_0, \ldots, P_{N_p-1})$ of the video object from the contour points specifically comprises:
starting from the upper edge of the range defined by the video object index, taking the first contour point found by searching in the horizontal direction as the first contour point $P_0$;
with the first contour point $P_0$ as the center of a search template, searching within the range determined by the search template in the counterclockwise direction to obtain a second contour point $P_1$;
obtaining each subsequent contour point in the same way as the second contour point is obtained from the first contour point, until the $N_p$-th contour point $P_{N_p-1}$ is found;
obtaining the contour vector $P_k^i = (P_0, \ldots, P_{N_p-1})$ of the video object from (the first contour point $P_0$, ..., the $N_p$-th contour point $P_{N_p-1}$).
13. The method for tracking a video object according to claim 11, wherein the process of calculating the normalized track vector $U_k^i = (U_0, U_1, \ldots, U_{N_p-1})$ of the video object from the contour vector $P_k^i$ specifically comprises:
calculating the centroid coordinates $(TX_k^i, TY_k^i)$ of the video object contour from the contour vector $P_k^i$, the calculation formula of the centroid coordinates being:
$TX_k^i = \frac{1}{N_p} \sum_{n=0}^{N_p-1} x_n$, $TY_k^i = \frac{1}{N_p} \sum_{n=0}^{N_p-1} y_n$,
wherein $(x_n, y_n)$ are the coordinates of each contour point, $n = 0, 1, \ldots, N_p-1$;
calculating the normalized track vector $U_k^i = (U_0, U_1, \ldots, U_{N_p-1})$ according to the centroid coordinates $(TX_k^i, TY_k^i)$:
$r_n = \sqrt{(x_n - TX_k^i)^2 + (y_n - TY_k^i)^2}$,
$r_{\max} = \mathrm{Max}(r_0, r_1, \ldots, r_{N_p-1})$,
$U_n = r_n / r_{\max}$;
wherein $r_n$ is the distance from each contour point to the centroid, $r_{\max}$ is the maximum of the distances from the contour points to the centroid, and $n = 0, 1, \ldots, N_p-1$.
14. The method for tracking a video object according to claim 11, wherein the process of calculating the directional track vector $Q_k^i = (q_0, q_1, \ldots, q_{N_p-1})$ of the video object from the normalized track vector $U_k^i$ specifically comprises:
finding the maximum values and minimum values in the normalized track vector $U_k^i$ to form maximum-minimum value pairs;
finding, among the maximum-minimum value pairs, the pair whose maximum-value subscript and minimum-value subscript have the largest interval;
reordering according to the maximum-minimum value pair with the largest subscript interval, in the order from the minimum value to the maximum value, to obtain the directional track vector $Q_k^i = (q_0, q_1, \ldots, q_{N_p-1})$;
wherein the order "from the minimum value to the maximum value" means ordering with the minimum value as the starting term so as to ensure that the maximum value appears within the first $N_p/2$ entries.
15. The method for tracking a video object according to claim 11, wherein the process of performing length normalization on the directional track vector $Q_k^i$ to obtain the length-normalized directional track vector $L_k^i = (L_0, L_1, \ldots, L_{M-1})$ specifically comprises:
$a = \left[ \frac{i}{M} N_p \right]$;
$b = [a + 1]$;
$c = \frac{i}{M} N_p - a$;
$L_i = (1 - c) \times q_a + c \times q_b$, $(i = 0, 1, \ldots, M-1)$;
wherein $a$, $b$ and $c$ are constants.
16. The method for tracking a video object according to claim 11, wherein the process of obtaining the wavelet contour descriptor $B_k = \{b_0, b_1, \ldots, b_{N-1}\}$ of the video object from the length-normalized directional track vector $L_k^i$ specifically comprises:
performing a wavelet transform on the length-normalized directional track vector $L_k^i$ to obtain a transform result $(w_0, w_1, \ldots, w_{M-1})$, wherein $w_0, w_1, \ldots, w_{M-1}$ are the result of performing the wavelet transform on $L_0, L_1, \ldots, L_{M-1}$;
truncating the transform result $(w_0, w_1, \ldots, w_{M-1})$ according to the image resolution to obtain $B_k = \{b_0, b_1, \ldots, b_{N-1}\} = (w_0, w_1, \ldots, w_{N-1})$, the number of truncated coefficients being the same as the value of the resolution.
17. The method according to claim 2, wherein when the contour feature value is a wavelet contour descriptor, the process of comparing the contour feature value of the candidate contour with the contour feature value of the video object in the current frame image specifically comprises:
comparing the similarity between the wavelet contour descriptor of each candidate contour of the video object in the next frame image and the wavelet contour descriptor of the video object in the current frame image;
and if the similarity value of the wavelet contour descriptors of the two frames exceeds a similarity threshold value, taking the candidate contour as the contour, in the next frame image, of the video object tracked in the current frame image.
18. The method for tracking a video object according to claim 17, wherein the similarity value is calculated by:
$\mathrm{Similarity}(i) = 1 - \frac{1}{N} \sum_{i=0}^{N-1} \left( b_i^{k+1} - b_i^{k} \right)^2$,
wherein $b_i^{k}$ represents the wavelet contour descriptor of the video object in the current frame image, $b_i^{k+1}$ represents the wavelet contour descriptor of the video object in the next frame image, and $N$ represents the coefficient length of the truncated wavelet transform result.
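A direct numpy transcription of the similarity measure of claim 18 is shown below; a candidate contour is accepted when this value exceeds the similarity threshold of claim 17. The function name is an assumption for illustration.

    import numpy as np

    def descriptor_similarity(b_curr, b_next):
        """Similarity = 1 - (1/N) * sum_i (b_next[i] - b_curr[i])^2, as in claim 18."""
        b_curr = np.asarray(b_curr, dtype=np.float64)
        b_next = np.asarray(b_next, dtype=np.float64)
        n = len(b_curr)
        return 1.0 - np.sum((b_next - b_curr) ** 2) / n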
19. An apparatus for tracking video objects, the apparatus comprising:
the first positioning unit is used for acquiring the feature points of the video object contour in the current frame image;
the second positioning unit is used for finding matched feature points matched with the feature points in the next frame of image;
the contour detection unit is used for detecting at least one candidate contour of the video object in the next frame of image according to the matched feature points, and comprises: a region prediction module, used for obtaining, by taking the matched feature point as a center, the occurrence region of the video object contour in the next frame image through linear transformation; and a contour selection module, used for detecting at least one candidate contour of the video object within the occurrence region;
the first calculating unit is used for calculating the contour characteristic value of the video object in the current frame image;
a second calculation unit, configured to calculate a contour feature value of the candidate contour;
and the contour matching unit is used for comparing the contour characteristic value of the candidate contour with the contour characteristic value of the video object in the current frame image, and if the two contour characteristic values are matched, the candidate contour is the contour of the video object in the next frame image.
20. An apparatus for tracking video objects according to claim 19, wherein said first computing unit comprises:
the first contour detection module is used for carrying out contour detection on the video object in a current frame image to obtain contour points of the video object;
the first normalized wheel track vector calculation module is used for obtaining a normalized wheel track vector of the video object from the contour points;
the first directional contour vector calculation module is used for obtaining a directional contour vector of the video object from the normalized wheel track vector calculated by the normalized wheel track vector calculation module;
the first length normalization directional contour vector calculation module is used for carrying out length normalization on the directional contour vector to obtain a length normalization directional contour vector;
and the first contour characteristic value calculation module is used for obtaining the wavelet contour descriptor of the video object from the length normalization orientation contour vector.
21. An apparatus for tracking video objects according to claim 19, wherein said second computing unit comprises:
the second normalized wheel track vector calculation module is used for obtaining a normalized wheel track vector of the video object from the contour points;
the second directional contour vector calculation module is used for obtaining a directional contour vector of the video object from the normalized wheel track vector calculated by the normalized wheel track vector calculation module;
the second length normalization directional contour vector calculation module is used for carrying out length normalization on the directional contour vector to obtain a length normalization directional contour vector;
and the second contour characteristic value calculation module is used for obtaining the wavelet contour descriptor of the video object from the length normalization orientation contour vector.
CN2008100005828A 2008-01-23 2008-01-23 Method and apparatus for tracking video object Active CN101493889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100005828A CN101493889B (en) 2008-01-23 2008-01-23 Method and apparatus for tracking video object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100005828A CN101493889B (en) 2008-01-23 2008-01-23 Method and apparatus for tracking video object

Publications (2)

Publication Number Publication Date
CN101493889A CN101493889A (en) 2009-07-29
CN101493889B true CN101493889B (en) 2011-12-07

Family

ID=40924481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100005828A Active CN101493889B (en) 2008-01-23 2008-01-23 Method and apparatus for tracking video object

Country Status (1)

Country Link
CN (1) CN101493889B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110111106A (en) * 2010-04-02 2011-10-10 삼성테크윈 주식회사 Method and apparatus for object tracking and loitering
CN102469350A (en) * 2010-11-16 2012-05-23 北大方正集团有限公司 Method, device and system for advertisement statistics
CN102129684B (en) * 2011-03-17 2012-12-12 南京航空航天大学 Method for matching images of different sources based on fit contour
US9582896B2 (en) * 2011-09-02 2017-02-28 Qualcomm Incorporated Line tracking with automatic model initialization by graph matching and cycle detection
CN104182751B (en) * 2014-07-25 2017-12-12 小米科技有限责任公司 Object edge extracting method and device
US9860553B2 (en) * 2015-03-18 2018-01-02 Intel Corporation Local change detection in video
CN105261040B (en) * 2015-10-19 2018-01-05 北京邮电大学 A kind of multi-object tracking method and device
WO2017166098A1 (en) * 2016-03-30 2017-10-05 Xiaogang Wang A method and a system for detecting an object in a video
CN106326928B (en) * 2016-08-24 2020-01-07 四川九洲电器集团有限责任公司 Target identification method and device
CN106548487B (en) * 2016-11-25 2019-09-03 浙江光跃环保科技股份有限公司 Method and apparatus for detection and tracking mobile object
CN106583955B (en) * 2016-12-13 2019-05-03 鸿利智汇集团股份有限公司 A kind of wire soldering method of detection chip fixed-direction
CN109035205A (en) * 2018-06-27 2018-12-18 清华大学苏州汽车研究院(吴江) Water hyacinth contamination detection method based on video analysis
CN109977833B (en) * 2019-03-19 2021-08-13 网易(杭州)网络有限公司 Object tracking method, object tracking device, storage medium, and electronic apparatus
CN112102342B (en) * 2020-09-01 2023-12-01 腾讯科技(深圳)有限公司 Plane contour recognition method, plane contour recognition device, computer equipment and storage medium
CN112297028A (en) * 2020-11-05 2021-02-02 中国人民解放军海军工程大学 Overwater U-shaped intelligent lifesaving robot control system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1187092A (en) * 1996-12-31 1998-07-08 大宇电子株式会社 Contour tracing method
CN101009021A (en) * 2007-01-25 2007-08-01 复旦大学 Video stabilizing method based on matching and tracking of characteristic

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1187092A (en) * 1996-12-31 1998-07-08 大宇电子株式会社 Contour tracing method
CN101009021A (en) * 2007-01-25 2007-08-01 复旦大学 Video stabilizing method based on matching and tracking of characteristic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Niu Dejiao, Zhan Yongzhao, Song Shunlin. Face detection and tracking in real-time video images. Computer Applications (《计算机应用》), 2004, Vol. 24, No. 6, pp. 105-107. *

Also Published As

Publication number Publication date
CN101493889A (en) 2009-07-29

Similar Documents

Publication Publication Date Title
CN101493889B (en) Method and apparatus for tracking video object
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN109903313B (en) Real-time pose tracking method based on target three-dimensional model
CN107680120B (en) Infrared small target tracking method based on sparse representation and transfer limited particle filtering
CN104200495B (en) A kind of multi-object tracking method in video monitoring
JP4625074B2 (en) Sign-based human-machine interaction
CN111028292B (en) Sub-pixel level image matching navigation positioning method
JP5063776B2 (en) Generalized statistical template matching based on geometric transformation
CN115240130A (en) Pedestrian multi-target tracking method and device and computer readable storage medium
WO2018049704A1 (en) Vehicle detection, tracking and localization based on enhanced anti-perspective transformation
US11475595B2 (en) Extrinsic calibration of multi-camera system
CN111914756A (en) Video data processing method and device
CN108537822B (en) Moving target tracking method based on weighted confidence estimation
Kottath et al. Mutual information based feature selection for stereo visual odometry
CN111914627A (en) Vehicle identification and tracking method and device
Khan et al. Bayesian online learning on Riemannian manifolds using a dual model with applications to video object tracking
CN112991394A (en) KCF target tracking method based on cubic spline interpolation and Markov chain
Hongpeng et al. A robust object tracking algorithm based on surf and Kalman filter
Dang et al. Fast object hypotheses generation using 3D position and 3D motion
Rhee et al. Vehicle tracking using image processing techniques
Guo et al. A hybrid framework based on warped hierarchical tree for pose estimation of texture-less objects
CN107909608A (en) The moving target localization method and device suppressed based on mutual information and local spectrum
Yin et al. Real-time head pose estimation for driver assistance system using low-cost on-board computer
Li et al. Adaptive object tracking algorithm based on eigenbasis space and compressive sampling
Asif et al. AGV guidance system: An application of simple active contour for visual tracking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant