CN107590820B - Video object tracking method based on correlation filtering and intelligent device thereof - Google Patents


Publication number
CN107590820B
Authority
CN
China
Prior art keywords: frame, target area, target, obtaining, unit
Prior art date: 2017-08-25
Legal status: Active
Application number: CN201710742685.0A
Other languages: Chinese (zh)
Other versions: CN107590820A (en)
Inventors: 樊应若, 董远, 白洪亮
Current Assignee: Lanzhou Feisou Information Technology Co., Ltd.
Original Assignee: Lanzhou Feisou Information Technology Co., Ltd.
Priority date / filing date: 2017-08-25
Application filed by Lanzhou Feisou Information Technology Co., Ltd.
Priority to: CN201710742685.0A
Publication of CN107590820A: 2018-01-16
Application granted; publication of CN107590820B: 2020-06-02

Abstract

The invention discloses a video object tracking method based on correlation filtering and an intelligent device thereof. The method cyclically samples the target area and the area surrounding the target to obtain shifted samples, connects the shifted samples into a circulant matrix A, and updates the circulant matrix A by discrete Fourier transform. By sampling background information, the filtering method maintains speed while preserving high accuracy. Positive and negative samples are collected with a circulant matrix built from the area around the target, a target detector is trained by ridge regression, and, because a circulant matrix is diagonalizable in Fourier space, matrix operations reduce to elementwise products of vectors, greatly lowering the amount of computation and raising the tracking speed. The method is particularly effective when the extracted features are complex and real-time performance would otherwise suffer. In addition, background information is added judiciously when sampling, which strengthens the robustness of the model and effectively alleviates the problems of fast motion and occlusion during tracking.

Description

Video object tracking method based on correlation filtering and intelligent device thereof
Technical Field
The invention relates to a video object tracking method based on correlation filtering and an intelligent device thereof.
Background
At present, when a traditional tracking algorithm trains and updates a tracking model, computation becomes expensive and real-time performance poor if the features are complex. When target tracking is implemented with a conventional correlation filtering method, the samples obtained by cyclic sampling are of low quality and exploit little background information, so tracking performs poorly under fast motion and occlusion.
Current tracking algorithms mainly comprise machine learning and filter learning based on traditional features, and more recent tracking algorithms based on deep learning. Template matching algorithms based on traditional features require many iterations and heavy computation. Although the speed of a typical correlation filtering method is adequate, the sample quality is low, so the robustness of the tracking model is poor. Tracking algorithms based on deep learning pass the image through a convolutional neural network, so the parameter computation is extremely heavy, GPU support is required, and practicality is limited. A video object tracking method based on correlation filtering and an intelligent device thereof are therefore urgently needed to solve the technical problems that samples obtained by cyclic sampling in the traditional method are of low quality and exploit little background information, so that tracking of fast motion and occlusion is poor.
Disclosure of Invention
The invention aims to solve the technical problems that samples obtained by cyclic sampling in the traditional method are of low quality and exploit little background information, so that tracking of fast motion and occlusion is poor.
The present invention provides a video object tracking method based on correlation filtering, which includes:
S101, detecting a video stream to acquire a target area frame S0;
S102, constructing a regression model from the target area frame S0 and performing an operation transform on it;
S103, in the next frame of the video stream, obtaining a plurality of extended search areas around the target area frame S0;
S104, extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
S105, calculating model parameters of the filter from the feature vectors;
S106, calculating a response map from the model parameters and the feature vectors;
S107, obtaining the relative displacement of the target motion from the response map;
S108, restoring the target area frame of the current frame from the relative displacement;
S109, taking the target area frame of step S108 as the target area frame S0 of the next frame and repeating steps S103 to S108, thereby tracking the target object.
Furthermore, after the first frame of the video is detected, the target area frame S0 is obtained;
according to the size of the target area frame S0, a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0 is constructed, and a discrete Fourier transform is applied to y to obtain $\hat{y}$;
with S0 as the basic search area of the next frame, 4 extended search areas S1, S2, S3, S4 of the same rectangular size are obtained around it;
from the basic search area S0 and the extended search areas S1, S2, S3, S4, directional gradient histogram features are extracted respectively, giving the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
Further, the method comprises: from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, connecting each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region.
Further, a discrete Fourier transform is applied to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$;
according to the model-parameter formula (rendered as an image in the original), the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4 are calculated;
wherein i indexes the target areas S0, S1, S2, S3, S4, and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$;
according to the response formula (rendered as an image in the original), the response map r of the current frame is obtained;
wherein z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0, and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
the maximum point of the response map r is then located, giving the relative displacement of the target motion;
according to the relative displacement, the target area frame S0 of the current frame is restored and used as the basic search area S0 of the next frame.
The application also provides an intelligent device, characterized in that the intelligent device comprises:
a detection unit for detecting a video stream to acquire a target area frame S0;
a construction-operation unit for constructing a regression model from the target area frame S0 and performing an operation transform on it;
an acquisition unit for obtaining, in the next frame of the video stream, a plurality of extended search areas around the target area frame S0;
an extraction unit for extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
a calculation unit for calculating the model parameters of the filter from the feature vectors and calculating the response map from the model parameters and the feature vectors;
the acquisition unit being further configured to obtain the relative displacement of the target motion from the response map;
a restoring unit for restoring the target area of the current frame from the relative displacement;
and a repetition unit for tracking the target object by repeating the cycle.
Furthermore, the detection unit is further configured to obtain the target area frame S0 after the first frame of the video is detected;
the construction-operation unit is further configured to construct, according to the size of the target area frame S0, a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0;
the calculation unit is further configured to apply a discrete Fourier transform to y to obtain $\hat{y}$;
the acquisition unit is further configured to take S0 as the basic search area of the next frame and to obtain 4 extended search areas S1, S2, S3, S4 of the same rectangular size around it;
the extraction unit is configured to extract directional gradient histogram features from the basic search area S0 and the extended search areas S1, S2, S3, S4 respectively, obtaining the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
Furthermore, the connection unit is further configured to connect, from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region.
Furthermore, the calculation unit is further configured to apply a discrete Fourier transform to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$; and, according to the model-parameter formula (rendered as an image in the original), to calculate the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4, wherein i indexes the target areas S0, S1, S2, S3, S4 and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$; and, according to the response formula (rendered as an image in the original), to obtain the response map r of the current frame, where z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0 and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
the acquisition unit is further configured to locate the maximum point of the response map r to obtain the relative displacement of the target motion;
the restoring unit is configured to restore, according to the relative displacement, the target area frame S0 of the current frame as the basic search area S0 of the next frame.
The beneficial effects of the invention are:
1. By sampling background information, the filtering method maintains speed while preserving high accuracy. Background information is added judiciously when taking samples, which strengthens the robustness of the model and effectively alleviates the problems of fast motion and occlusion during tracking.
2. Positive and negative samples are collected with a circulant matrix built from the area around the target, a target detector is trained by ridge regression, and, because a circulant matrix is diagonalizable in Fourier space, matrix operations reduce to elementwise products of vectors, greatly lowering the amount of computation and raising the tracking speed. This addresses the problem that real-time performance is poor when the extracted features are complex.
3. The method addresses the insufficient background information and poor training-sample quality of a typical correlation filtering tracker: when extracting samples, the basic search area in the image is expanded into 4 extended search areas, adding background information to the samples and improving sample quality. Without compromising the high-speed computation of the filtering method in Fourier space, the robustness of the tracking model is improved, giving high practicality.
4. The method remains efficient in speed and, by expanding the search area, uses background information to strengthen sample quality; compared with a common filtering algorithm, the model is more robust and can track the object in the video quickly and accurately.
Drawings
FIG. 1 is a flow chart of a method for tracking video objects based on correlation filtering according to an embodiment of the present application;
FIG. 2 is an architecture diagram of a smart device according to another embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of the present application;
FIG. 4 is a schematic flow chart of the present application as a whole;
FIG. 5 is a diagram illustrating a first state effect of an embodiment of the present application;
FIG. 6 is a diagram illustrating a second state effect of an embodiment of the present application;
FIG. 7 is a diagram illustrating a third state effect of an embodiment of the present application;
Detailed Description
the following examples are given for the purpose of clarity of the invention and are not intended to limit the embodiments of the invention. It will be apparent to those skilled in the art that other variations and modifications can be made in the invention without departing from the spirit of the invention, and it is intended to cover all such modifications and variations as fall within the true spirit of the invention.
The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
Explanation of terms:
Robustness: the property that a control system maintains stable behavior under perturbation of certain parameters.
Ridge regression: a biased-estimation regression method dedicated to the analysis of collinear data; an improved least-squares estimation method.
Discrete Fourier transform: a method of signal analysis that transforms a signal from the time domain to the frequency domain.
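For reference, the standard closed forms behind these two definitions, which the patent itself does not reproduce, are the ridge-regression solution for training data X, labels y and regularizer $\lambda$, and the length-N discrete Fourier transform:

$$w = (X^{\top} X + \lambda I)^{-1} X^{\top} y, \qquad \hat{x}_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \quad k = 0, \dots, N-1.$$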
Description of symbols:
For example, $\hat{x}$ denotes the discrete Fourier transform (DFT) of x, $\hat{x}^{*}$ denotes the complex conjugate of $\hat{x}$, and $\mathcal{F}^{-1}(x)$ denotes the inverse discrete Fourier transform (IDFT) of x.
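As a concrete illustration of this notation, a minimal NumPy sketch (added here for clarity; it is not part of the patent):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

x_hat = np.fft.fft(x)          # the DFT of x, written x-hat above
x_hat_conj = np.conj(x_hat)    # the complex conjugate, written x-hat* above
x_rec = np.fft.ifft(x_hat)     # the IDFT, written F^{-1}(x) above

assert np.allclose(x_rec.real, x)   # IDFT(DFT(x)) recovers x
```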
The correlation filtering method obtains all possible shifted samples by cyclic sampling within the target area frame and connects all samples into a circulant matrix A (the feature vector x obtained by sampling the target area image forms the circulant matrix A after cyclic shifting). The parameters of the filter (i.e., the parameters of the tracking model) are continuously updated during tracking, and the update is essentially a discrete Fourier transform of the circulant matrix, which reduces the amount of computation. The traditional correlation filtering method, however, takes only a limited number of negative samples around the target and does not fully exploit the background information of the target.
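The two facts this paragraph relies on, namely that the cyclic shifts of a sample vector form a circulant matrix and that products with such a matrix reduce to elementwise products of spectra, can be checked directly. A minimal sketch (an assumed illustration, not the patent's code):

```python
import numpy as np

def circulant(x):
    """Stack all cyclic shifts of x as rows: the circulant matrix A."""
    return np.stack([np.roll(x, k) for k in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])   # feature vector of the target region
A = circulant(x)                     # "all possible shifted samples"

# Diagonalizability in Fourier space: multiplying by A equals an
# elementwise product of spectra, so no dense matrix algebra is needed.
v = np.array([0.5, -1.0, 2.0, 0.0])
direct = A @ v
via_fft = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(v)).real
assert np.allclose(direct, via_fft)
```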
The following describes the tracking procedure for a single target; combined with a detection module, a plurality of targets can also be detected and the tracking procedure completed for each. The specific flow of single-target tracking is as follows:
As shown in fig. 1 and fig. 4, the present application provides a video object tracking method based on correlation filtering, comprising:
S101, detecting a video stream to acquire a target area frame S0;
S102, constructing a regression model from the target area frame S0 and performing an operation transform on it;
S103, in the next frame of the video stream, obtaining a plurality of extended search areas around the target area frame S0;
S104, extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
S105, calculating model parameters of the filter from the feature vectors;
S106, calculating a response map from the model parameters and the feature vectors;
S107, obtaining the relative displacement of the target motion from the response map;
S108, restoring the target area frame of the current frame from the relative displacement;
S109, taking the target area frame of step S108 as the target area frame S0 of the next frame and repeating steps S103 to S108 to track the target object.
First, the target area frame and the area around the target are cyclically sampled to obtain shifted samples. The shifted samples are connected into a circulant matrix A, and the circulant matrix A is updated by discrete Fourier transform. Compared with this, the traditional method lacks the acquisition of information from the surrounding areas, and its algorithm also differs considerably from the present application. Speed is maintained while high accuracy is preserved; background information is added judiciously when taking samples, strengthening the robustness of the model and effectively alleviating the problems of fast motion and occlusion during tracking.
In an optional embodiment of the present application, with respect to detecting the video stream to acquire the target area frame S0, constructing a regression model from the target area frame S0 and performing an operation transform, obtaining a plurality of extended search areas around the target area frame S0 in the next frame of the video stream, and extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform, the method further comprises:
obtaining the target area frame S0 after the first frame of the video is detected;
according to the size of the target area frame S0, constructing a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0, and applying a discrete Fourier transform to y to obtain $\hat{y}$;
with S0 as the basic search area of the next frame, obtaining 4 extended search areas S1, S2, S3, S4 of the same rectangular size around it;
from the basic search area S0 and the extended search areas S1, S2, S3, S4, extracting directional gradient histogram features respectively to obtain the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
Secondly, this describes in detail how the shifted samples and their sample data values are acquired, and provides a specific sample-collection scheme adapted to the algorithm that follows: the target area frame S0 obtained from the first frame yields the Gaussian regression model y and its transform $\hat{y}$; S0 serves as the basic search area of the next frame, surrounded by the 4 equally sized extended search areas S1, S2, S3, S4; and the HOG features of the 5 areas give the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
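A sketch of this sampling step is given below. The patent does not specify the exact placement of S1-S4 beyond "around S0 with the same rectangular size", so the offsets, the HOG parameters, and the helper names here are assumptions for illustration (using scikit-image's hog):

```python
import numpy as np
from skimage.feature import hog

def extract_search_regions(frame, box):
    """Crop the basic search area S0 and 4 same-sized extended search
    areas S1..S4 around it. Placing them left/right/above/below S0 is
    an assumption; the patent only says they surround S0."""
    x, y, w, h = box
    offsets = [(0, 0), (-w, 0), (w, 0), (0, -h), (0, h)]   # S0, S1..S4
    regions = []
    for dx, dy in offsets:
        x0 = min(max(x + dx, 0), frame.shape[1] - w)   # clamp to the image
        y0 = min(max(y + dy, 0), frame.shape[0] - h)
        regions.append(frame[y0:y0 + h, x0:x0 + w])
    return regions

def hog_vectors(regions):
    """One directional-gradient-histogram feature vector x_i per region."""
    return [hog(region, orientations=9, pixels_per_cell=(4, 4),
                cells_per_block=(1, 1), feature_vector=True)
            for region in regions]
```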
In an optional embodiment of the present application, the method further comprises:
from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, connecting each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region (the feature vector x obtained by sampling the target area image forms the circulant matrix A after cyclic shifting).
The acquired samples are thus converted into the corresponding circulant matrices, providing the basis for the operations that follow.
In an optional embodiment of the present application, with respect to calculating the model parameters of the filter from the feature vectors, calculating the response map from the model parameters and the feature vectors, obtaining the relative displacement of the target motion from the response map, and restoring the target area of the current frame from the relative displacement, the method further comprises:
applying a discrete Fourier transform to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$;
according to the model-parameter formula (rendered as an image in the original), calculating the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4;
wherein i indexes the target areas S0, S1, S2, S3, S4, and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$;
according to the response formula (rendered as an image in the original), obtaining the response map r of the current frame;
wherein z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0, and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
locating the maximum point of the response map r to obtain the relative displacement of the target motion;
restoring, according to the relative displacement, the target area frame S0 of the current frame as the basic search area S0 of the next frame.
Finally, the target area frame S0 of the first frame of the video is detected and acquired; according to the size of the target area frame S0, a Gaussian-shaped regression model y with bandwidth proportional to the size of S0 is constructed, and a DFT is applied to y to obtain $\hat{y}$. S0 is taken as the basic search area of the next frame, and 4 extended search areas S1, S2, S3, S4 of the same rectangular size are obtained around S0. In the current frame of the video, HOG (histogram of oriented gradients) features are extracted from S0, S1, S2, S3, S4 respectively, and the feature vectors x0, x1, x2, x3, x4 obtained from the 5 regions represent the respective samples. Each feature vector is cyclically shifted to obtain the corresponding circulant matrices A0, A1, A2, A3, A4. Owing to the properties of circulant matrices, the vectors x0, x1, x2, x3, x4 can be efficiently processed by a DFT (discrete Fourier transform), yielding the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$. According to the model-parameter formula (rendered as an image in the original), the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filter are calculated, where i = 0, 1, 2, 3, 4 marks the target area and the surrounding areas, the fixed learning-rate parameter is the constant $\lambda_1 = 0.015$, and the feature vectors x0, x1, x2, x3, x4 and the regression Gaussian model y are obtained from the target area frame S0. The $\hat{x}_i$ are subjected to a complex-conjugate transformation to give the corresponding $\hat{x}_i^{*}$, and $\hat{y}$ is obtained by discrete Fourier transform of the regression Gaussian model y. According to the response formula (rendered as an image in the original), the response map r of the current frame is obtained, with the same size as S0. Here z denotes the feature vector of the current frame obtained from the previous frame's target area frame S0 (also called the basic search area S0 of the current frame); the fixed penalty-factor parameter is $\lambda_2 = 25$; x0, x1, x2, x3, x4 denote the feature vectors of the regions S0, S1, S2, S3, S4 in the previous frame; and $\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4$, obtained in the previous step, denote the model parameters of the correlation filter trained in the previous frame. The maximum point of the response map r gives the relative displacement of the target motion, from which the target area frame S0 of the current frame is recovered; this target area frame then serves as the basic search area S0 of the next frame. Steps S101 to S108 are repeated in this loop, updating the model so as to realize the entire tracking process.
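Because the patent's two formulas are reproduced only as images, the sketch below fills them in with a standard correlation-filter reading consistent with the surrounding text: per-region ridge regression in the Fourier domain for the model parameters (with regularizer $\lambda_1$), and a target response penalized by the background responses (weighted by $\lambda_2$). Both closed forms are assumptions, as are the helper names:

```python
import numpy as np

LAMBDA1 = 0.015   # fixed parameter ("learning rate") from the patent
LAMBDA2 = 25.0    # fixed parameter ("penalty factor") from the patent

def train_filters(xs, y_hat):
    """Assumed model-parameter formula: standard ridge regression in the
    Fourier domain, alpha_i = y_hat / (x_hat_i* . x_hat_i + lambda1),
    one filter per region S0..S4."""
    alphas = []
    for x in xs:                          # xs = [x0, x1, x2, x3, x4]
        x_hat = np.fft.fft(x)
        alphas.append(y_hat / (np.conj(x_hat) * x_hat + LAMBDA1))
    return alphas

def response_map(z, xs, alphas):
    """Assumed response formula: the S0 response divided by the
    lambda2-penalized magnitude of the background responses S1..S4."""
    z_hat = np.fft.fft(z)
    def region_response(x, a):
        return np.fft.ifft(np.conj(np.fft.fft(x)) * z_hat * a)
    r0 = region_response(xs[0], alphas[0]).real
    ctx = sum(np.abs(region_response(x, a))
              for x, a in zip(xs[1:], alphas[1:]))
    return r0 / (1.0 + LAMBDA2 * ctx)

def displacement(r):
    """The position of the maximum of r is the relative shift of the
    target (1-D here; use np.unravel_index for a 2-D response map)."""
    return int(np.argmax(r))
```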
A specific embodiment is as follows:
As shown in fig. 3 and figs. 5 to 7, the target area frame S0 of the first frame of the video is obtained, usually by means of a detection module; according to the size of the target area frame S0, a Gaussian-shaped regression model y with bandwidth proportional to the size of S0 is constructed, and a DFT (discrete Fourier transform) is applied to y to obtain $\hat{y}$.
Step a: with S0 as the basic search area of the next frame, i.e. the second frame, 4 extended search areas S1, S2, S3, S4 of the same rectangular size are obtained around S0. The tracked target is the target vehicle on the left-hand side.
Step b: in the second frame of the video, HOG (histogram of oriented gradients) features are extracted from S0, S1, S2, S3, S4 respectively, and the feature vectors x0, x1, x2, x3, x4 obtained from the 5 regions represent the respective samples; the analog content of the concrete video picture is thus converted into computable digital quantities. The feature vectors x0, x1, x2, x3, x4 are cyclically shifted to obtain the corresponding circulant matrices A0, A1, A2, A3, A4. Owing to the properties of circulant matrices, the vectors x0, x1, x2, x3, x4 can be efficiently processed by a DFT, yielding the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$.
Step c: according to the model-parameter formula (rendered as an image in the original), the model parameters are calculated; then, according to the response formula (rendered as an image in the original), the response map r of the second frame is obtained, with the same size as S0, where z denotes the feature vector of the second frame obtained from the target area frame S0 of the first frame (also called the basic search area S0 of the current frame).
The relative displacement of the target motion is found from the maximum point of the response map r; the target area frame S0 of the current second frame is restored, and this area is taken as the basic search area S0 of the third frame. Steps a, b and c are repeated for the third frame, and so on, realizing the entire tracking process.
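Steps a-c chain into the per-frame loop below, reusing the sketch helpers defined earlier (extract_search_regions, hog_vectors, train_filters, response_map, displacement); gaussian_label and shift_box are hypothetical helpers standing in for the Gaussian regression model y and the box update, which the patent describes but does not spell out:

```python
def track(frames, init_box):
    """Step a-c loop: the box recovered in frame t becomes the basic
    search area S0 of frame t+1 (a sketch under the assumptions above)."""
    box = init_box
    y_hat = np.fft.fft(gaussian_label(box))     # hypothetical label builder
    xs = hog_vectors(extract_search_regions(frames[0], box))
    for frame in frames[1:]:
        alphas = train_filters(xs, y_hat)       # model from previous frame
        z = hog_vectors(extract_search_regions(frame, box))[0]  # sample z
        r = response_map(z, xs, alphas)
        box = shift_box(box, displacement(r))   # hypothetical box update
        xs = hog_vectors(extract_search_regions(frame, box))    # next frame
        yield box
```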
Fig. 3 gives the relevant parameters and formulas in the detailed explanation of the technical solution of the invention as applied to actual camera monitoring. As shown in figs. 5-7, the tracked video object is a white car, the frames are taken from the video during tracking, and the target frame represents the tracking box. The method is used for single-target tracking and, combined with a detection module, can also be applied to a multi-target tracking system, which is not described again here.
The application also provides an intelligent device, the intelligent device comprising:
a detection unit for detecting a video stream to acquire a target area frame S0;
a construction-operation unit for constructing a regression model from the target area frame S0 and performing an operation transform on it;
an acquisition unit for obtaining, in the next frame of the video stream, a plurality of extended search areas around the target area frame S0;
an extraction unit for extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
a calculation unit for calculating the model parameters of the filter from the feature vectors and calculating the response map from the model parameters and the feature vectors;
the acquisition unit being further configured to obtain the relative displacement of the target motion from the response map;
a restoring unit for restoring the target area of the current frame from the relative displacement;
and a repetition unit for tracking the target object by repeating the cycle.
Furthermore, the detection unit is further configured to obtain the target area frame S0 after the first frame of the video is detected;
the construction-operation unit is further configured to construct, according to the size of the target area frame S0, a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0;
the calculation unit is further configured to apply a discrete Fourier transform to y to obtain $\hat{y}$;
the acquisition unit is further configured to take S0 as the basic search area of the next frame and to obtain 4 extended search areas S1, S2, S3, S4 of the same rectangular size around it;
the extraction unit is configured to extract directional gradient histogram features from the basic search area S0 and the extended search areas S1, S2, S3, S4 respectively, obtaining the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
Furthermore, the connection unit is further configured to connect, from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region.
Furthermore, the calculation unit is further configured to apply a discrete Fourier transform to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$; and, according to the model-parameter formula (rendered as an image in the original), to calculate the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4, wherein i indexes the target areas S0, S1, S2, S3, S4 and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$; and, according to the response formula (rendered as an image in the original), to obtain the response map r of the current frame, where z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0 and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
the acquisition unit is further configured to locate the maximum point of the response map r to obtain the relative displacement of the target motion;
the restoring unit is configured to restore, according to the relative displacement, the target area frame S0 of the current frame as the basic search area S0 of the next frame.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A method for video object tracking based on correlation filtering, the method comprising:
S101, detecting a video stream to acquire a target area frame S0;
S102, constructing a regression model from the target area frame S0 and performing an operation transform on it;
S103, in the next frame of the video stream, obtaining a plurality of extended search areas around the target area frame S0;
S104, extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
S105, calculating model parameters of the filter from the feature vectors;
S106, calculating a response map from the model parameters and the feature vectors;
S107, obtaining the relative displacement of the target motion from the response map;
S108, restoring the target area frame of the current frame from the relative displacement;
S109, taking the target area frame of step S108 as the target area frame S0 of the next frame and repeating steps S103 to S108, thereby achieving tracking of the target object.
2. The video object tracking method according to claim 1, wherein, with respect to detecting the video stream to acquire the target area frame S0, constructing a regression model from the target area frame S0 and performing an operation transform, obtaining a plurality of extended search areas around the target area frame S0 in the next frame of the video stream, and extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform, the method further comprises:
obtaining the target area frame S0 after the first frame of the video is detected;
according to the size of the target area frame S0, constructing a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0, and applying a discrete Fourier transform to y to obtain $\hat{y}$;
with S0 as the basic search area of the next frame, obtaining 4 extended search areas S1, S2, S3, S4 of the same rectangular size around it;
from the basic search area S0 and the extended search areas S1, S2, S3, S4, extracting directional gradient histogram features respectively to obtain the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
3. The video object tracking method of claim 2, further comprising:
from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, connecting each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region.
4. The method of claim 3, wherein the model parameters of the filter are calculated from the feature vectors, the response map is calculated from the model parameters and the feature vectors, the relative displacement of the target motion is obtained from the response map, and the target area of the current frame is restored from the relative displacement, the method further comprising:
applying a discrete Fourier transform to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$;
according to the model-parameter formula (rendered as an image in the original), calculating the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4;
wherein i indexes the target areas S0, S1, S2, S3, S4, and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$;
according to the response formula (rendered as an image in the original), obtaining the response map r of the current frame;
wherein z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0, and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
locating the maximum point of the response map r to obtain the relative displacement of the target motion;
restoring, according to the relative displacement, the target area frame S0 of the current frame as the basic search area S0 of the next frame.
5. An intelligent device, comprising:
a detection unit for detecting a video stream to acquire a target area frame S0;
a construction-operation unit for constructing a regression model from the target area frame S0 and performing an operation transform on it;
an acquisition unit for obtaining, in the next frame of the video stream, a plurality of extended search areas around the target area frame S0;
an extraction unit for extracting feature vectors from the target area frame S0 and the plurality of extended search areas and performing an operation transform on them;
a calculation unit for calculating the model parameters of the filter from the feature vectors and calculating the response map from the model parameters and the feature vectors;
the acquisition unit being further configured to obtain the relative displacement of the target motion from the response map;
a restoring unit for restoring the target area of the current frame from the relative displacement;
and a repetition unit for tracking the target object by repeating the cycle.
6. The intelligent device according to claim 5, wherein:
the detection unit is further configured to obtain the target area frame S0 after the first frame of the video is detected;
the construction-operation unit is further configured to construct, according to the size of the target area frame S0, a Gaussian-shaped regression model y whose bandwidth is proportional to the size of S0;
the calculation unit is further configured to apply a discrete Fourier transform to y to obtain $\hat{y}$;
the acquisition unit is further configured to take S0 as the basic search area of the next frame and obtain 4 extended search areas S1, S2, S3, S4 of the same rectangular size around it;
the extraction unit is configured to extract directional gradient histogram features from the basic search area S0 and the extended search areas S1, S2, S3, S4 respectively to obtain the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors.
7. The intelligent device according to claim 6, further comprising:
a connection unit for connecting, from the shifted samples x0, x1, x2, x3, x4 of the corresponding feature vectors, each shifted sample by cyclic shift into the circulant matrix A0, A1, A2, A3, A4 of its respective region.
8. The intelligent device according to claim 7, wherein:
the calculation unit is configured to apply a discrete Fourier transform to the circulant matrices A0, A1, A2, A3, A4 to obtain the frequency-domain representations $\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4$;
and is further configured, according to the model-parameter formula (rendered as an image in the original), to calculate the model parameters $\alpha = \{\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ of the correlation filters corresponding to the regions S0, S1, S2, S3, S4, wherein i indexes the target areas S0, S1, S2, S3, S4 and $\lambda_1$ is a fixed parameter (learning rate) with value 0.015; the $\hat{x}_i$ are subjected to a complex-conjugate transformation to obtain the corresponding $\hat{x}_i^{*}$;
and is further configured, according to the response formula (rendered as an image in the original), to obtain the response map r of the current frame, where z denotes the shifted sample of the current frame obtained from the previous frame's target area frame S0 and $\lambda_2$ is a fixed parameter (penalty factor) with value 25;
the acquisition unit is further configured to locate the maximum point of the response map r to obtain the relative displacement of the target motion;
the restoring unit is configured to restore, according to the relative displacement, the target area frame S0 of the current frame as the basic search area S0 of the next frame.
CN201710742685.0A 2017-08-25 2017-08-25 Video object tracking method based on correlation filtering and intelligent device thereof Active CN107590820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710742685.0A CN107590820B (en) 2017-08-25 2017-08-25 Video object tracking method based on correlation filtering and intelligent device thereof


Publications (2)

Publication Number Publication Date
CN107590820A CN107590820A (en) 2018-01-16
CN107590820B 2020-06-02

Family

ID=61042885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710742685.0A Active CN107590820B (en) 2017-08-25 2017-08-25 Video object tracking method based on correlation filtering and intelligent device thereof

Country Status (1)

Country Link
CN (1) CN107590820B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961311B (en) * 2018-06-20 2021-06-22 吉林大学 Dual-mode rotor craft target tracking method
CN109064493B (en) * 2018-08-01 2021-03-09 苏州飞搜科技有限公司 Target tracking method and device based on meta-learning
CN111311645A (en) * 2020-02-25 2020-06-19 四川新视创伟超高清科技有限公司 Ultrahigh-definition video cut target tracking and identifying method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015163830A1 (en) * 2014-04-22 2015-10-29 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Target localization and size estimation via multiple model learning in visual tracking
CN105447887A (en) * 2015-11-06 2016-03-30 掌赢信息科技(上海)有限公司 Historical-route-based target tracking method and electronic equipment
CN106204639A (en) * 2016-06-27 2016-12-07 开易(北京)科技有限公司 Based on frequency domain regression model target tracking method, system and senior drive assist system
CN106296723A (en) * 2015-05-28 2017-01-04 展讯通信(天津)有限公司 Target location method for tracing and device
CN106570893A (en) * 2016-11-02 2017-04-19 中国人民解放军国防科学技术大学 Rapid stable visual tracking method based on correlation filtering
CN106570486A (en) * 2016-11-09 2017-04-19 华南理工大学 Kernel correlation filtering target tracking method based on feature fusion and Bayesian classification
CN106815859A (en) * 2017-01-13 2017-06-09 大连理工大学 Target tracking algorism based on dimension self-adaption correlation filtering and Feature Points Matching
CN106934338A (en) * 2017-01-09 2017-07-07 浙江汉凡软件科技有限公司 A kind of long-term pedestrian tracting method based on correlation filter

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101767564B1 (en) * 2015-11-12 2017-08-11 성균관대학교산학협력단 A method of analysing images of rod-like particles



Similar Documents

Publication Publication Date Title
CN113269237B (en) Assembly change detection method, device and medium based on attention mechanism
CN110991272B (en) Multi-target vehicle track recognition method based on video tracking
CN108734723B (en) Relevant filtering target tracking method based on adaptive weight joint learning
CN107358623B (en) Relevant filtering tracking method based on significance detection and robustness scale estimation
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN111311647B (en) Global-local and Kalman filtering-based target tracking method and device
CN107452015B (en) Target tracking system with re-detection mechanism
CN107169994B (en) Correlation filtering tracking method based on multi-feature fusion
CN107590820B (en) Video object tracking method based on correlation filtering and intelligent device thereof
CN110175649B (en) Rapid multi-scale estimation target tracking method for re-detection
CN109087333B (en) Target scale estimation method and device based on correlation filtering tracking algorithm
Feng et al. A novel saliency detection method for wild animal monitoring images with WMSN
CN111028263A (en) Moving object segmentation method and system based on optical flow color clustering
CN108898619B (en) Target tracking method based on PVANET neural network
CN109410246B (en) Visual tracking method and device based on correlation filtering
CN112233143B (en) Target tracking method, device and computer readable storage medium
CN113139904B (en) Image blind super-resolution method and system
CN113989556A (en) Small sample medical image classification method and system
CN110689559B (en) Visual target tracking method based on dense convolutional network characteristics
CN112767450A (en) Multi-loss learning-based related filtering target tracking method and system
CN109241932B (en) Thermal infrared human body action identification method based on motion variance map phase characteristics
CN114708307B (en) Target tracking method, system, storage medium and device based on correlation filter
CN113706580B (en) Target tracking method, system, equipment and medium based on relevant filtering tracker
CN113763415B (en) Target tracking method, device, electronic equipment and storage medium
CN113033356A (en) Scale-adaptive long-term correlation target tracking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 2020-04-27
Address after: 730000 A3-013, Zone A, 704 Jiatan Road, Chengguan District, Lanzhou City, Gansu Province (station)
Applicant after: Lanzhou Feisou Information Technology Co., Ltd.
Address before: 100082 Beijing, Haidian District, Xitucheng Road 10, High-Tech Mansion BUPT, 1209
Applicant before: BEIJING FEISOU TECHNOLOGY Co., Ltd.
GR01 Patent grant