CN110060280B - Target tracking method based on appearance self-adaptive spatial regularization correlation filter - Google Patents
- Publication number: CN110060280B (Application No. CN201910349109.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
Abstract
The invention discloses a target tracking method based on an appearance self-adaptive spatially regularized correlation filter, comprising the following steps: segmenting the tracking-result image block into target and background with an online K-means clustering algorithm to obtain a target-area template; generating a spatial regularization weight matrix from the target-area template; and training a spatially regularized correlation filter with the alternating direction method of multipliers (ADMM) to track the target. The invention can effectively restrict what the correlation filter learns, reduce the background information absorbed by the filter, and suppress its boundary effect. Compared with the conventional spatially regularized correlation filter, the target area and the background area are suppressed to different degrees more accurately; the search range of the correlation filter can be enlarged, improving its robustness to large target displacements.
Description
Technical Field
The invention relates to a target tracking method based on an appearance self-adaptive spatial regularization correlation filter, and belongs to the technical field of video target tracking.
Background
Target tracking is of great significance to the development of robotics, unmanned aerial vehicles, autonomous driving, navigation, guidance and other fields. For example, in human-computer interaction, a camera continuously tracks human behavior, and the robot interprets human posture, motion and gestures through a series of analysis steps, enabling friendlier communication between human and machine. In UAV target tracking, visual information about the target is continuously acquired and transmitted to a ground control station, and the video image sequence is analyzed algorithmically to obtain the real-time position of the tracked target, ensuring that it stays within the UAV's field of view.
When tracking an object in a video with the KCF algorithm, fast or abrupt motion of the tracked object may carry it outside the search area of the KCF algorithm, causing tracking failure. One way to ensure that the search area covers the target is to enlarge it, but this introduces boundary effects that cause the filter to learn too much background information. An algorithm is therefore needed that can both enlarge the search area and suppress the background information learned by the filter.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: although the conventional spatially regularized correlation filtering algorithm can mitigate the boundary effect, its regularization strategy applies different penalty coefficients to the filter purely according to the positions of the filter coefficients, which does not help the filter learn target information. A regularization strategy is therefore needed that can both suppress the boundary effect and better learn target information.
In order to solve the above technical problem, a technical solution of the present invention is to provide a target tracking method based on an appearance adaptive spatial regularization correlation filter, which is characterized by comprising the following steps:
(1) initializing the learning rate of the filter, the maximum number of iterations of the ADMM algorithm, the Lagrange penalty factor and the size of the search box;
(2) extracting an image block containing the target from the t-th frame image, arranging each pixel of the image block into a sample, and putting all sample points in order into an array D;
(3) clustering the sample points in the array D with the K-means algorithm, specifically set as follows: the similarity of sample points is measured by Euclidean distance, and the initial 5 centroids are designated as the four vertices of the rectangular image block and the center point of the rectangle; the class of each sample point is finally obtained;
(4) arranging the sample points, in their original order, into a matrix P of the same size as the original image block, where each element of P is the class value assigned to the corresponding sample point in step (3); intercepting from P a matrix P1 having the same center point as P but whose size is 0.6 times the current target size; counting and sorting the number of sample points of each class in P1, and taking the most numerous class as the class in which the target is located, named the target class; if the current frame is not the first frame, additionally adding the class closest to all previous target classes into the target class; setting each element of P to 1 if its class belongs to the target class and to 0 otherwise, thereby obtaining a Mask matrix of the region where the target is located; and, according to the Mask matrix, resetting positions with value 1 to 0.01 and positions with value 0 to 100000, yielding the weight matrix w;
(5) solving the filter by using an alternating direction multiplier method, wherein an objective function L (f, g) of the filter is as follows:
where f is the filter, g is the auxiliary variable, y is the label generated by a Gaussian function, x_t^d represents the d-th feature channel of the target image block of the t-th frame, D represents the total number of feature channels, and mu is the Lagrange penalty factor of the Lagrange multiplier term;
the ADMM algorithm solves the objective function by iteratively solving the following subproblems:
the above subproblems are all closed-form solutions:
the horizontal line on the matrix represents the frequency domain form of the matrix, and the elements of the matrix N are all 1;
(6) the filter trained in step (5) is recorded, and the previous filter is updated according to the following formula:
in the formula (I), the compound is shown in the specification,and eta is the learning rate of the filter.
(7) if the t-th frame is not the last frame, use h_t^i to score the candidate samples to obtain a response map, take the position with the maximum response value as the target center position, and return to step (2); otherwise, end the tracking.
Preferably, in step (2), each sample comprises 5 dimensions, arranged in the order: R channel value, G channel value, B channel value, X-axis coordinate and Y-axis coordinate; each dimension is normalized so that its values are distributed in the [0,1] interval.
The invention can effectively restrict what the correlation filter learns, reduce the background information absorbed by the filter, and suppress its boundary effect. Compared with the conventional spatially regularized correlation filter, the target area and the background area are suppressed to different degrees more accurately; the search range of the correlation filter can be enlarged, improving its robustness to large target displacements.
Drawings
FIG. 1 is a flowchart of the algorithm implemented by the present invention;
FIG. 2 is the process of obtaining the weight matrix according to an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
With reference to fig. 1, the target tracking method based on the appearance adaptive spatial regularization correlation filter provided by the present invention includes the following steps:
(1) Initialize the learning rate of the filter, the maximum number of iterations of the ADMM algorithm, the Lagrange penalty factor and the size of the search box.
(2) Extract an image block containing the target from the t-th frame image, and arrange each pixel of the image block into a sample. Each sample contains 5 dimensions, arranged in the order: R channel value, G channel value, B channel value, X-axis coordinate and Y-axis coordinate. The first 3 dimensions are normalized per dimension so that their values are distributed in the [0,1] interval. Finally, all sample points are put in order into the array D.
(3) Cluster the sample points in the array D with the K-means algorithm, specifically set as follows: the similarity of sample points is measured by Euclidean distance, and the initial 5 centroids are designated as the four vertices of the rectangular image block and the center point of the rectangle. The class of each sample point is finally obtained.
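Steps (2)–(3) can be sketched as follows. This is an illustrative sketch, not the patent's code: the function names, the plain-NumPy K-means, and the 20-iteration cap are assumptions, and the X/Y coordinates are left unnormalized as in the detailed description.

```python
import numpy as np

def build_samples(patch):
    """Arrange each pixel of an H x W RGB patch into a 5-D sample
    (R, G, B, x, y); the first 3 dimensions are scaled into [0, 1]."""
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    rgb = patch.reshape(-1, 3).astype(np.float64) / 255.0
    return np.column_stack([rgb, xs.ravel(), ys.ravel()])

def corner_center_init(samples, h, w):
    """The 5 initial centroids of step (3): the four vertices of the
    rectangular patch and its center point."""
    idx = [0, w - 1, (h - 1) * w, h * w - 1, (h // 2) * w + w // 2]
    return samples[idx]

def kmeans_fixed_init(samples, centroids, n_iter=20):
    """Plain K-means using Euclidean distance, with the initial
    centroids specified by the caller; returns each sample's class."""
    c = centroids.astype(np.float64).copy()
    for _ in range(n_iter):
        # Distance of every sample to every centroid, then assign.
        d = np.linalg.norm(samples[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(c)):
            if np.any(labels == k):  # skip empty clusters
                c[k] = samples[labels == k].mean(axis=0)
    return labels
```

Fixing the initial centroids to the corners and center (rather than random seeding) makes the clustering deterministic from frame to frame, which is what lets the target class be matched against previous frames in step (4).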
(4) Arrange the sample points, in their original order, into a matrix P of the same size as the original image block, where each element of P is the class value assigned to the corresponding sample point in step (3). Intercept from P a matrix P1 having the same center point as P but whose size is 0.6 times the current target size. Count and sort the number of sample points of each class in P1, and take the most numerous class as the class in which the target is located, named the target class. In addition, if the current frame is not the first frame, the class closest to all previous target classes is also added into the target class. Set each element of P to 1 if its class belongs to the target class and to 0 otherwise, obtaining a Mask matrix of the region where the target is located; then, according to the Mask matrix, reset positions with value 1 to 0.01 and positions with value 0 to 100000, yielding the weight matrix w.
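The mask-and-weight construction of step (4) might look like the sketch below. The helper name `weight_matrix` is hypothetical, and the handling of previous frames is simplified: earlier target classes are reused directly, whereas the patent adds the single class closest to all previous target classes (which requires the cluster centroids).

```python
import numpy as np

def weight_matrix(P, target_h, target_w, prev_target_classes=()):
    """Build the Mask matrix and weight matrix w of step (4) from the
    class-label matrix P.  P1 is a centered crop of P whose size is
    0.6 times the current target size; the most frequent class inside
    P1 is taken as the target class."""
    h, w = P.shape
    ch = max(1, int(0.6 * target_h))
    cw = max(1, int(0.6 * target_w))
    top, left = (h - ch) // 2, (w - cw) // 2
    P1 = P[top:top + ch, left:left + cw]

    counts = np.bincount(P1.ravel(), minlength=5)
    target_classes = {int(counts.argmax())}
    # Simplification: reuse previous target classes directly.
    target_classes.update(prev_target_classes)

    mask = np.isin(P, list(target_classes)).astype(np.float64)
    # Positions inside the target get a tiny penalty (0.01); the
    # background gets a huge one (100000), as specified in the patent.
    w_mat = np.where(mask == 1, 0.01, 100000.0)
    return mask, w_mat
```

The extreme 0.01 / 100000 split effectively zeroes the filter outside the segmented target region, which is how the method suppresses the boundary effect while still allowing a large search window.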
(5) Solve the filter with the alternating direction method of multipliers, where the objective function of the filter is as follows:
where f is the filter, g is the auxiliary variable, y is the label generated by a Gaussian function, x_t^d represents the d-th feature channel of the target image block of the t-th frame, D represents the total number of feature channels, and mu is the Lagrange penalty factor of the Lagrange multiplier term. The ADMM algorithm solves the objective function by iteratively solving the following subproblems:
the subproblems in the above equation all have a closed form solution, as shown in the following equation:
in the above formula, the horizontal lines on the matrix represent the frequency domain form of the matrix, and the elements of the matrix N are all 1.
(6) The filter trained in step (5) is recorded, and the previous filter is updated according to the following formula:
in the formula (I), the compound is shown in the specification,and eta is the learning rate of the filter.
(7) If the t-th frame is not the last frame, use h_t^i to score the candidate samples to obtain a response map, take the position with the maximum response value as the target center position, and return to step (2); otherwise, end the tracking.
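Scoring in step (7) can be sketched as a frequency-domain correlation summed over feature channels, followed by an argmax over the response map. This assumes the standard correlation-filter scoring procedure; the function name and array layout are illustrative, not taken from the patent.

```python
import numpy as np

def locate_target(filters, features):
    """Step (7) sketch: sum per-channel frequency-domain correlations,
    back-transform to a spatial response map, and take the argmax as
    the new target center.  Both inputs are (D, H, W) real arrays."""
    F = np.fft.fft2(features, axes=(-2, -1))
    H = np.fft.fft2(filters, axes=(-2, -1))
    # conj(H) * F is circular cross-correlation in the frequency domain.
    response = np.real(np.fft.ifft2(np.sum(np.conj(H) * F, axis=0)))
    peak = np.unravel_index(response.argmax(), response.shape)
    return peak, response
```

Correlating the filter with itself peaks at zero shift, which is a quick sanity check that the correlation (rather than convolution) convention is being used.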
Fig. 2 illustrates the process of obtaining the weight matrix in this embodiment of the present invention.
Claims (2)
1. A target tracking method based on an appearance self-adaptive spatial regularization correlation filter is characterized by comprising the following steps:
(1) initializing the learning rate of the filter, the maximum number of iterations of the ADMM algorithm, the Lagrange penalty factor and the size of the search box;
(2) extracting an image block containing the target from the t-th frame image, arranging each pixel of the image block into a sample, and putting all sample points in order into an array M;
(3) clustering the sample points in the array M with the K-means algorithm, specifically set as follows: the similarity of sample points is measured by Euclidean distance, and the initial 5 centroids are designated as the four vertices of the rectangular image block and the center point of the rectangle; the class of each sample point is finally obtained;
(4) arranging the sample points, in their original order, into a matrix P of the same size as the original image block, where each element of P is the class value assigned to the corresponding sample point in step (3); intercepting from P a matrix P1 having the same center point as P but whose size is 0.6 times the current target size; counting and sorting the number of sample points of each class in P1, and taking the most numerous class as the class in which the target is located, named the target class; if the current frame is not the first frame, additionally adding the class closest to all previous target classes into the target class; setting each element of P to 1 if its class belongs to the target class and to 0 otherwise, thereby obtaining a Mask matrix of the region where the target is located; and, according to the Mask matrix, resetting positions with value 1 to 0.01 and positions with value 0 to 100000, yielding the weight matrix w;
(5) solving the filter by using an alternating direction multiplier method, wherein an objective function L (f, g) of the filter is as follows:
where f is the filter, g is the auxiliary variable, y is the label generated by a Gaussian function, x_t^d represents the d-th feature channel of the target image block x_t of the t-th frame, D represents the total number of feature channels, and mu is the Lagrange penalty factor of the Lagrange multiplier term;
the ADMM algorithm solves the objective function by iteratively solving the following subproblems:
the above subproblems are all closed-form solutions:
the horizontal line on the matrix represents the frequency domain form of the matrix, and the elements of the matrix N are all 1;
(6) the filter trained in step (5) is recorded, and the previous filter is updated according to the following formula:
where h_t^i denotes the filter of the i-th feature of the t-th frame, and eta is the learning rate of the filter.
2. The target tracking method based on an appearance self-adaptive spatial regularization correlation filter according to claim 1, wherein in step (2) each sample comprises 5 dimensions, arranged in the order: R channel value, G channel value, B channel value, X-axis coordinate and Y-axis coordinate; each dimension is normalized so that its values are distributed in the [0,1] interval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910349109.9A CN110060280B (en) | 2019-04-28 | 2019-04-28 | Target tracking method based on appearance self-adaptive spatial regularization correlation filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110060280A CN110060280A (en) | 2019-07-26 |
CN110060280B true CN110060280B (en) | 2021-03-30 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104966304A (en) * | 2015-06-08 | 2015-10-07 | 深圳市赛为智能股份有限公司 | Kalman filtering and nonparametric background model-based multi-target detection tracking method |
US9558712B2 (en) * | 2014-01-21 | 2017-01-31 | Nvidia Corporation | Unified optimization method for end-to-end camera image processing for translating a sensor captured image to a display image |
CN107818573A (en) * | 2016-09-12 | 2018-03-20 | 杭州海康威视数字技术股份有限公司 | A kind of method for tracking target and device |
CN108664918A (en) * | 2018-05-09 | 2018-10-16 | 吉林大学 | Pedestrian tracting method in front of intelligent vehicle based on context-aware correlation filter |
CN108776975A (en) * | 2018-05-29 | 2018-11-09 | 安徽大学 | A kind of visual tracking method based on semi-supervised feature and filter combination learning |
CN108986139A (en) * | 2018-06-12 | 2018-12-11 | 南京师范大学 | A kind of band for target following is made a difference the feature integration method of figure |
Non-Patent Citations (4)
- Galoogahi H K et al., "Correlation Filters with Limited Boundaries", IEEE, 2015-10-15.
- Fan H et al., "Robust Visual Tracking via Local-Global Correlation Filter", ResearchGate, 2017-02-04.
- Chen Qianru, "Robust target tracking with adaptive fusion of multiple correlation filters", Journal of Image and Graphics, vol. 23, no. 2, 2018-04-09.
- Meng Yeming, "Research on drift suppression for infrared dim small target tracking in imaging detection systems", China Masters' Theses Full-text Database, Information Science and Technology, no. 03, 2016-03-15.
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |