CN111260738A - Multi-scale target tracking method based on correlation filtering and adaptive feature fusion - Google Patents

Multi-scale target tracking method based on correlation filtering and adaptive feature fusion

Info

Publication number
CN111260738A
CN111260738A (application CN202010017064.8A)
Authority
CN
China
Prior art keywords
target
image
scale
hog
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010017064.8A
Other languages
Chinese (zh)
Inventor
唐晨
邱岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010017064.8A
Publication of CN111260738A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision. It aims to achieve target tracking in complex scenes, to describe the target by fully exploiting its color and gradient features, to improve robustness under conditions such as scale change and occlusion, and to reach higher precision. The invention discloses a multi-scale target tracking method based on correlation filtering and adaptive feature fusion, which comprises the following steps. Step 1: input the image of the current frame t. Step 2: extract the gradient features and color features of the sample. Step 3: convolve the filter h obtained from the previous frame with the candidate target image block x of the current frame. Step 4: adaptively assign weights to the HOG and CN features according to the magnitudes of their response values. Step 5: extract HOG features of the multi-scale image blocks. Step 6: mark the target of the current frame with a rectangular box. Step 7: update the translation filter and the scale filter by linear interpolation, finally achieving target tracking. The invention is mainly applied to target tracking applications.

Description

Multi-scale target tracking method based on correlation filtering and adaptive feature fusion
Technical Field
The invention belongs to the field of computer vision and relates to a multi-scale target tracking algorithm based on correlation filtering and adaptive feature fusion.
Background
Target tracking, which in essence is the robust estimation of the motion state of a moving target in every frame of an image sequence, is one of the popular research topics in computer vision and finds wide application in that field; for example, in human-computer interaction, intelligent traffic monitoring, military guidance and robotics, target tracking technology plays a key role. Although the target tracking problem has been studied for decades and much progress has been made in recent years, it remains very challenging. Many factors affect the performance of a tracking algorithm, such as occlusion, illumination change, scale change and rapid motion, and finding suitable methods to solve the tracking problem in these complex scenes has become a key and difficult point of research.
Methods for solving the target tracking problem can be divided into two broad categories: generative and discriminative. A generative method models the target region in the current frame and takes the region most similar to the model in the next frame as the predicted position; well-known examples include the Kalman filter, the particle filter and Mean Shift. A discriminative method treats tracking as a binary classification problem between target and background in each frame: the target region is taken as a positive sample and the background region as negative samples, a classifier is trained by continuously collecting and updating positive and negative samples during tracking, and the candidate with the maximum classifier response in the current frame is selected as the tracking result for that frame. Among discriminative methods, correlation filtering has been widely used in recent years and is represented by the kernelized correlation filter tracking algorithm (KCF).
Although the traditional correlation filtering algorithm has a small computational cost, it has several significant shortcomings in practical applications:
(1) A target cannot be described well using only a single feature.
(2) The scale of the target cannot be estimated accurately, so the scale variation problem cannot be handled.
(3) When the target is occluded, the tracking accuracy drops noticeably.
(4) The learning rate is a fixed value, so the model cannot be updated effectively in certain scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to achieve target tracking in complex scenes by fully exploiting the color and gradient features of the target to describe it and by introducing a scale filter on top of the translation filter, so that robustness under scale change, occlusion and similar conditions is improved and the tracking result is more accurate. To this end, the technical solution adopted by the invention is a multi-scale target tracking method based on correlation filtering and adaptive feature fusion, comprising the following steps:
Step 1: input the image of the current frame t;
Step 2: collect image sample blocks and extract the gradient and color features of the samples, expressed as a histogram of oriented gradients (HOG) feature x_HOG and a color names (CN) feature x_CN;
Step 3: convolve the filter h obtained from the previous frame with the candidate target image block x of the current frame, then examine the response vectors of all test samples; the position of the maximum response is the predicted position of the target. For a given single target image block x, the response output of the classifier is given by equation (1), which is shown only as an image in the original. Substituting the image sample x_HOG expressing the HOG features and the image sample x_CN expressing the CN features into this formula yields the respective correlation filter response maps and the respective maximum response values R_HOG and R_CN;
Step 4: adaptively assign weights to the two features, HOG and CN, according to the magnitudes of their response values, with the weight computed as in equation (2) (shown only as an image in the original; the weight δ is determined by the relative magnitudes of R_CN and R_HOG). The final target tracking result is obtained by weighted fusion of the CN and HOG responses according to the weight assigned to each feature, as in equation (3):
R = δ·R_CN + (1 - δ)·R_HOG   (3)
where δ ∈ [0, 1]. If δ = 0, the tracking result uses only the HOG feature; if δ = 1, the tracking result uses only the color feature;
Step 5: extract HOG features from multi-scale image blocks, train a kernelized least-squares classifier with them to obtain a one-dimensional scale tracker, and finally search for the maximum output response R_S of the filter; the prediction of the scale is completed with equation (4), which is shown only as an image in the original, where the learning parameter and the elements of k_S likewise appear only as images, x_S is the target scale model learned from frame t-1, and the remaining symbol denotes the sample taken from the new frame;
Step 6: combine the predicted position R and the predicted scale R_S of the target, and mark the target of the current frame with a rectangular box;
Step 7: update the translation filter and the scale filter by linear interpolation, as in equations (5) to (7), which are shown only as images in the original; in them the learning parameters and the elements of k_j likewise appear only as images, γ is the adaptive learning rate, the predicted value of the target appears as an image, and λ is a regularization coefficient;
Step 8: input the next frame of image and process it according to the above steps, thereby achieving target tracking.
In a further technical solution, the specific extraction process of the HOG features in step 2 and step 5 is as follows:
(1) First convert the target image to grayscale.
(2) Normalize the color space of the input image by Gamma correction, which adjusts the image contrast, reduces the influence of local shadows and illumination changes, and suppresses interference caused by noise. The Gamma compression formula is I(x, y) = I(x, y)^Gamma, where Gamma = 1/2;
(3) Compute the gradient of each pixel of the target image. The gradients of a pixel in the image are:
G_x(x, y) = H(x + 1, y) - H(x - 1, y)   (8)
G_y(x, y) = H(x, y + 1) - H(x, y - 1)   (9)
where G_x(x, y) and G_y(x, y) denote the horizontal and vertical gradients at pixel (x, y), and H(x, y) denotes the pixel value at (x, y) in the input image. The gradient magnitude and gradient direction at pixel (x, y) are:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2)   (10)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))   (11)
(4) Divide the image into small cell units of 4 x 4 pixels each, and construct a histogram of gradient directions for each cell unit;
(5) Group every 3 x 3 cell units into a block, and concatenate the feature vectors of the cell units within the block to obtain the gradient histogram of the block;
(6) Concatenate the gradient histogram feature vectors of all blocks in the target image to form the histogram of oriented gradients of the target image, i.e. the final feature vector used by the discriminative classifier;
In step 7, the specific process for selecting the adaptive learning rate γ is as follows:
The frame-difference method, i.e. the difference between two adjacent frames, is used as the basis for a piecewise choice of the learning rate. For an image block x of size m × n, each pixel is denoted x_ij, where 0 ≤ i ≤ m - 1 and 0 ≤ j ≤ n - 1. For the t-th frame image, the quantity q is computed from the inter-frame pixel differences (equation (12), shown only as an image in the original). When 0 < q < 2.5, the target picture changes little between the two adjacent frames, the target appearance model changes little, and a low learning rate is set to maintain good tracking; when 2.5 ≤ q < 8, the change of the target appearance model is normal and the conventional learning rate is set; when q ≥ 8, the appearance model changes markedly and a larger learning rate is set. The specific piecewise learning rate is given by equation (13), which is shown only as an image in the original.
the invention has the characteristics and beneficial effects that:
in the target tracking method provided by the invention, the description of the target model is not limited to a single gradient feature due to the introduction of the color feature, the problem of scale change is well solved by introducing the scale filter to predict the scale of the target, and the model can be updated to adapt to different changes of a scene due to the application of the self-adaptive learning rate. Therefore, the target tracking method provided by the invention has stronger robustness and reliability in a complex scene.
Drawings
FIG. 1 is a flow diagram of the tracking framework of the tracking algorithm provided by the invention;
FIG. 2 is a graph comparing partial tracking results on a BlurBody data set by the present invention and a KCF tracking algorithm.
FIG. 2(a) is a partial tracking result of a KCF tracking algorithm on a BlurBody data set;
FIG. 2(b) is a partial tracking result of the tracking algorithm provided by the present invention on the BlurBody data set;
FIG. 3 is a comparison of partial trace results on a CarScale data set by the present invention and a KCF trace algorithm.
FIG. 3(a) is the partial trace result of the KCF trace algorithm on the CarScale data set;
FIG. 3(b) is a partial trace result of the tracing algorithm provided by the present invention on the CarScale data set;
FIG. 4 is a comparison graph of partial tracking results on a Coke data set by the present invention and a KCF tracking algorithm.
FIG. 4(a) is the partial tracking result of the KCF tracking algorithm on the Coke data set;
FIG. 4(b) is a partial tracking result of the tracking algorithm provided by the present invention on the Coke data set;
Detailed Description
In order to overcome the defects of the prior art, the invention aims to achieve target tracking in complex scenes by fully exploiting the color and gradient features of the target to describe it and by introducing a scale filter on top of the translation filter, so that robustness under scale change, occlusion and similar conditions is improved and the tracking result is more accurate. To this end, the technical solution adopted by the invention is a multi-scale target tracking algorithm based on correlation filtering and adaptive feature fusion.
The specific steps are detailed as follows:
Step 1: input the image of the current frame t;
Step 2: collect image sample blocks and extract the gradient and color features of the samples, expressed as the HOG feature x_HOG and the CN feature x_CN.
The specific extraction process of the HOG features comprises the following steps:
(1) First convert the target image to grayscale.
(2) Normalize the color space of the input image by Gamma correction, which adjusts the image contrast, reduces the influence of local shadows and illumination changes, and suppresses interference caused by noise. The Gamma compression formula is I(x, y) = I(x, y)^Gamma, where Gamma = 1/2.
(3) Compute the gradient of each pixel of the target image; the main purpose is to capture basic contour information and further weaken interference caused by illumination. The gradients of a pixel in the image are:
G_x(x, y) = H(x + 1, y) - H(x - 1, y)   (1)
G_y(x, y) = H(x, y + 1) - H(x, y - 1)   (2)
where G_x(x, y) and G_y(x, y) denote the horizontal and vertical gradients at pixel (x, y), and H(x, y) denotes the pixel value at (x, y) in the input image. The gradient magnitude and gradient direction at pixel (x, y) are:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2)   (3)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))   (4)
(4) Divide the image into small cell units of 4 x 4 pixels each, and construct a histogram of gradient directions for each cell unit.
(5) Group every 3 x 3 cell units into a block, and concatenate the feature vectors of the cell units contained in the block to obtain the gradient histogram of the block.
(6) Concatenate the gradient histogram feature vectors of all blocks in the target image to form the histogram of oriented gradients of the target image, i.e. the final feature vector used by the discriminative classifier.
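For illustration only, the following Python sketch shows one possible way to compute the per-pixel gradients of equations (1) to (4) and the 4 x 4-pixel cell histograms described in items (3) and (4) above. It assumes numpy and OpenCV are available; the function name compute_hog_cells and the choice of 9 orientation bins are illustrative assumptions rather than values fixed by the invention, and the block grouping and concatenation of items (5) and (6) are omitted for brevity.

import numpy as np
import cv2

def compute_hog_cells(image, cell_size=4, n_bins=9):
    # Grayscale conversion and Gamma compression with Gamma = 1/2, as in steps (1) and (2)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gray = np.power(gray / 255.0, 0.5)
    # Central differences: G_x(x,y) = H(x+1,y) - H(x-1,y), G_y(x,y) = H(x,y+1) - H(x,y-1)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    magnitude = np.sqrt(gx ** 2 + gy ** 2)          # equation (3)
    orientation = np.arctan2(gy, gx)                # equation (4), in radians
    # Accumulate each pixel's gradient magnitude into the orientation histogram of its cell
    h, w = gray.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins), np.float32)
    bins = ((orientation + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    for i in range(cells_y):
        for j in range(cells_x):
            ys = slice(i * cell_size, (i + 1) * cell_size)
            xs = slice(j * cell_size, (j + 1) * cell_size)
            np.add.at(hist[i, j], bins[ys, xs].ravel(), magnitude[ys, xs].ravel())
    return hist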
Step 3: convolve the filter h obtained from the previous frame with the candidate target image block x of the current frame, then examine the response vectors of all test samples; the position of the maximum response is the predicted position of the target. For a given single target image block x, the response output of the classifier is
given by equation (5), which is shown only as an image in the original. Substituting the image sample x_HOG expressing the HOG features and the image sample x_CN expressing the CN features into this formula yields the respective correlation filter response maps and the respective maximum response values R_HOG and R_CN.
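Equation (5) appears only as an image in the original. As a hedged illustration, the sketch below computes a correlation response map in the standard kernelized correlation filter (KCF) style, with a Gaussian kernel evaluated in the Fourier domain; this is one common realization of this step and an assumption of the example, not a transcription of the patent's own formula. The inputs are assumed to be the learned filter coefficients alphaf, the stored appearance template x_model and a new feature patch z_patch, all of shape (height, width, channels), and the kernel width sigma is an illustrative value.

import numpy as np

def gaussian_correlation(xf, zf, sigma=0.5):
    # Gaussian kernel correlation of two feature maps given in the Fourier domain
    n = xf.shape[0] * xf.shape[1]
    xx = np.real(np.vdot(xf, xf)) / n               # ||x||^2 via Parseval's theorem
    zz = np.real(np.vdot(zf, zf)) / n
    xz = np.fft.ifft2(np.sum(xf * np.conj(zf), axis=2))   # per-channel cross-correlation, summed
    d = np.maximum(xx + zz - 2.0 * np.real(xz), 0) / xf.size
    return np.exp(-d / (sigma ** 2))

def filter_response(alphaf, x_model, z_patch):
    # Response map of the learned filter on a new candidate patch; its maximum gives R
    xf = np.fft.fft2(x_model, axes=(0, 1))
    zf = np.fft.fft2(z_patch, axes=(0, 1))
    kf = np.fft.fft2(gaussian_correlation(xf, zf))
    return np.real(np.fft.ifft2(alphaf * kf))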
Step 4: adaptively assign weights to the two features, HOG and CN, according to the magnitudes of their response values, with the weight computed as in equation (6):
(equation (6) is shown only as an image in the original; the weight δ is determined by the relative magnitudes of R_CN and R_HOG)
The final target tracking result is obtained by weighted fusion of the CN and HOG responses according to the weight assigned to each feature, as computed in equation (7):
R = δ·R_CN + (1 - δ)·R_HOG   (7)
where δ ∈ [0, 1]. If δ = 0, the tracking result uses only the HOG feature; if δ = 1, the tracking result uses only the color feature.
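Equation (6) appears only as an image in the original, so the concrete weighting rule in the sketch below, which takes δ as the CN response's share of the two maximum responses, is an assumption used purely for illustration; the weighted fusion itself follows equation (7) above.

import numpy as np

def fuse_responses(resp_hog, resp_cn):
    # Maximum responses R_HOG and R_CN of the two feature channels
    r_hog = float(resp_hog.max())
    r_cn = float(resp_cn.max())
    # Assumed weighting rule standing in for equation (6): the stronger response gets the larger weight
    delta = r_cn / (r_cn + r_hog + 1e-12)
    # Weighted fusion, equation (7): R = delta * R_CN + (1 - delta) * R_HOG
    fused = delta * resp_cn + (1.0 - delta) * resp_hog
    dy, dx = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (dy, dx), delta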
Step 5: extract HOG features from multi-scale image blocks, train a kernelized least-squares classifier with them to obtain a one-dimensional scale tracker, and finally search for the maximum output response R_S of the filter; the prediction of the scale is accomplished using equation (8).
Equation (8) is shown only as an image in the original; its learning parameter and the elements of k_S likewise appear only as images, x_S is the target scale model learned from frame t-1, and the remaining symbol denotes the sample taken from the new frame.
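Since equation (8) and the scale-filter parameters appear only as images in the original, the sketch below illustrates step 5 in the style of a DSST-type one-dimensional scale filter: HOG features are extracted from image blocks at a set of candidate scales and correlated with a learned scale filter, and the scale with the maximum response R_S is taken as the prediction. The scale parameters (33 candidate scales with step 1.02), the template size and the helper compute_hog_cells from the earlier sketch are assumptions made for illustration, not values fixed by the invention.

import numpy as np
import cv2

def scale_sample(frame, centre, base_size, n_scales=33, scale_step=1.02, template=(32, 32)):
    # HOG feature matrix of multi-scale image blocks: one column per candidate scale
    cy, cx = centre
    factors = scale_step ** (np.arange(n_scales) - n_scales // 2)
    columns = []
    for f in factors:
        h, w = max(int(base_size[0] * f), 2), max(int(base_size[1] * f), 2)
        y0, x0 = max(cy - h // 2, 0), max(cx - w // 2, 0)
        patch = frame[y0:y0 + h, x0:x0 + w]
        patch = cv2.resize(patch, template)                 # common template size for all scales
        columns.append(compute_hog_cells(patch).ravel())    # HOG feature of this scale (earlier sketch)
    return np.stack(columns, axis=1), factors               # shape (feature_dim, n_scales)

def predict_scale(scale_filter_f, sample_matrix, factors):
    # One-dimensional correlation over the scale dimension; the argmax gives R_S and the new scale
    sample_f = np.fft.fft(sample_matrix, axis=1)
    response = np.real(np.fft.ifft(np.sum(scale_filter_f * np.conj(sample_f), axis=0)))
    best = int(np.argmax(response))
    return factors[best], float(response[best])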
Step 6: combine the predicted position R and the predicted scale R_S of the target, and mark the target of the current frame with a rectangular box.
Step 7: update the translation filter and the scale filter by linear interpolation, as shown in equations (9) to (11):
Equations (9) to (11) are shown only as images in the original; in them the learning parameters and the elements of k_j likewise appear only as images, γ is the adaptive learning rate, the predicted value of the target appears as an image, and λ is a regularization coefficient.
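Because equations (9) to (11) appear only as images in the original, the sketch below uses the generic linear-interpolation update, model = (1 - γ)·old + γ·new, for the translation-filter coefficients, the appearance template and the scale filter. This matches the verbal description of the update step but is an assumed formulation rather than a transcription of the patent's equations, and the dictionary keys are illustrative names.

def linear_update(old, new, gamma):
    # Generic linear-interpolation update with learning rate gamma
    return (1.0 - gamma) * old + gamma * new

def update_models(model, new_estimate, gamma):
    # Update translation filter coefficients, appearance template and scale filter in place
    for key in ("alphaf", "x_template", "scale_filter_f"):
        model[key] = linear_update(model[key], new_estimate[key], gamma)
    return model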
The specific process for selecting the adaptive learning rate γ is as follows:
The frame-difference method, i.e. the difference between two adjacent frames, is used as the basis for a piecewise choice of the learning rate. For a frame image block x of size m × n, each pixel is denoted x_ij, where 0 ≤ i ≤ m - 1 and 0 ≤ j ≤ n - 1. For the t-th frame image:
The quantity q is computed from the inter-frame pixel differences (equation (12), shown only as an image in the original). When 0 < q < 2.5, the target picture changes little between the two adjacent frames, the target appearance model changes little, and a low learning rate is set to maintain good tracking; when 2.5 ≤ q < 8, the change of the target appearance model is normal and the conventional learning rate is set; when q ≥ 8, the appearance model changes markedly and a larger learning rate is set. The specific piecewise learning rate is given by equation (13), which is shown only as an image in the original.
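Equations (12) and (13) appear only as images in the original. In the sketch below, q is therefore taken as the mean absolute per-pixel difference between the two adjacent image blocks, and the three learning-rate values are purely illustrative placeholders; only the thresholds 2.5 and 8 come from the description above.

import numpy as np

def adaptive_learning_rate(block_t, block_t_minus_1,
                           gamma_low=0.01, gamma_normal=0.02, gamma_high=0.04):
    # Piecewise learning rate chosen from the frame difference of two adjacent image blocks
    diff = np.abs(block_t.astype(np.float32) - block_t_minus_1.astype(np.float32))
    q = float(diff.mean())            # assumed concrete realization of equation (12)
    if q < 2.5:                       # small appearance change: low learning rate
        return gamma_low
    elif q < 8.0:                     # normal change: conventional learning rate
        return gamma_normal
    return gamma_high                 # marked change: larger learning rate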
and 8: and inputting the next frame of image, and processing according to the steps to realize target tracking.
To verify the validity of the method, experimental results are given.
The tracking method provided by the invention was used to track video sequences from three groups of complex scenes, and the results were compared with those of the KCF algorithm under the same conditions; the partial tracking results obtained are shown in FIGS. 2, 3 and 4.
In the video sequence shown in FIG. 2, a person and the surrounding area are selected as the tracking target; the target in the video undergoes motion blur, and the scale of the person changes continuously during tracking. FIGS. 2(a) and 2(b) show the partial tracking results obtained by the KCF algorithm and by the tracking algorithm provided by the invention, respectively, with the same target region selected in the initial frame for both. The results in FIG. 2 show that the proposed algorithm locates the target well, with higher tracking accuracy and a good tracking effect, whereas the KCF algorithm begins to drift once the moving target becomes blurred, and the offset between the target center and the tracking box grows in subsequent frames, leading to tracking failure.
In the video sequence shown in FIG. 3, a car and its surrounding area are selected as the tracking target; the car moves from far to near during tracking and its scale changes continuously. FIGS. 3(a) and 3(b) show the partial tracking results obtained by the KCF algorithm and by the tracking algorithm provided by the invention, respectively, with the same target region selected in the initial frame for both. The results in FIG. 3 show that the proposed algorithm locates the target well, with higher tracking accuracy and a good tracking effect, whereas the KCF algorithm cannot estimate the scale in this sequence and the size of its tracking box never changes, which degrades the tracking accuracy.
In the video sequence shown in FIG. 4, a pop can and its surrounding area are selected as the tracking target; the target in the video is occluded several times. FIGS. 4(a) and 4(b) show the partial tracking results obtained by the KCF algorithm and by the tracking algorithm provided by the invention, respectively, with the same target region selected in the initial frame for both. The results in FIG. 4 show that the proposed algorithm locates the target well, with higher tracking accuracy and a good tracking effect, whereas the tracking box of the KCF algorithm drifts when the moving target is partially occluded and the target is not re-detected afterwards, so tracking fails.
Therefore, the multi-scale target tracking algorithm based on correlation filtering and adaptive feature fusion provided by the invention is more robust when tracking moving targets in complex scenes involving motion blur, scale change, occlusion and the like.
While the present invention has been described with reference to the drawings, the foregoing embodiments are illustrative rather than limiting, and those skilled in the art, having the benefit of the teachings herein, may make numerous modifications thereto without departing from the spirit or scope of the invention as set forth in the appended claims.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A multi-scale target tracking method based on correlation filtering and adaptive feature fusion, characterized by comprising the following steps:
step 1: input the image of the current frame t;
step 2: collect image sample blocks and extract the gradient and color features of the samples, expressed as a histogram of oriented gradients (HOG) feature x_HOG and a color names (CN) feature x_CN;
step 3: convolve the filter h obtained from the previous frame with the candidate target image block x of the current frame, then search the response vectors of all test samples and take the position of the maximum response as the predicted position of the target; the response output of the classifier is
given by equation (1), which is shown only as an image in the original; substituting the image sample x_HOG expressing the HOG features and the image sample x_CN expressing the CN features into this formula yields the respective correlation filter response maps and the respective maximum response values R_HOG and R_CN;
step 4: adaptively assign weights to the two features, HOG and CN, according to the magnitudes of their response values, with the weight computed as in equation (2):
(equation (2) is shown only as an image in the original; the weight δ is determined by the relative magnitudes of R_CN and R_HOG)
the final target tracking result is obtained by weighted fusion of the CN and HOG responses according to the weight assigned to each feature, as computed in equation (3):
R = δ·R_CN + (1 - δ)·R_HOG   (3)
where δ ∈ [0, 1]; if δ = 0, the tracking result uses only the HOG feature; if δ = 1, the tracking result uses only the color feature;
step 5: extract HOG features from multi-scale image blocks, train a kernelized least-squares classifier with them to obtain a one-dimensional scale tracker, and finally search for the maximum output response R_S of the filter; the prediction of the scale is done with equation (4):
equation (4) is shown only as an image in the original; its learning parameter and the elements of k_S likewise appear only as images, x_S is the target scale model learned from frame t-1, and the remaining symbol denotes the sample taken from the new frame;
step 6: combine the predicted position R and the predicted scale R_S of the target, and mark the target of the current frame with a rectangular box;
step 7: update the translation filter and the scale filter by linear interpolation, as shown in equations (5) to (7):
equations (5) to (7) are shown only as images in the original; in them the learning parameters and the elements of k_j likewise appear only as images, γ is the adaptive learning rate, the predicted value of the target appears as an image, and λ is a regularization coefficient;
step 8: input the next frame of image and process it according to steps 1 to 7, finally achieving target tracking.
2. The multi-scale target tracking method based on correlation filtering and adaptive feature fusion as claimed in claim 1, characterized in that in step 2 and step 5 the specific extraction process of the HOG features is as follows:
(1) first convert the target image to grayscale;
(2) normalize the color space of the input image by Gamma correction, which adjusts the image contrast, reduces the influence of local shadows and illumination changes, and suppresses interference caused by noise; the Gamma compression formula is I(x, y) = I(x, y)^Gamma, where Gamma = 1/2;
(3) compute the gradient of each pixel of the target image, the gradients of a pixel in the image being:
G_x(x, y) = H(x + 1, y) - H(x - 1, y)   (8)
G_y(x, y) = H(x, y + 1) - H(x, y - 1)   (9)
where G_x(x, y) and G_y(x, y) denote the horizontal and vertical gradients at pixel (x, y), and H(x, y) denotes the pixel value at (x, y) in the input image; the gradient magnitude and gradient direction at pixel (x, y) are:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2)   (10)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))   (11)
(4) divide the image into small cell units of 4 x 4 pixels each, and construct a histogram of gradient directions for each cell unit;
(5) group every 3 x 3 cell units into a block, and concatenate the feature vectors of the cell units within the block to obtain the gradient histogram of the block;
(6) concatenate the gradient histogram feature vectors of all blocks in the target image to form the histogram of oriented gradients of the target image, i.e. the final feature vector used by the discriminative classifier.
3. The multi-scale target tracking method based on correlation filtering and adaptive feature fusion as claimed in claim 1, characterized in that in step 7 the adaptive learning rate γ is selected by a frame-difference method, i.e. the difference between two adjacent frames is used as the basis for a piecewise choice of the learning rate; for a frame image block x of size m × n, each pixel is denoted x_ij, where 0 ≤ i ≤ m - 1 and 0 ≤ j ≤ n - 1, and for the t-th frame image the quantity q is computed from the inter-frame pixel differences (equation (12), shown only as an image in the original):
when 0 < q < 2.5, the target picture changes little between the two adjacent frames, the target appearance model changes little, and a low learning rate is set to maintain good tracking; when 2.5 ≤ q < 8, the change of the target appearance model is normal and the conventional learning rate is set; when q ≥ 8, the appearance model changes markedly and a larger learning rate is set; the specific piecewise learning rate is given by equation (13):
(equation (13) is shown only as an image in the original)
CN202010017064.8A 2020-01-08 2020-01-08 Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion Pending CN111260738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017064.8A CN111260738A (en) 2020-01-08 2020-01-08 Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010017064.8A CN111260738A (en) 2020-01-08 2020-01-08 Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion

Publications (1)

Publication Number Publication Date
CN111260738A true CN111260738A (en) 2020-06-09

Family

ID=70950960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017064.8A Pending CN111260738A (en) 2020-01-08 2020-01-08 Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion

Country Status (1)

Country Link
CN (1) CN111260738A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164093A (en) * 2020-08-27 2021-01-01 同济大学 Automatic person tracking method based on edge features and related filtering
CN112200833A (en) * 2020-09-17 2021-01-08 天津城建大学 Relevant filtering video tracking algorithm based on residual error network and short-term visual memory
CN112329784A (en) * 2020-11-23 2021-02-05 桂林电子科技大学 Correlation filtering tracking method based on space-time perception and multimodal response
CN112435280A (en) * 2020-11-13 2021-03-02 桂林电子科技大学 Moving target detection and tracking method for unmanned aerial vehicle video
CN112598710A (en) * 2020-12-25 2021-04-02 杭州电子科技大学 Space-time correlation filtering target tracking method based on feature online selection
CN112651999A (en) * 2021-01-19 2021-04-13 滨州学院 Unmanned aerial vehicle ground target real-time tracking method based on space-time context perception
CN112767437A (en) * 2020-12-30 2021-05-07 大连海事大学 Water surface unmanned ship tracking method, system and storage medium based on KCF self-adaptive multi-feature fusion filtering
CN113177970A (en) * 2021-04-29 2021-07-27 燕山大学 Multi-scale filtering target tracking method based on self-adaptive feature fusion
CN113593250A (en) * 2021-07-12 2021-11-02 浙江工贸职业技术学院 Illegal parking detection system based on visual identification
CN115482254A (en) * 2022-06-16 2022-12-16 沈阳建筑大学 KCF method with scale self-adaption and anti-blocking functions
CN116405614A (en) * 2022-06-10 2023-07-07 上海玄戒技术有限公司 Lens shading correction method and device, electronic equipment, chip and medium
CN116912289A (en) * 2023-08-09 2023-10-20 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644430A (en) * 2017-07-27 2018-01-30 孙战里 Target following based on self-adaptive features fusion
CN108549839A (en) * 2018-03-13 2018-09-18 华侨大学 The multiple dimensioned correlation filtering visual tracking method of self-adaptive features fusion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644430A (en) * 2017-07-27 2018-01-30 孙战里 Target following based on self-adaptive features fusion
CN108549839A (en) * 2018-03-13 2018-09-18 华侨大学 The multiple dimensioned correlation filtering visual tracking method of self-adaptive features fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵璐璐 (Zhao Lulu): "Research on target tracking algorithms based on correlation filtering" (基于相关滤波的目标跟踪算法研究), China Master's Theses Full-text Database, Information Science and Technology *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164093A (en) * 2020-08-27 2021-01-01 同济大学 Automatic person tracking method based on edge features and related filtering
CN112200833A (en) * 2020-09-17 2021-01-08 天津城建大学 Relevant filtering video tracking algorithm based on residual error network and short-term visual memory
CN112435280A (en) * 2020-11-13 2021-03-02 桂林电子科技大学 Moving target detection and tracking method for unmanned aerial vehicle video
CN112329784A (en) * 2020-11-23 2021-02-05 桂林电子科技大学 Correlation filtering tracking method based on space-time perception and multimodal response
CN112598710A (en) * 2020-12-25 2021-04-02 杭州电子科技大学 Space-time correlation filtering target tracking method based on feature online selection
CN112598710B (en) * 2020-12-25 2024-03-12 杭州电子科技大学 Space-time correlation filtering target tracking method based on feature on-line selection
CN112767437A (en) * 2020-12-30 2021-05-07 大连海事大学 Water surface unmanned ship tracking method, system and storage medium based on KCF self-adaptive multi-feature fusion filtering
CN112651999A (en) * 2021-01-19 2021-04-13 滨州学院 Unmanned aerial vehicle ground target real-time tracking method based on space-time context perception
CN113177970A (en) * 2021-04-29 2021-07-27 燕山大学 Multi-scale filtering target tracking method based on self-adaptive feature fusion
CN113593250A (en) * 2021-07-12 2021-11-02 浙江工贸职业技术学院 Illegal parking detection system based on visual identification
CN116405614A (en) * 2022-06-10 2023-07-07 上海玄戒技术有限公司 Lens shading correction method and device, electronic equipment, chip and medium
CN116405614B (en) * 2022-06-10 2024-01-30 上海玄戒技术有限公司 Lens shading correction method and device, electronic equipment, chip and medium
CN115482254A (en) * 2022-06-16 2022-12-16 沈阳建筑大学 KCF method with scale self-adaption and anti-blocking functions
CN116912289A (en) * 2023-08-09 2023-10-20 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence
CN116912289B (en) * 2023-08-09 2024-01-30 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence

Similar Documents

Publication Publication Date Title
CN111260738A (en) Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion
CN108549839B (en) Adaptive feature fusion multi-scale correlation filtering visual tracking method
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN107169994B (en) Correlation filtering tracking method based on multi-feature fusion
CN110097575B (en) Target tracking method based on local features and scale pool
CN110853074B (en) Video target detection network system for enhancing targets by utilizing optical flow
CN110428450B (en) Scale-adaptive target tracking method applied to mine tunnel mobile inspection image
CN111582349B (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN110298297A (en) Flame identification method and device
CN109993052B (en) Scale-adaptive target tracking method and system under complex scene
CN110472577B (en) Long-term video tracking method based on adaptive correlation filtering
CN105741319B (en) Improvement visual background extracting method based on blindly more new strategy and foreground model
CN107609571B (en) Adaptive target tracking method based on LARK features
CN111046789A (en) Pedestrian re-identification method
CN111402237A (en) Video image anomaly detection method and system based on space-time cascade self-encoder
CN112329656B (en) Feature extraction method for human action key frame in video stream
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN112464844A (en) Human behavior and action recognition method based on deep learning and moving target detection
CN111340842A (en) Correlation filtering target tracking algorithm based on joint model
CN111626090A (en) Moving target detection method based on depth frame difference convolutional neural network
CN114973112A (en) Scale-adaptive dense crowd counting method based on antagonistic learning network
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
Iraei et al. Object tracking with occlusion handling using mean shift, Kalman filter and edge histogram
CN110827262A (en) Weak and small target detection method based on continuous limited frame infrared image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200609

WD01 Invention patent application deemed withdrawn after publication