CN110298868B - High-instantaneity multi-scale target tracking method - Google Patents


Info

Publication number
CN110298868B
CN110298868B (application CN201910559301.0A)
Authority
CN
China
Prior art keywords
scale
target
frame
frames
node
Prior art date
Legal status
Active
Application number
CN201910559301.0A
Other languages
Chinese (zh)
Other versions
CN110298868A (en)
Inventor
王波涛
石梦华
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910559301.0A priority Critical patent/CN110298868B/en
Publication of CN110298868A publication Critical patent/CN110298868A/en
Application granted granted Critical
Publication of CN110298868B publication Critical patent/CN110298868B/en

Classifications

    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/262 — Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/50 — Extraction of image or video features using histograms, e.g. histogram of oriented gradients [HoG]
    • G06T2207/10016 — Video; Image sequence
    • G06T2207/20056 — Discrete and fast Fourier transform [DFT, FFT]
    • G06T2207/20081 — Training; Learning

Abstract

The invention discloses a multi-scale target tracking method with high real-time performance. The method extracts fast histogram of oriented gradients (fhog) features from the target and its surrounding background region, generates positive and negative samples by simulating translations with cyclic shifts, uses a two-dimensional Gaussian function as the sample label, and trains a correlation filter by ridge regression. In each subsequent frame the target position is obtained from the response of the filter, and the target scale is computed by a method combining scale prediction with a scale pool; the training process is then repeated and the correlation filter is updated by interpolation. By exploiting the historical motion information of the target, the method optimizes both the frequency-domain computation and the scale estimation, improving the running efficiency of the kernelized correlation filter (KCF) method by about 43% while preserving tracking accuracy. Its high real-time performance makes it practical to port correlation-filter target tracking to development boards with low computing power, such as embedded systems, and it can be applied in fields such as intelligent surveillance, aerospace, and autonomous driving.

Description

High-instantaneity multi-scale target tracking method
Technical Field
The invention belongs to the field of single-target tracking in computer vision, and specifically provides a multi-scale target tracking method with high real-time performance.
Background
Target tracking builds on image processing and integrates technologies such as automatic control and information science. It locates the target in every image frame and obtains information such as the target's size, so that the target region can be separated from the background region and the target can be tracked through the whole video sequence.
In recent years, research in discriminative tracking has mainly focused on improving tracking methods based on correlation filters. Correlation filtering originated in signal processing, where the correlation of an input signal is judged from its output response after passing through a filter. In target tracking, the center of the tracked target can be determined by finding the maximum response. A fast and accurate scale estimation method is necessary for efficiently updating the target model. At present, the dominant approach to scale estimation is the scale pool: a translation filter detects the target on image blocks resampled at multiple scales, and the translation position with the largest response, together with the scale at which it occurs, is taken as the result. This approach is simple and easy to understand, but too coarse: the overall complexity of the method grows multiplicatively with the number of scales tested, eroding the speed advantage of kernelized correlation filter tracking. As a result, the method cannot meet real-time requirements on development boards with low computing power, such as embedded processors, and therefore cannot meet the practical deployment requirements of tracking technology.
The goal is therefore to optimize scale-pool-based target scale estimation, reduce its redundancy, and improve the running efficiency of the tracking method. Since each additional scale evaluation is expensive, the number of scale computations should be reduced; by incorporating a scale-prediction mechanism, the speed advantage can be restored while the high accuracy of the tracking method is preserved.
Disclosure of Invention
To solve the problem of inefficient scale estimation during target tracking, the invention provides a multi-scale target tracking method with high real-time performance, which improves the tracking speed of correlation-filter target tracking methods while maintaining high tracking accuracy.
To achieve this technical purpose, the technical scheme adopted by the invention mainly comprises the following steps:
Step 1: determine the region of interest. In the initial image frame, select the target to be tracked with a rectangular frame, take the 2.5-times pixel area around the target, resize the longer side to 96 pixels, and scale the shorter side by the same ratio;
Step 2: extract features. Extract the 31-dimensional fast histogram of oriented gradients (fhog) features of the region of interest;
Step 3: train the correlation-filter tracker. Transform the feature vector and the two-dimensional Gaussian peak function into the frequency domain and compute the correlation filter there;
Step 4: detect the target position. Extract the region of interest in the next image frame as in step 1, and apply the trained correlation filter: the position of the maximum response value is the target position;
Step 5: estimate the scale. Combine scale prediction with a scale pool to determine the target scale.
In scale estimation, priorities are first assigned to the scales in the scale pool. The per-frame change of the target scale is divided into three cases: 0.95, 1, and 1.05, i.e. smaller scale, unchanged, and larger scale, indicating that the target has become smaller, stayed the same, or become larger, respectively.
In the traditional scale estimation method, image blocks at each scale are cropped, features are extracted from each, response values are computed against the correlation filter, and the scale of the block with the largest response is taken as the optimal scale. The overall cost of this method grows multiplicatively with the number of scales, so its impact on speed is large. The prioritization method of the invention is based on the response characteristics of the correlation filter. Taking the case where the target has actually become larger as an example, and writing the response at scale s as R(s), it can be deduced that R(0.95) < R(1) < R(1.05). Therefore the actual maximum can be obtained by computing and comparing only R(1) and R(1.05); the response at the smaller scale never needs to be computed. Conversely, if the target has actually become smaller, the response at the larger scale need not be computed. When the actual scale is unchanged, either of the above cases can be assumed arbitrarily, and the response of the third scale is then computed and compared.
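The comparison order above can be sketched as follows. This is a minimal illustration, not the patent's implementation; `response_at` stands in for running the correlation filter on the image block at one scale factor:

```python
SCALES = (0.95, 1.0, 1.05)  # smaller, unchanged, larger

def best_scale(response_at, predicted):
    """Evaluate R(1) and the predicted direction first; the opposite
    direction is only evaluated when the prediction does not win."""
    r1 = response_at(1.0)
    first, second = (1.05, 0.95) if predicted == "larger" else (0.95, 1.05)
    r_first = response_at(first)
    if r_first > r1:
        return first                 # prediction confirmed: 2 evaluations
    r_second = response_at(second)   # fall back to the third scale
    return second if r_second > r1 else 1.0
```

With a correct prediction only two of the three scale responses are computed, which is exactly the saving over the plain scale pool described above.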
The invention uses a statistics-based approach, driven by the actual motion information of the target, to decide which scale's response to compute first: the scale changes of the past 10 frames are counted, and the most frequent change is the scale evaluated first in the current frame. In every case, R(1), the response when the scale is unchanged, must be computed before the comparison.
The statistics are kept in a specially constructed circular linked list, so that the scale changes of past frames are tallied in constant time and the bookkeeping does not slow the tracker down.
To further improve tracking speed, once multiple frames of statistics show that the target's scale is almost unchanged, the invention switches to interval-based multi-scale tracking: one multi-scale frame alternates with several single-scale frames. This keeps the target scale information up to date while avoiding the time wasted by running the full multi-scale search on every frame.
The invention also optimizes away redundant computation: the frequency-domain transforms of the feature vectors are stored and reused, reducing the number of fast Fourier transform (FFT) operations in the kernelized correlation filter and improving running efficiency by 20% through this optimization alone.
The proposed frequency-domain and scale-estimation optimizations apply to all correlation-filter tracking methods, not only the kernelized correlation filter (KCF) tracker.
Drawings
FIG. 1: method overall flow chart
FIG. 2: circular linked list schematic diagram
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings.
As shown in the overall flowchart of FIG. 1, the invention discloses a high-real-time multi-scale target tracking method comprising the following steps:
Step 1: determine the region of interest. In the initial image frame, select the target to be tracked with a rectangular frame, take the 2.5-times pixel area around the target, resize the longer side to 96 pixels, and scale the shorter side by the same ratio;
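The sizing rule in step 1 can be written out as a small sketch (the helper name is illustrative, not from the patent):

```python
def roi_size(target_w, target_h, padding=2.5, long_side=96):
    """Pad the target box by `padding`, then rescale so the longer side
    becomes `long_side` pixels while the aspect ratio is preserved."""
    roi_w, roi_h = target_w * padding, target_h * padding
    ratio = long_side / max(roi_w, roi_h)
    return round(roi_w * ratio), round(roi_h * ratio)
```

For a 40x20 target the padded region is 100x50, and resizing the longer side to 96 yields a 96x48 working window for feature extraction.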
Step 2: extract features. Extract the 31-dimensional fast histogram of oriented gradients (fhog) features of the region of interest;
Step 3: train the correlation-filter tracker and remove redundant computation. The feature vector and the two-dimensional Gaussian peak function are transformed into the frequency domain, where the filter is obtained by the training formula (1):

α̂ = ŷ / (k̂^{xx} + λ)  (1)

where α is the implicit coefficient vector of the correlation filter, k^{xx} is the Gaussian kernel correlation k(x, x) computed by formula (2), x is the feature vector of the training image block, y is the label vector formed by the two-dimensional Gaussian function, and λ is the regularization factor.

k^{xx'} = exp(−(1/σ²)(‖x‖² + ‖x'‖² − 2F⁻¹(x̂* ⊙ x̂')))  (2)

In formula (2), F⁻¹ is the inverse FFT, x̂ is the frequency-domain transform of x, ⊙ is the element-wise product, * denotes the complex conjugate, and σ is the kernel bandwidth. During training both inputs of the kernel are x, so only one FFT of x actually needs to be performed, which halves the number of FFT computations in the training stage.
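A minimal single-channel NumPy sketch of formulas (1) and (2) follows. It is an illustration under simplifying assumptions: the patent uses 31 fhog channels (summed inside the kernel), and the exact normalization may differ. Because both kernel inputs are x, the forward FFT is performed once and reused:

```python
import numpy as np

def gaussian_kernel_auto(x, sigma=0.5):
    """Formula (2) with x' = x: circular autocorrelation via one FFT."""
    xf = np.fft.fft2(x)                     # the only forward FFT of x
    xx = np.sum(x ** 2)                     # ||x||^2, same for both inputs
    cross = np.real(np.fft.ifft2(np.conj(xf) * xf))
    d = np.maximum(xx + xx - 2.0 * cross, 0.0)
    return np.exp(-d / (sigma ** 2 * x.size))

def train(x, y, sigma=0.5, lam=1e-4):
    """Formula (1): ridge regression solved in the frequency domain."""
    k = gaussian_kernel_auto(x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)
```

At zero shift the autocorrelation equals ‖x‖², so the kernel map peaks at 1 there, which is a quick sanity check on any implementation.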
Step 4: detect the target position. The region of interest is extracted from the next image frame as in step 1 and the trained correlation filter is applied, using the detection formula (3):

f̂(z) = k̂^{xz} ⊙ α̂  (3)

where f(z) gives the response values at all positions, α is obtained in step 3, and k^{xz} is the Gaussian kernel correlation k(x, z) computed by formula (4), with x the feature vector of the training image block and z the feature vector of the test image block. The position of the maximum response value is the target position.

k^{xz} = exp(−(1/σ²)(‖x‖² + ‖z‖² − 2F⁻¹(x̂* ⊙ ẑ)))  (4)

The two inputs of the kernel during detection are x and z, so both x̂ and ẑ are required.
in the multi-scale calculation link, the detection process is repeated for image blocks with different scales, namely only the feature vector z of the test image block is changed, so that the FFT (x) in the first scale operation is stored and is directly called in the subsequent operation.
Step 5: estimate the scale. Combine scale prediction with the scale pool to determine the target scale.
In this step, priorities are first assigned to the scale computations in the scale pool. The per-frame change of the target scale is divided into three cases: 0.95, 1, and 1.05, i.e. smaller, unchanged, and larger, indicating that the target has become smaller, stayed the same, or become larger, respectively. The prioritization is based on the response characteristics of the correlation filter. Taking the case where the target has actually become larger as an example, and writing the response at scale s as R(s), it can be deduced that R(0.95) < R(1) < R(1.05); therefore the actual maximum is obtained by computing and comparing only R(1) and R(1.05), and the response at the smaller scale need not be computed. Similarly, if the target has actually become smaller, the response at the larger scale need not be computed. When the actual scale is unchanged, either case can be assumed arbitrarily, and the response of the third scale is then computed and compared.
The invention uses a statistics-based approach, driven by the actual motion information of the target, to decide which scale's response to compute first: the scale changes of the past 10 frames are counted, and the most frequent change is the scale evaluated first in the current frame. In every case, R(1), the response when the scale is unchanged, must be computed before the comparison. The statistics are kept in a specially constructed circular linked list, which tallies the scale changes of past frames in constant time so that the bookkeeping does not slow the tracker down. The circular linked list is illustrated in FIG. 2 and works as follows.
First, nodes are defined. Each node carries two pointers, to the previous and the next node, and an internal mScale variable recording the scale state of one frame. The state is one of: larger (value 1), unchanged (value 0), smaller (value -1), and default (value 2).
After the circular linked list is created, a pointer is set to an arbitrary node and advances by one node per frame. Once the number of frames exceeds the length of the list, the pointer wraps around to the starting node and overwrites its information, so that the list always holds the information of the most recent 10 frames.
To keep the time complexity of the scale statistics low, an mSum variable is maintained that records the sum of all frame scale states mScale, as in formula (5):
mSum=∑mScale(mScale≠2) (5)
the default case of mSchale of 2 indicates that the node is not yet covered and cannot contain it.
mScale is designed as a zero-mean statistic: if the sum is greater than 0, frames in which the scale grew dominate the last 10 frames; if it equals 0, frames with unchanged scale dominate; if it is less than 0, frames in which the scale shrank dominate. The mSum variable is modified each time the current frame's information is written, as in formula (6):
mSum=mSum-mScale(i)+mScale(j) (6)
the scale information mSCAle (i) of the current pointing node is subtracted, and the scale information mSCAle (j) of the current frame is added, so that the time complexity occupied by the statistics of the scale change information is constant complexity, and the tracking speed cannot be reduced.
Meanwhile, the head node is used to test whether the queue is full or empty: if the scale state of the head node is the default value, the head node has never been written and the queue is empty; if the scale state of the node before the head node is not the default value, the queue is full. Both queries take constant time and do not reduce the tracking speed.
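The bookkeeping above can be sketched with a fixed-size ring buffer standing in for the doubly linked circular list; the update and both queries are O(1), matching the description. The names mScale and mSum follow the text, while the class itself is illustrative:

```python
class ScaleHistory:
    DEFAULT = 2                      # slot not yet written

    def __init__(self, length=10):
        self.m_scale = [self.DEFAULT] * length  # one state per past frame
        self.head = 0                           # slot overwritten next
        self.m_sum = 0                          # running sum, formula (5)

    def push(self, state):
        """Record one frame's state (1 larger, 0 unchanged, -1 smaller).
        Formula (6): drop the overwritten state, add the new one."""
        old = self.m_scale[self.head]
        if old != self.DEFAULT:
            self.m_sum -= old
        self.m_sum += state
        self.m_scale[self.head] = state
        self.head = (self.head + 1) % len(self.m_scale)

    def is_empty(self):              # O(1): first slot never written
        return self.m_scale[0] == self.DEFAULT

    def is_full(self):               # O(1): last slot has been written
        return self.m_scale[-1] != self.DEFAULT

    def predicted(self):
        """Zero-mean statistic: the sign of mSum gives the dominant change."""
        if self.m_sum > 0:
            return "larger"
        if self.m_sum < 0:
            return "smaller"
        return "unchanged"
```

The `predicted()` result is what decides which scale response the tracker computes first in the next frame.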
For a target whose scale is almost unchanged, the invention tracks with interval-based multi-scale search: one multi-scale frame alternates with several single-scale frames. The multi-scale search runs every 5 frames, i.e. one multi-scale frame followed by 4 single-scale frames. This keeps the target scale information up to date while avoiding the time wasted by multi-scale tracking on every frame.
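The alternation can be sketched in one line (period 5 as stated; the helper is purely illustrative):

```python
def frame_mode(i, period=5):
    """'multi' on frames 0, 5, 10, ...; 'single' on the 4 frames between."""
    return "multi" if i % period == 0 else "single"
```

Over any window of 5 frames, the full scale pool therefore runs exactly once.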
In conclusion, the multi-scale target tracking method with high real-time performance provided by the invention has been described in detail above. The description should not be construed as limiting the invention, whose scope is defined solely by the appended claims.

Claims (3)

1. A multi-scale target tracking method with high real-time performance is characterized by comprising the following steps:
step 1, determining a region of interest; selecting a target to be tracked by using a rectangular frame in an initial image frame, and extracting the target and a surrounding background area thereof;
step 2, extracting characteristics; extracting the fast gradient direction histogram characteristics of the region of interest;
step 3, training a correlation filter; converting the characteristic vector and the two-dimensional Gaussian function into a frequency domain for calculation to obtain a correlation filter;
step 4, detecting the target position; extracting the region of interest in the next frame of image in the same way as the step 1, and obtaining the position of the target with the maximum response value by using the relevant filter obtained by training;
step 5, scale estimation; combining the scale prediction with a scale pool to determine a target scale;
the step 5 specifically comprises the following steps: based on the response characteristics of the relevant filters, calculating and dividing priorities for the scales in the scale pool; the step of dividing the scale priority specifically comprises the following steps:
first, the per-frame change of the target scale is divided into three cases: 0.95, 1 and 1.05, namely smaller, unchanged and larger, respectively representing that the target has become smaller, stayed the same, or become larger;
when the target actually becomes large, the obtained response value is set as a variable R, and R (0.95) < R (1) < R (1.05) is deduced; therefore, the R (1) and the R (1.05) are calculated and compared to obtain the actual maximum value, and the response to the small scale is not required to be calculated again;
if the target actually becomes smaller, it is deduced that R (0.95) > R (1) > R (1.05); therefore, the R (1) and the R (0.95) are calculated and compared to obtain the actual maximum value, and the large-scale response does not need to be calculated again;
if the actual target scale is unchanged, it is arbitrarily treated as one of the two cases above to obtain a maximum, and the response of the third scale is then calculated and compared with it;
based on a statistical method, by utilizing the actual motion information of the target, confirming that the response of a certain scale is calculated at first; counting the scale change conditions of the past 10 frames, and determining the most scale change conditions, namely preferentially calculating the response of the scale corresponding to the most scale change conditions in the current frame; no matter which scale is changed most, R (1) needs to be calculated, namely the response when the scale is not changed;
the statistical-based method is as follows:
firstly, nodes are defined, each node having two pointers to the previous and next nodes, and an mScale variable defined inside the node for recording the current scale state, the state being one of: larger with a value of 1, unchanged with a value of 0, smaller with a value of -1, and default with a value of 2;
after a circular linked list is created, a pointer is used for pointing to any node, the pointer moves down one node every time when one frame passes, and when the number of frames is larger than the length of a queue, the pointer points to the initial node in the linked list, so that the node information is covered, and the recording of the latest 10 frames of information is realized;
in order to reduce the time complexity of the scale statistics, an mSum variable is maintained for recording the sum of all frame scale states mScale; mSum is designed as a zero-mean statistic: if the sum is greater than 0, frames in which the scale grew dominate the previous 10 frames, if it equals 0, frames with unchanged scale dominate, and if it is less than 0, frames in which the scale shrank dominate; the mSum variable is modified each time the current frame's information is written: mSum subtracts the scale information of the node currently pointed to and adds the scale information of the current frame;
meanwhile, the head node information is used to judge whether the queue is full or empty: if the scale state of the head node is the default value, the head node has never been used and the queue is judged empty; if the scale state of the node before the head node is not the default value, the queue is judged full.
2. The multi-scale target tracking method with high real-time performance according to claim 1, characterized in that, for a target whose statistical scale is unchanged, the tracking method adopts interval-based multi-scale tracking, namely one multi-scale frame alternating with multiple single-scale frames.
3. The multi-scale target tracking method with high real-time performance according to claim 2, characterized in that the multi-scale search is performed every 5 frames, namely one multi-scale frame and 4 single-scale frames.
CN201910559301.0A 2019-06-26 2019-06-26 High-instantaneity multi-scale target tracking method Active CN110298868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910559301.0A CN110298868B (en) 2019-06-26 2019-06-26 High-instantaneity multi-scale target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910559301.0A CN110298868B (en) 2019-06-26 2019-06-26 High-instantaneity multi-scale target tracking method

Publications (2)

Publication Number Publication Date
CN110298868A CN110298868A (en) 2019-10-01
CN110298868B true CN110298868B (en) 2021-06-25

Family

ID=68028836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910559301.0A Active CN110298868B (en) 2019-06-26 2019-06-26 High-instantaneity multi-scale target tracking method

Country Status (1)

Country Link
CN (1) CN110298868B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686609B (en) * 2020-12-31 2021-08-13 江苏佳利达国际物流股份有限公司 Intelligent unmanned logistics transportation method and system based on optimization efficiency evaluation algorithm
CN112991394B (en) * 2021-04-16 2024-01-19 北京京航计算通讯研究所 KCF target tracking method based on cubic spline interpolation and Markov chain

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107103616A (en) * 2017-04-27 2017-08-29 中国科学院长春光学精密机械与物理研究所 A kind of method for tracking target and system
CN108010067A (en) * 2017-12-25 2018-05-08 北京航空航天大学 A kind of visual target tracking method based on combination determination strategy
CN108346159A (en) * 2018-01-28 2018-07-31 北京工业大学 A kind of visual target tracking method based on tracking-study-detection
CN108550161A (en) * 2018-03-20 2018-09-18 南京邮电大学 A kind of dimension self-adaption core correlation filtering fast-moving target tracking method
CN109685073A (en) * 2018-12-28 2019-04-26 南京工程学院 A kind of dimension self-adaption target tracking algorism based on core correlation filtering

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN106204638B (en) * 2016-06-29 2019-04-19 西安电子科技大学 It is a kind of based on dimension self-adaption and the method for tracking target of taking photo by plane for blocking processing
CN109410247A (en) * 2018-10-16 2019-03-01 中国石油大学(华东) A kind of video tracking algorithm of multi-template and adaptive features select

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN107103616A (en) * 2017-04-27 2017-08-29 中国科学院长春光学精密机械与物理研究所 A kind of method for tracking target and system
CN108010067A (en) * 2017-12-25 2018-05-08 北京航空航天大学 A kind of visual target tracking method based on combination determination strategy
CN108346159A (en) * 2018-01-28 2018-07-31 北京工业大学 A kind of visual target tracking method based on tracking-study-detection
CN108550161A (en) * 2018-03-20 2018-09-18 南京邮电大学 A kind of dimension self-adaption core correlation filtering fast-moving target tracking method
CN109685073A (en) * 2018-12-28 2019-04-26 南京工程学院 A kind of dimension self-adaption target tracking algorism based on core correlation filtering

Non-Patent Citations (3)

Title
Robust visual tracking using joint scale-spatial correlation filters; Mengdan Zhang et al.; 2015 IEEE International Conference on Image Processing; Dec. 2015; pp. 1468-1472 *
Night-time vehicle detection method based on an improved deformable part model; Sun Ying et al.; Computer Engineering; Mar. 2019; vol. 45, no. 3; pp. 202-206 *
Kernel correlation filter target tracking with fast scale estimation; Ding Jianwei et al.; Science Technology and Engineering; May 2017; vol. 17, no. 15; pp. 111-114 *

Also Published As

Publication number Publication date
CN110298868A (en) 2019-10-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant