CN113379802A - Multi-feature adaptive fusion correlation filtering target tracking method - Google Patents

Multi-feature adaptive fusion correlation filtering target tracking method

Info

Publication number
CN113379802A
CN113379802A (application CN202110751273.XA)
Authority
CN
China
Prior art keywords
feature
frame
tracking
target
hog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110751273.XA
Other languages
Chinese (zh)
Other versions
CN113379802B (en
Inventor
赵磊
李天文
张莉园
贺华迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110751273.XA priority Critical patent/CN113379802B/en
Publication of CN113379802A publication Critical patent/CN113379802A/en
Application granted granted Critical
Publication of CN113379802B publication Critical patent/CN113379802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-feature adaptive fusion correlation filtering target tracking method and belongs to the technical field of image processing. The method is based on a correlation filtering tracking framework, extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram, and adaptively adjusts the fusion parameters of the two features according to the quality of the response maps. Compared with correlation filtering tracking methods that fuse features with fixed parameters, the method obtains a more stable tracking effect. The invention improves the feature fusion strategy: it extracts HOG features and color features, whose advantages and disadvantages are complementary, and makes the features usable for tracking more stably by adjusting the proportion of the HOG and color features in the fused feature.

Description

Multi-feature adaptive fusion correlation filtering target tracking method
Technical Field
The invention relates to a multi-feature adaptive fusion correlation filtering target tracking method and belongs to the technical field of image processing.
Background
Video target tracking is one of the research hotspots in the field of computer vision. With the rapid improvement of computer processing power, video-based target tracking technology has developed quickly and provides important support for applications such as intelligent surveillance, driver assistance and human-computer interaction. In recent years many algorithms with excellent performance and speed have emerged in target tracking, among which correlation filtering algorithms are currently among the more advanced and have received wide attention and study. Among correlation filtering algorithms, the DSST algorithm adds scale estimation but uses a single feature and introduces considerable noise, so tracking is unstable. The SRDCF algorithm has good robustness, but it is very slow and cannot meet real-time requirements. The Staple algorithm combines the HOG feature with the color feature, but the weight is an empirical value that cannot be adjusted automatically as the target and the environment change, so its adaptability is poor. The C-COT algorithm extracts features with a neural network, which greatly increases the computational complexity and makes it slow.
The method performs tracking within a correlation filtering tracking framework and extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram. The closest prior-art solution is the Staple tracking method: HOG features are extracted and used to learn a filter template according to the correlation filter learning rule, and the template is updated with a given formula; color features are extracted, foreground and background color probability models are trained, and the template is updated with a given formula; the HOG response map and the color response map are computed from the templates and the image to be detected and added at a fixed 7:3 ratio to obtain a fused response map, whose maximum value gives the target position.
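For illustration only (not part of the original patent text), a minimal Python sketch of the fixed-ratio 7:3 fusion used by this Staple-style baseline, assuming both response maps have already been computed and resized to the same shape; all names are illustrative:

```python
import numpy as np

def fuse_fixed(hog_response: np.ndarray, color_response: np.ndarray,
               hog_weight: float = 0.7):
    """Fuse two same-sized response maps with a fixed 7:3 ratio and return the peak."""
    fused = hog_weight * hog_response + (1.0 - hog_weight) * color_response
    peak = np.unravel_index(np.argmax(fused), fused.shape)  # (row, col) of the target
    return fused, peak
```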
At present, mainstream correlation filtering tracking methods based on multi-feature fusion mostly fuse the features with fixed weights. The correlation-filter template feature (HOG) performs poorly under fast deformation and fast motion but handles motion blur, illumination change and similar conditions well; the color statistical feature performs poorly under illumination change and backgrounds of similar color, but it is insensitive to deformation, does not belong to the correlation filtering framework, has no boundary effect and can cope with fast change. In a target tracking system for real, complex scenes, linearly summing the two with fixed weights, without judging the two feature values, cannot exploit the advantage of each feature to the greatest extent under the specific conditions where it excels.
Aiming at the problems that a single feature has limitations and that real-time performance cannot be met, the method improves the feature fusion strategy: it extracts HOG features and color features, whose advantages and disadvantages are complementary, and makes the features usable for tracking more stably by adjusting the proportion of the HOG and color features in the fused feature.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-feature adaptive fusion correlation filtering target tracking method that addresses the defects of the existing fixed-weight fusion approach: the multi-feature fusion parameters are set adaptively according to the confidence region of the response map of each video frame, so as to improve the stability of the tracking system and thereby solve the above technical problem.
The technical scheme of the invention is as follows: a multi-feature adaptive fusion correlation filtering target tracking method is based on a correlation filtering tracking framework, extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram, and adaptively adjusts the fusion parameters of the two features according to the quality of the response maps. Compared with correlation filtering tracking methods that fuse features with fixed parameters, a more stable tracking effect is obtained.
The method comprises the following specific steps:
step 1: inputting a first frame;
the video comprises a plurality of frames of pictures, each frame of picture at least comprises a target, wherein the target position on the 1 st frame of picture is known, and the target positions on the rest frames of pictures are unknown; the video frame number is a positive integer greater than or equal to 1; the upper left corner of each frame in the video frame sequence is a coordinate origin (1,1), and the Width and the Height are Width and Height respectively; manually or automatically selecting a rectangular area (x) of the object to be tracked in the first frame0,y0,w0,h0) I.e. the selected tracking target. Wherein (x)0,y0) Representing the coordinates of the upper left corner of the rectangular area, w0,h0Respectively, the width and height of the rectangular region. The first frame selected target is also called current frame tracking result (x)1,y1,w1,h1)=(x0,y0,w0,h0) The subscript indicates the current frame number.
Step 2: initializing a target template;
step 2.1: calculating a search window;
tracking the result (x) from the previous frame, i.e. t-1 framet-1,yt-1,wt-1,ht-1) The corresponding rectangular area can calculate the search window search (t) of the current frame, i.e. the tth frame candidate target, and the first frame search window is based on (x)0,y0,w0,h0) And (4) calculating. The center point of the search window is (x _ s)t,y_st) Wherein x _ st=xt-1+wt-1/2、y_st=yt-1+ht-1(ii)/2, width and height are w _ st=1.5×wt-1+0.5×ht-1、h_st=1.5×ht-1+0.5×wt-1. In order to ensure that the search range is within the video frame range, the width and height of the search window are further modified according to the intersection of the search range and the current frame region. In order to facilitate the subsequent calculation of the color histogram feature, the distance between the boundary of the search window and the real target boundary is defined as an even number, and the width and height of the search window are further modified.
Let the width and height of the normalized window NormWin be w_n and h_n respectively; the transform factor of the search window is then
γ = sqrt((w_n × h_n) / (w_s_t × h_s_t)).
The search-window image can be normalized with this transform factor to form a standard search window of width and height w_sn_t = w_s_t × γ and h_sn_t = h_s_t × γ; the width and height of the standard target window of the current frame are w_on_t = w_sn_t × 0.75 − h_sn_t × 0.25 and h_on_t = h_sn_t × 0.75 − w_sn_t × 0.25.
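A minimal Python sketch of this search-window computation (not part of the original patent text). The normalization factor is assumed to be γ = sqrt((w_n·h_n)/(w_s_t·h_s_t)), which is consistent with the values γ = 1.3264 and γ = 1.0146 reported in the embodiments; the clipping and even-border adjustments are simplified:

```python
import math

def search_window(prev_box, frame_w, frame_h, w_n=150, h_n=150):
    """prev_box = (x, y, w, h): tracking result of the previous frame."""
    x, y, w, h = prev_box
    cx, cy = x + w / 2.0, y + h / 2.0                # search-window center
    w_s = 1.5 * w + 0.5 * h                          # search-window width
    h_s = 1.5 * h + 0.5 * w                          # search-window height
    w_s, h_s = min(w_s, frame_w), min(h_s, frame_h)  # simplified clipping to the frame
    w_s = w + 2 * (int(w_s - w) // 2)                # force an even border width
    h_s = h + 2 * (int(h_s - h) // 2)                # force an even border height
    gamma = math.sqrt((w_n * h_n) / (w_s * h_s))     # assumed normalization factor
    w_sn, h_sn = w_s * gamma, h_s * gamma            # normalized (standard) search window
    w_on = w_sn * 0.75 - h_sn * 0.25                 # standard target window width
    h_on = h_sn * 0.75 - w_sn * 0.25                 # standard target window height
    return (cx, cy, w_s, h_s), gamma, (w_sn, h_sn), (w_on, h_on)
```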
Step 2.2: generating a standard Gaussian response graph;
the standard gaussian response graph g is a two-dimensional matrix with width and height w _ g-w _ snt/cell、h_g=h_sntCell, the matrix element value of which is a probability density function according to a two-dimensional Gaussian distribution N (0,0, delta, 0), can be expressed in terms of
Figure BDA0003144506970000031
And (4) calculating by using a formula. Wherein, the delta represents the standard deviation of two-dimensional Gaussian distribution and the calculation method is
Figure BDA0003144506970000032
The cell represents that the size of each grid in the HOG feature extraction process is cell multiplied by cell, and (i, j) represents the element coordinate position of the Gaussian response diagram matrix, and the origin is located at the center point of the matrix. The standard gaussian response map is fourier transformed to obtain its frequency domain representation G, which is the same size as G.
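A minimal sketch of the Gaussian label generation under the stated distribution N(0, 0, δ, δ, 0); δ is left as a parameter because its defining formula is not reproduced here (not part of the original patent text):

```python
import numpy as np

def gaussian_label(w_g: int, h_g: int, delta: float) -> np.ndarray:
    """Sample the N(0, 0, delta, delta, 0) density on an h_g x w_g grid centered at the midpoint."""
    i = np.arange(h_g) - h_g // 2                    # row offsets from the center
    j = np.arange(w_g) - w_g // 2                    # column offsets from the center
    jj, ii = np.meshgrid(j, i)
    g = np.exp(-(ii ** 2 + jj ** 2) / (2.0 * delta ** 2)) / (2.0 * np.pi * delta ** 2)
    return g

# Frequency-domain representation used when learning the filter:
# G = np.fft.fft2(gaussian_label(w_g, h_g, delta))
```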
Step 2.3: extracting Histogram of Oriented Gradient (HOG) features;
using cell as HOG characteristic grid size parameter, 2 x 2 grids as block size, setting histogram group bin as 2 pi/7, extracting HOG characteristic f in current frame standardized search windowtThe size is w _ g × h _ g × 28. Using a cosine window pair of size w _ g × h _ g for feature ftSmoothing, and Fourier transforming to obtain frequency domain representation F of HOG characteristictWhich is reacted with ftThe same size.
Step 2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search windowtAnd the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter templatetCan be according to formula Ht=G/FtAnd (4) calculating.
Step 2.5: extracting a color histogram feature template;
search window (t) ═ x _ st,y_st,w_st,h_st) Inner target area (x)t-1,yt-1,wt-1,ht-1) The region outside is defined as the background region, and the target region is reduced to a certain degreeThe amount is defined as the foreground area, the center point is the same as the target area, and the width and height shrinkage are all (w)t-1+ht-1)/10. Extracting background color histograms bg _ hist in the background area and the foreground area respectivelytAnd foreground color histogram fg _ histtThe current frame color histogram feature template is obtained.
Step 3: inputting a next frame and extracting features;
calculating a search window (search) of the current frame according to the method of Step2.1, and extracting a frequency domain representation F of the directional gradient Histogram (HOG) feature of the current frame according to Step2.3tResponse graph G of HOG feature of current frametCan be according to formula Gt=Ft⊙Ht-1And (4) calculating.
Extract the current-frame color histogram features bg_hist_t and fg_hist_t by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the previous frame's color histogram features bg_hist_{t-1} and fg_hist_{t-1}, compute the similarity map L_t between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
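A minimal sketch of the two per-frame response computations; taking the spatial HOG response as the channel-summed inverse FFT of F_t ⊙ H_{t-1}, and the color similarity as the per-pixel foreground probability fg/(fg+bg), are assumptions consistent with, but not spelled out in, the text:

```python
import numpy as np

def hog_response(F_t: np.ndarray, H_prev: np.ndarray) -> np.ndarray:
    """Spatial HOG response: inverse FFT of F_t * H_{t-1}, summed over feature channels."""
    return np.real(np.fft.ifft2(F_t * H_prev, axes=(0, 1))).sum(axis=2)

def color_similarity(flat_bins: np.ndarray, fg_hist: np.ndarray, bg_hist: np.ndarray) -> np.ndarray:
    """Per-pixel foreground likelihood from the histogram bin index of each pixel."""
    fg = fg_hist[flat_bins]
    bg = bg_hist[flat_bins]
    return fg / (fg + bg + 1e-9)
```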
Step 4: self-adaptive feature fusion;
step4.1: computing adaptive feature fusion parameters
By using
the mean pixel values of the search windows of frame t-1 and frame t, the illumination change intensity w_0 between the two frames is obtained as their mean difference (the exact formulas for w_0, w_1 and w_2 are given as equation images in the original).
The quality w_1 of the HOG response map is computed from the peak difference P_m − P_s in the response map as numerator, where P_m is the main peak value and P_s is the secondary peak value, and from the peak distance between the main peak position D_m and the secondary peak position D_s as denominator, with k = 0.1.
The quality w_2 of the color response map is computed from the similarity map L_t of the color histogram template, where x_i denotes the i-th value of L_t.
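A minimal sketch of the three quality measures. Since the exact formulas appear only as equation images, the algebraic forms below (signed mean difference for w_0, peak difference over k times peak distance for w_1, mean of the similarity map for w_2) are assumptions that follow the surrounding description:

```python
import numpy as np

def illumination_change(win_prev: np.ndarray, win_cur: np.ndarray) -> float:
    """w0: signed difference of search-window mean intensities between consecutive frames."""
    return float(win_cur.mean() - win_prev.mean())

def hog_response_quality(resp: np.ndarray, k: float = 0.1) -> float:
    """w1: peak difference divided by k times the peak distance (assumed form).
    The 'secondary peak' here is simply the second-largest value, a simplification."""
    flat = resp.astype(float).ravel()
    main = int(flat.argmax())
    p_m = flat[main]
    flat2 = flat.copy()
    flat2[main] = -np.inf
    sec = int(flat2.argmax())
    p_s = flat2[sec]
    d_m = np.array(np.unravel_index(main, resp.shape), dtype=float)
    d_s = np.array(np.unravel_index(sec, resp.shape), dtype=float)
    dist = np.linalg.norm(d_m - d_s)
    return float((p_m - p_s) / (k * dist + 1e-9))

def color_map_quality(sim_map: np.ndarray) -> float:
    """w2: taken here as the mean of the color similarity map (assumed form)."""
    return float(sim_map.mean())
```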
Step4.2: self-adaptive feature fusion;
adaptive fusion is performed using the following formula:
Using the two feature-map qualities w_1, w_2 and the illumination change intensity w_0, the fusion ratio μ of the feature response maps is computed, where τ is a threshold parameter with value range [0.3, 0.6]:
μ = 0.3 + (w_1 − w_2)/2, if w_1 ≥ τ, w_2 ≥ τ and w_0 ≤ 15 (both response maps are of high quality and the illumination change is small);
μ = 0.2, if w_1 ≥ τ, w_2 ≥ τ and w_0 > 15;
μ = 0.3 + (w_1 − τ), if w_1 ≥ τ and w_2 < τ (the HOG feature quality is good and the color feature quality is poor);
μ = 0.3 + (w_2 − τ), if w_1 < τ and w_2 ≥ τ (the HOG feature quality is poor and the color feature quality is good);
μ = 0, if the value above is negative.
The fused response map is then computed as GL = (1 − μ)G_t + μL_t.
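A minimal sketch of the piecewise fusion rule and the fused map GL; the fallback when both qualities are below τ is not specified in the text and is an assumption, and the condition w_0 ≤ 15 is implemented as written (an absolute value may be intended):

```python
import numpy as np

def fusion_ratio(w0: float, w1: float, w2: float, tau: float = 0.5) -> float:
    if w1 >= tau and w2 >= tau:            # both response maps reliable
        mu = 0.3 + (w1 - w2) / 2.0 if w0 <= 15 else 0.2   # w0 <= 15 as written in the text
    elif w1 >= tau and w2 < tau:           # HOG good, color poor
        mu = 0.3 + (w1 - tau)
    elif w1 < tau and w2 >= tau:           # color good, HOG poor
        mu = 0.3 + (w2 - tau)
    else:                                  # both poor: fallback not specified, assumed here
        mu = 0.3
    return max(mu, 0.0)                    # clamp negative values to zero

def fuse(G_t: np.ndarray, L_t: np.ndarray, mu: float) -> np.ndarray:
    return (1.0 - mu) * G_t + mu * L_t     # GL = (1 - mu) * G_t + mu * L_t
```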
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is (w_sn_t − w_on_t) × (h_sn_t − h_on_t). Let GL_max, x_GL_max and y_GL_max denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (x_t, y_t, w_t, h_t), where w_t = w_{t-1}, h_t = h_{t-1}, x_t = x_{t-1} + (x_GL_max − (w_sn_t − w_on_t)/2)/γ − w_t/2, y_t = y_{t-1} + (y_GL_max − (h_sn_t − h_on_t)/2)/γ − h_t/2, and γ is the transform factor of the search window.
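A minimal sketch of recovering the target box from the peak of GL with the coordinate formulas above (not part of the original patent text):

```python
import numpy as np

def locate_target(GL: np.ndarray, prev_box, gamma: float,
                  w_sn: float, h_sn: float, w_on: float, h_on: float):
    """Peak of the fused map GL converted back to image coordinates; the box size is kept."""
    x_prev, y_prev, w, h = prev_box
    y_max, x_max = np.unravel_index(np.argmax(GL), GL.shape)
    x = x_prev + (x_max - (w_sn - w_on) / 2.0) / gamma - w / 2.0
    y = y_prev + (y_max - (h_sn - h_on) / 2.0) / gamma - h / 2.0
    return (x, y, w, h)
```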
Step 6: updating the target template;
From the tracking result (x_t, y_t, w_t, h_t) of the current frame, the search window Search'(t) is calculated by the method of Step 2.1, the frequency-domain representation F_t' of the Histogram of Oriented Gradients (HOG) feature within Search'(t) is extracted according to the method of Step 2.3, and H_t' = G/F_t' is calculated according to the method of Step 2.4. Let η be the update parameter; the HOG correlation filter template H_t of the current frame is updated as:
H_t = (1 − η)H_{t-1} + ηH_t'
From the position of the current-frame tracking result (x_t, y_t, w_t, h_t), the color histogram features bg_hist_t' and fg_hist_t' are extracted by the method of Step 2.5. Let θ and β be the update parameters; the current-frame background and foreground color histogram templates are updated as:
bg_hist_t = (1 − θ) × bg_hist_{t-1} + θ × bg_hist_t'
fg_hist_t = (1 − β) × fg_hist_{t-1} + β × fg_hist_t'
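A minimal sketch of the linear-interpolation template updates; η = 0.01 and θ = β = 0.04 are the values used in the embodiments:

```python
def update_templates(H_prev, H_new, bg_prev, bg_new, fg_prev, fg_new,
                     eta=0.01, theta=0.04, beta=0.04):
    """Linear-interpolation template updates; eta, theta, beta as in the embodiments."""
    H = (1 - eta) * H_prev + eta * H_new            # HOG correlation filter template
    bg = (1 - theta) * bg_prev + theta * bg_new     # background color histogram
    fg = (1 - beta) * fg_prev + beta * fg_new       # foreground color histogram
    return H, bg, fg
```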
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
The beneficial effects of the invention are: using the adaptive fusion method for feature fusion allows the advantages of the two features to be exploited more fully. The improved algorithm was tested on OTB100, and both precision and success rate improved. Center-error analysis on individual video sequences shows that the adaptive tracking, which does not lose the target, is more accurate than fusion at a fixed ratio. The color feature outperforms the HOG feature under motion blur and similar conditions, and the algorithm then raises the weight of the color feature according to the response map quality; under strong illumination change and background color interference, the HOG response map is better than the color response map, and the algorithm raises the weight of the HOG feature in the fused response accordingly. This adaptively adjusted fusion ratio improves the quality of the fused features, so the fused response map copes better with different kinds of interference.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
fig. 2 is a diagram of a first frame tracking area selection in embodiment 1 of the present invention;
FIG. 3 is a trace result snapshot in embodiment 1 of the present invention;
fig. 4 is a diagram of the selection of the tracking area of the first frame in embodiment 2 of the present invention;
fig. 5 is a trace result snapshot in embodiment 2 of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
As shown in fig. 1, a method for tracking a multi-feature adaptive fusion correlation filtering target includes the following specific steps:
step 1: a first frame is input.
Step 2: the target template is initialized.
Step 3: the next frame is input and features are extracted.
Step 4: adaptive feature fusion.
Step 5: and determining a tracking result.
Step 6: and updating the target template.
Step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Example 1: according to the technical scheme of the invention, a Basketball video sequence is selected for tracking; this sequence has five challenging attributes: illumination change, occlusion, deformation, out-of-plane rotation and background clutter.
Step 1: inputting a first frame;
selecting a Basketball video, wherein the Width and the Height of a video frame are 576 and 432 respectively. Rectangular areas (198,214,34,81) of the object to be tracked in the first frame are selected, i.e. the selected tracked object is shown as a green rectangle in fig. 2. Where (198,214) is the coordinate of the upper left corner of the rectangular area and (34,81) is the width and height of the rectangular area.
Step 2: initializing a target template;
step2.1: calculating a search window;
From the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (198, 214, 34, 81).
Taking the first frame as an example, the center point of the search window is (255, 215), and its width and height are w_s_1 = 92 and h_s_1 = 139. To keep the search range within the video frame, the width and height of the search window are further modified according to the intersection of the search range with the current frame region. To simplify the subsequent color histogram computation, the distance between the search-window boundary and the true target boundary is constrained to be an even number, and the width and height of the search window are modified accordingly.
Let the width and height of the normalized window NormWin be w_n = 150 and h_n = 150; the transform factor γ of the search window is computed as in Step 2.1 (for this example, γ = 1.3264). The search-window image can be normalized with this transform factor to form a standard search window. Taking the first frame as an example, its width and height are w_sn_1 = 92 and h_sn_1 = 139, and the width and height of the standard target window of the current frame are w_on_1 = 122 and h_on_1 = 184.
Step2.2: generating a standard Gaussian response graph;
The standard Gaussian response map g is a two-dimensional matrix; with cell = 4, its width and height are w_g = 30 and h_g = 46, and its element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0), which can be expressed as
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²)),
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center. The standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, of the same size as g.
Step2.3: extracting Histogram of Oriented Gradient (HOG) features;
Using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_1 in the normalized search window of the current frame; its size is 30 × 46 × 28. A cosine window of size 30 × 46 is applied to smooth the feature f_1, which is then Fourier transformed to obtain the frequency-domain representation F_1 of the HOG feature, of the same size as f_1.
Step2.4: calculating a correlation filter template of the HOG characteristics;
frequency of known normalized search window HOG featuresDomain representation F1And the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter template1Can be according to formula H1=G/F1And (4) calculating.
Step2.5: extracting a color histogram feature template;
Taking the first frame as an example, the region of the search window Search(1) = (254.5, 215, 92, 139) outside the target area (198, 214, 34, 81) is defined as the background region. The target region shrunk by a certain amount is defined as the foreground region; its center point is the same as that of the target region, and its width and height are both shrunk by 11.5. Extract the background color histogram bg_hist_1 in the background region and the foreground color histogram fg_hist_1 in the foreground region to obtain the color histogram feature template of the current frame.
Step 3: inputting a next frame and extracting features;
Compute the current-frame search window Search(2) by the method of Step 2.1, and extract the frequency-domain representation F_2 of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3. The response map G_2 of the current-frame HOG feature is computed as G_2 = F_2 ⊙ H_1.
Extract the current-frame color histogram features bg_hist_2 and fg_hist_2 by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the first frame's color histogram features bg_hist_1 and fg_hist_1, compute the similarity map L_1 between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
Step 4: self-adaptive feature fusion;
step 4.1: computing adaptive feature fusion parameters
Taking the first frame as an example, utilize
the mean difference of the search windows between frame t-1 and frame t to obtain the illumination change intensity w_0 = −0.6316 for the two frames. Using the quality formula of Step 4.1, the quality of the HOG response map is w_1 = 0.7613, where the numerator is the peak difference (P_m the main peak value, P_s the secondary peak value) and the denominator is the peak distance (D_m the main peak position, D_s the secondary peak position). From the similarity map L_t of the color histogram template, the color response map quality is w_2 = 0.3986, where x_i denotes the value of each pixel.
Step 4.2: self-adaptive feature fusion;
The feature maps are fused according to the weights obtained in Step 4.1. With the threshold τ = 0.5, w_1 < τ and w_2 ≥ τ, so μ = 0.3 + (w_2 − τ), giving μ = −0.1387; the fused response map is computed as GL = (1 − μ)G_t + μL_t.
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is 75 × 75. Let GL_max, x_GL_max = 42 and y_GL_max = 38 denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (195, 214, 34, 81), where x_2 = 195, y_2 = 214, w_2 = 34, h_2 = 81, and γ = 1.3264 is the transform factor of the search window.
Step 6: updating the target template;
according to the position of the current frame tracking result (194.9884,214,34,81) and Step2.1The method calculates Search window Search '(2), and extracts frequency domain representation F of Histogram of Oriented Gradients (HOG) features in the Search' (2) range according to the method of Step2.32', calculation of H according to the method of Step2.42'=G/F2'. Let η equal to 0.01 as the updated parameter, the current frame HOG feature correlation filter template H2The updating method is shown as the formula:
H2=(1-0.01)×H1+0.01×H2'
extracting a color histogram feature bg _ hist according to the position of the current frame tracking result (194.9884,214,34,81) and the method of Step2.52' and fg _ hist2'. Let θ and β be update parameters equal to 0.04, and the current frame background color histogram and foreground color histogram template updating method is shown as formula:
bg_hist2=(1-0.04)×bg_hist1+0.04×bg_hist2'
fg_hist2=(1-0.04)×fg_hist1+0.04×fg_hist2'
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Finally, the hardware experimental environment of this embodiment of the invention is a computer with an Intel Core i5-6700 CPU, a 3.4 GHz main frequency and 8 GB of memory. The success rate of the final tracking result reaches 78.6%, and a screenshot of part of the tracking result is shown in FIG. 3. In the figure, the red box represents the fixed-ratio fusion tracking result and the green box represents the tracking result of the present invention.
Example 2: according to the technical scheme of the invention, the Soccer video sequence is selected for tracking; this sequence has eight challenging attributes: illumination change, scale change, occlusion, motion blur, fast motion, in-plane rotation, out-of-plane rotation and background clutter.
Step 1: inputting a first frame;
selecting a Soccer video, wherein the Width and the Height of a video frame are 640 and 360 respectively. Rectangular areas (302,135,67,81) of the object to be tracked in the first frame are selected, i.e. the selected tracking object is shown as a green rectangular box in fig. 4. Wherein (302, 135) is the coordinates of the upper left corner of the rectangular area, and (67, 81) is the width and height of the rectangular area.
Step 2: initializing a target template;
step 2.1: calculating a search window;
From the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (302, 135, 67, 81).
Taking the first frame as an example, the center point of the search window is (336, 176), and its width and height are w_s_1 = 141 and h_s_1 = 155. To keep the search range within the video frame, the width and height of the search window are further modified according to the intersection of the search range with the current frame region. To simplify the subsequent color histogram computation, the distance between the search-window boundary and the true target boundary is constrained to be an even number, and the width and height of the search window are modified accordingly.
Let the width and height of the normalized window NormWin be w_n = 150 and h_n = 150; the transform factor γ of the search window is computed as in Step 2.1 (for this example, γ = 1.0146). The search-window image can be normalized with this transform factor to form a standard search window. Taking the first frame as an example, its width and height are w_sn_1 = 141 and h_sn_1 = 155, and the width and height of the standard target window of the current frame are w_on_1 = 143 and h_on_1 = 157.
Step 2.2: generating a standard Gaussian response graph;
The standard Gaussian response map g is a two-dimensional matrix; with cell = 4, its width and height are w_g = 35 and h_g = 39, and its element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0), which can be expressed as
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²)),
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center. The standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, of the same size as g.
Step 2.3: extracting Histogram of Oriented Gradient (HOG) features;
Using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_1 in the normalized search window of the current frame; its size is 35 × 39 × 28. A cosine window of size 35 × 39 is applied to smooth the feature f_1, which is then Fourier transformed to obtain the frequency-domain representation F_1 of the HOG feature, of the same size as f_1.
Step 2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search window1And the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter template1Can be according to formula H1=G/F1And (4) calculating.
Step 2.5: extracting a color histogram feature template;
Taking the first frame as an example, the region of the search window Search(1) = (336, 176, 141, 155) outside the target area (302, 135, 67, 81) is defined as the background region. The target region shrunk by a certain amount is defined as the foreground region; its center point is the same as that of the target region, and its width and height are both shrunk by 14.8. Extract the background color histogram bg_hist_1 in the background region and the foreground color histogram fg_hist_1 in the foreground region to obtain the color histogram feature template of the current frame.
Step 3: inputting a next frame and extracting features;
Compute the current-frame search window Search(2) by the method of Step 2.1, and extract the frequency-domain representation F_2 of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3. The response map G_2 of the current-frame HOG feature is computed as G_2 = F_2 ⊙ H_1.
Extract the current-frame color histogram features bg_hist_2 and fg_hist_2 by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the first frame's color histogram features bg_hist_1 and fg_hist_1, compute the similarity map L_1 between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
Step 4: self-adaptive feature fusion;
step 4.1: calculating self-adaptive feature fusion parameters;
taking the first frame as an example, utilize
the mean difference of the search windows between frame t-1 and frame t to obtain the illumination change intensity w_0 = −2.6366 for the two frames. Using the quality formula of Step 4.1, the quality of the HOG response map is w_1 = 0.6576, where the numerator is the peak difference (P_m the main peak value, P_s the secondary peak value) and the denominator is the peak distance (D_m the main peak position, D_s the secondary peak position). From the similarity map L_t of the color histogram template, the color response map quality is w_2 = 0.5245, where x_i denotes the value of each pixel.
Step 4.2: self-adaptive feature fusion;
and fusing the characteristic graphs according to the weight values obtained by the 4.1. Taking the threshold value tau as 0.5, w1>τ,w2≥τ,μ=0.3+(w1-w2) (vi)/2, μ 0.2335 using GL ═ 1- μ Gt+μLtAnd calculating a fusion response graph.
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is 75 × 75. Let GL_max, x_GL_max = 42 and y_GL_max = 38 denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (298, 139, 67, 81), where x_2 = 298, y_2 = 139, w_2 = 67, h_2 = 81, and γ = 1.0146 is the transform factor of the search window.
Step 6: updating the target template;
calculating a Search window Search '(2) according to the position of the current frame tracking result (194.9884,214,34,81) and the method of Step2.1, and extracting a frequency domain representation F of the Histogram of Oriented Gradients (HOG) feature in the Search' (2) range according to the method of Step2.32', calculation of H according to the method of Step2.42'=G/F2'. Let η equal to 0.01 as the updated parameter, the current frame HOG feature correlation filter template H2The updating method is shown as the formula:
H2=(1-0.01)×H1+0.01×H2'
extracting a color histogram feature bg _ hist according to a position of a current frame tracking result (194.9884,214,34,81) and a method of Step2.52' and fg _ hist2'. Let θ and β be update parameters equal to 0.04, and the current frame background color histogram and foreground color histogram template updating method is shown as formula:
bg_hist2=(1-0.04)×bg_hist1+0.04×bg_hist2'
fg_hist2=(1-0.04)×fg_hist1+0.04×fg_hist2'
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Finally, the hardware experimental environment of this embodiment of the invention is a computer with an Intel Core i5-6700 CPU, a 3.4 GHz main frequency and 8 GB of memory. The success rate of the final tracking result reaches 51.8%, and a screenshot of part of the tracking result is shown in FIG. 5. In the figure, the green box represents the fixed-ratio fusion tracking result and the red box represents the tracking result of the present invention.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (7)

1. A multi-feature adaptive fusion correlation filtering target tracking method is characterized by comprising the following steps:
step 1: inputting a first frame;
step 2: initializing a target template;
step 3: inputting a next frame and extracting features;
step 4: self-adaptive feature fusion;
step 5: determining a tracking result;
step 6: updating the target template;
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
2. The method for tracking a target through correlation filtering with multi-feature adaptive fusion as claimed in claim 1, wherein Step 1 is specifically: the upper-left corner of each frame in the video frame sequence is the coordinate origin (1,1), the width and height are Width and Height respectively, and a rectangular area (x_0, y_0, w_0, h_0) of the object to be tracked is selected in the first frame, i.e. the selected tracking target;
wherein (x_0, y_0) represents the coordinates of the upper-left corner of the rectangular area, w_0, h_0 represent the width and height of the rectangular area respectively, and the target selected in the first frame is also called the current-frame tracking result (x_1, y_1, w_1, h_1) = (x_0, y_0, w_0, h_0), where the subscript denotes the current frame index.
3. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 2, wherein Step2 specifically comprises:
step2.1: calculating a search window;
from the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (x_0, y_0, w_0, h_0);
the center point of the search window is (x_s_t, y_s_t), where x_s_t = x_{t-1} + w_{t-1}/2 and y_s_t = y_{t-1} + h_{t-1}/2, and its width and height are w_s_t = 1.5 × w_{t-1} + 0.5 × h_{t-1} and h_s_t = 1.5 × h_{t-1} + 0.5 × w_{t-1};
let the width and height of the normalized window NormWin be w_n and h_n respectively; the transform factor of the search window is:
γ = sqrt((w_n × h_n) / (w_s_t × h_s_t))
the search-window image can be normalized with this transform factor to form a standard search window of width and height w_sn_t = w_s_t × γ and h_sn_t = h_s_t × γ; the width and height of the standard target window of the current frame are w_on_t = w_sn_t × 0.75 − h_sn_t × 0.25 and h_on_t = h_sn_t × 0.75 − w_sn_t × 0.25;
Step2.2: generating a standard Gaussian response graph;
the standard Gaussian response map g is a two-dimensional matrix of width w_g = w_sn_t/cell and height h_g = h_sn_t/cell, whose element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0):
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²))
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center; the standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, which has the same size as g;
step2.3: extracting HOG characteristics of the histogram of directional gradient;
using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_t in the normalized search window of the current frame, of size w_g × h_g × 28; a cosine window of size w_g × h_g is applied to smooth the feature f_t, which is then Fourier transformed to obtain the frequency-domain representation F_t of the HOG feature, of the same size as f_t;
step2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search windowtAnd the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter templatet
Ht=G/Ft
Step2.5: extracting a color histogram feature template;
search window (t) ═ x _ st,y_st,w_st,h_st) Inner target area (x)t-1,yt-1,wt-1,ht-1) The region outside is defined as the background region, eyeThe target region is defined as the foreground region, the center point is the same as the target region, and the width and height shrinkage are all (w)t-1+ht-1) 10, extracting a background color histogram bg _ hist in the background area and the foreground area respectivelytAnd foreground color histogram fg _ histtThe current frame color histogram feature template is obtained.
4. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step3 is specifically as follows:
compute the search window Search(t) of the current frame by the method of Step 2.1, and extract the frequency-domain representation F_t of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3; the response map G_t of the current-frame HOG feature is:
G_t = F_t ⊙ H_{t-1}
extract the current-frame color histogram features bg_hist_t and fg_hist_t by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the previous frame's color histogram features bg_hist_{t-1} and fg_hist_{t-1}, compute the similarity map L_t between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
5. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step4 is specifically as follows:
step4.1: calculating self-adaptive feature fusion parameters;
by using
the mean values of the search windows of frame t-1 and frame t, the illumination change intensity w_0 of the two frames is obtained as their mean difference;
the quality w_1 of the HOG response map is computed from the peak difference P_m − P_s in the response map, where P_m is the main peak value and P_s is the secondary peak value, and from the peak distance between the main peak position D_m and the secondary peak position D_s, with k = 0.1;
the quality w_2 of the color response map is computed from the similarity map L_t of the color histogram template, where x_i denotes the i-th value of L_t;
step4.2: self-adaptive feature fusion;
adaptive fusion is performed using the following formula:
Using the two feature-map qualities w_1, w_2 and the illumination change intensity w_0, the fusion ratio μ of the feature response maps is computed, where τ is a threshold parameter with value range [0.3, 0.6]:
when w_1 ≥ τ and w_2 ≥ τ, both response maps are of high quality;
when w_1 ≥ τ and w_2 < τ, the HOG feature quality is good and the color feature quality is poor, and μ = 0.3 + (w_1 − τ);
when w_1 < τ and w_2 ≥ τ, the HOG feature quality is poor and the color feature quality is good, and μ = 0.3 + (w_2 − τ);
when μ < 0, μ is set to 0, and the fused response map is computed as GL = (1 − μ)G_t + μL_t.
6. The method for tracking a target with correlation filtering based on multi-feature adaptive fusion as claimed in claim 5, wherein Step5 is specifically:
the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the maximum element value corresponds to the candidate target, namely the tracking result;
the number of candidate targets in the current-frame search window is (w_sn_t − w_on_t) × (h_sn_t − h_on_t);
let GL_max, x_GL_max and y_GL_max denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions; the tracking result of the current frame is (x_t, y_t, w_t, h_t), where w_t = w_{t-1}, h_t = h_{t-1}, x_t = x_{t-1} + (x_GL_max − (w_sn_t − w_on_t)/2)/γ − w_t/2, y_t = y_{t-1} + (y_GL_max − (h_sn_t − h_on_t)/2)/γ − h_t/2, and γ is the transform factor of the search window.
7. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step6 is specifically as follows:
from the tracking result (x_t, y_t, w_t, h_t) of the current frame, compute the search window Search'(t) by the method of Step 2.1, extract the frequency-domain representation F_t' of the Histogram of Oriented Gradients (HOG) feature within Search'(t) by the method of Step 2.3, and compute H_t' = G/F_t' by the method of Step 2.4;
let η be the update parameter; the HOG correlation filter template H_t of the current frame is updated as:
H_t = (1 − η)H_{t-1} + ηH_t'
from the position of the current-frame tracking result (x_t, y_t, w_t, h_t), extract the color histogram features bg_hist_t' and fg_hist_t' by the method of Step 2.5;
let θ and β be the update parameters; the current-frame background and foreground color histogram templates are updated as:
bg_hist_t = (1 − θ) × bg_hist_{t-1} + θ × bg_hist_t'
fg_hist_t = (1 − β) × fg_hist_{t-1} + β × fg_hist_t'.
CN202110751273.XA 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method Active CN113379802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110751273.XA CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110751273.XA CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Publications (2)

Publication Number Publication Date
CN113379802A true CN113379802A (en) 2021-09-10
CN113379802B CN113379802B (en) 2024-04-16

Family

ID=77580709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110751273.XA Active CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Country Status (1)

Country Link
CN (1) CN113379802B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748877A (en) * 2017-11-10 2018-03-02 杭州晟元数据安全技术股份有限公司 A kind of Fingerprint recognition method based on minutiae point and textural characteristics
CN109934853A (en) * 2019-03-21 2019-06-25 云南大学 Correlation filtering tracking based on the fusion of response diagram confidence region self-adaptive features
CN110147747A (en) * 2019-05-09 2019-08-20 云南大学 A kind of correlation filtering tracking based on accumulation first derivative high confidence level strategy
CN110163132A (en) * 2019-05-09 2019-08-23 云南大学 A kind of correlation filtering tracking based on maximum response change rate more new strategy
CN111612817A (en) * 2020-05-07 2020-09-01 桂林电子科技大学 Target tracking method based on depth feature adaptive fusion and context information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谢煜: "Correlation filter tracking algorithm based on context awareness and adaptive response fusion" (基于上下文感知与自适应响应融合的相关滤波跟踪算法), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 42, no. 4, 30 April 2021 (2021-04-30) *

Also Published As

Publication number Publication date
CN113379802B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN111080675B (en) Target tracking method based on space-time constraint correlation filtering
CN107909081B (en) Method for quickly acquiring and quickly calibrating image data set in deep learning
CN108109162B (en) Multi-scale target tracking method using self-adaptive feature fusion
CN110490907B (en) Moving target tracking method based on multi-target feature and improved correlation filter
CN108876820B (en) Moving target tracking method under shielding condition based on mean shift
CN109448023B (en) Satellite video small target real-time tracking method
CN112364865B (en) Method for detecting small moving target in complex scene
CN111582349A (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN110910421A (en) Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN106600613B (en) Improvement LBP infrared target detection method based on embedded gpu
CN108537825B (en) Target tracking method based on transfer learning regression network
CN107871315B (en) Video image motion detection method and device
CN110827327B (en) Fusion-based long-term target tracking method
CN110163132A (en) A kind of correlation filtering tracking based on maximum response change rate more new strategy
CN110570450B (en) Target tracking method based on cascade context-aware framework
CN112132855A (en) Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN117011381A (en) Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision
CN116777956A (en) Moving target screening method based on multi-scale track management
CN116665097A (en) Self-adaptive target tracking method combining context awareness
CN110147747B (en) Correlation filtering tracking method based on accumulated first-order derivative high-confidence strategy
CN113379802B (en) Multi-feature adaptive fusion related filtering target tracking method
CN109934853B (en) Correlation filtering tracking method based on response image confidence region adaptive feature fusion
CN107564029B (en) Moving target detection method based on Gaussian extreme value filtering and group sparse RPCA
CN112465865A (en) Multi-target tracking method based on background modeling and IoU matching
CN110660079A (en) Single target tracking method based on space-time context

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant