CN113379802A - Multi-feature adaptive fusion correlation filtering target tracking method - Google Patents

Multi-feature adaptive fusion correlation filtering target tracking method

Info

Publication number
CN113379802A
CN113379802A (application CN202110751273.XA)
Authority
CN
China
Prior art keywords
feature
frame
tracking
target
hog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110751273.XA
Other languages
Chinese (zh)
Other versions
CN113379802B (en
Inventor
赵磊
李天文
张莉园
贺华迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110751273.XA priority Critical patent/CN113379802B/en
Publication of CN113379802A publication Critical patent/CN113379802A/en
Application granted granted Critical
Publication of CN113379802B publication Critical patent/CN113379802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-feature adaptive fusion correlation filtering target tracking method and belongs to the technical field of image processing. The method is based on a correlation filtering tracking framework, extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram, and adaptively adjusts the fusion parameters of the two features according to the quality of the response maps. Compared with correlation filtering tracking methods that fuse features with fixed parameters, the method obtains a more stable tracking effect. The invention improves the feature fusion strategy: it extracts HOG features and color features, whose advantages and disadvantages are complementary, and makes the features usable for tracking more stably by adjusting the proportion of the HOG and color features in the fused feature.

Description

Multi-feature adaptive fusion correlation filtering target tracking method
Technical Field
The invention relates to a multi-feature adaptive fusion correlation filtering target tracking method and belongs to the technical field of image processing.
Background
Video target tracking is one of the research hotspots in the field of computer vision. With the rapid improvement of computer processing power, video-based target tracking technology has developed quickly and provides important support for applications such as intelligent surveillance, driver assistance and human-computer interaction. In recent years many algorithms with excellent performance and speed have emerged in target tracking, among which correlation filtering algorithms are currently among the more advanced and have received wide attention and study. Among correlation filtering algorithms, the DSST algorithm adds scale estimation but uses a single feature and introduces considerable noise, so tracking is unstable. The SRDCF algorithm has good robustness, but it is very slow and cannot meet real-time requirements. The Staple algorithm combines the HOG feature with the color feature, but the weight is an empirical value that cannot be adjusted automatically as the target and the environment change, so its adaptability is poor. The C-COT algorithm extracts features with a neural network, which greatly increases the computational complexity and makes it slow.
The method performs tracking within a correlation filtering tracking framework and extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram. The closest prior-art solution is the Staple tracking method: HOG features are extracted and used to learn a filter template according to the correlation filter learning rule, and the template is updated with a given formula; color features are extracted, foreground and background color probability models are trained, and the template is updated with a given formula; the HOG response map and the color response map are computed from the templates and the image to be detected and added at a fixed 7:3 ratio to obtain a fused response map, whose maximum value gives the target position.
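For illustration only (not part of the original patent text), a minimal Python sketch of the fixed-ratio 7:3 fusion used by this Staple-style baseline, assuming both response maps have already been computed and resized to the same shape; all names are illustrative:

```python
import numpy as np

def fuse_fixed(hog_response: np.ndarray, color_response: np.ndarray,
               hog_weight: float = 0.7):
    """Fuse two same-sized response maps with a fixed 7:3 ratio and return the peak."""
    fused = hog_weight * hog_response + (1.0 - hog_weight) * color_response
    peak = np.unravel_index(np.argmax(fused), fused.shape)  # (row, col) of the target
    return fused, peak
```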
At present, mainstream correlation filtering tracking methods based on multi-feature fusion mostly fuse the features with fixed weights. The correlation-filter template feature (HOG) performs poorly under fast deformation and fast motion but handles motion blur, illumination change and similar conditions well; the color statistical feature performs poorly under illumination change and backgrounds of similar color, but it is insensitive to deformation, does not belong to the correlation filtering framework, has no boundary effect and can cope with fast change. In a target tracking system for real, complex scenes, linearly summing the two with fixed weights, without judging the two feature values, cannot exploit the advantage of each feature to the greatest extent under the specific conditions where it excels.
Aiming at the problems that a single feature has limitations and that real-time performance cannot be met, the method improves the feature fusion strategy: it extracts HOG features and color features, whose advantages and disadvantages are complementary, and makes the features usable for tracking more stably by adjusting the proportion of the HOG and color features in the fused feature.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-feature adaptive fusion correlation filtering target tracking method that addresses the defects of the existing fixed-weight fusion approach: the multi-feature fusion parameters are set adaptively according to the confidence region of the response map of each video frame, so as to improve the stability of the tracking system and thereby solve the above technical problem.
The technical scheme of the invention is as follows: a multi-feature adaptive fusion correlation filtering target tracking method is based on a correlation filtering tracking framework, extracts two complementary features, a Histogram of Oriented Gradients (HOG) and a color histogram, and adaptively adjusts the fusion parameters of the two features according to the quality of the response maps. Compared with correlation filtering tracking methods that fuse features with fixed parameters, a more stable tracking effect is obtained.
The method comprises the following specific steps:
step 1: inputting a first frame;
the video comprises a plurality of frames of pictures, each frame of picture at least comprises a target, wherein the target position on the 1 st frame of picture is known, and the target positions on the rest frames of pictures are unknown; the video frame number is a positive integer greater than or equal to 1; the upper left corner of each frame in the video frame sequence is a coordinate origin (1,1), and the Width and the Height are Width and Height respectively; manually or automatically selecting a rectangular area (x) of the object to be tracked in the first frame0,y0,w0,h0) I.e. the selected tracking target. Wherein (x)0,y0) Representing the coordinates of the upper left corner of the rectangular area, w0,h0Respectively, the width and height of the rectangular region. The first frame selected target is also called current frame tracking result (x)1,y1,w1,h1)=(x0,y0,w0,h0) The subscript indicates the current frame number.
Step 2: initializing a target template;
step 2.1: calculating a search window;
tracking the result (x) from the previous frame, i.e. t-1 framet-1,yt-1,wt-1,ht-1) The corresponding rectangular area can calculate the search window search (t) of the current frame, i.e. the tth frame candidate target, and the first frame search window is based on (x)0,y0,w0,h0) And (4) calculating. The center point of the search window is (x _ s)t,y_st) Wherein x _ st=xt-1+wt-1/2、y_st=yt-1+ht-1(ii)/2, width and height are w _ st=1.5×wt-1+0.5×ht-1、h_st=1.5×ht-1+0.5×wt-1. In order to ensure that the search range is within the video frame range, the width and height of the search window are further modified according to the intersection of the search range and the current frame region. In order to facilitate the subsequent calculation of the color histogram feature, the distance between the boundary of the search window and the real target boundary is defined as an even number, and the width and height of the search window are further modified.
Let the width and height of the normalized window NormWin be w_n and h_n respectively; the transform factor of the search window is then
γ = sqrt((w_n × h_n) / (w_s_t × h_s_t)).
The search-window image can be normalized with this transform factor to form a standard search window of width and height w_sn_t = w_s_t × γ and h_sn_t = h_s_t × γ; the width and height of the standard target window of the current frame are w_on_t = w_sn_t × 0.75 − h_sn_t × 0.25 and h_on_t = h_sn_t × 0.75 − w_sn_t × 0.25.
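A minimal Python sketch of this search-window computation (not part of the original patent text). The normalization factor is assumed to be γ = sqrt((w_n·h_n)/(w_s_t·h_s_t)), which is consistent with the values γ = 1.3264 and γ = 1.0146 reported in the embodiments; the clipping and even-border adjustments are simplified:

```python
import math

def search_window(prev_box, frame_w, frame_h, w_n=150, h_n=150):
    """prev_box = (x, y, w, h): tracking result of the previous frame."""
    x, y, w, h = prev_box
    cx, cy = x + w / 2.0, y + h / 2.0                # search-window center
    w_s = 1.5 * w + 0.5 * h                          # search-window width
    h_s = 1.5 * h + 0.5 * w                          # search-window height
    w_s, h_s = min(w_s, frame_w), min(h_s, frame_h)  # simplified clipping to the frame
    w_s = w + 2 * (int(w_s - w) // 2)                # force an even border width
    h_s = h + 2 * (int(h_s - h) // 2)                # force an even border height
    gamma = math.sqrt((w_n * h_n) / (w_s * h_s))     # assumed normalization factor
    w_sn, h_sn = w_s * gamma, h_s * gamma            # normalized (standard) search window
    w_on = w_sn * 0.75 - h_sn * 0.25                 # standard target window width
    h_on = h_sn * 0.75 - w_sn * 0.25                 # standard target window height
    return (cx, cy, w_s, h_s), gamma, (w_sn, h_sn), (w_on, h_on)
```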
Step 2.2: generating a standard Gaussian response graph;
the standard gaussian response graph g is a two-dimensional matrix with width and height w _ g-w _ snt/cell、h_g=h_sntCell, the matrix element value of which is a probability density function according to a two-dimensional Gaussian distribution N (0,0, delta, 0), can be expressed in terms of
Figure BDA0003144506970000031
And (4) calculating by using a formula. Wherein, the delta represents the standard deviation of two-dimensional Gaussian distribution and the calculation method is
Figure BDA0003144506970000032
The cell represents that the size of each grid in the HOG feature extraction process is cell multiplied by cell, and (i, j) represents the element coordinate position of the Gaussian response diagram matrix, and the origin is located at the center point of the matrix. The standard gaussian response map is fourier transformed to obtain its frequency domain representation G, which is the same size as G.
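A minimal sketch of the Gaussian label generation under the stated distribution N(0, 0, δ, δ, 0); δ is left as a parameter because its defining formula is not reproduced here (not part of the original patent text):

```python
import numpy as np

def gaussian_label(w_g: int, h_g: int, delta: float) -> np.ndarray:
    """Sample the N(0, 0, delta, delta, 0) density on an h_g x w_g grid centered at the midpoint."""
    i = np.arange(h_g) - h_g // 2                    # row offsets from the center
    j = np.arange(w_g) - w_g // 2                    # column offsets from the center
    jj, ii = np.meshgrid(j, i)
    g = np.exp(-(ii ** 2 + jj ** 2) / (2.0 * delta ** 2)) / (2.0 * np.pi * delta ** 2)
    return g

# Frequency-domain representation used when learning the filter:
# G = np.fft.fft2(gaussian_label(w_g, h_g, delta))
```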
Step 2.3: extracting Histogram of Oriented Gradient (HOG) features;
using cell as HOG characteristic grid size parameter, 2 x 2 grids as block size, setting histogram group bin as 2 pi/7, extracting HOG characteristic f in current frame standardized search windowtThe size is w _ g × h _ g × 28. Using a cosine window pair of size w _ g × h _ g for feature ftSmoothing, and Fourier transforming to obtain frequency domain representation F of HOG characteristictWhich is reacted with ftThe same size.
Step 2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search windowtAnd the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter templatetCan be according to formula Ht=G/FtAnd (4) calculating.
Step 2.5: extracting a color histogram feature template;
search window (t) ═ x _ st,y_st,w_st,h_st) Inner target area (x)t-1,yt-1,wt-1,ht-1) The region outside is defined as the background region, and the target region is reduced to a certain degreeThe amount is defined as the foreground area, the center point is the same as the target area, and the width and height shrinkage are all (w)t-1+ht-1)/10. Extracting background color histograms bg _ hist in the background area and the foreground area respectivelytAnd foreground color histogram fg _ histtThe current frame color histogram feature template is obtained.
Step 3: inputting a next frame and extracting features;
calculating a search window (search) of the current frame according to the method of Step2.1, and extracting a frequency domain representation F of the directional gradient Histogram (HOG) feature of the current frame according to Step2.3tResponse graph G of HOG feature of current frametCan be according to formula Gt=Ft⊙Ht-1And (4) calculating.
Extract the current-frame color histogram features bg_hist_t and fg_hist_t by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the previous frame's color histogram features bg_hist_{t-1} and fg_hist_{t-1}, compute the similarity map L_t between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
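A minimal sketch of the two per-frame response computations; taking the spatial HOG response as the channel-summed inverse FFT of F_t ⊙ H_{t-1}, and the color similarity as the per-pixel foreground probability fg/(fg+bg), are assumptions consistent with, but not spelled out in, the text:

```python
import numpy as np

def hog_response(F_t: np.ndarray, H_prev: np.ndarray) -> np.ndarray:
    """Spatial HOG response: inverse FFT of F_t * H_{t-1}, summed over feature channels."""
    return np.real(np.fft.ifft2(F_t * H_prev, axes=(0, 1))).sum(axis=2)

def color_similarity(flat_bins: np.ndarray, fg_hist: np.ndarray, bg_hist: np.ndarray) -> np.ndarray:
    """Per-pixel foreground likelihood from the histogram bin index of each pixel."""
    fg = fg_hist[flat_bins]
    bg = bg_hist[flat_bins]
    return fg / (fg + bg + 1e-9)
```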
Step 4: self-adaptive feature fusion;
step4.1: computing adaptive feature fusion parameters
By using
the mean pixel values of the search windows of frame t-1 and frame t, the illumination change intensity w_0 between the two frames is obtained as their mean difference (the exact formulas for w_0, w_1 and w_2 are given as equation images in the original).
The quality w_1 of the HOG response map is computed from the peak difference P_m − P_s in the response map as numerator, where P_m is the main peak value and P_s is the secondary peak value, and from the peak distance between the main peak position D_m and the secondary peak position D_s as denominator, with k = 0.1.
The quality w_2 of the color response map is computed from the similarity map L_t of the color histogram template, where x_i denotes the i-th value of L_t.
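A minimal sketch of the three quality measures. Since the exact formulas appear only as equation images, the algebraic forms below (signed mean difference for w_0, peak difference over k times peak distance for w_1, mean of the similarity map for w_2) are assumptions that follow the surrounding description:

```python
import numpy as np

def illumination_change(win_prev: np.ndarray, win_cur: np.ndarray) -> float:
    """w0: signed difference of search-window mean intensities between consecutive frames."""
    return float(win_cur.mean() - win_prev.mean())

def hog_response_quality(resp: np.ndarray, k: float = 0.1) -> float:
    """w1: peak difference divided by k times the peak distance (assumed form).
    The 'secondary peak' here is simply the second-largest value, a simplification."""
    flat = resp.astype(float).ravel()
    main = int(flat.argmax())
    p_m = flat[main]
    flat2 = flat.copy()
    flat2[main] = -np.inf
    sec = int(flat2.argmax())
    p_s = flat2[sec]
    d_m = np.array(np.unravel_index(main, resp.shape), dtype=float)
    d_s = np.array(np.unravel_index(sec, resp.shape), dtype=float)
    dist = np.linalg.norm(d_m - d_s)
    return float((p_m - p_s) / (k * dist + 1e-9))

def color_map_quality(sim_map: np.ndarray) -> float:
    """w2: taken here as the mean of the color similarity map (assumed form)."""
    return float(sim_map.mean())
```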
Step4.2: self-adaptive feature fusion;
adaptive fusion is performed using the following formula:
Using the two feature-map qualities w_1, w_2 and the illumination change intensity w_0, the fusion ratio μ of the feature response maps is computed, where τ is a threshold parameter with value range [0.3, 0.6]:
μ = 0.3 + (w_1 − w_2)/2, if w_1 ≥ τ, w_2 ≥ τ and w_0 ≤ 15 (both response maps are of high quality and the illumination change is small);
μ = 0.2, if w_1 ≥ τ, w_2 ≥ τ and w_0 > 15;
μ = 0.3 + (w_1 − τ), if w_1 ≥ τ and w_2 < τ (the HOG feature quality is good and the color feature quality is poor);
μ = 0.3 + (w_2 − τ), if w_1 < τ and w_2 ≥ τ (the HOG feature quality is poor and the color feature quality is good);
μ = 0, if the value above is negative.
The fused response map is then computed as GL = (1 − μ)G_t + μL_t.
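A minimal sketch of the piecewise fusion rule and the fused map GL; the fallback when both qualities are below τ is not specified in the text and is an assumption, and the condition w_0 ≤ 15 is implemented as written (an absolute value may be intended):

```python
import numpy as np

def fusion_ratio(w0: float, w1: float, w2: float, tau: float = 0.5) -> float:
    if w1 >= tau and w2 >= tau:            # both response maps reliable
        mu = 0.3 + (w1 - w2) / 2.0 if w0 <= 15 else 0.2   # w0 <= 15 as written in the text
    elif w1 >= tau and w2 < tau:           # HOG good, color poor
        mu = 0.3 + (w1 - tau)
    elif w1 < tau and w2 >= tau:           # color good, HOG poor
        mu = 0.3 + (w2 - tau)
    else:                                  # both poor: fallback not specified, assumed here
        mu = 0.3
    return max(mu, 0.0)                    # clamp negative values to zero

def fuse(G_t: np.ndarray, L_t: np.ndarray, mu: float) -> np.ndarray:
    return (1.0 - mu) * G_t + mu * L_t     # GL = (1 - mu) * G_t + mu * L_t
```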
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is (w_sn_t − w_on_t) × (h_sn_t − h_on_t). Let GL_max, x_GL_max and y_GL_max denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (x_t, y_t, w_t, h_t), where w_t = w_{t-1}, h_t = h_{t-1}, x_t = x_{t-1} + (x_GL_max − (w_sn_t − w_on_t)/2)/γ − w_t/2, y_t = y_{t-1} + (y_GL_max − (h_sn_t − h_on_t)/2)/γ − h_t/2, and γ is the transform factor of the search window.
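A minimal sketch of recovering the target box from the peak of GL with the coordinate formulas above (not part of the original patent text):

```python
import numpy as np

def locate_target(GL: np.ndarray, prev_box, gamma: float,
                  w_sn: float, h_sn: float, w_on: float, h_on: float):
    """Peak of the fused map GL converted back to image coordinates; the box size is kept."""
    x_prev, y_prev, w, h = prev_box
    y_max, x_max = np.unravel_index(np.argmax(GL), GL.shape)
    x = x_prev + (x_max - (w_sn - w_on) / 2.0) / gamma - w / 2.0
    y = y_prev + (y_max - (h_sn - h_on) / 2.0) / gamma - h / 2.0
    return (x, y, w, h)
```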
Step 6: updating the target template;
From the tracking result (x_t, y_t, w_t, h_t) of the current frame, the search window Search'(t) is calculated by the method of Step 2.1, the frequency-domain representation F_t' of the Histogram of Oriented Gradients (HOG) feature within Search'(t) is extracted according to the method of Step 2.3, and H_t' = G/F_t' is calculated according to the method of Step 2.4. Let η be the update parameter; the HOG correlation filter template H_t of the current frame is updated as:
H_t = (1 − η)H_{t-1} + ηH_t'
From the position of the current-frame tracking result (x_t, y_t, w_t, h_t), the color histogram features bg_hist_t' and fg_hist_t' are extracted by the method of Step 2.5. Let θ and β be the update parameters; the current-frame background and foreground color histogram templates are updated as:
bg_hist_t = (1 − θ) × bg_hist_{t-1} + θ × bg_hist_t'
fg_hist_t = (1 − β) × fg_hist_{t-1} + β × fg_hist_t'
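A minimal sketch of the linear-interpolation template updates; η = 0.01 and θ = β = 0.04 are the values used in the embodiments:

```python
def update_templates(H_prev, H_new, bg_prev, bg_new, fg_prev, fg_new,
                     eta=0.01, theta=0.04, beta=0.04):
    """Linear-interpolation template updates; eta, theta, beta as in the embodiments."""
    H = (1 - eta) * H_prev + eta * H_new            # HOG correlation filter template
    bg = (1 - theta) * bg_prev + theta * bg_new     # background color histogram
    fg = (1 - beta) * fg_prev + beta * fg_new       # foreground color histogram
    return H, bg, fg
```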
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
The beneficial effects of the invention are: using the adaptive fusion method for feature fusion allows the advantages of the two features to be exploited more fully. The improved algorithm was tested on OTB100, and both precision and success rate improved. Center-error analysis on individual video sequences shows that the adaptive tracking, which does not lose the target, is more accurate than fusion at a fixed ratio. The color feature outperforms the HOG feature under motion blur and similar conditions, and the algorithm then raises the weight of the color feature according to the response map quality; under strong illumination change and background color interference, the HOG response map is better than the color response map, and the algorithm raises the weight of the HOG feature in the fused response accordingly. This adaptively adjusted fusion ratio improves the quality of the fused features, so the fused response map copes better with different kinds of interference.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
fig. 2 is a diagram of a first frame tracking area selection in embodiment 1 of the present invention;
FIG. 3 is a trace result snapshot in embodiment 1 of the present invention;
fig. 4 is a diagram of the selection of the tracking area of the first frame in embodiment 2 of the present invention;
fig. 5 is a trace result snapshot in embodiment 2 of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
As shown in fig. 1, a method for tracking a multi-feature adaptive fusion correlation filtering target includes the following specific steps:
step 1: a first frame is input.
Step 2: the target template is initialized.
Step 3: the next frame is input and features are extracted.
Step 4: adaptive feature fusion.
Step 5: and determining a tracking result.
Step 6: and updating the target template.
Step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Example 1: according to the technical scheme of the invention, a Basketball video sequence is selected for tracking; this sequence has five challenging attributes: illumination change, occlusion, deformation, out-of-plane rotation and background clutter.
Step 1: inputting a first frame;
selecting a Basketball video, wherein the Width and the Height of a video frame are 576 and 432 respectively. Rectangular areas (198,214,34,81) of the object to be tracked in the first frame are selected, i.e. the selected tracked object is shown as a green rectangle in fig. 2. Where (198,214) is the coordinate of the upper left corner of the rectangular area and (34,81) is the width and height of the rectangular area.
Step 2: initializing a target template;
step2.1: calculating a search window;
From the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (198, 214, 34, 81).
Taking the first frame as an example, the center point of the search window is (255, 215), and its width and height are w_s_1 = 92 and h_s_1 = 139. To keep the search range within the video frame, the width and height of the search window are further modified according to the intersection of the search range with the current frame region. To simplify the subsequent color histogram computation, the distance between the search-window boundary and the true target boundary is constrained to be an even number, and the width and height of the search window are modified accordingly.
Let the width and height of the normalized window NormWin be w_n = 150 and h_n = 150; the transform factor γ of the search window is computed as in Step 2.1 (for this example, γ = 1.3264). The search-window image can be normalized with this transform factor to form a standard search window. Taking the first frame as an example, its width and height are w_sn_1 = 92 and h_sn_1 = 139, and the width and height of the standard target window of the current frame are w_on_1 = 122 and h_on_1 = 184.
Step2.2: generating a standard Gaussian response graph;
The standard Gaussian response map g is a two-dimensional matrix; with cell = 4, its width and height are w_g = 30 and h_g = 46, and its element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0), which can be expressed as
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²)),
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center. The standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, of the same size as g.
Step2.3: extracting Histogram of Oriented Gradient (HOG) features;
Using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_1 in the normalized search window of the current frame; its size is 30 × 46 × 28. A cosine window of size 30 × 46 is applied to smooth the feature f_1, which is then Fourier transformed to obtain the frequency-domain representation F_1 of the HOG feature, of the same size as f_1.
Step2.4: calculating a correlation filter template of the HOG characteristics;
frequency of known normalized search window HOG featuresDomain representation F1And the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter template1Can be according to formula H1=G/F1And (4) calculating.
Step2.5: extracting a color histogram feature template;
Taking the first frame as an example, the region of the search window Search(1) = (254.5, 215, 92, 139) outside the target area (198, 214, 34, 81) is defined as the background region. The target region shrunk by a certain amount is defined as the foreground region; its center point is the same as that of the target region, and its width and height are both shrunk by 11.5. Extract the background color histogram bg_hist_1 in the background region and the foreground color histogram fg_hist_1 in the foreground region to obtain the color histogram feature template of the current frame.
Step 3: inputting a next frame and extracting features;
Compute the current-frame search window Search(2) by the method of Step 2.1, and extract the frequency-domain representation F_2 of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3. The response map G_2 of the current-frame HOG feature is computed as G_2 = F_2 ⊙ H_1.
Extract the current-frame color histogram features bg_hist_2 and fg_hist_2 by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the first frame's color histogram features bg_hist_1 and fg_hist_1, compute the similarity map L_1 between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
Step 4: self-adaptive feature fusion;
step 4.1: computing adaptive feature fusion parameters
Taking the first frame as an example, utilize
the mean difference of the search windows between frame t-1 and frame t to obtain the illumination change intensity w_0 = −0.6316 for the two frames. Using the quality formula of Step 4.1, the quality of the HOG response map is w_1 = 0.7613, where the numerator is the peak difference (P_m the main peak value, P_s the secondary peak value) and the denominator is the peak distance (D_m the main peak position, D_s the secondary peak position). From the similarity map L_t of the color histogram template, the color response map quality is w_2 = 0.3986, where x_i denotes the value of each pixel.
Step 4.2: self-adaptive feature fusion;
The feature maps are fused according to the weights obtained in Step 4.1. With the threshold τ = 0.5, w_1 < τ and w_2 ≥ τ, so μ = 0.3 + (w_2 − τ), giving μ = −0.1387; the fused response map is computed as GL = (1 − μ)G_t + μL_t.
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is 75 × 75. Let GL_max, x_GL_max = 42 and y_GL_max = 38 denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (195, 214, 34, 81), where x_2 = 195, y_2 = 214, w_2 = 34, h_2 = 81, and γ = 1.3264 is the transform factor of the search window.
Step 6: updating the target template;
according to the position of the current frame tracking result (194.9884,214,34,81) and Step2.1The method calculates Search window Search '(2), and extracts frequency domain representation F of Histogram of Oriented Gradients (HOG) features in the Search' (2) range according to the method of Step2.32', calculation of H according to the method of Step2.42'=G/F2'. Let η equal to 0.01 as the updated parameter, the current frame HOG feature correlation filter template H2The updating method is shown as the formula:
H2=(1-0.01)×H1+0.01×H2'
extracting a color histogram feature bg _ hist according to the position of the current frame tracking result (194.9884,214,34,81) and the method of Step2.52' and fg _ hist2'. Let θ and β be update parameters equal to 0.04, and the current frame background color histogram and foreground color histogram template updating method is shown as formula:
bg_hist2=(1-0.04)×bg_hist1+0.04×bg_hist2'
fg_hist2=(1-0.04)×fg_hist1+0.04×fg_hist2'
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Finally, the hardware experimental environment of this embodiment of the invention is a computer with an Intel Core i5-6700 CPU, a 3.4 GHz main frequency and 8 GB of memory. The success rate of the final tracking result reaches 78.6%, and a screenshot of part of the tracking result is shown in FIG. 3. In the figure, the red box represents the fixed-ratio fusion tracking result and the green box represents the tracking result of the present invention.
Example 2: according to the technical scheme of the invention, the Soccer video sequence is selected for tracking; this sequence has eight challenging attributes: illumination change, scale change, occlusion, motion blur, fast motion, in-plane rotation, out-of-plane rotation and background clutter.
Step 1: inputting a first frame;
selecting a Soccer video, wherein the Width and the Height of a video frame are 640 and 360 respectively. Rectangular areas (302,135,67,81) of the object to be tracked in the first frame are selected, i.e. the selected tracking object is shown as a green rectangular box in fig. 4. Wherein (302, 135) is the coordinates of the upper left corner of the rectangular area, and (67, 81) is the width and height of the rectangular area.
Step 2: initializing a target template;
step 2.1: calculating a search window;
From the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (302, 135, 67, 81).
Taking the first frame as an example, the center point of the search window is (336, 176), and its width and height are w_s_1 = 141 and h_s_1 = 155. To keep the search range within the video frame, the width and height of the search window are further modified according to the intersection of the search range with the current frame region. To simplify the subsequent color histogram computation, the distance between the search-window boundary and the true target boundary is constrained to be an even number, and the width and height of the search window are modified accordingly.
Let the width and height of the normalized window NormWin be w_n = 150 and h_n = 150; the transform factor γ of the search window is computed as in Step 2.1 (for this example, γ = 1.0146). The search-window image can be normalized with this transform factor to form a standard search window. Taking the first frame as an example, its width and height are w_sn_1 = 141 and h_sn_1 = 155, and the width and height of the standard target window of the current frame are w_on_1 = 143 and h_on_1 = 157.
Step 2.2: generating a standard Gaussian response graph;
The standard Gaussian response map g is a two-dimensional matrix; with cell = 4, its width and height are w_g = 35 and h_g = 39, and its element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0), which can be expressed as
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²)),
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center. The standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, of the same size as g.
Step 2.3: extracting Histogram of Oriented Gradient (HOG) features;
Using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_1 in the normalized search window of the current frame; its size is 35 × 39 × 28. A cosine window of size 35 × 39 is applied to smooth the feature f_1, which is then Fourier transformed to obtain the frequency-domain representation F_1 of the HOG feature, of the same size as f_1.
Step 2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search window1And the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter template1Can be according to formula H1=G/F1And (4) calculating.
Step 2.5: extracting a color histogram feature template;
Taking the first frame as an example, the region of the search window Search(1) = (336, 176, 141, 155) outside the target area (302, 135, 67, 81) is defined as the background region. The target region shrunk by a certain amount is defined as the foreground region; its center point is the same as that of the target region, and its width and height are both shrunk by 14.8. Extract the background color histogram bg_hist_1 in the background region and the foreground color histogram fg_hist_1 in the foreground region to obtain the color histogram feature template of the current frame.
Step 3: inputting a next frame and extracting features;
Compute the current-frame search window Search(2) by the method of Step 2.1, and extract the frequency-domain representation F_2 of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3. The response map G_2 of the current-frame HOG feature is computed as G_2 = F_2 ⊙ H_1.
Extract the current-frame color histogram features bg_hist_2 and fg_hist_2 by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the first frame's color histogram features bg_hist_1 and fg_hist_1, compute the similarity map L_1 between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
Step 4: self-adaptive feature fusion;
step 4.1: calculating self-adaptive feature fusion parameters;
taking the first frame as an example, utilize
the mean difference of the search windows between frame t-1 and frame t to obtain the illumination change intensity w_0 = −2.6366 for the two frames. Using the quality formula of Step 4.1, the quality of the HOG response map is w_1 = 0.6576, where the numerator is the peak difference (P_m the main peak value, P_s the secondary peak value) and the denominator is the peak distance (D_m the main peak position, D_s the secondary peak position). From the similarity map L_t of the color histogram template, the color response map quality is w_2 = 0.5245, where x_i denotes the value of each pixel.
Step 4.2: self-adaptive feature fusion;
and fusing the characteristic graphs according to the weight values obtained by the 4.1. Taking the threshold value tau as 0.5, w1>τ,w2≥τ,μ=0.3+(w1-w2) (vi)/2, μ 0.2335 using GL ═ 1- μ Gt+μLtAnd calculating a fusion response graph.
Step 5: determining a tracking result;
and the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the candidate target corresponding to the maximum element value is the tracking result.
The number of candidate targets in the current-frame search window is 75 × 75. Let GL_max, x_GL_max = 42 and y_GL_max = 38 denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions. The tracking result of the current frame is (298, 139, 67, 81), where x_2 = 298, y_2 = 139, w_2 = 67, h_2 = 81, and γ = 1.0146 is the transform factor of the search window.
Step 6: updating the target template;
calculating a Search window Search '(2) according to the position of the current frame tracking result (194.9884,214,34,81) and the method of Step2.1, and extracting a frequency domain representation F of the Histogram of Oriented Gradients (HOG) feature in the Search' (2) range according to the method of Step2.32', calculation of H according to the method of Step2.42'=G/F2'. Let η equal to 0.01 as the updated parameter, the current frame HOG feature correlation filter template H2The updating method is shown as the formula:
H2=(1-0.01)×H1+0.01×H2'
extracting a color histogram feature bg _ hist according to a position of a current frame tracking result (194.9884,214,34,81) and a method of Step2.52' and fg _ hist2'. Let θ and β be update parameters equal to 0.04, and the current frame background color histogram and foreground color histogram template updating method is shown as formula:
bg_hist2=(1-0.04)×bg_hist1+0.04×bg_hist2'
fg_hist2=(1-0.04)×fg_hist1+0.04×fg_hist2'
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
Finally, the hardware experimental environment of this embodiment of the invention is a computer with an Intel Core i5-6700 CPU, a 3.4 GHz main frequency and 8 GB of memory. The success rate of the final tracking result reaches 51.8%, and a screenshot of part of the tracking result is shown in FIG. 5. In the figure, the green box represents the fixed-ratio fusion tracking result and the red box represents the tracking result of the present invention.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (7)

1. A multi-feature adaptive fusion correlation filtering target tracking method is characterized by comprising the following steps:
step 1: inputting a first frame;
step 2: initializing a target template;
step 3: inputting a next frame and extracting features;
step 4: self-adaptive feature fusion;
step 5: determining a tracking result;
step 6: updating the target template;
step 7: if the current frame is the last frame, tracking ends, otherwise, go to Step 3.
2. The method for tracking a target through correlation filtering with multi-feature adaptive fusion as claimed in claim 1, wherein Step 1 is specifically: the upper-left corner of each frame in the video frame sequence is the coordinate origin (1,1), the width and height are Width and Height respectively, and a rectangular area (x_0, y_0, w_0, h_0) of the object to be tracked is selected in the first frame, i.e. the selected tracking target;
wherein (x_0, y_0) represents the coordinates of the upper-left corner of the rectangular area, w_0, h_0 represent the width and height of the rectangular area respectively, and the target selected in the first frame is also called the current-frame tracking result (x_1, y_1, w_1, h_1) = (x_0, y_0, w_0, h_0), where the subscript denotes the current frame index.
3. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 2, wherein Step2 specifically comprises:
step2.1: calculating a search window;
from the tracking result of the previous frame (frame t-1), (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), the search window Search(t) of the current frame, i.e. the t-th frame candidate target region, can be computed; the first-frame search window is computed from (x_0, y_0, w_0, h_0);
the center point of the search window is (x_s_t, y_s_t), where x_s_t = x_{t-1} + w_{t-1}/2 and y_s_t = y_{t-1} + h_{t-1}/2, and its width and height are w_s_t = 1.5 × w_{t-1} + 0.5 × h_{t-1} and h_s_t = 1.5 × h_{t-1} + 0.5 × w_{t-1};
let the width and height of the normalized window NormWin be w_n and h_n respectively; the transform factor of the search window is:
γ = sqrt((w_n × h_n) / (w_s_t × h_s_t))
the search-window image can be normalized with this transform factor to form a standard search window of width and height w_sn_t = w_s_t × γ and h_sn_t = h_s_t × γ; the width and height of the standard target window of the current frame are w_on_t = w_sn_t × 0.75 − h_sn_t × 0.25 and h_on_t = h_sn_t × 0.75 − w_sn_t × 0.25;
Step2.2: generating a standard Gaussian response graph;
the standard Gaussian response map g is a two-dimensional matrix of width w_g = w_sn_t/cell and height h_g = h_sn_t/cell, whose element values follow the probability density function of the two-dimensional Gaussian distribution N(0, 0, δ, δ, 0):
g(i, j) = (1/(2πδ²)) · exp(−(i² + j²)/(2δ²))
where δ denotes the standard deviation of the two-dimensional Gaussian distribution, cell means that each grid cell in the HOG feature extraction is of size cell × cell, and (i, j) is the element coordinate in the Gaussian response map matrix with the origin at the matrix center; the standard Gaussian response map is Fourier transformed to obtain its frequency-domain representation G, which has the same size as g;
step2.3: extracting HOG characteristics of the histogram of directional gradient;
using cell as the HOG grid-size parameter, 2 × 2 cells as the block size, and a histogram bin width of 2π/7, extract the HOG feature f_t in the normalized search window of the current frame, of size w_g × h_g × 28; a cosine window of size w_g × h_g is applied to smooth the feature f_t, which is then Fourier transformed to obtain the frequency-domain representation F_t of the HOG feature, of the same size as f_t;
step2.4: calculating a correlation filter template of the HOG characteristics;
frequency domain representation F of the HOG feature of the known normalized search windowtAnd the frequency domain representation G of the standard Gaussian response plot, then the frequency domain representation H of the HOG feature correlation filter templatet
Ht=G/Ft
Step2.5: extracting a color histogram feature template;
search window (t) ═ x _ st,y_st,w_st,h_st) Inner target area (x)t-1,yt-1,wt-1,ht-1) The region outside is defined as the background region, eyeThe target region is defined as the foreground region, the center point is the same as the target region, and the width and height shrinkage are all (w)t-1+ht-1) 10, extracting a background color histogram bg _ hist in the background area and the foreground area respectivelytAnd foreground color histogram fg _ histtThe current frame color histogram feature template is obtained.
4. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step3 is specifically as follows:
compute the search window Search(t) of the current frame by the method of Step 2.1, and extract the frequency-domain representation F_t of the Histogram of Oriented Gradients (HOG) feature of the current frame as in Step 2.3; the response map G_t of the current-frame HOG feature is:
G_t = F_t ⊙ H_{t-1}
extract the current-frame color histogram features bg_hist_t and fg_hist_t by the method of Step 2.5, map each pixel in the search-area image to its histogram bin value, and, combining the standard target window size with the previous frame's color histogram features bg_hist_{t-1} and fg_hist_{t-1}, compute the similarity map L_t between the current-frame color histogram feature and the color histogram template; its size is the same as that of the response map G.
5. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step4 is specifically as follows:
step4.1: calculating self-adaptive feature fusion parameters;
by using
the mean values of the search windows of frame t-1 and frame t, the illumination change intensity w_0 of the two frames is obtained as their mean difference;
the quality w_1 of the HOG response map is computed from the peak difference P_m − P_s in the response map, where P_m is the main peak value and P_s is the secondary peak value, and from the peak distance between the main peak position D_m and the secondary peak position D_s, with k = 0.1;
the quality w_2 of the color response map is computed from the similarity map L_t of the color histogram template, where x_i denotes the i-th value of L_t;
step4.2: self-adaptive feature fusion;
adaptive fusion is performed using the following formula:
Using the two feature-map qualities w_1, w_2 and the illumination change intensity w_0, the fusion ratio μ of the feature response maps is computed, where τ is a threshold parameter with value range [0.3, 0.6]:
when w_1 ≥ τ and w_2 ≥ τ, both response maps are of high quality;
when w_1 ≥ τ and w_2 < τ, the HOG feature quality is good and the color feature quality is poor, and μ = 0.3 + (w_1 − τ);
when w_1 < τ and w_2 ≥ τ, the HOG feature quality is poor and the color feature quality is good, and μ = 0.3 + (w_2 − τ);
when μ < 0, μ is set to 0, and the fused response map is computed as GL = (1 − μ)G_t + μL_t.
6. The method for tracking a target with correlation filtering based on multi-feature adaptive fusion as claimed in claim 5, wherein Step5 is specifically:
the element value of the GL matrix of the self-adaptive feature fusion result represents the probability that the candidate target in the corresponding search window is the tracking result, and the maximum element value corresponds to the candidate target, namely the tracking result;
the number of candidate targets in the current-frame search window is (w_sn_t − w_on_t) × (h_sn_t − h_on_t);
let GL_max, x_GL_max and y_GL_max denote the maximum element value of the adaptive feature fusion result matrix GL and its corresponding horizontal and vertical coordinate positions; the tracking result of the current frame is (x_t, y_t, w_t, h_t), where w_t = w_{t-1}, h_t = h_{t-1}, x_t = x_{t-1} + (x_GL_max − (w_sn_t − w_on_t)/2)/γ − w_t/2, y_t = y_{t-1} + (y_GL_max − (h_sn_t − h_on_t)/2)/γ − h_t/2, and γ is the transform factor of the search window.
7. The method for tracking a target through correlation filtering with multi-feature adaptive fusion according to claim 3, wherein Step6 is specifically as follows:
from the tracking result (x_t, y_t, w_t, h_t) of the current frame, compute the search window Search'(t) by the method of Step 2.1, extract the frequency-domain representation F_t' of the Histogram of Oriented Gradients (HOG) feature within Search'(t) by the method of Step 2.3, and compute H_t' = G/F_t' by the method of Step 2.4;
let η be the update parameter; the HOG correlation filter template H_t of the current frame is updated as:
H_t = (1 − η)H_{t-1} + ηH_t'
from the position of the current-frame tracking result (x_t, y_t, w_t, h_t), extract the color histogram features bg_hist_t' and fg_hist_t' by the method of Step 2.5;
let θ and β be the update parameters; the current-frame background and foreground color histogram templates are updated as:
bg_hist_t = (1 − θ) × bg_hist_{t-1} + θ × bg_hist_t'
fg_hist_t = (1 − β) × fg_hist_{t-1} + β × fg_hist_t'.
CN202110751273.XA 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method Active CN113379802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110751273.XA CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110751273.XA CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Publications (2)

Publication Number Publication Date
CN113379802A true CN113379802A (en) 2021-09-10
CN113379802B CN113379802B (en) 2024-04-16

Family

ID=77580709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110751273.XA Active CN113379802B (en) 2021-07-01 2021-07-01 Multi-feature adaptive fusion related filtering target tracking method

Country Status (1)

Country Link
CN (1) CN113379802B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748877A (en) * 2017-11-10 2018-03-02 杭州晟元数据安全技术股份有限公司 A kind of Fingerprint recognition method based on minutiae point and textural characteristics
CN109934853A (en) * 2019-03-21 2019-06-25 云南大学 Correlation filtering tracking based on the fusion of response diagram confidence region self-adaptive features
CN110147747A (en) * 2019-05-09 2019-08-20 云南大学 A kind of correlation filtering tracking based on accumulation first derivative high confidence level strategy
CN110163132A (en) * 2019-05-09 2019-08-23 云南大学 A kind of correlation filtering tracking based on maximum response change rate more new strategy
CN111612817A (en) * 2020-05-07 2020-09-01 桂林电子科技大学 Target tracking method based on depth feature adaptive fusion and context information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谢煜: "Correlation filter tracking algorithm based on context awareness and adaptive response fusion" (基于上下文感知与自适应响应融合的相关滤波跟踪算法), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 42, no. 4, 30 April 2021 (2021-04-30) *

Also Published As

Publication number Publication date
CN113379802B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN111080675B (en) Target tracking method based on space-time constraint correlation filtering
CN107909081B (en) Method for quickly acquiring and quickly calibrating image data set in deep learning
CN108109162B (en) Multi-scale target tracking method using self-adaptive feature fusion
CN110490907B (en) Moving target tracking method based on multi-target feature and improved correlation filter
CN108876820B (en) Moving target tracking method under shielding condition based on mean shift
CN109448023B (en) Satellite video small target real-time tracking method
CN112364865B (en) Method for detecting small moving target in complex scene
CN111582349A (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN110910421A (en) Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN106600613B (en) Improvement LBP infrared target detection method based on embedded gpu
CN108537825B (en) Target tracking method based on transfer learning regression network
CN107871315B (en) Video image motion detection method and device
CN110827327B (en) Fusion-based long-term target tracking method
CN110163132A (en) A kind of correlation filtering tracking based on maximum response change rate more new strategy
CN110570450B (en) Target tracking method based on cascade context-aware framework
CN112132855A (en) Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN117011381A (en) Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision
CN116777956A (en) Moving target screening method based on multi-scale track management
CN116665097A (en) Self-adaptive target tracking method combining context awareness
CN110147747B (en) Correlation filtering tracking method based on accumulated first-order derivative high-confidence strategy
CN113379802B (en) Multi-feature adaptive fusion related filtering target tracking method
CN109934853B (en) Correlation filtering tracking method based on response image confidence region adaptive feature fusion
CN107564029B (en) Moving target detection method based on Gaussian extreme value filtering and group sparse RPCA
CN112465865A (en) Multi-target tracking method based on background modeling and IoU matching
CN110660079A (en) Single target tracking method based on space-time context

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant