Scale-adaptive target tracking method based on a fast compressive tracking algorithm
Technical Field
The invention belongs to the technical field of image processing, in particular to video target tracking, face recognition, online learning, and scale adaptation, and more particularly relates to a scale-adaptive target tracking method based on a fast compressive tracking algorithm.
Background
Target tracking is a research hotspot in computer vision, with wide application in motion analysis, behavior recognition, intelligent surveillance, human-computer interaction, and other fields [1,2]. The difficulty of target tracking lies in handling changes in the target's own appearance as well as external factors such as illumination, occlusion, and background change [3,4]. In recent years, target tracking algorithms based on online learning have attracted attention; they treat tracking as a special binary classification problem, whose key is to use a trained classifier to separate the target from the background of the image sequence and to update the classifier online [5,6]. Owing to noise, mismatched classifier update factors, and similar influences, online-learning trackers are prone to tracking drift [7], and many improved algorithms have emerged. Reference [8] proposes a tracking algorithm based on an online semi-supervised classifier to reduce tracking drift; its main idea is to combine a prior classifier and an online classifier into a joint classifier that is updated by semi-supervised learning, which limits tracking drift while maintaining stable tracking of a target whose appearance changes. Reference [9] proposes a learning method that combines semi-supervised learning with multiple-instance learning; it introduces a combined loss factor and learns from labeled and unlabeled samples simultaneously, and experimental results show that its tracking performance is superior to that of the online semi-supervised classifier.
The Compressive Tracking (CT) algorithm [10] is a popular binary-classification method with good real-time performance and a certain robustness to target occlusion and appearance change. It suffers from two major problems. First, its feature description is simple, so tracking drift or target loss easily occurs when the illumination changes or the target's appearance changes greatly. Second, the scale of the target window is fixed during tracking, so tracking drift or target loss easily occurs when the target grows in scale or becomes occluded. Reference [11] is the original authors' improvement of the CT algorithm, but it only increases the processing speed and does not solve the above problems well.
List of references:
[1] Cehovin L, Kristan M, Leonardis A. Robust visual tracking using an adaptive coupled-layer visual model[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2013, 35(4): 941-953.
[2] Qian Chen, Sun Xiao, Wei Yi-chen, et al. Realtime and robust hand tracking from depth[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 1106-1113.
[3] Zhang T, Ghanem B, Liu S, et al. Robust visual tracking via multi-task sparse learning[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 2042-2049.
[4] Dicle C, Sznaier M, Camps O. The way they move: tracking multiple targets with similar appearance[C]. IEEE International Conference on Computer Vision (ICCV), 2013: 2304-2311.
[5] Luber Matthias, Spinello Luciano, Arras Kai O. People Tracking in RGB-D Data With On-line Boosted Target Models[C]. 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011: 3844-3849.
[6] Grabner Helmut, Leistner Christian, Bischof Horst. Semi-supervised On-Line Boosting for Robust Tracking[J]. Lecture Notes in Computer Science, 2008, 5302(1): 234-247.
[7] Li Hai-feng, Zeng Min, Lu Min-yan. Adaboosting-based dynamic weighted combination of software reliability growth models[J]. Quality and Reliability Engineering International, 2013, 28(1): 67-84.
[8] Zeisl B, Leistner C, Saffari A, Bischof H. On-line Semi-supervised Multiple-Instance Boosting[C]. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010: 1879.1-1879.15.
[9] Crowley J L, Stelmaszyk P, Discours C. TransientBoost: On-line Boosting with Transient Data[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010: 22-27.
[10] Zhang Kai-hua, Zhang Lei, Yang Ming-hsuan. Real-Time Compressive Tracking[C]. European Conference on Computer Vision (ECCV), 2012: 866-879.
[11] Zhang Kai-hua, Zhang Lei, Yang Ming-hsuan. Fast Compressive Tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(10): 2002-2015.
[12] Wright J, Yang A, Ganesh A, Sastry S. Robust face recognition via sparse representation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210-227.
[13] Ng A Y, Jordan M. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes[C]. 2002 Annual Conference on Neural Information Processing Systems, 2002: 841-848.
[14] Comaniciu D, Ramesh V, Meer P. Kernel-based object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-577.
[15] Bradski Gary R. Real Time Face and Object Tracking as a Component of a Perceptual User Interface[C]. IEEE Workshop on Applications of Computer Vision (WACV), 1998: 214-219.
[16] Zhang Kai-hua, Zhang Lei, Liu Qing-shan. Fast Visual Tracking via Dense Spatio-Temporal Context Learning[C]. European Conference on Computer Vision (ECCV), 2014: 127-141.
Disclosure of the Invention
Aiming at the above problems, the invention provides a scale-adaptive target tracking method based on a fast compressive tracking algorithm.
First, the method uses a context model to weight the haar-like features, enhancing their robustness to illumination changes. Second, it adopts coarse-to-fine tracking, which strengthens the algorithm's resistance to occlusion and its ability to respond to target scale change while maintaining real-time performance. Finally, a scale-adaptive method is proposed to achieve stable tracking of a target whose scale changes.
The method comprises the following specific steps:
(1) Transform the input image using the context feature to obtain a weighted haar-like feature image.
The CT algorithm constructs the appearance models of the target and the background from compressed haar-like features. In contrast to the haar feature, the haar-like feature used here equals the sum of all pixel values within a rectangular box. Assume the position of the target in the current frame is o*. Different pixels z contribute differently to localizing the target: important pixels close to the target center should receive a larger weight, while pixels far from the target center should receive a smaller weight. The weighted haar-like feature is calculated as follows:

F(S) = Σ_{z∈S} y(z)    (1)

where S is the selected rectangular region, n is the quantization bit depth of the image (so that the pixel intensities lie in [0, 2^n − 1]), and y(z) is the weighted pixel value:
y(z) = I(z)·ω_σ(z − o*)    (2)
where I(·) is the intensity of the selected feature, typically the gray value of a pixel, which characterizes the target model, and ω_σ(·) is a Gaussian weight function calculated as

ω_σ(z) = a·e^{−|z|²/σ²}    (3)

where a is a normalization constant that keeps the value range of y within [0, 1], and σ is a scale parameter.
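As a minimal sketch of formulas (2)-(3), the weighted feature image can be obtained by multiplying the image with a normalized Gaussian weight map centered on the target and summing over the rectangle S (the function names and the NumPy-based implementation are illustrative assumptions, not part of the invention):

```python
import numpy as np

def gaussian_weight_map(shape, center, sigma):
    """Gaussian weight w_sigma(z - o*) over the image grid, normalized to sum to 1."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    wmap = np.exp(-d2 / (sigma ** 2))
    return wmap / wmap.sum()          # the constant a of formula (3)

def weighted_haar_feature(image, center, sigma, rect):
    """Sum of the context-weighted pixel values y(z) over a rectangle S."""
    y = image.astype(np.float64) * gaussian_weight_map(image.shape, center, sigma)
    x0, y0, rw, rh = rect             # rectangle S: (left, top, width, height)
    return y[y0:y0 + rh, x0:x0 + rw].sum()
```

For a constant image, the feature of the full-image rectangle equals the total weight, i.e. 1 after normalization, illustrating that the weighting concentrates the feature's mass near the target center.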
(2) Judge whether the target is occluded using center-enhanced and edge-enhanced kernel-function histograms.
Kernel-weighted histograms are built over the target region in two adjacent frames, with two weighting rules: center enhancement and edge enhancement. Under center enhancement, the closer a pixel is to the target center, the larger its weight; the farther, the smaller. Under edge enhancement the opposite holds: the closer a pixel is to the target center, the smaller its weight; the farther, the larger. The center-enhancement kernel is given by formula (4) and the edge-enhancement kernel by formula (5):

k_c(d) = 1 − d²,  |d| ≤ 1    (4)
k_e(d) = d²,      |d| ≤ 1    (5)

where d is the position relative to the tracking-window center, normalized so that |d| = 1 at the window border.
An occlusion decision factor J = B(p_b, q_b) − B(p_t, q_t) is defined, where B(p, q) ∈ [0, 1] denotes the Bhattacharyya coefficient, p_t, q_t are the center-enhanced kernel histograms of the two frames, and p_b, q_b are the edge-enhanced kernel histograms. When no occlusion occurs, J is close to 0; when occlusion occurs, a large number of background pixels enter the tracking window and J rises rapidly. The value range of J is J ∈ (0, 1).
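A minimal sketch of this occlusion test, assuming the kernels 1 − d² and d² of formulas (4)-(5) and a 16-bin gray-level histogram (the function names, bin count, and distance normalization are illustrative assumptions):

```python
import numpy as np

def kernel_histogram(patch, nbins=16, center_enhanced=True):
    """Kernel-weighted gray-level histogram of a patch.
    center_enhanced=True uses k(d) = 1 - d^2; False uses the edge kernel k(d) = d^2,
    with d the distance to the window center normalized to 1 at the border."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d = np.sqrt(((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2)
    d = np.clip(d, 0.0, 1.0)
    k = 1.0 - d ** 2 if center_enhanced else d ** 2
    bins = np.minimum((patch.astype(int) * nbins) // 256, nbins - 1)
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=nbins)
    return hist / (hist.sum() + 1e-12)

def bhattacharyya(p, q):
    """Bhattacharyya coefficient B(p, q) of two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

def occlusion_factor(patch_prev, patch_cur, nbins=16):
    """J = B(p_b, q_b) - B(p_t, q_t): near 0 when the window content is unchanged."""
    pt = kernel_histogram(patch_prev, nbins, center_enhanced=True)
    qt = kernel_histogram(patch_cur, nbins, center_enhanced=True)
    pb = kernel_histogram(patch_prev, nbins, center_enhanced=False)
    qb = kernel_histogram(patch_cur, nbins, center_enhanced=False)
    return bhattacharyya(pb, qb) - bhattacharyya(pt, qt)
```

For two identical patches both Bhattacharyya coefficients are 1, so J is 0, matching the "no occlusion" case described above.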
(3) When occlusion occurs, track the target by block-based coarse search followed by compressive-tracking fine search.
When occlusion occurs, the target is divided into blocks R_i, i = 1, 2, …, n, where n is the total number of target blocks. Candidate positions of the target are sampled with step Δ_f within a neighborhood of radius λ_f around the target, and a naive Bayes classifier produces a response value for each candidate. Each classifier response is assigned to a target region R_i according to its position, and the sum S_i of all classifier responses within each region R_i is calculated (a candidate may belong to several regions simultaneously, but its response needs to be computed only once). The center of the region with the largest sum is chosen as the starting position L_0 of the accurate search. Around L_0, a fine search by compressive tracking is carried out within a neighborhood of radius λ_c, and the position with the maximum classifier response is taken as the target position (Δ_f > Δ_c, λ_f > λ_c, Δ_c = 1).
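The coarse-then-fine search above can be sketched as follows, assuming a 2x2 block partition, a callable `classify(x, y)` that returns the classifier response of the candidate window centered at (x, y), and assignment of each coarse response to the region with the nearest block center (all of these are illustrative assumptions):

```python
import numpy as np

def block_coarse_to_fine(classify, target_pos, target_size,
                         lam_f=29, d_f=4, lam_c=10):
    """Block coarse search (step d_f, radius lam_f) followed by a
    fine search (step 1, radius lam_c) starting from the best block center."""
    tx, ty = target_pos
    w, h = target_size
    # centers of the 2x2 partition R_1..R_4
    block_centres = [(tx - w / 4, ty - h / 4), (tx + w / 4, ty - h / 4),
                     (tx - w / 4, ty + h / 4), (tx + w / 4, ty + h / 4)]
    sums = np.zeros(4)
    for dx in range(-lam_f, lam_f + 1, d_f):
        for dy in range(-lam_f, lam_f + 1, d_f):
            if dx * dx + dy * dy > lam_f * lam_f:
                continue
            r = classify(tx + dx, ty + dy)      # each response computed once
            i = int(np.argmin([(tx + dx - bx) ** 2 + (ty + dy - by) ** 2
                               for bx, by in block_centres]))
            sums[i] += r                        # accumulate S_i per region
    x0, y0 = block_centres[int(np.argmax(sums))]  # start L_0 of the fine search
    best, best_pos = -np.inf, (x0, y0)
    for dx in range(-lam_c, lam_c + 1):
        for dy in range(-lam_c, lam_c + 1):
            if dx * dx + dy * dy > lam_c * lam_c:
                continue
            r = classify(x0 + dx, y0 + dy)
            if r > best:
                best, best_pos = r, (x0 + dx, y0 + dy)
    return best_pos
```

With a synthetic response peaked at a known position, the search locates that position: the coarse pass selects the nearest block, and the fine pass recovers the exact peak.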
The compressive tracking algorithm is as follows. The Compressive Tracking (CT) algorithm is a fast, robust tracking algorithm based on feature compression: it compresses the haar-like features x with a very sparse random projection matrix R, extracts the compressed features v, and classifies v with a naive Bayes classifier to realize target tracking.
The calculation formula of feature compression is as follows:
v=Rx (6)
where R ∈ R^{m×n} with m ≪ n satisfies the Johnson-Lindenstrauss lemma and the RIP condition of compressive sensing theory. The elements of the sparse projection matrix R are defined as follows:

r_{ij} = √ρ × { 1,  with probability 1/(2ρ);
                0,  with probability 1 − 1/ρ;
               −1,  with probability 1/(2ρ) }    (7)

where ρ = 2 or 3. When ρ = 3 the matrix is very sparse and the amount of computation is reduced by 2/3 (with ρ = 3, a matrix element equals √3 ≈ 1.732 with probability 1/6, −√3 ≈ −1.732 with probability 1/6, and 0 with probability 2/3).
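The sparse matrix element definition and the compression v = Rx of formula (6) can be sketched as follows (the generator and seed handling are assumptions of this sketch):

```python
import numpy as np

def sparse_projection_matrix(m, n, rho=3, rng=None):
    """Very sparse random projection matrix R: entries are sqrt(rho) with
    probability 1/(2*rho), -sqrt(rho) with probability 1/(2*rho), else 0.
    With rho = 3 about 2/3 of the entries are zero."""
    rng = np.random.default_rng(rng)
    u = rng.random((m, n))
    R = np.zeros((m, n))
    R[u < 1.0 / (2 * rho)] = np.sqrt(rho)
    R[u > 1.0 - 1.0 / (2 * rho)] = -np.sqrt(rho)
    return R

# Compression of formula (6): an n-dimensional haar-like feature vector x
# is reduced to an m-dimensional compressed feature v = R @ x.
```

Because most entries of R are zero, the product Rx touches only about one third of the features, which is the source of the computational saving mentioned above.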
After the compressed feature v of a sample is extracted, v is input to a naive Bayes classifier for classification, and the position with the maximum classifier response is taken as the target position in the new frame. The construction and updating of the classifier are described below.
For each sample x ∈ R^n, its low-dimensional representation is v = (v_1, …, v_m)^T ∈ R^m. Assuming the elements of v are distributed independently, they can be modeled with a naive Bayes classifier [12]:

H(v) = Σ_{i=1}^{m} log( p(v_i | y = 1) / p(v_i | y = 0) )    (8)

where the prior probabilities of the two classes are equal, p(y = 1) = p(y = 0); y ∈ {0, 1} is the sample label, with y = 0 denoting a negative sample and y = 1 a positive sample.
Diaconis and Freedman showed that random projections of high-dimensional random vectors are almost always Gaussian distributed [13]. The conditional probabilities p(v_i | y = 1) and p(v_i | y = 0) in the classifier H(v) are therefore also assumed to be Gaussian and can be described by four parameters (μ_i^1, σ_i^1, μ_i^0, σ_i^0) [10]:

p(v_i | y = 1) ~ N(μ_i^1, σ_i^1),  p(v_i | y = 0) ~ N(μ_i^0, σ_i^0)    (9)
The parameters above are updated incrementally as follows, for j = 0, 1:

μ_i^j ← λ μ_i^j + (1 − λ) μ^j
σ_i^j ← sqrt( λ (σ_i^j)² + (1 − λ) (σ^j)² + λ (1 − λ) (μ_i^j − μ^j)² )    (10)

where λ > 0 is the learning factor, and μ^j and σ^j are the mean and standard deviation of the i-th compressed feature over the newly drawn samples with label j.
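A sketch of the naive Bayes response and the incremental Gaussian parameter update described above (the learning-factor value, the epsilon regularizer, and the helper names are assumptions):

```python
import numpy as np

def nb_response(v, mu1, sg1, mu0, sg0):
    """Log-ratio naive Bayes response over the compressed feature vector v,
    with per-dimension Gaussian likelihoods; equal class priors cancel."""
    eps = 1e-30  # guards against zero variance
    log_p1 = -0.5 * np.log(2 * np.pi * sg1 ** 2 + eps) - (v - mu1) ** 2 / (2 * sg1 ** 2 + eps)
    log_p0 = -0.5 * np.log(2 * np.pi * sg0 ** 2 + eps) - (v - mu0) ** 2 / (2 * sg0 ** 2 + eps)
    return float(np.sum(log_p1 - log_p0))

def update_params(mu, sg, samples, lam=0.85):
    """Incremental update: blend the stored (mu, sg) with the mean and standard
    deviation of the newly drawn samples using learning factor lam."""
    mu_new = samples.mean(axis=0)
    sg_new = samples.std(axis=0)
    mu_up = lam * mu + (1 - lam) * mu_new
    sg_up = np.sqrt(lam * sg ** 2 + (1 - lam) * sg_new ** 2
                    + lam * (1 - lam) * (mu - mu_new) ** 2)
    return mu_up, sg_up
```

A feature vector near the positive-class mean yields a positive response, and one near the negative-class mean a negative response, which is how the fine search ranks candidate windows.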
(4) When the target is not occluded, track the target by center-of-gravity coarse search followed by compressive-tracking fine search.
Suppose M_00 is the zeroth-order moment of the image, M_10 and M_01 are its first-order moments, and M_20, M_11 and M_02 are its second-order moments. The position of the center of gravity is

(x_c, y_c) = (M_10 / M_00, M_01 / M_00)    (11)
Assume the position of the target in the previous frame is l_{t−1} and its center of gravity is c_{t−1}. In the current frame, the initial center of gravity c^0 is first computed in the window centered at the previous target position. The center of gravity is then recomputed repeatedly, obtaining c^{k+1} from c^k, until two consecutive results satisfy the convergence condition, and the final value is assigned to c_t. From the center-of-gravity position c_t, the corresponding target position l_t is obtained and used as the starting position of the accurate search. Around l_t, the target is detected by compressive tracking within a neighborhood of radius λ_c, and the position with the maximum classifier response is the target position.
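The iterative center-of-gravity computation of this step can be sketched as follows (the window handling, iteration cap, and convergence tolerance are assumptions of this sketch):

```python
import numpy as np

def moments(patch):
    """Image moments M_00, M_10, M_01 of a gray-level patch."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return patch.sum(), (xs * patch).sum(), (ys * patch).sum()

def centre_of_gravity(image, pos, size, max_iter=20, tol=0.5):
    """Repeat the centroid computation (x_c, y_c) = (M_10/M_00, M_01/M_00)
    inside a sliding window until two consecutive estimates converge."""
    x, y = pos
    w, h = size
    for _ in range(max_iter):
        x0, y0 = int(round(x - w / 2)), int(round(y - h / 2))
        win = image[max(y0, 0):y0 + h, max(x0, 0):x0 + w].astype(np.float64)
        m00, m10, m01 = moments(win)
        if m00 == 0:
            break
        nx, ny = max(x0, 0) + m10 / m00, max(y0, 0) + m01 / m00
        converged = abs(nx - x) < tol and abs(ny - y) < tol
        x, y = nx, ny
        if converged:
            break
    return x, y
```

Starting from a position offset from a bright blob, the iteration drifts onto the blob's centroid in a few steps, giving the coarse starting point for the fine compressive-tracking search.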
(5) Calculate the n-th-order moments of the image and update the target scale with the CAMShift algorithm.
During tracking, the scale of the target changes constantly, so a corresponding strategy must be adopted to update the target scale. The scale update method follows [15]. Define

a = M_20/M_00 − x_c²,  b = 2 (M_11/M_00 − x_c·y_c),  c = M_02/M_00 − y_c²

Then the length l and width w of the tracking window are

l = sqrt( (a + c + sqrt(b² + (a − c)²)) / 2 ),  w = sqrt( (a + c − sqrt(b² + (a − c)²)) / 2 )    (12)
The scale parameter σ in formula (3) is updated as in [16]:

σ_{t+1} = s_t · σ_t,  s_{t+1} = (1 − λ) s_t + λ s̄_t,  s̄_t = (1/n) Σ_{i=1}^{n} s'_{t−i}    (13)

where s'_t is the estimated scale change between two adjacent frames. To reduce the estimation error and the excessive sensitivity of the scale update, and to enhance the robustness of s'_t, the filtered estimate s̄_t, the mean of the n most recent scale-change estimates, is introduced. λ is the update rate, a constant in [0, 1].
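A sketch of this filtered scale update in the spirit of [16] (the update rate, history length, and initial values are assumptions):

```python
class ScaleUpdater:
    """Filtered update of the scale parameter sigma of formula (3):
    raw per-frame scale estimates are averaged over the last n frames and
    blended into the running scale factor with update rate lam."""

    def __init__(self, sigma0=1.0, lam=0.25, n=5):
        self.sigma = sigma0
        self.s = 1.0          # current filtered scale factor s_t
        self.lam = lam
        self.history = []     # recent raw estimates s'_t
        self.n = n

    def update(self, s_raw):
        self.history.append(s_raw)
        self.history = self.history[-self.n:]
        s_bar = sum(self.history) / len(self.history)  # mean of recent estimates
        self.sigma *= self.s                           # sigma_{t+1} = s_t * sigma_t
        self.s = (1 - self.lam) * self.s + self.lam * s_bar
        return self.sigma
```

With a constant unit scale estimate the filter leaves σ unchanged, while a noisy estimate sequence is smoothed over the last n frames, which is the point of introducing s̄_t.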
The beneficial effects of the invention are as follows. The invention discloses a scale-adaptive target tracking method based on a fast compressive tracking algorithm. First, the haar-like features are improved to strengthen their ability to characterize the target. Second, a center-of-gravity coarse search reduces the processing time, and a scale factor is introduced to adaptively update the scale of the target window. Finally, a blocking, coarse-to-fine tracking strategy handles target occlusion. Experimental results show that the method is robust to target scale change, target appearance change, and occlusion, and sustains a frame rate of about 39 frames per second, meeting real-time requirements.
Drawings
FIG. 1 is a process flow of the present invention.
FIG. 2 is a process flow of compression tracking.
Fig. 3 is a block diagram.
Fig. 4 is the tracking result of a david video sequence.
Fig. 5 is the tracking result of a woman video sequence.
Fig. 6 is the tracking result of the twinnings video sequence.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawings and tables.
Fig. 1 shows the processing flow of the invention. First, the input image is transformed to obtain a weighted haar-like feature image. Second, it is judged whether the target is occluded. When occlusion occurs, the target is tracked by block-based coarse search followed by compressive-tracking fine search; when the target is not occluded, it is tracked by center-of-gravity coarse search followed by compressive-tracking fine search. Finally, the scale of the tracking window is updated.
Fig. 2 shows the processing flow of the compressive tracking algorithm. The Compressive Tracking (CT) algorithm is a popular binary-classification method: it first reduces the dimensionality of the image features with a sparse projection matrix, then classifies the reduced features with a simple naive Bayes classifier, and finally updates the classifier by online learning.
Fig. 3 illustrates the block partition. When occlusion occurs, the target region is divided into four parts R_i, i = 1, 2, 3, 4, as shown in Fig. 3. Candidate positions are sampled with step Δ_f within a neighborhood of radius λ_f around the target, and a naive Bayes classifier produces a response value for each candidate. Each response is assigned to a target region R_i according to its position, the sum S_i of all classifier responses within each region R_i is calculated (a candidate may belong to several regions simultaneously, but its response needs to be computed only once), and the center of the region with the largest sum is chosen as the starting position L_0 of the accurate search. Around L_0, a fine search by compressive tracking is carried out within a neighborhood of radius λ_c, and the position with the maximum classifier response is the target position (Δ_f > Δ_c, λ_f > λ_c, Δ_c = 1).
Let λ_f = 29, Δ_f = 4 and λ_c = 10, Δ_c = 1. The number of candidate windows evaluated by the block coarse search plus the fine search is then far smaller than the number evaluated by the CT algorithm over the same neighborhood with step 1. Since the classifier evaluations take up most of the processing time, the block processing does not increase the amount of computation.
The experiments were carried out in a Matlab 9.0 simulation environment and compared with the CT algorithm [10] and the FCT algorithm [11] on three common test videos: david, woman, and twinnings. The simulation results are shown in Figs. 4 to 6, in which the white box is the tracking result of the CT algorithm, the black box that of the FCT algorithm, and the gray box that of the invention.
Fig. 4 shows the tracking results for the david video sequence, in which the target undergoes gradual illumination, scale, and appearance changes; it mainly tests tracking performance under illumination change. Fig. 4 shows the results for frames 10, 80, 150, 300, 450, 600, 690, and 770. Before frame 300, david's facial appearance and scale do not change significantly, while the room brightness changes greatly; the tracking errors of the CT and FCT algorithms gradually grow with the illumination change, while the invention gives a more accurate result. After frame 300, david's face undergoes some deformation and scale change; the CT and FCT results deviate considerably, with FCT superior to CT, while the invention adapts to the change of the target and has the smallest tracking error.
Fig. 5 shows the tracking results for the woman video sequence, in which the target undergoes partial and large-area occlusion; it mainly tests tracking performance under occlusion. Fig. 5 shows the results for frames 15, 100, 200, 300, 700, and 831. When the face is not occluded (frame 15), all three algorithms track the target stably, but when the target is occluded (frames 100, 200, 300, 700, and 831), CT and FCT exhibit some tracking drift, with CT superior to FCT, while the invention still tracks the target accurately.
Fig. 6 shows the tracking results for the twinnings video sequence, in which the target undergoes shape change, illumination change, and large scale change; it mainly tests tracking performance under scale change. Fig. 6 shows the results for frames 30, 100, 200, 300, 400, and 450. When the scale of the target has not yet changed (frame 30), all three algorithms track it stably, but when the target undergoes obvious scale change (frames 100, 200, 300, 400, and 450), CT and FCT exhibit some tracking drift, with FCT superior to CT, while the invention adapts to the scale change.
The tracking performance of the three algorithms is compared below: Table 1 gives the success rate (SR, %), and Table 2 gives the center location error (CLE, in pixels) and the average frame rate (frames per second, FPS).
Table 1 Success rate SR (%)
As can be seen from Table 1, the invention has the highest success rate, remaining above 98% throughout. When tracking targets with appearance change and scale change, the success rate of the FCT algorithm is higher than that of the CT algorithm.
Table 2 CLE and FPS
As can be seen from Table 2, the invention has the smallest tracking error, followed by the FCT algorithm, with the CT algorithm the largest. In real-time performance, the FCT algorithm is the best, the invention is second, and the CT algorithm is the worst.
Overall, the invention has the best comprehensive performance: the highest success rate, the smallest tracking error, and a shorter tracking time than the CT algorithm.
It should be understood by those skilled in the art that the above embodiments are intended only to illustrate the invention and not to limit it, and that changes and modifications of the above embodiments fall within the scope of the invention.