CN107424175A - Target tracking method combining spatio-temporal context information - Google Patents

Target tracking method combining spatio-temporal context information

Info

Publication number
CN107424175A
CN107424175A CN201710596203.5A
Authority
CN
China
Prior art keywords
frame image
current frame
value
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710596203.5A
Other languages
Chinese (zh)
Other versions
CN107424175B (en)
Inventor
朱红
王道江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710596203.5A priority Critical patent/CN107424175B/en
Publication of CN107424175A publication Critical patent/CN107424175A/en
Application granted granted Critical
Publication of CN107424175B publication Critical patent/CN107424175B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of pattern recognition and computer vision and discloses a target tracking method combining spatio-temporal context information, including: training an initial strong classifier with the first frame image and learning the spatio-temporal context model needed for tracking the next frame; when a new frame arrives, evaluating the blocks of the search region with the trained strong classifier to obtain a first confidence matrix; then integrating the spatio-temporal context information to obtain a confidence map function and using this function to compute the confidence value of each block of the search region, obtaining a second confidence matrix; finally combining the two matrices linearly with their corresponding weights to obtain a final confidence matrix, in which the block with the maximum confidence value is the target to be tracked. By attaching the spatio-temporal context information of the target to the online Boosting algorithm, the invention achieves fast and robust tracking.

Description

Target tracking method combining spatio-temporal context information
Technical field
The invention belongs to the technical field of pattern recognition and computer vision, and more particularly relates to a target tracking method combining spatio-temporal context information.
Background art
Moving target tracking is one of the important research directions in the field of computer vision and has important applications in human-computer interaction, intelligent surveillance, medical imaging and other fields. Tracking algorithms have made considerable progress in recent years, but how to effectively solve the tracking drift caused by factors such as occlusion, fast motion, illumination variation and background clutter remains an extremely challenging problem.
In the online Boosting algorithm, when a new frame arrives, a strong classifier is used to classify the background and the target in the image and obtain the target region. However, when the target is occluded, the feature pool is updated with occluded features, the feature pool is polluted, and tracking drift finally occurs.
To address this problem, several improved online Boosting algorithms have been proposed. Yan et al. proposed an online Boosting algorithm based on sub-region classifiers, in which the target region is divided into several sub-regions, each corresponding to a strong classifier. During tracking, the feature pool corresponding to the strong classifier with the minimum confidence value is not updated, so as to avoid polluting the feature pool with occluded features, but the tracking effect is poor when the target scale changes.
Sun et al. proposed an online Boosting algorithm combined with motion blob detection. When the confidence value of the tracking result falls below a lower threshold, moving targets in the search region are detected by motion blob detection, and the detected moving blobs are evaluated with the strong classifier until the confidence value exceeds the upper or lower threshold. However, motion blob detection usually fails to detect distant moving targets, so the improvement is limited.
Wang et al. proposed an online Boosting algorithm with fused occlusion perception, which trains a background feature classifier and a target feature classifier on a certain number of image frames and uses the two classifiers to perceive whether the target is occluded. If the target is occluded, contaminated positive samples are not collected to update the classifier. This increases the complexity of the classifiers, reduces the real-time performance of the online Boosting algorithm, and fast-moving targets are easily lost.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the invention is to provide a target tracking method combining spatio-temporal context information, which can solve the tracking drift that occurs in the prior art when the target region is partially occluded or the target scale changes greatly.
To achieve the above object, the invention adopts the following technical scheme.
A target tracking method combining spatio-temporal context information, the method comprising the following steps:
Step 1: obtain the first frame image of the video, demarcate the target region of the first frame image, and expand the target region, centred on it, to obtain a search region whose size is four times that of the target region; take the target region as a positive sample and the four corner areas of the search region as four negative samples, wherein the size of the target region is identical to the size of each corner area; take the positive sample and the four negative samples as training samples and obtain a strong classifier from the training samples;
Step 2: learn a spatial context model from the first frame image and take it as the spatio-temporal context model for tracking the next frame image;
Step 3: obtain the current frame image to be tracked and determine the initial search region of the current frame image, wherein the initial search region is centred on the target region of the previous frame image and is four times the size of that target region; divide the initial search region of the current frame image into blocks according to the target region size of the previous frame image, obtaining multiple sub-blocks to be searched of identical size;
Step 4: evaluate each sub-block to be searched with the strong classifier to obtain a first confidence value for each sub-block, forming a first confidence matrix;
Step 5: from the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, obtain a confidence map function; determine the centre point of each sub-block to be searched and, from the confidence map function and these centre points, obtain a second confidence value for each sub-block, forming a second confidence matrix;
Step 6: set the initial value of the weight of the first confidence matrix to 1/2 and the initial value of the weight of the second confidence matrix to 1/2; linearly combine the first confidence matrix and the second confidence matrix with their corresponding weights to obtain a final confidence matrix; determine the maximum confidence value in the final confidence matrix, the sub-block corresponding to that maximum confidence value being the tracked target region of the current frame image;
Step 7: determine the search region of the current frame image, which is centred on the target region of the current frame image and is four times the size of that target region; update the strong classifier with the target region of the current frame image as a positive sample and the four corner areas of the current-frame search region as four negative samples;
Step 8: learn a spatial context model from the current frame image and, combining it with the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, determine the spatio-temporal context model learnt by the current frame for tracking the next frame image;
Step 9: update the weight of the first confidence matrix and the weight of the second confidence matrix according to the current frame image;
Step 10: repeat step 3 to step 9 until all video images to be tracked have been processed.
The invention incorporates spatio-temporal context information into the online Boosting target tracking algorithm, which effectively solves the problem that the online Boosting algorithm easily drifts or even loses the target when the tracked target is partially or completely occluded, and achieves fast and robust tracking.
Brief description of the drawings
To explain the embodiments of the invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
Fig. 1 is a flow diagram of a target tracking method combining spatio-temporal context information provided by an embodiment of the invention;
Fig. 2 is a comparison of the tracking results of the method of the invention and two existing methods.
Detailed description of the embodiments
The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.
The technical scheme of the invention exploits the fact that, between two adjacent frames of a video, the target does not change greatly, its position does not change abruptly, and a certain specific relation exists between the target and the background around it; when the appearance of the target changes greatly, this relation helps distinguish the target from the background. The invention incorporates this spatio-temporal context information into the online Boosting algorithm.
Spatio-temporal context information: the temporal information is that the appearance and position of the target do not change abruptly between adjacent frames; the spatial information is that a certain specific relation exists between the target and the background around it, and this relation helps distinguish the target from the background. The combination of these two kinds of information about the target is the spatio-temporal context information.
An embodiment of the invention provides a target tracking method combining spatio-temporal context information. As shown in Fig. 1, the method comprises the following steps:
Step 1: obtain the first frame image of the video, demarcate the target region of the first frame image, and expand the target region, centred on it, to obtain a search region whose size is four times that of the target region; take the target region as a positive sample and the four corner areas of the search region as four negative samples, wherein the size of the target region is identical to the size of each corner area; take the positive sample and the four negative samples as training samples and obtain a strong classifier from the training samples.
In step 1, taking the positive sample and the four negative samples as training samples, the strong classifier is obtained from the training samples through the following sub-steps:
(1a) Denote the training sample set S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X is the training sample space formed by the one positive sample and the four negative samples, x_i is the i-th training sample in the training sample space, Y is the set of sample class labels, Y = {-1, 1}, and y_i is the sample class label of the i-th training sample; a label of 1 means the training sample is a positive sample and a label of -1 means the training sample is a negative sample.
Set up M weak classifiers and denote the m-th weak classifier h_m^weak, m = 1, ..., M, where M is the total number of weak classifiers.
Initialise i = 1 and m = 1, and set the sample importance weight λ = 1.
(1b) Obtain the i-th training sample and update the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
if the m-th weak classifier h_m^weak classifies the i-th training sample correctly, add the sample importance weight λ to λ_m^corr, giving the new value of λ_m^corr; otherwise, add the sample importance weight λ to λ_m^wrong, giving the new value of λ_m^wrong;
where λ_m^corr is the accumulated weight of the samples classified correctly by the m-th weak classifier and λ_m^wrong is the accumulated weight of the samples misclassified by the m-th weak classifier.
(1c) Increase i by 1 and repeat sub-step (1b) until i exceeds 5, obtaining the final parameters of the m-th weak classifier h_m^weak.
(1d) Reset i to 1, increase m by 1, and repeat sub-steps (1b) to (1c) until m exceeds M, obtaining the final parameters of the M weak classifiers.
(1e) Compute the cumulative error rate ε_m = λ_m^wrong/(λ_m^corr + λ_m^wrong) of the m-th weak classifier; let m take 1, ..., M in turn to obtain the cumulative error rates of the M weak classifiers.
(1f) Take the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel; the initial value of n is 1, n = 1, ..., N, where N is the total number of selectors.
Reset i to 1.
(1g) Obtain the i-th training sample and update the sample importance weight λ with the n-th selector h_n^sel:
if the n-th selector h_n^sel classifies the i-th training sample correctly, multiply the sample importance weight λ by 1/(2 × (1 − ε_n)), giving the new sample importance weight λ; otherwise, multiply the sample importance weight λ by 1/(2 × ε_n), giving the new sample importance weight λ; where ε_n is the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel.
(1h) Increase i by 1 and repeat sub-step (1g) until i exceeds 5, obtaining the final new sample importance weight λ.
(1i) Reset i to 1 and m to 1, increase n by 1, and, using the final new sample importance weight λ, repeat sub-steps (1b) to (1h) until n exceeds N, obtaining N selectors.
(1j) Compute the voting weight α_n = (1/2)·ln((1 − ε_n)/ε_n) corresponding to the n-th selector h_n^sel; let n take 1, ..., N in turn to obtain the voting weights of the N selectors; ln() is the logarithmic function.
(1k) Linearly combine the N selectors according to their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1..N} α_n·h_n^sel(x)), where sign() is the sign function.
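The sub-steps above follow the general pattern of online boosting with selectors. The sketch below illustrates that pattern in Python; it is a minimal illustration only, assuming the weak classifiers are supplied as callables returning +1 or -1 on one positive and four negative samples, and all names (train_strong_classifier, weak_classifiers, and so on) are illustrative rather than taken from the patent.

```python
import numpy as np

def train_strong_classifier(samples, labels, weak_classifiers, n_selectors=50):
    """Sketch of sub-steps (1a)-(1k): `samples` is a list of 5 feature vectors,
    `labels` the corresponding +1/-1 labels, `weak_classifiers` a list of
    callables mapping a sample to +1/-1. Names are illustrative."""
    M = len(weak_classifiers)
    lam_correct = np.full(M, 1e-6)   # accumulated weights of correctly classified samples
    lam_wrong = np.full(M, 1e-6)     # accumulated weights of misclassified samples
    lam = 1.0                        # sample importance weight, initialised to 1
    selectors, votes = [], []

    for n in range(n_selectors):
        # (1b)-(1d): accumulate the per-classifier weights over all samples
        for x, y in zip(samples, labels):
            for m, h in enumerate(weak_classifiers):
                if h(x) == y:
                    lam_correct[m] += lam
                else:
                    lam_wrong[m] += lam
        # (1e)-(1f): select the weak classifier with the lowest cumulative error rate
        errors = lam_wrong / (lam_correct + lam_wrong)
        best = int(np.argmin(errors))
        eps = errors[best]
        # (1g)-(1h): re-weight the samples with the selected classifier
        for x, y in zip(samples, labels):
            if weak_classifiers[best](x) == y:
                lam *= 1.0 / (2.0 * (1.0 - eps))
            else:
                lam *= 1.0 / (2.0 * eps)
        selectors.append(weak_classifiers[best])
        votes.append(0.5 * np.log((1.0 - eps) / eps))   # (1j) voting weight

    def strong_classifier(x):
        # (1k): sign of the weighted vote of the N selectors
        return np.sign(sum(a * h(x) for a, h in zip(votes, selectors)))

    return strong_classifier, selectors, votes
```

The weighted vote itself, before taking the sign, can then serve as the confidence value used when the strong classifier evaluates the sub-blocks in step 4.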
Step 2: learn a spatial context model from the first frame image and take it as the spatio-temporal context model for tracking the next frame image.
The technical scheme of the invention uses the advantage of the spatio-temporal-context-based tracking algorithm in handling occlusion to make up for the deficiency of the online Boosting algorithm in that respect.
Step 3: obtain the current frame image to be tracked and determine the initial search region of the current frame image, wherein the initial search region is centred on the target region of the previous frame image and is four times the size of that target region; divide the initial search region of the current frame image into blocks according to the target region size of the previous frame image, obtaining multiple sub-blocks to be searched of identical size.
In step 3, the initial search region of the current frame image is divided into blocks according to the target region size of the previous frame image, giving multiple sub-blocks to be searched of identical size. The block step sizes comprise a horizontal step size and a vertical step size: the horizontal step size is floor((1 − T) × W + 0.5) and the vertical step size is floor((1 − T) × H + 0.5), where floor() rounds down, T is the overlap ratio between two adjacent sub-blocks to be searched, W is the width of the target region of the first frame image, and H is the height of the target region of the first frame image.
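As an illustration of this block partition, the following sketch slides a target-sized window over the search region with the stated step sizes; the function and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def partition_search_region(search_region, target_w, target_h, overlap=0.99):
    """Sketch of step 3: cut the search region (a 2-D grayscale array) into
    target-sized sub-blocks with the row/column step sizes defined above."""
    col_step = max(int(np.floor((1 - overlap) * target_w + 0.5)), 1)  # horizontal step
    row_step = max(int(np.floor((1 - overlap) * target_h + 0.5)), 1)  # vertical step
    H, W = search_region.shape
    blocks, centers = [], []
    for top in range(0, H - target_h + 1, row_step):
        for left in range(0, W - target_w + 1, col_step):
            blocks.append(search_region[top:top + target_h, left:left + target_w])
            centers.append((top + target_h // 2, left + target_w // 2))
    return blocks, centers
```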
Step 4: evaluate each sub-block to be searched with the strong classifier to obtain a first confidence value for each sub-block, forming the first confidence matrix.
Step 4 specifically comprises: evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value conf1(x) = Σ_{n=1..N} α_n·h_n^sel(x) of each sub-block, forming the first confidence matrix, where x denotes any sub-block to be searched.
Step 5: from the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, obtain the confidence map function; determine the centre point of each sub-block to be searched and, from the confidence map function and these centre points, obtain the second confidence value of each sub-block, forming the second confidence matrix.
Step 5 specifically comprises the following sub-steps:
(5a) From the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, obtain the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)·ωσ(h − h*)));
where Hstc(h) is the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, h is any position in the search region of the current frame image, R(h) is the gray value of the pixel at position h in the search region of the current frame image, ωσ(h − h*) is a weighting function defined as ωσ(h − h*) = ζ·e^(−|h − h*|²/σ²), ζ is a regularization constant, σ is a scale parameter, h* is the centre position of the target region in the previous frame image, FFT() is the Fourier transform, IFFT() is the inverse Fourier transform, and ⊙ denotes element-wise multiplication.
(5b) Let the variable h in the confidence map function take in turn the centre point of each sub-block to be searched of the current frame image, obtaining the second confidence value of each sub-block and forming the second confidence matrix.
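A minimal sketch of sub-step (5a) is given below, assuming 2-D grayscale arrays of equal size and the Gaussian-shaped weighting function stated above; the names are illustrative.

```python
import numpy as np

def confidence_map(stc_model, search_region, prev_center, sigma, zeta=1.0):
    """Sketch of sub-step (5a): evaluate the confidence map as an element-wise
    product in the Fourier domain, i.e. a circular convolution in the spatial
    domain. `prev_center` is (row, col) of the previous target centre h*."""
    H, W = search_region.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = prev_center
    # assumed form of the weighting function: w_sigma(h - h*) = zeta * exp(-|h - h*|^2 / sigma^2)
    w = zeta * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / sigma ** 2)
    prior = search_region * w                       # R(h) * w_sigma(h - h*)
    c = np.real(np.fft.ifft2(np.fft.fft2(stc_model) * np.fft.fft2(prior)))
    return c   # sub-step (5b) reads this map at each sub-block centre
```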
Step 6: set the initial value of the weight of the first confidence matrix to 1/2 and the initial value of the weight of the second confidence matrix to 1/2; linearly combine the first confidence matrix and the second confidence matrix with their corresponding weights to obtain the final confidence matrix; determine the maximum confidence value in the final confidence matrix, the sub-block corresponding to that maximum confidence value being the tracked target region of the current frame image.
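A short sketch of this linear combination and the selection of the best block, with illustrative names, might look as follows.

```python
import numpy as np

def combine_confidences(conf1, conf2, A1=0.5, A2=0.5):
    """Sketch of step 6: weighted linear combination of the two confidence
    matrices (one entry per candidate sub-block) and selection of the block
    with the highest combined confidence."""
    final = A1 * np.asarray(conf1) + A2 * np.asarray(conf2)
    best = int(np.argmax(final))          # index of the block taken as the target
    return best, final[best], final
```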
Step 7: determine the search region of the current frame image, which is centred on the target region of the current frame image and is four times the size of that target region; update the strong classifier with the target region of the current frame image as a positive sample and the four corner areas of the current-frame search region as four negative samples.
Step 8: learn a spatial context model from the current frame image and, combining it with the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, determine the spatio-temporal context model learnt by the current frame for tracking the next frame image.
Step 8 specifically comprises the following sub-steps:
(8a) Determine the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)·ωσ(z − h*)
where P(c(z)|o) is the prior probability that the context feature appears at each pixel of the background region of the current frame image, under the condition that the target appears in the current-frame search region; o denotes the event that the target appears in the current-frame search region; the context feature at z is c(z) = (R(z), z); z ∈ Ω is any position in the background region of the current frame image; Ω is the background region of the current frame image, i.e. the part of the current-frame search region outside the target region; R(z) is the gray value of the pixel at position z of the background region of the current frame image; ωσ(z − h*) is a weighting function defined as ωσ(z − h*) = ζ·e^(−|z − h*|²/σ²), where ζ is a regularization constant and σ is a scale parameter; and h* is the centre position of the target region in the previous frame image.
(8b) Determine the spatial context model P(h|c(z), o) of the current frame image:
P(h|c(z), o) = f^sc(h − z)
where P(h|c(z), o) is the conditional probability that the target location is h, given that the target appears in the current-frame search region and the context feature appears at z; h is any position in the current-frame search region; and f^sc(h − z) is a function of position h and position z that represents the spatial context model learnt by the current frame.
(8c) From the confidence function c(h) = f^sc(h) ⊗ (R(h)·ωσ(h − h*)), obtain the spatial context model f^sc(h) learnt by the current frame:
f^sc(h) = IFFT(FFT(c(h)) / FFT(R(h)·ωσ(h − h*))) = IFFT(FFT(b·e^(−|(h − h*)/α|^β)) / FFT(R(h)·ωσ(h − h*)))
where c(h) is the confidence map function, expressed as c(h) = b·e^(−|(h − h*)/α|^β); b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution.
(8d) Let the current frame image be the t-th frame image, and let the spatio-temporal context model for tracking the current frame image learnt from the previous frame image be H_t^stc(h); the spatio-temporal context model H_{t+1}^stc(h) learnt by the current frame for tracking the next frame image is then:
H_{t+1}^stc(h) = (1 − ρ)·H_t^stc(h) + ρ·f_t^sc(h)
where ρ is an update parameter, ρ ∈ (0, 1); when t = 1, H_1^stc(h) = f_1^sc(h); and f_t^sc(h) is the spatial context model learnt from the t-th frame image.
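The following sketch illustrates sub-steps (8a)-(8d): the spatial context model is obtained by division in the Fourier domain and then blended into the space-time context model used for the next frame. The designed confidence map and weighting function follow the expressions above; all names are illustrative assumptions.

```python
import numpy as np

def learn_spatial_context(search_region, prev_center, alpha, beta, sigma,
                          b=1.0, zeta=1.0, eps=1e-8):
    """Sketch of sub-steps (8a)-(8c): deconvolve the designed confidence map
    by the context prior in the Fourier domain. `search_region` is a 2-D
    grayscale array, `prev_center` is (row, col) of the previous target centre."""
    H, W = search_region.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = prev_center
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    c = b * np.exp(-np.abs(dist / alpha) ** beta)     # designed confidence map c(h)
    w = zeta * np.exp(-dist ** 2 / sigma ** 2)        # assumed form of w_sigma(h - h*)
    prior = search_region * w                         # context prior R(h) * w_sigma(h - h*)
    f_sc = np.real(np.fft.ifft2(np.fft.fft2(c) / (np.fft.fft2(prior) + eps)))
    return f_sc

def update_stc_model(stc_prev, f_sc, rho=0.075):
    """Sketch of sub-step (8d): temporal blending of the space-time context
    model; for the first frame the model is simply f_sc."""
    return (1.0 - rho) * stc_prev + rho * f_sc
```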
Step 9: update the weight of the first confidence matrix and the weight of the second confidence matrix according to the current frame image.
In updating the weight of the first confidence matrix and the weight of the second confidence matrix, the main consideration is whether the target is occluded. Both weights are 1/2 initially; when the target is partially or completely occluded, the weight of the second confidence matrix is increased and the weight of the first confidence matrix is reduced. To judge whether the target in the current frame is occluded, the embodiment of the invention introduces the concept of an occlusion coefficient, which is built on colour histogram features.
Step 9 specifically comprises the following sub-steps:
(9a) Compute the occlusion coefficient occ of the search region of the current frame image.
(9b) Set an occlusion coefficient threshold ε, 0 < ε < 1; the weight A1 of the first confidence matrix and the weight A2 of the second confidence matrix are then updated as follows:
A1 = A/(2Y) and A2 = 1 − A, if occ > ε;
A1 = A2 = 1/2, if occ ≤ ε;
where A is the weight of the first confidence matrix determined by the previous frame image and Y is the maximum confidence value in the final confidence matrix.
Sub-step (9a) specifically comprises the following sub-steps:
(9a1) Obtain the colour histogram feature of the search region of the current frame image and quantise it into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J.
The initial value of j is 1.
(9a2) Let the positions of the pixels of the target region of the first frame image be {d_i*}, i = 1, 2, ..., k, where k is the total number of pixels contained in the target region of the first frame image; the probability density M_{u_j} of the j-th level feature u_j over the target region of the first frame image is then defined as:
M_{u_j} = C·Σ_{i=1..k} K(||d_i*||²)·δ(b(d_i*) − u_j)
where C is a normalisation constant, K() is a kernel function, ||·||² denotes the squared norm, δ() is the impulse response function, and b(d_i*) is the quantisation level of the colour histogram feature at position d_i*.
(9a3) Let the positions of the pixels of any sub-block to be searched of the current frame image be {d_i}, i = 1, 2, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and is equal to the total number of pixels contained in the target region of the first frame image; the probability density N_{u_j}(s) of the j-th level feature u_j over any sub-block to be searched of the current frame image is then defined as:
N_{u_j}(s) = C·Σ_{i=1..k} K(||(s − d_i)/h1||²)·δ(b(d_i*) − u_j)
where s is the centre position of the target region of the current frame image, C is a normalisation constant, K() is a kernel function, ||·||² denotes the squared norm, δ() is the impulse response function, b(d_i*) is the quantisation level of the colour histogram feature at the corresponding position, and h1 is the window radius of the kernel function.
(9a4) Denote by y0 the centre position of the sub-block to be searched with the maximum confidence value in the search region of the current frame image; define a first intermediate variable, expressed as:
Define a second intermediate variable, expressed as:
where λ1 ≥ 1 is a coverage extent parameter.
(9a5) Increase j by 1 and repeat sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion coefficient occ of the search region of the current frame image is computed.
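Since the expressions for the two intermediate variables and their final aggregation into occ are not reproduced in the text above, the sketch below only illustrates the kernel-weighted colour histogram densities of sub-steps (9a2)/(9a3), for a grayscale patch with an assumed Epanechnikov-style kernel and uniform quantisation, together with the weight update of sub-step (9b). Every name, the kernel choice and the quantisation scheme are assumptions.

```python
import numpy as np

def weighted_color_histogram(patch, n_bins=16):
    """Sketch of sub-steps (9a2)/(9a3): kernel-weighted histogram of a
    target-sized grayscale patch (values 0-255), weighting pixels by their
    normalised distance from the patch centre."""
    h, w = patch.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2
    kernel = np.maximum(1.0 - r2, 0.0)                 # assumed kernel K(||.||^2)
    bins = np.minimum((patch.astype(np.float64) / 256.0 * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=kernel.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)                 # normalisation constant C

def update_weights(A_prev, Y_max, occ, eps=0.5):
    """Sketch of sub-step (9b): re-balance the two confidence-matrix weights
    according to the occlusion coefficient occ and threshold eps."""
    if occ > eps:
        A1 = A_prev / (2.0 * Y_max)
        A2 = 1.0 - A_prev
    else:
        A1 = A2 = 0.5
    return A1, A2
```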
Step 10: repeat step 3 to step 9 until all video images to be tracked have been processed.
The technical scheme of the invention was implemented in MATLAB 2014a with the following parameter settings: number of selectors N = 50, number of weak classifiers M = 250, overlap ratio between blocks T = 0.99, scale parameter α = 2.25, update parameter ρ = 0.075, coverage extent parameter λ1 = 1, occlusion coefficient threshold ε = 0.5. The three methods (the method of the invention, the online Boosting algorithm, and the spatio-temporal-context-based algorithm; the solid box is the tracking result of the method of the invention, the plain dashed box is that of the spatio-temporal-context-based algorithm, and the dashed box with black dots is that of the online Boosting algorithm) are initialised to the same target box in the first frame, and the tracking results are shown in Fig. 2. The first video sequence (row a) tracks a toy dog (cluttered background, occluded target); the method of the invention tracks the target correctly, whereas after frame 120 the other two algorithms both lose the target. The second video sequence (row b) tracks a person walking on a subway platform (the target is occluded by moving pedestrians); the method of the invention is clearly better than the other two, especially after frame 43. The third sequence (row c) tracks a fast-moving car (the target scale changes quickly and part of the sequence is occluded); the method of the invention also tracks robustly, which verifies the feasibility of the invention.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiment may be implemented by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment; the foregoing storage medium includes various media that can store program code, such as ROM, RAM, magnetic disc or optical disc.
The above is only a specific embodiment of the invention, but the protection scope of the invention is not limited thereto. Any person familiar with the art can readily think of changes or substitutions within the technical scope disclosed by the invention, and these should all be covered by the protection scope of the invention. Therefore, the protection scope of the invention should be that of the claims.

Claims (8)

1. A target tracking method combining spatio-temporal context information, characterised in that the method comprises the following steps:
Step 1: obtain the first frame image of the video, demarcate the target region of the first frame image, and expand the target region, centred on it, to obtain a search region whose size is four times that of the target region; take the target region as a positive sample and the four corner areas of the search region as four negative samples, wherein the size of the target region is identical to the size of each corner area; take the positive sample and the four negative samples as training samples and obtain a strong classifier from the training samples;
Step 2: learn a spatial context model from the first frame image and take it as the spatio-temporal context model for tracking the next frame image;
Step 3: obtain the current frame image to be tracked and determine the initial search region of the current frame image, wherein the initial search region is centred on the target region of the previous frame image and is four times the size of that target region; divide the initial search region of the current frame image into blocks according to the target region size of the previous frame image, obtaining multiple sub-blocks to be searched of identical size;
Step 4: evaluate each sub-block to be searched with the strong classifier to obtain a first confidence value for each sub-block, forming a first confidence matrix;
Step 5: from the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, obtain a confidence map function; determine the centre point of each sub-block to be searched and, from the confidence map function and these centre points, obtain a second confidence value for each sub-block, forming a second confidence matrix;
Step 6: set the initial value of the weight of the first confidence matrix to 1/2 and the initial value of the weight of the second confidence matrix to 1/2; linearly combine the first confidence matrix and the second confidence matrix with their corresponding weights to obtain a final confidence matrix; determine the maximum confidence value in the final confidence matrix, the sub-block corresponding to that maximum confidence value being the tracked target region of the current frame image;
Step 7: determine the search region of the current frame image, which is centred on the target region of the current frame image and is four times the size of that target region; update the strong classifier with the target region of the current frame image as a positive sample and the four corner areas of the current-frame search region as four negative samples;
Step 8: learn a spatial context model from the current frame image and, combining it with the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, determine the spatio-temporal context model learnt by the current frame for tracking the next frame image;
Step 9: update the weight of the first confidence matrix and the weight of the second confidence matrix according to the current frame image;
Step 10: repeat step 3 to step 9 until all video images to be tracked have been processed.
2. The target tracking method combining spatio-temporal context information according to claim 1, characterised in that in step 1, taking the positive sample and the four negative samples as training samples, the strong classifier is obtained from the training samples through the following sub-steps:
(1a) denoting the training sample set S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X is the training sample space formed by the one positive sample and the four negative samples, x_i is the i-th training sample in the training sample space, Y is the set of sample class labels, Y = {-1, 1}, and y_i is the sample class label of the i-th training sample; a label of 1 means the training sample is a positive sample and a label of -1 means the training sample is a negative sample;
setting up M weak classifiers and denoting the m-th weak classifier h_m^weak, m = 1, ..., M, where M is the total number of weak classifiers;
initialising i = 1 and m = 1, and setting the sample importance weight λ = 1;
(1b) obtaining the i-th training sample and updating the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
if the m-th weak classifier h_m^weak classifies the i-th training sample correctly, adding the sample importance weight λ to λ_m^corr, giving the new value of λ_m^corr; otherwise, adding the sample importance weight λ to λ_m^wrong, giving the new value of λ_m^wrong;
where λ_m^corr is the accumulated weight of the samples classified correctly by the m-th weak classifier and λ_m^wrong is the accumulated weight of the samples misclassified by the m-th weak classifier;
(1c) increasing i by 1 and repeating sub-step (1b) until i exceeds 5, obtaining the final parameters of the m-th weak classifier h_m^weak;
(1d) resetting i to 1, increasing m by 1, and repeating sub-steps (1b) to (1c) until m exceeds M, obtaining the final parameters of the M weak classifiers;
(1e) computing the cumulative error rate ε_m = λ_m^wrong/(λ_m^corr + λ_m^wrong) of the m-th weak classifier, and letting m take 1, ..., M in turn to obtain the cumulative error rates of the M weak classifiers;
(1f) taking the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel, the initial value of n being 1, n = 1, ..., N, where N is the total number of selectors;
resetting i to 1;
(1g) obtaining the i-th training sample and updating the sample importance weight λ with the n-th selector h_n^sel:
if the n-th selector h_n^sel classifies the i-th training sample correctly, multiplying the sample importance weight λ by 1/(2 × (1 − ε_n)), giving the new sample importance weight λ; otherwise, multiplying the sample importance weight λ by 1/(2 × ε_n), giving the new sample importance weight λ; where ε_n is the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel;
(1h) increasing i by 1 and repeating sub-step (1g) until i exceeds 5, obtaining the final new sample importance weight λ;
(1i) resetting i to 1 and m to 1, increasing n by 1, and, using the final new sample importance weight λ, repeating sub-steps (1b) to (1h) until n exceeds N, obtaining N selectors;
(1j) computing the voting weight α_n = (1/2)·ln((1 − ε_n)/ε_n) corresponding to the n-th selector h_n^sel, and letting n take 1, ..., N in turn to obtain the voting weights of the N selectors, where ln() is the logarithmic function;
(1k) linearly combining the N selectors according to their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1..N} α_n·h_n^sel(x)), where sign() is the sign function.
3. The target tracking method combining spatio-temporal context information according to claim 1, characterised in that in step 3, the initial search region of the current frame image is divided into blocks according to the target region size of the previous frame image, giving multiple sub-blocks to be searched of identical size, the block step sizes comprising a horizontal step size and a vertical step size: the horizontal step size is floor((1 − T) × W + 0.5) and the vertical step size is floor((1 − T) × H + 0.5), where floor() rounds down, T is the overlap ratio between two adjacent sub-blocks to be searched, W is the width of the target region of the first frame image, and H is the height of the target region of the first frame image.
4. The target tracking method combining spatio-temporal context information according to claim 2, characterised in that step 4 specifically comprises:
evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value conf1(x) = Σ_{n=1..N} α_n·h_n^sel(x) of each sub-block, forming the first confidence matrix, where x denotes any sub-block to be searched.
5. The target tracking method combining spatio-temporal context information according to claim 1, characterised in that step 5 specifically comprises the following sub-steps:
(5a) from the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, obtaining the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)·ωσ(h − h*)));
where Hstc(h) is the spatio-temporal context model for tracking the current frame image learnt from the previous frame image, h is any position in the search region of the current frame image, R(h) is the gray value of the pixel at position h in the search region of the current frame image, ωσ(h − h*) is a weighting function defined as ωσ(h − h*) = ζ·e^(−|h − h*|²/σ²), ζ is a regularization constant, σ is a scale parameter, h* is the centre position of the target region in the previous frame image, FFT() is the Fourier transform, IFFT() is the inverse Fourier transform, and ⊙ denotes element-wise multiplication;
(5b) letting the variable h in the confidence map function take in turn the centre point of each sub-block to be searched of the current frame image, obtaining the second confidence value of each sub-block and forming the second confidence matrix.
6. The target tracking method combining spatio-temporal context information according to claim 1, characterised in that step 8 specifically comprises the following sub-steps:
(8a) determining the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)·ωσ(z − h*)
where P(c(z)|o) is the prior probability that the context feature appears at each pixel of the background region of the current frame image, under the condition that the target appears in the current-frame search region; o denotes the event that the target appears in the current-frame search region; the context feature at z is c(z) = (R(z), z); z ∈ Ω is any position in the background region of the current frame image; Ω is the background region of the current frame image, i.e. the part of the current-frame search region outside the target region; R(z) is the gray value of the pixel at position z of the background region of the current frame image; ωσ(z − h*) is a weighting function defined as ωσ(z − h*) = ζ·e^(−|z − h*|²/σ²), where ζ is a regularization constant and σ is a scale parameter; and h* is the centre position of the target region in the previous frame image;
(8b) determining the spatial context model P(h|c(z), o) of the current frame image:
P(h|c(z), o) = f^sc(h − z)
where P(h|c(z), o) is the conditional probability that the target location is h, given that the target appears in the current-frame search region and the context feature appears at z; h is any position in the current-frame search region; and f^sc(h − z) is a function of position h and position z that represents the spatial context model learnt by the current frame;
(8c) from the confidence function c(h) = f^sc(h) ⊗ (R(h)·ωσ(h − h*)), obtaining the spatial context model f^sc(h) learnt by the current frame:
f^sc(h) = IFFT(FFT(c(h)) / FFT(R(h)·ωσ(h − h*))) = IFFT(FFT(b·e^(−|(h − h*)/α|^β)) / FFT(R(h)·ωσ(h − h*)))
where c(h) is the confidence map function, expressed as c(h) = b·e^(−|(h − h*)/α|^β); b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution;
(8d) letting the current frame image be the t-th frame image and the spatio-temporal context model for tracking the current frame image learnt from the previous frame image be H_t^stc(h), the spatio-temporal context model H_{t+1}^stc(h) learnt by the current frame for tracking the next frame image being:
H_{t+1}^stc(h) = (1 − ρ)·H_t^stc(h) + ρ·f_t^sc(h)
where ρ is an update parameter, ρ ∈ (0, 1); when t = 1, H_1^stc(h) = f_1^sc(h); and f_t^sc(h) is the spatial context model learnt from the t-th frame image.
7. The target tracking method combining spatio-temporal context information according to claim 1, characterised in that step 9 specifically comprises the following sub-steps:
(9a) computing the occlusion coefficient occ of the search region of the current frame image;
(9b) setting an occlusion coefficient threshold ε, 0 < ε < 1, the weight A1 of the first confidence matrix and the weight A2 of the second confidence matrix then being updated as follows:
A1 = A/(2Y) and A2 = 1 − A, if occ > ε;
A1 = A2 = 1/2, if occ ≤ ε;
where A is the weight of the first confidence matrix determined by the previous frame image and Y is the maximum confidence value in the final confidence matrix.
8. The target tracking method combining spatio-temporal context information according to claim 7, characterised in that sub-step (9a) specifically comprises the following sub-steps:
(9a1) obtaining the colour histogram feature of the search region of the current frame image and quantising it into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J;
the initial value of j being 1;
(9a2) letting the positions of the pixels of the target region of the first frame image be {d_i*}, i = 1, 2, ..., k, where k is the total number of pixels contained in the target region of the first frame image, the probability density M_{u_j} of the j-th level feature u_j over the target region of the first frame image being defined as:
M_{u_j} = C·Σ_{i=1..k} K(||d_i*||²)·δ(b(d_i*) − u_j)
where C is a normalisation constant, K() is a kernel function, ||·||² denotes the squared norm, δ() is the impulse response function, and b(d_i*) is the quantisation level of the colour histogram feature at position d_i*;
(9a3) letting the positions of the pixels of any sub-block to be searched of the current frame image be {d_i}, i = 1, 2, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and is equal to the total number of pixels contained in the target region of the first frame image, the probability density N_{u_j}(s) of the j-th level feature u_j over any sub-block to be searched of the current frame image being defined as:
N_{u_j}(s) = C·Σ_{i=1..k} K(||(s − d_i)/h1||²)·δ(b(d_i*) − u_j)
where s is the centre position of the target region of the current frame image, C is a normalisation constant, K() is a kernel function, ||·||² denotes the squared norm, δ() is the impulse response function, b(d_i*) is the quantisation level of the colour histogram feature at the corresponding position, and h1 is the window radius of the kernel function;
(9a4) denoting by y0 the centre position of the sub-block to be searched with the maximum confidence value in the search region of the current frame image, defining a first intermediate variable, expressed as:
defining a second intermediate variable, expressed as:
where λ1 ≥ 1 is a coverage extent parameter;
(9a5) increasing j by 1 and repeating sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion coefficient occ of the search region of the current frame image is computed.
CN201710596203.5A 2017-07-20 2017-07-20 Target tracking method combined with space-time context information Active CN107424175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Publications (2)

Publication Number Publication Date
CN107424175A true CN107424175A (en) 2017-12-01
CN107424175B CN107424175B (en) 2020-09-08

Family

ID=60430564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710596203.5A Active CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Country Status (1)

Country Link
CN (1) CN107424175B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416800A (en) * 2018-03-13 2018-08-17 青岛海信医疗设备股份有限公司 Method for tracking target and device, terminal, computer readable storage medium
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110570451A (en) * 2019-08-05 2019-12-13 武汉大学 multithreading visual target tracking method based on STC and block re-detection
CN110738685A (en) * 2019-09-09 2020-01-31 桂林理工大学 space-time context tracking method with color histogram response fusion
CN113743252A (en) * 2021-08-17 2021-12-03 北京佳服信息科技有限公司 Target tracking method, device and equipment and readable storage medium
CN114140501A (en) * 2022-01-30 2022-03-04 南昌工程学院 Target tracking method and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HELMUT GRABNER et al.: "On-line Boosting and Vision", 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition *
张雷: "Research on Real-time Target Tracking Algorithms and Implementation Techniques in Complex Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416800A (en) * 2018-03-13 2018-08-17 青岛海信医疗设备股份有限公司 Method for tracking target and device, terminal, computer readable storage medium
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110570451A (en) * 2019-08-05 2019-12-13 武汉大学 multithreading visual target tracking method based on STC and block re-detection
CN110570451B (en) * 2019-08-05 2022-02-01 武汉大学 Multithreading visual target tracking method based on STC and block re-detection
CN110738685A (en) * 2019-09-09 2020-01-31 桂林理工大学 space-time context tracking method with color histogram response fusion
CN113743252A (en) * 2021-08-17 2021-12-03 北京佳服信息科技有限公司 Target tracking method, device and equipment and readable storage medium
CN113743252B (en) * 2021-08-17 2024-05-31 北京佳服信息科技有限公司 Target tracking method, device, equipment and readable storage medium
CN114140501A (en) * 2022-01-30 2022-03-04 南昌工程学院 Target tracking method and device and readable storage medium

Also Published As

Publication number Publication date
CN107424175B (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN107424175A (en) A kind of method for tracking target of combination spatio-temporal context information
Varshneya et al. Human trajectory prediction using spatially aware deep attention models
Wang et al. Detection of abnormal visual events via global optical flow orientation histogram
Li et al. Deep neural network for structural prediction and lane detection in traffic scene
Feng et al. Infrared image segmentation with 2-D maximum entropy method based on particle swarm optimization (PSO)
CN104680508B (en) Convolutional neural networks and the target object detection method based on convolutional neural networks
Kwak et al. Learning occlusion with likelihoods for visual tracking
CN109117794A (en) A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing
Fang et al. An automatic road sign recognition system based on a computational model of human recognition processing
Islam et al. Solid waste bin detection and classification using Dynamic Time Warping and MLP classifier
CN101944174B (en) Identification method of characters of licence plate
CN111814621A (en) Multi-scale vehicle and pedestrian detection method and device based on attention mechanism
CN105404902A (en) Impulsive neural network-based image feature describing and memorizing method
CN106446914A (en) Road detection based on superpixels and convolution neural network
CN107133974A (en) The vehicle type classification method that Gaussian Background modeling is combined with Recognition with Recurrent Neural Network
CN107424171A (en) A kind of anti-shelter target tracking based on piecemeal
CN104134078B (en) Automatic selection method for classifiers in people flow counting system
Kim Multiple vehicle tracking and classification system with a convolutional neural network
CN105844665A (en) Method and device for tracking video object
CN110414616A (en) A kind of remote sensing images dictionary learning classification method using spatial relationship
Chow et al. Understanding object detection through an adversarial lens
CN111199175A (en) Training method and device for target detection network model
Elfwing et al. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning
CN114862764A (en) Flaw detection model training method, flaw detection model training device, flaw detection model training equipment and storage medium
Darapaneni et al. Autonomous car driving using deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant