CN107341817A - Self-adaptive visual track algorithm based on online metric learning - Google Patents


Info

Publication number
CN107341817A
CN107341817A
Authority
CN
China
Prior art keywords
template
distance
Prior art date
Legal status
Granted
Application number
CN201710455281.3A
Other languages
Chinese (zh)
Other versions
CN107341817B (en)
Inventor
康文静
孙叔桥
刘功亮
Current Assignee
Harbin Institute of Technology Weihai
Original Assignee
Harbin Institute of Technology Weihai
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai
Priority to CN201710455281.3A
Publication of CN107341817A
Application granted
Publication of CN107341817B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters

Abstract

The present invention relates to the field of visual tracking, and in particular to a self-adaptive visual tracking algorithm based on online metric learning. In practical visual-tracking scenarios, little prior knowledge of the target is usually available in the video sequence to be tracked, and traditional predefined distance metrics have difficulty meeting the demands of long-term tracking. The present invention proposes a robust online visual tracking algorithm that incorporates distance metric learning: tracking is treated as a two-class foreground/background classification problem, and the classifier is continually updated as the video advances. A new template update algorithm is also proposed to make the tracking process more robust. To improve the accuracy and efficiency of the algorithm, dense SIFT features and randomized PCA are used to reduce the feature dimensionality while preserving tracking performance. A series of experimental results shows that the proposed algorithm is competitive with many currently popular algorithms.

Description

Self-adaptive visual track algorithm based on online metric learning
Technical field:
The present invention relates to the field of visual tracking, and in particular to a self-adaptive visual tracking algorithm based on online metric learning.
Background technology:
As an important topic in computer vision, visual tracking has been studied for many years. Its main task is to identify a target in a video sequence and to keep tracking the target's position as the video advances. Among the many tracking methods, classifying the target's appearance in a feature space is one of the most popular.
Most existing algorithms apply a predefined distance metric matrix in the similarity computation, such as the Euclidean distance or the Mahalanobis distance. However, when a structured target undergoes obvious deformation, such a fixed metric can hardly meet a high tracking-accuracy requirement. Moreover, changes in background and illumination can also cause tracking algorithms based on a predefined distance metric to fail.
Therefore, some scholars have proposed adaptive tracking algorithms based on distance metric learning to improve tracking robustness. The basic idea of metric-learning tracking is to train a classifier on the first few frames of the video sequence and to keep updating it during subsequent tracking. The learned distance metric allows foreground and background to be better separated in the feature space, while the distances between points sharing the same class label shrink. The new projection space usually has a lower dimension, which also makes the computation substantially smaller than in the original space.
The content of the invention:
To address the shortcomings and defects of the prior art, the present invention proposes a self-adaptive visual tracking algorithm based on online metric learning. It abandons the predefined distance metric applied by most current algorithms and instead searches for matching results in a continually learned new space, which broadens the variety of trackable targets and improves robustness under long-term tracking. It further incorporates adaptive template screening in place of the mechanical template update of existing algorithms, so that the template library adapts to target changes more accurately and quickly while preserving fault tolerance.
The present invention is achieved by the following measures:
A self-adaptive visual tracking algorithm based on online metric learning, characterized by comprising the following:
Step 1: Describe the target with SIFT features. Feature extraction uses the dSIFT fast algorithm from the VLFeat feature-extraction library; for a patch of 50×50 pixels, the corresponding SIFT feature dimension is 128×9 = 1152, which is a very large computational load for an online tracking algorithm. Therefore, randomized principal component analysis (RPCA) is applied to reduce the dimensionality of the extracted SIFT features: given a feature matrix X, the basic idea of feature dimensionality reduction is to find a mapping matrix that projects the original n-dimensional feature space onto a new k-dimensional feature space, thereby achieving the dimensionality-reduction goal. Once the feature matrix and the target dimension are known, RPCA defines an oversampling dimension and a random projection matrix Ω and applies the factorization below to the newly computed matrix Y = XΩ:
$QR = Y,\qquad B := Q^{T}X$  (1)
where B is an intermediate matrix; B is then decomposed by SVD:
$B = \tilde{U}\Sigma V^{T}$  (2)
An approximation of X can thus be obtained by the following formula:
$X \approx QQ^{T}X = Q(\tilde{U}\Sigma V^{T}) := U\Sigma V^{T}$  (3)
Finally, the projected feature matrix $X_{proj}$ is obtained via $X_{proj} = XV_{k}$, where $V_{k}$ is formed by keeping the first k columns of V; for a 1152-dimensional SIFT feature, the feature dimension after reduction is about one third of the original, i.e., 384 dimensions;
Step 2: Construct and train the classifier with supervised learning. Given two samples x and y, the purpose of distance metric learning is to adjust the original feature space and change the relative spatial relations between the training-set samples, so that distances between samples of the same class shrink as far as possible while distances between samples of different classes grow as far as possible; under this condition the distance can be written as
$d_{G}(x, y) = (x - y)^{T} G (x - y)$  (4)
where G is the learned distance metric matrix. Using the K-L divergence as the index for measuring similarity, the above constraints can be written as
$\min_{G}\ KL\left(p(x; G_{0})\,\|\,p(x; G)\right)$
$\text{s.t.}\ \ d_{G}(x, y) \le l,\ \text{if } label(x) = label(y)$
$\phantom{\text{s.t.}\ \ } d_{G}(x, y) \ge u,\ \text{if } label(x) \ne label(y)$  (5)
where l and u are two distance thresholds;
The above divergence problem is solved with the LogDet method. Because the number of samples obtainable in one frame is very limited, the values of all parameters cannot be obtained, so a "boot-strap" (bootstrap) method is used in the algorithm to construct the training set; in practice, the training set contains two classes of samples: a positive sample set representing the target and a negative sample set representing the background information.
After an initial distance metric matrix satisfying the constraints has been obtained, the tracking algorithm updates the distance metric in every subsequent frame. Given two patches u_t and v_t extracted in frame t, the distance between them is $d_{G_{t}}(u_{t}, v_{t})$; if the predicted distance is y_t, the new distance metric G_{t+1} can be obtained by solving
$G_{t+1} = \underset{G \succ 0}{\arg\min}\ D(G, G_{t}) + \eta\, L\left(d_{G}(u_{t}, v_{t}), y_{t}\right)$  (6)
where D is a regularization function, η is a regularization parameter, and L is the loss function between the target distance and the estimated distance. Let z_t = u_t − v_t; then the solution of this minimization problem is
$G_{t+1} = G_{t} - \dfrac{\eta(\bar{y} - y_{t})\,G_{t}z_{t}z_{t}^{T}G_{t}}{1 + \eta(\bar{y} - y_{t})\,z_{t}^{T}G_{t}z_{t}}$  (7)
The goal of distance metric learning is
$d_{G_{t+1}}(\hat{I}(x;t),\,M) \ll d_{G_{t}}(\hat{I}(x;t),\,M)$
$d_{G_{t+1}}(\hat{I}(x;t),\,\hat{J}(x_{j};t)) \gg d_{G_{t}}(\hat{I}(x;t),\,\hat{J}(x_{j};t))$  (8)
where Î(x;t) and Ĵ(x_j;t) denote, respectively, the target and background samples in the sample library at frame t.
The present invention also includes adaptive template selection and update. The training templates are divided into two classes according to the update mode used in the algorithm: fast-update templates and robust-update templates. The former are used to adapt to target deformation in time, while the latter are used to prevent the tracking result from drifting. The fast-update templates are extracted from each frame according to a search template, whose design guarantees both the algorithm's efficiency in handling background information and an accurate description of the target. The robust-update templates are stored in the template library, whose size is fixed for a given video sequence; the initial templates are extracted around the target position marked by the user in the first frame.
In the fast-update template extraction pattern of the present invention, each asterisk (*) represents the center position of an extracted sample patch; with the current target center as the origin, patches extracted within 2 pixels of it are positive samples, and the remaining patches are negative samples. After the algorithm estimates the target position in the current frame, 9 patches of the target size are extracted by the tracking algorithm from the region within 2 pixels around that position. Suppose I(x;t) denotes the patch extracted from frame t whose distance to the positive sample set T_pos of the training set T is smallest, and let Î(x;t) denote the SIFT feature extraction result of that patch. If the average distance between Î(x;t) and T_pos is smaller than a threshold, the template library update task is to compute the distance between Î(x;t) and each template in the current template library M_t = {m_1, ..., m_k}, and to compare them with the distances between the target and the templates already in the library. If the distance corresponding to the new template is larger than the distance corresponding to at least one template in M_t, the template with the smallest corresponding distance is replaced by the new template I(x;t). The reason is that if the distance between the new template and the positive sample set is below the threshold, the template can be regarded as a positive sample; meanwhile, if its distance to the templates in the current template library is larger, it is considered to carry more new information, information that the templates in the current library do not provide. Therefore, the new template replaces the old template that is most similar to the other templates in the template library M_t.
The present invention proposes an online visual tracking algorithm based on metric learning, which effectively solves the problem that the target is easily lost during long-term tracking and has strong adaptability to motion blur and target deformation. The proposed algorithm continually updates the metric space through distance metric learning, thereby improving robustness. The templates of the proposed adaptive template library are divided, according to their update mode, into fast-update templates and robust-update templates, which respectively handle obvious target deformation and the robustness and continuity of tracking. Their combination guarantees that the algorithm can quickly adapt to target changes while still preserving the possibility of recovering the target after a misjudgment. In addition, the proposed visual tracking algorithm applies randomized principal component analysis for feature dimensionality reduction, effectively cutting the original dSIFT feature dimension by about two thirds and further increasing the algorithm's speed. The proposed algorithm is compared with currently popular algorithms on OTB video sequences, and the experimental results show that the algorithm of the invention copes well with most tracking tasks and is quite competitive.
Brief description of the drawings:
Fig. 1 shows the fast-update template extraction pattern of the present invention.
Fig. 2 is a schematic diagram of the overlap-rate curves of the present invention.
Fig. 3 is a schematic diagram of the center location error of the present invention.
Fig. 4 is a schematic diagram of an intuitive comparison of tracking results of the present invention.
Embodiment:
The present invention is further illustrated below in conjunction with the accompanying drawings.
The present invention proposes a visual tracking algorithm based on distance metric learning. It abandons the predefined distance metric applied by most current algorithms and searches for matching results in a continually learned new space, which broadens the variety of trackable targets and improves robustness under long-term tracking. In addition, the algorithm incorporates adaptive template screening, replacing the mechanical template update of existing algorithms, so that the template library adapts to target changes more accurately and quickly while maintaining fault tolerance.
One of the major issues in constructing a robust visual tracking algorithm is how to choose a suitable target description, which affects not only the tracking result but also the tracking speed. Intuitively, pixel values can effectively describe an object and are easy to obtain. Without further processing, however, gray-value features are easily affected by changes in illumination, pose and other conditions. Even with the compensation provided by metric matrix learning, improving the performance of a tracker based on gray-value features remains a heavy challenge. Considering that SIFT features cope well with complex situations such as rotation, slight deformation and illumination variation, the present invention focuses on SIFT features.
SIFT features describe a target with keypoints and descriptors. For an object, the keypoints correspond to the extrema of its multi-level difference of Gaussians (DoG). After a series of necessary intermediate steps (keypoint elimination, dimensionality reduction, orientation assignment, etc.), each keypoint extracted on the target corresponds to a 128-dimensional feature vector. In this application, 9 SIFT keypoints are selected from a patch of 50×50 pixels. Because SIFT feature extraction is computationally heavy and would seriously slow down tracking, feature extraction here applies the dSIFT fast algorithm from the VLFeat feature-extraction library, which improves the computation speed while preserving the effectiveness and consistency of the features.
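For illustration only, the following Python sketch computes SIFT descriptors on a fixed keypoint grid with OpenCV as a stand-in for VLFeat's dSIFT; the 3×3 grid, the keypoint scale of 16 pixels and the use of OpenCV are assumptions, not the patent's implementation:

```python
import cv2
import numpy as np

def dense_sift_descriptor(patch_gray, grid=3, kp_size=16):
    """Approximate dense SIFT for one patch: descriptors are computed on a
    fixed grid of keypoints (grid*grid keypoints -> grid*grid*128 dims)."""
    h, w = patch_gray.shape
    sift = cv2.SIFT_create()
    xs = np.linspace(kp_size, w - kp_size, grid)
    ys = np.linspace(kp_size, h - kp_size, grid)
    keypoints = [cv2.KeyPoint(float(x), float(y), float(kp_size))
                 for y in ys for x in xs]
    keypoints, desc = sift.compute(patch_gray, keypoints)
    return desc.reshape(-1)   # e.g. 9 keypoints * 128 = 1152 dimensions

# Example: one 50x50 patch -> a 1152-dimensional feature vector.
patch = (np.random.rand(50, 50) * 255).astype(np.uint8)
print(dense_sift_descriptor(patch).shape)   # (1152,)
```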
For a patch of 50×50 pixels, the corresponding SIFT feature dimension is 128×9 = 1152, which is a very large computational load for an online tracking algorithm. Therefore, this algorithm applies randomized principal component analysis (RPCA) to reduce the dimensionality of the extracted SIFT features.
Given a feature matrix X, the basic idea of feature dimensionality reduction is to find a mapping matrix that projects the original n-dimensional feature space onto a new k-dimensional feature space, thereby achieving the dimensionality-reduction goal. Once the feature matrix and the target dimension are known, RPCA defines an oversampling dimension and a random projection matrix Ω and applies the factorization below to the newly computed matrix Y = XΩ:
$QR = Y,\qquad B := Q^{T}X$  (9)
where B is an intermediate matrix. B is then decomposed by SVD:
$B = \tilde{U}\Sigma V^{T}$  (10)
An approximation of X can thus be obtained by the following formula:
$X \approx QQ^{T}X = Q(\tilde{U}\Sigma V^{T}) := U\Sigma V^{T}$  (11)
Finally, the projected feature matrix $X_{proj}$ is obtained via $X_{proj} = XV_{k}$, where $V_{k}$ is formed by keeping the first k columns of V. For a 1152-dimensional SIFT feature, the feature dimension after reduction is about one third of the original, i.e., 384 dimensions. Compared with the random projection (RP) algorithm, RPCA is more stable.
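A minimal numpy sketch of the randomized-PCA projection described by equations (9)–(11) follows; the oversampling amount, the random seed and the use of numpy's QR/SVD routines are illustrative assumptions:

```python
import numpy as np

def rpca_project(X, k, oversample=10, seed=0):
    """Randomized PCA: project the rows of X (m x n) onto the first k
    right singular directions estimated from a random sketch of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    p = min(n, k + oversample)           # oversampled sketch width
    Omega = rng.standard_normal((n, p))  # random projection matrix
    Y = X @ Omega                        # Y = X * Omega
    Q, _ = np.linalg.qr(Y)               # QR = Y, eq. (9)
    B = Q.T @ X                          # B := Q^T X
    _, _, Vt = np.linalg.svd(B, full_matrices=False)  # B = U~ Sigma V^T, eq. (10)
    Vk = Vt[:k].T                        # first k columns of V
    return X @ Vk                        # X_proj = X V_k

# Example: reduce 1152-dim dSIFT features of 500 patches to 384 dimensions
# (requires at least k + oversample sample rows).
X = np.random.rand(500, 1152)
print(rpca_project(X, k=384).shape)      # (500, 384)
```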
Machine learning methods can be divided into two classes according to whether the training set has class labels: supervised learning and unsupervised learning. On the whole, supervised learning (whose training process includes class labels) performs better in most visual tracking applications. First, during tracking it is easy to obtain the class labels of the training samples, because both the initialization procedure and the tracking decision procedure make the target position and related information of the preceding frames known. The target is marked with class 1 as foreground, and all other information is regarded as background and marked with class 0 (or -1). In addition, unsupervised algorithms usually require a longer training time, because their training consists of repeated clustering and mean shift. Therefore, this algorithm applies supervised learning to construct and train the classifier.
Given two samples x and y, the purpose of distance metric learning is to adjust the original feature space and change the relative spatial relations between the training-set samples, so that distances between samples of the same class shrink as far as possible while distances between samples of different classes grow as far as possible. Under this condition the distance can be written as
$d_{G}(x, y) = (x - y)^{T} G (x - y)$  (12)
where G is the learned distance metric matrix. Using the K-L divergence as the index for measuring similarity, the above constraints can be written as
$\min_{G}\ KL\left(p(x; G_{0})\,\|\,p(x; G)\right)$
$\text{s.t.}\ \ d_{G}(x, y) \le l,\ \text{if } label(x) = label(y)$
$\phantom{\text{s.t.}\ \ } d_{G}(x, y) \ge u,\ \text{if } label(x) \ne label(y)$  (13)
where l and u are two distance thresholds.
The above divergence problem is solved with the LogDet method. Because the number of samples obtainable in one frame is very limited, the values of all parameters cannot be obtained, so a "boot-strap" (bootstrap) method is used in the algorithm to construct the training set. In practice, the training set contains two classes of samples: a positive sample set representing the target and a negative sample set representing the background information.
After an initial distance metric matrix satisfying the constraints has been obtained, the tracking algorithm updates the distance metric in every subsequent frame. Given two patches u_t and v_t extracted in frame t, the distance between them is $d_{G_{t}}(u_{t}, v_{t})$. If the predicted distance is y_t, the new distance metric G_{t+1} can be obtained by solving
$G_{t+1} = \underset{G \succ 0}{\arg\min}\ D(G, G_{t}) + \eta\, L\left(d_{G}(u_{t}, v_{t}), y_{t}\right)$  (14)
where D is a regularization function, η is a regularization parameter, and L is the loss function between the target distance and the estimated distance. Let z_t = u_t − v_t; then the solution of this minimization problem is
$G_{t+1} = G_{t} - \dfrac{\eta(\bar{y} - y_{t})\,G_{t}z_{t}z_{t}^{T}G_{t}}{1 + \eta(\bar{y} - y_{t})\,z_{t}^{T}G_{t}z_{t}}$  (15)
The goal of distance metric learning is
$d_{G_{t+1}}(\hat{I}(x;t),\,M) \ll d_{G_{t}}(\hat{I}(x;t),\,M)$
$d_{G_{t+1}}(\hat{I}(x;t),\,\hat{J}(x_{j};t)) \gg d_{G_{t}}(\hat{I}(x;t),\,\hat{J}(x_{j};t))$  (16)
where Î(x;t) and Ĵ(x_j;t) denote, respectively, the target and background samples in the sample library at frame t.
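The distance of equation (12) and the closed-form update of equation (15) can be sketched directly in numpy; the text does not spell out how the intermediate value ȳ is computed, so here it is simply passed in as an argument, and using the current distance for it is an illustrative assumption:

```python
import numpy as np

def mahalanobis(G, x, y):
    """d_G(x, y) = (x - y)^T G (x - y), eq. (12)."""
    d = x - y
    return float(d @ G @ d)

def metric_update(G_t, u_t, v_t, y_t, y_bar, eta=0.1):
    """One online metric-learning step, eq. (15), with z_t = u_t - v_t."""
    z = (u_t - v_t).reshape(-1, 1)
    Gz = G_t @ z
    coeff = eta * (y_bar - y_t)
    return G_t - coeff * (Gz @ Gz.T) / (1.0 + coeff * float(z.T @ Gz))

# Example with 384-dimensional (reduced dSIFT) features.
dim = 384
G = np.eye(dim)                          # start from the identity metric
u, v = np.random.rand(dim), np.random.rand(dim)
y_t = 1.0                                # predicted (target) distance
d_before = mahalanobis(G, u, v)
G = metric_update(G, u, v, y_t, y_bar=d_before)
print(d_before, mahalanobis(G, u, v))    # the updated distance moves toward y_t
```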
During tracking, the templates have an important influence on the algorithm. Generally, when a new video frame arrives, most existing algorithms extract patches around the center given by the user or around the estimated target position, thereby obtaining positive and negative templates. This makes the templates sensitive to deformation of the target object, so such algorithms have difficulty coping with the deformations the target may undergo during the video, which leads to tracking failure. Moreover, once the target is lost, such algorithms can hardly recover it, because all templates have already been affected.
To avoid this situation, the training templates are divided into two classes according to the update mode used in the algorithm: fast-update templates and robust-update templates. The former are used to adapt to target deformation in time, while the latter are used to prevent the tracking result from drifting. Fast-update templates are extracted from each frame according to the search template shown in Fig. 1; the design of the search template guarantees both the algorithm's efficiency in handling background information and an accurate description of the target. Robust-update templates are stored in the template library, whose size is fixed for a given video sequence; the initial templates are extracted around the target position marked by the user in the first frame.
As shown in Fig. 1, in the fast-update template extraction pattern each asterisk (*) represents the center position of an extracted sample patch; with the current target center as the origin, patches extracted within 2 pixels of it are positive samples, and the remaining patches are negative samples.
After the algorithm estimates the target position in the current frame, 9 patches of the target size are extracted by the tracking algorithm from the region within 2 pixels around that position. Suppose I(x;t) denotes the patch extracted from frame t whose distance to the positive sample set T_pos of the training set T is smallest, and let Î(x;t) denote the SIFT feature extraction result of that patch. If the average distance between Î(x;t) and T_pos is smaller than a threshold, the template library update task is to compute the distance between Î(x;t) and each template in the current template library M_t = {m_1, ..., m_k}, and to compare them with the distances between the target and the templates already in the library. If the distance corresponding to the new template is larger than the distance corresponding to at least one template in M_t, the template with the smallest corresponding distance is replaced by the new template I(x;t). The reason is that if the distance between the new template and the positive sample set is below the threshold, the template can be regarded as a positive sample; meanwhile, if its distance to the templates in the current template library is larger, it is considered to carry more new information, information that the templates in the current library do not provide. Therefore, the new template replaces the old template that is most similar to the other templates in the template library M_t.
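One possible reading of this update rule is sketched in Python below; the threshold value, the use of average distances, and the use of the learned metric G for all comparisons are assumptions made for illustration:

```python
import numpy as np

def update_template_library(templates, new_feat, pos_samples, G, dist_thresh):
    """Replace the most 'redundant' template with the new patch feature when
    (a) the new patch is close enough to the positive sample set, and
    (b) it is farther from the library than at least one existing template."""
    d_G = lambda a, b: float((a - b) @ G @ (a - b))
    # (a) average distance between the candidate and the positive samples
    if np.mean([d_G(new_feat, p) for p in pos_samples]) >= dist_thresh:
        return templates                              # candidate rejected
    # redundancy of each stored template: mean distance to the other templates
    lib_dist = [np.mean([d_G(m, o) for j, o in enumerate(templates) if j != i])
                for i, m in enumerate(templates)]
    new_dist = np.mean([d_G(new_feat, m) for m in templates])
    # (b) the candidate carries more new information than the most redundant one
    if new_dist > min(lib_dist):
        templates[int(np.argmin(lib_dist))] = new_feat
    return templates

# Example: a library of 5 robust templates over 384-dimensional features.
G = np.eye(384)
library = [np.random.rand(384) for _ in range(5)]
positives = [np.random.rand(384) for _ in range(3)]
library = update_template_library(library, np.random.rand(384), positives, G, 100.0)
```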
A series of experiments is carried out in the present invention to compare the proposed algorithm with some currently popular algorithms. The video sequences used in the experiments are from the OTB benchmark and cover many kinds of situations such as pose, illumination, rotation and scale variation, occlusion, and fast motion. The experimental environment is MATLAB R2012a, and the hardware is an Intel 3.10 GHz processor with 4 GB RAM.
The algorithm of the present invention is compared with the following seven algorithms: 1) CCT; 2) CSK; 3) DFT; 4) FOT; 5) KCF; 6) LCT; 7) LSHT.
Two classes of evaluation indexes are used in the comparison: overlap rate (OR) and center location error (CLE). Given the patch P_t corresponding to the estimated position in frame t and the corresponding ground-truth position G_t of that frame, the overlap rate is defined as
$OR = \dfrac{|P_{t} \cap G_{t}|}{|P_{t} \cup G_{t}|}$  (17)
where ∩ and ∪ denote the intersection and union of the regions, respectively, and |·| denotes the number of pixels in a region. The overlap-rate curve is drawn for thresholds from 0 to 1, and a point on the curve represents the proportion of frames whose overlap rate exceeds the threshold. Clearly, the higher the overlap-rate curve, the better the algorithm performs.
CLE represents the deviation between the estimated center and the ground-truth center under the Euclidean distance:
$CLE = \sqrt{(Px_{t} - Gx_{t})^{2} + (Py_{t} - Gy_{t})^{2}}$  (18)
where (Px_t, Py_t) and (Gx_t, Gy_t) denote the center of the estimated position and the center of the ground-truth position, respectively. Likewise, the smaller the center location error, the better the algorithm performs.
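Both evaluation indexes can be written in a few lines of Python, assuming axis-aligned boxes given as (x, y, w, h):

```python
import math

def overlap_rate(box_p, box_g):
    """OR = |P ∩ G| / |P ∪ G| for boxes (x, y, w, h), eq. (17)."""
    x1 = max(box_p[0], box_g[0]); y1 = max(box_p[1], box_g[1])
    x2 = min(box_p[0] + box_p[2], box_g[0] + box_g[2])
    y2 = min(box_p[1] + box_p[3], box_g[1] + box_g[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = box_p[2] * box_p[3] + box_g[2] * box_g[3] - inter
    return inter / union if union > 0 else 0.0

def center_location_error(box_p, box_g):
    """CLE = Euclidean distance between the box centers, eq. (18)."""
    pcx, pcy = box_p[0] + box_p[2] / 2, box_p[1] + box_p[3] / 2
    gcx, gcy = box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2
    return math.hypot(pcx - gcx, pcy - gcy)

# Example frame: estimated box vs. ground-truth box.
print(overlap_rate((10, 10, 50, 50), (20, 15, 50, 50)))           # ~0.56
print(center_location_error((10, 10, 50, 50), (20, 15, 50, 50)))  # ~11.18
```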
Here the tracking results on some challenging sequences are given. Throughout the experiments, the parameters of all tracking algorithms are fixed, to match the requirement that an algorithm should cope well with all situations. Fig. 2 shows the overlap rate (OR) curves for part of the OTB sequences; the abscissa corresponds to thresholds from 0% to 100%, and the curve gives the proportion of frames of the whole video sequence whose overlap rate exceeds the threshold, so the more slowly a curve declines, the better the corresponding algorithm performs. Considering that tracking has failed when there is no overlapping region, the value at a threshold of 1% can be regarded as the success rate of the algorithm. Fig. 3 shows the center location error (CLE) for part of the OTB video sequences. Similar to Fig. 2, the CLE curve is drawn for thresholds from 1 to 50, and a point on the curve represents the proportion of frames of the whole video sequence whose center deviation is below the threshold; the higher the proportion, the better the algorithm performs. The value at a center-deviation threshold of 20 is usually used for analysis.
Fast motion (FM) and motion blur (MB) are common challenges in real video sequences, so how an algorithm handles them is an important indicator in practical evaluation. The OTB benchmark contains many video sequences with fast motion and motion blur, such as BlurCar1, BlurFace, Car11, Deer, Girl2, Human9 and Soccer. These video sequences are used to compare the proposed algorithm with the other algorithms, and the corresponding OR and CLE results are shown in Figs. 2 and 3, respectively. It can be seen that the proposed algorithm outperforms the other algorithms on most sequences containing FM and MB. Although some algorithms (such as CCT) perform well on tracking tasks that can be regarded as approximately static, they show obvious defects when handling fast-moving targets and motion blur.
Fig. 2 is a schematic diagram of the overlap-rate curves; the curves are drawn for thresholds from 1% to 100%, the horizontal axis is the threshold, and the vertical axis is the proportion of video frames whose overlap rate exceeds the threshold.
Fig. 3 is a schematic diagram of the center location error; the curves are drawn for thresholds from 1 to 50, where the threshold corresponds to the Euclidean distance between the centers of the estimated position and the ground truth, and the vertical axis represents the proportion of video frames whose deviation is below the threshold.
Besides FM and MB, deformation of structured objects (DEF), occlusion (OCC), in-plane rotation (IPR) and out-of-plane rotation (OPR) are also significant challenges in video tracking. Video sequences containing these factors include Bolt2, Coupon, David3, Gym and Trellis; in addition, David3 and Trellis also contain background blur, illumination variation and scale variation. The experimental results on these video sequences likewise demonstrate the superiority of the proposed tracking algorithm.
Fig. 4 marks the tracking results for the last frame of the above video sequences for a more intuitive comparison. The tracking results of the algorithms are marked in 8 different colors. It can be seen that the proposed algorithm performs better than the other algorithms, and the position of the tracked target is closer to the real target.

Claims (3)

1. A self-adaptive visual tracking algorithm based on online metric learning, characterized by comprising the following:
Step 1: Describe the target with SIFT features. Feature extraction uses the dSIFT fast algorithm from the VLFeat feature-extraction library; for a patch of 50×50 pixels, the corresponding SIFT feature dimension is 128×9 = 1152, which is a very large computational load for an online tracking algorithm; therefore, randomized principal component analysis (RPCA) is used to reduce the dimensionality of the extracted SIFT features: given a feature matrix X, the basic idea of feature dimensionality reduction is to find a mapping matrix that projects the original n-dimensional feature space onto a new k-dimensional feature space, thereby achieving the dimensionality-reduction goal; once the feature matrix and the target dimension are known, RPCA defines an oversampling dimension and a random projection matrix Ω and applies the factorization below to the newly computed matrix Y = XΩ:
$QR = Y,\qquad B := Q^{T}X$  (19)
where B is an intermediate matrix; B is then decomposed by SVD:
$B = \tilde{U}\Sigma V^{T}$  (20)
An approximation of X can thus be obtained by the following formula:
$X \approx QQ^{T}X = Q(\tilde{U}\Sigma V^{T}) := U\Sigma V^{T}$  (21)
Finally, the projected feature matrix $X_{proj}$ is obtained via $X_{proj} = XV_{k}$, where $V_{k}$ is formed by keeping the first k columns of V; for a 1152-dimensional SIFT feature, the feature dimension after reduction is about one third of the original, i.e., 384 dimensions;
Step 2: Construct and train the classifier with supervised learning. Given two samples x and y, the purpose of distance metric learning is to adjust the original feature space and change the relative spatial relations between the training-set samples, so that distances between samples of the same class shrink as far as possible while distances between samples of different classes grow as far as possible; under this condition the distance can be written as
$d_{G}(x, y) = (x - y)^{T} G (x - y)$  (22)
where G is the learned distance metric matrix; using the K-L divergence as the index for measuring similarity, the above constraints can be written as
$\min_{G}\ KL\left(p(x; G_{0})\,\|\,p(x; G)\right)$
$\text{s.t.}\ \ d_{G}(x, y) \le l,\ \text{if } label(x) = label(y)$
$\phantom{\text{s.t.}\ \ } d_{G}(x, y) \ge u,\ \text{if } label(x) \ne label(y)$  (23)
where l and u are two distance thresholds;
The above divergence problem is solved with the LogDet method; because the number of samples obtainable in one frame is very limited, the values of all parameters cannot be obtained, so a "boot-strap" (bootstrap) method is used in the algorithm to construct the training set; in practice, the training set contains two classes of samples: a positive sample set representing the target and a negative sample set representing the background information;
After an initial distance metric matrix satisfying the constraints has been obtained, the tracking algorithm updates the distance metric in every subsequent frame; given two patches u_t and v_t extracted in frame t, the distance between them is $d_{G_{t}}(u_{t}, v_{t})$; if the predicted distance is y_t, the new distance metric G_{t+1} can be obtained by solving
$G_{t+1} = \underset{G \succ 0}{\arg\min}\ D(G, G_{t}) + \eta\, L\left(d_{G}(u_{t}, v_{t}), y_{t}\right)$  (24)
where D is a regularization function, η is a regularization parameter, and L is the loss function between the target distance and the estimated distance; let z_t = u_t − v_t; then the solution of this minimization problem is
$G_{t+1} = G_{t} - \dfrac{\eta(\bar{y} - y_{t})\,G_{t}z_{t}z_{t}^{T}G_{t}}{1 + \eta(\bar{y} - y_{t})\,z_{t}^{T}G_{t}z_{t}}$  (25)
The goal of distance metric learning is
$d_{G_{t+1}}(\hat{I}(x;t),\,M) \ll d_{G_{t}}(\hat{I}(x;t),\,M)$
$d_{G_{t+1}}(\hat{I}(x;t),\,\hat{J}(x_{j};t)) \gg d_{G_{t}}(\hat{I}(x;t),\,\hat{J}(x_{j};t))$  (26)
where Î(x;t) and Ĵ(x_j;t) denote, respectively, the target and background samples in the sample library at frame t.
2. The self-adaptive visual tracking algorithm based on online metric learning according to claim 1, characterized in that it further comprises adaptive template selection and update, in which the training templates are divided into two classes according to the update mode used in the algorithm: fast-update templates and robust-update templates; the former are used to adapt to target deformation in time, while the latter are used to prevent the tracking result from drifting; the fast-update templates are extracted from each frame according to a search template, whose design guarantees both the algorithm's efficiency in handling background information and an accurate description of the target; the robust-update templates are stored in the template library, whose size is fixed for a given video sequence, and the initial templates are extracted around the target position marked by the user in the first frame.
3. The self-adaptive visual tracking algorithm based on online metric learning according to claim 2, characterized in that, in the fast-update template extraction pattern, each asterisk (*) represents the center position of an extracted sample patch; with the current target center as the origin, patches extracted within 2 pixels of it are positive samples, and the remaining patches are negative samples; after the algorithm estimates the target position in the current frame, 9 patches of the target size are extracted by the tracking algorithm from the region within 2 pixels around that position; suppose I(x;t) denotes the patch extracted from frame t whose distance to the positive sample set T_pos of the training set T is smallest, and let Î(x;t) denote the SIFT feature extraction result of that patch; if the average distance between Î(x;t) and T_pos is smaller than a threshold, the template library update task is to compute the distance between Î(x;t) and each template in the current template library M_t = {m_1, ..., m_k}, and to compare them with the distances between the target and the templates already in the library; if the distance corresponding to the new template is larger than the distance corresponding to at least one template in M_t, the template with the smallest corresponding distance is replaced by the new template I(x;t); the reason is that if the distance between the new template and the positive sample set is below the threshold, the template can be regarded as a positive sample; meanwhile, if its distance to the templates in the current template library is larger, it is considered to carry more new information, information that the templates in the current library do not provide; therefore, the new template replaces the old template that is most similar to the other templates in the template library M_t.
CN201710455281.3A 2017-06-16 2017-06-16 Self-adaptive visual track algorithm based on online metric learning Active CN107341817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710455281.3A CN107341817B (en) 2017-06-16 2017-06-16 Self-adaptive visual track algorithm based on online metric learning

Publications (2)

Publication Number Publication Date
CN107341817A true CN107341817A (en) 2017-11-10
CN107341817B CN107341817B (en) 2019-05-21

Family

ID=60220727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710455281.3A Active CN107341817B (en) 2017-06-16 2017-06-16 Self-adaptive visual track algorithm based on online metric learning

Country Status (1)

Country Link
CN (1) CN107341817B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778415A (en) * 2014-01-21 2014-05-07 蔺全录 Mine personnel iris checking-in and tracking and positioning method and system
CN104616324A (en) * 2015-03-06 2015-05-13 厦门大学 Target tracking method based on adaptive appearance model and point-set distance metric learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHUQIAO SUN: "Self-Adaptive Visual Tracker Based on Background Information", Harbin: 2016 The Sixth International Conference on Instrumentation & Measurement, Computer, Communication and Control *
贾桂敏: "Target tracking algorithm based on adaptive template update under complex background", Acta Optica Sinica *
赵永威: "Distance metric learning method based on feature grouping and eigenvalue optimization", Journal of Data Acquisition and Processing *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934849A (en) * 2019-03-08 2019-06-25 西北工业大学 Online multi-object tracking method based on track metric learning
CN111854728A (en) * 2020-05-20 2020-10-30 哈尔滨工程大学 Fault-tolerant filtering method based on generalized relative entropy
CN111854728B (en) * 2020-05-20 2022-12-13 哈尔滨工程大学 Fault-tolerant filtering method based on generalized relative entropy
CN112037255A (en) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 Target tracking method and device

Also Published As

Publication number Publication date
CN107341817B (en) 2019-05-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant