CN112233140A - SSVM tracking method based on DIOU loss and smoothness constraint - Google Patents

SSVM tracking method based on DIOU loss and smoothness constraint

Info

Publication number
CN112233140A
CN112233140A (application number CN202010755733.1A)
Authority
CN
China
Prior art keywords
target
tracking
representing
formula
structured
Prior art date
Legal status
Granted
Application number
CN202010755733.1A
Other languages
Chinese (zh)
Other versions
CN112233140B (en)
Inventor
袁广林
孙子文
李从利
秦晓燕
韩裕生
陈萍
李豪
琚长瑞
Current Assignee
PLA Army Academy of Artillery and Air Defense
Original Assignee
PLA Army Academy of Artillery and Air Defense
Priority date
Filing date
Publication date
Application filed by PLA Army Academy of Artillery and Air Defense filed Critical PLA Army Academy of Artillery and Air Defense
Priority to CN202010755733.1A priority Critical patent/CN112233140B/en
Publication of CN112233140A publication Critical patent/CN112233140A/en
Application granted granted Critical
Publication of CN112233140B publication Critical patent/CN112233140B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an SSVM tracking method based on DIOU loss and smoothness constraint. The method comprises the following steps: establishing a structured SVM model based on DIOU loss and smoothness constraint, and converting it into a dual problem according to the Lagrange multiplier method; solving the structured SVM model based on DIOU loss and smoothness constraint with the dual coordinate descent principle, and estimating the state of the target; and evaluating the position of the tracked target with a multi-scale target tracking method, selecting the structured output with the maximum response as the tracking result. The method overcomes the problems of an inaccurate loss function and model drift, and effectively improves the accuracy and success rate of structured-SVM-based target tracking.

Description

SSVM tracking method based on DIOU loss and smoothness constraint
Technical Field
The invention relates to the technical field of computer vision target tracking, and in particular to an SSVM tracking method based on DIOU loss and smoothness constraint.
Background
Object tracking is an important research topic in computer vision; its goal is to estimate the state of a target from an image sequence. Target tracking has important applications in civilian fields such as video surveillance, vehicle navigation, human-computer interaction and intelligent transportation, as well as in military fields such as visual guidance, target localization and fire control. Although target tracking has advanced greatly in recent years, it still faces many challenges, such as complex backgrounds, illumination changes and target occlusion, so it remains an active topic in computer vision.
Target tracking methods are divided into generative tracking and discriminative tracking. Representative generative methods include IVT tracking [Ross D A, Lim J, Lin R S, et al. Incremental Learning for Robust Visual Tracking [J]. International Journal of Computer Vision, 2008, 77(1-3): 125-141.], L1 tracking [Mei X, Ling H. Robust Visual Tracking using L1 Minimization [A]. Proceedings of IEEE International Conference on Computer Vision [C]. Kyoto, Japan: IEEE Computer Society Press, 2009, 1436.], and correlation filter tracking based on multi-stage learning [Chinese-language journal reference, 2017, 45(10)]. Representative discriminative methods include MIL tracking [Babenko B, Yang M H, Belongie S. Robust object tracking with online multiple instance learning [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8): 1619-1632.], TLD tracking [Kalal Z, Mikolajczyk K, Matas J. Tracking-learning-detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7).], random forest tracking [Saffari A, Leistner C, et al. On-line Random Forests, 2009.], and SVM-based tracking. In 2004, Avidan first proposed an SVM-based target tracking method [Avidan S. Support vector tracking [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(8): 1064.]. Inspired by the application of the structured SVM to object detection, in 2011 Hare et al. first proposed a structured-SVM-based target tracking method (Struck) at ICCV [Hare S, Saffari A, Torr P H S. Struck: structured output tracking with kernels [A]. Proceedings of IEEE International Conference on Computer Vision [C]. Barcelona, Spain: IEEE Computer Society Press, 2011, 263-270.], extended to a journal version [Hare S, Saffari A, Torr P H S. Struck: structured output tracking with kernels [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.]. Struck treats target tracking as a structured output problem, avoids the intermediate classification step of traditional discriminative tracking, and markedly improves tracking performance. To adapt to target changes without losing the temporal context of the target, Yao et al. [Yao R, Shi Q, Shen C, et al. Robust Tracking with Weighted Online Structured Learning [A]. Proceedings of the European Conference on Computer Vision [C]. Berlin Heidelberg: Springer, 2012, Part III: 158.] proposed weighted online structured learning for tracking. To improve the tracking of occluded and deformed targets, Yao et al. [Yao R, Shi Q, Shen C, et al. Part-based Visual Tracking with Online Latent Structural Learning [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Portland, OR, USA: IEEE Computer Society Press, 2013, 2363.] proposed part-based tracking with online latent structural learning. To address the problem of model drift in target tracking, Bai and Tang [Bai Y, Tang M. Robust Tracking via Weakly Supervised Ranking SVM [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Providence, RI, USA: IEEE Computer Society Press, 2012, 1854-1861.] proposed an online Laplacian ranking SVM tracker in 2012. Also to cope with model drift, Zhang et al. proposed MEEM tracking in 2014 [Zhang J, Ma S, Sclaroff S. MEEM: Robust tracking via multiple experts using entropy minimization [A]. Proceedings of European Conference on Computer Vision [C]. Zurich, Switzerland: Springer, 2014, 188.].
In 2015, Hong et al. [Hong S, You T, Kwak S, et al. Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network [A]. Proceedings of the International Conference on Machine Learning [C]. Lille, France: 2015, 597-606.] proposed using an online SVM to guide the back-propagation of a specific target's CNN features to the input layer, and then to build a target-specific saliency map for tracking. Ning et al. [Ning J F, Yang J M, Jiang S J, et al. Object tracking via dual linear structured SVM and explicit feature map [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Las Vegas: IEEE Computer Society Press, 2016, 4266.] proposed dual linear structured SVM (DLSSVM) tracking with an explicit feature map. To increase tracking speed, in 2017 Wang et al. [Wang M, Liu Y, Huang Z. Large Margin Object Tracking with Circulant Feature Maps [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Honolulu, HI, USA: IEEE Computer Society Press, 2017, 4800.] proposed LMCF tracking, which accelerates large-margin structured tracking with circulant feature maps. In 2018, Ji et al. [Ji Z, Feng K, Qian Y. Part-based Visual Tracking via Structural Support Correlation Filter [J]. Computing Research Repository, 2018, abs/1805.09971.] adopted an idea similar to LMCF to accelerate structured SVM tracking based on target parts, again with the aim of improving tracking speed. To further improve the performance of SVM-based target tracking, in 2019 Zuo et al. [Zuo W, Wu X, Lin L, et al. Learning Support Correlation Filters for Visual Tracking [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(5): 1158.] proposed learning support correlation filters, which combine the SVM with the correlation filter for visual tracking.
In summary, structured-SVM-based target tracking offers good performance and has therefore received wide attention, but existing methods have two problems. On the one hand, previous structured SVM trackers use the IOU as the loss function. As shown in FIG. 1(a), when a sample box does not overlap the target box at all, its IOU loss is the same regardless of where it lies. However, different sampling boxes generally lie at different distances from the center of the target box; the IOU loss cannot describe this difference, so its influence on the structured SVM hyperplane is not reflected, which degrades the classifier. On the other hand, to adapt to target appearance changes, existing methods update the structured SVM with the tracking result. As shown in FIG. 1(b), target occlusion, illumination variation, complex background, motion blur, target deformation and similar factors introduce non-target information into the training samples, which shifts the structured SVM hyperplane (model drift) and eventually causes tracking failure.
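To make the first problem concrete, the following short Python sketch (an illustration written for this description rather than code from the patent; the box coordinates and function names are assumptions) compares the plain IOU loss 1 - IOU with the DIOU-based loss 1 - IOU + rho^2(b, bgt)/c^2 for two sample boxes that do not overlap the target box. The IOU loss is identical for both boxes, while the DIOU-based loss grows with the normalized center distance, which is the difference illustrated in FIG. 1(a).

def iou(box_a, box_b):
    # IOU of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def diou_loss(box, gt):
    # DIOU-based loss: 1 - IOU + squared center distance / squared enclosing-box diagonal
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2               # squared center distance
    ex1, ey1 = min(box[0], gt[0]), min(box[1], gt[1])    # smallest enclosing box
    ex2, ey2 = max(box[2], gt[2]), max(box[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2             # squared diagonal length
    return 1.0 - iou(box, gt) + rho2 / c2

gt = (100, 100, 150, 150)      # target box
near = (160, 100, 210, 150)    # disjoint sample box close to the target
far = (300, 100, 350, 150)     # disjoint sample box far from the target
for b in (near, far):
    print(1.0 - iou(b, gt), diou_loss(b, gt))
# The IOU loss is 1.0 for both sample boxes; the DIOU-based loss is larger for the far box.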
Disclosure of Invention
The invention aims to provide an SSVM tracking method based on DIOU loss and smoothness constraint, which effectively solves the problems of inaccurate loss function and model drift in target tracking based on a structured SVM, thereby improving the accuracy and success rate of a target tracking algorithm based on the structured SVM.
The technical solution for realizing the purpose of the invention is as follows: an SSVM tracking method based on DIOU loss and smoothness constraint comprises the following steps:
step 1, establishing a structured SVM model based on DIOU loss and smoothness constraint, and converting the structured SVM model based on DIOU loss and smoothness constraint into a dual problem according to a Lagrange multiplier method;
step 2, solving a structured SVM model based on DIOU loss and smooth constraint by adopting a dual coordinate descent principle, and estimating the state of the target;
and 3, evaluating the position of the tracking target by adopting a multi-scale target tracking method, and selecting the structured output with the maximum response as a tracking result.
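The per-frame control flow implied by these three steps can be outlined as follows (a minimal sketch written for this description; the callables evaluate_scales and update_model are assumed helpers passed in by the caller, not names defined by the invention):

def track_sequence(frames, init_box, model, evaluate_scales, update_model):
    # Per-frame outline of the proposed tracking loop (illustrative only).
    box = init_box
    results = [box]
    for frame in frames[1:]:
        # steps 2-3: score candidate boxes at several scales and keep the
        # structured output with the maximum response
        box = evaluate_scales(model, frame, box)
        # step 1 (model update): add the new sample and re-optimize the
        # DIOU-loss, smoothness-constrained structured SVM by dual coordinate descent
        update_model(model, frame, box)
        results.append(box)
    return results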
Further, step 1 of establishing the structured SVM model based on DIOU loss and smoothness constraint and converting it into a dual problem according to the Lagrange multiplier method is specifically as follows:
step 1.1, in structured-SVM-based target tracking, Y is the space of rectangular boxes and any element of Y is written (x', y', w, h), where (x', y') is the center position of the rectangular box and w and h are its width and height; the training data are assumed to be {(x_i, y_i)}, i = 1, 2, …, N, where N is the total number of training samples, i indexes the samples, x_i is the i-th training image and y_i is the target rectangle of the i-th training sample; a structured SVM model based on DIOU loss and smoothness constraint (DCSSVM) is established and described as follows:

min_{w,ξ} (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i                    (1)

s.t. ⟨w, Ψ_i(y)⟩ ≥ L(y_i, y) - ξ_i,  ∀i, ∀y ∈ Y, y ≠ y_i

ξ_i ≥ 0,  ∀i

wherein w is the normal vector of the structured SVM at time t, w_{t-1} is the normal vector of the structured SVM at time t-1, and λ is the smoothness-constraint coefficient; Ψ_i(y) = Φ(x_i, y_i) - Φ(x_i, y), i.e. Ψ_i(y) is the difference between the feature vector Φ(x_i, y_i) of the training rectangle y_i in image x_i and the feature vector Φ(x_i, y) of the predicted rectangle y in image x_i; C is a regularization parameter and ξ_i is a slack variable; L(y_i, y) is the loss function measuring the structural error of the predicted output rectangle y:

L(y_i, y) = 1 - IOU(B, B_gt) + ρ^2(b, b_gt) / c^2                                      (2)

IOU(B, B_gt) = |B ∩ B_gt| / |B ∪ B_gt|

where b and b_gt are the center points of B and B_gt respectively; B_gt = (x_gt, y_gt, w_gt, h_gt) is the position of the target box, with (x_gt, y_gt) its center position and w_gt and h_gt its width and height; B = (x, y, w, h) is the position of the predicted box, with (x, y) its center position and w and h its width and height; ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest enclosing box covering the two boxes;
step 1.2, to solve formula (1), its dual problem is obtained with the Lagrange multiplier method; to this end, Lagrange multipliers α_i^y and β_i are introduced, which satisfy the following conditions:

α_i^y ≥ 0,  β_i ≥ 0,  ∀i, ∀y ≠ y_i
The Lagrangian function of formula (1) is as follows:

L(w, ξ, α, β) = (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i - Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y [⟨w, Ψ_i(y)⟩ - L(y_i, y) + ξ_i] - Σ_{i=1}^{N} β_i ξ_i        (3)

where ξ denotes the slack variables and α and β are the introduced Lagrange multipliers; on the right-hand side, ξ_i denotes the slack variable of the i-th training sample, and α_i^y and β_i denote the Lagrange multipliers associated with the i-th training sample;
The partial derivatives of the Lagrangian L(w, ξ, α, β) with respect to w and ξ_i are computed and set to 0:

∂L/∂w = 0  ⇒  w = (1/(1+λ)) (λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y))                    (4)

∂L/∂ξ_i = 0  ⇒  C - Σ_{y≠y_i} α_i^y - β_i = 0                                                    (5)
Substituting formulas (4) and (5) into formula (3) and eliminating w, β and ξ from L(w, ξ, α, β) yields the dual problem of formula (1), given in formulas (6a) to (6c):

max_α Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y L(y_i, y) - (1/(2(1+λ))) ||λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y)||^2        (6a)

s.t. α_i^y ≥ 0,  ∀i, ∀y ≠ y_i                                                                                          (6b)

Σ_{y≠y_i} α_i^y ≤ C,  ∀i                                                                                               (6c)
further, the step 2 of solving the structured SVM model based on the DIOU loss and the smooth constraint by using the dual coordinate descent principle estimates the state of the target as follows:
step 2.1, at each iteration the dual coordinate descent optimization algorithm selects a training sample k from the training set using formula (7), and then updates the dual scalar α_k^{y*} using formula (8):

(k, y*) = argmax_{i, y≠y_i} [ L(y_i, y) - ⟨w, Ψ_i(y)⟩ ]                    (7)

α_k^{y*(new)} = α_k^{y*(old)} + δα_k^{y*}                                   (8)

where α_k^{y*(old)} denotes the dual scalar before the update, α_k^{y*(new)} denotes the dual scalar after the update, and δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)};
To obtain δα_k^{y*}, formula (8) is first substituted into formula (6a), and formula (6a) is then rewritten as a function of δα_k^{y*}, which yields formula (9):

f(δα_k^{y*}) = δα_k^{y*} L(y_k, y*) - δα_k^{y*} ⟨w, Ψ_k(y*)⟩ - (δα_k^{y*})^2 ||Ψ_k(y*)||^2 / (2(1+λ)) + c        (9)

where c is a constant independent of the increment δα_k^{y*}; taking the derivative of formula (9) with respect to δα_k^{y*} and setting it to 0 gives:

δα_k^{y*} = (1+λ) [ L(y_k, y*) - ⟨w, Ψ_k(y*)⟩ ] / ||Ψ_k(y*)||^2                                                  (10)

According to the constraint conditions of formulas (6b) and (6c), the admissible range of δα_k^{y*} is:

-α_k^{y*} ≤ δα_k^{y*} ≤ C - Σ_{y≠y_k} α_k^y                                                                       (11)

where α_k^y denotes a dual scalar of the k-th training sample and Σ_{y≠y_k} α_k^y denotes the accumulated sum of the dual scalars of the k-th training sample;
step 2.2, after δα_k^{y*} is obtained from formulas (10) and (11), the update equation of w is obtained by combining formula (4), as shown in formula (12):

w^{(new)} = w^{(old)} + (δα_k^{y*} / (1+λ)) Ψ_k(y*)                    (12)

where w^{(old)} denotes the classification hyperplane normal vector before the update and w^{(new)} denotes it after the update; δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)}; λ is the smoothness-constraint coefficient; Ψ_k(y*) = Φ(x_k, y_k) - Φ(x_k, y*), i.e. Ψ_k(y*) is the difference between the feature vector Φ(x_k, y_k) of the training rectangle y_k in image x_k and the feature vector Φ(x_k, y*) of the predicted rectangle y* in image x_k;
and step 2.3, the score of each candidate sample is computed with an inner product operation, and the state of the target is estimated according to the maximum-score criterion of formula (13):

y* = argmax_{y∈Y} ⟨w, Φ(x_t, y)⟩                                        (13)

The structured output y* with the maximum response is the position of the target, and Y denotes the set of all candidate structured output rectangles; Ψ_t(y) = Φ(x_t, y_t) - Φ(x_t, y), i.e. Ψ_t(y) is the difference between the feature vector Φ(x_t, y_t) of the training rectangle y_t in image x_t and the feature vector Φ(x_t, y) of the predicted rectangle y in image x_t.
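A minimal sketch of this maximum-score estimation is given below (illustrative only; the feature function phi, the candidate set and the NumPy-based implementation are assumptions beyond what the text specifies):

import numpy as np

def score_candidates(w, phi, frame, candidates):
    # Step 2.3: score each candidate rectangle y by the inner product <w, phi(frame, y)>
    # and return the structured output with the maximum response together with its score.
    scores = [float(np.dot(w, phi(frame, y))) for y in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]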
Further, the step 3 of estimating the position of the tracking target by using the multi-scale target tracking method, and selecting the structured output with the maximum response as the tracking result specifically comprises the following steps:
During target tracking, a conservative scale pool S = {1, 0.995, 1.005} is used: candidates are evaluated at the three scales and the maximum response is selected as the tracking result; as target features, the Lab color features and the local rank transform (LRT) features of the target are selected.
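A sketch of this multi-scale evaluation with the conservative scale pool S = {1, 0.995, 1.005} follows (illustrative; scale_box, sample_around and the (cx, cy, w, h) box parameterization are assumed helpers, the Lab color / local rank transform feature extraction is not shown, and score_candidates refers to the sketch above):

def scale_box(box, s):
    # Scale a box (cx, cy, w, h) about its center by factor s.
    cx, cy, w, h = box
    return (cx, cy, w * s, h * s)

def evaluate_scales(w, phi, frame, prev_box, sample_around, scales=(1.0, 0.995, 1.005)):
    # Step 3: evaluate candidate rectangles at three scales and keep the maximum response.
    best_box, best_score = None, float("-inf")
    for s in scales:
        candidates = sample_around(scale_box(prev_box, s))   # candidate boxes at this scale
        box, score = score_candidates(w, phi, frame, candidates)
        if score > best_score:
            best_box, best_score = box, score
    return best_box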
Compared with the prior art, the invention has the following notable advantages: (1) it solves the problems of an inaccurate loss function and model drift in existing structured-SVM-based target tracking methods, improving the accuracy and success rate of structured-SVM-based tracking; (2) it performs better on challenging videos involving background clutter, fast motion, illumination change, motion blur, target deformation and target occlusion, improving overall tracking performance.
Drawings
Fig. 1 is a schematic diagram of loss function inaccuracy and model drift problems existing in a conventional target tracking method based on a structured SVM, wherein (a) is a schematic diagram of a sampling box, and (b) is a schematic diagram of online learning of the structured SVM.
FIG. 2 is a flow chart of the SSVM tracking method based on DIOU loss and smoothness constraint of the present invention.
Fig. 3 is a schematic diagram of the OPE performance evaluation results of the method of the present invention and 3 other high-performance algorithms on the OTB100 dataset in the embodiment of the present invention, where (a) is the precision plot (mean center error) and (b) is the success plot (overlap ratio).
Fig. 4 is a key frame screenshot of 5 challenging videos using the method of the present invention and other 3 high performance tracking algorithms in an embodiment of the present invention, where (a) is a toy video screenshot, (b) is a pedestrian video screenshot, (c) is a walking video screenshot, (d) is a face video screenshot, and (e) is a singer 2 video screenshot.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
With reference to FIG. 2, the SSVM tracking method based on DIOU loss and smoothness constraint of the present invention includes the following steps:
step 1, establishing a structured SVM model based on DIOU loss and smoothness constraint, and converting the structured SVM model based on DIOU loss and smoothness constraint into a dual problem according to a Lagrange multiplier method, which is specifically as follows:
step 1.1, in structured-SVM-based target tracking, Y is the space of rectangular boxes and any element of Y is written (x', y', w, h), where (x', y') is the center position of the rectangular box and w and h are its width and height; the training data are assumed to be {(x_i, y_i)}, i = 1, 2, …, N, where N is the total number of training samples, i indexes the samples, x_i is the i-th training image and y_i is the target rectangle of the i-th training sample; an SSVM model based on DIOU loss and smoothness constraints (Structured SVM Based on DIOU Loss and Smoothness Constraints, DCSSVM) is established and described as follows:

min_{w,ξ} (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i                    (1)

s.t. ⟨w, Ψ_i(y)⟩ ≥ L(y_i, y) - ξ_i,  ∀i, ∀y ∈ Y, y ≠ y_i

ξ_i ≥ 0,  ∀i

wherein w is the normal vector of the structured SVM at time t, w_{t-1} is the normal vector of the structured SVM at time t-1, and λ is the smoothness-constraint coefficient; Ψ_i(y) = Φ(x_i, y_i) - Φ(x_i, y), i.e. Ψ_i(y) is the difference between the feature vector Φ(x_i, y_i) of the training rectangle y_i in image x_i and the feature vector Φ(x_i, y) of the predicted rectangle y in image x_i; C is a regularization parameter and ξ_i is a slack variable; L(y_i, y) is the loss function measuring the structural error of the predicted output rectangle y:

L(y_i, y) = 1 - IOU(B, B_gt) + ρ^2(b, b_gt) / c^2                                      (2)

IOU(B, B_gt) = |B ∩ B_gt| / |B ∪ B_gt|

where b and b_gt are the center points of B and B_gt respectively; B_gt = (x_gt, y_gt, w_gt, h_gt) is the position of the target box, with (x_gt, y_gt) its center position and w_gt and h_gt its width and height; B = (x, y, w, h) is the position of the predicted box, with (x, y) its center position and w and h its width and height; ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest enclosing box covering the two boxes;
Step 1.2, to solve formula (1), its dual problem is obtained with the Lagrange multiplier method; to this end, Lagrange multipliers α_i^y and β_i are introduced, which satisfy the following conditions:

α_i^y ≥ 0,  β_i ≥ 0,  ∀i, ∀y ≠ y_i
The Lagrangian function of formula (1) is as follows:

L(w, ξ, α, β) = (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i - Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y [⟨w, Ψ_i(y)⟩ - L(y_i, y) + ξ_i] - Σ_{i=1}^{N} β_i ξ_i        (3)

where ξ denotes the slack variables and α and β are the introduced Lagrange multipliers; on the right-hand side, ξ_i denotes the slack variable of the i-th training sample, and α_i^y and β_i denote the Lagrange multipliers associated with the i-th training sample;
The partial derivatives of the Lagrangian L(w, ξ, α, β) with respect to w and ξ_i are computed and set to 0:

∂L/∂w = 0  ⇒  w = (1/(1+λ)) (λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y))                    (4)

∂L/∂ξ_i = 0  ⇒  C - Σ_{y≠y_i} α_i^y - β_i = 0                                                    (5)
Substituting formulas (4) and (5) into formula (3) and eliminating w, β and ξ from L(w, ξ, α, β) yields the dual problem of formula (1), given in formulas (6a) to (6c):

max_α Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y L(y_i, y) - (1/(2(1+λ))) ||λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y)||^2        (6a)

s.t. α_i^y ≥ 0,  ∀i, ∀y ≠ y_i                                                                                          (6b)

Σ_{y≠y_i} α_i^y ≤ C,  ∀i                                                                                               (6c)
Step 2, the structured SVM model based on DIOU loss and smoothness constraint is solved with the dual coordinate descent principle and the state of the target is estimated, specifically as follows:
step 2.1, at each iteration the dual coordinate descent optimization algorithm selects a training sample k from the training set using formula (7), and then updates the dual scalar α_k^{y*} using formula (8):

(k, y*) = argmax_{i, y≠y_i} [ L(y_i, y) - ⟨w, Ψ_i(y)⟩ ]                    (7)

α_k^{y*(new)} = α_k^{y*(old)} + δα_k^{y*}                                   (8)

where α_k^{y*(old)} denotes the dual scalar before the update, α_k^{y*(new)} denotes the dual scalar after the update, and δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)};
The key to the solution is how to obtain the increment δα_k^{y*} in formula (8), i.e. the change from α_k^{y*(old)} to α_k^{y*(new)}. To this end, formula (8) is first substituted into formula (6a), and formula (6a) is then rewritten as a function of δα_k^{y*}, which yields formula (9):

f(δα_k^{y*}) = δα_k^{y*} L(y_k, y*) - δα_k^{y*} ⟨w, Ψ_k(y*)⟩ - (δα_k^{y*})^2 ||Ψ_k(y*)||^2 / (2(1+λ)) + c        (9)

where c is a constant independent of the increment δα_k^{y*}; taking the derivative of formula (9) with respect to δα_k^{y*} and setting it to 0 gives:

δα_k^{y*} = (1+λ) [ L(y_k, y*) - ⟨w, Ψ_k(y*)⟩ ] / ||Ψ_k(y*)||^2                                                  (10)

Based on the constraint conditions of formulas (6b) and (6c), the admissible range of δα_k^{y*} is:

-α_k^{y*} ≤ δα_k^{y*} ≤ C - Σ_{y≠y_k} α_k^y                                                                       (11)

where α_k^y denotes a dual scalar of the k-th training sample and Σ_{y≠y_k} α_k^y denotes the accumulated sum of the dual scalars of the k-th training sample;
step 2.2, after δα_k^{y*} is obtained from formulas (10) and (11), the update equation of w is obtained by combining formula (4), as shown in formula (12):

w^{(new)} = w^{(old)} + (δα_k^{y*} / (1+λ)) Ψ_k(y*)                    (12)

where w^{(old)} denotes the classification hyperplane normal vector before the update and w^{(new)} denotes it after the update; δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)}; λ is the smoothness-constraint coefficient; Ψ_k(y*) = Φ(x_k, y_k) - Φ(x_k, y*), i.e. Ψ_k(y*) is the difference between the feature vector Φ(x_k, y_k) of the training rectangle y_k in image x_k and the feature vector Φ(x_k, y*) of the predicted rectangle y* in image x_k.
In structured-SVM-based target tracking, the number of support vectors in the structured SVM keeps growing over time; to keep target tracking efficient, the number of support vectors must be bounded. For this purpose, when the number of support vectors in the structured SVM exceeds the budget, the support vector to delete is selected according to formula (13); the structured SVM with DIOU loss and smoothness constraint provided by the invention adopts this strategy:

α* = argmin_{α_i^y ∈ α} || α_i^y Ψ_i(x_i, y) ||^2                    (13)

where α* denotes the dual scalar of the minimum-norm support vector, α_i^y denotes the dual scalar corresponding to training sample (x_i, y_i), α denotes the set of all dual scalars, and Ψ_i(x_i, y) is the support vector of sample x_i.
In summary, the DCSSVM learning algorithm proposed by the present invention is shown as algorithm 1.
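The listing of Algorithm 1 appears only as an image in the source text. As a hedged reconstruction of a single learning step under the formulas above (the regularization constant C, the smoothness coefficient lam, the small tolerance added to the denominator and the helper signatures are assumptions rather than values taken from the patent), one dual coordinate descent update could look like the following sketch:

import numpy as np

def dcssvm_step(w, alphas, samples, L, psi, C=100.0, lam=1.0):
    # One dual coordinate descent step of the DCSSVM learner (illustrative sketch).
    #   samples : list of (x_i, y_i, candidate_boxes) triples
    #   alphas  : dict mapping (i, y) -> dual scalar alpha_i^y
    #   L       : DIOU-based structured loss L(y_i, y)
    #   psi     : joint feature difference Psi_i(y) = Phi(x_i, y_i) - Phi(x_i, y)
    # Pick the most violated sample/box pair (formula (7) as reconstructed above).
    best, best_viol = None, -np.inf
    for i, (x_i, y_i, cands) in enumerate(samples):
        for y in cands:
            viol = L(y_i, y) - float(np.dot(w, psi(i, y)))
            if viol > best_viol:
                best, best_viol = (i, y), viol
    k, y_star = best
    p = psi(k, y_star)
    # Unconstrained increment (formula (10)) ...
    delta = (1.0 + lam) * best_viol / (float(np.dot(p, p)) + 1e-12)
    # ... clipped to the feasible range implied by (6b)-(6c) (formula (11)).
    used = sum(a for (i, _), a in alphas.items() if i == k)
    delta = max(-alphas.get((k, y_star), 0.0), min(delta, C - used))
    # Update the dual scalar (formula (8)) and the hyperplane normal vector (formula (12)).
    alphas[(k, y_star)] = alphas.get((k, y_star), 0.0) + delta
    w = w + (delta / (1.0 + lam)) * p
    return w, alphas

In a full tracker this step would be iterated over the current training set after each frame, followed by the budget-maintenance rule of formula (13), i.e. removing the minimum-norm support vector whenever the number of support vectors exceeds the budget.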
Example 1
The experimental data for this example come from the OTB100 benchmark dataset published in 2015 [Wu Y, Lim J, Yang M H. Object tracking benchmark [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.]. The benchmark contains 100 videos annotated with 11 challenging attributes: Illumination Variation (IV), Scale Variation (SV), Occlusion (OCC), Deformation (DEF), Motion Blur (MB), Fast Motion (FM), In-Plane Rotation (IPR), Out-of-Plane Rotation (OPR), Out of View (OV), Background Clutter (BC), and Low Resolution (LR). The OTB100 benchmark provides two indexes for evaluating tracking performance: precision and success. The present embodiment is implemented with Matlab (version R2017a) and OpenCV (version 2.4.8). Experiments were run on a DELL XPS 8930 desktop (Intel Core i7-8700K CPU, 16 GB memory).
The proposed method is compared with 4 recent structured-SVM-based trackers, including Struck [Hare S, Saffari A, Torr P H S. Struck: structured output tracking with kernels [A]. Proceedings of IEEE International Conference on Computer Vision [C]. Barcelona, Spain: IEEE Computer Society Press, 2011, 263-270.], DLSSVM [Ning J F, Yang J M, Jiang S J, et al. Object tracking via dual linear structured SVM and explicit feature map [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Las Vegas: IEEE Computer Society Press, 2016, 4266.] and LMCF [Wang M, Liu Y, Huang Z. Large Margin Object Tracking with Circulant Feature Maps [A]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition [C]. Honolulu, HI, USA: IEEE Computer Society Press, 2017, 4800.]. For tracking speed, a long video, Liquor, with 1741 frames was chosen for the evaluation. Table 1 shows the OPE performance and speed comparison of 6 structured SVM tracking methods on the OTB100 dataset.
TABLE 1. OPE performance and speed comparison of 6 structured SVM tracking methods on the OTB100 dataset
Table 2 shows the success scores of the 6 structured SVM tracking methods on videos with different challenging attributes. As can be seen from Table 2, the Scale-DCSSVM tracking method of the present invention performs better on challenging videos involving background clutter, fast motion, illumination change, motion blur, target deformation and target occlusion, which verifies the effectiveness of the proposed tracking method.
TABLE 2. Success comparison of 6 structured SVM tracking methods on videos with different attributes
We evaluated the performance of the proposed tracking algorithm Scale-DCSSVM on the OTB100 dataset together with 3 recent high-performance algorithms: DeepSRDCF, CREST and SiamFC [Bertinetto L, Valmadre J, Henriques J F, Vedaldi A, Torr P H S. Fully-Convolutional Siamese Networks for Object Tracking [C]// Proceedings of European Conference on Computer Vision, Amsterdam, The Netherlands: IEEE Computer Society Press, 2016, 850.]. Fig. 3 shows the OPE evaluation results of the 4 algorithms on the OTB100 dataset, where Fig. 3(a) is the precision plot (mean center error) and Fig. 3(b) is the success plot (overlap ratio). The results show that the proposed algorithm clearly improves both the accuracy and the success rate.
For further comparison, Fig. 4 shows key-frame screenshots of the proposed tracker and the 3 high-performance algorithms SiamFC, CREST and DeepSRDCF on 5 challenging videos, where Fig. 4(a) is a screenshot of the toy video, Fig. 4(b) of the pedestrian video, Fig. 4(c) of the walking video, Fig. 4(d) of the face video, and Fig. 4(e) of the Singer2 video. The main challenge of the toy video is target occlusion; the pedestrian video involves target occlusion and target deformation; the walking video involves target occlusion, motion blur and background clutter; the face video involves illumination change; and the Singer2 video involves background clutter. As shown in Figs. 4(a)-(e), the method of the present invention tracks better on videos with target occlusion, background clutter and illumination change.

Claims (4)

1. An SSVM tracking method based on DIOU loss and smoothness constraint, characterized by comprising the following steps:
step 1, establishing a structured SVM model based on DIOU loss and smoothness constraint, and converting the structured SVM model based on DIOU loss and smoothness constraint into a dual problem according to a Lagrange multiplier method;
step 2, solving a structured SVM model based on DIOU loss and smooth constraint by adopting a dual coordinate descent principle, and estimating the state of the target;
and 3, evaluating the position of the tracking target by adopting a multi-scale target tracking method, and selecting the structured output with the maximum response as a tracking result.
2. The DIOU loss and smoothness constraint-based SSVM tracking method of claim 1, wherein step 1 of establishing a DIOU loss and smoothness constraint-based structured SVM model and transforming it into a dual problem according to the Lagrange multiplier method is specifically as follows:
step 1.1, in structured-SVM-based target tracking, Y is the space of rectangular boxes and any element of Y is written (x', y', w, h), where (x', y') is the center position of the rectangular box and w and h are its width and height; the training data are assumed to be {(x_i, y_i)}, i = 1, 2, …, N, where N is the total number of training samples, i indexes the samples, x_i is the i-th training image and y_i is the target rectangle of the i-th training sample; a structured SVM model based on DIOU loss and smoothness constraint (DCSSVM) is established and described as follows:

min_{w,ξ} (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i                    (1)

s.t. ⟨w, Ψ_i(y)⟩ ≥ L(y_i, y) - ξ_i,  ∀i, ∀y ∈ Y, y ≠ y_i

ξ_i ≥ 0,  ∀i

wherein w is the normal vector of the structured SVM at time t, w_{t-1} is the normal vector of the structured SVM at time t-1, and λ is the smoothness-constraint coefficient; Ψ_i(y) = Φ(x_i, y_i) - Φ(x_i, y), i.e. Ψ_i(y) is the difference between the feature vector Φ(x_i, y_i) of the training rectangle y_i in image x_i and the feature vector Φ(x_i, y) of the predicted rectangle y in image x_i; C is a regularization parameter and ξ_i is a slack variable; L(y_i, y) is the loss function measuring the structural error of the predicted output rectangle y:

L(y_i, y) = 1 - IOU(B, B_gt) + ρ^2(b, b_gt) / c^2                                      (2)

IOU(B, B_gt) = |B ∩ B_gt| / |B ∪ B_gt|

where b and b_gt are the center points of B and B_gt respectively; B_gt = (x_gt, y_gt, w_gt, h_gt) is the position of the target box, with (x_gt, y_gt) its center position and w_gt and h_gt its width and height; B = (x, y, w, h) is the position of the predicted box, with (x, y) its center position and w and h its width and height; ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest enclosing box covering the two boxes;
step 1.2, to solve formula (1), its dual problem is obtained with the Lagrange multiplier method; to this end, Lagrange multipliers α_i^y and β_i are introduced, which satisfy the following conditions:

α_i^y ≥ 0,  β_i ≥ 0,  ∀i, ∀y ≠ y_i
The Lagrangian function of formula (1) is as follows:

L(w, ξ, α, β) = (1/2)||w||^2 + (λ/2)||w - w_{t-1}||^2 + C Σ_{i=1}^{N} ξ_i - Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y [⟨w, Ψ_i(y)⟩ - L(y_i, y) + ξ_i] - Σ_{i=1}^{N} β_i ξ_i        (3)

where ξ denotes the slack variables and α and β are the introduced Lagrange multipliers; on the right-hand side, ξ_i denotes the slack variable of the i-th training sample, and α_i^y and β_i denote the Lagrange multipliers associated with the i-th training sample;
The partial derivatives of the Lagrangian L(w, ξ, α, β) with respect to w and ξ_i are computed and set to 0:

∂L/∂w = 0  ⇒  w = (1/(1+λ)) (λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y))                    (4)

∂L/∂ξ_i = 0  ⇒  C - Σ_{y≠y_i} α_i^y - β_i = 0                                                    (5)
Substituting formulas (4) and (5) into formula (3) and eliminating w, β and ξ from L(w, ξ, α, β) yields the dual problem of formula (1), given in formulas (6a) to (6c):

max_α Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y L(y_i, y) - (1/(2(1+λ))) ||λ w_{t-1} + Σ_{i=1}^{N} Σ_{y≠y_i} α_i^y Ψ_i(y)||^2        (6a)

s.t. α_i^y ≥ 0,  ∀i, ∀y ≠ y_i                                                                                          (6b)

Σ_{y≠y_i} α_i^y ≤ C,  ∀i                                                                                               (6c)
3. the method of claim 2, wherein the step 2 of solving the structured SVM model based on the DIOU loss and the smoothness constraint by using the dual coordinate descent principle estimates the state of the target as follows:
step 2.1, at each iteration the dual coordinate descent optimization algorithm selects a training sample k from the training set using formula (7), and then updates the dual scalar α_k^{y*} using formula (8):

(k, y*) = argmax_{i, y≠y_i} [ L(y_i, y) - ⟨w, Ψ_i(y)⟩ ]                    (7)

α_k^{y*(new)} = α_k^{y*(old)} + δα_k^{y*}                                   (8)

where α_k^{y*(old)} denotes the dual scalar before the update, α_k^{y*(new)} denotes the dual scalar after the update, and δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)};
To obtain δα_k^{y*}, formula (8) is first substituted into formula (6a), and formula (6a) is then rewritten as a function of δα_k^{y*}, which yields formula (9):

f(δα_k^{y*}) = δα_k^{y*} L(y_k, y*) - δα_k^{y*} ⟨w, Ψ_k(y*)⟩ - (δα_k^{y*})^2 ||Ψ_k(y*)||^2 / (2(1+λ)) + c        (9)

where c is a constant independent of the increment δα_k^{y*}; taking the derivative of formula (9) with respect to δα_k^{y*} and setting it to 0 gives:

δα_k^{y*} = (1+λ) [ L(y_k, y*) - ⟨w, Ψ_k(y*)⟩ ] / ||Ψ_k(y*)||^2                                                  (10)

According to the constraint conditions of formulas (6b) and (6c), the admissible range of δα_k^{y*} is:

-α_k^{y*} ≤ δα_k^{y*} ≤ C - Σ_{y≠y_k} α_k^y                                                                       (11)

where α_k^y denotes a dual scalar of the k-th training sample and Σ_{y≠y_k} α_k^y denotes the accumulated sum of the dual scalars of the k-th training sample;
step 2.2, after δα_k^{y*} is obtained from formulas (10) and (11), the update equation of w is obtained by combining formula (4), as shown in formula (12):

w^{(new)} = w^{(old)} + (δα_k^{y*} / (1+λ)) Ψ_k(y*)                    (12)

where w^{(old)} denotes the classification hyperplane normal vector before the update and w^{(new)} denotes it after the update; δα_k^{y*} denotes the increment from α_k^{y*(old)} to α_k^{y*(new)}; λ is the smoothness-constraint coefficient; Ψ_k(y*) = Φ(x_k, y_k) - Φ(x_k, y*), i.e. Ψ_k(y*) is the difference between the feature vector Φ(x_k, y_k) of the training rectangle y_k in image x_k and the feature vector Φ(x_k, y*) of the predicted rectangle y* in image x_k;
and step 2.3, the score of each candidate sample is computed with an inner product operation, and the state of the target is estimated according to the maximum-score criterion of formula (13):

y* = argmax_{y∈Y} ⟨w, Φ(x_t, y)⟩                                        (13)

The structured output y* with the maximum response is the position of the target, and Y denotes the set of all candidate structured output rectangles; Ψ_t(y) = Φ(x_t, y_t) - Φ(x_t, y), i.e. Ψ_t(y) is the difference between the feature vector Φ(x_t, y_t) of the training rectangle y_t in image x_t and the feature vector Φ(x_t, y) of the predicted rectangle y in image x_t.
4. The SSVM tracking method based on DIOU loss and smoothness constraint of claim 1, wherein the step 3 of estimating the position of the tracked target by using the multi-scale target tracking method, selects the structured output with the largest response as the tracking result, and specifically comprises the following steps:
During target tracking, a conservative scale pool S = {1, 0.995, 1.005} is used: candidates are evaluated at the three scales and the maximum response is selected as the tracking result; as target features, the Lab color features and the local rank transform (LRT) features of the target are selected.
CN202010755733.1A 2020-07-31 2020-07-31 SSVM tracking method based on DIOU loss and smoothness constraint Active CN112233140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010755733.1A CN112233140B (en) 2020-07-31 2020-07-31 SSVM tracking method based on DIOU loss and smoothness constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010755733.1A CN112233140B (en) 2020-07-31 2020-07-31 SSVM tracking method based on DIOU loss and smoothness constraint

Publications (2)

Publication Number Publication Date
CN112233140A true CN112233140A (en) 2021-01-15
CN112233140B CN112233140B (en) 2022-10-21

Family

ID=74116515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010755733.1A Active CN112233140B (en) 2020-07-31 2020-07-31 SSVM tracking method based on DIOU loss and smoothness constraint

Country Status (1)

Country Link
CN (1) CN112233140B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211396A1 (en) * 2015-11-26 2018-07-26 Sportlogiq Inc. Systems and Methods for Object Tracking and Localization in Videos with Adaptive Image Representation
US20200065976A1 (en) * 2018-08-23 2020-02-27 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
CN111292355A (en) * 2020-02-12 2020-06-16 江南大学 Nuclear correlation filtering multi-target tracking method fusing motion information
CN111460948A (en) * 2020-03-25 2020-07-28 中国人民解放军陆军炮兵防空兵学院 Target tracking method based on cost sensitive structured SVM

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IOANNIS SARAFIS: "Weighted SVM from clickthrough data for image retrieval", 《2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 *
江少杰: "Research on feature representation and optimization methods in structured support vector machine target tracking" (in Chinese), 《中国优秀硕士学位论文全文数据库》 (China Master's Theses Full-text Database) *

Also Published As

Publication number Publication date
CN112233140B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
Li et al. DXSLAM: A robust and efficient visual SLAM system with deep features
CN104200495B (en) A kind of multi-object tracking method in video monitoring
Gong et al. Pagerank tracker: From ranking to tracking
CN113628244B (en) Target tracking method, system, terminal and medium based on label-free video training
Zhang et al. A background-aware correlation filter with adaptive saliency-aware regularization for visual tracking
CN109544600A (en) It is a kind of based on it is context-sensitive and differentiate correlation filter method for tracking target
Zhang et al. Sparse learning-based correlation filter for robust tracking
Zhang et al. SiamOA: siamese offset-aware object tracking
Wang et al. Convolution operators for visual tracking based on spatial–temporal regularization
Huang et al. Correlation-filter based scale-adaptive visual tracking with hybrid-scheme sample learning
Li et al. Adaptive multi-branch correlation filters for robust visual tracking
Lu et al. Distracter-aware tracking via correlation filter
CN111460948B (en) Target tracking method based on cost sensitive structured SVM
CN112233140A (en) SSVM tracking method based on DIOU loss and smoothness constraint
Hu et al. Siamese network object tracking algorithm combining attention mechanism and correlation filter theory
Yin et al. Fast scale estimation method in object tracking
Yang et al. High-performance UAVs visual tracking using deep convolutional feature
Liu et al. Anti-occlusion object tracking based on correlation filter
Chen et al. Scale adaptive part-based tracking method using multiple correlation filters
Yuan et al. Algorithms based on correlation filter in target tracking: A survey
Wei et al. An IoU-aware Siamese network for real-time visual tracking
Fan et al. A multi-scale face detection algorithm based on improved SSD model
Xiao et al. Research on scale adaptive particle filter tracker with feature integration
Zhang et al. Rt-track: robust tricks for multi-pedestrian tracking
Hao et al. Robust cascaded-parallel visual tracking using collaborative color and correlation filter models

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant