CN110223316A - Fast target tracking method based on cyclic regression network - Google Patents

Fast target tracking method based on cyclic regression network

Info

Publication number
CN110223316A
CN110223316A (application CN201910512271.8A)
Authority
CN
China
Prior art keywords
regression network
cyclic
target
network
fast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910512271.8A
Other languages
Chinese (zh)
Other versions
CN110223316B (en)
Inventor
邬向前
卜巍
马丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201910512271.8A priority Critical patent/CN110223316B/en
Publication of CN110223316A publication Critical patent/CN110223316A/en
Application granted granted Critical
Publication of CN110223316B publication Critical patent/CN110223316B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fast target tracking method based on a cyclic regression network, comprising the following steps: Step 1: use a ResNet50 network as the base network of the regression network; Step 2: after the regression network has been trained, introduce an LSTM network on top of it to form the final cyclic regression network, which captures the various appearance variations of the target during tracking; Step 3: train the cyclic regression network with the Smooth-L1 loss function. The method performs target tracking end to end with a single neural network, regresses the target-box coordinates on features at different scales under deep supervision, and uses a long short-term memory network to capture the appearance variations of the target during tracking. Compared with existing target tracking methods, it can locate the target accurately without online updating and has good robustness.

Description

Fast target tracking method based on cyclic regression network
Technical field
The present invention relates to target tracking methods, and more particularly to a fast target tracking method based on a cyclic regression network.
Background art
The purpose of target tracking is to automatically mark the target in subsequent frames, given its bounding box in an initial frame. As target tracking technology has advanced, it has come to play a role in more and more fields, such as video surveillance, human-computer interaction, and action recognition. However, poor tracking results directly degrade the performance of these applications built on target tracking, which to some extent limits the scope and effectiveness of tracking methods. In recent years, with the application of convolutional neural networks in computer vision, target tracking has achieved great success.
Summary of the invention
In order to better perform target tracking, the present invention provides a fast target tracking method based on a cyclic regression network. The method proposes a regression network to obtain a richer representation of the target, and on this basis incorporates temporal information from the tracking process, so that the whole tracking pipeline adapts to the appearance variations of the target and obtains accurate target localization without frequent updates. The method of the invention performs target tracking well and obtains competitive results on multiple databases.
The purpose of the present invention is achieved through the following technical solution:
A fast target tracking method based on a cyclic regression network comprises the following steps:
Step 1: use a ResNet50 network as the base network of the regression network. It comprises six convolutional modules (Pool5, Rec5b, Rec4f, Rec3d, Rec2c, and Pool1), six add-on modules (Pool5_A, Rec5b_A, Rec4f_A, Rec3d_A, Rec2c_A, and Pool1_A) attached after the respective convolutional modules, and 3 fully connected layers connected after all convolutional modules. Pool5_A, Rec5b_A, Rec4f_A, Rec3d_A, Rec2c_A, and Pool1_A share the same structure: 3 convolutional layers, 1 Concat layer, 1 Correlation layer, 1 sigmoid layer, and 3 fully connected layers. The input of the regression network comprises two kinds of information: first, a full-size image pair of two consecutive frames (the previous frame and the current frame); second, the rectangular-box coordinates of the target in the previous frame. Benefiting from the feature cascade of the base network itself, the coordinates of the target in the current frame are predicted jointly by the 6 convolutional modules with their add-on modules and the 3 fully connected layers.
Step 2: after the regression network has been trained, introduce a long short-term memory (LSTM) network on top of it to form the final cyclic regression network, which captures the various appearance variations of the target during tracking. The LSTM network is embedded after the 2nd fully connected layer of the base network, and its output has 4 units (the horizontal and vertical coordinates of the upper-left and lower-right corners of the predicted rectangular box); the output of these 4 units is then fed as input to the 3rd fully connected layer to predict the final rectangular-box coordinates.
Step 3: train the cyclic regression network with the Smooth-L1 loss function.
Compared with the prior art, the present invention has the following advantages:
The method performs target tracking end to end with a single neural network, regresses the target-box coordinates on features at different scales under deep supervision, and uses a long short-term memory network to capture the various appearance variations of the target during tracking. Compared with existing target tracking methods, it can locate the target accurately without online updating and has good robustness.
Detailed description of the invention
Fig. 1 is a flow diagram of the fast target tracking method based on a cyclic regression network according to the present invention;
Fig. 2 is a structural diagram of the add-on module;
Fig. 3 is a visual comparison between different tracking methods and the method of the invention in different challenging scenes;
Fig. 4 is a comparison between the method of the invention and 12 tracking methods on the TC128 dataset;
Fig. 5 is a comparison between the method of the invention and 7 tracking methods on the VOT2017 dataset.
Specific embodiment
The technical solution of the present invention is further described below with reference to the accompanying drawings; however, the invention is not limited thereto. Any modification or equivalent replacement of the technical solution of the invention that does not depart from its spirit and scope shall be covered by the protection scope of the present invention.
The present invention provides a fast target tracking method based on a cyclic regression network. Fig. 1 shows the overall structure of the whole network, which can be divided into three parts, described as follows:
The first part is the base network. Most current tracking algorithms adopt a lightweight network such as VGG-M as the base network. The reason for choosing such a lightweight network is that, when the appearance of the target changes drastically, the network can be fine-tuned online in time, so that the whole network again produces a high response when the same appearance variation recurs and thus locates the target accurately. However, such frequent updates greatly increase the complexity of the algorithm and severely slow down the tracker. For this reason, the present invention uses the larger-capacity ResNet50 as the base network for feature extraction and, on top of it, designs and proposes 6 add-on modules to regress the position of the target box from features at different scales.
The second part is the specific structure of the add-on module, shown in Fig. 2. A CNN model abstracts features by cascading multiple convolution and pooling layers. Shallow CNN features focus more on portraying the details of the target (edges, corners, etc.), and these shallow features can be used to locate the target precisely; deep CNN features focus more on the semantic information of the target and tend to distinguish the target from the background. To make full use of the advantages of features at different depths, we attach add-on modules Pool5_A, Rec5b_A, Rec4f_A, Rec3d_A, Rec2c_A, and Pool1_A to the 6 convolutional modules Pool5, Rec5b, Rec4f, Rec3d, Rec2c, and Pool1 of ResNet50, respectively. Each add-on module can predict the coordinates of the target box from the features at its own scale, and every add-on module uses the same structure. As shown in Fig. 2, taking Rec5b_A as an example: the features of Pool5_A are first up-sampled so that they match the feature size of Rec5b; a Correlation layer then computes the correlation between the two groups of features; the output of the Correlation layer is concatenated with the features of Rec5b by a Concat layer; and this is followed by two convolutional layers, one sigmoid layer, and three fully connected layers.
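The up-sample, Correlation, and Concat steps of the add-on module can be sketched as follows. This is a minimal NumPy illustration of the data flow only, assuming nearest-neighbour up-sampling and a channel-wise inner product as the correlation; the actual layer implementations, channel counts, and learned convolutions of the patent are not reproduced here.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map
    (a stand-in for the up-sampling step in the add-on module)."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def correlation(a, b):
    """Per-position correlation of two (C, H, W) feature maps: the
    channel-wise inner product at each spatial location, giving a
    single-channel (1, H, W) response map."""
    return np.sum(a * b, axis=0, keepdims=True)

def addon_module(deep_feat, shallow_feat):
    """Sketch of one add-on module (e.g. Rec5b_A): up-sample the deeper
    features, correlate them with the shallower ones, then concatenate
    the correlation response with the shallow features (Concat layer)."""
    up = upsample2x(deep_feat)                            # match shallow size
    corr = correlation(up, shallow_feat)                  # Correlation layer
    fused = np.concatenate([shallow_feat, corr], axis=0)  # Concat layer
    return fused

# Toy shapes: a 256-channel 4x4 deep map and a 256-channel 8x8 shallow map.
deep = np.random.rand(256, 4, 4).astype(np.float32)
shallow = np.random.rand(256, 8, 8).astype(np.float32)
out = addon_module(deep, shallow)
print(out.shape)  # (257, 8, 8)
```

The Concat output has 257 channels: the 256 shallow channels plus the single-channel correlation response, which the subsequent convolutional, sigmoid, and fully connected layers would then consume.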
The third part introduces a long short-term memory (LSTM) network on top of the regression network to form the final cyclic regression network, whose structure is shown in Fig. 3. Speed is often an important criterion for judging tracker quality, and with the development of CNNs more and more CNN-based tracking algorithms have been proposed. Most of them share the following characteristic: the CNN model is designed to be shallow, so that when the target deforms the network can be fine-tuned in time and the whole network can capture the deformation. However, frequent online fine-tuning greatly increases computational complexity. Therefore, the present invention combines the previously designed regression network with a long short-term memory (LSTM) network, so that the whole network can capture the appearance variations of the target without fine-tuning, using the information stored in the LSTM. To this end, we embed the LSTM between the 2nd and 3rd fully connected layers of the regression network. The input and output of the LSTM can be expressed as:
Z_t = T(A·X_t + B·Y_{t-1} + b) (1);
Y_t = O_t ⊙ T(C_t) (2);
where t is the frame index, Z_t is the output vector of the current frame, X_t is the input vector of the current frame, Y_{t-1} is the recurrent vector, b is the bias, T is the tanh function, O_t is the output gate at frame t, C_t is the memory cell, A and B are weight matrices, and ⊙ denotes the element-wise product.
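A minimal NumPy sketch of one step of Eqs. (1)-(2) follows. For illustration it treats the output gate O_t and memory cell C_t as given inputs; in a full LSTM they have their own learned update rules, which the patent does not spell out.

```python
import numpy as np

def recurrent_step(x_t, y_prev, A, B, b, o_t, c_t):
    """One step of the recurrent unit as written in Eqs. (1)-(2):
    Z_t = T(A X_t + B Y_{t-1} + b) and Y_t = O_t * T(C_t),
    with T = tanh and * an element-wise product."""
    z_t = np.tanh(A @ x_t + B @ y_prev + b)  # Eq. (1): output vector
    y_t = o_t * np.tanh(c_t)                 # Eq. (2): recurrent vector
    return z_t, y_t

# Toy dimensions: a 4-unit state (the box-corner coordinates) and an
# 8-dimensional input vector; all values here are illustrative.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
B = rng.standard_normal((4, 4))
b = np.zeros(4)
x_t = rng.standard_normal(8)
y_prev = np.zeros(4)       # recurrent vector from the previous frame
o_t = rng.random(4)        # output gate (given here, learned in practice)
c_t = rng.standard_normal(4)  # memory cell (given here, learned in practice)
z_t, y_t = recurrent_step(x_t, y_prev, A, B, b, o_t, c_t)
print(z_t.shape, y_t.shape)  # (4,) (4,)
```

Because T is tanh, every component of Z_t stays in [-1, 1], which matches its role as a normalized 4-unit output feeding the 3rd fully connected layer.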
The objective function for network training is the Smooth-L1 loss.
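As a reference for this objective, a minimal NumPy version of the Smooth-L1 loss over box-corner coordinates might look like the following; the threshold beta = 1 is the common default and an assumption, since the patent does not state it.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss: quadratic for small residuals (|d| < beta),
    linear for large ones, keeping box regression robust to outliers."""
    d = np.abs(pred - target)
    per_coord = np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta)
    return per_coord.sum()

# Box corners (x1, y1, x2, y2): a prediction close to the ground truth.
pred = np.array([10.0, 12.0, 50.0, 60.0])
gt   = np.array([10.5, 12.0, 52.0, 60.0])
print(smooth_l1(pred, gt))  # 0.125 + 0 + 1.5 + 0 = 1.625
```

The small 0.5-pixel error contributes quadratically (0.125) while the large 2-pixel error contributes only linearly (1.5), which is the property that makes this loss well suited to box-coordinate regression.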
Experimental results
The database used for network training is the YouTube-BoundingBoxes database, which contains 380,000 videos. The present invention trains the regression network and the cyclic regression network with different training strategies. The regression network is trained in two stages: in the first stage, two discrete (non-consecutive) frames are used as input, in order to obtain rich location information; in the second stage, two consecutive frames are used to supplement the spatio-temporal information of the network. Afterwards, the parameters of the regression network are fixed and the cyclic regression network is trained.
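The difference between the two training stages lies only in how frame pairs are drawn. A hypothetical sampler illustrating that difference is sketched below; the function name and sampling ranges are assumptions for illustration, not taken from the patent.

```python
import random

def sample_pair(num_frames, stage):
    """Illustrative frame-pair sampler for the two training stages:
    stage 1 draws two discrete (possibly far-apart) frames, for rich
    location information; stage 2 draws two consecutive frames, to
    supply spatio-temporal information."""
    if stage == 1:
        i, j = sorted(random.sample(range(num_frames), 2))  # discrete pair
    else:
        i = random.randrange(num_frames - 1)
        j = i + 1                                           # consecutive pair
    return i, j

random.seed(0)
print(sample_pair(100, stage=1))  # two distinct frame indices, i < j
print(sample_pair(100, stage=2))  # consecutive indices (i, i + 1)
```

After both stages, the regression network's weights would be frozen and only the LSTM of the cyclic regression network trained on top, as described above.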
To verify the performance of the invention, the proposed target tracking method is evaluated on three standard public databases: OTB100, TC128, and VOT2017. OTB100 and TC128 use the same evaluation criteria, namely precision and success rate. Unlike OTB100 and TC128, the evaluation criteria of VOT2017 are the accuracy-robustness curve (Accuracy-Robustness rank, denoted A-R rank) and the expected average overlap rate (EAO).
Fig. 3 shows a visual comparison between the method of the invention and other current state-of-the-art methods. As can be seen from Fig. 3, the method of the invention obtains accurate tracking results in various challenging scenes. Some specific analyses follow:
(1) The method of the invention uses deep supervision so that the regression network can predict the position of the target from features at different scales, and incorporates a long short-term memory (LSTM) network to capture the various deformations of the target during tracking, so that the position of the target can be predicted accurately.
(2) With the long short-term memory network incorporated, the method of the invention obtains accurate target localization even when the target deforms (e.g., rows 2 and 4 of Fig. 3).
(3) When the target undergoes scale variation (e.g., rows 3, 5, and 6 of Fig. 3), the localization results of the method of the invention are much better than those of other methods.
Table 1 shows the quantitative evaluation of the method of the invention against 19 of the best target tracking methods on the OTB dataset, in terms of precision, success rate, and speed (FPS).
Table 1: comparison of the experimental results with the current best tracking results on the OTB100 database in terms of precision, success rate, and speed (FPS)
As can be seen from Table 1, on the OTB100 test set the method of the invention achieves competitive results across all three evaluation criteria (precision, success rate, and FPS), which demonstrates its effectiveness.
As can be seen from Figs. 4 and 5, the precision and success rate of the method of the invention are higher than those of other methods, which means that the method of the invention is more robust than other methods even on challenging datasets. In particular, the method also obtains good results on the VOT2017 dataset, a difficult and challenging tracking dataset whose disturbing factors include fast motion, occlusion, and scale variation. The method of the invention can locate the target accurately, which is attributable to its powerful feature extraction ability and the extension of its features along the time dimension: the learned features can locate the target against complex backgrounds, yielding good tracking results.

Claims (6)

1. A fast target tracking method based on a cyclic regression network, characterized in that the method comprises the following steps:
Step 1: using a ResNet50 network as the base network of the regression network;
Step 2: after the regression network has been trained, introducing an LSTM network on top of it to form the final cyclic regression network, which captures the various appearance variations of the target during tracking;
Step 3: training the cyclic regression network with the Smooth-L1 loss function.
2. The fast target tracking method based on a cyclic regression network according to claim 1, characterized in that the base network comprises six convolutional modules (Pool5, Rec5b, Rec4f, Rec3d, Rec2c, and Pool1), six add-on modules (Pool5_A, Rec5b_A, Rec4f_A, Rec3d_A, Rec2c_A, and Pool1_A) attached after the respective convolutional modules, and 3 fully connected layers connected after all convolutional modules.
3. The fast target tracking method based on a cyclic regression network according to claim 2, characterized in that Pool5_A, Rec5b_A, Rec4f_A, Rec3d_A, Rec2c_A, and Pool1_A share the same structure: 3 convolutional layers, 1 Concat layer, 1 Correlation layer, 1 sigmoid layer, and 3 fully connected layers.
4. The fast target tracking method based on a cyclic regression network according to claim 1, characterized in that the input of the regression network comprises two kinds of information: first, a full-size image pair of two consecutive frames; second, the rectangular-box coordinates of the target in the previous frame.
5. The fast target tracking method based on a cyclic regression network according to claim 1, characterized in that the LSTM network is embedded after the 2nd fully connected layer of the base network, its output has 4 units giving the horizontal and vertical coordinates of the upper-left and lower-right corners of the predicted rectangular box, and the output of these 4 units is then fed as input to the 3rd fully connected layer to predict the final rectangular-box coordinates.
6. The fast target tracking method based on a cyclic regression network according to claim 1, characterized in that the input and output of the LSTM are expressed as:
Z_t = T(A·X_t + B·Y_{t-1} + b) (1);
Y_t = O_t ⊙ T(C_t) (2);
where t is the frame index, Z_t is the output vector of the current frame, X_t is the input vector of the current frame, Y_{t-1} is the recurrent vector, b is the bias, T is the tanh function, O_t is the output gate at frame t, C_t is the memory cell, A and B are weight matrices, and ⊙ denotes the element-wise product.
CN201910512271.8A 2019-06-13 2019-06-13 Rapid target tracking method based on cyclic regression network Active CN110223316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910512271.8A CN110223316B (en) 2019-06-13 2019-06-13 Rapid target tracking method based on cyclic regression network


Publications (2)

Publication Number Publication Date
CN110223316A true CN110223316A (en) 2019-09-10
CN110223316B CN110223316B (en) 2021-01-29

Family

ID=67817080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910512271.8A Active CN110223316B (en) 2019-06-13 2019-06-13 Rapid target tracking method based on cyclic regression network

Country Status (1)

Country Link
CN (1) CN110223316B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947618A (en) * 2021-10-20 2022-01-18 哈尔滨工业大学 Adaptive regression tracking method based on modulator

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161607A1 (en) * 2015-12-04 2017-06-08 Pilot Ai Labs, Inc. System and method for improved gesture recognition using neural networks
US20170255832A1 (en) * 2016-03-02 2017-09-07 Mitsubishi Electric Research Laboratories, Inc. Method and System for Detecting Actions in Videos
CN108305275A (en) * 2017-08-25 2018-07-20 深圳市腾讯计算机系统有限公司 Active tracking method, apparatus and system
CN108320297A (en) * 2018-03-09 2018-07-24 湖北工业大学 A kind of video object method for real time tracking and system
CN108520530A (en) * 2018-04-12 2018-09-11 厦门大学 Method for tracking target based on long memory network in short-term
CN109344725A (en) * 2018-09-04 2019-02-15 上海交通大学 A kind of online tracking of multirow people based on space-time attention rate mechanism
CN109360226A (en) * 2018-10-17 2019-02-19 武汉大学 A kind of multi-object tracking method based on time series multiple features fusion
CN109829391A (en) * 2019-01-10 2019-05-31 哈尔滨工业大学 Conspicuousness object detection method based on concatenated convolutional network and confrontation study


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHANHO KIM et al.: "Multi-object Tracking with Neural Gating Using Bilinear LSTM", European Conference on Computer Vision *
KWANG-EUN KO et al.: "Deep convolutional framework for abnormal behavior detection in a smart surveillance system", Engineering Applications of Artificial Intelligence *
YUNHUA ZHANG et al.: "Structured Siamese Network for Real-Time Visual Tracking", European Conference on Computer Vision *
戴蔼佳 (DAI Aijia): "Deep regression network visual tracking algorithm based on saliency correction", China Masters' Theses Full-text Database, Information Science and Technology *
李林泽 (LI Linze): "Research on human detection and tracking methods based on metric learning and system implementation", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947618A (en) * 2021-10-20 2022-01-18 哈尔滨工业大学 Adaptive regression tracking method based on modulator
CN113947618B (en) * 2021-10-20 2023-08-29 哈尔滨工业大学 Self-adaptive regression tracking method based on modulator

Also Published As

Publication number Publication date
CN110223316B (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112784798B (en) Multi-modal emotion recognition method based on feature-time attention mechanism
JP7147078B2 (en) Video frame information labeling method, apparatus, apparatus and computer program
CN111652903B (en) Pedestrian target tracking method based on convolution association network in automatic driving scene
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN108509976A (en) The identification device and method of animal
CN112396027A (en) Vehicle weight recognition method based on graph convolution neural network
CN112258554A (en) Double-current hierarchical twin network target tracking method based on attention mechanism
CN109389035A (en) Low latency video actions detection method based on multiple features and frame confidence score
CN109671102A (en) A kind of composite type method for tracking target based on depth characteristic fusion convolutional neural networks
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN110889450B (en) Super-parameter tuning and model construction method and device
CN112348849A (en) Twin network video target tracking method and device
CN110110602A (en) A kind of dynamic sign Language Recognition Method based on three-dimensional residual error neural network and video sequence
CN110120065A (en) A kind of method for tracking target and system based on layering convolution feature and dimension self-adaption core correlation filtering
CN108830170A (en) A kind of end-to-end method for tracking target indicated based on layered characteristic
CN111915644A (en) Real-time target tracking method of twin guiding anchor frame RPN network
CN111862145A (en) Target tracking method based on multi-scale pedestrian detection
CN110110663A (en) A kind of age recognition methods and system based on face character
CN109598742A (en) A kind of method for tracking target and system based on SSD algorithm
CN112149665A (en) High-performance multi-scale target detection method based on deep learning
CN107633196A (en) A kind of eyeball moving projection scheme based on convolutional neural networks
Li et al. Hierarchical knowledge squeezed adversarial network compression
CN110223316A (en) Fast target tracking method based on cyclic regression network
CN111144497B (en) Image significance prediction method under multitasking depth network based on aesthetic analysis
CN112150504A (en) Visual tracking method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant