CN111968155B - Target tracking method based on segmented target mask updating template - Google Patents


Info

Publication number
CN111968155B
Authority
CN
China
Prior art keywords
target
frame
template
target template
tracking
Prior art date
Legal status
Active
Application number
CN202010718018.0A
Other languages
Chinese (zh)
Other versions
CN111968155A (en)
Inventor
张静
郝志晖
刘婧
苏育挺
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202010718018.0A
Publication of CN111968155A
Application granted
Publication of CN111968155B
Legal status: Active

Classifications

    • G06T Image data processing or generation, in general (G Physics; G06 Computing; Calculating or Counting)
    • G06T 7/248 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T 7/194 Image analysis; Segmentation; Edge detection involving foreground-background segmentation
    • G06T 7/215 Image analysis; Motion-based segmentation
    • G06T 2207/10016 Image acquisition modality: Video; Image sequence
    • G06T 2207/20081 Special algorithmic details: Training; Learning
    • G06T 2207/20084 Special algorithmic details: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method based on a segmented target mask updating template, which comprises the following steps: constructing a basic network framework of target tracking; initializing a network, acquiring foreground information in a regression frame, generating an initialized target template with prominent foreground, and linearly overlapping the initialized target template; inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground; linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame; and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.

Description

Target tracking method based on segmented target mask updating template
Technical Field
The invention relates to the field of deep neural networks, in particular to a target tracking method based on a segmented target mask updating template under a deep twin network framework.
Background
With the rapid development of artificial intelligence, computer vision is being applied ever more widely across many fields and in daily life. Target tracking is an important branch of computer vision; it plays a significant role in application fields such as autonomous driving, human-computer interaction, pedestrian detection and precision weapon guidance, and it has very broad application prospects and profound research significance.
The basic task of target tracking can be summarized briefly as follows: the size and position of a target are given in the first frame of a video, and the algorithm computes the target's center position and size frame by frame in the subsequent frames, thereby tracking the target through the video. According to video length and whether a single target is tracked, target tracking algorithms fall mainly into four categories: long-video multi-target tracking, long-video single-target tracking, short-video multi-target tracking and short-video single-target tracking. Although target tracking has developed rapidly in recent years, it still faces many challenging problems, such as interference from similar targets, non-rigid deformation of the target, target size change, and in-plane and out-of-plane rotation of the target, all of which degrade tracking performance to different degrees. Meanwhile, practical applications demand real-time operation, so the speed of an algorithm is also a key index of its performance.
Target tracking algorithms have two important branches: correlation filtering algorithms and deep learning algorithms. Among deep learning algorithms in recent years, target tracking based on deep twin (Siamese) convolutional neural networks achieves a better balance of speed and accuracy and better stability than correlation filtering, and has attracted wide attention and development from researchers. Such algorithms are usually trained end to end on large-scale datasets and then track offline, so they lack the kind of online update strategy that adjusts the filter template in correlation filtering algorithms; as a result, tracking stability degrades to different degrees in application scenes where the target moves rapidly, becomes occluded or deforms non-rigidly. It is therefore necessary to add an online target-template updating mechanism under the deep twin convolutional neural network framework and improve the adaptability of the algorithm to complex application scenes.
The deep mask network (DeepMask [1]) is an instance segmentation model based on a VGG network; it realizes foreground-background segmentation, foreground semantic segmentation and foreground instance segmentation, and outputs a mask of the image foreground. A target tracking algorithm can essentially be understood as a two-class problem of separating image foreground from background within a search region, and a single-target tracking network needs to attend to the image foreground far more than to the background. Segmenting the image foreground, i.e. the mask of the target, therefore highlights the target to be tracked in the target template and makes it easier for the network to notice the target during tracking. Combined with an online updating mechanism, this greatly improves the adaptability of the tracking algorithm to complex scenes and the stability of its operation.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a target tracking method for updating a template based on a segmented target mask. The target tracking method improves the attention of a network to a target in an initialization frame by segmenting a mask of the target in a target template; meanwhile, a template online updating strategy is added in a subsequent frame, so that the adaptability of the algorithm to challenging problems in the tracking process is effectively improved.
The purpose of the invention is realized by the following technical scheme:
a target tracking method for updating a template based on a segmented target mask comprises the following steps:
constructing a basic network framework of target tracking;
initializing a network, inputting target regression frame parameters obtained by initialization into a mask segmentation module based on a DeepMask network frame in the basic network frame to obtain foreground information in the regression frame, generating an initialization target template with prominent foreground, and linearly overlapping the initialization target template;
inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground;
linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame;
and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.
Further, the basic network framework is:
a mask segmentation module based on the DeepMask network framework is added at the front end of the basic tracking framework based on SiamRPN++.
Further, the new target template is:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame.
Further,
A_0 = F_crop(DeepMask(x_0, y_0, bbox_0))
where x_0, y_0 and bbox_0 are the initial center coordinates and the regression-box parameters produced in the target initialization stage; F_crop denotes the cropping function used to crop the target template from a video frame; DeepMask denotes the mask segmentation module based on the DeepMask network framework; and A_0 denotes the foreground-highlighted initialization target template;
A_i = F_crop(DeepMask(x_i, y_i, bbox_i)), computed when m | i
where m indicates that the foreground-highlighted subsequent-frame target template is updated once every m frames, x_i, y_i and bbox_i are the target center coordinates and the regression-box parameters of the frame at the time of the update, F_crop denotes the cropping function, DeepMask denotes the mask segmentation module, and A_i denotes the foreground-highlighted subsequent-frame target template.
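For illustration, a minimal sketch of this linear superposition is given below, assuming the three template terms are already cropped to the same resolution and stored as arrays; the function name and toy shapes are illustrative rather than taken from the patent, and the default weights follow the values α = 0.03 and β = 0.005 given later in the detailed description:

```python
import numpy as np

def fuse_templates(T0, A0, Ai, alpha=0.03, beta=0.005):
    """New target template for the next frame: T_{i+1} = T_0 + alpha*A_0 + beta*A_i."""
    return T0 + alpha * A0 + beta * Ai

# toy usage with 127x127x3 templates
T0 = np.zeros((127, 127, 3))
A0 = np.ones_like(T0)
Ai = np.ones_like(T0)
T_next = fuse_templates(T0, A0, Ai)   # every entry becomes 0.03 + 0.005 = 0.035
```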
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The invention uses the SiamRPN++ [2] tracking framework as its basis and uses the DeepMask [1] network framework to segment the target mask during initialization. The target mask is used to generate attention on the foreground and to increase the weight inside the target initialization box, which raises the network's attention to the tracked object and helps the tracker predict the target position correctly.
2. By adding an online target-template updating mechanism, the invention alleviates the negative effects of challenging situations (large changes in target appearance, occlusion and the like) that a purely offline tracker cannot adapt to. The mechanism relies on the mask segmentation module based on the DeepMask [1] network framework to segment the target mask in subsequent frames and generate mask information about the target in those frames, which effectively improves the adaptability of the new target template to motion changes.
3. In the initialization stage, the method generates foreground attention from target mask information, which is more precise than axis-aligned regression-box parameters; in subsequent stages it adds a linear superposition of target-related foreground information, so that the newly generated target template is enriched with the target's recent motion information and the algorithm adapts better to target motion. On common benchmark datasets the proposed method achieves better experimental results.
Drawings
FIG. 1 is a flow chart of a method of object tracking based on updating a template of a segmented object mask;
FIG. 2 is a block diagram of the target mask segmentation network based on the DeepMask [1] network framework;
FIG. 3 is a block diagram of a tracking algorithm for updating a target template based on a segmented target mask.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
1. The embodiment of the invention provides a target tracking method for updating a template based on a segmented target mask, and referring to fig. 1, the method comprises the following steps:
101: Construct the basic network framework of the target tracking algorithm: first build the basic tracking framework based on SiamRPN++ [2], then add a mask segmentation module based on the DeepMask [1] network framework before the target template branch at the input of the network, forming the complete basic network framework.
The DeepMask [1] network framework consists of a VGG feature extraction network, a mask prediction branch and a category scoring branch. The invention omits the output of the category scoring branch and uses only the output of the mask prediction branch as the preliminary foreground segmentation result. The mask segmentation module is connected to the target template branch of the basic tracking network SiamRPN++ [2], forming the basic network framework of the method.
102: The invention first performs mask segmentation of the initialization target template: the initialization information is fed into the DeepMask [1]-based foreground highlighting module, which generates an initialization target template with a highlighted foreground.
Specifically, the target center position and regression-box parameters obtained in the initialization process are fed into the mask prediction branch of DeepMask [1] to generate a mask segmentation result within the box; a foreground-highlighted initialization target template is then computed from this result and finally superposed linearly with the initialization target template.
103: During tracking, the invention adds an online template updating strategy that helps the target template adapt to changes in target motion. The strategy updates the target template online by segmenting the foreground information in the target template of a subsequent frame and refreshing part of the information in the new target template once every m frames. After tracking begins, the algorithm counts the subsequent frames; whenever the interval reaches m frames, the target center position and regression-box information of that frame are obtained and fed into the mask extraction module to generate a foreground-highlighted subsequent-frame target template, which is linearly superposed with the initialization target template and the foreground-highlighted initialization template to produce the new template used by the next frame. The foreground-highlighted subsequent-frame target template then remains unchanged until the next update. In this invention, m = 30.
104: In the test stage, the algorithm is first initialized to obtain the first-frame target template and the search regions of subsequent frames. The initialization target template is passed through the mask extraction module to generate the two foreground-highlighted terms; the new template and the search region, cropped to a fixed size, are then fed into the feature extraction network together for feature matching, and the position offset and size of the target in the next frame are computed. Tracking then proceeds frame by frame with a frame counter, and the foreground-highlighted subsequent-frame template term in the new template is updated every 30 frames.
In summary, the embodiment of the present invention designs a target tracking method based on a segmented target mask update template through steps 101 to 104, increases the proportion of foreground information in an initialization process, and effectively improves the attention degree of a feature extraction network to a target; meanwhile, because the target motion has certain continuity in time and space, the target motion changes in recent frames are relatively similar, and the information of the previous frame has a certain guiding effect on the subsequent frame. By utilizing the characteristic, the invention adds an online updating strategy of the target template in the tracking process and adds the latest motion change information of the target in the target template. The strategy effectively improves the adaptability of the new target template to the motion change of the object, thereby effectively improving the performance of the algorithm.
2. The technical solutions of the above embodiments are further described below, and the details are described in the following:
201: In a short-video single-target tracking task, the center position and size of the target to be tracked are usually given in the first frame, and the algorithm computes the target's position offset and size change frame by frame in subsequent frames. In recent years, target tracking algorithms based on deep twin convolutional neural networks have generally adopted a two-branch feature matching framework, in which the target template branch extracts features of the target template containing the target to be tracked and matches them against the features of the search region. The information contained in the target template therefore has a direct impact on the final tracking result. The target template generally consists of the target to be tracked at the center and a small amount of surrounding background, i.e. the foreground and background of the target template. The target templates used by common twin networks are generated directly during initialization, and no weights are assigned to foreground and background information, so the network attends to foreground and background almost equally. The invention uses the DeepMask [1]-based network framework to highlight the foreground information in the target template and increase the proportion of the target to be tracked, so that the network attends to the target more easily and the performance of the tracking algorithm improves.
The invention adopts the deep twin convolutional neural network SiamRPN++ [2] as the basic tracking framework and adds a mask segmentation module based on the DeepMask [1] network framework before the target template branch. The module's input is the center position of the target to be tracked and the regression-box parameters, which are fed into the DeepMask [1] network. That network consists of three parts: a basic VGG feature extraction network, a mask prediction branch and a score prediction branch; the invention omits the output of the score prediction branch, so only two parts of the network are used. The VGG network comprises eight 3×3 convolution layers and four 2×2 max-pooling layers; the mask prediction branch comprises one 1×1 convolution layer and an up-sampling layer based on bilinear interpolation, and its output is the target image inside the regression box with the foreground highlighted. The basic tracking framework SiamRPN++ [2] extracts features with ResNet-50 and outputs the final tracking result through three cascaded RPN (Region Proposal Network) modules. In the invention, the target template branch input of the SiamRPN++ [2] network is the new target template formed by linearly superposing the initialization target template, the foreground-highlighted initialization target template and the foreground-highlighted subsequent-frame target template.
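A minimal PyTorch-style sketch of such a mask prediction network follows. The channel widths, the interleaving of the pooling layers and the sigmoid output are assumptions made for illustration, and the category scoring branch is omitted as in the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskSegmentationModule(nn.Module):
    """Sketch of a DeepMask-style mask branch: a VGG-style trunk with eight 3x3
    convolutions and four 2x2 max-pooling layers, followed by a 1x1 convolution
    and bilinear upsampling that outputs a foreground mask."""

    def __init__(self):
        super().__init__()
        chans = [3, 64, 64, 128, 128, 256, 256, 512, 512]
        layers = []
        for i in range(8):                      # eight 3x3 conv layers
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                       nn.ReLU(inplace=True)]
            if i % 2 == 1:                      # four 2x2 max-pool layers
                layers.append(nn.MaxPool2d(2))
        self.trunk = nn.Sequential(*layers)
        self.mask_head = nn.Conv2d(chans[-1], 1, kernel_size=1)  # 1x1 conv

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.trunk(x)
        mask = self.mask_head(feat)
        # bilinear upsampling back to the input resolution
        mask = F.interpolate(mask, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(mask)              # per-pixel foreground probability
```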
202: In the initialization stage of the algorithm, the target center position and regression-box parameters given in the first frame are fed into the mask segmentation module based on the DeepMask [1] network, which segments the foreground of the image inside the regression box. The foreground-highlighted image is pasted back over the image inside the original first-frame regression box, and the cropping function F_crop then crops the first frame, now carrying the segmentation mask, to obtain a foreground-highlighted target template, which is the foreground-highlighted initialization target template. The size of the template cropped by F_crop is given by the following formula:
A = s(w + p) × s(h + p)
where A is the size of the cropped target template, i.e. 127 × 127; w and h are the width and height of the regression box; s is the scale factor and p is the padding, with p = (w + h)/2. The foreground-highlighted initialization target template is denoted A_0.
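A small sketch of this crop-size computation follows; choosing s as the value that maps the padded box to 127 × 127, i.e. s = 127 / sqrt((w+p)(h+p)), is an assumption consistent with the formula above rather than a formula stated in the patent:

```python
import math

def template_crop_size(w, h, exemplar_size=127):
    """Side of the square region cropped around the target, following
    A = s(w + p) x s(h + p) with p = (w + h) / 2, then resized to 127 x 127.
    The closed-form choice of s is an assumption consistent with the formula."""
    p = (w + h) / 2.0
    context_side = math.sqrt((w + p) * (h + p))   # side of the padded crop in source pixels
    s = exemplar_size / context_side              # scale factor applied when resizing
    return context_side, s

# example: a 60 x 40 target box
side, s = template_crop_size(60, 40)
print(round(side, 1), round(s, 3))   # padded-crop side and the scale factor s
```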
During tracking, the appearance and motion state of the target change to some extent, so the information in the initial-frame target template becomes stale and cannot adapt well to those changes. To improve the template's adaptability to target appearance changes in subsequent frames and the network's attention to the target, the invention adds an online target-template updating mechanism. The mechanism relies on mask segmentation of the target in subsequent frames, which not only injects information about the target in the most recent frame into the initialization target template but also highlights the foreground of the subsequent frame, further raising the network's attention to the target. Analogously to the generation of the foreground-highlighted initialization template, every 30 frames the target center position and regression-box parameters output by the algorithm for that frame are fed into the mask segmentation module based on the DeepMask [1] network to obtain the foreground of the image inside that frame's regression box; the result is pasted back at the regression-box location in the frame and finally cropped with F_crop to obtain the foreground-highlighted target template of that frame, denoted A_i.
203: Because target motion is consistent in the temporal and spatial domains, the online template updating strategy is executed every 30 frames during testing. First, in the initialization stage, the initialization target template and the foreground-highlighted initialization target template are generated from the first frame, linearly superposed, and fed into the target template branch of SiamRPN++ [2] for feature extraction. When tracking reaches frame 30, the target center position and regression box output by the SiamRPN++ [2] tracking framework are fed into the mask segmentation module based on the DeepMask [1] network to obtain the foreground-highlighted target template of that frame, which is linearly superposed into the template. The new template is generated by the following formula:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame; in this invention α = 0.03 and β = 0.005. The new target template is fed into the target template branch of the tracking network, and the target position and regression-box size of the next frame are computed. The overall algorithm is expressed by the following formula:
(x, y, Δs)_{i+1} = S(T_{i+1}, R_{i+1})
where T_{i+1} denotes the new target template containing the foreground-highlighted initialization template term and the foreground-highlighted subsequent-frame template term, R_{i+1} denotes the search region of the next frame, S denotes the tracking algorithm, and (x, y, Δs)_{i+1} denotes the target position and the change in regression-box size in the next frame.
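The overall loop can be sketched as follows, assuming placeholder interfaces for the SiamRPN++-style tracker S, the DeepMask-based segmentation module and the cropping function F_crop; their exact signatures are illustrative assumptions rather than the patent's implementation:

```python
def track_sequence(frames, init_state, tracker, deepmask, f_crop,
                   alpha=0.03, beta=0.005, update_interval=30):
    """Sketch of the overall loop: the tracker S consumes the fused template and
    the next frame's search region; every `update_interval` frames the
    foreground-highlighted term A_i is regenerated from the current prediction."""
    x, y, box = init_state  # centre coordinates and regression-box parameters from frame 0
    T0 = f_crop(frames[0], x, y, box)                       # initialization template
    A0 = f_crop(deepmask(frames[0], x, y, box), x, y, box)  # foreground-highlighted init template
    Ai = A0                                                 # until the first update
    results = []
    for i, frame in enumerate(frames[1:], start=1):
        T = T0 + alpha * A0 + beta * Ai                     # new template for this frame
        x, y, box = tracker(T, frame)                       # (x, y, Δs)_{i+1} = S(T_{i+1}, R_{i+1})
        results.append((x, y, box))
        if i % update_interval == 0:                        # m = 30 in the description
            Ai = f_crop(deepmask(frame, x, y, box), x, y, box)
    return results
```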
3. The effect of the above embodiment is evaluated below with reference to specific experimental data:
301: data composition
The test set consists of all video sequences in VOT2016 and OTB100 data sets, wherein the VOT2016 data set comprises 60 videos which are all color sequences; the OTB100 data set contains 100 videos, of which there are 75 color sequences, 25 grayscale sequences.
302: evaluation criterion
The performance evaluation method adopts different evaluation indexes to evaluate the performance of the algorithm on the VOT2016 data set and the OTB100 data set respectively.
On the VOT2016 dataset, three evaluation indexes are used to assess the performance of a tracking algorithm: Accuracy, Robustness and EAO (Expected Average Overlap).
Accuracy measures tracking precision. For each frame of a video sequence, the IoU (Intersection over Union) between the predicted target regression box R_pred and the ground-truth regression box R_gt is computed:
IoU = |R_pred ∩ R_gt| / |R_pred ∪ R_gt|
where R_gt denotes the ground-truth target regression box and R_pred denotes the predicted target regression box. To keep the test reliable, the per-frame accuracy is measured repeatedly and all results are averaged to obtain the final Accuracy value of the tracking algorithm; a larger Accuracy value indicates a more accurate tracking result.
Robustness measures the stability of the tracking algorithm by counting the number of frames in which the target is lost during tracking; a larger Robustness value means more lost frames and a less stable algorithm.
The EAO value jointly evaluates the accuracy and robustness of the algorithm and is the main index of its overall tracking performance. It is computed as follows. All video sequences are first grouped by length. The tracker under test is run on each sequence of length N_s from the first frame to the last, with no re-initialization after a tracking failure, which yields a per-frame accuracy Φ_i; averaging over the frames gives the accuracy of that sequence,
Φ̂_{N_s} = (1/N_s) · Σ_{i=1..N_s} Φ_i.
All sequences of length N_s are evaluated in this way and the per-sequence accuracies are averaged, giving the tracker's EAO value Φ̂(N_s) on sequences of length N_s. EAO values for sequences of other lengths are computed in the same manner. Finally, the values Φ̂(N_s) obtained for the different lengths N_s are averaged over the sequence-length range [N_lo, N_hi] to obtain a single scalar:
Φ̂ = (1 / (N_hi - N_lo + 1)) · Σ_{N_s = N_lo..N_hi} Φ̂(N_s).
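A simplified sketch of this EAO computation is given below; it assumes one run of per-frame overlaps per sequence and counts lengths with no sequence as zero, which is a simplification of the full VOT protocol:

```python
from collections import defaultdict

def expected_average_overlap(per_frame_overlaps, n_lo, n_hi):
    """per_frame_overlaps: list of per-sequence lists of frame overlaps Phi_i,
    each from a single run with no re-initialization. Sequences are grouped by
    length N_s, per-sequence mean overlaps are averaged within each length, and
    the per-length values are then averaged over the range [n_lo, n_hi]."""
    by_length = defaultdict(list)
    for seq in per_frame_overlaps:
        by_length[len(seq)].append(sum(seq) / len(seq))   # Phi_hat of one sequence
    eao_per_length = {n: sum(v) / len(v) for n, v in by_length.items()}
    # lengths with no sequence contribute zero here (a simplifying assumption)
    span = [eao_per_length.get(n, 0.0) for n in range(n_lo, n_hi + 1)]
    return sum(span) / len(span)

# toy usage: three sequences of lengths 3, 3 and 4
print(expected_average_overlap([[0.8, 0.7, 0.6], [0.5, 0.5, 0.5], [0.9, 0.9, 0.8, 0.7]], 3, 4))
```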
On the OTB dataset, one-pass evaluation is used with a precision plot and a success plot. For the precision plot, the distance between the center of the predicted target position and the ground-truth center is computed for each frame, and the fraction of frames whose distance is below a given threshold is reported; different thresholds give different precision values, and in this experiment the threshold ranges from 0 to 50 pixels. For the success plot, the IoU between the predicted and ground-truth regression boxes gives an overlap score per frame, and the fraction of frames whose score exceeds a given threshold is reported; in the experiments of the invention this threshold ranges from 0 to 1.
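A small sketch of these two curves follows, assuming per-frame centre errors and IoU values are already available; the threshold grids are illustrative:

```python
import numpy as np

def otb_curves(center_errors, overlaps):
    """One-pass OTB precision and success curves.
    center_errors: per-frame distance (pixels) between predicted and ground-truth centres.
    overlaps: per-frame IoU between predicted and ground-truth regression boxes."""
    center_errors = np.asarray(center_errors, dtype=float)
    overlaps = np.asarray(overlaps, dtype=float)
    dist_thresholds = np.arange(0, 51)            # 0-50 pixel thresholds
    iou_thresholds = np.linspace(0.0, 1.0, 21)    # 0-1 overlap thresholds
    precision = [(center_errors <= t).mean() for t in dist_thresholds]
    success = [(overlaps > t).mean() for t in iou_thresholds]
    return precision, success

# toy usage
prec, succ = otb_curves([3.0, 12.5, 40.0], [0.9, 0.55, 0.1])
print(prec[20], succ[10])   # precision at 20 px, success at IoU threshold 0.5
```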
303: comparison algorithm
In the comparative experiments, the invention is compared with seven mainstream algorithms on the VOT2016 dataset: two correlation filtering algorithms and five deep learning algorithms. The correlation filtering algorithms are C-COT [3] and ECO [4]; they track quickly but are less accurate than the deep learning algorithms. The deep learning algorithms include SiameseFC [5], SiameseRPN [6], DaSiamRPN [7] and SiamRPN++ [2]; these balance tracking speed and accuracy, meeting the real-time requirement while maintaining good accuracy and stability.
On the OTB100 dataset, eight mainstream algorithms are compared: six correlation filtering algorithms and two deep learning algorithms. The correlation filtering algorithms are ECO [4], SRDCF [8], BACF [9], SRDCFdecon [10], Staple [11] and LMCF [12]. The deep learning algorithms are SiamFCRes22 from SiamDW [13] and MemTrack [14].
Tables 1 and 2 give the objective evaluation results of the proposed method and the comparison algorithms on the VOT2016 and OTB100 datasets respectively (the best result for each index is shown in bold). As can be seen from Table 1, the Accuracy obtained by most deep learning methods and by the proposed method is generally higher than that of the correlation filtering trackers; the Robustness values are generally slightly higher than those of the correlation filtering algorithms; and the EAO index is clearly higher than that of the correlation filtering algorithms. Compared with the deep learning comparison algorithms, the proposed method obtains higher results on all three objective indexes, which objectively shows that it performs better than the comparison algorithms on the VOT2016 dataset. As can be seen from Table 2, the difference between the correlation filtering and deep learning algorithms on the Success and Precision indexes is not obvious, although the deep learning algorithms generally perform better; on both objective indexes measured on the OTB100 dataset, the proposed algorithm obtains better objective results than both the correlation filtering and the deep learning algorithms.
Experiments on the two databases verify that the proposed algorithm achieves better results, and also show that it can reduce the probability of tracking failure during tracking to a certain extent, effectively improve the precision of the tracking result, and strengthen the tracker's adaptability to non-rigid deformation, occlusion and rapid change of the target object, thereby improving the accuracy and stability of the tracking algorithm.
Table 1 (objective evaluation results on the VOT2016 dataset; values provided as an image in the original document)
Table 2 (objective evaluation results on the OTB100 dataset; values provided as an image in the original document)
Reference to the literature
[1] Pinheiro P O, Collobert R, Dollar P. Learning to Segment Object Candidates[C]//Advances in Neural Information Processing Systems, Dec. 7-12, 2015, Montreal, Canada, 1990-1998.
[2] Li B, Wu W, Wang Q, et al. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4277-4286.
[3] Danelljan M, Robinson A, Khan F S, et al. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking[C]//Proceedings of the 2016 European Conference on Computer Vision, Oct. 8-16, 2016, Amsterdam, the Netherlands. 2016, 472-488.
[4] Danelljan M, Gavves G, Khan F S, et al. Eco: Efficient convolution operators for tracking[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 21-26, 2017, Honolulu, HI, USA: IEEE, 2017, 79: 6931-6939.
[5] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional Siamese networks for object tracking[C]//2016 European Conference on Computer Vision Workshop, Oct. 8-10, 2016, Amsterdam, The Netherlands, 2016, 9914: 850-865.
[6] Li B, Yan J, Wu W, et al. High Performance Visual Tracking with Siamese Region Proposal Network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 8971-8980.
[7] Zhang Z, Peng H. Deeper and wider siamese networks for real-time visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4591-4600.
[8] Danelljan M, Hager G, Shahbaz Khan F, et al. Learning spatially regularized correlation filters for visual tracking[C]//IEEE International Conference on Computer Vision, Dec. 13-16, 2015, Santiago, Chile. 2015, pp: 4310-4318.
[9] Kiani Galoogahi H, Fagg A, Lucey S. Learning background-aware correlation filters for visual tracking[C]//IEEE International Conference on Computer Vision, Oct. 22-29, 2017, Venice, Italy. 2017, pp: 1135-1143.
[10] Danelljan M, Hager G, Khan F S, et al. Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 27-30, 2016, Las Vegas, NV, USA: IEEE, 2016, 59: 1430-1438.
[11] Bertinetto L, Valmadre J, Golodetz S, et al. Staple: Complementary learners for real-time tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 27-30, 2016, Las Vegas, NV, USA: IEEE, 2016, 237: 1401-1409.
[12] Wang M M, Liu Y, Huang Z. Large margin object tracking with circulant feature maps[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jul. 21-26, 2017, Honolulu, HI, USA. 2017, pp: 4021-4029.
[13] Zhang Z, Peng H. Deeper and wider siamese networks for real-time visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4591-4600.
[14] Yang T Y, Chan Antoni B. Learning Dynamic Memory Networks for Object Tracking[C]//European Conference on Computer Vision, Sep. 8-14, 2018, Munich, Germany. 2018, pp: 152-167.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A target tracking method for updating a template based on a segmented target mask is characterized by comprising the following steps:
constructing a basic network framework of target tracking;
initializing a network, inputting target regression frame parameters obtained by initialization into a mask segmentation module based on a DeepMask network frame in the basic network frame to obtain foreground information in the regression frame, generating an initialization target template with prominent foreground, and linearly overlapping the initialization target template;
inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground;
linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame;
and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.
2. The method of claim 1, wherein the basic network framework is:
a mask segmentation module based on the DeepMask network framework is added at the front end of the basic tracking framework based on SiamRPN++.
3. The method of claim 1, wherein the new target template is:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame.
4. The method of claim 3, wherein the step of updating the template of the object tracking based on the mask of the segmented object comprises,
A_0 = F_crop(DeepMask(x_0, y_0, bbox_0))
where x_0, y_0 and bbox_0 are, respectively, the horizontal and vertical coordinates of the initial center and the regression-box parameters produced in the target initialization stage; F_crop denotes the cropping function used to crop the target template from a video frame; DeepMask denotes the mask segmentation module based on the DeepMask network framework; and A_0 denotes the foreground-highlighted initialization target template;
A_i = F_crop(DeepMask(x_i, y_i, bbox_i)), computed when m | i
where m indicates that the foreground-highlighted subsequent-frame target template is updated once every m frames, x_i, y_i and bbox_i are, respectively, the target center coordinates and the regression-box parameters of the frame at the time of the update, F_crop denotes the cropping function, DeepMask denotes the mask segmentation module, and A_i denotes the foreground-highlighted subsequent-frame target template.
CN202010718018.0A 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template Active CN111968155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010718018.0A CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010718018.0A CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Publications (2)

Publication Number Publication Date
CN111968155A CN111968155A (en) 2020-11-20
CN111968155B true CN111968155B (en) 2022-05-17

Family

ID=73363922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010718018.0A Active CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Country Status (1)

Country Link
CN (1) CN111968155B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541944B (en) * 2020-12-10 2022-07-12 山东师范大学 Probability twin target tracking method and system based on conditional variational encoder
CN112927127A (en) * 2021-03-11 2021-06-08 华南理工大学 Video privacy data fuzzification method running on edge device
CN112991395B (en) * 2021-04-28 2022-04-15 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886994A (en) * 2019-01-11 2019-06-14 上海交通大学 Adaptive sheltering detection system and method in video tracking
CN110163887A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 The video target tracking method combined with foreground segmentation is estimated based on sport interpolation
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN110706254A (en) * 2019-09-19 2020-01-17 浙江大学 Target tracking template self-adaptive updating method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886994A (en) * 2019-01-11 2019-06-14 上海交通大学 Adaptive sheltering detection system and method in video tracking
CN110163887A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 The video target tracking method combined with foreground segmentation is estimated based on sport interpolation
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN110706254A (en) * 2019-09-19 2020-01-17 浙江大学 Target tracking template self-adaptive updating method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TSDM: Tracking by SiamRPN++ with a Depth refiner and a Mask-generator;Pengyao Zhao et al;《arxiv》;20200508;全文 *
Research on a target tracking method combining mask and Siamese network; 石胜斌 et al.; Computer Technology and Development; 2020-05-31; Vol. 30, No. 5; full text *

Also Published As

Publication number Publication date
CN111968155A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
CN108734151B (en) Robust long-range target tracking method based on correlation filtering and depth twin network
CN111968155B (en) Target tracking method based on segmented target mask updating template
CN109816689B (en) Moving target tracking method based on adaptive fusion of multilayer convolution characteristics
CN108846358B (en) Target tracking method for feature fusion based on twin network
CN110473231B (en) Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN112184752A (en) Video target tracking method based on pyramid convolution
CN105741316A (en) Robust target tracking method based on deep learning and multi-scale correlation filtering
CN111583300B (en) Target tracking method based on enrichment target morphological change update template
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN107730536B (en) High-speed correlation filtering object tracking method based on depth features
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN107657625A (en) Merge the unsupervised methods of video segmentation that space-time multiple features represent
CN108830170B (en) End-to-end target tracking method based on layered feature representation
CN108280844B (en) Video target positioning method based on area candidate frame tracking
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN113436227A (en) Twin network target tracking method based on inverted residual error
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN113052755A (en) High-resolution image intelligent matting method based on deep learning
CN111027586A (en) Target tracking method based on novel response map fusion
CN113902991A (en) Twin network target tracking method based on cascade characteristic fusion
Wang et al. Hierarchical spatiotemporal context-aware correlation filters for visual tracking
CN110310305A (en) A kind of method for tracking target and device based on BSSD detection and Kalman filtering
CN111462132A (en) Video object segmentation method and system based on deep learning
Zhang et al. Spatio-temporal matching for siamese visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant