CN111968155B - Target tracking method based on segmented target mask updating template - Google Patents


Info

Publication number
CN111968155B
Authority
CN
China
Prior art keywords
target
frame
template
target template
tracking
Prior art date
Legal status
Active
Application number
CN202010718018.0A
Other languages
Chinese (zh)
Other versions
CN111968155A (en)
Inventor
张静
郝志晖
刘婧
苏育挺
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202010718018.0A
Publication of CN111968155A
Application granted
Publication of CN111968155B
Legal status: Active

Classifications

    • G06T Image data processing or generation, in general (G Physics; G06 Computing; Calculating or Counting)
    • G06T 7/248 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T 7/194 Image analysis; Segmentation; Edge detection involving foreground-background segmentation
    • G06T 7/215 Image analysis; Motion-based segmentation
    • G06T 2207/10016 Image acquisition modality: Video; Image sequence
    • G06T 2207/20081 Special algorithmic details: Training; Learning
    • G06T 2207/20084 Special algorithmic details: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method based on a segmented target mask updating template, which comprises the following steps: constructing a basic network framework of target tracking; initializing a network, acquiring foreground information in a regression frame, generating an initialized target template with prominent foreground, and linearly overlapping the initialized target template; inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground; linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame; and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.

Description

Target tracking method based on segmented target mask updating template
Technical Field
The invention relates to the field of deep neural networks, in particular to a target tracking method based on a segmented target mask updating template under a deep twin network framework.
Background
With the rapid development of artificial intelligence, computer vision is being applied ever more widely across many fields and in daily life. Target tracking is an important branch of computer vision; it plays a significant role in application fields such as autonomous driving, human-computer interaction, pedestrian detection and precision weapon guidance, and it has very broad application prospects and profound research significance.
The basic task of target tracking can be summarized briefly as follows: the size and position of a target are given in the first frame of a video, and the algorithm computes the target's center position and size frame by frame in the subsequent frames, thereby tracking the target through the video. According to video length and whether a single target is tracked, target tracking algorithms fall mainly into four categories: long-video multi-target tracking, long-video single-target tracking, short-video multi-target tracking and short-video single-target tracking. Although target tracking has developed rapidly in recent years, it still faces many challenging problems, such as interference from similar targets, non-rigid deformation of the target, target size change, and in-plane and out-of-plane rotation of the target, all of which degrade tracking performance to different degrees. Meanwhile, practical applications demand real-time operation, so the speed of an algorithm is also a key index of its performance.
Target tracking algorithms have two important branches: correlation filtering algorithms and deep learning algorithms. Among deep learning algorithms in recent years, target tracking based on deep twin (Siamese) convolutional neural networks achieves a better balance of speed and accuracy and better stability than correlation filtering, and has attracted wide attention and development from researchers. Such algorithms are usually trained end to end on large-scale datasets and then track offline, so they lack the kind of online update strategy that adjusts the filter template in correlation filtering algorithms; as a result, tracking stability degrades to different degrees in application scenes where the target moves rapidly, becomes occluded or deforms non-rigidly. It is therefore necessary to add an online target-template updating mechanism under the deep twin convolutional neural network framework and improve the adaptability of the algorithm to complex application scenes.
The deep mask network (DeepMask [1]) is an instance segmentation model based on a VGG network; it realizes foreground-background segmentation, foreground semantic segmentation and foreground instance segmentation, and outputs a mask of the image foreground. A target tracking algorithm can essentially be understood as a two-class problem of separating image foreground from background within a search region, and a single-target tracking network needs to attend to the image foreground far more than to the background. Segmenting the image foreground, i.e. the mask of the target, therefore highlights the target to be tracked in the target template and makes it easier for the network to notice the target during tracking. Combined with an online updating mechanism, this greatly improves the adaptability of the tracking algorithm to complex scenes and the stability of its operation.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a target tracking method for updating a template based on a segmented target mask. The target tracking method improves the attention of a network to a target in an initialization frame by segmenting a mask of the target in a target template; meanwhile, a template online updating strategy is added in a subsequent frame, so that the adaptability of the algorithm to challenging problems in the tracking process is effectively improved.
The purpose of the invention is realized by the following technical scheme:
a target tracking method for updating a template based on a segmented target mask comprises the following steps:
constructing a basic network framework of target tracking;
initializing a network, inputting target regression frame parameters obtained by initialization into a mask segmentation module based on a DeepMask network frame in the basic network frame to obtain foreground information in the regression frame, generating an initialization target template with prominent foreground, and linearly overlapping the initialization target template;
inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground;
linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame;
and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.
Further, the basic network framework is:
a mask segmentation module based on the DeepMask network framework is added at the front end of the basic tracking framework based on SiamRPN++.
Further, the new target template is:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame.
Further,
A_0 = F_crop(DeepMask(x_0, y_0, bbox_0))
where x_0, y_0 and bbox_0 are the initial center coordinates and the regression-box parameters produced in the target initialization stage; F_crop denotes the cropping function used to crop the target template from a video frame; DeepMask denotes the mask segmentation module based on the DeepMask network framework; and A_0 denotes the foreground-highlighted initialization target template;
A_i = F_crop(DeepMask(x_i, y_i, bbox_i)), computed when m | i
where m indicates that the foreground-highlighted subsequent-frame target template is updated once every m frames, x_i, y_i and bbox_i are the target center coordinates and the regression-box parameters of the frame at the time of the update, F_crop denotes the cropping function, DeepMask denotes the mask segmentation module, and A_i denotes the foreground-highlighted subsequent-frame target template.
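For illustration, a minimal sketch of this linear superposition is given below, assuming the three template terms are already cropped to the same resolution and stored as arrays; the function name and toy shapes are illustrative rather than taken from the patent, and the default weights follow the values α = 0.03 and β = 0.005 given later in the detailed description:

```python
import numpy as np

def fuse_templates(T0, A0, Ai, alpha=0.03, beta=0.005):
    """New target template for the next frame: T_{i+1} = T_0 + alpha*A_0 + beta*A_i."""
    return T0 + alpha * A0 + beta * Ai

# toy usage with 127x127x3 templates
T0 = np.zeros((127, 127, 3))
A0 = np.ones_like(T0)
Ai = np.ones_like(T0)
T_next = fuse_templates(T0, A0, Ai)   # every entry becomes 0.03 + 0.005 = 0.035
```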
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The invention uses the SiamRPN++ [2] tracking framework as its basis and uses the DeepMask [1] network framework to segment the target mask during initialization. The target mask is used to generate attention on the foreground and to increase the weight inside the target initialization box, which raises the network's attention to the tracked object and helps the tracker predict the target position correctly.
2. By adding an online target-template updating mechanism, the invention alleviates the negative effects of challenging situations (large changes in target appearance, occlusion and the like) that a purely offline tracker cannot adapt to. The mechanism relies on the mask segmentation module based on the DeepMask [1] network framework to segment the target mask in subsequent frames and generate mask information about the target in those frames, which effectively improves the adaptability of the new target template to motion changes.
3. In the initialization stage, the method generates foreground attention from target mask information, which is more precise than axis-aligned regression-box parameters; in subsequent stages it adds a linear superposition of target-related foreground information, so that the newly generated target template is enriched with the target's recent motion information and the algorithm adapts better to target motion. On common benchmark datasets the proposed method achieves better experimental results.
Drawings
FIG. 1 is a flow chart of a method of object tracking based on updating a template of a segmented object mask;
FIG. 2 is a block diagram of the target mask segmentation network based on the DeepMask [1] network framework;
FIG. 3 is a block diagram of a tracking algorithm for updating a target template based on a segmented target mask.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
1. The embodiment of the invention provides a target tracking method for updating a template based on a segmented target mask, and referring to fig. 1, the method comprises the following steps:
101: Construct the basic network framework of the target tracking algorithm: first build the basic tracking framework based on SiamRPN++ [2], then add a mask segmentation module based on the DeepMask [1] network framework before the target template branch at the input of the network, forming the complete basic network framework.
The DeepMask [1] network framework consists of a VGG feature extraction network, a mask prediction branch and a category scoring branch. The invention omits the output of the category scoring branch and uses only the output of the mask prediction branch as the preliminary foreground segmentation result. The mask segmentation module is connected to the target template branch of the basic tracking network SiamRPN++ [2], forming the basic network framework of the method.
102: The invention first performs mask segmentation of the initialization target template: the initialization information is fed into the DeepMask [1]-based foreground highlighting module, which generates an initialization target template with a highlighted foreground.
Specifically, the target center position and regression-box parameters obtained in the initialization process are fed into the mask prediction branch of DeepMask [1] to generate a mask segmentation result within the box; a foreground-highlighted initialization target template is then computed from this result and finally superposed linearly with the initialization target template.
103: During tracking, the invention adds an online template updating strategy that helps the target template adapt to changes in target motion. The strategy updates the target template online by segmenting the foreground information in the target template of a subsequent frame and refreshing part of the information in the new target template once every m frames. After tracking begins, the algorithm counts the subsequent frames; whenever the interval reaches m frames, the target center position and regression-box information of that frame are obtained and fed into the mask extraction module to generate a foreground-highlighted subsequent-frame target template, which is linearly superposed with the initialization target template and the foreground-highlighted initialization template to produce the new template used by the next frame. The foreground-highlighted subsequent-frame target template then remains unchanged until the next update. In this invention, m = 30.
104: In the test stage, the algorithm is first initialized to obtain the first-frame target template and the search regions of subsequent frames. The initialization target template is passed through the mask extraction module to generate the two foreground-highlighted terms; the new template and the search region, cropped to a fixed size, are then fed into the feature extraction network together for feature matching, and the position offset and size of the target in the next frame are computed. Tracking then proceeds frame by frame with a frame counter, and the foreground-highlighted subsequent-frame template term in the new template is updated every 30 frames.
In summary, the embodiment of the present invention designs a target tracking method based on a segmented target mask update template through steps 101 to 104, increases the proportion of foreground information in an initialization process, and effectively improves the attention degree of a feature extraction network to a target; meanwhile, because the target motion has certain continuity in time and space, the target motion changes in recent frames are relatively similar, and the information of the previous frame has a certain guiding effect on the subsequent frame. By utilizing the characteristic, the invention adds an online updating strategy of the target template in the tracking process and adds the latest motion change information of the target in the target template. The strategy effectively improves the adaptability of the new target template to the motion change of the object, thereby effectively improving the performance of the algorithm.
2. The technical solutions of the above embodiments are further described below, and the details are described in the following:
201: In a short-video single-target tracking task, the center position and size of the target to be tracked are usually given in the first frame, and the algorithm computes the target's position offset and size change frame by frame in subsequent frames. In recent years, target tracking algorithms based on deep twin convolutional neural networks have generally adopted a two-branch feature matching framework, in which the target template branch extracts features of the target template containing the target to be tracked and matches them against the features of the search region. The information contained in the target template therefore has a direct impact on the final tracking result. The target template generally consists of the target to be tracked at the center and a small amount of surrounding background, i.e. the foreground and background of the target template. The target templates used by common twin networks are generated directly during initialization, and no weights are assigned to foreground and background information, so the network attends to foreground and background almost equally. The invention uses the DeepMask [1]-based network framework to highlight the foreground information in the target template and increase the proportion of the target to be tracked, so that the network attends to the target more easily and the performance of the tracking algorithm improves.
The invention adopts the deep twin convolutional neural network SiamRPN++ [2] as the basic tracking framework and adds a mask segmentation module based on the DeepMask [1] network framework before the target template branch. The module's input is the center position of the target to be tracked and the regression-box parameters, which are fed into the DeepMask [1] network. That network consists of three parts: a basic VGG feature extraction network, a mask prediction branch and a score prediction branch; the invention omits the output of the score prediction branch, so only two parts of the network are used. The VGG network comprises eight 3×3 convolution layers and four 2×2 max-pooling layers; the mask prediction branch comprises one 1×1 convolution layer and an up-sampling layer based on bilinear interpolation, and its output is the target image inside the regression box with the foreground highlighted. The basic tracking framework SiamRPN++ [2] extracts features with ResNet-50 and outputs the final tracking result through three cascaded RPN (Region Proposal Network) modules. In the invention, the target template branch input of the SiamRPN++ [2] network is the new target template formed by linearly superposing the initialization target template, the foreground-highlighted initialization target template and the foreground-highlighted subsequent-frame target template.
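A minimal PyTorch-style sketch of such a mask prediction network follows. The channel widths, the interleaving of the pooling layers and the sigmoid output are assumptions made for illustration, and the category scoring branch is omitted as in the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskSegmentationModule(nn.Module):
    """Sketch of a DeepMask-style mask branch: a VGG-style trunk with eight 3x3
    convolutions and four 2x2 max-pooling layers, followed by a 1x1 convolution
    and bilinear upsampling that outputs a foreground mask."""

    def __init__(self):
        super().__init__()
        chans = [3, 64, 64, 128, 128, 256, 256, 512, 512]
        layers = []
        for i in range(8):                      # eight 3x3 conv layers
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                       nn.ReLU(inplace=True)]
            if i % 2 == 1:                      # four 2x2 max-pool layers
                layers.append(nn.MaxPool2d(2))
        self.trunk = nn.Sequential(*layers)
        self.mask_head = nn.Conv2d(chans[-1], 1, kernel_size=1)  # 1x1 conv

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.trunk(x)
        mask = self.mask_head(feat)
        # bilinear upsampling back to the input resolution
        mask = F.interpolate(mask, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(mask)              # per-pixel foreground probability
```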
202: In the initialization stage of the algorithm, the target center position and regression-box parameters given in the first frame are fed into the mask segmentation module based on the DeepMask [1] network, which segments the foreground of the image inside the regression box. The foreground-highlighted image is pasted back over the image inside the original first-frame regression box, and the cropping function F_crop then crops the first frame, now carrying the segmentation mask, to obtain a foreground-highlighted target template, which is the foreground-highlighted initialization target template. The size of the template cropped by F_crop is given by the following formula:
A = s(w + p) × s(h + p)
where A is the size of the cropped target template, i.e. 127 × 127; w and h are the width and height of the regression box; s is the scale factor and p is the padding, with p = (w + h)/2. The foreground-highlighted initialization target template is denoted A_0.
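A small sketch of this crop-size computation follows; choosing s as the value that maps the padded box to 127 × 127, i.e. s = 127 / sqrt((w+p)(h+p)), is an assumption consistent with the formula above rather than a formula stated in the patent:

```python
import math

def template_crop_size(w, h, exemplar_size=127):
    """Side of the square region cropped around the target, following
    A = s(w + p) x s(h + p) with p = (w + h) / 2, then resized to 127 x 127.
    The closed-form choice of s is an assumption consistent with the formula."""
    p = (w + h) / 2.0
    context_side = math.sqrt((w + p) * (h + p))   # side of the padded crop in source pixels
    s = exemplar_size / context_side              # scale factor applied when resizing
    return context_side, s

# example: a 60 x 40 target box
side, s = template_crop_size(60, 40)
print(round(side, 1), round(s, 3))   # padded-crop side and the scale factor s
```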
During tracking, the appearance and motion state of the target change to some extent, so the information in the initial-frame target template becomes stale and cannot adapt well to those changes. To improve the template's adaptability to target appearance changes in subsequent frames and the network's attention to the target, the invention adds an online target-template updating mechanism. The mechanism relies on mask segmentation of the target in subsequent frames, which not only injects information about the target in the most recent frame into the initialization target template but also highlights the foreground of the subsequent frame, further raising the network's attention to the target. Analogously to the generation of the foreground-highlighted initialization template, every 30 frames the target center position and regression-box parameters output by the algorithm for that frame are fed into the mask segmentation module based on the DeepMask [1] network to obtain the foreground of the image inside that frame's regression box; the result is pasted back at the regression-box location in the frame and finally cropped with F_crop to obtain the foreground-highlighted target template of that frame, denoted A_i.
203: Because target motion is consistent in the temporal and spatial domains, the online template updating strategy is executed every 30 frames during testing. First, in the initialization stage, the initialization target template and the foreground-highlighted initialization target template are generated from the first frame, linearly superposed, and fed into the target template branch of SiamRPN++ [2] for feature extraction. When tracking reaches frame 30, the target center position and regression box output by the SiamRPN++ [2] tracking framework are fed into the mask segmentation module based on the DeepMask [1] network to obtain the foreground-highlighted target template of that frame, which is linearly superposed into the template. The new template is generated by the following formula:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame; in this invention α = 0.03 and β = 0.005. The new target template is fed into the target template branch of the tracking network, and the target position and regression-box size of the next frame are computed. The overall algorithm is expressed by the following formula:
(x, y, Δs)_{i+1} = S(T_{i+1}, R_{i+1})
where T_{i+1} denotes the new target template containing the foreground-highlighted initialization template term and the foreground-highlighted subsequent-frame template term, R_{i+1} denotes the search region of the next frame, S denotes the tracking algorithm, and (x, y, Δs)_{i+1} denotes the target position and the change in regression-box size in the next frame.
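The overall loop can be sketched as follows, assuming placeholder interfaces for the SiamRPN++-style tracker S, the DeepMask-based segmentation module and the cropping function F_crop; their exact signatures are illustrative assumptions rather than the patent's implementation:

```python
def track_sequence(frames, init_state, tracker, deepmask, f_crop,
                   alpha=0.03, beta=0.005, update_interval=30):
    """Sketch of the overall loop: the tracker S consumes the fused template and
    the next frame's search region; every `update_interval` frames the
    foreground-highlighted term A_i is regenerated from the current prediction."""
    x, y, box = init_state  # centre coordinates and regression-box parameters from frame 0
    T0 = f_crop(frames[0], x, y, box)                       # initialization template
    A0 = f_crop(deepmask(frames[0], x, y, box), x, y, box)  # foreground-highlighted init template
    Ai = A0                                                 # until the first update
    results = []
    for i, frame in enumerate(frames[1:], start=1):
        T = T0 + alpha * A0 + beta * Ai                     # new template for this frame
        x, y, box = tracker(T, frame)                       # (x, y, Δs)_{i+1} = S(T_{i+1}, R_{i+1})
        results.append((x, y, box))
        if i % update_interval == 0:                        # m = 30 in the description
            Ai = f_crop(deepmask(frame, x, y, box), x, y, box)
    return results
```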
3. The effect of the above embodiment is evaluated below with reference to specific experimental data:
301: data composition
The test set consists of all video sequences in VOT2016 and OTB100 data sets, wherein the VOT2016 data set comprises 60 videos which are all color sequences; the OTB100 data set contains 100 videos, of which there are 75 color sequences, 25 grayscale sequences.
302: evaluation criterion
The performance evaluation method adopts different evaluation indexes to evaluate the performance of the algorithm on the VOT2016 data set and the OTB100 data set respectively.
On the VOT2016 dataset, three evaluation indexes are used to assess the performance of a tracking algorithm: Accuracy, Robustness and EAO (Expected Average Overlap).
Accuracy measures tracking precision. For each frame of a video sequence, the IoU (Intersection over Union) between the predicted target regression box R_pred and the ground-truth regression box R_gt is computed:
IoU = |R_pred ∩ R_gt| / |R_pred ∪ R_gt|
where R_gt denotes the ground-truth target regression box and R_pred denotes the predicted target regression box. To keep the test reliable, the per-frame accuracy is measured repeatedly and all results are averaged to obtain the final Accuracy value of the tracking algorithm; a larger Accuracy value indicates a more accurate tracking result.
Robustness measures the stability of the tracking algorithm by counting the number of frames in which the target is lost during tracking; a larger Robustness value means more lost frames and a less stable algorithm.
The EAO value jointly evaluates the accuracy and robustness of the algorithm and is the main index of its overall tracking performance. It is computed as follows. All video sequences are first grouped by length. The tracker under test is run on each sequence of length N_s from the first frame to the last, with no re-initialization after a tracking failure, which yields a per-frame accuracy Φ_i; averaging over the frames gives the accuracy of that sequence,
Φ̂_{N_s} = (1/N_s) · Σ_{i=1..N_s} Φ_i.
All sequences of length N_s are evaluated in this way and the per-sequence accuracies are averaged, giving the tracker's EAO value Φ̂(N_s) on sequences of length N_s. EAO values for sequences of other lengths are computed in the same manner. Finally, the values Φ̂(N_s) obtained for the different lengths N_s are averaged over the sequence-length range [N_lo, N_hi] to obtain a single scalar:
Φ̂ = (1 / (N_hi - N_lo + 1)) · Σ_{N_s = N_lo..N_hi} Φ̂(N_s).
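A simplified sketch of this EAO computation is given below; it assumes one run of per-frame overlaps per sequence and counts lengths with no sequence as zero, which is a simplification of the full VOT protocol:

```python
from collections import defaultdict

def expected_average_overlap(per_frame_overlaps, n_lo, n_hi):
    """per_frame_overlaps: list of per-sequence lists of frame overlaps Phi_i,
    each from a single run with no re-initialization. Sequences are grouped by
    length N_s, per-sequence mean overlaps are averaged within each length, and
    the per-length values are then averaged over the range [n_lo, n_hi]."""
    by_length = defaultdict(list)
    for seq in per_frame_overlaps:
        by_length[len(seq)].append(sum(seq) / len(seq))   # Phi_hat of one sequence
    eao_per_length = {n: sum(v) / len(v) for n, v in by_length.items()}
    # lengths with no sequence contribute zero here (a simplifying assumption)
    span = [eao_per_length.get(n, 0.0) for n in range(n_lo, n_hi + 1)]
    return sum(span) / len(span)

# toy usage: three sequences of lengths 3, 3 and 4
print(expected_average_overlap([[0.8, 0.7, 0.6], [0.5, 0.5, 0.5], [0.9, 0.9, 0.8, 0.7]], 3, 4))
```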
On the OTB dataset, one-pass evaluation is used with a precision plot and a success plot. For the precision plot, the distance between the center of the predicted target position and the ground-truth center is computed for each frame, and the fraction of frames whose distance is below a given threshold is reported; different thresholds give different precision values, and in this experiment the threshold ranges from 0 to 50 pixels. For the success plot, the IoU between the predicted and ground-truth regression boxes gives an overlap score per frame, and the fraction of frames whose score exceeds a given threshold is reported; in the experiments of the invention this threshold ranges from 0 to 1.
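A small sketch of these two curves follows, assuming per-frame centre errors and IoU values are already available; the threshold grids are illustrative:

```python
import numpy as np

def otb_curves(center_errors, overlaps):
    """One-pass OTB precision and success curves.
    center_errors: per-frame distance (pixels) between predicted and ground-truth centres.
    overlaps: per-frame IoU between predicted and ground-truth regression boxes."""
    center_errors = np.asarray(center_errors, dtype=float)
    overlaps = np.asarray(overlaps, dtype=float)
    dist_thresholds = np.arange(0, 51)            # 0-50 pixel thresholds
    iou_thresholds = np.linspace(0.0, 1.0, 21)    # 0-1 overlap thresholds
    precision = [(center_errors <= t).mean() for t in dist_thresholds]
    success = [(overlaps > t).mean() for t in iou_thresholds]
    return precision, success

# toy usage
prec, succ = otb_curves([3.0, 12.5, 40.0], [0.9, 0.55, 0.1])
print(prec[20], succ[10])   # precision at 20 px, success at IoU threshold 0.5
```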
303: comparison algorithm
In the comparative experiments, the invention is compared with seven mainstream algorithms on the VOT2016 dataset: two correlation filtering algorithms and five deep learning algorithms. The correlation filtering algorithms are C-COT [3] and ECO [4]; they track quickly but are less accurate than the deep learning algorithms. The deep learning algorithms include SiameseFC [5], SiameseRPN [6], DaSiamRPN [7] and SiamRPN++ [2]; these balance tracking speed and accuracy, meeting the real-time requirement while maintaining good accuracy and stability.
On the OTB100 dataset, eight mainstream algorithms are compared: six correlation filtering algorithms and two deep learning algorithms. The correlation filtering algorithms are ECO [4], SRDCF [8], BACF [9], SRDCFdecon [10], Staple [11] and LMCF [12]. The deep learning algorithms are SiamFCRes22 from SiamDW [13] and MemTrack [14].
Tables 1 and 2 give the objective evaluation results of the proposed method and the comparison algorithms on the VOT2016 and OTB100 datasets respectively (the best result for each index is shown in bold). As can be seen from Table 1, the Accuracy obtained by most deep learning methods and by the proposed method is generally higher than that of the correlation filtering trackers; the Robustness values are generally slightly higher than those of the correlation filtering algorithms; and the EAO index is clearly higher than that of the correlation filtering algorithms. Compared with the deep learning comparison algorithms, the proposed method obtains higher results on all three objective indexes, which objectively shows that it performs better than the comparison algorithms on the VOT2016 dataset. As can be seen from Table 2, the difference between the correlation filtering and deep learning algorithms on the Success and Precision indexes is not obvious, although the deep learning algorithms generally perform better; on both objective indexes measured on the OTB100 dataset, the proposed algorithm obtains better objective results than both the correlation filtering and the deep learning algorithms.
Experiments on the two databases verify that the proposed algorithm achieves better results, and also show that it can reduce the probability of tracking failure during tracking to a certain extent, effectively improve the precision of the tracking result, and strengthen the tracker's adaptability to non-rigid deformation, occlusion and rapid change of the target object, thereby improving the accuracy and stability of the tracking algorithm.
Table 1 (objective evaluation results on the VOT2016 dataset; values provided as an image in the original document)
Table 2 (objective evaluation results on the OTB100 dataset; values provided as an image in the original document)
Reference to the literature
[1] Pinheiro P O, Collobert R, Dollar P. Learning to Segment Object Candidates[C]//Advances in Neural Information Processing Systems, Dec. 7-12, 2015, Montreal, Canada, 1990-1998.
[2] Li B, Wu W, Wang Q, et al. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4277-4286.
[3] Danelljan M, Robinson A, Khan F S, et al. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking[C]//Proceedings of the 2016 European Conference on Computer Vision, Oct. 8-16, 2016, Amsterdam, the Netherlands. 2016, 472-488.
[4] Danelljan M, Gavves G, Khan F S, et al. Eco: Efficient convolution operators for tracking[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 21-26, 2017, Honolulu, HI, USA: IEEE, 2017, 79: 6931-6939.
[5] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional Siamese networks for object tracking[C]//2016 European Conference on Computer Vision Workshop, Oct. 8-10, 2016, Amsterdam, The Netherlands, 2016, 9914: 850-865.
[6] Li B, Yan J, Wu W, et al. High Performance Visual Tracking with Siamese Region Proposal Network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 8971-8980.
[7] Zhang Z, Peng H. Deeper and wider siamese networks for real-time visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4591-4600.
[8] Danelljan M, Hager G, Shahbaz Khan F, et al. Learning spatially regularized correlation filters for visual tracking[C]//IEEE International Conference on Computer Vision, Dec. 13-16, 2015, Santiago, Chile. 2015, pp: 4310-4318.
[9] Kiani Galoogahi H, Fagg A, Lucey S. Learning background-aware correlation filters for visual tracking[C]//IEEE International Conference on Computer Vision, Oct. 22-29, 2017, Venice, Italy. 2017, pp: 1135-1143.
[10] Danelljan M, Hager G, Khan F S, et al. Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 27-30, 2016, Las Vegas, NV, USA: IEEE, 2016, 59: 1430-1438.
[11] Bertinetto L, Valmadre J, Golodetz S, et al. Staple: Complementary learners for real-time tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 27-30, 2016, Las Vegas, NV, USA: IEEE, 2016, 237: 1401-1409.
[12] Wang M M, Liu Y, Huang Z. Large margin object tracking with circulant feature maps[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jul. 21-26, 2017, Honolulu, HI, USA. 2017, pp: 4021-4029.
[13] Zhang Z, Peng H. Deeper and wider siamese networks for real-time visual tracking[C]//IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, USA. 2019, pp: 4591-4600.
[14] Yang T Y, Chan Antoni B. Learning Dynamic Memory Networks for Object Tracking[C]//European Conference on Computer Vision, Sep. 8-14, 2018, Munich, Germany. 2018, pp: 152-167.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A target tracking method for updating a template based on a segmented target mask is characterized by comprising the following steps:
constructing a basic network framework of target tracking;
initializing a network, inputting target regression frame parameters obtained by initialization into a mask segmentation module based on a DeepMask network frame in the basic network frame to obtain foreground information in the regression frame, generating an initialization target template with prominent foreground, and linearly overlapping the initialization target template;
inputting the linear superposition result into a target template branch of a tracking module to obtain the central position and the size of the target of the next frame; every m frames, the network calculates the target template corresponding to the frame through the target center point of the corresponding frame and the regression frame parameters, and inputs the target template of the frame into the mask segmentation module to generate the subsequent frame target template with the prominent foreground;
linearly superposing the initialization target template, the initialization target template with the outstanding foreground and the subsequent frame target template with the outstanding foreground to generate a new target template used by the next frame;
and inputting the target template into a tracking network frame, and calculating the central position and the size of the target of the next frame.
2. The method of claim 1, wherein the basic network framework is:
a mask segmentation module based on the DeepMask network framework is added at the front end of the basic tracking framework based on SiamRPN++.
3. The method of claim 1, wherein the new target template is:
T_{i+1} = T_0 + α·A_0 + β·A_i
where T_0 denotes the initialization target template, A_0 denotes the foreground-highlighted initialization target template, A_i denotes the foreground-highlighted subsequent-frame target template, α and β are hyper-parameters, and T_{i+1} denotes the new target template used for tracking the next frame.
4. The method of claim 3, wherein the step of updating the template of the object tracking based on the mask of the segmented object comprises,
A_0 = F_crop(DeepMask(x_0, y_0, bbox_0))
where x_0, y_0 and bbox_0 are, respectively, the horizontal and vertical coordinates of the initial center and the regression-box parameters produced in the target initialization stage; F_crop denotes the cropping function used to crop the target template from a video frame; DeepMask denotes the mask segmentation module based on the DeepMask network framework; and A_0 denotes the foreground-highlighted initialization target template;
A_i = F_crop(DeepMask(x_i, y_i, bbox_i)), computed when m | i
where m indicates that the foreground-highlighted subsequent-frame target template is updated once every m frames, x_i, y_i and bbox_i are, respectively, the target center coordinates and the regression-box parameters of the frame at the time of the update, F_crop denotes the cropping function, DeepMask denotes the mask segmentation module, and A_i denotes the foreground-highlighted subsequent-frame target template.
CN202010718018.0A 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template Active CN111968155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010718018.0A CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010718018.0A CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Publications (2)

Publication Number Publication Date
CN111968155A CN111968155A (en) 2020-11-20
CN111968155B true CN111968155B (en) 2022-05-17

Family

ID=73363922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010718018.0A Active CN111968155B (en) 2020-07-23 2020-07-23 Target tracking method based on segmented target mask updating template

Country Status (1)

Country Link
CN (1) CN111968155B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541944B (en) * 2020-12-10 2022-07-12 山东师范大学 Probability twin target tracking method and system based on conditional variational encoder
CN112927127A (en) * 2021-03-11 2021-06-08 华南理工大学 Video privacy data fuzzification method running on edge device
CN112991395B (en) * 2021-04-28 2022-04-15 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886994A (en) * 2019-01-11 2019-06-14 上海交通大学 Adaptive sheltering detection system and method in video tracking
CN110163887A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 The video target tracking method combined with foreground segmentation is estimated based on sport interpolation
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN110706254A (en) * 2019-09-19 2020-01-17 浙江大学 Target tracking template self-adaptive updating method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886994A (en) * 2019-01-11 2019-06-14 上海交通大学 Adaptive sheltering detection system and method in video tracking
CN110163887A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 The video target tracking method combined with foreground segmentation is estimated based on sport interpolation
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN110706254A (en) * 2019-09-19 2020-01-17 浙江大学 Target tracking template self-adaptive updating method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TSDM: Tracking by SiamRPN++ with a Depth refiner and a Mask-generator;Pengyao Zhao et al;《arxiv》;20200508;全文 *
Research on a target tracking method combining mask and Siamese network; 石胜斌 et al.; Computer Technology and Development; 2020-05-31; Vol. 30, No. 5; full text *

Also Published As

Publication number Publication date
CN111968155A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
CN108734151B (en) Robust long-range target tracking method based on correlation filtering and depth twin network
CN111968155B (en) Target tracking method based on segmented target mask updating template
CN109816689B (en) Moving target tracking method based on adaptive fusion of multilayer convolution characteristics
CN108846358B (en) Target tracking method for feature fusion based on twin network
CN110473231B (en) Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN112184752A (en) Video target tracking method based on pyramid convolution
CN105741316A (en) Robust target tracking method based on deep learning and multi-scale correlation filtering
CN111583300B (en) Target tracking method based on enrichment target morphological change update template
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN107730536B (en) High-speed correlation filtering object tracking method based on depth features
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN107657625A (en) Merge the unsupervised methods of video segmentation that space-time multiple features represent
CN108830170B (en) End-to-end target tracking method based on layered feature representation
CN108280844B (en) Video target positioning method based on area candidate frame tracking
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN113436227A (en) Twin network target tracking method based on inverted residual error
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN113052755A (en) High-resolution image intelligent matting method based on deep learning
CN111027586A (en) Target tracking method based on novel response map fusion
CN113902991A (en) Twin network target tracking method based on cascade characteristic fusion
Wang et al. Hierarchical spatiotemporal context-aware correlation filters for visual tracking
CN110310305A (en) A kind of method for tracking target and device based on BSSD detection and Kalman filtering
CN111462132A (en) Video object segmentation method and system based on deep learning
Zhang et al. Spatio-temporal matching for siamese visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant