CN110135365B - Robust object tracking method based on hallucination adversarial network - Google Patents

Robust object tracking method based on hallucination adversarial network

Info

Publication number
CN110135365B
CN110135365B CN201910418050.4A CN201910418050A
Authority
CN
China
Prior art keywords
samples
target
hallucination
deformation
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910418050.4A
Other languages
Chinese (zh)
Other versions
CN110135365A (en)
Inventor
王菡子
吴强强
严严
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201910418050.4A priority Critical patent/CN110135365B/en
Publication of CN110135365A publication Critical patent/CN110135365A/en
Application granted granted Critical
Publication of CN110135365B publication Critical patent/CN110135365B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A robust target tracking method based on a hallucination adversarial network, relating to computer vision technology. First, a new hallucination adversarial network is proposed that learns the nonlinear deformation between sample pairs and applies the learned deformation to a new target to generate new deformed target samples. To train the proposed hallucination adversarial network effectively, a deformation reconstruction loss is proposed. Based on the offline-trained hallucination adversarial network, a target tracking method is proposed that effectively alleviates the overfitting caused by online updates of the deep neural network during tracking. In addition, to further improve the quality of deformation transfer, a selective deformation transfer method is proposed, which further improves tracking accuracy. The proposed target tracking method achieves competitive results on current mainstream target tracking datasets.

Description

Robust object tracking method based on a hallucination adversarial network

Technical Field

The invention relates to computer vision technology, and in particular to a robust target tracking method based on a hallucination adversarial network.

Background Art

In recent years, the application of deep neural networks in the field of computer vision has achieved great success. As one of the basic problems in computer vision, target tracking plays a very important role in many current computer vision tasks, such as autonomous driving, augmented reality and robotics. Recently, research on target tracking algorithms based on deep neural networks has received extensive attention from researchers at home and abroad. However, unlike other computer vision tasks (such as object detection and semantic segmentation), the application of deep neural networks to target tracking remains far from fully effective. The main reason is that the target tracking task is special in that it lacks diverse online target training samples, which greatly limits the generalization ability of deep neural networks and in turn affects the tracking results. At the same time, the target tracking task aims to track arbitrary targets and provides no prior knowledge about the target to be tracked, which also poses a great challenge for the selection of offline training datasets for deep neural networks. Therefore, it is of great practical significance to propose a deep-neural-network-based target tracking algorithm with strong generalization ability.

To alleviate the above problems, researchers at home and abroad have mainly proposed two types of solutions. The first type regards target tracking as a template-matching problem and is usually implemented with a deep Siamese network: the target template and the search region are fed into the network simultaneously, and the sub-region of the search region most similar to the target template is returned. Similarity-based deep Siamese networks can be trained completely offline on large annotated tracking datasets, so they avoid the overfitting caused by having too few online training samples. The pioneering algorithm of this family is SiamFC. Based on SiamFC, researchers have proposed many improved algorithms, including SiamRPN with a region proposal network, MemSiamFC with a dynamic memory network, and SiamRPN++ with a deeper backbone network. Since SiamFC-style trackers avoid time-consuming online training, they often reach tracking speeds far beyond real time. However, because such algorithms lack online learning of target appearance changes, their accuracy is still rather limited (for example, on the OTB dataset). The second type of method aims to learn a robust neural-network classifier from limited online samples. The general idea is to use transfer-learning techniques to alleviate overfitting; a representative method is MDNet, proposed by H. Nam et al. in 2016. MDNet first uses multi-domain offline learning to obtain good initial classifier parameters, and then further trains the classifier during tracking by collecting positive and negative samples of the target. Recently, based on MDNet, researchers have proposed VITAL, which uses adversarial learning, BranchOut, which learns target representations at different levels, and SANet, which uses RNNs. Compared with the first type, these methods achieve higher tracking accuracy. However, the extremely limited online samples (especially target samples) severely restrict online learning, which still tends to cause overfitting and in turn degrades tracking performance. Therefore, it is of great significance to design a simple and effective method to alleviate the overfitting that deep trackers suffer during tracking.

Compared with current object tracking algorithms, humans can track moving objects with ease. Although the mechanism of the human brain has not yet been fully explored, it is certain that, through previous learning experience, the human brain has developed an unparalleled imagination mechanism. Humans can learn similar actions or transformations from the various things they see in daily life and apply these transformations to different objects, thereby imagining what a new object would look like in different poses or actions. Such an imagination mechanism is very similar to data augmentation in machine learning: the human brain can be compared to a visual classifier, and the imagination mechanism is used to obtain target samples in different states, so that a robust visual classifier can be trained.

Summary of the Invention

The purpose of the present invention is to provide a robust target tracking method based on a hallucination adversarial network.

The present invention includes the following steps:

1) Collect a large number of deformation sample pairs from an annotated target tracking dataset as the training sample set.

In step 1), the specific process of collecting a large number of deformation sample pairs from the annotated target tracking dataset as the training sample set may be as follows: target sample pairs are collected from annotated video sequences, where the two samples of a pair contain the same target. In video sequence a, a target sample x_t^a is first selected in frame t, and the target sample of a randomly chosen frame among the following 20 frames is then taken as x_{t'}^a, forming a deformation sample pair (x_t^a, x_{t'}^a). A large number of such deformation sample pairs are collected to form the training sample set. The dataset used is the ILSVRC-2015 video object detection dataset released by Fei-Fei Li et al. in 2015.
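A minimal Python sketch of this pair-collection step, assuming the annotations are already available as per-frame target crops grouped by video; the data layout and function name are illustrative assumptions rather than part of the original disclosure.

```python
import random

def collect_deformation_pairs(video_annotations, max_gap=20):
    """Collect deformation sample pairs (x_t, x_t') from annotated videos.

    video_annotations: dict mapping a video id to an ordered list of target
    crops, one per frame (assumed layout; the patent names the ILSVRC-2015
    video object detection annotations as the source).
    """
    pairs = []
    for video_id, crops in video_annotations.items():
        for t in range(len(crops) - 1):
            # pair x_t with the target crop of a random frame among the next 20 frames
            t_later = random.randint(t + 1, min(t + max_gap, len(crops) - 1))
            pairs.append((crops[t], crops[t_later]))
    return pairs
```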

2) Perform feature extraction on all samples in the training sample set obtained in step 1) to obtain the training sample feature set.

In step 2), the feature extraction step may be as follows: the target samples are first resized to 107×107×3 by bilinear interpolation, and features of all interpolated target samples are then extracted with a neural network feature extractor φ(·). The structure of the feature extractor φ(·) may be the first three convolutional layers of a VGG-M model pre-trained on the ImageNet dataset.
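As an illustration of step 2), the following PyTorch sketch performs the resize-and-extract operation; the feature extractor is assumed to be the first three convolutional layers of a pre-trained VGG-M loaded elsewhere (torchvision does not ship VGG-M, so the loading code is omitted).

```python
import torch
import torch.nn.functional as F

def extract_features(crops, feature_extractor):
    """crops: (N, 3, H, W) float tensor of target image patches.

    Resizes each patch to 107x107 with bilinear interpolation, applies the
    convolutional feature extractor phi and returns flattened feature vectors.
    """
    resized = F.interpolate(crops, size=(107, 107), mode='bilinear',
                            align_corners=False)
    with torch.no_grad():
        feats = feature_extractor(resized)
    return feats.flatten(start_dim=1)  # 512*3*3 = 4608-dim for VGG-M conv3 features
```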

3) Train the proposed hallucination adversarial network offline using the training sample feature set obtained in step 2), the adversarial loss, and the proposed deformation reconstruction loss.

In step 3), the training process may be as follows: two training sample feature pairs are first selected from the training sample feature set, denoted (φ(x_t^a), φ(x_{t'}^a)) and (φ(x_t^b), φ(x_{t'}^b)). The hallucination adversarial network learns the deformation between φ(x_t^a) and φ(x_{t'}^a) and applies this deformation to φ(x_t^b) to generate new deformed samples of target b. An adversarial loss is used to ensure that the distribution of the generated samples is close to the distribution of target b:

l_adv,  (Formula 1; the full expression is given as an image in the original document)

where the hallucinated sample is produced by the encoder and decoder, En and De denoting the encoder and decoder parts of the proposed hallucination adversarial network, respectively. To make the generated samples effectively encode the deformation z^a, a deformation reconstruction loss is proposed to constrain the generated samples:

l_def,  (Formula 2; the full expression is given as an image in the original document)

Finally, the total loss function used for offline training of the proposed hallucination adversarial network is:

l_all = l_adv + λ·l_def,  (Formula 3)

where λ is a hyperparameter used to balance the two losses.

The offline training of the hallucination adversarial network may include the following sub-steps:

3.1 The parameter λ in Formula 3 is set to 0.5;

3.2 In training, the optimizer used is Adam (D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proceedings of the International Conference on Learning Representations, 2014), the number of iterations is 5×10^5, and the learning rate is 2×10^-4;

3.3 The encoder and decoder of the proposed hallucination adversarial network are both three-layer perceptrons with 2048 hidden-layer nodes; the encoder has 9216 input nodes and 64 output nodes, and the decoder has 4672 input nodes. The discriminator network is also a three-layer perceptron with 2048 hidden-layer nodes, 9216 input nodes, and 1 output node.
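To make sub-step 3.3 concrete, below is a PyTorch sketch of networks with exactly these node counts; the 9216-node encoder input and 4672-node decoder input are consistent with concatenating two 4608-dimensional features, and a 64-dimensional deformation code with one 4608-dimensional feature, respectively. The exact expressions of l_adv and l_def are given only as images in the patent, so the GAN-style adversarial term, the deformation re-encoding L2 term, and the pairing fed to the discriminator used here are assumptions.

```python
import torch
import torch.nn as nn

FEAT_DIM, CODE_DIM, HIDDEN = 4608, 64, 2048   # dimensions implied by the node counts above

def mlp(in_dim, out_dim):
    # three-layer perceptron with 2048 hidden-layer nodes
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, out_dim))

class Hallucinator(nn.Module):
    """Encodes the deformation of a sample pair of target a (encoder En) and
    applies it to a sample of target b (decoder De)."""
    def __init__(self):
        super().__init__()
        self.encoder = mlp(2 * FEAT_DIM, CODE_DIM)           # 9216 -> 2048 -> 2048 -> 64
        self.decoder = mlp(CODE_DIM + FEAT_DIM, FEAT_DIM)    # 4672 -> 2048 -> 2048 -> 4608

    def forward(self, fa1, fa2, fb1):
        z_a = self.encoder(torch.cat([fa1, fa2], dim=1))     # deformation code z^a
        return self.decoder(torch.cat([z_a, fb1], dim=1))    # hallucinated feature of target b

discriminator = mlp(2 * FEAT_DIM, 1)   # 9216 -> ... -> 1; the input pairing is an assumption
bce = nn.BCEWithLogitsLoss()

def hallucinator_loss(hal, disc, fa1, fa2, fb1, fb2, lam=0.5):
    fake_b2 = hal(fa1, fa2, fb1)
    # adversarial term: push hallucinated samples toward the distribution of target b
    fake_logit = disc(torch.cat([fb1, fake_b2], dim=1))
    l_adv = bce(fake_logit, torch.ones_like(fake_logit))
    # deformation reconstruction term (assumed form): the deformation re-encoded from
    # the hallucinated pair should match the original deformation code z^a
    z_a = hal.encoder(torch.cat([fa1, fa2], dim=1))
    z_rec = hal.encoder(torch.cat([fb1, fake_b2], dim=1))
    l_def = ((z_rec - z_a) ** 2).mean()
    return l_adv + lam * l_def         # l_all = l_adv + lambda * l_def  (Formula 3)
```

The corresponding discriminator loss on real versus hallucinated pairs, and the Adam optimizer with learning rate 2×10^-4 over 5×10^5 iterations from sub-step 3.2, would be set up alongside this sketch.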

4) Given the annotated image of the first frame of the test video, collect target samples and sample positive and negative samples around the target samples using Gaussian and random sampling.

In step 4), the sampling details may be as follows: in each training iteration, positive and negative samples are sampled at a ratio of 1:3, i.e. 32 positive samples and 96 negative samples. The criterion for positive samples is that the region overlap ratio between the sampled sample and the target sample is greater than 0.7, and the criterion for negative samples is that the region overlap ratio between the sampled sample and the target sample is lower than 0.5.
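A sketch of the overlap-gated sampling described above; only the 32/96 counts and the 0.7/0.5 overlap thresholds come from the patent, while the Gaussian scale used for positives and the uniform ranges used for negatives are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def sample_train_boxes(target, n_pos=32, n_neg=96):
    x, y, w, h = target
    pos, neg = [], []
    while len(pos) < n_pos:
        # Gaussian perturbation around the target for positive candidates
        dx, dy = np.random.randn(2) * 0.05 * np.array([w, h])
        s = np.exp(np.random.randn() * 0.05)
        cand = (x + dx, y + dy, w * s, h * s)
        if iou(target, cand) > 0.7:
            pos.append(cand)
    while len(neg) < n_neg:
        # uniform (random) sampling in a wider neighbourhood for negative candidates
        dx, dy = np.random.uniform(-1.5, 1.5, 2) * np.array([w, h])
        s = np.exp(np.random.uniform(-0.7, 0.7))
        cand = (x + dx, y + dy, w * s, h * s)
        if iou(target, cand) < 0.5:
            neg.append(cand)
    return pos, neg
```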

5) Select the sample pairs to be transferred for the tracked target using the proposed selective deformation transfer method.

In step 5), the selection of the sample pairs to be transferred may proceed as follows: let N_s denote the number of video clips in the dataset used to collect deformation sample pairs, and let s_i be the identifier of a video clip, where i = 1, …, N_s and n_i denotes the number of samples in video clip s_i. The feature representation ψ(s_i) of video clip s_i is computed from the deep features of its samples (the defining expression is given as an image in the original document), where the deep features are produced by a deep feature extractor. For the deep feature of the tracked target, the Euclidean distance to each clip representation ψ(s_i) is computed, and the T video clips with the smallest distances are selected. From the selected T video clips, a large number of deformation sample pairs are collected in the same way as in step 1), forming the set D_S used for subsequent target deformation transfer.

The selective deformation transfer method may include the following sub-steps:

5.1 When computing the feature representation of a video clip, the deep feature extractor used is a ResNet-34 model with the fully connected layer removed;

5.2 When selecting similar video clips, the parameter T is set to 2×10^3.
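A sketch of the clip selection in the selective deformation transfer method; ψ(s_i) is assumed here to be the mean deep feature of a clip's samples (the defining expression is only an image in the patent), and the dictionary layout is illustrative.

```python
import torch

def clip_representation(sample_feats):
    """psi(s_i): pooled deep feature of all samples in clip s_i (mean pooling assumed)."""
    return torch.stack(sample_feats).mean(dim=0)

def select_transfer_pairs(target_feat, clip_feats, clip_pairs, T=2000):
    """Pick the T video clips whose representation is closest (Euclidean distance)
    to the tracked target's deep feature, then pool their deformation sample
    pairs into the transfer set D_S.

    clip_feats: dict clip_id -> psi(s_i) tensor
    clip_pairs: dict clip_id -> list of deformation sample pairs from that clip
    """
    dists = {cid: torch.norm(feat - target_feat).item()
             for cid, feat in clip_feats.items()}
    nearest = sorted(dists, key=dists.get)[:T]
    d_s = []
    for cid in nearest:
        d_s.extend(clip_pairs[cid])
    return d_s
```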

6) Generate deformed positive samples with the offline-trained hallucination adversarial network, based on the selected sample pairs to be transferred.

In step 6), the specific steps of generating deformed positive samples with the offline-trained hallucination adversarial network may be as follows: in each training iteration, 64 sample pairs are randomly selected from the set D_S; each pair, together with the target sample, is fed into the hallucination adversarial network to generate a corresponding deformed sample. In total, 64 positive samples are generated per iteration.
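A sketch of this per-iteration generation step; `hallucinator` is the network from the step-3 sketch, and `transfer_pairs` is assumed to hold pre-extracted feature pairs from D_S.

```python
import random
import torch

def hallucinate_positives(hallucinator, transfer_pairs, target_feat, n=64):
    """Generate n deformed positive features: draw n feature pairs from D_S and
    feed each pair, together with the current target feature, to the hallucinator."""
    batch = random.sample(transfer_pairs, n)
    fa1 = torch.stack([p[0] for p in batch])
    fa2 = torch.stack([p[1] for p in batch])
    fb1 = target_feat.unsqueeze(0).expand(n, -1)
    return hallucinator(fa1, fa2, fb1)   # (n, FEAT_DIM) hallucinated positive features
```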

7) Train the classifier jointly with the spatially sampled positive and negative samples and the generated positive samples; the resulting classification error loss is used to update the classifier and the hallucination adversarial network simultaneously.

In step 7), the specific method may be as follows: the 64 generated positive samples, the 32 spatially sampled positive samples, and the 96 spatially sampled negative samples are fed into the classifier together, the binary cross-entropy loss is computed, and both the classifier and the hallucination adversarial network are then updated simultaneously through back-propagation with the Adam optimizer.
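A sketch of one such online update; the classifier is assumed to be a small binary head over the same features, and the optimizer is assumed to hold both the classifier's and the hallucinator's parameters so that the classification loss updates both, as described above.

```python
import torch
import torch.nn.functional as F

def online_update(classifier, optimizer, gen_pos_feats, pos_feats, neg_feats):
    """One iteration: 64 hallucinated positives, 32 sampled positives and 96
    sampled negatives are scored; the binary cross-entropy loss back-propagates
    into the classifier and, through gen_pos_feats, into the hallucinator."""
    feats = torch.cat([gen_pos_feats, pos_feats, neg_feats], dim=0)
    labels = torch.cat([torch.ones(gen_pos_feats.size(0) + pos_feats.size(0)),
                        torch.zeros(neg_feats.size(0))])
    logits = classifier(feats).squeeze(1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()   # gen_pos_feats must not be detached if the hallucinator is to be updated
    optimizer.step()
    return loss.item()
```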

8) Given a new test frame, use the region with the highest confidence of the trained classifier as the target position to complete the tracking of the current frame.

In step 8), the specific process may be as follows: in the current test frame, samples are drawn at the target position estimated in the previous frame using both random sampling and Gaussian sampling; the sampled candidates are fed into the classifier to obtain their corresponding target confidences.
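A sketch of the per-frame inference; `sample_candidates` and `crop_and_resize` are hypothetical helpers standing in for the Gaussian/random candidate sampling and cropping described above.

```python
import torch

def track_frame(classifier, feature_extractor, frame, prev_box,
                sample_candidates, crop_and_resize, n_candidates=256):
    """Score candidate boxes drawn around the previous target position and return
    the highest-confidence one as the estimated target position in this frame."""
    candidates = sample_candidates(prev_box, n_candidates)        # Gaussian + random draws
    crops = torch.stack([crop_and_resize(frame, box) for box in candidates])
    with torch.no_grad():
        scores = classifier(feature_extractor(crops)).squeeze(1)  # target confidences
    best = int(torch.argmax(scores))
    return candidates[best], scores[best].item()
```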

The present invention aims to apply the imagination mechanism of the human brain to current deep-learning-based target tracking algorithms and proposes a new robust target tracking method based on a hallucination adversarial network. The invention first proposes a new hallucination adversarial network that learns the nonlinear deformation between sample pairs and applies the learned deformation to a new target to generate new deformed target samples. To train the proposed hallucination adversarial network effectively, a deformation reconstruction loss is proposed. Based on the offline-trained hallucination adversarial network, a target tracking method is proposed that effectively alleviates the overfitting caused by online updates of the deep neural network during tracking. In addition, to further improve the quality of deformation transfer, a selective deformation transfer method is proposed, which further improves tracking accuracy. The target tracking method proposed by the invention achieves competitive results on current mainstream target tracking datasets.

Description of Drawings

FIG. 1 is a schematic flowchart of an embodiment of the present invention.

Detailed Description

The method of the present invention is described in detail below with reference to the accompanying drawing and an embodiment. The embodiment is implemented on the premise of the technical solution of the present invention, and the implementation manner and specific operation process are given, but the protection scope of the present invention is not limited to the following embodiment.

Referring to FIG. 1, the embodiment of the present invention includes the following steps:

A. Collect a large number of deformation sample pairs from an annotated target tracking dataset as the training sample set. The specific process is as follows: target sample pairs are collected from annotated video sequences (the two samples of a pair contain the same target). For example, in video sequence a, a target sample x_t^a is first selected in frame t, and the target sample of a randomly chosen frame among the following 20 frames is then taken as x_{t'}^a, forming a deformation sample pair (x_t^a, x_{t'}^a). Following the above steps, a large number of deformation sample pairs are selected to form the training sample set.

B. Perform feature extraction on all samples in the training sample set obtained in step A to obtain the training sample feature set. The feature extraction steps are as follows: the target samples are first resized to 107×107×3 by bilinear interpolation, and features of all interpolated target samples are then extracted with the neural network feature extractor φ(·).

C. Train the proposed hallucination adversarial network offline using the training sample feature set obtained in step B, the adversarial loss, and the proposed deformation reconstruction loss. The training process is described as follows: two training sample feature pairs are first selected from the training sample feature set, denoted (φ(x_t^a), φ(x_{t'}^a)) and (φ(x_t^b), φ(x_{t'}^b)). The hallucination adversarial network learns the deformation between φ(x_t^a) and φ(x_{t'}^a) and applies this deformation to φ(x_t^b) to generate new deformed samples of target b. An adversarial loss is used to ensure that the distribution of the generated samples is close to the distribution of target b:

l_adv,  (Formula 1; the full expression is given as an image in the original document)

where En and De denote the encoder and decoder parts of the proposed hallucination adversarial network, respectively. To make the generated samples effectively encode the deformation z^a, a deformation reconstruction loss is proposed to constrain the generated samples:

l_def,  (Formula 2; the full expression is given as an image in the original document)

Finally, the total loss function used for offline training of the proposed hallucination adversarial network is:

l_all = l_adv + λ·l_def,  (Formula 3)

where λ is a hyperparameter used to balance the two losses.

D. Given the annotated image of the first frame of the test video, collect target samples and sample positive and negative samples around the target samples using Gaussian and random sampling. The sampling details are as follows: in each training iteration, positive and negative samples are sampled at a ratio of 1:3, i.e. 32 positive samples and 96 negative samples. The criterion for positive samples is that the region overlap ratio between the sampled sample and the target sample is greater than 0.7, and the criterion for negative samples is that the region overlap ratio between the sampled sample and the target sample is lower than 0.5.

E. Select the sample pairs to be transferred for the tracked target using the proposed selective deformation transfer method. The selection process is described as follows: let N_s denote the number of video clips in the dataset used to collect deformation sample pairs, and let s_i be the identifier of a video clip, where i = 1, …, N_s and n_i denotes the number of samples in video clip s_i. The feature representation ψ(s_i) of video clip s_i is computed from the deep features of its samples (the defining expression is given as an image in the original document), where the deep features are produced by a deep feature extractor. For the deep feature of the tracked target, the Euclidean distance to each clip representation ψ(s_i) is computed, and the T video clips with the smallest distances are selected. From the selected T video clips, a large number of deformation sample pairs are collected in the same way as in step A, forming the set D_S used for subsequent target deformation transfer.

F. Generate deformed positive samples with the offline-trained hallucination adversarial network, based on the selected sample pairs to be transferred. The generation steps are as follows: in each training iteration, 64 sample pairs are randomly selected from the set D_S; each pair, together with the target sample, is fed into the hallucination adversarial network to generate a corresponding deformed sample. In total, 64 positive samples are generated per iteration.

G. Train the classifier jointly with the spatially sampled positive and negative samples and the generated positive samples; the resulting classification error loss is used to update the classifier and the hallucination adversarial network simultaneously. The optimization process is as follows: the 64 generated positive samples, the 32 spatially sampled positive samples, and the 96 spatially sampled negative samples are fed into the classifier together, the binary cross-entropy loss is computed, and both the classifier and the hallucination adversarial network are then updated simultaneously through back-propagation with the Adam optimizer.

H. Given a new test frame, use the region with the highest confidence of the trained classifier as the target position to complete the tracking of the current frame. The specific process is as follows: in the current test frame, samples are drawn at the target position estimated in the previous frame using both random sampling and Gaussian sampling; the sampled candidates are fed into the classifier to obtain their corresponding target confidences.

Table 1 compares the precision and success rate achieved by the present invention and nine other target tracking algorithms on the OTB-2013 dataset. The proposed method achieves excellent tracking results on this mainstream dataset.

Table 1

Method              Precision (%)    Success rate (%)
This invention      95.1             69.6
VITAL (2018)        92.9             69.4
MCPF (2017)         91.6             67.7
CCOT (2016)         91.2             67.8
MDNet (2016)        90.9             66.8
CREST (2017)        90.8             67.3
MetaSDNet (2018)    90.5             68.4
ADNet (2017)        90.3             65.9
TRACA (2018)        89.8             65.2
HCFT (2015)         89.1             60.5

In Table 1:

VITAL corresponds to the method proposed by Y. Song et al. (Y. Song, C. Ma, X. Wu, L. Gong, L. Bao, W. Zuo, C. Shen, R. Lau, and M.-H. Yang, "VITAL: VIsual Tracking via Adversarial Learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8990-8999.)

MCPF corresponds to the method proposed by T. Zhang et al. (T. Zhang, C. Xu, and M.-H. Yang, "Multi-Task Correlation Particle Filter for Robust Object Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4819-4827.)

CCOT corresponds to the method proposed by M. Danelljan et al. (M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, "Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking," in Proceedings of the European Conference on Computer Vision, 2016, pp. 472-488.)

MDNet corresponds to the method proposed by H. Nam et al. (H. Nam and B. Han, "Learning Multi-domain Convolutional Neural Networks for Visual Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 817-825.)

CREST corresponds to the method proposed by Y. Song et al. (Y. Song, C. Ma, L. Gong, J. Zhang, R. W. H. Lau, and M.-H. Yang, "CREST: Convolutional Residual Learning for Visual Tracking," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2555-2564.)

MetaSDNet corresponds to the method proposed by E. Park et al. (E. Park and A. C. Berg, "Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers," in Proceedings of the European Conference on Computer Vision, 2018, pp. 569-585.)

ADNet corresponds to the method proposed by S. Yun et al. (S. Yun, J. Choi, Y. Yoo, K. Yun, and J. Y. Choi, "Action-decision Networks for Visual Tracking with Deep Reinforcement Learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2711-2720.)

TRACA corresponds to the method proposed by J. Choi et al. (J. Choi, H. J. Chang, T. Fischer, S. Yun, and J. Y. Choi, "Context-aware Deep Feature Compression for High-speed Visual Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 479-488.)

HCFT corresponds to the method proposed by C. Ma et al. (C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical Convolutional Features for Visual Tracking," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3074-3082.)

Claims (7)

1. A robust target tracking method based on a hallucination adversarial network, characterized by comprising the following steps:
1) collecting a large number of deformation sample pairs from an annotated target tracking dataset as a training sample set;
2) performing feature extraction on all samples in the training sample set obtained in step 1) to obtain a training sample feature set;
3) training the proposed hallucination adversarial network offline using the training sample feature set obtained in step 2), the adversarial loss and the proposed deformation reconstruction loss;
the training process being as follows: two training sample feature pairs are first selected from the training sample feature set, denoted (φ(x_t^a), φ(x_{t'}^a)) and (φ(x_t^b), φ(x_{t'}^b)); the hallucination adversarial network learns the deformation between φ(x_t^a) and φ(x_{t'}^a) and applies this deformation to φ(x_t^b) to generate new deformed samples of target b, and the adversarial loss ensures that the distribution of the generated samples is close to the distribution of target b:
l_adv,  (Formula 1; the full expression is given as an image in the original document)
where En and De denote the encoder and decoder parts of the proposed hallucination adversarial network, respectively; to make the generated samples effectively encode the deformation z^a, the deformation reconstruction loss constrains the generated samples:
l_def,  (Formula 2; the full expression is given as an image in the original document)
finally, the total loss function used for offline training of the proposed hallucination adversarial network is:
l_all = l_adv + λ·l_def,  (Formula 3)
where λ is a hyperparameter used to balance the two losses;
the offline training of the hallucination adversarial network comprising the following sub-steps:
3.1 the parameter λ in Formula 3 is set to 0.5;
3.2 in training, the optimizer used is Adam, the number of iterations is 5×10^5, and the learning rate is 2×10^-4;
3.3 the encoder and decoder of the proposed hallucination adversarial network are both three-layer perceptrons with 2048 hidden-layer nodes; the encoder has 9216 input nodes and 64 output nodes, and the decoder has 4672 input nodes; the discriminator network is also a three-layer perceptron with 2048 hidden-layer nodes, 9216 input nodes and 1 output node;
4) given the annotated image of the first frame of the test video, collecting target samples and sampling positive and negative samples around the target samples using Gaussian and random sampling;
5) selecting the sample pairs to be transferred for the tracked target using the proposed selective deformation transfer method;
the selection of the sample pairs to be transferred proceeding as follows: N_s denotes the number of video clips in the dataset used to collect deformation sample pairs and s_i the identifier of a video clip, where i = 1, …, N_s and n_i denotes the number of samples in video clip s_i; the feature representation ψ(s_i) of video clip s_i is computed from the deep features of its samples (the defining expression is given as an image in the original document), where the deep features are produced by a deep feature extractor; for the deep feature of the tracked target, the Euclidean distance to each clip representation ψ(s_i) is computed and the T video clips with the smallest distances are selected; from the selected T video clips, a large number of deformation sample pairs are collected in the same way as in step 1), forming the set D_S used for subsequent target deformation transfer;
the selective deformation transfer method comprising the following sub-steps:
5.1 when computing the feature representation of a video clip, the deep feature extractor used is a ResNet-34 model with the fully connected layer removed;
5.2 when selecting similar video clips, the parameter T is set to 2×10^3;
6) generating deformed positive samples with the offline-trained hallucination adversarial network, based on the selected sample pairs to be transferred;
7) training the classifier jointly with the spatially sampled positive and negative samples and the generated positive samples, the resulting classification error loss being used to update the classifier and the hallucination adversarial network simultaneously;
8) given a new test frame, using the region with the highest confidence of the trained classifier as the target position to complete the tracking of the current frame.
2. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 1), the specific process of collecting a large number of deformation sample pairs from the annotated target tracking dataset as the training sample set is as follows: target sample pairs are collected from annotated video sequences, where the two samples of a pair contain the same target; in video sequence a, a target sample x_t^a is first selected in frame t, and the target sample of a randomly chosen frame among the following 20 frames is then taken as x_{t'}^a, forming a deformation sample pair (x_t^a, x_{t'}^a); a large number of such deformation sample pairs are selected to form the training sample set; the dataset is the ILSVRC-2015 video object detection dataset released by Fei-Fei Li in 2015.
3. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 2), the feature extraction step is as follows: the target samples are first resized to 107×107×3 by bilinear interpolation, and features of all interpolated target samples are then extracted with a neural network feature extractor φ(·); the structure of the feature extractor φ(·) is the first three convolutional layers of a VGG-M model pre-trained on the ImageNet dataset.
4. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 4), the sampling details are as follows: in each training iteration, positive and negative samples are sampled at a ratio of 1:3, i.e. 32 positive samples and 96 negative samples; the criterion for positive samples is that the region overlap ratio between the sampled sample and the target sample is greater than 0.7, and the criterion for negative samples is that the region overlap ratio between the sampled sample and the target sample is lower than 0.5.
5. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 6), the specific steps of generating deformed positive samples with the offline-trained hallucination adversarial network based on the selected sample pairs to be transferred are as follows: in each training iteration, 64 sample pairs are randomly selected from the set D_S; each pair, together with the target sample, is fed into the hallucination adversarial network to generate a corresponding deformed sample; in total, 64 positive samples are generated per iteration.
6. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 7), the specific method of training the classifier jointly with the spatially sampled positive and negative samples and the generated positive samples, the resulting classification error loss being used to update the classifier and the hallucination adversarial network simultaneously, is as follows: the 64 generated positive samples, the 32 spatially sampled positive samples and the 96 spatially sampled negative samples are fed into the classifier together, the binary cross-entropy loss is computed, and both the classifier and the hallucination adversarial network are then updated simultaneously through back-propagation with the Adam optimizer.
7. The robust target tracking method based on a hallucination adversarial network according to claim 1, characterized in that in step 8), the specific process of, given a new test frame, using the region with the highest confidence of the trained classifier as the target position to complete the tracking of the current frame is as follows: in the current test frame, samples are drawn at the target position estimated in the previous frame using both random sampling and Gaussian sampling; the sampled samples are fed into the classifier to obtain their corresponding target confidences.
CN201910418050.4A 2019-05-20 2019-05-20 Robust object tracking method based on hallucination adversarial network Active CN110135365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910418050.4A CN110135365B (en) 2019-05-20 2019-05-20 Robust object tracking method based on hallucination adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910418050.4A CN110135365B (en) 2019-05-20 2019-05-20 Robust object tracking method based on hallucination adversarial network

Publications (2)

Publication Number Publication Date
CN110135365A CN110135365A (en) 2019-08-16
CN110135365B true CN110135365B (en) 2021-04-06

Family

ID=67571357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910418050.4A Active CN110135365B (en) 2019-05-20 2019-05-20 Robust object tracking method based on hallucination adversarial network

Country Status (1)

Country Link
CN (1) CN110135365B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274917B (en) * 2020-01-17 2023-07-18 江南大学 A Long-term Target Tracking Method Based on Depth Detection
CN111460948B (en) * 2020-03-25 2023-10-13 中国人民解放军陆军炮兵防空兵学院 Target tracking method based on cost sensitive structured SVM
CN111354019B (en) * 2020-03-31 2024-01-26 中国人民解放军军事科学院军事医学研究院 A neural network-based visual tracking failure detection system and its training method
CN111914912B (en) * 2020-07-16 2023-06-13 天津大学 A Cross-Domain Multi-view Target Recognition Method Based on Siamese Conditional Adversarial Network
CN113052203B (en) * 2021-02-09 2022-01-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Anomaly detection method and device for multiple types of data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681774A (en) * 2018-05-11 2018-10-19 电子科技大学 Based on the human body target tracking method for generating confrontation network negative sample enhancing
CN109345559A (en) * 2018-08-30 2019-02-15 西安电子科技大学 A moving target tracking method based on sample augmentation and deep classification network
US10282852B1 (en) * 2018-07-16 2019-05-07 Accel Robotics Corporation Autonomous store tracking system
CN109766830A (en) * 2019-01-09 2019-05-17 深圳市芯鹏智能信息有限公司 A kind of ship seakeeping system and method based on artificial intelligence image procossing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324932B (en) * 2013-06-07 2017-04-12 东软集团股份有限公司 Video-based vehicle detecting and tracking method and system
KR101925907B1 (en) * 2016-06-03 2019-02-26 (주)싸이언테크 Apparatus and method for studying pattern of moving objects using adversarial deep generative model
CN108229434A (en) * 2018-02-01 2018-06-29 福州大学 A kind of vehicle identification and the method for careful reconstruct
CN108898620B (en) * 2018-06-14 2021-06-18 厦门大学 Target Tracking Method Based on Multiple Siamese Neural Networks and Regional Neural Networks
CN109325967B (en) * 2018-09-14 2023-04-07 腾讯科技(深圳)有限公司 Target tracking method, device, medium, and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681774A (en) * 2018-05-11 2018-10-19 电子科技大学 Based on the human body target tracking method for generating confrontation network negative sample enhancing
US10282852B1 (en) * 2018-07-16 2019-05-07 Accel Robotics Corporation Autonomous store tracking system
CN109345559A (en) * 2018-08-30 2019-02-15 西安电子科技大学 A moving target tracking method based on sample augmentation and deep classification network
CN109766830A (en) * 2019-01-09 2019-05-17 深圳市芯鹏智能信息有限公司 A kind of ship seakeeping system and method based on artificial intelligence image procossing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking; Qiangqiang Wu et al.; arXiv:1811.02208v1; 2018-11-09; pp. 1-16 *
Robust Visual Tracking Based on Adversarial Fusion Networks; Ximing Zhang et al.; 2018 37th Chinese Control Conference (CCC); 2018-10-08; pp. 9142-9147 *
Research on long-term human target tracking algorithms with significant pose variations; Zhou Qidong; China Master's Theses Full-text Database, Information Science and Technology; 2018-09-15; pp. I138-333 *

Also Published As

Publication number Publication date
CN110135365A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110135365B (en) Robust object tracking method based on hallucination adversarial network
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN108520530B (en) Target tracking method based on long-time and short-time memory network
CN108304826A (en) Facial expression recognizing method based on convolutional neural networks
CN112651998B (en) Human body tracking algorithm based on attention mechanism and dual-stream multi-domain convolutional neural network
CN108108677A (en) Facial expression recognition method based on an improved CNN
CN110223324A (en) Target tracking method based on a Siamese matching network with robust feature representation
CN107292813A (en) Multi-pose face generation method based on a generative adversarial network
CN106934456A (en) Deep convolutional neural network model construction method
CN103886325B (en) Cyclic matrix video tracking method with partition
Zhu et al. Tiny object tracking: A large-scale dataset and a baseline
CN104933417A (en) Behavior recognition method based on sparse spatial-temporal characteristics
CN108416266A (en) Fast video behavior recognition method extracting moving targets using optical flow
CN108573479A (en) Face Image Deblurring and Restoration Method Based on Dual Generative Adversarial Network
CN108399435A (en) A kind of video classification methods based on sound feature
CN103226835A (en) Target tracking method and system based on on-line initialization gradient enhancement regression tree
CN107452022A (en) Video target tracking method
CN112489081A (en) Visual target tracking method and device
CN111582210A (en) Human Behavior Recognition Method Based on Quantum Neural Network
Zhang et al. Self-guided adaptation: Progressive representation alignment for domain adaptive object detection
CN107146237A (en) A Target Tracking Method Based on Online State Learning and Estimation
Kumar Shukla et al. Comparative analysis of machine learning based approaches for face detection and recognition
CN109753897A (en) Behavior recognition method based on memory unit reinforcement-temporal dynamic learning
CN115375732A (en) Unsupervised target tracking method and system based on module migration
CN110189362A (en) Efficient Target Tracking Method Based on Multi-Branch Autoencoder Adversarial Network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant