CN110135365B - Robust target tracking method based on illusion countermeasure network - Google Patents
- Publication number
- CN110135365B CN110135365B CN201910418050.4A CN201910418050A CN110135365B CN 110135365 B CN110135365 B CN 110135365B CN 201910418050 A CN201910418050 A CN 201910418050A CN 110135365 B CN110135365 B CN 110135365B
- Authority
- CN
- China
- Prior art keywords
- samples
- target
- sample
- deformation
- countermeasure network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/24: Pattern recognition; Analysing; Classification techniques
- G06T7/246: Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V20/46: Scenes; Scene-specific elements in video content; Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06T2207/10016: Image acquisition modality; Video; Image sequence
- G06T2207/20081: Special algorithmic details; Training; Learning
- G06T2207/20084: Special algorithmic details; Artificial neural networks [ANN]
Abstract
A robust target tracking method based on a hallucination adversarial network, relating to computer vision technology. First, a new hallucination adversarial network is proposed, which learns the nonlinear deformation between sample pairs and applies the learned deformation to a new target to generate new deformed target samples. To train the proposed hallucination adversarial network effectively, a deformation reconstruction loss is proposed. The method can effectively alleviate the over-fitting problem caused by online updating of the deep neural network during target tracking. In addition, to further improve the quality of deformation migration, a selective deformation migration method is proposed, further improving tracking accuracy. The proposed target tracking method achieves competitive results on current mainstream target tracking data sets.
Description
Technical Field
The invention relates to computer vision technology, and in particular to a robust target tracking method based on a hallucination adversarial network (also termed an illusion countermeasure network).
Background
In recent years, deep neural networks have achieved great success in the field of computer vision. As one of the fundamental problems in computer vision, target tracking plays an important role in many current applications, such as autonomous driving, augmented reality, and robotics. Recently, research on target tracking algorithms based on deep neural networks has received much attention from researchers at home and abroad. However, unlike other computer vision tasks (such as target detection and semantic segmentation), the application of deep neural networks to target tracking is still far from fully effective, mainly because the target tracking task has certain particularities and lacks diversified online target training samples, which greatly limits the generalization ability of deep neural networks and in turn degrades tracking results. Meanwhile, the target tracking task aims to track arbitrary targets, with no prior knowledge of the target given in advance, which also poses a great challenge for selecting offline training data sets for deep neural networks. Therefore, providing a deep-neural-network-based target tracking algorithm with strong generalization ability has important practical significance.
To alleviate the above problems, researchers at home and abroad have proposed two types of solutions. The first type treats target tracking as a template-matching problem, typically implemented with a deep Siamese network: the target template and a search region are fed into the network simultaneously, and the sub-region of the search region most similar to the target template is returned as the target position. Because such similarity-based deep Siamese networks can be trained entirely offline on large labeled target tracking data sets, they avoid the over-fitting problem caused by having too few online training samples. The pioneering algorithm in this class is SiamFC. Building on SiamFC, researchers have proposed many improved algorithms, including SiamRPN, which uses a region proposal network; MemSiamFC, which uses dynamic memory networks; and SiamRPN++, which uses a deeper backbone network. Because they avoid the time-consuming online training step, SiamFC-style trackers usually achieve real-time or faster tracking speeds. However, since such algorithms lack online learning of target appearance changes, their accuracy remains relatively limited (e.g., accuracy results on the OTB datasets). The second type of method aims to learn a robust neural network classifier from limited online samples. The general idea is to use transfer learning techniques to alleviate over-fitting; a representative method is MDNet, proposed by H. Nam et al. in 2016. MDNet first uses multi-domain offline learning to learn good initial classifier parameters, and then further trains the classifier during tracking by collecting positive and negative samples of the target. Recently, researchers have proposed improvements based on MDNet, including VITAL, which uses adversarial learning; BranchOut, which learns target representations at different levels; and SANet, which uses RNNs. Compared with the first type, these methods can achieve higher tracking accuracy. However, because online samples (especially target samples) are extremely limited, the online learning of such methods is very restricted and still prone to over-fitting, which degrades tracking performance. It is therefore of great significance to design a simple and effective method to alleviate the over-fitting problem of deep trackers during tracking.
In contrast to current target tracking algorithms, humans can track moving targets with ease. Although the mechanisms of the human brain have not been fully explored to date, it can be established that the human brain derives an unrivaled imagination mechanism from prior learning experience. Humans can learn similar actions or transformations from the various things they see, and then apply those transformations to different objects, thereby imagining the appearance of a new object under different postures or actions. This imagination mechanism is very similar to data augmentation in machine learning: if the human brain is viewed as a visual classifier, the imagination mechanism supplies target samples in different states, enabling a robust visual classifier to be trained.
Disclosure of Invention
The invention aims to provide a robust target tracking method based on a hallucination adversarial network.
The invention comprises the following steps:
1) collecting a large number of deformation sample pairs from the labeled target tracking data set as a training sample set;
in step 1), the specific process of collecting a large number of deformation sample pairs from the labeled target tracking data set as the training sample set may be as follows: a large number of target sample pairs are collected from the labeled video sequences, each pair containing the same target; in a video sequence $a$, a target sample $x_t^a$ is first selected at the $t$-th frame, and a target sample $x_{t+k}^a$ ($1 \le k \le 20$) is then randomly selected within the following 20 frames, forming a deformation sample pair $(x_t^a, x_{t+k}^a)$; a large number of such deformation sample pairs are selected to form the training sample set; the data set is the ILSVRC-2015 video target detection data set proposed by Fei-Fei Li et al. in 2015.
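By way of illustration, the pair-collection procedure of this step can be sketched in Python as follows; the `video.frames` and `video.crop_target` helpers are hypothetical placeholders for the dataset's annotation interface, not part of any real API:

```python
import random

def collect_deformation_pairs(videos, max_gap=20, pairs_per_video=50):
    """Pair a target crop at frame t with the same target at a random
    frame within the following `max_gap` frames (step 1)."""
    pairs = []
    for video in videos:                       # each video carries per-frame target boxes
        n = len(video.frames)
        for _ in range(pairs_per_video):
            t = random.randrange(0, n - 1)     # frame of the first sample x_t^a
            k = random.randint(1, min(max_gap, n - 1 - t))
            x_t = video.crop_target(t)         # target sample at frame t
            x_tk = video.crop_target(t + k)    # same target, up to 20 frames later
            pairs.append((x_t, x_tk))
    return pairs
```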
2) Performing feature extraction on all samples in the training sample set obtained in step 1) to obtain a training sample feature set;
in step 2), the feature extraction steps may be: the size of each target sample is first changed to $107 \times 107 \times 3$ using bilinear interpolation, and features are then extracted from all interpolated target samples using a neural network feature extractor $\varphi(\cdot)$; the structure of the feature extractor $\varphi(\cdot)$ may be the first three convolutional layers of a VGG-M model pre-trained on the ImageNet dataset.
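A minimal PyTorch sketch of this step follows. Since torchvision ships no VGG-M model, a stand-in three-layer convolutional backbone is used here; it reproduces the 4608-dimensional (512×3×3) feature size implied by the network dimensions in step 3), but the exact filter configuration is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

phi = nn.Sequential(                       # stand-in for VGG-M conv1-conv3
    nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 512, kernel_size=3, stride=1), nn.ReLU(),
)

def extract_features(samples):
    """Bilinearly resize target samples to 107x107x3, then apply phi."""
    x = F.interpolate(samples, size=(107, 107), mode='bilinear',
                      align_corners=False)
    return phi(x).flatten(1)               # one 4608-D vector per sample
```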
3) Training the proposed hallucination adversarial network offline using the training sample feature set obtained in step 2), the adversarial loss, and the proposed deformation reconstruction loss;
in step 3), the training process may be as follows: two groups of training-sample feature pairs are first selected from the training sample feature set, denoted $(\varphi(x_t^a), \varphi(x_{t+k}^a))$ and $(\varphi(x_t^b), \varphi(x_{t+k}^b))$; the hallucination adversarial network is used to learn the deformation between $\varphi(x_t^a)$ and $\varphi(x_{t+k}^a)$ and to apply the deformation to $\varphi(x_t^b)$, generating a new deformation sample for target $b$; an adversarial loss is used to ensure that the distribution of generated samples is similar to the distribution of target $b$:

$$l_{adv} = \mathbb{E}\left[\log D\big(\varphi(x_t^b) \oplus \varphi(x_{t+k}^b)\big)\right] + \mathbb{E}\left[\log\Big(1 - D\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big)\Big)\right] \quad \text{(formula one)}$$

where $z_a = En\big(\varphi(x_t^a) \oplus \varphi(x_{t+k}^a)\big)$ is the encoded deformation, $\oplus$ denotes feature concatenation, $D$ is the discrimination network, and $En$ and $De$ denote the encoder and decoder parts of the proposed hallucination adversarial network, respectively; to ensure that the deformation code $z_a$ is effectively used in generating samples, a deformation reconstruction loss is proposed to constrain the generated samples:

$$l_{def} = \left\| En\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big) - z_a \right\|_2^2 \quad \text{(formula two)}$$

i.e., re-encoding the deformation between $\varphi(x_t^b)$ and the generated sample should reconstruct $z_a$; finally, the total loss function for offline training of the proposed hallucination adversarial network is:

$$l_{all} = l_{adv} + \lambda\, l_{def} \quad \text{(formula three)}$$

where $\lambda$ is a hyper-parameter balancing the two losses;
the offline training of the hallucination adversarial network may comprise the following sub-steps:
3.1 the parameter $\lambda$ in formula (three) is set to 0.5;
3.2 during training, the optimizer used is Adam (D. P. Kingma and J. L. Ba, "Adam: A Method for Stochastic Optimization," in Proceedings of the International Conference on Learning Representations, 2014), with $5 \times 10^5$ iterations and a learning rate of $2 \times 10^{-4}$;
3.3 the encoder and decoder of the hallucination adversarial network are both three-layer perceptrons with 2048 hidden-layer nodes; the encoder has 9216 input-layer nodes and 64 output-layer nodes, and the decoder has 4672 input-layer nodes; the discrimination network is also a three-layer perceptron with 2048 hidden nodes, 9216 input nodes, and 1 output node.
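The offline training loop can be sketched as follows, assuming (consistently with the layer sizes in sub-step 3.3) 4608-D features, a 64-D deformation code, a 4608-D decoder output, and a discriminator that scores the concatenation of a conditioning feature with a real or generated feature; the non-saturating BCE form of the adversarial loss is an implementation choice, not mandated by the patent:

```python
import torch
import torch.nn as nn

feat_dim, z_dim, hidden = 4608, 64, 2048

En = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
                   nn.Linear(hidden, z_dim))             # encoder: 9216 -> 2048 -> 64
De = nn.Sequential(nn.Linear(feat_dim + z_dim, hidden), nn.ReLU(),
                   nn.Linear(hidden, feat_dim))          # decoder: 4672 -> 2048 -> 4608
D  = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
                   nn.Linear(hidden, 1))                 # discriminator: 9216 -> 2048 -> 1

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(list(En.parameters()) + list(De.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
lam = 0.5                                                # lambda of formula three

def train_step(fa_t, fa_tk, fb_t, fb_tk):
    """One offline iteration on phi-features of sample pairs (a) and (b)."""
    z_a = En(torch.cat([fa_t, fa_tk], dim=1))            # encode deformation of pair a
    fb_gen = De(torch.cat([fb_t, z_a], dim=1))           # transfer deformation onto b

    # discriminator update: real pair of target b vs. generated pair (formula one)
    real = D(torch.cat([fb_t, fb_tk], dim=1))
    fake = D(torch.cat([fb_t, fb_gen.detach()], dim=1))
    l_d = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
    opt_d.zero_grad(); l_d.backward(); opt_d.step()

    # generator update: adversarial term plus deformation reconstruction (formula two)
    fake = D(torch.cat([fb_t, fb_gen], dim=1))
    l_adv = bce(fake, torch.ones_like(fake))
    z_rec = En(torch.cat([fb_t, fb_gen], dim=1))         # re-encode the generated pair
    l_def = (z_rec - z_a).pow(2).mean()
    l_all = l_adv + lam * l_def                          # formula three
    opt_g.zero_grad(); l_all.backward(); opt_g.step()
```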
4) Acquiring the labeled first frame of a given test video, acquiring the target sample, and sampling positive and negative samples around the target sample using Gaussian and random sampling;
in step 4), the details of the sampling may be as follows: in each iteration of training, samples are drawn at a positive-to-negative ratio of 1:3, i.e., 32 positive samples and 96 negative samples; a sample is judged positive when its area overlap rate with the target sample is greater than 0.7, and judged negative when the area overlap rate is less than 0.5.
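A sketch of the overlap-based labeling rule used here; `candidates` is assumed to be a list of (x, y, w, h) boxes drawn by Gaussian and uniform perturbation of the ground-truth box:

```python
def iou(box_a, box_b):
    """Boxes given as (x, y, w, h); returns intersection-over-union."""
    xa = max(box_a[0], box_b[0])
    ya = max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def label_samples(candidates, gt_box):
    """Keep positives with IoU > 0.7 and negatives with IoU < 0.5,
    at the 1:3 ratio used per training iteration (32 vs. 96)."""
    pos = [b for b in candidates if iou(b, gt_box) > 0.7]
    neg = [b for b in candidates if iou(b, gt_box) < 0.5]
    return pos[:32], neg[:96]
```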
5) Selecting sample pairs to be migrated for the tracked target using the proposed selective deformation migration method;
in step 5), the process of selecting the sample pairs to be migrated may be as follows: let $N_s$ denote the number of video segments in the data set used to collect deformation sample pairs, and let $s_i$ ($i = 1, \dots, N_s$) denote the $i$-th video segment, with $n_i$ the number of samples in $s_i$; the feature representation $\psi(s_i)$ of video segment $s_i$ can be computed as:

$$\psi(s_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} \varphi_d\big(x_j^{s_i}\big) \quad \text{(formula four)}$$

where $\varphi_d$ is the depth feature extractor and $x_j^{s_i}$ is the $j$-th sample in segment $s_i$; for the target feature $\varphi_d(x^b)$, the Euclidean distance to each video segment representation $\psi(s_i)$ is computed, and the $T$ video segments with the smallest distances are selected; a large number of deformation sample pairs are collected from the selected $T$ video segments in the same manner as in step 1) to form a set $D_S$ for subsequent target deformation migration;
the selective deformation migration method may include the following sub-steps:
5.1 the depth feature extractor $\varphi_d$ used to compute the feature representations of video segments is a ResNet34 model with the fully connected layer removed;
5.2 when selecting similar video segments, the parameter $T$ is set to $2 \times 10^3$.
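A sketch of the selection procedure, assuming each video segment is available as a tensor of cropped samples; `pretrained=True` loads the ImageNet weights of ResNet34 in torchvision:

```python
import torch
import torch.nn as nn
import torchvision

resnet34 = torchvision.models.resnet34(pretrained=True)
phi_d = nn.Sequential(*list(resnet34.children())[:-1])   # ResNet34 minus the FC layer

def segment_feature(samples):
    """psi(s_i): mean deep feature over all samples of one segment (formula four)."""
    with torch.no_grad():
        return phi_d(samples).flatten(1).mean(dim=0)

def select_segments(f_target, segments, T=2000):
    """Return indices of the T segments nearest to the target feature in
    Euclidean distance; `segments` is a list of (N_i, 3, H, W) sample tensors."""
    dists = [(torch.norm(segment_feature(s) - f_target).item(), i)
             for i, s in enumerate(segments)]
    return [i for _, i in sorted(dists)[:T]]
```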
6) Generating deformed positive samples using the offline-trained hallucination adversarial network based on the selected sample pairs to be migrated;
in step 6), the specific steps of generating deformed positive samples using the offline-trained hallucination adversarial network based on the selected sample pairs to be migrated may be as follows: in each iteration of training, 64 sample pairs are randomly selected from the set $D_S$; each sample pair is input into the hallucination adversarial network together with the target sample to generate a corresponding deformation sample, so that 64 positive samples in total are generated in each iteration.
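A sketch of the per-iteration hallucination step; `En` and `De` are the encoder and decoder trained offline (see the sketch in step 3)), and `pairs_ds` is assumed to hold pre-extracted feature pairs of the set $D_S$:

```python
import torch

def hallucinate_positives(pairs_ds, f_target, En, De, n=64):
    """Step 6: draw n feature pairs from D_S and transfer each pair's
    deformation onto the current target feature f_target."""
    idx = torch.randint(len(pairs_ds), (n,)).tolist()
    fa_t  = torch.stack([pairs_ds[i][0] for i in idx])
    fa_tk = torch.stack([pairs_ds[i][1] for i in idx])
    z = En(torch.cat([fa_t, fa_tk], dim=1))          # encode each pair's deformation
    f_rep = f_target.unsqueeze(0).expand(n, -1)      # repeat the target feature n times
    return De(torch.cat([f_rep, z], dim=1))          # 64 hallucinated positive features
```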
7) Training the classifier using the spatially sampled positive and negative samples together with the generated positive samples, wherein the resulting classification error loss is used to update the classifier and the hallucination adversarial network simultaneously;
in step 7), the classifier is trained using the spatially sampled positive and negative samples and the generated positive samples, and the specific method for simultaneously updating the classifier and the hallucination adversarial network with the resulting classification error loss is as follows: the 64 generated positive samples, 32 spatially sampled positive samples, and 96 spatially sampled negative samples are jointly input into the classifier, the two-class cross-entropy loss is computed, and the classifier and the hallucination adversarial network are then updated simultaneously via the back-propagation algorithm using an Adam optimizer.
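A sketch of the joint online update; the two-layer classifier head and its learning rate are assumptions, and the generated features `f_gen` must stay attached to the hallucinator's computation graph so that the cross-entropy gradient reaches it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

clf = nn.Sequential(nn.Linear(4608, 512), nn.ReLU(), nn.Linear(512, 2))
# The optimizer spans the classifier AND the hallucinator (En, De from step 3)),
# so one classification loss updates both, as step 7) requires:
# opt = torch.optim.Adam(list(clf.parameters()) + list(En.parameters())
#                        + list(De.parameters()), lr=1e-4)  # lr is an assumption

def online_step(opt, f_pos, f_gen, f_neg):
    """32 real positives, 64 generated positives (not detached) and 96
    negatives share one two-class cross-entropy loss."""
    feats = torch.cat([f_pos, f_gen, f_neg], dim=0)            # 192 x 4608
    labels = torch.cat([
        torch.ones(f_pos.size(0) + f_gen.size(0), dtype=torch.long),
        torch.zeros(f_neg.size(0), dtype=torch.long)])
    loss = F.cross_entropy(clf(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```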
8) Given a new test frame, using the region with the highest confidence from the trained classifier as the target position to complete the tracking of the current frame;
in step 8), given the new test frame, the region with the highest classifier confidence is used as the target position; the specific process of completing the tracking of the current frame may be as follows: in the current test frame, candidate samples are drawn around the target position estimated in the previous frame using both random sampling and Gaussian sampling; the sampled candidates are input into the classifier to obtain their corresponding target confidences.
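A sketch of the per-frame inference; `sample_around` and `crop_and_embed` are hypothetical helpers standing in for the Gaussian/random sampler and the feature pipeline of step 2):

```python
import torch

def track_frame(frame, prev_box, clf, n_candidates=256):
    """Step 8: score candidates sampled around the previous target position
    and return the highest-confidence box."""
    boxes = sample_around(prev_box, n_candidates)     # Gaussian + random sampling
    feats = crop_and_embed(frame, boxes)              # phi-features of the crops
    scores = torch.softmax(clf(feats), dim=1)[:, 1]   # per-candidate target confidence
    return boxes[scores.argmax().item()]
```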
The invention aims to apply the imagination mechanism of the human brain to current deep-learning-based target tracking algorithms, and provides a novel robust target tracking method based on a hallucination adversarial network. The invention first proposes a new hallucination adversarial network, which learns the nonlinear deformation between sample pairs and applies the learned deformation to a new target to generate new deformed target samples. To train the proposed hallucination adversarial network effectively, a deformation reconstruction loss is proposed. The method can effectively alleviate the over-fitting problem caused by online updating of the deep neural network during target tracking. In addition, to further improve the quality of deformation migration, a selective deformation migration method is proposed, further improving tracking accuracy. The target tracking method provided by the invention achieves competitive results on current mainstream target tracking data sets.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings and embodiments. The embodiments are implemented on the premise of the technical solution of the present invention and give detailed implementation modes and specific operating procedures, but the protection scope of the present invention is not limited to the following embodiments.
Referring to fig. 1, an embodiment of the present invention includes the steps of:
A. A large number of deformation sample pairs are collected from the labeled target tracking data set as the training sample set. The specific process is as follows: a large number of target sample pairs are collected from the labeled video sequences (each pair contains the same target). For example, in video sequence $a$, a target sample $x_t^a$ is first selected at the $t$-th frame, and a target sample $x_{t+k}^a$ ($1 \le k \le 20$) is then randomly selected within the following 20 frames, forming a deformation sample pair $(x_t^a, x_{t+k}^a)$. Following these steps, a large number of deformation sample pairs are selected to form the training sample set.
B. Feature extraction is performed on all samples in the training sample set obtained in step A to obtain the training sample feature set. The feature extraction steps are as follows: the target samples are first resized to $107 \times 107 \times 3$ using bilinear interpolation, and features are then extracted from all interpolated target samples using the neural network feature extractor $\varphi(\cdot)$.
C. The proposed hallucination adversarial network is trained offline using the training sample feature set obtained in step B, the adversarial loss, and the proposed deformation reconstruction loss. The training process is described as follows: two groups of training-sample feature pairs are first selected from the training sample feature set, denoted $(\varphi(x_t^a), \varphi(x_{t+k}^a))$ and $(\varphi(x_t^b), \varphi(x_{t+k}^b))$. The hallucination adversarial network learns the deformation between $\varphi(x_t^a)$ and $\varphi(x_{t+k}^a)$ and applies that deformation to $\varphi(x_t^b)$ to generate a new deformation sample for target $b$. The adversarial loss ensures that the generated sample distribution is close to the distribution of target $b$:

$$l_{adv} = \mathbb{E}\left[\log D\big(\varphi(x_t^b) \oplus \varphi(x_{t+k}^b)\big)\right] + \mathbb{E}\left[\log\Big(1 - D\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big)\Big)\right] \quad \text{(formula one)}$$

where $z_a = En\big(\varphi(x_t^a) \oplus \varphi(x_{t+k}^a)\big)$ is the encoded deformation, $\oplus$ denotes feature concatenation, $D$ is the discrimination network, and $En$ and $De$ denote the encoder and decoder parts of the proposed hallucination adversarial network, respectively. To ensure that the deformation code $z_a$ is effectively used in generating samples, a deformation reconstruction loss is proposed to constrain the generated samples:

$$l_{def} = \left\| En\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big) - z_a \right\|_2^2 \quad \text{(formula two)}$$

i.e., re-encoding the deformation between $\varphi(x_t^b)$ and the generated sample should reconstruct $z_a$. Finally, the total loss function for offline training of the proposed hallucination adversarial network is:

$$l_{all} = l_{adv} + \lambda\, l_{def} \quad \text{(formula three)}$$

where $\lambda$ is a hyper-parameter used to balance the two losses.
D. Given the labeled first frame of a test video, the target sample is acquired, and positive and negative samples are sampled around it using Gaussian and random sampling. The sampling details are as follows: in each iteration of training, samples are drawn at a positive-to-negative ratio of 1:3, i.e., 32 positive samples and 96 negative samples. A sample is judged positive when its area overlap rate with the target sample is greater than 0.7, and judged negative when the area overlap rate is less than 0.5.
E. Sample pairs to be migrated are selected for the tracked target using the proposed selective deformation migration method. The selection process is described as follows: let $N_s$ denote the number of video segments in the data set used to collect deformation sample pairs, and let $s_i$ ($i = 1, \dots, N_s$) denote the $i$-th video segment, with $n_i$ the number of samples in $s_i$. The feature representation $\psi(s_i)$ of video segment $s_i$ is computed as:

$$\psi(s_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} \varphi_d\big(x_j^{s_i}\big) \quad \text{(formula four)}$$

where $\varphi_d$ is the depth feature extractor. For the target feature $\varphi_d(x^b)$, the Euclidean distance to each video segment representation $\psi(s_i)$ is computed, and the $T$ nearest video segments are selected. Deformation sample pairs are collected from the selected $T$ video segments in the same manner as in step A to form a set $D_S$ for subsequent target deformation migration.
F. Deformed positive samples are generated using the offline-trained hallucination adversarial network based on the selected sample pairs to be migrated. The generation steps are as follows: in each iteration of training, 64 sample pairs are randomly selected from the set $D_S$, and each sample pair is input into the hallucination adversarial network together with the target sample to generate the corresponding deformation sample. Finally, 64 positive samples in total are generated in each iteration.
G. The spatially sampled positive and negative samples and the generated positive samples are used together to train the classifier, and the resulting classification error loss is used to update the classifier and the hallucination adversarial network simultaneously. The optimization process is as follows: the 64 generated positive samples, 32 spatially sampled positive samples, and 96 spatially sampled negative samples are jointly input into the classifier, the two-class cross-entropy loss is computed, and the classifier and the hallucination adversarial network are then updated simultaneously via the back-propagation algorithm using an Adam optimizer.
H. Given a new test frame, the region with the highest classifier confidence is used as the target position, completing the tracking of the current frame. The process is as follows: in the current test frame, candidate samples are drawn around the target position estimated in the previous frame using both random sampling and Gaussian sampling, and the sampled candidates are input into the classifier to obtain their corresponding target confidences.
Table 1 compares the precision and success rates of the present invention and nine other target tracking algorithms on the OTB-2013 data set. The method of the invention achieves excellent tracking results on this mainstream data set.
TABLE 1

Method | Precision (%) | Success rate (%)
---|---|---
The invention | 95.1 | 69.6
VITAL (2018) | 92.9 | 69.4
MCPF (2017) | 91.6 | 67.7
CCOT (2016) | 91.2 | 67.8
MDNet (2016) | 90.9 | 66.8
CREST (2017) | 90.8 | 67.3
MetaSDNet (2018) | 90.5 | 68.4
ADNet (2017) | 90.3 | 65.9
TRACA (2018) | 89.8 | 65.2
HCFT (2015) | 89.1 | 60.5
In table 1:
VITAL corresponds to the method proposed by Y. Song et al. (Y. Song, C. Ma, X. Wu, L. Gong, L. Bao, W. Zuo, C. Shen, R. Lau, and M.-H. Yang, "VITAL: VIsual Tracking via Adversarial Learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8990-8999.)
MCPF corresponds to the method proposed by T. Zhang et al. (T. Zhang, C. Xu, and M.-H. Yang, "Multi-Task Correlation Particle Filter for Robust Object Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4819-4827.)
CCOT corresponds to the method proposed by M. Danelljan et al. (M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, "Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking," in Proceedings of the European Conference on Computer Vision, 2016, pp. 472-488.)
MDNet corresponds to the method proposed by H. Nam et al. (H. Nam and B. Han, "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 817-825.)
CREST corresponds to the method proposed by Y. Song et al. (Y. Song, C. Ma, L. Gong, J. Zhang, R. W. H. Lau, and M.-H. Yang, "CREST: Convolutional Residual Learning for Visual Tracking," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2555-2564.)
MetaSDNet corresponds to the method proposed by E. Park et al. (E. Park and A. C. Berg, "Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers," in Proceedings of the European Conference on Computer Vision, 2018, pp. 569-585.)
ADNet corresponds to the method proposed by S. Yun et al. (S. Yun, J. Choi, Y. Yoo, K. Yun, and J. Y. Choi, "Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2711-2720.)
TRACA corresponds to the method proposed by J. Choi et al. (J. Choi, H. J. Chang, T. Fischer, S. Yun, and J. Y. Choi, "Context-Aware Deep Feature Compression for High-Speed Visual Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 479-488.)
HCFT corresponds to the method proposed by C. Ma et al. (C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical Convolutional Features for Visual Tracking," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3074-3082.)
Claims (7)
1. A robust target tracking method based on a hallucination adversarial network, characterized by comprising the following steps:
1) collecting a large number of deformation sample pairs from the labeled target tracking data set as a training sample set;
2) performing feature extraction on all samples in the training sample set obtained in the step 1) to obtain a training sample feature set;
3) training the proposed hallucination adversarial network offline using the training sample feature set obtained in step 2), the adversarial loss, and the proposed deformation reconstruction loss;
the training process comprises the following steps: two groups of training-sample feature pairs are first selected from the training sample feature set, denoted $(\varphi(x_t^a), \varphi(x_{t+k}^a))$ and $(\varphi(x_t^b), \varphi(x_{t+k}^b))$; the hallucination adversarial network is used to learn the deformation between $\varphi(x_t^a)$ and $\varphi(x_{t+k}^a)$ and to apply the deformation to $\varphi(x_t^b)$ to generate a new deformation sample for target $b$, and an adversarial loss is used to ensure that the generated sample distribution is similar to the target $b$ distribution:

$$l_{adv} = \mathbb{E}\left[\log D\big(\varphi(x_t^b) \oplus \varphi(x_{t+k}^b)\big)\right] + \mathbb{E}\left[\log\Big(1 - D\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big)\Big)\right] \quad \text{(formula one)}$$

wherein $z_a = En\big(\varphi(x_t^a) \oplus \varphi(x_{t+k}^a)\big)$ is the encoded deformation, $\oplus$ denotes feature concatenation, $D$ is the discrimination network, and $En$ and $De$ represent the encoder and decoder parts, respectively, in the proposed hallucination adversarial network; to ensure that the deformation code $z_a$ is effectively used in generating samples, a deformation reconstruction loss is proposed to constrain the generated samples:

$$l_{def} = \left\| En\big(\varphi(x_t^b) \oplus De(\varphi(x_t^b) \oplus z_a)\big) - z_a \right\|_2^2 \quad \text{(formula two)}$$

wherein re-encoding the deformation between $\varphi(x_t^b)$ and the generated sample reconstructs $z_a$; finally, the total loss function of the proposed hallucination adversarial network for offline training is:

$$l_{all} = l_{adv} + \lambda\, l_{def} \quad \text{(formula three)}$$

wherein $\lambda$ is a hyper-parameter for balancing the two losses;
the offline training of the hallucination adversarial network comprises the following sub-steps:
3.1 the parameter $\lambda$ in formula (three) is set to 0.5;
3.2 during training, Adam is used as the optimizer, with $5 \times 10^5$ iterations and a learning rate of $2 \times 10^{-4}$;
3.3 the encoder and decoder of the hallucination adversarial network are both three-layer perceptrons with 2048 hidden-layer nodes; the encoder has 9216 input-layer nodes and 64 output-layer nodes, and the decoder has 4672 input-layer nodes; the discrimination network is also a three-layer perceptron with 2048 hidden nodes, 9216 input nodes, and 1 output node;
4) acquiring the labeled first frame of a given test video, acquiring the target sample, and sampling positive and negative samples around the target sample using Gaussian and random sampling;
5) selecting sample pairs to be migrated for the tracked target using the proposed selective deformation migration method;
the process of selecting the sample pairs to be migrated is as follows: let $N_s$ denote the number of video segments in the data set used to collect deformation sample pairs, and let $s_i$ ($i = 1, \dots, N_s$) denote the $i$-th video segment, with $n_i$ the number of samples in $s_i$; the feature representation $\psi(s_i)$ of video segment $s_i$ is computed as:

$$\psi(s_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} \varphi_d\big(x_j^{s_i}\big) \quad \text{(formula four)}$$

wherein $\varphi_d$ is the depth feature extractor; for the target feature $\varphi_d(x^b)$, the Euclidean distance to each video segment representation $\psi(s_i)$ is computed, and the $T$ video segments with the smallest distances are selected; a large number of deformation sample pairs are collected from the selected $T$ video segments in the same manner as in step 1) to form a set $D_S$ for subsequent target deformation migration;
the selective deformation migration method comprises the following sub-steps:
5.1 the depth feature extractor $\varphi_d$ used to compute the feature representations of video segments is a ResNet34 model with the fully connected layer removed;
5.2 when selecting similar video segments, the parameter $T$ is set to $2 \times 10^3$;
6) based on the selected sample pairs to be migrated, generating deformed positive samples using the offline-trained hallucination adversarial network;
7) training the classifier using the spatially sampled positive and negative samples together with the generated positive samples, wherein the resulting classification error loss is used to update the classifier and the hallucination adversarial network simultaneously;
8) given a new test frame, using the region with the highest confidence from the trained classifier as the target position to complete the tracking of the current frame.
2. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 1), the specific process of collecting a large number of deformation sample pairs from the labeled target tracking data set as the training sample set is as follows: a large number of target sample pairs are collected from the labeled video sequences, each pair containing the same target; in a video sequence $a$, a target sample $x_t^a$ is first selected at the $t$-th frame, and a target sample $x_{t+k}^a$ ($1 \le k \le 20$) is then randomly selected within the following 20 frames, forming a deformation sample pair $(x_t^a, x_{t+k}^a)$; a large number of deformation sample pairs are selected to form the training sample set; the data set is the ILSVRC-2015 video target detection data set proposed by Fei-Fei Li in 2015.
3. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 2), the feature extraction steps are: the size of each target sample is first changed to $107 \times 107 \times 3$ using bilinear interpolation, and features are then extracted from all interpolated target samples using the neural network feature extractor $\varphi(\cdot)$; the structure of the feature extractor $\varphi(\cdot)$ is the first three convolutional layers of a VGG-M model pre-trained on the ImageNet dataset.
4. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 4), the sampling details are: in each iteration of training, samples are drawn at a positive-to-negative ratio of 1:3, i.e., 32 positive samples and 96 negative samples; a sample is judged positive when its area overlap rate with the target sample is greater than 0.7, and judged negative when the area overlap rate is less than 0.5.
5. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 6), the specific steps of generating deformed positive samples using the offline-trained hallucination adversarial network based on the selected sample pairs to be migrated are: in each iteration of training, 64 sample pairs are randomly selected from the set $D_S$, and each sample pair is input into the hallucination adversarial network together with the target sample to generate a corresponding deformation sample, so that 64 positive samples in total are finally generated in each iteration.
6. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 7), the classifier is trained using the spatially sampled positive and negative samples and the generated positive samples, and the specific method for simultaneously updating the classifier and the hallucination adversarial network with the resulting classification error loss is: the 64 generated positive samples, 32 spatially sampled positive samples, and 96 spatially sampled negative samples are jointly input into the classifier, the two-class cross-entropy loss is computed, and the classifier and the hallucination adversarial network are then updated simultaneously via the back-propagation algorithm using an Adam optimizer.
7. The robust target tracking method based on a hallucination adversarial network according to claim 1, wherein in step 8), for a given new test frame, the region with the highest confidence from the trained classifier is used as the target position, and the specific process of completing the tracking of the current frame is: in the current test frame, candidate samples are drawn around the target position estimated in the previous frame using both random sampling and Gaussian sampling; the sampled candidates are input into the classifier to obtain their corresponding target confidences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910418050.4A CN110135365B (en) | 2019-05-20 | 2019-05-20 | Robust target tracking method based on illusion countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910418050.4A CN110135365B (en) | 2019-05-20 | 2019-05-20 | Robust target tracking method based on illusion countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110135365A CN110135365A (en) | 2019-08-16 |
CN110135365B true CN110135365B (en) | 2021-04-06 |
Family
ID=67571357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910418050.4A Active CN110135365B (en) | 2019-05-20 | 2019-05-20 | Robust target tracking method based on illusion countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135365B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111274917B (en) * | 2020-01-17 | 2023-07-18 | 江南大学 | Long-time target tracking method based on depth detection |
CN111460948B (en) * | 2020-03-25 | 2023-10-13 | 中国人民解放军陆军炮兵防空兵学院 | Target tracking method based on cost sensitive structured SVM |
CN111354019B (en) * | 2020-03-31 | 2024-01-26 | 中国人民解放军军事科学院军事医学研究院 | Visual tracking failure detection system based on neural network and training method thereof |
CN111914912B (en) * | 2020-07-16 | 2023-06-13 | 天津大学 | Cross-domain multi-view target identification method based on twin condition countermeasure network |
CN113052203B (en) * | 2021-02-09 | 2022-01-18 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Anomaly detection method and device for multiple types of data |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324932B (en) * | 2013-06-07 | 2017-04-12 | 东软集团股份有限公司 | Video-based vehicle detecting and tracking method and system |
KR101925907B1 (en) * | 2016-06-03 | 2019-02-26 | (주)싸이언테크 | Apparatus and method for studying pattern of moving objects using adversarial deep generative model |
CN108229434A (en) * | 2018-02-01 | 2018-06-29 | 福州大学 | A kind of vehicle identification and the method for careful reconstruct |
CN108898620B (en) * | 2018-06-14 | 2021-06-18 | 厦门大学 | Target tracking method based on multiple twin neural networks and regional neural network |
CN109325967B (en) * | 2018-09-14 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Target tracking method, device, medium, and apparatus |
2019-05-20: application CN201910418050.4A (CN) granted as patent CN110135365B, status Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108681774A (en) * | 2018-05-11 | 2018-10-19 | 电子科技大学 | Based on the human body target tracking method for generating confrontation network negative sample enhancing |
US10282852B1 (en) * | 2018-07-16 | 2019-05-07 | Accel Robotics Corporation | Autonomous store tracking system |
CN109345559A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Expand the motion target tracking method with depth sorting network based on sample |
CN109766830A (en) * | 2019-01-09 | 2019-05-17 | 深圳市芯鹏智能信息有限公司 | A kind of ship seakeeping system and method based on artificial intelligence image procossing |
Non-Patent Citations (3)
Title |
---|
DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking; Qiangqiang Wu et al.; arXiv:1811.02208v1; 2018-11-09; pp. 1-16 *
Robust Visual Tracking Based on Adversarial Fusion Networks; Ximing Zhang et al.; 2018 37th Chinese Control Conference (CCC); 2018-10-08; pp. 9142-9147 *
Research on long-term human target tracking algorithms with significant pose variation; Zhou Qidong; China Master's Theses Full-text Database, Information Science and Technology; 2018-09-15; pp. I138-333 *
Also Published As
Publication number | Publication date |
---|---|
CN110135365A (en) | 2019-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135365B (en) | Robust target tracking method based on illusion countermeasure network | |
Wang et al. | Ranet: Ranking attention network for fast video object segmentation | |
CN111354017B (en) | Target tracking method based on twin neural network and parallel attention module | |
Sun et al. | Lattice long short-term memory for human action recognition | |
Zhang et al. | Nonlinear regression via deep negative correlation learning | |
Chen et al. | Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction | |
CN110096950B (en) | Multi-feature fusion behavior identification method based on key frame | |
Wang et al. | Learning attentions: residual attentional siamese network for high performance online visual tracking | |
CN107609460B (en) | Human body behavior recognition method integrating space-time dual network flow and attention mechanism | |
CN108520530B (en) | Target tracking method based on long-time and short-time memory network | |
CN112307995B (en) | Semi-supervised pedestrian re-identification method based on feature decoupling learning | |
CN112651998B (en) | Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network | |
Wang et al. | A cognitive memory-augmented network for visual anomaly detection | |
CN109727272B (en) | Target tracking method based on double-branch space-time regularization correlation filter | |
CN110189362B (en) | Efficient target tracking method based on multi-branch self-coding countermeasure network | |
CN109859241A (en) | Adaptive features select and time consistency robust correlation filtering visual tracking method | |
Xu et al. | Gait recognition from a single image using a phase-aware gait cycle reconstruction network | |
CN109711411B (en) | Image segmentation and identification method based on capsule neurons | |
CN111862167B (en) | Rapid robust target tracking method based on sparse compact correlation filter | |
Tan et al. | Bidirectional long short-term memory with temporal dense sampling for human action recognition | |
Zhang et al. | Object detection and tracking based on recurrent neural networks | |
Putra et al. | Markerless human activity recognition method based on deep neural network model using multiple cameras | |
CN111968155A (en) | Target tracking method based on segmented target mask updating template | |
CN113763417A (en) | Target tracking method based on twin network and residual error structure | |
CN117409475A (en) | 3D-CNN action recognition method based on bones |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |