CN106651915B - Target tracking method based on multi-scale representation with convolutional neural networks - Google Patents
- Publication number
- CN106651915B (application CN201611201895.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- model
- scale
- target
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention belongs to the technical field of image processing and provides a target tracking method based on multi-scale representation with convolutional neural networks, comprising: pre-training a multi-scale convolutional neural network structure; constructing a multiple-instance classifier from the multi-scale feature representation; improved online multiple-instance tracking; and multi-step differential model updating. The algorithm uses the ability of convolutional neural networks to learn deep features automatically, obtaining deep image representations that carry semantic information, while a Laplacian pyramid is used to build a multi-scale representation of the image and to train the multi-scale convolutional neural network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.
Description
Technical field
The present invention relates to a target tracking method based on multi-scale representation with convolutional neural networks, and belongs to the technical field of image processing.
Background art
In recent years, target tracking technology has developed rapidly and a large number of target tracking algorithms have been proposed. However, in actual tracking the task faces many difficulties, such as object occlusion, viewpoint change, target deformation, changes in ambient lighting, and unpredictably complex backgrounds, which cause many existing algorithms to fail. In tracking algorithms based on discriminative models, an appearance model is usually constructed from the difference between the target and the background, and a binary classifier is trained to separate the target from the background. Most existing tracking algorithms rely on hand-designed features to build the target's appearance model, which cannot effectively express the essential information of the target; particularly under complex conditions the expressive power of such an appearance model is limited, causing the target model to fail. During tracking, errors introduced by mistracking the target accumulate and can cause drift. Tracking algorithms based on multiple-instance learning can alleviate drift to some extent, but because the model function saturates easily, the discriminative ability of the model declines, which limits tracking performance.
Summary of the invention
In view of the problems in the prior art, the present invention performs multi-scale decomposition of the image with a Laplacian pyramid and provides a target tracking algorithm based on multi-scale representation with convolutional neural networks. The algorithm exploits the ability of convolutional neural networks to learn deep features automatically, obtaining deep image representations that carry semantic information, while the Laplacian pyramid is used to build a multi-scale representation of the image and to train the multi-scale convolutional neural network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.
The technical solution of the present invention is as follows:
A target tracking method based on multi-scale representation with convolutional neural networks, comprising the following steps:
Step 1: pre-training the multi-scale convolutional neural network structure;
Step 2: constructing a multiple-instance classifier from the multi-scale feature representation;
Step 3: improved online multiple-instance tracking;
Step 4: multi-step differential model updating.
The invention has the following advantages: natural images contain multi-scale structural information; the coarse scales of an image usually reflect its overall structure, while the fine scales contain more image detail. The image is decomposed into multiple scales with a Laplacian pyramid, and a target tracking algorithm based on multi-scale representation with convolutional neural networks is proposed. The method extracts multi-scale convolutional features to form an appearance model with stronger expressive power. Combined with an improved multiple-instance learning algorithm, it solves the decline in discriminative ability caused by model saturation. Compared with existing target tracking algorithms, the method achieves more stable tracking with higher accuracy.
Brief description of the drawings
Fig. 1 is a schematic diagram of the convolutional neural network structure;
Fig. 2 is a schematic diagram of multi-scale convolutional neural network training;
Fig. 3 shows the percentage of frames within different centre-error distances;
Fig. 4 shows the percentage of successfully tracked frames.
Specific embodiment
The present invention will be further described below.
A target tracking method based on multi-scale representation with convolutional neural networks comprises the following steps:
Step 1: pre-training the multi-scale convolutional neural network model
A Laplacian transform is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models. The multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models. Each network model contains three convolutional layers, two fully connected layers, and one softmax layer; the network model is shown in Fig. 1. The network parameters are initialised from the shallow layers of VGG-Net.
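As an illustration of the three-scale Laplacian decomposition described above, the following self-contained Python sketch builds a pyramid with simple 2x2 average-pooling downsampling and nearest-neighbour upsampling; a full implementation would use Gaussian filtering (e.g. OpenCV's pyrDown/pyrUp), so this is a simplified stand-in, not the patent's implementation.

```python
import numpy as np

def downsample(img):
    """Halve each spatial dimension with non-overlapping 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double each spatial dimension by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Return `levels` images: band-pass details first, coarse residual last.

    Each level stores the detail lost between two adjacent scales; the
    final entry is the coarsest low-pass residual."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels - 1):
        smaller = downsample(current)
        up = upsample(smaller)[:current.shape[0], :current.shape[1]]
        pyramid.append(current - up)
        current = smaller
    pyramid.append(current)  # coarse residual
    return pyramid

img = np.arange(64, dtype=np.float64).reshape(8, 8)
pyr = laplacian_pyramid(img, levels=3)
# With these simple operators the pyramid reconstructs the image exactly.
recon = upsample(upsample(pyr[2]) + pyr[1]) + pyr[0]
print([p.shape for p in pyr])  # [(8, 8), (4, 4), (2, 2)]
```

The three returned arrays play the role of the three scale inputs fed to the coarse, medium, and fine networks.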
During pre-training, the network parameters are continually optimised using several standard tracking sequences. The images at each scale correspond respectively to a coarse-scale network, a medium-scale network, and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine. To obtain object information of different categories, a separate network is built for each category of video set so as to capture features common to objects of different categories; the networks share all parameters except the last layer and are trained iteratively, as shown in Fig. 2. During training, the cross entropy is used as the loss function L, defined as:
L = -∑_i t_i log(p_i)    (1)
where t_i is the true label (target or background) of the i-th image block and p_i is the predicted probability for the i-th image block. During training the network parameters are continually optimised with stochastic gradient descent (SGD) until all samples are fully trained; finally the network parameters of the three scales are retained, giving the pre-trained multi-scale convolutional neural network model.
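The loss of formula (1) and the SGD step can be sketched with a single logistic unit; this toy stand-in for the Lasagne network uses an invented learning rate and a single positive sample purely for illustration.

```python
import math

def cross_entropy(true_labels, predicted_probs):
    """Formula (1): L = -sum_i t_i * log(p_i), where t_i is the true
    label (1 = target, 0 = background) and p_i the predicted probability."""
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_labels, predicted_probs))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgd_step(w, x, t, lr=0.1):
    """One stochastic-gradient-descent step for a single logistic unit:
    the gradient of the cross-entropy w.r.t. w is (p - t) * x."""
    p = sigmoid(w * x)
    return w - lr * (p - t) * x

w = 0.0
for _ in range(200):               # repeatedly present one positive sample
    w = sgd_step(w, x=1.0, t=1.0)
print(sigmoid(w))                  # predicted probability approaches 1
```

A real training run would loop over mini-batches of image blocks at all three scales and update the shared convolutional parameters the same way.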
Step 2: constructing a multiple-instance classifier from the multi-scale feature representation
The last layer of the pre-trained multi-scale convolutional model is removed and a randomly initialised softmax layer is added; the network parameters are then fine-tuned with the target given in the first frame of the image. Feature maps of the third convolutional layer are extracted from the networks of the three scales as convolutional features. At the same time, features of the second convolutional layer of the fine-scale network are extracted; together they constitute the multi-scale representation of the appearance model. To reduce the feature dimensionality, max pooling is applied to the conv-2 feature maps. All convolutional features are concatenated to form the multi-scale appearance model of the target.
To realise online updating of the target, the target model must be updated in real time. With the obtained convolutional features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm. The classifier is a strong classifier composed of multiple weak classifiers, implemented as follows: in a boosting manner, the objective function, namely the log-likelihood, is maximised, K weak classifiers are selected one after another, and their weighted sum constitutes the multiple-instance classifier.
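The greedy selection idea can be sketched as follows. This is an illustrative simplification, not the patent's exact procedure: bags are reduced to single instances, weak-classifier weights are fixed at one, and the weak classifiers are toy threshold stumps on a 1-D feature.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(H, samples):
    """Sum of log p(y|x) with p = sigmoid(H(x)), over (feature, label) pairs."""
    eps = 1e-12
    ll = 0.0
    for x, y in samples:
        p = sigmoid(H(x))
        ll += math.log(max(p if y == 1 else 1.0 - p, eps))
    return ll

def train_strong_classifier(weak_pool, samples, K=3):
    """Greedily add, K times, the weak classifier that most increases the
    log-likelihood; the strong classifier is the sum of the selections."""
    chosen = []
    H = lambda x, c=chosen: sum(h(x) for h in c)
    for _ in range(K):
        best = max(weak_pool,
                   key=lambda h: log_likelihood(lambda x: H(x) + h(x), samples))
        chosen.append(best)
    return H

# Toy 1-D data: target features near +1, background features near -1.
samples = [(1.2, 1), (0.8, 1), (-1.1, 0), (-0.7, 0)]
weak_pool = [lambda x, t=t: 1.0 if x > t else -1.0 for t in (-0.9, 0.0, 1.0)]
H = train_strong_classifier(weak_pool, samples, K=2)
print(sigmoid(H(1.2)), sigmoid(H(-1.1)))  # high for target, low for background
```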
Step 3: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
From the properties of the sigmoid function, when x steadily increases or decreases the function saturates easily; when weak classifiers are selected to form the strong classifier, this readily causes overfitting. To solve this problem, a penalty factor is introduced into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier. As the number of weak classifiers increases, the penalty factor quickly suppresses the magnitude of the argument into a reasonable range, slowing the saturation of the function while ensuring its convergence.
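The anti-saturation effect can be checked numerically; the sketch below assumes the penalty factor enters by scaling the argument by k (consistent with the text's description that it suppresses the magnitude of the argument).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def improved_sigmoid(x, k):
    """Assumed penalised form: scale the argument by the number of weak
    classifiers k, so that as k grows the function saturates later."""
    return 1.0 / (1.0 + math.exp(-x / k))

# With k weak classifiers each contributing roughly +/-1, the raw response
# H(x) grows with k and the plain sigmoid saturates; the damped version
# stays in the responsive part of the curve.
for k in (1, 5, 10):
    raw, damped = sigmoid(float(k)), improved_sigmoid(float(k), k)
    print(k, round(raw, 4), round(damped, 4))
```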
Step 4: multi-step differential model updating
During tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme. The coarse-scale network model is updated on a fast schedule, so that the model adapts to appearance changes in time; the fine-scale network model is updated on a slow schedule, which avoids the error noise and erroneous updates that model changes might introduce; the update frequency of the medium-scale network model lies in between. In this way the model can adapt to appearance changes of the target in time while resisting the influence of mistracking on model updates.
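The frequency-differentiated schedule can be sketched as follows; the concrete intervals (1, 5, and 10 frames) are illustrative assumptions, not values taken from the patent.

```python
# Multi-step differential update schedule: the coarse-scale network is
# refreshed most often, the fine-scale network least often.
UPDATE_INTERVAL = {"coarse": 1, "medium": 5, "fine": 10}

def networks_to_update(frame_index):
    """Return which scale networks should be fine-tuned at this frame."""
    return [scale for scale, step in UPDATE_INTERVAL.items()
            if frame_index % step == 0]

schedule = {f: networks_to_update(f) for f in range(1, 11)}
print(schedule[10])  # every network updates on frame 10
```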
When a new frame is input, n candidate target boxes {x_1, ..., x_n} are sampled around the target position of the previous frame; according to p(y|x) = σ(H(x)), the candidate at the position of maximum likelihood response is selected as the target result, as shown in formula (5):
x* = argmax_{x_i} p(y|x_i)    (5)
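The candidate-selection step can be sketched as follows, assuming it picks the candidate with maximal likelihood response; the 1-D "features" and the stand-in classifier are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def select_target(candidates, H):
    """Pick the candidate box whose likelihood p(y|x) = sigma(H(x)) is
    maximal; H is the strong-classifier response on a candidate's features."""
    return max(candidates, key=lambda x: sigmoid(H(x)))

# Hypothetical 1-D features for n candidate boxes around the old position.
candidates = [-0.5, 0.3, 1.7, 0.9]
best = select_target(candidates, H=lambda x: 2.0 * x)  # stand-in classifier
print(best)  # 1.7
```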
The proposed target tracking method based on multi-scale representation with convolutional neural networks is verified from two aspects: first the precision of the tracking algorithm, and second its success rate. Part of the image sequences of the Object Tracking Benchmark (OTB) are used for testing, and the classical MIL, TLD, Struck, SCM, KCF and TGPR methods are chosen for comparison.
Regarding precision, the centre error between the tracked target and the ground-truth position is used to evaluate the algorithm: the Euclidean distance between the tracked target and the ground truth is computed, different distances are set as thresholds, the percentage of frames meeting each threshold is counted, and the percentage at a threshold of 20 pixels is taken as the final score. The results are shown in Fig. 3: our method obtains the higher score, which shows that the precision of the proposed target tracking method based on multi-scale representation with convolutional neural networks is higher.
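The centre-error precision metric described above can be sketched as follows; the example boxes are invented.

```python
import math

def center_error(pred, truth):
    """Euclidean distance between predicted and ground-truth box centres;
    boxes are given as (x, y, w, h)."""
    px, py = pred[0] + pred[2] / 2.0, pred[1] + pred[3] / 2.0
    tx, ty = truth[0] + truth[2] / 2.0, truth[1] + truth[3] / 2.0
    return math.hypot(px - tx, py - ty)

def precision_at(pairs, threshold=20.0):
    """Fraction of frames whose centre error is within the threshold
    (the threshold-20 value gives the final score, as in the text)."""
    return sum(center_error(p, t) <= threshold for p, t in pairs) / len(pairs)

pairs = [((0, 0, 10, 10), (3, 4, 10, 10)),    # centre error 5
         ((0, 0, 10, 10), (30, 40, 10, 10))]  # centre error 50
print(precision_at(pairs))  # 0.5
```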
Regarding the success rate, the overlap rate between the tracked target and the ground-truth position is computed according to formula (6):
overlap = area(r_t ∩ r_o) / area(r_t ∪ r_o)    (6)
where r_t is the region of the tracked target, r_o is the region of the real target, ∩ denotes intersection, and ∪ denotes union. With the overlap rate as the threshold, the success percentage at different thresholds is counted, and the area under the curve (AUC) is taken as the final score. The results are shown in Fig. 4: our method obtains the higher AUC, which shows that the proposed target tracking method based on multi-scale representation with convolutional neural networks has a higher success rate.
Claims (1)
1. A target tracking method based on multi-scale representation with convolutional neural networks, characterised by the following steps:
Step 1: pre-training the multi-scale convolutional neural network model
A Laplacian transform is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models; the multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models; each network model contains three convolutional layers, two fully connected layers and one softmax layer; the network parameters are initialised from the shallow layers of VGG-Net;
During pre-training, the network parameters are continually optimised on tracking sequences; the images at each scale correspond respectively to a coarse-scale network, a medium-scale network and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine; a separate network is built for each category of video set to obtain object information of different categories; the networks share all parameters except the last layer and are trained iteratively to capture features common to objects of different categories; during training, the cross entropy is used as the loss function L, defined as:
L = -∑_i t_i log(p_i)    (1)
where t_i is the true label of the i-th image block, i.e. target or background, and p_i is the predicted probability for the i-th image block;
during training, the network parameters are continually optimised with stochastic gradient descent (SGD) until all samples are fully trained; finally the network parameters of the three scales are retained, giving the pre-trained multi-scale convolutional neural network model;
Step 2: constructing a multiple-instance classifier from the multi-scale feature representation
The last layer of the pre-trained multi-scale convolutional model is removed and a randomly initialised softmax layer is added; the network parameters are fine-tuned with the target given in the first frame of the image; feature maps of the third convolutional layer are then extracted from the networks of the three scales as convolutional features; at the same time, features of the second convolutional layer of the fine-scale network are extracted, and together they form the multi-scale representation of the appearance model; max pooling is applied to the conv-2 feature maps to reduce the feature dimensionality; all convolutional features are concatenated to form the multi-scale appearance model of the target;
With the obtained convolutional features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm; in a boosting manner, the objective function, namely the log-likelihood, is maximised, k weak classifiers are selected one after another, and the weighted sum of the weak classifiers constitutes the multiple-instance classifier;
Step 3: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
A penalty factor is introduced into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier;
Step 4: during tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme
The coarse-scale network model is updated on a fast schedule so that the model adapts to appearance changes in time; the fine-scale network model is updated on a slow schedule, avoiding the error noise and erroneous updates introduced by model changes; the update frequency of the medium-scale network model lies in between; in this way the model adapts to appearance changes of the target in time while resisting the influence of mistracking on model updates;
When a new frame is input, n candidate target boxes {x_1, ..., x_n} are sampled around the target position of the previous frame; according to p(y|x) = σ(H(x)), the candidate at the position of maximum likelihood response is selected as the target result, as shown in formula (5):
x* = argmax_{x_i} p(y|x_i)    (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611201895.0A CN106651915B (en) | 2016-12-23 | 2016-12-23 | The method for tracking target of multi-scale expression based on convolutional neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106651915A CN106651915A (en) | 2017-05-10 |
CN106651915B true CN106651915B (en) | 2019-08-09 |
Family
ID=58828084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611201895.0A Active CN106651915B (en) | 2016-12-23 | 2016-12-23 | The method for tracking target of multi-scale expression based on convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106651915B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107622507B (en) * | 2017-08-09 | 2020-04-07 | 中北大学 | Air target tracking method based on deep learning |
CN108682022B (en) * | 2018-04-25 | 2020-11-24 | 清华大学 | Visual tracking method and system based on anti-migration network |
CN108876754A (en) * | 2018-05-31 | 2018-11-23 | 深圳市唯特视科技有限公司 | A kind of remote sensing images missing data method for reconstructing based on depth convolutional neural networks |
CN108985365B (en) * | 2018-07-05 | 2021-10-01 | 重庆大学 | Multi-source heterogeneous data fusion method based on deep subspace switching ensemble learning |
CN109284680B (en) * | 2018-08-20 | 2022-02-08 | 北京粉笔蓝天科技有限公司 | Progressive image recognition method, device, system and storage medium |
CN111260536B (en) * | 2018-12-03 | 2022-03-08 | 中国科学院沈阳自动化研究所 | Digital image multi-scale convolution processor with variable parameters and implementation method thereof |
CN113228063A (en) * | 2019-01-04 | 2021-08-06 | 美国索尼公司 | Multiple prediction network |
CN111259930B (en) * | 2020-01-09 | 2023-04-25 | 南京信息工程大学 | General target detection method of self-adaptive attention guidance mechanism |
CN111681263B (en) * | 2020-05-25 | 2022-05-03 | 厦门大学 | Multi-scale antagonistic target tracking algorithm based on three-value quantization |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325125A (en) * | 2013-07-03 | 2013-09-25 | 北京工业大学 | Moving target tracking method based on improved multi-example learning algorithm |
CN105741316A (en) * | 2016-01-20 | 2016-07-06 | 西北工业大学 | Robust target tracking method based on deep learning and multi-scale correlation filtering |
CN105956532A (en) * | 2016-04-25 | 2016-09-21 | 大连理工大学 | Traffic scene classification method based on multi-scale convolution neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8345984B2 (en) * | 2010-01-28 | 2013-01-01 | Nec Laboratories America, Inc. | 3D convolutional neural networks for automatic human action recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||