CN106651915B - Target tracking method based on multi-scale representations from convolutional neural networks - Google Patents


Info

Publication number
CN106651915B
CN106651915B (application CN201611201895.0A)
Authority
CN
China
Prior art keywords: network, model, scale, target, convolutional neural
Prior art date
Legal status
Active
Application number
CN201611201895.0A
Other languages
Chinese (zh)
Other versions
CN106651915A (en)
Inventor
唐爽硕
王凡
胡小鹏
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201611201895.0A
Publication of CN106651915A
Application granted
Publication of CN106651915B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention belongs to the technical field of image processing and provides a target tracking method based on multi-scale representations from convolutional neural networks, comprising: pre-training a multi-scale convolutional neural network structure; constructing a multiple-instance classifier from multi-scale feature representations; improved online multiple-instance tracking; and multi-step differential model updating. The algorithm exploits the ability of convolutional neural networks to learn deep features automatically, yielding deep image representations that carry semantic information, while a Laplacian pyramid is used to build a multi-scale representation of the image for training the multi-scale network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.

Description

Target tracking method based on multi-scale representations from convolutional neural networks
Technical field
The present invention relates to a target tracking method based on multi-scale representations from convolutional neural networks, and belongs to the technical field of image processing.
Background technique
In recent years, target tracking technology has developed rapidly and a large number of tracking algorithms have been proposed. In actual tracking, however, the task faces many difficulties, such as occlusion, viewpoint change, target deformation, changes in ambient illumination, and complex backgrounds that are hard to anticipate, which cause many existing algorithms to fail. Tracking algorithms based on discriminative models usually construct an appearance model from the difference between target and background and train a binary classifier to separate the target from the background. Most existing tracking algorithms rely on hand-designed features to build the target appearance model; such features cannot effectively express the essential information of the target, and under complex conditions the limited expressive power of the appearance model causes the target model to fail. During tracking, errors introduced by incorrect tracking accumulate and cause drift. Tracking algorithms based on multiple-instance learning can alleviate the drift problem to some extent, but because the model function itself saturates easily, the discriminative ability of the model declines, limiting tracking performance.
Summary of the invention
In view of the problems of the prior art, the present invention decomposes the image into multiple resolutions with a Laplacian pyramid and provides a target tracking algorithm based on multi-scale representations from convolutional neural networks. The algorithm uses the ability of convolutional neural networks to learn deep features automatically, yielding deep image representations that carry semantic information, while the Laplacian pyramid builds the multi-scale representation of the image used to train the multi-scale network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.
The technical solution of the present invention is as follows:
A target tracking method based on multi-scale representations from convolutional neural networks, comprising the following steps:
The first step: multi-scale convolutional neural network structure pre-training;
The second step: constructing a multiple-instance classifier from multi-scale feature representations;
The third step: improved online multiple-instance tracking;
The fourth step: multi-step differential model update.
The invention has the following advantages: natural images contain multi-scale structural information; the coarse scales of an image usually reflect its overall structure, while the fine scales contain more image detail. The image is decomposed into multiple scales with a Laplacian pyramid, and a target tracking algorithm based on multi-scale representations from convolutional neural networks is proposed. The method extracts multi-scale convolution features to form an appearance model with stronger expressive power. Combined with an improved multiple-instance learning algorithm, it solves the decline in discriminative ability caused by saturation of the model function. Compared with existing target tracking algorithms, the method achieves more stable tracking with higher accuracy.
Detailed description of the invention
Fig. 1 is a schematic diagram of the convolutional neural network structure;
Fig. 2 is a schematic diagram of the training of the multi-scale convolutional neural network;
Fig. 3 shows the percentage of frames within different center-error distances;
Fig. 4 shows the percentage of successfully tracked frames.
Specific embodiment
The present invention is further described below.
A target tracking method based on multi-scale representations from convolutional neural networks, comprising the following steps:
The first step: multi-scale convolutional neural network model pre-training
A Laplacian decomposition is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models. The multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models. Each network model contains three convolutional layers, two fully connected layers, and one softmax layer; the network model is shown in Fig. 1. The network parameters are initialized with the shallow layers of VGG-net.
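The Laplacian pyramid construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `downsample` and `upsample` helpers use simple 2x2 block averaging and nearest-neighbour expansion as stand-ins for the Gaussian filtering a production pyramid would use.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (stand-in for blur + decimation)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour expansion back to `shape`."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Return `levels` images: band-pass residuals, with the coarsest level last.
    Each band is the difference between a level and the upsampled coarser level,
    so the original image can be reconstructed exactly from the pyramid."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels - 1):
        down = downsample(current)
        pyramid.append(current - upsample(down, current.shape))
        current = down
    pyramid.append(current)  # coarsest residual
    return pyramid
```

Each of the three levels would then be fed to the corresponding coarse-, medium-, or fine-scale network.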
During pre-training, the network parameters are optimized continuously on part of a standard tracking dataset. The images at the three scales correspond to a coarse-scale network, a medium-scale network, and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine. To obtain object information of different categories, a separate network is built for each category of video set so as to capture features common to objects of different categories; the networks share all parameters except the last layer and are trained iteratively, as shown in Fig. 2. During training, the cross-entropy is used as the loss function L, defined as:
L = -Σ_i t_i log(p_i)    (1)
where t_i is the true label (target or background) of the i-th image patch and p_i is the predicted probability for the i-th image patch. The network parameters are optimized with stochastic gradient descent (SGD) during training until all samples are fully trained; the parameters of the three scales are then retained, yielding the pre-trained multi-scale convolutional neural network model.
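The cross-entropy loss of formula (1) and its SGD optimization can be sketched in a few lines. This is an illustrative stand-in using a linear softmax classifier, not the patent's Lasagne network:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, t):
    """L = -sum_i t_i * log(p_i), formula (1); t is one-hot (target vs background)."""
    return -np.sum(t * np.log(p + 1e-12))

def sgd_step(W, x, t, lr=0.01):
    """One SGD step on the cross-entropy of a linear softmax classifier."""
    p = softmax(x @ W)
    grad = x.T @ (p - t)  # gradient of the cross-entropy w.r.t. W
    return W - lr * grad, cross_entropy(p, t)
```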
The second step: constructing a multiple-instance classifier from multi-scale feature representations
The last layer of the pre-trained multi-scale convolutional model is removed and replaced with a randomly initialized softmax layer, and the network parameters are fine-tuned on the target given in the first frame of the image sequence. The feature maps of the third convolutional layer are then extracted from each of the three scale networks as convolution features; the features of the second convolutional layer of the fine-scale network are extracted as well, and together they form the multi-scale representation of the appearance model. To reduce the feature dimensionality, max pooling is applied to the second-layer feature maps. All convolution features are concatenated to form the multi-scale appearance model of the target.
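The pooling and concatenation step can be sketched as follows; the 2x2 non-overlapping window is a hypothetical choice, as the patent does not state the pooling size.

```python
import numpy as np

def max_pool(feat, size=2):
    """Non-overlapping max pooling over an (H, W, C) feature map,
    used here to reduce the dimensionality of the conv-2 features."""
    h, w, c = feat.shape
    h, w = h // size * size, w // size * size
    feat = feat[:h, :w]
    return feat.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))

def build_appearance_model(feature_maps):
    """Concatenate pooled feature maps from all scales into one
    multi-scale appearance vector for the target."""
    return np.concatenate([max_pool(f).ravel() for f in feature_maps])
```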
To enable online updating, the target model must be updated in real time. With the obtained convolution features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm. The classifier is a strong classifier composed of multiple weak classifiers. It is implemented by boosting: the objective function, namely the log-likelihood, is maximized, k weak classifiers are selected in turn, and the weak classifiers are combined in a weighted sum to construct the multiple-instance classifier.
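The greedy boosting selection described above can be sketched as follows. This simplification scores the instance-level (rather than bag-level) log-likelihood and uses unit weights in the sum; the patent's multiple-instance bag construction is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mil_boosting(weak_outputs, labels, k=3):
    """Greedily pick k weak classifiers whose summed response maximizes the
    log-likelihood of the labels.
    weak_outputs: (n_weak, n_samples) real-valued scores.
    labels: (n_samples,) array of 0/1 labels.
    Returns the indices of the chosen weak classifiers."""
    chosen = []
    H = np.zeros(weak_outputs.shape[1])  # running strong-classifier response
    for _ in range(k):
        best, best_ll = None, -np.inf
        for j in range(weak_outputs.shape[0]):
            if j in chosen:
                continue
            p = sigmoid(H + weak_outputs[j])
            ll = np.sum(labels * np.log(p + 1e-12)
                        + (1 - labels) * np.log(1 - p + 1e-12))
            if ll > best_ll:
                best, best_ll = j, ll
        chosen.append(best)
        H = H + weak_outputs[best]
    return chosen
```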
The third step: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
From the properties of the sigmoid function, the function saturates easily as x grows or shrinks, so selecting weak classifiers to form the strong classifier tends to cause overfitting. To solve this problem, we introduce a penalty factor into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier. As the number of weak classifiers grows, the penalty factor quickly suppresses the magnitude of the argument into a reasonable range, slowing the saturation of the function while ensuring its convergence.
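A minimal sketch of the penalized sigmoid, assuming the penalty factor scales the argument by 1/k (a reading consistent with the surrounding description of formula (4)):

```python
import numpy as np

def sigmoid(x):
    """Standard sigmoid, formula (3); saturates quickly for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def penalized_sigmoid(x, k):
    """Improved sigmoid: the penalty factor shrinks the argument by 1/k,
    so the function saturates more slowly as the number of weak
    classifiers k grows."""
    return 1.0 / (1.0 + np.exp(-x / k))
```

For the same input, the penalized version stays farther from the saturated extremes, which is exactly the intended effect on the strong-classifier response.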
The fourth step: multi-step differential model update
During tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme.
The coarse-scale network model is updated at a fast rate, so that the model adapts in time to changes in the target's appearance; the fine-scale network model is updated at a slow rate, which avoids the error noise and erroneous updates that model changes may introduce; the medium-scale network model is updated at a rate in between. In this way the model can adapt in time to appearance changes of the target while resisting the influence of tracking errors on the model update.
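The fast/slow/intermediate schedule can be illustrated as an exponential moving average with a per-scale rate. The concrete rate values below are hypothetical, as the patent does not state numbers:

```python
# Hypothetical per-scale update rates: coarse adapts fast, fine adapts slowly.
UPDATE_RATES = {"coarse": 0.5, "medium": 0.2, "fine": 0.05}

def update_params(params, new_params, scale):
    """Exponential-moving-average parameter update; a larger rate means the
    model follows new appearance evidence more quickly."""
    r = UPDATE_RATES[scale]
    return {k: (1 - r) * params[k] + r * new_params[k] for k in params}
```

A slow rate on the fine-scale network damps the effect of any single (possibly erroneous) frame, which is the drift-resistance argument made above.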
When a new frame arrives, n candidate target boxes {x_1, …, x_n} are sampled around the target position in the previous frame; according to p(y|x) = σ(H(x)), the box at the position of maximum likelihood response is selected as the target, as shown in formula (5):
x* = argmax_{x ∈ {x_1, …, x_n}} p(y|x)    (5)
We verify the proposed target tracking method based on multi-scale representations from convolutional neural networks through two analyses: first the precision of the tracking algorithm, then its success rate. Part of the image sequences of the object tracking benchmark (OTB) are used for testing, and the classical MIL, TLD, Struck, SCM, KCF and TGPR methods are chosen for comparison.
For precision, the center error between the tracked target and the ground-truth position is used to evaluate the algorithm: the Euclidean distance between the tracked target and the ground truth is computed, different distances are set as thresholds, the percentage of frames meeting each threshold is counted, and the percentage at the threshold of 20 pixels is taken as the final score. The results are shown in Fig. 3; as the figure shows, our method obtains a higher score, indicating that the target tracking method based on multi-scale representations from convolutional neural networks has higher precision.
For the success rate, the overlap between the tracked target and the ground-truth position is computed according to formula (6):
overlap = area(r_t ∩ r_o) / area(r_t ∪ r_o)    (6)
where r_t is the region of the tracked target, r_o is the region of the real target, ∩ denotes intersection, and ∪ denotes union. With the overlap as the threshold, the success percentage at each threshold is counted, and the area under the curve (AUC) is taken as the final score. The results are shown in Fig. 4; as the figure shows, our method obtains a higher AUC, indicating that the target tracking method based on multi-scale representations from convolutional neural networks has a higher success rate.

Claims (1)

1. A target tracking method based on multi-scale representations from convolutional neural networks, characterized by the following steps:
The first step: multi-scale convolutional neural network model pre-training
A Laplacian decomposition is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models; the multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models; each network model contains three convolutional layers, two fully connected layers, and one softmax layer; the network parameters are initialized with the shallow layers of VGG-net;
During pre-training, the network parameters are optimized continuously on tracking data; the images at the three scales correspond to a coarse-scale network, a medium-scale network, and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine;
A separate network is built for each category of video set to obtain object information of different categories; the networks share all parameters except the last layer and are trained iteratively to capture features common to objects of different categories; during training, the cross-entropy is used as the loss function L, defined as:
L = -Σ_i t_i log(p_i)    (1)
where t_i is the true label of the i-th image patch, i.e. target or background, and p_i is the predicted probability for the i-th image patch;
the network parameters are optimized with stochastic gradient descent (SGD) during training until all samples are fully trained; the parameters of the three scales are then retained, yielding the pre-trained multi-scale convolutional neural network model;
The second step: constructing a multiple-instance classifier from multi-scale feature representations
The last layer of the pre-trained multi-scale convolutional model is removed and replaced with a randomly initialized softmax layer, and the network parameters are fine-tuned on the target given in the first frame; the feature maps of the third convolutional layer are then extracted from each of the three scale networks as convolution features; the features of the second convolutional layer of the fine-scale network are extracted as well, and together they form the multi-scale representation of the appearance model; max pooling is applied to the second-layer feature maps to reduce the feature dimensionality; all convolution features are concatenated to form the multi-scale appearance model of the target;
With the obtained convolution features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm; by boosting, the objective function, i.e. the log-likelihood, is maximized, k weak classifiers are selected in turn, and the weak classifiers are combined in a weighted sum to construct the multiple-instance classifier;
The third step: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
A penalty factor is introduced into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier;
The fourth step: during tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme
The coarse-scale network model is updated at a fast rate, so that the model adapts in time to changes in the target's appearance; the fine-scale network model is updated at a slow rate, avoiding the error noise and erroneous updates that model changes may introduce; the medium-scale network model is updated at a rate in between; in this way the model adapts in time to appearance changes of the target while resisting the influence of tracking errors on the model update;
When a new frame arrives, n candidate target boxes {x_1, …, x_n} are sampled around the target position in the previous frame; according to p(y|x) = σ(H(x)), the box at the position of maximum likelihood response is selected as the target, as shown in formula (5):
x* = argmax_{x ∈ {x_1, …, x_n}} p(y|x)    (5)
CN201611201895.0A 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks Active CN106651915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611201895.0A CN106651915B (en) 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks


Publications (2)

Publication Number Publication Date
CN106651915A CN106651915A (en) 2017-05-10
CN106651915B true CN106651915B (en) 2019-08-09

Family

ID=58828084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611201895.0A Active CN106651915B (en) 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106651915B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622507B (en) * 2017-08-09 2020-04-07 中北大学 Air target tracking method based on deep learning
CN108682022B (en) * 2018-04-25 2020-11-24 清华大学 Visual tracking method and system based on anti-migration network
CN108876754A (en) * 2018-05-31 2018-11-23 深圳市唯特视科技有限公司 A kind of remote sensing images missing data method for reconstructing based on depth convolutional neural networks
CN108985365B (en) * 2018-07-05 2021-10-01 重庆大学 Multi-source heterogeneous data fusion method based on deep subspace switching ensemble learning
CN109284680B (en) * 2018-08-20 2022-02-08 北京粉笔蓝天科技有限公司 Progressive image recognition method, device, system and storage medium
CN111260536B (en) * 2018-12-03 2022-03-08 中国科学院沈阳自动化研究所 Digital image multi-scale convolution processor with variable parameters and implementation method thereof
CN113228063A (en) * 2019-01-04 2021-08-06 美国索尼公司 Multiple prediction network
CN111259930B (en) * 2020-01-09 2023-04-25 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism
CN111681263B (en) * 2020-05-25 2022-05-03 厦门大学 Multi-scale antagonistic target tracking algorithm based on three-value quantization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325125A (en) * 2013-07-03 2013-09-25 北京工业大学 Moving target tracking method based on improved multi-example learning algorithm
CN105741316A (en) * 2016-01-20 2016-07-06 西北工业大学 Robust target tracking method based on deep learning and multi-scale correlation filtering
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345984B2 (en) * 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition


Also Published As

Publication number Publication date
CN106651915A (en) 2017-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant