CN106651915B - Target tracking method based on multi-scale representations from convolutional neural networks - Google Patents


Info

Publication number
CN106651915B
CN106651915B (application CN201611201895.0A)
Authority
CN
China
Prior art keywords: network, model, scale, target, convolutional neural
Prior art date
Legal status
Active
Application number
CN201611201895.0A
Other languages
Chinese (zh)
Other versions
CN106651915A (en)
Inventor
唐爽硕
王凡
胡小鹏
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201611201895.0A
Publication of CN106651915A
Application granted
Publication of CN106651915B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention belongs to the technical field of image processing and provides a target tracking method based on multi-scale representations from convolutional neural networks, comprising: pre-training a multi-scale convolutional neural network structure; constructing a multiple-instance classifier from multi-scale feature representations; improved online multiple-instance tracking; and multi-step differential model updating. The algorithm exploits the ability of convolutional neural networks to learn deep features automatically, yielding deep image representations that carry semantic information, while a Laplacian pyramid is used to build a multi-scale representation of the image for training the multi-scale network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.

Description

Target tracking method based on multi-scale representations from convolutional neural networks
Technical field
The present invention relates to a target tracking method based on multi-scale representations from convolutional neural networks, and belongs to the technical field of image processing.
Background technique
In recent years, target tracking technology has developed rapidly and a large number of tracking algorithms have been proposed. In actual tracking, however, the task faces many difficulties, such as occlusion, viewpoint change, target deformation, changes in ambient illumination, and complex backgrounds that are hard to anticipate, which cause many existing algorithms to fail. Tracking algorithms based on discriminative models usually construct an appearance model from the difference between target and background and train a binary classifier to separate the target from the background. Most existing tracking algorithms rely on hand-designed features to build the target appearance model; such features cannot effectively express the essential information of the target, and under complex conditions the limited expressive power of the appearance model causes the target model to fail. During tracking, errors introduced by incorrect tracking accumulate and cause drift. Tracking algorithms based on multiple-instance learning can alleviate the drift problem to some extent, but because the model function itself saturates easily, the discriminative ability of the model declines, limiting tracking performance.
Summary of the invention
In view of the problems of the prior art, the present invention decomposes the image into multiple resolutions with a Laplacian pyramid and provides a target tracking algorithm based on multi-scale representations from convolutional neural networks. The algorithm uses the ability of convolutional neural networks to learn deep features automatically, yielding deep image representations that carry semantic information, while the Laplacian pyramid builds the multi-scale representation of the image used to train the multi-scale network structure. Combined with an improved multiple-instance learning algorithm, an online tracker is constructed that achieves stable tracking of the target.
The technical solution of the present invention is as follows:
A target tracking method based on multi-scale representations from convolutional neural networks, comprising the following steps:
The first step: multi-scale convolutional neural network structure pre-training;
The second step: constructing a multiple-instance classifier from multi-scale feature representations;
The third step: improved online multiple-instance tracking;
The fourth step: multi-step differential model update.
The invention has the following advantages: natural images contain multi-scale structural information; the coarse scales of an image usually reflect its overall structure, while the fine scales contain more image detail. The image is decomposed into multiple scales with a Laplacian pyramid, and a target tracking algorithm based on multi-scale representations from convolutional neural networks is proposed. The method extracts multi-scale convolution features to form an appearance model with stronger expressive power. Combined with an improved multiple-instance learning algorithm, it solves the decline in discriminative ability caused by saturation of the model function. Compared with existing target tracking algorithms, the method achieves more stable tracking with higher accuracy.
Detailed description of the invention
Fig. 1 is a schematic diagram of the convolutional neural network structure;
Fig. 2 is a schematic diagram of the training of the multi-scale convolutional neural network;
Fig. 3 shows the percentage of frames within different center-error distances;
Fig. 4 shows the percentage of successfully tracked frames.
Specific embodiment
The present invention is further described below.
A target tracking method based on multi-scale representations from convolutional neural networks, comprising the following steps:
The first step: multi-scale convolutional neural network model pre-training
A Laplacian decomposition is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models. The multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models. Each network model contains three convolutional layers, two fully connected layers, and one softmax layer; the network model is shown in Fig. 1. The network parameters are initialized with the shallow layers of VGG-net.
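The Laplacian pyramid construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `downsample` and `upsample` helpers use simple 2x2 block averaging and nearest-neighbour expansion as stand-ins for the Gaussian filtering a production pyramid would use.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (stand-in for blur + decimation)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour expansion back to `shape`."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Return `levels` images: band-pass residuals, with the coarsest level last.
    Each band is the difference between a level and the upsampled coarser level,
    so the original image can be reconstructed exactly from the pyramid."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels - 1):
        down = downsample(current)
        pyramid.append(current - upsample(down, current.shape))
        current = down
    pyramid.append(current)  # coarsest residual
    return pyramid
```

Each of the three levels would then be fed to the corresponding coarse-, medium-, or fine-scale network.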
During pre-training, the network parameters are optimized continuously on part of a standard tracking dataset. The images at the three scales correspond to a coarse-scale network, a medium-scale network, and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine. To obtain object information of different categories, a separate network is built for each category of video set so as to capture features common to objects of different categories; the networks share all parameters except the last layer and are trained iteratively, as shown in Fig. 2. During training, the cross-entropy is used as the loss function L, defined as:
L = -Σ_i t_i log(p_i)    (1)
where t_i is the true label (target or background) of the i-th image patch and p_i is the predicted probability for the i-th image patch. The network parameters are optimized with stochastic gradient descent (SGD) during training until all samples are fully trained; the parameters of the three scales are then retained, yielding the pre-trained multi-scale convolutional neural network model.
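The cross-entropy loss of formula (1) and its SGD optimization can be sketched in a few lines. This is an illustrative stand-in using a linear softmax classifier, not the patent's Lasagne network:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, t):
    """L = -sum_i t_i * log(p_i), formula (1); t is one-hot (target vs background)."""
    return -np.sum(t * np.log(p + 1e-12))

def sgd_step(W, x, t, lr=0.01):
    """One SGD step on the cross-entropy of a linear softmax classifier."""
    p = softmax(x @ W)
    grad = x.T @ (p - t)  # gradient of the cross-entropy w.r.t. W
    return W - lr * grad, cross_entropy(p, t)
```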
The second step: constructing a multiple-instance classifier from multi-scale feature representations
The last layer of the pre-trained multi-scale convolutional model is removed and replaced with a randomly initialized softmax layer, and the network parameters are fine-tuned on the target given in the first frame of the image sequence. The feature maps of the third convolutional layer are then extracted from each of the three scale networks as convolution features; the features of the second convolutional layer of the fine-scale network are extracted as well, and together they form the multi-scale representation of the appearance model. To reduce the feature dimensionality, max pooling is applied to the second-layer feature maps. All convolution features are concatenated to form the multi-scale appearance model of the target.
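The pooling and concatenation step can be sketched as follows; the 2x2 non-overlapping window is a hypothetical choice, as the patent does not state the pooling size.

```python
import numpy as np

def max_pool(feat, size=2):
    """Non-overlapping max pooling over an (H, W, C) feature map,
    used here to reduce the dimensionality of the conv-2 features."""
    h, w, c = feat.shape
    h, w = h // size * size, w // size * size
    feat = feat[:h, :w]
    return feat.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))

def build_appearance_model(feature_maps):
    """Concatenate pooled feature maps from all scales into one
    multi-scale appearance vector for the target."""
    return np.concatenate([max_pool(f).ravel() for f in feature_maps])
```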
To enable online updating, the target model must be updated in real time. With the obtained convolution features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm. The classifier is a strong classifier composed of multiple weak classifiers. It is implemented by boosting: the objective function, namely the log-likelihood, is maximized, k weak classifiers are selected in turn, and the weak classifiers are combined in a weighted sum to construct the multiple-instance classifier.
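The greedy boosting selection described above can be sketched as follows. This simplification scores the instance-level (rather than bag-level) log-likelihood and uses unit weights in the sum; the patent's multiple-instance bag construction is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mil_boosting(weak_outputs, labels, k=3):
    """Greedily pick k weak classifiers whose summed response maximizes the
    log-likelihood of the labels.
    weak_outputs: (n_weak, n_samples) real-valued scores.
    labels: (n_samples,) array of 0/1 labels.
    Returns the indices of the chosen weak classifiers."""
    chosen = []
    H = np.zeros(weak_outputs.shape[1])  # running strong-classifier response
    for _ in range(k):
        best, best_ll = None, -np.inf
        for j in range(weak_outputs.shape[0]):
            if j in chosen:
                continue
            p = sigmoid(H + weak_outputs[j])
            ll = np.sum(labels * np.log(p + 1e-12)
                        + (1 - labels) * np.log(1 - p + 1e-12))
            if ll > best_ll:
                best, best_ll = j, ll
        chosen.append(best)
        H = H + weak_outputs[best]
    return chosen
```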
The third step: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
From the properties of the sigmoid function, the function saturates easily as x grows or shrinks, so selecting weak classifiers to form the strong classifier tends to cause overfitting. To solve this problem, we introduce a penalty factor into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier. As the number of weak classifiers grows, the penalty factor quickly suppresses the magnitude of the argument into a reasonable range, slowing the saturation of the function while ensuring its convergence.
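A minimal sketch of the penalized sigmoid, assuming the penalty factor scales the argument by 1/k (a reading consistent with the surrounding description of formula (4)):

```python
import numpy as np

def sigmoid(x):
    """Standard sigmoid, formula (3); saturates quickly for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def penalized_sigmoid(x, k):
    """Improved sigmoid: the penalty factor shrinks the argument by 1/k,
    so the function saturates more slowly as the number of weak
    classifiers k grows."""
    return 1.0 / (1.0 + np.exp(-x / k))
```

For the same input, the penalized version stays farther from the saturated extremes, which is exactly the intended effect on the strong-classifier response.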
The fourth step: multi-step differential model update
During tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme.
The coarse-scale network model is updated at a fast rate, so that the model adapts in time to changes in the target's appearance; the fine-scale network model is updated at a slow rate, which avoids the error noise and erroneous updates that model changes may introduce; the medium-scale network model is updated at a rate in between. In this way the model can adapt in time to appearance changes of the target while resisting the influence of tracking errors on the model update.
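The fast/slow/intermediate schedule can be illustrated as an exponential moving average with a per-scale rate. The concrete rate values below are hypothetical, as the patent does not state numbers:

```python
# Hypothetical per-scale update rates: coarse adapts fast, fine adapts slowly.
UPDATE_RATES = {"coarse": 0.5, "medium": 0.2, "fine": 0.05}

def update_params(params, new_params, scale):
    """Exponential-moving-average parameter update; a larger rate means the
    model follows new appearance evidence more quickly."""
    r = UPDATE_RATES[scale]
    return {k: (1 - r) * params[k] + r * new_params[k] for k in params}
```

A slow rate on the fine-scale network damps the effect of any single (possibly erroneous) frame, which is the drift-resistance argument made above.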
When a new frame arrives, n candidate target boxes {x_1, …, x_n} are sampled around the target position in the previous frame; according to p(y|x) = σ(H(x)), the box at the position of maximum likelihood response is selected as the target, as shown in formula (5):
x* = argmax_{x ∈ {x_1, …, x_n}} p(y|x)    (5)
We verify the proposed target tracking method based on multi-scale representations from convolutional neural networks through two analyses: first the precision of the tracking algorithm, then its success rate. Part of the image sequences of the object tracking benchmark (OTB) are used for testing, and the classical MIL, TLD, Struck, SCM, KCF and TGPR methods are chosen for comparison.
For precision, the center error between the tracked target and the ground-truth position is used to evaluate the algorithm: the Euclidean distance between the tracked target and the ground truth is computed, different distances are set as thresholds, the percentage of frames meeting each threshold is counted, and the percentage at the threshold of 20 pixels is taken as the final score. The results are shown in Fig. 3; as the figure shows, our method obtains a higher score, indicating that the target tracking method based on multi-scale representations from convolutional neural networks has higher precision.
For the success rate, the overlap between the tracked target and the ground-truth position is computed according to formula (6):
overlap = area(r_t ∩ r_o) / area(r_t ∪ r_o)    (6)
where r_t is the region of the tracked target, r_o is the region of the real target, ∩ denotes intersection, and ∪ denotes union. With the overlap as the threshold, the success percentage at each threshold is counted, and the area under the curve (AUC) is taken as the final score. The results are shown in Fig. 4; as the figure shows, our method obtains a higher AUC, indicating that the target tracking method based on multi-scale representations from convolutional neural networks has a higher success rate.

Claims (1)

1. A target tracking method based on multi-scale representations from convolutional neural networks, characterized by the following steps:
The first step: multi-scale convolutional neural network model pre-training
A Laplacian decomposition is applied to the image to construct its pyramid space, and the images at three scales of the Laplacian pyramid are extracted as inputs to the network models; the multi-scale convolutional neural network models are built with the Lasagne deep learning framework, forming a pool of network models; each network model contains three convolutional layers, two fully connected layers, and one softmax layer; the network parameters are initialized with the shallow layers of VGG-net;
During pre-training, the network parameters are optimized continuously on tracking data; the images at the three scales correspond to a coarse-scale network, a medium-scale network, and a fine-scale network; the networks share parameters across scales and are trained from coarse to fine;
A separate network is built for each category of video set to obtain object information of different categories; the networks share all parameters except the last layer and are trained iteratively to capture features common to objects of different categories; during training, the cross-entropy is used as the loss function L, defined as:
L = -Σ_i t_i log(p_i)    (1)
where t_i is the true label of the i-th image patch, i.e. target or background, and p_i is the predicted probability for the i-th image patch;
the network parameters are optimized with stochastic gradient descent (SGD) during training until all samples are fully trained; the parameters of the three scales are then retained, yielding the pre-trained multi-scale convolutional neural network model;
The second step: constructing a multiple-instance classifier from multi-scale feature representations
The last layer of the pre-trained multi-scale convolutional model is removed and replaced with a randomly initialized softmax layer, and the network parameters are fine-tuned on the target given in the first frame; the feature maps of the third convolutional layer are then extracted from each of the three scale networks as convolution features; the features of the second convolutional layer of the fine-scale network are extracted as well, and together they form the multi-scale representation of the appearance model; max pooling is applied to the second-layer feature maps to reduce the feature dimensionality; all convolution features are concatenated to form the multi-scale appearance model of the target;
With the obtained convolution features as the feature pool, a binary classifier is learned with a multiple-instance learning algorithm; by boosting, the objective function, i.e. the log-likelihood, is maximized, k weak classifiers are selected in turn, and the weak classifiers are combined in a weighted sum to construct the multiple-instance classifier;
The third step: improved online multiple-instance tracking
In the multiple-instance learning algorithm, the likelihood probability of each instance is expressed as:
p(y|x) = σ(H(x))    (2)
where x is the feature-space representation of the image, y is a binary variable indicating whether the target is present in the image, H(x) is the strong classifier composed of multiple weak classifiers, and σ(x) is the sigmoid function, i.e.
σ(x) = 1 / (1 + e^(-x))    (3)
A penalty factor is introduced into the sigmoid function to slow its saturation; the improved sigmoid function is:
σ(x) = 1 / (1 + e^(-x/k))    (4)
where k is the number of weak classifiers forming the strong classifier;
The fourth step: during tracking, the multi-scale convolutional neural network model is updated with a multi-step differential update scheme
The coarse-scale network model is updated at a fast rate, so that the model adapts in time to changes in the target's appearance; the fine-scale network model is updated at a slow rate, avoiding the error noise and erroneous updates that model changes may introduce; the medium-scale network model is updated at a rate in between; in this way the model adapts in time to appearance changes of the target while resisting the influence of tracking errors on the model update;
When a new frame arrives, n candidate target boxes {x_1, …, x_n} are sampled around the target position in the previous frame; according to p(y|x) = σ(H(x)), the box at the position of maximum likelihood response is selected as the target, as shown in formula (5):
x* = argmax_{x ∈ {x_1, …, x_n}} p(y|x)    (5)
CN201611201895.0A 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks Active CN106651915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611201895.0A CN106651915B (en) 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks


Publications (2)

Publication Number Publication Date
CN106651915A CN106651915A (en) 2017-05-10
CN106651915B true CN106651915B (en) 2019-08-09

Family

ID=58828084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611201895.0A Active CN106651915B (en) 2016-12-23 2016-12-23 Target tracking method based on multi-scale representations from convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106651915B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622507B (en) * 2017-08-09 2020-04-07 中北大学 Air target tracking method based on deep learning
CN108682022B (en) * 2018-04-25 2020-11-24 清华大学 Visual tracking method and system based on anti-migration network
CN108876754A (en) * 2018-05-31 2018-11-23 深圳市唯特视科技有限公司 A kind of remote sensing images missing data method for reconstructing based on depth convolutional neural networks
CN108985365B (en) * 2018-07-05 2021-10-01 重庆大学 Multi-source heterogeneous data fusion method based on deep subspace switching ensemble learning
CN109284680B (en) * 2018-08-20 2022-02-08 北京粉笔蓝天科技有限公司 Progressive image recognition method, device, system and storage medium
CN111260536B (en) * 2018-12-03 2022-03-08 中国科学院沈阳自动化研究所 Digital image multi-scale convolution processor with variable parameters and implementation method thereof
CN113228063A (en) * 2019-01-04 2021-08-06 美国索尼公司 Multiple prediction network
CN111259930B (en) * 2020-01-09 2023-04-25 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism
CN111681263B (en) * 2020-05-25 2022-05-03 厦门大学 Multi-scale antagonistic target tracking algorithm based on three-value quantization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325125A (en) * 2013-07-03 2013-09-25 北京工业大学 Moving target tracking method based on improved multi-example learning algorithm
CN105741316A (en) * 2016-01-20 2016-07-06 西北工业大学 Robust target tracking method based on deep learning and multi-scale correlation filtering
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345984B2 (en) * 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition


Also Published As

Publication number Publication date
CN106651915A (en) 2017-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant