CN109543615A

CN109543615A - Dual-learning-model target tracking method based on multi-level features

Info

Publication number: CN109543615A
Application number: CN201811405327.1A
Authority: CN
Inventors: 张建明; 金晓康; 李旭东; 陆朝铨
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2018-11-23
Filing date: 2018-11-23
Publication date: 2019-03-29
Anticipated expiration: 2038-11-23
Also published as: CN109543615B

Abstract

The invention belongs to the technical field of pattern recognition and in particular relates to a dual-learning-model target tracking method based on multi-level features. The method comprises: A1, reading the target video sequence and obtaining the target position in frame t; A2, cropping from frame t+1 a candidate region containing the target position and extracting features from it; A3, training the correlation filter model; A4, predicting the position; A5, re-detecting with the classifier model to obtain the final position of the target in frame t+1; A6, updating both models, i.e. updating the correlation filter model and the classifier model online in real time. The method effectively improves tracking accuracy and success rate and adapts well to disturbances such as fast motion, occlusion and illumination change.

Description

Dual-learning-model target tracking method based on multi-level features

Technical field

The invention belongs to the technical field of pattern recognition and in particular relates to a dual-learning-model target tracking method based on multi-level features.

Background art

Target tracking is a fundamental and important research direction in computer vision. Its basic idea is to build a model from the video information of an image sequence and, using temporal and spatial correlation, to determine the pose and motion trajectory of the target of interest. Target tracking is now widely used in civilian and military applications such as video surveillance, human-computer interaction, autonomous driving, and missile tracking and interception, but many factors remain hard to overcome, such as illumination variation, scale variation, occlusion, fast motion, background clutter and rotation.

In recent years target tracking has made significant progress and many competitive algorithms have been proposed. According to the appearance model, they fall mainly into generative and discriminative methods. Generative trackers model the foreground, search candidate regions by minimizing the reconstruction error to find the best match in the current frame, and update the target model with an online learning mechanism. Discriminative trackers cast tracking as a binary classification problem: a set of positive and negative samples is collected in each frame to train a discriminative classifier that separates target from background as well as possible. Their performance depends mainly on the quality of feature extraction, the quality of the classifier, and the viability of the online classifier update mechanism. To cope with changes in target appearance over time, it is particularly important to represent the target with suitable feature descriptors, such as color histograms, Haar-like features, SURF, HOG, subspace representations, superpixels, or even sets of several features.

Deep learning is currently very popular in computer vision. Thanks to their powerful feature representation, deep convolutional neural networks (CNNs) perform excellently on many tasks, such as image classification, object detection and region-of-interest detection. A CNN contains multiple convolutional layers, pooling layers and a softmax layer. The convolutional layers are highly discriminative while preserving spatial and structural information. Predicting the position by combining deep convolutional features with correlation filtering is a hot topic in target tracking: high-level convolutional features contain rich semantic information, while low-level convolutional features provide higher spatial resolution and edge detail, which is important for accurate localization.

Among discriminative trackers, methods based on discriminative correlation filters (DCF) have become a research focus and achieve good results on standard tracking benchmarks. A DCF tracker trains a correlation filter that predicts a classification score for the target, and uses the discrete Fourier transform to compute efficiently over all circularly shifted training samples, which keeps tracking real-time. Several trackers combining CNNs and DCFs have therefore been proposed in recent years. Relying on the representational power of convolutional features, they track very well and spend no time updating the deep model online, which greatly improves their speed.

However, fusing deep convolutional features within a DCF framework still has limitations:

(1) In the multi-layer feature fusion stage, high-level convolutional features usually receive larger weights. This is reasonable, because high-level features carry richer semantic information and contribute more than low-level features. During tracking, however, the semantic information is easily misled by various disturbances, and online updating amplifies the error, causing the tracker to drift or even lose the target. Layer-wise feature fusion therefore does not fully explore the relations between features, and fusing the multi-layer filters by weights does not make effective use of the information in the individual filters' response maps, so information is wasted.

(2) When a moving target is severely occluded or moves fast, correlation filtering struggles to adapt, so the tracker introduces interference into the continuous update of the correlation filter. Errors accumulate and the tracker drifts or even fails. In such cases a single learning model cannot track effectively, and a new correction mechanism is needed.

Summary of the invention

(1) Technical problem to be solved

In view of the existing technical problems, the present invention provides a dual-learning-model target tracking method based on multi-level features. The method effectively improves tracking accuracy and success rate and adapts well to disturbances such as fast motion, occlusion and illumination change.

(2) Technical solution

To achieve the above object, the main technical solution adopted by the present invention is as follows:

A dual-learning-model target tracking method based on multi-level features: for a target video sequence, given the initial state of the target in the first frame, the state of the target is estimated in the subsequent frames. The method comprises the following steps:

A1. Read the target video sequence and obtain the target position in frame t, with t = 1;

A2. Crop from frame t+1 a candidate region containing the target position; the candidate region is a region 2.5 times the target size, centered on the target position in frame t;

Extract the deep features of the convolutional layers and the low-level features of the candidate region, and take the deep-feature output of each convolutional layer and the low-level feature output as the feature maps of the multi-channel convolutional neural network of the target, $X_l \in \mathbb{R}^{M \times N \times D}$, where M and N are the width and height of the feature map and D is the number of channels;

A3. Train the correlation filter model:

Divide the candidate region into cells and build a Gaussian label Y over the cells;

Using the Gaussian label Y and the feature map $X_l$, construct the correlation filter of each layer and each feature channel of the multi-channel convolutional neural network;

Obtain the feature map of each layer and each feature channel according to the correlation filter;

From the feature maps of each layer and each feature channel, obtain the correlation response map of each layer's feature map and the position of the maximum response in it;

A4. Predict the position:

Weight the maximum response of the current layer with that of the previous layer to obtain the position of the maximum response of the previous layer, and iterate to obtain the final response map;

Find the maximum response in the final response map and take its position as the predicted position of the target;

A5. Classifier re-detection: compute the HPSR index of frame t+1 from the correlation response maps of each layer's feature map;

When the HPSR index is greater than or equal to a set threshold θ, the predicted position is taken as the final position of the target;

When the HPSR index is below the threshold θ, the classifier model is enabled, and its result is combined with the predicted position to obtain the final position of the target in frame t+1;

A6. Update both models: update the correlation filter model and the classifier model online in real time, for determining the target position in frame t+2.
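The confidence gate of step A5 can be sketched as a small function. This is only an illustration under stated assumptions: the names `choose_final_position` and `redetect_and_combine` are hypothetical, and the way the classifier result is merged with the prediction is left to the caller because the text above does not fix it at this point.

```python
from typing import Callable, Tuple

Position = Tuple[float, float]


def choose_final_position(
    cf_position: Position,
    hpsr: float,
    theta: float,
    redetect_and_combine: Callable[[Position], Position],
) -> Position:
    """Step A5 gate: trust the correlation filter unless HPSR drops below theta."""
    if hpsr >= theta:
        # Confident response: the predicted position is the final position.
        return cf_position
    # Low confidence: enable the classifier and merge its result with the prediction.
    # redetect_and_combine is a hypothetical callback supplied by the caller.
    return redetect_and_combine(cf_position)
```

Keeping the classifier behind a callback means it only runs on low-confidence frames, which matches the intent of using it as a correction mechanism rather than on every frame.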

Further, the deep features in step A2 comprise high-level features and mid-level features;

The feature maps of convolutional layers Conv3-4, Conv4-4 and Conv5-4 are extracted as the high-level features;

The feature maps of convolutional layers Conv1-2 and Conv2-2 are extracted as the mid-level features.

Further, the low-level features in step A2 comprise HOG, gray-level and Color Name features; the three are concatenated to form the low-level features.

Further, before the deep features and low-level features are extracted, the image of the candidate region is converted to single precision and resampled; after normalization its mean is computed and then subtracted from the image;

A pre-trained VGGNet-19 network model is used to extract features from the candidate region, taking the deep features of different convolutional layers.
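Concatenating the three hand-crafted features amounts to stacking their per-cell maps along the channel axis. The sketch below assumes the three maps have already been computed on the same cell grid; the function name and the channel counts in the example (31 for HOG, 1 for gray, 10 for Color Name) are assumptions for illustration and are not stated in the source.

```python
import numpy as np


def stack_low_level_features(hog, gray, color_names):
    """Concatenate per-cell feature maps along the channel axis."""
    # np.atleast_3d turns an (M, N) gray map into (M, N, 1).
    maps = [np.atleast_3d(f).astype(np.float32) for f in (hog, gray, color_names)]
    if len({m.shape[:2] for m in maps}) != 1:
        raise ValueError("all feature maps must share the same spatial size")
    return np.concatenate(maps, axis=2)


# Placeholder arrays standing in for real descriptors (assumed channel counts).
hog = np.zeros((50, 60, 31))
gray = np.zeros((50, 60))
cn = np.zeros((50, 60, 10))
low_level = stack_low_level_features(hog, gray, cn)  # shape (50, 60, 42)
```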
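The cropping and preprocessing described above can be sketched with NumPy alone. Several details are assumptions rather than statements from the source: border pixels are replicated when the window leaves the frame, resampling is nearest-neighbour, normalization divides by 255, the mean is taken per channel, and the output size of 224 × 224 is a placeholder.

```python
import numpy as np


def crop_search_region(frame, center, target_size, scale=2.5):
    """Crop a window `scale` times the target size around `center`.

    frame: H x W x C array; center: (x, y); target_size: (w, h).
    Coordinates outside the frame are clamped, which replicates the border.
    """
    cx, cy = center
    w, h = target_size
    win_w, win_h = int(round(w * scale)), int(round(h * scale))
    xs = np.arange(win_w) + int(round(cx - win_w / 2))
    ys = np.arange(win_h) + int(round(cy - win_h / 2))
    xs = np.clip(xs, 0, frame.shape[1] - 1)
    ys = np.clip(ys, 0, frame.shape[0] - 1)
    return frame[np.ix_(ys, xs)]


def preprocess(patch, out_size=(224, 224)):
    """Single precision, resampling, normalization, then mean removal."""
    patch = patch.astype(np.float32)                        # single precision
    rows = np.arange(out_size[0]) * patch.shape[0] // out_size[0]
    cols = np.arange(out_size[1]) * patch.shape[1] // out_size[1]
    patch = patch[np.ix_(rows, cols)]                       # nearest-neighbour resampling
    patch = patch / 255.0                                   # assumed normalization to [0, 1]
    return patch - patch.mean(axis=(0, 1), keepdims=True)   # subtract per-channel mean
```

The preprocessed patch would then be fed to the pre-trained network; that step is omitted here because it depends on the deep learning framework in use.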

Further, step A3 comprises:

A31. Divide the candidate region into cells and build the Gaussian label Y over the cells:

where Y(m, n) is the label at (m, n), m ∈ {0, 1, ..., M-1}, n ∈ {0, 1, ..., N-1}, and σ is a parameter,

where w and h are the width and height of the target in frame t, σ' is the output factor, set to 0.1, and cell_size is the side length of a cell in frame t+1, set to 4;

A32. Construct the correlation filter $W_l^d$ of each layer and each feature channel of the multi-channel convolutional neural network:

where λ is the regularization parameter of the correlation filter, $\hat{X}_l$ and $\hat{Y}$ are the discrete Fourier transforms of $X_l$ and Y respectively, and $\bar{\hat{X}}_l$ is the complex conjugate of $\hat{X}_l$;

A33. Using the correlation filter $W_l^d$ and the feature map of each layer and each feature channel, obtain the correlation response map $E_l$ of level l and the maximum response in $E_l$:

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, a hat denotes the discrete Fourier transform of the corresponding feature map, and the correlation response map $E_l$ has size M × N;

The response maps of all multi-level features are denoted as the set $\{E_1, E_2, \ldots, E_l\}$.
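Steps A31 to A33 follow the usual pattern of a multi-channel discriminative correlation filter, so they can be sketched as below. The formulas in the code are assumptions based on that standard closed form, not a transcription of the patent's equations: the label is a Gaussian centred on the map, σ is taken as σ'·sqrt(w·h)/cell_size, and λ = 1e-4 is a placeholder value. All function names are hypothetical.

```python
import numpy as np


def gaussian_label(M, N, target_wh, output_factor=0.1, cell_size=4):
    """Gaussian regression target Y of size M x N, peaked at the map centre."""
    w, h = target_wh
    sigma = output_factor * np.sqrt(w * h) / cell_size   # assumed form of sigma
    m = np.arange(M)[:, None] - M // 2
    n = np.arange(N)[None, :] - N // 2
    return np.exp(-(m ** 2 + n ** 2) / (2.0 * sigma ** 2))


def train_filter(X, Y, lam=1e-4):
    """Closed-form multi-channel correlation filter in the Fourier domain.

    X: M x N x D feature map of one level, Y: M x N label.
    Returns W with shape M x N x D (one filter per channel d).
    """
    Xf = np.fft.fft2(X, axes=(0, 1))
    Yf = np.fft.fft2(Y)
    energy = np.sum(Xf * np.conj(Xf), axis=2).real       # summed over channels
    return Yf[:, :, None] * np.conj(Xf) / (energy[:, :, None] + lam)


def response_map(W, Z):
    """Correlation response E_l (M x N) of filter W on a new feature map Z."""
    Zf = np.fft.fft2(Z, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(W * Zf, axis=2)))
```

With this convention, applying the filter to the features it was trained on approximately reproduces Y, and a circular shift of the features moves the response peak by the same amount, which is what allows the peak position to be read as the target displacement.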

Further, step A4 comprises:

A41. Weight the maximum response of the current layer with that of the previous layer to obtain the position of the maximum response of the previous layer, and iterate to obtain the final response map;

$E_{l-1}(m, n) = \alpha_{l-1} E_{l-1}(m, n) + \alpha_l E_l(m, n)$ (5)

where $E_l(m, n)$ denotes the response map of the level-l feature map, (m, n) is the position of its maximum response, and $\alpha_l$ is the weight of level l;

This yields the final response map E;

A42. Find the maximum response in the final response map E to locate the center $p_t = (x_t, y_t)$ of the currently tracked target;

$(x_t, y_t) = \arg\max_{m,n} E(m, n)$ (6).
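Equations (5) and (6) can be sketched as a backward pass over the list of response maps followed by an argmax. The sketch assumes that all response maps have already been brought to the same size M × N and that the iteration runs from the last level down to the first; the weights passed in are illustrative.

```python
import numpy as np


def fuse_responses(responses, alphas):
    """Apply Eq. (5) from the last level down to the first.

    responses: list [E_1, ..., E_L] of M x N arrays; alphas: matching weights.
    """
    fused = responses[-1]
    for l in range(len(responses) - 2, -1, -1):
        # E_l <- alpha_l * E_l + alpha_{l+1} * E_{l+1}, where E_{l+1} is already fused.
        fused = alphas[l] * responses[l] + alphas[l + 1] * fused
    return fused


def locate_peak(E):
    """Eq. (6): index (m, n) of the maximum of the final response map."""
    m, n = np.unravel_index(np.argmax(E), E.shape)
    return int(m), int(n)
```

Note that `locate_peak` returns array indices in (row, column) order; mapping them to image coordinates depends on the cell size and the position of the search window.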

Further,

where $\max(E_l)$ is the maximum value of $E_l$, $\mu_l$ and $\sigma_l$ are the mean and standard deviation of the level-l response map in frame t+1, and $\beta_l$ is the weight coefficient of the level-l correlation response map $E_l$.
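Based on the quantities defined above, the HPSR index can be sketched as a weighted sum of a per-level peak score. The exact formula is an assumption: each level contributes (max − mean) / std of its response map, and the contributions are combined with the level weights. Unlike the classical peak-to-sidelobe ratio, no window around the peak is excluded here.

```python
import numpy as np


def psr_like(E):
    """(max - mean) / std of one response map."""
    return (E.max() - E.mean()) / (E.std() + 1e-12)  # epsilon avoids division by zero


def hpsr(responses, betas):
    """Weighted sum of the per-level scores (assumed form of the HPSR index)."""
    return float(sum(b * psr_like(E) for b, E in zip(betas, responses)))
```

A single sharp peak on a flat background gives a high score, while a noisy map with many comparable values gives a low one, which is why the index can serve as a confidence measure for the predicted position.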

Further, in step A5, the classifier model is enabled: a candidate sample set $x_{SVM}$ is collected around the target position $p_t = (x_t, y_t)$ obtained in step A42 and classified with the updated SVM:

where scores is the classification score of the classifier, and $w$ and b are the parameters and bias of the classifier respectively;

The position with the maximum score is taken as the position given by the classifier model, which is then combined by weighting with the correlation filter result $p_t = (x_t, y_t)$ to obtain the final result.
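The re-detection step can be sketched with a linear SVM scoring function. The linear form of the score, the function name `redetect`, and the blending weight `cf_weight = 0.5` are assumptions for illustration; the source states only that the two positions are combined by weighting.

```python
import numpy as np


def redetect(candidates, positions, w, b, cf_pos, cf_weight=0.5):
    """Score candidates with a linear SVM and blend the best one with cf_pos.

    candidates: r x F feature matrix of the candidate samples;
    positions: r x 2 array of candidate centres;
    w (F,), b: classifier parameters and bias;
    cf_pos: correlation filter result p_t.
    """
    scores = candidates @ w + b                  # one score per candidate
    best = positions[int(np.argmax(scores))]     # position with the maximum score
    cf_pos = np.asarray(cf_pos, dtype=float)
    # Weighted combination of the two positions (cf_weight is an assumed value).
    return cf_weight * cf_pos + (1.0 - cf_weight) * best
```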

Further, step A6 comprises:

A61. Update the correlation filter model: update the numerator and denominator for frame t+1 so as to update the correlation filter $W_l^d$ of frame t+1,

where η is the learning rate;

A62. Update the classifier model: collect positive and negative samples near the target position in frame t+1 using windows of the target size, extract their features and perform online SVM training;

Given the training set $G = \{(x_{SVM,j}, y_{SVM,j}),\ j = 1, \ldots, r\}$, where r is the number of samples, $x_{SVM,j}$ is a training sample and $y_{SVM,j}$ is its label: let m be the bounding box of a sample and n the target bounding box in frame t+1; their overlap ratio is s = (m ∩ n)/(m ∪ n). A sample is positive when s > 0.5 and negative when s < 0.1. The training objective function is:

After the update, the models are used to determine the target position in frame t+2.
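Both updates in step A6 can be sketched briefly. The filter update below assumes the running-average form common to correlation filter trackers, with a placeholder learning rate η = 0.01 and λ = 1e-4; the sample labeling follows the overlap thresholds stated above and assumes that samples with 0.1 ≤ s ≤ 0.5 are discarded, which the source does not say explicitly. All function names are hypothetical.

```python
import numpy as np


def update_filter_terms(num_old, den_old, Xf, Yf, eta=0.01):
    """Running-average update of the filter numerator and denominator.

    Xf: M x N x D Fourier-domain features of frame t+1; Yf: M x N label spectrum.
    """
    num_new = Yf[:, :, None] * np.conj(Xf)
    den_new = np.sum(Xf * np.conj(Xf), axis=2).real
    num = (1.0 - eta) * num_old + eta * num_new
    den = (1.0 - eta) * den_old + eta * den_new
    return num, den


def filter_from_terms(num, den, lam=1e-4):
    """Rebuild the per-channel filter from the updated terms."""
    return num / (den[:, :, None] + lam)


def overlap(box_a, box_b):
    """Overlap ratio s = intersection / union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_sample(sample_box, target_box):
    """+1 if s > 0.5, -1 if s < 0.1, None (discarded, an assumption) otherwise."""
    s = overlap(sample_box, target_box)
    if s > 0.5:
        return 1
    if s < 0.1:
        return -1
    return None
```

Leaving a gap between the two thresholds keeps ambiguous, partially overlapping windows out of the training set, so the classifier sees only clear positives and clear negatives.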

(3) Beneficial effects

The beneficial effects of the present invention are:

1. The dual-learning-model target tracking method based on multi-level features provided by the invention effectively improves tracking accuracy and success rate and adapts well to disturbances such as fast motion, occlusion and illumination change.

2. The invention proposes a correlation filtering model with multi-level feature fusion, which obtains the target position by layer-by-layer recursion.

3. The invention obtains different response maps from correlation filters of different levels, computes the PSR of each response map and weights them to obtain an index of confidence in the current position.

4. The invention proposes an online-updated classifier model: positive and negative samples are collected around the target to train a discriminative classifier online, the target is re-detected, and the new position is combined with the result of the correlation filter model to obtain the final target position.

Brief description of the drawings

Fig. 1 is the main algorithm flow chart of the invention;

Fig. 2 is a visualization of the multi-level features in the invention;

Fig. 3 shows the fluctuation of HPSR on the Woman image sequence;

Fig. 4 is a schematic diagram of target tracking in an embodiment of the invention.

Detailed description of embodiments

To better explain the invention and make it easier to understand, the invention is described in detail below through specific embodiments with reference to the accompanying drawings.

A2. Crop from frame t+1 a candidate region containing the target position; the candidate region is a region 2.5 times the target size, centered on the target position in frame t. The image of the candidate region is converted to single precision and resampled; after normalization its mean is computed and then subtracted from the image.

Extract the deep features of the convolutional layers and the low-level features of the candidate region, and take the deep-feature output of each convolutional layer and the low-level feature output as the feature maps of the multi-channel convolutional neural network of the target, $X_l \in \mathbb{R}^{M \times N \times D}$, where M and N are the width and height of the feature map and D is the number of channels.

Features are extracted from the candidate region with a pre-trained VGGNet-19 network model, taking the deep features of different convolutional layers. The feature maps of convolutional layers Conv3-4, Conv4-4 and Conv5-4 are extracted as the high-level features of the multi-level features; the feature maps of convolutional layers Conv1-2 and Conv2-2 are extracted as the mid-level features of the multi-level features.

Features are additionally extracted from the candidate region of the target with several preset hand-crafted feature operators, namely HOG, gray-level and Color Name features; the three are concatenated to form the low-level features of the multi-level features.

A3. Train the correlation filter model:

A4. Predict the position.

$E_{l-1}(m, n) = \alpha_{l-1} E_{l-1}(m, n) + \alpha_l E_l(m, n)$ (5)

This yields the final response map E;

A42. Find the maximum response in the final response map E to locate the center $p_t = (x_t, y_t)$ of the currently tracked target, which is taken as the predicted position of the target:

$(x_t, y_t) = \arg\max_{m,n} E(m, n)$ (6).

A5. Classifier re-detection: compute the HPSR index of frame t+1 from the correlation response maps of each layer's feature map:

where $\max(E_l)$ is the maximum value of $E_l$, $\mu_l$ and $\sigma_l$ are the mean and standard deviation of the level-l response map in frame t+1, and $\beta_l$ is the weight coefficient of the level-l correlation response map $E_l$.

When the HPSR index is below the set threshold θ, the classifier model is enabled, and its result is combined with the predicted position to obtain the final position of the target in frame t+1.

A candidate sample set $x_{SVM}$ is collected around the target position $p_t = (x_t, y_t)$ obtained in step A42 and classified with the updated SVM:

where scores is the classification score of the classifier, and $w$ and b are the parameters and bias of the classifier respectively.

A61. Update the correlation filter model: update the numerator and denominator for frame t+1 so as to update the correlation filter of frame t+1,

where η is the learning rate;

After the update, the models are used to determine the target position in frame t+2.

The technical principle of the invention has been described above with reference to specific embodiments. These descriptions are intended only to explain the principle of the invention and shall not be construed as limiting its scope of protection in any way. Based on the explanation herein, those skilled in the art can conceive of other embodiments of the invention without creative effort, and these also fall within the scope of protection of the invention.

Claims

1. A dual-learning-model target tracking method based on multi-level features, characterized in that, for a target video sequence, given the initial state of the target in the first frame, the state of the target is estimated in the subsequent frames, the method comprising the following steps:

A2. Crop from frame t+1 a candidate region containing the target position;

Extract the deep features of the convolutional layers and the low-level features of the candidate region, and take the deep-feature output of each convolutional layer and the low-level feature output as the feature maps of the multi-channel convolutional neural network of the target, $X_l \in \mathbb{R}^{M \times N \times D}$, where M and N are the width and height of the feature map and D is the number of channels;

A3. Train the correlation filter model:

Using the Gaussian label Y and the feature map $X_l$, construct the correlation filter of each layer and each feature channel of the multi-channel convolutional neural network;

From the feature maps of each layer and each feature channel, obtain the correlation response map of each layer's feature map and the position of the maximum response in it;

A4. Predict the position:

Weight the maximum response of the current layer with that of the previous layer to obtain the position of the maximum response of the previous layer, and iterate to obtain the final response map;

A5. Classifier re-detection: compute the HPSR index of frame t+1 from the correlation response maps of each layer's feature map;

When the HPSR index is below a set threshold θ, the classifier model is enabled, and its result is combined with the predicted position to obtain the final position of the target in frame t+1;

A6. Update both models: update the correlation filter model and the classifier model online in real time, for determining the target position in frame t+2.

2. The dual-learning-model target tracking method according to claim 1, characterized in that the candidate region is a region 2.5 times the target size, centered on the target position in frame t.

3. The dual-learning-model target tracking method according to claim 1, characterized in that the deep features in step A2 comprise high-level features and mid-level features;

4. The dual-learning-model target tracking method according to claim 1, characterized in that the low-level features in step A2 comprise HOG, gray-level and Color Name features, and the three are concatenated to form the low-level features.

5. The dual-learning-model target tracking method according to claim 1, characterized in that, before the deep features and low-level features are extracted, the image of the candidate region is converted to single precision and resampled; after normalization its mean is computed and then subtracted from the image;

A pre-trained VGGNet-19 network model is used to extract features from the candidate region, taking the deep features of different convolutional layers.

6. The dual-learning-model target tracking method according to claim 1, characterized in that step A3 comprises:

where w and h are the width and height of the target in frame t, σ' is the output factor, set to 0.1, and cell_size is the side length of a cell in frame t+1, set to 4;

where λ is the regularization parameter of the correlation filter, $\hat{X}_l$ and $\hat{Y}$ are the discrete Fourier transforms of $X_l$ and Y respectively, and $\bar{\hat{X}}_l$ is the complex conjugate of $\hat{X}_l$;

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, a hat denotes the discrete Fourier transform of the corresponding feature map, and the correlation response map $E_l$ has size M × N;

7. The dual-learning-model target tracking method according to claim 1, characterized in that step A4 comprises:

A41. Weight the maximum response of the current layer with that of the previous layer to obtain the position of the maximum response of the previous layer, and iterate to obtain the final response map;

$E_{l-1}(m, n) = \alpha_{l-1} E_{l-1}(m, n) + \alpha_l E_l(m, n)$ (5)

where $E_l(m, n)$ denotes the response map of the level-l feature map, (m, n) is the position of its maximum response, and $\alpha_l$ is the weight of level l;

This yields the final response map E;

A42. Find the maximum response in the final response map E to locate the center $p_t = (x_t, y_t)$ of the currently tracked target;

$(x_t, y_t) = \arg\max_{m,n} E(m, n)$ (6).

8. The dual-learning-model target tracking method according to claim 1, characterized in that

where $\max(E_l)$ is the maximum value of $E_l$, $\mu_l$ and $\sigma_l$ are the mean and standard deviation of the level-l response map in frame t+1, and $\beta_l$ is the weight coefficient of the level-l correlation response map $E_l$.

9. The dual-learning-model target tracking method according to claim 7, characterized in that, in step A5, the classifier model is enabled: a candidate sample set $x_{SVM}$ is collected around the target position $p_t = (x_t, y_t)$ obtained in step A42 and classified with the updated SVM:

The position with the maximum score is taken as the position given by the classifier model, which is then combined by weighting with the correlation filter result $p_t = (x_t, y_t)$ to obtain the final result.

10. The dual-learning-model target tracking method according to claim 9, characterized in that step A6 comprises:

A61. Update the correlation filter model: update the numerator and denominator for frame t+1 so as to update the correlation filter $W_l^d$ of frame t+1,

where η is the learning rate;

A62. Update the classifier model: collect positive and negative samples near the target position in frame t+1 using windows of the target size, extract their features and perform online SVM training;

Given the training set $G = \{(x_{SVM,j}, y_{SVM,j}),\ j = 1, \ldots, r\}$, where r is the number of samples, $x_{SVM,j}$ is a training sample and $y_{SVM,j}$ is its label: let m be the bounding box of a sample and n the target bounding box in frame t+1; their overlap ratio is s = (m ∩ n)/(m ∪ n). A sample is positive when s > 0.5 and negative when s < 0.1. The training objective function is:

After the update, the models are used to determine the target position in frame t+2.