CN108830878A

CN108830878A - A kind of method for tracking target based on FPN neural network

Info

Publication number: CN108830878A
Application number: CN201810329415.1A
Authority: CN
Inventors: 罗均; 高建焘; 李小毛; 谢少荣; 彭艳
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2018-11-16
Anticipated expiration: 2038-04-13
Also published as: CN108830878B

Abstract

The invention proposes a kind of method for tracking target based on FPN neural network.This method does not use traditional VGG deep neural network, but uses FPN deep neural network instead.Using the good fusion faculty of discriminating power possessed by spatial information possessed by the depth characteristic figure in FPN deep neural network to the output of neural network shallow-layer network and the depth characteristic figure of deep layer network output, to improve target tracking accuracy.This method is a kind of track algorithm of realtime robustness, is achieved good results in different tracking scenes.

Description

Target tracking method based on FPN neural network

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a target tracking method based on an FPN neural network.

Background

Target tracking is one of the most active researches on computer vision technology because of wide application in many fields such as behavior analysis, vehicle navigation, human-computer interaction, medical imaging, video monitoring and the like. Target tracking refers to the location of a target in the first frame of a given video, with each frame following it, for target localization. The core problem of target tracking follows targets that change over time. Although the target tracking algorithm is rapidly developed under the continuous research of scholars at home and abroad in recent years, a good effect cannot be obtained under the conditions of severe illumination change, rapid target movement, partial shielding and the like.

In recent years, scholars at home and abroad propose various tracking algorithms which can be mainly divided into two types: one is based on generative models describing and characterizing the target itself; another class is discriminant models that aim at separating objects from the background. The generative model focuses on establishing a representation of the target appearance model, and although it is important to construct an effective appearance model to handle various challenging situations in tracking, at the same time, it also adds significant computational complexity and discards useful information around the target area that can be used to better separate the object from the background; the discriminant model converts the tracking problem into a two-classification problem of a target and a background, namely, the tracked target is used as a foreground, and the foreground target and the background are distinguished by using a judger for online learning or offline training, so that the position of the foreground target is obtained. Before judgment, feature extraction is often performed to serve as a judgment basis to improve the judgment accuracy, but this also results in a large number of candidate samples needing feature extraction, so that the real-time performance is difficult to achieve.

Correlation filtering is a conventional signal processing method that describes the degree of similarity between two samples. The 2010 MOSSE algorithm is introduced into target tracking, so that the tracking algorithm speed reaches a high speed state, but the accuracy is low due to the fact that random sampling is adopted, and the number of positive and negative training is insufficient. The CSK algorithm in 2012 based on the MOSSE algorithm carries out dense sampling on the target by establishing a cyclic shift structure, so that the number of positive and negative samples is increased, and the problem of insufficient training samples in target tracking is solved. Besides, the calculation of the target sample is converted into the solution in the frequency domain by processing the samples through cyclic displacement, and the efficiency of target tracking is greatly improved by using a fast Fourier transform method. However, the CSK algorithm adopts a single-channel gray scale feature, and is not robust enough in feature characterization. Aiming at the problems, in 2015, the CN algorithm is changed to be the color characteristic of multiple channels, and the KCF algorithm is changed to be the HOG characteristic of multiple channels, so that the precision is improved. However, the CN algorithm and the KCF algorithm use fixed-size templates in convolution solution, so that the model has no scale self-adaptation function, the DSST algorithm is additionally provided with a scale filter on the basis of an original position filter, the FDSST algorithm is improved on the basis of the DSST algorithm to increase the tracking speed of the DSST algorithm, and the SAMF algorithm obtains candidate samples through multi-scale sampling to enable the model to have scale adaptability. Since the number of positive and negative samples is increased by using the cyclic shift to construct the samples, the image pixels cross the boundary, so that the false samples are generated, and the discrimination force of the classifier is reduced, namely the boundary effect. The SRDCF algorithm proposed in 2015 greatly reduces the boundary effect and improves the tracking accuracy by introducing a regularization weight coefficient that conforms to the spatial constraint. The tracking algorithm based on the correlation filtering belongs to a discriminant tracking algorithm, and features are often extracted before judgment so as to serve as judgment basis to improve the judgment accuracy, and the characterization capability of the features determines the tracking effect to a great extent. Since the first time Hinton took advantage of the AlexNet deep convolutional neural network in the ImageNet image classification competition in 2012, the deep convolutional neural network started to rise, which also exhibited impressive performance in many tasks, in particular, the strong feature extraction capability. In 2015, the deep convolutional neural network of the VGG is proposed to be applied to the SRDCF algorithm by the DeepSRDCF algorithm, so that the precision is further improved, but only a feature map output by a single-layer neural network can be used, and the original potential of the deep convolutional neural network is greatly limited. The 2016C-COT algorithm enables the depth feature map output by the multi-layer neural network to be effectively combined through a method of training a continuous convolution filter. However, the way of combining the depth feature maps output by the multilayer neural network with the C-COT algorithm is only to simply combine the depth feature maps in a high-dimensional space, and does not well integrate the spatial information of the depth feature maps output by the shallow network of the neural network and the discriminative power of the depth feature maps output by the deep network.

Aiming at the problem that the existing tracking algorithm cannot well integrate the spatial information of the depth feature map output by the shallow network of the neural network and the discrimination capability of the depth feature map output by the deep network, a tracking algorithm needs to be designed, so that the spatial information of the depth feature map output by the shallow network of the neural network and the discrimination capability of the depth feature map output by the deep network can be well utilized, and the tracking precision is improved.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a target tracking method based on an FPN neural network.

In order to achieve the purpose, the invention adopts the following technical scheme:

a target tracking method based on an FPN neural network comprises the following specific steps:

step one, for the first frame image t being 1, the central position (x) of the first frame tracking target given by the tracking task_t,y_t) And tracking target area size information (l)_t,h_t) Expanding the tracking target area by a certain ratio (l)_p,t,h_p,t)＝α(l_t,h_t) According to the target center position (x)_t,y_t) And the expanded size (l) of the tracking target region_p,t,h_p,t) Sampling in the frame image to obtain a training sample; wherein x_tFor tracking the abscissa, y, of the central position of the target_tFor tracking the ordinate of the central position of the target,/_tFor the length of the original tracking target region, h_tWidth of original tracking target region, expansion ratio of α, l_p,tTo extend the length of the trailing target area, h_p,tTo expand the width of the target area to be tracked;

step two, inputting training samples T obtained by sampling in the first frame image into the FPN neural network, and extracting P in the FPN neural network₂Characteristics of the layerWherein T represents a training sample, Z_TP representing the number of channels n in an FPN neural network₂The characteristics of the layers are such that,representing P in FPN neural network₂Characteristic Z of the layer_TA medium nth dimension feature;

step three, extracting P obtained by passing the training sample T through the FPN neural network₂Characteristics of the layerFor calculation of the correlation filter parameters;

step four, for the next frame of image t +1, tracking the central position (x) of the target according to the previous frame_t,y_t) And the expanded size (l) of the tracking target region_p,t,h_p,t) For the expanded tracking target area (l) of the previous frame_p,t,h_p,t) Carrying out multi-scale scaling to obtain various candidate region sizes { (l)_p,t+1,h_p,t+1)}＝{β(l_p,t,h_p,t) Wherein β is a scaling scale, β ═ {0.985,0.99, 0.995,1,1.005,1.01,1.015}, and then, the target center position (x) is tracked according to the previous frame_t,y_t) And a plurality of candidate region sizes { (l)_p,t+1,h_p,t+1)}＝{β(l_p,t,h_p,t) Sampling the frame image to obtain a candidate sample set X ═ X (X)₁X₂… X₇)；

Inputting the candidate sample set X obtained by sampling into the FPN neural network, and extracting P in the FPN neural network₂Characteristics of the layerWhereinRepresenting the first candidate sample X₁Inputting the data into an FPN neural network and providing P with the number of channels n₂Layer characteristics, i.e.

Step six, extracting P obtained by passing each candidate sample of the candidate sample set through an FPN neural network₂Characteristic Z of the layer_XFor calculating the response map, and finally determining the central position (x) of the tracking target of the frame_t+1,y_t+1) And tracking target area size (l)_t+1,h_t+1)；

And step seven, after the central position and the area size of the tracking target are obtained, continuously repeating the step one to the step six until the video is finished, and completing the tracking of the central position and the area size of the tracking target.

The calculation of the parameters of the relevant filter in the third step specifically includes:

first, for P₂Characteristics of the layerPerforming fast Fourier transform to transform the features from time domain to frequency domainTo obtain

Second, for each feature channelk is 1,2, … n, which is vectorized and then reconstructed into a diagonal matrix, i.e., a matrix with a high degree of freedom

Diagonal matrix constructed from all channels in a featurek 1,2, … n are recombined to form a diagonal matrix, i.e.

Using this diagonal matrix D_tCalculating to obtain A_tAndwherein,

wherein W is a regularization matrix constructed by the spatial regularization coefficients;the method comprises the steps of obtaining a Gaussian label through fast Fourier transform and real quantization;

finally, according to the formulaAndcalculating to obtain related filtering parametersThe specific method comprises the following steps: firstly, matrix A is formed₁Decomposed into an upper triangular matrix L₁And a lower triangular matrix U₁I.e. A₁＝L₁+U₁(ii) a Then, Gaussian-Seidel iteration is carried out to obtain relevant filter parameters after real numberFinally by calculatingDeriving correlation filter parametersWhere B is a unitary matrix of orthonormal bases, the function of which is to correlate the filter parametersAnd (5) carrying out real number conversion.

The method comprises the following steps:

firstly, each candidate sample of the candidate sample set is extracted to obtain P after passing through an FPN neural network₂Characteristic Z of the layer_XPerforming fast Fourier transform to obtain frequency domain characteristicsWhereink is 1, and 2 … 7 is the kth candidate sample P₂Characteristics of the layerObtaining frequency domain characteristics through fast Fourier transform

Then, for each candidate sample in the candidate sample set, the characteristic of fast Fourier transform is performedk is 1,2 … 7, which is related to the filter parameters after fast Fourier transformPerform dot multiplication, andthen, performing inverse Fourier transform to obtain a response map corresponding to the candidate samplek =1,2 … 7, i.e.

WhereinFor the kth candidate sample X_kK is the corresponding graph of 1,2 … 7, F^-1Is an inverse Fourier transform;

finally, response graphs corresponding to all candidate samples in the candidate sample setFind the point (x) where the response value is maximum_t+1,y_t+1) The corresponding position of the point is the tracking target position of the frame, and the size of the candidate sample corresponding to the point is the enlarged tracking target area size (l)_p,t+1,h_p,t+1) Then, the target region size is calculated from the enlargement ratio α

Compared with the prior art, the method has the following outstanding advantages:

the method does not use the traditional VGG deep neural network, but uses the FPN deep neural network instead, and utilizes the space information of the depth characteristic diagram output by the shallow network of the neural network in the FPN deep neural network and the good judgment capability of the depth characteristic diagram output by the deep network, thereby improving the target tracking precision. The method is a real-time robust tracking algorithm, and achieves good effect in different tracking scenes.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The following further describes a specific embodiment of the present invention with reference to the drawings.

As shown in fig. 1, a target tracking method based on an FPN neural network includes the following specific steps:

Using this diagonal matrix D_tCalculating to obtain A_tAndwherein,

Step six, extracting P obtained by passing each candidate sample of the candidate sample set through an FPN neural network₂Characteristics of the layerZ_XFor calculating the response map, and finally determining the central position (x) of the tracking target of the frame_t+1,y_t+1) And tracking target area size (l)_t+1,h_t+1)；

Then, for each candidate sample in the candidate sample set, the characteristic of fast Fourier transform is performedk is 1,2 … 7, which is related to the filter parameters after fast Fourier transformPerforming dot multiplication and then performing inverse Fourier transform to obtain a response map corresponding to the candidate samplek is 1,2 … 7, i.e.

Claims

1. A target tracking method based on an FPN neural network is characterized by comprising the following specific steps:

2. The target tracking method based on the FPN neural network of claim 1, wherein the calculation of the relevant filter parameters in step three is specifically:

Second, for each feature channelVectorised and then reconstructed into a diagonal matrix, i.e.

Diagonal matrix constructed from all channels in a featureThe recombination is configured as a diagonal matrix, i.e.

Using this diagonal matrix D_tCalculating to obtain A_tAndwherein,

wherein W is a regularization matrix constructed by the spatial regularization coefficients;obtained by fast Fourier transform and real number of Gaussian labelThe label of (1);

finally, according to the formulaAndcalculating to obtain related filtering parameters after fast Fourier transform

3. The target tracking method based on the FPN neural network of claim 2, wherein the calculation is performed to obtain the relevant filtering parameters by the following specific method: firstly, matrix A is formed_tDecomposed into an upper triangular matrix L_tAnd a lower triangular matrix U_tI.e. A_t＝L_t+U_t(ii) a Then, Gaussian-Seidel iteration is carried out to obtain related filter parameters which are subjected to fast Fourier transform and real number conversionFinally by calculatingObtaining the parameters of the related filter after fast Fourier transformWherein B is unitary matrix composed of orthonormal basis and is used for fast Fourier transform of related filter parametersAnd (5) carrying out real number conversion.

4. The target tracking method based on the FPN neural network of claim 1, wherein the sixth specific method of the steps is as follows:

firstly, each candidate sample of the candidate sample set is extracted to obtain P after passing through an FPN neural network₂Characteristic Z of the layer_XPerforming fast Fourier transform to obtain frequency domain characteristicsWhereinFor the kth candidate sample P₂Characteristics of the layerObtaining frequency domain characteristics through fast Fourier transform

Then, for each candidate sample in the candidate sample set, the characteristic of fast Fourier transform is performedCorrelating it with fast Fourier transformed correlation filter parametersPerforming dot multiplication and then performing inverse Fourier transform to obtain a response map corresponding to the candidate sampleNamely, it is