CN110084834B

CN110084834B - Target tracking method based on rapid tensor singular value decomposition feature dimension reduction

Info

Publication number: CN110084834B
Application number: CN201910349128.1A
Authority: CN
Inventors: 傅衡成; 周武能
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2021-04-06
Anticipated expiration: 2039-04-28
Also published as: CN110084834A

Abstract

The invention discloses a target tracking method based on rapid tensor singular value decomposition feature dimension reduction, which comprises the following steps: extracting various characteristics of each frame of video data and constructing a tensor structure; performing singular value decomposition on the constructed tensor; and training a relevant filter for the features after dimension reduction, and tracking the target. The method can effectively reduce the number of the features and accelerate the tracking speed, and compared with the traditional vector-based principal component analysis feature dimension reduction mode and the like, the method better retains the structural information of the features; tensor singular value decomposition has invariance to the rotation of features to enhance the robustness of the tracker to the rotation of the target.

Description

Target tracking method based on rapid tensor singular value decomposition feature dimension reduction

Technical Field

The invention relates to a target tracking method based on rapid tensor singular value decomposition feature dimension reduction, and belongs to the technical field of video target tracking.

Background

The target tracking has important significance for the development of the fields of robots, unmanned planes, automatic driving, navigation, guidance and the like. For example, in the human-computer interaction process, the camera continuously tracks the human behavior, and the robot achieves the understanding of the human posture, the human motion and the human gesture through a series of analysis processing, so that the friendly communication between the human and the machine is better realized; in the unmanned aerial vehicle target tracking process, visual information of a target is continuously acquired and transmitted to a ground control station, and a video image sequence is analyzed through an algorithm to obtain real-time position information of the tracked target so as to ensure that the tracked target is within the visual field range of the unmanned aerial vehicle in real time.

In recent years, a related filtering type target tracking method not only has higher tracking speed but also has good tracking precision, but also has seriously reduced tracking speed of related filters along with the continuous increase of various characteristics.

In recent years, image features used for correlation filtering, such as color name features, gradient histogram features, and depth features of a depth convolution neural network, have increased tracking accuracy, but have decreased tracking speed of the correlation filter.

Disclosure of Invention

The purpose of the invention is: provided is a target tracking method which can improve the accuracy by using more characteristics and reduce the loss of speed.

In order to achieve the above object, the technical solution of the present invention is to provide a target tracking method based on fast tensor singular value decomposition feature dimension reduction, which is characterized in that a tensor singular value decomposition feature dimension reduction feature is used for target tracking, and the method includes the following steps:

(1) extracting gradient direction histogram features HOG, color name features CN and pre-trained depth convolution features CNN of a t frame tracking result window;

(2) discharging the features extracted in the step (1) into a horizontal section of a tensor to form 4 mutually independent third-order tensors which are respectively recorded as L_i，i＝1,2,3,4；

(3) Respectively calculating average characteristics by taking the horizontal slice of each tensor as a unit, and recording the average characteristics of the ith characteristic tensor as M_iThen, there are:

in the formula, N_iThe number of horizontal slices for the ith feature tensor; l is_i(j,: means a third order tensor L_iThe jth horizontal slice of (a);

(4) the horizontal slice of each feature tensor is subtracted by the corresponding average feature and is recorded as A_i；

(5) Transforming the time-domain feature tensor into the frequency domain using a fast Fourier transform, and for each feature in the frequency domainPerforming traditional matrix singular value decomposition on each horizontal slice of the feature tensor, and intercepting columns of a left singular matrix, wherein the reserved number is that the front k dimension of the columns is also equal to the dimension of the feature tensor after dimension reduction; after the singular value decomposition of each side slice is completed, the left singular matrixes which are reserved are formed into left singular eigenvectors according to the original arrangement sequence, finally, the left singular eigenvectors are converted into time domains from the frequency domains through the inverse fast Fourier transform, and the left singular eigenvectors of each time domain form are respectively recorded as the left singular eigenvectors

(6) Carrying out tensor product operation by utilizing the left singular characteristic tensor of the time domain form and the characteristic tensor which subtracts the average characteristic to obtain a characteristic tensor after dimension reduction, and adding the average characteristic which is obtained correspondingly before to each front slice of each characteristic tensor after dimension reduction to obtain a characteristic tensor F_i：

In the formula, tprod (·) represents a tensor product; tran (U)_i) Transpose of the representation of the feature tensor U

(7) For each feature tensor F_iTransposing, placing the front slices into the side slices, and then applying each feature tensor F_iThe side panels of (a) are all trained and noted as a filter

And updating the previous filter, wherein the updated formula is shown as the following formula:

in the formula (I), the compound is shown in the specification,

a filter representing the ith feature of the t-th frame, wherein eta is the learning rate of the filter;

(8) extracting gradient direction histogram features HOG, color name features CN and depth convolution features CNN of the candidate region of the t +1 th frame, keeping the discharging sequence consistent with the step (2), and carrying out tensor multiplication on a projection operator obtained by the t-th frame and each feature tensor of the t +1 th frame to obtain a feature tensor after dimension reduction;

(9) performing convolution operation on the feature tensor subjected to dimensionality reduction by using the filter obtained from the t-th frame, obtaining a confidence map by slicing the side surface of each feature tensor, adding the confidence maps to obtain a response map, and taking the position with the maximum response map as the position of a target in the t + 1-th frame;

(10) and (4) judging whether the frame is the last frame or not, if not, making t equal to t +1, returning to the step (1), and if so, stopping tracking.

Preferably, in step (1), the histogram of gradient direction feature HOG includes 31 layers, the color name feature CN includes 11 layers, the Layer1 of the depth convolution feature CNN includes 96 layers, and the Layer5 includes 512 layers.

Preferably, in step (5), the matrix singular value decomposition in the frequency domain form is as follows, and the projection operator is updated according to the following formula:

in the formula, P_i ^tAnd a represents the learning rate of the projection operator.

The method can effectively reduce the number of the features and accelerate the tracking speed, and compared with the traditional vector-based principal component analysis feature dimension reduction mode and the like, the method better retains the structural information of the features; tensor singular value decomposition has invariance to the rotation of features to enhance the robustness of the tracker to the rotation of the target.

Drawings

FIG. 1 is a flow algorithm implemented by the present invention;

figure 2 is an example of tensor feature dimension reduction as practiced by the present invention.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

The invention provides a target tracking method based on rapid tensor singular value decomposition feature dimension reduction, which comprises the following steps of:

(1) and extracting gradient direction histogram features HOG, color name features CN and pre-trained depth convolution features CNN of the t frame tracking result window. The histogram of gradient direction feature HOG includes 31 layers, the color name feature CN includes 11 layers, the Layer1 of the depth convolution feature CNN includes 96 layers, and the Layer5 includes 512 layers.

(2) The features are arranged into horizontal slices of tensors to form 4 mutually independent third-order tensors which are respectively marked as L_i，i＝1,2,3,4。

(3) The average features are calculated in units of horizontal slices per tensor, respectively.

In the formula, N_iThe number of horizontal slices of the ith feature tensor is 31, for example, the number of horizontal slices of the HOG feature tensor is 31; l is_i(j,: means a third order tensor L_iThe jth horizontal slice of (1), M_iIs the average characteristic of the ith tensor.

(4) The horizontal slice of each feature tensor is subtracted by the corresponding average feature and is recorded as A_iWherein i is 1, 2, 3, 4.

(5) Converting the time domain feature tensor into the frequency domain using a fast Fourier transform, and performing a conventional moment on each horizontal slice of each feature tensor in the frequency domainAnd performing matrix singular value decomposition, and intercepting columns of a left singular matrix, wherein the reserved front k dimensions of the columns are also equal to the dimensions of the feature tensor after dimension reduction. After the singular value decomposition of each side slice is completed, the reserved left singular matrixes are formed into tensors according to the original arrangement sequence, finally, the left singular eigenvector is converted from the frequency domain to the time domain through the inverse fast Fourier transform, and the left singular eigenvector (also called as a projection operator) of each time domain form is recorded as a projection operator

The matrix singular value decomposition in the form of the frequency domain is shown as the following formula, and the projection operator is updated by the following formula:

in the formula, P_i ^tA projection operator representing the ith feature tensor for the tth frame; α represents the learning rate of the projection operator.

(6) And carrying out tensor product operation by utilizing the left singular tensor of the time domain form and the feature tensor of which the average feature is subtracted to obtain a feature tensor after dimension reduction, and adding the previously obtained average feature to each front slice of each feature tensor after dimension reduction. As shown in the following formula:

in the formula, tprod (·) represents the tensor product, tran (U)_i) Representing tensor U_iTranspose of (F)_iRepresenting the resulting feature tensor.

(7) Transposing each feature tensor, arranging the front slices to the side slices, then training a filter for each tensor side slice and recording as

And performing a further operation on the previous filterThe new, updated formula is shown as follows:

in the formula (I), the compound is shown in the specification,

and eta is the learning rate of the filter.

(8) And (3) extracting gradient direction histogram features HOG, color name features CN and depth convolution features CNN of the candidate region of the t +1 th frame, keeping the discharging sequence consistent with the step (2), and carrying out tensor multiplication on the projection operator obtained by the t th frame and each feature tensor of the t +1 th frame to obtain the feature tensor after dimension reduction.

(9) And performing convolution operation on the feature tensor after dimensionality reduction by using the filter obtained from the t-th frame, obtaining a confidence map by side slicing of each feature tensor, adding the confidence maps to obtain a response map, and taking the position with the maximum response map as the position of the target in the t + 1-th frame.

Claims

1. A target tracking method based on fast tensor singular value decomposition feature dimension reduction is characterized in that the tensor singular value decomposition feature dimension reduction feature is utilized to track a target, and the method comprises the following steps:

(5) Converting a time domain feature tensor into a frequency domain by utilizing fast Fourier transform, performing traditional matrix singular value decomposition on each horizontal slice of each feature tensor in the form of the frequency domain, and intercepting columns of a left singular matrix, wherein the reserved front k dimensions of the columns are also equal to the dimensions of the feature tensor after dimension reduction; after the singular value decomposition of each side slice is completed, the left singular matrixes which are reserved are formed into left singular eigenvectors according to the original arrangement sequence, finally, the left singular eigenvectors are converted into time domains from the frequency domains through the inverse fast Fourier transform, and the left singular eigenvectors of each time domain form are respectively recorded as the left singular eigenvectors

In the formula, tprod (·) represents a tensor product; tran (U)_i) Tensor U representing features_iIs transferred to

(7) For each feature tensor F_iTransposing, arranging the front slicesTo the side slice, and then for each feature tensor F_iThe side panels of (a) are all trained and noted as a filter

in the formula (I), the compound is shown in the specification,

2. The fast tensor singular value decomposition feature dimension reduction-based target tracking method as claimed in claim 1, wherein in step (1), the gradient direction histogram feature HOG comprises 31 layers, the color name feature CN comprises 11 layers, the Layer1 of the deep convolution feature CNN comprises 96 layers, and the Layer5 comprises 512 layers.

3. The method for tracking the target based on the feature dimension reduction of the fast tensor singular value decomposition as claimed in claim 1, wherein in the step (5), the matrix singular value decomposition in the frequency domain form is as follows, and the projection operator is updated according to the following formula: