CN105956603A - Video sequence classifying method based on tensor time domain association model - Google Patents

Video sequence classifying method based on tensor time domain association model

Info

Publication number
CN105956603A
Authority
CN
China
Prior art keywords
tensor
video sequence
sigma
video
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610237984.4A
Other languages
Chinese (zh)
Inventor
张静
徐传忠
苏育挺
井佩光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201610237984.4A priority Critical patent/CN105956603A/en
Publication of CN105956603A publication Critical patent/CN105956603A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/245Classification techniques relating to the decision surface
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/06Topological mapping of higher dimensional structures onto lower dimensional surfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A video sequence classifying method based on a tensor time domain association model comprises the steps of: representing each original video sequence as a third-order video tensor; performing Tucker decomposition on the third-order video tensor to obtain a latent core tensor; applying an autoregressive model along the time axis of the latent core tensor to establish the correlation between adjacent time slices; and dynamically learning the preceding steps, updating the result until the algorithm converges and an optimal result is obtained. By constraining the time domain of the video sequence, the method ensures that the temporal correlation and dependence of the video sequence are preserved after dimensionality reduction. The method fully exploits the latent useful information in a video, eliminates redundant information in the video, maintains the strong temporal continuity of the video sequence, reduces the difficulty of video sequence classification, and improves classification accuracy. It outperforms traditional video sequence classification methods and greatly improves classification precision.

Description

Video sequence classification method based on a tensor time domain correlation model
Technical field
The present invention relates to a video sequence classification method, and in particular to a video sequence classification method based on a tensor time domain correlation model, which combines tensor decomposition with a time domain correlation model and performs spatial-domain dimensionality reduction on tensor video sequences to obtain a latent low-dimensional representation of the video sequences.
Background technology
With the recent rapid development of video capture devices, the amount of video data has grown dramatically. To analyze these video data, video sequence classification has attracted great attention. Video sequence classification is widely used in video summarization, video retrieval, action recognition, and so on. Human gesture and action videos are an important component of video data and have wide applications in, for example, communication for the deaf, information transmission in noisy environments, and human-computer interaction.
Subspace-based learning methods have attracted great attention in recent years. According to the different representations of the data, subspace-based learning methods fall roughly into two classes: 1) traditional subspace-based learning methods; 2) subspace learning methods based on multilinear algebra.
Traditional subspace-based learning methods usually transform a video sequence into a vector or a matrix. This kind of processing easily destroys the structural information between adjacent data, causes loss of spatio-temporal information, and tends to suffer from the curse of dimensionality. Methods based on multilinear subspace learning model the video sequence with tensors, which naturally preserves the intrinsic structural information of the video sequence and avoids the loss of structural information between data. However, most of these methods treat all modes isotropically. Because a video sequence is a typical time series whose data exhibit temporal correlation, an isotropic treatment weakens the temporal characteristics of the video sequence and degrades classification performance.
Since a video sequence is inherently high-dimensional time-series data, its temporal correlation and dependence are essential for video content analysis. In recent years some work has begun to focus on high-dimensional time-series analysis, combining multilinear algebra with temporal smoothness constraints to process higher-order time-series data. However, such methods do not consider the intrinsic statistical distribution of the video time series and cannot effectively match the structural characteristics of video sequence data. Moreover, they usually require many important parameters to be set manually and are better suited to the classification of longer video sequences.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video sequence classification method based on a tensor time domain correlation model that eliminates redundant information in a video and maintains the temporal continuity of the video sequence.
The technical solution adopted by the present invention is a video sequence classification method based on a tensor time domain correlation model, comprising the following steps:
1) representing each original video sequence as a third-order video tensor;
2) performing Tucker decomposition on the third-order video tensor to obtain a latent core tensor;
3) applying an autoregressive model along the time axis of the obtained latent core tensor to establish the correlation between adjacent time slices;
4) dynamically learning steps 2)–3) and updating the result until the algorithm converges and the result reaches an optimum.
Step 1) consists in the following: a set of video sequences $X = \{X^1, \ldots, X^i, \ldots, X^N\}$ is given, and each video sequence is represented as a third-order video tensor, wherein:
$X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ is a third-order video tensor, where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, respectively, and $N$ is the number of video sequences; $X_t^i \in \mathbb{R}^{I_1 \times I_2}$ is defined as one time slice of the tensor.
Step 2) comprises: within the framework of tensor Tucker decomposition, finding two mapping matrices $U_1 \in \mathbb{R}^{J_1 \times I_1}$ and $U_2 \in \mathbb{R}^{J_2 \times I_2}$ that project the initial third-order video tensor $X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ into a latent core tensor $Y^i \in \mathbb{R}^{J_1 \times J_2 \times T}$, keeping $J_1 < I_1$ and $J_2 < I_2$ during the projection. The latent core tensor $Y^i$ obtained is a low-dimensional intrinsic representation of the initial third-order video tensor $X^i$. The projection is expressed as:
$$Y^i = X^i \times_1 U_1 \times_2 U_2 \qquad (1)$$
where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, and $J_1$ and $J_2$ denote the width and the height of the projected video sequence, respectively.
Step 3) comprises the following steps:
(1) An autoregressive model is applied to establish a correlation model between the time slice $Y_t^i$ and the $m$ time slices $Y_{t-1}^i, \ldots, Y_{t-m}^i$ that precede it; the correlation model uses the $m$ previous time slices to predict the next time slice $Y_t^i$. The correlation model is as follows:
$$Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i \qquad (2)$$
where $\varepsilon_t^i$ is Gaussian white noise following a zero-mean Gaussian distribution $N$, the covariance tensor of the white noise is given by its variance Var, and $a_k$, $k = 1, \ldots, m$, are the coefficients of the autoregressive model;
(2) the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model are obtained by maximum likelihood estimation with the tensor-based Yule-Walker equations:
$$\sum_{k=1}^{m} a_k \left( \sum_{t=1}^{T} \left\langle Y_{t-k}^i, Y_{t-l}^i \right\rangle \right) = \sum_{t=1}^{T} \left\langle Y_t^i, Y_{t-l}^i \right\rangle \qquad (3)$$
(3) $Y^i = X^i \times_1 U_1 \times_2 U_2$ is substituted into the correlation model $Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i$, which is rearranged to obtain the following expression for the noise covariance tensor:
$$\varepsilon_t^i = Y_t^i - \sum_{k=1}^{m} a_k Y_{t-k}^i = \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \qquad (4)$$
(4) once the model parameters $\{a_k\}_{k=1}^{m}$ have been estimated, the accumulated norm of the noise covariance tensor between the actual and predicted time slices is obtained by:
$$\sum_{t=2}^{T} \left\| \varepsilon_t^i \right\|_F^2 = \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \qquad (5)$$
(5) Obtaining the objective function
The objective function is obtained by minimizing the accumulated norm of the noise covariance tensor:
$$\min_{U_1, U_2, \{a_k\}} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E,\ U_2 U_2^T = E \qquad (6)$$
where $E$ denotes the identity matrix;
(6) Solving the objective function
An alternating descent algorithm is used to solve the objective function. First, the mapping matrices $U_1$ and $U_2$ are randomly initialized to obtain the latent core tensor $Y^i$, which is then used to compute the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model. Next, with the mapping matrix $U_2$ and the latent core tensor $Y^i$ fixed, the mapping matrix $U_1$ of the objective function is updated:
$$\min_{U_1} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)} \times_1 U_1 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E \qquad (7)$$
Letting $G_{t1}^i$ denote the mode-1 unfolding of $\left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)}$, where $\{U\}_{(-1)}$ denotes the mode products with all mapping matrices except $U_1$, the equivalent form of the objective function is obtained:
$$\max_{U_1}\ \operatorname{tr}\!\left( U_1 \left( \sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T \right) U_1^T \right) \quad \text{s.t.}\ U_1 U_1^T = E \qquad (8)$$
The equivalent form of the objective function is solved by the Lagrange multiplier method and eigenvalue decomposition; the partial derivative of the objective function with respect to $U_1$ then satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T u_{1j}^T = \lambda_{1j} u_{1j}^T \qquad (9)$$
where $u_{1j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T$ and $\lambda_{1j}$ is the corresponding eigenvalue;
Similarly, with the mapping matrix $U_1$ fixed, the partial derivative of the objective function with respect to the mapping matrix $U_2$ satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T u_{2j}^T = \lambda_{2j} u_{2j}^T \qquad (10)$$
where $u_{2j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T$ and $\lambda_{2j}$ is the corresponding eigenvalue.
Step 4) consists in incorporating steps 2)–3) into a dynamic learning framework so that the learning process of steps 2)–3) is updated over time until the convergence condition is satisfied, where β denotes the prediction error.
In the video sequence classification method based on a tensor time domain correlation model of the present invention, the core idea is to perform spatial-domain dimensionality reduction on the tensor video sequence to obtain a latent low-dimensional representation of the video sequence. During this process, the time domain of the video sequence is constrained so that the video sequence after dimensionality reduction still retains its temporal correlation and dependence. The present invention fully exploits the latent useful information in a video, eliminates redundant information in the video, and maintains the temporal continuity of the video sequence. It reduces the difficulty of video sequence classification and improves classification accuracy. The method of the present invention outperforms traditional video sequence classification methods and greatly improves precision.
Brief description of the drawings
Fig. 1 is a flow chart of the video sequence classification method based on a tensor time domain correlation model.
Detailed description of the invention
The video sequence classification method based on a tensor time domain correlation model of the present invention is described in detail below with reference to the embodiments and the accompanying drawing.
As shown in Fig. 1, the video sequence classification method based on a tensor time domain correlation model of the present invention comprises the following steps:
1) Representing each original video sequence as a third-order video tensor. Specifically:
A set of video sequences $X = \{X^1, \ldots, X^i, \ldots, X^N\}$ is given, and each video sequence is represented as a third-order video tensor, wherein:
$X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ is a third-order video tensor, where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, respectively, and $N$ is the number of video sequences; $X_t^i \in \mathbb{R}^{I_1 \times I_2}$ is defined as one time slice of the tensor. Compared with the traditional approach of extracting visual features from each frame of a video sequence, constructing the video sequence as a third-order video tensor preserves the structural information of the original video data without loss of information; moreover, tensor and tensor decomposition techniques are by now highly developed, which lays a good foundation for the subsequent algorithm design.
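As a rough illustration of step 1 (a sketch, not part of the patent text), the following Python snippet stacks the grayscale frames of one video into a third-order tensor of size $I_1 \times I_2 \times T$; how the frames are decoded and resized is assumed to be handled beforehand.

```python
# Sketch only: build a third-order video tensor X^i of shape (I1, I2, T) from a
# list of grayscale frames, so that X[:, :, t] is the time slice X_t^i.
import numpy as np

def build_video_tensor(frames):
    """frames: list of T grayscale frames, each a 2-D array of shape (I1, I2)."""
    return np.stack(frames, axis=-1).astype(np.float64)

# Example with synthetic frames: 20 frames of size 20 x 20 -> tensor (20, 20, 20).
frames = [np.random.rand(20, 20) for _ in range(20)]
X = build_video_tensor(frames)
```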
2) Performing Tucker decomposition on the third-order video tensor to obtain the latent core tensor. This comprises:
Within the framework of tensor Tucker decomposition, two mapping matrices $U_1 \in \mathbb{R}^{J_1 \times I_1}$ and $U_2 \in \mathbb{R}^{J_2 \times I_2}$ are found that project the initial third-order video tensor $X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ into a latent core tensor $Y^i \in \mathbb{R}^{J_1 \times J_2 \times T}$, keeping $J_1 < I_1$ and $J_2 < I_2$ during the projection. The latent core tensor $Y^i$ obtained is a low-dimensional intrinsic representation of the initial third-order video tensor $X^i$. The projection is expressed as:
$$Y^i = X^i \times_1 U_1 \times_2 U_2 \qquad (1)$$
where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, and $J_1$ and $J_2$ denote the width and the height of the projected video sequence, respectively.
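A minimal sketch of the projection in Eq. (1), using plain numpy (np.einsum) rather than a dedicated tensor library; the reduced sizes $J_1 = J_2 = 5$ are illustrative choices, not values fixed by the patent.

```python
# Sketch only: Y^i = X^i x_1 U1 x_2 U2 with U1 of shape (J1, I1), U2 of shape (J2, I2).
import numpy as np

def mode_n_project(X, U1, U2):
    """Mode-1 and mode-2 products of a third-order tensor X of shape (I1, I2, T)."""
    # result[a, c, t] = sum_{b, d} U1[a, b] * U2[c, d] * X[b, d, t]
    return np.einsum('ab,cd,bdt->act', U1, U2, X)

I1 = I2 = T = 20
J1 = J2 = 5
X = np.random.rand(I1, I2, T)
U1 = np.linalg.qr(np.random.rand(I1, J1))[0].T   # (J1, I1), orthonormal rows: U1 U1^T = E
U2 = np.linalg.qr(np.random.rand(I2, J2))[0].T   # (J2, I2), orthonormal rows: U2 U2^T = E
Y = mode_n_project(X, U1, U2)                    # latent core tensor, shape (J1, J2, T)
```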
3) Applying an autoregressive model along the time axis of the obtained latent core tensor to establish the correlation between adjacent time slices. The autoregressive model has shown great power in time-series analysis, since it can statistically model the continuity relationships between the time points of a time series. Because each frame in a video sequence exhibits strong temporal continuity and temporal dependence, and because the temporal continuity and dependence of the initial video sequence tensor must be preserved in order to obtain a robust feature representation, an autoregressive model is applied here to model the internal temporal relationships of the video sequence. Specifically, this comprises the following steps:
(1) In order to keep the temporal continuity of the video sequence between time points, an autoregressive (AR) model is applied to establish a correlation model between the time slice $Y_t^i$ and the $m$ time slices $Y_{t-1}^i, \ldots, Y_{t-m}^i$ that precede it; the correlation model uses the $m$ previous time slices to predict the next time slice $Y_t^i$. The correlation model is as follows:
$$Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i \qquad (2)$$
where $\varepsilon_t^i$ is Gaussian white noise following a zero-mean Gaussian distribution $N$, the covariance tensor of the white noise is given by its variance Var, and $a_k$, $k = 1, \ldots, m$, are the coefficients of the autoregressive model;
(2) the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model are obtained by maximum likelihood estimation with the tensor-based Yule-Walker equations:
$$\sum_{k=1}^{m} a_k \left( \sum_{t=1}^{T} \left\langle Y_{t-k}^i, Y_{t-l}^i \right\rangle \right) = \sum_{t=1}^{T} \left\langle Y_t^i, Y_{t-l}^i \right\rangle \qquad (3)$$
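A minimal sketch of estimating the AR coefficients $\{a_k\}$ from the tensor Yule-Walker equations in Eq. (3). The linear system assembled below and the handling of the first $m$ slices (summing only over $t \ge m$ so that all lags exist) are assumptions about details the text leaves implicit.

```python
# Sketch only: solve R a = r, with R[l, k] = sum_t <Y_{t-k}, Y_{t-l}> and
# r[l] = sum_t <Y_t, Y_{t-l}>, where <.,.> is the tensor (elementwise) inner product.
import numpy as np

def estimate_ar_coefficients(Y, m):
    """Y: core tensor of shape (J1, J2, T); m: AR order. Returns a_1, ..., a_m."""
    T = Y.shape[-1]
    inner = lambda A, B: float(np.sum(A * B))
    R = np.zeros((m, m))
    r = np.zeros(m)
    for l in range(1, m + 1):
        for k in range(1, m + 1):
            R[l - 1, k - 1] = sum(inner(Y[..., t - k], Y[..., t - l]) for t in range(m, T))
        r[l - 1] = sum(inner(Y[..., t], Y[..., t - l]) for t in range(m, T))
    return np.linalg.solve(R, r)

a = estimate_ar_coefficients(np.random.rand(5, 5, 20), m=2)
```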
(3) $Y^i = X^i \times_1 U_1 \times_2 U_2$ is substituted into the correlation model $Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i$, which is rearranged to obtain the following expression for the noise covariance tensor:
$$\varepsilon_t^i = Y_t^i - \sum_{k=1}^{m} a_k Y_{t-k}^i = \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \qquad (4)$$
(4) once the model parameters $\{a_k\}_{k=1}^{m}$ have been estimated, the accumulated norm of the noise covariance tensor between the actual and predicted time slices is obtained by:
$$\sum_{t=2}^{T} \left\| \varepsilon_t^i \right\|_F^2 = \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \qquad (5)$$
(5) Obtaining the objective function
The objective function is obtained by minimizing the accumulated norm of the noise covariance tensor:
$$\min_{U_1, U_2, \{a_k\}} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E,\ U_2 U_2^T = E \qquad (6)$$
where $E$ denotes the identity matrix;
In order to preserve as much information as possible and to control the scale of the unknown variables, orthogonality constraints are added to the objective function. The objective function indicates that the new video sequence tensor should reduce the prediction error along the time dimension as much as possible so as to guarantee temporal continuity, while eliminating the noise and redundant information of the original video sequence in the spatial domain.
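For concreteness, the objective in Eq. (6) can be evaluated directly; the sketch below (an illustration, not code from the patent) accumulates the squared Frobenius norm of the projected prediction residuals over all videos and all time slices for which every lag is available.

```python
# Sketch only: value of Eq. (6) for given U1 (J1 x I1), U2 (J2 x I2) and AR
# coefficients a = [a_1, ..., a_m]; X_list holds the third-order video tensors.
import numpy as np

def objective_value(X_list, U1, U2, a):
    m = len(a)
    total = 0.0
    for X in X_list:
        T = X.shape[-1]
        for t in range(m, T):
            # residual slice X_t - sum_k a_k X_{t-k}
            residual = X[..., t] - sum(a[k] * X[..., t - 1 - k] for k in range(m))
            # project the residual slice: residual x_1 U1 x_2 U2
            err = U1 @ residual @ U2.T
            total += float(np.sum(err ** 2))     # squared Frobenius norm
    return total
```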
(6) Solving the objective function
To obtain an effective solution, a typical alternating descent algorithm is used to solve the objective function. First, the mapping matrices $U_1$ and $U_2$ are randomly initialized to obtain the latent core tensor $Y^i$, which is then used to compute the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model. Next, with the mapping matrix $U_2$ and the latent core tensor $Y^i$ fixed, the mapping matrix $U_1$ of the objective function is updated:
$$\min_{U_1} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)} \times_1 U_1 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E \qquad (7)$$
Letting $G_{t1}^i$ denote the mode-1 unfolding of $\left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)}$, where $\{U\}_{(-1)}$ denotes the mode products with all mapping matrices except $U_1$, the equivalent form of the objective function is obtained:
$$\max_{U_1}\ \operatorname{tr}\!\left( U_1 \left( \sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T \right) U_1^T \right) \quad \text{s.t.}\ U_1 U_1^T = E \qquad (8)$$
The equivalent form of the objective function is solved by the Lagrange multiplier method and eigenvalue decomposition; the partial derivative of the objective function with respect to $U_1$ then satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T u_{1j}^T = \lambda_{1j} u_{1j}^T \qquad (9)$$
where $u_{1j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T$ and $\lambda_{1j}$ is the corresponding eigenvalue;
Similarly, with the mapping matrix $U_1$ fixed, the partial derivative of the objective function with respect to the mapping matrix $U_2$ satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T u_{2j}^T = \lambda_{2j} u_{2j}^T \qquad (10)$$
where $u_{2j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T$ and $\lambda_{2j}$ is the corresponding eigenvalue.
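A minimal sketch of one alternating eigen-update corresponding to Eqs. (7)-(10). The matrices $G_{t1}^i$ / $G_{t2}^i$ are taken here as the residual time slice projected by the fixed mapping matrix on the other spatial mode (their exact definition is elided in the text, so this reading is an assumption), and the rows of the updated mapping matrix are set to the leading eigenvectors, matching the trace maximization in Eq. (8).

```python
# Sketch only: one alternating update of U1 (mode=1, U_other = U2 fixed) or
# U2 (mode=2, U_other = U1 fixed), via the eigenvalue problems of Eqs. (9)-(10).
import numpy as np

def update_projection(X_list, U_other, a, J, mode):
    m = len(a)
    I = X_list[0].shape[0] if mode == 1 else X_list[0].shape[1]
    M = np.zeros((I, I))                              # accumulates sum_i sum_t G G^T
    for X in X_list:
        T = X.shape[-1]
        for t in range(m, T):
            residual = X[..., t] - sum(a[k] * X[..., t - 1 - k] for k in range(m))
            # project the residual with the fixed matrix on the other spatial mode
            G = residual @ U_other.T if mode == 1 else (U_other @ residual).T
            M += G @ G.T
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:J]               # leading J eigenvectors
    return eigvecs[:, top].T                          # rows form the new mapping matrix
```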
4) Dynamically learning steps 2)–3) and updating the result until the algorithm converges and the result reaches an optimum. Specifically, steps 2)–3) are incorporated into a dynamic learning framework so that their learning process is updated over time until the convergence condition is satisfied, where β denotes the prediction error.
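A sketch of the outer dynamic-learning loop of step 4, assuming the helper functions from the sketches above are in scope. Because the exact convergence test on β is not given in the text, the change in the objective value is used here as a stand-in stopping criterion, and the AR coefficients are averaged over the training videos as a simplification.

```python
# Sketch only: alternate (core tensors -> AR coefficients -> U1 -> U2) until the
# objective stops decreasing noticeably; returns the learned U1, U2 and a.
import numpy as np

def fit_tensor_temporal_model(X_list, J1, J2, m, n_iter=50, tol=1e-6):
    I1, I2, _ = X_list[0].shape
    U1 = np.linalg.qr(np.random.rand(I1, J1))[0].T    # random orthonormal initialization
    U2 = np.linalg.qr(np.random.rand(I2, J2))[0].T
    prev = np.inf
    for _ in range(n_iter):
        Y_list = [mode_n_project(X, U1, U2) for X in X_list]
        a = np.mean([estimate_ar_coefficients(Y, m) for Y in Y_list], axis=0)
        U1 = update_projection(X_list, U2, a, J1, mode=1)
        U2 = update_projection(X_list, U1, a, J2, mode=2)
        cur = objective_value(X_list, U1, U2, a)
        if abs(prev - cur) < tol:                     # stand-in convergence check
            break
        prev = cur
    return U1, U2, a
```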
After the original video sequences have been modeled with the method of the present invention and the new video sequences obtained, the product-manifold method (Y. M. Lui, J. R. Beveridge, and M. Kirby, "Action classification on product manifolds," in Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2010, pp. 833–839) is adopted to classify the new video sequences. On the product manifold, each tensor time series can be mapped to a point, and the distance between tensors can be measured by computing the geodesic distance between the corresponding points.
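The classifier used in the patent is the product-manifold method of the cited paper. The sketch below is not that method; it is a far simpler stand-in that assigns a test video the label of its nearest training core tensor under the Frobenius distance, only to show where the learned representation enters the classification stage.

```python
# Sketch only: nearest-neighbor labelling of projected core tensors (a stand-in
# for the product-manifold classifier of Lui et al.).
import numpy as np

def nearest_neighbor_label(Y_test, Y_train_list, labels):
    """Y_test: core tensor; Y_train_list: training core tensors; labels: their classes."""
    dists = [np.linalg.norm(Y_test - Y_tr) for Y_tr in Y_train_list]
    return labels[int(np.argmin(dists))]
```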
The effectiveness of the method of the present invention is verified below with a concrete experiment.
The experiment uses the Cambridge gesture database, which contains 9 different gesture classes with 100 videos per class, 900 videos in total. Each gesture class is captured against two different subject backgrounds, under 5 different illumination conditions and with 10 different actions. According to the illumination conditions, the database is divided into 5 subsets: Set1, Set2, Set3, Set4 and Set5. In the experiment, the middle 20 frames of each video are extracted and all video sequences are resized to 20 × 20 × 20. Set5, captured under normal illumination, is used as the training set, and the remaining subsets are used as test sets. In addition, Set5 itself is randomly divided into a training set and a validation set. The experimental results are shown in Table 1.
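The frame selection and resizing described above could be sketched as follows; cv2 (OpenCV) is an assumed dependency, and loading the Cambridge gesture videos themselves is out of scope here.

```python
# Sketch only: keep the middle 20 frames of a video and resize each frame to
# 20 x 20, giving one 20 x 20 x 20 tensor per sequence.
import numpy as np
import cv2

def preprocess_video(frames, size=20, n_frames=20):
    """frames: list of grayscale frames (2-D arrays) for one video."""
    start = max(0, (len(frames) - n_frames) // 2)
    middle = frames[start:start + n_frames]
    resized = [cv2.resize(f.astype(np.float32), (size, size)) for f in middle]
    return np.stack(resized, axis=-1)
```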
Among the compared methods, KAHM, RLPP, DCCA and GDA are traditional subspace learning methods; TB, TCCA and PM are methods based on multilinear subspace learning; MLDS and BPTF are methods that combine multilinear subspace learning with temporal correlation. As Table 1 shows, the method of the present invention is significantly better than traditional video sequence classification methods and greatly improves classification precision.
Table 1

Claims (5)

1. A video sequence classification method based on a tensor time domain correlation model, characterized by comprising the following steps:
1) representing each original video sequence as a third-order video tensor;
2) performing Tucker decomposition on the third-order video tensor to obtain a latent core tensor;
3) applying an autoregressive model along the time axis of the obtained latent core tensor to establish the correlation between adjacent time slices;
4) dynamically learning steps 2)–3) and updating the result until the algorithm converges and the result reaches an optimum.
2. The video sequence classification method based on a tensor time domain correlation model according to claim 1, characterized in that step 1) consists in the following: a set of video sequences $X = \{X^1, \ldots, X^i, \ldots, X^N\}$ is given, and each video sequence is represented as a third-order video tensor, wherein:
$X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ is a third-order video tensor, where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, respectively, and $N$ is the number of video sequences; $X_t^i \in \mathbb{R}^{I_1 \times I_2}$ is defined as one time slice of the tensor.
3. The video sequence classification method based on a tensor time domain correlation model according to claim 1, characterized in that step 2) comprises: within the framework of tensor Tucker decomposition, finding two mapping matrices $U_1 \in \mathbb{R}^{J_1 \times I_1}$ and $U_2 \in \mathbb{R}^{J_2 \times I_2}$ that project the initial third-order video tensor $X^i \in \mathbb{R}^{I_1 \times I_2 \times T}$ into a latent core tensor $Y^i \in \mathbb{R}^{J_1 \times J_2 \times T}$, keeping $J_1 < I_1$ and $J_2 < I_2$ during the projection; the latent core tensor $Y^i$ obtained is a low-dimensional intrinsic representation of the initial third-order video tensor $X^i$, and the projection is expressed as:
$$Y^i = X^i \times_1 U_1 \times_2 U_2 \qquad (1)$$
where $I_1$, $I_2$ and $T$ denote the width, the height and the length of the time axis of a video sequence, and $J_1$ and $J_2$ denote the width and the height of the projected video sequence, respectively.
4. The video sequence classification method based on a tensor time domain correlation model according to claim 1, characterized in that step 3) comprises the following steps:
(1) an autoregressive model is applied to establish a correlation model between the time slice $Y_t^i$ and the $m$ time slices $Y_{t-1}^i, \ldots, Y_{t-m}^i$ that precede it; the correlation model uses the $m$ previous time slices to predict the next time slice $Y_t^i$, and the correlation model is as follows:
$$Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i \qquad (2)$$
where $\varepsilon_t^i$ is Gaussian white noise following a zero-mean Gaussian distribution $N$, the covariance tensor of the white noise is given by its variance Var, and $a_k$, $k = 1, \ldots, m$, are the coefficients of the autoregressive model;
(2) the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model are obtained by maximum likelihood estimation with the tensor-based Yule-Walker equations:
$$\sum_{k=1}^{m} a_k \left( \sum_{t=1}^{T} \left\langle Y_{t-k}^i, Y_{t-l}^i \right\rangle \right) = \sum_{t=1}^{T} \left\langle Y_t^i, Y_{t-l}^i \right\rangle \qquad (3)$$
(3) $Y^i = X^i \times_1 U_1 \times_2 U_2$ is substituted into the correlation model $Y_t^i = \sum_{k=1}^{m} a_k Y_{t-k}^i + \varepsilon_t^i$, which is rearranged to obtain the following expression for the noise covariance tensor:
$$\varepsilon_t^i = Y_t^i - \sum_{k=1}^{m} a_k Y_{t-k}^i = \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \qquad (4)$$
(4) once the model parameters $\{a_k\}_{k=1}^{m}$ have been estimated, the accumulated norm of the noise covariance tensor between the actual and predicted time slices is obtained by:
$$\sum_{t=2}^{T} \left\| \varepsilon_t^i \right\|_F^2 = \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \qquad (5)$$
(5) Obtaining the objective function
The objective function is obtained by minimizing the accumulated norm of the noise covariance tensor:
$$\min_{U_1, U_2, \{a_k\}} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times_1 U_1 \times_2 U_2 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E,\ U_2 U_2^T = E \qquad (6)$$
where $E$ denotes the identity matrix;
(6) Solving the objective function
An alternating descent algorithm is used to solve the objective function. First, the mapping matrices $U_1$ and $U_2$ are randomly initialized to obtain the latent core tensor $Y^i$, which is then used to compute the model parameters $\{a_k\}_{k=1}^{m}$ of the correlation model. Next, with the mapping matrix $U_2$ and the latent core tensor $Y^i$ fixed, the mapping matrix $U_1$ of the objective function is updated:
$$\min_{U_1} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\| \left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)} \times_1 U_1 \right\|_F^2 \quad \text{s.t.}\ U_1 U_1^T = E \qquad (7)$$
letting $G_{t1}^i$ denote the mode-1 unfolding of $\left( X_t^i - \sum_{k=1}^{m} a_k X_{t-k}^i \right) \times \{U\}_{(-1)}$, the equivalent form of the objective function is obtained:
$$\max_{U_1}\ \operatorname{tr}\!\left( U_1 \left( \sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T \right) U_1^T \right) \quad \text{s.t.}\ U_1 U_1^T = E \qquad (8)$$
the equivalent form of the objective function is solved by the Lagrange multiplier method and eigenvalue decomposition; the partial derivative of the objective function with respect to $U_1$ then satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T u_{1j}^T = \lambda_{1j} u_{1j}^T \qquad (9)$$
where $u_{1j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t1}^i {G_{t1}^i}^T$ and $\lambda_{1j}$ is the corresponding eigenvalue;
similarly, with the mapping matrix $U_1$ fixed, the partial derivative of the objective function with respect to the mapping matrix $U_2$ satisfies:
$$\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T u_{2j}^T = \lambda_{2j} u_{2j}^T \qquad (10)$$
where $u_{2j}$ is a generalized eigenvector of the matrix $\sum_{i=1}^{N} \sum_{t=2}^{T} G_{t2}^i {G_{t2}^i}^T$ and $\lambda_{2j}$ is the corresponding eigenvalue.
5. The video sequence classification method based on a tensor time domain correlation model according to claim 1, characterized in that step 4) consists in incorporating steps 2)–3) into a dynamic learning framework so that the learning process of steps 2)–3) is updated over time until the convergence condition is satisfied, where β denotes the prediction error.
CN201610237984.4A 2016-04-15 2016-04-15 Video sequence classifying method based on tensor time domain association model Pending CN105956603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610237984.4A CN105956603A (en) 2016-04-15 2016-04-15 Video sequence classifying method based on tensor time domain association model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610237984.4A CN105956603A (en) 2016-04-15 2016-04-15 Video sequence classifying method based on tensor time domain association model

Publications (1)

Publication Number Publication Date
CN105956603A true CN105956603A (en) 2016-09-21

Family

ID=56918057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610237984.4A Pending CN105956603A (en) 2016-04-15 2016-04-15 Video sequence classifying method based on tensor time domain association model

Country Status (1)

Country Link
CN (1) CN105956603A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485212A (en) * 2016-09-26 2017-03-08 天津大学 A kind of resolution of tensor method for non-isometric video gesture identification
CN106503659A (en) * 2016-10-24 2017-03-15 天津大学 Action identification method based on sparse coding tensor resolution
CN106529435A (en) * 2016-10-24 2017-03-22 天津大学 Action recognition method based on sensor quantization
CN107122745A (en) * 2017-04-28 2017-09-01 深圳市茁壮网络股份有限公司 The method and device of personage track in a kind of identification video
CN111639243A (en) * 2020-06-04 2020-09-08 东北师范大学 Space-time data progressive multi-dimensional mode extraction and anomaly detection visual analysis method
CN111797849A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 User activity identification method and device, storage medium and electronic equipment
CN112069161A (en) * 2020-09-01 2020-12-11 上海佰贝科技发展股份有限公司 Data cleaning method and device
CN112801142A (en) * 2021-01-08 2021-05-14 北京工业大学 Tensor autoregressive moving average model-based video classification method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778482A (en) * 2015-05-05 2015-07-15 西安电子科技大学 Hyperspectral image classifying method based on tensor semi-supervised scale cutting dimension reduction
CN104992187A (en) * 2015-07-14 2015-10-21 西安电子科技大学 Aurora video classification method based on tensor dynamic texture model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778482A (en) * 2015-05-05 2015-07-15 西安电子科技大学 Hyperspectral image classifying method based on tensor semi-supervised scale cutting dimension reduction
CN104992187A (en) * 2015-07-14 2015-10-21 西安电子科技大学 Aurora video classification method based on tensor dynamic texture model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Y. Song et al.: "Describing trajectory of surface patch for human action recognition on rgb and depth videos", IEEE Signal Processing Letters *
Guo Weiwei: "Research on multi-dimensional information processing methods based on tensor representation", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485212B (en) * 2016-09-26 2020-01-24 天津大学 Tensor decomposition method for non-isometric video gesture recognition
CN106485212A (en) * 2016-09-26 2017-03-08 天津大学 A kind of resolution of tensor method for non-isometric video gesture identification
CN106503659A (en) * 2016-10-24 2017-03-15 天津大学 Action identification method based on sparse coding tensor resolution
CN106529435A (en) * 2016-10-24 2017-03-22 天津大学 Action recognition method based on sensor quantization
CN106529435B (en) * 2016-10-24 2019-10-15 天津大学 Action identification method based on tensor quantization
CN106503659B (en) * 2016-10-24 2019-10-15 天津大学 Action identification method based on sparse coding tensor resolution
CN107122745A (en) * 2017-04-28 2017-09-01 深圳市茁壮网络股份有限公司 The method and device of personage track in a kind of identification video
CN107122745B (en) * 2017-04-28 2020-10-20 深圳市茁壮网络股份有限公司 Method and device for identifying person track in video
CN111797849A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 User activity identification method and device, storage medium and electronic equipment
CN111639243A (en) * 2020-06-04 2020-09-08 东北师范大学 Space-time data progressive multi-dimensional mode extraction and anomaly detection visual analysis method
CN111639243B (en) * 2020-06-04 2021-03-09 东北师范大学 Space-time data progressive multi-dimensional mode extraction and anomaly detection visual analysis method
CN112069161A (en) * 2020-09-01 2020-12-11 上海佰贝科技发展股份有限公司 Data cleaning method and device
CN112069161B (en) * 2020-09-01 2023-11-03 上海佰贝科技发展股份有限公司 Data cleaning method and device
CN112801142A (en) * 2021-01-08 2021-05-14 北京工业大学 Tensor autoregressive moving average model-based video classification method
CN112801142B (en) * 2021-01-08 2024-05-28 北京工业大学 Video classification method based on tensor autoregressive moving average model

Similar Documents

Publication Publication Date Title
CN105956603A (en) Video sequence classifying method based on tensor time domain association model
CN108229444B (en) Pedestrian re-identification method based on integral and local depth feature fusion
Zhu et al. Intelligent logging lithological interpretation with convolution neural networks
Feng et al. Hyperspectral band selection based on trivariate mutual information and clonal selection
CN107132516B (en) A kind of Radar range profile&#39;s target identification method based on depth confidence network
Krasnopolsky et al. Using ensemble of neural networks to learn stochastic convection parameterizations for climate and numerical weather prediction models from data simulated by a cloud resolving model
CN110503128A (en) The spectrogram that confrontation network carries out Waveform composition is generated using convolution
CN107798349B (en) Transfer learning method based on depth sparse self-coding machine
CN108133188A (en) A kind of Activity recognition method based on motion history image and convolutional neural networks
CN110321926A (en) A kind of moving method and system based on depth residual GM network
CN108182260B (en) Multivariate time sequence classification method based on semantic selection
CN108121975A (en) A kind of face identification method combined initial data and generate data
CN105701480A (en) Video semantic analysis method
CN105678261B (en) Based on the direct-push Method of Data with Adding Windows for having supervision figure
CN106886793B (en) Hyperspectral image waveband selection method based on discrimination information and manifold information
CN109034228A (en) A kind of image classification method based on difference privacy and level relevance propagation
CN106503659A (en) Action identification method based on sparse coding tensor resolution
CN109101717A (en) Solid propellant rocket Reliability Prediction Method based on reality with the study of fuzzy data depth integration
CN105354532A (en) Hand motion frame data based gesture identification method
CN110096976A (en) Human behavior micro-Doppler classification method based on sparse migration network
CN106021402A (en) Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval
CN106803105B (en) Image classification method based on sparse representation dictionary learning
Zhang et al. Multi-source information fused generative adversarial network model and data assimilation based history matching for reservoir with complex geologies
Li et al. Soil seismic response modeling of KiK-net downhole array sites with CNN and LSTM networks
CN114399661A (en) Instance awareness backbone network training method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20191022