CN104881651A - Figure behavior identification method based on random projection and Fisher vectors

Info

Publication number: CN104881651A
Application number: CN201510289260.XA
Authority: CN
Inventors: 何军; 薛莹; 周媛; 胡昭华
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanji Agricultural Machinery Research Institute Co.,Ltd.
Priority date: 2015-05-29
Filing date: 2015-05-29
Publication date: 2015-09-02
Anticipated expiration: 2035-05-29
Also published as: CN104881651B

Abstract

The invention discloses a human behavior recognition method based on random projection and Fisher vectors. The method uses random projection in place of principal component analysis (PCA) for feature dimensionality reduction, to address problems such as high time consumption and uncertainty about how many principal components to retain. The random projection theorem states that, through a compressive measurement matrix, an original signal with the sparsity property can be projected onto a low-dimensional subspace such that the point-to-point distances of the mapped vectors remain essentially the same as those of the original high-dimensional feature vectors, i.e. the compression process does not distort the data. In addition, unlike the hard assignment of the BoW model, the method uses a GMM-Fisher vector hybrid model for soft assignment of trajectory feature vectors. This model combines the characteristics of the generative and discriminative models underlying the Fisher kernel: it not only counts how often each feature descriptor occurs but also describes the probability distribution of these descriptors statistically, which enriches the feature representation of the behavior and improves recognition efficiency.

Description

A human behavior recognition method based on random projection and Fisher vectors

Technical field

The present invention relates to the field of signal processing technology, and in particular to a human behavior recognition method based on random projection and Fisher vectors.

Background technology

Behavior recognition technology is widely used in fields such as video surveillance, video retrieval, military reconnaissance, and medical diagnosis and monitoring, and has broad application prospects and economic value. The traditional behavior recognition approach embeds the extracted trajectory features into a bag-of-visual-words (Bag-of-Words, BoW) model: a rich visual dictionary is built from the local trajectory features extracted from the videos, clustering is used to count how often the local feature vectors occur relative to each centre word, and the resulting histogram of visual-word frequencies represents a class of videos, which ultimately serves to recognize the human behavior. The key to the BoW model is constructing a very large visual dictionary, so recognition accuracy depends to a great extent on the size of that dictionary. The richer the local trajectory feature descriptors, the more accurate the recognition, but the higher the dimension of the behavior trajectory vectors. This increases computational cost and time consumption, and also poses a considerable challenge for classifier learning and training. It is therefore imperative to find an effective way to reduce the dimension of these trajectory feature vectors while preserving the original high-dimensional trajectory information.
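The hard assignment used by the BoW model can be illustrated with a minimal sketch. It assumes the visual dictionary has already been built by clustering; the function names and the toy data are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, dictionary: Sequence[Vector]) -> int:
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(dictionary)),
               key=lambda k: math.dist(descriptor, dictionary[k]))


def bow_histogram(descriptors: Sequence[Vector],
                  dictionary: Sequence[Vector]) -> List[float]:
    """Hard-assign each descriptor to one word and return word frequencies."""
    counts = [0] * len(dictionary)
    for x in descriptors:
        counts[nearest_word(x, dictionary)] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty video
    return [c / total for c in counts]


# Toy data: a two-word dictionary and four 2-D descriptors (all hypothetical).
toy_dictionary = [(0.0, 0.0), (10.0, 10.0)]
toy_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (0.5, 0.5)]
histogram = bow_histogram(toy_descriptors, toy_dictionary)
print(histogram)
```

Each descriptor contributes to exactly one bin, so information about how close it lies to the other words is discarded; this is the limitation that soft assignment is meant to address.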

Principal component analysis (PCA) performs linear dimensionality reduction of the original signal in the minimum mean-square-error sense. It measures the amount of information contributed by the magnitude of the variance: a component with larger variance carries more useful information. Components that contribute little information are therefore discarded, and the original signal is mapped onto the linear subspace spanned by the first K largest principal components, which reduces the dimension of the data. However, the method cannot handle data lying on nonlinear manifolds, its computation is time-consuming, and it requires a large amount of storage, which seriously affects the efficiency of classifier training and classification.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art by providing a human behavior recognition method based on random projection and Fisher vectors. The invention uses a GMM-Fisher vector hybrid model for soft assignment of trajectory feature vectors. The model combines the characteristics of the generative and discriminative models underlying the Fisher kernel: it not only counts how often each feature descriptor occurs but also describes the probability distribution of these descriptors statistically, which both enriches the feature representation of the behavior and improves the efficiency of behavior recognition.

To solve the above technical problem, the present invention adopts the following technical solution:

A human behavior recognition method based on random projection and Fisher vectors according to the present invention comprises the following steps:

Step (1): for a fixed number of frames, extract and track local behavior features in the video, extract their feature trajectories within the minimum allowed error, and then fuse the descriptor information of the various feature trajectories to obtain a high-dimensional trajectory feature vector, forming the feature trajectory matrix space of this class of behavior videos;

Step (2): project the matrix space obtained in step (1) onto a low-dimensional subspace by random projection, then model the generative process of the projected, dimension-reduced trajectory feature signals with a Gaussian mixture model and compute the Fisher vector of the trajectory features;

Step (3): project the Fisher vector obtained in step (2) a second time onto a low-dimensional subspace by random projection and, by attaching class labels, train an SVM classifier to obtain a hyperplane that separates the various behaviors;

Step (4): use the classifier trained in step (3) to predict the behavior class of the Fisher vector of the trajectory features of the video under test, thereby recognizing the behavior class.

As a further optimized scheme of the human behavior recognition method based on random projection and Fisher vectors of the present invention, the method specifically comprises the following steps:

Step 1: traverse all m training videos $S = [S_1, S_2, \ldots, S_m]$; for each training video $S_I$, extract its feature trajectory descriptors to form the trajectory features $X_I = [x_1, x_2, \ldots, x_{T_I}]^T$, obtaining the high-dimensional trajectory feature matrix $X = [X_1, X_2, \ldots, X_m]$, where $T_I$ is the number of trajectories in the I-th behavior video, I is an integer with $1 \le I \le m$, the superscript T denotes transposition, $x_a$ is a feature trajectory descriptor extracted from the training video, and a is an integer with $1 \le a \le T_I$;

Step 2: project the high-dimensional trajectory feature matrix from step 1 onto a low-dimensional subspace by random projection, i.e.

$$V^{RP} = [v_t \in \mathbb{R}^{d}], \quad t = 1, \ldots, \sum_{I=1}^{m} T_I, \quad d \ll D;$$

where d is the dimension of the trajectory features after random dimensionality reduction, D is the original trajectory dimension, $\mathbb{R}^{d}$ is the low-dimensional subspace after reduction, $v_t$ is a dimension-reduced trajectory feature of a behavior video, and $V^{RP}$ is the set of all behavior-video trajectory features after projection;
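A minimal sketch of this projection step, using only the Python standard library. It assumes a Gaussian random matrix whose columns are rescaled to unit length (the patent only requires a random matrix with unit-length columns); the function names, the seed, and the original dimension D = 400 are hypothetical, while d = 100 is the value given in the text.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_projection_matrix(d: int, D: int, seed: int = 0) -> Matrix:
    """Draw a d x D Gaussian matrix and rescale each column to unit length."""
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    for j in range(D):
        norm = math.sqrt(sum(phi[i][j] ** 2 for i in range(d)))
        for i in range(d):
            phi[i][j] /= norm
    return phi


def project(phi: Matrix, x: List[float]) -> List[float]:
    """Compute v = Phi x for one D-dimensional trajectory feature x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


# Toy check: D = 400 is a made-up original dimension; d = 100 follows the text.
D, d = 400, 100
phi = random_projection_matrix(d, D)
rng = random.Random(1)
x_a = [rng.gauss(0.0, 1.0) for _ in range(D)]
x_b = [rng.gauss(0.0, 1.0) for _ in range(D)]
v_a, v_b = project(phi, x_a), project(phi, x_b)
ratio = math.dist(v_a, v_b) / math.dist(x_a, x_b)
print(f"projected / original distance: {ratio:.3f}")
```

The same matrix would have to be reused for the test videos so that training and test features lie in the same subspace. Unlike PCA, no statistics of the data are needed to build it, which is where the time saving comes from.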

Step 3: let $p_\lambda(v_t)$ be the probability density function with parameter set $\lambda = \{w_i, u_i, \Sigma_i\}$, used to model the generative process of the trajectory feature signals after random-projection dimensionality reduction, where $w_i$ is the mixture weight of the i-th Gaussian component, $u_i$ is its mean vector, and $\Sigma_i$ is its covariance matrix, $i = 1, \ldots, K$. Assuming the $v_t \in \mathbb{R}^{d}$ are independent and identically distributed, the Gaussian mixture model (GMM) with K Gaussian components and parameter set $\lambda$ is defined as:

$$p_\lambda(v_t) = \sum_{i=1}^{K} w_i \, p_i(v_t);$$

where $p_i(v_t)$ is the probability density of the dimension-reduced trajectory feature $v_t$ under the i-th Gaussian component. The covariance matrices are diagonal. By Bayes' formula, the probability that the dimension-reduced trajectory feature $v_t$ is assigned to the i-th Gaussian component is:

$$r(i) = \frac{w_i \, p_i(v_t)}{\sum_{j=1}^{K} w_j \, p_j(v_t)};$$
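The mixture density and the soft assignment r(i) can be sketched as follows. This is a minimal illustration assuming the GMM parameters are already known (in practice they would be fitted, for example by EM, which the text does not detail); `sigmas` holds per-dimension standard deviations of the diagonal covariances, and all names and numbers are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def diag_gaussian_pdf(v: Vector, mean: Vector, sigma: Vector) -> float:
    """Density p_i(v) of a Gaussian with diagonal covariance."""
    log_p = 0.0
    for x, u, s in zip(v, mean, sigma):
        log_p += -0.5 * math.log(2.0 * math.pi * s * s) - (x - u) ** 2 / (2.0 * s * s)
    return math.exp(log_p)


def gmm_density(v: Vector, weights: Vector,
                means: Sequence[Vector], sigmas: Sequence[Vector]) -> float:
    """Mixture density p_lambda(v) = sum_i w_i p_i(v)."""
    return sum(w * diag_gaussian_pdf(v, u, s)
               for w, u, s in zip(weights, means, sigmas))


def posteriors(v: Vector, weights: Vector,
               means: Sequence[Vector], sigmas: Sequence[Vector]) -> List[float]:
    """Soft assignment r(i) = w_i p_i(v) / sum_j w_j p_j(v)."""
    joint = [w * diag_gaussian_pdf(v, u, s)
             for w, u, s in zip(weights, means, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical Gaussian components in two dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
r_near = posteriors([0.2, -0.1], weights, means, sigmas)  # close to component 0
r_mid = posteriors([2.0, 2.0], weights, means, sigmas)    # halfway between both
print(r_near, r_mid)
```

A feature halfway between two components is shared equally between them, whereas hard assignment would force it into a single bin.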

Step 4: let $l_\lambda(V)$ denote the log-likelihood function, with respect to $\lambda$, of the set V of all trajectory features after random projection. The gradients of the dimension-reduced trajectory features $v_t$ with respect to the GMM parameter set $\lambda = \{w_i, u_i, \Sigma_i\}$ are then:

$$\frac{\partial l_\lambda(V)}{\partial w_i} = \sum_{t=1}^{T_m} \left[ \frac{r(i)}{w_i} - \frac{r(1)}{w_1} \right]$$

$$\frac{\partial l_\lambda(V)}{\partial u_i^{k}} = \sum_{t=1}^{T_m} r(i) \left[ \frac{x_t^{k} - u_i^{k}}{(\sigma_i^{k})^{2}} \right]$$

$$\frac{\partial l_\lambda(V)}{\partial \sigma_i^{k}} = \sum_{t=1}^{T_m} r(i) \left[ \frac{(x_t^{k} - u_i^{k})^{2}}{(\sigma_i^{k})^{3}} - \frac{1}{\sigma_i^{k}} \right];$$

where $u_i^{k}$ denotes the k-th component of the mean vector of the i-th Gaussian component and $\sigma_i^{k}$ denotes the k-th component of the covariance vector of the i-th Gaussian component;

After the gradient vectors are normalized, the gradient values are concatenated to obtain the Fisher vector of the trajectory features;
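The three gradients can be accumulated and concatenated as in the sketch below. It follows the formulas above term by term, with two assumptions that the text does not fix: the normalization is taken to be L2 normalization of the concatenated vector, and `sigmas` holds per-dimension standard deviations. Function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def posteriors(v: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               sigmas: Sequence[Sequence[float]]) -> Vector:
    """Soft assignment r(i) of one feature to each diagonal Gaussian."""
    joint = []
    for w, u, s in zip(weights, means, sigmas):
        log_p = sum(-0.5 * math.log(2.0 * math.pi * sk * sk)
                    - (x - uk) ** 2 / (2.0 * sk * sk)
                    for x, uk, sk in zip(v, u, s))
        joint.append(w * math.exp(log_p))
    total = sum(joint)
    return [j / total for j in joint]


def fisher_vector(features: Sequence[Sequence[float]], weights: Sequence[float],
                  means: Sequence[Sequence[float]],
                  sigmas: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the weight, mean and sigma gradients, then L2-normalize."""
    K, d = len(weights), len(means[0])
    grad_w = [0.0] * K
    grad_u = [[0.0] * d for _ in range(K)]
    grad_s = [[0.0] * d for _ in range(K)]
    for v in features:
        r = posteriors(v, weights, means, sigmas)
        for i in range(K):
            # Weight gradient is taken relative to the first component.
            grad_w[i] += r[i] / weights[i] - r[0] / weights[0]
            for k in range(d):
                diff = v[k] - means[i][k]
                s = sigmas[i][k]
                grad_u[i][k] += r[i] * diff / (s * s)
                grad_s[i][k] += r[i] * (diff * diff / s ** 3 - 1.0 / s)
    fv = grad_w + [g for row in grad_u for g in row] + [g for row in grad_s for g in row]
    norm = math.sqrt(sum(g * g for g in fv))
    return [g / norm for g in fv] if norm > 0.0 else fv


# Toy GMM with K = 2 components in d = 2 dimensions and three features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
features = [[0.1, 0.2], [3.9, 4.2], [0.3, -0.1]]
fv = fisher_vector(features, weights, means, sigmas)
print(len(fv))
```

With K components in d dimensions the vector has K + 2Kd entries, which grows quickly with K and motivates the second random projection applied to the Fisher vector.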

Step 5: project the Fisher vector obtained in step 4 a second time onto a low-dimensional subspace by random projection, i.e.

$$V^{RP'} = [v'_t \in \mathbb{R}^{d'}], \quad t = 1, \ldots, \sum_{I=1}^{m} T_I, \quad d' \ll d;$$

where d' is the dimension of the Fisher vector after the second random-projection dimensionality reduction, $\mathbb{R}^{d'}$ is the low-dimensional subspace after the second reduction, $v'_t$ is a Fisher vector after the second reduction, and $V^{RP'}$ is the set of all Fisher vectors after the second reduction;

Step 6: train the SVM classifier. The dimension-reduced, encoded trajectory features of the m training videos are each given the label of the corresponding human behavior, and a hyperplane separating the different behaviors is trained;

Step 7: choose a new behavior video $Z_J$ from the n test videos $Z = [Z_1, Z_2, \ldots, Z_n]$, $1 \le J \le n$, and extract the trajectory features $Y_J$ of the test video, where n is the number of test-set videos and $T_J$ is the number of trajectories in the J-th test-set behavior video $Z_J$;

Step 8: apply the random projection theorem to $Y_J$ for feature dimensionality reduction, projecting it onto the low-dimensional subspace $H^{RP}$, i.e.

$$H^{RP} = [h_{tt} \in \mathbb{R}^{dd}], \quad tt = 1, \ldots, \sum_{J=1}^{n} T_J;$$

where $h_{tt}$ is a trajectory feature of the test video after dimensionality reduction, dd is the dimension after reduction, and $\mathbb{R}^{dd}$ is the range of the test-set trajectory features after reduction;

Step 9: let $l_\lambda(H)$ denote the log-likelihood function of H with respect to $\lambda$, where H is the set of all test trajectory features after random projection. Using the GMM parameter set $\lambda = \{w_i, u_i, \Sigma_i\}$ obtained in step 4, compute the gradient vectors of the trajectory features of the test-set behavior video, i.e.

$$\frac{\partial l_\lambda(H)}{\partial w_i} = \sum_{tt=1}^{T_n} \left[ \frac{r(i)}{w_i} - \frac{r(1)}{w_1} \right]$$

$$\frac{\partial l_\lambda(H)}{\partial u_i^{k}} = \sum_{tt=1}^{T_n} r(i) \left[ \frac{x_{tt}^{k} - u_i^{k}}{(\sigma_i^{k})^{2}} \right]$$

$$\frac{\partial l_\lambda(H)}{\partial \sigma_i^{k}} = \sum_{tt=1}^{T_n} r(i) \left[ \frac{(x_{tt}^{k} - u_i^{k})^{2}}{(\sigma_i^{k})^{3}} - \frac{1}{\sigma_i^{k}} \right];$$

After the gradient vectors are normalized, the gradient values are concatenated to obtain the Fisher vector of the trajectory features of the test-set behavior video;

Step 10: apply the random projection theorem to the Fisher vector of the trajectory features of the test-set behavior video for a second feature dimensionality reduction;

Step 11: use the classifier trained in step 6 to predict the behavior class of the twice-reduced Fisher vector of the trajectory features of the test-set behavior video, completing recognition of the test-set videos.

As a further optimized scheme of the human behavior recognition method based on random projection and Fisher vectors of the present invention, the random projection in step 2 is performed as follows:

For an original D-dimensional trajectory feature $x_t \in \mathbb{R}^{D}$, a random matrix $\Phi$ with unit-length columns is applied to project it onto a low-dimensional subspace, $v_t \in \mathbb{R}^{d}$, where $d \ll D$. This is expressed as:

$$v_t^{d} = \Phi x_t^{D};$$

where the superscripts D and d indicate the original D-dimensional space of the trajectory feature and the d-dimensional low-dimensional subspace to which it is reduced;

The random matrix $\Phi$ satisfies the Johnson-Lindenstrauss (JL) lemma, so $x_t \in \mathbb{R}^{D}$ can be reconstructed from $v_t \in \mathbb{R}^{d}$ with minimum error, i.e. the projected low-dimensional $v_t$ contains approximately all of the information in the original trajectory feature $x_t$.
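For reference, the JL lemma is usually stated in the following form. The symbols $\varepsilon$, n and P are introduced here for illustration and do not appear in the patent text; the statement concerns preservation of pairwise distances, which is the property the method relies on.

```latex
% Johnson-Lindenstrauss lemma (standard statement).
For any $0 < \varepsilon < 1$ and any set $P$ of $n$ points in $\mathbb{R}^{D}$,
there exists a linear map $\Phi : \mathbb{R}^{D} \to \mathbb{R}^{d}$ with
$d = O(\varepsilon^{-2} \log n)$ such that, for all $x_a, x_b \in P$,
\[
(1 - \varepsilon)\,\lVert x_a - x_b \rVert^{2}
\;\le\; \lVert \Phi x_a - \Phi x_b \rVert^{2}
\;\le\; (1 + \varepsilon)\,\lVert x_a - x_b \rVert^{2}.
\]
```

The target dimension depends on the number of points and the tolerated distortion, not on the original dimension D, which is why a fixed small d can serve for very high-dimensional trajectory features.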

As a further optimized scheme of the human behavior recognition method based on random projection and Fisher vectors of the present invention, the random matrix is a random matrix satisfying the restricted isometry property (RIP).

As a further optimized scheme of the human behavior recognition method based on random projection and Fisher vectors of the present invention, d = 100 and dd = d' = 48.

As a further optimized scheme of the human behavior recognition method based on random projection and Fisher vectors of the present invention, the SVM classifier in step 6 uses a linear kernel function to produce a multi-class output.
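How a linear-kernel SVM yields a multi-class output can be sketched as below. The text does not say which multi-class scheme is used, so this assumes one-vs-rest, with one hyperplane per behavior and the highest score winning. The hyperplanes are hand-set, hypothetical values for a 2-D toy feature space, not trained ones.

```python
from typing import Dict, List, Sequence, Tuple

Hyperplane = Tuple[List[float], float]  # (weight vector w, bias b)


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Linear kernel k(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def decision_scores(hyperplanes: Dict[str, Hyperplane],
                    x: Sequence[float]) -> Dict[str, float]:
    """Signed score w.x + b of x for every behavior class."""
    return {label: linear_kernel(w, x) + b for label, (w, b) in hyperplanes.items()}


def predict(hyperplanes: Dict[str, Hyperplane], x: Sequence[float]) -> str:
    """One-vs-rest decision: the class whose hyperplane gives the largest score."""
    scores = decision_scores(hyperplanes, x)
    return max(scores, key=scores.get)


# Hypothetical hyperplanes for three behaviors.
toy_hyperplanes = {
    "boxing": ([1.0, 0.0], -2.5),
    "running": ([0.0, 1.0], -2.5),
    "walking": ([-1.0, -1.0], 2.5),
}
print(predict(toy_hyperplanes, [5.0, 0.5]))
print(predict(toy_hyperplanes, [0.5, 5.0]))
print(predict(toy_hyperplanes, [0.2, 0.3]))
```

With a linear kernel, prediction costs one dot product per class, so it benefits directly from the reduced dimension d' of the projected Fisher vectors.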

Compared with the prior art, the present invention with the above technical solution has the following technical effects. The invention uses random projection in place of principal component analysis for feature dimensionality reduction, addressing problems such as the high time consumption of PCA and the uncertainty about how many principal components to retain. The random projection theorem states that, through a compressive measurement matrix, an original signal with the sparsity property can be projected onto a low-dimensional subspace such that the point-to-point distances of the mapped vectors remain essentially the same as those of the original high-dimensional feature vectors, i.e. the compression process does not distort the data. In addition, unlike the hard assignment of the BoW model, the invention uses a GMM-Fisher vector hybrid model for soft assignment of trajectory feature vectors. The model combines the characteristics of the generative and discriminative models underlying the Fisher kernel: it not only counts how often each feature descriptor occurs but also describes the probability distribution of these descriptors statistically, which both enriches the feature representation of the behavior and improves the efficiency of behavior recognition.

Brief description of the drawings

Fig. 1 shows visualizations of trajectory extraction by dense sampling on the video sets, wherein: (a) is a visualization of the hand-waving behavior in the KTH data set; (b) is a visualization of the running behavior in the KTH data set; (c) is a visualization of the boxing behavior in the KTH data set; (d) is a visualization of the basketball behavior in the UCF50 data set; (e) is a visualization of the weight-lifting behavior in the UCF50 data set; (f) is a visualization of the golf-swing behavior in the UCF50 data set.

Fig. 2 is a flow chart of the human behavior recognition method based on random projection and Fisher vectors of the present invention.

Embodiment

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

The experiments were run on a desktop computer with 8 GB of memory and an Intel Core i3 3.4 GHz CPU; the code was developed in C++ with Visual Studio 2013. The same default parameters are used for both data sets. In the dense trajectory tracking algorithm, $N = 32$, $n_\sigma = 2$, $n_\tau = 3$, the trajectory length is L = 15 frames, and the sampling step is W = 5 pixels. In random projection, the feature trajectory dimension after reduction is d = 100 and d' = 48. The SVM classifier uses a linear kernel function to produce a multi-class output.

Fig. 1 shows visualizations of trajectory extraction by dense sampling on the video sets. The invention characterizes a class of behavior by extracting dense trajectories, using the optical flow field to densely sample interest points at multiple scales and track them. These interest points are repeatedly sampled on a dense grid and tracked over a fixed number of frames; a behavior trajectory is the continuous expression of the feature descriptors over these frames. The shape of a trajectory is used to distinguish different human behaviors; it captures how the position of the moving person changes over time and space in the video, i.e. the displacement vectors. Because the position at which the behavior occurs differs from video to video, the invention normalizes the position vectors by a summation over all extracted position information. Besides its position information, each trajectory is also given several descriptors to enrich its representation: the histogram of oriented gradients (HOG) describes the static appearance of the person, the histogram of optical flow (HOF) describes the local motion of the trajectory, and the motion boundary histogram (MBH) describes the relative motion between pixels. The final trajectory feature is therefore the combination of position information, oriented-gradient, optical-flow, and motion-boundary histogram information. In Fig. 1, (a) is a visualization of the hand-waving behavior in the KTH data set; (b) is a visualization of the running behavior in the KTH data set; (c) is a visualization of the boxing behavior in the KTH data set; (d) is a visualization of the basketball behavior in the UCF50 data set; (e) is a visualization of the weight-lifting behavior in the UCF50 data set; (f) is a visualization of the golf-swing behavior in the UCF50 data set. Fig. 1 shows that the extracted valid trajectories describe the continuous motion of the human behavior in the video vividly and intuitively.
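The position normalization can be sketched as follows. The text only says that a summation over the position information is used; this sketch assumes the common dense-trajectory convention of dividing the displacement vectors by the sum of their magnitudes. The function name and the toy track are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def trajectory_shape(points: List[Point]) -> List[Point]:
    """Turn L+1 tracked positions into L displacement vectors,
    normalized by the sum of the displacement magnitudes."""
    displacements = [(x1 - x0, y1 - y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in displacements)
    if total == 0.0:
        return displacements  # static track: nothing to normalize
    return [(dx / total, dy / total) for dx, dy in displacements]


# The same motion observed at two different image positions.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 9.0)]
shifted = [(10.0, 20.0), (13.0, 24.0), (13.0, 29.0)]
shape = trajectory_shape(track)
shape_shifted = trajectory_shape(shifted)
print(shape, shape_shifted)
```

Working with displacements removes the dependence on where in the frame the behavior occurs, and dividing by the total path length removes the dependence on how far the point travelled overall.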

Fig. 2 is a flow chart of human behavior recognition based on Fisher vectors and the projection theorem. For a class of behavior videos, the invention first extracts and tracks local behavior features over a fixed number of frames, then extracts valid trajectories within the minimum allowed error, and then fuses the various descriptors into a high-dimensional trajectory feature vector, forming the feature trajectory matrix space of this class of behavior videos. This is finally embedded in the Gaussian mixture-Fisher vector model framework, and an SVM classifier is trained with class labels to obtain a hyperplane that separates the various behaviors; the final behavior classification is performed with this hyperplane. Along the way, random projection is applied twice to the high-dimensional features to reduce computational complexity.

The recognition rates of the present invention on the KTH data set are shown in Table 1. The KTH data set contains 6 human behaviors: walking, jogging, running, boxing, hand waving, and hand clapping. Each behavior is performed in four different scenes: indoors, outdoors, outdoors with scale variation, and outdoors with different clothes. In most scenes the background is uniform and static, but the background noise is considerable. The results show that the method of the invention can effectively recognize the different human behaviors in the KTH data set.

Table 1

The recognition rates on the UCF50 data set are shown in Table 2. The UCF50 data set has 50 action classes, including basketball, diving, golf swing, weight lifting, horizontal bar, riding and other sports, with real-life video clips taken from YouTube. The backgrounds in this data set are complex, and the scenes and viewpoints vary, so behavior recognition is comparatively difficult. The results show that the method of the invention can effectively recognize the different human behaviors in the UCF50 data set.

Table 2

The computing times (unit: s) of the two dimensionality reduction methods, random projection (RP) and principal component analysis (PCA), are compared in Table 3. The results show that random projection greatly improves the efficiency of the behavior recognition algorithm, reducing running time by about two orders of magnitude relative to PCA (roughly 70 to 157 times for the dimensions listed in Table 3).

Table 3

Method	10 dims	30 dims	50 dims	70 dims	90 dims	110 dims	130 dims	150 dims
RP	0.18	0.21	0.23	0.27	0.34	0.37	0.41	0.43
PCA	28.28	28.67	28.85	28.97	29.13	29.52	29.87	30.16
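The speed-up implied by the timing table can be recomputed directly from its entries. The snippet below only divides the PCA times by the RP times given above; the variable names are illustrative.

```python
# Computing times in seconds, copied from the table above.
dims = [10, 30, 50, 70, 90, 110, 130, 150]
rp_seconds = [0.18, 0.21, 0.23, 0.27, 0.34, 0.37, 0.41, 0.43]
pca_seconds = [28.28, 28.67, 28.85, 28.97, 29.13, 29.52, 29.87, 30.16]

# Ratio of PCA time to RP time at each target dimension.
speedups = [pca / rp for pca, rp in zip(pca_seconds, rp_seconds)]
for dim, s in zip(dims, speedups):
    print(f"{dim:>3} dims: PCA / RP = {s:6.1f}x")
```

The ratio is largest at 10 dimensions and shrinks as the target dimension grows, because the RP time rises with the dimension while the PCA time stays almost constant.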

The above merely describes preferred embodiments of the present invention. Those skilled in the art can readily conceive of other advantages and variations based on the above embodiments. The invention is therefore not limited to the above embodiments, which merely give a detailed, exemplary description of one form of the invention. Usual changes and substitutions made by those of ordinary skill in the art within the scope of the technical solution of the invention, without departing from the inventive concept, shall all fall within the scope of protection of the present invention.

Claims

1. A human behavior recognition method based on random projection and Fisher vectors, characterized by comprising the following steps:

2. The human behavior recognition method based on random projection and Fisher vectors according to claim 1, characterized by specifically comprising the following steps:

$$V^{RP} = [v_t \in \mathbb{R}^{d}], \quad t = 1, \ldots, \sum_{I=1}^{m} T_I, \quad d \ll D;$$

$$p_\lambda(v_t) = \sum_{i=1}^{K} w_i \, p_i(v_t);$$

$$r(i) = \frac{w_i \, p_i(v_t)}{\sum_{j=1}^{K} w_j \, p_j(v_t)};$$

Step 4: let $l_\lambda(V)$ denote the log-likelihood function, with respect to $\lambda$, of the set V of all trajectory features after random projection. The gradients of the dimension-reduced trajectory features $v_t$ with respect to the GMM parameter set $\lambda = \{w_i, u_i, \Sigma_i\}$ are then:

$$\frac{\partial l_\lambda(V)}{\partial w_i} = \sum_{t=1}^{T_m} \left[ \frac{r(i)}{w_i} - \frac{r(1)}{w_1} \right]$$

$$\frac{\partial l_\lambda(V)}{\partial u_i^{k}} = \sum_{t=1}^{T_m} r(i) \left[ \frac{x_t^{k} - u_i^{k}}{(\sigma_i^{k})^{2}} \right]$$

$$\frac{\partial l_\lambda(V)}{\partial \sigma_i^{k}} = \sum_{t=1}^{T_m} r(i) \left[ \frac{(x_t^{k} - u_i^{k})^{2}}{(\sigma_i^{k})^{3}} - \frac{1}{\sigma_i^{k}} \right];$$

$$V^{RP'} = [v'_t \in \mathbb{R}^{d'}], \quad t = 1, \ldots, \sum_{I=1}^{m} T_I, \quad d' \ll d;$$

where d' is the dimension of the Fisher vector after the second random-projection dimensionality reduction, $\mathbb{R}^{d'}$ is the low-dimensional subspace after the second reduction, $v'_t$ is a Fisher vector after the second reduction, and $V^{RP'}$ is the set of all Fisher vectors after the second reduction;

$$H^{RP} = [h_{tt} \in \mathbb{R}^{dd}], \quad tt = 1, \ldots, \sum_{J=1}^{n} T_J;$$

$$\frac{\partial l_\lambda(H)}{\partial w_i} = \sum_{tt=1}^{T_n} \left[ \frac{r(i)}{w_i} - \frac{r(1)}{w_1} \right]$$

$$\frac{\partial l_\lambda(H)}{\partial u_i^{k}} = \sum_{tt=1}^{T_n} r(i) \left[ \frac{x_{tt}^{k} - u_i^{k}}{(\sigma_i^{k})^{2}} \right]$$

$$\frac{\partial l_\lambda(H)}{\partial \sigma_i^{k}} = \sum_{tt=1}^{T_n} r(i) \left[ \frac{(x_{tt}^{k} - u_i^{k})^{2}}{(\sigma_i^{k})^{3}} - \frac{1}{\sigma_i^{k}} \right];$$

3. The human behavior recognition method based on random projection and Fisher vectors according to claim 2, characterized in that the random projection in step 2 is performed as follows:

$$v_t^{d} = \Phi x_t^{D};$$

4. The human behavior recognition method based on random projection and Fisher vectors according to claim 3, characterized in that the random matrix is a random matrix satisfying the restricted isometry property (RIP).

5. The human behavior recognition method based on random projection and Fisher vectors according to claim 2, characterized in that d = 100 and dd = d' = 48.

6. The human behavior recognition method based on random projection and Fisher vectors according to claim 2, characterized in that the SVM classifier in step 6 uses a linear kernel function to produce a multi-class output.