CN106940795A

CN106940795A - A gesture classification method based on tensor decomposition

Info

Publication number: CN106940795A
Application number: CN201710207158.XA
Authority: CN
Inventors: 苏育挺; 刘琛琛; 张静; 刘安安
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2017-07-11

Abstract

The invention discloses a gesture classification method based on tensor decomposition, comprising the following steps: modeling each gesture video as a third-order tensor; decomposing each gesture video with a modified higher-order singular value decomposition method; visualizing and analyzing the results of the tensor decomposition; classifying the gesture videos with a k-nearest-neighbor classifier and a support vector machine classifier, each using canonical angles; and varying the number of column vectors of the factor matrices to carry out comparison experiments. Because the decomposition results are both visualized and used for classification, the physical meaning of the tensor decomposition can be better understood.

Description

A gesture classification method based on tensor decomposition

Technical field

The present invention relates to the field of action recognition, and in particular to a gesture classification method based on tensor decomposition.

Background art

In recent years, human-computer interaction and statistical learning have developed steadily, and action recognition is one of the important problems they aim to solve. Human actions are complex, and even instances of the same action can differ greatly. It is therefore vital to extract effectively either the features that essentially distinguish different actions or the invariant features shared by instances of the same action.

Schuldt et al.^[1] used local space-time features to capture local events in video; these features adapt to changes in the size, frequency, and speed of motion patterns. They combined this representation with a support vector machine (SVM) classification scheme for action recognition, but the classification accuracy of this method is relatively low when distinguishing jogging from running.

Scovanner et al.^[2] proposed a three-dimensional (3D) scale-invariant feature transform (SIFT) descriptor for video or 3D imagery. They represented videos with a bag-of-words approach and proposed a method for discovering relationships between spatio-temporal words in order to describe video data better. Although the average accuracy of this method is high, its accuracy on some actions is still unsatisfactory.

Jhuang et al.^[3] proposed a biologically inspired system for action recognition in video sequences. The system consists of layers of spatio-temporal feature detectors of increasing complexity. The input sequence is analyzed by an array of motion-direction-sensitive units, and different types of such units and different system architectures were tried.

Despite these efforts, human action recognition remains difficult because of its inherent complexity, so new and more powerful tools or methods are needed. The present method applies tensors, an effective representation of high-order data structures, to action recognition in order to achieve better classification results.

Summary of the invention

The invention provides a gesture classification method based on tensor decomposition. The decomposition results are visualized and also used for classification, so that the physical meaning of the tensor decomposition can be better understood, as described below:

A gesture classification method based on tensor decomposition, comprising the following steps:

modeling each gesture video as a third-order tensor, and then decomposing each gesture video with a modified higher-order singular value decomposition method;

visualizing and analyzing the results of the tensor decomposition;

classifying the gesture videos with a k-nearest-neighbor classifier and a support vector machine classifier, each using canonical angles, and varying the number of column vectors of the factor matrices to carry out comparison experiments.

The step of modeling the gesture video as a third-order tensor is specifically:

The first mode of the tensor represents the horizontal direction, the second mode represents the vertical direction, and the third mode represents the time axis;

The pictures in a sample are read as matrices, and these matrices are concatenated along the third mode of the tensor to form a third-order tensor representing the gesture video.

The step of decomposing each gesture video with the modified higher-order singular value decomposition method is specifically:

The third-order gesture video tensor is first matricized (unfolded), the resulting matrices are transposed, a singular value decomposition is applied to each transposed matrix, and finally the factor matrices are built and the core tensor is computed.

The step of classifying the gesture videos with the k-nearest-neighbor classifier using canonical angles is specifically:

Canonical angles are used to compute the distances between the three factor matrices of the video samples in Set5 and those of each sample in the database, and a training data set and a test data set are built;

The data sets cover a single factor matrix, two factor matrices, and three factor matrices. Leave-one-out cross-validation is used to choose the optimal K for the k-nearest-neighbor classifier in each case, and the model is then trained and the data tested with this optimal K.

The beneficial effects of the technical solution provided by the present invention are as follows: the modified higher-order singular value decomposition method allows the decomposition results to be visualized more clearly, and combining gesture classification with visualization better exhibits the physical meaning of the tensor decomposition.

Brief description of the drawings

Fig. 1 is a flow chart of the gesture classification method based on tensor decomposition;

Fig. 2 shows samples from the Cambridge gesture database;

Fig. 3 shows visualization samples and the coordinate axes artificially constructed for each small picture;

Fig. 4 shows part of the classification results;

Fig. 5 is a sample confusion matrix.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below.

As data processing techniques have diversified, data representation has evolved from vectors to two-dimensional matrices. Data with many attributes can be flattened into vectors or matrices, but doing so destroys the structure of the original data and loses information. Handling high-dimensional data is a trend in data science, so an appropriate representation for such data is needed, and tensors meet this need. A tensor is the higher-order generalization of vectors and matrices. Because it models the original data as a whole, it does not destroy the structural information in the data and preserves the correlations within the data well. Advances in computer hardware have laid a solid foundation for processing high-dimensional data, so using tensors to process multidimensional data has become a research hotspot and has been applied in a growing number of fields, such as image processing and biomedicine.

Tensors are used to represent many kinds of high-dimensional data, so research on tensors, such as tensor subspace learning and tensor decomposition, is receiving increasing attention^[4]. As an important method in tensor research, tensor decomposition keeps developing, and new decomposition methods are continually proposed. In particular, many methods add constraints to the basic Tucker^[5] decomposition. Each of these Tucker-type methods has its own scope of application, and when the added constraints suit the problem, the decomposition can achieve good results.

Tensor decomposition factors a high-dimensional tensor into several lower-dimensional components, thereby revealing the relationships among them. The CP (CANDECOMP/PARAFAC)^[6] decomposition is important for studying tensor rank. Tensor decomposition is widely used in fields such as computer vision and digital signal processing, and has good prospects in specific applications such as image compression, data recovery, image inpainting, and image classification.

Statistical learning is a broad subject, and both supervised and unsupervised learning are important parts of it. In supervised learning, a model is first trained on input data with known outputs and is then used to predict the outputs for new input data. In unsupervised learning, the input data have no corresponding outputs; the computer learns from these unlabeled inputs, builds a model, and then processes new data. Clustering is one such method. The present invention performs gesture classification by feeding tensor decomposition results to classifiers from supervised learning. The embodiments of the present invention display the decomposition results visually and analyze the resulting pictures to determine the physical meaning of each component obtained from the decomposition, with the aim of understanding tensor decomposition better and applying it better to gesture classification.

Embodiment 1

A gesture classification method based on tensor decomposition, referring to Fig. 1, comprises the following steps:

101: Each gesture video is modeled as a third-order tensor, and each gesture video is then decomposed with a modified higher-order singular value decomposition method;

102: The results of the tensor decomposition are visualized and analyzed;

103: The gesture videos are classified with a k-nearest-neighbor classifier and a support vector machine classifier, each using canonical angles, and the number of column vectors of the factor matrices is varied to carry out comparison experiments.

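The step of modeling the gesture video as a third-order tensor in step 101 is specifically: the first mode of the tensor represents the horizontal direction, the second mode the vertical direction, and the third mode the time axis; the pictures in a sample are read as matrices and concatenated along the third mode to form a third-order tensor representing the gesture video.

The step of decomposing each gesture video with the modified higher-order singular value decomposition method in step 101 is specifically: the third-order gesture video tensor is matricized, the resulting matrices are transposed, a singular value decomposition is applied to each transposed matrix, and finally the factor matrices are built and the core tensor is computed.

The step of classifying the gesture videos with the k-nearest-neighbor classifier using canonical angles in step 103 is specifically: canonical angles are used to compute the distances between the three factor matrices of the video samples in Set5 and those of each sample in the database, training and test data sets are built for a single factor matrix, two factor matrices, and three factor matrices, and leave-one-out cross-validation is used to choose the optimal K in each case.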

In summary, the embodiment of the present invention uses a modified higher-order singular value decomposition method^[7], which allows the decomposition results to be visualized more clearly; combining gesture classification with visualization better exhibits the physical meaning of the tensor decomposition.

Embodiment 2

The scheme of Embodiment 1 is described further below with specific formulas and examples:

201: Each gesture video is modeled as a third-order tensor;

Specifically, this experiment uses the Cambridge gesture database, which contains 9 gesture classes, as shown in Fig. 2. Each gesture video is represented as a third-order tensor whose first mode represents the horizontal direction, whose second mode represents the vertical direction, and whose third mode represents the time axis. The pictures in a sample are first read as matrices, and these matrices are then concatenated along the third mode of the tensor, forming a third-order tensor that represents the video.
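
The construction in step 201 can be sketched as follows. This is a minimal illustration using NumPy rather than code from the patent: the function name `frames_to_tensor` and the random frames are hypothetical, and which image axis is treated as the horizontal or vertical mode is an assumption.

```python
import numpy as np

def frames_to_tensor(frames):
    """Stack equally sized grayscale frames along the third mode (time)."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    if len({f.shape for f in frames}) != 1:
        raise ValueError("all frames must have the same shape")
    # Modes 0 and 1 are the two spatial directions, mode 2 is the time axis.
    return np.stack(frames, axis=2)

# Synthetic stand-in for one gesture video: 30 frames of 40 x 40 pixels.
rng = np.random.default_rng(0)
video = frames_to_tensor([rng.random((40, 40)) for _ in range(30)])
print(video.shape)  # (40, 40, 30)
```

Real samples would replace the random frames with the pictures read from the database, resized to a common size.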

202: Each gesture video is decomposed with the modified higher-order singular value decomposition (HOSVD)^[7] method;

First, the N-th-order tensor is matricized (unfolded) along each mode, giving the matrices $A_{(1)}, A_{(2)}, \dots, A_{(N)}$. These matrices are then transposed, giving the corresponding matrices $B_{(1)}, B_{(2)}, \dots, B_{(N)}$, and a singular value decomposition (SVD) is applied to each of them. The SVD is defined as follows:

Given a matrix $C \in \mathbb{R}^{m \times n}$, the SVD decomposes it into the form

$$C = U \Sigma V^{T} \qquad (1)$$

where $\Sigma$ is a matrix whose entries off the main diagonal are all 0 and whose main-diagonal entries are arranged in decreasing order, and $U$ and $V$ are orthogonal matrices^[8].

The modified HOSVD initializes each factor matrix with the first L column vectors of $U_{(k)}$ (the orthogonal matrix $U$ obtained from the SVD of the k-th matrix $B_{(k)}$). These factor matrices lie on a Stiefel manifold^[9]. The core tensor is then computed with the corresponding formula.

The specific calculation steps are well known to those skilled in the art and are not repeated here.

Decomposing the constructed third-order gesture tensor with the HOSVD method modified as described above therefore yields a core tensor and three factor matrices.
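
The factor-matrix part of step 202 can be sketched as below. This is an illustrative reading of the text, not the patent's implementation: the names `unfold` and `modified_hosvd_factors` are hypothetical, the column ordering of the unfolding (NumPy row-major) is an assumption, and the core tensor is not computed because the source does not give its formula.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization A_(k): rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def modified_hosvd_factors(tensor, num_cols):
    """Return one factor matrix per mode, built from the transposed unfoldings."""
    factors = []
    for mode in range(tensor.ndim):
        b = unfold(tensor, mode).T                       # B_(k) = A_(k)^T
        u, _, _ = np.linalg.svd(b, full_matrices=False)  # B_(k) = U S V^T
        factors.append(u[:, :num_cols])                  # first L columns of U
    return factors

# Synthetic 20 x 20 x 20 video tensor, with L = 10 column vectors per factor.
video = np.random.default_rng(0).random((20, 20, 20))
u1, u2, u3 = modified_hosvd_factors(video, num_cols=10)
print(u1.shape)  # (400, 10)
```

Because each factor matrix comes from the transposed unfolding, its columns have length equal to the product of the other two mode sizes, which is what allows a column to be displayed as a picture in step 203.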

203: The results of the tensor decomposition are visualized and analyzed;

This method first constructs a tensor sample of size 40 × 40 × 30. To show the meaning of the decomposition results, the columns of the three resulting factor matrices are displayed as images.

For example, if $U_{(1)}$ is a factor matrix obtained from the decomposition, each of its column vectors is displayed as a picture. If the third-order tensor representing the video has size I × J × K, the picture corresponding to a column vector of $U_{(1)}$ has size J × K; the picture sizes in the other cases follow in the same way.
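
Turning one column vector into a J × K picture can be sketched as follows. The helper `column_to_image` is hypothetical, the row-major reshape order and the rescaling to [0, 1] for display are assumptions, and a random orthonormal matrix stands in for a real factor matrix.

```python
import numpy as np

def column_to_image(factor, col, shape):
    """Reshape one column of a factor matrix into a 2-D array for display."""
    image = factor[:, col].reshape(shape)
    lo, hi = image.min(), image.max()
    # Rescale to [0, 1] so the array can be shown as a grayscale picture.
    return (image - lo) / (hi - lo) if hi > lo else np.zeros(shape)

I, J, K = 40, 40, 30
# Stand-in for U_(1): a (J*K) x 5 matrix with orthonormal columns.
u1 = np.linalg.qr(np.random.default_rng(0).standard_normal((J * K, 5)))[0]
picture = column_to_image(u1, 0, (J, K))
print(picture.shape)  # (40, 30)
```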

The first factor matrix is analyzed below as an example. See Fig. 3, which displays the first five column vectors of the first factor matrix as pictures: each row corresponds to one gesture, and the pictures in a row, from left to right, each correspond to one column vector of the factor matrix.

Without loss of generality, let the three coordinate axes of the third-order video tensor be the X, Y, and Z axes, corresponding to the horizontal direction, the vertical direction, and the time axis, respectively. The column vectors of the first factor matrix then correspond to the Y-Z plane. From the description of the database, the leftward and rightward gesture motions correspond to decreasing and increasing displacement along the Y axis, respectively. In the pictures of the first row, for example, the vertical direction is the Y axis and the horizontal direction is the Z axis, and in each picture the Y coordinate of the displayed data points decreases as Z increases. In the pictures of the second row, the Y coordinate of the data points changes in the opposite way. The pictures of the third row correspond to clenching a fist with the five fingers together. As one can imagine, the displacement along the Y axis does not change noticeably while the fist is being made, so the Y coordinate of the data points is approximately constant.

204: The gesture videos are classified with a k-nearest-neighbor (KNN) classifier and an SVM classifier, each using canonical angles, as follows:

Canonical angles provide a way to measure the distance between two matrices of the same size.

Given two matrices $D \in \mathbb{R}^{m \times n}$ and $E \in \mathbb{R}^{m \times n}$, compute

$$F = D^{T} E, \quad F \in \mathbb{R}^{n \times n} \qquad (2)$$

$$F = U \Sigma V^{T} \qquad (3)$$

where the main-diagonal entries of $\Sigma$ measure the distance between the two matrices $D$ and $E$ and correspond to the canonical angles^[9]. $\Sigma$, $U$, and $V$ have the same meaning as in formula (1).
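
Formulas (2) and (3) can be sketched with NumPy as follows. When D and E have orthonormal columns, the singular values of $D^{T}E$ are the cosines of the canonical angles; the QR step below enforces that condition and is a safeguard added here, not something the source states. The source also does not say how the diagonal of $\Sigma$ is turned into a single distance, so `subspace_distance` (the norm of the sines) is only one possible choice, and both function names are hypothetical.

```python
import numpy as np

def canonical_angles(d, e):
    """Canonical (principal) angles between the column spaces of d and e."""
    # Orthonormalize columns so the singular values are cosines of the angles.
    qd, _ = np.linalg.qr(d)
    qe, _ = np.linalg.qr(e)
    sigma = np.linalg.svd(qd.T @ qe, compute_uv=False)  # F = D^T E = U S V^T
    return np.arccos(np.clip(sigma, -1.0, 1.0))

def subspace_distance(d, e):
    """One possible scalar distance: norm of the sines of the angles."""
    return float(np.linalg.norm(np.sin(canonical_angles(d, e))))

rng = np.random.default_rng(0)
a = rng.standard_normal((400, 10))
b = rng.standard_normal((400, 10))
print(subspace_distance(a, a))  # close to 0 for identical subspaces
print(subspace_distance(a, b))  # larger for unrelated subspaces
```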

For the KNN classifier, the videos are first sampled. There are two sampling methods: continuous sampling, which takes a number of consecutive pictures, and interval sampling, which takes one picture every several pictures. Video tensors of size 20 × 20 × 20 are constructed. The Cambridge gesture database used by this method is divided into 5 sets according to the illumination during recording: Set1, Set2, Set3, Set4, and Set5. Set5, whose illumination is relatively uniform, is used as the training set, and Set1, Set2, Set3, and Set4 are used as test sets.
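
The two sampling schemes can be sketched as index selection. The function names, the choice of starting frame, and the fixed stride used for interval sampling are assumptions, since the source does not specify them.

```python
def continuous_sample(num_frames, count, start=0):
    """Indices of `count` consecutive frames beginning at `start`."""
    if start + count > num_frames:
        raise ValueError("not enough frames for continuous sampling")
    return list(range(start, start + count))

def interval_sample(num_frames, count):
    """Indices of `count` frames taken at a fixed stride across the video."""
    step = num_frames // count
    if step < 1:
        raise ValueError("not enough frames for interval sampling")
    return list(range(0, step * count, step))

# A hypothetical sample with 60 pictures, reduced to 20 frames.
print(continuous_sample(60, 20))  # frames 0 through 19
print(interval_sample(60, 20))    # every third frame, 0 through 57
```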

The constructed tensors are decomposed with the modified HOSVD to obtain the three factor matrices of each video tensor. Canonical angles are then used to compute the distances between the three factor matrices of the video samples in Set5 and those of each sample in the database, and the training and test data sets are built from these distances. Finally, the model is trained on the training data set and tested on the test data set, and the confusion matrices and accuracy are computed.

In building the data sets, all possible combinations are considered: a single factor matrix, a pair of factor matrices, and all three factor matrices. Leave-one-out cross-validation is used to choose the optimal K for the KNN classifier in each case, and the model is then trained and the data tested with this optimal K.
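
Choosing K by leave-one-out cross-validation can be sketched as follows, assuming the pairwise distances between training samples have already been computed. The function names are hypothetical, majority voting with ties broken toward the smaller label is an assumption, and the toy one-dimensional distances stand in for canonical-angle distances.

```python
import numpy as np

def knn_predict(dist_row, labels, k):
    """Majority vote among the k training samples with smallest distance."""
    nearest = np.argsort(dist_row, kind="stable")[:k]
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

def best_k_leave_one_out(dist, labels, candidates):
    """Pick the K with the highest leave-one-out accuracy on the training set."""
    labels = np.asarray(labels)
    n = len(labels)
    scores = {}
    for k in candidates:
        correct = 0
        for i in range(n):
            keep = np.arange(n) != i          # hold out sample i
            pred = knn_predict(dist[i, keep], labels[keep], k)
            correct += int(pred == labels[i])
        scores[k] = correct / n
    best = max(scores, key=scores.get)        # ties go to the first candidate
    return best, scores

# Toy data: two well separated classes on a line.
points = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3])
dist = np.abs(points[:, None] - points[None, :])
labels = [0, 0, 0, 0, 1, 1, 1, 1]
best, scores = best_k_leave_one_out(dist, labels, candidates=[1, 3, 7])
print(best, scores)
```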

For the SVM classifier, the data sets are constructed in the same way as for the KNN classifier. As noted above, the data sets are built from different combinations of factor matrices, so there are 7 kinds of data set. As with the KNN classifier, the confusion matrix is computed for each of these 7 cases, together with the classification accuracy on Set1, Set2, Set3, and Set4 and the average accuracy.
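
The count of 7 follows from taking every non-empty subset of the three factor matrices (3 singles, 3 pairs, 1 triple). A short enumeration, with the factor matrices identified by the hypothetical indices 1, 2, and 3:

```python
from itertools import combinations

FACTORS = (1, 2, 3)  # indices of the three factor matrices

def factor_combinations(factors=FACTORS):
    """All non-empty subsets of the factor matrices, smallest subsets first."""
    return [c for r in range(1, len(factors) + 1)
            for c in combinations(factors, r)]

combos = factor_combinations()
print(len(combos))  # 7
print(combos)
```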

The SVM classifier requires several parameters to be chosen. The kernel function used in this experiment is the radial basis function, so suitable values of the parameters g and c must be selected, and cross-validation is used for this purpose. The experiment uses grid.py, the Python script for cross-validation that ships with the libsvm toolbox (the .py suffix indicates a Python script), which obtains the optimal parameters by cross-validation. This method installs the two programs gnuplot and Python and then runs grid.py on the data to choose the optimal parameters g and c.
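
The idea behind this parameter search, trying exponentially spaced values of c and g and keeping the pair with the best cross-validation score, can be sketched generically. This is not grid.py itself: `grid_search` and `toy_score` are hypothetical, the exponent ranges are assumptions, and the toy scoring function stands in for an actual SVM cross-validation run.

```python
from itertools import product
from math import log2

def grid_search(score, log2c_values, log2g_values):
    """Return the (c, g) pair with the best cross-validation score."""
    best_params, best_score = None, float("-inf")
    for log2c, log2g in product(log2c_values, log2g_values):
        c, g = 2.0 ** log2c, 2.0 ** log2g
        s = score(c, g)
        if s > best_score:
            best_params, best_score = (c, g), s
    return best_params, best_score

def toy_score(c, g):
    """Stand-in for cross-validation accuracy; peaks at c = 2**3, g = 2**-5."""
    return -((log2(c) - 3) ** 2 + (log2(g) + 5) ** 2)

params, best = grid_search(toy_score, range(-5, 16, 2), range(3, -16, -2))
print(params)
```

In practice `score` would train an RBF-kernel SVM with the given c and g and return its cross-validation accuracy on the training data set.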

Part of the classification results are shown in Fig. 4, which gives the classification accuracy on each test set for interval sampling with 10 column vectors. Using combinations of factor matrices achieves higher classification accuracy. At the same time, combining all three factor matrices does not improve accuracy much over using the first and third factor matrices, which indicates that the information in the second factor matrix is largely redundant with that in the other two. This agrees with the visualization analysis.

Fig. 5 is a sample confusion matrix showing the experimental results for interval sampling with 5 column vectors. The leftmost entry of each row gives the class to which the gesture to be classified belongs, and the top entry of each column gives the class to which the gesture was assigned. The entries marked in the table are those with the highest misclassification probability. These data show that the direction of motion of a misclassified gesture is generally the same as that of the gesture it is mistaken for. Where the directions differ, the changes over time of the Y coordinate, i.e. the vertical direction, are still similar. For example, in the table the probability of VR being mistaken for FC is the highest, and the probability of VR being mistaken for FR is also high. This shows that the information in the first factor matrix is mainly motion information, i.e. change of position, and additionally includes some information about the contour of the gesture.

205: Comparison experiments are carried out by changing the number of column vectors of the factor matrices in step 202, as follows:

To show the physical meaning of the tensor decomposition more clearly, after the experiment of step 204 the present invention adds a set of comparison experiments in which the number of column vectors taken from the factor matrices is varied. The physical meaning of the tensor decomposition is then analyzed through the visualization results and the classification results.

As the number of factor-matrix column vectors increases, the classification accuracy first increases and then stays constant or decreases slightly. When the canonical angles of a single factor matrix are used for classification, the accuracy for the first and second factor matrices increases with the number of column vectors, and the increase is larger for the second factor matrix than for the first. The accuracy for the third factor matrix first increases and then decreases as the number of column vectors increases. The column vectors that carry important information are therefore located at different positions in the three factor matrices, which is also what the visualization results show: the positions at which clear information appears differ among the three factor matrices. This may be caused by environmental factors such as illumination.

In summary, the embodiment of the present invention uses a modified higher-order singular value decomposition method, which allows the decomposition results to be visualized more clearly; combining gesture classification with visualization better exhibits the physical meaning of the tensor decomposition.

Embodiment 3

The feasibility of the schemes in Embodiments 1 and 2 is verified below with specific formulas and examples:

The database used in this experiment is the Cambridge gesture database, which contains 900 samples in total. It is divided into 9 classes by gesture type and into 5 sets by the illumination level of the pictures, namely Set1, Set2, Set3, Set4, and Set5. Each illumination level contains the 9 gesture classes: five fingers together moving left, five fingers together moving right, five fingers together clenching into a fist, five fingers spread moving left, five fingers spread moving right, five fingers spread closing, V-shaped gesture moving left, V-shaped gesture moving right, and V-shaped gesture closing. Each gesture in turn contains 20 samples. Each sample contains a number of pictures, and this number varies between samples. Each picture in a sample corresponds to one video frame, so a video can be formed by selecting some of the pictures.

This method evaluates classification performance with the accuracy and the confusion matrix:

accuracy = a / b

where accuracy denotes the classification accuracy, a is the number of correctly classified samples, and b is the total number of samples. The confusion matrix, shown in Fig. 5, gives all the outcomes for each gesture class, including its accuracy and the probability of its being misclassified as each of the other gesture classes.
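
Both measures can be computed as in the sketch below. The function names and the small label lists are hypothetical; rows index the true class and columns the assigned class, following the layout described for Fig. 5, and each row is normalized so that its entries are probabilities.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Row i, column j: fraction of class-i samples classified as class j."""
    counts = np.zeros((num_classes, num_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

def accuracy(true_labels, predicted_labels):
    """accuracy = a / b: correctly classified samples over all samples."""
    a = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return a / len(true_labels)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred))             # 4 of 6 correct
print(confusion_matrix(y_true, y_pred, 3))
```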

The data constructed from the canonical angles between tensor decomposition results are clearly effective for classification. The tensors constructed in the experiment have size 20 × 20 × 20, and the classification results for interval sampling with 10 column vectors are shown in Fig. 4. When the tensor decomposition results are visualized, their physical meaning can be seen clearly, as shown in Fig. 3; for the detailed analysis, see the visualization step (step 203) of the detailed description.

References:

[1] Schuldt C, Laptev I, Caputo B. Recognizing human actions: a local SVM approach[C]//International Conference on Pattern Recognition. IEEE, 2004: 32-36 Vol.3.

[2] Scovanner P, Ali S, Shah M. A 3-dimensional SIFT descriptor and its application to action recognition[J]. 2007: 357-360.

[3] Jhuang H, Serre T, Wolf L, et al. A Biologically Inspired System for Action Recognition[C]//IEEE International Conference on Computer Vision. IEEE, 2007: 1-8.

[4] Liu Yanan. Research on tensor decomposition methods based on graphs and low-rank representation and their applications[D]. Hefei: Anhui University, 2014.

[5] L.R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), 279-311.

[6] R.A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), 1-84.

[7] Dijun Luo, Chris Ding, Heng Huang. Are Tensor Decomposition Solutions Unique? On the Global Convergence of HOSVD and ParaFac Algorithms[J]. Lecture Notes in Computer Science, 2009, 6634(1): 148-159.

[8] Yang Huayong, Lin Xiaoli, Lin Li. Video face recognition based on spectral clustering on the Grassmann manifold space[J]. Computer Applications and Software, 2014(5): 168-171.

[9] Zhang Jianjun, Cao Jie, Wang Yuanyuan. Gradient algorithm on the Stiefel manifold and its application in feature extraction[J]. Journal of Radars, 2013, 2(3): 309-313.

Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the numbering of the embodiments above is for description only and does not indicate their relative merit.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A gesture classification method based on tensor decomposition, characterized in that the gesture classification method comprises the following steps:

modeling each gesture video as a third-order tensor, and then decomposing each gesture video with a modified higher-order singular value decomposition method;

classifying the gesture videos with a k-nearest-neighbor classifier and a support vector machine classifier, each using canonical angles, and varying the number of column vectors of the factor matrices to carry out comparison experiments.

2. The gesture classification method based on tensor decomposition according to claim 1, characterized in that the step of modeling the gesture video as a third-order tensor is specifically:

reading the pictures in a sample as matrices, and concatenating the matrices along the third mode of the tensor to form a third-order tensor representing the gesture video.

3. The gesture classification method based on tensor decomposition according to claim 1, characterized in that the step of decomposing each gesture video with the modified higher-order singular value decomposition method is specifically:

4. The gesture classification method based on tensor decomposition according to claim 1, characterized in that the step of classifying the gesture videos with the k-nearest-neighbor classifier using canonical angles is specifically:

using canonical angles to compute the distances between the three factor matrices of the video samples in Set5 and those of each sample in the database, and building a training data set and a test data set;

wherein the data sets cover a single factor matrix, two factor matrices, and three factor matrices, leave-one-out cross-validation is used to choose the optimal K for the k-nearest-neighbor classifier in each case, and the model is then trained and the data tested with this optimal K.