CN108664904A

CN108664904A - Kinect-based human sitting posture behavior recognition method and system

Info

Publication number: CN108664904A
Application number: CN201810369535.4A
Authority: CN
Inventors: 李方敏; 蔡诗琦; 旷海兰; 刘新华; 张韬; 栾悉道
Original assignee: Changsha University
Current assignee: Changsha University
Priority date: 2018-04-24
Filing date: 2018-04-24
Publication date: 2018-10-16

Abstract

The invention discloses a Kinect-based human sitting posture behavior recognition method, comprising: obtaining a skeleton image sequence from a human posture behavior data set; obtaining a feature sequence for each frame image from that frame image and its previous and next frame images in the skeleton image sequence; clustering the obtained feature sequences with K-means to obtain multiple clustering results; processing the clustering results with PCA to obtain multiple local features ranked from high to low by the importance of their principal component information; processing the local features with a feature coding algorithm to obtain image feature descriptors; constructing a global loss function from the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence; and solving the global loss function with a stochastic gradient descent algorithm. The invention addresses the technical problem that traditional healthy sitting posture detection methods have low recognition accuracy when detecting sitting posture behavior in the "non-upright sitting" state.

Description

Kinect-based human sitting posture behavior recognition method and system

Technical field

The invention belongs to the technical field of computer vision and, more particularly, relates to a Kinect-based human sitting posture behavior recognition method and system.

Background

In recent years, human posture and behavior analysis has been applied more and more widely in the medical field, including disease diagnosis, rehabilitation assessment, and daily monitoring of the elderly. In particular, most working people currently work in offices, and for them sitting is the most prolonged working posture. Medical research shows that prolonged sitting and poor sitting posture lead to a variety of occupational musculoskeletal disorders, such as lumbar disc herniation and cervical spondylosis. Many healthy sitting posture detection methods now exist, but when monitoring poor posture they usually only analyze angles derived from skeleton information and then judge whether the sitting behavior is healthy solely by this angle criterion.

However, in real-life scenes a seated person does not only type on a keyboard or write, which are the two conventional postures. Other everyday activities may also occur while seated, such as waving, drinking water, answering the phone, or clapping (these can be called "non-upright sitting" states). In these behavior states the angles of the subject's skeleton may change and may not even fit a detection framework based on simple bone angle information. If a traditional healthy sitting posture detection method is still applied to these "non-upright sitting" states, the detection results deviate considerably, which limits the method's applicability in real scenes.

Summary of the invention

In view of the above defects of, or needs for improvement in, the prior art, the invention provides a Kinect-based human sitting posture behavior recognition method and system. Its purpose is to solve the technical problem that, when traditional healthy sitting posture detection methods are applied to "non-upright sitting" states, the detection results deviate considerably and the recognition accuracy is low.

To achieve the above object, according to one aspect of the invention, a Kinect-based human sitting posture behavior recognition method is provided, comprising the following steps:

(1) obtaining a skeleton image sequence from a human posture behavior data set; obtaining the temporal local features of each frame image from that frame image and its previous and next frame images in the skeleton image sequence; and obtaining the spatial local features of each frame image from that frame image itself, where the temporal and spatial local features together form the feature sequence $f_{n,s}$, with n ∈ [1,7] and s denoting an image frame in the skeleton image sequence;

(2) clustering the feature sequences of the frame images obtained in step (1) with K-means to obtain multiple clustering results;

(3) processing the multiple clustering results obtained in step (2) with PCA to obtain multiple local features ranked from high to low by the importance of their principal component information;

(4) processing the local features obtained in step (3) with a feature coding algorithm to obtain image feature descriptors;

(5) constructing a global loss function from the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence, and solving the global loss function with a stochastic gradient descent algorithm to obtain the optimal linear transformation that minimizes the global loss function;

(6) processing the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence with the obtained optimal linear transformation to obtain the similarity between the frame image and the other frame images in the skeleton image sequence;

(7) processing the similarities obtained in step (6) between the frame image and the other frame images in the skeleton image sequence with the non-parametric K-nearest-neighbor algorithm to classify the behavior in all images.
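The clustering in step (2) can be illustrated with a minimal sketch of Lloyd's K-means algorithm. The function name `kmeans`, the initialization from the first k points, the iteration count, and the toy data are illustrative assumptions; the patent text does not specify them.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Cluster points (lists of floats) into k groups with Lloyd's algorithm.

    ASSUMPTION: centers are initialized from the first k points; the patent
    does not state an initialization scheme.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda m: dist(p, centers[m])) for p in points]
        # Update step: each center moves to the mean of its members.
        for m in range(k):
            members = [p for p, lab in zip(points, labels) if lab == m]
            if members:  # keep the old center if a cluster is empty
                centers[m] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy feature vectors forming two obvious groups.
features = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9]]
centers, labels = kmeans(features, k=2)
```

Running K-means several times with different initializations would yield the "multiple clustering results" that the later steps refer to; how many runs are used is not stated in the text.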

Preferably, obtaining the temporal local features of each frame image from that frame image and its previous and next frame images in the skeleton image sequence is specifically:

First, the joints are divided into three joint groups according to the structure of the human body: the displacement vectors over time of the head, left hand, right hand, left foot, and right foot form the first joint group; those of the neck, left elbow, right elbow, left knee, and right knee form the second joint group; and those of the spine, left shoulder, right shoulder, left hip, and right hip form the third joint group;

Then, the temporal displacement vector of each joint in each frame image is obtained:

where 1 < s < τ and τ is the number of image frames in the skeleton image sequence; the formula is expressed in terms of the coordinates of the i-th joint of the s-th frame image in the coordinate system (X, Y, Z) and gives the temporal displacement vector of the i-th joint of the s-th frame image in the skeleton image sequence;

Finally, the temporal displacement vectors of the joints in each frame image that belong to the same joint group are combined to build the temporal local features f₁ to f₃ of the first, second, and third joint groups, respectively.
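A minimal sketch of the temporal local features follows. The displacement formula is not reproduced in this text, so the sketch assumes the displacement of a joint at frame s is its position in frame s+1 minus its position in frame s-1, which is consistent with the use of the previous and next frames but is not confirmed by the source. The joint names, the dictionary layout, and the function name `temporal_features` are likewise hypothetical, and frames are indexed from 0 rather than from 1.

```python
# Joint groups 1 to 3 as listed in the description (names are illustrative).
TEMPORAL_GROUPS = {
    1: ["head", "left_hand", "right_hand", "left_foot", "right_foot"],
    2: ["neck", "left_elbow", "right_elbow", "left_knee", "right_knee"],
    3: ["spine", "left_shoulder", "right_shoulder", "left_hip", "right_hip"],
}


def temporal_features(frames, s):
    """Return {group: flat displacement list} for an interior frame index s.

    frames is a list of dicts mapping joint name -> (x, y, z).
    ASSUMPTION: displacement = position in frame s+1 minus position in s-1.
    """
    if not 0 < s < len(frames) - 1:
        raise ValueError("s must have both a previous and a next frame")
    prev_f, next_f = frames[s - 1], frames[s + 1]
    features = {}
    for group, joints in TEMPORAL_GROUPS.items():
        vec = []
        for joint in joints:
            vec.extend(b - a for a, b in zip(prev_f[joint], next_f[joint]))
        features[group] = vec
    return features


# Toy sequence: every joint moves one unit along X per frame.
all_joints = [j for joints in TEMPORAL_GROUPS.values() for j in joints]
frames = [{j: (float(t), 0.0, 0.0) for j in all_joints} for t in range(3)]
f = temporal_features(frames, 1)
```

Each group yields a 15-dimensional vector here (five joints times three coordinates), corresponding to f₁ to f₃.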

Preferably, obtaining the spatial local features of each frame image from that frame image in the skeleton image sequence is specifically:

First, the joints are divided into four joint groups according to the structure of the human body: the relative position vectors of the head, left hand, and right hand with respect to the spine form the fourth joint group; those of the head, left hand, and left foot with respect to the right hip form the fifth joint group; those of the head, right hand, and right foot with respect to the left hip form the sixth joint group; and those of the left hand and right hand with respect to the head form the seventh joint group;

Then, the spatial displacement vector of each joint in each frame image is obtained:

where the formula gives the spatial displacement vector of the i-th joint relative to the j-th joint in the s-th frame image, with i ≠ j;

Finally, the spatial displacement vectors of the joints in each frame image that belong to the same joint group are combined to build the spatial local features f₄ to f₇ of the fourth, fifth, sixth, and seventh joint groups, respectively.
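The spatial local features can be sketched in the same way. The sketch assumes the spatial displacement of joint i relative to joint j is the coordinate difference within one frame; the joint names, the dictionary layout, and the function name `spatial_features` are hypothetical.

```python
# Joint groups 4 to 7: (reference joint, joints measured relative to it).
SPATIAL_GROUPS = {
    4: ("spine", ["head", "left_hand", "right_hand"]),
    5: ("right_hip", ["head", "left_hand", "left_foot"]),
    6: ("left_hip", ["head", "right_hand", "right_foot"]),
    7: ("head", ["left_hand", "right_hand"]),
}


def spatial_features(frame):
    """Return {group: flat list of relative position vectors} for one frame.

    frame maps joint name -> (x, y, z).
    ASSUMPTION: relative position = joint coordinates minus reference coordinates.
    """
    features = {}
    for group, (reference, joints) in SPATIAL_GROUPS.items():
        ref = frame[reference]
        vec = []
        for joint in joints:
            vec.extend(a - b for a, b in zip(frame[joint], ref))
        features[group] = vec
    return features


# Toy seated skeleton (coordinates are made up for illustration).
frame = {
    "head": (0.0, 1.7, 0.0), "spine": (0.0, 1.0, 0.0),
    "left_hand": (-0.5, 1.0, 0.2), "right_hand": (0.5, 1.0, 0.2),
    "left_foot": (-0.2, 0.0, 0.0), "right_foot": (0.2, 0.0, 0.0),
    "left_hip": (-0.2, 0.9, 0.0), "right_hip": (0.2, 0.9, 0.0),
}
f = spatial_features(frame)
```

Because these features depend only on one frame, they are defined for every frame in the sequence, unlike the temporal features, which need both neighbors.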

Preferably, the feature coding algorithm is the vector of locally aggregated descriptors (VLAD) algorithm;

Preferably, processing the local features to obtain the image feature descriptor is specifically:

First, the following formula is used:

$F_{n,c} = [\upsilon_{n,c,1}, \ldots, \upsilon_{n,c,i}, \ldots, \upsilon_{n,c,k}]$

where $F_{n,c}$ denotes the vector obtained by concatenating the residual vectors $\upsilon_{n,c,m}$ of the c-th clustering result for the n-th joint group; $\upsilon_{n,c,k}$ denotes the local feature set of subgroup n of the c-th initialization in cluster k during clustering; $c \in \{1, 2, \ldots, C\}$, where C is the number of clustering results; $m \in \{1, 2, \ldots, k\}$, where k is the number of clusters in a clustering result; $\mu_{n,c,m}$ denotes the cluster center of the m-th cluster in the c-th clustering result for the n-th joint group; and

$S_{n,c,m} = \{ f_{n,s} \mid m = \arg\min_{p} \| f_{n,s} - \mu_{n,c,p} \| \}$

where $p \in \{1, 2, \ldots, k\}$ and $p \neq m$;

Then, the vectors $F_{n,c}$ of all joint groups and all clustering results in the frame image are summed to obtain the image feature descriptor.
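The encoding for one joint group and one clustering result can be sketched as follows. The definition of the residual vector is not reproduced in this text, so the sketch assumes the standard VLAD form: for each cluster, the sum of the differences between the assigned features and the cluster center. The function name `vlad_encode` and the toy data are hypothetical.

```python
from math import dist


def vlad_encode(features, centers):
    """Concatenate per-cluster residual sums into one VLAD-style vector.

    ASSUMPTION: the residual for cluster m is the sum of (f - center_m) over
    the features whose nearest center is m, as in standard VLAD.
    """
    dim = len(centers[0])
    residuals = [[0.0] * dim for _ in centers]
    for f in features:
        # Assign the feature to its nearest cluster center.
        m = min(range(len(centers)), key=lambda p: dist(f, centers[p]))
        for d in range(dim):
            residuals[m][d] += f[d] - centers[m][d]
    # Concatenate the k residual vectors, mirroring F_{n,c}.
    return [value for residual in residuals for value in residual]


centers = [[0.0, 0.0], [4.0, 4.0]]
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 4.0]]
descriptor = vlad_encode(features, centers)
```

The sketch covers a single pair (n, c); the final descriptor described in the text additionally sums these vectors over all joint groups and clustering results.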

Preferably, step (5) is specifically:

First, the global loss function is constructed as follows:

$\varepsilon(L) = (1 - \mu)\,\varepsilon_{pull}(L) + \mu\,\varepsilon_{push}(L) + \gamma \| L^{T} L - I \|^{2}$

where μ is the weighting ratio between the components $\varepsilon_{push}(L)$ and $\varepsilon_{pull}(L)$, γ is the regularization coefficient, I is the identity matrix, and L denotes the linear transformation, with

where the descriptors appearing in these terms are the image feature descriptors of the frame images; the component $\varepsilon_{pull}(L)$ measures how strongly the image feature descriptors of true target samples of the same category are pulled together; the component $\varepsilon_{push}(L)$ measures how strongly impostor samples l, which do not belong to the same category and would otherwise interfere, are pushed away from an image feature descriptor while its target samples are pulled closer; ξ is the desired margin between an image feature descriptor and an impostor sample l; j → i indicates that the image feature descriptors of samples j and i are target neighbors of the same category; and l indexes an impostor sample for sample i, that is, a sample whose image feature descriptor is not in the same category as that of sample i;

Then, the optimal linear transformation $L^{*}$ that minimizes the global loss function is found with the SGD algorithm:

$L^{*} = \arg\min_{L} \varepsilon(L)$.
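The structure of the global loss can be illustrated with a short evaluation sketch. The expressions for the pull and push components are not reproduced in this text; the sketch assumes the standard large-margin nearest-neighbor (LMNN) hinge form, which matches the described roles of target neighbors, impostors, and the margin ξ, and it assumes the regularizer uses the squared Frobenius norm. The function names, the default values of `mu`, `gamma`, and `xi`, and the toy data are hypothetical.

```python
def transform(L, x):
    """Apply the linear map L (a list of rows) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def sq_dist(L, a, b):
    """Squared Euclidean distance between a and b after mapping by L."""
    return sum((u - v) ** 2 for u, v in zip(transform(L, a), transform(L, b)))


def global_loss(L, xs, labels, targets, mu=0.5, gamma=0.1, xi=1.0):
    """Evaluate (1 - mu) * pull + mu * push + gamma * ||L^T L - I||^2.

    targets[i] lists the same-class target neighbors j of sample i.
    ASSUMPTION: pull and push follow the standard LMNN hinge form and the
    regularizer is a squared Frobenius norm.
    """
    pull = push = 0.0
    for i, neighbors in enumerate(targets):
        for j in neighbors:
            d_ij = sq_dist(L, xs[i], xs[j])
            pull += d_ij  # pull target neighbors together
            for l_idx, x_l in enumerate(xs):
                if labels[l_idx] != labels[i]:  # candidate impostor
                    push += max(0.0, xi + d_ij - sq_dist(L, xs[i], x_l))
    n = len(L[0])
    reg = 0.0
    for a in range(n):
        for b in range(n):
            ltl = sum(row[a] * row[b] for row in L)  # entry (a, b) of L^T L
            reg += (ltl - (1.0 if a == b else 0.0)) ** 2
    return (1 - mu) * pull + mu * push + gamma * reg


identity = [[1.0, 0.0], [0.0, 1.0]]
xs = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0]]
labels = ["sit", "sit", "wave"]
targets = [[1], [0], []]
loss = global_loss(identity, xs, labels, targets)
```

With the identity transformation the regularizer vanishes, so the value comes only from the pull and push terms. An SGD solver would repeatedly adjust the entries of L along the negative gradient of this quantity, estimated on sampled triplets (i, j, l).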

According to another aspect of the invention, a Kinect-based human sitting posture behavior recognition system is provided, comprising:

A first module, configured to obtain a skeleton image sequence from a human posture behavior data set, obtain the temporal local features of each frame image from that frame image and its previous and next frame images in the skeleton image sequence, and obtain the spatial local features of each frame image from that frame image itself, where the temporal and spatial local features together form the feature sequence $f_{n,s}$, with n ∈ [1,7] and s denoting an image frame in the skeleton image sequence;

A second module, configured to cluster the feature sequences of the frame images obtained by the first module with K-means to obtain multiple clustering results;

A third module, configured to process the multiple clustering results obtained by the second module with PCA to obtain multiple local features ranked from high to low by the importance of their principal component information;

A fourth module, configured to process the local features obtained by the third module with a feature coding algorithm to obtain image feature descriptors;

A fifth module, configured to construct a global loss function from the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence, and to solve the global loss function with a stochastic gradient descent algorithm to obtain the optimal linear transformation that minimizes the global loss function;

A sixth module, configured to process the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence with the obtained optimal linear transformation to obtain the similarity between the frame image and the other frame images in the skeleton image sequence;

A seventh module, configured to process the similarities obtained by the sixth module between the frame image and the other frame images in the skeleton image sequence with the non-parametric K-nearest-neighbor algorithm to classify the behavior in all images.

In general, compared with the prior art, the above technical solutions conceived by the invention achieve the following beneficial effects:

(1) The method of the invention has a high recognition rate: the invention introduces the solution of an optimal linear transformation and applies that transformation in the K-NN classification algorithm, which further improves recognition accuracy;

(2) When applied to healthy sitting posture detection, the invention takes more practical application scenarios into account and therefore has broad applicability and practicality.

Description of the drawings

Fig. 1 is a flow chart of the Kinect-based human sitting posture behavior recognition method of the invention.

Fig. 2 is the confusion matrix obtained by recognizing the 20 posture actions of the MSR-Action3D data set with the method of the invention.

Fig. 3 is the confusion matrix obtained with the method of the invention on the UTKinect-Action3D data set.

Fig. 4 is the confusion matrix obtained with the method of the invention on the Florence 3D Actions data set.

Fig. 5 shows the performance comparison curves of the method of the invention for human posture recognition and human sitting posture behavior recognition.

Detailed description

To make the purpose, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.

The invention provides a Kinect-based human sitting posture behavior recognition method. It first simplifies the definition of skeletal joints on the basis of Kinect skeleton data and, on that basis, designs a new human posture behavior recognition framework; the framework is then further refined for the specific research target of human sitting posture so that it is better suited to recognizing human sitting posture behavior. In the framework, spatio-temporal local features based on skeleton information are first defined and described; the K-means clustering algorithm and the principal component analysis (PCA) algorithm are then used to obtain the vector of locally aggregated descriptors of the final feature clustering results; next, a custom loss function is combined with global stochastic gradient optimization to perform discriminative metric learning and compute the learned transformation of the feature information; finally, a K-NN classifier classifies the posture behavior from the learned feature information. Experiments demonstrate the effectiveness and accuracy of the method and show that it performs well on human sitting posture behavior recognition.

As shown in Fig. 1, the Kinect-based human sitting posture behavior recognition method of the invention comprises the following steps:

(1) obtaining a skeleton image sequence from a human posture behavior data set; obtaining the temporal local features of each frame image from that frame image and its previous and next frame images in the skeleton image sequence; and obtaining the spatial local features of each frame image from that frame image itself, where the temporal and spatial local features together form the feature sequence;

Specifically, the human posture behavior data set may be the MSR-Action3D data set released by Microsoft Research, the UTKinect-Action3D data set, or the Florence 3D Actions data set.

Obtaining the temporal local features of each frame image from that frame image and its previous and next frame images in the skeleton image sequence is specifically: first, the joints are divided into three joint groups according to the structure of the human body, where the displacement vectors over time of the head, left hand, right hand, left foot, and right foot form the first joint group; those of the neck, left elbow, right elbow, left knee, and right knee form the second joint group; and those of the spine, left shoulder, right shoulder, left hip, and right hip form the third joint group;

where s is an image frame in the skeleton image sequence, 1 < s < τ, and τ is the number of image frames in the skeleton image sequence; the formula is expressed in terms of the coordinates of the i-th joint of the s-th frame image in the coordinate system (X, Y, Z) and gives the temporal displacement vector of the i-th joint of the s-th frame image in the skeleton image sequence;

Finally, the temporal displacement vectors of the joints in each frame image that belong to the same joint group are combined to build the temporal local features f₁ to f₃ of the first, second, and third joint groups, respectively;

Obtaining the spatial local features of each frame image from that frame image in the skeleton image sequence is specifically: first, the joints are divided into four joint groups according to the structure of the human body, where the relative position vectors of the head, left hand, and right hand with respect to the spine form the fourth joint group; those of the head, left hand, and left foot with respect to the right hip form the fifth joint group; those of the head, right hand, and right foot with respect to the left hip form the sixth joint group; and those of the left hand and right hand with respect to the head form the seventh joint group.

Finally, the spatial displacement vectors of the joints in each frame image that belong to the same joint group are combined to build the spatial local features f₄ to f₇ of the fourth, fifth, sixth, and seventh joint groups, respectively;

The feature sequence finally obtained in this step is denoted $f_{n,s}$, where n ∈ [1,7].

(3) The multiple clustering results obtained in step (2) are processed with principal component analysis (PCA) to obtain multiple local features ranked from high to low by the importance of their principal component information (here, importance refers to the relevance of a local feature to the final human sitting posture behavior recognition result).

Dimensionality reduction inevitably loses some information, but real data are usually correlated, so the loss can be kept small. One benefit of PCA is that, during dimensionality reduction, the newly computed principal component vectors can be sorted by importance; the leading, most important components are kept as needed and the remaining dimensions are discarded, so that the model is simplified or the data compressed while preserving as much of the information in the original data as possible.
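The idea of ranking directions by importance can be sketched with a small pure-Python routine that finds the leading principal component by power iteration on the sample covariance matrix. This is an illustrative stand-in, not the patent's implementation: the function name `top_principal_component`, the iteration count, and the toy data are assumptions, and a full PCA would extract further components (for example by deflation or with a linear algebra library) and sort them by explained variance.

```python
def top_principal_component(data, iterations=200):
    """Return (unit direction, variance) of the leading principal component.

    Uses power iteration on the sample covariance matrix. The start vector of
    all ones is an illustrative choice and fails only if it happens to be
    orthogonal to the leading direction.
    """
    n, dim = len(data), len(data[0])
    means = [sum(row[d] for row in data) / n for d in range(dim)]
    centered = [[row[d] - means[d] for d in range(dim)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate data or unlucky start vector
            break
        v = [x / norm for x in w]
    # Variance captured along v (the Rayleigh quotient for unit v).
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(dim) for b in range(dim))
    return v, variance


# Toy data lying exactly on the diagonal, so one direction explains everything.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
direction, variance = top_principal_component(data)
```

Keeping only the components with the largest variance is what lets the later steps work with fewer dimensions while retaining most of the information.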

Specifically, the feature coding algorithm used in this step is the vector of locally aggregated descriptors (VLAD) algorithm.

This step specifically uses the following formula:

$F_{n,c} = [\upsilon_{n,c,1}, \ldots, \upsilon_{n,c,i}, \ldots, \upsilon_{n,c,k}]$

where $F_{n,c}$ denotes the vector obtained by concatenating the residual vectors $\upsilon_{n,c,m}$ of the c-th clustering result for the n-th joint group; $\upsilon_{n,c,k}$ denotes the local feature set of subgroup n of the c-th initialization in cluster k during the clustering of step (2); $c \in \{1, 2, \ldots, C\}$, where C is the number of clustering results; and $m \in \{1, 2, \ldots, k\}$, where k is the number of clusters in a clustering result.

In the above formula, $\mu_{n,c,m}$ denotes the cluster center of the m-th cluster in the c-th clustering result for the n-th joint group, and

$S_{n,c,m} = \{ f_{n,s} \mid m = \arg\min_{p} \| f_{n,s} - \mu_{n,c,p} \| \}$

where $p \in \{1, 2, \ldots, k\}$ and $p \neq m$.

The vectors $F_{n,c}$ of all joint groups and all clustering results in the frame image are summed to obtain the image feature descriptor of this step.

(5) A global loss function is constructed from the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence, and the global loss function is solved with a stochastic gradient descent (SGD) algorithm to obtain the optimal linear transformation that minimizes the global loss function;

Specifically, the global loss function is first constructed as follows:

$\varepsilon(L) = (1 - \mu)\,\varepsilon_{pull}(L) + \mu\,\varepsilon_{push}(L) + \gamma \| L^{T} L - I \|^{2}$

where the descriptors appearing in these terms are the image feature descriptors of the frame images; the component $\varepsilon_{pull}(L)$ measures how strongly the image feature descriptors of true target samples of the same category are pulled together; the component $\varepsilon_{push}(L)$ measures how strongly impostor samples l, which do not belong to the same category and would otherwise interfere, are pushed away from an image feature descriptor while its target samples are pulled closer; ξ is the desired margin between an image feature descriptor and an impostor sample l; j → i indicates that the image feature descriptors of samples j and i are target neighbors of the same category; and l indexes an impostor sample for sample i, that is, a sample whose image feature descriptor is not in the same category as that of sample i.

$L^{*} = \arg\min_{L} \varepsilon(L)$

(7) The similarities obtained in step (6) between the frame image and the other frame images in the skeleton image sequence are processed with the non-parametric K-nearest-neighbor (K-NN) algorithm to classify the behavior in all images.

In the K-NN algorithm, the input consists of the k nearest training examples in the data set, and the output is a class membership. A new object is classified by assigning it to the class to which most of its k nearest neighbors belong. Neighbors are usually given weights that express their contribution to the classification; for example, the weight of each neighbor can be based on the distance from the object to be classified to that neighbor. In the invention, the two-stage metric learning yields a final transformation of the feature samples, and this transformation determines both the feature representation of each sample when it takes part in classification and the definition of the distance between samples.

For the final classification of these samples, K-NN is one of the most direct algorithms for classifying unknown data: every training sample has a definite label, so the label of new data can be determined explicitly. The specific procedure is:

(1) Define how the distance between data points is computed, and compute the distance from the new data point to each data point of known class;

(2) Sort the computed distances in ascending order and select the k points nearest to the data point to be classified;

(3) For classification, return the most frequent class among the k points as the predicted class; for regression, return the weighted value of the k points as the predicted value.

Put simply, given a set of data whose classes are already known, when a new data point arrives, its distance to every point in the training data is computed, the k training points nearest to it are selected, their classes are inspected, and the new data point is assigned to the majority class.
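The three-step procedure above can be sketched as a plain majority-vote classifier. The sketch assumes the input vectors have already been mapped by the learned linear transformation, so plain Euclidean distance stands in for the learned metric; the function name `knn_classify`, the labels, and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(train, labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    ASSUMPTION: vectors are already transformed by the learned metric, so
    Euclidean distance is used directly.
    """
    # Steps (1) and (2): compute distances and sort training indices by them.
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    # Step (3): vote among the k nearest neighbors.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
labels = ["drink", "drink", "drink", "clap", "clap"]
prediction = knn_classify(train, labels, [0.3, 0.3], k=3)
```

This version gives every neighbor an equal vote; a distance-weighted variant, as mentioned in the text, would scale each vote by a function of the neighbor's distance.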

Experimental result

MSR-Action3D is currently the most commonly used data set; it was selected first to test the framework and to compare it with methods currently evaluated on this data set.

For training and evaluation, the cross-subject split proposed by Wang et al. is used. Specifically, of the ten subjects in MSR-Action3D, subjects 1, 3, 5, 7, and 9 are used for training and subjects 2, 4, 6, 8, and 10 for testing. Under this setting, the confusion matrix obtained by the framework of the invention on the 20 posture actions of MSR-Action3D is shown in Fig. 2.
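The cross-subject split amounts to partitioning samples by subject identifier. A minimal sketch, assuming each sample carries an integer subject ID from 1 to 10 (the tuple layout, the function name `cross_subject_split`, and the placeholder clip names are hypothetical):

```python
def cross_subject_split(samples):
    """Split (subject_id, sample) pairs: odd subjects train, even subjects test."""
    train = [x for subject, x in samples if subject % 2 == 1]
    test = [x for subject, x in samples if subject % 2 == 0]
    return train, test


# One placeholder clip per subject, for illustration only.
samples = [(subject, f"clip_{subject}") for subject in range(1, 11)]
train, test = cross_subject_split(samples)
```

Splitting by subject rather than by clip ensures that no person appears in both the training and the test set.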

The confusion matrix in Fig. 2 shows that the framework recognizes 14 of the 20 posture behaviors in the data set accurately; of the rest, 4 have recognition rates above 90% and only 2 have recognition rates below 90%. These results are also compared with prior-art methods that use the same data set for posture recognition, as shown in Table 1 below.

Table 1

The comparison in Table 1 shows that the average recognition rate of the framework reaches 95.86%, which is still higher than that of current behavior recognition methods evaluated on the same behavior data set.

The confusion matrices finally obtained by testing the method of the invention for human posture behavior recognition on the UTKinect-Action3D and Florence 3D Actions data sets are shown in Fig. 3 and Fig. 4, respectively.

The results on these three data sets serve as reference values for the accuracy of the framework. To target sitting posture behavior specifically, the feature extraction of the framework is refined, and the adjusted framework is used to detect and recognize behavior in the sitting state; its results are compared with those of the original framework. Before detection, all human posture actions in the three data sets are pooled and screened, and those that can occur as human behavior in the sitting state are selected for testing. The sitting posture behaviors finally tested are: sitting down, high wave, horizontal wave, hand catch, hand clamp, two-hand wave, drinking water, answering the phone, clapping, looking at a watch, and standing up. The recognition results of the refined sitting posture detection framework for each sitting posture behavior are compared with the detection results of the original detection framework in Fig. 5.

The line chart in Fig. 5 shows that the detection accuracy of the adjusted sitting posture behavior framework on the specified sitting posture behaviors is close to that of the human posture behavior detection framework originally used by the invention. The refinement exploits the fact that sitting posture behavior is a special case of whole-body posture behavior: the corresponding initial input features are reduced, which lowers the feature dimensionality during training. After this refinement, the evaluation results of the framework are not significantly affected, and the framework retains considerable effectiveness and accuracy.

Those skilled in the art will readily understand that the above are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. a kind of human body sitting posture Activity recognition method based on Kinect, which is characterized in that include the following steps：

(1) obtain skeleton image sequence from human body attitude behavioral data collection, according to each frame image in skeleton image sequence and its Former frame and a later frame image obtain the time local feature of the frame image, and are obtained according to each frame image in skeleton image sequence The space local feature of the frame image, time local feature and space local feature is taken to collectively form characteristic sequence f_n,s, wherein n ∈ [1,7], s are the picture frames in skeleton image sequence；

(2) K-means is used to carry out clustering processing to the characteristic sequence of the frame image obtained in step (1), it is multiple poly- to obtain Class handling result；

(3) multiple clustering processing results that step (2) obtains are handled using PCA methods, it is multiple according to pivot to obtain The local feature that the importance of information arranges from high to low.

(4) local feature for using feature coding algorithm to obtain step (3) is handled, to obtain characteristics of image descriptor；

(5) global loss function is built according to the characteristics of image descriptor of other frame images in frame image and skeleton image sequence, Global loss function is solved using based on stochastic gradient descent algorithm, to obtain making global loss function to minimize most Good linear transformation function；

(6) using obtained optimum linear transforming function transformation function to the characteristics of image of other frame images in frame image and skeleton image sequence Descriptor is handled, to obtain the similarity of frame image and other frame images in skeleton image sequence.

(7) processing the similarities between the frame image and the other frame images in the skeleton image sequence obtained in step (6) using a nonparametric K-nearest-neighbor algorithm, to classify the behavior of all images.
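Step (2) can be illustrated with a minimal K-means sketch in Python. It assumes that the multiple clustering results come from running the algorithm from several different sets of initial centers (claim 5 refers to the c-th initialization). The function names `kmeans` and `multi_init_kmeans`, the fixed iteration count, and the explicitly supplied initial centers are assumptions made for illustration and are not taken from the specification. The PCA processing of step (3) is not shown.

```python
def kmeans(features, centers, iterations=10):
    """Plain Lloyd iteration; `centers` are the initial cluster centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each feature joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for f in features:
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centers]
            clusters[dists.index(min(dists))].append(f)
        # Update step: each center moves to the mean of its members.
        for m, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[m] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def multi_init_kmeans(features, initial_centers_list, iterations=10):
    """ASSUMPTION: one clustering result per set of initial centers."""
    return [kmeans(features, init, iterations) for init in initial_centers_list]
```

Each entry of the returned list holds the k cluster centers of one clustering result, which is the form the encoding of step (4) would consume.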

2. The human sitting posture behavior recognition method according to claim 1, characterized in that obtaining the temporal local feature of each frame image in the skeleton image sequence from that frame image and its preceding and following frame images is specifically:

First, the joints are divided into three joint groups according to body part, wherein the displacement vectors over time of the head, left hand, right hand, left foot, and right foot constitute the first joint group; the displacement vectors over time of the neck, left elbow, right elbow, left knee, and right knee constitute the second joint group; and the displacement vectors over time of the spine, left shoulder, right shoulder, left hip, and right hip constitute the third joint group;

where 1 < s < τ; τ is the number of image frames in the skeleton image sequence; the coordinates of the i-th joint of the s-th frame image in the skeleton image sequence are taken in the coordinate system (X, Y, Z); and the result is the time displacement vector of the i-th joint of the s-th frame image in the skeleton image sequence;

Finally, the time displacement vectors obtained above for the different joints of each frame image that belong to the same joint group are combined, to establish the temporal local features f₁ to f₃ of the first, second, and third joint groups respectively.
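The grouping described in this claim can be sketched in Python as follows. The displacement formula itself is not reproduced in the text above, so the sketch assumes a central difference between the following and the preceding frame, which is consistent with the bound 1 < s < τ. The joint labels, the dictionary layout of a frame, and the function names are hypothetical.

```python
# Joint groups named in the claim; the string keys are hypothetical labels.
TEMPORAL_GROUPS = {
    1: ["head", "left_hand", "right_hand", "left_foot", "right_foot"],
    2: ["neck", "left_elbow", "right_elbow", "left_knee", "right_knee"],
    3: ["spine", "left_shoulder", "right_shoulder", "left_hip", "right_hip"],
}


def time_displacement(prev_pt, next_pt):
    """ASSUMPTION: displacement = next-frame minus previous-frame coordinates."""
    return tuple(b - a for a, b in zip(prev_pt, next_pt))


def temporal_features(frames, s):
    """Return {group number: flat feature list} for interior frame s (0-based).

    `frames` is a list of dicts mapping a joint label to its (X, Y, Z) tuple.
    """
    if not 0 < s < len(frames) - 1:
        raise ValueError("frame s needs a preceding and a following frame")
    features = {}
    for group, joints in TEMPORAL_GROUPS.items():
        flat = []
        for joint in joints:
            # Concatenate the displacement vectors of all joints in the group.
            flat.extend(time_displacement(frames[s - 1][joint], frames[s + 1][joint]))
        features[group] = flat
    return features
```

With five joints per group and three coordinates per joint, each of f₁ to f₃ is a 15-element vector in this sketch.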

3. The human sitting posture behavior recognition method according to claim 2, characterized in that obtaining the spatial local feature of each frame image in the skeleton image sequence from that frame image is specifically:

First, the joints are divided into four joint groups according to body part, wherein the relative position vectors of the head, left hand, and right hand with respect to the spine constitute the fourth joint group; the relative position vectors of the head, left hand, and left foot with respect to the right hip constitute the fifth joint group; the relative position vectors of the head, right hand, and right foot with respect to the left hip constitute the sixth joint group; and the relative position vectors of the left hand and right hand with respect to the head constitute the seventh joint group;

where the result is the spatial displacement vector of the i-th joint relative to the j-th joint in the s-th frame image, with i ≠ j;

Finally, the spatial displacement vectors obtained above for the different joints of each frame image that belong to the same joint group are combined, to establish the spatial local features f₄ to f₇ of the fourth, fifth, sixth, and seventh joint groups respectively.
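A corresponding Python sketch for the four spatial joint groups follows. The relative position formula is not reproduced in the text above, so the sketch assumes it is the coordinate of a joint minus the coordinate of its reference joint within the same frame. The joint labels and function names are hypothetical.

```python
# group number -> (reference joint, joints measured relative to it).
# The pairings follow the claim; the string labels are hypothetical.
SPATIAL_GROUPS = {
    4: ("spine", ["head", "left_hand", "right_hand"]),
    5: ("right_hip", ["head", "left_hand", "left_foot"]),
    6: ("left_hip", ["head", "right_hand", "right_foot"]),
    7: ("head", ["left_hand", "right_hand"]),
}


def relative_position(joint_pt, ref_pt):
    """ASSUMPTION: relative position = joint minus reference coordinates."""
    return tuple(a - b for a, b in zip(joint_pt, ref_pt))


def spatial_features(frame):
    """Return {group number: flat feature list} for a single frame.

    `frame` maps a joint label to its (X, Y, Z) tuple.
    """
    features = {}
    for group, (ref, joints) in SPATIAL_GROUPS.items():
        flat = []
        for joint in joints:
            # Concatenate the relative position vectors of the group.
            flat.extend(relative_position(frame[joint], frame[ref]))
        features[group] = flat
    return features
```

Unlike the temporal features, these are computed from one frame only, so they are also defined for the first and last frame of a sequence.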

4. The human sitting posture behavior recognition method according to claim 3, characterized in that the feature coding algorithm is the vector of locally aggregated descriptors (VLAD) algorithm.

5. The human sitting posture behavior recognition method according to claim 4, characterized in that processing the local features to obtain the image feature descriptor is specifically:

First, the following formula is evaluated:

F_n,c=[υ_n,c,1,...,υ_n,c,i,...,υ_n,c,k]

where F_n,c denotes the vector obtained by concatenating the multiple residual vectors υ_n,c,m of the c-th clustering result corresponding to the n-th joint group; υ_n,c,k denotes, in the clustering process, the local feature set of subgroup n in cluster k for the c-th initialization; c = {1, 2, ..., C}, where C is the number of clustering results; m = {1, 2, ..., k}, where k is the number of clusters in a clustering result; μ_n,c,m denotes the cluster center corresponding to the m-th cluster in the c-th clustering result of the n-th joint group; and the following holds:

S_n,c,m={ f_n,s| m=arg min_p||f_n,s-μ_n,c,p||}

where p = {1, 2, ..., k} and p ≠ m;

Then, the vectors F_n,c corresponding to all joint groups and all clustering results in the frame image are summed to obtain the image feature descriptor.
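The encoding of one joint group under one clustering result can be sketched in Python as follows. The residual formula is not reproduced in the text above, so the sketch assumes the standard VLAD definition: each residual vector is the sum, over the features assigned to a cluster, of the feature minus the cluster center. The function names are hypothetical.

```python
def nearest_center(feature, centers):
    """Index m of the center closest to `feature` (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(feature, c)) for c in centers]
    return dists.index(min(dists))


def vlad_encode(features, centers):
    """Return F = [v_1, ..., v_k] flattened into one list.

    ASSUMPTION: v_m is the sum of (f - center_m) over the features f whose
    nearest center is center_m, as in standard VLAD.
    """
    dim = len(centers[0])
    residuals = [[0.0] * dim for _ in centers]
    for f in features:
        m = nearest_center(f, centers)
        for d in range(dim):
            residuals[m][d] += f[d] - centers[m][d]
    # Concatenate the k residual vectors in cluster order.
    return [value for v in residuals for value in v]
```

For k clusters and features of dimension d, the encoded vector has k·d elements regardless of how many frames contributed features.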

6. The human sitting posture behavior recognition method according to claim 5, characterized in that step (5) is specifically:

First, the global loss function is constructed as follows:

ε(L) = (1 - μ) ε_pull(L) + μ ε_push(L) + γ ||L^T L - I||²

where μ is the weighting ratio between the component ε_push(L) and the component ε_pull(L), γ is the regularization coefficient, I is the identity matrix, L denotes the linear transformation, and the following holds:

where the samples are the image feature descriptors of the frame images; the component ε_pull(L) is a measure that pulls the image feature descriptors of true target samples of the same category closer together; the component ε_push(L) is a measure that, while the image feature descriptors are being pulled closer, pushes away the interfering impostor samples l that are not actually of the same category; ξ is the desired separation margin between an image feature descriptor and an impostor sample l; j → i indicates that the image feature descriptor with index j and the image feature descriptor with index i are target neighbor samples of the same category; l is the sample index of an impostor sample when the sample index is i; and the accompanying indicator denotes that the two image feature descriptors concerned are not target neighbor samples of the same category;

L* = argmin_L ε(L).
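The global loss can be evaluated with the following Python sketch. The definitions of ε_pull(L) and ε_push(L) are not reproduced in the text above, so the sketch assumes the usual large-margin nearest-neighbor form: the pull term sums squared transformed distances between target neighbors, and the push term is a hinge penalty on differently labeled samples that come within the margin ξ. The squared Frobenius norm for the regularizer, the function names, and the default parameter values are also assumptions. Only the loss is computed; the stochastic gradient descent solver is not shown.

```python
def matvec(L, v):
    """Apply the linear transformation L (a list of rows) to the vector v."""
    return [sum(l * x for l, x in zip(row, v)) for row in L]


def sq_dist(L, a, b):
    """Squared Euclidean distance between a and b after applying L."""
    diff = [x - y for x, y in zip(a, b)]
    return sum(t * t for t in matvec(L, diff))


def orthogonality_penalty(L):
    """ASSUMPTION: ||L^T L - I||^2 taken as a squared Frobenius norm."""
    d = len(L[0])
    total = 0.0
    for p in range(d):
        for q in range(d):
            gram = sum(row[p] * row[q] for row in L)  # entry (p, q) of L^T L
            total += (gram - (1.0 if p == q else 0.0)) ** 2
    return total


def global_loss(L, X, y, targets, mu=0.5, gamma=0.1, xi=1.0):
    """Evaluate (1 - mu) * pull + mu * push + gamma * penalty.

    `targets` lists pairs (i, j) meaning j -> i, i.e. sample j is a
    same-category target neighbor of sample i.
    """
    pull = 0.0
    push = 0.0
    for i, j in targets:
        d_ij = sq_dist(L, X[i], X[j])
        pull += d_ij
        for l in range(len(X)):
            if y[l] != y[i]:  # only differently labeled samples can be impostors
                push += max(0.0, xi + d_ij - sq_dist(L, X[i], X[l]))
    return (1 - mu) * pull + mu * push + gamma * orthogonality_penalty(L)
```

In this sketch a sample of another category contributes to the push term only when it lies closer to sample i than the target neighbor plus the margin ξ, which is what makes it an impostor.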

7. A Kinect-based human sitting posture behavior recognition system, characterized in that it comprises:

a first module, configured to obtain a skeleton image sequence from a human posture behavior data set, to obtain the temporal local feature of each frame image in the skeleton image sequence from that frame image and its preceding and following frame images, and to obtain the spatial local feature of each frame image in the skeleton image sequence from that frame image, the temporal local features and spatial local features together forming the feature sequence f_n,s, where n ∈ [1,7] and s is an image frame in the skeleton image sequence;

a second module, configured to cluster the feature sequence of the frame image obtained by the first module using K-means, to obtain multiple clustering results;

a third module, configured to process the multiple clustering results obtained by the second module using the PCA method, to obtain multiple local features ordered from high to low importance of principal component information;

a fourth module, configured to process the local features obtained by the third module using a feature coding algorithm, to obtain an image feature descriptor;

a fifth module, configured to construct a global loss function from the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence, and to solve the global loss function using a stochastic gradient descent algorithm, to obtain the optimal linear transformation that minimizes the global loss function;

a sixth module, configured to process the image feature descriptors of the frame image and of the other frame images in the skeleton image sequence using the obtained optimal linear transformation, to obtain the similarity between the frame image and the other frame images in the skeleton image sequence;

a seventh module, configured to process the similarities between the frame image and the other frame images in the skeleton image sequence obtained by the sixth module using a nonparametric K-nearest-neighbor algorithm, to classify the behavior of all images.
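The work of the sixth and seventh modules can be sketched in Python as follows. The sketch assumes that similarity is measured as the squared Euclidean distance between descriptors after applying the learned linear transformation L, with a smaller distance meaning greater similarity, and that classification is a majority vote among the K most similar labeled descriptors. The function names and the behavior labels used in the usage note are hypothetical.

```python
from collections import Counter


def transformed_sq_dist(L, a, b):
    """Squared Euclidean distance between a and b after applying L (list of rows)."""
    diff = [x - y for x, y in zip(a, b)]
    mapped = [sum(l * d for l, d in zip(row, diff)) for row in L]
    return sum(t * t for t in mapped)


def knn_classify(L, query, train_X, train_y, k=3):
    """Label `query` by majority vote among its k nearest labeled descriptors.

    ASSUMPTION: "similarity" is the transformed distance defined above.
    """
    order = sorted(
        range(len(train_X)),
        key=lambda i: transformed_sq_dist(L, query, train_X[i]),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with L set to the identity matrix and labeled descriptors for two hypothetical behaviors such as "upright" and "leaning", `knn_classify` returns the label held by the majority of the three descriptors closest to the query.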