CN109614899A - Human motion recognition method based on Lie group features and convolutional neural networks - Google Patents

Human motion recognition method based on Lie group features and convolutional neural networks

Info

Publication number
CN109614899A
Authority
CN
China
Prior art keywords
limbs
lie group
lie
feature
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811446456.5A
Other languages
Chinese (zh)
Other versions
CN109614899B (en)
Inventor
蔡林沁
丁和恩
陆相羽
隆涛
陈思维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhifeng Technology Co.,Ltd.
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201811446456.5A
Publication of CN109614899A
Application granted
Publication of CN109614899B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a human action recognition method based on Lie group features and convolutional neural networks, belonging to the field of automatic recognition technology. The method comprises: S1: data acquisition: human skeleton information is extracted with the Microsoft somatosensory device Kinect to obtain the subject's motion information; S2: Lie group feature extraction: a Lie group skeletal representation is adopted in which rigid limb transformations model the relative three-dimensional geometric relationships between the limbs of the human body, so that a human action is modeled as a series of curves in a Lie group; then, using the correspondence between a Lie group and its Lie algebra, the curves in the Lie group space are mapped by the logarithmic map to curves in the Lie algebra space; S3: feature classification: the Lie group features are fused with a convolutional neural network, which is trained with the Lie group features so that it learns and classifies them, thereby realizing human action recognition. The present invention obtains good recognition results.

Description

Human action recognition method based on Lie group features and convolutional neural networks
Technical field
The invention belongs to the field of automatic recognition technology and relates to a human action recognition method based on Lie group features and convolutional neural networks.
Background art
With the rapid development of science and technology, more natural human-computer interaction has become an increasingly urgent need. People hope that computers can think and understand external input signals as the human brain does, and understand everyday human activities, so that communication with computers becomes easier and more natural.
Human action recognition is a practical technique that takes digital images or video streams as its object and obtains information about human actions through methods such as image processing and automatic recognition. Because of the variability of human actions, camera motion, changes in illumination, differences in body shape between people, and the different appearance of the human body under different environmental conditions, research on human action recognition has become a multidisciplinary and highly challenging technical problem.
In recent years, because of its wide application in fields such as computer vision, human-computer interaction, video surveillance, health care, and virtual reality, human action recognition has become a popular research area favored by researchers in computer vision, artificial intelligence, and related disciplines. At present, most human action recognition methods rely on manually extracted features. Such methods are mainly divided into two stages, feature detection and feature description: common feature detection methods include 3D corner detection, the Cuboid detector, and the Hessian3D matrix, while common feature descriptors include the Cuboid descriptor, histograms of oriented gradients (HOG), histograms of optical flow (HOF), and improved dense trajectories (iDT). However, manual feature extraction is time-consuming, and the quality of the extracted features depends heavily on the experience of the researcher, so methods based on manual feature extraction have gradually fallen out of favor. For this reason, many researchers have proposed performing human action recognition on color video of human motion. Such methods have achieved some success, but because color video lacks the three-dimensional spatial information of human motion, it cannot describe human motion comprehensively, and under the influence of factors such as occlusion and illumination changes it inevitably produces inaccurate or failed recognition, which is a significant limitation.
In recent years, the appearance of depth sensors, such as the Kinect produced by Microsoft and the Xtion PRO produced by ASUS, has greatly changed how human motion information is extracted. Depth sensors can acquire human motion information conveniently and efficiently. Compared with color images, depth images and skeleton information have clear advantages in describing human motion: on the one hand, depth sensor devices are easy to operate and greatly simplify the calibration process required by ordinary cameras; on the other hand, the acquired depth images directly contain the depth information of the human body, which effectively overcomes the influence of illumination changes, and depth images are more discriminative for describing geometric structure than the texture and color descriptors of color images. Human action recognition based on skeleton information has therefore attracted the interest of many researchers, and many significant results have emerged. Recently, many scholars have proposed extracting human motion features in a manifold space: the relative three-dimensional geometric relationships between different limbs of the human body describe the characteristics of an action more completely, and have advantages over features such as position changes of the joints connecting limbs or angle changes between limbs.
For action classification, deep learning methods proposed in recent years have been applied successfully in fields such as image recognition and activity recognition, attracting wide attention. For example, convolutional neural networks and deep belief networks have shown advantages in processing high-dimensional data and in feature learning, and are effective in reducing the amount of computation, lowering the complexity of the recognition process, and improving recognition accuracy.
Therefore, to overcome the shortcomings of traditional manual feature extraction and make full use of the three-dimensional spatial information in the skeleton data of human motion together with the advantages of deep learning, the present invention proposes a human action recognition method based on Lie group features and convolutional neural networks.
Summary of the invention
In view of this, the purpose of the present invention is to provide a human action recognition method based on Lie group features and convolutional neural networks. The method largely overcomes the interference that changes in the external environment, variation in human body shape, and the like cause for traditional techniques, and it remedies the inability of some action recognition methods based on the traditional Euclidean space to model and express the spatial complexity and geometric relationships of human actions. At the same time, the method better handles the problems of similarity between actions and high inter-class variability. In terms of computational cost and recognition performance, processing the features with a convolutional neural network allows them to be learned and classified well while also greatly reducing the computational cost, and the recognition accuracy is high.
In order to achieve the above purpose, the present invention provides the following technical scheme:
A human action recognition method based on Lie group features and convolutional neural networks, specifically comprising the following steps:
S1: data acquisition: human skeleton information is extracted with the Microsoft somatosensory device Kinect to obtain the subject's motion information;
S2: Lie group feature extraction: a Lie group skeletal representation is adopted in which rigid limb transformations (such as rotations and translations in three-dimensional space) model the relative three-dimensional geometric relationships between the limbs of the human body; a human action is modeled as a series of curves in a Lie group, and then, using the correspondence between a Lie group and its Lie algebra, the curves in the Lie group space are mapped by the logarithmic map to curves in the Lie algebra space;
S3: feature classification: the Lie group features are fused with a convolutional neural network; the convolutional neural network is trained with the Lie group features so that it learns and classifies them, thereby realizing human action recognition.
Further, in step S1, the acquired human skeleton information is normalized to ensure consistency of skeleton size and the like.
Further, step S2 specifically includes:
The human skeleton is denoted S = (V, E), where V = {v_1, ..., v_N} is the set of joints and E = {e_1, ..., e_M} is the set of rigid limbs, N being the number of joints and M the number of rigid limbs. Define e_{n1} ∈ R^3 and e_{n2} ∈ R^3 as the start point and end point of limb e_n. Given a pair of limbs e_m and e_n, a static human pose can be described by the relative geometric relationship between e_m and e_n; this description can be summarized as follows: in a local coordinate system, one of the limbs is rotated and translated to the same position and direction as the other limb. A complete rigid limb transformation is as follows: one limb is first rotated about an axis \vec{n} by an angle θ, stopping when it points in the same direction as the other limb, and is then translated by a distance \vec{d} so that it coincides with the other limb.
Further, in step S2, the complete rigid limb transformation process is specifically as follows:
For rigid limbs e_m and e_n, rotating and translating e_n so that it coincides with e_m yields one three-dimensional transformation relation between e_m and e_n:

T_{m,n}(t) = \begin{bmatrix} R(\vec{n}_{m,n}, \theta_{m,n}) & \vec{d}_{m,n}(t) \\ 0 & 1 \end{bmatrix} \qquad (1)

where \vec{n}_{m,n} denotes the rotation-axis vector with limb e_m as its starting point and limb e_n as its end point, \theta_{m,n} denotes the angle through which limb e_m rotates about the axis \vec{n}_{m,n} to reach the direction of limb e_n, \vec{d}_{m,n} is the distance vector by which the rotated limb e_m is translated onto limb e_n, and R(\vec{n}_{m,n}, \theta_{m,n}) is the corresponding rotation matrix;
Similarly, rotating and translating e_m so that it coincides with e_n yields the other three-dimensional transformation relation between e_m and e_n:

T_{n,m}(t) = \begin{bmatrix} R(\vec{n}_{n,m}, \theta_{n,m}) & \vec{d}_{n,m}(t) \\ 0 & 1 \end{bmatrix} \qquad (2)

where \vec{n}_{n,m}, \theta_{n,m}, and \vec{d}_{n,m} are defined analogously with the roles of e_m and e_n exchanged;
Gathering the relative 3D geometric relationships between all pairs of limbs, at a given time t a human skeleton is represented in the following form:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M-1,M}(t), T_{M,M-1}(t)) \qquad (3)

where M is the number of rigid limbs, M(M-1) is the total number of rigid limb transformations, and T_{M,M-1}(t) denotes the three-dimensional transformation relation between limb e_{M-1} and limb e_M. With the above skeletal representation, the skeleton sequence describing a human action is expressed as a curve of the following form:

{S(t), t ∈ [0, T']} \qquad (4)

where T' is the total duration;
Writing R_{i,j}(t) for the 3D rotation matrix, each transformation is expressed as

T_{i,j}(t) = \begin{bmatrix} R_{i,j}(t) & \vec{d}_{i,j}(t) \\ 0 & 1 \end{bmatrix} \in SE(3),

so that (3) becomes a curve in the product Lie group:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M,M-1}(t)) \in SE(3) \times \cdots \times SE(3) \qquad (5)

Mapping this curve in the Lie group space to the Lie algebra space by the logarithmic map gives:

C(t) = \big[\mathrm{vec}(\log(T_{1,2}(t))), \mathrm{vec}(\log(T_{2,1}(t))), \ldots, \mathrm{vec}(\log(T_{M,M-1}(t)))\big] \qquad (6)

where vec(·) denotes the vectorization that maps an element of the Lie algebra to a vector.
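For reference, the logarithmic map invoked here has a standard closed form on SE(3); the following textbook formula is added as an aid and is not spelled out in the patent:

\log\!\begin{bmatrix} R & \vec{d} \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} \hat{\omega} & V^{-1}\vec{d} \\ 0 & 0 \end{bmatrix},
\qquad
\hat{\omega} = \frac{\theta}{2\sin\theta}\,\big(R - R^{\top}\big),
\qquad
\theta = \arccos\!\left(\frac{\operatorname{tr}(R) - 1}{2}\right),

with V = I + \frac{1-\cos\theta}{\theta^{2}}\,\hat{\omega} + \frac{\theta-\sin\theta}{\theta^{3}}\,\hat{\omega}^{2}. The three independent entries of \hat{\omega} together with the three entries of V^{-1}\vec{d} are exactly the six numbers that vec(·) collects for each transformation T_{i,j}(t).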
Further, step S3 specifically includes:
S31: the curves obtained in step S2 are warped with the dynamic time warping (DTW) method, turning the problem into one with a fixed rate: before the curves are mapped from the Lie group space to the Lie algebra space, a standard curve is first computed for each action class, and all curves are then warped onto the standard curve so that their lengths are consistent.
S32: after the DTW processing, the curves are described with the Fourier temporal pyramid (FTP): the obtained Fourier coefficients are organized in a three-level temporal pyramid, and one quarter of the length of each segment is kept as its low-frequency coefficients, yielding the feature descriptor of the whole action; the advantage of this step is that it overcomes unfavorable factors such as noise and temporal misalignment, enhancing robustness;
S33: after the Lie group feature extraction is complete, the action features are fed into the convolutional neural network for training, learning, and classification; two-dimensional convolution kernels are used in the convolution process, and the output after convolution is:

y_j = R\Big(\sum_i x_i * w_{i,j} + b_j\Big) \qquad (7)

where R is the ReLU activation function, x_i is the i-th input feature map, y_j is the j-th output feature map, w_{i,j} is the weight from the i-th layer to the j-th layer, and b_j is the bias parameter.
The beneficial effects of the present invention are:
(1) The present invention uses human skeleton information to describe human motion, overcoming the shortcomings of traditional manual feature extraction and showing strong robustness to problems such as noise and rate variation. The human skeleton information extracted by devices such as Kinect largely avoids the misrecognition caused by changes in external illumination, occlusion, changes of clothing, and differences in human body shape. When the human body performs different actions, the corresponding joints and bones exhibit different positional and angular relationships, and these features describe human actions accurately and effectively.
(2) The present invention uses Lie group features to describe human actions, modeling them with the relative three-dimensional geometric relationships between body parts. At present, most action recognition methods based on skeleton information extract comparatively explicit features, such as the three-dimensional coordinates of the joints, relative changes in joint angles, and post-processed joint features such as velocity, angular velocity, orientation information, and bone angles. Although such algebraic features can describe human behavior to a certain extent, they are less comprehensive than descriptions based on three-dimensional geometric space. The present invention uses the relative geometric relationships between limbs in three-dimensional space, which gives a more detailed extraction of action features. Moreover, because the feature extraction is carried out in a manifold space, it is not limited to computing geometric relationships between connected limbs: relevant geometric relationships can also be computed between unconnected limbs. This overcomes very well the inability of some action recognition methods based on the traditional Euclidean space to model and express the spatial complexity and geometric relationships of human actions, and it better alleviates the obstacles to action recognition caused by factors such as similarity between actions and high inter-class variability.
(3) The present invention fuses Lie group features with a convolutional neural network for action recognition, in view of the structural characteristics of convolutional neural networks. On the one hand, the neurons on the same feature map share the same weights (weight sharing), so the network can learn in parallel; compared with other neural networks, this reduces the number of training parameters and makes the network structure simpler and more adaptable. On the other hand, because convolutional neural networks have a strong capacity for processing high-dimensional, complex data and accept multidimensional input, they are not only faster than some other classification algorithms such as HMMs and random forests, reducing the computational cost, but also have a clear advantage in recognition performance.
(4) Experiments on several different databases (such as Florence3D, UTKinect-Action, and MSRAction-3D) obtain good recognition results, demonstrating strong generalization ability; the method is well suited to the field of human action recognition.
Detailed description of the invention
In order to make the purpose, technical scheme, and beneficial effects of the present invention clearer, the present invention provides the following drawings for illustration:
Fig. 1 is the overall framework diagram of the method according to the invention;
Fig. 2 is a schematic diagram of the Lie group skeletal representation used by the present invention (three-dimensional geometric rotation and translation between limbs);
Fig. 3 shows the relationship between a Lie group and its Lie algebra;
Fig. 4 is the overall framework diagram of the convolutional neural network used by the present invention, together with the parameters used.
Specific embodiment
A preferred embodiment of the present invention will be described in detail below in conjunction with the accompanying drawings.
Fig. 1 shows the overall framework of the human action recognition method based on Lie group features and convolutional neural networks according to the present invention. As shown in Fig. 1, the main work of the recognition method is as follows: the skeleton information of a human motion sequence is obtained through the Microsoft somatosensory device Kinect; a Lie group skeletal representation is used in which rigid limb transformations (such as rotations and translations in three-dimensional space) model the relative three-dimensional geometric relationships between the limbs of the human body, so that a human action is modeled as a series of curves in a Lie group; then, using the correspondence between a Lie group and its Lie algebra (see Fig. 3), the curves based on the Lie group space are mapped by the logarithmic map to curves based on the Lie algebra space. Finally, the Lie group features are fused with a convolutional neural network: the network is trained with the Lie group features so that it learns and classifies them, thereby realizing human action recognition. The specific embodiment is as follows:
A human action recognition method based on Lie group features and convolutional neural networks, specifically comprising the following steps:
S1: data acquisition: human skeleton information is extracted with the Microsoft somatosensory device Kinect to obtain the subject's motion information. The acquired skeleton information is normalized to ensure consistency of skeleton size and the like; a minimal sketch of such a normalization follows.
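The patent states only that skeleton sizes are made consistent; centering at a root joint and rescaling by a reference bone are assumptions made here for illustration:

```python
import numpy as np

def normalize_skeleton(joints, root=0, ref_bone=(0, 1)):
    """Center a Kinect skeleton at a root joint and rescale it so that a
    chosen reference bone has unit length, making skeleton sizes
    consistent across subjects. `joints` is an (N, 3) array of 3D joint
    positions; `root` and `ref_bone` are joint indices."""
    joints = joints - joints[root]                 # move the root joint to the origin
    a, b = ref_bone
    scale = np.linalg.norm(joints[b] - joints[a])  # length of the reference bone
    return joints / scale
```

Applied frame by frame, this removes global position and body-size differences before the Lie group features of step S2 are computed.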
S2: Lie group feature extraction: a Lie group skeletal representation is adopted in which rigid limb transformations (such as rotations and translations in three-dimensional space) model the relative three-dimensional geometric relationships between the limbs of the human body; a human action is modeled as a series of curves in a Lie group, and then, using the correspondence between a Lie group and its Lie algebra, the curves based on the Lie group space are mapped by the logarithmic map to curves based on the Lie algebra space. As shown in Fig. 2, the principle of the three-dimensional geometric representation of human actions is as follows:
The human skeleton is denoted S = (V, E), where V = {v_1, ..., v_N} is the set of joints and E = {e_1, ..., e_M} is the set of rigid limbs, N being the number of joints and M the number of rigid limbs. Define e_{n1} ∈ R^3 and e_{n2} ∈ R^3 as the start point and end point of limb e_n. Given a pair of limbs e_m and e_n, a static human pose can be described by the relative geometric relationship between e_m and e_n; this description can be summarized as follows: in a local coordinate system, one of the limbs is rotated and translated to the same position and direction as the other limb. A complete rigid limb transformation is as follows: one limb is first rotated about an axis \vec{n} by an angle θ, stopping when it points in the same direction as the other limb, and is then translated by a distance \vec{d} so that it coincides with the other limb, as shown in Fig. 2.
For rigid limbs e_m and e_n, rotating and translating e_n so that it coincides with e_m yields one three-dimensional transformation relation between e_m and e_n:

T_{m,n}(t) = \begin{bmatrix} R(\vec{n}_{m,n}, \theta_{m,n}) & \vec{d}_{m,n}(t) \\ 0 & 1 \end{bmatrix} \qquad (1)

where \vec{n}_{m,n} denotes the rotation-axis vector with limb e_m as its starting point and limb e_n as its end point, \theta_{m,n} denotes the angle through which limb e_m rotates about the axis \vec{n}_{m,n} to reach the direction of limb e_n, \vec{d}_{m,n} is the distance vector by which the rotated limb e_m is translated onto limb e_n, and R(\vec{n}_{m,n}, \theta_{m,n}) is the corresponding rotation matrix;
Similarly, rotating and translating e_m so that it coincides with e_n yields the other three-dimensional transformation relation between e_m and e_n:

T_{n,m}(t) = \begin{bmatrix} R(\vec{n}_{n,m}, \theta_{n,m}) & \vec{d}_{n,m}(t) \\ 0 & 1 \end{bmatrix} \qquad (2)

where \vec{n}_{n,m}, \theta_{n,m}, and \vec{d}_{n,m} are defined analogously with the roles of e_m and e_n exchanged;
Gathering the relative 3D geometric relationships between all pairs of limbs, at time t a human skeleton is represented in the following form:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M-1,M}(t), T_{M,M-1}(t)) \qquad (3)

where M is the number of rigid limbs, M(M-1) is the total number of rigid limb transformations, and T_{M,M-1}(t) denotes the three-dimensional transformation relation between limb e_{M-1} and limb e_M. With the above skeletal representation, the skeleton sequence describing a human action is expressed as a curve of the following form:

{S(t), t ∈ [0, T']} \qquad (4)

where T' is the total duration;
Writing R_{i,j}(t) for the 3D rotation matrix, each transformation is expressed as

T_{i,j}(t) = \begin{bmatrix} R_{i,j}(t) & \vec{d}_{i,j}(t) \\ 0 & 1 \end{bmatrix} \in SE(3),

so that (3) becomes a curve in the product Lie group:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M,M-1}(t)) \in SE(3) \times \cdots \times SE(3) \qquad (5)

Mapping this curve in the Lie group space to the Lie algebra space by the logarithmic map gives:

C(t) = \big[\mathrm{vec}(\log(T_{1,2}(t))), \mathrm{vec}(\log(T_{2,1}(t))), \ldots, \mathrm{vec}(\log(T_{M,M-1}(t)))\big] \qquad (6)

where vec(·) denotes the vectorization that maps an element of the Lie algebra to a vector.
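The computation of equations (1)-(6) can be sketched in Python as follows; the helper names and the use of scipy.linalg.logm are illustrative assumptions, since the patent fixes the mathematics but not an implementation:

```python
import numpy as np
from scipy.linalg import logm

def rotation_between(u, v):
    """Rotation matrix turning direction u onto direction v
    (Rodrigues' formula about the axis u x v)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    axis = np.cross(u, v)
    s, c = np.linalg.norm(axis), float(np.dot(u, v))
    if s < 1e-8:                                   # parallel or antiparallel limbs
        if c > 0:
            return np.eye(3)
        p = np.array([1.0, 0, 0]) if abs(u[0]) < 0.9 else np.array([0, 1.0, 0])
        a = np.cross(u, p)
        a /= np.linalg.norm(a)
        return 2.0 * np.outer(a, a) - np.eye(3)    # rotation by pi about axis a
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]]) / s
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def relative_se3(e_m, e_n):
    """T_{m,n} of equation (1): rotate limb e_n into the direction of
    limb e_m, then translate it onto e_m; each limb is a (start, end)
    pair of 3D points."""
    R = rotation_between(e_n[1] - e_n[0], e_m[1] - e_m[0])
    d = e_m[0] - R @ e_n[0]                        # translation applied after the rotation
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, d
    return T

def lie_algebra_vec(T):
    """vec(log(T)) of equation (6): map T in SE(3) to a 6-vector of the
    Lie algebra se(3) (3 rotation + 3 translation coefficients)."""
    L = np.real(logm(T))
    omega = np.array([L[2, 1], L[0, 2], L[1, 0]])  # coefficients of the so(3) block
    return np.concatenate([omega, L[:3, 3]])
```

Concatenating lie_algebra_vec(relative_se3(e_m, e_n)) over all ordered limb pairs gives the 6·M(M-1)-dimensional sample C(t) of the Lie algebra curve for one frame.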
S3: feature classification: the Lie group features are fused with a convolutional neural network; the convolutional neural network is trained with the Lie group features so that it learns and classifies them, thereby realizing human action recognition.
For the curves obtained in step S2, warping is performed with the dynamic time warping (DTW) method, turning the problem into one with a fixed rate: for each action class, a standard curve is first computed, and all curves are then warped onto the standard curve so that their lengths are consistent. After the DTW processing, the curves are described with the Fourier temporal pyramid (FTP): the obtained Fourier coefficients are organized in a three-level temporal pyramid, and one quarter of the length of each segment is kept as its low-frequency coefficients, yielding the feature descriptor of the whole action; this step overcomes unfavorable factors such as noise and temporal misalignment and enhances robustness. A sketch of these two steps is given below.
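In the following compact Python sketch, the quadratic-time DTW recursion and the use of FFT magnitudes are common textbook choices assumed here, since the patent specifies only the alignment to a per-class standard curve and the three-level, quarter-length pyramid:

```python
import numpy as np

def dtw_align(curve, standard):
    """Warp `curve` (frames x dims) onto `standard` so that the result
    has the same length as the standard curve (classic O(n*m) DTW)."""
    n, m = len(curve), len(standard)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(curve[i - 1] - standard[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack the optimal path, keeping one source frame per standard frame
    i, j, match = n, m, {}
    while i > 0 and j > 0:
        match[j - 1] = i - 1
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return curve[[match[k] for k in range(m)]]

def ftp_descriptor(curve, levels=3):
    """Fourier temporal pyramid: split the curve into 1, 2, and 4
    temporal segments over three levels, FFT each segment along time,
    and keep the first quarter of each segment's coefficients as
    low-frequency features."""
    feats = []
    for lvl in range(levels):
        for seg in np.array_split(curve, 2 ** lvl, axis=0):
            coeffs = np.fft.fft(seg, axis=0)
            keep = max(1, len(seg) // 4)   # a quarter of the segment length
            feats.append(np.abs(coeffs[:keep]).ravel())
    return np.concatenate(feats)
```

How the per-class standard curve is computed is not fixed by the patent; a medoid of the training curves of that class is one plausible choice.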
For action recognition, after the Lie group feature extraction is complete, the action features are fed into the convolutional neural network (Fig. 4) for training, learning, and classification. Two-dimensional convolution kernels are used in the convolution process, and the output after convolution is:

y_j = R\Big(\sum_i x_i * w_{i,j} + b_j\Big) \qquad (7)

where R is the ReLU activation function, x_i is the i-th input feature map, y_j is the j-th output feature map, w_{i,j} is the weight from the i-th layer to the j-th layer, and b_j is the bias parameter. The whole convolutional neural network is divided into six layers: the first and third layers are convolutional layers C1 and C3; the second and fourth layers are max-pooling layers S2 and S4; and the fifth and sixth layers are fully connected layers FC5 and FC6. The specific parameters of each layer are shown in Fig. 4.
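A minimal PyTorch sketch of this six-layer structure follows; the kernel sizes, channel counts, input resolution, and number of action classes are placeholder assumptions, since the actual hyperparameters appear only in Fig. 4:

```python
import torch
import torch.nn as nn

class LieGroupActionCNN(nn.Module):
    """C1-S2-C3-S4-FC5-FC6 network: two convolution + max-pooling stages
    followed by two fully connected layers, with ReLU activations as in
    equation (7)."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5),   # C1: 2D convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                   # S2: max pooling
            nn.Conv2d(16, 32, kernel_size=5),  # C3
            nn.ReLU(),
            nn.MaxPool2d(2),                   # S4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128),                # FC5 (input size inferred lazily)
            nn.ReLU(),
            nn.Linear(128, num_classes),       # FC6: one score per action class
        )

    def forward(self, x):                      # x: (batch, 1, H, W) feature maps
        return self.classifier(self.features(x))

# Example: a batch of 8 single-channel 64 x 64 Lie group feature maps
logits = LieGroupActionCNN()(torch.randn(8, 1, 64, 64))
```

Arranging the Lie algebra features of step S2 as a 2D map, for example with time along one axis and feature dimension along the other (an assumption here), lets the 2D kernels of equation (7) apply directly.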
Finally, it should be noted that the above preferred embodiment is intended only to illustrate the technical scheme of the present invention and not to limit it. Although the present invention has been described in detail through the above preferred embodiment, those skilled in the art should understand that various changes in form and detail can be made to it without departing from the scope defined by the claims of the present invention.

Claims (5)

1. A human action recognition method based on Lie group features and convolutional neural networks, characterized in that the method specifically comprises the following steps:
S1: data acquisition: human skeleton information is extracted with the Microsoft somatosensory device Kinect to obtain the subject's motion information;
S2: Lie group feature extraction: a Lie group skeletal representation is adopted in which rigid limb transformations model the relative three-dimensional geometric relationships between the limbs of the human body; a human action is modeled as a series of curves in a Lie group, and then, using the correspondence between a Lie group and its Lie algebra, the curves based on the Lie group are mapped by the logarithmic map to curves based on the Lie algebra space; the rigid limb transformations comprise rotations and translations in three-dimensional space;
S3: feature classification: the Lie group features are fused with a convolutional neural network; the convolutional neural network is trained with the Lie group features so that it learns and classifies them, thereby realizing human action recognition.
2. The human action recognition method based on Lie group features and convolutional neural networks according to claim 1, characterized in that, in step S1, the acquired human skeleton information is normalized to ensure consistency of skeleton size.
3. The human action recognition method based on Lie group features and convolutional neural networks according to claim 2, characterized in that step S2 specifically comprises:
The human skeleton is denoted S = (V, E), where V = {v_1, ..., v_N} is the set of joints and E = {e_1, ..., e_M} is the set of rigid limbs, N being the number of joints and M the number of rigid limbs; e_{n1} ∈ R^3 and e_{n2} ∈ R^3 are defined as the start point and end point of limb e_n; given a pair of limbs e_m and e_n, a static human pose can be described by the relative geometric relationship between e_m and e_n, and this description can be summarized as follows: in a local coordinate system, one of the limbs is rotated and translated to the same position and direction as the other limb; a complete rigid limb transformation is as follows: one limb is first rotated about an axis \vec{n} by an angle θ, stopping when it points in the same direction as the other limb, and is then translated by a distance \vec{d} so that it coincides with the other limb.
4. The human action recognition method based on Lie group features and convolutional neural networks according to claim 3, characterized in that, in step S2, the specific process of the rigid limb transformation is:
For rigid limbs e_m and e_n, rotating and translating e_n so that it coincides with e_m yields one three-dimensional transformation relation between e_m and e_n:

T_{m,n}(t) = \begin{bmatrix} R(\vec{n}_{m,n}, \theta_{m,n}) & \vec{d}_{m,n}(t) \\ 0 & 1 \end{bmatrix} \qquad (1)

where \vec{n}_{m,n} denotes the rotation-axis vector with limb e_m as its starting point and limb e_n as its end point, \theta_{m,n} denotes the angle through which limb e_m rotates about the axis \vec{n}_{m,n} to reach the direction of limb e_n, and \vec{d}_{m,n} is the distance vector by which the rotated limb e_m is translated onto limb e_n;
Similarly, rotating and translating e_m so that it coincides with e_n yields the other three-dimensional transformation relation between e_m and e_n:

T_{n,m}(t) = \begin{bmatrix} R(\vec{n}_{n,m}, \theta_{n,m}) & \vec{d}_{n,m}(t) \\ 0 & 1 \end{bmatrix} \qquad (2)

where \vec{n}_{n,m}, \theta_{n,m}, and \vec{d}_{n,m} are defined analogously with the roles of e_m and e_n exchanged;
Gathering the relative 3D geometric relationships between all pairs of limbs, at time t a human skeleton is represented in the following form:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M-1,M}(t), T_{M,M-1}(t)) \qquad (3)

where M is the number of rigid limbs, M(M-1) is the total number of rigid limb transformations, and T_{M,M-1}(t) denotes the three-dimensional transformation relation between limb e_{M-1} and limb e_M; with the above skeletal representation, the skeleton sequence describing a human action is expressed as a curve of the following form:

{S(t), t ∈ [0, T']} \qquad (4)

where T' is the total duration;
Writing R_{i,j}(t) for the 3D rotation matrix, each transformation is expressed as T_{i,j}(t) = \begin{bmatrix} R_{i,j}(t) & \vec{d}_{i,j}(t) \\ 0 & 1 \end{bmatrix} \in SE(3), so that (3) becomes a curve in the product Lie group:

S(t) = (T_{1,2}(t), T_{2,1}(t), \ldots, T_{M,M-1}(t)) \in SE(3) \times \cdots \times SE(3) \qquad (5)

Mapping this curve in the Lie group space to the Lie algebra space by the logarithmic map gives:

C(t) = \big[\mathrm{vec}(\log(T_{1,2}(t))), \mathrm{vec}(\log(T_{2,1}(t))), \ldots, \mathrm{vec}(\log(T_{M,M-1}(t)))\big] \qquad (6)

where vec(·) denotes the vectorization that maps an element of the Lie algebra to a vector.
5. The human action recognition method based on Lie group features and convolutional neural networks according to claim 4, characterized in that step S3 specifically comprises:
S31: the curves obtained in step S2 are warped with the dynamic time warping (DTW) method, turning the problem into one with a fixed rate;
S32: after the DTW processing, the curves are described with the Fourier temporal pyramid (FTP): the obtained Fourier coefficients are organized in a three-level temporal pyramid, and one quarter of the length of each segment is kept as its low-frequency coefficients, yielding the feature descriptor of the whole action;
S33: after the Lie group feature extraction is complete, the action features are fed into the convolutional neural network for training, learning, and classification; two-dimensional convolution kernels are used in the convolution process, and the output after convolution is:

y_j = R\Big(\sum_i x_i * w_{i,j} + b_j\Big) \qquad (7)

where R is the ReLU activation function, x_i is the i-th input feature map, y_j is the j-th output feature map, w_{i,j} is the weight from the i-th layer to the j-th layer, and b_j is the bias parameter.
CN201811446456.5A 2018-11-29 2018-11-29 Human body action recognition method based on lie group features and convolutional neural network Active CN109614899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811446456.5A CN109614899B (en) 2018-11-29 2018-11-29 Human body action recognition method based on lie group features and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811446456.5A CN109614899B (en) 2018-11-29 2018-11-29 Human body action recognition method based on lie group features and convolutional neural network

Publications (2)

Publication Number Publication Date
CN109614899A true CN109614899A (en) 2019-04-12
CN109614899B CN109614899B (en) 2022-07-01

Family

ID=66004965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811446456.5A Active CN109614899B (en) 2018-11-29 2018-11-29 Human body action recognition method based on lie group features and convolutional neural network

Country Status (1)

Country Link
CN (1) CN109614899B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010129599A1 (en) * 2009-05-04 2010-11-11 Oblong Industries, Inc. Gesture-based control systems including the representation, manipulation, and exchange of data
EP3166033A1 (en) * 2015-11-05 2017-05-10 Samsung Electronics Co., Ltd. Walking assistance apparatus and method of controlling same
CN106445138A * 2016-09-21 2017-02-22 China Agricultural University Human body posture feature extracting method based on 3D joint point coordinates
CN107169415A * 2017-04-13 2017-09-15 Xidian University Human motion recognition method based on convolutional neural networks feature coding
CN107229920A * 2017-06-08 2017-10-03 Chongqing University Based on integrating, depth typical time period is regular and Activity recognition method of related amendment
CN107392131A * 2017-07-14 2017-11-24 Tianjin University A kind of action identification method based on skeleton nodal distance
CN107506333A * 2017-08-11 2017-12-22 Shenzhen Weiteshi Technology Co., Ltd. A kind of visual token algorithm based on ego-motion estimation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DAN XU et al.: "Human action recognition based on Kinect and PSO-SVM by representing 3D skeletons as points in lie group", 2016 International Conference on Audio, Language and Image Processing (ICALIP) *
RAVITEJA VEMULAPALLI et al.: "Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
杨梦铎: "Research on deep learning algorithms for visual impression" (视觉印象深度学习算法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *
许欢: "Research on Lie group machine learning models and applications" (李群机器学习模型及应用研究), China Master's Theses Full-text Database, Information Science and Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197195A * 2019-04-15 2019-09-03 Shenzhen University A kind of novel deep layer network system and method towards Activity recognition
WO2020211243A1 * 2019-04-15 2020-10-22 Shenzhen University Behavior identification method and apparatus based on deep network technology, and storage medium
CN110197195B * 2019-04-15 2022-12-23 Shenzhen University Novel deep network system and method for behavior recognition
CN110188688A * 2019-05-30 2019-08-30 NetEase (Hangzhou) Network Co., Ltd. Postural assessment method and device
CN110188688B * 2019-05-30 2021-12-14 NetEase (Hangzhou) Network Co., Ltd. Posture evaluation method and device
CN110929631A * 2019-11-19 2020-03-27 Wuhan University Scene classification method based on Lie-AdaBoost remote sensing image
CN111126485A * 2019-12-24 2020-05-08 Wuhan University Lie-KFDA scene classification method and system based on Lie group machine learning kernel function
CN111709323A * 2020-05-29 2020-09-25 Chongqing University Gesture recognition method based on lie group and long-and-short term memory network
CN111709323B * 2020-05-29 2024-02-02 Chongqing University Gesture recognition method based on Lie group and long-short-term memory network

Also Published As

Publication number Publication date
CN109614899B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN109614899A (en) A kind of human motion recognition method based on Lie group feature and convolutional neural networks
CN102999942B (en) Three-dimensional face reconstruction method
CN108830150B (en) One kind being based on 3 D human body Attitude estimation method and device
CN104376594B (en) Three-dimensional face modeling method and device
CN107679522B (en) Multi-stream LSTM-based action identification method
CN104933417B (en) A kind of Activity recognition method based on sparse space-time characteristic
CN101751689B (en) Three-dimensional facial reconstruction method
CN104008564B (en) A kind of human face expression cloning process
WO2017133009A1 (en) Method for positioning human joint using depth image of convolutional neural network
CN110287880A (en) A kind of attitude robust face identification method based on deep learning
Qu et al. A fast face recognition system based on deep learning
CN109214282A (en) A kind of three-dimension gesture critical point detection method and system neural network based
CN107423730A (en) A kind of body gait behavior active detecting identifying system and method folded based on semanteme
CN107871106A (en) Face detection method and device
CN103473801A (en) Facial expression editing method based on single camera and motion capturing data
CN110222580A (en) A kind of manpower 3 d pose estimation method and device based on three-dimensional point cloud
CN108734194A (en) A kind of human joint points recognition methods based on single depth map of Virtual reality
CN114036969B (en) 3D human body action recognition algorithm under multi-view condition
CN107479693A (en) Real-time hand recognition methods based on RGB information, storage medium, electronic equipment
CN107066979A (en) A kind of human motion recognition method based on depth information and various dimensions convolutional neural networks
CN109670401A (en) A kind of action identification method based on skeleton motion figure
CN112906520A (en) Gesture coding-based action recognition method and device
CN111259950A (en) Method for training YOLO neural network based on 3D model
Huang et al. Optimizing features quality: a normalized covariance fusion framework for skeleton action recognition
CN103218611B (en) Based on the human body motion tracking method of distributed collaboration study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221027

Address after: 101-2, F1, Building 1, No. 1, Caihefang West Street, Haidian District, Beijing 100080

Patentee after: Beijing Zhifeng Technology Co.,Ltd.

Address before: No. 2, Chongwen Road, Huangjuezhen, Nan'an District, Chongqing 400065

Patentee before: Chongqing University of Posts and Telecommunications