CN105138995A - Time-invariant and view-invariant human action identification method based on skeleton information - Google Patents

Time-invariant and view-invariant human action identification method based on skeleton information

Info

Publication number
CN105138995A
CN105138995A (application CN201510551025.5A)
Authority
CN
China
Prior art keywords
video
vector
frame
identification
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510551025.5A
Other languages
Chinese (zh)
Other versions
CN105138995B (en)
Inventor
刘智
冯欣
张�杰
杨武
张凌
张杰慧
黄智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN201510551025.5A priority Critical patent/CN105138995B/en
Publication of CN105138995A publication Critical patent/CN105138995A/en
Application granted granted Critical
Publication of CN105138995B publication Critical patent/CN105138995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03: Recognition of patterns in medical or anatomical images
    • G06V2201/033: Recognition of patterns in medical or anatomical images of skeletal patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a time-invariant and view-invariant human action identification method based on skeleton information. The method comprises the steps of: extracting human action video segments and normalizing them to the same video length; extracting, from each frame of a video segment, the information of the twenty joints that express the human action; computing, from the twenty joints of each frame, a hip-center-based feature vector HCBV, an angle feature vector AV, and a pairwise relative position feature vector PRPV; classifying the feature vectors HCBV, AV, and PRPV with support vector machine classifiers to obtain an identification probability for each action class; and fusing, by weighted summation, the per-class identification probabilities obtained from HCBV, AV, and PRPV to produce the action identification result. The method is simple and intuitive, achieves a high identification accuracy, and requires little identification time.

Description

Time-invariant and view-invariant human action recognition method based on skeleton information
Technical field
The present invention relates to action recognition methods, and in particular to a time-invariant and view-invariant human action recognition method based on skeleton information.
Background
Human action recognition plays an important role in many fields such as video surveillance, human-computer interaction, and video retrieval; it can also be applied in criminal investigation, patient care, and elderly care. In the past, most machine vision tasks were based on hand-crafted features, such as the scale-invariant feature transform (SIFT), histograms of oriented gradients (HOG), and motion history images (MHI), and many classical visual recognition methods were realized simply by piecing together existing successful techniques. Some scholars therefore consider that progress in action recognition research has been slow in recent years. The emergence of depth cameras allows researchers to rethink several problems in image processing and machine vision. Unlike RGB cameras, which capture color and texture, a depth camera records the depth information of the human body, from which geometric information and skeleton information can be obtained. Moreover, depth cameras are insensitive to changes in illumination, and therefore offer better discriminability than traditional RGB video in visual tasks such as video segmentation, object recognition, and action recognition.
Current research on action recognition focuses on finding the latent relation between action classes and skeleton information. For example, the human action recognition method based on Lie groups and 3D skeleton points, "Human action recognition by representing 3D skeletons as points in a Lie group" (see [1]), has high computational complexity and is time-consuming: extracting the features of a single video takes 6.53 seconds on average, which makes the method difficult to popularize. The view-invariant human action recognition method based on histograms of 3D joints, "View invariant human action recognition using histograms of 3D joints" (see [2]), loses the contextual information between preceding and following frames of the joints, and its recognition accuracy is low. The space-time pose representation for 3D human action recognition, "Space-time pose representation for 3D human action recognition" (see [3]), studies only postures, i.e., it takes single images as the research unit and recognizes from images; this not only places high demands on the video capture equipment but also yields information with low discriminability. Similarly, "Activity recognition for natural human robot interaction" (see [4]) studies recognition for human-robot interaction, but its recognition efficiency is low. A human posture can thus be represented by modeling the 3D geometric relations of different body parts from skeleton joint information, but existing methods have low recognition efficiency and large time overhead.
[1] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human action recognition by representing 3D skeletons as points in a Lie group," in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 2014, pp. 588-595.
[2] L. Xia, C.-C. Chen, and J. K. Aggarwal, "View invariant human action recognition using histograms of 3D joints," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, 2012, pp. 20-27.
[3] M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, and A. Del Bimbo, "Space-time pose representation for 3D human action recognition," in New Trends in Image Analysis and Processing - ICIAP 2013, Springer, 2013, pp. 456-464.
[4] A. Chrungoo, S. Manimaran, and B. Ravindran, "Activity recognition for natural human robot interaction," in Social Robotics, Springer, 2014, pp. 84-94.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a time-invariant and view-invariant human action recognition method based on skeleton information. The recognition method is simple and intuitive, has a high recognition accuracy, and requires little recognition time.
The object of the present invention is achieved through the following technical solution:
A time-invariant and view-invariant human action recognition method based on skeleton information, characterized by comprising the following steps:
1) extracting human action video segments, and normalizing video segments of different lengths to one fixed video length;
2) extracting skeleton information from the obtained fixed-length video, i.e., extracting from each frame of the video the information of the 20 joints that express the human action;
3) extracting three feature vectors from the 20-joint information of each frame, namely computing, from the 20 joints of each video frame, the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV. For the hip-center-based feature vector HCBV, the hip center joint of each frame is taken as the origin of coordinates, and three parameters are computed for every other joint of that frame: the distance d to the origin, the elevation angle φ, and the azimuth angle θ; the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over all frames of the video are concatenated to form HCBV. The angle feature vector AV is the vector formed by concatenating the angles between adjacent joints over all frames of the video. The pairwise relative position feature vector PRPV is the vector formed by concatenating the relative positions of each joint with respect to the other joints over all frames of the video;
4) classifying and recognizing the three obtained feature vectors separately: support vector machine classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class;
5) fusing the identification probabilities of the action classes: the per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, where the weight of the hip-center-based feature vector HCBV is 0.4, the weight of the angle feature vector AV is 0.3, and the weight of the pairwise relative position feature vector PRPV is 0.3.
The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot.
When extracting the human action video segments, each video length is first pre-processed: a frame interpolation regularization method is used to normalize video segments of different lengths to the same video length.
The hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and extracting, for every other joint, the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and then vectorizing the distances d, elevation angles φ, and azimuth angles θ of the other joints over all frames of the video to form the reference feature vector. If the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum.
In the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint.
The angle feature vector AV is computed by extracting, from each video frame, the angles between adjacent joints, and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector. If the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
The pairwise relative position feature vector PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints, and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector. If the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum.
In the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t. The three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
The pairwise relative position feature vector PRPV is therefore given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
Beneficial effects of the present invention: first, the collected video lengths are pre-processed, and video segments of different lengths are normalized to one fixed video length; this not only gives the feature vectors extracted from different videos the same dimension, but also preserves the main motion pattern information in the video, thereby ensuring the time-invariance of the method. Second, the information of 20 joints of the human body is extracted from each video frame; these 20 joints cover the major joints that describe a human action and are therefore sufficient to express its characteristics. From the 20 joints of each frame, three feature vectors are computed: the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV. Angle and relative position information are extracted from the skeleton information to form these three different feature vectors: HCBV combines the elevation and azimuth information of each joint, AV considers the angle information between all adjacent joints, and PRPV considers the relative position information of all joints, which gives the method its view-invariance. Then, support vector machine classifiers are used to classify the three feature vectors separately, yielding the identification probability of each action class. Finally, the per-class identification probabilities are fused by weighted summation to obtain the recognition result. The method is easy to compute and takes little time. On the UTKinect-Action3D data set, it achieves recognition performance consistent with current methods. Because it uses only the skeleton information of the human body to extract features from a video, the method is simpler and more intuitive, its recognition time is short, its recognition accuracy is high, and its real-time performance is improved; and since the extracted features are time-invariant and view-invariant, the method remains robust when applied to other data sets.
The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot. These 20 joints are the major joints that express a human action; they express the action most strongly and most clearly, which makes the recognition more efficient.
A support vector machine classifier for the reference feature vector classifies the hip-center-based feature vector HCBV, a support vector machine classifier for the angle feature vector classifies the angle feature vector AV, and a support vector machine classifier for the pairwise relative position feature vector classifies the pairwise relative position feature vector PRPV. Using three separate classifiers, i.e., classifying first and then fusing, gives a better result than fusing the vectors first and then classifying.
In the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by the height factor λ to give the normalized distance D, which reduces the influence of subjects of different heights on the feature vector.
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1]; this standardizes the data and improves the recognition accuracy.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hip-center-based feature vector of the present invention;
Fig. 2 is a schematic diagram of the angle feature vector of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, a time-invariant and view-invariant human action recognition method based on skeleton information proceeds as follows.
A depth camera is used to capture depth video. Compared with traditional RGB video, depth video does not change with illumination, and therefore offers better discriminability in visual tasks such as video segmentation and action recognition. The frame rate of the depth video is 30 frames per second.
Step 1: extract human action video segments and normalize video segments of different lengths to one fixed video length. Human action videos are extracted from the captured depth video, and the video lengths are first pre-processed: a frame interpolation regularization method is used to normalize video segments of different lengths to the same (fixed) video length. Here the video length refers to the number of frames a video segment contains. The frame interpolation regularization method adjusts video segments with different frame counts to a unified segment with the same frame count, this frame count being the median of the frame counts of all video segments; the regularization is realized through frame interpolation. There is no requirement on the length of a video segment as long as it contains a complete action, generally at least twenty frames. For example, to adjust a video segment with a length of 10 seconds to a length of 15 seconds, frame i of the adjusted video is taken from frame ⌈10·i/15⌉ of the original video, where ⌈ ⌉ denotes rounding up. In general the video lengths within one data set do not differ too much. Adjusting video segments with different frame counts to the same frame count not only gives the feature vectors extracted from different videos the same dimension, but also preserves the main motion pattern information in the video, thereby ensuring the time-invariance of the method. This step is an important processing step before human action recognition.
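Not part of the original disclosure: a minimal Python sketch of this frame-count regularization, assuming skeleton data stored as a NumPy array of shape (num_frames, 20, 3). The helper name regularize_frame_count is an illustrative assumption; the nearest-frame mapping follows the ⌈len·i/tNum⌉ example above.

```python
import numpy as np

def regularize_frame_count(frames: np.ndarray, t_num: int) -> np.ndarray:
    """Resample a (num_frames, 20, 3) joint sequence to exactly t_num frames.

    Target frame i (1-based) is taken from source frame ceil(num_frames * i / t_num),
    mirroring the worked example in the text.
    """
    num_frames = frames.shape[0]
    src = np.ceil(np.arange(1, t_num + 1) * num_frames / t_num).astype(int) - 1  # 0-based
    src = np.clip(src, 0, num_frames - 1)
    return frames[src]
```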
Step 2: extract skeleton information from the obtained fixed-length video, i.e., extract from each frame of the video segment the information of the 20 joints that express the human action; the 20-joint information consists of the x, y, z coordinates of each joint. The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot. The present invention only needs to extract the skeleton information of these 20 joints. Compared with existing methods that extract features from all pixels of a frame, the method here uses only the skeleton information of the human body to extract features from the depth video, and is therefore simpler, more efficient, and more suitable for real-time use.
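Not part of the original disclosure: a small sketch of the assumed per-video data layout used by the later snippets. The joint names follow the list above; the ordering, the hypothetical constants HIP_CENTER and SPINE, and the random placeholder video are illustration-only assumptions.

```python
import numpy as np

JOINTS = ["hip_center", "spine", "shoulder_center", "head",
          "shoulder_left", "elbow_left", "wrist_left", "hand_left",
          "shoulder_right", "elbow_right", "wrist_right", "hand_right",
          "hip_left", "knee_left", "ankle_left", "foot_left",
          "hip_right", "knee_right", "ankle_right", "foot_right"]

HIP_CENTER = JOINTS.index("hip_center")   # origin joint for HCBV
SPINE = JOINTS.index("spine")             # used for the height factor

def random_skeleton_video(t_num: int = 30) -> np.ndarray:
    """Placeholder video of shape (t_num, 20, 3): per-frame (x, y, z) of the 20 joints."""
    return np.random.rand(t_num, len(JOINTS), 3)
```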
Step 3: extract three feature vectors from the 20-joint information of each frame, i.e., compute from the 20 joints of each frame of the video segment the hip-center-based feature vector HCBV (Hip Center Based Vector), the angle feature vector AV (Angle Vector), and the pairwise relative position feature vector PRPV (Pairwise Relative Position Vector).
The hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and computing, for every other joint of that frame, three parameters: the distance d to the origin, the elevation angle φ, and the azimuth angle θ. These three parameters are computed for all frames from the three-dimensional coordinates of each joint, and the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over the whole video are then vectorized to form the reference feature vector. Compared with the other joints, the hip center joint has the smallest range of motion, so the HCBV computation takes the hip center joint as the origin of a 3D rectangular coordinate system and computes d, φ, and θ for every joint other than the hip center in each frame of the depth video. The distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint. The normalized distances D of the other joints to the hip center joint are used to form the feature vector, which reduces the influence of subjects of different heights on the feature vector.
Each video frame yields 3 × 19 parameters, so if the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum. Let D_i_j denote the normalized distance of joint j in frame i, φ_i_j its elevation angle, and θ_i_j its azimuth angle; the vector is then D_1_1 φ_1_1 θ_1_1, D_1_2 φ_1_2 θ_1_2, D_1_3 φ_1_3 θ_1_3, …, D_1_19 φ_1_19 θ_1_19, D_2_1 φ_2_1 θ_2_1, D_2_2 φ_2_2 θ_2_2, …, D_2_19 φ_2_19 θ_2_19, and so on for the remaining frames. The distance D, elevation angle φ, and azimuth angle θ of each joint in a frame are concatenated, and these three parameters of all frames are then concatenated (vectorized) to obtain the hip-center-based feature vector HCBV.
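Not part of the original disclosure: a minimal sketch of the HCBV computation under the layout above. The choice of y as the vertical axis for the elevation angle and of the x-z plane for the azimuth is an assumption; the text only names the three parameters d, φ, θ and the height factor λ.

```python
import numpy as np

def hcbv(frames: np.ndarray, hip_center: int = 0, spine: int = 1) -> np.ndarray:
    """Hip-center-based feature vector of length 3 * 19 * t_num."""
    feats = []
    for frame in frames:                                  # frame: (20, 3)
        origin = frame[hip_center]
        lam = 1.0 / (np.linalg.norm(frame[spine] - origin) + 1e-8)  # height factor lambda
        for j in range(frame.shape[0]):
            if j == hip_center:
                continue
            v = frame[j] - origin
            r = np.linalg.norm(v) + 1e-8
            d = lam * r                                   # normalized distance D = lambda * d
            phi = np.arcsin(v[1] / r)                     # elevation angle (y assumed vertical)
            theta = np.arctan2(v[2], v[0])                # azimuth angle in the x-z plane
            feats.extend([d, phi, theta])
    return np.asarray(feats)
```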
The angle feature vector AV is the vector formed by concatenating, over all frames of the video, the angles between adjacent joints. First the angles of all adjacent joints in the skeleton structure are determined; the angle of an adjacent joint pair is computed from the three-dimensional coordinates of the adjacent joints. The angle feature vector AV is intended to capture the overall flexion information of the human body. It is computed by extracting, from each video frame, the angles between adjacent joints and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector. Each video frame yields 19 angle parameters, so if the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
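Not part of the original disclosure: a sketch of one plausible reading of the angle feature vector AV, since the text only states that 19 adjacent-joint angles are taken per frame. Here each of the 19 bones of the assumed 20-joint tree is measured against its parent bone (root bones against a vertical reference), which yields exactly 19 angles per frame; the EDGES table matches the joint ordering sketched earlier.

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 3),                 # hip_center - spine - shoulder_center - head
         (2, 4), (4, 5), (5, 6), (6, 7),         # left arm
         (2, 8), (8, 9), (9, 10), (10, 11),      # right arm
         (0, 12), (12, 13), (13, 14), (14, 15),  # left leg
         (0, 16), (16, 17), (17, 18), (18, 19)]  # right leg

def _angle(u: np.ndarray, v: np.ndarray) -> float:
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def av(frames: np.ndarray) -> np.ndarray:
    """Angle feature vector of length 19 * t_num."""
    bone_of_child = {child: (parent, child) for parent, child in EDGES}
    feats = []
    for frame in frames:
        for parent, child in EDGES:
            bone = frame[child] - frame[parent]
            if parent in bone_of_child:                  # compare with the parent bone
                gp, p = bone_of_child[parent]
                ref = frame[p] - frame[gp]
            else:                                        # root bones: compare with vertical
                ref = np.array([0.0, 1.0, 0.0])
            feats.append(_angle(bone, ref))
    return np.asarray(feats)
```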
The pairwise relative position feature vector PRPV is the vector formed by concatenating, over all frames of the video, the relative positions of each joint with respect to the other joints. First the relative position of each joint with respect to every other joint is extracted; these relative positions are computed from the three-dimensional coordinates of the joints. The PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector.
In the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t. The three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
The 20 joints of the human skeleton give 19 × 20 relative position parameters per video frame, so if the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum. The pairwise relative position feature vector PRPV is therefore given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
Because different people have different heights, the distances between joints also differ. To eliminate this influence, all inter-joint distances are standardized with the height factor λ, which equals the reciprocal of the distance between the hip center joint and the spine joint; the standardized pairwise relative position is given by formula (5):
P = p_t^ij × λ    (5)
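Not part of the original disclosure: a minimal sketch of the PRPV computation combining formulas (2) to (5): for every frame and every ordered joint pair (i, j), the offset p_t^ij = p_t^i − p_t^j is scaled by the height factor λ and appended. Each pair contributes a 3-D offset, so the concatenated vector has 3 × 19 × 20 entries per frame (the text counts the 19 × 20 pairs).

```python
import numpy as np

def prpv(frames: np.ndarray, hip_center: int = 0, spine: int = 1) -> np.ndarray:
    """Pairwise relative position feature vector (height-normalized, formula (5))."""
    feats = []
    for frame in frames:                                  # frame: (20, 3)
        lam = 1.0 / (np.linalg.norm(frame[spine] - frame[hip_center]) + 1e-8)
        n = frame.shape[0]
        for i in range(n):
            for j in range(n):
                if i != j:
                    feats.extend(lam * (frame[i] - frame[j]))   # P = lambda * p_t^ij
    return np.asarray(feats)
```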
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
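Not part of the original disclosure: a small sketch of the min-max step, assuming it is applied per coordinate channel over all joints and frames of one video.

```python
import numpy as np

def min_max_normalize(frames: np.ndarray) -> np.ndarray:
    """Rescale the x, y, z channels of a (t_num, 20, 3) video to the [0, 1] range."""
    lo = frames.min(axis=(0, 1), keepdims=True)
    hi = frames.max(axis=(0, 1), keepdims=True)
    return (frames - lo) / (hi - lo + 1e-8)
```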
Step 4: classify and recognize the three obtained feature vectors separately. Three support vector machine (SVM) classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class. In this embodiment the support vector machine classifier is a LIBLINEAR classifier, and the classification directly uses the source code and method provided in its reference documentation. The classification categories are exactly the action classes contained in the data set, so the algorithm is applicable to data sets with any set of classes. Each feature vector yields the probability that the video belongs to a given action, so combining the three feature vectors improves the recognition performance.
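Not part of the original disclosure: a sketch of the per-feature classification step. scikit-learn's SVC with probability estimates is used here as a stand-in for the LIBLINEAR classifier named in the text; the dictionary keys and function names are illustration-only assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_per_feature_classifiers(train_features: dict, labels: np.ndarray) -> dict:
    """train_features: {"HCBV": (n_videos, d1), "AV": (n_videos, d2), "PRPV": (n_videos, d3)}."""
    return {name: SVC(kernel="linear", probability=True).fit(X, labels)
            for name, X in train_features.items()}

def per_feature_probabilities(classifiers: dict, test_features: dict) -> dict:
    """Per-class probability matrix (n_videos, n_classes) for each feature type."""
    return {name: clf.predict_proba(test_features[name])
            for name, clf in classifiers.items()}
```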
Step 5: fuse the identification probabilities of the action classes. The per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, where the weight of HCBV is 0.4, the weight of AV is 0.3, and the weight of PRPV is 0.3. Integrating the classification results of the three feature vectors of the same video amounts to taking a weighted sum of their prediction probabilities for each action; after summation the prediction probability of each action is obtained, and the action with the largest probability is the recognized action. This makes the fusion of the classification results very simple and improves the computational efficiency. The weights (0.4 for HCBV, 0.3 for AV, and 0.3 for PRPV) were obtained from extensive experiments and experience.
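Not part of the original disclosure: a minimal sketch of the weighted-sum fusion with the weights stated in the text (0.4 for HCBV, 0.3 for AV, 0.3 for PRPV); the class with the largest fused probability is taken as the recognized action.

```python
import numpy as np

WEIGHTS = {"HCBV": 0.4, "AV": 0.3, "PRPV": 0.3}

def fuse_and_predict(probs: dict) -> np.ndarray:
    """probs: per-feature (n_videos, n_classes) probability arrays with matching class order."""
    fused = sum(WEIGHTS[name] * p for name, p in probs.items())
    return np.argmax(fused, axis=1)                      # index of the predicted action class
```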
Experimental results and analysis
A. Data set and pre-processing
The experiments were run on a host with an Intel(R) Core(TM) i5-4200M CPU at 2.50 GHz and 4 GB of memory, and the method of the invention was evaluated on the UTKinect-Action3D data set. UTKinect-Action3D is a data set for finding the latent relation between action classes and skeleton information. It was collected with a stationary Kinect camera: ten different subjects performed ten different actions (see Table I), and each subject performed each action twice. After removing one invalid video, the whole data set contains 199 valid videos, and each video provides the three-dimensional coordinates of the 20 joints. For convenience, 200 video sequences are used in the experiments here: the missing second recording of the carry action of the tenth subject is supplemented with frames 1242 to 1300 of the raw data, where the raw data refers to the original unsegmented long video. The UTKinect-Action3D data set contains video sequences shot from several different viewpoints and has high intra-class variability, which makes it very challenging; within the same action class the variability can be large, for example the waving actions of different people differ considerably. For pre-processing, each video was processed simply: first, the frame interpolation regularization method was used to normalize all video lengths in the data set to one unified video length, namely the median of all video lengths; second, the min-max method was used to normalize the x, y, and z coordinate values of all videos to the range [0, 1].
B. Performance evaluation
For the experimental evaluation on the UTKinect-Action3D data set, a cross-subject setup is used: the actions of five subjects, whose sequences are denoted {1, 3, 5, 7, 9}, are used for training, and the actions of the other five subjects, denoted {2, 4, 6, 8, 10}, are used for testing. Table I gives the recognition accuracy of every action class. As can be seen from Table I, the average recognition accuracy over the actions is 95%. UTKinect-Action3D is a challenging multi-view data set in which the lengths of the videos also differ considerably, so the high recognition rate indicates the view-invariance and time-invariance of the present method. Table I also shows that the recognition rates of the carry, throw, and push actions are relatively low. The carry actions of subject 9 and subject 10 are misrecognized as throw and push respectively, because these two videos contain so few frames that the information they provide is insufficient for classification; the selected video frame count should therefore cover the complete action, generally at least twenty frames.
Table I. Recognition accuracy of each action on the UTKinect-Action3D data set (mean: 95%).

Action        walk   sit down   stand up   pick up   carry
Accuracy (%)  100    100        100        100       80

Action        throw  push   pull   wave hands   clap hands
Accuracy (%)  80     90     100    100          100
Table II compares the recognition performance of the present method with existing action recognition methods on the UTKinect-Action3D data set. The proposed method achieves a classification accuracy of 95%, and the recognition accuracies of the other action recognition methods are all lower. At the same time, the present method takes 0.18 seconds on average to extract the features of a single video, whereas the human action recognition method based on Lie groups and 3D skeleton points requires 6.53 seconds on average; the present method is therefore simpler and more intuitive, and also more efficient in time overhead.
Table II. Comparison of the recognition performance of the present method with existing methods on the UTKinect-Action3D data set.

Method                                  Accuracy
Xia et al. (2012), reference [2]        90.92%
Devanne et al. (2013), reference [3]    91.5%
Chrungoo et al. (2014), reference [4]   91.96%
Proposed                                95%
The present invention proposes an intuitive, simple, and effective human action recognition method based on the skeleton information of depth video. The method forms three different feature vectors, HCBV, AV, and PRPV, by extracting the inter-joint angle information and relative position information from the depth video. By fusing the classification results of the three feature vectors HCBV, AV, and PRPV, the method achieves good recognition results on the UTKinect-Action3D data set; it is simpler and more intuitive, and its time overhead is smaller. At the same time, the extracted features are time-invariant and view-invariant, so the method remains robust when applied to other data sets.

Claims (9)

1. A time-invariant and view-invariant human action recognition method based on skeleton information, characterized by comprising the following steps:
1) extracting human action video segments, and normalizing video segments of different lengths to one fixed video length;
2) extracting skeleton information from the obtained fixed-length video, i.e., extracting from each frame of the video the information of the 20 joints that express the human action;
3) extracting three feature vectors from the 20-joint information of each frame, namely computing, from the 20 joints of each video frame, the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV, wherein for the hip-center-based feature vector HCBV, the hip center joint of each frame is taken as the origin of coordinates, three parameters are computed for every other joint of that frame, namely the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over all frames of the video are concatenated to form HCBV; the angle feature vector AV is the vector formed by concatenating the angles between adjacent joints over all frames of the video; and the pairwise relative position feature vector PRPV is the vector formed by concatenating the relative positions of each joint with respect to the other joints over all frames of the video;
4) classifying and recognizing the three obtained feature vectors separately: support vector machine classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class;
5) fusing the identification probabilities of the action classes: the per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, wherein the weight of the hip-center-based feature vector HCBV is 0.4, the weight of the angle feature vector AV is 0.3, and the weight of the pairwise relative position feature vector PRPV is 0.3.
2. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot.
3. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that when extracting the human action video segments, each video length is first pre-processed, and a frame interpolation regularization method is used to normalize video segments of different lengths to the same video length.
4. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and extracting, for every other joint, the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and then vectorizing the distances d, elevation angles φ, and azimuth angles θ of the other joints over all frames of the video to form the reference feature vector; and if the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum.
5. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1 or 4, characterized in that in the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint.
6. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the angle feature vector AV is computed by extracting, from each video frame, the angles between adjacent joints, and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector; and if the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
7. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the pairwise relative position feature vector PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints, and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector; and if the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum.
8. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1 or 7, characterized in that in the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t, and the three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
so that the pairwise relative position feature vector PRPV is given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
9. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
CN201510551025.5A 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information Active CN105138995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510551025.5A CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510551025.5A CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Publications (2)

Publication Number Publication Date
CN105138995A true CN105138995A (en) 2015-12-09
CN105138995B CN105138995B (en) 2019-06-25

Family

ID=54724339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510551025.5A Active CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Country Status (1)

Country Link
CN (1) CN105138995B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295544A (en) * 2016-08-04 2017-01-04 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN107392131A (en) * 2017-07-14 2017-11-24 天津大学 A kind of action identification method based on skeleton nodal distance
CN107451524A (en) * 2016-06-01 2017-12-08 丰田自动车株式会社 Activity recognition device, learning device, Activity recognition method, learning method and computer-readable recording medium
CN108446583A (en) * 2018-01-26 2018-08-24 西安电子科技大学昆山创新研究院 Human bodys' response method based on Attitude estimation
CN111860086A (en) * 2019-06-26 2020-10-30 广州凡拓数字创意科技股份有限公司 Gesture recognition method, device and system based on deep neural network
EP3809321A1 (en) * 2019-10-15 2021-04-21 Fujitsu Limited Action recognition method and apparatus and electronic equipment
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN114783059A (en) * 2022-04-20 2022-07-22 浙江东昊信息工程有限公司 Temple incense and worship participation management method and system based on depth camera

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
JIANG WANG ET AL.: "Learning actionlet ensemble for 3D human action recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
JIANG WANG ET AL.: "Mining Actionlet Ensemble for Action Recognition with Depth Cameras", 《2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JING DU ET AL.: "3D action recognition based on limb angle model", 《INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014 4TH IEEE INTERNATIONAL CONFERENCE ON》 *
RAVITEJA VEMULAPALLI ET AL.: "Human action recognition by representing 3D skeletons as points in a Lie group", 《2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
YU JING ET AL.: "Dynamic gesture recognition algorithm based on depth information", 《Journal of Shandong University (Engineering Science)》 *
LI YANG: "Activity recognition method based on wireless sensor networks and probabilistic fusion", 《China Master's Theses Full-text Database, Information Science and Technology》 *
DU JING: "Action recognition based on 3D models", 《China Master's Theses Full-text Database, Information Science and Technology》 *
TIAN GUOHUI ET AL.: "A new human action recognition method based on joint point information", 《Robot》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451524A (en) * 2016-06-01 2017-12-08 丰田自动车株式会社 Activity recognition device, learning device, Activity recognition method, learning method and computer-readable recording medium
CN107451524B (en) * 2016-06-01 2020-07-07 丰田自动车株式会社 Behavior recognition device, learning device, behavior recognition method, learning method, and computer-readable recording medium
CN106295544A (en) * 2016-08-04 2017-01-04 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN106295544B (en) * 2016-08-04 2019-05-28 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN107392131A (en) * 2017-07-14 2017-11-24 天津大学 A kind of action identification method based on skeleton nodal distance
CN108446583A (en) * 2018-01-26 2018-08-24 西安电子科技大学昆山创新研究院 Human bodys' response method based on Attitude estimation
CN111860086A (en) * 2019-06-26 2020-10-30 广州凡拓数字创意科技股份有限公司 Gesture recognition method, device and system based on deep neural network
EP3809321A1 (en) * 2019-10-15 2021-04-21 Fujitsu Limited Action recognition method and apparatus and electronic equipment
US11423699B2 (en) 2019-10-15 2022-08-23 Fujitsu Limited Action recognition method and apparatus and electronic equipment
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN114783059A (en) * 2022-04-20 2022-07-22 浙江东昊信息工程有限公司 Temple incense and worship participation management method and system based on depth camera

Also Published As

Publication number Publication date
CN105138995B (en) 2019-06-25

Similar Documents

Publication Publication Date Title
Jalal et al. Human daily activity recognition with joints plus body features representation using Kinect sensor
CN105138995A (en) Time-invariant and view-invariant human action identification method based on skeleton information
Kamal et al. A hybrid feature extraction approach for human detection, tracking and activity recognition using depth sensors
Zeng et al. Silhouette-based gait recognition via deterministic learning
Zhang et al. Active energy image plus 2DLPP for gait recognition
Kusakunniran et al. Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model
Yao et al. Robust gait recognition using hybrid descriptors based on skeleton gait energy image
CN103942577A (en) Identity identification method based on self-established sample library and composite characters in video monitoring
Liu et al. Gait recognition based on outermost contour
CN105975932B (en) Gait Recognition classification method based on time series shapelet
Shirke et al. Literature review: Model free human gait recognition
Arivazhagan et al. Human action recognition from RGB-D data using complete local binary pattern
Do et al. Real-time and robust multiple-view gender classification using gait features in video surveillance
Liu et al. Survey of gait recognition
Lima et al. Simple and efficient pose-based gait recognition method for challenging environments
Khan et al. Person identification using spatiotemporal motion characteristics
Liu et al. Gender recognition using dynamic gait energy image
Mogan et al. Gait recognition using temporal gradient patterns
CN103390150B (en) human body part detection method and device
More et al. Gait-based human recognition using partial wavelet coherence and phase features
Fan et al. Human gait recognition based on discrete cosine transform and linear discriminant analysis
Singh et al. Bayesian gait-based gender identification (BGGI) network on individuals wearing loosely fitted clothing
Ti et al. GenReGait: Gender Recognition using Gait Features
Akhter et al. Deep Skeleton Modeling and Hybrid Hand-crafted Cues over Physical Exercises
Huynh et al. Robust classification of human actions from 3D data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant