CN105138995A - Time-invariant and view-invariant human action identification method based on skeleton information - Google Patents

Time-invariant and view-invariant human action identification method based on skeleton information

Info

Publication number
CN105138995A
CN105138995A (application CN201510551025.5A)
Authority
CN
China
Prior art keywords
video
vector
frame
identification
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510551025.5A
Other languages
Chinese (zh)
Other versions
CN105138995B (en)
Inventor
刘智
冯欣
张�杰
杨武
张凌
张杰慧
黄智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN201510551025.5A priority Critical patent/CN105138995B/en
Publication of CN105138995A publication Critical patent/CN105138995A/en
Application granted granted Critical
Publication of CN105138995B publication Critical patent/CN105138995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03: Recognition of patterns in medical or anatomical images
    • G06V2201/033: Recognition of patterns in medical or anatomical images of skeletal patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a time-invariant and view-invariant human action identification method based on skeleton information. The method comprises the steps of: extracting human action video segments and normalizing them to the same video length; extracting, from each frame of a video segment, the information of the twenty joints that express the human action; computing, from the twenty joints of each frame, a hip-center-based feature vector HCBV, an angle feature vector AV, and a pairwise relative position feature vector PRPV; classifying the feature vectors HCBV, AV, and PRPV with support vector machine classifiers to obtain an identification probability for each action class; and fusing, by weighted summation, the per-class identification probabilities obtained from HCBV, AV, and PRPV to produce the action identification result. The method is simple and intuitive, achieves a high identification accuracy, and requires little identification time.

Description

Time-invariant and view-invariant human action recognition method based on skeleton information
Technical field
The present invention relates to action recognition methods, and in particular to a time-invariant and view-invariant human action recognition method based on skeleton information.
Background
Human action recognition plays an important role in many fields such as video surveillance, human-computer interaction, and video retrieval; it can also be applied in criminal investigation, patient care, and elderly care. In the past, most machine vision tasks were based on hand-crafted features, such as the scale-invariant feature transform (SIFT), histograms of oriented gradients (HOG), and motion history images (MHI), and many classical visual recognition methods were realized simply by piecing together existing successful techniques. Some scholars therefore consider that progress in action recognition research has been slow in recent years. The emergence of depth cameras allows researchers to rethink several problems in image processing and machine vision. Unlike RGB cameras, which capture color and texture, a depth camera records the depth information of the human body, from which geometric information and skeleton information can be obtained. Moreover, depth cameras are insensitive to changes in illumination, and therefore offer better discriminability than traditional RGB video in visual tasks such as video segmentation, object recognition, and action recognition.
Current research on action recognition focuses on finding the latent relation between action classes and skeleton information. For example, the human action recognition method based on Lie groups and 3D skeleton points, "Human action recognition by representing 3D skeletons as points in a Lie group" (see [1]), has high computational complexity and is time-consuming: extracting the features of a single video takes 6.53 seconds on average, which makes the method difficult to popularize. The view-invariant human action recognition method based on histograms of 3D joints, "View invariant human action recognition using histograms of 3D joints" (see [2]), loses the contextual information between preceding and following frames of the joints, and its recognition accuracy is low. The space-time pose representation for 3D human action recognition, "Space-time pose representation for 3D human action recognition" (see [3]), studies only postures, i.e., it takes single images as the research unit and recognizes from images; this not only places high demands on the video capture equipment but also yields information with low discriminability. Similarly, "Activity recognition for natural human robot interaction" (see [4]) studies recognition for human-robot interaction, but its recognition efficiency is low. A human posture can thus be represented by modeling the 3D geometric relations of different body parts from skeleton joint information, but existing methods have low recognition efficiency and large time overhead.
[1] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human action recognition by representing 3D skeletons as points in a Lie group," in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 2014, pp. 588-595.
[2] L. Xia, C.-C. Chen, and J. K. Aggarwal, "View invariant human action recognition using histograms of 3D joints," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, 2012, pp. 20-27.
[3] M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, and A. Del Bimbo, "Space-time pose representation for 3D human action recognition," in New Trends in Image Analysis and Processing - ICIAP 2013, Springer, 2013, pp. 456-464.
[4] A. Chrungoo, S. Manimaran, and B. Ravindran, "Activity recognition for natural human robot interaction," in Social Robotics, Springer, 2014, pp. 84-94.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a time-invariant and view-invariant human action recognition method based on skeleton information. The recognition method is simple and intuitive, has a high recognition accuracy, and requires little recognition time.
The object of the present invention is achieved through the following technical solution:
A time-invariant and view-invariant human action recognition method based on skeleton information, characterized by comprising the following steps:
1) extracting human action video segments, and normalizing video segments of different lengths to one fixed video length;
2) extracting skeleton information from the obtained fixed-length video, i.e., extracting from each frame of the video the information of the 20 joints that express the human action;
3) extracting three feature vectors from the 20-joint information of each frame, namely computing, from the 20 joints of each video frame, the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV. For the hip-center-based feature vector HCBV, the hip center joint of each frame is taken as the origin of coordinates, and three parameters are computed for every other joint of that frame: the distance d to the origin, the elevation angle φ, and the azimuth angle θ; the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over all frames of the video are concatenated to form HCBV. The angle feature vector AV is the vector formed by concatenating the angles between adjacent joints over all frames of the video. The pairwise relative position feature vector PRPV is the vector formed by concatenating the relative positions of each joint with respect to the other joints over all frames of the video;
4) classifying and recognizing the three obtained feature vectors separately: support vector machine classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class;
5) fusing the identification probabilities of the action classes: the per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, where the weight of the hip-center-based feature vector HCBV is 0.4, the weight of the angle feature vector AV is 0.3, and the weight of the pairwise relative position feature vector PRPV is 0.3.
The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot.
When extracting the human action video segments, each video length is first pre-processed: a frame interpolation regularization method is used to normalize video segments of different lengths to the same video length.
The hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and extracting, for every other joint, the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and then vectorizing the distances d, elevation angles φ, and azimuth angles θ of the other joints over all frames of the video to form the reference feature vector. If the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum.
In the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint.
The angle feature vector AV is computed by extracting, from each video frame, the angles between adjacent joints, and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector. If the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
The pairwise relative position feature vector PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints, and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector. If the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum.
In the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t. The three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
The pairwise relative position feature vector PRPV is therefore given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
Beneficial effects of the present invention: first, the collected video lengths are pre-processed, and video segments of different lengths are normalized to one fixed video length; this not only gives the feature vectors extracted from different videos the same dimension, but also preserves the main motion pattern information in the video, thereby ensuring the time-invariance of the method. Second, the information of 20 joints of the human body is extracted from each video frame; these 20 joints cover the major joints that describe a human action and are therefore sufficient to express its characteristics. From the 20 joints of each frame, three feature vectors are computed: the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV. Angle and relative position information are extracted from the skeleton information to form these three different feature vectors: HCBV combines the elevation and azimuth information of each joint, AV considers the angle information between all adjacent joints, and PRPV considers the relative position information of all joints, which gives the method its view-invariance. Then, support vector machine classifiers are used to classify the three feature vectors separately, yielding the identification probability of each action class. Finally, the per-class identification probabilities are fused by weighted summation to obtain the recognition result. The method is easy to compute and takes little time. On the UTKinect-Action3D data set, it achieves recognition performance consistent with current methods. Because it uses only the skeleton information of the human body to extract features from a video, the method is simpler and more intuitive, its recognition time is short, its recognition accuracy is high, and its real-time performance is improved; and since the extracted features are time-invariant and view-invariant, the method remains robust when applied to other data sets.
The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot. These 20 joints are the major joints that express a human action; they express the action most strongly and most clearly, which makes the recognition more efficient.
A support vector machine classifier for the reference feature vector classifies the hip-center-based feature vector HCBV, a support vector machine classifier for the angle feature vector classifies the angle feature vector AV, and a support vector machine classifier for the pairwise relative position feature vector classifies the pairwise relative position feature vector PRPV. Using three separate classifiers, i.e., classifying first and then fusing, gives a better result than fusing the vectors first and then classifying.
In the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by the height factor λ to give the normalized distance D, which reduces the influence of subjects of different heights on the feature vector.
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1]; this standardizes the data and improves the recognition accuracy.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hip-center-based feature vector of the present invention;
Fig. 2 is a schematic diagram of the angle feature vector of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, a time-invariant and view-invariant human action recognition method based on skeleton information proceeds as follows.
A depth camera is used to capture depth video. Compared with traditional RGB video, depth video does not change with illumination, and therefore offers better discriminability in visual tasks such as video segmentation and action recognition. The frame rate of the depth video is 30 frames per second.
Step 1: extract human action video segments and normalize video segments of different lengths to one fixed video length. Human action videos are extracted from the captured depth video, and the video lengths are first pre-processed: a frame interpolation regularization method is used to normalize video segments of different lengths to the same (fixed) video length. Here the video length refers to the number of frames a video segment contains. The frame interpolation regularization method adjusts video segments with different frame counts to a unified segment with the same frame count, this frame count being the median of the frame counts of all video segments; the regularization is realized through frame interpolation. There is no requirement on the length of a video segment as long as it contains a complete action, generally at least twenty frames. For example, to adjust a video segment with a length of 10 seconds to a length of 15 seconds, frame i of the adjusted video is taken from frame ⌈10·i/15⌉ of the original video, where ⌈ ⌉ denotes rounding up. In general the video lengths within one data set do not differ too much. Adjusting video segments with different frame counts to the same frame count not only gives the feature vectors extracted from different videos the same dimension, but also preserves the main motion pattern information in the video, thereby ensuring the time-invariance of the method. This step is an important processing step before human action recognition.
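Not part of the original disclosure: a minimal Python sketch of this frame-count regularization, assuming skeleton data stored as a NumPy array of shape (num_frames, 20, 3). The helper name regularize_frame_count is an illustrative assumption; the nearest-frame mapping follows the ⌈len·i/tNum⌉ example above.

```python
import numpy as np

def regularize_frame_count(frames: np.ndarray, t_num: int) -> np.ndarray:
    """Resample a (num_frames, 20, 3) joint sequence to exactly t_num frames.

    Target frame i (1-based) is taken from source frame ceil(num_frames * i / t_num),
    mirroring the worked example in the text.
    """
    num_frames = frames.shape[0]
    src = np.ceil(np.arange(1, t_num + 1) * num_frames / t_num).astype(int) - 1  # 0-based
    src = np.clip(src, 0, num_frames - 1)
    return frames[src]
```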
Step 2: extract skeleton information from the obtained fixed-length video, i.e., extract from each frame of the video segment the information of the 20 joints that express the human action; the 20-joint information consists of the x, y, z coordinates of each joint. The 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot. The present invention only needs to extract the skeleton information of these 20 joints. Compared with existing methods that extract features from all pixels of a frame, the method here uses only the skeleton information of the human body to extract features from the depth video, and is therefore simpler, more efficient, and more suitable for real-time use.
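Not part of the original disclosure: a small sketch of the assumed per-video data layout used by the later snippets. The joint names follow the list above; the ordering, the hypothetical constants HIP_CENTER and SPINE, and the random placeholder video are illustration-only assumptions.

```python
import numpy as np

JOINTS = ["hip_center", "spine", "shoulder_center", "head",
          "shoulder_left", "elbow_left", "wrist_left", "hand_left",
          "shoulder_right", "elbow_right", "wrist_right", "hand_right",
          "hip_left", "knee_left", "ankle_left", "foot_left",
          "hip_right", "knee_right", "ankle_right", "foot_right"]

HIP_CENTER = JOINTS.index("hip_center")   # origin joint for HCBV
SPINE = JOINTS.index("spine")             # used for the height factor

def random_skeleton_video(t_num: int = 30) -> np.ndarray:
    """Placeholder video of shape (t_num, 20, 3): per-frame (x, y, z) of the 20 joints."""
    return np.random.rand(t_num, len(JOINTS), 3)
```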
Step 3: extract three feature vectors from the 20-joint information of each frame, i.e., compute from the 20 joints of each frame of the video segment the hip-center-based feature vector HCBV (Hip Center Based Vector), the angle feature vector AV (Angle Vector), and the pairwise relative position feature vector PRPV (Pairwise Relative Position Vector).
The hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and computing, for every other joint of that frame, three parameters: the distance d to the origin, the elevation angle φ, and the azimuth angle θ. These three parameters are computed for all frames from the three-dimensional coordinates of each joint, and the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over the whole video are then vectorized to form the reference feature vector. Compared with the other joints, the hip center joint has the smallest range of motion, so the HCBV computation takes the hip center joint as the origin of a 3D rectangular coordinate system and computes d, φ, and θ for every joint other than the hip center in each frame of the depth video. The distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint. The normalized distances D of the other joints to the hip center joint are used to form the feature vector, which reduces the influence of subjects of different heights on the feature vector.
Each video frame yields 3 × 19 parameters, so if the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum. Let D_i_j denote the normalized distance of joint j in frame i, φ_i_j its elevation angle, and θ_i_j its azimuth angle; the vector is then D_1_1 φ_1_1 θ_1_1, D_1_2 φ_1_2 θ_1_2, D_1_3 φ_1_3 θ_1_3, …, D_1_19 φ_1_19 θ_1_19, D_2_1 φ_2_1 θ_2_1, D_2_2 φ_2_2 θ_2_2, …, D_2_19 φ_2_19 θ_2_19, and so on for the remaining frames. The distance D, elevation angle φ, and azimuth angle θ of each joint in a frame are concatenated, and these three parameters of all frames are then concatenated (vectorized) to obtain the hip-center-based feature vector HCBV.
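Not part of the original disclosure: a minimal sketch of the HCBV computation under the layout above. The choice of y as the vertical axis for the elevation angle and of the x-z plane for the azimuth is an assumption; the text only names the three parameters d, φ, θ and the height factor λ.

```python
import numpy as np

def hcbv(frames: np.ndarray, hip_center: int = 0, spine: int = 1) -> np.ndarray:
    """Hip-center-based feature vector of length 3 * 19 * t_num."""
    feats = []
    for frame in frames:                                  # frame: (20, 3)
        origin = frame[hip_center]
        lam = 1.0 / (np.linalg.norm(frame[spine] - origin) + 1e-8)  # height factor lambda
        for j in range(frame.shape[0]):
            if j == hip_center:
                continue
            v = frame[j] - origin
            r = np.linalg.norm(v) + 1e-8
            d = lam * r                                   # normalized distance D = lambda * d
            phi = np.arcsin(v[1] / r)                     # elevation angle (y assumed vertical)
            theta = np.arctan2(v[2], v[0])                # azimuth angle in the x-z plane
            feats.extend([d, phi, theta])
    return np.asarray(feats)
```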
The angle feature vector AV is the vector formed by concatenating, over all frames of the video, the angles between adjacent joints. First the angles of all adjacent joints in the skeleton structure are determined; the angle of an adjacent joint pair is computed from the three-dimensional coordinates of the adjacent joints. The angle feature vector AV is intended to capture the overall flexion information of the human body. It is computed by extracting, from each video frame, the angles between adjacent joints and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector. Each video frame yields 19 angle parameters, so if the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
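Not part of the original disclosure: a sketch of one plausible reading of the angle feature vector AV, since the text only states that 19 adjacent-joint angles are taken per frame. Here each of the 19 bones of the assumed 20-joint tree is measured against its parent bone (root bones against a vertical reference), which yields exactly 19 angles per frame; the EDGES table matches the joint ordering sketched earlier.

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 3),                 # hip_center - spine - shoulder_center - head
         (2, 4), (4, 5), (5, 6), (6, 7),         # left arm
         (2, 8), (8, 9), (9, 10), (10, 11),      # right arm
         (0, 12), (12, 13), (13, 14), (14, 15),  # left leg
         (0, 16), (16, 17), (17, 18), (18, 19)]  # right leg

def _angle(u: np.ndarray, v: np.ndarray) -> float:
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def av(frames: np.ndarray) -> np.ndarray:
    """Angle feature vector of length 19 * t_num."""
    bone_of_child = {child: (parent, child) for parent, child in EDGES}
    feats = []
    for frame in frames:
        for parent, child in EDGES:
            bone = frame[child] - frame[parent]
            if parent in bone_of_child:                  # compare with the parent bone
                gp, p = bone_of_child[parent]
                ref = frame[p] - frame[gp]
            else:                                        # root bones: compare with vertical
                ref = np.array([0.0, 1.0, 0.0])
            feats.append(_angle(bone, ref))
    return np.asarray(feats)
```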
The pairwise relative position feature vector PRPV is the vector formed by concatenating, over all frames of the video, the relative positions of each joint with respect to the other joints. First the relative position of each joint with respect to every other joint is extracted; these relative positions are computed from the three-dimensional coordinates of the joints. The PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector.
In the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t. The three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
The 20 joints of the human skeleton give 19 × 20 relative position parameters per video frame, so if the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum. The pairwise relative position feature vector PRPV is therefore given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
Because different people have different heights, the distances between joints also differ. To eliminate this influence, all inter-joint distances are standardized with the height factor λ, which equals the reciprocal of the distance between the hip center joint and the spine joint; the standardized pairwise relative position is given by formula (5):
P = p_t^ij × λ    (5)
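Not part of the original disclosure: a minimal sketch of the PRPV computation combining formulas (2) to (5): for every frame and every ordered joint pair (i, j), the offset p_t^ij = p_t^i − p_t^j is scaled by the height factor λ and appended. Each pair contributes a 3-D offset, so the concatenated vector has 3 × 19 × 20 entries per frame (the text counts the 19 × 20 pairs).

```python
import numpy as np

def prpv(frames: np.ndarray, hip_center: int = 0, spine: int = 1) -> np.ndarray:
    """Pairwise relative position feature vector (height-normalized, formula (5))."""
    feats = []
    for frame in frames:                                  # frame: (20, 3)
        lam = 1.0 / (np.linalg.norm(frame[spine] - frame[hip_center]) + 1e-8)
        n = frame.shape[0]
        for i in range(n):
            for j in range(n):
                if i != j:
                    feats.extend(lam * (frame[i] - frame[j]))   # P = lambda * p_t^ij
    return np.asarray(feats)
```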
Before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
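Not part of the original disclosure: a small sketch of the min-max step, assuming it is applied per coordinate channel over all joints and frames of one video.

```python
import numpy as np

def min_max_normalize(frames: np.ndarray) -> np.ndarray:
    """Rescale the x, y, z channels of a (t_num, 20, 3) video to the [0, 1] range."""
    lo = frames.min(axis=(0, 1), keepdims=True)
    hi = frames.max(axis=(0, 1), keepdims=True)
    return (frames - lo) / (hi - lo + 1e-8)
```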
Step 4: classify and recognize the three obtained feature vectors separately. Three support vector machine (SVM) classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class. In this embodiment the support vector machine classifier is a LIBLINEAR classifier, and the classification directly uses the source code and method provided in its reference documentation. The classification categories are exactly the action classes contained in the data set, so the algorithm is applicable to data sets with any set of classes. Each feature vector yields the probability that the video belongs to a given action, so combining the three feature vectors improves the recognition performance.
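Not part of the original disclosure: a sketch of the per-feature classification step. scikit-learn's SVC with probability estimates is used here as a stand-in for the LIBLINEAR classifier named in the text; the dictionary keys and function names are illustration-only assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_per_feature_classifiers(train_features: dict, labels: np.ndarray) -> dict:
    """train_features: {"HCBV": (n_videos, d1), "AV": (n_videos, d2), "PRPV": (n_videos, d3)}."""
    return {name: SVC(kernel="linear", probability=True).fit(X, labels)
            for name, X in train_features.items()}

def per_feature_probabilities(classifiers: dict, test_features: dict) -> dict:
    """Per-class probability matrix (n_videos, n_classes) for each feature type."""
    return {name: clf.predict_proba(test_features[name])
            for name, clf in classifiers.items()}
```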
Step 5: fuse the identification probabilities of the action classes. The per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, where the weight of HCBV is 0.4, the weight of AV is 0.3, and the weight of PRPV is 0.3. Integrating the classification results of the three feature vectors of the same video amounts to taking a weighted sum of their prediction probabilities for each action; after summation the prediction probability of each action is obtained, and the action with the largest probability is the recognized action. This makes the fusion of the classification results very simple and improves the computational efficiency. The weights (0.4 for HCBV, 0.3 for AV, and 0.3 for PRPV) were obtained from extensive experiments and experience.
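Not part of the original disclosure: a minimal sketch of the weighted-sum fusion with the weights stated in the text (0.4 for HCBV, 0.3 for AV, 0.3 for PRPV); the class with the largest fused probability is taken as the recognized action.

```python
import numpy as np

WEIGHTS = {"HCBV": 0.4, "AV": 0.3, "PRPV": 0.3}

def fuse_and_predict(probs: dict) -> np.ndarray:
    """probs: per-feature (n_videos, n_classes) probability arrays with matching class order."""
    fused = sum(WEIGHTS[name] * p for name, p in probs.items())
    return np.argmax(fused, axis=1)                      # index of the predicted action class
```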
Experimental results and analysis
A. Data set and pre-processing
The experiments were run on a host with an Intel(R) Core(TM) i5-4200M CPU at 2.50 GHz and 4 GB of memory, and the method of the invention was evaluated on the UTKinect-Action3D data set. UTKinect-Action3D is a data set for finding the latent relation between action classes and skeleton information. It was collected with a stationary Kinect camera: ten different subjects performed ten different actions (see Table I), and each subject performed each action twice. After removing one invalid video, the whole data set contains 199 valid videos, and each video provides the three-dimensional coordinates of the 20 joints. For convenience, 200 video sequences are used in the experiments here: the missing second recording of the carry action of the tenth subject is supplemented with frames 1242 to 1300 of the raw data, where the raw data refers to the original unsegmented long video. The UTKinect-Action3D data set contains video sequences shot from several different viewpoints and has high intra-class variability, which makes it very challenging; within the same action class the variability can be large, for example the waving actions of different people differ considerably. For pre-processing, each video was processed simply: first, the frame interpolation regularization method was used to normalize all video lengths in the data set to one unified video length, namely the median of all video lengths; second, the min-max method was used to normalize the x, y, and z coordinate values of all videos to the range [0, 1].
B. Performance evaluation
For the experimental evaluation on the UTKinect-Action3D data set, a cross-subject setup is used: the actions of five subjects, whose sequences are denoted {1, 3, 5, 7, 9}, are used for training, and the actions of the other five subjects, denoted {2, 4, 6, 8, 10}, are used for testing. Table I gives the recognition accuracy of every action class. As can be seen from Table I, the average recognition accuracy over the actions is 95%. UTKinect-Action3D is a challenging multi-view data set in which the lengths of the videos also differ considerably, so the high recognition rate indicates the view-invariance and time-invariance of the present method. Table I also shows that the recognition rates of the carry, throw, and push actions are relatively low. The carry actions of subject 9 and subject 10 are misrecognized as throw and push respectively, because these two videos contain so few frames that the information they provide is insufficient for classification; the selected video frame count should therefore cover the complete action, generally at least twenty frames.
Table I. Recognition accuracy of each action on the UTKinect-Action3D data set (mean: 95%).

Action        walk   sit down   stand up   pick up   carry
Accuracy (%)  100    100        100        100       80

Action        throw  push   pull   wave hands   clap hands
Accuracy (%)  80     90     100    100          100
Table II compares the recognition performance of the present method with existing action recognition methods on the UTKinect-Action3D data set. The proposed method achieves a classification accuracy of 95%, and the recognition accuracies of the other action recognition methods are all lower. At the same time, the present method takes 0.18 seconds on average to extract the features of a single video, whereas the human action recognition method based on Lie groups and 3D skeleton points requires 6.53 seconds on average; the present method is therefore simpler and more intuitive, and also more efficient in time overhead.
Table II. Comparison of the recognition performance of the present method with existing methods on the UTKinect-Action3D data set.

Method                                  Accuracy
Xia et al. (2012), reference [2]        90.92%
Devanne et al. (2013), reference [3]    91.5%
Chrungoo et al. (2014), reference [4]   91.96%
Proposed                                95%
The present invention proposes an intuitive, simple, and effective human action recognition method based on the skeleton information of depth video. The method forms three different feature vectors, HCBV, AV, and PRPV, by extracting the inter-joint angle information and relative position information from the depth video. By fusing the classification results of the three feature vectors HCBV, AV, and PRPV, the method achieves good recognition results on the UTKinect-Action3D data set; it is simpler and more intuitive, and its time overhead is smaller. At the same time, the extracted features are time-invariant and view-invariant, so the method remains robust when applied to other data sets.

Claims (9)

1. A time-invariant and view-invariant human action recognition method based on skeleton information, characterized by comprising the following steps:
1) extracting human action video segments, and normalizing video segments of different lengths to one fixed video length;
2) extracting skeleton information from the obtained fixed-length video, i.e., extracting from each frame of the video the information of the 20 joints that express the human action;
3) extracting three feature vectors from the 20-joint information of each frame, namely computing, from the 20 joints of each video frame, the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV, wherein for the hip-center-based feature vector HCBV, the hip center joint of each frame is taken as the origin of coordinates, three parameters are computed for every other joint of that frame, namely the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and the distances d, elevation angles φ, and azimuth angles θ of all joints other than the origin over all frames of the video are concatenated to form HCBV; the angle feature vector AV is the vector formed by concatenating the angles between adjacent joints over all frames of the video; and the pairwise relative position feature vector PRPV is the vector formed by concatenating the relative positions of each joint with respect to the other joints over all frames of the video;
4) classifying and recognizing the three obtained feature vectors separately: support vector machine classifiers are used to classify the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV respectively, yielding an identification probability for each action class;
5) fusing the identification probabilities of the action classes: the per-class identification probabilities obtained from the hip-center-based feature vector HCBV, the angle feature vector AV, and the pairwise relative position feature vector PRPV are fused by weighted summation to give the recognition result, wherein the weight of the hip-center-based feature vector HCBV is 0.4, the weight of the angle feature vector AV is 0.3, and the weight of the pairwise relative position feature vector PRPV is 0.3.
2. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the 20 joints are: hip center, spine, shoulder center, head, left shoulder, left elbow, left wrist, left hand, right shoulder, right elbow, right wrist, right hand, left hip, left knee, left ankle, left foot, right hip, right knee, right ankle, and right foot.
3. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that when extracting the human action video segments, each video length is first pre-processed, and a frame interpolation regularization method is used to normalize video segments of different lengths to the same video length.
4. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the hip-center-based feature vector HCBV is computed by taking the hip center joint as the origin of coordinates in each video frame and extracting, for every other joint, the distance d to the origin, the elevation angle φ, and the azimuth angle θ, and then vectorizing the distances d, elevation angles φ, and azimuth angles θ of the other joints over all frames of the video to form the reference feature vector; and if the video contains tNum frames, the dimension of this feature vector is 3 × 19 × tNum.
5. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1 or 4, characterized in that in the hip-center-based feature vector HCBV, the distance d from each other joint to the hip center joint is multiplied by a height factor λ to give the normalized distance D, see formula (1):
D = λ × d    (1)
where the height factor λ equals the reciprocal of the distance between the hip center joint and the spine joint.
6. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the angle feature vector AV is computed by extracting, from each video frame, the angles between adjacent joints, and then vectorizing the adjacent-joint angles over all frames of the video into the angle feature vector; and if the video contains tNum frames, the dimension of the angle feature vector AV is 19 × tNum.
7. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that the pairwise relative position feature vector PRPV is computed by extracting, from each video frame, the relative position of each joint with respect to the other joints, and then vectorizing the inter-joint relative positions over all frames of the video into the pairwise relative position feature vector; and if the video contains tNum frames, the dimension of this feature vector is 19 × 20 × tNum.
8. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1 or 7, characterized in that in the computation of the pairwise relative position feature vector PRPV, for a joint i in frame t, the relative position parameter p_t^ij is obtained from the difference between joint i and every other joint j, see formula (2):
p_t^ij = p_t^i − p_t^j    (2)
where p_t^i is the coordinate of joint i in frame t, and the three-dimensional relative position attribute of joint i in frame t is given by formula (3):
p_t^i = { p_t^ij | i ≠ j }    (3)
so that the pairwise relative position feature vector PRPV is given by formula (4):
PRPV = { p_t^i | i = 1, …, 20; t = 1, …, tNum }    (4)
9. The time-invariant and view-invariant human action recognition method based on skeleton information according to claim 1, characterized in that before the three feature vectors are classified, the min-max method is used to normalize the x, y, and z coordinate values of all frames in the video to the range [0, 1].
CN201510551025.5A 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information Active CN105138995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510551025.5A CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510551025.5A CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Publications (2)

Publication Number Publication Date
CN105138995A true CN105138995A (en) 2015-12-09
CN105138995B CN105138995B (en) 2019-06-25

Family

ID=54724339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510551025.5A Active CN105138995B (en) 2015-09-01 2015-09-01 Time-invariant and view-invariant human action recognition method based on skeleton information

Country Status (1)

Country Link
CN (1) CN105138995B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295544A (en) * 2016-08-04 2017-01-04 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN107392131A (en) * 2017-07-14 2017-11-24 天津大学 A kind of action identification method based on skeleton nodal distance
CN107451524A (en) * 2016-06-01 2017-12-08 丰田自动车株式会社 Activity recognition device, learning device, Activity recognition method, learning method and computer-readable recording medium
CN108446583A (en) * 2018-01-26 2018-08-24 西安电子科技大学昆山创新研究院 Human bodys' response method based on Attitude estimation
CN111860086A (en) * 2019-06-26 2020-10-30 广州凡拓数字创意科技股份有限公司 Gesture recognition method, device and system based on deep neural network
EP3809321A1 (en) * 2019-10-15 2021-04-21 Fujitsu Limited Action recognition method and apparatus and electronic equipment
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN114783059A (en) * 2022-04-20 2022-07-22 浙江东昊信息工程有限公司 Temple incense and worship participation management method and system based on depth camera

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
JIANG WANG ET AL.: "Learning actionlet ensemble for 3D human action recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
JIANG WANG ET AL.: "Mining Actionlet Ensemble for Action Recognition with Depth Cameras", 《2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JING DU ET AL.: "3D action recognition based on limb angle model", 《INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014 4TH IEEE INTERNATIONAL CONFERENCE ON》 *
RAVITEJA VEMULAPALLI ET AL.: "Human action recognition by representing 3D skeletons as points in a Lie group", 《2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
YU JING ET AL.: "Dynamic gesture recognition algorithm based on depth information", 《Journal of Shandong University (Engineering Science)》 *
LI YANG: "Activity recognition method based on wireless sensor networks and probabilistic fusion", 《China Master's Theses Full-text Database, Information Science and Technology》 *
DU JING: "Action recognition based on 3D models", 《China Master's Theses Full-text Database, Information Science and Technology》 *
TIAN GUOHUI ET AL.: "A new human action recognition method based on joint point information", 《Robot》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451524A (en) * 2016-06-01 2017-12-08 丰田自动车株式会社 Activity recognition device, learning device, Activity recognition method, learning method and computer-readable recording medium
CN107451524B (en) * 2016-06-01 2020-07-07 丰田自动车株式会社 Behavior recognition device, learning device, behavior recognition method, learning method, and computer-readable recording medium
CN106295544A (en) * 2016-08-04 2017-01-04 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN106295544B (en) * 2016-08-04 2019-05-28 山东师范大学 A kind of unchanged view angle gait recognition method based on Kinect
CN107392131A (en) * 2017-07-14 2017-11-24 天津大学 A kind of action identification method based on skeleton nodal distance
CN108446583A (en) * 2018-01-26 2018-08-24 西安电子科技大学昆山创新研究院 Human bodys' response method based on Attitude estimation
CN111860086A (en) * 2019-06-26 2020-10-30 广州凡拓数字创意科技股份有限公司 Gesture recognition method, device and system based on deep neural network
EP3809321A1 (en) * 2019-10-15 2021-04-21 Fujitsu Limited Action recognition method and apparatus and electronic equipment
US11423699B2 (en) 2019-10-15 2022-08-23 Fujitsu Limited Action recognition method and apparatus and electronic equipment
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN114783059A (en) * 2022-04-20 2022-07-22 浙江东昊信息工程有限公司 Temple incense and worship participation management method and system based on depth camera

Also Published As

Publication number Publication date
CN105138995B (en) 2019-06-25

Similar Documents

Publication Publication Date Title
Jalal et al. Human daily activity recognition with joints plus body features representation using Kinect sensor
CN105138995A (en) Time-invariant and view-invariant human action identification method based on skeleton information
Kamal et al. A hybrid feature extraction approach for human detection, tracking and activity recognition using depth sensors
Zeng et al. Silhouette-based gait recognition via deterministic learning
Zhang et al. Active energy image plus 2DLPP for gait recognition
Kusakunniran et al. Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model
Yao et al. Robust gait recognition using hybrid descriptors based on skeleton gait energy image
CN103942577A (en) Identity identification method based on self-established sample library and composite characters in video monitoring
Liu et al. Gait recognition based on outermost contour
CN105975932B (en) Gait Recognition classification method based on time series shapelet
Shirke et al. Literature review: Model free human gait recognition
Arivazhagan et al. Human action recognition from RGB-D data using complete local binary pattern
Do et al. Real-time and robust multiple-view gender classification using gait features in video surveillance
Liu et al. Survey of gait recognition
Lima et al. Simple and efficient pose-based gait recognition method for challenging environments
Khan et al. Person identification using spatiotemporal motion characteristics
Liu et al. Gender recognition using dynamic gait energy image
Mogan et al. Gait recognition using temporal gradient patterns
CN103390150B (en) human body part detection method and device
More et al. Gait-based human recognition using partial wavelet coherence and phase features
Fan et al. Human gait recognition based on discrete cosine transform and linear discriminant analysis
Singh et al. Bayesian gait-based gender identification (BGGI) network on individuals wearing loosely fitted clothing
Ti et al. GenReGait: Gender Recognition using Gait Features
Akhter et al. Deep Skeleton Modeling and Hybrid Hand-crafted Cues over Physical Exercises
Huynh et al. Robust classification of human actions from 3D data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant