CN106203363A - Human skeleton motion sequence Activity recognition method - Google Patents
- Publication number
- CN106203363A (application CN201610562181.6A)
- Authority
- CN
- China
- Prior art keywords
- human skeleton
- sequence
- expressed
- motion
- extremity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/23 — Recognition of whole body movements, e.g. for sport training
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
- G06V10/40 — Extraction of image or video features
- G06V10/422 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation, for representing the structure of the pattern or shape of an object
Abstract
The invention discloses a human skeleton motion sequence activity recognition method. The method includes: obtaining human skeleton node coordinates; concatenating the nodes corresponding to each limb of the skeleton to form motion feature representations of the four limbs and the trunk; concatenating the limb and trunk motion feature representations to form a vector representation of the human skeleton; arranging the vector representations corresponding to each frame of the motion sequence in temporal order to obtain a three-dimensional matrix; applying value normalization and dimension normalization to the matrix to obtain the image representation corresponding to the skeleton sequence; using a convolutional neural network to adaptively extract a texture feature representation from the image; and performing behavior classification on this representation, with the final behavior class of the skeleton sequence determined by voting. Embodiments of the present invention can accurately recognize human behavior from skeleton coordinate sequences without complex data preprocessing.
Description
Technical field
The embodiments of the present invention relate to the fields of computer vision, pattern recognition and deep learning, and specifically to a human skeleton motion sequence activity recognition method.
Background technology
The revival of neural network theory has driven the rapid development of artificial intelligence. Intelligent robots, driverless cars and the like are entering everyday life, and intelligent transportation, intelligent video surveillance and smart cities all require computers to analyze human behavior automatically. Currently, depth camera technology combined with high-precision human skeleton estimation algorithms can directly provide the skeleton frame sequence corresponding to a human motion process, and the behavior of a person can be recognized accurately on the basis of this frame sequence.
Traditional skeleton-based activity recognition algorithms mainly design a classifier on top of hand-crafted feature extraction and encoding to perform behavior classification. The extraction of hand-crafted features is relatively complicated, and it is usually carried out separately from the subsequent feature encoding and classification; although the stages can be cascaded into a system, the low efficiency hinders practical application. In addition, traditional methods are typically trained and tested on small datasets; when the data volume grows, the computational complexity of these models becomes hard to bear on ordinary hardware, making them difficult to apply in practice.
In view of this, the present invention is proposed.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a human skeleton motion sequence activity recognition method that at least partly solves them.
To achieve this goal, according to one aspect of the present invention, the following technical scheme is provided:
A human skeleton motion sequence activity recognition method, the method comprising:
obtaining the human skeleton node coordinates, wherein the human skeletons of a series of consecutive moments constitute the human skeleton motion sequence;
concatenating the nodes corresponding to each limb of the skeleton, forming the motion feature representations of the four limbs and the trunk;
concatenating the limb and trunk motion feature representations, forming the vector representation of the human skeleton;
arranging the vector representations corresponding to each frame of the motion sequence in temporal order, obtaining a three-dimensional matrix;
applying value normalization and dimension normalization to the three-dimensional matrix, obtaining the image representation corresponding to the skeleton sequence;
using a convolutional neural network to adaptively extract the texture feature representation of the image representation;
mapping the texture feature representation, through a fully connected neural network layer, to a vector whose dimension equals the total number of behavior classes contained in the database, and normalizing it with a softmax layer to determine the class membership probabilities of the skeleton;
minimizing the following maximum likelihood loss function:
L(Ω) = −(1/M) Σ_{m=1}^{M} Σ_{k=1}^{C} δ(k − r) log p(C_k | x_m)
wherein M denotes the total number of human skeleton motion sequence samples in the database; δ(·) denotes the Kronecker function; C denotes the total number of behavior classes; k denotes the behavior class index; p(C_k | x_m) denotes the probability that the m-th sample x_m belongs to the k-th behavior class C_k; r is the true class label corresponding to sample x_m; and L(Ω) denotes the maximum likelihood loss function;
determining four corner regions and a central region in the image representation corresponding to the skeleton sequence and in its flipped image, performing behavior classification on each, and determining the behavior class of the skeleton sequence by voting.
Preferably, in the method according to claim 1, obtaining the human skeleton node coordinates specifically includes:
obtaining the human skeleton motion sequence with a depth camera or a motion capture system;
dividing the single-moment skeleton coordinates into limb and trunk parts according to the physical structure of the human body;
for the limb and trunk parts, arranging the three coordinate components x, y, z according to the physical connection order of the nodes, forming the vector representations of the limbs and the trunk projected onto the y-z, z-x and x-y planes, thereby obtaining the human skeleton node coordinates.
Preferably, concatenating the nodes corresponding to each limb of the skeleton to form the limb and trunk motion feature representations specifically includes:
concatenating the vector representations in the order left arm, right arm, trunk, left leg, right leg, thereby forming the motion feature representations of the limbs and the trunk.
Preferably, concatenating the limb and trunk motion feature representations to form the vector representation of the human skeleton specifically includes:
arranging the limb and trunk motion feature representations in sequence, forming the vector representation of the skeleton.
Preferably, applying value normalization and dimension normalization to the three-dimensional matrix to obtain the image representation corresponding to the skeleton sequence specifically includes normalizing the values and dimensions according to the following formula:
p = floor(255 × (c − c_min) / (c_max − c_min))
wherein p denotes a pixel value in the image representation of the skeleton sequence, c denotes a raw coordinate value, c_max and c_min denote the maximum and minimum of all skeleton node coordinates in the training set, respectively, and floor denotes the round-down function.
Preferably, the method further includes: during the extraction of the texture feature representation of the image representation, using max pooling to reduce the convolution output dimension.
Preferably, the method further includes: normalizing the image representation corresponding to the skeleton sequence.
To achieve the above goals, according to another aspect of the present invention, a further human skeleton motion sequence activity recognition method is provided, the method comprising:
obtaining the node coordinates of the human skeleton sequence to be analyzed;
concatenating the nodes corresponding to each limb of the skeleton to be analyzed, forming the limb and trunk motion feature representations;
concatenating the limb and trunk motion feature representations, forming the vector representation of the skeleton to be analyzed;
arranging the vector representations corresponding to each frame of the motion sequence to be analyzed in temporal order, obtaining a three-dimensional matrix;
applying value normalization and dimension normalization to the three-dimensional matrix, obtaining the image representation corresponding to the skeleton sequence to be analyzed;
determining four corner regions and a central region from the image representation and its flipped image; extracting the texture feature representations of the four corner regions and the central region with the model trained by the above method; and, based on these texture feature representations, determining the behavior class of the skeleton sequence to be analyzed by voting.
Compared with the prior art, the above technical scheme has at least the following beneficial effects:
Embodiments of the present invention provide a simple, high-precision and highly efficient human skeleton motion sequence activity recognition method. The skeleton sequence is converted into a corresponding image representation according to fixed rules; a convolutional neural network model then extracts the texture features of the converted image, indirectly obtaining the spatio-temporal structural features of the frame sequence, and classifies them, thereby recognizing the behavior corresponding to the original skeleton sequence. No complex data preprocessing is required: the behavior of a person can be recognized directly from the skeleton coordinate sequence.
Embodiments of the present invention have significant application value in fields such as intelligent video surveillance, robot vision, human-computer interaction and game control.
Brief description of the drawings
The accompanying drawings, as a part of the present invention, provide a further understanding of the invention; the schematic embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. Obviously, the drawings described below cover only some embodiments; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic flowchart of a training method for a human skeleton motion sequence activity recognition model according to an exemplary embodiment;
Fig. 2 is a schematic diagram, according to another exemplary embodiment, of concatenating the nodes corresponding to each limb of the human skeleton to form the limb and trunk motion feature representations;
Fig. 3 is a schematic diagram, according to an exemplary embodiment, of the image representation corresponding to a human skeleton sequence obtained for the convolutional neural network;
Fig. 4 is a schematic diagram of spatio-temporal synchronous pooling according to an exemplary embodiment;
Fig. 5 is a schematic diagram of hierarchical spatio-temporal adaptive feature learning according to an exemplary embodiment;
Fig. 6 is a schematic flowchart of a human skeleton motion sequence activity recognition method based on a convolutional neural network according to an exemplary embodiment.
These drawings and descriptions are not intended to limit the concept of the present invention in any way; rather, they illustrate the idea of the invention for those skilled in the art by reference to specific embodiments.
Detailed description of the invention
The technical problems solved by the embodiments of the present invention, the technical schemes adopted and the technical effects achieved are described below clearly and completely in conjunction with the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the application; based on the embodiments in the application, all other equivalent or obviously modified embodiments obtainable by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
It should be noted that, for ease of understanding, many specific details are given in the following description; obviously, however, the present invention can be realized without these details.
It should also be noted that, unless explicitly limited otherwise or in case of conflict, the embodiments of the present invention and the technical features therein can be combined with one another to form technical schemes.
In recent years, depth cameras have advanced rapidly; combining the depth image sequences they acquire with efficient pose estimation algorithms makes it possible to obtain human skeleton motion sequences effectively, and high-precision behavior recognition can be realized on the basis of these sequences.
To this end, an embodiment of the present invention proposes a human skeleton motion sequence activity recognition method. As shown in Fig. 1, the method may include:
S100: obtaining the human skeleton node coordinates, wherein the human skeletons of a series of consecutive moments constitute the human skeleton motion sequence.
In an optional embodiment, this step obtains the human skeleton motion sequence with a depth camera or a motion capture system; the sequence comprises the skeletons of a series of consecutive moments. Then, according to the physical structure of the human body, the single-moment skeleton coordinates are divided into five parts (the four limbs and the trunk), and within each part the three coordinate components x, y, z of the nodes are arranged according to their physical connection order, forming the vector representations of the limbs and the trunk projected onto the three coordinate planes.
S110: concatenating the nodes corresponding to each limb of the skeleton, forming the limb and trunk motion feature representations.
In an optional embodiment, this step concatenates the projected vector representations of the limbs and trunk in the order left arm, right arm, trunk, left leg, right leg, thereby obtaining the vector representations (i.e. the motion feature representations) of the skeleton projected onto the three coordinate planes y-z, z-x and x-y.
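As a non-limiting illustration of the per-frame concatenation in S110, the following Python sketch builds the three projected vectors for one frame. The 20-joint layout and the limb grouping below are illustrative assumptions (roughly MSR Action3D-style), not part of the claimed method; the actual indices depend on the capture device.

```python
import numpy as np

# Hypothetical limb grouping for a 20-node skeleton; illustrative only.
LIMB_GROUPS = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "trunk":     [0, 1, 2, 3],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}
# Concatenation order specified by the method.
LIMB_ORDER = ["left_arm", "right_arm", "trunk", "left_leg", "right_leg"]

def frame_to_vectors(joints):
    """joints: (20, 3) array of (x, y, z) coordinates for one frame.
    Returns one 1-D vector per projection plane (y-z, z-x, x-y), each
    built by concatenating the limb groups in the fixed order above."""
    planes = {"yz": [1, 2], "zx": [2, 0], "xy": [0, 1]}
    return {
        name: np.concatenate(
            [joints[LIMB_GROUPS[g]][:, axes].ravel() for g in LIMB_ORDER]
        )
        for name, axes in planes.items()
    }
```

With five groups of four joints and two coordinates per plane, each plane vector has 5 × 4 × 2 = 40 components per frame.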
S120: concatenating the limb and trunk motion feature representations, forming the vector representation of the human skeleton.
In an optional embodiment, the limb and trunk motion feature representations are arranged in sequence to form the vector representation of the skeleton.
In practice, for each of the three coordinate planes y-z, z-x and x-y, the vector representations of the skeleton poses at the successive moments of the motion sequence are arranged in temporal order, constituting three two-dimensional floating-point matrices corresponding to red, green and blue, as shown in Fig. 2. In each matrix, the vertical direction (a column) holds the vector representation of the nodes of a single skeleton projected onto the corresponding plane, while the horizontal direction (a row) tracks the change of one projected coordinate over time, i.e. the coordinate values of a single skeleton node at different moments.
S130: arranging the vector representations corresponding to each frame of the motion sequence in temporal order, obtaining a three-dimensional matrix.
The three two-dimensional matrices formed by arranging the per-moment pose vectors in temporal order, regarded as the red, green and blue primaries of an image, are stacked into one three-dimensional matrix; they serve as the three channels of an RGB color image.
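The arrangement of S120/S130 can be sketched as follows. The sketch assumes each frame has already been flattened into one vector per projection plane, held in a dict keyed "yz", "zx", "xy" (a hypothetical intermediate format chosen here for illustration).

```python
import numpy as np

def sequence_to_tensor(frames):
    """frames: list over time of dicts mapping a plane name ('yz',
    'zx', 'xy') to a length-D per-frame vector. Each plane becomes a
    (D, T) matrix: reading down a column gives the joint components of
    one frame, reading along a row follows one component over time.
    The three plane matrices are then stacked as the R, G, B channels
    of a (D, T, 3) tensor."""
    mats = [np.stack([f[p] for f in frames], axis=1)
            for p in ("yz", "zx", "xy")]
    return np.stack(mats, axis=2)
```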
S140: applying value normalization and dimension normalization to the three-dimensional matrix, obtaining the image representation corresponding to the skeleton sequence.
In this step, the values are normalized according to the following formula:
p = floor(255 × (c − c_min) / (c_max − c_min))
wherein p denotes a pixel value in the image representation of the skeleton sequence, c denotes a raw coordinate value, c_max and c_min denote the maximum and minimum of all skeleton node coordinates in the training set, respectively, and floor denotes the round-down function.
Through this normalization, the pixel values are distributed between 0 and 255, yielding the image representation corresponding to the skeleton sequence (i.e. the original skeleton sequence), as shown in Fig. 3.
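A minimal Python sketch of this value normalization. The final clip to [0, 255] is an added safeguard for test-time coordinates that fall outside the training-set extrema; it is not stated in the text.

```python
import numpy as np

def to_image(tensor, c_min, c_max):
    """Apply p = floor(255 * (c - c_min) / (c_max - c_min)), with
    c_min / c_max the training-set coordinate extrema, then clip to
    the valid pixel range and cast to 8-bit."""
    p = np.floor(255.0 * (tensor - c_min) / (c_max - c_min))
    return np.clip(p, 0, 255).astype(np.uint8)
```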
S150: using a convolutional neural network to adaptively extract the texture feature representation of the image representation of the skeleton sequence.
In this step, max pooling can be used to reduce the convolution output dimension; this simultaneously handles redundant data in the image and reduces the impact of motion-frequency changes on recognition accuracy. Since the vertical direction of the input image reflects spatial structural features while the horizontal direction carries time-varying dynamic information, the scale invariance of max pooling manifests vertically as selection of the most discriminative joints and horizontally as invariance to motion frequency; this is the spatio-temporal synchronous (max) pooling shown in Fig. 4, which resolves the problem of motion-frequency variation.
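The spatio-temporal synchronous pooling can be illustrated with a plain non-overlapping 2x2 max pooling over the image representation; this is a generic sketch, not the patent's exact network configuration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling over a (H, W) feature map. On
    the image representation used here, pooling along the vertical
    (joint) axis keeps the most discriminative joint response, while
    pooling along the horizontal (time) axis absorbs small
    motion-frequency changes."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    x = feature_map[:h, :w]  # drop any odd remainder row/column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```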
Fig. 5 schematically illustrates the hierarchical spatio-temporal adaptive feature learning process. In a specific implementation, the convolution filters shown in Fig. 5 can be used to perform two-dimensional convolution operations.
The embodiment of the present invention thus obtains the spatio-temporal structural features of the skeleton sequence indirectly through texture feature extraction: the texture feature representation obtained in this step is precisely a spatio-temporal information representation.
S160: mapping the texture feature representation, through a fully connected neural network layer, to a vector whose dimension equals the number of behavior classes in the database, and normalizing it with a softmax layer to determine the class membership probabilities of the skeleton.
The database may be, for example, the Berkeley MHAD database published by the University of California, Berkeley, Microsoft's MSR Action3D database, or the ChaLearn Gesture Recognition database. The Berkeley MHAD database was collected with a motion capture system; it contains 659 sequences in 11 behavior classes at a video frame rate of 480 frames per second, and its skeletons contain 35 nodes. The MSR Action3D database was collected with an early Kinect-like device at 15 frames per second; it contains 557 behavior sequences in 20 behavior classes, and its skeletons contain 20 nodes. The ChaLearn Gesture Recognition database comprises 23 hours of Kinect data in 20 Italian gesture classes, with skeletons of 20 nodes; it is a multi-modal database, simultaneously providing the RGB video, the depth image sequence, the data sequence after foreground segmentation and the skeleton sequence output by the Kinect.
After the mapping and vector value normalization, the probabilities that the currently input skeleton sequence belongs to each behavior class are obtained; softmax layer normalization then yields the class membership probabilities of the input sequence. The output vector dimension is therefore equal to the number of behavior classes.
Specifically, the class membership probability output by the softmax layer can be determined according to the following formula:
p(C_k) = exp(O_k) / Σ_{i=1}^{C} exp(O_i)
wherein p(C_k) denotes the class membership probability, O_k and O_i denote the k-th and i-th elements of the fully connected layer output, respectively, and C denotes the total number of behavior classes.
The class of the input sequence can then be judged from the maximum of the class membership probabilities.
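A minimal sketch of the softmax formula and the maximum-probability decision. Subtracting the maximum before exponentiating is a standard numerical-stability step added here; it does not change the result and is not part of the formula above.

```python
import numpy as np

def softmax(o):
    """p(C_k) = exp(O_k) / sum_i exp(O_i) over the fully connected
    layer output o."""
    e = np.exp(o - np.max(o))  # stability shift; cancels in the ratio
    return e / e.sum()

def predict_class(o):
    """Assign the class with the largest membership probability."""
    return int(np.argmax(softmax(o)))
```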
S170: minimizing the following maximum likelihood loss function:
L(Ω) = −(1/M) Σ_{m=1}^{M} Σ_{k=1}^{C} δ(k − r) log p(C_k | x_m)
wherein M denotes the total number of human skeleton motion sequence samples in the database; δ(·) denotes the Kronecker function; C denotes the total number of behavior classes; k denotes the behavior class index; p(C_k | x_m) denotes the probability that the m-th sample x_m belongs to the k-th behavior class C_k; r is the true class label corresponding to sample x_m; and L(Ω) denotes the maximum likelihood loss function.
The backward pass of training is carried out with the BPTT (Back-Propagation Through Time) algorithm.
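Because the Kronecker delta zeroes every term except the true-class one, the double sum in the loss reduces to the negative log-likelihood of the true classes. A sketch, under the assumption that the loss averages over the M samples (the normalization is not stated explicitly):

```python
import numpy as np

def max_likelihood_loss(probs, labels):
    """probs: (M, C) array of class membership probabilities
    p(C_k | x_m); labels: (M,) array of true class indices r.
    Picks the true-class probability for each sample and returns the
    mean negative log-likelihood."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))
```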
S180: selecting four corner regions and a central region from the image representation of the skeleton sequence and from its flipped image, performing behavior classification on each, and determining the behavior class of the skeleton sequence by voting.
Take an image dimension of 60x60 after conversion of the skeleton sequence as an example. Images of 52x52 are cut out for testing, with the input image coordinates (0,0), (0,8), (8,0), (8,8) and (4,4) as the top-left vertices of the cropped images; the five 52x52 crops and their flipped images, ten images in total, are used for testing, and the test results of these ten images are finally voted on to judge the behavior class of the input sequence.
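A sketch of the ten-crop voting scheme for the 60x60 example; the per-crop classifier itself is omitted.

```python
import numpy as np
from collections import Counter

CROP = 52
ANCHORS = [(0, 0), (0, 8), (8, 0), (8, 8), (4, 4)]  # top-left vertices

def ten_crops(img):
    """img: (60, 60, C) image representation. Returns the five 52x52
    crops anchored at the coordinates above plus a horizontally
    flipped copy of each -- ten test images in total."""
    crops = [img[r:r + CROP, c:c + CROP] for r, c in ANCHORS]
    return crops + [c[:, ::-1] for c in crops]

def vote(per_crop_predictions):
    """Majority vote over the ten per-crop class decisions."""
    return Counter(per_crop_predictions).most_common(1)[0][0]
```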
Because skeleton sequences differ in length, the widths of the image representations obtained in step S140 also differ. To unify the scale and ease subsequent processing, in a preferred embodiment, step S140 may be followed by normalizing the image representation corresponding to the skeleton sequence.
An embodiment of the present invention further proposes a human skeleton motion sequence activity recognition method. As shown in Fig. 6, the method may include:
S200: obtaining the node coordinates of the human skeleton sequence to be analyzed.
The acquisition method involved in this step follows the specific implementation of step S100 in the above embodiment and is not repeated here.
S210: concatenating the nodes corresponding to each limb of the skeleton to be analyzed, forming the limb and trunk motion feature representations.
For the specific implementation of this step, refer to the description of step S110 in the above embodiment.
S220: concatenating the limb and trunk motion feature representations, forming the vector representation of the skeleton to be analyzed.
For the specific implementation of this step, refer to the description of step S120 in the above embodiment.
S230: arranging the vector representations corresponding to each frame of the motion sequence to be analyzed in temporal order, obtaining a three-dimensional matrix.
For the specific implementation of this step, refer to the description of step S130 in the above embodiment.
S240: applying value normalization and dimension normalization to the three-dimensional matrix, obtaining the image representation corresponding to the skeleton sequence to be analyzed.
This step converts the skeleton sequence to be analyzed into its corresponding image representation, converting the spatio-temporal dynamic information of the frame sequence into static structural information in an image. For the specific implementation, refer to the description of step S140 in the above embodiment.
S250: determining four corner regions and a central region from the image representation corresponding to the skeleton sequence to be analyzed and from its flipped image.
S260: extracting the texture feature representations of the four corner regions and the central region with the model trained by the above human skeleton motion sequence activity recognition method, and, based on these texture feature representations, determining the behavior class of the skeleton sequence to be analyzed by voting.
In this step, a human skeleton motion sequence activity recognition model, which may be a convolutional neural network model, can be built through steps S100 to S180; this model (i.e. the hierarchical filter bank) extracts the texture feature representations of the four corner region and central region images, and, based on the obtained representations, indirectly realizes recognition of the behavior of the skeleton sequence to be analyzed.
The effectiveness of the invention is verified below with experimental results.
Experiments were carried out on three standard public databases: the Berkeley MHAD database published by the University of California, Berkeley, Microsoft's MSR Action3D database, and the highly challenging ChaLearn Gesture Recognition database. Each database as a whole is divided into a training set, a validation set and a test set, 955 videos in total, each lasting 1-2 minutes.
Table one schematically shows the experimental results on the Berkeley MHAD database.
Table one:
Table two schematically shows the experimental results on the MSR Action3D database.
Table two:
Table three schematically shows the experimental results on the ChaLearn Gesture Recognition database.
Table three:
Taking the experiment on the ChaLearn Gesture Recognition dataset as an example, the computational efficiency of the model is analyzed. Here, the F1-score, also known as the balanced F-measure, is the harmonic mean of precision and recall. The embodiment of the present invention is implemented on the basis of the open-source convolutional neural network code ConvNet. With the GPU code running on a graphics card, the training process takes about 1.95 ms per sequence; the test process uses voting and takes about 2.27 ms per sequence. This efficiency fully meets the demands of real-time applications.
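For reference, the balanced F-measure mentioned above is the harmonic mean of precision and recall; the numbers below are illustrative only and are not taken from the reported experiments:

```python
def f1_score(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean penalizes imbalance: with precision 0.9 and recall 0.5
# the F1 is about 0.643, well below the arithmetic mean of 0.7.
```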
Overall, the test results show that the recognition rate of the method reaches a rather high accuracy on all three public databases; moreover, the model is simple to operate and highly computationally efficient, which facilitates practical application.
Although the steps in the above embodiment are described in the above order of precedence, those skilled in the art will appreciate that, in order to achieve the effect of this embodiment, the different steps need not necessarily be performed in that order; they may be performed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention have been described in detail above. Although specific examples have been applied herein to set forth the principles and implementations of the embodiments of the present invention, the above description of the embodiments is only intended to help understand those principles; those skilled in the art may, in accordance with the embodiments of the present invention, make changes to the specific implementation and the scope of application.
It should be noted that the flowcharts referred to herein are not limited to the forms shown herein, and may also be divided and/or combined in other ways.
It should further be noted that the reference signs and text in the accompanying drawings are only intended to illustrate the present invention more clearly, and are not intended as improper limitations on its scope.
The terms "include", "comprise" and any other similar terms are intended to cover a non-exclusive inclusion, so that a process, method, article or device/apparatus including a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device/apparatus.
Each step of the present invention may be implemented with a general-purpose computing device; for example, the steps may be concentrated on a single computing device, such as a personal computer, a server computer, a handheld or portable device, a laptop device or a multiprocessor device, or may be distributed over a network formed by multiple computing devices. They may be performed in an order different from that shown or described herein; alternatively, they may each be fabricated as an individual integrated circuit module, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Accordingly, the present invention is not limited to any specific combination of hardware and software.
The method provided by the present invention may be implemented using a programmable logic device, or may be implemented as computer program software or program modules (including routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types); for example, an embodiment of the present invention may be a computer program product which, when run, causes a computer to perform the method demonstrated. The computer program product includes a computer-readable storage medium on which computer program logic or code portions for implementing the method are stored. The computer-readable storage medium may be a built-in medium installed in the computer, or a removable medium that can be detached from the computer (for example, a storage device employing hot-plug technology). The built-in media include, but are not limited to, rewritable non-volatile memories, such as RAM, ROM, flash memory and hard disks. The removable media include, but are not limited to: optical storage media (such as CD-ROM and DVD), magneto-optical storage media (such as MO), magnetic storage media (such as magnetic tapes or portable hard drives), media with built-in rewritable non-volatile memory (such as memory cards), and media with built-in ROM (such as ROM cartridges).
The specific embodiments described above further illustrate the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a description of specific embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A human skeleton motion sequence behavior recognition method, characterized in that the method at least comprises:
obtaining human skeleton node coordinates, wherein the human skeletons at a series of consecutive moments constitute the human skeleton motion sequence;
concatenating the nodes corresponding to each limb of the human skeleton to form motion feature representations of the limbs and the torso;
concatenating the motion feature representations of the limbs and the torso to form a vector representation of the human skeleton;
arranging the vector representations corresponding to the frames of the human skeleton motion sequence in temporal order to obtain a three-dimensional matrix;
performing value normalization and dimension normalization on the three-dimensional matrix to obtain the image representation corresponding to the human skeleton sequence;
using a convolutional neural network to adaptively extract a texture feature representation from the image representation;
mapping the texture feature representation, through a fully connected neural network layer, into a vector whose dimension equals the total number of behavior categories contained in the database, and normalizing it through a softmax layer to determine the class membership probabilities of the human skeleton;
minimizing the following maximum likelihood loss function:
L(Ω) = −Σ_{m=1..M} Σ_{k=1..C} δ(k − r) ln p(C_k | x_m)
wherein M denotes the total number of human skeleton motion sequence samples in the database; δ(·) denotes the Kronecker function; C denotes the total number of behavior categories; k denotes the behavior category index; p(C_k | x_m) denotes the class membership probability that the m-th human skeleton motion sequence sample x_m belongs to the k-th behavior category C_k; r is the true behavior category label corresponding to the human skeleton motion sequence sample x_m; and L(Ω) denotes the maximum likelihood loss function;
determining four corner regions and a central region from the image representation corresponding to the human skeleton sequence and from its flipped image, performing behavior category judgment on each region, and determining by voting the behavior category to which the human skeleton sequence belongs.
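Purely as an illustrative sketch (not part of the claims; the raw network outputs below are invented for the example), the softmax normalization and the maximum likelihood loss of claim 1 can be written as:

```python
import math

def softmax(scores):
    """Normalize raw network outputs into class membership probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

def max_likelihood_loss(batch_scores, labels):
    """L(Omega) = -sum_m sum_k delta(k - r) * ln p(C_k | x_m): because of
    the Kronecker delta, only the true-label term of each sample survives."""
    loss = 0.0
    for scores, r in zip(batch_scores, labels):
        loss -= math.log(softmax(scores)[r])
    return loss
```

For two equally scored classes the probability of the true class is 0.5, so each such sample contributes ln 2 to the loss.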
2. The method according to claim 1, characterized in that obtaining the human skeleton node coordinates specifically comprises:
obtaining the human skeleton motion sequence by a depth camera or a motion capture system;
dividing the human skeleton coordinates at a single moment into limb parts and a torso part according to the physical structure of the human body;
for the limb parts and the torso part, arranging the three coordinate components x, y and z according to the physical connection order of the nodes, so as to form vector representations of the projections of the limbs and the torso onto the y-z, z-x and x-y planes, thereby obtaining the human skeleton node coordinates.
3. The method according to claim 2, characterized in that concatenating the nodes corresponding to each limb of the human skeleton to form the motion feature representations of the limbs and the torso specifically comprises:
concatenating the vector representations in the order of left arm, right arm, torso, left leg and right leg, thereby forming the motion feature representations of the limbs and the torso.
4. The method according to claim 3, characterized in that concatenating the motion feature representations of the limbs and the torso to form the vector representation of the human skeleton specifically comprises:
arranging the motion feature representations of the limbs and the torso in sequence to form the vector representation of the human skeleton.
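As an illustrative sketch of claims 3 and 4 (the joint indices per body part are hypothetical, since the actual skeleton layout depends on the capture device):

```python
import numpy as np

# Hypothetical joint indices for a 15-joint skeleton; a real capture
# device (e.g. a depth camera) defines its own joint layout.
PARTS = {
    "left_arm":  [4, 5, 6],
    "right_arm": [7, 8, 9],
    "torso":     [0, 1, 2, 3],
    "left_leg":  [10, 11],
    "right_leg": [12, 13, 14],
}
ORDER = ["left_arm", "right_arm", "torso", "left_leg", "right_leg"]

def frame_vector(joints):
    """Concatenate the (x, y, z) coordinates of each body part, in the
    fixed order left arm, right arm, torso, left leg, right leg, into
    one vector representation of the skeleton for a single frame."""
    parts = [joints[PARTS[name]].ravel() for name in ORDER]
    return np.concatenate(parts)
```

Stacking the per-frame vectors in temporal order then yields the matrix from which the image representation is built.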
5. The method according to claim 1, characterized in that performing value normalization and dimension normalization on the three-dimensional matrix to obtain the image representation corresponding to the human skeleton sequence specifically comprises:
normalizing the values and dimensions of the three-dimensional matrix according to the following equation:
p = floor(255 × (c − c_min) / (c_max − c_min))
wherein p denotes a pixel value in the image representation corresponding to the human skeleton sequence; c_max and c_min respectively denote the maximum and the minimum of all human skeleton node coordinates in the training set; and floor denotes the round-down function.
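An illustrative sketch of the normalization of claim 5, assuming the floor-based min-max mapping to 8-bit pixel values that the symbol definitions describe:

```python
import numpy as np

def to_pixels(coords, c_min, c_max):
    """Map raw skeleton coordinates into [0, 255] pixel values:
    p = floor(255 * (c - c_min) / (c_max - c_min)),
    with c_min / c_max taken over all node coordinates in the training set."""
    scaled = 255.0 * (coords - c_min) / (c_max - c_min)
    return np.floor(scaled).astype(np.uint8)
```

Using the global training-set extrema (rather than per-sequence extrema) keeps the pixel scale consistent across all sequences.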
6. The method according to claim 1, characterized in that the method further comprises:
in the process of extracting the texture feature representation from the image representation, using max pooling to reduce the dimension of the convolution output.
7. The method according to claim 1, characterized in that the method further comprises:
normalizing the image representation corresponding to the human skeleton sequence.
8. A human skeleton motion sequence behavior recognition method, characterized in that the method comprises:
obtaining node coordinates of a human skeleton sequence to be analyzed;
concatenating the nodes corresponding to each limb of the human skeleton to be analyzed to form motion feature representations of the limbs and the torso;
concatenating the motion feature representations of the limbs and the torso to form a vector representation of the human skeleton to be analyzed;
arranging the vector representations corresponding to the frames of the human skeleton motion sequence to be analyzed in temporal order to obtain a three-dimensional matrix;
performing value normalization and dimension normalization on the three-dimensional matrix to obtain the image representation corresponding to the human skeleton sequence to be analyzed;
determining four corner regions and a central region from the image representation and from its flipped image;
extracting texture feature representations of the four corner regions and the central region using the method of any one of claims 1 to 7, and, based on the texture feature representations, determining by voting the behavior category to which the human skeleton sequence to be analyzed belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610562181.6A CN106203363A (en) | 2016-07-15 | 2016-07-15 | Human skeleton motion sequence Activity recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106203363A true CN106203363A (en) | 2016-12-07 |
Family
ID=57475629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610562181.6A Pending CN106203363A (en) | 2016-07-15 | 2016-07-15 | Human skeleton motion sequence Activity recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106203363A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203753A (en) * | 2017-05-25 | 2017-09-26 | 西安工业大学 | A kind of action identification method based on fuzzy neural network and graph model reasoning |
CN107225571A (en) * | 2017-06-07 | 2017-10-03 | 纳恩博(北京)科技有限公司 | Motion planning and robot control method and apparatus, robot |
CN107392131A (en) * | 2017-07-14 | 2017-11-24 | 天津大学 | A kind of action identification method based on skeleton nodal distance |
CN108229355A (en) * | 2017-12-22 | 2018-06-29 | 北京市商汤科技开发有限公司 | Activity recognition method and apparatus, electronic equipment, computer storage media, program |
CN108830215A (en) * | 2018-06-14 | 2018-11-16 | 南京理工大学 | Hazardous act recognition methods based on personnel's framework information |
CN109145868A (en) * | 2018-09-11 | 2019-01-04 | 广州杰赛科技股份有限公司 | A kind of Activity recognition method and apparatus assisting running training |
CN109800659A (en) * | 2018-12-26 | 2019-05-24 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | A kind of action identification method and device |
CN109829422A (en) * | 2019-01-28 | 2019-05-31 | 哈尔滨工业大学 | A kind of video frequency identifying method based on the movement of impulsive neural networks falling over of human body |
CN110210372A (en) * | 2019-05-29 | 2019-09-06 | 中国科学院自动化研究所 | Based on skeleton Activity recognition method, the system for paying attention to enhancing figure convolutional network |
CN110222551A (en) * | 2018-03-02 | 2019-09-10 | 杭州海康威视数字技术股份有限公司 | Method, apparatus, electronic equipment and the storage medium of identification maneuver classification |
CN110215216A (en) * | 2019-06-11 | 2019-09-10 | 中国科学院自动化研究所 | Based on the with different levels Activity recognition method in skeletal joint point subregion, system |
CN110276768A (en) * | 2019-06-28 | 2019-09-24 | 京东方科技集团股份有限公司 | Image partition method, image segmentation device, image segmentation apparatus and medium |
CN110309732A (en) * | 2019-06-13 | 2019-10-08 | 浙江大学 | Activity recognition method based on skeleton video |
CN110348395A (en) * | 2019-07-12 | 2019-10-18 | 电子科技大学 | A kind of skeleton Activity recognition method based on time-space relationship |
CN110363131A (en) * | 2019-07-08 | 2019-10-22 | 上海交通大学 | Anomaly detection method, system and medium based on human skeleton |
CN110555349A (en) * | 2018-06-01 | 2019-12-10 | 杭州海康威视数字技术股份有限公司 | working time length statistical method and device |
CN110765942A (en) * | 2019-10-23 | 2020-02-07 | 睿魔智能科技(深圳)有限公司 | Image data labeling method, device, equipment and storage medium |
WO2020211242A1 (en) * | 2019-04-15 | 2020-10-22 | 深圳大学 | Behavior recognition-based method, apparatus and storage medium |
CN111898576A (en) * | 2020-08-06 | 2020-11-06 | 电子科技大学 | Behavior identification method based on human skeleton space-time relationship |
CN111914807A (en) * | 2020-08-18 | 2020-11-10 | 太原理工大学 | Miner behavior identification method based on sensor and skeleton information |
CN112381004A (en) * | 2020-11-17 | 2021-02-19 | 华南理工大学 | Framework-based double-flow self-adaptive graph convolution network behavior identification method |
CN112613436A (en) * | 2020-12-28 | 2021-04-06 | 中国联合网络通信集团有限公司 | Examination cheating detection method and device |
CN112651360A (en) * | 2020-12-31 | 2021-04-13 | 福州大学 | Skeleton action recognition method under small sample |
CN112818887A (en) * | 2021-02-08 | 2021-05-18 | 中国科学院自动化研究所 | Human body skeleton sequence behavior identification method based on unsupervised learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104615983A (en) * | 2015-01-28 | 2015-05-13 | 中国科学院自动化研究所 | Behavior identification method based on recurrent neural network and human skeleton movement sequences |
CN104850846A (en) * | 2015-06-02 | 2015-08-19 | 深圳大学 | Human behavior recognition method and human behavior recognition system based on depth neural network |
CN104899561A (en) * | 2015-05-27 | 2015-09-09 | 华南理工大学 | Parallelized human body behavior identification method |
CN105320944A (en) * | 2015-10-24 | 2016-02-10 | 西安电子科技大学 | Human body behavior prediction method based on human body skeleton movement information |
Non-Patent Citations (2)
Title |
---|
YONG DU等: "Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition", 《2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
YONG DU等: "Skeleton based action recognition with convolutional neural network", 《2015 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20161207 |