CN110084211B - Action recognition method - Google Patents

Action recognition method

Info

Publication number
CN110084211B
Authority
CN
China
Prior art keywords
action
cluster
subgroup
posture
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910362475.8A
Other languages
Chinese (zh)
Other versions
CN110084211A (en)
Inventor
杨剑宇
黄瑶
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201910362475.8A priority Critical patent/CN110084211B/en
Publication of CN110084211A publication Critical patent/CN110084211A/en
Application granted granted Critical
Publication of CN110084211B publication Critical patent/CN110084211B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Abstract

The invention provides an action recognition method, which comprises: obtaining three-dimensional skeletal joint point information of a target; designing a cross-layer connection neural network to extract features from the three-dimensional coordinates of the skeletal joint points in each frame of the action sequences in the training set, yielding a feature vector for each frame; clustering all feature vectors in the training set into K clusters; calculating the support degree of each cluster for each action category; defining posture subgroups and extracting them from the training set to form a posture subgroup set; learning Z hierarchical classifiers; obtaining the feature vector of each frame of a test action sequence with the cross-layer connection neural network and assigning it to the nearest cluster; computing the classification results of the Z hierarchical classifiers on the test sample; and selecting the category that occurs most frequently among the classification results of the Z hierarchical classifiers as the category of the test action sequence. The invention reduces the influence of intra-class differences on the recognition result, is not affected by the rate at which an action is performed, and can process action sequences of arbitrary duration.

Description

Action recognition method
Technical Field
The invention relates to an action recognition method and belongs to the technical field of human action recognition.
Background
Human action recognition is a research hotspot in the field of computer vision and is widely applied in human-computer interaction, video surveillance, virtual reality and other fields. With the development of depth cameras, the three-dimensional positions of human skeletal joint points, which contain a large amount of motion information, can be acquired conveniently, so action recognition based on skeletal joint points has attracted increasing attention from researchers. Recognition based on skeletal joint point data is relatively insensitive to factors such as illumination changes and changes of the camera viewpoint. However, because of intra-class differences between performances of the same action, occlusion of skeletal joint points and other factors, accurately recognizing human actions remains a very challenging task.
Most existing work extracts various features of the spatial structure of the human skeletal joint points, such as the relative positions of joint pairs, joint angles and the amplitudes of joint movement, and uses hidden Markov models, conditional random fields or temporal pyramids to build temporal dynamic models of the skeletal joint point sequences. The effectiveness of the extracted features depends on the accuracy of the original joint coordinates and on information from the whole action sequence; it is easily degraded by individual inaccurate measurements, and such features have difficulty recognizing actions effectively when the time scale varies greatly or the spatial structure of the skeletal joints is complex. An algorithm is therefore needed that reduces the dependence on the accuracy of the original joint coordinates and performs action recognition based on key frames of the action sequence.
Disclosure of Invention
The invention is proposed to solve the above problems in the prior art. The technical solution is as follows.
An action recognition method comprises the following steps:
step one, acquiring three-dimensional skeletal joint point information of a target with a depth sensor to obtain the three-dimensional coordinates of each skeletal joint point of the human body, and dividing the obtained action sequences into a training set and a test set;
step two, designing a cross-layer connection neural network model and extracting features from the three-dimensional coordinates of the skeletal joint points in each frame of the action sequences in the training set to obtain a feature vector for each frame;
step three, clustering the feature vectors of all frames in the training set into K clusters;
step four, calculating the weight of each cluster and the support degree of each cluster for each action category;
step five, defining posture subgroups and extracting them from the training set to form a posture subgroup set;
step six, defining, for each posture subgroup, the classifier corresponding to the class-c action, where c ∈ [1, C] and C is the total number of action categories in the training set;
step seven, learning Z hierarchical classifiers;
step eight, obtaining the feature vector of each frame of the test action sequence with the cross-layer connection neural network model and assigning it to the nearest cluster;
step nine, computing the classification results of the Z hierarchical classifiers on the test action sequence;
step ten, selecting the category that occurs most frequently among the classification results of the Z hierarchical classifiers as the category of the test action sequence.
Preferably, the cross-layer connection neural network model in step two comprises a first hidden layer, a second hidden layer, a third hidden layer and a fourth hidden layer. The output h^1 of the first hidden layer passes through a relu activation module and enters the second hidden layer; the output h^2 of the second hidden layer passes through a tanh activation module and enters the third hidden layer; the output h^3 of the third hidden layer passes through a relu activation module and enters the fourth hidden layer. The output h^4 of the fourth hidden layer, the output h^1 of the first hidden layer and the output q of the similarity calculation module are added, the result is fed into a tanh activation module for nonlinear mapping, and the output layer of the cross-layer connection neural network model then outputs the feature vector of the frame.
The input of the cross-layer connection neural network model is the one-dimensional vector x = (x_1, x_2, ..., x_3R)^T formed by concatenating the three-dimensional coordinates of the R joint points of one frame of an action sequence in the training set, and the output is y = (y_1, y_2, ..., y_3R)^T. The numbers of neurons in the first, second, third and fourth hidden layers are N, M, M and N, respectively. Each hidden layer is computed as
h^l = W^l a^l + b^l,
where a^l is the input of hidden layer l, W^l is the weight matrix of hidden layer l, b^l is the bias vector of hidden layer l, h^l is the output of hidden layer l, and l ∈ {1, 2, 3, 4}; W^1 ∈ R^(N×3R), W^2 ∈ R^(M×N), W^3 ∈ R^(M×M), W^4 ∈ R^(N×M), b^1, b^4 ∈ R^N and b^2, b^3 ∈ R^M.
relu and tanh are the preferred activation functions; other activation functions may be selected and remain within the scope of the present application. The input of a relu activation module is a vector of dimension D_1 and its output has the same dimension; each element v_d1 of the input vector is mapped to the corresponding output element by
relu(v_d1) = max(0, v_d1),
where d_1 ∈ [1, D_1].
The input of a tanh activation module is a vector of dimension D_2 and its output has the same dimension; each element v_d2 of the input vector is mapped to the corresponding output element by
tanh(v_d2) = (e^(v_d2) - e^(-v_d2)) / (e^(v_d2) + e^(-v_d2)),
where d_2 ∈ [1, D_2].
The output h^1 of the first hidden layer, the output h^4 of the fourth hidden layer and the output q of the similarity calculation module are added, and the result is fed into a tanh activation module for nonlinear mapping. The input of the similarity calculation module is the input a^3 of the third hidden layer together with the similarity calculation matrix U^T = [u_1, u_2, ..., u_N]^T, where u_1, u_2, ..., u_N are column vectors, u_n = [u_1n, u_2n, ..., u_Mn]^T, u_mn ∈ [0, 1], m ∈ [1, M], n ∈ [1, N], and each u_mn is set randomly to a value in the interval [0, 1]. The output of the module is q = U^T a^3. The similarity calculation module takes the feature vector obtained from x through two hidden layers and an activation function and computes its similarity to u_1, u_2, ..., u_N, raising the dimensionality of the feature vector from M to N. Because every element of u_n lies between 0 and 1 and the output of the tanh activation function also lies between 0 and 1, the output of the tanh activation module is chosen as the input of the similarity calculation module. The output layer is computed as
y = tanh(W^O o + b^O),
where o ∈ R^N is the input of the output layer, W^O ∈ R^(3R×N) is a weight matrix, and b^O ∈ R^(3R) is a bias vector.
The loss function of the cross-layer connection neural network is loss = ||x - y||^2. The feature vector f of a frame of an action sequence is defined as the input a^3 of the third hidden layer, so f ∈ R^M.
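A minimal NumPy sketch of the forward pass described above is given below. It follows the reconstruction of the architecture in this section; the parameter names (`W1` ... `WO`, `U`) and the wiring of the similarity module as a single matrix product are assumptions, and training with the loss ||x - y||^2 is omitted.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def forward(x, params):
    """Forward pass of the cross-layer connection network (sketch).

    x: (3R,) concatenated joint coordinates of one frame.
    Returns (y, f): reconstruction y of shape (3R,) and the frame feature f
    of shape (M,), where f is the input of the third hidden layer.
    """
    W1, b1 = params["W1"], params["b1"]   # (N, 3R), (N,)
    W2, b2 = params["W2"], params["b2"]   # (M, N),  (M,)
    W3, b3 = params["W3"], params["b3"]   # (M, M),  (M,)
    W4, b4 = params["W4"], params["b4"]   # (N, M),  (N,)
    U      = params["U"]                  # (M, N), entries in [0, 1]
    WO, bO = params["WO"], params["bO"]   # (3R, N), (3R,)

    h1 = W1 @ x + b1            # first hidden layer
    h2 = W2 @ relu(h1) + b2     # second hidden layer
    f  = np.tanh(h2)            # input of third hidden layer = frame feature
    q  = U.T @ f                # similarity module: dimension M -> N
    h3 = W3 @ f + b3            # third hidden layer
    h4 = W4 @ relu(h3) + b4     # fourth hidden layer
    o  = np.tanh(h1 + h4 + q)   # cross-layer addition followed by tanh
    y  = np.tanh(WO @ o + bO)   # output layer
    return y, f
```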
Preferably, the third step clusters the feature vectors of all frames in the training set into K clusters, and the specific steps are as follows:
a. randomly selecting one vector from all the feature vectors in the training set as a first clustering center;
b. calculating the shortest distance between each feature vector and the already-chosen cluster centers, i.e. the Euclidean distance to the nearest cluster center, sorting the feature vectors by this distance from large to small, and randomly selecting one of the feature vectors corresponding to the first K distances as the next cluster center;
c. repeating the step b until K eigenvectors are selected as K clustering centers;
d. calculating Euclidean distances from each feature vector in the training set to K clustering centers, and dividing each vector into clusters corresponding to the nearest clustering center;
e. recalculating the center μ_k of each cluster; the new center is the mean of all feature vectors in the cluster, computed as
μ_k = (1/n_k) Σ_{i=1}^{n_k} f_i,
where n_k is the number of feature vectors in the kth cluster, f_i denotes a feature vector in the cluster, i ∈ [1, n_k] and k ∈ [1, K];
f. defining the distance χ_k between a feature vector f and the kth cluster as the sum of the Euclidean distance between f and the cluster center and the Euclidean distances between f and the 3 feature vectors in the cluster that are farthest from f, expressed as
χ_k = ||f - μ_k|| + Σ_{j=1}^{3} ||f - f_k^(j)||,
where f_k^(1), f_k^(2), f_k^(3) are the 3 feature vectors in the kth cluster farthest from f;
g. calculating the distance between each feature vector and each of the K clusters, and assigning the feature vector to the closest cluster;
h. recalculating the center of each cluster;
i. judging whether the center of each cluster is changed, and finishing clustering if the center of each cluster is not changed; otherwise, repeating g and h in sequence until the centers of all the clusters are not changed.
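The clustering procedure of steps a to i can be sketched as follows, assuming the frame feature vectors are stacked in a NumPy array F; the helper names, the handling of small or empty clusters, and the use of label stability as the stopping test are simplifications of the description above.

```python
import numpy as np

def custom_distance(f, cluster_feats, center):
    """Distance of feature f to one cluster: distance to the center plus the
    distances to the 3 cluster members farthest from f (fewer if the cluster
    is small; empty clusters are not handled in this sketch)."""
    d_center = np.linalg.norm(f - center)
    d_members = np.linalg.norm(cluster_feats - f, axis=1)
    farthest3 = np.sort(d_members)[-3:] if len(d_members) >= 3 else d_members
    return d_center + farthest3.sum()

def cluster_features(F, K, rng=np.random.default_rng(0), max_iter=100):
    """F: (num_frames, dim) feature vectors of all training frames."""
    # Steps a-c: pick the first center at random, then repeatedly pick the
    # next center among the K vectors with the largest shortest-distance.
    centers = [F[rng.integers(len(F))]]
    while len(centers) < K:
        d = np.min([np.linalg.norm(F - c, axis=1) for c in centers], axis=0)
        candidates = np.argsort(d)[-K:]
        centers.append(F[rng.choice(candidates)])
    centers = np.stack(centers)

    # Step d: initial assignment by Euclidean distance to the centers.
    labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)

    for _ in range(max_iter):
        centers = np.stack([F[labels == k].mean(0) for k in range(K)])  # steps e/h
        # Steps f-g: reassign every feature with the custom cluster distance.
        new_labels = np.empty(len(F), dtype=int)
        for i, f in enumerate(F):
            dists = [custom_distance(f, F[labels == k], centers[k]) for k in range(K)]
            new_labels[i] = int(np.argmin(dists))
        if np.array_equal(new_labels, labels):  # step i: stable labels imply stable centers
            break
        labels = new_labels
    return centers, labels
```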
Preferably, the weight w_k of each cluster in step four is computed from the class proportions within the cluster:
w_k = [formula given as an image in the original],
wherein
r_k,c = n_k,c / n_k,
n_k,c is the number of feature vectors of class-c actions in the kth cluster, n_k is the number of feature vectors of all action classes in the kth cluster, w_k is the weight of the kth cluster, and k ∈ [1, K].
Further, the support degree of each cluster for each action category in step four is calculated as
s_k,c = w_k * r_k,c,
where s_k,c is the support of the kth cluster for the class-c action.
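For illustration, the support degrees of step four can be computed from the cluster assignments and per-frame action labels as sketched below; the function `cluster_weight` stands in for the patent's weight formula, which is only reproduced as an image in the source, so its exact form is an assumption.

```python
import numpy as np

def support_degrees(labels, frame_classes, K, C, cluster_weight):
    """s[k, c] = w_k * r_{k,c}, where r_{k,c} is the fraction of class-c frames
    in cluster k and cluster_weight(r_k) maps the length-C ratio vector of a
    cluster to its weight w_k (exact formula not reproduced here)."""
    s = np.zeros((K, C))
    for k in range(K):
        in_k = frame_classes[labels == k]
        n_k = max(len(in_k), 1)
        r_k = np.bincount(in_k, minlength=C) / n_k   # r_{k,c}
        s[k] = cluster_weight(r_k) * r_k             # s_{k,c} = w_k * r_{k,c}
    return s
```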
Further, defining a posture subgroup in step five means that a set of N_α cluster centers is defined as a posture subgroup P_α of size N_α; the αth posture subgroup is expressed as
P_α = {μ_α1, μ_α2, ..., μ_αN_α},
where μ_α1, ..., μ_αN_α are N_α cluster centers selected from the set {μ_k | k ∈ [1, K]}.
The training set contains J action sequences in total. For each action sequence V_j (1 ≤ j ≤ J), the cluster to which each frame of the sequence belongs is determined, and the centers of these clusters form a set E_j; J cluster-center sets are obtained in this way.
For each cluster-center set E_j, every non-empty subset of E_j is a posture subgroup. The posture subgroups with 2 or 3 elements are taken out; they form the posture subgroup set G_j of the action sequence V_j. All posture subgroups taken from the J cluster-center sets are combined into the posture subgroup set G = G_1 ∪ G_2 ∪ ... ∪ G_J.
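The extraction of posture subgroups of size 2 or 3 can be written compactly with itertools, as in the following sketch; representing each sequence by the set of cluster indices its frames fall into (rather than by the cluster centers themselves) is an implementation convenience, and all names are assumptions.

```python
from itertools import combinations

def pose_subgroups(sequence_cluster_ids):
    """Return all posture subgroups of size 2 or 3 for one action sequence.

    sequence_cluster_ids: iterable of cluster indices, one per frame.
    Each subgroup is a frozenset of cluster indices (standing for the
    corresponding cluster centers)."""
    E_j = sorted(set(sequence_cluster_ids))   # distinct clusters hit by the sequence
    subgroups = []
    for size in (2, 3):
        subgroups.extend(frozenset(c) for c in combinations(E_j, size))
    return subgroups

def pose_subgroup_set(all_sequences_cluster_ids):
    """Union of the subgroup sets of all training sequences (the set G)."""
    G = set()
    for seq in all_sequences_cluster_ids:
        G.update(pose_subgroups(seq))
    return G
```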
Further, in step six the classifier corresponding to posture subgroup P_α and the class-c action is denoted h_{P_α,c}, and it is obtained as follows:
the labels of the action sequences belonging to the class-c action in the current training set are set to 1, and the labels of the remaining action sequences are set to 0; the number of action sequences in the current training set is denoted J′;
for an action sequence V_j (j ∈ [1, J′]) in the current training set, if the posture subgroup P_α is included in the cluster-center set E_j of the action sequence V_j, i.e. P_α ⊆ E_j, and the sum of the support degrees, for the class-c action, of the clusters to which the cluster centers in P_α belong is greater than a threshold θ(P_α), then h_{P_α,c}(V_j) = 1; otherwise h_{P_α,c}(V_j) = 0. Expressed as a formula:
h_{P_α,c}(V_j) = 1 if P_α ⊆ E_j and Σ_{μ_k ∈ P_α} s_{k,c} > θ(P_α); otherwise h_{P_α,c}(V_j) = 0,
where s_{k,c} denotes the support, for the class-c action, of the cluster to which each cluster center μ_k in P_α belongs;
θ is chosen so that the average classification error ε of the classifier h_{P_α,c} over all action sequences in the current training set is minimal, expressed as
ε = (1/J′) Σ_{j=1}^{J′} | h_{P_α,c}(V_j) - l_j |,
where l_j is the label of the action sequence V_j in the current training set.
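A sketch of the subgroup classifier and of the threshold selection is given below; the strategy of trying thresholds just below the support sums observed on the training set, and all helper names, are assumptions, since the patent only states that θ minimizes the average error ε.

```python
import numpy as np

def subgroup_score(P_alpha, E_j, support, c):
    """Sum of class-c supports of the clusters in P_alpha, or None if the
    subgroup is not contained in the sequence's cluster-center set E_j."""
    if not P_alpha.issubset(E_j):
        return None
    return sum(support[k, c] for k in P_alpha)

def classify(P_alpha, E_j, support, c, theta):
    score = subgroup_score(P_alpha, E_j, support, c)
    return int(score is not None and score > theta)

def fit_threshold(P_alpha, sequences_E, labels, support, c):
    """Choose theta minimizing the average 0/1 error on the current training set."""
    scores = [subgroup_score(P_alpha, E_j, support, c) for E_j in sequences_E]
    observed = sorted({s for s in scores if s is not None})
    # thresholds just below each observed score, plus one that rejects everything
    candidates = [s - 1e-9 for s in observed] + (observed[-1:] if observed else [0.0])
    best_theta, best_err = 0.0, float("inf")
    for theta in candidates:
        preds = [classify(P_alpha, E_j, support, c, theta) for E_j in sequences_E]
        err = np.mean([abs(p - l) for p, l in zip(preds, labels)])
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta, best_err
```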
Further, the specific steps for learning the Z hierarchical classifiers in step seven are as follows:
a. randomly drawing, with replacement, J action sequences from the training set as the current sub-training set S;
b. randomly selecting L posture subgroups from the posture subgroup set G and ordering them in the order of selection to form the candidate posture subgroup set G′ = {P′_1, P′_2, ..., P′_L}, where P′_1, P′_2, ..., P′_L are posture subgroups selected from the posture subgroup set G;
c. calculating the degree of difference of the action sequences in the current sub-training set S from the class proportions, using
[formula given as an image in the original],
where p_c is the proportion of sequences of the class-c action in the current sub-training set;
d. selecting a posture subgroup from the candidate posture subgroup set and finding a classifier that satisfies the splitting condition, which is then used as the classifier of the current sub-training set S; the specific steps are as follows:
the first posture subgroup P′_1 is selected from the candidate posture subgroup set and, starting from the 1st action category, the corresponding classifier h_{P′_1,1} is computed; the training set S is divided into two sets, S_1 and S_2, where S_1 is the set of action sequences satisfying h_{P′_1,1}(V) = 1 and S_2 is the set of action sequences satisfying h_{P′_1,1}(V) = 0;
the degrees of difference of S_1 and S_2 are calculated separately; the smaller of the two is selected and its difference from the degree of difference of S is computed; if this difference is greater than the threshold λ, the current classifier satisfies the splitting condition, the posture subgroup is removed from the candidate posture subgroup set, and the classifier is used as the classifier that splits the set S;
otherwise, the next action category is selected and the corresponding classifier is checked against the splitting condition, until a classifier satisfying the condition is found or all action categories have been traversed;
if no classifier satisfying the splitting condition is found for any action category of the currently selected posture subgroup, the next posture subgroup is taken from the candidate posture subgroup set and the classifiers corresponding to the different action categories are computed, until a classifier satisfying the requirement is found;
e. when a classifier satisfying the requirement has been found and the current training set S has been divided into the sets S_1 and S_2, it is checked for each of S_1 and S_2 whether the action sequences in the set belong to the same category; if not, that set is taken as the current sub-training set and steps c and d are repeated in turn to find a classifier satisfying the requirement, which is used as the classifier corresponding to that set; the splitting continues until no set needs to be divided further, at which point one hierarchical classifier F_z has been obtained;
f. repeating steps a to e until Z hierarchical classifiers are obtained.
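The hierarchical splitting described in steps a to f could be implemented along the lines of the sketch below, which reuses the `classify` and `fit_threshold` helpers from the previous sketch. The node representation, the Gini-style impurity used as `difference_degree` (the patent's formula is only given as an image), and the reading of the split condition as an impurity decrease relative to the parent set are assumptions.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional

def difference_degree(seq_labels, C):
    """Stand-in impurity over the class proportions p_c (exact formula not reproduced)."""
    p = np.bincount(seq_labels, minlength=C) / max(len(seq_labels), 1)
    return 1.0 - np.sum(p ** 2)

@dataclass
class Node:
    rule: Optional[tuple] = None   # (P_alpha, c, theta) splitting rule
    left: "Node" = None            # sequences with h(V) = 1
    right: "Node" = None           # sequences with h(V) = 0
    label: Optional[int] = None    # class label at a leaf

def grow(seqs_E, seq_labels, candidates, support, C, lam=0.1):
    """Recursively split the current sub-training set (lists of E_j and class labels)."""
    if len(set(seq_labels)) <= 1:                       # pure set: make a leaf
        return Node(label=seq_labels[0])
    parent_diff = difference_degree(np.array(seq_labels), C)
    for P_alpha in candidates:
        for c in range(C):
            binary = [1 if lbl == c else 0 for lbl in seq_labels]
            theta, _ = fit_threshold(P_alpha, seqs_E, binary, support, c)
            preds = [classify(P_alpha, E_j, support, c, theta) for E_j in seqs_E]
            s1 = [i for i, p in enumerate(preds) if p == 1]
            s2 = [i for i, p in enumerate(preds) if p == 0]
            if not s1 or not s2:
                continue
            d1 = difference_degree(np.array([seq_labels[i] for i in s1]), C)
            d2 = difference_degree(np.array([seq_labels[i] for i in s2]), C)
            if parent_diff - min(d1, d2) > lam:         # split condition (impurity decrease)
                rest = [P for P in candidates if P != P_alpha]
                return Node(
                    rule=(P_alpha, c, theta),
                    left=grow([seqs_E[i] for i in s1], [seq_labels[i] for i in s1],
                              rest, support, C, lam),
                    right=grow([seqs_E[i] for i in s2], [seq_labels[i] for i in s2],
                               rest, support, C, lam),
                )
    # no admissible split found: fall back to a majority-class leaf
    return Node(label=int(np.bincount(seq_labels).argmax()))
```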
Further, in the ninth step, the classification method of the test action sequence by the Z hierarchical classifiers is as follows:
for each hierarchical classifier, the classifier corresponding to the first sub-training set is first used to divide the test action sequence; after the test action sequence has been assigned to a set, the classifier corresponding to that set continues to divide it, and this is repeated until the test action sequence can no longer be divided; at that point, the category of the training action sequences in the set to which the test action sequence has been assigned is taken as the category of the test action sequence.
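Classifying a test sequence with one learned hierarchy then amounts to walking the tree, as in this assumed helper built on the `Node` and `classify` definitions above:

```python
def predict_one(node, E_test, support):
    """Walk one hierarchical classifier until a leaf is reached."""
    while node.label is None:
        P_alpha, c, theta = node.rule
        node = node.left if classify(P_alpha, E_test, support, c, theta) == 1 else node.right
    return node.label
```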
The invention designs a cross-layer connection neural network to extract features of the human posture in each frame of the action sequence, which reduces the influence of intra-class differences on the recognition result; key frames of the action sequence are extracted and actions are classified on the basis of these key frames, so the method is not affected by the rate at which an action is performed and can process action sequences of arbitrary duration.
Drawings
FIG. 1 is a flow chart of the operation of a method of motion recognition in accordance with the present invention.
FIG. 2 is a schematic diagram of a cross-layer connection neural network model according to the present invention.
FIG. 3 is a block diagram of a similarity calculation module according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a method for recognizing an action includes the following steps:
1. The skeletal joint point information of the target is acquired with a depth sensor to obtain the three-dimensional coordinates of each skeletal joint point of the human body. The obtained action sequences are divided into a training set and a test set; the numbers of action sequences in the training set and the test set are J = 137 and T = 68, respectively.
2. The three-dimensional coordinates of the 20 joint points of each frame of the action sequences in the training set are concatenated into a one-dimensional vector x = (x_1, x_2, ..., x_60)^T. A cross-layer connection neural network model as shown in FIG. 2 is designed. The input of the neural network is x and the output is y = (y_1, y_2, ..., y_60)^T; the numbers of neurons in the first, second, third and fourth hidden layers are 50, 30, 30 and 50, respectively. Each hidden layer is computed as
h^l = W^l a^l + b^l,
where a^l is the input of hidden layer l, W^l is the weight matrix of hidden layer l, b^l is the bias vector of hidden layer l, h^l is the output of hidden layer l, and l ∈ {1, 2, 3, 4}; W^1 ∈ R^(50×60), W^2 ∈ R^(30×50), W^3 ∈ R^(30×30), W^4 ∈ R^(50×30), b^1, b^4 ∈ R^50 and b^2, b^3 ∈ R^30.
relu and tanh are the preferred activation functions; other activation functions may be used and remain within the scope of this patent. The input of a relu activation module is a vector of dimension D_1 and its output has the same dimension; each element v_d1 of the input vector is mapped to the corresponding output element by
relu(v_d1) = max(0, v_d1),
where d_1 ∈ [1, D_1].
The input of a tanh activation module is a vector of dimension D_2 and its output has the same dimension; each element v_d2 of the input vector is mapped to the corresponding output element by
tanh(v_d2) = (e^(v_d2) - e^(-v_d2)) / (e^(v_d2) + e^(-v_d2)),
where d_2 ∈ [1, D_2].
The addition operator adds the output h^1 of the first hidden layer, the output h^4 of the fourth hidden layer and the output q of the similarity calculation module, and the result is fed into a tanh activation module for nonlinear mapping. The similarity calculation module is shown in FIG. 3; its input is the input a^3 of the third hidden layer together with the similarity calculation matrix U^T = [u_1, u_2, ..., u_50]^T, where u_1, u_2, ..., u_50 are column vectors, u_n = [u_1n, u_2n, ..., u_30n]^T, u_mn ∈ [0, 1], m ∈ [1, 30], n ∈ [1, 50], and each u_mn is set randomly to a value in the interval [0, 1]. The output of the module is q = U^T a^3. The similarity calculation module takes the feature vector obtained from x through two hidden layers and an activation function and computes its similarity to u_1, u_2, ..., u_50, raising the dimensionality of the feature vector from 30 to 50. Because every element of u_n lies between 0 and 1 and the output of the tanh activation function also lies between 0 and 1, the output of the tanh activation module is chosen as the input of the similarity calculation module. The output layer is computed as
y = tanh(W^O o + b^O),
where o ∈ R^50 is the input of the output layer, W^O ∈ R^(60×50) is a weight matrix, and b^O ∈ R^60 is a bias vector.
The loss function of the cross-layer connection neural network is loss = ||x - y||^2. The feature vector f of a frame of an action sequence is defined as the input a^3 of the third hidden layer, so f ∈ R^30.
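For the concrete sizes of this embodiment (20 joints and layer widths 50, 30, 30, 50), the forward-pass sketch given earlier in the description can be instantiated as follows; the random initialization is only a shape check and is an assumption, not the training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
R, N, M = 20, 50, 30                      # joints, wide width, narrow width
params = {
    "W1": rng.normal(size=(N, 3 * R)), "b1": np.zeros(N),
    "W2": rng.normal(size=(M, N)),     "b2": np.zeros(M),
    "W3": rng.normal(size=(M, M)),     "b3": np.zeros(M),
    "W4": rng.normal(size=(N, M)),     "b4": np.zeros(N),
    "U":  rng.uniform(0, 1, size=(M, N)),
    "WO": rng.normal(size=(3 * R, N)), "bO": np.zeros(3 * R),
}
x = rng.normal(size=3 * R)                # one frame: 20 joints x 3 coordinates
y, f = forward(x, params)                 # forward() from the earlier sketch
assert y.shape == (60,) and f.shape == (30,)
```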
3. Each frame of each action sequence in the training set has a corresponding feature vector. The feature vectors of all frames in the training set are clustered into K = 400 clusters by the following steps:
step 1: randomly selecting one vector from all the feature vectors in the training set as a first clustering center.
Step 2: the shortest distance between each feature vector and the already-chosen cluster centers (i.e. the Euclidean distance to the nearest center) is calculated, the feature vectors are sorted by this distance from large to small, and one of the feature vectors corresponding to the first 400 distances is randomly selected as the next cluster center.
Step 3: step 2 is repeated until 400 feature vectors are selected as 400 cluster centers.
Step 4: and calculating Euclidean distances from each feature vector in the training set to 400 clustering centers, and dividing each vector into clusters corresponding to the nearest clustering centers.
Step 5: the center μ_k of each cluster is recalculated; the new center is the mean of all feature vectors in the cluster, computed as
μ_k = (1/n_k) Σ_{i=1}^{n_k} f_i,
where n_k is the number of feature vectors in the kth cluster, f_i denotes a feature vector in the cluster, i ∈ [1, n_k] and k ∈ [1, 400].
Step 6: the distance χ_k between a feature vector f and the kth cluster is defined as the sum of the Euclidean distance between f and the cluster center and the Euclidean distances between f and the 3 feature vectors in the cluster that are farthest from f, expressed as
χ_k = ||f - μ_k|| + Σ_{j=1}^{3} ||f - f_k^(j)||,
where f_k^(1), f_k^(2), f_k^(3) are the 3 feature vectors in the kth cluster farthest from f.
Step 7: the distance between each feature vector and each of the 400 clusters is calculated, and the vector is assigned to the closest cluster.
Step 8: the center of each cluster is recalculated.
Step 9: and judging whether the center of each cluster is changed or not, and finishing clustering if the center of each cluster is not changed. Otherwise, repeating steps 7 and 8 in sequence until the centers of all clusters are not changed any more.
In the above manner, the feature vectors corresponding to all frames in the training set can be clustered into 400 clusters.
4. The weight w_k of each cluster is calculated from the class proportions within the cluster:
w_k = [formula given as an image in the original],
wherein
r_k,c = n_k,c / n_k,
n_k,c is the number of feature vectors of class-c actions in the kth cluster, c ∈ [1, C], and C is the total number of action categories in the training set; C is set to 8.
5. The support degree s_k,c of the kth cluster for the class-c action is defined as the weight w_k of the cluster multiplied by the proportion r_k,c of feature vectors of the class-c action in the cluster, i.e.
s_k,c = w_k * r_k,c.
in the above manner, the support degree of each cluster for different action categories can be calculated.
6. A "posture subgroup" is defined: a set of N_α cluster centers is defined as a posture subgroup P_α of size N_α, expressed as
P_α = {μ_α1, μ_α2, ..., μ_αN_α},
where μ_α1, ..., μ_αN_α are N_α cluster centers selected from the set {μ_k | k ∈ [1, 400]}.
The training set contains 137 action sequences in total. For each action sequence V_j (1 ≤ j ≤ 137), the cluster to which each frame of the sequence belongs is determined, and the centers of these clusters form a set E_j; 137 cluster-center sets are obtained in this way.
For each cluster-center set E_j, every non-empty subset of E_j is a posture subgroup. The posture subgroups with 2 or 3 elements are taken out; they form the posture subgroup set G_j of the action sequence V_j. All posture subgroups taken from the 137 cluster-center sets are combined into the posture subgroup set G = G_1 ∪ G_2 ∪ ... ∪ G_137.
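As a rough indication of the size of G_j: a sequence whose frames fall into m distinct clusters contributes C(m, 2) + C(m, 3) posture subgroups. The short snippet below (illustrative value of m only) computes this count.

```python
from math import comb

def subgroup_count(m):
    """Number of size-2 and size-3 posture subgroups for a sequence whose
    frames fall into m distinct clusters."""
    return comb(m, 2) + comb(m, 3)

# e.g. a sequence hitting 10 distinct clusters yields 45 + 120 = 165 subgroups
print(subgroup_count(10))   # 165
```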
7. For posture subgroup P_α, the classifier corresponding to the class-c action is denoted h_{P_α,c}, and it is obtained as follows:
the labels of the action sequences belonging to the class-c action in the current training set are set to 1, and the labels of the remaining action sequences are set to 0; the number of action sequences in the current training set is denoted J′.
For an action sequence V_j (j ∈ [1, J′]) in the current training set, if the posture subgroup P_α is included in the cluster-center set E_j of the action sequence V_j, i.e. P_α ⊆ E_j, and the sum of the support degrees, for the class-c action, of the clusters to which the cluster centers in P_α belong is greater than a threshold θ(P_α), then h_{P_α,c}(V_j) = 1; otherwise h_{P_α,c}(V_j) = 0. Expressed as a formula:
h_{P_α,c}(V_j) = 1 if P_α ⊆ E_j and Σ_{μ_k ∈ P_α} s_{k,c} > θ(P_α); otherwise h_{P_α,c}(V_j) = 0,
where s_{k,c} denotes the support, for the class-c action, of the cluster to which each cluster center μ_k in P_α belongs.
θ is chosen so that the average classification error ε of the classifier h_{P_α,c} over all action sequences in the current training set is minimal, expressed as
ε = (1/J′) Σ_{j=1}^{J′} | h_{P_α,c}(V_j) - l_j |,
where l_j is the label of the action sequence V_j in the current training set.
8. 137 action sequences are randomly drawn, with replacement, from the training set as the current sub-training set S, and S is partitioned hierarchically to obtain a classifier F_z by the following steps:
Step 1: 40 posture subgroups are randomly selected from the posture subgroup set G and ordered in the order of selection to form the candidate posture subgroup set G′ = {P′_1, P′_2, ..., P′_40}, where P′_1, P′_2, ..., P′_40 are posture subgroups selected from the posture subgroup set.
Step 2: the degree of difference of the action sequences in the current sub-training set S is calculated from the class proportions, using
[formula given as an image in the original],
where p_c is the proportion of sequences of the class-c action in the current sub-training set.
Step 3: a posture subgroup is selected from the candidate posture subgroup set and a classifier satisfying the splitting condition is found; it is used as the classifier of the current sub-training set S and divides S into 2 sets, as follows:
the first posture subgroup P′_1 is selected from the candidate posture subgroup set and, starting from the 1st action category, the corresponding classifier h_{P′_1,1} is computed; the training set S is divided into two sets, S_1 and S_2, where S_1 is the set of action sequences satisfying h_{P′_1,1}(V) = 1 and S_2 is the set of action sequences satisfying h_{P′_1,1}(V) = 0.
The degrees of difference of S_1 and S_2 are calculated separately; the smaller of the two is selected and its difference from the degree of difference of S is computed; if this difference is greater than 0.1, the current classifier satisfies the splitting condition, the posture subgroup is removed from the candidate posture subgroup set, and the classifier is used as the classifier that splits the set S.
Otherwise, the next action category is selected and the corresponding classifier is checked against the splitting condition, until a classifier satisfying the condition is found or all action categories have been traversed.
If no classifier satisfying the splitting condition is found for any action category of the currently selected posture subgroup, the next posture subgroup is taken from the candidate posture subgroup set and the classifiers corresponding to the different action categories are computed, until a classifier satisfying the requirement is found.
Step 4: when a classifier satisfying the requirement has been found and the current training set S has been divided into the sets S_1 and S_2, it is checked for each of S_1 and S_2 whether the action sequences in the set belong to the same category; if not, that set is taken as the current sub-training set and Steps 2 and 3 are repeated in turn, and the splitting continues until no set needs to be divided further. At this point one hierarchical classifier F_z has been obtained.
9. Repeat step 8 until 20 hierarchical classifiers are obtained.
10. The three-dimensional coordinates of the 20 joint points of each frame of all action sequences in the test set are concatenated into a one-dimensional vector and input into the trained cross-layer connection neural network model, yielding the feature vector of the human posture for each frame of every test sample.
11. For a given test sample, the distance between the feature vector of each frame of the action sequence and each cluster is calculated, and each frame is assigned to the closest cluster.
12. The 20 learned hierarchical classifiers are used to classify the test action sequence:
for each hierarchical classifier, the classifier corresponding to the first sub-training set is first used to divide the test action sequence; after the test action sequence has been assigned to a set, the classifier corresponding to that set continues to divide it, and this is repeated until the test action sequence can no longer be divided. At that point, the category of the training action sequences in the set to which the test action sequence has been assigned is taken as the category of the test action sequence.
13. The category that occurs most frequently among the classification results of the 20 hierarchical classifiers is selected as the category of the test action sequence.
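The final vote over the 20 hierarchical classifiers can be written with collections.Counter; `predict_one` is the assumed tree-walking helper sketched after the description of step nine.

```python
from collections import Counter

def predict(forest, E_test, support):
    """Majority vote of the hierarchical classifiers on one test sequence."""
    votes = [predict_one(root, E_test, support) for root in forest]
    return Counter(votes).most_common(1)[0][0]
```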
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes and modifications can be made to the described embodiments and that some of their features can be replaced by equivalents without departing from the spirit and scope of the invention.

Claims (7)

1. An action recognition method, comprising the following steps:
step one, acquiring three-dimensional skeletal joint point information of a target with a depth sensor to obtain the three-dimensional coordinates of each skeletal joint point of the human body, and dividing the obtained action sequences into a training set and a test set;
step two, designing a cross-layer connection neural network model and extracting features from the three-dimensional coordinates of the skeletal joint points in each frame of the action sequences in the training set to obtain a feature vector for each frame;
step three, clustering the feature vectors of all frames in the training set into K clusters;
step four, calculating the weight of each cluster and the support degree of each cluster for each action category;
step five, defining posture subgroups and extracting them from the training set to form a posture subgroup set;
step six, defining, for each posture subgroup, the classifier corresponding to the class-c action, where c ∈ [1, C] and C is the total number of action categories in the training set;
step seven, learning Z hierarchical classifiers;
step eight, obtaining the feature vector of each frame of the test action sequence with the cross-layer connection neural network model and assigning it to the nearest cluster;
step nine, computing the classification results of the Z hierarchical classifiers on the test action sequence;
step ten, selecting the category that occurs most frequently among the classification results of the Z hierarchical classifiers as the category of the test action sequence;
wherein clustering the feature vectors of all frames in the training set into K clusters comprises the following specific steps:
a. randomly selecting one vector from all feature vectors in the training set as the first cluster center;
b. calculating the shortest distance between each feature vector and the already-chosen cluster centers, i.e. the Euclidean distance to the nearest cluster center, sorting the feature vectors by this distance from large to small, and randomly selecting one of the feature vectors corresponding to the first K distances as the next cluster center;
c. repeating step b until K feature vectors have been selected as the K cluster centers;
d. calculating the Euclidean distance from each feature vector in the training set to the K cluster centers and assigning each vector to the cluster of the nearest cluster center;
e. recalculating the center μ_k of each cluster; the new center is the mean of all feature vectors in the cluster, computed as
μ_k = (1/n_k) Σ_{i=1}^{n_k} f_i,
where n_k is the number of feature vectors in the kth cluster, f_i denotes a feature vector in the cluster, i ∈ [1, n_k] and k ∈ [1, K];
f. defining the distance χ_k between a feature vector f and the kth cluster as the sum of the Euclidean distance between f and the cluster center and the Euclidean distances between f and the 3 feature vectors in the cluster that are farthest from f, expressed as
χ_k = ||f - μ_k|| + Σ_{j=1}^{3} ||f - f_k^(j)||,
where f_k^(1), f_k^(2), f_k^(3) are the 3 feature vectors in the kth cluster farthest from f;
g. calculating the distance between each feature vector and each of the K clusters and assigning the feature vector to the closest cluster;
h. recalculating the center of each cluster;
i. judging whether the center of any cluster has changed; if not, the clustering is finished; otherwise, repeating g and h in turn until the centers of all clusters no longer change;
wherein defining a posture subgroup in step five means that a set of N_α cluster centers is defined as a posture subgroup P_α of size N_α; the αth posture subgroup is expressed as
P_α = {μ_α1, μ_α2, ..., μ_αN_α},
where μ_α1, ..., μ_αN_α are N_α cluster centers selected from the set {μ_k | k ∈ [1, K]};
the training set contains J action sequences in total; for each action sequence V_j (1 ≤ j ≤ J), the cluster to which each frame of the sequence belongs is determined, and the centers of these clusters form a set E_j, so that J cluster-center sets are obtained;
for each cluster-center set E_j, every non-empty subset of E_j is a posture subgroup; the posture subgroups with 2 or 3 elements are taken out and form the posture subgroup set G_j of the action sequence V_j, and all posture subgroups taken from the J cluster-center sets are combined into the posture subgroup set G = G_1 ∪ G_2 ∪ ... ∪ G_J.
2. The action recognition method according to claim 1, characterized in that: the cross-layer connection neural network model in step two comprises a first hidden layer, a second hidden layer, a third hidden layer and a fourth hidden layer; the output h^1 of the first hidden layer passes through a relu activation module and enters the second hidden layer; the output h^2 of the second hidden layer passes through a tanh activation module and enters the third hidden layer; the output h^3 of the third hidden layer passes through a relu activation module and enters the fourth hidden layer; the output h^4 of the fourth hidden layer, the output h^1 of the first hidden layer and the output q of the similarity calculation module are added, the result is fed into a tanh activation module for nonlinear mapping, and the output layer of the cross-layer connection neural network model outputs the feature vector of the frame;
the input of the cross-layer connection neural network model is the one-dimensional vector x = (x_1, x_2, ..., x_3R)^T formed by concatenating the three-dimensional coordinates of the R joint points of one frame of an action sequence in the training set, and the output is y = (y_1, y_2, ..., y_3R)^T; the numbers of neurons in the first, second, third and fourth hidden layers are N, M, M and N, respectively, and the output of each hidden layer is computed as
h^l = W^l a^l + b^l,
where a^l is the input of hidden layer l, W^l is the weight matrix of hidden layer l, b^l is the bias vector of hidden layer l, h^l is the output of hidden layer l, l ∈ {1, 2, 3, 4}, W^1 ∈ R^(N×3R), W^2 ∈ R^(M×N), W^3 ∈ R^(M×M), W^4 ∈ R^(N×M), b^1, b^4 ∈ R^N and b^2, b^3 ∈ R^M;
the input of a relu activation module is a vector of dimension D and its output has the same dimension; each element v_d of the input vector is mapped to the corresponding output element by
relu(v_d) = max(0, v_d),
where d ∈ [1, D];
the input of a tanh activation module is a vector of dimension D′ and its output has the same dimension; each element v_d′ of the input vector is mapped to the corresponding output element by
tanh(v_d′) = (e^(v_d′) - e^(-v_d′)) / (e^(v_d′) + e^(-v_d′)),
where d′ ∈ [1, D′];
the output of the similarity calculation module is q = U^T a^3, where the input of the similarity calculation module is the input a^3 of the third hidden layer together with the similarity calculation matrix U^T = [u_1, u_2, ..., u_N]^T, u_1, u_2, ..., u_N are column vectors, u_n = [u_1n, u_2n, ..., u_Mn]^T, u_mn ∈ [0, 1], m ∈ [1, M], n ∈ [1, N], and each u_mn is set randomly to a value in the interval [0, 1]; the output layer of the cross-layer connection neural network is computed as
y = tanh(W^O o + b^O),
where o ∈ R^N is the input of the output layer, W^O ∈ R^(3R×N) is a weight matrix, and b^O ∈ R^(3R) is a bias vector;
the loss function of the cross-layer connection neural network is loss = ||x - y||^2, and the feature vector f of a frame of an action sequence is defined as the input a^3 of the third hidden layer, so f ∈ R^M.
3. The action recognition method according to claim 1, characterized in that: the weight of each cluster in step four is computed from the class proportions within the cluster:
w_k = [formula given as an image in the original],
wherein
r_k,c = n_k,c / n_k,
n_k,c is the number of feature vectors of class-c actions in the kth cluster, n_k is the number of feature vectors of all action classes in the kth cluster, w_k is the weight of the kth cluster, k ∈ [1, K], c ∈ [1, C], and C is the total number of action categories in the training set.
4. The action recognition method according to claim 3, characterized in that: the support degree of each cluster for each action category in step four is calculated as
s_k,c = w_k * r_k,c,
where s_k,c is the support of the kth cluster for the class-c action.
5. The action recognition method according to claim 1, characterized in that: in step six, for posture subgroup P_α, the classifier corresponding to the class-c action is denoted h_{P_α,c}, and it is obtained as follows:
the labels of the action sequences belonging to the class-c action in the current training set are set to 1, and the labels of the remaining action sequences are set to 0; the number of action sequences in the current training set is denoted J′;
for an action sequence V_j (j ∈ [1, J′]) in the current training set, if the posture subgroup P_α is included in the cluster-center set E_j of the action sequence V_j, i.e. P_α ⊆ E_j, and the sum of the support degrees, for the class-c action, of the clusters to which the cluster centers in P_α belong is greater than a threshold θ(P_α), then h_{P_α,c}(V_j) = 1; otherwise h_{P_α,c}(V_j) = 0; expressed as a formula:
h_{P_α,c}(V_j) = 1 if P_α ⊆ E_j and Σ_{μ_k ∈ P_α} s_{k,c} > θ(P_α); otherwise h_{P_α,c}(V_j) = 0,
where s_{k,c} denotes the support, for the class-c action, of the cluster to which each cluster center μ_k in P_α belongs;
θ is chosen so that the average classification error ε of the classifier h_{P_α,c} over all action sequences in the current training set is minimal, expressed as
ε = (1/J′) Σ_{j=1}^{J′} | h_{P_α,c}(V_j) - l_j |,
where l_j is the label of the action sequence V_j in the current training set.
6. The action recognition method according to claim 5, characterized in that: the specific steps for learning the Z hierarchical classifiers in step seven are as follows:
a. randomly drawing, with replacement, J action sequences from the training set as the current sub-training set S;
b. randomly selecting L posture subgroups from the posture subgroup set G and ordering them in the order of selection to form the candidate posture subgroup set G′ = {P′_1, P′_2, ..., P′_L}, where P′_1, P′_2, ..., P′_L are posture subgroups selected from the posture subgroup set G;
c. calculating the degree of difference of the action sequences in the current sub-training set S from the class proportions, using
[formula given as an image in the original],
where p_c is the proportion of sequences of the class-c action in the current sub-training set;
d. selecting a posture subgroup from the candidate posture subgroup set and finding a classifier that satisfies the splitting condition, which is then used as the classifier of the current sub-training set S; the specific steps are as follows:
the first posture subgroup P′_1 is selected from the candidate posture subgroup set and, starting from the 1st action category, the corresponding classifier h_{P′_1,1} is computed; the training set S is divided into two sets, S_1 and S_2, where S_1 is the set of action sequences satisfying h_{P′_1,1}(V) = 1 and S_2 is the set of action sequences satisfying h_{P′_1,1}(V) = 0;
the degrees of difference of S_1 and S_2 are calculated separately; the smaller of the two is selected and its difference from the degree of difference of S is computed; if this difference is greater than the threshold λ, the current classifier satisfies the splitting condition, the posture subgroup is removed from the candidate posture subgroup set, and the classifier is used as the classifier that splits the set S;
otherwise, the next action category is selected and the corresponding classifier is checked against the splitting condition, until a classifier satisfying the condition is found or all action categories have been traversed;
if no classifier satisfying the splitting condition is found for any action category of the currently selected posture subgroup, the next posture subgroup is taken from the candidate posture subgroup set and the classifiers corresponding to the different action categories are computed, until a classifier satisfying the requirement is found;
e. when a classifier satisfying the requirement has been found and the current training set S has been divided into the sets S_1 and S_2, it is checked for each of S_1 and S_2 whether the action sequences in the set belong to the same category; if not, that set is taken as the current sub-training set and steps c and d are repeated in turn to find a classifier satisfying the requirement, which is used as the classifier corresponding to that set; the splitting continues until no set needs to be divided further, at which point one hierarchical classifier F_z has been obtained;
f. repeating steps a to e until Z hierarchical classifiers are obtained.
7. A motion recognition method according to claim 6, characterized in that: in the ninth step, the classification method of the test action sequence by the Z hierarchical classifiers is as follows:
for each hierarchical classifier, the classifier corresponding to the first sub-training set is first used to divide the test action sequence; after the test action sequence has been assigned to a set, the classifier corresponding to that set continues to divide it, and this is repeated until the test action sequence can no longer be divided; at that point, the category of the training action sequences in the set to which the test action sequence has been assigned is taken as the category of the test action sequence.
CN201910362475.8A 2019-04-30 2019-04-30 Action recognition method Active CN110084211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910362475.8A CN110084211B (en) 2019-04-30 2019-04-30 Action recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910362475.8A CN110084211B (en) 2019-04-30 2019-04-30 Action recognition method

Publications (2)

Publication Number Publication Date
CN110084211A CN110084211A (en) 2019-08-02
CN110084211B true CN110084211B (en) 2020-12-18

Family

ID=67418132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910362475.8A Active CN110084211B (en) 2019-04-30 2019-04-30 Action recognition method

Country Status (1)

Country Link
CN (1) CN110084211B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364922B (en) * 2020-11-13 2023-01-10 苏州浪潮智能科技有限公司 Method and system for predicting human skeleton motion in machine room environment
CN112712003B (en) * 2020-12-25 2022-07-26 华南理工大学 Joint tag data enhancement method for identifying skeleton action sequence
CN113591797B (en) * 2021-08-23 2023-07-28 苏州大学 Depth video behavior recognition method
CN114565784A (en) * 2022-03-15 2022-05-31 平安科技(深圳)有限公司 Pedestrian abnormal behavior detection method and device based on clustering algorithm and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663449A (en) * 2012-03-12 2012-09-12 西安电子科技大学 Method for tracing human body movement based on maximum geometric flow histogram
CN103198492A (en) * 2013-03-28 2013-07-10 沈阳航空航天大学 Human motion capture method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9443314B1 (en) * 2012-03-29 2016-09-13 Google Inc. Hierarchical conditional random field model for labeling and segmenting images
CN104318248B (en) * 2014-10-21 2018-04-06 北京智谷睿拓技术服务有限公司 Action identification method and action recognition device
KR20180069452A (en) * 2016-12-15 2018-06-25 삼성전자주식회사 Method for training the neural network, method for recogning using neural network and apparatus thereof
CN108537145A (en) * 2018-03-21 2018-09-14 东北电力大学 Human bodys' response method based on space-time skeleton character and depth belief network
CN108681700B (en) * 2018-05-04 2021-09-28 苏州大学 Complex behavior identification method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663449A (en) * 2012-03-12 2012-09-12 西安电子科技大学 Method for tracing human body movement based on maximum geometric flow histogram
CN103198492A (en) * 2013-03-28 2013-07-10 沈阳航空航天大学 Human motion capture method

Also Published As

Publication number Publication date
CN110084211A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN110084211B (en) Action recognition method
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
CN110321833B (en) Human body behavior identification method based on convolutional neural network and cyclic neural network
Swets et al. Hierarchical discriminant analysis for image retrieval
Phung et al. A pyramidal neural network for visual pattern recognition
CN110163258A (en) A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
Zhang et al. Detecting densely distributed graph patterns for fine-grained image categorization
Frolova et al. Most probable longest common subsequence for recognition of gesture character input
CN106897390A (en) Target precise search method based on depth measure study
Cong et al. Self-supervised online metric learning with low rank constraint for scene categorization
CN105894047A (en) Human face classification system based on three-dimensional data
CN111950455B (en) Motion imagery electroencephalogram characteristic identification method based on LFFCNN-GRU algorithm model
CN110119707B (en) Human body action recognition method
Khan et al. Facial expression recognition on real world face images using intelligent techniques: A survey
Bu Human motion gesture recognition algorithm in video based on convolutional neural features of training images
Zhang et al. Multiview unsupervised shapelet learning for multivariate time series clustering
Sooai et al. Comparison of recognition accuracy on dynamic hand gesture using feature selection
CN108520205B (en) motion-KNN-based human body motion recognition method
CN110070070B (en) Action recognition method
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN110163130A (en) A kind of random forest grader and classification method of the feature pre-align for gesture identification
Rangulov et al. Emotion recognition on large video dataset based on convolutional feature extractor and recurrent neural network
Zhang et al. Gesture recognition using enhanced depth motion map and static pose map
CN113887509B (en) Rapid multi-modal video face recognition method based on image set
CN115937910A (en) Palm print image identification method based on small sample measurement network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant