CN111291693B - Deep integration method based on skeleton motion recognition - Google Patents

Deep integration method based on skeleton motion recognition Download PDF

Info

Publication number
CN111291693B
CN111291693B (application number CN202010097008.XA)
Authority
CN
China
Prior art keywords
motion recognition
joints
spatial
features
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010097008.XA
Other languages
Chinese (zh)
Other versions
CN111291693A (en)
Inventor
杨会成
徐姝琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Polytechnic University
Original Assignee
Anhui Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Polytechnic University filed Critical Anhui Polytechnic University
Priority to CN202010097008.XA priority Critical patent/CN111291693B/en
Publication of CN111291693A publication Critical patent/CN111291693A/en
Application granted granted Critical
Publication of CN111291693B publication Critical patent/CN111291693B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a deep integration method for skeleton-based action recognition, which uses a convolutional neural network (CNN) and a long short-term memory network (LSTM), that is, a deep integration model, to capture the various spatio-temporal dynamics of the action recognition task. An action is a spatio-temporal event, so the action recognition task requires spatio-temporal features. With this goal in mind, three sub-networks (called SNet, TNet and BodyNet) are modeled to capture the different spatio-temporal dynamics of the action recognition task. Driven by ensemble learning, a hybrid network (called HNet) is modeled from the two sub-networks TNet and BodyNet to capture strong temporal dynamics. Compared with other methods on the UTD-MHAD data set, the recognition rate of the method reaches 92.1%, which is far higher than the recognition rates of 85.81% and 88.10% in the prior art.

Description

Deep integration method based on skeleton motion recognition
Technical Field
The invention relates to the technical field of human action recognition, and in particular to a deep integration method for skeleton-based action recognition.
Background
At present, human action recognition is a popular research subject in the field of computer vision with good practical application prospects; it can be applied to fields such as image processing, computer vision and machine learning, but the diversity of human actions makes it challenging. Current human action recognition is complex: deep learning models cannot yet recognize human actions completely, and errors occur easily. Therefore, it is important to design a deep integration method based on skeleton action recognition.
Disclosure of Invention
In view of the shortcomings of the prior art, the present invention provides a deep integration method for skeleton-based action recognition, which uses a convolutional neural network (CNN) and a long short-term memory network (LSTM), that is, a deep integration model, to capture the various spatio-temporal dynamics of the action recognition task. An action is a spatio-temporal event, so the action recognition task requires spatio-temporal features. With this goal in mind, three sub-networks (called SNet, TNet and BodyNet) are modeled to capture the different spatio-temporal dynamics of the action recognition task. Driven by ensemble learning, a hybrid network (called HNet) is modeled from the two sub-networks TNet and BodyNet to capture strong temporal dynamics.
The invention provides a deep integration method based on skeleton action recognition, which comprises the following steps:
Step one: establishing a recurrent neural network, and modeling the sequence problem with the recurrent neural network;
Step two: modeling a spatial network (SNet) using two spatial distance maps to capture the spatial dynamics of the action recognition task;
Step three: modeling a temporal network (TNet) using distance maps in the time domain to capture the temporal dynamics of the action recognition task;
Step four: using a multi-layer stacked LSTM network as BaseNet, wherein BaseNet consists of three bidirectional LSTM layers and a dropout layer is introduced between two Bi-LSTM layers to alleviate the overfitting problem that arises when training BaseNet, with a fully connected layer and a softmax layer following for the action classification task;
Step five: modeling a hybrid network (HNet) using the BodyNet and TNet features, and selecting distinct and strongly discriminative temporal features from the BodyNet and TNet features to efficiently construct HNet.
The further improvement lies in that: in step one, the recurrent neural network comprises LSTM units, each consisting of an input gate ($I_t$), an input node ($G_t$), a forget gate ($F_t$) and an output gate ($O_t$). The input gate is given by $I_t=\sigma(W_{IX}X_t+W_{IH}H_{t-1}+b_I)$; the forget gate by $F_t=\sigma(W_{FX}X_t+W_{FH}H_{t-1}+b_F)$; the output gate by $O_t=\sigma(W_{OX}X_t+W_{OH}H_{t-1}+b_O)$; and the input node by $G_t=\tanh(W_{GX}X_t+W_{GH}H_{t-1}+b_G)$. The LSTM unit combines them as $C_t=F_t\odot C_{t-1}+I_t\odot G_t$ and $H_t=O_t\odot\tanh(C_t)$, where the $W$ and $b$ terms are the weight matrices and biases of the respective gates.
The further improvement lies in that: in step two, on the basis of pairwise distance features for the action recognition task, four joint distance maps are constructed, one in 3D space and the other three in the 2D orthogonal spaces. Each action is performed by two subjects so that actions involving human-to-human interaction can be handled; in an action sequence, each frame therefore contains two skeletons, belonging to a main subject and an auxiliary subject. A skeleton sequence AS comprises M skeleton frames, each frame containing 2N joints, where the first N joints belong to the main subject and the remaining N joints to the auxiliary subject: AS = {Fr_1, ..., Fr_M}, where Fr_j = {J^j_1, ..., J^j_{2N}} denotes the j-th skeleton frame and J^j_i = (x^j_i, y^j_i, z^j_i) denotes the 3D coordinates (x, y, z) of the i-th joint of the j-th frame. The first set of spatial features is defined with respect to the hip joint and is named SF1_xyz, SF1_xy, SF1_yz and SF1_xz; the second set of spatial features is defined with respect to the shoulder-center joint and is named SF2_xyz, SF2_xy, SF2_yz and SF2_xz. The spatial features are modeled, for each frame f and every joint i other than the reference joint, as
$SF1_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{hip,xyz})\}$, $SF1_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{hip,xy})\}$, $SF1_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{hip,yz})\}$, $SF1_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{hip,xz})\}$,
$SF2_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{sc,xyz})\}$, $SF2_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{sc,xy})\}$, $SF2_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{sc,yz})\}$, $SF2_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{sc,xz})\}$,
where f denotes the frame number; J_xyz denotes the (x, y, z) coordinates of a joint; J_xy, J_yz and J_xz denote its (x, y), (y, z) and (x, z) coordinates, respectively; hip and sc denote the hip and shoulder-center reference joints; and D_n() denotes the distance between two points in Euclidean n-space. Let r = (r_1, r_2, ..., r_n) and s = (s_1, s_2, ..., s_n) be two points in Euclidean n-space; then D_n() is computed as
$D_n(r,s)=\sqrt{\sum_{i=1}^{n}(r_i-s_i)^2}$.
The further improvement lies in that: in step three, four temporal features are constructed, namely TF_xyz, TF_xy, TF_yz and TF_xz. For each pair of consecutive frames f and f+1 and every joint i = 1, ..., 2N, they are given by
$TF_{xyz}(f)=\{D_3(J^f_{i,xyz},J^{f+1}_{i,xyz})\}$, $TF_{xy}(f)=\{D_2(J^f_{i,xy},J^{f+1}_{i,xy})\}$, $TF_{yz}(f)=\{D_2(J^f_{i,yz},J^{f+1}_{i,yz})\}$, $TF_{xz}(f)=\{D_2(J^f_{i,xz},J^{f+1}_{i,xz})\}$, for f = 1, ..., M-1.
the further improvement lies in that: in the fourth step, bodinet is used to extract various features from fine-grained body parts in the time domain of the entire sequence, and for each frame Fr, skeletal joints related to a main subject are grouped into five groups, which correspond to five body parts, respectively, and joints of auxiliary subjects are also grouped.
The invention has the beneficial effects that: a convolutional neural network (CNN) and a long short-term memory network (LSTM), that is, a deep integration model, are used to capture the various spatio-temporal dynamics of the action recognition task. Compared with other methods on the UTD-MHAD data set, the recognition rate of the method reaches 92.1%, which is far higher than the recognition rates of 85.81% and 88.10% in the prior art.
Drawings
Fig. 1 is a schematic diagram of the basic structure of the LSTM unit of the present invention.
FIG. 2 shows the distance maps of the SNet and TNet of the present invention.
Fig. 3 is a schematic view of the spatial network structure of the present invention.
Fig. 4 is a schematic diagram of a time domain network structure according to the present invention.
Fig. 5 is a schematic diagram of BaseNet of the present invention.
Fig. 6 is a schematic view of the structure of the BodyNet of the present invention.
Fig. 7 is a schematic view of the HNet (hybrid network) structure of the present invention.
FIG. 8 is a table of action category calculations for the present invention.
Fig. 9 is a comparison graph of recognition accuracy of the present invention.
Detailed Description
In order to further understand the present invention, the following detailed description is made with reference to the examples, which are only used to explain the present invention and are not to be construed as limiting its scope. As shown in figs. 1-9, the present embodiment provides a deep integration method for skeleton-based action recognition, comprising the following steps:
Step one: establishing a recurrent neural network, and modeling the sequence problem with the recurrent neural network;
Step two: modeling a spatial network (SNet) using two spatial distance maps to capture the spatial dynamics of the action recognition task;
Step three: modeling a temporal network (TNet) using distance maps in the time domain to capture the temporal dynamics of the action recognition task;
Step four: using a multi-layer stacked LSTM network as BaseNet, wherein BaseNet consists of three bidirectional LSTM layers and a dropout layer is introduced between two Bi-LSTM layers to alleviate the overfitting problem that arises when training BaseNet, with a fully connected layer and a softmax layer following for the action classification task;
Step five: modeling a hybrid network (HNet) using the BodyNet and TNet features, and selecting distinct and strongly discriminative temporal features from the BodyNet and TNet features to efficiently construct HNet.
Since an RNN has internal memory, it can store information about previous computations. In theory, an RNN can handle sequences of arbitrary length; in practice, however, it cannot model long sequences because of two major problems: vanishing gradients and exploding gradients. Long short-term memory (LSTM) has been proposed to address this problem. Fig. 1 shows the basic structure of an LSTM unit. Suppose $X_t$ is the input of the LSTM and t is the time step; the LSTM unit consists of an input gate ($I_t$), an input node ($G_t$), a forget gate ($F_t$) and an output gate ($O_t$). The basic equations of these gates are defined below.
The further improvement lies in that: in step one, the recurrent neural network comprises LSTM units, each consisting of an input gate ($I_t$), an input node ($G_t$), a forget gate ($F_t$) and an output gate ($O_t$). The input gate is given by $I_t=\sigma(W_{IX}X_t+W_{IH}H_{t-1}+b_I)$; the forget gate by $F_t=\sigma(W_{FX}X_t+W_{FH}H_{t-1}+b_F)$; the output gate by $O_t=\sigma(W_{OX}X_t+W_{OH}H_{t-1}+b_O)$; and the input node by $G_t=\tanh(W_{GX}X_t+W_{GH}H_{t-1}+b_G)$. The LSTM unit combines them as $C_t=F_t\odot C_{t-1}+I_t\odot G_t$ and $H_t=O_t\odot\tanh(C_t)$, where the $W$ and $b$ terms are the weight matrices and biases of the respective gates.
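For reference, a minimal NumPy sketch of one LSTM step implementing the gate equations above; the hidden size, the weight/bias layout and the sigmoid helper are assumptions for illustration, not part of the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following I_t, F_t, O_t, G_t and the cell/hidden updates.
    W is a dict of weight matrices, b a dict of bias vectors (assumed layout)."""
    i_t = sigmoid(W["IX"] @ x_t + W["IH"] @ h_prev + b["I"])   # input gate
    f_t = sigmoid(W["FX"] @ x_t + W["FH"] @ h_prev + b["F"])   # forget gate
    o_t = sigmoid(W["OX"] @ x_t + W["OH"] @ h_prev + b["O"])   # output gate
    g_t = np.tanh(W["GX"] @ x_t + W["GH"] @ h_prev + b["G"])   # input node
    c_t = f_t * c_prev + i_t * g_t                             # cell state C_t
    h_t = o_t * np.tanh(c_t)                                   # hidden state H_t
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, input size 3)
H, D = 4, 3
W = {k: 0.1 * np.random.randn(H, D if k.endswith("X") else H)
     for k in ("IX", "IH", "FX", "FH", "OX", "OH", "GX", "GH")}
b = {k: np.zeros(H) for k in ("I", "F", "O", "G")}
h, c = lstm_step(np.random.randn(D), np.zeros(H), np.zeros(H), W, b)
```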
The further improvement lies in that: in step two, on the basis of pairwise distance features for the action recognition task, four joint distance maps are constructed, one in 3D space and the other three in the 2D orthogonal spaces. Each action is performed by two subjects so that actions involving human-to-human interaction can be handled; in an action sequence, each frame therefore contains two skeletons, belonging to a main subject and an auxiliary subject. A skeleton sequence AS comprises M skeleton frames, each frame containing 2N joints, where the first N joints belong to the main subject and the remaining N joints to the auxiliary subject: AS = {Fr_1, ..., Fr_M}, where Fr_j = {J^j_1, ..., J^j_{2N}} denotes the j-th skeleton frame and J^j_i = (x^j_i, y^j_i, z^j_i) denotes the 3D coordinates (x, y, z) of the i-th joint of the j-th frame. The first set of spatial features is defined with respect to the hip joint and is named SF1_xyz, SF1_xy, SF1_yz and SF1_xz; the second set of spatial features is defined with respect to the shoulder-center joint and is named SF2_xyz, SF2_xy, SF2_yz and SF2_xz. The spatial features are modeled, for each frame f and every joint i other than the reference joint, as
$SF1_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{hip,xyz})\}$, $SF1_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{hip,xy})\}$, $SF1_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{hip,yz})\}$, $SF1_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{hip,xz})\}$,
$SF2_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{sc,xyz})\}$, $SF2_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{sc,xy})\}$, $SF2_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{sc,yz})\}$, $SF2_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{sc,xz})\}$,
where f denotes the frame number; J_xyz denotes the (x, y, z) coordinates of a joint; J_xy, J_yz and J_xz denote its (x, y), (y, z) and (x, z) coordinates, respectively; hip and sc denote the hip and shoulder-center reference joints; and D_n() denotes the distance between two points in Euclidean n-space. Let r = (r_1, r_2, ..., r_n) and s = (s_1, s_2, ..., s_n) be two points in Euclidean n-space; then D_n() is computed as
$D_n(r,s)=\sqrt{\sum_{i=1}^{n}(r_i-s_i)^2}$.
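A minimal NumPy sketch of how the SF1/SF2 spatial distance maps described above could be computed from a skeleton sequence; the array layout, the hip and shoulder-center joint indices and the function name are assumptions for illustration:

```python
import numpy as np

def spatial_distance_map(seq, ref_idx, dims=[0, 1, 2]):
    """seq: (M, 2N, 3) array of joint coordinates for one action sequence.
    ref_idx: index of the reference joint (hip for SF1, shoulder center for SF2).
    dims: coordinate subset, e.g. [0, 1, 2] for xyz, [0, 1] for xy.
    Returns a ((2N-1) x M) matrix of joint-to-reference distances."""
    coords = seq[:, :, dims]                      # (M, 2N, len(dims))
    ref = coords[:, ref_idx:ref_idx + 1, :]       # (M, 1, len(dims))
    dist = np.linalg.norm(coords - ref, axis=2)   # (M, 2N) Euclidean distances
    dist = np.delete(dist, ref_idx, axis=1)       # drop the reference joint itself
    return dist.T                                 # ((2N-1), M)

# Example with hypothetical joint indices: SF1_xyz and SF2_xy for a random sequence
seq = np.random.rand(40, 2 * 20, 3)               # M = 40 frames, 2N = 40 joints
sf1_xyz = spatial_distance_map(seq, ref_idx=0, dims=[0, 1, 2])
sf2_xy = spatial_distance_map(seq, ref_idx=2, dims=[0, 1])
```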
The further improvement lies in that: in step three, four temporal features are constructed, namely TF_xyz, TF_xy, TF_yz and TF_xz. For each pair of consecutive frames f and f+1 and every joint i = 1, ..., 2N, they are given by
$TF_{xyz}(f)=\{D_3(J^f_{i,xyz},J^{f+1}_{i,xyz})\}$, $TF_{xy}(f)=\{D_2(J^f_{i,xy},J^{f+1}_{i,xy})\}$, $TF_{yz}(f)=\{D_2(J^f_{i,yz},J^{f+1}_{i,yz})\}$, $TF_{xz}(f)=\{D_2(J^f_{i,xz},J^{f+1}_{i,xz})\}$, for f = 1, ..., M-1.
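Similarly, a minimal sketch of the temporal features as frame-to-frame joint distances; the array layout and function name are illustrative assumptions:

```python
import numpy as np

def temporal_distance_map(seq, dims=[0, 1, 2]):
    """seq: (M, 2N, 3) joint coordinates. Returns a (2N x (M-1)) matrix whose
    column f holds the distance each joint moved between frame f and frame f+1."""
    coords = seq[:, :, dims]                                 # (M, 2N, len(dims))
    diff = coords[1:] - coords[:-1]                          # consecutive-frame motion
    return np.linalg.norm(diff, axis=2).T                    # (2N, M-1)

tf_xyz = temporal_distance_map(np.random.rand(40, 40, 3))    # example input
```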
The sizes of the TF, SF1 and SF2 features are 2N × (M-1), (2N-1) × M and (2N-1) × M, respectively. Because the number of frames M varies from one action sequence to another, the size of the features also varies with M; for batch learning, however, the features of all action sequences must have the same size. Suppose an action sequence contains M frames, each containing N joints. If the distances calculated between successive frames are arranged as vectors, the TF feature yields (M-1) distance vectors (DV), all of the same size N; together they form a matrix with (M-1) columns, and the number of columns varies with the number of frames M in the sequence. Bicubic interpolation is used to resize this matrix to a fixed number of columns M', finally producing a TF feature matrix of size (N × M'); note that M' is fixed for any value of M. Like the TF feature matrix, the SF1 and SF2 feature matrices are resized to ((N-1) × M'). Since the height and width of a subject in skeleton data may have different proportions, the feature values extracted from the skeleton data need to be normalized to the range [0, 1]. A normalization equation is therefore proposed in this work:
$Normalized\,M = \dfrac{M - \min(M)}{\max(M) - \min(M)}$,
where M here denotes the feature matrix to be normalized, min(M) is the minimum value in M and max(M) is the maximum value in M. To keep the values in the range [0, 255], the normalized matrix is multiplied by 255: Gray image = Normalized M × 255. The resulting matrix looks like a grayscale image with 256 intensity levels. To classify with a pre-trained CNN model, a color-coding mechanism is used to convert the grayscale image into a color image, which is the input (X) to the CNN model. As a result, the action recognition problem is converted into an image classification problem, and the CNN is fine-tuned for the action classification task.
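A minimal sketch of this feature-map-to-image pipeline using OpenCV; the fixed width M', the use of the JET colormap and the function name are assumptions for illustration, since the patent only specifies bicubic interpolation, min-max normalization and a color-coding step:

```python
import cv2
import numpy as np

def feature_matrix_to_image(feat, fixed_cols=100):
    """feat: 2D feature matrix (e.g. a TF or SF distance map) with a
    sequence-dependent number of columns. Returns a color image for the CNN."""
    # Bicubic interpolation to a fixed number of columns M'
    resized = cv2.resize(feat.astype(np.float32),
                         (fixed_cols, feat.shape[0]),
                         interpolation=cv2.INTER_CUBIC)
    # Min-max normalization to [0, 1], then scaling to [0, 255]
    normalized = (resized - resized.min()) / (resized.max() - resized.min() + 1e-8)
    gray = (normalized * 255).astype(np.uint8)
    # Color-code the grayscale image (JET is chosen here as an assumption)
    return cv2.applyColorMap(gray, cv2.COLORMAP_JET)

img = feature_matrix_to_image(np.random.rand(39, 57))   # example feature matrix
```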
Multiplicative fusion is employed to compute the spatial and temporal scores of the proposed SNet and TNet, respectively. Suppose $S1_1, S1_2, S1_3, S1_4, S2_1, S2_2, S2_3, S2_4$ are the score vectors of the CNNs trained on the spatial distance maps SF1 and SF2; the spatial score (ss) for action A is computed as $ss = S1_1 \Delta S1_2 \Delta S1_3 \Delta S1_4 \Delta S2_1 \Delta S2_2 \Delta S2_3 \Delta S2_4$, where Δ denotes the multiplicative fusion operator. Similarly, $t_1, t_2, t_3, t_4$ are the score vectors of the four CNNs trained on the temporal distance maps (TF), and the temporal score (ts) for action A is computed as $ts = t_1 \Delta t_2 \Delta t_3 \Delta t_4$.
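A minimal sketch of this multiplicative score fusion, assuming each network outputs a softmax probability vector over the action classes; the class count and the way ss and ts are combined at the end are illustrative assumptions:

```python
import numpy as np

def multiplicative_fusion(score_vectors):
    """Element-wise product of per-network class-score vectors (the Δ operator
    above, interpreted here as element-wise multiplication), renormalized."""
    fused = np.ones_like(score_vectors[0])
    for s in score_vectors:
        fused *= s
    return fused / fused.sum()

# Spatial score from the eight SF1/SF2 CNNs, temporal score from the four TF CNNs
num_classes = 27                                     # assumed number of action classes
spatial_scores = [np.random.dirichlet(np.ones(num_classes)) for _ in range(8)]
temporal_scores = [np.random.dirichlet(np.ones(num_classes)) for _ in range(4)]
ss = multiplicative_fusion(spatial_scores)
ts = multiplicative_fusion(temporal_scores)
predicted_class = int(np.argmax(ss * ts))            # combining ss and ts is an assumption
```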
In order to study the discriminative ability of the proposed spatial distance maps together with related work features, experiments were performed using AlexNet; however, features extracted in the time domain are also essential for robust action recognition. To investigate this assumption, the sub-network TNet, which uses the temporal distance maps, is proposed herein. To train the sub-networks SNet and TNet with AlexNet, the maximum number of epochs for all experiments was 100, the batch size was set to 128, and the initial learning rate was set to 0.001 both for fine tuning and for training from scratch. The networks are trained with back-propagation using stochastic gradient descent with a momentum value of 0.9. Both spatial and temporal features are crucial for the action recognition task. Furthermore, HNet is modeled herein using BodyNet and TNet to extract robust temporal features.
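A minimal PyTorch sketch of this training setup (fine-tuning AlexNet with SGD, momentum 0.9, learning rate 0.001, batch size 128); the number of action classes and the data loading are placeholders, not part of the patent:

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 27                                    # placeholder: number of action classes
model = models.alexnet(pretrained=True)             # pre-trained AlexNet to fine-tune
model.classifier[6] = nn.Linear(4096, num_classes)  # replace the final classification layer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

def train(loader, epochs=100):                      # batch size 128 is set in `loader`
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                         # back-propagation
            optimizer.step()                        # SGD update with momentum 0.9
```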
The further improvement lies in that: in step four, BodyNet is used to extract various features from fine-grained body parts over the time domain of the entire sequence; for each frame Fr, the skeletal joints of the main subject are grouped into five groups corresponding to five body parts, and the joints of the auxiliary subject are grouped in the same way. A multi-layer stacked LSTM network serves as BaseNet. The proposed BaseNet consists of three bidirectional LSTM (Bi-LSTM) layers, as shown in fig. 5. A dropout (DP) layer is introduced between two Bi-LSTM layers to alleviate the overfitting problem that occurs when training BaseNet. Finally, a fully connected (FC) layer and a softmax layer follow for the action classification task. Since the relative geometry between body parts provides important information for the action recognition task, the present invention designs BodyNet to extract various features from fine-grained body parts over the time domain of the entire sequence, as shown in fig. 6. For each frame Fr ∈ AS, the skeletal joints of the main subject are grouped into five groups $\tau_i$, i = 1, ..., 5, which are the sets of joints corresponding to the body parts RH, RL, LH, LL and Trunk, respectively; likewise, the joints of the auxiliary subject are also grouped. The proposed BodyNet contains three BaseNets, as shown in fig. 6. It uses three temporal features to extract the temporal dynamics between different body parts in the time domain: the motion of the joints, the part-to-part distance and the edge-to-edge distance, called BodyNet-Feature1 (BNF1), BodyNet-Feature2 (BNF2) and BodyNet-Feature3 (BNF3), respectively. The joint-motion feature (BNF1) is one of the important discriminative features for different classes of actions.
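Returning to the BaseNet described above, a minimal PyTorch sketch of three Bi-LSTM layers with dropout between them, followed by a fully connected layer and softmax; the hidden size, dropout rate and input dimensionality are assumptions for illustration:

```python
import torch
import torch.nn as nn

class BaseNet(nn.Module):
    """Three stacked bidirectional LSTM layers with dropout in between,
    followed by a fully connected layer and softmax for classification."""
    def __init__(self, input_dim, num_classes, hidden=128, dropout=0.5):
        super().__init__()
        self.bilstm1 = nn.LSTM(input_dim, hidden, batch_first=True, bidirectional=True)
        self.drop1 = nn.Dropout(dropout)
        self.bilstm2 = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.drop2 = nn.Dropout(dropout)
        self.bilstm3 = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                    # x: (batch, time, input_dim)
        x, _ = self.bilstm1(x)
        x, _ = self.bilstm2(self.drop1(x))
        x, _ = self.bilstm3(self.drop2(x))
        logits = self.fc(x[:, -1, :])        # last time step used for classification
        return torch.softmax(logits, dim=1)

scores = BaseNet(input_dim=40, num_classes=27)(torch.randn(8, 30, 40))  # example
```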
In addition, to capture the geometric relationships between body parts in the time domain, the BNF2 and BNF3 features are defined as the part-to-part distances and the edge-to-edge distances between body parts, respectively.
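A minimal NumPy sketch of the body-part grouping and a BNF2-style part-to-part distance feature, under the assumption that BNF2 is the per-frame distance between the mean positions of each pair of body parts; the joint indices in `BODY_PARTS` and this exact formulation are illustrative assumptions, not taken from the patent:

```python
import numpy as np
from itertools import combinations

# Hypothetical joint-index grouping into the five body parts tau_1..tau_5
BODY_PARTS = {
    "RH": [4, 5, 6, 7],        # right hand/arm
    "RL": [12, 13, 14, 15],    # right leg
    "LH": [8, 9, 10, 11],      # left hand/arm
    "LL": [16, 17, 18, 19],    # left leg
    "Trunk": [0, 1, 2, 3],     # trunk
}

def bnf2_part_to_part(seq):
    """seq: (M, N, 3) joint coordinates of one subject.
    Returns a (num_pairs x M) matrix of body-part centroid distances per frame."""
    centroids = {name: seq[:, idx, :].mean(axis=1) for name, idx in BODY_PARTS.items()}
    rows = []
    for a, b in combinations(BODY_PARTS, 2):             # 10 part pairs
        rows.append(np.linalg.norm(centroids[a] - centroids[b], axis=1))
    return np.stack(rows)                                # (10, M)

bnf2 = bnf2_part_to_part(np.random.rand(40, 20, 3))      # example sequence
```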
A hybrid network called HNet is modeled using the BodyNet and TNet features, as shown in fig. 7; a study was conducted to select distinct, strongly discriminative temporal features from the BodyNet and TNet features in order to construct HNet efficiently. To find the best temporal features, the accuracy of each BodyNet and TNet feature is first computed for the four action classes reported in fig. 8. According to fig. 8, BNF1 performs well compared with the other features for the "jump" action among the four action classes, because it is highly relevant for capturing the motion of the body relative to the ground. On the other hand, the BNF2 feature is good at recognizing the "answer call" action but performs worst for the "clap hands" action. The reason is that the motion between the two hands is the discriminative cue for identifying the action "clap hands", whereas BNF2 captures the relationship between one body part and the remaining parts over the time domain. This shows that different temporal features have their own unique discriminative power for identifying actions.
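A small sketch of the per-class accuracy comparison used to pick the strongest temporal features for HNet; the feature names, classifier outputs and selection rule here are illustrative assumptions, since the patent only states that distinct, strongly discriminative features are selected:

```python
import numpy as np

def per_class_accuracy(pred, labels, num_classes):
    """pred, labels: integer class arrays. Returns the accuracy for each class."""
    return np.array([
        (pred[labels == c] == c).mean() if np.any(labels == c) else 0.0
        for c in range(num_classes)
    ])

def select_features(feature_preds, labels, num_classes, top_k=2):
    """feature_preds: dict mapping a feature name (e.g. 'BNF1', 'TF') to its
    classifier predictions. Keep the top_k features by mean per-class accuracy."""
    scores = {name: per_class_accuracy(p, labels, num_classes).mean()
              for name, p in feature_preds.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

labels = np.random.randint(0, 4, size=200)                # 4 action classes, as in fig. 8
preds = {name: np.random.randint(0, 4, size=200) for name in ("BNF1", "BNF2", "BNF3", "TF")}
best = select_features(preds, labels, num_classes=4)
```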

Claims (5)

1. A deep integration method based on skeleton action recognition, characterized by comprising the following steps:
Step one: establishing a recurrent neural network, and modeling the sequence problem with the recurrent neural network;
Step two: modeling a spatial network (SNet) using two spatial distance maps to capture the spatial dynamics of the action recognition task;
Step three: modeling a temporal network (TNet) using distance maps in the time domain to capture the temporal dynamics of the action recognition task;
Step four: using a multi-layer stacked LSTM network as BaseNet, wherein BaseNet consists of three bidirectional LSTM layers and a dropout layer is introduced between two Bi-LSTM layers to alleviate the overfitting problem that arises when training BaseNet, with a fully connected layer and a softmax layer following for the action classification task;
Step five: modeling a hybrid network (HNet) using the BodyNet and TNet features, and selecting distinct and strongly discriminative temporal features from the BodyNet and TNet features to efficiently construct HNet.
2. The deep integration method based on skeleton action recognition as claimed in claim 1, characterized in that: in step one, the recurrent neural network comprises LSTM units, each consisting of an input gate ($I_t$), an input node ($G_t$), a forget gate ($F_t$) and an output gate ($O_t$). The input gate is given by $I_t=\sigma(W_{IX}X_t+W_{IH}H_{t-1}+b_I)$; the forget gate by $F_t=\sigma(W_{FX}X_t+W_{FH}H_{t-1}+b_F)$; the output gate by $O_t=\sigma(W_{OX}X_t+W_{OH}H_{t-1}+b_O)$; and the input node by $G_t=\tanh(W_{GX}X_t+W_{GH}H_{t-1}+b_G)$. The LSTM unit combines them as $C_t=F_t\odot C_{t-1}+I_t\odot G_t$ and $H_t=O_t\odot\tanh(C_t)$, where the $W$ and $b$ terms are the weight matrices and biases of the respective gates.
3. The deep integration method based on skeleton action recognition as claimed in claim 1, characterized in that: in step two, on the basis of pairwise distance features for the action recognition task, four joint distance maps are constructed, one in 3D space and the other three in the 2D orthogonal spaces; each action is performed by two subjects so that actions involving human-to-human interaction can be handled, and in an action sequence each frame contains two skeletons, belonging to a main subject and an auxiliary subject. A skeleton sequence AS comprises M skeleton frames, each frame containing 2N joints, where the first N joints belong to the main subject and the remaining N joints to the auxiliary subject: AS = {Fr_1, ..., Fr_M}, where Fr_j = {J^j_1, ..., J^j_{2N}} denotes the j-th skeleton frame and J^j_i = (x^j_i, y^j_i, z^j_i) denotes the 3D coordinates (x, y, z) of the i-th joint of the j-th frame. The first set of spatial features is defined with respect to the hip joint and is named SF1_xyz, SF1_xy, SF1_yz and SF1_xz; the second set of spatial features is defined with respect to the shoulder-center joint and is named SF2_xyz, SF2_xy, SF2_yz and SF2_xz. The spatial features are modeled, for each frame f and every joint i other than the reference joint, as
$SF1_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{hip,xyz})\}$, $SF1_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{hip,xy})\}$, $SF1_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{hip,yz})\}$, $SF1_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{hip,xz})\}$,
$SF2_{xyz}(f)=\{D_3(J^f_{i,xyz},J^f_{sc,xyz})\}$, $SF2_{xy}(f)=\{D_2(J^f_{i,xy},J^f_{sc,xy})\}$, $SF2_{yz}(f)=\{D_2(J^f_{i,yz},J^f_{sc,yz})\}$, $SF2_{xz}(f)=\{D_2(J^f_{i,xz},J^f_{sc,xz})\}$,
wherein f denotes the frame number; J_xyz denotes the (x, y, z) coordinates of a joint; J_xy, J_yz and J_xz denote its (x, y), (y, z) and (x, z) coordinates, respectively; hip and sc denote the hip and shoulder-center reference joints; and D_n() denotes the distance between two points in Euclidean n-space; let r = (r_1, r_2, ..., r_n) and s = (s_1, s_2, ..., s_n) be two points in Euclidean n-space, then D_n() is computed as
$D_n(r,s)=\sqrt{\sum_{i=1}^{n}(r_i-s_i)^2}$.
4. The deep integration method based on skeleton action recognition as claimed in claim 1, characterized in that: in step three, four temporal features are constructed, namely TF_xyz, TF_xy, TF_yz and TF_xz, given, for each pair of consecutive frames f and f+1 and every joint i = 1, ..., 2N, by
$TF_{xyz}(f)=\{D_3(J^f_{i,xyz},J^{f+1}_{i,xyz})\}$, $TF_{xy}(f)=\{D_2(J^f_{i,xy},J^{f+1}_{i,xy})\}$, $TF_{yz}(f)=\{D_2(J^f_{i,yz},J^{f+1}_{i,yz})\}$, $TF_{xz}(f)=\{D_2(J^f_{i,xz},J^{f+1}_{i,xz})\}$, for f = 1, ..., M-1.
5. The deep integration method based on skeleton action recognition as claimed in claim 1, characterized in that: in step four, BodyNet is used to extract various features from fine-grained body parts over the time domain of the entire sequence; for each frame Fr, the skeletal joints of the main subject are grouped into five groups corresponding to five body parts, and the joints of the auxiliary subject are grouped in the same way.
CN202010097008.XA 2020-02-17 2020-02-17 Deep integration method based on skeleton motion recognition Active CN111291693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010097008.XA CN111291693B (en) 2020-02-17 2020-02-17 Deep integration method based on skeleton motion recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010097008.XA CN111291693B (en) 2020-02-17 2020-02-17 Deep integration method based on skeleton motion recognition

Publications (2)

Publication Number Publication Date
CN111291693A CN111291693A (en) 2020-06-16
CN111291693B true CN111291693B (en) 2023-03-31

Family

ID=71017962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010097008.XA Active CN111291693B (en) 2020-02-17 2020-02-17 Deep integration method based on skeleton motion recognition

Country Status (1)

Country Link
CN (1) CN111291693B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784812B (en) * 2021-02-08 2022-09-23 安徽工程大学 Deep squatting action recognition method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11295119B2 (en) * 2017-06-30 2022-04-05 The Johns Hopkins University Systems and method for action recognition using micro-doppler signatures and recurrent neural networks
CN110348321A (en) * 2019-06-18 2019-10-18 杭州电子科技大学 Human motion recognition method based on bone space-time characteristic and long memory network in short-term
CN110796110B (en) * 2019-11-05 2022-07-26 西安电子科技大学 Human behavior identification method and system based on graph convolution network

Also Published As

Publication number Publication date
CN111291693A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN108764050B (en) Method, system and equipment for recognizing skeleton behavior based on angle independence
Soo Kim et al. Interpretable 3d human action analysis with temporal convolutional networks
Liu et al. Multi-view hierarchical bidirectional recurrent neural network for depth video sequence based action recognition
Parisi et al. A generalized learning paradigm exploiting the structure of feedforward neural networks
US7379568B2 (en) Weak hypothesis generation apparatus and method, learning apparatus and method, detection apparatus and method, facial expression learning apparatus and method, facial expression recognition apparatus and method, and robot apparatus
CN106066996A (en) The local feature method for expressing of human action and in the application of Activity recognition
CN110135249A (en) Human bodys' response method based on time attention mechanism and LSTM
Jalal et al. American sign language posture understanding with deep neural networks
CN111814719A (en) Skeleton behavior identification method based on 3D space-time diagram convolution
TWI758828B (en) Self-learning intelligent driving device
CN109064389B (en) Deep learning method for generating realistic images by hand-drawn line drawings
Sussner et al. The Kosko subsethood fuzzy associative memory (KS-FAM): Mathematical background and applications in computer vision
CN111401116B (en) Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
CN111444488A (en) Identity authentication method based on dynamic gesture
Dong et al. Dynamic gesture recognition by directional pulse coupled neural networks for human-robot interaction in real time
CN111291693B (en) Deep integration method based on skeleton motion recognition
CN111339888B (en) Double interaction behavior recognition method based on joint point motion diagram
CN114638408A (en) Pedestrian trajectory prediction method based on spatiotemporal information
CN116453025A (en) Volleyball match group behavior identification method integrating space-time information in frame-missing environment
Aviles-Arriaga et al. Visual recognition of similar gestures
Zacharatos et al. Emotion recognition from 3D motion capture data using deep CNNs
Zhao et al. Human action recognition based on improved fusion attention CNN and RNN
Lee Nonlinear approaches to independent component analysis
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
Guo et al. Exploiting LSTM-RNNs and 3D skeleton features for hand gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant