CN112801060A - Motion action recognition method and device, model, electronic equipment and storage medium - Google Patents
- Publication number
- CN112801060A (application number CN202110371059.1A)
- Authority
- CN
- China
- Prior art keywords
- building block
- space
- sequence
- layer
- time graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application discloses a motion action recognition method and device, a model, electronic equipment and a storage medium. The method comprises: acquiring a skeleton sequence of a motion action captured by a pose estimation device; and inputting the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result. The non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and one non-local layer.
Description
Technical Field
The present application relates to the technical field of deep neural networks, and in particular to a motion action recognition method and device, a model, electronic equipment and a storage medium.
Background
Intelligent sports equipment needs the ability to recognize human action types in order to judge a user's fitness movements (such as deep squats, push-ups, sit-ups and the like), and changes in the sequence of human body joints are very important for recognizing these action types. Traditional methods for modeling joint-sequence variation often rely on hand-crafted features, which results in limited expressive power and difficulty in generalization. To overcome these limitations, a new approach is needed that can automatically capture the spatial and temporal patterns of change in joint sequences. Recently, graph convolutional neural networks (GCNs), which generalize convolutional neural networks (CNNs) to graphs of arbitrary structure, have received increasing attention and have been successfully adopted in many applications, such as image classification, document classification and semi-supervised learning.
The space-time graph convolution model was the first to apply graph convolution to the human action classification task. Although it can model the changes of the human skeleton sequence well, the locality of the convolution operation prevents it from representing long-range spatio-temporal dependencies, which are nevertheless very important for recognizing some motion actions.
Disclosure of Invention
The embodiments of the present application aim to provide a motion action recognition method and device, a model, electronic equipment and a storage medium, so as to solve the problem that the space-time graph convolution model cannot model long-range spatio-temporal dependencies.
According to a first aspect of the embodiments of the present application, there is provided a motion action recognition method, including: acquiring a skeleton sequence of a motion action captured by a pose estimation device; and inputting the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result; wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and one non-local layer.
According to a second aspect of the embodiments of the present application, there is provided a motion action recognition apparatus, including: an acquisition module, configured to acquire a skeleton sequence of a motion action captured by a pose estimation device; and a recognition module, configured to input the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result; wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and one non-local layer.
According to a third aspect of the embodiments of the present application, there is provided a non-local space-time graph convolution model, wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and one non-local layer.
According to a fourth aspect of the embodiments of the present application, there is provided an electronic device, including: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect.
According to a fifth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
According to the above embodiments, a skeleton sequence of a motion action is obtained with a pose estimation device, and the obtained skeleton sequence is input into a trained non-local space-time graph convolution model to obtain a motion action recognition result. Changes in the human skeleton sequence are crucial for recognizing human action types. Although the space-time graph convolution model can model these changes well, the locality of the convolution operation prevents it from representing long-range spatio-temporal dependencies, which are crucial for recognizing some motion actions. Non-local operations enhance the model's ability to capture the relationships between human joint points within a frame, i.e., its spatial modeling ability. Skip connections allow sequence information to propagate better through the model, enhancing its temporal modeling ability. Combining non-local operations, skip connections and space-time graph convolution thus gives the space-time graph convolution better spatio-temporal modeling capability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart illustrating a motion action recognition method according to an exemplary embodiment.
FIG. 2 is a space-time graph of a skeleton sequence used by the space-time graph convolution, shown in accordance with an exemplary embodiment; the points in FIG. 2 represent the joints of the body, the edges between body joints are defined according to the natural connections of the body, the inter-frame edges connect the same joints between successive frames, and the joint coordinates serve as the input to the space-time graph convolution.
FIG. 3 is a diagram illustrating a distance partitioning strategy, according to an example embodiment.
FIG. 4 is a diagram of a non-local space-time graph convolution model architecture in accordance with an exemplary embodiment.
FIG. 5 is a non-local layer structure diagram shown in accordance with an example embodiment.
Fig. 6 is a block diagram illustrating a motion action recognition device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Fig. 1 is a flowchart illustrating a motion action recognition method according to an exemplary embodiment; referring to fig. 1, the motion action recognition method of an embodiment of the present invention may include:
step S11, acquiring a skeleton sequence of the motion action captured by the pose estimation device;
step S12, inputting the skeleton sequence into the trained non-local space-time graph convolution model to obtain a motion action recognition result;
the non-local space-time graph convolution model is formed by stacking a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer in sequence, the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block which are connected in sequence, an additional jump connection is arranged between the first building block and the fifth building block, an additional jump connection is arranged between the second building block and the fourth building block, and each building block consists of two space-time graph convolution models and a non-local layer.
According to this embodiment, changes in the human skeleton sequence are crucial for identifying human action types. The space-time graph convolution model can model these changes well, but due to the locality of the convolution operation it cannot represent long-range spatio-temporal dependencies well, although these are crucial for recognizing some motion actions.
In the specific implementation of step S11, a skeleton sequence of the motion action captured by the pose estimation device is acquired.
Specifically, the pose estimation device of this embodiment adopts an Azure Kinect DK depth camera, although it is certainly not limited thereto; the motion skeleton sequence is captured from the motion action video through the depth camera.
In one possible implementation, the motion action video captured by the depth camera is a video composed of successive image frames in which a person is performing some movement, such as push-ups, squats, pull-ups and the like.
In the specific implementation of step S12, the skeleton sequence is input into the trained non-local space-time graph convolution model to obtain a motion action recognition result.
in particular, FIG. 4 is a diagram of a non-local space-time graph convolution model architecture, shown in accordance with an exemplary embodiment. Referring to fig. 4, the non-local space-time graph convolution model is formed by sequentially stacking a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer, wherein the building block group comprises a building block one B1, a building block two B2, a building block three B3, a building block four B4 and a building block five B5 which are sequentially connected, an additional jump connection is further arranged between the building block one and the building block five, an additional jump connection is further arranged between the building block two and the building block four, and each building block is composed of two space-time graph convolution models and one non-local layer.
The implementation steps of the space-time graph convolution model comprise:
(1) Constructing a space-time graph of the joints on the motion action skeleton sequence; referring to fig. 2, the motion action skeleton sequence comprises a plurality of frames, each of which contains a human body skeleton graph.
in particular, the skeleton sequence is typically represented by 2D or 3D coordinates of each human joint in each frame. In practical application, the Azure Kinect DK is mainly adopted for collecting joint point data. In the space-time graph convolution model, joint sequences are represented hierarchically using space-time graphs.
The space-time graph convolution model constructs an undirected space-time graph $G = (V, E)$ on a joint sequence with $N$ joint points and $T$ frames. In this graph, the node set $V = \{ v_{ti} \mid t = 1, \ldots, T;\ i = 1, \ldots, N \}$ (where $v_{ti}$ denotes the $i$-th node on the $t$-th frame) contains all the nodes of the joint sequence, and the coordinate vector of each node $v_{ti}$ is input into the model as its feature vector. The edge set $E$ comprises two subsets. The first subset $E_S = \{ v_{ti} v_{tj} \mid (i, j) \in H \}$, in which $H$ is the set of natural connections between human body joint points, describes the connections between joint points within the same frame. The second subset $E_F = \{ v_{ti} v_{(t+1)i} \}$ contains the inter-frame edges connecting the same joint point between successive frames. Therefore, all the edges in $E_F$ belonging to one particular joint point $i$ represent its trajectory over time.
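As a concrete illustration of this construction, the following sketch builds the node set and both edge subsets for a hypothetical 4-joint skeleton; the joint count and bone list are illustrative stand-ins, not the skeleton actually used by the model.

```python
# Toy sketch of the spatio-temporal graph construction: nodes are (frame, joint)
# pairs, E_S holds the intra-frame bone edges, E_F holds the inter-frame edges
# that trace each joint's trajectory. The 4-joint skeleton below is hypothetical.

def build_st_graph(num_joints, num_frames, natural_edges):
    """Return node set V and edge subsets E_S (intra-frame) and E_F (inter-frame)."""
    V = [(t, i) for t in range(num_frames) for i in range(num_joints)]
    # E_S: natural connections between joints, repeated on every frame
    E_S = [((t, i), (t, j)) for t in range(num_frames) for (i, j) in natural_edges]
    # E_F: the same joint connected across consecutive frames
    E_F = [((t, i), (t + 1, i)) for t in range(num_frames - 1) for i in range(num_joints)]
    return V, E_S, E_F

H = [(0, 1), (1, 2), (1, 3)]              # hypothetical 4-joint skeleton, 3 bones
V, E_S, E_F = build_st_graph(4, 3, H)     # N = 4 joints, T = 3 frames
print(len(V), len(E_S), len(E_F))         # 12 nodes, 9 bone edges, 8 trajectory edges
```

With a real skeleton one would substitute the 25-joint Azure Kinect DK joint list and its natural bone connections for `H`.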
(2) Defining a distance-based sampling function on the single-frame spatial graph of the motion action skeleton sequence;
specifically, inOn a single frame of time, there areJoint point and bone edge. In conventional convolution, when the input is a 2D grid, the output signature of the convolution operation is also a 2D grid. Using a single step size and appropriate padding, the size of the output signature can be the same as the size of the input signature. In the following description we will assume this case. Considering the convolution kernel size asFor the number of channels isInput feature map ofA conventional convolution operation is performed. In spatial positionThe output value of (d) is:
wherein the sampling functionTraversing positionNeighbor of (2), weight functionA weight vector in the c-dimensional real space is provided for calculating an inner product with the c-dimensional input feature vector. The convolution operation on the graph is then defined by extending the above formula to the case where the input feature graph is located on a spatial graph.
On an image, the sampling function $\mathbf{p}(h, w)$ is defined on the pixels adjacent to the center position $\mathbf{x}$. On the graph, the sampling function can similarly be defined on the neighbor set $B(v_{ti}) = \{ v_{tj} \mid d(v_{tj}, v_{ti}) \le D \}$ of a node $v_{ti}$. Here $d(v_{tj}, v_{ti})$ represents the minimum length of any path from $v_{tj}$ to $v_{ti}$, and $D$ indicates the selectable path length. Thus, the sampling function can be written as $\mathbf{p}(v_{ti}, v_{tj}) = v_{tj}$.
(3) Defining a mapping function from nodes to labels on the spatial graph, implemented with a distance partitioning strategy;
in particular, we have employed a distance partitioning strategy to implement label mapping. Specific strategies are described below, in conjunction with FIG. 3.
The distance partitioning strategy divides the neighbor set according to the distance $d(v_{tj}, v_{ti})$ from a node $v_{tj}$ to the root node $v_{ti}$, where $v_{tj}$ denotes another joint point in the same frame. In the space-time graph convolution model, with $D = 1$, the neighbor set is divided into two subsets: $d = 0$ represents the root node itself, and $d = 1$ represents the remaining adjacent nodes. Thus, the space-time graph convolution model has two different weight vectors, which can model local differential properties. Formally, the label map is $l_{ti}(v_{tj}) = d(v_{tj}, v_{ti})$, with $l_{ti}(v_{tj}) \in \{0, 1\}$.
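Under this setting, the partition can be sketched directly on the adjacency matrix: the subset with $d = 0$ is the identity (the root itself) and the subset with $d = 1$ is the original adjacency (the remaining neighbors). A minimal sketch with a hypothetical 4-joint skeleton:

```python
# Distance partitioning sketch: split A + I into A_0 = I (d = 0, root) and
# A_1 = A (d = 1, remaining neighbors). The 4-joint adjacency is hypothetical.
import numpy as np

def distance_partition(A):
    """Split the self-augmented adjacency A + I into the two distance subsets."""
    A0 = np.eye(A.shape[0])   # d = 0: the root node itself
    A1 = A.copy()             # d = 1: the adjacent joint points
    return A0, A1

A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:     # hypothetical bones
    A[i, j] = A[j, i] = 1.0
A0, A1 = distance_partition(A)
```

The two subsets later receive separate weight matrices, which is what gives the model its two distinct weight vectors per neighborhood.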
(4) Defining a weight function based on the mapping function;
in particular, the joint pointOf (2) a neighbor setThe distance division strategy is divided into two fixed subsets, each subset having a digital label. Therefore, we have a mappingThe neighboring nodes are mapped to the labels of the corresponding subset. Weight functionCan pass throughIndex tensor of dimension to realize
(5) Generalizing the conventional convolution to the spatial graph convolution based on the sampling function and the weight function;
in particular, the conventional convolution is now rewritten to the form of a graph convolution
The normalization term $Z_{ti}(v_{tj}) = \left| \{ v_{tk} \mid l_{ti}(v_{tk}) = l_{ti}(v_{tj}) \} \right|$ equals the cardinality of the corresponding subset; this term is added to balance the contributions of the different subsets to the output. Substituting the sampling function and the weight function, we obtain:

$$f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot \mathbf{w}'(l_{ti}(v_{tj}))$$
(6) Extending the sampling function and the mapping function to the time dimension, thereby generalizing the spatial graph convolution to the spatio-temporal domain;
in particular, after the spatial map convolution is formulated, the task of dynamically modeling the time space within the sequence of joint points is now entered. We extend the notion of neighborhood to also include temporally connected joints
The parameter $\Gamma$ controls the temporal extent of the neighborhood graph and can therefore be called the temporal convolution kernel size; $v_{qj}$ denotes the $j$-th joint point on the $q$-th frame. To complete the convolution on the space-time graph, we also need a sampling function, which is the same as in the single-frame case, and a weight function, in which only the label map differs. Because the time axis is regular, the label map for the spatio-temporal neighborhood rooted at $v_{ti}$ is changed directly into:

$$l_{ST}(v_{qj}) = l_{ti}(v_{tj}) + (q - t + \lfloor \Gamma / 2 \rfloor) \times K$$

where $l_{ti}(v_{tj})$ is the single-frame label map and $K$ is the number of spatial partitions.
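To make the spatio-temporal label map concrete, a small sketch under the distance partitioning strategy ($K = 2$); the temporal kernel size $\Gamma = 9$ used below is an illustrative choice, not a value stated in this description.

```python
# Spatio-temporal label map l_ST(v_qj) = l_ti(v_tj) + (q - t + floor(Gamma/2)) * K.
# With K = 2 spatial partitions and temporal kernel size Gamma, the labels
# range over 0 .. K * Gamma - 1.

def st_label(l_spatial, q, t, gamma, k=2):
    """Map a spatial partition label to its spatio-temporal partition label."""
    return l_spatial + (q - t + gamma // 2) * k

print(st_label(0, q=5, t=5, gamma=9))   # 8  (root, same frame: middle band)
print(st_label(1, q=5, t=5, gamma=9))   # 9  (neighbor, same frame)
print(st_label(0, q=1, t=5, gamma=9))   # 0  (root, earliest frame in the window)
```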
In this way, the space-time graph convolution model defines a well-defined convolution operation on the constructed space-time graph.
(7) Implementing the space-time graph convolution model by performing the spatial graph convolution on the spatial graph and the temporal convolution along the time dimension, respectively.
In particular, implementing graph-based convolution is not as straightforward as 2D or 3D convolution. Here we provide detailed implementation information for the space-time graph convolution used in skeleton-based motion action recognition.
The connections between the human body joint points within a frame are represented by an adjacency matrix $\mathbf{A}$, and the self-connections by an identity matrix $\mathbf{I}$.
In the single-frame case, for the distance partitioning strategy, the adjacency matrix is split into several matrices $\mathbf{A}_j$ such that $\mathbf{A} + \mathbf{I} = \sum_j \mathbf{A}_j$, where $\mathbf{A}_0 = \mathbf{I}$ and $\mathbf{A}_1 = \mathbf{A}$. The spatial graph convolution can therefore be implemented by

$$\mathbf{f}_{out} = \sum_j \mathbf{\Lambda}_j^{-\frac{1}{2}} \mathbf{A}_j \mathbf{\Lambda}_j^{-\frac{1}{2}} \mathbf{f}_{in} \mathbf{W}_j$$
Similarly to the above, $\mathbf{\Lambda}_j^{ii} = \sum_k \mathbf{A}_j^{ik} + \alpha$, where $\mathbf{A}_j^{ik}$ denotes the element in the $i$-th row and $k$-th column of $\mathbf{A}_j$, and $\mathbf{\Lambda}_j$ is the degree matrix of $\mathbf{A}_j$. We set $\alpha = 0.001$ to avoid all-zero rows in $\mathbf{A}_j$.
In fact, in the spatio-temporal case, we can represent the input feature map as a tensor of dimensions $(C, T, N)$, where $C$ is the number of channels, $T$ the number of frames and $N$ the number of joint points. We implement the space-time graph convolution by performing the spatial graph convolution along the third dimension of the tensor, i.e., the spatial dimension, and the temporal convolution along the second dimension of the tensor, respectively.
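A hedged numpy sketch of the spatial half of this implementation, applied to a $(C, T, N)$ tensor; the toy sizes, random weights and 4-joint skeleton are all illustrative assumptions, not the model's real configuration.

```python
# Partitioned spatial graph convolution on a (C, T, N) tensor:
# f_out = sum_j Lambda_j^{-1/2} A_j Lambda_j^{-1/2} f_in W_j,
# with Lambda_j^{ii} = sum_k A_j^{ik} + alpha (alpha = 0.001 avoids zero rows).
import numpy as np

def spatial_graph_conv(f_in, A_parts, W_parts, alpha=0.001):
    """f_in: (C, T, N); A_parts: list of (N, N); W_parts: list of (C_out, C)."""
    C_out = W_parts[0].shape[0]
    _, T, N = f_in.shape
    f_out = np.zeros((C_out, T, N))
    for A_j, W_j in zip(A_parts, W_parts):
        deg = A_j.sum(axis=1) + alpha                  # Lambda_j diagonal
        A_norm = A_j * (deg ** -0.5)[:, None] * (deg ** -0.5)[None, :]
        # aggregate neighbours over the joint axis, then mix channels with W_j
        f_out += np.einsum('oc,ctv,vw->otw', W_j, f_in, A_norm)
    return f_out

rng = np.random.default_rng(0)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:                  # hypothetical skeleton
    A[i, j] = A[j, i] = 1.0
A_parts = [np.eye(4), A]                               # distance partitioning
W_parts = [rng.standard_normal((8, 3)) for _ in A_parts]
out = spatial_graph_conv(rng.standard_normal((3, 6, 4)), A_parts, W_parts)
print(out.shape)                                       # (8, 6, 4)
```

The temporal convolution would then be an ordinary 1D convolution along the second (frame) axis of the resulting tensor.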
FIG. 5 is a diagram illustrating the non-local layer structure; the non-local layer includes $1 \times 1$ 2D convolutions.
$\mathbf{X}$ denotes the input tensor, in which $T$ represents the number of frames, $N$ the number of joint points and $C$ the number of feature channels. $\theta$, $\phi$, $g$ and $z$ denote $1 \times 1$ 2D convolutions, $\otimes$ denotes matrix multiplication, and $\oplus$ denotes element-wise addition.
The specific calculation steps for the non-local layer are as follows:
the method comprises the following steps:(Note:, andrespectively represent,Andthree of theseWeight of 2D convolution)
Step two: $\mathbf{Z} = W_{z} \mathbf{Y} + \mathbf{X}$ (Note: $\mathbf{Z}$ represents the output of the non-local layer, and $W_{z}$ represents the weight of the $1 \times 1$ 2D convolution $z$)
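The two steps can be sketched in numpy by treating each $1 \times 1$ convolution as a channel-mixing matrix over the $T \times N$ positions; the embedding width and toy sizes below are illustrative assumptions.

```python
# Embedded-Gaussian non-local layer sketch on a (C, T, N) tensor. Each 1x1 2D
# convolution is a channel-mixing matrix applied at every (frame, joint)
# position; attention spans all T*N positions, followed by the residual add.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_layer(X, W_theta, W_phi, W_g, W_z):
    """X: (C, T, N) -> (C, T, N)."""
    C, T, N = X.shape
    x = X.reshape(C, T * N)                       # positions as columns
    theta, phi, g = W_theta @ x, W_phi @ x, W_g @ x
    attn = softmax(theta.T @ phi, axis=1)         # (P, P) pairwise weights
    Y = g @ attn.T                                # step one: attend over positions
    return (W_z @ Y + x).reshape(C, T, N)         # step two: residual connection

rng = np.random.default_rng(0)
C, T, N, C_mid = 4, 3, 5, 2
X = rng.standard_normal((C, T, N))
W_theta, W_phi, W_g = (rng.standard_normal((C_mid, C)) for _ in range(3))
Z = non_local_layer(X, W_theta, W_phi, W_g, rng.standard_normal((C, C_mid)))
print(Z.shape)                                    # (4, 3, 5)
```

Note that with $W_z = 0$ the layer reduces to the identity, which is why the residual form trains stably when inserted into an existing network.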
We describe here in more detail the flow of data in the model.
We first input the joint sequence into the batch normalization layer to normalize the data. The data is then input into building block one, which produces two identical outputs: one serves directly as the skip input of building block five, and the other is input into building block two. Building block two likewise produces two identical outputs: one serves directly as the skip input of building block four, and the other is input into building block three. The output of building block three is combined with the skip input from building block two to form the input of building block four. The output of building block four is combined with the skip input from building block one to form the input of building block five. The input/output channel counts of the building blocks are (1, 16), (16, 32), (32, 64), (64, 128) and (128, 256). Each building block consists of two space-time graph convolution models and one non-local layer; the ResNet mechanism is applied to each space-time graph convolution model, and after each space-time graph convolution model we apply dropout with probability 0.5 to avoid overfitting. Global average pooling is then performed on the output of building block five to obtain a 256-dimensional feature vector for each motion action skeleton sequence. Finally, these vectors are fed to the Softmax classifier to obtain the classification results.
After the model is constructed, we train it using stochastic gradient descent with an initial learning rate of 0.1, decaying the learning rate by a factor of 0.1 every 10 epochs.
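Read as a step decay, this schedule can be sketched as follows; the multiplicative reading of "reduce the learning rate by 0.1" is an assumption on our part.

```python
# Step-decay schedule for SGD: base learning rate 0.1, decayed by a factor of
# 0.1 every 10 epochs. The multiplicative interpretation is an assumption.

def step_decay_lr(epoch, base_lr=0.1, gamma=0.1, step=10):
    """Learning rate in effect during the given (0-indexed) epoch."""
    return base_lr * (gamma ** (epoch // step))

print(step_decay_lr(0))                  # 0.1
print(round(step_decay_lr(25), 6))       # 0.001
```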
To verify the effectiveness of the method provided by the embodiment of the present invention, NTU RGB+D is selected as the dataset and the method is compared with the existing ST-GCN and 2s-AGCN, so as to highlight the effect of the method and the model.
NTU RGB+D (see Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang: NTU RGB+D: A Large Scale Benchmark for 3D Human Activity Analysis. CVPR 2016: 1010-1019) is briefly introduced here. NTU RGB+D is a large-scale action recognition dataset containing 56,578 skeleton sequences of 60 action classes, captured from 40 different subjects and 3 different camera viewpoints. Each skeleton graph contains 25 human joints as nodes, with their 3D positions in space as initial features. Each action sample contains 1 to 2 subjects. The authors of NTU RGB+D suggest reporting classification accuracy under two settings: (1) Cross-Subject (X-Subject), in which the 40 subjects are divided into a training group and a testing group, yielding 40,091 training and 16,487 testing examples; (2) Cross-View (X-View), in which all 18,932 samples collected from camera 1 are used for testing and the remaining 37,646 samples are used for training.
Experiments were performed on the NTU RGB+D dataset, and the results are shown in Table 1. The experimental results show that the method provided by the embodiment of the present invention achieves a significant performance improvement.
Table 1 shows the accuracy of the method provided by the embodiments of the present invention compared to ST-GCN and 2s-AGCN in two settings of the NTU RGB + D dataset.
For ST-GCN, see: Sijie Yan, Yuanjun Xiong, Dahua Lin: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. AAAI 2018: 7444-7452. For 2s-AGCN, see: Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu: Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. CVPR 2019: 12026-12035.
Corresponding to the embodiment of the motion action recognition method, the application also provides an embodiment of a motion action recognition device.
Fig. 6 is a block diagram illustrating a motion action recognition device according to an exemplary embodiment. Referring to fig. 6, the apparatus may include:
the acquisition module 31 is configured to acquire a skeleton sequence of the motion action acquired by the posture estimation device;
the recognition module 32 is configured to input the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result; the non-local space-time graph convolution model is formed by stacking a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer in sequence, the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block which are connected in sequence, an additional jump connection is arranged between the first building block and the fifth building block, an additional jump connection is arranged between the second building block and the fourth building block, and each building block consists of two space-time graph convolution models and a non-local layer.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application also provides an electronic device, including: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the motion action recognition method as described above.
Accordingly, the present application also provides a computer readable storage medium, on which computer instructions are stored, wherein the instructions, when executed by a processor, implement the motion action recognition method as described above.
The embodiment of the present invention further provides a non-local space-time graph convolution model, wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and one non-local layer.
With respect to the non-local space-time graph convolution model in the above embodiment, the specific manner of each part thereof has been described in detail in the embodiment related to the method, and will not be elaborated herein.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (6)
1. A motion action recognition method, comprising:
acquiring a skeleton sequence of the motion action captured by a pose estimation device;
inputting the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result;
wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and a non-local layer.
2. The method of claim 1, wherein the pose estimation device employs an Azure Kinect DK depth camera.
3. An exercise motion recognition device, comprising:
an acquisition module, configured to acquire a skeleton sequence of the motion action captured by a pose estimation device;
a recognition module, configured to input the skeleton sequence into a trained non-local space-time graph convolution model to obtain a motion action recognition result;
wherein the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and a non-local layer.
4. A non-local space-time graph convolution model, wherein: the non-local space-time graph convolution model is formed by stacking, in sequence, a batch normalization layer, a building block group, a global average pooling layer and a Softmax layer; the building block group comprises a first building block, a second building block, a third building block, a fourth building block and a fifth building block connected in sequence, with an additional skip connection between the first building block and the fifth building block and an additional skip connection between the second building block and the fourth building block; and each building block consists of two space-time graph convolution models and a non-local layer.
5. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
6. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, perform the steps of the method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371059.1A CN112801060A (en) | 2021-04-07 | 2021-04-07 | Motion action recognition method and device, model, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112801060A true CN112801060A (en) | 2021-05-14 |
Family
ID=75816376
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112801060A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919232A (en) * | 2019-03-11 | 2019-06-21 | 西安电子科技大学 | Image classification method based on convolutional neural networks and non local connection network |
CN110532925A (en) * | 2019-08-22 | 2019-12-03 | 西安电子科技大学 | Driver Fatigue Detection based on space-time diagram convolutional network |
CN110796110A (en) * | 2019-11-05 | 2020-02-14 | 西安电子科技大学 | Human behavior identification method and system based on graph convolution network |
CN111460928A (en) * | 2020-03-17 | 2020-07-28 | 中国科学院计算技术研究所 | Human body action recognition system and method |
CN111601088A (en) * | 2020-05-27 | 2020-08-28 | 大连成者科技有限公司 | Sitting posture monitoring system based on monocular camera sitting posture identification technology |
CN111612046A (en) * | 2020-04-29 | 2020-09-01 | 杭州电子科技大学 | Characteristic pyramid graph convolutional neural network and application thereof in 3D point cloud classification |
CN111652124A (en) * | 2020-06-02 | 2020-09-11 | 电子科技大学 | Construction method of human behavior recognition model based on graph convolution network |
CN111814719A (en) * | 2020-07-17 | 2020-10-23 | 江南大学 | Skeleton behavior identification method based on 3D space-time diagram convolution |
CN111860267A (en) * | 2020-07-13 | 2020-10-30 | 浙大城市学院 | Multichannel body-building movement identification method based on human body bone joint point positions |
CN111950406A (en) * | 2020-07-28 | 2020-11-17 | 深圳职业技术学院 | Finger vein identification method, device and storage medium |
CN112232106A (en) * | 2020-08-12 | 2021-01-15 | 北京工业大学 | Two-dimensional to three-dimensional human body posture estimation method |
CN112528811A (en) * | 2020-12-02 | 2021-03-19 | 建信金融科技有限责任公司 | Behavior recognition method and device |
Worldwide applications: filed 2021-04-07 in CN as CN202110371059.1A (CN112801060A), legal status: active, Pending
Non-Patent Citations (8)
Title |
---|
LEI SHI et al.: "Non-Local Graph Convolutional Networks for Skeleton-Based Action Recognition", arXiv:1805.07694v2 |
LEI SHI et al.: "Skeleton-Based Action Recognition With Multi-Stream Adaptive Graph Convolutional Networks", IEEE Transactions on Image Processing |
LEI SHI et al.: "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
SIJIE YAN et al.: "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition", arXiv:1801.07455v2 |
XIAOLONG WANG et al.: "Non-local Neural Networks", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
CAO Yi et al.: "Skeleton-based action recognition with spatio-temporal adaptive graph convolutional neural networks", Journal of Huazhong University of Science and Technology |
WANG Zhihua: "Research on human action recognition based on spatio-temporal graph convolutional neural networks", China Masters' Theses Full-text Database, Information Science and Technology |
HUANG Chen: "Research on video-based human action recognition using pose sequences", China Masters' Theses Full-text Database, Information Science and Technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Graph edge convolutional neural networks for skeleton-based action recognition | |
Liu et al. | Trajectorycnn: a new spatio-temporal feature learning network for human motion prediction | |
Xia et al. | Multi-scale mixed dense graph convolution network for skeleton-based action recognition | |
CN110472604B (en) | Pedestrian and crowd behavior identification method based on video | |
Geng et al. | Human action recognition based on convolutional neural networks with a convolutional auto-encoder | |
CN109558781A (en) | A kind of multi-angle video recognition methods and device, equipment and storage medium | |
Bruce et al. | Multimodal fusion via teacher-student network for indoor action recognition | |
CN112131908A (en) | Action identification method and device based on double-flow network, storage medium and equipment | |
Fan et al. | Context-aware cross-attention for skeleton-based human action recognition | |
CN108647571A (en) | Video actions disaggregated model training method, device and video actions sorting technique | |
CN111401106A (en) | Behavior identification method, device and equipment | |
Zhang et al. | Graph convolutional LSTM model for skeleton-based action recognition | |
Jiang et al. | Inception spatial temporal graph convolutional networks for skeleton-based action recognition | |
Chen et al. | Hierarchical posture representation for robust action recognition | |
Wei et al. | Dynamic hypergraph convolutional networks for skeleton-based action recognition | |
Bavil et al. | Action Capsules: Human skeleton action recognition | |
Wu et al. | Multimodal human action recognition based on spatio-temporal action representation recognition model | |
Xiaolong | Simulation analysis of athletes’ motion recognition based on deep learning method and convolution algorithm | |
CN112801060A (en) | Motion action recognition method and device, model, electronic equipment and storage medium | |
CN114782992A (en) | Super-joint and multi-mode network and behavior identification method thereof | |
Raju | Exercise detection and tracking using MediaPipe BlazePose and Spatial-Temporal Graph Convolutional Neural Network | |
CN112926517B (en) | Artificial intelligence monitoring method | |
Zhong et al. | Research on discriminative skeleton-based action recognition in spatiotemporal fusion and human-robot interaction | |
Sun et al. | A Deep Learning Method for Intelligent Analysis of Sports Training Postures | |
Shi et al. | Graph convolutional networks with objects for skeleton-based action recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2021-05-14