CN117854155A - Human skeleton action recognition method and system


Info

Publication number
CN117854155A (application CN202410258206.8A)
Authority
CN
China
Legal status: Granted
Application number
CN202410258206.8A
Other languages
Chinese (zh)
Other versions
CN117854155B (granted publication)
Inventor
柳凌峰
臧拓
涂建锋
段梦然
Current Assignee
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to application CN202410258206.8A
Publication of CN117854155A
Application granted; publication of CN117854155B
Legal status: Active

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides a human skeleton action recognition method and system. The method comprises the following steps: calculating corresponding bone features and angle features according to a bone motion sequence, and dividing the bone motion sequence into a corresponding training set and verification set; fusing the bone features and the angle features based on a preset rule to generate corresponding spatial features, and extracting temporal features from the bone motion sequence in real time through a preset causal convolution algorithm; stacking a preset graph convolution and the preset causal convolution to generate a corresponding spatio-temporal feature extraction module, and fusing a preset global pooling layer and a preset fully connected layer into the spatio-temporal feature extraction module to generate a corresponding initial action recognition model; and iteratively training the initial action recognition model with the training set and the verification set to generate a corresponding target action recognition model. The method and system can greatly improve the accuracy of action recognition and the user experience.

Description

Human skeleton action recognition method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a human skeleton action recognition method and system.
Background
With the rise of artificial intelligence and the arrival of the intelligent age, human behavior recognition technology has matured, taken on great practical significance, and been widely applied in fields such as virtual reality, intelligent monitoring, and motion analysis.
The essence of existing human behavior recognition is to extract the action features of a human body from videos, images, or skeleton sequences and perform the corresponding classification to identify specific action types. Among these data sources, skeleton sequence data has strong resistance to background interference and is sparse and easy to acquire, so it has found deep application.
Furthermore, in the prior art, features are mostly extracted from skeleton sequence data through one-dimensional temporal convolution. In the spatial dimension, however, this extraction approach has limited capacity to capture features of the skeleton connection relations, so the extracted features are not comprehensive enough; in the temporal dimension, the time order can be confused within each time step, so actions sensitive to temporal order may be misjudged, which correspondingly reduces the user experience.
Disclosure of Invention
Based on the above, the invention aims to provide a human skeleton motion recognition method and system, so as to solve the problems in the prior art that human skeleton motion recognition is not comprehensive enough and action misjudgment easily occurs.
A first aspect of an embodiment of the present invention proposes:
a method of human skeletal motion recognition, wherein the method comprises:
acquiring a bone motion sequence generated in the human body motion process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone motion sequence, and dividing the bone motion sequence into a corresponding training set and a corresponding verification set;
fusion processing is carried out on the bone characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and time characteristics in the bone motion sequence are extracted in real time through a preset causal convolution algorithm;
stacking the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
and respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set to generate a corresponding target motion recognition model, and recognizing the motion category corresponding to the skeleton motion sequence through the target motion recognition model.
The beneficial effects of the invention are as follows: from the bone motion sequence generated during human motion, the corresponding bone features and angle features can be further analyzed, while a training set and a verification set are prepared for subsequent training. The current bone features and angle features are then fused to generate the spatial features used to construct the model, and the temporal sequence corresponding to the current spatial features is acquired. On this basis the required initial motion recognition model is constructed, which is then trained with the training set and the verification set to finally generate a target motion recognition model that accurately recognizes human actions. Human actions can thus be recognized through the target motion recognition model with high accuracy and a wide recognition range, which correspondingly and greatly improves the user experience.
Further, the step of acquiring the bone motion sequence generated in the human body motion process in real time through the preset acquisition device comprises the following steps:
capturing continuous limb movements generated by a human body in real time through an inertial sensor and a depth camera, and extracting corresponding initial bone motion sequences in real time from the continuous limb movements;
extracting a corresponding skeleton sequence from the initial skeleton motion sequence, and detecting the number of motion frames corresponding to the skeleton sequence in real time;
and adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence, wherein the skeleton sequence and the motion frame number both comprise specific numerical values.
Further, the step of adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence includes:
when the skeleton sequence is obtained in real time, detecting a plurality of joint points contained in the skeleton sequence in real time, and detecting three-dimensional Cartesian coordinates corresponding to each joint point in real time;
and adjusting the three-dimensional Cartesian coordinates of each articulation point to a preset position, and uniformly adjusting the motion frame number of the skeleton sequence to a preset value so as to correspondingly generate the skeleton motion sequence.
Further, the step of fusing the bone feature and the angle feature based on a preset rule to generate a corresponding spatial feature includes:
when the bone characteristics and the angle characteristics are obtained in real time, a mapping relation between the bone characteristics and the angle characteristics is constructed in real time, and the bone characteristics and the angle characteristics are integrated based on the mapping relation so as to generate corresponding characteristic frameworks;
the characteristic frameworks are sequentially input into a preset prior mixed graph convolution and a preset dynamic gating graph convolution, a distance graph corresponding to the characteristic frameworks is calculated through the preset prior mixed graph convolution, and a dynamic graph corresponding to the characteristic frameworks is calculated through the preset dynamic gating graph convolution;
and correspondingly generating the spatial features according to the distance map and the dynamic map.
Further, the step of calculating the distance map corresponding to the feature skeleton through the convolution of the preset prior mixed map includes:
randomly selecting a target node from skeleton spines in the characteristic skeleton, and setting the target node to a corresponding source node;
splitting the characteristic skeleton into corresponding rooted trees according to the source nodes, sequentially connecting joint points with the same distance in the rooted trees, and sequentially connecting joint points with different distances together to correspondingly generate the distance map, wherein the source nodes have uniqueness.
Further, the step of calculating the dynamic graph corresponding to the feature skeleton through the convolution of the preset dynamic gating graph includes:
embedding the feature skeleton into low-dimensional features through a preset embedding function to generate a corresponding feature map, and generating a corresponding feature matrix according to the feature map;
and calculating a corresponding adjacent matrix according to the feature matrix, and carrying out feature extraction processing among all the nodes of the feature skeleton through the adjacent matrix so as to correspondingly generate the dynamic graph, wherein the adjacent matrix has uniqueness.
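The embedding-then-adjacency step above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the claim only states that an embedding function yields a feature matrix from which a unique adjacency matrix is computed, so the linear embedding and the softmax-normalized similarity used here are stand-ins.

```python
import math

def embed(features, weights):
    """Linear embedding of each node's feature vector into a low dimension
    (an assumed form of the 'preset embedding function')."""
    return [[sum(f[i] * weights[i][j] for i in range(len(f)))
             for j in range(len(weights[0]))] for f in features]

def dynamic_adjacency(emb):
    """Data-dependent adjacency: row-softmax of pairwise dot products
    between embedded nodes, so each row sums to 1."""
    n = len(emb)
    sims = [[sum(a * b for a, b in zip(emb[u], emb[v])) for v in range(n)]
            for u in range(n)]
    adj = []
    for row in sims:
        exps = [math.exp(s) for s in row]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj
```

The resulting matrix then weights feature exchange among the joint points, which is how a "dynamic graph" differs from the fixed prior skeleton graph.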
Further, the expression of the preset causal convolution algorithm is:

CausConv(S) = Σ_{i=0}^{k−1} w_i · x_{S−i}

wherein CausConv(S) represents the temporal feature, S−i indexes the historical spatial features, i represents the i-th spatial feature, k−1 represents the fill (padding) length, x represents the spatial feature, w represents the convolution weights, and k represents the size of the convolution kernel.
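A minimal sketch of a causal convolution of this kind (an illustration, not the patent's implementation): each output step aggregates only the current and the k−1 previous spatial features, with k−1 zeros padded on the left so that no future information leaks into any time step.

```python
def causal_conv(x, w):
    """x: one spatial feature per time step; w: kernel of size k,
    where w[i] weights the feature i steps in the past."""
    k = len(w)
    padded = [0.0] * (k - 1) + list(x)   # fill length k-1: no future leakage
    out = []
    for t in range(len(x)):
        window = padded[t:t + k]         # original times t-(k-1) .. t
        out.append(sum(w[i] * window[k - 1 - i] for i in range(k)))
    return out
```

With w = [1, 0] the output reproduces the input, and with w = [0, 1] it is the input delayed by one step, which makes the causality of the filter easy to check.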
A second aspect of an embodiment of the present invention proposes:
a human skeletal motion recognition system, wherein the system comprises:
the acquisition module is used for acquiring a bone movement sequence generated in the human body movement process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone movement sequence, and dividing the bone movement sequence into a corresponding training set and a corresponding verification set;
the calculation module is used for carrying out fusion processing on the skeleton characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and extracting time characteristics in the skeleton movement sequence in real time through a preset causal convolution algorithm;
the fusion module is used for carrying out stacking processing on the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
the training module is used for respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set so as to generate a corresponding target motion recognition model, and recognizing the motion category corresponding to the skeleton motion sequence through the target motion recognition model.
Further, the acquisition module is specifically configured to:
capturing continuous limb movements generated by a human body in real time through an inertial sensor and a depth camera, and extracting corresponding initial bone movement sequences in real time in the continuous limb movements;
extracting a corresponding skeleton sequence from the initial skeleton motion sequence, and detecting the number of motion frames corresponding to the skeleton sequence in real time;
and adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence, wherein the skeleton sequence and the motion frame number both comprise specific numerical values.
Further, the acquisition module is specifically further configured to:
when the skeleton sequence is obtained in real time, detecting a plurality of joint points contained in the skeleton sequence in real time, and detecting three-dimensional Cartesian coordinates corresponding to each joint point in real time;
and adjusting the three-dimensional Cartesian coordinates of each articulation point to a preset position, and uniformly adjusting the motion frame number of the skeleton sequence to a preset value so as to correspondingly generate the skeleton motion sequence.
Further, the computing module is specifically configured to:
when the bone characteristics and the angle characteristics are obtained in real time, a mapping relation between the bone characteristics and the angle characteristics is constructed in real time, and the bone characteristics and the angle characteristics are integrated based on the mapping relation so as to generate corresponding characteristic frameworks;
the characteristic frameworks are sequentially input into a preset prior mixed graph convolution and a preset dynamic gating graph convolution, a distance graph corresponding to the characteristic frameworks is calculated through the preset prior mixed graph convolution, and a dynamic graph corresponding to the characteristic frameworks is calculated through the preset dynamic gating graph convolution;
and correspondingly generating the spatial features according to the distance map and the dynamic map.
Further, the computing module is specifically configured to:
randomly selecting a target node from skeleton spines in the characteristic skeleton, and setting the target node to a corresponding source node;
splitting the characteristic skeleton into corresponding rooted trees according to the source nodes, sequentially connecting joint points with the same distance in the rooted trees, and sequentially connecting joint points with different distances together to correspondingly generate the distance map, wherein the source nodes have uniqueness.
Further, the computing module is specifically configured to:
embedding the feature skeleton into low-dimensional features through a preset embedding function to generate a corresponding feature map, and generating a corresponding feature matrix according to the feature map;
and calculating a corresponding adjacent matrix according to the feature matrix, and carrying out feature extraction processing among all the nodes of the feature skeleton through the adjacent matrix so as to correspondingly generate the dynamic graph, wherein the adjacent matrix has uniqueness.
Further, the expression of the preset causal convolution algorithm is:

CausConv(S) = Σ_{i=0}^{k−1} w_i · x_{S−i}

wherein CausConv(S) represents the temporal feature, S−i indexes the historical spatial features, i represents the i-th spatial feature, k−1 represents the fill (padding) length, x represents the spatial feature, w represents the convolution weights, and k represents the size of the convolution kernel.
A third aspect of an embodiment of the present invention proposes:
a computer comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the human skeleton action recognition method described above when executing the computer program.
A fourth aspect of the embodiment of the present invention proposes:
a readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the human skeletal action recognition method as described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flowchart of a method for identifying human skeleton actions according to a first embodiment of the present invention;
fig. 2 is a block diagram of a human skeleton motion recognition system according to a sixth embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a human skeleton motion recognition method provided by a first embodiment of the present invention is shown. The method provided by this embodiment can recognize human motions through the constructed target motion recognition model with high accuracy and a wide recognition range, which correspondingly and greatly improves the user experience.
Specifically, the present embodiment provides:
the human skeleton motion recognition method includes the following steps:
step S10, acquiring a bone motion sequence generated in the human body motion process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone motion sequence, and dividing the bone motion sequence into a corresponding training set and a verification set;
step S20, fusion processing is carried out on the skeleton characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and time characteristics in the skeleton movement sequence are extracted in real time through a preset causal convolution algorithm;
step S30, stacking the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
and step S40, respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set to generate a corresponding target motion recognition model, and recognizing the motion category corresponding to the skeleton motion sequence through the target motion recognition model.
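The stacking order of steps S10-S40 can be sketched structurally as follows. Every layer here is a toy stand-in, an assumption for illustration only: the real graph convolution, causal convolution, global pooling, and fully connected layers are not reproduced; the sketch only shows how the spatio-temporal blocks, the pooling layer, and the classifier compose.

```python
def st_block(frames, spatial_fn, temporal_fn):
    """One spatio-temporal block: spatial aggregation per frame (graph
    convolution stand-in), then temporal filtering over the sequence
    (causal convolution stand-in)."""
    return temporal_fn([spatial_fn(f) for f in frames])

def build_model(blocks, pool_fn, fc_fn):
    """Stack the spatio-temporal blocks, then fuse global pooling and the
    fully connected classifier on top, as in step S30."""
    def model(frames):
        x = frames
        for spatial_fn, temporal_fn in blocks:
            x = st_block(x, spatial_fn, temporal_fn)
        return fc_fn(pool_fn(x))
    return model
```

For example, with a single block that doubles every value, a global mean pool, and an identity classifier, the model maps [[1, 2], [3, 4]] to 5.0.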
Specifically, in the present embodiment, it should first be noted that the human skeleton motion recognition method is implemented based on existing inertial sensors and depth cameras, which can directly capture the bone motion sequence generated during human motion so as to facilitate the subsequent recognition of human actions. Furthermore, in order to facilitate the subsequent training of a high-precision motion recognition model, the bone features and angle features corresponding to the current bone motion sequence need to be further extracted in real time, and at the same time the current bone motion sequence as a whole is divided into a corresponding training set and verification set to facilitate subsequent processing.
Further, in order to accurately identify the action category corresponding to the current bone motion sequence, the current bone features and angle features are automatically fused through the preset rule to generate the corresponding spatial features, while the temporal features corresponding to the current spatial features are calculated through the preset causal convolution algorithm. On this basis, the required spatio-temporal feature extraction module can be constructed. This module can preliminarily complete the recognition of human actions but is not accurate enough, so the preset global pooling layer and the preset fully connected layer are correspondingly fused into the current spatio-temporal feature extraction module to generate the required initial action recognition model. The prepared training set and verification set are then input into the current initial action recognition model for the corresponding iterative training: the parameters of the model are iteratively updated through gradient descent, and the learning rate is adjusted through an annealing method until training stops at a suitable round, finally generating the target action recognition model that can accurately recognize the actions of the human body.
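The training procedure alluded to above, gradient descent with an annealed learning rate, can be sketched minimally as follows. The cosine schedule and the toy scalar objective are assumptions: the patent names only "gradient descent" and "an annealing method", not a particular schedule.

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=0.1, lr_min=0.001):
    """Anneal the learning rate from lr_max down to lr_min over training."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

def train(grad_fn, w, total_steps=100):
    """Iteratively update parameter w by gradient descent, shrinking the
    step size with the annealing schedule until the final round."""
    for step in range(total_steps):
        w = w - cosine_annealed_lr(step, total_steps) * grad_fn(w)
    return w
```

On the toy objective (w − 3)², whose gradient is 2(w − 3), a hundred annealed steps from w = 0 land very close to the minimum at w = 3.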
Second embodiment
Further, the step of acquiring the bone motion sequence generated in the human body motion process in real time through the preset acquisition device comprises the following steps:
capturing continuous limb movements generated by a human body in real time through an inertial sensor and a depth camera, and extracting corresponding initial bone movement sequences in real time in the continuous limb movements;
extracting a corresponding skeleton sequence from the initial skeleton motion sequence, and detecting the number of motion frames corresponding to the skeleton sequence in real time;
and adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence, wherein the skeleton sequence and the motion frame number both comprise specific numerical values.
In particular, in the present embodiment, it should be noted that, in implementation, the inertial sensor and the depth camera capture the continuous limb motion generated by the human body in real time, and the required initial bone motion sequence is extracted in real time from the current continuous limb motion. The initial bone motion sequence may be expressed as J ∈ R^{T×V×C}, wherein V represents the number of joint points, C = [x, y, z] represents the three-dimensional Cartesian coordinates of a joint point, J represents the joint points, and T represents the number of frames in the sequence. The required skeleton sequence is then further extracted in real time from the current initial bone motion sequence, and the number of motion frames corresponding to the current skeleton sequence is detected in real time.
Furthermore, the skeleton sequence and the corresponding motion frame number are correspondingly adjusted, so that the required skeleton motion sequence can be further obtained, and the subsequent processing is facilitated.
Further, the step of adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence includes:
when the skeleton sequence is obtained in real time, detecting a plurality of joint points contained in the skeleton sequence in real time, and detecting three-dimensional Cartesian coordinates corresponding to each joint point in real time;
and adjusting the three-dimensional Cartesian coordinates of each articulation point to a preset position, and uniformly adjusting the motion frame number of the skeleton sequence to a preset value so as to correspondingly generate the skeleton motion sequence.
Specifically, in this embodiment, it should be further noted that, after the required skeleton sequence is obtained in real time through the above steps, the several joint points contained in the current skeleton sequence and the three-dimensional Cartesian coordinates corresponding to each joint point can be correspondingly detected. Further, in order to facilitate subsequent training and improve training efficiency, normalization preprocessing is required: the skeleton coordinates are all translated so as to be aligned with the middle joint point of the spine in the first frame, and the frame numbers of all skeleton sequences are set to the same value, that is, uniformly adjusted to the preset value, so that the bone motion sequence can be correspondingly generated to facilitate subsequent processing.
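This normalization preprocessing can be sketched as follows; the spine joint index and the nearest-frame resampling strategy are illustrative assumptions, since the embodiment specifies only translation alignment to the first frame's spine joint and a uniform preset frame count.

```python
def normalize_sequence(frames, spine_idx, target_len):
    """frames: list of frames, each a list of (x, y, z) joint tuples.
    Translate every joint by the first frame's spine joint, then resample
    the sequence to exactly target_len frames."""
    ox, oy, oz = frames[0][spine_idx]            # spine joint of frame 1
    translated = [[(x - ox, y - oy, z - oz) for (x, y, z) in f]
                  for f in frames]
    # uniformly resample so every sequence has the same preset frame count
    n = len(translated)
    resampled = [translated[min(int(i * n / target_len), n - 1)]
                 for i in range(target_len)]
    return resampled
```

After this step every sequence shares a common origin and length, which is what lets batches of skeleton sequences be fed to the model uniformly.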
Third embodiment
Further, the step of fusing the bone feature and the angle feature based on a preset rule to generate a corresponding spatial feature includes:
when the bone characteristics and the angle characteristics are obtained in real time, a mapping relation between the bone characteristics and the angle characteristics is constructed in real time, and the bone characteristics and the angle characteristics are integrated based on the mapping relation so as to generate corresponding characteristic frameworks;
the characteristic frameworks are sequentially input into a preset prior mixed graph convolution and a preset dynamic gating graph convolution, a distance graph corresponding to the characteristic frameworks is calculated through the preset prior mixed graph convolution, and a dynamic graph corresponding to the characteristic frameworks is calculated through the preset dynamic gating graph convolution;
and correspondingly generating the spatial features according to the distance map and the dynamic map.
In addition, in this embodiment, after the required skeleton feature and the required angle feature are obtained in real time through the above steps, in order to facilitate the fusion processing of the two, it is necessary to first detect the mapping relationship between the two, integrate the two according to the mapping relationship, and further generate the required feature skeleton, so as to facilitate the subsequent processing.
Furthermore, the lengths and angles within the feature skeleton carry rich information and play an important role in motion judgment. The invention therefore designs a multi-feature fusion module that computes and integrates different skeleton features as the input of the network, as follows:
1. Bone feature calculation. For the original coordinate features, a certain joint point j of the t-th frame is expressed as J_t^j, and the bone feature may be expressed as B_t^j = J_t^j − J_t^{n(j)}, where n(j) represents the joint point naturally connected to joint point j at time t.
2. Bone angle calculation. In the invention, bone angle information is divided into absolute and relative parts. For the relative part, the invention calculates the included angle between bones within a frame: for a certain joint point j of the t-th frame, expressed as J_t^j, the two joint points naturally connected to it are J_t^{j1} and J_t^{j2}; the vectors between them, v_1 = J_t^{j1} − J_t^j and v_2 = J_t^{j2} − J_t^j, are computed, and finally the included angle between the two vectors is calculated:

θ_t^j = arccos( (v_1 · v_2) / (‖v_1‖ ‖v_2‖) )
3. Absolute angle calculation. The absolute angle refers to the angle between the bone vector and the unit basis vectors of the coordinate axes, the basis vectors being denoted e_x, e_y, and e_z.
4. Fusion of joint, angle, and bone features. The four forms of skeletal features are each extracted using a 1 × 1 convolution and spliced along the channel dimension (denoted by ||), and the fused features are used as the input of the model, wherein S_angle represents the fused angle features and S_direction represents the bone vector. The required skeleton features can be obtained in this manner so as to facilitate subsequent processing.
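A minimal sketch of the bone and relative-angle computations above: the bone feature is the difference between a joint and its naturally connected joint, and the relative angle is the arccosine of the normalized dot product of two bone vectors meeting at a joint. The joint layout in the example is an assumption.

```python
import math

def bone_vector(joints, j, parent):
    """joints: list of (x, y, z); bone feature = joint j minus the joint
    naturally connected to it (B = J_j - J_n(j))."""
    return tuple(a - b for a, b in zip(joints[j], joints[parent]))

def bone_angle(v1, v2):
    """Included angle (radians) between two bone vectors:
    arccos(v1 . v2 / (|v1| |v2|))."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return math.acos(dot / (n1 * n2))
```

For two perpendicular bones meeting at the same joint, the computed included angle is π/2, which gives a quick sanity check of both functions.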
Further, the step of calculating the distance map corresponding to the feature skeleton through the convolution of the preset prior mixed map includes:
randomly selecting a target node from skeleton spines in the characteristic skeleton, and setting the target node to a corresponding source node;
splitting the characteristic skeleton into corresponding rooted trees according to the source nodes, sequentially connecting joint points with the same distance in the rooted trees, and sequentially connecting joint points with different distances together to correspondingly generate the distance map, wherein the source nodes have uniqueness.
In addition, in this embodiment, it should be further noted that, in order to accurately calculate the distance map corresponding to the above feature skeleton, the invention proposes a prior decoupling graph convolution for spatial modeling. It decouples spatial modeling into two parts: a prior hybrid graph convolution that uses the prior-information graph connection relations, and a dynamic gate graph convolution that computes graph connections from the feature information of the data samples. This simulates the two ways the human visual system judges actions: a "rough" judgment from the outline, and a judgment from specific "detail" features.
1. The prior mixture map is convolved.
The connections between bones can be expressed as a graph G = (V, E), where V = {v_1, v_2, v_3, …, v_V} is the set of all nodes and E represents the set of edges, whose connection relationship is expressed by an adjacency matrix A ∈ R^(V×V). A conventional graph convolution can be expressed as:

S_out = D̃^(-1/2) (A + I) D̃^(-1/2) S_in W

where S_in represents the features of the previous layer, D̃ is the degree matrix of (A + I), I is the identity matrix, and W represents the convolution coefficients. The graph convolution realizes feature aggregation among spatial nodes.
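The normalized graph convolution above can be sketched directly in NumPy. The shapes and variable names below are illustrative assumptions; the toy 3-node chain stands in for a real skeleton graph.

```python
import numpy as np

def graph_conv(S, A, W):
    """One conventional graph convolution layer.
    S: (V, C_in) node features, A: (V, V) adjacency, W: (C_in, C_out)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops (A + I)
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))          # symmetric degree normalization
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ S @ W  # aggregate neighbor features

# Toy 3-node chain skeleton: 0-1-2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
S = np.random.rand(3, 4)
out = graph_conv(S, A, np.random.rand(4, 8))
print(out.shape)  # (3, 8)
```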
Calculating the distance map: the design of the adjacency matrix is the key to graph convolution, and the original skeleton-graph connection relationship has difficulty reflecting feature information between distant nodes. To solve this problem, the invention designs a distance graph according to the prior distances between nodes, obtaining neighborhood-node information that connects with and complements the original graph and improving the efficiency of information transmission. First, a joint point on the spine is selected as the source node; the skeleton is then decomposed into a rooted tree according to each joint's distance from the source node, where different distances represent different semantic spaces. Joint points at the same distance are connected to form a distance subgraph, and subgraphs at different distances are connected through the joint points located on the spine to form the complete distance graph A_dist, which can be expressed as:

A_dist = E_1 ∪ E_2 ∪ … ∪ E_D ∪ E_spine

where E_k denotes the set of distance edges connecting joints at distance k from the source node, and E_spine connects the different distance edges through the spine joints. The distance graph is mixed with the original skeleton connection graph for extracting spatial features; the distance prior is an important supplement to the original prior.
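A minimal sketch of the distance-graph construction: BFS distances from the source node, then an adjacency matrix linking joints that share the same distance. The toy skeleton's joint indices and edges are hypothetical.

```python
import numpy as np
from collections import deque

def distance_graph(edges, num_joints, source):
    """Build the distance graph: compute BFS distances from the source node,
    then connect all joints that lie at the same distance."""
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [-1] * num_joints
    dist[source] = 0
    q = deque([source])
    while q:                                  # breadth-first search
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    A = np.zeros((num_joints, num_joints))
    for i in range(num_joints):
        for j in range(num_joints):
            if i != j and dist[i] == dist[j]:  # same-distance joints connected
                A[i, j] = 1.0
    return np.array(dist), A

# Toy skeleton: 0-1-2 spine chain with branches 1-3 and 1-4.
dist, A = distance_graph([(0, 1), (1, 2), (1, 3), (1, 4)], 5, source=0)
print(dist)              # [0 1 2 2 2]
print(A[2, 3], A[2, 4])  # 1.0 1.0 (joints at equal distance are linked)
```

Mixing this A with the original skeleton adjacency (e.g. summing the two matrices before normalization) would give the combined prior described in the text.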
Fourth embodiment
Further, the step of calculating the dynamic graph corresponding to the feature skeleton through the convolution of the preset dynamic gating graph includes:
embedding the feature skeleton into low-dimensional features through a preset embedding function to generate a corresponding feature map, and generating a corresponding feature matrix according to the feature map;
and calculating a corresponding adjacency matrix according to the feature matrix, and carrying out feature extraction processing among all the nodes of the feature skeleton through the adjacency matrix so as to correspondingly generate the dynamic graph, wherein the adjacency matrix is unique.
In this embodiment, it should be noted that, in order to accurately calculate the dynamic graph corresponding to the feature skeleton, a dynamic gating graph convolution algorithm is applied. The dynamic graph convolution does not include any prior human-body connection structure; graph generation is entirely data-driven. The topological relationship of the human-body joints is therefore flexibly constructed for each sample at each layer of the network, making effective use of the data features.
Calculating the dynamic graph: first, two different embedding functions θ(·) and φ(·) (implemented as a 1 x 1 convolution or a linear layer) embed the input features into low-dimensional features, producing two feature maps M_θ and M_φ. One feature map is then multiplied by the transpose of the other; the entries of the resulting matrix represent the similarity between joint features. The invention then provides a gating mechanism that uses a threshold to screen out joint pairs with too-low similarity, followed by softmax normalization to obtain the data-adaptive adjacency matrix A. The whole process is expressed as:

A = softmax(gate(M_θ · M_φ^T))
where θ(·) and φ(·) represent the two embedding functions and gate(·) represents the gating mechanism, one element of which can be expressed as:

gate(e_ij) = e_ij if e_ij > α, and 0 otherwise

where α is a threshold coefficient that can be trained together with the network, and e_ij represents the similarity between node i and node j. Through the gating mechanism, unimportant edges are screened out, improving the efficiency of information flow.
Extracting features through the dynamic graph: the input features S_in are first passed through a 1 x 1 convolution to extract features, obtaining S'. S' is then multiplied by the generated adaptive adjacency matrix A, which extracts features among the nodes of the skeleton spatial information, and a further 1 x 1 convolution gives the output:

S_out = relu(conv_1x1(A · conv_1x1(S_in))) + res(S_in)

where relu denotes the activation function and res denotes the residual connection.
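A simplified NumPy sketch of the dynamic gating graph convolution. The embedding dimensions, the threshold value, and the single output projection are illustrative assumptions; the patent applies 1 x 1 convolutions both before and after aggregation, which here collapse into plain matrix products.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_gate_graph_conv(S, W_theta, W_phi, W_out, alpha=0.0):
    """Data-driven graph convolution with a similarity gate.
    S: (V, C) node features; alpha: the (trainable) threshold coefficient."""
    M_theta = S @ W_theta                    # low-dimensional embedding 1
    M_phi = S @ W_phi                        # low-dimensional embedding 2
    E = M_theta @ M_phi.T                    # pairwise joint similarity (V, V)
    E = np.where(E > alpha, E, -np.inf)      # gate: drop low-similarity edges
    A = softmax(E, axis=-1)                  # data-adaptive adjacency matrix
    return np.maximum(A @ S @ W_out, 0.0)    # aggregate + projection + relu

V, C = 5, 8
rng = np.random.default_rng(0)
S = rng.random((V, C))
out = dynamic_gate_graph_conv(S, rng.random((C, 4)), rng.random((C, 4)),
                              rng.random((C, C)))
print(out.shape)  # (5, 8)
```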
In this embodiment, it should be noted that, in order to accurately and effectively train the required motion recognition model, the features produced by the different graph convolution kernels differ in importance. To guide the network toward the more important spatial features and to balance the relationship between prior information and data features, the invention designs an attention-guided graph feature fusion module.
For the features extracted by the different spatial convolution kernels, a four-dimensional representation S ∈ R^(H×C×T×V) may be used, where H refers to the number of different convolution kernels. The features obtained by the different kernels are first added directly to obtain S_sum ∈ R^(C×T×V).
Temporal max pooling then selects the frame with the maximum weight, giving S_t ∈ R^(C×V), and a spatial average pooling layer reduces this to S_p ∈ R^C.
Finally, a fully connected layer is utilized to make the selective weight more adaptive and reduce the feature dimension.
S_r = BN(σ(FC(S_p)))

where FC is a fully connected layer, σ is an activation function, BN is a batch normalization layer, and r is the dimension-reduction coefficient, so that S_r ∈ R^(C/r).
The feature is then upscaled using the same operation, and the softmax function normalizes the result to compute the attention coefficients of the different convolution kernels:

α_h = exp(z_h) / Σ_h' exp(z_h')

where z_h is the score for the h-th convolution kernel and α_h is its attention coefficient.
multiplying the attention coefficient and the original feature and adding to obtain the final attention fusion feature. Features obtained by different convolution kernels are multiplied by different attention coefficients to help the network focus on more important spatial features.
Fifth embodiment
Further, the expression of the preset causal convolution algorithm is:
CauConv(S)_t = Σ_{i=0}^{k-1} w_i · x_{t-i}

where CauConv(S) represents the temporal feature, x represents the spatial feature sequence, x_{t-i} denotes the i-th historical spatial feature before time t, w_i are the convolution kernel weights, k represents the size of the convolution kernel, and k-1 is the left padding length, ensuring that the output at time t depends only on current and past inputs.
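The causal convolution can be illustrated with a minimal single-channel NumPy sketch; the kernel values are hypothetical.

```python
import numpy as np

def causal_conv(x, w):
    """Causal 1-D convolution: output at time t depends only on x[t-k+1..t].
    x: (T,) input sequence, w: (k,) kernel.  Left-pads with k-1 zeros."""
    k = len(w)
    padded = np.concatenate([np.zeros(k - 1), x])   # fill length k-1
    return np.array([padded[t:t + k] @ w[::-1] for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = causal_conv(x, np.array([1.0, 1.0]))  # k=2: y[t] = x[t] + x[t-1]
print(y)  # [1. 3. 5. 7.]
```

Note the first output uses only the zero padding plus x[0], so no future frame ever leaks into the result.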
Multi-scale causal dilated temporal convolution
In the temporal dimension of the spatio-temporal block, the invention provides a multi-scale causal dilated convolution. To solve the future-information-leakage problem of one-dimensional temporal convolution, a causal convolution is introduced in the time dimension: it restricts the direction of information flow during convolution, ensuring that the output at the current moment is influenced only by past moments.
Multi-scale temporal features. In addition, a multi-scale causal dilated temporal convolution is proposed, combining different dilation coefficients to obtain different temporal receptive fields. Expressed by the formula:

S_out = concat_d(CauConv_{k,d}(S)) + res(S)

where k represents the size of the convolution kernel, d represents the dilation rate, res represents the residual connection, and concat(·) splices the features along the channel dimension.
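A toy sketch of the multi-scale branch: the same causal kernel is applied at several dilation rates and the branches are stacked (standing in for channel-wise concatenation), each with a residual connection. The shared kernel and the dilation set are illustrative assumptions.

```python
import numpy as np

def dilated_causal_conv(x, w, d):
    """Causal convolution with dilation rate d: receptive field (k-1)*d + 1."""
    k = len(w)
    pad = (k - 1) * d
    padded = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * padded[pad + t - j * d] for j in range(k))
                     for t in range(len(x))])

def multi_scale_causal(x, w, dilations=(1, 2, 4)):
    """Stack branches with different dilation rates (here along a new axis,
    standing in for channel concatenation), each with a residual connection."""
    return np.stack([dilated_causal_conv(x, w, d) + x for d in dilations])

x = np.arange(1.0, 7.0)                 # [1, 2, 3, 4, 5, 6]
out = multi_scale_causal(x, np.array([1.0, 1.0]))
print(out.shape)  # (3, 6)
print(out[0])     # d=1 branch: x[t] + x[t-1] + residual x[t]
```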
Mapping and classifying: the above process is repeated L times, forming L spatio-temporal feature extraction blocks; the global pooling layer and the fully connected layer are then connected to output the final action category.
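The classification head at the end of the stack can be sketched as global average pooling followed by a fully connected layer and softmax. The dimensions and weight initialization below are illustrative.

```python
import numpy as np

def classify(features, W_fc, b_fc):
    """Map the final spatio-temporal features to action-class probabilities.
    features: (C, T, V) output of the last spatio-temporal block."""
    pooled = features.mean(axis=(1, 2))       # global average pooling -> (C,)
    logits = pooled @ W_fc + b_fc             # fully connected layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # softmax class probabilities

C, T, V, num_classes = 8, 6, 5, 10
rng = np.random.default_rng(2)
probs = classify(rng.random((C, T, V)), rng.random((C, num_classes)),
                 np.zeros(num_classes))
print(probs.shape)  # (10,)
```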
Referring to fig. 2, a sixth embodiment of the present invention provides:
a human skeletal motion recognition system, wherein the system comprises:
the acquisition module is used for acquiring a bone movement sequence generated in the human body movement process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone movement sequence, and dividing the bone movement sequence into a corresponding training set and a corresponding verification set;
the calculation module is used for carrying out fusion processing on the skeleton characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and extracting time characteristics in the skeleton movement sequence in real time through a preset causal convolution algorithm;
the fusion module is used for carrying out stacking processing on the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
The training module is used for respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set so as to generate a corresponding target motion recognition model, and recognizing the motion category corresponding to the skeleton motion sequence through the target motion recognition model.
Further, the acquisition module is specifically configured to:
capturing continuous limb movements generated by a human body in real time through an inertial sensor and a depth camera, and extracting corresponding initial bone movement sequences in real time in the continuous limb movements;
extracting a corresponding skeleton sequence from the initial skeleton motion sequence, and detecting the number of motion frames corresponding to the skeleton sequence in real time;
and adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence, wherein the skeleton sequence and the motion frame number both comprise specific numerical values.
Further, the acquisition module is specifically further configured to:
when the skeleton sequence is obtained in real time, detecting a plurality of joint points contained in the skeleton sequence in real time, and detecting three-dimensional Cartesian coordinates corresponding to each joint point in real time;
And adjusting the three-dimensional Cartesian coordinates of each articulation point to a preset position, and uniformly adjusting the motion frame number of the skeleton sequence to a preset value so as to correspondingly generate the skeleton motion sequence.
Further, the computing module is specifically configured to:
when the bone characteristics and the angle characteristics are obtained in real time, a mapping relation between the bone characteristics and the angle characteristics is constructed in real time, and the bone characteristics and the angle characteristics are integrated based on the mapping relation so as to generate corresponding characteristic frameworks;
the characteristic frameworks are sequentially input into a preset prior mixed graph convolution and a preset dynamic gating graph convolution, a distance graph corresponding to the characteristic frameworks is calculated through the preset prior mixed graph convolution, and a dynamic graph corresponding to the characteristic frameworks is calculated through the preset dynamic gating graph convolution;
and correspondingly generating the spatial features according to the distance map and the dynamic map.
Further, the computing module is specifically configured to:
randomly selecting a target node from skeleton spines in the characteristic skeleton, and setting the target node to a corresponding source node;
Splitting the characteristic skeleton into corresponding rooted trees according to the source nodes, sequentially connecting joint points with the same distance in the rooted trees, and sequentially connecting joint points with different distances together to correspondingly generate the distance map, wherein the source nodes have uniqueness.
Further, the computing module is specifically configured to:
embedding the feature skeleton into low-dimensional features through a preset embedding function to generate a corresponding feature map, and generating a corresponding feature matrix according to the feature map;
and calculating a corresponding adjacency matrix according to the feature matrix, and carrying out feature extraction processing among all the nodes of the feature skeleton through the adjacency matrix so as to correspondingly generate the dynamic graph, wherein the adjacency matrix is unique.
Further, the expression of the preset causal convolution algorithm is:
CauConv(S)_t = Σ_{i=0}^{k-1} w_i · x_{t-i}

where CauConv(S) represents the temporal feature, x represents the spatial feature sequence, x_{t-i} denotes the i-th historical spatial feature before time t, w_i are the convolution kernel weights, k represents the size of the convolution kernel, and k-1 is the left padding length, ensuring that the output at time t depends only on current and past inputs.
A seventh embodiment of the present invention provides a computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the human skeletal action recognition method as described above when executing the computer program.
An eighth embodiment of the present invention provides a readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a human skeletal motion recognition method as described above.
In summary, the human skeleton motion recognition method and system provided by the embodiment of the invention can recognize the motion of the human body through the constructed target motion recognition model, and simultaneously has higher recognition accuracy and wide recognition range, thereby greatly improving the use experience of the user.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (7)

1. A method for identifying human skeletal motion, the method comprising:
Acquiring a bone motion sequence generated in the human body motion process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone motion sequence, and dividing the bone motion sequence into a corresponding training set and a corresponding verification set;
fusion processing is carried out on the bone characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and time characteristics in the bone motion sequence are extracted in real time through a preset causal convolution algorithm;
stacking the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set to generate a corresponding target motion recognition model, and recognizing a motion category corresponding to the skeleton motion sequence through the target motion recognition model;
the step of fusing the bone feature and the angle feature based on a preset rule to generate a corresponding spatial feature includes:
When the bone characteristics and the angle characteristics are obtained in real time, a mapping relation between the bone characteristics and the angle characteristics is constructed in real time, and the bone characteristics and the angle characteristics are integrated based on the mapping relation so as to generate corresponding characteristic frameworks;
the characteristic frameworks are sequentially input into a preset prior mixed graph convolution and a preset dynamic gating graph convolution, a distance graph corresponding to the characteristic frameworks is calculated through the preset prior mixed graph convolution, and a dynamic graph corresponding to the characteristic frameworks is calculated through the preset dynamic gating graph convolution;
correspondingly generating the space features according to the distance map and the dynamic map;
the step of calculating the distance map corresponding to the characteristic framework through the convolution of the preset prior mixed map comprises the following steps:
randomly selecting a target node from skeleton spines in the characteristic skeleton, and setting the target node to a corresponding source node;
splitting the characteristic skeleton into corresponding rooted trees according to the source nodes, sequentially connecting joint points with the same distance in the rooted trees, and sequentially connecting joint points with different distances together to correspondingly generate the distance map, wherein the source nodes have uniqueness;
The step of calculating the dynamic graph corresponding to the characteristic framework through the convolution of the preset dynamic gating graph comprises the following steps:
embedding the feature skeleton into low-dimensional features through a preset embedding function to generate a corresponding feature map, and generating a corresponding feature matrix according to the feature map;
and calculating a corresponding adjacency matrix according to the feature matrix, and carrying out feature extraction processing among all the nodes of the feature skeleton through the adjacency matrix so as to correspondingly generate the dynamic graph, wherein the adjacency matrix is unique.
2. The human skeletal motion recognition method of claim 1, wherein: the step of acquiring the skeleton movement sequence generated in the human body movement process in real time through the preset acquisition device comprises the following steps of:
capturing continuous limb movements generated by a human body in real time through an inertial sensor and a depth camera, and extracting corresponding initial bone movement sequences in real time in the continuous limb movements;
extracting a corresponding skeleton sequence from the initial skeleton motion sequence, and detecting the number of motion frames corresponding to the skeleton sequence in real time;
and adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence, wherein the skeleton sequence and the motion frame number both comprise specific numerical values.
3. The human skeletal motion recognition method of claim 2, wherein: the step of adjusting the skeleton sequence and the motion frame number to correspondingly generate the skeleton motion sequence comprises the following steps:
when the skeleton sequence is obtained in real time, detecting a plurality of joint points contained in the skeleton sequence in real time, and detecting three-dimensional Cartesian coordinates corresponding to each joint point in real time;
and adjusting the three-dimensional Cartesian coordinates of each articulation point to a preset position, and uniformly adjusting the motion frame number of the skeleton sequence to a preset value so as to correspondingly generate the skeleton motion sequence.
4. The human skeletal motion recognition method of claim 1, wherein: the expression of the preset causal convolution algorithm is as follows:
CauConv(S)_t = Σ_{i=0}^{k-1} w_i · x_{t-i}

where CauConv(S) represents the temporal feature, x represents the spatial feature sequence, x_{t-i} denotes the i-th historical spatial feature before time t, w_i are the convolution kernel weights, k represents the size of the convolution kernel, and k-1 is the left padding length, ensuring that the output at time t depends only on current and past inputs.
5. A human skeletal motion recognition system for implementing the human skeletal motion recognition method of any one of claims 1 to 4, the system comprising:
The acquisition module is used for acquiring a bone movement sequence generated in the human body movement process in real time through a preset acquisition device, calculating corresponding bone characteristics and angle characteristics according to the bone movement sequence, and dividing the bone movement sequence into a corresponding training set and a corresponding verification set;
the calculation module is used for carrying out fusion processing on the skeleton characteristics and the angle characteristics based on a preset rule so as to generate corresponding space characteristics, and extracting time characteristics in the skeleton movement sequence in real time through a preset causal convolution algorithm;
the fusion module is used for carrying out stacking processing on the preset graph convolution and the preset causal convolution to generate a corresponding space-time feature extraction module, and fusing a preset global pooling layer and a preset full-connection layer into the space-time feature extraction module to generate a corresponding initial action recognition model;
the training module is used for respectively carrying out iterative training on the initial motion recognition model through the training set and the verification set so as to generate a corresponding target motion recognition model, and recognizing the motion category corresponding to the skeleton motion sequence through the target motion recognition model.
6. A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the human skeletal action recognition method of any one of claims 1 to 4 when the computer program is executed.
7. A readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the human skeletal action recognition method of any one of claims 1 to 4.
CN202410258206.8A 2024-03-07 2024-03-07 Human skeleton action recognition method and system Active CN117854155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410258206.8A CN117854155B (en) 2024-03-07 2024-03-07 Human skeleton action recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410258206.8A CN117854155B (en) 2024-03-07 2024-03-07 Human skeleton action recognition method and system

Publications (2)

Publication Number Publication Date
CN117854155A true CN117854155A (en) 2024-04-09
CN117854155B CN117854155B (en) 2024-05-14

Family

ID=90543729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410258206.8A Active CN117854155B (en) 2024-03-07 2024-03-07 Human skeleton action recognition method and system

Country Status (1)

Country Link
CN (1) CN117854155B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536926A (en) * 2021-06-15 2021-10-22 杭州电子科技大学 Human body action recognition method based on distance vector and multi-angle self-adaptive network
WO2022000420A1 (en) * 2020-07-02 2022-01-06 浙江大学 Human body action recognition method, human body action recognition system, and device
CN114821640A (en) * 2022-04-12 2022-07-29 杭州电子科技大学 Skeleton action identification method based on multi-stream multi-scale expansion space-time diagram convolution network
CN116012950A (en) * 2023-02-15 2023-04-25 杭州电子科技大学信息工程学院 Skeleton action recognition method based on multi-heart space-time attention pattern convolution network
US20230196841A1 (en) * 2021-12-21 2023-06-22 Korea Electronics Technology Institute Behavior recognition artificial intelligence network system and method for efficient recognition of hand signals and gestures
CN116343334A (en) * 2023-03-27 2023-06-27 青岛科技大学 Motion recognition method of three-stream self-adaptive graph convolution model fused with joint capture
CN116524601A (en) * 2023-06-21 2023-08-01 深圳市金大智能创新科技有限公司 Self-adaptive multi-stage human behavior recognition model for assisting in monitoring of pension robot
CN116645721A (en) * 2023-04-26 2023-08-25 贵州大学 Sitting posture identification method and system based on deep learning
CN116665300A (en) * 2023-05-29 2023-08-29 杭州电子科技大学信息工程学院 Skeleton action recognition method based on space-time self-adaptive feature fusion graph convolution network
CN117475518A (en) * 2023-12-27 2024-01-30 华东交通大学 Synchronous human motion recognition and prediction method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIKUN ZHANG 等: "Graph Edge Convolutional Neural Networks for Skeleton Based Action Recognition", ARXIV, 16 May 2018 (2018-05-16) *
DONG An; SUN Pinjie: "Skeleton Behavior Recognition Based on Graph Convolution", Modern Computer (现代计算机), no. 02, 15 January 2020 (2020-01-15) *

Also Published As

Publication number Publication date
CN117854155B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
Liu et al. Attribute-aware face aging with wavelet-based generative adversarial networks
CN111476292B (en) Small sample element learning training method for medical image classification processing artificial intelligence
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN110084173B (en) Human head detection method and device
CN104834922B (en) Gesture identification method based on hybrid neural networks
CN112750140B (en) Information mining-based disguised target image segmentation method
CN112184752A (en) Video target tracking method based on pyramid convolution
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN105139004A (en) Face expression identification method based on video sequences
CN108875586B (en) Functional limb rehabilitation training detection method based on depth image and skeleton data multi-feature fusion
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN112905828B (en) Image retriever, database and retrieval method combining significant features
CN105956570A (en) Lip characteristic and deep learning based smiling face recognition method
Liu et al. Target recognition of sport athletes based on deep learning and convolutional neural network
CN111353385B (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
CN112906520A (en) Gesture coding-based action recognition method and device
Arif et al. Human pose estimation and object interaction for sports behaviour
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
CN108597589B (en) Model generation method, target detection method and medical imaging system
CN114792401A (en) Training method, device and equipment of behavior recognition model and storage medium
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN116758212A (en) 3D reconstruction method, device, equipment and medium based on self-adaptive denoising algorithm
CN117854155B (en) Human skeleton action recognition method and system
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant