CN112507940A - Skeleton action recognition method based on difference guidance representation learning network - Google Patents

Skeleton action recognition method based on difference guidance representation learning network

Info

Publication number
CN112507940A
Authority
CN
China
Prior art keywords
sequence
representation
differential
difference
original
Prior art date
Legal status
Granted
Application number
CN202011497126.6A
Other languages
Chinese (zh)
Other versions
CN112507940B (en)
Inventor
马千里 (Ma Qianli)
陈子鹏 (Chen Zipeng)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202011497126.6A
Publication of CN112507940A
Application granted
Publication of CN112507940B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton action recognition method based on a difference-guided representation learning network, which comprises the following steps: acquiring a skeleton action sequence; calculating difference values of the skeleton action sequence to obtain a differential sequence; inputting the differential sequence into a differential information module, where a long short-term memory (LSTM) network computes a differential feature representation; inputting the skeleton action sequence and the differential feature representation into an original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain an original sequence feature representation; concatenating the differential feature representation and the original sequence feature representation, extracting multi-scale features with a multi-scale convolutional neural network, and obtaining a pooled representation by max pooling; and inputting the pooled representation into a fully connected layer for classification. The invention models the differential information of the skeleton action sequence with an LSTM network to guide the representation learning of the skeleton action sequence, and extracts multi-scale features of the skeleton action sequence with a multi-scale convolutional neural network, thereby improving the accuracy of skeleton action recognition.

Description

Skeleton action recognition method based on difference guidance representation learning network
Technical Field
The invention relates to the technical field of skeleton action recognition, and in particular to a skeleton action recognition method based on a difference-guided representation learning network.
Background
As an important branch of computer vision, human action recognition has wide applications. Traditional research mainly recognizes actions from video recorded by two-dimensional cameras. However, human motion takes place, and is most naturally represented and recognized, in three-dimensional space. Therefore, action recognition methods based on three-dimensional human skeletons have attracted increasing attention in recent years and are widely used in scenarios such as human-computer interaction and virtual reality. In skeleton-based action recognition, the human body is represented by a three-dimensional skeleton, and an action is represented by the motion of the skeleton joints in three-dimensional space.
Action recognition based on human skeletons is generally treated as a time-series problem. Traditional approaches use machine learning methods such as k-means clustering, support vector machines and hidden Markov models to model the adjacent joint points of the human body and to recognize different actions. However, these methods cannot effectively model the complex temporal information and motion patterns of skeleton action sequences, which leads to poor recognition performance. Moreover, traditional methods have difficulty handling skeleton action sequences of different lengths.
With the development of deep learning, many neural-network-based methods have been applied to skeleton action recognition. Among them, recurrent neural networks and long short-term memory (LSTM) networks are common and effective, because their recurrent structure models the temporal dependencies of skeleton joint sequences well. However, existing skeleton action recognition methods do not fully model the differential information of the skeleton sequence, even though this information reflects the dynamic evolution of the sequence and plays an important role in its representation learning. Skeleton sequence segments with larger differential values imply a larger range of motion, which provides important cues for skeleton action recognition. It is therefore highly desirable to use differential information to guide the representation learning of the neural network so as to improve the accuracy of skeleton action recognition.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a skeleton action recognition method based on a difference-guided representation learning network.
The purpose of the invention can be achieved by adopting the following technical scheme:
A skeleton action recognition method based on a difference-guided representation learning network comprises the following steps:
S1, acquiring skeleton action sequence data and preprocessing the data;
S2, calculating difference values of the skeleton action sequence to obtain a differential sequence, inputting the differential sequence into a differential information module, computing a differential feature representation with a long short-term memory (LSTM) network, then inputting the skeleton action sequence and the differential feature representation into an original information module, and using the differential feature representation to guide the representation learning of the skeleton action sequence to obtain an original sequence feature representation;
S3, concatenating the differential feature representation and the original sequence feature representation, extracting multi-scale features of the skeleton action sequence with a multi-scale convolutional neural network, and obtaining a pooled representation by a max pooling operation;
S4, inputting the pooled representation into a fully connected layer for classification.
Further, the calculation process of the differential feature representation and the original sequence feature representation in step S2 is as follows:
S2.1, calculating difference values of the skeleton action sequence obtained in step S1 to obtain the differential sequence:
given the original skeleton action sequence $X = \{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where $T$ is the length of the skeleton action sequence, the corresponding differential sequence $\bar{X} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_t, \ldots, \bar{x}_T\}$ is calculated as:

$$\bar{x}_t = x_t - x_{t-1}$$

where $x_t$ is the input data at time step $t$ and $\bar{x}_t$ is the differential data at time step $t$;
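For illustration only (not part of the patent text), the differential sequence can be computed with a few array operations. The sketch below assumes the skeleton action sequence is stored as a NumPy array of shape (T, D) and, as one possible boundary convention, sets the first differential frame to zero; the function name is an assumption.

```python
import numpy as np

def differential_sequence(x: np.ndarray) -> np.ndarray:
    """Compute the differential sequence of a skeleton action sequence.

    x: array of shape (T, D), one flattened skeleton per time step.
    Returns an array of the same shape; the first row is zero by convention.
    """
    diff = np.zeros_like(x)
    diff[1:] = x[1:] - x[:-1]          # x_t - x_{t-1} for t >= 2
    return diff

# Toy example: a sequence with T = 5 time steps and 60 joint coordinates.
X = np.random.randn(5, 60).astype(np.float32)
X_bar = differential_sequence(X)
print(X_bar.shape)                      # (5, 60)
```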
s2.2, inputting the differential sequence into a differential information module, in the differential information module, calculating by using a long-short term memory network to obtain differential feature representation, and at a time step t, representing the differential feature representation
$\bar{h}_t$. The calculation formulas are as follows:

$$[\bar{i}_t, \bar{f}_t, \bar{o}_t, \bar{u}_t] = [\sigma, \sigma, \sigma, \tanh]\left(M[\bar{h}_{t-1}, \bar{x}_t]\right)$$
$$\bar{c}_t = \bar{f}_t \odot \bar{c}_{t-1} + \bar{i}_t \odot \bar{u}_t$$
$$\bar{h}_t = \bar{o}_t \odot \tanh(\bar{c}_t)$$

where $\bar{h}_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\bar{x}_t$ is the input data at time step $t$, $\bar{i}_t$, $\bar{f}_t$ and $\bar{o}_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $\bar{u}_t$ is the currently added information, $\bar{c}_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters; the differential feature representation models the differential information of the skeleton action sequence, reflects the dynamic change of skeleton actions, and contributes to skeleton action recognition;
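A minimal PyTorch sketch of such a differential information module is given below. It assumes the single affine map M produces all four gate pre-activations from the concatenation of $\bar{h}_{t-1}$ and $\bar{x}_t$; the class and parameter names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DifferentialInfoModule(nn.Module):
    """Runs an LSTM over the differential sequence to get differential features."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # M: one affine map producing input, forget, output gates and candidate u.
        self.M = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, x_bar: torch.Tensor) -> torch.Tensor:
        # x_bar: (batch, T, input_dim) differential sequence.
        batch, T, _ = x_bar.shape
        h = x_bar.new_zeros(batch, self.hidden_dim)
        c = x_bar.new_zeros(batch, self.hidden_dim)
        outputs = []
        for t in range(T):
            gates = self.M(torch.cat([h, x_bar[:, t]], dim=-1))
            i, f, o, u = gates.chunk(4, dim=-1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            u = torch.tanh(u)
            c = f * c + i * u                    # memory-cell update
            h = o * torch.tanh(c)                # differential feature at step t
            outputs.append(h)
        return torch.stack(outputs, dim=1)       # (batch, T, hidden_dim)
```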
S2.3, inputting the skeleton action sequence and the differential feature representation into an original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original sequence feature representation; at time step $t$, the original sequence feature representation $h_t$ is calculated with the long short-term memory network by the following formulas:
$$[i_t, f_t, o_t] = \sigma\left(M'[h_{t-1}, x_t, \bar{h}_t]\right)$$
$$u_t = \tanh(W_u[h_{t-1}, x_t] + b_u)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot u_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $u_t$ is the currently added information, $c_t$ is the memory-cell information, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters, and $b_u$ is a bias term;

through the differential information module and the original information module, the differential feature representation $\bar{H} = \{\bar{h}_1, \bar{h}_2, \ldots, \bar{h}_t, \ldots, \bar{h}_T\}$ and the original sequence feature representation $H = \{h_1, h_2, \ldots, h_t, \ldots, h_T\}$ are obtained, where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$.
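The following sketch shows one plausible realization of the guidance described above, under the assumption that the differential feature $\bar{h}_t$ enters the gate computation of the original-sequence LSTM while the candidate information $u_t$ depends only on $[h_{t-1}, x_t]$, mirroring the formulas; the names and structure are illustrative, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class OriginalInfoModule(nn.Module):
    """LSTM over the original sequence whose gates are guided by differential features."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # M': produces the three gates from [h_{t-1}, x_t, h_bar_t].
        self.M_prime = nn.Linear(input_dim + 2 * hidden_dim, 3 * hidden_dim)
        # W_u, b_u: candidate information computed from [h_{t-1}, x_t] only.
        self.W_u = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, x: torch.Tensor, h_bar: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, input_dim) original sequence; h_bar: (batch, T, hidden_dim).
        batch, T, _ = x.shape
        h = x.new_zeros(batch, self.hidden_dim)
        c = x.new_zeros(batch, self.hidden_dim)
        outputs = []
        for t in range(T):
            gates = self.M_prime(torch.cat([h, x[:, t], h_bar[:, t]], dim=-1))
            i, f, o = torch.sigmoid(gates).chunk(3, dim=-1)
            u = torch.tanh(self.W_u(torch.cat([h, x[:, t]], dim=-1)))
            c = f * c + i * u                     # memory-cell update
            h = o * torch.tanh(c)                 # original sequence feature at step t
            outputs.append(h)
        return torch.stack(outputs, dim=1)        # (batch, T, hidden_dim)
```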
Further, the process of extracting the multi-scale features of the action sequence by using the multi-scale convolutional neural network in step S3 and obtaining the pooled representation by using the maximum pooling operation is as follows:
s3.1, splicing the differential feature representation and the original sequence feature representation obtained in the step S2:
$$z_t = [\bar{h}_t, h_t]$$

where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$; the concatenation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_t, \ldots, z_T\}$, where $z_t$ is the sequence representation at time step $t$;

S3.2, extracting multi-scale features of the action sequence with a multi-scale convolutional neural network, which captures skeleton actions of different amplitudes and further improves the accuracy of skeleton action recognition. Let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernels of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of the convolution kernels; the convolution operation is expressed as:

$$g_t = f(F \ast z_{t:t+w-1} + b_g)$$

where $z_{t:t+w-1}$ denotes the slice of the sequence representation $Z$ covered by the window, $\ast$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is a bias term. The convolution kernels are applied to every position of the sequence, and zero padding is used to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input, where $T$ and $k$ respectively denote the length of the input sequence and the number of convolution kernels, and $G$ is the feature matrix obtained by convolution with windows of the same size;

with the multi-scale convolutional neural network, convolution operations are performed with windows of different sizes; assuming $r$ is the number of window sizes, $r$ convolution results are obtained and concatenated into the multi-scale feature matrix $\hat{G} \in \mathbb{R}^{T \times rk}$; a max pooling operation along the time dimension of the multi-scale feature matrix $\hat{G}$ yields the pooled representation $P \in \mathbb{R}^{rk}$.
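As a hedged illustration of step S3, the sketch below realizes the multi-scale convolution as parallel 1D temporal convolutions over the concatenated representation Z, with zero padding so each scale keeps length T, followed by max pooling over time. The default window sizes, the ReLU nonlinearity standing in for f, and all names are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleConvPool(nn.Module):
    """Multi-scale temporal convolution over Z followed by max pooling over time."""

    def __init__(self, in_dim: int, num_kernels: int, window_sizes=(2, 3, 5)):
        super().__init__()
        self.window_sizes = window_sizes
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, num_kernels, kernel_size=w) for w in window_sizes
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, T, in_dim) concatenated differential + original features.
        z = z.transpose(1, 2)                           # (batch, in_dim, T) for Conv1d
        features = []
        for w, conv in zip(self.window_sizes, self.convs):
            # Zero-pad so every scale keeps the input length T.
            padded = F.pad(z, ((w - 1) // 2, w // 2))
            features.append(torch.relu(conv(padded)))   # (batch, num_kernels, T)
        g_hat = torch.cat(features, dim=1)              # multi-scale feature matrix
        return g_hat.max(dim=2).values                  # pooled representation P
```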
Further, the classification process in step S4 is as follows:
the pooled expression P obtained in step S3 is input to the full link layer for classification, and the formula is as follows:
$$\hat{y} = \mathrm{softmax}(W_p P + b_p)$$

where $W_p$ is an affine transformation matrix composed of trainable parameters, $b_p$ is a bias term, softmax is a nonlinear activation function, and $\hat{y}$ is the predicted distribution;

minimizing the cross-entropy loss is used as the training objective, where the cross-entropy function of the two distributions $\ell(y, \hat{y})$ is expressed as:

$$\ell(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where $y$ is the true distribution, $y_i$ denotes the $i$-th dimension of $y$, and $\hat{y}_i$ denotes the $i$-th dimension of $\hat{y}$.
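For completeness, a small sketch of the classification head and the cross-entropy training objective. It uses the framework's combined softmax/cross-entropy loss, which is numerically equivalent to the formula above; the dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

num_classes = 27                       # e.g. UTD-MHAD has 27 action classes
pooled_dim = 3 * 96                    # r * k pooled features (illustrative)

classifier = nn.Linear(pooled_dim, num_classes)   # W_p, b_p
criterion = nn.CrossEntropyLoss()                 # softmax + cross-entropy in one op

P = torch.randn(8, pooled_dim)                    # a batch of pooled representations
labels = torch.randint(0, num_classes, (8,))      # ground-truth action indices

logits = classifier(P)                            # W_p P + b_p
loss = criterion(logits, labels)                  # cross-entropy training objective
probs = torch.softmax(logits, dim=-1)             # predicted distribution y_hat
loss.backward()
```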
Compared with the prior art, the invention has the following advantages and effects:
the invention uses the long-short term memory network to model the differential information of the skeleton action sequence, and uses the differential information to guide the representation learning of the skeleton action sequence. And moreover, the multi-scale convolutional neural network is used for extracting the multi-scale features of the bone action sequence, so that the accuracy of bone action identification is further improved.
Drawings
FIG. 1 is a detailed flowchart of a skeleton action recognition method based on a difference-guided representation learning network according to an embodiment of the present invention;
FIG. 2 is a network structure diagram of a skeleton action recognition method based on a difference-guided representation learning network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
The embodiment discloses a skeleton action recognition method based on a difference-guided representation learning network. As shown in Fig. 1, the method includes the following steps:
Step S1: acquire the skeleton action sequence data and preprocess the data. In this embodiment, the data come from the UTD-MHAD data set, which consists of human skeleton data collected in an indoor environment and contains 27 different actions. Each skeleton in the data set is composed of 20 joints, and each joint is represented by its three-dimensional coordinates. The joint coordinates of a skeleton are flattened into a 60-dimensional vector, and the skeletons of consecutive frames form a skeleton action sequence.
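A possible preprocessing sketch for this step is shown below. It assumes the UTD-MHAD skeleton files are MATLAB .mat files holding a (20, 3, T) joint array, commonly stored under the key 'd_skel'; the key, axis order and example path are assumptions and may need to be adapted to the actual file layout.

```python
import numpy as np
from scipy.io import loadmat

def load_skeleton_sequence(mat_path: str) -> np.ndarray:
    """Load one UTD-MHAD skeleton file and flatten each frame to a 60-d vector.

    Assumes the .mat file stores joints in a (20, 3, T) array named 'd_skel';
    adjust the key and axis order if the actual file layout differs.
    """
    skel = loadmat(mat_path)["d_skel"]          # (20 joints, 3 coords, T frames)
    skel = np.transpose(skel, (2, 0, 1))        # -> (T, 20, 3)
    return skel.reshape(skel.shape[0], -1)      # -> (T, 60) skeleton action sequence

# Example (hypothetical path and file name):
# X = load_skeleton_sequence("UTD-MHAD/Skeleton/a1_s1_t1_skeleton.mat")
# print(X.shape)   # (T, 60)
```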
Step S2: calculate difference values of the skeleton action sequence obtained in step S1 to obtain the differential sequence. The differential sequence is input into the differential information module, and the differential feature representation is computed with a long short-term memory network. Then the skeleton action sequence and the differential feature representation are input into the original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original sequence feature representation. The specific process is as follows:
and S2.1, calculating difference values of the skeleton action sequence obtained in step S1 to obtain the differential sequence. First, as shown in Fig. 2, a skeleton action sequence of 5 time steps $X = \{x_1, x_2, \ldots, x_5\}$ is given, whose corresponding human action is shooting, and the corresponding differential sequence
$\bar{X} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_5\}$ is calculated as:

$$\bar{x}_t = x_t - x_{t-1}$$

where $x_t$ is the input data at time step $t$ and $\bar{x}_t$ is the differential data at time step $t$.
And S2.2, as shown in the figure 2, inputting the differential sequence into a differential information module, and calculating by using a long-short term memory network to obtain a differential feature representation. At time step t, differential feature representation
$\bar{h}_t$ is calculated by the following formulas:

$$[\bar{i}_t, \bar{f}_t, \bar{o}_t, \bar{u}_t] = [\sigma, \sigma, \sigma, \tanh]\left(M[\bar{h}_{t-1}, \bar{x}_t]\right)$$
$$\bar{c}_t = \bar{f}_t \odot \bar{c}_{t-1} + \bar{i}_t \odot \bar{u}_t$$
$$\bar{h}_t = \bar{o}_t \odot \tanh(\bar{c}_t)$$

where $\bar{h}_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\bar{x}_t$ is the input data at time step $t$, $\bar{i}_t$, $\bar{f}_t$ and $\bar{o}_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $\bar{u}_t$ is the currently added information, $\bar{c}_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters.
The differential feature representation $\bar{h}_t$ models the differential information of the skeleton action sequence, reflects the dynamic change of the skeleton action, and facilitates human skeleton action recognition. As shown in Fig. 2, the segment with larger dynamic change corresponds to the shooting motion itself, while the segment with smaller dynamic change may correspond to the preparatory motion before shooting; modeling the differential information of the skeleton action sequence therefore helps to better recognize the "shooting" action.
And S2.3, as shown in Fig. 2, inputting the skeleton action sequence and the differential feature representation together into the original information module and computing the original sequence feature representation. In the original information module, the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original sequence feature representation. At time step $t$, the original sequence feature representation $h_t$ is calculated with the long short-term memory network by the following formulas:
$$[i_t, f_t, o_t] = \sigma\left(M'[h_{t-1}, x_t, \bar{h}_t]\right)$$
$$u_t = \tanh(W_u[h_{t-1}, x_t] + b_u)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot u_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $u_t$ is the currently added information, $c_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters, and $b_u$ is a bias term.

Through the differential information module and the original information module, the differential feature representation $\bar{H} = \{\bar{h}_1, \bar{h}_2, \ldots, \bar{h}_5\}$ and the original sequence feature representation $H = \{h_1, h_2, \ldots, h_5\}$ are obtained, where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$.
And S3, splicing the difference feature representation and the original sequence feature representation obtained in the step S2, extracting the multi-scale features of the action sequence by using a multi-scale convolution neural network, and obtaining a pooled representation by using a maximum pooling operation. The specific process is as follows:
s3.1, splicing the differential feature representation and the original sequence feature representation obtained in the step S2:
$$z_t = [\bar{h}_t, h_t]$$

where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$; the concatenation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_5\}$, where $z_t$ is the sequence representation at time step $t$.
And S3.2, extracting the multi-scale features of the action sequence with a multi-scale convolutional neural network, as shown in Fig. 2. Let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernels of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of the convolution kernels. In this embodiment, $w$ is selected from 2, 3 and 5, $n$ is set to 128, and $k$ is set to 96. The convolution operation is expressed as:
$$g_t = f(F \ast z_{t:t+w-1} + b_g)$$

where $z_{t:t+w-1}$ denotes the slice of the sequence representation $Z$ covered by the window, $\ast$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is a bias term. The convolution kernels are applied to every position of the sequence, and zero padding is used to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input; $G$ is the feature matrix obtained by convolution with windows of the same size.
Using a multi-scale convolution neural network, carrying out convolution operation by using 3 windows with different sizes, wherein the sizes of the windows are respectively 2, 3 and 5, obtaining results of the 3 convolution operations, and splicing to obtain a multi-scale characteristic matrix
$\hat{G} \in \mathbb{R}^{T \times 3k}$; a max pooling operation along the time dimension of the multi-scale feature matrix $\hat{G}$ then yields the pooled representation $P \in \mathbb{R}^{3k}$.
Step S4: the pooled representation $P$ obtained in step S3 is input to the fully connected layer for classification, with the formula:

$$\hat{y} = \mathrm{softmax}(W_p P + b_p)$$

where $W_p$ is an affine transformation matrix composed of trainable parameters, $b_p$ is a bias term, softmax is a nonlinear activation function, and $\hat{y}$ is the predicted distribution.

Minimizing the cross-entropy loss is used as the training objective, where the cross-entropy function of the two distributions $\ell(y, \hat{y})$ is expressed as:

$$\ell(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where $y$ is the true distribution, $y_i$ denotes the $i$-th dimension of $y$, and $\hat{y}_i$ denotes the $i$-th dimension of $\hat{y}$.
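To tie the embodiment together, the sketch below wires the modules from the earlier sketches (DifferentialInfoModule, OriginalInfoModule, MultiScaleConvPool) into one network using the hyperparameters stated above: 60-dimensional input, hidden size 128, 96 kernels per scale, window sizes 2, 3 and 5, and 27 classes. It is an illustration under those assumptions, not the patent's reference implementation, and requires the earlier class definitions to be in scope.

```python
import torch
import torch.nn as nn

class DifferenceGuidedNet(nn.Module):
    """End-to-end sketch: difference-guided representation learning network."""

    def __init__(self, input_dim=60, hidden_dim=128, num_kernels=96,
                 window_sizes=(2, 3, 5), num_classes=27):
        super().__init__()
        self.diff_module = DifferentialInfoModule(input_dim, hidden_dim)
        self.orig_module = OriginalInfoModule(input_dim, hidden_dim)
        self.conv_pool = MultiScaleConvPool(2 * hidden_dim, num_kernels, window_sizes)
        self.fc = nn.Linear(len(window_sizes) * num_kernels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, 60) skeleton action sequences.
        x_bar = torch.zeros_like(x)
        x_bar[:, 1:] = x[:, 1:] - x[:, :-1]       # differential sequence
        h_bar = self.diff_module(x_bar)           # differential feature representation
        h = self.orig_module(x, h_bar)            # difference-guided original features
        z = torch.cat([h_bar, h], dim=-1)         # concatenation at every time step
        p = self.conv_pool(z)                     # multi-scale conv + max pooling
        return self.fc(p)                         # class logits

# Toy forward/backward pass on random data (batch of 4, T = 5 frames).
model = DifferenceGuidedNet()
x = torch.randn(4, 5, 60)
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 27, (4,)))
loss.backward()
print(logits.shape)   # torch.Size([4, 27])
```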
In summary, this embodiment uses a long short-term memory network to learn the differential information of the skeleton action sequence, uses this differential information to guide the representation learning of the original sequence, and uses a multi-scale convolutional neural network to extract multi-scale features of the skeleton action sequence, thereby improving the accuracy of skeleton action recognition. Compared with traditional methods, the method fully models the differential information of the skeleton action sequence and is more sensitive to changes in skeleton actions, which helps a machine recognize human actions more accurately and serves scenarios such as human-computer interaction and virtual reality.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (4)

1. A skeleton action recognition method based on a difference-guided representation learning network, characterized by comprising the following steps:
S1, acquiring skeleton action sequence data and preprocessing the data;
S2, calculating difference values of the skeleton action sequence to obtain a differential sequence, inputting the differential sequence into a differential information module, computing a differential feature representation with a long short-term memory (LSTM) network, then inputting the skeleton action sequence and the differential feature representation into an original information module, and using the differential feature representation to guide the representation learning of the skeleton action sequence to obtain an original sequence feature representation;
S3, concatenating the differential feature representation and the original sequence feature representation, extracting multi-scale features of the skeleton action sequence with a multi-scale convolutional neural network, and obtaining a pooled representation by a max pooling operation;
S4, inputting the pooled representation into a fully connected layer for classification.
2. The skeleton action recognition method based on a difference-guided representation learning network as claimed in claim 1, wherein the calculation process of the differential feature representation and the original sequence feature representation in step S2 is as follows:
S2.1, calculating difference values of the skeleton action sequence obtained in step S1 to obtain the differential sequence:
given the original skeleton action sequence $X = \{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where $T$ is the length of the skeleton action sequence, the corresponding differential sequence $\bar{X} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_t, \ldots, \bar{x}_T\}$ is calculated as:

$$\bar{x}_t = x_t - x_{t-1}$$

where $x_t$ is the input data at time step $t$ and $\bar{x}_t$ is the differential data at time step $t$;

S2.2, inputting the differential sequence into the differential information module, where a long short-term memory (LSTM) network computes the differential feature representation; at time step $t$, the differential feature representation $\bar{h}_t$ is calculated as:

$$[\bar{i}_t, \bar{f}_t, \bar{o}_t, \bar{u}_t] = [\sigma, \sigma, \sigma, \tanh]\left(M[\bar{h}_{t-1}, \bar{x}_t]\right)$$
$$\bar{c}_t = \bar{f}_t \odot \bar{c}_{t-1} + \bar{i}_t \odot \bar{u}_t$$
$$\bar{h}_t = \bar{o}_t \odot \tanh(\bar{c}_t)$$

where $\bar{h}_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\bar{x}_t$ is the input data at time step $t$, $\bar{i}_t$, $\bar{f}_t$ and $\bar{o}_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $\bar{u}_t$ is the currently added information, $\bar{c}_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters;

S2.3, inputting the skeleton action sequence and the differential feature representation into the original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original sequence feature representation; at time step $t$, the original sequence feature representation $h_t$ is calculated with the long short-term memory network as:

$$[i_t, f_t, o_t] = \sigma\left(M'[h_{t-1}, x_t, \bar{h}_t]\right)$$
$$u_t = \tanh(W_u[h_{t-1}, x_t] + b_u)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot u_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate of the long short-term memory network, $u_t$ is the currently added information, $c_t$ is the memory-cell information, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters, and $b_u$ is a bias term;

through the differential information module and the original information module, the differential feature representation $\bar{H} = \{\bar{h}_1, \bar{h}_2, \ldots, \bar{h}_t, \ldots, \bar{h}_T\}$ and the original sequence feature representation $H = \{h_1, h_2, \ldots, h_t, \ldots, h_T\}$ are obtained, where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$.
3. The skeleton action recognition method based on a difference-guided representation learning network as claimed in claim 1, wherein the process in step S3 of extracting multi-scale features of the action sequence with a multi-scale convolutional neural network and obtaining the pooled representation by the max pooling operation is as follows:
S3.1, concatenating the differential feature representation and the original sequence feature representation obtained in step S2:

$$z_t = [\bar{h}_t, h_t]$$

where $\bar{h}_t$ is the differential feature representation at time step $t$ and $h_t$ is the original sequence feature representation at time step $t$; the concatenation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_t, \ldots, z_T\}$, where $z_t$ is the sequence representation at time step $t$;

S3.2, extracting multi-scale features of the action sequence with the multi-scale convolutional neural network; let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernels of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of the convolution kernels, and the convolution operation is expressed as:

$$g_t = f(F \ast z_{t:t+w-1} + b_g)$$

where $z_{t:t+w-1}$ denotes the slice of the sequence representation $Z$ covered by the window, $\ast$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is a bias term; the convolution kernels are applied to every position of the sequence, and zero padding is used to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input, where $T$ and $k$ respectively denote the length of the input sequence and the number of convolution kernels, and $G$ is the feature matrix obtained by convolution with windows of the same size;

with the multi-scale convolutional neural network, convolution operations are performed with windows of different sizes; assuming $r$ is the number of window sizes, $r$ convolution results are obtained and concatenated into the multi-scale feature matrix $\hat{G} \in \mathbb{R}^{T \times rk}$; a max pooling operation along the time dimension of the multi-scale feature matrix $\hat{G}$ yields the pooled representation $P \in \mathbb{R}^{rk}$.
4. The skeleton action recognition method based on a difference-guided representation learning network as claimed in claim 1, wherein the classification process in step S4 is as follows:
the pooled representation $P$ obtained in step S3 is input to the fully connected layer for classification, with the formula:

$$\hat{y} = \mathrm{softmax}(W_p P + b_p)$$

where $W_p$ is an affine transformation matrix composed of trainable parameters, $b_p$ is a bias term, softmax is a nonlinear activation function, and $\hat{y}$ is the predicted distribution;

minimizing the cross-entropy loss is used as the training objective, where the cross-entropy function of the two distributions $\ell(y, \hat{y})$ is expressed as:

$$\ell(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where $y$ is the true distribution, $y_i$ denotes the $i$-th dimension of $y$, and $\hat{y}_i$ denotes the $i$-th dimension of $\hat{y}$.
CN202011497126.6A 2020-12-17 2020-12-17 Bone action recognition method based on differential guidance representation learning network Active CN112507940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011497126.6A CN112507940B (en) 2020-12-17 2020-12-17 Bone action recognition method based on differential guidance representation learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011497126.6A CN112507940B (en) 2020-12-17 2020-12-17 Bone action recognition method based on differential guidance representation learning network

Publications (2)

Publication Number Publication Date
CN112507940A true CN112507940A (en) 2021-03-16
CN112507940B CN112507940B (en) 2023-08-25

Family

ID=74922265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011497126.6A Active CN112507940B (en) 2020-12-17 2020-12-17 Bone action recognition method based on differential guidance representation learning network

Country Status (1)

Country Link
CN (1) CN112507940B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228109A (en) * 2016-07-08 2016-12-14 天津大学 A kind of action identification method based on skeleton motion track
CN111339942A (en) * 2020-02-26 2020-06-26 山东大学 Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN111310707A (en) * 2020-02-28 2020-06-19 山东大学 Skeleton-based method and system for recognizing attention network actions
CN111709323A (en) * 2020-05-29 2020-09-25 重庆大学 Gesture recognition method based on lie group and long-and-short term memory network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661919A (en) * 2022-09-26 2023-01-31 珠海视熙科技有限公司 Repeated action cycle statistical method and device, fitness equipment and storage medium
CN115661919B (en) * 2022-09-26 2023-08-29 珠海视熙科技有限公司 Repeated action period statistics method and device, body-building equipment and storage medium

Also Published As

Publication number Publication date
CN112507940B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Lim et al. Isolated sign language recognition using convolutional neural network hand modelling and hand energy image
CN110222580B (en) Human hand three-dimensional attitude estimation method and device based on three-dimensional point cloud
CN112184752A (en) Video target tracking method based on pyramid convolution
Xin et al. Arch: Adaptive recurrent-convolutional hybrid networks for long-term action recognition
CN108182260B (en) Multivariate time sequence classification method based on semantic selection
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN111222486B (en) Training method, device and equipment for hand gesture recognition model and storage medium
Sincan et al. Using motion history images with 3d convolutional networks in isolated sign language recognition
CN106599810B (en) A kind of head pose estimation method encoded certainly based on stack
CN113963445A (en) Pedestrian falling action recognition method and device based on attitude estimation
CN111028319A (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN114419732A (en) HRNet human body posture identification method based on attention mechanism optimization
CN115761905A (en) Diver action identification method based on skeleton joint points
CN111429481A (en) Target tracking method, device and terminal based on adaptive expression
Sun et al. Two-stage deep regression enhanced depth estimation from a single RGB image
CN113902989A (en) Live scene detection method, storage medium and electronic device
CN112507940B (en) Bone action recognition method based on differential guidance representation learning network
Liang et al. Egocentric hand pose estimation and distance recovery in a single RGB image
Ikram et al. Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture
Sun et al. Human action recognition using a convolutional neural network based on skeleton heatmaps from two-stage pose estimation
CN111368637A (en) Multi-mask convolution neural network-based object recognition method for transfer robot
Usman et al. Skeleton-based motion prediction: A survey
CN115830707A (en) Multi-view human behavior identification method based on hypergraph learning
CN111914751B (en) Image crowd density identification detection method and system
CN114581485A (en) Target tracking method based on language modeling pattern twin network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant