CN112507940A - Skeleton action recognition method based on difference guidance representation learning network - Google Patents
- Publication number
- CN112507940A (application CN202011497126.6A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- representation
- differential
- difference
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a skeleton action recognition method based on a difference-guided representation learning network, which comprises the following steps: acquire a skeleton action sequence; compute the difference values of the skeleton action sequence to obtain a difference sequence; feed the difference sequence into a differential information module, where a long short-term memory (LSTM) network produces the differential feature representation; feed the skeleton action sequence together with the differential feature representation into an original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to yield the original-sequence feature representation; splice the differential feature representation and the original-sequence feature representation, extract multi-scale features with a multi-scale convolutional neural network, and obtain a pooled representation by max pooling; finally, input the pooled representation to a fully connected layer for classification. The invention uses an LSTM network to model the difference information of the skeleton action sequence so as to guide the representation learning of the sequence, and uses a multi-scale convolutional neural network to extract multi-scale features, thereby improving the accuracy of skeleton action recognition.
Description
Technical Field
The invention relates to the technical field of skeleton action recognition, and in particular to a skeleton action recognition method based on a difference-guided representation learning network.
Background
As an important branch of computer vision, human action recognition has wide application. Traditional research has primarily recognized actions from video recorded by two-dimensional cameras. However, human motion actually takes place in three-dimensional space. Therefore, in recent years, action recognition methods based on three-dimensional human skeletons have attracted attention and are widely used in scenarios such as human-computer interaction and virtual reality. In the skeleton-based action recognition problem, the human body is represented by a three-dimensional skeleton, and its motion is represented by the movement of skeletal joints in three-dimensional space.
Skeleton-based action recognition is generally treated as a time-series problem. Traditional approaches apply machine-learning methods such as k-means clustering, support vector machines, and hidden Markov models to model the joint points of the human body and distinguish different actions. However, these methods cannot effectively model the complex temporal information and motion patterns of skeleton action sequences, which leads to poor recognition performance, and they also struggle to handle skeleton action sequences of different lengths.
With the development of deep learning, many neural-network-based methods have been applied to skeleton action recognition. Among them, recurrent neural networks and long short-term memory (LSTM) networks are common and effective, because their recurrent structure models the temporal dependencies of joint sequences well. However, existing skeleton action recognition methods do not fully model the difference information of the skeleton sequence, even though this information reflects the dynamic evolution of the sequence and plays an important role in its representation learning. Skeleton sequence segments with larger difference values imply a larger range of motion, which provides important cues for action recognition. It is therefore highly desirable to use difference information to guide the representation learning of the neural network and thereby improve recognition accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a skeleton action recognition method based on a difference-guided representation learning network.
The purpose of the invention can be achieved by adopting the following technical scheme:
A skeleton action recognition method based on a difference-guided representation learning network comprises the following steps:
s1, acquiring bone motion sequence data, and preprocessing the data;
S2, compute the difference values of the skeleton action sequence to obtain a difference sequence; input the difference sequence into a differential information module, where a long short-term memory network computes the differential feature representation; then input the skeleton action sequence together with the differential feature representation into an original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original-sequence feature representation;
s3, splicing the differential feature representation and the original sequence feature representation, extracting the multi-scale features of the skeleton action sequence by using a multi-scale convolution neural network, and obtaining a pooling representation by using a maximum pooling operation;
and S4, inputting the pooled representation into a full connection layer for classification.
Further, the calculation process of the differential feature representation and the original sequence feature representation in step S2 is as follows:
s2.1, calculating the difference value of the bone motion sequence obtained in the step S1 to obtain a difference sequence:
Given the original skeleton action sequence $X = \{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where $T$ is the length of the skeleton action sequence, the corresponding difference sequence $\Delta X$ is calculated as:

$$\Delta x_t = x_{t+1} - x_t$$
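The difference step above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the patent's implementation; frames are assumed to be row vectors of joint coordinates:

```python
import numpy as np

def difference_sequence(X):
    """Forward difference of a skeleton sequence X of shape (T, d):
    delta_x[t] = x[t+1] - x[t], yielding T-1 difference frames."""
    X = np.asarray(X, dtype=float)
    return X[1:] - X[:-1]
```

Each row of the result captures the per-frame motion of all joints, which is what the differential information module consumes.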
S2.2, the difference sequence is input into the differential information module, where a long short-term memory network computes the differential feature representation. At time step $t$, the differential feature representation $\tilde h_t$ is calculated as:

$$
\begin{aligned}
\tilde i_t &= \sigma\big(M_i[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde f_t = \sigma\big(M_f[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde o_t = \sigma\big(M_o[\tilde h_{t-1}, \Delta x_t]\big), \\
\tilde u_t &= \tanh\big(M_u[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde c_t = \tilde f_t \odot \tilde c_{t-1} + \tilde i_t \odot \tilde u_t, \qquad
\tilde h_t = \tilde o_t \odot \tanh(\tilde c_t)
\end{aligned}
$$

where $\tilde h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\Delta x_t$ is the input data at time step $t$, $\tilde i_t$, $\tilde f_t$ and $\tilde o_t$ are respectively the input gate, forget gate and output gate, $\tilde u_t$ is the currently added information, $\tilde c_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ is element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters (with $M_i$, $M_f$, $M_o$, $M_u$ denoting its row blocks). The differential feature representation models the difference information of the skeleton action sequence, reflects the dynamic change of the skeleton action, and contributes to skeleton action recognition;
S2.3, the skeleton action sequence and the differential feature representation are input into the original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original-sequence feature representation. At time step $t$, the original-sequence feature representation $h_t$ is calculated with a long short-term memory network as:

$$
\begin{aligned}
i_t &= \sigma\big(M'_i[h_{t-1}, x_t, \tilde h_t]\big), \qquad
f_t = \sigma\big(M'_f[h_{t-1}, x_t, \tilde h_t]\big), \qquad
o_t = \sigma\big(M'_o[h_{t-1}, x_t, \tilde h_t]\big), \\
u_t &= \tanh\big(W_u[h_{t-1}, x_t] + b_u\big), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot u_t, \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$

where $h_{t-1}$ is the long short-term memory network hidden-layer output at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate (into which the differential feature representation $\tilde h_t$ is fed, realizing the guidance), $u_t$ is the currently added information, $c_t$ is the memory-cell information, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters ($M'_i$, $M'_f$, $M'_o$ denote the row blocks of $M'$), and $b_u$ is a bias term;
Through the differential information module and the original information module, the differential feature representation $\tilde H = \{\tilde h_1, \tilde h_2, \ldots, \tilde h_t, \ldots, \tilde h_T\}$ and the original-sequence feature representation $H = \{h_1, h_2, \ldots, h_t, \ldots, h_T\}$ are obtained, where $\tilde h_t$ is the differential feature representation at time step $t$ and $h_t$ is the original-sequence feature representation at time step $t$.
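The per-step LSTM computation above can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the layout of the affine matrix (four stacked gate blocks with a trailing bias column) is our assumption, and the guided original-information module can be mimicked by appending the differential feature to the input at each step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_features(seq, M):
    """Run a single-layer LSTM over `seq` of shape (T, d) and return the
    hidden states of shape (T, n).  M has shape (4n, n + d + 1); its four
    row blocks play the roles of the input, forget, output, and candidate
    transforms, and the trailing column acts as the bias (an assumption)."""
    n = M.shape[0] // 4
    h = np.zeros(n)
    c = np.zeros(n)
    out = []
    for x in seq:
        z = M @ np.concatenate([h, x, [1.0]])  # single affine transform
        i = sigmoid(z[:n])                     # input gate
        f = sigmoid(z[n:2 * n])                # forget gate
        o = sigmoid(z[2 * n:3 * n])            # output gate
        u = np.tanh(z[3 * n:])                 # currently added information
        c = f * c + i * u                      # memory-cell update
        h = o * np.tanh(c)                     # feature representation at step t
        out.append(h)
    return np.stack(out)
```

Running it over the difference sequence yields the differential feature representations; running a second instance over the original frames (optionally with each differential feature concatenated to the input) sketches the guided original-information module.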
Further, the process of extracting the multi-scale features of the action sequence by using the multi-scale convolutional neural network in step S3 and obtaining the pooled representation by using the maximum pooling operation is as follows:
s3.1, splicing the differential feature representation and the original sequence feature representation obtained in the step S2:
$$z_t = [\tilde h_t, h_t]$$

where $\tilde h_t$ is the differential feature representation at time step $t$ and $h_t$ is the original-sequence feature representation at time step $t$. The splicing operation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_T\}$, where $z_t$ is the sequence representation at time step $t$;
S3.2, multi-scale features of the action sequence are extracted with a multi-scale convolutional neural network, which captures skeleton actions of different amplitudes and further improves the accuracy of skeleton action recognition. Let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernel of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of convolution kernels; the convolution operation is expressed as:

$$g_t = f\big(F * z_{t:t+w-1} + b_g\big)$$

where $z_{t:t+w-1}$ refers to the sequence representation within a window of width $w$ starting at time step $t$, $*$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is a bias term. The convolution kernel is applied to each position of the sequence, using zero padding, to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input, where $T$ and $k$ respectively denote the input sequence length and the number of convolution kernels, and $G$ is the feature matrix obtained by convolving with windows of the same size;
Using the multi-scale convolutional neural network, convolution is performed with windows of different sizes. Assuming $r$ is the number of window sizes, the $r$ convolution results are spliced into a multi-scale feature matrix $G^{ms} \in \mathbb{R}^{T \times rk}$, and max pooling over the time dimension of $G^{ms}$ yields the pooled representation $P \in \mathbb{R}^{rk}$.
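The splicing, multi-scale convolution, and max-pooling steps can be sketched as follows. This is an illustrative NumPy version; the trailing zero-padding scheme and the choice of tanh for the nonlinearity $f$ are our assumptions:

```python
import numpy as np

def multiscale_pool(Z, kernels):
    """Z: (T, m) spliced sequence representation.  `kernels` is a list of
    (w, m, k) weight tensors, one per window size w.  Each kernel is slid
    over the time axis with zero padding so every output keeps length T;
    the r results are spliced and max-pooled over time, giving (r*k,)."""
    T, m = Z.shape
    feats = []
    for F in kernels:
        w, _, k = F.shape
        pad = np.vstack([Z, np.zeros((w - 1, m))])  # zero padding at the end
        G = np.stack([np.tensordot(pad[t:t + w], F, axes=([0, 1], [0, 1]))
                      for t in range(T)])           # feature matrix (T, k)
        feats.append(np.tanh(G))                    # nonlinearity f
    G_ms = np.concatenate(feats, axis=1)            # multi-scale matrix (T, r*k)
    return G_ms.max(axis=0)                         # max pooling over time
```

Each window size contributes one feature matrix; concatenating them mixes motion patterns of different temporal extents before the pooling collapses the time dimension.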
Further, the classification process in step S4 is as follows:
The pooled representation $P$ obtained in step S3 is input to the fully connected layer for classification:

$$\hat y = \mathrm{softmax}(W_p P + b_p)$$

where $W_p$ is an affine transformation matrix composed of trainable parameters, $b_p$ is a bias term, $\mathrm{softmax}$ is a nonlinear activation function, and $\hat y$ is the predicted distribution;
Minimizing the cross-entropy loss $H(y, \hat y)$ is taken as the training target, where the cross-entropy of two distributions is:

$$H(y, \hat y) = -\sum_i y_i \log \hat y_i$$

where $y$ is the true distribution, $y_i$ refers to the $i$-th dimension of $y$, and $\hat y_i$ refers to the $i$-th dimension of $\hat y$.
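The classification head reduces to a softmax over an affine map plus a cross-entropy loss. A minimal NumPy sketch (the small epsilon guarding against log(0) is our addition, not part of the patent's formula):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(P, Wp, bp):
    """Fully connected layer followed by softmax: y_hat = softmax(Wp @ P + bp)."""
    return softmax(Wp @ P + bp)

def cross_entropy(y, y_hat, eps=1e-12):
    """H(y, y_hat) = -sum_i y_i * log(y_hat_i)."""
    return -np.sum(y * np.log(y_hat + eps))
```

During training, the gradient of this loss with respect to the scores is simply `y_hat - y`, which is why softmax and cross-entropy are conventionally paired.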
Compared with the prior art, the invention has the following advantages and effects:
the invention uses the long-short term memory network to model the differential information of the skeleton action sequence, and uses the differential information to guide the representation learning of the skeleton action sequence. And moreover, the multi-scale convolutional neural network is used for extracting the multi-scale features of the bone action sequence, so that the accuracy of bone action identification is further improved.
Drawings
FIG. 1 is a detailed flowchart of a method for recognizing skeletal actions based on a difference guide representation learning network according to an embodiment of the present invention;
fig. 2 is a network structure diagram of a method for recognizing skeletal actions based on a difference-oriented representation learning network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
The embodiment discloses a bone motion recognition method based on a difference guide representation learning network, as shown in fig. 1, the bone motion recognition method includes the following steps:
and step S1, acquiring the bone motion sequence data and preprocessing the data. In practice, the data used is derived from the "UTD-MHAD" data set. The data set is human skeletal data collected in an indoor environment and contains 27 different actions. Each bone of the data set is composed of 20 joints, each joint being represented using three-dimensional coordinates. The coordinates of human bones are tiled into a 60-dimensional vector, and a plurality of bones in continuous time are represented as a bone motion sequence.
Step S2, compute the difference values of the skeleton action sequence obtained in step S1 to obtain the difference sequence. The difference sequence is input into the differential information module, where a long short-term memory network computes the differential feature representation. Then, the skeleton action sequence and the differential feature representation are input into the original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original-sequence feature representation. The specific process is as follows:
S2.1, compute the difference values of the skeleton action sequence obtained in step S1 to obtain the difference sequence. First, as shown in fig. 2, given a 5-time-step skeleton action sequence $X = \{x_1, x_2, \ldots, x_5\}$ whose corresponding human action is shooting a basketball, the corresponding difference sequence $\Delta X$ is calculated as:

$$\Delta x_t = x_{t+1} - x_t$$
S2.2, as shown in fig. 2, the difference sequence is input into the differential information module, where a long short-term memory network computes the differential feature representation. At time step $t$, the differential feature representation $\tilde h_t$ is calculated as:

$$
\begin{aligned}
\tilde i_t &= \sigma\big(M_i[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde f_t = \sigma\big(M_f[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde o_t = \sigma\big(M_o[\tilde h_{t-1}, \Delta x_t]\big), \\
\tilde u_t &= \tanh\big(M_u[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde c_t = \tilde f_t \odot \tilde c_{t-1} + \tilde i_t \odot \tilde u_t, \qquad
\tilde h_t = \tilde o_t \odot \tanh(\tilde c_t)
\end{aligned}
$$

where $\tilde h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\Delta x_t$ is the input data at time step $t$, $\tilde i_t$, $\tilde f_t$ and $\tilde o_t$ are respectively the input gate, forget gate and output gate, $\tilde u_t$ is the currently added information, $\tilde c_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ is element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters (with $M_i$, $M_f$, $M_o$, $M_u$ denoting its row blocks).
The differential feature representation models the difference information of the skeleton action sequence, reflects the dynamic change of the skeleton action, and benefits human skeleton action recognition. As shown in fig. 2, the segment with larger dynamic change corresponds to the shooting motion itself, while the segment with smaller dynamic change may be the preparatory motion before the shot; modeling the difference information of the skeleton action sequence therefore helps to better recognize the "shooting" action.
S2.3, as shown in fig. 2, the skeleton action sequence and the differential feature representation are input together into the original information module, which computes the original-sequence feature representation. In the original information module, the differential feature representation guides the representation learning of the skeleton action sequence. At time step $t$, the original-sequence feature representation $h_t$ is calculated with a long short-term memory network as:

$$
\begin{aligned}
i_t &= \sigma\big(M'_i[h_{t-1}, x_t, \tilde h_t]\big), \qquad
f_t = \sigma\big(M'_f[h_{t-1}, x_t, \tilde h_t]\big), \qquad
o_t = \sigma\big(M'_o[h_{t-1}, x_t, \tilde h_t]\big), \\
u_t &= \tanh\big(W_u[h_{t-1}, x_t] + b_u\big), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot u_t, \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$

where $h_{t-1}$ is the long short-term memory network hidden-layer output at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate (into which the differential feature representation $\tilde h_t$ is fed, realizing the guidance), $u_t$ is the currently added information, $c_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ is element-wise multiplication, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters ($M'_i$, $M'_f$, $M'_o$ denote the row blocks of $M'$), and $b_u$ is the bias term.
Calculating to obtain differential feature representation through a differential information module and an original information moduleAnd original sequence feature representation H ═ H1,h2,…,h5}, wherein ,is a differential characterization of time step t, htIs the original sequence feature representation at time step t.
Step S3, splice the differential feature representation and the original-sequence feature representation obtained in step S2, extract multi-scale features of the action sequence with a multi-scale convolutional neural network, and obtain the pooled representation by max pooling. The specific process is as follows:
s3.1, splicing the differential feature representation and the original sequence feature representation obtained in the step S2:
$$z_t = [\tilde h_t, h_t]$$

where $\tilde h_t$ is the differential feature representation at time step $t$ and $h_t$ is the original-sequence feature representation at time step $t$. The splicing operation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_5\}$, where $z_t$ is the sequence representation at time step $t$.
S3.2, as shown in fig. 2, multi-scale features of the action sequence are extracted with a multi-scale convolutional neural network. Let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernel of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of convolution kernels. In practice, $w$ is selected from 2, 3 and 5, $n$ is set to 128, and $k$ is set to 96. The convolution operation is expressed as:

$$g_t = f\big(F * z_{t:t+w-1} + b_g\big)$$

where $z_{t:t+w-1}$ refers to the sequence representation within a window of width $w$ starting at time step $t$, $*$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is the bias term. The convolution kernel is applied to each position of the sequence, using zero padding, to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input; $G$ is the feature matrix obtained by convolving with windows of the same size.
Using a multi-scale convolution neural network, carrying out convolution operation by using 3 windows with different sizes, wherein the sizes of the windows are respectively 2, 3 and 5, obtaining results of the 3 convolution operations, and splicing to obtain a multi-scale characteristic matrixAt a multi-scale feature matrixUsing maximal pooling operations in the time dimension of (a) to obtain a pooled representation
Step S4, the pooled representation obtained in step S3 is input to the fully connected layer for classification:

$$\hat y = \mathrm{softmax}(W_p P + b_p)$$

where $W_p$ is an affine transformation matrix composed of trainable parameters, $b_p$ is a bias term, $\mathrm{softmax}$ is a nonlinear activation function, and $\hat y$ is the predicted distribution.
Minimizing the cross-entropy loss is taken as the training target, where the cross-entropy of two distributions is:

$$H(y, \hat y) = -\sum_i y_i \log \hat y_i$$

where $y$ is the true distribution, $y_i$ refers to the $i$-th dimension of $y$, and $\hat y_i$ refers to the $i$-th dimension of $\hat y$.
In summary, this embodiment uses a long short-term memory network to learn the difference information of the skeleton action sequence, uses that information to guide the representation learning of the original sequence, and extracts multi-scale features with a multi-scale convolutional neural network, thereby improving the accuracy of skeleton action recognition. Compared with traditional methods, the method fully models the difference information of the skeleton action sequence, is more sensitive to changes in skeleton actions, helps machines recognize human actions more accurately, and serves scenarios such as human-computer interaction and virtual reality.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (4)
1. A bone motion recognition method based on a difference guide representation learning network is characterized by comprising the following steps:
s1, acquiring bone motion sequence data, and preprocessing the data;
s2, calculating the difference value of the skeleton action sequence to obtain a difference sequence, inputting the difference sequence into a difference information module, calculating by using a long-term and short-term memory network to obtain difference characteristic representation, then inputting the skeleton action sequence and the difference characteristic representation into an original information module, and guiding the representation learning of the skeleton action sequence by using the difference characteristic representation to obtain original sequence characteristic representation;
s3, splicing the differential feature representation and the original sequence feature representation, extracting the multi-scale features of the skeleton action sequence by using a multi-scale convolution neural network, and obtaining a pooling representation by using a maximum pooling operation;
and S4, inputting the pooled representation into a full connection layer for classification.
2. A method for recognizing bone motion based on difference-oriented representation learning network as claimed in claim 1, wherein the calculation process of the difference feature representation and the original sequence feature representation in step S2 is as follows:
s2.1, calculating the difference value of the bone motion sequence obtained in the step S1 to obtain a difference sequence:
Given the original skeleton action sequence $X = \{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where $T$ is the length of the skeleton action sequence, the corresponding difference sequence $\Delta X$ is calculated as:

$$\Delta x_t = x_{t+1} - x_t$$
S2.2, the difference sequence is input into the differential information module, where a long short-term memory network computes the differential feature representation. At time step $t$, the differential feature representation $\tilde h_t$ is calculated as:

$$
\begin{aligned}
\tilde i_t &= \sigma\big(M_i[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde f_t = \sigma\big(M_f[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde o_t = \sigma\big(M_o[\tilde h_{t-1}, \Delta x_t]\big), \\
\tilde u_t &= \tanh\big(M_u[\tilde h_{t-1}, \Delta x_t]\big), \qquad
\tilde c_t = \tilde f_t \odot \tilde c_{t-1} + \tilde i_t \odot \tilde u_t, \qquad
\tilde h_t = \tilde o_t \odot \tanh(\tilde c_t)
\end{aligned}
$$

where $\tilde h_{t-1}$ is the hidden-layer output of the long short-term memory network at time step $t-1$, $\Delta x_t$ is the input data at time step $t$, $\tilde i_t$, $\tilde f_t$ and $\tilde o_t$ are respectively the input gate, forget gate and output gate, $\tilde u_t$ is the currently added information, $\tilde c_t$ is the memory-cell information, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ is element-wise multiplication, and $M$ is an affine transformation matrix composed of trainable parameters (with $M_i$, $M_f$, $M_o$, $M_u$ denoting its row blocks);
S2.3, the skeleton action sequence and the differential feature representation are input into the original information module, where the differential feature representation guides the representation learning of the skeleton action sequence to obtain the original-sequence feature representation. At time step $t$, the original-sequence feature representation $h_t$ is calculated with a long short-term memory network as:

$$
\begin{aligned}
i_t &= \sigma\big(M'_i[h_{t-1}, x_t, \tilde h_t]\big), \qquad
f_t = \sigma\big(M'_f[h_{t-1}, x_t, \tilde h_t]\big), \qquad
o_t = \sigma\big(M'_o[h_{t-1}, x_t, \tilde h_t]\big), \\
u_t &= \tanh\big(W_u[h_{t-1}, x_t] + b_u\big), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot u_t, \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$

where $h_{t-1}$ is the long short-term memory network hidden-layer output at time step $t-1$, $x_t$ is the input data at time step $t$, $i_t$, $f_t$ and $o_t$ are respectively the input gate, forget gate and output gate (into which the differential feature representation $\tilde h_t$ is fed), $u_t$ is the currently added information, $c_t$ is the memory-cell information, $M'$ and $W_u$ are affine transformation matrices composed of trainable parameters ($M'_i$, $M'_f$, $M'_o$ denote the row blocks of $M'$), and $b_u$ is a bias term;
Through the differential information module and the original information module, the differential feature representation $\tilde H = \{\tilde h_1, \tilde h_2, \ldots, \tilde h_t, \ldots, \tilde h_T\}$ and the original-sequence feature representation $H = \{h_1, h_2, \ldots, h_t, \ldots, h_T\}$ are obtained, where $\tilde h_t$ is the differential feature representation at time step $t$ and $h_t$ is the original-sequence feature representation at time step $t$.
3. A method for recognizing bone motion based on difference-oriented representation learning network as claimed in claim 1, wherein the step S3 is implemented by using a multi-scale convolutional neural network to extract multi-scale features of the motion sequence, and obtaining the pooled representation by using the maximal pooling operation as follows:
s3.1, splicing the differential feature representation and the original sequence feature representation obtained in the step S2:
$$z_t = [\tilde h_t, h_t]$$

where $\tilde h_t$ is the differential feature representation at time step $t$ and $h_t$ is the original-sequence feature representation at time step $t$. The splicing operation is applied at all time steps to obtain the sequence representation $Z = \{z_1, z_2, \ldots, z_T\}$, where $z_t$ is the sequence representation at time step $t$;
S3.2, multi-scale features of the action sequence are extracted with a multi-scale convolutional neural network. Let $F \in \mathbb{R}^{w \times n \times 2 \times k}$ be the convolution kernel of the convolution operation, where $w$, $n$ and $k$ respectively denote the width, height and number of convolution kernels; the convolution operation is expressed as:

$$g_t = f\big(F * z_{t:t+w-1} + b_g\big)$$

where $z_{t:t+w-1}$ refers to the sequence representation within a window of width $w$ starting at time step $t$, $*$ is the convolution operation, $f$ is a nonlinear transformation function, and $b_g$ is a bias term. The convolution kernel is applied to each position of the sequence, using zero padding, to generate a feature matrix $G \in \mathbb{R}^{T \times k}$ of the same length as the input, where $T$ and $k$ respectively denote the input sequence length and the number of convolution kernels, and $G$ is the feature matrix obtained by convolving with windows of the same size;
using the multi-scale convolutional neural network, convolution operations are performed with windows of different sizes; assuming r is the number of window sizes, r convolution results are obtained and spliced into a multi-scale feature matrix; a max-pooling operation is applied along the time dimension of the multi-scale feature matrix to obtain the pooled representation P.
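As an illustrative sketch (not part of the claims), the zero-padded convolution over the spliced sequence representation, repeated for r window widths and max-pooled over time, can be written in NumPy; the kernel layout, window widths, and dimensions below are hypothetical:

```python
import numpy as np

def conv_seq(Z, F, b):
    """Convolve a (T, n) sequence Z with k kernels of width w (F: (w, n, k)),
    zero-padded so the output feature matrix G keeps the input length T."""
    w, n, k = F.shape
    T = Z.shape[0]
    left, right = (w - 1) // 2, w // 2
    Zp = np.vstack([np.zeros((left, n)), Z, np.zeros((right, n))])
    G = np.empty((T, k))
    for t in range(T):
        window = Zp[t:t + w]  # (w, n) slice at position t
        # sum over window width and feature height for each of the k kernels
        G[t] = np.tensordot(window, F, axes=([0, 1], [0, 1])) + b
    return np.tanh(G)         # nonlinear transformation f

rng = np.random.default_rng(1)
T, n = 8, 10                  # sequence length, dim of spliced representation z_t
Z = rng.normal(size=(T, n))
widths, k = [2, 3, 5], 4      # r = 3 window sizes, k kernels per size
feats = [conv_seq(Z, rng.normal(size=(w, n, k)), np.zeros(k)) for w in widths]
G_multi = np.concatenate(feats, axis=1)  # (T, r*k) multi-scale feature matrix
P = G_multi.max(axis=0)                  # max pooling over the time dimension
```

Because every scale is zero-padded to length T, the r results can be concatenated along the feature axis before pooling.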
4. The skeleton action recognition method based on a difference-guided representation learning network as claimed in claim 1, wherein the classification procedure in step S4 is as follows:
the pooled representation P obtained in step S3 is input to a fully connected layer for classification, with the formula ŷ = softmax(W_p P + b_p),
wherein W_p is an affine transformation matrix composed of trainable parameters, b_p is a bias term, softmax is the nonlinear activation function, and ŷ is the predicted distribution;
the minimized cross-entropy loss is used as the training objective, wherein the cross-entropy of the two distributions y and ŷ is expressed as L = −Σ_i y_i log ŷ_i, with y the ground-truth distribution and ŷ the predicted distribution.
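As an illustrative sketch (not part of the claims), the fully connected classification layer and the cross-entropy training objective can be written in NumPy; the class count, weight shapes, and variable names here are hypothetical:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y, y_hat):
    """Cross-entropy of two distributions: H(y, y_hat) = -sum_i y_i log y_hat_i."""
    return -np.sum(y * np.log(y_hat + 1e-12))

rng = np.random.default_rng(2)
P = rng.normal(size=6)                     # pooled representation from step S3
Wp = rng.normal(size=(3, 6))               # affine transformation for 3 action classes
bp = np.zeros(3)                           # bias term
y_hat = softmax(Wp @ P + bp)               # predicted distribution over classes
y = np.array([0.0, 1.0, 0.0])              # one-hot ground-truth distribution
loss = cross_entropy(y, y_hat)             # training objective to minimize
```

With a one-hot ground truth, the loss reduces to the negative log-probability assigned to the correct action class.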
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011497126.6A CN112507940B (en) | 2020-12-17 | 2020-12-17 | Bone action recognition method based on differential guidance representation learning network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112507940A true CN112507940A (en) | 2021-03-16 |
CN112507940B CN112507940B (en) | 2023-08-25 |
Family
ID=74922265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011497126.6A Active CN112507940B (en) | 2020-12-17 | 2020-12-17 | Bone action recognition method based on differential guidance representation learning network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112507940B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228109A (en) * | 2016-07-08 | 2016-12-14 | 天津大学 | A kind of action identification method based on skeleton motion track |
CN111310707A (en) * | 2020-02-28 | 2020-06-19 | 山东大学 | Skeleton-based method and system for recognizing attention network actions |
CN111339942A (en) * | 2020-02-26 | 2020-06-26 | 山东大学 | Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment |
CN111709323A (en) * | 2020-05-29 | 2020-09-25 | 重庆大学 | Gesture recognition method based on lie group and long-and-short term memory network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115661919A (en) * | 2022-09-26 | 2023-01-31 | 珠海视熙科技有限公司 | Repeated action cycle statistical method and device, fitness equipment and storage medium |
CN115661919B (en) * | 2022-09-26 | 2023-08-29 | 珠海视熙科技有限公司 | Repeated action period statistics method and device, body-building equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112507940B (en) | 2023-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lim et al. | Isolated sign language recognition using convolutional neural network hand modelling and hand energy image | |
CN110222580B (en) | Human hand three-dimensional attitude estimation method and device based on three-dimensional point cloud | |
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
Xin et al. | Arch: Adaptive recurrent-convolutional hybrid networks for long-term action recognition | |
CN108182260B (en) | Multivariate time sequence classification method based on semantic selection | |
CN111368759B (en) | Monocular vision-based mobile robot semantic map construction system | |
CN111222486B (en) | Training method, device and equipment for hand gesture recognition model and storage medium | |
Sincan et al. | Using motion history images with 3d convolutional networks in isolated sign language recognition | |
CN106599810B (en) | A kind of head pose estimation method encoded certainly based on stack | |
CN113963445A (en) | Pedestrian falling action recognition method and device based on attitude estimation | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN114419732A (en) | HRNet human body posture identification method based on attention mechanism optimization | |
CN115761905A (en) | Diver action identification method based on skeleton joint points | |
CN111429481A (en) | Target tracking method, device and terminal based on adaptive expression | |
Sun et al. | Two-stage deep regression enhanced depth estimation from a single RGB image | |
CN113902989A (en) | Live scene detection method, storage medium and electronic device | |
CN112507940B (en) | Bone action recognition method based on differential guidance representation learning network | |
Liang et al. | Egocentric hand pose estimation and distance recovery in a single RGB image | |
Ikram et al. | Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture | |
Sun et al. | Human action recognition using a convolutional neural network based on skeleton heatmaps from two-stage pose estimation | |
CN111368637A (en) | Multi-mask convolution neural network-based object recognition method for transfer robot | |
Usman et al. | Skeleton-based motion prediction: A survey | |
CN115830707A (en) | Multi-view human behavior identification method based on hypergraph learning | |
CN111914751B (en) | Image crowd density identification detection method and system | |
CN114581485A (en) | Target tracking method based on language modeling pattern twin network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||