CN113780129A - Motion recognition method based on unsupervised graph sequence predictive coding and storage medium - Google Patents

Motion recognition method based on unsupervised graph sequence predictive coding and storage medium

Info

Publication number
CN113780129A
CN113780129A
Authority
CN
China
Prior art keywords
sequence
graph
network
data
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111009498.4A
Other languages
Chinese (zh)
Other versions
CN113780129B (en)
Inventor
赵生捷
梁爽
叶珂男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202111009498.4A priority Critical patent/CN113780129B/en
Publication of CN113780129A publication Critical patent/CN113780129A/en
Application granted granted Critical
Publication of CN113780129B publication Critical patent/CN113780129B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a motion recognition method based on unsupervised graph sequence predictive coding, and a storage medium. The method covers both the training and the use of a model, and is used to recognize the various actions performed by a human body in a skeleton sequence. It aims to solve the problems that existing motion recognition methods depend heavily on large amounts of labeled data and achieve low accuracy when only a few labels are available, and that existing unsupervised methods fail to exploit the topological information of the graph, overfit easily, and generalize poorly. The method comprises: view-invariant transformation, resampling and patch-level skeleton-graph data augmentation of the skeleton sequence data; spatio-temporal graph convolution for embedding and extracting skeleton sequence blocks; context feature aggregation by a graph convolutional recurrent neural network; construction of positive and negative sample pairs for predictive coding; and feature extraction through the pre-trained model, with a classifier yielding the action category of the skeleton sequence to be recognized. Compared with the prior art, the method has the advantages of low training difficulty, high recognition accuracy, excellent performance and the like.

Description

Motion recognition method based on unsupervised graph sequence predictive coding and storage medium
Technical Field
The present invention relates to the field of motion recognition technologies, and in particular, to a motion recognition method and a storage medium based on unsupervised graph sequence predictive coding.
Background
In computer vision, motion recognition is a hot topic that currently receives wide attention. Fields such as autonomous robots, smart cities and intelligent transportation need to analyze and recognize human behavior. In recent years, with graph convolution being valued and used by more and more researchers, with the progress of pose estimation algorithms and depth sensors, and with skeleton data being robust, view-independent and concentrated on the characteristics of the action itself, motion recognition from skeleton data has become a focus of current research.
Early motion recognition was mainly based on still pictures. In recent years, as research has progressed, more and more researchers have paid attention to the dynamic nature of actions and have therefore turned to video-based motion recognition. The most significant difference of video-based motion recognition compared with still-picture-based methods is the added time dimension: the data become a time sequence of 2D pictures. The time dimension provides rich features, but also brings great challenges in computational power and storage space. Skeleton-based motion recognition alleviates the computational requirements of motion recognition algorithms, but most methods are based on supervised tasks and depend highly on the quantity and quality of data set samples. Because of the high inter-class similarity of actions, accurately labeling enough data to train a deep learning model is challenging and costly, so researchers strongly desire a robust, label-free method to learn representations for action recognition that better exploits temporal and spatial information. Existing unsupervised work attempts pretext tasks that generate or reconstruct a skeleton sequence from the latent embedding of an encoder. However, these encoder-decoder models typically flatten the spatial channels into a single feature vector, ignoring the spatial relationships of the skeleton graph, and such pretext tasks often overfit and are not always helpful for downstream tasks.
Disclosure of Invention
The present invention aims to overcome the above defects of the prior art by providing a motion recognition method based on unsupervised graph sequence predictive coding with low training difficulty, high recognition accuracy and excellent performance, and a storage medium.
The purpose of the invention can be realized by the following technical scheme:
A motion recognition method based on unsupervised graph sequence predictive coding, comprising the following steps:
Step 1: acquiring a skeleton data sequence, and preprocessing the data sequence to obtain input training data blocks;
Step 2: inputting the input training data blocks into a spatio-temporal graph convolutional network f(·) to obtain embedded representations of the sequence skeleton graph blocks, inputting the embedded representations into a graph convolutional recurrent neural network g(·), and aggregating context information;
Step 3: predicting the embedded representation of the next skeleton graph block in the sequence through a prediction network φ(·) according to the context information, inputting the predicted embedded representation into the recurrent neural network g(·) to obtain a new context representation, and repeating several times to obtain a series of predicted graph embedded representations;
Step 4: comparing the obtained predicted graph embedded representations with the real graph embedded representations, optimizing the spatio-temporal graph convolutional network f(·), the graph convolutional recurrent neural network g(·) and the prediction network φ(·) through back-propagation of a contrastive loss function, and obtaining a pre-trained model after several iterations;
Step 5: removing the prediction network φ(·) from the obtained pre-trained model, taking the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) as a feature extractor, adding a classifier on top of the feature extractor, and obtaining a final classification model through training with labeled input data;
Step 6: acquiring a skeleton data sequence to be detected, and preprocessing it to obtain input prediction data blocks;
Step 7: inputting the prediction data blocks into the classification model, predicting the probabilities of the various actions of the person to be recognized, and completing the motion recognition.
Preferably, step 1 specifically comprises:
Step 1-1: for given skeleton sequence data X, obtaining the view-corrected skeleton sequence data $\hat{X} = F(X)$ through the view-invariant transformation F(·);
Step 1-2: given the view-corrected skeleton sequence data $\hat{X}$ and an input sampling window size $T_{window}$, first upsampling the skeleton sequence with $T_{sample}$ frames to a sequence of $T_{window} \times k$ frames by linear interpolation, where $k \in \mathbb{N}^{+}$ and $T_{window} \cdot (k-1) < T_{sample} < T_{window} \cdot k$;
Step 1-3: dividing the interpolated data obtained in the preceding step into sequence blocks each containing $T_{patch}$ frames, $P = \{p_1, p_2, \ldots, p_n\}$, and applying random skeleton-graph data augmentation to each sequence block $p_i$, finally obtaining the augmented skeleton sequence blocks $\hat{P} = \{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_n\}$.
Preferably, step 2 specifically comprises:
Step 2-1: inputting the skeleton sequence blocks $\hat{P}$ obtained in step 1 into the spatio-temporal graph convolutional network f(·) to obtain the embedded representations $Z = \{z_1, z_2, \ldots, z_n\}$;
Step 2-2: inputting the embedded representations $Z$ obtained in step 2-1 into the graph convolutional recurrent neural network g(·) to obtain the context representation $C_i$.
Preferably, step 3 specifically comprises:
Step 3-1: according to the context information $C_i$ obtained in step 2, predicting the embedded representation $\hat{z}_{i+1}$ of the next skeleton graph block in the sequence through the prediction network φ(·);
Step 3-2: inputting the graph embedded representation $\hat{z}_{i+1}$ obtained in step 3-1 into the graph convolutional recurrent neural network g(·) to obtain the context information $\hat{C}_{i+1}$;
Step 3-3: according to the context information $\hat{C}_{i+1}$ obtained in step 3-2, repeating step 3-1 and step 3-2 several times by analogy to obtain a series of predicted graph embedded representations $\{\hat{z}_{i+1}, \hat{z}_{i+2}, \ldots, \hat{z}_{i+K}\}$.
Preferably, the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) are both constructed based on graph convolutional neural networks, and the prediction network φ(·) is constructed based on a neural network.
More preferably, the graph convolution rule of the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) is:

$$X_{\text{out}} = \tau\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}X_{\text{in}}\Theta\right)$$

wherein $X_{\text{in}}$ and $X_{\text{out}}$ denote the input and output feature maps respectively; $\hat{A} = A + I$ adds the identity matrix $I$ to the adjacency matrix $A$ defined by the graph, i.e. each node is linked to itself; $\hat{D}$ denotes the diagonal degree matrix of $\hat{A}$; $\tau$ denotes the activation function; and $\Theta$ denotes the learnable weight matrix of the graph convolution layer.
More preferably, the structure of the recurrent neural network g(·) is based on the gated recurrent unit GRU, and the calculation rule is as follows:

$$z_t = \sigma\left(\omega_{xz} \star_{\mathcal{G}} x_t + \omega_{hz} \star_{\mathcal{G}} h_{t-1}\right)$$
$$r_t = \sigma\left(\omega_{xr} \star_{\mathcal{G}} x_t + \omega_{hr} \star_{\mathcal{G}} h_{t-1}\right)$$
$$\tilde{h}_t = \psi\left(\omega_{xh} \star_{\mathcal{G}} x_t + \omega_{hh} \star_{\mathcal{G}} \left(r_t \odot h_{t-1}\right)\right)$$
$$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t$$

wherein $z_t$ denotes the update gate, which acts as the memory/forgetting weight, and $r_t$ denotes the reset gate; $\tilde{h}_t$ denotes the candidate activation vector; $\star_{\mathcal{G}}$ is the graph convolution operator; $\odot$ denotes the Hadamard product; $\sigma$ denotes the Sigmoid activation function and $\psi$ the Tanh activation function; and $\omega_{xz}$, $\omega_{hz}$, $\omega_{xr}$, $\omega_{hr}$, $\omega_{xh}$ and $\omega_{hh}$ denote the learnable parameters of the respective gates.
Preferably, the contrastive loss function in step 4 is specifically:

$$\mathcal{L} = -\sum_{i}\sum_{k}\log\frac{\exp\left(\operatorname{sim}\left(z_{i,k},\hat{z}_{i,k}\right)\right)}{\sum_{j}\exp\left(\operatorname{sim}\left(z_{j,k},\hat{z}_{i,k}\right)\right)}$$

wherein $z_{i,k}$ and $\hat{z}_{i,k}$ respectively denote the real embedding $z_k$ and the predicted embedding $\hat{z}_k$ taken from the i-th sample, and $\operatorname{sim}(z,\hat{z})$ denotes the similarity of the embedded representation pair $(z,\hat{z})$.
Preferably, step 5 specifically comprises:
Step 5-1: the pre-trained model obtained in step 4 comprises the spatio-temporal graph convolutional network f(·), the graph convolutional recurrent neural network g(·) and the prediction network φ(·); only f(·) and g(·) are retained, and φ(·) is replaced with a classifier network to construct the classification model;
Step 5-2: inputting the labeled training data and training on it to obtain the final classification model.
A storage medium storing a program which, when executed, implements the motion recognition method based on unsupervised graph sequence predictive coding according to any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
First, low training difficulty: the unsupervised graph convolution based skeleton action recognition framework can learn effective representations of human actions from unlabeled data through contrastive learning, which reduces the need for sample labeling and simplifies training.
Second, high recognition accuracy: by using graph convolution together with contrastive learning, the motion recognition method based on unsupervised graph sequence predictive coding fully exploits spatial and temporal dependencies at the same time, avoids the limitations of generative learning and instance-based contrastive learning in unsupervised skeleton-based motion recognition, and improves recognition accuracy.
Third, excellent performance: compared with the latest SOTA methods on three benchmark data sets, the motion recognition method based on unsupervised graph sequence predictive coding exceeds the SOTA by more than 20 percent.
Drawings
FIG. 1 is a flow chart of a method of motion recognition in the present invention;
FIG. 2 is a schematic view of the overall framework of the present invention;
FIG. 3 is a schematic diagram of the training of the contrastive learning based pre-training model in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
As shown in FIG. 1, this embodiment provides a skeleton motion recognition method based on unsupervised graph convolution. Its main objective is to learn representations for motion recognition from unlabeled data by unsupervised contrastive learning, making maximal use of the temporal information of the skeleton sequence and the spatial information of the skeleton graph, and then to train a classification model on the learned representations with a small amount of labeled data, so as to recognize human actions more accurately.
As shown in FIG. 1 and FIG. 2, the motion recognition method based on unsupervised graph sequence predictive coding in this embodiment mainly comprises the following steps:
Step 1: acquiring a skeleton data sequence, preprocessing the data through view-invariant transformation, time-window resampling and patch-level data augmentation, and obtaining input training data blocks segmented to a fixed window size;
The method specifically comprises the following steps:
Step 1-1: for given skeleton sequence data X, obtaining the view-corrected skeleton sequence data $\hat{X} = F(X)$ through the view-invariant transformation F(·);
Step 1-2: given the view-corrected skeleton sequence data $\hat{X}$ and an input sampling window size $T_{window}$, first upsampling the skeleton sequence with $T_{sample}$ frames to a sequence of $T_{window} \times k$ frames by linear interpolation, where $k \in \mathbb{N}^{+}$ and $T_{window} \cdot (k-1) < T_{sample} < T_{window} \cdot k$;
Step 1-3: dividing the interpolated data obtained in the preceding step into sequence blocks each containing $T_{patch}$ frames, $P = \{p_1, p_2, \ldots, p_n\}$, and applying random skeleton-graph data augmentation to each sequence block $p_i$, with the same augmentation within a block and different augmentations between blocks; the augmentations comprise shifting, shearing and rotation; finally the augmented skeleton sequence blocks $\hat{P} = \{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_n\}$ are obtained.
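By way of illustration only, the time-window resampling of step 1-2 and the patch splitting of step 1-3 could be sketched in PyTorch as follows. This is a minimal sketch under our own assumptions, not the patent's code: the sequence is assumed to be a (T, V, C) array of T frames, V joints and C coordinate channels, $T_{window} \cdot k$ is assumed divisible by $T_{patch}$, and the function and argument names are ours.

```python
import numpy as np
import torch
import torch.nn.functional as F

def resample_and_patch(x, t_window, t_patch):
    """Upsample a (T, V, C) skeleton sequence to t_window * k frames by linear
    interpolation (step 1-2), then split it into blocks of t_patch frames (step 1-3)."""
    t_sample = x.shape[0]
    k = int(np.ceil(t_sample / t_window))       # T_window*(k-1) < T_sample <= T_window*k
    target_len = t_window * k
    # interpolate along the time axis: (T, V, C) -> (1, V*C, T) -> (1, V*C, target_len)
    flat = torch.as_tensor(x, dtype=torch.float32).reshape(t_sample, -1).t().unsqueeze(0)
    up = F.interpolate(flat, size=target_len, mode="linear", align_corners=True)
    up = up.squeeze(0).t().reshape(target_len, *x.shape[1:])
    # sequence blocks p_1, ..., p_n, each containing t_patch frames
    return up.split(t_patch, dim=0)
```

Each returned block would then receive one random augmentation (shift, shear or rotation), identical within a block and differing between blocks.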
Step 2: inputting the input training data blocks into the spatio-temporal graph convolutional network f(·) to obtain embedded representations of the sequence skeleton graph blocks, inputting the embedded representations into the graph convolutional recurrent neural network g(·), and aggregating context information;
The method specifically comprises the following steps:
Step 2-1: inputting the skeleton sequence blocks $\hat{P}$ obtained in step 1 into the spatio-temporal graph convolutional network f(·) to obtain the embedded representations $Z = \{z_1, z_2, \ldots, z_n\}$;
Step 2-2: inputting the embedded representations $Z$ obtained in step 2-1 into the graph convolutional recurrent neural network g(·) to obtain the context representation $C_i$;
Step 3: predicting the embedded representation of the next skeleton graph block in the sequence through the prediction network φ(·) according to the context information, inputting the predicted embedded representation into the recurrent neural network g(·) to obtain a new context representation, and repeating several times to obtain a series of predicted graph embedded representations;
The method specifically comprises the following steps:
Step 3-1: according to the context information $C_i$ obtained in step 2, predicting the embedded representation $\hat{z}_{i+1}$ of the next skeleton graph block in the sequence through the prediction network φ(·);
Step 3-2: inputting the graph embedded representation $\hat{z}_{i+1}$ obtained in step 3-1 into the graph convolutional recurrent neural network g(·) to obtain the context information $\hat{C}_{i+1}$;
Step 3-3: according to the context information $\hat{C}_{i+1}$ obtained in step 3-2, repeating step 3-1 and step 3-2 several times by analogy to obtain a series of predicted graph embedded representations $\{\hat{z}_{i+1}, \hat{z}_{i+2}, \ldots, \hat{z}_{i+K}\}$.
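For concreteness, the roll-out of steps 3-1 to 3-3 could be sketched as below. This is our illustration rather than the patent's code: g_cell stands for one step of the graph convolutional recurrent network g(·) with a GRU-cell-like interface (it accepts an input and an optional hidden state), phi stands for the prediction network φ(·), and K is the number of prediction steps.

```python
def rollout(z_seq, g_cell, phi, K):
    """Predictive-coding roll-out: aggregate the observed patch embeddings into
    a context, then predict the next K patch embeddings from it."""
    h = None
    for z in z_seq:               # step 2-2: aggregate context C_i over z_1 .. z_i
        h = g_cell(z, h)
    c = h
    z_hats = []
    for _ in range(K):
        z_hat = phi(c)            # step 3-1: predict the next patch embedding
        z_hats.append(z_hat)
        c = g_cell(z_hat, c)      # step 3-2: new context from the predicted embedding
    return z_hats                 # step 3-3: series of predicted graph embeddings
```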
And 4, step 4: comparing the obtained prediction graph embedded representation with the real graph embedded representation, optimizing the space-time graph convolution network f (-) and the graph convolution cyclic neural network g (-) and the prediction network phi (-) through comparing the loss function reverse conduction, and obtaining a pre-training model through a plurality of iterations, as shown in FIG. 3;
in the embodiment, both the space-time graph convolution network f (-) and the recurrent neural network g (-) are constructed based on a graph convolution neural network, and the prediction network phi (-) is constructed based on a neural network;
the graph convolution rule of the space-time graph convolution network f (-) and the recurrent neural network g (-) is as follows:
$$X_{\text{out}} = \tau\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}X_{\text{in}}\Theta\right)$$

wherein $X_{\text{in}}$ and $X_{\text{out}}$ denote the input and output feature maps respectively; $\hat{A} = A + I$ adds the identity matrix $I$ to the adjacency matrix $A$ defined by the graph, i.e. each node is linked to itself; $\hat{D}$ denotes the diagonal degree matrix of $\hat{A}$; $\tau$ denotes the activation function; and $\Theta$ denotes the learnable weight matrix of the graph convolution layer;
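As an illustration of the rule above, one graph convolution layer might be written as follows. This is a sketch, not the patent's implementation: A is assumed to be the V x V joint adjacency matrix, and defaulting the activation τ to ReLU is our choice.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One layer of X_out = tau(D^-1/2 (A + I) D^-1/2 X_in Theta)."""
    def __init__(self, in_features, out_features, A, tau=None):
        super().__init__()
        A_hat = A + torch.eye(A.size(0))                      # self-loops: A + I
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))   # D^-1/2
        self.register_buffer("A_norm", d_inv_sqrt @ A_hat @ d_inv_sqrt)
        self.theta = nn.Linear(in_features, out_features, bias=False)  # Theta
        self.tau = tau if tau is not None else nn.ReLU()      # activation tau

    def forward(self, x):          # x: (batch, V, in_features)
        # propagate over the normalized graph, project with Theta, then activate
        return self.tau(self.theta(self.A_norm @ x))
```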
the construction of the recurrent neural network g (-) is based on a gated recurrent unit GRU, and the calculation rule is as follows:
$$z_t = \sigma\left(\omega_{xz} \star_{\mathcal{G}} x_t + \omega_{hz} \star_{\mathcal{G}} h_{t-1}\right)$$
$$r_t = \sigma\left(\omega_{xr} \star_{\mathcal{G}} x_t + \omega_{hr} \star_{\mathcal{G}} h_{t-1}\right)$$
$$\tilde{h}_t = \psi\left(\omega_{xh} \star_{\mathcal{G}} x_t + \omega_{hh} \star_{\mathcal{G}} \left(r_t \odot h_{t-1}\right)\right)$$
$$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t$$

wherein $z_t$ denotes the update gate, which acts as the memory/forgetting weight, and $r_t$ denotes the reset gate; $\tilde{h}_t$ denotes the candidate activation vector; $\star_{\mathcal{G}}$ is the graph convolution operator; $\odot$ denotes the Hadamard product; $\sigma$ denotes the Sigmoid activation function and $\psi$ the Tanh activation function; and $\omega_{xz}$, $\omega_{hz}$, $\omega_{xr}$, $\omega_{hr}$, $\omega_{xh}$ and $\omega_{hh}$ denote the learnable parameters of the respective gates.
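Continuing the GraphConv sketch above, a graph convolutional GRU cell following these update rules could look like the following (again purely illustrative: each gate uses a graph convolution with identity activation, and starting from a zero context when no hidden state is given is our assumption).

```python
class GraphGRUCell(nn.Module):
    """GRU cell whose gate transforms are graph convolutions (a sketch)."""
    def __init__(self, in_dim, hid_dim, A):
        super().__init__()
        self.hid_dim = hid_dim
        gc = lambda i, o: GraphConv(i, o, A, tau=nn.Identity())
        self.xz, self.hz = gc(in_dim, hid_dim), gc(hid_dim, hid_dim)  # update gate
        self.xr, self.hr = gc(in_dim, hid_dim), gc(hid_dim, hid_dim)  # reset gate
        self.xh, self.hh = gc(in_dim, hid_dim), gc(hid_dim, hid_dim)  # candidate

    def forward(self, x, h=None):
        if h is None:                                      # start from a zero context
            h = x.new_zeros(*x.shape[:-1], self.hid_dim)
        z = torch.sigmoid(self.xz(x) + self.hz(h))         # update gate z_t
        r = torch.sigmoid(self.xr(x) + self.hr(h))         # reset gate r_t
        h_tilde = torch.tanh(self.xh(x) + self.hh(r * h))  # candidate activation
        return z * h + (1 - z) * h_tilde                   # new hidden state h_t
```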
The contrastive loss function is specifically:

$$\mathcal{L} = -\sum_{i}\sum_{k}\log\frac{\exp\left(\operatorname{sim}\left(z_{i,k},\hat{z}_{i,k}\right)\right)}{\sum_{j}\exp\left(\operatorname{sim}\left(z_{j,k},\hat{z}_{i,k}\right)\right)}$$

wherein $z_{i,k}$ and $\hat{z}_{i,k}$ respectively denote the real embedding $z_k$ and the predicted embedding $\hat{z}_k$ taken from the i-th sample, and $\operatorname{sim}(z,\hat{z})$ denotes the similarity of the embedded representation pair $(z,\hat{z})$.
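An illustrative PyTorch version of this loss, for one prediction step over a batch of N samples, might read as follows (our sketch, reusing the imports from the sketches above: embeddings are assumed flattened to vectors, the dot product serves as sim(·,·), and the other samples in the batch act as negatives).

```python
def contrastive_loss(z, z_hat):
    """z, z_hat: (N, D) real and predicted embeddings for one prediction step.
    Positive pairs are (z_i, z_hat_i); pairs (z_j, z_hat_i), j != i, are negatives."""
    sim = z_hat @ z.t()                                # sim(z_j, z_hat_i) for all i, j
    targets = torch.arange(z.size(0), device=z.device)
    # cross-entropy of row i against index i equals -log softmax of the positive pair
    return nn.functional.cross_entropy(sim, targets)
```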
Step 5: removing the prediction network φ(·) from the obtained pre-trained model, taking the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) as the feature extractor, adding a classifier on top of the feature extractor, and obtaining the final classification model through training with labeled input data;
The method specifically comprises the following steps:
Step 5-1: the pre-trained model obtained in step 4 comprises the spatio-temporal graph convolutional network f(·), the graph convolutional recurrent neural network g(·) and the prediction network φ(·); only f(·) and g(·) are retained, and φ(·) is replaced with a classifier network to construct the classification model;
Step 5-2: inputting the labeled training data and training on it to obtain the final classification model;
Step 6: acquiring the skeleton data sequence to be detected, and preprocessing it to obtain input prediction data blocks;
Step 7: inputting the prediction data blocks into the classification model, predicting the probabilities of the various actions of the person to be recognized, and completing the motion recognition.
In this embodiment, the prediction network φ(·) is constructed as a single-layer fully-connected neural network, and the classifier network is a multi-class classifier obtained by training, for example, a multilayer perceptron.
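Purely as an illustration, the classification model of step 5 could then be assembled as below. This sketch rests on our assumptions: f_pretrained and g_pretrained stand for the pre-trained encoder f(·) and aggregator g(·), the final context is flattened into a vector of size ctx_dim, and the two-layer perceptron head with a hidden size of 256 is our choice.

```python
class ActionClassifier(nn.Module):
    """Pre-trained f(.) and g(.) as the feature extractor; phi(.) replaced by a classifier."""
    def __init__(self, f_pretrained, g_pretrained, ctx_dim, num_classes):
        super().__init__()
        self.f, self.g = f_pretrained, g_pretrained    # kept from the pre-trained model
        self.classifier = nn.Sequential(               # multilayer perceptron head
            nn.Linear(ctx_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, patches):                        # patches: list of sequence blocks
        h = None
        for p in patches:         # embed each block with f(.) and aggregate with g(.)
            h = self.g(self.f(p), h)
        return self.classifier(h.flatten(1))           # class scores for the action
```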
To support and verify the performance of the proposed motion recognition method, the method is compared with other state-of-the-art motion recognition methods on three widely used public benchmark data sets, and the comparison results are shown in Table 1.
The experimental comparison uses three widely used public benchmark data sets: NTU RGB+D 60, Northwestern-UCLA (NW-UCLA) and UWA3D Multiview Activity II (UWA3D). The experiments adopt the linear-probe evaluation widely used for unsupervised learning methods: the weights of the pre-trained model are fixed, a linear classifier taking the output features of the pre-trained model as input is trained, and the performance on the test set is reported to measure the effectiveness of the learned representation.
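The linear-probe protocol can be sketched as follows (illustrative only, with the imports from the sketches above: extract_features stands for the frozen pre-trained model, and the optimizer and epoch count are arbitrary choices of ours).

```python
def linear_probe(extract_features, feat_dim, num_classes, loader, epochs=50):
    """Freeze the pre-trained model; train only a linear classifier on its features."""
    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():                  # pre-trained weights stay fixed
                feats = extract_features(x)
            loss = nn.functional.cross_entropy(probe(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return probe
```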
TABLE 1 Comparison results (the table is rendered as an image in the original publication and is not reproduced here)
The comparison results show that the motion recognition method proposed in this embodiment achieves excellent performance.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A motion recognition method based on unsupervised graph sequence predictive coding, characterized by comprising the following steps:
Step 1: acquiring a skeleton data sequence, and preprocessing the data sequence to obtain input training data blocks;
Step 2: inputting the input training data blocks into a spatio-temporal graph convolutional network f(·) to obtain embedded representations of the sequence skeleton graph blocks, inputting the embedded representations into a graph convolutional recurrent neural network g(·), and aggregating context information;
Step 3: predicting the embedded representation of the next skeleton graph block in the sequence through a prediction network φ(·) according to the context information, inputting the predicted embedded representation into the recurrent neural network g(·) to obtain a new context representation, and repeating several times to obtain a series of predicted graph embedded representations;
Step 4: comparing the obtained predicted graph embedded representations with the real graph embedded representations, optimizing the spatio-temporal graph convolutional network f(·), the graph convolutional recurrent neural network g(·) and the prediction network φ(·) through back-propagation of a contrastive loss function, and obtaining a pre-trained model after several iterations;
Step 5: removing the prediction network φ(·) from the obtained pre-trained model, taking the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) as a feature extractor, adding a classifier on top of the feature extractor, and obtaining a final classification model through training with labeled input data;
Step 6: acquiring a skeleton data sequence to be detected, and preprocessing it to obtain input prediction data blocks;
Step 7: inputting the prediction data blocks into the classification model, predicting the probabilities of the various actions of the person to be recognized, and completing the motion recognition.
2. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that step 1 specifically comprises:
Step 1-1: for given skeleton sequence data X, obtaining the view-corrected skeleton sequence data $\hat{X} = F(X)$ through the view-invariant transformation F(·);
Step 1-2: given the view-corrected skeleton sequence data $\hat{X}$ and an input sampling window size $T_{window}$, first upsampling the skeleton sequence with $T_{sample}$ frames to a sequence of $T_{window} \times k$ frames by linear interpolation, where $k \in \mathbb{N}^{+}$ and $T_{window} \cdot (k-1) < T_{sample} < T_{window} \cdot k$;
Step 1-3: dividing the interpolated data obtained in the preceding step into sequence blocks each containing $T_{patch}$ frames, $P = \{p_1, p_2, \ldots, p_n\}$, and applying random skeleton-graph data augmentation to each sequence block $p_i$, finally obtaining the augmented skeleton sequence blocks $\hat{P} = \{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_n\}$.
3. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that step 2 specifically comprises:
Step 2-1: inputting the skeleton sequence blocks $\hat{P}$ obtained in step 1 into the spatio-temporal graph convolutional network f(·) to obtain the embedded representations $Z = \{z_1, z_2, \ldots, z_n\}$;
Step 2-2: inputting the embedded representations $Z$ obtained in step 2-1 into the graph convolutional recurrent neural network g(·) to obtain the context representation $C_i$.
4. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that step 3 specifically comprises:
Step 3-1: according to the context information $C_i$ obtained in step 2, predicting the embedded representation $\hat{z}_{i+1}$ of the next skeleton graph block in the sequence through the prediction network φ(·);
Step 3-2: inputting the graph embedded representation $\hat{z}_{i+1}$ obtained in step 3-1 into the graph convolutional recurrent neural network g(·) to obtain the context information $\hat{C}_{i+1}$;
Step 3-3: according to the context information $\hat{C}_{i+1}$ obtained in step 3-2, repeating step 3-1 and step 3-2 several times by analogy to obtain a series of predicted graph embedded representations $\{\hat{z}_{i+1}, \hat{z}_{i+2}, \ldots, \hat{z}_{i+K}\}$.
5. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) are both constructed based on graph convolutional neural networks, and the prediction network φ(·) is constructed based on a neural network.
6. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 5, characterized in that the graph convolution rule of the spatio-temporal graph convolutional network f(·) and the recurrent neural network g(·) is:

$$X_{\text{out}} = \tau\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}X_{\text{in}}\Theta\right)$$

wherein $X_{\text{in}}$ and $X_{\text{out}}$ denote the input and output feature maps respectively; $\hat{A} = A + I$ adds the identity matrix $I$ to the adjacency matrix $A$ defined by the graph, i.e. each node is linked to itself; $\hat{D}$ denotes the diagonal degree matrix of $\hat{A}$; $\tau$ denotes the activation function; and $\Theta$ denotes the learnable weight matrix of the graph convolution layer.
7. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 5, characterized in that the recurrent neural network g(·) is constructed based on the gated recurrent unit GRU, and the calculation rule is:

$$z_t = \sigma\left(\omega_{xz} \star_{\mathcal{G}} x_t + \omega_{hz} \star_{\mathcal{G}} h_{t-1}\right)$$
$$r_t = \sigma\left(\omega_{xr} \star_{\mathcal{G}} x_t + \omega_{hr} \star_{\mathcal{G}} h_{t-1}\right)$$
$$\tilde{h}_t = \psi\left(\omega_{xh} \star_{\mathcal{G}} x_t + \omega_{hh} \star_{\mathcal{G}} \left(r_t \odot h_{t-1}\right)\right)$$
$$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t$$

wherein $z_t$ denotes the update gate, which acts as the memory/forgetting weight, and $r_t$ denotes the reset gate; $\tilde{h}_t$ denotes the candidate activation vector; $\star_{\mathcal{G}}$ is the graph convolution operator; $\odot$ denotes the Hadamard product; $\sigma$ denotes the Sigmoid activation function and $\psi$ the Tanh activation function; and $\omega_{xz}$, $\omega_{hz}$, $\omega_{xr}$, $\omega_{hr}$, $\omega_{xh}$ and $\omega_{hh}$ denote the learnable parameters of the respective gates.
8. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that the contrastive loss function in step 4 is specifically:

$$\mathcal{L} = -\sum_{i}\sum_{k}\log\frac{\exp\left(\operatorname{sim}\left(z_{i,k},\hat{z}_{i,k}\right)\right)}{\sum_{j}\exp\left(\operatorname{sim}\left(z_{j,k},\hat{z}_{i,k}\right)\right)}$$

wherein $z_{i,k}$ and $\hat{z}_{i,k}$ respectively denote the real embedding $z_k$ and the predicted embedding $\hat{z}_k$ taken from the i-th sample, and $\operatorname{sim}(z,\hat{z})$ denotes the similarity of the embedded representation pair $(z,\hat{z})$.
9. The motion recognition method based on unsupervised graph sequence predictive coding according to claim 1, characterized in that step 5 specifically comprises:
Step 5-1: the pre-trained model obtained in step 4 comprises the spatio-temporal graph convolutional network f(·), the graph convolutional recurrent neural network g(·) and the prediction network φ(·); only f(·) and g(·) are retained, and φ(·) is replaced with a classifier network to construct the classification model;
Step 5-2: inputting the labeled training data and training on it to obtain the final classification model.
10. A storage medium, characterized in that the storage medium stores a program which, when executed, implements the motion recognition method based on unsupervised graph sequence predictive coding according to any one of claims 1 to 9.
CN202111009498.4A 2021-08-31 2021-08-31 Action recognition method based on unsupervised graph sequence predictive coding and storage medium Active CN113780129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111009498.4A CN113780129B (en) 2021-08-31 2021-08-31 Action recognition method based on unsupervised graph sequence predictive coding and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111009498.4A CN113780129B (en) 2021-08-31 2021-08-31 Action recognition method based on unsupervised graph sequence predictive coding and storage medium

Publications (2)

Publication Number Publication Date
CN113780129A true CN113780129A (en) 2021-12-10
CN113780129B CN113780129B (en) 2023-07-04

Family

ID=78840308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111009498.4A Active CN113780129B (en) 2021-08-31 2021-08-31 Action recognition method based on unsupervised graph sequence predictive coding and storage medium

Country Status (1)

Country Link
CN (1) CN113780129B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059620A (en) * 2019-04-17 2019-07-26 安徽艾睿思智能科技有限公司 Bone Activity recognition method based on space-time attention
WO2021069945A1 (en) * 2019-10-09 2021-04-15 Toyota Motor Europe Method for recognizing activities using separate spatial and temporal attention weights
CN111339942A (en) * 2020-02-26 2020-06-26 山东大学 Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN111310707A (en) * 2020-02-28 2020-06-19 山东大学 Skeleton-based method and system for recognizing attention network actions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUAN Shanshan; ZHANG Yinong: "3D human action recognition based on residual spatio-temporal graph convolutional networks" (基于残差时空图卷积网络的3D人体行为识别), 计算机应用与软件 (Computer Applications and Software), no. 03 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019397A (en) * 2022-06-15 2022-09-06 北京大学深圳研究生院 Comparison self-monitoring human behavior recognition method and system based on temporal-spatial information aggregation
CN115019397B (en) * 2022-06-15 2024-04-19 北京大学深圳研究生院 Method and system for identifying contrasting self-supervision human body behaviors based on time-space information aggregation
CN115035606A (en) * 2022-08-11 2022-09-09 天津大学 Bone action recognition method based on segment-driven contrast learning
CN115035606B (en) * 2022-08-11 2022-10-21 天津大学 Bone action recognition method based on segment-driven contrast learning

Also Published As

Publication number Publication date
CN113780129B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
Yang et al. Fsa-net: Learning fine-grained structure aggregation for head pose estimation from a single image
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN108171209B (en) Face age estimation method for metric learning based on convolutional neural network
CN106919903B (en) robust continuous emotion tracking method based on deep learning
Wang et al. Deep learning algorithms with applications to video analytics for a smart city: A survey
Mukhopadhyay et al. Facial emotion recognition based on textural pattern and convolutional neural network
CN113158815B (en) Unsupervised pedestrian re-identification method, system and computer readable medium
CN111582210B (en) Human body behavior recognition method based on quantum neural network
CN113780129B (en) Action recognition method based on unsupervised graph sequence predictive coding and storage medium
CN112307995A (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN115100709B (en) Feature separation image face recognition and age estimation method
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN112070010B (en) Pedestrian re-recognition method for enhancing local feature learning by combining multiple-loss dynamic training strategies
CN112651940A (en) Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN111723667A (en) Human body joint point coordinate-based intelligent lamp pole crowd behavior identification method and device
Xu et al. Task-aware meta-learning paradigm for universal structural damage segmentation using limited images
CN111242003B (en) Video salient object detection method based on multi-scale constrained self-attention mechanism
CN111209886B (en) Rapid pedestrian re-identification method based on deep neural network
CN110135253B (en) Finger vein authentication method based on long-term recursive convolutional neural network
CN116758621A (en) Self-attention mechanism-based face expression depth convolution identification method for shielding people
Nimbarte et al. Biased face patching approach for age invariant face recognition using convolutional neural network
CN112818887B (en) Human skeleton sequence behavior identification method based on unsupervised learning
CN114821631A (en) Pedestrian feature extraction method based on attention mechanism and multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant