CN113762082B - Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder - Google Patents
Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder
- Publication number: CN113762082B
- Application number: CN202110908006.9A
- Authority: CN (China)
- Prior art keywords: skeleton, graph convolution, sequence, representing, action sequence
- Prior art date: 2021-08-09
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/24147: Pattern recognition; classification techniques; distances to closest patterns, e.g. nearest neighbour classification
- G06N3/044: Computing arrangements based on biological models; neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/088: Computing arrangements based on biological models; neural network learning methods; non-supervised learning, e.g. competitive learning
Abstract
The invention relates to an unsupervised skeleton action recognition method based on a recurrent graph convolutional autoencoder. The method comprises: inputting a human skeleton action sequence into the recurrent graph convolutional encoder; obtaining a representation vector of the action sequence from the encoder output; and computing the recognition category of the human skeleton action sequence from the representation vector with a weighted nearest-neighbour classification algorithm. The recurrent graph convolutional encoder comprises: multi-layer spatial joint attention modules, which combine the human skeleton action sequence with the encoder's hidden layer to adaptively weigh the importance of different joints in different actions, yielding a weighted skeleton sequence; and multi-layer graph-convolutional gated recurrent unit layers, which integrate the connectivity features of the weighted skeleton sequence to produce the representation vector of the action sequence. Compared with the prior art, the invention significantly improves the recognition accuracy of unsupervised action recognition systems and has broad application prospects.
Description
Technical Field

The present invention relates to the technical fields of computer vision and action recognition, and in particular to an unsupervised skeleton action recognition method based on a recurrent graph convolutional autoencoder.

Background Art

The movements of natural organisms such as humans and animals, including whole-body movements and partial movements of the head, limbs, hands, and eyes, are usually called biological motion. These forms of movement are critical for humans to perceive dynamic environmental changes and to infer the intentions of other people or other species. Recognizing and understanding the actions of an observed individual is a basic faculty of human visual perception, and the ability to recognize actions in different scenarios is likewise crucial. For these reasons, the task of human action recognition has attracted the attention of a large number of researchers in the field of computer vision. Because action recognition is widely applied, for example in video surveillance, human-computer interaction, and motion analysis, it has gradually developed into an important research direction. Research on human action recognition can be traced back to 1973, when Johansson observed experimentally that human actions are realized mainly through the movement of a few key skeletal points of the body: combining and tracking 10 to 12 key points suffices to depict actions such as walking, running, and dancing, and thereby to recognize human movements.

In recent years, with the advent and rapid development of depth sensors such as Kinect and RealSense, the RGB, depth, and skeleton information of images can be obtained far more conveniently, which has brought great progress to the field of action recognition. Most early action recognition methods were based on video sequences, but they suffer from high computational complexity and are easily affected by extraneous factors. Skeleton information, by contrast, is very robust to factors such as human appearance, environmental interaction, and viewpoint changes, while being computationally cheap and easy to store. Action recognition based on skeleton data has therefore become a rapidly developing research direction: effective action recognition can be carried out using the motion of these key points.

At present, research on skeleton-based action recognition is also changing rapidly with the development of deep learning, from the earliest recognition based on hand-crafted features to current methods using recurrent and convolutional neural networks. However, these methods cannot exploit the topological structure inherent in skeleton data, so the recognition accuracy still leaves room for improvement.
Summary of the Invention

The purpose of the present invention is to overcome the above-mentioned shortcomings of the prior art by providing an unsupervised skeleton action recognition method based on a recurrent graph convolutional autoencoder.

The object of the present invention can be achieved through the following technical solutions:

An unsupervised skeleton action recognition method based on a recurrent graph convolutional autoencoder, comprising the following steps:

S1. Input the human skeleton action sequence into the recurrent graph convolutional encoder.

S2. Obtain the representation vector of the action sequence from the output of the recurrent graph convolutional encoder.

S3. Compute the recognition category of the human skeleton action sequence from its representation vector using the weighted nearest-neighbour classification algorithm.

The recurrent graph convolutional encoder comprises: multi-layer spatial joint attention modules, which combine the human skeleton action sequence with the encoder's hidden layer to adaptively weigh the importance of different joints in different actions, yielding a weighted skeleton sequence; and multi-layer graph-convolutional gated recurrent unit layers, which integrate the connectivity features of the weighted skeleton sequence to obtain the representation vector of the action sequence.
Further, in the spatial joint attention module, the weighted skeleton sequence is computed as:

$$x'_t = (\alpha_t + 1) \cdot x_t$$

$$s_t = U_s\,\phi\left(W_x x_t + W_h h_{t-1} + b_s\right) + b_u$$

where $x'_t$ denotes the weighted skeleton sequence, $\alpha_t$ the importance of each joint, $s_t$ the importance score of each joint, $x_t$ the coordinates of the $N$ joints at time $t$, $h_{t-1}$ the hidden-layer state, $W_x$ and $W_h$ learnable parameter matrices, $\phi$ the activation function, and $b_s$ and $b_u$ biases.
Further, in the graph-convolutional gated recurrent unit layer, the connectivity features of the weighted skeleton sequence are integrated as:

$$H^{(l+1)} = \tau\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,H^{(l)}\,\Theta^{(l)}\right), \qquad \hat{A} = A + I$$

where $H^{(l+1)}$ denotes the output of layer $l+1$ of the graph convolution, $\hat{A}$ the symmetric adjacency matrix with self-loops, $A$ the adjacency matrix, $I$ the identity matrix, $\hat{D}$ the degree matrix of $\hat{A}$, $\tau$ the activation function, $H^{(l)}$ the output of layer $l$ of the graph convolution, and $\Theta^{(l)}$ a learnable parameter matrix of layer $l$.
Further, the graph-convolutional gated recurrent unit layer is expressed as:

$$z_t = \sigma\left(W_{xz} \star_{\mathcal{G}} x'_t + W_{hz} \star_{\mathcal{G}} h_{t-1}\right)$$

$$r_t = \sigma\left(W_{xr} \star_{\mathcal{G}} x'_t + W_{hr} \star_{\mathcal{G}} h_{t-1}\right)$$

$$\tilde{h}_t = \tanh\left(W_{xh} \star_{\mathcal{G}} x'_t + W_{hh} \star_{\mathcal{G}} \left(r_t \odot h_{t-1}\right)\right)$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

where $z_t$ denotes the update gate, $r_t$ the reset gate, $\tilde{h}_t$ the candidate activation vector, $\star_{\mathcal{G}}$ the graph convolution corresponding to $H^{(l+1)}$ above, $W_{xz}$, $W_{hz}$, $W_{xr}$, $W_{hr}$, $W_{xh}$ and $W_{hh}$ the parameter matrices of the different gates, and $\odot$ the Hadamard product.
Further, the training of the recurrent graph convolutional encoder comprises the following steps:

A1. Input the training action sequence set into the recurrent graph convolutional encoder to obtain the representation vectors of the action sequences.

A2. Input the representation vector of the action sequence together with a hidden-layer vector into the decoder, and restore the sequence to obtain the reconstructed action sequence set.

A3. Compare the reconstructed action sequence set with the training action sequence set, and compute the loss value with the reconstruction loss function.

Steps A1 to A3 are repeated until the loss value reaches a preset stopping criterion.

Further, the hidden-layer vector is an all-zero vector of the same length as the human skeleton action sequence.
Further, the reconstruction loss function is expressed as:

$$L = \left\| X - \hat{X} \right\|_F^2$$

where $X$ denotes the training action sequence set, $\hat{X}$ the reconstructed action sequence set, $\|\cdot\|_F$ the Frobenius norm, and $L$ the loss value.
Further, the recurrent graph convolutional encoder is trained by gradient descent.

Further, the spatial joint attention modules and the graph-convolutional gated recurrent unit layers each comprise three layers.
Further, in the weighted nearest-neighbour classification algorithm, after the $k$ closest samples are obtained, where $k$ is a preset value, the number of votes for each category is computed and the recognition result is obtained by weighted voting. The voting weight $w_i$ of sample $i$ is an exponential function of its cosine distance $d_i$ to the query, so that closer samples receive exponentially larger voting weights.
Compared with the prior art, the present invention has the following beneficial effects:

The present invention applies a recurrent graph convolutional encoder to skeleton action recognition and equips it with multi-layer spatial joint attention modules, so that the recognition process takes the spatial topology of the skeleton sequence data into account and exploits the spatio-temporal dependencies of the action sequence, improving recognition accuracy. In addition, the present invention adopts the weighted nearest-neighbour classification algorithm as the final classifier, using an exponentially growing weighting scheme to ensure that samples beneficial to the result carry larger voting weights, which further improves recognition accuracy.
Brief Description of the Drawings

Figure 1 is a schematic diagram of the overall flow of the present invention.

Figure 2 is a schematic diagram of the spatial joint attention module.

Figure 3 is a schematic diagram of the graph-convolutional gated recurrent unit layer.
Detailed Description of the Embodiments

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. This embodiment is implemented on the basis of the technical solution of the present invention and gives a detailed implementation and a specific operating procedure, but the protection scope of the present invention is not limited to the following embodiments.

This embodiment provides an unsupervised skeleton action recognition method based on a recurrent graph convolutional autoencoder, intended to solve the problem that existing unsupervised action recognition methods ignore the spatial dependencies of action sequences, and to improve the accuracy of action recognition.
As shown by the solid-line flow in Figure 1, the specific steps of this embodiment are as follows:

Step S1. The human skeleton action sequence is preprocessed and input into the recurrent graph convolutional encoder.

Step S2. The recurrent graph convolutional encoder outputs the representation vector of the action sequence. The encoder comprises: multi-layer spatial joint attention modules, which combine the human skeleton action sequence with the encoder's hidden layer to adaptively weigh the importance of different joints in different actions, yielding a weighted skeleton sequence; and multi-layer graph-convolutional gated recurrent unit layers (graph-convolutional GRU layers), which integrate the connectivity features of the weighted skeleton sequence to obtain the representation vector of the action sequence. Three layers are preferably used for both the spatial joint attention modules and the graph-convolutional GRU layers.

Step S3. The representation vector of the action sequence is classified with the weighted nearest-neighbour classification algorithm to obtain the recognition category of the human skeleton action sequence, completing the action recognition flow.
The training of the recurrent graph convolutional encoder follows the dotted-line flow in Figure 1 and is carried out by gradient descent, as follows:

Step A1. The training action sequence set is input into the recurrent graph convolutional encoder to obtain the representation vectors of the action sequences.

Step A2. The representation vector of the action sequence and a hidden-layer vector are input into the decoder, and the sequence is restored to obtain the reconstructed action sequence set.

Step A3. The reconstructed action sequence set is compared with the training action sequence set, and the loss value is computed with the reconstruction loss function.

Steps A1 to A3 are repeated until the loss value reaches the preset stopping criterion.
In the above training process, the reconstruction loss function is:

$$L = \left\| X - \hat{X} \right\|_F^2$$

where $X$ denotes the training action sequence set, $\hat{X}$ the reconstructed action sequence set, $\|\cdot\|_F$ the Frobenius norm, and $L$ the loss value.
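As a concrete illustration, the A1-A3 training loop can be sketched in PyTorch as follows. This is a minimal sketch, assuming plain GRU modules as stand-ins for the full attention-plus-graph-convolution encoder and decoder detailed below; the dimensions and variable names are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (e.g. 25 joints x 3 coordinates per frame).
feat_dim, hidden_dim, seq_len, batch = 75, 256, 50, 16

encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)   # stand-in encoder
decoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)   # stand-in decoder
readout = nn.Linear(hidden_dim, feat_dim)                  # maps back to joint coordinates
params = list(encoder.parameters()) + list(decoder.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)              # gradient-descent training

X = torch.randn(batch, seq_len, feat_dim)                  # stand-in training sequences

for step in range(100):
    _, z = encoder(X)                     # A1: representation vector of each sequence
    zeros = torch.zeros_like(X)           # all-zero decoder input, same size as x_t
    H, _ = decoder(zeros, z)              # A2: decoder depends only on the encoder state
    X_rec = readout(H)                    # reconstructed action sequences
    loss = (X_rec - X).pow(2).sum(dim=(1, 2)).mean()   # A3: squared Frobenius norm
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                      # repeat until the stopping criterion is met
```

Feeding the decoder all-zero inputs while initializing it from the representation vector mirrors the design choice described in part 3 below.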
In the following, this embodiment is described in detail in several parts.

1. The spatial joint attention module is shown in Figure 2. It combines the human skeleton action sequence $x_t$ with the hidden layer $h_{t-1}$ of the recurrent graph convolutional encoder, adaptively weighs the importance of different joints in different actions, and yields the weighted skeleton sequence $x'_t$, which is computed as follows.
First, the importance score $s_t$ of each joint is computed as:

$$s_t = U_s\,\phi\left(W_x x_t + W_h h_{t-1} + b_s\right) + b_u$$

where $x_t$ denotes the coordinates of the $N$ joints at time $t$, $h_{t-1}$ the hidden-layer state, $W_x$ and $W_h$ learnable parameter matrices, $\phi$ the activation function, and $b_s$ and $b_u$ biases.
Then, the importance $\alpha_t$ of each joint is obtained by normalizing the importance scores $s_t$ across the joints.
x′t=(αt+1)·xt x′ t =(α t +1)·x t
其中,·代表点乘。Among them, · represents dot product.
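For illustration, the attention computation above can be sketched in PyTorch roughly as follows. The class and parameter names are assumptions of this sketch, and the normalization of the scores $s_t$ into $\alpha_t$ is taken to be a softmax over the joints, which the text does not spell out.

```python
import torch
import torch.nn as nn

class SpatialJointAttention(nn.Module):
    """Sketch of the spatial joint attention module:
    s_t = U_s * phi(W_x x_t + W_h h_{t-1} + b_s) + b_u, then
    x'_t = (alpha_t + 1) * x_t with alpha_t a normalization of s_t."""

    def __init__(self, in_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.W_x = nn.Linear(in_dim, attn_dim)                  # W_x x_t + b_s
        self.W_h = nn.Linear(hidden_dim, attn_dim, bias=False)  # W_h h_{t-1}
        self.U_s = nn.Linear(attn_dim, 1)                       # U_s(...) + b_u
        self.phi = nn.Tanh()                                    # activation phi

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # x_t: (batch, N, in_dim) joint coordinates at time t
        # h_prev: (batch, N, hidden_dim) previous hidden state of the encoder
        s_t = self.U_s(self.phi(self.W_x(x_t) + self.W_h(h_prev)))  # (batch, N, 1)
        alpha_t = torch.softmax(s_t, dim=1)     # assumed normalization over joints
        return (alpha_t + 1.0) * x_t            # weighted skeleton sequence x'_t

# Example: 25 joints with 3D coordinates and a 256-dimensional hidden state.
attn = SpatialJointAttention(in_dim=3, hidden_dim=256, attn_dim=64)
x_weighted = attn(torch.randn(8, 25, 3), torch.randn(8, 25, 256))
```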
2. The multi-layer graph-convolutional gated recurrent unit layer is shown in Figure 3. It integrates the connectivity features of the weighted skeleton sequence, making full use of the spatial dependencies between the joints of each frame while retaining features along the temporal dimension, to obtain the representation vector of the action sequence.
In the graph-convolutional gated recurrent unit layer, the connectivity features of the weighted skeleton sequence are integrated as:

$$H^{(l+1)} = \tau\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,H^{(l)}\,\Theta^{(l)}\right), \qquad \hat{A} = A + I$$

where $H^{(l+1)}$ denotes the output of layer $l+1$ of the graph convolution, $\hat{A}$ the symmetric adjacency matrix with self-loops, $A$ the adjacency matrix of the graph, $I$ the identity matrix, $\hat{D}$ the degree matrix of $\hat{A}$, $\tau$ the activation function, $H^{(l)}$ the output of layer $l$ of the graph convolution, and $\Theta^{(l)}$ a learnable parameter matrix of layer $l$.
The graph-convolutional gated recurrent unit layer combines graph convolution with gated recurrent units:

$$z_t = \sigma\left(W_{xz} \star_{\mathcal{G}} x'_t + W_{hz} \star_{\mathcal{G}} h_{t-1}\right)$$

$$r_t = \sigma\left(W_{xr} \star_{\mathcal{G}} x'_t + W_{hr} \star_{\mathcal{G}} h_{t-1}\right)$$

$$\tilde{h}_t = \tanh\left(W_{xh} \star_{\mathcal{G}} x'_t + W_{hh} \star_{\mathcal{G}} \left(r_t \odot h_{t-1}\right)\right)$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

where $z_t$ denotes the update gate, $r_t$ the reset gate, $\tilde{h}_t$ the candidate activation vector, $\star_{\mathcal{G}}$ the graph convolution corresponding to $H^{(l+1)}$ above, $W_{xz}$, $W_{hz}$, $W_{xr}$, $W_{hr}$, $W_{xh}$ and $W_{hh}$ the parameter matrices of the different gates, and $\odot$ the Hadamard product.
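A minimal PyTorch sketch of one such cell is given below, combining the symmetric normalized adjacency defined above with GRU-style gating; the names, shapes, and the use of one linear map per gate are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def normalized_adjacency(A: torch.Tensor) -> torch.Tensor:
    """D^{-1/2} (A + I) D^{-1/2}: symmetric adjacency with self-loops.
    The self-loops guarantee every degree is at least one."""
    A_hat = A + torch.eye(A.size(0), device=A.device)
    d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

class GCGRUCell(nn.Module):
    """Sketch of a graph-convolutional GRU cell: each gate replaces the
    dense matrix products of a vanilla GRU with graph convolutions."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.W_xz = nn.Linear(in_dim, hidden_dim)       # update gate, input part
        self.W_hz = nn.Linear(hidden_dim, hidden_dim)   # update gate, hidden part
        self.W_xr = nn.Linear(in_dim, hidden_dim)       # reset gate, input part
        self.W_hr = nn.Linear(hidden_dim, hidden_dim)   # reset gate, hidden part
        self.W_xh = nn.Linear(in_dim, hidden_dim)       # candidate, input part
        self.W_hh = nn.Linear(hidden_dim, hidden_dim)   # candidate, hidden part

    def forward(self, x_t, h_prev, A_norm):
        # x_t: (batch, N, in_dim); h_prev: (batch, N, hidden_dim); A_norm: (N, N)
        gc = lambda lin, v: torch.einsum("ij,bjk->bik", A_norm, lin(v))  # graph conv
        z_t = torch.sigmoid(gc(self.W_xz, x_t) + gc(self.W_hz, h_prev))  # update gate
        r_t = torch.sigmoid(gc(self.W_xr, x_t) + gc(self.W_hr, h_prev))  # reset gate
        h_cand = torch.tanh(gc(self.W_xh, x_t) + gc(self.W_hh, r_t * h_prev))
        return z_t * h_prev + (1.0 - z_t) * h_cand      # new hidden state h_t
```

Applying three such cells per frame of the weighted skeleton sequence, as the embodiment prefers, yields the encoder's representation vector.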
3. In the training of the recurrent graph convolutional encoder, the input of the decoder is the representation vector of the action sequence together with a hidden-layer vector. To make the decoder rely entirely on the state passed by the encoder, and thereby force the encoder to learn a better feature representation, the hidden-layer vector is taken to be an all-zero vector of the same size as $x_t$.
4. This embodiment adopts the weighted nearest-neighbour classification algorithm as the classifier, obtaining the recognition category of the human skeleton action sequence from the representation vector of the action sequence. Specifically, after the $k$ closest samples are obtained, the number of votes for each category is computed and the recognition result is obtained by weighted voting; in this example, $k = 9$. The voting weight $w_i$ of sample $i$ is an exponential function of its cosine distance $d_i$, so that closer samples receive exponentially larger voting weights.
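A sketch of this classifier is shown below. The exact exponential weight formula is not reproduced in the text, so the sketch assumes $w_i = e^{1/d_i}$ as one weighting consistent with the exponential idea described above; the function and variable names are likewise illustrative.

```python
import numpy as np

def weighted_knn_predict(query, train_feats, train_labels, k=9):
    """Weighted k-nearest-neighbour vote over representation vectors.
    The weight w_i = exp(1/d_i) is an assumed form: it only needs to
    grow steeply as the cosine distance d_i shrinks."""
    q = query / np.linalg.norm(query)
    F = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    d = 1.0 - F @ q                            # cosine distance to each sample
    votes = {}
    for i in np.argsort(d)[:k]:                # the k closest samples
        w_i = np.exp(min(1.0 / max(d[i], 1e-6), 700.0))  # capped to avoid overflow
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + w_i
    return max(votes, key=votes.get)           # category with the largest weighted vote
```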
To support and verify the performance of the action recognition method proposed by the present invention, this embodiment uses recognition accuracy as the evaluation metric on three public benchmark datasets and compares the present invention with other conventional unsupervised skeleton action recognition methods, including LongT GAN (Long-Term Dynamics GAN), P&C (Predict & Cluster), and MS2L (Multi-Task Self-Supervised Learning).

Table 1 compares the recognition accuracy of the present invention with other skeleton-based unsupervised action recognition methods on the NTU-RGB+D 60 dataset. CS (Cross-Subject) and CV (Cross-View) denote the two evaluation protocols of this dataset: under CS the training and test sets are split according to the volunteers from whom the data were collected, while under CV they are split according to the camera viewpoints used to collect the data.
Table 1. Comparison of recognition accuracy (%) on the NTU-RGB+D 60 dataset
As can be seen from Table 1, under both evaluation protocols of the NTU-RGB+D 60 dataset, CS and CV, the proposed unsupervised human action recognition method based on a recurrent graph convolutional autoencoder outperforms existing methods, by 1.8 and 2.9 percentage points respectively.

Table 2 compares the recognition accuracy of the present invention with other skeleton-based unsupervised action recognition methods on the NW-UCLA dataset.
Table 2. Comparison of recognition accuracy (%) on the NW-UCLA dataset
As can be seen from Table 2, although existing methods already achieve more than 80% accuracy on the NW-UCLA dataset, the proposed method can still further improve the recognition accuracy.

Table 3 compares the recognition accuracy of the present invention with other skeleton-based unsupervised action recognition methods on the UWA3D dataset. V3 and V4 denote two evaluation protocols of the UWA3D dataset.
Table 3. Comparison of recognition accuracy (%) on the UWA3D dataset
As can be seen from Table 3, compared with other skeleton-based unsupervised action recognition methods on the UWA3D dataset, the proposed method achieves better recognition accuracy under all evaluation protocols, with a 2.4% improvement under the V4 protocol. Together, the embodiments on these three datasets show that the proposed unsupervised human skeleton action recognition method based on a recurrent graph convolutional autoencoder stably achieves excellent recognition accuracy on different datasets and under different test conditions.

The preferred embodiments of the present invention are described in detail above. It should be understood that those skilled in the art can make many modifications and changes based on the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art in accordance with the concept of the present invention shall fall within the protection scope determined by the claims.
Claims (7)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110908006.9A (CN113762082B) | 2021-08-09 | 2021-08-09 | Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113762082A | 2021-12-07 |
| CN113762082B | 2024-02-27 |
Family ID: 78788720

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110908006.9A | Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder | 2021-08-09 | 2021-08-09 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113762082B (en) |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116434347B | 2023-06-12 | 2023-10-13 | 中山大学 | Skeleton sequence identification method and system based on mask pattern self-encoder |
Citations (2)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN112597883A | 2020-12-22 | 2021-04-02 | Human skeleton action recognition method based on generalized graph convolution and reinforcement learning |
| CN112733656A | 2020-12-30 | 2021-04-30 | Skeleton action identification method based on multi-stream space attention diagram convolution SRU network |
Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11449537B2 | 2018-12-18 | 2022-09-20 | Adobe Inc. | Detecting affective characteristics of text with gated convolutional encoder-decoder framework |
Non-Patent Citations (1)

- 田曼, 张艺. 多模型融合动作识别研究 (Research on multi-model fusion action recognition). 电子测量技术 (Electronic Measurement Technology), No. 20.
Also Published As

| Publication number | Publication date |
|---|---|
| CN113762082A | 2021-12-07 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |