CN113673411B - Attention mechanism-based lightweight shift graph convolution behavior identification method - Google Patents

Attention mechanism-based lightweight shift graph convolution behavior identification method

Info

Publication number
CN113673411B
CN113673411B
Authority
CN
China
Prior art keywords
information flow
data set
joint
bone length
motion information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110941050.XA
Other languages
Chinese (zh)
Other versions
CN113673411A (en)
Inventor
宋晓宁 (Song Xiaoning)
苏江毅 (Su Jiangyi)
冯振华 (Feng Zhenhua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202110941050.XA
Publication of CN113673411A
Application granted
Publication of CN113673411B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an attention mechanism-based lightweight shift graph convolution behavior recognition method. The method comprises: preprocessing a data set to generate a joint point information flow data set, a bone length information flow data set, a joint point information flow data set based on motion information, and a bone length information flow data set based on motion information; constructing an ALS-GCN network and obtaining the spatiotemporal features of the information flows through the ALS-GCN network; and fusing the spatiotemporal features of the information flows to obtain a behavior recognition result. By constructing a spatial shift module based on an attention mechanism, the method solves the problem of an overly small receptive field; by constructing a temporal shift module, it solves the problem of the excessive parameter count caused by nonlinear stacking, and thereby achieves high recognition accuracy at a low computational cost.

Description

Attention Mechanism-Based Lightweight Shift Graph Convolution Behavior Recognition Method

Technical Field

The present invention relates to the field of behavior recognition, and in particular to a lightweight shift graph convolution behavior recognition method based on an attention mechanism.

Background Art

Behavior recognition is one of the important research directions in the field of artificial intelligence, with important applications in video surveillance, intelligent monitoring and human-computer interaction. It is also a challenging task, not only because processing video clips is computationally demanding, but also because the results are easily affected by environmental factors. As a consequence, behavior recognition methods based on RGB video often struggle to satisfy timeliness and accuracy requirements at the same time. In recent years, thanks to the development and popularization of depth cameras such as the Microsoft Kinect, behavior recognition based on depth information has gradually become one of the important research directions in this field. Compared with traditional RGB data, skeleton sequences contain no color information and are therefore compact, easy to calibrate, and insensitive to appearance factors.

Early behavior recognition methods based on the human skeleton mainly characterized behaviors through hand-crafted features. However, hand-crafted features often have limited representational power and require considerable effort for parameter tuning and optimization.

Although current mainstream skeleton-based behavior recognition methods have clear advantages in representing temporal information and can extract multi-scale local patterns from different time intervals, they all suffer from certain problems, such as high optimization difficulty, easy loss of the original joint point information, excessively large parameter counts, and excessive computational requirements.

SUMMARY OF THE INVENTION

The purpose of this section is to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section and in the abstract and title of the application to avoid obscuring their purpose; such simplifications or omissions shall not be used to limit the scope of the invention.

The present invention has been made in view of the above-mentioned existing problems.

Therefore, the present invention provides an attention mechanism-based lightweight shift graph convolution behavior recognition method, which can solve the problem of an overly small receptive field and the problem of the excessive parameter count caused by nonlinear stacking.

To solve the above technical problems, the present invention provides the following technical solution: preprocessing a data set to generate a joint point information flow data set, a bone length information flow data set, a joint point information flow data set based on motion information, and a bone length information flow data set based on motion information; constructing an ALS-GCN network and obtaining the spatiotemporal features of the information flows through the ALS-GCN network; and fusing the spatiotemporal features of the information flows to obtain a behavior recognition result.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the preprocessing comprises: removing falsely detected camera data; defining an index for judging the energy of a skeleton sequence and removing falsely detected joint point data; normalizing the joint point coordinate data; and normalizing the viewing angle.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the method further comprises: performing feature extraction on the joint point information flow data set, the bone length information flow data set, the joint point information flow data set based on motion information, and the bone length information flow data set based on motion information through a convolutional neural network, to obtain the joint point information flow, the bone length information flow, the motion information flow, and the bone length information flow based on motion information.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the ALS-GCN network comprises a spatial shift module and a temporal shift module; the spatial shift module comprises a spatial attention module and a channel attention module; and the temporal shift module is optimized by setting an adaptive temporal shift graph convolution and adding short connections.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the bone length information flow comprises: defining the joint closer to the center of gravity of the skeleton as the source joint, and the joint farther from the center of gravity as the target joint; the source joint in frame t is:

V_{i,t} = (x_{i,t}, y_{i,t}, z_{i,t})

The target joint in frame t is:

V_{j,t} = (x_{j,t}, y_{j,t}, z_{j,t})

The bone length information flow e_{i,j,t} is defined as:

e_{i,j,t} = V_{j,t} - V_{i,t}

where i and j are joint indices and x, y, z are the joint coordinates.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the motion information flow comprises:

em_{i,t,t+1} = V_{i,t+1} - V_{i,t} = (x_{i,t+1} - x_{i,t}, y_{i,t+1} - y_{i,t}, z_{i,t+1} - z_{i,t})

where em_{i,t,t+1} is the motion information flow.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the bone length information flow based on motion information comprises:

bm_{i,t,t+1} = e_{i,j,t} - e_{i,j,t+1}

where bm_{i,t,t+1} is the bone length information flow based on motion information.

As a preferred solution of the attention mechanism-based lightweight shift graph convolution behavior recognition method of the present invention, the fusion comprises summing the Softmax scores of the spatiotemporal features of the information flows.

Beneficial effects of the present invention: by constructing a spatial shift module based on an attention mechanism, the invention solves the problem of an overly small receptive field; by constructing a temporal shift module, it solves the problem of the excessive parameter count caused by nonlinear stacking, so that high recognition accuracy can be achieved at a low computational cost.

Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic structural diagram of the spatial shift module 100 of the attention mechanism-based lightweight shift graph convolution behavior recognition method according to the first embodiment of the present invention;

FIG. 2 is a schematic diagram of the forward shift-in and backward shift-out operations of the attention mechanism-based lightweight shift graph convolution behavior recognition method according to the first embodiment of the present invention;

FIG. 3 is a schematic diagram of the overall network structure of the attention mechanism-based lightweight shift graph convolution behavior recognition method according to the first embodiment of the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described in detail below with reference to the drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can also be implemented in other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention; the present invention is therefore not limited by the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of "in one embodiment" in various places in this specification do not all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments.

The present invention is described in detail with reference to schematic diagrams. When the embodiments are described in detail, for convenience of explanation, cross-sectional views showing device structures may be partially enlarged out of scale; the schematic diagrams are only examples and should not limit the scope of protection of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual production.

In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "upper, lower, inner and outer" are based on the orientations or positional relationships shown in the drawings and are only for convenience of describing the present invention and simplifying the description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore cannot be understood as limiting the present invention. Furthermore, the terms "first, second or third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.

Unless otherwise expressly specified and limited in the present invention, the terms "mounted, connected, coupled" should be understood broadly; for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or internal between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific situation.

Embodiment 1

Referring to FIGS. 1-3, a first embodiment of the present invention provides an attention mechanism-based lightweight shift graph convolution behavior recognition method, comprising:

S1: preprocessing the data set to generate a joint point information flow data set, a bone length information flow data set, a joint point information flow data set based on motion information, and a bone length information flow data set based on motion information.

Preprocessing the data set includes the following steps:

(1) Removing falsely detected camera data.

To reduce camera misdetections caused by the Microsoft Kinect camera and to keep the human body IDs consistent, this embodiment first checks whether a new ID appears after an original ID disappears. If so, the newly generated ID is considered to have been wrongly assigned, and the old ID is reassigned to the newly appearing one, thereby reducing the camera misdetection rate.

(2) Defining an index for judging the energy of a skeleton sequence, and removing falsely detected joint point data.

Joint point data may contain misdetections, i.e., objects such as tables and chairs may be falsely detected as human bodies. This embodiment therefore introduces an index for judging the energy of a skeleton sequence, defined as the average of the standard deviations of the three-dimensional coordinates along the time dimension. Since the energy of a stationary object is far smaller than that of a human body during training data collection, objects whose energy falls below a set threshold can be filtered out.
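By way of illustration, a minimal sketch of this energy filter might look as follows; the array layout, the threshold value and the function names are assumptions, not taken from the patent:

```python
import numpy as np

def sequence_energy(skeleton):
    """Energy of a skeleton sequence: mean, over joints and axes, of the
    standard deviation of the 3D coordinates along the time dimension."""
    # skeleton: array of shape (T, N, 3) -- frames, joints, xyz
    return skeleton.std(axis=0).mean()

def filter_misdetections(bodies, threshold=0.05):  # threshold is an assumed value
    """Keep only tracked bodies whose sequence energy exceeds the threshold,
    discarding near-static objects (tables, chairs) misdetected as people."""
    return [b for b in bodies if sequence_energy(b) > threshold]
```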

(3) Normalizing the joint point coordinate data.

To unify the data distribution and simplify the training of the designed model, a joint point of the first person appearing in the first frame is taken as the origin of the coordinate system, and the coordinates of this center point are subtracted from the coordinates of every joint point, so that all frames are normalized into this coordinate system.
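A minimal sketch of this centering step, assuming the sequence is stored as a (T, N, 3) NumPy array; the center joint index is an illustrative assumption:

```python
def center_normalize(skeleton, center_joint=1):
    # skeleton: (T, N, 3); take the chosen joint of the first frame as origin
    origin = skeleton[0, center_joint].copy()
    return skeleton - origin  # broadcast subtraction over all frames and joints
```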

(4) Normalizing the viewing angle.

The human skeleton in each frame is rotated to a specific pose to reduce the influence of viewpoint changes on model training. Specifically, taking the human skeleton that first appears in the first frame as the reference, a rotation is applied to the skeletons in all subsequent frames so that the line connecting the left and right shoulders is parallel to the x-axis and the spine is parallel to the z-axis of the coordinate system, thereby obtaining a rotation matrix.
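One way to obtain such a rotation matrix is to orthonormalize the shoulder and spine directions of the reference skeleton; the sketch below, including its joint indices, is an illustrative assumption:

```python
import numpy as np

def view_normalize(skeleton, r_shoulder=8, l_shoulder=4, spine_base=0, spine_top=20):
    # skeleton: (T, N, 3). Build the new axes from the first frame's skeleton.
    ref = skeleton[0]
    x = ref[r_shoulder] - ref[l_shoulder]   # shoulder line becomes the x-axis
    z = ref[spine_top] - ref[spine_base]    # spine becomes the z-axis
    z = z - z.dot(x) / x.dot(x) * x         # Gram-Schmidt: make z orthogonal to x
    x, z = x / np.linalg.norm(x), z / np.linalg.norm(z)
    y = np.cross(z, x)                      # right-handed third axis
    R = np.stack([x, y, z])                 # rows of R are the new unit axes
    return skeleton @ R.T                   # rotate every joint in every frame
```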

Through the above processing, a joint point information flow data set, a bone length information flow data set, a joint point information flow data set based on motion information, and a bone length information flow data set based on motion information that can be used for network training are generated.

S2: constructing the ALS-GCN network and obtaining the spatiotemporal features of the information flows through the ALS-GCN network.

The overall architecture of the ALS-GCN network is composed of basic blocks: nine basic blocks in total, each consisting of a spatial shift module 100 and a temporal shift module 200, with output channel numbers of 64, 64, 64, 128, 128, 128, 256, 256 and 256, respectively.
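For illustration only, the stack of nine blocks with these channel widths might be laid out as in the following sketch; the 1x1 convolutions are placeholders for the spatial and temporal shift modules described below, and the pooling/classifier head is an assumption based on common practice:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """One basic block: a spatial shift module followed by a temporal shift
    module. Plain 1x1 convolutions stand in for the two modules here."""
    def __init__(self, in_c, out_c):
        super().__init__()
        self.spatial = nn.Conv2d(in_c, out_c, 1)    # placeholder: spatial shift module 100
        self.temporal = nn.Conv2d(out_c, out_c, 1)  # placeholder: temporal shift module 200
        self.relu = nn.ReLU()

    def forward(self, x):                            # x: (batch, channels, frames, joints)
        return self.relu(self.temporal(self.spatial(x)))

class ALSGCN(nn.Module):
    def __init__(self, num_classes=60, in_c=3):
        super().__init__()
        widths = [64, 64, 64, 128, 128, 128, 256, 256, 256]  # per the description
        layers, prev = [], in_c
        for w in widths:
            layers.append(BasicBlock(prev, w))
            prev = w
        self.blocks = nn.Sequential(*layers)
        self.fc = nn.Linear(widths[-1], num_classes)

    def forward(self, x):                   # x: (batch, 3, frames, joints)
        x = self.blocks(x)
        return self.fc(x.mean(dim=(2, 3)))  # global average pooling, then classify
```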

(1) Specifically, referring to FIG. 1, the spatial shift module 100 includes a spatial attention module 101 and a channel attention module 102. It should be noted that, in the global shift graph convolution operation referred to in the figure, the physical-connection constraint of the human body is removed, so that every node is connected to every other node.

① The spatial attention module 101, as reflected on an image, assigns different degrees of attention to different positions of the feature map. In mathematical terms: for a feature map of size H×W×C, an effective spatial attention corresponds to a matrix of size H×W; a pixel-wise multiplication with the pixels at the corresponding positions of the original feature map yields the weight of each position. For joint-based human skeleton data, the spatial attention module 101 helps the ALS-GCN network assign attention weights of different degrees to each joint. The module is implemented as:

M_s(f_in) = σ(g_s(AvgPool(f_in)))

where σ denotes the sigmoid activation, g_s is a one-dimensional convolution whose kernel size is given by an expression rendered only as an image in the original publication, AvgPool denotes average pooling, and f_in is the input feature (its dimensions are likewise shown only as an image). The resulting attention map based on the spatial attention mechanism is M_s ∈ R^{1×1×N}; multiplying M_s with the input feature map in a residual manner finally assigns each joint point a different degree of attention weight.
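By way of a hedged illustration, this joint-wise attention could be realized as below; the kernel size and the exact residual form are assumptions, since the corresponding expressions appear only as images in the original text:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Joint-wise attention: M_s(f_in) = sigmoid(g_s(AvgPool(f_in)))."""
    def __init__(self, kernel_size=9):  # kernel size is an assumed value
        super().__init__()
        self.g_s = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                             # x: (B, C, T, N)
        a = x.mean(dim=(1, 2))                        # average-pool channels and frames -> (B, N)
        a = torch.sigmoid(self.g_s(a.unsqueeze(1)))   # (B, 1, N): one weight per joint
        return x + x * a.unsqueeze(1)                 # residual re-weighting of each joint
```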

② The channel attention module 102 assigns a weight to each channel in order to measure the relevance of that channel to the key information; the greater the relevance, the higher the weight. On an image this appears as different degrees of attention paid to different image channels. Mathematically: for a feature map of size H×W×C, a matrix of dimension 1×1×C is generated to represent the channel attention; through channel-wise multiplication, the spatial dimensions of the input feature map are compressed and summed element by element to produce the channel attention map. For joint-based human skeleton data, the channel attention module 102 helps the model strengthen discriminative channel features according to the channel characteristics of each input sample. It is implemented as:

M_C(f_in) = σ(W_2(δ(W_1(AvgPool(f_in)))))

where δ denotes the ReLU activation and W_1 and W_2 are the weights of two fully connected layers (their dimensions are shown only as images in the original publication); here f_in is the feature averaged over joints and frames.
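A squeeze-and-excitation-style sketch consistent with the formula above; the reduction ratio is an assumption, as the dimensions of W_1 and W_2 appear only as images:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: M_C(f_in) = sigmoid(W2(relu(W1(AvgPool(f_in)))))."""
    def __init__(self, channels, reduction=4):  # reduction ratio is an assumed value
        super().__init__()
        self.w1 = nn.Linear(channels, channels // reduction)
        self.w2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                         # x: (B, C, T, N)
        a = x.mean(dim=(2, 3))                    # average over frames and joints -> (B, C)
        a = torch.sigmoid(self.w2(torch.relu(self.w1(a))))
        return x * a.unsqueeze(-1).unsqueeze(-1)  # re-weight each channel
```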

(2) The temporal shift module 200 is optimized by setting an adaptive temporal shift graph convolution and adding short connections.

The temporal shift module 200 is improved on the basis of Shift-GCN and ST-GCN. To conveniently reveal the essence of the proposed method, a skeleton sequence feature is denoted by F ∈ R^{T×N×C}, where T is the number of key frames, N the number of joint points, and C the number of channels (the exact symbol is rendered only as an image in the original publication).

Let {S_i | i = 1, 2, ..., C} be learnable shift parameters. Through the parameters S_i, the forward shift-in and backward shift-out operations can be realized, as shown in FIG. 2. As an adaptive temporal shift graph convolution, a learnable temporal offset parameter is learned for each channel; compared with manually set shift parameters, this greatly improves the generalization ability of the temporal shift graph convolution.
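A differentiable per-channel temporal shift can be sketched with linear interpolation along the frame axis; the clamping at the sequence boundaries is an assumption:

```python
import torch
import torch.nn as nn

class AdaptiveTemporalShift(nn.Module):
    """Shift every channel along time by a learnable, possibly fractional
    offset S_i, using linear interpolation so the offsets stay differentiable."""
    def __init__(self, channels):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(channels))  # one S_i per channel

    def forward(self, x):                                 # x: (B, C, T, N)
        B, C, T, N = x.shape
        t = torch.arange(T, device=x.device).float()
        pos = t.view(1, T) - self.shift.view(C, 1)        # sampling positions, (C, T)
        lo = pos.floor().clamp(0, T - 1).long()
        hi = (lo + 1).clamp(0, T - 1)
        w = (pos - lo.float()).clamp(0, 1)                # interpolation weights
        idx_lo = lo.view(1, C, T, 1).expand(B, C, T, N)
        idx_hi = hi.view(1, C, T, 1).expand(B, C, T, N)
        return (1 - w).view(1, C, T, 1) * x.gather(2, idx_lo) \
            + w.view(1, C, T, 1) * x.gather(2, idx_hi)
```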

In addition, to better represent the information of the skeleton sequence in the temporal and spatial dimensions and to ensure that intra-frame information is used multiple times, short connections are added to the original structure; this strengthens the correlation between frames and the representational power of the features, while simplifying the network structure as much as possible and improving accuracy.

Further, feature extraction is performed on the joint point information flow data set, the bone length information flow data set, the joint point information flow data set based on motion information, and the bone length information flow data set based on motion information through the convolutional neural network, to obtain the joint point information flow, the bone length information flow, the motion information flow, and the bone length information flow based on motion information (the velocity-difference information flow).

Since the joint point information flow can be obtained directly, it is not discussed in detail here; the bone length information flow, the motion information flow, and the bone length information flow based on motion information are generated as follows:

(a) Bone length information flow.

The joint closer to the center of gravity of the skeleton is defined as the source joint, and the joint farther from the center of gravity as the target joint.

The source joint in frame t is:

V_{i,t} = (x_{i,t}, y_{i,t}, z_{i,t})

The target joint in frame t is:

V_{j,t} = (x_{j,t}, y_{j,t}, z_{j,t})

The bone length information flow e_{i,j,t} is defined as:

e_{i,j,t} = V_{j,t} - V_{i,t}

where i and j are joint indices and x, y, z are the joint coordinates.

(b) Motion information flow.

The motion information flow is obtained by computing the difference between the same joint point in two adjacent frames; extracting the coordinate differences of joints and bones between two consecutive frames as the motion information flow helps extract temporal features. Its expression is:

em_{i,t,t+1} = V_{i,t+1} - V_{i,t} = (x_{i,t+1} - x_{i,t}, y_{i,t+1} - y_{i,t}, z_{i,t+1} - z_{i,t})

where em_{i,t,t+1} is the motion information flow.

(c) Bone length information flow based on motion information:

bm_{i,t,t+1} = e_{i,j,t} - e_{i,j,t+1}

where bm_{i,t,t+1} is the bone length information flow based on motion information.
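For illustration, the three derived streams can be computed from the raw joint stream in a few lines; the bone pair list and the handling of the last frame are assumptions:

```python
import numpy as np

def build_streams(joints, bone_pairs):
    """joints: (T, N, 3) array. bone_pairs: list of (source, target) joint
    indices, ordered from the skeleton's center of gravity outward."""
    src = [i for i, j in bone_pairs]
    tgt = [j for i, j in bone_pairs]
    bones = joints[:, tgt] - joints[:, src]   # e_{i,j,t} = V_{j,t} - V_{i,t}
    motion = joints[1:] - joints[:-1]         # em = V_{i,t+1} - V_{i,t}
    bone_motion = bones[:-1] - bones[1:]      # bm = e_{i,j,t} - e_{i,j,t+1}
    # motion streams have T-1 frames; padding them back to T frames is typical
    return bones, motion, bone_motion
```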

Finally, the spatiotemporal features of the information flows are obtained from the joint point information flow, the bone length information flow, the motion information flow and the motion-based bone length information flow through the ALS-GCN network; that is, the spatial features of the information flows are obtained through the spatial shift module 100, and the temporal features through the temporal shift module 200.

S3: fusing the spatiotemporal features of the information flows to obtain the behavior recognition result.

The Softmax scores of the spatiotemporal features of the information flow data sets are summed to complete the fusion. As shown in FIG. 3, the joint information, the bone information and their corresponding motion information are integrated in a multi-stream framework, and the final prediction is generated by fusion, with all streams using an ALS-GCN network of the same architecture.
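A minimal sketch of this score-level fusion; the model and stream containers are hypothetical names:

```python
import torch
import torch.nn.functional as F

def fuse_predictions(models, streams):
    """models: one trained ALS-GCN per information flow; streams: the matching
    input tensors. Returns the class predicted by the summed Softmax scores."""
    score = sum(F.softmax(m(x), dim=1) for m, x in zip(models, streams))
    return score.argmax(dim=1)
```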

Embodiment 2

This embodiment evaluates the effectiveness of the shift graph convolution-based method: the experimental results on NTU60 RGB+D and NTU120 RGB+D are compared with other state-of-the-art methods to further verify the effectiveness of the method.

Experimental setup

All experiments in this embodiment were carried out under the 64-bit Ubuntu 16.04.1 LTS operating system, and all network models were trained on four GeForce RTX 2080Ti GPUs. For the NTU60 RGB+D dataset, the batch size was set to 64 and the number of epochs to 80; stochastic gradient descent (SGD) was used with momentum 0.9 and an initial learning rate of 0.1, and the learning rate was multiplied by 0.1 at epochs 40 and 60. For the NTU120 RGB+D dataset, the batch size was likewise set to 64 and SGD was used; considering the larger size of this dataset, the number of epochs was set to 120, the initial learning rate to 0.1, and the learning rate was multiplied by 0.1 at epochs 60 and 90. All experiments were performed on the PyTorch 1.3.1 deep learning framework with Python 3.6.2.
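Expressed in PyTorch, the NTU60 RGB+D schedule corresponds roughly to the following sketch; model and train_loader are placeholders:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# multiply the learning rate by 0.1 at epochs 40 and 60 (60 and 90 for NTU120)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 60], gamma=0.1)

for epoch in range(80):        # 120 epochs for NTU120 RGB+D
    for x, y in train_loader:  # batch size 64
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```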

To verify the effect of the method, this embodiment conducts experimental comparisons on the two databases NTU60 RGB+D and NTU120 RGB+D.

In the experiments, the results of 2s-AGCN are used as the baseline, and Part-I and Part-II denote the spatial attention module 101 and the channel attention module 102, respectively. Under the X-View protocol of the NTU60 RGB+D dataset, the importance of the two attention mechanisms is tested by manually removing one of the two modules, where wo/X denotes the result after removing module X.

Table 1: Accuracy comparison based on the joint point information flow on the NTU60 RGB+D dataset.

(The entries of Table 1 appear only as images in the original publication and are not reproduced here.)

Benefiting from the success of the four-stream architecture of Shift-GCN, the effectiveness of the ALS-GCN method is verified on the basis of the joint point information flow, the bone length information flow, the motion information flow and the motion-based bone length information flow, using the X-View protocol of the NTU60 RGB+D dataset; the results are shown in Table 2. The fusion adds the Softmax scores of the individual streams to obtain the fused score. Here, 1s-ALS-GCN uses only the original skeleton coordinates as input, i.e., the classification result under the X-View protocol using only the joint point information flow; 2s-ALS-GCN denotes the result after fusing the joint point information and the bone length information; and 4s-ALS-GCN denotes the final result after fusing all four information flows.

Table 2: Accuracy (%) comparison of different information flows.

(The entries of Table 2 appear only as images in the original publication and are not reproduced here.)

To further verify the effectiveness of the method (ALS-GCN), this embodiment also compares it with other mainstream methods of recent years on the NTU60 RGB+D dataset, as shown in Table 3. The results show that, compared with non-graph-convolution methods such as AGC-LSTM and HCN, the method has clear advantages; compared with the mainstream graph convolution methods of the past two years, it also performs excellently. Although a certain gap with some of the latest methods remains, the method still has high reference value considering its low parameter count and that only 80 epochs were used.

Table 3: Accuracy (%) comparison under the X-Sub and X-View protocols on the NTU60 RGB+D dataset.

Method | Year | X-Subject (%) | X-View (%)
Lie Group | 2014 | 50.1 | 82.8
DPRL | 2018 | 83.5 | 89.8
SRN-TSL | 2018 | 84.8 | 84.8
HCN | 2018 | 86.5 | 86.5
AGC-LSTM | 2019 | 89.2 | 89.2
ST-GCN | 2018 | 81.5 | 88.3
DPRL+GCNN | 2018 | 83.5 | 89.8
AS-GCN | 2019 | 86.8 | 94.2
2s-AGCN | 2019 | 88.5 | 95.1
Mix-Dimension | 2020 | 89.7 | 96.0
PGCN-TCA | 2020 | 88.0 | 93.6
PA-ResGCN-B19 | 2020 | 90.9 | 96.0
MS-AAGCN | 2020 | 90.0 | 96.2
FGCN-spatial+FGCN-motion | 2020 | 90.2 | 96.3
CGCN | 2020 | 90.3 | 96.4
Shift-GCN | 2020 | 90.7 | 96.5
DC-GCN+ADG | 2020 | 90.8 | 96.6
MDM-GCN | 2019 | 89.2 | 95.9
4s-ALS-GCN | 2021 | 90.5 | 96.5

To better demonstrate the superiority of the method, this embodiment also performs a comparison on the NTU120 RGB+D dataset; the specific experimental results are shown in Table 4. Compared with some methods from 2020 the method still has shortcomings, but considering that the NTU120 RGB+D dataset is twice the size of the NTU60 RGB+D dataset while only 120 epochs were used, the performance of the method on NTU120 RGB+D is still remarkable.

Table 4: Accuracy (%) comparison under the X-Subject and X-Set protocols on the NTU120 RGB+D dataset.

Method | Year | X-Subject (%) | X-Set (%)
Body Pose Evolution Map | 2018 | 64.6 | 66.9
TSRJI (Late Fusion) | 2019 | 67.9 | 62.8
Logsin-RNN | 2019 | 68.3 | 67.2
GVFE+AS-GCN with DH-TCN | 2019 | 78.3 | 79.8
SGN | 2020 | 79.2 | 81.5
Mix-Dimension | 2020 | 80.5 | 83.2
ST-TR-agcn | 2020 | 82.7 | 84.7
FGCN | 2020 | 85.4 | 87.4
Shift-GCN | 2020 | 85.9 | 87.6
VPN | 2020 | 86.3 | 87.8
1s-ALS-GCN | 2021 | 80.9 | 84.7
2s-ALS-GCN | 2021 | 85.5 | 88.2
4s-ALS-GCN | 2021 | 86.3 | 89.1

Finally, to show that the method can significantly improve accuracy while keeping the parameter count in check, the parameter counts of the method are compared under the X-Subject and X-View protocols of the NTU60 RGB+D dataset, as shown in Table 5. The comparison shows that the method is the most balanced: it not only keeps the parameter count within a suitable range but also achieves higher accuracy than the methods of the past two years.

Table 5: Comparison of the parameter counts of different methods.

(The entries of Table 5 appear only as images in the original publication and are not reproduced here.)

The experimental results on the behavior recognition datasets NTU60 RGB+D and NTU120 RGB+D show that the method can achieve high accuracy at a low computational cost.

It can also be seen that the method solves the problem of an overly small receptive field by constructing the spatial shift module 100, and solves the problem of the excessive parameter count caused by nonlinear stacking through the temporal shift module 200.

It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications shall be covered by the scope of the claims of the present invention.

Claims (2)

1. An attention mechanism-based lightweight shift graph convolution behavior recognition method, characterized by comprising:

preprocessing a data set to generate a joint point information flow data set, a bone length information flow data set, a joint point information flow data set based on motion information, and a bone length information flow data set based on motion information;

the preprocessing comprising:
removing falsely detected camera data;
defining an index for judging the energy of a skeleton sequence, and removing falsely detected joint point data;
normalizing the joint point coordinate data;
normalizing the viewing angle;

performing feature extraction on the joint point information flow data set, the bone length information flow data set, the joint point information flow data set based on motion information, and the bone length information flow data set based on motion information through a convolutional neural network, to obtain a joint point information flow, a bone length information flow, a motion information flow, and a bone length information flow based on motion information;

the bone length information flow comprising:
defining the joint closer to the center of gravity of the skeleton as the source joint, and the joint farther from the center of gravity as the target joint;
the source joint in frame t being:
V_{i,t} = (x_{i,t}, y_{i,t}, z_{i,t})
the target joint in frame t being:
V_{j,t} = (x_{j,t}, y_{j,t}, z_{j,t})
the bone length information flow e_{i,j,t} being defined as:
e_{i,j,t} = V_{j,t} - V_{i,t}
where i and j are joint points and x, y, z are joint coordinates;

the motion information flow comprising:
em_{i,t,t+1} = V_{i,t+1} - V_{i,t} = (x_{i,t+1} - x_{i,t}, y_{i,t+1} - y_{i,t}, z_{i,t+1} - z_{i,t})
where em_{i,t,t+1} is the motion information flow;

the bone length information flow based on motion information comprising:
bm_{i,t,t+1} = e_{i,j,t} - e_{i,j,t+1}
where bm_{i,t,t+1} is the bone length information flow based on motion information;

constructing an ALS-GCN network, and obtaining spatiotemporal features of the information flows through the ALS-GCN network;
the ALS-GCN network comprising a spatial shift module (100) and a temporal shift module (200);
the spatial shift module (100) comprising a spatial attention module (101) and a channel attention module (102);
optimizing the temporal shift module (200) by setting an adaptive temporal shift graph convolution and adding short connections; and

fusing the spatiotemporal features of the information flows to obtain a behavior recognition result.

2. The attention mechanism-based lightweight shift graph convolution behavior recognition method according to claim 1, wherein the fusing comprises:
summing the Softmax scores of the spatiotemporal features of the information flows.
CN202110941050.XA 2021-08-17 2021-08-17 Attention mechanism-based lightweight shift graph convolution behavior identification method Active CN113673411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110941050.XA CN113673411B (en) 2021-08-17 2021-08-17 Attention mechanism-based lightweight shift graph convolution behavior identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110941050.XA CN113673411B (en) 2021-08-17 2021-08-17 Attention mechanism-based lightweight shift graph convolution behavior identification method

Publications (2)

Publication Number Publication Date
CN113673411A CN113673411A (en) 2021-11-19
CN113673411B 2022-10-18

Family

ID=78543232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110941050.XA Active CN113673411B (en) 2021-08-17 2021-08-17 Attention mechanism-based lightweight shift graph convolution behavior identification method

Country Status (1)

Country Link
CN (1) CN113673411B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463840B (en) * 2021-12-31 2024-08-02 北京工业大学 Skeleton-based shift chart convolution network human body behavior recognition method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10984245B1 (en) * 2018-06-11 2021-04-20 Facebook, Inc. Convolutional neural network based on groupwise convolution for efficient video analysis
CN111652124A (en) * 2020-06-02 2020-09-11 电子科技大学 A Construction Method of Human Action Recognition Model Based on Graph Convolutional Network
CN112347964B (en) * 2020-11-16 2023-03-24 复旦大学 Behavior detection method and device based on graph network

Also Published As

Publication number Publication date
CN113673411A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN110378208B (en) A Behavior Recognition Method Based on Deep Residual Networks
Xu et al. Multi-scale skeleton adaptive weighted GCN for skeleton-based human action recognition in IoT
CN110222718B (en) Image processing method and device
CN107729993A (en) Utilize training sample and the 3D convolutional neural networks construction methods of compromise measurement
CN113128424A (en) Attention mechanism-based graph convolution neural network action identification method
CN113221663A (en) Real-time sign language intelligent identification method, device and system
CN115273244B (en) Human body action recognition method and system based on graph neural network
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN112597324A (en) Image hash index construction method, system and equipment based on correlation filtering
CN116798070A (en) A cross-modal person re-identification method based on spectral perception and attention mechanism
Wang et al. Frame-level refinement networks for skeleton-based gait recognition
CN115546888A (en) A Convolutional Pose Estimation Method Based on Body Part Grouping with Symmetrical Semantic Graphs
CN115035298A (en) City streetscape semantic segmentation enhancement method based on multi-dimensional attention mechanism
CN114708649A (en) Behavior identification method based on integrated learning method and time attention diagram convolution
Zhang et al. Learning enriched hop-aware correlation for robust 3d human pose estimation
CN113673411B (en) Attention mechanism-based lightweight shift graph convolution behavior identification method
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
Bai et al. Double chain networks for monocular 3D human pose estimation
CN109559345B (en) Garment key point positioning system and training and positioning method thereof
CN118135659A (en) A cross-view gait recognition method based on multi-scale skeleton spatiotemporal feature extraction
CN117392752A (en) Human skeleton behavior recognition method, system and equipment based on local-global feature extraction and fusion
Jang et al. Ghost graph convolutional network for skeleton-based action recognition
Sun et al. Information enhanced graph convolutional networks for skeleton-based action recognition
CN113159007B (en) Gait emotion recognition method based on adaptive graph convolution
CN113298037B (en) Vehicle weight recognition method based on capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant