CN115862150B - Diver action recognition method based on three-dimensional human body skin - Google Patents

Diver action recognition method based on three-dimensional human body skin

Info

Publication number
CN115862150B
CN115862150B (application CN202310015851.2A)
Authority
CN
China
Prior art keywords
module
information
tca
diver
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310015851.2A
Other languages
Chinese (zh)
Other versions
CN115862150A (en)
Inventor
姜宇
赵明浩
齐红
王跃航
王光诚
魏枫林
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202310015851.2A priority Critical patent/CN115862150B/en
Publication of CN115862150A publication Critical patent/CN115862150A/en
Application granted granted Critical
Publication of CN115862150B publication Critical patent/CN115862150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a diver action recognition method based on three-dimensional human body skin. The invention relates to the technical field of computer vision, which extracts human body shape, posture and vertex data from diver videos by a three-dimensional human body shape and posture estimation method; the human body shape, gesture and vertex data are subjected to a data fusion module to obtain high-level semantic information; performing action recognition by using the high-level semantic information through a TCA-GCN module; performing action recognition by using the high-level semantic information through the STGCN module; and linearly fusing the identification results of the two modules. By the technical scheme, the three-dimensional gesture motion estimation of the diver is realized, and the accuracy of motion recognition is improved.

Description

Diver action recognition method based on three-dimensional human body skin
Technical Field
The invention relates to the technical field of computer vision, in particular to a diver action recognition method based on three-dimensional human body skin.
Background
Action recognition is the basis for computers to understand human behavior; it plays an important role in fields such as human-computer interaction and video understanding, and has become a hot topic in computer vision. Because of the special nature of their working environment, divers cannot communicate in spoken language, but since human limbs naturally carry rich semantic information, a diver working underwater can express special meanings through certain actions. For example, emergency situations such as physical exhaustion, hypoxia and leg cramps can be expressed by different gestures. In such a scenario, how to accurately and efficiently recognize the actions of a diver has become an important research direction.
Most existing diver action recognition methods are based on human skeleton points. However, because skeleton data lacks human surface information, it is relatively abstract and low in semantics: it can only represent the motion characteristics of the human body and cannot embody more concrete, higher-level information such as shape features and vertex features, so it cannot represent human actions accurately enough. In order to utilize more concrete and higher-level semantic information, the application provides a diver action recognition method based on three-dimensional human skin. Since the human body structure can naturally be represented as a graph, many current methods are based on graph convolution. Graph convolution can more accurately find the relations between different key points of the human body and obtain better-represented spatial-dimension information, leading to more accurate action recognition results. Because each diver action is a sequence, many current methods also obtain the relations within an action sequence by means of LSTM, temporal convolution and the like, which extract better time-dimension information and thus achieve better performance. Currently, SMPL is the mainstream three-dimensional human skin representation: two parameters, a shape parameter β and a pose parameter θ, represent the shape and posture of the human body respectively. At the same time, using β and θ, SMPL can obtain the vertex parameter v of the human mesh, which in turn provides more semantic information for the action recognition task. The three-dimensional human skin information represents the posture, shape and vertices of the human body; higher-level semantic information can be obtained through data fusion, and finally a more accurate diver action recognition result is obtained by using a graph-convolution deep learning method.
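As an illustration only (not part of the patented method), the following minimal sketch shows how a shape parameter β and pose parameter θ can be turned into mesh vertices v with the open-source smplx package; the model path and the zero-valued parameters are assumptions.

    import torch
    import smplx

    # Hypothetical local path to the downloaded SMPL model files.
    model = smplx.create("models/", model_type="smpl", gender="neutral")

    betas = torch.zeros(1, 10)          # shape parameters (beta)
    body_pose = torch.zeros(1, 69)      # axis-angle pose of the 23 body joints (theta)
    global_orient = torch.zeros(1, 3)   # root orientation

    output = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
    vertices = output.vertices          # (1, 6890, 3) mesh vertex coordinates v
    print(vertices.shape)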
Disclosure of Invention
In order to overcome the defects of the prior art, the invention realizes the identification of the diver action by utilizing three-dimensional human skin information, and achieves more accurate action identification effect by using higher-level semantic information.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a diver action recognition method based on three-dimensional human skin, which provides the following technical scheme:
a diver action recognition method based on three-dimensional human skin, the method comprising the steps of:
step 1: extracting the human body shape, posture and vertex information of a diver video frame by a three-dimensional human body posture estimation method;
step 2: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
step 3: performing action recognition by using the high-level semantic information through a TCA-GCN module;
step 4: performing action recognition by using the high-level semantic information through the STGCN module;
step 5: and (3) carrying out linear fusion on the identification results in the step (3) and the step (4) to identify the actions of the diver.
Preferably, the step 2 specifically includes:
the vertex information is downsampled; meanwhile, the downsampled vertex information and the shape information are each passed through a convolution module in the feature extraction network to obtain coding information, and the coding information is spliced onto the gesture information to obtain the high-level semantic information.
Preferably, the step 3 specifically includes:
the TCA-GCN module comprises a TCA module and a TF module, wherein the TCA module mainly considers and combines space-time dimension characteristics of high-level semantic information, then the TF module fuses results of time modeling convolution with an attention method, and finally the extracted space-time information characteristics are subjected to a full-connection layer and a Softmax layer to obtain estimated action categories.
Preferably, the TCA module comprises temporal aggregation, topology generation and channel dimension aggregation, where the output of the TCA module F_out is represented by the following formulas:

F_out = CA(A_out, S) = CA(A_out^1, S_1) ∥ … ∥ CA(A_out^T, S_T)

A_out = TA(W, X) = (W_1, X_1) ∥ … ∥ (W_T, X_T)

S = μ(A_k) + α·Q

wherein CA denotes channel dimension aggregation, ∥ denotes the splicing operation, A_out is the structure of the diver joint features after temporal aggregation, S represents the result of the topology generation processing of the features, F_out is the aggregation of the joint features in the channel dimension, A_out^1 is the convolution result of joint No. 1 in the time dimension, i.e. the structure of the No. 1 joint feature after temporal aggregation, S_1 is the topology generation processing result of joint feature No. 1, TA is the temporal aggregation module, W is the temporal weight feature, X is the joint feature, W_1 is the temporal weight feature of joint No. 1, X_1 is the feature of joint No. 1, W_T is the temporal weight feature of joint No. T, X_T is the feature of joint No. T, μ is the normalization and dimension transformation operation of the third-order adjacency matrix, A_k is the adjacency matrix of the k-th channel, α is the trainable parameter of joint connection strength, and Q is the channel correlation matrix.
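A toy numeric sketch of the aggregation formulas above (per-frame temporal weighting and splicing, topology generation S = μ(A_k) + α·Q, and a simple channel-dimension aggregation); the element-wise weighting and the matrix-product form of CA are assumptions made only to make the formulas concrete.

    import torch

    T, V, C = 16, 24, 7            # frames, joints, feature channels
    X = torch.randn(T, V, C)       # joint features per frame
    W = torch.rand(T, 1, 1)        # temporal weight feature per frame

    # Temporal aggregation TA(W, X): weight each frame and splice along time.
    A_out = torch.cat([(W[t] * X[t]).unsqueeze(0) for t in range(T)], dim=0)  # (T, V, C)

    # Topology generation S = mu(A_k) + alpha * Q for one channel group k.
    A_k = torch.rand(V, V)                          # adjacency of the k-th channel
    mu = lambda a: a / a.sum(dim=1, keepdim=True)   # row normalization (stand-in for mu)
    alpha = torch.tensor(0.5)                       # trainable joint-strength parameter
    Q = torch.rand(V, V)                            # channel correlation matrix
    S = mu(A_k) + alpha * Q                         # (V, V)

    # Channel-dimension aggregation CA: propagate features over the learned topology.
    F_out = torch.einsum("uv,tvc->tuc", S, A_out)   # (T, V, C)
    print(F_out.shape)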
Preferably, the output of the TF module Z_out is represented by the following formula:

Z_out = sk(MSCONV(F_out))

where MSCONV is a multi-convolution function, and the final TCA-GCN is generated by combining it with temporal modeling; the obtained spatio-temporal feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
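A minimal sketch of a multi-scale temporal convolution of the kind MSCONV could denote, followed by the classification head (fully connected layer, Softmax); the kernel sizes, the summation used to merge branches, and the name MultiScaleTemporalConv are assumptions.

    import torch
    import torch.nn as nn

    class MultiScaleTemporalConv(nn.Module):
        """Apply temporal convolutions with several kernel sizes and merge them."""
        def __init__(self, channels, kernel_sizes=(3, 5, 7)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
                for k in kernel_sizes
            )

        def forward(self, f):            # f: (B, C, T, V) spatio-temporal features
            return sum(branch(f) for branch in self.branches)

    B, C, T, V, num_classes = 2, 64, 16, 24, 10
    f_out = torch.randn(B, C, T, V)
    z_out = MultiScaleTemporalConv(C)(f_out)                     # (B, C, T, V)
    logits = nn.Linear(C, num_classes)(z_out.mean(dim=(2, 3)))   # pool, then FC layer
    probs = torch.softmax(logits, dim=-1)                        # Softmax over action classes
    print(probs.shape)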
Preferably, the step 4 specifically includes:
the STGCN module comprises a graph convolution module and a time convolution module, local features of adjacent points in the space are learned through graph convolution, and time sequence information in the sequence data is learned through time convolution; and the extracted space-time information features are subjected to a full connection layer and a Softmax layer to obtain estimated action categories.
Preferably, the step 5 specifically includes:

fusing the results of step 3 and step 4 as the output, where the output result is represented by the following formula:

score = γ·score_st + (1 - γ)·score_tca

wherein score_st is the action recognition result of the STGCN module, γ is the weight of that result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted output result.
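The weighted fusion of the two recognition scores can be written in a couple of lines; the value γ = 0.5 used here is only an illustrative assumption.

    import torch

    gamma = 0.5                                              # fusion weight (assumed value)
    score_st = torch.softmax(torch.randn(1, 10), dim=-1)     # STGCN class scores
    score_tca = torch.softmax(torch.randn(1, 10), dim=-1)    # TCA-GCN class scores
    score = gamma * score_st + (1 - gamma) * score_tca       # linear fusion
    action = score.argmax(dim=-1)                            # final diver action category
    print(action)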
A diver action recognition system based on three-dimensional human skin, the system comprising:
the data extraction module is used for extracting the human body shape, posture and vertex information of the video frame of the diver through a three-dimensional human body posture estimation method;
the data fusion module is used for: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
the TCA-GCN motion estimation module: performing action recognition by using the high-level semantic information through a TCA-GCN module;
STGCN action estimation module: performing action recognition by using the high-level semantic information through the STGCN module;
and the linear fusion module is used for carrying out linear fusion on the identification results of the TCA-GCN module and the STGCN module and identifying the actions of the diver.
A computer readable storage medium having stored thereon a computer program for execution by a processor for implementing a diver action recognition method based on a three-dimensional human skin.
A computer device comprising a memory storing a computer program and a processor implementing a diver action recognition method based on a three-dimensional human skin when executing the computer program.
The invention has the following beneficial effects:
compared with the prior art, the invention has the advantages that:
the invention extracts the shape, posture and vertex data of the human body from the video of the diver by a three-dimensional human body shape and posture estimation method; the human body shape, gesture and vertex data are subjected to a data fusion module to obtain high-level semantic information; performing action recognition by using the high-level semantic information through a TCA-GCN module; performing action recognition by using the high-level semantic information through the STGCN module; and linearly fusing the identification results of the two modules. By the technical scheme, the three-dimensional gesture motion estimation of the diver is realized, and the accuracy of motion recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a diver action recognition method based on three-dimensional human skin data;
fig. 2 is a block diagram of a diver action recognition method based on three-dimensional human skin data.
Detailed Description
The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The present invention will be described in detail with reference to specific examples.
First embodiment:
according to the specific optimization technical scheme adopted by the invention for solving the technical problems, as shown in the figures 1 to 2, the technical scheme is as follows: the invention relates to a diver action recognition method based on three-dimensional human body skin.
A diver action recognition method based on three-dimensional human skin, the method comprising the steps of:
step 1: extracting the human body shape, posture and vertex information of a diver video frame by a three-dimensional human body posture estimation method;
step 2: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
step 3: performing action recognition by using the high-level semantic information through a TCA-GCN module;
step 4: performing action recognition by using the high-level semantic information through the STGCN module;
step 5: and (3) carrying out linear fusion on the identification results in the step (3) and the step (4) to identify the actions of the diver.
Specific embodiment II:
the second embodiment of the present application differs from the first embodiment only in that:
the step 2 specifically comprises the following steps:
and downsampling the vertex information, simultaneously, respectively passing the downsampled vertex information and the shape information through a convolution module in the feature extraction network to obtain coding information, and splicing the coding information to the gesture information to obtain high-level semantic information.
Third embodiment:
the difference between the third embodiment and the second embodiment of the present application is only that:
the TCA-GCN module comprises a TCA module and a TF module, wherein the TCA module mainly considers and combines space-time dimension characteristics of high-level semantic information, then the TF module fuses results of time modeling convolution with an attention method, and finally the extracted space-time information characteristics are subjected to a full-connection layer and a Softmax layer to obtain estimated action categories.
Fourth embodiment:
The fourth embodiment of the present application differs from the third embodiment only in that:
the TCA module comprises temporal aggregation, topology generation and channel dimension aggregation, wherein the TCA module is represented by the following formulas:

F_out = CA(A_out, S) = CA(A_out^1, S_1) ∥ … ∥ CA(A_out^T, S_T)

A_out = TA(W, X) = (W_1, X_1) ∥ … ∥ (W_T, X_T)

S = μ(A_k) + α·Q

wherein CA denotes channel dimension aggregation, ∥ denotes the splicing operation, A_out is the structure of the diver joint features after temporal aggregation, S represents the result of the topology generation processing of the features, F_out is the aggregation of the joint features in the channel dimension, A_out^1 is the convolution result of joint No. 1 in the time dimension, i.e. the structure of the No. 1 joint feature after temporal aggregation, S_1 is the topology generation processing result of joint feature No. 1, TA is the temporal aggregation module, W is the temporal weight feature, X is the joint feature, W_1 is the temporal weight feature of joint No. 1, X_1 is the feature of joint No. 1, W_T is the temporal weight feature of joint No. T, X_T is the feature of joint No. T, μ is the normalization and dimension transformation operation of the third-order adjacency matrix, A_k is the adjacency matrix of the k-th channel, α is the trainable parameter of joint connection strength, and Q is the channel correlation matrix.
Fifth embodiment:
The fifth embodiment differs from the fourth embodiment only in that:
the output of the TF module Z_out is represented by the following formula:

Z_out = sk(MSCONV(F_out))

where MSCONV is a multi-convolution function, and the final TCA-GCN is generated by combining it with temporal modeling; the obtained spatio-temporal feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
Specific embodiment six:
the difference between the sixth embodiment and the fifth embodiment of the present application is only that:
the STGCN module comprises a graph convolution module and a time convolution module, local features of adjacent points in the space are learned through graph convolution, and time sequence information in the sequence data is learned through time convolution; and the extracted space-time information features are subjected to a full connection layer and a Softmax layer to obtain estimated action categories.
Specific embodiment seven:
The seventh embodiment of the present application differs from the sixth embodiment only in that:
fusing the results of step 3 and step 4 as the output, where the output result is represented by the following formula:

score = γ·score_st + (1 - γ)·score_tca

wherein score_st is the action recognition result of the STGCN module, γ is the weight of that result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted output result.
Specific embodiment eight:
the eighth embodiment of the present application differs from the seventh embodiment only in that:
the invention provides a diver action recognition system based on three-dimensional human skin, which comprises:
the data extraction module is used for extracting the human body shape, posture and vertex information of the video frame of the diver through a three-dimensional human body posture estimation method;
the data fusion module is used for: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
the TCA-GCN motion estimation module: performing action recognition by using the high-level semantic information through a TCA-GCN module;
STGCN action estimation module: performing action recognition by using the high-level semantic information through the STGCN module;
and the linear fusion module is used for carrying out linear fusion on the identification results of the TCA-GCN module and the STGCN module and identifying the actions of the diver.
Specific embodiment nine:
The ninth embodiment of the present application differs from the eighth embodiment only in that:
the present invention provides a computer-readable storage medium having stored thereon a computer program that is executed by a processor to implement the diver action recognition method based on three-dimensional human skin.
The method is implemented by a system that comprises: a data extraction module, a data fusion module, an action estimation module and a fusion module.
The data extraction module extracts the human body shape, posture and vertex information of the diver video frames by using a three-dimensional human body posture estimation method.
The data fusion module extracts high-level semantic information by using the shape, gesture and vertex information of the human body.
The action estimation module performs action recognition with the high-level semantic information through the TCA-GCN module and the STGCN module respectively.
The fusion module fuses the results of the action estimation module to obtain a more accurate diver action recognition result.
The constructed modules specifically comprise a feature extraction network, an STGCN network and a TCA-GCN network.
Step 21: the vertex information is downsampled, and meanwhile the downsampled vertex information and the shape information are each passed through a convolution module in the feature extraction network to obtain coding information. The coding information is spliced onto the gesture information to obtain high-level semantic information.
Step 22: the STGCN comprises a graph convolution module and a time convolution module. Through graph convolution, the local features of adjacent points in space are learned; through time convolution, the time-sequence information in the sequence data is learned. Finally, the extracted space-time features pass through a fully connected layer and a Softmax layer to obtain the estimated action category.
Step 23: the TCA-GCN mainly consists of a TCA module and a TF module. The TCA module mainly considers and combines the space-time dimension features of the high-level semantic information, the TF module fuses the result of time-modeling convolution with an attention method, and finally the extracted space-time features pass through a fully connected layer and a Softmax layer to obtain the estimated action category.
Step 24: according to the results of step 22 and step 23, a more accurate diver action recognition result is output by means of linear weighting.
The calculation formulas of this process are as follows:

(β_i, θ_i, v_i) = H(I_i)

X = D(β, θ, v)

score_st = F_st(X), score_tca = F_tca(X)

score = L(score_st, score_tca) = γ·score_st + (1 - γ)·score_tca

wherein I_i represents the i-th frame image extracted from the video, H represents the human body posture and shape estimation method, and β_i, θ_i and v_i represent the shape, posture and vertex information of the i-th frame, respectively. D represents the data fusion module, which yields the high-level semantic information X. F_st and F_tca represent the STGCN module and the TCA-GCN module respectively; using X, they yield the two action recognition results score_st and score_tca. L represents the linear fusion of the recognition results, γ represents the weight of the STGCN result, and the more accurate final recognition result score is thereby obtained.
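For readability only, the data flow of these formulas can be written as the following Python sketch; the arguments estimate_smpl, fuse, stgcn and tca_gcn are hypothetical placeholders for the H, D, F_st and F_tca modules described above, and γ = 0.5 is an assumed fusion weight.

    import torch

    def recognize_diver_action(frames, estimate_smpl, fuse, stgcn, tca_gcn, gamma=0.5):
        # Step 1: per-frame estimation (beta_i, theta_i, v_i) = H(I_i)
        smpl_params = [estimate_smpl(frame) for frame in frames]
        # Step 2: data fusion X = D(beta, theta, v)
        x = fuse(smpl_params)
        # Steps 3-4: two graph-convolution recognizers on the same features
        score_st, score_tca = stgcn(x), tca_gcn(x)
        # Step 5: linear fusion score = L(score_st, score_tca)
        return gamma * score_st + (1 - gamma) * score_tca

    # Trivial stand-ins, just to show the data flow:
    frames = [torch.zeros(3, 224, 224) for _ in range(16)]
    score = recognize_diver_action(
        frames,
        estimate_smpl=lambda f: (torch.zeros(10), torch.zeros(24, 3), torch.zeros(6890, 3)),
        fuse=lambda p: torch.randn(len(p), 24, 7),
        stgcn=lambda x: torch.softmax(torch.randn(8), dim=0),
        tca_gcn=lambda x: torch.softmax(torch.randn(8), dim=0),
    )
    print(score.argmax())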
Specific embodiment ten:
The tenth embodiment differs from the ninth embodiment only in that:
the invention provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the diver action recognition method based on three-dimensional human skin.
The method comprises the following steps:
Step 1: because high-quality three-dimensional human skin data markedly improves the diver action recognition task, the method uses the ROMP network, which currently gives better three-dimensional human posture estimation results, to obtain the shape, posture and vertex parameters of the human body.
Step 2: the shape, posture and vertex parameters of the human body are fused by the data fusion module to obtain higher-level semantic information. Specifically, the vertex information is downsampled, then the downsampling result and the shape parameters are each passed through a convolution network to obtain the encoded vertex and shape information, and finally the encoded information is spliced onto the gesture parameters to obtain the higher-level semantic information.
Step 3: a space-time graph is constructed based on the skin key points. SMPL is represented by 24 skin key points, so a space-time graph can be constructed: a graph G = (V, E) is built on a key-point sequence containing N key points and T frames, with both intra-frame and inter-frame connections, i.e. a space-time graph.
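As an illustration of this construction, the sketch below builds intra-frame edges from a hypothetical skeleton edge list over the 24 SMPL key points and inter-frame edges that connect each key point to itself in the next frame; the concrete edge list is an assumption.

    import torch

    def build_space_time_graph(edges, n_keypoints=24, n_frames=4):
        """Adjacency of the space-time graph G=(V,E): N*T nodes,
        intra-frame skeleton edges plus inter-frame self-connections."""
        n = n_keypoints * n_frames
        A = torch.zeros(n, n)
        for t in range(n_frames):
            off = t * n_keypoints
            for i, j in edges:                       # intra-frame connections
                A[off + i, off + j] = A[off + j, off + i] = 1
            if t + 1 < n_frames:                     # inter-frame connections
                nxt = (t + 1) * n_keypoints
                for v in range(n_keypoints):
                    A[off + v, nxt + v] = A[nxt + v, off + v] = 1
        return A

    # Hypothetical subset of SMPL kinematic-tree edges (parent, child indices).
    edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6)]
    A = build_space_time_graph(edges)
    print(A.shape)   # torch.Size([96, 96]) for 24 key points over 4 frames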
Step 4: rich spatial information is obtained by using the GCN. The formula of the GCN is as follows:

f_out = Λ^(-1/2) (A + I) Λ^(-1/2) · f_in · W ⊙ M

wherein A is the connection relation matrix between the key points, f_in is the relevant feature matrix of the key points, W is the layer weight, M is the importance degree of the different joint points, and Λ^(-1/2)(A + I)Λ^(-1/2) denotes the normalization processing. In order to conform to the three-dimensional actions of the diver's underwater operation, the key points are divided into root nodes, centrifugal points and centripetal points, imitating the movement trend of the action. The adjacency matrix thereby becomes three-dimensional, and the specific formulation is as follows:

f_out = Σ_j Λ_j^(-1/2) A_j Λ_j^(-1/2) · f_in · W_j ⊙ M_j

where, for the different dimensions j, A_1, A_2 and A_3 represent the root nodes, the centrifugal points and the centripetal points of the diver's movement, respectively, and M_j is a trainable quantity that accommodates the different importance of each key point for different time periods.
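A small numeric sketch of the partitioned graph convolution described above, summing the normalized root/centrifugal/centripetal adjacencies with per-partition weights and importance masks; the random matrices stand in for learned quantities and are assumptions.

    import torch

    V, C_in, C_out, T = 24, 7, 64, 16
    x = torch.randn(T, V, C_in)                        # input key-point features

    def normalize(a):                                  # Lambda^(-1/2) A Lambda^(-1/2)
        d = a.sum(dim=1).clamp(min=1).sqrt()
        return a / d.unsqueeze(1) / d.unsqueeze(0)

    # Three partition adjacencies: root nodes, centrifugal points, centripetal points.
    A = [torch.eye(V), torch.rand(V, V), torch.rand(V, V)]
    W = [torch.randn(C_in, C_out) for _ in A]          # per-partition weights
    M = [torch.ones(V, V) for _ in A]                  # trainable importance (init to 1)

    f_out = sum(
        torch.einsum("uv,tvc,cd->tud", normalize(A_j) * M_j, x, W_j)
        for A_j, W_j, M_j in zip(A, W, M)
    )                                                  # (T, V, C_out)
    print(f_out.shape)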
Step 5: diver action recognition is performed by using the graph-convolution-based deep learning network STGCN. The dimension of the high-level semantic information obtained in step 2 is (S, 24, 7), where S represents the length of the action sequence, 24 represents the 24 human skin key points, and 7 represents the feature dimension of each key point. A sampling function is used to specify the range of neighboring nodes involved when the graph convolution operation is performed on each node. Through graph convolution, the local features of adjacent points in space are learned; through time convolution, the time-sequence information in the sequence data is learned. The obtained space-time feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
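As a sketch of the supervision described here (L1 loss against the real action category labels), the Softmax output can be compared with a one-hot encoding of the ground-truth label; using one-hot targets is an assumption about how the labels are encoded.

    import torch
    import torch.nn.functional as F

    num_classes = 8
    logits = torch.randn(4, num_classes, requires_grad=True)  # FC-layer output for 4 sequences
    labels = torch.tensor([0, 3, 3, 7])                       # ground-truth action categories
    probs = torch.softmax(logits, dim=-1)
    target = F.one_hot(labels, num_classes).float()           # one-hot encoding (assumed)
    loss = F.l1_loss(probs, target)                           # L1 loss for supervised learning
    loss.backward()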
Step 6: diver action recognition is performed by using the graph-convolution-based deep learning network TCA-GCN. The temporal aggregation module learns time-dimension features, and the channel aggregation module is used to effectively combine the spatially dynamic channel-level topology features with the temporally dynamic topology features. The network mainly comprises a TCA module and a TF module. The TCA module is divided into temporal aggregation, topology generation and channel dimension aggregation, and its specific formulas are as follows:

F_out = CA(A_out, S) = CA(A_out^1, S_1) ∥ … ∥ CA(A_out^T, S_T)

A_out = TA(W, X) = (W_1, X_1) ∥ … ∥ (W_T, X_T)

S = μ(A_k) + α·Q

wherein CA denotes channel dimension aggregation, ∥ denotes the splicing operation, A_out is the structure of the diver joint features after temporal aggregation, denoted TA(W, X), and S represents the result of the topology generation processing of the features, denoted μ(A_k) + α·Q. The TF module is denoted Z_out = sk(MSCONV(F_out)), where MSCONV is a multi-convolution function, and the final TCA-GCN is generated by the final combination with temporal modeling. The obtained space-time feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
Step 7: the accuracy of action recognition is improved by means of weighted linear fusion. Because the data features and the feature extraction methods considered by the two modules in step 5 and step 6 are different, the results of the two modules are fused and used as the output, according to the following formula:

score = γ·score_st + (1 - γ)·score_tca

wherein score_st is the action recognition result of the STGCN module, γ is the weight of that result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted result.
Specific embodiment eleven:
The eleventh embodiment of the present application differs from the tenth embodiment only in that:
this embodiment provides a diver action recognition method based on three-dimensional human skin data, which comprises the following steps:
Step 1: the shape, posture and vertex information of the diver is extracted by using a three-dimensional human body posture estimation method.
Specifically, in order to improve the accuracy of diver motion estimation, the application uses ROMP, a three-dimensional human body posture estimation network with a better current effect. The network outputs (β, θ, v), which represent the human body shape, posture and vertex information respectively, where β ∈ R^10, θ ∈ R^(24×3) and v ∈ R^(6890×3).
Step 2: high-level semantic information is obtained from the shape, posture and vertex information through the data fusion module.
Specifically, the vertex information v is downsampled, and the downsampling result is passed through a convolution network that changes only the channel information and not the other dimensional information, giving high-level vertex coding information. At the same time, the shape parameter β is also passed through a convolution network to obtain shape coding information. Finally, the vertex and shape coding information is spliced (embedded) onto the pose parameter θ to obtain the higher-level semantic information.
Step 3: action recognition is performed with the high-level semantic information through the STGCN module.
Specifically, the three-dimensional human skin parameters are represented by 24 human skin key points, so a space-time graph can be constructed: a graph G = (V, E) is built on a key-point sequence containing N key points and T frames, with both intra-frame and inter-frame connections, i.e. a space-time graph. The high-level semantic information dimension is (S, 24, 7), where S represents the action sequence length, 24 represents the 24 human skin key points, and 7 represents the feature dimension of each key point. A sampling function is used to specify the range of neighboring nodes involved when the graph convolution operation is performed on each node. Through graph convolution, the local features of adjacent points in space are learned; through time convolution, the time-sequence information in the sequence data is learned. The extracted feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
Step 4: action recognition is performed with the high-level semantic information through the TCA-GCN module.
Specifically, this module mainly consists of two sub-modules, a TCA module and a TF module. The TCA module considers and combines the temporal and spatial dimension characteristics of the sequence. The skin sequence data generates sample time weights through the temporal aggregation module, and the channel aggregation module is then used to effectively combine the spatially dynamic channel-level topology features with the temporally dynamic topology features to generate the input of the TF module. The TF module fuses previous time-modeling convolution methods with an attention method. After the two sub-modules, better feature information is obtained; finally, the feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
Step 5: the results of the two modules are fused and the final action category is output.
Specifically, the accuracy of action recognition is improved by means of weighted linear fusion. Because the data features considered by the two modules in step 3 and step 4 are different, the results of the two modules are fused and used as the output, according to the following formula:

score = γ·score_st + (1 - γ)·score_tca

wherein score_st is the action recognition result of the STGCN module, γ is the weight of that result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted result.
According to the technical scheme, more concrete, higher-level feature information is used to represent the actions of a diver, and the shape, posture and vertex parameters are obtained by a three-dimensional human body posture estimation method. By this method, the three-dimensional human body information passes through the action recognition modules (the STGCN module and the TCA-GCN module) and the linear weighting module, so that diver action recognition can be completed. This provides convenience for the communication of divers during underwater operations.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular embodiments," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise. Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention. Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer cartridge (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). 
In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. As with the other embodiments, if implemented in hardware, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
The above description is only a preferred embodiment of a method for identifying the motions of a diver based on a three-dimensional human skin, and the protection scope of a method for identifying the motions of a diver based on a three-dimensional human skin is not limited to the above embodiments, and all technical solutions under the concept belong to the protection scope of the invention. It should be noted that modifications and variations can be made by those skilled in the art without departing from the principles of the present invention, which is also considered to be within the scope of the present invention.

Claims (8)

1. A diver action recognition method based on three-dimensional human skin is characterized by comprising the following steps: the method comprises the following steps:
step 1: extracting the human body shape, posture and vertex information of a diver video frame by a three-dimensional human body posture estimation method;
step 2: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
the step 2 specifically comprises the following steps:
downsampling the vertex information, simultaneously, respectively passing the downsampled vertex information and the shape information through a convolution module in a feature extraction network to obtain coding information, splicing the coding information to gesture information, and obtaining high-level semantic information;
step 3: performing action recognition by using the high-level semantic information through a TCA-GCN module;
step 4: performing action recognition by using the high-level semantic information through the STGCN module;
step 5: the identification results in the step 3 and the step 4 are linearly fused, and the actions of the diver are identified;
the step 5 specifically comprises the following steps:
fusing the results of the step 3 and the step 4, and expressing the output result as output by the following formula:
score = γ·score_st + (1 - γ)·score_tca
wherein score_st is the action recognition result of the STGCN module, γ is the weight of the result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted output result.
2. The method according to claim 1, characterized in that: the step 3 specifically comprises the following steps:
the TCA-GCN module comprises a TCA module and a TF module, wherein the TCA module mainly considers and combines space-time dimension characteristics of high-level semantic information, then the TF module fuses results of time modeling convolution with an attention method, and finally the extracted space-time information characteristics are subjected to a full-connection layer and a Softmax layer to obtain estimated action categories.
3. The method according to claim 2, characterized in that:
the TCA module comprises temporal aggregation, topology generation and channel dimension aggregation, wherein the TCA module output F_out is represented by the following formulas:

F_out = CA(A_out, S) = CA(A_out^1, S_1) ∥ … ∥ CA(A_out^T, S_T)

A_out = TA(W, X) = (W_1, X_1) ∥ … ∥ (W_T, X_T)

S = μ(A_k) + α·Q

wherein CA denotes channel dimension aggregation, ∥ denotes the splicing operation, A_out is the structure of the diver joint features after temporal aggregation, S represents the result of the topology generation processing of the features, F_out is the aggregation of the joint features in the channel dimension, A_out^1 is the convolution result of joint No. 1 in the time dimension, i.e. the structure of the No. 1 joint feature after temporal aggregation, S_1 is the topology generation processing result of joint feature No. 1, TA is the temporal aggregation module, W is the temporal weight feature, X is the joint feature, W_1 is the temporal weight feature of joint No. 1, X_1 is the feature of joint No. 1, W_T is the temporal weight feature of joint No. T, X_T is the feature of joint No. T, μ is the normalization and dimension transformation operation of the third-order adjacency matrix, A_k is the adjacency matrix of the k-th channel, α is the trainable parameter of joint connection strength, and Q is the channel correlation matrix.
4. A method according to claim 3, characterized in that:
the TF module output Z_out is represented by the following formula:

Z_out = sk(MSCONV(F_out))

wherein MSCONV is a multi-convolution function, and the final TCA-GCN is generated by combining it with temporal modeling; the obtained spatio-temporal feature information passes through a fully connected layer and Softmax to judge the action category, L1 loss is used as the loss function, and the real action category labels (Ground Truth) are used for supervised learning.
5. The method according to claim 4, characterized in that: the step 4 specifically comprises the following steps:
the STGCN module comprises a graph convolution module and a time convolution module, local features of adjacent points in the space are learned through graph convolution, and time sequence information in the sequence data is learned through time convolution; and the extracted space-time information features are subjected to a full connection layer and a Softmax layer to obtain estimated action categories.
6. A diver action recognition system based on three-dimensional human skin is characterized in that: the system comprises:
the data extraction module is used for extracting the human body shape, posture and vertex information of the video frame of the diver through a three-dimensional human body posture estimation method;
the data fusion module is used for: the human body shape, gesture and vertex data are subjected to data fusion to obtain high-level semantic information;
downsampling the vertex information, simultaneously, respectively passing the downsampled vertex information and the shape information through a convolution module in a feature extraction network to obtain coding information, splicing the coding information to gesture information, and obtaining high-level semantic information;
the TCA-GCN motion estimation module: performing action recognition by using the high-level semantic information through a TCA-GCN module;
STGCN action estimation module: performing action recognition by using the high-level semantic information through the STGCN module;
the linear fusion module is used for carrying out linear fusion on the identification results of the TCA-GCN module and the STGCN module and identifying the actions of the diver;
the result fusion of the action recognition by the TCA-GCN module and the action recognition by the STGCN module is taken as output, and the output result is represented by the following formula:
score = γ·score_st + (1 - γ)·score_tca
wherein score_st is the action recognition result of the STGCN module, γ is the weight of the result, score_tca represents the recognition result of the TCA-GCN module, and score is the final weighted output result.
7. A computer readable storage medium having stored thereon a computer program, characterized in that the program is executed by a processor for implementing the method according to any of claims 1-5.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized by: the processor, when executing the computer program, implements the method of any of claims 1-5.
CN202310015851.2A 2023-01-06 2023-01-06 Diver action recognition method based on three-dimensional human body skin Active CN115862150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310015851.2A CN115862150B (en) 2023-01-06 2023-01-06 Diver action recognition method based on three-dimensional human body skin


Publications (2)

Publication Number Publication Date
CN115862150A CN115862150A (en) 2023-03-28
CN115862150B true CN115862150B (en) 2023-05-23

Family

ID=85656975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310015851.2A Active CN115862150B (en) 2023-01-06 2023-01-06 Diver action recognition method based on three-dimensional human body skin

Country Status (1)

Country Link
CN (1) CN115862150B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591560A (en) * 2021-06-23 2021-11-02 西北工业大学 Human behavior recognition method
CN114663593A (en) * 2022-03-25 2022-06-24 清华大学 Three-dimensional human body posture estimation method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297955B (en) * 2021-05-21 2022-03-18 中国矿业大学 Sign language word recognition method based on multi-mode hierarchical information fusion
CN114863325B (en) * 2022-04-19 2024-06-07 上海人工智能创新中心 Action recognition method, apparatus, device and computer readable storage medium
CN114550308B (en) * 2022-04-22 2022-07-05 成都信息工程大学 Human skeleton action recognition method based on space-time diagram
CN114973422A (en) * 2022-07-19 2022-08-30 南京应用数学中心 Gait recognition method based on three-dimensional human body modeling point cloud feature coding




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant