WO2021237875A1 - Hand data recognition method and system based on graph convolutional network, and storage medium - Google Patents

Hand data recognition method and system based on graph convolutional network, and storage medium

Info

Publication number
WO2021237875A1
WO2021237875A1 PCT/CN2020/099766 CN2020099766W WO2021237875A1 WO 2021237875 A1 WO2021237875 A1 WO 2021237875A1 CN 2020099766 W CN2020099766 W CN 2020099766W WO 2021237875 A1 WO2021237875 A1 WO 2021237875A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
hand
dimensional
coordinates
convolutional network
Prior art date
Application number
PCT/CN2020/099766
Other languages
French (fr)
Chinese (zh)
Inventor
黄昌正
周言明
陈曦
霍炼楚
Original Assignee
广州幻境科技有限公司
肇庆市安可电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州幻境科技有限公司, 肇庆市安可电子科技有限公司
Publication of WO2021237875A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

A hand data recognition method and system based on a graph convolutional network, and a storage medium. The method comprises the following steps: obtaining a hand image in a preset state (S11); extracting a feature image, key point coordinates, and a two-dimensional heat map from the hand image (S12); combining the feature image with the two-dimensional heat map to generate a feature vector (S13); generating three-dimensional joint point position coordinates from the feature vector and the key point coordinates (S14); and restoring the hand posture from the three-dimensional joint point position coordinates (S15). With this method, a user can complete virtual interaction accurately without wearing dedicated gloves, which simplifies the equipment needed for virtual interaction and broadens the application scenarios to a certain extent.

Description

Hand data recognition method, system and storage medium based on graph convolutional network
Technical field
The present invention relates to the field of computer vision, and in particular to a hand data recognition method, system and storage medium based on a graph convolutional network.
Background
In virtual reality interaction, hand posture recognition conventionally relies on a dedicated glove worn on the hand. The glove tracks the hand posture data, and the virtual device receives the real-time posture of the hand and displays it in the virtual reality interface, which improves the sense of realism. However, the dedicated glove and its supporting equipment severely limit the scope of application, which prevents virtual devices from being promoted effectively.
Summary of the invention
To solve the above technical problem, the purpose of the present invention is to provide a hand data recognition method, system and storage medium based on a graph convolutional network that can broaden the application scenarios to a certain extent.
A first aspect of the embodiments of the present invention provides:
A hand data recognition method based on a graph convolutional network, comprising the following steps:
obtaining a hand image in a preset state;
extracting a feature image, key point coordinates and a two-dimensional heat map from the hand image;
combining the feature image and the two-dimensional heat map to generate a feature vector;
generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
restoring the hand posture according to the three-dimensional joint point position coordinates.
Further, extracting the key point coordinates and the two-dimensional heat map of the hand image comprises:
extracting key point feature positions from the first image using a stacked hourglass network;
predicting the two-dimensional heat map according to the key point feature positions, and determining the key point coordinates.
Further, combining the feature image and the two-dimensional heat map to generate a feature vector comprises:
converting the size of the two-dimensional heat map to the size of the feature image;
computing the feature vector from the feature image and the size-converted two-dimensional heat map through a convolutional network.
Further, generating the three-dimensional joint point position coordinates according to the feature vector and the key point coordinates comprises:
calculating the vertex coordinates of a three-dimensional mesh according to the feature vector;
calculating the three-dimensional joint point position coordinates according to the vertex coordinates and the key point coordinates.
Further, calculating the vertex coordinates of the three-dimensional mesh according to the feature vector is specifically:
calculating the coordinates of all vertices of the three-dimensional mesh from the feature vector using a graph convolutional network.
Further, calculating the three-dimensional joint point position coordinates according to the vertex coordinates and the key point coordinates is specifically:
regressing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates using a linear graph convolutional network.
Further, restoring the hand posture according to the three-dimensional joint point position coordinates is specifically:
restoring, in the virtual reality interface, the hand posture corresponding to the hand image according to the three-dimensional joint point position coordinates.
A second aspect of the embodiments of the present invention provides:
A hand data recognition system based on a graph convolutional network, comprising:
an acquisition module, configured to obtain a hand image in a preset state;
an extraction module, configured to extract a feature image, key point coordinates and a two-dimensional heat map from the hand image;
a combining module, configured to combine the feature image and the two-dimensional heat map to generate a feature vector;
a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
a restoration module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
A third aspect of the embodiments of the present invention provides:
A hand data recognition system based on a graph convolutional network, comprising:
at least one memory, configured to store a program;
at least one processor, configured to load the program to execute the hand data recognition method based on a graph convolutional network described above.
A fourth aspect of the embodiments of the present invention provides:
A computer-readable storage medium storing processor-executable instructions which, when executed by a processor, implement the hand data recognition method based on a graph convolutional network described above.
The beneficial effects of the embodiments of the present invention are as follows: the embodiments obtain a hand image in a preset state, extract a feature image, key point coordinates and a two-dimensional heat map from it, combine the feature image and the two-dimensional heat map to generate a feature vector, generate three-dimensional joint point position coordinates from the feature vector and the key point coordinates, and finally restore the hand posture from the three-dimensional joint point position coordinates. As a result, a user can complete the virtual interaction without wearing dedicated gloves, which simplifies the equipment needed for virtual interaction and broadens the application scenarios to a certain extent.
Description of the drawings
Fig. 1 is a flowchart of a hand data recognition method based on a graph convolutional network according to a specific embodiment of the present invention;
Fig. 2 is a schematic diagram of the stacked hourglass network structure according to a specific embodiment;
Fig. 3 is a schematic diagram of the distribution of the 21 joint nodes according to a specific embodiment.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments. The step numbers in the following embodiments are provided only for ease of explanation and do not limit the order of the steps; the execution order of the steps can be adapted according to the understanding of those skilled in the art.
The following description refers to "some embodiments", which describe a subset of all possible embodiments. It is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other where no conflict arises.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terminology used herein is only for the purpose of describing the embodiments of this application and is not intended to limit it.
Referring to Fig. 1, an embodiment of the present invention provides a hand data recognition method based on a graph convolutional network. This embodiment is applied to a control server that can communicate with multiple terminal devices, such as cameras and virtual display devices.
This embodiment includes steps S11-S15:
S11. Obtain a hand image in a preset state. The hand image can be captured with an ordinary RGB camera. The preset state means that, in the captured scene, the hand is at the center of the image and occupies a moderate proportion of it.
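The preset state in S11 can be illustrated with a simple geometric check. The following is a minimal sketch, assuming a hand bounding box is already available from some detector; the function name `is_preset_state`, the bounding-box input and all threshold values are hypothetical and do not come from the source, which only requires the hand to be centered and of moderate size.

```python
def is_preset_state(bbox, image_size, max_center_offset=0.15,
                    min_area=0.10, max_area=0.60):
    """Return True if the hand box is roughly centered and moderately sized.

    bbox: (x_min, y_min, x_max, y_max) in pixels; image_size: (width, height).
    All thresholds are illustrative assumptions, not values from the source.
    """
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    # Offset of the box center from the image center, as a fraction of image size.
    dx = abs((x_min + x_max) / 2 - width / 2) / width
    dy = abs((y_min + y_max) / 2 - height / 2) / height
    # Fraction of the image area covered by the hand box.
    area = (x_max - x_min) * (y_max - y_min) / (width * height)
    return (dx <= max_center_offset and dy <= max_center_offset
            and min_area <= area <= max_area)
```

A frame that fails this check could simply be skipped before the later steps run; whether the described system does so is not stated in the source.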
S12. Extract a feature image, key point coordinates and a two-dimensional heat map from the hand image. Specifically, a stacked hourglass network can be used to extract key point pixel positions from the hand image, predict the hand key point heat maps, and determine the initial key point coordinates.
As shown in Fig. 2, the stacked hourglass network has a symmetric architecture. This step uses its multi-scale features to recognize the posture: for every network layer on the path that produces low-resolution features, there is a corresponding layer on the up-sampling path. The network first uses convolution and pooling operations to reduce the features to a very low resolution, for example 4*4. At each max-pooling step, the network adds a new convolution branch that extracts features directly at the original pre-pooling resolution, similar to a residual connection, and these features are later fused with the features extracted after the corresponding up-sampling operation. After reaching the lowest resolution, the network up-samples the features by nearest-neighbor interpolation, combines information across scales, and adds the result element-wise to the previously branched features. When the output resolution is reached, two further convolutions perform the final computation. The final output of the network is a set of key point heat maps, which predict, for each pixel, the probability that each of the 21 key points shown in Fig. 3 is present.
In Fig. 2, C1 to C4 form the down-sampling path, along which the resolution of the feature maps gradually decreases, while C1a, C2a, C3a and C4a are copies of the corresponding feature maps saved before down-sampling. The feature map at the lowest resolution is up-sampled step by step, and each restored feature map is combined with the corresponding saved copy to obtain C1b, C2b, C3b and C4b. Extracting different key points of the hand from the different feature maps yields better accuracy.
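The step from key point heat maps to initial key point coordinates can be sketched as follows. The source does not say how the coordinates are read off the heat maps; taking the pixel with the highest predicted probability in each map is a common choice and is assumed here. The function name `keypoints_from_heatmaps` is hypothetical.

```python
def keypoints_from_heatmaps(heatmaps):
    """Pick the most probable pixel in each heat map.

    heatmaps: list of 2D lists (rows of probabilities), one per key point.
    Returns a list of (x, y, score) tuples, one per heat map.
    Assumption: the key point is located at the argmax of its heat map.
    """
    keypoints = []
    for heatmap in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best[2]:
                    best = (x, y, score)
        keypoints.append(best)
    return keypoints
```

With 21 heat maps as input, this returns 21 pixel positions, one per key point of Fig. 3.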
S13. Combine the feature image and the two-dimensional heat map to generate a feature vector, namely the feature vector of the key points. Specifically, the combined feature image and two-dimensional heat map are fed into a residual network composed of 8 residual layers and 4 pooling layers to generate the key point feature vector.
In some embodiments, step S13 can be implemented through the following steps:
Convert the size of the two-dimensional heat map to the size of the feature image; a 1*1 convolution can be used to convert the two-dimensional heat map containing the key points to the size of the feature image.
Compute the feature vector from the feature image and the size-converted two-dimensional heat map through a convolutional network.
In this embodiment, the convolutional network has a structure similar to ResNet-18 and consists of 8 residual layers and 4 pooling layers; using this convolutional network to compute the feature vector improves the accuracy of the result.
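The size conversion in S13 can be illustrated with a plain 1*1 convolution. A 1*1 convolution changes the number of channels and leaves the spatial size unchanged, so this sketch assumes that "size" refers to the channel count and that the heat maps and feature image already share the same height and width. Combining the two by element-wise addition is also an assumption (concatenation would be equally plausible); the names `conv1x1` and `fuse` are hypothetical.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution without bias.

    feature_map: nested lists indexed [C_in][H][W];
    weights: nested lists indexed [C_out][C_in].
    Returns nested lists indexed [C_out][H][W]; H and W are unchanged.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(w * feature_map[c][y][x] for c, w in enumerate(row))
             for x in range(width)]
            for y in range(height)
        ]
        for row in weights
    ]


def fuse(features, heatmaps, weights):
    """Project heat maps to the feature channel count, then add element-wise.

    Assumption: the source does not specify the combination operator.
    """
    projected = conv1x1(heatmaps, weights)
    return [
        [[a + b for a, b in zip(f_row, p_row)] for f_row, p_row in zip(f_ch, p_ch)]
        for f_ch, p_ch in zip(features, projected)
    ]
```

In the described embodiment the fused tensor would then pass through the ResNet-18-like network to produce the key point feature vector; that network is not reproduced here.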
S14. Generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates.
Specifically, this step first calculates the vertex coordinates of a three-dimensional mesh from the feature vector, and then calculates the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
In some embodiments, calculating the vertex coordinates of the three-dimensional mesh from the feature vector can be implemented as follows:
A graph convolutional network calculates the coordinates of all vertices of the three-dimensional mesh from the feature vector.
Specifically, the key point feature vector is input to the graph convolutional network, which outputs the 3D coordinates of all vertices of the 3D mesh after a series of network layers; these vertex coordinates are then used to reconstruct the 3D mesh of the hand surface.
The 3D hand mesh is essentially a graph structure. It can therefore be represented by an undirected graph $M = (V, \varepsilon, W)$, where $V = \{v_i\}_{i=1}^{N}$ is the set of $N$ vertices in the mesh, $\varepsilon = \{e_i\}_{i=1}^{E}$ is the set of $E$ edges in the mesh, and $W = (w_{ij})_{N \times N}$ is the adjacency matrix.
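The graph representation of a triangle mesh can be made concrete with a small sketch that derives the edge set and the adjacency matrix from a list of triangular faces. The source does not say how the edge weights are chosen; binary weights (1 for connected vertices, 0 otherwise) are assumed here, and the function name `adjacency_from_faces` is hypothetical.

```python
def adjacency_from_faces(num_vertices, faces):
    """Build the symmetric 0/1 adjacency matrix W of a triangle mesh.

    faces: iterable of (i, j, k) vertex-index triples.
    Returns (W, edges), where edges is a sorted list of undirected
    edges (a, b) with a < b. Binary weights are an assumption.
    """
    edges = set()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))
    W = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b in edges:
        W[a][b] = W[b][a] = 1
    return W, sorted(edges)
```

For two triangles sharing an edge, `adjacency_from_faces(4, [(0, 1, 2), (1, 2, 3)])` yields five edges, and vertices 0 and 3 are not adjacent.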
Define a signal $f = (f_1, \dots, f_N)^T \in \mathbb{R}^{N \times F}$ on the vertices of graph $M$, representing the $F$-dimensional features of the $N$ vertices in the 3D mesh. In Chebyshev graph convolution, the graph convolution operation on a signal $f_{in} \in \mathbb{R}^{N \times F_{in}}$ is defined as formula 1:
$$f_{out} = \sum_{k=0}^{K-1} T_k(\tilde{L}) \cdot f_{in} \cdot \theta_k \qquad (1)$$
where $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$ is the Chebyshev polynomial of order $k$, with $T_0 = 1$ and $T_1 = x$; $\tilde{L} \in \mathbb{R}^{N \times N}$ is the rescaled Laplacian, $\tilde{L} = 2L/\lambda_{max} - I_N$; $\lambda_{max}$ is the largest eigenvalue of the graph Laplacian $L$; $\theta_k \in \mathbb{R}^{F_{in} \times F_{out}}$ are the trainable parameters of the graph convolutional layer; and $f_{out} \in \mathbb{R}^{N \times F_{out}}$ is the output signal of the graph convolutional layer.
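The Chebyshev graph convolution of formula 1 can be sketched in plain Python using the recurrence for the Chebyshev polynomials, so that no matrix power is ever formed explicitly. Two points are assumptions rather than statements of the source: the Laplacian is taken to be the normalized form $L = I - D^{-1/2} W D^{-1/2}$, and the largest eigenvalue defaults to the common approximation of 2 instead of being computed. The helper names are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rescaled_laplacian(W, lam_max=2.0):
    """Return 2 L / lam_max - I with L = I - D^{-1/2} W D^{-1/2}.

    Assumptions: normalized Laplacian; lam_max approximated by 2.0.
    """
    n = len(W)
    inv_sqrt_deg = [1.0 / sum(row) ** 0.5 if sum(row) > 0 else 0.0 for row in W]
    L = [[(1.0 if i == j else 0.0) - inv_sqrt_deg[i] * W[i][j] * inv_sqrt_deg[j]
          for j in range(n)] for i in range(n)]
    return [[2.0 * L[i][j] / lam_max - (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def chebyshev_graph_conv(L_tilde, f_in, thetas):
    """Compute sum over k of T_k(L_tilde) f_in theta_k, with K = len(thetas).

    f_in: N x F_in signal; thetas: list of K matrices, each F_in x F_out.
    """
    out = [[0.0] * len(thetas[0][0]) for _ in f_in]
    t_prev, t_curr = None, f_in  # T_0(L_tilde) f_in equals f_in
    for k, theta in enumerate(thetas):
        if k == 1:
            # T_1(L_tilde) f_in = L_tilde f_in
            t_prev, t_curr = t_curr, matmul(L_tilde, t_curr)
        elif k > 1:
            # T_k = 2 L_tilde T_{k-1} - T_{k-2}, applied to the signal
            lt = matmul(L_tilde, t_curr)
            t_next = [[2.0 * a - b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(lt, t_prev)]
            t_prev, t_curr = t_curr, t_next
        term = matmul(t_curr, theta)
        out = [[o + t for o, t in zip(r_out, r_term)]
               for r_out, r_term in zip(out, term)]
    return out
```

Passing three parameter matrices in `thetas` corresponds to K = 3. In a trained network the entries of `thetas` would be learned; here they are just inputs.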
On the predefined graph structure of the triangular mesh that represents the hand surface, a graph coarsening operation is performed first, analogous to pooling in a convolutional neural network. The Graclus multilevel clustering algorithm is used to coarsen the graph, and a tree structure is created to store the correspondence between vertices at adjacent coarsening levels. During the forward pass of the graph convolution, the vertex features of the coarsened graph are up-sampled to the corresponding child vertices in the finer graph, and graph convolution is then performed to update the features in the graph network. The parameter K of all graph convolutional layers is set to 3.
Specifically, the feature vector extracted by the hourglass network serves as the input of the graph convolution. Two fully connected layers convert the feature vector into 80 vertices with 64-dimensional features in the coarsened graph; these features are then up-sampled during the convolution process, going from low dimension to high dimension. Through two up-sampling layers and four graph convolutional layers, the network outputs the 3D coordinates of the 1280 mesh vertices.
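The up-sampling from coarse to fine graph levels can be sketched as copying each coarse vertex's features to its child vertices through the stored parent-child correspondence. The source gives only the end points (80 coarse vertices, 1280 mesh vertices, two up-sampling layers); the factor of 4 per layer and the regular parent map used here are assumptions chosen so that 80 * 4 * 4 = 1280, and a real Graclus tree would generally not be this regular. The function names are hypothetical.

```python
def upsample_features(coarse_features, parent_of):
    """Copy each coarse vertex's feature vector to its child vertices.

    coarse_features: list of feature lists, one per coarse vertex.
    parent_of: parent_of[child] is the index of that child's coarse parent.
    """
    return [list(coarse_features[parent]) for parent in parent_of]


def uniform_parent_map(num_coarse, children_per_parent):
    """Hypothetical regular tree: every coarse vertex has the same number of children."""
    return [child // children_per_parent
            for child in range(num_coarse * children_per_parent)]


# 80 coarse vertices with 64-dimensional features, up-sampled twice.
coarse = [[float(i)] * 64 for i in range(80)]
middle = upsample_features(coarse, uniform_parent_map(80, 4))    # 320 vertices
fine = upsample_features(middle, uniform_parent_map(320, 4))     # 1280 vertices
```

In the described network, graph convolutions between and after the up-sampling layers would refine the copied features and finally reduce them to three coordinates per vertex.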
In some embodiments, calculating the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates can be implemented as follows:
A linear graph convolutional network regresses the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
Specifically, this embodiment can use a simplified linear graph convolution to linearly regress the 3D hand joint position coordinates from the vertex coordinates of the 3D hand mesh. The mesh vertex coordinates cover the key points of the entire hand, so the three-dimensional coordinates of the 21 joint nodes can be selected from them directly. As shown in Fig. 3, the 21 joint nodes of one hand, numbered from joint 0 to joint 20, cover the entire hand posture. A two-layer graph convolutional network without nonlinear activation modules estimates the 3D joint depth information directly from the 3D mesh vertices, and the previously obtained two-dimensional key points are then used to generate the 3D joint position coordinates.
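Because a stack of layers without nonlinear activations is itself a linear map, the joint regression can be illustrated with a single weight matrix applied to the mesh vertices. This is a rough sketch under stated assumptions: the weight matrix stands in for the learned two-layer linear graph convolution, and the 3D joint is formed by attaching the estimated depth to the 2D key point, which is one plausible reading of the source rather than a confirmed detail. The function names are hypothetical.

```python
def regress_joint_depths(vertices, weights):
    """Estimate one depth per joint as a weighted sum of vertex depths.

    vertices: list of (x, y, z) mesh vertex coordinates.
    weights: one row of per-vertex weights per joint (stand-in for the
    learned linear graph convolution; an assumption of this sketch).
    """
    return [sum(w * v[2] for w, v in zip(row, vertices)) for row in weights]


def lift_keypoints(keypoints_2d, depths):
    """Attach an estimated depth to each 2D key point: (u, v) -> (u, v, z)."""
    return [(u, v, z) for (u, v), z in zip(keypoints_2d, depths)]
```

For the full hand, `weights` would have 21 rows and 1280 columns, and `keypoints_2d` would hold the 21 key points obtained in S12.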
This embodiment can extract joint point coordinates that cover the entire hand posture, which improves the accuracy of virtual hand posture synchronization in virtual reality.
S15. Restore the hand posture according to the three-dimensional joint point position coordinates. Specifically, the hand posture corresponding to the hand image is restored in the virtual reality interface according to the three-dimensional joint point position coordinates, so that the hand posture data in virtual reality tracks the actual hand posture as closely as possible, which improves synchronization during virtual interaction.
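One way to drive a virtual hand from the 21 joint coordinates is to convert them into bone vectors along a hand skeleton. The sketch below assumes a layout in which joint 0 is the wrist and each finger is a chain of four consecutive joints; the actual numbering of Fig. 3 is not reproduced in the text, so the `PARENT` table is hypothetical, as is the function name `bone_vectors`.

```python
# Hypothetical parent table for 21 joints: joint 0 is the wrist, and
# joints 1-4, 5-8, 9-12, 13-16 and 17-20 each form one finger chain.
PARENT = [-1] + [0 if j % 4 == 1 else j - 1 for j in range(1, 21)]


def bone_vectors(joints):
    """Return {child_index: (dx, dy, dz)} pointing from each parent joint to its child.

    joints: list of 21 (x, y, z) tuples.
    """
    bones = {}
    for child, parent in enumerate(PARENT):
        if parent < 0:
            continue  # the wrist has no parent bone
        bones[child] = tuple(c - p for c, p in zip(joints[child], joints[parent]))
    return bones
```

A rendering engine could orient each bone of a rigged hand model along the corresponding vector; how the described system performs the restoration is not specified in the source.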
In summary, this embodiment obtains a hand image in a preset state, extracts a feature image, key point coordinates and a two-dimensional heat map from it, combines the feature image and the two-dimensional heat map to generate a feature vector, generates three-dimensional joint point position coordinates from the feature vector and the key point coordinates, and finally restores the hand posture from the three-dimensional joint point position coordinates. As a result, a user can complete the virtual interaction without wearing dedicated gloves, which simplifies the equipment needed for virtual interaction and broadens the application scenarios to a certain extent.
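The data flow of steps S11 to S15 can be summarized as a small pipeline skeleton. This is only a sketch of how the five stages hand their results to each other; the `Stages` container and the `recognize_hand` function are hypothetical names, and each stage is passed in as a callable rather than implemented here.

```python
from typing import Any, Callable, NamedTuple


class Stages(NamedTuple):
    """One callable per step; the concrete implementations are left open."""
    acquire: Callable[[], Any]            # S11: returns the hand image
    extract: Callable[[Any], tuple]       # S12: image -> (feature_image, keypoints, heatmap)
    combine: Callable[[Any, Any], Any]    # S13: (feature_image, heatmap) -> feature_vector
    generate: Callable[[Any, Any], Any]   # S14: (feature_vector, keypoints) -> joints_3d
    restore: Callable[[Any], Any]         # S15: joints_3d -> hand posture


def recognize_hand(stages: Stages):
    """Run S11 to S15 in order and return the restored hand posture."""
    image = stages.acquire()
    feature_image, keypoints, heatmap = stages.extract(image)
    feature_vector = stages.combine(feature_image, heatmap)
    joints_3d = stages.generate(feature_vector, keypoints)
    return stages.restore(joints_3d)
```

The key point coordinates bypass the combining stage and re-enter at S14, which matches the step descriptions above.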
An embodiment of the present invention provides a hand data recognition system based on a graph convolutional network corresponding to the method of Fig. 1, comprising:
an acquisition module, configured to obtain a hand image in a preset state;
an extraction module, configured to extract a feature image, key point coordinates and a two-dimensional heat map from the hand image;
a combining module, configured to combine the feature image and the two-dimensional heat map to generate a feature vector;
a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
a restoration module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
The content of the method embodiments of the present invention applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiments above, and so are the beneficial effects achieved.
An embodiment of the present invention provides a hand data recognition system based on a graph convolutional network, comprising:
at least one memory, configured to store a program;
at least one processor, configured to load the program to execute the hand data recognition method based on a graph convolutional network described above.
The content of the method embodiments of the present invention applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiments above, and so are the beneficial effects achieved.
In addition, an embodiment of the present invention provides a computer-readable storage medium storing processor-executable instructions which, when executed by a processor, implement the hand data recognition method based on a graph convolutional network described above.
The above is a detailed description of the preferred embodiments of the present invention, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims (10)

  1. 一种基于图卷积网络的手部数据识别方法,其特征在于,包括以下步骤:A hand data recognition method based on graph convolutional network is characterized in that it comprises the following steps:
    获取预设状态的手部图像;Obtain a hand image in a preset state;
    提取所述手部图像的特征图像、关键点坐标和二维热图像;Extracting feature images, key point coordinates and two-dimensional thermal images of the hand image;
    将所述特征图像和所述二维热图像进行结合,生成特征向量;Combining the feature image and the two-dimensional thermal image to generate a feature vector;
    根据所述特征向量和所述关键点坐标生成三维关节点位置坐标;Generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
    根据所述三维关节点位置坐标还原手部姿态。Restore the hand posture according to the three-dimensional joint point position coordinates.
  2. 根据权利要求1所述的一种基于图卷积网络的手部数据识别方法,其特征在于,所述提取所述手部图像的关键点坐标和二维热图像,包括:The hand data recognition method based on graph convolutional network according to claim 1, wherein said extracting the key point coordinates and two-dimensional thermal image of the hand image comprises:
    采用堆叠沙漏网络从所述第一图像中提取关键点特征位置;Extracting key point feature positions from the first image by using a stacked hourglass network;
    根据所述关键点特征位置预测所述二维热图,以及确定所述关键点坐标。Predicting the two-dimensional heat map according to the characteristic positions of the key points, and determining the coordinates of the key points.
  3. 根据权利要求1所述的一种基于图卷积网络的手部数据识别方法,其特征在于,所述将所述特征图像和所述二维热图像进行结合,生成特征向量,包括:The hand data recognition method based on graph convolutional network according to claim 1, wherein said combining said characteristic image and said two-dimensional thermal image to generate a characteristic vector comprises:
    将所述二维热图像的尺寸大小转换为所述特征图像的尺寸大小;Converting the size of the two-dimensional thermal image into the size of the characteristic image;
    根据所述特征图像和尺寸转化后的所述二维热图通过卷积网络计算得到特征向量。According to the feature image and the size-converted two-dimensional heat map, a feature vector is calculated through a convolutional network.
  4. The hand data recognition method based on a graph convolutional network according to claim 1, wherein generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates comprises:
    computing the vertex coordinates of a three-dimensional mesh according to the feature vector; and
    computing the three-dimensional joint point position coordinates according to the vertex coordinates and the key point coordinates.
  5. The hand data recognition method based on a graph convolutional network according to claim 4, wherein computing the vertex coordinates of the three-dimensional mesh according to the feature vector specifically comprises:
    computing the coordinates of all vertices of the three-dimensional mesh from the feature vector by using a graph convolutional network.
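Illustrative note (not part of the claims): claim 5 does not specify the form of the graph convolution. The sketch below assumes a simple variant in which each vertex averages its own features with those of its mesh neighbours and then applies a shared linear map; other normalisations are equally possible. `graph_conv`, `matmul` and the toy mesh are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(adjacency: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution layer: average each vertex's features with its
    neighbours' (self-loop included), then apply a shared linear map."""
    n = len(adjacency)
    with_self = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    normalised = [[v / sum(row) for v in row] for row in with_self]
    return matmul(matmul(normalised, features), weight)


# Assumed toy mesh: a square split into two triangles
# (edges 0-1, 1-2, 2-3, 3-0 and the diagonal 0-2).
adjacency = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 2 values per vertex
weight = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]                  # maps 2 features to (x, y, z)
vertex_coords = graph_conv(adjacency, features, weight)
print(vertex_coords)
```

Stacking several such layers with non-linearities between them, and learning `weight` from data, gives the kind of network that could output one coordinate triple per mesh vertex.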
  6. The hand data recognition method based on a graph convolutional network according to claim 4, wherein computing the three-dimensional joint point position coordinates according to the vertex coordinates and the key point coordinates specifically comprises:
    regressing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates by using a linear graph convolutional network.
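Illustrative note (not part of the claims): the sketch below covers only the vertex-to-joint part of claim 6 and assumes the linear network reduces to a single regression matrix, so that each joint is a weighted combination of mesh vertices. How the key point coordinates enter the regression is not specified in the claim and is left out. `regress_joints` and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def regress_joints(regressor: Matrix, vertices: Matrix) -> Matrix:
    """Compute joints = regressor @ vertices, a purely linear mapping.

    regressor has one row per joint and one column per mesh vertex;
    vertices has one (x, y, z) row per mesh vertex.
    """
    return [
        [sum(w * v[axis] for w, v in zip(weights, vertices)) for axis in range(3)]
        for weights in regressor
    ]


# Assumed toy data: 4 mesh vertices and 2 joints.
vertices = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0], [0.0, 2.0, 4.0]]
regressor = [
    [0.5, 0.5, 0.0, 0.0],      # joint 0: midpoint of vertices 0 and 1
    [0.25, 0.25, 0.25, 0.25],  # joint 1: centroid of all four vertices
]
print(regress_joints(regressor, vertices))
```

In a trained model the regressor weights would be learned rather than set by hand.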
  7. The hand data recognition method based on a graph convolutional network according to claim 1, wherein restoring the hand posture according to the three-dimensional joint point position coordinates specifically comprises:
    restoring, in a virtual reality interface, the hand posture corresponding to the hand image according to the three-dimensional joint point position coordinates.
  8. A hand data recognition system based on a graph convolutional network, comprising:
    an acquisition module, configured to acquire a hand image in a preset state;
    an extraction module, configured to extract a feature image, key point coordinates and a two-dimensional heat map of the hand image;
    a combining module, configured to combine the feature image and the two-dimensional heat map to generate a feature vector;
    a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates; and
    a restoration module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
  9. A hand data recognition system based on a graph convolutional network, comprising:
    at least one memory, configured to store a program; and
    at least one processor, configured to load the program to execute the hand data recognition method based on a graph convolutional network according to any one of claims 1-7.
  10. A computer-readable storage medium storing processor-executable instructions, wherein the processor-executable instructions, when executed by a processor, implement the hand data recognition method based on a graph convolutional network according to any one of claims 1-7.
PCT/CN2020/099766 2020-05-29 2020-07-01 Hand data recognition method and system based on graph convolutional network, and storage medium WO2021237875A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010473675.3 2020-05-29
CN202010473675.3A CN111753669A (en) 2020-05-29 2020-05-29 Hand data identification method, system and storage medium based on graph convolution network

Publications (1)

Publication Number Publication Date
WO2021237875A1 true WO2021237875A1 (en) 2021-12-02

Family

ID=72674421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099766 WO2021237875A1 (en) 2020-05-29 2020-07-01 Hand data recognition method and system based on graph convolutional network, and storage medium

Country Status (2)

Country Link
CN (1) CN111753669A (en)
WO (1) WO2021237875A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260774A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and device for generating 3D joint point regression model
CN114565815A (en) * 2022-02-25 2022-05-31 包头市迪迦科技有限公司 Intelligent video fusion method and system based on three-dimensional model
CN116486489A (en) * 2023-06-26 2023-07-25 江西农业大学 Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095254B (en) * 2021-04-20 2022-05-24 清华大学深圳国际研究生院 Method and system for positioning key points of human body part
CN113724393B (en) * 2021-08-12 2024-03-19 北京达佳互联信息技术有限公司 Three-dimensional reconstruction method, device, equipment and storage medium
CN115909406A (en) * 2022-11-30 2023-04-04 广东海洋大学 Gesture recognition method based on multi-class classification
CN116243803B (en) * 2023-05-11 2023-12-05 南京鸿威互动科技有限公司 Action evaluation method, system, equipment and readable storage medium based on VR technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083752A1 (en) * 2015-09-18 2017-03-23 Yahoo! Inc. Face detection
CN108830150A (en) * 2018-05-07 2018-11-16 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN109821239A (en) * 2019-02-20 2019-05-31 网易(杭州)网络有限公司 Implementation method, device, equipment and the storage medium of somatic sensation television game
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN110874865A (en) * 2019-11-14 2020-03-10 腾讯科技(深圳)有限公司 Three-dimensional skeleton generation method and computer equipment
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device
CN111062261A (en) * 2019-11-25 2020-04-24 维沃移动通信(杭州)有限公司 Image processing method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260774A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and device for generating 3D joint point regression model
CN111260774B (en) * 2020-01-20 2023-06-23 北京百度网讯科技有限公司 Method and device for generating 3D joint point regression model
CN114565815A (en) * 2022-02-25 2022-05-31 包头市迪迦科技有限公司 Intelligent video fusion method and system based on three-dimensional model
CN114565815B (en) * 2022-02-25 2023-11-03 包头市迪迦科技有限公司 Video intelligent fusion method and system based on three-dimensional model
CN116486489A (en) * 2023-06-26 2023-07-25 江西农业大学 Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution
CN116486489B (en) * 2023-06-26 2023-08-29 江西农业大学 Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution

Also Published As

Publication number Publication date
CN111753669A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
WO2021237875A1 (en) Hand data recognition method and system based on graph convolutional network, and storage medium
CN109448090B (en) Image processing method, device, electronic equipment and storage medium
CN111160085A (en) Human body image key point posture estimation method
CN112330729B (en) Image depth prediction method, device, terminal equipment and readable storage medium
Wang et al. Laplacian pyramid adversarial network for face completion
CN111598993A (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
JP2007000205A (en) Image processing apparatus, image processing method, and image processing program
CN113034652A (en) Virtual image driving method, device, equipment and storage medium
WO2022100419A1 (en) Image processing method and related device
CN112837215B (en) Image shape transformation method based on generation countermeasure network
WO2022052782A1 (en) Image processing method and related device
WO2023083030A1 (en) Posture recognition method and related device
CN114332415A (en) Three-dimensional reconstruction method and device of power transmission line corridor based on multi-view technology
WO2022213623A1 (en) Image generation method and apparatus, three-dimensional facial model generation method and apparatus, electronic device and storage medium
WO2022208440A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
Su et al. Virtualpose: Learning generalizable 3d human pose models from virtual data
Zhou et al. A superior image inpainting scheme using Transformer-based self-supervised attention GAN model
CN113593001A (en) Target object three-dimensional reconstruction method and device, computer equipment and storage medium
CN117422851A (en) Virtual clothes changing method and device and electronic equipment
Wang et al. Learning continuous depth representation via geometric spatial aggregator
US20230104702A1 (en) Transformer-based shape models
CN112508776B (en) Action migration method and device and electronic equipment
CN114723973A (en) Image feature matching method and device for large-scale change robustness
CN117036658A (en) Image processing method and related equipment
CN113643357A (en) AR portrait photographing method and system based on 3D positioning information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20938377; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20938377; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.06.2023))