WO2021237875A1 - Hand data recognition method and system based on graph convolutional network, and storage medium - Google Patents

Hand data recognition method and system based on graph convolutional network, and storage medium

Info

Publication number
WO2021237875A1
WO2021237875A1 (PCT/CN2020/099766, CN2020099766W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
hand
dimensional
coordinates
convolutional network
Prior art date
Application number
PCT/CN2020/099766
Other languages
English (en)
French (fr)
Inventor
黄昌正
周言明
陈曦
霍炼楚
Original Assignee
广州幻境科技有限公司
肇庆市安可电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州幻境科技有限公司 and 肇庆市安可电子科技有限公司
Publication of WO2021237875A1 publication Critical patent/WO2021237875A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects

Definitions

  • the present invention relates to the field of computer vision technology, in particular to a method, system and storage medium for hand data recognition based on graph convolutional networks.
  • in the prior art, hand gesture recognition is performed by wearing a specific glove on the hand so that the glove tracks the hand posture data.
  • the virtual device receives the real-time hand posture and displays its tracking in the virtual reality interface, to improve the sense of realism in the virtual reality interface.
  • however, the specific gloves and their supporting facilities severely limit the scope of application, so that virtual devices cannot be effectively promoted.
  • the purpose of the present invention is to provide a hand data recognition method, system and storage medium based on graph convolutional network, which can broaden application scenarios to a certain extent.
  • a hand data recognition method based on graph convolutional network includes the following steps:
  • the extracting the key point coordinates and the two-dimensional heat map of the hand image includes:
  • the combining the feature image and the two-dimensional heat map to generate a feature vector includes:
  • a feature vector is calculated through a convolutional network.
  • the generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates includes:
  • the three-dimensional joint point position coordinates are calculated according to the vertex coordinates and the key point coordinates.
  • vertex coordinates of the three-dimensional grid obtained by calculating according to the feature vector are specifically:
  • the graph convolution network is used to calculate the coordinates of all the vertices of the three-dimensional grid.
  • a linear graph convolution network is used to regress the three-dimensional joint point position coordinates.
  • a hand data recognition system based on graph convolutional network including:
  • an acquisition module, configured to acquire a hand image in a preset state;
  • an extraction module, configured to extract the feature image, key point coordinates and two-dimensional heat map of the hand image;
  • a combining module, configured to combine the feature image and the two-dimensional heat map to generate a feature vector;
  • a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
  • a restoring module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
  • a hand data recognition system based on graph convolutional network including:
  • At least one memory for storing programs
  • At least one processor is configured to load the program to execute the hand data recognition method based on the graph convolutional network.
  • a computer-readable storage medium in which instructions executable by a processor are stored, and the instructions executable by the processor are used to implement the hand data recognition method based on graph convolutional network when executed by the processor.
  • the beneficial effect of the embodiment of the present invention is as follows: a hand image in a preset state is obtained; the feature image, key point coordinates, and two-dimensional heat map of the hand image are extracted; the feature image and the two-dimensional heat map are combined to generate a feature vector; three-dimensional joint point position coordinates are generated according to the feature vector and the key point coordinates; and finally the hand posture is restored according to the three-dimensional joint point position coordinates, so that during the virtual interaction process the interactor can complete the interaction without wearing special gloves.
  • this simplifies the application equipment of the virtual interaction process, in order to broaden the application scenarios to a certain extent.
  • FIG. 1 is a flowchart of a method for hand data recognition based on a graph convolutional network according to a specific embodiment of the present invention
  • FIG. 2 is a schematic diagram of a stacked hourglass network structure according to a specific embodiment
  • Fig. 3 is a schematic diagram of the distribution of 21 joint nodes in a specific embodiment.
  • an embodiment of the present invention provides a hand data recognition method based on a graph convolutional network.
  • This embodiment is applied to a control server, and the control server can communicate with multiple terminal devices.
  • the terminal device may be a camera, a virtual display device, and so on.
  • This embodiment includes steps S11-S15:
  • the preset state means that the hand is at the center of the image in the shooting scene, and the hand occupies a moderate proportion of the image.
  • the stacked hourglass network is a symmetrical network architecture.
  • its multi-scale features are used to recognize the posture: for each network layer on the path that produces low-resolution features, there is a corresponding network layer on the up-sampling path.
  • the overall network architecture first uses convolution and pooling operations to reduce the features to a very low resolution, such as 4*4.
  • at each max-pooling step, the network adds a new convolution branch that extracts features directly from the original resolution before pooling, similar to a residual operation, and fuses them with the features extracted by the subsequent up-sampling operations.
  • after reaching the lowest resolution, the network starts to up-sample the features by nearest-neighbor interpolation, combines the information at different scales, and then adds the previously connected features element-wise.
  • the final output of the network is a set of key point heat maps, which predict, for each pixel, the probability that each of the 21 key points shown in Figure 3 is present.
  • from C1 to C4 the feature maps are down-sampled and their resolution gradually decreases; C1a, C2a, C3a, and C4a are backups of the corresponding feature maps taken before down-sampling.
  • the feature map at the lowest resolution is gradually up-sampled, and each restored feature map is combined with the corresponding backed-up original feature map to obtain C1b, C2b, C3b, and C4b. Extracting different key points of the hand from different feature maps achieves better accuracy.
  • step S13 can be implemented through the following steps:
  • the size of the two-dimensional heat map is converted to the size of the feature image; a 1*1 convolution can be used to convert the size of the two-dimensional heat map containing the key points into the size of the feature image.
  • the feature vector is calculated through the convolutional network.
  • the structure of the convolutional network is similar to resnet18 and consists of 8 residual layers and 4 pooling layers.
  • the convolutional network is used to perform feature vector calculations to improve the accuracy of the calculation results.
  • this step first computes the vertex coordinates of the three-dimensional mesh from the feature vector, and then computes the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
  • the vertex coordinates of the three-dimensional grid are calculated according to the feature vector, which can be specifically implemented by the following steps:
  • the graph convolution network is used to calculate the coordinates of all the vertices of the three-dimensional grid.
  • the key point feature vector is input to the graph convolutional network.
  • the graph convolutional network outputs the 3D coordinates of all vertices in the 3D grid through a series of network layer calculations, and uses the 3D coordinates of the vertices in the 3D grid to reconstruct the hand surface 3D grid.
  • on the pre-defined graph structure of the triangle mesh that represents the hand surface, a graph coarsening operation is first performed, similar to pooling in a convolutional neural network: the Graclus multi-level clustering algorithm is used to coarsen the graph, and a tree structure is created to store the correspondence between vertices at adjacent coarsening levels. In the forward pass of the graph convolution, the vertex features of the coarsened graph are up-sampled to the corresponding child vertices in the graph structure, and finally graph convolution is performed to update the features in the graph network; the parameter K of all graph convolution layers is set to 3.
  • the feature vector extracted by the hourglass network is used as the input of the graph convolution.
  • through two fully connected layers, the feature vector is converted into 80 vertices with 64-dimensional features during the graph coarsening process, and these features are then used in the convolution process.
  • during up-sampling, the features are transformed from low dimensions to high dimensions.
  • through two up-sampling layers and four graph convolution layers, the network outputs the 3D coordinates of 1280 mesh vertices.
  • the three-dimensional joint point position coordinates are calculated according to the vertex coordinates and the key point coordinates, which can be implemented in the following ways:
  • a linear graph convolution network is used to regress the three-dimensional joint point position coordinates.
  • a simplified linear graph convolution can be specifically used to linearly regress the position coordinates of the 3D hand joint points from the coordinates of the vertices of the three-dimensional hand grid.
  • the vertex coordinates of the three-dimensional mesh include the coordinates of the key points of the entire hand, from which the three-dimensional coordinates of the 21 joint nodes can be directly selected.
  • the 21 joint points are numbered from joint point 0 to joint point 20.
  • together, these joint nodes cover the entire hand posture.
  • a two-layer graph convolutional network without a nonlinear activation module is used to directly estimate the 3D joint depth information from the 3D mesh vertices, and the previously obtained 2D key points are then used to generate the 3D joint position coordinates.
  • the coordinates of the joint points covering the entire hand posture can be extracted, thereby improving the accuracy of the virtual hand posture synchronization process in virtual reality.
  • S15 Restore the hand posture according to the three-dimensional joint point position coordinates. Specifically, the hand posture corresponding to the hand image is restored in the virtual reality interface according to the three-dimensional joint point position coordinates, so that the hand posture data in virtual reality tracks the actual hand posture as closely as possible, enhancing synchronization during the virtual interaction process.
  • this embodiment obtains a hand image in a preset state, extracts the feature image, key point coordinates, and two-dimensional heat map of the hand image, combines the feature image with the two-dimensional heat map to generate the feature vector, generates the three-dimensional joint point position coordinates according to the feature vector and the key point coordinates, and finally restores the hand posture according to the three-dimensional joint point position coordinates, so that in the virtual interaction process the interactor can complete the process without wearing special gloves.
  • this simplifies the application equipment of the virtual interaction process, to expand the application scenarios to a certain extent.
  • the embodiment of the present invention provides a hand data recognition system based on a graph convolutional network corresponding to the method in FIG. 1, including:
  • an acquisition module, configured to acquire a hand image in a preset state;
  • an extraction module, configured to extract the feature image, key point coordinates and two-dimensional heat map of the hand image;
  • a combining module, configured to combine the feature image and the two-dimensional heat map to generate a feature vector;
  • a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
  • a restoring module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
  • the embodiment of the present invention provides a hand data recognition system based on graph convolutional network, including:
  • At least one memory for storing programs
  • At least one processor is configured to load the program to execute the hand data recognition method based on the graph convolutional network.
  • an embodiment of the present invention provides a computer-readable storage medium in which processor-executable instructions are stored; when executed by a processor, the instructions implement the hand data recognition method based on the graph convolutional network.


Abstract

A hand data recognition method, system and storage medium based on a graph convolutional network. The method comprises the following steps: acquiring a hand image in a preset state (S11); extracting a feature image, key point coordinates and a two-dimensional heat map of the hand image (S12); combining the feature image with the two-dimensional heat map to generate a feature vector (S13); generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates (S14); and restoring the hand posture according to the three-dimensional joint point position coordinates (S15). With this method, during virtual interaction the interacting user can accurately complete the interaction without wearing a specific glove, which simplifies the equipment required for virtual interaction and broadens the application scenarios to a certain extent.

Description

Hand data recognition method and system based on graph convolutional network, and storage medium

Technical Field
The present invention relates to the field of computer vision, and in particular to a hand data recognition method, system and storage medium based on a graph convolutional network.

Background Art
In virtual reality interaction, hand posture recognition is performed by wearing a specific glove on the hand so that the glove tracks the hand posture data; a virtual device receives the real-time hand posture and displays its tracking in the virtual reality interface, to improve the sense of realism in the virtual reality interface. However, the specific glove and its supporting facilities severely limit the scope of application, so that virtual devices cannot be effectively promoted.

Summary of the Invention
To solve the above technical problem, the purpose of the present invention is to provide a hand data recognition method, system and storage medium based on a graph convolutional network, which can broaden the application scenarios to a certain extent.
A first aspect of the embodiments of the present invention provides:
A hand data recognition method based on a graph convolutional network, comprising the following steps:
acquiring a hand image in a preset state;
extracting a feature image, key point coordinates and a two-dimensional heat map of the hand image;
combining the feature image with the two-dimensional heat map to generate a feature vector;
generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
restoring the hand posture according to the three-dimensional joint point position coordinates.
Further, the extracting the key point coordinates and the two-dimensional heat map of the hand image comprises:
extracting key point feature positions from the first image using a stacked hourglass network;
predicting the two-dimensional heat map and determining the key point coordinates according to the key point feature positions.
Further, the combining the feature image with the two-dimensional heat map to generate a feature vector comprises:
converting the size of the two-dimensional heat map to the size of the feature image;
computing the feature vector from the feature image and the resized two-dimensional heat map through a convolutional network.
Further, the generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates comprises:
computing vertex coordinates of a three-dimensional mesh from the feature vector;
computing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
Further, the computing vertex coordinates of a three-dimensional mesh from the feature vector is specifically:
computing all vertex coordinates of the three-dimensional mesh from the feature vector using a graph convolutional network.
Further, the computing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates is specifically:
regressing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates using a linear graph convolutional network.
Further, the restoring the hand posture according to the three-dimensional joint point position coordinates is specifically:
restoring, in a virtual reality interface, the hand posture corresponding to the hand image according to the three-dimensional joint point position coordinates.
A second aspect of the embodiments of the present invention provides:
A hand data recognition system based on a graph convolutional network, comprising:
an acquisition module, configured to acquire a hand image in a preset state;
an extraction module, configured to extract a feature image, key point coordinates and a two-dimensional heat map of the hand image;
a combining module, configured to combine the feature image with the two-dimensional heat map to generate a feature vector;
a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
a restoring module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
A third aspect of the embodiments of the present invention provides:
A hand data recognition system based on a graph convolutional network, comprising:
at least one memory for storing a program; and
at least one processor for loading the program to execute the hand data recognition method based on the graph convolutional network.
A fourth aspect of the embodiments of the present invention provides:
A computer-readable storage medium storing processor-executable instructions which, when executed by a processor, implement the hand data recognition method based on the graph convolutional network.
The beneficial effects of the embodiments of the present invention are as follows: a hand image in a preset state is acquired; the feature image, key point coordinates and two-dimensional heat map of the hand image are extracted; the feature image and the two-dimensional heat map are combined to generate a feature vector; three-dimensional joint point position coordinates are generated according to the feature vector and the key point coordinates; and finally the hand posture is restored according to the three-dimensional joint point position coordinates. Thus, during virtual interaction, the interacting user can complete the interaction without wearing a specific glove, which simplifies the equipment required for virtual interaction and broadens the application scenarios to a certain extent.
Brief Description of the Drawings
FIG. 1 is a flowchart of a hand data recognition method based on a graph convolutional network according to a specific embodiment of the present invention;
FIG. 2 is a schematic diagram of a stacked hourglass network structure according to a specific embodiment;
FIG. 3 is a schematic diagram of the distribution of 21 joint nodes according to a specific embodiment.
Detailed Description
The present invention is further described in detail below with reference to the drawings and specific embodiments. The step numbers in the following embodiments are provided only for convenience of description; they impose no limitation on the order of the steps, and the execution order of the steps in the embodiments may be adapted according to the understanding of those skilled in the art.
In the following description, references to "some embodiments" describe a subset of all possible embodiments; "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to be limiting.
Referring to FIG. 1, an embodiment of the present invention provides a hand data recognition method based on a graph convolutional network. This embodiment is applied to a control server, which can communicate with multiple terminal devices; a terminal device may be a camera, a virtual display device, and so on.
This embodiment comprises steps S11-S15:
S11. Acquire a hand image in a preset state. The hand image can be captured by an ordinary RGB camera. The preset state means that, in the shooting scene, the hand is at the center of the image and occupies a moderate proportion of the image.
S12. Extract a feature image, key point coordinates and a two-dimensional heat map of the hand image. Specifically, a stacked hourglass network can be used to extract key point pixel positions from the hand image, predict the hand key point heat maps, and determine the initial key point coordinates.
As shown in FIG. 2, the stacked hourglass network is a symmetric network architecture. In this step its multi-scale features are used to recognize the posture: for every network layer on the path that produces low-resolution features, there is a corresponding layer on the up-sampling path. The overall architecture first uses convolution and pooling operations to reduce the features to a very low resolution, for example 4*4. At each max-pooling step, the network adds a new convolution branch that extracts features directly from the original resolution before pooling, similar to a residual operation, and fuses them with the features extracted after the subsequent up-sampling operations. After reaching the lowest resolution, the network starts to up-sample the features by nearest-neighbor interpolation, combining information at different scales, and then adds the previously connected features element-wise. When the output resolution is reached, two further convolutions perform the final computation. The final output of the network is a set of key point heat maps that predict, for each pixel, the probability that each of the 21 key points shown in FIG. 3 is present. As shown in FIG. 2, C1 to C4 form a down-sampling path in which the resolution of the feature maps gradually decreases, while C1a, C2a, C3a and C4a are backups of the corresponding feature maps taken before down-sampling. The feature map at the lowest resolution is gradually up-sampled, and each restored feature map is combined with the corresponding backed-up original feature map to obtain C1b, C2b, C3b and C4b. Extracting different key points of the hand from different feature maps achieves better accuracy.
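The heat-map-to-coordinate step described here (taking, for each of the 21 channels, the pixel with the highest predicted probability) can be sketched in a few lines of numpy; the array shapes and peak positions are illustrative, not taken from the patent:

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps: np.ndarray) -> np.ndarray:
    """Convert a (21, H, W) stack of key point heat maps into (21, 2)
    pixel coordinates (x, y) by taking the per-channel argmax."""
    num_kp, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_kp, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

# Toy example: one synthetic peak per channel
hm = np.zeros((21, 64, 64))
for k in range(21):
    hm[k, 10 + k, 20] = 1.0            # k-th peak at (x=20, y=10+k)
coords = heatmaps_to_keypoints(hm)     # (21, 2) array of (x, y)
```

In practice the heat maps are soft probability maps, so a soft-argmax or a local windowed refinement around the peak is often used instead of a hard argmax.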
S13. Combine the feature image with the two-dimensional heat map to generate a feature vector. This feature vector is the key point feature vector. Specifically, when the feature image and the two-dimensional heat map are combined, they are fed into a residual network composed of 8 residual layers and 4 pooling layers to generate the key point feature vector.
In some embodiments, step S13 can be implemented through the following steps:
Convert the size of the two-dimensional heat map to the size of the feature image; a 1*1 convolution can be used to convert the size of the two-dimensional heat map containing the key points to the size of the feature image.
Compute the feature vector from the feature image and the resized two-dimensional heat map through a convolutional network.
In this embodiment, the structure of the convolutional network is similar to resnet18 and consists of 8 residual layers and 4 pooling layers; using this network for the feature vector calculation improves the accuracy of the result.
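The 1*1 convolution used here to adapt the heat-map channels is simply a per-pixel linear map over the channel dimension; a minimal numpy sketch (the channel counts are illustrative assumptions, not values from the patent):

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in).
    The same channel-mixing matrix is applied at every pixel."""
    c_in, h, w = x.shape
    out = weight @ x.reshape(c_in, h * w)      # (C_out, H*W)
    return out.reshape(weight.shape[0], h, w)

rng = np.random.default_rng(0)
heat = rng.random((21, 32, 32))   # 21 key point heat maps
w = rng.random((64, 21))          # illustrative 21 -> 64 channel projection
feat = conv1x1(heat, w)           # (64, 32, 32), ready to combine with the feature image
```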
S14. Generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates.
Specifically, this step first computes vertex coordinates of a three-dimensional mesh from the feature vector, and then computes the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
In some embodiments, computing the vertex coordinates of the three-dimensional mesh from the feature vector can be implemented through the following step:
Compute all vertex coordinates of the three-dimensional mesh from the feature vector using a graph convolutional network.
Specifically, the key point feature vector is input to the graph convolutional network, which, through a series of network layers, outputs the 3D coordinates of all vertices of the 3D mesh; the hand surface 3D mesh is then reconstructed from these vertex coordinates.
The hand 3D mesh is in essence a graph structure, so the 3D mesh can be represented by an undirected graph M = (V, ε, W), where V = {v_i | i = 1, ..., N} is the set of the N vertices of the mesh, ε = {e_i | i = 1, ..., E} is the set of the E edges of the mesh, and W = {w_ij}_{N×N} is the adjacency matrix.
Define a signal f = (f_1, ..., f_N)^T ∈ R^{N×F} on the vertices of the graph M, representing the F-dimensional features of the N vertices of the 3D mesh. In Chebyshev graph convolution, the graph convolution operation on the signal f is defined as Formula 1:

    g = Σ_{k=0}^{K-1} T_k(L̃) f θ_k    (Formula 1)

where T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x) is the k-th order Chebyshev polynomial, with T_0 = 1 and T_1 = x; L̃ = 2L/λ_max − I_N is the rescaled Laplacian; λ_max is the largest eigenvalue of L; θ_k ∈ R^{F_in×F_out} are the trainable parameters of the graph convolution layer; and g is the output signal of the graph convolution layer.
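Formula 1 can be evaluated without forming matrix powers, using the Chebyshev recurrence T_k(L̃)f = 2 L̃ (T_{k-1}(L̃)f) − T_{k-2}(L̃)f. A toy numpy sketch with K = 3 as set in the patent (the graph, feature sizes and weights are made-up illustrations):

```python
import numpy as np

def chebyshev_graph_conv(f, L, thetas, lmax=None):
    """Chebyshev graph convolution (Formula 1).
    f: (N, F_in) vertex signal; L: (N, N) graph Laplacian;
    thetas: list of K weight matrices, each (F_in, F_out)."""
    n = L.shape[0]
    if lmax is None:
        lmax = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lmax - np.eye(n)        # rescaled Laplacian
    t_prev = f                                   # T_0(L~) f
    out = t_prev @ thetas[0]
    if len(thetas) > 1:
        t_curr = L_tilde @ f                     # T_1(L~) f
        out = out + t_curr @ thetas[1]
        for th in thetas[2:]:                    # T_k f = 2 L~ T_{k-1} f - T_{k-2} f
            t_prev, t_curr = t_curr, 2.0 * L_tilde @ t_curr - t_prev
            out = out + t_curr @ th
    return out

# 4-vertex cycle graph, K = 3 as in the patent
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
L = np.diag(A.sum(1)) - A                        # combinatorial Laplacian
rng = np.random.default_rng(0)
f = rng.standard_normal((4, 8))                  # 8-dim features on 4 vertices
thetas = [rng.standard_normal((8, 16)) for _ in range(3)]
y = chebyshev_graph_conv(f, L, thetas)           # (4, 16) output signal
```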
On the pre-defined graph structure of the triangle mesh that represents the hand surface, a graph coarsening operation is first performed, similar to pooling in a convolutional neural network: the Graclus multi-level clustering algorithm is used to coarsen the graph, and a tree structure is created to store the correspondence between vertices at adjacent coarsening levels. In the forward pass of the graph convolution, the vertex features of the coarsened graph are up-sampled to the corresponding child vertices in the graph structure, and finally graph convolution is performed to update the features in the graph network. The parameter K of all graph convolution layers is set to 3.
Specifically, the feature vector extracted by the hourglass network serves as the input of the graph convolution. Through two fully connected layers, the feature vector is converted in the graph coarsening process into 80 vertices with 64-dimensional features; these features are then up-sampled during convolution from low dimensions to high dimensions. Through two up-sampling layers and four graph convolution layers, the network outputs the 3D coordinates of 1280 mesh vertices.
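The up-sampling from a coarsened graph back to its child vertices, using the stored tree of vertex correspondences described here, amounts to copying each coarse vertex's feature to its children. A minimal sketch (the parent table is a made-up illustration of the stored correspondence):

```python
import numpy as np

def uncoarsen(features: np.ndarray, parent_of: np.ndarray) -> np.ndarray:
    """Up-sample coarsened vertex features: each vertex of the finer
    graph receives the feature of its parent in the coarser graph."""
    return features[parent_of]

coarse = np.arange(8, dtype=float).reshape(4, 2)   # 4 coarse vertices, 2-dim features
parent_of = np.array([0, 0, 1, 1, 2, 2, 3, 3])     # fine vertex -> coarse parent
fine = uncoarsen(coarse, parent_of)                # (8, 2) fine-level features
```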
In some embodiments, computing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates can be implemented as follows:
Regress the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates using a linear graph convolutional network.
Specifically, a simplified linear graph convolution can be used to linearly regress the 3D hand joint point position coordinates from the vertex coordinates of the three-dimensional hand mesh. The mesh vertex coordinates contain the key point coordinates of the entire hand, from which the three-dimensional coordinates of 21 joint nodes can be directly selected; as shown in FIG. 3, the 21 joint nodes on a hand, numbered 0 to 20, cover the entire hand posture. A two-layer graph convolutional network without a nonlinear activation module directly estimates the 3D joint depth information from the 3D mesh vertices, and the previously obtained 2D key points are then used to generate the 3D joint position coordinates.
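Because the two graph convolution layers have no nonlinear activation, the regressor described here collapses to a single learned linear map from mesh vertices to joints. A minimal numpy sketch (the uniform block-averaging weights are an illustrative stand-in for the trained weights):

```python
import numpy as np

def regress_joints(vertices: np.ndarray, J: np.ndarray) -> np.ndarray:
    """Linearly regress joint positions from mesh vertices.
    vertices: (N, 3) mesh vertex coordinates; J: (21, N) regression
    weights; each joint is a weighted combination of mesh vertices."""
    return J @ vertices

n_vertices = 1280                          # mesh size given in the patent
rng = np.random.default_rng(0)
vertices = rng.random((n_vertices, 3))
# Illustrative regressor: joint j averages a disjoint block of vertices
J = np.zeros((21, n_vertices))
block = n_vertices // 21
for j in range(21):
    J[j, j * block:(j + 1) * block] = 1.0 / block
joints = regress_joints(vertices, J)       # (21, 3) joint coordinates
```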
In this embodiment, joint point coordinates covering the entire hand posture can be extracted, which improves the accuracy of virtual hand posture synchronization in virtual reality.
S15. Restore the hand posture according to the three-dimensional joint point position coordinates. Specifically, the hand posture corresponding to the hand image is restored in the virtual reality interface according to the three-dimensional joint point position coordinates, so that the hand posture data in virtual reality tracks the actual hand posture as closely as possible, enhancing synchronization during virtual interaction.
In summary, this embodiment acquires a hand image in a preset state, extracts the feature image, key point coordinates and two-dimensional heat map of the hand image, combines the feature image with the two-dimensional heat map to generate a feature vector, generates three-dimensional joint point position coordinates from the feature vector and the key point coordinates, and finally restores the hand posture from the three-dimensional joint point position coordinates. During virtual interaction, the interacting user can thus complete the interaction without wearing a specific glove, which simplifies the equipment required for virtual interaction and broadens the application scenarios to a certain extent.
An embodiment of the present invention provides a hand data recognition system based on a graph convolutional network corresponding to the method of FIG. 1, comprising:
an acquisition module, configured to acquire a hand image in a preset state;
an extraction module, configured to extract the feature image, key point coordinates and two-dimensional heat map of the hand image;
a combining module, configured to combine the feature image with the two-dimensional heat map to generate a feature vector;
a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
a restoring module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
The content of the method embodiments of the present invention applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same.
An embodiment of the present invention provides a hand data recognition system based on a graph convolutional network, comprising:
at least one memory for storing a program; and
at least one processor for loading the program to execute the hand data recognition method based on the graph convolutional network.
The content of the method embodiments of the present invention applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same.
In addition, an embodiment of the present invention provides a computer-readable storage medium storing processor-executable instructions which, when executed by a processor, implement the hand data recognition method based on the graph convolutional network.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are all included within the scope defined by the claims of this application.

Claims (10)

  1. A hand data recognition method based on a graph convolutional network, characterized by comprising the following steps:
    acquiring a hand image in a preset state;
    extracting a feature image, key point coordinates and a two-dimensional heat map of the hand image;
    combining the feature image with the two-dimensional heat map to generate a feature vector;
    generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
    restoring the hand posture according to the three-dimensional joint point position coordinates.
  2. The hand data recognition method based on a graph convolutional network according to claim 1, characterized in that the extracting the key point coordinates and the two-dimensional heat map of the hand image comprises:
    extracting key point feature positions from the first image using a stacked hourglass network;
    predicting the two-dimensional heat map and determining the key point coordinates according to the key point feature positions.
  3. The hand data recognition method based on a graph convolutional network according to claim 1, characterized in that the combining the feature image with the two-dimensional heat map to generate a feature vector comprises:
    converting the size of the two-dimensional heat map to the size of the feature image;
    computing the feature vector from the feature image and the resized two-dimensional heat map through a convolutional network.
  4. The hand data recognition method based on a graph convolutional network according to claim 1, characterized in that the generating three-dimensional joint point position coordinates according to the feature vector and the key point coordinates comprises:
    computing vertex coordinates of a three-dimensional mesh from the feature vector;
    computing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates.
  5. The hand data recognition method based on a graph convolutional network according to claim 4, characterized in that the computing vertex coordinates of a three-dimensional mesh from the feature vector is specifically:
    computing all vertex coordinates of the three-dimensional mesh from the feature vector using a graph convolutional network.
  6. The hand data recognition method based on a graph convolutional network according to claim 4, characterized in that the computing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates is specifically:
    regressing the three-dimensional joint point position coordinates from the vertex coordinates and the key point coordinates using a linear graph convolutional network.
  7. The hand data recognition method based on a graph convolutional network according to claim 1, characterized in that the restoring the hand posture according to the three-dimensional joint point position coordinates is specifically:
    restoring, in a virtual reality interface, the hand posture corresponding to the hand image according to the three-dimensional joint point position coordinates.
  8. A hand data recognition system based on a graph convolutional network, characterized by comprising:
    an acquisition module, configured to acquire a hand image in a preset state;
    an extraction module, configured to extract a feature image, key point coordinates and a two-dimensional heat map of the hand image;
    a combining module, configured to combine the feature image with the two-dimensional heat map to generate a feature vector;
    a generating module, configured to generate three-dimensional joint point position coordinates according to the feature vector and the key point coordinates;
    a restoring module, configured to restore the hand posture according to the three-dimensional joint point position coordinates.
  9. A hand data recognition system based on a graph convolutional network, characterized by comprising:
    at least one memory for storing a program;
    at least one processor for loading the program to execute the hand data recognition method based on a graph convolutional network according to any one of claims 1-7.
  10. A computer-readable storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, implement the hand data recognition method based on a graph convolutional network according to any one of claims 1-7.
PCT/CN2020/099766 2020-05-29 2020-07-01 Hand data recognition method and system based on graph convolutional network, and storage medium WO2021237875A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010473675.3A CN111753669A (zh) 2020-05-29 2020-05-29 Hand data recognition method and system based on graph convolutional network, and storage medium
CN202010473675.3 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021237875A1 true WO2021237875A1 (zh) 2021-12-02

Family

ID=72674421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099766 WO2021237875A1 (zh) 2020-05-29 2020-07-01 Hand data recognition method and system based on graph convolutional network, and storage medium

Country Status (2)

Country Link
CN (1) CN111753669A (zh)
WO (1) WO2021237875A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095254B (zh) * 2021-04-20 2022-05-24 清华大学深圳国际研究生院 Method and system for locating key points of human body parts
CN113724393B (zh) * 2021-08-12 2024-03-19 北京达佳互联信息技术有限公司 Three-dimensional reconstruction method, apparatus, device and storage medium
CN115909406A (zh) * 2022-11-30 2023-04-04 广东海洋大学 Gesture recognition method based on multi-class classification
CN116243803B (zh) * 2023-05-11 2023-12-05 南京鸿威互动科技有限公司 Action evaluation method, system, device and readable storage medium based on VR technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083752A1 (en) * 2015-09-18 2017-03-23 Yahoo! Inc. Face detection
CN108830150A (zh) * 2018-05-07 2018-11-16 山东师范大学 Three-dimensional human body posture estimation method and apparatus
CN109821239A (zh) * 2019-02-20 2019-05-31 网易(杭州)网络有限公司 Method, apparatus, device and storage medium for implementing a motion-sensing game
CN110427877A (zh) * 2019-08-01 2019-11-08 大连海事大学 Human body three-dimensional posture estimation method based on structural information
CN110874865A (zh) * 2019-11-14 2020-03-10 腾讯科技(深圳)有限公司 Three-dimensional skeleton generation method and computer device
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related apparatus
CN111062261A (zh) * 2019-11-25 2020-04-24 维沃移动通信(杭州)有限公司 Image processing method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260774A (zh) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and apparatus for generating a 3D joint point regression model
CN111260774B (zh) * 2020-01-20 2023-06-23 北京百度网讯科技有限公司 Method and apparatus for generating a 3D joint point regression model
CN114565815A (zh) * 2022-02-25 2022-05-31 包头市迪迦科技有限公司 Three-dimensional-model-based intelligent video fusion method and system
CN114565815B (zh) * 2022-02-25 2023-11-03 包头市迪迦科技有限公司 Three-dimensional-model-based intelligent video fusion method and system
CN116486489A (zh) * 2023-06-26 2023-07-25 江西农业大学 Three-dimensional hand-object pose estimation method and system based on semantics-aware graph convolution
CN116486489B (zh) * 2023-06-26 2023-08-29 江西农业大学 Three-dimensional hand-object pose estimation method and system based on semantics-aware graph convolution

Also Published As

Publication number Publication date
CN111753669A (zh) 2020-10-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20938377

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20938377

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.06.2023)
