WO2022006784A1 - Human skeleton detection method, apparatus, system, device and storage medium - Google Patents

Human skeleton detection method, apparatus, system, device and storage medium Download PDF

Info

Publication number
WO2022006784A1
WO2022006784A1, PCT/CN2020/100900, CN2020100900W
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
human skeleton
data
processing
input
Prior art date
Application number
PCT/CN2020/100900
Other languages
English (en)
French (fr)
Inventor
韩晓光
邱陵腾
张轩烨
崔曙光
Original Assignee
香港中文大学(深圳)
深圳市大数据研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 香港中文大学(深圳), 深圳市大数据研究院 filed Critical 香港中文大学(深圳)
Priority to PCT/CN2020/100900 priority Critical patent/WO2022006784A1/zh
Publication of WO2022006784A1 publication Critical patent/WO2022006784A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to a human skeleton detection method, apparatus, system, device and storage medium.
  • Human skeleton estimation has long been a central topic in computer vision. Obtaining the key points of the human body through skeleton analysis simplifies motion estimation; in particular, for some 3D human body reconstruction tasks, skeleton estimation is one of the prerequisite tasks.
  • In the prior art, skeleton detection is mainly divided into top-down and bottom-up methods. A top-down method first detects all people in the scene and locates a detection box for each human body, where each detection box contains the key points of that person's skeleton, and then performs pose estimation for each person.
  • A bottom-up approach detects key points over the entire image and then groups them by clustering to obtain each person's skeleton. Human skeleton estimation is more challenging in crowded scenes, because human skeletons are occluded in such scenes, which makes key point detection inaccurate.
  • the embodiments of the present disclosure provide a human skeleton detection method, apparatus, system, device, and storage medium.
  • an embodiment of the present disclosure provides a method for detecting a human skeleton.
  • the human skeleton detection method includes:
  • the processing of the to-be-recognized picture to obtain the initial posture of the target human skeleton is implemented as: using a skeleton detection network to process the to-be-recognized picture and generate a heat map of the target human skeleton;
  • the heat map is converted into coordinate data as the initial pose of the target human skeleton.
  • the acquiring feature maps output by different decoding layers in the process of processing the to-be-identified picture is implemented as:
  • the skeleton detection network is used to process the image to be recognized, and feature maps of at least three decoding layers are extracted from it, denoted as a coarse feature map, an intermediate feature map and a fine feature map, whose resolutions increase and whose numbers of channels decrease in that order.
  • the processing of the feature maps to obtain feature map data, and the extraction of position data corresponding to the initial pose from the feature map data as input data, is implemented as: converting the coarse feature map into first feature map data and extracting first position data from it; fusing the coarse and intermediate feature maps and extracting second position data from the fused feature map data; and fusing the three feature maps and extracting third position data from the fused feature map data.
  • the fusion of the feature maps includes:
  • S11: processing the feature maps to the same resolution and number of channels;
  • S12: fusing the features obtained in step S11 with a self-attention network and normalizing the result;
  • S13: fusing the features obtained in step S12 with the features obtained in step S11 to obtain the feature map data.
  • the manner of training the graph convolutional neural network is implemented as:
  • the first, second and third position data are respectively input into the attention modules of the graph convolutional neural network, wherein the first attention module takes the initial pose and the first position data as input features; the second attention module takes the output features of the first attention module and the second position data as input features; and the third attention module takes the output features of the second attention module and the third position data as input features.
  • an embodiment of the present disclosure provides a human skeleton detection apparatus.
  • the human skeleton detection device includes:
  • the first acquisition module is configured to process the image to be recognized and acquire the initial posture of the target human skeleton
  • a second acquisition module configured to acquire feature maps output by different decoding layers in the process of processing the to-be-identified picture
  • an extraction module configured to process the feature map to obtain feature map data, and extract position data corresponding to the initial posture from the feature map data as input data;
  • the third acquisition module is configured to input the initial posture and the input data to the trained graph convolutional neural network, and obtain the final posture of the target human skeleton; wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  • embodiments of the present disclosure provide an electronic device, including a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method described in any one of the first aspects.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which computer instructions are stored, and when the computer instructions are executed by a processor, implement the method according to any one of the first aspects.
  • the image to be recognized is first processed to obtain the initial posture of the target human skeleton; the feature maps output by different decoding layers in the process of processing the image are then obtained; the feature maps are processed to obtain feature map data, and position data corresponding to the initial posture are extracted from the feature map data as input data; finally, the initial posture and the input data are input into the trained graph convolutional neural network to obtain
  • the final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  • by combining the basic constraint information between the joint structures of the human body with the data about occluded key points mined from the feature maps, the technical solution can accurately adjust the positions of occluded key points, and therefore detects the human skeleton with high accuracy.
  • FIG. 1 shows a network architecture of a human skeleton detection system according to an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a human skeleton detection method according to an embodiment of the present disclosure
  • FIGS. 3a and 3b are schematic diagrams comparing the detection results of a prior-art human skeleton detection method and of the human skeleton detection method according to an embodiment of the present disclosure;
  • FIG. 4 shows a flowchart of feature map fusion according to an embodiment of the present disclosure;
  • FIG. 5 shows a specific flowchart of feature fusion according to an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of feature map fusion according to an embodiment of the present disclosure
  • FIG. 7 shows a structural block diagram of a human skeleton detection apparatus according to an embodiment of the present disclosure
  • FIG. 8 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure
  • FIG. 9 shows a schematic structural diagram of a computer system suitable for implementing a method according to an embodiment of the present disclosure.
  • skeleton estimation of the human body has long been a central topic in computer vision. Obtaining the key points of the human body through skeleton analysis simplifies motion estimation; in particular, for some 3D human body reconstruction tasks, skeleton estimation is one of the prerequisite tasks.
  • in the prior art, skeleton detection is mainly divided into top-down and bottom-up methods. A top-down method first detects all people in the scene and locates a detection box for each human body, where each detection box contains the key points of that person's skeleton, and then performs pose estimation for each person.
  • a bottom-up approach detects key points over the entire image and then groups them by clustering to obtain each person's skeleton. Human skeleton estimation is more challenging in crowded scenes, because human skeletons are occluded in such scenes, which makes key point detection inaccurate.
  • the human skeleton detection method proposed in the present disclosure can accurately adjust the positions of occluded key points by combining the basic constraint information between human joints with the data about occluded key points mined from the feature maps, and therefore detects the human skeleton with high accuracy.
  • FIG. 1 shows a network architecture of a human skeleton detection system according to an embodiment of the present disclosure.
  • the human skeleton detection system 100 includes: an initial pose estimation module 110 , a cascaded adaptive module 120 and a graph convolutional neural network 130 .
  • the initial pose estimation module 110 is used to process the to-be-recognized picture and obtain the initial posture of the target human skeleton. First, a heat map of the key points of the target human skeleton is obtained from the image to be recognized, and the heat map is then converted into the initial pose of the target human skeleton represented by coordinates.
  • the cascade adaptation module 120 is configured to acquire feature maps output by different decoding layers in the process of processing the to-be-identified picture, and to process the feature maps to obtain feature map data. For example, feature maps 1, 2 and 3 are extracted in turn from the last three decoding layers used by the initial pose estimation module 110 to obtain the heat map, and feature map data are then generated through feature fusion. The feature maps 1, 2 and 3 extracted from the three decoding layers increase in resolution and decrease in channel count; with their different resolutions and channel counts, they represent detail features of the occluded key points of the human skeleton at different levels of precision.
  • the cascade adaptation module 120 includes a conversion sub-module 121 and a fusion sub-module 122, wherein the conversion sub-module 121 is used to convert the two feature maps to be fused into features with the same number of channels, and the fusion sub-module 122 is used to fuse the converted features to generate the feature map data.
  • the cascaded adaptive module 120 and the graph convolutional neural network 130 are jointly used to adjust the initial posture of the human skeleton, and accurately adjust the position of the occluded key points in the image to be recognized.
  • the feature maps 1, 2 and 3 are fused in order of resolution from low to high, so that contextual information across the feature maps can be used to mine
  • the detail features of the occluded key points of the human skeleton from the different feature maps, yielding three sets of feature map data.
  • position data corresponding to the initial posture are extracted from the three sets of feature map data respectively and used as input data,
  • which are respectively input into the residual graph convolution modules 131 of the graph convolutional neural network 130 to train its weights, and the network correspondingly outputs pose 1, pose 2 and the final pose of the target human skeleton.
  • the objective function for training the graph convolutional neural network 130 is determined jointly from the errors of pose 1, pose 2 and the final pose.
  • the graph convolutional neural network 130 outputs the final posture of the target human skeleton according to the initial pose and the input data, wherein the input data are the position data corresponding to the initial posture extracted from the feature map data.
  • the Laplacian matrix in the graph convolutional neural network 130 provides basic constraint information between the joints of the human body structure.
  • the human skeleton detection system provided by the present disclosure is suited to detecting occluded human skeletons in crowded scenes.
  • by combining the basic constraint information between human joints with the data about occluded key points mined from feature maps of different resolutions and channel counts, it can accurately adjust the positions of occluded key points, and therefore detects the human skeleton with high accuracy.
  • FIG. 2 shows a flowchart of a human skeleton detection method according to an embodiment of the present disclosure.
  • the human skeleton detection method includes the following steps S101-S104:
  • in step S101, the image to be recognized is processed to obtain the initial posture of the target human skeleton;
  • in step S102, feature maps output by different decoding layers in the process of processing the to-be-identified picture are acquired;
  • in step S103, the feature maps are processed to obtain feature map data, and position data corresponding to the initial posture are extracted from the feature map data as input data;
  • in step S104, the initial posture and the input data are input into the trained graph convolutional neural network to obtain the final posture of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  • the processing of the to-be-recognized picture in step S101 to obtain the initial posture of the target human skeleton is implemented as: using a skeleton detection network to process the picture and generate a heat map of the target human skeleton;
  • the heat map is converted into coordinate data as the initial pose of the target human skeleton.
  • the skeleton detection network may be an AlphaPose system, and details can be referred to in the prior art, which will not be repeated in this disclosure.
  • the AlphaPose system is used to process the image to be recognized and output a heat map H, and the initial pose of the human skeleton estimated from the heat map H is recorded as {<x_1, y_1, c_1>, <x_2, y_2, c_2>, ..., <x_k, y_k, c_k>},
  • where x_j and y_j are the position of the j-th joint,
  • c_j is its confidence score, and
  • k is the number of joints in the human skeleton.
  • the initial pose of the human skeleton represented by the heat map H is converted into an initial pose represented by coordinates, which serves as the initial physical position data of the target human skeleton.
  • the heat map H is normalized into likelihood values in [0, 1] by a Softmax function, and an integral operation is then applied to estimate each joint position.
  • the acquisition in step S102 of the feature maps output by different decoding layers in the process of processing the to-be-identified picture is implemented as:
  • the skeleton detection network is used to process the image to be recognized, and the feature maps of at least three decoding layers are extracted from it, denoted as a coarse feature map, an intermediate feature map and a fine feature map, whose resolutions increase and whose numbers of channels decrease in that order.
  • the AlphaPose system is used to process the image to be recognized, and feature maps are extracted from the last three decoding layers that produce the heat map, recorded as the coarse feature map, the intermediate feature map and the fine feature map;
  • the resolutions of these feature maps increase and their numbers of channels decrease.
  • the features of the coarse feature map are denoted conv_1,
  • those of the intermediate feature map conv_2, and
  • those of the fine feature map conv_3; the resolution and number of channels of conv_1 are 21*21*512, those of conv_2 are 42*42*256, and those of conv_3 are 84*84*128.
  • in step S103, the feature maps are processed to obtain feature map data, and position data corresponding to the initial posture are extracted from the feature map data as input data, which is implemented as:
  • the feature maps are convolved to the same number of channels; for example, the numbers of channels of conv_1, conv_2 and conv_3 are convolved to 256.
  • for the coarse feature map, the convolved features are the first feature map data; then, based on the coordinate-form initial pose obtained from the heat map H, the position data at the corresponding locations are extracted from the feature map data.
  • the convolution processing module uses a convolution kernel of size 3 and a stride of 1, with ReLU as the activation function.
  • FIG. 4 shows a flowchart of feature map fusion according to an embodiment of the present disclosure.
  • the fusion of the feature maps includes:
  • S11: processing the feature maps to the same resolution and number of channels;
  • S12: fusing the features obtained in step S11 with a self-attention network and normalizing the result;
  • S13: fusing the features obtained in step S12 with the features obtained in step S11 to obtain the feature map data.
  • FIG. 5 shows a specific flowchart of feature fusion according to an embodiment of the present disclosure.
  • FIG. 6 shows a flowchart of feature map fusion according to an embodiment of the present disclosure.
  • the fusion of the feature maps includes: in step S21, fusing the coarse and intermediate feature maps to obtain fused feature map data; and in step S22, fusing that feature map data with the fine feature map to obtain the final feature map data.
  • the specific technique for obtaining the feature map data in steps S21 and S22 follows steps S11-S13 and is not repeated here.
  • the method of training the graph convolutional neural network is implemented as:
  • the first, second and third position data are respectively input into the attention modules of the graph convolutional neural network, wherein the first attention module takes the initial pose and the first position data as input features; the second attention module takes the output features of the first attention module and the second position data as input features; and the third attention module takes the output features of the second attention module and the third position data as input features.
  • the graph convolutional neural network uses the network structure of Deep-GCN, wherein the Laplacian matrix of the graph convolutional neural network is automatically given according to the human skeleton structure.
  • by combining the position data, information that is stored in the feature maps but may be lost in the initial pose can be mined and used to adjust the initial pose of the target human skeleton, which improves the pose estimation result.
  • FIG. 7 shows a structural block diagram of a human skeleton detection apparatus according to an embodiment of the present disclosure.
  • the apparatus may be realized by software, hardware or a combination of the two to become part or all of the electronic device.
  • the human skeleton detection apparatus 700 includes a first acquisition module 710 , a second acquisition module 720 , an extraction module 730 and a third acquisition module 740 .
  • the first obtaining module 710 is configured to process the to-be-recognized picture and obtain the initial posture of the target human skeleton;
  • the second obtaining module 720 is configured to obtain feature maps output by different decoding layers in the process of processing the to-be-identified picture;
  • the extraction module 730 is configured to process the feature map to obtain feature map data, and extract position data corresponding to the initial posture from the feature map data as input data;
  • the third acquisition module 740 is configured to input the initial posture and the input data into a trained graph convolutional neural network to obtain the final posture of the target human skeleton,
  • wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  • the human skeleton detection device provided by the present disclosure is suited to detecting occluded human skeletons in crowded scenes.
  • by combining the basic constraint information between human joints with the data about occluded key points mined from feature maps of different resolutions and channel counts, it can accurately adjust the positions of occluded key points, and therefore detects the human skeleton with high accuracy.
  • FIG. 8 shows a structural block diagram of the electronic device according to an embodiment of the present disclosure.
  • the electronic device 800 includes a memory 801 and a processor 802, wherein the memory 801 is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor 802 to implement the method steps of the human skeleton detection method described above.
  • FIG. 9 shows a schematic structural diagram of a computer system suitable for implementing a method according to an embodiment of the present disclosure.
  • a computer system 900 includes a processing unit 901 that can perform the various methods in the above-described embodiments according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903.
  • in the RAM 903, various programs and data required for the operation of the system 900 are also stored.
  • the processing unit 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
  • an input/output (I/O) interface 905 is also connected to the bus 904.
  • the following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem.
  • the communication section 909 performs a communication process via a network such as the Internet.
  • a drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 910 as needed so that a computer program read therefrom is installed into the storage section 908 as needed.
  • the processing unit 901 may be implemented as a processing unit such as a CPU, a GPU, a TPU, an FPGA, and an NPU.
  • embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the above-described method.
  • the computer program may be downloaded and installed from the network via the communication portion 909, and/or installed from the removable medium 911.
  • each block in the flowchart or block diagrams may represent a module, a segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments of the present disclosure may be implemented in a software manner, or may be implemented in a programmable hardware manner.
  • the described units or modules may also be provided in the processor, and the names of these units or modules do not constitute a limitation on the units or modules themselves in certain circumstances.
  • the present disclosure also provides a computer-readable storage medium
  • the computer-readable storage medium may be a computer-readable storage medium included in the electronic device or computer system of the above-mentioned embodiments, or it may exist independently as a computer-readable storage medium that is not assembled into a device.
  • the computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described in the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A human skeleton detection method, apparatus, system, device and storage medium. The method comprises: processing a picture to be recognized to obtain an initial pose of a target human skeleton (S101); acquiring feature maps output by different decoding layers in the process of processing the picture (S102); processing the feature maps to obtain feature map data, and extracting position data corresponding to the initial pose from the feature map data as input data (S103); and inputting the initial pose and the input data into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure (S104). By combining the basic constraint information between human joints with the data about occluded key points mined from the feature maps, the positions of occluded key points can be adjusted accurately, so the human skeleton is detected with a high accuracy rate.

Description

Human skeleton detection method, apparatus, system, device and storage medium
Cross-reference to related applications
None.
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a human skeleton detection method, apparatus, system, device and storage medium.
Background
Human skeleton estimation has long been a central topic in computer vision. Obtaining the key points of the human body through skeleton analysis simplifies motion estimation; in particular, for some 3D human body reconstruction tasks, skeleton estimation is one of the prerequisite tasks. In the prior art, skeleton detection is mainly divided into top-down and bottom-up methods. A top-down method first detects all people in the scene and locates a detection box for each human body, where each detection box contains the key points of that person's skeleton, and then performs pose estimation for each person. A bottom-up approach detects key points over the entire image and then groups them by clustering to obtain each person's skeleton. Human skeleton estimation is more challenging in crowded scenes, because human skeletons are occluded in such scenes, which makes key point detection inaccurate.
Summary
In order to solve the problems in the related art, embodiments of the present disclosure provide a human skeleton detection method, apparatus, system, device and storage medium.
In a first aspect, an embodiment of the present disclosure provides a human skeleton detection method.
Specifically, the human skeleton detection method includes:
processing a picture to be recognized to obtain an initial pose of a target human skeleton;
acquiring feature maps output by different decoding layers in the process of processing the picture to be recognized;
processing the feature maps to obtain feature map data, and extracting position data corresponding to the initial pose from the feature map data as input data;
inputting the initial pose and the input data into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
With reference to the first aspect, in a first implementation of the first aspect of the present disclosure, the processing of the picture to be recognized to obtain the initial pose of the target human skeleton is implemented as:
using a skeleton detection network to process the picture to be recognized and generate a heat map of the target human skeleton;
converting the heat map into coordinate data as the initial pose of the target human skeleton.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect of the present disclosure, the acquiring of the feature maps output by different decoding layers in the process of processing the picture to be recognized is implemented as:
using the skeleton detection network to process the picture to be recognized and extracting the feature maps of at least three decoding layers, denoted as a coarse feature map, an intermediate feature map and a fine feature map, whose resolutions increase and whose numbers of channels decrease in that order.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect of the present disclosure, the processing of the feature maps to obtain feature map data and the extraction of position data corresponding to the initial pose from the feature map data as input data is implemented as:
converting the coarse feature map into first feature map data and extracting first position data from the first feature map data;
fusing the coarse feature map and the intermediate feature map and extracting second position data from the fused feature map data;
fusing the three feature maps and extracting third position data from the fused feature map data.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect of the present disclosure, the fusion of the feature maps includes:
S11: processing the feature maps to the same resolution and number of channels;
S12: fusing the features obtained in step S11 with a self-attention network and normalizing the result;
S13: fusing the features obtained in step S12 with the features obtained in step S11 to obtain the feature map data.
With reference to the third and fourth implementations of the first aspect, in a fifth implementation of the first aspect of the present disclosure, the manner of training the graph convolutional neural network is implemented as:
respectively inputting the first, second and third position data into the attention modules of the graph convolutional neural network, wherein the first attention module takes the initial pose and the first position data as input features; the second attention module takes the output features of the first attention module and the second position data as input features; and the third attention module takes the output features of the second attention module and the third position data as input features.
In a second aspect, an embodiment of the present disclosure provides a human skeleton detection apparatus.
Specifically, the human skeleton detection apparatus includes:
a first acquisition module configured to process a picture to be recognized and obtain an initial pose of a target human skeleton;
a second acquisition module configured to acquire feature maps output by different decoding layers in the process of processing the picture to be recognized;
an extraction module configured to process the feature maps to obtain feature map data and to extract position data corresponding to the initial pose from the feature map data as input data;
a third acquisition module configured to input the initial pose and the input data into a trained graph convolutional neural network and obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method described in any one of the first aspects.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which computer instructions are stored, and the computer instructions, when executed by a processor, implement the method described in any one of the first aspects.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
According to the technical solution provided by the embodiments of the present disclosure, a picture to be recognized is first processed to obtain an initial pose of a target human skeleton; feature maps output by different decoding layers in the process of processing the picture are then acquired; the feature maps are processed to obtain feature map data, and position data corresponding to the initial pose are extracted from the feature map data as input data; finally, the initial pose and the input data are input into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure. By combining the basic constraint information between human joints with the data about occluded key points mined from the feature maps, this technical solution can accurately adjust the positions of occluded key points and therefore detects the human skeleton with a high accuracy rate.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
These and other aspects of the present disclosure will become more readily apparent from the description of the following embodiments. It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Description of the drawings
In order to more clearly explain the technical solutions in the embodiments of the present disclosure or in the related art, the drawings needed in describing the exemplary embodiments or the related art are briefly introduced below. Obviously, the drawings described below illustrate only some exemplary embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a network architecture of a human skeleton detection system according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of a human skeleton detection method according to an embodiment of the present disclosure;
FIGS. 3a and 3b are schematic diagrams comparing the skeleton detection results of a prior-art method and of the human skeleton detection method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of feature map fusion according to an embodiment of the present disclosure;
FIG. 5 shows a specific flowchart of feature fusion according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of feature map fusion according to an embodiment of the present disclosure;
FIG. 7 shows a structural block diagram of a human skeleton detection apparatus according to an embodiment of the present disclosure;
FIG. 8 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 9 shows a schematic structural diagram of a computer system suitable for implementing a method according to an embodiment of the present disclosure.
Detailed description
In order to enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the exemplary embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings in the exemplary embodiments.
Some of the flows described in the specification, the claims and the above drawings of the present disclosure contain operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. The sequence numbers of the operations, such as 101 and 102, are only used to distinguish the different operations; the numbers themselves do not represent any execution order. In addition, these flows may include more or fewer operations, and these operations may be executed in order or in parallel. It should be noted that the terms "first", "second" and the like herein are used to distinguish different messages, devices, modules and so on; they do not represent a sequence, nor do they require "first" and "second" to be of different types.
The technical solutions in the exemplary embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, but not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
As mentioned above, human skeleton estimation has long been a central topic in computer vision. Obtaining the key points of the human body through skeleton analysis simplifies motion estimation; in particular, for some 3D human body reconstruction tasks, skeleton estimation is one of the prerequisite tasks. In the prior art, skeleton detection is mainly divided into top-down and bottom-up methods. A top-down method first detects all people in the scene and locates a detection box for each human body, where each detection box contains the key points of that person's skeleton, and then performs pose estimation for each person. A bottom-up approach detects key points over the entire image and then groups them by clustering to obtain each person's skeleton. Human skeleton estimation is more challenging in crowded scenes, because human skeletons are occluded in such scenes, which makes key point detection inaccurate.
By combining the basic constraint information between human joints with the data about occluded key points mined from the feature maps, the human skeleton detection method proposed in the present disclosure can accurately adjust the positions of occluded key points and therefore detects the human skeleton with a high accuracy rate.
FIG. 1 shows a network architecture of a human skeleton detection system according to an embodiment of the present disclosure. As shown in FIG. 1, the human skeleton detection system 100 includes an initial pose estimation module 110, a cascade adaptation module 120 and a graph convolutional neural network 130.
The initial pose estimation module 110 is used to process a picture to be recognized and obtain an initial pose of a target human skeleton. A heat map of the key points of the target human skeleton is first obtained from the picture, and the heat map is then converted into the initial pose of the target human skeleton represented by coordinates.
The cascade adaptation module 120 is used to acquire the feature maps output by different decoding layers in the process of processing the picture, and to process the feature maps to obtain feature map data. For example, feature maps 1, 2 and 3 are extracted in turn from the last three decoding layers used by the initial pose estimation module 110 to obtain the heat map, and feature map data are then generated through feature fusion. The feature maps 1, 2 and 3 extracted from the three decoding layers increase in resolution and decrease in channel count; with their different resolutions and channel counts, they represent detail features of the occluded key points of the human skeleton at different levels of precision.
The cascade adaptation module 120 includes a conversion sub-module 121 and a fusion sub-module 122, wherein the conversion sub-module 121 is used to convert the two feature maps to be fused into features with the same number of channels, and the fusion sub-module 122 is used to fuse the converted features to generate the feature map data.
The cascade adaptation module 120 and the graph convolutional neural network 130 are jointly used to adjust the initial pose of the human skeleton and to accurately adjust the positions of the occluded key points in the picture to be recognized. In the cascade adaptation module 120, the feature maps 1, 2 and 3 are fused in order of resolution from low to high, so that contextual information across the feature maps can be used to mine the detail features of the occluded key points of the human skeleton from the different feature maps, yielding three sets of feature map data. Position data corresponding to the initial pose are extracted from the three sets of feature map data respectively and used as input data, which are respectively input into the residual graph convolution modules 131 of the graph convolutional neural network 130 to train its weights; the network correspondingly outputs pose 1, pose 2 and the final pose of the target human skeleton. The objective function for training the graph convolutional neural network 130 is determined jointly from the errors of pose 1, pose 2 and the final pose.
The graph convolutional neural network 130 outputs the final pose of the target human skeleton according to the initial pose and the input data, wherein the input data are the position data corresponding to the initial pose extracted from the feature map data. The Laplacian matrix in the graph convolutional neural network 130 provides the basic constraint information between the joints of the human body structure.
The human skeleton detection system provided by the present disclosure is suited to detecting occluded human skeletons in crowded scenes. By combining the basic constraint information between human joints with the data about occluded key points mined from feature maps of different resolutions and channel counts, it can accurately adjust the positions of occluded key points and therefore detects the human skeleton with a high accuracy rate.
FIG. 2 shows a flowchart of a human skeleton detection method according to an embodiment of the present disclosure. As shown in FIG. 2, the human skeleton detection method includes the following steps S101-S104:
In step S101, a picture to be recognized is processed to obtain an initial pose of a target human skeleton.
In step S102, feature maps output by different decoding layers in the process of processing the picture are acquired.
In step S103, the feature maps are processed to obtain feature map data, and position data corresponding to the initial pose are extracted from the feature map data as input data.
In step S104, the initial pose and the input data are input into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
Existing human skeleton detection methods rely heavily on heat map representations for joint position estimation, and their position estimates for occluded key points are often inaccurate. As shown in FIGS. 3a and 3b, the human joints in the pictures are occluded; compared with the left image in each figure, the right image shows the skeleton positions correctly recognized by the method of the present disclosure. As can be seen from the right images, the error in FIG. 3a, where the visible skeleton key points of the girl were mistakenly recognized as belonging to the boy's skeleton, is corrected, and in FIG. 3b the invisible skeleton key points occluded by the books on the table and chair are correctly recognized. It can be seen that the human skeleton detection method of the present disclosure achieves good results in recognizing occluded key points.
According to an embodiment of the present disclosure, the processing of the picture to be recognized in step S101 to obtain the initial pose of the target human skeleton is implemented as:
using a skeleton detection network to process the picture to be recognized and generate a heat map of the target human skeleton;
converting the heat map into coordinate data as the initial pose of the target human skeleton.
In this embodiment, the skeleton detection network may be the AlphaPose system; for details, reference may be made to the prior art, which is not repeated in the present disclosure. The AlphaPose system is used to process the picture to be recognized and output a heat map H, and the initial pose of the human skeleton estimated from the heat map H is recorded as
{<x_1, y_1, c_1>, <x_2, y_2, c_2>, ..., <x_j, y_j, c_j>, ..., <x_k, y_k, c_k>},
where x_j and y_j are the position of the j-th joint, c_j is its confidence score, and k is the number of joints in the human skeleton.
Considering that the initial pose of the human skeleton represented by the heat map introduces quantization error in joint estimation, the initial pose represented by the heat map H is converted into an initial pose represented by coordinates, which serves as the initial physical position data of the target human skeleton. Specifically, the heat map H is normalized into likelihood values in [0, 1] by a Softmax function, and an integral operation is then applied to estimate each joint position, recorded as
J_k = sum over p in A of p * H_k(p),
where J_k is the estimated position of the k-th joint, A denotes the likelihood region, and H_k(p) is the likelihood value at point p.
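For illustration only, and not as part of the original disclosure, the softmax normalization and integral operation described above correspond to the widely used soft-argmax; a minimal PyTorch sketch is given below. The tensor shapes and the 17-joint count are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def soft_argmax(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (K, H, W) -> estimated joint coordinates (K, 2) as (x, y)."""
    k, h, w = heatmaps.shape
    # Softmax turns each heat map into a likelihood distribution over pixels.
    probs = F.softmax(heatmaps.view(k, -1), dim=-1).view(k, h, w)
    # Integral operation: expectation of the pixel coordinates under the likelihood.
    ys = torch.arange(h, dtype=probs.dtype).view(1, h, 1)
    xs = torch.arange(w, dtype=probs.dtype).view(1, 1, w)
    x = (probs * xs).sum(dim=(1, 2))
    y = (probs * ys).sum(dim=(1, 2))
    return torch.stack([x, y], dim=-1)

joints = soft_argmax(torch.randn(17, 64, 48))  # 17 joints is an assumed count
```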
According to an embodiment of the present disclosure, the acquisition in step S102 of the feature maps output by different decoding layers in the process of processing the picture to be recognized is implemented as:
using the skeleton detection network to process the picture to be recognized and extracting the feature maps of at least three decoding layers, denoted as a coarse feature map, an intermediate feature map and a fine feature map, whose resolutions increase and whose numbers of channels decrease in that order.
In this embodiment, the AlphaPose system is used to process the picture to be recognized, and feature maps are extracted from the last three decoding layers that produce the heat map, recorded as the coarse feature map, the intermediate feature map and the fine feature map. The resolutions of these feature maps increase and their numbers of channels decrease. For example, the features of the coarse feature map are denoted conv_1, those of the intermediate feature map conv_2, and those of the fine feature map conv_3; the resolution and number of channels of conv_1 are 21*21*512, those of conv_2 are 42*42*256, and those of conv_3 are 84*84*128.
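As a purely illustrative sketch of how intermediate decoder outputs such as conv_1, conv_2 and conv_3 could be collected from an existing network, the snippet below registers forward hooks on named sub-modules. The layer names passed in `layer_names` are hypothetical placeholders, not the actual AlphaPose module names.

```python
import torch
import torch.nn as nn

def capture_decoder_features(model: nn.Module, layer_names, image: torch.Tensor):
    """Run `model` on `image` and return {layer name: feature map} for `layer_names`."""
    features, handles = {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            features[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            _ = model(image)          # the heat map output itself is not needed here
    finally:
        for handle in handles:
            handle.remove()
    return features

# Hypothetical usage; "decoder.layer1" etc. are placeholder module names.
# feats = capture_decoder_features(pose_net, ["decoder.layer1", "decoder.layer2",
#                                             "decoder.layer3"], image_tensor)
```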
According to an embodiment of the present disclosure, the processing of the feature maps in step S103 to obtain feature map data, and the extraction of position data corresponding to the initial pose from the feature map data as input data, is implemented as:
converting the coarse feature map into first feature map data and extracting first position data from the first feature map data;
fusing the coarse feature map and the intermediate feature map and extracting second position data from the fused feature map data;
fusing the three feature maps and extracting third position data from the fused feature map data.
In this embodiment, the feature maps are convolved to the same number of channels; for example, the numbers of channels of conv_1, conv_2 and conv_3 are convolved to 256. For the coarse feature map, the convolved features are the first feature map data; then, based on the coordinate-form initial pose obtained from the heat map H, the position data at the corresponding locations are extracted from the feature map data. The convolution processing module uses a convolution kernel of size 3 and a stride of 1, with ReLU as the activation function.
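The following sketch, an interpretation rather than the patented implementation, shows a 3x3, stride-1 convolution with ReLU that maps a feature map to 256 channels, followed by sampling of per-joint feature vectors at the coordinates of the initial pose. The padding of 1 and the bilinear `grid_sample` lookup are assumptions added to make the example runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_adapter(in_channels: int, out_channels: int = 256) -> nn.Module:
    # 3x3 kernel, stride 1, ReLU activation; padding=1 keeps the spatial size (assumption).
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
    )

def sample_joint_features(fmap: torch.Tensor, joints_xy: torch.Tensor) -> torch.Tensor:
    """fmap: (1, C, H, W); joints_xy: (K, 2) in pixel coordinates -> (K, C)."""
    _, _, h, w = fmap.shape
    grid = joints_xy.clone()
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    grid = grid.view(1, 1, -1, 2)                   # (1, 1, K, 2)
    sampled = F.grid_sample(fmap, grid, align_corners=True)  # (1, C, 1, K)
    return sampled.squeeze(0).squeeze(1).t()        # (K, C)

adapter = make_adapter(512)                         # e.g. conv_1 with 512 channels
feat = adapter(torch.randn(1, 512, 21, 21))         # -> (1, 256, 21, 21)
pos_data = sample_joint_features(feat, torch.rand(17, 2) * 20)
```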
In this embodiment, FIG. 4 shows a flowchart of feature map fusion according to an embodiment of the present disclosure. As shown in FIG. 4, the fusion of the feature maps includes:
S11: processing the feature maps to the same resolution and number of channels;
S12: fusing the features obtained in step S11 with a self-attention network and normalizing the result;
S13: fusing the features obtained in step S12 with the features obtained in step S11 to obtain the feature map data.
The fusion process is explained below with a specific example; see FIG. 5, which shows a specific flowchart of feature fusion according to an embodiment of the present disclosure. The feature conv_1 of the coarse feature map and the feature conv_2 of the intermediate feature map are taken as an example.
First, the numbers of channels of conv_1 and conv_2 are convolved to 256, and the results are recorded as feature map data of size 21*21*256 and 42*42*256 respectively; the 21*21*256 feature map data is then upsampled to 42*42*256.
Next, the self-attention network fuses the upsampled 42*42*256 features with the 42*42*256 features to obtain an attention map H2 of size 42*42*512, and H2 is normalized by a softmax function.
Finally, the normalized attention map H2 is multiplied pointwise with the features and the number of channels is convolved to 256, and the processed features are added to the remaining branch to obtain the fused feature map data.
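A minimal sketch of this fusion step is given below, assuming that the lower-resolution feature map data is bilinearly upsampled, that the two 256-channel tensors are concatenated into a 512-channel attention map normalized with softmax over the channel dimension, and that the modulated features are convolved back to 256 channels and added to the higher-resolution branch. The softmax axis and the choice of residual branch are interpretations of the description, not details stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Reduce the modulated 2C-channel tensor back to C channels.
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        # coarse: (1, C, h, w) lower resolution; fine: (1, C, H, W) higher resolution.
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                                  align_corners=False)
        cat = torch.cat([coarse_up, fine], dim=1)   # attention map, 2C channels
        attn = torch.softmax(cat, dim=1)            # normalization of the attention map
        fused = self.reduce(attn * cat)             # pointwise modulation, conv to C
        return fused + fine                         # residual addition

fusion = SelfAttentionFusion(256)
out = fusion(torch.randn(1, 256, 21, 21), torch.randn(1, 256, 42, 42))  # (1, 256, 42, 42)
```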
In this embodiment, FIG. 6 shows a flowchart of feature map fusion according to an embodiment of the present disclosure. As shown in FIG. 6, the fusion of the feature maps includes:
S21: fusing the coarse feature map and the intermediate feature map to obtain fused feature map data;
S22: fusing this feature map data with the fine feature map to obtain the final feature map data.
For the specific technical details of obtaining the feature map data in steps S21 and S22, reference may be made to the description of steps S11-S13, which is not repeated here.
According to an embodiment of the present disclosure, the manner of training the graph convolutional neural network is implemented as:
respectively inputting the first, second and third position data into the attention modules of the graph convolutional neural network, wherein the first attention module takes the initial pose and the first position data as input features; the second attention module takes the output features of the first attention module and the second position data as input features; and the third attention module takes the output features of the second attention module and the third position data as input features.
According to an embodiment of the present disclosure, the graph convolutional neural network uses the Deep-GCN network structure, wherein the Laplacian matrix of the graph convolutional neural network is given automatically according to the human skeleton structure.
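As an illustrative example of how a Laplacian-style matrix can be derived automatically from a skeleton structure, the sketch below builds the symmetrically normalized adjacency commonly used in GCN layers from an edge list; the 17-joint COCO-style edge list is an assumption, since the patent does not specify the joint set.

```python
import torch

# Illustrative 17-joint COCO-style skeleton; the patent does not fix a joint set.
SKELETON_EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9), (6, 8),
                  (8, 10), (5, 11), (6, 12), (11, 12), (11, 13), (13, 15),
                  (12, 14), (14, 16)]

def skeleton_graph_matrix(num_joints: int, edges) -> torch.Tensor:
    """Symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 for GCN layers;
    the identity minus this matrix is a normalized Laplacian of the skeleton."""
    adj = torch.zeros(num_joints, num_joints)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    adj = adj + torch.eye(num_joints)                  # self-loops
    d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))  # D^-1/2
    return d_inv_sqrt @ adj @ d_inv_sqrt

A_hat = skeleton_graph_matrix(17, SKELETON_EDGES)
```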
According to an embodiment of the present disclosure, by combining the position data, information that is stored in the feature maps but may be lost in the initial pose can be mined and used to adjust the initial pose of the target human skeleton, which improves the pose estimation result.
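The cascaded refinement can be pictured with the following highly simplified sketch, in which each stage stands in for a residual graph-convolution/attention block and takes the current pose estimate together with one set of position data; the per-stage supervision on pose 1, pose 2 and the final pose mirrors the description above. The MLP used here is only a placeholder for the Deep-GCN block and is not the patented architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    """Placeholder for one attention / residual graph-convolution block."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),
        )

    def forward(self, pose: torch.Tensor, pos_data: torch.Tensor) -> torch.Tensor:
        # pose: (K, 2) current joint estimate; pos_data: (K, feat_dim) sampled features.
        return pose + self.mlp(torch.cat([pose, pos_data], dim=-1))  # residual update

stages = nn.ModuleList([RefineStage(256) for _ in range(3)])
pose = torch.rand(17, 2)                                  # initial pose from the heat map
position_data = [torch.randn(17, 256) for _ in range(3)]  # first, second, third position data
target = torch.rand(17, 2)                                # dummy ground-truth joints

loss = torch.zeros(())
for stage, p in zip(stages, position_data):
    pose = stage(pose, p)
    loss = loss + F.mse_loss(pose, target)                # supervise pose 1, pose 2, final pose
loss.backward()
```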
FIG. 7 shows a structural block diagram of a human skeleton detection apparatus according to an embodiment of the present disclosure. The apparatus may be implemented by software, hardware or a combination of the two as part or all of an electronic device.
As shown in FIG. 7, the human skeleton detection apparatus 700 includes a first acquisition module 710, a second acquisition module 720, an extraction module 730 and a third acquisition module 740.
The first acquisition module 710 is configured to process a picture to be recognized and obtain an initial pose of a target human skeleton;
the second acquisition module 720 is configured to acquire feature maps output by different decoding layers in the process of processing the picture;
the extraction module 730 is configured to process the feature maps to obtain feature map data and to extract position data corresponding to the initial pose from the feature map data as input data;
the third acquisition module 740 is configured to input the initial pose and the input data into a trained graph convolutional neural network and obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
The human skeleton detection apparatus provided by the present disclosure is suited to detecting occluded human skeletons in crowded scenes. By combining the basic constraint information between human joints with the data about occluded key points mined from feature maps of different resolutions and channel counts, it can accurately adjust the positions of occluded key points and therefore detects the human skeleton with a high accuracy rate.
The present disclosure further discloses an electronic device. FIG. 8 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure.
As shown in FIG. 8, the electronic device 800 includes a memory 801 and a processor 802, wherein the memory 801 is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor 802 to implement the following method steps:
processing a picture to be recognized to obtain an initial pose of a target human skeleton;
acquiring feature maps output by different decoding layers in the process of processing the picture to be recognized;
processing the feature maps to obtain feature map data, and extracting position data corresponding to the initial pose from the feature map data as input data;
inputting the initial pose and the input data into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
FIG. 9 shows a schematic structural diagram of a computer system suitable for implementing a method according to an embodiment of the present disclosure.
As shown in FIG. 9, the computer system 900 includes a processing unit 901 that can perform the various methods in the above embodiments according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the system 900. The processing unit 901, the ROM 902 and the RAM 903 are connected to one another through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed. The processing unit 901 may be implemented as a CPU, GPU, TPU, FPGA, NPU or other processing unit.
In particular, according to the embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the above methods. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 909 and/or installed from the removable medium 911.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented in software or in programmable hardware. The described units or modules may also be provided in a processor, and in certain circumstances the names of these units or modules do not limit the units or modules themselves.
As another aspect, the present disclosure further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the electronic device or computer system of the above embodiments, or may exist independently as a computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described in the present disclosure.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.

Claims (10)

  1. A human skeleton detection method, characterized by comprising:
    processing a picture to be recognized to obtain an initial pose of a target human skeleton;
    acquiring feature maps output by different decoding layers in the process of processing the picture to be recognized;
    processing the feature maps to obtain feature map data, and extracting position data corresponding to the initial pose from the feature map data as input data;
    inputting the initial pose and the input data into a trained graph convolutional neural network to obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  2. The method according to claim 1, characterized in that the processing of the picture to be recognized to obtain the initial pose of the target human skeleton is implemented as:
    using a skeleton detection network to process the picture to be recognized and generate a heat map of the target human skeleton;
    converting the heat map into coordinate data as the initial pose of the target human skeleton.
  3. The method according to claim 2, characterized in that the acquiring of the feature maps output by different decoding layers in the process of processing the picture to be recognized is implemented as:
    using the skeleton detection network to process the picture to be recognized and extracting the feature maps of at least three decoding layers, denoted as a coarse feature map, an intermediate feature map and a fine feature map, whose resolutions increase and whose numbers of channels decrease in that order.
  4. The method according to claim 3, characterized in that the processing of the feature maps to obtain feature map data and the extraction of position data corresponding to the initial pose from the feature map data as input data is implemented as:
    converting the coarse feature map into first feature map data and extracting first position data from the first feature map data;
    fusing the coarse feature map and the intermediate feature map and extracting second position data from the fused feature map data;
    fusing the three feature maps and extracting third position data from the fused feature map data.
  5. The method according to claim 4, characterized in that the fusion of the feature maps comprises:
    S11: processing the feature maps to the same resolution and number of channels;
    S12: fusing the features obtained in step S11 with a self-attention network and normalizing the result;
    S13: fusing the features obtained in step S12 with the features obtained in step S11 to obtain the feature map data.
  6. The method according to claim 4 or 5, characterized in that the manner of training the graph convolutional neural network is implemented as:
    respectively inputting the first, second and third position data into the attention modules of the graph convolutional neural network, wherein the first attention module takes the initial pose and the first position data as input features; the second attention module takes the output features of the first attention module and the second position data as input features; and the third attention module takes the output features of the second attention module and the third position data as input features.
  7. A human skeleton detection apparatus, characterized by comprising:
    a first acquisition module configured to process a picture to be recognized and obtain an initial pose of a target human skeleton;
    a second acquisition module configured to acquire feature maps output by different decoding layers in the process of processing the picture to be recognized;
    an extraction module configured to process the feature maps to obtain feature map data and to extract position data corresponding to the initial pose from the feature map data as input data;
    a third acquisition module configured to input the initial pose and the input data into a trained graph convolutional neural network and obtain a final pose of the target human skeleton, wherein the matrix representation of the graph convolutional neural network is determined according to the constraint relationships of the human skeleton structure.
  8. A human skeleton detection system, characterized by comprising:
    an initial pose estimation module configured to process a picture to be recognized and obtain an initial pose of a target human skeleton;
    a cascade adaptation module configured to acquire feature maps output by different decoding layers in the process of processing the picture, and to process the feature maps to obtain feature map data;
    a cascaded graph convolutional neural network configured to output a final pose of the target human skeleton according to the initial pose and input data, wherein the input data are position data corresponding to the initial pose extracted from the feature map data.
  9. An electronic device, characterized by comprising a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of any one of claims 1-6.
  10. A readable storage medium on which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the method steps of any one of claims 1-6.
PCT/CN2020/100900 2020-07-08 2020-07-08 Human skeleton detection method, apparatus, system, device and storage medium WO2022006784A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/100900 WO2022006784A1 (zh) 2020-07-08 2020-07-08 Human skeleton detection method, apparatus, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/100900 WO2022006784A1 (zh) 2020-07-08 2020-07-08 Human skeleton detection method, apparatus, system, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022006784A1 true WO2022006784A1 (zh) 2022-01-13

Family

ID=79552227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100900 WO2022006784A1 (zh) 2020-07-08 2020-07-08 Human skeleton detection method, apparatus, system, device and storage medium

Country Status (1)

Country Link
WO (1) WO2022006784A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563952A (zh) * 2023-07-07 2023-08-08 厦门医学院 Motion capture missing data recovery method combining a graph neural network and bone length constraints
CN117021435A (zh) * 2023-05-12 2023-11-10 浙江闽立电动工具有限公司 Trimming control system and method for an edge trimmer
WO2024120390A1 (zh) * 2022-12-05 2024-06-13 中慧医学成像(深圳)有限公司 Automatic annotation method and apparatus for spinal ultrasound images

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787439A (zh) * 2016-02-04 2016-07-20 广州新节奏智能科技有限公司 Depth image human joint localization method based on a convolutional neural network
CN106650827A (zh) * 2016-12-30 2017-05-10 南京大学 Human pose estimation method and system based on structure-guided deep learning
US10372228B2 (en) * 2016-07-20 2019-08-06 Usens, Inc. Method and system for 3D hand skeleton tracking
CN110969114A (zh) * 2019-11-28 2020-04-07 四川省骨科医院 Human motion function detection system, detection method and detector
CN111311714A (zh) * 2020-03-31 2020-06-19 北京慧夜科技有限公司 Pose prediction method and system for three-dimensional animation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787439A (zh) * 2016-02-04 2016-07-20 广州新节奏智能科技有限公司 Depth image human joint localization method based on a convolutional neural network
US10372228B2 (en) * 2016-07-20 2019-08-06 Usens, Inc. Method and system for 3D hand skeleton tracking
CN106650827A (zh) * 2016-12-30 2017-05-10 南京大学 Human pose estimation method and system based on structure-guided deep learning
CN110969114A (zh) * 2019-11-28 2020-04-07 四川省骨科医院 Human motion function detection system, detection method and detector
CN111311714A (zh) * 2020-03-31 2020-06-19 北京慧夜科技有限公司 Pose prediction method and system for three-dimensional animation

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024120390A1 (zh) * 2022-12-05 2024-06-13 中慧医学成像(深圳)有限公司 Automatic annotation method and apparatus for spinal ultrasound images
CN117021435A (zh) * 2023-05-12 2023-11-10 浙江闽立电动工具有限公司 Trimming control system and method for an edge trimmer
CN117021435B (zh) * 2023-05-12 2024-03-26 浙江闽立电动工具有限公司 Trimming control system and method for an edge trimmer
CN116563952A (zh) * 2023-07-07 2023-08-08 厦门医学院 Motion capture missing data recovery method combining a graph neural network and bone length constraints
CN116563952B (zh) * 2023-07-07 2023-09-15 厦门医学院 Motion capture missing data recovery method combining a graph neural network and bone length constraints

Similar Documents

Publication Publication Date Title
WO2022006784A1 (zh) Human skeleton detection method, apparatus, system, device and storage medium
CN109584276B (zh) 关键点检测方法、装置、设备及可读介质
CN110309706B (zh) 人脸关键点检测方法、装置、计算机设备及存储介质
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN108229322B (zh) 基于视频的人脸识别方法、装置、电子设备及存储介质
WO2019020075A1 (zh) 图像处理方法、装置、存储介质、计算机程序和电子设备
EP3872764B1 (en) Method and apparatus for constructing map
US20210209774A1 (en) Image adjustment method and apparatus, electronic device and storage medium
EP3862914A1 (en) Video action recognition method, apparatus, and device, and storage medium
US20150302240A1 (en) Method and device for locating feature points on human face and storage medium
WO2020062493A1 (zh) 图像处理方法和装置
WO2021051547A1 (zh) 暴力行为检测方法及系统
WO2018040982A1 (zh) 一种用于增强现实的实时图像叠加方法及装置
CN113435408A (zh) 人脸活体检测方法、装置、电子设备及存储介质
CN113362314B (zh) 医学图像识别方法、识别模型训练方法及装置
CN113837130B (zh) 一种人体手部骨架检测方法及系统
CN113537153A (zh) 仪表图像识别方法、装置、电子设备和计算机可读介质
WO2023109086A1 (zh) 文字识别方法、装置、设备及存储介质
CN112926552B (zh) 基于深度神经网络的遥感影像车辆目标识别模型及方法
JP2023512359A (ja) 関連対象検出方法、及び装置
CN110059651B (zh) 一种相机实时跟踪注册方法
CN112634355A (zh) 一种目标跟踪方法、装置、设备及存储介质
CN110070110B (zh) 一种自适应阈值图像匹配方法
WO2017173977A1 (zh) 一种移动终端目标跟踪方法、装置和移动终端
CN116452741B (zh) 对象重建方法、对象重建模型的训练方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20944641

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20944641

Country of ref document: EP

Kind code of ref document: A1