CN111860276B - Human body key point detection method, device, network equipment and storage medium

Info

Publication number: CN111860276B
Application number: CN202010674493.2A
Authority: CN
Inventors: 胥杰; 马丹; 乔曦雨; 周凯文; 杜欧杰; 范亚平; 周丹; 雷林
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2020-07-14
Filing date: 2020-07-14
Publication date: 2023-04-11
Anticipated expiration: 2040-07-14
Also published as: CN111860276A

Abstract

An embodiment of the invention relates to the technical field of computer vision and discloses a human body key point detection method applied to a pre-trained key point detection model. The method comprises: extracting a first feature map of a target human body image; performing several layers of deconvolution based on the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the first feature map whose resolution matches that of the third feature map obtained by the previous layer of deconvolution is combined with that third feature map, and the combination is deconvolved; and outputting heat maps of the human body key points of the target human body image according to the second feature map, and determining the human body key points of the target human body image according to the heat maps. Embodiments of the invention also provide a human body key point detection device, network equipment and a storage medium. The method, device, network equipment and storage medium can improve the detection precision of human body key points and the accuracy of the human body pose estimation result.

Description

Human key point detection method, device, network equipment and storage medium

Technical field

The invention relates to the technical field of computer vision, and in particular to a human body key point detection method, device, network equipment and storage medium.

Background

Human pose estimation is an important research direction in computer vision and is widely used in human activity analysis, human-computer interaction and video surveillance. It refers to estimating the pose of a human body by using computer algorithms to locate human body key points (such as shoulders, elbows, wrists, knees and ankles) in images or videos.

However, the inventors have found at least the following problem in the prior art: current human body key point detection methods have limited precision, which easily introduces errors in the output key points and degrades the human pose estimation result.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a human body key point detection method, device, network equipment and storage medium that improve the detection precision of human body key points and the accuracy of the human pose estimation result.

To solve the above technical problem, an embodiment of the present invention provides a human body key point detection method applied to a pre-trained key point detection model, including: using a first sub-module of the key point detection model to extract a first feature map of a target human body image, the first feature map including multiple groups of feature maps with different resolutions; using a second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the second sub-module combines the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution with that third feature map and then performs the deconvolution; and using a third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map, and determining each human body key point of the target human body image according to the heat maps.

An embodiment of the present invention also provides a human body key point detection device, including: an extraction module, configured to extract a first feature map of a target human body image, the first feature map including multiple groups of feature maps with different resolutions; a deconvolution module, configured to perform several layers of deconvolution based on the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution is combined with that third feature map and then deconvolved; and a detection module, configured to output a heat map of each human body key point of the target human body image according to the second feature map, and to determine each human body key point of the target human body image according to the heat maps.

An embodiment of the present invention also provides network equipment, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above human body key point detection method.

An embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above human body key point detection method.

Compared with the prior art, the embodiments of the present invention use the key point detection model to extract multiple groups of first feature maps with different resolutions from the target human body image; perform several layers of deconvolution based on the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution is combined with that third feature map and then deconvolved; and output a heat map of each human body key point of the target human body image according to the second feature map and determine each key point according to the heat maps. Because the information in the third feature map and the information in the combined first feature map complement each other, deconvolving the combination of the two lets the key point detection model learn more information about the human body key points in the target human body image, which effectively improves the detection precision of human body key points and the accuracy of the human pose estimation result.

In addition, using the second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map includes: when performing the deconvolution of each layer, using the second sub-module to divide the deconvolution into several groups according to the number of channels. Dividing each layer's deconvolution into several groups by channel count substantially reduces the amount of computation compared with ungrouped deconvolution.

In addition, using the second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map includes: when performing the first layer of deconvolution, using the second sub-module to deconvolve the first feature map with the smallest resolution.

In addition, using the third sub-module of the key point detection model to output the heat map of each human body key point of the target human body image according to the second feature map includes: using the third sub-module to perform grouped convolution on the second feature map according to the spatial relationship of human limbs, and outputting the heat map of each human body key point according to the result of the grouped convolution. Limbs belonging to the same body part have a relatively fixed spatial relationship in the target human body image, and their image features are also relatively similar. Grouping the convolution of the second feature map by limb spatial relationship therefore lets the key point detection model learn from and cross-reference the spatial positions and image features of the same body part within one convolution group, which effectively improves the detection precision of human body key points.

In addition, using the third sub-module to perform grouped convolution on the second feature map according to the spatial relationship of human limbs specifically means: using the third sub-module to perform grouped convolution on the second feature map according to a preset key point grouping, where the second feature maps in each group of the key point grouping include the second feature maps of first human body key points that are adjacent to one another and belong to the same limb of the human body, the first human body key points being key points predefined according to the spatial relationship of human limbs.

In addition, using the first sub-module of the key point detection model to extract the first feature map of the target human body image specifically means: using the first sub-module to obtain the first feature map of the target human body image with a lightweight convolutional neural network. Because a lightweight convolutional neural network lowers the computing power the key point detection model needs, obtaining the first feature map with it reduces the overall computing power required of the device that runs the model.

In addition, using the first sub-module to obtain the first feature map of the target human body image with a lightweight convolutional neural network includes: using the first sub-module to reduce the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number, and obtaining the first feature map of the target human body image with the network after the channel reduction. By choosing the preset number sensibly and reducing the channel count of the last convolutional layer to it, the amount of computation of the lightweight convolutional neural network can be reduced with little change in detection quality, which further lowers the computing power required of the device that runs the model.

Description of drawings

One or more embodiments are illustrated by the figures in the corresponding accompanying drawings; these illustrations do not limit the embodiments.

FIG. 1 is a schematic flow chart of the human body key point detection method provided by the first embodiment of the present invention;

FIG. 2 is a schematic diagram of the principle of the human body key point detection method provided by the first embodiment of the present invention;

FIG. 3 is a schematic flow chart of the human body key point detection method provided by the second embodiment of the present invention;

FIG. 4 is a schematic flow chart of the human body key point detection method provided by the third embodiment of the present invention;

FIG. 5 is a detection example of the human body key point detection method provided by the third embodiment of the present invention;

FIG. 6 is a schematic diagram of the module structure of the human body key point detection device provided by the fourth embodiment of the present invention;

FIG. 7 is a schematic structural diagram of the network equipment provided by the fifth embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader understand the present application. The technical solutions claimed in this application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a human body key point detection method. A first sub-module of the key point detection model extracts a first feature map of a target human body image, where the first feature map includes multiple groups of feature maps with different resolutions. A second sub-module of the key point detection model performs several layers of deconvolution based on the first feature map to obtain a second feature map; when the next layer of deconvolution is performed, the second sub-module combines the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution with that third feature map and then performs the deconvolution. A third sub-module of the key point detection model outputs a heat map of each human body key point of the target human body image according to the second feature map, and each key point of the target human body image is determined according to the heat maps. The first feature map with the same resolution as the third feature map from the previous layer is linked to that third feature map, and the two are combined before the next layer of deconvolution. Because the information in the two feature maps is complementary, the key point detection model can learn more information about the human body key points in the target human body image during deconvolution, which improves the detection precision of human body key points and the accuracy of human pose estimation.

It should be noted that the human body key point detection method provided by the embodiments of the present invention is applied to a pre-trained key point detection model; that is, the method is executed by the key point detection model. The first, second and third sub-modules are components of the key point detection model. Optionally, the key point detection model may also include other components (sub-modules), which are not specifically limited here. Optionally, the key point detection model may be trained on data in RGB format, and the learning rate, number of training epochs, batch size and other training settings may be chosen according to actual needs; the embodiments of the present invention do not specifically limit them.

The specific flow of the human body key point detection method provided by this embodiment is shown in FIG. 1 and includes the following steps:

S101: Use the first sub-module of the key point detection model to extract a first feature map of the target human body image, where the first feature map includes multiple groups of feature maps with different resolutions.

Here, the target human body image is the human body image in which key points are to be detected. It can be obtained with an object detection method, for example by detecting the target human body in the frames of a video. The specific object detection method can be chosen according to actual needs and is not specifically limited here.

Refer to FIG. 2, which is a schematic diagram of the principle of the human body key point detection method provided by this embodiment. Specifically, the first sub-module of the key point detection model extracts the first feature map of the target human body image and corresponds to the feature extraction part of the figure; the second sub-module performs deconvolution based on the first feature map and corresponds to the deconvolution part of the figure; the third sub-module outputs the heat maps of the human body key points and corresponds to the heat map output part of the figure.

Optionally, the first sub-module may use a sub-neural network to extract features of the target human body image. The sub-neural network is, for example, a convolutional neural network whose convolutional layers extract the first feature map of the target human body image; the specific sub-neural network can be chosen according to actual needs and is not specifically limited here. By giving the convolution kernels of each layer of the sub-neural network different sizes, multiple groups of first feature maps with different resolutions can be obtained, for example 64*48, 32*24 and 16*12. The kernel size of each layer can be chosen according to actual needs and is not specifically limited here.
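As a rough illustration of how a stack of convolutional layers can yield the example resolutions 64*48, 32*24 and 16*12, the sketch below applies the standard convolution output-size formula layer by layer. The text above attributes the differing resolutions to the kernel sizes; the kernel size 3, stride 2 and padding 1 used here are assumptions chosen only so that each layer halves the resolution, and the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a standard convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_resolutions(height, width, num_levels, kernel=3, stride=2, padding=1):
    """List the (height, width) of each level of a feature pyramid.

    kernel/stride/padding are illustrative assumptions, not values
    taken from the patent text.
    """
    resolutions = []
    for _ in range(num_levels):
        resolutions.append((height, width))
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    return resolutions


levels = pyramid_resolutions(64, 48, num_levels=3)
print(levels)
```

With these assumed settings the three levels match the example resolutions given in the text.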

S102: Use the second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map to obtain a second feature map, where, when the next layer of deconvolution is performed, the second sub-module combines the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution with that third feature map and then performs the deconvolution.

As shown in FIG. 2, G00/G10 represents the first layer of deconvolution, G01/G11 the second layer and G02/G12 the third layer. In practice, the number of deconvolution layers of the key point detection model can be set as needed. The more deconvolution layers there are, the higher the detection precision of human body key points, but the larger the corresponding amount of computation. The actual number of deconvolution layers can therefore be chosen according to the application scenario and/or the computing power of the device that runs the model.

Taking G01 in FIG. 2 as an example, the left (dark) block is a first feature map copied directly from the extracted first feature maps, and the right (light) block is the third feature map obtained by deconvolving the previous layer. The two blocks have the same resolution: for example, if the right block in G01 has a resolution of 32*24, the first feature map with a resolution of 32*24 is copied as the content of the left block. When the deconvolution of G02 (the next layer) is performed, the left and right blocks of G01 are combined and then deconvolved. Specifically, the first feature map corresponding to the left block and the third feature map corresponding to the right block can be placed side by side, and the deconvolution kernels are then applied to the side-by-side feature maps.
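The resolution matching described above can be sketched at the level of tensor shapes. The sketch below starts from the smallest first feature map, upsamples with the standard transposed-convolution size formula, and looks up the first feature map with the same resolution to combine with each deconvolution output. It interprets "placed side by side" as channel concatenation, which is an assumption; the kernel size 4, stride 2, padding 1, all channel counts and the name `plan_decoder` are likewise hypothetical.

```python
def deconv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def plan_decoder(first_maps, out_channels):
    """Plan the deconvolution layers as shapes only.

    first_maps:   list of (channels, height, width) first feature maps.
    out_channels: output channel count of each deconvolution layer.
    """
    by_resolution = {(h, w): c for c, h, w in first_maps}
    # The first layer deconvolves the smallest-resolution first feature map.
    channels, h, w = min(first_maps, key=lambda m: m[1] * m[2])
    plan = []
    for layer, c_out in enumerate(out_channels):
        in_channels = channels
        h, w = deconv_output_size(h), deconv_output_size(w)
        skip = by_resolution.get((h, w), 0)  # matching first feature map, if any
        plan.append({"layer": layer, "in_channels": in_channels,
                     "out": (c_out, h, w), "skip_channels": skip})
        # Assumed combination: concatenate along the channel axis.
        channels = c_out + skip
    return plan


first_maps = [(32, 64, 48), (64, 32, 24), (128, 16, 12)]
plan = plan_decoder(first_maps, out_channels=[64, 32])
for step in plan:
    print(step)
```

In this example the second layer receives 128 input channels: 64 from the previous deconvolution output plus 64 from the 32*24 first feature map.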

In a specific example, S102 includes: when performing the first layer of deconvolution, the key point detection model uses the second sub-module to deconvolve the first feature map with the smallest resolution.

Continuing with FIG. 2, the deconvolution of G00 (the first layer) is applied directly to the longest block on the left of FIG. 2, that is, the first feature map with the smallest resolution. Optionally, the first feature map with the smallest resolution is the feature map extracted by the last convolutional layer of the sub-neural network of the first sub-module.

S103: Use the third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map, and determine each human body key point of the target human body image according to the heat maps.

The number of human body key points can be set according to the actual situation, for example 14, 16 or 63, and is not specifically limited here.

Optionally, the key point detection model uses the third sub-module to convolve the second feature map produced by the second sub-module according to the number of human body key points, and outputs as many heat maps as there are key points, each heat map corresponding to one human body key point. The key point detection model then determines all human body key points of the target human body image from all of the heat maps. Optionally, the third sub-module uses a 1*1 convolution kernel when convolving the second feature map.
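The text does not specify how a key point is read off its heat map. A common choice, assumed here, is to take the location of the maximum response in each heat map. The following minimal sketch uses nested lists as heat maps; the function name and the toy values are hypothetical.

```python
def keypoints_from_heatmaps(heatmaps):
    """Return one (x, y, score) per heat map, using the peak location.

    Taking the arg-max is an assumption; the source only says key points
    are determined according to the heat maps.
    """
    keypoints = []
    for heatmap in heatmaps:
        best = (0, 0, heatmap[0][0])
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best[2]:
                    best = (x, y, score)
        keypoints.append(best)
    return keypoints


# Two toy 2x3 heat maps, one per key point.
heatmaps = [
    [[0.0, 0.1, 0.0],
     [0.2, 0.9, 0.1]],
    [[0.7, 0.1, 0.0],
     [0.0, 0.3, 0.2]],
]
result = keypoints_from_heatmaps(heatmaps)
print(result)
```

Coordinates are in heat map pixels; mapping them back to the input image would require scaling by the ratio between the image and heat map resolutions.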

Compared with the prior art, the human body key point detection method provided by this embodiment uses the key point detection model to extract multiple groups of first feature maps with different resolutions from the target human body image; performs several layers of deconvolution based on the first feature map to obtain a second feature map, where, when the next layer of deconvolution is performed, the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution is combined with that third feature map and then deconvolved; and outputs a heat map of each human body key point of the target human body image according to the second feature map and determines each key point according to the heat maps. Because the information in the third feature map and the information in the combined first feature map complement each other, deconvolving the combination of the two lets the key point detection model learn more information about the human body key points in the target human body image, which effectively improves the detection precision of human body key points and the accuracy of the human pose estimation result.

The second embodiment of the present invention relates to a human body key point detection method. The second embodiment is largely the same as the first; the main difference is that using the second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map includes: when performing the deconvolution of each layer, using the second sub-module to divide the deconvolution into several groups according to the number of channels. Performing the deconvolution in several groups substantially reduces the amount of computation of the key point detection model, so that the model can run on mobile terminals with limited computing power, which broadens the range of applications of the human body key point detection method.

The specific flow of the human body key point detection method provided by this embodiment is shown in FIG. 3 and includes the following steps:

S201: Use the first sub-module of the key point detection model to extract a first feature map of the target human body image, where the first feature map includes multiple groups of feature maps with different resolutions.

S202: Use the second sub-module of the key point detection model to perform several layers of deconvolution based on the first feature map to obtain a second feature map, where, when the next layer of deconvolution is performed, the second sub-module combines the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution with that third feature map and then performs the deconvolution, and where, when the deconvolution of each layer is performed, the second sub-module divides the deconvolution into several groups according to the number of channels.

S203: Use the third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map, and determine each human body key point of the target human body image according to the heat maps.

S201 and S203 are the same as S101 and S103 in the first embodiment, respectively. See the description of the first embodiment for details, which are not repeated here.

For S202, in order to output heat maps for multiple human body key points, multiple convolution kernels are needed to deconvolve the first feature map, and the number of convolution kernels corresponds to the number of channels. Each layer of deconvolution therefore has multiple channels, and the key point detection model can divide the deconvolution into several groups according to the number of channels. The number of groups in each deconvolution layer can be set according to actual needs, for example 2, 4 or 8 groups, and is not specifically limited here. Optionally, the number of groups may be the same or different from layer to layer; for example, G00 and G01 in FIG. 2 may use the same or different numbers of groups. To make it easier to match up consecutive deconvolution layers, every layer can be given the same number of groups, for example 2.
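To make the channel split concrete, the sketch below partitions the input and output channel indices of one layer into equally sized groups, so that each group's kernels only see their own slice of the input. Equal-sized contiguous groups are an assumption, since the text only says the split follows the number of channels; the function name and the channel counts are hypothetical.

```python
def channel_groups(in_channels, out_channels, groups):
    """Pair each group's input-channel range with its output-channel range."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    in_size = in_channels // groups
    out_size = out_channels // groups
    return [
        (range(g * in_size, (g + 1) * in_size),
         range(g * out_size, (g + 1) * out_size))
        for g in range(groups)
    ]


pairs = channel_groups(in_channels=8, out_channels=4, groups=2)
for inputs, outputs in pairs:
    print(list(inputs), "->", list(outputs))
```

Here input channels 0 to 3 feed output channels 0 and 1, and input channels 4 to 7 feed output channels 2 and 3.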

To evaluate the computational cost of the deconvolution, the cost of the i-th deconvolution layer can be calculated with the following formula:

$$\mathrm{FLOPs}_i = C_{i-1} \cdot H_i \cdot W_i \cdot C_i \cdot C$$

Here C is a constant related to the kernel size of the deconvolution and the amount of computation of each deconvolution operation; C_{i-1} is the number of input channels of the deconvolution, C_i is the number of output channels of the deconvolution, and H_i and W_i are the height and width of the feature map output by the deconvolution.

Since H_i and W_i do not change under grouping and C is a constant, if the deconvolution is divided into 2 groups, C_{i-1} and C_i within each group are both reduced to 1/2 of the original, so the computation of each group is reduced to 1/4 of the original and the total computation over both groups is 1/2 of the original (ungrouped) amount. If the deconvolution is divided into 4 groups, C_{i-1} and C_i within each group are both reduced to 1/4 of the original, so the computation of each group is reduced to 1/16 of the original and the total computation over the four groups is 1/4 of the original (ungrouped) amount.
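The arithmetic above can be checked with a short sketch that applies the formula to each group. The function name, the parameter `kernel_const` (standing in for the constant C) and the layer sizes are illustrative assumptions, not values from the embodiment.

```python
from fractions import Fraction


def grouped_deconv_flops(c_in, c_out, h_out, w_out, groups=1, kernel_const=1):
    """Return (per_group, total) operation counts for one deconvolution layer.

    Applies FLOPs_i = C_{i-1} * H_i * W_i * C_i * C to a single group, in
    which both channel counts are divided by the number of groups.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    per_group = (c_in // groups) * h_out * w_out * (c_out // groups) * kernel_const
    return per_group, per_group * groups


# Hypothetical layer sizes, chosen only for illustration.
_, baseline = grouped_deconv_flops(256, 128, 64, 48)

ratios = {}
for g in (2, 4):
    per_group, total = grouped_deconv_flops(256, 128, 64, 48, groups=g)
    # Exact fractions of the ungrouped cost: (per group, all groups together).
    ratios[g] = (Fraction(per_group, baseline), Fraction(total, baseline))
    print(g, ratios[g])
```

With g groups the per-group cost falls to 1/g² and the total to 1/g of the ungrouped cost, which matches the 2-group and 4-group cases described above.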

Compared with the prior art, the human body key point detection method provided by this embodiment divides the deconvolution into several groups according to the number of channels when each layer of deconvolution is performed, which greatly reduces the computation of the deconvolution. As a result, under the same computing power, the key point detection model can improve the detection accuracy of human body key points by increasing the model's computation (for example, by adding deconvolution layers); alternatively, without increasing the model's computation, the key point detection model can detect faster and place lower computing power requirements on the device running it, so that it can run on devices with lower computing power (for example, mobile terminals). This broadens the application range of the key point detection model and of the human body key point detection method.

To further reduce the computing power required by the key point detection model, in a specific example, extracting the first feature map of the target human body image with the first sub-module of the key point detection model in S201 may specifically be: using the first sub-module to obtain the first feature map of the target human body image according to a lightweight convolutional neural network.

The lightweight convolutional neural network can be selected according to actual needs, for example MobileNet V2 or ShuffleNet v2, and is not specifically limited here. Since a lightweight convolutional neural network reduces the computing power required by the key point detection model, obtaining the first feature map according to a lightweight convolutional neural network lowers the overall computing power that the key point detection model requires of the device running it.

In a specific example, using the first sub-module to obtain the first feature map of the target human body image according to the lightweight convolutional neural network can be further refined as: using the first sub-module to reduce the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number, and obtaining the first feature map of the target human body image according to the lightweight convolutional neural network with the reduced number of channels.

The preset number can be determined from the training results: if, after the number of channels of the last convolutional layer of the lightweight convolutional neural network is reduced, the detection performance degrades only within an acceptable range compared with before the reduction, the reduced number of channels is used as the preset number. It can be understood that reducing the number of channels to the preset number further reduces the computation of the lightweight convolutional neural network. For example, if the lightweight convolutional neural network is MobileNet V2, and reducing the number of channels of the last layer of MobileNet V2 from 1280 to 160 lowers the detection accuracy of the key point detection model by only 0.5%, then 160 can be used as the preset number of channels for the last layer.
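One way to read this selection rule is as a search for the smallest channel count whose accuracy loss stays within a tolerance. The sketch below assumes that reading. The function name and the tolerance are hypothetical, and apart from the 160-channel entry (a 0.5% loss, taken from the example above) the measurements are made-up placeholders.

```python
from typing import Dict, Optional


def choose_preset_channels(accuracy_drop: Dict[int, float],
                           max_drop: float) -> Optional[int]:
    """Pick the smallest channel count whose accuracy drop is acceptable.

    accuracy_drop maps a candidate channel count to the measured loss in
    detection accuracy (in percent) relative to the unreduced network.
    Returns None when no candidate is acceptable.
    """
    acceptable = [channels for channels, drop in accuracy_drop.items()
                  if drop <= max_drop]
    return min(acceptable) if acceptable else None


# Only the 160-channel figure comes from the text; the rest are placeholders.
measured_drop = {640: 0.1, 320: 0.3, 160: 0.5, 80: 2.4}
preset = choose_preset_channels(measured_drop, max_drop=0.5)
```

The tolerance itself is a design decision; the description only requires that the loss be "within an acceptable range".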

By setting the preset number reasonably and reducing the number of channels of the last convolutional layer of the lightweight convolutional neural network to that number, the computation of the lightweight convolutional neural network can be reduced while the detection performance remains nearly unchanged, which further lowers the computing power that the key point detection model requires of the device running it.

The third embodiment of the present invention relates to a human body key point detection method. The third embodiment is substantially the same as the first embodiment, the main difference being that using the third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map includes: using the third sub-module to perform group convolution on the second feature map according to the spatial relationship of human limbs, and outputting the heat map of each human body key point of the target human body image according to the result of the group convolution. Since the limbs of the same body part have a relatively fixed spatial relationship in the image, performing group convolution on the second feature map according to the spatial relationship of human limbs makes the detection of human body key points more accurate.

The specific flow of the human body key point detection method provided by this embodiment is shown in FIG. 4 and includes the following steps:

S301: Use the first sub-module of the key point detection model to extract a first feature map of the target human body image, wherein the first feature map includes multiple groups of feature maps with different resolutions.

S302: Use the second sub-module of the key point detection model to perform several layers of deconvolution according to the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the second sub-module combines the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution with that third feature map and then performs the deconvolution.

S303: Use the third sub-module of the key point detection model to perform group convolution on the second feature map according to the spatial relationship of human limbs, output a heat map of each human body key point of the target human body image according to the result of the group convolution, and determine each human body key point of the target human body image according to the heat map.

S301 and S302 are the same as S101 and S102 in the first embodiment, respectively. For details, refer to the description of the first embodiment, which is not repeated here.
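The combination rule in S302 can be followed by tracing feature-map shapes only. The sketch below makes several assumptions that are not stated in this section: each deconvolution layer doubles the height and width, "combining" means concatenating along the channel axis, and the first layer starts from the lowest-resolution first feature map. All shapes and channel counts are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def trace_decoder(first_maps: List[Shape],
                  out_channels: List[int]) -> List[Tuple[Shape, Shape]]:
    """Return the (input shape, output shape) of every deconvolution layer."""
    channels_at = {(h, w): c for c, h, w in first_maps}
    # Assumption: the first layer takes the lowest-resolution first feature map.
    c, h, w = min(first_maps, key=lambda s: s[1] * s[2])
    trace = []
    for layer, c_out in enumerate(out_channels):
        if layer > 0 and (h, w) in channels_at:
            # Combine the previous layer's output (the "third feature map")
            # with the first feature map of the same resolution.
            c += channels_at[(h, w)]
        layer_in = (c, h, w)
        c, h, w = c_out, h * 2, w * 2  # assumed stride-2 upsampling
        trace.append((layer_in, (c, h, w)))
    return trace


# Hypothetical backbone outputs at four resolutions.
first_maps = [(24, 64, 48), (32, 32, 24), (96, 16, 12), (160, 8, 6)]
trace = trace_decoder(first_maps, out_channels=[128, 128, 128])
for layer_in, layer_out in trace:
    print(layer_in, "->", layer_out)
```

The output of the last layer plays the role of the second feature map in this sketch.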

For S303, specifically, the limbs of the same body part have a relatively fixed spatial relationship in the target human body image; for example, the human body key points of a hand lie close to one another. Moreover, because the limbs of the same part are concentrated in one region of the target human body image, their features in the image are generally similar. Performing group convolution on the second feature map according to the spatial relationship of human limbs therefore lets the key point detection model learn from and refer to the spatial positions and image features of the same body part within one convolution group, which effectively improves the detection accuracy of human body key points.

Optionally, when the second feature map is grouped according to the spatial relationship of human limbs, the number of groups can be set according to actual needs and is not specifically limited here. For example, the grouping can follow regions such as the head, torso, hands and feet. Optionally, one of these parts can be further subdivided to obtain new groups; for example, the left hand and the right hand can each form their own group.

It can be understood that, when group convolution is performed on the second feature map according to the spatial relationship of human limbs, the group convolution of the second feature map influences, through the back-propagation algorithm during training, the preceding grouped deconvolution and the other parts of the key point detection model. The detection performance of the key point detection model as a whole is thus shaped by the spatial relationship of human limbs, which effectively improves the detection accuracy of human body key points.

In a specific example, using the third sub-module to perform group convolution on the second feature map according to the spatial relationship of human limbs may specifically be: using the third sub-module to perform group convolution on the second feature map according to a preset key point grouping, wherein the second feature map of each group in the key point grouping includes the second feature maps of first human body key points that are located near one another and belong to the same limb position of the human body, the first human body key points being key points predefined according to the spatial relationship of human limbs.
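What distinguishes a group convolution is that every output channel of a group sees only that group's input channels. The sketch below illustrates this at a single pixel, assuming a 1x1 kernel and equally sized input groups. The kernel size, the function name and the weights are illustrative assumptions rather than details of the embodiment.

```python
from typing import List, Sequence


def grouped_pointwise_conv(pixel: Sequence[float],
                           group_weights: List[List[List[float]]]) -> List[float]:
    """Apply a grouped 1x1 convolution to the channel vector of one pixel.

    group_weights[g] is a matrix with one row per output channel of group g
    and one column per input channel of group g.
    """
    n_groups = len(group_weights)
    if len(pixel) % n_groups:
        raise ValueError("input channels must split evenly across the groups")
    in_per_group = len(pixel) // n_groups
    outputs = []
    for g, weights in enumerate(group_weights):
        # Only this group's slice of the input channels is visible here.
        chunk = pixel[g * in_per_group:(g + 1) * in_per_group]
        for row in weights:
            outputs.append(sum(w * x for w, x in zip(row, chunk)))
    return outputs


# Two groups: the first yields two heat-map values, the second yields one.
weights = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0]],
]
outputs = grouped_pointwise_conv([1.0, 2.0, 3.0, 4.0], weights)
# Changing a channel of the second group leaves the first group's outputs alone.
changed = grouped_pointwise_conv([1.0, 2.0, 3.0, 10.0], weights)
```

Under this reading, each group would hold the feature channels and produce the heat maps of one body part.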

Refer to FIG. 5, which shows a detection example of the human body key point detection method provided by this embodiment. In FIG. 5, the human body is described by 63 key points, each of which is predefined according to the spatial relationship of human limbs; for example, key points 0 and 58 are the points on the left and right sides of the neck, respectively. Each box in FIG. 5 represents one group: the key points of the head form one group, those of the hands form four groups, those of the torso form two groups, and those of the feet form two groups.
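A preset key point grouping of this kind can be represented as a mapping from group name to key point indices, with a check that every key point belongs to exactly one group. In the sketch below only the totals (63 key points in 1 + 4 + 2 + 2 = 9 groups) come from the description. The group names, group sizes and the contiguous index assignment are placeholders and do not reproduce the numbering of FIG. 5.

```python
from typing import Dict, List


def index_by_keypoint(groups: Dict[str, List[int]],
                      n_keypoints: int) -> Dict[int, str]:
    """Map each key point index to its group, rejecting overlaps and gaps."""
    owner: Dict[int, str] = {}
    for name, indices in groups.items():
        for k in indices:
            if not 0 <= k < n_keypoints:
                raise ValueError(f"key point {k} is out of range")
            if k in owner:
                raise ValueError(f"key point {k} is in both {owner[k]} and {name}")
            owner[k] = name
    missing = sorted(set(range(n_keypoints)) - set(owner))
    if missing:
        raise ValueError(f"key points without a group: {missing}")
    return owner


# Placeholder sizes (9 + 4*6 + 2*9 + 2*6 = 63); not taken from FIG. 5.
group_sizes = {
    "head": 9,
    "hand_1": 6, "hand_2": 6, "hand_3": 6, "hand_4": 6,
    "torso_1": 9, "torso_2": 9,
    "foot_1": 6, "foot_2": 6,
}
groups: Dict[str, List[int]] = {}
start = 0
for name, size in group_sizes.items():
    groups[name] = list(range(start, start + size))
    start += size

owner = index_by_keypoint(groups, n_keypoints=63)
```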

It can be understood that the human body key points in FIG. 5 are edge points of the target human body image. When the key points of the skeleton in the target human body image are needed, they can be calculated from the relative position or relative distance between the skeleton key points and the human body key points defined in the figure.

Compared with the prior art, the human body key point detection method provided by this embodiment performs group convolution on the second feature map according to the spatial relationship of human limbs and outputs the heat map of each human body key point of the target human body image according to the result of the group convolution. Since the limbs of the same body part have a relatively fixed spatial relationship in the target human body image, and their features in the image are also similar, performing group convolution on the second feature map according to the spatial relationship of human limbs lets the key point detection model learn from and refer to the spatial positions and image features of the same body part within one convolution group, which effectively improves the detection accuracy of human body key points.

The steps of the above methods are divided only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variants fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.

The fourth embodiment of the present invention relates to a human body key point detection device 400 which, as shown in FIG. 6, includes an extraction module 401, a deconvolution module 402 and a detection module 403. The functions of the modules are described in detail as follows:

The extraction module 401 is configured to extract a first feature map of a target human body image, the first feature map including multiple groups of feature maps with different resolutions;

The deconvolution module 402 is configured to perform several layers of deconvolution according to the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the first feature map whose resolution is the same as that of the third feature map obtained by the previous layer of deconvolution is combined with that third feature map and then deconvolved;

The detection module 403 is configured to output a heat map of each human body key point of the target human body image according to the second feature map, and to determine each human body key point of the target human body image according to the heat map.
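The relationship between the three modules can be sketched as a simple pipeline in which each module consumes the output of the previous one. The class name, field names and stand-in callables below are hypothetical and only show the order of the data flow, not any actual implementation of the modules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class KeypointDetectionDevice:
    extract: Callable[[Any], Any]     # role of extraction module 401
    deconvolve: Callable[[Any], Any]  # role of deconvolution module 402
    detect: Callable[[Any], Any]      # role of detection module 403

    def run(self, image: Any) -> Any:
        first_feature_maps = self.extract(image)
        second_feature_map = self.deconvolve(first_feature_maps)
        return self.detect(second_feature_map)


# Stand-in callables that only record which stage handled the data.
device = KeypointDetectionDevice(
    extract=lambda image: [image, "401"],
    deconvolve=lambda maps: maps + ["402"],
    detect=lambda feature_map: feature_map + ["403"],
)
stages = device.run("image")
```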

Further, the deconvolution module 402 is also configured to:

when each layer of deconvolution is performed, divide the deconvolution into several groups according to the number of channels;

when the first layer of deconvolution is performed, perform the deconvolution according to the first feature map with the lowest resolution.

Further, the detection module 403 is also configured to:

perform group convolution on the second feature map according to the spatial relationship of human limbs, and output the heat map of each human body key point of the target human body image according to the result of the group convolution;

perform group convolution on the second feature map according to a preset key point grouping, wherein the second feature map of each group in the key point grouping includes the second feature maps of first human body key points that are located near one another and belong to the same limb position of the human body, the first human body key points being key points predefined according to the spatial relationship of human limbs.

Further, the extraction module 401 is also configured to:

obtain the first feature map of the target human body image according to a lightweight convolutional neural network;

reduce the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number, and obtain the first feature map of the target human body image according to the lightweight convolutional neural network with the reduced number of channels.

It is readily apparent that this embodiment is a device embodiment corresponding to the first, second and third embodiments, and that it can be implemented in cooperation with them. The relevant technical details mentioned in the first, second and third embodiments remain valid in this embodiment and are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the first, second and third embodiments.

It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

The fifth embodiment of the present invention relates to a network device which, as shown in FIG. 7, includes at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501, wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the human body key point detection method described above.

The memory 502 and the processor 501 are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors 501 and the memory 502 together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators and power management circuits, all of which are well known in the art and are therefore not described further here. A bus interface provides the interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 501.

The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 502 may be used to store data used by the processor 501 when performing operations.

The sixth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.

That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Those of ordinary skill in the art can understand that the above embodiments are specific examples of implementing the present invention, and that in practical applications various changes in form and detail can be made to them without departing from the spirit and scope of the present invention.

Claims

1. A human body key point detection method, characterized in that it is applied to a pre-trained key point detection model, the human body key point detection method comprising:

using a first sub-module of the key point detection model to extract a first feature map of a target human body image, the first feature map comprising multiple groups of feature maps with different resolutions;

using a second sub-module of the key point detection model to perform several layers of deconvolution according to the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the second sub-module combines the first feature map having the same resolution as the third feature map obtained by the previous layer of deconvolution with the third feature map and then performs the deconvolution;

using a third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map, and determining each human body key point of the target human body image according to the heat map;

wherein using the third sub-module of the key point detection model to output a heat map of each human body key point of the target human body image according to the second feature map comprises:

using the third sub-module to perform group convolution on the second feature map according to the spatial relationship of human limbs, and outputting a heat map of each human body key point of the target human body image according to the result of the group convolution.

2. The human body key point detection method according to claim 1, characterized in that using the second sub-module of the key point detection model to perform several layers of deconvolution according to the first feature map comprises:

when each layer of deconvolution is performed, using the second sub-module to divide the deconvolution into several groups according to the number of channels.

3. The human body key point detection method according to claim 1, characterized in that using the second sub-module of the key point detection model to perform several layers of deconvolution according to the first feature map comprises:

when the first layer of deconvolution is performed, using the second sub-module to perform the deconvolution according to the first feature map with the lowest resolution.

4. The human body key point detection method according to claim 1, characterized in that using the third sub-module to perform group convolution on the second feature map according to the spatial relationship of human limbs is specifically:

using the third sub-module to perform group convolution on the second feature map according to a preset key point grouping, wherein the second feature map of each group in the key point grouping comprises the second feature maps of first human body key points that are located near one another and belong to the same limb position of the human body, the first human body key points being key points predefined according to the spatial relationship of human limbs.

5. The human body key point detection method according to claim 1, characterized in that using the first sub-module of the key point detection model to extract the first feature map of the target human body image is specifically:

using the first sub-module to obtain the first feature map of the target human body image according to a lightweight convolutional neural network.

6. The human body key point detection method according to claim 5, characterized in that using the first sub-module to obtain the first feature map of the target human body image according to the lightweight convolutional neural network comprises:

using the first sub-module to reduce the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number, and obtaining the first feature map of the target human body image according to the lightweight convolutional neural network with the reduced number of channels.

7. A human body key point detection device, characterized in that it comprises:

an extraction module, configured to extract a first feature map of a target human body image, the first feature map comprising multiple groups of feature maps with different resolutions;

a deconvolution module, configured to perform several layers of deconvolution according to the first feature map to obtain a second feature map, wherein, when the next layer of deconvolution is performed, the first feature map having the same resolution as the third feature map obtained by the previous layer of deconvolution is combined with the third feature map and then deconvolved;

a detection module, configured to output a heat map of each human body key point of the target human body image according to the second feature map, and to determine each human body key point of the target human body image according to the heat map;

wherein outputting the heat map of each human body key point of the target human body image according to the second feature map comprises:

performing group convolution on the second feature map according to the spatial relationship of human limbs, and outputting a heat map of each human body key point of the target human body image according to the result of the group convolution.

8. A network device, characterized in that it comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the human body key point detection method according to any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the human body key point detection method according to any one of claims 1 to 6 is realized.