CN112115746A - Human body action recognition device and method and electronic equipment

Info

Publication number: CN112115746A
Application number: CN201910541727.3A
Authority: CN
Inventors: 尹汭; 谭志明; 张宗艳; 丁蓝
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2020-12-22
Anticipated expiration: 2039-06-21
Also published as: JP7419964B2; JP2021002332A; CN112115746B

Abstract

Embodiments of the present invention provide a human body action recognition device and method, and electronic equipment. First, a bounding box of a human body is detected in an input image; then, within the detected bounding box, the action of the human body is selectively detected based on key points of the human body and/or based on a convolutional neural network. With this hierarchical detection approach, processing is faster and recognition accuracy is higher. Moreover, by combining the two detection approaches, a different approach can be selected for each situation, so that various scenarios and requirements can be handled flexibly.

Description

Human body action recognition device and method, and electronic equipment

Technical Field

The present invention relates to the field of information technology.

Background

In recent years, research in the field of computer vision has made great progress with the help of deep learning. Deep learning refers to a collection of algorithms that apply various machine learning algorithms on hierarchical neural networks to solve problems involving images, text, and the like. The core of deep learning is feature learning, which aims to obtain hierarchical feature information through hierarchical neural networks, thereby solving the important problem that features previously had to be designed by hand.

Security monitoring is one of the important applications of deep learning, and the recognition of human actions and behavior is an important part of security monitoring.

It should be noted that the above description of the technical background is provided only to allow a clear and complete description of the technical solutions of the present invention and to facilitate understanding by those skilled in the art. It should not be assumed that the above technical solutions are well known to those skilled in the art merely because they are described in the background section of the present invention.

Summary of the Invention

However, because human actions are complex and application scenarios vary, existing action recognition methods are slow and have low recognition accuracy. In addition, they cannot respond flexibly to different scenarios and requirements.

Embodiments of the present invention provide a human body action recognition device and method, and an electronic device. First, a bounding box of a human body is detected in an input image; then, within the detected bounding box, the action of the human body is selectively detected based on key points of the human body and/or based on a convolutional neural network. With this hierarchical detection approach, processing is faster and recognition accuracy is higher. Moreover, by combining the two detection approaches, a different approach can be selected for each situation, so that various scenarios and requirements can be handled flexibly.

According to a first aspect of the embodiments of the present invention, there is provided a human body action recognition device, the device including: a target detection unit configured to detect a bounding box of a human body in an input image; a first detection unit configured to, within the detected bounding box of the human body, calculate features of the human body based on key points of the human body and detect the action of the human body according to the features, to obtain a first recognition result; a second detection unit configured to, within the detected bounding box of the human body, detect the action of the human body based on a convolutional neural network, to obtain a second recognition result; and a selection unit configured to select at least one of the first detection unit and the second detection unit to detect the action of the human body, so as to obtain at least one of the first recognition result and the second recognition result.

According to a second aspect of the embodiments of the present invention, there is provided an electronic device including the device according to the first aspect of the embodiments of the present invention.

According to a third aspect of the embodiments of the present invention, there is provided a human body action recognition method, the method including: detecting a bounding box of a human body in an input image; and selecting and performing at least one of the following detections: within the detected bounding box of the human body, calculating features of the human body based on key points of the human body and detecting the action of the human body according to the features, to obtain a first recognition result; and within the detected bounding box of the human body, detecting the action of the human body based on a convolutional neural network, to obtain a second recognition result.

The beneficial effects of the present invention are as follows: a bounding box of a human body is first detected in an input image, and within the detected bounding box the action of the human body is selectively detected based on key points of the human body and/or based on a convolutional neural network. With this hierarchical detection approach, processing is faster and recognition accuracy is higher. Moreover, by combining the two detection approaches, a different approach can be selected for each situation, so that various scenarios and requirements can be handled flexibly.

With reference to the following description and drawings, specific embodiments of the invention are disclosed in detail, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the present invention are not thereby limited in scope. Within the spirit and terms of the appended claims, the embodiments of the invention include many changes, modifications and equivalents.

Features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, in combination with, or in place of, features of the other embodiments.

It should be emphasized that the term "comprise/include", when used herein, specifies the presence of features, integers, steps or components, but does not exclude the presence or addition of one or more other features, integers, steps or components.

Brief Description of the Drawings

The accompanying drawings are included to provide a further understanding of the embodiments of the invention. They constitute a part of the specification, illustrate embodiments of the invention, and together with the written description serve to explain the principles of the invention. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic diagram of the human body action recognition device according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the first detection unit 102 according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of a detection result of the key points of a human body according to Embodiment 1 of the present invention;

FIG. 4 is a schematic diagram of obtaining the features of a human body based on key points according to Embodiment 1 of the present invention;

FIG. 5 is a schematic diagram of performing human body action recognition using the human body action recognition device 100 according to Embodiment 1 of the present invention;

FIG. 6 is a schematic diagram of the electronic device according to Embodiment 2 of the present invention;

FIG. 7 is a schematic block diagram of the system configuration of the electronic device according to Embodiment 2 of the present invention;

FIG. 8 is a schematic diagram of the human body action recognition method according to Embodiment 3 of the present invention.

Detailed Description

The foregoing and other features of the present invention will become apparent from the following description with reference to the accompanying drawings. The specification and drawings disclose specific embodiments of the invention, which indicate some of the ways in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; rather, it includes all modifications, variations and equivalents falling within the scope of the appended claims.

Embodiment 1

An embodiment of the present invention provides a human body action recognition device. FIG. 1 is a schematic diagram of the human body action recognition device according to Embodiment 1 of the present invention.

As shown in FIG. 1, the human body action recognition device 100 includes:

a target detection unit 101 configured to detect a bounding box of a human body in an input image;

a first detection unit 102 configured to, within the detected bounding box of the human body, calculate features of the human body based on key points of the human body and detect the action of the human body according to the features, to obtain a first recognition result;

a second detection unit 103 configured to, within the detected bounding box of the human body, detect the action of the human body based on a convolutional neural network, to obtain a second recognition result; and

a selection unit 104 configured to select at least one of the first detection unit 102 and the second detection unit 103 to detect the action of the human body, so as to obtain at least one of the first recognition result and the second recognition result.

As can be seen from the above embodiment, a bounding box of a human body is first detected in the input image, and within the detected bounding box the action of the human body is selectively detected based on key points of the human body and/or based on a convolutional neural network. With this hierarchical detection approach, processing is faster and recognition accuracy is higher. Moreover, by combining the two detection approaches, a different approach can be selected for each situation, so that various scenarios and requirements can be handled flexibly.

In this embodiment, the input image may be an image obtained in real time or obtained in advance. For example, the input images come from video captured by a monitoring device, with each input image corresponding to one frame of the video.

In this embodiment, the target detection unit 101 detects the bounding box of the human body in the input image. The target detection unit 101 may perform detection based on any of various object detection methods, for example, Faster R-CNN, FPN, or a Yolo network.

In this embodiment, different networks may be used for detection according to different requirements. For example, the Yolo network may be used when high processing speed is required, and the Faster R-CNN network may be used when high recognition accuracy is required.
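The speed-versus-accuracy choice described above can be sketched as a simple lookup. This is only an illustrative sketch: the `Priority` enum, the `choose_detector` function and the string labels are hypothetical names introduced here, and no actual detection network is loaded.

```python
from enum import Enum


class Priority(Enum):
    """Hypothetical requirement flag; not part of the original disclosure."""
    SPEED = "speed"
    ACCURACY = "accuracy"


# Mapping taken from the example in the text: Yolo when speed matters,
# Faster R-CNN when accuracy matters. The values are plain labels only.
DETECTOR_BY_PRIORITY = {
    Priority.SPEED: "yolo",
    Priority.ACCURACY: "faster_rcnn",
}


def choose_detector(priority: Priority) -> str:
    """Return the label of the detection network suited to the requirement."""
    return DETECTOR_BY_PRIORITY[priority]
```

In a real system the returned label would be used to load the corresponding model; that step is left out here because the text does not specify it.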

When at least one human body is present in the input image, the target detection unit 101 detects the bounding box of the at least one human body. After the bounding box of the human body is detected, the selection unit 104 selects at least one of the first detection unit 102 and the second detection unit 103 to detect the action of the human body, so as to obtain at least one of the first recognition result and the second recognition result.

In this embodiment, the selection unit 104 may select at least one of the first detection unit 102 and the second detection unit 103 according to actual needs or the application scenario, so as to obtain at least one of the first recognition result and the second recognition result. When the target detection unit 101 detects multiple bounding boxes of multiple human bodies in the input image, the first detection unit 102 and/or the second detection unit 103, as selected by the selection unit 104, process the bounding boxes one by one.

For example, when only simple actions need to be detected, such as the simple trunk actions of walking, standing and sitting, the selection unit 104 selects the second detection unit 103 to detect the action of the human body; that is, the second recognition result is output.

In this embodiment, the second detection unit 103 detects the action of the human body within the detected bounding box based on a convolutional neural network (CNN), to obtain the second recognition result.

In this embodiment, a popular CNN may be used to detect trunk actions; for example, AlexNet may be used.

In this embodiment, to train the CNN, a training data set may first be established. The training data set includes images of human bodies whose actions are labeled "walk", "stand", "sit", "run", "squat" and "lie"; these images may be obtained from open data sets.

As another example, when both simple actions and more complex actions need to be detected, for example, more complex local actions such as raising the head and raising a hand in addition to simple trunk actions such as walking, standing and sitting, the selection unit 104 selects the first detection unit 102 to detect the action of the human body; that is, the first recognition result is output. Alternatively, both the first detection unit 102 and the second detection unit 103 may be selected to perform detection at the same time; that is, both the first recognition result and the second recognition result are output.
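The two selection examples above can be sketched as a small policy function. This is a hedged illustration, not the selection logic of the disclosure: the action labels, the `run_both` flag and the `"first"`/`"second"` unit names are assumptions made for the example.

```python
from typing import Iterable, Set

# Action groups follow the examples in the text; the string labels are hypothetical.
TRUNK_ACTIONS = {"walk", "stand", "sit"}
LOCAL_ACTIONS = {"look_up", "raise_hand"}


def select_units(required_actions: Iterable[str], run_both: bool = False) -> Set[str]:
    """Pick which detection units to run.

    "first" stands for the key-point based unit 102,
    "second" for the CNN based unit 103.
    """
    required = set(required_actions)
    unknown = required - TRUNK_ACTIONS - LOCAL_ACTIONS
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    # Only simple trunk actions requested: the CNN based unit is enough.
    if not required & LOCAL_ACTIONS:
        return {"second"}
    # Local actions requested: use the key-point unit, optionally with the CNN unit.
    return {"first", "second"} if run_both else {"first"}
```

For instance, `select_units({"walk", "raise_hand"}, run_both=True)` selects both units, matching the alternative in which both recognition results are output.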

In this embodiment, the first detection unit 102, within the detected bounding box of the human body, calculates features of the human body based on key points of the human body and detects the action of the human body according to the features, to obtain the first recognition result.

FIG. 2 is a schematic diagram of the first detection unit 102 according to Embodiment 1 of the present invention. As shown in FIG. 2, the first detection unit 102 includes:

a first detection module 201 configured to detect key points of the human body within the detected bounding box of the human body;

a calculation module 202 configured to calculate features of the human body according to the detected key points of the human body; and

a second detection module 203 configured to detect the action of the human body according to the calculated features of the human body, based on a classifier and/or preset rules, to obtain the first recognition result.

In this embodiment, the first detection module 201 may detect the key points of the human body based on any of various methods. For example, the first detection module 201 detects the key points of the human body based on a Cascaded Pyramid Network (CPN). Alternatively, detection may be based on methods such as Open-pose or Alpha-pose.

In this embodiment, the key points of the human body may include multiple points that respectively represent the positions of multiple parts of the human body, for example, points respectively representing the two ears, two eyes, nose, two shoulders, two elbows, two wrists, two hips, two knees and two ankles.

FIG. 3 is a schematic diagram of a detection result of the key points of a human body according to Embodiment 1 of the present invention. As shown in FIG. 3, within the bounding box of one human body, the CPN detects key points representing the various parts of the human body and can output the position information of these key points.

In this embodiment, the calculation module 202 calculates the features of the human body according to the key points detected by the first detection module 201. For example, the features of the human body may include: the two-dimensional coordinates of multiple points that respectively represent the positions of multiple parts of the human body; and at least one angle between lines connecting those points.

In this embodiment, the features of the human body to be calculated may be determined according to actual needs.

FIG. 4 is a schematic diagram of obtaining the features of a human body based on key points according to Embodiment 1 of the present invention. As shown in FIG. 4, the key points used to calculate the features are the points at the following body parts: nose, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle. The calculated features of the human body include the two-dimensional coordinates of these points, that is, their X and Y coordinates. In addition, the features may include a first angle between the left leg and the torso, a second angle between the right leg and the torso, a third angle between the left calf and the left thigh, and a fourth angle between the right calf and the right thigh.
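The feature vector described above (13 points with X and Y coordinates, plus four angles) can be sketched as follows. The text does not state which key points define each line, so this sketch assumes the torso line runs from hip to shoulder on the same side, the thigh from hip to knee, and the calf from knee to ankle. The key-point names and the functions `angle_at` and `body_features` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# The 13 key points named in the text, in a fixed (assumed) order.
FEATURE_POINTS = [
    "nose", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: coincident points
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def body_features(kp: Dict[str, Point]) -> List[float]:
    """Return 26 coordinates followed by the four angles (30 values in total)."""
    feats: List[float] = []
    for name in FEATURE_POINTS:
        feats.extend(kp[name])
    for side in ("left", "right"):  # angles 1 and 2: leg vs. torso, at the hip
        feats.append(angle_at(kp[f"{side}_hip"], kp[f"{side}_shoulder"], kp[f"{side}_knee"]))
    for side in ("left", "right"):  # angles 3 and 4: calf vs. thigh, at the knee
        feats.append(angle_at(kp[f"{side}_knee"], kp[f"{side}_hip"], kp[f"{side}_ankle"]))
    return feats
```

With these assumptions, a fully straightened leg in line with the torso gives angles close to 180 degrees, and the angles shrink as the hip or knee bends, which is the kind of cue a downstream classifier could use to separate standing from sitting or squatting.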

After the calculation module 202 calculates the features of the human body, the second detection module 203 detects the action of the human body according to the calculated features, based on a classifier and/or preset rules, to obtain the first recognition result.

In this embodiment, the second detection module 203 may, according to the calculated features of the human body, detect the trunk actions of the human body based on a classifier and detect the head actions and upper-limb actions of the human body based on preset rules.

In this embodiment, the second detection module 203 may detect the trunk actions of the human body based on any of various classifiers. For example, the second detection module 203 may perform detection based on a multi-layer perceptron (MLP) classifier. Detection based on the calculated features and the MLP classifier can achieve good detection performance.
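To make the classifier step concrete, the forward pass of a small MLP over a feature vector can be sketched in plain Python. The text does not give the network size, activation function or weights, so the ReLU hidden layers, the softmax output and the function names below are all assumptions; the weights would come from training, which is not shown.

```python
import math
from typing import List, Sequence, Tuple

# One layer is a (weights, biases) pair, with one weight row per output neuron.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one dot product plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(v: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(x: Sequence[float], layers: Sequence[Layer], labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Run the MLP and return the most probable label with all class probabilities."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:
            h = [max(0.0, a) for a in h]  # ReLU on hidden layers (assumed)
    probs = softmax(h)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

In practice the input `x` would be the coordinate-and-angle features computed by the calculation module 202, and `labels` would be the trunk actions the classifier was trained on.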

In this embodiment, the second detection module 203 may also detect head actions and upper-limb actions of the human body based on preset rules, for example, actions such as looking up, looking down and raising a hand. The preset rules may be set for different actions according to actual needs. For example, when the two ears are higher than the two eyes, the action is judged to be "looking down"; when a wrist is higher than the elbow, the action is judged to be "raising a hand".
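The two example rules can be sketched as below. Several details are assumptions made for this illustration: image coordinates are assumed to have Y increasing downward, so "higher" means a smaller Y value; the ear and eye heights are compared by their averages; and a raised hand on either side counts. The key-point names and the function `rule_based_actions` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rule_based_actions(kp: Dict[str, Point]) -> List[str]:
    """Apply the two example rules to key points given in image coordinates."""
    actions: List[str] = []
    # Rule 1: ears higher than eyes -> looking down (smaller y means higher).
    ears_y = (kp["left_ear"][1] + kp["right_ear"][1]) / 2.0
    eyes_y = (kp["left_eye"][1] + kp["right_eye"][1]) / 2.0
    if ears_y < eyes_y:
        actions.append("look_down")
    # Rule 2: wrist higher than the elbow on the same side -> raised hand.
    if any(kp[f"{side}_wrist"][1] < kp[f"{side}_elbow"][1] for side in ("left", "right")):
        actions.append("raise_hand")
    return actions
```

Because the rules compare only relative positions of key points inside one bounding box, they need no training data, which is presumably why the text pairs them with a trained classifier for the trunk actions.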

FIG. 5 is a schematic diagram of performing human body action recognition using the human body action recognition device 100 according to Embodiment 1 of the present invention. As shown in FIG. 5, an input image containing multiple human bodies is input to the target detection unit 101. The target detection unit 101 detects the bounding box of each human body in the input image and outputs the bounding boxes to the first detection unit 102 and the second detection unit 103. The first detection unit 102 and the second detection unit 103 perform detection according to the selection made by the selection unit 104, and at least one of the first recognition result and the second recognition result is output.
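The overall flow of FIG. 5 can be sketched as a loop over detected bounding boxes. This is a structural sketch only: the three units are passed in as callables, and the function name `recognize_actions`, the `(x1, y1, x2, y2)` box layout and the result dictionary keys are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) layout


def recognize_actions(
    image: Any,
    detect_boxes: Callable[[Any], List[Box]],
    first_unit: Callable[[Any, Box], Any],
    second_unit: Callable[[Any, Box], Any],
    selected: Iterable[str],
) -> List[Dict[str, Any]]:
    """Detect bounding boxes, then run the selected unit(s) on each box in turn."""
    chosen = set(selected)
    if not chosen or not chosen <= {"first", "second"}:
        raise ValueError("selected must contain 'first' and/or 'second'")
    results: List[Dict[str, Any]] = []
    for box in detect_boxes(image):  # one entry per detected human body
        entry: Dict[str, Any] = {"box": box}
        if "first" in chosen:
            entry["first_result"] = first_unit(image, box)
        if "second" in chosen:
            entry["second_result"] = second_unit(image, box)
        results.append(entry)
    return results
```

Passing the units as callables keeps the sketch independent of any particular detector, key-point network or CNN, so each can be swapped according to the speed and accuracy requirements.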

Embodiment 2

An embodiment of the present invention further provides an electronic device. FIG. 6 is a schematic diagram of the electronic device according to Embodiment 2 of the present invention. As shown in FIG. 6, the electronic device 600 includes a human body action recognition device 601. The structure and functions of the human body action recognition device 601 are the same as those described in Embodiment 1 and are not repeated here.

FIG. 7 is a schematic block diagram of the system configuration of the electronic device according to Embodiment 2 of the present invention. As shown in FIG. 7, the electronic device 700 may include a central processing unit 701 and a memory 702, the memory 702 being coupled to the central processing unit 701. This figure is exemplary; other types of structures may also be used to supplement or replace this structure, in order to implement telecommunication functions or other functions.

As shown in FIG. 7, the electronic device 700 may further include an input unit 703, a display 704 and a power supply 705.

In one implementation, the functions of the human body action recognition device described in Embodiment 1 may be integrated into the central processing unit 701. The central processing unit 701 may be configured to: detect a bounding box of a human body in an input image; and select and perform at least one of the following detections: within the detected bounding box of the human body, calculating features of the human body based on key points of the human body and detecting the action of the human body according to the features, to obtain a first recognition result; and within the detected bounding box of the human body, detecting the action of the human body based on a convolutional neural network, to obtain a second recognition result.

For example, calculating the features of the human body based on the key points of the human body within the detected bounding box and detecting the action of the human body according to the features, to obtain the first recognition result, includes: detecting the key points of the human body within the detected bounding box of the human body; calculating the features of the human body according to the detected key points; and detecting the action of the human body according to the calculated features, based on a classifier and/or preset rules, to obtain the first recognition result.

In another implementation, the human body action recognition device described in Embodiment 1 may be configured separately from the central processing unit 701. For example, the human body action recognition device may be configured as a chip connected to the central processing unit 701, with its functions realized under the control of the central processing unit 701.

In this embodiment, the electronic device 700 does not necessarily include all of the components shown in FIG. 7.

As shown in FIG. 7, the central processing unit 701, sometimes also referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device. The central processing unit 701 receives input and controls the operation of the various components of the electronic device 700.

The memory 702 may be, for example, one or more of a cache, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. The central processing unit 701 may execute a program stored in the memory 702 to implement information storage, processing and the like. The functions of the other components are similar to those of existing components and are not described here. The components of the electronic device 700 may be implemented by dedicated hardware, firmware, software, or a combination thereof, without departing from the scope of the present invention.

Embodiment 3

An embodiment of the present invention further provides a human body action recognition method, which corresponds to the human body action recognition device of Embodiment 1. FIG. 8 is a schematic diagram of the human body action recognition method according to Embodiment 3 of the present invention. As shown in FIG. 8, the method includes:

Step 801: detecting a bounding box of a human body in an input image; and

Step 802: selecting and performing at least one of the following detections: within the detected bounding box of the human body, calculating features of the human body based on key points of the human body and detecting the action of the human body according to the features, to obtain a first recognition result; and within the detected bounding box of the human body, detecting the action of the human body based on a convolutional neural network, to obtain a second recognition result.

In this embodiment, the specific implementation of each of the above steps is the same as described in Embodiment 1 and is not repeated here.

An embodiment of the present invention further provides a computer-readable program, wherein when the program is executed in a human body action recognition device or an electronic device, the program causes a computer to execute, in the human body action recognition device or electronic device, the human body action recognition method described in Embodiment 3.

An embodiment of the present invention further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to execute, in a human body action recognition device or an electronic device, the human body action recognition method described in Embodiment 3.

The human body motion recognition method performed in the human body motion recognition device or electronic device described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIG. 1, and/or one or more combinations of those functional blocks, may correspond either to individual software modules of a computer program flow or to individual hardware modules. The software modules may respectively correspond to the steps shown in FIG. 8. The hardware modules may be implemented, for example, by solidifying the software modules in a field-programmable gate array (FPGA).

A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium; alternatively, the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of a mobile terminal, or in a memory card that can be inserted into the mobile terminal. For example, if the electronic device uses a large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in that MEGA-SIM card or flash memory device.

One or more of the functional blocks described with respect to FIG. 1, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any suitable combination thereof, for performing the functions described in this application. One or more of the functional blocks described with respect to FIG. 1, and/or one or more combinations of those functional blocks, may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.

The present invention has been described above with reference to specific embodiments, but those skilled in the art should understand that these descriptions are exemplary and do not limit the scope of protection of the present invention. Those skilled in the art may make various variations and modifications to the present invention in accordance with its spirit and principles, and such variations and modifications also fall within the scope of the present invention.

Claims

1. A human body motion recognition device, the device comprising:

a target detection unit for detecting the bounding box of the human body in the input image;

a first detection unit, configured to calculate features of the human body based on key points of the human body in the bounding box of the detected human body, and to detect the motion of the human body according to the features of the human body, so as to obtain a first recognition result;

a second detection unit, configured to detect the motion of the human body based on the convolutional neural network in the bounding box of the detected human body, and obtain a second recognition result; and

a selection unit, configured to select at least one of the first detection unit and the second detection unit to detect the motion of the human body, so as to obtain at least one of the first recognition result and the second recognition result.

2. The device according to claim 1, wherein the first detection unit comprises:

a first detection module, which is used for detecting the key points of the human body in the bounding box of the detected human body;

a computing module for computing the features of the human body according to the detected key points of the human body; and

a second detection module, configured to detect the motion of the human body based on a classifier and/or a preset rule according to the calculated features of the human body, so as to obtain the first recognition result.

3. The device according to claim 2, wherein,

the key points of the human body include a plurality of points respectively representing the positions of a plurality of parts of the human body.

4. The device according to claim 2, wherein,

the features of the human body include:

two-dimensional coordinates of a plurality of points respectively representing the positions of the plurality of parts of the human body; and

at least one angle between lines connecting the plurality of points.
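By way of illustration only, features of this kind (two-dimensional coordinates plus at least one angle between connecting lines) might be computed as in the following sketch. It assumes key points given as `(x, y)` pairs; the names `joint_angle` and `build_features` and the example part names are hypothetical and do not come from the embodiments.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # two-dimensional coordinates (x, y)


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at point b between the lines b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("angle is undefined for coincident points")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def build_features(
    points: Dict[str, Point],
    triplets: Sequence[Tuple[str, str, str]],
) -> List[float]:
    """Concatenate all coordinates, then one angle per (a, b, c) triplet."""
    features: List[float] = []
    for x, y in points.values():
        features.extend([x, y])
    for a, b, c in triplets:
        features.append(joint_angle(points[a], points[b], points[c]))
    return features
```

For example, with hypothetical points `shoulder=(0, 0)`, `elbow=(1, 0)` and `wrist=(1, 1)`, the triplet `("shoulder", "elbow", "wrist")` yields the angle at the elbow, which is appended after the six coordinate values.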

5. The device according to claim 2, wherein,

the first detection module detects the key points of the human body based on a Cascaded Pyramid Network (CPN).

6. The device according to claim 2, wherein,

the classifier is a multilayer perceptron (MLP) classifier.

7. The device according to claim 1, wherein,

the second detection module detects the motion of the torso of the human body based on the classifier according to the calculated features of the human body, and detects the motion of the head and the upper limbs of the human body based on a preset rule.
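Purely as an illustrative sketch, the division of work between a classifier (torso) and a preset rule (upper limbs) might look as follows. Everything here is an assumption made for illustration: the function name `detect_actions`, the key-point names, the action labels, and the specific rule (a wrist above the shoulder counts as a raised hand, with the image y axis pointing downward) are hypothetical, and the torso classifier is passed in as a placeholder callable rather than a trained MLP.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]  # image coordinates (x, y), y growing downward


def detect_actions(
    keypoints: Dict[str, Point],
    features: Sequence[float],
    torso_classifier: Callable[[Sequence[float]], str],
) -> Dict[str, str]:
    """Combine a classifier result for the torso with a rule for the arm."""
    # Torso motion: delegated to the classifier (placeholder callable here).
    torso_action = torso_classifier(features)

    # Upper-limb motion: hypothetical preset rule. Because y grows downward
    # in image coordinates, a smaller y means the point is higher up.
    wrist_y = keypoints["right_wrist"][1]
    shoulder_y = keypoints["right_shoulder"][1]
    limb_action = "hand_raised" if wrist_y < shoulder_y else "hand_down"

    return {"torso": torso_action, "upper_limb": limb_action}
```

A rule of this kind needs no training data, which is one plausible reason to handle simple, geometrically well-defined motions by rule while leaving torso postures to a learned classifier.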

8. An electronic device comprising the human body motion recognition device according to any one of claims 1-7.

9. A human body motion recognition method, the method comprising:

detecting the bounding box of a human body in an input image; and

selecting and performing at least one of the following detections: in the bounding box of the detected human body, calculating features of the human body based on key points of the human body, and detecting the motion of the human body according to the features of the human body, to obtain a first recognition result; and, in the bounding box of the detected human body, detecting the motion of the human body based on a convolutional neural network, to obtain a second recognition result.

10. The method according to claim 9, wherein calculating, in the bounding box of the detected human body, the features of the human body based on the key points of the human body, and detecting the motion of the human body according to the features of the human body, to obtain the first recognition result, comprises:

detecting the key points of the human body in the bounding box of the detected human body;

calculating the features of the human body according to the detected key points of the human body; and

detecting the motion of the human body based on a classifier and/or a preset rule according to the calculated features of the human body, to obtain the first recognition result.