CN116206356A - Behavior recognition device and method and electronic equipment - Google Patents
- Publication number
- CN116206356A (application number CN202111443162.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- mobilenet
- lightweight
- behavior recognition
- present application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the present application provide a behavior recognition device and method and electronic equipment. The method comprises the following steps: detecting an object in an image to obtain an object detection frame; performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and identifying a behavior of the object based on the plurality of key points. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Description
Technical Field
The embodiment of the application relates to the technical field of image detection.
Background
Recent advances in artificial intelligence and deep learning have enabled image-based behavior recognition techniques. Behavior recognition techniques can recognize complex behaviors consisting of multiple actions or movements. An object detection module can detect an object frame, and a pose estimation module can then detect a plurality of key points, on the basis of which the behavior of the object is identified.
It should be noted that the foregoing description of the background art is provided only to facilitate a clear and complete description of the technical solutions of the present application and to aid understanding by those skilled in the art; it should not be taken to mean that these technical solutions are known to those skilled in the art merely because they are set forth in the background section of the present application.
Disclosure of Invention
However, the inventors have found that the pose estimation module is a relatively time-consuming part: if the number of detected objects increases, the time required for pose estimation increases greatly and real-time recognition cannot be achieved, making it difficult to apply in settings with high real-time requirements, such as embedded devices.
In view of at least one of the above technical problems, the embodiments of the present application provide a behavior recognition device and method and electronic equipment, which are expected to increase the speed of behavior recognition while ensuring the accuracy of the behavior recognition result.
According to an aspect of the embodiments of the present application, there is provided a behavior recognition apparatus, including:
a detection unit that detects an object in an image to obtain an object detection frame;
an estimation unit that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit that identifies a behavior of the object based on the plurality of key points.
According to another aspect of the embodiments of the present application, there is provided a behavior recognition method, including:
detecting an object in the image to obtain an object detection frame;
performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
identifying a behavior of the object based on the plurality of key points.
According to another aspect of embodiments of the present application, there is provided an electronic device comprising a memory storing a computer program and a processor configured to execute the computer program to implement the behavior recognition method as described above.
One of the beneficial effects of the embodiments of the present application is that a plurality of key points of an object are obtained by performing pose estimation with a lightweight network based on an object detection frame, where the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Specific implementations of the embodiments of the present application are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the embodiments of the present application may be employed. It should be understood that the embodiments of the present application are not limited in scope thereby. The embodiments of the present application include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. It is obvious that the drawings in the following description are only examples of the present application, and that a person of ordinary skill in the art may obtain other embodiments from these drawings without inventive work. In the drawings:
FIG. 1 is a schematic diagram of a framework for behavior recognition according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the framework of a CPN;
FIG. 3 is a schematic diagram of a behavior recognition method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a lightweight network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a Block in the lightweight network of FIG. 4;
FIG. 6 is a schematic diagram of the SeModule in the Block of FIG. 5;
FIG. 7 is an exemplary diagram of behavior recognition according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a behavior recognition device of an embodiment of the present application;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The foregoing and other features of the embodiments of the present application will become apparent from the following description taken in conjunction with the accompanying drawings. The specification and drawings specifically disclose particular embodiments of the present application, indicating some of the ways in which the principles of the embodiments of the present application may be employed; it should be understood that the present application is not limited to the described embodiments and, on the contrary, includes all modifications, variations and equivalents falling within the scope of the appended claims.
In the embodiments of the present application, the terms "first," "second," and the like are used to distinguish between different elements from each other by reference, but do not denote a spatial arrangement or a temporal order of the elements, and the elements should not be limited by the terms. The term "and/or" includes any and all combinations of one or more of the associated listed terms. The terms "comprises," "comprising," "including," "having," and the like, are intended to reference the presence of stated features, elements, components, or groups of components, but do not preclude the presence or addition of one or more other features, elements, components, or groups of components.
In the embodiments of the present application, the singular forms "a", "an", and "the" include plural referents and should be construed broadly as meaning "one kind of" rather than being limited to "one"; furthermore, the term "comprising" should be understood to cover both the singular and the plural, unless the context clearly dictates otherwise. Furthermore, the term "according to" should be understood as "at least partially according to", and the term "based on" should be understood as "based at least partially on", unless the context clearly indicates otherwise.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments in combination with or instead of the features of the other embodiments. The term "comprises/comprising" when used herein refers to the presence of a feature, integer, step or component, but does not exclude the presence or addition of one or more other features, integers, steps or components.
FIG. 1 is a schematic diagram of a framework for behavior recognition according to an embodiment of the present application. As shown in FIG. 1, for an input image, detection may be performed using an object detection module 101 to obtain an object detection frame; pose estimation may then be performed using a pose estimation module 102 to obtain key points for one or more objects. Feature extraction and the like may be performed using a feature calculation module 103, and the behavior of the object is then recognized by a lightweight classifier 104. For the pose estimation, a neural network model may be used, for example a cascaded pyramid network (CPN, Cascaded Pyramid Network) model.
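To make the data flow of FIG. 1 concrete, the following is a minimal sketch of how the four modules might be chained, assuming PyTorch-style callables; the names (recognize_behaviors, detector, pose_net, feature_fn, classifier) are illustrative assumptions for exposition, not the patent's actual implementation.

```python
# Hypothetical sketch of the FIG. 1 pipeline; names and signatures are
# assumptions for exposition only.
def recognize_behaviors(image, detector, pose_net, feature_fn, classifier):
    """image: torch tensor of shape (3, H, W); returns one behavior per object."""
    boxes = detector(image)                         # object detection module 101
    behaviors = []
    for (x0, y0, x1, y1) in boxes:                  # one pose estimate per detection frame
        crop = image[:, y0:y1, x0:x1].unsqueeze(0)  # crop the detection frame
        keypoints = pose_net(crop)                  # pose estimation module 102
        feats = feature_fn(keypoints)               # feature calculation module 103
        behaviors.append(classifier(feats))         # lightweight classifier 104
    return behaviors
```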
FIG. 2 is a schematic diagram of the framework of a CPN, which can include a GlobalNet and a RefineNet, wherein the backbone network of the GlobalNet is a ResNet. The GlobalNet is responsible for detecting key points and has a good key-point prediction effect for parts that are easy to detect (such as eyes, arms and the like); the loss function adopted is an L2 loss. A 1x1 convolution operation may be applied to the feature map before each element-wise sum (elem-sum) operation. The RefineNet corrects the results predicted by the GlobalNet; key points of body parts that are occluded, invisible, or set against a complex background are more error-prone for the GlobalNet, and the RefineNet can correct these key points. For details of the CPN model and the like, reference may be made to the related art.
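As one way to picture the GlobalNet fusion step described above, the sketch below applies a 1x1 convolution to a lateral feature map before the element-wise sum with the up-sampled coarser level; this is a generic FPN-style reading of the figure, and the channel counts are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class LateralSum(nn.Module):
    """Sketch of GlobalNet's per-level fusion: a 1x1 convolution on the
    lateral feature map, then an element-wise sum (elem-sum) with the
    up-sampled coarser feature map. Channel counts are illustrative."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, feature_map, coarser):
        x = self.lateral(feature_map)  # 1x1 conv before the elem-sum
        up = F.interpolate(coarser, size=x.shape[-2:], mode="nearest")
        return x + up                  # elem-sum
```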
However, the CPN is a relatively heavyweight network structure; if the number of detected objects increases, the time required for pose estimation increases significantly and real-time recognition cannot be achieved, making it difficult to apply the CPN in settings with high real-time requirements, such as embedded devices.
In the embodiments of the present application, the object to be detected may be a human body of any age, for example an elderly person, a child, an elderly person and/or a caregiver, or a child and/or a guardian. The present application is not limited thereto; the object to be detected may also be a human body with vital signs, a robot without vital signs, or the like.
Example of the first aspect
The embodiment of the application provides a behavior recognition method. FIG. 3 is a schematic diagram of a behavior recognition method according to an embodiment of the present application. As shown in FIG. 3, the method includes:
301, detecting an object in an image to obtain an object detection frame;
302, performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
303, identifying the behavior of the object based on the plurality of keypoints.
It should be noted that FIG. 3 above is only an illustration of the embodiment of the present application, and the present application is not limited thereto. For example, the order of execution of the operations may be adjusted appropriately, and other operations may be added or some operations omitted. Those skilled in the art can make appropriate modifications in light of the above, without being limited to the description of FIG. 3.
In some embodiments, the image containing the object to be detected may be one or more frames of a video, i.e., the image may be a dynamic image; however, the present application is not limited thereto, and the embodiments of the present application are equally applicable to one or more static images.
In some embodiments, the lightweight network is generated as follows: a MobileNet is used instead of the backbone network of the GlobalNet in a cascaded pyramid network, and an up-sampling module is used instead of the RefineNet and pyramid structures in the cascaded pyramid network.
For example, using MobileNetV3 instead of a ResNet-50 backbone requires fewer parameters and reduces the memory footprint required during inference, thereby speeding up pose estimation. In addition, the RefineNet and pyramid structures of the CPN are removed, and the output of MobileNetV3 is processed directly by the up-sampling module, so that the speed of pose estimation can be further increased.
In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations. The up-sampling module can thus be used directly, which simplifies the network structure, realizes a lightweight network, and further increases the speed of pose estimation.
In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation. Using the bottom-most result allows a lightweight network structure, further increasing the speed of pose estimation while also improving its accuracy.
FIG. 4 is a schematic diagram of a lightweight network according to an embodiment of the present application. As shown in FIG. 4, for an input image of 3xHxW, a convolution operation "Conv2d,16x3x3,2" (shown as 401) and a batch normalization (BN) operation "BN,Hswish" (shown as 402) may be performed.

As shown in FIG. 4, down-sampling may be performed using MobileNetV3 (shown as 403 to 406), with operations such as "Block,3x16,1,ReLU,None", "Block,3x16x64x24,2,ReLU,None", "Block,3x24x72x24,1,ReLU,None" (shown as 403); "Block,5x24x72x40,2,ReLU,SE", "Block,5x40x120x40,1,ReLU,SE", "Block,5x40x120x40,1,ReLU,SE" (shown as 404); "Block,3x40x240x80,2,Hswish,None", "Block,3x80x200x80,1,Hswish,None", "Block,3x80x184x80,1,Hswish,None" (shown as 405); and "Block,3x80x480x112,1,Hswish,SE", "Block,3x112x672x160,1,Hswish,SE", "Block,5x160x672x160,2,Hswish,SE", "Block,5x160x960x160,1,Hswish,SE" (shown as 406). Rectified linear units (ReLU), Hswish, SE (squeeze-and-excitation) and the like may be used; for the specific meaning of these parameters, reference may be made to the related art.
FIG. 5 is a schematic diagram of a Block in the lightweight network of FIG. 4. As shown in FIG. 5, operations such as "Conv,72x1x1,1" (shown as 501), "BN,NoLinear" (shown as 502), "Conv,72x5x5,2" (shown as 503), "BN,NoLinear" (shown as 504), "Conv,40x1x1,1" (shown as 505), and "BN" (shown as 506) may be performed; in addition, "Conv,40x1x1,1" (shown as 507) and "BN" (shown as 508) may also be performed. As shown in FIG. 5, the Block may further include an SeModule 509.
FIG. 6 is a schematic diagram of the SeModule 509 in the Block of FIG. 5. As shown in FIG. 6, operations such as "AdaptiveAvgPool" (shown as 601), "Conv,10x1x1,1" (shown as 602), "BN,ReLU" (shown as 603), "Conv,40x1x1,1" (shown as 604), and "BN,Hsigmoid" (shown as 605) may be performed.
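Read together, FIGS. 5 and 6 correspond to a standard MobileNetV3 bottleneck. The PyTorch sketch below is one plausible reading with the example sizes from the figures (1x1 expansion to 72 channels, 5x5 depth-wise convolution with stride 2, 1x1 projection to 40 channels, SE reduction 40 -> 10 -> 40); the SE placement and the 1x1-projected shortcut are assumptions based on common MobileNetV3 implementations, not a definitive reconstruction of the patent's network.

```python
import torch.nn as nn

class SeModule(nn.Module):
    """Sketch of FIG. 6: AdaptiveAvgPool (601) -> 1x1 conv + BN + ReLU
    reducing 40 -> 10 channels (602-603) -> 1x1 conv + BN + Hsigmoid
    restoring 10 -> 40 (604-605), then channel-wise rescaling."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.se(x)

class Block(nn.Module):
    """Sketch of FIG. 5 with the example "Block,5x24x72x40,2,ReLU,SE":
    expansion, depth-wise convolution, projection, optional SE, and a
    strided 1x1 shortcut so the residual shapes match (a simplification)."""
    def __init__(self, kernel=5, in_ch=24, expand_ch=72, out_ch=40,
                 stride=2, act=nn.ReLU, use_se=True):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),            # 501
            nn.BatchNorm2d(expand_ch), act(inplace=True),          # 502
            nn.Conv2d(expand_ch, expand_ch, kernel, stride,
                      kernel // 2, groups=expand_ch, bias=False),  # 503 (depth-wise)
            nn.BatchNorm2d(expand_ch), act(inplace=True),          # 504
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),           # 505
            nn.BatchNorm2d(out_ch),                                # 506
        )
        self.se = SeModule(out_ch) if use_se else nn.Identity()    # 509
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),       # 507
            nn.BatchNorm2d(out_ch),                                # 508
        )

    def forward(self, x):
        return self.se(self.body(x)) + self.shortcut(x)
```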
FIGS. 4 to 6 above schematically illustrate a GlobalNet that uses MobileNetV3 instead of a ResNet; the up-sampling module of the embodiment of the present application is described below.
As shown in FIG. 4, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling (shown as 406). Operations such as "ConvTranspose2d,80x4x4,2" (shown as 407), "BN,ReLU" (shown as 408), "ConvTranspose2d,40x4x4,2" (shown as 409), "BN,ReLU" (shown as 410), "ConvTranspose2d,24x4x4,2" (shown as 411), "BN,ReLU" (shown as 412), and "Conv2d,17x1x1,1" (shown as 413) may be performed. As shown in FIG. 4, a heatmap of 17xHxW may be output.
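The up-sampling branch 407-413 can be sketched as three stride-2 transposed convolutions followed by a 1x1 output convolution. In the sketch below, the input channel count (160, matching the last block shown at 406) and padding=1 (which makes each 4x4, stride-2 transposed convolution exactly double the resolution) are assumptions; the 17 output channels correspond to the 17xHxW heatmap of the figure.

```python
import torch.nn as nn

class UpsamplingHead(nn.Module):
    """Sketch of the FIG. 4 up-sampling module (407-413): three 4x4,
    stride-2 transposed convolutions with BN + ReLU, then a 1x1
    convolution producing one heatmap per key point."""
    def __init__(self, in_channels=160, num_keypoints=17):
        super().__init__()
        def up(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.head = nn.Sequential(
            up(in_channels, 80),                          # 407-408
            up(80, 40),                                   # 409-410
            up(40, 24),                                   # 411-412
            nn.Conv2d(24, num_keypoints, kernel_size=1),  # 413
        )

    def forward(self, features):
        # features: (N, 160, h, w) from the bottom-most down-sampling result
        return self.head(features)                        # (N, 17, 8h, 8w) heatmaps
```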
The lightweight network of the present application is illustrated by way of example in FIGS. 4 to 6 above, but the present application is not limited thereto.
FIG. 7 is an exemplary diagram of behavior recognition according to an embodiment of the present application; for simplicity, reference numerals are labeled for only one object (human body). As shown in FIG. 7, through the behavior recognition of the embodiment of the present application, an object detection frame 701 may be generated for each of a plurality of objects, and a plurality of connected key points 702 may be obtained, so that the behavior of each object may be recognized. The behavior of the objects is thus recognized not only accurately but also quickly enough to satisfy real-time requirements.
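The patent does not spell out how the 17xHxW heatmaps become the connected key points 702; a common decoding, sketched below as an assumption, takes the arg-max of each heatmap as that key point's location and its peak value as a confidence score.

```python
import torch

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: tensor of shape (17, H, W) for one detected object.
    Returns a list of (x, y, score) triples, one per key point."""
    k, h, w = heatmaps.shape
    flat = heatmaps.view(k, -1)
    scores, idx = flat.max(dim=1)                  # peak per heatmap
    ys = torch.div(idx, w, rounding_mode="floor")  # row of each peak
    xs = idx % w                                   # column of each peak
    return [(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)]
```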
Table 1 shows a comparison of the CPN (denoted CPN-ResNet50) and the lightweight network of the embodiments of the present application (denoted MobileNet-Transpose). As shown in Table 1, the lightweight network of the embodiments of the present application significantly reduces both the memory occupancy and the number of parameters.
TABLE 1
| Model | GPU memory (M) | Weight parameters (M) |
|---|---|---|
| CPN-ResNet50 | 2275 | 108.9 |
| MobileNet-Transpose | 839 | 6.4 |
Table 2 shows a comparison of an existing pose estimation model (denoted CPN-EfficientNet), a model in which MobileNetV3 replaces EfficientNet as the backbone network (denoted CPN-MobileNetV3), and the lightweight network of an embodiment of the present application (denoted MobileNet-Transpose).
TABLE 2
| Model | Weight parameters (M) | FPS | AP(0.5:0.95) | AR(0.5:0.95) |
|---|---|---|---|---|
| CPN-EfficientNet | 7.5 | 17.16 | 0.591 | 0.631 |
| CPN-MobileNetV3 | 5.5 | 20.91 | 0.556 | 0.602 |
| MobileNet-Transpose | 6.4 | 21.54 | 0.607 | 0.646 |
As shown in Table 2, the accuracy of CPN-MobileNetV3 decreases relative to CPN-EfficientNet even though its speed increases. The lightweight network MobileNet-Transpose of the embodiments of the present application uses the up-sampling module directly, so that the accuracy of behavior recognition is maintained while the system performance is improved and the speed of behavior recognition is increased.
The above only describes each step or process related to the present application, but the present application is not limited thereto. The behavior recognition method may also comprise other steps or processes, for the details of which reference may be made to the prior art. In addition, the embodiments of the present application have been described above by taking only some structures of the behavior recognition model as examples, but the present application is not limited to these structures, and these structures may be modified appropriately, and implementation manners of these modifications should be included in the scope of the embodiments of the present application.
The above embodiments are merely illustrative of the embodiments of the present application, but the present application is not limited thereto, and appropriate modifications may be made on the basis of the above embodiments. For example, each of the above embodiments may be used alone, or one or more of the above embodiments may be combined.
As can be seen from the above embodiments, a plurality of key points of an object are obtained by performing pose estimation with a lightweight network based on an object detection frame, where the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Embodiments of the second aspect
The embodiments of the present application provide a behavior recognition device; contents that are the same as those of the embodiments of the first aspect are not repeated.
Fig. 8 is a schematic diagram of a behavior recognition device according to an embodiment of the present application, and as shown in fig. 8, the behavior recognition device 800 includes:
a detection unit 801 that detects an object in an image to obtain an object detection frame;
an estimation unit 802 that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit 803 that identifies the behavior of the object based on the plurality of key points.
In some embodiments, the lightweight network is generated by replacing the backbone network of the GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.

In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.

In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
It should be noted that only the respective components or modules related to the present application are described above, but the present application is not limited thereto. The behavior recognition apparatus 800 may further include other components or modules, and regarding the specific contents of these components or modules, reference may be made to the related art.
For simplicity, FIG. 8 shows only an example of the connection relationships or signal flows between the various components or modules, but it should be apparent to those skilled in the art that various related techniques, such as bus connections, may be employed. The above components or modules may be implemented by hardware facilities such as a processor and a memory; the embodiments of the present application are not limited in this respect.
The above embodiments are merely illustrative of the embodiments of the present application, but the present application is not limited thereto, and appropriate modifications may be made on the basis of the above embodiments. For example, each of the above embodiments may be used alone, or one or more of the above embodiments may be combined.
As can be seen from the above embodiments, a plurality of key points of an object are obtained by using a lightweight network and performing pose estimation based on an object detection frame; the backbone network of the lightweight network is a MobileNet network structure, and the lightweight network further comprises an up-sampling module connected with the MobileNet network structure. Thus, the speed of gesture estimation can be increased, and not only the accuracy of the behavior recognition result can be improved, but also the behavior recognition can be performed in real time.
Embodiments of the third aspect
An embodiment of the present application provides an electronic device, including a behavior recognition apparatus 800 according to an embodiment of the second aspect, and the content of which is incorporated herein. The electronic device may be, for example, a computer, server, workstation, laptop, smart phone, etc.; embodiments of the present application are not so limited.
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 9, the electronic device 900 may include a processor (e.g., a central processing unit, CPU) 910 and a memory 920 coupled to the processor 910. The memory 920 may store various data and further stores a program 921 for information processing, which is executed under the control of the processor 910.
In some embodiments, the functionality of behavior recognition device 800 is integrated into processor 910 for implementation. Wherein the processor 910 is configured to implement the behavior recognition method as described in the embodiments of the first aspect.
In some embodiments, the behavior recognition apparatus 800 is configured separately from the processor 910, for example, the behavior recognition apparatus 800 may be configured as a chip connected to the processor 910, and the functions of the behavior recognition apparatus 800 are implemented by the control of the processor 910.
For example, the processor 910 is configured to control: detecting an object in an image to obtain an object detection frame; performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and identifying a behavior of the object based on the plurality of key points.
In some embodiments, the lightweight network is generated by replacing the backbone network of the GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.

In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.

In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
In addition, as shown in fig. 9, the electronic device 900 may further include: input output (I/O) devices 930 and a display 940; wherein, the functions of the above components are similar to the prior art, and are not repeated here. It is noted that the electronic device 900 need not include all of the components shown in fig. 9; in addition, the electronic device 900 may further include components not shown in fig. 9, and reference may be made to the related art.
Embodiments of the present application also provide a computer readable program, wherein the program when executed in an electronic device causes the computer to perform the behavior recognition method as described in the embodiments of the first aspect in the electronic device.
Embodiments of the present application also provide a storage medium storing a computer-readable program, where the computer-readable program causes a computer to execute the behavior recognition method according to the embodiment of the first aspect in an electronic device.
The apparatus and method of the present application may be implemented by hardware, or may be implemented by hardware in combination with software. The present application relates to a computer readable program which, when executed by a logic means, enables the logic means to carry out the apparatus or constituent means described above, or enables the logic means to carry out the various methods or steps described above. The present application also relates to a storage medium such as a hard disk, a magnetic disk, an optical disk, a DVD, a flash memory, or the like for storing the above program.
The methods/apparatus described in connection with the embodiments of the present application may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in the figures and/or one or more combinations of the functional blocks may correspond to individual software modules or individual hardware modules of the computer program flow. These software modules may correspond to the individual steps shown in the figures, respectively. These hardware modules may be implemented, for example, by solidifying the software modules using a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software modules may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a MEGA-SIM card of a relatively large capacity or a flash memory device of a large capacity, the software module may be stored in the MEGA-SIM card or the flash memory device of a large capacity.
One or more of the functional blocks described in the figures and/or one or more combinations of functional blocks may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described herein. They may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in connection with specific embodiments, but it should be apparent to those skilled in the art that these descriptions are intended to be illustrative and not limiting. Various modifications and adaptations of the disclosure may occur to those skilled in the art and are within the scope of the disclosure.
Claims (9)
1. A behavior recognition apparatus, the apparatus comprising:
a detection unit that detects an object in an image to obtain an object detection frame;
an estimation unit that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit that identifies a behavior of the object based on the plurality of key points.
2. The apparatus of claim 1, wherein the lightweight network is generated by replacing the backbone network of a GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.
3. The apparatus of claim 2, wherein the lightweight network performs multiple down-sampling using the GlobalNet with the MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.
4. The apparatus of claim 3, wherein the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
5. A method of behavior recognition, the method comprising:
detecting an object in the image to obtain an object detection frame;
performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
identifying a behavior of the object based on the plurality of key points.
6. The method of claim 5, wherein the lightweight network is generated by replacing the backbone network of a GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.
7. The method of claim 6, wherein the lightweight network performs multiple down-sampling using the GlobalNet with the MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.
8. The method of claim 7, wherein the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
9. An electronic device comprising a memory storing a computer program and a processor configured to execute the computer program to implement the behavior recognition method of any one of claims 5 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111443162.9A CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111443162.9A CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206356A true CN116206356A (en) | 2023-06-02 |
Family
ID=86506386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111443162.9A Pending CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206356A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |