CN116206356A - Behavior recognition device and method and electronic equipment - Google Patents
- Publication number
- CN116206356A (application number CN202111443162.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- mobilenet
- lightweight
- behavior recognition
- present application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the present application provide a behavior recognition device and method and electronic equipment. The method comprises the following steps: detecting an object in an image to obtain an object detection frame; performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and identifying a behavior of the object based on the plurality of key points. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Description
Technical Field
The embodiment of the application relates to the technical field of image detection.
Background
Recent advances in artificial intelligence and deep learning have enabled image-based behavior recognition techniques. Behavior recognition techniques can recognize complex behaviors consisting of multiple actions or movements. An object detection module can detect an object frame, and a pose estimation module can then detect a plurality of key points, on the basis of which the behavior of the object is identified.
It should be noted that the foregoing description of the background art is provided only to facilitate a clear and complete description of the technical solutions of the present application and to aid understanding by those skilled in the art; it should not be taken to mean that these technical solutions are known to those skilled in the art merely because they are set forth in the background section of the present application.
Disclosure of Invention
However, the inventors have found that the pose estimation module is a relatively time-consuming part: if the number of detected objects increases, the time required for pose estimation increases greatly and real-time recognition cannot be achieved, making it difficult to apply in settings with high real-time requirements, such as embedded devices.
In view of at least one of the above technical problems, the embodiments of the present application provide a behavior recognition device and method and electronic equipment, which are expected to increase the speed of behavior recognition while ensuring the accuracy of the behavior recognition result.
According to an aspect of the embodiments of the present application, there is provided a behavior recognition apparatus, including:
a detection unit that detects an object in an image to obtain an object detection frame;
an estimation unit that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit that identifies a behavior of the object based on the plurality of key points.
According to another aspect of the embodiments of the present application, there is provided a behavior recognition method, including:
detecting an object in the image to obtain an object detection frame;
performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
identifying a behavior of the object based on the plurality of key points.
According to another aspect of embodiments of the present application, there is provided an electronic device comprising a memory storing a computer program and a processor configured to execute the computer program to implement the behavior recognition method as described above.
One of the beneficial effects of the embodiments of the present application is that a plurality of key points of an object are obtained by performing pose estimation with a lightweight network based on an object detection frame, where the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Specific implementations of the embodiments of the present application are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the embodiments of the present application may be employed. It should be understood that the embodiments of the present application are not limited in scope thereby. The embodiments of the present application include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. It is obvious that the drawings in the following description are only examples of the present application, and that a person of ordinary skill in the art may obtain other embodiments from these drawings without inventive work. In the drawings:
FIG. 1 is a schematic diagram of a framework for behavior recognition according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the framework of a CPN;
FIG. 3 is a schematic diagram of a behavior recognition method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a lightweight network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a Block in the lightweight network of FIG. 4;
FIG. 6 is a schematic diagram of the SeModule in the Block of FIG. 5;
FIG. 7 is an exemplary diagram of behavior recognition according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a behavior recognition device of an embodiment of the present application;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The foregoing and other features of the embodiments of the present application will become apparent from the following description taken in conjunction with the accompanying drawings. The specification and drawings specifically disclose particular embodiments of the present application, indicating some of the ways in which the principles of the embodiments of the present application may be employed; it should be understood that the present application is not limited to the described embodiments and, on the contrary, includes all modifications, variations and equivalents falling within the scope of the appended claims.
In the embodiments of the present application, the terms "first," "second," and the like are used to distinguish between different elements from each other by reference, but do not denote a spatial arrangement or a temporal order of the elements, and the elements should not be limited by the terms. The term "and/or" includes any and all combinations of one or more of the associated listed terms. The terms "comprises," "comprising," "including," "having," and the like, are intended to reference the presence of stated features, elements, components, or groups of components, but do not preclude the presence or addition of one or more other features, elements, components, or groups of components.
In the embodiments of the present application, the singular forms "a", "an", and "the" include plural referents and should be construed broadly as meaning "one kind of" rather than being limited to "one"; furthermore, the term "comprising" should be understood to cover both the singular and the plural, unless the context clearly dictates otherwise. Furthermore, the term "according to" should be understood as "at least partially according to", and the term "based on" should be understood as "based at least partially on", unless the context clearly indicates otherwise.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments in combination with or instead of the features of the other embodiments. The term "comprises/comprising" when used herein refers to the presence of a feature, integer, step or component, but does not exclude the presence or addition of one or more other features, integers, steps or components.
FIG. 1 is a schematic diagram of a framework for behavior recognition according to an embodiment of the present application. As shown in FIG. 1, for an input image, detection may be performed using an object detection module 101 to obtain an object detection frame; pose estimation may then be performed using a pose estimation module 102 to obtain key points for one or more objects. Feature extraction and the like may be performed using a feature calculation module 103, and the behavior of the object is then recognized by a lightweight classifier 104. For the pose estimation, a neural network model may be used, for example a cascaded pyramid network (CPN, Cascaded Pyramid Network) model.
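To make the data flow of FIG. 1 concrete, the following is a minimal sketch of how the four modules might be chained, assuming PyTorch-style callables; the names (recognize_behaviors, detector, pose_net, feature_fn, classifier) are illustrative assumptions for exposition, not the patent's actual implementation.

```python
# Hypothetical sketch of the FIG. 1 pipeline; names and signatures are
# assumptions for exposition only.
def recognize_behaviors(image, detector, pose_net, feature_fn, classifier):
    """image: torch tensor of shape (3, H, W); returns one behavior per object."""
    boxes = detector(image)                         # object detection module 101
    behaviors = []
    for (x0, y0, x1, y1) in boxes:                  # one pose estimate per detection frame
        crop = image[:, y0:y1, x0:x1].unsqueeze(0)  # crop the detection frame
        keypoints = pose_net(crop)                  # pose estimation module 102
        feats = feature_fn(keypoints)               # feature calculation module 103
        behaviors.append(classifier(feats))         # lightweight classifier 104
    return behaviors
```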
FIG. 2 is a schematic diagram of the framework of a CPN, which can include a GlobalNet and a RefineNet, wherein the backbone network of the GlobalNet is a ResNet. The GlobalNet is responsible for detecting key points and has a good key-point prediction effect for parts that are easy to detect (such as eyes, arms and the like); the loss function adopted is an L2 loss. A 1x1 convolution operation may be applied to the feature map before each element-wise sum (elem-sum) operation. The RefineNet corrects the results predicted by the GlobalNet; key points of body parts that are occluded, invisible, or set against a complex background are more error-prone for the GlobalNet, and the RefineNet can correct these key points. For details of the CPN model and the like, reference may be made to the related art.
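As one way to picture the GlobalNet fusion step described above, the sketch below applies a 1x1 convolution to a lateral feature map before the element-wise sum with the up-sampled coarser level; this is a generic FPN-style reading of the figure, and the channel counts are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class LateralSum(nn.Module):
    """Sketch of GlobalNet's per-level fusion: a 1x1 convolution on the
    lateral feature map, then an element-wise sum (elem-sum) with the
    up-sampled coarser feature map. Channel counts are illustrative."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, feature_map, coarser):
        x = self.lateral(feature_map)  # 1x1 conv before the elem-sum
        up = F.interpolate(coarser, size=x.shape[-2:], mode="nearest")
        return x + up                  # elem-sum
```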
However, the CPN is a relatively heavyweight network structure; if the number of detected objects increases, the time required for pose estimation increases significantly and real-time recognition cannot be achieved, making it difficult to apply the CPN in settings with high real-time requirements, such as embedded devices.
In the embodiments of the present application, the object to be detected may be a human body of any age, for example an elderly person, a child, an elderly person and/or a caregiver, or a child and/or a guardian. The present application is not limited thereto; the object to be detected may also be a human body with vital signs, a robot without vital signs, or the like.
Example of the first aspect
The embodiment of the application provides a behavior recognition method. FIG. 3 is a schematic diagram of a behavior recognition method according to an embodiment of the present application. As shown in FIG. 3, the method includes:
301, detecting an object in an image to obtain an object detection frame;
302, performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
303, identifying the behavior of the object based on the plurality of keypoints.
It should be noted that FIG. 3 above is only an illustration of the embodiment of the present application, and the present application is not limited thereto. For example, the order of execution of the operations may be adjusted appropriately, and other operations may be added or some operations omitted. Those skilled in the art can make appropriate modifications in light of the above, without being limited to the description of FIG. 3.
In some embodiments, the image containing the object to be detected may be one or more frames of a video, i.e., the image may be a dynamic image; however, the present application is not limited thereto, and the embodiments of the present application are equally applicable to one or more static images.
In some embodiments, the lightweight network is generated as follows: a MobileNet is used instead of the backbone network of the GlobalNet in a cascaded pyramid network, and an up-sampling module is used instead of the RefineNet and pyramid structures in the cascaded pyramid network.
For example, using MobileNetV3 instead of a ResNet-50 backbone requires fewer parameters and reduces the memory footprint required during inference, thereby speeding up pose estimation. In addition, the RefineNet and pyramid structures of the CPN are removed, and the output of MobileNetV3 is processed directly by the up-sampling module, so that the speed of pose estimation can be further increased.
In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations. The up-sampling module can thus be used directly, which simplifies the network structure, realizes a lightweight network, and further increases the speed of pose estimation.
In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation. Using the bottom-most result allows a lightweight network structure, further increasing the speed of pose estimation while also improving its accuracy.
FIG. 4 is a schematic diagram of a lightweight network according to an embodiment of the present application. As shown in FIG. 4, for an input image of 3xHxW, a convolution operation "Conv2d,16x3x3,2" (shown as 401) and a batch normalization (BN) operation "BN,Hswish" (shown as 402) may be performed.

As shown in FIG. 4, down-sampling may be performed using MobileNetV3 (shown as 403 to 406), with operations such as "Block,3x16,1,ReLU,None", "Block,3x16x64x24,2,ReLU,None", "Block,3x24x72x24,1,ReLU,None" (shown as 403); "Block,5x24x72x40,2,ReLU,SE", "Block,5x40x120x40,1,ReLU,SE", "Block,5x40x120x40,1,ReLU,SE" (shown as 404); "Block,3x40x240x80,2,Hswish,None", "Block,3x80x200x80,1,Hswish,None", "Block,3x80x184x80,1,Hswish,None" (shown as 405); and "Block,3x80x480x112,1,Hswish,SE", "Block,3x112x672x160,1,Hswish,SE", "Block,5x160x672x160,2,Hswish,SE", "Block,5x160x960x160,1,Hswish,SE" (shown as 406). Rectified linear units (ReLU), Hswish, SE (squeeze-and-excitation) and the like may be used; for the specific meaning of these parameters, reference may be made to the related art.
FIG. 5 is a schematic diagram of a Block in the lightweight network of FIG. 4. As shown in FIG. 5, operations such as "Conv,72x1x1,1" (shown as 501), "BN,NoLinear" (shown as 502), "Conv,72x5x5,2" (shown as 503), "BN,NoLinear" (shown as 504), "Conv,40x1x1,1" (shown as 505), and "BN" (shown as 506) may be performed; in addition, "Conv,40x1x1,1" (shown as 507) and "BN" (shown as 508) may also be performed. As shown in FIG. 5, the Block may further include an SeModule 509.
FIG. 6 is a schematic diagram of the SeModule 509 in the Block of FIG. 5. As shown in FIG. 6, operations such as "AdaptiveAvgPool" (shown as 601), "Conv,10x1x1,1" (shown as 602), "BN,ReLU" (shown as 603), "Conv,40x1x1,1" (shown as 604), and "BN,Hsigmoid" (shown as 605) may be performed.
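Read together, FIGS. 5 and 6 correspond to a standard MobileNetV3 bottleneck. The PyTorch sketch below is one plausible reading with the example sizes from the figures (1x1 expansion to 72 channels, 5x5 depth-wise convolution with stride 2, 1x1 projection to 40 channels, SE reduction 40 -> 10 -> 40); the SE placement and the 1x1-projected shortcut are assumptions based on common MobileNetV3 implementations, not a definitive reconstruction of the patent's network.

```python
import torch.nn as nn

class SeModule(nn.Module):
    """Sketch of FIG. 6: AdaptiveAvgPool (601) -> 1x1 conv + BN + ReLU
    reducing 40 -> 10 channels (602-603) -> 1x1 conv + BN + Hsigmoid
    restoring 10 -> 40 (604-605), then channel-wise rescaling."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.se(x)

class Block(nn.Module):
    """Sketch of FIG. 5 with the example "Block,5x24x72x40,2,ReLU,SE":
    expansion, depth-wise convolution, projection, optional SE, and a
    strided 1x1 shortcut so the residual shapes match (a simplification)."""
    def __init__(self, kernel=5, in_ch=24, expand_ch=72, out_ch=40,
                 stride=2, act=nn.ReLU, use_se=True):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),            # 501
            nn.BatchNorm2d(expand_ch), act(inplace=True),          # 502
            nn.Conv2d(expand_ch, expand_ch, kernel, stride,
                      kernel // 2, groups=expand_ch, bias=False),  # 503 (depth-wise)
            nn.BatchNorm2d(expand_ch), act(inplace=True),          # 504
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),           # 505
            nn.BatchNorm2d(out_ch),                                # 506
        )
        self.se = SeModule(out_ch) if use_se else nn.Identity()    # 509
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),       # 507
            nn.BatchNorm2d(out_ch),                                # 508
        )

    def forward(self, x):
        return self.se(self.body(x)) + self.shortcut(x)
```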
FIGS. 4 to 6 above schematically illustrate a GlobalNet that uses MobileNetV3 instead of a ResNet; the up-sampling module of the embodiment of the present application is described below.
As shown in FIG. 4, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling (shown as 406). Operations such as "ConvTranspose2d,80x4x4,2" (shown as 407), "BN,ReLU" (shown as 408), "ConvTranspose2d,40x4x4,2" (shown as 409), "BN,ReLU" (shown as 410), "ConvTranspose2d,24x4x4,2" (shown as 411), "BN,ReLU" (shown as 412), and "Conv2d,17x1x1,1" (shown as 413) may be performed. As shown in FIG. 4, a heatmap of 17xHxW may be output.
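The up-sampling branch 407-413 can be sketched as three stride-2 transposed convolutions followed by a 1x1 output convolution. In the sketch below, the input channel count (160, matching the last block shown at 406) and padding=1 (which makes each 4x4, stride-2 transposed convolution exactly double the resolution) are assumptions; the 17 output channels correspond to the 17xHxW heatmap of the figure.

```python
import torch.nn as nn

class UpsamplingHead(nn.Module):
    """Sketch of the FIG. 4 up-sampling module (407-413): three 4x4,
    stride-2 transposed convolutions with BN + ReLU, then a 1x1
    convolution producing one heatmap per key point."""
    def __init__(self, in_channels=160, num_keypoints=17):
        super().__init__()
        def up(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.head = nn.Sequential(
            up(in_channels, 80),                          # 407-408
            up(80, 40),                                   # 409-410
            up(40, 24),                                   # 411-412
            nn.Conv2d(24, num_keypoints, kernel_size=1),  # 413
        )

    def forward(self, features):
        # features: (N, 160, h, w) from the bottom-most down-sampling result
        return self.head(features)                        # (N, 17, 8h, 8w) heatmaps
```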
The lightweight network of the present application is illustrated by way of example in FIGS. 4 to 6 above, but the present application is not limited thereto.
FIG. 7 is an exemplary diagram of behavior recognition according to an embodiment of the present application; for simplicity, reference numerals are labeled for only one object (human body). As shown in FIG. 7, through the behavior recognition of the embodiment of the present application, an object detection frame 701 may be generated for each of a plurality of objects, and a plurality of connected key points 702 may be obtained, so that the behavior of each object may be recognized. The behavior of the objects is thus recognized not only accurately but also quickly enough to satisfy real-time requirements.
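The patent does not spell out how the 17xHxW heatmaps become the connected key points 702; a common decoding, sketched below as an assumption, takes the arg-max of each heatmap as that key point's location and its peak value as a confidence score.

```python
import torch

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: tensor of shape (17, H, W) for one detected object.
    Returns a list of (x, y, score) triples, one per key point."""
    k, h, w = heatmaps.shape
    flat = heatmaps.view(k, -1)
    scores, idx = flat.max(dim=1)                  # peak per heatmap
    ys = torch.div(idx, w, rounding_mode="floor")  # row of each peak
    xs = idx % w                                   # column of each peak
    return [(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)]
```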
Table 1 shows a comparison of the CPN (denoted CPN-ResNet50) and the lightweight network of the embodiments of the present application (denoted MobileNet-Transpose). As shown in Table 1, the lightweight network of the embodiments of the present application significantly reduces both the memory occupancy and the number of parameters.
TABLE 1
| Model | GPU memory (M) | Weight parameters (M) |
|---|---|---|
| CPN-ResNet50 | 2275 | 108.9 |
| MobileNet-Transpose | 839 | 6.4 |
Table 2 shows a comparison of an existing pose estimation model (denoted CPN-EfficientNet), a model in which MobileNetV3 replaces EfficientNet as the backbone network (denoted CPN-MobileNetV3), and the lightweight network of an embodiment of the present application (denoted MobileNet-Transpose).
TABLE 2
| Model | Weight parameters (M) | FPS | AP(0.5:0.95) | AR(0.5:0.95) |
|---|---|---|---|---|
| CPN-EfficientNet | 7.5 | 17.16 | 0.591 | 0.631 |
| CPN-MobileNetV3 | 5.5 | 20.91 | 0.556 | 0.602 |
| MobileNet-Transpose | 6.4 | 21.54 | 0.607 | 0.646 |
As shown in Table 2, the accuracy of CPN-MobileNetV3 decreases relative to CPN-EfficientNet even though its speed increases. The lightweight network MobileNet-Transpose of the embodiments of the present application uses the up-sampling module directly, so that the accuracy of behavior recognition is maintained while the system performance is improved and the speed of behavior recognition is increased.
The above only describes each step or process related to the present application, but the present application is not limited thereto. The behavior recognition method may also comprise other steps or processes, for the details of which reference may be made to the prior art. In addition, the embodiments of the present application have been described above by taking only some structures of the behavior recognition model as examples, but the present application is not limited to these structures, and these structures may be modified appropriately, and implementation manners of these modifications should be included in the scope of the embodiments of the present application.
The above embodiments are merely illustrative of the embodiments of the present application, but the present application is not limited thereto, and appropriate modifications may be made on the basis of the above embodiments. For example, each of the above embodiments may be used alone, or one or more of the above embodiments may be combined.
As can be seen from the above embodiments, a plurality of key points of an object are obtained by performing pose estimation with a lightweight network based on an object detection frame, where the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure. In this way, the speed of pose estimation can be increased, so that the accuracy of the behavior recognition result is maintained and behavior recognition can be performed in real time.
Embodiments of the second aspect
The embodiments of the present application provide a behavior recognition device; contents that are the same as those of the embodiments of the first aspect are not repeated.
Fig. 8 is a schematic diagram of a behavior recognition device according to an embodiment of the present application, and as shown in fig. 8, the behavior recognition device 800 includes:
a detection unit 801 that detects an object in an image to obtain an object detection frame;
an estimation unit 802 that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit 803 that identifies the behavior of the object based on the plurality of key points.
In some embodiments, the lightweight network is generated by replacing the backbone network of the GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.

In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.

In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
It should be noted that only the respective components or modules related to the present application are described above, but the present application is not limited thereto. The behavior recognition apparatus 800 may further include other components or modules, and regarding the specific contents of these components or modules, reference may be made to the related art.
For simplicity, FIG. 8 shows only an example of the connection relationships or signal flows between the various components or modules, but it should be apparent to those skilled in the art that various related techniques, such as bus connections, may be employed. The above components or modules may be implemented by hardware facilities such as a processor and a memory; the embodiments of the present application are not limited in this respect.
The above embodiments are merely illustrative of the embodiments of the present application, but the present application is not limited thereto, and appropriate modifications may be made on the basis of the above embodiments. For example, each of the above embodiments may be used alone, or one or more of the above embodiments may be combined.
As can be seen from the above embodiments, a plurality of key points of an object are obtained by using a lightweight network and performing pose estimation based on an object detection frame; the backbone network of the lightweight network is a MobileNet network structure, and the lightweight network further comprises an up-sampling module connected with the MobileNet network structure. Thus, the speed of gesture estimation can be increased, and not only the accuracy of the behavior recognition result can be improved, but also the behavior recognition can be performed in real time.
Embodiments of the third aspect
An embodiment of the present application provides an electronic device, including a behavior recognition apparatus 800 according to an embodiment of the second aspect, and the content of which is incorporated herein. The electronic device may be, for example, a computer, server, workstation, laptop, smart phone, etc.; embodiments of the present application are not so limited.
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 9, the electronic device 900 may include a processor (e.g., a central processing unit, CPU) 910 and a memory 920 coupled to the processor 910. The memory 920 may store various data and further stores a program 921 for information processing, which is executed under the control of the processor 910.
In some embodiments, the functionality of behavior recognition device 800 is integrated into processor 910 for implementation. Wherein the processor 910 is configured to implement the behavior recognition method as described in the embodiments of the first aspect.
In some embodiments, the behavior recognition apparatus 800 is configured separately from the processor 910, for example, the behavior recognition apparatus 800 may be configured as a chip connected to the processor 910, and the functions of the behavior recognition apparatus 800 are implemented by the control of the processor 910.
For example, the processor 910 is configured to control: detecting an object in an image to obtain an object detection frame; performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and identifying a behavior of the object based on the plurality of key points.
In some embodiments, the lightweight network is generated by replacing the backbone network of the GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.

In some embodiments, the lightweight network performs multiple down-sampling using a GlobalNet with a MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.

In some embodiments, the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
In addition, as shown in fig. 9, the electronic device 900 may further include: input output (I/O) devices 930 and a display 940; wherein, the functions of the above components are similar to the prior art, and are not repeated here. It is noted that the electronic device 900 need not include all of the components shown in fig. 9; in addition, the electronic device 900 may further include components not shown in fig. 9, and reference may be made to the related art.
Embodiments of the present application also provide a computer readable program, wherein the program when executed in an electronic device causes the computer to perform the behavior recognition method as described in the embodiments of the first aspect in the electronic device.
Embodiments of the present application also provide a storage medium storing a computer-readable program, where the computer-readable program causes a computer to execute the behavior recognition method according to the embodiment of the first aspect in an electronic device.
The apparatus and method of the present application may be implemented by hardware, or may be implemented by hardware in combination with software. The present application relates to a computer readable program which, when executed by a logic means, enables the logic means to carry out the apparatus or constituent means described above, or enables the logic means to carry out the various methods or steps described above. The present application also relates to a storage medium such as a hard disk, a magnetic disk, an optical disk, a DVD, a flash memory, or the like for storing the above program.
The methods/apparatus described in connection with the embodiments of the present application may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in the figures and/or one or more combinations of the functional blocks may correspond to individual software modules or individual hardware modules of the computer program flow. These software modules may correspond to the individual steps shown in the figures, respectively. These hardware modules may be implemented, for example, by solidifying the software modules using a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software modules may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a MEGA-SIM card of a relatively large capacity or a flash memory device of a large capacity, the software module may be stored in the MEGA-SIM card or the flash memory device of a large capacity.
One or more of the functional blocks described in the figures and/or one or more combinations of functional blocks may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described herein. They may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in connection with specific embodiments, but it should be apparent to those skilled in the art that these descriptions are intended to be illustrative and not limiting. Various modifications and adaptations of the disclosure may occur to those skilled in the art and are within the scope of the disclosure.
Claims (9)
1. A behavior recognition apparatus, the apparatus comprising:
a detection unit that detects an object in an image to obtain an object detection frame;
an estimation unit that obtains a plurality of key points of the object by using a lightweight network to perform pose estimation based on the object detection frame, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
an identification unit that identifies a behavior of the object based on the plurality of key points.
2. The apparatus of claim 1, wherein the lightweight network is generated by replacing the backbone network of a GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.
3. The apparatus of claim 2, wherein the lightweight network performs multiple down-sampling using the GlobalNet with the MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.
4. The apparatus of claim 3, wherein the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
5. A method of behavior recognition, the method comprising:
detecting an object in the image to obtain an object detection frame;
performing pose estimation with a lightweight network based on the object detection frame to obtain a plurality of key points of the object, wherein the backbone network of the lightweight network is a MobileNet network structure and the lightweight network further comprises an up-sampling module connected to the MobileNet network structure; and
identifying a behavior of the object based on the plurality of key points.
6. The method of claim 5, wherein the lightweight network is generated by replacing the backbone network of a GlobalNet in a cascaded pyramid network with a MobileNet, and replacing the RefineNet and pyramid structures in the cascaded pyramid network with the up-sampling module.
7. The method of claim 6, wherein the lightweight network performs multiple down-sampling using the GlobalNet with the MobileNet as its backbone network, and the up-sampling module performs multiple up-sampling on the result of one of the down-sampling operations.
8. The method of claim 7, wherein the up-sampling module performs multiple up-sampling on the result of the bottom-most down-sampling operation.
9. An electronic device comprising a memory storing a computer program and a processor configured to execute the computer program to implement the behavior recognition method of any one of claims 5 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111443162.9A CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111443162.9A CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206356A true CN116206356A (en) | 2023-06-02 |
Family
ID=86506386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111443162.9A Pending CN116206356A (en) | 2021-11-30 | 2021-11-30 | Behavior recognition device and method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206356A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |