CN111931725B - Human motion recognition method, device and storage medium - Google Patents


Info

Publication number
CN111931725B
CN111931725B (application CN202011007051.9A)
Authority
CN
China
Prior art keywords
human
human body
bones
dimensional structure
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011007051.9A
Other languages
Chinese (zh)
Other versions
CN111931725A (en)
Inventor
李季
吕彬
吕少松
赵昕蕾
赵雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Infinity Innovation Technology Co ltd
Original Assignee
Beijing Infinity Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Infinity Innovation Technology Co., Ltd.
Priority to CN202011007051.9A
Publication of CN111931725A
Application granted
Publication of CN111931725B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/0012 Biomedical image inspection
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G06T 2200/04 Indexing scheme for image data processing or generation involving 3D image data
    • G06T 2207/10004 Still image; photographic image
    • G06T 2207/20081 Training; learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Radiology & Medical Imaging (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Quality & Reliability (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

One or more embodiments of the present disclosure provide a human motion recognition method, apparatus, and storage medium. The method includes: acquiring an image containing a human body; identifying a human skeleton from the image to obtain a skeleton recognition result; determining, from the recognition result, the size ratios between bones of multiple parts of the body and the three-dimensional structure of the skeleton; identifying a target object from the size ratios; and determining the action type of the target object from the three-dimensional structure. The method improves the accuracy of human motion recognition and can identify specific individuals.

Description

Human motion recognition method, device and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a human motion recognition method, apparatus, and storage medium.
Background
When recognizing human actions from images, factors such as image quality, image size, and lighting all matter, but the requirements they impose often conflict. Conventional image recognition approaches frequently run into one of two situations: improving image quality increases noise or loses some image features, making the image harder to recognize; or the pipeline is tuned to particular environments, making human actions captured elsewhere difficult to recognize. Existing image recognition technology can identify specific objects and human skeleton structures, and can perform multi-constraint, multi-target action recognition based on the overall image structure, including the fixed scene, target colors, image contours, and lighting. However, changes in scene, target color, distance, or lighting often prevent actions from being recognized effectively, and existing recognition models cannot be adjusted quickly and effectively.
Disclosure of Invention
One or more embodiments of the present disclosure provide a method, apparatus, and storage medium for recognizing human actions, intended to address the problem in the related art that, when recognizing human actions from images, the background environment in the image reduces the accuracy of the recognition results.
One or more embodiments of the present disclosure provide a human motion recognition method, including: acquiring an image containing a human body; identifying a human skeleton from the image to obtain a skeleton recognition result; determining, from the recognition result, the size ratios between bones of multiple parts of the human body and the three-dimensional structure of the skeleton; identifying a target object from the size ratios; and determining the action type of the target object from the three-dimensional structure.
Optionally, identifying the human skeleton from the image includes: inputting the image into a pre-trained bone extraction model to obtain bone data, where the bone data includes position information for each joint of the human body; and inputting the joint position information into a pre-trained human skeleton recognition model to obtain a skeleton recognition result that includes at least the following information: the name, the size, and the position information of the bones of each part of the human body.
Optionally, determining the size ratios between bones of multiple parts of the human body according to the skeleton recognition result includes: inputting the recognition result into a pre-trained human-skeleton ratio recognition model to obtain the size ratios between bones of multiple parts of the human body, where the parts include at least three of the following: legs, waist, shoulders, arms, neck, and hips.
Optionally, determining the three-dimensional structure of the human skeleton according to the recognition result of the human skeleton includes: inputting the identification result of the human skeleton into a pre-trained human skeleton three-dimensional structure identification model to obtain the three-dimensional structure of the human skeleton.
Optionally, determining the action type of the target object according to the three-dimensional structure includes: inputting the data of the three-dimensional structure into a pre-trained human action classification model to obtain the action type of the target object.
Optionally, determining the action type of the target object according to the three-dimensional structure includes: acquiring the type of a scene corresponding to the action of the target object; and inputting the type of the scene and the data of the three-dimensional structure into the human motion classification model to obtain the motion type of the target object in the scene.
Optionally, the method further comprises: receiving a target image and annotation information of the target image, wherein the annotation information comprises action types of human bodies in the target image and types of scenes corresponding to the action types; and using the target image and the labeling information as training data, and retraining the human body action recognition model based on the training data to obtain a retrained human body action recognition model.
Optionally, the human action recognition model is deployed in a device, and the method further includes: acquiring working condition parameters of the equipment and parameters of the human body action recognition model; determining optimal configuration parameters of the equipment according to the working condition parameters of the equipment and the parameters of the human body action recognition model; and adjusting the control parameters of the equipment according to the optimal configuration parameters.
One or more embodiments of the present disclosure also provide an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing any one of the human action recognition methods described above when executing the program.
One or more embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the human action recognition methods described above.
According to the human action recognition method of one or more embodiments of the present disclosure, a human skeleton is recognized from an acquired image containing a human body, and from the recognition result the size ratios between bones of multiple body parts and the three-dimensional structure of the skeleton are determined. Because these size ratios and the three-dimensional skeleton structure generally reflect the posture of the body, and are generally unaffected by external factors such as the image background or image colors, the action type can be recognized more accurately from these data, improving recognition accuracy.
Drawings
To more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required for the detailed description are briefly introduced below. The drawings described below show some embodiments of the present disclosure; a person of ordinary skill in the art could derive other drawings from them without inventive effort.
FIG. 1 is a flow diagram illustrating a human action recognition method according to one or more embodiments of the present disclosure;
FIG. 2 is a flow diagram illustrating a human action recognition method according to one or more embodiments of the present disclosure;
fig. 3 is a block diagram of an electronic device shown in accordance with one or more embodiments of the present disclosure.
Detailed Description
The technical solutions of the present disclosure are described clearly and completely below in connection with embodiments; the described embodiments are plainly only some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art could obtain from them without inventive effort fall within the scope of protection of this disclosure.
In the description of the present disclosure, it should be understood that terms such as "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings. They are used only to simplify the description of the present disclosure and do not indicate or imply that the devices or elements referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present disclosure.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to; a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present disclosure, "a plurality" means two or more unless explicitly defined otherwise. Furthermore, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; or a communication between two elements. The specific meaning of these terms in this disclosure will be understood by those of ordinary skill in the art from the specific context.
Fig. 1 is a flow diagram illustrating a human action recognition method, as shown in fig. 1, according to one or more embodiments of the present disclosure, the method comprising:
step 101: acquiring an image containing a human body;
the image may be, for example, an image, a picture, or a video acquired by a camera.
Step 102: identifying human bones from the image to obtain an identification result of the human bones;
In one example, the image may be input into a trained bone extraction model, which can identify the human skeleton in the image; the extracted skeleton data may include, for example, the coordinates of each joint of the human body.
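The joint output described above can be made concrete with a small sketch; the joint names, coordinate values, and the `bone_length` helper below are illustrative assumptions, not the patent's actual model output:

```python
import math

# Hypothetical output of a bone extraction model: 2D image coordinates
# for a handful of named joints. Real pose estimators emit 17-25
# keypoints; these names and values are illustrative only.
joints = {
    "neck":       (100.0, 50.0),
    "shoulder_r": (120.0, 60.0),
    "elbow_r":    (130.0, 90.0),
    "wrist_r":    (135.0, 120.0),
    "hip_r":      (110.0, 130.0),
    "knee_r":     (112.0, 180.0),
}

def bone_length(joints, a, b):
    """Euclidean distance between two joints, i.e. the length of bone a-b."""
    (xa, ya), (xb, yb) = joints[a], joints[b]
    return math.hypot(xb - xa, yb - ya)

upper_arm = bone_length(joints, "shoulder_r", "elbow_r")  # ~31.6
thigh = bone_length(joints, "hip_r", "knee_r")            # ~50.0
```

From such per-bone lengths, the size ratios used in the later steps follow directly.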
Step 103: determining the size proportion among bones of a plurality of parts of the human body according to the identification result of the human body bones, and determining the three-dimensional structure of the human body bones;
Continuing the example above, once the coordinates of each joint of the human body are obtained, they can be fed as input to a pre-trained human-skeleton ratio recognition model, which can output the size ratios between bones of the legs, waist, shoulders, thighs, forearms, neck, hips, and so on. For example, if the model is trained to produce two groups of ratios, the ratio between the upper arm and the neck and the ratio between the forearm and the thigh, then given the joint coordinates it outputs exactly those two groups. These two groups are only an example: because the sizes of the corresponding bones differ from person to person, referring to ratios across multiple groups of different parts represents the differences between individuals better when one individual must be clearly distinguished from others. Therefore, in one or more embodiments of the present disclosure, the ratio relationships between bones of multiple body parts, and those relationships during continuous motion, can be determined without affecting system efficiency.
The human-skeleton ratio recognition model can be obtained, for example, by training with a known supervised learning algorithm on labeled human skeleton data, where the skeleton data may include position data for each joint of the human body (for example, coordinate values), pre-labeled with the joint each data point belongs to.
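As a rough sketch of what the ratio model produces, the two example groups above can be computed directly from assumed bone lengths (all names and numbers below are invented for illustration):

```python
# Assumed bone lengths (arbitrary units); names follow the example above.
lengths = {"upper_arm": 31.6, "neck": 14.1, "forearm": 30.1, "thigh": 50.0}

def size_ratio(lengths, part_a, part_b):
    """Size ratio between two bones, e.g. upper arm : neck."""
    return lengths[part_a] / lengths[part_b]

# The two example groups from the description.
group_1 = size_ratio(lengths, "upper_arm", "neck")   # upper arm / neck
group_2 = size_ratio(lengths, "forearm", "thigh")    # forearm / thigh
```

In the patent these ratios come from a trained model rather than direct measurement, but the output shape is the same: one number per group.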
Continuing the example above, once the coordinates of each joint are obtained, they may also be input into a pre-trained human-skeleton three-dimensional structure recognition model, which constructs a three-dimensional structure of the skeleton from the input and can output the coordinate values of each joint in that structure.
The human-skeleton three-dimensional structure recognition model can likewise be obtained by training with a known supervised learning algorithm on labeled skeleton data, which may include position data (for example, coordinate values) for each joint of the human body. For example, a large set of joint coordinates paired with the corresponding three-dimensional skeleton coordinates can serve as the training data set, from which the three-dimensional structure recognition model is learned.
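A loose sketch of this lifting step is shown below, with a stand-in "model" that merely attaches an assumed depth to each 2D joint; a real three-dimensional structure recognition model would be learned from data as described above:

```python
# Stand-in "lifting model": attaches an assumed depth to each 2D joint.
def lift_to_3d(joints_2d, depth_model):
    return {name: (x, y, depth_model(name)) for name, (x, y) in joints_2d.items()}

joints_2d = {"neck": (100.0, 50.0), "hip_r": (110.0, 130.0)}

# A constant-depth stand-in; a trained model would predict per-joint depth.
skeleton_3d = lift_to_3d(joints_2d, depth_model=lambda name: 0.0)
```

The output format (a 3D coordinate per joint) matches what the patent describes, even though the depth values here are placeholders.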
Step 104: determining a target object according to the size proportion;
Because the size ratios between bones generally differ between different human bodies, an individual can be uniquely identified by the size ratios between bones of multiple parts of the body. On this basis, different human bodies can be distinguished in the image: one group of bone size ratios corresponds to one individual, so when the action type of a particular target object needs to be recognized, that target can be identified from the size ratios between the bones of its multiple parts.
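Step 104 can be sketched as nearest-neighbour matching of the observed ratio vector against enrolled individuals (the names and enrolled values below are invented):

```python
import math

# Enrolled ratio vectors (upper_arm/neck, forearm/thigh); values invented.
enrolled = {
    "person_a": (2.24, 0.60),
    "person_b": (2.05, 0.71),
}

def identify(observed, enrolled):
    """Return the enrolled identity whose ratio vector is nearest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(enrolled, key=lambda name: dist(observed, enrolled[name]))

target = identify((2.22, 0.61), enrolled)  # closest to person_a
```

Because bone ratios are stable across poses, this kind of matching stays valid even as the person moves.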
Step 105: and determining the action type of the target object according to the three-dimensional structure.
For example, the size ratios between bones of the legs, waist, shoulders, forearms, neck, hips, and so on, together with the three-dimensional structure of the skeleton, may be input into a pre-trained human action recognition model to obtain the action type. The model can be trained with a known supervised learning algorithm on a pre-constructed training data set containing bone size ratio data between body parts, labeled with action types, together with data of the three-dimensional skeleton structure.
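As a minimal stand-in for the pre-trained action recognition model, the sketch below assigns a pose-feature vector to the nearest labelled centroid; the features, labels, and centroid values are all assumptions, not the patent's trained model:

```python
import math

# Labelled centroids in a toy 2-D pose-feature space (invented values).
centroids = {
    "standing": (0.0, 1.0),
    "sitting":  (0.5, 0.5),
}

def classify(features):
    """Assign a feature vector to the nearest action centroid."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(features, centroids[label]))

action = classify((0.1, 0.9))  # nearest to "standing"
```

A real supervised classifier would replace the centroids with learned parameters, but the input/output contract is the same: pose features in, action label out.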
According to the human action recognition method of one or more embodiments of the present disclosure, a human skeleton is recognized from an acquired image containing a human body, and from the recognition result the size ratios between bones of multiple body parts and the three-dimensional structure of the skeleton are determined. Because these size ratios and the three-dimensional skeleton structure generally reflect the posture of the body, and are generally unaffected by external factors such as the image background or image colors, the action type can be recognized more accurately from these data, improving recognition accuracy.
In one or more embodiments of the present disclosure, identifying human bone from the image may include:
inputting the image into a pre-trained bone extraction model to obtain bone data, where the bone data may include position information for each joint of the human body, for example the coordinate values of each joint;
inputting the position information of each joint into a pre-trained human skeleton recognition model to obtain the skeleton recognition result, which includes at least the following information:
the name of the bone of each part of the human body, the size of each bone, and its position information (for example, the coordinate values corresponding to each bone).
In one or more embodiments of the present disclosure, determining a dimensional ratio between bones of a plurality of parts of the human body according to the recognition result of the human body bones may include:
inputting the human skeleton recognition result into a pre-trained human-skeleton ratio recognition model to obtain the size ratios between bones of multiple parts of the human body, where the parts include at least three of the following: legs, waist, shoulders, arms, neck, and hips. One group of size ratios between bones of multiple parts may be obtained, or multiple groups; three groups are illustrated here as an example: one group includes the size ratios between the bones of the thigh, head, and calf; one group includes those between the forearm, head, and thigh; and another group includes those between the forearm, neck, and thigh.
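The three example groups above can be sketched as normalised ratio tuples (the bone lengths below are invented for illustration):

```python
# Invented bone lengths (arbitrary units).
lengths = {"thigh": 50.0, "head": 24.0, "calf": 43.0,
           "forearm": 30.0, "neck": 14.0}

def ratio_group(lengths, parts):
    """Normalise a group of bone lengths by the first part's length."""
    base = lengths[parts[0]]
    return tuple(round(lengths[p] / base, 3) for p in parts)

groups = [
    ratio_group(lengths, ("thigh", "head", "calf")),     # thigh : head : calf
    ratio_group(lengths, ("forearm", "head", "thigh")),  # forearm : head : thigh
    ratio_group(lengths, ("forearm", "neck", "thigh")),  # forearm : neck : thigh
]
```

Using several such groups together gives a richer signature per individual than a single pairwise ratio.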
In one or more embodiments of the present disclosure, determining the three-dimensional structure of the human skeleton according to the recognition result of the human skeleton may include:
inputting the skeleton recognition result into a pre-trained human-skeleton three-dimensional structure recognition model to obtain the three-dimensional structure of the skeleton, which may include the coordinate values corresponding to each joint point in that structure.
In one or more embodiments of the present disclosure, determining the action type of the target object from the three-dimensional structure may include:
inputting the data of the three-dimensional structure into a pre-trained human action classification model to obtain the action type of the target object.
In one or more embodiments of the present disclosure, determining the action type of the target object according to the three-dimensional structure may include:
acquiring the type of a scene corresponding to the action of the target object;
and inputting the type of the scene together with the data of the three-dimensional structure into the human action classification model to obtain the action type of the target object in that scene. The scene type may be, for example, a scene specified by the user when predicting the action; with it, recognition of the action type in an image can also determine what the current action means in that scene. Scenes may include, for example, sports, daily life, and public places, and each may be subdivided into sub-scenes; sports, for instance, can be subdivided into specific sports, so that once the user specifies the current sport, the meaning of the human action within that sport can be recognized.
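Scene-conditioned recognition can be sketched as a lookup keyed by scene; the scene names, pose labels, and table below are all invented for illustration:

```python
# Invented scene-to-action lookup: the same pose reads differently per scene.
SCENE_ACTIONS = {
    "sports":     {"arms_raised": "goal celebration"},
    "daily_life": {"arms_raised": "stretching"},
}

def classify_in_scene(scene, pose_label):
    """Interpret a pose label within a user-specified scene."""
    return SCENE_ACTIONS.get(scene, {}).get(pose_label, "unknown")
```

In the patent the scene type is an extra input to the trained classification model rather than a lookup table, but the effect is the same: identical pose data can yield different action types in different scenes.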
In one or more embodiments of the present disclosure, the human action recognition method may further include:
receiving a target image and its annotation information, where the annotation information includes the action types of the human bodies in the target image and the corresponding scene types; and using the target image and annotations as training data to retrain the human action recognition model. For example, when a user uploads an image to a server, selects the scene type for the image, and labels the action type of the human body in it, the server can store this data as training data for the human action recognition model. Retraining on such data lets the new model recognize more kinds of actions in more kinds of scenes.
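The retraining loop described above can be sketched as follows; the in-memory store and the retrain threshold are placeholders, not the patent's actual server API:

```python
# Placeholder training-data store for user-annotated images.
training_data = []

def receive_annotation(image_id, action_type, scene_type):
    """Record one annotated image as a future training sample."""
    training_data.append(
        {"image": image_id, "action": action_type, "scene": scene_type})

def ready_to_retrain(min_samples=2):
    """Retrain once enough new annotated samples have accumulated."""
    return len(training_data) >= min_samples

receive_annotation("img_001", "running", "sports")
receive_annotation("img_002", "sitting", "daily_life")
```

A production system would persist the samples and trigger retraining asynchronously; the point here is only the shape of the feedback loop.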
In one or more embodiments of the present disclosure, the human action recognition model is deployed on a device, and the method may further include: acquiring the operating-condition parameters of the device and the parameters of the model; determining the optimal configuration parameters of the device from those two sets of parameters; and adjusting the control parameters of the device accordingly. In one example, the optimal configuration parameters can be obtained by analyzing, against a historical database (which stores historical model parameters and records other historical data from model runs) and a human action recognition model library, the monitored data of the DCS (Distributed Control System) and the functional relationship between the device operating-condition parameters and the device power factor, while taking the device's constraint factors into account. The constraint factors may include memory, GPU (Graphics Processing Unit), CPU (Central Processing Unit), network, and so on. Once determined, the configuration parameters can be sent to the DCS, which adjusts the device's running state by executing control instructions issued by the system platform. In one or more embodiments of the present disclosure, the parameters of the human action recognition model can also be tuned toward a set target by automatic optimization algorithms such as the ant colony, bee colony, or artificial fish swarm algorithms.
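The constraint-aware configuration step can be sketched as choosing the best-scoring feasible candidate; the candidates, scores, and memory limit below are all invented, and a real system would also weigh GPU, CPU, and network constraints:

```python
# Invented candidate configurations with a memory cost and a quality score.
candidates = [
    {"name": "high_res",  "memory_mb": 4096, "score": 0.95},
    {"name": "balanced",  "memory_mb": 2048, "score": 0.90},
    {"name": "low_power", "memory_mb": 1024, "score": 0.80},
]

def optimal_config(candidates, memory_limit_mb):
    """Best-scoring configuration that fits within the memory constraint."""
    feasible = [c for c in candidates if c["memory_mb"] <= memory_limit_mb]
    return max(feasible, key=lambda c: c["score"])["name"] if feasible else None

best = optimal_config(candidates, memory_limit_mb=3000)  # "balanced"
```

The patent's swarm-style optimizers would search a continuous parameter space instead of a fixed list, but the feasibility-then-score structure is the same.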
In one or more embodiments of the present disclosure, a target human action recognition model can be looked up in the model library according to the device operating data from the DCS, the motion data from the human motion monitoring system, and the data from the image-environment monitoring system. The target control parameters of the device are determined from the retrieved model, the model's parameters are adjusted accordingly, and the adjusted model is saved as a new human action recognition model. In this way the model needs no manual configuration: optimal configuration parameters are derived from real-time data and applied automatically, achieving real-time self-optimization.
To facilitate an understanding of the human action recognition method provided by one or more embodiments of the present disclosure, the method is exemplarily described below in connection with fig. 2, and as shown in fig. 2, the method includes:
acquiring image data from a DCS;
constructing training data based on the acquired image data;
training a bone extraction model based on training data, inputting application data (namely data of an image to be predicted) into the bone extraction model after training, and identifying human bone data;
obtaining a human skeleton size proportion and a human skeleton three-dimensional structure according to human skeleton data;
obtaining a human body action recognition result according to the human body bone size proportion and the human body bone three-dimensional structure;
as shown in fig. 2, after the human motion recognition result is obtained, the recognition result and the corresponding image may be added to the training data set of the human motion recognition model, and a new human motion recognition model may later be trained on this data set to update the model.
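The flow of fig. 2, including the feedback of recognition results into the training set, can be summarized in a short sketch; each `*_model` argument here is a stand-in callable, since this disclosure does not fix the model interfaces:

```python
def recognize_action(image, skeleton_model, ratio_model, structure_model, action_model):
    """image -> skeleton data -> (bone size proportions, 3D structure) -> action type."""
    skeleton = skeleton_model(image)       # human bone data from the extraction model
    ratios = ratio_model(skeleton)         # size proportions between bones of body parts
    structure = structure_model(skeleton)  # 3D coordinates of the joint points
    return action_model(ratios, structure)

def update_training_set(training_set, image, result):
    # Add the recognition result and its corresponding image back to the
    # training data set, so a new model can later be trained as an update
    training_set.append((image, result))
    return training_set
```

Splitting the pipeline into callables keeps each stage (extraction, proportion, 3D structure, classification) independently replaceable, matching the separate models described above.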
One or more embodiments of the present disclosure also provide an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements any one of the human action recognition methods described above when executing the program.
One or more embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the human action recognition methods described above.
Fig. 3 shows a more specific hardware architecture of an electronic device according to this embodiment. The device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be configured as a component within the device (not shown in the figure) or may be external to the device to provide the corresponding functionality. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various types of sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication interaction between the present device and other devices. The communication module may communicate in a wired manner (e.g., USB, network cable) or in a wireless manner (e.g., mobile network, Wi-Fi, Bluetooth).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may include other components necessary for proper operation. Furthermore, those skilled in the art will understand that the above-described device may include only the components necessary to implement the embodiments of the present specification, and need not include all the components shown in the figure.
The computer-readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
It should be noted that the methods of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The methods of the embodiments may also be applied in a distributed scenario and completed by a plurality of devices cooperating with one another. In such a distributed scenario, one of the devices may perform only one or more steps of the methods of the embodiments of the present disclosure, and the devices interact with one another to complete the methods.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Finally, it should be noted that: the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (8)

1. A human motion recognition method, comprising:
acquiring an image containing a human body;
identifying human bones from the image to obtain an identification result of the human bones;
determining the size proportion among bones of a plurality of parts of the human body according to the identification result of the human body bones, and determining the three-dimensional structure of the human body bones;
determining a target object according to the size proportion;
determining the action type of the target object according to the three-dimensional structure;
wherein determining the action type of the target object according to the three-dimensional structure comprises:
inputting the data of the three-dimensional structure into a human body action classification model which is obtained through training in advance, and obtaining the action type of the target object;
wherein the three-dimensional structure comprises coordinate values corresponding to all joint points in the three-dimensional structure of the human skeleton;
wherein determining the action type of the target object according to the three-dimensional structure further comprises:
acquiring the type of a scene corresponding to the action of the target object;
and inputting the type of the scene and the data of the three-dimensional structure into the human motion classification model to obtain the motion type of the target object in the scene.
2. The method of claim 1, wherein identifying human bone from the image comprises:
inputting the image into a pre-trained bone extraction model to obtain bone data, wherein the bone data comprises position information of each joint point of the human body;
inputting the position information of each joint point into a human skeleton recognition model obtained through training in advance to obtain a human skeleton recognition result, wherein the human skeleton recognition result at least comprises the following information:
the name of the bones of each part of the human body, the size of the bones of each part of the human body and the position information of the bones of each part of the human body.
3. The method of claim 1, wherein determining the dimensional proportions between bones of the plurality of parts of the human body based on the identification of the bones of the human body comprises:
inputting the human skeleton recognition result into a pre-trained human skeleton proportion recognition model to obtain the size proportion among bones of a plurality of parts of the human body, wherein the parts comprise at least three of the following:
legs, waist, shoulders, arms, neck and hips.
4. The method of claim 1, wherein determining the three-dimensional structure of the human skeleton based on the recognition result of the human skeleton comprises:
inputting the identification result of the human skeleton into a pre-trained human skeleton three-dimensional structure identification model to obtain the three-dimensional structure of the human skeleton.
5. The method according to claim 1, wherein the method further comprises:
receiving a target image and annotation information of the target image, wherein the annotation information comprises action types of human bodies in the target image and types of scenes corresponding to the action types;
and using the target image and the labeling information as training data, and retraining the human body action recognition model based on the training data to obtain a retrained human body action recognition model.
6. The method of any one of claims 1 to 5, wherein the human action recognition model is deployed in a device, the method further comprising:
acquiring working condition parameters of the equipment and parameters of the human body action recognition model;
determining optimal configuration parameters of the equipment according to the working condition parameters of the equipment and the parameters of the human body action recognition model;
and adjusting the control parameters of the equipment according to the optimal configuration parameters.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the human action recognition method of any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the human action recognition method of any one of claims 1 to 5.
CN202011007051.9A 2020-09-23 2020-09-23 Human motion recognition method, device and storage medium Active CN111931725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011007051.9A CN111931725B (en) 2020-09-23 2020-09-23 Human motion recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011007051.9A CN111931725B (en) 2020-09-23 2020-09-23 Human motion recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111931725A CN111931725A (en) 2020-11-13
CN111931725B true CN111931725B (en) 2023-10-13

Family

ID=73334047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011007051.9A Active CN111931725B (en) 2020-09-23 2020-09-23 Human motion recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111931725B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827660A (en) * 2022-03-11 2022-07-29 华数传媒网络有限公司 AI body-building system based on set-top box and implementation method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184280A (en) * 2015-10-10 2015-12-23 东方网力科技股份有限公司 Human body identity identification method and apparatus
CN110309784A (en) * 2019-07-02 2019-10-08 北京百度网讯科技有限公司 Action recognition processing method, device, equipment and storage medium
CN110334736A (en) * 2019-06-03 2019-10-15 北京大米科技有限公司 Image-recognizing method, device, electronic equipment and medium
CN110827383A (en) * 2019-11-25 2020-02-21 腾讯科技(深圳)有限公司 Attitude simulation method and device of three-dimensional model, storage medium and electronic equipment
CN110858277A (en) * 2018-08-22 2020-03-03 阿里巴巴集团控股有限公司 Method and device for obtaining attitude classification model
CN110858295A (en) * 2018-08-24 2020-03-03 广州汽车集团股份有限公司 Traffic police gesture recognition method and device, vehicle control unit and storage medium
CN111611903A (en) * 2020-05-15 2020-09-01 北京百度网讯科技有限公司 Training method, using method, device, equipment and medium of motion recognition model
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhu Dayong; Guo Xing; Wu Jianguo. Action recognition method based on Kinect 3D skeleton nodes. Computer Engineering and Applications, 2018, No. 20, full text. *
Xin Yizhong; Xing Zhifei. Human action recognition method based on Kinect. Computer Engineering and Design, 2016, No. 4, full text. *

Also Published As

Publication number Publication date
CN111931725A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
US8300892B2 (en) Moving object detection apparatus and moving object detection method
US11423699B2 (en) Action recognition method and apparatus and electronic equipment
US11298050B2 (en) Posture estimation device, behavior estimation device, storage medium storing posture estimation program, and posture estimation method
US11417095B2 (en) Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter
JP7311640B2 (en) Behavior prediction method and device, gait recognition method and device, electronic device, and computer-readable storage medium
CN109063584B (en) Facial feature point positioning method, device, equipment and medium based on cascade regression
US20150002518A1 (en) Image generating apparatus
KR20220160066A (en) Image processing method and apparatus
US8724849B2 (en) Information processing device, information processing method, program, and information storage medium
JP2024519940A (en) Data processing method, device, data processing system, electronic device and computer program
CN112884868B (en) Three-dimensional mesh vertex feature determination method, skeleton covering method and related device
CN111931725B (en) Human motion recognition method, device and storage medium
CN114722913A (en) Attitude detection method and apparatus, electronic device, and computer-readable storage medium
CN110910449A (en) Method and system for recognizing three-dimensional position of object
CN113033526A (en) Computer-implemented method, electronic device and computer program product
US20230306636A1 (en) Object three-dimensional localizations in images or videos
WO2020149149A1 (en) Information processing apparatus, information processing method, and program
CN114399718B (en) Image content identification method and device in video playing process
CN113239915B (en) Classroom behavior identification method, device, equipment and storage medium
EP4123588A1 (en) Image processing device and moving-image data generation method
US20230401740A1 (en) Data processing method and apparatus, and device and medium
US20220101491A1 (en) Information processing apparatus
CN117238035A (en) Fall detection method and system based on image recognition technology
CN116311351A (en) Indoor human body target recognition and monocular distance measurement method and system
CN114373143A (en) Method and device for acquiring distance data between nodes in virtual model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant