CN115019343A - Human body action recognition method, device and equipment - Google Patents

Human body action recognition method, device and equipment

Info

Publication number: CN115019343A
Authority: CN (China)
Legal status: Pending
Application number: CN202210668939.XA
Other languages: Chinese (zh)
Inventors: 潘华东 (Pan Huadong), 魏乃科 (Wei Naike), 殷俊 (Yin Jun)
Current assignee: Zhejiang Dahua Technology Co., Ltd.
Original assignee: Zhejiang Dahua Technology Co., Ltd.
Filed by: Zhejiang Dahua Technology Co., Ltd., priority to CN202210668939.XA
Prior art keywords: human body, action, limb, basic, parts

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
        • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/40 Extraction of image or video features
                • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
            • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
        • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
                • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
                • G06V 40/107 Static hand or arm
            • G06V 40/20 Movements or behaviour, e.g. gesture recognition
                • G06V 40/23 Recognition of whole body movements, e.g. for sport training
                • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The application discloses a human body action recognition method, device and equipment. Multiple frames of images including a human body are received, and the action basic elements in each frame of image are parsed according to predefined action basic elements that constitute actions; the association relations among the parts of the human body, and between those parts and other targets, are determined from the parsed action basic elements to obtain the limb state of the human body in each frame of image; according to predefined transition relations between the limb states corresponding to different basic actions, the basic action matched by the multi-frame images is determined from the transition relations between the limb states at the sampling times; and the custom action corresponding to the matched basic actions is obtained according to the basic action combinations corresponding to different custom actions. The method can perform action recognition according to the requirements of different industries.

Description

Human body action recognition method, device and equipment
Technical Field
The application relates to the technical field of video processing, in particular to a human body action recognition method, device and equipment.
Background
Action recognition is an important part of the digital transformation of many industries, but different industries have different, highly fragmented requirements for the actions to be recognized. For example, in content auditing of video data, action recognition is one part of the audit and is used to filter video related to violence; in action skill training, the motion data transmitted by a data acquisition device is computed and analyzed to obtain the user's position and posture during exercise, providing a basis for the user to share data, obtain action guidance and so on; in the service industry, the actions of service personnel are recognized to judge whether their behavior meets the requirements of the industry.
In the prior art, video data comprising multiple frames of original image data is received; target image data is sampled from the original image data; the actions appearing in the video data are recognized from the global features of the target image data to obtain a global action, and from the local features to obtain a local action; and the global action is fused with the local action as the target action occurring in the video data. This approach trains a dedicated model to extract the global and local features and recognize specific actions. Such a trained model has no extensibility: if a new action appears, the model must be retrained, so action recognition cannot be performed according to the requirements of different industries.
Disclosure of Invention
In order to solve the prior-art problem that human body action recognition is performed with a trained model, so that the model must be retrained whenever a new action appears and action recognition cannot be performed according to the requirements of different industries, the application provides a human body action recognition method, device and equipment.
In a first aspect, the present application provides a human motion recognition method, including:
receiving multiple frames of images including a human body, and parsing the action basic elements in each frame of image according to predefined action basic elements that constitute actions;
determining the association relations among the parts of the human body and between the parts of the human body and other targets according to the parsed action basic elements, to obtain the limb state of the human body in each frame of image;
according to predefined transition relations between the limb states corresponding to different basic actions, determining the basic action matched by the multi-frame images from the transition relations between the limb states corresponding to the sampling times;
and obtaining the custom action corresponding to the matched basic actions according to the basic action combinations corresponding to different custom actions.
In one possible embodiment, the predefined action base elements that make up the action include at least one of:
the human body parts corresponding to different parts of the human body, the key points corresponding to different parts when the human body acts, the key points corresponding to different positions when the human body parts act, the target area related to the human body action and the target object related to the human body action.
In a possible implementation manner, determining an association relationship between each part of the human body and other targets according to the analyzed action basic elements to obtain a limb state corresponding to the human body in each frame of image, includes:
and determining the association relationship among all parts of the human body, the distance and the angle between all parts of the human body and the target object, and the static relationship or the dynamic change relationship of the human body track according to the analyzed action basic elements.
In one possible implementation mode, a limb state classification model is trained in advance through different sample images and corresponding limb states;
the method comprises the steps of inputting each frame of image into a limb state classification model, analyzing action basic elements in each frame of image through the limb state classification model, and determining the incidence relation among all parts of the human body and the incidence relation between all parts of the human body and other targets according to the analyzed action basic elements to obtain the corresponding limb state of the human body in each frame of image.
In a possible implementation manner, determining an association relationship between each part of the human body and other targets according to the analyzed action basic elements to obtain a limb state corresponding to the human body in each frame of image, includes:
acquiring predefined association relations between various parts of the human body corresponding to different limb states and logic combinations of the association relations between the various parts of the human body and other targets;
and determining the association relationship among all the parts of the human body and the corresponding logic combination of the association relationship between all the parts of the human body and other targets according to the analyzed action basic elements to obtain the body state corresponding to the human body in each frame of image.
In a possible implementation manner, the logical combination corresponding to the association relationship between each part of the human body and other targets includes at least one of the following:
the position relation between key points corresponding to different parts and human body parts corresponding to different parts of the human body when the human body acts;
the position relation between key points corresponding to different parts and the target area related to the human body action or the target object related to the human body action when the human body acts;
compared with the previous frame of image data, the moving direction and the moving speed of key points corresponding to different parts when the human body acts;
the distance relationship between key points corresponding to different parts when the human body acts;
the relation between the line segments corresponding to different parts when the human body acts comprises the ratio of the length of the line segments, whether the line segments are intersected or not and the intersected angle of the line segments, wherein the line segments of the human body parts are constructed by any two predefined key points of the human body parts;
the pointing direction of a finger, which comprises at least one of pointing upward, downward, leftward and rightward.
In a possible implementation manner, after obtaining the body state corresponding to the human body in each frame of image, the method further includes:
determining the human body direction of the human body relative to the acquisition equipment in each frame of image to obtain the limb states corresponding to different human body directions;
wherein the human body direction includes: the human body facing the acquisition device frontally, the human body with its back to the acquisition device, the human body facing the acquisition device at an oblique angle, the human body with its back to the acquisition device at an oblique angle, the left side of the human body facing the acquisition device, and the right side of the human body facing the acquisition device.
In a possible embodiment, the predefined transition relationship between the limb states corresponding to different basic actions includes at least one of:
predefining, for each basic action, at least one limb state and the transition relation corresponding to the at least one limb state, and at least one accompanying limb state and the transition relation corresponding to the at least one accompanying limb state;
wherein the limb state is obtained by analyzing the association relations among the parts of the human body, and the accompanying limb state is obtained by analyzing the distance and angle between the parts of the human body and the target object and the static or dynamic change relation of the human body trajectory.
In a possible implementation manner, the transition relation corresponding to the at least one limb state includes a persistence mode, in which the multi-frame images correspond to a single limb state, and a state switching mode, in which the multi-frame images correspond to multiple limb states that are switched in a predetermined order;
the transition relation corresponding to the at least one accompanying limb state likewise includes a persistence mode, in which the multi-frame images correspond to a single accompanying limb state, and a state switching mode, in which the multi-frame images correspond to multiple accompanying limb states that are switched in a predetermined order.
In one possible implementation, the basic action combination corresponding to different custom actions is determined as follows:
determining at least one basic action corresponding to the custom action, and combining the at least one basic action in time order to obtain the basic action combination corresponding to the custom action.
In a second aspect, the present application provides a human motion recognition apparatus, the apparatus comprising:
the analysis module is used for receiving multi-frame images including human bodies and analyzing action basic elements in each frame of image according to predefined action basic elements forming actions;
the limb state determining module is used for determining the association relations among the parts of the human body and between the parts of the human body and other targets according to the parsed action basic elements, to obtain the limb state of the human body in each frame of image;
the basic action determining module is used for determining, according to predefined transition relations between the limb states corresponding to different basic actions, the basic action matched by the multi-frame images from the transition relations between the limb states corresponding to the sampling times;
and the action determining module is used for obtaining the custom action corresponding to the matched basic action according to the basic action combination corresponding to different custom actions.
In a third aspect, the present application provides a human motion recognition apparatus, the apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
In a fourth aspect, the present application provides a computer storage medium storing a computer program for causing a computer to perform the method of the first aspect.
The application provides a human body action recognition method, device and equipment with which actions can be recognized according to the requirements of different industries, without retraining a model whenever a new action appears.
Drawings
Fig. 1 is a schematic view of an application scenario of a human body motion recognition method according to an exemplary embodiment of the present invention;
fig. 2 is a flowchart illustrating a human body motion recognition method according to an exemplary embodiment of the present invention;
FIG. 3 is a flow diagram illustrating a custom action build process according to an exemplary embodiment of the present invention;
FIG. 4 is a schematic diagram of an exemplary limb state construction according to an exemplary embodiment of the present invention;
FIG. 5 is a basic action building diagram according to an example of an exemplary embodiment of the present invention;
FIG. 6 is a diagram illustrating an exemplary custom action build according to an illustrative embodiment of the present invention;
FIG. 7 is a diagram illustrating custom action construction according to an example embodiment of the invention;
fig. 8 is a schematic view of a human body motion recognition apparatus according to an example embodiment of the present invention;
fig. 9 is a schematic diagram illustrating a human motion recognition apparatus according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail and clearly with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic view of an application scenario of the action recognition method provided in an embodiment of the present application. The scenario includes a server 101, a database 102, and at least one acquisition device (acquisition devices 103_1, 103_2 and 103_3 in the example in the figure). The acquisition device collects multi-frame images including a human body and sends them to the server 101; the model of the acquisition device is not limited, as long as it can obtain multi-frame images. The server 101 receives the multi-frame images sent by the acquisition device and recognizes the human body actions, and the database 102 stores the programs and data required by the human body action recognition method.
As described in the background, action recognition requirements differ greatly from industry to industry and are highly fragmented: filtering violent content during video auditing, analyzing position and posture for action skill training, and judging whether the behavior of service personnel meets industry requirements.
However, the prior-art approach of recognizing actions with a trained classification model has no extensibility and cannot perform action recognition according to the requirements of different industries. For these problems, an embodiment of the present application provides a human body action recognition method; as shown in fig. 2, the method includes:
s201: receiving a plurality of frames of images including a human body, and analyzing the action basic elements in each frame of image according to predefined action basic elements forming the action.
The multi-frame images are collected by an acquisition device capable of obtaining multi-frame images, and the action basic elements are predefined according to user requirements. In the embodiment provided by the application, the predefined action basic elements constituting an action include at least one of the following:
the human body parts corresponding to different parts of a human body, key points corresponding to different parts when the human body acts, key points corresponding to different positions when the human body parts act, a target area related to the human body action and a target object related to the human body action.
The action basic elements can include any one or more of the above, and can also include other types, such as other animals related to human actions, which can be expanded according to user requirements or different industry requirements, and are not limited in detail here.
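To make these definitions concrete, the following is a minimal sketch of how the predefined action basic elements might be represented in code. All class and field names are illustrative assumptions, not data structures defined by the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates

@dataclass
class Keypoint:
    name: str                # e.g. "wrist", "knee", "index_finger_tip"
    xy: Tuple[float, float]  # pixel position, origin at the top-left corner

@dataclass
class BodyPart:
    name: str                # e.g. "head", "hand", "upper_body"
    box: Box

@dataclass
class Target:
    name: str                # target object ("cup") or target area ("room")
    box: Box

@dataclass
class FrameElements:
    """Action basic elements parsed from one frame of image (S201)."""
    keypoints: List[Keypoint] = field(default_factory=list)
    body_parts: List[BodyPart] = field(default_factory=list)
    targets: List[Target] = field(default_factory=list)
```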
S202: and determining the association relations among the parts of the human body and between the parts of the human body and other targets according to the parsed action basic elements, to obtain the limb state of the human body in each frame of image.
After the multi-frame images are parsed to obtain the action basic elements they contain, the association relations among the parts of the human body and between those parts and other targets are analyzed; these association relations may be, but are not limited to, predefined ones.
The limb state is defined by the association relations among the parts of the human body, or by the distance and angle between the parts of the human body and a target object together with the static or dynamic change relation of the human body trajectory. The static relation of the trajectory describes whether the human body in the current frame is near or far from the target object; the dynamic change relation describes whether, compared with the previous frame, the human body in the current frame is approaching or moving away from the target object.
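As a hedged illustration of the static and dynamic trajectory relations just defined, the helper below classifies the body's movement relative to a target object; the function name and threshold are assumptions.

```python
def trajectory_relation(curr_dist: float, prev_dist: float,
                        eps: float = 1.0) -> str:
    """Dynamic change relation of the trajectory: compare the body-to-target
    distance in the current frame with that in the previous frame."""
    if curr_dist < prev_dist - eps:
        return "approaching"   # closer to the target object than before
    if curr_dist > prev_dist + eps:
        return "moving_away"   # farther from the target object than before
    return "static"            # distance essentially unchanged

print(trajectory_relation(42.0, 55.0))  # -> approaching
```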
S203: and according to predefined transition relations between the limb states corresponding to different basic actions, determining the basic action matched by the multi-frame images from the transition relations between the limb states corresponding to the sampling times.
In one possible embodiment, the predefined transition relations between the limb states corresponding to different basic actions include at least one of the following:
predefining, for each basic action, at least one limb state and the transition relation corresponding to the at least one limb state, and at least one accompanying limb state and the transition relation corresponding to the at least one accompanying limb state;
wherein the limb state is obtained by analyzing the association relations among the parts of the human body, and the accompanying limb state is obtained by analyzing the distance and angle between the parts of the human body and the target object and the static or dynamic change relation of the human body trajectory.
S204: and obtaining the custom action corresponding to the matched basic action according to the basic action combination corresponding to different custom actions.
In one possible implementation, the following method is adopted to determine the basic action combination corresponding to different custom actions:
determining at least one basic action corresponding to the custom action, and combining the at least one basic action in time order to obtain the basic action combination corresponding to the custom action.
As shown in fig. 3, the human body action recognition method provided in the embodiment of the present application constructs limb states from predefined action basic elements, constructs basic actions from predefined transition relations between limb states, and combines basic actions in time order to obtain custom actions. Human body actions in the multi-frame images are recognized following this custom action construction process. If a new action appears, it can be defined through the same process, supplementing the action types in the whole human body action recognition system, so that the human body action recognition needs of different industries can be met.
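Before detailing each step, here is a toy end-to-end sketch of the flow in fig. 2 and fig. 3. Every function, rule and name in it is an illustrative stand-in for the stages S201 to S204 described above, not the application's actual configuration format.

```python
def derive_limb_state(frame_elements, rules):
    """S202: map the relations detected in one frame to a limb state."""
    for state, required in rules.items():
        if required.issubset(frame_elements):
            return state
    return "unknown"

def match_basic_actions(limb_states, defs):
    """S203: collapse repeated per-frame states, then look them up as basic actions."""
    collapsed = [s for i, s in enumerate(limb_states)
                 if i == 0 or s != limb_states[i - 1]]
    return [defs[s] for s in collapsed if s in defs]

def match_custom_action(basic_actions, custom_defs):
    """S204: find a custom action whose basic actions appear in time order."""
    for name, sequence in custom_defs.items():
        it = iter(basic_actions)
        if all(any(b == step for b in it) for step in sequence):
            return name
    return None

# S201 is assumed done: each frame is reduced to a set of detected relations.
frames = [{"sitting", "feet_on_ground"}] * 3 + [{"sitting", "foot_raised"}] * 3
rules = {"feet_upright": {"sitting", "feet_on_ground"},
         "foot_lifted": {"sitting", "foot_raised"}}
states = [derive_limb_state(f, rules) for f in frames]
actions = match_basic_actions(states, {"feet_upright": "place_feet",
                                       "foot_lifted": "lift_foot"})
print(match_custom_action(actions, {"cross_legs": ["place_feet", "lift_foot"]}))
# -> cross_legs
```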
The action basic elements in S201 include at least one of: human body parts corresponding to different parts of the human body, key points corresponding to different parts when the human body acts, key points corresponding to different positions when a human body part acts, a target area related to the human body action, and a target object related to the human body action.
The human body parts corresponding to different parts of the human body include: head, hands, feet, upper body, etc. The key points corresponding to different parts during human body action include: wrist, elbow, knee, head, foot, etc. The key points corresponding to different positions when a human body part acts include, for a hand action: thumb tip, index finger tip, etc. The target area related to the human body action includes: the room, street or predefined spatial range the human body is in when acting. The target object related to the human body action includes: teapots, books, cups, tables, chairs, etc. The listed action basic elements can be adjusted according to user requirements and actual conditions.
The limb state in S202 can come from two sources:
(1) A limb state classification model.
In one possible implementation mode, a limb state classification model is trained in advance through different sample images and corresponding limb states;
the method comprises the steps of inputting each frame of image into a limb state classification model, analyzing action basic elements in each frame of image through the limb state classification model, and determining the incidence relation among all parts of the human body and the incidence relation between all parts of the human body and other targets according to the analyzed action basic elements to obtain the corresponding limb state of the human body in each frame of image.
The limb state classification model comprises a human body state model and a hand state model. The human body state model covers states such as: standing upright, lying prone, squatting, sitting on the ground, sitting on a chair, lying on a table, crossing the legs, and the like;
the hand state model covers states such as: OK, like (thumbs-up), click, gesture 1, gesture 2, gesture 3, gesture 5, fist, etc.
The listed states can be extended according to user requirements and actual conditions.
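As a hedged sketch of source (1), the snippet below trains a generic classifier from sample images reduced to keypoint-derived features and applies it per frame. The application does not specify a model architecture; the scikit-learn classifier and the feature representation are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_limb_state_model(sample_features: np.ndarray, limb_states: list):
    """sample_features: one row of keypoint-derived features per sample image;
    limb_states: the corresponding labels, e.g. "standing", "squatting",
    "sitting_chair", "legs_crossed", "fist", "OK"."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(sample_features, limb_states)
    return model

def classify_frames(model, frame_features: np.ndarray) -> list:
    """Return one limb state per input frame."""
    return list(model.predict(frame_features))
```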
(2) A logical combination of the association relations among the parts of the human body and the association relations between the parts of the human body and other targets.
In a possible implementation manner, determining an association relationship between each part of the human body and other targets according to the analyzed action basic elements to obtain a limb state corresponding to the human body in each frame of image, includes:
acquiring predefined association relations between each part of the human body corresponding to different limb states and logic combinations of the association relations between each part of the human body and other targets;
and determining the association relationship among all the parts of the human body and the corresponding logic combination of the association relationship between all the parts of the human body and other targets according to the analyzed action basic elements to obtain the body state corresponding to the human body in each frame of image.
In a possible implementation manner, the logical combination corresponding to the association relationship between each part of the human body and other targets includes at least one of the following:
the position relation between key points corresponding to different parts and human body parts corresponding to different parts of the human body when the human body acts;
the position relation between key points corresponding to different parts and the target area related to the human body action or the target object related to the human body action when the human body acts;
compared with the previous frame of image data, the moving direction and the moving speed of key points corresponding to different parts when the human body acts;
the distance relationship between key points corresponding to different parts when the human body acts;
the relation between the corresponding line segments of different parts when the human body acts comprises the ratio between the lengths of the line segments, whether the line segments are intersected or not and the intersected angle of the line segments, and the human body part line segments are constructed by any two predefined key points of the human body part;
the pointing direction of a finger, which comprises at least one of pointing upward, downward, leftward and rightward.
The position relation between the key points corresponding to different parts and the human body parts corresponding to different parts, and the position relation between those key points and the target area or target object related to the action, can be briefly described as "point vs. box" relations; the moving direction and speed of the key points compared with the previous frame of image data can be briefly described as "point motion"; the distance relation between key points can be briefly described as a "point vs. point" relation; and the relation between the line segments corresponding to different parts can be briefly described as a "segment vs. segment" relation.
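These relation types can be computed with elementary geometry. The following minimal sketch (function names are assumptions) covers the point vs. box, point motion and segment vs. segment relations; the point vs. point relation is shown with the normalization below.

```python
import math

def point_in_box(pt, box) -> bool:
    """'Point vs. box': is a keypoint inside a body part box, target area
    or target object box?"""
    x, y = pt
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def point_motion(pt, prev_pt, dt: float = 1.0):
    """'Point motion': moving direction (degrees, 0 = right, 90 = up) and
    speed of a keypoint compared with the previous frame of image data."""
    dx, dy = pt[0] - prev_pt[0], pt[1] - prev_pt[1]
    speed = math.hypot(dx, dy) / dt
    direction = math.degrees(math.atan2(-dy, dx))  # pixel y-axis points down
    return direction, speed

def segment_angle(a1, a2, b1, b2) -> float:
    """'Segment vs. segment': angle between two body part line segments,
    each built from two predefined keypoints."""
    v1 = (a2[0] - a1[0], a2[1] - a1[1])
    v2 = (b2[0] - b1[0], b2[1] - b1[1])
    cos = ((v1[0] * v2[0] + v1[1] * v2[1])
           / (math.hypot(*v1) * math.hypot(*v2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

print(point_in_box((5, 5), (0, 0, 10, 10)))           # -> True
print(segment_angle((0, 0), (1, 0), (0, 0), (0, 1)))  # -> 90.0
```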
The predefined logical combinations of association relations corresponding to different limb states are organized according to a preset table entry format, where each entry comprises the association relation, the associated parties, the relation type, and the corresponding parameters.
Specifically, as shown in table 1:
TABLE 1
[Table 1 is reproduced as an image (Figure BDA0003692479600000111) in the original publication; it lists each association relation together with its associated parties, relation type, and corresponding parameters.]
In table 1, the X-axis and Y-axis of the picture refer to the pixel coordinates generated by the acquisition device when capturing an image, generally with the upper-left corner of the image as the origin; upward/downward/left/right movement refers to the movement direction relative to the human body; and horizontal/vertical/picture distance refers to movement or position in pixel coordinates.
For the point vs. point relation, the distance between points needs to be normalized. Specifically, the actual distance between two key points is obtained first and then divided by a predefined reference dimension to obtain the normalized distance. Normalization turns the absolute values of physical quantities into relative values, which simplifies computation and reduces the magnitude of the numbers involved.
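A one-function sketch of this normalization; using shoulder width as the reference dimension is an assumption for illustration.

```python
import math

def normalized_distance(p, q, reference: float) -> float:
    """'Point vs. point': pixel distance between two keypoints divided by a
    predefined reference dimension, as described above."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) / reference

# e.g. wrist-to-wrist distance relative to an 80-pixel shoulder width
print(normalized_distance((100, 200), (180, 260), 80.0))  # -> 1.25
```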
Considering the perspective of the camera, the limb state is constructed by additionally combining the body direction on the basis of the association relations.
In a possible implementation manner, after obtaining the body state corresponding to the human body in each frame of image, the method further includes:
determining the human body direction of the human body relative to the acquisition equipment in each frame of image to obtain the limb states corresponding to different human body directions;
wherein the human body direction includes: the human body facing the acquisition device frontally, the human body with its back to the acquisition device, the human body facing the acquisition device at an oblique angle, the human body with its back to the acquisition device at an oblique angle, the left side of the human body facing the acquisition device, and the right side of the human body facing the acquisition device. During human body action recognition, the body direction is obtained by analyzing the image. Different body directions yield different observable limb states; for example, the limb state of the hands cannot be obtained when the hands are not visible to the acquisition device, so the body direction needs to be considered when limb states are constructed in advance.
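For illustration, the six body directions can be modeled as a simple enumeration (the names are assumptions), so that the limb states observable in each direction can be recorded alongside it:

```python
from enum import Enum

class BodyDirection(Enum):
    FRONT = "facing the acquisition device"
    BACK = "back to the acquisition device"
    FRONT_OBLIQUE = "facing the acquisition device at an angle"
    BACK_OBLIQUE = "back to the acquisition device at an angle"
    LEFT_SIDE = "left side facing the acquisition device"
    RIGHT_SIDE = "right side facing the acquisition device"

# e.g. hand states may be unobservable when the hands are occluded
observable = {BodyDirection.FRONT: {"body", "hands"},
              BodyDirection.BACK: {"body"}}
```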
As shown in fig. 4, a limb state may be an AND/OR combination of the above association relations. For example, if the limb state occurs when association A occurs together with association B, or when association A occurs together with association C, the limb state is expressed as AB + AC.
Fig. 4 contains two columns; within each column the association relations are combined with AND, and the columns themselves are combined with OR. The two associations A and B fill the first column and the two associations A and C fill the second column; together they form one limb state, and the different camera directions are likewise combined in an OR relation. The specific relation type, associated parties and parameters of the associations A, B and C need to be tuned according to the actual business scenario.
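A hedged sketch of evaluating such a combination, where each inner list is one AND column of fig. 4 and the outer list is the OR between columns; the representation is an assumption.

```python
def limb_state_holds(columns, active_relations) -> bool:
    """columns: e.g. [["A", "B"], ["A", "C"]] encodes AB + AC;
    active_relations: the association relations detected in the frame."""
    return any(all(r in active_relations for r in col) for col in columns)

# AB + AC holds here because A and C are both detected.
assert limb_state_holds([["A", "B"], ["A", "C"]], {"A", "C"})
```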
The basic action in S203 is constructed as follows.
A basic action is formed by combining persistence, switching and accompanying transition relations of limb states.
In a possible implementation manner, the transition relation corresponding to the at least one limb state includes a persistence mode, in which the multi-frame images correspond to a single limb state, and a state switching mode, in which the multi-frame images correspond to multiple limb states that are switched in a predetermined order;
the transition relation corresponding to the at least one accompanying limb state likewise includes a persistence mode, in which the multi-frame images correspond to a single accompanying limb state, and a state switching mode, in which the multi-frame images correspond to multiple accompanying limb states that are switched in a predetermined order.
As shown in fig. 5, the limb-state row may contain one or more limb states. If only one limb state is included, this is the persistence mode; in fig. 5, only limb state A, such as sitting with the legs crossed, is present during period Ta, where period Ta spans multiple frames of images.
If multiple limb states are included and are switched in a predetermined order, this is the state switching mode. In fig. 5, limb state A holds during period Ta and switches to limb state B during period Tb; a fall, for example, is formed by the transition from a standing state to a fallen state. Tab denotes the process of switching the limb state from A to B.
A row of accompanying states may be added to a persistence or switching pattern of limb states. In fig. 5, during period Ta the legs are crossed, accompanied by the limb state of sitting on a chair: each of the multiple images within Ta contains limb state A together with the corresponding accompanying limb state E. Several limb states may also occur in succession under a single accompanying limb state, as with limb states B and C in fig. 5 corresponding to the same accompanying limb state F. The accompanying limb state can also have a switching mode; in fig. 5, the accompanying limb state switches from E to F over the periods Ta to Tc.
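A hedged sketch of matching these three patterns against a per-frame limb state sequence; the function names and list representation are assumptions.

```python
def persists(states, state, min_frames: int) -> bool:
    """Persistence mode: one limb state held for at least `min_frames`
    consecutive frames (e.g. limb state A throughout period Ta)."""
    run = 0
    for s in states:
        run = run + 1 if s == state else 0
        if run >= min_frames:
            return True
    return False

def switches(states, order) -> bool:
    """Switching mode: the limb states in `order` appear as an ordered
    subsequence of the per-frame states (e.g. A during Ta, then B during Tb)."""
    it = iter(states)
    return all(any(s == target for s in it) for target in order)

def accompanied(main_ok: bool, accompanying_states, required) -> bool:
    """Accompanying mode: the main pattern must hold while the accompanying
    limb state (e.g. sitting on a chair) holds in every frame."""
    return main_ok and all(s == required for s in accompanying_states)

# A fall: the "standing" limb state persists, then switches to "fallen".
frames = ["standing"] * 4 + ["fallen"] * 3
assert persists(frames, "standing", 3)
assert switches(frames, ["standing", "fallen"])
```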
For complex business requirements, coherent, complex actions can be defined through the time-ordered combination of basic actions. The specific construction flow is shown in fig. 6.
For the intuitive convenience of user configuration, an action definition includes the selection of a basic action and the associated modifier constraints on that basic action. The modifier content includes posture, body direction, limbs and the like, and can also include other body parts such as the head and shoulders; it can be set according to the actual situation. The posture can be a human body posture such as squatting, lying, standing, sitting or bending. The body direction can be front, back, side, etc. "Limbs" refers to the four limbs of the human body, such as the left hand, right hand, both hands, left foot, right foot or both feet. Beyond these, other modifier conditions are possible, such as a top-mounted or tilt-mounted camera, or dress such as a police uniform or nurse uniform. For different industries, the same limb state may represent different basic actions: in the service industry, a stooping limb state may represent handing something to a guest, while for a nurse the same stooping state may represent administering an infusion to a patient.
The leg-crossing action is a combination of two basic actions: placing the feet upright and lifting a foot. Specifically, as shown in fig. 7, the basic action of placing the feet upright is constructed first: the sitting posture, the front/side body direction and both feet are selected to form a limb state, from which the basic action is built. Then the basic action of lifting a foot is constructed: the sitting posture and the front/side body direction are selected to form a limb state, from which the basic action is built. Finally, the two basic actions are combined in time order to obtain the custom leg-crossing action.
So that each basic action can carry these modifier states, the set of modifiers each basic action supports must be constrained, and the basic actions are then combined. Customizing actions in this way increases the user's freedom: related actions can be designed from the provided elements, giving reusability, improving the reuse efficiency of the system and reducing maintenance cost.
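The sketch below shows one way the modifier constraints on a basic action could be represented and checked, using the "feet placed upright" action from the leg-crossing example; the field names and layout are assumptions, not the application's configuration format.

```python
from dataclasses import dataclass, field

@dataclass
class BasicActionSpec:
    name: str
    posture: set = field(default_factory=set)     # e.g. {"sitting"}
    directions: set = field(default_factory=set)  # e.g. {"front", "side"}
    limbs: set = field(default_factory=set)       # e.g. {"both_feet"}

    def admits(self, posture: str, direction: str, limb: str) -> bool:
        """A detection satisfies the spec if every configured modifier
        matches; an empty modifier set means 'unconstrained'."""
        return ((not self.posture or posture in self.posture)
                and (not self.directions or direction in self.directions)
                and (not self.limbs or limb in self.limbs))

feet_upright = BasicActionSpec("feet_placed_upright",
                               posture={"sitting"},
                               directions={"front", "side"},
                               limbs={"both_feet"})
assert feet_upright.admits("sitting", "front", "both_feet")
assert not feet_upright.admits("standing", "front", "both_feet")
```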
Based on the same inventive concept, the present application provides a human body motion recognition apparatus 800, as shown in fig. 8, the apparatus includes:
an analysis module 801, configured to receive multiple frames of images including a human body, and analyze an action basic element in each frame of image according to a predefined action basic element constituting an action;
a limb state determining module 802, configured to determine, according to the parsed action basic elements, the association relations among the parts of the human body and between the parts of the human body and other targets, to obtain the limb state of the human body in each frame of image;
a basic action determining module 803, configured to determine, according to predefined transition relations between the limb states corresponding to different basic actions, the basic action matched by the multi-frame images from the transition relations between the limb states corresponding to the sampling times;
and the action determining module 804 is used for obtaining the custom action corresponding to the matched basic action according to the basic action combination corresponding to different custom actions.
In one possible implementation, the parsing module 801 is configured such that the predefined action basic elements constituting an action include at least one of the following:
the human body parts corresponding to different parts of a human body, key points corresponding to different parts when the human body acts, key points corresponding to different positions when the human body parts act, a target area related to the human body action and a target object related to the human body action.
In a possible implementation manner, the limb state determining module 802 is configured to determine, according to the parsed action basic elements, the association relations between the parts of the human body and other targets to obtain the limb state of the human body in each frame of image, which includes:
and determining, according to the parsed action basic elements, the association relations among the parts of the human body, the distance and angle between the parts of the human body and the target object, and the static or dynamic change relation of the human body trajectory.
In one possible implementation, the limb state determining module 802 is configured to train a limb state classification model in advance through different sample images and corresponding limb states;
each frame of image is input into the limb state classification model, the action basic elements in the frame are parsed by the model, and the association relations among the parts of the human body and between those parts and other targets are determined from the parsed elements, to obtain the limb state of the human body in each frame of image.
In a possible implementation manner, the limb state determining module 802 is configured to determine, according to the parsed action basic elements, the association relations between the parts of the human body and other targets to obtain the limb state of the human body in each frame of image, which includes:
acquiring predefined association relations between various parts of the human body corresponding to different limb states and logic combinations of the association relations between the various parts of the human body and other targets;
and determining the association relationship among all parts of the human body and the corresponding logic combination of the association relationship between all parts of the human body and other targets according to the analyzed action basic elements to obtain the body state corresponding to the human body in each frame of image.
In a possible implementation, the limb state determining module 802 is configured such that the logical combination corresponding to the association relations between the parts of the human body and other targets includes at least one of:
the position relation between key points corresponding to different parts and human body parts corresponding to different parts of the human body when the human body acts;
the position relation between key points corresponding to different parts and the target area related to the human body action or the target object related to the human body action when the human body acts;
compared with the previous frame of image data, the moving direction and the moving speed of key points corresponding to different parts when the human body acts;
distance relations between key points corresponding to different parts when the human body acts;
the relation between the line segments corresponding to different parts when the human body acts comprises the ratio of the length of the line segments, whether the line segments are intersected or not and the intersected angle of the line segments, wherein the line segments of the human body parts are constructed by any two predefined key points of the human body parts;
the pointing direction of the finger comprises at least one of upward pointing, downward pointing, leftward pointing and rightward pointing.
In a possible implementation, the limb state determining module 802 is further configured, after obtaining the limb state of the human body in each frame of image, to perform:
determining the human body direction of the human body relative to the acquisition equipment in each frame of image to obtain the limb states corresponding to different human body directions;
wherein the human body direction includes: the human body facing the acquisition device frontally, the human body with its back to the acquisition device, the human body facing the acquisition device at an oblique angle, the human body with its back to the acquisition device at an oblique angle, the left side of the human body facing the acquisition device, and the right side of the human body facing the acquisition device.
In a possible implementation, the basic action determining module 803 is configured such that the predefined transition relations between the limb states corresponding to different basic actions include at least one of the following:
predefining, for each basic action, at least one limb state and the transition relation corresponding to the at least one limb state, and at least one accompanying limb state and the transition relation corresponding to the at least one accompanying limb state;
wherein the limb state is obtained by analyzing the association relations among the parts of the human body, and the accompanying limb state is obtained by analyzing the distance and angle between the parts of the human body and the target object and the static or dynamic change relation of the human body trajectory.
In a possible implementation manner, the basic action determining module 803 is configured such that the transition relation corresponding to the at least one limb state includes a persistence mode, in which the multi-frame images correspond to a single limb state, and a state switching mode, in which the multi-frame images correspond to multiple limb states that are switched in a predetermined order;
the transition relation corresponding to the at least one accompanying limb state likewise includes a persistence mode, in which the multi-frame images correspond to a single accompanying limb state, and a state switching mode, in which the multi-frame images correspond to multiple accompanying limb states that are switched in a predetermined order.
In one possible implementation, the action determining module 804 is configured to determine the basic action combinations corresponding to different custom actions as follows:
determining at least one basic action corresponding to the custom action, and combining the at least one basic action in time order to obtain the basic action combination corresponding to the custom action.
Based on the same inventive concept, the present application provides a human motion recognition apparatus, the apparatus including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as any one of the human action recognition methods in the embodiments of the present application.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 9. The electronic device 130 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 9, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that couples various system components including the memory 132 and the processor 131.
The processor 131 is configured to read and execute the instructions in the memory 132, so that the at least one processor can execute the human body motion recognition method provided in the foregoing embodiments.
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of a human body motion recognition method provided by the present application may also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps of a human body motion recognition according to various exemplary embodiments of the present application described above in this specification, when the program product is run on the computer device.
In addition, the present application also provides a computer-readable storage medium storing a computer program for causing a computer to execute the method of any one of the above embodiments.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (13)

1. A human body action recognition method is characterized by comprising the following steps:
receiving multiple frames of images including a human body, and parsing the action basic elements in each frame of image according to predefined action basic elements that constitute actions;
determining the association relations among the parts of the human body and between the parts of the human body and other targets according to the parsed action basic elements, to obtain the limb state of the human body in each frame of image;
according to predefined transition relations between the limb states corresponding to different basic actions, determining the basic action matched by the multi-frame images from the transition relations between the limb states corresponding to the sampling times;
and obtaining the custom action corresponding to the matched basic actions according to the basic action combinations corresponding to different custom actions.
2. The method of claim 1, wherein the predefined action base elements that make up the action comprise at least one of:
the human body parts corresponding to different parts of the human body, the key points corresponding to different parts when the human body acts, the key points corresponding to different positions when the human body parts act, the target area related to the human body action and the target object related to the human body action.
3. The method according to claim 1, wherein determining the association relationship between each part of the human body and other targets according to the analyzed action basic elements to obtain the corresponding limb state of the human body in each frame of image comprises:
and determining the association relationship among all parts of the human body, the distance and the angle between all parts of the human body and the target object, and the static relationship or the dynamic change relationship of the human body track according to the analyzed action basic elements.
4. The method of any one of claims 1 to 3, further comprising:
training a limb state classification model through different sample images and corresponding limb states in advance;
the method comprises the steps of inputting each frame of image into a limb state classification model, analyzing action basic elements in each frame of image through the limb state classification model, and determining the incidence relation among all parts of the human body and the incidence relation between all parts of the human body and other targets according to the analyzed action basic elements to obtain the corresponding limb state of the human body in each frame of image.
5. The method according to any one of claims 1 to 3, wherein determining the association relations among the parts of the human body and between the parts of the human body and other targets according to the analyzed action basic elements, to obtain the limb state corresponding to the human body in each frame of image, comprises:
acquiring predefined logical combinations, corresponding to different limb states, of the association relations among the parts of the human body and the association relations between the parts of the human body and other targets;
and determining, according to the analyzed action basic elements, the logical combination to which the association relations among the parts of the human body and between the parts of the human body and other targets correspond, to obtain the limb state corresponding to the human body in each frame of image.
6. The method according to claim 5, wherein the logical combination of the association relations among the parts of the human body and between the parts of the human body and other targets comprises at least one of the following:
a position relation between the key points corresponding to different parts during an action and the human body parts corresponding to different parts of the human body;
a position relation between the key points corresponding to different parts during an action and the target area related to the human body action or the target object related to the human body action;
a moving direction and a moving speed, relative to the previous frame of image data, of the key points corresponding to different parts during an action;
a distance relation between the key points corresponding to different parts during an action;
a relation between the line segments corresponding to different parts during an action, comprising the ratio of the segment lengths, whether the segments intersect, and the angle at which they intersect, wherein a line segment of a human body part is constructed from any two predefined key points of that part;
and a pointing direction of a finger, the pointing direction comprising at least one of pointing upward, pointing downward, pointing leftward and pointing rightward.
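To make the notion of a logical combination concrete, here is a sketch that AND-combines three of the relations above into one limb-state rule; every predicate and threshold is an assumption for demonstration.

```python
# Illustrative logical combination: wrist above shoulder AND person inside
# the target area AND forearm roughly vertical. All thresholds are assumptions.
import math

def above(p, q):
    return p[1] < q[1]  # image y axis points downward

def within(p, box):
    x, y, w, h = box
    return x <= p[0] <= x + w and y <= p[1] <= y + h

def segment_angle(a, b):
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

def is_hand_raised(kp, area_box):
    forearm = abs(segment_angle(kp["elbow"], kp["wrist"]))
    return (above(kp["wrist"], kp["shoulder"])      # key point position relation
            and within(kp["ankle"], area_box)       # relation to the target area
            and 60 <= forearm <= 120)               # line segment angle relation

kp = {"wrist": (0.52, 0.20), "shoulder": (0.50, 0.40),
      "elbow": (0.51, 0.32), "ankle": (0.50, 0.95)}
print(is_hand_raised(kp, (0.0, 0.0, 1.0, 1.0)))  # -> True
```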
7. The method of claim 2, after obtaining the limb state corresponding to the human body in each frame of image, further comprising:
determining the direction of the human body in each frame of image relative to the acquisition device, to obtain the limb states corresponding to different human body directions;
wherein the human body direction comprises: the human body frontally facing the acquisition device, the human body with its back to the acquisition device, the human body obliquely facing the acquisition device, the human body obliquely with its back to the acquisition device, the left side of the human body facing the acquisition device, and the right side of the human body facing the acquisition device.
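A hypothetical encoding of these six directions follows, with a naive shoulder-width heuristic for guessing the direction; the heuristic is an assumption, not the claimed method.

```python
# Hypothetical encoding of the six human body directions, with a naive
# shoulder-width heuristic for guessing the direction; both are assumptions.
from enum import Enum, auto

class BodyDirection(Enum):
    FRONTAL = auto()        # frontally facing the acquisition device
    BACK = auto()           # back to the acquisition device
    OBLIQUE_FRONT = auto()  # obliquely facing the device
    OBLIQUE_BACK = auto()   # obliquely back to the device
    LEFT_SIDE = auto()      # left side toward the device
    RIGHT_SIDE = auto()     # right side toward the device

def guess_direction(l_shoulder_x, r_shoulder_x, facing_camera):
    width = abs(l_shoulder_x - r_shoulder_x)
    if width < 0.05:  # shoulders overlap -> profile view
        return BodyDirection.LEFT_SIDE if l_shoulder_x < r_shoulder_x else BodyDirection.RIGHT_SIDE
    if width < 0.15:  # foreshortened shoulders -> oblique view
        return BodyDirection.OBLIQUE_FRONT if facing_camera else BodyDirection.OBLIQUE_BACK
    return BodyDirection.FRONTAL if facing_camera else BodyDirection.BACK

print(guess_direction(0.40, 0.60, facing_camera=True))  # -> BodyDirection.FRONTAL
```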
8. The method according to claim 3, wherein the predefined transition relations between the limb states corresponding to different basic actions comprise at least one of:
at least one limb state predefined for each basic action and the transition relation corresponding to the at least one limb state, and at least one accompanying limb state and the transition relation corresponding to the at least one accompanying limb state;
wherein the limb state is obtained by analyzing the association relations among the parts of the human body, and the accompanying limb state is obtained by analyzing the distances and angles between the parts of the human body and the object and the static relation or dynamic change relation of the human body trajectory.
9. The method of claim 8,
the transition relation corresponding to the at least one limb state comprises a persistence mode, in which the plurality of frames of images correspond to a single limb state, and a state transition mode, in which, when the plurality of frames of images correspond to a plurality of limb states, the limb states switch in a preset order;
and the transition relation corresponding to the at least one accompanying limb state comprises a persistence mode, in which the plurality of frames of images correspond to a single accompanying limb state, and a state transition mode, in which, when the plurality of frames of images correspond to a plurality of accompanying limb states, the accompanying limb states switch in a preset order.
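A sketch of a matcher covering both modes: a preset sequence of length one realizes the persistence mode, while a longer preset sequence must be switched through in order. The matcher itself is an assumed illustration.

```python
# Illustrative matcher for claim 9: persistence mode (one state held across
# frames) and state transition mode (states switched in a preset order).

def matches(observed_states, preset_sequence):
    """True if the observed per-frame limb states realize the preset
    sequence, each state persisting over any number of consecutive frames."""
    if not observed_states or observed_states[0] != preset_sequence[0]:
        return False
    i = 0
    for state in observed_states:
        if state == preset_sequence[i]:
            continue                              # current state persists
        if i + 1 < len(preset_sequence) and state == preset_sequence[i + 1]:
            i += 1                                # switch to the next state
            continue
        return False                              # out-of-order state
    return i == len(preset_sequence) - 1          # reached the final state

# persistence mode: one state over all frames
print(matches(["sit", "sit", "sit"], ["sit"]))                             # True
# state transition mode: states switch in the preset order
print(matches(["sit", "sit", "stand", "walk"], ["sit", "stand", "walk"]))  # True
print(matches(["stand", "sit"], ["sit", "stand"]))                         # False
```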
10. The method of claim 1, wherein the basic action combinations corresponding to different custom actions are determined as follows:
determining at least one basic action corresponding to the custom action, and combining the at least one basic action in time order to obtain the basic action combination corresponding to the custom action.
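A minimal sketch of this time-ordered combination lookup; the action names and the combination table are illustrative assumptions.

```python
# Sketch of claim 10: a custom action is a time-ordered combination of
# basic actions; names and the combination table are assumptions.

custom_actions = {
    # custom action -> basic actions in the order they must occur
    "pick_up_and_drink": ["reach_out", "grasp_cup", "raise_cup"],
    "wave_goodbye": ["raise_hand", "swing_hand"],
}

def match_custom(detected_basic_actions):
    """detected_basic_actions: basic actions in detection (time) order."""
    for custom, combo in custom_actions.items():
        if detected_basic_actions == combo:
            return custom
    return None

print(match_custom(["raise_hand", "swing_hand"]))  # -> wave_goodbye
```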
11. A human body action recognition apparatus, characterized by comprising:
an analysis module, configured to receive a plurality of frames of images including a human body and analyze the action basic elements in each frame of image according to predefined action basic elements that constitute an action;
a limb state determining module, configured to determine the association relations among the parts of the human body and the association relations between the parts of the human body and other targets according to the analyzed action basic elements, to obtain the limb state corresponding to the human body in each frame of image;
a basic action determining module, configured to determine, according to predefined transition relations between the limb states corresponding to different basic actions, the basic action matched by the plurality of frames of images based on the transition relations between the limb states corresponding to successive sampling times;
and an action determining module, configured to obtain the custom action corresponding to the matched basic action according to the basic action combinations corresponding to different custom actions.
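A hypothetical wiring of the four claimed modules into one recognizer; the module internals are stubs into which sketches like the ones above could be plugged.

```python
# Hypothetical wiring of the four modules of claim 11; each callable is an
# assumed stand-in for the corresponding module's implementation.

class HumanActionRecognizer:
    def __init__(self, parse, derive_state, match_basic, match_custom):
        self.parse = parse                # analysis module
        self.derive_state = derive_state  # limb state determining module
        self.match_basic = match_basic    # basic action determining module
        self.match_custom = match_custom  # action determining module

    def recognize(self, frames):
        elements = [self.parse(f) for f in frames]          # per-frame basic elements
        states = [self.derive_state(e) for e in elements]   # per-frame limb states
        basics = self.match_basic(states)                   # matched basic actions
        return self.match_custom(basics)                    # resulting custom action
```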
12. A human body action recognition device, characterized in that the device comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
13. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-10.
CN202210668939.XA 2022-06-14 2022-06-14 Human body action recognition method, device and equipment Pending CN115019343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210668939.XA CN115019343A (en) 2022-06-14 2022-06-14 Human body action recognition method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210668939.XA CN115019343A (en) 2022-06-14 2022-06-14 Human body action recognition method, device and equipment

Publications (1)

Publication Number Publication Date
CN115019343A true CN115019343A (en) 2022-09-06

Family

ID=83075431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210668939.XA Pending CN115019343A (en) 2022-06-14 2022-06-14 Human body action recognition method, device and equipment

Country Status (1)

Country Link
CN (1) CN115019343A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661944A (en) * 2022-12-29 2023-01-31 Zhejiang Dahua Technology Co., Ltd. Motion recognition method, electronic device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
Chen et al. A survey of depth and inertial sensor fusion for human action recognition
CN1960674B (en) System and method for ergonomic tracking for individual physical exertion
JP2018077882A (en) Method and system for operation environment having multiple client devices and displays
CN103930944B (en) Adaptive tracking system for space input equipment
Datcu et al. On the usability and effectiveness of different interaction types in augmented reality
CN103988150A (en) Fast fingertip detection for initializing vision-based hand tracker
WO2005124604A1 (en) System and method for simulating human movement using profile paths
CN107930048B (en) Space somatosensory recognition motion analysis system and motion analysis method
CN106201173A (en) The interaction control method of a kind of user's interactive icons based on projection and system
CN104881526A (en) Article wearing method and glasses try wearing method based on 3D (three dimensional) technology
WO2021097750A1 (en) Human body posture recognition method and apparatus, storage medium, and electronic device
CN115019343A (en) Human body action recognition method, device and equipment
US20220198774A1 (en) System and method for dynamically cropping a video transmission
CN112614234A (en) Method for editing mixed reality three-dimensional scene and mixed reality equipment
Fender et al. SpaceState: Ad-Hoc definition and recognition of hierarchical room states for smart environments
Shipley et al. Markerless motion-capture for point-light displays
Soroni et al. Hand Gesture Based Virtual Blackboard Using Webcam
CN110517298A (en) Path matching method and apparatus
JPWO2016021152A1 (en) Posture estimation method and posture estimation apparatus
CN112613490B (en) Behavior recognition method and device, machine readable medium and equipment
Narang et al. Generating virtual avatars with personalized walking gaits using commodity hardware
Varga et al. Survey and investigation of hand motion processing technologies for compliance with shape conceptualization
Hwang et al. 2D and 3D full-body gesture database for analyzing daily human gestures
Günther et al. MAPVI: meeting accessibility for persons with visual impairments.
Khan et al. Internet of Things prototyping for cultural heritage dissemination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination