CN111401318B - Action recognition method and device

Info

Publication number: CN111401318B
Authority: CN (China)
Prior art keywords: action, motion, recognition, virtual object, dimensional virtual
Legal status: Active
Application number: CN202010292042.2A
Original language: Chinese (zh)
Other versions: CN111401318A (application publication)
Inventors: 周明才, 周大江, 朱世艾, 杜志军
Current and original assignee: Alipay Hangzhou Information Technology Co Ltd
Legal events:
    • Application filed by Alipay Hangzhou Information Technology Co Ltd
    • Priority to CN202010292042.2A
    • Publication of CN111401318A (application)
    • Application granted; publication of CN111401318B (grant)

Classifications

    • G06V 40/11: Hand-related biometrics; hand pose recognition (under G06V 40/00 recognition of biometric, human-related or animal-related patterns in image or video data; G06V 40/10 human or animal bodies; G06V 40/107 static hand or arm)
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06V 10/267: Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

This specification provides an action recognition method and apparatus. The action recognition method includes: acquiring an image frame captured by an image acquisition device; segmenting the action region of a target object in the image frame to obtain an intermediate image; inputting the intermediate image into an action recognition model for key point recognition and coordinate mapping, obtaining the action skeleton nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points; generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong; and performing action recognition on the three-dimensional virtual object based on an action recognition data set to determine the action type of the target object.

Description

Action recognition method and device
Technical Field
The present specification relates to the field of gesture recognition technologies, and in particular to an action recognition method. The present specification also relates to an action recognition apparatus, a computing device, and a computer-readable storage medium.
Background
With the development of Internet technology, AR interaction based on gesture recognition has been widely applied on mobile terminals: a user can operate applications on a mobile terminal through gestures without touching the device. However, gesture recognition currently requires either collecting a large number of gesture images in advance as training samples, training a gesture detector offline, and deploying the detector online, or recognizing two-dimensional hand key points to construct a two-dimensional gesture image and recognizing the gesture from that image. Whether gesture recognition is realized through a detector or through a constructed two-dimensional image, it suffers from high cost, low universality and low flexibility; moreover, because of the high degree of freedom of the human hand, the same gesture may appear as different gesture types under different viewing angles, which further affects recognition accuracy.
Disclosure of Invention
In view of this, the embodiments of the present specification provide an action recognition method. The present specification also relates to an action recognition apparatus, a computing device, and a computer-readable storage medium, which solve the technical problems in the prior art.
According to a first aspect of embodiments herein, there is provided an action recognition method including:
acquiring an image frame captured by an image acquisition device;
segmenting the action region of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model for key point recognition and coordinate mapping, obtaining action skeleton nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped from the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong;
and performing action recognition on the three-dimensional virtual object based on an action recognition data set, and determining the action type of the target object.
Optionally, the action recognition model performing key point recognition includes:
identifying the action key points corresponding to the target object in the intermediate image, and determining key point labels of the action key points;
determining a target node label corresponding to each key point label based on the correspondence between key point labels and the node labels of the action skeleton nodes;
and determining, according to the target node label, the action skeleton node corresponding to the action key point to which the key point label belongs.
Optionally, the action recognition model performing coordinate mapping includes:
determining position information of each action key point in the intermediate image;
and mapping the three-dimensional coordinate information corresponding to the action key points based on the position information.
Optionally, the generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong includes:
determining the correspondence between the action skeleton nodes and the three-dimensional coordinate information according to the action key points, and determining the connection relations of the action skeleton nodes based on the virtual action skeleton;
and connecting the three-dimensional coordinate information according to the connection relations to generate the three-dimensional virtual object.
Optionally, after the step of generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong, and before the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object, the method further includes:
detecting the number of generated three-dimensional virtual objects;
in the case that the number of objects is greater than a preset number threshold, normalizing the plurality of three-dimensional virtual objects corresponding to the number of objects to obtain candidate virtual objects;
and selecting a target virtual object from the candidate virtual objects as the three-dimensional virtual object.
Optionally, before the step of acquiring the image frame captured by the image acquisition device, the method further includes:
receiving a click instruction submitted by a user through an action interaction page;
displaying at least one action image frame to the user according to the click instruction, where the action image frame contains a display area corresponding to a display action.
Optionally, the performing action recognition on the three-dimensional virtual object based on the action recognition data set to determine the action type of the target object includes:
receiving the action recognition data set issued by the server for the action image frame, where the action recognition data set carries an action recognition rule matched with the display action;
judging whether the action of the three-dimensional virtual object matches the display action according to the action recognition rule;
and if so, determining the display action type of the display action according to the action recognition data set, and taking the display action type as the action type.
Optionally, after the sub-step of determining the display action type of the display action according to the action recognition data set and taking the display action type as the action type, the method further includes:
displaying, to the user through the action interaction page, recommendation information matched with the action type.
Optionally, after the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object, the method further includes:
matching the action type against the display action type of the display action;
if the matching succeeds, displaying matching-success information to the user through the action interaction page;
and if the matching fails, displaying reminder information to the user through the action interaction page, the reminder information carrying an action strategy.
Optionally, when the target object is a hand, the segmenting the action region of the target object in the image frame to obtain an intermediate image includes:
detecting a feature region corresponding to the hand in the image frame;
cropping the image frame according to the feature region to obtain the intermediate image containing the hand features;
correspondingly, the action type is a gesture action type.
Optionally, the performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object includes:
parsing the action recognition data set to obtain action elements;
determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements;
and determining the intermediate action type as the action type of the target object.
Optionally, the determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements includes:
calculating the action angles between the virtual sub-objects in the three-dimensional virtual object;
and detecting the action angles based on the action elements, and determining the intermediate action type of the three-dimensional virtual object according to the detection result.
Optionally, the determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements includes:
detecting the action position of each virtual sub-object in the three-dimensional virtual object according to the action elements;
and determining the intermediate action type of the three-dimensional virtual object based on the detection result.
Optionally, the action recognition data set is composed of action recognition packets issued by a server, where each action recognition packet includes a recognition rule for recognizing the three-dimensional virtual object.
According to a second aspect of embodiments herein, there is provided an action recognition apparatus comprising:
an acquisition module configured to acquire an image frame captured by an image acquisition device;
a processing module configured to segment the action region of a target object in the image frame to obtain an intermediate image;
a recognition module configured to input the intermediate image into an action recognition model for key point recognition and coordinate mapping, obtaining action skeleton nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped from the action key points;
a generating module configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong;
and a determination module configured to perform action recognition on the three-dimensional virtual object based on an action recognition data set and determine the action type of the target object.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring an image frame acquired by image acquisition equipment;
performing segmentation processing on the action area of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model to perform key point recognition and coordinate mapping, and obtaining action frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
and performing motion recognition on the three-dimensional virtual object based on a motion recognition data set, and determining the motion type of the target object.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the action recognition method.
According to the action recognition method provided by the embodiments of this specification, in the process of recognizing an action posed by a user, image segmentation is performed on the acquired image frame to obtain an intermediate image containing the target object; the intermediate image is input into the action recognition model to obtain action skeleton nodes and three-dimensional coordinate information; and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong. Recognizing the action type through the generated three-dimensional virtual object effectively avoids the problem that image frames captured from different acquisition angles are insufficiently standard and degrade recognition accuracy. At the same time, because the action type of the target object is determined by performing action recognition on the three-dimensional virtual object against the action recognition data set, the method is more universal, more flexible and more extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new action type can be recognized simply by extending the action recognition data set, so the applicable scenarios become much wider.
Drawings
Fig. 1 is a flowchart of an action recognition method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an intermediate image in an action recognition method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of an action skeleton in an action recognition method according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a three-dimensional virtual object in an action recognition method according to an embodiment of the present specification;
Fig. 5 is a flowchart illustrating an action recognition method applied in a gesture recognition scenario according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an action recognition device according to an embodiment of the present disclosure;
Fig. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. This description may be implemented in many ways other than those specifically set forth herein, and those skilled in the art will appreciate that the present description is susceptible to similar generalizations without departing from the scope of the description, and thus is not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present description, a first may be termed a second and, similarly, a second may be termed a first. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
In the present specification, an action recognition method is provided, and the present specification also relates to an action recognition apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Fig. 1 shows a flowchart of an action recognition method provided in an embodiment of the present specification, which specifically includes the following steps:
Step 102: acquire an image frame captured by the image acquisition device.
In practical applications, corresponding services are provided to users by recognizing the gestures they pose, so that a service platform lets users obtain services through convenient operations, which effectively improves the user experience; the accuracy of gesture recognition is the basis supporting such services. Gesture recognition usually either trains a gesture detector or recognizes two-dimensional hand key points. Although both reach the goal of recognition, they have defects in flexibility and accuracy: when the collected gesture image is affected by the acquisition angle, the accuracy of gesture recognition drops considerably, and when a new gesture is added, the gesture detector must be retrained before it can recognize the new gesture, so the limitation to the scenario is large.
In order to improve the accuracy of action recognition and still be able to recognize newly added actions, in the process of recognizing an action posed by a user, image segmentation is performed on the acquired image frame to obtain an intermediate image containing the target object; the intermediate image is input into the action recognition model to obtain action skeleton nodes and three-dimensional coordinate information; and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong. Recognizing the action type through the generated three-dimensional virtual object effectively avoids the problem that image frames captured from different acquisition angles are insufficiently standard and degrade recognition accuracy; meanwhile, performing action recognition on the three-dimensional virtual object against the action recognition data set to determine the action type of the target object makes the method more universal, flexible and extensible in action recognition scenarios: a new action type does not require retraining the model, so the applicable scenarios become much wider.
In specific implementations, the action recognition method provided by this specification can recognize gesture actions: for example, a user poses the number "8" with a hand, and by recognizing the user's hand pose the gesture type is recognized as the number type "8". It can also recognize limb actions: for example, a user poses the letter "Y" with the body, and by recognizing the user's limb action the limb action is recognized as the letter type "Y". In addition, leg actions can be recognized: for example, a user poses the Chinese character "人" (person) with the legs, and the user's leg action can be recognized as the Chinese character type "人".
Based on the above, action recognition is to recognize the user's action type through the posed action of the hand, limbs or legs; correspondingly, the target object is the user's hand, limbs or legs. When the target object is a hand, the action recognition method recognizes the type of the user's hand pose, and the recognized action type is the type of the gesture the user poses with the hand, such as the number "8" or the number "1". When the target object is a limb, the action recognition method recognizes the type of the user's limb pose, and the recognized action type is the type of the pose the user makes with the limbs, such as the letter "Y" or a Chinese character.
In this embodiment, the action recognition method is described with the target object being the user's hand; for the case where the target object is a limb or a leg, reference may be made to the corresponding description of this embodiment, which is not repeated here. It should be noted that the specific descriptions may be referred to each other regardless of whether the target object is a limb or a hand.
Specifically, the image acquisition device refers to a device that captures images of the action posed by the user, and may be a mobile phone or a camera; correspondingly, an image frame refers to an image obtained by capturing the user's posed action. When the captured image frame contains the user's posed action, subsequent action recognition can be performed. Alternatively, a large number of image frames may be captured by the image acquisition device, and an image containing the user's posed action is selected as the image frame for subsequent action recognition.
In addition, before action recognition is performed, the user needs to be informed of the action types that can be recognized. For example, in a payment scenario, the gestures that can be recognized are "1" and "2": gesture "1" represents paying the amount and gesture "2" represents abandoning the payment. In this case the payment platform must first inform the user of the gestures it can recognize, and only then recognize the gesture the user poses as "1" or "2" in order to carry out subsequent operations. Based on this, the action the user needs to pose is shown to the user through an action interaction page. In one or more implementations of this embodiment, a specific implementation is as follows:
receiving a click instruction submitted by the user through the action interaction page;
displaying at least one action image frame to the user according to the click instruction, where the action image frame contains a display area corresponding to a display action.
In practical applications, the action interaction page refers to a page displayed to the user through a terminal device on which action interaction can be performed, i.e., the user's posed action is recognized. Based on this, when a click instruction submitted by the user through the action interaction page is received, it can be determined that the user needs to obtain the corresponding service by posing the corresponding action, and at least one action image frame is displayed to the user according to the click instruction. The action image frame contains a display area corresponding to a display action; the display action is the action the user is to be informed to pose, and can be a gesture action or a limb action; correspondingly, the display area is the area in the action image frame corresponding to the display action.
For example, in an applet selection scenario, after the user submits a click instruction through the action interaction page corresponding to the applet selection item, gesture action image frames corresponding to the applets can be displayed to the user through the page, and when the user poses a gesture identical to the display action in one of the gesture action image frames, the page jumps to the corresponding applet page. Based on this, after the user submits the click instruction, action image frames corresponding to applets A, B and C are displayed according to the instruction, containing display actions a, b and c respectively. When the user poses a hand action, the hand action is recognized and the page jumps to the applet corresponding to the recognition result; if the recognition result is inconsistent with all three actions a, b and c, no processing is performed.
Before the image frame is acquired, the action image frame whose display area contains the display action is shown to the user, informing the user of the action types that can be recognized. This makes it convenient for the user to pose the corresponding action, so that accurate action recognition can be performed and the corresponding service provided, effectively improving the user experience.
Step 104: segment the action region of the target object in the image frame to obtain an intermediate image.
Specifically, on the basis of the acquired image frame, the image frame is further segmented to obtain an intermediate image containing the action region of the target object, and the subsequent action recognition process is performed on that intermediate image.
Further, when the target object is a hand, the intermediate image containing the hand, obtained by segmenting or cropping the feature region corresponding to the hand, is used for the subsequent recognition of the hand action type. In one or more implementations of this embodiment, a specific implementation is as follows:
detecting a feature region corresponding to the hand in the image frame;
cropping the image frame according to the feature region to obtain the intermediate image containing the hand features;
correspondingly, the action type is a gesture action type.
In practical applications, after the image frame is acquired, the hand in the image frame needs to be detected. After the feature region corresponding to the hand is detected in the image frame, the image frame is cropped according to the feature region, and the intermediate image containing the hand features is obtained for the subsequent recognition of the hand action type.
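To make the cropping step concrete, the following is a minimal Python sketch of cutting the detected feature region out of a frame. The bounding box is assumed to come from any off-the-shelf hand detector, and the margin expansion is an illustrative choice rather than something this specification prescribes:

    import numpy as np

    def crop_hand_region(frame: np.ndarray, box: tuple, margin: float = 0.2) -> np.ndarray:
        """Crop the detected hand feature region out of a full image frame.

        `box` is an (x, y, w, h) bounding box from any hand detector; it is
        expanded by `margin` on each side so the whole hand stays in the crop.
        """
        x, y, w, h = box
        dx, dy = int(w * margin), int(h * margin)
        img_h, img_w = frame.shape[:2]
        x0, y0 = max(0, x - dx), max(0, y - dy)
        x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
        return frame[y0:y1, x0:x1]  # the intermediate image containing the hand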
Step 106: input the intermediate image into the action recognition model for key point recognition and coordinate mapping, and obtain the action skeleton nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points.
Specifically, on the basis of the intermediate image obtained by segmentation, the action key points corresponding to the target object in the intermediate image are recognized and coordinate-mapped: the intermediate image is input into the action recognition model for key point recognition and coordinate mapping, obtaining the action key points corresponding to the target object in the intermediate image output by the action recognition model, the action skeleton nodes corresponding to the action key points, and the three-dimensional coordinate information mapped from the action key points.
In practical applications, action key points refer to the recognizable key points in the action posed by the user; each action key point is subsequently used for action recognition of the target object. Action skeleton nodes refer to the nodes contained in the virtual action skeleton; correspondingly, the virtual action skeleton can be understood as a three-dimensional "model", i.e., a model framework in three-dimensional space, which can be adjusted through the three-dimensional coordinate information to construct the three-dimensional virtual object, improving action recognition accuracy. The three-dimensional virtual object is a three-dimensional "model" consistent with the action of the target object.
Referring to fig. 2, when the target object is a hand, the intermediate image shown in (a) of fig. 2 is obtained by segmenting the image frame; the intermediate image contains the feature region corresponding to the hand. The image is input into the action recognition model for key point recognition and coordinate mapping to obtain the recognized action key points of the hand. A hand has 21 recognizable action key points, distributed as shown in (b) of fig. 2; key point recognition on the intermediate image of (a) in fig. 2 yields these 21 action key points, namely action key point 0, action key point 1, ..., action key point 20, and the virtual action skeleton contains 21 action skeleton nodes, namely skeleton node a, skeleton node b, ..., skeleton node u; the action recognition model can determine the action skeleton node corresponding to each recognized action key point.
Based on this, to improve the accuracy of subsequent action recognition, a three-dimensional virtual object needs to be constructed in three-dimensional space. The three-dimensional coordinate information mapped from each action key point is determined, and the three-dimensional virtual object is then generated according to the three-dimensional coordinate information and the action skeleton nodes corresponding to the action key points.
In one or more implementations of this embodiment, the action recognition model performing key point recognition includes:
identifying the action key points corresponding to the target object in the intermediate image, and determining key point labels of the action key points;
determining a target node label corresponding to each key point label based on the correspondence between key point labels and the node labels of the action skeleton nodes;
and determining, according to the target node label, the action skeleton node corresponding to the action key point to which the key point label belongs.
In one or more implementations of this embodiment, the action recognition model performing coordinate mapping includes:
determining position information of each action key point in the intermediate image;
and mapping the three-dimensional coordinate information corresponding to the action key points based on the position information.
In practical applications, in the process of key point recognition and coordinate mapping by the action recognition model, all action key points corresponding to the target object are identified in the intermediate image and their key point labels are determined; then the target node label corresponding to each key point label is determined according to the pre-established correspondence between key point labels and the node labels of the action skeleton nodes; finally, the action skeleton node corresponding to the action key point to which each key point label belongs can be determined according to the target node label.
Based on this, after each action key point is recognized, its position information in the intermediate image is determined, and the three-dimensional coordinate information corresponding to the action key point is then mapped from that position information. Mapping three-dimensional coordinate information from an action key point can be understood as configuring a depth coordinate for each action key point according to its position information.
Referring to fig. 2, after the intermediate image shown in (a) of fig. 2 is input into the action recognition model, the 21 action key points corresponding to the hand are recognized in the intermediate image, and their key point labels are determined to be label 0, label 1, ..., label 20; the node labels of the action skeleton nodes are label a, label b, ..., label u. The node label corresponding to each key point label is determined based on the correspondence between key point labels and node labels, and the action skeleton node corresponding to each action key point can be determined from the node labels: action key point 0 corresponds to skeleton node a, action key point 1 corresponds to skeleton node b, ..., and action key point 20 corresponds to skeleton node u.
based on this, it is determined that the position information of each action key point in the intermediate image is (x 1, y 1) corresponding to the action key point 0, the position information of the action key point 1 is (x 2, y 2) \8230, (x 20, y 20) corresponding to the action key point 20, three-dimensional coordinate information is mapped for each action key point based on the position information, (x 1, y1, z 1) corresponding to the action key point 0, (x 2, y2, z 2) } 8230for the action key point 1, and (x 20, y20, z 20) corresponding to the action key point 20, respectively, for subsequently constructing a three-dimensional virtual object, i.e., a three-dimensional hand model corresponding to the hand.
In practical applications, the process of mapping the three-dimensional coordinate information corresponding to the action key points from their position information may be as follows: first, a two-dimensional coordinate system is built on the intermediate image and the two-dimensional coordinate information of each action key point in that system is determined; then the transformation relation between the two-dimensional coordinate system and a three-dimensional coordinate system built on the virtual action skeleton is determined; finally, the three-dimensional coordinate information into which the two-dimensional coordinate information maps in the three-dimensional coordinate system can be determined from the transformation relation and used as the three-dimensional coordinate information mapped from each action key point.
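As a rough Python sketch of the label correspondence and the two-dimensional-to-three-dimensional mapping described above (the 0-to-a, 1-to-b, ..., 20-to-u label scheme follows the example, and the per-point depths stand in for whatever the action recognition model or the transformation relation actually supplies):

    import numpy as np

    # Assumed correspondence between key point labels 0..20 and skeleton
    # node labels a..u, following the example above.
    NODE_LABELS = [chr(ord("a") + i) for i in range(21)]

    def lift_keypoints(points_2d: np.ndarray, depths: np.ndarray) -> dict:
        """Map the 21 two-dimensional key points (shape (21, 2)) plus
        per-point depth coordinates (shape (21,)) to the three-dimensional
        coordinate information of each skeleton node."""
        coords_3d = np.concatenate([points_2d, depths[:, None]], axis=1)
        return {NODE_LABELS[i]: coords_3d[i] for i in range(21)}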
In addition, the action key points of the target object can be set according to the actual application scenario; once they are set, as long as each action key point is recognized by the action recognition model, a three-dimensional virtual object corresponding to the action of the target object can subsequently be constructed for action recognition.
In conclusion, the action skeleton nodes corresponding to the action key points and the mapped three-dimensional coordinate information are identified through the action recognition model, so that a virtual object can subsequently be constructed in three-dimensional space from the three-dimensional coordinate information for action recognition; this avoids the angle problem affecting recognition accuracy and can effectively improve the accuracy of action recognition.
Step 108: generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong.
Specifically, on the basis of the action skeleton nodes corresponding to the action key points and the mapped three-dimensional coordinate information, the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action skeleton nodes and the virtual action skeleton to which the action skeleton nodes belong. The three-dimensional virtual object is a three-dimensional "model" consistent with the action posed by the target object; generating it avoids occlusion or angle problems in the action affecting recognition accuracy.
Further, in the process of generating the three-dimensional virtual object, the skeleton nodes can be adjusted to generate the three-dimensional virtual object only after the correspondence between the three-dimensional coordinate information and the action skeleton nodes is determined. In one or more implementations of this embodiment, a specific implementation is as follows:
determining the correspondence between the action skeleton nodes and the three-dimensional coordinate information according to the action key points, and determining the connection relations of the action skeleton nodes based on the virtual action skeleton;
and connecting the three-dimensional coordinate information according to the connection relations to generate the three-dimensional virtual object.
In practical applications, when the action skeleton nodes corresponding to the action key points and the three-dimensional coordinate information mapped from the action key points are determined, the correspondence between the action skeleton nodes and the three-dimensional coordinate information can be determined; meanwhile, the connection relations of the action skeleton nodes can be determined from the virtual action skeleton, i.e., the action skeleton nodes form the virtual action skeleton according to these connection relations.
Based on the above, the three-dimensional coordinate information is connected according to the connection relations, and a three-dimensional virtual object matching the action of the target object can be generated.
Following the above example, it is determined that action key point 0 corresponds to skeleton node a, action key point 1 corresponds to skeleton node b, ..., and action key point 20 corresponds to skeleton node u, and that the three-dimensional coordinate information mapped from action key point 0 is (x1, y1, z1), from action key point 1 is (x2, y2, z2), ..., and from action key point 20 is (x20, y20, z20). It can therefore be determined that skeleton node a corresponds to (x1, y1, z1), skeleton node b corresponds to (x2, y2, z2), ..., and skeleton node u corresponds to (x20, y20, z20).
Further, once the correspondence between each skeleton node and its three-dimensional coordinate information is determined, and the connection relations of the skeleton nodes are determined from the virtual action skeleton to be (a, b, c, d, e), (a, f, g, h, i), (a, j, k, l, m), (a, n, o, p, q) and (a, r, s, t, u), the three-dimensional coordinate information mapped from each action key point is connected according to these connection relations, constructing the skeleton corresponding to the hand shown in (a) of fig. 4. The region corresponding to the palm is then constructed based on the connection relations, as shown in (b) of fig. 4, with the palm constructed from a, b, c, d, e and f. At this point the skeleton corresponding to the user's hand gesture is preliminarily complete, and the three-dimensional virtual object, i.e., the three-dimensional "model" corresponding to the user's gesture shown in (c) of fig. 4, is obtained through rendering and generation processing, after which the gesture of the user's posed action can be determined.
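A minimal sketch of this connection processing, reusing the five connection relations from the example; node_coords is assumed to map the node labels a..u to their three-dimensional coordinates (for instance, the output of lift_keypoints above):

    # The connection relations of the virtual action skeleton from the
    # example: each chain starts at node "a" and runs along one finger.
    FINGER_CHAINS = [
        ("a", "b", "c", "d", "e"),
        ("a", "f", "g", "h", "i"),
        ("a", "j", "k", "l", "m"),
        ("a", "n", "o", "p", "q"),
        ("a", "r", "s", "t", "u"),
    ]

    def build_bones(node_coords: dict) -> list:
        """Connect the three-dimensional coordinates of adjacent skeleton
        nodes into bone segments, i.e. the wireframe of the object."""
        bones = []
        for chain in FINGER_CHAINS:
            for parent, child in zip(chain, chain[1:]):
                bones.append((node_coords[parent], node_coords[child]))
        return bones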
In summary, while the correspondence between the action skeleton nodes and the three-dimensional coordinate information is determined, the connection relations between the action skeleton nodes in the virtual action skeleton are also determined, and the three-dimensional virtual object can be generated by connecting the three-dimensional coordinate information according to these relations. A three-dimensional "model" consistent with the action of the target object is thus generated in three-dimensional space, and performing action recognition on this three-dimensional virtual object can accurately recognize the action type of the target object, effectively improving recognition accuracy.
In addition, because the three-dimensional coordinate information is obtained from the intermediate image, when the three-dimensional virtual object is generated there may be cases where one action key point corresponds to two or more pieces of three-dimensional coordinate information; generating three-dimensional virtual objects from such coordinate information then yields two or more objects. For example, the three-dimensional coordinate information mapped from action key point 20 of the hand may be both (x20, y20, z20) and (x21, y21, z21), in which case two three-dimensional virtual objects will be generated, one with the little finger bent to the palm and one with the little finger straightened; if gesture recognition continued, action types corresponding to two kinds of gestures would be recognized. To avoid this, the constructed three-dimensional virtual objects can be normalized. In one or more implementations of this embodiment, a specific implementation is as follows:
detecting the number of generated three-dimensional virtual objects;
in the case that the number of objects is greater than a preset number threshold, normalizing the plurality of three-dimensional virtual objects corresponding to the number of objects to obtain candidate virtual objects;
and selecting a target virtual object from the candidate virtual objects as the three-dimensional virtual object.
In practical applications, on the basis of generating the three-dimensional virtual objects, the number of objects is detected. If the number of objects is greater than a preset number threshold, multiple three-dimensional virtual objects have been generated: the three-dimensional virtual objects are first normalized to obtain candidate virtual objects corresponding to the number of objects, then the candidate virtual object whose action best matches the target object is selected as the target virtual object, and the target virtual object is used as the three-dimensional virtual object for subsequent action recognition. If the number of objects is less than or equal to the number threshold, there is a single three-dimensional virtual object, and the following step 110 can be executed directly. In a specific implementation, the preset number threshold is 1.
For example, suppose three generated three-dimensional virtual objects match the user's hand action, namely three-dimensional virtual object 1, three-dimensional virtual object 2 and three-dimensional virtual object 3. In object 1, the thumb and index finger are straightened and the other three fingers are bent to the palm; in object 2, the thumb, index finger and middle finger are straightened and the other two fingers are bent to the palm; in object 3, the thumb, index finger and little finger are straightened and the other two fingers are bent to the palm. Three candidate gestures therefore exist.
Further, the similarity between each of the three gestures and the hand in the intermediate image can be calculated, and the three-dimensional virtual object corresponding to the gesture with the highest similarity is selected as the three-dimensional virtual object for subsequent recognition. If the calculation determines that three-dimensional virtual object 1 has the highest similarity to the gesture in the intermediate image, then three-dimensional virtual object 1 is taken as the target three-dimensional virtual object for subsequent recognition.
In practical applications, normalizing the three-dimensional virtual objects specifically means making the multiple three-dimensional virtual objects face the same direction through rotation, scaling, translation and the like; the normalized three-dimensional virtual objects then make it more convenient to select the target virtual object as the three-dimensional virtual object.
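A minimal Python sketch of this normalization and selection, under the assumptions that node "a" is the wrist used as the origin, that scale is normalized by the mean node distance, and that similarity is measured as plain Euclidean distance (the rotation-alignment step is omitted for brevity):

    import numpy as np

    def normalize_object(coords: np.ndarray) -> np.ndarray:
        """Normalize one candidate object (shape (21, 3)) by translation
        (wrist node to the origin) and scaling (canonical size)."""
        centered = coords - coords[0]
        scale = np.linalg.norm(centered, axis=1).mean()
        return centered / max(scale, 1e-8)

    def pick_target(candidates: list, reference: np.ndarray) -> np.ndarray:
        """Select the candidate whose normalized shape is closest to a
        reference (e.g., coordinates re-derived from the intermediate image)."""
        ref = normalize_object(reference)
        return min(candidates,
                   key=lambda c: float(np.linalg.norm(normalize_object(c) - ref)))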
In summary, after the three-dimensional virtual objects are generated, if multiple objects exist, candidate virtual objects can be obtained through normalization, and a target virtual object is then selected from the candidates as the three-dimensional virtual object for subsequent recognition, which can further improve the accuracy of action recognition.
Step 110: perform action recognition on the three-dimensional virtual object based on an action recognition data set, and determine the action type of the target object.
Specifically, on the basis of the generated three-dimensional virtual object, action recognition is performed on it according to the action recognition data set to determine the action type of the target object. The action recognition data set is composed of action recognition packets issued by a server, and each action recognition packet includes a recognition rule for recognizing the three-dimensional virtual object; the server refers to the server of the platform to which the action recognition service belongs.
In practical applications, the action recognition data set is composed of several action recognition packets, each containing at least one action recognition rule, and each action recognition rule can recognize one action type. For example, an action recognition rule may judge whether each fingertip is inside the palm region and whether each finger is straightened. Performing action recognition on the three-dimensional virtual object based on this rule, it is determined that the thumb and index finger are straightened and the fingertips of the other three fingers are inside the palm, so the action type of the three-dimensional virtual object is the number type "8", and the user's gesture can be determined to be "8".
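One possible shape for such an action recognition packet is sketched below; the field names and the JSON-like structure are illustrative assumptions, not a format this specification defines:

    # Hypothetical contents of one action recognition packet issued by the
    # server: one action type plus the action elements that recognize it.
    ACTION_RECOGNITION_PACKET = {
        "action_type": "8",
        "rules": [
            {"element": "finger_straightened", "fingers": ["thumb", "index"]},
            {"element": "fingertip_in_palm", "fingers": ["middle", "ring", "little"]},
        ],
    }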
Further, in the process of recognizing the action type, the action type can be recognized according to the action recognition data set. In one or more implementations of this embodiment, a specific implementation is as follows:
parsing the action recognition data set to obtain action elements;
determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements;
and determining the intermediate action type as the action type of the target object.
In practical applications, the action recognition data set first needs to be parsed to obtain the action elements. An action element is an element used to perform action recognition on the three-dimensional virtual object, i.e., a condition for judging the action type of the three-dimensional virtual object. Action recognition is performed on the three-dimensional virtual object according to the action elements, the intermediate action type of the three-dimensional virtual object is determined from the recognition result, and the intermediate action type is then determined as the action type of the target object.
Further, in the process of determining the intermediate action type of the three-dimensional virtual object by performing action recognition on it with the action elements, a first approach is to calculate the action angles of the virtual sub-objects of the three-dimensional virtual object, and a second approach is to detect the positions of the virtual sub-objects in the three-dimensional virtual object. In one or more implementations of this embodiment, the process of determining the intermediate action type by the first approach includes:
calculating the action angles between the virtual sub-objects in the three-dimensional virtual object;
and detecting the action angles based on the action elements, and determining the intermediate action type of the three-dimensional virtual object according to the detection result.
In practical applications, the virtual sub-objects are the sub-units composing the three-dimensional virtual object. For example, when the target object is a hand, the generated three-dimensional virtual object is a three-dimensional hand object, and the virtual sub-objects are the fingers and palm of that object; when the target object is a limb, the generated three-dimensional virtual object is a three-dimensional limb object, and the virtual sub-objects are the limbs and body of that object.
Based on this, by calculating the action angles between the virtual sub-objects in the three-dimensional virtual object, the pose of each virtual sub-object is determined; the action angles are then detected against the action elements, and the intermediate action type of the three-dimensional virtual object can be determined from the detection result.
For example, the three-dimensional virtual object is a three-dimensional model constructed from a hand, so the virtual sub-objects are each finger and the palm center. By calculating the first action angles between each finger and the palm center and the second action angles between fingers, it is determined that the angles of the little finger, ring finger and middle finger with the palm center are 0 and the angle of the thumb and index finger is 90 degrees. The action angles are then detected against the action element: when the little finger, ring finger and middle finger are at 0 degrees to the palm center and the thumb and index finger are at 90 degrees, the gesture is the same as the action element of gesture type "8", so the intermediate action type of the three-dimensional virtual object is determined to be gesture type "8" based on the detection result.
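A minimal sketch of the angle computation behind this first approach; the collinearity test and the 20-degree tolerance are assumptions chosen for illustration:

    import numpy as np

    def action_angle(u: np.ndarray, v: np.ndarray) -> float:
        """Angle in degrees between two bone direction vectors."""
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    def is_straightened(node_coords: dict, chain: tuple, tol_deg: float = 20.0) -> bool:
        """A finger counts as straightened when the consecutive bone segments
        of its chain, e.g. ("a", "b", "c", "d", "e"), are nearly collinear."""
        dirs = [node_coords[b] - node_coords[a] for a, b in zip(chain, chain[1:])]
        return all(action_angle(u, v) < tol_deg for u, v in zip(dirs, dirs[1:]))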
In one or more implementations of this embodiment, the determination of the intermediate action type by the second approach includes:
detecting the action position of each virtual sub-object in the three-dimensional virtual object according to the action elements;
and determining the intermediate action type of the three-dimensional virtual object based on the detection result.
In practical applications, an action position is specifically the position at which each virtual sub-object is located. For example, when the three-dimensional virtual object is generated from a hand, the virtual sub-objects are the fingers and palm of the hand. Suppose the action positions of the little finger, ring finger and middle finger are determined to be inside the palm, while the action positions of the thumb and index finger are outside the palm. Each action position is then detected according to the action elements: this pattern is the same as the action element of gesture type "8", so the intermediate action type of the three-dimensional virtual object is determined to be gesture type "8" according to the detection result.
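By way of illustration only, the position-based check admits a similar sketch. The palm region is approximated here by a sphere around the palm center, which is an assumption of the example; the embodiment only requires detecting whether each action position matches the action element:

    import numpy as np

    def in_palm(tip, palm_center, palm_radius):
        """True when a fingertip's action position falls inside the palm region."""
        return float(np.linalg.norm(np.subtract(tip, palm_center))) <= palm_radius

    # Assumed action element for gesture type "8" as an in/out-of-palm pattern.
    GESTURE_8_POSITIONS = {"little": True, "ring": True, "middle": True,
                           "thumb": False, "index": False}

    def positions_match(tips, palm_center, palm_radius, element=GESTURE_8_POSITIONS):
        """tips: finger name -> fingertip (x, y, z) coordinates."""
        pattern = {name: in_palm(tip, palm_center, palm_radius)
                   for name, tip in tips.items()}
        return pattern == element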
In summary, in the process of determining the intermediate action type of the three-dimensional virtual object, the action angle or the action position of each virtual sub-object in the three-dimensional virtual object can be calculated and detected, which effectively improves the accuracy of action type recognition.
In addition, when the action image frame is displayed to the user through the action interaction page, action recognition needs to recognize the action of the target object according to the recognition rule associated with the display action in the action image frame, so as to determine whether the action placed by the user is correct. If it is correct, that is, if the recognized action type is the same as the action type of the display action, subsequent processing can be performed. In one or more implementations of this embodiment, a specific implementation is as follows:
receiving the action recognition data set issued by the server for the action image frame; the action recognition data carries an action recognition rule matched with the display action;
judging whether the action of the three-dimensional virtual object matches the display action according to the action recognition rule;
if yes, determining a display action type of the display action according to the action recognition data set, and taking the display action type as the action type;
if not, no processing is carried out;
further, after the display action type is taken as the action type, recommendation information matched with the action type can be displayed to the user through the action interaction page.
In practical applications, when the action recognition data set issued by the server for the action image frame is received, whether the action of the three-dimensional virtual object matches the display action is judged according to the action recognition rule matched with the display action carried in the recognition data. If they match, the display action type is taken as the action type; if they do not match, the action placed by the user through the target object differs from the display action, and the user needs to place the action again until it is placed correctly.
Based on this, once it is determined that the action placed by the user is correct, the recommendation information matched with the action type can be presented to the user through the action interaction page.
For example, in a payment scene, placing gesture "1" confirms payment and placing gesture "2" abandons payment. The two gestures and their meanings are shown to the user through the mobile phone end, the gesture placed by the user is acquired through the camera of the mobile phone end, and a corresponding three-dimensional virtual object is generated. Whether the action of the three-dimensional virtual object matches gesture "1" or gesture "2" is then judged according to the action recognition rules (corresponding to gesture "1" and gesture "2") in the recognition data set issued by the server end;
when it matches gesture "1", the gesture type of the user is determined to be "1", indicating that payment is performed; gesture type "1" is taken as the user's action type, and reminding information of successful payment is displayed to the user through the payment page. When it matches gesture "2", the gesture type of the user is determined to be "2", indicating that payment is abandoned; gesture type "2" is taken as the user's action type, and reminding information of abandoned payment is displayed through the payment page;
when it matches neither gesture "1" nor gesture "2", it is determined that the gesture placed by the user is wrong and the gesture type cannot be correctly recognized, and reminding information prompting the user to repeat gesture recognition is displayed through the payment page.
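By way of illustration only, this payment branch can be sketched as follows, assuming the server-issued recognition data set reduces to (gesture_type, rule) pairs in which each rule is a predicate over the three-dimensional virtual object, such as the angle or position checks sketched earlier:

    def handle_payment_gesture(virtual_object, recognition_rules):
        """recognition_rules: assumed list of (gesture_type, predicate) pairs."""
        for gesture_type, rule in recognition_rules:
            if rule(virtual_object):
                if gesture_type == "1":
                    return "pay"       # show the payment-success reminder
                if gesture_type == "2":
                    return "abandon"   # show the abandoned-payment reminder
        return "retry"                 # no rule matched: prompt the user to re-place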
In conclusion, after the action of the target object is recognized, the corresponding recommendation information is displayed to the user according to the action type, and when the action cannot be recognized, reminding information is displayed to the user through the action interaction page. This effectively improves the user's experience and makes it convenient for the user to obtain the relevant recommendation information.
In addition, when an action image frame is displayed to the user through the action interaction page and an action type is recognized, the action type is matched against the display action type of the display action, and corresponding reminding information is displayed according to the matching result. In one or more implementations of this embodiment, a specific implementation is as follows:
matching the action type with a display action type of the display action;
if the matching is successful, displaying successful matching information to the user through the action interaction page;
and if the matching fails, displaying reminding information to the user through the action interaction page, wherein the reminding information carries an action strategy.
In practical applications, the action strategy specifically reminds the user how to place the action correctly so that matching can succeed.
For example, in an applet selection scene, the gesture placed by the user is acquired through the camera of the mobile phone end and a corresponding three-dimensional virtual object is generated. The action type of the three-dimensional virtual object is recognized as gesture "2" according to the action recognition data set, so the gesture placed by the user is determined to be "2". When a display action corresponding to gesture "2" exists in the applet selection scene, a successful-match prompt is displayed to the user through the action interaction page on the mobile phone and the applet corresponding to gesture "2" is opened; when no display action corresponding to gesture "2" exists, a match-failure prompt is displayed through the action interaction page, and the gesture types available in the applet are shown so that the user can conveniently place an action again.
According to the action recognition method provided by this embodiment, in the process of recognizing the action placed by the user, the acquired image frame is subjected to image segmentation to obtain an intermediate image containing the target object; the intermediate image is input into the action recognition model to obtain action group frame nodes and three-dimensional coordinate information; and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong. Recognizing the action type through the generated three-dimensional virtual object effectively avoids the problem that differing image acquisition angles yield insufficiently standard image frames and thereby degrade recognition accuracy. Meanwhile, performing action recognition on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object makes the method more universal, flexible and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, recognition of the new action type can be realized by extending the action recognition data set, and the applicable scenarios become wider.
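By way of illustration only, the overall flow summarized above can be sketched as follows; every callable is injected because this specification does not fix concrete implementations, and all parameter names are assumptions of the example:

    def recognize_action(image_frame, segment, model, build_object, analyze, data_set):
        intermediate = segment(image_frame)                  # action-region segmentation
        keypoints, coords_3d = model(intermediate)           # key points + 3D coordinates
        virtual_object = build_object(keypoints, coords_3d)  # per the virtual action group frame
        for action_type, rule in analyze(data_set):          # action elements from the data set
            if rule(virtual_object):
                return action_type
        return None  # unrecognized: the caller can present reminding information

Note how extensibility falls out of this structure: supporting a new action type appends one more (action_type, rule) entry to the data set while the model itself is untouched.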
The following further describes the action recognition method with reference to fig. 5, taking an application of the action recognition method provided in this specification in a gesture recognition scenario as an example. Fig. 5 shows a processing flow chart of an action recognition method applied in a gesture recognition scenario provided in an embodiment of the present specification, which specifically includes the following steps:
step 502: and receiving a click instruction submitted by a user through the action interaction page.
Specifically, to make the corresponding payment software convenient to use and to improve user convenience, the payment platform provides a gesture recognition service that allows the user to perform the corresponding payment operation by placing gestures through the action interaction page;
based on this, when the user submits the click instruction, the user's gesture needs to be recognized so that the corresponding payment information can be shown to the user.
Step 504: And displaying the action image frame through the action interaction page according to the click instruction, wherein the action image frame comprises a display area corresponding to the display action.
Specifically, the action image frame comprises display areas corresponding to gesture "1" and gesture "2", and the gestures that can be placed are communicated to the user through the action interaction page, so that the user can place a gesture correctly for subsequent gesture recognition.
Step 506: And acquiring the image frames captured by the mobile phone.
Step 508: and performing segmentation processing on the characteristic region of the gesture in the image frame to obtain an intermediate image containing the gesture.
Step 510: and inputting the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, and obtaining gesture group frame nodes corresponding to the recognized gesture key points and three-dimensional coordinate information mapped by the gesture key points.
For details, reference may be made to the related description in the foregoing embodiments for performing the key point identification and the coordinate mapping, and this embodiment is not described in detail herein.
Step 512: and determining the corresponding relation between the gesture group frame nodes and the three-dimensional coordinate information according to the gesture key points, and determining the connection relation of the gesture group frame nodes based on the virtual gesture group frame.
Step 514: and connecting the three-dimensional coordinate information according to the connection relation to obtain a plurality of three-dimensional virtual gestures.
Step 516: and carrying out standardization processing on the plurality of three-dimensional virtual gestures, and determining a target three-dimensional virtual gesture according to a standardization processing result.
Step 518: and receiving a gesture recognition rule issued by the server aiming at the action image frame.
Step 520: and performing gesture recognition on the target three-dimensional virtual gesture according to the gesture recognition rule, and determining the gesture type of the user.
Step 522: Judging whether the gesture type matches the display action type of the display action; if not, go to step 524; if so, go to step 526.
Step 524: and displaying the reminding information to the user.
Step 526: and presenting recommendation information matched with the gesture type to the user.
Specifically, gesture recognition is performed on the three-dimensional virtual gesture according to the gesture recognition rule. When the gesture type of the user is determined to be "1", the balance of the payment account is displayed to the user; when the gesture of the user is determined not to match the display action type of the display action, indicating that there is a problem with the gesture placed by the user, reminding information is displayed to remind the user to place the gesture again.
The action recognition method provided by this embodiment realizes recognition of action types through the generated three-dimensional virtual object, effectively avoiding the problem that differing image acquisition angles yield insufficiently standard image frames and degrade recognition accuracy. Meanwhile, action recognition is performed on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object, which makes the method more universal, flexible and extensible in action recognition scenarios: a newly added action type can be recognized by extending the action recognition data set without retraining the model, so the applicable scenarios become wider.
Corresponding to the above method embodiments, the present specification further provides an embodiment of an action recognition apparatus, and fig. 6 shows a schematic structural diagram of an action recognition apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
an obtaining module 602 configured to obtain an image frame captured by an image capturing device;
a processing module 604 configured to perform segmentation processing on an action region of a target object in the image frame to obtain an intermediate image;
the identification module 606 is configured to input the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, so as to obtain action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
a generating module 608 configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
a determining module 610 configured to perform action recognition on the three-dimensional virtual object based on an action recognition data set, and determine the action type of the target object.
In an optional embodiment, the identification module 606 includes:
a key point label determining unit configured to identify the action key point corresponding to the target object in the intermediate image and determine a key point label of the action key point;
a node label determining unit configured to determine a target node label corresponding to the key point label based on the corresponding relation between the key point label and the node label of the action group frame node;
and the action group frame node determining unit is configured to determine the action group frame node corresponding to the action key point to which the key point label belongs according to the target node label.
In an optional embodiment, the identification module 606 includes:
a position information determining unit configured to determine position information of the action key point in the intermediate image;
a coordinate information mapping unit configured to map out the three-dimensional coordinate information corresponding to the action key point based on the position information (both sub-steps are sketched below).
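By way of illustration only, the two sub-steps can be sketched together under assumed data shapes: each recognized key point is tied to its action group frame node through a label correspondence table, and its position in the intermediate image is mapped to three-dimensional coordinates. The depth value regressed by the model, like all label names here, is an assumption of the example:

    # Assumed key point label -> action group frame node label table.
    KEYPOINT_TO_NODE = {"wrist_kp": "wrist", "index_base_kp": "index_base",
                        "index_tip_kp": "index_tip"}

    def identify(keypoints, table=KEYPOINT_TO_NODE):
        """keypoints: list of (label, u, v, depth) tuples from the recognition model."""
        nodes, coords = {}, {}
        for label, u, v, depth in keypoints:
            node = table.get(label)
            if node is None:
                continue                  # key point with no corresponding frame node
            nodes[label] = node           # action group frame node for this key point
            coords[node] = (u, v, depth)  # mapped three-dimensional coordinate
        return nodes, coords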
In an optional embodiment, the generating module 608 includes:
a connection relation determining unit configured to determine a correspondence relation between the action group frame node and the three-dimensional coordinate information according to the action key point, and determine a connection relation of the action group frame node based on the virtual action group frame;
and a three-dimensional virtual object generating unit configured to perform connection processing on the three-dimensional coordinate information according to the connection relation to generate the three-dimensional virtual object (the connection step is sketched below).
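By way of illustration only, the connection processing can be sketched as follows, assuming the virtual action group frame is a fixed edge list over node labels (a skeleton topology); the hand edges shown are an illustrative subset, not the actual frame:

    # Assumed subset of a virtual action group frame for a hand.
    HAND_FRAME = [("wrist", "thumb_base"), ("thumb_base", "thumb_tip"),
                  ("wrist", "index_base"), ("index_base", "index_tip")]

    def connect(coords, frame=HAND_FRAME):
        """coords: frame node label -> (x, y, z). Returns the three-dimensional virtual object."""
        bones = [(coords[a], coords[b]) for a, b in frame
                 if a in coords and b in coords]  # skip nodes that were not recognized
        return {"nodes": coords, "bones": bones}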
In an optional embodiment, the action recognition apparatus further includes:
a detection module configured to detect the number of generated three-dimensional virtual objects;
the standardization processing module is configured to standardize a plurality of three-dimensional virtual objects corresponding to the number of the objects under the condition that the number of the objects is larger than a preset number threshold value to obtain virtual objects to be selected;
and a selection module configured to select a target virtual object from the virtual objects to be selected as the three-dimensional virtual object (sketched below).
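By way of illustration only, the detection, standardization and selection modules can be sketched together as follows; the candidates are assumed to be dicts as returned by the connect() sketch above, and total bone length is an assumed stand-in for the unspecified selection criterion:

    import numpy as np

    def bone_length(obj):
        """Assumed prominence measure: the summed length of an object's bones."""
        return sum(float(np.linalg.norm(np.subtract(a, b))) for a, b in obj["bones"])

    def select_target(virtual_objects, threshold=1):
        if len(virtual_objects) <= threshold:
            return virtual_objects[0] if virtual_objects else None
        # Standardization: rank the candidates, then keep one target virtual object.
        return max(virtual_objects, key=bone_length)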
In an optional embodiment, the action recognition apparatus further includes:
the instruction receiving module is configured to receive a click instruction submitted by a user through the action interaction page;
a presentation module configured to present at least one action image frame to the user according to the click instruction; the action image frame comprises a display area corresponding to a display action.
In an optional embodiment, the determining module 610 includes:
a receiving unit configured to receive the action recognition data set issued by the server for the action image frame; the action recognition data carries an action recognition rule matched with the display action;
a judging unit configured to judge whether the action of the three-dimensional virtual object matches the display action according to the action recognition rule;
if yes, an action type determining unit is operated;
the action type determining unit is configured to determine a display action type of the display action according to the action recognition data set, and take the display action type as the action type.
In an optional embodiment, the action recognition apparatus further includes:
and the recommendation information display module is configured to display recommendation information matched with the action type to the user through the action interaction page.
In an optional embodiment, the action recognition apparatus further includes:
a matching module configured to match the action type with the display action type of the display action;
if the matching is successful, operating a first display information module, wherein the first display information module is configured to display successful matching information to the user through the action interaction page;
and if the matching fails, operating a second display information module, wherein the second display information module is configured to display reminding information to the user through the action interaction page, and the reminding information carries an action strategy.
In an optional embodiment, in the case that the target object is a hand, the processing module 604 includes:
a feature region detecting unit configured to detect a feature region corresponding to the hand in the image frame;
an image cropping unit configured to crop the image frame according to the feature region to obtain the intermediate image containing hand features;
correspondingly, the action type is a gesture action type (the cropping step is sketched below).
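By way of illustration only, the cropping step can be sketched as follows, assuming the feature region arrives as an (x, y, w, h) bounding box from some hand detector; the detector itself and the frame size are assumptions of the example:

    import numpy as np

    def crop_to_region(image_frame: np.ndarray, box):
        """Cut the image frame down to the detected hand feature region."""
        x, y, w, h = box
        return image_frame[y:y + h, x:x + w]  # the intermediate image with hand features

    frame = np.zeros((480, 640, 3), dtype=np.uint8)            # stand-in camera frame
    intermediate = crop_to_region(frame, (200, 120, 160, 160))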
In an optional embodiment, the determining module 610 includes:
the analysis unit is configured to analyze the action recognition data set to obtain action elements;
a recognition unit configured to determine an intermediate action type of the three-dimensional virtual object based on the recognition result of performing action recognition on the three-dimensional virtual object according to the action elements;
a determination unit configured to determine the intermediate action type as the action type of the target object.
In an optional embodiment, the identification unit includes:
a calculation sub-module configured to calculate action angles between the virtual sub-objects in the three-dimensional virtual object;
a first determining submodule configured to detect the action angle based on the action element, and determine the intermediate action type of the three-dimensional virtual object according to a detection result.
In an optional embodiment, the identification unit includes:
a detection sub-module configured to detect an action position of each virtual sub-object in the three-dimensional virtual object according to the action element;
a second determination sub-module configured to determine the intermediate action type of the three-dimensional virtual object based on a detection result.
In an optional embodiment, the action recognition data set is composed of an action recognition packet issued by a server, where the action recognition packet includes a recognition rule for recognizing the three-dimensional virtual object.
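By way of illustration only, one plausible shape for such a server-issued action recognition packet is sketched below; this specification only requires that the packet carry recognition rules for the three-dimensional virtual object, not this exact layout:

    # Assumed packet layout; each rule bundles the angle and position elements
    # used in the earlier sketches under an action type label.
    ACTION_RECOGNITION_PACKET = {
        "rules": [
            {
                "action_type": "8",
                "angle_elements": {"thumb": [70, 110], "index": [70, 110],
                                   "middle": [0, 20], "ring": [0, 20],
                                   "little": [0, 20]},
                "position_elements": {"middle": "in_palm", "ring": "in_palm",
                                      "little": "in_palm", "thumb": "out_of_palm",
                                      "index": "out_of_palm"},
            },
        ],
    }

Under this assumed layout, supporting a new action type amounts to appending one more entry to "rules", which is why no model retraining is needed.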
In the process of recognizing the action placed by the user, the action recognition apparatus provided by this specification performs image segmentation on the acquired image frame to obtain an intermediate image containing the target object, inputs the intermediate image into the action recognition model to obtain action group frame nodes and three-dimensional coordinate information, and generates the three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong. Recognizing the action type through the generated three-dimensional virtual object effectively avoids the problem that differing image acquisition angles yield insufficiently standard image frames and degrade recognition accuracy. Performing action recognition on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object makes the apparatus more universal, flexible and extensible: a newly added action type can be recognized by extending the action recognition data set without retraining the model, so the applicable scenarios become wider.
The above is a schematic scheme of the action recognition apparatus of this embodiment. It should be noted that the technical solution of the action recognition apparatus belongs to the same concept as the technical solution of the action recognition method; for details that are not described in detail in the technical solution of the action recognition apparatus, reference may be made to the description of the technical solution of the action recognition method.
Fig. 7 illustrates a block diagram of a computing device 700 provided according to an embodiment of the present description. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the internet. The access device 740 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the description. Other components may be added or replaced as desired by those skilled in the art.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute the following computer-executable instructions:
acquiring an image frame acquired by image acquisition equipment;
segmenting the action area of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model to perform key point recognition and coordinate mapping, and obtaining action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
and performing action recognition on the three-dimensional virtual object based on an action recognition data set, and determining the action type of the target object.
The foregoing is a schematic scheme of the computing device of this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the action recognition method; for details that are not described in detail in the technical solution of the computing device, reference may be made to the description of the technical solution of the action recognition method.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions that, when executed by a processor, are operable to:
acquiring an image frame acquired by image acquisition equipment;
segmenting the action area of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model to perform key point recognition and coordinate mapping, and obtaining action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
and performing action recognition on the three-dimensional virtual object based on an action recognition data set, and determining the action type of the target object.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the action recognition method; for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the action recognition method.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that for simplicity and convenience of description, the above-described method embodiments are shown as a series of combinations of acts, but those skilled in the art will appreciate that the present description is not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps from the present description. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (17)

1. An action recognition method, comprising:
acquiring an image frame acquired by image acquisition equipment;
segmenting the action area of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model to perform key point recognition and coordinate mapping, and obtaining action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
and performing action recognition on the three-dimensional virtual object based on the action elements obtained by analyzing the action recognition data set, and determining the action type of the target object.
2. The action recognition method according to claim 1, wherein the action recognition model performing key point recognition comprises:
identifying the action key points corresponding to the target object in the intermediate image, and determining key point labels of the action key points;
determining a target node label corresponding to the key point label based on the corresponding relation between the key point label and the node label of the action group frame node;
and determining the action group frame node corresponding to the action key point to which the key point label belongs according to the target node label.
3. The action recognition method of claim 2, wherein the action recognition model performing coordinate mapping comprises:
determining position information of the action key point in the intermediate image;
and mapping the three-dimensional coordinate information corresponding to the action key points based on the position information.
4. The action recognition method according to claim 1, wherein the generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes, and the virtual action group frame to which the action group frame nodes belong comprises:
determining the corresponding relation between the action group frame nodes and the three-dimensional coordinate information according to the action key points, and determining the connection relation of the action group frame nodes based on the virtual action group frame;
and performing connection processing on the three-dimensional coordinate information according to the connection relation to generate the three-dimensional virtual object.
5. The action recognition method according to claim 1, further comprising, after the step of generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes, and the virtual action group frame to which the action group frame nodes belong is executed, and before the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object is executed:
detecting the number of objects of the generated three-dimensional virtual object;
under the condition that the number of the objects is larger than a preset number threshold, carrying out standardization processing on a plurality of three-dimensional virtual objects corresponding to the number of the objects to obtain virtual objects to be selected;
and selecting a target virtual object from the virtual objects to be selected as the three-dimensional virtual object.
6. The action recognition method according to claim 1, wherein before the step of acquiring the image frames captured by the image capturing device is performed, the method further comprises:
receiving a click instruction submitted by a user through an action interaction page;
displaying at least one action image frame to the user according to the click command; the action image frame comprises a display area corresponding to a display action.
7. The action recognition method of claim 6, wherein the performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object comprises:
receiving the action recognition data set issued by the server for the action image frame; the action recognition data carries an action recognition rule matched with the display action;
judging whether the action of the three-dimensional virtual object matches the display action according to the action recognition rule;
if yes, determining the display action type of the display action according to the action recognition data set, and taking the display action type as the action type.
8. The action recognition method of claim 7, wherein after the sub-step of determining the display action type of the display action according to the action recognition data set and taking the display action type as the action type is executed, the method further comprises:
and displaying recommendation information matched with the action type to the user through the action interaction page.
9. The action recognition method according to claim 6, wherein after the step of performing action recognition on the three-dimensional virtual object based on the action elements obtained by analyzing the action recognition data set and determining the action type of the target object is performed, the method further comprises:
matching the action type with a display action type of the display action;
if the matching is successful, displaying successful matching information to the user through the action interaction page;
and if the matching fails, displaying reminding information to the user through the action interaction page, wherein the reminding information carries an action strategy.
10. The action recognition method according to claim 1, wherein, when the target object is a hand, the performing segmentation processing on the action area of the target object in the image frame to obtain an intermediate image comprises:
detecting a characteristic region corresponding to the hand in the image frame;
cutting the image frame according to the characteristic area to obtain the intermediate image containing hand characteristics;
correspondingly, the action type is a gesture action type.
11. The action recognition method according to claim 1, wherein the performing action recognition on the three-dimensional virtual object based on the action elements obtained by analyzing the action recognition data set to determine the action type of the target object comprises:
analyzing the action recognition data set to obtain action elements;
determining an intermediate action type of the three-dimensional virtual object based on a recognition result of performing action recognition on the three-dimensional virtual object according to the action elements;
determining the intermediate action type as the action type of the target object.
12. The action recognition method according to claim 11, wherein the determining an intermediate action type of the three-dimensional virtual object based on the recognition result of performing action recognition on the three-dimensional virtual object according to the action elements includes:
calculating action angles among all virtual sub-objects in the three-dimensional virtual object;
and detecting the action angle based on the action element, and determining the intermediate action type of the three-dimensional virtual object according to the detection result.
13. The action recognition method according to claim 11, wherein the determining an intermediate action type of the three-dimensional virtual object based on the recognition result of performing action recognition on the three-dimensional virtual object according to the action elements includes:
detecting the action position of each virtual sub-object in the three-dimensional virtual object according to the action element;
determining the intermediate action type of the three-dimensional virtual object based on the detection result.
14. The action recognition method according to claim 1, wherein the action recognition data set is composed of an action recognition packet issued by a server, and the action recognition packet includes a recognition rule for recognizing the three-dimensional virtual object.
15. An action recognition device, comprising:
the acquisition module is configured to acquire an image frame acquired by an image acquisition device;
a processing module configured to perform segmentation processing on an action region of a target object in the image frame to obtain an intermediate image;
the identification module is configured to input the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, and obtain action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
the generating module is configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame node and the virtual action group frame to which the action group frame node belongs;
and the determining module is configured to perform action recognition on the three-dimensional virtual object based on the action elements obtained by analyzing the action recognition data set, and determine the action type of the target object.
16. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring an image frame acquired by image acquisition equipment;
performing segmentation processing on the action area of the target object in the image frame to obtain an intermediate image;
inputting the intermediate image into an action recognition model to perform key point recognition and coordinate mapping, and obtaining action group frame nodes corresponding to the recognized action key points and three-dimensional coordinate information mapped by the action key points;
generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action group frame nodes and the virtual action group frame to which the action group frame nodes belong;
and performing action recognition on the three-dimensional virtual object based on the action elements obtained by analyzing the action recognition data set, and determining the action type of the target object.
17. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the action recognition method of any one of claims 1 to 14.
CN202010292042.2A 2020-04-14 2020-04-14 Action recognition method and device Active CN111401318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010292042.2A CN111401318B (en) 2020-04-14 2020-04-14 Action recognition method and device


Publications (2)

Publication Number Publication Date
CN111401318A CN111401318A (en) 2020-07-10
CN111401318B true CN111401318B (en) 2022-10-04

Family

ID=71433225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010292042.2A Active CN111401318B (en) 2020-04-14 2020-04-14 Action recognition method and device

Country Status (1)

Country Link
CN (1) CN111401318B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112083800A (en) * 2020-07-24 2020-12-15 青岛小鸟看看科技有限公司 Gesture recognition method and system based on adaptive finger joint rule filtering
CN112560622B (en) * 2020-12-08 2023-07-21 中国联合网络通信集团有限公司 Virtual object action control method and device and electronic equipment
CN113900521A (en) * 2021-09-30 2022-01-07 上海千丘智能科技有限公司 Interactive method and system for multi-person behavior training
CN114385004A (en) * 2021-12-15 2022-04-22 北京五八信息技术有限公司 Interaction method and device based on augmented reality, electronic equipment and readable medium
CN115830196B (en) * 2022-12-09 2024-04-05 支付宝(杭州)信息技术有限公司 Virtual image processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886741A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of gesture identification method of base finger identification
CN108256461A (en) * 2018-01-11 2018-07-06 深圳市鑫汇达机械设计有限公司 A kind of gesture identifying device for virtual reality device
CN109858524A (en) * 2019-01-04 2019-06-07 北京达佳互联信息技术有限公司 Gesture identification method, device, electronic equipment and storage medium
CN110163048A (en) * 2018-07-10 2019-08-23 腾讯科技(深圳)有限公司 Identification model training method, recognition methods and the equipment of hand key point
CN110221690A (en) * 2019-05-13 2019-09-10 Oppo广东移动通信有限公司 Gesture interaction method and device, storage medium, communication terminal based on AR scene
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2879022A4 (en) * 2012-07-27 2016-03-23 Nec Solution Innovators Ltd Three-dimensional user-interface device, and three-dimensional operation method
US9310895B2 (en) * 2012-10-12 2016-04-12 Microsoft Technology Licensing, Llc Touchless input
US20140375539A1 (en) * 2013-06-19 2014-12-25 Thaddeus Gabara Method and Apparatus for a Virtual Keyboard Plane
US9501810B2 (en) * 2014-09-12 2016-11-22 General Electric Company Creating a virtual environment for touchless interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant