CN110135246B - Human body action recognition method and device - Google Patents

Human body action recognition method and device

Info

Publication number
CN110135246B
CN110135246B (application number CN201910264883.XA)
Authority
CN
China
Prior art keywords
action
key
human body
candidate
video image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910264883.XA
Other languages
Chinese (zh)
Other versions
CN110135246A (en)
Inventor
叶明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910264883.XA priority Critical patent/CN110135246B/en
Publication of CN110135246A publication Critical patent/CN110135246A/en
Priority to PCT/CN2019/103161 priority patent/WO2020199479A1/en
Application granted granted Critical
Publication of CN110135246B publication Critical patent/CN110135246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training

Abstract

The invention is applicable to the technical field of image recognition, and provides a human body action recognition method and device. The method comprises the following steps: acquiring a video file of a target object; parsing each video image frame, extracting the human body region image of the target object from the video image frame, and determining the interactable objects contained in the video image frame; marking each key part from a preset human body key part list in the human body region image, and acquiring the feature coordinates of each key part; generating a key feature sequence from the feature coordinates of each key part across the video image frames; determining candidate actions of the target object through the key feature sequences of the key parts; and calculating the matching degree between each candidate action and the interactable objects, and determining the action type of the target object according to the matching degree. Because the method uses the interaction actions to determine whether the target user performs an interaction behavior, several similar postures can be distinguished, which further improves the accuracy of action recognition.

Description

Human body action recognition method and device
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a human body action recognition method and device.
Background
With the continuous development of image recognition technology, computers can automatically recognize more and more information from image files and video files, for example determining the type of human body action performed by a user in a picture, and performing operations such as object tracking and object behavior analysis based on the recognized action information. The accuracy and recognition rate of the image recognition step therefore directly influence the effectiveness of the subsequent processing. Existing human body action recognition technology generally relies on a convolutional neural network for recognition; however, this approach needs to carry out time-sequence recursion operations many times with the help of optical flow information, so the recognition speed is slow and the accuracy is low. In particular, for some similar posture behaviors, such as sitting and squatting, the accuracy of action recognition is reduced further, because the human postures are similar and cannot be accurately distinguished by a convolutional neural network.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a human body action recognition method and device, so as to solve the problems of the existing human body action recognition methods, namely slow recognition and low accuracy, in particular for similar posture behaviors such as sitting and squatting, where the accuracy of action recognition is further reduced because the human postures are similar and cannot be accurately distinguished by a convolutional neural network.
A first aspect of an embodiment of the present invention provides a method for identifying a human motion, including:
acquiring a video file of a target object; the video file includes a plurality of video image frames;
analyzing each video image frame, extracting a human body area image related to the target object in the video image frame, and determining an interactable object contained in the video image frame;
marking each key part in a preset key part list of the human body in the human body area image, and acquiring feature coordinates of each key part;
generating a key feature sequence related to the key part according to the feature coordinates of the key part corresponding to each video image frame;
determining at least one candidate action of the target object through the key feature sequences of the key parts;
and respectively calculating the matching degree between each candidate action and the interactable object, and determining the action type of the target object from the candidate actions according to the matching degree.
A second aspect of an embodiment of the present invention provides an apparatus for recognizing a human motion, including:
A video file acquisition unit for acquiring a video file of a target object; the video file includes a plurality of video image frames;
a human body region image extraction unit, configured to parse each video image frame, extract a human body region image related to the target object in the video image frame, and determine an interactable object contained in the video image frame;
the key part identification unit is used for marking each key part in a preset human body key part list in the human body area image and acquiring the characteristic coordinates of each key part;
a key feature sequence generating unit, configured to generate a key feature sequence related to the key location according to the feature coordinates corresponding to the key location in each video image frame;
a candidate action recognition unit, configured to determine at least one candidate action of the target object through the key feature sequences of the respective key parts;
and the action type recognition unit is used for respectively calculating the matching degree between each candidate action and the interactable object and determining the action type of the target object from the candidate actions according to the matching degree.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the first aspect.
The human body action recognition method and the human body action recognition device provided by the embodiment of the invention have the following beneficial effects:
according to the embodiment of the invention, the video file of the target user needing to be subjected to action analysis is obtained, each video image frame of the video file is analyzed, the human body region image contained in each video image frame is determined, the interactable object which can have interaction action with the target user in the video image frame is identified, each key part is marked in the human body region image, the change condition of each part of the target object is determined according to the feature coordinates of each key part, so that the candidate action of the target object is determined, the candidate actions with similar multiple postures are further screened according to the matching degree between the candidate actions and the interactable object, the action type of the target object is determined, and the human body action of the target object is automatically identified. Compared with the existing human body motion recognition technology, the method and the device have the advantages that the motion type of the video image is not required to be recognized by the aid of the neural network, recognition time delay caused by time sequence recursion is avoided, recognition efficiency is improved, on the other hand, the terminal equipment can determine the interactive object in the video image frame, and whether the interactive behavior exists in the target user or not is determined by means of the interactive motion, so that multiple approximate gestures can be distinguished, and accuracy of motion recognition is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of an implementation of a method for identifying human actions according to a first embodiment of the present invention;
fig. 2 is a flowchart of a specific implementation of step S106 of the human body action recognition method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a specific implementation of step S104 of the human body action recognition method according to a third embodiment of the present invention;
fig. 4 is a flowchart of a specific implementation of step S102 of the human body action recognition method according to a fourth embodiment of the present invention;
fig. 5 is a flowchart of a specific implementation of step S105 of the human body action recognition method according to a fifth embodiment of the present invention;
FIG. 6 is a block diagram of a human motion recognition device according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a terminal device according to another embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
According to the embodiment of the invention, the video file of the target user whose actions need to be analyzed is obtained, each video image frame of the video file is parsed, the human body region image contained in each video image frame is determined, and the interactable objects that may interact with the target user in the video image frame are identified. Each key part is marked in the human body region image, and the change of each part of the target object is determined according to the feature coordinates of the key parts, so that candidate actions of the target object are determined. The candidate actions with similar postures are then further screened according to the matching degree between the candidate actions and the interactable objects, the action type of the target object is determined, and the human body action of the target object is automatically recognized. This solves the problems of the existing human body action recognition methods, namely slow recognition and low accuracy, in particular for similar posture behaviors such as sitting and squatting, where the accuracy of action recognition is further reduced because the human postures are similar and cannot be accurately distinguished by a convolutional neural network.
In the embodiment of the present invention, the execution subject of the flow is a terminal device. The terminal device includes, but is not limited to: and the server, the computer, the smart phone, the tablet personal computer and other equipment capable of executing the identification operation of human body actions. Fig. 1 shows a flowchart of an implementation of a method for identifying a human motion according to a first embodiment of the present invention, which is described in detail below:
in S101, a video file of a target object is acquired; the video file includes a plurality of video image frames.
In this embodiment, the administrator may designate a video file containing a target object as the target video file, in which case the terminal device downloads the video file about the target object from the video database according to the file identification of the target video file, and identifies the action behavior of the target object. Preferably, the terminal device is a video monitoring device, and obtains a video file in a current scene; in this case, the terminal device recognizes each object captured in the current scene as a target object, configures an object number for each object based on face images of different captured objects, determines an action type of each monitored object in real time according to a video file generated in the monitoring process, and if the action type of a certain target object is detected to be in an abnormal action list, generates warning information to inform the monitored object executing the abnormal action to stop the abnormal action, thereby achieving the purpose of warning the abnormal action of the monitored object in real time.
Alternatively, the user may transmit face information of the target object to the terminal device. The terminal device then performs a face search over the video files in the video database based on this face information, and takes the video files containing the face information as target video files. The specific search operation may be: the terminal device identifies candidate faces in each video image frame of each video file in the video database, extracts face feature values of the key regions of the candidate faces, and matches the face feature values of each candidate face with the face information of the target face; if the matching degree between the two is greater than a preset matching threshold, it indicates that the candidate face and the target face correspond to the same real person, and the video file is identified as containing a face image of the target object.
In this embodiment, the video file includes a plurality of video image frames, each video image frame corresponds to a frame number, and each video image frame is arranged and packaged based on a positive sequence of the frame numbers to generate the video file. The frame number may be determined based on the time the video image frame was played in the video file.
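As a concrete illustration of this frame numbering, the sketch below decodes a video file into numbered frames; the use of OpenCV (cv2) and the function name are assumptions for illustration, not part of the patent.

```python
import cv2

def load_video_frames(video_path):
    """Decode a video file into (frame_number, image) pairs; frame numbers
    follow the playback order, matching the positive-sequence arrangement
    described above."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    frame_number = 0
    while True:
        ok, image = capture.read()
        if not ok:
            break  # end of file
        frames.append((frame_number, image))
        frame_number += 1
    capture.release()
    return frames
```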
In S102, each of the video image frames is parsed, a human body region image of the video image frame with respect to the target object is extracted, and an interactable object contained in the video image frame is determined.
In this embodiment, the terminal device parses the video file, performs human body recognition on each video image frame in the video file, and extracts the human body region image of each video image frame with respect to the target object. The human body region image may be extracted as follows: the terminal device judges, through a face recognition algorithm, whether the video image frame contains a face region image; if not, the frame is determined not to contain a human body region image of the target object. Otherwise, if the video image frame contains a face image, contour recognition is performed on the region around the coordinates of the face image, the human body region image corresponding to the face image is extracted based on the recognized contour information, and the face image is matched against the face template of the target object to judge whether the extracted region is the human body region image of the target object.
Optionally, if the number of the target objects is multiple, that is, the behaviors of the multiple objects need to be monitored, after determining the human body area image of the human face image contained in the video image frame, the terminal device matches the human face image with the human face templates of the target objects, so as to determine the target objects corresponding to the human face image, mark the object identifiers of the associated target objects on the human body area image, and then can quickly determine the human body area image corresponding to each target object in the video image frame, thereby facilitating the motion tracking of multiple objects.
Optionally, in this embodiment, the terminal device may obtain, according to the object identifier of the target object, an object human body template associated with the object identifier. The object human body template can represent the human body characteristics of the target object, such as body shape information, gender information and/or hairstyle information. The terminal device may perform sliding framing in the video image frame according to the object human body template and calculate the matching degree between the framed candidate region and the object human body template; if the matching degree is greater than a preset matching threshold, the candidate region is identified as a human body region image of the target object. Otherwise, if the matching degree between the two is smaller than or equal to the matching threshold, the candidate region is identified as not being a human body region image of the target object, and the sliding framing continues. If none of the candidate regions in the video image frame contains a human body region image, the above operation is repeated on the next video image frame to identify the human body region image of the target object.
In this embodiment, in addition to the human body region image of the target object, the terminal device may extract from the image the interactable objects that can interact with the user. The specific identification may be as follows: contour information contained in the video image frame is determined through a contour recognition algorithm, the subject type of each shooting subject is determined based on the contour information, and the interactable objects are determined according to the subject types. Since the contour characteristics of different types of subjects differ, the subject type of a shooting subject can be determined by identifying its contour information, and the shooting subjects that can interact with the target object are selected as interactable objects according to their subject types. For example, shooting subjects such as a chair, a table or a knife may interact with the target object, whereas shooting subjects such as a cloud or the sun have a low probability of interacting with the target object. Thus, by identifying the subject type, a large portion of the invalid interactable objects can be filtered out.
Optionally, after identifying the shooting subjects, the terminal device calculates the distance value between each shooting subject and the human body region image, and selects the shooting subjects whose distance value is smaller than a preset threshold as interactable objects. Preferably, the terminal device may select the shooting subjects whose contour boundary is adjacent to the human body region image as interactable objects: when the target object interacts with an object, the two are in contact with each other, so the contour boundary of the interactable object is adjacent to the target user.
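A minimal sketch of this distance-based filtering of shooting subjects, assuming axis-aligned bounding boxes and a pixel distance threshold (both illustrative assumptions):

```python
def select_interactable_objects(human_box, subject_boxes, distance_threshold):
    """Keep the shooting subjects whose bounding-box centre lies close enough
    to the human body region to plausibly be interacted with.  Boxes are
    (x_min, y_min, x_max, y_max) in image coordinates; the threshold is in
    pixels and would be tuned per scene."""
    def centre(box):
        x_min, y_min, x_max, y_max = box
        return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0

    hx, hy = centre(human_box)
    interactable = []
    for box in subject_boxes:
        sx, sy = centre(box)
        distance = ((sx - hx) ** 2 + (sy - hy) ** 2) ** 0.5  # Euclidean distance
        if distance < distance_threshold:
            interactable.append(box)
    return interactable
```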
In S103, each key part in a preset key part list of the human body is marked in the human body area image, and feature coordinates of each key part are obtained.
In this embodiment, the terminal device stores a human body key part list, which includes a plurality of human body key parts. Preferably, the list includes 17 key parts: the nose, eyes, ears, shoulders, wrists, hands, waist, knees and feet. By positioning a plurality of key parts of the human body and tracking their movement changes, the accuracy of human body action recognition can be improved.
In this embodiment, the terminal device marks each key part in the image of the human body area in the specific marking manner: based on the contour information of the human body region image, determining the current gesture type of the target object, wherein the gesture type is specifically: standing type, walking type, lying type, sitting type, etc., and then marking each key position on the human body region image according to the corresponding relation between different key positions and gesture type. Optionally, the correspondence records a distance value and a relative direction vector of the key part and a contour center point of the human body region image, and the terminal device may locate each key part based on the distance value and the relative direction vector and perform the marking operation.
In this embodiment, the terminal device establishes an image coordinate axis based on the video image frame, and determines feature coordinates of each key location according to the location of each key location on the video image frame. Optionally, the terminal device may use the end point of the lower left corner of the video image frame as the origin of coordinates, or may use the center point of the image as the origin of coordinates, which is determined according to the default settings of the administrator or the device.
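For illustration, the key part list and the coordinate-origin conversion might be represented as in the sketch below; the exact part names and the choice of a lower-left origin are assumptions drawn from the description above.

```python
# Illustrative key part list mirroring the 17 parts named above, with the
# paired parts expanded into left/right instances; the naming is an assumption.
KEY_PARTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_wrist", "right_wrist",
    "left_hand", "right_hand",
    "left_waist", "right_waist",
    "left_knee", "right_knee",
    "left_foot", "right_foot",
]

def to_feature_coordinates(pixel_point, image_height):
    """Convert an (x, y) pixel location with the usual top-left image origin
    into feature coordinates whose origin is the lower-left corner of the
    frame, one of the two origin choices mentioned above."""
    x, y = pixel_point
    return (x, image_height - y)
```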
In S104, a key feature sequence regarding the key location is generated according to the feature coordinates of the key location corresponding to each of the video image frames.
In this embodiment, the terminal device needs to determine the motion trajectory of each key part. Therefore, based on the part identifier of the key part, the terminal device extracts the feature coordinates corresponding to that identifier from each video image frame, encapsulates all the feature coordinates related to the key part, and generates the key feature sequence of the key part. The order of the elements in the key feature sequence is consistent with the frame numbers of the video image frames, that is, the elements of the key feature sequence have a time-sequence relationship, so that how the key part changes over time can be determined from the key feature sequence.
Optionally, if a key part is occluded in some video image frames and therefore has no corresponding feature coordinates, the terminal device may establish a feature curve of the key part on a preset coordinate axis according to the frame numbers of the video image frames, connect the available feature coordinates in frame-number order, and fill in the feature coordinates of the missing video image frames through a smoothing algorithm.
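A sketch of assembling a key feature sequence and filling occluded frames, assuming NumPy and using linear interpolation as a stand-in for the unspecified smoothing algorithm:

```python
import numpy as np

def build_key_feature_sequence(coords_by_frame):
    """coords_by_frame: list of (x, y) tuples or None, ordered by frame
    number; None marks frames in which the key part was occluded.  Missing
    entries are filled by linear interpolation over frame numbers, a simple
    stand-in for the smoothing step described above."""
    frames = np.arange(len(coords_by_frame))
    known = [i for i, c in enumerate(coords_by_frame) if c is not None]
    if not known:
        raise ValueError("the key part is occluded in every frame")
    xs = np.interp(frames, known, [coords_by_frame[i][0] for i in known])
    ys = np.interp(frames, known, [coords_by_frame[i][1] for i in known])
    return list(zip(frames.tolist(), xs.tolist(), ys.tolist()))
```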
In S105, at least one candidate action of the target object is determined by the key feature sequence of each key part.
In this embodiment, according to the key feature sequences of the plurality of key parts, the terminal device may determine the motion trajectories of the different key parts, and then take the action types that conform to these motion trajectories as candidate actions. Specifically, the terminal device may determine the movement direction of each key part from its key feature sequence, match the movement directions of the key parts one by one against the key-part movement directions of each action template in an action type library, and select, based on the number of matched key parts, the action templates whose number of matched key parts is greater than a preset matching threshold as candidate actions of the target object.
Optionally, the terminal device may be provided with a maximum frame number, and then the terminal device divides the key feature sequence of the key part into a plurality of feature subsequences based on the maximum frame number, and determines action types of different feature subsequences respectively.
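The direction-matching step described above might look like the following sketch; the coarse four-direction labels and the template format are assumptions for illustration.

```python
def movement_direction(sequence):
    """Summarise a key feature sequence, a list of (frame, x, y) tuples, as a
    coarse direction label based on the net displacement from the first to
    the last frame."""
    _, x0, y0 = sequence[0]
    _, x1, y1 = sequence[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

def candidate_actions(part_sequences, action_templates, min_matched_parts):
    """part_sequences: {part_name: key feature sequence};
    action_templates: {action_name: {part_name: expected direction}}.
    An action template becomes a candidate action when more than
    min_matched_parts of its key parts move in the expected direction."""
    directions = {part: movement_direction(seq)
                  for part, seq in part_sequences.items()}
    candidates = []
    for action, expected in action_templates.items():
        matched = sum(1 for part, direction in expected.items()
                      if directions.get(part) == direction)
        if matched > min_matched_parts:
            candidates.append(action)
    return candidates
```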
In S106, a matching degree between each candidate action and the interactable object is calculated, and an action type of the target object is determined from the candidate actions according to the matching degree.
In this embodiment, the terminal device may obtain an interaction behavior list of the interactable object, detect similarities between the candidate actions and respective interaction behaviors in the interaction behavior list, select the maximum similarity as a matching degree between the candidate actions and the interactable object, and then determine an action type of the target object according to the matching degree of the respective candidate actions. It should be noted that, the identified action types may be multiple, for example, the user may cut the fruit with the fruit knife while holding the fruit, that is, the method includes two interactive actions of "holding" and "cutting", so the number of action types finally identified by the terminal device may be multiple. Based on the above, the terminal device may select the candidate action with the matching degree larger than the preset matching threshold as the action type currently executed by the target object.
For another example, the terminal device may determine the action type of the video file obtained by video monitoring, specifically, the video file may be a video file related to a security check area, determine the interaction behavior of personnel in the security check area, and detect whether there is an abnormal behavior of the user. The method comprises the steps of locating a target object to be identified in a video monitoring file, judging action types between the target object and each interactable object, wherein the interactable object can be a suitcase or a certificate to be authenticated, judging whether a user submits the suitcase to carry out security check operation according to a rule or takes dangerous goods from the suitcase to avoid the security check operation, and accordingly accuracy of a security check process can be improved.
Optionally, the terminal device may identify a distance value between each interactable object and the image of the human body region, select one interactable object with the lowest distance value as a target interaction object, and calculate a matching degree between the target interaction object and each candidate action, thereby determining an action type of the target object.
As can be seen from the above, in the human body action recognition method provided by the embodiment of the present invention, the video file of the target user whose actions need to be analyzed is obtained, each video image frame of the video file is parsed, the human body region image contained in each video image frame is determined, each key part is marked in the human body region image, and the change of each part of the target object is determined according to the feature coordinates of the key parts, so that the action type of the target object is determined and the human body action of the target object is automatically recognized. Compared with existing human body action recognition technology, the embodiment of the present invention does not need to rely on a neural network to recognize the action type in the video image and does not use optical flow information, which avoids the recognition delay caused by time-sequence recursion and improves recognition efficiency. On the other hand, the action of the target object is determined by locating a plurality of key parts and determining how these key parts change, which further improves accuracy, thereby improving the image recognition effect and the efficiency of object behavior analysis.
Fig. 2 shows a flowchart of a specific implementation of step S106 of the human body action recognition method according to a second embodiment of the present invention. Referring to fig. 2, relative to the embodiment described in fig. 1, the step S106 provided in this embodiment includes S1061 to S1066, which are described in detail below:
further, the calculating the matching degree between each candidate action and the interactable object, and determining the action type of the target object from the candidate actions according to the matching degree, includes:
in S1061, a distance value between the interactable object and the human body region image is acquired, and an interaction confidence of the interactable object is determined based on the distance value.
In this embodiment, the terminal device may mark an area image in which the interactable object is located on a video image frame, use a central coordinate of the area image as a feature coordinate of the interactable object, calculate an euclidean distance between the feature coordinate and a central coordinate of the human body area, and use the euclidean distance as a distance value between the interactable object and the human body area image. If the distance value is smaller, the interaction probability between the two is larger; conversely, if the distance value is larger, the interaction probability between the two is smaller. Therefore, the terminal device can calculate the interaction confidence between the interactable object and the target human body according to the distance value.
In S1062, the similarity between the key feature sequence and the standard feature sequence of each candidate action is calculated, and the similarity is identified as the action confidence of the candidate action.
In this embodiment, the terminal device needs to determine the probability that the identified candidate action is correct, so it obtains the standard feature sequence of the candidate action and calculates the similarity between the key feature sequence over the multiple video image frames and the standard feature sequence. The similarity may be calculated as follows: the terminal device plots a standard curve of the standard feature sequence and a behavior curve of the key feature sequence on a preset coordinate axis, calculates the area of the closed region enclosed by the two curves, and determines the similarity between the key feature sequence and the standard feature sequence based on that area. A larger area means a larger difference between the two actions and thus a smaller similarity; conversely, a smaller area means a smaller difference and a greater similarity.
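A sketch of this area-based similarity, assuming NumPy, curves sampled once per frame, and an exponential mapping from enclosed area to a confidence in (0, 1]; the mapping and its scale are assumptions.

```python
import numpy as np

def action_confidence(key_curve, standard_curve, scale=100.0):
    """Both curves are 1-D arrays sampled at the same frame numbers, for
    example the vertical coordinate of one key part per frame.  The enclosed
    area is approximated by summing the absolute per-frame differences, and a
    larger area maps to a lower confidence; the exponential mapping and the
    scale value are assumptions."""
    key_curve = np.asarray(key_curve, dtype=float)
    standard_curve = np.asarray(standard_curve, dtype=float)
    area = np.abs(key_curve - standard_curve).sum()
    return float(np.exp(-area / scale))
```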
In S1063, based on the object type of the interactable object, a probability of interaction of the candidate action with the object type is determined.
In this embodiment, the terminal device determines, according to the contour information of the interactable object, the object type of the interactable object, that is, which kind of article the interactable object is, and determines the interaction probability between that object type and the candidate action. For example, an object of type "basketball" can serve as the action recipient of candidate actions such as "shooting" and "kicking", so the interaction probability is high; for candidate actions such as "sitting" and "standing", the "basketball" object type cannot be interacted with, so the interaction probability is small. The terminal device may obtain the action recipient objects of each candidate action from an action record library, count the number of action records corresponding to the object type, and determine the interaction probability between the object type and the candidate action based on that number.
In S1064, an object region image of the interactable object is extracted from the video image frame, and an object confidence level of the interactable object is determined according to the object region image and a standard image preset by the object type.
In this embodiment, the terminal device further needs to determine the accuracy of identifying the interactable object, so that an object region image of the interactable object is obtained, similarity comparison is performed between the object region image and a standard image matched with the object type, and the object confidence of the interactable object is determined according to the similarity between the two images.
In S1065, the interaction confidence, the action confidence, the object confidence and the interaction probability are imported into a matching degree calculation model, and the matching degree of the candidate action is determined.
In the matching degree calculation model, s_a denotes the matching degree of the candidate action a; s_i denotes the interaction confidence; s_h denotes the action confidence; s_o denotes the object confidence; p_i denotes the interaction probability; and p_a denotes the preset trigger probability of the candidate action a.
In this embodiment, the terminal device imports the four calculated parameters into the matching degree calculation model to determine the matching degree between the candidate action and the interactable object, so that action types can be screened and identified with the help of the interaction object. Specifically, the trigger probability of a candidate action can be calculated from the action type corresponding to the previous image frame and the action type of the following image frame; because actions have a certain continuity, the trigger probability of the current action can be determined from the actions already triggered and the subsequent actions.
In S1066, the candidate action with the matching degree greater than the matching threshold is selected as the action type of the target object.
In this embodiment, since there may be a plurality of interactions with the interactable object, the terminal device may select, as the type of the action of the target object, a candidate action having a matching degree greater than a preset matching threshold.
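Because the matching degree calculation model itself is published as an image in the original patent, the sketch below simply assumes a multiplicative combination of the five factors; it illustrates S1065 and S1066 under that assumption and is not the patent's exact formula.

```python
def matching_degree(interaction_conf, action_conf, object_conf,
                    interaction_prob, trigger_prob):
    """Combine the five factors into a single matching degree for one
    candidate action; a plain product is assumed here, since the patent's
    own formula is only available as an image."""
    return (interaction_conf * action_conf * object_conf
            * interaction_prob * trigger_prob)

def select_action_types(candidates, threshold):
    """candidates: {action_name: (s_i, s_h, s_o, p_i, p_a)}.  Returns the
    candidate actions whose matching degree exceeds the matching threshold,
    mirroring S1066."""
    return [action for action, factors in candidates.items()
            if matching_degree(*factors) > threshold]
```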
In the embodiment of the invention, the confidence degrees of the candidate actions and the interactable objects in a plurality of dimensions are determined, so that the matching degree of each candidate action is calculated, the accuracy of the matching degree calculation can be improved, and the accuracy of human action recognition is improved.
Fig. 3 shows a flowchart of a specific implementation of step S104 of the human body action recognition method according to a third embodiment of the present invention. Referring to fig. 3, relative to the embodiment described in fig. 1, the step S104 provided in this embodiment includes S1041 to S1045, which are described in detail below:
further, the generating a key feature sequence about the key location according to the feature coordinates of the key location corresponding to each video image frame includes:
in S1041, a first feature coordinate and a second feature coordinate of the same key part in two video image frames with adjacent frames are obtained, and an image distance value between the first feature coordinate and the second feature coordinate is calculated.
In this embodiment, the terminal device needs to track the key parts of the human body, and if the displacement of the same key part in two adjacent image frames is detected to be too large, the two key parts are identified to belong to different human bodies, so that re-tracking can be quickly performed, and the accuracy of motion recognition is improved. Based on the above, the terminal device obtains the first feature coordinates and the second feature coordinates of the same key part in two video image frames adjacent to each other in the frame number, and introduces the two feature coordinates into the euclidean distance calculation formula to calculate the distance value between the two coordinate points, namely the image distance value. The image distance value specifically refers to the distance between two coordinate points on the video image frame, and is not the moving distance of the key part in the actual scene, so that the image distance value needs to be subjected to numerical conversion.
In S1042, an image area of the human body region image is calculated, and a photographing focal length between the target object and a photographing module is determined based on the image area.
In this embodiment, the terminal device acquires the area occupied by the human body region image in the video image frame, that is, the image area. The terminal device is configured with a standard human body area and the standard shooting focal length corresponding to that area. The terminal device may calculate the ratio between the current image area and the standard human body area to determine a scaling ratio, and calculate the actual shooting focal length between the target object and the shooting module, that is, the shooting focal length described above, based on the scaling ratio and the standard shooting focal length.
In S1043, the shooting focal length, the image distance value and the shooting frame rate of the video file are imported into a distance conversion model, and the actual moving distances of the key parts in the two video image frames are calculated.
In the distance conversion model, Dist is the actual moving distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is a preset reference focal length; ActFrame is the shooting frame rate; and BaseFrame is the reference frame rate.
In this embodiment, the shooting focal length corresponding to the video image frame, the image distance values of the two key parts and the shooting frame rate of the video file are imported into the distance conversion model by the terminal device, so that the actual moving distance of the key parts in the scene can be calculated.
In S1044, identifying two feature coordinates of which the actual moving distance is smaller than a preset distance threshold as feature coordinates associated with each other.
In this embodiment, if the terminal device detects that the actual moving distance is greater than or equal to the preset distance threshold, it indicates that the movement of the key part exceeds a normal moving distance; in that case the key parts in the two video image frames are identified as belonging to different target objects, and the two feature coordinates are judged to be non-associated feature coordinates. Otherwise, if the actual moving distance is smaller than the preset distance threshold, the key parts in the two video image frames belong to the same target object, and the two feature coordinates are judged to be associated feature coordinates. This achieves the purpose of tracking the target object, avoids the situation in which the trajectory of user A is being tracked but the trajectory of user B is followed instead, and improves the accuracy of action recognition.
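The distance conversion model is likewise published as an image; the sketch below assumes one plausible form, rescaling the image distance by the focal-length ratio and the frame-rate ratio, and applies the association check of S1044 under that assumption.

```python
def actual_moving_distance(image_dist, fig_dist, base_dist,
                           act_frame_rate, base_frame_rate):
    """One plausible instantiation of the distance conversion model: the
    image distance is rescaled by the ratio of the shooting focal length to
    the reference focal length and by the ratio of the reference frame rate
    to the shooting frame rate.  The exact combination is an assumption."""
    return image_dist * (fig_dist / base_dist) * (base_frame_rate / act_frame_rate)

def are_associated(image_dist, fig_dist, base_dist,
                   act_frame_rate, base_frame_rate, distance_threshold):
    """Two feature coordinates are treated as associated (same target object)
    only when the converted moving distance stays below the threshold."""
    dist = actual_moving_distance(image_dist, fig_dist, base_dist,
                                  act_frame_rate, base_frame_rate)
    return dist < distance_threshold
```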
At S1045, the key feature sequence for the key location is generated according to all the feature coordinates associated with each other.
In this embodiment, the terminal device filters all the feature coordinates that are not associated, encapsulates the feature coordinates that are associated with each other, and generates a key feature sequence related to the key location.
In the embodiment of the invention, the abnormal characteristic coordinate points can be filtered by calculating the actual moving distance of the key parts under different frames, so that the accuracy of motion recognition is improved.
Fig. 4 shows a flowchart of a specific implementation of step S102 of the human body action recognition method according to a fourth embodiment of the present invention. Referring to fig. 4, relative to the embodiments described in fig. 1 to fig. 3, the step S102 provided in this embodiment includes S1021 to S1024, which are described in detail below:
further, the analyzing each video image frame separately, extracting a human body area image about the target object in the video image frame includes:
in S1021, a contour curve of the video image frame is acquired by a contour recognition algorithm, and an area surrounded by each contour curve is calculated.
In this embodiment, the terminal device determines the contour curve in the video image frame by a contour recognition algorithm. The specific way of identifying the contour line can be as follows: and the terminal equipment calculates the difference value of pixel values between two adjacent coordinate points, if the difference value is larger than a preset contour threshold value, the coordinate point is identified as the coordinate point where the contour line is located, and all the coordinate points on the contour line obtained by identification are connected to form a continuous contour curve. Each closed contour curve corresponds to a subject.
In this embodiment, the terminal device marks all contour curves on the video image frame, and integrates the contour curves and/or the areas enclosed between the boundaries of the video image frame, so as to obtain the area corresponding to each contour curve.
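A minimal sketch of the neighbour-difference rule for locating contour points, assuming a NumPy grayscale image; connecting the marked points into curves and integrating the enclosed areas are omitted.

```python
import numpy as np

def contour_mask(gray_image, contour_threshold):
    """Mark pixels whose value differs from the right-hand or lower neighbour
    by more than the contour threshold, following the pixel-difference rule
    described above (a single-channel grayscale image is assumed)."""
    img = np.asarray(gray_image, dtype=np.int32)  # avoid uint8 wrap-around
    mask = np.zeros(img.shape, dtype=bool)
    mask[:, :-1] |= np.abs(img[:, 1:] - img[:, :-1]) > contour_threshold
    mask[:-1, :] |= np.abs(img[1:, :] - img[:-1, :]) > contour_threshold
    return mask
```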
In S1022, a human body recognition window of the video image frame is generated according to each of the area areas.
In this embodiment, because of different scaling ratios, the size of the human body recognition window needs to be adjusted accordingly. Based on this, the terminal device may calculate the scaling ratio corresponding to the video image frame according to the area of each shooting subject, query the size of the human body recognition window associated with that scaling ratio, and then generate the human body recognition window matched with the video image frame.
Optionally, in this embodiment, the terminal device adopts a yolov3 human body recognition algorithm, and yolov3 requires three human body recognition windows to be configured. Based on this, the terminal device generates the distribution of region areas from the areas enclosed by the contour curves, selects the three area values with the highest distribution density as characteristic areas, and generates the human body recognition windows corresponding to these three characteristic areas, namely three feature maps.
In S1023, sliding framing is performed on the video image frame based on the human body recognition window, and a plurality of candidate region images are generated.
In this embodiment, after generating the human body recognition window corresponding to the scaling ratio of the video image frame, the terminal device may perform sliding framing on the video image frame through the human body recognition window, and use the area image framed each time as the candidate area image. If a plurality of human body recognition windows with different sizes exist, concurrent threads corresponding to the number of the human body recognition windows are created, the plurality of video image frames are copied, and the human body recognition windows are respectively controlled to slide and frame on different video image frames through the plurality of concurrent threads, namely, the sliding and frame operation of the human body recognition windows with different sizes are mutually independent and do not affect each other, and candidate region images with different sizes are generated.
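Sliding framing with a single human body recognition window might be enumerated as in the sketch below; the stride is an assumption, since the patent does not specify a step size.

```python
def sliding_candidate_regions(frame_width, frame_height,
                              window_width, window_height, stride):
    """Enumerate the regions framed by one human body recognition window as
    it slides over a video image frame; each region is returned as
    (x, y, window_width, window_height)."""
    regions = []
    for y in range(0, frame_height - window_height + 1, stride):
        for x in range(0, frame_width - window_width + 1, stride):
            regions.append((x, y, window_width, window_height))
    return regions
```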
In S1024, the coincidence ratios between the candidate region images and the standard human body template are calculated, respectively, and the candidate region image with the coincidence ratio greater than the preset coincidence ratio threshold is selected as the human body region image.
In this embodiment, the terminal device calculates the coincidence rate between each candidate region image and the standard human body template. If the coincidence rate between a candidate region image and the standard human body template is high, the shooting object corresponding to that region image is highly similar to a human body, so the candidate region can be identified as a human body region image; conversely, if the coincidence rate between the two is low, the form of the region image has a low similarity to a human body, and the region image is identified as a non-human-body region image. Because the video image frame may contain several different users, the terminal device may identify all candidate regions whose coincidence rate exceeds the preset coincidence rate threshold as human body region images; in this case, the terminal device may locate the face image of each human body region image and match the face images against the standard face of the target object, thereby selecting the human body region image that matches the standard face as the human body region image of the target object.
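Reading the coincidence rate as an intersection-over-union between binary masks is one possible interpretation; the following sketch computes it under that assumption.

```python
import numpy as np

def coincidence_rate(candidate_mask, template_mask):
    """Overlap between a binary candidate-region mask and a binary standard
    human body template of the same shape, computed as intersection over
    union; reading the 'coincidence rate' this way is an assumption."""
    candidate_mask = np.asarray(candidate_mask, dtype=bool)
    template_mask = np.asarray(template_mask, dtype=bool)
    union = np.logical_or(candidate_mask, template_mask).sum()
    if union == 0:
        return 0.0
    intersection = np.logical_and(candidate_mask, template_mask).sum()
    return float(intersection) / float(union)
```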
In the embodiment of the invention, the contour curves in the video image frames are acquired, so that the scaling ratio of the video image frames is determined based on the area of each contour curve, and the human body recognition window corresponding to the scaling ratio is generated to perform the recognition operation of the human body area images, thereby improving the recognition accuracy.
Fig. 5 shows a flowchart of a specific implementation of step S105 of the human body action recognition method according to a fifth embodiment of the present invention. Referring to fig. 5, relative to the embodiments described in fig. 1 to fig. 3, the step S105 provided in this embodiment includes S1051 to S1052, which are described in detail below:
further, the determining at least one candidate action of the target object by the key feature sequence of each key part includes:
in S1051, feature coordinates of each of the key feature sequences are marked in a preset coordinate axis, and a part change curve for each of the key parts is generated.
In this embodiment, the terminal device marks each feature coordinate on a preset coordinate axis according to the coordinate values of each feature coordinate in each key feature sequence and the frame number of the corresponding video image frame, and connects each feature coordinate to generate a location change curve about the key location. The coordinate axis may be a coordinate axis established based on the video image frame, with the horizontal axis corresponding to the length of the video image frame and the vertical axis corresponding to the width of the video image frame.
In S1052, the part change curve is matched with a standard action curve of each candidate action in a preset action library, and the candidate action of the target object is determined based on the matching result.
In this embodiment, the terminal device matches the part change curves of all the key parts against the standard action curves of each candidate action in the preset action library, calculates the coincidence rate of the two curves, and selects the action with the highest coincidence rate as a candidate action of the target object.
In the embodiment of the invention, the action type of the target object can be intuitively determined by drawing the part change curve of the key part, and the accuracy of the action type is improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Fig. 6 is a block diagram of a human motion recognition apparatus according to an embodiment of the present invention, where the human motion recognition apparatus includes units for performing the steps in the embodiment corresponding to fig. 1. Please refer to fig. 1 and the related description of the embodiment corresponding to fig. 1. For convenience of explanation, only the portions related to the present embodiment are shown.
Referring to fig. 6, the human motion recognition apparatus includes:
a video file acquisition unit 61 for acquiring a video file of a target object; the video file includes a plurality of video image frames;
a human body region image extracting unit 62, configured to parse each of the video image frames, extract a human body region image related to the target object in the video image frames, and determine an interactable object contained in the video image frames;
a key part identification unit 63, configured to mark each key part in a preset human body key part list in the human body area image, and obtain feature coordinates of each key part;
a key feature sequence generating unit 64, configured to generate a key feature sequence related to the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
a candidate action recognition unit 65 for determining at least one candidate action of the target object by the key feature sequences of the respective key parts;
an action type recognition unit 66, configured to calculate a degree of matching between each candidate action and the interactable object, and determine an action type of the target object from the candidate actions according to the degree of matching.
Optionally, the action type recognition unit 66 includes:
the interactive confidence calculation unit is used for acquiring a distance value between the interactable object and the human body region image and determining the interactive confidence of the interactable object based on the distance value;
the action confidence degree identification unit is used for respectively calculating the similarity between the key feature sequence and the standard feature sequence of each candidate action and identifying the similarity as the action confidence degree of the candidate action;
an interaction probability determining unit, configured to determine an interaction probability of the candidate action with the object type based on the object type of the interactable object;
the object confidence degree identification unit is used for extracting an object region image of the interactable object from the video image frame and determining the object confidence degree of the interactable object according to the object region image and a standard image preset by the object type;
a matching degree calculation unit, configured to import the interaction confidence, the action confidence, the object confidence and the interaction probability into a matching degree calculation model and determine the matching degree of the candidate action. In the matching degree calculation model, s_a denotes the matching degree of the candidate action a; s_i denotes the interaction confidence; s_h denotes the action confidence; s_o denotes the object confidence; p_i denotes the interaction probability; and p_a denotes the preset trigger probability of the candidate action a;
and the candidate action selecting unit is used for selecting the candidate actions with the matching degree larger than a matching threshold value and identifying the candidate actions as the action types of the target objects.
Optionally, the key feature sequence generating unit 64 includes:
the image distance value calculation unit is used for acquiring first feature coordinates and second feature coordinates of the same key part in two adjacent video image frames of the frame number, and calculating an image distance value between the first feature coordinates and the second feature coordinates;
a shooting focal length determining unit for calculating an image area of the human body region image and determining a shooting focal length between the target object and a shooting module based on the image area;
an actual moving distance calculating unit, configured to import the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model and calculate the actual moving distances of the key parts in the two video image frames. In the distance conversion model, Dist is the actual moving distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is a preset reference focal length; ActFrame is the shooting frame rate; and BaseFrame is the reference frame rate;
the associated coordinate recognition unit is used for recognizing the two feature coordinates with the actual moving distance smaller than a preset distance threshold as feature coordinates which are associated with each other;
and the associated coordinate packaging unit is used for generating the key feature sequence related to the key part according to all the feature coordinates which are associated with each other.
Optionally, the human body region image extraction unit 62 includes:
the contour curve acquisition unit is used for acquiring contour curves of the video image frames through a contour recognition algorithm and calculating the area surrounded by each contour curve;
the human body identification window generation unit is used for generating a human body identification window of the video image frame according to the area of each region;
a candidate region image extraction unit, configured to perform sliding frame extraction on the video image frame based on the human body recognition window, and generate a plurality of candidate region images;
and the human body region image matching unit is used for respectively calculating the coincidence rate between each candidate region image and the standard human body template, and selecting the candidate region image with the coincidence rate larger than a preset coincidence rate threshold value as the human body region image.
Optionally, the action type recognition unit 65 includes:
the part change curve generating unit is used for marking the feature coordinates of each key feature sequence on a preset coordinate axis and generating a part change curve for each key part;
and the candidate action selecting unit is used for matching the part change curve with the standard action curve of each candidate action in a preset action library, and determining the action type of the target object based on the matching result.
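Purely as an illustration of this curve matching, the Python sketch below represents each part change curve as a sequence of (x, y) feature coordinates and scores it against stored standard action curves with cosine similarity; the similarity measure, the crude length alignment and the threshold are assumptions rather than the patented matching rule:

```python
# Sketch of matching a part change curve against standard action curves in a preset
# action library. Cosine similarity over crudely aligned curves and the 0.8 threshold
# are assumptions standing in for the matching rule used by the patent.
import numpy as np


def part_change_curve(key_feature_sequence):
    """Stack the (x, y) feature coordinates of one key part into a T x 2 trajectory."""
    return np.asarray(key_feature_sequence, dtype=float)


def match_action(curve, action_library, min_similarity=0.8):
    """Return the best-matching candidate action name, or None if nothing matches."""
    best_name, best_sim = None, min_similarity
    for name, standard_curve in action_library.items():
        a = curve.ravel()
        b = np.asarray(standard_curve, dtype=float).ravel()
        n = min(a.size, b.size)  # crude alignment of curves of different length
        sim = float(np.dot(a[:n], b[:n])
                    / (np.linalg.norm(a[:n]) * np.linalg.norm(b[:n]) + 1e-9))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name
```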
Therefore, the human body action recognition device provided by this embodiment of the invention can recognize the action type in a video image without relying on a neural network or on optical flow information, which avoids the recognition delay caused by time-sequence recursion and improves recognition efficiency. On the other hand, the terminal device can determine the interactable object in the video image frame and use the interaction behaviour to decide whether the target user is interacting, so that several similar gestures can be distinguished and the accuracy of action recognition is further improved.
Fig. 7 is a schematic diagram of a terminal device according to another embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72, for example a human body action recognition program, stored in the memory 71 and executable on the processor 70. When the processor 70 executes the computer program 72, the steps in each of the above human body action recognition method embodiments are implemented, for example S101 to S106 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the units in the above device embodiments, for example the functions of the modules 61 to 66 shown in Fig. 6.
By way of example, the computer program 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a video file acquisition unit, a human body region image extraction unit, a key part recognition unit, a key feature sequence generation unit, a candidate action recognition unit and an action type recognition unit, whose specific functions are as described above.
The terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palm computer or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. It will be appreciated by those skilled in the art that Fig. 7 is merely an example of the terminal device 7 and does not constitute a limitation of the terminal device 7, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may further include an input-output device, a network access device, a bus, and the like.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card or a Flash memory Card (Flash Card) provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer program as well as other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. A method for recognizing human motion, comprising:
acquiring a video file of a target object; the video file includes a plurality of video image frames;
analyzing each video image frame, extracting a human body area image related to the target object in the video image frame, and determining an interactable object contained in the video image frame;
Marking each key part in a preset key part list of the human body in the human body area image, and acquiring feature coordinates of each key part;
generating a key feature sequence related to the key part according to the feature coordinates of the key part corresponding to each video image frame;
determining at least one candidate action of the target object through the key feature sequences of the key parts;
calculating the matching degree between each candidate action and the interactable object respectively, and determining the action type of the target object from the candidate actions according to the matching degree;
the calculating the matching degree between each candidate action and the interactable object, and determining the action type of the target object from the candidate actions according to the matching degree, includes:
acquiring a distance value between the interactable object and the human body region image, and determining interaction confidence of the interactable object based on the distance value;
respectively calculating the similarity between the key feature sequence and the standard feature sequence of each candidate action, and identifying the similarity as the action confidence of the candidate action;
Determining an interaction probability of the candidate action with the object type based on the object type of the interactable object;
extracting an object region image of the interactable object from the video image frame, and determining the object confidence of the interactable object according to the object region image and a standard image preset by the object type;
importing the interaction confidence, the action confidence, the object confidence and the interaction probability into a matching degree calculation model to determine the matching degree of the candidate action; the matching degree calculation model is specifically as follows:
wherein the output of the model is the matching degree of the candidate action a, and the inputs are the interaction confidence, the action confidence s_h, the object confidence s_o, the interaction probability, and the preset triggering probability of the candidate action a;
and selecting the candidate actions with the matching degree larger than a matching threshold as action types of the target objects.
2. The method of claim 1, wherein generating a key feature sequence for the key location based on the feature coordinates of the key location in each of the video image frames, comprises:
Acquiring first feature coordinates and second feature coordinates of the same key part in two adjacent video image frames, and calculating an image distance value between the first feature coordinates and the second feature coordinates;
calculating the image area of the human body area image, and determining the shooting focal length between the target object and the shooting module based on the image area;
importing the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model, and calculating the actual moving distance of the key part in the two video image frames; the distance conversion model is specifically as follows:
wherein Dist is the actual movement distance; StandardDist is the image distance value; figDist is the shooting focal length; baseDist is a preset reference focal length; actFrame is the shooting frame rate; baseFrame is the reference frame rate;
identifying two feature coordinates of which the actual moving distance is smaller than a preset distance threshold as feature coordinates which are associated with each other;
and generating the key feature sequence related to the key part according to all the feature coordinates which are mutually associated.
3. The method of any of claims 1-2, wherein the parsing each of the video image frames, extracting a human body region image of the video image frames with respect to the target object, and determining an interactable object contained in the video image frames, comprises:
Acquiring contour curves of the video image frames through a contour recognition algorithm, and calculating the area surrounded by each contour curve;
according to the area of each region, generating a human body identification window of the video image frame;
sliding frame extraction is carried out on the video image frame based on the human body recognition window, and a plurality of candidate area images are generated;
and respectively calculating the coincidence rate between each candidate region image and the standard human body template, and selecting the candidate region image with the coincidence rate larger than a preset coincidence rate threshold value as the human body region image.
4. The method of any of claims 1-2, wherein said determining at least one candidate action of the target object from the key feature sequence of each of the key locations comprises:
marking feature coordinates of each key feature sequence in a preset coordinate axis to generate a part change curve about each key part;
and matching the part change curve with a standard action curve of each candidate action in a preset action library, and determining the candidate action of the target object based on a matching result.
5. An apparatus for recognizing human motion, comprising:
a video file acquisition unit for acquiring a video file of a target object; the video file includes a plurality of video image frames;
a human body region image extraction unit, configured to parse each video image frame, extract a human body region image related to the target object in the video image frame, and determine an interactable object contained in the video image frame;
the key part identification unit is used for marking each key part in a preset human body key part list in the human body area image and acquiring the characteristic coordinates of each key part;
a key feature sequence generating unit, configured to generate a key feature sequence related to the key location according to the feature coordinates corresponding to the key location in each video image frame;
a candidate action recognition unit, configured to determine at least one candidate action of the target object through the key feature sequences of the respective key parts;
the action type recognition unit is used for respectively calculating the matching degree between each candidate action and the interactable object and determining the action type of the target object from the candidate actions according to the matching degree;
The action type recognition unit includes:
the interaction confidence calculation unit is used for acquiring a distance value between the interactable object and the human body region image and determining the interaction confidence of the interactable object based on the distance value;
the action confidence degree identification unit is used for respectively calculating the similarity between the key feature sequence and the standard feature sequence of each candidate action and identifying the similarity as the action confidence degree of the candidate action;
an interaction probability determining unit, configured to determine an interaction probability of the candidate action with the object type based on the object type of the interactable object;
the object confidence degree identification unit is used for extracting an object region image of the interactable object from the video image frame and determining the object confidence degree of the interactable object according to the object region image and a standard image preset by the object type;
a matching degree calculation unit, configured to import the interaction confidence, the action confidence, the object confidence and the interaction probability into a matching degree calculation model and determine the matching degree of the candidate action; the matching degree calculation model is specifically as follows:
wherein the output of the model is the matching degree of the candidate action a, and the inputs are the interaction confidence, the action confidence s_h, the object confidence s_o, the interaction probability, and the preset triggering probability of the candidate action a;
and the candidate action selecting unit is used for selecting the candidate actions with the matching degree larger than a matching threshold value and identifying the candidate actions as the action types of the target objects.
6. The apparatus according to claim 5, wherein the key feature sequence generating unit includes:
the image distance value calculation unit is used for acquiring first feature coordinates and second feature coordinates of the same key part in two adjacent video image frames, and calculating an image distance value between the first feature coordinates and the second feature coordinates;
a shooting focal length determining unit for calculating an image area of the human body region image and determining a shooting focal length between the target object and a shooting module based on the image area;
an actual moving distance calculating unit, configured to import the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model, and calculate the actual moving distance of the key part in the two video image frames; the distance conversion model is specifically as follows:
wherein Dist is the actual movement distance; StandardDist is the image distance value; figDist is the shooting focal length; baseDist is a preset reference focal length; actFrame is the shooting frame rate; baseFrame is the reference frame rate;
the associated coordinate recognition unit is used for recognizing the two feature coordinates with the actual moving distance smaller than a preset distance threshold as feature coordinates which are associated with each other;
and the associated coordinate packaging unit is used for generating the key feature sequence related to the key part according to all the feature coordinates which are associated with each other.
7. A terminal device, characterized in that it comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor executing the steps of the method according to any one of claims 1 to 4.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4.
CN201910264883.XA 2019-04-03 2019-04-03 Human body action recognition method and device Active CN110135246B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910264883.XA CN110135246B (en) 2019-04-03 2019-04-03 Human body action recognition method and device
PCT/CN2019/103161 WO2020199479A1 (en) 2019-04-03 2019-08-29 Human motion recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910264883.XA CN110135246B (en) 2019-04-03 2019-04-03 Human body action recognition method and device

Publications (2)

Publication Number Publication Date
CN110135246A CN110135246A (en) 2019-08-16
CN110135246B true CN110135246B (en) 2023-10-20

Family

ID=67569223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910264883.XA Active CN110135246B (en) 2019-04-03 2019-04-03 Human body action recognition method and device

Country Status (2)

Country Link
CN (1) CN110135246B (en)
WO (1) WO2020199479A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417205A (en) * 2019-08-20 2021-02-26 富士通株式会社 Target retrieval device and method and electronic equipment
CN110738588A (en) * 2019-08-26 2020-01-31 恒大智慧科技有限公司 Intelligent community toilet management method and computer storage medium
CN111288986B (en) * 2019-12-31 2022-04-12 中科彭州智慧产业创新中心有限公司 Motion recognition method and motion recognition device
CN113496143A (en) * 2020-03-19 2021-10-12 北京市商汤科技开发有限公司 Action recognition method and device, and storage medium
CN111539352A (en) * 2020-04-27 2020-08-14 支付宝(杭州)信息技术有限公司 Method and system for judging human body joint motion direction
CN111814775B (en) * 2020-09-10 2020-12-11 平安国际智慧城市科技股份有限公司 Target object abnormal behavior identification method, device, terminal and storage medium
CN112528785A (en) * 2020-11-30 2021-03-19 联想(北京)有限公司 Information processing method and device
CN112418137B (en) * 2020-12-03 2022-10-25 杭州云笔智能科技有限公司 Operation identification method and system for target object
CN112528823B (en) * 2020-12-04 2022-08-19 燕山大学 Method and system for analyzing batcharybus movement behavior based on key frame detection and semantic component segmentation
CN112364835B (en) * 2020-12-09 2023-08-11 武汉轻工大学 Video information frame taking method, device, equipment and storage medium
CN112580499A (en) * 2020-12-17 2021-03-30 上海眼控科技股份有限公司 Text recognition method, device, equipment and storage medium
CN112529943B (en) * 2020-12-22 2024-01-16 深圳市优必选科技股份有限公司 Object detection method, object detection device and intelligent equipment
CN112712906A (en) * 2020-12-29 2021-04-27 安徽科大讯飞医疗信息技术有限公司 Video image processing method and device, electronic equipment and storage medium
CN112784760B (en) 2021-01-25 2024-04-12 北京百度网讯科技有限公司 Human behavior recognition method, device, equipment and storage medium
CN112883816A (en) * 2021-01-26 2021-06-01 百度在线网络技术(北京)有限公司 Information pushing method and device
CN113288087B (en) * 2021-06-25 2022-08-16 成都泰盟软件有限公司 Virtual-real linkage experimental system based on physiological signals
CN113553951B (en) * 2021-07-23 2024-04-16 北京市商汤科技开发有限公司 Object association method and device, electronic equipment and computer readable storage medium
CN113784059B (en) * 2021-08-03 2023-08-18 阿里巴巴(中国)有限公司 Video generation and splicing method, equipment and storage medium for clothing production
CN113657278A (en) * 2021-08-18 2021-11-16 成都信息工程大学 Motion gesture recognition method, device, equipment and storage medium
CN113869274B (en) * 2021-10-13 2022-09-06 深圳联和智慧科技有限公司 Unmanned aerial vehicle intelligent tracking monitoring method and system based on city management
CN114157526B (en) * 2021-12-23 2022-08-12 广州新华学院 Digital image recognition-based home security remote monitoring method and device
CN115171217B (en) * 2022-07-27 2023-03-03 北京拙河科技有限公司 Action recognition method and system under dynamic background
CN115620392A (en) * 2022-09-26 2023-01-17 珠海视熙科技有限公司 Action counting method, device, medium and fitness equipment
CN116704405A (en) * 2023-05-22 2023-09-05 阿里巴巴(中国)有限公司 Behavior recognition method, electronic device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930767A (en) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Human body skeleton-based action recognition method
CN106022251A (en) * 2016-05-17 2016-10-12 沈阳航空航天大学 Abnormal double-person interaction behavior recognition method based on vision co-occurrence matrix sequence
WO2017000917A1 (en) * 2015-07-01 2017-01-05 乐视控股(北京)有限公司 Positioning method and apparatus for motion-stimulation button
CN107423721A (en) * 2017-08-08 2017-12-01 珠海习悦信息技术有限公司 Interactive action detection method, device, storage medium and processor
CN108197589A (en) * 2018-01-19 2018-06-22 北京智能管家科技有限公司 Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture
WO2018113405A1 (en) * 2016-12-19 2018-06-28 广州虎牙信息科技有限公司 Live broadcast interaction method based on video stream, and corresponding apparatus thereof
CN108304762A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 A kind of human body attitude matching process and its equipment, storage medium, terminal
CN109325456A (en) * 2018-09-29 2019-02-12 佳都新太科技股份有限公司 Target identification method, device, target identification equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101817583B1 (en) * 2015-11-30 2018-01-12 한국생산기술연구원 System and method for analyzing behavior pattern using depth image
CN107335192A (en) * 2017-05-26 2017-11-10 深圳奥比中光科技有限公司 Move supplemental training method, apparatus and storage device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000917A1 (en) * 2015-07-01 2017-01-05 乐视控股(北京)有限公司 Positioning method and apparatus for motion-stimulation button
CN105930767A (en) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Human body skeleton-based action recognition method
CN106022251A (en) * 2016-05-17 2016-10-12 沈阳航空航天大学 Abnormal double-person interaction behavior recognition method based on vision co-occurrence matrix sequence
WO2018113405A1 (en) * 2016-12-19 2018-06-28 广州虎牙信息科技有限公司 Live broadcast interaction method based on video stream, and corresponding apparatus thereof
CN107423721A (en) * 2017-08-08 2017-12-01 珠海习悦信息技术有限公司 Interactive action detection method, device, storage medium and processor
CN108304762A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 A kind of human body attitude matching process and its equipment, storage medium, terminal
CN108197589A (en) * 2018-01-19 2018-06-22 北京智能管家科技有限公司 Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture
CN109325456A (en) * 2018-09-29 2019-02-12 佳都新太科技股份有限公司 Target identification method, device, target identification equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
View-invariant action recognition method based on motion direction; Mei Xue; Zhang Jifa; Xu Songsong; Hu Shi; Computer Engineering; Vol. 38, No. 15; pp. 159-165 *

Also Published As

Publication number Publication date
WO2020199479A1 (en) 2020-10-08
CN110135246A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110135246B (en) Human body action recognition method and device
CN110147717B (en) Human body action recognition method and device
WO2021114892A1 (en) Environmental semantic understanding-based body movement recognition method, apparatus, device, and storage medium
WO2019184749A1 (en) Trajectory tracking method and apparatus, and computer device and storage medium
Yan et al. Learning the change for automatic image cropping
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
JP5554984B2 (en) Pattern recognition method and pattern recognition apparatus
US9465992B2 (en) Scene recognition method and apparatus
WO2019071664A1 (en) Human face recognition method and apparatus combined with depth information, and storage medium
Baskan et al. Projection based method for segmentation of human face and its evaluation
CN110472491A (en) Abnormal face detecting method, abnormality recognition method, device, equipment and medium
US9626552B2 (en) Calculating facial image similarity
CN110348331B (en) Face recognition method and electronic equipment
WO2021017286A1 (en) Facial recognition method and apparatus, electronic device and non-volatile computer readable storage medium
JP2017033547A (en) Information processing apparatus, control method therefor, and program
US10650234B2 (en) Eyeball movement capturing method and device, and storage medium
JP7107598B2 (en) Authentication face image candidate determination device, authentication face image candidate determination method, program, and recording medium
Seo et al. Effective and efficient human action recognition using dynamic frame skipping and trajectory rejection
JP2009157767A (en) Face image recognition apparatus, face image recognition method, face image recognition program, and recording medium recording this program
CN111695462A (en) Face recognition method, face recognition device, storage medium and server
JP2022542199A (en) KEYPOINT DETECTION METHOD, APPARATUS, ELECTRONICS AND STORAGE MEDIA
Shaikh et al. Gait recognition using partial silhouette-based approach
CN110363790A (en) Target tracking method, device and computer readable storage medium
CN108875488B (en) Object tracking method, object tracking apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant