WO2020199479A1 - A method and device for recognizing human body actions (一种人体动作的识别方法及设备) - Google Patents

A method and device for recognizing human body actions (一种人体动作的识别方法及设备)

Info

Publication number
WO2020199479A1
Authority
WO
WIPO (PCT)
Prior art keywords
action
key
human body
candidate
video image
Prior art date
Application number
PCT/CN2019/103161
Other languages
English (en)
French (fr)
Inventor
叶明
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020199479A1 publication Critical patent/WO2020199479A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training

Definitions

  • This application belongs to the field of image recognition technology, and in particular relates to a method and equipment for recognizing human movements.
  • The embodiments of the present application provide a method and device for recognizing human body actions, to solve the problem that existing human action recognition methods are slow and not very accurate; in particular, for some similar postures a convolutional neural network cannot distinguish the actions reliably, which further reduces the accuracy of action recognition.
  • the first aspect of the embodiments of the present application provides a method for recognizing human movements, including:
  • the video file includes a plurality of video image frames
  • the matching degree between each candidate action and the interactive object is calculated respectively, and the action type of the target object is determined from the candidate actions according to the matching degree.
  • In the embodiments of the present application, the video file of the target user whose action behavior needs to be analyzed is obtained, each video image frame of the file is parsed to determine the human body region image it contains, and the interactable objects in the frame that may interact with the target user are recognized; each key part is marked in the human body region image, and the change of each part of the target object is determined from the feature coordinates of the key parts, so as to determine the candidate actions of the target object. According to the matching degree between the candidate actions and the interactable objects, candidate actions with similar poses are further screened, the action type of the target object is determined, and the human body action of the target object is recognized automatically.
  • the embodiment of the present application does not need to rely on neural networks to recognize the action type of the video image, and does not rely on optical flow information to avoid the recognition delay caused by the need to perform time sequence recursion. Therefore, the efficiency of recognition is improved.
  • In addition, the terminal device determines the interactable objects in the video image frame and uses the interactive action to decide whether the target user exhibits interactive behavior, so that multiple similar postures can be distinguished and the accuracy of action recognition is further improved.
  • FIG. 1 is an implementation flowchart of a method for recognizing human body movements provided by the first embodiment of the present application
  • FIG. 2 is a specific implementation flow chart of the method S106 for recognizing human body movements provided by the second embodiment of the present application;
  • FIG. 3 is a specific implementation flow chart of a method S104 for recognizing human body movements provided by the third embodiment of the present application;
  • FIG. 4 is a specific implementation flowchart of a method S102 for recognizing a human body movement provided by the fourth embodiment of the present application;
  • FIG. 5 is a specific implementation flow chart of a method S105 for recognizing human body movements provided by the fifth embodiment of the present application;
  • FIG. 6 is a structural block diagram of a human body motion recognition device provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • the execution subject of the process is the terminal device.
  • the terminal equipment includes, but is not limited to: servers, computers, smart phones, tablet computers, and other devices capable of performing human body motion recognition operations.
  • Fig. 1 shows a flow chart of the method for recognizing human body movements provided by the first embodiment of the present application, which is detailed as follows:
  • a video file of a target object is acquired; the video file includes multiple video image frames.
  • The administrator can designate a video file containing the target object as the target video file.
  • In this case, the terminal device downloads the video file of the target object from a video database according to the file identifier of the target video file, and recognizes the action behavior of the target object.
  • Preferably, the terminal device is a video surveillance device that acquires the video file of the current scene; in this case, the terminal device treats every object captured in the current scene as a target object and assigns an object number to each object based on the face images of the different captured objects.
  • The terminal device determines the action type of each monitored object in real time from the video file generated during monitoring. If the action type of a target object is found in an abnormal-action list, a warning message is generated to tell the monitored object performing the abnormal action to stop, providing real-time warning of abnormal actions of the monitored objects.
  • the user can send the face information of the target object to the terminal device.
  • the terminal device searches for the face of each video file in the video database based on the face information, and uses the video file containing the face information as the target video file.
  • The specific search operation can be: the terminal device recognizes the candidate faces in each video image frame of every video file in the video database, extracts the facial feature values of key regions of the candidate faces, and matches the facial feature values of each candidate face against the face information of the target face; if the matching degree between the two is greater than a preset matching threshold, the two correspond to the same physical person, and the video file is recognized as containing a face image of the target object.
  • the video file contains multiple video image frames, and each video image frame corresponds to a frame number. Based on the positive sequence of the frame numbers, the video image frames are arranged and encapsulated to generate the video file.
  • the frame number can be determined according to the playing time of the video image frame in the video file.
  • each of the video image frames is respectively parsed, the human body region image related to the target object in the video image frame is extracted, and the interactive objects contained in the video image frame are determined.
  • the terminal device analyzes the video file, performs body recognition on each video image frame in the video file, and extracts the human body region image of each video image frame with respect to the target object.
  • A specific way of extracting the human body region image can be as follows: the terminal device uses a face recognition algorithm to determine whether the video image frame contains a face region image; if not, the frame contains no human body region image; if the frame does contain a face image, contour recognition is performed on the area around the coordinates of the face image, the human body region image corresponding to the face image is extracted from the resulting contour information, and the face image is matched against the face template of the target object to determine whether the human body region image belongs to the target object.
  • If there are multiple target objects, the terminal device, after determining the human body region image associated with a face image in the video image frame, matches the face image against the face templates of the target objects to determine which target object it corresponds to, and marks the object identifier of the associated target object on the human body region image.
  • The human body region image of each target object can then be located quickly in the video image frame, which is convenient for tracking the actions of multiple objects.
  • the terminal device may obtain the object human body template associated with the object identifier according to the object identifier of the target object.
  • the target human body template can be used to represent the human body characteristics of the target object, such as body shape information, gender information and/or hairstyle information.
  • The terminal device can slide a frame across the video image frame according to the target human body template and calculate the matching degree between each framed candidate region and the template. If the matching degree is greater than a preset matching threshold, the candidate region is recognized as the human body region image of the target object; conversely, if the matching degree is less than or equal to the threshold, the candidate region is not the human body region image of the target object and the sliding continues. If none of the candidate regions in the video image frame contains the human body image, the above operation is repeated on the next video image frame to identify the human body region image of the target object.
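  • As an illustration of the sliding-frame matching described above, the following Python sketch searches one frame with a normalized cross-correlation template match; the OpenCV calls are standard, while the 0.7 threshold and the grayscale inputs are illustrative assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def find_body_region(frame_gray: np.ndarray,
                     body_template: np.ndarray,
                     match_threshold: float = 0.7):
    """Slide the target's body template over one grayscale frame and return
    the best-matching candidate region, or None when no candidate exceeds the
    (illustrative) threshold, in which case the next frame is tried."""
    response = cv2.matchTemplate(frame_gray, body_template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(response)
    if max_val <= match_threshold:
        return None
    h, w = body_template.shape[:2]
    x, y = max_loc
    return (x, y, w, h), float(max_val)   # region box and its matching degree
```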
  • In addition to acquiring the human body region image of the target object, the terminal device can also extract from the image the interactable objects that may interact with the user.
  • The specific recognition method may be: determining the contour information contained in the video image frame with a contour recognition algorithm, determining the subject type of each captured subject from the contour information, and determining the interactable objects from the subject type. Different types of interactable subjects have different contour characteristics, so the subject type can be determined by recognizing the contour information, and the subjects that can interact with the target object are selected as interactable objects according to their type. For example, subjects such as chairs, tables, and knives may interact with the target object, whereas subjects such as clouds or the sun have a low probability of interacting with it. Identifying the subject type therefore filters out most invalid interactable objects.
  • the terminal device calculates the distance value between each subject and the human body region image, and selects the subject whose distance value is less than a preset threshold as the interactive object.
  • The terminal device may select the subjects whose contour boundary is adjacent to the human body region image as interactable subjects: since the target object interacts with an interactive subject, i.e. the two are in contact, the contour boundary of an interactable object is adjacent to the target user.
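  • A minimal sketch of the distance-based selection of interactable subjects follows; the box format and the pixel threshold are assumptions made for illustration.

```python
import math

def select_interactable_subjects(subject_boxes, body_box, distance_threshold=50.0):
    """Keep the captured subjects whose center lies within distance_threshold
    pixels of the human body region; boxes are (x, y, w, h) tuples and the
    threshold value is an illustrative assumption."""
    bx = body_box[0] + body_box[2] / 2.0
    by = body_box[1] + body_box[3] / 2.0
    picked = []
    for x, y, w, h in subject_boxes:
        cx, cy = x + w / 2.0, y + h / 2.0
        if math.hypot(cx - bx, cy - by) < distance_threshold:
            picked.append((x, y, w, h))
    return picked
```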
  • each key part in the preset list of key parts of the human body is marked in the human body region image, and characteristic coordinates of each key part are acquired.
  • the terminal device stores a list of key parts of the human body, and the list of key parts of the human body contains multiple key parts of the human body.
  • The list of key parts of the human body contains 17 key parts, namely the nose, the two eyes, the two ears, the two shoulders, the two wrists, the two hands, the two sides of the waist, the two knees, and the two feet.
  • The terminal device marks each key part in the human body region image. Specifically, the current posture type of the target object is determined from the contour information of the human body region image, where the posture type is, for example, standing, walking, lying, or sitting; each key part is then marked on the human body region image according to the correspondence between the different key parts and the posture types.
  • the correspondence records the distance value between the key part and the contour center point of the human body region image and the relative direction vector, and the terminal device can locate each key part based on the distance value and the relative direction vector, and perform a marking operation.
  • the terminal device establishes an image coordinate axis based on the video image frame, and determines the characteristic coordinates of each key part according to the position of each key part on the video image frame.
  • the terminal device may use the end point of the lower left corner of the video image frame as the coordinate origin, or may use the image center point as the coordinate origin, which is specifically determined according to the default settings of the administrator or the device.
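  • The marking step can be sketched as below, assuming a hypothetical posture template that stores, for each key part, its distance from the contour center and a unit direction vector; the part names and numbers are invented for illustration only.

```python
# Hypothetical correspondence for a "standing" posture: each key part is given
# by a distance from the contour center and a unit direction vector.  The
# entries are invented for illustration.
STANDING_TEMPLATE = {
    "nose":       (120.0, (0.0, -1.0)),
    "left_hand":  (80.0,  (-0.9, 0.4)),
    "right_foot": (140.0, (0.3, 1.0)),
}

def mark_key_parts(contour_center, posture_template, origin=(0.0, 0.0)):
    """Locate each key part from the contour center, the recorded distance and
    the relative direction vector, and express its feature coordinates in an
    image frame whose origin is `origin` (lower-left corner or image center,
    per the device's default setting)."""
    cx, cy = contour_center
    coords = {}
    for part, (dist, (dx, dy)) in posture_template.items():
        coords[part] = (cx + dist * dx - origin[0], cy + dist * dy - origin[1])
    return coords
```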
  • a key feature sequence about the key part is generated according to the feature coordinates corresponding to the key part in each of the video image frames.
  • The terminal device needs to determine the motion trajectory of each key part. Based on the part identifier of the key part, it therefore extracts from each video image frame the feature coordinates corresponding to that identifier and encapsulates all feature coordinates of the part to generate its key feature sequence. The order of the elements in the key feature sequence matches the frame numbers of the video image frames they come from, i.e. the elements are temporally ordered, so the key feature sequence describes how the key part changes over time.
  • If a key part is occluded in some video image frames and has no corresponding feature coordinates, the terminal device can build a feature curve of the key part on preset coordinate axes according to the frame numbers, connecting the feature coordinates in frame order; the missing feature coordinates of those frames can then be filled in by a smoothing algorithm.
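  • A simple way to realize this is sketched below: feature coordinates are ordered by frame number and occluded frames are filled by linear interpolation, which stands in for the smoothing algorithm mentioned above.

```python
import numpy as np

def build_key_feature_sequence(coords_by_frame, num_frames):
    """coords_by_frame maps frame number -> (x, y) of one key part; frames in
    which the part was occluded are absent and are filled here by linear
    interpolation, a simple stand-in for the smoothing algorithm."""
    frames = np.arange(num_frames)
    known = sorted(coords_by_frame)
    xs = np.interp(frames, known, [coords_by_frame[f][0] for f in known])
    ys = np.interp(frames, known, [coords_by_frame[f][1] for f in known])
    return list(zip(frames.tolist(), xs.tolist(), ys.tolist()))
```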
  • At least one candidate action of the target object is determined through the key feature sequence of each of the key parts.
  • From the key feature sequences of multiple key parts, the terminal device can determine the motion trajectories of the different key parts, and the action types that match these trajectories are taken as candidate types. Specifically, the terminal device can determine the movement direction of each key part from its key feature sequence and then match the movement directions of the multiple key parts, one by one, against the movement directions of the key parts of each action template in the action type library.
  • Based on the number of matched key parts, for example, the action templates whose number of matched key parts is greater than a preset matching threshold are selected as the candidate actions of the target object.
  • The terminal device may be configured with a maximum number of frames and divide the key feature sequence of a key part into multiple feature subsequences based on this maximum, determining the action type of each subsequence separately. When the captured video file is long, the user may perform several actions during the recording; the terminal device therefore sets a maximum number of frames in order to segment and recognize the different actions, achieving multi-action recognition for a single user.
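  • The candidate-action matching and the subsequence splitting can be sketched as follows; the dot-product direction test, the 0.8 agreement threshold, and the 60-frame maximum are assumptions for illustration.

```python
import numpy as np

def movement_direction(sequence):
    """Overall movement direction (unit vector) of one key part, where
    sequence is a list of (frame, x, y) tuples ordered by frame number."""
    dx = sequence[-1][1] - sequence[0][1]
    dy = sequence[-1][2] - sequence[0][2]
    norm = float(np.hypot(dx, dy)) or 1.0
    return dx / norm, dy / norm

def candidate_actions(part_sequences, action_templates, min_matched_parts=10):
    """Select the action templates whose per-part movement directions agree
    with enough observed key parts.  action_templates maps an action name to
    {part_name: unit direction vector}; the 0.8 agreement threshold and the
    minimum number of matched parts are illustrative assumptions."""
    observed = {p: movement_direction(s) for p, s in part_sequences.items()}
    chosen = []
    for action, template in action_templates.items():
        matched = sum(1 for part, direction in template.items()
                      if part in observed
                      and float(np.dot(observed[part], direction)) > 0.8)
        if matched >= min_matched_parts:
            chosen.append(action)
    return chosen

def split_into_subsequences(sequence, max_frames=60):
    """Divide a long key feature sequence into chunks of at most max_frames
    frames so that several actions made during one recording can be
    recognized separately."""
    return [sequence[i:i + max_frames] for i in range(0, len(sequence), max_frames)]
```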
  • the degree of matching between each of the candidate actions and the interactive object is calculated, and the action type of the target object is determined from the candidate actions according to the degree of matching.
  • The terminal device can obtain the interactive behavior list of the interactable object, detect the similarity between the candidate action and each interactive behavior in the list, take the maximum similarity as the matching degree between the candidate action and the interactable object, and then determine the action type of the target object from the matching degrees of the candidate actions.
  • the terminal device can select a candidate action with a matching degree greater than a preset matching threshold as the action type currently performed by the target object.
  • the terminal device can determine the type of action on a video file obtained by video surveillance.
  • The video file can be a video of a security check area: the interactive behavior of the personnel in the area is assessed to detect whether any user behaves abnormally, the target object to be identified is located in the surveillance video, and the type of action between the target object and each interactable object is determined.
  • The interactable object can be a suitcase or a document to be verified; the device determines whether the user submits the suitcase for security inspection as required, or instead takes dangerous items out of the suitcase to evade inspection, thereby improving the accuracy of the security check process.
  • the terminal device can identify the distance value between each interactive object and the human body region image, and select the interactive object with the lowest distance value as the target interactive object, and calculate the target interactive object and each candidate The degree of match between actions to determine the target object’s action type.
  • The method for recognizing human body actions obtains the video file of the target user whose action behavior needs to be analyzed, parses each video image frame of the file to determine the human body region image contained in each frame, marks each key part in the human body region image, and determines the change of each part of the target object from the feature coordinates of the key parts, so as to determine the action type of the target object and automatically recognize its human body actions.
  • the embodiment of the present application does not need to rely on neural networks to recognize the action type of the video image, and does not rely on optical flow information to avoid the recognition delay caused by the need to perform time sequence recursion. Therefore, the efficiency of recognition is improved, and by locating multiple key parts and determining the action of the target object through changes in multiple key parts, the accuracy is further improved, thereby improving the effect of image recognition and the efficiency of object behavior analysis.
  • FIG. 2 shows a specific implementation flow chart of a method S106 for recognizing a human body movement provided by the second embodiment of the present application.
  • a human body motion recognition method S106 provided in this embodiment includes: S1061 to S1066, which are detailed as follows:
  • the separately calculating the matching degree between each of the candidate actions and the interactable object, and determining the action type of the target object from the candidate actions according to the matching degree includes:
  • a distance value between the interactive object and the human body region image is acquired, and the interaction confidence of the interactive object is determined based on the distance value.
  • The terminal device may mark on the video image frame the region image in which the interactable object lies, use the center coordinates of that region image as the feature coordinates of the interactable object, calculate the Euclidean distance between these feature coordinates and the center coordinates of the human body region, and take this Euclidean distance as the distance value between the interactable object and the human body region image. The smaller the distance value, the greater the probability of interaction between the two; conversely, the larger the distance value, the smaller the interaction probability. The terminal device can therefore calculate the interaction confidence between the interactable object and the target human body from this distance value.
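  • A possible realization of this step is sketched below; the exponential mapping from distance to confidence and its scale are assumptions, since the patent only requires that a smaller distance yield a larger confidence.

```python
import math

def interaction_confidence(object_center, body_center, scale=200.0):
    """Map the Euclidean distance between the interactable object's center and
    the human body region's center to a confidence in (0, 1]: the smaller the
    distance, the larger the confidence.  The exponential mapping and scale
    are illustrative assumptions."""
    dist = math.hypot(object_center[0] - body_center[0],
                      object_center[1] - body_center[1])
    return math.exp(-dist / scale)
```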
  • the similarity between the key feature sequence and the standard feature sequence of each candidate action is calculated respectively, and the similarity is recognized as the action confidence of the candidate action.
  • the terminal device needs to determine the correct probability of the identified candidate action, so it will obtain the standard feature sequence of the candidate action, and calculate the difference between the key feature sequence and the standard feature sequence in multiple video image frames. Similarity.
  • The similarity can be calculated as follows: the terminal device generates a standard curve of the standard feature sequence on preset coordinate axes, computes the behavior curve of the key feature sequence, calculates the area of the closed region enclosed between the two curves, and determines the similarity between the key feature sequence and the standard feature sequence from this area. The larger the area, the greater the difference between the two actions and the smaller the similarity; conversely, the smaller the area, the smaller the difference and the greater the similarity.
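  • The area-based similarity can be sketched as below; the discrete sum used as the enclosed area and the mapping to a (0, 1] confidence are assumptions for illustration.

```python
import numpy as np

def action_confidence(key_curve, standard_curve, scale=1000.0):
    """Similarity between the observed key feature curve and the standard
    curve of one candidate action.  Both inputs are per-frame value arrays of
    equal length; the summed absolute difference approximates the area
    enclosed between the two curves, and the 1/(1+area) mapping to (0, 1]
    is an illustrative assumption."""
    key = np.asarray(key_curve, dtype=float)
    std = np.asarray(standard_curve, dtype=float)
    area = float(np.sum(np.abs(key - std)))   # smaller area -> more similar
    return 1.0 / (1.0 + area / scale)
```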
  • an interaction probability between the candidate action and the object type is determined.
  • the terminal device determines the object type of the interactable object according to the outline information of the interactable object, that is, determines which type of article it belongs to, and determines the interaction probability of the object type and the candidate action.
  • the object type "basketball” can be used as the action receptor for candidate actions such as “shoot” and “kick”, that is, the interaction probability is greater; while for candidate actions such as “sit” and “stand”, it will not interact with If the object type "basketball” interacts, the probability of interaction is small.
  • the terminal device can obtain the action recipient object of each candidate action according to the action record library, calculate the number of action records corresponding to the object type, and determine the interaction probability between the object type and the candidate action based on the number.
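  • Counting action records to obtain the interaction probability might look like the following; the structure of the action record library is hypothetical.

```python
def interaction_probability(action, object_type, action_record_library):
    """Fraction of recorded instances of `action` whose recipient object had
    the given object_type.  The library is a hypothetical mapping from an
    action name to the list of recipient object types in its records."""
    records = action_record_library.get(action, [])
    return records.count(object_type) / len(records) if records else 0.0
```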
  • The object region image of the interactable object is extracted from the video image frame, and the object confidence of the interactable object is determined from the object region image and the standard image preset for the object type.
  • The terminal device also needs to determine how reliable the recognition of the interactable object is, so it obtains the object region image of the interactable object, compares it with the standard image matching the object type, and determines the object confidence of the interactable object from the similarity between the two images.
  • The interaction confidence, the action confidence, the object confidence, and the interaction probability are imported into a matching degree calculation model to determine the matching degree of the candidate action. The matching degree calculation model is given in the original as a formula (reproduced only as an image), whose output is the matching degree of the candidate action a and whose inputs are the interaction confidence, the action confidence s_h, the object confidence s_o, the interaction probability, and the preset trigger probability of the candidate action a.
  • The terminal device imports the four calculated parameters into the matching degree calculation model to determine the matching degree between the candidate action and the interactable object, so that the action types can be further screened and identified with the help of the interactable object.
  • In particular, the trigger probability of the candidate action can be calculated from the action type corresponding to the previous image frame and the action type of the next image frame: since actions have a certain continuity, the trigger probability of the current action can be determined from the actions already triggered and the subsequent actions.
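  • The patent gives the matching degree calculation model only as a formula image, so the sketch below simply combines the five quantities as a product; this product form is an assumption that preserves the stated behavior (each factor increases the matching degree), not the patent's exact formula.

```python
def matching_degree(interaction_conf: float,
                    action_conf: float,
                    object_conf: float,
                    interaction_prob: float,
                    trigger_prob: float) -> float:
    """Combine the interaction confidence, action confidence s_h, object
    confidence s_o, interaction probability and preset trigger probability of
    candidate action a into one matching degree.  The product below is an
    assumed combination, not the formula from the patent (which is only
    available as an image)."""
    return interaction_conf * action_conf * object_conf * interaction_prob * trigger_prob

# Candidate actions whose matching degree exceeds a preset threshold are then
# selected as the action type(s) of the target object.
```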
  • the candidate action whose matching degree is greater than a matching threshold is selected as the action type of the target object.
  • the terminal device may select a candidate action with a matching degree greater than a preset matching threshold as the action type of the target object.
  • By determining the confidences of the candidate action and the interactable object in multiple dimensions and calculating the matching degree of each candidate action, the accuracy of the matching degree calculation is improved, and thus the accuracy of human action recognition.
  • FIG. 3 shows a specific implementation flow chart of a method S104 for recognizing a human body movement provided by the third embodiment of the present application.
  • a human body motion recognition method S104 provided in this embodiment includes: S1041 to S1045, which are detailed as follows:
  • the generating a key feature sequence about the key part according to the feature coordinates corresponding to the key part in each of the video image frames includes:
  • In S1041, the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers are acquired, and the image distance value between the first feature coordinates and the second feature coordinates is calculated.
  • The terminal device needs to track key parts of the human body: if it detects that the displacement of the same key part between two adjacent image frames is too large, it identifies the two key parts as belonging to different human bodies, so that re-tracking can be performed quickly and the accuracy of action recognition improved. For this purpose, the terminal device obtains the first and second feature coordinates of the same key part in two video image frames with adjacent frame numbers, imports the two feature coordinates into the Euclidean distance formula, and calculates the distance between the two coordinate points, i.e. the image distance value.
  • the image distance value specifically refers to the distance between two coordinate points on the video image frame, and is not the moving distance of the key part in the actual scene. Therefore, the image distance value needs to be numerically converted.
  • the image area of the human body region image is calculated, and the shooting focal length between the target object and the shooting module is determined based on the image area.
  • the terminal device acquires the area occupied by the human body region image in the video image frame, that is, the image area.
  • the terminal device is provided with a standard human body area and a standard shooting focal length corresponding to the area.
  • The terminal device can calculate the ratio between the current image area and the standard human body area to determine a zoom ratio, and, based on the zoom ratio and the standard shooting focal length, calculate the actual shooting focal length between the target object and the shooting module, i.e. the aforementioned shooting focal length.
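  • A sketch of the zoom-ratio step follows; taking the square root of the area ratio and scaling the standard focal length inversely with it are assumptions made explicit in the comments, since the patent does not spell out the conversion.

```python
import math

def shooting_focal_length(body_area_px: float,
                          standard_body_area_px: float,
                          standard_focal_length: float) -> float:
    """Estimate the shooting focal length for the current frame from the area
    the human body region occupies.  Assumptions: the zoom ratio is the square
    root of the area ratio (area scales quadratically with linear size), and
    apparent size is inversely proportional to the subject's distance, so the
    focal length scales with the inverse of the zoom ratio."""
    zoom_ratio = math.sqrt(body_area_px / standard_body_area_px)
    return standard_focal_length / zoom_ratio
```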
  • In the distance conversion model (given in the original as a formula image): Dist is the actual moving distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate.
  • The terminal device imports the shooting focal length corresponding to the video image frame, the image distance value between the two key-part coordinates, and the shooting frame rate of the video file into the distance conversion model, so that the actual moving distance of the key part in the scene can be calculated.
  • the two feature coordinates whose actual movement distance is less than a preset distance threshold are identified as mutually related feature coordinates.
  • If the terminal device detects that the actual movement distance is greater than or equal to the preset distance threshold, the movement of the key part has exceeded a normal moving distance, the key part in the two video image frames is identified as belonging to different target objects, and the two feature coordinates are judged to be non-associated feature coordinates.
  • Conversely, if the actual movement distance is less than the preset distance threshold, the key part in the two video image frames belongs to the same target object, and the two feature coordinates are determined to be associated feature coordinates. This achieves the purpose of tracking the target object and avoids switching to tracking user B's motion trajectory while tracking user A's, which improves the accuracy of action recognition.
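  • The association test can be sketched as follows; the distance conversion model is passed in as a callable because its exact formula is given only as an image in the original, and the 0.5 threshold is an illustrative assumption in scene units.

```python
def are_associated(prev_coord, curr_coord, to_actual_distance,
                   distance_threshold=0.5):
    """Decide whether the same key part seen in two adjacent frames belongs to
    the same person.  `to_actual_distance` stands in for the patent's distance
    conversion model (its exact formula is given only as an image); the 0.5
    threshold is an illustrative assumption."""
    image_dist = ((curr_coord[0] - prev_coord[0]) ** 2 +
                  (curr_coord[1] - prev_coord[1]) ** 2) ** 0.5
    return to_actual_distance(image_dist) < distance_threshold
```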
  • the key feature sequence related to the key part is generated according to all the mutually related feature coordinates.
  • the terminal device filters all non-associated feature coordinates, encapsulates the mutually associated feature coordinates, and generates a key feature sequence about key parts.
  • the abnormal feature coordinate points can be filtered, and the accuracy of the action recognition is improved.
  • FIG. 4 shows a specific implementation flow chart of a method S102 for recognizing a human body movement provided by the fourth embodiment of the present application.
  • a human body motion recognition method S102 provided in this embodiment includes: S1021 to S1024, which are detailed as follows:
  • parsing each of the video image frames separately, and extracting a human body region image about the target object in the video image frames includes:
  • the contour curve of the video image frame is obtained through the contour recognition algorithm, and the area of the area surrounded by each contour curve is calculated.
  • the terminal device uses a contour recognition algorithm to determine the contour curve in the video image frame.
  • A specific way of identifying the contour lines can be as follows: the terminal device calculates the pixel value difference between two adjacent coordinate points, and if the difference is greater than a preset contour threshold, the coordinate point is recognized as lying on a contour line; all identified contour-line coordinate points are then connected to form continuous contour curves, where each closed contour curve corresponds to one captured subject.
  • The terminal device marks all contour curves on the video image frame and integrates over the area enclosed by each contour curve and/or the boundary of the video image frame, so as to obtain the region area of each contour curve.
  • Since a contour curve corresponds to one captured subject, the zoom ratio of the captured subject can be determined from the region area, so that a suitable window can be selected for extracting the human body region image, which improves the accuracy of the extraction.
  • a human body recognition window of the video image frame is generated according to the area of each of the regions.
  • Because the zoom ratio varies between frames, the size of the human body recognition window also needs to be adjusted accordingly.
  • The terminal device can calculate the zoom ratio corresponding to the video image frame from the region area of each captured subject, query the human body recognition window size associated with that zoom ratio, and then generate a human body recognition window matching the video image frame.
  • The terminal device uses the yolov3 body recognition algorithm, and yolov3 needs to be configured with three body recognition windows. Based on this, the terminal device builds the distribution of region areas from the areas enclosed by the contour curves, selects the three region areas with the highest distribution density as characteristic areas, and generates the corresponding human body recognition windows, namely three feature maps, from these three characteristic areas.
  • a sliding frame is performed on the video image frame based on the human body recognition window to generate multiple candidate region images.
  • After the terminal device generates a human body recognition window corresponding to the zoom ratio of the video image frame, it can slide this window across the frame and take the region image framed each time as a candidate region image. If there are human body recognition windows of several sizes, it creates one concurrent thread per window, copies the video image frame, and lets the windows slide over the different copies through the concurrent threads.
  • That is, the sliding frame extraction of human body recognition windows of different sizes is performed independently, without interference, producing candidate region images of different sizes.
  • The coincidence rate between each candidate region image and the standard human body template is calculated, and the candidate region images whose coincidence rate is greater than a preset threshold are selected as human body region images.
  • The terminal device calculates the coincidence rate between the candidate region image and the standard human body template. The higher the coincidence rate, the more similar the subject in the region image is to a human body, so the candidate region can be recognized as a human body image; conversely, a low coincidence rate means the shape in the region image has low similarity to a human body, and it is recognized as a non-human-body image. Since a video image frame can contain several different users, the terminal device recognizes all candidate regions whose coincidence rate exceeds the preset threshold as human body region images; it can then match each human body region image against the standard face of the target object and select the matching one as the human body region image of the target object.
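  • The coincidence rate can be sketched as an intersection-over-union between boxes; treating the standard human body template as a box and using IoU are assumptions for illustration.

```python
def coincidence_rate(candidate_box, template_box):
    """Intersection-over-union style coincidence rate between a candidate
    region and the standard human body template, both (x, y, w, h) boxes.
    Using IoU here is an assumption; the patent only speaks of a coincidence
    rate between the region and the template."""
    ax2, ay2 = candidate_box[0] + candidate_box[2], candidate_box[1] + candidate_box[3]
    bx2, by2 = template_box[0] + template_box[2], template_box[1] + template_box[3]
    iw = max(0.0, min(ax2, bx2) - max(candidate_box[0], template_box[0]))
    ih = max(0.0, min(ay2, by2) - max(candidate_box[1], template_box[1]))
    inter = iw * ih
    union = candidate_box[2] * candidate_box[3] + template_box[2] * template_box[3] - inter
    return inter / union if union > 0 else 0.0
```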
  • The zoom ratio of the video image frame is determined and a corresponding human body recognition window is generated to perform the recognition of the human body region image, which improves the accuracy of recognition.
  • FIG. 5 shows a specific implementation flow chart of a method S105 for recognizing a human body movement provided by the fifth embodiment of the present application.
  • a human body motion recognition method S105 provided in this embodiment includes: S1051 to S1052, and the details are as follows:
  • the determining at least one candidate action of the target object through the key feature sequence of each of the key parts includes:
  • the feature coordinates of each of the key feature sequences are marked in a preset coordinate axis, and a part change curve about each of the key parts is generated.
  • The terminal device marks each feature coordinate on a preset coordinate axis according to its coordinate value in the key feature sequence and the frame number of the corresponding video image frame, and connects the feature coordinates to generate a part change curve for each key part.
  • the coordinate axis may be a coordinate axis established based on the video image frame, the horizontal axis corresponds to the length of the video image frame, and the vertical axis corresponds to the width of the video image frame.
  • the part change curve is matched with the standard action curve of each candidate action in the preset action library, and the candidate action of the target object is determined based on the matching result.
  • The terminal device matches the part change curves of all key parts against the standard action curve of each candidate action in the preset action library, calculates the coincidence rate of the two curves, and selects the candidate action with the highest coincidence rate as the action type of the target object.
  • the action type of the target object can be determined intuitively, which improves the accuracy of the action type.
  • FIG. 6 shows a structural block diagram of a human body motion recognition device provided by an embodiment of the present application.
  • the human body motion recognition device includes units for executing steps in the embodiment corresponding to FIG. 1.
  • only the parts related to this embodiment are shown.
  • the recognition device for human body movements includes:
  • the video file obtaining unit 61 is configured to obtain a video file of the target object; the video file includes a plurality of video image frames;
  • the human body region image extraction unit 62 is configured to analyze each of the video image frames, extract the human body region image of the target object in the video image frame, and determine the interactive objects contained in the video image frame;
  • the key part recognition unit 63 is configured to mark each key part in the preset human body key part list in the human body region image, and obtain the characteristic coordinates of each key part;
  • the key feature sequence generating unit 64 is configured to generate a key feature sequence for the key part from the feature coordinates corresponding to the key part in each of the video image frames;
  • the candidate action recognition unit 65 is configured to determine at least one candidate action of the target object through the key feature sequence of each of the key parts;
  • the action type recognition unit 66 is configured to calculate the matching degree between each of the candidate actions and the interactable object, and determine the action type of the target object from the candidate actions according to the matching degree.
  • the action type identification unit 66 includes:
  • An interaction confidence calculation unit configured to obtain a distance value between the interactable object and the human body region image, and determine the interaction confidence of the interactable object based on the distance value;
  • the action confidence recognition unit is configured to calculate the similarity between the key feature sequence and the standard feature sequence of each candidate action, and recognize the similarity as the action confidence of the candidate action;
  • An interaction probability determining unit configured to determine the interaction probability between the candidate action and the object type based on the object type of the interactable object
  • the object confidence recognition unit is configured to extract the object area image of the interactive object from the video image frame, and determine the interactive object based on the object area image and the standard image preset by the object type Object confidence level;
  • a matching degree calculation unit configured to import the interaction confidence, the action confidence, the object confidence, and the interaction probability into a matching degree calculation model to determine the matching degree of the candidate action;
  • the matching degree calculation model is specifically:
  • the candidate action selection unit is configured to select the candidate actions whose matching degree is greater than a matching threshold and identify them as the action type of the target object.
  • the key feature sequence generating unit 64 includes:
  • the image distance value calculation unit is used to obtain the first feature coordinates and the second feature coordinates of the same key part in the two video image frames with adjacent frames, and calculate the first feature coordinates and the first feature coordinates Image distance value between two feature coordinates;
  • a photographing focal length determining unit configured to calculate the image area of the human body region image, and determine the photographing focal length between the target object and the photographing module based on the image area;
  • the actual movement distance calculation unit is used to import the shooting focal length, the image distance value and the shooting frame rate of the video file into the distance conversion model, and calculate the actual movement of the key part in the two video image frames Distance;
  • the distance conversion model is specifically:
  • Dist is the actual moving distance
  • StandardDist is the image distance value
  • FigDist is the shooting focal length
  • BaseDist is the preset reference focal length
  • ActFrame is the shooting frame rate
  • BaseFrame is the reference frame rate
  • An associated coordinate identification unit configured to identify the two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates
  • the associated coordinate packaging unit is used to generate the key feature sequence about the key part according to all the mutually associated feature coordinates.
  • the human body region image extraction unit 62 includes:
  • a contour curve obtaining unit configured to obtain the contour curve of the video image frame through a contour recognition algorithm, and calculate the area of the area surrounded by each contour curve;
  • a human body recognition window generating unit configured to generate a human body recognition window of the video image frame according to the area of each of the regions;
  • a candidate region image extraction unit configured to perform sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images
  • the human body region image matching unit is used to calculate the coincidence rate between each of the candidate region images and the standard human body template, and select the candidate region image with the coincidence rate greater than a preset coincidence rate threshold as the body region image .
  • the candidate action recognition unit 65 includes:
  • a part change curve generating unit configured to mark the feature coordinates of each of the key feature sequences in a preset coordinate axis, and generate a part change curve about each of the key parts;
  • the candidate action selection unit is configured to match the part change curve with the standard action curve of each candidate action in the preset action library, and determine the action type of the target object based on the matching result.
  • The human body action recognition device provided by the embodiments of the present application can likewise recognize the action type in a video image without relying on a neural network and without using optical flow information, avoiding the recognition delay caused by temporal recursion and thereby improving the efficiency of recognition.
  • At the same time, the terminal device determines the interactable objects in the video image frame and uses the interactive action to decide whether the target user exhibits interactive behavior, so that multiple similar postures can be distinguished and the accuracy of action recognition is further improved.
  • Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • The terminal device 7 of this embodiment includes: a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and runnable on the processor 70, for example a human body action recognition program.
  • the processor 70 executes the computer-readable instructions 72, the steps in the above-mentioned various human body motion recognition method embodiments are implemented, such as S101 to S106 shown in FIG. 1.
  • Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of the units in the foregoing device embodiments, such as the functions of the modules 61 to 66 shown in FIG. 6, are implemented.
  • The computer-readable instructions 72 may be divided into one or more units, and the one or more units are stored in the memory 71 and executed by the processor 70 to complete the present application.
  • The one or more units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7.
  • For example, the computer-readable instructions 72 can be divided into a video file acquisition unit, a human body region image extraction unit, a key part recognition unit, a key feature sequence generation unit, a candidate action recognition unit, and an action type recognition unit, the specific functions of each unit being as described above.
  • the terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 70 and a memory 71.
  • FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7. It may include more or less components than shown in the figure, or a combination of certain components, or different components.
  • the terminal device may also include input and output devices, network access devices, buses, etc.
  • The so-called processor 70 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7.
  • the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk equipped on the terminal device 7, a smart memory card (Smart Media Card, SMC), and a Secure Digital (SD) Card, Flash Card, etc.
  • the memory 71 may also include both an internal storage unit of the terminal device 7 and an external storage device.
  • the memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 71 can also be used to temporarily store data that has been output or will be output.
  • the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for recognizing human body actions, comprising: acquiring a video file of a target object (S101); parsing each video image frame, extracting the human body region image of the target object in the video image frame, and determining the interactable objects contained in the video image frame (S102); marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part (S103); generating a key feature sequence from the feature coordinates of the key part in each video image frame (S104); determining candidate actions of the target object from the key feature sequences of the key parts (S105); and calculating the matching degree between each candidate action and the interactable objects, and determining the action type of the target object according to the matching degree (S106). By using interactive actions to determine whether the target user exhibits interactive behavior, multiple similar postures can be distinguished, further improving the accuracy of action recognition.

Description

A method and device for recognizing human body actions
This application claims priority to the Chinese patent application No. 201910264883.X, filed on April 3, 2019 and entitled "一种人体动作的识别方法及设备" (A method and device for recognizing human body actions), the entire content of which is incorporated herein by reference.
Technical Field
This application belongs to the field of image recognition technology, and in particular relates to a method and device for recognizing human body actions.
Background Art
With the continuous development of image recognition technology, computers can automatically extract more and more information from image and video files, for example determining the type of human action performed by a user in the picture and, based on the recognized action information, performing object tracking, object behavior analysis and similar operations; the accuracy and speed of image recognition therefore directly affect the quality of these subsequent steps. Existing human action recognition techniques generally rely on convolutional neural networks. However, such techniques depend on optical flow information and require repeated temporal recursion, so recognition is slow and accuracy is limited. In particular, for some similar postures, such as sitting down and squatting, the body poses are so alike that a convolutional neural network cannot distinguish them reliably, which further reduces the accuracy of action recognition.
Technical Problem
In view of this, the embodiments of the present application provide a method and device for recognizing human body actions, to solve the problem that existing human action recognition methods are slow and not very accurate; in particular, for some similar postures, such as sitting down and squatting, the body poses are so alike that a convolutional neural network cannot distinguish them reliably, which further reduces the accuracy of action recognition.
Technical Solution
A first aspect of the embodiments of the present application provides a method for recognizing human body actions, comprising:
acquiring a video file of a target object, the video file comprising a plurality of video image frames;
parsing each of the video image frames, extracting a human body region image of the target object in the video image frame, and determining the interactable objects contained in the video image frame;
marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part;
generating, from the feature coordinates of the key part in each of the video image frames, a key feature sequence for the key part;
determining at least one candidate action of the target object from the key feature sequences of the key parts;
calculating the matching degree between each of the candidate actions and the interactable object, and determining the action type of the target object from the candidate actions according to the matching degree.
Beneficial Effects
In the embodiments of the present application, the video file of the target user whose action behavior is to be analyzed is acquired, and each video image frame of the video file is parsed to determine the human body region image contained in each frame and to identify the interactable objects in the frame that the target user may interact with. The key parts are marked in the human body region image, and the change of each part of the target object is determined from the feature coordinates of the key parts, so as to determine the candidate actions of the target object. According to the matching degree between the candidate actions and the interactable objects, candidate actions with similar postures are further screened, the action type of the target object is determined, and the human body action of the target object is recognized automatically. Compared with existing human action recognition techniques, the embodiments of the present application do not rely on a neural network to recognize the action type in the video images and do not use optical flow information, avoiding the recognition delay caused by temporal recursion and thus improving recognition efficiency. On the other hand, the terminal device determines the interactable objects in the video image frames and uses the interactive action to decide whether the target user exhibits interactive behavior, so that multiple similar postures can be distinguished and the accuracy of action recognition is further improved.
Brief Description of the Drawings
Fig. 1 is a flowchart of the implementation of a method for recognizing human body actions provided by the first embodiment of the present application;
Fig. 2 is a flowchart of a specific implementation of S106 of a method for recognizing human body actions provided by the second embodiment of the present application;
Fig. 3 is a flowchart of a specific implementation of S104 of a method for recognizing human body actions provided by the third embodiment of the present application;
Fig. 4 is a flowchart of a specific implementation of S102 of a method for recognizing human body actions provided by the fourth embodiment of the present application;
Fig. 5 is a flowchart of a specific implementation of S105 of a method for recognizing human body actions provided by the fifth embodiment of the present application;
Fig. 6 is a structural block diagram of a device for recognizing human body actions provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
Embodiments of the Invention
In the embodiments of the present application, the process is executed by a terminal device. The terminal device includes, but is not limited to, servers, computers, smartphones, tablet computers and other devices capable of performing human action recognition. Fig. 1 shows a flowchart of the implementation of the method for recognizing human body actions provided by the first embodiment of the present application, detailed as follows:
In S101, a video file of a target object is acquired; the video file comprises a plurality of video image frames.
In this embodiment, the administrator can designate a video file containing the target object as the target video file; in this case, the terminal device downloads the video file of the target object from a video database according to the file identifier of the target video file, and recognizes the action behavior of the target object. Preferably, the terminal device is a video surveillance device that acquires the video file of the current scene; in this case, the terminal device treats every object captured in the current scene as a target object, assigns an object number to each object based on the face images of the different captured objects, and determines the action type of each monitored object in real time from the video file generated during monitoring. If the action type of a target object is found in an abnormal-action list, a warning message is generated to tell the monitored object performing the abnormal action to stop, providing real-time warning of abnormal actions of the monitored objects.
Optionally, the user can send the face information of the target object to the terminal device. The terminal device searches the video files in the video database for this face and takes the video files containing the face information as target video files. A specific search operation can be: the terminal device recognizes the candidate faces in each video image frame of every video file in the video database, extracts the facial feature values of key regions of the candidate faces, and matches the facial feature values of each candidate face against the face information of the target face; if the matching degree between the two is greater than a preset matching threshold, the two correspond to the same physical person, and the video file is recognized as containing a face image of the target object.
In this embodiment, the video file contains multiple video image frames, each corresponding to a frame number; the video image frames are arranged in ascending order of frame number and encapsulated to generate the video file. The frame number can be determined from the playing time of the video image frame within the video file.
In S102, each of the video image frames is parsed, the human body region image of the target object in the video image frame is extracted, and the interactable objects contained in the video image frame are determined.
In this embodiment, the terminal device parses the video file, performs human body recognition on each video image frame, and extracts the human body region image of the target object from each frame. A specific way of extracting the human body region image can be: the terminal device uses a face recognition algorithm to determine whether the video image frame contains a face region image; if not, the frame contains no human body region image; if the frame does contain a face image, contour recognition is performed on the area around the coordinates of the face image, the human body region image corresponding to the face image is extracted from the resulting contour information, and the face image is matched against the face template of the target object to determine whether the human body region image belongs to the target object.
Optionally, if there are multiple target objects, i.e. the behavior of several objects needs to be monitored, the terminal device, after determining the human body region image associated with a face image in the video image frame, matches that face image against the face templates of the target objects to determine which target object it corresponds to, and marks the object identifier of the associated target object on the human body region image. The human body region image of each target object can then be located quickly in the video image frame, which facilitates action tracking of multiple objects.
Optionally, in this embodiment the terminal device can obtain, from the object identifier of the target object, the object body template associated with that identifier. The object body template can represent the human body characteristics of the target object, such as body shape, gender and/or hairstyle information. The terminal device can slide a frame across the video image frame according to the object body template and calculate the matching degree between each framed candidate region and the object body template. If the matching degree is greater than a preset matching threshold, the candidate region is recognized as the human body region image of the target object; conversely, if the matching degree is less than or equal to the threshold, the candidate region is not the human body region image of the target object and the sliding continues. If none of the candidate regions in the video image frame contains the human body region image, the above operation is repeated on the next video image frame to identify the human body region image of the target object.
In this embodiment, besides acquiring the human body region image of the target object, the terminal device can also extract from the image the interactable objects that may interact with the user. A specific recognition method can be: determining the contour information contained in the video image frame with a contour recognition algorithm, determining the subject type of each captured subject from the contour information, and determining the interactable objects from the subject type. The contour characteristics of different types of interactable subjects differ, so the subject type can be determined by recognizing the contour information, and the subjects that can interact with the target object are selected as interactable objects according to their type. For example, subjects such as chairs, tables, and knives may interact with the target object, whereas subjects such as clouds or the sun have a low probability of interacting with it. Identifying the subject type therefore filters out most invalid interactable objects.
Optionally, after recognizing the captured subjects, the terminal device calculates the distance value between each subject and the human body region image and selects the subjects whose distance value is smaller than a preset threshold as interactable objects. Preferably, the terminal device can select the subjects whose contour boundary is adjacent to the human body region image as interactable subjects: since the target object interacts with an interactive subject, i.e. the two are in contact, the contour boundary of an interactable object is adjacent to the target user.
In S103, each key part in a preset list of human body key parts is marked in the human body region image, and the feature coordinates of each key part are acquired.
In this embodiment, the terminal device stores a list of human body key parts containing multiple key parts. Preferably, the list contains 17 key parts, namely the nose, the two eyes, the two ears, the two shoulders, the two wrists, the two hands, the two sides of the waist, the two knees, and the two feet. Locating multiple human body key parts and tracking their movement improves the accuracy of human action recognition.
In this embodiment, the terminal device marks each key part in the human body region image. Specifically, the current posture type of the target object is determined from the contour information of the human body region image, where the posture type is, for example, standing, walking, lying, or sitting, and each key part is then marked on the human body region image according to the correspondence between the key parts and the posture types. Optionally, this correspondence records, for each key part, its distance from the contour center of the human body region image and a relative direction vector; the terminal device can locate each key part from this distance value and direction vector and perform the marking operation.
In this embodiment, the terminal device establishes an image coordinate system based on the video image frame and determines the feature coordinates of each key part from its position in the video image frame. Optionally, the terminal device can take the lower-left corner of the video image frame as the coordinate origin, or take the image center as the origin, depending on the administrator's or the device's default setting.
In S104, a key feature sequence for the key part is generated from the feature coordinates of the key part in each of the video image frames.
In this embodiment, the terminal device needs to determine the motion trajectory of each key part, so it extracts from each video image frame the feature coordinates corresponding to the part identifier of the key part and encapsulates all feature coordinates of that part to generate its key feature sequence. The order of the elements in the key feature sequence matches the frame numbers of the video image frames they come from, i.e. the elements of the sequence are temporally ordered, so the sequence describes how the key part changes over time.
Optionally, if a key part in some video image frames is occluded and has no corresponding feature coordinates, the terminal device can build a feature curve of the key part on a preset coordinate axis according to the frame numbers, connecting the feature coordinates in frame order, and the missing feature coordinates of the occluded frames can be filled in by a smoothing algorithm.
In S105, at least one candidate action of the target object is determined from the key feature sequences of the key parts.
In this embodiment, from the key feature sequences of multiple key parts the terminal device can determine the motion trajectories of the different key parts, and the action types that match these trajectories are taken as candidate types. Specifically, the terminal device can determine the movement direction of each key part from its key feature sequence and then match the movement directions of the multiple key parts, one by one, against the movement directions of the key parts of each action template in the action type library; based on the number of matched key parts, for example, the action templates whose number of matched key parts exceeds a preset matching threshold are selected as the candidate actions of the target object.
Optionally, the terminal device can be configured with a maximum number of frames and divide the key feature sequence of a key part into multiple feature subsequences based on this maximum, determining the action type of each subsequence separately. When the recorded video file is long, the user may perform several actions during the recording; the terminal device therefore sets a maximum number of frames in order to segment and recognize the different actions, achieving multi-action recognition for a single user.
在S106中,分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型。
在本实施例中,终端设备可以获取可交互对象的交互行为列表,并检测所述候选动作与交互行为列表中各个交互行为的相似度,选取所述相似度最大值作为候选动作与可交互对象之间的匹配度,继而通过各个候选动作的匹配度,确定目标对象的动作类型。需要说明的是,识别得到的动作类型可以由多个,例如用户可以在握着水果到的同时,使用水果刀切水果,即包含了“握”以及“切”两个交互动作,因此终端设备最后识别得到的动作类型的个数可以为多个。基于此,终端设备可以选取匹配度大于预设的匹配阈值的候选动作作为目标对象当前执行的动作类型。
又例如,终端设备可以在对视频监控得到的视频文件进行动作类型的判断,具体地,该视频文件可以为关于安检区域的视频文件,对安检区域的人员进行交互行为的判定,检测是否有用户存在异常行为。在视频监控文件中定位出待识别的目标对象,并判断该目标对象与各个可交互物体之间的动作类型,可交互物体可以为行李箱或待认证证件,判断用户是否按规定提交行李箱进行安检操作,抑或是从行李箱中拿取危险物品来躲避安检操作,从而能够提高安检过程的准确性。
可选地，终端设备可以识别各个可交互对象与人体区域图像之间的距离值，选取所述距离值最小的一个可交互对象作为目标交互对象，并计算所述目标交互对象与各个候选动作之间的匹配度，从而确定目标对象的动作类型。
以上可以看出,本申请实施例提供的一种人体动作的识别方法通过获取所需要进行动作行为分析的目标用户的视频文件,并对该视频文件的各个视频图像帧进行解析,确定每个视频图像帧中包含的人体区域图像,在人体区域图像中标记出各个关键部位,并根据各个关键部位的特征坐标,确定目标对象的各个部位的变化情况,从而确定目标对象的动作类型,自动识别目标对象的人体动作。与现有的人体动作的识别技术相比,本申请实施例无需依赖神经网络对视频图像进行动作类型的识别,并不借助光流信息,避免了需要进行时序递归而带来的识别时延,从而提高了识别的效率,而且通过定位多个关键部位,通过多个关键部位的变动情况确定目标对象的动作,准确率也进一步提高,从而提高了图像识别的效果以及对象行为分析的效率。
图2示出了本申请第二实施例提供的一种人体动作的识别方法S106的具体实现流程图。参见图2，相对于图1所述实施例，本实施例提供的一种人体动作的识别方法S106包括：S1061~S1066，具体详述如下：
进一步地,所述分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型,包括:
在S1061中,获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度。
在本实施例中,终端设备可以在视频图像帧上标记出可交互对象所在的区域图像,并将所述区域图像的中心坐标作为所述可交互对象的特征坐标,计算所述特征坐标与所述人体区域的中心坐标之间的欧氏距离,将所述欧氏距离作为可交互对象与人体区域图像的距离值。若该距离值越小,则两者之间的交互概率越大;反之,若该距离值越大,则两者之间的交互概率越小。因此,终端设备可以根据该距离值计算可交互对象与目标人体之间的交互置信度。
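由可交互对象中心与人体区域中心的欧氏距离换算交互置信度时，一种常见的写法是用随距离单调递减的函数进行映射，下面的草图以负指数衰减为例（衰减系数为假设值，并非本申请指定的换算方式）：

```python
import math

def interaction_confidence(obj_center, body_center, decay=0.01):
    """距离越小置信度越大：距离为0时置信度为1，随距离增大按指数衰减。"""
    dist = math.hypot(obj_center[0] - body_center[0], obj_center[1] - body_center[1])
    return math.exp(-decay * dist)
```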
在S1062中，分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度，将所述相似度识别为所述候选动作的动作置信度。
在本实施例中，终端设备需要确定识别得到的候选动作的正确概率，因此会获取该候选动作的标准特征序列，并计算多个视频图像帧中的关键特征序列与标准特征序列之间的相似度。其中，相似度的计算方式可以为：终端设备在预设的坐标轴上生成关于标准特征序列的标准曲线，以及关于所述关键特征序列的行为曲线，计算上述两条曲线之间围成的封闭区域的面积，基于所述面积确定所述关键特征序列与标准特征序列之间的相似度。若该面积越大，则表示两个动作之间的差距越大，相似度越小；反之，若该面积越小，则表示两个动作之间的差距越小，相似度越大。
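以两条曲线之间围成面积衡量相似度的计算，可以用如下示意性代码表示（两条曲线假设已按帧序号对齐采样，面积到相似度的归一化方式为示例性假设）：

```python
import numpy as np

def curve_similarity(behavior_curve, standard_curve):
    """behavior_curve / standard_curve: 按帧对齐的等长一维坐标序列。"""
    behavior = np.asarray(behavior_curve, dtype=float)
    standard = np.asarray(standard_curve, dtype=float)
    area = np.trapz(np.abs(behavior - standard))  # 两曲线之间围成区域的面积
    return 1.0 / (1.0 + area)  # 面积越小，相似度越大
```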
在S1063中,基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率。
在本实施例中,终端设备根据可交互对象的轮廓信息,确定可交互对象的对象类型,即确定其属于哪一类型的物品,并判断该对象类型与该候选动作的交互概率。例如,“篮球”这一对象类型,可作为“投”、“踢”等候选动作的动作受体,即交互概率较大;而对于“坐”、“站”等候选动作,则不会与“篮球”这一对象类型进行交互,则交互概率较小。终端设备可以根据动作记录库,获取各个候选动作的动作受体对象,计算该对象类型所对应的动作记录的个数,并基于该个数确定该对象类型与候选动作之间的交互概率。
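基于动作记录库统计“候选动作-对象类型”共同出现的记录个数、进而估计交互概率，可以写成如下草图（动作记录库以列表模拟，仅为示例）：

```python
from collections import Counter

def interaction_probability(action_records, candidate_action, object_type):
    """action_records: [(动作名, 动作受体的对象类型), ...]；返回该对象类型在候选动作记录中的占比。"""
    counts = Counter(obj for act, obj in action_records if act == candidate_action)
    total = sum(counts.values())
    return counts[object_type] / total if total else 0.0
```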
在S1064中,从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度。
在本实施例中，终端设备还需要确定可交互对象识别的准确性，因此会获取可交互对象的对象区域图像，将其与该对象类型匹配的标准图像进行相似度比对，并根据两个图像之间的相似度，确定该可交互对象的对象置信度。
在S1065中,将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
（公式原文为图像：PCTCN2019103161-appb-000001）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率。
在本实施例中，终端设备将上述四个计算得到的参数导入到匹配度计算模型，确定该候选动作与可交互对象之间的匹配度，从而能够借助交互对象对动作类型进行进一步的筛选识别。特别地，该候选动作的触发概率可以根据上一图像帧对应的动作类型以及下一图像帧的动作类型计算得到：由于动作具有一定的连续性，因此可以通过已触发的动作以及后续的动作来确定当前动作的触发概率。
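由于匹配度计算模型的公式原文为图像、此处无法复原，下面仅给出一种假设的组合方式作示意（乘性加权仅为示例，并非本申请给出的匹配度计算模型）：

```python
def match_score(s_interaction, s_h, s_o, p_interaction, p_trigger):
    """示意：把交互置信度、动作置信度、对象置信度、交互概率与触发概率组合为匹配度（组合形式为假设）。"""
    return s_h * s_o * s_interaction * (p_interaction + p_trigger) / 2.0
```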
在S1066中,选取所述匹配度大于匹配阈值的所述候选动作,作为所述目标对象的动作类型。
在本实施例中,由于与可交互对象的交互动作可以存在多个,因此终端设备可以选取匹配度大于预设的匹配阈值的候选动作作为目标对象的动作类型。
在本申请实施例中,通过确定候选动作与可交互对象在多个维度的置信度,从而计算出各个候选动作的匹配度,能够提高匹配度计算的准确性,从而提高人体动作识别的准确率。
图3示出了本申请第三实施例提供的一种人体动作的识别方法S104的具体实现流程图。参见图3,相对于图1所述的实施例,本实施例提供的一种人体动作的识别方法S104包括:S1041~S1045,具体详述如下:
进一步地,所述根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列,包括:
在S1041中,获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值。
在本实施例中，终端设备需要进行人体关键部位追踪，若检测到两个相邻图像帧中相同关键部位的位移过大，则表示两个关键部位属于不同的人体，从而能够快速进行重追踪，并且提高动作识别的准确率。基于此，终端设备会获取帧数相邻的两个视频图像帧中相同关键部位的第一特征坐标以及第二特征坐标，将两个特征坐标导入到欧氏距离计算公式，计算出两个坐标点之间的距离值，即图像距离值。该图像距离值具体指在视频图像帧上两个坐标点之间的距离，并非该关键部位在实际场景下的移动距离，因此需要对该图像距离值进行数值转换。
在S1042中,计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距。
在本实施例中，终端设备获取人体区域图像在视频图像帧中所占据的面积，即图像面积。终端设备设置有标准的人体面积以及该面积所对应的标准拍摄焦距。终端设备可以计算当前的图像面积与标准的人体面积之间的比例，确定缩放比例，基于所述缩放比例以及标准拍摄焦距，计算该目标对象与拍摄模块之间的实际拍摄焦距，即上述的拍摄焦距。
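由图像面积与标准人体面积之比估算拍摄焦距的过程，可以写成如下草图（标准面积、标准拍摄焦距以及“面积比开方近似线性缩放比例”均为示例性假设）：

```python
import math

STANDARD_BODY_AREA = 40000.0  # 假设的标准人体面积（像素）
STANDARD_FOCAL = 35.0         # 假设的标准拍摄焦距

def estimate_focal(body_area: float) -> float:
    scale = math.sqrt(body_area / STANDARD_BODY_AREA)  # 由面积比得到缩放比例
    return STANDARD_FOCAL * scale                      # 按缩放比例换算当前拍摄焦距
```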
在S1043中,将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型,计算两个所述视频图像帧中所述关键部分的实际移动距离;所述距离转换模型具体为:
（公式原文为图像：PCTCN2019103161-appb-000006）
其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率。
在本实施例中，终端设备将该视频图像帧对应的拍摄焦距、两个特征坐标之间的图像距离值以及该视频文件的拍摄帧率导入到距离转换模型内，从而能够计算关键部位在场景中的实际移动距离。
在S1044中,将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标。
在本实施例中,终端设备若检测到实际移动距离大于或等于预设的距离阈值,则表示该关键部位移动距离超过了正常的移动距离,此时会识别两个视频图像帧中该关键部位属于不同的目标对象,此时会判定上述两个特征坐标为非关联的特征坐标;反之,若该实际移动距离值小于预设的距离阈值,则表示两个视频图像帧中该关键部位属于同一目标对象,此时会判定上述两个特征坐标为关联的特征坐标,实现对目标对象的追踪的目的,避免在追踪用户A的运动轨迹的情况下,切换到追踪用户B的运动轨迹,提高了动作识别的准确率。
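按实际移动距离阈值判断相邻帧中同一关键部位的两个特征坐标是否互为关联，可用如下示意性代码表示（距离阈值为假设值，实际移动距离假设已由距离转换模型换算得到）：

```python
def associate_coordinates(coord_pairs, dist_threshold=0.5):
    """coord_pairs: [((x1, y1), (x2, y2), 实际移动距离), ...]；返回互为关联的特征坐标对。"""
    associated = []
    for first, second, dist in coord_pairs:
        if dist < dist_threshold:
            associated.append((first, second))  # 同一目标对象，互为关联
        # 否则视为非关联坐标并过滤，避免从追踪用户A误切换到追踪用户B
    return associated
```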
在S1045中，根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
在本实施例中,终端设备将所有非关联的特征坐标进行过滤,将互为关联的特征坐标进行封装,生成关于关键部位的关键特征序列。
在本申请实施例中,通过计算不同帧数下关键部位的实际移动距离,从而能够对异常的特征坐标点进行过滤,提高了动作识别的准确性。
图4示出了本申请第四实施例提供的一种人体动作的识别方法S102的具体实现流程图。参见图4,相对于图1至3所述实施例,本实施例提供的一种人体动作的识别方法S102包括:S1021~S1024,具体详述如下:
进一步地,所述分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,包括:
在S1021中,通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积。
在本实施例中,终端设备通过轮廓识别算法,确定该视频图像帧中的轮廓曲线。具体识别轮廓线的方式可以为:终端设备计算相邻两个坐标点之间的像素值的差值,若该差值大于预设的轮廓阈值,则识别该坐标点为轮廓线所在的坐标点,连接所有识别得到的轮廓线上的坐标点,构成一条连续的轮廓曲线。每一条封闭的轮廓曲线对应一个拍摄对象。
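按相邻像素差值阈值标记轮廓点的做法，可以用如下灰度图上的示意性代码表示（轮廓阈值为假设值，仅为示例，未做连通性处理）：

```python
import numpy as np

def contour_points(gray: np.ndarray, contour_threshold: int = 30):
    """返回与右侧/下侧相邻像素差值超过阈值的坐标点，近似轮廓线所在的坐标点集合。"""
    g = gray.astype(np.int32)
    diff_x = np.abs(np.diff(g, axis=1))  # 水平方向相邻像素差值
    diff_y = np.abs(np.diff(g, axis=0))  # 垂直方向相邻像素差值
    mask = np.zeros_like(g, dtype=bool)
    mask[:, :-1] |= diff_x > contour_threshold
    mask[:-1, :] |= diff_y > contour_threshold
    ys, xs = np.nonzero(mask)
    return list(zip(xs.tolist(), ys.tolist()))
```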
在本实施例中，终端设备在视频图像帧上标记出所有轮廓曲线，并对轮廓曲线和/或视频图像帧的边界之间所围成的区域进行积分，从而能够得到各个轮廓曲线对应的区域面积。由于一条轮廓曲线对应一个拍摄对象，基于区域面积可以确定被拍摄对象的缩放比例，从而能够选取合适的窗口来提取人体区域图像，提高人体区域图像提取的准确性。
在S1022中,根据各个所述区域面积,生成所述视频图像帧的人体识别窗口。
在本实施例中,由于不同的缩放比例,人体识别窗口的尺寸也需要随之调整,基于此,终端设备可以根据各个拍摄对象的区域面积,计算出视频图像帧对应的缩放比例,并查询该缩放比例关联的人体识别窗口尺寸,继而生成与视频图像帧匹配的人体识别窗口。
可选地,在本实施例中,终端设备采用的是yolov3的人体识别算法,而yolov3需要配置3个人体识别窗口。基于此,终端设备根据各个轮廓曲线所围成的区域面积,生成区域面积的分布情况,选取分布密度最大的三个区域面积作为特征面积,并基于三个特征面积生成与之对应的人体识别窗口,即三个feature map。
在S1023中,基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像。
在本实施例中,终端设备在生成与视频图像帧的缩放比例对应的人体识别窗口后,可以通过人体识别窗口在视频图像帧上进行滑动框取,将每一次框取的区域图像作为候选区域图像。若存在多个尺寸的人体识别窗口,则创建与人体识别窗口数量对应的并发线程,并复制该多个视频图像帧,通过多条并发线程分别控制人体识别窗口在不同的视频图像帧上进行滑动框取,即不同尺寸的人体识别窗口的滑动框取操作是相互独立、互不影响的,生成不同尺寸的候选区域图像。
在S1024中,分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
在本实施例中，终端设备计算该候选区域图像与标准人体模板之间的重合率，若两者之间的重合率越高，则表示该区域图像所对应的拍摄对象与目标对象的相似度越高，因此可以识别该候选区域为人体区域图像；反之，若两者之间的重合率越低，则表示该区域图像的形态与目标对象的相似度较低，识别为非人体区域图像。由于视频图像帧中可以包含多个不同用户，因此终端设备会将所有重合率超过预设的重合率阈值的候选区域均识别为人体区域图像，在该情况下，终端设备可以定位各个人体区域图像中的人脸图像，将该人脸图像与目标对象的标准人脸进行匹配，从而选取与标准人脸相匹配的人体区域图像作为目标对象的人体区域图像。
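计算候选区域与标准人体模板之间重合率并筛选人体区域的步骤，可以用如下示意性代码表示（此处将重合率近似为两个二值掩码的交并比，重合率阈值为假设值）：

```python
import numpy as np

def overlap_rate(candidate_mask: np.ndarray, template_mask: np.ndarray) -> float:
    """candidate_mask / template_mask: 同尺寸的 0/1 掩码，返回交并比作为重合率。"""
    inter = np.logical_and(candidate_mask, template_mask).sum()
    union = np.logical_or(candidate_mask, template_mask).sum()
    return float(inter) / float(union) if union else 0.0

def pick_body_regions(candidates, template_mask, rate_threshold=0.5):
    """candidates: [(候选区域, 候选区域掩码), ...]；返回重合率大于阈值的候选区域。"""
    return [region for region, mask in candidates
            if overlap_rate(mask, template_mask) > rate_threshold]
```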
在本申请实施例中,通过获取视频图像帧中的轮廓曲线,从而基于各个轮廓曲线的区域面积,确定视频图像帧的缩放比例,并生成与之对应的人体识别窗口进行人体区域图像的识别操作,从而能够提高识别的准确率。
图5示出了本申请第五实施例提供的一种人体动作的识别方法S105的具体实现流程图。参见图5，相对于图1至图3所述实施例，本实施例提供的一种人体动作的识别方法S105包括：S1051~S1052，具体详述如下：
进一步地,所述通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作,包括:
在S1051中,在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线。
在本实施例中，终端设备根据各个关键特征序列中各个特征坐标的坐标值以及对应的视频图像帧的帧数，在预设的坐标轴上标记出各个特征坐标，并连接各个特征坐标，生成关于关键部位的部位变化曲线。该坐标轴可以是以视频图像帧为基础建立的坐标轴，横坐标对应视频图像帧的长，纵坐标对应视频图像帧的宽。
在S1052中,将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的所述候选动作。
在本实施例中,终端设备根据所有关键部位的部位变化曲线与预设动作库中各个候选动作的标准动作曲线进行匹配,计算两个变化曲线的重合率,选取重合率最高的一个候选动作为目标对象的动作类型。
在本申请实施例中,通过绘制关键部位的部位变化曲线从而能够直观地确定目标对象的动作类型,提高了动作类型的准确性。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
图6示出了本申请一实施例提供的一种人体动作的识别设备的结构框图，该人体动作的识别设备包括的各单元用于执行图1对应的实施例中的各步骤。具体请参阅图1及其对应的实施例中的相关描述。为了便于说明，仅示出了与本实施例相关的部分。
参见图6,所述人体动作的识别设备包括:
视频文件获取单元61,用于获取目标对象的视频文件;所述视频文件包括多个视频图像帧;
人体区域图像提取单元62,用于分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象;
关键部位识别单元63,用于在所述人体区域图像中标记出预设的人体关键部位列表内的各个关键部位,并获取各个所述关键部位的特征坐标;
关键特征序列生成单元64，用于根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标，生成关于所述关键部位的关键特征序列；
候选动作识别单元65,用于通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作;
动作类型识别单元66,用于分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型。
可选地,所述动作类型识别单元66包括:
交互置信度计算单元,用于获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度;
动作置信度识别单元,用于分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度,将所述相似度识别为所述候选动作的动作置信度;
交互概率确定单元,用于基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率;
对象置信度识别单元,用于从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度;
匹配度计算单元,用于将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
（公式原文为图像：PCTCN2019103161-appb-000007）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率；
候选动作选取单元,用于选取所述匹配度大于匹配阈值的所述候选动作,识别为所述目标对象的动作类型。
可选地,所述关键特征序列生成单元64包括:
图像距离值计算单元,用于获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值;
拍摄焦距确定单元,用于计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距;
实际移动距离计算单元,用于将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型,计算两个所述视频图像帧中所述关键部分的实际移动距离;所述距离转换模型具体为:
（公式原文为图像：PCTCN2019103161-appb-000012）
其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率;
关联坐标识别单元,用于将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标;
关联坐标封装单元,用于根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
可选地,所述人体区域图像提取单元62包括:
轮廓曲线获取单元,用于通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积;
人体识别窗口生成单元,用于根据各个所述区域面积,生成所述视频图像帧的人体识别窗口;
候选区域图像提取单元,用于基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像;
人体区域图像匹配单元,用于分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
可选地，所述候选动作识别单元65包括：
部位变化曲线生成单元,用于在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线;
候选动作选取单元,用于将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的动作类型。
因此,本申请实施例提供的人体动作的识别设备同样可以无需依赖神经网络对视频图像进行动作类型的识别,并不借助光流信息,避免了需要进行时序递归而带来的识别时延,从而提高了识别的效率,另一方面终端设备会确定视频图像帧中的交互对象,借助交互动作确定目标用户是否存在交互行为,从而能够对多个近似姿态进行区分,进一步提高了动作识别的准确率。
图7是本申请另一实施例提供的一种终端设备的示意图。如图7所示,该实施例的终端设备7包括:处理器70、存储器71以及存储在所述存储器71中并可在所述处理器70上运行的计算机可读指令72,例如人体动作的识别程序。所述处理器70执行所述计算机可读指令72时实现上述各个人体动作的识别方法实施例中的步骤,例如图1所示的S101至S106。或者,所述处理器70执行所述计算机可读指令72时实现上述各装置实施例中各单元的功能,例如图6所示模块61至66功能。
示例性的，所述计算机可读指令72可以被分割成一个或多个单元，所述一个或者多个单元被存储在所述存储器71中，并由所述处理器70执行，以完成本申请。所述一个或多个单元可以是能够完成特定功能的一系列计算机可读指令段，该指令段用于描述所述计算机可读指令72在所述终端设备7中的执行过程。例如，所述计算机可读指令72可以被分割成视频文件获取单元、人体区域图像提取单元、关键部位识别单元、关键特征序列生成单元、候选动作识别单元以及动作类型识别单元，各单元具体功能如上所述。
所述终端设备7可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述终端设备可包括，但不仅限于，处理器70、存储器71。本领域技术人员可以理解，图7仅仅是终端设备7的示例，并不构成对终端设备7的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件，例如所述终端设备还可以包括输入输出设备、网络接入设备、总线等。
所称处理器70可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
所述存储器71可以是所述终端设备7的内部存储单元,例如终端设备7的硬盘或内存。所述存储器71也可以是所述终端设备7的外部存储设备,例如所述终端设备7上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,所述存储器71还可以既包括所述终端设备7的内部存储单元也包括外部存储设备。所述存储器71用于存储所述计算机可读指令以及所述终端设备所需的其他程序和数据。所述存储器71还可以用于暂时地存储已经输出或者将要输出的数据。另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种人体动作的识别方法,其特征在于,包括:
    获取目标对象的视频文件;所述视频文件包括多个视频图像帧;
    分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象;
    在所述人体区域图像中标记出预设的人体关键部位列表内的各个关键部位,并获取各个所述关键部位的特征坐标;
    根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列;
    通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作;
    分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型。
  2. 根据权利要求1所述的识别方法,其特征在于,所述分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型,包括:
    获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度;
    分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度,将所述相似度识别为所述候选动作的动作置信度;
    基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率;
    从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度;
    将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100001）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率；
    选取所述匹配度大于匹配阈值的所述候选动作,作为所述目标对象的动作类型。
  3. 根据权利要求1所述的识别方法,其特征在于,所述根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列,包括:
    获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值;
    计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距;
    将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型,计算两个所述视频图像帧中所述关键部分的实际移动距离;所述距离转换模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100006）
    其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率;
    将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标;
    根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
  4. 根据权利要求1-3任一项所述的识别方法,其特征在于,所述分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象,包括:
    通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积;
    根据各个所述区域面积,生成所述视频图像帧的人体识别窗口;
    基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像;
    分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
  5. 根据权利要求1-3任一项所述的识别方法,其特征在于,所述通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作,包括:
    在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线;
    将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的所述候选动作。
  6. 一种人体动作的识别设备,其特征在于,包括:
    视频文件获取单元,用于获取目标对象的视频文件;所述视频文件包括多个视频图像帧;
    人体区域图像提取单元,用于分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象;
    关键部位识别单元,用于在所述人体区域图像中标记出预设的人体关键部位列表内的各个关键部位,并获取各个所述关键部位的特征坐标;
    关键特征序列生成单元,用于根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列;
    候选动作识别单元,用于通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作;
    动作类型识别单元，用于分别计算各个所述候选动作与所述可交互对象之间的匹配度，并根据所述匹配度，从所述候选动作中确定所述目标对象的动作类型。
  7. 根据权利要求6所述的识别设备,其特征在于,所述动作类型识别单元包括:
    交互置信度计算单元,用于获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度;
    动作置信度识别单元,用于分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度,将所述相似度识别为所述候选动作的动作置信度;
    交互概率确定单元,用于基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率;
    对象置信度识别单元,用于从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度;
    匹配度计算单元,用于将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100007）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率；
    候选动作选取单元,用于选取所述匹配度大于匹配阈值的所述候选动作,识别为所述目标对象的动作类型。
  8. 根据权利要求6所述的识别设备,其特征在于,所述关键特征序列生成单元包括:
    图像距离值计算单元,用于获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值;
    拍摄焦距确定单元,用于计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距;
    实际移动距离计算单元,用于将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型,计算两个所述视频图像帧中所述关键部分的实际移动距离;所述距离转换模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100012）
    其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率;
    关联坐标识别单元,用于将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标;
    关联坐标封装单元,用于根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
  9. 根据权利要求6-8任一项所述的识别设备,其特征在于,所述人体区域图像提取单元包括:
    轮廓曲线获取单元,用于通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积;
    人体识别窗口生成单元,用于根据各个所述区域面积,生成所述视频图像帧的人体识别窗口;
    候选区域图像提取单元,用于基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像;
    人体区域图像匹配单元,用于分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
  10. 根据权利要求6-8任一项所述的识别设备,其特征在于,所述动作类型识别单元包括:
    部位变化曲线生成单元,用于在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线;
    候选动作选取单元,用于将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的动作类型。
  11. 一种终端设备,其特征在于,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取目标对象的视频文件;所述视频文件包括多个视频图像帧;
    分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象;
    在所述人体区域图像中标记出预设的人体关键部位列表内的各个关键部位,并获取各个所述关键部位的特征坐标;
    根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列;
    通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作;
    分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型。
  12. 根据权利要求11所述的终端设备,其特征在于,所述分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型,包括:
    获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度;
    分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度,将所述相似度识别为所述候选动作的动作置信度;
    基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率;
    从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度;
    将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100013）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率；
    选取所述匹配度大于匹配阈值的所述候选动作,作为所述目标对象的动作类型。
  13. 根据权利要求11所述的终端设备,其特征在于,所述根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列,包括:
    获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值;
    计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距;
    将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型,计算两个所述视频图像帧中所述关键部分的实际移动距离;所述距离转换模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100018）
    其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率;
    将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标;
    根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
  14. 根据权利要求11-13任一项所述的终端设备,其特征在于,所述分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象,包括:
    通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积;
    根据各个所述区域面积,生成所述视频图像帧的人体识别窗口;
    基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像;
    分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
  15. 根据权利要求11-13任一项所述的终端设备,其特征在于,所述通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作,包括:
    在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线;
    将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的所述候选动作。
  16. 一种计算机非易失性可读存储介质,所述计算机非易失性可读存储介质存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现如下步骤:
    获取目标对象的视频文件;所述视频文件包括多个视频图像帧;
    分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象;
    在所述人体区域图像中标记出预设的人体关键部位列表内的各个关键部位,并获取各个所述关键部位的特征坐标;
    根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列;
    通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作;
    分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型。
  17. 根据权利要求16所述的计算机非易失性可读存储介质,其特征在于,所述分别计算各个所述候选动作与所述可交互对象之间的匹配度,并根据所述匹配度,从所述候选动作中确定所述目标对象的动作类型,包括:
    获取所述可交互对象与所述人体区域图像之间的距离值,并基于所述距离值确定所述可交互对象的交互置信度;
    分别计算所述关键特征序列与各个所述候选动作的标准特征序列之间的相似度,将所述相似度识别为所述候选动作的动作置信度;
    基于所述可交互对象的对象类型,确定所述候选动作与所述对象类型的交互概率;
    从所述视频图像帧中提取所述可交互对象的对象区域图像,并根据所述对象区域图像与所述对象类型预设的标准图像,确定所述可交互对象的对象置信度;
    将所述交互置信度、所述动作置信度、所述对象置信度以及所述交互概率导入到匹配度计算模型,确定所述候选动作的所述匹配度;所述匹配度计算模型具体为:
    （公式原文为图像：PCTCN2019103161-appb-100019）其中，模型输出为所述候选动作a的所述匹配度；输入包括：所述交互置信度、所述动作置信度s_h、所述对象置信度s_o、所述交互概率，以及预设的所述候选动作a的触发概率；
    选取所述匹配度大于匹配阈值的所述候选动作,作为所述目标对象的动作类型。
  18. 根据权利要求16所述的计算机非易失性可读存储介质,其特征在于,所述根据所述关键部位在各个所述视频图像帧中对应的所述特征坐标,生成关于所述关键部位的关键特征序列,包括:
    获取帧数相邻的两个所述视频图像帧内同一所述关键部位的第一特征坐标以及第二特征坐标,并计算所述第一特征坐标与所述第二特征坐标之间的图像距离值;
    计算所述人体区域图像的图像面积,并基于所述图像面积确定所述目标对象与拍摄模块之间的拍摄焦距;
    将所述拍摄焦距、所述图像距离值以及所述视频文件的拍摄帧率导入到距离转换模型，计算两个所述视频图像帧中所述关键部分的实际移动距离；所述距离转换模型具体为：
    （公式原文为图像：PCTCN2019103161-appb-100024）
    其中,Dist为所述实际移动距离;StandardDist为所述图像距离值;FigDist为所述拍摄焦距;BaseDist为预设的基准焦距;ActFrame为所述拍摄帧率;BaseFrame为所述基准帧率;
    将所述实际移动距离小于预设的距离阈值的两个所述特征坐标识别为互为关联的特征坐标;
    根据所有所述互为关联的特征坐标生成关于所述关键部位的所述关键特征序列。
  19. 根据权利要求16-18任一项所述的计算机非易失性可读存储介质,其特征在于,所述分别解析各个所述视频图像帧,提取所述视频图像帧中关于所述目标对象的人体区域图像,以及确定所述视频图像帧包含的可交互对象,包括:
    通过轮廓识别算法,获取所述视频图像帧的轮廓曲线,并计算各个所述轮廓曲线所包围的区域面积;
    根据各个所述区域面积,生成所述视频图像帧的人体识别窗口;
    基于所述人体识别窗口在所述视频图像帧上进行滑动框取,生成多个候选区域图像;
    分别计算各个所述候选区域图像与标准人体模板之间的重合率,并选取所述重合率大于预设重合率阈值的所述候选区域图像作为所述人体区域图像。
  20. 如权利要求16-18任一项所述的计算机非易失性可读存储介质,其特征在于,所述通过各个所述关键部位的所述关键特征序列,确定所述目标对象的至少一个候选动作,包括:
    在预设的坐标轴内标记各个所述关键特征序列的特征坐标,生成关于各个所述关键部位的部位变化曲线;
    将所述部位变化曲线与预设动作库内的各个候选动作的标准动作曲线进行匹配,基于匹配结果确定所述目标对象的所述候选动作。