WO2020199480A1 - Method and device for recognizing human body movements - Google Patents

Method and device for recognizing human body movements Download PDF

Info

Publication number
WO2020199480A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
human body
image
video image
key part
Prior art date
Application number
PCT/CN2019/103164
Other languages
English (en)
French (fr)
Inventor
叶明
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020199480A1 publication Critical patent/WO2020199480A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • This application belongs to the field of image recognition technology, and in particular relates to a method and device for recognizing human body movements.
  • The embodiments of the present application provide a method and device for recognizing human body movements, to solve the problem that existing recognition methods are slow and not very accurate, which reduces the effect of image recognition and the efficiency of subsequent object behavior analysis based on human actions.
  • The first aspect of the embodiments of the present application provides a method for recognizing human body movements, including: acquiring a video file of a target object, the video file including a plurality of video image frames; parsing each of the video image frames and extracting the human body region image of the target object from the video image frames; marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part; generating a key feature sequence for each key part according to the feature coordinates corresponding to that key part in each of the video image frames; and determining the action type of the target object through the key feature sequences of the key parts.
  • The embodiment of the application obtains the video file of the target user whose action behavior needs to be analyzed, parses each video image frame of the video file to determine the human body region image contained in each frame, marks each key part in the human body region image, and determines the change of each part of the target object from the feature coordinates of the key parts, thereby determining the action type of the target object and automatically recognizing its human body movements.
  • Compared with existing human action recognition techniques, the embodiment of the present application does not need to rely on a neural network to recognize the action type in the video images and does not use optical flow information, avoiding the recognition delay caused by temporal recursion and thus improving recognition efficiency. Moreover, by locating multiple key parts and determining the action of the target object from the changes of those parts, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
  • FIG. 1 is an implementation flowchart of a method for recognizing human body movements provided by the first embodiment of the present application;
  • FIG. 2 is a specific implementation flowchart of step S102 of a method for recognizing human body movements provided by the second embodiment of the present application;
  • FIG. 3 is a specific implementation flowchart of step S104 of a method for recognizing human body movements provided by the third embodiment of the present application;
  • FIG. 4 is a specific implementation flowchart of step S103 of a method for recognizing human body movements provided by the fourth embodiment of the present application;
  • FIG. 5 is a specific implementation flowchart of step S105 of a method for recognizing human body movements provided by the fifth embodiment of the present application;
  • FIG. 6 is a structural block diagram of a human body motion recognition device provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • In the embodiments of the present application, the execution subject of the process is a terminal device. The terminal device includes, but is not limited to, servers, computers, smartphones, tablet computers, and other devices capable of performing human body motion recognition operations.
  • Fig. 1 shows a flow chart of the method for recognizing human body movements provided by the first embodiment of the present application, which is detailed as follows:
  • a video file of a target object is acquired; the video file includes multiple video image frames.
  • The administrator can designate a video file containing the target object as the target video file. In this case, the terminal device downloads the video file about the target object from the video database according to the file identifier of the target video file, and recognizes the action behavior of the target object.
  • Preferably, the terminal device is a video surveillance device that obtains video files of the current scene. In this case, the terminal device recognizes each object captured in the current scene as a target object and assigns an object number to each object based on its face image.
  • The terminal device determines the action type of each monitored object in real time according to the video file generated during monitoring. If the action type of a certain target object is found in the abnormal action list, a warning message is generated to notify the monitored object performing the abnormal action to stop, thereby realizing real-time warning of abnormal actions of monitored objects.
  • Optionally, the user can send the face information of the target object to the terminal device. The terminal device searches the faces in each video file in the video database based on this face information, and uses the video files containing the face as target video files.
  • The specific search operation can be as follows: the terminal device recognizes the candidate faces in each video image frame of each video file in the video database, extracts the facial feature values of the key areas of each candidate face, and matches them against the face information of the target face. If the matching degree between the two is greater than a preset matching threshold, the two correspond to the same physical person, and the video file is recognized as containing a face image of the target object.
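As an illustration of the face-matching step above, here is a minimal sketch in Python. It assumes a feature extractor that already produces fixed-length face feature vectors; cosine similarity with a 0.8 cutoff stands in for the unspecified matching degree and threshold.

```python
import numpy as np

def is_same_person(candidate_vec: np.ndarray, target_vec: np.ndarray,
                   match_threshold: float = 0.8) -> bool:
    """Compare a candidate face feature vector with the target face's
    feature vector; above the preset threshold, treat them as the same
    physical person (the similarity measure and threshold are assumptions)."""
    cos = float(np.dot(candidate_vec, target_vec) /
                (np.linalg.norm(candidate_vec) * np.linalg.norm(target_vec) + 1e-12))
    return cos > match_threshold
```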
  • The video file contains multiple video image frames, and each video image frame corresponds to a frame number. The video image frames are arranged in ascending order of frame number and encapsulated to generate the video file. The frame number can be determined from the playing time of the video image frame within the video file.
  • Each of the video image frames is parsed separately, and the human body region image of the target object is extracted from the video image frame.
  • The terminal device parses the video file, performs human body recognition on each video image frame, and extracts the human body region image of the target object from each frame.
  • The specific method for extracting the human body region image can be as follows: the terminal device uses a face recognition algorithm to determine whether the video image frame contains a face region image. If not, the frame contains no human body region image. Otherwise, based on the coordinates of the face image, contour recognition is performed around those coordinates, the human body region image corresponding to the face image is extracted from the recognized contour information, and the face image is matched against the face template of the target object to determine whether the human body region image belongs to the target object.
  • Optionally, if there are multiple target objects, i.e. the behavior of multiple objects needs to be monitored, the terminal device, after determining the human body region image for a face image contained in the video image frame, matches that face image against the face template of each target object to determine which target object it corresponds to, and marks the human body region image with the object identifier of the associated target object. The human body region image corresponding to each target object can then be quickly located in the video image frame, which is convenient for tracking the actions of multiple objects.
  • Optionally, the terminal device may obtain the object human body template associated with the object identifier of the target object. The object human body template can represent the human body characteristics of the target object, such as body shape, gender and/or hairstyle information.
  • The terminal device can slide a frame over the video image frame according to the object human body template and calculate the matching degree between the framed candidate region and the template. If the matching degree is greater than a preset matching threshold, the candidate region is recognized as the human body region image of the target object; conversely, if the matching degree is less than or equal to the threshold, the candidate region is not the human body region image of the target object and the sliding continues. If none of the candidate regions in the video image frame contains the human body region image, the above operation is repeated on the next video image frame to identify the human body region image of the target object.
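The sliding-frame matching described above can be sketched as follows; normalized cross-correlation is an assumed stand-in for the patent's unspecified matching degree, and the stride and threshold values are illustrative only.

```python
import numpy as np

def find_body_region(frame: np.ndarray, template: np.ndarray,
                     stride: int = 8, match_threshold: float = 0.8):
    """Slide a template-sized window over the frame; return the first
    candidate region whose matching degree exceeds the threshold,
    or None so the caller can move on to the next video frame."""
    th, tw = template.shape[:2]
    t = (template.astype(float) - template.mean()) / (template.std() + 1e-12)
    for y in range(0, frame.shape[0] - th + 1, stride):
        for x in range(0, frame.shape[1] - tw + 1, stride):
            patch = frame[y:y + th, x:x + tw].astype(float)
            p = (patch - patch.mean()) / (patch.std() + 1e-12)
            if float((p * t).mean()) > match_threshold:
                return (x, y, tw, th)  # human body region of the target object
    return None
```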
  • each key part in the preset list of key parts of the human body is marked in the human body region image, and the characteristic coordinates of each key part are acquired.
  • The terminal device stores a list of key parts of the human body, which contains multiple key parts. Preferably, the list contains 17 key parts: the nose, both eyes, both ears, both shoulders, both wrists, both hands, both sides of the waist, both knees, and both feet. Locating multiple key parts of the human body and tracking their movement improves the accuracy of human action recognition.
  • The terminal device marks each key part in the human body region image. The specific marking method is: based on the contour information of the human body region image, determine the current posture type of the target object (for example standing, walking, lying, or sitting), and then mark each key part on the human body region image according to the correspondence between key parts and posture types.
  • Optionally, the correspondence records the distance value and relative direction vector between each key part and the contour center point of the human body region image; the terminal device can locate each key part from the distance value and relative direction vector and perform the marking operation.
  • The terminal device establishes an image coordinate system based on the video image frame and determines the feature coordinates of each key part from its position in the frame. The terminal device may use the lower-left corner of the video image frame or the image center point as the coordinate origin, as determined by the administrator or the device's default settings.
  • a key feature sequence about the key part is generated according to the feature coordinates corresponding to the key part in each of the video image frames.
  • The terminal device needs to determine the motion trajectory of each key part. It therefore extracts, based on the part identifier of a key part, the feature coordinates corresponding to that identifier from each video image frame, and encapsulates all feature coordinates of the part to generate its key feature sequence. The order of the elements in the key feature sequence is consistent with the frame numbers of the video image frames they belong to; that is, the elements have a temporal relationship, so the key feature sequence can be used to determine how the key part changes over time.
  • Optionally, if a key part is occluded in some video image frames and has no corresponding feature coordinates there, the terminal device can establish a characteristic curve of the key part on a preset coordinate axis according to the frame numbers, connect the feature coordinates in frame order, and fill in the feature coordinates of the missing frames with a smoothing algorithm, as sketched below.
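A minimal sketch of building one key part's feature sequence, with linear interpolation standing in for the unspecified smoothing algorithm. The per-frame dict format is a hypothetical container, and the part is assumed visible in at least one frame.

```python
import numpy as np

def build_key_feature_sequence(frames, part_id):
    """frames: list of {part_id: (x, y) or None} dicts, one per video
    image frame in frame-number order. Returns an (N, 2) array whose
    row order preserves the temporal relationship; occluded frames
    are filled by linear interpolation."""
    coords = np.array([f.get(part_id) or (np.nan, np.nan) for f in frames],
                      dtype=float)
    idx = np.arange(len(coords))
    for axis in range(2):
        col = coords[:, axis]
        known = ~np.isnan(col)
        coords[:, axis] = np.interp(idx, idx[known], col[known])
    return coords
```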
  • the action type of the target object is determined through the key feature sequence of each of the key parts.
  • From the key feature sequences of multiple key parts, the terminal device can determine the motion trajectories of the different key parts and thus the action type of the target object. Specifically, the terminal device may determine the movement direction of each key part from its key feature sequence, match the movement directions of the key parts one by one against those of each candidate action type, and select an action based on the number of matched key parts, for example choosing the candidate action type with the largest number of matched key parts as the action type of the target object.
  • Optionally, the terminal device may be configured with a maximum number of frames and divide the key feature sequence of each key part into multiple feature subsequences based on it, determining the action type of each subsequence separately. When the captured video file is long, the user may perform multiple actions during shooting; the terminal device therefore sets a maximum number of frames in order to divide and recognize the different actions, achieving multi-action recognition for a single user. A minimal splitting helper is sketched below.
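```python
def split_into_subsequences(seq, max_frames: int = 64):
    """Divide a key feature sequence into feature subsequences of at
    most max_frames elements so that several actions in one long video
    can be recognized separately (64 is an assumed preset value)."""
    return [seq[i:i + max_frames] for i in range(0, len(seq), max_frames)]
```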
  • As can be seen from the above, the method for recognizing human body movements provided by this embodiment obtains the video file of the target user whose action behavior needs to be analyzed, parses each video image frame of the video file to determine the human body region image contained in each frame, marks each key part in the human body region image, and determines the change of each part of the target object from the feature coordinates of the key parts, thereby determining the action type of the target object and automatically recognizing its human body movements.
  • Compared with existing human action recognition techniques, the embodiment of the present application does not need to rely on a neural network to recognize the action type in the video images and does not use optical flow information, avoiding the recognition delay caused by temporal recursion and thus improving recognition efficiency. Moreover, by locating multiple key parts and determining the action of the target object from the changes of those parts, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
  • FIG. 2 shows a specific implementation flowchart of step S102 of the method for recognizing human body movements provided by the second embodiment of the present application. Relative to the embodiment of FIG. 1, step S102 in this embodiment includes S1021 to S1024, detailed as follows:
  • Parsing each of the video image frames separately and extracting the human body region image of the target object from the video image frames includes:
  • In S1021, the contour curves of the video image frame are obtained through a contour recognition algorithm, and the area of the region enclosed by each contour curve is calculated.
  • The terminal device uses the contour recognition algorithm to determine the contour curves in the video image frame. The specific method can be as follows: the terminal device calculates the pixel value difference between two adjacent coordinate points and, if the difference is greater than a preset contour threshold, recognizes the coordinate point as lying on a contour line; all recognized contour points are connected to form a continuous contour curve. Each closed contour curve corresponds to one photographed subject.
  • The terminal device marks all contour curves on the video image frame and integrates over the region enclosed by each contour curve and/or the boundary of the video image frame to obtain the region area corresponding to each contour curve. Since each contour curve corresponds to one photographed subject, the zoom ratio of the subject can be determined from the region area, so that a suitable window can be selected for extracting the human body region image, improving extraction accuracy. One way to realize this step is sketched below.
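The sketch below uses OpenCV's edge and contour routines as a stand-in for the patent's pixel-difference contour recognition; the threshold value is an assumption.

```python
import cv2

def contour_areas(frame_bgr, contour_threshold: int = 30):
    """Return each closed contour curve of the frame together with the
    area of the region it encloses; each contour corresponds to one
    photographed subject."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, contour_threshold, 3 * contour_threshold)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [(c, cv2.contourArea(c)) for c in contours]
```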
  • In S1022, a human body recognition window for the video image frame is generated according to the region areas.
  • Because of the different zoom ratios, the size of the human body recognition window also needs to be adjusted accordingly. Based on this, the terminal device can calculate the zoom ratio of the video image frame from the region area of each photographed subject, query the human body recognition window size associated with that zoom ratio, and generate a human body recognition window matching the video image frame.
  • Optionally, the terminal device uses the yolov3 human body recognition algorithm, and yolov3 needs to be configured with three recognition windows. Based on this, the terminal device builds the distribution of the region areas enclosed by the contour curves, selects the three region areas with the highest distribution density as characteristic areas, and generates the corresponding human body recognition windows, i.e. three feature maps, from those characteristic areas; one way to pick them is sketched below.
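In the sketch below, the histogram binning and the square-window conversion are our assumptions; the patent only asks for the three region areas with the highest distribution density.

```python
import numpy as np

def pick_window_sizes(region_areas, num_windows: int = 3, bins: int = 10):
    """Build the distribution of region areas, take the num_windows
    densest bins as characteristic areas, and return one square window
    side length per characteristic area (matching yolov3's three
    detection scales)."""
    areas = np.asarray(region_areas, dtype=float)
    hist, edges = np.histogram(areas, bins=bins)
    densest = sorted(np.argsort(hist)[-num_windows:])
    centers = [(edges[i] + edges[i + 1]) / 2.0 for i in densest]
    return [max(1, int(np.sqrt(a))) for a in centers]
```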
  • In S1023, sliding frame extraction is performed on the video image frame based on the human body recognition window to generate multiple candidate region images.
  • After generating the human body recognition window corresponding to the zoom ratio of the video image frame, the terminal device can slide the window over the frame, taking the region image framed at each position as a candidate region image. If there are human body recognition windows of multiple sizes, the terminal device creates a number of concurrent threads matching the number of windows, copies the video image frame, and controls the windows to slide over different copies through the concurrent threads; that is, the sliding extraction operations of windows of different sizes are independent of each other, generating candidate region images of different sizes.
  • In S1024, the overlap rate between each candidate region image and the standard human body template is calculated, and the candidate region images whose overlap rate is greater than a preset overlap rate threshold are selected as the human body region images.
  • The terminal device calculates the overlap rate between each candidate region image and the standard human body template. The higher the overlap rate, the more similar the photographed subject in that region is to a human body, so the candidate region can be identified as a human body region image; conversely, the lower the overlap rate, the lower the similarity, and the region is recognized as a non-human-body image. Since a video image frame can contain multiple different users, the terminal device recognizes all candidate regions whose overlap rate exceeds the preset threshold as human body region images. In that case, the terminal device can locate the face image in each human body region image, match it against the standard face of the target object, and select the human body region image that matches the standard face as the human body region image of the target object.
  • In this embodiment, by obtaining the contour curves in the video image frame, the zoom ratio of the frame is determined from the region area of each contour curve, and the corresponding human body recognition window is generated to perform the recognition of the human body region image, which improves recognition accuracy. One way to score the overlap is sketched below.
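The overlap rate itself is not defined in the text; intersection over union of two equal-sized binary silhouette masks is one plausible reading, sketched here as an assumption.

```python
import numpy as np

def overlap_rate(candidate_mask: np.ndarray, template_mask: np.ndarray) -> float:
    """Overlap rate between a candidate region and the standard human
    body template, scored as IoU of two boolean masks of equal shape."""
    inter = np.logical_and(candidate_mask, template_mask).sum()
    union = np.logical_or(candidate_mask, template_mask).sum()
    return float(inter) / (float(union) + 1e-12)
```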
  • FIG. 3 shows a specific implementation flowchart of step S104 of the method for recognizing human body movements provided by the third embodiment of the present application. Relative to the embodiment of FIG. 1, step S104 in this embodiment includes S1041 to S1045, detailed as follows:
  • Generating the key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames includes:
  • In S1041, the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers are obtained, and the image distance value between the first feature coordinates and the second feature coordinates is calculated.
  • The terminal device needs to track the key parts of the human body: if it detects that the displacement of the same key part between two adjacent image frames is too large, it identifies the two key parts as belonging to different human bodies, so that re-tracking can be performed quickly and the accuracy of action recognition improved. Based on this, the terminal device obtains the first and second feature coordinates of the same key part in two adjacent video image frames, imports the two feature coordinates into the Euclidean distance formula, and calculates the distance between the two coordinate points, i.e. the image distance value.
  • The image distance value refers to the distance between two coordinate points on the video image frame, not the moving distance of the key part in the actual scene; the image distance value therefore needs to be numerically converted.
  • In S1042, the image area of the human body region image is calculated, and the shooting focal length between the target object and the shooting module is determined based on the image area.
  • The terminal device obtains the area occupied by the human body region image in the video image frame, i.e. the image area. The terminal device is provided with a standard human body area and the standard shooting focal length corresponding to that area. It can calculate the ratio between the current image area and the standard human body area to determine the zoom ratio, and from the zoom ratio and the standard shooting focal length calculate the actual focal length between the target object and the shooting module, i.e. the aforementioned shooting focal length.
  • In S1043, the shooting focal length, the image distance value and the shooting frame rate of the video file are imported into a distance conversion model to calculate the actual movement distance of the key part between the two video image frames. In the distance conversion model: Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate.
  • The terminal device imports the shooting focal length corresponding to the video image frame, the image distance value between the two key-part coordinates, and the shooting frame rate of the video file into the distance conversion model, so the actual movement distance of the key part in the scene can be calculated; a hedged sketch follows.
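The distance conversion model is published only as an image (Figure PCTCN2019103164-appb-000001), so the scaling used below, the image distance scaled by the focal-length ratio and normalized by the frame-rate ratio, is an assumed reconstruction rather than the patent's exact formula.

```python
import math

def actual_move_distance(p1, p2, fig_dist, base_dist, act_frame, base_frame):
    """Convert the image distance between two feature coordinates of
    the same key part in adjacent frames into an actual scene distance.
    ASSUMED formula:
    Dist = StandardDist * (FigDist / BaseDist) * (ActFrame / BaseFrame)."""
    standard_dist = math.dist(p1, p2)  # Euclidean image distance value
    return standard_dist * (fig_dist / base_dist) * (act_frame / base_frame)

def are_associated(p1, p2, dist_threshold, **model_params) -> bool:
    """Mutually associated feature coordinates: same target object only
    if the actual movement distance stays below the preset threshold."""
    return actual_move_distance(p1, p2, **model_params) < dist_threshold
```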
  • In S1044, two feature coordinates whose actual movement distance is less than a preset distance threshold are identified as mutually associated feature coordinates.
  • If the terminal device detects that the actual movement distance is greater than or equal to the preset distance threshold, the key part has moved farther than a normal movement distance, so the key part in the two video image frames is identified as belonging to different target objects, and the two feature coordinates are judged to be non-associated. Conversely, if the actual movement distance is less than the preset distance threshold, the key part in the two video image frames belongs to the same target object, and the two feature coordinates are determined to be associated. This achieves tracking of the target object and avoids switching to tracking the motion trajectory of user B while tracking the motion trajectory of user A, which improves the accuracy of action recognition.
  • In S1045, the key feature sequence of the key part is generated from all the mutually associated feature coordinates.
  • The terminal device filters out all non-associated feature coordinates and encapsulates the mutually associated ones, generating the key feature sequence of the key part.
  • In this embodiment, by calculating the actual movement distance of a key part across frames, abnormal feature coordinate points can be filtered out, improving the accuracy of action recognition.
  • FIG. 4 shows a specific implementation flowchart of step S103 of the method for recognizing human body movements provided by the fourth embodiment of the present application. Relative to the embodiments of FIGs. 1 to 3, step S103 in this embodiment includes S1031 to S1032, detailed as follows:
  • Marking, in the human body region image, each key part in the preset list of human body key parts and obtaining the feature coordinates of each key part includes:
  • In S1031, face recognition is performed on the human body region image, and the face position coordinates of the human body region image are determined.
  • The terminal device performs face recognition on the human body region image, obtains the face region image contained in it, and uses the center coordinates of the human body region image as the face position coordinates.
  • Specifically, the face recognition method can be: the terminal device performs grayscale processing on the human body region image, extracts its contour lines, selects the contour lines matching a face curve according to their shape, recognizes the region enclosed by the matched contour lines as the face region image, and obtains the face position coordinates of the face region image.
  • In S1032, each key part is marked in the human body region image based on the positional relationship between the face part and each key part. The terminal device uses the face position coordinates as reference coordinates and, with the preset positional relationship between the face part and each key part, can locate each key part and mark it on the human body region image. The positional relationship is expressed as a distance vector, i.e. a vector whose start point is the face position coordinate and whose end point is the key part, as sketched below.
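A minimal sketch of S1032; the offset table is a hypothetical mapping that would in practice be calibrated per posture type.

```python
def mark_key_parts(face_xy, offset_vectors):
    """Locate each key part as the face position coordinate plus its
    preset distance vector (face point as start, key part as end)."""
    fx, fy = face_xy
    return {part: (fx + dx, fy + dy)
            for part, (dx, dy) in offset_vectors.items()}
```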
  • In this embodiment, by recognizing the face position coordinates, the terminal device can locate each key part, improving the accuracy of key part recognition.
  • FIG. 5 shows a specific implementation flowchart of step S105 of the method for recognizing human body movements provided by the fifth embodiment of the present application. Relative to the embodiments of FIGs. 1 to 3, step S105 in this embodiment includes S1051 to S1052, detailed as follows:
  • Determining the action type of the target object through the key feature sequences of the key parts includes:
  • In S1051, the feature coordinates of each key feature sequence are marked on a preset coordinate axis, and a part change curve is generated for each key part.
  • The terminal device marks each feature coordinate on a preset coordinate axis according to its coordinate values in the key feature sequence and the frame number of the corresponding video image frame, and connects the feature coordinates to generate the part change curve of the key part. The coordinate axis may be established based on the video image frame, with the horizontal axis corresponding to the length of the frame and the vertical axis to its width.
  • In S1052, the part change curve is matched against the standard action curve of each candidate action in a preset action library, and the action type of the target object is determined from the matching result.
  • The terminal device matches the part change curves of all key parts against the standard action curves of each candidate action in the preset action library, calculates the overlap rate of the two curves, and selects the candidate action with the highest overlap rate as the action type of the target object; one possible scoring is sketched after this section.
  • In this embodiment, by drawing the part change curves of the key parts, the action type of the target object can be determined intuitively, which improves the accuracy of the action type.
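A sketch of the curve matching in S1051 and S1052; the resampling and the 1/(1 + mean distance) scoring are assumptions, since the patent does not fix how the overlap rate of two curves is computed.

```python
import numpy as np

def curve_overlap(part_curve: np.ndarray, standard_curve: np.ndarray) -> float:
    """Overlap rate between a part change curve and a standard action
    curve, both given as (N, 2) point arrays."""
    n = min(len(part_curve), len(standard_curve))
    a = part_curve[np.linspace(0, len(part_curve) - 1, n).astype(int)]
    b = standard_curve[np.linspace(0, len(standard_curve) - 1, n).astype(int)]
    return 1.0 / (1.0 + float(np.linalg.norm(a - b, axis=1).mean()))

def classify_action(part_curves: dict, action_library: dict) -> str:
    """Select the candidate action whose standard curves best overlap
    the observed part change curves, averaged over shared key parts."""
    def score(action):
        std = action_library[action]
        shared = [p for p in part_curves if p in std]
        return np.mean([curve_overlap(part_curves[p], std[p]) for p in shared])
    return max(action_library, key=score)
```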
  • FIG. 6 shows a structural block diagram of a human body motion recognition device provided by an embodiment of the present application. The device includes units for executing the steps in the embodiment corresponding to FIG. 1; for details, refer to FIG. 1 and the related description of its embodiment. For ease of explanation, only the parts related to this embodiment are shown.
  • the recognition device for human body movements includes:
  • the video file obtaining unit 61 is configured to obtain a video file of the target object; the video file includes a plurality of video image frames;
  • the human body region image extraction unit 62 is configured to analyze each of the video image frames, and extract the human body region image of the target object in the video image frames;
  • the key part recognition unit 63 is configured to mark each key part in the preset human body key part list in the human body region image, and obtain the characteristic coordinates of each key part;
  • a key feature sequence generating unit 64 configured to generate a key feature sequence about the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
  • the action type recognition unit 65 is configured to determine the action type of the target object through the key feature sequence of each of the key parts.
  • the human body region image extraction unit 62 includes:
  • a contour curve obtaining unit configured to obtain the contour curve of the video image frame through a contour recognition algorithm, and calculate the area of the area surrounded by each contour curve;
  • a human body recognition window generating unit configured to generate a human body recognition window of the video image frame according to the area of each of the regions;
  • a candidate region image extraction unit, configured to perform sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images;
  • a human body region image matching unit, configured to calculate the overlap rate between each candidate region image and the standard human body template, and select the candidate region images whose overlap rate is greater than a preset overlap rate threshold as the human body region images.
  • the key feature sequence generating unit 64 includes:
  • an image distance value calculation unit, configured to obtain the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers, and calculate the image distance value between the first feature coordinates and the second feature coordinates;
  • a shooting focal length determination unit, configured to calculate the image area of the human body region image and determine the shooting focal length between the target object and the shooting module based on the image area;
  • an actual movement distance calculation unit, configured to import the shooting focal length, the image distance value and the shooting frame rate of the video file into the distance conversion model, and calculate the actual movement distance of the key part between the two video image frames;
  • where, in the distance conversion model: Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate;
  • an associated coordinate identification unit, configured to identify two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates;
  • an associated coordinate packaging unit, configured to generate the key feature sequence of the key part from all the mutually associated feature coordinates.
  • the key part identifying unit 63 includes:
  • a face recognition unit, configured to perform face recognition on the human body region image and determine the face position coordinates of the human body region image;
  • a key part marking unit, configured to mark each key part in the human body region image based on the positional relationship between the face part and each key part.
  • the action type identification unit 65 includes:
  • a part change curve generating unit configured to mark the feature coordinates of each of the key feature sequences in a preset coordinate axis, and generate a part change curve about each of the key parts;
  • the candidate action selection unit is configured to match the part change curve with the standard action curve of each candidate action in the preset action library, and determine the action type of the target object based on the matching result.
  • Therefore, the human body motion recognition device provided by the embodiment of the present application can likewise recognize the action type of a video image without relying on a neural network and without using optical flow information, avoiding the recognition delay caused by temporal recursion and thereby improving recognition efficiency. Moreover, by locating multiple key parts and determining the action of the target object from their changes, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
  • Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • The terminal device 7 of this embodiment includes a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as a human body motion recognition program. When the processor 70 executes the computer-readable instructions 72, the steps in the above embodiments of the method for recognizing human body movements are implemented, for example S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of the units in the foregoing device embodiments are implemented, for example the functions of modules 61 to 65 shown in FIG. 6.
  • Exemplarily, the computer-readable instructions 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to complete the application. The one or more units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7. For example, the computer-readable instructions 72 can be divided into a video file acquisition unit, a human body region image extraction unit, a key part recognition unit, a key feature sequence generation unit, and an action type recognition unit, whose specific functions are as described above.
  • The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art can understand that FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input and output devices, network access devices, buses, and the like.
  • The so-called processor 70 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device, and can also be used to temporarily store data that has been or will be output.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.


Abstract

A method and device for recognizing human body movements, applicable to the field of image recognition technology. The method comprises: acquiring a video file of a target object (S101); parsing each video image frame separately and extracting the human body region image of the target object from the video image frames (S102); marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part (S103); generating a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each video image frame (S104); and determining the action type of the target object through the key feature sequences of the key parts (S105). Determining the action of the target object from the changes of multiple key parts further improves accuracy, thereby improving the effect of image recognition and the efficiency of object behavior analysis.

Description

Method and Device for Recognizing Human Body Movements
This application claims priority to the Chinese patent application No. 201910264909.0, filed on April 3, 2019 and entitled "Method and Device for Recognizing Human Body Movements" (一种人体动作的识别方法及设备), the entire content of which is incorporated herein by reference.
Technical Field
This application belongs to the field of image recognition technology, and in particular relates to a method and device for recognizing human body movements.
Background
With the continuous development of image recognition technology, computers can automatically obtain more and more information from image files and video files, for example determining the type of human body action performed by a user in the picture, and then performing object tracking, object behavior analysis and other operations based on the recognized action information. The accuracy and speed of image recognition therefore directly affect the processing effect of the subsequent steps. Existing human action recognition techniques generally use convolutional neural networks for recognition; however, these techniques rely on optical flow information and require repeated temporal recursion, so the recognition speed is low and the accuracy is not high, which reduces the effect of image recognition and the efficiency of subsequent object behavior analysis based on human actions.
Technical Problem
In view of this, the embodiments of the present application provide a method and device for recognizing human body movements, to solve the problem that existing human action recognition methods are slow and not very accurate, which reduces the recognition effect of image processing and the efficiency of subsequent object behavior analysis based on human actions.
Technical Solution
The first aspect of the embodiments of the present application provides a method for recognizing human body movements, including:
acquiring a video file of a target object, the video file including a plurality of video image frames;
parsing each of the video image frames separately, and extracting the human body region image of the target object from the video image frames;
marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part;
generating a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
determining the action type of the target object through the key feature sequences of the key parts.
Beneficial Effects
The embodiment of the present application obtains the video file of the target user whose action behavior needs to be analyzed, parses each video image frame of the video file to determine the human body region image contained in each frame, marks each key part in the human body region image, and determines the change of each part of the target object from the feature coordinates of the key parts, thereby determining the action type of the target object and automatically recognizing its human body movements. Compared with existing human action recognition techniques, the embodiment of the present application does not need to rely on a neural network to recognize the action type in the video images and does not use optical flow information, avoiding the recognition delay caused by temporal recursion and thus improving recognition efficiency. Moreover, by locating multiple key parts and determining the action of the target object from the changes of those parts, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
Brief Description of the Drawings
FIG. 1 is an implementation flowchart of a method for recognizing human body movements provided by the first embodiment of the present application;
FIG. 2 is a specific implementation flowchart of step S102 of a method for recognizing human body movements provided by the second embodiment of the present application;
FIG. 3 is a specific implementation flowchart of step S104 of a method for recognizing human body movements provided by the third embodiment of the present application;
FIG. 4 is a specific implementation flowchart of step S103 of a method for recognizing human body movements provided by the fourth embodiment of the present application;
FIG. 5 is a specific implementation flowchart of step S105 of a method for recognizing human body movements provided by the fifth embodiment of the present application;
FIG. 6 is a structural block diagram of a human body motion recognition device provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application.
Embodiments of the Invention
In the embodiments of the present application, the execution subject of the process is a terminal device. The terminal device includes, but is not limited to, servers, computers, smartphones, tablet computers and other devices capable of performing human body motion recognition operations. FIG. 1 shows an implementation flowchart of the method for recognizing human body movements provided by the first embodiment of the present application, detailed as follows:
In S101, a video file of a target object is acquired; the video file includes a plurality of video image frames.
In this embodiment, the administrator can designate a video file containing the target object as the target video file; in this case, the terminal device downloads the video file about the target object from the video database according to the file identifier of the target video file, and recognizes the action behavior of the target object. Preferably, the terminal device is a video surveillance device that obtains video files of the current scene; in this case, the terminal device recognizes each object captured in the current scene as a target object, assigns an object number to each object based on its face image, and determines the action type of each monitored object in real time according to the video file generated during monitoring. If the action type of a certain target object is found in the abnormal action list, a warning message is generated to notify the monitored object performing the abnormal action to stop, thereby realizing real-time warning of abnormal actions of monitored objects.
Optionally, the user can send the face information of the target object to the terminal device. The terminal device searches the faces in each video file in the video database based on this face information, and uses the video files containing the face as target video files. The specific search operation can be: the terminal device recognizes the candidate faces in each video image frame of each video file in the video database, extracts the facial feature values of the key areas of each candidate face, and matches them against the face information of the target face; if the matching degree between the two is greater than a preset matching threshold, the two correspond to the same physical person, and the video file is recognized as containing a face image of the target object.
In this embodiment, the video file contains multiple video image frames, each corresponding to a frame number; the frames are arranged in ascending order of frame number and encapsulated to generate the video file. The frame number can be determined from the playing time of the video image frame within the video file.
In S102, each of the video image frames is parsed separately, and the human body region image of the target object is extracted from the video image frames.
In this embodiment, the terminal device parses the video file, performs human body recognition on each video image frame, and extracts the human body region image of the target object from each frame. The specific extraction method can be: the terminal device uses a face recognition algorithm to determine whether the video image frame contains a face region image; if not, the frame contains no human body region image; otherwise, based on the coordinates of the face image, contour recognition is performed around those coordinates, the human body region image corresponding to the face image is extracted from the recognized contour information, and the face image is matched against the face template of the target object to determine whether the human body region image belongs to the target object.
Optionally, if there are multiple target objects, i.e. the behavior of multiple objects needs to be monitored, the terminal device, after determining the human body region image for a face image contained in the video image frame, matches that face image against the face template of each target object to determine which target object it corresponds to, and marks the human body region image with the object identifier of the associated target object; the human body region image corresponding to each target object can then be quickly located in the video image frame, which is convenient for tracking the actions of multiple objects.
Optionally, in this embodiment, the terminal device may obtain the object human body template associated with the object identifier of the target object. The object human body template can represent the human body characteristics of the target object, such as body shape, gender and/or hairstyle information. The terminal device can slide a frame over the video image frame according to the object human body template and calculate the matching degree between the framed candidate region and the template; if the matching degree is greater than a preset matching threshold, the candidate region is recognized as the human body region image of the target object; conversely, if it is less than or equal to the threshold, the candidate region is not the human body region image of the target object and the sliding continues; if none of the candidate regions in the video image frame contains the human body region image, the above operation is repeated on the next video image frame to identify the human body region image of the target object.
In S103, each key part in the preset list of human body key parts is marked in the human body region image, and the feature coordinates of each key part are acquired.
In this embodiment, the terminal device stores a list of human body key parts containing multiple key parts. Preferably, the list contains 17 key parts: the nose, both eyes, both ears, both shoulders, both wrists, both hands, both sides of the waist, both knees, and both feet. Locating multiple key parts of the human body and tracking their movement improves the accuracy of human action recognition.
In this embodiment, the terminal device marks each key part in the human body region image. The specific marking method is: based on the contour information of the human body region image, determine the current posture type of the target object (for example standing, walking, lying or sitting), and then mark each key part on the human body region image according to the correspondence between key parts and posture types. Optionally, the correspondence records the distance value and relative direction vector between each key part and the contour center point of the human body region image; the terminal device can locate each key part from the distance value and relative direction vector and perform the marking operation.
In this embodiment, the terminal device establishes an image coordinate system based on the video image frame and determines the feature coordinates of each key part from its position in the frame. Optionally, the terminal device may use the lower-left corner of the video image frame or the image center point as the coordinate origin, as determined by the administrator or the device's default settings.
In S104, a key feature sequence of the key part is generated according to the feature coordinates corresponding to the key part in each of the video image frames.
In this embodiment, the terminal device needs to determine the motion trajectory of each key part; it therefore extracts, based on the part identifier of a key part, the feature coordinates corresponding to that identifier from each video image frame, and encapsulates all feature coordinates of the part to generate its key feature sequence. The order of the elements in the key feature sequence is consistent with the frame numbers of the video image frames they belong to; that is, the elements have a temporal relationship, so the key feature sequence can be used to determine how the key part changes over time.
Optionally, if a key part is occluded in some video image frames and has no corresponding feature coordinates there, the terminal device can establish a characteristic curve of the key part on a preset coordinate axis according to the frame numbers, connect the feature coordinates in frame order, and fill in the feature coordinates of the missing frames with a smoothing algorithm.
In S105, the action type of the target object is determined through the key feature sequences of the key parts.
In this embodiment, from the key feature sequences of multiple key parts the terminal device can determine the motion trajectories of the different key parts and thus the action type of the target object. Specifically, the terminal device may determine the movement direction of each key part from its key feature sequence, match the movement directions of the key parts one by one against those of each candidate action type, and select an action based on the number of matched key parts, for example choosing the candidate action type with the largest number of matched key parts as the action type of the target object.
Optionally, the terminal device may be configured with a maximum number of frames and divide the key feature sequence of each key part into multiple feature subsequences based on it, determining the action type of each subsequence separately. When the captured video file is long, the user may perform multiple actions during shooting; the terminal device therefore sets a maximum number of frames in order to divide and recognize the different actions, achieving multi-action recognition for a single user.
As can be seen from the above, the method for recognizing human body movements provided by the embodiment of the present application obtains the video file of the target user whose action behavior needs to be analyzed, parses each video image frame of the video file to determine the human body region image contained in each frame, marks each key part in the human body region image, and determines the change of each part of the target object from the feature coordinates of the key parts, thereby determining the action type of the target object and automatically recognizing its human body movements. Compared with existing human action recognition techniques, the embodiment of the present application does not need to rely on a neural network to recognize the action type in the video images and does not use optical flow information, avoiding the recognition delay caused by temporal recursion and thus improving recognition efficiency; moreover, by locating multiple key parts and determining the action of the target object from the changes of those parts, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
FIG. 2 shows a specific implementation flowchart of step S102 of the method for recognizing human body movements provided by the second embodiment of the present application. Referring to FIG. 2, relative to the embodiment of FIG. 1, step S102 in this embodiment includes S1021 to S1024, detailed as follows:
Further, parsing each of the video image frames separately and extracting the human body region image of the target object from the video image frames includes:
In S1021, the contour curves of the video image frame are obtained through a contour recognition algorithm, and the area of the region enclosed by each contour curve is calculated.
In this embodiment, the terminal device uses the contour recognition algorithm to determine the contour curves in the video image frame. The specific method can be: the terminal device calculates the pixel value difference between two adjacent coordinate points and, if the difference is greater than a preset contour threshold, recognizes the coordinate point as lying on a contour line; all recognized contour points are connected to form a continuous contour curve. Each closed contour curve corresponds to one photographed subject.
In this embodiment, the terminal device marks all contour curves on the video image frame and integrates over the region enclosed by each contour curve and/or the boundary of the video image frame to obtain the region area corresponding to each contour curve. Since each contour curve corresponds to one photographed subject, the zoom ratio of the subject can be determined from the region area, so that a suitable window can be selected for extracting the human body region image, improving extraction accuracy.
In S1022, a human body recognition window for the video image frame is generated according to the region areas.
In this embodiment, because of the different zoom ratios, the size of the human body recognition window also needs to be adjusted accordingly. Based on this, the terminal device can calculate the zoom ratio of the video image frame from the region area of each photographed subject, query the human body recognition window size associated with that zoom ratio, and generate a human body recognition window matching the video image frame.
Optionally, in this embodiment, the terminal device uses the yolov3 human body recognition algorithm, and yolov3 needs to be configured with three recognition windows. Based on this, the terminal device builds the distribution of the region areas enclosed by the contour curves, selects the three region areas with the highest distribution density as characteristic areas, and generates the corresponding human body recognition windows, i.e. three feature maps, from those characteristic areas.
In S1023, sliding frame extraction is performed on the video image frame based on the human body recognition window to generate multiple candidate region images.
In this embodiment, after generating the human body recognition window corresponding to the zoom ratio of the video image frame, the terminal device can slide the window over the frame, taking the region image framed at each position as a candidate region image. If there are windows of multiple sizes, the terminal device creates a number of concurrent threads matching the number of windows, copies the video image frame, and controls the windows to slide over different copies through the concurrent threads; that is, the sliding extraction operations of windows of different sizes are independent of each other, generating candidate region images of different sizes.
In S1024, the overlap rate between each candidate region image and the standard human body template is calculated, and the candidate region images whose overlap rate is greater than a preset overlap rate threshold are selected as the human body region images.
In this embodiment, the terminal device calculates the overlap rate between each candidate region image and the standard human body template. The higher the overlap rate, the more similar the photographed subject in that region is to a human body, so the candidate region can be identified as a human body region image; conversely, the lower the overlap rate, the lower the similarity, and the region is recognized as a non-human-body image. Since a video image frame can contain multiple different users, the terminal device recognizes all candidate regions whose overlap rate exceeds the preset threshold as human body region images; in that case, the terminal device can locate the face image in each human body region image, match it against the standard face of the target object, and select the human body region image that matches the standard face as the human body region image of the target object.
In the embodiment of the present application, by obtaining the contour curves in the video image frame, the zoom ratio of the frame is determined from the region area of each contour curve, and the corresponding human body recognition window is generated to perform the recognition of the human body region image, which improves recognition accuracy.
FIG. 3 shows a specific implementation flowchart of step S104 of the method for recognizing human body movements provided by the third embodiment of the present application. Referring to FIG. 3, relative to the embodiment of FIG. 1, step S104 in this embodiment includes S1041 to S1045, detailed as follows:
Further, generating the key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames includes:
In S1041, the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers are obtained, and the image distance value between the first feature coordinates and the second feature coordinates is calculated.
In this embodiment, the terminal device needs to track the key parts of the human body: if it detects that the displacement of the same key part between two adjacent image frames is too large, it identifies the two key parts as belonging to different human bodies, so that re-tracking can be performed quickly and the accuracy of action recognition improved. Based on this, the terminal device obtains the first and second feature coordinates of the same key part in two adjacent video image frames, imports the two feature coordinates into the Euclidean distance formula, and calculates the distance between the two coordinate points, i.e. the image distance value. The image distance value refers to the distance between two coordinate points on the video image frame, not the moving distance of the key part in the actual scene, so it needs to be numerically converted.
In S1042, the image area of the human body region image is calculated, and the shooting focal length between the target object and the shooting module is determined based on the image area.
In this embodiment, the terminal device obtains the area occupied by the human body region image in the video image frame, i.e. the image area. The terminal device is provided with a standard human body area and the standard shooting focal length corresponding to that area. It can calculate the ratio between the current image area and the standard human body area to determine the zoom ratio, and from the zoom ratio and the standard shooting focal length calculate the actual focal length between the target object and the shooting module, i.e. the aforementioned shooting focal length.
In S1043, the shooting focal length, the image distance value and the shooting frame rate of the video file are imported into a distance conversion model to calculate the actual movement distance of the key part between the two video image frames; the distance conversion model is specifically:
Figure PCTCN2019103164-appb-000001
where Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate.
In this embodiment, the terminal device imports the shooting focal length corresponding to the video image frame, the image distance value between the two key-part coordinates, and the shooting frame rate of the video file into the distance conversion model, so the actual movement distance of the key part in the scene can be calculated.
In S1044, two feature coordinates whose actual movement distance is less than a preset distance threshold are identified as mutually associated feature coordinates.
In this embodiment, if the terminal device detects that the actual movement distance is greater than or equal to the preset distance threshold, the key part has moved farther than a normal movement distance, so the key part in the two video image frames is identified as belonging to different target objects and the two feature coordinates are judged to be non-associated; conversely, if the actual movement distance is less than the preset distance threshold, the key part in the two video image frames belongs to the same target object, and the two feature coordinates are determined to be associated. This achieves tracking of the target object and avoids switching to tracking the motion trajectory of user B while tracking the motion trajectory of user A, which improves the accuracy of action recognition.
In S1045, the key feature sequence of the key part is generated from all the mutually associated feature coordinates.
In this embodiment, the terminal device filters out all non-associated feature coordinates and encapsulates the mutually associated ones, generating the key feature sequence of the key part.
In the embodiment of the present application, by calculating the actual movement distance of a key part across frames, abnormal feature coordinate points can be filtered out, improving the accuracy of action recognition.
FIG. 4 shows a specific implementation flowchart of step S103 of the method for recognizing human body movements provided by the fourth embodiment of the present application. Referring to FIG. 4, relative to the embodiments of FIGs. 1 to 3, step S103 in this embodiment includes S1031 to S1032, detailed as follows:
Further, marking, in the human body region image, each key part in the preset list of human body key parts and acquiring the feature coordinates of each key part includes:
In S1031, face recognition is performed on the human body region image, and the face position coordinates of the human body region image are determined.
In this embodiment, the terminal device performs face recognition on the human body region image, obtains the face region image contained in it, and uses the center coordinates of the human body region image as the face position coordinates. Specifically, the face recognition method can be: the terminal device performs grayscale processing on the human body region image, extracts its contour lines, selects the contour lines matching a face curve according to their shape, recognizes the region enclosed by the matched contour lines as the face region image, and obtains the face position coordinates of the face region image.
In S1032, each key part is marked in the human body region image based on the positional relationship between the face part and each key part.
In this embodiment, using the face position coordinates as reference coordinates and the preset positional relationship between the face part and each key part, the terminal device can locate each key part and mark it on the human body region image. The positional relationship is expressed as a distance vector, i.e. a vector whose start point is the face position coordinate and whose end point is the key part.
In the embodiment of the present application, by recognizing the face position coordinates the terminal device can locate each key part, improving the accuracy of key part recognition.
FIG. 5 shows a specific implementation flowchart of step S105 of the method for recognizing human body movements provided by the fifth embodiment of the present application. Referring to FIG. 5, relative to the embodiments of FIGs. 1 to 3, step S105 in this embodiment includes S1051 to S1052, detailed as follows:
Further, determining the action type of the target object through the key feature sequences of the key parts includes:
In S1051, the feature coordinates of each key feature sequence are marked on a preset coordinate axis, and a part change curve is generated for each key part.
In this embodiment, the terminal device marks each feature coordinate on a preset coordinate axis according to its coordinate values in the key feature sequence and the frame number of the corresponding video image frame, and connects the feature coordinates to generate the part change curve of the key part. The coordinate axis may be established based on the video image frame, with the horizontal axis corresponding to the length of the frame and the vertical axis to its width.
In S1052, the part change curve is matched against the standard action curve of each candidate action in a preset action library, and the action type of the target object is determined from the matching result.
In this embodiment, the terminal device matches the part change curves of all key parts against the standard action curves of each candidate action in the preset action library, calculates the overlap rate of the two curves, and selects the candidate action with the highest overlap rate as the action type of the target object.
In the embodiment of the present application, by drawing the part change curves of the key parts, the action type of the target object can be determined intuitively, which improves the accuracy of the action type.
It should be understood that the size of the sequence numbers of the steps in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
FIG. 6 shows a structural block diagram of a device for recognizing human body movements provided by an embodiment of the present application; the units included in the device are used to execute the steps in the embodiment corresponding to FIG. 1. For details, refer to FIG. 1 and the related description of its embodiment. For ease of explanation, only the parts related to this embodiment are shown.
Referring to FIG. 6, the device for recognizing human body movements includes:
a video file acquisition unit 61, configured to acquire a video file of a target object, the video file including a plurality of video image frames;
a human body region image extraction unit 62, configured to parse each of the video image frames separately and extract the human body region image of the target object from the video image frames;
a key part recognition unit 63, configured to mark, in the human body region image, each key part in a preset list of human body key parts and acquire the feature coordinates of each key part;
a key feature sequence generation unit 64, configured to generate a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
an action type recognition unit 65, configured to determine the action type of the target object through the key feature sequences of the key parts.
Optionally, the human body region image extraction unit 62 includes:
a contour curve acquisition unit, configured to obtain the contour curves of the video image frame through a contour recognition algorithm and calculate the area of the region enclosed by each contour curve;
a human body recognition window generation unit, configured to generate the human body recognition window of the video image frame according to the region areas;
a candidate region image extraction unit, configured to perform sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images;
a human body region image matching unit, configured to calculate the overlap rate between each candidate region image and the standard human body template, and select the candidate region images whose overlap rate is greater than a preset overlap rate threshold as the human body region images.
Optionally, the key feature sequence generation unit 64 includes:
an image distance value calculation unit, configured to obtain the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers, and calculate the image distance value between the first feature coordinates and the second feature coordinates;
a shooting focal length determination unit, configured to calculate the image area of the human body region image and determine the shooting focal length between the target object and the shooting module based on the image area;
an actual movement distance calculation unit, configured to import the shooting focal length, the image distance value and the shooting frame rate of the video file into the distance conversion model, and calculate the actual movement distance of the key part between the two video image frames; the distance conversion model is specifically:
Figure PCTCN2019103164-appb-000002
where Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate;
an associated coordinate identification unit, configured to identify two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates;
an associated coordinate packaging unit, configured to generate the key feature sequence of the key part from all the mutually associated feature coordinates.
Optionally, the key part recognition unit 63 includes:
a face recognition unit, configured to perform face recognition on the human body region image and determine the face position coordinates of the human body region image;
a key part marking unit, configured to mark each key part in the human body region image based on the positional relationship between the face part and each key part.
Optionally, the action type recognition unit 65 includes:
a part change curve generation unit, configured to mark the feature coordinates of each key feature sequence on a preset coordinate axis and generate a part change curve for each key part;
a candidate action selection unit, configured to match the part change curve against the standard action curve of each candidate action in a preset action library and determine the action type of the target object from the matching result.
Therefore, the device for recognizing human body movements provided by the embodiment of the present application can likewise recognize the action type of a video image without relying on a neural network and without using optical flow information, avoiding the recognition delay caused by temporal recursion and thereby improving recognition efficiency; moreover, by locating multiple key parts and determining the action of the target object from their changes, accuracy is further improved, which improves the effect of image recognition and the efficiency of object behavior analysis.
FIG. 7 is a schematic diagram of a terminal device provided by another embodiment of the present application. As shown in FIG. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as a human body motion recognition program. When the processor 70 executes the computer-readable instructions 72, the steps in the above embodiments of the method for recognizing human body movements are implemented, for example S101 to S105 shown in FIG. 1; alternatively, the functions of the units in the above device embodiments are implemented, for example the functions of modules 61 to 65 shown in FIG. 6.
Exemplarily, the computer-readable instructions 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to complete the application. The one or more units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7. For example, the computer-readable instructions 72 may be divided into a video file acquisition unit, a human body region image extraction unit, a key part recognition unit, a key feature sequence generation unit and an action type recognition unit, whose specific functions are as described above.
The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art can understand that FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on it; it may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may also include input and output devices, network access devices, buses, etc.
The so-called processor 70 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; the general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device, and can also be used to temporarily store data that has been or will be output.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a non-volatile computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the embodiments of the above methods. Any reference to memory, storage, databases or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. A method for recognizing human body movements, comprising:
    acquiring a video file of a target object, the video file including a plurality of video image frames;
    parsing each of the video image frames separately, and extracting the human body region image of the target object from the video image frames;
    marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part;
    generating a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
    determining the action type of the target object through the key feature sequences of the key parts.
  2. The recognition method according to claim 1, wherein parsing each of the video image frames separately and extracting the human body region image of the target object from the video image frames comprises:
    obtaining the contour curves of the video image frame through a contour recognition algorithm, and calculating the area of the region enclosed by each contour curve;
    generating the human body recognition window of the video image frame according to the region areas;
    performing sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images;
    calculating the overlap rate between each candidate region image and a standard human body template, and selecting the candidate region images whose overlap rate is greater than a preset overlap rate threshold as the human body region images.
  3. The recognition method according to claim 1, wherein generating the key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames comprises:
    obtaining the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers, and calculating the image distance value between the first feature coordinates and the second feature coordinates;
    calculating the image area of the human body region image, and determining the shooting focal length between the target object and the shooting module based on the image area;
    importing the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model, and calculating the actual movement distance of the key part between the two video image frames; the distance conversion model being specifically:
    Figure PCTCN2019103164-appb-100001
    where Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate;
    identifying two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates;
    generating the key feature sequence of the key part from all the mutually associated feature coordinates.
  4. The recognition method according to any one of claims 1 to 3, wherein marking, in the human body region image, each key part in the preset list of human body key parts and acquiring the feature coordinates of each key part comprises:
    performing face recognition on the human body region image, and determining the face position coordinates of the human body region image;
    marking each key part in the human body region image based on the positional relationship between the face part and each key part.
  5. The recognition method according to any one of claims 1 to 3, wherein determining the action type of the target object through the key feature sequences of the key parts comprises:
    marking the feature coordinates of each key feature sequence on a preset coordinate axis, and generating a part change curve for each key part;
    matching the part change curve against the standard action curve of each candidate action in a preset action library, and determining the action type of the target object from the matching result.
  6. A device for recognizing human body movements, comprising:
    a video file acquisition unit, configured to acquire a video file of a target object, the video file including a plurality of video image frames;
    a human body region image extraction unit, configured to parse each of the video image frames separately and extract the human body region image of the target object from the video image frames;
    a key part recognition unit, configured to mark, in the human body region image, each key part in a preset list of human body key parts and acquire the feature coordinates of each key part;
    a key feature sequence generation unit, configured to generate a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
    an action type recognition unit, configured to determine the action type of the target object through the key feature sequences of the key parts.
  7. The recognition device according to claim 6, wherein the human body region image extraction unit comprises:
    a contour curve acquisition unit, configured to obtain the contour curves of the video image frame through a contour recognition algorithm and calculate the area of the region enclosed by each contour curve;
    a human body recognition window generation unit, configured to generate the human body recognition window of the video image frame according to the region areas;
    a candidate region image extraction unit, configured to perform sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images;
    a human body region image matching unit, configured to calculate the overlap rate between each candidate region image and a standard human body template, and select the candidate region images whose overlap rate is greater than a preset overlap rate threshold as the human body region images.
  8. The recognition device according to claim 6, wherein the key feature sequence generation unit comprises:
    an image distance value calculation unit, configured to obtain the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers, and calculate the image distance value between the first feature coordinates and the second feature coordinates;
    a shooting focal length determination unit, configured to calculate the image area of the human body region image and determine the shooting focal length between the target object and the shooting module based on the image area;
    an actual movement distance calculation unit, configured to import the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model and calculate the actual movement distance of the key part between the two video image frames; the distance conversion model being specifically:
    Figure PCTCN2019103164-appb-100002
    where Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate;
    an associated coordinate identification unit, configured to identify two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates;
    an associated coordinate packaging unit, configured to generate the key feature sequence of the key part from all the mutually associated feature coordinates.
  9. The recognition device according to any one of claims 6 to 8, wherein the key part recognition unit comprises:
    a face recognition unit, configured to perform face recognition on the human body region image and determine the face position coordinates of the human body region image;
    a key part marking unit, configured to mark each key part in the human body region image based on the positional relationship between the face part and each key part.
  10. The recognition device according to any one of claims 6 to 8, wherein the action type recognition unit comprises:
    a part change curve generation unit, configured to mark the feature coordinates of each key feature sequence on a preset coordinate axis and generate a part change curve for each key part;
    a candidate action selection unit, configured to match the part change curve against the standard action curve of each candidate action in a preset action library and determine the action type of the target object from the matching result.
  11. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring a video file of a target object, the video file including a plurality of video image frames;
    parsing each of the video image frames separately, and extracting the human body region image of the target object from the video image frames;
    marking, in the human body region image, each key part in a preset list of human body key parts, and acquiring the feature coordinates of each key part;
    generating a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames;
    determining the action type of the target object through the key feature sequences of the key parts.
  12. The terminal device according to claim 11, wherein parsing each of the video image frames separately and extracting the human body region image of the target object from the video image frames comprises:
    obtaining the contour curves of the video image frame through a contour recognition algorithm, and calculating the area of the region enclosed by each contour curve;
    generating the human body recognition window of the video image frame according to the region areas;
    performing sliding frame extraction on the video image frame based on the human body recognition window to generate multiple candidate region images;
    calculating the overlap rate between each candidate region image and a standard human body template, and selecting the candidate region images whose overlap rate is greater than a preset overlap rate threshold as the human body region images.
  13. The terminal device according to claim 11, wherein generating the key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames comprises:
    obtaining the first feature coordinates and the second feature coordinates of the same key part in two video image frames with adjacent frame numbers, and calculating the image distance value between the first feature coordinates and the second feature coordinates;
    calculating the image area of the human body region image, and determining the shooting focal length between the target object and the shooting module based on the image area;
    importing the shooting focal length, the image distance value and the shooting frame rate of the video file into a distance conversion model, and calculating the actual movement distance of the key part between the two video image frames; the distance conversion model being specifically:
    Figure PCTCN2019103164-appb-100003
    where Dist is the actual movement distance; StandardDist is the image distance value; FigDist is the shooting focal length; BaseDist is the preset reference focal length; ActFrame is the shooting frame rate; BaseFrame is the reference frame rate;
    identifying two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates;
    generating the key feature sequence of the key part from all the mutually associated feature coordinates.
  14. The terminal device according to any one of claims 11 to 13, wherein said marking each key part in the preset human body key part list in the human body region image and obtaining the feature coordinates of each of the key parts comprises:
    performing face recognition on the human body region image to determine the coordinates of the face part in the human body region image; and
    marking each of the key parts in the human body region image based on the positional relationship between the face part and each of the key parts.
  15. The terminal device according to any one of claims 11 to 13, wherein said determining the action type of the target object through the key feature sequence of each of the key parts comprises:
    plotting the feature coordinates of each of the key feature sequences on a preset coordinate axis to generate a part change curve for each of the key parts; and
    matching the part change curve against the standard action curve of each candidate action in a preset action library, and determining the action type of the target object based on the matching result.
  16. A non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining a video file of a target object, the video file comprising a plurality of video image frames;
    parsing each of the video image frames and extracting the human body region image of the target object from the video image frames;
    marking each key part in a preset human body key part list in the human body region image, and obtaining the feature coordinates of each of the key parts;
    generating a key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames; and
    determining the action type of the target object through the key feature sequence of each of the key parts.
  17. The non-volatile computer-readable storage medium according to claim 16, wherein said parsing each of the video image frames and extracting the human body region image of the target object from the video image frames comprises:
    obtaining contour curves of the video image frame through a contour recognition algorithm, and calculating the area of the region enclosed by each of the contour curves;
    generating a human body recognition window for the video image frame according to each of the region areas;
    performing sliding selection on the video image frame based on the human body recognition window to generate a plurality of candidate region images; and
    calculating an overlap rate between each of the candidate region images and a standard human body template, and selecting a candidate region image whose overlap rate is greater than a preset overlap rate threshold as the human body region image.
  18. The non-volatile computer-readable storage medium according to claim 16, wherein said generating the key feature sequence of the key part according to the feature coordinates corresponding to the key part in each of the video image frames comprises:
    obtaining a first feature coordinate and a second feature coordinate of the same key part in two video image frames with adjacent frame numbers, and calculating an image distance value between the first feature coordinate and the second feature coordinate;
    calculating the image area of the human body region image, and determining the shooting focal length between the target object and the shooting module based on the image area;
    importing the shooting focal length, the image distance value, and the shooting frame rate of the video file into a distance conversion model to calculate the actual movement distance of the key part between the two video image frames, the distance conversion model being:
    [distance conversion formula, published as image PCTCN2019103164-appb-100004 and not reproduced here]
    where Dist is the actual movement distance, StandardDist is the image distance value, FigDist is the shooting focal length, BaseDist is the preset baseline focal length, ActFrame is the shooting frame rate, and BaseFrame is the baseline frame rate;
    identifying two feature coordinates whose actual movement distance is less than a preset distance threshold as mutually associated feature coordinates; and
    generating the key feature sequence of the key part according to all of the mutually associated feature coordinates.
  19. The non-volatile computer-readable storage medium according to any one of claims 16 to 18, wherein said marking each key part in the preset human body key part list in the human body region image and obtaining the feature coordinates of each of the key parts comprises:
    performing face recognition on the human body region image to determine the coordinates of the face part in the human body region image; and
    marking each of the key parts in the human body region image based on the positional relationship between the face part and each of the key parts.
  20. The non-volatile computer-readable storage medium according to any one of claims 16 to 18, wherein said determining the action type of the target object through the key feature sequence of each of the key parts comprises:
    plotting the feature coordinates of each of the key feature sequences on a preset coordinate axis to generate a part change curve for each of the key parts; and
    matching the part change curve against the standard action curve of each candidate action in a preset action library, and determining the action type of the target object based on the matching result.
PCT/CN2019/103164 2019-04-03 2019-08-29 Method and device for recognizing human movements WO2020199480A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910264909.0 2019-04-03
CN201910264909.0A CN110147717B (zh) 2019-04-03 2019-04-03 Method and device for recognizing human movements

Publications (1)

Publication Number Publication Date
WO2020199480A1 (zh) 2020-10-08

Family

ID=67589546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103164 WO2020199480A1 (zh) 2019-04-03 2019-08-29 Method and device for recognizing human movements

Country Status (2)

Country Link
CN (1) CN110147717B (zh)
WO (1) WO2020199480A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464786A (zh) * 2020-11-24 2021-03-09 泰康保险集团股份有限公司 Video detection method and device
CN113268626A (zh) * 2021-05-26 2021-08-17 中国人民武装警察部队特种警察学院 Data processing method and device, electronic device, and storage medium
CN113392743A (zh) * 2021-06-04 2021-09-14 北京格灵深瞳信息技术股份有限公司 Abnormal action detection method and device, electronic device, and computer storage medium
CN113505733A (zh) * 2021-07-26 2021-10-15 浙江大华技术股份有限公司 Behavior recognition method and device, storage medium, and electronic device
CN113673503A (zh) * 2021-08-25 2021-11-19 浙江大华技术股份有限公司 Image detection method and device
CN113825012A (zh) * 2021-06-04 2021-12-21 腾讯科技(深圳)有限公司 Video data processing method and computer device
CN113989944A (zh) * 2021-12-28 2022-01-28 北京瑞莱智慧科技有限公司 Operation action recognition method, device, and storage medium
CN114360201A (zh) * 2021-12-17 2022-04-15 中建八局发展建设有限公司 AI-based method and system for recognizing boundary crossings into dangerous areas at building edges
CN115063885A (zh) * 2022-06-14 2022-09-16 中国科学院水生生物研究所 Method and system for analyzing fish body movement characteristics
US11450027B2 (en) * 2019-10-31 2022-09-20 Beijing Dajia Internet Information Technologys Co., Ltd. Method and electronic device for processing videos
CN116055684A (zh) * 2023-01-18 2023-05-02 訸和文化科技(苏州)有限公司 Online physical education teaching system based on picture monitoring
CN116890668A (zh) * 2023-09-07 2023-10-17 国网浙江省电力有限公司台州供电公司 Safe charging method and charging device with synchronized information interconnection

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147717B (zh) 2019-04-03 2023-10-20 平安科技(深圳)有限公司 Method and device for recognizing human movements
CN110751034B (zh) * 2019-09-16 2023-09-01 平安科技(深圳)有限公司 Pedestrian behavior recognition method and terminal device
CN110826506A (zh) * 2019-11-11 2020-02-21 上海秒针网络科技有限公司 Target behavior recognition method and device
CN113132775B (zh) * 2019-12-30 2023-03-10 深圳Tcl数字技术有限公司 Control method for an electronic device, electronic device, and storage medium
CN111158486B (zh) * 2019-12-31 2023-12-05 恒信东方文化股份有限公司 Method and system for recognizing movements in song-and-dance programs
CN111428607B (zh) * 2020-03-19 2023-04-28 浙江大华技术股份有限公司 Tracking method, tracking device, and computer equipment
CN111539352A (zh) * 2020-04-27 2020-08-14 支付宝(杭州)信息技术有限公司 Method and system for determining the movement direction of human joints
CN112216640B (zh) * 2020-10-19 2021-08-06 高视科技(苏州)有限公司 Semiconductor chip positioning method and device
CN112528823B (zh) * 2020-12-04 2022-08-19 燕山大学 Method and system for analyzing the movement behavior of striped sharks based on key-frame detection and semantic part segmentation
CN112887792B (zh) * 2021-01-22 2023-07-25 维沃移动通信有限公司 Video processing method and device, electronic device, and storage medium
CN113657278A (zh) * 2021-08-18 2021-11-16 成都信息工程大学 Motion posture recognition method, device, equipment, and storage medium
CN113723307A (zh) * 2021-08-31 2021-11-30 上海掌门科技有限公司 Social sharing method, device, and computer-readable medium based on push-up detection
CN113780253B (zh) * 2021-11-12 2022-02-18 佛山科学技术学院 Method and system for recognizing key points of human joint movement
CN114818989B (zh) * 2022-06-21 2022-11-08 中山大学深圳研究院 Gait-based behavior recognition method, device, terminal device, and storage medium
CN115272923B (zh) * 2022-07-22 2023-04-21 华中科技大学同济医学院附属协和医院 Intelligent recognition method and system based on a big data platform
CN117423166B (zh) * 2023-12-14 2024-03-26 广州华夏汇海科技有限公司 Action recognition method and system based on human posture image data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930767A (zh) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Action recognition method based on the human skeleton
CN107335192A (zh) * 2017-05-26 2017-11-10 深圳奥比中光科技有限公司 Exercise-assisted training method, device, and storage device
US9911198B2 (en) * 2015-12-17 2018-03-06 Canon Kabushiki Kaisha Method, system and apparatus for matching moving targets between camera views
CN109325456A (zh) * 2018-09-29 2019-02-12 佳都新太科技股份有限公司 Target recognition method, device, target recognition equipment, and storage medium
CN109460702A (zh) * 2018-09-14 2019-03-12 华南理工大学 Passenger abnormal behavior recognition method based on human skeleton sequences
CN110147717A (zh) * 2019-04-03 2019-08-20 平安科技(深圳)有限公司 Method and device for recognizing human movements

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4488233B2 (ja) * 2003-04-21 2010-06-23 日本電気株式会社 Video object recognition device, video object recognition method, and video object recognition program
CN100568262C (zh) * 2007-12-29 2009-12-09 浙江工业大学 Face recognition and detection device based on multi-camera information fusion
ES2812578T3 (es) * 2011-05-13 2021-03-17 Vizrt Ag Silhouette-based pose estimation
CN103148788B (zh) * 2013-03-29 2015-06-17 宁凯 Somatosensory peripheral device for long-distance recognition and human height recognition method
CN107507243A (zh) * 2016-06-14 2017-12-22 华为技术有限公司 Camera parameter adjustment method, broadcast-directing camera, and system
CN109241835A (zh) * 2018-07-27 2019-01-18 上海商汤智能科技有限公司 Image processing method and device, electronic equipment, and storage medium

Also Published As

Publication number Publication date
CN110147717B (zh) 2023-10-20
CN110147717A (zh) 2019-08-20

Similar Documents

Publication Publication Date Title
WO2020199480A1 (zh) Method and device for recognizing human movements
WO2020199479A1 (zh) Method and device for recognizing human movements
WO2021004112A1 (zh) Abnormal face detection method, abnormality recognition method, device, equipment, and medium
JP5554984B2 (ja) Pattern recognition method and pattern recognition device
WO2019071664A1 (zh) Face recognition method combining depth information, device, and storage medium
US11928800B2 (en) Image coordinate system transformation method and apparatus, device, and storage medium
WO2022078041A1 (zh) Training method for an occlusion detection model and beautification method for face images
WO2017088432A1 (zh) Image recognition method and device
WO2020052476A1 (zh) Feature point positioning method, storage medium, and computer device
CN110705478A (zh) Face tracking method, device, equipment, and storage medium
US9626552B2 (en) Calculating facial image similarity
CN112364827B (zh) Face recognition method, device, computer equipment, and storage medium
US10489636B2 (en) Lip movement capturing method and device, and storage medium
WO2015070764A1 (zh) Face positioning method and device
CN109299658B (zh) Face detection method, face image rendering method, device, and storage medium
US10650234B2 (en) Eyeball movement capturing method and device, and storage medium
CN111695462A (zh) Face recognition method, device, storage medium, and server
WO2021169754A1 (zh) Composition prompting method, device, storage medium, and electronic device
CN109255802B (zh) Pedestrian tracking method, device, computer equipment, and storage medium
CN107944381B (zh) Face tracking method, device, terminal, and storage medium
WO2019033575A1 (zh) Electronic device, face tracking method, system, and storage medium
JP2022542199A (ja) Key point detection method, device, electronic equipment, and storage medium
WO2022206680A1 (zh) Image processing method, device, computer equipment, and storage medium
US20200275017A1 (en) Tracking system and method thereof
WO2020164284A1 (zh) Liveness recognition method based on plane detection, device, terminal, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19922701

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19922701

Country of ref document: EP

Kind code of ref document: A1