WO2023273075A1 - Behavior recognition method, device, computer equipment and storage medium - Google Patents

Behavior recognition method, device, computer equipment and storage medium

Info

Publication number
WO2023273075A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
image
target
action
detection
Prior art date
Application number
PCT/CN2021/127654
Other languages
English (en)
French (fr)
Inventor
梁博
熊梓云
陈伟
罗玮杰
刘雁萍
Original Assignee
深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Publication of WO2023273075A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Definitions

  • The embodiments of the present application relate to the technical field of image recognition, and in particular, but not limited to, a behavior recognition method, device, computer equipment, and storage medium.
  • Some monitoring devices can identify and detect the smart devices used by personnel, but when a smart device cannot be clearly identified, recognition efficiency is low.
  • Embodiments of the present application provide at least one behavior recognition method, device, computer equipment, and storage medium.
  • the embodiment of the present application provides a behavior recognition method, the recognition method includes:
  • Based on the overlapping area of each human body detection frame and the key monitoring area, select, as the target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or select, as the target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio;
  • the key frame image includes a target behavior corresponding to the preset target action.
  • Target action recognition is performed on the human body detection frames in the key frame image to determine whether any person is performing the target action, thereby achieving the purpose of monitoring the behavior of personnel in the key monitoring area.
  • the method further includes acquiring a key frame image; the acquiring a key frame image includes:
  • the key frame images are extracted from the multiple frames of the original frame images based on a preset frame extraction rule.
  • different frame extraction rules can be determined to extract key frame images.
  • the determining at least one human body image in the key frame image and the human body detection frame corresponding to each human body image by performing human body detection on the pre-extracted key frame image includes:
  • Drawing the human body detection frame of each human body image includes:
  • the human body detection frame corresponding to each human body image can be drawn according to the edge of the extension area, so that the human body detection frame matches the human body image better.
  • Performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain the detection result of each target intercepted image includes:
  • If the confidence is greater than a preset confidence threshold, it is determined that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
  • the human action recognition model is trained through the following steps:
  • the positive samples and the negative samples are input into the human action recognition model to be trained, and model training is performed on the human action recognition model to be trained to obtain a trained human action recognition model.
  • The positive samples and negative samples are input into the human action recognition model to be trained, and model training is performed until the model loss meets a preset loss threshold, at which point training is complete; the trained human action recognition model thus produces more accurate recognition results.
  • Performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain the detection result of each target intercepted image includes:
  • the detection result of the target intercepted image is that the target intercepted image includes the preset target action
  • The positional relationship includes the distance between at least two target detection points in the human body image, the offset angle between at least two target detection points in the human body image, and the moving distance and/or the variation range of the offset angle of the at least two target detection points within a preset time range.
  • the identification method also includes:
  • The warning information includes at least one of the following: the number of preset target action performers, the detection point position of each preset target action performer relative to the key frame image, and the confidence value, output by the human action recognition model, with which each preset target action performer performs the target action.
  • This enriches the basis on which monitoring personnel can act, making it convenient for them to respond based on the above information.
  • the target action includes one or more of the following actions:
  • the embodiment of the present application also provides a behavior recognition device, and the behavior recognition device includes:
  • the detection module is configured to determine, by performing human body detection on the pre-extracted key frame image, at least one human body image in the key frame image, the human body detection frame corresponding to each human body image, and the overlapping area of each human body detection frame and the preset key monitoring area;
  • the selection module is configured to select, based on the overlapping area of each human body detection frame and the key monitoring area, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio as the target human body detection frame; or to select a human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio as the target human body detection frame;
  • the recognition module is configured to perform action recognition on the target intercepted image corresponding to each target human body detection frame, and obtain the detection result of each target intercepted image;
  • the determining module is configured to, in response to the detection result indicating that there is at least one target intercepted image including a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
  • the behavior recognition device further includes: an acquisition module;
  • the acquisition module is configured to acquire key frame images
  • the acquisition module is further configured to: decode the acquired real-time video stream to obtain multi-frame original frame images corresponding to the real-time video stream;
  • the key frame images are extracted from the multiple frames of the original frame images based on a preset frame extraction rule.
  • the detection module includes:
  • a detection unit configured to perform human body position detection on the key frame image to obtain at least one human body position detection point;
  • an acquisition unit configured to acquire the coordinate position of each human body position detection point in the key frame image;
  • a determining unit configured to determine at least one human body image in the key frame image based on the coordinate position of each human body position detection point, and to draw a human body detection frame for each human body image in the at least one human body image.
  • the determining unit is further configured to:
  • the identification module is further configured to:
  • If the confidence is greater than a preset confidence threshold, it is determined that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
  • the behavior recognition device further includes a training module;
  • the training module is configured to obtain positive sample images including a target action and negative sample images not including the target action;
  • the positive samples and the negative samples are input into the human action recognition model to be trained, and model training is performed on the human action recognition model to be trained to obtain a trained human action recognition model.
  • the identification module is further configured to:
  • the detection result of the target intercepted image is that the target intercepted image includes the preset target action
  • The positional relationship includes the distance between at least two target detection points in the human body image, the offset angle between at least two target detection points in the human body image, and the moving distance and/or the variation range of the offset angle of the at least two target detection points within a preset time range.
  • the behavior recognition device also includes:
  • the display module is configured to display the key frame image including the preset target action and the alarm information corresponding to the key frame image at the front end;
  • The warning information includes at least one of the following: the number of preset target action performers, the detection point position of each preset target action performer relative to the key frame image, and the confidence value, output by the human action recognition model, with which each preset target action performer performs the target action.
  • the target action includes one or more of the following actions:
  • The embodiment of the present application also provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the computer device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • The embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • A behavior recognition method, device, computer equipment, and storage medium, including: determining at least one human body image in a key frame image and the human body detection frame corresponding to each human body image; based on the overlapping area of each human body detection frame and a key monitoring area, selecting, as the target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio, or a human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to the preset ratio; performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image; and, in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action.
  • FIG. 1A shows a flowchart of a behavior recognition method provided by an embodiment of the present application;
  • FIG. 1B shows a schematic diagram of a system architecture to which the behavior recognition method of the embodiment of the present application can be applied;
  • FIG. 2 shows a schematic diagram of human body detection provided by an embodiment of the present application;
  • FIG. 3 shows a flowchart of another behavior recognition method provided by an embodiment of the present application;
  • FIG. 4 shows a schematic diagram of a behavior recognition device provided by an embodiment of the present application;
  • FIG. 5 shows a schematic diagram of another behavior recognition device provided by an embodiment of the present application;
  • FIG. 6 shows a schematic diagram of the detection module in the behavior recognition device provided by an embodiment of the present application;
  • FIG. 7 shows a schematic diagram of a computer device provided by an embodiment of the present application.
  • The term "A and/or B" herein merely describes an association relationship and indicates that three relationships may exist; for example, "A and/or B" can mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • "At least one" herein means any one of a variety, or any combination of at least two of a variety; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set formed by A, B, and C.
  • This application provides a behavior recognition method: human body detection is performed on the key frame image, target human body detection frames are selected, human body action recognition is performed on the target intercepted image in each target human body detection frame, and it is determined whether a target action exists in the target intercepted image, thereby judging whether the key frame image includes the target action. In this way, whether abnormal behavior exists in the key monitoring area can be judged by detecting the target action, without detecting the corresponding item being used, which improves detection efficiency for the key monitoring area.
  • the execution subject of the behavior recognition method provided in the embodiment of the present application is generally a computer device with a certain computing power.
  • The computer device includes, for example, a terminal device, a server, or other processing device. The terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the behavior recognition method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • the recognition method includes steps S101 to S104, wherein:
  • S101: By performing human body detection on the pre-extracted key frame image, determine at least one human body image in the key frame image, the human body detection frame corresponding to each human body image, and the overlapping area of each human body detection frame and the preset key monitoring area.
  • Human body detection can be performed on the extracted key frame image to detect at least one human body image included in the key frame image, and a corresponding human body detection frame is added to each human body image; based on the position and size of each human body detection frame, the overlapping area between each human body detection frame and the preset key monitoring area is determined.
  • Before human body detection is performed on the key frame image, the method further includes: acquiring the key frame image.
  • the key frame image may be extracted through the following steps:
  • the key frame images are extracted from the multiple frames of the original frame images based on a preset frame extraction rule.
  • the frame rate of the video is generally 30 frames per second, and can be reduced to 25 frames per second at the lowest, so the decoding process can be performed based on the frame rate of the real-time video stream.
  • the surveillance video may be decoded while receiving it in real time, so as to obtain multi-frame original frame images corresponding to the real-time video stream.
  • After multiple original frame images are obtained, in view of the limitation of processing resources and based on the needs of the target business, it is generally unnecessary to process every original frame image. Therefore, based on a certain frame extraction rule, for example extracting one frame out of every several frames, such as one frame every five frames, a frame image is extracted from the multiple original frame images as the key frame image.
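  As an illustrative sketch (not part of the claimed method), the frame extraction rule above can be expressed as a simple stride over the decoded frame sequence; the function name and default step value are assumptions for illustration:

```python
def extract_key_frames(frames, step=5):
    """Keep one frame out of every `step` decoded frames.

    `frames` is any iterable of decoded frame images (e.g. arrays produced
    by a cv2.VideoCapture.read loop); step=5 mirrors the "one frame every
    five frames" rule mentioned above.
    """
    return [frame for i, frame in enumerate(frames) if i % step == 0]

# At a 25-30 fps real-time stream, step=5 keeps 5-6 key frames per second.
key_frames = extract_key_frames(range(30), step=5)  # stand-in "frames" 0..29
```

  A real deployment would apply this inside the decode loop of the real-time video stream rather than materializing all frames first.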
  • Human body detection processing can then be performed on the key frame image, so that at least one human body image in the key frame image is detected and a corresponding human body detection frame is added to each human body image.
  • FIG. 2 is a schematic diagram of human body detection provided by the embodiment of the present application.
  • the determining at least one human body image in the key frame image and the human body detection frame corresponding to each human body image by performing human body detection on the pre-extracted key frame images includes:
  • The human body images included in the key frame image and their corresponding approximate positions can be preliminarily screened out.
  • the position of the human body can be accurately determined by calibrating the detection points of the human body.
  • The coordinate position of each human body detection point in the key frame image is determined; then, based on the determined coordinate positions of the human body detection points, the human body images included in the key frame image are determined, and a corresponding human body detection frame is drawn for each.
  • Determining at least one human body image in the key frame image based on the coordinate position of each human body position detection point, and drawing the human body detection frame for each human body image in the at least one human body image, includes:
  • When the human body detection point is set as the human head, taking the determined human head detection point as the center, extend outward in preset designated directions, such as the horizontal and vertical directions, or the pre-predicted upright direction of the human body and the direction perpendicular to it, until each human body image is covered, thereby obtaining an extended area covering each human body image.
  • Alternatively, a line segment is formed, and from each point on the line segment an extension is made in the direction perpendicular to the line segment until the entire human body image is covered, thereby obtaining the extended area covering each human body image; the image within each corresponding extended area is one of the at least one human body image in the key frame image.
  • a human body detection frame corresponding to each human body image may be drawn according to the edge of the extended area.
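  A minimal sketch of deriving a body box from a head detection point, under stated assumptions: the upright direction is taken as vertically downward in image coordinates, and the `aspect` parameter (box width as a fraction of body height) is an illustrative value, not one given in the patent:

```python
def draw_body_box(head_x, head_y, body_height, img_w, img_h, aspect=0.4):
    """Derive a human body detection box from a head detection point by
    extending outward: downward along the assumed upright direction and
    symmetrically sideways, clipped to the image bounds."""
    half_w = body_height * aspect / 2
    x1 = max(0, head_x - half_w)
    x2 = min(img_w, head_x + half_w)
    y1 = max(0, head_y)                    # top edge at the head point
    y2 = min(img_h, head_y + body_height)  # extend down to cover the body
    return (x1, y1, x2, y2)
```

  The returned box edge plays the role of the extended-area edge along which the detection frame is drawn.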
  • the identification method also includes:
  • S102 Based on the overlapping area of each of the human body detection frames and the key monitoring area, select the human body detection frame whose area ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio as the target human body detection frame ; or select the human body detection frame whose area ratio of the overlapping area to the key monitoring area is greater than or equal to a preset ratio as the target human body detection frame.
  • After the overlapping area of each human body detection frame and the preset key monitoring area is determined, the ratio of the overlapping area to each human body detection frame, and the ratio of the overlapping area to the entire key monitoring area, can be determined, so that the human body detection frames meeting the preset condition can be selected as target human body detection frames.
  • the key frame image includes four human body detection frames 1, 2, 3, and 4, and the corresponding curve area is the key monitoring area.
  • The human body detection frame 1 is completely inside the key monitoring area, so the ratio of the overlapping area to the area of the detection frame is 1, greater than the preset ratio of 50%, and it can be determined as a target human body detection frame. The human body detection frame 2 is completely outside the key monitoring area, so the ratio is 0, less than the preset ratio of 50%; it cannot serve as a target human body detection frame and can be screened out directly.
  • The ratio of the overlapping area of human body detection frame 3 to its own area is greater than the preset ratio of 50%, so frame 3 can be judged to be a target human body detection frame; correspondingly, the ratio for human body detection frame 4 is less than the preset ratio of 50%, so frame 4 cannot serve as a target human body detection frame.
  • Alternatively, the human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to the preset ratio can be selected as the target human body detection frame.
  • the method of using the human body detection frame whose ratio is greater than or equal to the preset ratio as the target human body detection frame is similar and will not be repeated here.
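  The first selection rule above can be sketched as follows for a rectangular monitoring region; the function names are illustrative, and the curve-shaped region of FIG. 2 would require polygon intersection instead of the axis-aligned computation shown here:

```python
def overlap_area(box, region):
    """Intersection area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], region[0]), max(box[1], region[1])
    ix2, iy2 = min(box[2], region[2]), min(box[3], region[3])
    return max(0, ix2 - ix1) * max(0, iy2 - iy1)

def select_target_boxes(boxes, region, preset_ratio=0.5):
    """Keep boxes whose overlap with the monitored region, divided by the
    box's own area, is >= preset_ratio (the first rule above). The
    alternative rule would divide by the region's area instead."""
    targets = []
    for box in boxes:
        box_area = (box[2] - box[0]) * (box[3] - box[1])
        if box_area > 0 and overlap_area(box, region) / box_area >= preset_ratio:
            targets.append(box)
    return targets
```

  With `preset_ratio=0.5` this reproduces the 50% example: a box fully inside the region (ratio 1) is kept, a box fully outside (ratio 0) is dropped.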
  • the method of selecting a human body detection frame satisfying a preset condition as a target human body detection frame may further include:
  • The key frame image includes four human body detection frames 1, 2, 3, and 4, and the corresponding curved area is the key monitoring area. It can be seen from the figure that frame 1 is completely inside the key monitoring area, so it can be determined as a target human body detection frame, while frame 2 is completely outside the key monitoring area, so it can be screened out directly.
  • The area of frame 3 inside the key monitoring area is larger than its area outside the key area, so frame 3 can be judged to be a target human body detection frame; correspondingly, the area of frame 4 inside the key monitoring area is smaller than its area outside the key area, so frame 4 is not a target human body detection frame.
  • the identification method further includes:
  • S103 Perform action recognition on the target intercepted image corresponding to each target human body detection frame, and obtain a detection result of each target intercepted image.
  • the action recognition can be performed on the target intercepted image based on the preset action recognition method, and then the detection result corresponding to each target intercepted image can be obtained.
  • the target action includes one or more of the following actions: playing with a mobile phone, making a phone call, and checking a mobile phone.
  • The action recognition can be completed through the trained action recognition model to obtain the detection result.
  • At least one target intercepted image corresponding to a target human body detection frame is input into a pre-trained human action recognition model, and the confidence value that each target intercepted image includes the preset target action is obtained as the detection result. For each target intercepted image, if the confidence is greater than a preset confidence threshold, it is determined that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
  • The model can output, for each target intercepted image, the confidence level that the preset target action is present; for example, if the confidence level of image A is 95% and the confidence level of image B is 80%, then based on a preset confidence threshold, such as 90%, it can be determined that the preset target action exists in image A.
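  The thresholding step above amounts to a simple comparison of each per-crop confidence against the preset threshold; the sketch below is illustrative, and the function name and crop identifiers are assumptions:

```python
def detect_target_action(confidences, threshold=0.9):
    """Apply the preset confidence threshold to per-crop model outputs.

    `confidences` maps a crop id to the model's confidence that the crop
    contains the preset target action."""
    return {crop_id: conf > threshold for crop_id, conf in confidences.items()}

# The 95% / 80% example with a 90% threshold: only crop "A" is flagged.
results = detect_target_action({"A": 0.95, "B": 0.80}, threshold=0.9)
```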
  • the human action recognition model is trained through the following steps:
  • the positive samples and the negative samples are input into the human action recognition model to be trained, and model training is performed on the human action recognition model to be trained to obtain a trained human action recognition model.
  • The target action can be set as bringing both hands close together, bringing the left or right hand close to the ear, bowing the head, and the like.
  • The label corresponding to a positive sample image indicates that the action in the positive sample image is consistent with the preset target action, and the label corresponding to a negative sample image indicates that the action in the negative sample image is inconsistent with the preset target action.
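  As a minimal stand-in for the model-training step above, the sketch below trains a logistic classifier on scalar features until the loss meets a preset loss threshold. A real system would train a neural network on the labelled positive/negative sample images; this only illustrates the "train until the model loss meets the preset loss threshold" loop, and all names and values are assumptions:

```python
import math

def train_action_classifier(positives, negatives, loss_threshold=0.1,
                            lr=0.5, max_epochs=5000):
    """Gradient-descent training of a 1-D logistic classifier on labelled
    positive (y=1) and negative (y=0) samples, stopping once the mean
    cross-entropy loss drops below `loss_threshold`."""
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        loss, gw, gb = 0.0, 0.0, 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            loss += -(y * math.log(p + 1e-9) + (1 - y) * math.log(1 - p + 1e-9))
            gw += (p - y) * x
            gb += (p - y)
        if loss / len(samples) < loss_threshold:  # preset loss threshold met
            break
        w -= lr * gw / len(samples)
        b -= lr * gb / len(samples)
    return w, b

# Toy feature: e.g. negated hand-to-ear distance, separable by sign.
w, b = train_action_classifier(positives=[-2.0, -1.5, -1.0],
                               negatives=[1.0, 1.5, 2.0])
```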
  • The judgment basis for the target action may include the distance between at least two target detection points in the human body image, the offset angle between at least two target detection points in the human body image, and the moving distance and/or the variation range of the offset angle of the at least two target detection points within a preset time range.
  • Take the detection of playing with a mobile phone as an example.
  • When playing with a mobile phone, the hand will be close to the eyes or the two hands will be close together, or the offset angle between the hand and the face, or between the two hands, will decrease.
  • Beyond comparing the distance or offset angle against a certain threshold, it is also possible to monitor whether the hand or head remains still, or moves only within a small range, within a certain time range. Therefore, whether the monitored personnel are playing with a mobile phone can be determined by monitoring the above actions.
  • When the detection purpose is a phone call, the distance or offset angle between the hand and the ear can be monitored, or whether, within a certain time range, the hand or head remains still or moves only within a small range, and so on; details are not repeated here.
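  The distance and offset-angle checks above can be sketched as follows; the `dist_threshold` value and function names are illustrative assumptions, not values from the patent:

```python
import math

def keypoint_relation(p1, p2):
    """Distance and offset angle between two detection points,
    e.g. a hand point and an ear point, in (x, y) image coordinates."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))  # offset angle in degrees
    return distance, angle

def looks_like_phone_call(hand, ear, dist_threshold=40.0):
    """Illustrative rule: flag a possible phone call when the hand point
    is within `dist_threshold` pixels of the ear point."""
    distance, _ = keypoint_relation(hand, ear)
    return distance <= dist_threshold
```

  A temporal variant would additionally track these quantities over a preset time range and test whether their variation stays within a small range.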
  • The embodiment of the present application can be applied to monitoring target personnel in a monitoring area. For example, in a factory where operating machinery is placed, staff are forbidden from slacking off or being distracted, especially from violations that may cause safety hazards, such as playing with mobile phones or making phone calls; whether personnel in the monitoring area commit violations can therefore be judged by monitoring whether a target action occurs. In some examination scenarios where passing messages is prohibited, whether violations occur can likewise be determined by monitoring whether a target action occurs in the monitoring area. On roads, especially at intersections and other traffic-dense places, pedestrians playing with mobile phones while crossing pose a greater traffic safety hazard; here too, whether violations occur can be judged by monitoring whether personnel in the monitoring area perform a target action. Details are not repeated here.
  • the identification method further includes:
  • S104 In response to the detection result indicating that there is at least one target intercepted image including a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
  • In this step, by determining whether there is a target intercepted image including the preset target action, it is determined whether the key frame image includes the target action, and thus whether the target action exists in the detection area corresponding to the key frame.
  • Each area corresponds to a manager in charge, so when it is determined that a target intercepted image includes the target action, the key frame image including the target action can be sent directly to the corresponding manager in charge for handling.
  • Human body detection is performed on the key frame image, target human body detection frames are selected, and human body action recognition is performed on the target intercepted image in each target human body detection frame to determine whether the target action exists in the target intercepted image, thereby judging whether the key frame image includes the target action.
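  The full S101-S104 flow can be tied together in one sketch. `detect_bodies` and `classify_crop` stand in for the human body detector and the human action recognition model; both, along with the default threshold values, are assumptions for illustration, and the monitored region is simplified to an axis-aligned rectangle:

```python
def recognize_behavior(key_frame, detect_bodies, monitored_region,
                       classify_crop, preset_ratio=0.5, conf_threshold=0.9):
    """End-to-end sketch of steps S101-S104.

    detect_bodies(frame) -> list of (x1, y1, x2, y2) body boxes
    classify_crop(frame, box) -> confidence that the crop shows the target action
    """
    def inter(a, b):  # axis-aligned intersection area
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        return max(0, w) * max(0, h)

    boxes = detect_bodies(key_frame)                                   # S101
    targets = [bx for bx in boxes                                      # S102
               if inter(bx, monitored_region) /
                  ((bx[2] - bx[0]) * (bx[3] - bx[1])) >= preset_ratio]
    confidences = [classify_crop(key_frame, bx) for bx in targets]     # S103
    return any(c > conf_threshold for c in confidences)                # S104
```

  The boolean result corresponds to "the key frame image includes a target behavior", which would then trigger the warning-information display described earlier.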
  • FIG. 1B shows a schematic diagram of a system architecture to which the behavior recognition method of the embodiment of the present application can be applied; as shown in FIG. 1B , the system architecture includes: a behavior recognition terminal 201 , a network 202 and an electronic device 203 .
  • To support an exemplary application, the behavior recognition terminal 201 and the electronic device 203 establish a communication connection through the network 202, and the electronic device 203 reports the acquired key frame image to the behavior recognition terminal 201 through the network 202;
  • the behavior recognition terminal 201 performs human body detection on the key frame image: first, it determines at least one human body image in the key frame image, the human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and the preset key monitoring area; second, it selects target human body detection frames according to the overlapping areas and performs action recognition on each target human body detection frame; finally, based on the detection results, it judges whether the key frame image includes the target behavior.
  • The behavior recognition terminal 201 uploads the output key frame images to the network 202.
  • the electronic device 203 may include an image acquisition device or an image scanning device, and the behavior recognition terminal 201 may include a visual processing device capable of processing visual information or a remote server.
  • the network 202 may be connected in a wired or wireless manner.
  • When the behavior recognition terminal 201 is a vision processing device, the electronic device 203 can communicate with it through a wired connection, for example performing data communication over a bus; when the behavior recognition terminal 201 is a remote server, the electronic device 203 can exchange data with the remote server through a wireless network.
  • the electronic device 203 may be a vision processing device with a video capture module, or a host with a camera.
  • In this case, the behavior recognition method of the embodiments of the present application may be executed by the electronic device 203, and the above system architecture may not include the network 202 and the server.
  • The recognition method includes steps S301 to S305, where:
  • S301: Perform human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area.
  • S302: Based on the overlapping area between each human body detection frame and the key monitoring area, select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio.
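The selection rule in the step above can be sketched in a few lines, assuming axis-aligned rectangles given as (x1, y1, x2, y2) tuples; the function names and the 0.5 default ratio are illustrative, not part of the claimed embodiment:

```python
def overlap_area(box, region):
    """Intersection area of two axis-aligned rectangles (x1, y1, x2, y2)."""
    w = min(box[2], region[2]) - max(box[0], region[0])
    h = min(box[3], region[3]) - max(box[1], region[1])
    return max(0, w) * max(0, h)


def area(rect):
    """Area of an axis-aligned rectangle (x1, y1, x2, y2)."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def select_target_boxes(boxes, region, ratio=0.5, relative_to="box"):
    """Keep human body detection frames that overlap the key monitoring area enough.

    relative_to="box"    -> overlap / box area    >= ratio
    relative_to="region" -> overlap / region area >= ratio
    """
    selected = []
    for box in boxes:
        ov = overlap_area(box, region)
        denom = area(box) if relative_to == "box" else area(region)
        if denom > 0 and ov / denom >= ratio:
            selected.append(box)
    return selected
```

Passing `relative_to="region"` switches the denominator to the key monitoring area, which corresponds to the alternative selection rule described in the same step.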
  • S303: Perform action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image.
  • S304: In response to the detection result indicating that at least one target intercepted image includes a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
  • For steps S301 to S304, reference can be made to the description of steps S101 to S104; they achieve the same technical effects and are not repeated here.
  • S305: Display, at the front end, the key frame image including the preset target action and the alarm information corresponding to the key frame image. In this step, after the key frame image including the preset target action is determined, the key frame image and its corresponding alarm information can be displayed at the front end so that the responsible personnel can take corresponding action.
  • The alarm information includes at least one of the following: the number of performers of the preset target action, the detection point position of each performer of the preset target action relative to the key frame image, and the confidence value, output by the human action recognition model, that each performer of the preset target action performs the target action.
  • Illustratively, when the number of target-action performers exceeds a certain value, it can be determined that the key frame image reflects relatively serious violations and should receive a higher level of early-warning handling.
  • The detection point position of each target-action performer relative to the key frame image makes it convenient for the person in charge to locate the corresponding performer, or to identify the person at that position from the coordinates.
  • The confidence value, output by the human action recognition model, that each target-action performer performs the target action helps the person in charge judge how urgently the matter should be handled: when the confidence is extremely high, the probability that the target action exists is extremely high, and the matter should be dealt with promptly.
  • By displaying, at the front end, the key frame images in which the target action occurs together with the corresponding alarm information, the embodiments of the present application make it easier for responsible personnel to handle the related problems, thereby improving the handling efficiency for target actions.
  • Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • Based on the same inventive concept, the embodiments of the present application further provide a behavior recognition device corresponding to the behavior recognition method. Since the principle by which the device solves the problem is similar to that of the above behavior recognition method, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
  • Fig. 4 is a schematic diagram of a behavior recognition device provided by an embodiment of the present application;
  • Fig. 5 is a schematic diagram of another behavior recognition device provided by an embodiment of the present application;
  • Fig. 6 is a detailed schematic diagram of the detection module in the behavior recognition device provided by an embodiment of the present application.
  • The behavior recognition device includes: a detection module 410, a selection module 420, a recognition module 430, and a determination module 440; where:
  • the detection module 410 is configured to perform human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area;
  • the selection module 420 is configured to select, based on the overlapping area between each human body detection frame and the key monitoring area, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio as a target human body detection frame; or to select a human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio as the target human body detection frame;
  • the recognition module 430 is configured to perform action recognition on the target intercepted image corresponding to each target human body detection frame, and obtain the detection result of each target intercepted image;
  • the determining module 440 is configured to, in response to the detection result indicating that there is at least one target intercepted image including a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
  • the behavior recognition device further includes: an acquisition module 450;
  • the acquisition module 450 is configured to acquire key frame images
  • the acquisition module 450 is further configured to: decode an acquired real-time video stream to obtain multiple original frame images corresponding to the real-time video stream; and
  • extract the key frame image from the multiple original frame images based on a preset frame extraction rule.
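A minimal sketch of the frame-extraction rule described above, assuming the decoded stream is available as an iterable of frames; the one-in-ten default mirrors an example sampling rule, and the function name is illustrative:

```python
def extract_key_frames(frames, step=10):
    """Yield every `step`-th frame of a decoded stream as a key frame.

    `frames` can be any iterable of decoded frame images; `step` encodes the
    preset frame extraction rule (e.g. one frame in every 10).
    """
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```

Because it is a generator, it can be applied to a live stream of decoded frames without buffering the whole video.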
  • the detection module 410 includes:
  • the detection unit 411 is configured to perform human body position detection on the key frame image to obtain at least one human body position detection point;
  • the obtaining unit 412 is configured to obtain the coordinate position, in the key frame image, of each human body position detection point;
  • the determination unit 413 is configured to determine at least one human body image in the key frame image based on the coordinate position of each human body position detection point in the key frame image, and draw each human body image in the at least one human body image human body detection frame.
  • the determining unit 413 is further configured to: for each human body position detection point, obtain, based on the coordinate position of the human body position detection point, an extension region covering the human body image corresponding to that detection point, where the detection point is located within the extension region; and determine the human body image in the key frame image based on the extension region, drawing the human body detection frame corresponding to the human body image along the edge of the extension region.
  • the recognition module 430 is further configured to: input the target intercepted image corresponding to at least one target human body detection frame into a pre-trained human action recognition model, and obtain, for each target intercepted image, a detection result including a confidence that the image includes a preset target action;
  • for each target intercepted image, when the confidence is greater than a preset confidence threshold, it is determined that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
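The confidence test described above can be sketched as follows; the 0.9 threshold is only an assumed example value, and the frame-level rule (flag the key frame if any crop is flagged) follows the surrounding description:

```python
def crop_contains_action(confidence, threshold=0.9):
    """A target intercepted image is flagged when the model confidence
    exceeds the preset confidence threshold."""
    return confidence > threshold


def frame_contains_behavior(confidences, threshold=0.9):
    """The key frame includes the target behavior if at least one
    target intercepted image is flagged."""
    return any(crop_contains_action(c, threshold) for c in confidences)
```

For example, crops scored 0.95 and 0.80 against a 0.9 threshold yield one flagged crop, so the key frame is flagged.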
  • the behavior recognition device further includes a training module 460;
  • the training module 460 is configured to acquire positive sample images that include the target action and negative sample images that do not include the target action;
  • construct positive samples based on the positive sample images and their positive sample labels, and negative samples based on the negative sample images and their negative sample labels, where a positive sample label indicates that the action in the positive sample image is consistent with the preset target action and a negative sample label indicates that the action in the negative sample image is inconsistent with it; and
  • input the positive samples and the negative samples into a human action recognition model to be trained, and perform model training on it to obtain a trained human action recognition model.
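A hedged sketch of how the positive and negative samples might be assembled before training, with 1/0 labels standing in for the "consistent/inconsistent with the preset target action" labels; the helper name is hypothetical:

```python
def build_labeled_samples(positive_images, negative_images):
    """Pair each sample image with a binary label:
    1 = the action in the image is consistent with the preset target action,
    0 = the action in the image is inconsistent with it."""
    return ([(image, 1) for image in positive_images]
            + [(image, 0) for image in negative_images])
```

The resulting (image, label) pairs would then be fed to the model-training loop until the model loss meets a preset loss threshold.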
  • the recognition module 430 is further configured to: for the target intercepted image corresponding to each target human body detection frame, recognize the positional relationship between at least two target detection points in the target intercepted image, and obtain a detection result of the target intercepted image that includes the positional relationship;
  • in response to the positional relationship matching a preset positional relationship between at least two corresponding detection points of a preset target action, determine that the detection result of the target intercepted image is that the target intercepted image includes the preset target action;
  • where the positional relationship includes one or more of: the distance between the at least two target detection points, the offset angle between the at least two target detection points in the human body image, and the movement distance and/or offset-angle variation range of the at least two target detection points in the human body image within a preset time range.
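The positional relationships listed above (distance and offset angle between detection points) can be sketched with plain trigonometry; `matches_preset_relation` and its distance threshold are hypothetical illustrations, not the patented matching rule:

```python
import math


def detection_point_relation(point_a, point_b):
    """Distance and offset angle (degrees) between two detection points,
    given as (x, y) coordinates in the image."""
    dx = point_b[0] - point_a[0]
    dy = point_b[1] - point_a[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


def matches_preset_relation(point_a, point_b, max_distance):
    """Hypothetical matching rule: the two points are close enough,
    e.g. a hand detection point near an ear for a phone call."""
    distance, _angle = detection_point_relation(point_a, point_b)
    return distance <= max_distance
```

A fuller implementation would also track these quantities over a preset time range to evaluate movement distance and offset-angle variation.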
  • the behavior recognition device further includes:
  • the display module 470 is configured to display the key frame image including the preset target action and the alarm information corresponding to the key frame image at the front end;
  • the alarm information includes at least one of the following: the number of performers of the preset target action, the detection point position of each performer of the preset target action relative to the key frame image, and the confidence value, output by the human action recognition model, that each performer of the preset target action performs the target action.
  • the preset target actions include one or more of the following actions: playing with a mobile phone, making a phone call, and checking a mobile phone.
  • In the embodiments of the present application, human body detection is performed on the key frame image, target human body detection frames are selected, and human action recognition is performed on the target intercepted images in the target human body detection frames to determine whether a target action exists in them, thereby judging whether the key frame image includes the target action.
  • the embodiment of the present application also provides a computer device, as shown in Figure 7, which is a schematic structural diagram of the computer device provided in the embodiment of the present application, including:
  • based on the overlapping area between each human body detection frame and the key monitoring area, selecting, as a target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or selecting, as the target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio;
  • in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the behavior recognition method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The embodiments of the present application further provide a computer program product carrying program code, where the instructions included in the program code can be used to execute the steps of the behavior recognition method described in the above method embodiments; for details, reference can be made to the above method embodiments, which are not repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in other embodiments, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
  • The present application provides a behavior recognition method and device, a computer device, and a storage medium, including: performing human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area with a preset key monitoring area; selecting, as a target human body detection frame, a human body detection frame whose ratio of the overlapping area to the area of the human body detection frame, or to the area of the key monitoring area, is greater than or equal to a preset ratio; performing action recognition on the target intercepted image of each target human body detection frame to obtain a detection result; and, in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action. In this way, whether abnormal behavior exists in the key monitoring area is determined by detecting the target action, which improves detection efficiency for the key monitoring area.


Abstract

The present application provides a behavior recognition method and device, a computer device, and a storage medium, including: performing human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area; selecting, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame, or to the area of the key monitoring area, is greater than or equal to a preset ratio; performing action recognition on the target intercepted image of each target human body detection frame to obtain a detection result; and, in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action. In this way, whether abnormal behavior exists in the key monitoring area is determined by detecting the target action, which improves detection efficiency for the key monitoring area.

Description

Behavior recognition method and device, computer device, and storage medium
Cross-reference to related application
This patent application claims priority to Chinese Patent Application No. 202110737409.1, filed on June 30, 2021 by the applicant 深圳市商汤科技有限公司 and entitled "Behavior recognition method and device, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to, but are not limited to, the technical field of image recognition, and in particular to a behavior recognition method and device, a computer device, and a storage medium.
Background
With the popularization of smart devices such as smartphones, tablet computers, and smartwatches, great convenience has been brought to people's lives and work. At the same time, the gaming, shopping, and chat functions of smart devices exert a certain temptation on people, so that while using a smart device a person may fail to notice his or her own safety and that of the surrounding environment, which can give rise to safety accidents.
In the related art, some monitoring devices can recognize and detect the smart devices used by personnel, but when a smart device cannot be clearly recognized, the recognition efficiency is low.
Summary
Embodiments of the present application provide at least a behavior recognition method and device, a computer device, and a storage medium.
In a first aspect, an embodiment of the present application provides a behavior recognition method, including:
performing human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area;
based on the overlapping area between each human body detection frame and the key monitoring area, selecting, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or selecting, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio;
performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image; and
in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action.
In this embodiment, target-action recognition is performed on the human body detection frames in the key frame image to determine whether any person is performing the target action, thereby monitoring the behavior of personnel in the key monitoring area.
In some embodiments, the method further includes acquiring the key frame image; the acquiring the key frame image includes:
decoding an acquired real-time video stream to obtain multiple original frame images corresponding to the real-time video stream; and
extracting the key frame image from the multiple original frame images based on a preset frame extraction rule.
In this way, different frame extraction rules can be set according to processing requirements to extract key frame images.
In some embodiments, the performing human body detection on the pre-extracted key frame image to determine at least one human body image in the key frame image and a human body detection frame corresponding to each human body image includes:
performing human body position detection on the key frame image to obtain at least one human body position detection point;
obtaining the coordinate position of each human body position detection point in the key frame image; and
determining the at least one human body image in the key frame image based on the coordinate position of each human body position detection point in the key frame image, and drawing a human body detection frame for each of the at least one human body image.
In this way, by performing human body position detection on the key frame image and drawing a human body detection frame for each human body image, human body images that do not belong to the key monitoring area can be excluded, which facilitates subsequent monitoring of the key monitoring area.
In some embodiments, the determining the at least one human body image in the key frame image based on the coordinate position of each human body position detection point in the key frame image, and drawing a human body detection frame for each of the at least one human body image, includes:
for each human body position detection point, obtaining, based on the coordinate position of the human body position detection point, an extension region covering the human body image corresponding to the human body position detection point, where the human body position detection point is located within the extension region; and
determining the human body image in the key frame image based on the extension region of the human body image, and drawing the human body detection frame corresponding to the human body image along the edge of the extension region.
In this way, by determining the extension region of each human body image, the human body detection frame of each human body image can be drawn along the edge of the extension region, so that the frame matches the human body image more closely.
In some embodiments, the performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image includes:
inputting the target intercepted image corresponding to at least one target human body detection frame into a pre-trained human action recognition model to obtain, for each target intercepted image, a detection result including a confidence that the image includes a preset target action; and
for each target intercepted image, when the confidence is greater than a preset confidence threshold, determining that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
In this way, whether an image includes the preset target action can be determined more accurately according to the magnitude of the confidence.
In some embodiments, the human action recognition model is trained by the following steps:
acquiring positive sample images that include the target action and negative sample images that do not include the target action;
constructing positive samples based on the positive sample images and the positive sample labels corresponding to the positive sample images, and constructing negative samples based on the negative sample images and the negative sample labels corresponding to the negative sample images, where a positive sample label indicates that the action in the positive sample image is consistent with the preset target action, and a negative sample label indicates that the action in the negative sample image is inconsistent with the preset target action; and
inputting the positive samples and the negative samples into a human action recognition model to be trained, and performing model training on the model to obtain a trained human action recognition model.
In this way, the positive and negative samples are input into the human action recognition model to be trained, and training continues until the model loss satisfies a preset loss threshold, completing the training process, so that the recognition results of the trained human action recognition model are more accurate.
In some embodiments, the performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image includes:
for the target intercepted image corresponding to each target human body detection frame, recognizing the positional relationship between at least two target detection points in the target intercepted image to obtain a detection result of the target intercepted image that includes the positional relationship; and
in response to the positional relationship matching a preset positional relationship between at least two corresponding detection points of a preset target action, determining that the detection result of the target intercepted image is that the target intercepted image includes the preset target action;
where the positional relationship includes one or more of: the distance between the at least two target detection points, the offset angle between the at least two target detection points in the human body image, and the movement distance and/or offset-angle variation range of the at least two target detection points in the human body image within a preset time range.
In this way, by recognizing the positional relationship between at least two target detection points in the target intercepted image, whether the target intercepted image includes the preset target action can be determined even more accurately.
In some embodiments, the recognition method further includes:
displaying, at a front end, the key frame image including the preset target action and alarm information corresponding to the key frame image;
where the alarm information includes at least one of the following: the number of performers of the preset target action, the detection point position of each performer of the preset target action relative to the key frame image, and the confidence value, output by the human action recognition model, that each performer of the preset target action performs the target action.
In this way, by displaying the key frame image including the preset target action together with the corresponding alarm information, monitoring personnel are given a better basis for handling and can act on this information.
In some embodiments, the target action includes one or more of the following actions:
playing with a mobile phone, making a phone call, and checking a mobile phone.
In a second aspect, an embodiment of the present application further provides a behavior recognition device, including:
a detection module configured to perform human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area;
a selection module configured to select, based on the overlapping area between each human body detection frame and the key monitoring area, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio as a target human body detection frame; or to select a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio as the target human body detection frame;
a recognition module configured to perform action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image; and
a determination module configured to determine, in response to the detection result indicating that at least one target intercepted image includes a preset target action, that the key frame image includes a target behavior corresponding to the preset target action.
In some embodiments, the behavior recognition device further includes an acquisition module;
the acquisition module is configured to acquire the key frame image;
the acquisition module is further configured to: decode an acquired real-time video stream to obtain multiple original frame images corresponding to the real-time video stream; and
extract the key frame image from the multiple original frame images based on a preset frame extraction rule.
In some embodiments, the detection module includes:
a detection unit configured to perform human body position detection on the key frame image to obtain at least one human body position detection point;
an acquisition unit configured to obtain the coordinate position of each human body position detection point in the key frame image; and
a determination unit configured to determine the at least one human body image in the key frame image based on the coordinate position of each human body position detection point in the key frame image, and draw a human body detection frame for each of the at least one human body image.
In some embodiments, the determination unit is further configured to:
for each human body position detection point, obtain, based on the coordinate position of the human body position detection point, an extension region covering the human body image corresponding to the human body position detection point, where the human body position detection point is located within the extension region; and
determine the human body image in the key frame image based on the extension region of the human body image, and draw the human body detection frame corresponding to the human body image along the edge of the extension region.
In some embodiments, the recognition module is further configured to:
input the target intercepted image corresponding to at least one target human body detection frame into a pre-trained human action recognition model to obtain, for each target intercepted image, a detection result including a confidence that the image includes a preset target action; and
for each target intercepted image, when the confidence is greater than a preset confidence threshold, determine that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
In some embodiments, the behavior recognition device further includes a training module;
the training module is configured to acquire positive sample images that include the target action and negative sample images that do not include the target action;
construct positive samples based on the positive sample images and the positive sample labels corresponding to the positive sample images, and construct negative samples based on the negative sample images and the negative sample labels corresponding to the negative sample images, where a positive sample label indicates that the action in the positive sample image is consistent with the preset target action, and a negative sample label indicates that the action in the negative sample image is inconsistent with the preset target action; and
input the positive samples and the negative samples into a human action recognition model to be trained, and perform model training on the model to obtain a trained human action recognition model.
In some embodiments, the recognition module is further configured to:
for the target intercepted image corresponding to each target human body detection frame, recognize the positional relationship between at least two target detection points in the target intercepted image to obtain a detection result of the target intercepted image that includes the positional relationship; and
in response to the positional relationship matching a preset positional relationship between at least two corresponding detection points of a preset target action, determine that the detection result of the target intercepted image is that the target intercepted image includes the preset target action;
where the positional relationship includes one or more of: the distance between the at least two target detection points, the offset angle between the at least two target detection points in the human body image, and the movement distance and/or offset-angle variation range of the at least two target detection points in the human body image within a preset time range.
In some embodiments, the behavior recognition device further includes:
a display module configured to display, at a front end, the key frame image including the preset target action and alarm information corresponding to the key frame image;
where the alarm information includes at least one of the following: the number of performers of the preset target action, the detection point position of each performer of the preset target action relative to the key frame image, and the confidence value, output by the human action recognition model, that each performer of the preset target action performs the target action.
In some embodiments, the target action includes one or more of the following actions:
playing with a mobile phone, making a phone call, and checking a mobile phone.
In a third aspect, an embodiment of the present application further provides a computer device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor and the memory communicate via the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the first aspect, or of any possible implementation of the first aspect.
The behavior recognition method and device, computer device, and storage medium provided by the embodiments of the present application include: performing human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image and a human body detection frame corresponding to each human body image; based on the overlapping area between each human body detection frame and the key monitoring area, selecting, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio, or selecting, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio; performing action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image; and, in response to the detection result indicating that at least one target intercepted image includes a preset target action, determining that the key frame image includes a target behavior corresponding to the preset target action.
In the embodiments of the present application, human body detection is performed on the key frame image, target human body detection frames are selected, and human action recognition is performed on the target intercepted images in the target human body detection frames to determine whether a target action exists in them, thereby judging whether the key frame image includes the target action. In this way, whether abnormal behavior exists in the key monitoring area can be determined by detecting the target action, without needing to detect the corresponding object being used, which improves detection efficiency for the key monitoring area.
To make the above objects, features, and advantages of the embodiments of the present application more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly introduced below. The drawings here are incorporated into and form part of the specification; they illustrate embodiments consistent with the present application and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only certain embodiments of the present application and therefore should not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
Fig. 1A shows a flowchart of a behavior recognition method provided by an embodiment of the present application;
Fig. 1B shows a schematic diagram of a system architecture to which the behavior recognition method of an embodiment of the present application can be applied;
Fig. 2 shows a schematic diagram of human body detection provided by an embodiment of the present application;
Fig. 3 shows a flowchart of another behavior recognition method provided by an embodiment of the present application;
Fig. 4 shows a schematic diagram of a behavior recognition device provided by an embodiment of the present application;
Fig. 5 shows a schematic diagram of another behavior recognition device provided by an embodiment of the present application;
Fig. 6 shows a detailed schematic diagram of the detection module in the behavior recognition device provided by an embodiment of the present application;
Fig. 7 shows a schematic diagram of a computer device provided by an embodiment of the present application.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings here, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items, or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may indicate including any one or more elements selected from the set consisting of A, B, and C.
It has been found through research that, at present, some monitoring devices can recognize and detect the smart devices used by personnel; that is, when a smart device appears in the monitoring picture, a judgment can be made on the recognition result and a corresponding warning can be issued. However, such methods can usually only issue warnings in cases where a smart device can be clearly recognized in the monitoring picture, and the recognition efficiency is low when the smart device cannot be clearly recognized.
Based on the above research, the present application provides a behavior recognition method that performs human body detection on a key frame image, selects target human body detection frames, and performs human action recognition on the target intercepted images in the target human body detection frames to determine whether a target action exists in the target intercepted images, thereby judging whether the key frame image includes the target action. In this way, whether abnormal behavior exists in the key monitoring area can be determined by detecting the target action, without needing to detect the corresponding object being used, which improves detection efficiency for the key monitoring area.
To facilitate understanding of this embodiment, a behavior recognition method disclosed in the embodiments of the present application is first introduced in detail. The execution body of the behavior recognition method provided by the embodiments of the present application is generally a computer device with certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the behavior recognition method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to Fig. 1A, which is a flowchart of a behavior recognition method provided by an embodiment of the present application, the recognition method includes steps S101 to S104, where:
S101: Perform human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area.
In some possible implementations, human body detection may be performed on the extracted key frame image to detect at least one human body image included in the key frame image; a corresponding human body detection frame is added for each human body image, and the overlapping area between each human body detection frame and the preset key monitoring area is determined based on the position and size of each human body detection frame.
In an embodiment of the present application, before performing human body detection on the key frame image, the method further includes: acquiring the key frame image. In some possible implementations, the key frame image may be extracted by the following steps:
decoding an acquired real-time video stream to obtain multiple original frame images corresponding to the real-time video stream; and
extracting the key frame image from the multiple original frame images based on a preset frame extraction rule.
Illustratively, the frame rate of a video is generally 30 frames per second and can drop to a minimum of 25 frames per second, so the real-time video stream can be decoded based on its frame rate. In some possible implementations, the monitoring video can be decoded as it is received in real time, thereby obtaining the multiple original frame images corresponding to the real-time video stream. After the multiple original frame images are obtained, in view of the limits on processing resources and the requirements of the target service, it is generally unnecessary to process every original frame image; therefore, frame images can be extracted from the multiple original frame images as key frame images based on a certain frame extraction rule, for example extracting one frame in every 10 frames or one frame in every 5 frames.
In the embodiments of the present application, after the key frame image is acquired, human body detection can be performed on it, so that at least one human body image in the key frame image can be detected and a corresponding human body detection frame added for each human body image.
Referring to Fig. 2, Fig. 2 is a schematic diagram of human body detection provided by an embodiment of the present application.
In some possible implementations, the performing human body detection on the pre-extracted key frame image to determine at least one human body image in the key frame image and the human body detection frame corresponding to each human body image includes:
performing human body position detection on the key frame image to obtain at least one human body position detection point;
obtaining the coordinate position of each human body position detection point in the key frame image; and
determining the at least one human body image in the key frame image based on the coordinate position of each human body position detection point, and drawing a human body detection frame for each of the at least one human body image.
Illustratively, the human body images included in the key frame image, together with their approximate positions, can be screened out by inputting the key frame image into a human body detection model or by performing human body recognition on the key frame image. To determine the corresponding position of a human body image more accurately, the position of the human body can be determined precisely by calibrating human body detection points: for example, body parts such as the top of the head, the tips of the toes, or the neck are taken as human body detection points, and the coordinate positions of the preset human body detection points in the key frame image are detected. Next, the human body images included in the key frame image can be determined with the determined coordinate positions of the human body detection points as the reference, and a corresponding human body detection frame can be drawn for each human body image.
In some possible implementations, the determining the at least one human body image in the key frame image based on the coordinate position of each human body position detection point in the key frame image, and drawing a human body detection frame for each of the at least one human body image, includes:
for each human body position detection point, obtaining, based on the coordinate position of the human body position detection point, an extension region covering the human body image corresponding to that detection point, where the detection point is located within the extension region; and
determining the human body image in the key frame image based on the extension region of the human body image, and drawing the human body detection frame corresponding to the human body image along the edge of the extension region.
Illustratively, when the human body detection point is set to be the head, the determined head detection point is taken as the center and extended outward in a preset specified direction, for example the horizontal and vertical directions, or a pre-predicted upright direction of the human body and the corresponding perpendicular direction, until each human body image is covered, yielding an extension region covering each human body image. For example, a line segment is formed extending from the head to between the feet of the human body, and with each point on the segment as the reference, extension proceeds perpendicular to the segment until the entire human body image is covered, yielding an extension region covering each human body image; the image within the corresponding extension region is the at least one human body image in the key frame image. Afterwards, the human body detection frame corresponding to each human body image can be drawn along the edge of the extension region.
Following the above S101, the recognition method further includes:
S102: Based on the overlapping area between each human body detection frame and the key monitoring area, select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio.
In some possible implementations, after the overlapping area between each human body detection frame and the preset key monitoring area is determined, the proportion of the overlapping area relative to each human body detection frame, and the proportion of the overlapping area relative to the entire key monitoring area, can be determined, so that human body detection frames meeting a preset condition can be selected as target human body detection frames.
Illustratively, as shown in Fig. 2, assume the preset ratio is 50% and the key frame image includes four human body detection frames 1, 2, 3, and 4, with the curved region being the key monitoring area. As can be seen from the figure, human body detection frame 1 lies entirely inside the key monitoring area, so the ratio of the overlapping area to the frame area is 1, which is greater than the preset ratio of 50%, and it can be determined to be a target human body detection frame; human body detection frame 2 lies entirely outside the key monitoring area, so the ratio of the overlapping area to the frame area is 0, which is less than the preset ratio of 50%, and it cannot serve as a target human body detection frame and can be screened out directly. As for frames 3 and 4, the ratio of the overlapping area of frame 3 to its frame area is greater than the preset ratio of 50%, so frame 3 can be judged to be a target human body detection frame; correspondingly, the ratio of the overlapping area of frame 4 to its frame area is less than the preset ratio of 50%, so it can be judged that frame 4 cannot serve as a target human body detection frame.
In the embodiments of the present application, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to the preset ratio can also be selected as the target human body detection frame; this is similar to the above method of selecting by the ratio of the overlapping area to the frame area and is not repeated here.
In another embodiment, the method of selecting human body detection frames meeting a preset condition as target human body detection frames may further include:
determining the internal area of the human body detection frame located inside the key monitoring area, and the external area of the human body detection frame located outside the key monitoring area; and
selecting a human body detection frame whose internal area is greater than its external area as the target human body detection frame.
Illustratively, as shown in Fig. 2, the key frame image includes four human body detection frames 1, 2, 3, and 4, with the curved region being the key monitoring area. As can be seen from the figure, frame 1 lies entirely inside the key monitoring area and can therefore be determined to be a target human body detection frame, while frame 2 lies entirely outside it and can be screened out directly. As for frames 3 and 4, the area of frame 3 inside the key monitoring area is greater than its area outside it, so frame 3 can be judged to be a target human body detection frame; correspondingly, the area of frame 4 inside the key monitoring area is less than its area outside it, so it can be judged that frame 4 is not a target human body detection frame.
Following the above S102, the recognition method further includes:
S103: Perform action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image.
In some possible implementations, after the target human body detection frames are determined, action recognition can be performed on the target intercepted images based on a preset action recognition method, thereby obtaining the detection result corresponding to each target intercepted image.
The target action includes one or more of the following actions: playing with a mobile phone, making a phone call, and checking a mobile phone.
In one embodiment, action recognition can be completed by a trained action recognition module to obtain the detection result. In some possible implementations, the target intercepted image corresponding to at least one target human body detection frame is input into a pre-trained human action recognition model to obtain, for each target intercepted image, a detection result including a confidence that the image includes a preset target action. For each target intercepted image, when the confidence is greater than a preset confidence threshold, it is determined that the detection result of the target intercepted image is that the target intercepted image includes the preset target action.
Illustratively, after the target intercepted images corresponding to the target human body detection frames are input into the pre-trained human action recognition model, the model can output, for each target intercepted image, a confidence that the preset target action exists; for example, if the confidence of image A is 95% and the confidence of image B is 80%, then based on a preset confidence threshold of, say, 90%, it can be determined that image A contains the preset target action.
In an embodiment of the present application, the human action recognition model is trained by the following steps:
acquiring positive sample images that include the target action and negative sample images that do not include the target action;
constructing positive samples based on the positive sample images and the positive sample labels corresponding to the positive sample images, and constructing negative samples based on the negative sample images and the negative sample labels corresponding to the negative sample images, where a positive sample label indicates that the action in the positive sample image is consistent with the preset target action, and a negative sample label indicates that the action in the negative sample image is inconsistent with the preset target action; and
inputting the positive samples and the negative samples into a human action recognition model to be trained, and performing model training on the model to obtain a trained human action recognition model.
Illustratively, taking the detection of playing with a mobile phone as an example, the target action can be set as the two hands coming close together, the left or right hand approaching the ear, lowering the head, and so on. Correspondingly, the label of a positive sample image indicates that the action in the positive sample image is consistent with the preset target action, and the label of a negative sample image indicates that the action in the negative sample image is inconsistent with the preset target action. Afterwards, the positive and negative samples are input into the human action recognition model to be trained, and model training is performed until the model loss satisfies a preset loss threshold, completing the training process and yielding a trained human action recognition model for determining the confidence that an input sample includes the target action.
In another embodiment, the target action may include the distance between at least two target detection points in the human body image, the offset angle between at least two target detection points in the human body image, and the movement distance and/or offset-angle variation range of at least two target detection points in the human body image within a preset time range.
Illustratively, taking the detection of playing with a mobile phone as an example: when playing with a phone, a hand approaches the eyes or the two hands come close together, or the offset angle between the hand and the face narrows, or the offset angle between the two hands narrows to a certain angle threshold; it can also be monitored whether, within a certain time range, the hands or head do not move, or move only within a small range. Thus, whether a monitored person is playing with a mobile phone can be determined by monitoring the above actions. Taking the detection of making a phone call as an example, the distance or offset angle between the hand and the ear can be monitored, or whether, within a certain time range, the hand or head does not move or moves only within a small range; details are not repeated here.
The embodiments of the present application can be applied to monitoring target personnel in a monitoring area. For example, in a factory where operating machinery is installed, staff are forbidden from slacking off or becoming distracted, and especially from violations such as playing with a mobile phone or making a phone call that can easily create safety hazards; whether personnel in the monitoring area commit violations can therefore be judged by monitoring whether the target action occurs. Alternatively, in some examination scenarios where passing messages is forbidden, whether violations occur can be determined by monitoring whether personnel in the monitoring area perform the target action. Alternatively, on roads, especially at intersections and other traffic-dense locations, a pedestrian playing with a mobile phone while crossing poses a considerable traffic-safety hazard, and whether personnel in the monitoring area commit violations can likewise be judged by monitoring whether the target action occurs; details are not repeated here.
Following the above S103, the recognition method further includes:
S104: In response to the detection result indicating that at least one target intercepted image includes a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
In this step, by judging whether any target intercepted image includes the preset target action, it is determined whether the key frame image includes the target action, and thus whether the target behavior exists in the detection area corresponding to the key frame.
Illustratively, each area corresponds to a responsible manager, so when a target intercepted image including the target action is found, the key frame image including the target action can be sent directly to the corresponding responsible manager, making it convenient for the manager to carry out corresponding handling based on the image.
In the embodiments of the present application, human body detection is performed on the key frame image, target human body detection frames are selected, and human action recognition is performed on the target intercepted images in the target human body detection frames to determine whether a target action exists in them, thereby judging whether the key frame image includes the target action. In this way, whether abnormal behavior exists in the key monitoring area can be determined by detecting the target action, without needing to detect the corresponding object being used, which improves detection efficiency for the key monitoring area.
Fig. 1B shows a schematic diagram of a system architecture to which the behavior recognition method of the embodiments of the present application can be applied; as shown in Fig. 1B, the system architecture includes: a behavior recognition terminal 201, a network 202, and an electronic device 203. To support an exemplary application, the behavior recognition terminal 201 and the electronic device 203 establish a communication connection through the network 202, and the electronic device 203 reports the acquired key frame image to the behavior recognition terminal 201 through the network 202. The behavior recognition terminal 201 performs human body detection on the key frame image: first, it determines at least one human body image in the key frame image, the human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and the preset key monitoring area; second, it selects target human body detection frames according to the overlapping areas and performs action recognition on each target human body detection frame; finally, based on the detection results, it judges whether the key frame image includes the target behavior. The behavior recognition terminal 201 uploads the output key frame images to the network 202.
As an example, the electronic device 203 may include an image acquisition device or an image scanning device, and the behavior recognition terminal 201 may include a vision processing device capable of processing visual information, or a remote server. The network 202 may use a wired or wireless connection. When the behavior recognition terminal 201 is a vision processing device, the electronic device 203 can communicate with the vision processing device through a wired connection, for example performing data communication over a bus; when the behavior recognition terminal 201 is a remote server, the electronic device 203 can exchange data with the remote server through a wireless network.
Alternatively, in some scenarios, the electronic device 203 may be a vision processing device with a video capture module, or a host with a camera. In this case, the behavior recognition method of the embodiments of the present application may be executed by the electronic device 203, and the above system architecture may not include the network 202 and the server.
Referring to Fig. 3, which is a flowchart of another behavior recognition method provided by an embodiment of the present application, the recognition method includes steps S301 to S305, where:
S301: Perform human body detection on a pre-extracted key frame image to determine at least one human body image in the key frame image, a human body detection frame corresponding to each human body image, and the overlapping area between each human body detection frame and a preset key monitoring area.
S302: Based on the overlapping area between each human body detection frame and the key monitoring area, select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the human body detection frame is greater than or equal to a preset ratio; or select, as a target human body detection frame, a human body detection frame for which the ratio of the overlapping area to the area of the key monitoring area is greater than or equal to a preset ratio.
S303: Perform action recognition on the target intercepted image corresponding to each target human body detection frame to obtain a detection result for each target intercepted image.
S304: In response to the detection result indicating that at least one target intercepted image includes a preset target action, determine that the key frame image includes a target behavior corresponding to the preset target action.
For steps S301 to S304, reference can be made to the description of steps S101 to S104; the same technical effects can be achieved, and they are not repeated here.
S305: Display, at the front end, the key frame image including the preset target action and the alarm information corresponding to the key frame image.
In this step, after the key frame image including the preset target action is determined, the key frame image and the alarm information corresponding to it can be displayed at the front end, making it convenient for the corresponding responsible personnel to perform corresponding operations.
The alarm information includes at least one of the following: the number of performers of the preset target action, the detection point position of each performer of the preset target action relative to the key frame image, and the confidence value, output by the human action recognition model, that each performer of the preset target action performs the target action.
Illustratively, when the number of target-action performers exceeds a certain value, it can be determined that the key frame image reflects relatively serious violations and should receive a higher level of early-warning handling. The detection point position of each target-action performer relative to the key frame image makes it convenient for the person in charge to locate the corresponding performer, or to identify the person at that position based on the coordinates. The confidence value, output by the human action recognition model, that each target-action performer performs the target action helps the person in charge judge how urgently the matter should be handled: when the confidence is extremely high, the probability that the target action exists is extremely high, and corresponding handling should be carried out promptly.
By displaying, at the front end, the key frame images in which the target action exists together with the corresponding alarm information, the embodiments of the present application make it easier for responsible personnel to handle the related problems, thereby improving the handling efficiency for target actions.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
Based on the same inventive concept, the embodiments of the present application further provide a behavior recognition device corresponding to the behavior recognition method. Since the principle by which the device in the embodiments of the present application solves the problem is similar to that of the above behavior recognition method, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
参照图4、图5、图6所示,图4为本申请实施例提供的一种行为识别的示意图;图5为了本申请实施例所提供的另一种行为行为识别装置的示意图;图6为本申请实施例所提供的行为行为识别装置中,检测模块的具体示意图。所述行为识别装置包括:检测模块410、选取模块420、识别模块430、确定模块440;其中:
检测模块410,配置为通过对预先抽取的关键帧图像进行人体检测,确定所述关键帧图像中的至少一个人体图像以及对应每个人体图像的人体检测框,以及每个所述人体检测框与预设重点监测区域的重叠面积;
选取模块420,配置为基于每个所述人体检测框与所述重点监测区域的重叠面积,选取所述重叠面积与所述人体检测框面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;或者选取所述重叠面积与所述重点监测区域面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;
识别模块430,配置为对每个目标人体检测框对应的目标截取图像进行动作识别,得到每个目标截取图像的检测结果;
确定模块440,配置为响应于所述检测结果指示存在至少一个目标截取图像包括预设目标动作,确定所述关键帧图像包括与所述预设目标动作对应的目标行为。
在一些实施例中,如图5所示,所述行为识别装置还包括:获取模块450;
所述获取模块450,配置为获取关键帧图像
所述获取模块450,还配置为:对获取的实时视频流进行解码处理,得到与所述实时视频流对应的多帧原始帧图像;
基于预设的抽帧规则,从所述多帧所述原始帧图像中抽取所述关键帧图像。
在一些实施例中,如图6所示,所述检测模块410,包括:
检测单元411,配置为对所述关键帧图像进行人体位置检测得到至少一个人体位置检测点;
获取单元412,配置为获取所述关键帧图像中,每个人体位置检测点在所述关键帧图像中的坐标位置;
确定单元413,配置为基于每个人体位置检测点在所述关键帧图像中的坐标位置,确定所述关键帧图像中的至少一个人体图像,并绘制所述至少一个人体图像中每个人体图像的人体检测框。
在一些实施例中,所述确定单元413还配置为:
针对每个人体位置检测点,基于所述人体位置检测点的坐标位置,得到覆盖与所述人体位置检测点对应的人体图像的延伸区域,其中,所述人体位置检测点位于所述延伸区域内;
基于所述人体图像的延伸区域,确定所述关键帧图像中的所述人体图像,并按照所述延伸区域的边缘,绘制对应所述人体图像的人体检测框
在一些实施例中,所述识别模块430还配置为:
将至少一个所述目标人体检测框对应的目标截取图像输入至预先训练好的人体动作识别模型中,获取每个目标截取图像的、包括预设目标动作的置信度的检测结果;
针对所述每个目标截取图像,在所述置信度大于预设的置信度阈值情况下,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作。
在一些实施例中,如图5所示,所述行为识别装置还包括训练模块460;
所述训练模块460,配置为获取包括目标动作的正样本图像,以及不包括所述目标动作的负样本图像;
基于所述正样本图像以及与所述正样本图像对应的正样本标签构建正样本,并基于所述负样本图像以及与所述负样本图像对应的负样本标签构建负样本,所述正样本标签为所述正样本图像中的动作与所述预设目标动作一致,所述负样本标签为所述负样本图像中的动作与所述预设目标动作不一致;
将所述正样本及所述负样本输入至待训练的人体动作识别模型中,对所述待训练的人体动作识别模型进行模型训练,得到训练好的人体动作识别模型。
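训练模块中正、负样本的构建可以用如下示意代码表示。此处以标签 1 表示“图像中的动作与预设目标动作一致”、0 表示“不一致”，均为示例约定；具体的模型训练（如卷积网络的迭代优化）不在此展开：

```python
def build_training_set(positive_images, negative_images):
    """示意代码：基于正样本图像与正样本标签构建正样本，基于负样本图像与
    负样本标签构建负样本，返回 (图像, 标签) 形式的训练样本集。"""
    samples = [(img, 1) for img in positive_images]   # 正样本：动作与预设目标动作一致
    samples += [(img, 0) for img in negative_images]  # 负样本：动作与预设目标动作不一致
    return samples
```

该样本集随后可整体输入待训练的人体动作识别模型进行模型训练。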
在一些实施例中,所述识别模块430还配置为:
针对每个目标人体检测框对应的目标截取图像，识别所述目标截取图像中至少两个目标检测点之间的位置关系，得到所述目标截取图像的、包括所述位置关系的检测结果；
响应于所述位置关系与预设目标动作中对应的至少两个检测点之间的预设位置关系相匹配,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作;
其中,所述位置关系包括至少两个目标检测点之间的距离、所述人体图像中至少两个目标检测点之间的偏移角、在预设时间范围内所述人体图像中至少两个目标检测点之间的移动距离和/或偏移角变化范围中的一者或者多者。
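上述基于检测点位置关系的匹配，可以用如下示意代码表示。代码仅覆盖“距离 + 偏移角”两种位置关系；具体的距离上限与角度范围取决于预设目标动作（例如“打电话”可假设手部检测点与头部检测点距离较近），此处参数均为示例假设：

```python
import math

def point_relation(p1, p2):
    """计算两个目标检测点之间的距离与偏移角（偏移角以水平方向为基准，单位：度）。"""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))

def matches_preset(p1, p2, max_dist, angle_range):
    """示意代码：当距离与偏移角均落入预设目标动作对应的预设位置关系范围内时，
    判定该目标截取图像中包括预设目标动作。"""
    dist, angle = point_relation(p1, p2)
    return dist <= max_dist and angle_range[0] <= angle <= angle_range[1]
```

在预设时间范围内对相邻关键帧重复计算，即可进一步得到移动距离与偏移角变化范围。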
在一些实施例中,如图5所示,所述行为识别装置还包括:
展示模块470,配置为将包括所述预设目标动作的关键帧图像以及与所述关键帧图像对应的告警信息进行前端展示;
所述告警信息包括以下至少一种：所述预设目标动作执行人员的数量、每个预设目标动作执行人员相对于所述关键帧图像的检测点位置以及通过所述人体动作识别模型输出的每个预设目标动作执行人员执行所述目标动作的置信度数值。
在一些实施例中,所述预设目标动作包括以下动作中的一种或者多种:
玩手机、打电话、查看手机。
本申请实施例通过对关键帧图像进行人体检测，并选取目标人体检测框，对目标人体检测框中的目标截取图像进行人体动作识别，进而确定目标截取图像是否存在目标动作，从而判断关键帧图像是否包括目标动作。这样，可以通过检测目标动作判断重点监测区域中是否存在异常行为，无需检测到对应的执行物品，提高针对重点监测区域的检测效率。
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
对应于图1A中的一种行为识别方法,本申请实施例还提供了一种计算机设备,如图7所示,为本申请实施例提供的计算机设备结构示意图,包括:
处理器701、存储器702和总线703；存储器702用于存储执行指令，包括内存7021和外部存储器7022；这里的内存7021也称内存储器，用于暂时存放处理器701中的运算数据，以及与硬盘等外部存储器7022交换的数据，处理器701通过内存7021与外部存储器7022进行数据交换，当所述计算机设备运行时，所述处理器701与所述存储器702之间通过总线703通信，使得所述处理器701执行以下指令：
通过对预先抽取的关键帧图像进行人体检测,确定所述关键帧图像中的至少一个人体图像以及对应每个人体图像的人体检测框,以及每个所述人体检测框与预设重点监测区域的重叠面积;
基于每个所述人体检测框与所述重点监测区域的重叠面积,选取所述重叠面积与所述人体检测框面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;或者选取所述重叠面积与所述重点监测区域面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;
对每个目标人体检测框对应的目标截取图像进行动作识别,得到每个目标截取图像的检测结果;
响应于所述检测结果指示存在至少一个目标截取图像包括预设目标动作,则确定所述关键帧图像包括与所述预设目标动作对应的目标行为。
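上述处理器执行的四个步骤（人体检测、目标人体检测框选取、动作识别、行为判定）可以用如下示意代码串联表示。其中 detect_bodies 与 classify_action 为假设的外部函数，分别代表人体检测与动作识别模型的推理接口，预设比例与置信度阈值均为示例：

```python
def recognize_behavior(key_frame, detect_bodies, classify_action, region,
                       ratio=0.5, conf_threshold=0.8):
    """示意代码：对关键帧图像执行行为识别的整体流程。
    detect_bodies(frame) 返回人体检测框列表 (x1, y1, x2, y2)；
    classify_action(frame, box) 返回该检测框对应目标截取图像的置信度。"""
    boxes = detect_bodies(key_frame)                   # 步骤一：人体检测
    selected = []
    for box in boxes:                                  # 步骤二：按面积比选取目标检测框
        x1 = max(box[0], region[0]); y1 = max(box[1], region[1])
        x2 = min(box[2], region[2]); y2 = min(box[3], region[3])
        overlap = max(0, x2 - x1) * max(0, y2 - y1)
        box_area = (box[2] - box[0]) * (box[3] - box[1])
        if box_area > 0 and overlap / box_area >= ratio:
            selected.append(box)
    confidences = [classify_action(key_frame, b) for b in selected]  # 步骤三：动作识别
    return any(conf > conf_threshold for conf in confidences)        # 步骤四：行为判定
```

返回 True 即表示该关键帧图像包括与预设目标动作对应的目标行为。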
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的行为识别方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。
本申请实施例还提供一种计算机程序产品,该计算机程序产品承载有程序代码,所述程序代码包括的指令可用于执行上述方法实施例中所述的行为识别方法的步骤,具体可参见上述方法实施例,在此不再赘述。
其中,上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一些实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一些实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是：以上所述实施例，仅为本申请的具体实施方式，用以说明本申请的技术方案，而非对其限制，本申请的保护范围并不局限于此。尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化，或者对其中部分技术特征进行等同替换；而这些修改、变化或者替换，并不使相应技术方案的本质脱离本申请实施例技术方案的精神和范围，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。
工业实用性
本申请提供了一种行为识别方法、装置、计算机设备及存储介质，包括：通过对预先抽取的关键帧图像进行人体检测，确定关键帧图像中的至少一个人体图像、对应每个人体图像的人体检测框，以及每个人体检测框与预设重点监测区域的重叠面积；选取重叠面积与人体检测框面积或者重点监测区域面积的面积比大于或等于预设比例的人体检测框为目标人体检测框；对每个目标人体检测框对应的目标截取图像进行动作识别，得到检测结果；响应于检测结果指示存在至少一个目标截取图像包括预设目标动作，确定所述关键帧图像包括与所述预设目标动作对应的目标行为。这样，通过检测目标动作判断重点监测区域中是否存在异常行为，提高针对重点监测区域的检测效率。

Claims (20)

  1. 一种行为识别方法,所述方法由电子设备执行,所述识别方法包括:
    通过对预先抽取的关键帧图像进行人体检测,确定所述关键帧图像中的至少一个人体图像、对应每个人体图像的人体检测框,以及每个所述人体检测框与预设重点监测区域的重叠面积;
    基于每个所述人体检测框与所述重点监测区域的重叠面积,选取所述重叠面积与所述人体检测框面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;或者选取所述重叠面积与所述重点监测区域面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;
    对每个目标人体检测框对应的目标截取图像进行动作识别,得到每个目标截取图像的检测结果;
    响应于所述检测结果指示存在至少一个目标截取图像包括预设目标动作,确定所述关键帧图像包括与所述预设目标动作对应的目标行为。
  2. 根据权利要求1所述的行为识别方法,其中,所述方法还包括,获取关键帧图像;
    所述获取关键帧图像,包括:
    对获取的实时视频流进行解码处理,得到与所述实时视频流对应的多帧原始帧图像;
    基于预设的抽帧规则，从所述多帧原始帧图像中抽取所述关键帧图像。
  3. 根据权利要求1所述的行为识别方法,其中,所述通过对预先抽取的关键帧图像进行人体检测,确定所述关键帧图像中的至少一个人体图像以及对应每个人体图像的人体检测框,包括:
    对所述关键帧图像进行人体位置检测得到至少一个人体位置检测点;
    获取所述关键帧图像中,每个人体位置检测点在所述关键帧图像中的坐标位置;
    基于每个人体位置检测点在所述关键帧图像中的坐标位置,确定所述关键帧图像中的至少一个人体图像,并绘制所述至少一个人体图像中每个人体图像的人体检测框。
  4. 根据权利要求3所述的行为识别方法,其中,所述基于每个人体位置检测点在所述关键帧图像中的坐标位置,确定所述关键帧图像中的至少一个人体图像,并绘制所述至少一个人体图像中每个人体图像的人体检测框,包括:
    针对每个人体位置检测点,基于所述人体位置检测点的坐标位置,得到覆盖与所述人体位置检测点对应的人体图像的延伸区域;其中,所述人体位置检测点位于所述延伸区域内;
    基于所述人体图像的延伸区域,确定所述关键帧图像中的所述人体图像,并按照所述延伸区域的边缘,绘制对应所述人体图像的人体检测框。
  5. 根据权利要求1所述的行为识别方法,其中,所述对每个目标人体检测框对应的目标截取图像进行动作识别,得到每个目标截取图像的检测结果,包括:
    将至少一个所述目标人体检测框对应的目标截取图像输入至预先训练好的人体动作识别模型中,获取每个目标截取图像的、包括预设目标动作的置信度的检测结果;
    针对所述每个目标截取图像,在所述置信度大于预设的置信度阈值的情况下,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作。
  6. 根据权利要求5所述的行为识别方法,其中,通过以下步骤训练所述人体动作识别模型:
    获取包括目标动作的正样本图像,以及不包括所述目标动作的负样本图像;
    基于所述正样本图像以及与所述正样本图像对应的正样本标签构建正样本,并基于所述负样本图像以及与所述负样本图像对应的负样本标签构建负样本,所述正样本标签为所述正样本图像中的动作与所述预设目标动作一致,所述负样本标签为所述负样本图像中的动作与所述预设目标动作不一致;
    将所述正样本及所述负样本输入至待训练的人体动作识别模型中,对所述待训练的人体动作识别模型进行模型训练,得到训练好的人体动作识别模型。
  7. 根据权利要求1所述的行为识别方法,其中,所述对每个目标人体检测框对应的目标截取图像进行动作识别,得到每个目标截取图像的检测结果,包括:
    针对每个目标人体检测框对应的目标截取图像，识别所述目标截取图像中至少两个目标检测点之间的位置关系，得到所述目标截取图像的、包括所述位置关系的检测结果；
    响应于所述位置关系与预设目标动作中对应的至少两个检测点之间的预设位置关系相匹配,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作;
    其中,所述位置关系包括至少两个目标检测点之间的距离、所述人体图像中至少两个目标检测点之间的偏移角、在预设时间范围内所述人体图像中至少两个目标检测点之间的移动距离和/或偏移角变化范围中的一者或者多者。
  8. 根据权利要求1至7任一项所述的行为识别方法,其中,所述识别方法还包括:
    将包括所述预设目标动作的关键帧图像以及与所述关键帧图像对应的告警信息进行前端展示;
    所述告警信息包括以下至少一种：所述预设目标动作执行人员的数量、每个预设目标动作执行人员相对于所述关键帧图像的检测点位置以及通过所述人体动作识别模型输出的每个预设目标动作执行人员执行所述目标动作的置信度数值。
  9. 根据权利要求1所述的方法,其中,所述预设目标动作包括以下动作中的一种或者多种:
    玩手机、打电话、查看手机。
  10. 一种行为识别装置,其中,所述行为识别装置包括:
    检测模块，配置为通过对预先抽取的关键帧图像进行人体检测，确定所述关键帧图像中的至少一个人体图像以及对应每个人体图像的人体检测框，以及每个所述人体检测框与预设重点监测区域的重叠面积；
    选取模块,配置为基于每个所述人体检测框与所述重点监测区域的重叠面积,选取所述重叠面积与所述人体检测框面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;或者选取所述重叠面积与所述重点监测区域面积的面积比大于或等于预设比例的人体检测框为目标人体检测框;
    识别模块，配置为对每个目标人体检测框对应的目标截取图像进行动作识别，得到每个目标截取图像的检测结果；
    确定模块,配置为响应于所述检测结果指示存在至少一个目标截取图像包括预设目标动作,确定所述关键帧图像包括与所述预设目标动作对应的目标行为。
  11. 根据权利要求10所述的行为识别装置,其中,
    所述行为识别装置还包括:获取模块;
    所述获取模块,配置为获取关键帧图像;
    所述获取模块,还配置为:对获取的实时视频流进行解码处理,得到与所述实时视频流对应的多帧原始帧图像;
    基于预设的抽帧规则，从所述多帧原始帧图像中抽取所述关键帧图像。
  12. 根据权利要求10所述的行为识别装置,其中,所述检测模块,包括:
    检测单元,配置为对所述关键帧图像进行人体位置检测得到至少一个人体位置检测点;
    获取单元,配置为获取所述关键帧图像中,每个人体位置检测点在所述关键帧图像中的坐标位置;
    确定单元,配置为基于每个人体位置检测点在所述关键帧图像中的坐标位置,确定所述关键帧图像中的至少一个人体图像,并绘制所述至少一个人体图像中每个人体图像的人体检测框。
  13. 根据权利要求12所述的行为识别装置,其中,所述确定单元还配置为:
    针对每个人体位置检测点,基于所述人体位置检测点的坐标位置,得到覆盖与所述人体位置检测点对应的人体图像的延伸区域,其中,所述人体位置检测点位于所述延伸区域内;
    基于所述人体图像的延伸区域,确定所述关键帧图像中的所述人体图像,并按照所述延伸区域的边缘,绘制对应所述人体图像的人体检测框。
  14. 根据权利要求10所述的行为识别装置,其中,所述识别模块还配置为:
    将至少一个所述目标人体检测框对应的目标截取图像输入至预先训练好的人体动作识别模型中,获取每个目标截取图像的、包括预设目标动作的置信度的检测结果;
    针对所述每个目标截取图像,在所述置信度大于预设的置信度阈值的情况下,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作。
  15. 根据权利要求14所述的行为识别装置，其中，所述行为识别装置还包括训练模块；
    所述训练模块,配置为获取包括目标动作的正样本图像,以及不包括所述目标动作的负样本图像;
    基于所述正样本图像以及与所述正样本图像对应的正样本标签构建正样本,并基于所述负样本图像以及与所述负样本图像对应的负样本标签构建负样本,所述正样本标签为所述正样本图像中的动作与所述预设目标动作一致,所述负样本标签为所述负样本图像中的动作与所述预设目标动作不一致;
    将所述正样本及所述负样本输入至待训练的人体动作识别模型中,对所述待训练的人体动作识别模型进行模型训练,得到训练好的人体动作识别模型。
  16. 根据权利要求10所述的行为识别装置,其中,所述识别模块还配置为:
    针对每个目标人体检测框对应的目标截取图像，识别所述目标截取图像中至少两个目标检测点之间的位置关系，得到所述目标截取图像的、包括所述位置关系的检测结果；
    响应于所述位置关系与预设目标动作中对应的至少两个检测点之间的预设位置关系相匹配,确定所述目标截取图像的检测结果为所述目标截取图像中包括所述预设目标动作;
    其中,所述位置关系包括至少两个目标检测点之间的距离、所述人体图像中至少两个目标检测点之间的偏移角、在预设时间范围内所述人体图像中至少两个目标检测点之间的移动距离和/或偏移角变化范围中的一者或者多者。
  17. 根据权利要求10至16任一项所述的行为识别装置,其中,所述行为识别装置还包括:
    展示模块,配置为将包括所述预设目标动作的关键帧图像以及与所述关键帧图像对应的告警信息进行前端展示;
    所述告警信息包括以下至少一种：所述预设目标动作执行人员的数量、每个预设目标动作执行人员相对于所述关键帧图像的检测点位置以及通过所述人体动作识别模型输出的每个预设目标动作执行人员执行所述目标动作的置信度数值。
  18. 根据权利要求10所述的行为识别装置，其中，所述预设目标动作包括以下动作中的一种或者多种：
    玩手机、打电话、查看手机。
  19. 一种计算机设备,其中,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,当计算机设备运行时,所述处理器与所述存储器之间通过总线通信,所述机器可读指令被所述处理器执行时执行如权利要求1至9任一项所述的行为识别方法的步骤。
  20. 一种计算机可读存储介质,其中,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如权利要求1至9任一项所述的行为识别方法的步骤。
PCT/CN2021/127654 2021-06-30 2021-10-29 行为识别方法、装置、计算机设备及存储介质 WO2023273075A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110737409.1A CN113255606A (zh) 2021-06-30 2021-06-30 行为识别方法、装置、计算机设备及存储介质
CN202110737409.1 2021-06-30

Publications (1)

Publication Number Publication Date
WO2023273075A1 true WO2023273075A1 (zh) 2023-01-05

Family

ID=77190290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127654 WO2023273075A1 (zh) 2021-06-30 2021-10-29 行为识别方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN113255606A (zh)
WO (1) WO2023273075A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830517A (zh) * 2023-02-14 2023-03-21 江西云眼视界科技股份有限公司 基于视频的考场异常帧提取方法及系统
CN115953137A (zh) * 2023-03-13 2023-04-11 长春工程学院 一种矿山智能化管理方法和系统
CN116563951A (zh) * 2023-07-07 2023-08-08 东莞先知大数据有限公司 一种单杠悬挂动作规范确定方法、装置、设备和存储介质
CN116740813A (zh) * 2023-06-20 2023-09-12 深圳市视壮科技有限公司 一种基于ai图像识别行为监测的分析系统及其方法

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255606A (zh) * 2021-06-30 2021-08-13 深圳市商汤科技有限公司 行为识别方法、装置、计算机设备及存储介质
CN113955594B (zh) * 2021-10-18 2024-02-27 日立楼宇技术(广州)有限公司 一种电梯控制方法、装置、计算机设备和存储介质
CN113989944B (zh) * 2021-12-28 2022-04-08 北京瑞莱智慧科技有限公司 操作动作识别方法、装置及存储介质
CN114444895A (zh) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 清洁质量评估方法及相关设备
CN114333070A (zh) * 2022-03-10 2022-04-12 山东山大鸥玛软件股份有限公司 一种基于深度学习的考生异常行为检测方法
CN114943936B (zh) * 2022-06-17 2023-06-20 北京百度网讯科技有限公司 目标行为识别方法、装置、电子设备及存储介质
CN115147818A (zh) * 2022-06-30 2022-10-04 京东方科技集团股份有限公司 玩手机行为识别方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160234464A1 (en) * 2015-02-06 2016-08-11 Xerox Corporation Computer-vision based process recognition
CN111814587A (zh) * 2020-06-18 2020-10-23 浙江大华技术股份有限公司 人体行为检测方法、教师行为检测方法及相关系统和装置
CN112200081A (zh) * 2020-10-10 2021-01-08 平安国际智慧城市科技股份有限公司 异常行为识别方法、装置、电子设备及存储介质
CN112883866A (zh) * 2021-02-08 2021-06-01 上海新纪元机器人有限公司 一种区域入侵实时检测方法、系统及存储介质
CN113255606A (zh) * 2021-06-30 2021-08-13 深圳市商汤科技有限公司 行为识别方法、装置、计算机设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018012432A1 (ja) * 2016-07-12 2018-01-18 コニカミノルタ株式会社 行動判定装置及び行動判定方法
CN107818651A (zh) * 2017-10-27 2018-03-20 华润电力技术研究院有限公司 一种基于视频监控的非法越界报警方法及装置
CN108764190B (zh) * 2018-06-04 2021-09-24 山东财经大学 老年人离床与在床状态的视频监测方法
CN110147752A (zh) * 2019-05-15 2019-08-20 浙江大华技术股份有限公司 运动检测处理方法、装置、电子设备和存储介质
CN112528850B (zh) * 2020-12-11 2024-06-04 北京百度网讯科技有限公司 人体识别方法、装置、设备和存储介质
CN112784738B (zh) * 2021-01-21 2023-09-19 上海云从汇临人工智能科技有限公司 运动目标检测告警方法、装置以及计算机可读存储介质


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830517A (zh) * 2023-02-14 2023-03-21 江西云眼视界科技股份有限公司 基于视频的考场异常帧提取方法及系统
CN115953137A (zh) * 2023-03-13 2023-04-11 长春工程学院 一种矿山智能化管理方法和系统
CN116740813A (zh) * 2023-06-20 2023-09-12 深圳市视壮科技有限公司 一种基于ai图像识别行为监测的分析系统及其方法
CN116740813B (zh) * 2023-06-20 2024-01-05 深圳市视壮科技有限公司 一种基于ai图像识别行为监测的分析系统及其方法
CN116563951A (zh) * 2023-07-07 2023-08-08 东莞先知大数据有限公司 一种单杠悬挂动作规范确定方法、装置、设备和存储介质
CN116563951B (zh) * 2023-07-07 2023-09-26 东莞先知大数据有限公司 一种单杠悬挂动作规范确定方法、装置、设备和存储介质

Also Published As

Publication number Publication date
CN113255606A (zh) 2021-08-13

Similar Documents

Publication Publication Date Title
WO2023273075A1 (zh) 行为识别方法、装置、计算机设备及存储介质
US10429944B2 (en) System and method for deep learning based hand gesture recognition in first person view
US11735018B2 (en) Security system with face recognition
TWI724736B (zh) 圖像處理方法及裝置、電子設備、儲存媒體和電腦程式
US11455788B2 (en) Method and apparatus for positioning description statement in image, electronic device, and storage medium
US11367194B1 (en) Image segmentation of a video stream
WO2021155632A1 (zh) 图像处理方法及装置、电子设备和存储介质
JP7031588B2 (ja) 不審者検出装置、不審者検出方法およびプログラム
US11295139B2 (en) Human presence detection in edge devices
US9098888B1 (en) Collaborative text detection and recognition
US20170124719A1 (en) Method, device and computer-readable medium for region recognition
WO2018210047A1 (zh) 数据处理方法、数据处理装置、电子设备及存储介质
TW202113757A (zh) 目標對象匹配方法及目標對象匹配裝置、電子設備和電腦可讀儲存媒介
CN111240482B (zh) 一种特效展示方法及装置
CN108600656B (zh) 在视频中添加面部标签的方法及装置
TW202105239A (zh) 圖像處理方法、電子設備和儲存介質
US20140354540A1 (en) Systems and methods for gesture recognition
US20210364281A1 (en) Localization and visualization of sound
US11010947B2 (en) Information processing system and information processing apparatus
TW202121249A (zh) 行人事件的檢測方法及檢測裝置、電子設備和電腦可讀儲存媒體
CN114882437A (zh) 一种识别模型的训练方法、装置、电子设备和存储介质
CN109753883A (zh) 视频定位方法、装置、存储介质和电子设备
US20200018926A1 (en) Information processing apparatus, information processing method, and program
CN109816628B (zh) 人脸评价方法及相关产品
WO2022183663A1 (zh) 事件检测方法、装置、电子设备、存储介质及程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21948002

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE