CN113065474A

CN113065474A - Behavior recognition method and device and computer equipment

Info

Publication number: CN113065474A
Application number: CN202110372549.3A
Authority: CN
Inventors: 蔡逸超; 游华斌; 黄睿; 晏斐; 张远来
Original assignee: Tellhow Software Co ltd
Current assignee: Tellhow Software Co ltd
Priority date: 2021-04-07
Filing date: 2021-04-07
Publication date: 2021-07-02
Anticipated expiration: 2041-04-07
Also published as: CN113065474B

Abstract

The embodiment of the invention discloses a behavior identification method, which comprises the following steps: extracting skeleton key points of each object to be detected in the current frame image; searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points; acquiring a hand area image of a target object; and identifying whether the hand area image of each target object contains the target object or not by using a pre-trained behavior identification model, and outputting a behavior identification result corresponding to each target object in the current frame image. By combining the analysis of the skeletal key points and the target recognition of the hand area image, the real-time recognition of smoking and playing mobile phone behaviors of the personnel in the monitoring image can be realized. By carrying out cigarette mobile phone identification on the hand region image, the background interference can be greatly reduced, and the identification accuracy is improved.

Description

Behavior recognition method and device and computer equipment

Technical Field

The invention relates to the field of image recognition, in particular to a behavior recognition method and device and computer equipment.

Background

In order to improve the service quality of the business hall staff and maintain the good image of the enterprise, the working state and behavior standard supervision of the business hall staff needs to be enhanced. The staff smokes or plays the cell phone during the work, which greatly influences the work efficiency and causes dissatisfaction of visiting clients. In addition, smoking in public places not only damages self health, but also seriously pollutes indoor air and influences the health of other people. With the development of AI (AI) technology, image recognition is widely applied to various aspects of life and production. By applying the advanced image recognition technology, the smoking and mobile phone playing behaviors of the staff in the business hall are controlled, the working service efficiency of the staff can be effectively improved, the civilized service is protected, and the customer satisfaction and the self image of an enterprise are improved.

In the prior art, in the field of behavior recognition, the prior art mostly focuses on behavior recognition without distinguishing objects, but a scheme for recognizing specific behaviors of a specific group is relatively lacked, and the prior art is mostly directed at behaviors with remarkable body actions such as rope skipping, hand waving, running and the like. Particularly, for the identification of the non-significant body behaviors such as smoking or playing a mobile phone, the existing technical scheme has the situations of poor identification precision and low identification efficiency.

Disclosure of Invention

In order to solve the problems, the invention provides a behavior recognition method, a behavior recognition device and computer equipment.

The specific scheme is as follows:

in a first aspect, an embodiment of the present disclosure provides a behavior identification method, where the method includes:

extracting skeleton key points of each object to be detected in the current frame image, wherein the skeleton key points at least comprise wrist key points, elbow key points and shoulder key points;

searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points, wherein the arm included angle is an included angle between a vector formed by the elbow key points and the shoulder key points and a vector formed by the elbow key points and the wrist key points;

acquiring a hand area image of a target object, wherein the hand area image is an image comprising a wrist key point and an adjacent area;

and identifying whether the hand area image of each target object contains a target object by using a pre-trained behavior identification model, and outputting a behavior identification result corresponding to each target object in the current frame image, wherein the target object comprises at least one of a cigarette and a mobile phone, the behavior identification result comprises an abnormal behavior and an abnormal behavior, and the abnormal behavior comprises at least one of a smoking behavior and a mobile phone playing behavior.

According to a specific embodiment of the present disclosure, the obtaining manner of the behavior recognition model includes:

acquiring hand area image samples, wherein the hand area image samples are divided into first type image samples of which wrist key point peripheral areas comprise target objects and second type image samples of which wrist key point peripheral areas do not contain the target objects;

inputting all hand region image samples into an initial neural network model for training, and reserving a deep learning network model obtained by each training;

and performing index evaluation on each deep learning network model, and selecting the deep learning network model with the highest index value as the behavior recognition model.

According to a specific embodiment of the present disclosure, the step of recognizing whether the hand region image of each target object contains a target object by using a pre-trained behavior recognition model, and outputting a behavior recognition result corresponding to each target object in the current frame image includes:

if cigarettes and/or a mobile phone are identified in the hand area image of the target object, determining that the target object has abnormal behaviors;

and if the cigarettes and the mobile phone are not identified in the hand area image of the target object, determining that the target object has no abnormal behavior.

According to a specific embodiment of the present disclosure, the method further comprises:

counting the number of frames with abnormal behaviors as the result of behavior recognition in a preset number of continuous images in a video to be detected, wherein the preset number of continuous images comprise the current frame image and continuous multi-frame images adjacent to the current frame image in the forward direction;

and if the ratio of the number of the frames with abnormal behaviors to the preset number is greater than or equal to a preset threshold value, sending an alarm signal.

According to a specific embodiment of the present disclosure, the step of searching for a target object with an arm included angle smaller than an included angle threshold from all objects to be detected according to the bone key points includes:

extracting a complete object to be detected from all objects to be detected, wherein the complete object to be detected is the object to be detected of which at least one side of hand completely comprises a wrist key point, an elbow key point and a shoulder key point and the object size is larger than a size threshold, and the object size is the ratio of the maximum side length of a circumscribed rectangle of all skeleton key points of the object to be detected to the image frame width;

and searching a target object with an arm included angle smaller than an included angle threshold value from all the complete objects to be detected.

According to a specific embodiment of the present disclosure, the step of acquiring the hand region image of the target object includes:

and taking the wrist key point of the target object as a center, and intercepting a square area with a preset side length as a hand area image.

In a second aspect, an embodiment of the present disclosure further provides a behavior recognition apparatus, where the apparatus includes:

the extraction module is used for extracting skeleton key points of each object to be detected in the current frame image, wherein the skeleton key points at least comprise wrist key points, elbow key points and shoulder key points;

the searching module is used for searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points, wherein the arm included angle is an included angle between a vector formed by the elbow key points and the shoulder key points and a vector formed by the elbow key points and the wrist key points;

the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a hand area image of a target object, and the hand area image comprises a wrist key point and an adjacent area;

the identification module is used for identifying whether the hand area image of each target object contains the target object or not by using a pre-trained behavior identification model and outputting a behavior identification result corresponding to each target object in the current frame image, wherein the target object comprises at least one of a cigarette and a mobile phone, the behavior identification result comprises abnormal behaviors and non-abnormal behaviors, and the abnormal behaviors comprise at least one of smoking behaviors and mobile phone playing behaviors.

According to a specific embodiment of the present disclosure, the apparatus further comprises:

the second acquisition module is used for acquiring hand area image samples, wherein the hand area image samples are divided into first type image samples of which wrist key point peripheral areas comprise target objects and second type image samples of which the wrist key point peripheral areas do not contain the target objects;

the training module is used for inputting all hand region image samples into the initial neural network model for training and reserving a deep learning network model obtained by each training;

and the selection module is used for performing index evaluation on each deep learning network model and selecting the deep learning network model with the highest index value as the behavior recognition model.

In a third aspect, an embodiment of the present disclosure further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the computer program, when running on the processor, executes the behavior recognition method according to the first aspect.

In a fourth aspect, this disclosed embodiment also provides a computer-readable storage medium, which stores a computer program that, when running on a processor, executes the behavior recognition method according to the first aspect.

The behavior identification method, the behavior identification device and the computer equipment provided by the embodiment of the disclosure extract skeletal key points of each object to be detected in the current frame image; searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points; acquiring a hand area image of a target object; and identifying whether the hand area image of each target object contains the target object or not by using a pre-trained behavior identification model, and outputting a behavior identification result corresponding to each target object in the current frame image. By combining the analysis of the skeletal key points and the target recognition of the hand area image, the real-time recognition of smoking and playing mobile phone behaviors of the personnel in the monitoring image can be realized. By carrying out cigarette mobile phone identification on the hand region image, the background interference can be greatly reduced, and the identification accuracy is improved.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.

Fig. 1 is a schematic flow chart illustrating a behavior recognition method provided by an embodiment of the present disclosure;

FIG. 2 illustrates an overall framework diagram of a behavior recognition method provided by an embodiment of the present disclosure;

fig. 3 is a schematic diagram illustrating skeletal key points of a behavior recognition method provided by an embodiment of the present disclosure;

FIG. 4 is a partial flow chart diagram illustrating a behavior recognition method provided by an embodiment of the present disclosure;

FIG. 5 is a diagram illustrating an alarm processing of a behavior recognition method according to an embodiment of the present disclosure;

fig. 6 shows a block diagram of a behavior recognition device provided in an embodiment of the present disclosure;

fig. 7 shows a block diagram of some modules of a behavior recognition apparatus provided in an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.

The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.

Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.

Example 1

Fig. 1 is a schematic flow chart of a behavior recognition method according to an embodiment of the present disclosure. As shown in fig. 1, the behavior recognition method mainly includes the following steps:

s101, extracting skeleton key points of each object to be detected in the current frame image, wherein the skeleton key points at least comprise wrist key points, elbow key points and shoulder key points;

the scheme is mainly characterized in that whether smoking behaviors or mobile phone playing behaviors exist in each object to be detected in the image is judged by performing behavior recognition on the object to be detected. For example, video acquisition and image video are carried out on the spot of a business hall so as to analyze whether abnormal behaviors such as smoking, playing mobile phones and the like exist in the spot of the business hall or not, and thus, the automatic monitoring of the behaviors of personnel in a specific scene can be realized.

As shown in fig. 2, specifically, a current frame image of the monitoring image is acquired. The current frame image is a latest frame image collected in the monitoring image, and at least comprises an object to be detected. The object to be measured is a person collected in the monitoring image. And extracting the bone key points of each object to be detected in the image by using a bone key point detection algorithm. The extracted skeletal key points of the object to be detected mainly include 18 skeletal key points shown in fig. 3, such as wrist key points, elbow key points, shoulder key points, and the like, and the extracted skeletal key points are mainly used for abnormal behavior detection. When the object to be detected has smoking behavior or mobile phone playing behavior, an included angle formed by the upper arm and the forearm at the elbow has a determined angle range, so that the scheme mainly selects a wrist key point, an elbow key point and a shoulder key point as extraction objects. Of course, different skeletal key points can be extracted according to different algorithm adjustment.

The bone key point detection algorithm is a suitable algorithm according to actual conditions, such as OpenPose or AlphaPose, and is not limited herein.

And then, filtering out invalid key points by using scale analysis, wherein the invalid key points are bone key points corresponding to people with too small object scale in the image. In the monitoring image, the person always has a certain image proportion, and a person who is too small in the image may cause the accuracy of the behavior recognition result to be lowered, and thus may be discarded. And judging whether the bone key points meet the requirements or not by using an object scale criterion. The object scale is a ratio of a maximum side length of a circumscribed rectangle of all skeleton key points of the object to be measured to an image frame width, and of course, in other embodiments of the scheme, the object scale can be defined by self as required. And if all wrist key points of the object to be detected are missing or the object size of the object to be detected is too small, rejecting the object to be detected.

S102, searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points, wherein the arm included angle is an included angle between a vector formed by the elbow key points and the shoulder key points and a vector formed by the elbow key points and the wrist key points;

the detection object aimed by the scheme of the application is an object which may have smoking behaviors or mobile phone playing behaviors, and the included angle of the arms of the part of the object is generally a small acute angle. Therefore, the included angle of the arm of the object to be measured can be processed and screened.

Specifically, assume that the vector P made up of the shoulder and elbow keypoints for each person's left or right hand represents the upper arm, and the vector Q made up of the elbow and wrist keypoints represents the lower arm. The PQ angle may then represent the angle at which its arm bends. In the continuous video frames, a small included angle is formed when a person plays a mobile phone and smokes, so that the semantics of key points can be defined according to the smaller included angle, an included angle threshold value T is set, if the included angle formed by any arm of the person is smaller than the included angle threshold value T, the person is considered to have suspected smoking and mobile phone playing behaviors, and otherwise, the person is considered to be normal.

S103, acquiring a hand area image of the target object, wherein the hand area image is an image containing a wrist key point and an adjacent area;

and further extracting hand area images of the target objects with the arm included angles smaller than the included angle threshold value to perform behavior recognition so as to judge whether smoking behaviors or mobile phone playing behaviors exist.

In specific implementation, if the included angle of the left arm of the target object is smaller than the included angle threshold T, taking the key point of the left arm as the center, intercepting the rectangular area image as the hand area image, and performing subsequent identification; if the included angle of the right arm of the target object is smaller than the included angle threshold T, taking the key point of the right wrist as the center, intercepting the rectangular area image as a hand area image, and performing subsequent identification; if the included angles of the arms of the left hand and the right hand of the target object are smaller than the threshold value T, respectively taking the wrist key points of the two hands as the centers, intercepting the two rectangular area images as hand area images, and carrying out subsequent identification.

And S104, identifying whether the hand area image of each target object contains a target object or not by using a pre-trained behavior identification model, and outputting a behavior identification result corresponding to each target object in the current frame image, wherein the target object comprises at least one of a cigarette and a mobile phone, the behavior identification result comprises an abnormal behavior and an abnormal behavior, and the abnormal behavior comprises at least one of a smoking behavior and a mobile phone playing behavior.

In order to recognize the hand region image, a behavior recognition model needs to be trained in advance. The training method of the model comprises the steps of firstly obtaining hand area image samples, wherein the hand area image samples comprise a first type sample image containing a target object and a second type sample image not containing the target object. And inputting the hand area sample image into an initial neural network model for training. And obtaining a behavior recognition model for recognizing the hand region image.

In specific implementation, the hand area image is used as an input image, whether the hand area image contains the mobile phone and the cigarette or not is identified, and the position of the target is obtained. And judging whether the target object has abnormal behaviors or not according to the cigarette and mobile phone identification results of the hand area image of the target object.

In order to improve the reliability of the recognition result of the behavior recognition model, a confidence threshold value can be set for the behavior recognition model. In one particular embodiment, the confidence threshold is set to 0.4. Of course, the confidence threshold may be any value that meets the actual use case, and is not limited herein.

Further, if all target objects in the image have no abnormal behavior, the state of the current frame image is considered to be normal; if any target object has smoking behavior, mobile phone playing behavior or both smoking behavior and mobile phone playing behavior, the current frame image state is considered to be abnormal, and the states of all images are stored.

The behavior identification method provided by the embodiment of the disclosure comprises the steps of firstly extracting skeleton key points of each object to be detected in a current frame image, and filtering the skeleton key points which do not meet requirements; searching a target object with an arm included angle smaller than an included angle threshold value from all objects to be detected according to the skeleton key points; acquiring a hand area image containing wrist key points of a target object; and identifying whether the hand area image of each target object contains the target object or not by using a pre-trained behavior identification model, and outputting a behavior identification result corresponding to each target object in the current frame image. By combining the analysis of the skeletal key points and the target recognition of the hand area image, the real-time recognition of smoking and playing mobile phone behaviors of the personnel in the monitoring image can be realized. By carrying out cigarette mobile phone identification on the hand region image, the background interference can be greatly reduced, and the identification accuracy is improved.

On the basis of the above embodiment, the present embodiment mainly further defines the process of obtaining the behavior recognition model. As shown in fig. 4, the behavior recognition model is obtained in a manner including:

s401, acquiring hand area image samples, wherein the hand area image samples are divided into first type image samples of which wrist key point peripheral areas comprise target objects and second type image samples of which wrist key point peripheral areas do not contain the target objects;

in order to obtain the behavior recognition model, an initial neural network model needs to be established in advance, and a large number of hand area sample images are obtained to train and optimize the initial neural network model.

Specifically, a monitoring image of a business hall is collected, the collected image data includes the conditions of single person, multiple persons and no person in terms of personnel number, and the conditions of smoking behavior, mobile phone playing behavior, smoking behavior and mobile phone playing behavior existing at the same time and no abnormal behavior exist in terms of personnel behavior.

In one specific embodiment, 996 raw monitoring images including smoking behavior, cell phone playing behavior were acquired. The number of the monitoring image samples containing the mobile phone playing behaviors is 445, and the number of the monitoring image samples containing the smoking behaviors is 551.

As shown in fig. 2, the collected monitoring image is analyzed by using a bone key point detection algorithm, and a bone key point of each person in the image is obtained. And judging whether the skeleton key points meet the requirements by using a scale criterion, and eliminating staff with small scale in the scene. Specifically, if all the wrist key points of the person are missing, or the object size of the person is too small, the person is rejected.

The bone key point detection algorithm belongs to replaceable components, and the advanced algorithm in the industry, such as OpenPose, AlphaPose and the like, is adopted according to the actual situation, and is not limited here.

In one specific embodiment, the bone key point detection algorithm employs an openpos algorithm pre-trained on the COCO dataset, as shown in fig. 3. In fig. 3, the openpos algorithm detects 18 skeletal key points of the human body, wherein the main skeletal key points used are the left-hand wrist key point 4 and the right-hand wrist key point 7. And after the key points are obtained, filtering the key points by using scale analysis.

The method specifically comprises the following steps: when the skeletal key points extracted to the sample object do not contain the key points 4 and do not contain the key points 7, the left hand and the right hand of the sample object are considered to be not detected, and at the moment, the skeletal key points of the person cannot provide information for subsequent processing and are discarded; when the ratio of the maximum side length of the minimum bounding rectangle of the bone key points of the sample object to the image width is less than a given threshold value of 0.1, the bone key points of the sample object are considered to have detection errors or the scale of the sample object in the image is too small, and the sample object is discarded.

And acquiring image areas near the key points of the left hand and the right hand wrist of each person in the image as hand area image samples according to the acquired bone key points, and taking the images as input images for subsequent behavior recognition. The size of the hand region image sample is adjusted according to the resolution of the monitored image, and the aspect ratio of the region is generally set to 1: 1. of course, the aspect ratio of the region may be set to other values, and is not limited herein. The purpose of this step is to filter out most of the background in the image, thereby improving the recognition accuracy of the cigarette and the mobile phone.

In one specific embodiment, the resolution of the monitored image is 1366 × 768, as shown in fig. 3, a square area is cut out with the key points 4 and 7 of the bones as the center and the side length 150 as the side length, and the square area is used as the hand area image sample.

Further, labeling the extracted hand area image sample, wherein the labeling targets are cigarettes and mobile phones, and the labeling content comprises detection frames of the cigarettes or the mobile phones and corresponding object labels. And after the labeling is finished, dividing the hand area image sample into a positive sample and a negative sample according to the existence of the target object. And dividing the positive and negative samples into a training set and a verification set according to the ratio of 8: 2.

And a data enhancement method is adopted to expand the training set, so that the sample capacity and the abundance of the training set are improved. The data enhancement method may include, but is not limited to, image brightness variation, image flipping, image rotation, image scaling, and Mosaic enhancement method.

In a specific embodiment, the sample is expanded by adopting HSV color threshold transformation, image inversion and Mosaic transformation, and the training set is expanded by 4 times.

S402, inputting all hand region image samples into an initial neural network model for training, and reserving a deep learning network model obtained by each training;

specifically, an initial neural network model is built based on a deep learning algorithm, and the model identification category comprises two categories of cigarettes and mobile phones. The specific algorithm can adopt the advanced target detection algorithm in the industry according to the actual situation, including but not limited to YOLO, Faster-RCNN, SSD, and the detection algorithm based on Transformer, etc.

And training the constructed model based on the hand region image. Specifically, a proper model training optimization method is selected, and a loss function of target detection and hyper-parameters such as initial learning rate, batch size and iteration times are designed. Wherein the batch size is the number of samples selected before adjusting the learning parameters for one iteration each time. If the algorithm for constructing the model involves a preset box, the size of the preset box anchors is set. And starting model training, and storing the weight file generated in each iteration process and the deep learning network model corresponding to the weight file.

And S403, performing index evaluation on each deep learning network model, and selecting the deep learning network model with the highest index value as the behavior recognition model.

When the method is specifically implemented, firstly, models with high initial loss values in training are eliminated, then, the weight files of the rest models are evaluated by utilizing the mAP 0.5 index, and the deep learning network model corresponding to the optimal weight file is selected as a behavior recognition model. The accuracy of the recognition result of the behavior recognition model is improved by performing index evaluation on each deep learning network model and selecting the deep learning network model with the highest recognition accuracy as the behavior recognition model.

In one specific embodiment, YOLOv4 is used as the initial neural network model, and the backbone network is CSP-DarkNet 53. YOLOv4 trained to use 9 sets of preset boxes of sizes (12, 16), (19, 36), (40, 28), (36, 75), (76, 55), (72, 146), (142, 140), (192, 243), (459, 401), respectively.

The loss function of the regression part of the model detection box adopts DIoU, and is as follows:

wherein L is_DIoUIs the DIoU loss function, b is the detection box, b^gtFor the labeling box, ρ represents the euclidean distance between the detection box and the center of the labeling box.

The loss function of the model category judgment part adopts a softmax cross entropy loss function, which is as follows:

wherein S is a softmax function, y_iFor the output of category i, since the method involves categories of cigarette and mobile phone, n is 2; h (p, q) is a cross entropy loss function, where p (x) is a desired classification output of the image x, i.e., a one-hot vector of the class label, and the vector q (x) is a vector of the output of the network for the image x after softmax calculation.

The Adam function is adopted in the model training optimization method, and the momentum is set to be 0.9. The training adopts a freezing and unfreezing training mode, firstly, the backbone network is frozen for 50 periods, the batch size is set to be 32, and the initial learning rate is 0.001; then, the backbone network is unfrozen on the basis, the period of training is 150, the batch size is set to be 8, and the initial learning rate is set to be 0.0001. From the unfreezing training, the model weight file for each cycle is saved.

And finally, carrying out model evaluation. The method specifically comprises the following steps: firstly, screening 5 weight files with the lowest loss through the loss values corresponding to the weight files; and then, selecting a deep learning network model corresponding to the weight file with the highest index value as a behavior recognition model by using the mAP 0.5 index evaluation model.

if cigarettes and/or a mobile phone are identified in the hand area image of the target object, determining that the target object has abnormal behaviors; and if the cigarettes and the mobile phone are not identified in the hand area image of the target object, determining that the target object has no abnormal behavior.

Specifically, if a cigarette is identified in the hand area image of the target object, determining that the target object has smoking behavior; if a mobile phone is identified in the hand area image of the target object, determining that a mobile phone playing behavior exists in the target object; if the cigarettes and the mobile phone are identified in the hand area image of the target object at the same time, determining that the target object has a smoking behavior and a mobile phone playing behavior at the same time; and if the cigarettes and the mobile phone are not identified in the hand area image of the target object, determining that the target object has no abnormal behavior.

Referring to fig. 5, an alarm processing diagram of a behavior recognition method according to an embodiment of the present disclosure is shown. As shown in fig. 5, the method further comprises:

in order to improve the robustness of the recognition result of the behavior recognition model, the scheme performs behavior recognition on the images of the continuous preset number, and then determines whether to perform alarm processing according to the ratio of the number of the frames with abnormal behaviors to the preset number.

Specifically, a video to be detected is input, behavior recognition is performed on a current frame image of the video, a behavior recognition result of the current frame image is obtained, and the behavior recognition result is stored. If the target object in the current frame image has smoking behavior, mobile phone playing behavior or both smoking behavior and mobile phone playing behavior, the frame image has abnormal behavior; otherwise, no abnormal behavior exists. And counting the behavior recognition result of the N frames of images which are cut into the current frame of image.

In a specific embodiment, the value of N is set to 100, and the value of N may also be flexibly set according to practical applications, and is not limited herein.

Specifically, if the ratio of the number of frames with abnormal behavior to N is greater than a threshold value alpha, an alarm signal is sent out; otherwise, no alarm is given. The robustness of the behavior recognition method can be improved by adopting a statistical method to carry out alarm processing.

in order to improve the recognition accuracy of the behavior recognition model, the object to be detected which cannot provide skeleton key point information and has an excessively small size in the image frame needs to be deleted, and the object to be detected which has a proper size and a complete skeleton key point needs to be reserved.

In a specific embodiment, if the skeletal key point of the object to be measured lacks any one of the wrist key point, the elbow key point and the shoulder key point of the right hand, and lacks any one of the wrist key point, the elbow key point and the shoulder key point of the left hand, the object to be measured is not processed.

And calculating the ratio of the maximum side length of the circumscribed rectangle of the skeleton key points of the reserved object to be detected to the image frame width, and taking the ratio as the object dimension of the object to be detected in the image frame. And calculating and counting the object dimensions of all the objects to be measured, and setting a dimension corresponding value which is 5% of the minimum object dimension as a dimension threshold value. And screening the object to be detected with the actual object dimension larger than or equal to the dimension threshold value as a complete object to be detected.

In a specific embodiment, the maximum arm included angle of the movement of the object to be measured, which includes the smoking behavior and the cell phone playing behavior, is firstly counted in the collected image data, and the included angle threshold is set to be about 1.2 times of the arm included angle, where the included angle threshold is 60 degrees in this embodiment. Of course, in other embodiments, other angles may be provided, and are not limited herein. If the included angle of the left hand or right hand arm of the object to be detected is smaller than the included angle threshold value, judging that the object to be detected is a target object, and acquiring a hand area image corresponding to the object to be detected for subsequent processing; otherwise, the object to be detected is considered to have no abnormal behavior.

Generally, when a target object has a mobile phone playing behavior or a smoking behavior, a corresponding mobile phone or cigarette appears in an image of an area near a wrist key point, so that a square area image with the wrist key point as a center is captured as a hand area image and input into a behavior recognition model for recognition.

In one specific embodiment, the resolution of the monitored image is 1366 × 768 and the side length of the hand region image is set to 150. Of course, in specific implementation, different preset side lengths may be set according to specific situations, and are not limited herein.

By combining the analysis of the skeletal key points and the target recognition of the hand area image, the real-time recognition of smoking and playing mobile phone behaviors of the personnel in the monitoring image can be realized. By carrying out cigarette mobile phone identification on the hand region image, the background interference can be greatly reduced, and the identification accuracy is improved.

Example 2

Referring to fig. 6, a block diagram of a behavior recognition apparatus according to an embodiment of the present disclosure is provided. As shown in fig. 6, the behavior recognizing apparatus 600 includes:

the extraction module 601 is configured to extract skeletal key points of each object to be detected in the current frame image, where the skeletal key points at least include wrist key points, elbow key points, and shoulder key points;

the searching module 602 is configured to search, according to the bone key points, a target object with an arm included angle smaller than an included angle threshold from all objects to be detected, where the arm included angle is an included angle between a vector formed by the elbow key point and the shoulder key point and a vector formed by the elbow key point and the wrist key point;

a first obtaining module 603, configured to obtain a hand region image of the target object, where the hand region image is an image including a wrist key point and an adjacent region;

the identifying module 604 is configured to identify whether the hand region image of each target object includes a target object by using a pre-trained behavior identification model, and output a behavior identification result corresponding to each target object in the current frame image, where the target object includes at least one of a cigarette and a mobile phone, the behavior identification result includes an abnormal behavior and an abnormal behavior, and the abnormal behavior includes at least one of a smoking behavior and a mobile phone playing behavior.

On the basis of the above embodiment, an implementation manner of the present disclosure further provides a partial block diagram of a behavior recognition device. As shown in fig. 7, the behavior recognition apparatus 600 further includes:

a second obtaining module 605, configured to obtain hand region image samples, where the hand region image samples are divided into a first type image sample in which a wrist key point peripheral region includes a target object and a second type image sample in which the wrist key point peripheral region does not include the target object;

the training module 606 is used for inputting all hand region image samples into the initial neural network model for training, and reserving a deep learning network model obtained by each training;

and the selection module 607 is configured to perform index evaluation on each deep learning network model, and select the deep learning network model with the highest index value as the behavior recognition model.

In summary, the behavior recognition device provided by the embodiment of the present disclosure can realize real-time recognition of smoking and playing mobile phone behaviors of people in the monitored image by combining bone key point analysis and hand region image target recognition; by carrying out cigarette mobile phone identification on the hand region image, the background interference can be greatly reduced, and the identification accuracy is improved. The specific implementation process of the behavior recognition device provided in the embodiments shown in fig. 1, fig. 2, and fig. 4 can be referred to as the specific implementation process of the behavior recognition method provided in the embodiments, and details are not repeated here.

Furthermore, a computer device is provided in an embodiment of the present disclosure, which includes a memory and a processor, where the memory stores a computer program, and the computer program executes the behavior recognition method shown in fig. 1 and fig. 2 when running on the processor.

In addition, the embodiment of the present disclosure also provides a computer-readable storage medium, which stores a computer program that executes the behavior recognition method shown in fig. 1 and 2 when the computer program runs on a processor.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims

1. A method of behavior recognition, the method comprising:

2. The behavior recognition method according to claim 1, wherein the behavior recognition model is obtained in a manner that includes:

3. The behavior recognition method according to claim 1, wherein the step of recognizing whether the hand region image of each target object contains a target object by using the pre-trained behavior recognition model and outputting the behavior recognition result corresponding to each target object in the current frame image comprises:

4. The behavior recognition method according to claim 3, characterized in that the method further comprises:

5. The behavior recognition method according to claim 1, wherein the step of searching for the target object with the included arm angle smaller than the included angle threshold from all the objects to be detected according to the skeletal key points comprises:

6. The behavior recognition method according to claim 1, wherein the step of acquiring the hand region image of the target object includes:

7. An apparatus for behavior recognition, the apparatus comprising:

8. The behavior recognition device according to claim 7, characterized in that the device further comprises:

the second acquisition module is used for acquiring hand area image samples, wherein the hand area sample images are divided into first type image samples of which wrist key point peripheral areas contain the target object and second type image samples of which wrist key point peripheral areas do not contain the target object;

9. A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when run on the processor, performs the behavior recognition method of any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the behavior recognition method according to any one of claims 1 to 6.