CN112052815B - Behavior detection method and device and electronic equipment - Google Patents


Info

Publication number
CN112052815B
Authority
CN
China
Prior art keywords
face
image data
face image
model
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010965429.XA
Other languages
Chinese (zh)
Other versions
CN112052815A (en)
Inventor
史晓蒙
韩晴
张星
宋征
李高杨
魏健康
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Hualu Group Co Ltd
Beijing E Hualu Information Technology Co Ltd
Original Assignee
China Hualu Group Co Ltd
Beijing E Hualu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Hualu Group Co Ltd, Beijing E Hualu Information Technology Co Ltd filed Critical China Hualu Group Co Ltd
Priority to CN202010965429.XA priority Critical patent/CN112052815B/en
Publication of CN112052815A publication Critical patent/CN112052815A/en
Application granted granted Critical
Publication of CN112052815B publication Critical patent/CN112052815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior detection method, a behavior detection device and electronic equipment. The method comprises the following steps: acquiring video data of a target monitoring area; performing face detection on image data in the video data by using a pre-trained face recognition model; performing expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is located; and performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model. By detecting only the area where a face is located, false detection of pedestrians who merely hold a cigarette or mobile phone away from the face is avoided. Detecting the expanded frame selection area allows illegal objects within it to be identified, and because the expanded area can also contain a hand holding a mobile phone or cigarette end, the relative positions of the hand and the head area can further assist the detection of illegal actions, improving the accuracy of the detection results.

Description

Behavior detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of safety control, in particular to a behavior detection method and device and electronic equipment.
Background
In some important monitoring scenes in fields such as security supervision and urban management, behaviors such as making phone calls and smoking are not allowed — for example at gas stations — so such behaviors need to be detected in real time, and an early warning needs to be issued promptly when they are detected.
In the related art, detection methods for such behaviors identify illegal behaviors such as smoking and making a phone call by detecting the texture features, motion features or shape features of objects contained in an image. These methods are limited by the field of view and the shooting angle of the image. For example, when the smoking or calling person is far from the lens, the cigarette end occupies only a few pixels in the image and the texture and motion features of the smoke are not obvious, so misjudgment is likely. When a pedestrian holds a mobile phone to make a call, part of the phone's pixels may be occluded, so judgments based on texture and shape features are unreliable; moreover, even when a mobile phone is detected, the pedestrian may not actually be making a call but merely holding the phone, which easily causes false detection. Therefore, a new behavior detection method is needed to detect behaviors such as smoking and calling so as to improve the accuracy of detection results.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the defect of poor accuracy of the existing detection mode, so as to provide a behavior detection method, a behavior detection device and electronic equipment.
According to a first aspect, an embodiment of the present invention discloses a behavior detection method, including: acquiring video data of a target monitoring area; performing face detection on image data in the video data by using a pre-trained face recognition model; performing expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is located; and performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, wherein the behavior detection model is obtained by training on face image data of illegal actions that includes the whole head area.
Optionally, the face recognition model is trained by the following method: acquiring face image data in a plurality of actual scenes, and constructing a first face image dataset, wherein the face image data in each scene contains images in different time periods; performing data enhancement processing on the images in the first face image data set to obtain a first face image data set with an enlarged data volume; and performing model training on the first target network model by using the first face image data set with the enlarged data volume to obtain the face recognition model.
Optionally, the first target network model includes a RetinaFace face detection model.
Optionally, the behavior detection model is trained by: obtaining face image data with illegal actions and face image data without illegal actions in a plurality of actual scenes, and constructing a second face image dataset, wherein the face image data in each scene contains images from different time periods; performing expansion processing on the frame selection area corresponding to the face image data in the second face image dataset, so that the expanded frame selection area is larger than the whole head area where the face is located; performing data enhancement processing on the images in the expanded second face image dataset to obtain a second face image dataset with enlarged data volume; and performing model training on a second target network model by using the second face image dataset with enlarged data volume until the loss value of the target loss function meets a preset condition, to obtain the behavior detection model.
Optionally, the second target network model includes: an OSNet network model, wherein an attention mechanism is added between a feature layer and a fully connected layer of the OSNet network model; the target loss function includes: the NLLLoss (negative log-likelihood) loss function.
Optionally, the model training of the second target network model using the second face image dataset of the enlarged data volume includes: dividing the second face image data set with the enlarged data volume according to a target proportion to obtain a training set, a verification set and a test set; and performing model training on the second target network model by using the training set.
Optionally, before the face detection is performed on the image data in the video data by using the pre-trained face recognition model, the method further includes: and performing frame extraction processing on the video data of the acquired target monitoring area according to the target interval time to obtain image data needing to be subjected to face detection.
According to a second aspect, an embodiment of the present invention further discloses a behavior detection apparatus, including: an acquisition module for acquiring video data of the target monitoring area; a first detection module for performing face detection on the image data in the video data by using a pre-trained face recognition model; an expansion module for performing expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is located; and a second detection module for performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, where the behavior detection model is obtained by training on face image data of illegal actions that includes the whole head area.
According to a third aspect, an embodiment of the present invention further discloses an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the behavior detection method of the first aspect or any alternative implementation of the first aspect.
According to a fourth aspect, an embodiment of the present invention also discloses a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the behavior detection method according to the first aspect or any of the alternative embodiments of the first aspect.
The technical scheme of the invention has the following advantages:
according to the behavior detection method/device provided by the invention, video data of the target monitoring area is obtained, a pre-trained face recognition model is used to perform face detection on the image data in the video data, and the frame selection area corresponding to the detected face data is expanded so that it is larger than the whole head area where the face is located; a pre-trained behavior detection model, trained on face image data of illegal actions that includes the whole head area, is then used to perform behavior detection on the expanded frame selection area to obtain a detection result. Compared with the prior art, in which illegal actions are detected only through illegal objects (such as cigarettes or mobile phones) in the images, the invention recognizes the face and detects only the region where the face is located, avoiding false detection of pedestrians who merely hold a cigarette or mobile phone away from the face. At the same time, the region where the recognized face is located is expanded so that the frame selection region is larger than the whole head region. Detecting this expanded region allows illegal actions to be judged by recognizing illegal objects within it, and because the expanded region can also contain a hand holding a mobile phone or cigarette end, the relative positions of the hand and the head region can further assist the detection of smoking and calling actions, improving the accuracy of the detection results of illegal actions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a specific example of a behavior detection method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a specific example of a behavior detection apparatus according to an embodiment of the present invention;
fig. 3 is a diagram illustrating an embodiment of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; the two components can be directly connected or indirectly connected through an intermediate medium, or can be communicated inside the two components, or can be connected wirelessly or in a wired way. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The embodiment of the invention discloses a behavior detection method, as shown in fig. 1, which comprises the following steps:
step 101, obtaining video data of a target monitoring area.
The video data of the target monitoring area can be acquired in real time by image acquisition equipment arranged at any place where illegal action detection is required. The type of the target monitoring area and the video data acquisition mode are not limited in the embodiment of the application.
And 102, performing face detection on the image data in the video data by using a pre-trained face recognition model.
The pre-trained face recognition model may be any model capable of recognizing a video image, and the type of the face recognition model is not limited in the embodiment of the present application. In order to improve generalization capability of a face recognition model and improve accuracy of model recognition, the face recognition model in the embodiment of the application can be obtained through training in the following manner:
firstly, acquiring face image data in a plurality of actual scenes, and constructing a first face image dataset, wherein the face image data in each scene contains images in different time periods;
the actual scene may be any of a plurality of different monitoring scenes requiring detection of illegal actions such as smoking, making a call, etc., and the first face image dataset is constructed for subsequent model training by acquiring face image data of different time periods under the plurality of actual monitoring scenes. In order to ensure the training effect of the model, the acquired face image data needs to be clear and easy to recognize, and in the embodiment of the application, the pixel value of the acquired face image data is required to be greater than 60 x 60.
Secondly, carrying out data enhancement processing on the images in the first face image data set to obtain a first face image data set with enlarged data volume;
by way of example, the data enhancement mode for the image may include random vertical flip, random noise, random chromaticity variation, random rotation, etc., and the embodiment of the present application does not limit the data enhancement mode, and those skilled in the art may determine according to actual needs. The number of images contained in the first face image dataset is further enlarged by performing data enhancement processing on the images.
And thirdly, performing model training on the first target network model by using the first face image dataset with the enlarged data volume to obtain the face recognition model.
Illustratively, the first face image dataset with enlarged data volume is calibrated by manually marking key feature points of the face (such as the facial features) and the corresponding frame selection areas. The calibrated images are then used to train the first target network model to obtain the face recognition model.
The RetinaFace detection model has a high detection speed — about 7 milliseconds per image when using MobileNet as the backbone — and high detection accuracy, with evaluated mAP of 90.4% and 82.5% on the WiderFace test sets. In the embodiment of the application, the RetinaFace face detection model is therefore preferably selected as the first target network model. Compared with detecting subsequent illegal behaviors over the whole pedestrian frame selection area using general object detection models such as YOLO and SSD, using the face recognition model to obtain a frame selection area containing the face, and detecting only that area, effectively reduces interference from useless information, concentrates the focus of the illegal behavior detection model on positions such as the cigarette end and mobile phone, and improves the accuracy of subsequent classification.
And 103, performing expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is positioned.
When the face recognition model detects a face area in an image and marks it with a frame selection, the frame selection area corresponding to the face is obtained. This area is then expanded so that it is larger than the whole head area where the face is located. The expansion factor can be determined according to actual needs — for example, 1.1 to 1.5 times the original frame selection area — so that after expansion the area to be identified contains the whole head area. Moreover, if the image contains an illegal behavior such as smoking or calling, the person's hand is necessarily close to the head, so the expanded area also covers the hand area. This enriches the reference object types for identifying illegal behaviors and improves the accuracy of the detection results.
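The expansion step above can be sketched as scaling the box about its centre and clamping to the image. This is an illustrative implementation; the `(x1, y1, x2, y2)` convention and equal scaling of both axes are assumptions.

```python
def expand_box(box, scale, img_w, img_h):
    """Expand a face box (x1, y1, x2, y2) about its centre so that it
    covers the whole head. `scale` is the expansion factor, which the
    patent suggests choosing in the 1.1-1.5 range; the result is
    clamped to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))
```

For example, a 20×20 face box at the centre of the image expands to 30×30 with `scale=1.5`, while a box touching the image corner is clamped so it never leaves the frame.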
And 104, performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, wherein the behavior detection model is obtained by training on face image data of illegal actions that includes the whole head area.
The pre-trained behavior detection model may be obtained by training any neural network structure capable of detecting illegal behaviors, and the type of the neural network structure is not limited in the embodiment of the present application. In order to improve the accuracy of the detection results, the behavior detection model in the embodiment of the application can be obtained through training in the following manner:
firstly, obtaining face image data of illegal actions and face image data of illegal actions in a plurality of actual scenes, and constructing a second face image data set, wherein the face image data in each scene contains images in different time periods;
the method comprises the steps of collecting pedestrian images of actual application scenes, collecting illegal action images and illegal action images from multiple scenes, multiple time periods and multiple angles, such as pedestrian smoking, phone calling images, non-smoking and non-phone calling images, identifying and labeling the illegal action images and the illegal action images by using a face recognition model, wherein labeling results comprise facial five sense organs and frame selection areas corresponding to faces. Wherein the face recognition model is preferably the face recognition model described in the above embodiments. In order to ensure the model training effect, the acquired image data needs to be clear and easy to recognize, and the pixel value of the image data selected in the embodiment of the application is greater than 60×60.
Secondly, expanding the frame selection area corresponding to the face image data in the second face image data set, so that the expanded frame selection area is larger than the whole head area where the face is located; the specific extension processing manner is referred to the above embodiments, and will not be described herein.
And performing data enhancement processing on the images in the expanded second face image data set to obtain a second face image data set with expanded data volume. The manner of the data enhancement process refers to the above embodiment, and is not described herein.
And thirdly, performing model training on the second target network model by using the second face image data set with the enlarged data volume until the loss value of the target loss function meets the preset condition, so as to obtain the behavior detection model.
Class labels are calibrated according to the type of illegal behavior to be detected; for detecting smoking and calling in the monitoring scene, the class labels can include smoking, calling and normal. The second target network model is trained using the calibrated images to obtain the corresponding behavior detection model.
Taking the detection of smoking and calling actions of pedestrians as an example: because objects such as cigarette ends and mobile phones occupy too few pixels, detection models aimed at a single target (such as a cigarette or mobile phone) are unsuitable, and smoking and calling can only be distinguished in local areas, so a classification model focusing on local fine-grained features needs to be designed. In addition, because the smoking and calling categories differ only slightly, the model needs to be relatively simple to avoid overfitting. After comparing existing fine-grained classification methods and running experiments, the embodiment of the application adds an attention mechanism so that the model attends to fine-grained features, and uses the NLLLoss loss function with label smoothing, together with the ReLU activation function, to prevent overfitting and improve the generalization capability of the model. The lightweight OSNet network model is used as the main body, and the added attention mechanism addresses the problem of distinguishing local fine-grained features, so that calling and smoking behaviors can be distinguished accurately.
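The label-smoothed NLL loss described above can be written out explicitly. This is a minimal NumPy sketch over the three classes (smoking, calling, normal); the smoothing factor `eps=0.1` is an assumed value, not one stated in the patent.

```python
import numpy as np

def log_softmax(logits):
    # numerically stable log-softmax over the last axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def smoothed_nll_loss(logits, targets, num_classes=3, eps=0.1):
    """NLL loss with label smoothing. `logits` is (N, C); `targets`
    is (N,) integer class indices. The true class gets probability
    1-eps; the remaining eps is spread over the other classes."""
    logp = log_softmax(logits)
    n = logits.shape[0]
    dist = np.full((n, num_classes), eps / (num_classes - 1))
    dist[np.arange(n), targets] = 1.0 - eps
    return float(-(dist * logp).sum(axis=1).mean())
```

With uniform logits the loss equals log 3 regardless of smoothing, since the smoothed target distribution still sums to 1; confident correct predictions drive the loss toward its smoothed floor rather than zero, which is what discourages overfitting.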
In the process of model training and detection based on the OSNet network model, 224×224 three-channel images are used as network input, an effective feature layer is extracted from the backbone feature extraction network of OSNet, an attention mechanism is added after the feature layer, and after a fully connected operation is performed on the feature layer combined with the attention mechanism, the output layer produces the image classification result and the corresponding confidence.
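The patent does not specify the exact form of the attention mechanism; one plausible reading is a squeeze-and-excitation-style channel attention applied to the feature map before the fully connected layer. The sketch below is that assumption written in NumPy; the weight shapes and the reduction ratio are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, b1, w2, b2):
    """Squeeze-and-excitation-style channel attention on a (C, H, W)
    feature map: global average pool -> ReLU bottleneck -> sigmoid
    gates -> per-channel rescaling. All weights are assumed shapes:
    w1 is (C//r, C), w2 is (C, C//r)."""
    c = features.shape[0]
    squeezed = features.reshape(c, -1).mean(axis=1)   # global average pool
    hidden = np.maximum(0.0, w1 @ squeezed + b1)      # ReLU bottleneck
    gates = sigmoid(w2 @ hidden + b2)                 # per-channel gates in (0, 1)
    return features * gates[:, None, None]
```

The gated feature map would then be flattened and passed through the fully connected layer to produce the three-way classification and its confidence.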
The result of detecting the collected image data with the behavior detection model not only includes the illegal behavior detection result, but can also be combined with the ID of the device that collected the image and the ID number of the image to construct an output result dictionary, so that early warning processing can be carried out promptly and accurately.
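The output-result dictionary described above might look like the following. The key names and the alert rule are assumptions for illustration; the patent only specifies that the detection result is combined with the device ID and image ID.

```python
def build_result(device_id, image_id, label, confidence):
    """Assemble an output-result dictionary: behaviour label and
    confidence, combined with the capture device ID and image ID.
    Key names are assumed, not taken from the patent."""
    return {
        "device_id": device_id,
        "image_id": image_id,
        "behavior": label,           # e.g. "smoking", "calling", "normal"
        "confidence": float(confidence),
        "alert": label != "normal",  # raise early warning for violations
    }
```

A downstream early-warning service would consume such records and notify operators whenever `alert` is true.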
According to the behavior detection method provided by the embodiment of the application, video data of the target monitoring area is obtained, a pre-trained face recognition model is used to perform face detection on the image data in the video data, and the frame selection area corresponding to the detected face data is expanded so that it is larger than the whole head area where the face is located; a pre-trained behavior detection model, trained on face image data of illegal actions that includes the whole head area, is then used to perform behavior detection on the expanded frame selection area to obtain a detection result. Compared with the prior art, in which illegal actions are detected only through illegal objects (such as cigarettes or mobile phones) in the images, the method recognizes the face and detects only the region where the face is located, avoiding false detection of pedestrians who merely hold a cigarette or mobile phone away from the face. At the same time, the region where the recognized face is located is expanded so that the frame selection region is larger than the whole head region. Detecting this expanded region allows illegal actions to be judged by recognizing illegal objects within it, and because the expanded region can also contain a hand holding a mobile phone or cigarette end, the relative positions of the hand and the head region can further assist the detection of smoking and calling actions, improving the accuracy of the detection results of illegal actions.
As an optional embodiment of the invention, the training of the model of the second target network by using the second face image dataset of the enlarged data volume includes: dividing the second face image data set with the enlarged data volume according to a target proportion to obtain a training set, a verification set and a test set; and performing model training on the second target network model by using the training set.
In order to guarantee the model training result, the second face image dataset obtained during training of the second target network model may be divided into a training set, a verification set and a test set. Model training is performed on the training set, and the accuracy of the trained model is then tested with the verification set and the test set, so as to guarantee the detection accuracy of the model in actual use.
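The split by target proportion can be sketched as follows. The 8:1:1 ratio is an assumed example — the patent leaves the target proportion unspecified.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split samples into training/validation/test sets
    according to a target proportion. The 8:1:1 default is an assumed
    example; the seed makes the split reproducible."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Shuffling before splitting matters here because the dataset mixes scenes and time periods; without it, one scene could end up entirely in the test set.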
As an optional embodiment of the present invention, before step 102, the method further includes: and performing frame extraction processing on the video data of the acquired target monitoring area according to the target interval time to obtain image data needing to be subjected to face detection.
The target interval time can be determined according to the real-time requirement of the detection result. It can be on the order of seconds — for example, performing frame extraction on the video data every 1 second — to obtain the image data on which face detection is to be performed, so that the detection result meets the real-time requirement while video memory occupation is reduced.
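The frame-extraction step reduces to choosing which frame indices to keep given the stream's frame rate. This is an illustrative helper; in a real pipeline the indices would drive reads from something like OpenCV's `VideoCapture`.

```python
def frame_indices(total_frames, fps, interval_seconds=1.0):
    """Indices of the frames to keep when sampling a video at the
    target interval (one frame per second by default, as in the
    example in the text)."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))
```

At 25 fps with a 1-second interval, only every 25th frame is decoded and passed to face detection, which is what keeps memory and compute bounded.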
Through the illegal action detection method provided by the embodiment of the application, rapid and efficient detection of the smoking and calling actions of pedestrians in various scenes can be achieved: the test speed on a Tesla P4 can reach 0.33 s, and the accuracy rate across multiple scenes and multiple angles can reach 95.6%.
The embodiment of the invention also discloses a behavior detection device, as shown in fig. 2, which comprises:
an acquisition module 201, configured to acquire video data of a target monitoring area;
a first detection module 202, configured to perform face detection on image data in the video data by using a pre-trained face recognition model;
the expansion module 203 is configured to perform expansion processing on a frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the entire head area where the face is located;
and the second detection module 204 is configured to perform behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, where the behavior detection model is obtained by training on face image data of illegal actions that include the whole head area.
According to the behavior detection device provided by the invention, video data of a target monitoring area are acquired; face detection is performed on image data in the video data by using a pre-trained face recognition model; the frame selection area corresponding to the detected face data is expanded so that the expanded frame selection area is larger than the whole head area where the face is located; and behavior detection is performed on the expanded frame selection area by using a pre-trained behavior detection model, which is obtained by training on face image data of illegal actions that include the whole head area, so as to obtain a detection result. Compared with the prior art, in which illegal actions are detected only through illegal objects (such as cigarettes or mobile phones) in the images, the device first recognizes the face and then detects the region where the face is located, thereby avoiding false detection of pedestrians who hold cigarettes or mobile phones away from the face. Meanwhile, the region where the recognized face is located is expanded so that the expanded frame selection region is larger than the whole head region, and detection is performed on the expanded frame selection region. An illegal action can be judged by recognizing an illegal object in the frame selection region, and because the expanded region can also contain the hand holding the mobile phone or the cigarette end, the relative positions of the hand and the head region can further assist the detection of smoking and calling actions, thereby improving the accuracy of the detection result of illegal actions.
As an alternative embodiment of the present invention, the apparatus further comprises: the first model training module is used for acquiring face image data in a plurality of actual scenes and constructing a first face image data set, wherein the face image data in each scene contains images in different time periods; performing data enhancement processing on the images in the first face image data set to obtain a first face image data set with an enlarged data volume; and performing model training on the first target network model by using the first face image data set with the enlarged data volume to obtain the face recognition model.
As an optional embodiment of the present invention, the first target network model includes a RetinaFace face detection model.
As an alternative embodiment of the present invention, the apparatus further comprises: a second model training module, configured to acquire face image data of illegal actions and face image data of non-illegal actions in a plurality of actual scenes and construct a second face image data set, where the face image data in each scene contain images from different time periods; perform expansion processing on the frame selection areas corresponding to the face image data in the second face image data set, so that the expanded frame selection areas are larger than the whole head areas where the faces are located; perform data enhancement processing on the images in the expanded second face image data set to obtain a second face image data set with an enlarged data volume; and perform model training on a second target network model by using the second face image data set with the enlarged data volume until the loss value of the target loss function meets a preset condition, so as to obtain the behavior detection model.
As an optional embodiment of the present invention, the second target network model includes: an OSNet network model, where an attention mechanism is added between the feature layer and the fully connected layer of the OSNet network model; the target loss function includes: the NLLLoss loss function.
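The head described above — an attention re-weighting between the feature layer and the fully connected layer, trained with the negative log-likelihood (NLLLoss) criterion on log-softmax outputs — can be sketched numerically as follows. This is a plain-Python illustration under assumed toy weights; an actual OSNet-based model would be built with a framework such as PyTorch (`LogSoftmax` followed by `nn.NLLLoss`).

```python
# Sketch: channel-attention gate between features and the FC layer,
# followed by the NLLLoss criterion on log-softmax outputs.
import math

def attention(features):
    """Sigmoid gate per channel; here computed from the features
    themselves for illustration (a learned gate in practice)."""
    gates = [1.0 / (1.0 + math.exp(-f)) for f in features]
    return [f * g for f, g in zip(features, gates)]

def fully_connected(features, weights, bias):
    """One linear layer mapping features to class scores (logits)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def nll_loss(logits, target):
    """Negative log-likelihood of the target class under log-softmax."""
    m = max(logits)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[target] - log_z)

# toy forward pass with assumed weights: 3 features -> 2 classes
feats = attention([0.5, -1.0, 2.0])
logits = fully_connected(feats, [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]], [0.0, 0.0])
loss = nll_loss(logits, target=0)
```

The loss decreases as the logit of the correct class grows relative to the others, which is what the training loop drives toward until the preset condition on the loss value is met.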
As an optional implementation manner of the present invention, the second model training module is further configured to divide the second face image dataset with the enlarged data volume according to a target proportion, so as to obtain a training set, a verification set and a test set; and performing model training on the second target network model by using the training set.
As an optional implementation manner of the present invention, the first detection module is further configured to perform frame extraction processing on the video data acquired from the target monitoring area according to the target interval time, so as to obtain image data that needs to be subjected to face detection.
The embodiment of the present invention further provides an electronic device, as shown in fig. 3, which may include a processor 401 and a memory 402, where the processor 401 and the memory 402 may be connected by a bus or other means, and in fig. 3, the connection is exemplified by a bus.
The processor 401 may be a central processing unit (Central Processing Unit, CPU). The processor 401 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 402 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the behavior detection method in the embodiment of the present invention. The processor 401 executes various functional applications of the processor and data processing, i.e., implements the behavior detection method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 402.
Memory 402 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 401, or the like. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, such remote memory being connectable to processor 401 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 402 and when executed by the processor 401 perform the behavior detection method in the embodiment shown in fig. 1.
The specific details of the electronic device may be understood correspondingly with respect to the corresponding related descriptions and effects in the embodiment shown in fig. 1, which are not repeated herein.
It will be appreciated by those skilled in the art that all or part of the flow of the above-described method embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flow of the above-described method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of the above types of memory.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (8)

1. A behavior detection method, comprising:
acquiring video data of a target monitoring area;
performing face detection on image data in the video data by using a pre-trained face recognition model;
performing expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is positioned;
performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, and constructing an output result dictionary by combining an equipment ID for collecting the image data and an ID number of the image, wherein the behavior detection model is obtained by training on face image data of illegal actions that include the whole head area;
the behavior detection model is obtained through training in the following mode: obtaining face image data of illegal actions and face image data of non-illegal actions in a plurality of actual scenes, and constructing a second face image data set, wherein the face image data in each scene contain images from different time periods; performing expansion processing on the frame selection area corresponding to the face image data in the second face image data set, so that the expanded frame selection area is larger than the whole head area where the face is located; performing data enhancement processing on the images in the expanded second face image data set to obtain a second face image data set with an enlarged data volume; and performing model training on a second target network model by using the second face image data set with the enlarged data volume until the loss value of the target loss function meets a preset condition, so as to obtain the behavior detection model;
wherein the second target network model comprises: an OSNet network model, wherein an attention mechanism is added between a feature layer and a fully connected layer of the OSNet network model; and the target loss function comprises: an NLLLoss loss function.
2. The method according to claim 1, wherein the face recognition model is trained by:
acquiring face image data in a plurality of actual scenes, and constructing a first face image dataset, wherein the face image data in each scene contains images in different time periods;
performing data enhancement processing on the images in the first face image data set to obtain a first face image data set with an enlarged data volume;
and performing model training on the first target network model by using the first face image data set with the enlarged data volume to obtain the face recognition model.
3. The method of claim 2, wherein the first target network model comprises a RetinaFace face detection model.
4. The method of claim 1, wherein model training a second target network model using the second face image dataset of enlarged data volume comprises:
dividing the second face image data set with the enlarged data volume according to a target proportion to obtain a training set, a verification set and a test set;
and performing model training on the second target network model by using the training set.
5. The method of claim 1, wherein prior to face detection of image data in the video data using a pre-trained face recognition model, the method further comprises:
and performing frame extraction processing on the video data of the acquired target monitoring area according to the target interval time to obtain image data needing to be subjected to face detection.
6. A behavior detection apparatus, characterized by comprising:
the acquisition module is used for acquiring video data of the target monitoring area;
the first detection module is used for carrying out face detection on the image data in the video data by utilizing a pre-trained face recognition model;
the expansion module is used for carrying out expansion processing on the frame selection area corresponding to the detected face data, so that the expanded frame selection area is larger than the whole head area where the face is positioned;
the second detection module is used for performing behavior detection on the expanded frame selection area by using a pre-trained behavior detection model to obtain a detection result, and constructing an output result dictionary by combining the equipment ID for collecting the image data and the ID number of the image, wherein the behavior detection model is obtained by training on face image data of illegal actions that include the whole head area; the behavior detection model is obtained through training in the following mode: obtaining face image data of illegal actions and face image data of non-illegal actions in a plurality of actual scenes, and constructing a second face image data set, wherein the face image data in each scene contain images from different time periods; performing expansion processing on the frame selection area corresponding to the face image data in the second face image data set, so that the expanded frame selection area is larger than the whole head area where the face is located; performing data enhancement processing on the images in the expanded second face image data set to obtain a second face image data set with an enlarged data volume; and performing model training on a second target network model by using the second face image data set with the enlarged data volume until the loss value of the target loss function meets a preset condition, so as to obtain the behavior detection model; wherein the second target network model comprises: an OSNet network model, wherein an attention mechanism is added between a feature layer and a fully connected layer of the OSNet network model; and the target loss function comprises: an NLLLoss loss function.
7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the behavior detection method of any one of claims 1-5.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the behavior detection method according to any one of claims 1-5.
CN202010965429.XA 2020-09-14 2020-09-14 Behavior detection method and device and electronic equipment Active CN112052815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010965429.XA CN112052815B (en) 2020-09-14 2020-09-14 Behavior detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112052815A CN112052815A (en) 2020-12-08
CN112052815B true CN112052815B (en) 2024-02-20

Family

ID=73610875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010965429.XA Active CN112052815B (en) 2020-09-14 2020-09-14 Behavior detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112052815B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836643A (en) * 2021-02-04 2021-05-25 成都国翼电子技术有限公司 Specific scene smoking and calling identification method
CN112818913B (en) * 2021-02-24 2023-04-07 西南石油大学 Real-time smoking calling identification method
CN113192646B (en) * 2021-04-25 2024-03-22 北京易华录信息技术股份有限公司 Target detection model construction method and device for monitoring distance between different targets
CN113191244A (en) * 2021-04-25 2021-07-30 上海夏数网络科技有限公司 Method for detecting driver irregular behaviors
CN113591615A (en) * 2021-07-14 2021-11-02 广州敏视数码科技有限公司 Multi-model-based driver smoking detection method
CN114170677A (en) * 2021-11-12 2022-03-11 深圳先进技术研究院 Network model training method and equipment for detecting smoking behavior

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629282A (en) * 2018-03-29 2018-10-09 福州海景科技开发有限公司 A kind of smoking detection method, storage medium and computer
JP2019040465A (en) * 2017-08-25 2019-03-14 トヨタ自動車株式会社 Behavior recognition device, learning device, and method and program
CN109815881A (en) * 2019-01-18 2019-05-28 成都旷视金智科技有限公司 Training method, the Activity recognition method, device and equipment of Activity recognition model
CN110046560A (en) * 2019-03-28 2019-07-23 青岛小鸟看看科技有限公司 A kind of dangerous driving behavior detection method and camera
WO2020024400A1 (en) * 2018-08-02 2020-02-06 平安科技(深圳)有限公司 Class monitoring method and apparatus, computer device, and storage medium
KR20200013128A (en) * 2018-07-12 2020-02-06 (주)포세듀 Algorithm for detecting engineer behavior and helping determinent for research direction through collecting and analyzing lab status data based on pupil recognition
CN110837815A (en) * 2019-11-15 2020-02-25 济宁学院 Driver state monitoring method based on convolutional neural network
CN111062319A (en) * 2019-12-16 2020-04-24 武汉极目智能技术有限公司 Driver call detection method based on active infrared image

Also Published As

Publication number Publication date
CN112052815A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
CN112052815B (en) Behavior detection method and device and electronic equipment
CN108090458B (en) Human body falling detection method and device
CN109697416B (en) Video data processing method and related device
CN109858371B (en) Face recognition method and device
CN108269333A (en) Face identification method, application server and computer readable storage medium
WO2022227490A1 (en) Behavior recognition method and apparatus, device, storage medium, computer program, and program product
CN112733690B (en) High-altitude parabolic detection method and device and electronic equipment
CN112434566B (en) Passenger flow statistics method and device, electronic equipment and storage medium
CN112299172A (en) Gesture help-seeking recognition method and device and storage medium
CN113269091A (en) Personnel trajectory analysis method, equipment and medium for intelligent park
CN110569770A (en) Human body intrusion behavior recognition method and device, storage medium and electronic equipment
CN112990057A (en) Human body posture recognition method and device and electronic equipment
CN111666800A (en) Pedestrian re-recognition model training method and pedestrian re-recognition method
CN111723773A (en) Remnant detection method, device, electronic equipment and readable storage medium
CN111126411B (en) Abnormal behavior identification method and device
CN111435437A (en) PCB pedestrian re-recognition model training method and PCB pedestrian re-recognition method
CN112906471A (en) Traffic signal lamp identification method and device
CN112163544A (en) Method and system for judging random placement of non-motor vehicles
CN115965934A (en) Parking space detection method and device
CN111860457A (en) Fighting behavior recognition early warning method and recognition early warning system thereof
CN115620090A (en) Model training method, low-illumination target re-recognition method and device and terminal equipment
CN109727268A (en) Method for tracking target, device, computer equipment and storage medium
CN111091089B (en) Face image processing method and device, electronic equipment and storage medium
CN111738043A (en) Pedestrian re-identification method and device
CN112183431A (en) Real-time pedestrian number statistical method and device, camera and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant