CN114092846A - Prison behavior specification detection device and method based on deep learning

Prison behavior specification detection device and method based on deep learning

Info

Publication number
CN114092846A
Authority
CN
China
Prior art keywords
detection
behavior
behavior specification
prison
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010736024.9A
Other languages
Chinese (zh)
Inventor
杨景翔
许根
黄业鹏
吕立
王菊
徐刚
肖江剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Institute of Material Technology and Engineering of CAS
Original Assignee
Ningbo Institute of Material Technology and Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Institute of Material Technology and Engineering of CAS filed Critical Ningbo Institute of Material Technology and Engineering of CAS
Priority to PCT/CN2021/107746 (published as WO2022022368A1)
Publication of CN114092846A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a prison behavior specification detection device and method based on deep learning. The deep-learning-based prison behavior specification detection device comprises a head counting detection module and a behavior specification detection module. The head counting detection module comprises a target detection and segmentation process and is used for contactless roll call and crowd density identification; the behavior specification detection module comprises a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier, and is used to evaluate the behavior of detainees in real time. In this way, the system and method can recognize the standard behaviors of detainees according to supervision requirements, detect and raise alarms on abnormal behaviors, strengthen prison security, and improve the working efficiency of police officers.

Description

Prison behavior specification detection device and method based on deep learning
Technical Field
The invention relates to the field of machine learning research, in particular to a prison behavior specification detection device and method based on deep learning.
Background
With the rapid development of information technology, computer vision has entered its best period of development alongside the emergence of concepts such as VR, AR and artificial intelligence, and video behavior analysis, one of the most important topics in computer vision, is increasingly favored by scholars at home and abroad. Video behavior analysis occupies a large proportion of the work in fields such as video surveillance, human-computer interaction, medical care and video retrieval; in the now-popular driverless car projects, for example, it is very challenging. Owing to the complexity and diversity of human actions, and to factors such as self-occlusion, multiple scales, and viewpoint rotation and translation under multiple viewing angles, video behavior recognition is very difficult. How to accurately recognize and analyze human behaviors from multiple angles in real life is therefore a very important research topic, and the social demand for behavior analysis keeps increasing.
The traditional research methods include the following:
based on the video stream feature points: and extracting the spatio-temporal feature points from the extracted video frame images, modeling and analyzing the spatio-temporal feature points, and finally classifying.
Based on the single-frame image characteristics: the behavior characteristics of people in the single-frame image are extracted through an algorithm or a depth camera, and then the behavior characteristics are described, modeled, trained and classified according to video behaviors.
Behavior analysis methods based on video-stream feature points or on single-frame image features achieve remarkable results in the traditional single-view or single-person setting. However, in areas with heavy pedestrian flow, such as streets, airports and stations, or under a series of complex problems such as human body occlusion, illumination change and viewpoint change, simply applying these two methods in real life often fails to meet practical requirements, and the robustness of the algorithms is sometimes poor.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a prison behavior specification detection device and method based on deep learning, in which a deep learning network is adopted to analyze human behavior, improving the robustness of the classification model; in particular, a deep learning network is well suited to training and learning on big data and can fully exploit its advantages there.
The technical scheme of the invention is realized as follows:
the embodiment of the invention provides a monitoring place behavior specification detection device based on deep learning, which is used for carrying out corresponding behavior detection algorithm design completely according to the behavior specification rule of escort personnel at the monitoring place. The execution complexity of the system is reduced, and the stability of the system is improved. The detection time period and the detection area are completely defined by a user and set according to the standard behavior specification, so that the requirement of behavior specification detection can be well met.
Specifically, the invention provides a deep-learning-based prison behavior specification detection device, which comprises a head counting detection module and a behavior specification detection module, wherein:
the head counting detection module is used for contactless roll call and/or crowd density identification, and comprises a target detection and segmentation process;
the behavior specification detection module is used to evaluate the behavior of detainees in real time, and comprises a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
Specifically, an embodiment of the present invention further provides a deep-learning-based prison behavior specification detection method, characterized by comprising the following steps:
head counting detection, for contactless roll call and/or crowd density identification, the head counting detection comprising a target detection and segmentation process;
behavior specification detection, for evaluating the behavior of detainees in real time, the behavior specification detection comprising a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
The invention has the following advantages: global high-level features are obtained with the CNN method; feature reinforcement by the STN gives good robustness on real-life videos; human pose information is then obtained through the SPPE and regressed back to the human detection box through the SDTN, optimizing the human body network; the redundant-detection problem is solved by PP-NMS; and the corresponding classifiers are trained on the pose estimation results. The features derived from global features are more comprehensive, the behavior description is more complete, and the applicability is stronger.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of the target detection and segmentation process of the head counting detection module of the present invention;
FIG. 2 is a schematic flow chart of the training process of the behavior specification detection module of the present invention;
FIG. 3 is a flow chart illustrating the determination process of the behavior specification detection module according to the present invention;
FIG. 4 is a simplified underlying feature extraction and modeling flow diagram;
FIG. 5 is a process flow diagram of a general CNN.
Detailed Description
In view of the deficiencies in the prior art, the inventors of the present invention have undertaken long-term research and extensive practice to arrive at the technical solutions of the present invention. The technical solutions, their implementation and principles are further explained below.
The deep-learning-based prison behavior specification detection device provided by the embodiment of the invention uses a CNN to extract bottom-layer features, so that global features are obtained instead of the key points produced by traditional methods, and uses an STN to reinforce the obtained global features instead of modeling them directly. In addition, the device uses an SDTN to remap the obtained pose features, further improving the accuracy of the detection boxes, and for key points at multiple scales, a key-point regression operation is performed through a deconvolution layer, which effectively improves the precision of multi-person key-point detection. Connected key-point pairs are matched explicitly according to the connectivity of the human key points and the structure of the human body.
An embodiment of the present invention provides a deep-learning-based prison behavior specification detection device, comprising a head counting detection module and a behavior specification detection module, wherein:
the head counting detection module is used for contactless roll call and/or crowd density identification, and comprises a target detection and segmentation process;
the behavior specification detection module is used to evaluate the behavior of detainees in real time, and comprises a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
Further, the target detection and segmentation process of the head counting detection module comprises the following steps:
S1) labeling the heads in the images with a labeling tool, generating a JSON file for each picture, and extracting feature information from the labeled images through a convolutional neural network;
S2) extracting ROIs (regions of interest) from the feature information obtained in step S1) using a region proposal network, and then using region-of-interest pooling to bring all ROIs to a fixed size;
S3) performing bounding-box regression and classification prediction on the ROIs obtained in step S2) through a fully connected layer, sampling at different points of the feature map with bilinear interpolation;
S4) finally, a segmentation mask network takes the positive regions selected by the ROI classifier as input and generates their masks; the predicted masks are scaled to the size of the ROI bounding boxes to give the final mask result, one mask per target; a predicted segmentation mask is thus attached to each ROI, and the output is the objects present in the image together with high-quality segmentation masks.
Further, the head counting detection module specifically comprises:
a target detection unit, for contactless real-time detection and counting of detainees;
a density analysis unit, for real-time, accurate density detection and abnormality alarms in the cells and the exercise yard;
the target detection unit operates as follows: first, five groups of video images in which heads are visible are collected in different environments according to the specification requirements, four groups of videos serving as the training data set and one group as the verification data set; then the four groups of video frame images are processed according to steps S1) to S4) to obtain a head detection model; finally, the head detection model is applied to the remaining group of video frame images for the final real-time personnel detection and counting;
Further, the training process of the behavior specification detection module comprises the following steps:
S5) inputting a video frame image of a certain behavior and passing it through the convolutional neural network to extract features; the outputs of 6 specific convolutional layers in the network are each convolved with two 3 × 3 convolution kernels; all generated bounding boxes are then gathered and passed to NMS, i.e. non-maximum suppression, to obtain a series of target detection boxes;
S6) inputting the target detection boxes obtained in step S5) into an STN, i.e. a spatial transformer network, for a reinforcement operation that extracts a high-quality single-person region from an inaccurate candidate box;
S7) estimating, for the single-person region box reinforced in step S6), the person's pose skeleton using an SPPE (single-person pose estimator);
S8) remapping the single-person pose obtained in step S7) to the image coordinate system through an SDTN, i.e. a spatial de-transformer network, thereby obtaining a more accurate human target detection box, and performing the human pose estimation again; then resolving the redundant-detection problem through PP-NMS, i.e. parametric pose non-maximum suppression, to obtain the human skeleton information for this behavior;
S9) performing a key-point regression operation on the multi-scale key points obtained in step S8) through a deconvolution layer, which is equivalent to one upsampling pass and improves the precision of the target key points; considering the connectivity of the key points, establishing a directed field connecting them, and explicitly matching connected key-point pairs according to this connectivity and the structure of the human body parts, reducing wrong connections and giving the final human skeleton information;
S10) extracting features from the final human skeleton information obtained in step S9) and inputting them into a classifier for training as a training sample of this behavior;
S11) repeating the above steps to obtain classifiers for the various behaviors.
Further, the recognition process of the behavior specification detection module comprises the following steps:
S12) according to the behavior specification rules for detainees at the facility, setting a detection trigger time period and a detection area for each specific behavior detection requirement, and storing them locally in JSON form;
S13) first reading the JSON file; within the set detection trigger time period, recording video frame images of the behavior, taking only the image inside the detection area, and performing human pose estimation on it using steps S5) to S10) to obtain the human skeleton feature information inside the detection area; during the remaining time periods, only displaying the video frames and performing no behavior recognition;
S14) inputting the human skeleton feature information obtained in step S13) into the trained classifier to obtain the video behavior category.
Furthermore, the recognition process comprises setting a detection trigger time period and a detection area for each standard behavior detection and recognizing with the classifier. The detection time periods and detection areas are set manually, strictly according to the behavior specification rules for detainees at the facility: within the detection trigger time period, the corresponding behavior recognition is performed on the set detection area, and alarm information is issued when a violation is recognized; outside the detection trigger time period, no behavior recognition is performed. Since the detection time periods and detection areas are entirely user-defined and set according to the standard behavior specification, the requirements of behavior specification detection can be well satisfied.
Further, the PP-NMS operation in step S8) specifically comprises: selecting the pose with the highest confidence as the reference, eliminating region boxes close to the reference according to an elimination criterion, and repeating this process until redundant recognition boxes are eliminated and each recognition box is unique;
obtaining the human skeleton information in step S8) further comprises: using a reinforced data set to simulate the formation of the human region boxes and learning the description information of the different poses in the output results, thereby generating a larger training set.
The invention further provides a deep-learning-based prison behavior specification detection method, comprising the following steps:
head counting detection, for contactless roll call and/or crowd density identification, the head counting detection comprising a target detection and segmentation process;
behavior specification detection, for evaluating the behavior of detainees in real time, the behavior specification detection comprising a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
Further, the target detection and segmentation process specifically comprises the following steps:
S1) labeling the heads in the images with a labeling tool, generating a JSON file for each picture, and extracting feature information from the labeled images through a convolutional neural network;
S2) extracting ROIs (regions of interest) from the feature information obtained in step S1) using a region proposal network, and then using region-of-interest pooling to bring all ROIs to a fixed size;
S3) performing bounding-box regression and classification prediction on the ROIs obtained in step S2) through a fully connected layer, sampling at different points of the feature map with bilinear interpolation;
S4) finally, a segmentation mask network takes the positive regions selected by the ROI classifier as input and generates their masks; the predicted masks are scaled to the size of the ROI bounding boxes to give the final mask result, one mask per target; a predicted segmentation mask is thus attached to each ROI, and the output is the objects present in the image together with high-quality segmentation masks.
Further, the head counting detection specifically comprises the following steps:
target detection, for contactless real-time detection and counting of detainees;
density analysis, for real-time, accurate density detection and abnormality alarms in the cells and the exercise yard;
the target detection comprises the following steps: first, five groups of video images in which heads are visible are collected in different environments according to the specification requirements, four groups of videos serving as the training data set and one group as the verification data set; then the four groups of video frame images are processed according to steps S1) to S4) to obtain a head detection model; finally, the head detection model is applied to the remaining group of video frame images for the final real-time personnel detection and counting.
The technical solution, the implementation process and the principle thereof will be further explained with reference to the drawings.
As shown in FIG. 1 to FIG. 3, the deep-learning-based prison behavior specification detection device of the present invention includes a head counting detection module and a behavior specification detection module.
The head counting detection module is used for contactless roll call of detainees and crowd density identification. The behavior specification detection module evaluates in real time behaviors covering the cell washing-up order, housekeeping (bed-making) order, dining order, sleeping order, getting-up order, television education order, the safety duty specification, the work assessment specification, the three-positioning supervision specification, and the head-holding-on-exit supervision specification.
The head counting detection module specifically includes: a target detection unit, for contactless real-time detection and counting of detainees; and a density analysis unit, for real-time, accurate density detection and abnormality alarms in the cells and the exercise yard.
The behavior specification detection module specifically comprises:
and the washing order comparison unit is used for monitoring a set toilet and a queuing area, calculating and judging whether only 2 persons are kept in the toilet or not in real time and whether other persons wait in a specified area or not.
And the arrangement housekeeping standardization unit is used for supervising and setting beds and wall leaning waiting areas, calculating and judging whether the beds are always kept with 4 persons in arrangement housekeeping and whether other persons wait in the wall leaning area in real time.
And the dining order comparison unit is used for monitoring the dining time and calculating and judging whether abnormal persons having no sitting or standing have a real-time effect.
And the sleep order comparison unit is used for monitoring the rest time and calculating and judging whether the people sleep blindly and get up illegally in real time.
And the getting-up order regulation unit is used for monitoring the getting-up deadline and calculating and judging whether a person is in the bed or not in real time.
And the television education order comparison unit is used for monitoring the television education time, calculating and judging whether the television education personnel are abnormally watched without sitting or standing in real time, and giving an alarm if the number of people walking about is too large.
And the safety round value specification unit is used for monitoring and setting a safety round value area, calculating and judging whether the safety round value area keeps 2 persons in the field in real time, and judging that the persons are in violation if the safety round value area is in the same position for a long time.
And the operation specification assessment unit is used for monitoring operation time, calculating and judging the queue regularity in real time and grading.
And the three-positioning supervision unit is used for calculating and judging whether the personnel carry out three-positioning operation according to the regulations in real time when the fighting behavior is monitored.
And the outlet monitoring head holding standard unit is used for monitoring and setting a warning line area, and calculating and judging whether the outlet monitoring personnel carry out double-hand head holding in the warning line area according to the regulations or not in real time.
Further, the head counting detection module comprises a target detection and segmentation process, and the behavior specification detection module comprises a training process that obtains a classifier from the training sample set and a recognition process that recognizes test samples with the classifier. The behavior detection algorithms are designed entirely according to the behavior specification rules for detainees at the facility. The recognition process comprises setting a detection trigger time period and a detection area for each standard behavior detection and recognizing with the classifier. The detection time periods and detection areas are set manually, strictly according to the behavior specification rules: within a detection trigger time period, the corresponding behavior recognition is performed on the set detection area, and alarm information is issued when a violation is recognized; outside the detection trigger time period, no behavior recognition is performed. Behavior detection is triggered only in the set time periods and detection areas, and the recognition algorithms do not run at other times or in other areas, which reduces the execution complexity of the system and improves its stability. Since the detection time periods and detection areas are entirely user-defined and set according to the standard behavior specification, the requirements of behavior specification detection can be well satisfied.
Further, the target detection and segmentation process of the head counting detection module is shown in FIG. 1 and includes the following steps:
S1) The heads in the images are labeled with a labeling tool, and a JSON file is generated for each picture. Feature information is then extracted from the labeled images through a CNN (Convolutional Neural Network).
S2) ROIs (Regions Of Interest) are extracted from the feature information obtained in step S1) using an RPN (Region Proposal Network), and ROI Pooling then brings all ROIs to a fixed size.
S3) Bounding-box regression and classification prediction are performed on the ROIs obtained in step S2) through a fully connected layer, sampling at different points of the feature map with bilinear interpolation.
S4) Finally, a segmentation mask network takes the positive regions selected by the ROI classifier as input and generates their masks. The predicted masks are scaled to the size of the ROI bounding boxes to give the final mask result, one mask per object. A predicted segmentation mask is thus attached to each ROI, and the output is the objects present in the image together with high-quality segmentation masks.
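The S1) to S4) pipeline (region proposals, ROI pooling, box regression with classification, and a mask branch) matches a Mask R-CNN-style detector, although the description does not name one. A minimal inference sketch with such a model is given below; the COCO-pretrained torchvision detector and the 0.5 score threshold are stand-ins for the head-detection model that the invention trains on its own labeled data:

```python
# Hypothetical sketch of running a Mask R-CNN-style detector (steps S1-S4).
# torchvision's COCO-pretrained model is a stand-in: it detects whole persons,
# whereas the invention trains its own model on the head annotations of S1).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect(image_path, score_thresh=0.5):
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]            # dict: boxes, labels, scores, masks
    keep = out["scores"] > score_thresh
    boxes = out["boxes"][keep]           # bounding-box regression output (S3)
    masks = out["masks"][keep] > 0.5     # per-instance masks, one per target (S4)
    return boxes, masks

boxes, masks = detect("frame_0001.jpg")
print(f"{len(boxes)} instances detected")
```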
The data set covers four different environments, with 10 people divided into five groups, each group repeating the actions three times according to the specification requirements. Four of the groups were used as the training data set and the remaining group as the test data set.
Specifically, for example, to perform target detection, five groups of videos in which heads are visible are collected in different environments according to the specification requirements, four groups serving as the training data set and one group as the verification data set. First, the four groups of video frame images are processed according to steps S1) to S4) to obtain the head detection model; the head detection model is then applied to the remaining group of video frame images for the final real-time personnel detection and counting. If density detection is required, a density calculation is additionally performed as a last step.
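The density calculation itself is not specified. As a minimal sketch, assuming density is defined as heads per unit area inside a user-defined region (e.g. the exercise yard) with an alarm threshold, it might look like this; the region geometry, area and threshold are illustrative assumptions:

```python
# Hypothetical density check: heads per square metre inside a monitored
# region, with an alarm threshold. All numeric values are illustrative
# assumptions, not values taken from the patent.
def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def in_region(point, region):
    rx1, ry1, rx2, ry2 = region
    return rx1 <= point[0] <= rx2 and ry1 <= point[1] <= ry2

def density_alarm(head_boxes, region, region_area_m2, max_per_m2=0.5):
    count = sum(in_region(box_center(b), region) for b in head_boxes)
    density = count / region_area_m2
    return count, density, density > max_per_m2

# Example: two detected head boxes inside a 60 m^2 yard view.
heads = [(400, 200, 440, 250), (600, 300, 640, 350)]
count, density, alarm = density_alarm(heads, (0, 0, 1280, 720), 60.0)
```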
The training process of the behavior specification detection module is shown in FIG. 2 and includes the following steps:
S5) A video frame image of a certain behavior is input and passed through a CNN to extract features; the outputs of 6 specific convolutional layers in the network are each convolved with two 3 × 3 convolution kernels, all generated bounding boxes are gathered, and all of them are passed to NMS (Non-Maximum Suppression) to obtain a series of target detection boxes.
S6) The target detection boxes obtained in step S5) are input into an STN (Spatial Transformer Network) for a reinforcement operation, extracting a high-quality single-person region from an inaccurate candidate box.
S7) For the single-person region box reinforced in step S6), the human pose skeleton is estimated using an SPPE (Single-Person Pose Estimator).
S8) The single-person pose obtained in step S7) is remapped to the image coordinate system by an SDTN (Spatial De-Transformer Network), yielding a more accurate human target detection box, and the human pose estimation is performed again. PP-NMS (Parametric Pose Non-Maximum Suppression) then resolves the redundant-detection problem and yields the human skeleton information for this behavior.
S9) A key-point regression operation is performed on the multi-scale key points obtained in step S8) through a deconvolution layer, which is equivalent to one upsampling pass and improves the accuracy of the target key points. Considering the connectivity of the key points, a directed field connecting them is established, and connected key-point pairs are matched explicitly according to this connectivity and the structure of the human body parts, reducing wrong connections and giving the final human skeleton information.
S10) Features are extracted from the final human skeleton information obtained in step S9) and input into a classifier for training as training samples of this behavior;
S11) the above steps are repeated to obtain classifiers for the various behaviors.
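The classifier type used in steps S10) and S11) is not fixed by the description. As a minimal sketch, assuming each skeleton is reduced to a normalized, flattened joint-coordinate vector with a fixed joint count and that an off-the-shelf SVM is trained per behavior (both assumptions for illustration), the training step could be written as:

```python
# Hypothetical sketch of steps S10)-S11): each estimated skeleton is reduced
# to a fixed-length, translation- and scale-normalized feature vector, and
# one classifier is trained per behavior. The fixed joint count and the SVM
# choice are illustrative assumptions; the patent does not name a classifier.
import numpy as np
from sklearn.svm import SVC

def skeleton_features(keypoints):
    """keypoints: (K, 2) array of (x, y) joints for one person, K fixed."""
    kp = np.asarray(keypoints, dtype=np.float32)
    kp -= kp.mean(axis=0)                     # translation invariance
    scale = np.linalg.norm(kp, axis=1).max()
    return (kp / (scale + 1e-6)).ravel()      # scale-normalized, flattened

def train_behavior_classifier(skeletons, labels):
    """labels: 1 for the standard behavior, 0 otherwise (steps S10-S11)."""
    X = np.stack([skeleton_features(s) for s in skeletons])
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, np.asarray(labels))
    return clf
```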
The recognition process of the behavior specification detection module is shown in FIG. 3 and includes the following steps:
S12) According to the behavior specification rules for detainees at the facility, a detection trigger time period and a detection area are set for each specific behavior detection requirement and stored locally in JSON form.
S13) The JSON file is read first; within the set detection trigger time period, video frame images of the behavior are recorded, only the image inside the detection area is taken, and human pose estimation is performed on it using steps S5) to S10) to obtain the human skeleton feature information inside the detection area. During the remaining time periods, the video frames are only displayed and no behavior recognition is performed.
S14) The human skeleton feature information obtained in step S13) is input into the trained classifier to obtain the video behavior category.
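Steps S12) to S14) state only that the trigger period and detection area are stored locally as JSON. A hedged sketch of one possible configuration layout and of the gating logic follows; the JSON field names are assumptions, estimate_poses stands in for the S5)-S10) pose pipeline, and skeleton_features is the helper sketched above:

```python
# Hypothetical sketch of steps S12)-S14). The JSON schema (field names, time
# format) is an assumption; the patent states only that the trigger period
# and detection area are stored locally as JSON.
import json
from datetime import datetime, time

CONFIG = json.loads("""
{
  "behavior": "washing",
  "trigger_period": {"start": "06:30", "end": "07:00"},
  "detection_area": {"x1": 100, "y1": 50, "x2": 800, "y2": 600}
}
""")

def in_trigger_period(cfg, now=None):
    now = (now or datetime.now()).time()
    start = time.fromisoformat(cfg["trigger_period"]["start"])
    end = time.fromisoformat(cfg["trigger_period"]["end"])
    return start <= now <= end

def process_frame(frame, cfg, classifier):
    """frame: HxWx3 image array; returns predicted behavior labels or None."""
    if not in_trigger_period(cfg):
        return None  # outside the trigger period: video is only displayed (S13)
    a = cfg["detection_area"]
    roi = frame[a["y1"]:a["y2"], a["x1"]:a["x2"]]  # only the detection area
    skeletons = estimate_poses(roi)                # assumed S5)-S10) pipeline
    feats = [skeleton_features(s) for s in skeletons]
    return classifier.predict(feats) if feats else []  # behavior category (S14)
```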
In the above technical solution, step S5) preferably performs a two-layer convolution so that detection results are extracted from different feature maps.
In the above technical solution, the PP-NMS operation of step S8) proceeds as follows:
The pose with the highest confidence is first selected as the reference, region boxes close to the reference are eliminated according to the elimination criterion, and this process is repeated until redundant recognition boxes have been eliminated and each recognition box is unique.
In the above technical solution, obtaining the human skeleton information in step S8) further includes the following operation:
Using a reinforced data set, the formation of the human region boxes is simulated and the description information of the different poses in the output results is learned, thereby generating a larger training set.
The invention preferably adopts a custodial-facility data set covering four different environments, with 10 persons divided into five groups, each group repeating the actions three times according to the specification requirements. Four of the groups were used as the training data set and the remaining group as the test data set.
Specifically, for example, to recognize the "washing-up" behavior, four groups are recorded entering the washing area in accordance with the specification and one group entering it in violation; the washing-up videos of the four groups serve as the training data set, and the washing-up video of the one group serves as the verification data set. First, the washing-up video frame images of a given group are processed according to steps S5) to S10), finally yielding the human pose skeleton features of that washing-up video behavior; these are taken as training samples of the standard "washing-up" behavior of the group and input into the classifier for training. Training repeatedly on the training samples of the different groups yields the "washing-up" behavior classifier. Classifiers for the other video behaviors can be constructed in the same way.
At judgment time, steps S12) to S14) are executed. A detection trigger time period and a detection area are set first; if the current time is within the detection trigger time period, the video frame images of the test group are cropped according to the set detection area, only the image inside the detection area is processed according to steps S5) to S10) to obtain the human pose skeleton features inside the detection area, and these are then input, via the data enhancement set, into the classifier to identify the behavior category. The judgment process for the other environments is the same.
FIG. 4 shows a simplified flow chart of the underlying feature extraction and modeling.
In the technical scheme of the invention, the adopted pose estimation framework is RMPE (Regional Multi-Person Pose Estimation), as shown in FIG. 4: a CNN extracts features from the input image; the outputs of 6 specific convolutional layers in the network are each convolved with two 3 × 3 convolution kernels; all generated bounding boxes are gathered and screened by NMS to obtain the target detection boxes; the detection boxes are input into the STN and SPPE to detect the human pose automatically; and regression through the SDTN and PP-NMS establishes the directed field connecting the key points, so that wrong connections are reduced and the final human pose skeleton features are obtained.
The technical scheme of the invention uses a two-layer convolution operation to extract the bottom-layer features and then removes redundancy from the detection results by non-maximum suppression. The de-redundant detection boxes are input into the STN layer for feature reinforcement; the role of the STN is to make the obtained features robust to translation, rotation and scale change. SPPE single-person pose estimation is then performed on the feature image output by the STN, and the pose estimation result is mapped back to the image coordinate system through the SDTN, so that a high-quality human region can be extracted from an inaccurate region box. The redundant-detection problem is then solved by PP-NMS. Finally, key-point regression through a deconvolution layer improves the key-point precision, and a directed field connecting the key points is established to reduce wrong connections, giving the final human skeleton information.
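The STN stage can be sketched as the standard localization-network / affine-grid / sampler arrangement; the layer sizes below are illustrative assumptions rather than the configuration used in the invention:

```python
# Hypothetical STN block (localization net -> affine grid -> sampler), giving
# the features robustness to translation, rotation and scale as described
# above. Channel and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(channels, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),            # 2x3 affine parameters
        )
        # initialize the regression layer to the identity transform
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```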
Among the above techniques, CNN is a highly efficient recognition method that has developed rapidly and attracted attention in recent years. In the 1960s, while studying neurons for local sensitivity and direction selection in the cat cerebral cortex, Hubel and Wiesel discovered a unique network structure that could effectively reduce the complexity of feedback neural networks, from which the CNN was later proposed. At present, CNN has become a research hotspot in many scientific fields; in the field of pattern classification in particular, it has found wide application because the network avoids complex image preprocessing and can take the original image directly as input.
In general, the basic structure of a CNN includes two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which the local feature is extracted; once a local feature is extracted, its positional relation to the other features is also determined. The other is the feature mapping layer: each computation layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons on the plane share equal weights.
The technical scheme of the invention uses the feature mapping layers to extract the global bottom-layer features of the video frame images and then processes these bottom-layer features further.
The general processing flow of a CNN is shown in FIG. 5.
The layers used in the technical scheme of the invention are the feature maps obtained after convolution. The six feature maps have sizes (38,38), (19,19), (10,10), (5,5), (3,3) and (1,1) respectively, and several prior boxes of different scales or aspect ratios are placed at each unit of a feature map, forming the set of candidate boxes. Convolving a feature map then yields the detection results, where the detected values comprise the category confidence and the bounding-box position; each is produced with a 3 × 3 convolution.
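These six feature-map sizes match an SSD-300-style detector. A hedged sketch of generating the prior boxes over those grids follows; the per-layer scales and aspect ratios are illustrative assumptions, not values given in the description:

```python
# Hypothetical sketch of prior-box generation over the six feature maps named
# above. The per-layer scales and aspect ratios are illustrative assumptions
# (SSD-style), not values taken from the patent.
import itertools
import numpy as np

FEATURE_MAPS = [38, 19, 10, 5, 3, 1]          # per-layer grid sizes
SCALES = [0.1, 0.2, 0.375, 0.55, 0.725, 0.9]  # assumed box scales
ASPECT_RATIOS = [1.0, 2.0, 0.5]

def prior_boxes():
    boxes = []
    for fmap, scale in zip(FEATURE_MAPS, SCALES):
        for i, j in itertools.product(range(fmap), repeat=2):
            cx, cy = (j + 0.5) / fmap, (i + 0.5) / fmap  # cell center
            for ar in ASPECT_RATIOS:
                w, h = scale * np.sqrt(ar), scale / np.sqrt(ar)
                boxes.append([cx, cy, w, h])
    return np.clip(np.asarray(boxes), 0.0, 1.0)

priors = prior_boxes()   # (38*38 + 19*19 + ... + 1*1) * 3 boxes in total
print(priors.shape)
```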
It should be understood that the above-mentioned embodiments are merely illustrative of the technical concepts and features of the present invention, which are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and therefore, the protection scope of the present invention is not limited thereby. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (10)

1. A prison behavior specification detection device based on deep learning, characterized by comprising: a head counting detection module and a behavior specification detection module; wherein:
the head counting detection module is used for contactless roll call and/or crowd density identification; the head counting detection module comprises a target detection and segmentation process;
the behavior specification detection module is used to evaluate the behavior of detainees in real time; the behavior specification detection module comprises a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
2. The deep-learning-based prison behavior specification detection device according to claim 1, wherein the target detection and segmentation process of the head counting detection module comprises the following steps:
S1) labeling the heads in the images with a labeling tool, generating a JSON file for each picture, and extracting feature information from the labeled images through a convolutional neural network;
S2) extracting ROIs (regions of interest) from the feature information obtained in step S1) using a region proposal network, and then using region-of-interest pooling to bring all ROIs to a fixed size;
S3) performing bounding-box regression and classification prediction on the ROIs obtained in step S2) through a fully connected layer, sampling at different points of the feature map with bilinear interpolation;
S4) finally, a segmentation mask network takes the positive regions selected by the ROI classifier as input and generates their masks; the predicted masks are scaled to the size of the ROI bounding boxes to give the final mask result, one mask per target; a predicted segmentation mask is thus attached to each ROI, and the output is the objects present in the image together with high-quality segmentation masks.
3. The deep-learning-based prison behavior specification detection device according to claim 2, wherein the head counting detection module specifically comprises:
a target detection unit, for contactless real-time detection and counting of detainees;
a density analysis unit, for real-time, accurate density detection and abnormality alarms in the cells and the exercise yard;
the target detection unit operating as follows: first, five groups of video images in which heads are visible are collected in different environments according to the specification requirements, four groups of videos serving as the training data set and one group as the verification data set; then the four groups of video frame images are processed according to steps S1) to S4) to obtain a head detection model; finally, the head detection model is applied to the remaining group of video frame images for the final real-time personnel detection and counting.
4. The deep-learning-based prison behavior specification detection device according to claim 1, wherein the training process of the behavior specification detection module comprises the following steps:
S5) inputting a video frame image of a certain behavior and passing it through the convolutional neural network to extract features; the outputs of 6 specific convolutional layers in the network are each convolved with two 3 × 3 convolution kernels; all generated bounding boxes are then gathered and passed to NMS, i.e. non-maximum suppression, to obtain a series of target detection boxes;
S6) inputting the target detection boxes obtained in step S5) into an STN, i.e. a spatial transformer network, for a reinforcement operation that extracts a high-quality single-person region from an inaccurate candidate box;
S7) estimating, for the single-person region box reinforced in step S6), the person's pose skeleton using an SPPE (single-person pose estimator);
S8) remapping the single-person pose obtained in step S7) to the image coordinate system through an SDTN, i.e. a spatial de-transformer network, thereby obtaining a more accurate human target detection box, and performing the human pose estimation again; then resolving the redundant-detection problem through PP-NMS, i.e. parametric pose non-maximum suppression, to obtain the human skeleton information for this behavior;
S9) performing a key-point regression operation on the multi-scale key points obtained in step S8) through a deconvolution layer, which is equivalent to one upsampling pass and improves the precision of the target key points; considering the connectivity of the key points, establishing a directed field connecting them, and explicitly matching connected key-point pairs according to this connectivity and the structure of the human body parts, reducing wrong connections and giving the final human skeleton information;
S10) extracting features from the final human skeleton information obtained in step S9) and inputting them into a classifier for training as a training sample of this behavior;
S11) repeating the above steps to obtain classifiers for the various behaviors.
5. The deep-learning-based prison behavior specification detection device according to claim 4, wherein the recognition process of the behavior specification detection module comprises the following steps:
S12) according to the behavior specification rules for detainees at the facility, setting a detection trigger time period and a detection area for each specific behavior detection requirement, and storing them locally in JSON form;
S13) first reading the JSON file; within the set detection trigger time period, recording video frame images of the behavior, taking only the image inside the detection area, and performing human pose estimation on it using steps S5) to S10) to obtain the human skeleton feature information inside the detection area; during the remaining time periods, only displaying the video frames and performing no behavior recognition;
S14) inputting the human skeleton feature information obtained in step S13) into the trained classifier to obtain the video behavior category.
6. The deep-learning-based prison behavior specification detection device according to claim 1, wherein the recognition process comprises setting a detection trigger time period and a detection area for each standard behavior detection and recognizing with the classifier; the detection time period and detection area are set manually, strictly according to the behavior specification rules for detainees at the facility; within the detection trigger time period, the corresponding behavior recognition is performed on the set detection area, and alarm information is issued when a violation is recognized; outside the detection trigger time period, no behavior recognition is performed; the detection time periods and detection areas are entirely user-defined and set according to the standard behavior specification, so the requirements of behavior specification detection can be well satisfied.
7. The deep-learning-based prison behavior specification detection device according to claim 4, wherein the PP-NMS operation in step S8) specifically comprises: selecting the pose with the highest confidence as the reference, eliminating region boxes close to the reference according to an elimination criterion, and repeating this process until redundant recognition boxes are eliminated and each recognition box is unique;
obtaining the human skeleton information in step S8) further comprising: using a reinforced data set to simulate the formation of the human region boxes and learning the description information of the different poses in the output results, thereby generating a larger training set.
8. A prison behavior specification detection method based on deep learning, characterized by comprising the following steps:
head counting detection, for contactless roll call and/or crowd density identification, the head counting detection comprising a target detection and segmentation process;
behavior specification detection, for evaluating the behavior of detainees in real time, the behavior specification detection comprising a training process that obtains a classifier from a training sample set and a recognition process that recognizes test samples with the classifier.
9. The deep-learning-based prison behavior specification detection method according to claim 8, wherein the target detection and segmentation process specifically comprises the following steps:
S1) labeling the heads in the images with a labeling tool, generating a JSON file for each picture, and extracting feature information from the labeled images through a convolutional neural network;
S2) extracting ROIs (regions of interest) from the feature information obtained in step S1) using a region proposal network, and then using region-of-interest pooling to bring all ROIs to a fixed size;
S3) performing bounding-box regression and classification prediction on the ROIs obtained in step S2) through a fully connected layer, sampling at different points of the feature map with bilinear interpolation;
S4) finally, a segmentation mask network takes the positive regions selected by the ROI classifier as input and generates their masks; the predicted masks are scaled to the size of the ROI bounding boxes to give the final mask result, one mask per target; a predicted segmentation mask is thus attached to each ROI, and the output is the objects present in the image together with high-quality segmentation masks.
10. The deep-learning-based prison behavior specification detection method according to claim 9, wherein the head counting detection specifically comprises the following steps:
target detection, for contactless real-time detection and counting of detainees;
density analysis, for real-time, accurate density detection and abnormality alarms in the cells and the exercise yard;
the target detection comprising the following steps: first, collecting five groups of video images in which heads are visible in different environments according to the specification requirements, four groups of videos serving as the training data set and one group as the verification data set; then processing the four groups of video frame images according to steps S1) to S4) to obtain a head detection model; finally, applying the head detection model to the remaining group of video frame images for the final real-time personnel detection and counting.
CN202010736024.9A 2020-07-08 2020-07-28 Prison behavior specification detection device and method based on deep learning Pending CN114092846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/107746 WO2022022368A1 (en) 2020-07-28 2021-07-22 Deep-learning-based apparatus and method for monitoring behavioral norms in jail

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020106494354 2020-07-08
CN202010649435 2020-07-08

Publications (1)

Publication Number Publication Date
CN114092846A 2022-02-25

Family

ID=80294849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010736024.9A Pending CN114092846A (en) 2020-07-08 2020-07-28 Prison behavior specification detection device and method based on deep learning

Country Status (1)

Country Link
CN (1) CN114092846A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination