WO2022022368A1 - Deep-learning-based apparatus and method for monitoring behavioral norms in jail - Google Patents

Deep-learning-based apparatus and method for monitoring behavioral norms in jail

Info

Publication number: WO2022022368A1
Authority: WO — WIPO (PCT)
Prior art keywords: detection, behavior, classifier, human, behavioral
Application number: PCT/CN2021/107746
Other languages: French (fr), Chinese (zh)
Inventors: 杨景翔, 许根, 黄业鹏, 吕立, 王菊, 徐刚, 肖江剑
Original assignees: 宁波环视信息科技有限公司; 中国科学院宁波材料技术与工程研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Priority claimed from Chinese patent application CN202010736024.9, filed on July 28, 2020 (published as CN114092846A).
Application filed by 宁波环视信息科技有限公司 and 中国科学院宁波材料技术与工程研究所.
Publication of WO2022022368A1.


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition

Definitions

  • The present application relates to the field of machine learning research, and in particular to a deep-learning-based device and method for detecting behavioral norms in prisons.
  • Behavior analysis methods based on video-stream feature points and single-frame image features have achieved remarkable results in the traditional single-view or single-person setting. However, in areas with heavy pedestrian traffic such as streets, airports, and stations, where complex problems such as human occlusion, illumination changes, and viewpoint changes arise, these methods alone often fail to meet practical requirements, and their robustness can be poor.
  • To address these defects, the present application proposes a deep-learning-based device and method for detecting behavioral norms in prisons, which uses a deep learning network to analyze human behavior and improves the robustness of the classification model; in particular, a deep learning network is well suited to training and learning on big data and can give full play to this advantage.
  • The embodiment of the present application provides a deep-learning-based detection device for behavioral norms in prisons, whose behavior detection algorithms are designed entirely in accordance with the code of conduct for detainees in detention centers.
  • The detection process includes setting a detection trigger time period and a detection area for each standardized behavior detection; behavior detection is triggered only within the set time period and the set detection area, and the corresponding recognition algorithms are not run at other times or in other areas, which reduces the execution complexity of the system and improves its stability.
  • The detection time period and detection area are entirely user-defined and set in accordance with the standard code of conduct, which satisfies the needs of code-of-conduct detection well.
  • Specifically, this application proposes a deep-learning-based behavioral norm detection device for prisons, including a head count detection module and a behavioral norm detection module, wherein:
  • the head count detection module is used for imperceptible roll call and/or crowd density identification, and includes a target detection and segmentation process;
  • the behavioral norm detection module is used for real-time calculation and discrimination of personnel behavior, and includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  • An embodiment of the present application also proposes a deep-learning-based method for detecting behavioral norms in prisons, the method including the following steps:
  • head count detection, used for imperceptible roll call and/or crowd density identification, including a target detection and segmentation process;
  • behavioral norm detection, used for real-time calculation and discrimination of personnel behavior, including a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  • The advantages of this application are as follows: the CNN method yields global high-level features, and after STN feature enhancement these are highly robust to real-life video; SPPE is then used to obtain human posture information, SDTN maps the pose back to the human detection frame so that the network optimizes itself, PP-NMS is used to resolve redundant detections, and the corresponding classifier is trained on the pose estimation results.
  • Features derived from global features are more comprehensive, making the behavior description more complete and more widely applicable.
  • FIG. 1 is a schematic flowchart of the target detection and segmentation process of the head count detection module of the present application;
  • FIG. 2 is a schematic flowchart of the training process of the behavioral norm detection module of the present application;
  • FIG. 3 is a schematic flowchart of the discrimination process of the behavioral norm detection module of the present application;
  • FIG. 4 is a simplified flowchart of the extraction and modeling of underlying features;
  • FIG. 5 is a processing flowchart of a general CNN.
  • The deep-learning-based prison behavioral norm detection device provided by the embodiments of the present application uses the CNN method to perform feature extraction on underlying features, obtaining global features rather than the key points obtained by traditional methods; it uses the STN method to enhance the obtained global features instead of modeling the obtained features directly;
  • the device uses the SDTN method to remap the obtained pose features, further improving the accuracy of the detection frame.
  • For key points at multiple scales, a deconvolution layer performs the key point regression operation, which can effectively improve the accuracy of multi-person key point detection.
  • The device also takes into account the connectivity of multiple key points and establishes a directed field connecting the key points;
  • connected key point pairs are explicitly matched according to the connectivity of human key points and the structure of the human body.
  • An embodiment of the present application provides a deep-learning-based behavioral norm detection device for prisons, including a head count detection module and a behavioral norm detection module, wherein:
  • the head count detection module is used for imperceptible roll call and/or crowd density identification, and includes a target detection and segmentation process;
  • the behavioral norm detection module is used for real-time calculation and discrimination of personnel behavior, and includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  • The target detection and segmentation process of the head count detection module includes the following steps:
  • step S1) use a labeling tool to annotate human heads in the images, generating one JSON file per image, and extract the feature information of the annotated images through a convolutional neural network;
  • step S2) from the feature information obtained in step S1), extract ROIs (regions of interest) using a region proposal network, then use region-of-interest pooling to bring these ROIs to a fixed size;
  • step S3) perform bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation;
  • the head count detection module specifically includes:
  • a target detection unit, used for imperceptible real-time detection and counting of detainees;
  • a density analysis unit, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards;
  • the target detection unit operates as follows: first, collect videos of five groups of people whose heads are visible in the video images, recorded in different environments according to the specification, with four groups of videos used as the training data set and one group as the validation data set; then process the four groups' video frame images according to steps S1) to S5) to obtain a human head detection model; finally, load this model for the remaining group's video frame images and perform the final real-time personnel detection and counting;
  • the training process of the behavioral norm detection module includes the following steps:
  • step S6) input the target detection frames obtained in step S5) into the STN (spatial transformer network) for an enhancement operation, extracting high-quality single-person regions from inaccurate candidate frames;
  • step S7) apply SPPE (a single-person pose estimator) to each single-person region frame enhanced in step S6) to estimate that person's posture skeleton;
  • step S8) remap the single-person pose obtained in step S7) back to the image coordinate system through the SDTN (spatial de-transformer network) to obtain a more accurate human target detection frame, and perform the human pose estimation operation again; then resolve the redundant detection problem through PP-NMS (parametric pose non-maximum suppression) to obtain the human skeleton information under this behavior;
  • step S9) for the multi-scale key points obtained in step S8), perform the key point regression operation through a deconvolution layer, which is equivalent to an upsampling pass and can improve the accuracy of the target key points; considering the connectivity of multiple key points, establish a directed field connecting the key points and match connected key point pairs according to the connectivity and structure of human body parts, reducing misconnections and obtaining the final human skeleton information;
  • step S10) perform feature extraction on the final human skeleton information obtained in step S9) and input it into the classifier as a training sample of this behavior class;
  • the identification process of the behavioral norm detection module includes the following steps:
  • step S14) input the human skeleton feature information obtained in step S13) into the classifier for identification to obtain the video behavior category.
  • The identification process includes setting the detection trigger time period and detection area for each standardized behavior detection and using the classifier for identification; the detection time and detection area are set manually, strictly following the code of conduct for detainees in the detention center.
  • Within the trigger time period, the corresponding behavior recognition operation is carried out in the set detection area;
  • when a violation is identified, an alarm message is issued; outside the detection trigger time period, the corresponding behavior recognition operation is not performed.
  • The detection time period and detection area are entirely user-defined and set in accordance with the standard code of conduct, which satisfies the needs of code-of-conduct detection well.
  • The PP-NMS operation specifically includes: selecting the pose with maximum confidence as a reference and eliminating area frames close to this reference according to an elimination criterion, repeating the process until all redundant identification frames are eliminated and each identification frame appears exactly once;
  • obtaining the human skeleton information in step S8) further includes: using an augmented data set that imitates the formation process of human region frames by learning the descriptive information of different poses in the output results, thereby generating a larger training set.
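As an illustration of this augmentation idea, the sketch below is one plausible reading (an assumption, not the patent's exact procedure): it fits a Gaussian to the offsets between detected and ground-truth person boxes and samples new candidate frames from it, imitating how the detector's region frames form around a pose.

```python
import numpy as np

def fit_offset_distribution(gt_boxes, det_boxes):
    """Fit a Gaussian to the relative offsets (dx, dy, dw, dh) between
    ground-truth (x, y, w, h) boxes and the detector's matched boxes."""
    gt, det = np.asarray(gt_boxes, float), np.asarray(det_boxes, float)
    offsets = np.stack([
        (det[:, 0] - gt[:, 0]) / gt[:, 2],   # x offset, relative to width
        (det[:, 1] - gt[:, 1]) / gt[:, 3],   # y offset, relative to height
        np.log(det[:, 2] / gt[:, 2]),        # log width ratio
        np.log(det[:, 3] / gt[:, 3]),        # log height ratio
    ], axis=1)
    return offsets.mean(axis=0), np.cov(offsets, rowvar=False)

def augment_boxes(gt_box, mean, cov, n_samples=20, rng=None):
    """Sample extra person-region frames around one ground-truth box,
    enlarging the training set as the snippet above describes."""
    rng = rng or np.random.default_rng()
    x, y, w, h = gt_box
    boxes = []
    for dx, dy, dlw, dlh in rng.multivariate_normal(mean, cov, size=n_samples):
        boxes.append((x + dx * w, y + dy * h, w * np.exp(dlw), h * np.exp(dlh)))
    return boxes
```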
  • Yet another embodiment of the present application provides a deep-learning-based method for detecting behavioral norms in prisons, the method including the following steps:
  • head count detection, used for imperceptible roll call and/or crowd density identification, including a target detection and segmentation process;
  • behavioral norm detection, used for real-time calculation and discrimination of personnel behavior, including a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  • The target detection and segmentation process specifically includes the following steps:
  • step S1) use a labeling tool to annotate human heads in the images, generating one JSON file per image, and extract the feature information of the annotated images through a convolutional neural network;
  • step S2) from the feature information obtained in step S1), extract ROIs (regions of interest) using a region proposal network, then use region-of-interest pooling to bring these ROIs to a fixed size;
  • step S3) perform bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation;
  • the head count detection specifically includes the following steps:
  • target detection, used for imperceptible real-time detection and counting of detainees;
  • density analysis, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards;
  • the target detection includes the following steps: first, collect videos of five groups of people whose heads are visible in the video images, recorded in different environments according to the specification, with four groups of videos used as the training data set and one group as the validation data set; then process the four groups' video frame images according to steps S1) to S5) to obtain a human head detection model; finally, load this model for the remaining group's video frame images and perform the final real-time personnel detection and counting.
  • The deep-learning-based method for detecting behavioral norms in prisons of the present application includes a head count detection module and a behavioral norm detection module.
  • The head count detection module is used for imperceptible roll call of detainees and crowd density identification in the prison; the behavioral norm detection module performs real-time calculation and discrimination of behaviors covering washing order, housekeeping, dining order, sleeping order, wake-up order, television education order, safety duty rotation norms, conduct assessment norms, "three-positioning" supervision norms, and the head-holding norm when leaving the cell.
  • The head count detection module specifically includes: a target detection unit, used for imperceptible real-time detection and counting of detainees;
  • and a density analysis unit, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards.
  • The behavioral norm detection module specifically includes:
  • a washing order comparison unit, used to define the toilet and queueing areas in the cell and to calculate in real time whether only 2 people are in the toilet and whether the other people are waiting in the designated area;
  • a housekeeping norm unit, used to define the bed and wall-side waiting areas in the dormitory and to calculate in real time whether exactly 4 people remain at the beds doing housekeeping and whether the other personnel are waiting in the wall-side area;
  • a dining order comparison unit, used during dormitory meal times to calculate in real time whether anyone is abnormally not seated while eating;
  • a sleeping order comparison unit, used during dormitory rest times to calculate in real time whether anyone is sleeping with a covered head or getting up in violation of the rules;
  • a wake-up order norm unit, used at the dormitory wake-up deadline to calculate in real time whether anyone is still in bed;
  • a television education order comparison unit, used during television education time to calculate in real time whether anyone is abnormally not seated watching television education, issuing an alarm when too many people are walking around;
  • a safety duty rotation norm unit, used to define the safety duty area in the cell and to calculate in real time whether 2 people are present in that area, judging it a violation if they remain motionless in the same position for a long time;
  • a conduct norm assessment unit, used during dormitory exercise time to calculate the tidiness of the queue in real time and score it;
  • a "three-positioning" supervision unit, used when a fight occurs in the cell to calculate in real time whether personnel carry out the "three-positioning" operation as required;
  • a head-holding-on-exit norm unit, used to define the cordon area in the cell and to calculate in real time whether a person leaving the cell holds their head with both hands within the cordon area as required.
  • The head count detection module includes a target detection and segmentation process;
  • the behavioral norm detection module includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  • The corresponding behavior detection algorithms are designed in full accordance with the code of conduct for detainees in the detention center.
  • The identification process includes setting the detection trigger time period and detection area for each standardized behavior detection and using the classifier for identification; the detection time and detection area are set manually, strictly following the code of conduct for detainees in the detention center.
  • The corresponding behavior identification operation is performed in the set detection area, and an alarm message is issued when a violation is identified.
  • Behavior detection is triggered only within the set time period and the set detection area; the corresponding recognition algorithms are not run at other times or in other areas, which reduces the execution complexity of the system and improves its stability.
  • The detection time period and detection area are entirely user-defined and set in accordance with the standard code of conduct, which satisfies the needs of code-of-conduct detection well.
  • As shown in FIG. 1, the target detection and segmentation process of the head count detection module includes the following steps:
  • RPN: Region Proposal Network.
  • step S3) perform bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation.
  • The data set covers four different environments; 10 people are divided into five groups, and each group repeats the specified actions three times. Four of these groups are used as the training data set and the remaining group as the test data set.
  • To complete target detection, first collect videos of five groups whose heads are visible, recorded in different environments according to the specification; four groups of videos serve as the training data set and one group as the validation data set. The four groups' video frame images are processed according to steps S1) to S5) above to obtain the human head detection model; this model is then loaded for the remaining group's video frame images, and the final real-time personnel detection and counting is carried out. To complete density detection, a final density calculation step is also required.
  • The training process of the behavioral norm detection module is shown in FIG. 2 and includes the following steps:
  • step S6) input the target detection frames obtained in step S5) into STN (Spatial Transformer Networks) for the enhancement operation, extracting high-quality single-person regions from inaccurate candidate frames.
  • STN: Spatial Transformer Networks.
  • step S8) remap the single-person pose obtained in step S7) back to the image coordinate system through the SDTN (Spatial De-Transformer Network) to obtain a more accurate human target detection frame, and perform the human pose estimation operation again; then resolve the redundant detection problem through PP-NMS (Parametric Pose Non-Maximum Suppression) to obtain the human skeleton information under this behavior.
  • SDTN: Spatial De-Transformer Network.
  • step S9) for the multi-scale key points obtained in step S8), perform the key point regression operation through a deconvolution layer, which is equivalent to an upsampling pass and can improve the accuracy of the target key points.
  • A directed field connecting the key points is established, and connected key point pairs are explicitly matched according to the connectivity and structure of human body parts to reduce misconnections and obtain the final human skeleton information.
  • step S10) perform feature extraction on the final human skeleton information obtained in step S9) and input it into the classifier as a training sample of this behavior class;
  • the identification process of the behavioral norm detection module is shown in FIG. 3 and includes the following steps:
  • step S14) input the human skeleton feature information obtained in step S13) into the classifier for identification to obtain the video behavior category.
  • In step S5), preferably two convolutional layers are used to extract detection results from the different feature maps.
  • In step S8), PP-NMS operates as follows:
  • the pose with the highest confidence is selected as the reference, and area frames close to this reference are eliminated according to the elimination criterion; the process is repeated until all redundant identification frames have been eliminated and each identification frame appears exactly once.
  • Obtaining the human skeleton information in step S8) also includes the augmentation operations described above.
  • This application preferably adopts the detention center data set.
  • The data set covers four different environments; 10 people are divided into five groups, and each group repeats the specified actions three times. Four of these groups are used as the training data set and the remaining group as the test data set.
  • FIG. 4 shows a simplified flowchart of low-level feature extraction and modeling.
  • The pose estimation framework adopted is RMPE (Regional Multi-Person Pose Estimation).
  • RMPE: Regional Multi-Person Pose Estimation.
  • The outputs of specific convolutional layers are each convolved with two 3×3 convolution kernels, and all generated bounding boxes are collected and filtered through NMS to obtain the target detection frames; the detection frames are then input to STN and SPPE, where the human pose is detected automatically, after which regression through SDTN and PP-NMS establishes a directed field connecting the key points, reducing misconnections to obtain the final human pose skeleton features.
  • The technical solution of the present application adopts a two-layer convolution operation to extract the underlying features, and then uses a non-maximum suppression method to eliminate redundancy in the detection results.
  • The detection frames after redundancy elimination are input into the STN layer to enhance the features.
  • The function of the STN network is to make the obtained features robust to translation, rotation, and scale changes.
  • The feature image output by STN is used for SPPE single-person pose estimation, and the pose estimation result is then mapped back to the image coordinate system through SDTN, which can extract high-quality human regions from inaccurate region frames.
  • The problem of redundant detection is solved by PP-NMS.
  • Key point regression is carried out through the deconvolution layer to improve key point accuracy; the directed field connecting the key points is established and misconnections are reduced, so as to obtain the final human skeleton information.
  • CNN is an efficient recognition method that has been developed in recent years and has attracted wide attention.
  • While studying neurons responsible for local sensitivity and direction selection in the cat cerebral cortex, Hubel and Wiesel discovered that a unique network structure can effectively reduce the complexity of feedback neural networks; the CNN was subsequently proposed on this basis.
  • CNN has become a research hotspot in many scientific fields, especially pattern classification, because the network avoids complex image pre-processing and can take the original image directly as input; it has therefore been widely applied.
  • The basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share equal weights.
  • The feature mapping layer is used to extract the global underlying features from the video frame images, which are then processed more deeply.
  • The layer used in the technical solution of this application is the feature map obtained after convolution.
  • The detection result is obtained by convolving the feature map; the detection values include the class confidence and the position of the bounding box, each produced with a 3×3 convolution.

Abstract

Disclosed are a deep-learning-based apparatus and method for monitoring behavioral norms in a jail. The deep-learning-based apparatus for monitoring behavioral norms in a jail comprises: a people counting and detection module and a behavioral norm monitoring module, wherein the people counting and detection module comprises a target detection and segmentation process, and is used for imperceptible roll call of people and crowd density recognition; and the behavioral norm monitoring module comprises a training process of obtaining a classifier by using a training sample set and a recognition process of recognizing a test sample by using the classifier, and is used for performing real-time calculation and discrimination on behaviors of people. In this way, according to the present application, behavioral norm recognition can be effectively performed, regarding the requirements of a jail, on detainees, and abnormal behaviors are detected and alarms for same are provided, thereby reinforcing the security protection of the jail and improving the working efficiency of correctional officers.

Description

Device and method for detecting behavioral norms in prisons based on deep learning
This application is based on and claims priority from Chinese patent application No. 202010736024.9, filed on July 28, 2020 and entitled "Deep-learning-based device and method for detecting behavioral norms in prisons".
Technical Field
The present application relates to the field of machine learning research, and in particular to a deep-learning-based device and method for detecting behavioral norms in prisons.
Background
With the rapid development of information technology, computer vision has entered its best period of development alongside the emergence of concepts such as VR, AR, and artificial intelligence, and video behavior analysis, the most important topic in the field of computer vision, is attracting more and more scholars at home and abroad. Video behavior analysis occupies a large share of fields such as video surveillance, human-computer interaction, medical care, and video retrieval. In the currently popular driverless car projects, for example, video behavior analysis is very challenging. Owing to the complexity and diversity of human actions, together with self-occlusion, multiple scales, and viewpoint rotation and translation across multiple views, recognizing behavior in video is very difficult. How to accurately identify and analyze human behavior from multiple angles in real life has always been a very important research topic, and society's requirements for behavior analysis keep rising.
Traditional research methods include the following:
Based on video-stream feature points: extract the spatio-temporal feature points in the extracted video frame images, then model and analyze these feature points, and finally classify them.
Based on single-frame image features: extract the behavioral features of people in a single frame through algorithms or a depth camera, then describe and model these features, train on them, and classify the video behavior.
Behavior analysis methods based on video-stream feature points and single-frame image features have achieved remarkable results in the traditional single-view or single-person setting. However, with the emergence of a series of complex problems such as human occlusion, illumination changes, and viewpoint changes in areas with heavy pedestrian traffic such as streets, airports, and stations, simply using these two analysis methods in real life often fails to meet people's requirements, and the robustness of the algorithms can also be poor.
Summary of the Invention
In order to overcome the above defects of the prior art, the present application proposes a deep-learning-based device and method for detecting behavioral norms in prisons, which uses a deep learning network to analyze human behavior and improves the robustness of the classification model; in particular, a deep learning network is well suited to training and learning on big data and can give full play to this advantage.
The technical solution of the present application is realized as follows:
The embodiment of the present application provides a deep-learning-based detection device for behavioral norms in prisons, whose behavior detection algorithms are designed entirely in accordance with the code of conduct for detainees in detention centers. The detection process includes setting a detection trigger time period and a detection area for each standardized behavior detection; behavior detection is triggered only within the set time period and the set detection area, and the corresponding recognition algorithms are not run at other times or in other areas. This reduces the execution complexity of the system and improves its stability. The detection time period and detection area are entirely user-defined and set in accordance with the standard code of conduct, which satisfies the needs of code-of-conduct detection well.
Specifically, the present application proposes a deep-learning-based behavioral norm detection device for prisons, including a head count detection module and a behavioral norm detection module, wherein:
the head count detection module is used for imperceptible roll call and/or crowd density identification, and includes a target detection and segmentation process;
the behavioral norm detection module is used for real-time calculation and discrimination of personnel behavior, and includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
Specifically, an embodiment of the present application also proposes a deep-learning-based method for detecting behavioral norms in prisons, characterized in that the method includes the following steps:
head count detection, used for imperceptible roll call and/or crowd density identification, including a target detection and segmentation process;
behavioral norm detection, used for real-time calculation and discrimination of personnel behavior, including a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
The advantages of the present application are as follows: the CNN method yields global high-level features, and after STN feature enhancement these are highly robust to real-life video; SPPE is then used to obtain human posture information, SDTN maps the pose back to the human detection frame so that the network optimizes itself, PP-NMS is used to resolve redundant detections, and the corresponding classifier is trained on the pose estimation results. Features derived from global features are more comprehensive, making the behavior description more complete and more widely applicable.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the target detection and segmentation process of the head count detection module of the present application;
FIG. 2 is a schematic flowchart of the training process of the behavioral norm detection module of the present application;
FIG. 3 is a schematic flowchart of the discrimination process of the behavioral norm detection module of the present application;
FIG. 4 is a simplified flowchart of the extraction and modeling of underlying features;
FIG. 5 is a processing flowchart of a general CNN.
Detailed Description
In view of the deficiencies in the prior art, the inventors of the present case, through long-term research and extensive practice, have been able to propose the technical solution of the present application. The technical solution, its implementation process, and its principles are further explained below.
The deep-learning-based prison behavioral norm detection device provided by the embodiments of the present application uses the CNN method to perform feature extraction on underlying features, obtaining global features rather than the key points obtained by traditional methods; it uses the STN method to enhance the obtained global features instead of modeling the obtained features directly; it uses the SDTN method to remap the obtained pose features, further improving the accuracy of the detection frame. In addition, for key points at multiple scales, a deconvolution layer performs the key point regression operation, which can effectively improve the accuracy of multi-person key point detection; the device also takes into account the connectivity of multiple key points and establishes a directed field connecting the key points, so that connected key point pairs are explicitly matched according to the connectivity of human key points and the structure of the human body.
An embodiment of the present application provides a deep-learning-based behavioral norm detection device for prisons, including a head count detection module and a behavioral norm detection module, wherein:
the head count detection module is used for imperceptible roll call and/or crowd density identification, and includes a target detection and segmentation process;
the behavioral norm detection module is used for real-time calculation and discrimination of personnel behavior, and includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
Further, the target detection and segmentation process of the head count detection module includes the following steps:
S1) use a labeling tool to annotate human heads in the images, generating one JSON file per image, and extract the feature information of the annotated images through a convolutional neural network;
S2) from the feature information obtained in step S1), extract ROIs (regions of interest) using a region proposal network, then use region-of-interest pooling to bring these ROIs to a fixed size;
S3) perform bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation;
S4) finally, run the segmentation mask network: take the positive regions selected by the ROI classifier as input and generate their masks; enlarge the predicted masks to the size of the ROI bounding boxes to give the final mask results, one mask per target; adding the predicted segmentation mask to each ROI yields, as output, the objects present in the image together with high-quality segmentation masks.
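Steps S1) to S4) follow the overall shape of a Mask R-CNN-style detector (RPN, ROI pooling/align, box regression and classification, and a mask branch). As a minimal sketch of how such a head detector could be assembled from off-the-shelf components — the two-class setup, thresholds, and input sizes here are illustrative assumptions, not part of the patent:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_head_detector(num_classes: int = 2):  # background + "head"
    """Mask R-CNN with RPN, ROI align, box regression/classification,
    and a mask branch, mirroring steps S1)-S4)."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box head for the two-class (background / head) problem.
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    # Replace the mask head so it predicts one mask per detected head.
    in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)
    return model

model = build_head_detector()
model.eval()
with torch.no_grad():
    heads = model([torch.rand(3, 480, 640)])[0]  # boxes, labels, scores, masks
print(int((heads["scores"] > 0.5).sum()), "heads detected")
```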
Further, the head count detection module specifically includes:
a target detection unit, used for imperceptible real-time detection and counting of detainees;
a density analysis unit, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards;
the target detection unit operates as follows: first, collect videos of five groups of people whose heads are visible in the video images, recorded in different environments according to the specification, with four groups of videos used as the training data set and one group as the validation data set; then process the four groups' video frame images according to steps S1) to S5) to obtain a human head detection model; finally, load this model for the remaining group's video frame images and perform the final real-time personnel detection and counting;
Further, the training process of the behavioral norm detection module includes the following steps:
S5) input the video frame images of a given behavior and let the images pass through the convolutional neural network to extract features; convolve the outputs of 6 specific convolutional layers in the network with two 3×3 convolution kernels each; then collect all the generated bounding boxes and pass them to NMS (non-maximum suppression) to obtain a series of target detection frames.
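Step S5) resembles an SSD-style multi-scale detection head. The sketch below (channel counts and anchor numbers are illustrative assumptions) shows the two 3×3 convolutions per feature map — one for class confidence, one for box offsets — followed by NMS over the collected boxes:

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class DetectionHead(nn.Module):
    """Two 3x3 convolutions per feature map: class confidence + box offsets."""
    def __init__(self, channels_per_level, num_anchors=4, num_classes=2):
        super().__init__()
        self.cls_convs = nn.ModuleList(
            nn.Conv2d(c, num_anchors * num_classes, 3, padding=1)
            for c in channels_per_level)
        self.box_convs = nn.ModuleList(
            nn.Conv2d(c, num_anchors * 4, 3, padding=1)
            for c in channels_per_level)

    def forward(self, feature_maps):
        scores, boxes = [], []
        for fmap, cls_conv, box_conv in zip(feature_maps, self.cls_convs, self.box_convs):
            scores.append(cls_conv(fmap).flatten(2))  # raw per-anchor scores
            boxes.append(box_conv(fmap).flatten(2))   # raw per-anchor offsets
        return scores, boxes

def filter_detections(all_boxes, all_scores, iou_threshold=0.5):
    """Collect decoded boxes from all 6 levels, keep the survivors of NMS.
    `all_boxes` is (N, 4) in (x1, y1, x2, y2) form, `all_scores` is (N,)."""
    keep = nms(all_boxes, all_scores, iou_threshold)
    return all_boxes[keep], all_scores[keep]
```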
S6) input the target detection frames obtained in step S5) into the STN (spatial transformer network) for an enhancement operation, extracting high-quality single-person regions from inaccurate candidate frames;
S7) apply SPPE (a single-person pose estimator) to each single-person region frame enhanced in step S6) to estimate that person's posture skeleton;
S8) remap the single-person pose obtained in step S7) back to the image coordinate system through the SDTN (spatial de-transformer network) to obtain a more accurate human target detection frame, and perform the human pose estimation operation again; then resolve the redundant detection problem through PP-NMS (parametric pose non-maximum suppression) to obtain the human skeleton information under this behavior;
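In the RMPE framework these steps follow, the STN applies a learned affine transform to crop a clean single-person region, and the SDTN applies the inverse transform to map the estimated pose back. A minimal sketch of this pair using PyTorch's grid sampler — the localization network here is an assumed placeholder, not the patent's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    """Warp a candidate region with a learned 2x3 affine theta; the SDTN
    step is then just the inverse affine applied to the estimated pose."""
    def __init__(self):
        super().__init__()
        # Placeholder localization net: predicts theta from pooled features.
        self.loc = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                                 nn.Linear(3 * 8 * 8, 6))

    def forward(self, region):                 # region: (B, 3, H, W)
        theta = self.loc(region).view(-1, 2, 3)
        grid = F.affine_grid(theta, region.size(), align_corners=False)
        return F.grid_sample(region, grid, align_corners=False), theta

def sdtn(pose_xy, theta):
    """Map pose key points (B, K, 2) back to the original coordinate
    system by inverting the affine transform used by the STN."""
    a, t = theta[:, :, :2], theta[:, :, 2:]            # (B,2,2), (B,2,1)
    return torch.einsum("bij,bkj->bki", torch.inverse(a),
                        pose_xy - t.transpose(1, 2))
```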
S9) for the multi-scale key points obtained in step S8), perform the key point regression operation through a deconvolution layer, which is equivalent to an upsampling pass and can improve the accuracy of the target key points; considering the connectivity of multiple key points, establish a directed field connecting the key points and match connected key point pairs according to the connectivity and structure of human body parts, reducing misconnections and obtaining the final human skeleton information;
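A common way to realize the deconvolution-based key point regression of step S9) is a transposed-convolution head that upsamples the feature map into one heatmap per key point, reading each key point off as the arg-max of its heatmap. A sketch under those assumptions (channel counts and the 17-joint layout are illustrative):

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Deconvolution (transposed conv) head: upsample features into one
    heatmap per key point, as in the key point regression of step S9)."""
    def __init__(self, in_channels=256, num_keypoints=17):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_channels, 128, kernel_size=4,
                                         stride=2, padding=1)  # 2x upsampling
        self.head = nn.Conv2d(128, num_keypoints, kernel_size=1)

    def forward(self, features):               # features: (B, 256, H, W)
        heatmaps = self.head(torch.relu(self.deconv(features)))
        B, K, H, W = heatmaps.shape
        flat = heatmaps.view(B, K, -1).argmax(dim=2)
        return torch.stack((flat % W, flat // W), dim=2)  # (B, K, 2) x,y
```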
S10) perform feature extraction on the final human skeleton information obtained in step S9) and input it into the classifier as a training sample of this behavior class;
S11) repeat the above steps to obtain classifiers for the various behaviors.
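The patent does not fix a particular classifier. As one plausible reading of steps S10)-S11), the sketch below flattens normalized skeleton key points into a feature vector and fits a scikit-learn SVM over the behavior classes; both the normalization and the choice of SVM are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def skeleton_features(keypoints):
    """keypoints: (K, 2) joint coordinates. Normalize away translation and
    scale so the classifier sees the pose, not where it sits in the frame."""
    kp = np.asarray(keypoints, float)
    kp -= kp.mean(axis=0)                        # remove translation
    scale = np.linalg.norm(kp, axis=1).max() or 1.0
    return (kp / scale).ravel()                  # flatten to a vector

def train_behavior_classifier(skeleton_samples, labels):
    """skeleton_samples: list of (K, 2) arrays; labels: behavior class ids."""
    X = np.stack([skeleton_features(s) for s in skeleton_samples])
    return SVC(kernel="rbf", probability=True).fit(X, labels)
```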
Further, the identification process of the behavioral norm detection module includes the following steps:
S12) following the code of conduct for detainees in the detention center, set the detection trigger time period and detection area for each specific behavior detection requirement, and store them locally in JSON form;
S13) when detecting, first read the JSON file; within the set detection trigger time period, take the video frame images of a given behavior, use only the images inside the detection area, and perform human pose estimation on them according to steps S5) to S10) to obtain the human skeleton feature information within the detection area; during the remaining time periods, only play the video frame images without performing any behavior recognition operation;
S14) input the human skeleton feature information obtained in step S13) into the classifier for identification to obtain the video behavior category.
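The JSON schema of step S12) is not spelled out in the patent. The sketch below assumes one plausible layout (all field names and values are illustrative) and shows the gating logic of step S13): recognition runs only inside the configured time window and detection area:

```python
import json
from datetime import datetime

# Assumed config layout: one rule per standardized behavior.
RULES_JSON = """
{"washing_order": {"start": "06:30", "end": "07:00",
                   "area": [[120, 80], [470, 80], [470, 330], [120, 330]]}}
"""

def rule_active(rule, now=None):
    """True only inside the configured detection trigger time period."""
    now = now or datetime.now()
    start = datetime.strptime(rule["start"], "%H:%M").time()
    end = datetime.strptime(rule["end"], "%H:%M").time()
    return start <= now.time() <= end

def crop_detection_area(frame, area):
    """Keep only the image inside the (assumed rectangular) detection area."""
    xs = [p[0] for p in area]
    ys = [p[1] for p in area]
    return frame[min(ys):max(ys), min(xs):max(xs)]

rules = json.loads(RULES_JSON)
rule = rules["washing_order"]
# if rule_active(rule): run steps S5)-S10) on crop_detection_area(frame, rule["area"])
# else: just display the frame; no recognition is performed.
```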
Further, the identification process includes setting the detection trigger time period and detection area for each standardized behavior detection and using the classifier for identification; the detection time and detection area are set manually, strictly following the code of conduct for detainees in the detention center. Within the detection trigger time period, the corresponding behavior recognition operation is carried out in the set detection area, and an alarm message is issued when a violation is identified; outside the detection trigger time period, no behavior recognition operation is performed. The detection time period and detection area are entirely user-defined and set in accordance with the standard code of conduct, which satisfies the needs of code-of-conduct detection well.
Further, the PP-NMS operation in step S8) specifically includes: selecting the pose with maximum confidence as a reference and eliminating area frames close to this reference according to an elimination criterion, repeating the process until all redundant identification frames are eliminated and each identification frame appears exactly once;
obtaining the human skeleton information in step S8) further includes: using an augmented data set that imitates the formation process of human region frames by learning the descriptive information of different poses in the output results, thereby generating a larger training set.
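A minimal sketch of the PP-NMS loop just described, assuming a simple pose distance (mean key point distance) as the elimination criterion; the actual parametric criterion in RMPE is learned, so this is illustrative only:

```python
import numpy as np

def pose_distance(p, q):
    """Assumed elimination criterion: mean Euclidean distance between
    corresponding key points of two poses of shape (K, 2)."""
    return np.linalg.norm(np.asarray(p) - np.asarray(q), axis=1).mean()

def pp_nms(poses, confidences, threshold=20.0):
    """Keep the most confident pose, drop poses close to it, and repeat
    until every remaining identification frame appears exactly once."""
    order = list(np.argsort(confidences)[::-1])
    kept = []
    while order:
        ref = order.pop(0)
        kept.append(ref)
        order = [i for i in order
                 if pose_distance(poses[i], poses[ref]) > threshold]
    return kept  # indices of the unique poses
```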
Yet another embodiment of the present application provides a deep-learning-based method for detecting behavioral norms in prisons, the method including the following steps:
head count detection, used for imperceptible roll call and/or crowd density identification, including a target detection and segmentation process;
behavioral norm detection, used for real-time calculation and discrimination of personnel behavior, including a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
Further, the target detection and segmentation process specifically includes the following steps:
S1) use a labeling tool to annotate human heads in the images, generating one JSON file per image, and extract the feature information of the annotated images through a convolutional neural network;
S2) from the feature information obtained in step S1), extract ROIs (regions of interest) using a region proposal network, then use region-of-interest pooling to bring these ROIs to a fixed size;
S3) perform bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation;
S4) finally, run the segmentation mask network: take the positive regions selected by the ROI classifier as input and generate their masks; enlarge the predicted masks to the size of the ROI bounding boxes to give the final mask results, one mask per target; adding the predicted segmentation mask to each ROI yields, as output, the objects present in the image together with high-quality segmentation masks.
Further, the head count detection specifically includes the following steps:
target detection, used for imperceptible real-time detection and counting of detainees;
density analysis, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards (a density sketch follows below);
the target detection includes the following steps: first, collect videos of five groups of people whose heads are visible in the video images, recorded in different environments according to the specification, with four groups of videos used as the training data set and one group as the validation data set; then process the four groups' video frame images according to steps S1) to S5) to obtain a human head detection model; finally, load this model for the remaining group's video frame images and perform the final real-time personnel detection and counting.
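The patent does not give the density formula. A straightforward reading of the density analysis unit is heads per unit area with an alarm threshold, as sketched below; the area value and the limit are assumed for illustration:

```python
def crowd_density(num_heads: int, area_m2: float) -> float:
    """People per square metre in the monitored zone."""
    return num_heads / area_m2

def density_alarm(num_heads: int, area_m2: float, limit: float = 1.5) -> bool:
    """Raise an abnormality alarm when density exceeds an assumed limit."""
    return crowd_density(num_heads, area_m2) > limit

# e.g. 28 detected heads in a 15 m^2 exercise yard -> density ~1.87 -> alarm.
assert density_alarm(28, 15.0)
```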
The technical solution, its implementation process, and its principles are further explained below with reference to the accompanying drawings.
As shown in FIGS. 1-3, the deep-learning-based method for detecting behavioral norms in prisons of the present application includes a head count detection module and a behavioral norm detection module.
The head count detection module is used for imperceptible roll call of detainees and crowd density identification in the prison; the behavioral norm detection module performs real-time calculation and discrimination of behaviors covering dormitory washing order, housekeeping, dining order and sleeping order, wake-up order, television education order, safety duty rotation norms, conduct assessment norms, "three-positioning" supervision norms, and the head-holding norm when leaving the cell.
The head count detection module specifically includes: a target detection unit, used for imperceptible real-time detection and counting of detainees, and a density analysis unit, used for real-time accurate density detection and abnormality alarms in dormitories and exercise yards.
The behavioral norm detection module specifically includes the following units (a zone-occupancy sketch follows this list):
A washing order comparison unit, used to define the toilet and queueing areas in the cell and to calculate in real time whether only 2 people are in the toilet and whether the other people are waiting in the designated area.
A housekeeping norm unit, used to define the bed and wall-side waiting areas in the dormitory and to calculate in real time whether exactly 4 people remain at the beds doing housekeeping and whether the other personnel are waiting in the wall-side area.
A dining order comparison unit, used during dormitory meal times to calculate in real time whether anyone is abnormally not seated while eating.
A sleeping order comparison unit, used during dormitory rest times to calculate in real time whether anyone is sleeping with a covered head or getting up in violation of the rules.
A wake-up order norm unit, used at the dormitory wake-up deadline to calculate in real time whether anyone is still in bed.
A television education order comparison unit, used during television education time to calculate in real time whether anyone is abnormally not seated watching television education, issuing an alarm when too many people are walking around.
A safety duty rotation norm unit, used to define the safety duty area in the cell and to calculate in real time whether 2 people are present in that area, judging it a violation if they remain motionless in the same position for a long time.
A conduct norm assessment unit, used during dormitory exercise time to calculate the tidiness of the queue in real time and score it.
A "three-positioning" supervision unit, used when a fight occurs in the cell to calculate in real time whether personnel carry out the "three-positioning" operation as required.
A head-holding-on-exit norm unit, used to define the cordon area in the cell and to calculate in real time whether a person leaving the cell holds their head with both hands within the cordon area as required.
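Most of these units reduce to counting detected people inside configured zones. The sketch below shows that shared primitive for the washing order rule, using a point-in-polygon test on detected head positions; the polygon test and the exact encoding of the rule ("at most 2 in the toilet, everyone else in the waiting area") are assumptions:

```python
def point_in_polygon(pt, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def washing_order_ok(head_positions, toilet_zone, waiting_zone):
    """Assumed encoding of the washing order rule: at most 2 people in the
    toilet zone, and everyone else inside the designated waiting zone."""
    in_toilet = [p for p in head_positions if point_in_polygon(p, toilet_zone)]
    others = [p for p in head_positions if p not in in_toilet]
    return (len(in_toilet) <= 2 and
            all(point_in_polygon(p, waiting_zone) for p in others))
```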
Further, the head count detection module includes a target detection and segmentation process, and the behavioral norm detection module includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples. The behavior detection algorithms are designed entirely in accordance with the code of conduct for detainees in detention facilities. The recognition process includes setting a detection trigger time period and a detection area for each normative behavior, strictly following that code of conduct, and using the classifier for identification. Within a trigger time period, the corresponding behavior recognition is performed on the configured detection area, and an alarm is issued when a violation is recognized; outside the trigger time period, no behavior recognition is performed. Because detection runs only in the configured time periods and areas, the system's execution complexity is reduced and its stability improved. The detection time periods and areas are fully user-defined according to the standard code of conduct, which satisfies the requirements of behavioral norm detection well.
Further, the target detection and segmentation process of the head count detection module, shown in Figure 1, includes the following steps:
S1) Annotate human heads in the images with a labeling tool, producing one JSON file per image; extract the feature information of the annotated images with a CNN (Convolutional Neural Network).
S2) Pass the feature information from step S1) through an RPN (Region Proposal Network) to extract ROIs (Regions Of Interest), then apply ROI Pooling to bring all ROIs to a fixed size.
S3) Perform bounding box regression and classification prediction on the ROIs from step S2) through fully connected layers, sampling at different points of the feature map and applying bilinear interpolation.
S4) Finally, run the segmentation mask network: take the positive regions selected by the ROI classifier as input and generate their masks. The predicted masks are scaled up to the size of the ROI bounding boxes to give the final masking result, one mask per target. Adding the predicted segmentation mask to each ROI yields, as output, the objects present in the image together with high-quality segmentation masks.
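Steps S1) to S4) describe an RPN, ROI pooling, and box/mask head pipeline of the Mask R-CNN family. Since the patent's own network and head-annotation data are not public, the sketch below runs torchvision's off-the-shelf Mask R-CNN purely as a stand-in for that architecture family:

    import torch
    import torchvision

    # Pretrained Mask R-CNN as a stand-in for the S1)-S4) detection/segmentation net.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)           # one video frame, CHW, values in [0, 1]
    with torch.no_grad():
        out = model([image])[0]               # dict with boxes, labels, scores, masks

    keep = out["scores"] > 0.5                # confidence-filtered detections
    boxes, masks = out["boxes"][keep], out["masks"][keep]

A head-specific model would be obtained by fine-tuning such a network on the JSON head annotations produced in step S1).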
The dataset covers four different environments, with 10 people divided into five groups; each group repeats the routine three times as required by the norms. Four of the groups are used as the training dataset and the remaining group as the test dataset.
Specifically, to perform target detection, videos are first collected of the five groups exposing their heads in the video images in the different environments as required by the norms; the videos of four groups serve as the training dataset and one group's videos as the validation dataset. The video frames of the four groups are processed through steps S1) to S4) above, yielding a head detection model; that model is then loaded and applied to the remaining group's video frames for final real-time detection and counting of people. To perform density detection, only a final density calculation step is additionally required.
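The text does not spell out the density calculation itself; one plausible minimal reading, assumed here purely for illustration, is head count per unit region area compared against an alarm threshold:

    def crowd_density(num_heads, region_area_m2):
        """People per square metre within a monitored region."""
        return num_heads / region_area_m2

    def density_alarm(num_heads, region_area_m2, threshold=4.0):
        """Raise an alarm flag when density exceeds a configured threshold."""
        return crowd_density(num_heads, region_area_m2) > threshold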
The training process of the behavioral norm detection module, shown in Figure 2, includes the following steps:
S5) Input the video frames of a given behavior and extract features with a CNN; convolve the outputs of 6 specific convolutional layers of the network with two 3*3 convolution kernels each, then gather all generated bounding boxes and pass them through NMS (Non-Maximum Suppression) to obtain a series of target detection boxes.
S6) Feed the target detection boxes from step S5) into an STN (Spatial Transformer Network) for a refinement operation that extracts high-quality single-person regions from inaccurate candidate boxes.
S7) Apply an SPPE (Single Person Pose Estimator) to each refined single-person region from step S6) to estimate that person's pose skeleton.
S8) Remap the single-person poses from step S7) back to the image coordinate system through an SDTN (Spatial De-Transformer Network), obtaining more accurate human detection boxes, and run pose estimation again. Then resolve redundant detections with PP-NMS (Parametric Pose Non-Maximum Suppression) to obtain the human skeleton information for the behavior.
S9) Apply a keypoint regression operation to the multi-scale keypoints from step S8) through deconvolution layers, which amounts to an upsampling pass and improves the precision of the target keypoints. Considering the connectivity of multiple keypoints, build a directed field connecting them and match connected keypoint pairs explicitly according to the connectivity and structure of human body parts, reducing false connections and yielding the final human skeleton information.
S10) Extract features from the final human skeleton information obtained in step S9) and feed them into the classifier as training samples for that class of behavior.
S11) Repeat the above steps to obtain classifiers for the various behaviors.
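The classifier in steps S10) and S11) is not named in the text. The sketch below assumes, for illustration only, a support vector machine trained on keypoint coordinates normalised for translation and scale; the COCO keypoint indexing and every name here are assumptions, not the patent's method:

    import numpy as np
    from sklearn.svm import SVC

    def skeleton_features(keypoints):
        """Flatten a (17, 2) keypoint array, centred on the mid-hip point and
        scaled by shoulder-to-hip length, giving translation/scale invariance."""
        kp = np.asarray(keypoints, dtype=float)
        centre = kp[11:13].mean(axis=0)                 # mid-hip (COCO indices)
        scale = np.linalg.norm(kp[5] - kp[11]) + 1e-6   # left shoulder to left hip
        return ((kp - centre) / scale).ravel()

    # X: one feature row per pose sample; y: behaviour label per sample.
    # Random data stands in for the skeletons produced by steps S5)-S9).
    X = np.stack([skeleton_features(np.random.rand(17, 2)) for _ in range(100)])
    y = np.random.choice(["washing", "violation"], size=100)
    clf = SVC(probability=True).fit(X, y)

One such classifier would be trained per behavior class, mirroring step S11).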
The recognition process of the behavioral norm detection module, shown in Figure 2, includes the following steps:
S12) Following the code of conduct for detainees and the specific behavior detection requirements, set the detection trigger time period and detection area and store them locally as JSON.
S13) During detection, first read the JSON file. Within the configured trigger time period, take the video frames of a behavior, keep only the imagery inside the detection area, and run steps S5) to S10) above for human pose estimation, obtaining the human skeleton feature information within the detection area. During the remaining time periods only the video frames are played, with no behavior recognition performed.
S14) Input the human skeleton feature information from step S13) into the classifier for recognition, obtaining the video behavior category.
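The JSON schema for the trigger time period and detection area in steps S12) and S13) is not given in the text, so the sketch below assumes a hypothetical schema and shows the corresponding time gate:

    import json
    from datetime import datetime, time

    # Hypothetical configuration format; the patent only says "stored as JSON".
    CONFIG = json.loads("""
    {
      "behavior": "washing_order",
      "trigger_start": "06:30",
      "trigger_end":   "07:00",
      "region": [[100, 80], [540, 80], [540, 420], [100, 420]]
    }
    """)

    def in_trigger_period(cfg, now=None):
        """True only inside the configured detection trigger time period."""
        now = (now or datetime.now()).time()
        start = time.fromisoformat(cfg["trigger_start"])
        end = time.fromisoformat(cfg["trigger_end"])
        return start <= now <= end

Mirroring step S13), a frame would be cropped to CONFIG["region"] and analysed only when in_trigger_period(CONFIG) is true; otherwise it is merely displayed.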
In the above technical solution, step S5) preferably uses two convolutional layers on the different feature maps to extract the detection results.
In the above technical solution, PP-NMS in step S8) operates as follows:
First select the pose with the highest confidence as the reference and eliminate the region boxes close to that reference according to an elimination criterion; this process is repeated until the redundant detection boxes are eliminated and every detection box appears only once.
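As a rough illustration of this loop, a greedy pose NMS might look like the sketch below. Note that the actual elimination criterion in RMPE is a learned combination of keypoint distance and confidence, whereas a plain mean keypoint distance is assumed here:

    import numpy as np

    def pose_nms(poses, scores, dist_thresh=20.0):
        """poses: (N, K, 2) keypoint arrays; scores: (N,) confidences.
        Returns the indices of the poses that survive suppression."""
        poses, scores = np.asarray(poses, float), np.asarray(scores, float)
        order, keep = np.argsort(-scores), []
        while order.size:
            ref = order[0]                     # highest-confidence remaining pose
            keep.append(ref)
            d = np.linalg.norm(poses[order] - poses[ref], axis=-1).mean(axis=-1)
            order = order[d >= dist_thresh]    # drop near-duplicate poses
        return keep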
In the above technical solution, obtaining the human skeleton information in step S8) further includes the following operation:
Use an augmented dataset that imitates the formation process of human region boxes by learning the descriptive information of different poses in the output results, thereby producing a larger training set.
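RMPE learns this box-offset distribution from data; the sketch below substitutes a simple Gaussian jitter of ground-truth boxes, an explicitly assumed stand-in for that augmentation:

    import numpy as np

    def jitter_boxes(boxes, n_per_box=10, sigma=0.05, rng=None):
        """boxes: (N, 4) as (x1, y1, x2, y2); returns (N * n_per_box, 4)
        noisy copies that imitate imperfect detector proposals."""
        rng = rng or np.random.default_rng(0)
        boxes = np.asarray(boxes, float)
        wh = boxes[:, 2:] - boxes[:, :2]                 # widths and heights
        scale = np.tile(wh, 2)                           # per-coordinate noise scale
        noise = rng.normal(0.0, sigma, (len(boxes), n_per_box, 4))
        return (boxes[:, None, :] + noise * scale[:, None, :]).reshape(-1, 4)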
This application preferably uses a detention center dataset covering four different environments, with 10 people divided into five groups; each group repeats the routine three times as required by the norms. Four of the groups are used as the training dataset and the remaining group as the test dataset.
Specifically, to recognize the behavior "washing", videos are first collected of four groups entering the washing area according to the norms and one group entering it in violation; the washing videos of the four groups serve as the training dataset and those of the one group as the validation dataset. The washing video frames of a given group are processed through steps S5) to S10) above, yielding the human pose skeleton feature information of the "washing" video behavior; this is used as that group's training sample for the "washing" behavior norm and input to the classifier for training. After training on the samples of several different groups, the "washing" behavior classifier is obtained. Classifiers for the various video behaviors can be constructed in the same way.
For discrimination, steps S12) to S14) above are executed. The detection trigger time period and detection area are set first; if the current time falls within the trigger period, the video frames of one group in the test samples are segmented according to the configured detection area, and only the imagery inside that area is processed through steps S5) to S10) to obtain the human pose skeleton feature information within the detection area. After passing through the data augmentation set, it is input into the classifier, which identifies the behavior category. The discrimination process for the other environments is the same.
Figure 3 shows a simplified flowchart of low-level feature extraction and modeling.
The pose estimation framework adopted in the technical solution of this application is RMPE (Regional Multi-Person Pose Estimation). As shown in Figure 3, features are first extracted from the input image with a CNN, and the outputs of 6 specific convolutional layers of the network are each convolved with two 3*3 convolution kernels; all generated bounding boxes are gathered and filtered by NMS into target detection boxes, which are then fed into the STN and SPPE to detect human poses automatically. Regression then follows through the SDTN and PP-NMS, and a directed field connecting keypoints is built to reduce false connections, yielding the final human pose skeleton features.
The technical solution of this application uses a two-layer convolution operation to extract the low-level features and then removes redundancy from the detection results with non-maximum suppression. The detection boxes remaining after redundancy elimination are fed into the STN layer to refine the features; the STN makes the resulting features robust to translation, rotation, and scale changes. The feature images output by the STN then undergo SPPE single-person pose estimation, after which the pose estimation results are mapped back to the image coordinate system through the SDTN, allowing high-quality human regions to be extracted from imprecise region boxes. PP-NMS then resolves the redundant detection problem. Finally, keypoint regression through deconvolution layers improves keypoint precision, and a directed field connecting the keypoints is built to reduce false connections, yielding the final human skeleton information.
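The STN refinement referred to above applies a predicted affine warp to each candidate region. A minimal sketch, assuming PyTorch's affine_grid and grid_sample as the sampling mechanism and a hand-set transform in place of the learned one:

    import torch
    import torch.nn.functional as F

    feat = torch.rand(1, 256, 32, 32)          # one candidate-region feature map
    theta = torch.tensor([[[1.0, 0.0, 0.1],    # 2x3 affine: slight x-translation
                           [0.0, 1.0, 0.0]]])
    grid = F.affine_grid(theta, feat.shape, align_corners=False)
    warped = F.grid_sample(feat, grid, align_corners=False)

In the full network, theta is regressed by a small localisation sub-network, and the SDTN applies the inverse transform to map poses back to image coordinates.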
In the above technical solution, the CNN is an efficient recognition method developed in recent years that has attracted wide attention. In the 1960s, while studying neurons responsible for local sensitivity and orientation selection in the cat's visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, which later led to the CNN. Today the CNN is a research focus in many scientific fields, particularly pattern classification, where it is widely used because it avoids complex image preprocessing and can take raw images directly as input.
Generally, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share the same weights.
The technical solution of this application uses the feature mapping layers to extract the global low-level features of the video frames, which are then processed at deeper levels.
The generalized processing flow of a CNN is shown in Figure 4.
The layers used by the technical solution of this application are the feature maps obtained after convolution. Six feature maps are extracted, with sizes (38,38), (19,19), (10,10), (5,5), (3,3), and (1,1), and several prior boxes of different scales or aspect ratios are placed at each cell of each feature map. The detection results are obtained by convolving these feature maps; the detection values include the class confidences and the bounding box positions, each produced by one 3*3 convolution.
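As a hedged sketch of the two parallel 3*3 prediction convolutions on one of the six feature maps, one branch outputting class confidences and the other box offsets for every prior box (channel counts and the class set are illustrative assumptions):

    import torch
    import torch.nn as nn

    num_classes, priors_per_cell = 2, 4        # e.g. head vs. background
    cls_head = nn.Conv2d(512, priors_per_cell * num_classes, 3, padding=1)
    loc_head = nn.Conv2d(512, priors_per_cell * 4, 3, padding=1)

    fmap = torch.rand(1, 512, 38, 38)          # the largest of the six feature maps
    cls_scores = cls_head(fmap)                # (1, priors * classes, 38, 38)
    box_offsets = loc_head(fmap)               # (1, priors * 4, 38, 38)

The same pair of heads would be applied at each of the six scales, and all resulting boxes gathered and filtered by NMS as described in step S5).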
It should be understood that the above embodiments merely illustrate the technical concept and characteristics of the present application; their purpose is to enable those familiar with the art to understand and implement its content, and they do not limit the protection scope of the present application. All equivalent changes or modifications made according to the spirit of the present application shall fall within its protection scope.

Claims (10)

  1. A deep-learning-based apparatus for detecting behavioral norms in a detention facility, characterized by comprising: a head count detection module and a behavioral norm detection module; wherein:
    the head count detection module is used for non-sensing roll call and/or crowd density identification, and includes a target detection and segmentation process;
    the behavioral norm detection module is used for real-time computation and discrimination of personnel behavior, and includes a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  2. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 1, characterized in that the target detection and segmentation process of the head count detection module comprises the following steps:
    S1) annotating human heads in the images with a labeling tool, producing one JSON file per image, and extracting the feature information of the annotated images with a convolutional neural network;
    S2) passing the feature information obtained in step S1) through a region proposal network to extract ROIs, i.e. regions of interest, then applying region-of-interest pooling to bring all of these ROIs to a fixed size;
    S3) performing bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map, and applying bilinear interpolation;
    S4) finally running the segmentation mask network: taking the positive regions selected by the ROI classifier as input and generating their masks; scaling the predicted masks up to the size of the ROI bounding boxes to give the final masking result, one mask per target; and adding the predicted segmentation mask to each ROI, the output being the objects present in the image together with high-quality segmentation masks.
  3. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 2, characterized in that the head count detection module specifically comprises:
    a target detection unit for non-sensing real-time detection and counting of detainees;
    a density analysis unit for real-time, accurate density detection and anomaly alarms in cell blocks and exercise yards;
    wherein the target detection unit operates through the following steps: first collecting videos of five groups exposing their heads in the video images in different environments as required by the norms, the videos of four groups serving as the training dataset and one group's videos as the validation dataset; then processing the video frames of the four groups through steps S1) to S4) to obtain a head detection model; and finally loading the head detection model for the remaining group's video frames to perform final real-time detection and counting of people.
  4. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 1, characterized in that the training process of the behavioral norm detection module comprises the following steps:
    S5) inputting the video frames of a given behavior, extracting features with a convolutional neural network, convolving the outputs of 6 specific convolutional layers of the network with two 3*3 convolution kernels each, then gathering all generated bounding boxes and passing them through NMS, i.e. non-maximum suppression, to obtain a series of target detection boxes;
    S6) feeding the target detection boxes obtained in step S5) into an STN, i.e. a spatial transformer network, for a refinement operation that extracts high-quality single-person regions from inaccurate candidate boxes;
    S7) applying an SPPE, i.e. a single-person pose estimator, to each refined single-person region from step S6) to estimate that person's pose skeleton;
    S8) remapping the single-person poses obtained in step S7) back to the image coordinate system through an SDTN, i.e. a spatial de-transformer network, thereby obtaining more accurate human detection boxes, and running pose estimation again; then resolving redundant detections with PP-NMS, i.e. parametric pose non-maximum suppression, to obtain the human skeleton information for the behavior;
    S9) applying a keypoint regression operation to the multi-scale keypoints obtained in step S8) through deconvolution layers, which amounts to an upsampling pass and improves the precision of the target keypoints; considering the connectivity of multiple keypoints, building a directed field connecting them and matching connected keypoint pairs explicitly according to the connectivity and structure of human body parts, reducing false connections and yielding the final human skeleton information;
    S10) extracting features from the final human skeleton information obtained in step S9) and feeding them into the classifier as training samples for that class of behavior;
    S11) repeating the above steps to obtain classifiers for the various behaviors.
  5. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 4, characterized in that the recognition process of the behavioral norm detection module comprises the following steps:
    S12) following the code of conduct for detainees and the specific behavior detection requirements, setting the detection trigger time period and detection area and storing them locally as JSON;
    S13) during detection, first reading the JSON file; within the configured trigger time period, taking the video frames of a behavior, keeping only the imagery inside the detection area, and running steps S5) to S10) for human pose estimation to obtain the human skeleton feature information within the detection area; during the remaining time periods only playing the video frames, with no behavior recognition performed;
    S14) inputting the human skeleton feature information obtained in step S13) into the classifier for recognition, obtaining the video behavior category.
  6. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 1, characterized in that the recognition process includes setting a detection trigger time period and a detection area for each normative behavior and using the classifier for identification, including manually setting the detection time and detection area in strict accordance with the code of conduct for detainees; within a trigger time period, the corresponding behavior recognition is performed on the configured detection area, and an alarm is issued when a violation is recognized; outside the trigger time period, no behavior recognition is performed; the detection time periods and areas are fully user-defined according to the standard code of conduct, which satisfies the requirements of behavioral norm detection well.
  7. The deep-learning-based apparatus for detecting behavioral norms in a detention facility according to claim 4, characterized in that the PP-NMS operation in step S8) specifically comprises: selecting the pose with the highest confidence as the reference and eliminating the region boxes close to that reference according to an elimination criterion, repeating the process until the redundant detection boxes are eliminated and every detection box appears only once;
    and in that obtaining the human skeleton information in step S8) further comprises: using an augmented dataset that imitates the formation process of human region boxes by learning the descriptive information of different poses in the output results, thereby producing a larger training set.
  8. A deep-learning-based method for detecting behavioral norms in a detention facility, characterized in that the method comprises the following steps:
    head count detection, used for non-sensing roll call and/or crowd density identification, the head count detection including a target detection and segmentation process;
    behavioral norm detection, used for real-time computation and discrimination of personnel behavior, the behavioral norm detection including a training process that obtains a classifier from a training sample set and a recognition process that uses the classifier to identify test samples.
  9. The deep-learning-based method for detecting behavioral norms in a detention facility according to claim 8, characterized in that the target detection and segmentation process specifically comprises the following steps:
    S1) annotating human heads in the images with a labeling tool, producing one JSON file per image, and extracting the feature information of the annotated images with a convolutional neural network;
    S2) passing the feature information obtained in step S1) through a region proposal network to extract ROIs, i.e. regions of interest, then applying region-of-interest pooling to bring all of these ROIs to a fixed size;
    S3) performing bounding box regression and classification prediction on the ROIs obtained in step S2) through fully connected layers, sampling at different points of the feature map, and applying bilinear interpolation;
    S4) finally running the segmentation mask network: taking the positive regions selected by the ROI classifier as input and generating their masks; scaling the predicted masks up to the size of the ROI bounding boxes to give the final masking result, one mask per target; and adding the predicted segmentation mask to each ROI, the output being the objects present in the image together with high-quality segmentation masks.
  10. The deep-learning-based method for detecting behavioral norms in a detention facility according to claim 9, characterized in that the head count detection specifically comprises the following steps:
    target detection, for non-sensing real-time detection and counting of detainees;
    density analysis, for real-time, accurate density detection and anomaly alarms in cell blocks and exercise yards;
    wherein the target detection comprises the following steps: first collecting videos of five groups exposing their heads in the video images in different environments as required by the norms, the videos of four groups serving as the training dataset and one group's videos as the validation dataset; then processing the video frames of the four groups through steps S1) to S4) to obtain a head detection model; and finally loading the head detection model for the remaining group's video frames to perform final real-time detection and counting of people.
PCT/CN2021/107746 2020-07-28 2021-07-22 Deep-learning-based apparatus and method for monitoring behavioral norms in jail WO2022022368A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010736024.9A CN114092846A (en) 2020-07-28 2020-07-28 Prison behavior specification detection device and method based on deep learning
CN202010736024.9 2020-07-28

Publications (1)

Publication Number Publication Date
WO2022022368A1 (en)

Family

ID=80037108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107746 WO2022022368A1 (en) 2020-07-28 2021-07-22 Deep-learning-based apparatus and method for monitoring behavioral norms in jail

Country Status (1)

Country Link
WO (1) WO2022022368A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416254A (en) * 2018-01-17 2018-08-17 上海鹰觉科技有限公司 A kind of statistical system and method for stream of people's Activity recognition and demographics
CN109800665A (en) * 2018-12-28 2019-05-24 广州粤建三和软件股份有限公司 A kind of Human bodys' response method, system and storage medium
CN109886085A (en) * 2019-01-03 2019-06-14 四川弘和通讯有限公司 People counting method based on deep learning target detection

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114740774A (en) * 2022-04-07 2022-07-12 青岛沃柏斯智能实验科技有限公司 Behavior analysis control system for safe operation of fume hood
CN115205929B (en) * 2022-06-23 2023-07-28 池州市安安新材科技有限公司 Authentication method and system for avoiding misoperation of workbench of electric spark cutting machine tool
CN115205929A (en) * 2022-06-23 2022-10-18 池州市安安新材科技有限公司 Authentication method and system for avoiding false control of electric spark cutting machine tool workbench
CN115482491A (en) * 2022-09-23 2022-12-16 湖南大学 Bridge defect identification method and system based on transformer
CN115273154A (en) * 2022-09-26 2022-11-01 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN115273154B (en) * 2022-09-26 2023-01-17 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN115294661A (en) * 2022-10-10 2022-11-04 青岛浩海网络科技股份有限公司 Pedestrian dangerous behavior identification method based on deep learning
CN115841651A (en) * 2022-12-13 2023-03-24 广东筠诚建筑科技有限公司 Constructor intelligent monitoring system based on computer vision and deep learning
CN115841651B (en) * 2022-12-13 2023-08-22 广东筠诚建筑科技有限公司 Constructor intelligent monitoring system based on computer vision and deep learning
CN115988181A (en) * 2023-03-08 2023-04-18 四川三思德科技有限公司 Personnel monitoring system and method based on infrared image algorithm
CN115953741A (en) * 2023-03-14 2023-04-11 江苏实点实分网络科技有限公司 Edge computing system and method based on embedded algorithm
CN115995119B (en) * 2023-03-23 2023-07-28 山东特联信息科技有限公司 Gas cylinder filling link illegal behavior identification method and system based on Internet of things
CN115995119A (en) * 2023-03-23 2023-04-21 山东特联信息科技有限公司 Gas cylinder filling link illegal behavior identification method and system based on Internet of things
CN116631050B (en) * 2023-04-20 2024-02-13 北京电信易通信息技术股份有限公司 Intelligent video conference-oriented user behavior recognition method and system
CN116206265A (en) * 2023-05-05 2023-06-02 昆明轨道交通四号线土建项目建设管理有限公司 Protection alarm device and method for rail transit operation maintenance
CN116665419B (en) * 2023-05-09 2024-01-16 三峡高科信息技术有限责任公司 Intelligent fault early warning system and method based on AI analysis in power production operation
CN116665419A (en) * 2023-05-09 2023-08-29 三峡高科信息技术有限责任公司 Intelligent fault early warning system and method based on AI analysis in power production operation
CN116260990A (en) * 2023-05-16 2023-06-13 合肥高斯智能科技有限公司 AI asynchronous detection and real-time rendering method and system for multipath video streams
CN116343343A (en) * 2023-05-31 2023-06-27 杭州电子科技大学 Intelligent evaluation method for crane lifting command action based on cloud end architecture
CN116343343B (en) * 2023-05-31 2023-07-25 杭州电子科技大学 Intelligent evaluation method for crane lifting command action based on cloud end architecture
CN116665309B (en) * 2023-07-26 2023-11-14 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116665309A (en) * 2023-07-26 2023-08-29 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN117275069A (en) * 2023-09-26 2023-12-22 华中科技大学 End-to-end head gesture estimation method based on learnable vector and attention mechanism
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117115926B (en) * 2023-10-25 2024-02-06 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117253176A (en) * 2023-11-15 2023-12-19 江苏海内软件科技有限公司 Safe production Al intelligent detection method based on video analysis and computer vision
CN117253176B (en) * 2023-11-15 2024-01-26 江苏海内软件科技有限公司 Safe production Al intelligent detection method based on video analysis and computer vision
CN117351434A (en) * 2023-12-06 2024-01-05 山东恒迈信息科技有限公司 Working area personnel behavior specification monitoring and analyzing system based on action recognition
CN117351434B (en) * 2023-12-06 2024-04-26 山东恒迈信息科技有限公司 Working area personnel behavior specification monitoring and analyzing system based on action recognition

Similar Documents

Publication Publication Date Title
WO2022022368A1 (en) Deep-learning-based apparatus and method for monitoring behavioral norms in jail
Gong et al. A real-time fire detection method from video with multifeature fusion
CN109819208A (en) A kind of dense population security monitoring management method based on artificial intelligence dynamic monitoring
US9001199B2 (en) System and method for human detection and counting using background modeling, HOG and Haar features
Pantic et al. Automatic analysis of facial expressions: The state of the art
Sun et al. Articulated part-based model for joint object detection and pose estimation
Lin et al. Estimation of number of people in crowded scenes using perspective transformation
CN109190479A (en) A kind of video sequence expression recognition method based on interacting depth study
CN110717389B (en) Driver fatigue detection method based on generation countermeasure and long-short term memory network
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN107330371A (en) Acquisition methods, device and the storage device of the countenance of 3D facial models
CN108345894B (en) A kind of traffic incidents detection method based on deep learning and entropy model
CN110427834A (en) A kind of Activity recognition system and method based on skeleton data
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN111860297A (en) SLAM loop detection method applied to indoor fixed space
Elbasi Reliable abnormal event detection from IoT surveillance systems
Zambanini et al. Detecting falls at homes using a network of low-resolution cameras
Wu et al. An eye localization, tracking and blink pattern recognition system: Algorithm and evaluation
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
Hung et al. Fall detection with two cameras based on occupied area
Juang et al. Human posture classification using interpretable 3-D fuzzy body voxel features and hierarchical fuzzy classifiers
CN112766145B (en) Method and device for identifying dynamic facial expressions of artificial neural network
CN107025439A (en) Lip-region feature extraction and normalization method based on depth data
Alsaedi et al. Design and Simulation of Smart Parking System Using Image Segmentation and CNN

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21848547

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21848547

Country of ref document: EP

Kind code of ref document: A1