CN115565016B - Comprehensive operator safety detection method - Google Patents

Comprehensive operator safety detection method

Info

Publication number
CN115565016B
CN115565016B (application CN202211339248.1A)
Authority
CN
China
Prior art keywords
model
detection
face
safety
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211339248.1A
Other languages
Chinese (zh)
Other versions
CN115565016A (en)
Inventor
张培培
张武杰
徐怡彤
王梅霞
陈锦斐
王冰冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Science and Technology filed Critical North China University of Science and Technology
Priority to CN202211339248.1A priority Critical patent/CN115565016B/en
Publication of CN115565016A publication Critical patent/CN115565016A/en
Application granted granted Critical
Publication of CN115565016B publication Critical patent/CN115565016B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 10/764 Image or video recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/446 Local feature extraction by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 40/161 Human faces: detection; localisation; normalisation
    • G06V 40/18 Eye characteristics, e.g. of the iris
    • G06V 2201/07 Target detection
    • Y02T 10/40 Engine management systems (climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a comprehensive operator safety detection method, which comprises the following steps: acquiring a basic model for safety detection, wherein the basic model is a PyTorch model; converting the PyTorch model into an ONNX model; performing low-precision quantization processing and topology removal processing on the ONNX model; converting the processed ONNX model into an IR model; and running the IR model on a neural compute stick. According to this technique, the algorithm optimizes the model through model conversion: the PyTorch model is first converted into an Open Neural Network Exchange (ONNX) model; the ONNX model then undergoes low-precision quantization and topology removal processing; the processed model is exported as the xml and bin files of an IR model; and the IR model is run on an Intel Neural Compute Stick. This greatly accelerates model inference and substantially improves the edge-computing effect.

Description

Comprehensive operator safety detection method
Technical Field
The invention belongs to the technical field of safety monitoring, and particularly relates to a comprehensive operator safety detection method.
Background
Traditional field safety detection equipment has scattered functions: it considers only the safety detection of machine equipment with respect to the human body, or only the detection of the operating state of the operators. As the comprehensive requirements of safety monitoring increase, the safety detection of field equipment and the safety detection of operators need to be unified into one product that detects the safe operation process of field operators in an all-round manner.
The safe operation process of field operators is generally detected by real-time on-site recognition with artificial-intelligence models. However, existing artificial-intelligence algorithm models place high demands on the computing equipment and run very sluggishly on small, low-power processors (such as ARM processors). For example, a target detection algorithm based on PyTorch starts from a PyTorch (pt) model; without model optimization, running it directly on the ARM side processes an image only every 2 seconds on average, so the detection rate is too slow and the detection effect is lost.
Disclosure of Invention
The invention aims to provide a comprehensive operator safety detection method to solve the problems in the prior art that artificial-intelligence algorithms place high demands on computing equipment and run very sluggishly on small, low-power processors, so that the detection rate is too slow and the detection effect is lost.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a comprehensive operator safety detection method comprises the following steps:
acquiring a basic model for safety detection, wherein the basic model is a PyTorch model;
converting the PyTorch model into an ONNX model;
performing low-precision quantization processing and topology removal processing on the ONNX model;
converting the processed ONNX model into an IR model;
and running the IR model on a neural compute stick.
According to this technique, the algorithm optimizes the model through model conversion: the PyTorch model is first converted into an Open Neural Network Exchange (ONNX) model; the ONNX model then undergoes low-precision quantization and topology removal processing; the processed model is exported as the xml and bin files of an IR model; and the IR model is run on an Intel Neural Compute Stick, which greatly accelerates model inference and substantially improves the edge-computing effect.
In one possible design, the basic model includes a person-vehicle ranging model, which comprises target detection and ranging: the target detection detects person and vehicle targets in each frame of image and draws detection frames; the ranging calculates the distance between a person and a vehicle and judges whether an alarm is required according to whether the distance is smaller than a threshold;
the target detection comprises the following steps:
determining, by a cross-validation method, the optimal Intersection-over-Union (IoU) threshold N_t for the case where detection frames do not overlap;
judging whether overlap exists according to the number of connected detection frames;
if not, performing the NMS operation on the detection frames according to the IoU threshold N_t;
if yes, carrying out cluster analysis on the connected detection frames, dividing them into a plurality of clusters, and then performing the NMS operation within each cluster according to the IoU threshold N_t.
Correspondingly, the method for carrying out cluster analysis on the connected detection frames comprises the following steps:
using YOLOv3 as the training model, wherein the number of detection frames allocated to each target is fixed, dividing the number of connected detection frames by the number of frames allocated per target, rounding up, and determining the number of clusters from the result;
and calculating the distances between the center points of the connected detection frames to form a distance adjacency matrix, removing the symmetric part of the matrix, taking the maximum of each column of the matrix to form a row vector, performing a difference operation on the row vector, and, according to the number of clusters, forcibly disconnecting the connectivity of the pairs of connected detection frames whose distances rank first, thereby forming a plurality of clusters.
Correspondingly, the ranging method in the person-vehicle ranging model comprises:
measuring the distance between the person and the vehicle by a monocular ranging method according to:
D=(W×F)/P (8)
wherein W is the target width, F is the focal length of the camera, and P is the pixel width;
during the calculation, W and F are set to constants: the focal length F of the camera is determined by focusing, and the pixel width P can be determined according to the width of the target detection frame.
The method can be completed with an ordinary camera through monocular ranging, and the cost of an ordinary camera is greatly reduced compared with that of a binocular camera.
In one possible design, the basic model further includes a fatigue operation identification model, whose method comprises: detecting the operator's face; detecting the eyes on the basis of the face detection result; judging from the eye detection result whether the operator is in an eye-closed state; if so, judging whether the duration of eye closure exceeds a threshold; and if it does, judging that the operator is in a fatigue state and giving an early warning. Fatigue is judged from whether the operator's eyes are closed and for how long; since rapid actions such as blinking must be captured, the device is required to have a high detection speed, which is why improving the edge-computing effect matters.
Accordingly, when judging whether the operator is in an eye-closed state, face detection is first performed with a Haar model, eye detection is then performed on the basis of the face detection result, and when no eyes are detected the operator is directly considered to be in the eye-closed state.
In one possible design, the basic model further includes a dangerous action recognition model, whose method comprises:
identifying the key nodes of the human body in each frame of image and calculating the human-body posture feature vector;
meanwhile, performing target detection on specific articles, wherein the specific articles comprise mobile phones and water cups;
and classifying and identifying dangerous behaviors with a support vector machine according to the human-body posture feature vector and the target detection feature data.
In the dangerous action recognition process, adding the target detection result features reduces the false detection rate and the missed detection rate and improves the detection precision.
In one possible design, the basic model further includes a face recognition model, and the method of the face recognition model includes:
and (3) carrying out face detection by using a Dlib model, intercepting a face image after the face is detected, converting the image into a data set with a specific size, taking the data set as the input of a convolutional neural network, and carrying out transformation on the data set through a convolutional layer and a pooling layer of the convolutional neural network to obtain 512-dimensional feature vectors, and classifying according to the 512-dimensional feature vectors to obtain a classification result.
In one possible design, the basic model further includes safety helmet detection, whose method comprises:
detecting the face with a Dlib model and photographing the corresponding person to obtain image information;
the detection process adopts an API interface: the detection model is accessed through the Internet, and the platform returns the detection result through the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
The beneficial effects are that:
the algorithm adopts a mode of model conversion to optimize the model, firstly converts a PyTorch model into a Open Neural Network Exchange (ONNX) model, then carries out low-precision quantification treatment and topology removal treatment on the ONNX model, generates files of IR models of xml and bin for the treated model, and then enables the IR model to operate on an Intel nerve computation stick, thereby greatly accelerating the reasoning speed of the model and greatly improving the edge end computation effect;
the method has the advantages that a target detection algorithm for people and vehicles is provided, target detection under the overlapping situation of the people and the vehicles is realized, and the detection effect is improved;
the method can be completed by using a common camera through monocular distance measurement, and the cost of one common camera is greatly reduced compared with that of a binocular camera; whether the operator is tired or not is judged by judging whether the operator is in a eye-closing state or not and judging the time of the eye-closing state, compared with the traditional rapid action such as blinking, the device is required to have a rapid detection speed, so that the detection speed is improved; in the dangerous action recognition process, by adding the target detection result characteristics, the false detection rate and the omission rate can be reduced, and the detection precision is improved.
Drawings
FIG. 1 shows the functional modules of the system to which the comprehensive operator safety detection method provided in the embodiment is applied;
FIG. 2 is a flow chart of the comprehensive operator safety detection method provided in the embodiment;
FIG. 3 is a schematic flow chart of the person-vehicle ranging method in the comprehensive operator safety detection method provided in the embodiment;
FIG. 4 is a schematic flow chart of the clustering process performed by cluster analysis in the person-vehicle ranging method in the embodiment;
FIG. 5 is a comparison chart of the running time of facial key-node feature recognition and of the Haar detection method in the fatigue recognition part of the comprehensive operator safety detection method provided in the embodiment.
Detailed Description
In order to illustrate the embodiments of the present invention and the technical solutions in the prior art more clearly, the invention is briefly described below with reference to the accompanying drawings and the description of the embodiments or the prior art. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort. It should be noted that the description of these examples is intended to aid understanding of the present invention, not to limit it.
As shown in fig. 1, comprehensive operator safety detection mainly comprises three aspects, safety detection before operation, equipment safety detection during operation, and operator state detection during operation, realizing comprehensive safety guarantee.
Safety detection before operation comprises identity verification based on face recognition and detection of the safety helmet; the equipment may be operated only after these checks pass. During operation, people and vehicles are forbidden to approach large dangerous equipment while it is running, so target detection of people, vehicles and similar objects and ranging are required, with an alarm when the distance is too close. Also during operation, the state of the operator is detected: if the operator is in a fatigue state or shows dangerous behaviors such as making a call or drinking water, an alarm is given in time.
Examples:
the embodiment provides a comprehensive operator safety detection method, as shown in fig. 2, comprising the following steps: acquiring a basic model for safety detection, wherein the basic model is a PyTorch (pt) model; converting the pyrerch model to a Open Neural Network Exchange (ONNX) model; performing low-precision quantization processing and topology removal processing on the ONNX model by using OpenVino; converting the processed ONNX model into an IR model to generate IR files of xml and bin; the IR model was run on a nerve computation stick, which was Intel second generation nerve computation stick NCS2. The reasoning speed of the model can be greatly increased, the detection effect of 20 frames per second on average can be realized, and the edge computing effect is greatly improved. Wherein IR (ImageReady) is image editing software mainly processing network graphics, ONNX is an open file format designed for machine learning, and is used for storing trained models. It allows different deep learning frameworks to store model data in the same format. ONNX is an intermediate expression format that facilitates migration of models in various mainstream deep learning frameworks.
In one possible design, the basic model includes a person-vehicle ranging model, which comprises target detection and ranging: the target detection detects person and vehicle targets in each frame of image and draws detection frames; the ranging calculates the distance between a person and a vehicle and judges whether an alarm is required according to whether the distance is smaller than a threshold;
the method improves the NMS method in the case of overlap, and the target detection comprises the following steps:
first, using cross-validation method to determine the optimal IoU threshold N without overlap t
Judging whether overlapping exists according to the number of the communication frames;
when there is no overlap, the IoU threshold N is used t Performing NMS operation;
if overlapping exists, the connected detection frames are divided into a plurality of clusters by using cluster analysis, and then the N threshold value of IoU is utilized in each cluster t To perform NMS operations.
Specifically, while an operator runs equipment (for example, while driving a forklift), people approaching the equipment are easily injured, so people and vehicles around dangerous equipment must be detected during operation, and a prompt given when a target comes close.
The person-vehicle ranging algorithm is divided into two steps: first, target detection, that is, detecting people and vehicles and drawing detection frames; second, ranging, that is, calculating the distance between the detected person or vehicle and the mechanical equipment and giving an alarm when the distance is smaller than the threshold. This embodiment mainly optimizes the target detection algorithm.
Many detection frames are generated during target detection inference, but ultimately only one detection frame is needed per target. Specifically, Non-Maximum Suppression (NMS) sorts the detection frames by confidence score from large to small, selects the detection frame with the highest score, and computes the Intersection-over-Union (IoU) of the other frames with it. IoU reflects the degree of overlap of two intersecting frames: the higher the overlap, the larger the IoU, whose value ranges from 0 to 1. The specific calculation is shown in equation (1):

IoU = area(C ∩ G) / area(C ∪ G) (1)

wherein C denotes the detection frame with the highest confidence score, called detection frame C for short, G denotes a detection frame intersecting it, called detection frame G for short, area(C) denotes the area of detection frame C, and area(G) denotes the area of detection frame G.
When the IoU value exceeds the threshold N_t, the detection frame exceeding the threshold is suppressed, as shown in equation (2):

s_i = s_i,  if IoU(C, i) < N_t
s_i = 0,    if IoU(C, i) ≥ N_t        (2)

wherein i is a detection frame intersecting the detection frame with the highest confidence score, and s_i is the confidence score of detection frame i. When the IoU exceeds the threshold, the frame overlaps the highest-scoring detection frame too much and must be suppressed, i.e. deleted.
detection frames exceeding the threshold are to be deleted, which is liable to cause a missed detection situation. A common approach to solving this problem is a post-processing method of Soft-NMS, which translates deleting the threshold box above IoU to attenuating its confidence, as shown in equation 3,
Figure BDA0003915813140000063
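A minimal sketch of equations (1) to (3); the (x1, y1, x2, y2) box format, the score floor, and the linear decay variant of Soft-NMS are illustrative assumptions:

```python
# Sketch of IoU (eq. 1), hard NMS (eq. 2) and linear Soft-NMS (eq. 3).
# Boxes are assumed to be (x1, y1, x2, y2) tuples with confidence scores.
import numpy as np

def iou(c, g):
    """Equation (1): intersection area over union area of boxes c and g."""
    x1, y1 = max(c[0], g[0]), max(c[1], g[1])
    x2, y2 = min(c[2], g[2]), min(c[3], g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_c = (c[2] - c[0]) * (c[3] - c[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    return inter / (area_c + area_g - inter + 1e-9)

def nms(boxes, scores, n_t, soft=False):
    """Keep the highest-scoring box, suppress (eq. 2) or decay (eq. 3) the rest."""
    order = list(np.argsort(scores)[::-1])   # indices, best score first
    scores = list(scores)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            ov = iou(boxes[best], boxes[i])
            if ov >= n_t:
                if soft:
                    scores[i] *= (1.0 - ov)  # eq. (3): attenuate confidence
                    if scores[i] > 0.001:    # assumed score floor
                        remaining.append(i)
                # hard NMS, eq. (2): the box is dropped entirely
            else:
                remaining.append(i)
        remaining.sort(key=lambda i: scores[i], reverse=True)
        order = remaining
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
print(nms(boxes, [0.9, 0.8, 0.7], n_t=0.4))  # -> [0, 2]
```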
the present algorithm improves the NMS method in the case where there is overlap. First, whether there is an overlapping situation is judged according to the number of the connected frames, and when there is no overlapping situation, a IoU threshold N is used t Performing NMS operation; if overlapping exists, firstly, clustering analysis is utilized to divide the communicated detection frame into a plurality of clusters, and then, ioU threshold N is utilized in each cluster t To perform the operation of the NMS. A specific example is shown in flow chart 3.
Correspondingly, the cluster analysis method is: using YOLOv3 as the training model, the number of detection frames allocated to each target is fixed; the number of connected frames divided by the number of frames allocated per target, rounded up, determines the number of clusters. The distances between the center points of the connected frames are calculated to form a distance adjacency matrix; the symmetric part of the matrix is removed, the maximum of each column is taken to form a row vector, a difference operation is performed on it, and, according to the number of clusters, the connectivity of the pairs of connected frames whose distances rank first is forcibly broken, forming a plurality of clusters. A specific example is shown in fig. 4. For the optimal IoU threshold N_t, a cross-validation method is used to determine it.
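The following sketch realizes the cluster-splitting step under stated assumptions: BOXES_PER_TARGET and the sample centers are made up, and cutting the longest edges of a minimum spanning tree over the box centers is used as one standard way of "forcibly disconnecting the most distant links", not necessarily the patent's exact column-maximum and difference procedure:

```python
# Sketch: divide the centers of connected detection frames into clusters by
# breaking the largest center-point distances (here: longest MST edges).
import math

BOXES_PER_TARGET = 3  # assumed fixed number of frames YOLOv3 yields per target

def split_clusters(centers):
    n = len(centers)
    n_clusters = min(n, math.ceil(n / BOXES_PER_TARGET))  # round up, as above
    # Prim's algorithm: build a minimum spanning tree over the box centers.
    in_tree = {0}
    mst_edges = []
    while len(in_tree) < n:
        d, i, j = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in in_tree for b in range(n) if b not in in_tree
        )
        mst_edges.append((d, i, j))
        in_tree.add(j)
    # Cut the (n_clusters - 1) longest MST edges to obtain the clusters.
    mst_edges.sort(reverse=True)
    kept = mst_edges[n_clusters - 1:]
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Example: six frames around two targets should split into two clusters.
centers = [(10, 10), (12, 11), (11, 9), (80, 80), (82, 81), (79, 82)]
print(split_clusters(centers))  # -> two groups of three indices
```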
The effect of the method of this embodiment is verified against the conventional methods as follows:
Images of vehicles and human bodies were collected at different times and in different places, 3720 images in total, each containing several people and vehicles, for a total of 174,848 targets to be detected, and test experiments were carried out. The different NMS methods are evaluated with precision (P), recall (R), false alarm rate (F) and miss rate (M), whose calculation formulas are shown in equations (4)-(7), wherein TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. As shown in Table 1, the improved NMS method of this algorithm has a lower false detection rate and miss rate than ordinary NMS and Soft-NMS, and a higher precision; the experiments prove that the detection effect is improved.
P = TP / (TP + FP) (4)
R = TP / (TP + FN) (5)
F = FP / (FP + TN) (6)
M = FN / (TP + FN) (7)
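For instance, given confusion-matrix counts TP, TN, FP and FN (the numbers below are made up for illustration), equations (4) to (7) can be computed directly:

```python
# Worked example of equations (4)-(7) with made-up confusion-matrix counts.
TP, TN, FP, FN = 900, 850, 60, 40
P = TP / (TP + FP)   # precision, eq. (4)
R = TP / (TP + FN)   # recall, eq. (5)
F = FP / (FP + TN)   # false alarm rate, eq. (6)
M = FN / (TP + FN)   # miss rate, eq. (7)
print(f"P={P:.4f} R={R:.4f} F={F:.4f} M={M:.4f}")
```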
Table 1 comparison of the detection accuracy of different models
In one possible embodiment, the ranging method in the person-vehicle ranging model comprises:
measuring the distance between the person and the vehicle by a monocular ranging method according to:
D=(W×F)/P (8)
wherein W is the target width, F is the focal length of the camera, and P is the pixel width;
during the calculation, W and F are set to constants: the focal length F of the camera is determined by focusing, and the pixel width P can be determined according to the width of the target detection frame.
Conventional ranging is usually done by binocular ranging, which is costly: a binocular camera generally costs upwards of a thousand yuan, and since multi-directional detection of people and vehicles around one piece of equipment requires cameras in at least the front, rear, left and right directions, binocular cameras would cost more than 4000 yuan. This algorithm instead uses a monocular ranging method for person and vehicle targets, which is low-cost and fast: monocular ranging can be completed with an ordinary camera costing only a few tens of yuan, greatly reducing the cost compared with binocular cameras.
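A worked sketch of equation (8); the calibration constants below are illustrative assumptions, not values from the patent:

```python
# Monocular ranging per equation (8): D = (W * F) / P.
# The known target width and calibrated focal length are assumed values.
KNOWN_WIDTH_M = 1.8      # assumed real-world width of a vehicle target, metres
FOCAL_LENGTH_PX = 800.0  # assumed focal length from a one-off focusing calibration

def monocular_distance(box_width_px: float) -> float:
    """Distance to the target from the pixel width of its detection frame."""
    return KNOWN_WIDTH_M * FOCAL_LENGTH_PX / box_width_px

# A detection frame 160 px wide puts the vehicle at (1.8 * 800) / 160 = 9 m.
print(monocular_distance(160.0))  # 9.0
```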
In a possible implementation, the basic model further includes a dangerous action recognition model, whose method comprises:
identifying the key nodes of the human body in each frame of image and calculating the human-body posture feature vector;
meanwhile, performing target detection on specific articles, wherein the specific articles comprise mobile phones and water cups;
and classifying and identifying dangerous behaviors with a support vector machine according to the human-body posture feature vector and the target detection feature data.
According to the method, the OpenPose open-source model first identifies the key nodes of the human body, reading in real time the data of 4 key nodes of the human body (head, arm shoulder, elbow and wrist) in each frame of image and extracting posture features. The specific posture features include: (1) the triangle formed by the three arm nodes of shoulder, elbow and wrist: when the arm is straight, the triangle is obtuse and the ratio of the square of the long side to the sum of the squares of the two short sides is greater than 1; when the arm is bent, as when making a call or drinking water, an acute triangle tends to form and the ratio is less than 1; (2) the area of the polygon formed by the head node and the three arm nodes of shoulder, elbow and wrist: during normal operation the area is relatively large, and it shrinks during dangerous actions such as making a call or drinking water; (3) the distance from the wrist node to the head node: far during normal operation and very close when making a call or drinking water.
Human-body key-node recognition images were collected at different times and in different places, 147,376 test experiments in total. The SVM recognition results of the different methods are evaluated with the precision P, recall R, false detection rate F and miss rate M, as shown in Table 2. The precision P and recall R of SVM recognition using the human-body key-node features alone are 95.90% and 96.71%, respectively, while with the target detection result features added they are 96.84% and 97.63%; meanwhile, the false detection rate F and miss rate M using the key-node features alone are 28.66% and 3.29%, respectively, versus 19.76% and 2.37% with the target detection features added. The experiments prove that the precision of both methods exceeds 95%, but the posture recognition method with target detection added detects better.
Table 2 comparison of SVM recognition results by different feature quantity methods
In the specific implementation, parallel computing could be used for acceleration, but that places higher demands on the computing capacity of the equipment. The equipment instead uses Intel neural compute sticks to raise the inference speed: the OpenPose posture recognition model uses one compute stick and the cup and mobile-phone target detection model uses another, so the two inference models are accelerated simultaneously; after the corresponding features are extracted, dangerous actions are recognized by the support vector machine (SVM).
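A sketch of the feature vector and SVM classification described above; the node coordinates, the two training samples, and scikit-learn's SVC standing in for the SVM implementation are all illustrative assumptions:

```python
# Sketch: build the three posture features plus target-detection flags,
# then classify with an SVM. Node coordinates and labels are assumed data.
import math
import numpy as np
from sklearn.svm import SVC

def posture_features(head, shoulder, elbow, wrist, phone_seen, cup_seen):
    """Features (1)-(3) described above plus phone/cup detection flags."""
    # (1) long-side^2 / (short1^2 + short2^2) of the shoulder-elbow-wrist triangle
    sides = sorted([math.dist(shoulder, elbow), math.dist(elbow, wrist),
                    math.dist(shoulder, wrist)])
    tri_ratio = sides[2] ** 2 / (sides[0] ** 2 + sides[1] ** 2 + 1e-9)
    # (2) area of the head-shoulder-elbow-wrist polygon (shoelace formula)
    pts = [head, shoulder, elbow, wrist]
    area = 0.5 * abs(sum(pts[i][0] * pts[(i + 1) % 4][1]
                         - pts[(i + 1) % 4][0] * pts[i][1] for i in range(4)))
    # (3) wrist-to-head distance
    wrist_head = math.dist(wrist, head)
    return [tri_ratio, area, wrist_head, float(phone_seen), float(cup_seen)]

# Assumed training samples: 1 = dangerous action, 0 = normal operation.
X = np.array([
    posture_features((0, 0), (0, 30), (20, 40), (5, 5), True, False),    # calling
    posture_features((0, 0), (0, 30), (25, 60), (40, 80), False, False), # normal
])
y = np.array([1, 0])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X))
```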
In one possible embodiment, the basic model includes a fatigue operation identification model, whose method comprises: detecting the face in the image and then detecting the eyes within the face image; if eyes are detected, they are open, and if not, they are closed. Whether the operator is in a fatigue state is judged from whether the duration of eye closure exceeds a threshold, and an early warning is given if so. Rapid actions such as blinking require the device to detect at high speed; the Haar model detects the face and eyes efficiently, realizing real-time detection.
For recognizing the eye-closed state, most conventional algorithms extract facial key-node data and then judge eye closure from key-node features such as the aspect ratio of the eye contour. Algorithms that extract facial key-node data are mostly used in face recognition; they can distinguish identities with high accuracy, but the running time grows accordingly. This algorithm needs no identification, only face detection: it adopts a Haar model, rapidly computes rectangular features such as boundary, linear and central features with an integral image, and then rapidly detects the face with the AdaBoost algorithm.
On the basis of the face detection result, a Haar eye detection model is likewise used to detect the eyes, and when no eyes are detected the state is directly regarded as eye-closed. The Haar model raises the detection speed: as shown in fig. 5, with an image size of 30 MB the average detection time is only 0.1 s, whereas eye-closure recognition based on facial key-node features already takes 1 s on average. Experiments prove that this algorithm is faster than the other algorithms. Meanwhile, since the Haar model uses the AdaBoost algorithm, the detection is optimized and its precision improves: as shown in Table 3, the accuracy of eye-closure detection based on facial key nodes is only 94.01%, while that of the Haar-based eye-closure detection model rises to 95.72%. The detection effect is improved.
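A minimal sketch of this Haar-based eye-closure timing; OpenCV's stock Haar cascades stand in for the patent's face and eye models, and the 2-second fatigue threshold is an assumed value:

```python
# Sketch: Haar face detection, then Haar eye detection inside the face ROI;
# a missing eye detection is treated as "eyes closed", and closure longer
# than CLOSED_SECONDS (assumed value) triggers a fatigue warning.
import time
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
CLOSED_SECONDS = 2.0  # assumed fatigue threshold

cap = cv2.VideoCapture(0)
closed_since = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    eyes_open = False
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]
        if len(eye_cascade.detectMultiScale(roi, 1.1, 10)) > 0:
            eyes_open = True
    if eyes_open or len(faces) == 0:
        closed_since = None            # eyes seen (or no face): reset the timer
    elif closed_since is None:
        closed_since = time.time()     # face without eyes: closure starts
    elif time.time() - closed_since > CLOSED_SECONDS:
        print("fatigue warning: eyes closed too long")
```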
Table 3 Comparison of the eye-closure detection accuracy of different models
In one possible design, the basic model further includes a face recognition model, and the method of the face recognition model includes:
and (3) carrying out face detection by using a Dlib model, intercepting a face image after the face is detected, converting the image into a data set with a specific size, taking the data set as the input of a convolutional neural network, and carrying out transformation on the data set through a convolutional layer and a pooling layer of the convolutional neural network to obtain 512-dimensional feature vectors, and classifying according to the 512-dimensional feature vectors to obtain a classification result.
Specifically, an employee approaches the camera and the Dlib model performs face detection; after a face is detected, the face image is cropped and converted into a 64×64×3 data set, which is used as the input of a convolutional neural network; the data set is transformed by the convolutional and pooling layers to obtain a 512-dimensional feature vector, and classification according to the feature vector gives the classification result.
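A sketch of this detect-crop-embed pipeline; only the 64×64×3 input and the 512-dimensional feature vector come from the description, while the network layers, file names and class count are illustrative assumptions:

```python
# Sketch: Dlib face detection, crop to 64x64x3, CNN embedding to 512 dims.
# The network layers and the classifier head are illustrative assumptions.
import cv2
import dlib
import torch
import torch.nn as nn

detector = dlib.get_frontal_face_detector()

class FaceNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(            # conv + pool stages
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)) # 8x8
        self.embed = nn.Linear(128 * 8 * 8, 512)  # 512-dimensional feature vector
        self.classify = nn.Linear(512, n_classes)

    def forward(self, x):
        v = self.embed(self.features(x).flatten(1))
        return self.classify(v), v

img = cv2.imread("frame.jpg")                     # hypothetical input frame
faces = detector(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
if faces:
    f = faces[0]
    t, l = max(f.top(), 0), max(f.left(), 0)
    crop = cv2.resize(img[t:f.bottom(), l:f.right()], (64, 64))
    x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    logits, embedding = FaceNet(n_classes=50)(x)  # 50 employees, assumed
    print(logits.argmax(dim=1))
```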
In one possible implementation, the basic model further includes safety helmet detection, whose method comprises:
detecting the face with a Dlib model and photographing the corresponding person to obtain image information;
the detection process adopts an API interface: the detection model is accessed through the Internet, and the platform returns the detection result through the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
As a specific example, the safety equipment detection of the system mainly checks whether an employee wears a safety helmet. The safety helmet detection model in this design is trained on the Baidu PaddlePaddle EasyDL platform: a model is first created (for example, YOLOv5 is selected as the training model), a training data set is uploaded, training is carried out, verification follows, and detection can begin once verification passes. The detection process adopts an API interface: the detection model is accessed through the Internet, and the platform returns the detection result through the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
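A sketch of such an API call; the endpoint URL, token parameter, and JSON field names are hypothetical placeholders, not the platform's documented interface:

```python
# Sketch: send a captured image to a cloud helmet-detection API and parse
# the JSON reply. URL, auth token and response fields are assumed, not the
# actual EasyDL interface.
import base64
import requests

API_URL = "https://example.com/helmet-detect"  # hypothetical endpoint
TOKEN = "..."                                  # hypothetical access token

def detect_helmet(image_path: str):
    with open(image_path, "rb") as f:
        payload = {"image": base64.b64encode(f.read()).decode("ascii")}
    resp = requests.post(API_URL, params={"access_token": TOKEN},
                         json=payload, timeout=10)
    results = resp.json().get("results", [])   # assumed field name
    if not results:
        return None                            # empty JSON: no helmet detected
    return results[0].get("location")          # assumed detection-frame info

print(detect_helmet("operator.jpg"))
```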
The equipment realizes safety detection entirely with artificial-intelligence methods, making safety detection more intelligent, more accurate, and more convenient and fast. For example, identity verification mainly uses the intelligent means of face recognition; safety helmet detection mainly uses helmet target detection; safety detection for large mobile equipment detects people or vehicles around the equipment by target detection, then calculates the target distance and alarms when it is below the threshold; fatigue-state detection detects the face and eyes by artificial intelligence and judges fatigue from the eye state; dangerous action detection detects human-body key nodes by artificial intelligence and recognizes whether the operator is making a call, drinking water, or performing other dangerous actions from the key-node features together with the target detection results for mobile phones, water cups and the like.
Finally, it should be noted that the foregoing description covers only the preferred embodiments of the invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (7)

1. A comprehensive operator safety detection method, characterized by comprising the following steps:
acquiring a basic model for comprehensive operator safety detection, wherein the basic model is a PyTorch model;
converting the PyTorch model into an ONNX model;
performing low-precision quantization processing and topology removal processing on the ONNX model;
converting the processed ONNX model into an IR model;
running the IR model on a neural compute stick;
the basic model further comprises a person-vehicle ranging model, a fatigue operation identification model, a dangerous action recognition model, a face recognition model and safety helmet detection;
the person-vehicle ranging model comprises target detection and ranging, wherein the target detection detects person and vehicle targets in each frame of image and draws detection frames, and the ranging calculates the distance between a person and a vehicle and judges whether an alarm is required according to whether the distance is smaller than a threshold;
the target detection comprises the following steps:
determining, by a cross-validation method, the optimal Intersection-over-Union (IoU) threshold N_t for the case where detection frames do not overlap;
judging whether overlap exists according to the number of connected detection frames;
if not, performing the NMS operation on the detection frames according to the IoU threshold N_t;
if yes, carrying out cluster analysis on the connected detection frames, dividing them into a plurality of clusters, and then performing the NMS operation within each cluster according to the IoU threshold N_t;
the method for carrying out cluster analysis on the connected detection frames comprises the following steps:
using YOLOv3 as the training model, wherein the number of detection frames allocated to each target is fixed, dividing the number of connected detection frames by the number of frames allocated per target, rounding up, and determining the number of clusters from the result;
and calculating the distances between the center points of the connected detection frames to form a distance adjacency matrix, removing the symmetric part of the matrix, taking the maximum of each column of the matrix to form a row vector, performing a difference operation on the row vector, and, according to the number of clusters, forcibly disconnecting the connectivity of the pairs of connected detection frames whose distances rank first, thereby forming a plurality of clusters.
2. The comprehensive operator safety detection method according to claim 1, wherein the ranging method in the person-vehicle ranging model comprises:
measuring the distance between the person and the vehicle by a monocular ranging method according to:
D=(W×F)/P
wherein W is the target width, F is the focal length of the camera, and P is the pixel width;
the focal length F of the camera is determined by focusing, and the pixel width P can be determined according to the width of the target detection frame.
3. The comprehensive operator safety detection method according to claim 1, wherein the method of the fatigue operation identification model comprises: detecting the operator's face; detecting the eyes on the basis of the face detection result; judging from the eye detection result whether the operator is in an eye-closed state; if so, judging whether the duration of eye closure exceeds a threshold; and if it does, judging that the operator is in a fatigue state and giving an early warning.
4. The comprehensive operator safety detection method according to claim 3, wherein, when judging whether the operator is in an eye-closed state, face detection is first performed with the Haar model, eye detection is then performed on the basis of the face detection result, and when no eyes are detected the operator is directly considered to be in the eye-closed state.
5. The comprehensive operator safety detection method according to claim 1, wherein the method of the dangerous action recognition model comprises:
identifying the key nodes of the human body in each frame of image and calculating the human-body posture feature vector;
meanwhile, performing target detection on specific articles, wherein the specific articles comprise mobile phones and water cups;
and classifying and identifying dangerous behaviors with a support vector machine according to the human-body posture feature vector and the target detection feature data.
6. The comprehensive operator safety detection method according to claim 1, wherein the method of the face recognition model comprises:
performing face detection with a Dlib model; after a face is detected, cropping the face image and converting it into a data set of a specific size, which is used as the input of a convolutional neural network; transforming the data set by the convolutional and pooling layers of the convolutional neural network to obtain a 512-dimensional feature vector, and classifying according to the 512-dimensional feature vector to obtain the classification result.
7. The comprehensive operator safety detection method according to claim 1, wherein the method of the safety helmet detection comprises:
performing face detection with a Dlib model and, after a face is detected, photographing the corresponding person to obtain image information; the detection process adopts an API interface: the detection model is accessed through the Internet, and the platform returns the detection result through the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
CN202211339248.1A 2022-10-28 2022-10-28 Comprehensive operator safety detection method Active CN115565016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211339248.1A CN115565016B (en) 2022-10-28 2022-10-28 Comprehensive operator safety detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211339248.1A CN115565016B (en) 2022-10-28 2022-10-28 Comprehensive operator safety detection method

Publications (2)

Publication Number Publication Date
CN115565016A CN115565016A (en) 2023-01-03
CN115565016B true CN115565016B (en) 2023-06-09

Family

ID=84769019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211339248.1A Active CN115565016B (en) 2022-10-28 2022-10-28 Comprehensive operator safety detection method

Country Status (1)

Country Link
CN (1) CN115565016B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257799A (en) * 2020-10-30 2021-01-22 电子科技大学中山学院 Method, system and device for detecting household garbage target
US11816987B2 (en) * 2020-11-18 2023-11-14 Nvidia Corporation Emergency response vehicle detection for autonomous driving applications
CN113221687B (en) * 2021-04-28 2022-07-22 南京南瑞继保电气有限公司 Training method of pressing plate state recognition model and pressing plate state recognition method
CN114067211A (en) * 2021-11-22 2022-02-18 齐鲁工业大学 Lightweight safety helmet detection method and system for mobile terminal

Also Published As

Publication number Publication date
CN115565016A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
Mahmood et al. Facial expression recognition in image sequences using 1D transform and gabor wavelet transform
CN111767900B (en) Face living body detection method, device, computer equipment and storage medium
CN110363183A (en) Service robot visual method for secret protection based on production confrontation network
CN108830237B (en) Facial expression recognition method
CN111274916A (en) Face recognition method and face recognition device
JP2012053756A (en) Image processor and image processing method
Mafeni Mase et al. Benchmarking deep learning models for driver distraction detection
Rao et al. Neural network classifier for continuous sign language recognition with selfie video
Dipu et al. Real-time driver drowsiness detection using deep learning
CN113269010B (en) Training method and related device for human face living body detection model
CN111860056B (en) Blink-based living body detection method, blink-based living body detection device, readable storage medium and blink-based living body detection equipment
CN115331205A (en) Driver fatigue detection system with cloud edge cooperation
Pandey et al. Dumodds: Dual modeling approach for drowsiness detection based on spatial and spatio-temporal features
Zhang et al. Development of a rescue system for agricultural machinery operators using machine vision
Sakthimohan et al. Detection and Recognition of Face Using Deep Learning
Yang et al. Dangerous Driving Behavior Recognition Based on Improved YoloV5 and Openpose [J]
Echoukairi et al. Improved Methods for Automatic Facial Expression Recognition.
Li et al. Monitoring and alerting of crane operator fatigue using hybrid deep neural networks in the prefabricated products assembly process
CN111814760B (en) Face recognition method and system
KR101542206B1 (en) Method and system for tracking with extraction object using coarse to fine techniques
CN115565016B (en) Comprehensive operator safety detection method
CN113205060A (en) Human body action detection method adopting circulatory neural network to judge according to bone morphology
Yu et al. Drowsydet: a mobile application for real-time driver drowsiness detection
CN111898454A (en) Weight binarization neural network and transfer learning human eye state detection method and device
CN109815887B (en) Multi-agent cooperation-based face image classification method under complex illumination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20230103

Assignee: Jiangxi Tianyi Information Engineering Co.,Ltd.

Assignor: NORTH CHINA University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2023110000088

Denomination of invention: A Comprehensive Safety Testing Method for Operators

Granted publication date: 20230609

License type: Common License

Record date: 20230717

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20230103

Assignee: Beijing Zhonglian Purui Technology Co.,Ltd.

Assignor: NORTH CHINA University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2023980041242

Denomination of invention: A Comprehensive Safety Testing Method for Operators

Granted publication date: 20230609

License type: Common License

Record date: 20230905