Disclosure of Invention
The invention aims to provide a comprehensive operator safety detection method that solves the following problems of the prior art: artificial intelligence algorithms place high demands on the computing equipment, and when run on a small, low-power processor they stall badly, so that the detection rate becomes too slow and the detection effect is lost.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a comprehensive operator safety detection method comprises the following steps:
acquiring a basic model for safety detection, wherein the basic model is a PyTorch model;
converting the PyTorch model into an ONNX model;
performing low-precision quantization processing and topology removal processing on the ONNX model;
converting the processed ONNX model into an IR model;
running the IR model on a neural compute stick.
In this scheme, the model is optimized by model conversion: the PyTorch model is first converted into an Open Neural Network Exchange (ONNX) model, the ONNX model then undergoes low-precision quantization and topology removal, the processed model is written out as the .xml and .bin files of an IR model, and the IR model is run on an Intel neural compute stick. This greatly increases the inference speed of the model and greatly improves the edge computing effect.
In one possible design, the basic model includes a person-and-object ranging model, which comprises target detection and ranging: target detection detects people and vehicles in each frame of image and draws a detection frame; ranging calculates the distance to the detected person or vehicle according to the ranging method, and whether an alarm is required is judged according to whether the distance is smaller than a threshold value;
the target detection comprises the following steps:
determining, by cross-validation, the optimal Intersection-over-Union (IoU) threshold N_t for the case in which the detection frames do not overlap;
judging whether overlap exists according to the number of connected detection frames;
if not, performing the NMS operation on the detection frames according to the IoU threshold N_t;
if so, performing cluster analysis on the connected detection frames to divide them into a plurality of clusters, and then performing the NMS operation within each cluster according to the IoU threshold N_t.
Correspondingly, the method for carrying out cluster analysis on the communicated detection frames comprises the following steps:
using YOLOv3 as the training model, in which the number of detection frames allocated to each target is fixed; dividing the number of connected detection frames by the number of detection frames allocated to each target and rounding up to obtain the number of clusters;
calculating the distances between the center points of the connected detection frames to form a distance adjacency matrix, removing the symmetric part of the adjacency matrix, taking the maximum of each column of the matrix to form a row vector, performing a difference operation on the row vector, and, according to the number of clusters, forcibly disconnecting the pairs of connected detection frames whose distances rank highest, thereby forming a plurality of clusters.
Correspondingly, the ranging method in the person-and-object ranging model comprises:
measuring the distance to the person or vehicle by a monocular ranging method, calculated as follows:
D=(W×F)/P (8)
wherein W is the target width, F is the focal length of the camera, and P is the pixel width;
during calculation, W and F are treated as constants; the focal length F of the camera is determined by focusing, and the pixel width P is obtained from the width of the target detection frame.
Monocular ranging can be performed with an ordinary camera, which costs far less than a binocular camera.
In one possible design, the basic model further includes a fatigue operation identification model, whose identification method comprises: detecting the face of the operator; detecting the eyes on the basis of the face detection result; judging from the eye detection result whether the operator's eyes are closed; if so, judging whether the eyes have remained closed for longer than a threshold duration; and if so, judging that the operator is in a fatigue state and giving an early warning. Fatigue is thus judged from whether the eyes are closed and for how long. Rapid actions such as blinking require the equipment to have a high detection speed, so the edge computing effect is improved.
Accordingly, when judging whether the operator's eyes are closed, the face is first detected with the Haar model and the eyes are then detected on the basis of the face detection result; when no eyes are detected, the operator is directly considered to be in the closed-eye state.
In one possible design, the basic model further includes a dangerous action recognition model, and the method of the dangerous action recognition model includes:
identifying the key nodes of the human body in each frame of image and calculating the human body posture feature vector;
meanwhile, performing target detection on specific articles, the specific articles including mobile phones and water cups;
classifying and identifying dangerous behaviors with a support vector machine according to the human body posture feature vector and the target detection feature data.
In the dangerous action recognition process, by adding the target detection result characteristics, the false detection rate and the omission rate can be reduced, and the detection precision is improved.
In one possible design, the basic model further includes a face recognition model, and the method of the face recognition model includes:
performing face detection with a Dlib model; cropping the face image once a face is detected; converting the image into a data set of a specific size and using it as the input of a convolutional neural network; transforming the data set through the convolutional and pooling layers of the convolutional neural network to obtain a 512-dimensional feature vector; and classifying according to the 512-dimensional feature vector to obtain a classification result.
In one possible design, the basic model further includes a safety helmet detection, and the method for safety helmet detection includes:
detecting a face with the Dlib model and photographing the corresponding person to obtain image information;
performing detection through an API: the detection model is accessed over the Internet and the platform returns the detection result over the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
The beneficial effects are that:
the model is optimized by model conversion: the PyTorch model is first converted into an Open Neural Network Exchange (ONNX) model, the ONNX model then undergoes low-precision quantization and topology removal, the processed model is written out as the .xml and .bin files of an IR model, and the IR model is run on an Intel neural compute stick, which greatly increases the inference speed of the model and greatly improves the edge computing effect;
a target detection algorithm for people and vehicles is provided that handles the case in which people and vehicles overlap, improving the detection effect;
monocular ranging can be performed with an ordinary camera, which costs far less than a binocular camera; fatigue is judged from whether the operator's eyes are closed and for how long, and since rapid actions such as blinking require a high detection speed, the detection speed is improved; in dangerous action recognition, adding the target detection result features reduces the false detection rate and the miss rate and improves the detection precision.
Detailed Description
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the invention is briefly described below with reference to the accompanying drawings and the embodiments. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort. The description of these examples is intended to aid understanding of the present invention and does not limit it.
As shown in fig. 1, comprehensive operator safety detection mainly covers three aspects, namely safety detection before operation, of personnel, and of machine equipment, and thereby provides a comprehensive safety guarantee.
Safety detection before operation comprises identity authentication based on face recognition and safety helmet detection; the equipment can be operated only after both pass. During operation of large dangerous equipment, people and vehicles are forbidden to approach, so objects such as people and vehicles must be detected and ranged, and an alarm is raised when the distance is too small. Also during operation, the state of the operator is monitored, and an alarm is raised promptly if the operator is in a fatigue state or performs dangerous behaviors such as making a phone call or drinking water.
Examples:
This embodiment provides a comprehensive operator safety detection method which, as shown in fig. 2, comprises the following steps: acquiring a basic model for safety detection, the basic model being a PyTorch (.pt) model; converting the PyTorch model into an Open Neural Network Exchange (ONNX) model; performing low-precision quantization and topology removal on the ONNX model with OpenVINO; converting the processed ONNX model into an IR model, generating the .xml and .bin IR files; and running the IR model on a neural compute stick, here the second-generation Intel Neural Compute Stick (NCS2). This greatly increases the inference speed of the model, achieves an average detection rate of 20 frames per second, and greatly improves the edge computing effect. Here IR (Intermediate Representation) is the OpenVINO model format, in which the .xml file describes the network topology and the .bin file stores the weights. ONNX is an open file format designed for machine learning and used to store trained models; it allows different deep learning frameworks to store model data in the same format, and as an intermediate representation it makes it easy to migrate models between the mainstream deep learning frameworks.
In one possible design, the basic model includes a person-and-object ranging model, which comprises target detection and ranging: target detection detects people and vehicles in each frame of image and draws a detection frame; ranging calculates the distance to the detected person or vehicle according to the ranging method, and whether an alarm is required is judged according to whether the distance is smaller than a threshold value;
this method improves the NMS method for the case of overlap, and the target detection comprises the following steps:
first, determining by cross-validation the optimal IoU threshold N_t for the case without overlap;
judging whether overlap exists according to the number of connected frames;
when there is no overlap, performing the NMS operation with the IoU threshold N_t;
when there is overlap, dividing the connected detection frames into a plurality of clusters by cluster analysis, and then performing the NMS operation within each cluster with the IoU threshold N_t.
Specifically, while an operator is operating equipment (for example, driving a forklift), people who come too close to the equipment are easily injured. People and vehicles around dangerous equipment therefore need to be detected during operation, and a prompt is given when a target is close to the equipment.
The 'person and object ranging' algorithm has two steps: first, target detection, in which people and vehicles are detected and a detection frame is drawn; second, ranging, in which the distance between the detected person or vehicle and the mechanical equipment is calculated and an alarm is given when the distance is smaller than a threshold. This embodiment mainly optimizes the target detection algorithm.
Many detection frames are generated during target detection inference, but only one detection frame is ultimately needed for each target. Non-Maximum Suppression (NMS) sorts the detection frames by confidence score from high to low, selects the detection frame with the highest score, and calculates the Intersection-over-Union (IoU) between that frame and each frame intersecting it. IoU reflects the degree of overlap of two intersecting frames: the higher the overlap, the larger the IoU, whose value ranges from 0 to 1. The calculation is shown in equation 1,
IoU = area(C ∩ G) / area(C ∪ G) (1)
wherein C denotes the detection frame with the highest confidence score (detection frame C for short) and area(C) denotes its area; G denotes a detection frame intersecting the detection frame with the highest confidence score (detection frame G for short) and area(G) denotes its area.
When the IoU value exceeds the threshold N_t, the detection frame exceeding the threshold N_t is suppressed, as shown in equation 2,
s_i = s_i, if IoU(C, i) < N_t; s_i = 0, if IoU(C, i) ≥ N_t (2)
wherein i is a detection frame intersecting the detection frame with the highest confidence score, and s_i is the confidence score of detection frame i. When the IoU exceeds the threshold, frame i overlaps heavily with the detection frame with the highest confidence score and must be suppressed, that is, deleted.
Deleting every detection frame that exceeds the threshold easily causes missed detections. A common remedy is the Soft-NMS post-processing method, which attenuates the confidence of frames above the IoU threshold instead of deleting them, as shown in equation 3.
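The two suppression rules described above can be sketched as follows. This is a minimal illustration, not the implementation of this embodiment: detection frames are assumed to be (x1, y1, x2, y2) tuples, the Soft-NMS variant shown is the linear decay s_i ← s_i × (1 − IoU), which is one common form and may differ from equation 3, and the names iou, nms, soft_nms_linear and min_score are hypothetical.

```python
def iou(a, b):
    """Intersection-over-Union of two frames given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(frames, scores, n_t):
    """Hard NMS: return the indices of the kept frames, best score first."""
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Delete every remaining frame whose IoU with the best frame reaches N_t.
        order = [i for i in order if iou(frames[best], frames[i]) < n_t]
    return keep


def soft_nms_linear(frames, scores, n_t, min_score=0.001):
    """Linear Soft-NMS (assumed variant): decay scores instead of deleting."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(frames)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        for i in remaining:
            overlap = iou(frames[best], frames[i])
            if overlap >= n_t:
                scores[i] *= 1.0 - overlap
        # Frames whose score has decayed below min_score are finally dropped.
        remaining = [i for i in remaining if scores[i] >= min_score]
    return keep, scores
```

With hard NMS a heavily overlapped frame disappears, whereas with the linear Soft-NMS sketch it survives with a reduced score, which is why Soft-NMS lowers the miss rate when two real targets overlap.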
This algorithm improves the NMS method for the case of overlap. First, whether overlap exists is judged from the number of connected frames. When there is no overlap, the NMS operation is performed with the IoU threshold N_t; when there is overlap, cluster analysis first divides the connected detection frames into a plurality of clusters, and the NMS operation is then performed within each cluster with the IoU threshold N_t. A specific example is shown in the flow chart of fig. 3.
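The overlap test can be sketched as follows. The text does not spell out how connectivity is computed, so this sketch assumes that two detection frames are connected when their rectangles intersect, that connectivity is transitive, and that a connected group indicates overlapping targets when it holds more frames than are allocated to a single target. The names frames_intersect, connected_groups, has_overlapping_targets and frames_per_target are hypothetical.

```python
def frames_intersect(a, b):
    """True when two (x1, y1, x2, y2) frames share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def connected_groups(frames):
    """Group frame indices into connected components (union-find)."""
    parent = list(range(len(frames)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            if frames_intersect(frames[i], frames[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(frames)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def has_overlapping_targets(group, frames_per_target):
    """Assumed rule: more connected frames than one target can produce."""
    return len(group) > frames_per_target
```

Groups for which has_overlapping_targets is false would go straight to ordinary NMS, and the others would first be split by the cluster analysis.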
Correspondingly, the cluster analysis method is as follows. YOLOv3 is used as the training model, in which the number of detection frames allocated to each target is fixed; the number of connected frames is divided by the number of frames allocated to each target and rounded up to give the number of clusters. The distances between the center points of the connected frames are calculated to form a distance adjacency matrix, the symmetric part of the adjacency matrix is removed, the maximum of each column of the matrix is taken to form a row vector, and a difference operation is performed on it; according to the number of clusters, the pairs of connected frames whose distances rank highest are forcibly disconnected, forming a plurality of clusters. A specific example is shown in fig. 4. The optimal IoU threshold N_t is determined by cross-validation.
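The first stages of this cluster analysis can be sketched as follows. The sketch computes the number of clusters, the upper-triangular center-distance matrix, the column maxima, and their differences; it stops before the final disconnection step, because the text does not fully specify how the ranked distances map onto frame pairs. Frames are assumed to be (x1, y1, x2, y2) tuples, and all function and parameter names are hypothetical.

```python
import math


def cluster_count(num_connected, frames_per_target):
    """Number of clusters: connected frames / frames per target, rounded up."""
    return math.ceil(num_connected / frames_per_target)


def center(frame):
    """Center point of an (x1, y1, x2, y2) frame."""
    x1, y1, x2, y2 = frame
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def upper_triangular_distances(frames):
    """Center-distance adjacency matrix with the symmetric lower part removed."""
    centers = [center(f) for f in frames]
    n = len(frames)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = math.dist(centers[i], centers[j])
    return d


def column_max_and_differences(d):
    """Row vector of column maxima and its first-order differences."""
    n = len(d)
    col_max = [max(d[i][j] for i in range(n)) for j in range(n)]
    diffs = [col_max[j + 1] - col_max[j] for j in range(n - 1)]
    return col_max, diffs
```

A large jump in the difference vector marks a frame that lies far from the frames before it, which is the kind of gap the disconnection step would cut.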
The method of this embodiment is compared with conventional methods as follows:
Images of vehicles and people were collected at different times and places, 3720 images in total; each image contains several people and vehicles, giving 174,848 targets to be detected in total, on which the test experiment was carried out. The different NMS methods are evaluated by precision (P), recall (R), false detection rate (F), and miss rate (M), whose calculation formulas are given in equations 4-7, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. As shown in Table 1, the improved NMS method of this algorithm has a lower false detection rate and miss rate than ordinary NMS and Soft-NMS and a higher precision than both, so the experiment shows that the detection effect is improved.
Table 1 comparison of the detection accuracy of different models
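The four evaluation indicators can be computed as sketched below. The definitions used here are the conventional ones and are an assumption about equations 4-7: P = TP / (TP + FP), R = TP / (TP + FN), F = FP / (FP + TN), and M = FN / (TP + FN). The assumed miss rate satisfies M = 1 − R, which agrees with the recall and miss rate pairs reported in this description (for example 96.71% and 3.29%). The function name detection_metrics is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return precision, recall, false detection rate and miss rate."""

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "P": ratio(tp, tp + fp),  # precision
        "R": ratio(tp, tp + fn),  # recall
        "F": ratio(fp, fp + tn),  # false detection rate
        "M": ratio(fn, tp + fn),  # miss rate, equal to 1 - R
    }
```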
In one possible embodiment, the ranging method in the person-and-object ranging model comprises:
measuring the distance to the person or vehicle by a monocular ranging method, calculated as follows:
D=(W×F)/P (8)
wherein W is the target width, F is the focal length of the camera, and P is the pixel width;
during calculation, W and F are treated as constants; the focal length F of the camera is determined by focusing, and the pixel width P is obtained from the width of the target detection frame.
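Equation 8 can be sketched as follows. The sketch assumes that F is expressed in pixels and that it can be calibrated once from a reference object of known width placed at a known distance, by rearranging equation 8 to F = (P × D) / W; this calibration step and all function names are illustrative assumptions rather than part of the method as stated.

```python
def calibrate_focal_length(known_distance, real_width, pixel_width):
    """Rearranged equation 8: F = (P * D) / W, with F in pixels."""
    return pixel_width * known_distance / real_width


def distance_from_pixel_width(real_width, focal_length, pixel_width):
    """Equation 8: D = (W * F) / P."""
    if pixel_width <= 0:
        raise ValueError("pixel_width must be positive")
    return real_width * focal_length / pixel_width


def needs_alarm(distance, threshold):
    """Alarm when the target is closer than the threshold."""
    return distance < threshold
```

For example, a 0.5 m wide target that spans 80 pixels at 5 m gives F = 800 pixels; when the same target later spans 160 pixels, the estimated distance is 2.5 m, and a 3 m threshold would raise an alarm.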
Conventional ranging is usually done by binocular ranging, which is expensive: a binocular ranging camera generally costs more than a thousand yuan, and multi-directional detection of people and vehicles around one piece of equipment needs cameras facing at least the front, back, left, and right, so the binocular cameras alone would cost more than 4000 yuan. This algorithm uses monocular ranging to measure the distance to people and vehicles, which is low in cost and fast to run. Monocular ranging can be performed with an ordinary camera costing only a few tens of yuan, far less than a binocular camera.
In a possible implementation manner, the basic model further includes a dangerous action recognition model, and the method for the dangerous action recognition model includes:
identifying the key nodes of the human body in each frame of image and calculating the human body posture feature vector;
meanwhile, performing target detection on specific articles, the specific articles including mobile phones and water cups;
classifying and identifying dangerous behaviors with a support vector machine according to the human body posture feature vector and the target detection feature data.
In this method, the OpenPose open-source model is first used to identify the key nodes of the human body: the data of 4 key nodes, such as the head and the arm, are read from each frame of image in real time, and posture features are extracted. The specific posture features are: (1) the triangle formed by the shoulder, elbow, and wrist nodes of the arm. When the arm is straight, the triangle is obtuse and the ratio of the square of the longest side to the sum of the squares of the two shorter sides is greater than 1; when the arm is bent, as when making a phone call or drinking water, the triangle tends to be acute and this ratio is less than 1; (2) the area of the polygon formed by the head node and the shoulder, elbow, and wrist nodes of the arm. The area is relatively large during normal operation and shrinks during dangerous actions such as making a phone call or drinking water; (3) the distance from the wrist node to the head node. The wrist node is far from the head node during normal operation and very close to it when making a phone call or drinking water.
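The three posture features can be sketched as follows. Key nodes are assumed to be (x, y) pixel coordinates already extracted by the pose model; the polygon area is computed with the shoelace formula over the nodes in the order head, shoulder, elbow, wrist, which is an assumption about the vertex order. All function names are hypothetical.

```python
import math


def arm_bend_ratio(shoulder, elbow, wrist):
    """Feature (1): longest side squared over the sum of the other two squared.

    Greater than 1 for an obtuse (straight-arm) triangle, less than 1 for
    an acute (bent-arm) triangle.
    """
    sides_sq = sorted([
        math.dist(shoulder, elbow) ** 2,
        math.dist(elbow, wrist) ** 2,
        math.dist(shoulder, wrist) ** 2,
    ])
    denominator = sides_sq[0] + sides_sq[1]
    return sides_sq[2] / denominator if denominator else 0.0


def polygon_area(points):
    """Feature (2): absolute shoelace area of the polygon through the points."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def wrist_head_distance(wrist, head):
    """Feature (3): Euclidean distance from the wrist node to the head node."""
    return math.dist(wrist, head)


def posture_features(head, shoulder, elbow, wrist):
    """Feature vector that would be passed on to the SVM classifier."""
    return [
        arm_bend_ratio(shoulder, elbow, wrist),
        polygon_area([head, shoulder, elbow, wrist]),
        wrist_head_distance(wrist, head),
    ]
```

In the combined method, the detection results for the mobile phone and water cup would be appended to this vector before classification.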
Human body key node identification images were collected at different times and places, and 147,376 tests were performed in total. The SVM recognition results of the different methods are evaluated by precision P, recall R, false detection rate F, and miss rate M, as shown in Table 2. SVM recognition using only the human body key node features achieves a precision P of 95.90% and a recall R of 96.71%, while SVM recognition with the target detection result features added achieves 96.84% and 97.63%, respectively; the false detection rate F and miss rate M are 28.66% and 3.29% with the key node features alone, and 19.76% and 2.37% with the target detection result features added. The experiments show that both methods exceed 95% precision, but the posture recognition method with target detection added detects better.
Table 2 comparison of SVM recognition results by different feature quantity methods
In a specific implementation, parallel computing can be used for acceleration, but this places higher demands on the computing power of the equipment. The equipment therefore uses Intel neural compute sticks to raise the inference speed: the OpenPose posture recognition model runs on one neural compute stick and the water cup and mobile phone target detection model on another, so the two inference models are accelerated at the same time; after the corresponding features are extracted, dangerous actions are recognized by the support vector machine (SVM).
In one possible embodiment, the basic model includes a fatigue operation identification model, whose identification method comprises: detecting the face in the image and then detecting the eyes in the face image; if eyes are detected, the eyes are considered open, and if no eyes are detected, the eyes are considered closed. Whether the operator is in a fatigue state is judged according to whether the eyes remain closed for longer than a threshold duration, and an early warning is given if so. Rapid actions such as blinking require the equipment to have a high detection speed; the Haar model detects the face and eyes efficiently and thus achieves real-time detection.
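The duration test can be sketched as a small state tracker fed once per frame. The sketch assumes that each frame carries a timestamp in seconds and a boolean saying whether the eye detector found eyes; the class name ClosedEyeMonitor and the 1.5 s threshold in the usage note are hypothetical.

```python
class ClosedEyeMonitor:
    """Tracks how long the eyes have been continuously closed."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.closed_since = None  # timestamp of the first closed-eye frame

    def update(self, timestamp_s, eyes_detected):
        """Feed one frame; return True when the fatigue warning should fire."""
        if eyes_detected:
            # Eyes open: a blink or closure has ended, so reset the timer.
            self.closed_since = None
            return False
        if self.closed_since is None:
            self.closed_since = timestamp_s
        return timestamp_s - self.closed_since > self.threshold_s
```

With a threshold of 1.5 s, an ordinary blink lasting a few frames resets the timer and never triggers the warning, while eyes that stay undetected from t = 0.5 s to t = 2.1 s do.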
To recognize the closed-eye state, conventional algorithms mostly extract facial key node data and then judge whether the eyes are closed from key node features such as the aspect ratio of the eye contour. Algorithms that extract facial key node data are mostly used in face recognition, where identity must be determined; their accuracy is high, but their running time is correspondingly longer. This algorithm does not need to determine identity and only needs to detect the face, so it adopts a Haar model, which uses an integral image to rapidly compute rectangular features such as edge features, line features, and center features, and then uses the AdaBoost algorithm to detect the face rapidly.
On the basis of the face detection result, a Haar eye detection model is likewise used to detect the eyes, and when no eyes are detected the state is directly regarded as closed-eye. The Haar model increases the detection speed: as shown in fig. 5, for an image size of 30M the average detection time is only 0.1 seconds, whereas closed-eye recognition based on facial key node features takes 1 s on average. The experiments show that this algorithm detects faster than the other algorithms. Moreover, because the Haar model uses the AdaBoost algorithm, detection precision also improves: as shown in Table 3, closed-eye detection based on facial key nodes has an accuracy of only 94.01%, while the closed-eye detection model based on the Haar model reaches 95.72%.
Table 3 comparison of closed-eye detection accuracy of different methods
In the dangerous action recognition process, by adding the target detection result characteristics, the false detection rate and the omission rate can be reduced, and the detection precision is improved.
In one possible design, the basic model further includes a face recognition model, and the method of the face recognition model includes:
performing face detection with a Dlib model; cropping the face image once a face is detected; converting the image into a data set of a specific size and using it as the input of a convolutional neural network; transforming the data set through the convolutional and pooling layers of the convolutional neural network to obtain a 512-dimensional feature vector; and classifying according to the 512-dimensional feature vector to obtain a classification result.
Specifically, an employee approaches a camera, a Dlib model performs face detection, a face image is intercepted after the face is detected, the image is converted into a 64 x 3 data set, the data set is used as input of a convolutional neural network, the data set is transformed through a convolutional layer and a pooling layer, 512-dimensional feature vectors are obtained, and classification is performed according to the feature vectors, so that a classification result is obtained.
In one possible implementation manner, the basic model further comprises a safety helmet detection, and the safety helmet detection method comprises the following steps:
detecting a face with the Dlib model and photographing the corresponding person to obtain image information;
performing detection through an API: the detection model is accessed over the Internet and the platform returns the detection result over the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
As a specific example, the safety equipment detection of the system mainly detects whether an employee wears a safety helmet. The safety helmet detection model in this design is trained on the Baidu PaddlePaddle EasyDL platform. The training process is as follows: a model is first created (for example, YOLOv5 is selected as the training model), a training data set is then uploaded and training is run, the model is verified after training, and detection can be performed once verification passes. Detection is performed through an API: the detection model is accessed over the Internet and the platform returns the detection result over the Internet; if a safety helmet is detected, the position information of the detection frame is returned as a JSON string, and if no safety helmet is detected, the returned JSON string is empty.
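Handling of the returned JSON string can be sketched as follows. The exact response layout is not given in this description, so the field names "results" and "location" and the left/top/width/height keys are hypothetical placeholders; only the rule that an empty result means no safety helmet is taken from the text.

```python
import json


def parse_helmet_response(body):
    """Return the detection-frame positions found in the response body.

    An empty string, an empty JSON object, or an empty result list all
    mean that no safety helmet was detected.
    """
    if not body or not body.strip():
        return []
    data = json.loads(body)
    # "results" and "location" are assumed field names, not a documented API.
    results = data.get("results", []) if isinstance(data, dict) else []
    return [item["location"] for item in results if "location" in item]


def helmet_detected(body):
    """True when at least one detection frame was returned."""
    return len(parse_helmet_response(body)) > 0
```

The pre-operation check would pass only when helmet_detected returns true for the photograph taken after the face is detected.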
The equipment implements safety detection entirely by artificial intelligence methods, making safety detection more intelligent, more accurate, and faster. Identity authentication is realized mainly by face recognition; safety helmet detection mainly by a safety helmet target detection method; safety detection for large mobile equipment mainly by detecting people or vehicles around the equipment with target detection, calculating the distance to each target, and raising an alarm when the distance is smaller than a threshold; fatigue state detection by detecting the face and eyes and judging from the state of the eyes whether the operator is fatigued; and dangerous action detection by detecting the key nodes of the human body and identifying, from the key node features and the target detection results for mobile phones, water cups, and the like, whether the operator performs dangerous actions such as making a phone call or drinking water.
Finally, it should be noted that: the foregoing description is only of the preferred embodiments of the invention and is not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.