CN115588165A - Tunnel worker safety helmet detection and face recognition method - Google Patents


Info

Publication number
CN115588165A
Authority
CN
China
Prior art keywords
face
safety helmet
distance
tunnel
picture
Prior art date
Legal status: Pending
Application number
CN202211318528.4A
Other languages
Chinese (zh)
Inventor
周茂
岳杨
胡立锦
何文彬
邹飞
颜嘉
李育骏
李智
余留洋
廖柯嘉
罗洪平
唐贤伦
Current Assignee
State Grid Chongqing Electric Power Co Construction Branch
Original Assignee
State Grid Chongqing Electric Power Co Construction Branch
Priority date
Filing date
Publication date
Application filed by State Grid Chongqing Electric Power Co Construction Branch
Priority: CN202211318528.4A
Publication: CN115588165A
Legal status: Pending

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects (scenes; context or environment of the image)
    • G06V 10/761 Proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/764 Recognition using classification, e.g. of video objects
    • G06V 10/82 Recognition using neural networks
    • G06V 40/168 Feature extraction; face representation (human faces)
    • G06V 2201/07 Indexing scheme: target detection

Abstract

The invention relates to a tunnel worker safety helmet detection and face recognition method, belonging to the technical field of image target detection and face recognition. The method first generates a labelled, two-class data set from collected images of persons wearing and not wearing safety helmets and applies data enhancement, with image equalization used to offset the adverse effects of the harsh lighting conditions in tunnels. A Tiny-YOLOv3 classification convolutional neural network and a FaceNet neural network fused with a distance threshold are then trained; each detected face picture is passed through the FaceNet network to obtain a feature vector characterizing the face, and a combined distance, fusing Euclidean distance and cosine similarity, is calculated between this vector and the faces in the database, so that both the direction of the feature vectors and their absolute numerical difference are taken into account. Abnormal detection and recognition results trigger an alarm, enabling effective identification of tunnel workers, early warning when a safety helmet is not worn, and denial of access to unauthorized personnel.

Description

Tunnel worker safety helmet detection and face recognition method
Technical Field
The invention belongs to the technical field of image target detection and face recognition, and relates to a method for detecting a safety helmet of a tunnel worker and recognizing a face.
Background
Convolutional Neural Networks (CNNs), a deep learning model architecture, have become the most effective method in image processing and computer vision. Weight sharing and local receptive fields reduce the number of weights and thus the computational complexity of the model, while the translation invariance of image features gives CNNs strong feature extraction capability and high stability.
Currently, a great deal of research applies convolutional neural networks to target detection and face recognition. Since R. Girshick et al. proposed the region-proposal-based R-CNN deep learning model in 2014, a family of classical target detection algorithms such as Faster R-CNN, SSD and YOLO has emerged, along with face recognition models such as FaceNet and ArcFace. The general trend in these models is to keep increasing the number of network layers for better feature extraction capability, and to enlarge the image scale to cover a wider range of features. However, more complex models bring problems such as difficult network convergence, rapid parameter growth and slow computation; most significantly, as model complexity increases, the models become hard to deploy on resource-limited devices.
Meanwhile, on tunnel construction sites, correctly worn safety helmets effectively prevent and reduce injuries to workers during operation. Related studies show that a large proportion of safety accidents are caused by not wearing a safety helmet. At present there are two approaches to supervising helmet wearing on construction sites: one relies mainly on manual supervision, which is inefficient and labour-intensive; the other builds a detection network with deep learning, but existing helmet-wearing detection models suffer from low accuracy, slow inference, and an inability to meet precision and real-time requirements when deployed on edge computing devices. A lightweight end-to-end detection network is therefore a good choice for helmet detection. In addition, compared with conventional settings such as the construction industry, helmet detection in tunnels further suffers from low accuracy on small targets and interference from dim lighting.
In face recognition models, after the face image has undergone multiple convolution operations, the final output vector contains the richest spatial and semantic information. To identify a worker, the Euclidean distance or cosine similarity between feature vectors representing faces is generally computed to decide whether recognition succeeds, but neither measure alone accounts for both the direction of the vectors and their absolute difference. Moreover, typical face recognition networks represent an identity with a 128-dimensional vector; for large amounts of data, a higher-dimensional representation helps distinguish finer differences between feature vectors.
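The complementary behaviour of the two measures discussed above can be seen in a small numeric sketch (illustrative values only, not from the patent): cosine distance ignores a large magnitude gap between two co-directional vectors, while Euclidean distance exposes it, and the reverse holds for a purely directional gap.

```python
import numpy as np

def euclidean(a, b):
    # Absolute (magnitude-sensitive) difference between two vectors.
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    # Directional difference only; insensitive to vector magnitude.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([10.0, 0.0])  # same direction as a, ten times the magnitude
c = np.array([0.0, 1.0])   # same magnitude as a, orthogonal direction

print(cosine_distance(a, b))  # 0.0: cosine misses the magnitude gap
print(euclidean(a, b))        # 9.0: Euclidean exposes it
print(cosine_distance(a, c))  # 1.0: cosine exposes the direction gap
```

This asymmetry is why the method fuses both measures into a single acceptance threshold.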
Disclosure of Invention
In view of the above, the invention aims to provide a tunnel worker helmet detection and face recognition method based on Tiny-YOLOv3 and a FaceNet fused with a distance threshold. The trained models detect helmets in real time and quickly extract the face; after face alignment, the characterization vector of the face is computed and compared with the database to quickly recognize the worker's identity and whether a helmet is worn.
In order to achieve the purpose, the invention provides the following technical scheme:
a tunnel worker safety helmet detection and face recognition method comprises the following steps:
s1: acquiring face images with safety helmets of tunnel workers in different postures, aligning the faces to be used as a data set for face recognition, recording the data set as a recog _ dataset, and establishing a face database of each worker;
s2: acquiring a plurality of face images with or without a safety helmet, labeling and enhancing data of the faces with or without the safety helmet as a training set of a detection network, and recording the training set as a detec _ dataset;
s3: training a classified convolutional neural network model Tiny-YOLOv3 for detecting whether a worker wears a safety helmet or not by using the detec _ dataset enhanced by the data in the step S2; training a convolutional neural network faceNet for face recognition by using the data of the recog _ dataset in the step S1;
s4: detecting the input image by using the trained Tiny-YOLOv3 in the step S3 to obtain the rectangular coordinates of the head image of the input image with the safety helmet, and intercepting the local picture of the detected part from the input image according to the corresponding coordinates;
s5: carrying out key point detection on the local picture in the step S4 by using dlib, aligning the local picture to an FFHQ data set sample by adopting affine transformation, and equalizing the aligned picture to obtain a human face picture to be identified;
s6: the face picture to be recognized in the step S5 is transmitted into a faceNet network to obtain a 512-dimensional feature vector representing the face;
s7: calculating cosine similarity, euclidean distance and Manhattan distance between the feature vector in the step S6 and the face in the database, and judging whether the combined result of the Euclidean distance and the cosine similarity meets a threshold value for successful recognition or not so as to judge whether the combined result is a worker or not;
s8: and performing identity authentication according to the result in the step S7, and warning the person who does not wear the safety helmet, so as to realize the identification of the tunnel staff and the early warning of the safety helmet.
Further, after the face images of tunnel workers wearing safety helmets in different postures are acquired in step S1, key point detection is performed with dlib, the local pictures are aligned to the FFHQ data set samples by affine transformation, and the aligned pictures are equalized to form the face recognition database, specifically:
S11: traverse the FFHQ data set samples with dlib to obtain the 5 key point coordinates of FFHQ, namely the corners of both eyes and the philtrum, and use them as the desired template coordinates;
S12: perform dlib 5-key-point detection on each picture with a safety helmet, and transform the picture's key points to the template coordinates through affine transformation to achieve face alignment;
S13: meanwhile, considering the unbalanced lighting in tunnels, use equalization to weaken the influence of light.
Further, in step S2 the faces are labelled with ImageLabel into the two classes of wearing and not wearing a safety helmet, and the labelled pictures are rotated, flipped and blurred to enrich the training set.
Further, step S3, training the classification convolutional neural network model Tiny-YOLOv3 for detecting whether a worker wears a safety helmet using the detec_dataset from step S2 and training the face recognition convolutional neural network FaceNet using the recog_dataset from step S1, specifically comprises:
S31: input the data-enhanced labelled images into the Tiny-YOLOv3 target detection model for supervised training: with minimization of the loss function between prediction and label as the optimization target, update the weights of the target detection model with the Adam optimization algorithm. The loss function consists of a localization loss and a classification loss, where the localization loss uses the Distance-IoU (DIoU, Distance Intersection over Union):
DIoU = IoU - ρ²(b, b^gt) / c²
where IoU denotes the ratio of the intersection to the union of the prediction box B and the ground-truth box B^gt; DIoU extends IoU by taking the distance between box centers into account; b and b^gt are the center points of the prediction box and the ground-truth box respectively; ρ denotes the Euclidean distance between the two center points; and c denotes the diagonal length of the smallest enclosing region that contains both boxes;
the classification loss function is binary cross-entropy BCE (binary cross entropy):
BCE = -(1/N) Σ_{i=1}^{N} [ y_i log p(y_i) + (1 - y_i) log(1 - p(y_i)) ]
where y_i is the label of the i-th sample in a training batch, p(y_i) is the predicted probability that the output belongs to label y_i, and N is the total number of samples in the batch;
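A plain-NumPy sketch of the two loss components in step S31; the box format (x1, y1, x2, y2) is an assumption, and real training code would of course compute these inside the framework's autograd graph:

```python
import numpy as np

def diou(box_a, box_b):
    """DIoU = IoU - rho^2(centers) / c^2, boxes given as (x1, y1, x2, y2)."""
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between box centers (rho^2)
    ca = ((box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2)
    cb = ((box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2)
    rho2 = (ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2
    # Squared diagonal of the smallest enclosing box (c^2)
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - rho2 / c2

def bce(y, p, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels y and probabilities p."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
```

For identical boxes DIoU equals 1; as the centers drift apart the penalty term pulls it below the plain IoU, which is what gives the localization loss a useful gradient even for non-overlapping boxes.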
S32: input the face-aligned images with different occlusions into the FaceNet model for supervised training: with the triplet loss function as the objective, aiming to make the intra-class distance smaller and the inter-class distance larger, update the model weights with the Adam optimization algorithm; the loss function is:
L = Σ_{i=1}^{N} max( ‖f(x_i^a) - f(x_i^p)‖² - ‖f(x_i^a) - f(x_i^n)‖² + α , 0 )
where x_i^a is the anchor training sample of the i-th triplet in a batch, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class, ‖f(a) - f(b)‖ denotes the Euclidean distance between the embeddings of a and b, and α is the margin threshold: loss and gradient are generated only when the intra-class distance plus the margin exceeds the inter-class distance.
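The triplet loss of step S32 reduces to a few lines of NumPy over batches of embedding vectors; `alpha` plays the role of the margin α above:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over a batch of max(0, ||a-p||^2 - ||a-n||^2 + alpha).
    anchor, positive, negative: (N, D) arrays of embeddings."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # squared intra-class distances
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # squared inter-class distances
    return float(np.sum(np.maximum(0.0, d_pos - d_neg + alpha)))

# An "easy" triplet (negative far away) contributes no loss or gradient;
# a "hard" triplet (negative nearly as close as the positive) does.
easy = triplet_loss(np.array([[0.0, 0.0]]), np.array([[0.1, 0.0]]),
                    np.array([[10.0, 0.0]]))
hard = triplet_loss(np.array([[0.0, 0.0]]), np.array([[0.1, 0.0]]),
                    np.array([[0.2, 0.0]]))
```

This hinge behaviour is why FaceNet-style training mines hard or semi-hard triplets: easy triplets are silent under the max(·, 0).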
Further, in step S4 the actually captured picture is passed into the Tiny-YOLOv3 detection network, and the detected face is cropped out and then aligned by affine transformation.
Further, step S5 specifically comprises the following steps:
S51: traverse the FFHQ data set samples with dlib to obtain the 5 key point coordinates of FFHQ, namely the corners of both eyes and the philtrum, and use them as the desired template coordinates;
S52: perform dlib 5-key-point detection on the cropped picture from step S4, and transform its key points to the template coordinates through affine transformation to achieve face alignment;
S53: meanwhile, considering the unbalanced lighting in tunnels, use equalization to weaken the influence of light.
Further, in step S7 the aligned face image is passed into the recognition network to obtain a 512-dimensional vector characterizing the face, and the distances between it and all faces in the database are calculated as:
L_Euclidean(x, x̃) = ‖x - x̃‖₂, L_cosine(x, x̃) = 1 - (x · x̃) / (‖x‖ ‖x̃‖), x* = argmin_{x̃} d(x, x̃)
where L_Euclidean and L_cosine denote the Euclidean distance and the cosine distance respectively, x is the input 512-dimensional vector and x̃ a 512-dimensional feature vector in the database, x* is the database vector with the smallest distance from the given input x, and d(a, b) denotes the distance between a and b.
Further, step S8 specifically comprises: if the calculated distance meets the set threshold, the two are considered the same person and the identity is confirmed; otherwise authentication fails. Meanwhile, whether the person is wearing a safety helmet is obtained from the detection result; if the person is not wearing a helmet or identity authentication is abnormal, an alarm is raised, ensuring personnel safety and barring the entry of unauthorized persons.
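The alarm logic of step S8 reduces to a small decision function; the message strings below are illustrative, not from the patent:

```python
def evaluate(identity, helmet_worn):
    """Decide the alarm action from the recognition result (identity is None
    when no database match met the distance threshold) and the helmet
    detection result."""
    if identity is None:
        return "ALARM: unauthorized person, deny entry"
    if not helmet_worn:
        return f"WARNING: {identity} is not wearing a safety helmet"
    return f"OK: {identity} authenticated, helmet on"
```

In deployment this function would run once per detection frame, feeding whatever alerting channel (siren, gate control, dashboard) the site uses.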
The invention has the beneficial effects that: the method uses dlib to detect key points and affine transformation to align the local pictures to the FFHQ data set samples, so that the pictures have a uniform format. The aligned pictures are equalized to obtain the face pictures to be recognized, which effectively offsets the adverse effects of the harsh lighting conditions in tunnels. The face picture is passed into the FaceNet network to obtain a 512-dimensional feature vector characterizing the face; this higher representation dimensionality allows finer discrimination between feature vectors, so faces can be reliably recognized even among a larger number of people. The combined result of the Euclidean distance and cosine similarity between the feature vector and the faces in the database is calculated and judged against the threshold for successful recognition; this result characterizes both the direction of the feature vectors and their absolute numerical difference. Meanwhile, abnormal detection and recognition results trigger an alarm, enabling effective identification of tunnel workers, early warning when a safety helmet is not worn, and denial of access to unrelated personnel.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the tunnel worker helmet detection and face recognition method based on Tiny-YOLOv3 and a FaceNet fused with a distance threshold according to the preferred embodiment of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not intended to indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present invention, and the specific meaning of the terms described above will be understood by those skilled in the art according to the specific circumstances.
Referring to fig. 1, the tunnel worker helmet detection and face recognition method based on Tiny-YOLOv3 and a FaceNet fused with a distance threshold according to this embodiment comprises the following steps:
step 1: the method comprises the steps of obtaining face images of safety helmets with different postures of workers, using dlib to detect key points, adopting affine transformation to align local pictures to FFHQ data set samples, equalizing the aligned pictures to be used as a data base for face recognition, and establishing a recog _ dataset data set for face recognition, wherein the recog _ dataset specifically comprises the following steps: using dlib to traverse the FFHQ dataset samples to obtain 5 key point coordinates of FFHQ, i.e. key points in both eyes and people, as the desired template coordinates. And carrying out dlib 5 key point detection on the picture with the wearable safety helmet, and transforming the picture key points to the template coordinates through affine transformation to realize face alignment. Meanwhile, the problem of light imbalance of the tunnel is considered, and the influence of light is weakened by using equalization.
And 2, step: acquiring a plurality of face images with or without safety helmets, and performing label labeling and data enhancement on the faces with or without safety helmets as a detec _ dataset training set of a detection network;
Step 3: train the classification convolutional neural network model Tiny-YOLOv3 for detecting whether a worker wears a safety helmet using the data-enhanced detec_dataset from step 2, and train the face recognition convolutional neural network FaceNet using the recog_dataset from step 1, specifically:
the method comprises the steps of using 6 depth separable convolution and maximum pooling structures as a backbone network, using FPN to realize multi-scale detection, constructing a classification convolution neural network model for a large target and a medium target after a feature map of each scale, using positioning loss, confidence loss and classification loss as optimization targets until the model converges, storing the weight of the model with the best classification accuracy of a test set, and using an Adam optimization algorithm to update the weight of the model. The positioning loss takes DIou as a loss function, and the classification loss takes binary cross entropy as the loss function.
DIoU = IoU - ρ²(b, b^gt) / c²
where IoU denotes the ratio of the intersection to the union of the prediction box B and the ground-truth box B^gt; DIoU extends IoU by taking the distance between box centers into account; b and b^gt are the center points of the prediction box and the ground-truth box respectively; ρ denotes the Euclidean distance between the two center points; and c denotes the diagonal length of the smallest enclosing region that contains both boxes.
BCE = -(1/N) Σ_{i=1}^{N} [ y_i log p(y_i) + (1 - y_i) log(1 - p(y_i)) ]
where y_i is the binary label 0 or 1 of the i-th sample in a training batch, p(y_i) is the predicted probability that the output belongs to label y_i, and N is the number of samples in the batch.
To balance speed and accuracy, the FaceNet backbone adopts MobileNet to extract features and map images into a Euclidean space, where spatial distance relates directly to picture similarity: different images of the same person lie close together, while images of different persons lie far apart, so the embedding can be used for face verification, recognition and clustering. The network is trained with the triplet loss function, aiming to make the intra-class distance smaller and the inter-class distance larger.
L = Σ_{i=1}^{N} max( ‖f(x_i^a) - f(x_i^p)‖² - ‖f(x_i^a) - f(x_i^n)‖² + α , 0 )
where x_i^a is the anchor training sample of the i-th triplet in a batch, x_i^p is a sample of the same class as x_i^a, x_i^n is a sample of a different class, ‖f(a) - f(b)‖ denotes the Euclidean distance between the embeddings of a and b, and α is the margin threshold: loss and gradient are generated only when the intra-class distance plus the margin exceeds the inter-class distance.
Step 4: use the Tiny-YOLOv3 trained in step 3 to obtain the detected head images wearing safety helmets, and crop the detected regions from the corresponding images;
Step 5: perform key point detection with dlib, align the local pictures to the FFHQ data set samples by affine transformation, and equalize the aligned pictures to obtain the face pictures to be recognized, specifically:
and detecting a pre-training model according to the official shape _ predictor _5_face with angles of eyes and key points in the human according to bz 2. The 5-point coordinates are obtained by detecting the FFHQ data in advance. And aligning the detected face to FFHQ data through affine transformation. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, and maintains "straightness" and "parallelism" of a two-dimensional figure. The three non-collinear pairs of corresponding points define a unique affine transformation. Affine transformations can be achieved by a complex series of atomic transformations, including translation, scaling, flipping, rotation, and shearing. Equalization is used as a method for enhancing Image Contrast (Image Contrast), and the main idea is to change the histogram distribution of one Image into an approximately uniform distribution, thereby enhancing the Contrast of the Image.
Step 6: pass the face picture to be recognized from step 5 into the FaceNet network to obtain a 512-dimensional feature vector characterizing the face;
Step 7: judge whether the combined result of the Euclidean distance and cosine similarity between the feature vector from step 6 and the faces in the database meets the threshold for successful recognition. Specifically, the similarity of two vectors is judged by calculating both the cosine of the angle between the template vector in the database and the vector of the detected image, and the Euclidean distance between them; when the combined distance falls within the set threshold, the two are considered the same person.
The reason is that cosine similarity distinguishes differences in vector direction but is insensitive to absolute values, whereas Euclidean distance reflects the absolute difference of individual numerical features. Combining the two lets a single threshold balance the direction and the absolute difference of the vectors, and the [-1, 1] value range of the cosine does not inflate the absolute difference. The calculation formula is:
L_Euclidean(x, x̃) = ‖x - x̃‖₂, L_cosine(x, x̃) = 1 - (x · x̃) / (‖x‖ ‖x̃‖), x* = argmin_{x̃} d(x, x̃)
where L_Euclidean and L_cosine denote the Euclidean and cosine distances respectively, x is the input 512-dimensional vector and x̃ a 512-dimensional feature vector in the database, x* is the database vector with the smallest distance from the given input x, and d(x, x̃) denotes the distance between x and x̃.
Step 8: perform identity authentication according to the result of step 7 and raise an alarm if no safety helmet is worn, realizing tunnel worker identification and helmet-wearing early warning.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (8)

1. A tunnel worker safety helmet detection and face recognition method is characterized in that: the method comprises the following steps:
s1: acquiring face images with safety helmets of tunnel workers in different postures, aligning the faces to be used as a data set for face recognition, recording the data set as a recog _ dataset, and establishing a face database of each worker;
s2: acquiring a plurality of face images with or without a safety helmet, labeling and enhancing data of the faces with or without the safety helmet as a training set of a detection network, and recording the training set as a detec _ dataset;
s3: training a classified convolutional neural network model Tiny-YOLOv3 for detecting whether a worker wears a safety helmet or not by using the detec _ dataset enhanced by the data in the step S2; training a convolutional neural network faceNet for face recognition by using the data of the recog _ dataset in the step S1;
s4: detecting the input image by using the trained Tiny-YOLOv3 in the step S3 to obtain the rectangular coordinates of the head image of the input image with the safety helmet, and intercepting the local picture of the detected part from the input image according to the corresponding coordinates;
s5: carrying out key point detection on the local picture in the step S4 by using dlib, aligning the local picture to an FFHQ data set sample by adopting affine transformation, and equalizing the aligned picture to obtain a human face picture to be identified;
s6: the face picture to be recognized in the step S5 is transmitted into a faceNet network to obtain a 512-dimensional feature vector representing the face;
s7: calculating the cosine similarity, euclidean distance and manhattan distance between the feature vector in the step S6 and each face in the database, and judging whether the combined result of the euclidean distance and the cosine similarity meets the threshold value for successful recognition, so as to determine whether the person is a registered worker;
s8: and performing identity authentication according to the result in the step S7, and warning the user who does not wear the safety helmet, so as to realize the identification of the tunnel staff and the early warning of wearing the safety helmet.
2. The method of claim 1, wherein the method comprises the steps of: after the face images with the safety helmet of different postures of tunnel workers are obtained in the step S1, carrying out key point detection by using dlib, aligning local pictures to FFHQ data set samples by adopting affine transformation, and balancing the aligned pictures to serve as a database for face recognition, wherein the method specifically comprises the following steps of:
s11: traversing the FFHQ data set samples by using dlib to obtain the coordinates of the 5 key points of FFHQ, namely the key points of the two eyes and the philtrum, which are used as the expected template coordinates;
s12: carrying out dlib 5 key point detection on the picture with the safety helmet, and transforming the key points of the picture to template coordinates through affine transformation to realize face alignment;
s13: meanwhile, in view of the uneven lighting in the tunnel, equalization is used to weaken the influence of the light.
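The equalization mentioned in S13 is commonly realized as histogram equalization. A minimal pure-Python sketch, assuming an 8-bit grayscale image given as a list of rows (a real pipeline would typically use OpenCV's `cv2.equalizeHist`):

```python
def equalize(image, levels=256):
    """Histogram equalization for a grayscale image given as a list of
    rows of integer pixel values in [0, levels - 1]."""
    flat = [p for row in image for p in row]
    n = len(flat)
    # histogram and cumulative distribution function
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)

    def remap(p):
        # classic equalization mapping; identity when image is constant
        if n == cdf_min:
            return p
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))

    return [[remap(p) for p in row] for row in image]
```

For example, a low-contrast image `[[50, 50], [100, 100]]` is stretched to `[[0, 0], [255, 255]]`, spreading the intensities over the full range.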
3. The tunnel worker safety helmet detection and face recognition method according to claim 1, characterized in that: in the step S2, the faces are labeled by adopting ImageLabel and divided into two categories of wearing and not wearing safety helmets, and the labeled pictures are rotated, flipped and blurred to enrich the training set.
4. The method of claim 1, wherein the method comprises the steps of: the step S3: training a classified convolutional neural network model Tiny-YOLOv3 for detecting whether a worker wears a safety helmet or not by using the detec _ dataset data in the step S2; using the recog _ dataset data set data in the step S1 to train a convolutional neural network FaceNet for face recognition, specifically:
s31: inputting the data-enhanced annotation image into a target detection Tiny-YOLOv3 model for supervision training: updating the weight of the target detection model by using an Adam optimization algorithm by taking a loss function between the minimized predicted value and the label as an optimization target; the loss function is composed of a positioning loss function and a classification loss function, wherein the positioning loss function adopts an intersection ratio DIoU:
DIoU = IoU − ρ²(b, b_gt) / c²,  IoU = |B ∩ B_gt| / |B ∪ B_gt|
where IoU denotes the ratio of the intersection to the union of the predicted box and the real box, DIoU denotes the IoU additionally considering the distance between the centers of the predicted box and the real box, b and b_gt respectively denote the center points of the predicted box and the real box, B and B_gt respectively denote the predicted box and the real box, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing region that can simultaneously contain the predicted box and the real box;
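A minimal sketch of the DIoU term used in the localization loss (the corresponding loss would be 1 − DIoU); representing boxes as `(x1, y1, x2, y2)` tuples is an assumption for illustration:

```python
def diou(pred, gt):
    """DIoU for two axis-aligned boxes (x1, y1, x2, y2): IoU minus the
    squared center distance over the squared diagonal of the smallest
    enclosing box."""
    # intersection and union areas
    ix = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = ix * iy
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # squared distance rho^2 between the two box centers
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    # squared diagonal c^2 of the smallest enclosing box
    ex = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ey = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = ex ** 2 + ey ** 2
    return iou - rho2 / c2
```

For identical boxes DIoU equals 1; as the centers drift apart the penalty term ρ²/c² grows even when the IoU stays the same, which is what makes DIoU a better localization signal than plain IoU.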
the classification loss function is binary cross entropy BCE:
BCE = −(1/N) · Σ_{i=1..N} [ y_i · log p(y_i) + (1 − y_i) · log(1 − p(y_i)) ]
wherein y is the label category, p(y) is the probability that the output belongs to the label y, i denotes the i-th sample in a training batch, and N denotes the total number of samples contained in a training batch;
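The binary cross entropy above can be computed directly; a minimal sketch taking predicted positive-class probabilities and 0/1 labels:

```python
import math

def bce(probs, labels):
    """Mean binary cross entropy over a batch: probs are predicted
    probabilities of the positive class, labels are 0 or 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)
```

For instance, a maximally uncertain prediction `bce([0.5], [1])` yields log 2 ≈ 0.693, while confident correct predictions drive the loss toward 0.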
s32: inputting the different shielded face images subjected to face alignment into a FaceNet model for supervision training: taking a triple loss function as a target, aiming at enabling the intra-class distance to be smaller and the inter-class distance to be larger, updating the weight of a target detection model by using an Adam optimization algorithm, wherein the loss function is as follows:
L = Σ_{i=1..N} [ ||f(x_i^a) − f(x_i^p)||_2² − ||f(x_i^a) − f(x_i^n)||_2² + α ]_+
wherein x_i^a denotes the i-th training sample (anchor) in one batch, x_i^p denotes a sample of the same class as x_i^a, x_i^n denotes a sample of a different class from x_i^a, ||f(a) − f(b)||_2 denotes the Euclidean distance between the embeddings of a and b, α is a threshold (margin), and a loss and gradient are produced only when the intra-class distance plus α exceeds the inter-class distance.
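The triplet loss described above can be sketched as follows, using squared Euclidean distances between precomputed embeddings (the function names and the margin default are illustrative assumptions):

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """FaceNet-style triplet loss: a triplet contributes only when the
    anchor-positive distance plus the margin alpha exceeds the
    anchor-negative distance."""
    loss = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        loss += max(0.0, sq_dist(a, p) - sq_dist(a, n) + alpha)
    return loss
```

An "easy" triplet whose negative is already far from the anchor contributes zero loss, so in practice FaceNet mines hard or semi-hard triplets within each batch to keep the gradient informative.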
5. The method of claim 1, wherein the method comprises the steps of: in step S4, the actually shot picture is transmitted into a Tiny-YOLOv3 detection network, and the detected face is intercepted after affine transformation.
6. The method for detecting a safety helmet of a tunnel worker and recognizing a human face according to claim 1, wherein the method comprises the following steps: in step S5, the method specifically includes the following steps:
s51: traversing the FFHQ data set samples by using dlib to obtain the coordinates of the 5 key points of FFHQ, namely the key points of the two eyes and the philtrum, which are used as the expected template coordinates;
s52: carrying out dlib 5 key point detection on the local picture in the step S4, and transforming the picture key points to template coordinates through affine transformation to realize face alignment;
s53: meanwhile, in view of the uneven lighting in the tunnel, equalization is used to weaken the influence of the light.
7. The method for detecting a safety helmet of a tunnel worker and recognizing a human face according to claim 1, wherein the method comprises the following steps: in step S7, the aligned face image is transmitted to the recognition network to obtain a 512-dimensional vector representing the face, and the distances between the face and all faces in the database are calculated, and the person in the database with the smallest distance is selected as the identity of the detected face, and the calculation method is as follows:
L_Euclidean(x, y_i) = ||x − y_i||_2,  L_cosine(x, y_i) = 1 − (x · y_i) / (||x||_2 · ||y_i||_2),
i* = argmin_i ||x − y_i||_2
wherein L_Euclidean and L_cosine respectively denote the Euclidean distance and the cosine distance, x and y_i respectively denote the input 512-dimensional vector and a 512-dimensional feature vector in the database, argmin_i ||x − y_i||_2 denotes, for a given input x, finding the database feature vector y_i with the smallest Euclidean distance to x, and ||x − y_i||_2 denotes the Euclidean distance between x and y_i.
8. The method of claim 1, wherein the method comprises the steps of: step S8 specifically includes: if the calculated distance meets the set threshold value, the same person is considered, namely identity confirmation is carried out, otherwise, authentication fails; meanwhile, whether the safety helmet is worn by a person or not can be obtained according to the detection result, and if the safety helmet is not worn and the identity authentication is abnormal, an alarm is given, so that the safety of the person and the entrance of irrelevant people are ensured.
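The decision rule of claim 8 (identity confirmed only below a distance threshold, alarm on authentication failure or a missing helmet) can be expressed compactly; the function name and the default threshold are illustrative assumptions:

```python
def check_worker(match_distance, helmet_worn, threshold=1.1):
    """Combine identity matching and helmet detection as in step S8.
    match_distance is the best embedding distance (None if no match),
    helmet_worn is the detector's helmet flag.
    Returns (authenticated, alarm)."""
    authenticated = match_distance is not None and match_distance <= threshold
    # alarm whenever authentication fails or the helmet is missing
    alarm = (not authenticated) or (not helmet_worn)
    return authenticated, alarm
```

For example, a registered worker at distance 0.5 wearing a helmet passes silently, while the same worker without a helmet is authenticated but still triggers the alarm.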
CN202211318528.4A 2022-10-26 2022-10-26 Tunnel worker safety helmet detection and face recognition method Pending CN115588165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211318528.4A CN115588165A (en) 2022-10-26 2022-10-26 Tunnel worker safety helmet detection and face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211318528.4A CN115588165A (en) 2022-10-26 2022-10-26 Tunnel worker safety helmet detection and face recognition method

Publications (1)

Publication Number Publication Date
CN115588165A true CN115588165A (en) 2023-01-10

Family

ID=84782082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211318528.4A Pending CN115588165A (en) 2022-10-26 2022-10-26 Tunnel worker safety helmet detection and face recognition method

Country Status (1)

Country Link
CN (1) CN115588165A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092199A (en) * 2023-04-11 2023-05-09 山东易视智能科技有限公司 Employee working state identification method and identification system
CN116092199B (en) * 2023-04-11 2023-07-14 山东易视智能科技有限公司 Employee working state identification method and identification system
CN116895030A (en) * 2023-09-11 2023-10-17 西华大学 Insulator detection method based on target detection algorithm and attention mechanism
CN116895030B (en) * 2023-09-11 2023-11-17 西华大学 Insulator detection method based on target detection algorithm and attention mechanism

Similar Documents

Publication Publication Date Title
CN115588165A (en) Tunnel worker safety helmet detection and face recognition method
CN101833646B (en) In vivo iris detection method
CN105354902B (en) A kind of security management method and system based on recognition of face
US9064145B2 (en) Identity recognition based on multiple feature fusion for an eye image
CN101558431B (en) Face authentication device
CN102521565B (en) Garment identification method and system for low-resolution video
CN111488804A (en) Labor insurance product wearing condition detection and identity identification method based on deep learning
CN108171184A Pedestrian re-identification method based on Siamese networks
CN106355138A (en) Face recognition method based on deep learning and key features extraction
US11594074B2 (en) Continuously evolving and interactive Disguised Face Identification (DFI) with facial key points using ScatterNet Hybrid Deep Learning (SHDL) network
CN108197587A (en) A kind of method that multi-modal recognition of face is carried out by face depth prediction
CN106156688A (en) A kind of dynamic human face recognition methods and system
CN111460962A (en) Mask face recognition method and system
CN101236599A (en) Human face recognition detection device based on multi- video camera information integration
CN103049736A (en) Face identification method based on maximum stable extremum area
CN108629336A (en) Face value calculating method based on human face characteristic point identification
CN108537143B (en) A kind of face identification method and system based on key area aspect ratio pair
CN106599785A (en) Method and device for building human body 3D feature identity information database
CN110135327A (en) A kind of driving behavior recognition methods based on multi-region feature learning model
CN107220598A (en) Iris Texture Classification based on deep learning feature and Fisher Vector encoding models
CN109344909A (en) A kind of personal identification method based on multichannel convolutive neural network
CN112435414A (en) Security monitoring system based on face recognition and monitoring method thereof
Menezes et al. Automatic attendance management system based on deep one-shot learning
CN105184236A (en) Robot-based face identification system
CN110135470A (en) A kind of vehicle characteristics emerging system based on multi-modal vehicle feature recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination