CN114565967B - Worker card face detection method, terminal and storage medium - Google Patents


Info

Publication number
CN114565967B
CN114565967B (application CN202210456936.XA)
Authority
CN
China
Prior art keywords
face
detection model
card
worker
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210456936.XA
Other languages
Chinese (zh)
Other versions
CN114565967A (en)
Inventor
王鹏亮
陈曦
钟国海
黄梓珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Richstone Technology Co ltd
Original Assignee
Guangzhou Richstone Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Richstone Technology Co ltd filed Critical Guangzhou Richstone Technology Co ltd
Priority to CN202210456936.XA priority Critical patent/CN114565967B/en
Publication of CN114565967A publication Critical patent/CN114565967A/en
Application granted granted Critical
Publication of CN114565967B publication Critical patent/CN114565967B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a work card face detection method, a terminal and a storage medium. The method comprises the following steps. S101: controlling a camera to capture images. S102: inputting the image into the face detection model within the combined face and work card detection model; after the face detection model outputs the identification information and coordinate information of the face, the coordinate information is passed to the work card detection model to obtain a work card detection result. The face detection model is obtained by training a Retinaface network whose loss function has been replaced, and the work card detection model is obtained by training a YOLO v3 network whose layers and parameters have been modified. The invention offers high detection speed and efficiency, requires no manual inspection, avoids missed and false detections, and reduces labor cost.

Description

Worker card face detection method, terminal and storage medium
Technical Field
The invention relates to the field of image detection, and in particular to a work card face detection method, a terminal and a storage medium.
Background
Information security has always been an important topic for society as a whole. In the security field, and especially in smart security for homes and campuses, human biometric features are central; among these, the face is one of the most important feature parameters, so face recognition and face comparison technologies are widely used in smart security. In addition, relatively formal enterprises that value information security generally require employees to wear work cards, which display the wearer's position and identity information and also embody the enterprise's management culture. From the employee's perspective, the work card is a sign that the enterprise confirms their identity; from the enterprise's perspective, work cards project the enterprise's image. Having employees wear work cards helps improve their professional pride and sense of responsibility, supports standardized management, and makes it easier to identify and supervise personnel.
Therefore, to protect the information security and standardized management of a company or a confidential area, both identity checks and work card checks must be carried out on people entering and leaving it. In the past both checks were performed by manual inspection, which is time-consuming and labor-intensive and suffers from missed detections.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a work card face detection method, a terminal and a storage medium. Work card and face detection can be carried out automatically by a combined face and work card detection model: detection is fast and efficient, no manual inspection is needed, missed and false detections are avoided, and labor cost is reduced.
To solve the above problems, the invention adopts the following technical solution. A work card face detection method comprises the following steps. S101: controlling a camera to capture images. S102: inputting the image into the face detection model within the combined face and work card detection model; the face detection model detects a face and outputs identification information and coordinate information of the face; the coordinate information is passed to the work card detection model within the combined model, and the work card detection result of that model is obtained. The face detection model is obtained by training a Retinaface network whose loss function has been replaced, and the work card detection model is obtained by training a YOLO v3 network whose layers and parameters have been modified.
Further, the step of obtaining the combined face and work card detection model specifically includes the following. S201: acquiring training samples for work card and face detection from captured images, and labeling the work cards in the training samples to form a data set. S202: modifying the layers and parameters of the YOLO v3 network according to the data set, and training the modified YOLO v3 network with the data set to generate the work card detection model. S203: replacing the loss function of the Retinaface network, training the modified Retinaface network with the data set to generate the face detection model, and combining the work card detection model and the face detection model into the combined face and work card detection model.
Further, the step of acquiring training samples for work card and face detection from captured images specifically includes: screening candidate images from the captured images through a preset screening model, and selecting a preset number of images from the candidates as training samples according to an input instruction, wherein the training samples comprise positive sample images in which a work card is worn and negative sample images in which no work card is worn.
Further, the step of controlling the camera to capture images specifically includes: judging whether a person to be detected is present through a Dlib model integrated in the camera, and extracting a half-length (bust) image of the person from the captured image once a person is detected.
Further, the step of modifying the layers and parameters of the YOLO v3 network according to the data set specifically includes: reducing the number of layers of the YOLO v3 network according to the data set to form a shallow network, and adjusting the window sizes used by the convolutional layers of the YOLO v3 network.
Further, the step of training the modified YOLO v3 network with the data set to generate the work card detection model specifically includes: inputting the training samples in the data set into the shallow network for training, and optimizing the parameters of the work card detection model according to the training results.
Further, the work card detection model comprises 12 convolutional layers, 6 max pooling layers, 2 yolo layers, 2 route (connection) layers and 1 upsampling layer.
Further, the step of replacing the loss function of the Retinaface network specifically includes: and replacing the loss function of the Retinaface network with an Arcface loss function.
Based on the same inventive concept, the invention further provides an intelligent terminal, which comprises a processor and a memory, wherein the processor is in communication connection with the memory, the memory stores a computer program, and the processor executes the worker card face detection method through the computer program.
Based on the same inventive concept, the invention also proposes a computer-readable storage medium storing program data for executing the work card face detection method as described above.
Compared with the prior art, the invention has the following beneficial effects. Face detection and work card detection are performed respectively by the face detection model and the work card detection model within the combined face and work card detection model; the layers and parameters of the YOLO v3 network used to train the work card detection model are modified, and the loss function of the Retinaface network used to train the face detection model is replaced. The resulting combined model can carry out work card and face detection automatically: it is fast and efficient, requires no manual inspection, avoids missed and false detections, and reduces labor cost.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for detecting a face of a worker's card according to the present invention;
FIG. 2 is a flowchart of an embodiment of obtaining a face card detection model in the card face detection method of the present invention;
FIG. 3 is a flowchart of another embodiment of obtaining a face card detection model in the card face detection method of the present invention;
FIG. 4 is a block diagram of an embodiment of an intelligent terminal according to the invention;
fig. 5 is a block diagram of an embodiment of a computer-readable storage medium of the present invention.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It should be noted that the various embodiments of the present disclosure, described and illustrated in the figures herein generally, may be combined with each other without conflict, and that the structural components or functional modules therein may be arranged and designed in a variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 to 3, wherein fig. 1 is a flowchart of an embodiment of a worker's card face detection method according to the present invention; FIG. 2 is a flowchart of an embodiment of obtaining a face card detection model in the card face detection method of the present invention; FIG. 3 is a flowchart of another embodiment of obtaining a face card detection model in the card face detection method of the present invention. The method for detecting the face of the card of the invention is explained in detail with reference to fig. 1 to 3.
Both face detection and work card detection are forms of object recognition, but they differ fundamentally: in face detection both the inter-class differences (between different people) and the intra-class differences (between images of the same person) are small, making it much harder than work card detection. Common methods for face detection/comparison are Dlib, MTCNN and Retinaface. Dlib uses a traditional CNN, so its model is inevitably bulky; its training/prediction time is slower than the other algorithms, its model is somewhat large, and its accuracy is lower, so it is not well suited to industrial application. MTCNN's main sub-networks are P-Net, R-Net and O-Net, all fully convolutional; MTCNN therefore has high accuracy but low speed, and places high demands on hardware for both training and prediction. Finally, Retinaface's backbone is ResNet, which is excellent for classification problems, but its loss layer uses Softmax Loss, a relatively traditional loss function that only considers whether samples are correctly classified. Softmax still has considerable room for improvement at enlarging the inter-class distance between samples of different classes and reducing the intra-class distance between samples of the same class. This matters especially for face detection and comparison, where inter-class and intra-class differences are both small yet the classes must be cleanly separated, otherwise prediction fails; in this respect the commonly used Softmax loss function is not an adequate choice.
In the field of face detection, a loss function known for accurate prediction is ArcFace, which was developed specifically for face recognition, so the invention adopts this loss function for face detection/comparison.
In this embodiment, the work card face detection method comprises the following steps:
s101: and controlling the camera to shoot images.
In this embodiment, the step of controlling the camera to capture images specifically includes: judging whether a person to be detected is present through a Dlib model integrated in the camera; once a person is detected, extracting a half-length image of the person from the captured image and inputting the half-length image into the combined face and work card detection model for detection. In other embodiments, person detection and bust extraction may be performed by one or more other face recognition models, such as FaceNet, DeepID, DeepFace or Face++.
The half-length image is an image of the upper half of the person to be detected; face recognition and work card recognition are performed on this half-length image.
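The bust-extraction step can be sketched as a simple geometric expansion of the detected face box. The expansion factors below are illustrative assumptions for the sketch, not values given in the patent:

```python
def bust_crop(face_box, img_w, img_h, side_scale=1.5, down_scale=3.0):
    """Expand a detected face box (x, y, w, h) into an upper-body crop.

    The scale factors are hypothetical: the idea is that the work card
    hangs below the face, so the crop must extend sideways and well
    below the chin while staying inside the image bounds.
    """
    x, y, w, h = face_box
    cx = x + w / 2
    new_w = w * (1 + 2 * side_scale)   # widen symmetrically around the face centre
    new_h = h * (1 + down_scale)       # extend downward past the chest
    left = max(0, int(cx - new_w / 2))
    top = max(0, int(y))               # keep the face at the top of the crop
    right = min(img_w, int(cx + new_w / 2))
    bottom = min(img_h, int(y + new_h))
    return left, top, right, bottom
```

The resulting crop is what would be passed on to the combined detection model, so both the face and the hanging work card fall inside one region.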
The invention combines face recognition, face comparison and work card detection, and integrates the models into the camera's RTSP video stream, so that face and work card detection can be carried out on a given area in real time while better protecting privacy and security.
S102: inputting the half-length image into the face detection model within the combined face and work card detection model; after the face detection model detects a face and outputs identification information and coordinate information of the face, the coordinate information is passed to the work card detection model within the combined model, and the work card detection result of that model is obtained. The face detection model is obtained by training a Retinaface network whose loss function has been replaced, and the work card detection model is obtained by training a YOLO v3 network whose layers and parameters have been modified.
In this embodiment, the work card detection model determines an area where the work card is located according to the coordinate information, and performs work card detection on the area, where the work card detection result includes a work card wearing result of a person. The steps of obtaining the face card detection model specifically comprise:
s201: and acquiring a training sample for detecting the face of the worker's card through the shot image, and labeling the worker's card in the training sample to form a data set.
In this embodiment, the step of acquiring training samples for work card and face detection from captured images specifically includes: screening candidate images from the captured images through a preset screening model, and selecting a preset number of images from the candidates as training samples according to an input instruction, wherein the training samples comprise positive sample images in which a work card is worn and negative sample images in which no work card is worn.
Specifically, the screening model is a Dlib model integrated in the camera, and the images captured by the camera are screened through the Dlib model.
In this embodiment, images are captured by a camera. When capturing, the camera is first called to check whether the call succeeds; if it does, images of people are captured; if it does not, the network cable and power supply connected to the camera, and the camera's RTSP address, are checked.
The candidate images include images with faces and images without faces; the preset number is 1000 positive sample images and 100 negative sample images.
In a specific embodiment, images containing a human body are selected from many frames of video stream data acquired by the camera, and a coarse batch detection is performed through the Dlib library, screening out 10000 face images and 2000 non-face images; from these, 1000 well-formed positive samples (work card worn) and 100 well-formed negative samples (no work card worn) are then manually selected.
In this embodiment, the step of labeling the work cards in the training samples to form the data set further includes: preprocessing the training samples, and extracting half-length images containing the face from the training samples through the screening model. Preprocessing includes image operations such as denoising and geometric correction. The Dlib model extracts the half-length image from each training sample according to the position of the face.
In a specific embodiment, the data set is a VOC data set comprising a training set, a prediction set and a validation set. Target labeling is performed on each training sample: the position of the work card in the image is annotated, and the position information is written to an xml file. Negative samples (images without a work card) are handled similarly, using the same xml structure as positive samples, except that the position information is empty. The data set is then divided using script code, and three txt files containing image address information and xml annotation file information are produced according to the actual composition of the training, prediction and validation sets. Finally, the folders are merged and integrated into a VOC data set for work card recognition, completing the preparation of the data set.
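The data-set-splitting step above can be sketched as follows. The 80/10/10 split ratios, file names and path layout are assumptions for illustration, not details taken from the patent:

```python
import os
import random

def split_voc(image_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    """Write train/val/test .txt lists for a VOC-style work-card data set.

    Each line pairs an image path with its matching xml annotation path,
    mirroring the 'three txt files' described in the text. Ratios and
    naming are hypothetical.
    """
    names = sorted(n for n in os.listdir(image_dir) if n.endswith(".jpg"))
    random.Random(seed).shuffle(names)
    n = len(names)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    splits = {
        "train.txt": names[:n_train],
        "val.txt": names[n_train:n_train + n_val],
        "test.txt": names[n_train + n_val:],
    }
    os.makedirs(out_dir, exist_ok=True)
    for fname, subset in splits.items():
        with open(os.path.join(out_dir, fname), "w") as f:
            for img in subset:
                xml = img.replace(".jpg", ".xml")  # annotation beside the image
                f.write(f"{image_dir}/{img} {image_dir}/{xml}\n")
    return {k: len(v) for k, v in splits.items()}
```

Negative samples carry an xml file with empty position information, so they flow through the same split unchanged.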
S202: and correcting the hierarchy and parameters of the YOLO v3 network according to the data set, and training the corrected YOLO v3 network by using the data set to generate a worker-brand detection model.
In this embodiment, the step of modifying the layers and parameters of the YOLO v3 network according to the data set specifically includes: reducing the number of layers of the YOLO v3 network according to the data set to form a shallow network, and adjusting the sliding window (anchor) sizes of the YOLO v3 network.
Specifically, after the data set is prepared, training for work card detection can begin, and a model is needed before training. The conventional YOLO v3 model comprises 76 conv layers, 23 shortcut layers, 3 yolo layers, 4 route layers and 2 upsample layers — a very deep model. Depth is an advantage for large targets, because the deeper the model, the richer the features it can extract from the image, and large targets do not easily suffer from vanishing gradients during training. For a small target like the work card, however, depth is unfriendly: with such a deep model, vanishing gradients are likely during training and the work card's features can disappear entirely by the final layers. The invention therefore replaces the deep network of conventional YOLO v3 with a shallow network for work card detection.
In a specific embodiment, the work card detection model obtained by training the YOLO v3 network includes 12 conv (convolutional) layers, 6 maxpooling (max pooling) layers, 2 yolo (YOLO v3 detection) layers, 2 route (connection) layers and 1 upsample (upsampling) layer. The structure of the modified work card detection model is shown in Table 1.
Table 1: structure of the modified work card detection model (presented as an image in the original publication).
In this embodiment, the step of training the modified YOLO v3 network with the data set to generate the work card detection model specifically includes: inputting the training samples in the data set into the shallow network for training, and optimizing the parameters of the work card detection model according to the training results. The work card detection model is iteratively optimized according to the model's precision and recall.
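The iteration criterion just mentioned — optimizing according to precision and recall — can be illustrated with a minimal sketch; the 0.9 target threshold is a hypothetical value, not one stated in the patent:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall of work-card detections from raw counts:
    tp = correct detections, fp = false alarms, fn = missed cards."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def keep_training(tp, fp, fn, target=0.9):
    """Hypothetical stopping rule: keep iterating until both metrics
    reach the target on the validation split."""
    p, r = precision_recall(tp, fp, fn)
    return p < target or r < target
```

Recall matters especially here, since a missed work card is exactly the kind of error manual inspection was prone to.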
In a specific embodiment, after the model is built, its parameters need to be adjusted according to the labeling of the data set. The most important adjustment is to the anchors (sliding windows), i.e. the window sizes used during training: choosing suitable anchors improves training accuracy. Nine anchor boxes of different sizes are selected according to the widths and heights of the actual label boxes in the data set; the selection principle is k-means, and the values obtained by k-means are substituted into the model parameters for training. Other parameters in the model are also adjusted according to how the training iterations actually behave. The overall adjustments are as follows:
1.1 subdivisions (the per-batch subdivision count) is changed to 16, so that each batch covers more training images and the captured features are more comprehensive.
1.2 max_batches (the training upper limit) is set to 6000 to prevent overfitting.
1.3 steps (the iteration counts at which the learning rate is reduced by a factor of 10) are adjusted in proportion to max_batches, setting steps to 4800, 5400.
1.4 filters (the number of output feature maps) is set to 18.
1.5 It was observed that the height of the images in the new data set is about 1/3 of the width, so the input resolution is adjusted to 960 × 320.
1.6 The anchor sizes are adjusted to (97,100), (85,159), (113,165), (99,202), (143,163), (127,209), (113,259), (159,211), (147,287) based on the k-means distribution of the samples.
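The k-means anchor selection in step 1.6 can be sketched as follows. Note that this sketch clusters (width, height) pairs with plain Euclidean distance and a deterministic area-spread initialization for simplicity, whereas YOLO implementations typically cluster with a 1 − IoU distance, so treat it as illustrative only:

```python
def kmeans_anchors(boxes, k=9, iters=50):
    """Cluster (width, height) pairs of labelled work-card boxes into
    k anchor sizes, returned sorted by area (small to large), matching
    the anchor ordering YOLO configs expect."""
    # deterministic init: spread the initial centres across the area range
    srt = sorted(boxes, key=lambda wh: wh[0] * wh[1])
    step = max(1, (len(srt) - 1) // max(1, k - 1))
    centers = [srt[min(i * step, len(srt) - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k),
                    key=lambda j: (w - centers[j][0]) ** 2 + (h - centers[j][1]) ** 2)
            clusters[i].append((w, h))
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]                      # keep empty clusters' old centre
            for i, c in enumerate(clusters)
        ]
    return sorted(centers, key=lambda wh: wh[0] * wh[1])
```

Running this over the widths and heights of the xml label boxes yields nine (w, h) pairs analogous to the values listed in 1.6.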
S203: and replacing a loss function of the Retinaface network, training the modified Retinaface network through a data set to generate a face detection model, and combining the worklist detection model and the face detection model to generate the face worklist detection model.
In this embodiment, the step of replacing the loss function of the Retinaface network specifically includes: and replacing the loss function of the Retinaface network with an Arcface loss function.
In a specific embodiment, the Retinaface network uses two enhanced feature-extraction structures: an FPN (Feature Pyramid Network) and an SSH (Single Stage Headless) detection module. The Retinaface network is one of the most effective neural networks currently used for face recognition and face detection; its only weakness is its loss function. Its loss layer uses Softmax Loss, a relatively traditional loss function that only considers whether samples are correctly classified, and which still has considerable room for improvement at enlarging the inter-class distance between samples of different classes and reducing the intra-class distance between samples of the same class. The ArcFace loss function is therefore chosen to replace the original Softmax Loss. Softmax Loss can separate classes, but it does not reflect the difference between different classes or the closeness within the same class. Compared with other nonlinear loss functions, the decision boundary of ArcFace Loss is always stable, so it trains well: it achieves higher accuracy and better convergence on many different kinds of training data sets without being combined with other loss functions. ArcFace Loss is therefore incorporated into the Retinaface network, to good effect.
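The additive angular margin that distinguishes ArcFace from Softmax Loss can be shown in a few lines. The scale s = 64 and margin m = 0.5 below are the ArcFace paper's usual defaults, not values stated in the patent:

```python
import math

def arcface_logits(cos_theta, labels, s=64.0, m=0.5):
    """Apply the ArcFace additive angular margin to cosine similarities.

    cos_theta: per-sample lists of cosine similarity to each class centre.
    For the true class of each sample, the angle is increased by the
    margin m before re-taking the cosine; all logits are then scaled by s.
    """
    out = []
    for row, y in zip(cos_theta, labels):
        new_row = list(row)
        theta = math.acos(max(-1.0, min(1.0, row[y])))
        new_row[y] = math.cos(theta + m)   # penalise the target-class angle
        out.append([s * v for v in new_row])
    return out
```

Because cos(θ + m) < cos(θ), the target logit is always reduced, so the network must pull same-class embeddings closer to their class centre to compensate — exactly the intra-class/inter-class improvement over Softmax Loss described above.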
The step of training the modified Retinaface network with a data set to generate the face detection model specifically includes: inputting the LFW face data set into the Retinaface network for face detection training to form the face detection model, which then performs face recognition and face detection.
In this embodiment, after the face card detection model is obtained, the face data of the employee in the area to be detected (e.g., a company) is put into the face library, and face detection and corresponding card detection are performed according to the face data in the face library.
In this embodiment, the face detection model within the combined face and work card detection model performs face detection and face recognition on faces in the video stream transmitted in real time, based on the face data in the face library. After the face coordinate information is obtained, the coordinates are passed to the work card recognition model for work card detection, and the system determines who the person is and whether they are wearing a work card according to the face recognition information and the work card detection result. If the person is unknown, "unknown person" and that person's card-wearing status are displayed. The video stream transmitted by the camera is an RTSP video stream.
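The run-time flow just described can be summarized as a short Python sketch; `face_model`, `card_model` and `face_db` are hypothetical stand-ins for the trained Retinaface model, the modified YOLO v3 model and the employee face library, with interfaces invented for illustration:

```python
def detect_face_and_card(frame, face_model, card_model, face_db):
    """End-to-end sketch of the detection flow: face detection first,
    then work card detection driven by the face coordinates."""
    results = []
    for embedding, box in face_model(frame):       # each face: (identity embedding, box)
        name = face_db.get(embedding, "unknown")   # face comparison against the library
        wearing = card_model(frame, box)           # card detection in the region near the face
        results.append({"name": name, "wearing_card": wearing})
    return results
```

Unknown faces fall through to the "unknown" label while still receiving a card-wearing verdict, matching the behaviour described above.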
Beneficial effects: in the work card face detection method of the invention, after a person to be detected is photographed by the camera, the person's half-length image is extracted, and face detection and work card detection are performed respectively by the face detection model and the work card detection model within the combined face and work card detection model. The layers and parameters of the YOLO v3 network used to train the work card detection model are modified, and the loss function of the Retinaface network used to train the face detection model is replaced. The resulting combined model can carry out work card and face detection automatically: it is fast and efficient, requires no manual inspection, avoids missed and false detections, and reduces labor cost.
Based on the same inventive concept, the present invention further provides an intelligent terminal, please refer to fig. 4, wherein fig. 4 is a structural diagram of an embodiment of the intelligent terminal of the present invention. The intelligent terminal of the present invention is described in detail with reference to fig. 4.
In this embodiment, the intelligent terminal includes a processor and a memory, the processor is in communication connection with the memory, the memory stores a computer program, and the processor executes the method for detecting the face of the card as described in the above embodiment through the computer program.
In some embodiments, the memory may include, but is not limited to, high speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Based on the same inventive concept, the present invention further provides a computer-readable storage medium, please refer to fig. 5, fig. 5 is a structural diagram of an embodiment of the computer-readable storage medium of the present invention, and the computer-readable storage medium of the present invention is described with reference to fig. 5.
In the present embodiment, the computer-readable storage medium stores program data used to execute the worker-card face detection method described in the above embodiments.
The computer-readable storage medium may include, but is not limited to, floppy disks, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be an article of manufacture not yet connected to a computer device, or a component used by a computer device to which it has been connected.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A worker-card face detection method, characterized by comprising the following steps:
s101: judging whether a person to be detected exists or not through a Dlib model integrated in the camera, and extracting a half-length diagram of the person to be detected in the image after the person to be detected is detected;
s102: inputting a half-length picture into a face detection model in a face workcard detection model, outputting coordinate information to a workcard detection model in the face workcard detection model after the face detection model detects a face and outputs identification information and coordinate information of the face, determining an area where a workcard is located according to the coordinate information by the workcard detection model, detecting the workcard in the area, wherein the workcard detection result comprises a workcard wearing result of a person, and acquiring the person identification information and whether the workcard is worn according to the face identification information and the workcard detection result, wherein the face detection model is obtained by Retinaace network training replacing a loss function, and the workcard detection model is obtained by YOLO v3 network training for correcting layers and parameters;
the steps of obtaining the human face card detection model specifically comprise:
s201: acquiring a training sample for face detection of a worker's card through a shot image, and labeling the worker's card in the training sample to form a data set;
s202: reducing the number of layers of the YOLO v3 network according to the data set to form a shallow network, screening sliding windows of 9 YOLO v3 networks in different sizes based on the length and the width of an actual labeling frame of the data set, and training the corrected YOLO v3 network by using the data set and the screened sliding windows to generate a worker-brand detection model, wherein the worker-brand detection model comprises 12 convolutional layers, 6 maximum pooling layers, 2 YOLO layers, 2 connecting layers and 1 upsampling layer;
s203: replace the loss function of Retinaface network, through Retinaface network generation face detection model after the data set training is revised combines worker tablet detection model, face detection model face worker tablet detection model.
2. The worker-card face detection method according to claim 1, wherein the step of obtaining training samples for worker-card face detection from captured images specifically comprises:
screening primary images from the captured images through a preset screening model, and selecting a preset number of images from the primary images as training samples according to an input instruction, wherein the training samples comprise positive sample images of persons wearing the worker card and negative sample images of persons not wearing the worker card.
3. The worker-card face detection method according to claim 1, wherein the step of training the modified YOLO v3 network with the data set to generate the worker-card detection model specifically comprises:
inputting the training samples in the data set into the shallow network for training, and optimizing the parameters of the worker-card detection model according to the training result.
4. The worker-card face detection method according to claim 1, wherein the step of replacing the loss function of the Retinaface network specifically comprises:
replacing the loss function of the Retinaface network with the Arcface loss function.
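The Arcface loss substituted in claim 4 adds an angular margin to the target-class angle before a scaled softmax cross-entropy, which tightens the face-identity clusters the detection model relies on. A minimal scalar sketch; the scale `s` and margin `m` defaults are common choices from the ArcFace literature, not values specified in this patent:

```python
import math

def arcface_logit(cos_theta, is_target, s=64.0, m=0.5):
    """Apply the ArcFace additive angular margin to one cosine similarity.

    cos_theta is the cosine between the L2-normalized embedding and a
    class weight vector; the margin m is added only to the target class.
    """
    if is_target:
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        return s * math.cos(theta + m)
    return s * cos_theta

def arcface_loss(cosines, target, s=64.0, m=0.5):
    """Cross-entropy over the margin-adjusted, scaled cosine logits."""
    logits = [arcface_logit(c, i == target, s, m) for i, c in enumerate(cosines)]
    mx = max(logits)                         # subtract max for stability
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[target] / sum(exps))
```

Because the margin shrinks the target logit, the loss stays positive unless the target angle beats every other class by more than `m`, which is what forces the tighter per-identity clustering.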
5. An intelligent terminal, characterized in that the intelligent terminal comprises a processor and a memory that are communicatively connected, the memory stores a computer program, and the processor executes, through the computer program, the worker-card face detection method according to any one of claims 1 to 4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program data for executing the worker-card face detection method according to any one of claims 1 to 4.
CN202210456936.XA 2022-04-28 2022-04-28 Worker card face detection method, terminal and storage medium Active CN114565967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210456936.XA CN114565967B (en) 2022-04-28 2022-04-28 Worker card face detection method, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN114565967A CN114565967A (en) 2022-05-31
CN114565967B true CN114565967B (en) 2022-08-30

Family

ID=81721474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210456936.XA Active CN114565967B (en) 2022-04-28 2022-04-28 Worker card face detection method, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114565967B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020098686A1 (en) * 2018-11-16 2020-05-22 广州市百果园信息技术有限公司 Face detection model training method and apparatus, and face key point detection method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199975A (en) * 2019-07-08 2021-01-08 中国移动通信集团浙江有限公司 Identity verification method and device based on human face features
CN110874577B (en) * 2019-11-15 2022-04-15 杭州东信北邮信息技术有限公司 Automatic verification method of certificate photo based on deep learning
CN117994837A (en) * 2020-01-23 2024-05-07 支付宝实验室(新加坡)有限公司 Face image detection method, device and equipment
CN112364827B (en) * 2020-11-30 2023-11-10 腾讯科技(深圳)有限公司 Face recognition method, device, computer equipment and storage medium
CN113628181A (en) * 2021-08-02 2021-11-09 深圳前海微众银行股份有限公司 Image processing method, image processing device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant