CN113705404A - Face detection method facing embedded hardware - Google Patents

Face detection method facing embedded hardware

Info

Publication number
CN113705404A
CN113705404A (application CN202110952145.1A)
Authority
CN
China
Prior art keywords
model
network model
image
training
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110952145.1A
Other languages
Chinese (zh)
Inventor
杨浩
孙凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110952145.1A priority Critical patent/CN113705404A/en
Publication of CN113705404A publication Critical patent/CN113705404A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face detection method facing embedded hardware, which comprises a model training process: acquiring a data set and a verification set of face images, labeling the faces, and converting the images into a data format conforming to the yolov5 network model; modifying the activation layer function in the yolov5 network model to ReLU; inputting the format-converted data set and verification set images into the yolov5 network model for training to obtain a training result and a model weight file; and a model deployment application process: converting the trained yolov5 network model into the ONNX format, converting the ONNX format into a format supported by the hardware device, quantizing it, and deploying the quantized network model on an edge device; then inputting the image to be predicted into the quantized network model, preprocessing the image, performing inference prediction to obtain an output result, and drawing the output result on the original image. The invention reduces the volume of the model and improves its portability on the premise of ensuring accuracy, has high accuracy, portability and real-time performance, and accelerates the development of face detection systems in the embedded field.

Description

Face detection method facing embedded hardware
Technical Field
The invention relates to a face detection method facing embedded hardware, and belongs to the technical field of face detection.
Background
Neural networks have been highly successful in many fields of computer vision as a typical method of deep learning, and the trace of neural networks can be seen from basic large-scale image classification tasks to advanced computer vision applications. The application of neural networks in the field of face recognition is also many, and face recognition has become a very popular research direction in the field of biometric recognition in recent years.
Among the face detection systems in practical use at present, face detection systems based on a PC platform account for most. However, with the development of electronic technology and the change of social requirements, the hardware processing platform is developed towards miniaturization, low power consumption and portability, and the PC platform has the disadvantages of large volume, high power consumption, poor portability and the like, so that the wide application and popularization of face detection are limited. The embedded hardware platform has the characteristics of small volume, low power consumption, strong portability and the like.
With the development of the technology, the computing speed of the embedded platform is faster and faster, so that the development of a portable face detection system becomes possible. Therefore, it becomes possible to develop an embedded face detection system having a wider application field.
Disclosure of Invention
In order to overcome the defects that most of the existing face detection algorithms are developed aiming at general parallel computing hardware under the server environment and have large volume, high power consumption and poor portability, the invention provides a face detection method facing to embedded hardware, which has higher accuracy and real-time performance and provides a high-efficiency detection method for face detection.
The invention specifically adopts the following technical scheme to solve the technical problems:
a face detection method facing embedded hardware comprises a model training process and a model deployment application process;
wherein the model training process comprises the steps of:
obtaining an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; converting the image data set and the verification set of the labeled human face into a data format conforming to the yolov5 network model;
modifying an activation layer function in the yolov5 network model into a ReLU, and modifying the classification number and name of training;
inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and a model weight file;
the model deployment application process comprises the following steps:
converting the trained yolov5 network model into an ONNX format, and converting the ONNX format network model into a format supported by hardware equipment;
quantizing the network model of the hardware equipment support format, deploying the quantized network model on the edge equipment, and loading the quantized network model;
inputting an image to be predicted into a quantized network model, preprocessing the image to be predicted by using the quantized network model, performing inference prediction on the preprocessed image, loading a model weight file obtained by training, obtaining an output result by using the quantized network model, and expressing a rectangular coordinate corresponding to the output result on an original image to obtain a visual prediction result.
Further, as a preferred technical solution of the present invention, the formula of the ReLU activation function adopted in the model training process is: f(x) = max(0, x).
Further, as a preferred technical scheme of the present invention, the loss function of the yolov5 network model adopted to obtain the model weight file in the model training process is GIoU_loss.
Further, as a preferred technical solution of the present invention, the model training process further includes performing error correction on a training result obtained by training the yolov5 network model.
Further, as a preferred technical scheme of the present invention, the error correction in the model training process employs the non-maximum suppression method NMS to filter out prediction results whose overlap exceeds a threshold.
Further, as a preferred technical solution of the present invention, the yolov5 network model adopted in the model training process comprises:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, for detecting targets of different sizes;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
Further, as a preferred technical solution of the present invention, quantizing the network model in the format supported by the hardware device in the model deployment application process includes:
converting the float32 data type in the network model to the uint8 data type, and calculating the scaling factor S and the translation factor (zero point) Z:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b represent constants, and x represents a variable.
By adopting the technical scheme, the invention can produce the following technical effects:
the invention provides a face detection method facing embedded hardware, wherein the existing yolov5 network model is a universal target detection model, and the invention improves an activation function on the basis to form an improved yolov5 network model, trains aiming at the face detection problem and obtains a special yolov5 network model for face detection; then, format conversion and quantification operation are carried out on the improved yolov5 network model, the volume of the model is reduced on the premise of ensuring the accuracy, the portability of the model is improved, and the model is convenient to deploy on edge equipment; finally, embedded hardware is adopted to accelerate the reasoning of the model, and the method has higher practicability. Experimental results on different data sets show that the face detection method provided by the invention has higher accuracy, portability and real-time performance, accelerates the development of a face detection system in the embedded field, and can provide an efficient detection method for face detection.
Drawings
Fig. 1 is a schematic flow chart of the face detection method facing embedded hardware according to the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the present invention relates to a face detection method for embedded hardware, which mainly includes a model training process and a model deployment application process, and specifically includes the following steps:
step 1, the model training process comprises the following steps:
step 1-1, acquiring an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; and converting the image data set and the verification set of the annotated human face into a data format conforming to the yolov5 network model.
Step 1-2, modify the activation layer function in the yolov5 network model to ReLU, and modify the number and names of the training classes. The role of the activation function in a neural network is primarily to provide the network's nonlinear modeling capability; after a nonlinear activation function is added, the deep neural network acquires layered nonlinear mapping learning capability. The formula of the adopted ReLU activation function is f(x) = max(0, x), which has the following advantages: it overcomes the vanishing-gradient problem and accelerates training.
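The activation described in step 1-2 can be sketched in plain Python; this is a minimal illustration of f(x) = max(0, x) applied elementwise, not code from the patent:

```python
def relu(x):
    """ReLU activation as used in the modified model: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_map(feature_map):
    """Apply ReLU elementwise to a 2-D feature map given as a list of rows."""
    return [[relu(v) for v in row] for row in feature_map]
```

In an actual YOLOv5 model the same effect is obtained by swapping each activation module for a ReLU layer in the network definition.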
And step 1-3, inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and obtain a model weight file.
In the invention, the input of the yolov5 network model is an image, which can be regarded as a matrix, and its output is (x, y, w, h, c), which respectively represent the x and y coordinates of a prediction box in the image coordinate system, the width and height of the rectangle, and the confidence. The output is essentially another matrix, obtained by passing the input matrix through the network formed by the three parts. To ensure that all targets are detected, as many candidate targets as possible are output, and error correction is performed later to remove wrong prediction results.
Further, in the invention, during training, error correction is performed on the training result obtained by training the yolov5 network model. Since the output of the yolov5 network model is essentially a matrix and as many candidate targets as possible are output to ensure that all targets are detected, later-stage error correction is applied to remove wrong prediction results: the yolov5 network model outputs a plurality of candidates, and the non-maximum suppression method NMS is used to screen out prediction results with high overlap.
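The NMS screening described above can be sketched as follows; the corner box format (x1, y1, x2, y2) and the 0.5 overlap threshold are illustrative assumptions, not values fixed by the patent:

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-confidence remaining box
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

A candidate that heavily overlaps a higher-confidence detection is treated as a duplicate of the same face and discarded.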
For training of yolov5 network model, it is necessary to find as many data sets and validation sets as possible as training sample images, and the format can still be represented as (x, y, w, h, 1), where x, y, w, h are true values, and the confidence is set to 1. And putting the data set and the verification set of the image with the marked face into a yolov5 network model for training to obtain a weight file. The trained neural network will give higher weight values to input information that it considers important, while those of less important input information will be relatively smaller. The weight information constitutes a weight file of the required face detection.
The weight parameters are determined by a loss function, which measures the difference between the model output and the sample label value and is adjusted by taking its derivative. The loss function adopted by the yolov5 network model in the invention is GIoU_loss, computed from intersection-based measures. The calculation formula is:

GIoU = IoU - |C - (A ∪ B)| / |C|

GIoU_loss = 1 - GIoU

wherein IoU is the intersection-over-union ratio, i.e. the ratio of the intersection to the union of the predicted box and the real box; A is the real box given by the label, B is the predicted box given by the model, C represents the minimum enclosing rectangle of A and B, and |C - (A ∪ B)| is the area of C not covered by A or B.
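The GIoU loss for axis-aligned boxes can be sketched as below; the corner format (x1, y1, x2, y2) is an illustrative convention, and non-degenerate boxes are assumed:

```python
def giou_loss(pred, truth):
    """GIoU loss, 1 - GIoU, for axis-aligned boxes (x1, y1, x2, y2).

    Assumes both boxes have positive area (union > 0).
    """
    ix1, iy1 = max(pred[0], truth[0]), max(pred[1], truth[1])
    ix2, iy2 = min(pred[2], truth[2]), min(pred[3], truth[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_p + area_t - inter
    iou = inter / union
    # C: smallest rectangle enclosing both boxes
    cx1, cy1 = min(pred[0], truth[0]), min(pred[1], truth[1])
    cx2, cy2 = max(pred[2], truth[2]), max(pred[3], truth[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```

Unlike plain 1 - IoU, this loss still gives a useful gradient when the boxes do not overlap, because the enclosing-rectangle term keeps growing as they move apart.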
Step 2, the model deployment application process comprises the following steps:
and 2-1, in order to adapt to the characteristics of the embedded hardware, the trained model needs to be subjected to conversion and quantification operation. Firstly, a trained yolov5 Network model is converted into an ONNX format, namely an Open Neural Network Exchange (ONNX) format; the method is a standard for expressing a deep learning model, can transfer the model among different frameworks, and converts the network model in the ONNX format into a format supported by hardware equipment.
In the conversion process, the large-size max-pooling layer is by default replaced by a plurality of small-size max-pooling layers, which can accelerate the inference speed of the model. In addition, the transpose layer at the end is removed during conversion, which facilitates deployment and inference of the model. The transpose layer rearranges the dimensions of the input in a given pattern.
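The equivalence behind replacing one large max-pooling layer with stacked small ones can be checked with a small 1-D sketch. Stride-1 pooling with "same" padding is assumed here (real SPP/SPPF layers are 2-D, but the argument is identical per axis): two stacked kernel-3 pools see the same 5-wide neighborhood as one kernel-5 pool.

```python
NEG_INF = float("-inf")

def maxpool1d(xs, k):
    """Stride-1 max pooling with 'same' output length (padded with -inf)."""
    pad = k // 2
    padded = [NEG_INF] * pad + list(xs) + [NEG_INF] * pad
    return [max(padded[i:i + k]) for i in range(len(xs))]
```

Because max is associative, `maxpool1d(maxpool1d(x, 3), 3)` produces exactly `maxpool1d(x, 5)`, so the substitution changes the computation layout without changing the result.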
And 2-2, quantizing the network model in the format supported by the hardware equipment, deploying the quantized network model on the edge equipment, and loading the quantized network model.
The quantization adopts asymmetric uint8 quantization; on the premise of ensuring precision, model quantization can effectively reduce the size of the model, reduce storage space, and accelerate inference.
Converting the float32 data type in the network model to the uint8 data type, the scaling factor S and the translation factor (zero point) Z are calculated as:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b are constants, and x is a variable.
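The quantization formulas above can be sketched in a few lines; the uint8 range 0..255 follows the text, while the helper names are illustrative:

```python
def clamp(x, a, b):
    """clamp(x, a, b) = a if x < a, b if x > b, else x."""
    return max(a, min(b, x))

def quant_params(x_min, x_max, q_min=0, q_max=255):
    """Scale S and zero point Z for asymmetric uint8 quantization."""
    s = (x_max - x_min) / (q_max - q_min)
    z = int(round(q_max - x_max / s))
    return s, z

def quantize(r, s, z):
    """float32 value -> uint8 code: Q = clamp(round(R / S + Z), 0, 255)."""
    return clamp(int(round(r / s + z)), 0, 255)

def dequantize(q, s, z):
    """uint8 code -> approximate float32 value: R ~= S * (Q - Z)."""
    return s * (q - z)
```

Round-tripping a value through quantize/dequantize introduces at most about one quantization step S of error, which is the precision trade-off the text refers to.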
Then, the quantized network model is deployed on the edge device, and the quantized network model is loaded.
Step 2-3, first, the image to be predicted is input into the quantized network model, which preprocesses it; the preprocessing mainly comprises adjusting the input image size, channel order and similar properties to match the input format of the model. During model loading, the running time and node name of each layer can be set and queried, which facilitates result analysis.
Then, the quantized network model performs inference prediction on the preprocessed image. During prediction, the trained model weight file is loaded, the quantized network model produces an output result (x, y, w, h, c), and the rectangle coordinates corresponding to the output are drawn on the original image to obtain a visualized prediction result; the displayed result includes the region where the face is located on the original image, the drawn face box, and the confidence.
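Drawing the output on the original image requires mapping (x, y, w, h) to pixel corners. A minimal sketch, assuming normalized center-format coordinates as commonly produced by YOLO-family models (the patent does not fix this convention):

```python
def to_pixel_rect(x, y, w, h, img_w, img_h):
    """Map a normalized center-format prediction (x, y, w, h) to integer
    pixel corners (x1, y1, x2, y2) on the original image."""
    x1 = int((x - w / 2) * img_w)
    y1 = int((y - h / 2) * img_h)
    x2 = int((x + w / 2) * img_w)
    y2 = int((y + h / 2) * img_h)
    # keep the rectangle inside the image bounds
    return max(0, x1), max(0, y1), min(img_w - 1, x2), min(img_h - 1, y2)
```

The resulting corners can then be passed to any drawing routine, together with the confidence c as the box label.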
The yolov5 network model adopted in the invention mainly comprises three components:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, so that targets of different sizes can be better detected;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
Therefore, the Backbone and the Neck in the model are mainly used for extracting image features, which are the features of the faces in the input image prediction boxes; the Head section is used for feature detection and class prediction.
The yolov5 network model can be trained in the following way: tens of thousands of face images are used, the total number of classes is 1, and the class name is face. The image size is set to 416×416. The training data are split 9:1 into a training set and a verification set, the yolov5 network model is loaded, and training is performed. After the trained yolov5 network model and its weight file are loaded, a prediction result can be given for an image containing a face. This embodiment has high accuracy and real-time performance and practical significance.
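The 9:1 split mentioned above can be sketched as follows; the fixed seed and file names are illustrative choices for reproducibility, not requirements of the patent:

```python
import random

def split_dataset(paths, train_ratio=0.9, seed=0):
    """Shuffle image paths and split them into training and verification sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(paths)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The two lists can then be written to the train/val entries of the YOLOv5 dataset configuration.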
In conclusion, the invention improves the activation function of the yolov5 network model and trains it for face detection; then performs format conversion and quantization operations on the trained yolov5 network model, reducing the volume of the model on the premise of ensuring accuracy and improving its portability; and finally adopts embedded hardware to accelerate model inference, giving the method high practicability. Experimental results on different data sets show that the face detection method provided by the invention has high accuracy, portability and real-time performance and accelerates the development of face detection systems in the embedded field.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (7)

1. A face detection method facing embedded hardware is characterized by comprising a model training process and a model deployment application process;
wherein the model training process comprises the steps of:
obtaining an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; converting the image data set and the verification set of the labeled human face into a data format conforming to the yolov5 network model;
modifying an activation layer function in the yolov5 network model into a ReLU, and modifying the classification number and name of training;
inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and a model weight file;
the model deployment application process comprises the following steps:
converting the trained yolov5 network model into an ONNX format, and converting the ONNX format network model into a format supported by hardware equipment;
quantizing the network model of the hardware equipment support format, deploying the quantized network model on the edge equipment, and loading the quantized network model;
inputting an image to be predicted into a quantized network model, preprocessing the input image to be predicted by using the quantized network model, performing inference prediction on the preprocessed image, loading a model weight file obtained by training, obtaining an output result by using the quantized network model, and expressing a rectangular coordinate corresponding to the output result on an original image to obtain a visual prediction result.
2. The embedded hardware-oriented face detection method according to claim 1, wherein the formula of the ReLU activation function adopted in the model training process is: f(x) = max(0, x).
3. The embedded hardware-oriented face detection method according to claim 1, wherein the loss function of the yolov5 network model adopted to obtain the model weight file in the model training process is GIoU_loss.
4. The embedded hardware-oriented face detection method according to claim 1, wherein the model training process further comprises performing error correction on a training result obtained by training a yolov5 network model.
5. The embedded hardware-oriented face detection method according to claim 1, wherein the error correction in the model training process employs the non-maximum suppression method NMS to filter out prediction results whose overlap exceeds a threshold.
6. The embedded hardware-oriented face detection method according to claim 1, wherein the yolov5 network model adopted in the model training process comprises:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, for detecting targets of different sizes;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
7. The embedded hardware-oriented face detection method according to claim 1, wherein quantizing the network model in the format supported by the hardware device in the model deployment application process includes:
converting the float32 data type in the network model to the uint8 data type, and calculating the scaling factor S and the translation factor (zero point) Z:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b represent constants, and x represents a variable.
CN202110952145.1A 2021-08-18 2021-08-18 Face detection method facing embedded hardware Pending CN113705404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110952145.1A CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110952145.1A CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Publications (1)

Publication Number Publication Date
CN113705404A (en) 2021-11-26

Family

ID=78653379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110952145.1A Pending CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Country Status (1)

Country Link
CN (1) CN113705404A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153389A (en) * 2023-04-21 2023-05-23 之江实验室 Method, apparatus, device and storage medium for quantifying protein language model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488989A (en) * 2020-04-16 2020-08-04 济南浪潮高新科技投资发展有限公司 Method and model for realizing lightweight target detection at mobile phone end
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112233090A (en) * 2020-10-15 2021-01-15 浙江工商大学 Film flaw detection method based on improved attention mechanism
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
US20210056293A1 (en) * 2019-08-19 2021-02-25 Zhuhai Eeasy Technology Co., Ltd. Face detection method
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210056293A1 (en) * 2019-08-19 2021-02-25 Zhuhai Eeasy Technology Co., Ltd. Face detection method
CN111488989A (en) * 2020-04-16 2020-08-04 济南浪潮高新科技投资发展有限公司 Method and model for realizing lightweight target detection at mobile phone end
CN112233090A (en) * 2020-10-15 2021-01-15 浙江工商大学 Film flaw detection method based on improved attention mechanism
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
3D Vision Workshop (3D视觉工坊): "The YOLOv5x6 model is here! ONNX deployment and inference on CPU also supported", pages 1 - 4, Retrieved from the Internet <URL: Tencent Cloud developer community (tencent.com)> *
Zhang Tong; Tan Nanlin; Bao Chenming: "Real-time infrared pedestrian detection method applied to embedded platforms", Laser & Infrared, no. 02 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153389A (en) * 2023-04-21 2023-05-23 之江实验室 Method, apparatus, device and storage medium for quantifying protein language model

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
CN112528977B (en) Target detection method, target detection device, electronic equipment and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN110866471A (en) Face image quality evaluation method and device, computer readable medium and communication terminal
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN111027563A (en) Text detection method, device and recognition system
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
CN112287820A (en) Face detection neural network, face detection neural network training method, face detection method and storage medium
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN110222607B (en) Method, device and system for detecting key points of human face
CN107301643B Salient object detection method based on robust sparse representation and Laplacian regularization terms
CN115457531A (en) Method and device for recognizing text
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN110163205A (en) Image processing method, device, medium and calculating equipment
WO2023284608A1 (en) Character recognition model generating method and apparatus, computer device, and storage medium
Nida et al. Instructor activity recognition through deep spatiotemporal features and feedforward extreme learning machines
CN110765882A (en) Video tag determination method, device, server and storage medium
CN113706562B (en) Image segmentation method, device and system and cell segmentation method
CN111127360A (en) Gray level image transfer learning method based on automatic encoder
He et al. Context-aware mathematical expression recognition: An end-to-end framework and a benchmark
CN114863554A (en) Sign language recognition system and method based on deep learning model
CN113705404A (en) Face detection method facing embedded hardware
CN113361384A (en) Face recognition model compression method, device, medium, and computer program product
CN116301914A (en) Convolutional neural network deployment method based on GAP8 microprocessor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination