CN113705404A - Face detection method facing embedded hardware - Google Patents

Face detection method facing embedded hardware

Info

Publication number
CN113705404A
CN113705404A (application CN202110952145.1A)
Authority
CN
China
Prior art keywords
model
network model
image
training
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110952145.1A
Other languages
Chinese (zh)
Inventor
杨浩
孙凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110952145.1A priority Critical patent/CN113705404A/en
Publication of CN113705404A publication Critical patent/CN113705404A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face detection method facing embedded hardware, which comprises a model training process: acquiring a data set and a verification set of face images, labeling the faces, and converting the images into a data format conforming to the yolov5 network model; modifying the activation layer function in the yolov5 network model to ReLU; inputting the format-converted data set and verification set images into the yolov5 network model for training to obtain a training result and a model weight file; and a model deployment application process: converting the trained yolov5 network model into the ONNX format, converting the ONNX format into a format supported by the hardware device, quantizing it, and deploying the quantized network model on an edge device; then inputting the image to be predicted into the quantized network model, preprocessing the image, performing inference prediction to obtain an output result, and drawing the output result on the original image. The invention reduces the volume of the model and improves its portability on the premise of ensuring accuracy, has high accuracy, portability and real-time performance, and accelerates the development of face detection systems in the embedded field.

Description

Face detection method facing embedded hardware
Technical Field
The invention relates to a face detection method facing embedded hardware, and belongs to the technical field of face detection.
Background
Neural networks have been highly successful in many fields of computer vision as a typical method of deep learning, and the trace of neural networks can be seen from basic large-scale image classification tasks to advanced computer vision applications. The application of neural networks in the field of face recognition is also many, and face recognition has become a very popular research direction in the field of biometric recognition in recent years.
Among the face detection systems in practical use at present, face detection systems based on a PC platform account for most. However, with the development of electronic technology and the change of social requirements, the hardware processing platform is developed towards miniaturization, low power consumption and portability, and the PC platform has the disadvantages of large volume, high power consumption, poor portability and the like, so that the wide application and popularization of face detection are limited. The embedded hardware platform has the characteristics of small volume, low power consumption, strong portability and the like.
With the development of the technology, the computing speed of the embedded platform is faster and faster, so that the development of a portable face detection system becomes possible. Therefore, it becomes possible to develop an embedded face detection system having a wider application field.
Disclosure of Invention
In order to overcome the defects that most of the existing face detection algorithms are developed aiming at general parallel computing hardware under the server environment and have large volume, high power consumption and poor portability, the invention provides a face detection method facing to embedded hardware, which has higher accuracy and real-time performance and provides a high-efficiency detection method for face detection.
The invention specifically adopts the following technical scheme to solve the technical problems:
a face detection method facing embedded hardware comprises a model training process and a model deployment application process;
wherein the model training process comprises the steps of:
obtaining an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; converting the image data set and the verification set of the labeled human face into a data format conforming to the yolov5 network model;
modifying an activation layer function in the yolov5 network model into a ReLU, and modifying the classification number and name of training;
inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and a model weight file;
the model deployment application process comprises the following steps:
converting the trained yolov5 network model into an ONNX format, and converting the ONNX format network model into a format supported by hardware equipment;
quantizing the network model of the hardware equipment support format, deploying the quantized network model on the edge equipment, and loading the quantized network model;
inputting an image to be predicted into a quantized network model, preprocessing the image to be predicted by using the quantized network model, performing inference prediction on the preprocessed image, loading a model weight file obtained by training, obtaining an output result by using the quantized network model, and expressing a rectangular coordinate corresponding to the output result on an original image to obtain a visual prediction result.
Further, as a preferred technical solution of the present invention, the formula of the ReLU activation function adopted in the model training process is: f(x) = max(0, x).
Further, as a preferred technical scheme of the present invention, the loss function of the yolov5 network model adopted to obtain the model weight file in the model training process is GIoU_loss.
Further, as a preferred technical solution of the present invention, the model training process further includes performing error correction on a training result obtained by training the yolov5 network model.
Further, as a preferred technical scheme of the present invention, the error correction in the model training process employs the non-maximum suppression method NMS to filter out prediction results whose overlap exceeds a threshold.
Further, as a preferred technical solution of the present invention, the yolov5 network model adopted in the model training process comprises:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, for detecting targets of different sizes;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
Further, as a preferred technical solution of the present invention, quantizing the network model in the format supported by the hardware device in the model deployment application process includes:
converting the float32 data type in the network model to the uint8 data type, and calculating the scaling factor S and the translation factor (zero point) Z:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b represent constants, and x represents a variable.
By adopting the technical scheme, the invention can produce the following technical effects:
the invention provides a face detection method facing embedded hardware, wherein the existing yolov5 network model is a universal target detection model, and the invention improves an activation function on the basis to form an improved yolov5 network model, trains aiming at the face detection problem and obtains a special yolov5 network model for face detection; then, format conversion and quantification operation are carried out on the improved yolov5 network model, the volume of the model is reduced on the premise of ensuring the accuracy, the portability of the model is improved, and the model is convenient to deploy on edge equipment; finally, embedded hardware is adopted to accelerate the reasoning of the model, and the method has higher practicability. Experimental results on different data sets show that the face detection method provided by the invention has higher accuracy, portability and real-time performance, accelerates the development of a face detection system in the embedded field, and can provide an efficient detection method for face detection.
Drawings
Fig. 1 is a schematic flow chart of the face detection method facing embedded hardware according to the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the present invention relates to a face detection method for embedded hardware, which mainly includes a model training process and a model deployment application process, and specifically includes the following steps:
step 1, the model training process comprises the following steps:
step 1-1, acquiring an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; and converting the image data set and the verification set of the annotated human face into a data format conforming to the yolov5 network model.
Step 1-2, modify the activation layer function in the yolov5 network model to ReLU, and modify the number and names of the training classes. The role of the activation function in a neural network is primarily to provide the network's nonlinear modeling capability; after a nonlinear activation function is added, the deep neural network acquires layered nonlinear mapping learning capability. The formula of the adopted ReLU activation function is f(x) = max(0, x), which has the following advantages: it overcomes the vanishing-gradient problem and accelerates training.
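The activation described in step 1-2 can be sketched in plain Python; this is a minimal illustration of f(x) = max(0, x) applied elementwise, not code from the patent:

```python
def relu(x):
    """ReLU activation as used in the modified model: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_map(feature_map):
    """Apply ReLU elementwise to a 2-D feature map given as a list of rows."""
    return [[relu(v) for v in row] for row in feature_map]
```

In an actual YOLOv5 model the same effect is obtained by swapping each activation module for a ReLU layer in the network definition.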
And step 1-3, inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and obtain a model weight file.
In the invention, the input of the yolov5 network model is an image, which can be regarded as a matrix, and its output is (x, y, w, h, c), which respectively represent the x and y coordinates of a prediction box in the image coordinate system, the width and height of the rectangle, and the confidence. The output is essentially another matrix, obtained by passing the input matrix through the network formed by the three parts. To ensure that all targets are detected, as many candidate targets as possible are output, and error correction is performed later to remove wrong prediction results.
Further, in the invention, during training, error correction is performed on the training result obtained by training the yolov5 network model. Since the output of the yolov5 network model is essentially a matrix and as many candidate targets as possible are output to ensure that all targets are detected, later-stage error correction is applied to remove wrong prediction results: the yolov5 network model outputs a plurality of candidates, and the non-maximum suppression method NMS is used to screen out prediction results with high overlap.
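The NMS screening described above can be sketched as follows; the corner box format (x1, y1, x2, y2) and the 0.5 overlap threshold are illustrative assumptions, not values fixed by the patent:

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-confidence remaining box
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

A candidate that heavily overlaps a higher-confidence detection is treated as a duplicate of the same face and discarded.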
For training of yolov5 network model, it is necessary to find as many data sets and validation sets as possible as training sample images, and the format can still be represented as (x, y, w, h, 1), where x, y, w, h are true values, and the confidence is set to 1. And putting the data set and the verification set of the image with the marked face into a yolov5 network model for training to obtain a weight file. The trained neural network will give higher weight values to input information that it considers important, while those of less important input information will be relatively smaller. The weight information constitutes a weight file of the required face detection.
The weight parameters are determined by a loss function, which measures the difference between the model output and the sample label value and is adjusted by taking its derivative. The loss function adopted by the yolov5 network model in the invention is GIoU_loss, computed from intersection-based measures. The calculation formula is:

GIoU = IoU - |C - (A ∪ B)| / |C|

GIoU_loss = 1 - GIoU

wherein IoU is the intersection-over-union ratio, i.e. the ratio of the intersection to the union of the predicted box and the real box; A is the real box given by the label, B is the predicted box given by the model, C represents the minimum enclosing rectangle of A and B, and |C - (A ∪ B)| is the area of C not covered by A or B.
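The GIoU loss for axis-aligned boxes can be sketched as below; the corner format (x1, y1, x2, y2) is an illustrative convention, and non-degenerate boxes are assumed:

```python
def giou_loss(pred, truth):
    """GIoU loss, 1 - GIoU, for axis-aligned boxes (x1, y1, x2, y2).

    Assumes both boxes have positive area (union > 0).
    """
    ix1, iy1 = max(pred[0], truth[0]), max(pred[1], truth[1])
    ix2, iy2 = min(pred[2], truth[2]), min(pred[3], truth[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_p + area_t - inter
    iou = inter / union
    # C: smallest rectangle enclosing both boxes
    cx1, cy1 = min(pred[0], truth[0]), min(pred[1], truth[1])
    cx2, cy2 = max(pred[2], truth[2]), max(pred[3], truth[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```

Unlike plain 1 - IoU, this loss still gives a useful gradient when the boxes do not overlap, because the enclosing-rectangle term keeps growing as they move apart.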
Step 2, the model deployment application process comprises the following steps:
and 2-1, in order to adapt to the characteristics of the embedded hardware, the trained model needs to be subjected to conversion and quantification operation. Firstly, a trained yolov5 Network model is converted into an ONNX format, namely an Open Neural Network Exchange (ONNX) format; the method is a standard for expressing a deep learning model, can transfer the model among different frameworks, and converts the network model in the ONNX format into a format supported by hardware equipment.
In the conversion process, the large-size max-pooling layer is by default replaced by a plurality of small-size max-pooling layers, which can accelerate the inference speed of the model. In addition, the transpose layer at the end is removed during conversion, which facilitates deployment and inference of the model. The transpose layer rearranges the dimensions of the input in a given pattern.
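The equivalence behind replacing one large max-pooling layer with stacked small ones can be checked with a small 1-D sketch. Stride-1 pooling with "same" padding is assumed here (real SPP/SPPF layers are 2-D, but the argument is identical per axis): two stacked kernel-3 pools see the same 5-wide neighborhood as one kernel-5 pool.

```python
NEG_INF = float("-inf")

def maxpool1d(xs, k):
    """Stride-1 max pooling with 'same' output length (padded with -inf)."""
    pad = k // 2
    padded = [NEG_INF] * pad + list(xs) + [NEG_INF] * pad
    return [max(padded[i:i + k]) for i in range(len(xs))]
```

Because max is associative, `maxpool1d(maxpool1d(x, 3), 3)` produces exactly `maxpool1d(x, 5)`, so the substitution changes the computation layout without changing the result.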
And 2-2, quantizing the network model in the format supported by the hardware equipment, deploying the quantized network model on the edge equipment, and loading the quantized network model.
The quantization adopts asymmetric uint8 quantization; on the premise of ensuring precision, model quantization can effectively reduce the size of the model, reduce storage space, and accelerate inference.
Converting the float32 data type in the network model to the uint8 data type, the scaling factor S and the translation factor (zero point) Z are calculated as:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b are constants, and x is a variable.
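The quantization formulas above can be sketched in a few lines; the uint8 range 0..255 follows the text, while the helper names are illustrative:

```python
def clamp(x, a, b):
    """clamp(x, a, b) = a if x < a, b if x > b, else x."""
    return max(a, min(b, x))

def quant_params(x_min, x_max, q_min=0, q_max=255):
    """Scale S and zero point Z for asymmetric uint8 quantization."""
    s = (x_max - x_min) / (q_max - q_min)
    z = int(round(q_max - x_max / s))
    return s, z

def quantize(r, s, z):
    """float32 value -> uint8 code: Q = clamp(round(R / S + Z), 0, 255)."""
    return clamp(int(round(r / s + z)), 0, 255)

def dequantize(q, s, z):
    """uint8 code -> approximate float32 value: R ~= S * (Q - Z)."""
    return s * (q - z)
```

Round-tripping a value through quantize/dequantize introduces at most about one quantization step S of error, which is the precision trade-off the text refers to.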
Then, the quantized network model is deployed on the edge device, and the quantized network model is loaded.
Step 2-3, first, the image to be predicted is input into the quantized network model, which preprocesses it; the preprocessing mainly comprises adjusting the input image size, channel order and similar properties to match the input format of the model. During model loading, the running time and node name of each layer can be set and queried, which facilitates result analysis.
Then, the quantized network model performs inference prediction on the preprocessed image. During prediction, the trained model weight file is loaded, the quantized network model produces an output result (x, y, w, h, c), and the rectangle coordinates corresponding to the output are drawn on the original image to obtain a visualized prediction result; the displayed result includes the region where the face is located on the original image, the drawn face box, and the confidence.
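Drawing the output on the original image requires mapping (x, y, w, h) to pixel corners. A minimal sketch, assuming normalized center-format coordinates as commonly produced by YOLO-family models (the patent does not fix this convention):

```python
def to_pixel_rect(x, y, w, h, img_w, img_h):
    """Map a normalized center-format prediction (x, y, w, h) to integer
    pixel corners (x1, y1, x2, y2) on the original image."""
    x1 = int((x - w / 2) * img_w)
    y1 = int((y - h / 2) * img_h)
    x2 = int((x + w / 2) * img_w)
    y2 = int((y + h / 2) * img_h)
    # keep the rectangle inside the image bounds
    return max(0, x1), max(0, y1), min(img_w - 1, x2), min(img_h - 1, y2)
```

The resulting corners can then be passed to any drawing routine, together with the confidence c as the box label.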
The yolov5 network model adopted in the invention mainly comprises three components:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, so that targets of different sizes can be better detected;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
Therefore, the Backbone and the Neck in the model are mainly used for extracting image features, which are the features of the faces in the input image prediction boxes; the Head section is used for feature detection and class prediction.
The yolov5 network model can be trained in the following way: tens of thousands of face images are used, the total number of classes is 1, and the class name is face. The image size is set to 416×416. The training data are split 9:1 into a training set and a verification set, the yolov5 network model is loaded, and training is performed. After the trained yolov5 network model and its weight file are loaded, a prediction result can be given for an image containing a face. This embodiment has high accuracy and real-time performance and practical significance.
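The 9:1 split mentioned above can be sketched as follows; the fixed seed and file names are illustrative choices for reproducibility, not requirements of the patent:

```python
import random

def split_dataset(paths, train_ratio=0.9, seed=0):
    """Shuffle image paths and split them into training and verification sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(paths)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The two lists can then be written to the train/val entries of the YOLOv5 dataset configuration.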
In conclusion, the invention improves the activation function of the yolov5 network model and trains it for face detection; then performs format conversion and quantization operations on the trained yolov5 network model, reducing the volume of the model on the premise of ensuring accuracy and improving its portability; and finally adopts embedded hardware to accelerate model inference, giving the method high practicability. Experimental results on different data sets show that the face detection method provided by the invention has high accuracy, portability and real-time performance and accelerates the development of face detection systems in the embedded field.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (7)

1. A face detection method facing embedded hardware is characterized by comprising a model training process and a model deployment application process;
wherein the model training process comprises the steps of:
obtaining an image containing a human face, dividing the image into a data set and a verification set according to a proportion, and carrying out human face labeling on the images of the data set and the verification set; converting the image data set and the verification set of the labeled human face into a data format conforming to the yolov5 network model;
modifying an activation layer function in the yolov5 network model into a ReLU, and modifying the classification number and name of training;
inputting the data set after format conversion and the image of the verification set into a yolov5 network model for training to obtain a training result and a model weight file;
the model deployment application process comprises the following steps:
converting the trained yolov5 network model into an ONNX format, and converting the ONNX format network model into a format supported by hardware equipment;
quantizing the network model of the hardware equipment support format, deploying the quantized network model on the edge equipment, and loading the quantized network model;
inputting an image to be predicted into a quantized network model, preprocessing the input image to be predicted by using the quantized network model, performing inference prediction on the preprocessed image, loading a model weight file obtained by training, obtaining an output result by using the quantized network model, and expressing a rectangular coordinate corresponding to the output result on an original image to obtain a visual prediction result.
2. The embedded hardware-oriented face detection method according to claim 1, wherein the formula of the ReLU activation function adopted in the model training process is: f(x) = max(0, x).
3. The embedded hardware-oriented face detection method according to claim 1, wherein the loss function of the yolov5 network model adopted to obtain the model weight file in the model training process is GIoU_loss.
4. The embedded hardware-oriented face detection method according to claim 1, wherein the model training process further comprises performing error correction on a training result obtained by training a yolov5 network model.
5. The embedded hardware-oriented face detection method according to claim 1, wherein the error correction in the model training process employs the non-maximum suppression method NMS to filter out prediction results whose overlap exceeds a threshold.
6. The embedded hardware-oriented face detection method according to claim 1, wherein the yolov5 network model adopted in the model training process comprises:
Backbone section: processes the input image with a convolutional neural network to generate a deep feature map, abstracting and extracting the features of the image;
Neck section: outputs the extracted image features at different sizes, for detecting targets of different sizes;
Head section: predicts on the output image features of different sizes, generating bounding boxes and predicting category information.
7. The embedded hardware-oriented face detection method according to claim 1, wherein quantizing the network model in the format supported by the hardware device in the model deployment application process includes:
converting the float32 data type in the network model to the uint8 data type, and calculating the scaling factor S and the translation factor (zero point) Z:

S = (X_max - X_min) / (Q_max - Q_min)

Z = round(Q_max - X_max / S)

Q = clamp(round(R / S + Z), 0, 255)

wherein X_max and X_min represent the maximum and minimum values of the floating-point numbers, Q_max and Q_min are the maximum and minimum uint8 codes (255 and 0), R represents a floating-point number, Q represents the quantized uint8 data, and round represents rounding; the function clamp is:

clamp(x, a, b) = a if x < a; x if a <= x <= b; b if x > b

wherein the clamp function limits a value to a given interval, a and b represent constants, and x represents a variable.
CN202110952145.1A 2021-08-18 2021-08-18 Face detection method facing embedded hardware Pending CN113705404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110952145.1A CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110952145.1A CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Publications (1)

Publication Number Publication Date
CN113705404A (en) 2021-11-26

Family

ID=78653379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110952145.1A Pending CN113705404A (en) 2021-08-18 2021-08-18 Face detection method facing embedded hardware

Country Status (1)

Country Link
CN (1) CN113705404A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153389A (en) * 2023-04-21 2023-05-23 之江实验室 Method, apparatus, device and storage medium for quantifying protein language model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488989A (en) * 2020-04-16 2020-08-04 济南浪潮高新科技投资发展有限公司 Method and model for realizing lightweight target detection at mobile phone end
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112233090A (en) * 2020-10-15 2021-01-15 浙江工商大学 Film flaw detection method based on improved attention mechanism
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
US20210056293A1 (en) * 2019-08-19 2021-02-25 Zhuhai Eeasy Technology Co., Ltd. Face detection method
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210056293A1 (en) * 2019-08-19 2021-02-25 Zhuhai Eeasy Technology Co., Ltd. Face detection method
CN111488989A (en) * 2020-04-16 2020-08-04 济南浪潮高新科技投资发展有限公司 Method and model for realizing lightweight target detection at mobile phone end
CN112233090A (en) * 2020-10-15 2021-01-15 浙江工商大学 Film flaw detection method based on improved attention mechanism
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
3D Vision Workshop (3D视觉工坊): "The YOLOv5x6 model is here! ONNX deployment and inference on CPU also supported", pages 1 - 4, Retrieved from the Internet <URL: Tencent Cloud developer community (tencent.com)> *
Zhang Tong; Tan Nanlin; Bao Chenming: "Real-time infrared pedestrian detection method applied to embedded platforms", Laser & Infrared, no. 02 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153389A (en) * 2023-04-21 2023-05-23 之江实验室 Method, apparatus, device and storage medium for quantifying protein language model

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
CN112528977B (en) Target detection method, target detection device, electronic equipment and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN110866471A (en) Face image quality evaluation method and device, computer readable medium and communication terminal
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN111027563A (en) Text detection method, device and recognition system
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
CN112287820A (en) Face detection neural network, face detection neural network training method, face detection method and storage medium
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN110222607B (en) Method, device and system for detecting key points of human face
CN107301643B Salient object detection method based on robust sparse representation and Laplacian regularization terms
CN115457531A (en) Method and device for recognizing text
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN110163205A (en) Image processing method, device, medium and calculating equipment
WO2023284608A1 (en) Character recognition model generating method and apparatus, computer device, and storage medium
Nida et al. Instructor activity recognition through deep spatiotemporal features and feedforward extreme learning machines
CN110765882A (en) Video tag determination method, device, server and storage medium
CN113706562B (en) Image segmentation method, device and system and cell segmentation method
CN111127360A (en) Gray level image transfer learning method based on automatic encoder
He et al. Context-aware mathematical expression recognition: An end-to-end framework and a benchmark
CN114863554A (en) Sign language recognition system and method based on deep learning model
CN113705404A (en) Face detection method facing embedded hardware
CN113361384A (en) Face recognition model compression method, device, medium, and computer program product
CN116301914A (en) Convolutional neural network deployment method based on GAP8 microprocessor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination