CN109359556B

CN109359556B - Face detection method and system based on low-power-consumption embedded platform

Info

Publication number: CN109359556B
Application number: CN201811110258.1A
Authority: CN
Inventors: 游忍; 刘明华; 周春燕
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2018-09-21
Filing date: 2018-09-21
Publication date: 2021-08-06
Anticipated expiration: 2038-09-21
Also published as: CN109359556A

Abstract

The invention discloses a face detection method based on a low-power-consumption embedded platform, comprising the following steps: A. acquiring input image information; B. performing data augmentation, data normalization and PCA whitening on the acquired image information; C. optimizing a deep cascade network face detection algorithm, training a face detection model, and detecting the processed image information to obtain face information; D. receiving the face information output by the face detection algorithm, and outputting image information including the face information. The method addresses the problems that existing face detection algorithms have very large models, require a large amount of calculation, and are difficult to run on embedded hardware. After the algorithm is improved, the resulting face detection algorithm can run on an embedded platform with little loss of precision and can be applied to embedded devices such as air conditioners and televisions, improving the user experience, with the advantages of low power consumption, high speed and high precision.

Description

Face detection method and system based on low-power-consumption embedded platform

Technical Field

The invention relates to the technical field of computer vision, in particular to a face detection method and a face detection system based on a low-power-consumption embedded platform.

Background

With the development of computer technology, face detection technology gradually becomes practical, and is widely applied in the fields of video monitoring, access control systems, security, traffic systems and the like.

At present, hardware platforms for face detection include PCs and embedded hardware systems. A PC is large, costly, power-hungry and inconvenient to carry, which greatly limits where face detection can be applied. As the technology develops, embedded platforms are becoming faster, smaller, cheaper and less power-hungry, so portable face detection systems now have sufficient hardware support, and a face detection system can run on a low-power-consumption embedded device.

Currently, the mainstream embedded hardware platforms include ARM (Advanced RISC Machines), DSP (digital signal processor), and the DaVinci dual-core platform. DSP processors have powerful data processing and computing capabilities, but have limited control over peripherals, imperfect support for user interfaces, and are expensive. The DaVinci dual-core platform has strong peripheral control and data processing capabilities, but is complex to work with, difficult to develop for, and expensive. The ARM processor is strong in human-machine interaction, device control and similar tasks, and with its rapid development its data processing capability keeps increasing while its price keeps falling.

Compared with DSP and DaVinci platforms, ARM offers low cost, low power consumption and high operation speed, and can meet the requirements of a face detection algorithm. ARM-series processors are also widely used in intelligent terminals, so a face detection system built on them can more easily reach ordinary consumers. Developing a low-power-consumption face detection system on an ARM-series platform therefore has important application value, and the invention is made against this technical background.

Generally, to meet the accuracy requirements of face detection, the detection algorithm adopted is highly complex and needs a certain level of hardware support. The harsh hardware constraints of a low-power-consumption embedded environment mean that accuracy and practicability cannot both be ensured simply by adopting such a high-complexity algorithm. For example, in embedded environments such as small unmanned aerial vehicles and intelligent security video monitoring, processor and memory resources are limited, storage space is small, and few hardware resources are available, so the operating requirements of higher-complexity algorithms cannot be met directly. Faces therefore need to be detected through reasonable system design and optimization that takes into account the limited hardware resources of the low-power-consumption embedded environment.

Disclosure of Invention

Based on the above technical background and application requirements, the invention aims to provide a face detection method and a face detection system based on a low-power-consumption embedded platform.

In order to achieve the technical effects, the invention adopts the following technical scheme:

a face detection method based on a low-power-consumption embedded platform comprises the following steps:

A. acquiring input image information;

B. carrying out data augmentation, data normalization and PCA whitening processing on the acquired image information;

wherein data augmentation increases the number of images and improves the generalization capability of the model, while data normalization and PCA whitening eliminate irrelevant information in the images, enhance the detectability of the relevant information, and simplify the data to the greatest extent;

C. optimizing a deep cascade network face detection algorithm MTCNN, training a face detection model, and detecting the processed image information to obtain face information;

D. receiving the face information output by the face detection algorithm, and outputting image information including the face information.

Further, step A specifically includes: converting the photoelectric signals of the image acquired by the image acquisition device into digital signals through sampling and quantization.
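As a rough illustration of sampling and quantization, the sketch below evaluates a continuous intensity function on a pixel grid and maps each sample to an 8-bit level. The function `analog_intensity`, the grid size and the bit depth are hypothetical stand-ins chosen for illustration; the patent does not specify the sensor signal or the quantization depth.

```python
import numpy as np

def analog_intensity(x, y):
    """Hypothetical continuous light intensity in [0, 1] (stand-in for a sensor signal)."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

def sample_and_quantize(height, width, bits=8):
    """Sample the continuous signal on a grid and quantize it (assumes bits <= 8)."""
    # Sampling: evaluate the continuous signal at discrete pixel positions.
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    samples = analog_intensity(xs, ys)
    # Quantization: map each sample to one of 2**bits integer levels.
    levels = 2 ** bits - 1
    return np.round(samples * levels).astype(np.uint8)

image = sample_and_quantize(4, 6)
```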

Further, the data augmentation processing in step B comprises at least processing the image by the geometric transformations of translation, rotation, partial blackening and shearing, and performing Gaussian filtering and channel transformation operations on the image;

the data normalization processing in step B specifically comprises: readjusting the value of each dimension of the data so that the final data vector falls in the [0,1] interval;

the PCA whitening processing in step B includes: first performing PCA (principal component analysis) dimension reduction on the data so that the correlation among the features is low; then dividing each feature by the square root of the corresponding eigenvalue of the data covariance matrix so that all features have the same variance.
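The normalization and PCA whitening described above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function names, the small `eps` term added for numerical stability, the choice of `n_components`, and the random test data are all assumptions.

```python
import numpy as np

def min_max_normalize(data):
    """Rescale each dimension (column) of `data` to the [0, 1] interval."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant dimensions
    return (data - lo) / span

def pca_whiten(data, n_components=None, eps=1e-8):
    """PCA-whiten `data` of shape (n_samples, n_features)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs                     # PCA projection: decorrelated features
    return projected / np.sqrt(eigvals + eps)          # divide by sqrt(eigenvalue): equal variance

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 4))
raw[:, 1] += 0.5 * raw[:, 0]          # introduce some correlation between features
normalized = min_max_normalize(raw)
white = pca_whiten(normalized, n_components=3)
```

After whitening, the sample covariance of `white` is approximately the identity matrix, which is what "all features have the same variance" and "low correlation" amount to.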

Further, the step C of optimizing the MTCNN algorithm specifically includes the following steps:

S1, removing the max pooling layer in the original MTCNN structure of the deep cascade network face detection algorithm, and changing the convolution stride of the convolution layer in front of the pooling layer to the stride of the max pooling layer, so as to reduce the amount of calculation;

S2, adding a Batch Normalization layer after each convolution layer, so that the distribution of the input data of each hidden layer of the network becomes a standard normal distribution with a mean of 0 and a variance of 1;

in a neural network, the data distribution of each hidden layer changes during training, so the input to an activation layer easily falls in the saturation region of the activation function. The gradient there is close to 0, which causes the vanishing-gradient problem: the model converges slowly or not at all. With a Batch Normalization layer, the data is more likely to fall in the approximately linear region of the activation function, where the gradient is large, so model convergence is accelerated and model precision is improved;

S3, converting the ordinary convolution (Conv) into an interleaved group convolution (IGCV): dividing the input channels of the input feature map of each convolution layer into several groups with the same number of channels in each group, and then performing a convolution operation on each group;

then recombining the channels of the feature maps obtained from the group convolution, splicing the recombined feature maps along the channel dimension, and performing a second group convolution with a 1x1 convolution kernel, thereby reducing the amount of calculation while maintaining model accuracy.
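The interleaved group convolution of step S3 can be sketched as a naive NumPy forward pass: a first group convolution, a channel recombination, and a second group convolution with a 1x1 kernel. This is an illustrative sketch only; the function names, the "valid" padding, stride 1, the tensor shapes and the random weights are assumptions, and the loops are written for readability rather than speed.

```python
import numpy as np

def group_conv2d(x, weights):
    """Grouped 'valid' convolution (cross-correlation, stride 1).

    x: (C_in, H, W); weights: (groups, C_out_per_group, C_in_per_group, k, k).
    """
    groups, out_pg, in_pg, k, _ = weights.shape
    c_in, h, w = x.shape
    assert c_in == groups * in_pg, "channels must split evenly into groups"
    out = np.zeros((groups * out_pg, h - k + 1, w - k + 1))
    for g in range(groups):
        xg = x[g * in_pg:(g + 1) * in_pg]          # channels belonging to this group
        for o in range(out_pg):
            for i in range(h - k + 1):
                for j in range(w - k + 1):
                    out[g * out_pg + o, i, j] = np.sum(
                        xg[:, i:i + k, j:j + k] * weights[g, o])
    return out

def interleave_channels(x, groups):
    """Reorder channels so each new group takes one channel from every old group."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def igcv_block(x, w_primary, w_secondary):
    y = group_conv2d(x, w_primary)                    # first group convolution (k x k)
    y = interleave_channels(y, w_primary.shape[0])    # recombine channels across groups
    return group_conv2d(y, w_secondary)               # second group convolution (1 x 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8, 8))                  # 6 input channels, 8x8 feature map
w_primary = rng.normal(size=(2, 3, 3, 5, 5))    # 2 groups of 3 channels, 5x5 kernels
w_secondary = rng.normal(size=(3, 2, 2, 1, 1))  # 3 groups of 2 channels, 1x1 kernels
out = igcv_block(x, w_primary, w_secondary)
```

Because the second convolution mixes channels drawn from different first-stage groups, every output channel ends up depending on all input channels even though neither convolution is dense.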

Meanwhile, the invention also discloses a face detection system based on the low-power-consumption embedded platform, which comprises an image input module, an image processing module, a face detection module and an image output module, wherein the input end of the image processing module is connected with the output end of the image input module, the output end of the image processing module is connected with the input end of the face detection module, and the output end of the face detection module is connected with the input end of the image output module; wherein,

the image processing module is used for performing data augmentation, data normalization and PCA whitening on the received video image information and transmitting the processed video image information to the face detection module; the face detection module is used for optimizing the deep cascade network face detection algorithm MTCNN, training a face detection model to detect the video image information input by the image processing module, and transmitting the obtained face information to the image output module; and the image output module is used for receiving the face information output by the face detection module and outputting image information including the face information;

that is, in the face detection system based on the low-power-consumption embedded platform of the invention, after the image input module acquires the image information, the image processing module performs data augmentation, normalization and PCA whitening on the picture data from the image input module, eliminating irrelevant information in the image and simplifying the data to the greatest extent. The face detection module then performs face detection on the image. The deep cascade network face detection algorithm is optimized by removing pooling, adding Batch Normalization, and using interleaved group convolution (IGCV), which reduces the redundancy of model parameters and the amount of calculation while preserving precision, so that the algorithm can run on a low-power-consumption embedded platform with a precision loss within 1 percent. Finally, the image output module receives the face information output from the face detection module and outputs the image information including the face information.

Further, the image input module is used for converting photoelectric signals of the image acquired by the image acquisition device into digital signals through sampling and quantization so as to acquire image information.

Furthermore, when the image processing module performs data augmentation, it at least processes the image by the geometric transformations of translation, rotation, partial blackening and shearing, and performs Gaussian filtering and channel transformation operations on the image;

when the image processing module performs data normalization, the value of each dimension of the data is readjusted so that the final data vector falls in the [0,1] interval;

when the image processing module performs PCA whitening, PCA dimension reduction is first performed on the data so that the correlation among the features is low; each feature is then divided by the square root of the corresponding eigenvalue of the data covariance matrix so that all features have the same variance.
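Several of the augmentation operations listed above can be sketched with NumPy alone. The sketch covers only a subset (translation, a 90-degree rotation, partial blackening, Gaussian filtering and a channel swap); the function names, shift amounts, patch size, kernel radius and channel order are hypothetical parameters, not values from the patent.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift the image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def black_out(img, top, left, size):
    """Set a square patch to zero (partial blackening)."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian filter applied along rows and columns of every channel."""
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out)
    return out

def swap_channels(img, order=(2, 1, 0)):
    """Channel transformation: reorder the colour channels."""
    return img[..., list(order)]

def augment(img):
    """Return several augmented variants of one (H, W, 3) image."""
    return [translate(img, 2, -3), np.rot90(img), black_out(img, 4, 4, 8),
            gaussian_blur(img), swap_channels(img)]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
variants = augment(img)
```

Each call to `augment` turns one training image into five, which is the sense in which augmentation increases the number of images.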

Further, the face detection module includes an optimized deep cascade network face detection unit for optimizing the deep cascade network face detection algorithm MTCNN, and the optimization performed by this unit specifically includes:

removing the max pooling layer in the original MTCNN structure of the deep cascade network face detection algorithm, and changing the convolution stride of the convolution layer in front of the pooling layer to the stride of the max pooling layer;

adding a Batch Normalization layer after each convolution layer, so that the distribution of the input data of each hidden layer of the network becomes a standard normal distribution with a mean of 0 and a variance of 1;

converting the ordinary convolution (Conv) into an interleaved group convolution (IGCV), and dividing the input channels of the input feature map of each convolution layer into several groups with the same number of channels in each group;

performing a convolution operation on each group, and recombining the channels of the feature maps obtained from the group convolution;

and splicing the recombined feature maps along the channel dimension, and performing a second group convolution with a 1x1 convolution kernel.
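The first two optimizations in this list (replacing pooling with a strided convolution, and Batch Normalization) can be illustrated with a short sketch. The 12x12 input, 3x3 convolution and 2x2 pooling sizes are assumptions chosen for illustration, and `batch_norm` below uses per-batch statistics as in training; an inference-time layer would normally use stored running averages instead.

```python
import numpy as np

def conv_out_size(n, kernel, stride):
    """Output length of a 'valid' convolution or pooling along one axis."""
    return (n - kernel) // stride + 1

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of feature maps (N, C, H, W) per channel, then scale and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Original layout: 3x3 convolution (stride 1) followed by 2x2 max pooling (stride 2).
pooled = conv_out_size(conv_out_size(12, 3, 1), 2, 2)
# Modified layout: the pooling layer is removed and the convolution takes its stride.
strided = conv_out_size(12, 3, 2)

rng = np.random.default_rng(0)
features = 3.0 * rng.normal(size=(8, 3, 5, 5)) + 2.0   # mean 2, standard deviation 3
normalized = batch_norm(features)
```

Under these assumed sizes both layouts produce a 5x5 feature map, but the strided convolution evaluates 25 output positions per channel instead of 100, which is where the saving in calculation comes from.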

Compared with the prior art, the invention has the following beneficial effects:

The face detection method and system based on the low-power-consumption embedded platform address the problems that existing face detection algorithms have very large models, require a large amount of calculation, and are difficult to run on embedded hardware. After the algorithm is improved, the resulting face detection algorithm can run on an embedded platform with a precision loss kept within 1%, and can be applied to embedded devices such as air conditioners and televisions, improving the user experience, with the advantages of low power consumption, high speed and high precision.

Drawings

Fig. 1 is a schematic flow chart of a face detection method based on a low-power-consumption embedded platform according to the present invention.

Fig. 2 is a schematic diagram of the face detection system based on the low power consumption embedded platform according to the invention.

Detailed Description

The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.

Examples:

Example one

As shown in Fig. 1, a face detection method for a low-power-consumption embedded platform includes the following specific implementation steps:

A. Acquiring image information; specifically, the image information can be received from an RGB image input device such as a camera or a video camera.

B. The processing of the input image data specifically includes the following processing:

B1. Data augmentation, which increases the number of images and improves the generalization capability of the model; specifically, the data augmentation uses geometric transformations such as translation, rotation, partial blackening and shearing, and also performs Gaussian filtering and channel transformation operations on the images;

B2. Data normalization and PCA whitening: data normalization readjusts the value of each dimension of the data so that the final data vector falls in the [0,1] interval; PCA whitening first performs PCA dimension reduction on the data so that the correlation among the features is low, and then divides each feature by the square root of the corresponding eigenvalue of the data covariance matrix so that all features have the same variance.

C. Optimizing the deep cascade network face detection algorithm MTCNN, training a face detection model, detecting faces and outputting face information, which specifically comprises the following steps:

C1. The max pooling layer (MP) in the MTCNN structure is removed, and the convolution stride of the convolution layer in front of the max pooling layer is changed to the stride of the max pooling layer, so as to reduce the amount of calculation;

C2. In the MTCNN structure of the deep cascade network face detection algorithm, a Batch Normalization layer is added after each convolution layer, which accelerates model convergence and improves model precision;

C3. In the MTCNN structure of the deep cascade network face detection algorithm, the ordinary convolution (Conv) is converted into an interleaved group convolution (IGCV), which reduces the amount of calculation while maintaining accuracy;

the input channels of the input feature map of each convolution layer are divided into several groups with the same number of channels in each group, and a convolution operation is performed on each group; the channels of the feature maps obtained from the group convolution are then recombined, the recombined feature maps are spliced along the channel dimension, and a second group convolution is performed with a 1x1 convolution kernel. This reduces the amount of calculation while maintaining model precision;

For example, in this embodiment, an ordinary convolution takes six input channels and produces six output channels. If the convolution kernel size is 5 × 5, the calculation amount at each position is 6 × 5 × 5 × 6. To reduce the calculation amount, this technical solution divides the 6 channels into an upper 3 channels and a lower 3 channels, convolves each group separately, and splices the results together to obtain 6 channels. The calculation amount is then 3 × 5 × 5 × 3 for the upper group and the same for the lower group, so the overall computational complexity is half of the previous 6 × 5 × 5 × 6. However, because the upper three channels are not connected to the lower three channels, the parameters may not be fully utilized.

To solve this problem, the technical solution of the present application introduces a second group convolution. The 6 channels, numbered 1, 2, 3, 4, 5, 6 from top to bottom, are rearranged and divided into three branches according to the combinations (1, 4), (2, 5) and (3, 6). Each branch is then convolved with a 1 × 1 kernel, yielding three new groups of two channels each, so that each output channel is connected to the earlier channels in an interleaved manner.

C4. Training an optimized face detection model by using the processed data, performing face detection on the image processed in the step B, and outputting face information such as face frame coordinates, face confidence score and the like;

D. Outputting image information including the face information.
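The channel rearrangement used in step C3 of this example, where channels 1 to 6 are regrouped into the branches (1, 4), (2, 5) and (3, 6), can be reproduced with a few lines of Python. The variable names are illustrative only.

```python
channels = [1, 2, 3, 4, 5, 6]                   # channel labels after the first group convolution
primary_groups = [channels[:3], channels[3:]]   # upper 3 channels and lower 3 channels

# Each branch of the second group convolution takes one channel from every primary group.
branches = [list(pair) for pair in zip(*primary_groups)]
print(branches)  # [[1, 4], [2, 5], [3, 6]]
```

Every branch contains one channel from the upper group and one from the lower group, which is how the second convolution reconnects the two halves that the first group convolution kept separate.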

Example two

As shown in Fig. 2, a face detection device for a low-power-consumption embedded platform specifically includes the following modules: an image input module, an image processing module, a face detection module and an image output module, wherein the input end of the image processing module is connected with the output end of the image input module, the output end of the image processing module is connected with the input end of the face detection module, and the output end of the face detection module is connected with the input end of the image output module.

Image input module: used for acquiring input image information; specifically, the image information can be received from an RGB image input device such as a camera or a video camera.

Image processing module: used for processing the image data input by the image input module, which comprises the following steps:

Data augmentation, which increases the number of images and improves the generalization capability of the model. Specifically, the data augmentation uses geometric transformations such as translation, rotation, partial blackening and shearing, and also performs Gaussian filtering and channel transformation operations on the image.

Data normalization and PCA whitening. Data normalization readjusts the value of each dimension of the data so that the final data vector falls in the [0,1] interval.

PCA whitening first performs PCA dimension reduction on the data to reduce the correlation among the features; each feature is then divided by the square root of the corresponding eigenvalue of the data covariance matrix so that all features have the same variance.

Face detection module: used for optimizing the deep cascade network face detection algorithm, training a face detection model, detecting faces and outputting face information. The working process of the face detection module specifically comprises the following steps:

First, the max pooling layer (MP) in the MTCNN structure of the deep cascade network face detection algorithm is removed, and the convolution stride of the convolution layer in front of the max pooling layer is changed to the stride of the max pooling layer, so as to reduce the amount of calculation.

Second, a Batch Normalization layer is added after each convolution layer in the MTCNN structure, which accelerates model convergence and improves model precision.

Then, in the MTCNN structure of the deep cascade network face detection algorithm, the ordinary convolution (Conv) is converted into an interleaved group convolution (IGCV), which reduces the amount of calculation while maintaining accuracy.

Finally, the input channels of the input feature map of each convolution layer are divided into several groups with the same number of channels in each group, a convolution operation is performed on each group, the channels of the feature maps obtained from the group convolution are recombined, the recombined feature maps are spliced along the channel dimension, and a second group convolution is performed with a 1x1 convolution kernel, reducing the amount of calculation while maintaining model accuracy.

In this embodiment, for example, an ordinary convolution takes six input channels and produces six output channels; if the convolution kernel size is 5 × 5, the calculation amount at each position is 6 × 5 × 5 × 6. To reduce the amount of calculation, in this embodiment the 6 channels are divided into an upper 3 channels and a lower 3 channels, each group is convolved separately, and the results are spliced together to give 6 channels. The calculation amount is then 3 × 5 × 5 × 3 for the upper group and the same for the lower group, so the overall computational complexity is half of the previous 6 × 5 × 5 × 6. However, because the upper three channels are not connected to the lower three channels, the parameters may not be fully utilized.
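The arithmetic in this example can be checked with a small helper that counts multiplications per output position. The helper name `conv_cost` is hypothetical, and the last line assumes the second 1x1 group convolution uses three groups of two channels, as in the first example.

```python
def conv_cost(c_in, c_out, kernel, groups=1):
    """Multiplications per output position for a (grouped) convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * kernel * kernel * (c_out // groups)

ordinary = conv_cost(6, 6, 5)              # 6 x 5 x 5 x 6 = 900
grouped = conv_cost(6, 6, 5, groups=2)     # 2 x (3 x 5 x 5 x 3) = 450
pointwise = conv_cost(6, 6, 1, groups=3)   # 3 x (2 x 1 x 1 x 2) = 12
```

Under these assumptions the two group convolutions together cost 450 + 12 = 462 multiplications per position, still roughly half of the 900 needed by the ordinary convolution.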

Face detection model: the optimized face detection model is trained with the data processed by the image processing module, then performs face detection on the image to be detected and outputs face information such as face box coordinates and face confidence scores.

Image output module: used for receiving the face information output by the face detection module and outputting image information including the face information.
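One way to picture "image information including the face information" is an image with the detected boxes drawn on it. The sketch below is an assumption about what such an output step could look like; the `FaceInfo` container, its field names, the box colour and the `min_score` threshold are hypothetical and not specified by the patent.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class FaceInfo:
    """Hypothetical container for one detection: box corners (pixels) and confidence."""
    x1: int
    y1: int
    x2: int
    y2: int
    score: float

def draw_faces(image, faces, color=(0, 255, 0), min_score=0.5):
    """Return a copy of `image` (H, W, 3) with a 1-pixel box outline for each face."""
    out = image.copy()
    for f in faces:
        if f.score < min_score:
            continue                             # skip low-confidence detections
        out[f.y1, f.x1:f.x2 + 1] = color         # top edge
        out[f.y2, f.x1:f.x2 + 1] = color         # bottom edge
        out[f.y1:f.y2 + 1, f.x1] = color         # left edge
        out[f.y1:f.y2 + 1, f.x2] = color         # right edge
    return out

frame = np.zeros((20, 20, 3), dtype=np.uint8)
faces = [FaceInfo(2, 3, 10, 12, 0.9), FaceInfo(0, 0, 5, 5, 0.1)]
annotated = draw_faces(frame, faces)
```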

In summary, the face detection method and system based on the low-power-consumption embedded platform of the invention address the problems that current face detection algorithms have very large models, require a large amount of calculation, and are difficult to run on embedded hardware. After the algorithm is improved, the resulting face detection algorithm can run on an embedded platform with a precision loss kept within 1%, and can be applied to embedded devices such as air conditioners and televisions, improving the user experience, with the advantages of low power consumption, high speed and high precision.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A face detection method based on a low-power-consumption embedded platform is characterized by comprising the following steps:

A. acquiring input image information;

the optimized deep cascade network face detection algorithm MTCNN in the step C specifically comprises the following steps:

S1, removing the max pooling layer in the original MTCNN structure of the deep cascade network face detection algorithm, and changing the convolution stride of the convolution layer in front of the pooling layer to the stride of the max pooling layer;

recombining the channels of the feature maps obtained from the group convolution, splicing the recombined feature maps along the channel dimension, and performing a second group convolution with a 1x1 convolution kernel;

2. The face detection method based on the low-power-consumption embedded platform according to claim 1, wherein step A specifically comprises: converting the photoelectric signals of the image acquired by the image acquisition device into digital signals through sampling and quantization.

3. The human face detection method based on the low-power-consumption embedded platform according to claim 1, wherein the data augmentation processing in the step B comprises at least processing the image by adopting a geometric transformation means of translation, rotation, partial blackening and shearing, and performing Gaussian filtering and channel transformation operations on the image;

4. A face detection system based on a low-power-consumption embedded platform, characterized by comprising: an image input module, an image processing module, a face detection module and an image output module, wherein the input end of the image processing module is connected with the output end of the image input module, the output end of the image processing module is connected with the input end of the face detection module, and the output end of the face detection module is connected with the input end of the image output module; wherein,

the face detection module comprises an optimized deep cascade network face detection unit for optimizing a deep cascade network face detection algorithm MTCNN, and the optimized deep cascade network face detection unit specifically comprises the following steps of:

5. The system according to claim 4, wherein the image input module is configured to acquire image information by converting a photoelectric signal of an image acquired by the image acquisition device into a digital signal through sampling and quantization.

6. The system according to claim 4, wherein the image processing module performs data augmentation processing by at least performing geometric transformation on the image by translation, rotation, partial blackening and shearing, and performing Gaussian filtering and channel transformation on the image;