CN108197613B

CN108197613B - Face detection optimization method based on deep convolution cascade network

Info

Publication number: CN108197613B
Application number: CN201810146901.XA
Authority: CN
Inventors: 王思俊; 刘琰; 王国峰; 慈红斌
Original assignee: Tiandy Technologies Co Ltd
Current assignee: Tiandy Technologies Co Ltd
Priority date: 2018-02-12
Filing date: 2018-02-12
Publication date: 2022-02-08
Anticipated expiration: 2038-02-12
Also published as: CN108197613A

Abstract

The invention provides a face detection optimization method based on a deep convolutional cascade network, comprising the following steps: detecting the areas where a face may appear, namely the hot area, using a deep cascade network; updating the hot area and setting all areas outside the hot area to zero; and performing sparse data compression on the resulting whole image. The invention greatly reduces redundant computation and improves the running efficiency of the algorithm. The efficiency of a CNN face detection algorithm on a front-end embedded platform can be improved by 20-30% without loss of detection accuracy. This offers a better solution for running deep convolutional networks quickly on front-end devices and lays a foundation for the later large-scale application of such networks.

Description

Face detection optimization method based on deep convolution cascade network

Technical Field

The invention belongs to the technical field of automatic detection, and particularly relates to a face detection optimization method based on a deep convolution cascade network.

Background

Wider and deeper networks give better results, but their computational cost is also huge, so how to run a CNN on an embedded platform remains an open problem. The usual approaches to this problem are:

1. Designing a lighter CNN to reduce the amount of computation, although small networks generally come at the expense of accuracy.

2. Accelerating with special instructions specific to each embedded platform, which makes compiling and debugging time-consuming and laborious.

3. Using a dedicated hardware core that implements the CNN, which works without problems but is relatively costly.

Disclosure of Invention

In view of this, the present invention aims to provide a face detection optimization method based on a deep convolutional cascade network that reduces redundant computation and improves detection speed.

To achieve this aim, the technical solution of the invention is as follows:

A face detection optimization method based on a deep convolutional cascade network comprises the following steps:

(1) detecting the areas where a face may appear, namely the hot area, using a deep cascade network;

(2) updating the hot area, and setting all areas outside the hot area to zero;

(3) performing sparse data compression on the resulting whole image.

Further, step (1) specifically comprises the following steps:

(11) adopting a deep cascade network and quickly determining the suspected area of a target using the first-layer deep network;

(12) filling the pyramid face area, clearing the interference area, and expanding the filled map;

(13) comparing the filled face hot area with the original image to obtain the face hot area map that is finally sent to the subsequent network.

Further, step (2) specifically comprises the following steps:

(21) dividing frame processing into an odd branch and an even branch according to the frame number of the video;

(22) performing full-image detection on odd frames while generating a hot area map, and passing the generated hot area map to the even-frame processing branch;

(23) the even-frame processing branch performs data sparsification on the hot area map and sends it to the CNN for operation.

Further, step (3) specifically comprises the following steps:

(31) performing a pixel-wise AND operation between the pyramid image and the corresponding heat map, so that all non-face areas become 0;

(32) converting the resulting image into a sparse matrix.

Compared with the prior art, the face detection optimization method based on a deep convolutional cascade network has the following advantages:

The invention greatly reduces redundant computation and improves the running efficiency of the algorithm. The efficiency of a CNN face detection algorithm on a front-end embedded platform can be improved by 20-30% without loss of detection accuracy. This offers a better solution for running deep convolutional networks quickly on front-end devices and lays a foundation for the later large-scale application of such networks.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

Fig. 1 is a schematic diagram of a face detection optimization method based on a deep convolutional cascade network according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a deep network model for face localization according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of image data sparsification according to an embodiment of the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The present invention is described in detail below with reference to the embodiments and the accompanying drawings.

As shown in Fig. 1, the invention provides a face detection optimization method based on a deep convolutional cascade network that greatly reduces redundant computation and improves detection speed. The technical solution consists of three main parts: face hot zone calculation, hot zone updating, and data sparsification. The first-layer deep network rapidly detects the areas where a face may appear, namely the hot zone; the hot zone is updated as described below; all areas outside the hot zone are set to zero; and the resulting whole image is sparsely compressed to reduce the amount of data the subsequent networks must process.

1. Hot zone calculation:

Because a deep cascade network is adopted, the front-layer network quickly detects face-like areas, while the subsequent networks remove false targets and correct the target positions. The invention exploits this property of the front-layer network, and the hot zone calculation is carried out in three steps, as follows:

1) The suspected area of the target, namely the hot zone, is quickly determined using the first-layer deep network.

Fig. 2 shows the structure of the deep network model for face localization, which takes a 3-channel image of size 12 × 12 as input. Conv denotes the convolution kernel size, MP denotes the step size with which the convolution kernel moves, face classification indicates whether the 12 × 12 input image block is a face, and bounding box regression gives the specific face position regressed by the network.

The input image is scanned with 12 × 12 as the basic unit to judge whether each position is a face, and regression yields the coordinates of a face at the template size. The image is then transformed into a pyramid using a set scaling coefficient (user-definable according to actual needs, generally 0.6-0.99; the larger the value, the more pyramid layers), so that a face of any size in the image passes through the template scale at some layer, which achieves multi-scale face detection. In the code implementation, a fully convolutional approach can be used to simplify the calculation.

2) Fill the pyramid face area, clear the interference area, and expand the filled map to avoid missed detections caused by target motion. Specifically, in the pyramid image containing a bounding box whose face classification is 1, the bounding box region is filled and all remaining positions are cleared. To avoid position deviation caused by target motion, the filled region is expanded: the center is generally kept unchanged and the fill width is changed to 2 times that of the original bounding box.

3) Compare the filled face hot area with the original image to obtain the face hot area map that is finally sent to the subsequent network.
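The hot zone calculation above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names pyramid_scales, expand_box, hot_area_mask and the parameter min_face are hypothetical, the boxes are assumed to be already mapped to the coordinates of one pyramid image, and the 2x expansion is applied to the height as well as the width (the text only mentions the width).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale factors of an image pyramid for a fixed 12x12 template.
// `factor` is the per-layer coefficient (0.6-0.99 in the text);
// `min_face` (hypothetical) is the smallest face size to detect.
std::vector<double> pyramid_scales(int width, int height, double factor,
                                   int min_face = 12) {
    assert(factor > 0.0 && factor < 1.0);
    assert(min_face >= 12);
    const int kTemplate = 12;
    std::vector<double> scales;
    double scale = static_cast<double>(kTemplate) / min_face;
    double min_side = std::min(width, height) * scale;
    while (min_side >= kTemplate) {  // stop once the template no longer fits
        scales.push_back(scale);
        scale *= factor;
        min_side *= factor;
    }
    return scales;
}

struct Box { int x, y, w, h; };  // top-left corner plus size, in pixels

// Enlarges a box around its center by `factor` and clips it to the image.
// Assumption: the same factor is used for width and height.
Box expand_box(const Box& b, int img_w, int img_h, int factor = 2) {
    assert(b.w > 0 && b.h > 0 && factor >= 1);
    int new_w = b.w * factor;
    int new_h = b.h * factor;
    int x0 = b.x - (new_w - b.w) / 2;
    int y0 = b.y - (new_h - b.h) / 2;
    int x1 = std::min(x0 + new_w, img_w);
    int y1 = std::min(y0 + new_h, img_h);
    x0 = std::max(x0, 0);
    y0 = std::max(y0, 0);
    return Box{x0, y0, std::max(x1 - x0, 0), std::max(y1 - y0, 0)};
}

// Binary hot-area mask: 1 inside any expanded face box, 0 elsewhere.
std::vector<std::uint8_t> hot_area_mask(const std::vector<Box>& faces,
                                        int img_w, int img_h) {
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(img_w) * img_h, 0);
    for (const Box& f : faces) {
        Box e = expand_box(f, img_w, img_h);
        for (int y = e.y; y < e.y + e.h; ++y)
            for (int x = e.x; x < e.x + e.w; ++x)
                mask[static_cast<std::size_t>(y) * img_w + x] = 1;
    }
    return mask;
}
```

For example, a 4 × 4 box at (4, 4) in a 16 × 16 image expands to an 8 × 8 box at (2, 2), and a coefficient closer to 1 yields more pyramid layers for the same image.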

2. Hot zone update:

Because targets in the scene move continuously and new targets appear from time to time, the invention uses a hot zone updating algorithm to update, in real time, the suspected areas where faces appear, in order to prevent missed detections. The specific steps are as follows:

1) Frame processing is divided into an odd branch and an even branch according to the frame number of the video.

2) Full-image detection is performed on odd frames while a hot zone map is generated, and the generated hot zone is passed to the even-frame processing branch.

3) The even-frame processing branch performs data sparsification on the hot zone map and then sends it to the CNN for operation.
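The odd/even scheme above can be sketched as follows. The sketch assumes frame numbers start at 1, frames are single-channel row-major byte buffers, and the mask is non-zero inside the hot zone; the names Image, Branch, select_branch and HotAreaUpdater are hypothetical and do not come from the patent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // one grayscale frame, row-major

enum class Branch { FullDetection, HotAreaOnly };

// Odd frame numbers take the full-image branch; even ones reuse the hot zone.
Branch select_branch(long frame_number) {
    return (frame_number % 2 != 0) ? Branch::FullDetection : Branch::HotAreaOnly;
}

// Holds the most recent hot-zone mask produced by an odd frame.
class HotAreaUpdater {
public:
    // Called on odd frames once full-image detection has produced a mask.
    void update(const Image& mask) { mask_ = mask; }

    // Called on even frames: zeroes every pixel outside the stored hot zone.
    // If no mask is available yet, the frame is returned unchanged.
    Image apply(const Image& frame) const {
        if (mask_.empty()) return frame;
        assert(mask_.size() == frame.size());
        Image out(frame.size(), 0);
        for (std::size_t i = 0; i < frame.size(); ++i)
            if (mask_[i] != 0) out[i] = frame[i];
        return out;
    }

private:
    Image mask_;
};
```

Under this arrangement only the odd frames pay for a full-image pass, while the even frames feed the subsequent network an image that is mostly zeros.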

3. Image data sparsification:

Sparse matrix-vector multiplication can replace dense matrix operations in many cases, which greatly reduces memory usage and computational cost. Unlike matrix-matrix multiplication, matrix-vector multiplication is an entirely memory-access-bound computation, so the main optimization direction is to improve memory access efficiency or reduce memory access overhead.
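As an illustration of why skipping zeros reduces memory traffic, the following sketch multiplies a sparse matrix by a vector using compressed sparse row (CSR) storage. CSR is chosen here only as a common example format; the patent does not name it, and CsrMatrix, dense_to_csr and spmv are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) matrix: only non-zero entries are stored.
struct CsrMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<float> values;   // non-zero values, row by row
    std::vector<int> col_index;  // column of each stored value
    std::vector<int> row_start;  // rows + 1 offsets into `values`
};

// Builds a CSR matrix from a row-major dense array.
CsrMatrix dense_to_csr(const std::vector<float>& dense, int rows, int cols) {
    assert(dense.size() == static_cast<std::size_t>(rows) * cols);
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.row_start.push_back(0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float v = dense[static_cast<std::size_t>(r) * cols + c];
            if (v != 0.0f) {
                m.values.push_back(v);
                m.col_index.push_back(c);
            }
        }
        m.row_start.push_back(static_cast<int>(m.values.size()));
    }
    return m;
}

// y = A * x. Each stored value is read exactly once, so the loop is
// dominated by memory access rather than arithmetic.
std::vector<float> spmv(const CsrMatrix& a, const std::vector<float>& x) {
    assert(x.size() == static_cast<std::size_t>(a.cols));
    std::vector<float> y(a.rows, 0.0f);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.row_start[r]; k < a.row_start[r + 1]; ++k)
            y[r] += a.values[k] * x[a.col_index[k]];
    return y;
}
```

The work done by spmv is proportional to the number of stored non-zeros, which is what makes zeroing the non-face areas worthwhile.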

Image data sparsification can be illustrated by the example shown in Fig. 3. Taking the triplet representation of a sparse matrix as an example, the sparse matrix in the upper part of the figure can be represented as: ((1,4,22),(1,7,15),(2,2,11),(3,4,-6),(4,6,39),(6,3,28)). Representing a sparse matrix as a table of triplets not only saves space but also lets certain matrix operations take less time than the classical dense algorithm. There are, of course, many sparse matrix storage methods, and a faster or better-suited one can be selected according to actual conditions.
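The triplet representation can be reproduced with a short sketch. The 6 × 7 matrix size is an assumption (it is the smallest size consistent with the listed triplets; the actual dimensions in Fig. 3 are not given in the text), and Triplet, to_triplets and example_matrix are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Triplet { int row, col; float value; };  // 1-based indices, as in the text

// Converts a row-major dense matrix into (row, col, value) triplets,
// listed in row-major order.
std::vector<Triplet> to_triplets(const std::vector<float>& dense,
                                 int rows, int cols) {
    assert(dense.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<Triplet> out;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            float v = dense[static_cast<std::size_t>(r) * cols + c];
            if (v != 0.0f) out.push_back(Triplet{r + 1, c + 1, v});
        }
    return out;
}

// Rebuilds the example matrix as dense data (assumed 6x7, zeros elsewhere).
std::vector<float> example_matrix() {
    std::vector<float> m(6 * 7, 0.0f);
    m[0 * 7 + 3] = 22.0f;  // (1,4,22)
    m[0 * 7 + 6] = 15.0f;  // (1,7,15)
    m[1 * 7 + 1] = 11.0f;  // (2,2,11)
    m[2 * 7 + 3] = -6.0f;  // (3,4,-6)
    m[3 * 7 + 5] = 39.0f;  // (4,6,39)
    m[5 * 7 + 2] = 28.0f;  // (6,3,28)
    return m;
}
```

Calling to_triplets(example_matrix(), 6, 7) yields the six triplets listed in the text, in the same order, so 42 dense entries are replaced by 6 stored ones.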

The image data sparsification of the invention mainly comprises the following steps:

1) Perform a pixel-wise AND operation between the pyramid image and the corresponding heat map, so that all non-face areas become 0.

2) Convert the resulting image into a sparse matrix.

3) A linear algebra library is used to convert the dense matrix directly into a sparse matrix for the convolution calculation of the neural network. The conversion is defined as follows: Eigen syntax is used to define a dynamically sized floating-point matrix, and its member function is called for the sparse conversion.

Eigen::Map<const Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>> mat_B(B, mat_B_row, mat_B_col);

mat_B.sparseView(0.3f);

4) The CNN convolution layer performs its convolution on the sparse matrix, which improves operation efficiency and is one of the key points of the efficiency gain of this scheme.
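The steps above can be sketched without Eigen as follows. The text does not describe how the convolution layer consumes the sparse matrix, so the scatter-style convolution below is only one possible approach and is an assumption of this sketch, as are the names and_with_heat_map, Entry, to_sparse and sparse_conv and the convention that the heat map holds 0x00 outside and 0xFF inside the hot zone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pixel-wise AND with a heat map whose entries are 0x00 (outside the hot
// zone) or 0xFF (inside), so every non-face pixel becomes 0.
std::vector<std::uint8_t> and_with_heat_map(const std::vector<std::uint8_t>& image,
                                            const std::vector<std::uint8_t>& heat) {
    assert(image.size() == heat.size());
    std::vector<std::uint8_t> out(image.size());
    for (std::size_t i = 0; i < image.size(); ++i)
        out[i] = static_cast<std::uint8_t>(image[i] & heat[i]);
    return out;
}

struct Entry { int row, col; float value; };  // one non-zero pixel, 0-based

// Keeps only the non-zero pixels of a row-major image.
std::vector<Entry> to_sparse(const std::vector<std::uint8_t>& img,
                             int rows, int cols) {
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<Entry> out;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            std::uint8_t v = img[static_cast<std::size_t>(r) * cols + c];
            if (v != 0) out.push_back(Entry{r, c, static_cast<float>(v)});
        }
    return out;
}

// "Valid" 2D cross-correlation (the usual CNN convention) driven by the
// non-zero inputs only: each stored pixel adds its contribution to every
// output whose k x k window covers it.
std::vector<float> sparse_conv(const std::vector<Entry>& in, int rows, int cols,
                               const std::vector<float>& kernel, int k) {
    assert(kernel.size() == static_cast<std::size_t>(k) * k);
    assert(rows >= k && cols >= k);
    const int out_rows = rows - k + 1;
    const int out_cols = cols - k + 1;
    std::vector<float> out(static_cast<std::size_t>(out_rows) * out_cols, 0.0f);
    for (const Entry& e : in)
        for (int ky = 0; ky < k; ++ky)
            for (int kx = 0; kx < k; ++kx) {
                int oy = e.row - ky;
                int ox = e.col - kx;
                if (oy < 0 || ox < 0 || oy >= out_rows || ox >= out_cols) continue;
                out[static_cast<std::size_t>(oy) * out_cols + ox] +=
                    e.value * kernel[static_cast<std::size_t>(ky) * k + kx];
            }
    return out;
}
```

The cost of sparse_conv grows with the number of non-zero pixels rather than with the image area, which matches the motivation given for zeroing the non-face areas before the convolution.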

The above description covers only the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention are intended to fall within its scope of protection.

Claims

1. A face detection optimization method based on a deep convolutional cascade network, characterized in that the method comprises the following steps:

(1) detecting the areas where a face may appear, namely the hot area, using a deep convolutional cascade network;

step (1) comprises the following steps:

(11) adopting a deep convolutional cascade network and quickly determining the suspected area of a target using the first-layer deep convolutional network;

(13) comparing the filled face hot area with the original image to obtain the face hot area map that is finally sent to the subsequent network;

step (2) comprises the following steps:

(22) performing full-image detection on odd frames while generating a hot area map, and passing the generated hot area map to the even-frame processing branch;

(23) the even-frame processing branch performs data sparsification on the hot area map and sends it to the CNN for operation;

(3) performing sparse data compression on the resulting whole image;

step (3) comprises the following steps:

(31) performing a pixel-wise AND operation between the pyramid image and the corresponding hot area map, so that all non-face areas become 0;

(32) converting the resulting image into a sparse matrix.