CN106778550A - Method and apparatus for face detection - Google Patents

Method and apparatus for face detection

Info

Publication number
CN106778550A
CN106778550A (application CN201611082414.9A)
Authority
CN
China
Prior art keywords: convolution, network model, target, layer, rank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611082414.9A
Other languages
Chinese (zh)
Other versions
CN106778550B (en)
Inventor
万韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201611082414.9A priority Critical patent/CN106778550B/en
Publication of CN106778550A publication Critical patent/CN106778550A/en
Application granted granted Critical
Publication of CN106778550B publication Critical patent/CN106778550B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a method and apparatus for face detection, belonging to the field of computer technology. The method includes: obtaining the convolution kernels of a target convolutional layer in a deep convolutional network model to be used; performing CP decomposition on the convolution kernels of the target convolutional layer to obtain low-rank convolution kernels of the target convolutional layer; in the deep convolutional network model to be used, replacing the convolution kernels of the target convolutional layer with the corresponding low-rank convolution kernels to obtain an adjusted deep convolutional network model; and performing face detection on an image based on the adjusted deep convolutional network model. With the present disclosure, the processing speed of face detection can be improved.

Description

Face detection method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for face detection.
Background
Face detection is a technology for locating a face in an image according to the feature information of the face. The algorithm model commonly used for face detection is a deep convolutional network model, and the specific processing is as follows:
The image to be detected is taken as the input of a preset deep convolutional network model, and the face position information in the image to be detected is obtained through multi-layer processing and fully-connected processing. The multi-layer processing generally includes at least one layer of convolution processing and at least one layer of pooling processing; a layer that performs convolution processing may be referred to as a convolutional layer, and a layer that performs pooling processing may be referred to as a pooling layer. In the multi-layer processing, the output of one layer serves as the input of the next layer. When a convolutional layer performs convolution, its output data is generally obtained by multiplying the output data of the previous layer (which may be a matrix or a vector) by the convolution kernels of the convolutional layer (each kernel being a matrix composed of a number of different parameters).
In general, a deep convolutional network model includes multiple convolutional layers, and each convolutional layer corresponds to multiple convolution kernels. Because the number of convolution kernels of the convolutional layers is large, a large amount of complex calculation is required during convolution processing, and the processing speed of face detection is slow.
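The convolution cost described in the background can be made concrete with a quick multiply count. The sketch below is illustrative only; the layer sizes are hypothetical, not taken from the patent:

```python
def direct_conv_multiply_count(X, Y, d, S, T):
    """Multiplications for one densely applied convolutional layer:
    each of the X*Y output positions of each of T kernels multiplies
    a d x d x S patch of the input."""
    return X * Y * d * d * S * T

# Hypothetical layer: a 224x224 input with 3 color channels, 64 kernels of size 3x3.
print(direct_conv_multiply_count(224, 224, 3, 3, 64))  # 86704128
```

Even this modest layer needs tens of millions of multiplications, which is why reducing the per-kernel work matters.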
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a method and an apparatus for face detection. The technical scheme is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a method for face detection, the method including:
acquiring a convolution kernel of a target convolution layer in a deep convolution network model to be used;
performing canonical polyadic (CP) decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer;
in the deep convolution network model to be used, replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel to obtain an adjusted deep convolution network model;
and carrying out face detection on the image based on the adjusted depth convolution network model.
Optionally, the method further includes:
setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model, and retraining the adjusted deep convolutional network model;
the performing face detection on the image based on the adjusted deep convolutional network model includes:
and carrying out face detection on the image based on the retrained deep convolutional network model.
Optionally, the retraining the adjusted deep convolutional network model includes:
determining a training value of the model parameter corresponding to each preset sample image based on an error feedback algorithm, wherein when the value of the model parameter in the adjusted depth convolution network model is the training value and the input image of the adjusted depth convolution network model is the sample image, the output value of the adjusted depth convolution network model and a preset reference output value corresponding to the sample image meet a preset matching condition;
determining the average value of the training values of the model parameters corresponding to each sample image;
and adjusting the values of the model parameters in the adjusted deep convolutional network model into corresponding average values to obtain the retrained deep convolutional network model.
Optionally, the performing CP decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer includes:
and performing CP decomposition on the d × d × S × T convolution kernel tensor of the target convolutional layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R for the target convolutional layer, where d is the number of rows and columns of each convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolutional layer, and R is the rank of the convolution kernel tensor of the target convolutional layer.
Optionally, the performing CP decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer includes:
and if the convolution kernel of the target convolution layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for face detection, the apparatus comprising:
the acquisition module is used for acquiring a convolution kernel of the target convolution layer in the deep convolution network model to be used;
the decomposition module is used for performing canonical polyadic (CP) decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer;
the replacing module is used for replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel in the deep convolution network model to be used to obtain an adjusted deep convolution network model;
and the detection module is used for carrying out face detection on the image based on the adjusted depth convolution network model.
Optionally, the apparatus further comprises:
the training module is used for setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model and retraining the adjusted deep convolutional network model;
the detection module is configured to:
and carrying out face detection on the image based on the retrained deep convolutional network model.
Optionally, the training module includes a first determining sub-module, a second determining sub-module, and an adjusting sub-module, wherein:
the first determining submodule is configured to determine, for each preset sample image, a training value of the model parameter corresponding to the sample image based on an error back-transfer algorithm, where when a value of the model parameter in the adjusted deep convolutional network model is the training value and an input image of the adjusted deep convolutional network model is the sample image, an output value of the adjusted deep convolutional network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition;
the second determining submodule is used for determining the average value of the training values of the model parameters corresponding to each sample image;
and the adjusting submodule is used for adjusting the values of the model parameters in the adjusted deep convolutional network model to the corresponding average values to obtain the retrained deep convolutional network model.
Optionally, the decomposition module is configured to:
and performing CP decomposition on the d × d × S × T convolution kernel tensor of the target convolutional layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R for the target convolutional layer, where d is the number of rows and columns of each convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolutional layer, and R is the rank of the convolution kernel tensor of the target convolutional layer.
Optionally, the decomposition module is configured to:
and if the convolution kernel of the target convolution layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for face detection, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a convolution kernel of a target convolution layer in a deep convolution network model to be used;
performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
in the deep convolution network model to be used, replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel to obtain an adjusted deep convolution network model;
and carrying out face detection on the image based on the adjusted depth convolution network model.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in the embodiment of the disclosure, in the deep convolutional network model to be used, the server may obtain the convolution kernels of the target convolutional layer, perform CP decomposition on the convolution kernels of the target convolutional layer to obtain the low-rank convolution kernels of the target convolutional layer, replace the convolution kernels of the target convolutional layer with the corresponding low-rank convolution kernels in the deep convolutional network model to be used to obtain an adjusted deep convolutional network model, and, in the subsequent face detection process, perform face detection on images based on the adjusted deep convolutional network model. Therefore, when the deep convolutional network model is used for face detection, the convolution kernels of the convolutional layer are low-rank convolution kernels; the low-rank convolution kernels have fewer parameters and require less data processing, so the processing speed of face detection can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. In the drawings:
fig. 1 is a flowchart of a method for detecting a human face according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a target convolutional layer convolution kernel provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a face detection process provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of a training method of a deep convolutional network model provided by an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an apparatus for face detection according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an apparatus for face detection according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an apparatus for face detection according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The embodiment of the disclosure provides a face detection method, and an execution subject of the method can be a server, wherein the server can be a background server of a face detection application program. The server can be provided with a processor, a memory and the like, the processor can be used for processing in the process of face detection, and the memory can be used for storing data required in the process of face detection and generated data.
As shown in fig. 1, the processing flow of the method may include the following steps:
in step 101, the convolution kernel of the target convolution layer is obtained in the deep convolutional network model to be used.
The target convolutional layer may be one convolutional layer or a plurality of convolutional layers.
In implementation, the deep convolutional network model to be used includes at least one convolutional layer, and each convolutional layer includes a preset number of convolution kernels. The convolution kernels have the same number of parameters but different parameter values, and the parameter values of each convolution kernel have already been determined. Before performing face detection on an image, the server may obtain the convolution kernels of the target convolutional layer among the at least one convolutional layer.
In step 102, canonical CP decomposition is performed on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer.
In an implementation, after the server obtains the convolution kernels of the target convolutional layer, CP (canonical polyadic) decomposition may be performed on them to obtain the corresponding low-rank convolution kernels, so as to obtain the low-rank convolution kernels of the target convolutional layer.
Optionally, the convolution kernel of the target convolutional layer may be decomposed into four low-rank convolution kernels, and the corresponding processing of step 102 may be as follows:
and performing CP decomposition on the d × d × S × T convolution kernel tensor of the target convolutional layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R for the target convolutional layer, where d is the number of rows and columns of each convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolutional layer, and R is the rank of the convolution kernel tensor of the target convolutional layer.
Here d represents the number of rows and columns of each convolution kernel; S represents the number of color channels, i.e., RGB (Red Green Blue), so S is generally 3; T represents the number of convolution kernels of the target convolutional layer; and R represents the rank of the convolution kernel tensor of the target convolutional layer.
In an implementation, as shown in fig. 2, the convolution kernels of the target convolutional layer are of size d × d, there are T such kernels in total, and the input has S color channels, so the server obtains a d × d × S × T kernel tensor for the target convolutional layer and can decompose it, according to the CP decomposition method, into four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R.
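To see the parameter saving from this decomposition, a quick count with hypothetical sizes (d, S, T and the rank R below are illustrative choices, not values from the patent):

```python
# Parameter counts before and after CP decomposition of a d x d x S x T kernel tensor.
d, S, T, R = 5, 3, 64, 8  # hypothetical layer sizes and rank

full_params = d * d * S * T      # one 4-way kernel tensor
cp_params = R * (d + d + S + T)  # four factor matrices: d x R, d x R, S x R, T x R

print(full_params, cp_params)  # 4800 616
```

When R is small relative to the kernel dimensions, the four factor matrices hold far fewer parameters than the original tensor.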
For example, let the input of the target convolutional layer be a matrix U of size X × Y × S, and let the kernel tensor of the target convolutional layer be K of size d × d × S × T. The output of the target convolutional layer is then a matrix V of size (X − d + 1) × (Y − d + 1) × T, given by formula (1):
V(x, y, t) = Σ_i Σ_j Σ_{s=1}^{S} K(i − x + δ, j − y + δ, s, t) · U(i, j, s)    (1)
where δ = (d − 1)/2 and i, j run over the d × d input neighborhood of (x, y). Decomposing the kernel tensor K by CP decomposition gives formula (2):
K(i, j, s, t) = Σ_{r=1}^{R} k_x(i, r) · k_y(j, r) · k_s(s, r) · k_t(t, r)    (2)
In formula (2), k_x, k_y, k_s and k_t are the four component matrices of sizes d × R, d × R, S × R and T × R, respectively. Substituting formula (2) into formula (1) gives formula (3):
V(x, y, t) = Σ_{r=1}^{R} k_t(t, r) [ Σ_i k_x(i − x + δ, r) [ Σ_j k_y(j − y + δ, r) [ Σ_{s=1}^{S} k_s(s, r) · U(i, j, s) ] ] ]    (3)
Thus, the output V(x, y, t) of the convolutional layer can be calculated with the low-rank convolution kernels as a chain of four small operations, formulas (4) to (7):
U_s(i, j, r) = Σ_{s=1}^{S} k_s(s, r) · U(i, j, s)    (4)
U_y(i, y, r) = Σ_j k_y(j − y + δ, r) · U_s(i, j, r)    (5)
U_x(x, y, r) = Σ_i k_x(i − x + δ, r) · U_y(i, y, r)    (6)
V(x, y, t) = Σ_{r=1}^{R} k_t(t, r) · U_x(x, y, r)    (7)
As described above, the output of the target convolutional layer can be expressed by formula (7). When the target convolutional layer performs convolution processing, the complexity of the multiplication calculation changes from X · Y · d² · S · T to X · Y · R · (2d + S + T). Because R · (2d + S + T) is far smaller than d² · S · T, the amount of calculation is reduced and the efficiency of face detection can be improved.
Optionally, if the convolution kernel of the target convolutional layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer.
In implementation, after the server obtains the convolution kernel of the target convolution layer, the server may determine the rank of the convolution kernel, and if the rank of the convolution kernel is equal to the number of rows of the convolution kernel or equal to the number of columns of the convolution kernel, the server determines that the convolution kernel is a full-rank matrix, and then performs CP decomposition on the target convolution kernel to obtain a low-rank convolution kernel of the target convolution layer.
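The full-rank check described above can be sketched as follows; the function name is a hypothetical illustration:

```python
import numpy as np

def is_full_rank(m):
    """A matrix is full rank when its rank equals the smaller of its row
    and column counts (equivalently, equals the rows or the columns of a
    square matrix, as in the check described above)."""
    return np.linalg.matrix_rank(m) == min(m.shape)

print(is_full_rank(np.eye(3)))        # True: rank 3
print(is_full_rank(np.ones((3, 3))))  # False: rank 1
```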
In step 103, in the deep convolutional network model to be used, the convolutional kernel of the target convolutional layer is replaced by a corresponding low-rank convolutional kernel, so as to obtain an adjusted deep convolutional network model.
In implementation, after the server determines the low-rank convolution kernel of the target convolution kernel, each convolution kernel of the target convolution layer may be replaced by a corresponding low-rank convolution kernel in the deep convolution network model to be used, and the low-rank convolution kernels are stored to obtain the adjusted deep convolution network model.
In step 104, face detection is performed on the image based on the adjusted deep convolutional network model.
In implementation, as shown in fig. 3, after the server determines the adjusted deep convolutional network model, the model may be used for face detection on an image. The processing may be as follows: the image to be detected is input into the adjusted deep convolutional network model, and an N × N feature map is obtained after convolution processing and pooling processing. The N × N feature map is then divided into a preset number of image blocks of equal size. For each image block, taking the center point of the image block as the center point of the candidate frames, candidate frames with width-to-height ratios of 1:2, 1:1 and 2:1 and areas of 128², 256² and 512² are added to the image block, the position information of each candidate frame is determined, and the image feature vector of the image inside each candidate frame is obtained. Fully-connected processing is then performed: the obtained image feature vectors are multiplied by a preset matrix W to obtain the category of the image features contained in each candidate frame and the position adjustment needed for each candidate frame, so that the position information of the face in the image can be determined.
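The candidate-frame generation step can be sketched as below. This is a hypothetical illustration; the function name and the (x_min, y_min, x_max, y_max) coordinate convention are assumptions, and only the three aspect ratios and three areas come from the description above:

```python
def candidate_frames(cx, cy):
    """Nine candidate frames centered on (cx, cy): three aspect ratios
    (1:2, 1:1, 2:1) times three areas (128^2, 256^2, 512^2)."""
    frames = []
    for area in (128**2, 256**2, 512**2):
        for w_over_h in (0.5, 1.0, 2.0):
            h = (area / w_over_h) ** 0.5  # width * height == area
            w = w_over_h * h
            frames.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return frames

frames = candidate_frames(300, 300)
print(len(frames))  # 9 candidate frames per image block
```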
The embodiment of the present disclosure further provides a process of retraining the adjusted deep convolutional network model, and the corresponding processing may be as follows:
and setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model, retraining the adjusted deep convolutional network model, and performing face detection on the image based on the retrained deep convolutional network model.
The model parameters include the parameters in the convolution kernels as well as the other parameters in the deep convolutional network model, such as the parameters in the pooling kernels.
In implementation, after the adjusted deep convolutional network model is determined, values of model parameters in the adjusted deep convolutional network model can be obtained, then the obtained values of the model parameters are used as training initial values of the model parameters of the convolutional network model, the adjusted deep convolutional network model is retrained, and after the retrained deep convolutional network model is obtained, face detection can be performed on the image based on the retrained deep convolutional network model.
Optionally, the process of retraining the adjusted deep convolutional network model is the same as the training process of a general convolutional network model, as shown in fig. 4, the specific processing steps may be as follows:
in step 401, for each preset sample image, a training value of a model parameter corresponding to the sample image is determined based on an error back-transmission algorithm, where when a value of the model parameter in the adjusted deep convolutional network model is the training value and an input image of the adjusted deep convolutional network model is the sample image, an output value of the adjusted deep convolutional network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition.
The preset matching condition may be that a difference between an output value of the adjusted depth convolution network model and a preset reference output value corresponding to the sample image is smaller than a preset threshold value, and the like.
In implementation, after the adjusted deep convolutional network model is determined, the values of the model parameters in the adjusted deep convolutional network model can be obtained, and preset sample images are obtained, where each sample image corresponds to a preset reference output value. In the training process, an objective function corresponding to the adjusted deep convolutional network model is determined: its independent variable is the input x of the adjusted deep convolutional network model, its dependent variable is the output y, and its parameters are the model parameters w (w denotes a number of parameters), so the objective function can be written as y = f(x; w). A certain sample image (which may be called the first sample image) is taken as the input of the adjusted deep convolutional network model and forward propagation is performed to determine the value of y. If the value of y and the preset reference output value corresponding to the sample image do not satisfy the preset matching condition, the difference between the two is taken and squared to obtain a loss function L. Then, using the error back-propagation method, a backward propagation pass is executed through the adjusted deep convolutional network model: a preset learning rate α is first obtained, then the partial derivative of the loss function with respect to each parameter is taken, and the parameter value to be used next time is calculated as w′ = w − α · ∂L/∂w, until the next parameter value of every model parameter in the adjusted deep convolutional network model has been calculated.
Then, the determined parameter values of the model parameters are updated into the adjusted deep convolutional network model, the first sample image is again taken as the input of the adjusted deep convolutional network model, and forward propagation and backward propagation are executed until the output value obtained with the first sample image as input and the preset reference output value corresponding to the sample image satisfy the preset matching condition; the parameter values of the model parameters at that point are determined as the training values. The above process is a training process based on one sample image, and it is executed for each sample image until the training values of the model parameters corresponding to each sample image have been determined.
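A minimal one-parameter sketch of the forward/backward loop described above. All values here are hypothetical; a real model has many parameters, but the update rule per parameter is the same:

```python
# Forward pass, squared-error loss L = (y - y_ref)^2, then the update
# w' = w - alpha * dL/dw, repeated until the output matches the reference.

alpha = 0.1          # small preset learning rate
w = 0.5              # training initial value taken from the adjusted model
x, y_ref = 2.0, 3.0  # sample input and its preset reference output

for _ in range(100):
    y = w * x                   # forward propagation
    loss = (y - y_ref) ** 2     # squared difference
    grad = 2 * (y - y_ref) * x  # partial derivative dL/dw
    w = w - alpha * grad        # parameter value used next time
    if loss < 1e-12:            # preset matching condition
        break

print(round(w, 4))  # 1.5  (so y = w * x matches y_ref = 3.0)
```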
In step 402, an average of the training values of the model parameters corresponding to each sample image is determined.
In implementation, the training values of the model parameters determined by using each sample image are respectively averaged to obtain the values of each model parameter in the adjusted deep convolutional network model.
In step 403, the values of the model parameters in the adjusted deep convolutional network model are adjusted to corresponding average values, so as to obtain a retrained deep convolutional network model.
In implementation, the values of the model parameters in the adjusted deep convolutional network model are respectively adjusted to corresponding average values and stored, so that the retrained deep convolutional network model is obtained.
As for the learning rate mentioned in the training process: since retraining only fine-tunes the parameter values starting from the already-determined parameter values of the adjusted deep convolutional network model, the learning rate can be set to a small value.
In the embodiment of the disclosure, in the deep convolutional network model to be used, the server may obtain the convolution kernels of the target convolutional layer, perform CP decomposition on the convolution kernels of the target convolutional layer to obtain the low-rank convolution kernels of the target convolutional layer, replace the convolution kernels of the target convolutional layer with the corresponding low-rank convolution kernels in the deep convolutional network model to be used to obtain an adjusted deep convolutional network model, and, in the subsequent face detection process, perform face detection on images based on the adjusted deep convolutional network model. Therefore, when the deep convolutional network model is used for face detection, the convolution kernels of the convolutional layer are low-rank convolution kernels; the low-rank convolution kernels have fewer parameters and require less data processing, so the processing speed of face detection can be improved.
Another embodiment of the present disclosure provides an apparatus for detecting a human face, as shown in fig. 5, the apparatus including:
an obtaining module 510, configured to obtain a convolution kernel of the target convolution layer in a deep convolution network model to be used;
a decomposition module 520, configured to perform canonical polyadic (CP) decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
a replacing module 530, configured to replace, in the deep convolutional network model to be used, the convolutional kernel of the target convolutional layer with a corresponding low-rank convolutional kernel, so as to obtain an adjusted deep convolutional network model;
and a detection module 540, configured to perform face detection on the image based on the adjusted deep convolutional network model.
Optionally, as shown in fig. 6, the apparatus further includes:
a training module 550, configured to set a value of a model parameter in the adjusted deep convolutional network model as a training initial value of the model parameter of the adjusted deep convolutional network model, and retrain the adjusted deep convolutional network model;
the detecting module 540 is configured to:
perform face detection on the image based on the retrained deep convolutional network model.
Optionally, as shown in fig. 7, the training module 550 includes a first determining submodule 551, a second determining submodule 552 and an adjusting submodule 553, wherein:
the first determining submodule 551 is configured to determine, for each preset sample image, a training value of the model parameter corresponding to the sample image based on an error back-propagation algorithm, where when a value of the model parameter in the adjusted deep convolutional network model is the training value and an input image of the adjusted deep convolutional network model is the sample image, an output value of the adjusted deep convolutional network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition;
the second determining submodule 552 is configured to determine an average value of the training values of the model parameter corresponding to each sample image;
the adjusting submodule 553 is configured to adjust the values of the model parameters in the adjusted deep convolutional network model to corresponding average values, so as to obtain a retrained deep convolutional network model.
Optionally, the decomposition module 520 is configured to:
perform CP decomposition on the d × d × S × T convolution kernel of the target convolution layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R of the target convolution layer, where d is the number of rows and columns of the convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolution layer, and R is the rank of the convolution kernel of the target convolution layer.
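The decomposition described above can be checked numerically. The sketch below (a NumPy illustration under assumed sizes, not the disclosure's implementation) builds a rank-R kernel from four CP factors and verifies that applying the four small factors in sequence reproduces the direct convolution with the full d × d × S × T kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
d, S, T, R = 3, 4, 5, 2          # assumed sizes for illustration
H, W_in = 8, 8
Hout, Wout = H - d + 1, W_in - d + 1

X = rng.standard_normal((d, R))   # vertical spatial factor, d x R
Y = rng.standard_normal((d, R))   # horizontal spatial factor, d x R
Z = rng.standard_normal((S, R))   # input-channel factor, S x R
Wf = rng.standard_normal((T, R))  # output-channel factor, T x R
img = rng.standard_normal((S, H, W_in))

# Direct path: reassemble the rank-R kernel and convolve ("valid" mode).
K = np.einsum('ir,jr,sr,tr->ijst', X, Y, Z, Wf)   # d x d x S x T
direct = np.zeros((T, Hout, Wout))
for i in range(d):
    for j in range(d):
        direct += np.einsum('st,shw->thw', K[i, j], img[:, i:i + Hout, j:j + Wout])

# Factorized path: 1x1 channel mix, 1 x d conv, d x 1 conv, 1x1 channel mix.
a = np.einsum('sr,shw->rhw', Z, img)                 # S -> R channels
b = np.zeros((R, H, Wout))
for j in range(d):
    b += Y[j][:, None, None] * a[:, :, j:j + Wout]   # horizontal conv per channel
c = np.zeros((R, Hout, Wout))
for i in range(d):
    c += X[i][:, None, None] * b[:, i:i + Hout, :]   # vertical conv per channel
factorized = np.einsum('tr,rhw->thw', Wf, c)         # R -> T channels

assert np.allclose(direct, factorized)
```

The factorized path costs on the order of R(2d + S + T) multiply-accumulates per output position instead of d²ST, which is where the claimed speed-up comes from.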
Optionally, the decomposition module 520 is configured to:
if the convolution kernel of the target convolution layer is a full-rank matrix, perform CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
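One plausible way to implement the full-rank test (the unfolding choice below is an assumption for illustration; the disclosure does not specify it) is to flatten the kernel tensor into a matrix and compare its numerical rank against the smaller of its two dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, S, T = 3, 4, 5
K = rng.standard_normal((d, d, S, T))  # a dense random kernel (full rank with seed 1)

# Unfold along the kernel axis: one row per output kernel, one column per weight.
unfolded = K.reshape(d * d * S, T).T   # shape (T, d*d*S)
is_full_rank = np.linalg.matrix_rank(unfolded) == min(unfolded.shape)
print(is_full_rank)
```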
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In the embodiment of the disclosure, in the deep convolutional network model to be used, the server may obtain the convolution kernel of the target convolution layer, perform CP decomposition on it to obtain low-rank convolution kernels of the target convolution layer, and replace the original convolution kernel with the corresponding low-rank convolution kernels to obtain an adjusted deep convolutional network model; in the subsequent image face detection process, face detection is performed based on the adjusted model. Because the convolution kernels of the convolution layer are then low-rank kernels with fewer parameters, the amount of data to be processed is smaller, and the processing speed of face detection can be improved.
It should be noted that: in the face detection device provided in the above embodiment, when performing face detection, only the division of the functional modules is illustrated, and in practical application, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the face detection apparatus provided in the above embodiment and the face detection method embodiment belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiment and are not described herein again.
Yet another exemplary embodiment of the present disclosure provides a structural diagram of a server. Referring to fig. 8, server 800 includes a processing component 1922, which in turn includes one or more processors, and memory resources, represented by memory 1932, for storing instructions, such as application programs, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method of face detection.
The server 800 may also include a power component 1926 configured to perform power management for the server 800, a wired or wireless network interface 1950 configured to connect the server 800 to a network, and an input/output (I/O) interface 1958. The server 800 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The server 800 may include memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring a convolution kernel of a target convolution layer in a deep convolution network model to be used;
performing canonical polyadic (CP) decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
in the deep convolution network model to be used, replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel to obtain an adjusted deep convolution network model;
and performing face detection on the image based on the adjusted deep convolution network model.
Optionally, the method further includes:
setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model, and retraining the adjusted deep convolutional network model;
the performing face detection on the image based on the adjusted deep convolution network model includes:
performing face detection on the image based on the retrained deep convolutional network model.
Optionally, the retraining the adjusted deep convolutional network model includes:
determining, for each preset sample image, a training value of the model parameter corresponding to the sample image based on an error back-propagation algorithm, wherein when the value of the model parameter in the adjusted deep convolution network model is the training value and the input image of the adjusted deep convolution network model is the sample image, the output value of the adjusted deep convolution network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition;
determining the average value of the training values of the model parameters corresponding to each sample image;
and adjusting the values of the model parameters in the adjusted deep convolutional network model into corresponding average values to obtain the retrained deep convolutional network model.
Optionally, the performing CP decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer includes:
performing CP decomposition on the d × d × S × T convolution kernel of the target convolution layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R of the target convolution layer, wherein d is the number of rows and columns of the convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolution layer, and R is the rank of the convolution kernel of the target convolution layer.
Optionally, the performing CP decomposition on the convolution kernel of the target convolutional layer to obtain a low-rank convolution kernel of the target convolutional layer includes:
if the convolution kernel of the target convolution layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
In the embodiment of the disclosure, in the deep convolutional network model to be used, the server may obtain the convolution kernel of the target convolution layer, perform CP decomposition on it to obtain low-rank convolution kernels of the target convolution layer, and replace the original convolution kernel with the corresponding low-rank convolution kernels to obtain an adjusted deep convolutional network model; in the subsequent image face detection process, face detection is performed based on the adjusted model. Because the convolution kernels of the convolution layer are then low-rank kernels with fewer parameters, the amount of data to be processed is smaller, and the processing speed of face detection can be improved.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A method of face detection, the method comprising:
acquiring a convolution kernel of a target convolution layer in a deep convolution network model to be used;
performing canonical polyadic (CP) decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
in the deep convolution network model to be used, replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel to obtain an adjusted deep convolution network model;
and performing face detection on the image based on the adjusted deep convolution network model.
2. The method of claim 1, further comprising:
setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model, and retraining the adjusted deep convolutional network model;
the performing face detection on the image based on the adjusted deep convolution network model comprises:
performing face detection on the image based on the retrained deep convolutional network model.
3. The method of claim 2, wherein the retraining the adjusted deep convolutional network model comprises:
determining, for each preset sample image, a training value of the model parameter corresponding to the sample image based on an error back-propagation algorithm, wherein when the value of the model parameter in the adjusted deep convolution network model is the training value and the input image of the adjusted deep convolution network model is the sample image, the output value of the adjusted deep convolution network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition;
determining the average value of the training values of the model parameters corresponding to each sample image;
and adjusting the values of the model parameters in the adjusted deep convolutional network model into corresponding average values to obtain the retrained deep convolutional network model.
4. The method of claim 1, wherein the performing CP decomposition on the convolution kernels of the target convolutional layer to obtain low-rank convolution kernels of the target convolutional layer comprises:
performing CP decomposition on the d × d × S × T convolution kernel of the target convolution layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R of the target convolution layer, wherein d is the number of rows and columns of the convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolution layer, and R is the rank of the convolution kernel of the target convolution layer.
5. The method of claim 1, wherein the performing CP decomposition on the convolution kernels of the target convolutional layer to obtain low-rank convolution kernels of the target convolutional layer comprises:
if the convolution kernel of the target convolution layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
6. An apparatus for face detection, the apparatus comprising:
the acquisition module is used for acquiring a convolution kernel of the target convolution layer in the deep convolution network model to be used;
the decomposition module is used for performing canonical polyadic (CP) decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
the replacing module is used for replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel in the deep convolution network model to be used to obtain an adjusted deep convolution network model;
and the detection module is used for performing face detection on the image based on the adjusted deep convolution network model.
7. The apparatus of claim 6, further comprising:
the training module is used for setting the value of the model parameter in the adjusted deep convolutional network model as the training initial value of the model parameter of the adjusted deep convolutional network model and retraining the adjusted deep convolutional network model;
the detection module is configured to:
performing face detection on the image based on the retrained deep convolutional network model.
8. The apparatus of claim 7, wherein the training module comprises a first determination sub-module, a second determination sub-module, and an adjustment sub-module, wherein:
the first determining submodule is configured to determine, for each preset sample image, a training value of the model parameter corresponding to the sample image based on an error back-propagation algorithm, where when a value of the model parameter in the adjusted deep convolutional network model is the training value and an input image of the adjusted deep convolutional network model is the sample image, an output value of the adjusted deep convolutional network model and a preset reference output value corresponding to the sample image satisfy a preset matching condition;
the second determining submodule is used for determining the average value of the training values of the model parameters corresponding to each sample image;
and the adjusting submodule is used for adjusting the values of the model parameters in the adjusted deep convolutional network model to corresponding average values to obtain the retrained deep convolutional network model.
9. The apparatus of claim 6, wherein the decomposition module is configured to:
performing CP decomposition on the d × d × S × T convolution kernel of the target convolution layer to obtain four low-rank convolution kernels of sizes d × R, d × R, S × R and T × R of the target convolution layer, wherein d is the number of rows and columns of the convolution kernel, S is the number of color channels, T is the number of convolution kernels of the target convolution layer, and R is the rank of the convolution kernel of the target convolution layer.
10. The apparatus of claim 6, wherein the decomposition module is configured to:
if the convolution kernel of the target convolution layer is a full-rank matrix, performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer.
11. An apparatus for face detection, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a convolution kernel of a target convolution layer in a deep convolution network model to be used;
performing CP decomposition on the convolution kernel of the target convolution layer to obtain a low-rank convolution kernel of the target convolution layer;
in the deep convolution network model to be used, replacing the convolution kernel of the target convolution layer with a corresponding low-rank convolution kernel to obtain an adjusted deep convolution network model;
and performing face detection on the image based on the adjusted deep convolution network model.
CN201611082414.9A 2016-11-30 2016-11-30 Face detection method and device Active CN106778550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611082414.9A CN106778550B (en) 2016-11-30 2016-11-30 Face detection method and device


Publications (2)

Publication Number Publication Date
CN106778550A true CN106778550A (en) 2017-05-31
CN106778550B CN106778550B (en) 2020-02-07

Family

ID=58898294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611082414.9A Active CN106778550B (en) 2016-11-30 2016-11-30 Face detection method and device

Country Status (1)

Country Link
CN (1) CN106778550B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943750A (en) * 2017-11-14 2018-04-20 华南理工大学 A kind of decomposition convolution method based on WGAN models
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
CN110858323A (en) * 2018-08-23 2020-03-03 北京京东金融科技控股有限公司 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment
CN115719430A (en) * 2022-10-28 2023-02-28 河北舒隽科技有限公司 Method for identifying male and female of Taixing chick

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425999A (en) * 2013-08-27 2013-12-04 西安电子科技大学 Brain cognitive state judgment method based on non-negative tensor projection operator decomposition algorithm
CN104318064A (en) * 2014-09-26 2015-01-28 大连理工大学 Three-dimensional head-related impulse response data compressing method based on canonical multi-decomposition
CN105844653A (en) * 2016-04-18 2016-08-10 深圳先进技术研究院 Multilayer convolution neural network optimization system and method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VADIM LEBEDEV ET AL.: "Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition", arXiv:1412.6553v3 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943750A (en) * 2017-11-14 2018-04-20 华南理工大学 A kind of decomposition convolution method based on WGAN models
CN110858323A (en) * 2018-08-23 2020-03-03 北京京东金融科技控股有限公司 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment
CN110858323B (en) * 2018-08-23 2024-07-19 京东科技控股股份有限公司 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
CN115719430A (en) * 2022-10-28 2023-02-28 河北舒隽科技有限公司 Method for identifying male and female of Taixing chick

Also Published As

Publication number Publication date
CN106778550B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
TWI721510B (en) Method, apparatus and storage medium for binocular image depth estimation
CN106778550B (en) Face detection method and device
Zhang et al. Hierarchical feature fusion with mixed convolution attention for single image dehazing
CN107292352B (en) Image classification method and device based on convolutional neural network
CN104463209B (en) Method for recognizing digital code on PCB based on BP neural network
WO2019119301A1 (en) Method and device for determining feature image in convolutional neural network model
WO2016197026A1 (en) Full reference image quality assessment based on convolutional neural network
JP2021509747A (en) Hardware-based pooling system and method
CN114004754B (en) Scene depth completion system and method based on deep learning
US20220083857A1 (en) Convolutional neural network operation method and device
CN110874636A (en) Neural network model compression method and device and computer equipment
US11275966B2 (en) Calculation method using pixel-channel shuffle convolutional neural network and operating system using the same
CN111695624B (en) Updating method, device, equipment and storage medium of data enhancement strategy
Jiang et al. Learning a referenceless stereopair quality engine with deep nonnegativity constrained sparse autoencoder
US9058541B2 (en) Object detection method, object detector and object detection computer program
Sun et al. Learning local quality-aware structures of salient regions for stereoscopic images via deep neural networks
Gastaldo et al. Machine learning solutions for objective visual quality assessment
CN111814884A (en) Target detection network model upgrading method based on deformable convolution
CN116543433A (en) Mask wearing detection method and device based on improved YOLOv7 model
CN111814820A (en) Image processing method and device
CN109978928B (en) Binocular vision stereo matching method and system based on weighted voting
CN109934775B (en) Image processing, model training, method, device and storage medium
CN113888597A (en) Target tracking identification method and system based on lightweight target tracking network
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
EP4348510A1 (en) Convolution with kernel expansion and tensor accumulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant