CN111951373B - Face image processing method and equipment - Google Patents

Face image processing method and equipment

Info

Publication number
CN111951373B
CN111951373B · CN202010623139.7A
Authority
CN
China
Prior art keywords
face image
matrix
neural network
preset
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010623139.7A
Other languages
Chinese (zh)
Other versions
CN111951373A (en)
Inventor
徐博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Spiritplume Interactive Entertainment Technology Co ltd
Original Assignee
Chongqing Spiritplume Interactive Entertainment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Spiritplume Interactive Entertainment Technology Co ltd filed Critical Chongqing Spiritplume Interactive Entertainment Technology Co ltd
Priority to CN202010623139.7A priority Critical patent/CN111951373B/en
Publication of CN111951373A publication Critical patent/CN111951373A/en
Application granted granted Critical
Publication of CN111951373B publication Critical patent/CN111951373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face image processing method and device. A preset neural network model is generated in advance from training data and a preset neural network structure, the preset neural network structure comprising a convolutional neural network block model and convolution kernels. The method comprises: receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and a normal map; and generating a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters. Because the illumination information of the face image is obtained through the preset neural network model, no dedicated light detection equipment is needed, so the accuracy and stability of de-lighting the face image are improved without increasing cost.

Description

Face image processing method and equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and apparatus for processing a face image.
Background
When a face is scanned with a camera for 3D face reconstruction, the image information in the photo needs to be acquired, made into a UV map (a texture map addressed by UV coordinates), and attached to a 3D face mesh for display in a 3D scene. The image in the photo contains rich light information from the shooting environment, while a typical 3D scene additionally requires simulated directional and point light sources. If the face image in the photo is not de-lit, the photo lighting and the scene lighting are superimposed, which seriously degrades the appearance of the face in the 3D scene.
One precondition for removing light from face images in photos is that the illumination information at the position of the face can be captured at shooting time. The illumination information includes ambient light, directional light, spot light, and the like. The most direct approach in the prior art is a light detection device placed at the position of the face to collect the illumination information. Current mobile phones, however, are limited by technology and cost: most have no light-detection sensor, and those that do lack sufficient accuracy.
The prior art also modifies large areas of the original image's pixels through global histogram adjustment and gamma correction. However, this scheme can only change the overall brightness; it cannot effectively remove shadow and highlight areas.
There are also prior art schemes that perform iterative gamma correction after shadow-based object detection. However, this scheme is limited by the accuracy of shadow detection and is not stable enough, because no proper threshold can be determined for identifying shadows and setting the gamma-correction coefficient.
Therefore, how to improve the accuracy and stability of de-lighting face images without increasing cost is a technical problem to be solved.
Disclosure of Invention
In view of the defects of prior-art face image de-lighting methods, namely high cost, insufficient precision and poor stability, the invention provides a face image processing method in which a preset neural network model is generated in advance based on training data and a preset neural network structure. The method comprises the following steps:
receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and a normal map;
generating a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters;
wherein the training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image, and a real normal map.
Preferably, generating the de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters specifically comprises:
acquiring a first matrix corresponding to the face image to be processed;
acquiring, from the illumination parameters, a second matrix corresponding to the spherical harmonic illumination coefficients and a third matrix corresponding to the normal map;
and acquiring a fourth matrix according to the first matrix, the second matrix and the third matrix, and acquiring the de-lit image based on the fourth matrix.
Preferably, the fourth matrix is obtained according to the first matrix, the second matrix and the third matrix, specifically:
the fourth matrix is obtained according to a de-lighting formula: A = B / (C × D),
where A is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix.
Preferably, the convolutional neural network block model is a residual network block model, wherein a preset number of residual network block models are not connected to the fully connected layer of the preset neural network structure.
Preferably, the training data is data subjected to data enhancement processing, and the data enhancement processing includes increasing the background of the preset face image and/or changing the rotation angle of the preset face image.
Preferably, the preset neural network model is generated based on training data and a preset neural network structure, specifically:
determining initial parameters of a preset neural network structure according to the length and the width of the preset face image, wherein the initial parameters comprise the number of units of an input layer, the input number and the output number of each hidden layer and an initial weight value;
inputting the preset face image into the input layer, and determining an output layer result based on a forward propagation algorithm and the initial parameters;
determining a loss function according to the output layer result and the training data;
training according to a preset learning rate based on an optimization algorithm and a back propagation algorithm, and determining a minimum loss value of the loss function according to the training result, wherein the preset learning rate is a learning rate determined by the adaptive moment estimation (Adam) algorithm;
and determining the preset neural network model according to the weight value corresponding to the minimum loss value.
Correspondingly, the invention also provides a processing device of the face image, which comprises:
the acquisition module is used for receiving the face image to be processed and acquiring illumination parameters of the face image to be processed based on a preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and a normal map, and the preset neural network model is generated in advance based on training data and a preset neural network structure;
the generating module is used for generating a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters;
the training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image and a real normal map.
Preferably, the generating module is specifically configured to:
acquiring a first matrix corresponding to the face image to be processed;
acquiring, from the illumination parameters, a second matrix corresponding to the spherical harmonic illumination coefficients and a third matrix corresponding to the normal map;
and acquiring a fourth matrix according to the first matrix, the second matrix and the third matrix, and acquiring the de-lit image based on the fourth matrix.
Preferably, the generating module is further configured to:
the fourth matrix is obtained according to a de-lighting formula: A = B / (C × D),
where A is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix.
Preferably, the convolutional neural network block model is a residual network block model, wherein a preset number of residual network block models are not connected to the fully connected layer of the preset neural network structure.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a processing method and equipment of a face image, which are used for generating a preset neural network model in advance based on training data and a preset neural network structure, wherein the preset neural network structure comprises a convolution neural network block model and a convolution kernel, and the method comprises the following steps: receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and normal mapping; and generating a dimming image of the face image to be processed according to the face image to be processed and the illumination parameter. The illumination information of the face image is obtained through the preset neural network model, and special light detection equipment is not needed, so that the accuracy and stability of deglazing the face image are improved on the basis of not improving the cost, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a face image processing method according to an embodiment of the present invention;
fig. 2 is a flow chart illustrating a face image processing method according to another embodiment of the present invention;
FIG. 3 shows a schematic diagram of a set of training data in an embodiment of the invention;
FIG. 4 is a schematic diagram of a preset neural network according to an embodiment of the present invention;
fig. 5 shows a before-and-after comparison of face image de-lighting in an embodiment of the present invention;
fig. 6 shows a schematic structural diagram of a face image processing device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
As described in the background, prior-art methods for removing light from the face image in a photo suffer from high cost, insufficient precision and poor stability.
To solve these problems, an embodiment of the present invention provides a face image processing method which generates a preset neural network model in advance based on training data and a preset neural network structure, acquires the illumination parameters of the face image to be processed based on the preset neural network model, and generates a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters. Because the illumination information of the face image is obtained through the preset neural network model, no dedicated light detection equipment is needed, so the accuracy and stability of de-lighting the face image are improved without increasing cost.
Fig. 1 is a schematic flow chart of a face image processing method according to an embodiment of the present invention, where the method generates a preset neural network model in advance based on training data and a preset neural network structure.
The method comprises the following steps:
s101, receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model.
In a specific implementation scenario, the face image to be processed may be received from various sources such as a mobile phone camera. After it is received, its illumination parameters need to be obtained in order to de-light it.
The spherical harmonic illumination coefficients are in fact a multi-dimensional set of coefficients obtained by sampling the ambient light; during rendering, these coefficients are used to reconstruct the illumination. They can be regarded as a simplification of the ambient light, which in turn simplifies the computation.
A normal map stores the surface normal at each point of the bumpy surface of the original object, with the normal direction encoded in the RGB color channels. When a light source is applied at a particular position, a low-detail surface can then produce the accurate, high-detail illumination directions and reflection effects.
The illumination parameters such as the spherical harmonic illumination coefficients and the normal map are acquired in order to analyze the illumination information in the face image to be processed, so that a de-lit image can be further obtained and the de-lighting of the face image to be processed is realized.
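The patent does not spell out how the spherical harmonic coefficients and normal map combine into per-pixel lighting. As an illustration under common rendering conventions (second-order real spherical harmonics, 9 basis functions per RGB channel, matching the 27-dimensional coefficient described later), a shading image could be computed roughly as follows; all names and shapes here are assumptions:

```python
import numpy as np

# Sketch: evaluate 2nd-order (9-basis) real spherical harmonics on a
# per-pixel normal map and combine with 27 coefficients (9 per RGB
# channel) to get a per-pixel shading image.
def sh_basis(n):
    """n: (..., 3) unit normals -> (..., 9) SH basis values."""
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    c = [0.282095, 0.488603, 1.092548, 0.315392, 0.546274]
    return np.stack([
        np.full_like(x, c[0]),            # l=0 (constant)
        c[1] * y, c[1] * z, c[1] * x,     # l=1
        c[2] * x * y, c[2] * y * z,       # l=2
        c[3] * (3 * z * z - 1),
        c[2] * x * z,
        c[4] * (x * x - y * y),
    ], axis=-1)

def shade(normals, sh_coeffs):
    """normals: (H, W, 3); sh_coeffs: (3, 9) -> (H, W, 3) shading."""
    basis = sh_basis(normals)                   # (H, W, 9)
    return np.einsum('hwk,ck->hwc', basis, sh_coeffs)

normals = np.zeros((2, 2, 3)); normals[..., 2] = 1.0  # all facing +z
coeffs = np.zeros((3, 9)); coeffs[:, 0] = 1.0         # ambient-only light
shading = shade(normals, coeffs)
```

With only the constant (l = 0) coefficient set, every pixel receives the same ambient shading value, regardless of its normal.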
The preset neural network model is generated in advance based on training data and a preset neural network structure.
The training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image and a real normal map. The real spherical harmonic illumination coefficient and the real normal map are the real values of the spherical harmonic illumination coefficient and the normal map of the preset face image.
To ensure the training effect of the training data on the preset neural network model, in a preferred embodiment of the present application the training data is data subjected to data enhancement processing, where the data enhancement processing includes adding a background to the preset face image and/or changing the rotation angle of the preset face image.
In a specific implementation scenario, various backgrounds are unavoidable in face images. To improve the preset neural network model's ability to recognize a face against a complex background, a background image can be added to the preset face image for training: for example, an image is randomly selected from an atlas as the background, the background is rendered first, then the face image is rendered, and the two are composited. By changing the rotation angle of the preset face image during training, the preset neural network model learns to recognize faces at various angles.
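These two augmentations can be sketched roughly as below. The alpha-mask compositing and the 90-degree rotation step are simplifications chosen to keep the example dependency-free (a real pipeline would rotate by arbitrary angles); the function names are hypothetical:

```python
import numpy as np

# Sketch of the two augmentations: composite a face over a random
# background (background rendered first, face on top), then rotate.
rng = np.random.default_rng(1)

def composite(face, alpha, background):
    """Overlay face on background using a per-pixel alpha mask."""
    return alpha[..., None] * face + (1 - alpha[..., None]) * background

def augment(face, alpha, background, k):
    out = composite(face, alpha, background)
    return np.rot90(out, k)                 # change the rotation angle

face = np.ones((8, 8, 3))                   # toy "face" image
alpha = np.zeros((8, 8)); alpha[2:6, 2:6] = 1.0  # face occupies the center
background = rng.uniform(size=(8, 8, 3))
aug = augment(face, alpha, background, k=1)
```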
It should be noted that, the scheme of the above preferred embodiment is only one specific implementation scheme provided in the present application, and other ways of processing training data in order to increase the training effect of the neural network are all within the protection scope of the present application.
The preset neural network structure comprises a convolutional neural network block model and convolution kernels. A convolution kernel is the function by which, given an input image, each pixel of the output image is computed as a weighted average of the pixels in a small region of the input image; the weights are defined by the kernel.
In order for the preset neural network structure to obtain higher accuracy and recall, in a preferred embodiment of the present application the convolutional neural network block model is a residual network block model, where a preset number of residual network block models are not connected to the fully connected layer of the preset neural network structure.
In a specific implementation scenario, the convolutional neural network block model is chosen to be a residual network block model: residual networks are easy to optimize and can gain accuracy from considerably increased depth. The residual blocks inside the deep neural network use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth. To avoid the information loss caused by the fully connected layer, a preset number of residual network block models are not connected to the fully connected layer of the preset neural network structure; for example, 6 groups of residual network blocks skip the fully connected layer.
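The skip connection at the heart of a residual block can be illustrated with a minimal sketch (an assumed toy architecture, not the patent's exact network): the block computes F(x) + x, so the identity path carries signal and gradient even when F's layers would attenuate them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two linear layers with a skip connection: y = relu(x @ w1) @ w2 + x."""
    return relu(x @ w1) @ w2 + x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 8)) * 0.01     # tiny weights: F(x) ~ 0
w2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, w1, w2)
# With near-zero weights the block approximates the identity mapping,
# which is why residual networks remain trainable at great depth.
```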
It should be noted that the scheme of the above preferred embodiment is only one specific implementation provided in the present application: the preset number of residual network blocks that skip the fully connected layer may be determined according to the specific implementation scenario, and other types of convolutional neural network block models may also be selected so that the preset neural network structure obtains higher accuracy and recall. These modifications of the preset neural network structure all fall within the protection scope of the present application.
In order to obtain an accurate preset neural network model, in a preferred embodiment of the present application, the preset neural network model is generated based on training data and a preset neural network structure, specifically:
determining initial parameters of a preset neural network structure according to the length and the width of the preset face image, wherein the initial parameters comprise the number of units of an input layer, the input number and the output number of each hidden layer and an initial weight value;
inputting the preset face image into the input layer, and determining an output layer result based on a forward propagation algorithm and the initial parameters;
determining a loss function according to the output layer result and the training data;
training according to a preset learning rate based on an optimization algorithm and a back propagation algorithm, and determining a minimum loss value of the loss function according to the training result, wherein the preset learning rate is a learning rate determined by the adaptive moment estimation (Adam) algorithm;
and determining the preset neural network model according to the weight value corresponding to the minimum loss value.
In a specific implementation scenario, the number of units of the input layer is determined according to the length and width of the preset face image, the input and output counts of each hidden layer are set, and the weights are initialized randomly; the preset face image is then fed into the input layer, and the output-layer result is determined with these initial parameters through a forward propagation algorithm. Adam computes first- and second-moment estimates of the gradients to design an independent adaptive learning rate for each parameter, yielding an efficient training process. The back propagation algorithm, combined with the optimization algorithm, computes the gradient of the loss function with respect to all weights in the network; this gradient is fed back to the optimizer to update the weights and minimize the loss function. The minimum loss value is determined over thousands of iterations with learning-rate adjustment, and the weights saved at the iteration that attains the minimum loss become the weights of the preset neural network model.
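The Adam update described above can be sketched in a few lines. The hyperparameters are the commonly used defaults, assumed rather than taken from the patent, and the toy loss is chosen only to show the update converging:

```python
import numpy as np

# Adam: first- and second-moment estimates of the gradient give each
# weight its own effective step size.
def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3.0), m, v, t)
```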
It should be noted that, the scheme of the above preferred embodiment is only one specific implementation scheme provided in the present application, and other ways of generating the preset neural network model based on the training data and the preset neural network structure are all within the protection scope of the present application.
S102, generating a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters.
In a specific implementation scenario, after the illumination parameters of the face image to be processed are predicted by the preset neural network model, the de-lit image can be generated from the face image to be processed and the illumination parameters.
In order to improve the quality of the generated de-lit image, in a preferred embodiment of the present application, generating the de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters specifically comprises:
acquiring a first matrix corresponding to the face image to be processed based on the face image to be processed;
acquiring a second matrix corresponding to the spherical harmonic illumination coefficient and a third matrix corresponding to the normal map based on the illumination parameter;
and acquiring a fourth matrix according to the first matrix, the second matrix and the third matrix, and acquiring the de-lit image based on the fourth matrix.
In a specific implementation scenario, the face image to be processed, the spherical harmonic illumination coefficients and the normal map are converted into corresponding matrices to facilitate image-processing computation; the fourth matrix corresponding to the de-lit image is obtained by computing on the converted matrices, and the fourth matrix is then converted back to obtain the de-lit image.
It should be noted that the above preferred embodiment is only one specific implementation provided in the present application; other ways of generating the de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters all fall within the protection scope of the present application.
In order to ensure the accuracy of obtaining the de-lit image using matrices, in a preferred embodiment of the present application the fourth matrix is obtained according to the first matrix, the second matrix and the third matrix, specifically:
the fourth matrix is obtained according to a de-lighting formula: A = B / (C × D),
where A is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix.
In a specific implementation scenario, the converted matrices are processed with the above de-lighting formula to obtain the fourth matrix corresponding to the de-lit image; the illumination information in the face image to be processed is removed while sufficient detail is preserved. The above de-lighting formula is only a preferred formula provided by the present invention; under its teaching, other formulas can also be used to compute a de-lit image by matrix calculation, and such ways of obtaining a de-lit image by matrix calculation fall within the protection scope of the present application.
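A minimal per-pixel sketch of A = B / (C × D): the original image B is divided by the shading term (lighting term C times the normal-map term D), leaving the de-lit image A. The elementwise interpretation, the shapes, and the epsilon guard are assumptions for illustration only:

```python
import numpy as np

eps = 1e-6                                  # guard against division by zero
B = np.full((2, 2, 3), 0.5)                 # original image (first matrix)
C = np.full((2, 2, 3), 2.0)                 # per-pixel lighting term (second)
D = np.full((2, 2, 3), 0.5)                 # normal-derived term (third)
A = B / (C * D + eps)                       # de-lit image (fourth matrix)
```

Here C × D = 1 everywhere, so A recovers B: a pixel whose shading term is 1 is already unlit and passes through unchanged, while brighter-lit pixels (C × D > 1) are darkened toward their albedo.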
The invention discloses a face image processing method and device. A preset neural network model is generated in advance from training data and a preset neural network structure, the preset neural network structure comprising a convolutional neural network block model and convolution kernels. The method comprises: receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and a normal map; and generating a de-lit image of the face image to be processed according to the face image to be processed and the illumination parameters. Because the illumination information of the face image is obtained through the preset neural network model, no dedicated light detection equipment is needed, so the accuracy and stability of de-lighting the face image are improved without increasing cost.
In order to further explain the technical idea of the invention, the technical scheme of the invention is described with specific application scenarios.
The embodiment of the invention provides a face image processing method in which a neural network model is first trained on many groups (e.g., 400,000 groups) of data. Once a face photo is input, the model outputs a predicted 27-dimensional spherical harmonic illumination coefficient and a per-pixel normal map, from which the illumination information is computed. Each pixel of the original image is then combined with the illumination information to finally obtain the de-lit image. The implementation flow is shown in fig. 2; the specific steps are as follows:
the first step: a plurality of sets of training data for training a predetermined neural network model are collected. An example of a set of training data is shown in fig. 3. The training data is that accurate face photos are obtained in advance based on parameterized 3D face models, and spherical harmonic illumination coefficients and normal maps corresponding to the face photos. And taking the face photo as input data, and taking the spherical harmonic illumination coefficient and the normal map as the true value of output data. In order to make the preset neural network model have better generalization capability, data enhancement processing is also performed on training data, such as adding various backgrounds.
In a specific implementation scenario, training data may be collected based on a three-dimensional morphable face model (3DMM), using the BFM2017 database. With the morphable model built on a three-dimensional face database, face shape and face texture statistics serve as constraints, and the influence of face pose and illumination factors is considered at the same time, so the generated three-dimensional face model has high precision.
The second step: select a residual network block model to extract features from the face image. The preset neural network structure is shown in fig. 4. Residual networks are easy to optimize and can gain accuracy from considerably increased depth; the residual blocks inside the deep neural network use skip connections, alleviating the vanishing-gradient problem caused by added depth. Different convolution kernels are then added to extract features from pixel regions of different sizes, so that global and local key information points can be captured more accurately. Specifically, convolution kernels of shapes (3, 3), (3, 4) and (3, 5) are slid over the picture. Because a padding mechanism is used, this is independent of the image's pixel size and area. Each convolution uses one kernel to extract one set of features. To avoid the information loss caused by the fully connected layer, some residual network blocks skip the fully connected layer, yielding higher accuracy and recall.
In a specific implementation scenario, in order to extract features from the face image, the preset neural network structure must first be trained with the collected training data to generate the preset neural network model. This specifically includes the following steps:
1. Selecting a preset neural network structure
The number of units of the input layer is determined from the length and width of the face images in the training data. The input and output counts of each hidden layer are then set: the encoding stage uses input/output counts of (3, 64), (64, 128), (256, 256) and (256, 512), and the decoding stage uses (512, 256), (256, 256), (256, 64) and (64, 3).
2. Randomly initializing weights
The weight values in the neural network are initialized to small numbers near 0, but not exactly 0.
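A minimal sketch of such an initialization in NumPy; the 0.01 scale, the Gaussian distribution, and the exact-zero guard are illustrative assumptions rather than values taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape, scale=0.01):
    """Small random weights near (but never exactly) 0: breaks symmetry
    between units while keeping initial activations well-behaved."""
    w = rng.normal(0.0, scale, size=shape)
    w[w == 0] = scale          # guard against an exact zero (vanishingly unlikely)
    return w

# e.g. a (out_channels, in_channels, kh, kw) convolution weight tensor
W = init_weights((64, 3, 3, 3))
```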
3. Performing forward propagation FP algorithm
For forward propagation, the process can be expressed by the following formula:
a^n = σ(a^(n-1) * W^n + b^n)
where the superscript denotes the layer index, * denotes convolution, b is the bias term, σ is the activation function, and W is the weight value.
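The layer formula can be sketched directly in NumPy. Sigmoid is chosen here as an illustrative σ (the patent does not name the activation), and the example is single-channel with no padding:

```python
import numpy as np

def sigma(x):
    """Activation function σ (sigmoid, as an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward_layer(a_prev, W, b):
    """One forward-propagation layer: a_n = σ(a_{n-1} * W_n + b_n),
    with '*' the (valid, no-padding) convolution of the formula."""
    kh, kw = W.shape
    h = a_prev.shape[0] - kh + 1
    w = a_prev.shape[1] - kw + 1
    z = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            z[i, j] = np.sum(a_prev[i:i + kh, j:j + kw] * W) + b
    return sigma(z)

a0 = np.random.rand(8, 8)                       # activations of layer n-1
a1 = forward_layer(a0, np.random.rand(3, 3) * 0.1, b=0.0)
```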
4. Calculating a loss function
The calculation formula is as follows:
Loss(image) = λ1*E-RECON + λ2*E-Normal + (1-λ1-λ2)*E-Light
where image is the face image, E-RECON is the difference between the predicted dimming image and the original image, E-Normal is the difference between the predicted normal map and the normal map in the training data, E-Light is the difference between the predicted illumination and the illumination in the training data, λ1 = 0.3, and λ2 = 0.3.
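A sketch of the weighted loss above. The patent only names the three difference terms, so the mean-squared error used below for each term is an assumed metric, not the patent's necessarily:

```python
import numpy as np

def mse(pred, truth):
    """Mean-squared error -- one plausible choice for each difference term."""
    return float(np.mean((pred - truth) ** 2))

def total_loss(e_recon, e_normal, e_light, lam1=0.3, lam2=0.3):
    """Loss(image) = λ1*E-RECON + λ2*E-Normal + (1-λ1-λ2)*E-Light,
    with λ1 = λ2 = 0.3 as in the text (weights sum to 1)."""
    return lam1 * e_recon + lam2 * e_normal + (1 - lam1 - lam2) * e_light
```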
5. Minimizing loss function using optimization algorithm and back propagation algorithm
Stochastic gradient descent keeps a single learning rate for all weight updates, and that rate does not change during training. Adam instead computes first- and second-moment estimates of the gradients to give each parameter its own adaptive learning rate, yielding an efficient training process. The back propagation algorithm, combined with the optimization algorithm, computes the gradient of the loss function with respect to every weight value in the network; this gradient is fed back to the optimizer to update the weight values and minimize the loss function.
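One Adam update step written out in NumPy, to show the per-parameter adaptive rates built from the first (m) and second (v) moment estimates. The hyperparameter defaults are the commonly used ones, assumed rather than taken from the patent:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: unlike plain SGD's single global rate, the
    effective step for each weight is scaled by its own gradient moments."""
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize 0.5*||w||^2, whose gradient is simply w.
w = np.array([1.0, -1.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):
    w, m, v = adam_step(w, w, m, v, t)
# each weight shrinks toward 0 by roughly lr per step
```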
6. Preserving neural network values
After thousands of iterations and learning-rate adjustments, training ends when the loss value can no longer be reduced. The neural network weights saved at the iteration with the smallest loss value are used as the model for the usage stage.
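Keeping the weights from the lowest-loss iteration can be sketched as follows; the loss curve is illustrative and `weights` stands in for the network's real weight tensors:

```python
# Track the best (lowest-loss) checkpoint across training iterations.
best_loss, best_weights = float("inf"), None
for epoch, loss in enumerate([0.9, 0.5, 0.3, 0.35, 0.31]):  # illustrative curve
    weights = {"epoch": epoch}        # stand-in for the real weight tensors
    if loss < best_loss:
        best_loss, best_weights = loss, dict(weights)
# best_weights now holds the checkpoint from the smallest-loss iteration,
# which is what the usage stage loads.
```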
The third step: the trained preset neural network model can be packaged into a web service or an SDK (Software Development Kit) for applications to invoke. The mobile phone sends the face image captured by its camera to the preset neural network model and thereby obtains the spherical harmonic illumination coefficients and the normal map. The face image, the spherical harmonic illumination coefficients and the normal map are converted into matrices, and the dimming formula is then applied to obtain the dimming image matrix. Finally, the de-lighted face image is obtained, as shown in fig. 5.
Specifically, the picture is a 3-channel array of n×n pixels, i.e. an n×n×3 matrix. The spherical harmonic illumination coefficients form a 1×27 matrix; for an n×n picture, they are expanded to an n×n×27 matrix and then computed against the matrix of each channel of the picture.
Wherein, the dimming formula is specifically:
A=B/(C×D)
wherein A is a dimming image matrix, B is an original face image matrix, C is a spherical harmonic illumination coefficient matrix, and D is a normal map matrix.
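A sketch of A = B/(C×D) in NumPy, under the assumption that the 27 coefficients are 9 order-2 spherical-harmonic coefficients per RGB channel, evaluated against the normal map with the standard real SH basis (a common shading model); the per-pixel expansion and the `eps` division guard are illustrative assumptions:

```python
import numpy as np

def sh_basis(normals):
    """First 9 real spherical-harmonic basis values per pixel (order-2 SH);
    the constants are the standard SH normalization factors."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    one = np.ones_like(x)
    return np.stack([
        0.282095 * one,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z ** 2 - 1),
        1.092548 * x * z,
        0.546274 * (x ** 2 - y ** 2),
    ], axis=-1)                                     # shape (h, w, 9)

def delight(image, sh_coeffs, normals, eps=1e-6):
    """A = B / (C x D): divide the image by the shading that the 27 SH
    coefficients (9 per RGB channel) and the normal map predict."""
    basis = sh_basis(normals)                       # (h, w, 9)
    coeffs = sh_coeffs.reshape(3, 9)                # 1x27 -> 9 per channel
    shading = np.einsum("hwk,ck->hwc", basis, coeffs)   # (h, w, 3)
    return image / (shading + eps)

h = w = 4
img = np.full((h, w, 3), 0.5)
normals = np.zeros((h, w, 3)); normals[..., 2] = 1.0    # flat frontal surface
coeffs = np.zeros(27); coeffs[0] = coeffs[9] = coeffs[18] = 1.0  # ambient only
albedo = delight(img, coeffs, normals)
```

With ambient-only lighting the shading is constant, so the recovered de-lighted image is the input scaled by one factor per channel.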
By applying this technical solution, a preset neural network model is generated in advance based on training data and a preset neural network structure, the illumination parameters of the face image to be processed are acquired with the preset neural network model, and a dimming image of the face image to be processed is generated from the face image and the illumination parameters. Because the illumination information of the face image is obtained through the preset neural network model, no special light-detection equipment is needed, so the accuracy and stability of de-lighting the face image are improved without increasing cost.
Corresponding to the processing method of the face image provided in the embodiment of the present application, the embodiment of the present application further provides a processing device of the face image, as shown in fig. 6, where the device includes:
the acquiring module 601 is configured to receive a face image to be processed, and acquire illumination parameters of the face image to be processed based on a preset neural network model, where the illumination parameters include a spherical harmonic illumination coefficient and a normal map, and the preset neural network model is generated in advance based on training data and a preset neural network structure;
a generating module 602, configured to generate a dimming image of the face image to be processed according to the face image to be processed and the illumination parameter;
the training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image and a real normal map.
In a specific application scenario of the present application, the generating module 602 is specifically configured to:
acquiring a first matrix corresponding to the face image to be processed based on the face image to be processed;
acquiring a second matrix corresponding to the spherical harmonic illumination coefficient and a third matrix corresponding to the normal map based on the illumination parameter;
and acquiring a fourth matrix according to the first matrix, the second matrix and the third matrix, and acquiring the dimming image based on the fourth matrix.
In a specific application scenario of the present application, the generating module 602 is further configured to:
the fourth matrix is obtained according to a dimming formula, wherein the dimming formula specifically comprises: a=b/(C x D),
wherein a is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix.
In a specific application scenario of the present application, the convolutional neural network block model is a residual network block model, where a preset number of residual network block models are not connected to a full connection layer of the preset neural network structure.
In a specific application scenario of the present application, the training data is data subjected to data enhancement processing, where the data enhancement processing includes increasing a background of the preset face image and/or changing a rotation angle of the preset face image.
In a specific application scenario of the present application, the apparatus further includes a training module, specifically configured to:
determining initial parameters of a preset neural network structure according to the length and the width of the preset face image, wherein the initial parameters comprise the number of units of an input layer, the input number and the output number of each hidden layer and an initial weight value;
inputting the preset face image into the input layer, and determining an output layer result based on a forward propagation algorithm and the initial parameters;
determining a loss function according to the output layer result and the training data;
training according to a preset learning rate based on an optimization algorithm and a back propagation algorithm, and determining a minimum loss value of the loss function according to a training result, wherein the preset learning rate is the learning rate determined by the adaptive moment estimation (Adam) algorithm;
and determining the preset neural network model according to the weight value corresponding to the minimum loss value.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (2)

1. A method for processing a face image, wherein a preset neural network model is generated in advance based on training data and a preset neural network structure, the method comprising:
receiving a face image to be processed, and acquiring illumination parameters of the face image to be processed based on the preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and normal mapping;
generating a dimming image of the face image to be processed according to the face image to be processed and the illumination parameter; specifically, a first matrix corresponding to the face image to be processed is obtained based on the face image to be processed; acquiring a second matrix corresponding to the spherical harmonic illumination coefficient and a third matrix corresponding to the normal map based on the illumination parameter; obtaining a fourth matrix according to the first matrix, the second matrix and the third matrix, specifically obtaining the fourth matrix according to a dimming formula, wherein the dimming formula specifically includes: A = B/(C×D), where A is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix; and acquiring the dimming image based on the fourth matrix;
the convolutional neural network block model is a residual network block model, wherein a preset number of residual network block models are not connected with a full-connection layer of the preset neural network structure; the training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image and a real normal map; the training data is data subjected to data enhancement processing, and the data enhancement processing comprises the steps of increasing the background of the preset face image and/or changing the rotation angle of the preset face image;
generating a preset neural network model based on training data and a preset neural network structure, wherein the method specifically comprises the following steps: determining initial parameters of a preset neural network structure according to the length and the width of the preset face image, wherein the initial parameters comprise the number of units of an input layer, the input number and the output number of each hidden layer and an initial weight value; inputting the preset face image into the input layer, and determining an output layer result based on a forward propagation algorithm and the initial parameters; determining a loss function according to the output layer result and the training data; training according to a preset learning rate based on an optimization algorithm and a back propagation algorithm, and determining a minimum loss value of the loss function according to a training result, wherein the preset learning rate is the learning rate determined by the adaptive moment estimation (Adam) algorithm; and determining the preset neural network model according to the weight value corresponding to the minimum loss value.
2. A processing apparatus for face images, the apparatus comprising:
the acquisition module is used for receiving the face image to be processed and acquiring illumination parameters of the face image to be processed based on a preset neural network model, wherein the illumination parameters comprise spherical harmonic illumination coefficients and normal mapping, and the preset neural network model is generated in advance based on training data and a preset neural network structure;
the generating module is used for generating a dimming image of the face image to be processed according to the face image to be processed and the illumination parameter; the method is particularly used for acquiring a first matrix corresponding to the face image to be processed based on the face image to be processed; acquiring a second matrix corresponding to the spherical harmonic illumination coefficient and a third matrix corresponding to the normal map based on the illumination parameter; acquiring a fourth matrix according to the first matrix, the second matrix and the third matrix, and acquiring the dimming image based on the fourth matrix;
the generating module is further configured to obtain the fourth matrix according to a dimming formula, where the dimming formula specifically is: A = B/(C×D), where A is the fourth matrix, B is the first matrix, C is the second matrix, and D is the third matrix;
the training data comprises a preset face image, a real spherical harmonic illumination coefficient of the preset face image and a real normal map; the convolutional neural network block model is a residual network block model, wherein a preset number of residual network block models are not connected with a full-connection layer of the preset neural network structure.
CN202010623139.7A 2020-06-30 2020-06-30 Face image processing method and equipment Active CN111951373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010623139.7A CN111951373B (en) 2020-06-30 2020-06-30 Face image processing method and equipment


Publications (2)

Publication Number Publication Date
CN111951373A CN111951373A (en) 2020-11-17
CN111951373B true CN111951373B (en) 2024-02-13

Family

ID=73337864


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989473B (en) * 2021-12-23 2022-08-12 北京天图万境科技有限公司 Method and device for relighting
CN114677291B (en) * 2022-02-25 2023-05-12 荣耀终端有限公司 Image processing method, device and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016110030A1 (en) * 2015-01-09 2016-07-14 杭州海康威视数字技术股份有限公司 Retrieval system and method for face image
CN107909640A (en) * 2017-11-06 2018-04-13 清华大学 Face weight illumination method and device based on deep learning
CN108334847A (en) * 2018-02-06 2018-07-27 哈尔滨工业大学 A kind of face identification method based on deep learning under real scene
CN110874632A (en) * 2018-08-31 2020-03-10 北京嘉楠捷思信息技术有限公司 Image recognition processing method and device
CN111028273A (en) * 2019-11-27 2020-04-17 山东大学 Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111091492A (en) * 2019-12-23 2020-05-01 韶鼎人工智能科技有限公司 Face image illumination migration method based on convolutional neural network
CN111275651A (en) * 2020-02-25 2020-06-12 东南大学 Face bright removal method based on antagonistic neural network


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Registration of 3D facial surfaces using covariance matrix pyramids; Moritz Kaiser et al.; 2010 IEEE International Conference on Robotics and Automation; 1-15 *
A linear reconstruction method for face images under canonical illumination; Xiong Pengfei et al.; Pattern Recognition and Artificial Intelligence (04); 1-4 *
Analysis of illumination processing methods in face recognition; Liu Dujin et al.; Computer Systems & Applications (1); 1-4 *
A CycleGAN-based illumination normalization method for unpaired face images; Zeng Bi et al.; Journal of Guangdong University of Technology (05); 1-5 *
GAN-based illumination transfer for face images; Ning Ning et al.; Journal of Beijing Electronic Science and Technology Institute (04); 1-4 *
Face illumination compensation based on linear subspace and quotient image theory; Liu Lihua; Computer Engineering and Applications (25); 1-3 *
Accurate estimation of illumination direction in face images using classical illumination models; Chen Xiaogang et al.; Computer Engineering and Applications (11); 1-3 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant