CN110334679B - Face point processing method and device

Face point processing method and device

Info

Publication number
CN110334679B
Authority
CN
China
Prior art keywords
face
face point
sampling
sampling unit
network
Prior art date
Legal status
Active
Application number
CN201910627394.6A
Other languages
Chinese (zh)
Other versions
CN110334679A (en)
Inventor
陈良
余清洲
苏晋展
张伟
许清泉
Current Assignee
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201910627394.6A
Publication of CN110334679A
Application granted
Publication of CN110334679B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G06V 40/165 - Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V 40/168 - Feature extraction; Face representation


Abstract

The disclosure provides a face point processing method and device, relating to the technical field of image processing. Based on a face point fitting network, a facial feature image to be detected is down-sampled and face point feature information is extracted to obtain face point data; based on a segmentation network, the face point feature information is up-sampled to obtain a facial feature mask. By combining the face point fitting network with the segmentation network, the method and device process a facial feature image and output both the face point data and the facial feature mask at the same time, which saves time cost and improves precision.

Description

Face point processing method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a face point.
Background
Face point alignment has very wide application in production: AR (Augmented Reality) materials and makeup can be added to a face, and it can even assist in building a 3D (three-dimensional) model of the face. In practical application scenarios, face points are mostly obtained by regressing each frame; a mask is then fitted from the differences between the face points of successive frames, and makeup is applied to the facial feature regions according to the mask. However, fitting a mask in this way has a high time cost and low accuracy.
Disclosure of Invention
Based on the research, the present disclosure provides a face point processing method and apparatus.
Embodiments of the present disclosure may be implemented as follows:
in a first aspect, the disclosed embodiments provide a face point processing method, which is applied to an electronic device, where the electronic device stores a face point fitting network and a segmentation network; the method comprises the following steps:
based on the face point fitting network, performing down-sampling on a face facial feature image to be detected, and extracting face point characteristic information to obtain face point data;
and based on the segmentation network, performing up-sampling on the face point characteristic information to obtain a face facial feature mask.
Further, the face point fitting network and the segmentation network are obtained by training through the following steps:
based on a face point fitting network to be trained, performing down-sampling on a facial feature picture, and extracting first face point characteristic information to obtain first face point data;
based on a segmentation network to be trained, performing up-sampling on the first face point feature information to obtain a first face facial feature mask;
and according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
Further, the step of adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm based on a preset loss function according to the first facial feature mask and the first facial point data by combining sample data obtained in advance includes:
adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data;
obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information;
and adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
Further, the preset second loss function is a cross entropy loss function, and the preset first loss function is:
w_i = 1, if I_{x_i, y_i} > R; w_i = 0, otherwise
Loss(l, L) = (1/N) * Σ_{i=1}^{N} w_i * Euclidean(l_i, L_i)
where l_i denotes the coordinates of each face point in the face point data; L_i denotes the coordinates of each face point in the sample data; I_{x_i, y_i} is the pixel of the facial feature mask at that face point; R is the threshold for judging whether the pixel is visible; N is the number of samples; and i is any one sample of the sample data.
Further, the face point fitting network comprises a plurality of down-sampling units, the output size of each down-sampling unit is 1/C of the output size of the previous down-sampling unit, and C is a positive integer.
Further, the segmentation network comprises a plurality of up-sampling units, each up-sampling unit is arranged symmetrically to a down-sampling unit, the output size of each up-sampling unit is C times the output size of the previous up-sampling unit, and C is a positive integer.
Further, the step of performing upsampling on the face point feature information based on the segmentation network to obtain a face facial feature mask includes:
for each up-sampling unit, fusing the output of the up-sampling unit with the output of the corresponding down-sampling unit to obtain fused feature information;
and for each up-sampling unit, performing up-sampling on the fused feature information output by the previous up-sampling unit.
In a second aspect, the disclosed embodiment provides a face point processing apparatus, which is applied to an electronic device, where the electronic device stores a face point fitting network and a segmentation network; the human face point processing device comprises an extraction module and a segmentation module;
the extraction module is used for performing down-sampling on a facial feature image to be detected based on the facial point fitting network, and extracting facial point characteristic information to obtain facial point data;
and the segmentation module is used for performing up-sampling on the face point characteristic information based on the segmentation network to obtain a face facial feature mask.
Further, the face point processing apparatus further includes a training module, and the training module is configured to:
based on a face point fitting network to be trained, performing down-sampling on a facial feature picture, and extracting first face point characteristic information to obtain first face point data;
based on a segmentation network to be trained, performing up-sampling on the first face point feature information to obtain a first face facial feature mask;
and according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
Further, the preset loss function includes a preset first loss function and a preset second loss function, and the training module is further configured to:
adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data;
obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information;
and adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
According to the face point processing method and device, the face point fitting network and the segmentation network are combined to process the facial feature image, so that the face point data and the facial feature mask are output together and time cost is saved. Compared with the prior art, in which the mask is fitted from the per-frame face point differences, using the face point fitting network together with the segmentation network improves the accuracy of the facial feature mask.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present disclosure and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings may be obtained from the drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device provided in the present disclosure.
Fig. 2 is a schematic flow chart of a face point processing method provided by the present disclosure.
Fig. 3 is another schematic flow chart of the face point processing method provided by the present disclosure.
Fig. 4 is a schematic diagram illustrating a principle of the face point processing method according to the present disclosure.
Fig. 5 is a schematic flow chart of a face point processing method provided by the present disclosure.
Fig. 6 is a schematic diagram of a network training process of the face point processing method provided by the present disclosure.
Fig. 7 is a schematic flow chart of a face point processing method according to the present disclosure.
Fig. 8 is a block diagram of a face point processing apparatus provided in the present disclosure.
Reference numerals: 100-an electronic device; 10-a face point processing device; 11-an extraction module; 12-a segmentation module; 13-a training module; 20-a memory; 30-a processor; 40-a communication module.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features in the embodiments of the present disclosure may be combined with each other without conflict.
Face point alignment has very wide application in production: AR materials and makeup can be added to a face, and it can even assist in building a 3D model of the face. Nowadays, convolutional neural networks are widely applied in all kinds of task scenarios, and face alignment is no exception.
In real-time scenarios, face points are mostly obtained by regressing each frame; the face points are then provided for cropping the next frame so as to achieve tracking, and at the same time a mask is fitted from the differences between the face points of successive frames and used for processing such as applying makeup to the facial feature regions. However, because face points are discrete by nature, this mask fitting method has limitations in practical applications such as makeup; moreover, a mask fitted in this way cannot correctly handle the case where the facial features are partially occluded, its accuracy is low, and its time and computation costs are high.
Based on the above research, the present disclosure provides a method and an apparatus for processing a face point, so as to improve the above problem.
Referring to fig. 1, the face point processing method provided by the present disclosure is applied to an electronic device 100, and the electronic device 100 executes the face point processing method provided by the present disclosure.
The electronic device 100 includes the human face point processing apparatus 10, the memory 20, the processor 30 and the communication module 40 shown in fig. 1, and the respective elements of the memory 20, the processor 30 and the communication module 40 are electrically connected to each other directly or indirectly to implement data transmission or interaction. For example, the components may be directly electrically connected to each other via one or more communication buses or signal lines. The face point processing device 10 includes at least one software functional module which can be stored in the memory 20 in the form of software or Firmware (Firmware), and the processor 30 executes various functional applications and data processing by running software programs and modules stored in the memory 20.
The Memory 20 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 30 may be an integrated circuit chip having signal processing capabilities. The Processor 30 may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like.
The communication module 40 is configured to establish a communication connection between the electronic device 100 and another external device through a network, and perform data transmission through the network.
It is to be understood that the configuration shown in fig. 1 is merely exemplary, and that the electronic device 100 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
In the present disclosure, the electronic device 100 may be, but is not limited to, a device having a processing capability, such as a Personal Computer (PC), a notebook Computer, a Personal Digital Assistant (PDA), or a server.
Referring to fig. 2, the present disclosure provides a face point processing method applicable to the electronic device 100. Wherein the method steps defined by the method related flows may be implemented by the processor 30. The specific process shown in fig. 2 will be described in detail below.
Step S10: and based on the face point fitting network, performing down-sampling on the facial features image to be detected, extracting the facial point characteristic information, and obtaining the facial point data.
Step S20: and based on the segmentation network, performing up-sampling on the face point characteristic information to obtain a face facial feature mask.
The facial feature image to be detected is obtained from a face-alignment facial feature cascade model: a full-face picture is input into the cascade model to obtain the full-face face points, namely the facial key points, and the facial feature images, such as a mouth image or an eye image, are then located and cropped according to these points. A facial feature image cropped according to the full-face face point locations is an upright, well-aligned image.
After the facial feature image to be detected is obtained, it is input into the face point fitting network, which down-samples the image and extracts the face point feature information to obtain the face point data. At the same time, based on the segmentation network, the face point feature information output by the down-sampling is up-sampled to recover the image information and obtain the facial feature mask, whose size is the same as that of the facial feature image to be detected.
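As an illustrative sketch only (PyTorch is assumed here; the patent does not prescribe an implementation, and the module names FitNet/SegNet-style wrappers below are hypothetical), the joint forward pass described above might be organized as follows:

    import torch.nn as nn

    class FacePointModel(nn.Module):
        """Joint model: the fitting network down-samples and regresses face
        points; the segmentation network up-samples the same features into a
        facial feature mask of the input size."""
        def __init__(self, fit_net: nn.Module, seg_net: nn.Module):
            super().__init__()
            self.fit_net = fit_net  # face point fitting network (down-sampling)
            self.seg_net = seg_net  # segmentation network (up-sampling)

        def forward(self, image):
            # Down-sample and extract the face point feature information; the
            # deepest feature maps are regressed to face point coordinates.
            feature_maps, face_points = self.fit_net(image)
            # Up-sample the feature information to recover the mask.
            mask = self.seg_net(feature_maps)
            return face_points, mask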
Further, in the present disclosure, the face point fitting network includes a plurality of down-sampling units, the output size of each down-sampling unit is 1/C of the output size of the previous down-sampling unit, and C is a positive integer.
Following mainstream network design practice, the down-sampling part of the face point fitting network may adopt a ShortCut (residual) structure to form a plurality of down-sampling units, where the output of each down-sampling unit is obtained by down-sampling the output of the previous down-sampling unit. For example, after a facial feature picture is input into down-sampling unit A1 and down-sampled, first feature information is output; the first feature information is taken as the input of down-sampling unit A2 and down-sampled to output second feature information; the second feature information is taken as the input of down-sampling unit A3 and down-sampled; and so on. After all the down-sampling units have been applied, the face point feature information of the facial feature picture is obtained.
In order to obtain both the face point data and the facial feature mask, the present disclosure requires the output size of each down-sampling unit to be 1/C of the output size of the previous down-sampling unit. For example, during down-sampling in the face point fitting network, a facial feature picture is down-sampled by down-sampling unit A1 to obtain first feature information, and the first feature information is down-sampled by down-sampling unit A2 to obtain second feature information; because the output size of down-sampling unit A2 is designed to be 1/C of the output size of down-sampling unit A1, the size of the second feature information is reduced to 1/C of that of the first feature information. Optionally, in the present disclosure, C is 2.
In the present disclosure, a ShortCut structure is adopted to form the plurality of down-sampling units, and each down-sampling unit completes its down-sampling through convolution operations. The number and size of the convolution kernels can be set according to the actual situation, and the disclosure is not limited in this respect. After the facial feature image to be detected has passed through the convolution operations of each down-sampling unit, the face point data is regressed in the fully connected layer of the face point fitting network.
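A minimal sketch of one such down-sampling unit, assuming PyTorch, C = 2 (a stride-2 convolution) and illustrative channel counts; the ShortCut branch is the residual connection named above:

    import torch.nn as nn

    class DownSamplingUnit(nn.Module):
        """One down-sampling unit with a ShortCut (residual) branch; C = 2."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(out_ch),
            )
            # ShortCut branch: 1x1 stride-2 convolution so both branches match.
            self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Output spatial size is 1/2 of the input size (1/C with C = 2).
            return self.relu(self.body(x) + self.shortcut(x))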
Further, the segmentation network comprises a plurality of up-sampling units, each up-sampling unit is arranged symmetrically to a down-sampling unit, the output size of each up-sampling unit is C times the output size of the previous up-sampling unit, and C is a positive integer.
The segmentation network also adopts a ShortCut structure to form the plurality of up-sampling units; each up-sampling unit is arranged symmetrically to a down-sampling unit, and the output size of each up-sampling unit is C times that of the previous up-sampling unit. If the face point fitting network performs down-sampling n times, the segmentation network performs up-sampling n times; and if the output size of each down-sampling unit is 1/2 of the output size of the previous down-sampling unit, then the output size of each up-sampling unit is 2 times the output size of the previous up-sampling unit.
Further, referring to fig. 3, the step of upsampling the facial point feature information based on the segmentation network to obtain a facial feature mask includes steps S21 to S22.
Step S21: and for each up-sampling unit, fusing the output of the up-sampling unit with the output of the corresponding down-sampling unit to obtain fused feature information.
Step S22: and for each up-sampling unit, performing up-sampling on the fused feature information output by the previous up-sampling unit.
After the facial feature image has been down-sampled repeatedly, much of its shallow information is lost, which is not conducive to the learning of the up-sampling part. Therefore, feature fusion needs to be carried out between each down-sampling unit and the corresponding up-sampling unit, so that the shallow information can flow through to the up-sampling part.
For each up-sampling unit, the output of the corresponding down-sampling unit is passed through a size-invariant convolution and then fused with the output of that up-sampling unit to obtain fused feature information. The fused feature information is then taken as the input of the next up-sampling unit and up-sampled. For example, in fig. 4, up-sampling unit B3 corresponds to down-sampling unit A3, up-sampling unit B2 corresponds to down-sampling unit A2, and up-sampling unit B1 corresponds to down-sampling unit A1. The output of down-sampling unit A3 is taken as the input of up-sampling unit B3 and up-sampled by B3; the output of A3 is passed through a size-invariant convolution and fused with the output of B3 to obtain fused feature information b1. The fused feature information b1 is taken as the input of up-sampling unit B2 and up-sampled by B2; the output of A2 is passed through a size-invariant convolution and fused with the output of B2 to obtain fused feature information b2, which is taken as the input of up-sampling unit B1. By analogy, after up-sampling by the last up-sampling unit, the features fused by the last up-sampling unit are input into the softmax layer of the segmentation network for classification, and the facial feature mask is obtained.
Optionally, the feature fusion may be performed by Concat fusion or Alpha fusion.
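Under the assumption of Concat fusion and C = 2, one up-sampling unit with its fusion step could be sketched as follows (channel counts are illustrative; the skip branch carries the size-invariant convolution described above):

    import torch
    import torch.nn as nn

    class UpSamplingUnit(nn.Module):
        """One up-sampling unit (C = 2) with Concat fusion of the mirrored
        down-sampling unit's output."""
        def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
            # Size-invariant convolution applied to the down-sampling output.
            self.skip_conv = nn.Conv2d(skip_ch, out_ch, kernel_size=3, padding=1)
            self.fuse = nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1)

        def forward(self, x, skip):
            x = self.up(x)                       # up-sample: output size is 2x
            skip = self.skip_conv(skip)          # size stays unchanged
            fused = torch.cat([x, skip], dim=1)  # Concat fusion
            return self.fuse(fused)              # fused feature information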
According to the face point processing method and device, on the basis of the face-alignment facial feature cascade model and without changing the structure of the face point fitting network, the segmentation network is combined with the face point fitting network, so that when a facial feature image is processed, the face point data and the facial feature mask can be output simultaneously; this saves time and computation cost and greatly improves real-time segmentation performance.
Further, please refer to fig. 5 and fig. 6 in combination. Fig. 6 is a schematic diagram of the network training process provided by the present disclosure; it only illustrates the training process for the mouth part of a face, and it can be understood that the training processes for the other facial features may also refer to the training process shown in fig. 6. The face point fitting network and the segmentation network are obtained by training through the following steps:
step S30: and performing down-sampling on the facial features picture based on a facial point fitting network to be trained, and extracting first facial point characteristic information to obtain first facial point data.
Step S40: and performing up-sampling on the first face point characteristic information based on a segmentation network to be trained to obtain a first face facial feature mask.
The process of obtaining the first face point data and the first face facial mask may refer to the processes of step S10 to step S22.
Step S50: and according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
Further, please refer to fig. 7 in combination, where the preset loss function includes a preset first loss function and a preset second loss function, and the step of adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained by using a back propagation algorithm based on the preset loss function by combining the sample data obtained in advance according to the first face facial feature mask and the first face point data includes steps S51 to S53.
Step S51: and adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data.
Wherein, the preset first loss function is:
w_i = 1, if I_{x_i, y_i} > R; w_i = 0, otherwise
Loss(l, L) = (1/N) * Σ_{i=1}^{N} w_i * Euclidean(l_i, L_i)
where l_i denotes the coordinates of each face point in the face point data; L_i denotes the coordinates of each face point in the sample data; I_{x_i, y_i} is the pixel of the facial feature mask at that face point; R is the threshold for judging whether the pixel is visible; N is the number of samples; i is any one sample of the sample data; Loss(l, L) is the loss between the sample data and the face point data; and Euclidean is the Euclidean distance function.
According to whether each pixel in the first facial feature mask is visible, a corresponding weight is assigned in the preset first loss function: if the pixel I_{x_i, y_i} in the first facial feature mask is below the visibility threshold, the assigned weight is 0; if it is above the visibility threshold, the assigned weight is 1.
After the weights of the preset first loss function have been assigned through the first facial feature mask, the loss value between the sample data and the face point data is calculated with the Euclidean distance function, the loss value is propagated backwards through the back propagation algorithm, and the weights of the face point fitting network to be trained are thereby adjusted.
In the present disclosure, the coordinates of each face point in the sample data are obtained by pre-labeling; the sample data is the Ground Truth in fig. 6, the first face point data is the Prediction in fig. 6, and the Loss Func in fig. 6 is the preset first loss function. Because the first facial feature mask assigns the corresponding weights of the preset first loss function, the loss produced by the occluded part of the face points is masked out to a certain extent when the loss between the sample data and the face point data is calculated, which improves the precision of the face points and the robustness under occlusion.
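A sketch of this visibility-weighted point loss, assuming the reconstruction of the preset first loss function given above (0/1 weights read from the mask at each point, Euclidean distance per point); the function name and tensor layout are assumptions for illustration:

    import torch

    def weighted_face_point_loss(pred_points, gt_points, vis_mask, R):
        """pred_points, gt_points: (N, K, 2) point coordinates in pixels;
        vis_mask: (N, H, W) facial feature mask; R: visibility threshold."""
        # Read the mask value at each ground-truth point location.
        xs = gt_points[..., 0].long().clamp(0, vis_mask.shape[-1] - 1)  # (N, K)
        ys = gt_points[..., 1].long().clamp(0, vis_mask.shape[-2] - 1)
        batch = torch.arange(vis_mask.shape[0], device=vis_mask.device)[:, None]
        w = (vis_mask[batch, ys, xs] > R).float()  # weight 1 if visible, else 0
        # Euclidean distance per point, masked out where the point is occluded.
        dist = torch.linalg.norm(pred_points - gt_points, dim=-1)       # (N, K)
        return (w * dist).mean()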
Step S52: and obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information.
Step S53: and adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
After the weights of the face point fitting network have been adjusted, the extracted face point feature information changes accordingly, and so does the facial feature mask obtained from it. Therefore, second face point feature information is extracted based on the adjusted face point fitting network, and a second facial feature mask, i.e. the adjusted facial feature mask, is obtained according to the second face point feature information.
After the second facial feature mask is obtained, the weights of the segmentation network to be trained are adjusted through the back propagation algorithm based on the preset second loss function, until the outputs of the preset first loss function and the preset second loss function are both smaller than the preset threshold value.
Optionally, in this disclosure, the preset second loss function is a cross entropy loss function.
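Putting the two phases together, one hedged sketch of a joint training iteration (the optimizers, the foreground-probability read-out, and weighted_face_point_loss from the sketch above are all assumptions, not the patent's prescription):

    import torch.nn.functional as F

    def train_step(model, image, gt_points, gt_mask, opt_fit, opt_seg, R):
        # Phase 1: adjust the face point fitting network with the preset first
        # (visibility-weighted) loss; the mask only gates the loss here.
        points, mask_logits = model(image)
        vis_map = mask_logits.softmax(dim=1)[:, 1].detach()  # foreground prob.
        loss1 = weighted_face_point_loss(points, gt_points, vis_map, R)
        opt_fit.zero_grad()
        loss1.backward()
        opt_fit.step()

        # Phase 2: forward again with the adjusted fitting network, then adjust
        # the segmentation network with the preset second (cross entropy) loss.
        _, mask_logits = model(image)
        loss2 = F.cross_entropy(mask_logits, gt_mask)  # gt_mask: (N, H, W) long
        opt_seg.zero_grad()
        loss2.backward()
        opt_seg.step()
        return loss1.item(), loss2.item()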
In the present disclosure, the face point fitting network and the segmentation network are trained jointly: the first facial feature mask assigns the corresponding weights of the preset first loss function, masking out the loss produced by the occluded part of the face points, which improves the precision and the robustness under occlusion of both the face points and the facial feature mask. After training, the face point fitting network and the segmentation network are combined to process facial feature images, and the face point data and the facial feature mask can be output simultaneously, which saves time and computation cost and greatly improves real-time segmentation performance.
On the basis, please refer to fig. 8 in combination, an embodiment of the present disclosure provides a face point processing apparatus 10, which is applied to an electronic device 100, wherein the electronic device 100 stores a face point fitting network and a segmentation network; the face point processing device 10 includes an extraction module 11 and a segmentation module 12.
The extraction module 11 is configured to perform downsampling on a facial feature image to be detected based on the facial point fitting network, and extract facial point feature information to obtain facial point data.
The segmentation module 12 is configured to perform upsampling on the face point feature information based on the segmentation network to obtain a face facial feature mask.
Further, the face point processing apparatus 10 further includes a training module 13, and the training module 13 is configured to:
and performing down-sampling on the facial features picture based on a facial point fitting network to be trained, and extracting first facial point characteristic information to obtain first facial point data.
And performing up-sampling on the first face point characteristic information based on a segmentation network to be trained to obtain a first face facial feature mask.
And according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight values of the face point fitting network to be trained and the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
Further, the preset loss function includes a preset first loss function and a preset second loss function, and the training module 13 is further configured to:
and adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data.
And obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information.
And adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described face point processing apparatus 10 may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
In summary, the face point processing method and device provided by the disclosure down-sample the facial feature image to be detected based on the face point fitting network and extract the face point feature information to obtain the face point data, and up-sample the face point feature information based on the segmentation network to obtain the facial feature mask, so that the face point data and the facial feature mask are output simultaneously, which saves time cost and improves precision.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (7)

1. A face point processing method is applied to electronic equipment, wherein the electronic equipment stores a face point fitting network and a segmentation network; the method comprises the following steps:
based on the face point fitting network, performing down-sampling on a face facial feature image to be detected, and extracting face point characteristic information to obtain face point data;
based on the segmentation network, up-sampling the face point characteristic information to obtain a face facial feature mask;
the face point fitting network comprises a plurality of down-sampling units, the output size of each down-sampling unit is 1/C of the output size of the previous down-sampling unit, and C is a positive integer;
the segmentation network comprises a plurality of up-sampling units, each up-sampling unit is arranged symmetrically to a down-sampling unit, the output size of each up-sampling unit is C times the output size of the previous up-sampling unit, and C is a positive integer;
the step of up-sampling the face point feature information based on the segmentation network to obtain a face facial features mask comprises:
for each up-sampling unit, fusing the output of the up-sampling unit with the output of the corresponding down-sampling unit to obtain fused feature information;
for each up-sampling unit, performing up-sampling on the fused feature information output by the previous up-sampling unit;
the face points are face key points.
2. The method of claim 1, wherein the face point fitting network and the segmentation network are trained by:
based on a face point fitting network to be trained, performing down-sampling on a facial feature picture, and extracting first face point characteristic information to obtain first face point data;
based on a segmentation network to be trained, performing up-sampling on the first face point feature information to obtain a first face facial feature mask;
and according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
3. The method according to claim 2, wherein the preset loss function includes a preset first loss function and a preset second loss function, and the step of adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained by using a back propagation algorithm based on the preset loss function according to the first face facial feature mask and the first face point data in combination with pre-obtained sample data comprises:
adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data;
obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information;
and adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
4. The method according to claim 3, wherein the preset second loss function is a cross entropy loss function, and the preset first loss function is:
w_i = 1, if I_{x_i, y_i} > R; w_i = 0, otherwise
Loss(l, L) = (1/N) * Σ_{i=1}^{N} w_i * Euclidean(l_i, L_i)
wherein l_i denotes the coordinates of each face point in the face point data; L_i denotes the coordinates of each face point in the sample data; I_{x_i, y_i} is the pixel of the facial feature mask at that face point; R is the threshold for judging whether the pixel is visible; N is the number of samples; and i is any one sample of the sample data.
5. A face point processing device is applied to electronic equipment, wherein the electronic equipment stores a face point fitting network and a segmentation network; the human face point processing device comprises an extraction module and a segmentation module;
the extraction module is used for performing down-sampling on a facial feature image to be detected based on the facial point fitting network, and extracting facial point characteristic information to obtain facial point data;
the segmentation module is used for performing up-sampling on the face point characteristic information based on the segmentation network to obtain a face facial feature mask;
the face point fitting network comprises a plurality of down-sampling units, the output size of each down-sampling unit is 1/C of the output size of the previous down-sampling unit, and C is a positive integer;
the segmentation network comprises a plurality of up-sampling units, each up-sampling unit is arranged symmetrically to a down-sampling unit, the output size of each up-sampling unit is C times the output size of the previous up-sampling unit, and C is a positive integer;
the segmentation module is specifically configured to:
for each up-sampling unit, fusing the output of the up-sampling unit with the output of the corresponding down-sampling unit to obtain fused feature information;
for each up-sampling unit, performing up-sampling on the fused feature information output by the previous up-sampling unit;
the face points are face key points.
6. The apparatus according to claim 5, further comprising a training module, said training module being configured to:
based on a face point fitting network to be trained, performing down-sampling on a facial feature picture, and extracting first face point characteristic information to obtain first face point data;
based on a segmentation network to be trained, performing up-sampling on the first face point feature information to obtain a first face facial feature mask;
and according to the first face facial feature mask and the first face point data, combining sample data obtained in advance, based on a preset loss function, and adjusting the weight of the face point fitting network to be trained and the weight of the segmentation network to be trained through a back propagation algorithm until the output of the preset loss function is smaller than a preset threshold value.
7. The apparatus according to claim 6, wherein the preset loss function comprises a preset first loss function and a preset second loss function, and the training module is configured to:
adjusting the weight of the face point fitting network to be trained through a back propagation algorithm based on the preset first loss function according to the first face facial features mask, the first face point data and the sample data;
obtaining second face point characteristic information based on the adjusted face point fitting network, and obtaining a second face facial feature mask according to the second face point characteristic information;
and adjusting the weight of the segmentation network to be trained through a back propagation algorithm based on the preset second loss function according to the second face facial feature mask.
CN201910627394.6A 2019-07-11 2019-07-11 Face point processing method and device Active CN110334679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910627394.6A CN110334679B (en) 2019-07-11 2019-07-11 Face point processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910627394.6A CN110334679B (en) 2019-07-11 2019-07-11 Face point processing method and device

Publications (2)

Publication Number Publication Date
CN110334679A CN110334679A (en) 2019-10-15
CN110334679B true CN110334679B (en) 2021-11-26

Family

ID=68146483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910627394.6A Active CN110334679B (en) 2019-07-11 2019-07-11 Face point processing method and device

Country Status (1)

Country Link
CN (1) CN110334679B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827371B (en) * 2019-11-05 2023-04-28 厦门美图之家科技有限公司 Certificate generation method and device, electronic equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485214A (en) * 2016-09-28 2017-03-08 天津工业大学 A kind of eyes based on convolutional neural networks and mouth state identification method
CN108062543A (en) * 2018-01-16 2018-05-22 中车工业研究院有限公司 A kind of face recognition method and device
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109410240A (en) * 2018-10-09 2019-03-01 电子科技大学中山学院 Method and device for positioning volume characteristic points and storage medium thereof
CN109410220A (en) * 2018-10-16 2019-03-01 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN109727240A (en) * 2018-12-27 2019-05-07 深圳开立生物医疗科技股份有限公司 A kind of three-dimensional ultrasound pattern blocks tissue stripping means and relevant apparatus
CN109598261A (en) * 2019-01-18 2019-04-09 嘉兴学院 A kind of three-dimensional face identification method based on region segmentation
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image

Also Published As

Publication number Publication date
CN110334679A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN109508688B (en) Skeleton-based behavior detection method, terminal equipment and computer storage medium
CN110826519A (en) Face occlusion detection method and device, computer equipment and storage medium
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
US11354883B2 (en) Image processing method and apparatus, and electronic device
CN113792526B (en) Training method of character generation model, character generation method, device, equipment and medium
JP2023525462A (en) Methods, apparatus, electronics, storage media and computer programs for extracting features
AU2020294190B2 (en) Image processing method and apparatus, and electronic device
CN113591675A (en) Method, device and equipment for constructing image recognition model and storage medium
CN110334679B (en) Face point processing method and device
CN112991274A (en) Crowd counting method and device, computer equipment and storage medium
WO2020119058A1 (en) Micro-expression description method and device, computer device and readable storage medium
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium
CN114140320B (en) Image migration method and training method and device of image migration model
CN111489289A (en) Image processing method, image processing device and terminal equipment
CN114694209A (en) Video processing method and device, electronic equipment and computer storage medium
CN114429661A (en) Face three-dimensional cartoon method, device, equipment and medium based on local area supervision
CN115147434A (en) Image processing method, device, terminal equipment and computer readable storage medium
CN112069885A (en) Face attribute identification method and device and mobile terminal
CN107633498B (en) Image dark state enhancement method and device and electronic equipment
CN113283318A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112084874A (en) Object detection method and device and terminal equipment
CN113033263A (en) Face image age feature recognition method
CN116012873B (en) Pedestrian re-identification method and device, electronic equipment and storage medium
CN115147681B (en) Training of clothing generation model and method and device for generating clothing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant