CN108229296B - Face skin attribute identification method and device, electronic equipment and storage medium

Info

Publication number
CN108229296B
Authority
CN
China
Prior art keywords
face
feature
convolutional layer
neural network
image
Prior art date
Legal status
Active
Application number
CN201710927454.7A
Other languages
Chinese (zh)
Other versions
CN108229296A (en)
Inventor
Luo Siwei (罗思伟)
Zhang Zhanpeng (张展鹏)
Zhang Wei (张伟)
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN201710927454.7A
Publication of CN108229296A
Application granted
Publication of CN108229296B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The embodiment of the invention discloses a face skin attribute identification method and apparatus, an electronic device and a computer storage medium, wherein the method comprises the following steps: extracting features of the face image in an image to be recognized through each convolutional layer in a neural network; fusing the features extracted by at least one shallower convolutional layer in the neural network with the features extracted by the last convolutional layer to obtain fused features; and predicting skin attributes of the face image based on the fused features to obtain prediction labels of the skin attributes. In the embodiment of the invention, fusing the features extracted by at least one shallower convolutional layer with the features extracted by the last convolutional layer yields both the shallow and the deep features of the neural network, realizing a comprehensive judgment of skin attributes; predicting the skin attributes of the face image based on the fused features realizes the prediction of different attributes of the face skin.

Description

Face skin attribute identification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a face skin attribute identification method and apparatus, an electronic device and a computer storage medium.
Background
In recent years, deep learning has achieved good results in many areas of image processing, such as image classification and image segmentation. Face skin attribute identification is important in entertainment applications such as mobile-phone beautification and live video; face skin attributes generally include, but are not limited to, the skin quality and skin color of a face. A beautification application can automatically select the degree of beautification based on the skin attributes, avoiding under- or over-beautification.
Disclosure of Invention
Embodiments of the invention provide a technical solution for identifying face skin attributes.
The embodiment of the invention provides a face skin attribute identification method, which comprises the following steps:
extracting the features of the face image in the image to be recognized through each convolution layer in the neural network;
fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
and predicting the skin attribute of the face image based on the fusion characteristics to obtain a prediction label of the skin attribute.
In another embodiment of the above method according to the present invention, fusing the extracted features of at least one of the shallower convolutional layers and the last convolutional layer in the neural network to obtain fused features, including:
for each shallower convolutional layer of the neural network, carrying out scale transformation on at least one feature output by that layer to obtain features with the same scale as the features output by the last convolutional layer;
and stacking the features with the same scale to obtain the fused features.
In another embodiment of the above method according to the present invention, performing scale transformation on the at least one feature output by the shallower convolutional layer to obtain features with the same scale as the features output by the last convolutional layer includes:
performing a pooling operation on the at least one feature output by the shallower convolutional layer to obtain features with the same scale as the features output by the last convolutional layer.
In another embodiment of the above method according to the present invention, pooling the at least one feature output by the shallower convolutional layer comprises:
and sequentially performing pooling operation on the extracted features of each shallower convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
In another embodiment of the above method according to the present invention, stacking the features with the same scale to obtain the fused features includes:
stacking the features with the same scale in sequence with the channel as the axis to obtain the fused feature, wherein the dimension of the fused feature corresponds to the sum of the output channels of the convolutional layers.
In another embodiment based on the above method of the present invention, the predicting the skin attribute of the face image based on the fused feature includes:
predicting the skin attribute of the face image based on the fusion features through a full connection layer in a neural network;
before the predicting of the skin attribute of the face image based on the fusion feature, the method further includes:
and reducing the dimension of the fused feature by a dimension reduction convolutional layer.
In another embodiment of the foregoing method according to the present invention, the number of convolution kernels in the dimension-reduction convolutional layer is smaller than a preset value, and the convolution kernels are of size 1×1;
reducing the dimension of the fusion feature by a dimension reduction convolutional layer, comprising: and performing convolution operation on the fusion feature based on the dimensionality reduction convolution layer with the number of the convolution kernels smaller than a preset value to obtain a fusion feature graph with the dimensionality being the number of the convolution kernels.
In another embodiment of the method according to the present invention, before extracting features of a face image in an image to be recognized through each convolution layer in a neural network, the method further includes:
and carrying out face detection on the image to be recognized to obtain the face image, and extracting the face image from the image to be recognized.
In another embodiment based on the above method of the present invention, the performing face detection on the image to be recognized includes:
obtaining at least one face position feature and a face confidence coefficient threshold corresponding to the image to be recognized from the image to be recognized by using a face detection network, wherein the face position feature comprises a face position rectangular frame and a face confidence coefficient;
obtaining a face position rectangular frame with a face confidence degree larger than the face confidence degree threshold value based on the obtained face position characteristics;
and performing face key point detection on the face position rectangular frame based on a face key point network to obtain face key points, and obtaining a face image from the face position image based on the face key points.
In another embodiment based on the above method of the present invention, the face position feature further includes a face angle;
before performing face key point detection on the face position rectangular frame based on a face key point network, the method further comprises the following steps: and adjusting the face position rectangular frame based on the face angle to obtain the face position rectangular frame placed in the forward direction.
In another embodiment of the above method according to the present invention, the predictive label of the skin property comprises any one or more of:
skin quality, skin color, skin brightness.
In another embodiment of the foregoing method according to the present invention, the method further includes:
and performing beautification processing operation on the face image based on the prediction label of the skin attribute.
In another embodiment of the foregoing method according to the present invention, the method further includes:
setting an image to be identified as a sample image, and training the neural network:
extracting the characteristics of the face image in the sample image through each convolution layer in the neural network; the sample image is marked with at least one known marking label;
fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute;
training the neural network based on the obtained predictive labels and known annotation labels.
In another embodiment of the above method according to the present invention, training the neural network based on the obtained predicted labels and known labeled labels includes:
calculating an error value through a loss function based on the obtained prediction label and the known label;
updating parameters in each convolutional layer of the neural network by a back gradient algorithm based on the error value.
In another embodiment of the foregoing method according to the present invention, the method further includes:
taking the neural network with the updated parameters as the neural network, and iteratively extracting the features of the face image in the sample image through each convolution layer in the neural network; the sample image is marked with at least one known marking label; fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute; calculating an error value through a loss function based on the obtained prediction label and the known label; updating parameters in each convolution layer of the neural network through a reverse gradient algorithm based on the error value; until the neural network meets a preset condition.
In another embodiment of the above method according to the present invention, the preset condition includes any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
In another embodiment of the above method according to the present invention, training the neural network based on the obtained predicted labels and known labeled labels includes:
calculating an error value through a loss function based on the obtained prediction label and the known label;
directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and back-propagating the error value to each convolutional layer of the neural network through a reverse gradient algorithm;
updating parameters in the convolutional layers based on the error values propagated into the convolutional layers.
In another embodiment of the foregoing method according to the present invention, the method further includes:
taking the neural network with the updated parameters as the neural network, and iteratively extracting the features of the face image in the sample image through each convolution layer in the neural network; the sample image is marked with at least one known marking label; fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute; calculating an error value through a loss function based on the obtained prediction label and the known label; directly and reversely propagating the error value to at least one shallower convolutional layer and the last convolutional layer of the neural network to obtain the fusion characteristics, and reversely propagating the error value to each convolutional layer of the neural network through a reverse gradient algorithm; updating parameters in the convolutional layers based on the error values propagated into the convolutional layers; until the neural network meets a preset condition.
In another embodiment of the above method according to the present invention, the preset condition includes any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
According to an aspect of the embodiments of the present invention, there is provided a face skin attribute recognition apparatus, including:
the characteristic extraction unit is used for extracting the characteristics of the face image in the image to be recognized through each convolution layer in the neural network;
the characteristic fusion unit is used for fusing the extracted characteristics of at least one shallower convolutional layer and the extracted characteristics of the last convolutional layer in the neural network to obtain fusion characteristics; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
and the attribute prediction unit is used for predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute.
In another embodiment of the above apparatus according to the present invention, the feature fusion unit includes:
the scale transformation module is used for carrying out, for each shallower convolutional layer of the neural network, scale transformation on at least one feature output by that layer to obtain features with the same scale as the features output by the last convolutional layer;
and the feature stacking module is used for stacking the features with the same scale to obtain the fused features.
In another embodiment of the above apparatus according to the present invention, the scale transformation module is specifically configured to pool the at least one feature output by the shallower convolutional layer to obtain features with the same scale as the features output by the last convolutional layer.
In another embodiment of the above apparatus according to the present invention, the pooling operation performed by the scaling module includes:
and sequentially performing pooling operation on the extracted features of each shallower convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
In another embodiment of the above apparatus according to the present invention, the feature stacking module is specifically configured to stack the features with the same scale in sequence with the channel as the axis to obtain the fused feature, where the dimension of the fused feature corresponds to the sum of the output channels of the convolutional layers.
In another embodiment of the above apparatus according to the present invention, the attribute prediction unit is specifically configured to perform, through a fully connected layer in a neural network, prediction of skin attributes on the face image based on the fusion features;
the face skin attribute recognition device further comprises:
and the dimension reduction unit is used for reducing the dimension of the fusion feature through a dimension reduction convolutional layer.
In another embodiment of the above apparatus according to the present invention, the number of convolution kernels in the dimension-reduction convolutional layer is smaller than a preset value, and the convolution kernels are of size 1×1;
the dimension reduction unit is specifically configured to perform convolution operation on the fusion feature based on the dimension reduction convolution layer with the number of convolution kernels smaller than a preset value, and obtain a fusion feature map with the dimension being the number of the convolution kernels.
In another embodiment of the above apparatus according to the present invention, further comprising:
and the face recognition unit is used for carrying out face detection on the image to be recognized to obtain the face image and extracting the face image from the image to be recognized.
In another embodiment of the above apparatus according to the present invention, the face recognition unit includes:
the position acquisition module is used for acquiring at least one face position feature and a face confidence coefficient threshold corresponding to the image to be recognized from the image to be recognized by using a face detection network, wherein the face position feature comprises a face position rectangular frame and a face confidence coefficient;
the position determining module is used for obtaining a face position rectangular frame with a face confidence coefficient larger than the face confidence coefficient threshold value based on the obtained face position characteristics;
and the face acquisition module is used for executing face key point detection on the face position rectangular frame based on a face key point network to obtain face key points and acquiring a face image from the face position image based on the face key points.
In another embodiment of the above apparatus according to the present invention, the face position feature further includes a face angle;
the face skin attribute recognition device further comprises: and the angle adjusting unit adjusts the face position rectangular frame based on the face angle to obtain the face position rectangular frame placed in the forward direction.
In another embodiment of the above apparatus according to the present invention, the predictive label of the skin property comprises any one or more of:
skin quality, skin color, skin brightness.
In another embodiment of the above apparatus according to the present invention, further comprising:
and the beautifying unit is used for carrying out beautifying processing operation on the face image based on the prediction label of the skin attribute.
In another embodiment of the above apparatus according to the present invention, further comprising:
the sample prediction unit is used for setting an image to be identified as a sample image and obtaining a prediction label corresponding to the skin attribute of the sample image based on the feature extraction unit, the feature fusion unit and the attribute prediction unit; the sample image is marked with at least one known marking label;
and the network training unit is used for training the neural network based on the obtained prediction label and the known label.
In another embodiment of the above apparatus according to the present invention, the network training unit includes:
the error calculation module is used for calculating an error value through a loss function based on the obtained prediction label and the known label;
and the parameter updating module is used for updating the parameters in each convolution layer of the neural network through a reverse gradient algorithm based on the error value.
In another embodiment of the above apparatus according to the present invention, the network training unit further includes:
the iteration updating module is used for taking the neural network after the parameters are updated as the neural network, and iterating to obtain a prediction label corresponding to the skin attribute of the sample image based on the feature extracting unit, the feature fusing unit and the attribute predicting unit; updating the parameters in each convolution layer based on the error calculation module and the parameter updating module; until the neural network meets a preset condition.
In another embodiment of the above apparatus according to the present invention, the preset condition includes any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
In another embodiment of the above apparatus according to the present invention, the network training unit includes:
the error calculation module is used for calculating an error value through a loss function based on the obtained prediction label and the known label;
the error propagation module is used for directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and for back-propagating the error value to each convolutional layer of the neural network through a reverse gradient algorithm;
a parameter update module to update parameters in the convolutional layers based on the error values propagated into the convolutional layers.
In another embodiment of the above apparatus according to the present invention, the network training unit further includes:
the iteration updating module is used for taking the neural network after the parameters are updated as the neural network, and iterating to obtain a prediction label corresponding to the skin attribute of the sample image based on the feature extracting unit, the feature fusing unit and the attribute predicting unit; updating the parameters in each convolution layer based on the error calculation module, the error propagation module and the parameter updating module; until the neural network meets a preset condition.
In another embodiment of the above apparatus according to the present invention, the preset condition includes any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
According to an aspect of the embodiment of the present invention, there is provided an electronic device, which includes a processor, wherein the processor includes the human face skin attribute recognition apparatus as described above.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a memory for storing executable instructions;
and a processor in communication with the memory to execute the executable instructions to perform the operations of the face-skin attribute recognition method as described above.
According to an aspect of the embodiments of the present invention, there is provided a computer storage medium for storing computer-readable instructions, which when executed, perform the operations of the face-skin attribute identification method as described above.
Based on the face skin attribute identification method and apparatus, electronic device and computer storage medium provided by the embodiments of the invention, the features of the face image in the image to be recognized are extracted through each convolutional layer in the neural network, and the features extracted by at least one shallower convolutional layer are fused with the features extracted by the last convolutional layer to obtain fused features. The fused features carry both the shallow and the deep features of the neural network, so skin attributes are judged comprehensively from the features extracted by the shallower convolutional layers together with those extracted by the last convolutional layer. This overcomes the prior-art problem that skin attribute judgment is inaccurate because details are lost when only the features of the last convolutional layer are used. Predicting skin attributes of the face image based on the fused features then realizes the prediction of different attributes of the face skin.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of an embodiment of a face skin attribute recognition method according to the present invention.
Fig. 2 is a schematic structural diagram of a specific example of the above embodiments of the face skin attribute recognition method of the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the face-skin attribute recognition apparatus of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device for implementing a terminal device or a server according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
The computer system/server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In the prior art, face skin attribute identification proceeds as follows: the acquired picture is preprocessed (including size and gray-scale normalization, head posture correction and image segmentation), features are extracted (including geometric, statistical, frequency-domain and motion features), and classification is finally performed. This process, however, requires hand-designed features and feature selection, and different features must be designed for different illumination conditions and face rotations; its robustness therefore needs to be improved, and it is difficult to apply in practice. Alternatively, a neural network model is built for face skin attribute identification: a skin-quality model and a skin-color model are trained separately on training data; for a given test picture, the face region is extracted by methods such as face detection and alignment and then fed into the skin-color or skin-quality neural network for identification. But with separately trained models, the total model size grows linearly as the number of attribute classes increases.
Fig. 1 is a flowchart of an embodiment of a face skin attribute recognition method according to the present invention. As shown in fig. 1, the method of this embodiment includes:
step 101, extracting the features of the face image in the image to be recognized through each convolution layer in the neural network.
Specifically, each convolution layer in the neural network sequentially extracts the features of the face image in the image to be recognized.
And 102, fusing the extracted features of at least one shallower convolutional layer and the extracted features of the last convolutional layer in the neural network to obtain fused features.
Wherein the shallower convolutional layers are the convolutional layers in the neural network other than the last convolutional layer. The features extracted by the shallower convolutional layers can reflect skin problems (such as acne and scars), overcoming the prior-art limitation that only part of the skin attributes can be identified when the neural network uses only the output features of the last convolutional layer. The multi-layer features are fused, and the face skin attributes are judged from the fused features, improving the recognition of skin attributes.
And 103, predicting the skin attribute of the face image based on the fusion features to obtain a prediction label of the skin attribute.
Based on the face skin attribute identification method provided by this embodiment of the invention, the features of the face image in the image to be recognized are extracted through each convolutional layer in the neural network, and the features extracted by at least one shallower convolutional layer are fused with the features extracted by the last convolutional layer to obtain fused features. The fused features carry both the shallow and the deep features of the neural network, so that skin attributes are judged comprehensively from the features of the shallower convolutional layers together with those of the last convolutional layer. In addition, because most intermediate-layer parameters of the neural network are shared, adding a further attribute recognition task only requires adding a few task-specific layers, so the parameter count does not grow multiplicatively. This overcomes the prior-art problem that skin attribute judgment is inaccurate because details are lost when only the features of the last convolutional layer are used. Predicting skin attributes of the face image based on the fused features then realizes the prediction of different attributes of the face skin.
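As an illustration of the pipeline described in steps 101 to 103, the following is a minimal PyTorch sketch. The four-stage backbone, channel counts, 64×64 input size and single multi-label head are assumptions made for the example, not the configuration of the patent itself.

```python
import torch
import torch.nn as nn

class SkinAttributeNet(nn.Module):
    """Sketch of the described fusion: shallower conv features are pooled to
    the last layer's scale, stacked along the channel axis, reduced by a
    1x1 convolution, and fed to a fully connected prediction layer."""
    def __init__(self, num_labels=3):
        super().__init__()
        self.stage1 = self._block(3, 32)     # shallower layers ...
        self.stage2 = self._block(32, 64)
        self.stage3 = self._block(64, 128)
        self.stage4 = self._block(128, 256)  # ... and the last conv layer
        # Dimension-reduction convolutional layer (1x1 kernels).
        self.reduce = nn.Conv2d(32 + 64 + 128 + 256, 128, kernel_size=1)
        self.fc = nn.Linear(128 * 4 * 4, num_labels)  # prediction labels

    @staticmethod
    def _block(c_in, c_out):
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # each stage halves the spatial scale
        )

    def forward(self, x):                    # x: (N, 3, 64, 64) face crop
        f1 = self.stage1(x)                  # 32x32
        f2 = self.stage2(f1)                 # 16x16
        f3 = self.stage3(f2)                 # 8x8
        f4 = self.stage4(f3)                 # 4x4, last convolutional layer
        hw = f4.shape[-2:]                   # target scale for fusion
        f1 = nn.functional.adaptive_avg_pool2d(f1, hw)
        f2 = nn.functional.adaptive_avg_pool2d(f2, hw)
        f3 = nn.functional.adaptive_max_pool2d(f3, hw)
        fused = torch.cat([f1, f2, f3, f4], dim=1)  # stack along channels
        return self.fc(self.reduce(fused).flatten(1))
```

The concatenation along the channel axis corresponds to the feature stacking of operation 102, and the 1×1 convolution to the dimension-reduction layer discussed below.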
In a specific example of the above embodiment of the face-skin attribute recognition method of the present invention, operation 102 includes:
for each shallower convolutional layer of the neural network, carrying out scale transformation on at least one feature output by that layer to obtain features with the same scale as the features output by the last convolutional layer;
and stacking the features with the same scale to obtain the fused features.
In this embodiment, before feature fusion, the features to be fused are first converted to the scale of the features output by the last convolutional layer, providing a common basis for fusion. A convolutional layer usually outputs at least one feature through at least one channel, one feature per channel, and the features output by a given convolutional layer share the same scale; during stacking, the features converted to the same scale are stacked channel by channel.
In a specific example of the foregoing embodiments of the face skin attribute recognition method of the present invention, performing scale transformation on at least one feature output by a shallower convolutional layer to obtain features with the same scale as the features output by the last convolutional layer includes:
performing a pooling operation on at least one feature output by the shallower convolutional layer to obtain features with the same scale as the features output by the last convolutional layer.
In this embodiment, in order to convert the size of the feature that needs to be fused with the feature output by the last convolutional layer, the feature may be converted to the required size by pooling operations (including max pooling, and/or average pooling, etc.).
In a specific example of the foregoing embodiments of the face-skin attribute recognition method of the present invention, the pooling operation of at least one feature output based on a shallower convolutional layer includes: and sequentially performing pooling operation on the extracted features of each shallow convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
Fig. 2 is a schematic structural diagram of a specific example of the above embodiments of the face skin attribute recognition method of the present invention. As shown in fig. 2, average pooling and max pooling are performed on the first feature map, average pooling and max pooling on the second feature map, max pooling on the third feature map, and max pooling on the fourth feature map; these pooling operations adjust all feature maps to the same size. There is no limitation on which pooling operation is selected or on the order of the pooling operations; the alternation of average pooling and max pooling in this embodiment refers to alternating between them when both are needed to complete the pooling.
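The stepwise scale matching can be sketched as follows. The helper name `match_scale`, the 2× pooling steps and the alternation order are illustrative assumptions (the text above leaves the choice and order of pooling operations open), and the sketch assumes the scales differ by powers of two.

```python
import torch
import torch.nn.functional as F

def match_scale(feat, target_hw, ops=("avg", "max")):
    """Pool `feat` down 2x at a time, alternating average and max pooling,
    until it reaches the scale of the last conv layer's output."""
    i = 0
    while feat.shape[-2:] != tuple(target_hw):
        if ops[i % len(ops)] == "avg":
            feat = F.avg_pool2d(feat, kernel_size=2)
        else:
            feat = F.max_pool2d(feat, kernel_size=2)
        i += 1
    return feat

shallow = torch.randn(1, 64, 32, 32)        # a shallower layer's feature map
print(match_scale(shallow, (4, 4)).shape)   # torch.Size([1, 64, 4, 4])
```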
In a specific example of the above embodiments of the face skin attribute recognition method of the present invention, stacking the features with the same scale to obtain the fused feature includes:
stacking the features with the same scale in sequence with the channel as the axis to obtain the fused feature, wherein the dimension of the fused feature corresponds to the sum of the output channels of the convolutional layers.
Each convolutional layer comprises at least one channel, and each channel outputs one feature; the dimension of the fused feature obtained by stacking the features along the channel axis equals the sum of the channel counts output by all the fused convolutional layers.
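A quick check of the channel arithmetic, using the illustrative channel counts from the earlier sketch rather than any values stated in the patent:

```python
import torch

f1 = torch.randn(1, 32, 4, 4)   # four feature maps at the same scale
f2 = torch.randn(1, 64, 4, 4)
f3 = torch.randn(1, 128, 4, 4)
f4 = torch.randn(1, 256, 4, 4)
fused = torch.cat([f1, f2, f3, f4], dim=1)  # channel as the stacking axis
print(fused.shape[1])  # 480 = 32 + 64 + 128 + 256
```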
In another embodiment of the method for identifying attributes of human face skin according to the present invention, based on the above embodiments, the operation 103 includes:
predicting the skin attribute of the face image based on the fusion characteristics through a full connection layer in a neural network;
before predicting the skin attribute of the face image based on the fusion features, the method further comprises the following steps:
the fused features are reduced in dimension by a dimension reduction convolutional layer.
In this embodiment, since the dimension of the fused feature corresponds to the sum of the channel numbers output by the at least two convolutional layers, the dimension of the fused feature is large, and for the convenience of subsequent identification, the dimension of the fused feature is reduced by the dimension-reduced convolutional layer.
In a specific example of the above embodiments of the face skin attribute recognition method of the present invention, the number of convolution kernels in the dimension-reduction convolutional layer is less than a preset value, and the convolution kernels are of size 1×1;
reducing the dimension of the fusion feature through a dimension reduction convolutional layer, comprising the following steps: and performing convolution operation on the fusion features based on the dimensionality reduction convolution layers with the number of the convolution kernels smaller than the preset value to obtain a fusion feature graph with the number of the convolution kernels as the dimensionality.
In this embodiment, the dimension of the fused feature map after reduction is controlled by limiting the number of convolution kernels in the dimension-reduction convolutional layer and the size of the kernels; that a convolutional layer with 1×1 kernels can perform this dimension reduction follows from the nature of the convolution operation.
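A 1×1 convolution changes only the channel dimension and leaves the spatial scale unchanged; a minimal check, with channel counts again taken from the illustrative sketch:

```python
import torch
import torch.nn as nn

fused = torch.randn(1, 480, 4, 4)             # fused feature map
reduce = nn.Conv2d(480, 128, kernel_size=1)   # kernel count sets the output dim
print(reduce(fused).shape)                    # torch.Size([1, 128, 4, 4])
```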
In another embodiment of the method for identifying a facial skin attribute according to the present invention, on the basis of the above embodiments, before operation 101, the method further includes:
and carrying out face detection on the image to be recognized to obtain a face image, and extracting the face image from the image to be recognized.
In this embodiment, since the face skin attribute recognition method of the present invention operates mainly on the face skin, the face image needs to be extracted from the image to be recognized. Face detection may be performed by an existing neural network, and the face image is extracted based on the detection result.
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, performing face detection on an image to be recognized includes:
obtaining at least one face position feature and a face confidence coefficient threshold corresponding to the image to be recognized from the image to be recognized by using a face detection network, wherein the face position feature comprises a face position rectangular frame and a face confidence coefficient;
obtaining a face position rectangular frame with a face confidence degree larger than a face confidence degree threshold value based on the obtained face position characteristics;
and performing face key point detection on the face position rectangular frame based on the face key point network to obtain face key points, and obtaining a face image from the face position image based on the face key points.
In this embodiment, face position features and a face confidence threshold are first obtained from the image to be recognized; each image to be recognized yields one face confidence threshold and at least one face position feature. Based on the face confidence threshold it can be judged which rectangular frames contain a face, and those frames are taken as the face position rectangular frames; face keypoint detection is then performed on the face position rectangular frames, and the face image is obtained based on the face keypoints.
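The detection pipeline can be sketched as below. `face_detector` and `keypoint_net` are hypothetical stand-ins for the face detection network and the face keypoint network; their return conventions are assumptions made for the example.

```python
def extract_faces(image, face_detector, keypoint_net):
    """Sketch: detect rectangles, keep confident ones, run keypoint
    detection on each kept rectangle, and crop the face image."""
    boxes, scores, threshold = face_detector(image)  # rectangles + confidences
    faces = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score <= threshold:          # discard low-confidence rectangles
            continue
        crop = image[y1:y2, x1:x2]      # image is a numpy-style array
        keypoints = keypoint_net(crop)  # face keypoints inside the rectangle
        faces.append((crop, keypoints))
    return faces
```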
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, the face position features further include a face angle;
before face key point detection is performed on the face position rectangular frame based on the face key point network, the method further comprises the following steps: and adjusting the face position rectangular frame based on the face angle to obtain the face position rectangular frame placed in the forward direction.
The face angle in this embodiment refers to the in-plane inclination of the face position rectangular frame relative to the horizontal axis of the image, not the pose of the face within the face image. By adjusting the angle of the face position rectangular frame, the face in the frame is placed upright, which facilitates the subsequent extraction and processing of the face image.
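A rotation of this kind can be sketched with standard OpenCV calls; treating the detected face angle as an in-plane rotation about the rectangle's center is an assumption of the example.

```python
import cv2

def upright_face(image, box_center, angle_deg):
    """Rotate the image so the face position rectangle is upright
    before keypoint detection."""
    h, w = image.shape[:2]
    rotation = cv2.getRotationMatrix2D(box_center, angle_deg, 1.0)
    return cv2.warpAffine(image, rotation, (w, h))
```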
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, the prediction labels of the skin attributes include any one or more of the following items:
skin quality, skin color, skin brightness.
The features obtained from the multiple convolutional layers in this embodiment allow skin attributes to be identified from multiple perspectives, which may include skin quality, skin color and skin brightness. Based on these skin attributes, the face skin can be understood more comprehensively, so that subsequent operations on it are more accurate.
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, the method further includes:
and performing beautifying processing operation on the face image based on the prediction label of the skin attribute.
In this embodiment, from identified skin attributes such as the quality of the face skin and the brightness of the skin color, the algorithm evaluates the user's skin quality and estimates the skin color and brightness, and the beautification algorithm then applies a reasonable degree of beautification to the user's image; the improvements include skin smoothing, whitening and the like. For example, if a user's face has many acne spots, poor skin quality and dark skin, strong smoothing and whitening can be applied during beautification; if the user's skin is good and fair, a low degree of smoothing and whitening is applied, so that the result looks more realistic.
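A possible mapping from predicted labels to beautification strengths is sketched below; the label values, thresholds and strength scale are invented for the illustration and are not specified by the patent.

```python
def beauty_strengths(labels):
    """Map predicted skin-attribute labels to smoothing/whitening degrees."""
    smoothing = {"good": 0.2, "average": 0.5, "poor": 0.8}
    whitening = {"fair": 0.2, "medium": 0.5, "dark": 0.8}
    return {
        "smoothing": smoothing[labels["skin_quality"]],  # worse skin -> stronger
        "whitening": whitening[labels["skin_color"]],    # darker tone -> stronger
    }

print(beauty_strengths({"skin_quality": "poor", "skin_color": "dark"}))
# {'smoothing': 0.8, 'whitening': 0.8}
```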
On the basis of the above embodiments, the present invention further provides a method for identifying attributes of human face skin, further comprising:
setting an image to be identified as a sample image, and training a neural network:
extracting the characteristics of the face image in the sample image through each convolution layer in the neural network; the sample image is marked with at least one known marking label;
fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
predicting the skin attribute of the face image based on the fusion characteristics to obtain a prediction label of the skin attribute;
and training the neural network based on the obtained predicted labels and the known labeled labels.
The neural network obtained by training through the training method provided by the embodiment can be applied to any embodiment of the invention to obtain a more accurate prediction label of the skin attribute.
In a specific example of the above-described embodiment of the face skin attribute recognition method of the present invention, training a neural network based on an obtained predicted label and a known label includes:
calculating an error value through a loss function based on the obtained prediction label and the known label;
and updating parameters in each convolution layer of the neural network through a reverse gradient algorithm based on the error value.
The training method provided in this embodiment updates the parameters in each convolutional layer by the commonly used reverse gradient method; the training of the present invention is not limited to this, and may also be implemented by other training methods in the prior art.
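A minimal training-step sketch, reusing the `SkinAttributeNet` class from the earlier sketch; the SGD optimizer and the multi-label cross-entropy loss are assumptions, since the patent does not fix a particular loss function.

```python
import torch
import torch.nn as nn

model = SkinAttributeNet(num_labels=3)              # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()                    # assumed multi-label loss

def train_step(sample_images, known_labels):
    logits = model(sample_images)         # prediction labels for the batch
    loss = loss_fn(logits, known_labels)  # error value via the loss function
    optimizer.zero_grad()
    loss.backward()                       # reverse gradient algorithm
    optimizer.step()                      # update parameters in each layer
    return loss.item()
```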
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, the method further includes:
taking the neural network after updating the parameters as a neural network, and iteratively extracting the features of the face image in the sample image through each convolution layer in the neural network; the sample image is marked with at least one known marking label; fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; predicting the skin attribute of the face image based on the fusion characteristics to obtain a prediction label of the skin attribute; calculating an error value through a loss function based on the obtained prediction label and the known label; updating parameters in each convolution layer of the neural network through a reverse gradient algorithm based on the error value; until the neural network meets the preset condition.
The preset condition includes any one of the following:
and (4) converging the loss function, enabling the iteration times to reach preset times and enabling the error value to be smaller than a preset value.
In this embodiment, the skin attribute prediction process is performed iteratively on the neural network whose parameters have been updated, until the resulting neural network meets a preset condition; the preset condition includes, but is not limited to, convergence of the loss function, the number of iterations reaching a preset number, or the error value being smaller than a preset value.
In another embodiment of the method for identifying a skin attribute of a human face according to the present invention, based on the above embodiments, the training of the neural network based on the obtained predicted labels and the known labeled labels includes:
calculating an error value through a loss function based on the obtained prediction label and the known label;
directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and back-propagating the error value to each convolutional layer of the neural network through a reverse gradient algorithm;
parameters in each convolutional layer are updated based on the error values propagated into each convolutional layer.
In this embodiment, short connections directly linking the shallower convolutional layers to the loss function layer are combined with ordinary back-propagation: less of the error value is lost during gradient return, so the model converges faster. However, if the error value were returned only through the short connections, the parameters of the different convolutional layers could become inconsistent; therefore, while the gradient is returned through the short connections, the error is also returned layer by layer through the reverse gradient algorithm in the order of the convolutional layers, which speeds up training while keeping the parameters consistent. This addresses the problem of prior-art training, where the gradient attenuates as it is propagated backwards through a deep network, so that the shallow convolution kernel parameters are updated slowly.
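In the fusion architecture sketched earlier, this short connection already exists: the concatenation feeds shallow outputs straight into the loss path, so one backward pass delivers gradients to the shallow layers both directly and through the deeper layers. A minimal autograd check, assuming the earlier `SkinAttributeNet` sketch:

```python
import torch

model = SkinAttributeNet(num_labels=3)   # fusion model from the earlier sketch
x = torch.randn(2, 3, 64, 64)
target = torch.zeros(2, 3)
loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), target)
loss.backward()
# The shallowest stage receives gradient in the same backward pass, via the
# concat short connection as well as via stages 2-4.
print(model.stage1[0].weight.grad.abs().sum() > 0)  # tensor(True)
```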
In a specific example of the above embodiment of the face skin attribute recognition method of the present invention, the method further includes:
taking the neural network with the updated parameters as the neural network, and iteratively extracting the features of the face image in the sample image through each convolutional layer in the neural network; the sample image is marked with at least one known annotation label; fusing the features extracted by at least one shallower convolutional layer in the neural network with the features extracted by the last convolutional layer to obtain fused features; predicting the skin attribute of the face image based on the fused features to obtain a prediction label of the skin attribute; calculating an error value through a loss function based on the obtained prediction label and the known annotation label; directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and back-propagating the error value to each convolutional layer of the neural network through a reverse gradient algorithm; updating the parameters in each convolutional layer based on the error values propagated into it; until the neural network meets a preset condition.
The preset condition includes any one of the following:
and (4) converging the loss function, enabling the iteration times to reach preset times and enabling the error value to be smaller than a preset value.
In this embodiment, the skin attribute prediction process is performed iteratively on the neural network whose parameters have been updated, until the resulting neural network meets a preset condition; the preset condition includes, but is not limited to, convergence of the loss function, the number of iterations reaching a preset number, or the error value being smaller than a preset value.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by program instructions running on related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Fig. 3 is a schematic structural diagram of an embodiment of the face-skin attribute recognition apparatus of the present invention. The apparatus of this embodiment may be used to implement the method embodiments of the present invention described above. As shown in fig. 3, the apparatus of this embodiment includes:
and the feature extraction unit 31 is configured to perform feature extraction on the face image in the image to be recognized through each convolution layer in the neural network.
Specifically, each convolution layer in the neural network sequentially extracts the features of the face image in the image to be recognized.
And the feature fusion unit 32 is configured to fuse the extracted features of at least one shallower convolutional layer and the extracted features of the last convolutional layer in the neural network to obtain fusion features.
Wherein the shallower convolutional layers are the convolutional layers in the neural network other than the last convolutional layer. The features extracted by the shallower convolutional layers can reflect skin problems (such as acne and scars), overcoming the prior-art limitation that only part of the skin attributes can be identified when the neural network uses only the output features of the last convolutional layer. The multi-layer features are fused, and the face skin attributes are judged from the fused features, improving the recognition of skin attributes.
And the attribute predicting unit 33 is configured to perform skin attribute prediction on the face image based on the fusion features, and obtain a prediction label of the skin attribute.
Based on the face skin attribute recognition device provided by this embodiment of the invention, the features of the face image in the image to be recognized are extracted through each convolutional layer in the neural network, and the features extracted by at least one shallower convolutional layer are fused with the features extracted by the last convolutional layer to obtain fused features. The fused features carry both the shallow and the deep features of the neural network, so that skin attributes are judged comprehensively from the features of the shallower convolutional layers together with those of the last convolutional layer. In addition, because most intermediate-layer parameters of the neural network are shared, adding a further attribute recognition task only requires adding a few task-specific layers, so the parameter count does not grow multiplicatively. This overcomes the prior-art problem that skin attribute judgment is inaccurate because details are lost when only the features of the last convolutional layer are used. Predicting skin attributes of the face image based on the fused features then realizes the prediction of different attributes of the face skin.
In a specific example of the above embodiment of the face-skin attribute recognition apparatus of the present invention, the feature fusion unit 32 includes:
a scale transformation module, configured to perform scale transformation on at least one feature output by each shallower convolutional layer of the neural network, obtaining features with the same scale as the feature output by the last convolutional layer;
a feature stacking module, configured to stack the features of the same scale to obtain the fused features.
In a specific example of the above embodiment of the face skin attribute recognition apparatus of the present invention, the scale transformation module is specifically configured to perform a pooling operation on the at least one feature output by a shallower convolutional layer, obtaining features with the same scale as the feature output by the last convolutional layer.
In a specific example of the above embodiment of the face-skin attribute recognition apparatus of the present invention, the pooling operation performed by the scale transformation module includes:
and sequentially performing pooling operation on the extracted features of each shallow convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
In a specific example of the above embodiment of the face skin attribute recognition apparatus of the present invention, the feature stacking module is specifically configured to stack the features of the same scale in sequence along the channel axis to obtain the fused features, whose dimension corresponds to the sum of the output channels of the convolutional layers.
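To make the fusion step concrete, the following is a minimal PyTorch sketch (the patent does not prescribe a framework). The three-stage backbone, its channel counts, the 64×64 input, and the exact pooling schedule are illustrative assumptions; only the pattern follows the description above: pool each shallower feature down to the scale of the last convolutional layer, then concatenate along the channel axis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBackbone(nn.Module):
    """Toy backbone illustrating fusion of shallower and last-layer features."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):                            # x: (N, 3, 64, 64)
        f1 = self.conv1(x)                           # shallower feature, 64x64
        f2 = self.conv2(f1)                          # shallower feature, 32x32
        f3 = self.conv3(f2)                          # last convolutional layer, 16x16
        # Scale transformation: alternate average and max pooling until each
        # shallower feature matches the 16x16 scale of the last layer.
        f1 = F.max_pool2d(F.avg_pool2d(f1, 2), 2)    # 64x64 -> 16x16
        f2 = F.avg_pool2d(f2, 2)                     # 32x32 -> 16x16
        # Feature stacking: concatenate along the channel axis; the fused
        # dimension is the sum of the output channels (32 + 64 + 128 = 224).
        return torch.cat([f1, f2, f3], dim=1)        # (N, 224, 16, 16)
```

A forward pass on `torch.randn(1, 3, 64, 64)` yields a fused map of shape `(1, 224, 16, 16)`, whose channel dimension is the sum of the three layers' output channels, as described above.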
In another embodiment of the face skin attribute recognition apparatus of the present invention, on the basis of the above embodiments, the attribute prediction unit 33 is specifically configured to predict skin attributes of the face image based on the fused features through a fully connected layer in the neural network;
the face skin attribute recognition device of the embodiment further comprises:
and the dimension reduction unit is used for reducing the dimension of the fusion feature through a dimension reduction convolutional layer.
In this embodiment, since the dimension of the fused feature corresponds to the sum of the channel numbers output by the at least two convolutional layers, the dimension of the fused feature is large, and for the convenience of subsequent identification, the dimension of the fused feature is reduced by the dimension-reduced convolutional layer.
In a specific example of each of the above embodiments of the face skin attribute recognition apparatus of the present invention, the number of convolution kernels in the dimension reduction convolutional layer is smaller than a preset value, and the size of each convolution kernel is 1×1;
the dimension reduction unit is specifically configured to perform a convolution operation on the fused features using the dimension reduction convolutional layer whose number of convolution kernels is smaller than the preset value, obtaining a fused feature map whose dimension equals the number of convolution kernels.
In another embodiment of the face-skin attribute recognition apparatus of the present invention, on the basis of the above embodiments, the apparatus further includes:
and the face recognition unit is used for carrying out face detection on the image to be recognized to obtain a face image and extracting the face image from the image to be recognized.
In this embodiment, since the face skin attribute recognition method of the present invention is mainly performed for the face skin, the face image needs to be extracted from the image to be recognized, the process of extracting the face image may perform face detection through a neural network in the prior art, and the face image is extracted based on the detection result.
In a specific example of the above embodiments of the face-skin attribute recognition apparatus of the present invention, the face recognition unit includes:
a position acquisition module, configured to obtain, from the image to be recognized using a face detection network, at least one face position feature and a face confidence threshold corresponding to the image to be recognized, where the face position features include a face position rectangle and a face confidence;
a position determination module, configured to obtain, from the acquired face position features, the face position rectangles whose face confidence is greater than the face confidence threshold;
a face acquisition module, configured to perform face keypoint detection on the face position rectangle using a face keypoint network to obtain face keypoints, and to obtain the face image from the face position rectangle based on the face keypoints.
In a specific example of each of the above embodiments of the face skin attribute recognition apparatus of the present invention, the face position features further include a face angle;
the apparatus of this embodiment further includes an angle adjustment unit, configured to adjust the face position rectangle based on the face angle to obtain an upright face position rectangle.
In a specific example of the foregoing embodiments of the face-skin attribute recognition apparatus of the present invention, the prediction labels of the skin attributes include any one or more of the following items:
skin quality, skin color, skin brightness.
In a specific example of the above embodiments of the face-skin attribute recognition apparatus of the present invention, the apparatus further includes:
and the beautifying unit is used for carrying out beautifying processing operation on the face image based on the prediction label of the skin attribute.
On the basis of the above embodiments, the present invention further provides another embodiment of the face skin attribute recognition apparatus, further including:
a sample prediction unit, configured to take the image to be recognized as a sample image and obtain prediction labels of the skin attributes of the sample image through the feature extraction unit, the feature fusion unit, and the attribute prediction unit.
The sample image is annotated with at least one known annotation label.
A network training unit is configured to train the neural network based on the obtained prediction labels and the known annotation labels.
A neural network trained by the training method provided in this embodiment can be applied in any embodiment of the present invention to obtain more accurate prediction labels of the skin attributes.
In a specific example of the above embodiments of the face skin attribute recognition apparatus of the present invention, the network training unit includes:
an error calculation module, configured to calculate an error value through a loss function based on the obtained prediction labels and the known annotation labels;
a parameter update module, configured to update the parameters in each convolutional layer of the neural network through a reverse gradient (backpropagation) algorithm based on the error value.
In a specific example of the foregoing embodiments of the face-skin attribute recognition apparatus of the present invention, the network training unit further includes:
an iteration update module, configured to take the neural network with updated parameters as the current neural network, iteratively obtain prediction labels of the skin attributes of the sample image through the feature extraction unit, the feature fusion unit, and the attribute prediction unit, and update the parameters in each convolutional layer through the error calculation module and the parameter update module, until the neural network satisfies a preset condition.
In a specific example of the above embodiments of the face-skin attribute recognition apparatus of the present invention, the preset condition includes any one of:
and (4) converging the loss function, enabling the iteration times to reach preset times and enabling the error value to be smaller than a preset value.
In another embodiment of the face skin attribute recognition apparatus according to the present invention, on the basis of the above embodiments, the network training unit includes:
an error calculation module, configured to calculate an error value through a loss function based on the obtained prediction labels and the known annotation labels;
an error propagation module, configured to propagate the error value directly back to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained, and to propagate the error value back through each convolutional layer of the neural network via the reverse gradient algorithm;
a parameter update module, configured to update the parameters in each convolutional layer based on the error values propagated into it.
In this embodiment, a short connection directly links the shallower convolutional layers to the loss layer, so less of the error value is lost during gradient back-propagation and the model converges faster. However, returning the error only through the short connection easily leaves the parameters of the intermediate convolutional layers inconsistent. Therefore, while the gradient is returned through the short connection, the error is also propagated back layer by layer through the reverse gradient algorithm, speeding up training while keeping the parameters consistent. This addresses a problem of prior-art training: when the network is deep, the gradient decays during back-propagation, so the parameters of the shallow convolution kernels update slowly.
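In an autograd framework, the channel-wise concatenation of the fusion step itself realizes such a short connection: part of the loss gradient flows straight into the shallower layers rather than only through the last convolutional layer, while ordinary back-propagation still covers every layer in order. A small check against the toy backbone above (the mean is only a stand-in loss):

```python
import torch

net = FusionBackbone()                    # toy backbone from the earlier sketch
out = net(torch.randn(1, 3, 64, 64))
out.mean().backward()                     # stand-in loss; gradients flow back
# conv1 receives gradient both through the conv2/conv3 path and directly
# through its concatenated (short-connection) slice of the fused feature.
print(net.conv1[0].weight.grad.abs().mean() > 0)   # tensor(True)
```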
In a specific example of the foregoing embodiments of the face-skin attribute recognition apparatus of the present invention, the network training unit further includes:
an iteration update module, configured to take the neural network with updated parameters as the current neural network, iteratively obtain prediction labels of the skin attributes of the sample image through the feature extraction unit, the feature fusion unit, and the attribute prediction unit, and update the parameters in each convolutional layer through the error calculation module, the error propagation module, and the parameter update module, until the neural network satisfies a preset condition.
In a specific example of the above embodiments of the face-skin attribute recognition apparatus of the present invention, the preset condition includes any one of:
and (4) converging the loss function, enabling the iteration times to reach preset times and enabling the error value to be smaller than a preset value.
According to an aspect of the embodiments of the present invention, an electronic device is provided, which includes a processor, where the processor includes the face skin attribute recognition apparatus according to any one of the above embodiments of the present invention.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions to perform the operations of any of the above-described embodiments of the face-skin attribute recognition method of the present invention.
According to an aspect of the embodiments of the present invention, there is provided a computer storage medium for storing computer-readable instructions, which when executed, perform the operations of any one of the above embodiments of the face-skin attribute recognition method of the present invention.
The embodiment of the invention also provides an electronic device, which may be a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to fig. 4, there is shown a schematic diagram of an electronic device 400 suitable for implementing a terminal device or server of an embodiment of the present application. As shown in fig. 4, the electronic device 400 includes one or more processors and a communication section, for example: one or more central processing units (CPUs) 401 and/or one or more graphics processing units (GPUs) 413, which may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 402 or loaded from a storage section 408 into a random access memory (RAM) 403. The communication section 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
the processor may communicate with the read-only memory 402 and/or the random access memory 430 to execute the executable instructions, connect with the communication part 412 through the bus 404, and communicate with other target devices through the communication part 412, thereby completing the operation corresponding to any method provided by the embodiments of the present application, for example, performing feature extraction on a face image in an image to be recognized through each convolution layer in a neural network; fusing the extracted features of at least one shallower convolutional layer and the extracted features of the last convolutional layer in the neural network to obtain fused features; and predicting the skin attribute of the face image based on the fusion characteristics to obtain a prediction label of the skin attribute.
In addition, the RAM 403 may also store various programs and data necessary for the operation of the device. The CPU 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. When the RAM 403 is present, the ROM 402 is an optional module: the RAM 403 stores the executable instructions, or the executable instructions are written into the ROM 402 at runtime, and the instructions cause the processor 401 to perform the operations corresponding to the above method. An input/output (I/O) interface 405 is also connected to the bus 404. The communication section 412 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) connected to the bus link.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as needed.
It should be noted that the architecture shown in fig. 4 is only one optional implementation. In practice, the number and types of components in fig. 4 may be selected, removed, added, or replaced according to actual needs; different functional components may also be arranged separately or integrated, for example, the GPU and the CPU may be separate or the GPU may be integrated on the CPU, and the communication section may be separate or integrated on the CPU or the GPU. These alternative embodiments all fall within the scope of the present disclosure.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart, the program code may include instructions corresponding to performing the method steps provided by embodiments of the present application, e.g., performing feature extraction on a face image in an image to be recognized through each convolutional layer in a neural network; fusing the extracted features of at least one shallower convolutional layer and the extracted features of the last convolutional layer in the neural network to obtain fused features; and predicting the skin attribute of the face image based on the fusion characteristics to obtain a prediction label of the skin attribute. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401.
The methods, apparatuses, and devices of the present invention may be implemented in many ways, for example, by software, hardware, firmware, or any combination thereof. The above order of the method steps is for illustration only; the steps of the methods of the present invention are not limited to that order unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the methods according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention in its various embodiments, with various modifications as are suited to the particular use contemplated.

Claims (29)

1. A face skin attribute identification method is characterized by comprising the following steps:
obtaining at least one face position feature from an image to be recognized by using a face detection network, the face position features comprising a face angle, where the face angle is the inclination angle of a face position rectangular frame relative to the horizontal direction of the image;
adjusting the face position rectangular frame based on the face angle to obtain an upright face position rectangular frame, and superimposing face key points onto the face position rectangular frame to obtain a face image;
extracting the features of the face image in the image to be recognized through each convolution layer in the neural network;
fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute;
further comprising: calculating an error value through a loss function based on a prediction label obtained by processing a sample image through the neural network and a known label corresponding to the sample image;
directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and back-propagating the error value through each convolutional layer of the neural network via a reverse gradient algorithm;
updating parameters in the convolutional layers based on the error values propagated into the convolutional layers.
2. The method of claim 1, wherein fusing the extracted features of at least one of the shallower convolutional layers and the last convolutional layer in the neural network to obtain fused features comprises:
performing scale transformation on at least one feature output by each shallower convolutional layer of the neural network to obtain features with the same scale as the feature output by the last convolutional layer;
and stacking the features with the same dimension to obtain the fused feature.
3. The method of claim 2, wherein scaling the at least one feature output by the shallower convolutional layer to obtain a feature having the same scale as the feature output by the last convolutional layer comprises:
and performing pooling operation on the at least one feature output based on the shallower convolutional layer to obtain a feature with the same size as the feature scale output by the last convolutional layer.
4. The method of claim 3, wherein the pooling the at least one feature based on the shallower convolutional layer output comprises:
and sequentially performing pooling operation on the extracted features of each shallower convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
5. The method of claim 2, wherein stacking features of the same dimension size to obtain a fused feature comprises:
and stacking the features with the same dimension in sequence by taking the channel as an axis to obtain a fusion feature, wherein the dimension of the fusion feature corresponds to the sum of the output channels of the convolutional layers.
6. The method according to any one of claims 1-5, wherein the predicting skin attributes of the face image based on the fused features comprises:
predicting the skin attribute of the face image based on the fusion features through a full connection layer in a neural network;
before the predicting of the skin attribute of the face image based on the fusion feature, the method further includes:
and reducing the dimension of the fused feature by a dimension reduction convolutional layer.
7. The method of claim 6, wherein the number of convolution kernels in the dimension reduction convolutional layer is less than a preset value, and the size of each convolution kernel is 1×1;
reducing the dimension of the fusion feature by a dimension reduction convolutional layer, comprising: and performing convolution operation on the fusion feature based on the dimensionality reduction convolution layer with the number of the convolution kernels smaller than a preset value to obtain a fusion feature graph with the dimensionality being the number of the convolution kernels.
8. The method according to any one of claims 1 to 5, wherein before extracting the features of the face image in the image to be recognized through each convolution layer in the neural network, the method further comprises:
carrying out face detection on the image to be recognized to obtain the face image, and extracting the face image from the image to be recognized;
the face detection of the image to be recognized comprises the following steps:
obtaining at least one face position feature and a face confidence coefficient threshold corresponding to the image to be recognized from the image to be recognized by using a face detection network, wherein the face position feature comprises a face position rectangular frame and a face confidence coefficient;
obtaining a face position rectangular frame with a face confidence degree larger than the face confidence degree threshold value based on the obtained face position characteristics;
and performing face key point detection on the face position rectangular frame based on a face key point network to obtain face key points, and obtaining a face image from the face position rectangular frame based on the face key points.
9. The method of any one of claims 1-5, wherein the predictive signature of the skin attribute comprises any one or more of:
skin quality, skin color, skin brightness.
10. The method of any of claims 1-5, further comprising:
and performing beautification processing operation on the face image based on the prediction label of the skin attribute.
11. The method of any of claims 1-5, further comprising:
setting an image to be recognized as a sample image, and training the neural network:
extracting the features of the face image in the sample image through each convolutional layer in the neural network; the sample image being annotated with at least one known annotation label;
fusing the extracted features of at least one shallower convolutional layer in the neural network with the extracted features of the last convolutional layer to obtain fused features; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network;
predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute;
training the neural network based on the obtained predictive labels and known annotation labels.
12. The method of claim 1, further comprising:
taking the neural network with the updated parameters as the neural network, and iteratively: extracting the features of the face image in the sample image through each convolutional layer in the neural network, the sample image being annotated with at least one known annotation label; fusing the features extracted by at least one shallower convolutional layer in the neural network with the features extracted by the last convolutional layer to obtain fused features; predicting the skin attribute of the face image based on the fused features to obtain a prediction label of the skin attribute; calculating an error value through a loss function based on the obtained prediction label and the known annotation label; directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained, and back-propagating the error value through each convolutional layer of the neural network via the reverse gradient algorithm; and updating parameters in the convolutional layers based on the error values propagated into the convolutional layers; until the neural network meets a preset condition.
13. The method according to claim 12, wherein the preset condition comprises any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
14. A face skin attribute recognition apparatus, comprising:
the face recognition unit is used for obtaining at least one face position feature from the image to be recognized by using a face detection network; the face position features comprise a face angle, the face angle being the inclination angle of a face position rectangular frame relative to the horizontal direction of the image;
the angle adjusting unit is used for adjusting the face position rectangular frame based on the face angle to obtain an upright face position rectangular frame, and superimposing face key points onto the face position rectangular frame to obtain a face image;
the characteristic extraction unit is used for extracting the characteristics of the face image in the image to be recognized through each convolution layer in the neural network;
the characteristic fusion unit is used for fusing the extracted characteristics of at least one shallower convolutional layer and the extracted characteristics of the last convolutional layer in the neural network to obtain fusion characteristics; the shallower convolutional layer is the other convolutional layer except the last convolutional layer in the neural network; the feature fusion unit is specifically configured to stack, according to channels, a plurality of features with the same scale size obtained based on the features extracted by at least one shallower convolutional layer and the features extracted by the last convolutional layer in the neural network, so as to obtain the fusion features;
the attribute prediction unit is used for predicting the skin attribute of the face image based on the fusion feature to obtain a prediction label of the skin attribute;
further comprising: a network training unit, the network training unit comprising: the error calculation module is used for calculating an error value through a loss function based on a prediction label obtained by processing a sample image through the neural network and a known label corresponding to the sample image;
the error propagation module is used for directly back-propagating the error value to the at least one shallower convolutional layer and the last convolutional layer from which the fused features are obtained in the neural network, and back-propagating the error value through each convolutional layer of the neural network via a reverse gradient algorithm;
a parameter update module to update parameters in the convolutional layers based on the error values propagated into the convolutional layers.
15. The apparatus of claim 14, wherein the feature fusion unit comprises:
the scale transformation module is used for performing scale transformation on at least one feature output by each shallower convolutional layer of the neural network to obtain features with the same scale as the feature output by the last convolutional layer;
and the feature stacking module is used for stacking the features with the same dimension to obtain the fused features.
16. The apparatus of claim 15, wherein the scaling module is specifically configured to pool the at least one feature based on the output of the shallower convolutional layer to obtain a feature having a same scale size as a feature output of a last convolutional layer.
17. The apparatus of claim 16, wherein the pooling operation by the scaling module comprises:
and sequentially performing pooling operation on the extracted features of each shallower convolutional layer according to a pooling strategy of alternating average pooling and maximum pooling.
18. The apparatus of claim 15, wherein the feature stacking module is specifically configured to stack the features with the same dimension in sequence with a channel as an axis to obtain a fused feature, and the dimension of the fused feature corresponds to a sum of output channels of the convolutional layers.
19. The apparatus according to any of the claims 14 to 18, wherein the property prediction unit is specifically configured to perform, through a fully connected layer in a neural network, prediction of skin properties for the face image based on the fused features;
the face skin attribute recognition device further comprises:
and the dimension reduction unit is used for reducing the dimension of the fusion feature through a dimension reduction convolutional layer.
20. The apparatus of claim 19, wherein the number of convolution kernels in the dimension reduction convolutional layer is less than a preset value, and the size of each convolution kernel is 1×1;
the dimension reduction unit is specifically configured to perform convolution operation on the fusion feature based on the dimension reduction convolution layer with the number of convolution kernels smaller than a preset value, and obtain a fusion feature map with the dimension being the number of the convolution kernels.
21. The apparatus according to any one of claims 14 to 18, wherein the face recognition unit is configured to perform face detection on the image to be recognized, obtain the face image, and extract the face image from the image to be recognized;
the face recognition unit includes:
the position acquisition module is used for acquiring at least one face position feature and a face confidence coefficient threshold corresponding to the image to be recognized from the image to be recognized by using a face detection network, wherein the face position feature comprises a face position rectangular frame and a face confidence coefficient;
the position determining module is used for obtaining a face position rectangular frame with a face confidence coefficient larger than the face confidence coefficient threshold value based on the obtained face position characteristics;
and the face acquisition module is used for executing face key point detection on the face position rectangular frame based on a face key point network to obtain face key points and acquiring a face image from the face position rectangular frame based on the face key points.
22. The apparatus according to any of claims 14-18, wherein the predictive signature of skin properties comprises any one or more of:
skin quality, skin color, skin brightness.
23. The apparatus of any of claims 14-18, further comprising:
and the beautifying unit is used for carrying out beautifying processing operation on the face image based on the prediction label of the skin attribute.
24. The apparatus of any of claims 14-18, further comprising:
the sample prediction unit is used for setting an image to be recognized as a sample image and obtaining a prediction label corresponding to the skin attribute of the sample image based on the feature extraction unit, the feature fusion unit and the attribute prediction unit; the sample image is annotated with at least one known annotation label;
and the network training unit is used for training the neural network based on the obtained prediction label and the known label.
25. The apparatus of claim 14, wherein the network training unit further comprises:
the iteration updating module is used for taking the neural network after the parameters are updated as the neural network, and iterating to obtain a prediction label corresponding to the skin attribute of the sample image based on the feature extracting unit, the feature fusing unit and the attribute predicting unit; updating the parameters in each convolution layer based on the error calculation module, the error propagation module and the parameter updating module; until the neural network meets a preset condition.
26. The apparatus of claim 25, wherein the preset condition comprises any one of:
the loss function is converged, the iteration times reach preset times, and the error value is smaller than a preset value.
27. An electronic device, characterized in that it comprises a processor comprising the facial skin attribute recognition apparatus of any one of claims 14 to 26.
28. An electronic device, comprising: a memory for storing executable instructions;
and a processor in communication with the memory for executing the executable instructions to perform the operations of the face-skin attribute recognition method of any one of claims 1 to 13.
29. A computer storage medium storing computer readable instructions, wherein the instructions, when executed, perform the operations of the face-skin attribute recognition method of any one of claims 1 to 13.
CN201710927454.7A 2017-09-30 2017-09-30 Face skin attribute identification method and device, electronic equipment and storage medium Active CN108229296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710927454.7A CN108229296B (en) 2017-09-30 2017-09-30 Face skin attribute identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710927454.7A CN108229296B (en) 2017-09-30 2017-09-30 Face skin attribute identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108229296A CN108229296A (en) 2018-06-29
CN108229296B true CN108229296B (en) 2021-04-02

Family

ID=62655496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710927454.7A Active CN108229296B (en) 2017-09-30 2017-09-30 Face skin attribute identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108229296B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165666A (en) * 2018-07-05 2019-01-08 南京旷云科技有限公司 Multi-tag image classification method, device, equipment and storage medium
CN109614900B (en) * 2018-11-29 2021-04-02 深圳和而泰数据资源与云技术有限公司 Image detection method and device
CN113196279B (en) * 2018-12-18 2024-02-09 华为技术有限公司 Facial attribute identification method and electronic equipment
CN111353349B (en) * 2018-12-24 2023-10-17 杭州海康威视数字技术股份有限公司 Human body key point detection method and device, electronic equipment and storage medium
CN109730637B (en) * 2018-12-29 2021-04-23 中国科学院半导体研究所 Quantitative analysis system and method for facial image of human face
CN110069994B (en) * 2019-03-18 2021-03-23 中国科学院自动化研究所 Face attribute recognition system and method based on face multiple regions
CN110046577B (en) * 2019-04-17 2022-07-26 北京迈格威科技有限公司 Pedestrian attribute prediction method, device, computer equipment and storage medium
CN110321778B (en) * 2019-04-26 2022-04-05 北京市商汤科技开发有限公司 Face image processing method and device and storage medium
CN110348358B (en) * 2019-07-03 2021-11-23 网易(杭州)网络有限公司 Skin color detection system, method, medium and computing device
CN112288345B (en) * 2019-07-25 2024-08-20 顺丰科技有限公司 Loading and unloading port state detection method and device, server and storage medium
CN110599554A (en) * 2019-09-16 2019-12-20 腾讯科技(深圳)有限公司 Method and device for identifying face skin color, storage medium and electronic device
CN111339813B (en) * 2019-09-30 2022-09-27 深圳市商汤科技有限公司 Face attribute recognition method and device, electronic equipment and storage medium
CN110633700B (en) * 2019-10-21 2022-03-25 深圳市商汤科技有限公司 Video processing method and device, electronic equipment and storage medium
CN111191527B (en) * 2019-12-16 2024-03-12 北京迈格威科技有限公司 Attribute identification method, attribute identification device, electronic equipment and readable storage medium
CN111144310A (en) * 2019-12-27 2020-05-12 创新奇智(青岛)科技有限公司 Face detection method and system based on multi-layer information fusion
CN111325732B (en) * 2020-02-20 2023-07-11 深圳数联天下智能科技有限公司 Face residue detection method and related equipment
CN111598131B (en) * 2020-04-17 2023-08-25 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and storage medium
CN113796826A (en) * 2020-06-11 2021-12-17 懿奈(上海)生物科技有限公司 Method for detecting skin age of human face of Chinese
CN111860169B (en) * 2020-06-18 2024-04-30 北京旷视科技有限公司 Skin analysis method, device, storage medium and electronic equipment
CN113469950A (en) * 2021-06-08 2021-10-01 海南电网有限责任公司电力科学研究院 Method for diagnosing abnormal heating defect of composite insulator based on deep learning
CN113128526B (en) * 2021-06-17 2021-08-27 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and computer-readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9454714B1 (en) * 2013-12-09 2016-09-27 Google Inc. Sequence transcription with deep neural networks
CN106228162A (en) * 2016-07-22 2016-12-14 王威 A kind of quick object identification method of mobile robot based on degree of depth study
CN107169504A (en) * 2017-03-30 2017-09-15 湖北工业大学 A kind of hand-written character recognition method based on extension Non-linear Kernel residual error network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jhan S. Alarifi et al., "Facial Skin Classification Using Convolutional Neural Networks," ICIAR 2017: Image Analysis and Recognition, 2017-06-02, pp. 479-484 *
Chiun-Li Chin et al., "Facial skin image classification system using Convolutional Neural Networks deep learning algorithm," 2018 9th International Conference on Awareness Science and Technology (iCAST), 2018-11-01, pp. 1657-1661 *
Jin Qi et al., "Global and Local Information Based Deep Network for Skin Lesion Segmentation," arXiv:1703.05467v1 [cs.CV], 2017-03-31, pp. 1-3 *

Also Published As

Publication number Publication date
CN108229296A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108229296B (en) Face skin attribute identification method and device, electronic equipment and storage medium
CN108460338B (en) Human body posture estimation method and apparatus, electronic device, storage medium, and program
CN108898186B (en) Method and device for extracting image
US11068697B2 (en) Methods and apparatus for video-based facial recognition, electronic devices, and storage media
US11227147B2 (en) Face image processing methods and apparatuses, and electronic devices
CN110163080B (en) Face key point detection method and device, storage medium and electronic equipment
CN108229479B (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
CN107330439B (en) Method for determining posture of object in image, client and server
US11062453B2 (en) Method and system for scene parsing and storage medium
US11380017B2 (en) Dual-view angle image calibration method and apparatus, storage medium and electronic device
WO2018019126A1 (en) Video category identification method and device, data processing device and electronic apparatus
CN107679466B (en) Information output method and device
CN108765261B (en) Image transformation method and device, electronic equipment and computer storage medium
CN108776983A (en) Based on the facial reconstruction method and device, equipment, medium, product for rebuilding network
CN108229301B (en) Eyelid line detection method and device and electronic equipment
CN107609506B (en) Method and apparatus for generating image
US10521919B2 (en) Information processing device and information processing method for applying an optimization model
CN108427927A (en) Target recognition methods and device, electronic equipment, program and storage medium again
CN113688907B (en) A model training and video processing method, which comprises the following steps, apparatus, device, and storage medium
CN108388889B (en) Method and device for analyzing face image
CN108062544A (en) For the method and apparatus of face In vivo detection
WO2021223738A1 (en) Method, apparatus and device for updating model parameter, and storage medium
CN113177449B (en) Face recognition method, device, computer equipment and storage medium
CN110516598B (en) Method and apparatus for generating image
CN108154153B (en) Scene analysis method and system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant