CN110287930B - Wrinkle classification model training method and device
- Publication number
- CN110287930B (application CN201910586506.8A)
- Authority
- CN
- China
- Prior art keywords
- wrinkle
- layer
- face image
- processing
- classification model
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The embodiments of the present application provide a wrinkle classification model training method and device. At least one wrinkle is marked on each face image sample carrying a wrinkle-containing label, which converts the wrinkle classification problem into a wrinkle segmentation problem: the marked wrinkles serve as supervision signals when classifying wrinkles in a face image, improving the accuracy of the classification result.
Description
Technical Field
The application relates to the technical field of image processing, in particular to a wrinkle classification model training method and device.
Background
In some application scenarios, it is desirable to identify whether a face image contains wrinkles. In the related art, a convolutional neural network (CNN) model is generally used as the classification model. However, because a face image usually contains interfering features such as eye shape and eyelashes, the accuracy of a CNN in classifying whether a face image contains wrinkles is low.
Disclosure of Invention
In view of the above, an objective of the present application is to provide a wrinkle classification model training method and device that at least partially improve the classification accuracy of the wrinkle classification model.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a wrinkle classification model training method for training a pre-constructed wrinkle classification model, where the wrinkle classification model includes a feature extraction part and an up-sampling part; the feature extraction part includes a plurality of first processing layers, and the up-sampling part includes a plurality of second processing layers respectively corresponding to the plurality of first processing layers. The method includes:
acquiring a plurality of face image samples and a mask image of each face image sample, where a sample label is preset for each face image sample, the sample label is either a wrinkle-containing label or a wrinkle-free label, and the mask image of a face image sample provided with a wrinkle-containing label includes the contour of at least one wrinkle in that sample;
inputting each face image sample into the wrinkle classification model; inputting the feature map output by the highest first processing layer of the feature extraction part into a full connection layer, and calculating the classification loss between the output of the full connection layer and the sample label of the face image sample to obtain a first calculation result; calculating the segmentation loss between the feature map output by at least one second processing layer of the up-sampling part and the mask image of the face image sample to obtain at least one second calculation result; and
adjusting parameters of the wrinkle classification model according to the first calculation result and the second calculation result, respectively, to realize the training of the wrinkle classification model.
In a second aspect, an embodiment of the present application provides a wrinkle classification model training device for training a pre-constructed wrinkle classification model, where the wrinkle classification model includes a feature extraction part and an up-sampling part; the feature extraction part includes a plurality of first processing layers, and the up-sampling part includes a plurality of second processing layers respectively corresponding to the plurality of first processing layers. The device includes:
an acquisition module configured to acquire a plurality of face image samples and a mask image of each face image sample, where a sample label is preset for each face image sample, the sample label is either a wrinkle-containing label or a wrinkle-free label, and the mask image of a face image sample provided with a wrinkle-containing label includes the contour of at least one wrinkle in that sample;
a computing module configured to input each face image sample into the wrinkle classification model; input the feature map output by the highest first processing layer of the feature extraction part into a full connection layer, and calculate the classification loss between the output of the full connection layer and the sample label of the face image sample to obtain a first calculation result; and calculate the segmentation loss between the feature map output by at least one second processing layer of the up-sampling part and the mask image of the face image sample to obtain at least one second calculation result; and
a parameter adjusting module configured to adjust parameters of the wrinkle classification model according to the first calculation result and the second calculation result, respectively, to realize the training of the wrinkle classification model.
Compared with the prior art, the wrinkle classification model training method and device provided by the embodiments of the present application convert the wrinkle classification problem into a segmentation problem, so that at least one wrinkle marked in the mask image can serve as a supervision signal for classifying wrinkles in a face image, and the classification accuracy can be improved with fewer samples.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a wrinkle classification model according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a wrinkle classification model training method according to an embodiment of the present application;
fig. 4 is a functional block diagram of a wrinkle classification model training device according to an embodiment of the present application.
Icon: 10-an image processing device; 11-a processor; 12-a machine-readable storage medium; 20-wrinkle classification model; 21-a feature extraction section; 22-an upsampling part; 40-a wrinkle classification model training device; 41-an acquisition module; 42-a calculation module; 43-parameter adjustment module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Referring to fig. 1, fig. 1 is a block diagram of an image processing apparatus 10 according to an embodiment of the present application. The image processing apparatus 10 may be any apparatus with image processing capability, such as a server or a personal computer (PC).
The image processing apparatus 10 includes a processor 11 and a machine-readable storage medium 12, and the processor 11 and the machine-readable storage medium 12 are communicatively connected by a system bus. The machine-readable storage medium 12 stores machine-executable instructions that, when executed by the processor 11, may implement the wrinkle classification model training method described below in this embodiment.
Additionally, the machine-readable storage medium 12 may also store a pre-constructed wrinkle classification model.
It should be noted that the image processing apparatus shown in fig. 1 is merely illustrative; the image processing apparatus may include more or fewer components than those shown in fig. 1, or may have a completely different configuration.
Referring to fig. 2, fig. 2 exemplarily shows the architecture of a wrinkle classification model 20 provided by an embodiment of the present application.
The wrinkle classification model 20 includes a feature extraction part 21 and an up-sampling part 22. The feature extraction part 21 includes a plurality of first processing layers, such as L1, L2, L3, L4, and L5 shown in fig. 2. Each first processing layer includes at least one preset layer group, and a preset layer group consists of a convolution (Conv) layer and an activation (ReLU) function connected in sequence.
The up-sampling part 22 includes a plurality of second processing layers, such as L5, L6, L7, L8, and L9 shown in fig. 2, corresponding to the plurality of first processing layers, respectively. Here, L5 may be regarded both as the highest first processing layer of the feature extraction part 21 and as the highest second processing layer of the up-sampling part 22. L5 as a first processing layer corresponds to L5 as a second processing layer, L6 corresponds to L4, L7 corresponds to L3, L8 corresponds to L2, and L9 corresponds to L1.
Each second processing layer comprises at least one preset layer group. It should be noted that the convolutional layers and the activation functions in different preset layer groups may have different parameters or the same parameters, and the embodiment is not limited thereto.
Optionally, in this embodiment, a normalization layer (BatchNorm) may further be disposed in the feature extraction part between the convolution layer and the activation function of each preset layer group, to normalize the feature map output by the convolution layer.
Alternatively, the wrinkle classification model may be implemented based on an improved U-Net model.
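For illustration, a minimal PyTorch sketch of such an architecture is given below. It is not the patent's exact network: the five-layer layout, channel widths, three-channel input, and one-channel side-output heads are assumptions of this sketch, and padded convolutions are used so that the crop step described later becomes unnecessary here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def preset_layer_group(in_ch, out_ch):
    # One "preset layer group": convolution -> BatchNorm -> ReLU in sequence.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class WrinkleClassificationModel(nn.Module):
    def __init__(self, channels=(16, 32, 64, 128, 256), num_classes=2):
        super().__init__()
        # Feature extraction part: first processing layers L1..L5 (assumed depth).
        self.down = nn.ModuleList()
        in_ch = 3
        for ch in channels:
            self.down.append(preset_layer_group(in_ch, ch))
            in_ch = ch
        # Up-sampling part: second processing layers L6..L9.
        rev = list(channels[::-1])                       # 256, 128, 64, 32, 16
        self.up_conv = nn.ModuleList(
            nn.ConvTranspose2d(hi, lo, kernel_size=2, stride=2)
            for hi, lo in zip(rev[:-1], rev[1:]))
        self.up = nn.ModuleList(
            preset_layer_group(lo * 2, lo) for lo in rev[1:])
        # One-channel side outputs for the multi-scale segmentation losses.
        self.seg_heads = nn.ModuleList(nn.Conv2d(lo, 1, 1) for lo in rev[1:])
        # Classification head: global pooling feeding a full connection layer.
        self.fc = nn.Linear(channels[-1], num_classes)

    def forward(self, x):
        skips = []
        for i, layer in enumerate(self.down):
            x = layer(x)
            skips.append(x)                  # copy kept for the skip connection
            if i < len(self.down) - 1:
                x = F.max_pool2d(x, 2)       # pool before the next first layer
        # Classification branch from the highest first processing layer (L5).
        logits = self.fc(F.adaptive_avg_pool2d(x, 1).flatten(1))
        # Up-sampling branch; padding keeps sizes aligned, so no crop is needed.
        seg_maps = []
        for up_conv, up, head, skip in zip(
                self.up_conv, self.up, self.seg_heads, reversed(skips[:-1])):
            x = up(torch.cat([up_conv(x), skip], dim=1))  # concat upsampled + skip
            seg_maps.append(head(x))
        return logits, seg_maps
```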
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a wrinkle classification model training process according to an embodiment of the present application, where the method can be applied to the image processing apparatus shown in fig. 1. The individual steps of the method will be described in detail below.
In step S31, a plurality of face image samples and a mask image for each face image sample are acquired.
A sample label may be preset for each face image sample according to whether the sample contains wrinkles: if a face image sample contains wrinkles, a wrinkle-containing label is added to it; otherwise, a wrinkle-free label is added. In practice, a specific field may be set for each face image sample, where one value of the field (e.g., 0) indicates the wrinkle-containing label and another value (e.g., 1) indicates the wrinkle-free label.
In this embodiment, for a face image sample provided with a wrinkle-containing label, its mask image includes the contour of at least one wrinkle in the sample, for example the deepest wrinkle, so that fewer wrinkles need to be marked while the classification accuracy is still improved.
In this embodiment, the mask image of a face image sample has the same size as the sample itself. For a mask image including a wrinkle contour, the pixels on the wrinkle contour may take the value 255, and the pixels at all other positions may take the value 0 (i.e., black).
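As a hedged illustration, such a mask could be rasterized from wrinkle annotations as sketched below, assuming OpenCV and that each wrinkle is annotated as a polyline of (x, y) points; the annotation format is an assumption, not something the patent specifies.

```python
import cv2
import numpy as np

def make_wrinkle_mask(image_shape, wrinkle_contours):
    """Build a mask the same size as the sample: contour pixels 255, rest 0."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)   # all-black background
    for contour in wrinkle_contours:
        pts = np.asarray(contour, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(mask, [pts], isClosed=False, color=255, thickness=1)
    return mask

# Example: one hypothetical wrinkle annotated as three points.
mask = make_wrinkle_mask((256, 256, 3), [[(40, 60), (80, 64), (120, 70)]])
```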
Step S32, inputting each face image sample into the wrinkle classification model respectively.
In this embodiment, each face image sample input into the wrinkle classification model is first processed sequentially by the first processing layers of the feature extraction part. The feature map output by each first processing layer may be pooled (pool) and then input into the next connected first processing layer; in addition, a copy of that feature map may be made (copy), cropped (crop), and input into the second processing layer corresponding to that first processing layer.
Further, for the feature map output by the highest first processing layer of the feature extraction part (for example, L5) or by each second processing layer, the feature map may be up-sampled, with the mask image of the face image sample serving as the supervision signal, and then input into the next connected second processing layer. The up-sampling may be implemented by an up-sampling layer or by a deconvolution layer; this embodiment is not limited in this respect.
In this case, each second processing layer receives an up-sampled feature map and a cropped feature map. In this embodiment, each second processing layer may concatenate (concat) the received up-sampled feature map with the cropped feature map, process the concatenation result, and output a corresponding feature map.
Optionally, in this embodiment, when the mask image of a face image sample is used as the supervision signal for the feature map output by a first or second processing layer, the mask image may be down-sampled to the same scale as that feature map. Specifically, the down-sampling may use a max pooling (MaxPooling) algorithm; compared with the average pooling used in the related art, this prevents the wrinkle contour in the mask image from being averaged together with the surrounding black pixels.
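A minimal sketch of this down-sampling step, assuming PyTorch, a 0/255 mask already scaled to {0, 1}, and integer scale ratios between the mask and the feature map (all assumptions of the illustration):

```python
import torch
import torch.nn.functional as F

def downsample_mask(mask, feature_hw):
    """Max-pool a binary mask (N, 1, H, W) to a feature map's spatial scale."""
    factor = mask.shape[-1] // feature_hw[-1]   # assumes integer scale ratio
    if factor <= 1:
        return mask
    return F.max_pool2d(mask, kernel_size=factor)

mask = (torch.rand(1, 1, 256, 256) > 0.995).float()   # sparse contour pixels
small = downsample_mask(mask, (64, 64))
# Max pooling keeps any contour pixel in each window; average pooling would
# dilute the thin contour into the surrounding black (zero) pixels.
print(mask.max().item(), small.max().item())
```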
Step S33, inputting the feature map output by the highest first processing layer of the feature extraction part into a full connection layer, and calculating the classification loss (loss) between the output of the full connection layer and the sample label of the face image sample to obtain a first calculation result.
Specifically, the output of the full connection layer indicates the classification result of the wrinkle classification model for the face image sample. A classification loss function may be used to calculate the loss between the output of the full connection layer and the sample label; the resulting function value is the first calculation result. Optionally, the classification loss function may be, for example, a Softmax function.
With the full connection layer in place, the wrinkle classification model can be applied to the classification problem in addition to the segmentation problem.
Optionally, in this embodiment, step S33 may further include the following sub-step:
processing the feature map output by the highest first processing layer of the feature extraction part through a global pooling layer, and inputting the processing result into the full connection layer.
In this way, the number of parameters can be reduced without affecting the accuracy of the processing result, which improves both the training speed and the data processing speed when the wrinkle classification model is used in practice.
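Putting the global pooling layer, the full connection layer, and a Softmax-based loss together, this branch might be sketched as follows in PyTorch; the channel count and batch shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def classification_loss(top_feature_map, fc, labels):
    # top_feature_map: (N, C, H, W) from the highest first processing layer.
    pooled = F.adaptive_avg_pool2d(top_feature_map, 1).flatten(1)  # (N, C)
    logits = fc(pooled)                        # full connection layer output
    return F.cross_entropy(logits, labels)     # softmax loss: the first result

fc = nn.Linear(256, 2)                         # wrinkle / no-wrinkle classes
loss = classification_loss(torch.randn(4, 256, 14, 14), fc,
                           torch.tensor([0, 1, 0, 1]))
```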
Step S34, calculating the segmentation loss between the feature map output by at least one second processing layer of the up-sampling part and the mask image of the face image sample to obtain at least one second calculation result.
In detail, in this embodiment, the segmentation loss (loss) between the feature map output by each second processing layer except the lowest one (e.g., L9 shown in fig. 2) and the mask image of the face image sample may be calculated separately, yielding a plurality of second calculation results. In implementation, a preset segmentation loss function may be used; each resulting function value is one of the second calculation results.
By calculating the segmentation loss at multiple locations in the wrinkle classification model (i.e., setting up supervision signals at multiple layers), the classification accuracy can be improved more effectively.
Optionally, in this embodiment, before calculating the segmentation loss between the feature map output by each second processing layer except the lowest one and the mask image of the face image sample, the mask image may be down-sampled into an image of the same scale as that feature map. The down-sampling may use the max pooling algorithm described above.
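Under the conventions of the model sketch above (side outputs ordered from the highest second processing layer down to the lowest), the plurality of second calculation results might be computed as sketched below; binary cross-entropy is an illustrative choice, since the description only requires "a preset segmentation loss function":

```python
import torch
import torch.nn.functional as F

def segmentation_losses(seg_maps, mask):
    # seg_maps: one-channel side outputs, lowest second processing layer last;
    # mask: (N, 1, H, W) binary mask with values in {0, 1}.
    losses = []
    for seg in seg_maps[:-1]:                  # skip the lowest layer (L9)
        factor = mask.shape[-1] // seg.shape[-1]
        target = F.max_pool2d(mask, factor) if factor > 1 else mask
        losses.append(F.binary_cross_entropy_with_logits(seg, target))
    return losses                              # one second result per scale

maps = [torch.randn(2, 1, s, s) for s in (32, 64, 128, 256)]
mask = (torch.rand(2, 1, 256, 256) > 0.99).float()
print([l.item() for l in segmentation_losses(maps, mask)])
```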
Step S35, adjusting parameters of the wrinkle classification model according to the first calculation result and the second calculation results, respectively, to realize the training of the wrinkle classification model.
In implementation, when the first calculation result is obtained, the parameters of the wrinkle classification model can be adjusted according to it so as to reduce the value of the corresponding classification loss function; and each time a second calculation result is obtained, the parameters of the wrinkle classification model are adjusted according to it so as to reduce the value of the segmentation loss function that produced that second calculation result.
Repeating the above flow to adjust the parameters of the wrinkle classification model realizes the training of the wrinkle classification model.
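Tying the sketches above together, one training iteration might look like the following. The dummy `loader`, the Adam optimizer, and summing the loss terms into a single joint parameter update (rather than one adjustment per calculation result, as described above) are simplifying assumptions of this illustration:

```python
import torch
import torch.nn.functional as F

model = WrinkleClassificationModel()          # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy one-batch loader standing in for a real dataset (assumption).
loader = [(torch.randn(2, 3, 64, 64),                      # face images
           torch.tensor([0, 1]),                           # sample labels
           (torch.rand(2, 1, 64, 64) > 0.99).float())]     # mask images

for images, labels, masks in loader:
    logits, seg_maps = model(images)
    loss = F.cross_entropy(logits, labels)                 # first result
    for seg in seg_maps[:-1]:                              # second results
        factor = masks.shape[-1] // seg.shape[-1]
        target = F.max_pool2d(masks, factor) if factor > 1 else masks
        loss = loss + F.binary_cross_entropy_with_logits(seg, target)
    optimizer.zero_grad()
    loss.backward()     # drive every loss function value down together
    optimizer.step()    # adjust the model parameters
```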
In this design, mask images at multiple scales are used as supervision signals for wrinkle classification training on face image samples, so the wrinkle classification model learns more accurate wrinkle features during training and can at least partially filter out the influence of interfering facial features on the wrinkle features, such as the influence of eyelashes on eye wrinkles, of hair on forehead lines, and of the shape of the nose or the mouth corners on nasolabial folds. The classification precision of the wrinkle classification model, i.e., its generalization ability, is thereby improved.
Referring to fig. 4, fig. 4 is a functional block diagram of a wrinkle classification model training device 40 according to an embodiment of the present application. The wrinkle classification model training device 40 comprises at least one functional module that may be stored in the machine-readable storage medium 12 in the form of machine-executable instructions. Specifically, the wrinkle classification model training device 40 includes an obtaining module 41, a calculating module 42, and a parameter adjusting module 43.
The obtaining module 41 is configured to obtain a plurality of face image samples and a mask image of each face image sample.
In this embodiment, a sample label is preset in each face image sample, where the sample label is specifically a wrinkle-containing label or a wrinkle-free label, and a mask image of the face image sample with the wrinkle-containing label includes a contour of at least one wrinkle in the face image sample.
The computing module 42 is configured to input each face image sample into the wrinkle classification model; input the feature map output by the highest first processing layer of the feature extraction part into a full connection layer, and calculate the classification loss between the output of the full connection layer and the sample label of the face image sample to obtain a first calculation result; and calculate the segmentation loss between the feature map output by at least one second processing layer of the up-sampling part and the mask image of the face image sample to obtain at least one second calculation result.
Optionally, the computing module 42 may be specifically configured to calculate, separately, the segmentation loss between the feature map output by each second processing layer except the lowest one in the up-sampling part and the mask image of the face image sample, obtaining a plurality of second calculation results.
Optionally, the computing module 42 may further be configured to, before calculating the segmentation loss between the feature map output by each such second processing layer and the mask image of the face image sample, down-sample the mask image into an image of the same scale as the feature map output by that second processing layer.
The parameter adjusting module 43 is configured to adjust parameters of the wrinkle classification model according to the first calculation result and the second calculation result, respectively, so as to implement training of the wrinkle classification model.
The detailed description of the above functional modules may refer specifically to the explanation of the corresponding steps above.
In summary, the present application provides a wrinkle classification model training method and device. A plurality of face image samples and their mask images are acquired, where each sample is provided with a wrinkle-containing label or a wrinkle-free label, and the mask image of a sample provided with a wrinkle-containing label includes the contour of at least one wrinkle in that sample. Each sample is input into the wrinkle classification model; the feature map output by the highest layer of the model's feature extraction part is input into a full connection layer, and the classification loss between the output of the full connection layer and the sample label is calculated to obtain a first calculation result; the segmentation loss between the feature map output by at least one second processing layer of the model's up-sampling part and the sample's mask image is calculated to obtain at least one second calculation result; and the parameters of the model are adjusted according to the first and second calculation results, respectively, to train the model. In this way, the wrinkle classification problem is converted into a segmentation problem, at least one wrinkle marked in the mask image can serve as a supervision signal for classifying wrinkles in a face image, and the classification accuracy can be improved with fewer samples.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to embodiments of the present application; each block in a flowchart or block diagram may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A wrinkle classification model training method is used for training a pre-constructed wrinkle classification model and is characterized in that the wrinkle classification model comprises a feature extraction part and an up-sampling part; the feature extraction section includes a plurality of first processing layers, and the up-sampling section includes a plurality of second processing layers respectively corresponding to the plurality of first processing layers; the method comprises the following steps:
acquiring a plurality of face image samples and a mask image of each face image sample; a sample label is preset in each face image sample, the sample label is a wrinkle-containing label or a wrinkle-free label, and a mask image of the face image sample provided with the wrinkle-containing label comprises a contour of at least one wrinkle in the face image sample;
inputting each face image sample into the wrinkle classification model respectively; inputting the feature map output by the first processing layer at the highest layer in the feature extraction part into a full connection layer, and calculating the classification loss of the feature map output by the full connection layer and the sample label of the face image sample to obtain a first calculation result;
respectively calculating the segmentation loss between the feature map output by each second processing layer except the lowest layer in the up-sampling part and the mask image of the face image sample to obtain a plurality of second calculation results;
when a first calculation result is obtained, adjusting the parameters of the wrinkle classification model according to the first calculation result so as to reduce the function value of the corresponding classification loss function, and each time a second calculation result is obtained, adjusting the parameters of the wrinkle classification model according to the second calculation result so as to reduce the function value of the segmentation loss function outputting the second calculation result, thereby realizing the training of the wrinkle classification model.
2. The method according to claim 1, wherein the step of inputting the feature map output from the first processing layer at the highest layer in the feature extraction part into a full connection layer comprises:
and processing the feature graph output by the first processing layer at the highest layer in the feature extraction part through a global pooling layer, and inputting a processing result into the full connection layer.
3. The method of claim 1, further comprising:
and before calculating the segmentation loss of the feature map output by each second processing layer except the lowest layer in the upsampling part and the mask image of the face image sample, downsampling the mask image of the face image sample into an image with the same scale as the feature map output by the second processing layer.
4. The method of claim 3, wherein the step of down-sampling the mask image of the facial image sample into an image having the same scale as the feature map output by the second processing layer comprises:
and downsampling the mask image of the face image sample by adopting a Maxboosting algorithm to obtain an image with the same scale as the feature map output by the second processing layer.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
processing each face image sample input into the wrinkle classification model by the feature extraction part, and, for the feature map output by each first processing layer, cropping the feature map and inputting it into the second processing layer corresponding to that first processing layer, and pooling the feature map and inputting it into the next first processing layer connected to that first processing layer;
for the feature map output by the first processing layer at the highest layer in the feature extraction part or by each second processing layer, up-sampling the feature map with the mask image of the face image sample as a supervision signal and then inputting it into the next second processing layer; and
each second processing layer concatenating the received up-sampled feature map with the cropped feature map, processing the concatenation result, and outputting the processed result.
6. A wrinkle classification model training device is used for training a pre-constructed wrinkle classification model and is characterized in that the wrinkle classification model comprises a feature extraction part and an up-sampling part; the feature extraction section includes a plurality of first processing layers, and the up-sampling section includes a plurality of second processing layers respectively corresponding to the plurality of first processing layers; the device comprises:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a plurality of face image samples and mask images of each face image sample; a sample label is preset in each face image sample, the sample label is a wrinkle-containing label or a wrinkle-free label, and a mask image of the face image sample provided with the wrinkle-containing label comprises a contour of at least one wrinkle in the face image sample;
the computing module is used for respectively inputting each face image sample into the wrinkle classification model; inputting the feature map output by the first processing layer at the highest layer in the feature extraction part into a full connection layer, and calculating the classification loss of the feature map output by the full connection layer and the sample label of the face image sample to obtain a first calculation result;
respectively calculate the segmentation loss between the feature map output by each second processing layer except the lowest layer in the up-sampling part and the mask image of the face image sample to obtain a plurality of second calculation results;
and the parameter adjusting module is configured to, when the first calculation result is obtained, adjust the parameters of the wrinkle classification model according to the first calculation result so as to reduce the function value of the corresponding classification loss function, and, each time a second calculation result is obtained, adjust the parameters of the wrinkle classification model according to the second calculation result so as to reduce the function value of the segmentation loss function outputting the second calculation result, thereby realizing the training of the wrinkle classification model.
7. The apparatus according to claim 6, wherein the computing module is specifically configured to process, by using a global pooling layer, the feature map output by the first processing layer at the highest layer in the feature extraction part, and output the processing result to the full-connection layer.
8. The apparatus according to claim 6, wherein the computing module is further configured to, before computing the segmentation loss between the feature map output by each second processing layer except the lowest layer in the up-sampling part and the mask image of the face image sample, down-sample the mask image of the face image sample into an image with the same scale as the feature map output by the second processing layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910586506.8A CN110287930B (en) | 2019-07-01 | 2019-07-01 | Wrinkle classification model training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910586506.8A CN110287930B (en) | 2019-07-01 | 2019-07-01 | Wrinkle classification model training method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110287930A (en) | 2019-09-27
CN110287930B (en) | 2021-08-20
Family
ID=68021547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910586506.8A (granted as CN110287930B, active) | Wrinkle classification model training method and device | 2019-07-01 | 2019-07-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287930B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738678B (en) * | 2019-10-18 | 2022-05-31 | 厦门美图宜肤科技有限公司 | Face fine line detection method and device, electronic equipment and readable storage medium |
CN112101185B (en) * | 2020-09-11 | 2024-05-28 | 深圳数联天下智能科技有限公司 | Method for training wrinkle detection model, electronic equipment and storage medium |
KR102495889B1 (en) * | 2022-07-13 | 2023-02-06 | 주식회사 룰루랩 | Method for detecting facial wrinkles using deep learning-based wrinkle detection model trained according to semi-automatic labeling and apparatus for the same |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985181A (en) * | 2018-06-22 | 2018-12-11 | 华中科技大学 | A kind of end-to-end face mask method based on detection segmentation |
CN109191476A (en) * | 2018-09-10 | 2019-01-11 | 重庆邮电大学 | The automatic segmentation of Biomedical Image based on U-net network structure |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016197303A1 (en) * | 2015-06-08 | 2016-12-15 | Microsoft Technology Licensing, Llc. | Image semantic segmentation |
CN109087306A (en) * | 2018-06-28 | 2018-12-25 | 众安信息技术服务有限公司 | Arteries iconic model training method, dividing method, device and electronic equipment |
CN109544545A (en) * | 2018-11-30 | 2019-03-29 | 济南浪潮高新科技投资发展有限公司 | A kind of salt mine intelligent detecting method and system based on convolutional neural networks |
CN109685776B (en) * | 2018-12-12 | 2021-01-19 | 华中科技大学 | Pulmonary nodule detection method and system based on CT image |
CN109840471B (en) * | 2018-12-14 | 2023-04-14 | 天津大学 | Feasible road segmentation method based on improved Unet network model |
CN109934796A (en) * | 2018-12-26 | 2019-06-25 | 苏州雷泰医疗科技有限公司 | A kind of automatic delineation method of organ based on Deep integrating study |
CN109741343B (en) * | 2018-12-28 | 2020-12-01 | 浙江工业大学 | T1WI-fMRI image tumor collaborative segmentation method based on 3D-Unet and graph theory segmentation |
CN109712165B (en) * | 2018-12-29 | 2022-12-09 | 安徽大学 | Similar foreground image set segmentation method based on convolutional neural network |
CN109840477B (en) * | 2019-01-04 | 2020-11-24 | 苏州飞搜科技有限公司 | Method and device for recognizing shielded face based on feature transformation |
CN109919961A (en) * | 2019-02-22 | 2019-06-21 | 北京深睿博联科技有限责任公司 | A kind of processing method and processing device for aneurysm region in encephalic CTA image |
- 2019-07-01: application CN201910586506.8A granted as patent CN110287930B (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985181A (en) * | 2018-06-22 | 2018-12-11 | 华中科技大学 | A kind of end-to-end face mask method based on detection segmentation |
CN109191476A (en) * | 2018-09-10 | 2019-01-11 | 重庆邮电大学 | The automatic segmentation of Biomedical Image based on U-net network structure |
Also Published As
Publication number | Publication date |
---|---|
CN110287930A (en) | 2019-09-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2021-12-09.
Patentee after: Xiamen Meitu Yifu Technology Co.,Ltd. Address after: 361100 568, No. 942, tonglong Second Road, torch high tech Zone (Xiang'an) Industrial Zone, Xiang'an District, Xiamen City, Fujian Province.
Patentee before: XIAMEN HOME MEITU TECHNOLOGY Co.,Ltd. Address before: B1f-089, Zone C, Huaxun building, software park, torch high tech Zone, Xiamen City, Fujian Province.