CN114663417A - Sensitive prediction model training method, face image processing method and equipment


Info

Publication number: CN114663417A
Authority: CN (China)
Prior art keywords: image, sensitive, sample, channel, face image
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202210350011.7A
Other languages: Chinese (zh)
Inventors: 齐子铭 (Qi Ziming), 刘兴云 (Liu Xingyun), 李志阳 (Li Zhiyang)
Current and original assignee: Xiamen Meitu Yifu Technology Co., Ltd.
Application filed by Xiamen Meitu Yifu Technology Co., Ltd.

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/048: Neural networks; architecture; activation functions
    • G06N 3/08: Neural networks; learning methods
    • G06T 2207/20132: Image segmentation details; image cropping
    • G06T 2207/20221: Image combination; image fusion; image merging
    • G06T 2207/30201: Subject of image; human being; face

Abstract

The application provides a method for training a sensitive prediction model, a face image processing method, and related equipment, in the technical field of computer image processing. The training method comprises the following steps: first, an image generator processes the multi-channel image corresponding to each sample face image to obtain a sample sensitive prediction image for each sample face image; second, an image discriminator processes each sample face image together with its sample sensitive prediction image to obtain the sensitive prediction accuracy of the image generator. The parameters of the image generator are adjusted and model training continues with the adjusted generator until the sensitive prediction accuracy obtained by the image discriminator meets a preset condition, at which point training is complete. The trained sensitive prediction model can predict changes in facial sensitivity from the user's current skin state and output an accurate, scientifically grounded sensitivity prediction result map.

Description

Sensitive prediction model training method, face image processing method and equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a training method of a sensitive prediction model, a face image processing method and face image processing equipment.
Background
In recent years the number of users suffering from sensitive skin has been increasing. Current skin sensitive-area detection methods can analyze the user's skin state and then produce a multi-dimensional skin analysis report for the current state, or simulate the trend of skin laxity and wrinkling over age by pasting overlays onto the user's face region.
However, no current skin sensitive-area detection method can go further and scientifically, individually predict the change trend of the skin's sensitive areas on the basis of the analysis of the user's current skin condition, and academia has not proposed a corresponding solution either.
Disclosure of Invention
The present invention aims to provide a training method for a sensitive prediction model, a face image processing method, and corresponding equipment, in order to remedy the above defects in the prior art and realize scientific prediction of the skin's sensitive areas.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a method for training a sensitive prediction model, where the sensitive prediction model includes an image generator and an image discriminator; the training method comprises the following steps:
obtaining a plurality of sample face images and red light images corresponding to the sample face images;
processing the plurality of sample face images to obtain non-sensitive area images corresponding to the plurality of sample face images;
respectively processing the red light images corresponding to the plurality of sample face images by adopting a preset sensitive area analysis algorithm to obtain sample sensitive area mask images corresponding to the plurality of sample face images;
combining a non-sensitive area image corresponding to each sample face image and a sample sensitive area mask image corresponding to each sample face image to obtain a multi-channel image corresponding to each sample face image;
processing the multi-channel image corresponding to each sample face image by using the image generator to obtain a sample sensitive prediction image corresponding to each sample face image;
processing the sample sensitive prediction image corresponding to each sample face image and each sample face image by adopting the image discriminator to obtain the sensitive prediction accuracy of the image generator;
and adjusting parameters of the image generator, and performing model training based on the image generator after parameter adjustment until the sensitivity prediction accuracy obtained by the image discriminator meets a preset condition.
Optionally, the processing, by using a preset sensitive region analysis algorithm, the red light images corresponding to the plurality of sample face images respectively to obtain sample sensitive region mask images corresponding to the plurality of sample face images includes:
analyzing the pixel values of each pixel point in a plurality of color channels in the red light images corresponding to the plurality of sample face images by adopting the preset sensitive region analysis algorithm, and determining whether each pixel point is a sensitive pixel point or a non-sensitive pixel point according to the pixel value of each pixel point in the plurality of color channels;
and assigning the pixel values of the sensitive pixel points and the non-sensitive pixel points in the red light images corresponding to the plurality of sample face images to obtain sample sensitive area mask images corresponding to the plurality of sample face images.
Optionally, the determining, according to the pixel value of each pixel point in a plurality of color channels, whether each pixel point is a sensitive pixel point or a non-sensitive pixel point includes:
calculating a first ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the blue light channel;
calculating a second ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the green light channel;
if the first ratio is larger than a first preset threshold, the second ratio is larger than a second preset threshold, the pixel value of a target pixel point in a green light channel is smaller than a third preset threshold, and the pixel value of the target pixel point in a blue light channel is smaller than a fourth preset threshold, determining that the target pixel point is a sensitive pixel point, and determining that other pixel points except the sensitive pixel point in a red light image corresponding to each sample face image are non-sensitive pixel points.
Optionally, before the non-sensitive region image corresponding to each sample face image and the sample sensitive region mask image corresponding to each sample face image are combined to obtain the multi-channel image corresponding to each sample face image, the method further includes:
normalizing the pixel value of the non-sensitive area image corresponding to each sample face image to a first pixel range;
normalizing the pixel value of the sample sensitive area mask image corresponding to each sample face image to a second pixel range;
the merging the non-sensitive area image corresponding to each sample face image and the sample sensitive area mask image corresponding to each sample face image to obtain the multi-channel image corresponding to each sample face image comprises:
and merging the image without the sensitive area after the pixel value normalization and the mask image of the sample sensitive area after the pixel value normalization to obtain a multi-channel image corresponding to each sample face image.
Optionally, the sensitive prediction model further includes: an attention learning module;
the processing the multi-channel image corresponding to each sample face image by using the image generator to obtain the sample sensitive prediction image corresponding to each sample face image comprises:
processing the multi-channel image corresponding to each sample face image by adopting the image generator to obtain a sensitive prediction result corresponding to each sample face image;
processing the sensitive prediction result by adopting the attention learning module to obtain a three-channel sensitive prediction image and a single-channel sensitive prediction image;
and fusing the three-channel sensitive prediction image and the single-channel sensitive prediction image to obtain a sample sensitive prediction image corresponding to each sample face image.
In a second aspect, an embodiment of the present application further provides a face image processing method, including:
acquiring an original face image and a red light image corresponding to the original face image;
processing the red light image by adopting a preset sensitive area analysis algorithm to obtain a sensitive area mask image;
merging the original face image and the sensitive area mask image to obtain a multi-channel image;
and processing the multi-channel image by using a preset sensitive prediction model to obtain a sensitive prediction image corresponding to the original face image, wherein the preset sensitive prediction model is a model trained by using the training method of the sensitive prediction model of the first aspect.
Optionally, before the preset sensitive prediction model is used to process the multi-channel image to obtain a sensitive prediction image corresponding to the original face image, the method includes:
cutting the multi-channel image according to preset cutting parameters to obtain a first multi-channel image;
and processing the first multi-channel image by adopting the preset sensitive prediction model to obtain a sensitive prediction image of part of original face images corresponding to the first multi-channel image.
Optionally, before the original face image and the sensitive region mask image are combined to obtain a multi-channel image, the method includes:
normalizing the original face image pixel value to a first pixel range;
normalizing the pixel value of the sensitive area mask image corresponding to the original face image to a second pixel range;
combining the original face image and the sensitive region mask image to obtain a multi-channel image, wherein the method comprises the following steps:
and merging the original face image after pixel value normalization and the sensitive area mask image after pixel value normalization to obtain a multi-channel image corresponding to the original face image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores program instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the program instructions to execute the steps of the face image processing method according to the second aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is executed by a processor to perform the steps of the face image processing method according to any one of the second aspects.
The beneficial effects of this application are as follows. The embodiment of the application provides a method for training a sensitive prediction model: first, a plurality of sample face images and the red light images corresponding to them are obtained; the sample face images are processed to obtain the corresponding non-sensitive-area images, and the red light images are processed with a preset sensitive-area analysis algorithm to obtain the corresponding sample sensitive-area mask images. The non-sensitive-area image and the sample sensitive-area mask image corresponding to each sample face image are then merged into the multi-channel image for that sample face image. Once this preparation is complete, the sensitive prediction model is trained: the image generator processes the multi-channel image of each sample face image to produce a sample sensitive prediction image, and the image discriminator processes each sample face image together with its sample sensitive prediction image to obtain the generator's sensitive prediction accuracy. The generator's parameters are adjusted and model training continues with the adjusted generator until the sensitive prediction accuracy obtained by the discriminator meets a preset condition, at which point training is complete. The trained sensitive prediction model is built on a large volume of data and realizes prediction of sensitive areas in face images using a generative adversarial network (the image generator and image discriminator), deep learning, and related techniques. It can predict changes in facial sensitivity (for example in the forehead, cheek, and nasal-wing areas) from the user's current skin state; in addition, because the decoding part of the image generator combines nearest-neighbor upsampling with convolution layers, the algorithm can output an accurate, scientifically grounded sensitivity prediction result map while preserving the original texture and personalized features of the user's skin.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of a method for training a sensitivity prediction model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training a sensitivity prediction model according to another embodiment of the present application;
fig. 3 is a schematic diagram of a sample sensitive area mask image corresponding to a sample face image according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for training a sensitivity prediction model according to another embodiment of the present application;
FIG. 5 is a flowchart of a method for training a sensitivity prediction model according to yet another embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for training a sensitivity prediction model according to yet another embodiment of the present application;
fig. 7 is a schematic structural diagram of an attention learning module according to an embodiment of the present application;
fig. 8 is a flowchart of a face image processing method according to an embodiment of the present application;
fig. 9 is a flowchart of a face image processing method according to another embodiment of the present application;
fig. 10 is a flowchart of a face image processing method according to yet another embodiment of the present application;
fig. 11 is a schematic view of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
In this application, unless explicitly stated or limited otherwise, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, for example two or three, unless specifically defined otherwise. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In recent years the number of users suffering from sensitive skin has increased. Sensitive skin can be understood simply as a skin condition with poor tolerance that is prone to allergy, in which the skin looks reddish and red streaks are easy to see. Predicting the future growth and change trend of the skin's sensitive areas from the user's current skin state, and then intervening therapeutically, can relieve the user's skin problems.
Most skin detection methods currently on the market can analyze the user's skin state from its current condition and then produce a multi-dimensional skin analysis report, covering, for example, skin quality, skin tolerance, pigmentation, skin laxity, and so on. Some simulate the trend of skin laxity and wrinkling with age only by pasting overlays onto the user's face region. None of these methods can further, scientifically and individually, predict the change trend of the skin's sensitive areas on the basis of the analysis of the user's current skin condition. Furthermore, no corresponding solution has been proposed in academia.
To address these defects, the embodiments of the present application provide several possible implementations that realize scientific prediction of the skin's sensitive areas, explained below through a number of examples with reference to the drawings. Fig. 1 is a flowchart of a method for training a sensitive prediction model according to an embodiment of the present application; the method may be executed by an electronic device running the training method, for example a terminal device or a server. As shown in fig. 1, the sensitive prediction model includes an image generator and an image discriminator, and the training method comprises the following steps:
step 101: and acquiring a plurality of sample face images and red light images corresponding to the plurality of sample face images.
It should be noted that, when the sensitive prediction model is trained, the number of sample face images and the variety of image conditions affect the realism of the trained model. The application therefore does not limit the number of sample face images; the user may set it according to actual needs. For example, 6000 sample face images and the 6000 red light images corresponding to them may be obtained. The sample face images may cover skin data under various illumination conditions, skin states, and angles and postures, so as to better train the sensitive prediction model. In addition, a sample face image may be, for example, a photo of a user or any image of a face, or a cropped local face image; this application does not limit it.
It should be noted that human skin color is determined mainly by two pigments: melanin and heme. These two pigments absorb and reflect light with relatively fixed spectra, so the colors they produce in an image are also relatively fixed. Since skin color is determined by the contents of these two pigments, their contents in the skin can be calculated from image imaging: the melanin content (presented as a brown map, Brown) and the heme content (presented as a red map, Red) are computed from the image. The red light image corresponding to a sample face image in this application can therefore be obtained by computing the heme content through image imaging and generating the red light image from that content. The foregoing is merely an example; in practice there may be other ways of generating the red light map, which this application does not limit.
Step 102: and processing the plurality of sample face images to obtain non-sensitive area images corresponding to the plurality of sample face images.
This application does not limit the specific way in which the non-sensitive-area images are obtained; the user may choose according to actual needs. In a possible implementation, a skin-color equalization algorithm may be used to obtain the non-sensitive-area images corresponding to the sample face images, for example a Look-Up Table (LUT) method; or, after manually labeling data and training a convolutional-neural-network-based skin-color equalization model, the non-sensitive-area images may be obtained through that model. The above are merely examples; other ways of obtaining non-sensitive-area images are possible in practice, which this application does not limit.
Step 103: and respectively processing the red light images corresponding to the plurality of sample face images by adopting a preset sensitive area analysis algorithm to obtain sample sensitive area mask images corresponding to the plurality of sample face images.
And processing the red light image corresponding to each sample face image by adopting a preset sensitive area analysis algorithm to obtain a sample sensitive area mask image corresponding to the sample face image. It should be noted that the preset sensitive region analysis algorithm may extract the sensitive region from the red light image and perform labeling, so as to obtain a sample sensitive region mask image corresponding to the sample face image. The specific implementation mode of the preset sensitive region analysis algorithm is not limited, as long as the extraction of the sensitive region in the red light image can be realized. In addition, the sample sensitive area mask image is an image obtained by performing sensitive area feature extraction on a red light image, a sensitive area and a non-sensitive area can be distinguished in the sample sensitive area mask image, and the distinguishing mode can be color, contour line and the like, which is not limited in the application.
Step 104: and combining the non-sensitive area image corresponding to each sample face image and the sample sensitive area mask image corresponding to each sample face image to obtain a multi-channel image corresponding to each sample face image.
Because obtaining the non-sensitive-area image does not alter the image's color channels, the non-sensitive-area image, like the sample face image, is a multi-channel image, while the sample sensitive-area mask image corresponding to the sample face image is a single-channel image. Merging the non-sensitive-area image and the sample sensitive-area mask image corresponding to each sample face image therefore yields a multi-channel image, namely the multi-channel image corresponding to that sample face image.
In a possible implementation, the sample face image has three channels (red, green and blue), the non-sensitive-area image likewise has these three channels, and the sample sensitive-area mask image is a single-channel image; merging the non-sensitive-area image corresponding to each sample face image with its sample sensitive-area mask image then yields a four-channel image.
The foregoing is merely an example, and in an actual implementation, other channel setting and channel merging manners are possible, which are not limited in this application.
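For illustration, a minimal sketch of this merge, assuming NCHW torch tensors and a 512x512 resolution; the variable names are hypothetical:

```python
import torch

# Concatenate the three-channel non-sensitive-area image with the
# single-channel sample sensitive-area mask along the channel dimension.
non_sensitive = torch.rand(1, 3, 512, 512)   # non-sensitive-area image (RGB)
mask = torch.rand(1, 1, 512, 512)            # sample sensitive-area mask image

multi_channel = torch.cat([non_sensitive, mask], dim=1)
print(multi_channel.shape)  # torch.Size([1, 4, 512, 512]) -- four-channel input
```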
Step 105: and processing the multi-channel image corresponding to each sample face image by using an image generator to obtain a sample sensitive prediction image corresponding to each sample face image.
And inputting the multi-channel image corresponding to each sample face image into an image generator, and processing the input multi-channel image by the image generator to obtain a sample sensitive prediction image corresponding to each sample face image.
In a possible implementation, the generator may be a full convolution network with an encoding-decoding structure, and the upsampling in the decoding part adopts a combination of nearest-neighbor upsampling and convolution layers. This lets the generator produce a more realistic sensitive region while preserving the skin details and personalized features of the other regions. Table 1 shows a generator network structure provided by an embodiment of the present application; its input size is 512x512.
Table 1: a generator network structure provided in an embodiment of the present application
(The table content is provided as images in the original publication and is not reproduced here.)
The negative_slope of each LeakyReLU may be set to 0.2.
It should also be noted that the above is only an example, and the image generator of the present application can be applied to any full convolution-generating countermeasure network structure, which is not limited in the present application.
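As an illustration of this decoding scheme, the sketch below implements one nearest-neighbor-upsampling-plus-convolution decoder stage in PyTorch; only the LeakyReLU negative_slope of 0.2 comes from the text, while the channel counts and kernel size are assumptions:

```python
import torch
import torch.nn as nn

class UpsampleConvBlock(nn.Module):
    """One decoder stage: nearest-neighbor upsampling followed by a convolution,
    as described for the generator's decoding part."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),   # upsample without checkerboard artifacts
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.2),              # slope 0.2 per the text
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Example: upsample a 64x64 feature map to 128x128 while halving the channels.
feat = torch.randn(1, 128, 64, 64)
print(UpsampleConvBlock(128, 64)(feat).shape)  # torch.Size([1, 64, 128, 128])
```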
Step 106: and processing the sample sensitive prediction image corresponding to each sample face image and each sample face image by adopting an image discriminator to obtain the sensitive prediction accuracy of the image generator.
And after the image discriminator acquires the sample sensitive prediction image corresponding to each sample face image generated by the image generator, processing the sample sensitive prediction image corresponding to each sample face image and each sample face image to obtain the sensitive prediction accuracy of the image generator.
In one possible implementation, the image discriminator may be a multi-scale discriminator that judges the authenticity of images at different resolutions. For example, when the input size of the image generator is 512x512, the discriminator is set to judge the authenticity of images at the following three resolutions (width x height x channels): 512x512x3, 256x256x3, 128x128x3. The above is merely an example; other discriminator configurations may be used in practice, which this application does not limit.
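The following PyTorch sketch illustrates the multi-scale idea: the same image is judged at full, half and quarter resolution. The per-scale network here is a deliberately small placeholder, not the patent's discriminator:

```python
import torch
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    """Judge authenticity at several resolutions (e.g. 512, 256, 128).
    The per-scale sub-network is an illustrative placeholder."""
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.downsample = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)
        self.discriminators = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 4, stride=1, padding=1),  # patch-level real/fake scores
            )
            for _ in range(num_scales)
        ])

    def forward(self, img):
        outputs = []
        for disc in self.discriminators:
            outputs.append(disc(img))
            img = self.downsample(img)   # next scale: half the resolution
        return outputs

scores = MultiScaleDiscriminator()(torch.randn(1, 3, 512, 512))
print([s.shape for s in scores])  # one score map per scale
```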
Step 107: adjusting parameters of the image generator, and performing model training based on the adjusted image generator until the sensitive prediction accuracy obtained by the image discriminator meets a preset condition.
And adjusting parameters of the image generator according to a judgment result of the image discriminator to enable the sample sensitive prediction image generated by the image generator to be adjusted in a more real direction, and training the image generator based on the adjusted parameters until the accuracy of the sensitive prediction obtained by the image discriminator meets a preset condition to finish training. In a possible implementation manner, the preset condition for determining that the sensitive prediction model is trained may be that the loss function converges, that is, when the value of the loss function of the sensitive prediction model changes stably in a small range, the model is considered to be trained completely.
In a specific implementation, the training of the sensitive prediction model can be achieved by:
first, a data set is acquired: in order to ensure the diversity of training data, about 6000 sample face images under various illumination conditions, skin states and angle postures are collected.
Secondly, setting a data gain: in order to improve the stability and robustness of the sensitive prediction model, one or more of the following data gains can be made for each sample face image through probability control during training: image color transformation: color cast, contrast/saturation adjustment, overexposure/underexposure; pixel position conversion: affine transformation, translation, rotation/flipping, etc.
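A hedged torchvision sketch of these probability-controlled gains is shown below; the probabilities and parameter ranges are illustrative assumptions, since the text only names the transform families. In practice the geometric transforms would need to be applied identically to a sample image and its mask:

```python
import torchvision.transforms as T

# Probability-controlled data gains: color transforms (color cast,
# contrast/saturation, over/under-exposure) and pixel position transforms
# (affine translation/rotation, flipping).
augment = T.Compose([
    T.RandomApply([T.ColorJitter(brightness=0.3, contrast=0.3,
                                 saturation=0.3, hue=0.05)], p=0.5),
    T.RandomApply([T.RandomAffine(degrees=15, translate=(0.05, 0.05))], p=0.5),
    T.RandomHorizontalFlip(p=0.5),
])
```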
Next, training parameters are set: the optimization algorithm is Adam; the learning rate of the generator: 0.0002; learning rate of the discriminator: 0.0001.
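As a sketch, the stated optimizer settings translate directly into PyTorch; the beta values and the placeholder modules are assumptions:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the generator and discriminator above.
generator = nn.Conv2d(4, 3, 3, padding=1)
discriminator = nn.Conv2d(3, 1, 3, padding=1)

# Adam with the stated learning rates: 0.0002 for the generator and
# 0.0001 for the discriminator. The betas are conventional GAN defaults,
# assumed rather than taken from the patent.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.999))
```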
Then, the loss function of the generation network is set: L = L1 + L_vgg + L_adv, where
L1 = |generate - GT|;
L_vgg = sum_i ( ||phi_i(generate) - phi_i(GT)||_1 / (W_i * H_i * C_i) );
and L_adv is the adversarial loss of the generative adversarial network (its expression is given as an image in the original publication).
In the formulas:
1) GT is the target image to be learned, i.e., the sample face image. L1 is the L1 (pixel-wise absolute difference) loss function, L_vgg is the perceptual loss function, and L_adv is the adversarial loss function.
L_vgg plays the role of a perceptual loss (L_perceptual): both the network output (generate) and GT are fed into another network, the feature tensors of the corresponding layers are extracted, and the difference between the feature tensors is calculated, i being the index of the corresponding layer.
2) generate is the sample sensitive prediction image generated by the image generator for the sample face image. For a multi-channel input, generate = (out_rgb * out_attention) + (1 - out_rgb) * A, where out_rgb is the image generator's generated version of the non-sensitive-area image corresponding to the sample face image (in this embodiment the non-sensitive-area image is an RGB three-channel image), out_attention is the image generator's generated version of the sample sensitive-area mask image corresponding to the sample face image, and A is the non-sensitive-area image corresponding to the sample face image that is input to the image generator.
3) W, H and C respectively denote the width, height and number of channels of the feature map. The feature map is the image obtained after the input multi-channel image passes through the convolution layers of the image generator.
4) phi denotes the network (e.g., a VGG-16 network) used to extract the feature maps.
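Since the published loss formulas are partly images, the following Python sketch is an assumption-laden reconstruction rather than the patent's code: the VGG-16 feature extractor follows note 4), a single feature layer stands in for the sum over i, and the adversarial term uses a standard non-saturating GAN loss because the patent's L_adv expression is not reproduced:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Frozen VGG-16 feature extractor for the perceptual term
# (ImageNet normalization of the inputs is omitted for brevity).
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def generator_loss(generate, gt, d_scores_fake):
    l1 = (generate - gt).abs().mean()                 # L1 = |generate - GT|
    f_gen, f_gt = vgg(generate), vgg(gt)
    _, c, h, w = f_gen.shape
    l_vgg = (f_gen - f_gt).abs().sum() / (w * h * c)  # perceptual term, normalized by W*H*C
    # Assumed adversarial form: push the discriminator's fake scores toward "real".
    l_adv = nn.functional.binary_cross_entropy_with_logits(
        d_scores_fake, torch.ones_like(d_scores_fake))
    return l1 + l_vgg + l_adv
```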
In summary, the embodiment of the present application provides a method for training a sensitive prediction model: a plurality of sample face images and their corresponding red light images are obtained; the sample face images are processed to obtain the corresponding non-sensitive-area images, and the red light images are processed with a preset sensitive-area analysis algorithm to obtain the corresponding sample sensitive-area mask images. The non-sensitive-area image and the sample sensitive-area mask image corresponding to each sample face image are merged into the multi-channel image for that sample face image. Once this preparation is complete, the sensitive prediction model is trained: the image generator processes the multi-channel image of each sample face image to produce a sample sensitive prediction image, and the image discriminator processes each sample face image together with its sample sensitive prediction image to obtain the generator's sensitive prediction accuracy. The generator's parameters are adjusted and model training continues with the adjusted generator until the sensitive prediction accuracy obtained by the discriminator meets the preset condition, at which point training is complete. The trained sensitive prediction model is built on a large volume of data and realizes prediction of sensitive areas in face images using a generative adversarial network (the image generator and image discriminator), deep learning, and related techniques. It can predict changes in facial sensitivity (for example in the forehead, cheek, and nasal-wing areas) from the user's current skin state; in addition, because the decoding part of the image generator combines nearest-neighbor upsampling with convolution layers, the algorithm can output an accurate, scientifically grounded sensitivity prediction result map while preserving the original texture and personalized features of the user's skin.
Optionally, on the basis of fig. 1, the present application further provides a possible implementation manner of a method for training a sensitive prediction model, and fig. 2 is a flowchart of a method for training a sensitive prediction model according to another embodiment of the present application; as shown in fig. 2, the processing of the red light images corresponding to the plurality of sample face images by using a preset sensitive region analysis algorithm to obtain sample sensitive region mask images corresponding to the plurality of sample face images includes:
step 201: and analyzing the pixel values of each pixel point in the red light images corresponding to the plurality of sample face images in a plurality of color channels by adopting a preset sensitive region analysis algorithm, and determining whether each pixel point is a sensitive pixel point or a non-sensitive pixel point according to the pixel value of each pixel point in the plurality of color channels.
It should be noted that, by using a preset sensitive region analysis algorithm, each pixel point in the red light image corresponding to each sample face image is processed, so that a sample sensitive region mask image corresponding to the sample face image can be obtained.
In a possible implementation, the pixel values of each pixel point in the red light images corresponding to the sample face images are analyzed across multiple color channels, and whether a pixel point is sensitive or non-sensitive is judged from the pixel value of each color channel. It should be noted that, since sensitive regions are usually red, whether a point is a sensitive or non-sensitive pixel point can be determined from the red-channel pixel value and the data relationships (for example, ratios or differences) between the red-channel pixel value and the pixel values of the other channels.
Step 202: and assigning the pixel values of the sensitive pixel points and the non-sensitive pixel points in the red light images corresponding to the plurality of sample face images to obtain sample sensitive area mask images corresponding to the plurality of sample face images.
For subsequent use, the discrimination results for the sensitive and non-sensitive pixel points determined in step 201 need further processing. For example, the pixel values of the sensitive and non-sensitive pixel points in the red light images corresponding to the sample face images can each be assigned a value, yielding the sample sensitive-area mask image for each sample face image. Since each pixel point is either sensitive or non-sensitive, the resulting sample sensitive-area mask image contains only two values. It should be noted that the sensitive and non-sensitive pixel points may be assigned the values 0 and 1, or other values; this application does not limit this. When the sample sensitive-area mask image is displayed, it is a two-color image, which may be black-and-white or any other pair of colors; this application does not limit it either. Exemplarily, fig. 3 is a schematic diagram of a sample sensitive-area mask image corresponding to a sample face image according to an embodiment of the present application; as shown in fig. 3, the mask image is black-and-white, and the white pixel areas represent potential sensitive areas.
Each pixel point in the red light image is processed to obtain a sample sensitive area mask image, so that the existing or potential skin sensitive area of the sample face image in the current skin state can be obtained.
Optionally, on the basis of fig. 2, the present application further provides a possible implementation manner of a method for training a sensitive prediction model, and fig. 4 is a flowchart of a method for training a sensitive prediction model according to another embodiment of the present application; as shown in fig. 4, determining whether each pixel is a sensitive pixel or a non-sensitive pixel according to the pixel value of each pixel in a plurality of color channels includes:
step 401: and calculating a first ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the blue light channel.
In a possible implementation, the red light image corresponding to a sample face image contains three channels: a red light channel, a blue light channel, and a green light channel. For each pixel point, the pixel values in the three channels are obtained: the pixel value Red_R in the red light channel, Red_B in the blue light channel, and Red_G in the green light channel. That is, Red_R, Red_G and Red_B are the RGB values of a pixel point in the red light image corresponding to the sample face image. To determine whether the pixel point is sensitive or non-sensitive, a first ratio of its red-channel value Red_R to its blue-channel value Red_B is calculated. In a specific implementation, the first ratio is computed as:
first ratio = Red_R / Red_B
The foregoing is merely an example; in another specific implementation the first ratio may instead be the ratio of the blue-channel value Red_B to the red-channel value Red_R. This application does not limit it, and the user may choose according to actual needs.
Step 402: and calculating a second ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the green light channel.
A second ratio of the pixel value Red_R in the red light channel to the pixel value Red_G in the green light channel is calculated. In a specific implementation, the second ratio is computed as:
second ratio = Red_R / Red_G
The foregoing is merely an example; in another specific implementation the second ratio may instead be the ratio of the green-channel value Red_G to the red-channel value Red_R. This application does not limit it, and the user may choose according to actual needs.
Step 403: if the first ratio is greater than a first preset threshold, the second ratio is greater than a second preset threshold, the pixel value of the target pixel point in the green light channel is less than a third preset threshold, and the pixel value of the target pixel point in the blue light channel is less than a fourth preset threshold, determine that the target pixel point is a sensitive pixel point, and determine that all other pixel points in the red light image corresponding to each sample face image are non-sensitive pixel points.
After obtaining the first ratio and the second ratio of a target pixel point, together with its pixel value Red_B in the blue light channel and Red_G in the green light channel, the target pixel point is judged to be a sensitive pixel point when it satisfies all four of the following conditions:
first, the first ratio is greater than a first preset threshold (i.e., Red_R / Red_B > first preset threshold);
second, the second ratio is greater than a second preset threshold (i.e., Red_R / Red_G > second preset threshold);
third, the pixel value in the green light channel is less than a third preset threshold (i.e., Red_G < third preset threshold);
fourth, the pixel value in the blue light channel is less than a fourth preset threshold (i.e., Red_B < fourth preset threshold).
In a specific implementation, the first and second preset thresholds may each be, for example, 1.2 to 1.3, and the third and fourth preset thresholds may each be, for example, 130 to 160. These are merely empirical values from experiments; in practice the user may set the preset thresholds according to needs or actual usage, which this application does not limit.
It should be noted that the target pixel point may be any pixel point in the red light image corresponding to any sample face image, which is not limited in the present application.
The above calculation and judgment are repeated pixel by pixel in the red light image corresponding to each sample face image, determining whether each pixel point in that red light image is a sensitive or non-sensitive pixel point.
By decomposing the red light image into its three channels and analyzing both the relations between the channels' pixel values and the magnitudes of the individual values, the user's existing or potential skin-sensitive areas in the current skin state can be obtained.
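The per-pixel rule of steps 401 to 403 can be sketched as follows; the concrete threshold values are picked from the quoted ranges purely for illustration, and the function name and array layout are assumptions:

```python
import numpy as np

def sensitive_mask(red_light_img: np.ndarray,
                   t1: float = 1.25, t2: float = 1.25,
                   t3: float = 150.0, t4: float = 150.0) -> np.ndarray:
    """Classify each pixel of an H x W x 3 red light image (R, G, B order)
    as sensitive (1) or non-sensitive (0) using the four threshold conditions."""
    img = red_light_img.astype(np.float32)
    red_r, red_g, red_b = img[..., 0], img[..., 1], img[..., 2]
    eps = 1e-6  # guard against division by zero
    sensitive = (
        (red_r / (red_b + eps) > t1) &   # first ratio  > first preset threshold
        (red_r / (red_g + eps) > t2) &   # second ratio > second preset threshold
        (red_g < t3) &                   # green channel below third preset threshold
        (red_b < t4)                     # blue channel below fourth preset threshold
    )
    # Assign pixel values: 1 for sensitive pixel points, 0 for non-sensitive ones.
    return sensitive.astype(np.uint8)
```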
Optionally, on the basis of fig. 1, the present application further provides a possible implementation manner of a training method of a sensitive prediction model, and fig. 5 is a flowchart of a training method of a sensitive prediction model according to yet another embodiment of the present application; as shown in fig. 5, before merging the non-sensitive region image corresponding to each sample face image and the sample sensitive region mask image corresponding to each sample face image to obtain a multi-channel image corresponding to each sample face image, the method further includes:
step 501: and normalizing the pixel value of the non-sensitive area image corresponding to each sample face image to a first pixel range.
To facilitate subsequent image processing, the pixel values of the non-sensitive-area image corresponding to each sample face image need to be normalized. In a possible implementation, the first pixel range may be, for example, (-1, 1). In another possible implementation, if the non-sensitive-area image corresponding to each sample face image comprises a red-channel, a green-channel and a blue-channel image, the pixel values of each channel image may be normalized separately, for example to the range (-1, 1). The foregoing is merely an example; other normalization schemes and first pixel ranges are possible in practice, which this application does not limit.
Step 502: and normalizing the pixel value of the sample sensitive area mask image corresponding to each sample face image to a second pixel range.
Similarly, the pixel values of the mask image of the sample sensitive area corresponding to each sample face image need to be normalized, and in a possible implementation, the second pixel range may be, for example, a [0,1] range. The foregoing is merely an example, and in an actual implementation, there may be other sample sensitive area mask image normalization manners and a specific second pixel range, which is not limited in this application.
Merging the non-sensitive area image corresponding to each sample face image and the sample sensitive area mask image corresponding to each sample face image to obtain a multi-channel image corresponding to each sample face image, wherein the multi-channel image comprises:
step 503: and merging the image without the sensitive area after the pixel value normalization and the mask image of the sample sensitive area after the pixel value normalization to obtain a multi-channel image corresponding to each sample face image.
And combining the image without the sensitive area after the pixel value normalization and the mask image of the sample sensitive area after the pixel value normalization to obtain a multi-channel image corresponding to each sample face image. It should be noted that the merging may be superposition of the normalized images, or superposition or weighted superposition of each channel pixel value of each pixel point, which is not limited in the present application as long as a multi-channel image corresponding to each sample face image can be obtained.
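A minimal sketch of the normalization-plus-merge, assuming uint8 inputs in [0, 255] and the example target ranges (-1, 1) and [0, 1]:

```python
import numpy as np

def build_multichannel(non_sensitive_rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Normalize the non-sensitive-area image and the mask, then merge them
    into the H x W x 4 multi-channel image."""
    rgb = non_sensitive_rgb.astype(np.float32) / 127.5 - 1.0   # -> (-1, 1)
    m = mask.astype(np.float32) / 255.0                        # -> [0, 1]
    if m.ndim == 2:
        m = m[..., None]  # ensure a trailing channel axis
    return np.concatenate([rgb, m], axis=-1)
```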
Optionally, on the basis of fig. 1 to fig. 5, the present application further provides a possible implementation manner of a training method for a sensitive prediction model, where the sensitive prediction model further includes: an attention learning module; FIG. 6 is a flowchart illustrating a method for training a sensitivity prediction model according to yet another embodiment of the present application; as shown in fig. 6, the processing, by using the image generator, of the multi-channel image corresponding to each sample face image to obtain a sample sensitive predicted image corresponding to each sample face image includes:
step 601: and processing the multi-channel image corresponding to each sample face image by adopting an image generator to obtain a sensitive prediction result corresponding to each sample face image.
The image generator processes the multi-channel image corresponding to each sample face image to obtain the sensitive prediction result for each sample face image. To further ensure the scientific soundness and accuracy of the sensitive prediction images (the sample sensitive prediction images here, and the sensitive prediction images in subsequent use), the sensitive prediction result that the image generator obtains for each sample face image can be processed further. It should be noted that this sensitive prediction result may take the form of, for example, a pixel matrix or a multi-channel picture, which this application does not limit.
Step 602: and processing the sensitive prediction result by adopting an attention learning module to obtain a three-channel sensitive prediction image and a single-channel sensitive prediction image.
And processing the sensitive prediction result corresponding to each sample face image obtained by the image generator by adopting an attention learning module so as to obtain a three-channel sensitive prediction image and a single-channel sensitive prediction image.
In a possible implementation, fig. 7 is a schematic structural diagram of an attention learning module according to an embodiment of the present application. As shown in fig. 7, the attention learning module (ALN) has two branches: one branch outputs a three-channel intermediate result map (the three-channel sensitive prediction image, denoted out_rgb), and its activation function may be Tanh; the other branch outputs the sample sensitive-area mask image learned by the module for the sample face image (the single-channel sensitive prediction image, denoted out_attention), and its activation function may be Sigmoid. The structure of the ALN module is given in fig. 7.
Step 603: and fusing the three-channel sensitive prediction image and the single-channel sensitive prediction image to obtain a sample sensitive prediction image corresponding to each sample face image.
The three-channel sensitive prediction image and the single-channel sensitive prediction image are fused to obtain the sample sensitive prediction image corresponding to each sample face image. In a possible implementation, the sample sensitive prediction image corresponding to each sample face image can be obtained as follows:
sample sensitive prediction image = (out_rgb * out_attention) + (1 - out_rgb) * A;
where out_rgb is the three-channel sensitive prediction image, out_attention is the single-channel sensitive prediction image, and A is the non-sensitive-area image corresponding to the sample face image that is input to the image generator.
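A hedged PyTorch sketch of the ALN's two branches and the fusion formula above; the 1x1 convolutions and channel count are assumptions, since the module's exact structure is given only in fig. 7:

```python
import torch
import torch.nn as nn

class AttentionLearningModule(nn.Module):
    """Two branches over a shared feature map: a Tanh branch for the
    three-channel result (out_rgb) and a Sigmoid branch for the
    single-channel attention mask (out_attention), fused as stated in the text."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(in_ch, 3, 1), nn.Tanh())
        self.attn_branch = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        out_rgb = self.rgb_branch(feat)
        out_attention = self.attn_branch(feat)
        # Fusion as printed above, with A the non-sensitive-area input image.
        return out_rgb * out_attention + (1 - out_rgb) * a

aln = AttentionLearningModule(64)
feat = torch.randn(1, 64, 512, 512)        # generator feature map (assumed shape)
a = torch.rand(1, 3, 512, 512) * 2 - 1     # non-sensitive-area input image in (-1, 1)
pred = aln(feat, a)                        # sample sensitive prediction image
```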
The generative adversarial network model established in this application on the basis of the attention learning module (ALN) ensures the scientific soundness and accuracy of the sensitive-area prediction results: the multi-channel image obtained for each sample face image through analysis and calculation is used as input, and the attention learning module is added after the image generator, which protects the original skin details and personalized features in the sample face image and effectively ensures the realism of the sensitive-area prediction.
In addition, the embodiment of the application also provides a plurality of possible implementation modes of the face image processing method so as to realize scientific prediction of the skin sensitive area. The following is explained by way of a number of examples in connection with the drawings. Fig. 8 is a flowchart of a face image processing method according to an embodiment of the present application, where the method may be implemented by an electronic device running the face image processing method, and the electronic device may be, for example, a terminal device (e.g., a skin prediction device, an electronic face detection device, etc.), or a server. As shown in fig. 8, the method includes:
step 801: acquiring an original face image and a red light image corresponding to the original face image;
it should be noted that the original face image may be a face image acquired in real time by using a camera, an intelligent terminal, a computer, and other devices capable of acquiring the face image, for example, a face image acquired in real time in short video or live broadcast by using an intelligent terminal device; the acquired face image may also be a non-real-time image acquired by a computer device, for example, a face image downloaded by the computer device. The specific acquisition form of the face image is not limited, and the face image processing method can be realized. In addition, the face image of the present application may be in the form of, for example, a picture, a dynamic picture, a short video, a video, and the like, and the specific type of the face image is not limited in the present application.
The red light map corresponding to the original face image can be obtained according to the way of obtaining the red light map in the step 101, which is not described herein again.
Step 802: processing the red light image by adopting a preset sensitive area analysis algorithm to obtain a sensitive area mask image;
the sensitive region mask image may be obtained according to the method of any one of the steps 101, 201 to 202, and 401 to 403, which is not described herein again.
Step 803: merging the original face image and the sensitive area mask image to obtain a multi-channel image;
the method for combining the original face image and the sensitive region mask image can refer to a method for combining the non-sensitive region image corresponding to each sample face image and the sample sensitive region mask image corresponding to each sample face image in the training method of the sensitive prediction model; and replacing the non-sensitive area image corresponding to each sample face image with the original face image, and replacing the sample sensitive area mask image corresponding to each sample face image with the sensitive area mask image.
Step 804: and processing the multi-channel image by adopting a preset sensitive prediction model to obtain a sensitive prediction image corresponding to the original face image, wherein the preset sensitive prediction model is a model trained by adopting the training method of the sensitive prediction model.
That is, the multi-channel image is processed with the model trained by the above training method of the sensitive prediction model, and the output is the sensitive prediction image corresponding to the original face image.
In a possible implementation, after the sensitive prediction model is trained according to the steps of the above training method, it can be applied as follows: for an original face image, a sensitive area mask image is first obtained through analysis and calculation from the red light image corresponding to the original face image; the sensitive area mask image and the original face image are then merged into a four-channel tensor (the original face image is a three-channel tensor and the sensitive area mask image is a one-channel tensor); finally, the four-channel tensor is input into the preset sensitive prediction model, whose output is the sensitive prediction image corresponding to the original face image.
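A minimal inference sketch of this step, assuming PyTorch; generator stands for the trained sensitive prediction model, and the mask is assumed to have been computed from the red light image beforehand:

```python
import torch

def predict_sensitivity(generator: torch.nn.Module,
                        face_rgb: torch.Tensor,  # (1, 3, H, W) original face image tensor
                        mask: torch.Tensor       # (1, 1, H, W) sensitive area mask image
                        ) -> torch.Tensor:
    x = torch.cat([face_rgb, mask], dim=1)       # merge into the four-channel tensor
    with torch.no_grad():                        # inference only, no gradient tracking
        return generator(x)                      # sensitive prediction image
```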
Optionally, on the basis of fig. 8, the present application further provides a possible implementation of the face image processing method. Fig. 9 is a flowchart of a face image processing method according to another embodiment of the present application; as shown in fig. 9, before the multi-channel image is processed with the preset sensitive prediction model to obtain the sensitive prediction image corresponding to the original face image, the method includes:
step 901: and cutting the multi-channel image according to preset cutting parameters to obtain a first multi-channel image.
Since image processing involves many unnecessary factors (such as the background, non-face areas, and non-sensitive face areas) that increase the computation and memory usage of the method without improving its results, the multi-channel image can be cropped before processing. It should be noted that, after cropping according to the preset cropping parameters, one part or several parts of the original multi-channel image may be retained; this is not limited in the present application and may be set by the user according to actual needs. One or more first multi-channel images may therefore be obtained after cropping, and the present application does not limit their specific number.
It should be noted that the preset cropping parameters in the present application may be fixed cropping parameters; alternatively, the facial region in the original face image may be intelligently identified by a model such as a deep learning algorithm, and the preset cropping parameters set flexibly according to the facial region or a specific part within it.
In a possible implementation, cropping the multi-channel image according to the preset cropping parameters may mean separately cropping the channels corresponding to the original face image and the channel corresponding to the sensitive area mask image, then merging the results to obtain the first multi-channel image. For example, according to the preset cropping parameters, the channels corresponding to the original face image in the multi-channel image are cropped to obtain graph a, the channel corresponding to the sensitive area mask image is cropped to obtain graph b, and graph a and graph b are merged along the channel dimension to obtain the first multi-channel image.
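A sketch of this per-channel crop-then-merge step, assuming PyTorch tensors and a hypothetical (top, left, height, width) box as the preset cropping parameters:

```python
import torch

def crop_then_merge(face_rgb: torch.Tensor,  # (N, 3, H, W) channels of the original face image
                    mask: torch.Tensor,      # (N, 1, H, W) sensitive area mask channel
                    box: tuple) -> torch.Tensor:
    t, l, h, w = box                         # hypothetical preset cropping parameters
    a = face_rgb[:, :, t:t + h, l:l + w]     # graph a: cropped face channels
    b = mask[:, :, t:t + h, l:l + w]         # graph b: cropped mask channel
    return torch.cat([a, b], dim=1)          # first multi-channel image
```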
In another possible implementation, the multi-channel image may also be cropped directly as a whole according to the preset cropping parameters. This is not limited in the present application; besides the exemplary methods above, other cropping manners may be used.
Step 902: and processing the first multi-channel image by adopting a preset sensitive prediction model to obtain a sensitive prediction image of part of original face images corresponding to the first multi-channel image.
The first multi-channel image is input into the preset sensitive prediction model, whose output is the sensitive prediction image of the part of the original face image corresponding to the first multi-channel image; that is, the output is the sensitive prediction image of the local region of the original face image retained by the cropping.
In recent years, the number of users with sensitive skin has kept increasing, yet neither industry nor academia currently offers a product or solution that further predicts the trend of skin sensitivity change in a scientific and personalized manner on the basis of analyzing the user's current skin condition. The present application therefore proposes and realizes, for the first time, a face image processing method that predicts facial sensitive areas. The method can predict the sensitivity change of different areas of the face (forehead, cheeks, nasal wings, etc.) based on the user's current skin state, and the algorithm can accurately and scientifically output a sensitivity prediction result map while preserving the original texture and personalized characteristics of the user's skin.
Optionally, on the basis of fig. 8, the present application further provides another possible implementation of the face image processing method. Fig. 10 is a flowchart of a face image processing method according to yet another embodiment of the present application; as shown in fig. 10, before the original face image and the sensitive area mask image are merged to obtain the multi-channel image, the method includes:
step 1001: and normalizing the pixel values of the original face image to a first pixel range.
To facilitate subsequent processing of the image, the pixel values of the original face image need to be normalized. In a possible implementation, the first pixel range may be, for example, the (-1, 1) range. In another possible implementation, if the original face image comprises red, green, and blue channel images, the pixel values of each channel image may be normalized separately, for example, to the (-1, 1) range. The foregoing is merely an example; in practical implementations there may be other normalization manners and other specific first pixel ranges, which are not limited in this application.
Step 1002: and normalizing the pixel value of the sensitive area mask image corresponding to the original face image to a second pixel range.
The pixel values of the sensitive area mask image corresponding to the original face image are likewise normalized; in a possible implementation, the second pixel range may be, for example, the [0, 1] range. The foregoing is merely an example; in practical implementations there may be other normalization manners and other specific second pixel ranges for the sensitive area mask image, which are not limited in this application.
The merging of the original face image and the sensitive area mask image to obtain the multi-channel image then comprises:
step 1003: and merging the original face image after the pixel value normalization and the sensitive area mask image after the pixel value normalization to obtain a multi-channel image corresponding to the original face image.
The original face image after pixel value normalization and the sensitive area mask image after pixel value normalization are merged to obtain the multi-channel image corresponding to the original face image. It should be noted that the merging may be a stacking of the normalized images along the channel dimension, or a superposition or weighted superposition of the channel values of each pixel, performed pixel by pixel; this is not limited in the present application, as long as a multi-channel image corresponding to the original face image is obtained.
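A minimal sketch of the normalization and merge, assuming 8-bit inputs; the exact affine scaling is an assumption, since any mapping onto the stated ranges would satisfy the text:

```python
import torch

def normalize_and_merge(face_rgb_u8: torch.Tensor,  # (N, 3, H, W) uint8 original face image
                        mask_u8: torch.Tensor       # (N, 1, H, W) uint8 sensitive area mask
                        ) -> torch.Tensor:
    face = face_rgb_u8.float() / 127.5 - 1.0        # [0, 255] -> (-1, 1), the first pixel range
    mask = mask_u8.float() / 255.0                  # [0, 255] -> [0, 1], the second pixel range
    return torch.cat([face, mask], dim=1)           # multi-channel image, concatenated by channel
```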
The electronic device and the storage medium provided by the present application are described below; their specific implementation procedures and technical effects are similar to those of the methods described above and are not repeated.
The embodiment of the present application provides a possible implementation example of an electronic device capable of executing the training method of the sensitive prediction model provided in the foregoing embodiments. Fig. 11 is a schematic diagram of an electronic device according to an embodiment of the present application. The electronic device may be integrated in a terminal device or a chip of the terminal device, and the terminal may be a computing device with a data processing function.
The electronic device includes: a processor 1101, a storage medium 1102, and a bus. The storage medium stores program instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the program instructions to perform the steps of the training method of the sensitive prediction model. The specific implementation and technical effects are similar and are not repeated here.
The embodiment of the present application provides a possible implementation example of a computer-readable storage medium for the training method of the sensitive prediction model provided in the foregoing embodiments: the storage medium stores a computer program which, when executed by a processor, performs the steps of the training method of the sensitive prediction model.
The computer program stored in the storage medium may include instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a sensitive prediction model, wherein the sensitive prediction model comprises an image generator and an image discriminator, and the training method comprises the following steps:
obtaining a plurality of sample face images and red light images corresponding to the sample face images;
processing the plurality of sample face images to obtain non-sensitive area images corresponding to the plurality of sample face images;
respectively processing the red light images corresponding to the plurality of sample face images by adopting a preset sensitive area analysis algorithm to obtain sample sensitive area mask images corresponding to the plurality of sample face images;
combining a non-sensitive area image corresponding to each sample face image and a sample sensitive area mask image corresponding to each sample face image to obtain a multi-channel image corresponding to each sample face image;
processing the multi-channel image corresponding to each sample face image by using the image generator to obtain a sample sensitive prediction image corresponding to each sample face image;
processing the sample sensitive prediction image corresponding to each sample face image and each sample face image by adopting the image discriminator to obtain the sensitive prediction accuracy of the image generator;
and adjusting parameters of the image generator, and performing model training based on the image generator after parameter adjustment until the sensitivity prediction accuracy obtained by the image discriminator meets a preset condition.
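Claim 1 amounts to a standard adversarial training loop. A minimal PyTorch sketch under common GAN conventions follows; the choice of binary cross-entropy losses, the optimizers, and the stopping test on the discriminator's accuracy are assumptions, since the claim leaves them as preset design choices:

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt,
               multi_channel, real_face):
    # Generator: multi-channel image -> sample sensitive prediction image.
    fake = generator(multi_channel)

    # Discriminator: score real sample face images against predictions.
    d_real = discriminator(real_face)
    d_fake = discriminator(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Adjust the generator's parameters so its predictions fool the discriminator;
    # training repeats until the discriminator's accuracy meets the preset condition.
    g_score = discriminator(fake)
    g_loss = F.binary_cross_entropy_with_logits(g_score, torch.ones_like(g_score))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```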
2. The method of claim 1, wherein the processing the red light images corresponding to the plurality of sample face images respectively by using a preset sensitive region analysis algorithm to obtain sample sensitive region mask images corresponding to the plurality of sample face images comprises:
analyzing the pixel values of each pixel point in a plurality of color channels in the red light images corresponding to the plurality of sample face images by adopting the preset sensitive region analysis algorithm, and determining whether each pixel point is a sensitive pixel point or a non-sensitive pixel point according to the pixel value of each pixel point in the plurality of color channels;
and assigning the pixel values of the sensitive pixel points and the non-sensitive pixel points in the red light images corresponding to the plurality of sample face images to obtain sample sensitive area mask images corresponding to the plurality of sample face images.
3. The method of claim 2, wherein said determining whether each pixel is a sensitive pixel or a non-sensitive pixel based on the pixel values of said each pixel in a plurality of color channels comprises:
calculating a first ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the blue light channel;
calculating a second ratio of the pixel value of each pixel point in the red light channel to the pixel value of each pixel point in the green light channel;
if the first ratio is larger than a first preset threshold, the second ratio is larger than a second preset threshold, the pixel value of a target pixel point in a green light channel is smaller than a third preset threshold, and the pixel value of the target pixel point in a blue light channel is smaller than a fourth preset threshold, determining that the target pixel point is a sensitive pixel point, and determining that other pixel points except the sensitive pixel point in a red light image corresponding to each sample face image are non-sensitive pixel points.
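A minimal NumPy sketch of the per-pixel rule in claims 2 and 3, assuming an (H, W, 3) red light image in RGB channel order; the threshold values and the 0/1 values assigned to the mask are assumptions, since the claims leave them as preset parameters:

```python
import numpy as np

def sensitive_mask(red_light_img: np.ndarray,
                   t1: float, t2: float, t3: float, t4: float) -> np.ndarray:
    r = red_light_img[..., 0].astype(np.float32)
    g = red_light_img[..., 1].astype(np.float32)
    b = red_light_img[..., 2].astype(np.float32)
    eps = 1e-6                              # guard against division by zero
    sensitive = ((r / (b + eps) > t1) &     # first ratio: red / blue channel
                 (r / (g + eps) > t2) &     # second ratio: red / green channel
                 (g < t3) &                 # green value below third threshold
                 (b < t4))                  # blue value below fourth threshold
    return sensitive.astype(np.uint8)       # 1 = sensitive pixel, 0 = non-sensitive
```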
4. The method of claim 1, wherein before the combining the non-sensitive region image corresponding to each sample face image and the sample sensitive region mask image corresponding to each sample face image to obtain the multi-channel image corresponding to each sample face image, the method further comprises:
normalizing the pixel value of the image without the sensitive area corresponding to each sample face image to a first pixel range;
normalizing the pixel value of the sample sensitive area mask image corresponding to each sample face image to a second pixel range;
the merging the non-sensitive area image corresponding to each sample face image and the sample sensitive area mask image corresponding to each sample face image to obtain the multi-channel image corresponding to each sample face image comprises:
and merging the image without the sensitive area after the pixel value normalization and the mask image of the sample sensitive area after the pixel value normalization to obtain a multi-channel image corresponding to each sample face image.
5. The method of any of claims 1-4, wherein the sensitivity prediction model further comprises: an attention learning module;
the processing the multi-channel image corresponding to each sample face image by using the image generator to obtain the sample sensitive prediction image corresponding to each sample face image comprises:
processing the multi-channel image corresponding to each sample face image by adopting the image generator to obtain a sensitive prediction result corresponding to each sample face image;
processing the sensitive prediction result by adopting the attention learning module to obtain a three-channel sensitive prediction image and a single-channel sensitive prediction image;
and fusing the three-channel sensitive prediction image and the single-channel sensitive prediction image to obtain a sample sensitive prediction image corresponding to each sample face image.
6. A face image processing method is characterized by comprising the following steps:
acquiring an original face image and a red light image corresponding to the original face image;
processing the red light image by adopting a preset sensitive area analysis algorithm to obtain a sensitive area mask image;
merging the original face image and the sensitive area mask image to obtain a multi-channel image;
processing the multi-channel image by using a preset sensitive prediction model to obtain a sensitive prediction image corresponding to the original face image, wherein the preset sensitive prediction model is a model trained by using the training method of the sensitive prediction model according to any one of claims 1 to 5.
7. The method according to claim 6, wherein before the multi-channel image is processed by using the preset sensitive prediction model to obtain the sensitive prediction image corresponding to the original face image, the method comprises:
cropping the multi-channel image according to preset cropping parameters to obtain a first multi-channel image;
and processing the first multi-channel image by adopting the preset sensitive prediction model to obtain a sensitive prediction image of part of original face images corresponding to the first multi-channel image.
8. The method of claim 6, wherein before merging the original face image and the sensitive region mask image to obtain the multi-channel image, the method comprises:
normalizing the original face image pixel value to a first pixel range;
normalizing the pixel value of the sensitive area mask image corresponding to the original face image to a second pixel range;
the merging of the original face image and the sensitive region mask image to obtain the multi-channel image comprises:
and merging the original face image after pixel value normalization and the sensitive area mask image after pixel value normalization to obtain a multi-channel image corresponding to the original face image.
9. An electronic device, comprising: a processor, a storage medium and a bus, wherein the storage medium stores program instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the program instructions to execute the steps of the human face image processing method according to any one of claims 6 to 8.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the face image processing method according to any one of claims 6 to 8.
CN202210350011.7A 2022-04-02 2022-04-02 Sensitive prediction model training method, face image processing method and equipment Pending CN114663417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210350011.7A CN114663417A (en) 2022-04-02 2022-04-02 Sensitive prediction model training method, face image processing method and equipment


Publications (1)

Publication Number Publication Date
CN114663417A (en) 2022-06-24

Family

ID=82034147



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination