CN114187201A - Model training method, image processing method, device, equipment and storage medium - Google Patents

Model training method, image processing method, device, equipment and storage medium

Info

Publication number
CN114187201A
Authority
CN
China
Prior art keywords
image
network
target
face
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111497159.5A
Other languages
Chinese (zh)
Inventor
徐颖
李玉乐
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202111497159.5A
Publication of CN114187201A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the invention disclose a model training method, an image processing method, and corresponding apparatuses, equipment, and storage media. The model training method comprises: acquiring a training sample set; training a preset original network model with the training sample set based on a preset loss function to obtain a target network model; and determining a face image processing model according to the target generation network contained in the target network model. The preset original network model comprises a generative adversarial network and a flaw segmentation network: the generation network is used to generate a flawless face image in the target domain, while the flaw segmentation network segments the generated image output by the generation network during training, and the segmentation result is converted into a flaw-suppression loss function that constrains the generation network against producing flaws. With this technical scheme, the face image processing model yields more realistic and natural flawless face images, with high flaw-removal efficiency and strong real-time performance.

Description

Model training method, image processing method, device, equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of image processing, and in particular to a model training method, an image processing method, and corresponding apparatuses, equipment, and storage media.
Background
Beautification has become an important function of many applications: a beauty function can retouch a face image to remove flaws in facial skin such as acne, moles, pigmented spots, and enlarged pores.
Currently, a common beautification scheme is built on edge-preserving filters, such as the bilateral filter, guided filter, surface-blur filter, and local-mean filter, or on stacked combinations of several filters. Such filters preserve regions with large pixel gradients, such as eyebrows, hair, and background edges, while smoothing regions of uniform color and small gradient, such as skin. However, this approach loses skin detail, and the stronger the beautification, the more severe the loss: the overall texture becomes less and less real, the image drifts away from the texture of real skin and takes on an artificial, "plastic face" quality, the beautification effect is unsatisfactory, and the viewer's comfort suffers. There are also beautification schemes based on deep learning, which first detect flaws in the face with an object detection model and then remove them by local smoothing or inpainting; however, the object detection model has poor timeliness, and the added post-processing stage makes the whole image processing pipeline very time-consuming and inefficient.
Disclosure of Invention
Embodiments of the invention provide a model training method, an image processing method, apparatuses, equipment, and a storage medium, which can improve upon existing schemes for beautifying face images.
In a first aspect, an embodiment of the present invention provides a model training method, where the method includes:
acquiring a first training sample set containing training sample pairs, wherein a training sample pair comprises an original face sample image containing flaws and a target face sample image obtained by removing the flaws from the original face sample image;
training a preset original network model with the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed, the generative adversarial network comprises a generation network and a discrimination network, the generation network is used to generate a flawless face image in the target domain, the flaw segmentation network is used to segment the generated image output by the generation network to obtain a segmentation result of flaw regions and non-flaw regions, the preset loss function comprises a flaw-suppression loss function, and the flaw-suppression loss function is converted from the segmentation result and constrains the generation network against producing flaws in the generated image;
and determining a face image processing model according to the target generation network contained in the target network model, wherein the face image processing model is used to process a face image to be processed so as to remove the flaws it contains.
In a second aspect, an embodiment of the present invention provides an image processing method, including:
acquiring a face image to be processed;
and inputting the face image to be processed into a face image processing model so as to output a target face image, corresponding to the face image to be processed, from which flaws have been removed, wherein the face image processing model is obtained by the model training method provided by the embodiment of the present invention.
In a third aspect, an embodiment of the present invention provides a model training apparatus, including:
the system comprises a sample set acquisition module, a comparison module and a comparison module, wherein the sample set acquisition module is used for acquiring a first training sample set containing a training sample pair, and the training sample pair comprises an original face sample image containing flaws and a target face sample image with flaws removed on the basis of the original face sample image;
the model training module is used for training a preset original network model by utilizing the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generation countermeasure network with parameters to be adjusted and a pre-trained defect segmentation network with fixed parameters, the generation countermeasure network comprises a generation network and a judgment network, the generation network is used for generating a target domain flawless face image, the defect segmentation network is used for segmenting a generated image output by the generation network to obtain a segmentation result based on a defective region and a non-defective region, the preset loss function comprises a defect inhibiting loss function, and the defect inhibiting loss function is obtained by converting the segmentation result and is used for restraining the generation of defects of the generated image in the generation process;
and the model determining module is used for determining a face image processing model according to a target generation network contained in the target network model, wherein the face image processing model is used for processing the face image to be processed so as to remove flaws contained in the face image to be processed.
In a fourth aspect, an embodiment of the present invention provides an image processing apparatus, including:
a to-be-processed image acquisition module, configured to acquire a face image to be processed;
and an image processing module, configured to input the face image to be processed into a face image processing model so as to output a target face image, corresponding to the face image to be processed, from which flaws have been removed, wherein the face image processing model is obtained by the model training method provided by the embodiment of the present invention.
In a fifth aspect, an embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the model training method and/or the image processing method according to an embodiment of the present invention when executing the computer program.
In a sixth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a model training method and/or an image processing method according to the present invention.
With the model training scheme provided in embodiments of the invention, a first training sample set containing training sample pairs is obtained, where each pair comprises an original face sample image containing flaws and a target face sample image obtained by removing those flaws. A preset original network model is then trained on this sample set based on a preset loss function to obtain a target network model. The preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed; the generative adversarial network comprises a generation network and a discrimination network. The generation network generates flawless face images in the target domain, and the flaw segmentation network segments the generated image output by the generation network into flaw regions and non-flaw regions. The preset loss function includes a flaw-suppression loss function, converted from this segmentation result, that constrains the generation network against producing flaws. Finally, a face image processing model is determined from the target generation network contained in the target network model and is used to process a face image to be processed so as to remove the flaws it contains. By combining the generative adversarial network with the flaw segmentation network, the adversarial network is made responsible for generating a flawless face while keeping features other than skin texture consistent, and the segmentation network acts as an auxiliary network that guides the generation process: together with the flaw-suppression loss, it teaches the generation network to eliminate flaws and output a flawless image. After training, the resulting target generation network serves as the face image processing model for flaw removal. This model produces more realistic and natural flawless face images in a single pass, without multi-stage processing, which improves flaw-removal efficiency and real-time performance and makes the scheme well suited to latency-sensitive scenarios such as processing real-time video in live streaming or video calls.
Drawings
Fig. 1 is a schematic flowchart of a model training method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another model training method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a flaw segmentation network training process according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a preset original network model training process according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
Fig. 6 is a block diagram of a model training apparatus according to an embodiment of the present invention;
Fig. 7 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
Fig. 8 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Fig. 1 is a schematic flowchart of a model training method according to an embodiment of the present invention. The method may be executed by a model training apparatus, which may be implemented in software and/or hardware and is generally integrated in a computer device. As shown in Fig. 1, the method includes:
step 101, obtaining a first training sample set containing a training sample pair, wherein the training sample pair comprises an original face sample image containing flaws and a target face sample image with flaws removed on the basis of the original face sample image.
Illustratively, flaws on a face include blemishes that detract from facial appearance, such as pockmarks, acne marks, moles, visible blood vessels, pigmented spots, and enlarged pores. Many real face images contain such flaws, and processing them yields a more attractive face image that meets users' aesthetic requirements. A flaw in a face image can be understood as the flaw image region produced when a blemish on the real face is captured by the image acquisition device; for brevity, such regions are simply called flaws in this text.
In embodiments of the invention, face images containing flaws can be collected as original face sample images. Each original face sample image can then be processed with professional image processing software to eliminate the flaws it contains; this processing can be performed by professionals operating the software, yielding a more standard beautified image that serves as the target face sample image. The original face sample image and the corresponding target face sample image form a training sample pair. For example, if the original face sample image is A, processing A yields a flawless target face sample image A', and A and A' form a training sample pair.
For example, the first training sample set may contain a first preset number of training sample pairs, where the first preset number can be set according to actual conditions; to ensure a good training effect, it should be set reasonably large.
Step 102, training a preset original network model with the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed, the generative adversarial network comprises a generation network and a discrimination network, the generation network is used to generate a flawless face image in the target domain, the flaw segmentation network is used to segment the generated image output by the generation network to obtain a segmentation result of flaw regions and non-flaw regions, the preset loss function comprises a flaw-suppression loss function, and the flaw-suppression loss function is converted from the segmentation result and constrains the generation network against producing flaws in the generated image.
For example, a preset original network model may be constructed that includes a generative adversarial network and a flaw segmentation network. The specific internal structures of the generative adversarial network and the flaw segmentation network are not limited.
A generative adversarial network (GAN), also called a generation-discrimination network, has a very strong capability to learn and generate image information. It consists of two parts: a generation network (also called the generator G) and a discrimination network (also called the discriminator D). The generation network can be used to learn to remove flaws from the image, generating a flawless face image in the target domain while keeping features other than skin texture consistent; the discrimination network can be used to judge whether the face image output by the generation network is realistic, that is, close to the target face sample image. During training, the parameters of the generative adversarial network are continuously optimized, strengthening the model.
Optionally, in training the preset original network model with the first training sample set based on the preset loss function, the generation network takes the original face sample image as input and outputs a generated image, while the discrimination network takes the corresponding target face sample image and the generated image as input and outputs a judgment of whether they are the same.
The flaw segmentation network assists the training of the generative adversarial network by supervising a flaw segmentation signal. Before this step, a preset original flaw segmentation network can be trained in advance, yielding a trained flaw segmentation network able to separate flaw regions from non-flaw regions in an image. Within the preset original network model, the flaw segmentation network segments the generated image output by the generation network into flaw regions and non-flaw regions, and its parameters remain unchanged while the preset original network model is trained.
In embodiments of the invention, the preset loss function guides the training of the preset original network model: the parameters of the generative adversarial network are continuously optimized to reduce the loss, strengthening the model. The preset loss function includes a flaw-suppression loss function, converted from the segmentation result, that constrains the generation network against producing flaws in the generated image; specifically, it constrains the generated image toward a segmentation containing no flaw regions. Optionally, the flaw-suppression loss may constrain the amount of flaw region contained in the generated image; illustratively, it is represented by the total amount of flaw region that the flaw segmentation network outputs for the generated image.
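For illustration, a minimal PyTorch-style sketch of such a flaw-suppression loss follows. It assumes the frozen segmentation network outputs a per-pixel flaw probability map; the function and variable names are illustrative, not taken from the patent.
```python
import torch
import torch.nn as nn

def suppress_flaw_loss(seg_net: nn.Module, gen_img: torch.Tensor) -> torch.Tensor:
    """Total flaw mass the frozen segmentation network still finds in the
    generated image; driving this toward 0 forces the generator to stop
    producing flaws."""
    flaw_map = seg_net(gen_img)   # (N, 1, H, W) flaw probabilities
    return flaw_map.sum()

# The segmentation network's parameters stay fixed, but gradients must
# still flow through it back to the generator, so freeze the weights
# instead of wrapping the call in torch.no_grad():
# for p in seg_net.parameters():
#     p.requires_grad_(False)
```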
For example, the preset loss function may further include other loss functions, such as a reconstruction loss function, an adversarial loss function, a perceptual loss function, or a structural information loss function; these are not specifically limited.
For example, after the training process ends, a target network model is obtained. It may include the trained generative adversarial network (the target generative adversarial network) and the unchanged flaw segmentation network, the target generative adversarial network comprising a trained generation network (the target generation network) and a trained discrimination network (the target discrimination network).
Step 103, determining a face image processing model according to a target generation network contained in the target network model, wherein the face image processing model is used for processing a face image to be processed so as to remove flaws contained in the face image to be processed.
Illustratively, in practical applications a face image containing flaws needs to be processed to remove them, while the flaw segmentation network and the target discrimination network in the target network model do not actually participate in this processing; they can be understood as networks that assist model training and are not needed in the inference stage (i.e., the application stage). The network actually used is the target generation network. It can therefore serve directly as the face image processing model, or the face image processing model can be obtained by further optimizing and adjusting the target generation network. At application time, the face image to be processed is fed into the face image processing model, which performs flaw removal and quickly outputs a realistic, natural, flawless face image.
The model training method provided by embodiments of the invention combines a generative adversarial network with a flaw segmentation network. The adversarial network is responsible for generating a flawless face while keeping features other than skin texture consistent, and the segmentation network acts as an auxiliary network that guides the generation process; together with the flaw-suppression loss, it teaches the generation network to eliminate flaws and output a flawless image.
In some embodiments, the preset loss function further includes a reconstruction loss function and an adversarial loss function, where the reconstruction loss function constrains the gap between the generated image and the corresponding target face sample image. The benefit of this arrangement is that the generation network learns to remove flaws more effectively.
For example, the reconstruction loss function pulls the generated image (denoted gen_img) toward the target face sample image (denoted target_img), which also helps the generated image retain the attributes of the input picture (the original face sample image). The reconstruction loss function may be, for example, an L1-norm loss, expressed as follows:
recon_loss = ||gen_img - target_img||_1
where recon_loss denotes the reconstruction loss function.
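As a one-line sketch in PyTorch, assuming gen_img and target_img are float tensors of identical shape normalized to [0, 1]:
```python
import torch

def reconstruction_loss(gen_img: torch.Tensor, target_img: torch.Tensor) -> torch.Tensor:
    """Mean absolute error (L1) pulling the generated image toward the
    manually retouched target face sample image."""
    return torch.mean(torch.abs(gen_img - target_img))
```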
Illustratively, the adversarial loss function makes the generated picture more realistic and natural, close to the data distribution of beautified faces; it can be realized with a conventional adversarial loss and is denoted gan_loss.
In some embodiments, before training the preset original network model based on the preset loss function with the training sample set, the method further includes: obtaining a second training sample set containing the training sample pairs; segmenting and labeling the original face sample image according to the difference between the original face sample image and the target face sample image in each training sample pair of the second training sample set, to obtain a target segmentation image containing flaw regions and non-flaw regions; and training a preset original flaw segmentation network, taking the original face sample image as input and the corresponding target segmentation image as the expected output, based on a preset segmentation loss function, to obtain the flaw segmentation network. The benefit of this arrangement is that a flaw segmentation network can be trained that accurately divides flaw regions from non-flaw regions.
For example, the second training sample set may contain a second preset number of training sample pairs, where the second preset number may be the same as or different from the first preset number. The training sample pairs in the second training sample set likewise consist of an original face sample image containing flaws and a target face sample image with those flaws removed; the individual images may be the same as or different from those in the first training sample set, without specific limitation.
For example, the target face sample image can be regarded as a flawless face image; if the difference between a first pixel region in the original face sample image and the corresponding first target pixel region in the target face sample image is large, the first pixel region can be considered to contain a flaw. This is the basis for segmenting and labeling the original face sample image. The flaw segmentation network identifies flaw regions and non-flaw regions in an image: given an original face sample image as input, the preset original flaw segmentation network outputs a segmentation image dividing the two kinds of region, and if it segments accurately, its output is consistent with the corresponding target segmentation image.
In some embodiments, segmenting and labeling the original face sample image according to the difference between the original face sample image and the target face sample image in each training sample pair of the second training sample set includes: for each training sample pair in the second training sample set, calculating the absolute difference between the first pixel and the second pixel at each pixel position, where the first pixel comes from the original face sample image and the second pixel from the target face sample image; determining a target segmentation threshold from these absolute differences; and segmenting and labeling the original face sample image using the target segmentation threshold, where portions whose absolute difference is greater than or equal to the threshold are labeled as flaw regions and portions whose absolute difference is below it are labeled as non-flaw regions. The benefit of this arrangement is that the target segmentation threshold can be determined sensibly from the overall difference statistics, after which the training samples are labeled quickly and accurately.
For example, the original face sample image and the target face sample image have the same size and the same number of pixels; a coordinate system can be constructed with a vertex of the image as the origin, and pixel positions expressed as coordinates. For each pixel position, the absolute value of the difference between the pixel value of the first pixel (from the original face sample image) and that of the second pixel (from the target face sample image) is calculated: if the first pixel's value is m and the second's is n, the absolute difference is |m - n|. Optionally, these absolute differences are analyzed and the target segmentation threshold is determined from their distribution or degree of concentration, for example from the median, the mean, or a percentile, optionally multiplied by a preset coefficient.
Illustratively, segmenting and labeling the original face sample image yields a binary mask image (which can be understood as the ground truth), with flaw regions marked 1 and non-flaw regions marked 0. Passing the original face sample image through the flaw segmentation network yields a binary segmentation mask of the same size as the original image. When training the preset original flaw segmentation network, the preset segmentation loss function constrains the gap between the output binary segmentation mask and the binary mask serving as the target segmentation image, i.e., it pushes the former toward the latter; gradient descent and parameter updates then yield an accurate flaw segmentation network.
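This labeling step could look like the sketch below. The percentile rule used for the target segmentation threshold is just one of the options the text names (median, mean, or percentile); the specific choice and the helper name are assumptions for illustration.
```python
import numpy as np

def label_flaws(original: np.ndarray, target: np.ndarray,
                percentile: float = 95.0) -> np.ndarray:
    """Binary ground-truth mask: 1 = flaw region, 0 = non-flaw region.

    original, target: uint8 images of identical shape (H, W) or (H, W, 3).
    """
    diff = np.abs(original.astype(np.int16) - target.astype(np.int16))
    if diff.ndim == 3:                           # collapse color channels
        diff = diff.max(axis=-1)
    threshold = np.percentile(diff, percentile)  # target segmentation threshold
    return (diff >= threshold).astype(np.uint8)
```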
Segmentation in embodiments of the invention can be viewed as a per-pixel classification problem, and classification conventionally uses a cross-entropy loss. For face images, however, skin flaws usually occupy only a small fraction of the image, so positive and negative samples are severely imbalanced: negative samples (flaw-free positions such as large skin areas and the background) far outnumber positive samples (flaw positions such as acne or pigmented spots), and ordinary cross-entropy loss cannot achieve a good training effect. For this reason, embodiments of the invention adopt a non-conventional loss function to weaken the influence of this imbalance on training.
For example, the preset segmentation loss function may be a weighted cross-entropy loss (Weighted Cross-Entropy Loss), a focal loss (Focal Loss), or a Dice loss (Dice Loss). Dice loss has so far been used mainly in medical image segmentation, but applying it to the segmentation of flawed face images proves effective in tests and handles the imbalance between positive and negative samples in flawed face images well.
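A minimal PyTorch sketch of the Dice loss follows, assuming the segmentation network outputs per-pixel flaw probabilities in [0, 1] and the ground-truth mask is binary; the smoothing constant is an implementation detail assumed here.
```python
import torch

def dice_loss(pred: torch.Tensor, mask: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    """1 - Dice coefficient: measures region overlap rather than per-pixel
    accuracy, so it tolerates the heavy positive/negative imbalance of
    flawed-face segmentation."""
    mask = mask.float()
    inter = (pred * mask).sum()
    union = pred.sum() + mask.sum()
    return 1.0 - (2.0 * inter + eps) / (union + eps)
```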
Fig. 2 is a schematic flowchart of another model training method provided in an embodiment of the present invention, optimized on the basis of the optional embodiments above. As shown in Fig. 2, the method may include:
step 201, a second training sample set containing training sample pairs is obtained.
The training sample pair comprises an original face sample image containing flaws and a target face sample image with flaws removed on the basis of the original face sample image.
Step 202, segmenting and labeling the original face sample image according to the difference between the original face sample image and the target face sample image in each training sample pair in the second training sample set to obtain a target segmentation image containing a defective area and a non-defective area.
Illustratively, for each training sample pair, the absolute value of the difference between the pixel values of the first pixel (from the original face sample image) and the second pixel (from the target face sample image) at each pixel position is calculated. A target segmentation threshold is determined from these absolute values, and the original face sample image is labeled with it to obtain a target segmentation image containing flaw regions and non-flaw regions. Specifically, the target segmentation image may be a binary mask image, with flaw regions recorded as 1 and non-flaw regions as 0.
And 203, taking the original face sample image as an input of a preset original flaw segmentation network, taking a corresponding target segmentation image as an expectation, and training the preset original flaw segmentation network based on a preset segmentation loss function to obtain the flaw segmentation network.
Here, the preset segmentation loss function is the Dice loss.
Fig. 3 is a schematic diagram of a flaw segmentation network training process according to an embodiment of the present invention. As shown in Fig. 3, an original face sample image is fed as the input image into the preset original flaw segmentation network, which outputs a segmentation image; the preset segmentation loss function is then calculated from the segmentation image and the target segmentation image. For example, the binary segmentation mask output by the preset original flaw segmentation network is constrained by the Dice loss to approach the binary mask, and the weight parameters of the network are adjusted continuously until a trained flaw segmentation network is obtained. In the binary mask, 0 is shown in black and 1 in white; that is, white represents flaw regions.
Step 204, constructing a preset original network model from the flaw segmentation network and the generative adversarial network.
Fig. 4 is a schematic diagram of the training process of the preset original network model according to an embodiment of the present invention. As shown in Fig. 4, the preset original network model includes a generative adversarial network, formed by a generation network and a discrimination network, together with the flaw segmentation network. The generated image output by the generation network serves as the input of the flaw segmentation network, while the target face sample image and the generated image serve as the inputs of the discrimination network.
It should be noted that, in the face images of Figs. 3 and 4, the eye region has been pixelated for presentation; this processing is generally not performed during actual training or application.
Step 205, a first training sample set comprising training sample pairs is obtained.
And step 206, training a preset original network model by using the first training sample set based on a preset loss function to obtain a target network model.
The preset loss function comprises a reconstruction loss function, an adversarial loss function, and a flaw-suppression loss function.
As shown in Fig. 4, the generation network takes the original face sample image as input and outputs a generated image; the constraints of the reconstruction loss function and the adversarial loss function push the generated image toward the target face sample image, so the network learns to generate the target face sample image with flaws removed. The discrimination network takes the generated image and the target face sample image as input and learns to distinguish them. The flaw segmentation network uses the flaw-suppression loss function to penalize a generation network that has not successfully removed flaws: it takes the generated image as input and outputs a binary segmentation mask, from which the flaw-suppression loss is calculated. This loss constrains the generated image to contain no flaws, i.e., it pushes the value of every pixel in the binary segmentation mask toward 0 (the completely black image in Fig. 4). It can be represented by the sum of the binarized pixel values in the mask, written sum(S(gen_img)); making this term approach 0 forces the generation network to learn to eliminate flaws.
Optionally, the preset loss function can be expressed as a weighted sum of the reconstruction loss function, the adversarial loss function, and the flaw-suppression loss function, for example by the following expression:
Loss=a*recon_loss+b*gan_loss+c*res_loss
where Loss denotes the preset loss function, recon_loss the reconstruction loss, gan_loss the adversarial loss, res_loss the flaw-suppression loss, and a, b, and c the first, second, and third weight coefficients, whose values can be set according to actual requirements.
During training, the parameters of the flaw segmentation network are fixed, while the parameters of the generation network and the discrimination network are alternately held fixed and optimized, with minimization of the preset loss function as the objective; the respective parameters are updated by gradient descent.
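Putting the pieces together, one alternating optimization step might look like the following sketch. The BCE (non-saturating) form of the adversarial loss and the weight values a, b, c are assumptions for illustration; the patent does not fix a particular adversarial formulation.
```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, seg_net, orig_img, target_img,
               opt_g, opt_d, a=1.0, b=0.1, c=0.01):
    """One alternating step: update D with G fixed, then G with D fixed.
    disc is assumed to output raw logits; seg_net is the frozen flaw
    segmentation network."""
    # --- discriminator update (generator held fixed) ---
    with torch.no_grad():
        fake = gen(orig_img)
    d_real = disc(target_img)
    d_fake = disc(fake)
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- generator update (discriminator gradients accumulated here are
    #     cleared by opt_d.zero_grad() on the next step) ---
    gen_img = gen(orig_img)
    g_pred = disc(gen_img)
    recon_loss = F.l1_loss(gen_img, target_img)          # reconstruction loss
    gan_loss = F.binary_cross_entropy_with_logits(g_pred, torch.ones_like(g_pred))
    res_loss = seg_net(gen_img).sum()                    # flaw-suppression loss
    total = a * recon_loss + b * gan_loss + c * res_loss
    opt_g.zero_grad()
    total.backward()
    opt_g.step()
    return d_loss.item(), total.item()
```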
And step 207, determining a human face image processing model according to a target generation network contained in the target network model.
The face image processing model is used for processing a face image to be processed so as to remove flaws contained in the face image to be processed.
The face image processing model obtained with the model training method provided by this embodiment of the invention has a wide range of application: it suits any skin type, produces realistic results, and handles large areas of dense flaws without leaving mottled traces, while for flaw-free skin the generated image stays essentially consistent with the original, with no loss of quality or color shift. In practical application only a single network, the generation network, is used, which outperforms two-stage schemes that first detect flaws and then fill or locally smooth them; it is faster, supports video-rate blemish removal, and is well suited to latency-sensitive scenarios such as processing real-time video in live streaming or video calls.
Fig. 5 is a schematic flowchart of an image processing method according to an embodiment of the present invention. The method may be executed by an image processing apparatus, which may be implemented in software and/or hardware and is generally integrated in a computer device. As shown in Fig. 5, the method includes:
and step 501, acquiring a face image to be processed.
For example, the specific source of the face image to be processed is not limited: it may be a face image stored locally on the computer device, a face image obtained from the network, a face image captured in real time, and so on. Optionally, the face image to be processed may be a real-time video image containing a face in a video call, or a video frame containing a face in a live stream.
Step 502, inputting the face image to be processed into a face image processing model so as to output a target face image which is subjected to flaw removal processing and corresponds to the face image to be processed.
The face image processing model is obtained by the model training method provided by the embodiment of the invention.
Feeding the face image to be processed into the face image processing model provided by embodiments of the invention outputs a target face image with flaws removed, achieving the beautification effect.
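In code, inference reduces to a single forward pass. The sketch below assumes the trained target generation network is available as a PyTorch module and that frames arrive as normalized (1, 3, H, W) tensors; both are assumptions for illustration.
```python
import torch

def remove_flaws(model: torch.nn.Module, face: torch.Tensor) -> torch.Tensor:
    """One generator forward per frame: no detect-then-fill stages, which
    is what makes video-rate blemish removal feasible."""
    model.eval()
    with torch.no_grad():
        return model(face)   # face: (1, 3, H, W) float tensor in [0, 1]
```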
The image processing method provided by embodiments of the invention yields more realistic and natural flawless face images. It outputs the flawless face image in a single pass without multi-stage processing, which improves flaw-removal efficiency and real-time performance and makes it well suited to latency-sensitive scenarios such as processing real-time video in live streaming or video calls.
Fig. 6 is a block diagram of a model training apparatus according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, is generally integrated in a computer device, and performs model training by executing a model training method. As shown in Fig. 6, the apparatus includes:
a sample set obtaining module 601, configured to obtain a first training sample set including a training sample pair, where the training sample pair includes an original face sample image including a flaw and a target face sample image with the flaw removed on the basis of the original face sample image;
a model training module 602, configured to train a preset original network model with the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed, the generative adversarial network comprises a generation network and a discrimination network, the generation network is used to generate a flawless face image in the target domain, the flaw segmentation network is used to segment the generated image output by the generation network to obtain a segmentation result of flaw regions and non-flaw regions, the preset loss function comprises a flaw-suppression loss function, and the flaw-suppression loss function is converted from the segmentation result and constrains the generation network against producing flaws in the generated image;
a model determining module 603, configured to determine a face image processing model according to a target generation network included in the target network model, where the face image processing model is configured to process a face image to be processed to remove flaws included in the face image to be processed.
The model training apparatus provided by embodiments of the invention combines a generative adversarial network with a flaw segmentation network. The adversarial network is responsible for generating a flawless face, which helps keep features other than skin texture consistent, and the segmentation network acts as an auxiliary network that guides the generation process; together with the flaw-suppression loss, it teaches the generation network to eliminate flaws and output a flawless image.
Fig. 7 is a block diagram of an image processing apparatus according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, is generally integrated in a computer device, and performs image processing by executing an image processing method. As shown in Fig. 7, the apparatus includes:
a to-be-processed image obtaining module 701, configured to obtain a to-be-processed face image;
the image processing module 702 is configured to input the face image to be processed into a face image processing model, so as to output a target face image which is subjected to flaw removal processing and corresponds to the face image to be processed, where the face image processing model is obtained by using the model training method provided in the embodiment of the present invention.
The image processing apparatus provided by embodiments of the invention yields more realistic and natural flawless face images. It outputs the flawless face image in a single pass without multi-stage processing, which improves flaw-removal efficiency and real-time performance and makes it well suited to latency-sensitive scenarios such as processing real-time video in live streaming or video calls.
An embodiment of the invention provides a computer device, into which the model training apparatus provided by embodiments of the invention can be integrated. Fig. 8 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 800 comprises a memory 801, a processor 802, and a computer program stored in the memory 801 and executable on the processor 802; when executing the computer program, the processor 802 implements the model training method and/or the image processing method provided by embodiments of the present invention.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used to perform the model training method and/or the image processing method provided by embodiments of the present invention.
The model training apparatus, image processing apparatus, device, and storage medium provided in the above embodiments can execute the corresponding method provided in any embodiment of the present invention and possess the corresponding functional modules and beneficial effects. For technical details not described in detail above, refer to the model training method and image processing method provided in any embodiment of the present invention.
Note that the above describes only preferred embodiments of the present invention. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; its scope is determined by the appended claims.

Claims (12)

1. A method of model training, comprising:
acquiring a first training sample set containing training sample pairs, wherein a training sample pair comprises an original face sample image containing flaws and a target face sample image obtained by removing the flaws from the original face sample image;
training a preset original network model with the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed, the generative adversarial network comprises a generation network and a discrimination network, the generation network is used to generate a flawless face image in the target domain, the flaw segmentation network is used to segment the generated image output by the generation network to obtain a segmentation result of flaw regions and non-flaw regions, the preset loss function comprises a flaw-suppression loss function, and the flaw-suppression loss function is converted from the segmentation result and constrains the generation network against producing flaws in the generated image;
and determining a face image processing model according to the target generation network contained in the target network model, wherein the face image processing model is used to process a face image to be processed so as to remove the flaws it contains.
2. The method according to claim 1, wherein, in training the preset original network model with the first training sample set based on the preset loss function, the generation network takes the original face sample image as input and outputs a generated image, and the discrimination network takes the corresponding target face sample image and the generated image as input and outputs a judgment of whether they are the same.
3. The method according to claim 2, wherein the preset loss function further comprises a reconstruction loss function and an adversarial loss function, the reconstruction loss function constraining the gap between the generated image and the corresponding target face sample image.
4. The method of claim 1, further comprising, before said training a preset original network model based on a preset loss function using said set of training samples:
obtaining a second training sample set comprising the training sample pairs;
segmenting and labeling the original face sample image according to the difference between the original face sample image and the target face sample image in each training sample pair in the second training sample set, to obtain a target segmentation image containing flaw regions and non-flaw regions;
and training a preset original flaw segmentation network, taking the original face sample image as input and the corresponding target segmentation image as the expected output, based on a preset segmentation loss function, to obtain the flaw segmentation network.
5. The method according to claim 4, wherein the segmenting and labeling the original face sample image according to the difference between the original face sample image and the target face sample image in each training sample pair in the second training sample set comprises:
for each training sample pair in the second training sample set, calculating an absolute difference value of a first pixel and a second pixel corresponding to each pixel position, wherein the first pixel is derived from the original face sample image, and the second pixel is derived from the target face sample image;
determining a target segmentation threshold value according to each absolute difference value;
and segmenting and labeling the original face sample image using the target segmentation threshold, wherein portions whose absolute difference is greater than or equal to the target segmentation threshold are labeled as flaw regions, and portions whose absolute difference is smaller than the target segmentation threshold are labeled as non-flaw regions.
6. The method of claim 4, wherein the preset segmentation loss function comprises a Dice loss function.
7. The method of any one of claims 1-6, wherein the flaw-suppression loss function is represented by the total amount of flaw region contained in the generated image, as output by the flaw segmentation network.
8. An image processing method, comprising:
acquiring a face image to be processed;
inputting the face image to be processed into a face image processing model to output a target face image, corresponding to the face image to be processed, from which flaws have been removed, wherein the face image processing model is obtained by the model training method according to any one of claims 1 to 7.
9. A model training apparatus, comprising:
a sample set acquisition module, configured to acquire a first training sample set containing training sample pairs, wherein a training sample pair comprises an original face sample image containing flaws and a target face sample image obtained by removing the flaws from the original face sample image;
a model training module, configured to train a preset original network model with the first training sample set based on a preset loss function to obtain a target network model, wherein the preset original network model comprises a generative adversarial network whose parameters are to be adjusted and a pre-trained flaw segmentation network whose parameters are fixed, the generative adversarial network comprises a generation network and a discrimination network, the generation network is used to generate a flawless face image in the target domain, the flaw segmentation network is used to segment the generated image output by the generation network to obtain a segmentation result of flaw regions and non-flaw regions, the preset loss function comprises a flaw-suppression loss function, and the flaw-suppression loss function is converted from the segmentation result and constrains the generation network against producing flaws in the generated image;
and a model determining module, configured to determine a face image processing model according to the target generation network contained in the target network model, wherein the face image processing model is used to process a face image to be processed so as to remove the flaws it contains.
10. An image processing apparatus characterized by comprising:
the image acquisition module to be processed is used for acquiring a face image to be processed;
an image processing module, configured to input the facial image to be processed into a facial image processing model, so as to output a target facial image subjected to flaw removal processing corresponding to the facial image to be processed, where the facial image processing model is obtained by using the model training method according to any one of claims 1 to 7.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-8 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202111497159.5A 2021-12-09 2021-12-09 Model training method, image processing method, device, equipment and storage medium Pending CN114187201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111497159.5A CN114187201A (en) 2021-12-09 2021-12-09 Model training method, image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111497159.5A CN114187201A (en) 2021-12-09 2021-12-09 Model training method, image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114187201A (en) 2022-03-15

Family

ID=80542889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111497159.5A Pending CN114187201A (en) 2021-12-09 2021-12-09 Model training method, image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114187201A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100334A (en) * 2022-08-24 2022-09-23 广州极尚网络技术有限公司 Image edge drawing and animation method, device and storage medium
WO2024099026A1 (en) * 2022-11-07 2024-05-16 腾讯科技(深圳)有限公司 Image processing method and apparatus, device, storage medium and program product
CN116563556A (en) * 2023-07-05 2023-08-08 杭州海康威视数字技术股份有限公司 Model training method
CN116563556B (en) * 2023-07-05 2023-11-10 杭州海康威视数字技术股份有限公司 Model training method

Similar Documents

Publication Publication Date Title
CN114187201A (en) Model training method, image processing method, device, equipment and storage medium
Hou et al. An efficient nonlocal variational method with application to underwater image restoration
Zhang et al. The application of visual saliency models in objective image quality assessment: A statistical evaluation
CN111445410B (en) Texture enhancement method, device and equipment based on texture image and storage medium
CN105096259B (en) The depth value restoration methods and system of depth image
CN109712095B (en) Face beautifying method with rapid edge preservation
CN109377555B (en) Method for extracting and identifying three-dimensional reconstruction target features of foreground visual field of autonomous underwater robot
CN112508806B (en) Endoscopic image highlight removal method based on non-convex low-rank matrix decomposition
CN113808027B (en) Human body image processing method and device, electronic equipment and storage medium
CN105678735A (en) Target salience detection method for fog images
CN113642576B (en) Method and device for generating training image set in target detection and semantic segmentation tasks
Cai et al. Perception preserving decolorization
Rao et al. Seeing in the Dark by Component-GAN
Qu et al. UMLE: unsupervised multi-discriminator network for low light enhancement
CN109034070B (en) Blind separation method and device for replacement aliasing image
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
Zhang et al. Extracting regions of interest in biomedical images
Schenk et al. Automatic glottis segmentation from laryngeal high-speed videos using 3D active contours
CN115761451A (en) Pollen classification method and device, electronic equipment and storage medium
Jiang et al. Haze relevant feature attention network for single image dehazing
CN111563839B (en) Fundus image conversion method and device
Qu et al. LEUGAN: low-light image enhancement by unsupervised generative attentional networks
Jin et al. Color Correction and Local Contrast Enhancement for Underwater Image Enhancement
Xie et al. Underwater image enhancement based on zero-shot learning and level adjustment
Mo et al. Contrastive adaptive frequency decomposition network guided by haze discrimination for real-world image dehazing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination