CN112465717B - Face image processing model training method, device, electronic equipment and medium


Info

Publication number
CN112465717B
CN112465717B
Authority
CN
China
Prior art keywords
face image
training
sample set
training sample
face
Prior art date
Legal status
Active
Application number
CN202011342855.4A
Other languages
Chinese (zh)
Other versions
CN112465717A (en)
Inventor
李华夏 (Li Huaxia)
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2024-05-31
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202011342855.4A
Publication of CN112465717A: 2021-03-09
Application granted; publication of CN112465717B: 2024-05-31
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation


Abstract

Embodiments of the present disclosure disclose face image processing model training methods, apparatuses, electronic devices, and computer readable media. One embodiment of the method comprises the following steps: acquiring a training sample set; selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing a target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage; and training a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output. This embodiment improves the processing quality of local areas in face images.

Description

Face image processing model training method, device, electronic equipment and medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a face image processing model training method, apparatus, electronic device, and computer readable medium.
Background
Image processing techniques address different needs by performing different processing on images. For example, damaged or blurred images may be repaired to produce a sharp, complete processed image. With the development of artificial intelligence technology, artificial neural networks play an important role in the field of image processing. However, in some scenarios the processing quality of artificial neural networks for local areas still needs improvement. For example, when repairing a damaged image that includes a face region, the processing quality of a local region (for example, a tooth region) is often not high.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose facial image processing model training methods, apparatuses, electronic devices, and computer readable media to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a face image processing model training method, including: acquiring a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage; and training a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
In a second aspect, some embodiments of the present disclosure provide a face image processing model training apparatus, comprising: an acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; a selecting unit configured to select a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio; and a training unit configured to train a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following beneficial effect: the face image processing model obtained by the face image processing model training method improves the processing quality of local areas in face images. In particular, the inventors found that the reason related artificial neural networks process local areas poorly is that natural training samples suffer from sample imbalance: some classes of training samples are absent or present only in relatively small numbers. For example, face images containing tooth regions are comparatively rare among sample images. As a result, the features of these regions are learned poorly, and the processing quality for the corresponding local areas is low. Based on this, the training sample set employed in the face image processing model training method of some embodiments of the present disclosure includes both face images showing the target part and face images not showing it, and the ratio of their numbers in each selected training sample group matches the target ratio. This overcomes the above problem: the features of these areas can be learned well during training, which ultimately improves the processing quality of local areas. At the same time, the ratio of the number of face images showing the target part to the number not showing it differs at different stages of training: the target ratio is a first preset value in the training start stage and a second preset value in the training end stage. In practice, the second preset value can be set to the natural sample distribution ratio, and the first preset value can be set higher than the second preset value. With this schedule, the model learns the features well while the sample distribution gradually returns to the natural distribution, avoiding the model learning a sample bias that would harm the image processing effect.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of a facial image processing model training method according to some embodiments of the present disclosure;
FIG. 2 is a flow chart of some embodiments of a facial image processing model training method according to the present disclosure;
FIG. 3 is a flow chart of further embodiments of a facial image processing model training method according to the present disclosure;
FIG. 4 is a schematic structural diagram of some embodiments of a facial image processing model training apparatus according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is one application scenario 100 of a facial image processing model training method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, computing device 101 may first obtain training sample set 102, where the training samples in training sample set 102 include face images and processed face images corresponding to the face images. The face images include face images showing the target part and face images not showing the target part. For example, the target part may be teeth: training sample 1021 includes a face image showing teeth, while training sample 1022 includes a face image that does not show teeth. It will be appreciated that training sample set 102 may also include other training samples.
Computing device 101 may then select training sample group 103 from training sample set 102. The ratio of the number of face images showing the target part to the number of face images not showing it in the training sample group conforms to the target ratio, and the value of the target ratio varies across training stages. Specifically, the target ratio is a first preset value in the training start stage and a second preset value in the training end stage. Optionally, in the training intermediate stage the target ratio may be an intermediate value between the first preset value and the second preset value.
On this basis, the face images in training sample group 103 are used as input, the processed face images corresponding to the face images are used as expected output, and face image processing model 104 is trained.
With continued reference to fig. 2, a flow 200 of some embodiments of facial image processing model training methods in accordance with the present disclosure is shown. The face image processing model training method comprises the following steps:
Step 201, a training sample set is obtained.
In some embodiments, the subject performing the face image processing model training method may obtain the training sample set locally or from a communicatively connected electronic device. Each training sample in the training sample set comprises a face image and a processed face image corresponding to the face image. The face images include face images showing the target part and face images not showing the target part.
In some embodiments, the target part may be any of various parts of the face, e.g., teeth, eyes, etc. Further, depending on the application scenario, the processed face image may be a face image subjected to various kinds of processing, serving as the expected output that supervises the model. In practice, the processed face image may be obtained by defogging, inpainting, beautifying, and the like.
Step 202, selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage.
In some embodiments, the executing entity may select a training sample group from the training sample set. In this process, the ratio of the number of face images showing the target part to the number of face images not showing it in the group is controlled to conform to the target ratio. Furthermore, the value of the target ratio varies across training stages: it is a first preset value in the training start stage and a second preset value in the training end stage.
For example, the second preset value may be set to the natural sample distribution ratio, and the first preset value may be higher than the second preset value. With this schedule, the model learns the features well while the sample distribution stays close to the natural distribution, avoiding the model learning a sample bias that would harm the image processing effect.
Alternatively, the target ratio may be an intermediate value between the first preset value and the second preset value in the training intermediate stage.
For example, when the target part is teeth, the ratio of face images showing teeth to face images not showing teeth in a natural training sample set is between 1:5 and 1:10, so the first preset value may be set to, say, 5:1. Increasing the proportion of face images showing teeth lets the model learn their features better, thereby improving the processing quality for local areas in face images. At the same time, artificially increasing the proportion of face images showing teeth may cause the model to learn a sample bias that harms the image processing effect. To address this, the second preset value may be set to the natural sample distribution ratio (e.g., 1:5). The sample distribution then better matches the natural distribution, and the learned sample bias no longer degrades the image processing effect.
It will be appreciated that the training process may be divided into different stages according to actual needs. For example, the training process may be divided according to a preset number of iterations, where the training start stage comprises at least the first iteration and the training end stage comprises at least the last few iterations.
In some alternative implementations of some embodiments, during the training intermediate stage the target ratio is determined from the first preset value, the second preset value, and the current number of iterations.
For example, the preset number of iterations may be 96, and each training sample group may include 48 training samples. In the first selected training sample group, the numbers of face images showing the target part and not showing it are 40 and 8, respectively; in the last selected group they are 8 and 40, respectively. Every three iterations, the number of face images showing the target part may be reduced by 1 and the number not showing it increased by 1.
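As a concrete illustration only, the following Python sketch implements the schedule just described; the function and variable names are assumptions for illustration and are not part of the disclosure:

```python
import random

def select_sample_group(shown, not_shown, iteration, group_size=48):
    """Pick one training sample group for the given 1-indexed iteration.

    shown:     training samples whose face image shows the target part (e.g. teeth)
    not_shown: training samples whose face image does not show the target part
    Per the example above, the count of 'shown' samples starts at 40 and drops
    by 1 every three iterations, reaching 8 by the 96th (final) iteration.
    """
    n_shown = max(8, 40 - iteration // 3)   # 40 at iteration 1, 8 at iteration 96
    n_not_shown = group_size - n_shown      # complement: 8 rising to 40
    return random.sample(shown, n_shown) + random.sample(not_shown, n_not_shown)
```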
Step 203, taking the face images in the training sample group as input, taking the processed face images corresponding to the face images as expected output, and training to obtain a face image processing model.
In some embodiments, the executing body may take the face images in the training sample group as input and the processed face images corresponding to the face images as expected output, so as to train the face image processing model.
Specifically, as an example, the difference between the actual output image and the processed face image corresponding to the face image may be determined based on a preset loss function, and the parameters of the initial image processing model may be adjusted based on the difference until a training end condition is met, yielding the face image processing model. The training end condition includes the number of iterations reaching a preset number, the difference being smaller than or equal to a preset loss value, and the like.
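A minimal training-loop sketch of this step follows, assuming a PyTorch-style model, optimizer, and loss function; the disclosure does not prescribe a framework, so every name below is an assumption:

```python
import torch

def train(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
          loss_fn, sample_groups, max_iters=96, min_loss=1e-3):
    """Iterate over pre-selected sample groups until a training end condition holds."""
    for it, (faces, processed_faces) in enumerate(sample_groups, start=1):
        output = model(faces)                    # actual output image
        loss = loss_fn(output, processed_faces)  # difference vs. expected output
        optimizer.zero_grad()
        loss.backward()                          # back-propagate the difference
        optimizer.step()                         # adjust model parameters
        # training end condition: iteration cap reached or difference small enough
        if it >= max_iters or loss.item() <= min_loss:
            break
    return model
```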
According to the method provided by the above embodiments of the present disclosure, the face image processing model obtained through the face image processing model training method improves the processing quality of local areas in face images. In particular, the inventors found that the reason related artificial neural networks process local areas poorly is that natural training samples suffer from sample imbalance: some classes of training samples are absent or present only in relatively small numbers. For example, face images containing tooth regions are comparatively rare among sample images. As a result, the features of these regions are learned poorly, and the processing quality for the corresponding local areas is low. Based on this, the training sample set employed in the face image processing model training method of some embodiments of the present disclosure includes both face images showing the target part and face images not showing it, and the ratio of their numbers in each selected training sample group matches the target ratio. This overcomes the above problem: the features of these areas can be learned well during training, which ultimately improves the processing quality of local areas. At the same time, the ratio of the number of face images showing the target part to the number not showing it differs at different stages of training: the target ratio is a first preset value in the training start stage and a second preset value in the training end stage. In practice, the second preset value can be set to the natural sample distribution ratio, and the first preset value can be set higher than the second preset value. With this schedule, the model learns the features well while the sample distribution gradually returns to the natural distribution, avoiding the model learning a sample bias that would harm the image processing effect.
With further reference to FIG. 3, a flow 300 of further embodiments of a face image processing model training method is shown. The face image processing model training method flow 300 includes the following steps:
Step 301, a training sample set is obtained.
Step 302, selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage.
In some embodiments, the specific implementation of steps 301 to 302 and the technical effects thereof may refer to steps 201 to 202 in those embodiments corresponding to fig. 2, which are not described herein.
Step 303, inputting the face images in the training sample group into an initial image processing model to obtain an actual output image.
In some embodiments, the initial image processing model may be an image processing model that is untrained or whose training is not yet complete. The network structure of the initial image processing model may be selected according to actual needs, for example, a CNN (convolutional neural network), a GAN (generative adversarial network), or the like.
Step 304, determining a difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function comprises a feature matching loss.
In some embodiments, the execution subject of the face image processing model training method may determine the difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function that includes a feature matching loss. The feature matching loss is used to supervise the outputs of the model's intermediate layers, so that not only the final outputs but also the intermediate-layer outputs are kept as consistent as possible.
In some alternative implementations of some embodiments, the feature matching loss is obtained by: inputting an output result of a target layer in the initial image processing model into a feature extraction network to obtain feature data of the target layer; and obtaining the feature matching loss based on the feature data.
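One hedged reading of this implementation, sketched in Python: the disclosure only specifies that target-layer outputs are fed to a feature extraction network, so the reference features being compared against and the L1 distance below are assumptions, not the patented formulation:

```python
import torch.nn.functional as F

def feature_matching_loss(feature_extractor, target_layer_output, target_layer_reference):
    """Compare feature data extracted from a target layer's actual and reference outputs."""
    feat_actual = feature_extractor(target_layer_output)      # feature data of the target layer
    feat_expected = feature_extractor(target_layer_reference) # assumed reference counterpart
    return F.l1_loss(feat_actual, feat_expected)              # loss obtained from the feature data
```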
In step 305, parameters of the initial image processing model are adjusted based on the differences.
In some embodiments, as an example, methods such as back propagation and stochastic gradient descent may be used to propagate the difference backward from the last layer of the model and adjust the parameters of the model.
In some optional implementations of some embodiments, the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image; and superposing the mask image of the target part and the original face image to obtain the face image showing the target part.
In these implementations, adding the mask image of the target part makes the model focus more on the target part, improving the processing quality for the target part.
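For illustration only, one plausible reading of this superposition, assuming NumPy arrays and a single-channel mask (the disclosure does not fix the representation):

```python
import numpy as np

def superpose_mask(face: np.ndarray, part_mask: np.ndarray) -> np.ndarray:
    """Superpose the target-part mask on the original face image.

    face:      H x W x 3 face image that originally shows the target part.
    part_mask: H x W mask marking the target part (e.g. the tooth region).
    Stacking the mask as an extra channel is one way to let the model attend
    to the target part; an alpha-blend overlay would be another reading.
    """
    return np.concatenate([face, part_mask[..., None]], axis=-1)  # H x W x 4
```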
As can be seen from fig. 3, compared with the description of some embodiments corresponding to fig. 2, this flow adds the steps of determining the difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, and adjusting the model parameters accordingly. The preset loss function includes a feature matching loss, which is used to supervise the outputs of the model's intermediate layers, so that not only the final outputs but also the intermediate-layer outputs are kept as consistent as possible.
With further reference to fig. 4, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of a face image processing model training apparatus. These apparatus embodiments correspond to the method embodiments illustrated in fig. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 4, the face image processing model training apparatus 400 of some embodiments includes: an acquisition unit 401, a selecting unit 402, and a training unit 403. The acquisition unit 401 is configured to acquire a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part. The selecting unit 402 is configured to select a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing it in the training sample group matches a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage. The training unit 403 is configured to train a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
In an alternative implementation of some embodiments, the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image; and superposing the mask image of the target part and the original face image to obtain the face image showing the target part.
In alternative implementations of some embodiments, training unit 403 may be further configured to: input the face images in the training sample group into an initial image processing model to obtain an actual output image; determine the difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function includes a feature matching loss; and adjust parameters of the initial image processing model based on the difference.
In an alternative implementation of some embodiments, the feature matching loss is obtained by: inputting an output result of a target layer in the initial image processing model into a feature extraction network to obtain feature data of the target layer; and obtaining the feature matching loss based on the feature data.
In an alternative implementation of some embodiments, during the training intermediate stage the target ratio is determined according to the first preset value, the second preset value, and the current number of iterations.
In some embodiments, the quality of processing for local regions in the facial image is improved.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., the electronic device of fig. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 5 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communications device 509, or from the storage device 508, or from the ROM 502. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
It should be noted that the computer readable medium according to some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; select a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage; and train a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, for example described as: a processor comprising an acquisition unit, a selecting unit, and a training unit. The names of these units do not in any way limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a training sample set".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
According to one or more embodiments of the present disclosure, a face image processing model training method includes: acquiring a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in the training start stage and a second preset value in the training end stage; and training a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
According to one or more embodiments of the present disclosure, the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image; and superposing the mask image of the target part and the original face image to obtain the face image showing the target part.
According to one or more embodiments of the present disclosure, training to obtain the face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output includes: inputting the face images in the training sample group into an initial image processing model to obtain an actual output image; determining the difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function includes a feature matching loss; and adjusting parameters of the initial image processing model based on the difference.
In accordance with one or more embodiments of the present disclosure, the feature matching loss is obtained by: inputting an output result of a target layer in the initial image processing model into a feature extraction network to obtain feature data of the target layer; and obtaining the feature matching loss based on the feature data.
According to one or more embodiments of the present disclosure, during the training intermediate stage, the target ratio is determined according to the first preset value, the second preset value, and the current number of iterations.
According to one or more embodiments of the present disclosure, there is provided a face image processing model training apparatus including: an acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; a selecting unit configured to select a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio; and a training unit configured to train a face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output.
According to one or more embodiments of the present disclosure, the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image; and superposing the mask image of the target part and the original face image to obtain the face image showing the target part.
According to one or more embodiments of the present disclosure, the training unit is further configured to: input the face images in the training sample group into an initial image processing model to obtain an actual output image; determine the difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function includes a feature matching loss; and adjust parameters of the initial image processing model based on the difference.
In an alternative implementation of some embodiments, the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image; and superposing the mask image of the target part and the original face image to obtain the face image showing the target part.
In an alternative implementation of some embodiments, the feature matching loss is obtained by: inputting an output result of a target layer in the initial image processing model into a feature extraction network to obtain feature data of the target layer; and obtaining the feature matching loss based on the feature data.
In an alternative implementation of some embodiments, during the training intermediate stage the target ratio is determined according to the first preset value, the second preset value, and the current number of iterations.
The foregoing description is only of preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are interchanged with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (8)

1. A facial image processing model training method, comprising:
Acquiring a training sample set, wherein each training sample in the training sample set comprises a face image and a processed face image corresponding to the face image, the face images comprising face images showing a target part and face images not showing the target part; wherein the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image showing the target part; and superposing the mask image of the target part and the original face image showing the target part to obtain the face image showing the target part;
Selecting a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group conforms to a target ratio, the target ratio being a first preset value in a training start stage and a second preset value in a training end stage;
And taking the face images in the training sample group as input, taking the processed face images corresponding to the face images as expected output, and training to obtain the face image processing model.
2. The method of claim 1, wherein the training to obtain the face image processing model by taking the face images in the training sample group as input and the processed face images corresponding to the face images as expected output comprises:
inputting the face images in the training sample group into an initial image processing model to obtain an actual output image;
determining a difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function comprises a feature matching loss; and
adjusting parameters of the initial image processing model based on the difference.
3. The method of claim 2, wherein the feature matching loss is obtained by:
inputting an output result of a target layer in the initial image processing model into a feature extraction network to obtain feature data of the target layer;
And obtaining the feature matching loss based on the feature data.
4. The method of claim 1, wherein, during a training intermediate stage, the target ratio is determined from the first preset value, the second preset value, and a current number of iterations.
5. A facial image processing model training apparatus, comprising:
An acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set includes a face image and a processed face image corresponding to the face image, the face images including face images showing a target part and face images not showing the target part; wherein the face image showing the target part is generated by: acquiring an original face image showing the target part; determining a mask image of the target part corresponding to the original face image showing the target part; and superposing the mask image of the target part and the original face image showing the target part to obtain the face image showing the target part;
A selecting unit configured to select a training sample group from the training sample set, wherein the ratio of the number of face images showing the target part to the number of face images not showing the target part in the training sample group matches a target ratio, the target ratio being a first preset value in a training start stage and a second preset value in a training end stage;
And a training unit configured to take the face images in the training sample group as input, take the processed face images corresponding to the face images as expected output, and train to obtain the face image processing model.
6. The apparatus of claim 5, wherein the training unit is further configured to:
inputting the face images in the training sample group into an initial image processing model to obtain an actual output image;
determining a difference between the actual output image and the processed face image corresponding to the face image based on a preset loss function, wherein the preset loss function comprises a feature matching loss; and
adjusting parameters of the initial image processing model based on the difference.
7. An electronic device, comprising:
One or more processors;
A storage device having one or more programs stored thereon,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
8. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-4.
CN202011342855.4A (filed 2020-11-25; priority date 2020-11-25): Face image processing model training method, device, electronic equipment and medium. Granted as CN112465717B; status: Active.

Priority Applications (1)

Application Number: CN202011342855.4A
Priority Date: 2020-11-25
Filing Date: 2020-11-25
Title: Face image processing model training method, device, electronic equipment and medium


Publications (2)

Publication Number: CN112465717A, published 2021-03-09
Publication Number: CN112465717B (grant), published 2024-05-31

Family

Family ID: 74809478

Family Applications (1)

Application Number: CN202011342855.4A (Active; granted as CN112465717B)
Priority Date / Filing Date: 2020-11-25 / 2020-11-25
Title: Face image processing model training method, device, electronic equipment and medium

Country Status (1)

CN: CN112465717B


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2414616A (en) * 2004-05-28 2005-11-30 Sony Uk Ltd Comparing test image with a set of reference images
US10204625B2 (en) * 2010-06-07 2019-02-12 Affectiva, Inc. Audio analysis learning using video data

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165570A (en) * 2018-08-03 2019-01-08 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110008817A (en) * 2019-01-29 2019-07-12 北京奇艺世纪科技有限公司 Model training, image processing method, device, electronic equipment and computer readable storage medium
CN110136054A (en) * 2019-05-17 2019-08-16 北京字节跳动网络技术有限公司 Image processing method and device
CN110738595A (en) * 2019-09-30 2020-01-31 腾讯科技(深圳)有限公司 Picture processing method, device and equipment and computer storage medium
CN110765923A (en) * 2019-10-18 2020-02-07 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and storage medium
CN111310624A (en) * 2020-02-05 2020-06-19 腾讯科技(深圳)有限公司 Occlusion recognition method and device, computer equipment and storage medium
CN111369427A (en) * 2020-03-06 2020-07-03 北京字节跳动网络技术有限公司 Image processing method, image processing device, readable medium and electronic equipment
CN111445564A (en) * 2020-03-26 2020-07-24 腾讯科技(深圳)有限公司 Face texture image generation method and device, computer equipment and storage medium
CN111898406A (en) * 2020-06-05 2020-11-06 东南大学 Face detection method based on focus loss and multitask cascade

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Image inpainting via generative multi-column convolutional neural networks; Wang Y et al.; Advances in Neural Information Processing Systems; Vol. 31; 1-10 *
Research on face image recognition technology based on artificial neural networks; Jia Dawei; China Masters' Theses Full-text Database (Information Science and Technology); No. 8; I138-699 *
Face image completion under identity-preserving constraints; Wang Xudong et al.; Chinese Journal of Network and Information Security; Vol. 4, No. 8; 71-76 *



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant