CN116168127A - Image processing method, device, computer storage medium and electronic equipment - Google Patents

Image processing method, device, computer storage medium and electronic equipment

Info

Publication number
CN116168127A
CN116168127A (application CN202111398327.5A)
Authority
CN
China
Prior art keywords
face image
image
loss function
target
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111398327.5A
Other languages
Chinese (zh)
Inventor
张玉兵 (Zhang Yubing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202111398327.5A priority Critical patent/CN116168127A/en
Publication of CN116168127A publication Critical patent/CN116168127A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The embodiment of the application discloses an image processing method, an image processing device, a computer storage medium and electronic equipment. The method comprises the following steps: acquiring an original face image; performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; and performing image generation on the target feature vector by using a pre-trained generation model to generate a target face image of the target style. In this way, the embodiment of the application can obtain a better correspondence between the distributions of real faces and cartoon face pictures, so that cartoon face images with higher similarity to the input are generated stably.

Description

Image processing method, device, computer storage medium and electronic equipment
Technical Field
The present application relates to the field of image processing and computer vision, and in particular, to an image processing method, an image processing apparatus, a computer storage medium, and an electronic device.
Background
Existing schemes for cartoonizing face images generally fall into two categories. The first, exemplified by U-GAT-IT, is mainly based on the CycleGAN (Cycle-Consistent Generative Adversarial Network) technical framework, and constrains the correlation between the input face picture and the output cartoon face picture through a cycle-consistency loss. The second is mainly based on the StyleGAN (Style-Based Generative Adversarial Network) series of technical frameworks, realizes interpolation between pictures through swapping between network layers, and thereby constrains the correlation between the input face picture and the output cartoon face picture.
However, both technical schemes have the following problem: the similarity between the input face image and the output cartoon face image cannot be guaranteed. Moreover, the StyleGAN-based technical stacks all need to invert the latent-space variable corresponding to the input face picture, and this inversion involves iterative optimization and consumes considerable time.
Aiming at the problem in the prior art that the similarity between the input face image and the output cartoon face image is not high in the process of cartoonizing face images, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a computer storage medium and electronic equipment, which at least solve the technical problem in the prior art that the similarity between the input face image and the output cartoon face image is not high in the process of cartoonizing face images.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring an original face image; performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; and performing image generation on the target feature vector by using a pre-trained generation model to generate a target face image of the target style.
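For illustration, the two-stage pipeline of this aspect can be written compactly (the symbols below are chosen here for exposition and do not appear in the claims): with $x$ the original face image, $E$ the trained encoder model and $G$ the pre-trained generation model, the target face image of the target style is

$$\hat{y} = G(E(x)).$$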
Optionally, the method further includes: performing feature extraction on the first face image and the second face image respectively by using an initial model to obtain a first feature vector of the first face image and a second feature vector of the second face image; performing image generation on the first feature vector and the second feature vector respectively by using the pre-trained generation model to generate a first generated image and a second generated image of the target style; constructing a target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image; and adjusting model parameters of the initial model based on the target loss function to obtain the encoder model.
Optionally, constructing the target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image includes: acquiring a first similarity between the first face image and the first generated image and a second similarity between the second face image and the second generated image, and constructing a first loss function based on the first similarity and the second similarity; constructing a second loss function based on the second generated image and the third face image; scoring the first generated image and the second generated image respectively by using a pre-trained discriminator network to obtain a first score of the first generated image and a second score of the second generated image, and constructing a third loss function based on the first score and the second score; and processing the first loss function, the second loss function and the third loss function to obtain the target loss function.
Optionally, constructing the first loss function based on the first similarity and the second similarity includes: obtaining an average value of the first similarity and the second similarity to obtain the first loss function.
Optionally, constructing the third loss function based on the first score and the second score includes: obtaining an average value of the first score and the second score to obtain the third loss function.
Optionally, processing the first loss function, the second loss function and the third loss function to obtain the target loss function includes: obtaining a common logarithm of the first loss function to obtain a first parameter; obtaining a product of the second loss function and a first hyper-parameter to obtain a second parameter; obtaining a ratio of the third loss function to a preset value to obtain a third parameter; obtaining a sum of the second parameter and the third parameter to obtain a parameter sum; and obtaining a difference between the first parameter and the parameter sum to obtain the target loss function.
Optionally, adjusting the model parameters of the initial model based on the target loss function to obtain the encoder model includes: obtaining a derivative of the target loss function; obtaining a product of the derivative and a second hyper-parameter to obtain a target step size; and adjusting the model parameters of the initial model according to the target step size.
According to another aspect of the embodiments of the present application, there is also provided an image processing apparatus including: an acquisition module, configured to acquire an original face image; an extraction module, configured to perform feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; and a generation module, configured to perform image generation on the target feature vector by using a pre-trained generation model to generate a target face image of the target style.
According to another aspect of the embodiments of the present application, there is also provided a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the image processing method of the above embodiments.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by a processor and to perform the image processing method of the above embodiments.
In the embodiment of the application, in the process of cartoonizing a face image, an original face image is acquired, and feature extraction is performed on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; image generation is then performed on the target feature vector by using a pre-trained generation model to generate a target face image of the target style. It is easy to notice that, by introducing a small number of manually produced pairs of real faces and cartoon faces into the cartoonized-face generation model as supervision information, a better correspondence between the distributions of real faces and cartoon face pictures is obtained, thereby achieving the technical effect of generating cartoonized face images with higher similarity to the input, and solving the technical problem in the prior art that the similarity between the input face image and the output cartoon face image is not high in the process of cartoonizing face images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic illustration of real face image training sample material;
FIG. 3 is a schematic diagram of a small number of manually produced real-face and cartoon-face sample pairs;
FIG. 4 is a flow chart of an alternative face encoder model training method according to embodiments of the present application;
FIG. 5 is a schematic illustration of an alternative cartoonized face image generation model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without making any inventive effort shall fall within the scope of protection of the present application.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application, as detailed in the appended claims.
It should be noted that the terms "first", "second" and the like in the description and claims of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article or apparatus. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Example 1
According to an embodiment of the present application, there is provided an image processing method. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one herein.
FIG. 1 is a flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 1, the method includes the following steps:
step S102, acquiring an original face image.
The original face image in step S102 may be obtained by an image capturing device or acquired from a database, but is not limited thereto. The image capturing device includes, but is not limited to, cameras and video cameras. For example, a mobile phone camera may be used to capture a face image.
Step S104, performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of the target style.
The target style in step S104 includes, but is not limited to: cartoon style, dark style and palace style. The embodiment of the application takes the cartoonized face image as an example for explanation.
The first face image in the above step S104 may be a large number of real face images, as shown in fig. 2. The second face image may be a small number of manually produced real face images, and the third face image of the target style may be a cartoonized face image generated from the second face image, where the generation manner includes, but is not limited to, manual production and machine production. The third face images of the target style correspond to the second face images one to one, as shown in fig. 3.
The encoder model in the above step S104 may be constituted by a deep convolutional neural network (CNN, Convolutional Neural Network), including, but not limited to: ResNet (Residual Neural Network), ShuffleNet (Shuffle Neural Network) and MobileNet (Mobile Neural Network).
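As a minimal illustrative sketch of such an encoder (not the patented implementation: the ResNet-18 backbone, the latent dimension of 512 and the use of torchvision are assumptions made here for concreteness), the classification head of a standard CNN can be replaced with a projection to the feature space expected by the generation model:

```python
# Illustrative encoder sketch: a CNN backbone projecting a face image to a
# feature vector. Backbone choice and latent size (512) are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FaceEncoder(nn.Module):
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep every layer of the backbone except the final classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.project = nn.Linear(backbone.fc.in_features, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) face images -> (N, latent_dim) target feature vectors
        h = self.features(x).flatten(1)
        return self.project(h)

encoder = FaceEncoder()
z = encoder(torch.randn(1, 3, 256, 256))  # z.shape == (1, 512)
```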
In the above step S104, a small number of manually produced real face images are used as input with the corresponding cartoonized face images as expected output, while a large number of real face images are also used as input, and training proceeds continuously until the required encoder model is obtained. The trained encoder model is then used to extract the target feature vector of the original face.
Step S106, performing image generation on the target feature vector by using the pre-trained generation model to generate a target face image of the target style.
The target face image of the target style in step S106 may be a cartoonized original face image.
The pre-trained generation model in step S106 may be obtained by continuously training an existing generation model with extracted feature vectors of real face images as input and the cartoonized images corresponding to those feature vectors as output. Using a pre-trained generation model saves time cost.
In the embodiment of the application, in the process of cartoonizing a face image, an original face image is acquired, and feature extraction is performed on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; image generation is then performed on the target feature vector by using a pre-trained generation model to generate a target face image of the target style. It is easy to notice that, by introducing a small number of manually produced pairs of real faces and cartoon faces into the cartoonized-face generation model as supervision information, a better correspondence between the distributions of real faces and cartoon face pictures is obtained, thereby achieving the technical effect of generating cartoonized face images with higher similarity to the input, and solving the technical problem in the prior art that the similarity between the input face image and the output cartoon face image is not high in the process of cartoonizing face images.
Optionally, feature extraction is performed on the first face image and the second face image respectively by using an initial model to obtain a first feature vector of the first face image and a second feature vector of the second face image; image generation is performed on the first feature vector and the second feature vector respectively by using the pre-trained generation model to generate a first generated image and a second generated image of the target style; a target loss function of the initial model is constructed based on the first face image, the second face image, the third face image, the first generated image and the second generated image; and model parameters of the initial model are adjusted based on the target loss function to obtain the encoder model.
The first face image in the above step may be a large number of real face images, the second face image may be a small number of manually produced real face images, and the third face image may be a cartoonized face image generated from the second face image, where the generation manner includes, but is not limited to, manual production and machine production.
The first generated image in the above step is a cartoonized first face image generated by using the pre-trained generation model, and the second generated image is a cartoonized second face image generated by using the pre-trained generation model.
In an alternative embodiment, in the process of training the encoder, feature extraction is performed on a large number of real face images and a small number of manually produced real face images by using an existing model, so as to obtain the corresponding first feature vectors and second feature vectors. The extracted first feature vector and second feature vector are taken as the input of the pre-trained generation model to generate a cartoonized first face image and a cartoonized second face image. A target loss function of the initial model is constructed from the first face image, the second face image, the cartoonized face image manually produced from the second face image, the cartoonized first face image and the cartoonized second face image; the loss function covers, but is not limited to: countermeasure information, identity information and supervision information. The model parameters of the existing model are then adjusted according to the obtained target loss function to obtain the encoder model. Through the above steps, a better correspondence between the generated cartoon face picture distribution and the input face image distribution can be obtained.
Optionally, constructing the target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image includes: acquiring a first similarity between the first face image and the first generated image and a second similarity between the second face image and the second generated image, and constructing a first loss function based on the first similarity and the second similarity; constructing a second loss function based on the second generated image and the third face image; scoring the first generated image and the second generated image respectively by using a pre-trained discriminator network to obtain a first score of the first generated image and a second score of the second generated image, and constructing a third loss function based on the first score and the second score; and processing the first loss function, the second loss function and the third loss function to obtain the target loss function.
Optionally, constructing the first loss function based on the first similarity and the second similarity includes: and obtaining an average value of the first similarity and the second similarity to obtain a first loss function.
Optionally, constructing the third loss function based on the first score and the second score comprises: and obtaining an average value of the first score and the second score to obtain a third loss function.
The similarity in the above steps may be the similarity between a generated cartoonized face image and the corresponding input face image, where the first similarity may be the similarity between the generated cartoonized first face image and the input first face image, and the second similarity may be the similarity between the generated cartoonized second face image and the input second face image. The first loss function may be obtained from the average value of the calculated first similarity and second similarity, but is not limited thereto.
In the above step, the second loss function may be obtained from the similarity between the generated cartoonized second face image and the cartoonized face image manually produced from the second face image, but is not limited thereto.
In the above step, the first score may be obtained by scoring the generated cartoonized first face image against the input first face image using an existing trained discriminator network, and the second score may be obtained by scoring the generated cartoonized second face image against the input second face image in the same manner. The third loss function may be obtained from the average value of the calculated first score and second score, but is not limited thereto.
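Collecting the three constructions above in one place (the symbols $s_1, s_2$ for the two similarities, $c_1, c_2$ for the two discriminator scores, $\hat{y}_2$ for the second generated image, $y_3$ for the third face image and $d$ for an otherwise unspecified difference measure are illustrative, not taken from the claims), the component losses are:

$$\mathcal{L}_1 = \tfrac{1}{2}(s_1 + s_2), \qquad \mathcal{L}_2 = d(\hat{y}_2, y_3), \qquad \mathcal{L}_3 = \tfrac{1}{2}(c_1 + c_2).$$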
Optionally, processing the first loss function, the second loss function and the third loss function to obtain the target loss function includes: obtaining a common logarithm of the first loss function to obtain a first parameter; obtaining a product of the second loss function and a first hyper-parameter to obtain a second parameter; obtaining a ratio of the third loss function to a preset value to obtain a third parameter; obtaining a sum of the second parameter and the third parameter to obtain a parameter sum; and obtaining a difference between the first parameter and the parameter sum to obtain the target loss function.
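In formula form (with $\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3$ the first, second and third loss functions, $\lambda$ the first hyper-parameter and $c$ the preset value; the notation is illustrative), the target loss function of this paragraph is

$$\mathcal{L} = \log_{10}\mathcal{L}_1 - \left(\lambda\,\mathcal{L}_2 + \frac{\mathcal{L}_3}{c}\right),$$

which has the same overall shape as the objective $L = \log(s_D) - s_{id}/2 - \lambda s_{pair}$ written out in the training procedure below.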
The first hyper-parameter in the above steps may be set according to the needs of the user.
Optionally, adjusting the model parameters of the initial model based on the target loss function to obtain the encoder model includes: obtaining a derivative of the target loss function; obtaining a product of the derivative and a second hyper-parameter to obtain a target step size; and adjusting the model parameters of the initial model according to the target step size.
The second hyper-parameter in the above step may be set according to the needs of the user.
Through the above steps, the target loss function can be obtained, and the model parameters of the existing model can be adjusted according to the target loss function to obtain the encoder model, so that a better correspondence between the generated cartoon face picture distribution and the input face image distribution can be achieved.
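Read as a gradient step (an interpretation: the description names only a derivative, a product with the second hyper-parameter and an adjustment, and the sign below follows the maximization of $L$ stated in the training procedure further on), the parameter update is

$$\theta \leftarrow \theta + \eta\,\frac{\partial \mathcal{L}}{\partial \theta},$$

where $\theta$ denotes the model parameters of the initial model and $\eta$ the second hyper-parameter.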
In an alternative embodiment, as shown in fig. 4, the face encoder model training method includes the steps of:
step S402, preparing a real face image dataset.
The real face image dataset in the above step S402 is shown in fig. 2.
Step S404, producing a small number of manually made real-face and cartoon-face sample pairs.
The face sample pair in step S404 is shown in fig. 3.
In step S406, an encoder model E is constructed, and a trained generator G and a discriminator D are introduced.
Step S408, constructing a loss function, and adjusting the encoder model E according to the loss function to obtain a trained face image encoder model E.
The loss function in step S408 includes countermeasure information, identity information and supervision information. Countermeasure information (discriminator evaluation): whether the generated cartoonized face image can pass the examination of the discriminator D. Identity similarity: in the unsupervised training part, the similarity between the generated cartoonized face image and the input face image. Supervision information: in the supervised training part, the consistency between the generated cartoonized face image and its label image.
The specific process of constructing the loss function and adjusting the encoder model E in step S408 is as follows:
Input: a minimum batch of real face images x' (fig. 2) and a minimum batch of supervision information pairs <x, y> (fig. 3), where x denotes a real face image and y denotes the manually produced cartoonized face image corresponding to x, together with the expected number of iteration steps (training times) S. Because the number of <x, y> pairs is much smaller than the number of images x', the <x, y> pairs are reused cyclically.
for n = 1 to S do (start loop, iterative training)
    z ← E(x), z' ← E(x') (encode the real face images with the encoder E)
    ŷ ← G(z), ŷ' ← G(z') (generate cartoonized face images based on the generator G)
    s_D ← (D(ŷ) + D(ŷ')) / 2 (the discriminator D scores the generated cartoonized face images)
    s_id ← (dist(x, ŷ) + dist(x', ŷ')) / 2 (distance between the generated cartoonized face images and the input face images)
    s_pair ← d(ŷ, y) (difference between the cartoon face generated from x and the manually produced cartoonized face image y)
    L ← log(s_D) − s_id/2 − λ·s_pair (construct the objective function, where λ is a hyper-parameter set manually)
    update the face image encoder network E with the maximization of L as the target
end for
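The iteration above can be sketched in PyTorch as follows. This is a sketch under stated assumptions rather than the patented implementation: the generator G and discriminator D are frozen pre-trained modules, the mean-squared error stands in for the identity distance s_id (in practice a face-recognition embedding distance is a common choice), the L1 distance stands in for the pairwise difference s_pair, and the optimizer choice is arbitrary.

```python
# One encoder-training iteration following the procedure above (sketch).
# E is the encoder being trained; G (generator) and D (discriminator) are
# pre-trained and frozen. MSE for s_id and L1 for s_pair are assumptions.
import torch
import torch.nn.functional as F

def train_step(E, G, D, optimizer, x, y, x_prime, lam=1.0):
    z, z_prime = E(x), E(x_prime)              # z <- E(x), z' <- E(x')
    y_hat, y_hat_prime = G(z), G(z_prime)      # generate cartoon face images
    # s_D: average discriminator score of the generated images
    # (assumed to lie in (0, 1) so that log(s_D) is defined).
    s_D = 0.5 * (D(y_hat).mean() + D(y_hat_prime).mean())
    # s_id: average distance between the generated images and their inputs.
    s_id = 0.5 * (F.mse_loss(y_hat, x) + F.mse_loss(y_hat_prime, x_prime))
    # s_pair: difference between the cartoon generated from x and its label y.
    s_pair = F.l1_loss(y_hat, y)
    L = torch.log(s_D) - s_id / 2 - lam * s_pair
    optimizer.zero_grad()
    (-L).backward()                            # maximize L by minimizing -L
    optimizer.step()                           # updates only E's parameters
    return L.item()

# optimizer = torch.optim.Adam(E.parameters(), lr=1e-4)  # only E is updated
```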
In an alternative embodiment, as shown in fig. 5, an obtained real face image of at least one male person in an exercise scene is input into the trained encoder model E; the encoder model E extracts the face feature vector of the real face image, and the face feature vector is used as the input of the trained cartoonized face image generation model G, so that a cartoonized face image of the male real face image can be obtained, which adds interest while protecting the identity privacy information of the face in the video.
In an alternative embodiment, as shown in fig. 5, an obtained real face image of at least one participant in a remote video conference scene is input into the trained encoder model E; the encoder model E extracts the face feature vector of the real face image, and the face feature vector is used as the input of the trained cartoonized face image generation model G, so that a cartoonized face image of the participant's real face image can be obtained. This protects privacy, adds interest to the communication, and preserves the feedback of visual interaction information such as eye contact and expression between the communicating participants.
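At inference time, in both scenes, the whole pipeline reduces to two forward passes; a minimal sketch (reusing the illustrative encoder above and assuming a pre-trained generation model with a matching latent interface):

```python
# Inference sketch: cartoonize a face image with trained E and G (illustrative).
import torch

@torch.no_grad()
def cartoonize(face: torch.Tensor, E: torch.nn.Module,
               G: torch.nn.Module) -> torch.Tensor:
    z = E(face)   # extract the target feature vector of the real face image
    return G(z)   # generate the cartoonized face image
```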
Example 2
According to an embodiment of the present invention, there is further provided an image processing apparatus, which may be configured to execute the image processing method in the foregoing embodiment; the specific implementation scheme and application scenarios are the same as those of the foregoing embodiment and are not described herein again.
FIG. 6 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes:
an acquisition module 62 is configured to acquire an original face image.
The extraction module 64 is configured to perform feature extraction on an original face image by using an encoder model to obtain a target feature vector of the original face image, where the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of the target style.
The generation module 66 is configured to perform image generation on the target feature vector by using the pre-trained generation model to generate a target face image of the target style.
Optionally, the extraction module includes: an extraction unit, configured to perform feature extraction on the first face image and the second face image by using an initial model to obtain a first feature vector of the first face image and a second feature vector of the second face image; a generation unit, configured to perform image generation on the first feature vector and the second feature vector by using the pre-trained generation model to generate a first generated image and a second generated image of the target style; a construction unit, configured to construct a target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image; and an adjustment unit, configured to adjust model parameters of the initial model based on the target loss function to obtain the encoder model.
Optionally, the construction unit includes: a first construction subunit, configured to acquire a first similarity between the first face image and the first generated image and a second similarity between the second face image and the second generated image, and construct a first loss function based on the first similarity and the second similarity; a second construction subunit, configured to construct a second loss function based on the second generated image and the third face image; a third construction subunit, configured to score the first generated image and the second generated image respectively by using a pre-trained discriminator network to obtain a first score of the first generated image and a second score of the second generated image, and construct a third loss function based on the first score and the second score; and a processing subunit, configured to process the first loss function, the second loss function and the third loss function to obtain the target loss function.
Optionally, the first construction subunit is further configured to obtain an average value of the first similarity and the second similarity, to obtain a first loss function.
Optionally, the third construction subunit is further configured to obtain an average value of the first score and the second score, to obtain a third loss function.
Optionally, the processing subunit is further configured to obtain a common logarithm of the first loss function to obtain a first parameter; obtain a product of the second loss function and a first hyper-parameter to obtain a second parameter; obtain a ratio of the third loss function to a preset value to obtain a third parameter; obtain a sum of the second parameter and the third parameter to obtain a parameter sum; and obtain a difference between the first parameter and the parameter sum to obtain the target loss function.
Optionally, the adjustment unit includes: a first acquisition subunit, configured to acquire a derivative of the target loss function; a second acquisition subunit, configured to obtain a product of the derivative and a second hyper-parameter to obtain a target step size; and an adjustment subunit, configured to adjust the model parameters of the initial model according to the target step size.
Example 3
The embodiment of the present application further provides a computer storage medium, where a plurality of instructions may be stored. The instructions are adapted to be loaded by a processor to execute the steps of the method of the embodiments shown in fig. 1 to 5; for the specific execution process, reference may be made to the specific description of the embodiments shown in fig. 1 to 5, which is not repeated herein.
The device on which the storage medium resides may be an electronic device.
Example 4
As shown in fig. 7, the electronic device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002.
Wherein the communication bus 1002 is used to enable connected communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 1001 may include one or more processing cores. The processor 1001 connects various parts within the overall electronic device 1000 using various interfaces and lines, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1005 and invoking data stored in the memory 1005. Alternatively, the processor 1001 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 1001 and may be implemented by a single chip.
The memory 1005 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets or instruction sets. The memory 1005 may include a stored-program area and a stored-data area, wherein the stored-program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function or an image playing function), instructions for implementing the above-described method embodiments, and the like; the stored-data area may store the data referred to in the above method embodiments, and the like. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 7, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an operating application of the electronic device.
In the electronic device 1000 shown in fig. 7, the user interface 1003 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke an operating application of the electronic device stored in the memory 1005, and specifically perform the following operations:
acquiring an original face image; performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; and performing image generation on the target feature vector by using a pre-trained generation model to generate a target face image of the target style.
In one embodiment, the operating system of the electronic device is an Android system, and the processor 1001 further performs the following steps:
performing feature extraction on the first face image and the second face image respectively by using an initial model to obtain a first feature vector of the first face image and a second feature vector of the second face image; performing image generation on the first feature vector and the second feature vector respectively by using a pre-trained generation model to generate a first generated image and a second generated image of the target style; constructing a target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image; and adjusting model parameters of the initial model based on the target loss function to obtain the encoder model.
In one embodiment, the processor 1001 further performs the steps of:
constructing the target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image includes: acquiring a first similarity between the first face image and the first generated image and a second similarity between the second face image and the second generated image, and constructing a first loss function based on the first similarity and the second similarity; constructing a second loss function based on the second generated image and the third face image; scoring the first generated image and the second generated image respectively by using a pre-trained discriminator network to obtain a first score of the first generated image and a second score of the second generated image, and constructing a third loss function based on the first score and the second score; and processing the first loss function, the second loss function and the third loss function to obtain the target loss function.
In one embodiment, the processor 1001 further performs the steps of:
constructing the first loss function based on the first similarity and the second similarity includes: obtaining an average value of the first similarity and the second similarity to obtain the first loss function.
In one embodiment, the processor 1001 further performs the steps of:
constructing the third loss function based on the first score and the second score includes: obtaining an average value of the first score and the second score to obtain the third loss function.
In one embodiment, the processor 1001 further performs the steps of:
processing the first loss function, the second loss function and the third loss function to obtain the target loss function includes: obtaining a common logarithm of the first loss function to obtain a first parameter; obtaining a product of the second loss function and a first hyper-parameter to obtain a second parameter; obtaining a ratio of the third loss function to a preset value to obtain a third parameter; obtaining a sum of the second parameter and the third parameter to obtain a parameter sum; and obtaining a difference between the first parameter and the parameter sum to obtain the target loss function.
In one embodiment, the processor 1001 further performs the steps of:
adjusting the model parameters of the initial model based on the target loss function to obtain the encoder model includes: obtaining a derivative of the target loss function; obtaining a product of the derivative and a second hyper-parameter to obtain a target step size; and adjusting the model parameters of the initial model according to the target step size.
In this embodiment of the present application, the electronic device may, by means of the processor, acquire an original face image and perform feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style; image generation is then performed on the target feature vector by using a pre-trained generation model to generate a target face image of the target style. By introducing a small number of manually produced real-face and cartoon-face sample pairs into the cartoonized-face generation model as supervision information, a better correspondence between the distributions of real faces and cartoon face pictures is obtained, thereby achieving the technical effect of generating cartoonized face images with higher similarity and solving the technical problem in the prior art that the similarity between the input face image and the output cartoon face image is not high in the process of cartoonizing face images.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. An image processing method, comprising:
acquiring an original face image;
performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style;
and performing image generation on the target feature vector by using a pre-trained generation model to generate a target face image of the target style.
2. The method according to claim 1, wherein the method further comprises:
respectively extracting features of the first face image and the second face image by using an initial model to obtain a first feature vector of the first face image and a second feature vector of the second face image;
respectively performing image generation on the first feature vector and the second feature vector by utilizing the pre-trained generation model to generate a first generated image and a second generated image of the target style;
constructing a target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image;
and adjusting model parameters of the initial model based on the target loss function to obtain the encoder model.
3. The method of claim 2, wherein constructing the target loss function of the initial model based on the first face image, the second face image, the third face image, the first generated image and the second generated image comprises:
acquiring a first similarity between the first face image and the first generated image and a second similarity between the second face image and the second generated image, and constructing a first loss function based on the first similarity and the second similarity;
constructing a second loss function based on the second generated image and the third face image;
respectively scoring the first generated image and the second generated image by utilizing a pre-trained discriminator network to obtain a first score of the first generated image and a second score of the second generated image, and constructing a third loss function based on the first score and the second score;
and processing the first loss function, the second loss function and the third loss function to obtain the target loss function.
4. The method of claim 3, wherein constructing a first loss function based on the first similarity and the second similarity comprises:
obtaining an average value of the first similarity and the second similarity to obtain the first loss function.
5. A method according to claim 3, wherein constructing a third loss function based on the first score and the second score comprises:
obtaining an average value of the first score and the second score to obtain the third loss function.
6. A method according to claim 3, wherein processing the first, second and third loss functions to obtain the target loss function comprises:
obtaining a common logarithm of the first loss function to obtain a first parameter;
obtaining a product of the second loss function and a first hyper-parameter to obtain a second parameter;
obtaining the ratio of the third loss function to a preset value to obtain a third parameter;
obtaining the sum of the second parameter and the third parameter to obtain a parameter sum;
and obtaining a difference between the first parameter and the parameter sum to obtain the target loss function.
7. The method of claim 2, wherein adjusting model parameters of the initial model based on the target loss function to obtain the encoder model comprises:
obtaining a derivative of the target loss function;
obtaining a product of the derivative and a second hyper-parameter to obtain a target step size;
and adjusting the model parameters of the initial model according to the target step size.
8. An image processing apparatus, comprising:
the acquisition module is used for acquiring an original face image;
the extraction module is used for performing feature extraction on the original face image by using an encoder model to obtain a target feature vector of the original face image, wherein the encoder model is obtained by training on a first training sample and a second training sample pair, the first training sample comprising a first face image, and the second training sample pair comprising a second face image and a third face image of a target style;
and the generation module is used for performing image generation on the target feature vector by utilizing a pre-trained generation model to generate a target face image of the target style.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 7.
10. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 7.
CN202111398327.5A 2021-11-23 2021-11-23 Image processing method, device, computer storage medium and electronic equipment Pending CN116168127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111398327.5A CN116168127A (en) 2021-11-23 2021-11-23 Image processing method, device, computer storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111398327.5A CN116168127A (en) 2021-11-23 2021-11-23 Image processing method, device, computer storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116168127A true CN116168127A (en) 2023-05-26

Family

ID=86420496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111398327.5A Pending CN116168127A (en) 2021-11-23 2021-11-23 Image processing method, device, computer storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116168127A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402916A (en) * 2023-06-08 2023-07-07 北京瑞莱智慧科技有限公司 Face image restoration method and device, computer equipment and storage medium
CN116402916B (en) * 2023-06-08 2023-09-05 北京瑞莱智慧科技有限公司 Face image restoration method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination