CN108564127B - Image conversion method, image conversion device, computer equipment and storage medium


Info

Publication number
CN108564127B (application CN201810354082.8A)
Authority
CN
China
Prior art keywords: image, style, face, model, current
Prior art date
Legal status: Active
Application number
CN201810354082.8A
Other languages
Chinese (zh)
Other versions
CN108564127A (en)
Inventor
李旻骏
黄浩智
马林
刘威
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810354082.8A
Publication of CN108564127A
Application granted
Publication of CN108564127B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions

Abstract

The application relates to an image conversion method comprising: acquiring a current image to be processed, the current image to be processed containing a face; inputting the current image to be processed into a trained target face image conversion model, where the trained target face image conversion model converts the face in the input image from a first style to a second style and is obtained by jointly training an original face image conversion model and a discrimination network model; and acquiring the target-style face image output by the target face image conversion model. The method requires no labeling of training samples, so the training cost is low and the accuracy is high. An image conversion apparatus, a computer device and a storage medium are also provided.

Description

Image conversion method, image conversion device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer processing technologies, and in particular, to an image conversion method and apparatus, a computer device, and a storage medium.
Background
Image conversion is the conversion of an image from one style to another. Traditional models for converting a face image from one style to another must be trained with a supervised method, that is, corresponding labels must be set for the training samples. Such labels are often difficult to obtain, and the training result depends on them: if the labels are not accurate enough, or there are not enough labeled training samples, the final training result suffers. In other words, the training cost is high and the training result is often not accurate enough.
Disclosure of Invention
In view of the above, it is necessary to provide an image conversion method, an image conversion apparatus, a computer device, and a storage medium with low cost and high accuracy.
A method of image conversion, the method comprising:
acquiring a current image to be processed, wherein the current image to be processed comprises a face;
inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
and acquiring a target style face image output by the target face image conversion model.
An image conversion apparatus, the apparatus comprising:
an acquisition module, configured to acquire a current image to be processed, wherein the current image to be processed comprises a face;
the input module is used for inputting the current image to be processed into a trained target face image conversion model, the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
and the output module is used for acquiring the target style face image output by the target face image conversion model.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a current image to be processed, wherein the current image to be processed comprises a face;
inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
and acquiring a target style face image output by the target face image conversion model.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a current image to be processed, wherein the current image to be processed comprises a face;
inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
and acquiring a target style face image output by the target face image conversion model.
According to the image conversion method, the image conversion apparatus, the computer device and the storage medium, a current image to be processed containing a face is input into a trained target face image conversion model, the trained target face image conversion model converts the face in the input image from a first style to a second style and is obtained by training an original face image conversion model together with a discrimination network model, and the target-style face image output by the target face image conversion model is acquired. Because the discrimination network model serves as an auxiliary model while the original face image conversion model is trained, training can be completed without labels, that is, without manual labeling, which greatly reduces the training cost and improves the training accuracy.
Drawings
FIG. 1 is a diagram of an exemplary environment in which an image transformation method may be implemented;
FIG. 2 is a flow diagram of a method for image conversion in one embodiment;
FIG. 3 is a diagram illustrating face image conversion in one embodiment;
FIG. 4 is a schematic diagram of an image transformation model training process in one embodiment;
FIG. 5 is a flow diagram of a training process for a target facial image transformation model in one embodiment;
FIG. 6 is a schematic diagram of an image transformation model training process in another embodiment;
FIG. 7 is a flow diagram of acquiring an image in one embodiment;
FIG. 8 is a flow diagram of inputting a to-be-processed image into a target facial image transformation model in one embodiment;
FIG. 9A is a block diagram of image conversion in one embodiment;
FIG. 9B is a diagram illustrating image conversion in one embodiment;
FIG. 10 is a flowchart of an image conversion method in another embodiment;
FIG. 11 is a block diagram showing the structure of an image conversion apparatus according to an embodiment;
FIG. 12 is a block diagram of the structure of a training module in one embodiment;
FIG. 13 is a block diagram of the structure of a training module in another embodiment;
FIG. 14 is a block diagram showing the construction of an image conversion apparatus according to another embodiment;
FIG. 15 is a block diagram showing the construction of an image conversion apparatus according to still another embodiment;
FIG. 16 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a diagram of an application environment of an image conversion method in one embodiment. Referring to fig. 1, the image conversion method is applied to an image conversion system. The image conversion system comprises a terminal 110 and a server 120 connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. The terminal 110 sends a current image to be processed, which contains a face, to the server 120. The server 120 acquires the current image to be processed and inputs it into a trained target face image conversion model, where the trained target face image conversion model converts the face in the input image from a first style to a second style and is obtained by training an original face image conversion model and a discrimination network model. The server 120 then acquires the target-style face image output by the target face image conversion model and returns it to the corresponding terminal 110.
In another embodiment, the above-described image conversion method may be directly applied to the terminal 110. The terminal 110 obtains a current image to be processed, the current image to be processed comprises a face, then the terminal 110 inputs the current image to be processed into a trained target face image conversion model, the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model, and finally, the terminal 110 obtains a target style face image output by the target face image conversion model.
In one embodiment, as shown in FIG. 2, an image conversion method is provided. This embodiment is illustrated by applying the method to a terminal; the method may equally be applied to a server. The image conversion method specifically comprises the following steps:
step S202, acquiring a current image to be processed, wherein the current image to be processed comprises a face.
The current image to be processed is the image currently awaiting conversion, and it includes a face, which may be a human face or an animal face. The current image to be processed may be an image containing a face captured directly by the terminal through its camera, or a stored image containing a face selected from an album.
Step S204, inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model.
The target face image conversion model converts the face in the input image from a first style to a second style. Images of different styles have different features, and the first style and the second style are different styles. The styles of an image include: cartoon style, realistic style, sketch style, anime style, two-dimensional style, and so on. That is, a target face image conversion model converts an image from one style to another. The target face image conversion model is obtained by training an original face image conversion model and a discrimination network model. In one embodiment, the original face image conversion model and the discrimination network model are trained through adversarial learning using an unsupervised training algorithm. An unsupervised training algorithm is a training method that sets no labels for the training samples. Adversarial learning means that two models learn against each other through an adversarial loss value: one model is trained to minimize the adversarial loss value, while the other is trained to maximize it. Through such adversarial learning, both models continuously adjust the parameters in their respective models until a convergence condition is reached. Because the original face image conversion model and the discrimination network model learn adversarially, the target face image conversion model can be trained without labels.
In one embodiment, the models being trained comprise an image conversion model and a discrimination network model. The training set comprises two image sets, a first-style image set and a second-style image set, and the images in both sets contain faces. Training may proceed as follows. First, a first-style image is acquired from the first-style image set as the current first-style image, and a second-style image is acquired from the second-style image set as the current second-style image. Then, the current first-style image is input into the image conversion model to obtain a first output image; the first output image and the current second-style image are used as inputs of the discrimination network model to obtain an output first probability corresponding to the first output image and an output second probability corresponding to the second-style image. Finally, an adversarial loss value is calculated from the first probability and the second probability, and the weight parameters in the image conversion model and the discrimination network model are adjusted according to it: the parameters of the image conversion model are adjusted to minimize the adversarial loss value, and the parameters of the discrimination network model are adjusted to maximize it. The current first-style image and the current second-style image are then updated, and the above process is repeated to keep adjusting the weight parameters in the image conversion model and the discrimination network model until the corresponding convergence conditions are reached; the trained image conversion model is taken as the target image conversion model.
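To make the adversarial objective concrete, the following is a minimal sketch of one training round in PyTorch. It is an illustration, not the patent's implementation: the model objects G and D, the optimizers, and the use of a sigmoid-ended discriminator whose output reads as a probability are all assumptions.

```python
import torch

def adversarial_training_step(G, D, opt_G, opt_D, x, y, eps=1e-8):
    """One round of the adversarial learning described above.

    G converts a first-style image x toward the second style; D estimates the
    probability that an image is a genuine second-style image like y.
    """
    # Discrimination network: adjust parameters to MAXIMIZE the adversarial
    # loss, i.e. judge real second-style images as true and G's output as false.
    fake = G(x).detach()  # detach so this step does not update G
    d_objective = torch.log(D(y) + eps).mean() + torch.log(1 - D(fake) + eps).mean()
    opt_D.zero_grad()
    (-d_objective).backward()  # gradient ascent via the negated objective
    opt_D.step()

    # Conversion model: adjust parameters to MINIMIZE the adversarial loss,
    # i.e. make the converted image indistinguishable from the second style.
    g_objective = torch.log(1 - D(G(x)) + eps).mean()
    opt_G.zero_grad()
    g_objective.backward()
    opt_G.step()
```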
In step S206, the target style face image output by the target face image conversion model is acquired.
The target style face image is an image with a target style face obtained by converting through a target face image conversion model. Assuming that the target style is a two-dimensional animation style, the obtained face image has a two-dimensional style. Fig. 3 is a schematic diagram illustrating an embodiment of converting an image containing a human face into a face image having a two-dimensional style.
According to this image conversion method, a current image to be processed containing a face is input into a trained target face image conversion model, where the trained target face image conversion model converts the face in the input image from a first style to a second style and is obtained by training an original face image conversion model and a discrimination network model; the target-style face image output by the target face image conversion model is then acquired. Because the discrimination network model serves as an auxiliary model while the original face image conversion model is trained, training can be completed without labels, that is, without manual labeling, which greatly reduces the training cost and improves the training accuracy.
As shown in fig. 4, in one embodiment, the original face image conversion model includes a forward face image conversion model and a reverse face image conversion model, and the discrimination network model includes a first discrimination network model connected to an output of the forward face image conversion model and a second discrimination network model connected to an output of the reverse face image conversion model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style; the first judgment network model is used for calculating the state information of the input image belonging to the second style, and the second judgment network model is used for calculating the state information of the input image belonging to the first style.
The forward face image conversion model converts the face of the input image from the first style to the second style, and the reverse face image conversion model converts it from the second style back to the first style. Training the forward face image conversion model into the target face image conversion model requires three auxiliary models: the reverse face image conversion model and two discrimination network models. The first discrimination network model is connected to the output of the forward face image conversion model, and the second discrimination network model is connected to the output of the reverse face image conversion model; that is, the output of the forward face image conversion model serves as input to the first discrimination network model, and the output of the reverse face image conversion model serves as input to the second discrimination network model. In addition, the second-style image must be input into the first discrimination network model, and the first-style image into the second discrimination network model. In one embodiment, as shown in fig. 4, a schematic diagram of the forward face image conversion model, the reverse face image conversion model, the first discrimination network model and the second discrimination network model in the training process is given, in which x denotes a first-style image, y denotes a second-style image, G(x) denotes the output image obtained through the forward face image conversion model, and F(y) denotes the output image obtained through the reverse face image conversion model.
The first discrimination network model calculates the status information that the face of the input image belongs to the second style, and the second discrimination network model calculates the status information that the face of the input image belongs to the first style; the status information may be probability information or weight score information. The first discrimination network model is used to identify which of its input images are genuine second-style images and which are images output by the forward face image conversion model. It is trained to judge genuine second-style images as true and images output by the forward face image conversion model as false. If the image output by the forward face image conversion model is able to fool the first discrimination network model, that image has the characteristics of second-style images, such that the first discrimination network model cannot tell genuine from converted. Similarly, the second discrimination network model is used to identify which of its input images are genuine first-style images and which are images output by the reverse face image conversion model, and it is trained to judge genuine first-style images as true and images output by the reverse face image conversion model as false. If the image output by the reverse face image conversion model is able to fool the second discrimination network model, that image has the characteristics of first-style images, such that the second discrimination network model cannot tell genuine from converted. In one embodiment, the forward face image conversion model, the reverse face image conversion model, the first discrimination network model and the second discrimination network model are all trained using convolutional neural network models.
As shown in FIG. 5, in one embodiment, the training step of the target face image transformation model includes:
step S502, a training data set is obtained, the training data set comprises a first style image set and a second style image set, and each image in the first style image set and each image in the second style image set comprises a face.
The training data set is the set of data required for training the model. The purpose of training the target face image conversion model is to convert faces from the first style to the second style. The training data set therefore includes two sets, a set of first-style images and a set of second-style images, and every image in both sets includes a face.
In one embodiment, after the training data set is obtained, the images in it are preprocessed. Preprocessing includes data cleaning and data enhancement of the first-style and second-style images. Data cleaning means deleting samples in the training set that may interfere with training, such as images with incomplete faces (e.g., only a side face) or insufficient light. Data enhancement means highlighting the face in a training image, including rendering the background of the image in a solid color and positioning the face at the center of the picture.
Step S504, obtaining a current first style image according to the first style image set, and obtaining a current second style image according to the second style image set.
Before the model is trained, firstly, a first-style image is obtained according to a first-style image set to serve as a current first-style image, and a second-style image is obtained according to a second-style image set to serve as a current second-style image. In one embodiment, a first-style image from the first-style image set may be directly taken as the current first-style image, and a second-style image from the second-style image set may be taken as the current second-style image. In another embodiment, the first-style images directly obtained from the first-style image set are further processed, including identifying the position of the face in the first-style images, extracting the first-style face images from the first-style images according to the identified face position, and then using the first-style face images as the current first-style images. Similarly, the second-style images directly acquired from the second-style image set are further processed, and the extracted second-style face image is taken as the current second-style image.
Step S506, the current first style image is sequentially processed by the forward face image conversion model and the first discrimination model to obtain a first probability of corresponding output.
The current first-style image is used as the input of the forward face image conversion model, and the image output by that model, namely the first output image, is acquired; the first output image is then used as the input of the first discrimination model, and the output first probability is acquired. The first probability is the probability, as judged by the first discrimination model, that the first output image belongs to the second style.
Step S508, the current second-style image is passed through the reverse face image conversion model and the second discrimination model in sequence to obtain a corresponding output second probability.
The current second-style image is used as the input of the reverse face image conversion model, and the image output by that model, called the second output image for distinction, is acquired; the second output image is then used as the input of the second discrimination model, and the output second probability is acquired. The second probability is the probability, as judged by the second discrimination model, that the second output image belongs to the first style.
Step S510, the current first-style image is input into the second discrimination model to obtain a corresponding output third probability, and the current second-style image is input into the first discrimination model to obtain a corresponding output fourth probability.
The first discrimination model judges the probability that an input image belongs to the second style, and the second discrimination model judges the probability that an input image belongs to the first style. The current first-style image is input into the second discrimination model to obtain the corresponding output third probability of belonging to the first style, and the current second-style image is input into the first discrimination model to obtain the output fourth probability of belonging to the second style.
Step S512, an adversarial loss value is calculated according to the first probability, the second probability, the third probability and the fourth probability.
The first probability and the fourth probability are calculated probabilities of belonging to the second style, and the second probability and the third probability are calculated probabilities of belonging to the first style. A first adversarial loss value is calculated from the first probability and the fourth probability, and a second adversarial loss value is calculated from the second probability and the third probability. The total adversarial loss value is calculated from the first adversarial loss value and the second adversarial loss value; in one embodiment, the sum of the two is directly used as the total adversarial loss value.
Step S514, parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model are adjusted according to the adversarial loss value.
The first discrimination model is trained to identify the first output image produced by the forward face image conversion model as false and the current second-style image as true; that is, for the first discrimination model, the smaller the first probability output for the first output image the better, and the larger the fourth probability output for the current second-style image the better. The forward face image conversion model is trained to convert first-style images into second-style images; that is, for the forward face image conversion model, the larger the first probability the better, since a large first probability indicates that the converted image conforms to the characteristics of second-style images. Similarly, for the second discrimination model, the smaller the second probability and the larger the third probability the better, while for the reverse face image conversion model, the larger the second probability the better.
The adversarial loss value is positively correlated with the third probability and the fourth probability, and negatively correlated with the first probability and the second probability. Therefore, the forward face image conversion model and the reverse face image conversion model adjust their weight parameters so as to decrease the adversarial loss value, while the first discrimination model and the second discrimination model adjust their weight parameters so as to increase it.
In one embodiment, the adversarial loss value may be calculated as:

$$L_{adv} = \log D_y(y) + \log\bigl(1 - D_y(y')\bigr) + \log D_x(x) + \log\bigl(1 - D_x(x')\bigr)$$

where $x$ denotes the current first-style image and $y$ the current second-style image; $x'$ denotes the image output when $y$ passes through the reverse face image conversion model (the second output image), and $y'$ the image output when $x$ passes through the forward face image conversion model (the first output image). $D_x(x)$ is the probability output by the second discrimination model for the first-style image, and $D_x(x')$ the probability it outputs for the second output image; $D_y(y)$ is the probability output by the first discrimination model for the current second-style image, and $D_y(y')$ the probability it outputs for the first output image. $L_{adv}$ is the calculated adversarial loss value. Parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model are adjusted according to $L_{adv}$: the forward and reverse face image conversion models adjust their parameters with the objective of minimizing $L_{adv}$, while the first and second discrimination models adjust theirs with the objective of maximizing it. Because the parameters are adjusted through this adversarial learning between the forward and reverse face image conversion models on one side and the first and second discrimination models on the other, the forward face image conversion model can be trained without labels. In one embodiment, after the adversarial loss value is obtained, the parameters in each model are adjusted by gradient descent; to improve the stability of training, a gradient penalty term may be added when adjusting the parameters.
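The patent mentions adding a gradient penalty term for training stability without fixing its form. The sketch below shows one common choice, a WGAN-GP-style penalty on interpolated inputs, purely as an assumed illustration.

```python
import torch

def gradient_penalty(D, real, fake, weight=10.0):
    """Penalize deviations of D's input-gradient norm from 1 on samples
    interpolated between real and converted images (one common formulation)."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = D(mixed)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=mixed,
                                create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return weight * ((grad_norm - 1.0) ** 2).mean()
```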
Step S516, the current first-style image and the current second-style image are updated, the process returns to the step of passing the current first-style image through the forward face image conversion model and the first discrimination model in sequence to obtain the corresponding first probability, and this loop repeats until a convergence condition is met; the forward face image conversion model obtained after training is taken as the target face image conversion model.
After the parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model have been adjusted according to the adversarial loss value, the current first-style image and the current second-style image are updated, that is, the current training sample data is updated. The new training sample data is then used to continue training the four models in the same way as above: the process returns to the step of passing the current first-style image through the forward face image conversion model and the first discrimination model in sequence to obtain the corresponding first probability, and passing the current second-style image through the reverse face image conversion model and the second discrimination model in sequence to obtain the corresponding second probability. The loop continues in this way until the convergence condition is met and training of each model is completed. The convergence condition can be set as desired, for example by checking whether the output image has essentially stopped changing (in which case convergence is reached), or by checking whether the first probability output by the first discrimination model is in a random state, that is, whether it can no longer tell whether the first output image is true or false because the first output image fully possesses the features of the second style. The forward face image conversion model obtained by training is taken as the target face image conversion model for converting first-style face images into second-style face images.
In one embodiment, after adjusting the parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the adversarial loss value, the method further includes: taking the first output image, produced from the current first-style image by the forward face image conversion model, as the input of the adjusted reverse face image conversion model and acquiring an output third output image; taking the second output image, produced from the current second-style image by the reverse face image conversion model, as the input of the adjusted forward face image conversion model and acquiring an output fourth output image; calculating a first difference value between the current first-style image and the third output image and a second difference value between the current second-style image and the fourth output image; calculating a cycle loss value according to the first difference value and the second difference value; and adjusting the parameters in the forward face image conversion model and the reverse face image conversion model again according to the cycle loss value.
The first output image is used as the input of the adjusted reverse face image conversion model, and the output third output image is acquired. The second output image is used as the input of the adjusted forward face image conversion model, and the output fourth output image is acquired. The first difference value is the loss between the current first-style image and the third output image obtained by passing it through the forward and then the reverse face image conversion model. The second difference value is the loss between the current second-style image and the fourth output image obtained by passing it through the reverse and then the forward face image conversion model. The cycle loss value is calculated from the first difference value and the second difference value, with both of which it is positively correlated. In one embodiment, the sum of the first difference value and the second difference value is directly taken as the cycle loss value. In one embodiment, the cycle loss value is calculated as follows:
$$L_{cyc} = \lVert F(G(x)) - x\rVert_1 + \lVert G(F(y)) - y\rVert_1$$

where $x$ denotes the current first-style image, $F(G(x))$ the result of passing it through the forward and then the reverse face conversion model, and $G(F(y))$ the result of passing the current second-style image through the reverse and then the forward face conversion model; $\lVert x\rVert_1$ denotes the 1-norm of $x$. Fig. 6 shows a schematic diagram of the forward face image conversion model, the reverse face image conversion model, the first discrimination network model and the second discrimination network model in the training process in one embodiment, in which $G(x)$ is taken as the input of the reverse face conversion model to obtain $F(G(x))$, and $F(y)$ is taken as the input of the forward face conversion model to obtain $G(F(y))$.
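A minimal sketch of this cycle loss in PyTorch; g_forward and f_reverse stand for the forward and reverse face image conversion models and are illustrative names.

```python
import torch

def cycle_loss(g_forward, f_reverse, x, y):
    # ||F(G(x)) - x||_1 : x converted to the second style and back again
    # (the 1-norm is averaged over pixels here, a common normalization)
    loss_x = torch.abs(f_reverse(g_forward(x)) - x).mean()
    # ||G(F(y)) - y||_1 : y converted to the first style and back again
    loss_y = torch.abs(g_forward(f_reverse(y)) - y).mean()
    return loss_x + loss_y
```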
The purpose of training the forward face image conversion model is to convert the face in the input image from the first style to the second style, and the purpose of training the reverse face image conversion model is to convert the face in the input image from the second style to the first style. To ensure that the content of the original image is preserved during conversion, it should be possible to convert the face in an image from the first style to the second style and then back to the first style and recover the original image; the smaller the difference between the original image and the image after this cycle, the better. The goal of adjusting the parameters in the forward and reverse face image conversion models according to the cycle loss value is therefore to minimize the cycle loss value. Because the parameters in the forward and reverse face image conversion models are adjusted using both the adversarial loss value and the cycle loss value, the original content of the image is preserved during conversion and only its style is converted, so the target-style face image produced by the trained target image conversion model retains the facial characteristics of the original image.
As shown in FIG. 7, in one embodiment, obtaining a current first-style image from a first-style image set and a current second-style image from a second-style image set comprises:
step S504A, an original first-style image is obtained from the first-style image set, and an original second-style image is obtained from the second-style image set.
The image directly obtained from the first-style image set is referred to as an "original first-style image", and the image directly obtained from the second-style image set is referred to as an "original second-style image".
Step S504B, respectively performing random disturbance processing on the original first-style image and the original second-style image to obtain an adjusted first-style image and an adjusted second-style image, where the random disturbance processing includes: at least one of translation, scaling, and brightness adjustment.
The random disturbance processing comprises at least one of translating, scaling and adjusting the brightness of the image. To give the samples in the training data set diversity, after the original first-style image and the original second-style image are obtained, random disturbance processing is applied to each of them. The adjusted first-style image is the image obtained by applying random disturbance processing to the original first-style image, and the adjusted second-style image is the image obtained by applying random disturbance processing to the original second-style image.
Step S504C, the adjusted first-style image is taken as the current first-style image, and the adjusted second-style image as the current second-style image.
The adjusted first-style image obtained through random disturbance processing is taken as the current first-style image, and the adjusted second-style image as the current second-style image. That is, the adjusted first-style image is input into the forward face image conversion model, and the adjusted second-style image into the reverse face image conversion model.
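The random disturbance processing might be sketched with torchvision transforms as below; the patent names only the operation types, so the specific ranges here are assumptions.

```python
import torchvision.transforms as T

# Each training image is randomly translated, scaled and brightness-adjusted
# before being used as the current first-style / second-style image.
random_disturbance = T.Compose([
    T.RandomAffine(degrees=0,                # no rotation, only...
                   translate=(0.05, 0.05),   # ...random translation and
                   scale=(0.9, 1.1)),        # ...random scaling
    T.ColorJitter(brightness=0.2),           # random brightness adjustment
])

# current_first_style = random_disturbance(original_first_style)
```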
In one embodiment, after the current image to be processed, which includes a face, is acquired, the method further includes: recognizing the face in the current image to be processed to obtain face feature points; and extracting a face image from the current image to be processed according to the face feature points, the extracted face image being taken as the current image to be processed.
The face can be recognized using a face recognition model, which can be obtained by training a convolutional neural network model. The face feature points include points representing the outline of the face and its features (eyes, eyebrows, nose, mouth and ears). A face image can therefore be extracted from the current image to be processed according to the face feature points, and the extracted face image is taken as the current image to be processed, that is, as the input of the target face image conversion model.
In one embodiment, extracting a face image from a current image to be processed according to the face feature points comprises: determining the positions of two eyes according to the eye feature points in the face feature points; when the positions of the two eyes are not on the same horizontal line, rotating the current image to be processed to enable the positions of the two eyes to be on the same horizontal line to obtain an intermediate image; determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image; a face image is extracted from the intermediate image according to the cropping frame.
The positions of the eyes are calculated from the coordinates of the eye feature points among the face feature points, and it is judged whether the two eyes lie on the same horizontal line. If they do not, the face is tilted, and the current image to be processed can be rotated so that the two eyes lie on the same horizontal line; the rotated image is called the intermediate image. A clipping frame corresponding to the face image is then determined according to the face-outline feature points among the face feature points in the intermediate image, and the face image is extracted from the intermediate image according to the clipping frame.
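A sketch of this rotate-and-crop procedure with OpenCV, assuming the eye coordinates and the clipping frame have already been derived from the face feature points (landmark detection itself is not shown).

```python
import cv2
import numpy as np

def align_and_crop(image, left_eye, right_eye, crop_box):
    """Rotate so both eyes lie on one horizontal line, then crop the face."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # tilt of the eye line
    eyes_center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    rotation = cv2.getRotationMatrix2D(eyes_center, angle, 1.0)
    h, w = image.shape[:2]
    intermediate = cv2.warpAffine(image, rotation, (w, h))  # intermediate image
    x0, y0, x1, y1 = crop_box                               # the clipping frame
    return intermediate[y0:y1, x0:x1]
```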
In one embodiment, the first style refers to a real style of an image directly captured by the camera; the second style refers to a virtual animation style.
The first-style image may be a real-style image directly captured by the camera. For example, an image pickup device (e.g., a camera) is used to capture a face, and the obtained face image is the first style image. The second-style image is a virtual cartoon-style image, such as a two-dimensional character-style image.
As shown in fig. 8, in one embodiment, the target face conversion model includes a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer;
inputting an image to be processed into a trained target face image conversion model, wherein the method comprises the following steps:
step S204A is to take the image to be processed as an input of a downsampling convolutional layer, where the downsampling convolutional layer is used to perform downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix.
The downsampling convolutional layer performs downsampling convolution calculation on the face image in the image to be processed. Its effect is to convert the image from a high-resolution image to a low-resolution one; that is, the first face feature matrix obtained by the downsampling convolution calculation is a low-resolution representation, which helps reduce the amount of computation in the subsequent convolutions.
Step S204B is to use the first face feature matrix as an input to a residual convolution layer, which is used to perform a residual convolution operation on the first face feature matrix to convert the first face feature matrix into a second face feature matrix with a target style feature.
The obtained first face feature matrix is used as the input of the residual convolutional layer, which performs a residual convolution operation on it and converts it into a second face feature matrix with target-style features. The main function of the residual convolutional layer is to convert the features of the face in the input image to be processed from the first style into features with the second style. The residual convolutional layer can be implemented with several convolutional layers; for example, a neural network layer composed of two convolutional layers can serve as the residual convolutional layer, with part of the input added directly to the output, which ensures that the input data of the earlier network layer acts directly on the later network layer and reduces the deviation of the corresponding output from the original input.
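For illustration, such a residual convolutional layer might be sketched in PyTorch as below; the channel count follows Table 1 further down, and the Dropout layer anticipates the optional variant described after that table.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions; the block input is added directly to the output,
    so information from earlier layers acts directly on later layers."""
    def __init__(self, channels=256, dropout=0.5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),  # optional: randomly drops neurons (see below)
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection reduces output/input deviation
```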
Step S204C, the second face feature matrix is used as an input of the upsampling convolutional layer, and the upsampling convolutional layer is used for performing upsampling convolution calculation on the second face feature matrix to obtain the target style face image.
And the upsampling convolutional layer is used for performing upsampling convolution calculation on the obtained second face characteristic matrix to obtain a target style face image. The role of the upsampled convolutional layer is to convert the resulting image from low resolution to high resolution. Since the downsampling convolution operation is performed on the image, the resolution of the image is reduced, and in order to obtain a relatively clear output image, the image needs to be upsampled from a low resolution to obtain a high-resolution image.
In one embodiment, the target face image conversion model includes two downsampling (down) convolutional layers, six residual (Residual) convolutional layers, and two upsampling (upsample) convolutional layers. To optimize the conversion quality, a Dropout layer may be added to the residual convolutional layers; the Dropout layer randomly removes some neurons, for example 50% of them with probability 0.5, which helps prevent overfitting. In one embodiment, the network structure of the target face image conversion model is shown in Table 1:
TABLE 1
| Layer type | Kernel size | Padding | Stride | Number of kernels | Activation |
|---|---|---|---|---|---|
| Convolutional layer | 7×7 | 3 | 1 | 128 | ReLU |
| Downsampling convolutional layer | 3×3 | 1 | 2 | 256 | ReLU |
| Downsampling convolutional layer | 3×3 | 1 | 2 | 256 | ReLU |
| Residual convolutional layer (×6) | 3×3 | 1 | 1 | 256 | ReLU |
| Upsampling convolutional layer | 3×3 | 1 | 1 | 512 | ReLU |
| Upsampling convolutional layer | 3×3 | 1 | 1 | 256 | ReLU |
| Upsampling convolutional layer | 7×7 | 3 | 1 | 3 | tanh |
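Read as a PyTorch network, Table 1 might be assembled as sketched below, reusing the ResidualBlock sketched earlier. The 3-channel input and the nearest-neighbor upsampling before each stride-1 upsampling convolution are assumptions; the patent leaves the upsampling mechanism open, and a sub-pixel alternative is shown later.

```python
import torch.nn as nn

def build_generator():
    layers = [
        nn.Conv2d(3, 128, kernel_size=7, stride=1, padding=3), nn.ReLU(True),
        # two downsampling convolutional layers (stride 2 halves the resolution)
        nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
        nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
    ]
    layers += [ResidualBlock(256) for _ in range(6)]  # six residual layers
    layers += [
        # two upsampling convolutional layers; an explicit 2x upsample restores
        # the resolution lost above, the convolutions themselves keep stride 1
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(True),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(True),
        # final 7x7 layer maps back to a 3-channel image, tanh per Table 1
        nn.Conv2d(256, 3, kernel_size=7, stride=1, padding=3), nn.Tanh(),
    ]
    return nn.Sequential(*layers)
```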
In one embodiment, the discriminant network model is also trained using a convolutional neural network model, and the corresponding network structure is shown in table 2 and includes 6 convolutional layers, where the last convolutional layer does not include an activation function:
TABLE 2
| Layer type | Kernel size | Padding | Stride | Number of kernels | Activation |
|---|---|---|---|---|---|
| Convolutional layer | 3×3 | 1 | 2 | 64 | Leaky ReLU |
| Convolutional layer | 3×3 | 1 | 2 | 128 | Leaky ReLU |
| Convolutional layer | 3×3 | 1 | 2 | 256 | Leaky ReLU |
| Convolutional layer | 3×3 | 1 | 2 | 512 | Leaky ReLU |
| Convolutional layer | 3×3 | 1 | 1 | 512 | Leaky ReLU |
| Convolutional layer | 3×3 | 1 | 1 | 1 | (none) |
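Table 2 read the same way might look like the sketch below; the 0.2 slope of the Leaky ReLU and the 3-channel input are assumptions, and a sigmoid can be applied to the final map wherever a probability is needed.

```python
import torch.nn as nn

def build_discriminator():
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(512, 1, 3, stride=1, padding=1),  # last layer: no activation
    )
```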
In one embodiment, the upsampled convolutional layer is a sub-pixel convolutional layer; the sub-pixel convolution layer is used to convert a low-resolution image into a high-resolution image.
To obtain a high-resolution image, the upsampling convolutional layer is implemented as a sub-pixel (subpixel) convolutional layer. A sub-pixel convolutional layer can effectively convert a low-resolution image into a relatively sharp high-resolution image.
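A sub-pixel upsampling layer can be sketched with PyTorch's PixelShuffle: a convolution first produces r² times the target channel count, and the shuffle rearranges those channels into an image r times larger in each dimension. The channel arguments are illustrative.

```python
import torch.nn as nn

def subpixel_upsample(in_channels, out_channels, r=2):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels * r * r, kernel_size=3, padding=1),
        nn.PixelShuffle(r),  # (B, C*r*r, H, W) -> (B, C, r*H, r*W)
    )
```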
In one embodiment, the image conversion method further comprises: and taking the target style face image as the input of a trained image super-resolution processing model, and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
In order to obtain a clearer target style face image, the super-resolution processing is carried out on the target style face image to obtain a high-definition target style face image. Specifically, the target style face image is used as an input of a trained image super-resolution processing model, an output processed high-resolution target style face image is obtained, and the image super-resolution model can be obtained by adopting a convolutional neural network algorithm for training.
As shown in fig. 9A, one embodiment includes three models: a face recognition model, a target face image conversion model, and a super-resolution processing model. The face recognition model recognizes the face feature points in the input image to be processed, and a face image is extracted from the image to be processed according to the recognized feature points. The face image is then input into the target face image conversion model, which converts the face from the first style to the second style, and the target-style face image output by the target face conversion model is acquired. The target-style face image is then input into the super-resolution processing model, which converts it into a high-resolution image. Fig. 9B is a schematic diagram of one embodiment of recognizing, converting and super-resolution-processing an image containing a human face with these three models.
In one embodiment, acquiring a current image to be processed, the current image to be processed including a face, includes: acquiring a target video; and acquiring a target video frame containing a face in the target video, and taking the target video frame containing the face as a current image to be processed.
The image conversion method can be applied to real-time conversion of faces in a video. Specifically, a target video is acquired, and then the target video frames containing a face are acquired from it. A video is composed of successive video frames, each corresponding to one video image; therefore, by taking each target video frame containing a face as the current image to be processed, the faces in the video can be converted into target-style face images.
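Frame-by-frame conversion might be wired up with OpenCV as sketched below; contains_face and convert stand for the face recognition model and the target face image conversion model and are names assumed for this illustration.

```python
import cv2

def convert_video_faces(video_path, contains_face, convert):
    """Yield a target-style face image for every face-bearing frame."""
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()  # one video frame = one video image
        if not ok:
            break
        if contains_face(frame):    # frame becomes the current image to be processed
            yield convert(frame)
    capture.release()
```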
As shown in fig. 10, in one embodiment, an image conversion method is proposed, including the steps of:
step S1001, acquiring a current image to be processed, wherein the current image to be processed comprises a face;
step S1002, identifying the face in the current image to be processed to obtain face characteristic points;
step S1003, determining the positions of two eyes according to the eye feature points in the face feature points;
step S1004, determining whether the positions of the two eyes are on the same horizontal line; if so, taking the current image to be processed as the intermediate image and going to step S1006; if not, going to step S1005.
Step S1005, rotating the current image to be processed to enable the positions of the two eyes to be on a horizontal line, and obtaining an intermediate image;
step S1006, determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image;
step S1007, extracting a face image from the intermediate image according to the clipping frame;
step S1008, inputting the face image into a trained target face image conversion model, wherein the trained target face image conversion model comprises a down-sampling convolutional layer, a residual convolutional layer and an up-sampling convolutional layer; the down-sampling convolutional layer is used for performing down-sampling convolution calculation on the face image to obtain a first face feature matrix, the residual convolutional layer is used for performing a residual convolution operation on the first face feature matrix to convert it into a second face feature matrix with target-style features, and the up-sampling convolutional layer is used for performing up-sampling convolution calculation on the second face feature matrix to obtain a target-style face image.
In step S1009, the target style face image output by the target face image conversion model is acquired.
And step S1010, taking the target style face image as the input of a trained image super-resolution processing model, and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
It should be understood that, although the steps in the flowcharts of fig. 2 to 10 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 2-10 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performing the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least some of the sub-steps or stages of other steps.
As shown in fig. 11, in one embodiment, there is provided an image conversion apparatus including:
an obtaining module 1102, configured to obtain a current image to be processed, where the current image to be processed includes a face;
an input module 1104, configured to input the current image to be processed into a trained target face image conversion model, where the trained target face image conversion model is used to convert a face in an input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
an output module 1106, configured to obtain the target style facial image output by the target facial image conversion model.
In one embodiment, the original facial image transformation model includes a forward facial image transformation model and a reverse facial image transformation model, and the discrimination network model includes a first discrimination network model connected to an output of the forward facial image transformation model and a second discrimination network model connected to an output of the reverse facial image transformation model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style; the first judgment network model is used for calculating the state information of the input image belonging to the second style, and the second judgment network model is used for calculating the state information of the input image belonging to the first style.
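By way of a non-authoritative sketch, each discrimination network model can be a small convolutional classifier whose output is the probability that an input image belongs to the style it judges; the PatchGAN-style layout, channel widths, and sigmoid output below are assumptions in PyTorch (the embodiment does not fix the discriminators' internals), and a matching generator sketch appears with the architecture description further below.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Minimal PatchGAN-style discriminator: outputs per-patch probabilities
    that the input image belongs to the style this discriminator judges."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, 1, 4, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

# D1 judges whether an image belongs to the second style (the output side of
# the forward model); D2 judges whether an image belongs to the first style.
D1, D2 = Discriminator(), Discriminator()
```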
As shown in fig. 12, in an embodiment, the image conversion apparatus further includes a training module 1101, where the training module 1101 includes:
a training data obtaining module 1101A, configured to obtain a training data set, where the training data set includes a first style image set and a second style image set, and each image in the first style image set and each image in the second style image set includes a face;
an image obtaining module 1101B, configured to obtain a current first-style image according to the first-style image set, and obtain a current second-style image according to the second-style image set;
a first probability output module 1101C, configured to pass the current first-style image through a forward face image conversion model and a first discrimination model in sequence to obtain a first probability of corresponding output;
a second probability output module 1101D, configured to pass the current second-style image through a reverse face image conversion model and a second determination model in sequence to obtain a second probability of corresponding output;
the judgment output module 1101E is configured to input the current first-style image into the second judgment model to obtain a third probability of corresponding output, and input the current second-style image into the first judgment model to process to obtain a fourth probability of corresponding output;
a loss calculation module 1101F, configured to calculate a countermeasure loss value according to the first probability, the second probability, the third probability and the fourth probability;
a first adjusting module 1101G, configured to adjust parameters in the forward face image conversion model, the reverse face image conversion model, the first judging model, and the second judging model according to the confrontation loss value;
an updating module 1101H, configured to update the current first-style image and the current second-style image, return to process the current first-style image sequentially through the forward face image conversion model and the first discrimination model to obtain a corresponding first probability, and cycle the process until a convergence condition is met, and use the forward face image conversion model obtained after training as the target face image conversion model.
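Assuming the sigmoid-output discriminators sketched above and binary cross-entropy as the measure (the embodiment only states that a confrontation loss value is calculated from the four probabilities), one training step might be sketched as follows:

```python
import torch
import torch.nn.functional as F

def confrontation_loss(x1, x2, G_f, G_r, D1, D2):
    """Sketch of the confrontation loss built from the four probabilities.
    x1: batch of current first-style images; x2: current second-style images.
    G_f / G_r are the forward and reverse conversion models."""
    p1 = D1(G_f(x1))   # first probability: converted first-style image via D1
    p2 = D2(G_r(x2))   # second probability: converted second-style image via D2
    p3 = D2(x1)        # third probability: real first-style image via D2
    p4 = D1(x2)        # fourth probability: real second-style image via D1
    # From the discriminators' viewpoint, converted images are fakes (target 0)
    # and real images of the guarded style are genuine (target 1); generator
    # updates would use the opposite targets for p1 and p2.
    return (F.binary_cross_entropy(p1, torch.zeros_like(p1)) +
            F.binary_cross_entropy(p2, torch.zeros_like(p2)) +
            F.binary_cross_entropy(p3, torch.ones_like(p3)) +
            F.binary_cross_entropy(p4, torch.ones_like(p4)))
```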
As shown in fig. 13, in an embodiment, the training module 1101 further includes:
a first image output module 1101I, configured to take a first output image of the current first-style image output by the forward face image conversion model as an input of the adjusted reverse face image conversion model, and obtain an output third output image;
a second image output module 1101J, configured to take a second output image, output by the reverse face image conversion model, of the current second-style image as an input of the adjusted forward face image conversion model, and obtain an output fourth output image;
a difference value calculating module 1101K, configured to calculate a first difference value between the current first-style image and the third output image, and a second difference value between the current second-style image and the fourth output image;
a cyclic loss calculation module 1101L, configured to calculate a cyclic loss value according to the first difference value and the second difference value;
a second adjusting module 1101M, configured to adjust parameters in the forward face image conversion model and the reverse face image conversion model again according to the cyclic loss value.
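A sketch of the cyclic loss, under the assumption that the difference values are L1 distances and the two terms are summed with a weight (the embodiment fixes neither choice):

```python
import torch

def cycle_loss(x1, x2, G_f, G_r, weight=10.0):
    """An image translated to the other style and back should reproduce
    itself; the weight of 10.0 is an illustrative assumption."""
    third = G_r(G_f(x1))    # first output image fed to the adjusted reverse model
    fourth = G_f(G_r(x2))   # second output image fed to the adjusted forward model
    d1 = torch.mean(torch.abs(x1 - third))    # first difference value
    d2 = torch.mean(torch.abs(x2 - fourth))   # second difference value
    return weight * (d1 + d2)                 # cyclic loss value
```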
In an embodiment, the image obtaining module 1101B is further configured to obtain an original first-style image from the first-style image set, obtain an original second-style image from the second-style image set, and perform random perturbation processing on the original first-style image and the original second-style image respectively to obtain an adjusted first-style image and an adjusted second-style image, where the random perturbation processing includes at least one of translation, scaling and brightness adjustment; the adjusted first-style image is used as the current first-style image, and the adjusted second-style image is used as the current second-style image.
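The random perturbation step might, for instance, be sketched with torchvision; the 0.5 application probabilities and the translation, scale, and brightness ranges below are illustrative assumptions:

```python
import random
import torchvision.transforms.functional as TF

def random_perturb(img):
    """Random disturbance sketch: translation, scaling, and brightness
    adjustment, each applied independently with probability 0.5."""
    if random.random() < 0.5:  # translation by up to 10 pixels
        img = TF.affine(img, angle=0.0,
                        translate=(random.randint(-10, 10),
                                   random.randint(-10, 10)),
                        scale=1.0, shear=0.0)
    if random.random() < 0.5:  # scaling by +/- 10%
        img = TF.affine(img, angle=0.0, translate=(0, 0),
                        scale=random.uniform(0.9, 1.1), shear=0.0)
    if random.random() < 0.5:  # brightness adjustment by +/- 20%
        img = TF.adjust_brightness(img, random.uniform(0.8, 1.2))
    return img
```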
As shown in fig. 14, in one embodiment, the image conversion apparatus further includes:
the recognition module 1108 is configured to recognize a face in the current image to be processed to obtain a face feature point;
the extracting module 1110 is configured to extract a facial image from the current image to be processed according to the facial feature points, and use the extracted facial image as the current image to be processed.
In one embodiment, the extracting module 1110 is further configured to determine positions of two eyes according to the eye feature points in the facial feature points, rotate the current image to be processed so that the positions of the two eyes are on a horizontal line when the positions of the two eyes are not on a horizontal line, obtain an intermediate image, determine a clipping frame corresponding to the facial image according to the facial feature points in the intermediate image, and extract the facial image from the intermediate image according to the clipping frame.
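An OpenCV sketch of this alignment and cropping, in which the padding `margin` around the feature-point bounds is an illustrative assumption:

```python
import cv2
import numpy as np

def align_and_crop(image, left_eye, right_eye, feature_points, margin=0.3):
    """Rotate so both eyes lie on one horizontal line, then crop a clipping
    frame around the rotated facial feature points."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # 0 when eyes are level
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = image.shape[:2]
    intermediate = cv2.warpAffine(image, M, (w, h))    # intermediate image
    # Rotate the feature points with the same matrix, then take their bounds.
    pts = cv2.transform(np.array([feature_points], dtype=np.float32), M)[0]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin    # clipping-frame padding
    x0, y0 = max(int(x0 - mx), 0), max(int(y0 - my), 0)
    x1, y1 = min(int(x1 + mx), w), min(int(y1 + my), h)
    return intermediate[y0:y1, x0:x1]                  # extracted face image
```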
In one embodiment, the first style refers to a real style of an image directly captured by a camera; the second style is a virtual cartoon style.
In one embodiment, the target face conversion model includes a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer; the input module is further used for taking the image to be processed as the input of the downsampling convolutional layer, and the downsampling convolutional layer is used for carrying out downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix; the first face feature matrix is used as the input of the residual convolution layer, and the residual convolution layer is used for performing residual convolution operation on the first face feature matrix and converting the first face feature matrix into a second face feature matrix with target style features; and taking the second face feature matrix as the input of the up-sampling convolution layer, wherein the up-sampling convolution layer is used for carrying out up-sampling convolution calculation on the second face feature matrix to obtain a target style face image.
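A minimal PyTorch sketch of such a down-sampling/residual/up-sampling generator follows; the channel widths, the count of six residual blocks, instance normalization, and transposed-convolution up-sampling are assumptions (an embodiment may instead use the sub-pixel layer described next):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)  # residual convolution operation

class Generator(nn.Module):
    """Down-sampling convolution -> residual blocks -> up-sampling convolution."""
    def __init__(self, ch=64, n_res=6):
        super().__init__()
        self.down = nn.Sequential(              # yields first face feature matrix
            nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1), nn.ReLU())
        self.res = nn.Sequential(*[ResBlock(ch * 4) for _ in range(n_res)])
        self.up = nn.Sequential(                # yields target style face image
            nn.ConvTranspose2d(ch * 4, ch * 2, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch * 2, ch, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh())

    def forward(self, x):
        return self.up(self.res(self.down(x)))
```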
In one embodiment, the upsampled convolutional layer is a sub-pixel convolutional layer; the sub-pixel convolution layer is used to convert a low resolution image into a high resolution image.
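For example, a sub-pixel up-sampling stage can be sketched with PyTorch's PixelShuffle, which rearranges r*r channel groups into an r-times larger spatial grid; the channel count and upscale factor below are illustrative:

```python
import torch
import torch.nn as nn

r = 2                                    # upscale factor (assumed)
subpixel = nn.Sequential(
    nn.Conv2d(256, 3 * r * r, 3, padding=1),   # widen channels by r*r
    nn.PixelShuffle(r))                  # (N, 3*r*r, H, W) -> (N, 3, H*r, W*r)

x = torch.randn(1, 256, 64, 64)          # e.g. a second face feature matrix
print(subpixel(x).shape)                 # torch.Size([1, 3, 128, 128])
```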
As shown in fig. 15, in one embodiment, the image conversion apparatus further includes:
a super-resolution processing module 1112, configured to use the target style face image as an input of a trained image super-resolution processing model, and obtain an output processed high-resolution target style face image, where the image super-resolution processing model is obtained by training with a convolutional neural network algorithm.
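The super-resolution processing model is characterized only as a trained convolutional neural network; one minimal, SRCNN-like sketch of such a model might be:

```python
import torch.nn as nn

class SRCNNLike(nn.Module):
    """Minimal convolutional super-resolution sketch; this exact layout is an
    assumption, as the embodiment fixes no particular architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, 5, padding=2))

    def forward(self, x):
        # x: a (for example, bicubically pre-upscaled) target style face image
        return self.net(x)
```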
In an embodiment, the obtaining module is further configured to obtain a target video, obtain a target video frame containing a face in the target video, and use the target video frame containing the face as the current image to be processed.
FIG. 16 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in fig. 16, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the image conversion method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the image conversion method. Those skilled in the art will appreciate that the architecture shown in fig. 16 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the image conversion method provided by the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 16. The memory of the computer device may store various program modules that make up the image conversion apparatus, such as the acquisition module 1102, the input module 1104, and the output module 1106 of FIG. 11. The computer program constituted by the respective program modules causes the processor to execute the steps of the image conversion method of the respective embodiments of the present application described in this specification. For example, the computer device shown in fig. 16 may acquire a current image to be processed, which includes a face, through the acquisition module 1102 of the image conversion apparatus shown in fig. 11; input the current image to be processed into a trained target face image conversion model through the input module 1104, where the trained target face image conversion model is used for converting a face in an input image from a first style to a second style and is obtained by training an original face image conversion model and a discrimination network model; and acquire the target style face image output by the target face image conversion model through the output module 1106.
In one embodiment, a computer device is proposed, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of: acquiring a current image to be processed, wherein the current image to be processed comprises a face; inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model; and acquiring a target style face image output by the target face image conversion model.
In one embodiment, the original facial image transformation model includes a forward facial image transformation model and a reverse facial image transformation model, and the discrimination network model includes a first discrimination network model connected to an output of the forward facial image transformation model and a second discrimination network model connected to an output of the reverse facial image transformation model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style; the first judgment network model is used for calculating the state information of the input image belonging to the second style, and the second judgment network model is used for calculating the state information of the input image belonging to the first style.
In one embodiment, the processor is further configured to perform the steps of: acquiring a training data set, wherein the training data set comprises a first style image set and a second style image set, and each image in the first style image set and the second style image set comprises a face; acquiring a current first style image according to the first style image set, and acquiring a current second style image according to the second style image set; sequentially passing the current first style image through a forward face image conversion model and a first discrimination model to obtain a first probability of corresponding output; sequentially passing the current second style image through a reverse face image conversion model and a second judgment model to obtain a corresponding output second probability; inputting the current first style image into the second judgment model to obtain a third probability of corresponding output, and inputting the current second style image into the first judgment model to process to obtain a fourth probability of corresponding output; calculating to obtain a confrontation loss value according to the first probability, the second probability, the third probability and the fourth probability; adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value; and updating the current first style image and the current second style image, returning to obtain a corresponding first probability by sequentially processing the current first style image through a forward face image conversion model and a first discrimination model, circulating until a convergence condition is met, and taking the forward face image conversion model obtained after training as the target face image conversion model.
In one embodiment, after the adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value, the processor is further configured to perform the following steps: taking a first output image of the current first style image, which is output through the forward face image conversion model, as the input of the adjusted reverse face image conversion model, and acquiring an output third output image; taking a second output image output by the current second style image through the reverse face image conversion model as the input of the adjusted forward face image conversion model, and acquiring an output fourth output image; calculating a first difference value between the current first style image and the third output image and a second difference value between the current second style image and the fourth output image; calculating to obtain a cyclic loss value according to the first difference value and the second difference value; and adjusting parameters in the forward face image conversion model and the reverse face image conversion model again according to the cyclic loss value.
In one embodiment, obtaining a current first-style image from the first-style image set and obtaining a current second-style image from the second-style image set comprises: obtaining an original first-style image from the first-style image set, and obtaining an original second-style image from the second-style image set; respectively carrying out random disturbance processing on the original first style image and the original second style image to obtain an adjusted first style image and an adjusted second style image, wherein the random disturbance processing comprises the following steps: at least one of translation, zooming and brightness adjustment; and taking the adjusted first style image as the current first style image, and taking the adjusted second style image as the current second style image.
In one embodiment, after the obtaining the current to-be-processed image, the current to-be-processed image including a face, the processor is further configured to perform the steps of: identifying the face in the current image to be processed to obtain face characteristic points; and extracting a face image from the current image to be processed according to the face characteristic point, and taking the extracted face image as the current image to be processed.
In one embodiment, the extracting a facial image from the current image to be processed according to the facial feature points includes: determining the positions of two eyes according to the eye feature points in the face feature points; when the positions of the two eyes are not on the same horizontal line, rotating the current image to be processed to enable the positions of the two eyes to be on the same horizontal line to obtain an intermediate image; determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image; and extracting a face image from the intermediate image according to the cropping frame.
In one embodiment, the first style is a real style of an image directly captured by a camera, and the second style is a virtual cartoon style.
In one embodiment, the target face conversion model includes a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer; the inputting the image to be processed into the trained target face image conversion model comprises: the image to be processed is used as the input of the downsampling convolutional layer, and the downsampling convolutional layer is used for carrying out downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix; the first face feature matrix is used as the input of the residual convolution layer, and the residual convolution layer is used for performing residual convolution operation on the first face feature matrix and converting the first face feature matrix into a second face feature matrix with target style features; and taking the second face feature matrix as the input of the up-sampling convolution layer, wherein the up-sampling convolution layer is used for carrying out up-sampling convolution calculation on the second face feature matrix to obtain a target style face image.
In one embodiment, the upsampled convolutional layer is a sub-pixel convolutional layer; the sub-pixel convolution layer is used to convert a low resolution image into a high resolution image.
In one embodiment, the processor is further configured to perform the steps of: and taking the target style face image as the input of a trained image super-resolution processing model, and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
In one embodiment, the obtaining a current to-be-processed image, the current to-be-processed image including a face, includes: acquiring a target video; and acquiring a target video frame containing a face in the target video, and taking the target video frame containing the face as the current image to be processed.
In one embodiment, a computer-readable storage medium is proposed, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the steps of: acquiring a current image to be processed, wherein the current image to be processed comprises a face; inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model; and acquiring a target style face image output by the target face image conversion model.
In one embodiment, the original facial image transformation model includes a forward facial image transformation model and a reverse facial image transformation model, and the discrimination network model includes a first discrimination network model connected to an output of the forward facial image transformation model and a second discrimination network model connected to an output of the reverse facial image transformation model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style; the first judgment network model is used for calculating the state information of the input image belonging to the second style, and the second judgment network model is used for calculating the state information of the input image belonging to the first style.
In one embodiment, the processor is further configured to perform the steps of: acquiring a training data set, wherein the training data set comprises a first style image set and a second style image set, and each image in the first style image set and the second style image set comprises a face; acquiring a current first style image according to the first style image set, and acquiring a current second style image according to the second style image set; sequentially passing the current first style image through a forward face image conversion model and a first discrimination model to obtain a first probability of corresponding output; sequentially passing the current second style image through a reverse face image conversion model and a second judgment model to obtain a corresponding output second probability; inputting the current first style image into the second judgment model to obtain a third probability of corresponding output, and inputting the current second style image into the first judgment model to process to obtain a fourth probability of corresponding output; calculating to obtain a confrontation loss value according to the first probability, the second probability, the third probability and the fourth probability; adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value; and updating the current first style image and the current second style image, returning to obtain a corresponding first probability by sequentially processing the current first style image through a forward face image conversion model and a first discrimination model, circulating until a convergence condition is met, and taking the forward face image conversion model obtained after training as the target face image conversion model.
In one embodiment, after the adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value, the processor is further configured to perform the following steps: taking a first output image of the current first style image, which is output through the forward face image conversion model, as the input of the adjusted reverse face image conversion model, and acquiring an output third output image; taking a second output image output by the current second style image through the reverse face image conversion model as the input of the adjusted forward face image conversion model, and acquiring an output fourth output image; calculating a first difference value between the current first style image and the third output image and a second difference value between the current second style image and the fourth output image; calculating to obtain a cyclic loss value according to the first difference value and the second difference value; and adjusting parameters in the forward face image conversion model and the reverse face image conversion model again according to the cyclic loss value.
In one embodiment, obtaining a current first-style image from the first-style image set and obtaining a current second-style image from the second-style image set comprises: obtaining an original first-style image from the first-style image set, and obtaining an original second-style image from the second-style image set; respectively carrying out random disturbance processing on the original first style image and the original second style image to obtain an adjusted first style image and an adjusted second style image, wherein the random disturbance processing comprises the following steps: at least one of translation, zooming and brightness adjustment; and taking the adjusted first style image as the current first style image, and taking the adjusted second style image as the current second style image.
In one embodiment, after the obtaining the current to-be-processed image, the current to-be-processed image including a face, the processor is further configured to perform the steps of: identifying the face in the current image to be processed to obtain face characteristic points; and extracting a face image from the current image to be processed according to the face characteristic point, and taking the extracted face image as the current image to be processed.
In one embodiment, the extracting a facial image from the current image to be processed according to the facial feature points includes: determining the positions of two eyes according to the eye feature points in the face feature points; when the positions of the two eyes are not on the same horizontal line, rotating the current image to be processed to enable the positions of the two eyes to be on the same horizontal line to obtain an intermediate image; determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image; and extracting a face image from the intermediate image according to the cropping frame.
In one embodiment, the first style is a real style of an image directly captured by a camera, and the second style is a virtual cartoon style.
In one embodiment, the target face conversion model includes a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer; the inputting the image to be processed into the trained target face image conversion model comprises: the image to be processed is used as the input of the downsampling convolutional layer, and the downsampling convolutional layer is used for carrying out downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix; the first face feature matrix is used as the input of the residual convolution layer, and the residual convolution layer is used for performing residual convolution operation on the first face feature matrix and converting the first face feature matrix into a second face feature matrix with target style features; and taking the second face feature matrix as the input of the up-sampling convolution layer, wherein the up-sampling convolution layer is used for carrying out up-sampling convolution calculation on the second face feature matrix to obtain a target style face image.
In one embodiment, the upsampled convolutional layer is a sub-pixel convolutional layer; the sub-pixel convolution layer is used to convert a low resolution image into a high resolution image.
In one embodiment, the processor is further configured to perform the steps of: and taking the target style face image as the input of a trained image super-resolution processing model, and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
In one embodiment, the obtaining a current to-be-processed image, the current to-be-processed image including a face, includes: acquiring a target video; and acquiring a target video frame containing a face in the target video, and taking the target video frame containing the face as the current image to be processed.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (26)

1. A method of image conversion, the method comprising:
acquiring a current image to be processed, wherein the current image to be processed comprises a face;
inputting the current image to be processed into a trained target face image conversion model, wherein the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model;
the original face image conversion model comprises a forward face image conversion model and a reverse face image conversion model, and the discrimination network model comprises a first discrimination network model connected with the output of the forward face image conversion model and a second discrimination network model connected with the output of the reverse face image conversion model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style;
the target face image conversion model is obtained by training the forward face image conversion model; the forward face image conversion model is trained according to a confrontation loss value; the confrontation loss value is calculated according to the first probability, the second probability, the third probability and the fourth probability; the first probability is obtained by sequentially passing the current first style image through a forward face image conversion model and a first discrimination model; the second probability is obtained by sequentially passing the current second-style image through a reverse face image conversion model and a second judgment model; the third probability is obtained by inputting the current first style image into a second judgment model; the fourth probability is obtained by inputting the current second style image into the first discrimination model for processing;
and acquiring a target style face image output by the target face image conversion model.
2. The method of claim 1,
the first judgment network model is used for calculating the state information of the input image belonging to the second style, and the second judgment network model is used for calculating the state information of the input image belonging to the first style.
3. The method of claim 2, wherein the step of training the target facial image transformation model comprises:
acquiring a training data set, wherein the training data set comprises a first style image set and a second style image set, and each image in the first style image set and the second style image set comprises a face;
acquiring a current first style image according to the first style image set, and acquiring a current second style image according to the second style image set;
sequentially passing the current first style image through a forward face image conversion model and a first discrimination model to obtain a first probability of corresponding output;
sequentially passing the current second style image through a reverse face image conversion model and a second judgment model to obtain a corresponding output second probability;
inputting the current first style image into the second judgment model to obtain a third probability of corresponding output, and inputting the current second style image into the first judgment model to process to obtain a fourth probability of corresponding output;
calculating to obtain a confrontation loss value according to the first probability, the second probability, the third probability and the fourth probability;
adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value;
and updating the current first style image and the current second style image, returning to obtain a corresponding first probability by sequentially processing the current first style image through a forward face image conversion model and a first discrimination model, circulating until a convergence condition is met, and taking the forward face image conversion model obtained after training as the target face image conversion model.
4. The method of claim 3, wherein after the adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first discrimination model and the second discrimination model according to the confrontation loss value, the method further comprises:
taking a first output image of the current first style image, which is output through a forward face image conversion model, as the input of the adjusted reverse face image conversion model, and acquiring an output third output image;
taking a second output image output by the current second style image through a reverse face image conversion model as the input of the adjusted forward face image conversion model, and acquiring an output fourth output image;
calculating a first difference value between the current first style image and the third output image and a second difference value between the current second style image and the fourth output image;
calculating to obtain a cyclic loss value according to the first difference value and the second difference value;
and adjusting parameters in the forward face image conversion model and the reverse face image conversion model again according to the cyclic loss value.
5. The method of claim 3, wherein obtaining a current first-style image from the first-style image set and a current second-style image from the second-style image set comprises:
obtaining an original first-style image from the first-style image set, and obtaining an original second-style image from the second-style image set;
respectively carrying out random disturbance processing on the original first style image and the original second style image to obtain an adjusted first style image and an adjusted second style image, wherein the random disturbance processing comprises the following steps: at least one of translation, zooming and brightness adjustment;
and taking the adjusted first style image as the current first style image, and taking the adjusted second style image as the current second style image.
6. The method of claim 1, wherein after the obtaining a current to-be-processed image, the current to-be-processed image including a face, further comprising:
identifying the face in the current image to be processed to obtain face characteristic points;
and extracting a face image from the current image to be processed according to the face characteristic point, and taking the extracted face image as the current image to be processed.
7. The method according to claim 6, wherein the extracting a facial image from the current image to be processed according to the facial feature points comprises:
determining the positions of two eyes according to the eye feature points in the face feature points;
when the positions of the two eyes are not on the same horizontal line, rotating the current image to be processed to enable the positions of the two eyes to be on the same horizontal line to obtain an intermediate image;
determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image;
and extracting a face image from the intermediate image according to the cropping frame.
8. The method according to claim 1, wherein the first style is a real style of an image directly captured by a camera, and the second style is a virtual cartoon style.
9. The method of claim 1, wherein the target face conversion model comprises a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer;
the inputting the image to be processed into the trained target face image conversion model comprises:
the image to be processed is used as the input of the downsampling convolutional layer, and the downsampling convolutional layer is used for carrying out downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix;
the first face feature matrix is used as the input of the residual convolution layer, and the residual convolution layer is used for performing residual convolution operation on the first face feature matrix and converting the first face feature matrix into a second face feature matrix with target style features;
and taking the second face feature matrix as the input of the up-sampling convolution layer, wherein the up-sampling convolution layer is used for carrying out up-sampling convolution calculation on the second face feature matrix to obtain a target style face image.
10. The method of claim 9, wherein the upsampled convolutional layer is a sub-pixel convolutional layer;
the sub-pixel convolution layer is used to convert a low resolution image into a high resolution image.
11. The method of claim 1, further comprising:
and taking the target style face image as the input of a trained image super-resolution processing model, and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
12. The method of claim 1, wherein the obtaining a current image to be processed, the current image to be processed including a face, comprises:
acquiring a target video;
and acquiring a target video frame containing a face in the target video, and taking the target video frame containing the face as the current image to be processed.
13. An image conversion apparatus, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a current image to be processed, and the current image to be processed comprises a face;
the input module is used for inputting the current image to be processed into a trained target face image conversion model, the trained target face image conversion model is used for converting the face in the input image from a first style to a second style, and the trained target face image conversion model is obtained by training an original face image conversion model and a discrimination network model; the original face image conversion model comprises a forward face image conversion model and a reverse face image conversion model, and the discrimination network model comprises a first discrimination network model connected with the output of the forward face image conversion model and a second discrimination network model connected with the output of the reverse face image conversion model; the forward face image conversion model is used for converting the face in the input image from a first style to a second style, and the reverse face image conversion model is used for converting the face in the input image from the second style to the first style; the target face image conversion model is obtained by training the forward face image conversion model; the forward face image conversion model is trained according to a confrontation loss value; the confrontation loss value is calculated according to the first probability, the second probability, the third probability and the fourth probability; the first probability is obtained by sequentially passing the current first style image through a forward face image conversion model and a first discrimination model; the second probability is obtained by sequentially passing the current second-style image through a reverse face image conversion model and a second judgment model; the third probability is obtained by inputting the current first style image into a second judgment model; the fourth probability is obtained by inputting the current second style image into the first discrimination model for processing;
and the output module is used for acquiring the target style face image output by the target face image conversion model.
14. The apparatus of claim 13, wherein the first discriminant network model is configured to compute state information of the input image belonging to the second genre, and wherein the second discriminant network model is configured to compute state information of the input image belonging to the first genre.
15. The apparatus of claim 14, further comprising a training module, the training module comprising:
the training data acquisition module is used for acquiring a training data set, wherein the training data set comprises a first style image set and a second style image set, and each image in the first style image set and each image in the second style image set comprises a face;
the image acquisition module is used for acquiring a current first style image according to the first style image set and acquiring a current second style image according to the second style image set;
the first probability output module is used for enabling the current first style image to sequentially pass through a forward face image conversion model and a first discrimination model to obtain a first probability of corresponding output;
the second probability output module is used for enabling the current second-style image to sequentially pass through the reverse face image conversion model and the second judgment model to obtain a second probability of corresponding output;
the judgment output module is used for inputting the current first style image into the second judgment model to obtain a third probability of corresponding output, and inputting the current second style image into the first judgment model to be processed to obtain a fourth probability of corresponding output;
the loss calculation module is used for calculating a confrontation loss value according to the first probability, the second probability, the third probability and the fourth probability;
the first adjusting module is used for adjusting parameters in the forward face image conversion model, the reverse face image conversion model, the first judging model and the second judging model according to the confrontation loss value;
and the updating module is used for updating the current first style image and the current second style image, returning to obtain a corresponding first probability by sequentially processing the current first style image through the forward face image conversion model and the first judgment model, circulating the steps until a convergence condition is met, and taking the forward face image conversion model obtained after training as the target face image conversion model.
16. The apparatus of claim 15, wherein the training module further comprises:
the first image output module is used for taking a first output image of the current first style image, which is output by the forward face image conversion model, as the input of the adjusted reverse face image conversion model, and acquiring an output third output image;
the second image output module is used for taking a second output image of the current second style image, which is output by the reverse face image conversion model, as the input of the adjusted forward face image conversion model, and acquiring an output fourth output image;
the difference value calculating module is used for calculating a first difference value between the current first style image and the third output image and a second difference value between the current second style image and the fourth output image;
the cyclic loss calculation module is used for calculating a cyclic loss value according to the first difference value and the second difference value;
and the second adjusting module is used for adjusting the parameters in the forward face image conversion model and the reverse face image conversion model again according to the cyclic loss value.
17. The apparatus of claim 15, wherein the image obtaining module is further configured to obtain an original first-style image from the first-style image set, and an original second-style image from the second-style image set; respectively carrying out random disturbance processing on the original first style image and the original second style image to obtain an adjusted first style image and an adjusted second style image, wherein the random disturbance processing comprises the following steps: at least one of translation, zooming and brightness adjustment; and taking the adjusted first style image as the current first style image, and taking the adjusted second style image as the current second style image.
18. The apparatus of claim 13, further comprising:
the identification module is used for identifying the face in the current image to be processed to obtain face characteristic points;
and the extraction module is used for extracting a face image from the current image to be processed according to the face characteristic points and taking the extracted face image as the current image to be processed.
19. The apparatus according to claim 18, wherein the extraction module is further configured to determine positions of two eyes according to the eye feature points in the facial feature points; when the positions of the two eyes are not on the same horizontal line, rotating the current image to be processed to enable the positions of the two eyes to be on the same horizontal line to obtain an intermediate image; determining a clipping frame corresponding to the face image according to the face feature points in the intermediate image; and extracting a face image from the intermediate image according to the cropping frame.
20. The apparatus according to claim 13, wherein the first style is a real style of an image directly captured by the camera, and the second style is a virtual cartoon style.
21. The apparatus of claim 13, wherein the target face conversion model comprises a downsampled convolutional layer, a residual convolutional layer, and an upsampled convolutional layer; the input module is further used for taking the image to be processed as the input of the downsampling convolutional layer, and the downsampling convolutional layer is used for carrying out downsampling convolution calculation on the face image in the image to be processed to obtain a first face feature matrix; the first face feature matrix is used as the input of the residual convolution layer, and the residual convolution layer is used for performing residual convolution operation on the first face feature matrix and converting the first face feature matrix into a second face feature matrix with target style features; and taking the second face feature matrix as the input of the up-sampling convolution layer, wherein the up-sampling convolution layer is used for carrying out up-sampling convolution calculation on the second face feature matrix to obtain a target style face image.
22. The apparatus of claim 21, wherein the upsampled convolutional layer is a sub-pixel convolutional layer; the sub-pixel convolution layer is used to convert a low resolution image into a high resolution image.
23. The apparatus of claim 13, further comprising: and the super-resolution processing module is used for taking the target style face image as the input of a trained image super-resolution processing model and acquiring the output processed high-resolution target style face image, wherein the image super-resolution processing model is obtained by adopting a convolutional neural network algorithm for training.
24. The apparatus of claim 13, wherein the obtaining module is further configured to obtain a target video; and acquiring a target video frame containing a face in the target video, and taking the target video frame containing the face as the current image to be processed.
25. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 12.
26. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 12.
CN201810354082.8A 2018-04-19 2018-04-19 Image conversion method, image conversion device, computer equipment and storage medium Active CN108564127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810354082.8A CN108564127B (en) 2018-04-19 2018-04-19 Image conversion method, image conversion device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810354082.8A CN108564127B (en) 2018-04-19 2018-04-19 Image conversion method, image conversion device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108564127A CN108564127A (en) 2018-09-21
CN108564127B true CN108564127B (en) 2022-02-18

Family

ID=63535876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810354082.8A Active CN108564127B (en) 2018-04-19 2018-04-19 Image conversion method, image conversion device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108564127B (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146825B (en) * 2018-10-12 2020-11-27 深圳美图创新科技有限公司 Photography style conversion method, device and readable storage medium
CN111064905B (en) * 2018-10-17 2021-05-11 上海交通大学 Video scene conversion method for automatic driving
CN109474851A (en) * 2018-10-30 2019-03-15 百度在线网络技术(北京)有限公司 Video conversion method, device and equipment
CN109472270B (en) * 2018-10-31 2021-09-24 京东方科技集团股份有限公司 Image style conversion method, device and equipment
CN109361934B (en) 2018-11-30 2021-10-08 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN109741244A (en) * 2018-12-27 2019-05-10 广州小狗机器人技术有限公司 Picture Generation Method and device, storage medium and electronic equipment
CN111489284B (en) * 2019-01-29 2024-02-06 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN109919251A (en) * 2019-03-21 2019-06-21 腾讯科技(深圳)有限公司 A kind of method and device of object detection method based on image, model training
CN110148081B (en) * 2019-03-25 2024-02-23 腾讯科技(深圳)有限公司 Training method of image processing model, image processing method, device and storage medium
CN109833025A (en) * 2019-03-29 2019-06-04 广州视源电子科技股份有限公司 A kind of method for detecting abnormality of retina, device, equipment and storage medium
CN110097086B (en) * 2019-04-03 2023-07-18 平安科技(深圳)有限公司 Image generation model training method, image generation method, device, equipment and storage medium
CN110232401B (en) * 2019-05-05 2023-08-04 平安科技(深圳)有限公司 Focus judging method, device and computer equipment based on picture conversion
CN111985281B (en) * 2019-05-24 2022-12-09 内蒙古工业大学 Image generation model generation method and device and image generation method and device
CN110223230A (en) * 2019-05-30 2019-09-10 华南理工大学 A kind of more front end depth image super-resolution systems and its data processing method
CN110232722B (en) * 2019-06-13 2023-08-04 腾讯科技(深圳)有限公司 Image processing method and device
CN110276399B (en) * 2019-06-24 2021-06-04 厦门美图之家科技有限公司 Image conversion network training method and device, computer equipment and storage medium
CN110298326A (en) * 2019-07-03 2019-10-01 北京字节跳动网络技术有限公司 A kind of image processing method and device, storage medium and terminal
CN110516707B (en) * 2019-07-19 2023-06-02 深圳力维智联技术有限公司 Image labeling method and device and storage medium thereof
CN110399924B (en) 2019-07-26 2021-09-07 北京小米移动软件有限公司 Image processing method, device and medium
CN110619315B (en) * 2019-09-24 2020-10-30 重庆紫光华山智安科技有限公司 Training method and device of face recognition model and electronic equipment
CN110705625A (en) * 2019-09-26 2020-01-17 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
US11625576B2 (en) * 2019-11-15 2023-04-11 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image style transformation
CN110910332B (en) * 2019-12-03 2023-09-26 苏州科技大学 Visual SLAM system dynamic fuzzy processing method
CN111179172B (en) * 2019-12-24 2021-11-02 浙江大学 Remote sensing satellite super-resolution implementation method and device based on unmanned aerial vehicle aerial data, electronic equipment and storage medium
CN111210382B (en) * 2020-01-03 2022-09-30 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111260545B (en) * 2020-01-20 2023-06-20 北京百度网讯科技有限公司 Method and device for generating image
CN111275784B (en) * 2020-01-20 2023-06-13 北京百度网讯科技有限公司 Method and device for generating image
CN113259583B (en) * 2020-02-13 2023-05-12 北京小米移动软件有限公司 Image processing method, device, terminal and storage medium
CN111402399B (en) * 2020-03-10 2024-03-05 广州虎牙科技有限公司 Face driving and live broadcasting method and device, electronic equipment and storage medium
CN111489287B (en) * 2020-04-10 2024-02-09 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
CN111696181A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Method, device and storage medium for generating super meta model and virtual dummy
CN113486688A (en) * 2020-05-27 2021-10-08 海信集团有限公司 Face recognition method and intelligent device
CN111669647B (en) * 2020-06-12 2022-11-25 北京百度网讯科技有限公司 Real-time video processing method, device, equipment and storage medium
CN112465007B (en) * 2020-11-24 2023-10-13 深圳市优必选科技股份有限公司 Training method of target recognition model, target recognition method and terminal equipment
CN112396693A (en) * 2020-11-25 2021-02-23 上海商汤智能科技有限公司 Face information processing method and device, electronic equipment and storage medium
CN112465936A (en) * 2020-12-04 2021-03-09 深圳市优必选科技股份有限公司 Portrait cartoonization method, device, robot and storage medium
CN113034449B (en) * 2021-03-11 2023-12-15 深圳市优必选科技股份有限公司 Target detection model training method and device and communication equipment
CN113111791B (en) * 2021-04-16 2024-04-09 深圳市格灵人工智能与机器人研究院有限公司 Image filter conversion network training method and computer readable storage medium
CN113436062A (en) * 2021-07-28 2021-09-24 北京达佳互联信息技术有限公司 Image style migration method and device, electronic equipment and storage medium
CN114025198B (en) * 2021-11-08 2023-06-27 深圳万兴软件有限公司 Video cartoonization method, device, equipment and medium based on attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1971615A (en) * 2006-11-10 2007-05-30 中国科学院计算技术研究所 Method for generating cartoon portrait based on photo of human face
CN101272457A (en) * 2007-03-19 2008-09-24 索尼株式会社 Image processing apparatus and method
WO2014097892A1 (en) * 2012-12-19 2014-06-26 ソニー株式会社 Image processing device, image processing method, and program
EP2993619A1 (en) * 2014-08-28 2016-03-09 Kevin Alan Tussy Facial recognition authentication system including path parameters
CN107577985A (en) * 2017-07-18 2018-01-12 南京邮电大学 Implementation method for face avatar cartoonization based on cycle generative adversarial network
CN107730474A (en) * 2017-11-09 2018-02-23 京东方科技集团股份有限公司 Image processing method, processing unit and processing equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallel Images: A New Theoretical Framework for Image Generation; Wang Kunfeng et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); 2017-07-31; Vol. 30, No. 7; pp. 577-587 *

Also Published As

Publication number Publication date
CN108564127A (en) 2018-09-21

Similar Documents

Publication Publication Date Title
CN108564127B (en) Image conversion method, image conversion device, computer equipment and storage medium
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN111444881B (en) Fake face video detection method and device
CN109978756B (en) Target detection method, system, device, storage medium and computer equipment
CN111767979B (en) Training method, image processing method and image processing device for neural network
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN111860147B (en) Pedestrian re-identification model optimization processing method and device and computer equipment
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN111914997B (en) Method for training neural network, image processing method and device
CN111696110B (en) Scene segmentation method and system
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN110517186B (en) Method, device, storage medium and computer equipment for eliminating invoice seal
CN110176024B (en) Method, device, equipment and storage medium for detecting target in video
CN111968134B (en) Target segmentation method, device, computer readable storage medium and computer equipment
CN112884782B (en) Biological object segmentation method, apparatus, computer device, and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unconstrained video surveillance
CN110738103A (en) Living body detection method, living body detection device, computer equipment and storage medium
CN114037640A (en) Image generation method and device
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN112232140A (en) Crowd counting method and device, electronic equipment and computer storage medium
CN110807463B (en) Image segmentation method and device, computer equipment and storage medium
CN112115860A (en) Face key point positioning method and device, computer equipment and storage medium
CN115909172A (en) Deepfake video detection, segmentation and identification system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant