CN116958451B - Model processing, image generating method, image generating device, computer device and storage medium - Google Patents

Model processing, image generating method, image generating device, computer device and storage medium

Info

Publication number
CN116958451B
Authority
CN
China
Prior art keywords
training
generator
face image
face
image
Prior art date
Legal status
Active
Application number
CN202311191788.4A
Other languages
Chinese (zh)
Other versions
CN116958451A (en)
Inventor
樊艳波
余旺博
张勇
王璇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311191788.4A
Publication of CN116958451A
Application granted
Publication of CN116958451B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0475: Generative networks
    • G06N 3/08: Learning methods
    • G06N 3/094: Adversarial learning
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30201: Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to model processing, an image generation method, an image generation device, a computer device and a storage medium. The method comprises the following steps: acquiring a pre-training generator; acquiring a first face image sample, and training an image encoder based on the first face image sample and the pre-training generator; acquiring a second face image sample with a preset image style, and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator; determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs; and interpolating each model parameter pair, and updating the model parameters in the stylized generator that belong to the interpolated pairs with the interpolation results to obtain an image decoder. With this method, the application scenarios of image stylization can be expanded.

Description

Model processing, image generating method, image generating device, computer device and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a model processing method, apparatus, computer device, storage medium, and computer program product, and an image generating method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology, face image stylization has emerged: a given face image can be processed into an image with a preset style. The technology is widely applied in fields such as film and television special effects and video calls.
In conventional technology, stylizing a face image yields only a two-dimensional style image, which limits the application scenarios of face image stylization.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a model processing method, apparatus, computer device, computer readable storage medium, and computer program product, and an image generating method, apparatus, computer device, computer readable storage medium, and computer program product that are capable of expanding the application scenarios of face image stylization.
In a first aspect, the present application provides a model processing method. The method comprises the following steps:
acquiring a pre-training generator; the pre-training generator is used for acquiring a face feature vector and generating three-dimensional face data according to the face feature vector, and the three-dimensional face data is used for rendering a face image according to a specified viewing angle;
acquiring a first face image sample, and training an image encoder based on the first face image sample and the pre-training generator; the image encoder is used for encoding an input face image into a face feature vector;
acquiring a second face image sample with a preset image style, and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator;
determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs;
interpolating each model parameter pair, and updating the model parameters in the stylized generator that belong to the interpolated model parameter pairs by using the interpolation results, to obtain an image decoder; the image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data.
In a second aspect, the present application further provides a model processing apparatus. The apparatus comprises:
the generator acquisition module, used for acquiring a pre-training generator; the pre-training generator is used for acquiring a face feature vector and generating three-dimensional face data according to the face feature vector, and the three-dimensional face data is used for rendering a face image according to a specified viewing angle;
the encoder training module, used for acquiring a first face image sample and training an image encoder based on the first face image sample and the pre-training generator; the image encoder is used for encoding an input face image into a face feature vector;
the generator adjustment module, used for acquiring a second face image sample with a preset image style and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator;
a parameter pair determining module, configured to determine at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs;
the parameter adjustment module, used for interpolating each model parameter pair and updating the model parameters in the stylized generator that belong to the interpolated model parameter pairs by using the interpolation results, to obtain an image decoder; the image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the above model processing method when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the model processing method described above.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when being executed by a processor, implements the steps of the above model processing method.
With the above model processing method, apparatus, computer device, storage medium and computer program product, a pre-training generator is acquired which can generate three-dimensional face data from a face feature vector, the three-dimensional face data being used for rendering a face image according to a specified viewing angle. An image encoder can be trained based on the first face image sample and the pre-training generator, so that an input face image is encoded into a face feature vector by the image encoder. The pre-training generator is adjusted based on the second face image sample having the preset image style to obtain a stylized generator. At least one pair of associated model parameters is determined from the model parameters of the pre-training generator and the stylized generator to obtain model parameter pairs, each model parameter pair is interpolated, and the model parameters in the stylized generator that belong to the interpolated pairs are updated with the interpolation results to obtain an image decoder. When the image decoder decodes the face feature vector output by the image encoder, the resulting three-dimensional style face data retains the feature information of the input face image while exhibiting the preset image style, so that face image stylization can be extended to three-dimensional application scenarios.
In a sixth aspect, the present application provides an image generation method. The method comprises the following steps:
acquiring a face image;
encoding the face image into a face feature vector by a pre-trained image encoder;
decoding the face feature vector through a pre-trained image decoder to obtain three-dimensional style face data of the face image;
acquiring a view angle parameter value, and rendering according to the view angle parameter value and the three-dimensional style face data to obtain a style face image of the face image under the view angle corresponding to the view angle parameter value;
the image decoder is obtained by interpolating at least one pair of associated model parameters of a pre-training generator and a stylized generator and then updating the corresponding model parameters in the stylized generator by using the interpolation results; the stylized generator is obtained by adjusting the pre-training generator, and the pre-training generator is used for generating three-dimensional face data.
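As an illustrative sketch only (the function and parameter names below are assumptions for illustration, not the patent's actual implementation), the generation pipeline just described can be expressed as:

```python
import torch

@torch.no_grad()
def stylize_face(image, encoder, decoder, render, camera_pose):
    """Sketch of the image generation method: encode a 2D face image,
    decode it into three-dimensional style face data, then render a
    style face image at the requested viewing angle."""
    z = encoder(image)                    # face feature vector
    style_3d = decoder(z)                 # three-dimensional style face data
    return render(style_3d, camera_pose)  # style image at that viewing angle
```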
In a seventh aspect, the present application further provides an image generating apparatus. The device comprises:
the face image acquisition module is used for acquiring a face image;
the feature vector coding module is used for coding the face image into a face feature vector through a pre-trained image coder;
the feature vector decoding module is used for decoding the face feature vector through a pre-trained image decoder to obtain three-dimensional style face data of the face image;
the style image rendering module is used for acquiring a view angle parameter value, and rendering the style face image of the face image under the view angle corresponding to the view angle parameter value according to the view angle parameter value and the three-dimensional style face data; the image decoder is obtained by interpolating at least one pair of associated model parameters of a pre-training generator and a stylized generator and then updating the model parameters in the stylized generator by using an interpolation result; the stylized generator is obtained by adjusting a pre-training generator, and the pre-training generator is used for generating three-dimensional face data.
In an eighth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the image generation method described above when the processor executes the computer program.
In a ninth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the image generation method described above.
In a tenth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above-described image generation method.
With the above image generating method, apparatus, computer device, storage medium and computer program product, the face image is encoded into a face feature vector by a pre-trained image encoder, and the face feature vector is decoded by a pre-trained image decoder to obtain three-dimensional style face data of the face image; a style face image of the face image at the viewing angle corresponding to an acquired view angle parameter value can then be rendered from the three-dimensional style face data, so that style face images with multi-view consistency are obtained.
Drawings
FIG. 1 is a diagram of an application environment for a model processing method and an image generation method in some embodiments;
FIG. 2 is a flow diagram of a model processing method in some embodiments;
FIG. 3 is a flow chart illustrating steps for determining model parameter pairs in some embodiments;
FIG. 4 is a flow chart of an image generation method according to other embodiments;
FIG. 5 is an overall system framework diagram of the present application in some embodiments;
FIG. 6 is a block diagram of a grid generator in a corresponding embodiment;
FIG. 7 is a block diagram of a training image encoder in some embodiments;
FIG. 8 is a schematic diagram of 3D stylized results in some embodiments;
FIG. 9 is a schematic diagram of a 3D stylized result in other embodiments;
FIG. 10 is a schematic diagram of 3D stylized results in yet other embodiments;
FIG. 11 is a schematic diagram of an obtained image decoder in some embodiments;
FIG. 12 is a block diagram of a model processing device in some embodiments;
FIG. 13 is a block diagram of an image generation device in some embodiments;
FIG. 14 is an internal block diagram of a computer device in some embodiments;
FIG. 15 is an internal block diagram of a computer device in some embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The embodiment of the application provides a model processing method and an image generating method, which relate to Machine Learning (ML), Computer Vision (CV) and other technologies in artificial intelligence, wherein:
artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation and other directions.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to recognize and measure targets, and further performing graphic processing so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills, and how to reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
The model processing method and the image generating method provided by the embodiments of the present application can be applied to the application environment shown in fig. 1. The terminal 102 and the server 104 may communicate via a network, such as a wired or wireless network. The data storage system may store data that the server 104 needs to process; it may be provided separately, integrated on the server 104, or located on a cloud or other server. The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smart phone, tablet computer, Internet-of-Things device, or portable wearable device; the Internet-of-Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device, and the like; the portable wearable device may be a smart watch, smart bracelet, headset, or the like. A client running a target application provided with an image stylization function may be installed on the terminal 102. The form of the target application is not limited: it may be a parent application running in an operating system, a child application (for example, an applet) running in a parent application, or a web page. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The server 104 may be a background server of the target application, providing background services for the target application.
The model processing method provided by the embodiments of the present application aims to train an image encoder and an image decoder, and the image generating method of the present application aims to stylize a two-dimensional face image through the obtained image encoder and image decoder to obtain three-dimensional style face data, so that face images at specified viewing angles can be rendered from the three-dimensional style face data, yielding style face images with multi-view consistency. In a specific application, the execution subject of each step in the model processing method and the image generating method provided in the embodiments of the present application may be a computer device, which refers to an electronic device with data computing, processing and storage capabilities. Taking the application environment shown in fig. 1 as an example, the terminal 102 may perform the model processing method or the image generating method alone, the server 104 may perform the method alone, or the terminal 102 and the server 104 may perform the method in an interactive and coordinated manner, which is not limited in this application.
Optionally, the server 104 may train an image encoder using a first face image sample and a pre-training generator; adjust the pre-training generator based on a second face image sample to obtain a stylized generator; determine at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs; interpolate each model parameter pair; and update the model parameters in the stylized generator that belong to the interpolated pairs by using the interpolation results, obtaining an image decoder. The server 104 may then deliver the image encoder and the image decoder to a terminal, so that when the terminal obtains a face image, it encodes the face image into a face feature vector through the image encoder, decodes the face feature vector through the image decoder to obtain three-dimensional style face data of the face image, and renders, according to an obtained view angle parameter value and the three-dimensional style face data, a style face image of the face image at the viewing angle corresponding to that parameter value.
In some embodiments, as shown in fig. 2, a model processing method is provided, where the method is performed by a computer device, and the computer device may be the server 104 or the terminal 102 in fig. 1, and in this embodiment, the application of the method to the server 104 in fig. 1 is described as an example, where the method includes the following steps:
step 202, a pre-training generator is obtained.
The pre-training generator is used for acquiring a face feature vector and generating three-dimensional face data according to the face feature vector, and the three-dimensional face data is used for rendering a face image according to a specified viewing angle. The pre-training generator is the generator in a pre-trained generative adversarial network, which further comprises a pre-training discriminator; the pre-training generator and the pre-training discriminator are obtained through adversarial training. The face feature vector acquired by the pre-training generator can be obtained by random sampling from the latent space corresponding to the pre-training generator. Three-dimensional face data is a three-dimensional representation of face features. Optionally, the three-dimensional face data may be an implicit representation of the face image, for example a three-dimensional neural radiance field of the face image. Here, a neural radiance field (Neural Radiance Field, NeRF) is an approach to novel view synthesis implemented with multi-layer perceptrons (Multi-Layer Perceptrons, MLPs). Optionally, the three-dimensional face data may also be an explicit representation of the face image, for example a voxel grid of the face image. Optionally, the three-dimensional face data may also be a hybrid explicit and implicit representation of the face image. The present application does not limit the specific form of the three-dimensional face data.
In particular, the server may obtain a pre-trained generator. The pre-training generator has the capability of generating three-dimensional face data and has a corresponding latent space. When a face feature vector is sampled from the latent space and input into the pre-training generator, the pre-training generator can generate three-dimensional face data; given a view angle parameter value, a face image at the corresponding viewing angle can be rendered from the three-dimensional face data, so that face images with multi-view consistency can be obtained. The view angle parameter value here is a parameter describing the viewing angle, and may specifically be the camera parameters required for rendering.
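For illustration, the interface of such a pre-trained 3D-aware generator can be sketched as below; the toy mapping/synthesis networks and the placeholder renderer are assumptions (a real 3D GAN would use, e.g., NeRF volume rendering), not the patent's actual implementation:

```python
import torch
import torch.nn as nn

class Pretrained3DGenerator(nn.Module):
    """Toy sketch: a face feature vector z is decoded into a 3D face
    representation, which is rendered under given camera parameters."""

    def __init__(self, z_dim=512):
        super().__init__()
        # mapping network: z -> intermediate feature vector w
        self.mapping = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                     nn.Linear(512, 512))
        # synthesis network: w -> (toy) three-dimensional face data
        self.synthesis = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                                       nn.Linear(512, 3 * 32 * 32))

    def forward(self, z, camera_pose):
        w = self.mapping(z)
        face_3d = self.synthesis(w)        # three-dimensional face data
        return self.render(face_3d, camera_pose)

    def render(self, face_3d, camera_pose):
        # stand-in for volume rendering at the specified viewing angle
        return face_3d.view(-1, 3, 32, 32) * camera_pose.view(-1, 1, 1, 1)

z = torch.randn(1, 512)                    # sampled from the latent space
image = Pretrained3DGenerator()(z, torch.tensor([1.0]))  # toy camera parameter
```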
In a specific application, the pre-training generator acquired by the server may have been pre-trained by the server itself: after training the generative adversarial network, the server may store it, so that when executing the model processing method of the present application it can obtain the pre-training generator from the stored generative adversarial network. In other embodiments, the pre-training generator may also be obtained by the server from other computer devices, i.e. the computer device that trains the generative adversarial network may be a device other than the server. The generative adversarial network can adopt an existing 3D GAN model structure, and the present application does not limit the model structure. The pre-training of the generative adversarial network can adopt an existing 3D GAN training method, and the present application does not limit the training method.
Alternatively, the pre-trained generative adversarial network may be obtained by training on the FFHQ (Flickr-Faces-HQ) dataset. FFHQ was created by NVIDIA as a benchmark for generative adversarial networks (GANs) and is the training dataset of StyleGAN. It is an open-source, high-quality face dataset containing 70,000 high-definition PNG face images at 1024x1024 resolution, with rich and diverse ages, image backgrounds and facial attributes.
Step 204, a first face image sample is obtained, and based on the first face image sample and the pre-training generator, an image encoder is trained, wherein the image encoder is used for encoding the input face image into face feature vectors when the face image is input.
The trained image encoder is used for encoding an input face image into a face feature vector; the face feature vector is a feature vector in the latent space of the pre-training generator and can be decoded by the pre-training generator to obtain three-dimensional face data. That is, the purpose of training the image encoder is to give it the ability to encode a face image into the latent space of the pre-training generator, so that the pre-training generator can generate three-dimensional face data for a given face image.
Specifically, the server acquires a first face image sample, inputs it into the encoder to be trained to obtain an encoded feature vector, reconstructs the first face image sample through the pre-training generator and a given view parameter value, determines the loss of the training process based on the reconstructed face image, and adjusts the model parameters of the encoder based on the loss; when training ends, the image encoder is obtained. This training is a GAN inversion process, whose aim is to train an image encoder, using the pre-training generator, that encodes the input image into the latent space of the generative adversarial network, so that the pre-training generator can reconstruct the input image using the model prior.
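A minimal sketch of this GAN inversion step, assuming the generator interface sketched above and a pixel-wise reconstruction loss (one common choice; the patent does not fix a specific loss):

```python
import torch
import torch.nn as nn

def train_encoder(encoder, generator, samples, lr=1e-4):
    """GAN inversion sketch: train the encoder so that the frozen
    pre-training generator reconstructs each first face image sample
    from the encoded face feature vector."""
    for p in generator.parameters():        # the generator stays frozen
        p.requires_grad_(False)
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                  # pixel reconstruction loss
    for image, pose in samples:             # (sample, view parameter) pairs
        z = encoder(image)                  # encode into the latent space
        reconstructed = generator(z, pose)  # reconstruct the input image
        loss = loss_fn(reconstructed, image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder
```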
Optionally, the first face image sample may be a real face image, and when training is performed by using the real face image as a sample, interference of other non-face feature factors on the image reconstruction process may be avoided, so that the image encoder obtained by training is more accurate.
Step 206, obtaining a second face image sample with a preset image style, and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator.
The second face image sample is used as a training sample to adjust the pre-training generator so that it gains the capability of stylizing images; the adjusted pre-training generator is the stylized generator. The second face image sample has a preset image style, i.e. an image style specified in advance. The image style refers to the artistic effect presented by an image. Optionally, the preset image style may be any one of a cartoon style, a retro style, a comic style, and the like. The present application does not limit what the preset image style specifically is.
Specifically, the server may obtain a second face image sample having a preset image style, adjust model parameters of the pre-training generator based on the second face image sample, and use the adjusted pre-training generator as a stylized generator.
In a specific application, the server may perform stylization processing on a real face image to obtain a second face image sample with the preset image style. Optionally, the server may stylize the real face image using any one of style-transfer-based image stylization, image-translation-based image stylization, or StyleGAN-based image stylization. Style-transfer-based image stylization extracts style features from a reference style image through a deep neural network, extracts content features from an input content image, and fuses the extracted style and content features to stylize the input image. Image-translation-based stylization requires supervised training with a large number of image pairs from the real image domain and the style image domain, so that the network learns the style prior of the style images and translates an input image into a style image, thereby realizing stylization.
Step 208, at least one pair of associated model parameters is determined from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs.
The stylized generator is obtained by adjusting model parameters of the pre-training generator, so that the stylized generator and the pre-training generator have the same model structure, and the associated model parameters are model parameters in the same position in the pre-training generator and the stylized generator. It will be appreciated that for any one of the model parameters in the pre-training generator, there are associated model parameters in the stylized generator, and likewise, for any one of the model parameters in the stylized generator, there are associated model parameters in the pre-training generator.
In particular, the server may determine at least a portion of the model parameters from the pre-training generator and, for each of these model parameters, determine the associated model parameter from the stylized generator, each two associated model parameters constituting a model parameter pair. In other embodiments, the server may determine at least a portion of the model parameters from the stylized generator and, for each of these, determine the associated model parameter from the pre-training generator. Optionally, "at least a portion of the model parameters" may be some of the model parameters; alternatively, it may be all of the model parameters of the model.
In some embodiments, the network structure of the stylized generator is consistent with the pre-training generator; the pre-training generator and the stylized generator each comprise a plurality of network layers, and a model parameter pair consists of model parameters in the same network layer of the pre-training generator and the stylized generator. In particular, since the network structure of the stylized generator is consistent with the pre-training generator, the server may first determine one or more network layers at corresponding positions in the stylized generator and the pre-training generator, and combine the associated model parameters in those corresponding network layers into model parameter pairs. By first determining the network layers at corresponding positions, model parameter pairs can be determined quickly and accurately, improving interpolation efficiency and accuracy.
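Because the two generators share one network structure, associated parameters can be paired by matching positions (for example, state-dict keys); a minimal sketch with assumed names:

```python
def build_parameter_pairs(pretrained_gen, stylized_gen, layer_prefixes=None):
    """Pair model parameters sitting at the same position in the
    pre-training generator and the stylized generator. If layer_prefixes
    is given, only parameters in those network layers are paired."""
    pre = dict(pretrained_gen.named_parameters())
    sty = dict(stylized_gen.named_parameters())
    pairs = {}
    for name, p_pre in pre.items():
        if layer_prefixes and not any(name.startswith(l) for l in layer_prefixes):
            continue
        pairs[name] = (p_pre, sty[name])  # one associated model parameter pair
    return pairs
```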
Step 210, interpolating each model parameter pair, and updating the model parameters in the stylized generator that belong to the interpolated model parameter pairs with the interpolation results, to obtain an image decoder.
The image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data, i.e. three-dimensional face data with the preset image style. The three-dimensional style face data can be used to render a style face image with the preset image style according to a specified viewing angle. Interpolation refers to constructing a continuous function on the basis of discrete data such that the continuous curve passes through all the given discrete data points. The interpolation method may be, for example, any of linear interpolation, Lagrange interpolation, Newton interpolation, spline interpolation, and the like.
Specifically, the server interpolates each model parameter pair to obtain an interpolation result for each pair, then updates the model parameters of those pairs in the stylized generator with the interpolation results, and uses the updated stylized generator as the image decoder. The image decoder can cooperate with the trained image encoder to realize 3D stylization of a face image: after a face image is input into the image encoder, the image encoder encodes it into a face feature vector, the face feature vector is input into the image decoder, and decoding by the image decoder yields the three-dimensional style face data.
In some embodiments, the server interpolating each model parameter pair may specifically be: for each model parameter pair, linearly summing the model parameters in that pair to obtain the interpolation result of that pair. Specifically, for each pair, the server multiplies each of the two model parameters by its respective linear-sum weight and then sums them, where each weight is less than 1 and the two weights sum to 1.
In some embodiments, the model parameter pairs determined by the server from the pre-training generator and the stylized generator are a part of the model parameters of the two generators, and the server may interpolate each pair as follows: set the weight of the pre-training generator's parameter in each pair to 1 and the weight of the stylized generator's parameter to 0, then multiply the parameters in each pair by their weights and sum them to obtain the interpolation result. In other embodiments, the server may set, for one part of the pairs, the pre-training generator's weight to 1 and the stylized generator's weight to 0, and for the other part, the pre-training generator's weight to 0 and the stylized generator's weight to 1, then multiply and sum as before. This amounts to a binary (0/1) interpolation of the model parameters; the calculation is relatively simple, which improves interpolation efficiency.
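Continuing the sketch above, linear interpolation with weights summing to 1 (the 0/1 weighting just described is the special case alpha = 0, which restores the pre-trained weights for the selected pairs):

```python
import torch

@torch.no_grad()
def interpolate_pairs(pairs, alpha):
    """Blend each model parameter pair and write the result back into the
    stylized generator (which thereby becomes the image decoder).
    alpha is the stylized parameter's weight; (1 - alpha) is the
    pre-trained parameter's weight, so the two weights sum to 1."""
    for _, (p_pre, p_sty) in pairs.items():
        p_sty.copy_(alpha * p_sty + (1.0 - alpha) * p_pre)
```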
In the above model processing method, a pre-training generator is obtained which can generate three-dimensional face data from a face feature vector, the three-dimensional face data being used for rendering a face image according to a specified viewing angle. An image encoder can be trained based on the first face image sample and the pre-training generator, so that an input face image is encoded into a face feature vector by the image encoder. The pre-training generator is adjusted based on the second face image sample having the preset image style to obtain a stylized generator; at least one pair of associated model parameters is determined from the model parameters of the pre-training generator and the stylized generator to obtain model parameter pairs; each pair is interpolated, and the model parameters in the stylized generator that belong to the interpolated pairs are updated with the interpolation results to obtain an image decoder. When the image decoder decodes the face feature vector output by the image encoder, the resulting three-dimensional face data retains the feature information of the input face image while carrying the style provided by the stylized generator, so the method can be applied to three-dimensional face stylization: three-dimensional style face data of a face image can be obtained, and style face images of the face image at specified viewing angles can be rendered from it.
In some embodiments, as shown in fig. 3, determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator comprises:
step 302, screening at least one network layer from a plurality of network layers of a stylized generator; among the plurality of network layers of the stylized generator, there is also a network layer after at least one network layer.
Among the multiple network layers of the stylized generator, there are other network layers after at least one network layer is screened, i.e. the screened network layer is not the last network layer of the stylized generator.
The features extracted by shallow network layers are closer to the input image and contain more pixel-level information of the input image, mainly fine-grained information such as color, texture, edge and corner information. The features extracted by deep network layers are closer to the model output and contain more abstract, semantic, coarse-grained information; the deep layers are mainly responsible for the stylization processing. During interpolation, the shallow layers are expected to better extract the input features while the deep layers better perform stylization, so the server can screen at least one layer from the shallow layers of the stylized generator as the layers to be interpolated subsequently, and leave the deep layers uninterpolated to better preserve the stylization effect.
Optionally, the server may screen out the first several network layers of the stylized generator as the layers for subsequent interpolation. For example, the first to third layers of the stylized generator are screened out for interpolation, or the first to fifth layers are screened out for interpolation. It can be understood that interpolating different layers yields different stylization effects, so in practice the stylization effect can be adjusted by changing the number of interpolated layers.
Step 304, screening at least one network layer from the plurality of network layers of the pre-training generator according to the positions of the screened network layers among the plurality of network layers of the stylized generator.
Specifically, the server may screen out a corresponding network layer from a plurality of network layers in the pre-training generator according to the position of the network layer screened out from the stylized generator in the stylized generator. For example, assuming that the network layers selected in the stylized generator are first through third layers, the first through third layers are also selected from the pre-training generator.
Step 306, associating the at least one network layer selected from the stylized generator with the at least one network layer selected from the pre-training generator by network level location.
At step 308, at least one pair of model parameters is determined from the associated network layer.
Specifically, the server may associate the network layers screened from the stylized generator with those screened from the pre-training generator according to their positions in the network hierarchy. For example, if the three network layers screened from the stylized generator are the first, third and fifth layers, and the three network layers screened from the pre-training generator are also the first, third and fifth layers, then the two first layers are associated, the two third layers are associated, and the two fifth layers are associated. Further, the server composes the model parameters at corresponding positions in each two associated network layers into model parameter pairs.
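Putting the two previous sketches together, screening the shallow layers and interpolating only there might look as follows (the layer names are purely illustrative; real 3D GANs name their blocks differently):

```python
# Hypothetical naming of the first three (shallow) network layers.
shallow_layers = ["synthesis.block1", "synthesis.block2", "synthesis.block3"]

pairs = build_parameter_pairs(pretrained_gen, stylized_gen,
                              layer_prefixes=shallow_layers)
# alpha = 0 restores the pre-trained weights in the shallow layers;
# the deep layers are left untouched, preserving the stylization effect.
interpolate_pairs(pairs, alpha=0.0)
```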
In the above embodiment, the shallow layers of the stylized generator are interpolated while the model parameters of the deep layers remain unchanged. On one hand, the features extracted by the shallow layers stay closer to the face features, so the face features represented by the decoded three-dimensional style face data are accurate; on the other hand, the decoded three-dimensional style face data better preserves the style effect.
In some embodiments, interpolating each model parameter pair includes: for each model parameter pair, determining an interpolation coefficient according to the network layer in which the pair's model parameters are located; determining, according to the interpolation coefficient, the linear-sum weight corresponding to each model parameter in the pair, where the linear-sum weight corresponding to a model parameter from a network layer of the stylized generator is positively correlated with the network depth of that layer; and linearly summing the model parameters in the pair according to their corresponding weights to obtain the interpolation result of the pair.
Specifically, different interpolation coefficients may be preset for different network layers. After determining the model parameter pairs, the server determines, for each pair, an interpolation coefficient according to the network layer in which the pair's parameters are located; the interpolation coefficient may serve as the linear-sum weight of one of the parameters, and 1 minus the interpolation coefficient serves as the linear-sum weight of the other parameter. Each parameter in the pair is then multiplied by its own weight and the results are summed to obtain the interpolation result of the pair.
Considering that stylization is mainly determined by the features extracted by the deep network, and that the deeper a network layer is, the closer its extracted features are to the output, in order to better preserve the stylization effect the interpolation coefficients can be set so that the linear-sum weight corresponding to a model parameter from a network layer of the stylized generator is positively correlated with the network depth of that layer. Here, positive correlation means that, other conditions unchanged, the two variables change in the same direction: when one variable decreases, the other also decreases. It can be understood that positive correlation here means the directions of change are consistent, but it is not required that whenever one variable changes slightly the other must also change. For example, it may be set that when variable a is 10 to 20, variable b is 100, and when variable a is 20 to 30, variable b is 120. Then a and b change in the same direction: when a becomes larger, b becomes larger; but b may remain unchanged while a is within the range of 10 to 20.
Optionally, when the interpolation coefficient is used as the linear-sum weight of the stylized generator, the interpolation coefficient is set to increase with network depth. Alternatively, when the interpolation coefficient is used as the linear-sum weight of the pre-training generator, the interpolation coefficient is set to decrease with network depth.
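One way to realize this positive correlation is a per-layer coefficient schedule; the linear schedule below is only an illustrative assumption, reusing the helpers sketched earlier:

```python
def depth_schedule(num_layers):
    """Interpolation coefficient per layer: the stylized generator's
    linear-sum weight grows with network depth, so deeper (style-carrying)
    layers keep more of the stylized parameters."""
    return [(depth + 1) / num_layers for depth in range(num_layers)]

for depth, alpha in enumerate(depth_schedule(5)):
    layer = f"synthesis.block{depth + 1}"                # assumed naming
    pairs = build_parameter_pairs(pretrained_gen, stylized_gen, [layer])
    interpolate_pairs(pairs, alpha)                      # deeper => larger alpha
```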
In the above embodiment, since the interpolation coefficient may be determined according to the network layer, and the linear summation weight corresponding to the model parameter of the network layer in the stylized generator is positively correlated with the network depth of the network layer, the contribution of the stylized generator in the interpolation result may be made larger, so that the preset image style may be better maintained in the three-dimensional face data output by the decoder.
In some embodiments, the second face image sample is obtained by a second face image sample generation step, the second face image sample generation step comprising: acquiring a training sample set used in training a pre-training generator; the training sample set comprises a plurality of real face image samples; and performing image stylization on each real face image sample according to a preset image style to generate a second face image sample.
The second face image sample is an image with the preset image style, used for adjusting the pre-training generator so that the generator acquires stylization capability. Considering that manually collecting image samples of the preset image style is too costly, in the present application the samples required for stylization training can be amplified from the training sample set used in training the pre-training generator, obtaining the second face image samples for adjusting the pre-training generator.
Specifically, for each real face image sample included in the training sample set, the server may perform stylization according to the preset image style to generate a second face image sample. Here, the stylization may be implemented with a related-art 2D image stylization method; the present application does not limit the specific stylization method.
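A sketch of this sample amplification, where stylize_2d stands in for any off-the-shelf 2D stylization method (style transfer, image translation, or a StyleGAN-based method); the names are assumptions:

```python
def build_second_samples(training_sample_set, stylize_2d):
    """Generate second face image samples by applying a 2D stylization
    method to each real face image sample in the original training set."""
    return [stylize_2d(real_image) for real_image in training_sample_set]
```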
In the above embodiment, the second face image sample is generated by stylizing the real face image in the training sample set used in training the pre-training generator, so that the cost required for manually collecting the second face image sample can be saved.
In some embodiments, the pre-training generator corresponds to a latent space and is included in a pre-trained generative adversarial network, which further includes a pre-training discriminator. Adjusting the pre-training generator based on the second face image sample to obtain the stylized generator includes: adjusting the model parameters of the pre-training discriminator based on the second face image sample to obtain a parameter-adjusted discriminator; generating, by the pre-training generator, corresponding three-dimensional face data according to a face feature vector sampled from the latent space; determining a first training view angle parameter value, and rendering a style face image according to the first training view angle parameter value and the three-dimensional face data; and performing adversarial training using the style face image and the parameter-adjusted discriminator, so as to adjust the model parameters of the pre-training generator and obtain the stylized generator.
The pre-training generator is included in the pre-trained generative adversarial network and is obtained by adversarial training with the pre-training discriminator, which is also included in the pre-trained generative adversarial network. In this embodiment, the pre-training generator is adjusted so that it has the capability of image stylization; to ensure the stylization effect, adversarial training between the generator and the discriminator can be used.
Specifically, the second face image sample is first input into the pre-training discriminator as a positive sample to obtain a discrimination result; a discrimination loss is determined according to the discrimination result, and the model parameters of the pre-training discriminator are adjusted according to the discrimination loss to obtain the parameter-adjusted discriminator. In a specific application, during the adjustment of the discriminator, a forged style face image may be generated by the pre-training generator and input into the pre-training discriminator as a negative sample to obtain a discrimination result, from which the discrimination loss is determined. The discrimination loss characterizes the difference between the discrimination result and the discrimination label and is positively correlated with that difference: the larger the difference, the larger the loss. A real second face image sample is a positive sample with discrimination label 1; a style face image generated by the generator is a negative sample with discrimination label 0. It can be understood that the discrimination loss here may be calculated with a loss function commonly used for discriminators in generative adversarial networks.
The parameter-adjusted discriminator, having been adjusted with second face image samples of the specific preset image style, has the ability to identify whether an image is a real second face image sample, that is, whether an image is a real stylized image. After the parameter-adjusted discriminator is obtained, the pre-training generator can be adversarially trained against it.
Specifically, the server may sample a face feature vector from the latent space, input it into the pre-training generator, and output three-dimensional face data. Further, the server may determine a first training view angle parameter value, render a style face image according to that value and the three-dimensional face data, and perform adversarial training with the style face image and the parameter-adjusted discriminator, so as to adjust the model parameters of the pre-training generator and obtain the stylized generator. In this way, the pre-training generator, which generates three-dimensional face data, can be trained using two-dimensional images as supervision information. As one possible implementation, the server may randomly determine a view parameter value as the first training view angle parameter value.
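A sketch of this two-stage adversarial fine-tuning, using the standard binary cross-entropy GAN losses as one common choice (the sampling functions and interfaces are assumptions, not the patent's prescribed losses):

```python
import torch
import torch.nn.functional as F

def finetune_stylized(G, D, style_samples, sample_z, sample_pose, steps=1000):
    """Stage 1: adapt discriminator D with real style samples (label 1)
    and generator fakes (label 0). Stage 2: adapt generator G so that its
    rendered style images fool the adapted discriminator."""
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    for step in range(steps):
        fake = G(sample_z(), sample_pose())   # style image at a random view
        real = style_samples[step % len(style_samples)]
        # discriminator step (discrimination loss)
        r, f = D(real), D(fake.detach())
        d_loss = (F.binary_cross_entropy_with_logits(r, torch.ones_like(r)) +
                  F.binary_cross_entropy_with_logits(f, torch.zeros_like(f)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # generator step (stylization loss)
        g = D(fake)
        g_loss = F.binary_cross_entropy_with_logits(g, torch.ones_like(g))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G
```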
In the above embodiment, the model parameters of the pre-training discriminator are first adjusted with the second face image samples having the preset image style; after the parameter-adjusted discriminator is obtained, it is used for adversarial training to adjust the model parameters of the pre-training generator and obtain the stylized generator, so that the three-dimensional face data generated by the stylized generator has a better stylization effect.
In some embodiments, the pre-training generator includes a mapping network and a synthesis network; the mapping network is used for generating an intermediate feature vector from the face feature vector sampled from the latent space, and the synthesis network is used for generating the three-dimensional face data using the intermediate feature vector.
A generator's ability to control visual features with the input vector is very limited: from a macroscopic perspective, the input random vector must follow the probability distribution of the training set. For example, if images of black-haired people are more common in the dataset, more input values will be mapped to that feature. This phenomenon is known as feature entanglement.
In this embodiment, the pre-training generator includes a mapping network and a synthesis network. The mapping network encodes the input face feature vector into an intermediate feature vector; that intermediate vector does not have to follow the training data distribution, which reduces the correlation between features and thereby decouples them.
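A schematic sketch of this split is shown below, in the style of StyleGAN-like generators. The layer widths and the flattened stand-in for the three-dimensional output are assumptions for illustration; the source does not specify the architecture.

```python
import torch
import torch.nn as nn

class PreTrainedGenerator(nn.Module):
    """Mapping network: sampled face feature vector z -> intermediate vector w.
    Synthesis network: w -> (simplified stand-in for) 3D face data."""

    def __init__(self, z_dim=512, w_dim=512):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Linear(z_dim, w_dim), nn.LeakyReLU(0.2),
            nn.Linear(w_dim, w_dim), nn.LeakyReLU(0.2),
        )
        self.synthesis = nn.Sequential(
            nn.Linear(w_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 96 * 96 * 3),   # placeholder for a 3D representation
        )

    def forward(self, z):
        w = self.mapping(z)       # w need not follow the training data distribution
        return self.synthesis(w)  # decoupled features -> 3D face data
```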
In some embodiments, performing adversarial training with the style face image and the parameter-adjusted discriminator to adjust the model parameters of the pre-training generator includes: inputting the style face image into the parameter-adjusted discriminator to obtain a discrimination result; determining the stylization loss of the pre-training generator according to the discrimination result; and, while freezing the model parameters of the mapping network in the pre-training generator, adjusting the model parameters of the synthesis network in the pre-training generator based on the stylization loss.
The discrimination result characterizes whether the image input into the discriminator is a real second face image sample. Optionally, the discrimination result may be the probability that the input image is a real second face image sample; for example, the discrimination result for a certain input image may be a probability value of 0.8. The parameters of the pre-training generator are adjusted through the stylization loss in order to obtain a stylized generator whose output is indistinguishable from real stylized images, that is, a generator whose style face images can successfully "fool" the discriminator. The stylization loss may be computed with a common loss function for generators in generative adversarial networks.
Specifically, the server inputs the style face image produced by the generator into the parameter-adjusted discriminator, which outputs a discrimination result; the stylization loss of the pre-training generator is determined from the difference between the discrimination result and the discrimination label; and, with the model parameters of the mapping network in the pre-training generator frozen, the model parameters of the synthesis network are adjusted according to the stylization loss. Freezing the model parameters of the mapping network means keeping them unchanged throughout parameter adjustment, so that only the model parameters of the synthesis network are adjusted.
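Under the same assumptions as the sketches above, a generator update with the mapping network frozen might look like this; `optimizer_g` is assumed to be constructed over the synthesis-network parameters only.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, render, sample_view,
                   optimizer_g, latent_dim=512, batch=8):
    # Freeze the mapping network so the pre-training and stylized
    # generators keep sharing the same hidden space.
    for p in generator.mapping.parameters():
        p.requires_grad_(False)

    z = torch.randn(batch, latent_dim)
    style_imgs = render(generator(z), sample_view())

    # Non-saturating stylization loss: forged style images should
    # "fool" the parameter-adjusted discriminator.
    stylized_loss = F.softplus(-discriminator(style_imgs)).mean()

    optimizer_g.zero_grad()
    stylized_loss.backward()
    optimizer_g.step()
    return stylized_loss.item()
```

For example, `optimizer_g = torch.optim.Adam(generator.synthesis.parameters(), lr=2e-4)` restricts updates to the synthesis network.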
In some embodiments, the present application may perform multiple rounds of adversarial training: after the model parameters of the pre-training generator are adjusted, the parameter-adjusted discriminator continues to be trained against the parameter-adjusted generator, and the resulting discriminator is then used to adjust the generator's model parameters again, so that after multiple rounds a stylized generator with better performance is obtained. It will be appreciated that in every round the model parameters of the mapping network may be kept fixed and only the model parameters of the synthesis network adjusted.
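Composing the two step functions sketched above gives an illustrative multi-round schedule; the round count is an arbitrary assumption.

```python
def multi_round_fine_tune(generator, discriminator, style_loader,
                          render, sample_view, optimizer_g, optimizer_d,
                          rounds=5):
    for _ in range(rounds):
        for real_style_images in style_loader:
            # Tune the discriminator against the current generator...
            discriminator_step(discriminator, generator, render, sample_view,
                               real_style_images, optimizer_d)
            # ...then tune only the synthesis network against it.
            generator_step(generator, discriminator, render, sample_view,
                           optimizer_g)
    return generator
```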
In the above embodiment, the model parameters of the mapping network in the pre-training generator are fixed during parameter adjustment and only the model parameters of the synthesis network are adjusted, so that the pre-training generator and the resulting stylized generator share the same hidden space, which ensures that the stylization result retains the feature information of the original image.
In some specific embodiments, when the server adjusts the pre-training generator based on the second face image sample to obtain the stylized generator, the process specifically includes the following steps:
1. Obtain a view angle parameter value of the real face image sample corresponding to the second face image sample.
The real face image sample corresponding to the second face image sample refers to the image that was stylized to obtain the second face image sample.
2. Adjust the model parameters of the pre-training discriminator based on the second face image sample and the acquired view angle parameter value to obtain the parameter-adjusted discriminator.
Since the second face image samples are obtained by image-stylizing real face image samples, for each second face image sample the view angle parameter value of the real face image sample used to produce it can serve as the view angle parameter value of that second face image sample. The server can input the view angle parameter value of the second face image sample, together with the sample itself, into the discriminator as a condition, so that the discriminator discriminates not only the authenticity of the input image but also the consistency between the input image and the input view angle parameter value (a schematic conditional discriminator is sketched after this list).
3. Generate corresponding three-dimensional face data with the pre-training generator from face feature vectors sampled from the hidden space.
4. Determine a first training view angle parameter value and render a style face image according to the first training view angle parameter value and the three-dimensional face data.
Optionally, the first training view angle parameter value may match the real face image sample; that is, when rendering the style face image, the view angle parameter value of the real face image sample corresponding to the second face image sample may be used as the first training view angle parameter value.
5. Input the style face image and the first training view angle parameter value into the parameter-adjusted discriminator to obtain a discrimination result.
6. Determine the stylization loss of the pre-training generator according to the discrimination result.
7. With the model parameters of the mapping network in the pre-training generator frozen, adjust the model parameters of the synthesis network in the pre-training generator based on the stylization loss.
With the first training view angle parameter value added, the discriminator performs conditional discrimination: the discrimination result indicates not only whether the input image is a real second face image sample but also whether the input image matches the input first training view angle parameter value. When the stylization loss determined from this result is used to adjust the model parameters of the synthesis network of the pre-training generator, the three-dimensional style face data output by the synthesis network becomes more accurate.
In the above embodiment, the view angle parameter value can be input into the discriminator as a condition, so that the discriminator discriminates not only the authenticity of an image but also the consistency between the image and the view angle, making the adjusted stylized generator more accurate.
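One common way to realize such conditional discrimination is a projection-style discriminator, sketched below. The pose dimensionality (25, as in flattened camera extrinsics plus intrinsics) and the backbone are assumptions; the source only requires that the view angle parameter value enter the discriminator as a condition.

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    def __init__(self, feat_dim=512, pose_dim=25):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        self.pose_embed = nn.Linear(pose_dim, feat_dim)

    def forward(self, image, view_params):
        feat = self.backbone(image)          # image realness features
        cond = self.pose_embed(view_params)  # embedded view condition
        # Projection conditioning: the score reflects both authenticity
        # and image/view-angle consistency.
        return (feat * cond).sum(dim=1, keepdim=True)
```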
In some embodiments, based on the first face image sample and the pre-training generator, training the image encoder includes: inputting the first face image sample into an encoder to be trained, and encoding the first face image sample into a training face feature vector through the encoder to be trained; inputting the training face feature vector into a pre-training generator, and decoding the training face feature vector through the pre-training generator to obtain training three-dimensional face data; acquiring a second training view angle parameter value, and determining a reconstructed face image under a view angle corresponding to the second training view angle parameter value according to the training three-dimensional face data and the second training view angle parameter value; and training the encoder to be trained according to the first face image sample and the reconstructed face image to obtain the image encoder.
The encoder to be trained may be an encoder based on a convolutional neural network (CNN). Before training, it can encode an image but cannot encode it into the hidden space of the pre-trained generative adversarial network. After the training step of this embodiment, generative adversarial network inversion (GAN inversion) is achieved: given a pre-trained generator, an image encoder is trained to encode the input image into the latent space of the generative adversarial network, so that the generator can reconstruct the input image using the model prior.
Specifically, the server inputs the first face image sample into the encoder to be trained, which encodes it into a training face feature vector; the vector is input into the pre-training generator, which decodes it into training three-dimensional face data. The server then obtains a second training view angle parameter value and renders, from the training three-dimensional face data and that parameter value, a reconstructed face image under the corresponding view angle. A loss for the image encoder is determined from the difference between the first face image sample and the reconstructed face image, with the loss positively correlated with that difference, and the parameters of the encoder to be trained are adjusted in the direction of decreasing loss until a training stop condition is met.
The training stop condition may be that the loss reaches a minimum value, the number of training iterations reaches a preset number, or the training duration reaches a preset duration.
Alternatively, the second training view angle parameter value may be a view angle parameter value at a random view angle. Alternatively, it may be the view angle parameter value of the first face image sample input to the encoder to be trained; the server may perform pose estimation on that sample to obtain its view angle parameter value.
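A minimal GAN-inversion training step under these definitions could look as follows. The generator is frozen and only encoder weights are updated; `render` is the assumed view-conditioned renderer, an L1 image loss stands in for the reconstruction loss, and whether the encoder targets the sampled-vector space or the intermediate-vector space is an implementation choice the source does not fix.

```python
import torch

def encoder_step(encoder, generator, render, images, view_params, optimizer_e):
    for p in generator.parameters():
        p.requires_grad_(False)          # only the encoder is trained

    w = encoder(images)                  # training face feature vectors
    face_3d = generator(w)               # training three-dimensional face data
    recon = render(face_3d, view_params) # reconstructed image at that view

    # Loss is positively correlated with the difference between the
    # first face image sample and the reconstructed face image.
    loss = (recon - images).abs().mean()
    optimizer_e.zero_grad()
    loss.backward()
    optimizer_e.step()
    return loss.item()
```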
In some embodiments, the pre-training generator belongs to a pre-trained generative adversarial network that also includes a pre-training discriminator. The server may further input the reconstructed face image into the pre-training discriminator to output a discrimination result, determine a generation loss of the generator from the result, combine the generation loss and the reconstruction loss into a total loss, and train the encoder to be trained according to the total loss to obtain the image encoder. Additionally determining the generation loss with the discriminator can make the resulting image encoder more accurate. It will be appreciated that the generation loss here is positively correlated with the probability, as judged by the discriminator, that the reconstructed face image is not a real sample image.
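A sketch of such a combined loss, assuming the non-saturating form for the generation term and an assumed balancing weight `adv_weight`:

```python
import torch.nn.functional as F

def encoder_total_loss(recon, images, pretrain_disc, adv_weight=1.0):
    recon_loss = (recon - images).abs().mean()
    # Generation loss is small when the pre-training discriminator
    # judges the reconstructed image likely to be a real sample.
    gen_loss = F.softplus(-pretrain_disc(recon)).mean()
    return recon_loss + adv_weight * gen_loss
```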
In the above embodiment, a 2D face image is inverted into the hidden space of a 3D generative adversarial network, and the prior of the 3D network is used to achieve three-dimensional reconstruction from a single 2D image. A reconstructed face image is rendered from the reconstructed three-dimensional face data, so that the encoder to be trained can be trained against the reconstructed face image and the first face image sample, yielding a more accurate image encoder.
In some embodiments, obtaining the second training view angle parameter value comprises: performing pose estimation on the first face image sample through a pre-trained view angle parameter estimation model to obtain the second training view angle parameter value. The above model processing method further comprises: retraining the view angle parameter estimation model according to the first face image sample and the reconstructed face image.
Specifically, in this embodiment the server may perform pose estimation on the first face image sample with the pre-trained view angle parameter estimation model to obtain the second training view angle parameter value, and may then take the loss determined from the difference between the first face image sample and the reconstructed face image as the loss of the view angle parameter estimation model and retrain that model in the direction of decreasing loss. The trained view angle parameter estimation model can perform pose estimation on an input face image to obtain its view angle parameter value.
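A joint step for this variant might look as follows; it reuses the assumed helpers above and simply lets the shared reconstruction loss update both the encoder and the view angle parameter estimation model.

```python
def joint_step(encoder, pose_net, generator, render, images,
               optimizer_e, optimizer_p):
    view = pose_net(images)               # second training view angle parameter value
    w = encoder(images)
    recon = render(generator(w), view)    # reconstructed face image

    loss = (recon - images).abs().mean()  # shared reconstruction loss
    optimizer_e.zero_grad()
    optimizer_p.zero_grad()
    loss.backward()
    optimizer_e.step()                    # retrains the encoder...
    optimizer_p.step()                    # ...and the pose estimation model
    return loss.item()
```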
In some embodiments, as shown in fig. 4, an image generation method is provided. The method is executed by a computer device, which may be the server 104 or the terminal 102 in fig. 1. In this embodiment, the method is described, by way of example, as applied to the terminal 102 in fig. 1, and includes the following steps:
Step 402, a face image is acquired.
Step 404, encoding the face image into a face feature vector by a pre-trained image encoder.
And step 406, decoding the face feature vector through a pre-trained image decoder to obtain the three-dimensional style face data of the face image.
Step 408, obtaining a view angle parameter value, and rendering to obtain a style face image of the face image under the view angle corresponding to the view angle parameter value according to the view angle parameter value and the three-dimensional style face data.
The image decoder is obtained by interpolating at least one pair of associated model parameters of the pre-training generator and the stylized generator and then updating the model parameters in the stylized generator by utilizing an interpolation result; the stylized generator is obtained by adjusting a pre-training generator, and the pre-training generator is used for generating three-dimensional face data.
Specifically, the terminal may acquire a face image and input it into the pre-trained image encoder, which encodes the face image into a face feature vector. The face feature vector is then input into the pre-trained image decoder, which decodes it into three-dimensional style face data of the face image. Once a view angle parameter value is acquired, the style face image of the face image under the corresponding view angle can be rendered from the view angle parameter value and the three-dimensional style face data.
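End to end, the inference path of steps 402 to 408 reduces to three calls; `render` is again the assumed view-conditioned renderer, not a component named by the source.

```python
import torch

@torch.no_grad()
def stylize(face_image, encoder, decoder, render, view_params):
    w = encoder(face_image)               # step 404: face feature vector
    style_3d = decoder(w)                 # step 406: three-dimensional style face data
    return render(style_3d, view_params)  # step 408: style face image at that view
```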
In some embodiments, the image encoder and the image decoder in the above-described image generation method may be obtained by the model processing method of any embodiment of the present application.
In some embodiments, the face image is acquired from a face at a first moment, and acquiring the view angle parameter value includes: acquiring a face image of the face at a second moment, the second moment being after the first moment, and performing pose estimation on the face image acquired at the second moment to obtain the view angle parameter value. In this embodiment, the face image acquired at the first moment provides the basic feature information of the face; after encoding by the image encoder and decoding by the image decoder, three-dimensional style face data of the face are obtained. Whenever a face image of the same face is acquired again at any second moment after the first moment, pose estimation on that image yields a view angle parameter value, and the three-dimensional style face data can be rendered according to it to obtain a style face image under the corresponding view angle.
According to the image generation method, the face image is encoded into a face feature vector by the pre-trained image encoder, and the face feature vector is decoded by the pre-trained image decoder to obtain three-dimensional style face data of the face image, from which style face images under any specified view angle can be rendered.
In some embodiments, a model processing method and an image generation method are provided and described in detail below:
Existing face stylization methods focus mainly on 2D image stylization, which operates only on the 2D image domain and therefore has limited application scenarios. 3D image stylization comprises two steps, three-dimensional reconstruction and stylization of a 2D image; it can generate stylized 3D face representations that are consistent across multiple views, has much wider application scenarios, can be applied to fields such as virtual reality, game character modeling and rendering, and promotes the implementation and development of the metaverse.
Referring to fig. 5, which shows the overall system framework of the present application, the system consists of an image encoder and an image decoder. Given an input 2D face image, a 3D stylization result of the 2D face image can be generated end to end, and the style face images at each view angle in the 3D stylization result retain the feature information of the input face image well.
Referring to figs. 6, 7 and 8, the present application first uses a stylized dataset, such as cartoon avatars, to fine-tune a 3D generative adversarial network pre-trained on the real face dataset FFHQ, so that it can randomly generate stylized 3D avatars. A convolutional neural network (CNN) based encoder is then trained to encode an RGB face image into the latent space of the generative adversarial network, and 3D reconstruction of the input face image is achieved by means of the 3D prior of the 3D generative adversarial network. Finally, model interpolation is performed between the pre-trained 3D generative adversarial network and the stylized generative adversarial network by a model soft interpolation (soft layer mixing) method to obtain the decoder.
Each part is described in detail below:
1. Fine-tuning the pre-trained 3D generative adversarial network
Given a 3D generative adversarial network pre-trained on the FFHQ dataset, the present application first fine-tunes it with a cartoon dataset so that it can generate cartoon avatars. Unlike fine-tuning a 2D generative adversarial network, fine-tuning a 3D one requires a large number of cartoon avatars with different view angles as training data to ensure multi-view consistency of the generated results. However, no large-scale cartoon avatar dataset with view angle diversity exists, and collecting such data manually is too expensive. To solve this problem, the application stylizes the face avatars in the FFHQ dataset with an existing 2D image stylization method, generating cartoon data with different view angles for training.
The specific fine-tuning steps include: adjusting the model parameters of the pre-training discriminator based on the second face image sample to obtain a parameter-adjusted discriminator; generating corresponding three-dimensional face data with the pre-training generator from face feature vectors sampled from the hidden space; determining a first training view angle parameter value and rendering a style face image according to it and the three-dimensional face data; inputting the style face image into the parameter-adjusted discriminator to obtain a discrimination result; determining the stylization loss of the pre-training generator according to the discrimination result; and adjusting the model parameters of the pre-training generator based on the stylization loss.
The pre-training generator includes a mapping network and a synthesis network: the mapping network generates an intermediate feature vector from face feature vectors sampled from the hidden space, and the synthesis network generates three-dimensional face data from the intermediate feature vector. To ensure that the pre-trained model and the fine-tuned stylized model share the same latent space, thereby ensuring that the stylization result retains the feature information of the original image, the weights of the generator's mapping network are fixed during training and only the synthesis network is trained, yielding the stylized generator.
2. Generative adversarial network inversion
Generative adversarial network inversion (GAN inversion) aims, given a pre-trained generator, to train an image encoder that encodes the input image into the latent space of the generative adversarial network, enabling the generator to reconstruct the input image using the model prior. Existing inversion methods mainly focus on reconstructing 2D images with a 2D generative adversarial network; here, an image encoder is trained to invert a 2D image into the hidden space of a 3D generative adversarial network, and the prior of the 3D network is used to reconstruct a single 2D image.
The specific training steps comprise: inputting the first face image sample into an encoder to be trained, and encoding the first face image sample into a training face feature vector through the encoder to be trained; inputting the training face feature vector into a pre-training generator, and decoding the training face feature vector through the pre-training generator to obtain training three-dimensional face data; acquiring a second training view angle parameter value, and determining a reconstructed face image under a view angle corresponding to the second training view angle parameter value according to the training three-dimensional face data and the second training view angle parameter value; and training the encoder to be trained according to the first face image sample and the reconstructed face image to obtain the image encoder.
The generated image encoder may be used to encode the input face image into a face feature vector when the face image is input, and the whole process may be expressed as the following formula (1):
$I_c = G(E(x),\, c)$   (1)

where $x$ is the input face image and $E(\cdot)$ is the convolutional neural network based image encoder, whose role is to encode the input face image into the latent space of the generative adversarial network; $G(\cdot)$ is the pre-training generator of the pre-trained 3D generative adversarial network. Given the input face image $x$ and different camera parameters $c$, the network can render face images $I_c$ with different view angles.
3. Model interpolation
The application adopts a model soft interpolation (soft layer mixing) method to perform model interpolation on a pre-training generator and a stylized generator to obtain an image decoder. The image decoder can be expressed as the following formula (2):
$\theta_D^{(i)} = \alpha_i\, \theta_{G_s}^{(i)} + (1 - \alpha_i)\, \theta_{G}^{(i)}$   (2)

where $\theta_D^{(i)}$ denotes the model weights of the $i$-th layer of the image decoder, $\theta_{G}^{(i)}$ and $\theta_{G_s}^{(i)}$ denote the $i$-th layer weights of the pre-training generator and the stylized generator respectively, and $\alpha_i$ denotes the interpolation coefficient of the $i$-th layer. For model interpolation, the first three to first five layers are generally selected. Through soft interpolation, smooth interpolation between the stylized generator and the pre-training generator is achieved, so that the output of the image decoder retains the feature information of the input face image while carrying the style provided by the stylized generator.
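A sketch of this soft layer mixing in PyTorch, directly mirroring formula (2). The prefix-to-coefficient map (with hypothetical parameter-name prefixes such as "synthesis.0") is an assumption about how the first three to five layers are identified.

```python
import copy
import torch

def build_image_decoder(pretrained_gen, stylized_gen, alphas):
    decoder = copy.deepcopy(stylized_gen)
    pre = dict(pretrained_gen.named_parameters())
    with torch.no_grad():
        for name, param in decoder.named_parameters():
            for prefix, alpha in alphas.items():
                if name.startswith(prefix):
                    # theta_D = alpha * theta_stylized + (1 - alpha) * theta_pretrained
                    param.copy_(alpha * param + (1.0 - alpha) * pre[name])
    return decoder
```

For example, `build_image_decoder(g_pre, g_style, {"synthesis.0": 0.7, "synthesis.2": 0.5})` would mix only the named early layers and leave all other stylized-generator weights untouched.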
The obtained image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data. The reasoning process of the whole model can be formulated as:
$I_c^{s} = D(E(x),\, c)$   (3)

where $I_c^{s}$ is the stylized face image, $x$ is the given input face image, and $D$ is the image decoder obtained by formula (2). Using different camera parameters $c$, the 3D representation output by the model can be rendered into stylized results $I_c^{s}$ from different angles.
To verify the validity of the present application, experiments were performed on a series of cartoon avatar datasets. As shown in fig. 9, the application can reconstruct a multi-view-consistent stylized image from a single input face image. Meanwhile, interpolation can be applied flexibly at different layers of the model, and the degree of stylization of the output is controlled by adjusting the number of interpolated layers. As shown in figs. 10 and 11, performing model interpolation between the pre-training model and the first three layers, or the first five layers, of the stylized model yields different stylization effects. When the first three layers are interpolated, the resulting stylized image has a higher degree of stylization than when the first five layers are interpolated. It can be seen that the deeper the interpolated network layers, the more features of the input image the stylized image retains and the lower its degree of stylization; the shallower the interpolated network layers, the fewer features of the input image are retained and the higher the degree of stylization.
The image encoder and image decoder obtained by the present application can be applied to scenarios such as video calls, video conferencing, animation production, virtual anchors, game character modeling and virtual reality. In video conferencing and similar scenarios, the application can customize virtual participant avatars according to the facial features of the participants; in animation production, it can remove the traditional animated film's dependence on motion capture technology and reduce production costs; in technologies such as game character modeling and virtual anchor generation, it can conveniently provide 3D modeling results in different styles. In virtual reality and the metaverse, the application likewise has broad application scenarios.
Take a video conferencing application scenario as an example. The image encoder and image decoder are integrated in the video conferencing application. When a conference starts, a terminal running the application captures the user's face image at the first moment, inputs it into the image encoder to encode it into a face feature vector, and decodes the vector with the image decoder to obtain the user's three-dimensional style face data. During the conference, after the first moment, face images can be captured in real time; pose estimation on each captured image yields camera parameters, and a style face image corresponding to those parameters is rendered from the camera parameters and the three-dimensional style face data. The style face image is transmitted to the server as a video frame of the video conference, and the server encodes the frame and forwards it to the terminals of the other participants, so that every terminal in the conference displays each participant as a stylized image.
Take a game character modeling scenario as an example. The image encoder and image decoder are deployed in a server. After acquiring a face image for modeling a game character, the server inputs it into the image encoder to encode it into a face feature vector, then inputs the vector into the image decoder to decode it into three-dimensional style face data of the face image. Given the view angle parameter values of several different view angles, multiple style face images with multi-view consistency can then be modeled.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; their order of execution is not necessarily sequential, and they may be executed in turn or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiments of the present application also provide a model processing apparatus for implementing the above-mentioned model processing method, and an image generating apparatus for implementing the above-mentioned image generating method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitations in the embodiments of one or more model processing devices and image generating devices provided below may refer to the limitations of the model processing method and the image generating method hereinabove, and are not repeated herein.
In some embodiments, as shown in fig. 12, there is provided a model processing apparatus 1200 comprising:
a generator acquisition module 1202 for acquiring a pre-training generator; the pre-training generator is used for acquiring the face feature vector, generating three-dimensional face data according to the face feature vector, and rendering a face image according to a specified visual angle;
the encoder training module 1204 is configured to obtain a first face image sample, and train the image encoder based on the first face image sample and the pre-training generator; the image encoder is used for encoding the input face image into a face feature vector when the face image is input;
the generator adjustment module 1206 is configured to obtain a second face image sample with a preset image style, and adjust the pre-training generator based on the second face image sample to obtain a stylized generator;
a parameter pair determining module 1208, configured to determine at least one pair of associated model parameters from model parameters of each of the pre-training generator and the stylization generator, and obtain a pair of model parameters;
a parameter adjustment module 1210, configured to interpolate each model parameter pair, and update model parameters belonging to the model parameter pair for interpolation in the stylized generator by using the interpolation result, so as to obtain an image decoder; the image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data.
With the above model processing apparatus, a pre-training generator is acquired that can generate three-dimensional face data from a face feature vector, the three-dimensional face data being used to render face images at specified view angles. An image encoder is trained based on the first face image sample and the pre-training generator, so that the image encoder encodes an input face image into a face feature vector. The pre-training generator is adjusted based on second face image samples of a preset image style to obtain a stylized generator; at least one pair of associated model parameters is determined from the model parameters of the pre-training generator and the stylized generator to obtain model parameter pairs; each pair is interpolated, and the interpolation results are used to update the corresponding model parameters in the stylized generator, yielding an image decoder. When the face feature vector output by the image encoder is decoded, the resulting three-dimensional style face data retains the feature information of the input face image while carrying the style provided by the stylized generator, so that the apparatus can be applied to three-dimensional image scenarios: three-dimensional style face data is obtained from a face image, and style face images of the face image at specified view angles can be rendered.
In some embodiments, the parameter adjustment module is further configured to: and for each model parameter pair, carrying out linear summation on the model parameters in the corresponding model parameter pair to obtain an interpolation result of the corresponding model parameter pair.
In some embodiments, the network structure of the stylized generator is consistent with the pre-training generator; the pre-training generator and the stylized generator each include a plurality of network layers, and each pair of model parameters consists of model parameters in the same network layer of the pre-training generator and the stylized generator.
In some embodiments, the parameter pair determination module is further configured to: screening at least one network layer from a plurality of network layers of the stylized generator; among the plurality of network layers of the stylized generator, there is also a network layer after at least one network layer; screening at least one network layer from the plurality of network layers of the pre-training generator according to the position of the at least one network layer in the plurality of network layers of the stylized generator; correlating the at least one network layer selected from the stylized generator with the at least one network layer selected from the pre-training generator by network level location; at least one pair of model parameters is determined from the associated network layer.
In some embodiments, the parameter adjustment module is further configured to: determining interpolation coefficients for each model parameter pair according to the network layer where the model parameters in the model parameter pair are located; determining the linear addition weight corresponding to each model parameter in the pair of the targeted model parameters according to the interpolation coefficient; performing linear summation on the model parameters in the targeted model parameter pair according to the corresponding linear summation weights to obtain an interpolation result of the targeted model parameter pair; the linear addition weight corresponding to the model parameters of the network layer in the stylized generator is positively correlated with the network depth of the network layer.
In some embodiments, the apparatus further comprises a sample generation module for: acquiring a training sample set used in training a pre-training generator; the training sample set comprises a plurality of real face image samples; and performing image stylization on each real face image sample according to a preset image style to generate a second face image sample.
In some embodiments, the pre-training generator corresponds to a hidden space and is a generator in a pre-trained generative adversarial network, the generative adversarial network further including a pre-training discriminator. The generator adjustment module is further configured to: adjust the model parameters of the pre-training discriminator based on the second face image sample to obtain a parameter-adjusted discriminator; generate corresponding three-dimensional face data with the pre-training generator from face feature vectors sampled from the hidden space; determine a first training view angle parameter value and render a style face image according to it and the three-dimensional face data; and perform adversarial training with the style face image and the parameter-adjusted discriminator to adjust the model parameters of the pre-training generator and obtain the stylized generator.
In some embodiments, the pre-training generator includes a mapping network for generating intermediate feature vectors from face feature vectors sampled from the hidden space and a synthesis network for generating three-dimensional face data using the intermediate feature vectors. The generator adjustment module is further configured to: input the style face image into the parameter-adjusted discriminator to obtain a discrimination result; determine the stylization loss of the pre-training generator according to the discrimination result; and, while freezing the model parameters of the mapping network in the pre-training generator, adjust the model parameters of the synthesis network in the pre-training generator based on the stylization loss.
In some embodiments, the second face image sample is obtained by image stylizing a real face image sample, and the generator adjustment module is further configured to: obtaining a visual angle parameter value of a real face image sample corresponding to the second face image sample; adjusting model parameters of the pre-training discriminant based on the second face image sample and the acquired view angle parameter values to obtain a discriminant with the parameters adjusted; and inputting the style face image and the first training visual angle parameter value into the parameter-adjusted discriminator to obtain a discrimination result.
In some embodiments, the encoder training module is further configured to input the first face image sample into an encoder to be trained, and encode the first face image sample into a training face feature vector through the encoder to be trained; inputting the training face feature vector into a pre-training generator, and decoding the training face feature vector through the pre-training generator to obtain training three-dimensional face data; acquiring a second training view angle parameter value, and determining a reconstructed face image under a view angle corresponding to the second training view angle parameter value according to the training three-dimensional face data and the second training view angle parameter value; and training the encoder to be trained according to the first face image sample and the reconstructed face image to obtain the image encoder.
In some embodiments, the encoder training module is further configured to perform pose estimation on the first face image sample through a pre-trained view parameter estimation model to obtain a second training view parameter value; and retraining the view angle parameter estimation model according to the first face image sample and the reconstructed face image.
In some embodiments, as shown in fig. 13, there is provided an image generating apparatus 1300, comprising:
a face image acquiring module 1302, configured to acquire a face image;
A feature vector encoding module 1304 for encoding the face image into a face feature vector by a pre-trained image encoder;
the feature vector decoding module 1306 is configured to decode the face feature vector by using a pre-trained image decoder, so as to obtain three-dimensional face data of the face image;
a style image rendering module 1308, configured to obtain a view angle parameter value, and render, according to the view angle parameter value and three-dimensional style face data, a style face image of the face image under a view angle corresponding to the view angle parameter value; the image decoder is obtained by interpolating at least one pair of associated model parameters of the pre-training generator and the stylized generator and then updating the model parameters in the stylized generator by utilizing an interpolation result; the stylized generator is obtained by adjusting a pre-training generator, and the pre-training generator is used for generating three-dimensional face data.
In some embodiments, the face image is obtained by acquiring a face at a first time; the style image rendering module is also used for acquiring a face image obtained by acquiring a face at the second moment; the second moment is after the first moment; and carrying out pose estimation on the face image acquired at the second moment to obtain a visual angle parameter value.
With the above image generating apparatus, the face image is encoded into a face feature vector by the pre-trained image encoder, and the face feature vector is decoded by the pre-trained image decoder to obtain three-dimensional style face data of the face image, from which style face images under specified view angles can be rendered.
The above-described respective modules in the model processing apparatus, the image generating apparatus may be realized entirely or partially by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 14. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing image sample data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a model processing method or an image generating method.
In some embodiments, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a model processing method or an image generating method. The display unit of the computer device forms a visual picture and may be a display screen, a projection device or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structures shown in fig. 14 and 15 are merely block diagrams of partial structures related to the present application and do not constitute a limitation of the computer device to which the present application is applied, and that a specific computer device may include more or less components than those shown in the drawings, or may combine some components, or have different arrangements of components.
In some embodiments, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the model processing method of any of the embodiments described above when the computer program is executed.
In some embodiments, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the model processing method of any of the embodiments described above.
In some embodiments, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of the model processing method of any of the embodiments described above.
In some embodiments, another computer device is provided, comprising a memory having a computer program stored therein and a processor, which when executing the computer program performs the steps of the model processing method of any of the embodiments described above.
In some embodiments, another computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the model processing method of any of the embodiments described above.
In some embodiments, another computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the model processing method of any of the embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (28)

1. A method of model processing, the method comprising:
acquiring a pre-training generator; the pre-training generator is used for acquiring a face feature vector, generating three-dimensional face data according to the face feature vector, and rendering a face image according to a specified visual angle;
acquiring a first face image sample, encoding the first face image sample into a training face feature vector through an encoder to be trained, performing image reconstruction based on the training face feature vector, the pre-training generator and a given view angle parameter value, and training the encoder to be trained based on the obtained reconstructed face image and the first face image sample to obtain an image encoder; the image encoder is used for encoding the input face image into a face feature vector when the face image is input;
Acquiring a second face image sample with a preset image style, and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator;
determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylization generator to obtain a pair of model parameters;
interpolation is carried out on each model parameter pair, and model parameters belonging to the model parameter pair for interpolation in the stylized generator are updated by utilizing interpolation results, so that an image decoder is obtained; the image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data.
2. The method of claim 1, wherein interpolating each pair of model parameters comprises:
and for each model parameter pair, carrying out linear summation on the model parameters in the corresponding model parameter pair to obtain an interpolation result of the corresponding model parameter pair.
3. The method of claim 1, wherein the network structure of the stylized generator is consistent with the pre-training generator, the pre-training generator and the stylized generator each comprising a plurality of network layers, the pair of model parameters being model parameters in the same network layer in the pre-training generator and the stylized generator.
4. A method according to claim 3, wherein said determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator comprises:
screening at least one network layer from a plurality of network layers of the stylized generator; among the plurality of network layers of the stylized generator, there is also a network layer after the at least one network layer;
screening at least one network layer from the plurality of network layers of the pre-training generator according to the position of the at least one network layer in the plurality of network layers of the stylized generator;
correlating the at least one network layer screened from the stylization generator with the at least one network layer screened from the pre-training generator according to the position of the network layer;
at least one pair of model parameters is determined from the associated network layer.
5. A method according to claim 3, wherein said interpolating each pair of model parameters comprises:
determining interpolation coefficients for each model parameter pair according to the network layer where the model parameters in the model parameter pair are located;
determining the linear addition weight corresponding to each model parameter in the pair of the targeted model parameters according to the interpolation coefficient; the linear addition weight corresponding to the model parameters of the network layer in the stylized generator is positively correlated with the network depth of the network layer;
And carrying out linear summation on the model parameters in the aimed model parameter pair according to the corresponding linear summation weights to obtain an interpolation result of the aimed model parameter pair.
6. The method according to claim 1, wherein the second face image sample is obtained by a second face image sample generation step, the second face image sample generation step comprising:
acquiring a training sample set used in training the pre-training generator; the training sample set comprises a plurality of real face image samples;
and performing image stylization on each real face image sample according to a preset image style to generate a second face image sample.
7. The method of claim 1, wherein the pre-training generator corresponds to a hidden space, the pre-training generator being a generator in a pre-trained generated countermeasure network, the generated countermeasure network further comprising a pre-training arbiter; the adjusting the pre-training generator based on the second face image sample to obtain a stylized generator includes:
adjusting model parameters of the pre-training discriminant based on the second face image sample to obtain a discriminant with adjusted parameters;
Generating corresponding three-dimensional face data according to the face feature vectors sampled from the hidden space by the pre-training generator;
determining a first training visual angle parameter value, and rendering according to the first training visual angle parameter value and the three-dimensional face data to obtain a style face image;
and performing countermeasure training by using the style face image and the discriminator with the parameters adjusted so as to adjust model parameters of the pre-training generator and obtain a stylized generator.
8. The method of claim 7, wherein the pre-training generator comprises a mapping network and a synthesis network, the mapping network being configured to generate intermediate feature vectors from face feature vectors sampled from the hidden space, and the synthesis network being configured to generate the three-dimensional face data using the intermediate feature vectors;
and the performing adversarial training using the style face image and the parameter-adjusted discriminator to adjust the model parameters of the pre-training generator comprises:
inputting the style face image into the parameter-adjusted discriminator to obtain a discrimination result;
determining a stylization loss of the pre-training generator according to the discrimination result;
and, with the model parameters of the mapping network in the pre-training generator frozen, adjusting the model parameters of the synthesis network in the pre-training generator based on the stylization loss.
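A sketch of the freeze-and-update step in claim 8. The attribute names `mapping` and `synthesis` follow StyleGAN convention and are assumptions here:

```python
import torch

def stylization_update(G, stylization_loss: torch.Tensor, lr: float = 2e-3):
    # Freeze the mapping network so the hidden space stays aligned with
    # the pre-training generator; only the synthesis network is updated.
    for p in G.mapping.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(G.synthesis.parameters(), lr=lr)
    opt.zero_grad()
    stylization_loss.backward()
    opt.step()
```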
9. The method of claim 8, wherein the second face image sample is obtained by image-stylizing a real face image sample; and the adjusting the model parameters of the pre-training discriminator based on the second face image sample to obtain a parameter-adjusted discriminator comprises:
acquiring a view angle parameter value of the real face image sample corresponding to the second face image sample;
adjusting the model parameters of the pre-training discriminator based on the second face image sample and the acquired view angle parameter value to obtain the parameter-adjusted discriminator;
and the inputting the style face image into the parameter-adjusted discriminator to obtain a discrimination result comprises:
inputting the style face image and the first training view angle parameter value into the parameter-adjusted discriminator to obtain the discrimination result.
10. The method according to any one of claims 1 to 9, wherein the performing image reconstruction based on the training face feature vector, the pre-training generator, and a given view angle parameter value, and the training the encoder to be trained based on the obtained reconstructed face image and the first face image sample to obtain an image encoder, comprise:
inputting the training face feature vector into the pre-training generator, and decoding the training face feature vector through the pre-training generator to obtain training three-dimensional face data;
acquiring a second training view angle parameter value, and determining, from the training three-dimensional face data and the second training view angle parameter value, a reconstructed face image under the view angle corresponding to the second training view angle parameter value;
and training the encoder to be trained according to the first face image sample and the reconstructed face image to obtain the image encoder.
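One reconstruction-based training step for the encoder might look as follows; the pixel-wise L1 loss and the `render` helper are illustrative assumptions, since the claim only requires training the encoder from the first face image sample and its reconstruction:

```python
import torch.nn.functional as F

def encoder_training_step(encoder, G_pre, render, sample, view_value,
                          optimizer):
    w = encoder(sample)                   # training face feature vector
    face_3d = G_pre(w)                    # training three-dimensional face data
    recon = render(face_3d, view_value)   # reconstructed face image
    loss = F.l1_loss(recon, sample)       # assumed reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return recon, loss
```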
11. The method of claim 10, wherein the acquiring a second training view angle parameter value comprises:
performing pose estimation on the first face image sample through a pre-trained view angle parameter estimation model to obtain the second training view angle parameter value;
and the method further comprises:
retraining the view angle parameter estimation model according to the first face image sample and the reconstructed face image.
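The retraining step of claim 11 could, for example, use a pose-consistency objective between the sample and its reconstruction; the claim names the inputs but not the loss, so the choice below is one possible reading:

```python
import torch.nn.functional as F

def refine_pose_estimator(pose_estimator, sample, recon, optimizer):
    # Assumed objective: the estimator should predict the same view angle
    # for the real sample and its reconstruction.
    target = pose_estimator(sample).detach()
    loss = F.mse_loss(pose_estimator(recon), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```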
12. An image generation method, the method comprising:
acquiring a face image;
encoding the face image into a face feature vector by a pre-trained image encoder;
decoding the face feature vector through a pre-trained image decoder to obtain three-dimensional style face data of the face image;
acquiring a view angle parameter value, and rendering according to the view angle parameter value and the three-dimensional style face data to obtain a style face image of the face image under the view angle corresponding to the view angle parameter value;
the image decoder is obtained by interpolating at least one pair of associated model parameters of a pre-training generator and a stylized generator and then updating the model parameters in the stylized generator by using an interpolation result; the stylized generator is obtained by adjusting a pre-training generator, and the pre-training generator is used for generating three-dimensional face data; the image encoder is obtained by training an encoder to be trained based on a first face image sample and a reconstructed face image corresponding to the first face image sample, the reconstructed face image is obtained by performing image reconstruction based on a training face feature vector, the pre-training generator and a given view angle parameter value, and the training face feature vector is obtained by encoding the first face image sample through the encoder to be trained.
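At inference time, the pipeline of claim 12 is a straight encode-decode-render chain; a sketch, with `render` again an assumed helper:

```python
import torch

@torch.no_grad()
def generate_style_face(face_image, encoder, decoder, render, view_value):
    w = encoder(face_image)              # face feature vector
    style_3d = decoder(w)                # three-dimensional style face data
    return render(style_3d, view_value)  # style face image at that view
```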
13. The method of claim 12, wherein the face image is obtained by capturing a face at a first time; and the acquiring a view angle parameter value comprises:
acquiring a face image obtained by capturing the face at a second time, the second time being after the first time;
and performing pose estimation on the face image acquired at the second time to obtain the view angle parameter value.
14. A model processing apparatus, characterized in that the apparatus comprises:
the generator acquisition module is used for acquiring a pre-training generator; the pre-training generator is used for acquiring a face feature vector, generating three-dimensional face data according to the face feature vector, and rendering a face image according to a specified view angle;
the encoder training module is used for acquiring a first face image sample, encoding the first face image sample into a training face feature vector through an encoder to be trained, performing image reconstruction based on the training face feature vector, the pre-training generator, and a given view angle parameter value, and training the encoder to be trained based on the obtained reconstructed face image and the first face image sample to obtain an image encoder; the image encoder is used for encoding an input face image into a face feature vector;
the generator adjustment module is used for acquiring a second face image sample having a preset image style and adjusting the pre-training generator based on the second face image sample to obtain a stylized generator;
the parameter pair determining module is used for determining at least one pair of associated model parameters from the model parameters of each of the pre-training generator and the stylized generator to obtain model parameter pairs;
and the parameter adjustment module is used for interpolating each model parameter pair and updating, with the interpolation results, the model parameters in the stylized generator that belong to the interpolated model parameter pairs, to obtain an image decoder; the image decoder is used for decoding the face feature vector output by the image encoder to obtain three-dimensional style face data.
15. The apparatus of claim 14, wherein the parameter adjustment module is further configured to: for each model parameter pair, linearly sum the model parameters in the pair to obtain an interpolation result of the pair.
16. The apparatus of claim 14, wherein the network structure of the stylized generator is consistent with that of the pre-training generator, the pre-training generator and the stylized generator each comprise a plurality of network layers, and each model parameter pair consists of model parameters in the same network layer of the pre-training generator and the stylized generator.
17. The apparatus of claim 16, wherein the parameter pair determining module is further configured to: screen at least one network layer from the plurality of network layers of the stylized generator, wherein among the plurality of network layers of the stylized generator at least one further network layer follows the screened network layers; screen at least one network layer from the plurality of network layers of the pre-training generator according to the positions of the screened network layers within the plurality of network layers of the stylized generator; associate the network layers screened from the stylized generator with the network layers screened from the pre-training generator according to network layer position; and determine at least one model parameter pair from the associated network layers.
18. The apparatus of claim 16, wherein the parameter adjustment module is further configured to: for each model parameter pair, determine an interpolation coefficient according to the network layer in which the model parameters of the pair are located; determine, according to the interpolation coefficient, a linear summation weight for each model parameter in the pair; and linearly sum the model parameters in the pair according to their respective linear summation weights to obtain an interpolation result of the pair; wherein the linear summation weight of a model parameter from a network layer of the stylized generator is positively correlated with the network depth of that layer.
19. The apparatus of claim 14, further comprising a sample generation module configured to: acquire the training sample set used in training the pre-training generator, the training sample set comprising a plurality of real face image samples; and perform image stylization on each real face image sample according to a preset image style to generate the second face image samples.
20. The apparatus of claim 14, wherein the pre-training generator corresponds to a hidden space, the pre-training generator is a generator in a pre-trained generative adversarial network, and the generative adversarial network further comprises a pre-training discriminator; and the generator adjustment module is further configured to: adjust model parameters of the pre-training discriminator based on the second face image sample to obtain a parameter-adjusted discriminator; generate, through the pre-training generator, corresponding three-dimensional face data from face feature vectors sampled from the hidden space; determine a first training view angle parameter value, and render a style face image according to the first training view angle parameter value and the three-dimensional face data; and perform adversarial training using the style face image and the parameter-adjusted discriminator so as to adjust the model parameters of the pre-training generator and obtain the stylized generator.
21. The apparatus of claim 20, wherein the pre-training generator comprises a mapping network and a synthesis network, the mapping network being configured to generate intermediate feature vectors from face feature vectors sampled from the hidden space, and the synthesis network being configured to generate the three-dimensional face data using the intermediate feature vectors; and the generator adjustment module is further configured to: input the style face image into the parameter-adjusted discriminator to obtain a discrimination result; determine a stylization loss of the pre-training generator according to the discrimination result; and, with the model parameters of the mapping network in the pre-training generator frozen, adjust the model parameters of the synthesis network in the pre-training generator based on the stylization loss.
22. The apparatus of claim 21, wherein the second face image sample is obtained by image-stylizing a real face image sample; and the generator adjustment module is further configured to: acquire a view angle parameter value of the real face image sample corresponding to the second face image sample; adjust the model parameters of the pre-training discriminator based on the second face image sample and the acquired view angle parameter value to obtain the parameter-adjusted discriminator; and input the style face image and the first training view angle parameter value into the parameter-adjusted discriminator to obtain the discrimination result.
23. The apparatus of any one of claims 14 to 22, wherein the encoder training module is further configured to: input the training face feature vector into the pre-training generator, and decode the training face feature vector through the pre-training generator to obtain training three-dimensional face data; acquire a second training view angle parameter value, and determine, from the training three-dimensional face data and the second training view angle parameter value, a reconstructed face image under the view angle corresponding to the second training view angle parameter value; and train the encoder to be trained according to the first face image sample and the reconstructed face image to obtain the image encoder.
24. The apparatus of claim 23, wherein the encoder training module is further configured to: perform pose estimation on the first face image sample through a pre-trained view angle parameter estimation model to obtain the second training view angle parameter value; and retrain the view angle parameter estimation model according to the first face image sample and the reconstructed face image.
25. An image generation apparatus, the apparatus comprising:
the face image acquisition module is used for acquiring a face image;
the feature vector coding module is used for coding the face image into a face feature vector through a pre-trained image coder;
the feature vector decoding module is used for decoding the face feature vector through a pre-trained image decoder to obtain three-dimensional style face data of the face image;
the style image rendering module is used for acquiring a view angle parameter value and rendering, according to the view angle parameter value and the three-dimensional style face data, a style face image of the face image under the view angle corresponding to the view angle parameter value; the image decoder is obtained by interpolating at least one pair of associated model parameters of a pre-training generator and a stylized generator and then updating the model parameters in the stylized generator with the interpolation results; the stylized generator is obtained by adjusting the pre-training generator, and the pre-training generator is used for generating three-dimensional face data; the image encoder is obtained by training an encoder to be trained based on a first face image sample and a reconstructed face image corresponding to the first face image sample, the reconstructed face image is obtained by performing image reconstruction based on a training face feature vector, the pre-training generator, and a given view angle parameter value, and the training face feature vector is obtained by encoding the first face image sample through the encoder to be trained.
26. The apparatus of claim 25, wherein the face image is obtained by capturing a face at a first time; and the style image rendering module is further configured to: acquire a face image obtained by capturing the face at a second time, the second time being after the first time; and perform pose estimation on the face image acquired at the second time to obtain the view angle parameter value.
27. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 13.
28. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 13.
CN202311191788.4A 2023-09-15 2023-09-15 Model processing, image generating method, image generating device, computer device and storage medium Active CN116958451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311191788.4A CN116958451B (en) 2023-09-15 2023-09-15 Model processing, image generating method, image generating device, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN116958451A (en) 2023-10-27
CN116958451B (en) 2023-12-26

Family

ID=88462240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311191788.4A Active CN116958451B (en) 2023-09-15 2023-09-15 Model processing, image generating method, image generating device, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN116958451B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112580B (en) * 2021-04-20 2022-03-25 北京字跳网络技术有限公司 Method, device, equipment and medium for generating virtual image

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538221A (en) * 2021-07-21 2021-10-22 Oppo广东移动通信有限公司 Three-dimensional face processing method, training method, generating method, device and equipment
WO2023040609A1 (en) * 2021-09-14 2023-03-23 北京字跳网络技术有限公司 Three-dimensional model stylization method and apparatus, and electronic device and storage medium
CN113989441A (en) * 2021-11-16 2022-01-28 北京航空航天大学 Three-dimensional cartoon model automatic generation method and system based on single face image
CN116579376A (en) * 2022-01-29 2023-08-11 腾讯科技(深圳)有限公司 Style model generation method and device and computer equipment
CN114842123A (en) * 2022-06-28 2022-08-02 北京百度网讯科技有限公司 Three-dimensional face reconstruction model training and three-dimensional face image generation method and device
CN116385827A (en) * 2023-03-27 2023-07-04 中国科学技术大学 Parameterized face reconstruction model training method and key point tag data generation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhangyang Xiong et al., "Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors," arXiv:2302.01162v5, pp. 1-15. *

Also Published As

Publication number Publication date
CN116958451A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
Ning et al. Multi‐view frontal face image generation: a survey
CN111489287B (en) Image conversion method, device, computer equipment and storage medium
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
Fan et al. Unified implicit neural stylization
Wang et al. Self-supervised multiscale adversarial regression network for stereo disparity estimation
CN111862294B (en) Hand-painted 3D building automatic coloring network device and method based on ArcGAN network
CN111696028A (en) Method and device for processing cartoon of real scene image, computer equipment and storage medium
CN116109798B (en) Image data processing method, device, equipment and medium
CN114937115A (en) Image processing method, face replacement model processing method and device and electronic equipment
WO2022205755A1 (en) Texture generation method and apparatus, device, and storage medium
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
CN111489405B (en) Face sketch synthesis system for generating confrontation network based on condition enhancement
CN115170388A (en) Character line draft generation method, device, equipment and medium
CN115100707A (en) Model training method, video information generation method, device and storage medium
Hara et al. Enhancement of novel view synthesis using omnidirectional image completion
Han Texture Image Compression Algorithm Based on Self‐Organizing Neural Network
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116958451B (en) Model processing, image generating method, image generating device, computer device and storage medium
CN116883524A (en) Image generation model training, image generation method and device and computer equipment
CN116385259A (en) Image style migration method based on GAN network
Chen et al. Adaptive visual field multi-scale generative adversarial networks image inpainting base on coordinate-attention
Yan et al. MVoxTi-DNeRF: Explicit Multi-Scale Voxel Interpolation and Temporal Encoding Network for Efficient Dynamic Neural Radiance Field
Zhang et al. VSA-CGAN: An Intelligent Generation Model for Deep Learning Sample Database Construction
US20240193412A1 (en) Multi-dimensional generative framework for video generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant