CN115294423A - Model determination method, image processing method, device, equipment and storage medium


Info

Publication number
CN115294423A
CN115294423A
Authority
CN
China
Prior art keywords
face
image
model
target
virtual
Prior art date
Legal status
Pending
Application number
CN202210975057.8A
Other languages
Chinese (zh)
Inventor
曾豪
丁彧
吕唐杰
范长杰
胡志鹏
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202210975057.8A
Publication of CN115294423A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; face representation

Abstract

The present application provides a model determination method, an image processing method, an apparatus, a device, and a storage medium, and relates to the technical field of image processing. The method comprises the following steps: constructing a first training sample according to a training sample set; training an initial face-changing model according to the first training sample to obtain an intermediate face-changing model; inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors; performing feature alignment on the virtual face feature vectors and the fused face feature vectors generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model; and obtaining a target face-changing model according to the target virtual face reconstruction model and the intermediate face-changing model. By applying the embodiments of the present application, the sense of incongruity that arises after a real face image is migrated to a virtual face image can be avoided.

Description

Model determination method, image processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a model determination method, an image processing method, an apparatus, a device, and a storage medium.
Background
With the development of network and computer technologies, face swapping in images has gradually become a new hotspot in social entertainment. For example, in the field of gaming, a player may replace the face of a target image (e.g., a game character image) with that of a source image (e.g., the player's own image or a favorite star's image) to change the identity of the game character image while preserving its attribute characteristics.
At present, a target face-changing model is obtained by directly training an initial face-changing model on training samples constructed from real face images serving as source image samples and virtual face images serving as target image samples; the target face-changing model then replaces the identity features of a target real face image into a target virtual face image to obtain a face-changed image.
However, when the target face-changing model replaces the identity features of the target virtual face image with those of the target real face image, attribute features of the target real face image such as texture and skin color may also be carried over. Migrating the real face into the virtual face image then produces a sense of incongruity, and it is difficult to ensure that the face style of the face-changed image is consistent with that of the target virtual face image.
Disclosure of Invention
An object of the present application is to provide a model determination method, an image processing method, an apparatus, a device, and a storage medium, which can avoid the sense of incongruity that arises after a real face image is migrated to a virtual face image, thereby ensuring that the face style of the face-changed image is consistent with that of the target virtual face image.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a model determining method, where the method includes:
constructing a first training sample according to a training sample set, wherein the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample;
training an initial face changing model according to the first training sample to obtain an intermediate face changing model, wherein the intermediate face changing model is used for processing a fusion face characteristic vector to obtain a prediction face changing image, and the fusion face characteristic vector comprises an identity characteristic vector of a source image sample and an attribute characteristic vector of a target image sample;
inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors;
performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image;
and obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
In a second aspect, an embodiment of the present application further provides an image processing method, where the method includes:
acquiring a target real face image and a target virtual face image;
inputting the target real face image and the target virtual face image into a target face-changing model respectively to obtain a face-changing image, where the face-changing image includes an identity characteristic of the target real face image and an attribute characteristic of the target virtual face image, and the target face-changing model is obtained by the model determination method of the first aspect.
In a third aspect, an embodiment of the present application further provides a model determining apparatus, where the apparatus includes:
a construction module, configured to construct a first training sample according to a training sample set, where the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample;
the first determining module is used for training an initial face changing model according to the first training sample to obtain an intermediate face changing model, the intermediate face changing model is used for processing a fusion face characteristic vector to obtain a prediction face changing image, and the fusion face characteristic vector comprises an identity characteristic vector of a source image sample and an attribute characteristic vector of a target image sample;
the first input module is used for inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors;
the feature alignment module is used for performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face changing model, correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment, and obtaining a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image;
and the second determining module is used for obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
In a fourth aspect, an embodiment of the present application further provides an image processing apparatus, including:
the acquisition module is used for acquiring a target real face image and a target virtual face image;
a second input module, configured to input the target real face image and the target virtual face image into a target face-changing model respectively, so as to obtain a face-changing image, where the face-changing image includes an identity feature of the target real face image and an attribute feature of the target virtual face image, and the target face-changing model is obtained by the model determination apparatus according to the third aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device is operated, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to execute the steps of the model determining method of the first aspect or the steps of the image processing method of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the model determination method of the first aspect or the steps of the image processing method of the second aspect.
The beneficial effects of the present application are as follows:
the embodiment of the application provides a model determining method, an image processing method, a device, equipment and a storage medium, wherein the method comprises the following steps: constructing a first training sample according to the training sample set; training an initial face changing model according to a first training sample to obtain an intermediate face changing model; inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors; performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model; and obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
By adopting the model determination method provided by the embodiments of the present application, after the intermediate face-changing model is obtained by training with the first training sample, the initial virtual face reconstruction model comprising the initial virtual face reconstruction decoder can be trained based on the intermediate face-changing model. Because the initial virtual face reconstruction model takes the virtual face image samples as input, the virtual face feature vectors and the fused face feature vectors generated by the intermediate face-changing model can be subjected to feature alignment during training, so that the distribution of the virtual face feature vectors input to the initial virtual face reconstruction decoder is consistent with the distribution of the fused face feature vectors. In this way, the finally trained target virtual face reconstruction model not only focuses on the attribute features of the virtual face image, such as texture and skin color, but can also normally decode fused face feature vectors containing real face image information. That is to say, the target face-changing model determined according to the target virtual face reconstruction model and the intermediate face-changing model can ensure that the style of the generated face-changed image accords with the style of the virtual face image, thereby avoiding the sense of incongruity caused after the real face image is migrated to the virtual face image, and improving the authenticity and image quality of the face-changed image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a model determining method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an initial face changing model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a combination of an intermediate face-changing model and an initial virtual face reconstruction model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a target face changing model according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another model determination method provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of another model determination method provided in the embodiment of the present application;
fig. 7 is a schematic flowchart of another model determination method provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a model determining apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In recent years, with the development of human face synthesis technology, face-changing technology has been widely used. Face changing refers to replacing the facial region in a target image with the facial region in a source image, so as to change the identity characteristics (e.g., face shape, eyebrow shape, etc.) of the target image while preserving its attribute characteristics (e.g., head pose, facial expression, etc.).
However, the applicant has found that, because identity features and attribute features in an image have a certain correlation, when a face-changed image is obtained by swapping faces between a source image and a target image, some attribute features in the source image (for example, a player image), such as texture and skin color, may be carried into the face-changed image, so that attributes such as texture and skin color in the face-changed image deviate from those in the target image (for example, a game character image), and the face style of the face-changed image becomes inconsistent with the face style corresponding to the target image.
In view of the above problems, the present application addresses them through the following embodiments. Before explaining the embodiments in detail, an application scenario is described first. The application scenario may specifically be the personalized customization of a game character image; for example, a game character image in a CG (computer graphics) video of a game may be set to a favorite star's image, the player's own image, or the like, and a specific setting process may refer to the examples described below. It should be noted that the technical solution provided by the present application can be applied not only to the field of games but also to fields such as cultural tourism and film and television production, without being limited thereto.
The embodiments mentioned below can be divided into two parts: the first part is the model training stage and the second part is the model application stage. For the first part, a pre-constructed initial face-changing model is combined with an initial virtual face reconstruction model. In one example, a first training sample is used to train the initial face-changing model to obtain an intermediate face-changing model; a virtual face image is then input into the initial virtual face reconstruction model to obtain virtual face features, feature alignment is performed between these virtual face features and the fused face features obtained by inputting the first training sample into the intermediate face-changing model again, a target virtual face reconstruction model is obtained through training, and finally the target face-changing model is obtained from the intermediate face-changing model and the target virtual face reconstruction model. In another example, the initial face-changing model and the initial virtual face reconstruction model are trained together: the first training sample is input into the initial face-changing model while the virtual face image sample is input into the initial virtual face reconstruction model, feature alignment is performed between the fused face features generated while training the initial face-changing model and the virtual face features generated while training the initial virtual face reconstruction model, the intermediate face-changing model and the target virtual face reconstruction model are obtained once the training-stop condition is met, and finally the target face-changing model is obtained from the two. For clarity, the first example is used below for illustration and is not intended to be limiting.
For the second part, after the target face-changing model is obtained, face changing can be performed on an acquired target real face image and target virtual face image to obtain a face-changed image. The texture and skin color in the face-changed image closely match those in the target virtual face image, which avoids incongruity between the face-changed image and the application scenario of the target image. That is, by performing face replacement with the obtained target face-changing model, the authenticity and image quality of the face-changed image obtained by migrating the real face image to the virtual face image can be improved.
The method mentioned in the present application is exemplified below with reference to the accompanying drawings. Fig. 1 is a schematic flowchart of a model determination method according to an embodiment of the present application. As shown in fig. 1, the method may include:
s101, constructing a first training sample according to a training sample set.
The training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample.
For example, in a game scenario, the real face image sample may be an image containing a real face region, such as a player image or a star image, and the virtual face image sample may be an image containing a virtual face region, such as a game character image. It should be noted that the present application does not limit the real face image sample and the virtual face image sample.
It is understood that the source image and the target image in the first training sample have a relationship that allows the identity features in the source image to be migrated to the target image. Their relationship to the real face image samples and virtual face image samples in the training sample set may be as follows: the source image and the target image may both be real face image samples from the training sample set, or the source image may be a real face image sample while the target image is a virtual face image sample. The present application does not limit this.
And S102, training an initial face changing model according to the first training sample to obtain an intermediate face changing model.
The intermediate face-changing model is used for processing the fused face feature vector to obtain a predicted face-changing image, and the fused face feature vector comprises an identity feature vector of a source image sample and an attribute feature vector of a target image sample.
This is explained with reference to fig. 2, which is a schematic structural diagram of an initial face-changing model provided in an embodiment of the present application. As shown in fig. 2, the initial face-changing model 200 includes an initial identity encoder 201, an initial attribute encoder 202, and an initial fuser 203, where the initial identity encoder 201 and the initial attribute encoder 202 are each connected to the initial fuser 203. The initial identity encoder 201 is configured to encode the input source image sample to obtain an identity feature vector, the initial attribute encoder 202 is configured to encode the input target image sample to obtain an attribute feature vector, and the initial fuser 203 is configured to fuse the identity feature vector and the attribute feature vector to obtain a fused face feature vector, from which the predicted face-changed image is obtained.
It can be understood that the training of the initial face-changing model 200 is essentially to modify the learning parameters in the initial identity encoder 201, the initial attribute encoder 202, the initial fuser 203 and the initial face-changing decoder 204 according to the preset loss function.
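To make this structure concrete, the following is a minimal PyTorch sketch of a face-changing model with the four components just described (identity encoder 201, attribute encoder 202, fuser 203, and face-changing decoder 204). The layer choices, feature dimensions, and output resolution are illustrative assumptions of this sketch and are not specified by the present application.

```python
import torch
import torch.nn as nn

class FaceSwapModel(nn.Module):
    """Sketch of the initial face-changing model 200 (layer sizes assumed)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Identity encoder (201): source image -> identity feature vector
        self.identity_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # Attribute encoder (202): target image -> attribute feature vector
        self.attribute_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # Fuser (203): concatenate the two vectors and project to the fused vector
        self.fuser = nn.Linear(2 * feat_dim, feat_dim)
        # Face-changing decoder (204): fused vector -> predicted face-changed image
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, src, tgt):
        id_vec = self.identity_encoder(src)      # E_id(src)
        attr_vec = self.attribute_encoder(tgt)   # D(tgt)
        fused = self.fuser(torch.cat([id_vec, attr_vec], dim=1))
        return self.decoder(fused), fused        # predicted image and fused vector
```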
Training the initial face change model using the first training sample may be expressed as:
$res = G_{human}(E_{id}(src), D(tgt))$
where $G_{human}$ denotes the initial face-changing model; $src$, $tgt$, and $res$ denote the source image sample, the target image sample, and the predicted face-changed image, respectively; $E_{id}$ denotes the identity encoder; and $D$ denotes the attribute encoder.
The training process of the initial face-changing model 200 may be supervised as follows: supervision is performed based on the identity similarity between the predicted face-changed image $res$ and the source image sample $src$, and the attribute similarity between the predicted face-changed image $res$ and the target image sample $tgt$. The identity similarity loss is defined as $L_{id} = 1 - \cos(E_{id}(src), E_{id}(res))$, where $\cos$ denotes the cosine similarity.
The attribute similarity loss is defined as $L_{attr} = \|D(tgt) - D(res)\|_2$, where $\|\cdot\|_2$ denotes the Euclidean distance.
That is, the total loss $L_1$ for training the initial face-changing model is correlated with the identity similarity loss $L_{id}$ and the attribute similarity loss $L_{attr}$; when the total loss $L_1$ satisfies the preset training-stop condition, the intermediate face-changing model is obtained.
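As a sketch of how this supervision could be computed, the following reuses the FaceSwapModel from the earlier sketch; measuring similarity with the model's own encoders and weighting the two terms equally are simplifications of this sketch, not statements of the present application.

```python
import torch
import torch.nn.functional as F

def face_swap_losses(model, src, tgt):
    """Total loss L1 = L_id + L_attr for one batch (equal weighting assumed)."""
    res, _ = model(src, tgt)
    # L_id = 1 - cos(E_id(src), E_id(res))
    l_id = 1 - F.cosine_similarity(
        model.identity_encoder(src), model.identity_encoder(res), dim=1).mean()
    # L_attr = ||D(tgt) - D(res)||_2
    l_attr = torch.norm(
        model.attribute_encoder(tgt) - model.attribute_encoder(res), dim=1).mean()
    return l_id + l_attr
```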
Fig. 3 is a schematic structural diagram of a combination of an intermediate face-changing model and an initial virtual face reconstruction model according to an embodiment of the present application. As shown in fig. 3, the intermediate face-changing model 300 includes an identity encoder 301, an attribute encoder 302, and a fuser 303. It is understood that the above-mentioned initial fusion device 203 corresponds to the fusion device 303, that is, the fusion device 303 is configured to fuse the identity feature vector and the attribute feature vector to obtain a fused facial feature vector, and the intermediate face-changing model 300 may obtain the predicted face-changing image based on the fused facial feature vector.
S103, inputting the virtual face image samples in the training sample set into the initial virtual face reconstruction model to obtain virtual face feature vectors.
Continuing with fig. 3, for clarity the training process of the initial virtual face reconstruction model 30 is described taking one iteration as an example. The initial virtual face reconstruction model 30 includes an initial virtual face reconstruction encoder 305, which is configured to encode the received virtual face image sample (such as a game character image sample) to obtain a virtual face feature vector; the initial virtual face reconstruction model 30 may obtain a reconstructed virtual face image based on the virtual face feature vector.
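A matching sketch of the initial virtual face reconstruction model, with encoder 305 and decoder 306, is given below; the layer choices mirror the FaceSwapModel sketch above and are likewise assumptions.

```python
import torch.nn as nn

class VirtualFaceReconstructor(nn.Module):
    """Sketch of the initial virtual face reconstruction model 30 (sizes assumed)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Virtual face reconstruction encoder (305): image -> feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # Virtual face reconstruction decoder (306): feature vector -> image
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, virtual_img):
        z = self.encoder(virtual_img)   # virtual face feature vector
        return self.decoder(z), z       # reconstructed image and feature vector
```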
And S104, carrying out feature alignment on the virtual face feature vector and the fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model.
The target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image.
Continuing with the description in conjunction with fig. 3, it can be seen that the initial virtual face reconstruction encoder 305 is connected not only to the initial virtual face reconstruction decoder 306 but also to the output of the fuser 303 in the intermediate face-changing model 300. Based on this connection, the constructed first training sample is input into the intermediate face-changing model 300 again, and feature alignment is performed between the fused face feature vector output by the fuser 303 and the virtual face feature vector output by the initial virtual face reconstruction encoder 305, yielding the loss of feature alignment. It is understood that as training proceeds, the initial virtual face reconstruction encoder 305 outputs a plurality of virtual face feature vectors and the fuser 303 outputs a plurality of fused face feature vectors; that is, the virtual face feature vectors and the fused face feature vectors each form a distribution.
The loss of feature alignment mentioned above can be represented by a decision value, which characterizes the probability that the distribution of the virtual face feature vectors is consistent with the distribution of the fused face feature vectors. The process of changing the decision value is the process of correcting the learning parameters in the initial virtual face reconstruction decoder 306 of the initial virtual face reconstruction model 30. When the change in the decision value satisfies the training-stop condition, the target virtual face reconstruction model is obtained. As described above, the initial virtual face reconstruction decoder 306 is configured to process the virtual face feature vector to obtain a reconstructed virtual face image; the target virtual face reconstruction decoder may likewise process the virtual face feature vector to obtain the reconstructed virtual face image.
It is to be understood that the purpose of the feature alignment mentioned here is to make the distribution of the virtual face feature vectors output by the initial virtual face reconstruction encoder 305 in the initial virtual face reconstruction model 30 consistent with the distribution of the fused face feature vectors output by the fuser 303 in the intermediate face-changing model 300. The target virtual face reconstruction model obtained through such training can focus on the attribute features of the virtual face image, such as texture and skin color, and can also normally decode the fused face feature vector containing real face image information and output a face-changed image.
And S105, obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
After the target virtual face reconstruction model is obtained through training, the intermediate face changing model can be modified based on the target virtual face reconstruction model, and the modified intermediate face changing model is obtained. For example, the modified intermediate face-changing model can be directly used as a target face-changing model; for another example, after the modified intermediate face changing model is obtained, the modified intermediate face changing model may be trained by using the real face image sample and the virtual face image sample in the training sample set as the source image sample and the target image sample, respectively, and after the training stopping condition is satisfied, the target face changing model is obtained.
In summary, in the model determination method provided by the present application, after the intermediate face-changing model is obtained by training with the first training sample, the initial virtual face reconstruction model comprising the initial virtual face reconstruction decoder can be trained based on the intermediate face-changing model. Because the initial virtual face reconstruction model takes the virtual face image samples as input, the virtual face feature vectors and the fused face feature vectors generated by the intermediate face-changing model can be subjected to feature alignment during training, so that the distribution of the virtual face feature vectors input to the initial virtual face reconstruction decoder is consistent with the distribution of the fused face feature vectors. The finally trained target virtual face reconstruction model therefore not only focuses on the attribute features of the virtual face image, such as texture and skin color, but can also normally decode fused face feature vectors containing real face image information. That is to say, the target face-changing model determined according to the target virtual face reconstruction model and the intermediate face-changing model can ensure that the style of the generated face-changed image accords with the style of the virtual face image, thereby avoiding the sense of incongruity caused after the real face image is migrated to the virtual face image, and improving the authenticity and image quality of the face-changed image.
Optionally, the intermediate face-changing model includes an intermediate face-changing decoder; the target virtual face reconstruction model comprises a target virtual face reconstruction decoder.
With reference to fig. 2 and fig. 3: as can be seen from fig. 2, the initial face-changing model 200 further includes an initial face-changing decoder 204 connected to the initial fuser 203; the initial fuser 203 fuses the identity feature vector and the attribute feature vector to obtain the fused face feature vector, and the initial face-changing decoder 204 decodes the fused face feature vector to obtain the predicted face-changed image. As can be seen from fig. 3, the intermediate face-changing model 300 further includes an intermediate face-changing decoder 304 corresponding to the initial face-changing decoder 204; it can be understood that the intermediate face-changing decoder 304 decodes the fused face feature vector to obtain the predicted face-changed image.
As described above, the initial virtual face reconstruction model 30 includes an initial virtual face reconstruction encoder 305 connected to an initial virtual face reconstruction decoder 306. The initial virtual face reconstruction encoder 305 is configured to encode the received virtual face image sample (such as a game character image sample) to obtain a virtual face feature vector, and the initial virtual face reconstruction decoder 306 is configured to decode the virtual face feature vector to obtain a reconstructed virtual face image. Based on this, when the change in the decision value corresponding to the loss of feature alignment satisfies the training-stop condition, the target virtual face reconstruction model is obtained through training; that is, the initial virtual face reconstruction decoder 306 becomes the target virtual face reconstruction decoder in the target virtual face reconstruction model.
Further, the obtaining of the target face-changed model according to the target virtual face reconstruction model and the intermediate face-changed model includes: and replacing the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a target face-changing model.
The intermediate face change decoder 304 in the intermediate face change model 300 in fig. 3 may be replaced with a target virtual face reconstruction decoder 401 in a target virtual face reconstruction model, as shown in fig. 4. For example, after replacement, the replaced intermediate face-changing model can be directly used as the target face-changing model; for another example, after the replacement, the real face image sample and the virtual face image sample in the training sample set may be used as the source image sample and the target image sample respectively to train the replaced intermediate face changing model, and after the training stopping condition is met, the target face changing model is obtained.
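A minimal sketch of this replacement step, reusing the model classes sketched earlier (both assumed trained at this point), might look as follows.

```python
# Assumed trained instances of the sketches above.
intermediate_model = FaceSwapModel()
reconstruction_model = VirtualFaceReconstructor()

# Swap the intermediate face-changing decoder (304) for the target virtual face
# reconstruction decoder (401); the result may be used directly or fine-tuned.
intermediate_model.decoder = reconstruction_model.decoder
target_model = intermediate_model
```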
As shown in fig. 3, the initial virtual face reconstruction model 30 includes an initial discriminator 307, which receives as inputs the fused face feature vector output by the fuser 303 in the intermediate face-changing model 300 and the virtual face feature vector output by the initial virtual face reconstruction encoder 305, so as to perform feature alignment between the virtual face feature vector and the fused face feature vector. The target virtual face reconstruction model includes a target virtual face reconstruction decoder.
Fig. 5 is a schematic flowchart of another model determination method according to an embodiment of the present application. As shown in fig. 5, optionally, the performing feature alignment on the virtual face feature vector and the fused face feature vector generated by the intermediate face-changing model, and correcting the initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to a loss of the feature alignment to obtain the target virtual face reconstruction model includes:
s501, inputting the virtual face feature vector and the fused face feature vector into an initial discriminator, and performing feature alignment processing by the initial discriminator to obtain loss of feature alignment.
For example, the virtual facial feature vector and the fused facial feature vector are input to an initial discriminator, and the initial discriminator obtains the loss of feature alignment according to the difference between the first distribution information of the fused facial feature vector and the second distribution information of the virtual facial feature vector.
The initial discriminator determines the distribution information of a feature vector and may represent the distribution of the fused face feature vectors and the distribution of the virtual face feature vectors with labels (e.g., 0 or 1). For example, if the distribution information of the fused face feature vectors is the first distribution information, it may be labeled 0; as long as the difference between the second distribution information of the virtual face feature vectors and the first distribution information does not satisfy the difference condition corresponding to the preset training-stop condition, the initial discriminator labels the distribution information of the virtual face feature vectors as 1. The initial discriminator then obtains the loss of feature alignment based on the difference between the first distribution information and the second distribution information.
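The following sketch shows one way such a discriminator-based alignment could be implemented, with the fused distribution labeled 0 and the virtual distribution labeled 1 as described above. The layer sizes and the binary cross-entropy formulation are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Initial discriminator (307): feature vector -> logit for "virtual" (label 1).
discriminator = nn.Sequential(
    nn.Linear(256, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

def alignment_losses(fused_vec, virtual_vec):
    # Discriminator loss: learn to tell fused (0) from virtual (1) vectors.
    real_logit = discriminator(fused_vec.detach())
    fake_logit = discriminator(virtual_vec.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.zeros_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit)))
    # Feature alignment loss: push the virtual vectors toward the fused
    # distribution by fooling the discriminator into outputting label 0.
    align_logit = discriminator(virtual_vec)
    align_loss = F.binary_cross_entropy_with_logits(
        align_logit, torch.zeros_like(align_logit))
    return d_loss, align_loss
```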
S502, correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of feature alignment, the pixel loss corresponding to the virtual face image sample and the reconstructed virtual face image and the perception loss to obtain a target virtual face reconstruction model.
The pixel loss and the perceptual loss may be determined from the virtual face image sample and the reconstructed virtual face image. The learning parameters in the initial virtual face reconstruction encoder 305, the initial virtual face reconstruction decoder 306, and the initial discriminator may then be modified based on the pixel loss, the perceptual loss, and the loss of feature alignment obtained above.
Assuming the virtual face image sample is $I$ and the reconstructed virtual face image is $R$, the pixel loss $L_{rec}$ is defined as $L_{rec} = \|R - I\|_2$, where $\|\cdot\|_2$ denotes the Euclidean distance.
The perceptual loss $L_p$ is defined as $L_p = \|F(R) - F(I)\|_2$, where $F$ denotes an initial virtual face reconstruction encoder; its specific structure may be a VGG (Visual Geometry Group) network.
That is, the total loss $L_2$ for the initial virtual face reconstruction model is related not only to the pixel loss $L_{rec}$ and the perceptual loss $L_p$ but also to the loss of feature alignment. When the total loss $L_2$ satisfies the preset stop condition, the target virtual face reconstruction model is obtained by training, comprising a target virtual face reconstruction encoder, a target virtual face reconstruction decoder, and a target discriminator. When the target virtual face reconstruction decoder obtained in this way later decodes the fused face feature vector, it can generate a face-changed image in the style corresponding to the virtual face image, for example a game style.
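A sketch of the total loss $L_2$, combining the three terms above, is shown below; `extractor` stands in for the feature network $F$, and the equal weighting of the terms is an assumption of this sketch.

```python
import torch

def total_loss_L2(I, R, extractor, align_loss):
    """Pixel loss + perceptual loss + feature alignment loss (weights assumed)."""
    l_rec = torch.norm((R - I).flatten(1), dim=1).mean()                      # L_rec
    l_p = torch.norm((extractor(R) - extractor(I)).flatten(1), dim=1).mean()  # L_p
    return l_rec + l_p + align_loss
```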
The following examples describe the relationship of source and target image samples in a first training sample to real and virtual facial image samples in a set of training samples.
Optionally, the constructing a first training sample according to the training sample set includes: respectively constructing a source image sample and a target image sample in the first training sample according to the real face image samples in the training sample set, wherein the source image sample and the target image sample are different real face image samples.
As described above, the training sample set includes both real face image samples and virtual face image samples. In one example, pairs of two different real face images in the training sample set form real face image sample groups; a plurality of such groups is obtained according to the preset number of training samples, and each real face image sample group is input into the initial face-changing model for training to obtain the intermediate face-changing model, where the first training sample is any one of the real face image sample groups.
With reference to fig. 2 and fig. 3, take any real face image sample group (the first training sample) as an example, where the group includes real face image sample 1 and real face image sample 2. Real face image sample 1 (the source image sample) is input into the initial identity encoder 201 of the initial face-changing model 200, and real face image sample 2 (the target image sample) is input into the initial attribute encoder 202 for encoding, so as to train the initial face-changing model 200. When the training-stop condition is satisfied, the intermediate face-changing model is obtained, which may be, for example, the intermediate face-changing model 300 shown in fig. 3; for the specific structures of the initial face-changing model 200 and the intermediate face-changing model 300, refer to the relevant descriptions above, which are not repeated here.
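A minimal sketch of constructing such real-real sample groups follows; the random pairing policy is an assumption of this sketch.

```python
import random

def build_real_pairs(real_faces, num_pairs):
    """Form (source, target) groups from two different real face image samples."""
    pairs = []
    for _ in range(num_pairs):
        src, tgt = random.sample(real_faces, 2)  # two distinct real samples
        pairs.append((src, tgt))
    return pairs
```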
It can be understood that the number of real face image samples in the training sample set far exceeds the number of virtual face image samples. Using real face image samples as both the source image sample and the target image sample for training the initial face-changing model can therefore greatly improve the accuracy and robustness of the intermediate face-changing model obtained by training, which in turn improves the accuracy and robustness of the target face-changing model obtained later.
Optionally, the relationship of the source image sample and the target image sample in the first training sample to the real face image sample and the virtual face image sample in the training sample set may also be the following example. For example, a real facial image sample in the training sample set may be used as a source image sample in the first training sample, and a virtual facial image sample may be used as a target image sample in the first training sample. That is to say, an initial face-changing model can be trained by using a training sample constructed by a real face image sample and a virtual face image sample, and when a training stopping condition is met, an intermediate face-changing model is obtained through training.
The following is a specific example of the step of replacing the above-mentioned intermediate face change decoder in the intermediate face change model with the target virtual face reconstruction decoder to obtain the target face change model when the source image sample and the target image sample in the first training sample are both real face image samples in the training sample set.
Fig. 6 is a schematic flow chart of another model determination method provided in the present application. As shown in fig. 6, optionally, the replacing the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a target face-changing model includes:
s601, replacing the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a replaced intermediate face-changing model.
With reference to fig. 2 and 3, after the initial face-changing model 200 in fig. 2 is trained by using the source image sample and the target image sample, both of which are real face image samples, to obtain the intermediate face-changing model 300 shown in fig. 3, the initial virtual face reconstruction model 30 may be trained based on the intermediate face-changing model 300, and after the training of the initial virtual face reconstruction model 30 is completed, the intermediate face-changing decoder 304 in the intermediate face-changing model 300 may be replaced by using the obtained target virtual face reconstruction decoder in the target virtual face reconstruction model, to obtain the replaced intermediate face-changing model.
S602, respectively constructing a source image sample and a target image sample in a second training sample according to a real face image sample and a virtual face image sample in a training sample set, taking the real face image sample as the source image sample, and taking the virtual face image sample as the target image sample.
And S603, inputting the second training sample into the replaced intermediate face changing model, and training to obtain a target face changing model.
For example, the training sample set may be first divided into a plurality of real-virtual facial image sample groups, each real-virtual facial image sample group including one real facial image sample and one virtual facial image sample. And acquiring a plurality of real-virtual face image sample groups according to the number of preset training samples, and inputting each real-virtual face image sample group into the replaced intermediate face changing model for training to obtain a target face changing model, wherein the second training sample is any one real-virtual face image sample group.
Referring to fig. 4, assume that a real-virtual face image sample group includes real face image sample 1 and virtual face image sample 2. Real face image sample 1 is input as the source image sample into the identity encoder 301 of the replaced intermediate face-changing model, and virtual face image sample 2 is input as the target image sample into the attribute encoder 302 for encoding, so as to train the replaced intermediate face-changing model; when the training-stop condition is satisfied, the target face-changing model is obtained by training.
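A sketch of this fine-tuning loop, reusing face_swap_losses and target_model from the earlier sketches, is shown below; real_virtual_pairs, the optimizer choice, and the learning rate are assumptions of this sketch.

```python
import torch

optimizer = torch.optim.Adam(target_model.parameters(), lr=1e-4)
for real_src, virtual_tgt in real_virtual_pairs:  # second training samples
    loss = face_swap_losses(target_model, real_src, virtual_tgt)
    optimizer.zero_grad()
    loss.backward()   # fine-tune the replaced intermediate face-changing model
    optimizer.step()
```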
It can be understood that the source image sample and the target image sample in the first training sample used to train the intermediate face-changing model are both real face image samples, whereas in an actual scenario the face is changed between a real face image and a virtual face image. Therefore, after the replaced intermediate face-changing model is obtained, it is fine-tuned with the second training sample, whose source image sample is a real face image sample and whose target image sample is a virtual face image sample, so that the finally obtained target face-changing model is suited to the actual application scenario, improving its accuracy.
Fig. 7 is a schematic flow chart of another model determination method provided in the present application. As shown in fig. 7, optionally, before the constructing the first training sample according to the training sample set, the method further includes:
s701, obtaining a plurality of preset initial virtual face image samples and a plurality of real face image samples.
A plurality of real face image samples may be obtained from a real face image database, where the real face images contain real face regions, such as player face regions and star face regions.
Taking a game scenario as an example, a plurality of 3D models corresponding to different initial game character image samples (i.e., initial virtual face image samples) may be obtained; for example, 30 such models.
S702, generating a plurality of virtual face image samples according to the expression parameters and the head posture parameters of the initial virtual face image samples.
S703, building a training sample set according to the plurality of virtual face image samples and the plurality of real face image samples.
Continuing the above example, each 3D model corresponds to an expression parameter and a head pose parameter, and the expression parameters and the head pose parameters corresponding to each 3D model may be modified according to a preset modification policy, so that a plurality of virtual face image samples corresponding to each initial game character image sample may be generated. It should be noted that the number of virtual face image samples is not limited in the present application. After obtaining a plurality of virtual face image samples and a plurality of real face image samples, a training sample set can be obtained.
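The sample generation described above might be sketched as follows; render_3d_model, the parameter counts, and the value ranges are hypothetical stand-ins for the actual engine interface and modification policy.

```python
import random

def generate_virtual_samples(models_3d, per_model=100):
    """Render virtual face samples by varying expression and head-pose parameters."""
    samples = []
    for model_3d in models_3d:  # e.g. 30 initial game character models
        for _ in range(per_model):
            expression = [random.uniform(-1.0, 1.0) for _ in range(10)]
            head_pose = [random.uniform(-30.0, 30.0) for _ in range(3)]  # yaw/pitch/roll
            samples.append(render_3d_model(model_3d, expression, head_pose))
    return samples
```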
Therefore, a plurality of virtual face image samples can be quickly obtained on the premise of only acquiring a plurality of preset initial virtual face image samples, and the efficiency of obtaining the target face changing model can be improved.
The following is an example of applying the target face-changing model after it is obtained.
Fig. 8 is a flowchart illustrating an image processing method according to an embodiment of the present application. As shown in fig. 8, the method may include:
s801, acquiring a target real face image and a target virtual face image.
The target real face image and the target virtual face image may be any pictures containing face information. The target real face image may be a picture containing face information of a game player, or may be a picture containing face information of other people, and the target virtual face image may be a picture containing face information of a game character. The target real face image and the target virtual face image may be pre-stored pictures or pictures captured by a camera. The target real face image and the target virtual face image may be a single picture, or may be one of continuous video frames containing face information in the video data. The present application is not limited to this.
S802, inputting the target real face image and the target virtual face image into the target face changing model respectively to obtain a face changing image.
The face-changing image comprises the identity characteristics of the target real face image and the attribute characteristics of the target virtual face image, and the training process of the target face-changing model can be described by referring to the relevant parts.
With reference to fig. 4, the target real face image is input into the identity encoder 301 of the target face-changing model, and the target virtual face image is input into the attribute encoder 302. The identity feature vector output by the identity encoder 301 and the attribute feature vector output by the attribute encoder 302 are fused by the fuser 303 to obtain a fused face feature vector, and the target virtual face reconstruction decoder 401 in the target face-changing model decodes the fused face feature vector to obtain the face-changed image.
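An inference sketch for this step, reusing target_model from the earlier sketches, is given below; target_real_face and target_virtual_face are assumed to be preprocessed image tensors.

```python
import torch

target_model.eval()
with torch.no_grad():
    id_vec = target_model.identity_encoder(target_real_face)        # encoder 301
    attr_vec = target_model.attribute_encoder(target_virtual_face)  # encoder 302
    fused = target_model.fuser(torch.cat([id_vec, attr_vec], dim=1))  # fuser 303
    swapped = target_model.decoder(fused)  # reconstruction decoder 401 output
```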
The face style of the face-changed image obtained in this manner matches the face style of the target virtual face image; if the target virtual face image is a game character image, the face style of the face-changed image is the game style, that is, the attribute characteristics of the face-changed image better match those of the target virtual face image.
Fig. 9 is a schematic structural diagram of a model determining apparatus according to an embodiment of the present application. As shown in fig. 9, the apparatus includes:
the building module 901 is configured to build a first training sample according to a training sample set, where the training sample set includes a real face image sample and a virtual face image sample, and the first training sample includes a source image sample and a target image sample.
The first determining module 902 is configured to train an initial face change model according to a first training sample to obtain an intermediate face change model, where the intermediate face change model is configured to process a fused face feature vector to obtain a predicted face change image, and the fused face feature vector includes an identity feature vector of a source image sample and an attribute feature vector of a target image sample;
a first input module 903, configured to input a virtual face image sample in the training sample set into an initial virtual face reconstruction model, so as to obtain a virtual face feature vector;
a feature alignment module 904, configured to perform feature alignment on the virtual face feature vector and the fused face feature vector generated by the intermediate face-changing model, and correct an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to a loss of the feature alignment, so as to obtain a target virtual face reconstruction model, where the target virtual face reconstruction model is configured to process the virtual face feature vector to obtain a reconstructed virtual face image;
and the second determining module 905 is configured to obtain a target face-changing model according to the target virtual face reconstruction model and the intermediate face-changing model.
Optionally, the intermediate face-changing model includes an intermediate face-changing decoder; the target virtual face reconstruction model comprises a target virtual face reconstruction decoder;
correspondingly, the second determining module 905 is specifically configured to replace the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder, so as to obtain a target face-changing model.
Optionally, the target virtual face reconstruction model includes a target virtual face reconstruction decoder; the initial virtual face reconstruction model comprises an initial discriminator;
correspondingly, the feature alignment module 904 is specifically configured to input the virtual face feature vector and the fused face feature vector into the initial discriminator, and perform feature alignment processing by the initial discriminator to obtain the feature alignment loss; and correct the initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the feature alignment loss, and the pixel loss and perceptual loss between the virtual face image sample and the reconstructed virtual face image, to obtain the target virtual face reconstruction model.
Optionally, the feature alignment module 904 is further specifically configured to input the virtual face feature vector and the fused face feature vector into the initial discriminator, and the initial discriminator obtains the feature alignment loss according to the difference between first distribution information of the fused face feature vector and second distribution information of the virtual face feature vector.
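As a hedged illustration of how the initial discriminator can turn the difference between the two feature distributions into a trainable loss, the following sketch uses a least-squares GAN objective. The embodiment does not fix a particular adversarial formulation, and the equal weighting of the three losses as well as the helper `feat_net` (an assumed pretrained feature extractor for the perceptual loss, e.g. VGG-style) are assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, fused_vec, virtual_vec):
    # The initial discriminator learns to separate the first distribution
    # (fused face feature vectors, label 1) from the second distribution
    # (virtual face feature vectors, label 0).
    real_score = disc(fused_vec.detach())
    fake_score = disc(virtual_vec.detach())
    return (F.mse_loss(real_score, torch.ones_like(real_score)) +
            F.mse_loss(fake_score, torch.zeros_like(fake_score)))

def feature_alignment_loss(disc, virtual_vec):
    # The reconstruction side is corrected so that its virtual face feature
    # vectors become indistinguishable from the fused-feature distribution.
    score = disc(virtual_vec)
    return F.mse_loss(score, torch.ones_like(score))

def decoder_correction_loss(disc, virtual_vec, recon_img, sample_img, feat_net):
    # Total objective for correcting the initial virtual face reconstruction
    # decoder: feature alignment loss, plus pixel loss and perceptual loss
    # between the virtual face image sample and the reconstructed image.
    # feat_net is an assumed pretrained feature extractor, not named in the patent.
    pixel_loss = F.l1_loss(recon_img, sample_img)
    perceptual_loss = F.l1_loss(feat_net(recon_img), feat_net(sample_img))
    return feature_alignment_loss(disc, virtual_vec) + pixel_loss + perceptual_loss
```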
Optionally, the construction module 901 is specifically configured to respectively construct the source image sample and the target image sample in the first training sample according to the real face image samples in the training sample set, where the source image sample and the target image sample are different real face image samples.
Optionally, the second determining module 905 is further specifically configured to replace an intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder, so as to obtain a replaced intermediate face-changing model; respectively constructing a source image sample and a target image sample in a second training sample according to the real face image sample and the virtual face image sample in the training sample set, taking the real face image sample as the source image sample, and taking the virtual face image sample as the target image sample; and inputting the second training sample into the replaced intermediate face changing model, and training to obtain a target face changing model.
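The decoder replacement and second training stage described above can be pictured with the following sketch, reusing the hypothetical `virtual_decoder` attribute from the earlier inference sketch. The optimizer, learning rate and the caller-supplied `loss_fn` are assumptions; the embodiment specifies only that the real face image sample serves as the source and the virtual face image sample as the target.

```python
import torch

def build_target_face_swap_model(intermediate_model, target_virtual_decoder,
                                 second_stage_pairs, loss_fn, lr=1e-4):
    # Step 1: replace the intermediate face-changing decoder with the target
    # virtual face reconstruction decoder obtained via feature alignment.
    intermediate_model.virtual_decoder = target_virtual_decoder

    # Step 2: fine-tune the replaced model on the second training samples,
    # each pair being (real face source image, virtual face target image).
    opt = torch.optim.Adam(intermediate_model.parameters(), lr=lr)
    for src_real, tgt_virtual in second_stage_pairs:
        swapped = intermediate_model(src_real, tgt_virtual)
        loss = loss_fn(swapped, src_real, tgt_virtual)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return intermediate_model  # the target face-changing model
```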
Optionally, the apparatus further includes a building module.
The building module is configured to obtain a plurality of preset initial virtual face image samples and a plurality of real face image samples; generate a plurality of virtual face image samples according to the expression parameters and the head posture parameters of the initial virtual face image samples; and construct the training sample set according to the plurality of virtual face image samples and the plurality of real face image samples, as sketched below.
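A sketch of this training-set construction follows. The function `render_avatar` stands in for whatever game engine or parametric face renderer produces an image from a preset virtual face plus expression and head-pose parameters; that function, the parameter dimensions and the sampling ranges are all illustrative assumptions.

```python
import random

def build_training_sample_set(initial_virtual_faces, real_face_images,
                              render_avatar, samples_per_face=20):
    virtual_samples = []
    for avatar in initial_virtual_faces:
        for _ in range(samples_per_face):
            # Vary the expression parameters and head posture parameters to
            # expand each preset avatar into many virtual face image samples.
            expression = [random.uniform(-1.0, 1.0) for _ in range(32)]
            head_pose = tuple(random.uniform(-30.0, 30.0) for _ in range(3))  # yaw, pitch, roll in degrees
            virtual_samples.append(render_avatar(avatar, expression, head_pose))
    # The training sample set mixes generated virtual samples with real samples.
    return {"virtual": virtual_samples, "real": list(real_face_images)}
```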
Optionally, an embodiment of the present application further provides an image processing apparatus, where the apparatus includes:
the acquisition module is used for acquiring a target real face image and a target virtual face image;
a second input module, configured to input the target real face image and the target virtual face image into the target face-changing model respectively to obtain a face-changing image, where the face-changing image includes an identity feature of the target real face image and an attribute feature of the target virtual face image, and the target face-changing model is obtained by the model determination apparatus mentioned in the above example.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
These modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device may include: a processor 1001, a storage medium 1002 and a bus 1003, where the storage medium 1002 stores machine-readable instructions executable by the processor 1001. When the electronic device operates, the processor 1001 and the storage medium 1002 communicate via the bus 1003, and the processor 1001 executes the machine-readable instructions to perform the following steps:
in a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: constructing a first training sample according to a training sample set, wherein the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample; training an initial face changing model according to a first training sample to obtain an intermediate face changing model, wherein the intermediate face changing model is used for processing a fused face feature vector to obtain a predicted face changing image, and the fused face feature vector comprises an identity feature vector of a source image sample and an attribute feature vector of a target image sample; inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors; performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in an initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image; and obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
In a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: and replacing the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a target face-changing model.
In a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: inputting the virtual face feature vector and the fused face feature vector into an initial discriminator, and performing feature alignment processing by the initial discriminator to obtain the feature alignment loss; and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the feature alignment loss, and the pixel loss and perceptual loss between the virtual face image sample and the reconstructed virtual face image, to obtain a target virtual face reconstruction model.
In one possible embodiment, when executing the model determining method, the processor 1001 is specifically configured to: and inputting the virtual face feature vector and the fused face feature vector into an initial discriminator, and obtaining the loss of feature alignment by the initial discriminator according to the difference between the first distribution information of the fused face feature vector and the second distribution information of the virtual face feature vector.
In a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: respectively constructing a source image sample and a target image sample in the first training sample according to the real face image samples in the training sample set, wherein the source image sample and the target image sample are different real face image samples.
In a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: replacing an intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a replaced intermediate face-changing model; respectively constructing a source image sample and a target image sample in a second training sample according to the real face image sample and the virtual face image sample in the training sample set, taking the real face image sample as the source image sample, and taking the virtual face image sample as the target image sample; and inputting the second training sample into the replaced intermediate face changing model, and training to obtain a target face changing model.
In a possible implementation, the processor 1001, when executing the model determining method, is specifically configured to: acquiring a plurality of preset initial virtual face image samples and a plurality of real face image samples; generating a plurality of virtual face image samples according to the expression parameters and the head posture parameters of the initial virtual face image samples; and constructing a training sample set according to the plurality of virtual face image samples and the plurality of real face image samples.
In a possible embodiment, the processor 1001, when executing the image processing method, is specifically configured to obtain a target real face image and a target virtual face image; and respectively inputting the target real face image and the target virtual face image into a target face changing model to obtain a face changing image, wherein the face changing image comprises the identity characteristic of the target real face image and the attribute characteristic of the target virtual face image, and the target face changing model is obtained by the model determining method.
Optionally, the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the processor executes the following steps:
in a possible embodiment, the processor, when executing the model determining method, is specifically configured to: constructing a first training sample according to a training sample set, wherein the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample; training an initial face changing model according to a first training sample to obtain an intermediate face changing model, wherein the intermediate face changing model is used for processing a fused face feature vector to obtain a predicted face changing image, and the fused face feature vector comprises an identity feature vector of a source image sample and an attribute feature vector of a target image sample; inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors; performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in an initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image; and obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: and replacing the intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a target face-changing model.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: inputting the virtual face feature vector and the fused face feature vector into an initial discriminator, and performing feature alignment processing by the initial discriminator to obtain the feature alignment loss; and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the feature alignment loss, and the pixel loss and perceptual loss between the virtual face image sample and the reconstructed virtual face image, to obtain a target virtual face reconstruction model.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: and inputting the virtual face feature vector and the fused face feature vector into an initial discriminator, and obtaining the loss of feature alignment by the initial discriminator according to the difference between the first distribution information of the fused face feature vector and the second distribution information of the virtual face feature vector.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: respectively constructing a source image sample and a target image sample in the first training sample according to the real face image samples in the training sample set, wherein the source image sample and the target image sample are different real face image samples.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: replacing an intermediate face-changing decoder in the intermediate face-changing model with a target virtual face reconstruction decoder to obtain a replaced intermediate face-changing model; respectively constructing a source image sample and a target image sample in a second training sample according to the real face image sample and the virtual face image sample in the training sample set, taking the real face image sample as the source image sample, and taking the virtual face image sample as the target image sample; and inputting the second training sample into the replaced intermediate face changing model, and training to obtain a target face changing model.
In a possible embodiment, the processor, when executing the model determining method, is specifically configured to: acquiring a plurality of preset initial virtual face image samples and a plurality of real face image samples; generating a plurality of virtual face image samples according to the expression parameters and the head posture parameters of the initial virtual face image samples; and constructing a training sample set according to the plurality of virtual face image samples and the plurality of real face image samples.
In a possible embodiment, the processor, when executing the image processing method, is specifically configured to obtain a target real face image and a target virtual face image; and respectively inputting the target real face image and the target virtual face image into a target face changing model to obtain a face changing image, wherein the face changing image comprises the identity characteristic of the target real face image and the attribute characteristic of the target virtual face image, and the target face changing model is obtained by the model determining method.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to perform some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numbers and letters refer to like items in the figures, so that once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Claims (12)

1. A method of model determination, the method comprising:
constructing a first training sample according to a training sample set, wherein the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample;
training an initial face changing model according to the first training sample to obtain an intermediate face changing model, wherein the intermediate face changing model is used for processing a fused face feature vector to obtain a predicted face changing image, and the fused face feature vector comprises an identity feature vector of the source image sample and an attribute feature vector of the target image sample;
inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors;
performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image;
and obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
2. The method of claim 1, wherein the intermediate face-swapping model comprises an intermediate face-swapping decoder; the target virtual face reconstruction model comprises a target virtual face reconstruction decoder;
obtaining a target face-changing model according to the target virtual face reconstruction model and the intermediate face-changing model, including:
and replacing the intermediate face changing decoder in the intermediate face changing model with the target virtual face reconstruction decoder to obtain a target face changing model.
3. The method of claim 1, wherein the target virtual face reconstruction model comprises a target virtual face reconstruction decoder; the initial virtual face reconstruction model comprises an initial discriminator; performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face-changing model, and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment to obtain a target virtual face reconstruction model, including:
inputting the virtual face feature vector and the fused face feature vector into the initial discriminator, and performing feature alignment processing by the initial discriminator to obtain loss of feature alignment;
and correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the feature alignment loss, and the pixel loss and perceptual loss between the virtual face image sample and the reconstructed virtual face image, to obtain the target virtual face reconstruction model.
4. The method according to claim 3, wherein the inputting the virtual face feature vector and the fused face feature vector into the initial discriminator, and performing feature alignment processing by the initial discriminator to obtain the loss of feature alignment comprises:
inputting the virtual face feature vector and the fused face feature vector into the initial discriminator, and obtaining the loss of the feature alignment by the initial discriminator according to the difference between first distribution information of the fused face feature vector and second distribution information of the virtual face feature vector.
5. The method of claim 2, wherein constructing the first training sample from the set of training samples comprises:
and respectively constructing a source image sample and a target image sample in the first training sample according to the real face image samples in the training sample set, wherein the source image sample and the target image sample are different real face image samples.
6. The method of claim 5, wherein replacing an intermediate face-change decoder in the intermediate face-change model with the target virtual face reconstruction decoder to obtain a target face-change model comprises:
replacing an intermediate face-changing decoder in the intermediate face-changing model with the target virtual face reconstruction decoder to obtain a replaced intermediate face-changing model;
respectively constructing a source image sample and a target image sample in a second training sample according to a real face image sample and a virtual face image sample in the training sample set, taking the real face image sample as the source image sample, and taking the virtual face image sample as the target image sample;
and inputting the second training sample into the replaced intermediate face changing model, and training to obtain the target face changing model.
7. The method of any one of claims 1-6, wherein prior to constructing the first training sample from the set of training samples, the method further comprises:
acquiring a plurality of preset initial virtual face image samples and a plurality of real face image samples;
generating a plurality of virtual face image samples according to the expression parameters and the head posture parameters of the initial virtual face image samples;
and constructing the training sample set according to the plurality of virtual face image samples and the plurality of real face image samples.
8. An image processing method, characterized in that the method comprises:
acquiring a target real face image and a target virtual face image;
inputting the target real face image and the target virtual face image into a target face changing model respectively to obtain a face changing image, wherein the face changing image comprises an identity feature of the target real face image and an attribute feature of the target virtual face image, and the target face changing model is obtained by the model determination method according to any one of claims 1-7.
9. A model determination apparatus, characterized in that the apparatus comprises:
the system comprises a construction module, a detection module and a processing module, wherein the construction module is used for constructing a first training sample according to a training sample set, the training sample set comprises a real face image sample and a virtual face image sample, and the first training sample comprises a source image sample and a target image sample;
the first determining module is used for training an initial face changing model according to the first training sample to obtain an intermediate face changing model, wherein the intermediate face changing model is used for processing a fused face feature vector to obtain a predicted face changing image, and the fused face feature vector comprises an identity feature vector of the source image sample and an attribute feature vector of the target image sample;
the first input module is used for inputting the virtual face image samples in the training sample set into an initial virtual face reconstruction model to obtain virtual face feature vectors;
the feature alignment module is used for performing feature alignment on the virtual face feature vector and a fused face feature vector generated by the intermediate face changing model, correcting an initial virtual face reconstruction decoder in the initial virtual face reconstruction model according to the loss of the feature alignment, and obtaining a target virtual face reconstruction model, wherein the target virtual face reconstruction model is used for processing the virtual face feature vector to obtain a reconstructed virtual face image;
and the second determining module is used for obtaining a target face changing model according to the target virtual face reconstruction model and the intermediate face changing model.
10. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target real face image and a target virtual face image;
a second input module, configured to input the target real face image and the target virtual face image into a target face change model respectively to obtain a face change image, where the face change image includes an identity feature of the target real face image and an attribute feature of the target virtual face image, and the target face change model is obtained by the model determining apparatus according to claim 9.
11. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the model determination method according to any one of claims 1 to 7 or the steps of the image processing method according to claim 8.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the model determination method as set forth in any one of the claims 1-7 or the steps of the image processing method as set forth in claim 8.
CN202210975057.8A 2022-08-15 2022-08-15 Model determination method, image processing method, device, equipment and storage medium Pending CN115294423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210975057.8A CN115294423A (en) 2022-08-15 2022-08-15 Model determination method, image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115294423A true CN115294423A (en) 2022-11-04

Family

ID=83830676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210975057.8A Pending CN115294423A (en) 2022-08-15 2022-08-15 Model determination method, image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115294423A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024099004A1 (en) * 2022-11-09 2024-05-16 腾讯科技(深圳)有限公司 Image processing model training method and apparatus, and electronic device, computer-readable storage medium and computer program product
CN116385669A (en) * 2023-06-06 2023-07-04 北京红棉小冰科技有限公司 Virtual human video creation method and device and electronic equipment
CN116385669B (en) * 2023-06-06 2023-10-24 北京红棉小冰科技有限公司 Virtual human video creation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination