WO2023138498A1 - Method, apparatus, electronic device, and storage medium for generating a stylized image

Method, apparatus, electronic device, and storage medium for generating a stylized image

Info

Publication number: WO2023138498A1
Authority: WO, WIPO (PCT)
Prior art keywords: target, image, generation model, model, style
Prior art date
Application number: PCT/CN2023/072067
Other languages: English (en), French (fr)
Inventors: 周财进, 李文越
Original Assignee: 北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023138498A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Definitions

  • Embodiments of the present disclosure relate to the technical field of data processing, for example, to a method, device, electronic device, and storage medium for generating a stylized image.
  • Embodiments of the present disclosure provide a method, device, electronic device, and storage medium for generating stylized images, which can efficiently construct a target style data generation model without fusing a large number of training samples with two types of styles, and reduce the cost consumed in the model construction process.
  • an embodiment of the present disclosure provides a method for generating a stylized image, the method including:
  • obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
  • training the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model, and training the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
  • determining a target style data generation model based on model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style data generation model.
  • the embodiment of the present disclosure also provides a device for generating a stylized image, the device comprising:
  • the model parameter acquisition module to be transferred is configured to obtain the model parameters to be transferred of the facial image generation model, so as to construct the first sample generation model to be trained and the second sample generation model to be trained based on the model parameters to be transferred;
  • the first sample generation model training module to be trained is configured to train the first sample generation model to be trained based on the training samples of the first style type to obtain the first target sample generation model;
  • the second sample generation model training module to be trained is configured to train the second sample generation model to be trained based on the training samples of the second style type to obtain a second target sample generation model;
  • the target style data generation model determination module is configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style data generation model.
  • an embodiment of the present disclosure further provides an electronic device, and the electronic device includes:
  • storage means configured to store at least one program;
  • at least one processor; when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method for generating a stylized image according to any one of the embodiments of the present disclosure.
  • the embodiments of the present disclosure further provide a storage medium containing computer-executable instructions, the computer-executable instructions are used to implement the method for generating a stylized image according to any one of the embodiments of the present disclosure when executed by a computer processor.
  • FIG. 1 is a schematic flowchart of a method for generating a stylized image provided by Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic diagram of constructing a first sample generation model to be trained and a second sample generation model to be trained based on the facial image generation model provided by Embodiment 1 of the present disclosure;
  • FIG. 3 is a schematic diagram of a target style data generation model constructed based on the first target sample generation model and the second target sample generation model provided by Embodiment 1 of the present disclosure;
  • FIG. 4 is a schematic flowchart of a method for generating a stylized image provided in Embodiment 2 of the present disclosure
  • FIG. 5 is a structural block diagram of a device for generating a stylized image provided by Embodiment 3 of the present disclosure
  • FIG. 6 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is a schematic flow chart of a method for generating a stylized image provided by Embodiment 1 of the present disclosure.
  • This embodiment is applicable to the scene of constructing a specific style data generation model.
  • the constructed model is set to generate a stylized image that combines two types of styles.
  • the method can be executed by a device for generating a stylized image.
  • the device can be implemented in the form of software and/or hardware.
  • the hardware can be an electronic device, such as a mobile terminal, a personal computer (Personal Computer, PC) or a server.
  • in general, a scene of image display is implemented by the cooperation of a client and a server.
  • the method provided in this embodiment can be executed by the server, the client, or the cooperation of the client and the server.
  • the method of the present embodiment includes:
  • the facial image generation model may be a neural network model used to generate the user's facial image. It can be understood that after inputting the user's facial features into the facial image generation model, a facial image consistent with the user's facial features can be obtained after model processing.
  • the facial image generation model can be a StyleGAN model based on a Generative Adversarial Network (GAN).
  • a generative adversarial network consists of a generation network and a discriminative network.
  • the generation network randomly samples from the latent space as input, and its output needs to imitate the real samples in the training set as much as possible.
  • the input of the discriminant network is a real sample and the output of the generation network.
  • the StyleGAN model in this embodiment also includes a generator and a discriminator; the generator can be used to process the Gaussian noise corresponding to the user's facial image, thereby regenerating the user's facial image, and the relevant parameters in the generator can be adjusted by using the discriminator.
  • the advantage of using a discriminator that includes a discriminative network is that the user's facial image regenerated by the StyleGAN model after parameter correction is almost exactly the same as the user's facial image corresponding to the input Gaussian noise. It should be noted that in the field of high-definition image generation, the StyleGAN model has excellent expressive ability and can generate high-definition images at resolutions up to 1024*1024.
  • taking the schematic diagram of the training sample generation model in Fig. 2 as an example, G1 is a facial image generation model, and a clear facial schematic diagram can be obtained after inputting Gaussian noise.
  • in order to make the output of the facial image generation model almost completely consistent with the facial image corresponding to the input Gaussian vector, the facial image generation model also needs to be trained.
  • the training process is: obtain a plurality of basic training samples; process the Gaussian noise based on the image generator to be trained to generate an image to be discriminated; determine a benchmark loss value by discriminating between the image to be discriminated and the collected real facial image based on the discriminator; correct the model parameters in the image generator to be trained based on the benchmark loss value; and use the convergence of the loss function in the image generator to be trained as the training target to obtain the facial image generation model.
  • the basic training samples are the data used to train the facial image generation model; each basic training sample is Gaussian noise corresponding to the facial information of a target subject, where the facial information of the target subject is an image containing the user's facial information, for example, the user's ID photo or a photo from daily life, and the Gaussian noise can be understood as a high-dimensional vector corresponding to the facial information of the target subject.
  • for example, the FFHQ data set, which is a facial feature data set, can be used here.
  • since the facial image generation model to be trained is a StyleGAN model, the model is composed of an image generator to be trained and a discriminator. Therefore, after obtaining multiple basic training samples, a large amount of Gaussian noise can be processed by the image generator to be trained to generate images to be discriminated, that is, images that may differ from the real facial images input by users. After the image to be discriminated and the real facial image are determined, the benchmark loss value between them can be determined based on the discriminator.
  • the training error of the loss function in the image generator to be trained can be used as a condition for detecting whether the loss function reaches convergence, such as whether the training error is smaller than the preset error or whether the error trend is stable, or whether the current iteration number is equal to the preset number. If the detection reaches the convergence condition, for example, the training error of the loss function is less than the preset error, or the trend of error tends to be stable, it indicates that the training of the facial image generation model to be trained is completed, and the iterative training can be stopped at this time.
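  • As an illustration of the training procedure above, the following is a minimal PyTorch-style sketch of the adversarial loop. It is not the patent's implementation: the tiny MLP networks, the synthetic data, and the hyperparameters are stand-ins for a full StyleGAN generator/discriminator trained on a real facial data set such as FFHQ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the StyleGAN networks (assumption: the real models are far larger).
generator = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 3 * 64 * 64))
discriminator = nn.Sequential(nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(10000):
    real_images = torch.randn(16, 3 * 64 * 64)  # placeholder for collected real facial images
    z = torch.randn(16, 512)                    # basic training samples: Gaussian noise

    # Discriminator step: discriminate generated images from real facial images.
    fake_images = generator(z).detach()
    d_loss = (F.softplus(-discriminator(real_images)).mean()
              + F.softplus(discriminator(fake_images)).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: the benchmark loss value corrects the generator's model parameters.
    g_loss = F.softplus(-discriminator(generator(z))).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    # Training stops once the loss stabilizes below a preset error or a preset
    # iteration count is reached (the convergence conditions described above).
```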
  • transfer learning is to apply the knowledge or patterns learned in a certain field or task to different but related fields or problems, that is, to realize the transfer of labeled data or knowledge structures from related fields, and to complete or improve the learning effect of the target field or task.
  • the advantage of using transfer learning is that, under the condition of a small number of samples, a model for generating a certain style can be trained.
  • the parameters that have been trained in the facial image generation model can be used as parameters of the model to be transferred, and the first sample generation model to be trained and the second sample generation model to be trained are constructed based on the parameters.
  • the advantage of constructing the first sample generation model to be trained and the second sample generation model to be trained by transfer learning is that the model for generating images of a specific style type can be efficiently constructed by using the model parameters that have been trained, which not only avoids the cumbersome process of obtaining a large number of images of the style model as training data, that is, eliminates the problem of difficult sample acquisition, but also reduces the consumption of computing resources.
  • the model parameters to be transferred of G1 can be obtained, and the first sample generation model G2 to be trained and the second sample generation model G3 to be trained can be generated based on transfer learning.
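  • In code, the transfer step can be as simple as duplicating G1's trained weights into two new models before fine-tuning. A sketch continuing the example above (the G1/G2/G3 naming follows FIG. 2):

```python
import copy

g1 = generator           # trained facial image generation model (G1)
g2 = copy.deepcopy(g1)   # first sample generation model to be trained (G2)
g3 = copy.deepcopy(g1)   # second sample generation model to be trained (G3)
# g2 is then fine-tuned on first-style-type samples and g3 on second-style-type
# samples; both start from G1's transferred model parameters.
```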
  • after the Gaussian noise corresponding to the user's facial image is input into G2 and processed, the image output by the model presents the dressing style of a specific region while retaining the user's unique facial features.
  • the image under the first style type output by G2 can be based on the user's original facial features, adding local and regional characteristics such as clothing, hairstyle, hair accessories, and makeup.
  • the images under the second style type can be based on the user's original facial features while presenting the style of figures in ancient-style paintings, which can be understood as making the user's realistic facial image present the visual effect of an ancient figure painting.
  • the training samples of the first style type can be obtained to train the model.
  • the first style type is a regional style image, for example, a facial image of a user dressed in a unique style, which corresponds to a certain region. It can be understood that the first style type is a style type that presents the characteristics of a user in a certain region such as clothing, hairstyle, hair accessories, and makeup.
  • Each training sample includes the first face image under the first style type.
  • the first facial image may be processed based on the trained target compilation model to generate Gaussian noise corresponding to the first facial image.
  • since the first sample generation model to be trained is a model for generating images of a specific regional style, the corresponding training samples are multiple images of the dressing style of users in that region, and these images are the first facial images.
  • the process of training the first sample generation model to be trained is to obtain a plurality of training samples under the first style type; input the Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain the first actual output image; based on the discriminator, the first actual output image and the corresponding first facial image are discriminated and processed to determine the loss value, so as to correct the model parameters in the first sample generation model to be trained based on the loss value; the loss function convergence in the first image generation model to be trained is used as the training target to obtain the first target sample generation model.
  • the image generator in the first sample generation model to be trained processes multiple Gaussian noises to generate first actual output images to be discriminated, that is, images that may differ from the first facial images. After the first actual output image and the corresponding first facial image are determined, the corresponding loss values may be determined based on the discriminator.
  • the training error of the loss function in the model can be used as a condition for detecting whether the loss function reaches convergence, such as whether the training error is smaller than the preset error or whether the error trend tends to be stable, or whether the current number of iterations is equal to the preset number. If the detection meets the convergence condition, for example, the training error of the loss function is smaller than the preset error, or the trend of error tends to be stable, it indicates that the training of the first sample generation model to be trained is completed, and the iterative training can be stopped at this time.
  • the trained first target sample generation model can be obtained.
  • after the Gaussian noise corresponding to the user's facial image is input into the model, a facial image can be obtained that retains the user's unique facial features while presenting the first style type.
  • the training samples can be about 200 images of the first style type (that is, the first facial image).
  • these images should have a similar structure to the facial image input by the user.
  • the images must have features such as the user's facial features and hair.
  • the convenience of model training is not only improved, but also the corresponding target sample generation model can be trained when there are few images of a specific style type, which greatly reduces the demand for training samples for the model to be trained.
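  • A sketch of this fine-tuning stage, continuing the example above. The paired noise/image data, the loss combination, and the discriminator update (which alternates with the generator step as in the base loop) are assumptions, not the patent's exact procedure:

```python
# ~200 first-style training samples: Gaussian noise paired with first facial images.
style_noise = torch.randn(200, 512)
style_images = torch.randn(200, 3 * 64 * 64)  # placeholder first facial images

opt = torch.optim.Adam(g2.parameters(), lr=1e-4)
for z, target in zip(style_noise, style_images):
    out = g2(z.unsqueeze(0))                       # first actual output image
    adv = F.softplus(-discriminator(out)).mean()   # discriminator-based loss value
    rec = F.l1_loss(out, target.unsqueeze(0))      # keeps the output near the first facial image
    loss = adv + rec                               # corrects g2's model parameters
    opt.zero_grad(); loss.backward(); opt.step()
# Repeating the same procedure with second-style samples yields the second
# target sample generation model g3.
```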
  • training samples of the second style type can be obtained to train the model.
  • the second style type is an ancient-style material image, for example, an image of an ancient figure painting style, which can be understood as a style type that presents characteristics of ancient meticulous painting, oil painting, and the like.
  • Each training sample includes a second facial image under the second style type.
  • Gaussian noise reflecting the corresponding facial features can also be obtained.
  • since the second sample generation model to be trained is a model for generating ancient-style material images, the corresponding training samples are multiple images in the ancient-style material style, and these images are the second facial images.
  • the process of training the second sample generation model to be trained is to obtain a plurality of training samples under the second style type; input the Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain the second actual output image; carry out discrimination processing on the second actual output image and the corresponding second facial image based on the discriminator, determine the loss value, and modify the model parameters in the second sample generation model to be trained based on the loss value; use the convergence of the loss function in the second image generation model to be trained as the training target to obtain the second target sample generation model.
  • the process of training the second sample generation model to be trained based on the multiple training samples under the second style type is similar to the process of training the first sample generation model to be trained based on the multiple training samples under the first style type, and will not be repeated in this embodiment of the present disclosure.
  • only a small amount of training data of the second style type is needed to train the second sample generation model to be trained to obtain the second target sample generation model, for example, about 200 images of the second style type (that is, the second facial image).
  • these images also have a similar structure to the facial image input by the user.
  • the images must have features such as the user's facial features and hair. It can be understood that this model training method similar to the first sample generation model to be trained is also convenient, and reduces the demand for images of the second style type, which will not be repeated in this embodiment of the present disclosure.
  • S104: Determine the target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style data generation model.
  • model fusion is the process of integrating multiple models according to a certain method after training multiple models.
  • the output image can not only retain the user's unique facial features, but also make the image present the first style type and the second style type at the same time.
  • the fitting parameter may be a coefficient representing the fusion degree of the two style types.
  • the fitting parameter is at least used to adjust the weights of the different style types; it can be understood as controlling which of the two style types the output stylized image leans toward.
  • developers can edit or modify the fitting parameters in advance based on corresponding controls or programs, which will not be repeated in the embodiments of the present disclosure.
  • the model parameters of the first target sample generation model and the second target sample generation model can be linearly combined to obtain the target model parameters, that is, the parameters required for constructing the target style data generation model. Therefore, based on these parameters, the target style data generation model can be obtained.
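  • The fitting step itself reduces to a per-parameter linear combination of the two fine-tuned models. A sketch, continuing the example above (the value 0.5 for the fitting parameter is only an example):

```python
import copy

def fuse_models(model_a, model_b, alpha=0.5):
    """Linearly combine model parameters to build the target style data
    generation model; alpha acts as the preset fitting parameter that weights
    how strongly the output leans toward the first or second style type."""
    fused = copy.deepcopy(model_a)
    pa, pb = model_a.state_dict(), model_b.state_dict()
    fused.load_state_dict({k: alpha * pa[k] + (1 - alpha) * pb[k] for k in pa})
    return fused

g4 = fuse_models(g2, g3, alpha=0.5)  # target style data generation model (G4)
```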
  • the target style data generation model G4 can be constructed. It can be seen from Figure 3 that G2 can obtain images of a specific regional style based on user input, and G3 can obtain images of ancient style material styles based on user input. Therefore, after processing user input with the constructed G4, the obtained image not only retains the user’s unique facial features, but also presents a specific regional style and ancient material style.
  • when G4 is used to process a user image, it outputs a stylized image that combines the first style type and the second style type: the image can present the user's original facial features together with the local and regional characteristic clothing, hairstyle, hair accessories, and makeup, while presenting the visual effect of ancient figure paintings.
  • the parameters of the model to be migrated are obtained first, so that the first sample generation model to be trained and the second sample generation model to be trained are constructed based on these parameters, and the corresponding sample generation model to be trained is trained based on the training samples of two types of styles.
  • the target style data generation model can be efficiently constructed, which not only enables users to use the model to generate images of the target style type, but also reduces the cost consumed in the model construction process.
  • the facial image input by the user can be processed, so as to obtain an image with multiple styles at the same time.
  • since the model is obtained based on a weighted average of the parameters in the first target sample generation model and the parameters in the second target sample generation model, the output image effect may be unsatisfactory.
  • the following methods can be used to optimize the target style data generation model.
  • Gaussian noise is input into the target style data generation model to obtain a stylized image to be corrected that combines the first style type and the second style type; through correction processing of the stylized image to be corrected, the target style image is determined, and the target style image is used as a target training sample to correct model parameters in the target style data generation model based on the target training sample, and an updated target style data generation model is obtained.
  • the image output by G4 is the stylized image to be corrected.
  • the stylized image to be corrected retains the user’s unique facial features, it may not achieve high accuracy when reflecting the first style type and the second style type, or the fusion state of the two style types is relatively rigid.
  • the stylized image to be corrected may be subjected to correction processing, for example, based on pre-written scripts or related drawing software, by adjusting image parameters such as saturation, contrast, blur, and texture, so as to obtain a target style image that is more in line with user expectations.
  • the corrected target style image can be used as training data to train the target style data generation model in a subsequent process.
  • the method of modifying the model parameters to realize the model update may be: input Gaussian noise into the target style data generation model, and output the stylized image to be corrected; process the stylized image to be corrected and the target style image based on the discriminator to determine the loss value; correct the model parameters in the target style data generation model based on the loss value, and obtain the updated target style data generation model.
  • the target style data generation model can be used to process multiple Gaussian noises to generate a stylized image to be corrected, that is, an image that does not fully present the target style type.
  • multiple corresponding loss values can be determined based on the discriminator.
  • the training error of the loss function in the model that is, the loss parameter, can be used as a condition for detecting whether the loss function has reached convergence, such as whether the training error is smaller than the preset error or whether the error trend is stable, or whether the current number of iterations is equal to the preset number.
  • the iterative training can be stopped at this time. If it is detected that the current convergence condition is not met, other Gaussian noise can be further processed to generate a new stylized image to be corrected, so as to continue training the model until the training error of the loss function is within the preset range. It can be understood that when the training error of the loss function reaches convergence, the trained target style data generation model can be obtained. At this time, after the Gaussian noise corresponding to the user's facial image is input into the model, the user's facial image that retains the user's unique facial features and can present the first style type and the second style type can be obtained.
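  • A sketch of this update pass, continuing the example. Each Gaussian noise is assumed to be paired with its manually corrected target style image, and a simple reconstruction loss stands in for the discriminator-based loss value described above:

```python
# Target training samples: the noise that produced each stylized image to be
# corrected, paired with the corrected target style image (assumed available).
corrected_pairs = [(torch.randn(1, 512), torch.randn(1, 3 * 64 * 64)) for _ in range(8)]

opt = torch.optim.Adam(g4.parameters(), lr=1e-4)
for z, target_style_image in corrected_pairs:
    to_correct = g4(z)                                 # stylized image to be corrected
    loss = F.mse_loss(to_correct, target_style_image)  # stand-in loss value
    opt.zero_grad(); loss.backward(); opt.step()       # corrects G4's model parameters
```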
  • the target stylized image corresponds to the target special effect image mentioned in this technical solution.
  • the constructed target style data generation model can be deployed in related application software. It can be understood that when a user is detected to trigger a special effect control related to the target style data generation model, the program related to the special effect can be run. For example, if the user's facial image is received based on a user import operation (such as the user uploading a photo through a related button), or the user's facial image is collected based on the camera device of a mobile terminal (such as the user's real-time video), these images can be converted so as to show a stylized image with the two styles.
  • the trained target compilation model can also be combined with the target style data generation model to obtain a complete special effect image generation model; for example, the special effect image generation model is deployed on a mobile terminal to provide users with services for generating special effect images with multiple styles based on input images.
  • the method includes the following steps:
  • after the special effect image generation model is obtained, it needs to be deployed to a terminal device. The terminal device generally has the function of collecting the user's facial image, while the trained target style data generation model can only process the Gaussian noise corresponding to the user's facial image. Therefore, in order to make the special effect image generation model run effectively on the terminal device, it is also necessary to determine a model capable of generating the corresponding Gaussian noise from the user's facial image, that is, the target compilation model.
  • based on the facial image generation model and a plurality of facial images, the compilation model to be trained is trained to obtain the target compilation model; based on the target compilation model and the target style data generation model, the special effect image generation model is determined, and based on the special effect image generation model, the acquired facial image to be processed is stylized to obtain a target special effect image that combines the first style type and the second style type.
  • the facial image is an image containing facial features input by the user, for example, the user's ID photo or a photo from daily life.
  • the compiled model to be trained can be an encoder coding model.
  • the encoding-decoding (Encoder-Decoder) framework is a deep learning model framework, which the embodiments of the present disclosure will not elaborate on here. Multiple facial images are input into the encoder coding model, and the Gaussian noise output by the encoder coding model is processed based on the facial image generation model to obtain corresponding images, which can then be used as training data for the compilation model to be trained.
  • the training process of the compilation model to be trained is: obtain a plurality of first training images; for each first training image, input the current first training image into the compilation model to be trained to obtain the Gaussian noise to be used corresponding to the current first training image; input the Gaussian noise to be used into the facial image generation model to obtain a third actual output image; determine an image loss value based on the third actual output image and the current first training image; correct the model parameters in the compilation model to be trained based on the image loss value; and use the convergence of the loss function in the compilation model to be trained as the training target to obtain the target compilation model, so as to determine the special effect image generation model based on the target compilation model and the target style data generation model.
  • after the first training images containing users' facial features are obtained, the compilation model to be trained can process multiple of these images to generate the corresponding Gaussian noise to be used.
  • These Gaussian noises are actually high-dimensional vectors that cannot accurately and completely reflect the user's facial features.
  • the Gaussian noise to be used is processed by using the facial image generation model to obtain a third actual output image that is not completely consistent with the first training image. After the third actual output image and the current first training image are determined, multiple corresponding loss values can be determined based on the discriminator.
  • the training error of the loss function in the model can be used as the condition for detecting whether the loss function has reached convergence, such as whether the training error is smaller than the preset error or whether the error trend is stable, or whether the current number of iterations is equal to the preset number. If the detection meets the convergence condition, for example, the training error of the loss function is less than the preset error, or the trend of the error tends to be stable, it indicates that the training of the compiled model to be trained is completed, and the iterative training can be stopped at this time.
  • the training error of the loss function reaches convergence, the trained target compiled model can be obtained.
  • the target compilation model is set to process the input facial image into corresponding Gaussian noise. After the user's facial image is input to the target compilation model, the facial image generation model can output an image that is almost exactly the same as the user's facial image based on the Gaussian noise output by the target compilation model.
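  • A minimal sketch of this compilation-model (encoder) training, continuing the example. The MLP encoder and the reconstruction loss are assumptions; G1's parameters are excluded from the optimizer so that only the encoder is trained:

```python
# Target compilation model: maps a facial image to its Gaussian noise vector.
encoder = nn.Sequential(nn.Linear(3 * 64 * 64, 1024), nn.ReLU(), nn.Linear(1024, 512))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
for p in g1.parameters():
    p.requires_grad_(False)  # G1 stays fixed during encoder training

first_training_images = torch.randn(200, 3 * 64 * 64)  # placeholder facial images
for image in first_training_images:
    image = image.unsqueeze(0)
    z = encoder(image)               # Gaussian noise to be used
    recon = g1(z)                    # third actual output image
    loss = F.mse_loss(recon, image)  # image loss value
    opt.zero_grad(); loss.backward(); opt.step()
```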
  • the target compilation model and the target style data generation model can be combined to obtain the special effect image generation model.
  • the model can be combined with G4 to obtain a special effect image generation model.
  • the target compilation model in the model can process the image, and input the processed Gaussian noise z into G4.
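  • End to end, the special effect image generation model is then just the composition of the two trained pieces, sketched as:

```python
@torch.no_grad()
def special_effect_pipeline(face_image):
    """Special effect image generation model: target compilation model followed by G4."""
    z = encoder(face_image)  # processed Gaussian noise z
    return g4(z)             # target special effect image combining both style types

# Usage: styled = special_effect_pipeline(torch.randn(1, 3 * 64 * 64))
```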
  • the model after obtaining the special effect image generation model, in order to use the model to provide corresponding services to users, the model can be deployed in the mobile terminal, for example, based on a specific program algorithm, the special effect image generation model is integrated into an application program (Application, APP) developed for the mobile platform.
  • a corresponding control can be developed in the APP for the special effect image
  • a button named "multi-style special effects" is developed in the APP application interface, and at the same time, the button is associated with the function of generating images with multiple styles based on the special effect image generation model.
  • the image input by the user in real time based on the mobile terminal can be called, or the image pre-stored in the mobile terminal can be called. It can be understood that the called image needs to contain at least the user's facial information, and these images are the images to be processed.
  • the image to be processed can be processed based on the program code corresponding to the special effect image generation model, so as to obtain a target special effect image that not only retains the user's unique facial features, but also combines the first style type and the second style type, that is, the special effect image output by G4 in FIG. 3 .
  • the trained target compiled model can also be combined with the target style data generation model to obtain a complete special effect image generation model; the special effect image generation model is deployed on the mobile terminal, thereby providing users with services for generating multiple styles of special effect images based on input images.
  • Fig. 5 is a structural block diagram of a device for generating a stylized image provided in Embodiment 3 of the present disclosure, which can execute the method for generating a stylized image provided in any embodiment of the present disclosure, and has corresponding functional modules for executing the method.
  • the device includes: a model parameter acquisition module 301 to be transferred, a first training sample generation model training module 302 , a second training sample generation model training module 303 , and a target style data generation model determination module 304 .
  • the model parameter acquisition module 301 to be transferred is configured to obtain the model parameters to be transferred of the facial image generation model, so as to construct the first sample generation model to be trained and the second sample generation model to be trained based on the model parameters to be transferred.
  • the first sample generation model training module 302 is configured to train the first sample generation model to be trained based on the training samples of the first style type to obtain a first target sample generation model.
  • the second sample generation model training module 303 is configured to train the second sample generation model to be trained based on the training samples of the second style type to obtain a second target sample generation model.
  • the target style data generation model determination module 304 is configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style data generation model.
  • the device for generating a stylized image further includes a facial image generation model determination module.
  • the facial image generation model determination module is configured to: obtain a plurality of basic training samples, where each basic training sample is Gaussian noise corresponding to the facial information of a target subject; process the Gaussian noise based on the image generator to be trained to generate an image to be discriminated; perform discrimination processing on the image to be discriminated and the collected real facial image based on the discriminator to determine a benchmark loss value; correct the model parameters in the image generator to be trained based on the benchmark loss value; and use the convergence of the loss function in the image generator to be trained as the training target to obtain the facial image generation model.
  • the first training sample generation model training module 302 includes a first style type training sample acquisition unit, a first actual output image determination unit, a first correction unit, and a first target sample generation model determination unit.
  • the first style type training sample acquisition unit is configured to acquire a plurality of training samples under the first style type; wherein, each training sample includes a first facial image under the first style type.
  • the first actual output image determination unit is configured to input Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image.
  • the first correction unit is configured to perform discrimination processing on the first actual output image and the corresponding first facial image based on the discriminator and determine a loss value, so as to correct the model parameters in the first sample generation model to be trained based on the loss value.
  • the first target sample generation model determination unit is configured to use the convergence of the loss function in the first image generation model to be trained as the training target to obtain the first target sample generation model.
  • the second to-be-trained sample generation model training module 303 includes a second style type training sample acquisition unit, a second actual output image determination unit, a second correction unit, and a second target sample generation model determination unit.
  • the second style type training sample acquisition unit is configured to acquire a plurality of training samples under the second style type; wherein, each training sample includes a second facial image under the second style type.
  • the second actual output image determination unit is configured to input the Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain the second actual output image.
  • the second correction unit is configured to perform discrimination processing on the second actual output image and the corresponding second facial image based on the discriminator, and determine a loss value, so as to correct the model parameters in the second sample generation model to be trained based on the loss value.
  • the second target sample generation model determination unit is configured to use the convergence of the loss function in the second image generation model to be trained as the training target to obtain the second target sample generation model.
  • the target style data generation model determination module 304 includes a fitting parameter acquisition unit, a target model parameter determination unit, and a target style data generation model determination unit.
  • the fitting parameter acquisition unit is set to obtain the preset fitting parameters.
  • the target model parameter determination unit is configured to perform fitting processing on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameters to obtain target model parameters.
  • the target style data generation model determination unit is configured to determine the target style data generation model based on the target model parameters.
  • the device for generating a stylized image further includes a target style data generating model updating module.
  • the target style data generation model update module is configured to input Gaussian noise into the target style data generation model to obtain a stylized image to be corrected that combines the first style type and the second style type; and, by correcting the stylized image to be corrected, determine a target style image, the target style image being used as a target training sample to correct the model parameters in the target style data generation model based on the target training sample to obtain an updated target style data generation model.
  • the device for generating a stylized image further includes a model parameter correction module.
  • the model parameter correction module is configured to input Gaussian noise into the target style data generation model, and output the stylized image to be corrected; process the stylized image to be corrected and the target style image based on the discriminator to determine a loss value; modify the model parameters in the target style data generation model based on the loss value, and obtain an updated target style data generation model.
  • the device for generating a stylized image further includes a stylized processing module.
  • the stylization processing module is configured to train the compilation model to be trained based on the facial image generation model and a plurality of facial images to obtain a target compilation model, wherein the target compilation model is configured to process the input facial image into corresponding Gaussian noise; and to determine, based on the target compilation model and the target style data generation model, a special effect image generation model, so as to perform stylization processing on the acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image that combines the first style type and the second style type.
  • the device for generating a stylized image further includes a target compilation model determination module.
  • the target compilation model determination module is configured to: obtain a plurality of first training images; for each first training image, input the current first training image into the compilation model to be trained to obtain the Gaussian noise to be used corresponding to the current first training image; input the Gaussian noise to be used into the facial image generation model to obtain a third actual output image; determine an image loss value based on the third actual output image and the current first training image; correct the model parameters in the compilation model to be trained based on the image loss value; and take the convergence of the loss function in the compilation model to be trained as the training target to obtain the target compilation model, so as to determine the special effect image generation model based on the target compilation model and the target style data generation model.
  • the device for generating stylized images further includes a model deployment module.
  • the model deployment module is configured to deploy the special effect image generation model in the mobile terminal, so that when the special effect display control is detected, the collected image to be processed is processed into a target special effect image that combines the first style type and the second style type.
  • the first style type is a regional style image
  • the second style type is an ancient style material image
  • the parameters of the model to be migrated are acquired first, so that the first sample generation model to be trained and the second sample generation model to be trained are constructed based on these parameters, and the corresponding generation model of the sample to be trained is trained based on the training samples of two types of styles.
  • the target style data generation model can be efficiently constructed, which not only enables users to use the model to generate images of the target style type, but also reduces the cost consumed in the model construction process.
  • the device for generating a stylized image provided in an embodiment of the present disclosure can execute the method for generating a stylized image provided in any embodiment of the present disclosure, and has a corresponding functional module for executing the method.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), PAD (tablet computer), portable multimedia players (Portable Media Player, PMP), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital televisions (Television, TV), desktop computers, etc.
  • the electronic device shown in FIG. 6 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • the electronic device 400 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 401, which may perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 402 or a program loaded from a storage device 406 into a random access memory (Random Access Memory, RAM) 403.
  • the RAM 403 also stores various programs and data necessary for the operation of the electronic device 400.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (Input/Output, I/O) interface 405 is also connected to the bus 404 .
  • an input device 406 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.;
  • an output device 407 including, for example, a liquid crystal display (Liquid Crystal Display, LCD), a speaker, a vibrator, etc.
  • a storage device 408 including, for example, a magnetic tape, a hard disk, etc.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 6 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 409, or installed from the storage device 406, or installed from the ROM 402.
  • when the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the electronic device provided by the embodiments of the present disclosure and the method for generating a stylized image provided by the above embodiments belong to the same inventive concept, and technical details not described in detail in this embodiment can be referred to the above embodiments.
  • An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored, and when the program is executed by a processor, the method for generating a stylized image provided in the above embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof.
  • computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocols such as HTTP (HyperText Transfer Protocol, Hypertext Transfer Protocol), and can be interconnected with any form or medium of digital data communication (for example, a communication network).
  • Examples of communication networks include local area networks (Local Area Networks, LANs), wide area networks (Wide Area Networks, WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries at least one program, and when the above-mentioned at least one program is executed by the electronic device, the electronic device:
  • obtains model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred; trains the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model; trains the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model; and determines a target style data generation model based on model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style data generation model.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages—such as Java, Smalltalk, C++, and conventional procedural programming languages—such as the “C” language or similar programming languages.
  • the program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly in the on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Parts (ASSP), Systems on Chip (SOC), Complex Programmable Logic Devices (CPLD), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • a machine-readable storage medium would include one or more wire-based electrical connections, a portable computer disk, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • Example 1 provides a method for generating a stylized image, the method including:
  • obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
  • training the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
  • training the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
  • determining a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
  • Example 2 provides a method for generating a stylized image, which further includes:
  • each basic training sample is Gaussian noise corresponding to the facial information of the target subject
  • Example 3 provides a method for generating a stylized image, which further includes:
  • each training sample includes the first facial image under the first style type
  • Example 4 provides a method for generating a stylized image, which further includes:
  • each training sample includes a second facial image under the second style type
  • Example 5 provides a method for generating a stylized image, including:
  • a target style data generation model is determined.
  • Example 6 provides a method for generating a stylized image, further comprising:
  • Gaussian noise is input into the target style data generation model to obtain a stylized image to be corrected which combines the first style type and the second style type;
  • a target style image is determined, and the target style image is used as a target training sample to correct model parameters in the target style data generation model based on the target training sample to obtain an updated target style data generation model.
  • Example 7 provides a method for generating a stylized image, further comprising:
  • Gaussian noise is input into the target style data generation model, and the stylized image to be corrected is output;
  • the model parameters in the target style data generation model are corrected to obtain an updated target style data generation model.
  • Example 8 provides a method for generating a stylized image, further comprising:
  • the encoding model to be trained is trained to obtain a target encoding model; wherein the target encoding model is configured to process an input facial image into corresponding Gaussian noise;
  • a special effect image generation model is determined based on the target encoding model and the target style data generation model, so as to stylize the acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image that combines the first style type and the second style type.
  • Example 9 provides a method for generating a stylized image, further comprising:
  • the current first training image is input into the encoding model to be trained, and Gaussian noise to be used corresponding to the current first training image is obtained;
  • Example 10 provides a method for generating a stylized image, further comprising:
  • the special effect image generation model is deployed in the mobile terminal, so that when the special effect display control is detected, the collected image to be processed is processed into a target special effect image that combines the first style type and the second style type.
  • Example 11 provides a method for generating a stylized image, further comprising:
  • the first style type is a regional style image
  • the second style type is an ancient style material image
  • Example 12 provides a device for generating a stylized image, including:
  • the to-be-transferred model parameter acquisition module is set to obtain the model parameters to be transferred of the facial image generation model, so as to construct the first sample generation model to be trained and the second sample generation model to be trained based on the model parameters to be transferred;
  • the first sample generation model training module is configured to train the first sample generation model to be trained based on the training samples of the first style type to obtain the first target sample generation model;
  • the second sample generation model training module is configured to train the second sample generation model to be trained based on the training samples of the second style type to obtain a second target sample generation model;
  • the target style data generation model determination module is configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate a stylized image that combines the first style type and the second style type based on the target style generation model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, electronic device, and storage medium for generating stylized images. The method includes: obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained; training the corresponding sample generation models to be trained based on training samples of a first style type and training samples of a second style type, respectively, to obtain a first target sample generation model and a second target sample generation model; and determining a target style data generation model based on the model parameters to be fitted of the two target sample generation models, so as to generate, based on the target style data generation model, stylized images that fuse the two style types.

Description

Method, apparatus, electronic device, and storage medium for generating stylized images
The present disclosure claims priority to Chinese Patent Application No. 202210067042.1, filed with the China Patent Office on January 20, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of data processing, for example, to a method, apparatus, electronic device, and storage medium for generating stylized images.
Background
With the continuous development of image processing technology, users can process images with a variety of applications so that the processed images present the style type they desire.
In the related art, the algorithms used for image processing usually need to be trained on a large amount of data before corresponding services can be provided to users. This consumes considerable cost; moreover, when images of a certain style type cannot be obtained, no effective algorithm model can be built for that style type.
Summary
Embodiments of the present disclosure provide a method, apparatus, electronic device, and storage medium for generating stylized images, with which a target style data generation model can be constructed efficiently without a large number of training samples fusing two style types, reducing the cost consumed in the model construction process.
In a first aspect, an embodiment of the present disclosure provides a method for generating stylized images, the method including:
obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
training the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
training the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
determining a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for generating stylized images, the apparatus including:
a to-be-transferred model parameter acquisition module, configured to obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
a first sample generation model training module, configured to train the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
a second sample generation model training module, configured to train the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
a target style data generation model determination module, configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, the electronic device including:
at least one processor;
a storage apparatus configured to store at least one program,
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method for generating stylized images according to any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the method for generating stylized images according to any embodiment of the present disclosure.
Brief Description of the Drawings
Throughout the drawings, identical or similar reference numerals denote identical or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a method for generating stylized images according to Embodiment 1 of the present disclosure;
FIG. 2 is a schematic diagram of constructing a first sample generation model to be trained and a second sample generation model to be trained based on a facial image generation model according to Embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of constructing a target style data generation model based on a first target sample generation model and a second target sample generation model according to Embodiment 1 of the present disclosure;
FIG. 4 is a schematic flowchart of a method for generating stylized images according to Embodiment 2 of the present disclosure;
FIG. 5 is a structural block diagram of an apparatus for generating stylized images according to Embodiment 3 of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to Embodiment 4 of the present disclosure.
Detailed Description
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "at least partly based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "at least one" unless the context clearly indicates otherwise.
Embodiment 1
FIG. 1 is a schematic flowchart of a method for generating stylized images according to Embodiment 1 of the present disclosure. This embodiment is applicable to scenarios in which a specific style data generation model is constructed, the constructed model being configured to generate stylized images that fuse two style types. The method may be performed by an apparatus for generating stylized images, which may be implemented in the form of software and/or hardware; the hardware may be an electronic device such as a mobile terminal, a personal computer (PC), or a server. A scenario in which images are presented is usually implemented by a client and a server in cooperation, and the method provided in this embodiment may be performed by the server, by the client, or by the client and the server in cooperation.
As shown in FIG. 1, the method of this embodiment includes:
S101: Obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred.
In this embodiment, the facial image generation model may be a neural network model for generating facial images of users. It can be understood that, after features related to a user's face are input into the facial image generation model, a facial image consistent with the user's facial features is obtained through model processing.
In practical applications, the facial image generation model may be a StyleGAN model based on a generative adversarial network (GAN). A generative adversarial network consists of a generative network and a discriminative network. The generative network takes random samples from a latent space as input, and its output should imitate the real samples in the training set as closely as possible; the input of the discriminative network is the real samples together with the output of the generative network. On this basis it can be understood that the StyleGAN model in this embodiment also includes a generator and a discriminator: the generator processes Gaussian noise corresponding to a user's facial image to regenerate a facial image, and the discriminator is used to adjust the relevant parameters in the generator. The benefit of using a discriminator containing a discriminative network is that the facial image regenerated by the parameter-corrected StyleGAN model is almost identical to the facial image corresponding to the Gaussian noise used as input. It should be noted that, in the field of high-definition image generation, the StyleGAN model has excellent expressive power and can generate high-definition images of at least 1024x1024 resolution.
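Purely as an illustrative aid and not as part of the claimed solution, the adversarial interplay between generator and discriminator described above can be sketched in Python (PyTorch) as follows; the tiny fully connected networks, latent size, and binary cross-entropy loss are assumptions made for brevity, and the StyleGAN networks used in practice are far deeper.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 128 * 128  # illustrative sizes, far smaller than StyleGAN

# Stand-in networks: StyleGAN's actual generator/discriminator are much deeper.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(real_faces: torch.Tensor) -> None:
    batch = real_faces.size(0)
    z = torch.randn(batch, latent_dim)  # Gaussian noise input

    # Discriminator step: score real images toward 1 and generated images toward 0.
    fake = G(z).detach()
    loss_d = bce(D(real_faces), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator score fakes as real.
    loss_g = bce(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Example: one update on a batch of flattened 128x128 face images.
adversarial_step(torch.randn(8, img_dim))
```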
Take the schematic diagram of FIG. 2, in which the first sample generation model to be trained and the second sample generation model to be trained are constructed based on the facial image generation model, as an example: G1 is the facial image generation model, and a clear facial image is obtained after Gaussian noise is input. In this embodiment, in order for the output of the facial image generation model to be almost identical to the facial image corresponding to the input Gaussian vector, the facial image generation model needs to be trained first. Optionally: obtain a plurality of basic training samples; process the Gaussian noise based on an image generator to be trained to generate an image to be discriminated; perform discrimination processing on the image to be discriminated and a collected real facial image based on a discriminator to determine a reference loss value; correct the model parameters in the image generator to be trained based on the reference loss value; and take convergence of the loss function in the image generator to be trained as the training objective to obtain the facial image generation model.
The basic training samples are the data used to train the facial image generation model; each basic training sample includes Gaussian noise corresponding to facial information of a target subject. The facial information of the target subject is an image containing the user's facial information, for example, an ID photo or a daily-life photo of the user, and the Gaussian noise can be understood as a high-dimensional vector corresponding to the facial information of the target subject. It should be noted that, in practical applications, a large number of basic training samples can be obtained from the large public dataset FFHQ (a facial feature dataset).
Meanwhile, it can also be determined from the above description that, when the facial image generation model to be trained is a StyleGAN model, the model consists of an image generator to be trained and a discriminator. Therefore, after a plurality of basic training samples are obtained, the image generator to be trained can process a large amount of Gaussian noise to generate images to be discriminated, that is, images that may differ from the real facial images input by users. Once these are determined, a reference loss value between the image to be discriminated and the real facial image can be determined based on the discriminator. When the model parameters in the image generator to be trained are corrected with the reference loss value, the training error of the loss function, i.e., the loss parameter, can be used as the condition for detecting whether the loss function has converged, for example, whether the training error is smaller than a preset error, whether the error trend has become stable, or whether the current number of iterations equals a preset number. If the convergence condition is met, for example the training error of the loss function is smaller than the preset error or the error trend has become stable, training of the facial image generation model is complete, and the iterative training can be stopped. If the convergence condition has not been met, further basic training samples can be obtained to continue training the model until the training error of the loss function falls within the preset range. It can be understood that, when the training error of the loss function converges, the trained facial image generation model is obtained; at this point, after the Gaussian vector corresponding to a user's facial image is input into the model, an image almost identical to the user's facial image is obtained. Taking FIG. 2 as an example, the image output by the trained G1 is almost identical to the image corresponding to the input Gaussian noise. Generally speaking, training the facial image generation model on a large amount of data is difficult and consumes considerable computing resources; moreover, to train a model for generating images of a specific style type, a large number of images of that style type are also needed as training samples, and samples of some style types are nearly nonexistent or hard to obtain. Correspondingly, in practical applications no model of such a style type can be trained, and captured images cannot be converted into images of that style. Therefore, in this embodiment, after the parameters of the facial image generation model have been trained, transfer learning can be used to obtain models for generating images of specific style types. In the field of artificial intelligence, transfer learning applies knowledge or patterns learned in one domain or task to a different but related domain or problem, that is, it transfers annotated data or knowledge structures from the related domain to complete or improve the learning effect in the target domain or task. In this embodiment, the benefit of transfer learning is that a model for generating a certain style type can be trained with only a small number of samples.
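The convergence test that recurs in these training loops (training error below a preset error, the error trend flattening out, or a preset iteration count being reached) can be read as a small helper function; the thresholds and window size below are illustrative assumptions of this sketch, not values taken from the present solution.

```python
from collections import deque

def make_convergence_check(max_error: float = 0.05,
                           max_steps: int = 100_000,
                           window: int = 200,
                           flat_tol: float = 1e-4):
    """Return a callable that reports True once any stop condition holds."""
    history = deque(maxlen=window)

    def converged(step: int, loss: float) -> bool:
        history.append(loss)
        if loss < max_error:        # training error below the preset error
            return True
        if step >= max_steps:       # iteration count reached the preset number
            return True
        if len(history) == window and max(history) - min(history) < flat_tol:
            return True             # error trend has become stable
        return False

    return converged
```

A loop would call `converged(step, loss.item())` after each update and stop as soon as it returns True; the same helper is reused in the later sketches.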
For example, in order to obtain models for generating images of specific style types, the parameters that have already been trained in the facial image generation model can be taken as the model parameters to be transferred, and the first sample generation model to be trained and the second sample generation model to be trained are constructed based on these parameters.
It can be understood that the benefit of constructing the first and second sample generation models to be trained through transfer learning is that models for generating images of specific style types can be constructed efficiently from model parameters that have already been trained. This not only avoids the cumbersome process of acquiring a large number of images of that style as training data, that is, it eliminates the difficulty of sample acquisition, but also reduces the consumption of computing resources.
Continuing with FIG. 2: after the facial image generation model G1 is determined, the model parameters to be transferred of G1 can be obtained, and the first sample generation model to be trained, G2, and the second sample generation model to be trained, G3, are generated based on transfer learning. As shown in FIG. 2, after Gaussian noise corresponding to a user's facial image is input into G2 for processing, the image output by the model retains the user's distinctive facial features while presenting the dress-up style of a specific region; for example, the image output by G2 under the first style type may be an image in which features such as regional costumes, hairstyles, hair accessories, and makeup are added on top of the user's original facial features. After Gaussian noise corresponding to the user's facial image is input into G3 for processing, the image output by the model retains the user's distinctive facial features while presenting an ancient-style material effect; for example, the image output by G3 under the second style type may be an image in which character features from ancient-style paintings are added on top of the user's original facial features, which can be understood as giving the user's realistic facial image the visual effect of an ancient figure painting.
S102: Train the first sample generation model to be trained based on training samples of the first style type to obtain a first target sample generation model.
In this embodiment, after the first sample generation model to be trained is obtained, training samples of the first style type can be obtained to train the model. The first style type is a regional style image, for example, a user facial image with a distinctive regional dress-up; that is, the first style type presents features such as the costumes, hairstyles, hair accessories, and makeup of users in a certain region. Each training sample includes a first facial image under the first style type. The first facial image can be processed based on a trained target encoding model to generate Gaussian noise corresponding to the first facial image. Taking FIG. 2 as an example, when the first sample generation model to be trained is a model for generating images of a specific regional style, the corresponding training samples are a plurality of images of users dressed in that regional style; these images are the first facial images.
The process of training the first sample generation model to be trained is as follows: obtain a plurality of training samples under the first style type; input Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image; perform discrimination processing on the first actual output image and the corresponding first facial image based on the discriminator to determine a loss value, so as to correct the model parameters in the first sample generation model to be trained based on the loss value; and take convergence of the loss function in the first image generation model to be trained as the training objective to obtain the first target sample generation model.
For example, after a plurality of training samples under the first style type are obtained, the image generator in the first sample generation model to be trained can likewise process a plurality of Gaussian noise inputs to generate first actual output images to be discriminated, that is, images that differ from the first facial images. After the first actual output images and the corresponding first facial images are determined, a plurality of corresponding loss values can be determined based on the discriminator. When the model parameters in the first sample generation model to be trained are corrected with these loss values, the training error of the loss function, i.e., the loss parameter, can be used as the condition for detecting whether the loss function has converged, for example, whether the training error is smaller than a preset error, whether the error trend has become stable, or whether the current number of iterations equals a preset number. If the convergence condition is met, for example the training error of the loss function is smaller than the preset error or the error trend has become stable, training of the first sample generation model to be trained is complete, and the iterative training can be stopped. If the convergence condition has not been met, further training samples under the first style type can be obtained to continue training the model until the training error of the loss function falls within the preset range. It can be understood that, when the training error of the loss function converges, the trained first target sample generation model is obtained; at this point, after Gaussian noise corresponding to a user's facial image is input into the model, a facial image is obtained that retains the user's distinctive facial features while presenting the first style type.
It should be noted that, because the first sample generation model to be trained is constructed based on the already-trained facial image generation model, only a small number of training samples of the first style type are needed to obtain the first target sample generation model. In practical applications, the training samples may be about 200 images of the first style type (i.e., first facial images); these images should have a structure similar to the facial images input by users, for example, they should all contain features such as the user's facial features and hair.
In this way, not only is the convenience of model training improved, but a corresponding target sample generation model can also be trained when images of a specific style type are scarce, greatly reducing the number of training samples the model to be trained requires, as the sketch after this paragraph illustrates.
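A minimal sketch of the transfer step under the assumptions above: the trained parameters of G1 are copied into the style model to be trained, which is then fine-tuned on the small style dataset. Here `pretrained_g1`, `discriminator`, `style_loader`, and `train_step` are placeholders standing in for the already-trained generator, the discriminator, the roughly 200-image style dataset, and one adversarial update of the kind sketched earlier.

```python
import copy

import torch.nn as nn

def build_style_generator(pretrained_g1: nn.Module) -> nn.Module:
    """Clone the trained face generator G1; its parameters are the ones transferred."""
    return copy.deepcopy(pretrained_g1)

def finetune_on_style(g_style: nn.Module, discriminator: nn.Module,
                      style_loader, train_step, epochs: int = 10) -> nn.Module:
    """Fine-tune the cloned generator on a small style dataset (e.g., ~200 images)."""
    for _ in range(epochs):
        for style_images in style_loader:
            # One adversarial update of the kind sketched for G1 above.
            train_step(g_style, discriminator, style_images)
    return g_style
```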
S103: Train the second sample generation model to be trained based on training samples of the second style type to obtain a second target sample generation model.
In this embodiment, after the second sample generation model to be trained is obtained, training samples of the second style type can be obtained to train the model. The second style type is an ancient-style material image, for example, an image in the style of ancient figure paintings; that is, the second style type presents features of ancient fine-brushwork painting, oil painting, and the like. Each training sample includes a second facial image under the second style type; after the second facial image is processed, Gaussian noise reflecting the corresponding facial features can likewise be obtained. Taking FIG. 2 as an example, when the second sample generation model to be trained is a model for generating ancient-style material images, the corresponding training samples are a plurality of ancient-style material images; these images are the second facial images.
The process of training the second sample generation model to be trained is as follows: obtain a plurality of training samples under the second style type; input Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain a second actual output image; perform discrimination processing on the second actual output image and the corresponding second facial image based on the discriminator to determine a loss value, so as to correct the model parameters in the second sample generation model to be trained based on the loss value; and take convergence of the loss function in the second image generation model to be trained as the training objective to obtain the second target sample generation model.
Those skilled in the art should understand that the process of training the second sample generation model to be trained based on a plurality of training samples under the second style type is similar to the process of training the first sample generation model to be trained based on a plurality of training samples under the first style type, and details are not repeated here. Likewise, in practical applications, training the second sample generation model to be trained to obtain the second target sample generation model also requires only a small amount of training data of the second style type, for example, about 200 images of the second style type (i.e., second facial images), and these images should also have a structure similar to the facial images input by users, for example, they should all contain features such as the user's facial features and hair. It can be understood that this model training approach, similar to that of the first sample generation model to be trained, is likewise convenient and reduces the number of images of the second style type required; details are not repeated here.
S104: Determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
In this embodiment, after the first target sample generation model and the second target sample generation model are obtained through training, the parameters of the two models can be obtained, and the target style data generation model is obtained based on model fusion. Model fusion is the process of training a plurality of models and then integrating them according to a certain method. After the integrated target style data generation model processes the facial image input by the user, the output image not only retains the user's distinctive facial features but also presents the first style type and the second style type at the same time; such images presenting multiple style types are stylized images.
For example, when constructing the target style data generation model, a preset fitting parameter first needs to be obtained; fitting processing is performed on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameter to obtain target model parameters; and the target style data generation model is determined based on the target model parameters. The fitting parameter may be a coefficient characterizing the degree of fusion of the two style types; in the output stylized image, the fitting parameter is at least used to adjust the weights of the different style types, which can be understood as controlling which of the two style types the output stylized image leans toward. In practical applications, developers can edit or modify the fitting parameter in advance through corresponding controls or programs; details are not repeated here.
For example, based on the preset fitting parameter, the model parameters of the first target sample generation model and the second target sample generation model can be linearly combined to obtain the target model parameters, that is, the parameters needed to construct the target style data generation model. The target style data generation model can thus be obtained based on these parameters.
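In other words, with λ denoting the preset fitting parameter, the linear combination can be written as θ_G4 = λ·θ_G2 + (1−λ)·θ_G3. A sketch over PyTorch state dicts follows; it assumes G2 and G3 share one architecture, which holds here because both were constructed from the same G1, and that all blended entries are floating-point parameter tensors.

```python
import torch.nn as nn

def fuse_generators(g2: nn.Module, g3: nn.Module, lam: float) -> dict:
    """Blend two same-architecture generators parameter by parameter.

    lam is the fitting parameter: lam=1.0 reproduces the regional-style model G2,
    lam=0.0 reproduces the ancient-material model G3.
    """
    sd2, sd3 = g2.state_dict(), g3.state_dict()
    return {k: lam * sd2[k] + (1.0 - lam) * sd3[k] for k in sd2}

# Example: a fused G4 leaning slightly toward the regional style.
# g4.load_state_dict(fuse_generators(g2, g3, lam=0.6))
```

This parameter-space blending is only meaningful because both fine-tuned models descend from the same pretrained G1, so corresponding parameters remain aligned.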
Take FIG. 3, a schematic diagram of constructing the target style data generation model based on the first target sample generation model and the second target sample generation model, as an example: after the model parameters of G2 and G3 are linearly combined based on the preset fitting parameter, the target style data generation model G4 is constructed. As shown in FIG. 3, since G2 can produce an image of a specific regional style from the user input and G3 can produce an image of ancient-style material from the user input, the image obtained after the constructed G4 processes the user input not only retains the user's distinctive facial features but also presents the specific regional style and the ancient-style material at the same time. For example, when the image of the first style type is an image with added regional costumes, hairstyles, hair accessories, and makeup, and the image of the second style type is an image with added character features from ancient-style paintings, the stylized image output by G4 that fuses the first style type and the second style type presents the user's original facial features while showing the regional costumes, hairstyles, hair accessories, and makeup, and at the same time gives the image the visual effect of an ancient figure painting.
In the technical solution of this embodiment, the model parameters to be transferred of the facial image generation model are obtained first, so that the first sample generation model to be trained and the second sample generation model to be trained are constructed based on these parameters; the corresponding sample generation models to be trained are trained based on training samples of the two style types; after training is complete, the model parameters to be fitted of the two target sample generation models are obtained, and the target style data generation model is determined based on them, so as to generate stylized images fusing the two style types based on the target style data generation model. The target style data generation model can thus be constructed efficiently without a large number of training samples fusing the two style types, which not only lets users generate images of the target style type with the model but also reduces the cost consumed in the model construction process.
On the basis of the above solution, after the target style data generation model is obtained, the facial image input by the user can be processed to obtain an image with multiple styles at the same time. At this point, because the model is obtained by a weighted average of the parameters of the first target sample generation model and the parameters of the second target sample generation model, the output image may be of poor quality. To address this problem, the target style data generation model can be optimized in the following manner.
For example, Gaussian noise is input into the target style data generation model to obtain a stylized image to be corrected that fuses the first style type and the second style type; a target style image is determined by performing correction processing on the stylized image to be corrected, and the target style image is used as a target training sample, so as to correct the model parameters in the target style data generation model based on the target training sample and obtain an updated target style data generation model.
Taking FIG. 3 as an example, after the Gaussian noise z corresponding to the user's facial image is obtained, it can be input into the target style data generation model G4; correspondingly, the image output by G4 is the stylized image to be corrected. It can be understood that, although the stylized image to be corrected retains the user's distinctive facial features, the effect it presents when embodying the first style type and the second style type may not reach high accuracy, or the fusion of the two style types may appear stiff. At this point, the stylized image to be corrected can be corrected using relevant applications, for example, by adjusting parameters such as saturation, contrast, blur, and texture of the image based on a pre-written script or relevant drawing software, so as to obtain a target style image that better matches the user's expectations. Those skilled in the art should understand that the corrected target style image can then serve as training data to train the target style data generation model in the subsequent process.
In this embodiment, the model parameters may be corrected to update the model as follows: Gaussian noise is input into the target style data generation model to output a stylized image to be corrected; the stylized image to be corrected and the target style image are processed based on the discriminator to determine a loss value; and the model parameters in the target style data generation model are corrected based on the loss value to obtain an updated target style data generation model.
In this embodiment, after the Gaussian noise corresponding to the user's facial features is obtained, the target style data generation model can process a plurality of Gaussian noise inputs to generate stylized images to be corrected, that is, images that do not yet fully present the target style. After the stylized images to be corrected and the target style images are determined, a plurality of corresponding loss values can be determined based on the discriminator. When the model parameters in the target style data generation model are corrected with these loss values, the training error of the loss function, i.e., the loss parameter, can be used as the condition for detecting whether the loss function has converged, for example, whether the training error is smaller than a preset error, whether the error trend has become stable, or whether the current number of iterations equals a preset number. If the convergence condition is met, for example the training error of the loss function is smaller than the preset error or the error trend has become stable, training of the target style data generation model is complete, and the iterative training can be stopped. If the convergence condition has not been met, further Gaussian noise can be processed to generate new stylized images to be corrected, and training continues until the training error of the loss function falls within the preset range. It can be understood that, when the training error of the loss function converges, the trained target style data generation model is obtained; after Gaussian noise corresponding to a user's facial image is input into the model, a facial image is obtained that retains the user's distinctive facial features while presenting both the first style type and the second style type.
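Read as code, the update step above amounts to regenerating images from the same noise vectors and nudging G4 toward the manually corrected targets. A sketch follows, in which the L1 reconstruction loss is an illustrative simplification of the discriminator-based loss value described in this solution, and `converged` reuses the convergence helper sketched earlier.

```python
import torch
import torch.nn.functional as F

def refine_g4(g4, noise_batch, corrected_targets, converged, lr=1e-5):
    """Fine-tune the fused generator G4 on manually corrected style images."""
    opt = torch.optim.Adam(g4.parameters(), lr=lr)
    step = 0
    while True:
        output = g4(noise_batch)                     # stylized images to be corrected
        loss = F.l1_loss(output, corrected_targets)  # pull outputs toward the target style images
        opt.zero_grad(); loss.backward(); opt.step()
        step += 1
        if converged(step, loss.item()):             # convergence helper from the earlier sketch
            return g4
```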
It should also be noted that, in this technical solution, the target stylized image corresponds to the target special effect image mentioned herein.
It should be noted that, in practical applications, the constructed target style data generation model can be deployed in relevant application software. It can be understood that, when it is detected that the user triggers a special effect control associated with the target style data generation model, the program related to the special effect can be run. For example, when a user facial image is received through a user import operation (e.g., the user uploads a photo through a relevant button), or a user facial image is captured by the camera of a mobile terminal (e.g., the user is in a real-time video), these images can be converted so as to display stylized images fusing the two style types.
Embodiment 2
FIG. 4 is a schematic flowchart of a method for generating stylized images according to Embodiment 2 of the present disclosure. On the basis of the foregoing embodiment, after the target style data generation model is obtained, a trained target encoding model can further be combined with the target style data generation model to obtain a complete special effect image generation model; for example, the special effect image generation model is deployed on a mobile terminal to provide users with a service that generates multi-style special effect images from input images. For the specific implementation, reference may be made to the technical solution of this embodiment. Technical terms identical or corresponding to those in the foregoing embodiment are not repeated here.
As shown in FIG. 4, the method includes the following steps:
S201: Obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred.
S202: Train the first sample generation model to be trained based on training samples of the first style type to obtain a first target sample generation model.
S203: Train the second sample generation model to be trained based on training samples of the second style type to obtain a second target sample generation model.
S204: Determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
S205: Determine a special effect image generation model.
In this embodiment, after the target style data generation model is obtained, in order to provide users with the corresponding service, that is, to let users apply the corresponding special effect to an input facial image with the model, a corresponding special effect image generation model also needs to be constructed based on the target style data generation model.
Generally speaking, after the special effect image generation model is obtained, it also needs to be deployed on a terminal device. Terminal devices generally have the capability of capturing user facial images, whereas the trained target style data generation model can only process Gaussian noise corresponding to user facial images. Therefore, in order for the special effect image generation model to run effectively on a terminal device, a model capable of generating the corresponding Gaussian noise from a user facial image also needs to be determined, namely the target encoding model.
For example, an encoding model to be trained is trained based on the facial image generation model and a plurality of facial images to obtain a target encoding model; a special effect image generation model is determined based on the target encoding model and the target style data generation model, so as to perform stylization processing on the acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image fusing the first style type and the second style type.
A facial image is an image containing facial features input by the user, for example, an ID photo or a daily-life photo of the user. The encoding model to be trained may be an encoder model; those skilled in the art should understand that the encoder-decoder framework is a deep-learning model framework, and details are not repeated here. After a plurality of facial images are input into the encoder model and the Gaussian noise output by the encoder model is processed based on the facial image generation model, corresponding images that can serve as training data for the encoding model to be trained are obtained.
For example, the training process of the encoding model to be trained is as follows: obtain a plurality of first training images; for each first training image, input the current first training image into the encoding model to be trained to obtain Gaussian noise to be used corresponding to the current first training image; input the Gaussian noise to be used into the facial image generation model to obtain a third actual output image; determine an image loss value based on the third actual output image and the current first training image; correct the model parameters in the encoding model to be trained based on the image loss value, and take convergence of the loss function in the encoding model to be trained as the training objective to obtain the target encoding model, so as to determine the special effect image generation model based on the target encoding model and the target style data generation model.
In this embodiment, after first training images containing users' facial features are obtained, the encoding model to be trained can process a plurality of these images to generate the corresponding Gaussian noise to be used; this Gaussian noise is in fact a high-dimensional vector that does not yet accurately and completely reflect the user's facial features. Processing this Gaussian noise to be used with the facial image generation model yields third actual output images that are not entirely consistent with the first training images. After the third actual output image and the current first training image are determined, a plurality of corresponding loss values can be determined based on the discriminator. When the model parameters in the encoding model to be trained are corrected with these loss values, the training error of the loss function, i.e., the loss parameter, can be used as the condition for detecting whether the loss function has converged, for example, whether the training error is smaller than a preset error, whether the error trend has become stable, or whether the current number of iterations equals a preset number. If the convergence condition is met, training of the encoding model to be trained is complete, and the iterative training can be stopped. If the convergence condition has not been met, further first training images can be processed, third actual output images corresponding to the resulting Gaussian vectors are generated based on the facial image generation model, and training continues until the training error of the loss function falls within the preset range. When the training error of the loss function converges, the trained target encoding model is obtained. It can be understood that the target encoding model is configured to process an input facial image into the corresponding Gaussian noise; after a user facial image is input into the target encoding model, the facial image generation model, based on the Gaussian noise output by the target encoding model, can output an image almost identical to the user's facial image.
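A sketch of this encoder training loop, in which the face generator G1 stays frozen and only the encoder E is updated; the plain pixel-wise MSE used as the image loss value is an assumption of this sketch rather than a choice specified by the present solution.

```python
import torch
import torch.nn.functional as F

def train_encoder(encoder, g1, image_loader, lr=1e-4, epochs=20):
    """Train E so that G1(E(x)) reconstructs each input face x; G1 stays frozen."""
    for p in g1.parameters():
        p.requires_grad_(False)             # only the encoder is updated
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for faces in image_loader:          # first training images
            latent = encoder(faces)         # Gaussian noise to be used
            recon = g1(latent)              # third actual output image
            loss = F.mse_loss(recon, faces) # image loss value
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder
```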
In this embodiment, after the target encoding model is obtained, the target encoding model and the target style data generation model are combined to obtain the special effect image generation model. Taking FIG. 3 as an example, after the target encoding model (the model corresponding to the label E in FIG. 3) is obtained, it can be combined with G4 to obtain the special effect image generation model. After the user inputs a facial image into the special effect image generation model, the target encoding model in it processes the image and inputs the resulting Gaussian noise z into G4; after processing by G4, a target special effect image is obtained that retains the user's distinctive facial features while presenting the specific regional style and the ancient-style material.
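The combined special effect image generation model is then simply the composition of E and G4; a minimal sketch:

```python
import torch
import torch.nn as nn

class SpecialEffectModel(nn.Module):
    """End-to-end pipeline: face image -> latent code z -> dual-style image."""

    def __init__(self, encoder: nn.Module, g4: nn.Module):
        super().__init__()
        self.encoder = encoder  # target encoding model E
        self.g4 = g4            # fused target style data generation model

    @torch.no_grad()
    def forward(self, face_image: torch.Tensor) -> torch.Tensor:
        z = self.encoder(face_image)  # image to Gaussian-noise latent
        return self.g4(z)             # latent to stylized special effect image
```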
S206: Deploy the special effect image generation model on a mobile terminal, so as to process a captured image to be processed into a target special effect image fusing the first style type and the second style type when a special effect display control is detected.
In this embodiment, after the special effect image generation model is obtained, in order to provide users with the corresponding service through the model, the model can be deployed on a mobile terminal, for example, by integrating the special effect image generation model into an application (APP) developed for a mobile platform based on a specific program algorithm.
For example, a corresponding control can be developed in the APP for this special effect image, e.g., a button named "multi-style special effect" in the APP interface, which is associated with the function of generating images with multiple style types based on the special effect image generation model. On this basis, when it is detected that the user triggers the button, images input by the user in real time on the mobile terminal, or images pre-stored on the mobile terminal, can be retrieved. It can be understood that the retrieved images need to contain at least the user's facial information; these images are the images to be processed.
For example, the images to be processed can be processed based on the program code corresponding to the special effect image generation model, so as to obtain target special effect images that retain the user's distinctive facial features while fusing the first style type and the second style type, i.e., the special effect image output by G4 in FIG. 3.
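As for packaging the model for the mobile terminal, one plausible route, offered here as an assumption rather than as the toolchain of the present solution, is to trace the composed model into TorchScript and bundle the serialized artifact into the APP, invoking it when the "multi-style special effect" control is triggered; `SpecialEffectModel` is the class sketched earlier.

```python
import torch
import torch.nn as nn

# Identity stand-ins let the export run end to end; in practice they are the trained
# target encoding model E and the fused generator G4 from the steps above.
effect_model = SpecialEffectModel(nn.Identity(), nn.Identity())
effect_model.eval()

example = torch.randn(1, 3, 256, 256)              # dummy face image used for tracing
scripted = torch.jit.trace(effect_model, example)  # freeze the pipeline into TorchScript
scripted.save("multi_style_effect.pt")             # bundle this artifact into the mobile APP
```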
In the technical solution of this embodiment, after the target style data generation model is obtained, the trained target encoding model can further be combined with the target style data generation model to obtain a complete special effect image generation model; the special effect image generation model is deployed on a mobile terminal to provide users with a service that generates multi-style special effect images from input images.
Embodiment 3
FIG. 5 is a structural block diagram of an apparatus for generating stylized images according to Embodiment 3 of the present disclosure. The apparatus can perform the method for generating stylized images provided by any embodiment of the present disclosure and has the functional modules corresponding to the method. As shown in FIG. 5, the apparatus includes: a to-be-transferred model parameter acquisition module 301, a first sample generation model training module 302, a second sample generation model training module 303, and a target style data generation model determination module 304.
The to-be-transferred model parameter acquisition module 301 is configured to obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred.
The first sample generation model training module 302 is configured to train the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model.
The second sample generation model training module 303 is configured to train the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model.
The target style data generation model determination module 304 is configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a facial image generation model determination module.
The facial image generation model determination module is configured to: obtain a plurality of basic training samples, wherein each basic training sample includes Gaussian noise corresponding to facial information of a target subject; process the Gaussian noise based on an image generator to be trained to generate an image to be discriminated; perform discrimination processing on the image to be discriminated and a collected real facial image based on a discriminator to determine a reference loss value; correct the model parameters in the image generator to be trained based on the reference loss value; and take convergence of the loss function in the image generator to be trained as the training objective to obtain the facial image generation model.
On the basis of the above technical solutions, the first sample generation model training module 302 includes a first style type training sample acquisition unit, a first actual output image determination unit, a first correction unit, and a first target sample generation model determination unit.
The first style type training sample acquisition unit is configured to obtain a plurality of training samples under the first style type, wherein each training sample includes a first facial image under the first style type.
The first actual output image determination unit is configured to input Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image.
The first correction unit is configured to perform discrimination processing on the first actual output image and the corresponding first facial image based on a discriminator to determine a loss value, so as to correct the model parameters in the first sample generation model to be trained based on the loss value.
The first target sample generation model determination unit is configured to take convergence of the loss function in the first image generation model to be trained as the training objective to obtain the first target sample generation model.
On the basis of the above technical solutions, the second sample generation model training module 303 includes a second style type training sample acquisition unit, a second actual output image determination unit, a second correction unit, and a second target sample generation model determination unit.
The second style type training sample acquisition unit is configured to obtain a plurality of training samples under the second style type, wherein each training sample includes a second facial image under the second style type.
The second actual output image determination unit is configured to input Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain a second actual output image.
The second correction unit is configured to perform discrimination processing on the second actual output image and the corresponding second facial image based on a discriminator to determine a loss value, so as to correct the model parameters in the second sample generation model to be trained based on the loss value.
The second target sample generation model determination unit is configured to take convergence of the loss function in the second image generation model to be trained as the training objective to obtain the second target sample generation model.
On the basis of the above technical solutions, the target style data generation model determination module 304 includes a fitting parameter acquisition unit, a target model parameter determination unit, and a target style data generation model determination unit.
The fitting parameter acquisition unit is configured to obtain a preset fitting parameter.
The target model parameter determination unit is configured to perform fitting processing on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameter to obtain target model parameters.
The target style data generation model determination unit is configured to determine the target style data generation model based on the target model parameters.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a target style data generation model update module.
The target style data generation model update module is configured to: input Gaussian noise into the target style data generation model to obtain a stylized image to be corrected that fuses the first style type and the second style type; determine a target style image by performing correction processing on the stylized image to be corrected; and use the target style image as a target training sample, so as to correct the model parameters in the target style data generation model based on the target training sample to obtain an updated target style data generation model.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a model parameter correction module.
The model parameter correction module is configured to: input Gaussian noise into the target style data generation model to output a stylized image to be corrected; process the stylized image to be corrected and the target style image based on a discriminator to determine a loss value; and correct the model parameters in the target style data generation model based on the loss value to obtain an updated target style data generation model.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a stylization processing module.
The stylization processing module is configured to: train an encoding model to be trained based on the facial image generation model and a plurality of facial images to obtain a target encoding model, wherein the target encoding model is configured to process an input facial image into corresponding Gaussian noise; and determine a special effect image generation model based on the target encoding model and the target style data generation model, so as to perform stylization processing on the acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image fusing the first style type and the second style type.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a target encoding model determination module.
The target encoding model determination module is configured to: obtain a plurality of first training images; for each first training image, input the current first training image into the encoding model to be trained to obtain Gaussian noise to be used corresponding to the current first training image; input the Gaussian noise to be used into the facial image generation model to obtain a third actual output image; determine an image loss value based on the third actual output image and the current first training image; and correct the model parameters in the encoding model to be trained based on the image loss value, taking convergence of the loss function in the encoding model to be trained as the training objective to obtain the target encoding model, so as to determine the special effect image generation model based on the target encoding model and the target style data generation model.
On the basis of the above technical solutions, the apparatus for generating stylized images further includes a model deployment module.
The model deployment module is configured to deploy the special effect image generation model on a mobile terminal, so as to process a captured image to be processed into a target special effect image fusing the first style type and the second style type when a special effect display control is detected.
On the basis of the above technical solutions, the first style type is a regional style image, and the second style type is an ancient-style material image.
In the technical solution provided by this embodiment, the model parameters to be transferred of the facial image generation model are obtained first, so that the first sample generation model to be trained and the second sample generation model to be trained are constructed based on these parameters; the corresponding sample generation models to be trained are trained based on training samples of the two style types; after training is complete, the model parameters to be fitted of the two target sample generation models are obtained, and the target style data generation model is determined based on them, so as to generate stylized images fusing the two style types based on the target style data generation model. The target style data generation model can thus be constructed efficiently without a large number of training samples fusing the two style types, which not only lets users generate images of the target style type with the model but also reduces the cost consumed in the model construction process.
The apparatus for generating stylized images provided by the embodiment of the present disclosure can perform the method for generating stylized images provided by any embodiment of the present disclosure and has the functional modules corresponding to the method.
It is worth noting that the units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for the convenience of mutual distinction and are not used to limit the protection scope of the embodiments of the present disclosure.
Embodiment 4
FIG. 6 is a schematic structural diagram of an electronic device according to Embodiment 4 of the present disclosure. Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device (e.g., the terminal device or server in FIG. 6) 400 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), PADs (tablet computers), portable multimedia players (PMP), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TV) and desktop computers. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 400 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 401, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage apparatus 406 into a random access memory (RAM) 403. Various programs and data required for the operation of the electronic device 400 are also stored in the RAM 403. The processing apparatus 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following apparatuses may be connected to the I/O interface 405: an editing apparatus 406 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 407 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage apparatus 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 409. The communication apparatus 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows the electronic device 400 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 409, or installed from the storage apparatus 406, or installed from the ROM 402. When the computer program is executed by the processing apparatus 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
The electronic device provided by the embodiment of the present disclosure and the method for generating stylized images provided by the above embodiments belong to the same inventive concept; for technical details not described in detail in this embodiment, reference may be made to the above embodiments.
Embodiment 5
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the method for generating stylized images provided by the above embodiments is implemented.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: a wire, an optical cable, radio frequency (RF), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries at least one program which, when executed by the electronic device, causes the electronic device to:
obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
train the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
train the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not in some cases constitute a limitation of the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Parts (ASSP), Systems on Chip (SOC), Complex Programmable Logic Devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to at least one embodiment of the present disclosure, [Example 1] provides a method for generating stylized images, the method including:
obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
training the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
training the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
determining a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
According to at least one embodiment of the present disclosure, [Example 2] provides a method for generating stylized images, further including:
optionally, obtaining a plurality of basic training samples, wherein each basic training sample includes Gaussian noise corresponding to facial information of a target subject;
processing the Gaussian noise based on an image generator to be trained to generate an image to be discriminated;
performing discrimination processing on the image to be discriminated and a collected real facial image based on a discriminator to determine a reference loss value;
correcting the model parameters in the image generator to be trained based on the reference loss value;
taking convergence of the loss function in the image generator to be trained as the training objective to obtain the facial image generation model.
According to at least one embodiment of the present disclosure, [Example 3] provides a method for generating stylized images, further including:
optionally, obtaining a plurality of training samples under the first style type, wherein each training sample includes a first facial image under the first style type;
inputting Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image;
performing discrimination processing on the first actual output image and the corresponding first facial image based on a discriminator to determine a loss value, so as to correct the model parameters in the first sample generation model to be trained based on the loss value;
taking convergence of the loss function in the first image generation model to be trained as the training objective to obtain the first target sample generation model.
According to at least one embodiment of the present disclosure, [Example 4] provides a method for generating stylized images, further including:
optionally, obtaining a plurality of training samples under the second style type, wherein each training sample includes a second facial image under the second style type;
inputting Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain a second actual output image;
performing discrimination processing on the second actual output image and the corresponding second facial image based on a discriminator to determine a loss value, so as to correct the model parameters in the second sample generation model to be trained based on the loss value;
taking convergence of the loss function in the second image generation model to be trained as the training objective to obtain the second target sample generation model.
According to at least one embodiment of the present disclosure, [Example 5] provides a method for generating stylized images, further including:
optionally, obtaining a preset fitting parameter;
performing fitting processing on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameter to obtain target model parameters;
determining the target style data generation model based on the target model parameters.
According to at least one embodiment of the present disclosure, [Example 6] provides a method for generating stylized images, further including:
optionally, inputting Gaussian noise into the target style data generation model to obtain a stylized image to be corrected that fuses the first style type and the second style type;
determining a target style image by performing correction processing on the stylized image to be corrected, and using the target style image as a target training sample, so as to correct the model parameters in the target style data generation model based on the target training sample and obtain an updated target style data generation model.
According to at least one embodiment of the present disclosure, [Example 7] provides a method for generating stylized images, further including:
optionally, inputting Gaussian noise into the target style data generation model and outputting a stylized image to be corrected;
processing the stylized image to be corrected and the target style image based on a discriminator to determine a loss value;
correcting the model parameters in the target style data generation model based on the loss value to obtain an updated target style data generation model.
According to at least one embodiment of the present disclosure, [Example 8] provides a method for generating stylized images, further including:
optionally, training an encoding model to be trained based on the facial image generation model and a plurality of facial images to obtain a target encoding model, wherein the target encoding model is configured to process an input facial image into corresponding Gaussian noise;
determining a special effect image generation model based on the target encoding model and the target style data generation model, so as to perform stylization processing on the acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image fusing the first style type and the second style type.
According to at least one embodiment of the present disclosure, [Example 9] provides a method for generating stylized images, further including:
optionally, obtaining a plurality of first training images;
for each first training image, inputting the current first training image into the encoding model to be trained to obtain Gaussian noise to be used corresponding to the current first training image;
inputting the Gaussian noise to be used into the facial image generation model to obtain a third actual output image;
determining an image loss value based on the third actual output image and the current first training image;
correcting the model parameters in the encoding model to be trained based on the image loss value, and taking convergence of the loss function in the encoding model to be trained as the training objective to obtain the target encoding model, so as to determine the special effect image generation model based on the target encoding model and the target style data generation model.
According to at least one embodiment of the present disclosure, [Example 10] provides a method for generating stylized images, further including:
optionally, deploying the special effect image generation model on a mobile terminal, so as to process a captured image to be processed into a target special effect image fusing the first style type and the second style type when a special effect display control is detected.
According to at least one embodiment of the present disclosure, [Example 11] provides a method for generating stylized images, further including:
optionally, the first style type is a regional style image, and the second style type is an ancient-style material image.
According to at least one embodiment of the present disclosure, [Example 12] provides an apparatus for generating stylized images, including:
a to-be-transferred model parameter acquisition module, configured to obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
a first sample generation model training module, configured to train the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
a second sample generation model training module, configured to train the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model;
a target style data generation model determination module, configured to determine a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.

Claims (24)

  1. A method for generating stylized images, comprising:
    obtaining model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
    training the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
    training the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model; and
    determining a target style data generation model based on model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
  2. The method according to claim 1, wherein the facial image generation model to be trained comprises an image generator to be trained and a discriminator, and before the obtaining of the model parameters to be transferred of the facial image generation model, the method further comprises:
    obtaining a plurality of basic training samples, wherein each basic training sample comprises Gaussian noise corresponding to facial information of a target subject;
    processing the Gaussian noise based on the image generator to be trained to generate an image to be discriminated;
    performing discrimination processing on the image to be discriminated and a collected real facial image based on the discriminator to determine a reference loss value;
    correcting model parameters in the image generator to be trained based on the reference loss value; and
    taking convergence of a loss function in the image generator to be trained as a training objective to obtain the facial image generation model.
  3. The method according to claim 1, wherein the training the first sample generation model to be trained based on training samples of the first style type to obtain the first target sample generation model comprises:
    obtaining a plurality of training samples under the first style type, wherein each training sample comprises a first facial image under the first style type;
    inputting Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image;
    performing discrimination processing on the first actual output image and the corresponding first facial image based on a discriminator to determine a loss value, so as to correct model parameters in the first sample generation model to be trained based on the loss value; and
    taking convergence of a loss function in the first image generation model to be trained as a training objective to obtain the first target sample generation model.
  4. The method according to claim 1, wherein the training the second sample generation model to be trained based on training samples of the second style type to obtain the second target sample generation model comprises:
    obtaining a plurality of training samples under the second style type, wherein each training sample comprises a second facial image under the second style type;
    inputting Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain a second actual output image;
    performing discrimination processing on the second actual output image and the corresponding second facial image based on a discriminator to determine a loss value, so as to correct model parameters in the second sample generation model to be trained based on the loss value; and
    taking convergence of a loss function in the second image generation model to be trained as a training objective to obtain the second target sample generation model.
  5. The method according to claim 1, wherein the determining a target style data generation model based on the model parameters to be fitted of the first target sample generation model and the second target sample generation model comprises:
    obtaining a preset fitting parameter;
    performing fitting processing on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameter to obtain target model parameters; and
    determining the target style data generation model based on the target model parameters.
  6. The method according to claim 1, further comprising, after obtaining the target style data generation model:
    inputting Gaussian noise into the target style data generation model to obtain a stylized image to be corrected that fuses the first style type and the second style type; and
    determining a target style image by performing correction processing on the stylized image to be corrected, and using the target style image as a target training sample, so as to correct model parameters in the target style data generation model based on the target training sample and obtain an updated target style data generation model.
  7. The method according to claim 6, wherein the correcting the model parameters in the target style data generation model based on the target training sample to obtain the updated target style data generation model comprises:
    inputting Gaussian noise into the target style data generation model and outputting a stylized image to be corrected;
    processing the stylized image to be corrected and the target style image based on a discriminator to determine a loss value; and
    correcting the model parameters in the target style data generation model based on the loss value to obtain the updated target style data generation model.
  8. The method according to claim 2, further comprising:
    training an encoding model to be trained based on the facial image generation model and a plurality of facial images to obtain a target encoding model, wherein the target encoding model is configured to process an input facial image into corresponding Gaussian noise; and
    determining a special effect image generation model based on the target encoding model and the target style data generation model, so as to perform stylization processing on an acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image fusing the first style type and the second style type.
  9. The method according to claim 8, wherein the training an encoding model to be trained based on the facial image generation model and the plurality of facial images to obtain the target encoding model comprises:
    obtaining a plurality of first training images;
    for each first training image, inputting the current first training image into the encoding model to be trained to obtain Gaussian noise to be used corresponding to the current first training image;
    inputting the Gaussian noise to be used into the facial image generation model to obtain a third actual output image;
    determining an image loss value based on the third actual output image and the current first training image; and
    correcting model parameters in the encoding model to be trained based on the image loss value, and taking convergence of a loss function in the encoding model to be trained as a training objective to obtain the target encoding model, so as to determine the special effect image generation model based on the target encoding model and the target style data generation model.
  10. The method according to claim 8, further comprising:
    deploying the special effect image generation model on a mobile terminal, so as to process, in response to detecting a special effect display control, a captured image to be processed into a target special effect image fusing the first style type and the second style type.
  11. The method according to any one of claims 1-10, wherein the first style type is a regional style image, and the second style type is an ancient-style material image.
  12. An apparatus for generating stylized images, comprising:
    a to-be-transferred model parameter acquisition module, configured to obtain model parameters to be transferred of a facial image generation model, so as to construct a first sample generation model to be trained and a second sample generation model to be trained based on the model parameters to be transferred;
    a first sample generation model training module, configured to train the first sample generation model to be trained based on training samples of a first style type to obtain a first target sample generation model;
    a second sample generation model training module, configured to train the second sample generation model to be trained based on training samples of a second style type to obtain a second target sample generation model; and
    a target style data generation model determination module, configured to determine a target style data generation model based on model parameters to be fitted of the first target sample generation model and the second target sample generation model, so as to generate, based on the target style data generation model, a stylized image that fuses the first style type and the second style type.
  13. The apparatus according to claim 12, further comprising a facial image generation model determination module, wherein the facial image generation model to be trained comprises an image generator to be trained and a discriminator; and
    the facial image generation model determination module is configured to: obtain a plurality of basic training samples, wherein each basic training sample comprises Gaussian noise corresponding to facial information of a target subject; process the Gaussian noise based on the image generator to be trained to generate an image to be discriminated; perform discrimination processing on the image to be discriminated and a collected real facial image based on the discriminator to determine a reference loss value; correct model parameters in the image generator to be trained based on the reference loss value; and take convergence of a loss function in the image generator to be trained as a training objective to obtain the facial image generation model.
  14. The apparatus according to claim 12, wherein the first sample generation model training module comprises a first style type training sample acquisition unit, a first actual output image determination unit, a first correction unit, and a first target sample generation model determination unit;
    the first style type training sample acquisition unit is configured to obtain a plurality of training samples under the first style type, wherein each training sample comprises a first facial image under the first style type;
    the first actual output image determination unit is configured to input Gaussian noise corresponding to the first facial image into the first sample generation model to be trained to obtain a first actual output image;
    the first correction unit is configured to perform discrimination processing on the first actual output image and the corresponding first facial image based on a discriminator to determine a loss value, so as to correct model parameters in the first sample generation model to be trained based on the loss value; and
    the first target sample generation model determination unit is configured to take convergence of a loss function in the first image generation model to be trained as a training objective to obtain the first target sample generation model.
  15. The apparatus according to claim 12, wherein the second sample generation model training module comprises a second style type training sample acquisition unit, a second actual output image determination unit, a second correction unit, and a second target sample generation model determination unit;
    the second style type training sample acquisition unit is configured to obtain a plurality of training samples under the second style type, wherein each training sample comprises a second facial image under the second style type;
    the second actual output image determination unit is configured to input Gaussian noise corresponding to the second facial image into the second sample generation model to be trained to obtain a second actual output image;
    the second correction unit is configured to perform discrimination processing on the second actual output image and the corresponding second facial image based on a discriminator to determine a loss value, so as to correct model parameters in the second sample generation model to be trained based on the loss value; and
    the second target sample generation model determination unit is configured to take convergence of a loss function in the second image generation model to be trained as a training objective to obtain the second target sample generation model.
  16. The apparatus according to claim 12, wherein the target style data generation model determination module comprises a fitting parameter acquisition unit, a target model parameter determination unit, and a target style data generation model determination unit;
    the fitting parameter acquisition unit is configured to obtain a preset fitting parameter;
    the target model parameter determination unit is configured to perform fitting processing on the model parameters to be fitted in the first target sample generation model and the second target sample generation model based on the fitting parameter to obtain target model parameters; and
    the target style data generation model determination unit is configured to determine the target style data generation model based on the target model parameters.
  17. The apparatus according to claim 12, further comprising a target style data generation model update module;
    the target style data generation model update module is configured to: input Gaussian noise into the target style data generation model to obtain a stylized image to be corrected that fuses the first style type and the second style type; determine a target style image by performing correction processing on the stylized image to be corrected; and use the target style image as a target training sample, so as to correct model parameters in the target style data generation model based on the target training sample and obtain an updated target style data generation model.
  18. The apparatus according to claim 17, further comprising a model parameter correction module;
    the model parameter correction module is configured to: input Gaussian noise into the target style data generation model and output a stylized image to be corrected; process the stylized image to be corrected and the target style image based on a discriminator to determine a loss value; and correct the model parameters in the target style data generation model based on the loss value to obtain an updated target style data generation model.
  19. The apparatus according to claim 13, further comprising a stylization processing module;
    the stylization processing module is configured to: train an encoding model to be trained based on the facial image generation model and a plurality of facial images to obtain a target encoding model, wherein the target encoding model is configured to process an input facial image into corresponding Gaussian noise; and determine a special effect image generation model based on the target encoding model and the target style data generation model, so as to perform stylization processing on an acquired facial image to be processed based on the special effect image generation model and obtain a target special effect image fusing the first style type and the second style type.
  20. The apparatus according to claim 19, further comprising a target encoding model determination module;
    the target encoding model determination module is configured to: obtain a plurality of first training images; for each first training image, input the current first training image into the encoding model to be trained to obtain Gaussian noise to be used corresponding to the current first training image; input the Gaussian noise to be used into the facial image generation model to obtain a third actual output image; determine an image loss value based on the third actual output image and the current first training image; and correct model parameters in the encoding model to be trained based on the image loss value, taking convergence of a loss function in the encoding model to be trained as a training objective to obtain the target encoding model, so as to determine the special effect image generation model based on the target encoding model and the target style data generation model.
  21. The apparatus according to claim 19, further comprising a model deployment module;
    the model deployment module is configured to deploy the special effect image generation model on a mobile terminal, so as to process a captured image to be processed into a target special effect image fusing the first style type and the second style type when a special effect display control is detected.
  22. The apparatus according to any one of claims 12-21, wherein the first style type is a regional style image, and the second style type is an ancient-style material image.
  23. An electronic device, comprising:
    at least one processor; and
    a storage apparatus configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method for generating stylized images according to any one of claims 1-11.
  24. A storage medium containing computer-executable instructions which, when executed by a computer processor, implement the method for generating stylized images according to any one of claims 1-11.
PCT/CN2023/072067 2022-01-20 2023-01-13 Method, apparatus, electronic device, and storage medium for generating stylized images WO2023138498A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210067042.1A CN114429418A (zh) 2022-01-20 2022-01-20 Method, apparatus, electronic device, and storage medium for generating stylized images
CN202210067042.1 2022-01-20

Publications (1)

Publication Number Publication Date
WO2023138498A1 true WO2023138498A1 (zh) 2023-07-27

Family

ID=81312535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072067 WO2023138498A1 (zh) 2022-01-20 2023-01-13 Method, apparatus, electronic device, and storage medium for generating stylized images

Country Status (2)

Country Link
CN (1) CN114429418A (zh)
WO (1) WO2023138498A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761831A (zh) * 2020-11-13 2021-12-07 北京沃东天骏信息技术有限公司 Method, apparatus, device, and storage medium for generating stylized calligraphy

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429418A (zh) 2022-01-20 2022-05-03 北京字跳网络技术有限公司 Method, apparatus, electronic device, and storage medium for generating stylized images

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402112A (zh) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method and apparatus, electronic device, and computer-readable medium
CN111784566A (zh) * 2020-07-01 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, transfer model training method, apparatus, medium, and device
CN112150489A (zh) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Image style conversion method and apparatus, electronic device, and storage medium
CN114429418A (zh) * 2022-01-20 2022-05-03 北京字跳网络技术有限公司 Method, apparatus, electronic device, and storage medium for generating stylized images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402112A (zh) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method and apparatus, electronic device, and computer-readable medium
CN111784566A (zh) * 2020-07-01 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, transfer model training method, apparatus, medium, and device
CN112150489A (zh) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Image style conversion method and apparatus, electronic device, and storage medium
CN114429418A (zh) * 2022-01-20 2022-05-03 北京字跳网络技术有限公司 Method, apparatus, electronic device, and storage medium for generating stylized images

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761831A (zh) * 2020-11-13 2021-12-07 北京沃东天骏信息技术有限公司 Method, apparatus, device, and storage medium for generating stylized calligraphy
CN113761831B (zh) * 2020-11-13 2024-05-21 北京沃东天骏信息技术有限公司 Method, apparatus, device, and storage medium for generating stylized calligraphy

Also Published As

Publication number Publication date
CN114429418A (zh) 2022-05-03

Similar Documents

Publication Publication Date Title
WO2023138498A1 (zh) Method, apparatus, electronic device, and storage medium for generating stylized images
CN111476871B (zh) Method and apparatus for generating video
CN110827378A (zh) Virtual image generation method, apparatus, terminal, and storage medium
WO2023125374A1 (zh) Image processing method and apparatus, electronic device, and storage medium
WO2023160513A1 (zh) Rendering method, apparatus, and device for 3D material, and storage medium
WO2023061169A1 (zh) Image style transfer and model training method, apparatus, device, and medium
CN111968647B (zh) Speech recognition method, apparatus, medium, and electronic device
WO2023138560A1 (zh) Stylized image generation method and apparatus, electronic device, and storage medium
WO2023232056A1 (zh) Image processing method and apparatus, storage medium, and electronic device
US20230112005A1 (en) Image special effect configuration method, image recognition method, apparatus and electronic device
CN114863214A (zh) Image generation model training and image generation method, apparatus, medium, and device
CN114004905B (zh) Method, apparatus, device, and storage medium for generating a character-style image
CN110097004B (zh) Facial expression recognition method and apparatus
WO2024037556A1 (zh) Image processing method, apparatus, device, and storage medium
WO2023202543A1 (zh) Text processing method and apparatus, electronic device, and storage medium
WO2023207779A1 (zh) Image processing method, apparatus, device, and medium
US11962929B2 (en) Method, apparatus, and device for configuring video special effect, and storage medium
CN115049537A (zh) Image processing method and apparatus, electronic device, and storage medium
CN116596748A (zh) Image stylization processing method, apparatus, device, storage medium, and program product
WO2022083213A1 (zh) Image generation method, apparatus, device, and computer-readable medium
CN110717467A (zh) Head pose estimation method, apparatus, device, and storage medium
CN114647472B (zh) Picture processing method, apparatus, device, storage medium, and program product
CN115937338B (zh) Image processing method, apparatus, device, and medium
CN112781581B (zh) Method and apparatus for generating a path for a floor-sweeping robot to move to a baby stroller
CN116246014B (zh) Image generation method, apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23742813

Country of ref document: EP

Kind code of ref document: A1