WO2024087946A1 - Image editing method, apparatus, computer device and storage medium - Google Patents

Image editing method, apparatus, computer device and storage medium

Info

Publication number
WO2024087946A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
initial
target
generation model
feature
Prior art date
Application number
PCT/CN2023/119716
Other languages
English (en)
French (fr)
Inventor
陈浩锟
申瑞雪
王锐
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2024087946A1

Classifications

    • G06T3/04 Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06V10/75 Image or video pattern matching; proximity measures in feature spaces; organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V10/803 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of input or preprocessed data
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; image merging
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • the present application relates to the field of image processing technology, and in particular to an image editing method, apparatus, computer equipment, storage medium and computer program product.
  • Image editing technology targeting image attributes is used to edit local attributes of an image on the original image to obtain a new image.
  • face attribute editing refers to editing a single or at least two attributes of a face on the original face image to generate a new face image, that is, to modify local attributes without changing other attributes of the original face image.
  • image editing technology for image attributes usually uses image editing software such as Photoshop to manually and accurately edit images, for example, using Photoshop's fusion tool to cut the target attribute area of another face onto the current face image.
  • this method requires careful editing by professionals and has low image editing efficiency.
  • Embodiments of the present application provide an image editing method, apparatus, computer device, computer-readable storage medium, and computer program product.
  • the present application provides an image editing method, which is executed by a computer device, and the method comprises:
  • the initial image generation model is obtained by training based on a first training image set
  • the feature image generation model is obtained by training the initial image generation model based on a second training image set
  • the object images in the first training image set and the second training image set belong to target category objects
  • the object images in the second training image set contain target attributes
  • the images output by the feature image generation model have the target attributes
  • the second initial object image and the second feature object image are fused to obtain a reference object image; the reference object image is used to represent an image obtained by performing target attribute editing on the second initial object image.
  • the present application also provides an image editing device.
  • the device comprises:
  • a model acquisition module used to acquire an initial image generation model and a feature image generation model;
  • the initial image generation model is obtained by training based on a first training image set
  • the feature image generation model is obtained by training the initial image generation model based on a second training image set
  • the object images in the first training image set and the second training image set belong to target category objects
  • the object images in the second training image set contain target attributes
  • the images output by the feature image generation model have the target attributes
  • a random variable acquisition module used for acquiring a first random variable, and inputting the first random variable into the initial image generation model and the feature image generation model respectively;
  • a first image acquisition module is used to acquire the image output by the first network layer in the initial image generation model to obtain a first initial object image, and acquire the image output by the second network layer in the feature image generation model to obtain a first feature object image;
  • a mask image acquisition module configured to acquire attribute mask images corresponding to the first initial object image and the first feature object image respectively based on an image area corresponding to an attribute type to which the target attribute belongs in the object image, and obtain a target joint mask image based on each of the attribute mask images;
  • a second image acquisition module is used to acquire the image output by the target network layer matched in the initial image generation model and the feature image generation model to obtain a second initial object image and a second feature object image;
  • An image fusion module is used to fuse the second initial object image and the second feature object image based on the target joint mask image to obtain a reference object image; the reference object image is used to represent an image obtained by performing target attribute editing on the second initial object image.
  • a computer device comprises a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps of the above-mentioned image editing method when executing the computer-readable instructions.
  • a computer-readable storage medium stores computer-readable instructions, which implement the steps of the above-mentioned image editing method when executed by a processor.
  • a computer program product comprises computer-readable instructions, wherein when the computer-readable instructions are executed by a processor, the steps of the above-mentioned image editing method are implemented.
  • FIG. 1 is a diagram showing an application environment of an image editing method according to an embodiment
  • FIG. 2 is a schematic flow chart of an image editing method in one embodiment
  • FIG. 3 is a schematic flow chart of an image editing method in another embodiment
  • FIG. 4 is a schematic flow chart of an image editing method in yet another embodiment
  • FIG. 5 is a schematic diagram of a process of training an image attribute editing model in one embodiment
  • FIG. 6 is a schematic flow chart of an image editing method applied to a portrait picture in one embodiment
  • FIG. 7 is a schematic diagram of feature fusion in one embodiment
  • FIG. 8 is a schematic diagram showing the effect of slicked-back hair editing on a face in one embodiment
  • FIG. 9 is a schematic diagram showing the effect of face attribute editing in one embodiment
  • FIG. 10 is a structural block diagram of an image editing device in one embodiment
  • FIG. 11 is a structural block diagram of an image editing device in another embodiment
  • FIG. 12 is a diagram showing the internal structure of a computer device in one embodiment
  • FIG. 13 is a diagram showing the internal structure of a computer device in another embodiment.
  • the image editing method provided in the embodiment of the present application can be applied to the application environment shown in Figure 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the data storage system can store the data that the server 104 needs to process.
  • the data storage system can be integrated on the server 104, or it can be placed on the cloud or other servers.
  • the terminal 102 can be, but is not limited to, various desktop computers, laptops, smart phones, tablet computers, Internet of Things devices and portable wearable devices.
  • the Internet of Things devices can be smart speakers, smart TVs, smart air conditioners, smart car-mounted devices, etc.
  • Portable wearable devices can be smart watches, smart bracelets, head-mounted devices, etc.
  • the server 104 can be implemented with an independent server or a server cluster consisting of multiple servers or a cloud server.
  • Both the terminal and the server can be used independently to execute the image editing method provided in the embodiments of the present application.
  • the server obtains a first random variable, and inputs the first random variable into an initial image generation model and a feature image generation model respectively.
  • the initial image generation model is obtained by training based on a first training image set
  • the feature image generation model is obtained by training the initial image generation model based on a second training image set.
  • the first training image set and the second training image set include object images corresponding to the target category object
  • the object images in the second training image set are object images containing target attributes.
  • the image output by the feature image generation model has the target attribute.
  • the server obtains the image output by the first network layer in the initial image generation model to obtain a first initial object image, obtains the image output by the second network layer in the feature image generation model to obtain a first feature object image, obtains the attribute mask images corresponding to the first initial object image and the first feature object image respectively based on the image area corresponding to the attribute type to which the target attribute belongs in the object image, and obtains the target joint mask image based on each attribute mask image.
  • the server obtains the image output by the matching target network layer in the initial image generation model and the feature image generation model to obtain a second initial object image and a second feature object image.
  • the server fuses the second initial object image and the second feature object image based on the target joint mask image to obtain a reference object image, and the reference object image is used to represent the image obtained by editing the target attribute of the second initial object image.
  • the terminal and the server can also be used together to execute the image editing method provided in the embodiments of the present application.
  • the server obtains a first random variable based on an image editing request sent by a terminal, inputs the first random variable into an initial image generation model and a feature image generation model respectively, obtains an image output by a first network layer in the initial image generation model to obtain a first initial object image, obtains an image output by a second network layer in the feature image generation model to obtain a first feature object image, obtains an image output by a target network layer matched in the initial image generation model and the feature image generation model, and obtains a second initial object image and a second feature object image.
  • the server obtains attribute mask images corresponding to the first initial object image and the first feature object image respectively based on an image area corresponding to an attribute type to which a target attribute belongs in the object image, and obtains a target joint mask image based on each attribute mask image.
  • the server fuses the second initial object image and the second feature object image based on the target joint mask image to obtain a reference object image.
  • the server sends the reference object image to the terminal.
  • the terminal can display the reference object image.
  • the initial image generation model is trained based on the object image corresponding to the target category object.
  • the initial image generation model can output the object image corresponding to the target category object.
  • the feature image generation model is trained based on the object image corresponding to the target category object and containing the target attribute.
  • the feature image generation model can output the object image corresponding to the target category object and containing the target attribute.
  • the first random variable is input into the initial image generation model and the feature image generation model respectively.
  • the object images output by the initial image generation model and the feature image generation model respectively have a certain similarity, and the object image output by the feature image generation model contains the target attribute.
  • the attribute mask image can reflect the image area corresponding to the attribute type to which the target attribute belongs in the object image
  • the target joint mask image can reflect the joint image area of the attribute type to which the target attribute belongs in the first initial object image and the first feature object image.
  • an image editing method is provided, and the method is applied to a computer device as an example, and the computer device may be a terminal or a server.
  • the method may be executed by the terminal or the server itself alone, or may be implemented through interaction between the terminal and the server.
  • the image editing method includes the following steps:
  • Step S202 obtaining an initial image generation model and a feature image generation model; the initial image generation model is obtained by training based on the first training image set, and the feature image generation model is obtained by training the initial image generation model based on the second training image set; the object images in the first training image set and the second training image set belong to target category objects, the object images in the second training image set contain target attributes, and the images output by the feature image generation model have the target attributes.
  • the image generation model is a machine learning model for generating images.
  • the input data of the image generation model is a random variable, and the output data is an image.
  • the image generation model is usually trained based on a training image set.
  • the training image set includes object images of multiple objects belonging to the same category, so that the image generation model trained based on the training image set can be used to generate object images of objects of a specific category.
  • the image generation model for generating face images is trained based on a face image set
  • the image generation model for generating cat images is trained based on a cat image training set
  • the image generation model for generating vehicle images is trained based on a vehicle image training set.
  • the object can be an inanimate object, such as a car, a table, etc.
  • the object can also be a living body, such as an animal, a plant, etc.
  • the first training image set includes object images corresponding to the target category object, that is, the first training image set includes multiple object images, and the object images in the first training image set belong to the target category object.
  • the initial image generation model is an image generation model trained based on the first training image set, and the initial image generation model is used to generate object images belonging to the target category object.
  • the target category object is used to refer to a certain type of object.
  • the target category object is a human face
  • the first training image set includes multiple human face images
  • the initial image generation model is used to generate human face images
  • the target category object is a dog
  • the first training image set includes multiple dog images
  • the initial image generation model is used to generate dog images
  • the target category object is a cat
  • the first training image set includes multiple cat images
  • the initial image generation model is used to generate cat images
  • the target category object is a vehicle
  • the first training image set includes multiple vehicle images, and the initial image generation model is used to generate vehicle images.
  • the second training image set includes object images corresponding to the target category object and containing the target attribute, that is, the second training image set includes multiple object images, and the object images in the second training image set not only belong to the target category object but also contain the target attribute.
  • the feature image generation model is an image generation model obtained by training based on the second training image set.
  • the feature image generation model is used to generate object images belonging to the target category object and containing the target attribute, that is, the feature image generation model is used to generate object images belonging to the target category object and having specific features.
  • there are various types of object attributes in the object image.
  • a type of object attribute, i.e., an attribute type, is used to characterize a class of features or characteristics of the object.
  • a type of object attribute is a general term for multiple specific object attributes of the same type, and there are commonalities between multiple specific object attributes of the same type. Multiple specific object attributes of the same type are used to describe the same part of the object, and different specific object attributes of the same type can make the same part have different forms.
  • a face image includes hair attributes, and the hair attributes are used to characterize and describe the hair of the object.
  • the hair attributes can include various hair attributes used to characterize different hairstyles, which can specifically be slicked-back hair attributes, short hair attributes, long hair attributes, curly hair attributes, etc.
  • the cat image includes cat leg attributes, which are used to characterize and describe the legs of the cat.
  • the cat leg attributes may include various leg attributes used to characterize different legs of the cat, specifically, long legs attributes, short legs attributes, etc.
  • the target attribute refers to a specific object attribute, such as a slicked-back hair attribute or a long-legs attribute.
  • the target attribute may also refer to at least two specific object attributes of different attribute types, such as a slicked-back hair attribute plus a beard attribute.
  • the first training image set specifically includes various vehicle images, and the initial image generation model is used to generate vehicle images.
  • the target attribute is a sunroof attribute
  • the second training image set specifically includes vehicle images with sunroofs on the top of the vehicle, and the feature image generation model is used to generate vehicle images with sunroofs on the top of the vehicle.
  • the feature image generation model is obtained by further training based on the second training image set on the basis of the initial image generation model; that is, the feature image generation model is obtained by fine-tuning the initial image generation model based on the second training image set.
  • if the same input data is fed into the feature image generation model and the initial image generation model, the images output by the two models have certain similarities; the main difference is that the images output by the feature image generation model have the target attributes.
  • the initial image generation model is used to generate face images
  • the feature image generation model is used to generate face images with a slicked-back hairstyle.
  • the same random variable is input into the feature image generation model and the initial image generation model respectively.
  • the face images output by the feature image generation model and the initial image generation model have similar facial features, and the hairstyle in the face image output by the feature image generation model is slicked-back.
  • an initial image generation model and a feature image generation model can be pre-trained, and a computer device obtains the initial image generation model and the feature image generation model locally or from other devices, and target attribute editing for an image can be quickly implemented based on the initial image generation model and the feature image generation model.
  • Step S204 obtaining a first random variable, and inputting the first random variable into the initial image generation model and the feature image generation model respectively.
  • variable refers to data whose value can change.
  • Random variable refers to a randomly generated variable.
  • a random variable can be data randomly sampled from a Gaussian distribution; a random variable can be data randomly sampled from a uniform distribution; and so on.
  • the first random variable refers to a random variable.
  • the image generation model includes a plurality of network layers connected in sequence, and each network layer can output a corresponding object image. It can be understood that the image output by the end network layer of the image generation model is the image with the highest image quality among the images output by all network layers.
  • the initial object image is the object image obtained by the initial image generation model after data processing.
  • the feature object image is the object image obtained by the feature image generation model after data processing.
  • the first initial object image is the image output by the first network layer in the initial image generation model, and the first feature object image is the image output by the second network layer in the feature image generation model.
  • the feature image generation model is obtained by fine-tuning the initial image generation model, and the feature image generation model and the initial image generation model include the same model structure, but the model parameters are different, that is, the feature image generation model and the initial image generation model have the same number of network layers.
  • the first network layer and the second network layer can be network layers at the same layer position in the two models; for example, the first network layer is the end network layer in the initial image generation model, and the second network layer is the end network layer in the feature image generation model.
  • the first network layer and the second network layer can also be network layers at different layer positions in the two models; for example, the first network layer is the seventh layer in the initial image generation model, and the second network layer is the eleventh layer in the feature image generation model.
  • the computer device obtains a first random variable, inputs the first random variable into an initial image generation model and a feature image generation model respectively, obtains an image output by a first network layer in the initial image generation model as a first initial object image, and obtains an image output by a second network layer in the feature image generation model as a first feature object image.
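  • As a rough illustration of this step, the following sketch assumes two pre-trained StyleGAN-style generators g_init and g_feat whose forward pass can return the image produced by every network layer; the return_all_layers flag, the variable names and the layer indices are assumptions for illustration only and are not part of this disclosure.

```python
import torch

z_dim = 512
z1 = torch.randn(1, z_dim)   # first random variable, e.g. sampled from a Gaussian distribution

# Feed the same random variable into the initial model and the feature model.
init_layer_images = g_init(z1, return_all_layers=True)  # image output by each network layer
feat_layer_images = g_feat(z1, return_all_layers=True)

# The first and second network layers need not sit at the same position in the two models,
# e.g. the seventh layer of the initial model and the eleventh layer of the feature model.
first_initial_image = init_layer_images[6]    # first initial object image
first_feature_image = feat_layer_images[10]   # first feature object image
```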
  • Step S208 based on the image area corresponding to the attribute type to which the target attribute belongs in the object image, obtain the attribute mask images corresponding to the first initial object image and the first feature object image respectively, and obtain the target joint mask image based on the respective attribute mask images.
  • an attribute type is used to refer to a type of object attribute.
  • the attribute type to which the target attribute belongs has a corresponding image area in the object image. For example, for a face image, if the target attribute is a slicked-back hair attribute, the attribute type to which the slicked-back hair attribute belongs is a hair attribute, and the image area corresponding to the hair attribute is the hair area in the image; for a vehicle image, if the target attribute is a convertible attribute, the attribute type to which the convertible attribute belongs is a roof attribute, and the image area corresponding to the roof attribute is the roof area in the image.
  • the mask is used to block a part of the image and display another part of the image.
  • the attribute mask image is used to determine the position of a certain type of object attribute in the object image.
  • the image area corresponding to the attribute type to which the target attribute belongs in the object image is the display part, and the other image areas of the object image are the occluded parts.
  • the attribute mask image includes an occluded area and a non-occluded area.
  • the non-occluded area refers to the displayed part in the object image, that is, the non-occluded area is the image area corresponding to the attribute type to which the target attribute belongs in the object image.
  • the occluded area refers to the occluded part in the object image, that is, the occluded area is the image area in the object image except for the attribute type to which the target attribute belongs.
  • the attribute mask image is a binary image
  • the blocked portion is represented by a pixel value of 0
  • the non-blocked portion is represented by a pixel value of 1.
  • the target joint mask image is obtained based on the attribute mask images corresponding to the first initial object image and the first characteristic object image, and is used to determine the joint position of a certain type of object attribute in the first initial object image and the first characteristic object image.
  • the computer device can acquire the attribute mask images corresponding to the first initial object image and the first characteristic object image respectively based on the image area corresponding to the attribute type to which the target attribute belongs in the object image: the attribute mask image corresponding to the first initial object image is generated based on the image area corresponding to the attribute type to which the target attribute belongs in the first initial object image, and the attribute mask image corresponding to the first characteristic object image is generated based on the image area corresponding to the attribute type to which the target attribute belongs in the first characteristic object image.
  • the computer device obtains the target joint mask image based on the attribute mask images corresponding to the first initial object image and the first characteristic object image, for example, obtain the target joint mask image by finding the intersection of the attribute mask images corresponding to the first initial object image and the first characteristic object image; obtain the target joint mask image by finding the union of the attribute mask images corresponding to the first initial object image and the first characteristic object image; and so on.
  • the computer device can obtain the attribute mask image corresponding to the object image through a machine learning model, and can also obtain the attribute mask image corresponding to the object image through other means.
  • This application does not limit the means of obtaining the attribute mask image.
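  • As a minimal sketch of this step, the attribute mask images could be produced by a segmentation (parsing) model and combined into the target joint mask image; the parsing_model, the HAIR_CLASS_ID constant and the choice of the union rather than the intersection are assumptions for illustration, since the application does not limit how the masks are obtained.

```python
import torch

HAIR_CLASS_ID = 1  # hypothetical label index of the hair region in the parsing model's output

def attribute_mask(image, parsing_model, class_id):
    """Binary attribute mask: 1 inside the image area of the attribute type, 0 elsewhere."""
    labels = parsing_model(image)                      # assumed to return (1, H, W) class labels
    return (labels == class_id).float().unsqueeze(1)   # shape (1, 1, H, W)

mask_init = attribute_mask(first_initial_image, parsing_model, HAIR_CLASS_ID)
mask_feat = attribute_mask(first_feature_image, parsing_model, HAIR_CLASS_ID)

# Target joint mask image, taken here as the union of the two attribute masks.
# (If the two masks differ in size, they are first size-aligned; see the later sketch.)
joint_mask = torch.clamp(mask_init + mask_feat, min=0.0, max=1.0)
```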
  • Step S210 obtaining the images output by the matching target network layers in the initial image generation model and the feature image generation model to obtain a second initial object image and a second feature object image.
  • the second initial object image is the image output by the network layer in the initial image generation model
  • the second characteristic object image is the image output by the network layer in the characteristic image generation model
  • the second initial object image and the second characteristic object image are the images output by the matching target network layers in the initial image generation model and the characteristic image generation model, respectively.
  • the matching target network layers in the initial image generation model and the characteristic image generation model can be network layers at the same layer position in the two models.
  • the image generation model includes multiple network layers connected in sequence, each network layer can output a corresponding object image, and the target network layer can be any network layer among the network layers.
  • the network layer that outputs the first initial object image and the network layer that outputs the second initial object image can be the same network layer or different network layers.
  • the network layer that outputs the first feature object image and the network layer that outputs the second feature object image can be the same network layer or different network layers.
  • the computer device inputs the first random variable into the initial image generation model and the feature image generation model respectively, obtains the image output by the matching target network layer in the initial image generation model and the feature image generation model, and obtains the second initial object image and the second feature object image.
  • the operation of acquiring the second initial object image and the second characteristic object image may be performed synchronously with the operation of acquiring the first initial object image and the first characteristic object image, or may be performed asynchronously.
  • Step S212 based on the target joint mask image, the second initial object image and the second feature object image are fused to obtain a reference object image; the reference object image is used to represent an image obtained by performing target attribute editing on the second initial object image.
  • the target attribute editing refers to editing an image so that the edited image has the target attribute.
  • the computer device can fuse the second initial object image and the second feature object image based on the target joint mask image, and fuse the image area corresponding to the non-occluded area in the target joint mask image in the second feature object image into the second initial object image, thereby obtaining a reference object image.
  • the reference object image is used to represent the image obtained by editing the second initial object image by target attribute, that is, the reference object image is equivalent to modifying the image area corresponding to the attribute type to which the target attribute belongs to the target attribute without changing other image areas of the second initial object image.
  • the target attribute may be a slicked-back hair attribute
  • the reference object image is then equivalent to modifying the hair area in the second initial face image to the slicked-back hair attribute, that is, the hairstyle of the face in the second initial face image becomes slicked-back
  • the target attribute may be a sunroof attribute
  • the reference object image is equivalent to modifying the roof area in the second initial vehicle image to a sunroof attribute, that is, the roof in the second initial vehicle image has a sunroof.
  • the target attribute editing is realized by fusing the second initial object image and the second feature object image based on the target joint mask image, so that the area related to the target attribute can be accurately edited without changing the other, unrelated areas in the second initial object image.
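  • A minimal sketch of the mask-guided fusion, continuing the variables from the earlier sketches; the target layer index, the nearest-neighbour resize and the simple linear blend are assumptions showing one possible realization.

```python
import torch.nn.functional as F

# Images output by the matching target network layers of the two models.
target_layer = 8                                   # assumed layer index for illustration
second_initial_image = init_layer_images[target_layer]
second_feature_image = feat_layer_images[target_layer]

# Resize the target joint mask image to the resolution of the second initial object image.
mask = F.interpolate(joint_mask, size=second_initial_image.shape[-2:], mode="nearest")

# Outside the mask keep the second initial object image; inside the mask take the
# second feature object image, which carries the target attribute.
reference_image = second_initial_image * (1.0 - mask) + second_feature_image * mask
```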
  • if a facial image needs to be edited to have a beard, it is necessary to obtain facial images containing beards as the second training image set, and fine-tune the initial facial image generation model based on the second training image set to obtain a feature facial image generation model for generating facial images containing beards; if a facial image needs to be edited to have slicked-back hair, it is necessary to obtain facial images containing slicked-back hair as the second training image set, and fine-tune the initial facial image generation model based on the second training image set to obtain a feature facial image generation model for generating facial images containing slicked-back hair.
  • the corresponding second training image set can be obtained to fine-tune the initial image generation model to obtain a feature image generation model for generating an object image containing at least two attributes simultaneously.
  • for example, if a facial image needs to be edited to have both a beard and slicked-back hair simultaneously
  • facial images containing both a beard and slicked-back hair can be obtained as the second training image set
  • the initial facial image generation model is fine-tuned based on this second training image set to obtain a feature facial image generation model for generating facial images containing both a beard and slicked-back hair simultaneously.
  • the initial image generation model is obtained by training based on the object image corresponding to the target category object. After receiving the input data, the initial image generation model can output the object image corresponding to the target category object.
  • the feature image generation model is obtained by training the initial image generation model based on the object image corresponding to the target category object and containing the target attribute. After receiving the input data, the feature image generation model can output the object image corresponding to the target category object and containing the target attribute.
  • the first random variable is input into the initial image generation model and the feature image generation model respectively.
  • the object images output by the initial image generation model and the feature image generation model respectively have a certain similarity, and the object image output by the feature image generation model contains the target attribute.
  • the attribute mask image can reflect the image area corresponding to the attribute type to which the target attribute belongs in the object image
  • the target joint mask image can reflect the joint image area of the attribute type to which the target attribute belongs in the first initial object image and the first feature object image.
  • step S202 includes:
  • adversarial learning is performed on the initial image generation network and the initial image discrimination network to obtain the intermediate image generation network and the intermediate image discrimination network; based on the intermediate image generation network, the initial image generation model is obtained; based on the second training image set, adversarial learning is performed on the intermediate image generation network and the intermediate image discrimination network to obtain the target image generation network and the target image discrimination network; based on the target image generation network, the feature image generation model is obtained.
  • the image generation network and the image discrimination network are also machine learning models.
  • the input data of the image generation network is a random variable, and the output data is an image.
  • the input data of the image discrimination network is an image, and the output data is an image label used to indicate whether the input image is true or false.
  • the image generation network can also be called an image generator, and the image discrimination network can also be called an image discriminator.
  • the initial image generation network is the image generation network to be trained, and the initial image discrimination network is the image discrimination network to be trained.
  • the intermediate image generation network is the image generation network trained based on the first training image set, and the intermediate image discrimination network is the image discrimination network trained based on the first training image set.
  • the target image generation network is an image generation network fine-tuned based on the second training image set, and the target image discrimination network is an image discrimination network fine-tuned based on the second training image set.
  • Adversarial learning refers to training the desired network by letting different networks learn in a game-like manner.
  • the initial image generation network and the initial image discrimination network are subjected to adversarial learning.
  • the goal of the initial image generation network is to generate realistic images based on random variables.
  • the goal of the initial image discrimination network is to distinguish the forged images output by the initial image generation network from the real images as much as possible.
  • the initial image generation network and the initial image discrimination network learn against each other and continuously adjust parameters.
  • the ultimate goal is to make the image generation network deceive the image discrimination network as much as possible, so that the image discrimination network cannot determine whether the output image of the image generation network is real.
  • the image generation model can be a generative adversarial network, which is trained by adversarial learning.
  • the computer device can perform adversarial learning on the initial image generation network and the initial image discrimination network based on the first training image set.
  • the generator tries its best to make its output images deceive the judgment of the discriminator, while the discriminator tries its best to distinguish the forged images output by the generator from the real images; the two networks confront each other, thereby training the intermediate image generation network and the intermediate image discrimination network.
  • the computer device can obtain the initial image generation model based on the intermediate image generation network. It can be understood that the discriminator serves to assist model training, while the generator is what is mainly used when the model is applied.
  • the computer device can perform adversarial learning on the intermediate image generation network and the intermediate image discrimination network based on the second training image set: the generator tries its best to make its output images deceive the judgment of the discriminator, and the discriminator tries its best to distinguish the forged images output by the generator from the real images, so that the two networks confront each other, thereby obtaining the target image generation network and the target image discrimination network, and the feature image generation model is obtained based on the target image generation network. Since the object images in the second training image set contain the target attribute, the target image generation network obtained by training based on the second training image set can generate object images containing the target attribute.
  • the intermediate image generation network and the intermediate image discrimination network have undergone certain model training and have relatively good model parameters, when the intermediate image generation network and the intermediate image discrimination network are subjected to adversarial learning based on the second training image set, the intermediate image generation network and the intermediate image discrimination network can be fine-tuned to quickly obtain the target image generation network and the target image discrimination network, thereby quickly obtaining the feature image generation model.
  • the training method for adversarial learning of the image generation network and the image discrimination network can be as follows: first, the initial image generation network is fixed, and the initial image discrimination network is trained based on the false images output by the initial image generation network and the real images, so as to obtain an updated image discrimination network with a certain image discrimination ability; then the updated image discrimination network is fixed, and the initial image generation network is trained based on the predicted discrimination result and the real discrimination result of the updated image discrimination network for the output images of the initial image generation network, so as to obtain an updated image generation network with a certain image generation ability; the updated image generation network and the updated image discrimination network are then used as the initial image generation network and the initial image discrimination network for the next round of the above iterative training, and this cycle is repeated until the convergence condition is met, so as to obtain the intermediate image generation network and the intermediate image discrimination network.
  • the training method for adversarial learning of the image generation network and the image discrimination network can also be any of various commonly used adversarial learning training methods.
  • adversarial learning is performed on the initial image generation network and the initial image discrimination network to obtain the intermediate image generation network and the intermediate image discrimination network; the initial image generation model is obtained based on the intermediate image generation network; based on the second training image set, adversarial learning is performed on the intermediate image generation network and the intermediate image discrimination network to obtain the target image generation network and the target image discrimination network; the feature image generation model is obtained based on the target image generation network.
  • an image generation network that can generate realistic images can be quickly trained through adversarial learning, thereby quickly obtaining the initial image generation model, and an image generation network that can generate images containing specific attributes can be quickly trained through further adversarial learning based on a specific training image set, thereby quickly obtaining the feature image generation model.
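  • A compressed sketch of the two-stage adversarial learning described above; the non-saturating GAN loss shown here is only one common choice and is an assumption, as are the function and variable names.

```python
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, real_images, g_opt, d_opt, z_dim=512):
    """One round of the game between the image generation network and the image discrimination network."""
    z = torch.randn(real_images.size(0), z_dim)

    # The discrimination network tries to tell real images from forged (generated) images.
    fake_images = generator(z).detach()
    d_loss = (F.softplus(-discriminator(real_images)).mean() +
              F.softplus(discriminator(fake_images)).mean())
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # The generation network tries to make its output deceive the discrimination network.
    g_loss = F.softplus(-discriminator(generator(z))).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# Stage 1: run adversarial_step over the first training image set
#          -> intermediate image generation network (initial image generation model).
# Stage 2: continue the same loop over the second training image set (fine-tuning)
#          -> target image generation network (feature image generation model).
```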
  • the image editing method further includes: acquiring a first candidate image set, where each first candidate image in the first candidate image set is an object image corresponding to the target category object; performing image alignment on each first candidate image based on the position of the reference object part of the target category object in the first candidate image; and obtaining a first training image set based on each first candidate image after image alignment.
  • the image editing method also includes: obtaining a second candidate image set; each second candidate image in the second candidate image set is an object image corresponding to a target category object and containing target attributes; based on the position of a reference object part of the target category object in the second candidate image, performing image alignment on each second candidate image; and obtaining a second training image set based on each second candidate image after image alignment.
  • the first candidate image set is used to generate a first training image set, the first candidate image set includes a plurality of first candidate images, and the first candidate images are object images corresponding to the target category object.
  • the second candidate image set is used to generate a second training image set, the second candidate image set includes a plurality of second candidate images, and the second candidate images are object images corresponding to the target category object and containing target attributes.
  • Image alignment is used to align the main part of an object in an object image.
  • the main part of the object is located in the same position in the image.
  • the training set obtained through image alignment helps to improve the training speed of the model. For example, image alignment is used to align the facial features in a face image; image alignment is used to align the body of a vehicle in an image of a vehicle.
  • the reference object part of the target category object is a preset object part, which is the object part whose position needs to be aligned during image alignment.
  • the object images in the training image set may be original images corresponding to the target category objects.
  • the object images in the training image set may be images obtained through image alignment.
  • a first candidate image set consisting of original object images corresponding to the target category objects can be obtained, and based on the position of the reference object part of the target category object in the first candidate image, image alignment is performed on each first candidate image in the first candidate image set, and the first candidate images that have undergone image alignment are combined into a first training image set.
  • a second candidate image set consisting of original object images corresponding to the target category objects and containing target attributes can be obtained, and based on the position of the reference object part of the target category object in the second candidate image, image alignment is performed on each second candidate image in the second candidate image set, and the second candidate images that have undergone image alignment are combined into a second training image set.
  • the preset position corresponding to the reference object part can be used as the reference position, and the reference position is used as a benchmark to adjust the position of the reference object part in each candidate image based on the reference position, align the position of the reference object part in each candidate image to the reference position, and fix the position of the reference object part in each candidate image at the reference position, thereby completing the image alignment.
  • the preset position refers to a pre-set position.
  • alternatively, any candidate image can be selected from the candidate images as a reference image, and the other candidate images are aligned to the reference image; the position of the reference object part in the reference image is used as the reference position, and, taking the reference position as a benchmark, the position of the reference object part in each of the other candidate images is fixed at the reference position, thereby completing the image alignment.
  • a group of face images can be used to form a first candidate image set, and each face image can be aligned using the eye position or facial feature position in the face image, and the aligned face images can be used to form a first training image set.
  • a group of face images with the slicked-back hair attribute can be used to form a second candidate image set, and each face image can be aligned using the eye positions or facial feature positions in the face image, and the aligned face images can be used to form a second training image set.
  • Affine transformation algorithms can be used to deform faces so that the facial features are fixed to a specific position.
  • the first training image set and the second training image set are obtained through image alignment, which can effectively reduce the training difficulty of the model and improve the training efficiency.
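  • A small sketch of landmark-based image alignment using an affine transformation, as mentioned above; the reference landmark coordinates, the output size and the availability of a landmark detector are assumptions for illustration.

```python
import cv2
import numpy as np

# Preset reference positions (left eye, right eye, mouth centre) in a 256x256 aligned image.
REF_POINTS = np.float32([[88, 112], [168, 112], [128, 200]])

def align_face(image, landmarks):
    """Warp a face image so that its detected landmarks land on the reference positions."""
    src = np.float32(landmarks)                       # detected (left eye, right eye, mouth) positions
    matrix = cv2.getAffineTransform(src, REF_POINTS)  # affine transform from three point pairs
    return cv2.warpAffine(image, matrix, (256, 256))
```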
  • obtaining a target joint mask image based on each attribute mask image includes:
  • the attribute mask image corresponding to the first initial object image is used as the first mask image
  • the attribute mask image corresponding to the first feature object image is used as the second mask image
  • when the first initial object image and the first feature object image are images of different sizes, the first mask image and the second mask image are also images of different sizes
  • the first mask image and the second mask image are size-aligned
  • a target joint mask image is obtained based on the size-aligned first mask image and the second mask image.
  • the first initial object image and the first characteristic object image may correspond to different sizes.
  • the first random variable is input into the initial image generation model and the characteristic image generation model
  • the first initial object image is the image output by the sixth layer in the initial image generation model
  • the first characteristic object image is the image output by the eighth layer in the characteristic image generation model.
  • the first mask image is the attribute mask image corresponding to the first initial object image
  • the second mask image is the attribute mask image corresponding to the first characteristic object image. If the first initial object image and the first characteristic object image are object images of different sizes, then the first mask image and the second mask image are also attribute mask images of different sizes.
  • Size alignment is used to unify the sizes of different images to the same size for data processing. For example, to size align two images, one of the images can be enlarged or reduced to make its size consistent with the size of the other image. It can be understood that when the image is enlarged or reduced during size alignment, the image content of the image is enlarged or reduced synchronously. For example, when a face image is enlarged during size alignment, the face presented in the face image is also enlarged.
  • the first mask image and the second mask image are attribute mask images of different sizes. Then when generating the target joint mask image, the first mask image and the second mask image can be first converted into attribute mask images of the same size and then fused, so as to obtain the final target joint mask image. For example, the first mask image can be resized to obtain a transformed mask image of the same size as the second mask image, and the transformed mask image and the second mask image are fused to obtain the target joint mask image.
  • the attribute mask images corresponding to the first initial object image and the first characteristic object image can be directly fused to obtain the target joint mask image without size transformation.
  • the first mask image corresponding to the first initial object image and the second mask image corresponding to the first feature object image are images of different sizes
  • the first mask image and the second mask image are first size-aligned, and then the size-aligned first mask image and the second mask image are fused to obtain the target joint mask image.
  • size alignment is performed before fusion, which can reduce the difficulty of fusion and quickly obtain an accurate target joint mask image.
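  • A brief sketch of the size alignment of the two attribute mask images, assuming the first mask image comes from a lower-resolution network layer than the second; the tensor names and resolutions are assumptions.

```python
import torch
import torch.nn.functional as F

# first_mask: (1, 1, 64, 64), second_mask: (1, 1, 256, 256) -- assumed resolutions.
first_mask_aligned = F.interpolate(first_mask, size=second_mask.shape[-2:], mode="nearest")

# Fuse the size-aligned masks into the target joint mask image (union in this sketch).
joint_mask = torch.clamp(first_mask_aligned + second_mask, min=0.0, max=1.0)
```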
  • the second initial object image and the second feature object image are fused to obtain a reference object image, including:
  • from the second initial object image, an image area matching the occluded area in the target joint mask image is obtained as the first image area; the second initial object image and the second feature object image are fused to obtain a fused object image; from the fused object image, an image area matching the non-occluded area in the target joint mask image is obtained as the second image area; based on the first image area and the second image area, a reference object image is obtained.
  • the image area in the second initial object image that matches the occluded area in the target joint mask image can be used as the first image area, and the first image area is used to retain the image information of the area in the second initial object image that is unrelated to the target attribute.
  • the second initial object image and the second characteristic object image are fused, and the image area that matches the non-occluded area in the target joint mask image is obtained from the fused object image as the second image area.
  • the second image area is the area in the object image that is related to the target attribute, and to a certain extent it already has the target attribute.
  • the reference object image is obtained by combining the first image area and the second image area.
  • the second image area determined from the fused object image not only has the target attribute, but also can obtain a reference object image with a more natural connection between image areas after being combined with the first image area.
  • an image area matching the occluded area in the target joint mask image is obtained from the second initial object image as the first image area, and the first image area is used to ensure that the image information of the area in the second initial object image that is unrelated to the target attribute remains unchanged when the target attribute is edited.
  • the second initial object image and the second feature object image are fused to obtain a fused object image
  • an image area matching the non-occluded area in the target joint mask image is obtained from the fused object image as the second image area
  • the second image area is used to ensure that the image information of the area in the second initial object image that is related to the target attribute has the target attribute after the target attribute is edited.
  • an image area that matches the occluded area in the target joint mask image is obtained from the second initial object image as the first image area, including: resizing the target joint mask image to obtain a transformed joint mask image of the same size as the second initial object image; performing inverse masking processing on the transformed joint mask image to obtain an inverse joint mask image; and fusing the second initial object image and the inverse joint mask image to obtain the first image area.
  • Acquiring an image region matching a non-occluded region in the target joint mask image from the fused object image as the second image region includes: fusing the fused object image and the transformed joint mask image to obtain the second image region.
  • the inverse masking process is used to convert the original occluded area in the mask image into the non-occluded area, and convert the original non-occluded area in the mask image into the occluded area.
  • the occluded part in the first mask image is represented by a pixel value of 0, and the non-occluded part is represented by a pixel value of 1.
  • after the inverse masking process, the part that was originally occluded in the first mask image is represented by a pixel value of 1,
  • and the part that was originally non-occluded is represented by a pixel value of 0.
  • the target joint mask image may be resized first to obtain a transformed joint mask image of the same size as the second initial object image.
  • the transformed joint mask image is first subjected to an inverse mask process to obtain an inverse joint mask image, and then the second initial object image and the inverse joint mask image are fused to obtain the first image area.
  • the pixel values of the pixels in the second initial object image and the inverse joint mask image that are in the same position are multiplied to obtain the first image area.
  • the first image area at this time is equivalent to an image of the same size as the second initial object image, in which the area unrelated to the target attribute is the non-occluded part and the area related to the target attribute is the occluded part.
  • the first image area is equivalent to an image in which the hair part in the second initial object image is blocked and the non-hair part is displayed.
  • the fused object image and the transformed joint mask image are fused to obtain the second image area.
  • the second image area at this time is equivalent to an image of the same size as the second initial object image, in which the area related to the target attribute is the non-occluded part and the area unrelated to the target attribute is the occluded part.
  • the second image area is equivalent to an image in which the hair part in the fused object image is displayed and the non-hair part is occluded.
  • when the reference object image is obtained based on the first image region and the second image region, the reference object image can be obtained by adding and fusing the first image region and the second image region.
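  • A minimal sketch of the region composition described above, assuming the joint mask has already been resized to the second initial object image's resolution and is stored as a binary array; the function and variable names are illustrative, not the patent's notation.

```python
import numpy as np

def compose_reference(init_img, fused_img, joint_mask):
    """Compose the reference object image from the second initial object image
    and the fused object image using mask arithmetic.

    init_img, fused_img: HxWxC float arrays of the same size.
    joint_mask: HxW binary mask (1 = region related to the target attribute).
    """
    mask = joint_mask[..., None].astype(init_img.dtype)  # broadcast over channels
    inverse_mask = 1.0 - mask                            # inverse masking step
    first_region = init_img * inverse_mask   # keeps the attribute-unrelated area
    second_region = fused_img * mask         # keeps the edited attribute area
    return first_region + second_region      # additive fusion of the two areas
```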
  • the target joint mask image is resized to obtain a transformed joint mask image of the same size as the second initial object image.
  • Data processing of images of the same size can reduce the difficulty of data processing and improve data processing efficiency.
  • the transformed joint mask image is subjected to inverse mask processing to obtain an inverse joint mask image.
  • the inverse mask processing can clarify the area in the second initial object image that needs to remain unchanged when the target attributes are edited.
  • the first image area can be quickly obtained by fusing the second initial object image and the inverse joint mask image.
  • the image editing method further includes:
  • Step S302 replace the second initial object image with the reference object image and input it into the backward network layer of the target network layer in the initial image generation model, and obtain the image output by the ending network layer as the target object image; wherein the target object image is used to represent the image obtained by editing the original object image by target attributes, and the original object image is the image directly output by the ending network layer after the first random variable is input into the initial image generation model.
  • the backward network layer refers to the network layer connected after the target network layer.
  • the ending network layer refers to the last network layer in the model.
  • the original object image refers to the image directly output by the final network layer of the initial image generation model after the first random variable is input into the initial image generation model. That is, after the first random variable is input into the initial image generation model, the model is not interfered with, and the image output by the final network layer of the initial image generation model is used as the original object image.
  • the target object image is obtained by interfering with the model after the first random variable is input into the initial image generation model, obtaining the second initial object image output by the target network layer in the initial image generation model for processing to obtain the reference object image, replacing the second initial object image with the reference object image and inputting the backward network layer of the target network layer in the initial image generation model to continue data processing, and finally using the image output by the final network layer of the initial image generation model as the target object image.
  • the target object image is used to represent the image obtained by editing the original object image by target attributes.
  • in order to improve the quality of target attribute editing, after the computer device obtains the reference object image, it can replace the second initial object image with the reference object image and input it into the backward network layer of the target network layer in the initial image generation model, continue to perform forward calculations in the model, and finally use the image output by the ending network layer of the initial image generation model as the target object image.
  • the reference object image replaces the second initial object image and is input into the backward network layer of the target network layer in the initial image generation model for forward calculation, and the final network layer of the initial image generation model can output a target object image with higher image quality.
  • the target object image can represent a high-quality image obtained by editing the target attribute of the original object image.
  • the original object image is an image directly output by the final network layer of the initial image generation model after the first random variable is input into the initial image generation model.
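  • The replacement-and-continue step can be sketched as follows: both generators run on the same latent, the initial model's output at the target layer is replaced by a masked fusion of the two layer outputs, and only the initial model's backward layers complete the forward pass. The per-block interface is a simplification (a real StyleGAN2 synthesis stack also takes per-layer styles and noise), and all names and the coefficient value are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def edit_with_replacement(init_blocks, feat_blocks, w, target_idx, joint_mask, a=0.2):
    """Synchronized inference of the two generators with feature replacement.

    init_blocks / feat_blocks: lists of synthesis blocks, each mapping a feature
    map to the next, higher-resolution feature map.
    w: starting latent/feature tensor fed to the first block of both models.
    joint_mask: [1, 1, H, W] binary mask for the target attribute's region.
    """
    x_init, x_feat = w, w
    for idx, block in enumerate(init_blocks):
        x_init = block(x_init)
        if idx <= target_idx:
            x_feat = feat_blocks[idx](x_feat)        # run the feature model in step
        if idx == target_idx:
            # Resize the joint mask to this layer's resolution, fuse by coefficient,
            # and replace the initial model's output before the backward layers run.
            mask = F.interpolate(joint_mask, size=x_init.shape[-2:], mode="bilinear")
            fused = a * x_init + (1 - a) * x_feat
            x_init = mask * fused + (1 - mask) * x_init
    return x_init   # output of the ending network layer: the target object image
```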
  • replacing the second initial object image with the reference object image and inputting the reference object image into the backward network layer of the target network layer in the initial image generation model, and obtaining the image output by the end network layer as the target object image comprises:
  • Step S402 Replace the second initial object image with the reference object image and input it into the backward network layer of the target network layer in the initial image generation model, and obtain the image output by the third network layer in the backward network layer as the third initial object image.
  • Step S404 Obtain an image output by the fourth network layer that matches the third network layer in the feature image generation model as a third feature object image.
  • the third network layer is any one of the backward network layers of the target network layer in the initial image generation model.
  • the fourth network layer matching the third network layer is a network layer matching the third network layer in the backward network layer of the target network layer in the feature image generation model.
  • the third network layer and the fourth network layer can be network layers with the same number of layers in the two models.
  • for example, if the matching target network layers in the initial image generation model and the feature image generation model are the eleventh layers of the two models,
  • the third network layer can be the thirteenth layer in the initial image generation model,
  • and the fourth network layer matching the third network layer can be the thirteenth layer in the feature image generation model.
  • the third network layer can be the forward network layer of the first network layer, or the backward network layer of the first network layer.
  • the fourth network layer can be a forward network layer of the second network layer, or a backward network layer of the second network layer.
  • Step S406 based on the current joint mask image, fuse the third initial object image and the third feature object image to obtain an updated object image;
  • the current joint mask image is the target joint mask image or the updated joint mask image, and the updated joint mask image is obtained based on the image output by the fifth network layer in the initial image generation model and the image output by the sixth network layer in the feature image generation model.
  • Step S408 Replace the third initial object image with the updated object image and input it into the backward network layer of the third network layer in the initial image generation model, and obtain the image output by the final network layer as the target object image.
  • the updated joint mask image refers to a joint mask image that is different from the target joint mask image and is a new joint mask image.
  • the generation method of the updated joint mask image is similar to the generation method of the target joint mask image.
  • the first random variable is input into the initial image generation model and the feature image generation model, and the image output by the fifth network layer in the initial image generation model is obtained as the third initial object image, and the image output by the sixth network layer in the feature image generation model is obtained as the third feature object image.
  • the attribute mask images corresponding to the third initial object image and the third feature object image are obtained, and the updated joint mask image is obtained based on the obtained attribute mask images.
  • the fifth network layer and the first network layer can be the same network layer or different network layers.
  • the sixth network layer and the second network layer can be the same network layer or different network layers.
  • the fifth network layer and the sixth network layer can be the same network layer or different network layers.
  • the fusion operation can be performed again, based on the same or a different joint mask image and with reference to the way the reference object image was obtained, to improve the quality of the target object image finally output by the initial image generation model and the quality of target attribute editing.
  • after the reference object image is obtained through the first fusion, the computer device replaces the second initial object image with the reference object image and inputs it into the backward network layer of the target network layer in the initial image generation model to continue the forward calculation, and the image output by the third network layer among the backward network layers can be obtained as the third initial object image.
  • the image output by the network layer in the feature image generation model that matches the third network layer is obtained as the third feature object image.
  • the computer device can fuse the third initial object image and the third feature object image based on the same joint mask image or a new joint mask image to obtain an updated object image, replace the third initial object image with the updated object image and input it into the backward network layer of the third network layer in the initial image generation model, and obtain the image output by the ending network layer as the target object image.
  • Each fusion can be based on the same joint mask image, for example, the fusion is based on the target joint mask image, or based on different joint mask images, for example, the first fusion is based on the target joint mask image, and the second fusion is based on the updated joint mask image.
  • the initial image generation model outputs the target object image, at least one fusion operation can be performed, and the quality of the target object image improves as the number of fusion operations increases.
  • if the attribute editing complexity corresponding to the target attribute is greater than or equal to a preset complexity, a first preset number of fusion operations is performed; if the attribute editing complexity corresponding to the target attribute is less than the preset complexity, a second preset number of fusion operations is performed, the first preset number being greater than the second preset number.
  • the attribute editing complexity is used to characterize the complexity and sophistication of attribute editing.
  • the operation of fusing the initial object image and the feature object image based on the joint mask image can be performed again to further improve the image quality of the target object image finally output by the final network layer of the initial image generation model and improve the fineness of target attribute editing.
  • the image editing method further includes:
  • Step S502 taking the original object image and the target object image as a training image pair.
  • Step S504 Based on the training image pair, the initial image attribute editing model is trained to obtain a target image attribute editing model; the target image attribute editing model is used to perform target attribute editing on the input image of the model.
  • the input data of the image attribute editing model is an image
  • the output data is an image with attribute editing.
  • the image attribute editing model is a slicked-back hairstyle editing model, which is used to edit the hairstyle in the input image into a slicked-back hairstyle.
  • the initial image attribute editing model is the image attribute editing model to be trained
  • the target image attribute editing model is the trained image attribute editing model.
  • a large number of paired object images can be generated by the image editing method of the present application, and the paired object images are used as training data for the image attribute editing model.
  • the computer device can use the original object image and the target object image as training image pairs, and the training image pairs are used as training data for the image attribute editing model.
  • the initial image attribute editing model is trained based on the training image pairs, and a target image attribute editing model for performing target attribute editing on the input image of the model can be obtained through training.
  • the original object image in the training image pair is used as the input data of the initial image attribute editing model
  • the target object image in the training image pair is used as the expected output data of the initial image attribute editing model.
  • the model can output an image that is very close to the target object image, that is, the model has the ability to perform target attribute editing on the input image, thereby yielding the target image attribute editing model.
  • multiple training image pairs can be obtained through different random variables, and the training image pairs are combined into a training image pair set. The initial image attribute editing model is trained based on the training image pair set to obtain the target image attribute editing model.
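  • For illustration, a minimal supervised training sketch over such a training image pair set; the dataset wrapper, the L1 reconstruction loss and the optimizer settings are assumptions of this sketch rather than the patent's prescription (as noted below, the editing model may instead be a generative adversarial network).

```python
import torch
from torch.utils.data import Dataset, DataLoader

class AttributeEditPairs(Dataset):
    """Paired training data: (original object image, target object image),
    assumed to have been generated offline, one pair per random variable."""
    def __init__(self, originals, targets):
        self.originals, self.targets = originals, targets
    def __len__(self):
        return len(self.originals)
    def __getitem__(self, i):
        return self.originals[i], self.targets[i]

def train_editing_model(model, originals, targets, epochs=10, lr=1e-4):
    loader = DataLoader(AttributeEditPairs(originals, targets), batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src, tgt in loader:
            opt.zero_grad()
            # The original object image is the input, the target object image
            # is the expected output; a pixel reconstruction loss drives training.
            loss = torch.nn.functional.l1_loss(model(src), tgt)
            loss.backward()
            opt.step()
    return model
```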
  • the image attribute editing model may also be a generative adversarial network.
  • a large number of paired object images can be quickly generated by the image editing method of the present application, and the paired object images are used as training data for the image attribute editing model.
  • the acquisition cost of a large number of paired object images is relatively high.
  • the original object image and the target object image obtained by the image editing method of the present application can be used as training image pairs to train the image attribute editing model.
  • the image editing method of the present application can quickly and accurately obtain a large number of paired object images.
  • the network layers connected sequentially in the initial image generation model and the feature image generation model are used to output images of successively increasing sizes.
  • the images output by the network layers with the same order correspond to the same size.
  • the initial image generation model includes network layers connected in sequence, and the network layers connected in sequence are used to output images of increasing sizes in sequence.
  • the output image of the current network layer is used as the input data of the next network layer, and the next network layer can further process the output image of the current network layer to improve the image quality and output an image of larger size.
  • the image output by the ending network layer of the initial image generation model has the largest size, the largest resolution, the richest and most harmonious image details, and the highest image quality.
  • the feature image generation model also includes network layers connected in sequence, and the network layers connected in sequence are used to output images of increasing sizes in sequence.
  • the initial image generation model and the feature image generation model include the same number of network layers. In the initial image generation model and the feature image generation model, the images output by the network layers with the same order correspond to the same size.
  • the network layers connected in sequence in the initial image generation model and the feature image generation model are used to output images with increasing sizes, and the initial image generation model and the feature image generation model can eventually output high-resolution, high-quality images.
  • the images output by the network layers with consistent order correspond to the same size.
  • the target category object is a face
  • the initial image generation model is an initial face image generation model
  • the feature image generation model is a feature face image generation model
  • the target attribute is a target local face attribute
  • the images in the first training image set are face images
  • the images in the second training image set are face images with target attributes.
  • the initial image generation model obtained by training based on the first training image set is the initial face image generation model, and the initial face image generation model is used to generate face images.
  • the random variables are input into the initial face image generation model, and the initial face image generation model can output a face image that can be mistaken for a real face.
  • the feature image generation model obtained by training the initial face image generation model based on the second training image set is the feature face image generation model, and the feature face image generation model is used to generate a face image with target attributes.
  • the random variables are input into the feature face image generation model, and the feature face image generation model can output a face image with target attributes.
  • the target attribute is a target partial face attribute.
  • the target partial face attribute is a partial face attribute determined from a large number of partial face attributes.
  • the partial face attribute is used to describe the partial information of the face.
  • the partial face attribute can be various hair attributes
  • the partial face attribute can be various facial expressions
  • the partial face attribute can be various beard attributes.
  • the target partial face attribute can be one, two, or more of the partial face attributes.
  • the image editing method of the present application can be specifically applied to face images.
  • a first random variable is obtained and input into the initial face image generation model and the feature face image generation model respectively. The image output by the first network layer in the initial face image generation model is obtained as a first initial face image, and the image output by the second network layer in the feature face image generation model is obtained as a first feature face image. Based on the image area corresponding, in the face image, to the attribute type to which the target attribute belongs, the attribute mask images corresponding to the first initial face image and the first feature face image are obtained, and the target joint mask image is obtained from these attribute mask images. The images output by the matching target network layers in the initial face image generation model and the feature face image generation model are obtained as a second initial face image and a second feature face image.
  • based on the target joint mask image, the second initial face image and the second feature face image are fused to obtain a reference face image, which is used to represent an image obtained by performing target attribute editing on the second initial face image.
  • the reference face image is used to replace the second initial face image and input into the backward network layer of the target network layer in the initial face image generation model, and the image output by the final network layer is obtained as the target face image.
  • the original face image is the image directly output by the final network layer of the model when the first random variable is input into the initial face image generation model.
  • the original face image and the target face image are used as a training face image pair, and the initial face image attribute editing model is trained based on the training face image pair to obtain the target face image attribute editing model, which is used to perform target attribute editing on the input face image of the model.
  • the target category object is a face
  • the initial image generation model is an initial face image generation model
  • the feature image generation model is a feature face image generation model
  • the target attribute is a target local face attribute.
  • the image editing method of the present application can be applied to the editing of local attributes of facial images.
  • Editing of local attributes of a face image refers to making a targeted change to one local attribute of the face image while keeping the other attributes unchanged. For example, a person's hairstyle is changed to a slicked-back style while the facial features remain unchanged.
  • the image editing method of the present application is a simple and efficient method that can accomplish precise, non-binary editing of local face attributes. By collecting a batch of face image data sharing the same target attribute, editing of the face semantic area corresponding to the target attribute can be achieved.
  • the image editing method of the present application includes the following steps:
  • a batch of high-definition face images that do not require any annotation are collected as face training data, and then the position of the face in each face image is aligned using the position of the eyes.
  • a generative adversarial network is trained using the aligned face images, which is recorded as the original network (i.e., the initial image generation model).
  • a small batch of high-definition face images that require no annotation beyond sharing the same feature is collected as face training data, for example, all of them have slicked-back hairstyles. The face in each image is then aligned using the eye positions, and the original network is fine-tuned with the aligned face images to obtain a new generative adversarial network, denoted the feature network (i.e., the feature image generation model).
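  • One possible way to implement the eye-based alignment mentioned above is a two-point similarity transform; the landmark source and the target eye positions in this sketch are assumptions.

```python
import numpy as np
import cv2

def align_by_eyes(img, left_eye, right_eye, out_size=1024, eye_y=0.4, eye_dist=0.3):
    """Rotate, scale and translate a face image so both eyes land at fixed positions.

    left_eye / right_eye: (x, y) pixel coordinates of the eye centers,
    e.g. from a landmark detector.
    """
    left_eye, right_eye = np.float32(left_eye), np.float32(right_eye)
    # Desired eye positions in the aligned output image.
    dst_l = np.float32([out_size * (0.5 - eye_dist / 2), out_size * eye_y])
    dst_r = np.float32([out_size * (0.5 + eye_dist / 2), out_size * eye_y])
    # Rotation that makes the eye line horizontal, and the scale that maps the
    # current eye distance onto the desired one.
    dx, dy = right_eye - left_eye
    angle = np.degrees(np.arctan2(dy, dx))
    scale = np.linalg.norm(dst_r - dst_l) / np.linalg.norm(right_eye - left_eye)
    eyes_center = (float((left_eye[0] + right_eye[0]) / 2),
                   float((left_eye[1] + right_eye[1]) / 2))
    matrix = cv2.getRotationMatrix2D(eyes_center, angle, scale)
    # Shift so the eye midpoint lands at the desired midpoint.
    matrix[0, 2] += (dst_l[0] + dst_r[0]) / 2 - eyes_center[0]
    matrix[1, 2] += (dst_l[1] + dst_r[1]) / 2 - eyes_center[1]
    return cv2.warpAffine(img, matrix, (out_size, out_size))
```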
  • the generative adversarial network includes a noise conversion network and a picture synthesis network, the input data of the noise conversion network is noise, and the output data is an intermediate hidden code, the input data of the picture synthesis network is an intermediate hidden code, and the output data is a picture.
  • the noise vector z sampled from the Gaussian noise is first converted by the noise conversion network into an intermediate hidden code w, and then the intermediate hidden code w is converted by the affine transformation layer in each layer of the picture synthesis network into a modulation parameter used in each layer, and the network can finally output a high-definition face picture. It can be considered that different intermediate hidden codes w correspond to different high-definition face pictures.
  • the generative adversarial network can be StyleGAN2. In the process of training to obtain the original network, the noise conversion network needs to be adjusted, and in the process of fine-tuning the original network to obtain the feature network, the noise conversion network may not need to be adjusted.
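  • The two-network structure above can be reflected in code as follows, assuming a StyleGAN2-like generator object that exposes `mapping` (noise conversion network) and `synthesis` (picture synthesis network) submodules; the attribute names are hypothetical.

```python
import copy
import torch

def build_feature_generator(original_gen):
    """Clone the original generator and freeze its mapping (noise conversion)
    network, so that fine-tuning on target-attribute images only adjusts the
    picture synthesis network, as the text above allows."""
    feature_gen = copy.deepcopy(original_gen)
    for p in feature_gen.mapping.parameters():
        p.requires_grad_(False)
    return feature_gen

def generate(gen, batch=4, z_dim=512):
    z = torch.randn(batch, z_dim)   # noise vector z sampled from a Gaussian
    w = gen.mapping(z)              # intermediate hidden code w
    return gen.synthesis(w)         # high-definition face picture
```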
  • the original network and the feature network are inferred synchronously to obtain a face image, and a mask model or other customized methods are used to obtain the mask corresponding to the specific feature area (i.e., the area related to the target attribute), and then a joint mask image is generated.
  • the same noise vector z (input noise conversion network) or the same intermediate hidden code w (input image synthesis network) can be input into the original network and the feature network to obtain the initial portrait image and the feature portrait image respectively.
  • the images output by the final network layers of the original network and the feature network are used as the initial portrait image and the feature portrait image. Since the feature network is fine-tuned from the original network, the information such as the face orientation of the two images is roughly the same.
  • for example, if the target attribute is slicked-back hair, the mask can specifically be a hair mask. Finally, a joint mask is obtained based on these two masks and is recorded as mask.
  • the original network and the feature network perform inference synchronously. During inference, the feature network's features in the mask-specified area of the target layer are fused into the original network; the original network then continues its inference alone, and when it finishes, the final portrait image is obtained.
  • the same noise vector z (input to the noise conversion network) or the same intermediate hidden code w (input to the picture synthesis network) is fed to both networks during the synchronous inference.
  • the features in the area specified by the joint mask at a specific layer are fused according to the coefficients.
  • the generative adversarial network has a pyramid structure. Referring to Figure 7, the network progressively generates larger images from lower resolutions, layer by layer.
  • the output feature size of the 11th layer of the generative adversarial network is 64x64.
  • the output feature map of the 11th layer of the original network is denoted F_ori, and the output feature map of the 11th layer of the feature network is denoted F_feat.
  • the resolution of the joint mask is adjusted to 64x64 using a resizing algorithm such as bilinear interpolation, and the result is denoted mask64.
  • a fusion coefficient a is set, and the fused feature map can be computed as: F_fused = mask64 × (a × F_ori + (1 − a) × F_feat) + (1 − mask64) × F_ori.
  • the value range of a is between 0 and 1. In one embodiment, 1 − a is greater than a, so that more information from the feature network is blended in, which further ensures that the fused feature map has attributes closer to or consistent with the target attribute. In one embodiment, the value range of a is between 0 and 0.3.
  • the fused feature map replaces F_ori and the original network continues its forward pass. Finally, a high-definition picture with the attribute editing completed can be obtained: the image output by the final network layer of the original network is taken as the high-definition picture with the attribute editing completed, that is, the final portrait picture.
  • the final network layer of the original network can output an image with a resolution of 1024x1024, so this application ultimately achieves controllable editing that generates high-definition face pictures at the 1024x1024 pixel level.
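  • As a concrete illustration of the layer-11 fusion, the following sketch applies the coefficient fusion to two 64x64 feature maps; the array names, shapes and the coefficient value are assumptions rather than the patent's notation.

```python
import numpy as np

def fuse_layer_features(f_ori, f_feat, mask64, a=0.2):
    """Coefficient fusion of layer-11 feature maps inside the joint mask region.

    f_ori, f_feat: (C, 64, 64) feature maps from the original and feature networks.
    mask64: (64, 64) binary joint mask resized to the layer's resolution.
    a: fusion coefficient in [0, 1]; 1 - a > a blends in more target-attribute information.
    """
    mask = mask64[None, ...]                   # broadcast over channels
    fused = a * f_ori + (1 - a) * f_feat       # coefficient fusion
    # Keep original features outside the mask, fused features inside it.
    return mask * fused + (1 - mask) * f_ori

# Example: the result then replaces the original network's layer-11 output.
f_new = fuse_layer_features(np.random.randn(512, 64, 64),
                            np.random.randn(512, 64, 64),
                            np.random.randint(0, 2, (64, 64)).astype(np.float64))
```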
  • the operation of fusing the features of the mask specified area of a specific layer according to the coefficients can be performed at least once. For example, based on the same or different joint masks, fusion is performed on both the 11th layer and the 12th layer. 702 in FIG. 7 represents
  • Figure 8 is a schematic diagram of an initial portrait image, a feature portrait image, and a final portrait image.
  • the portrait images in the same column in Figure 8 are obtained based on different intermediate hidden codes, and the portrait images in the same row in Figure 8 are obtained based on the same intermediate hidden code.
  • FIG9 includes schematic diagrams of slicked-back editing, beard editing, and short hair editing.
  • the image editing method of the present application can generate a large number of paired face attribute editing training samples to support data for other generative adversarial network training.
  • the initial portrait image and the final portrait image constitute a pair of face attribute editing training samples.
  • a large number of paired face attribute editing training samples can be used to train a face attribute editing network, which can be used for various face attribute editing tasks at the front end, such as expression editing functions and hairstyle editing functions in applications.
  • the image editing method of the present application can also be applied to the editing of local attributes of other pictures, for example, to the editing of local attributes of other animal images, to the editing of local attributes of object images, and so on.
  • the embodiment of the present application also provides an image editing device for implementing the above-mentioned image editing method.
  • the implementation solution provided by the device to solve the problem is similar to the implementation solution recorded in the above-mentioned method, so the specific limitations in one or more image editing device embodiments provided below can refer to the limitations on the image editing method above, and will not be repeated here.
  • an image editing device comprising: a model acquisition module 1002, a random variable acquisition module 1004, a first image acquisition module 1006, a mask image acquisition module 1008, a second image acquisition module 1010 and an image fusion module 1012, wherein:
  • the model acquisition module 1002 is used to acquire an initial image generation model and a feature image generation model; the initial image generation model is obtained by training the first training image set, and the feature image generation model is obtained by training the initial image generation model based on the second training image set, the object images in the first training image set and the second training image set belong to target category objects, the object images in the second training image set contain target attributes, and the images output by the feature image generation model have target attributes;
  • the random variable acquisition module 1004 is used to obtain a first random variable and input the first random variable into an initial image generation model and a feature image generation model respectively; the initial image generation model is obtained by training the first training image set, and the feature image generation model is obtained by training the initial image generation model based on the second training image set; the first training image set and the second training image set include object images corresponding to target category objects, the object images in the second training image set are object images containing target attributes, and the images output by the feature image generation model have target attributes.
  • the first image acquisition module 1006 is used to acquire the image output by the first network layer in the initial image generation model to obtain the first initial object image, and acquire the image output by the second network layer in the feature image generation model to obtain the first feature object image.
  • the mask image acquisition module 1008 is used to acquire the attribute mask images corresponding to the first initial object image and the first feature object image respectively based on the image area corresponding to the attribute type to which the target attribute belongs in the object image, and obtain the target joint mask image based on each attribute mask image.
  • the second image acquisition module 1010 is used to acquire the image output by the matching target network layer in the initial image generation model and the feature image generation model to obtain a second initial object image and a second feature object image.
  • the image fusion module 1012 is used to fuse the second initial object image and the second feature object image based on the target joint mask image to obtain a reference object image; the reference object image is used to represent an image obtained by performing target attribute editing on the second initial object image.
  • model acquisition module 1002 is further used to:
  • based on the first training image set, adversarial learning is performed on the initial image generation network and the initial image discrimination network to obtain an intermediate image generation network and an intermediate image discrimination network; based on the intermediate image generation network, the initial image generation model is obtained; based on the second training image set, adversarial learning is performed on the intermediate image generation network and the intermediate image discrimination network to obtain a target image generation network and a target image discrimination network; based on the target image generation network, the feature image generation model is obtained.
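  • A minimal sketch of one adversarial learning step under the standard non-saturating logistic GAN loss; the loss choice and module names are assumptions of this sketch. The same step can first be run over the first training image set to obtain the intermediate generation network and then continued over the second training image set to obtain the target generation network.

```python
import torch
import torch.nn.functional as F

def adversarial_step(gen, disc, real_images, g_opt, d_opt, z_dim=512):
    """One adversarial learning step: the discrimination network learns to
    separate real from generated images, the generation network learns to
    fool the discrimination network."""
    z = torch.randn(real_images.size(0), z_dim)

    # Discrimination network update.
    d_opt.zero_grad()
    fake = gen(z).detach()
    d_loss = (F.softplus(-disc(real_images)).mean() +   # push real scores up
              F.softplus(disc(fake)).mean())            # push fake scores down
    d_loss.backward()
    d_opt.step()

    # Generation network update.
    g_opt.zero_grad()
    g_loss = F.softplus(-disc(gen(z))).mean()           # non-saturating generator loss
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```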
  • the image editing device further includes:
  • the training image set acquisition module 1102 is used to acquire a first candidate image set, each first candidate image in the first candidate image set being an object image corresponding to the target category object; align each first candidate image based on the position of a reference object part of the target category object in the first candidate image; obtain a first training image set based on each first candidate image after image alignment; acquire a second candidate image set, each second candidate image in the second candidate image set being an object image corresponding to the target category object and containing the target attribute; align each second candidate image based on the position of the reference object part of the target category object in the second candidate image; and obtain a second training image set based on each second candidate image after image alignment.
  • the mask image acquisition module 1008 is further configured to:
  • the attribute mask image corresponding to the first initial object image is used as the first mask image
  • the attribute mask image corresponding to the first feature object image is used as the second mask image
  • the first initial object image and the first feature object image are images of different sizes
  • the first mask image and the second mask image are images of different sizes
  • the first mask image and the second mask image are size-aligned
  • a target joint mask image is obtained based on the size-aligned first mask image and the second mask image.
  • the image fusion module 1012 is further used for:
  • from the second initial object image, an image area matching the occluded area in the target joint mask image is obtained as the first image area; the second initial object image and the second feature object image are fused to obtain a fused object image; from the fused object image, an image area matching the non-occluded area in the target joint mask image is obtained as the second image area; based on the first image area and the second image area, a reference object image is obtained.
  • the image fusion module 1012 is further used for:
  • the target joint mask image is resized to obtain a transformed joint mask image of the same size as the second initial object image; inverse masking is performed on the transformed joint mask image to obtain an inverse joint mask image; and the second initial object image and the inverse joint mask image are fused to obtain the first image region.
  • the image fusion module 1012 is also used for:
  • the fused object image and the transformed joint mask image are fused to obtain a second image region.
  • the image editing device further includes:
  • the target object image determination module 1104 is used to replace the second initial object image with the reference object image and input it into the backward network layer of the target network layer in the initial image generation model, and obtain the image output by the ending network layer as the target object image; wherein the target object image is used to represent the image obtained by editing the original object image by target attributes, and the original object image is the image directly output by the ending network layer after the first random variable is input into the initial image generation model.
  • the target object image determination module 1104 is further configured to:
  • the reference object image replaces the second initial object image and is input into the backward network layer of the target network layer in the initial image generation model, and the image output by the third network layer in the backward network layer is obtained as the third initial object image; the image output by the fourth network layer matching the third network layer in the feature image generation model is obtained as the third feature object image; based on the current joint mask image, the third initial object image and the third feature object image are fused to obtain an updated object image; the current joint mask image is the target joint mask image or the updated joint mask image, and the updated joint mask image is obtained based on the image output by the fifth network layer in the initial image generation model and the image output by the sixth network layer in the feature image generation model; the updated object image replaces the third initial object image and is input into the backward network layer of the third network layer in the initial image generation model, and the image output by the ending network layer is obtained as the target object image.
  • the image editing device further includes:
  • the model training module 1106 is used to use the original object image and the target object image as a training image pair; based on the training image pair, the initial image attribute editing model is trained to obtain a target image attribute editing model; the target image attribute editing model is used to perform target attribute editing on the input image of the model.
  • the network layers connected sequentially in the initial image generation model and the feature image generation model are used to output images of successively increasing sizes.
  • the images output by the network layers with the same order correspond to the same size.
  • the target category object is a face
  • the initial image generation model is an initial face image generation model
  • the feature image generation model is a feature face image generation model
  • the target attribute is a target local face attribute
  • the initial image generation model is trained based on the object image corresponding to the target category object. After receiving the input data, the initial image generation model can output the object image corresponding to the target category object.
  • the feature image generation model is trained based on object images that correspond to the target category object and contain the target attribute. After receiving the input data, the feature image generation model can output an object image that corresponds to the target category object and contains the target attribute.
  • the first random variable is input into the initial image generation model and the feature image generation model respectively, and the object images output by the initial image generation model and the feature image generation model respectively have certain similarities, and the object image output by the feature image generation model contains the target attribute.
  • the attribute mask image can reflect the image area corresponding to the attribute type to which the target attribute belongs in the object image
  • the target joint mask image can reflect the joint image area of the attribute type to which the target attribute belongs in the first initial object image and the first feature object image.
  • the second initial object image and the second feature object image are fused, and the image area corresponding to the target attribute in the second feature object image can be fused into the second initial object image, so that the reference object image obtained by fusion is equivalent to making the second initial object image have the target attribute without changing other attributes of the second initial object image, that is, the reference object image is equivalent to the image obtained by editing the target attribute of the second initial object image.
  • the target attribute editing of the image output by the initial image generation model can be quickly realized based on the initial image generation model, the feature image generation model and the target joint mask image, which improves the image editing efficiency and also ensures the image editing accuracy.
  • Each module in the above-mentioned image editing device can be implemented in whole or in part by software, hardware or a combination thereof.
  • Each module can be embedded in or independent of a processor in a computer device in the form of hardware, or can be stored in a memory in a computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.
  • a computer device which may be a server, and its internal structure diagram may be shown in FIG12.
  • the computer device includes a processor, a memory, an input/output interface (I/O for short) and a communication interface.
  • the processor, the memory and the input/output interface are connected via a system bus, and the communication interface is connected to the system bus via the input/output interface.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data such as an initial image generation model, a feature image generation model, a reference object image, and a target object image.
  • the input/output interface of the computer device is used to exchange information between the processor and an external device.
  • the communication interface of the computer device is used to communicate with an external terminal via a network connection.
  • a computer device which may be a terminal, and its internal structure diagram may be shown in FIG13.
  • the computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device.
  • the processor, the memory, and the input/output interface are connected via a system bus, and the communication interface, the display unit, and the input device are connected to the system bus via the input/output interface.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer-readable instructions.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the input/output interface of the computer device is used to exchange information between the processor and an external device.
  • the communication interface of the computer device is used to communicate with an external terminal in a wired or wireless manner, and the wireless manner can be implemented through WIFI, a mobile cellular network, NFC (near field communication) or other technologies.
  • when the computer-readable instructions are executed by the processor, an image editing method is implemented.
  • the display unit of the computer device is used to form a visually visible image, and can be a display screen, a projection device or a virtual reality imaging device.
  • the display screen can be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device can be a touch layer covered on the display screen, or a button, trackball or touchpad set on the computer device casing, or an external keyboard, touchpad or mouse, etc.
  • FIGS. 12 and 13 are merely block diagrams of partial structures related to the scheme of the present application, and do not constitute a limitation on the computer device to which the scheme of the present application is applied.
  • the specific computer device may include more or fewer components than those shown in the figures, or combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps in the above-mentioned method embodiments when executing the computer-readable instructions.
  • a computer-readable storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by a processor, the steps in the above method embodiments are implemented.
  • a computer program product or computer program includes computer-readable instructions, the computer-readable instructions are stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer-readable instructions from the computer-readable storage medium, and the processor executes the computer-readable instructions, so that the computer device performs the steps in the above-mentioned method embodiments.
  • user information including but not limited to user device information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • any reference to the memory, database or other medium used in the embodiments provided in this application can include at least one of non-volatile and volatile memory.
  • Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc.
  • Volatile memory can include random access memory (RAM) or external cache memory, etc.
  • RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • the database involved in each embodiment provided in this application may include at least one of a relational database and a non-relational database.
  • Non-relational databases may include distributed databases based on blockchain, etc., but are not limited to this.
  • the processor involved in each embodiment provided in this application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, etc., but are not limited to this.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image editing method, apparatus, computer device, storage medium and computer program product, relating to artificial intelligence. The method includes: obtaining an initial image generation model and a feature image generation model, the initial image generation model being obtained by training based on a first training image set, the feature image generation model being obtained by training the initial image generation model based on a second training image set, the object images in the first training image set and the second training image set belonging to a target category object, the object images in the second training image set containing a target attribute, and the images output by the feature image generation model having the target attribute (step S202); obtaining a first random variable and inputting the first random variable into the initial image generation model and the feature image generation model respectively (step S204); obtaining the image output by a first network layer in the initial image generation model to obtain a first initial object image, and obtaining the image output by a second network layer in the feature image generation model to obtain a first feature object image (step S206); based on the image area corresponding, in the object image, to the attribute type to which the target attribute belongs, obtaining the attribute mask images respectively corresponding to the first initial object image and the first feature object image, and obtaining a target joint mask image based on the attribute mask images (step S208); obtaining the images output by the matching target network layers in the initial image generation model and the feature image generation model to obtain a second initial object image and a second feature object image (step S210); and, based on the target joint mask image, fusing the second initial object image and the second feature object image to obtain a reference object image, the reference object image being used to represent an image obtained by performing target attribute editing on the second initial object image (step S212).

Description

图像编辑方法、装置、计算机设备和存储介质
本申请要求于2022年10月28日提交中国专利局,申请号为202211330943.1,申请名称为“图像编辑方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,特别是涉及一种图像编辑方法、装置、计算机设备、存储介质和计算机程序产品。
背景技术
随着图像处理技术的发展,出现了针对图像属性的图像编辑技术,针对图像属性的图像编辑技术用于在原始图像上对图像的局部属性进行编辑从而得到新图像。以针对人脸图像的人脸属性编辑为例,人脸属性编辑是指在原有的人脸图片上对人脸的属性进行单个或至少两个属性的编辑产生新的人脸图像,即在不改变原有人脸图片其它属性的情况下,实现对局部属性的修改。
传统技术中,针对图像属性的图像编辑技术通常是使用PhotoShop等图像编辑软件,人工精确地编辑图像,例如,使用PhotoShop的融合工具将其它人脸的目标属性区域剪切到当前人脸图像上。然而,该方法需要专业人士精心编辑,图像编辑效率低。
发明内容
本申请实施例提供了一种图像编辑方法、装置、计算机设备、计算机可读存储介质和计算机程序产品。
本申请提供了一种图像编辑方法,由计算机设备执行,所述方法包括:
获取初始图像生成模型和特征图像生成模型;所述初始图像生成模型是基于第一训练图像集训练得到的,所述特征图像生成模型是基于第二训练图像集对所述初始图像生成模型进行训练得到的,所述第一训练图像集和所述第二训练图像集中的对象图像属于目标类别对象,所述第二训练图像集中的对象图像包含目标属性,所述特征图像生成模型输出的图像具有所述目标属性;
获取第一随机变量,将所述第一随机变量分别输入所述初始图像生成模型和所述特征图像生成模型;
获取所述初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取所述特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像;
基于所述目标属性所属的属性类型在对象图像中对应的图像区域,获取所述第一初始对象图像和所述第一特征对象图像分别对应的属性遮罩图像,基于各个所述属性遮罩图像得到目标联合遮罩图像;
获取所述初始图像生成模型和所述特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像;
基于所述目标联合遮罩图像,融合所述第二初始对象图像和所述第二特征对象图像,得到参考对象图像;所述参考对象图像用于表征对所述第二初始对象图像进行目标属性编辑得到的图像。
本申请还提供了一种图像编辑装置。所述装置包括:
模型获取模块,用于获取初始图像生成模型和特征图像生成模型;所述初始图像生成模型是基于第一训练图像集训练得到的,所述特征图像生成模型是基于第二训练图像集对所述初始图像生成模型进行训练得到的,所述第一训练图像集和所述第二训练图像集中的对象图像属于目标类别对象,所述第二训练图像集中的对象图像包含目标属性,所述特征图像生成模型输出的图像具有所述目标属性;
随机变量获取模块,用于获取第一随机变量,将所述第一随机变量分别输入所述初始图像生成模型和所述特征图像生成模型;
第一图像获取模块,用于获取所述初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取所述特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像;
遮罩图像获取模块,用于基于所述目标属性所属的属性类型在对象图像中对应的图像区域,获取所述第一初始对象图像和所述第一特征对象图像分别对应的属性遮罩图像,基于各个所述属性遮罩图像得到目标联合遮罩图像;
第二图像获取模块,用于获取所述初始图像生成模型和所述特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像;
图像融合模块,用于基于所述目标联合遮罩图像,融合所述第二初始对象图像和所述第二特征对象图像,得到参考对象图像;所述参考对象图像用于表征对所述第二初始对象图像进行目标属性编辑得到的图像。
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机可读指令,所述处理器执行所述计算机可读指令时实现上述图像编辑方法所述的步骤。
一种计算机可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现上述图像编辑方法所述的步骤。
一种计算机程序产品,包括计算机可读指令,所述计算机可读指令被处理器执行时实现上述图像编辑方法所述的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中图像编辑方法的应用环境图;
图2为一个实施例中图像编辑方法的流程示意图;
图3为另一个实施例中图像编辑方法的流程示意图;
图4为又一个实施例中图像编辑方法的流程示意图;
图5为一个实施例中训练图像属性编辑模型的流程示意图;
图6为一个实施例中应用于人像图片的图像编辑方法的流程示意图;
图7为一个实施例中特征融合的示意图;
图8为一个实施例中人脸背头编辑的效果示意图;
图9为一个实施例中人脸属性编辑的效果示意图;
图10为一个实施例中图像编辑装置的结构框图;
图11为另一个实施例中图像编辑装置的结构框图;
图12为一个实施例中计算机设备的内部结构图;
图13为另一个实施例中计算机设备的内部结构图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请实施例提供的方案涉及人工智能的计算机视觉技术、机器学习等技术,具体通过如下实施例进行说明:
本申请实施例提供的图像编辑方法,可以应用于如图1所示的应用环境中。其中,终端102通过网络与服务器104进行通信。数据存储系统可以存储服务器104需要处理的数据。数据存储系统可以集成在服务器104上,也可以放在云上或其他服务器上。终端102可以但不限于是各种台式计算机、笔记本电脑、智能手机、平板电脑、物联网设备和便携式可穿戴设备,物联网设备可为智能音箱、智能电视、智能空调、智能车载设备等。便携式可穿戴设备可为智能手表、智能手环、头戴设备等。服务器104可以用独立的服务器或者是多个服务器组成的服务器集群或者云服务器来实现。
终端和服务器均可单独用于执行本申请实施例中提供的图像编辑方法。
例如,服务器获取第一随机变量,将第一随机变量分别输入初始图像生成模型和特征图像生成模型。其中,初始图像生成模型是基于第一训练图像集训练得到的,特征图像生成模型是基于第二训练图像集对初始图像生成模型进行训练得到的,第一训练图像集和第二训练图像集包括目标类别对象对应的对象图像,第二训练图像集中的对象图像为包含目标属性的对象图像,特征图像生成模型输出的图像具有目标属性。服务器获取初始图像生成模型中第一网络层输出的图像得到第一初始对象图像,获取特征图像生成模型中第二网络层输出的图像得到第一特征对象图像,基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像,基于各个属性遮罩图像得到目标联合遮罩图像。服务器获取初始图像生成模型和特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像。服务器基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合,得到参考对象图像,参考对象图像用于表征对第二初始对象图像进行目标属性编辑得到的图像。
终端和服务器也可协同用于执行本申请实施例中提供的图像编辑方法。
例如,服务器基于终端发送的图像编辑请求获取第一随机变量,将第一随机变量分别输入初始图像生成模型和特征图像生成模型,获取初始图像生成模型中第一网络层输出的图像得到第一初始对象图像,获取特征图像生成模型中第二网络层输出的图像得到第一特征对象图像,获取初始图像生成模型和特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像。服务器基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像,基于各个属性遮罩图像得到目标联合遮罩图像。服务器基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合,得到参考对象图像。服务器将参考对象图像发送至终端。终端可以将参考对象图像进行展示。
这样,初始图像生成模型是基于目标类别对象对应的对象图像训练得到的,在接收输入数据后,初始图像生成模型可以输出目标类别对象对应的对象图像。特征图像生成模型是基于目标类别对象对应的、且包含目标属性的对象图像,对初始图像生成模型训练得到的,在接收输入数据后,特征图像生成模型可以输出目标类别对象对应的、且包含目标属性的对象图像。将第一随机变量分别输入初始图像生成模型和特征图像生成模型,初始图像生成模型和特征图像生成模型分别输出的对象图像具有一定的相似性,并且特征图像生成模型输出的对象图像包含目标属性。属性遮罩图像可以反映目标属性所属的属性类型在对象图像中对应的图像区域,目标联合遮罩图像可以反映目标属性所属的属性类型在第一初始对象图像和第一特征对象图像中的联合图像区域。基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合,可以将第二特征对象图像中目标属性对应的图像区域融合至第二初始对象图像中,从而融合得到的参考对象图像相当于在不改变第二初始对象图像其它属性的情况下使其具有目标属性,即参考对象图像相当于是对第二初始对象图像进行目标属性编辑得到的图像。无需人工精细编辑,基于初始图像生成模型、特征图像生成模型和目标联合遮罩图像可以快速实现对初始图像生成模型输出图像的目标属性编辑,提高了图像编辑效率,并且也可以保障图像编辑准确性。
在一个实施例中,如图2所示,提供了一种图像编辑方法,以该方法应用于计算机设备来举例说明,计算机设备可以是终端或服务器。该方法可以由终端或服务器自身单独执行,也可以通过终端和服务器之间的交互来实现。参考图2,图像编辑方法包括以下步骤:
步骤S202,获取初始图像生成模型和特征图像生成模型;初始图像生成模型是基于第一训练图像集训练得到的,特征图像生成模型是基于第二训练图像集对初始图像生成模型进行训练得到的,第一训练图像集和第二训练图像集中的对象图像属于目标类别对象,第二训练图像集中的对象图像包含目标属性,特征图像生成模型输出的图像具有目标属性。
其中,图像生成模型是机器学习模型,用于生成图像。图像生成模型的输入数据为随机变量,输出数据为图像。图像生成模型通常是基于训练图像集训练得到的。训练图像集包括属于同一类别的多个对象的对象图像,从而基于训练图像集训练得到的图像生成模型可以用于生成特定类别对象的对象图像。例如,用于生成人脸图像的图像生成模型是基于人脸图像集训练得到的,用于生成猫图像的图像生成模型是基于猫图像训练集训练得到的,用于生成车辆图像的图像生成模型是基于车辆图像训练集训练得到的。可以理解,对象可以是物体,例如,汽车、桌子等,对象也可以是活体,例如,动物、植物等。
第一训练图像集包括目标类别对象对应的对象图像,即第一训练图像集包括多个对象图像,第一训练图像集中的对象图像属于目标类别对象。初始图像生成模型是基于第一训练图像集训练得到的图像生成模型,初始图像生成模型用于生成属于目标类别对象的对象图像。目标类别对象用于指代某一类对象,例如,若目标类别对象为人脸,那么第一训练图像集包括多个人脸图像,初始图像生成模型用于生成人脸图像;若目标类别对象为狗,那么第一训练图像集包括多个狗图像,初始图像生成模型用于生成狗图像;若目标类别对象为猫,那么第一训练图像集包括多个猫图像,初始图像生成模型用于生成猫图像,若目标类别对象为车辆,那么第一训练图像集包括多个车辆图像,初始图像生成模型用于生成车辆图像。
第二训练图像集包括目标类别对象对应的、且包含目标属性的对象图像,即第二训练图像集包括多个对象图像,第二训练图像集中的对象图像不仅属于目标类别对象还包含目标属性。特征图像生成模型是基于第二训练图像集训练得到的图像生成模型,特征图像生成模型用于生成属于目标类别对象、且包含目标属性的对象图像,即特征图像生成模型用于生成属于目标类别对象、且具有特定特征的对象图像。对象图像存在各类对象属性,一类对象属性(即一个属性类型)在对象图像中存在对应的图像区域,一类对象属性用于表征对象的一类特征、特性。一类对象属性是针对同一类型的多个具体对象属性的总称,属于同一类型的多个具体对象属性之间存在共性。属于同一类型的多个具体对象属性用于描述对象的同一部位,属于同一类型的不同具体对象属性可以使得同一部位具有不同的形态。例如,人脸图像包括头发类属性,头发类属性用于表征、描述对象的头发,头发类属性可以包括用于表征不同发型的各个头发属性,具体可以是背头属性、短发属性、长发属性、卷发属性等。猫图像包括猫腿类属性,猫腿类属性用于表征、描述猫的腿部,猫腿类属性可以包括用于表征猫不同腿部的各个腿部属性,具体可以是长腿属性、短腿属性等。目标属性是指某一具体对象属性,例如,背头属性、长腿属性。目标属性也可以是指属于不同属性类型的至少两个具体对象属性,例如,背头属性+络腮胡属性。
举例说明,若目标类别对象为车辆,第一训练图像集具体包括各种车辆图像,初始图像生成模型用于生成车辆图像。进一步的,若目标属性为天窗属性,第二训练图像集具体包括车辆顶部具有天窗的车辆图像,特征图像生成模型用于生成车辆顶部具有天窗的车辆图像。
进一步的,特征图像生成模型是在初始图像生成模型的基础上,基于第二训练图像集进一步训练得到的,也就是,特征图像生成模型是基于第二训练图像集对初始图像生成模型进行微调得到的,若特征图像生成模型和初始图像生成模型的输入数据一致,那么特征 图像生成模型和初始图像生成模型输出的图像是具有一定的相似性的。相较于初始图像生成模型输出的图像,特征图像生成模型输出的图像和它的主要区别在于特征图像生成模型输出的图像具有目标属性。例如,初始图像生成模型用于生成人脸图像,特征图像生成模型用于生成具有背头特征的人脸图像,将同一随机变量分别输入特征图像生成模型和初始图像生成模型,特征图像生成模型和初始图像生成模型输出的人脸图像具有相似的五官,特征图像生成模型输出的人脸图像中的发型为背头。
具体地,可以预先训练得到初始图像生成模型和特征图像生成模型,计算机设备在本地或从其他设备上获取初始图像生成模型和特征图像生成模型,基于初始图像生成模型和特征图像生成模型来快速实现针对图像的目标属性编辑。
步骤S204,获取第一随机变量,将第一随机变量分别输入初始图像生成模型和特征图像生成模型。
其中,变量是指数值可以变的数据。随机变量是指随机生成的变量。例如,随机变量可以是从高斯分布中随机采样得到的数据;随机变量可以是从均匀分布中随机采样得到的数据;等等。第一随机变量是指某一个随机变量。
步骤S206,获取初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像。
其中,图像生成模型包括依次连接的多个网络层,各个网络层可以输出相应的对象图像。可以理解,图像生成模型的结尾网络层输出的图像为所有网络层输出的图像中图像质量最高的图像。初始对象图像为初始图像生成模型经过数据处理得到的对象图像。特征对象图像为特征图像生成模型经过数据处理得到的对象图像。第一初始对象图像是初始图像生成模型中第一网络层输出的图像,第一特征对象图像是特征图像生成模型中第二网络层输出的图像。特征图像生成模型是对初始图像生成模型进行微调得到的,特征图像生成模型和初始图像生成模型包括相同的模型结构,只是模型参数不同,即特征图像生成模型和初始图像生成模型具有相同数量的网络层。第一网络层和第二网络层可以是两个模型中层数一致的网络层,例如,第一网络层为初始图像生成模型中的结尾网络层,第二网络层为特征图像生成模型中的结尾网络层。第一网络层和第二网络层也可以是两个模型中层数不一致的网络层,例如,第一网络层为初始图像生成模型中的第七层,第二网络层为特征图像生成模型中的第十一层。
具体地,计算机设备获取第一随机变量,将第一随机变量分别输入初始图像生成模型和特征图像生成模型,获取初始图像生成模型中第一网络层输出的图像作为第一初始对象图像,获取特征图像生成模型中第二网络层输出的图像作为第一特征对象图像。
步骤S208,基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像,基于各个属性遮罩图像得到目标联合遮罩图像。
其中,一个属性类型用于指代一类对象属性。目标属性所属的属性类型在对象图像中存在对应的图像区域。例如,针对人脸图像,若目标属性为背头属性,背头属性所属的属性类型为头发类属性,头发类属性对应的图像区域为图像中的头发区域;针对车辆图像,若目标属性为敞篷属性,敞篷属性所属的属性类型为车顶类属性,车顶类属性对应的图像区域为图像中的车顶区域。
遮罩用于遮挡图像中的一部分,显示图像中的另一部分。属性遮罩图像用于确定某一类对象属性在对象图像中的位置。为了后续实现目标属性编辑,在属性遮罩图像中,目标属性所属的属性类型在对象图像中对应的图像区域为显示部分,对象图像的其他图像区域为遮挡部分。属性遮罩图像包括遮挡区域和非遮挡区域。非遮挡区域是指对象图像中的显示部分,即非遮挡区域是目标属性所属的属性类型在对象图像中对应的图像区域。遮挡区域是指对象图像中的遮挡部分,即遮挡区域是对象图像中除目标属性所属的属性类型对应 的图像区域之外的其他图像区域。在一个实施例中,属性遮罩图像为二值图像,遮挡部分用像素值0表示,非遮挡部分用像素值1表示。
目标联合遮罩图像是基于第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像得到的,用于确定某一类对象属性在第一初始对象图像和第一特征对象图像中的联合位置。
具体地,在获取到第一初始对象图像和第一特征对象图像后,计算机设备可以基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像。基于目标属性所属的属性类型在第一初始对象图像中对应的图像区域生成第一初始对象图像对应的属性遮罩图像,基于目标属性所属的属性类型在第一特征对象图像中对应的图像区域生成第一特征对象图像对应的属性遮罩图像。然后,计算机设备基于第一初始对象图像和第一特征对象图像对应的属性遮罩图像得到目标联合遮罩图像,例如,对第一初始对象图像和第一特征对象图像对应的属性遮罩图像求交集得到目标联合遮罩图像;对第一初始对象图像和第一特征对象图像对应的属性遮罩图像求并集得到目标联合遮罩图像;等等。
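下面给出一个获取属性遮罩图像并求取目标联合遮罩图像的示意性Python草图(仅为便于理解的极简示例,并非本申请限定的实现方式):其中假设属性遮罩由语义分割结果得到,分割标签HAIR_LABEL、分割结果seg_init与seg_feat以及遮罩尺寸均为假设值,实际也可使用其他手段获取属性遮罩。

```python
import numpy as np

HAIR_LABEL = 13  # 假设的分割类别编号:头发类属性(目标属性为背头时对应的属性类型)

def attribute_mask(seg_map: np.ndarray, label: int) -> np.ndarray:
    """由语义分割结果生成二值属性遮罩:属性类型对应区域为1(非遮挡),其余为0(遮挡)。"""
    return (seg_map == label).astype(np.float32)

def joint_mask(mask_a: np.ndarray, mask_b: np.ndarray, mode: str = "union") -> np.ndarray:
    """融合两张同尺寸的属性遮罩得到联合遮罩,可取并集或交集。"""
    return np.maximum(mask_a, mask_b) if mode == "union" else np.minimum(mask_a, mask_b)

# 用法示意:seg_init、seg_feat分别为第一初始对象图像和第一特征对象图像的分割结果(此处用随机数代替)
seg_init = np.random.randint(0, 19, size=(256, 256))
seg_feat = np.random.randint(0, 19, size=(256, 256))
target_joint_mask = joint_mask(attribute_mask(seg_init, HAIR_LABEL),
                               attribute_mask(seg_feat, HAIR_LABEL), mode="union")
```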
可以理解,计算机设备可以通过机器学习模型获取对象图像对应的属性遮罩图像,也可以通过其他手段获取对象图像对应的属性遮罩图像,本申请不限制获取属性遮罩图像的手段。
步骤S210,获取初始图像生成模型和特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像。
其中,第二初始对象图像是初始图像生成模型中的网络层输出的图像,第二特征对象图像是特征图像生成模型中的网络层输出的图像,第二初始对象图像和第二特征对象图像是初始图像生成模型和特征图像生成模型中匹配的目标网络层分别输出的图像。初始图像生成模型和特征图像生成模型中匹配的目标网络层可以是初始图像生成模型和特征图像生成模型中层数相同的网络层。图像生成模型包括依次连接的多个网络层,各个网络层可以输出相应的对象图像,目标网络层可以是各个网络层中的任意网络层。
可以理解,在初始图像生成模型中,输出第一初始对象图像的网络层和输出第二初始对象图像的网络层可以是相同的网络层,也可以是不同的网络层。在特征图像生成模型中,输出第一特征对象图像的网络层和输出第二特征对象图像的网络层可以是相同的网络层,也可以是不同的网络层。
具体地,计算机设备将第一随机变量分别输入初始图像生成模型和特征图像生成模型,获取初始图像生成模型和特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像。
可以理解,获取第二初始对象图像和第二特征对象图像的操作可以和获取第一初始对象图像和第一特征对象图像的操作同步进行,也可以非同步进行。
步骤S212,基于目标联合遮罩图像,融合第二初始对象图像和第二特征对象图像,得到参考对象图像;参考对象图像用于表征对第二初始对象图像进行目标属性编辑得到的图像。
其中,目标属性编辑是指对图像进行编辑使得编辑后的图像具有目标属性。
具体地,计算机设备可以基于目标联合遮罩图像融合第二初始对象图像和第二特征对象图像,将目标联合遮罩图像中非遮挡区域在第二特征对象图像中对应的图像区域融合至第二初始对象图像中,从而得到参考对象图像。参考对象图像用于表征对第二初始对象图像进行目标属性编辑得到的图像,也就是,参考对象图像相当于在不改变第二初始对象图像的其他图像区域的情况下将目标属性所属的属性类型对应的图像区域修改为目标属性。例如,针对人脸图像,目标属性可以是背头属性,参考对象图像相当于将第二初始人脸图像中的头发区域修改为背头属性,也就是第二初始人脸图像中人脸的发型为背头;针对车 辆图像,目标属性可以是天窗属性,参考对象图像相当于将第二初始车辆图像中的车顶区域修改为天窗属性,也就是第二初始车辆图像中车顶具有天窗。
基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合来实现目标属性编辑,可以在不改变第二初始对象图像中其他非相干区域的情况下实现对目标属性相干区域的精确编辑。
可以理解,若需要进行其他属性编辑,则需要获取新的目标属性对应的第二训练图像集,基于新的第二训练图像集对初始图像生成模型进行微调得到新的特征图像生成模型,基于初始图像生成模型和新的特征图像生成模型来实现对图像进行针对新的目标属性的编辑。例如,若需要对人脸图像进行络腮胡编辑,则需要获取包含络腮胡的人脸图像作为第二训练图像集,基于该第二训练图像集对初始人脸图像生成模型进行微调得到用于生成包含络腮胡的人脸图像的特征人脸图像生成模型;若需要对人脸图像进行背头编辑,则需要获取包含背头的人脸图像作为第二训练图像集,基于该第二训练图像集对初始人脸图像生成模型进行微调得到用于生成包含背头的人脸图像的特征人脸图像生成模型。
可以理解,若需要同时对至少两个属性进行编辑,可以获取相应的第二训练图像集对初始图像生成模型进行微调得到用于生成同时包含至少两个属性的对象图像的特征图像生成模型。例如,若需要对人脸图像同时进行络腮胡编辑和背头编辑,则可以获取同时包含络腮胡和背头的人脸图像作为第二训练图像集,基于该第二训练图像集对初始人脸图像生成模型进行微调得到用于生成同时包含络腮胡和背头的人脸图像的特征人脸图像生成模型。
上述图像编辑方法中,初始图像生成模型是基于目标类别对象对应的对象图像训练得到的,在接收输入数据后,初始图像生成模型可以输出目标类别对象对应的对象图像。特征图像生成模型是基于目标类别对象对应的、且包含目标属性的对象图像,对初始图像生成模型训练得到的,在接收输入数据后,特征图像生成模型可以输出目标类别对象对应的、且包含目标属性的对象图像。将第一随机变量分别输入初始图像生成模型和特征图像生成模型,初始图像生成模型和特征图像生成模型分别输出的对象图像具有一定的相似性,并且特征图像生成模型输出的对象图像包含目标属性。属性遮罩图像可以反映目标属性所属的属性类型在对象图像中对应的图像区域,目标联合遮罩图像可以反映目标属性所属的属性类型在第一初始对象图像和第一特征对象图像中的联合图像区域。基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合,可以将第二特征对象图像中目标属性对应的图像区域融合至第二初始对象图像中,从而融合得到的参考对象图像相当于在不改变第二初始对象图像其它属性的情况下使其具有目标属性,即参考对象图像相当于是对第二初始对象图像进行目标属性编辑得到的图像。无需人工精细编辑,基于初始图像生成模型、特征图像生成模型和目标联合遮罩图像可以快速实现对初始图像生成模型输出图像的目标属性编辑,提高了图像编辑效率,并且也可以保障图像编辑准确性。
在一个实施例中,步骤S202,包括:
基于第一训练图像集,对初始图像生成网络和初始图像判别网络进行对抗学习,得到中间图像生成网络和中间图像判别网络;基于中间图像生成网络得到初始图像生成模型;基于第二训练图像集,对中间图像生成网络和中间图像判别网络进行对抗学习,得到目标图像生成网络和目标图像判别网络;基于目标图像生成网络得到特征图像生成模型。
其中,图像生成网络和图像判别网络也是机器学习模型。图像生成网络的输入数据为随机变量,输出数据为图像。图像判别网络的输入数据为图像,输出数据为用于表示输入图像真假的图像标签。图像生成网络也可以称为图像生成器,图像判别网络也可以称为图像判别器。
初始图像生成网络为待训练的图像生成网络,初始图像判别网络为待训练的图像判别网络。中间图像生成网络为基于第一训练图像集训练得到的图像生成网络,中间图像判别网络为基于第一训练图像集训练得到的图像判别网络。目标图像生成网络为基于第二训练图像集微调得到的图像生成网络,目标图像判别网络为基于第二训练图像集微调得到的图像判别网络。
对抗学习是指通过让不同网络以相互博弈的方式进行学习,从而训练得到期望的网络。将初始图像生成网络和初始图像判别网络进行对抗学习,初始图像生成网络的目标是根据随机变量生成能够以假乱真的图像。初始图像判别网络的目标是将初始图像生成网络输出的伪造图像从真实图像中尽可能分辨出来。初始图像生成网络和初始图像判别网络相互对抗学习、不断调整参数,最终目的是让图像生成网络尽可能地骗过图像判别网络,使图像判别网络无法判断图像生成网络的输出图像是否真实。
具体地,图像生成模型可以是生成对抗网络,通过对抗学习的方式训练得到。在模型训练阶段,计算机设备可以基于第一训练图像集,对初始图像生成网络和初始图像判别网络进行对抗学习,生成器尽力使输出的图片骗过判别器的判断,而判别器则尽力区分由生成器输出的伪造图片和真实图片,以此互相对抗,从而训练得到中间图像生成网络和中间图像判别网络。在模型应用阶段时,计算机设备可以基于中间图像生成网络得到初始图像生成模型。可以理解,判别器是为了帮助模型训练,模型应用时主要应用的还是生成器。
进一步的,为了快速训练得到特征图像生成模型,计算机设备可以基于第二训练图像集,对中间图像生成网络和中间图像判别网络进行对抗学习,生成器尽力使输出的图片骗过判别器的判断,而判别器则尽力区分生成器输出的伪造图片和真实图片,以此互相对抗,从而得到目标图像生成网络和目标图像判别网络,基于目标图像生成网络得到特征图像生成模型。由于第二训练图像集中的对象图像包含目标属性,基于第二训练图像集训练得到的目标图像生成网络可以生成包含目标属性的对象图像。由于中间图像生成网络和中间图像判别网络已经经过一定的模型训练,具有相对较好的模型参数,在基于第二训练图像集对中间图像生成网络和中间图像判别网络进行对抗学习时,对中间图像生成网络和中间图像判别网络进行微调即可快速得到目标图像生成网络和目标图像判别网络,从而快速得到特征图像生成模型。
可以理解,对图像生成网络和图像判别网络进行对抗学习的训练方式可以是先固定初始图像生成网络,基于初始图像生成网络输出的虚假图像和真实图像对初始图像判别网络进行训练,得到更新图像判别网络,使得更新图像判别网络具有一定的图像判别能力,再固定更新图像判别网络,基于更新图像判别网络针对初始图像生成网络输出图像的预测判别结果和真实判别结果对初始图像生成网络进行训练,得到更新图像生成网络,使得更新图像生成网络具有一定的图像生成能力,将更新图像生成网络和更新图像判别网络作为初始图像生成网络和初始图像判别网络来进行前述迭代训练,如此循环往复,直到满足收敛条件,得到中间图像生成网络和中间图像判别网络。可以理解,对图像生成网络和图像判别网络进行对抗学习的训练方式可以是各种常用的对抗学习训练方式,本申请对此不作限制。
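以下给出交替训练生成器与判别器的一个极简PyTorch草图,仅用于示意上述"先固定生成器训练判别器、再固定判别器训练生成器"的对抗学习过程;其中的网络结构、损失形式和超参数均为便于说明而假设的,并非本申请限定的训练方案。

```python
import torch
from torch import nn

latent_dim, img_dim = 64, 3 * 32 * 32  # 假设的随机变量维度与展平后的图像维度
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images: torch.Tensor):
    batch = real_images.size(0)

    # 1)固定生成器,训练判别器:尽力把生成的伪造图像从真实图像中分辨出来
    fake = G(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(D(real_images), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2)固定判别器,训练生成器:尽力使生成的图像骗过判别器
    fake = G(torch.randn(batch, latent_dim))
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# 先用第一训练图像集训练得到中间网络,再在其参数基础上用第二训练图像集继续同样的循环(即微调),得到目标网络
```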
上述实施例中,基于第一训练图像集,对初始图像生成网络和初始图像判别网络进行对抗学习,得到中间图像生成网络和中间图像判别网络;基于中间图像生成网络得到初始图像生成模型;基于第二训练图像集,对中间图像生成网络和中间图像判别网络进行对抗学习,得到目标图像生成网络和目标图像判别网络;基于目标图像生成网络得到特征图像生成模型。这样,通过对抗学习可以快速训练得到可生成以假乱真的图像的图像生成网络,从而快速得到初始图像生成模型,基于特定的训练图像集通过进一步的对抗学习可以快速训练得到可生成包含特定属性的图像的图像生成网络,从而快速得到特征图像生成模型。
在一个实施例中,图像编辑方法还包括:获取第一候选图像集;第一候选图像集中的各个第一候选图像为目标类别对象对应的对象图像;基于目标类别对象的参考对象部位在第一候选图像中的位置,对各个第一候选图像进行图像对齐;基于图像对齐后的各个第一候选图像,得到第一训练图像集。
图像编辑方法还包括:获取第二候选图像集;第二候选图像集中的各个第二候选图像为目标类别对象对应的、且包含目标属性的对象图像;基于目标类别对象的参考对象部位在第二候选图像中的位置,对各个第二候选图像进行图像对齐;基于图像对齐后的各个第二候选图像,得到第二训练图像集。
其中,第一候选图像集用于生成第一训练图像集,第一候选图像集包括多个第一候选图像,第一候选图像为目标类别对象对应的对象图像。第二候选图像集用于生成第二训练图像集,第二候选图像集包括多个第二候选图像,第二候选图像为目标类别对象对应的、且包含目标属性的对象图像。
图像对齐用于将对象图像中对象主体部分进行位置对齐。在图像对齐后的对象图像中,对象主体部分位于图像中的相同位置。经过图像对齐得到的训练集有助于提高模型的训练速度。例如,图像对齐用于将人脸图像中的人脸五官进行位置对齐;图像对齐用于将车辆图像中的车身进行位置对齐。
目标类别对象的参考对象部位是预先设置的对象部位,是在图像对齐时需要对齐位置的对象部分。
具体地,训练图像集中的对象图像可以是目标类别对象对应的原始图像,然而为了进一步降低训练难度、提高训练速度,训练图像集中的对象图像可以是经过图像对齐得到的图像。
针对第一训练图像集,可以获取由目标类别对象对应的原始对象图像组成的第一候选图像集,基于目标类别对象的参考对象部位在第一候选图像中的位置,对第一候选图像集中各个第一候选图像进行图像对齐,将经过图像对齐的各个第一候选图像组成第一训练图像集。
针对第二训练图像集,可以获取由目标类别对象对应的、且包含目标属性的原始对象图像组成的第二候选图像集,基于目标类别对象的参考对象部位在第二候选图像中的位置,对第二候选图像集中各个第二候选图像进行图像对齐,将经过图像对齐的各个第二候选图像组成第二训练图像集。
在图像对齐时,可以将参考对象部位对应的预设位置作为参考位置,以参考位置作为基准,基于参考位置对各个候选图像中参考对象部位的位置进行调整,将各个候选图像中参考对象部位的位置向参考位置对齐,将各个候选图像中参考对象部位的位置固定在参考位置上,从而完成图像对齐。预设位置是指预先设置的位置。也可以从各个候选图像中获取任意的候选图像作为参考图像,将其他候选图像向参考图像对齐,将参考图像中参考对象部位对应的位置作为参考位置,以参考位置为基准,将其他候选图像中参考对象部位的位置固定在参考位置上,从而完成图像对齐。
以人脸图像为例,可以将一批人脸图像组成第一候选图像集,利用人脸图像中的眼睛位置或五官位置对齐每一张人脸图像,利用对齐后的人脸图像组成第一训练图像集。可以将一批具有背头属性的人脸图像组成第二候选图像集,利用人脸图像中的眼睛位置或五官位置对齐每一张人脸图像,利用对齐后的人脸图像组成第二训练图像集。可以利用仿射变换算法变形人脸,使得人脸的五官固定到某个特定位置。
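利用眼睛等关键点位置做人脸对齐,可以参考下面这个基于仿射变换的示意性草图(使用OpenCV);其中的参考位置坐标、输出尺寸以及关键点坐标均为假设值,关键点可由任意人脸关键点检测手段获取。

```python
import cv2
import numpy as np

# 假设的参考位置:对齐后左眼、右眼、嘴部中心在256x256图像中的坐标
REF_POINTS = np.float32([[88, 100], [168, 100], [128, 184]])

def align_face(image: np.ndarray, left_eye, right_eye, mouth) -> np.ndarray:
    """基于三个关键点求仿射变换矩阵,把人脸的五官固定到参考位置,完成图像对齐。"""
    src = np.float32([left_eye, right_eye, mouth])
    matrix = cv2.getAffineTransform(src, REF_POINTS)
    return cv2.warpAffine(image, matrix, (256, 256))

# 用法示意(关键点坐标为假设值):
# aligned = align_face(raw_image, left_eye=(92, 120), right_eye=(180, 118), mouth=(140, 208))
```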
上述实施例中,第一训练图像集和第二训练图像集是经过图像对齐得到的,能够有效降低模型的训练难度、提高训练效率。
在一个实施例中,基于各个属性遮罩图像得到目标联合遮罩图像,包括:
将第一初始对象图像对应的属性遮罩图像作为第一遮罩图像,将第一特征对象图像对应的属性遮罩图像作为第二遮罩图像;第一初始对象图像和第一特征对象图像为不同尺寸的图像,第一遮罩图像和第二遮罩图像为不同尺寸的图像;将第一遮罩图像和第二遮罩图像进行尺寸对齐,基于尺寸对齐后的第一遮罩图像和第二遮罩图像得到目标联合遮罩图像。
其中,第一初始对象图像和第一特征对象图像可以对应不同的尺寸。例如,将第一随机变量输入初始图像生成模型和特征图像生成模型,第一初始对象图像是初始图像生成模型中第六层输出的图像,第一特征对象图像是特征图像生成模型中第八层输出的图像。第一遮罩图像为第一初始对象图像对应的属性遮罩图像,第二遮罩图像为第一特征对象图像对应的属性遮罩图像。若第一初始对象图像和第一特征对象图像是不同尺寸的对象图像,那么第一遮罩图像和第二遮罩图像也是不同尺寸的属性遮罩图像。
尺寸对齐用于将不同图像的尺寸统一为相同尺寸,以便进行数据处理。例如,对两个图像进行尺寸对齐,可以是将其中一个图像进行放大或缩小使其尺寸与另一个图像的尺寸一致。可以理解,在尺寸对齐时,将图像进行放大或缩小,该图像的图像内容同步放大或缩小。例如,在尺寸对齐时,将一张人脸图像进行放大,则该人脸图像呈现的人脸也放大了。
具体地,若第一遮罩图像和第二遮罩图像是不同尺寸的属性遮罩图像,那么在生成目标联合遮罩图像时,可以将第一遮罩图像和第二遮罩图像先转换为相同尺寸的属性遮罩图像再进行融合,从而得到最终的目标联合遮罩图像。例如,可以对第一遮罩图像进行尺寸变换,得到与第二遮罩图像对应相同尺寸的变换遮罩图像,融合变换遮罩图像和第二遮罩图像得到目标联合遮罩图像。
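将不同尺寸的第一遮罩图像与第二遮罩图像先做尺寸对齐、再融合得到目标联合遮罩图像,可以参考下面的示意性PyTorch草图;其中插值方式、二值化阈值以及取并集的融合方式均为假设的可选项。

```python
import torch
import torch.nn.functional as F

def align_and_merge(mask_small: torch.Tensor, mask_large: torch.Tensor) -> torch.Tensor:
    """mask_small、mask_large为形状(1, 1, H, W)的二值属性遮罩,尺寸可以不同。
    先把较小的遮罩插值到较大遮罩的尺寸并重新二值化,再按像素取最大值(并集)得到联合遮罩。"""
    resized = F.interpolate(mask_small, size=mask_large.shape[-2:],
                            mode="bilinear", align_corners=False)
    resized = (resized > 0.5).float()
    return torch.maximum(resized, mask_large)

# 用法示意:第一遮罩图像为32x32,第二遮罩图像为64x64,目标联合遮罩图像尺寸与后者一致
m1 = (torch.rand(1, 1, 32, 32) > 0.5).float()
m2 = (torch.rand(1, 1, 64, 64) > 0.5).float()
target_joint_mask = align_and_merge(m1, m2)
```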
可以理解,若第一初始对象图像和第一特征对象图像为相同尺寸的对象图像,则可以直接融合第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像得到目标联合遮罩图像,无需进行尺寸变换。
上述实施例中,在生成目标联合遮罩图像时,若第一初始对象图像对应的第一遮罩图像和第一特征对象图像对应的第二遮罩图像为不同尺寸的图像,则先将第一遮罩图像和第二遮罩图像进行尺寸对齐,再融合尺寸对齐后的第一遮罩图像和第二遮罩图像得到目标联合遮罩图像,这样先进行尺寸对齐再进行融合,能够降低融合难度,快速得到准确的目标联合遮罩图像。
在一个实施例中,基于目标联合遮罩图像,融合第二初始对象图像和第二特征对象图像,得到参考对象图像,包括:
从第二初始对象图像中,获取与目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域;融合第二初始对象图像和第二特征对象图像,得到融合对象图像;从融合对象图像中,获取与目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域;基于第一图像区域和第二图像区域,得到参考对象图像。
具体地,在基于目标联合遮罩图像融合第二初始对象图像和第二特征对象图像时,可以将第二初始对象图像中与目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域,第一图像区域用于保留第二初始对象图像中与目标属性非相干区域的图像信息。将第二初始对象图像和第二特征对象图像进行融合,从融合得到的融合对象图像中获取与目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域,第二图像区域为对象图像中与目标属性相干区域,并且在一定程度上相当于具有目标属性。最终,结合第一图像区域和第二图像区域得到参考对象图像。从融合对象图像中确定的第二图像区域不仅具有目标属性,而且与第一图像区域结合后可以得到图像区域之间衔接更自然的参考对象图像。
上述实施例中,从第二初始对象图像中,获取与目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域,第一图像区域用于保障在目标属性编辑时保持第二初始对象图像中与目标属性非相干区域的图像信息不变。将第二初始对象图像和第二特征对象图像进行融合,得到融合对象图像,从融合对象图像中,获取与目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域,第二图像区域用于保障在目标属性编辑时使第二初始对象图像中与目标属性相干区域的图像信息具有目标属性。最终基于第一图像区域和第二图像区域得到的参考对象图像可以呈现精确的目标属性编辑效果。
在一个实施例中,从第二初始对象图像中,获取与目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域,包括:对目标联合遮罩图像进行尺寸变换,得到与第二初始对象图像对应相同尺寸的变换联合遮罩图像;对变换联合遮罩图像进行反遮罩处理,得到反联合遮罩图像;融合第二初始对象图像和反联合遮罩图像,得到第一图像区域。
从融合对象图像中,获取与目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域,包括:融合融合对象图像和变换联合遮罩图像,得到第二图像区域。
其中,反遮罩处理用于将遮罩图像中原有的遮挡区域转换为非遮挡区域,将遮罩图像中原有的非遮挡区域转换为遮挡区域。例如,第一遮罩图像中遮挡部分用像素值0表示,非遮挡部分用像素值1表示,对第一遮罩图像进行反遮罩处理后,第一遮罩图像中原来的遮挡部分用像素值1表示,原来的非遮挡部分用像素值0表示。
具体地,为了便于后续数据处理,可以先对目标联合遮罩图像进行尺寸变换,得到与第二初始对象图像对应相同尺寸的变换联合遮罩图像。在从第二初始对象图像中获取第一图像区域时,先对变换联合遮罩图像进行反遮罩处理得到反联合遮罩图像,然后融合第二初始对象图像和反联合遮罩图像得到第一图像区域。例如,将第二初始对象图像和反联合遮罩图像中位置一致的像素点的像素值相乘,得到第一图像区域。可以理解,此时的第一图像区域相当于是和第二初始对象图像尺寸一致的图像,第一图像区域中与目标属性非相干区域为非遮挡部分,与目标属性相干区域为遮挡部分。例如,以背头属性为例,第一图像区域相当于是将第二初始对象图像中头发部分遮挡,非头发部分显示的图像。
在从融合对象图像中获取第二图像区域时,将融合对象图像和变换联合遮罩图像进行融合就可以得到第二图像区域。可以理解,此时的第二图像区域相当于是和第二初始对象图像尺寸一致的图像,第二图像区域中与目标属性相干区域为非遮挡部分,与目标属性非相干区域为遮挡部分。例如,以背头属性为例,第二图像区域相当于是将融合对象图像中头发部分显示,非头发部分遮挡的图像。
在基于第一图像区域和第二图像区域得到参考对象图像时,将第一图像区域和第二图像区域进行相加融合即可得到参考对象图像。
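把上述尺寸变换、反遮罩处理以及两个图像区域的相加融合串起来,可以写成如下示意性PyTorch草图;其中融合对象图像采用加权求和的方式得到,融合系数alpha为假设的超参数(与后文实施例中的融合系数a含义一致)。

```python
import torch
import torch.nn.functional as F

def fuse_with_mask(init_img: torch.Tensor, feat_img: torch.Tensor,
                   joint_mask: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """init_img、feat_img: (1, C, H, W)的第二初始对象图像与第二特征对象图像;joint_mask: (1, 1, h, w)的目标联合遮罩图像。"""
    # 尺寸变换:把目标联合遮罩图像调整为与第二初始对象图像相同的尺寸
    mask = F.interpolate(joint_mask, size=init_img.shape[-2:], mode="bilinear", align_corners=False)
    inv_mask = 1.0 - mask                                # 反遮罩处理,得到反联合遮罩图像
    region_1 = init_img * inv_mask                       # 第一图像区域:非相干区域保留初始图像信息
    fused = alpha * init_img + (1.0 - alpha) * feat_img  # 融合对象图像(加权求和为假设的融合方式)
    region_2 = fused * mask                              # 第二图像区域:相干区域带有目标属性
    return region_1 + region_2                           # 相加融合得到参考对象图像

# 用法示意
reference = fuse_with_mask(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128),
                           (torch.rand(1, 1, 64, 64) > 0.5).float())
```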
上述实施例中,对目标联合遮罩图像进行尺寸变换,得到与第二初始对象图像对应相同尺寸的变换联合遮罩图像,对尺寸相同的图像进行数据处理可以降低数据处理难度、提高数据处理效率。对变换联合遮罩图像进行反遮罩处理,得到反联合遮罩图像,反遮罩处理可以明确在目标属性编辑时第二初始对象图像中需要保持不变的区域。融合第二初始对象图像和反联合遮罩图像可以快速得到第一图像区域。
在一个实施例中,如图3所示,图像编辑方法还包括:
步骤S302,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像;其中,目标对象图像用于表征对原始对象图像进行目标属性编辑得到的图像,原始对象图像是将第一随机变量输入初始图像生成模型,结尾网络层直接输出的图像。
其中,后向网络层是指连接在目标网络层之后的网络层。结尾网络层是指模型中最后的网络层。
原始对象图像是指将第一随机变量输入初始图像生成模型,初始图像生成模型的结尾网络层直接输出的图像。也就是,将第一随机变量输入初始图像生成模型后,不干涉模型,将初始图像生成模型的结尾网络层输出的图像作为原始对象图像。与原始对象图像不同,目标对象图像是将第一随机变量输入初始图像生成模型后,对模型进行干涉,获取初始图像生成模型中目标网络层输出的第二初始对象图像进行处理得到参考对象图像,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层继续进行数据处理,最终将初始图像生成模型的结尾网络层输出的图像作为目标对象图像。目标对象图像用于表征对原始对象图像进行目标属性编辑得到的图像。
具体地,为了提高目标属性编辑的质量,计算机设备得到参考对象图像后,可以将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,在模型中继续进行前向计算,最终将初始图像生成模型的结尾网络层输出的图像作为目标对象图像。可以理解,在模型中通过后向网络层的数据处理,可以增加图像中的细节信息以及平滑图像中不和谐的地方,最终结尾网络层可以输出质量更高的图像。
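"用参考对象图像替换第二初始对象图像、由后向网络层继续前向计算"这一流程,可用下面的示意性PyTorch草图说明;其中把生成模型简化为按层排列的模块列表,层结构与下标均为假设。

```python
import torch
from torch import nn

def forward_with_substitution(layers: nn.ModuleList, z: torch.Tensor,
                              target_idx: int, reference: torch.Tensor) -> torch.Tensor:
    """layers: 初始图像生成模型中依次连接的网络层;target_idx: 目标网络层的下标;
    reference: 参考对象图像。返回结尾网络层输出的目标对象图像。"""
    x = z
    for idx, layer in enumerate(layers):
        x = layer(x)
        if idx == target_idx:
            x = reference  # 用参考对象图像替换目标网络层输出的第二初始对象图像,后向网络层继续处理
    return x
```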
上述实施例中,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层进行前向计算,初始图像生成模型的结尾网络层可以输出图像质量更高的目标对象图像。目标对象图像可以表示对原始对象图像进行目标属性编辑得到的高质量图像。其中,原始对象图像是将第一随机变量输入初始图像生成模型,初始图像生成模型的结尾网络层直接输出的图像。
在一个实施例中,如图4所示,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像,包括:
步骤S402,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,获取后向网络层中第三网络层输出的图像作为第三初始对象图像。
步骤S404,获取特征图像生成模型中与第三网络层匹配的第四网络层输出的图像作为第三特征对象图像。
其中,第三网络层是初始图像生成模型中目标网络层的后向网络层中的任意一个网络层。与第三网络层匹配的第四网络层是特征图像生成模型中目标网络层的后向网络层中与第三网络层匹配的网络层。第三网络层和第四网络层可以是两个模型中层数相同的网络层。
例如,初始图像生成模型和特征图像生成模型中匹配的目标网络层为两个模型中的第十一层,第三网络层可以是初始图像生成模型中的第十三层,与第三网络层匹配的第四网络层可以是特征图像生成模型中的第十三层。
可以理解,第三网络层和第一网络层无绝对的前后关系,第三网络层可以是第一网络层的前向网络层,也可以是第一网络层的后向网络层。第四网络层和第二网络层无绝对的前后关系,可以是第二网络层的前向网络层,也可以是第二网络层的后向网络层。
步骤S406,基于当前联合遮罩图像,融合第三初始对象图像和第三特征对象图像,得到更新对象图像;当前联合遮罩图像为目标联合遮罩图像或更新联合遮罩图像,更新联合遮罩图像是基于初始图像生成模型中第五网络层输出的图像和特征图像生成模型中第六网络层输出的图像得到的。
步骤S408,将更新对象图像替换第三初始对象图像输入初始图像生成模型中第三网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像。
其中,更新联合遮罩图像是指与目标联合遮罩图像不同的联合遮罩图像,是新的联合遮罩图像。更新联合遮罩图像的生成方式与目标联合遮罩图像的生成方式类似。将第一随机变量输入初始图像生成模型和特征图像生成模型,获取初始图像生成模型中第五网络层输出的图像作为第三初始对象图像,获取特征图像生成模型中第六网络层输出的图像作为第三特征对象图像,基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第三初始对象图像和第三特征对象图像分别对应的属性遮罩图像,基于获取到的属性遮罩图像得到更新联合遮罩图像。
可以理解,第五网络层和第一网络层可以是相同的网络层,也可以是不同的网络层。第六网络层和第二网络层可以是相同的网络层,也可以是不同的网络层。第五网络层和第六网络层可以是相同的网络层,也可以不同的网络层。
具体地,在将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层之后、至初始图像生成模型输出目标对象图像之前的过程中,可以参考得到参考对象图像的方式,基于同一或不同联合遮罩图像再次进行融合操作,以提高初始图像生成模型最终输出的目标对象图像的质量,提高目标属性编辑的质量。
在经过第一次融合得到参考对象图像后,计算机设备将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,以继续进行模型的前向计算,可以获取后向网络层中第三网络层输出的图像作为第三初始对象图像。相应的,从特征图像生成模型中,获取与第三网络层匹配的网络层输出的图像作为第三特征对象图像。计算机设备可以基于同一联合遮罩图像或新的联合遮罩图像将第三初始对象图像和第三特征对象图像进行融合得到更新对象图像,将更新对象图像替换第三初始对象图像输入初始图像生成模型中第三网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像。
在每次融合时,可以基于同一联合遮罩图像,例如,都基于目标联合遮罩图像进行融合,也可以基于不同的联合遮罩图像,例如,第一次融合基于目标联合遮罩图像,第二次融合基于更新联合遮罩图像。
可以理解,在初始图像生成模型输出目标对象图像之前,可以进行至少一次的融合操作,目标对象图像的质量随着融合次数的增加而提升。
在一个实施例中,若目标属性对应的属性编辑复杂度大于或等于预设复杂度,则进行第一预设数目的融合操作,若目标属性对应的属性编辑复杂度小于预设复杂度,则进行第二预设数目的融合操作,第一预设数目大于第二预设数目。这样,在目标属性比较难编辑时,进行较多次的融合操作以保障目标属性编辑质量,在目标属性比较容易编辑时,进行较少次的融合操作以保障目标属性编辑效率。属性编辑复杂度用于表征属性编辑的复杂程度、精细程度。可以是根据目标属性包含的属性数量、目标属性对应的图像区域尺寸等数据中的至少一者确定的,例如,同时编辑三个属性需要比只编辑一个属性进行更多次融合操作;由于头发区域所占面积比络腮胡大,编辑头发属性需要比编辑络腮胡进行更多次融合操作。
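在后向网络层中进行多次融合时,可以把上述替换逻辑扩展为对初始模型与特征模型同步前向、在若干指定层按遮罩融合的循环;下面的草图仅作示意,其中需要融合的层下标、对应遮罩以及按复杂度选择融合次数的策略均为假设。

```python
import torch
from torch import nn

def edit_with_repeated_fusion(init_layers: nn.ModuleList, feat_layers: nn.ModuleList,
                              z: torch.Tensor, fusion_masks: dict, alpha: float = 0.2) -> torch.Tensor:
    """fusion_masks: {层下标: 已调整到该层输出尺寸的联合遮罩图像}。
    两个模型同步前向,在指定层按遮罩融合,并把融合结果继续送入初始模型的后向网络层。"""
    x_init, x_feat = z, z
    for idx, (layer_i, layer_f) in enumerate(zip(init_layers, feat_layers)):
        x_init, x_feat = layer_i(x_init), layer_f(x_feat)
        if idx in fusion_masks:
            mask = fusion_masks[idx]
            x_init = (1 - mask) * x_init + mask * (alpha * x_init + (1 - alpha) * x_feat)
    return x_init  # 初始模型结尾网络层的输出即目标对象图像

# 属性编辑复杂度较高时可在更多层进行融合(假设的选择策略)
# fusion_masks = {11: mask64, 12: mask128}   # 例如同时编辑多个属性
# fusion_masks = {11: mask64}                # 例如仅编辑单个简单属性
```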
上述实施例中,将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层进行前向计算的过程中,可以再次进行基于联合遮罩图像将初始对象图像和特征对象图像进行融合的操作,以进一步提高初始图像生成模型的结尾网络层最终输出的目标对象图像的图像质量,提高目标属性编辑的精细度。
在一个实施例中,如图5所示,图像编辑方法还包括:
步骤S502,将原始对象图像和目标对象图像作为训练图像对。
步骤S504,基于训练图像对,对初始图像属性编辑模型进行模型训练,得到目标图像属性编辑模型;目标图像属性编辑模型用于对模型的输入图像进行目标属性编辑。
其中,图像属性编辑模型的输入数据为图像,输出数据为经过属性编辑的图像。例如,图像属性编辑模型为背头发型编辑模型,背头发型编辑模型用于将输入图像中的发型编辑为背头发型。初始图像属性编辑模型为待训练的图像属性编辑模型,目标图像属性编辑模型为训练完成的图像属性编辑模型。
具体地,通过本申请的图像编辑方法可以生成大量成对的对象图像,成对的对象图像用于作为图像属性编辑模型的训练数据。计算机设备可以将原始对象图像和目标对象图像作为训练图像对,训练图像对用于作为图像属性编辑模型的训练数据,基于训练图像对对初始图像属性编辑模型进行模型训练,可以训练得到用于对模型的输入图像进行目标属性编辑的目标图像属性编辑模型。
在模型训练时,训练图像对中的原始对象图像用于作为初始图像属性编辑模型的输入数据,训练图像对中的目标对象图像用于作为初始图像属性编辑模型的期望输出数据,通过有监督训练,使得将原始对象图像输入模型,模型可以输出与目标对象图像非常接近的图像,也就是使得模型具备对输入图像进行目标属性编辑的能力,从而得到目标图像属性编辑模型。
可以理解,通过不同的随机变量,可以得到不同的训练图像对,一个随机变量对应一 个训练图像对,将各个训练图像对组成训练图像对集合,基于训练图像对集合对初始图像属性编辑模型进行模型训练得到目标图像属性编辑模型。
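得到由原始对象图像与目标对象图像组成的训练图像对集合后,可以按下面的示意性PyTorch草图对图像属性编辑模型做有监督训练;其中的网络结构与L1损失仅为假设的一种选择,实际也可采用其他结构与损失形式。

```python
import torch
from torch import nn

editor = nn.Sequential(  # 假设的极简图像属性编辑模型:输入原始对象图像,输出编辑后的图像
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
optimizer = torch.optim.Adam(editor.parameters(), lr=1e-4)
l1 = nn.L1Loss()

def train_on_pair(original: torch.Tensor, target: torch.Tensor) -> float:
    """original: 原始对象图像(模型输入);target: 目标对象图像(期望输出)。"""
    loss = l1(editor(original), target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```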
在一个实施例中,图像属性编辑模型也可以是生成对抗网络。
上述实施例中,通过本申请的图像编辑方法可以快速生成大量成对的对象图像,成对的对象图像用于作为图像属性编辑模型的训练数据。传统技术中,大量成对的对象图像的获取成本较高,然而通过本申请的图像编辑方法得到的原始对象图像和目标对象图像能够作为训练图像对来训练图像属性编辑模型,通过本申请的图像编辑方法可以快速地、准确地得到大量成对的对象图像。
在一个实施例中,初始图像生成模型和特征图像生成模型中依次连接的网络层用于输出尺寸依次递增的图像,在初始图像生成模型和特征图像生成模型中,排序一致的网络层输出的图像对应相同的尺寸。
其中,初始图像生成模型包括依次连接的网络层,依次连接的网络层用于输出尺寸依次递增的图像。在初始图像生成模型中,当前网络层的输出图像用于作为下一网络层的输入数据,下一网络层可以对当前网络层的输出图像进行进一步的数据处理来提高图像质量,输出尺寸更大的图像。可以理解,初始图像生成模型的结尾网络层输出的图像的尺寸最大、分辨率最大、图像细节最丰富和和谐、图像质量最高。同理,特征图像生成模型也包括依次连接的网络层,依次连接的网络层用于输出尺寸依次递增的图像。初始图像生成模型和特征图像生成模型包括相同数量的网络层。在初始图像生成模型和特征图像生成模型中,排序一致的网络层输出的图像对应相同的尺寸。
上述实施例中,初始图像生成模型和特征图像生成模型中依次连接的网络层用于输出尺寸依次递增的图像,初始图像生成模型和特征图像生成模型最终可以输出高分辨率的高质量图像。在初始图像生成模型和特征图像生成模型中,排序一致的网络层输出的图像对应相同的尺寸,在基于联合遮罩图像对初始图像生成模型和特征图像生成模型中排序一致的网络层输出的图像进行融合时,可以实现准确的目标属性编辑,保障编辑效果。
在一个实施例中,目标类别对象为人脸,初始图像生成模型为初始人脸图像生成模型,特征图像生成模型为特征人脸图像生成模型,目标属性为目标局部人脸属性。
其中,若目标类别对象为人脸,则第一训练图像集中的图像为人脸图像,第二训练图像集中的图像为具备目标属性的人脸图像。基于第一训练图像集训练得到的初始图像生成模型为初始人脸图像生成模型,初始人脸图像生成模型用于生成人脸图像。将随机变量输入初始人脸图像生成模型,初始人脸图像生成模型就可以输出能够以假乱真的人脸图像。基于第二训练图像集对初始人脸图像生成模型进行训练得到的特征图像生成模型为特征人脸图像生成模型,特征人脸图像生成模型用于生成具有目标属性的人脸图像。将随机变量输入特征人脸图像生成模型,特征人脸图像生成模型就可以输出具有目标属性的人脸图像。
若目标类别对象为人脸,则目标属性为目标局部人脸属性。目标局部人脸属性是从大量的局部人脸属性中确定的局部人脸属性。局部人脸属性用于描述人脸的局部信息,例如,局部人脸属性可以是各种头发属性,局部人脸属性可以是各种人脸表情,局部人脸属性可以是各种胡子属性。目标局部人脸属性可以是局部人脸属性中的一个或两个或多个。
具体地,本申请的图像编辑方法具体可以应用在人脸图像中。获取第一随机变量,将第一随机变量分别输入初始人脸图像生成模型和特征人脸图像生成模型,获取初始人脸图像生成模型中第一网络层输出的图像,得到第一初始人脸图像,获取特征人脸图像生成模型中第二网络层输出的图像,得到第一特征人脸图像,基于目标属性所属的属性类型在人脸图像中对应的图像区域,获取第一初始人脸图像和第一特征人脸图像分别对应的属性遮罩图像,基于各个属性遮罩图像得到目标联合遮罩图像,获取初始人脸图像生成模型和特征人脸图像生成模型中匹配的目标网络层输出的图像,得到第二初始人脸图像和第二特征 人脸图像,基于目标联合遮罩图像将第二初始人脸图像和第二特征人脸图像进行融合,得到参考人脸图像,参考人脸图像用于表征对第二初始人脸图像进行目标属性编辑得到的图像。
进一步的,将参考人脸图像替换第二初始人脸图像输入初始人脸图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标人脸图像。原始人脸图像是将第一随机变量输入初始人脸图像生成模型,模型的结尾网络层直接输出的图像。将原始人脸图像和目标人脸图像作为训练人脸图像对,基于训练人脸图像对,对初始人脸图像属性编辑模型进行模型训练,得到目标人脸图像属性编辑模型,目标人脸图像属性编辑模型用于对模型的输入人脸图像进行目标属性编辑。
上述实施例中,目标类别对象为人脸,初始图像生成模型为初始人脸图像生成模型,特征图像生成模型为特征人脸图像生成模型,目标属性为目标局部人脸属性,本申请的图像编辑方法可以实现针对初始图像生成模型输出的初始人脸图像的目标属性编辑,提高针对人脸图像的目标属性编辑效率。
在一个具体的实施例中,本申请的图像编辑方法可以应用于人脸图片局部属性编辑。人脸图片局部属性具体是指针对性地改变一张人脸图片的某个局部属性,同时维持其它属性不发生任何变化。例如,改变人物的发型为背头,但五官不发生任何改变。本申请的图像编辑方法是一种简单高效的方法,可以完成对人脸局部属性的非二分的精确编辑。主要收集一批具有相同目标属性的人脸图片数据,即可实现对该目标属性的人脸语义区域进行对应的属性编辑。
参考图6,本申请的图像编辑方法包括以下步骤:
1、训练得到原始网络
收集一批不需要任何标注的高清人脸图片作为人脸训练数据,然后利用眼睛的位置对齐每一张人脸图片中人脸的位置,利用对齐好的人脸图片训练一个生成对抗网络,该网络记为原始网络(即初始图像生成模型)。
2、对原始网络进行微调得到特征网络
收集一批少量除拥有统一特征(即目标属性)外没有其他任何标注的高清人脸图片作为人脸训练数据,例如都是背头,然后利用眼睛的位置对齐每一张人脸图片中人脸的位置,利用对齐好的人脸图片对原始网络微调训练,获得一个新的生成对抗网络,记为特征网络(即特征图像生成模型)。
在一个实施例中,生成对抗网络包括噪声转换网络和图片合成网络,噪声转换网络的输入数据为噪声,输出数据为中间隐编码,图片合成网络的输入数据为中间隐编码,输出数据为图片。从高斯噪声中采样得到的噪声向量z首先被噪声转换网络转换为中间隐编码w,继而中间隐编码w被图片合成网络中的每层中的仿射变换层转换为每一层使用的调制参数,网络最终可以输出一张高清的人脸图片。可以认为不同的中间隐编码w对应了不同的高清人脸图片。在一个实施例中,生成对抗网络可以是StyleGAN2。在训练得到原始网络的过程中,噪声转换网络需要调整,在微调原始网络得到特征网络的过程中,噪声转换网络可以不需要调整。
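噪声转换网络与图片合成网络的配合方式,以及"微调时可以不调整噪声转换网络"的做法,可以用如下示意性PyTorch草图说明;其中的网络结构只是极简示意,并非StyleGAN2的真实实现。

```python
import torch
from torch import nn

mapping = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))  # 噪声转换网络(示意)
synthesis = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())             # 图片合成网络(示意)

z = torch.randn(1, 512)                    # 从高斯噪声中采样得到噪声向量z
w = mapping(z)                             # 噪声向量z被转换为中间隐编码w
image = synthesis(w).view(1, 3, 64, 64)    # 由中间隐编码w合成人脸图片

# 微调原始网络得到特征网络时,可以冻结噪声转换网络,仅更新图片合成网络的参数
for p in mapping.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(synthesis.parameters(), lr=1e-4)
```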
3、获取联合遮罩
对原始网络和特征网络进行同步推理获得人脸图片,使用遮罩模型或者其他自定义的方式获取对应特定特征区域(即与目标属性相干区域)的遮罩,进而生成联合遮罩图像。
例如,可以对原始网络和特征网络输入同一个噪声向量z(输入噪声转换网络)或输入同一中间隐编码w(输入图片合成网络),即可分别获取初始人像图片和特征人像图片,例如,将原始网络和特征网络的结尾网络层输出的图像作为初始人像图片和特征人像图片。由于特征网络是由原始网络微调得来,这两张图片的人脸朝向等信息大致一致。使用遮罩模型或者其他自定义的手段分别获取初始人像图片和特征人像图片的目标特征区域的遮 罩,例如若目标属性为背头,则遮罩具体可以是头发遮罩,最终基于这两个遮罩得到联合遮罩,记为mask。
4、基于联合遮罩完成属性编辑
对原始网络和特征网络进行同步推理,推理过程中将特征网络目标层联合遮罩对应区域的特征融合进原始网络中,然后原始网络继续进行推理,最终原始网络推理完成,获得最终人像图片。
例如,可以对原始网络和特征网络输入同一个噪声向量z(输入噪声转换网络)或输入同一中间隐编码w(输入图片合成网络),在这过程中,对于特定层的联合遮罩指定区域的特征按照系数进行融合。例如,生成对抗网络为金字塔型结构,参考图7,网络从较小分辨率一层一层地慢慢生成更大尺寸的图片。生成对抗网络第11层的输出特征尺寸为64x64。原始网络第11层的输出记为F_ori,特征网络第11层的输出记为F_feat,将mask的分辨率使用双线性插值等尺寸调整算法调整为64x64,记为mask64。设定一个融合系数a,融合之后的特征图F_fuse计算公式如下:
F_fuse = (1 - mask64) ⊙ F_ori + mask64 ⊙ (a·F_ori + (1-a)·F_feat)
其中⊙表示按像素相乘,即仅在联合遮罩指定的区域内按系数融合两个网络的特征,其余区域保留原始网络的特征。
其中,a的取值范围在0~1之间。在一个实施例中,1-a大于a,这样可以将目标属性的信息更多地融合到图像中,以进一步保障融合后的图像具有与目标属性更接近或一致的属性。在一个实施例中,a的取值范围在0~0.3之间。
将融合后的特征F_fuse代替原始网络第11层的输出,继续进行原始网络的进一步前向计算。最终可获得完成属性编辑的高清图片,将原始网络的结尾网络层输出的图像作为最终完成属性编辑的高清图片,即最终人像图片。例如,原始网络的结尾网络层可输出分辨率1024x1024的图像,本申请最终实现可控编辑生成1024x1024像素级别的高清人脸图片。
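第11层的特征融合这一步可以写成如下示意性PyTorch草图,与上式相对应;其中特征张量的形状、插值方式以及a的具体取值均为假设。

```python
import torch
import torch.nn.functional as F

def fuse_layer11(f_ori: torch.Tensor, f_feat: torch.Tensor,
                 mask: torch.Tensor, a: float = 0.2) -> torch.Tensor:
    """f_ori、f_feat: 原始网络与特征网络第11层的输出,形状(1, C, 64, 64);mask: 联合遮罩(任意分辨率)。"""
    mask64 = F.interpolate(mask, size=(64, 64), mode="bilinear", align_corners=False)  # 调整为64x64
    return (1 - mask64) * f_ori + mask64 * (a * f_ori + (1 - a) * f_feat)  # 融合后的特征图F_fuse

# a取0~0.3之间的值时,1-a大于a,可将特征网络中目标属性区域的信息更多地融合进来
```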
可以理解,对于特定层的遮罩指定区域的特征按照系数进行融合的操作可以执行至少一次。例如,基于同一或不同的联合遮罩在第11层、第12层均进行融合。
参考图8,图8为初始人像图片、特征人像图片和最终人像图片的示意图。图8中同一列中的人像图片是基于不同的中间隐编码得到的,图8中同一行中的人像图片是基于同一中间隐编码得到的。
可以理解,本方法可以实现多种局部区域的多种局部属性编辑。对于不同人脸属性,只需要重复第二、三、四步,即可完成不同特定的人脸属性编辑。参考图9,图9包括背头编辑、络腮胡编辑和短发编辑的示意图。
目前收集成对的真实人脸图片比较困难,但是通过本申请的图像编辑方法可以生成大量成对的人脸属性编辑训练样本,用于支持其它生成对抗网络训练的数据。例如,初始人像图片和最终人像图片组成一对人脸属性编辑训练样本。用大量成对的人脸属性编辑训练样本可以训练人脸属性编辑网络,人脸属性编辑网络可用于前端的各种人脸属性编辑任务,例如,应用于应用程序中的表情编辑功能,发型编辑功能等。
当然,本申请的图像编辑方法还可以应用于其他图片局部属性编辑中,例如,应用于其他动物图像局部属性编辑中、应用于物体图像局部属性编辑中等。
应该理解的是,虽然如上所述的各实施例所涉及的流程图中的各个步骤按照箭头的指 示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上所述的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
基于同样的发明构思,本申请实施例还提供了一种用于实现上述所涉及的图像编辑方法的图像编辑装置。该装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似,故下面所提供的一个或多个图像编辑装置实施例中的具体限定可以参见上文中对于图像编辑方法的限定,在此不再赘述。
在一个实施例中,如图10所示,提供了一种图像编辑装置,包括:模型获取模块1002、随机变量获取模块1004、第一图像获取模块1006、遮罩图像获取模块1008、第二图像获取模块1010和图像融合模块1012,其中:
模型获取模块1002,用于获取初始图像生成模型和特征图像生成模型;所述初始图像生成模型是基于第一训练图像集训练得到的,所述特征图像生成模型是基于第二训练图像集对所述初始图像生成模型进行训练得到的,所述第一训练图像集和所述第二训练图像集中的对象图像属于目标类别对象,所述第二训练图像集中的对象图像包含目标属性,所述特征图像生成模型输出的图像具有目标属性;
随机变量获取模块1004,用于获取第一随机变量,将第一随机变量分别输入初始图像生成模型和特征图像生成模型。
第一图像获取模块1006,用于获取初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像。
遮罩图像获取模块1008,用于基于目标属性所属的属性类型在对象图像中对应的图像区域,获取第一初始对象图像和第一特征对象图像分别对应的属性遮罩图像,基于各个属性遮罩图像得到目标联合遮罩图像。
第二图像获取模块1010,用于获取初始图像生成模型和特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像。
图像融合模块1012,用于基于目标联合遮罩图像,融合第二初始对象图像和第二特征对象图像,得到参考对象图像;参考对象图像用于表征对第二初始对象图像进行目标属性编辑得到的图像。
在一个实施例中,模型获取模块1002还用于:
基于第一训练图像集,对初始图像生成网络和初始图像判别网络进行对抗学习,得到中间图像生成网络和中间图像判别网络;基于中间图像生成网络得到初始图像生成模型;基于第二训练图像集,对中间图像生成网络和中间图像判别网络进行对抗学习,得到目标图像生成网络和目标图像判别网络;基于目标图像生成网络得到特征图像生成模型。
在一个实施例中,如图11所示,图像编辑装置还包括:
训练图像集获取模块1102,用于获取第一候选图像集;第一候选图像集中的各个第一候选图像为目标类别对象对应的对象图像;基于目标类别对象的参考对象部位在第一候选图像中的位置,对各个第一候选图像进行图像对齐;基于图像对齐后的各个第一候选图像,得到第一训练图像集;获取第二候选图像集;第二候选图像集中的各个第二候选图像为目标类别对象对应的、且包含目标属性的对象图像;基于目标类别对象的参考对象部位在第 二候选图像中的位置,对各个第二候选图像进行图像对齐;基于图像对齐后的各个第二候选图像,得到第二训练图像集。
在一个实施例中,遮罩图像获取模块1008还用于:
将第一初始对象图像对应的属性遮罩图像作为第一遮罩图像,将第一特征对象图像对应的属性遮罩图像作为第二遮罩图像;第一初始对象图像和第一特征对象图像为不同尺寸的图像,第一遮罩图像和第二遮罩图像为不同尺寸的图像;将第一遮罩图像和第二遮罩图像进行尺寸对齐,基于尺寸对齐后的第一遮罩图像和第二遮罩图像得到目标联合遮罩图像。
在一个实施例中,图像融合模块1012还用于:
从第二初始对象图像中,获取与目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域;融合第二初始对象图像和第二特征对象图像,得到融合对象图像;从融合对象图像中,获取与目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域;基于第一图像区域和第二图像区域,得到参考对象图像。
在一个实施例中,图像融合模块1012还用于:
对目标联合遮罩图像进行尺寸变换,得到与第二初始对象图像对应相同尺寸的变换联合遮罩图像;对变换联合遮罩图像进行反遮罩处理,得到反联合遮罩图像;融合第二初始对象图像和反联合遮罩图像,得到第一图像区域。
图像融合模块1012还用于:
融合融合对象图像和变换联合遮罩图像,得到第二图像区域。
在一个实施例中,如图11所示,图像编辑装置还包括:
目标对象图像确定模块1104,用于将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像;其中,目标对象图像用于表征对原始对象图像进行目标属性编辑得到的图像,原始对象图像是将第一随机变量输入初始图像生成模型,结尾网络层直接输出的图像。
在一个实施例中,目标对象图像确定模块1104还用于:
将参考对象图像替换第二初始对象图像输入初始图像生成模型中目标网络层的后向网络层,获取后向网络层中第三网络层输出的图像作为第三初始对象图像;获取特征图像生成模型中与第三网络层匹配的第四网络层输出的图像作为第三特征对象图像;基于当前联合遮罩图像,融合第三初始对象图像和第三特征对象图像,得到更新对象图像;当前联合遮罩图像为目标联合遮罩图像或更新联合遮罩图像,更新联合遮罩图像是基于初始图像生成模型中第五网络层输出的图像和特征图像生成模型中第六网络层输出的图像得到的;将更新对象图像替换第三初始对象图像输入初始图像生成模型中第三网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像。
在一个实施例中,如图11所示,图像编辑装置还包括:
模型训练模块1106,用于将原始对象图像和目标对象图像作为训练图像对;基于训练图像对,对初始图像属性编辑模型进行模型训练,得到目标图像属性编辑模型;目标图像属性编辑模型用于对模型的输入图像进行目标属性编辑。
在一个实施例中,初始图像生成模型和特征图像生成模型中依次连接的网络层用于输出尺寸依次递增的图像,在初始图像生成模型和特征图像生成模型中,排序一致的网络层输出的图像对应相同的尺寸。
在一个实施例中,目标类别对象为人脸,初始图像生成模型为初始人脸图像生成模型,特征图像生成模型为特征人脸图像生成模型,目标属性为目标局部人脸属性。
上述图像编辑装置,初始图像生成模型是基于目标类别对象对应的对象图像训练得到的,在接收输入数据后,初始图像生成模型可以输出目标类别对象对应的对象图像。特征图像生成模型是基于目标类别对象对应的、且包含目标属性的对象图像,对初始图像生成模型训练得到的,在接收输入数据后,特征图像生成模型可以输出目标类别对象对应的、 且包含目标属性的对象图像。将第一随机变量分别输入初始图像生成模型和特征图像生成模型,初始图像生成模型和特征图像生成模型分别输出的对象图像具有一定的相似性,并且特征图像生成模型输出的对象图像包含目标属性。属性遮罩图像可以反映目标属性所属的属性类型在对象图像中对应的图像区域,目标联合遮罩图像可以反映目标属性所属的属性类型在第一初始对象图像和第一特征对象图像中的联合图像区域。基于目标联合遮罩图像将第二初始对象图像和第二特征对象图像进行融合,可以将第二特征对象图像中目标属性对应的图像区域融合至第二初始对象图像中,从而融合得到的参考对象图像相当于在不改变第二初始对象图像其它属性的情况下使其具有目标属性,即参考对象图像相当于是对第二初始对象图像进行目标属性编辑得到的图像。无需人工精细编辑,基于初始图像生成模型、特征图像生成模型和目标联合遮罩图像可以快速实现对初始图像生成模型输出图像的目标属性编辑,提高了图像编辑效率,并且也可以保障图像编辑准确性。
上述图像编辑装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图12所示。该计算机设备包括处理器、存储器、输入/输出接口(Input/Output,简称I/O)和通信接口。其中,处理器、存储器和输入/输出接口通过系统总线连接,通信接口通过输入/输出接口连接到系统总线。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质和内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储初始图像生成模型、特征图像生成模型、参考对象图像、目标对象图像等数据。该计算机设备的输入/输出接口用于处理器与外部设备之间交换信息。该计算机设备的通信接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种图像编辑方法。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图13所示。该计算机设备包括处理器、存储器、输入/输出接口、通信接口、显示单元和输入装置。其中,处理器、存储器和输入/输出接口通过系统总线连接,通信接口、显示单元和输入装置通过输入/输出接口连接到系统总线。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的输入/输出接口用于处理器与外部设备之间交换信息。该计算机设备的通信接口用于与外部的终端进行有线或无线方式的通信,无线方式可通过WIFI、移动蜂窝网络、NFC(近场通信)或其他技术实现。该计算机可读指令被处理器执行时以实现一种图像编辑方法。该计算机设备的显示单元用于形成视觉可见的画面,可以是显示屏、投影装置或虚拟现实成像装置,显示屏可以是液晶显示屏或电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
本领域技术人员可以理解,图12、13中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,还提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机可读指令,该处理器执行计算机可读指令时实现上述各方法实施例中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机可读指令,该计算机可读指令被处理器执行时实现上述各方法实施例中的步骤。
在一个实施例中,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机可读指令,该计算机可读指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机可读指令,处理器执行该计算机可读指令,使得该计算机设备执行上述各方法实施例中的步骤。
需要说明的是,本申请所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存、光存储器、高密度嵌入式非易失性存储器、阻变存储器(ReRAM)、磁变存储器(Magnetoresistive Random Access Memory,MRAM)、铁电存储器(Ferroelectric Random Access Memory,FRAM)、相变存储器(Phase Change Memory,PCM)、石墨烯存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器等。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。本申请所提供的各实施例中所涉及的数据库可包括关系型数据库和非关系型数据库中至少一种。非关系型数据库可包括基于区块链的分布式数据库等,不限于此。本申请所提供的各实施例中所涉及的处理器可为通用处理器、中央处理器、图形处理器、数字信号处理器、可编程逻辑器、基于量子计算的数据处理逻辑器等,不限于此。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请的保护范围应以所附权利要求为准。

Claims (15)

  1. 一种图像编辑方法,其特征在于,由计算机设备执行,所述方法包括:
    获取初始图像生成模型和特征图像生成模型;所述初始图像生成模型是基于第一训练图像集训练得到的,所述特征图像生成模型是基于第二训练图像集对所述初始图像生成模型进行训练得到的,所述第一训练图像集和所述第二训练图像集中的对象图像属于目标类别对象,所述第二训练图像集中的对象图像包含目标属性,所述特征图像生成模型输出的图像具有所述目标属性;
    获取第一随机变量,将所述第一随机变量分别输入所述初始图像生成模型和所述特征图像生成模型;
    获取所述初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取所述特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像;
    基于所述目标属性所属的属性类型在对象图像中对应的图像区域,获取所述第一初始对象图像和所述第一特征对象图像分别对应的属性遮罩图像,基于各个所述属性遮罩图像得到目标联合遮罩图像;
    获取所述初始图像生成模型和所述特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像;及
    基于所述目标联合遮罩图像,融合所述第二初始对象图像和所述第二特征对象图像,得到参考对象图像;所述参考对象图像用于表征对所述第二初始对象图像进行目标属性编辑得到的图像。
  2. 根据权利要求1所述的方法,其特征在于,所述获取初始图像生成模型和特征图像生成模型,包括:
    基于所述第一训练图像集,对初始图像生成网络和初始图像判别网络进行对抗学习,得到中间图像生成网络和中间图像判别网络;
    基于所述中间图像生成网络得到初始图像生成模型;
    基于所述第二训练图像集,对中间图像生成网络和中间图像判别网络进行对抗学习,得到目标图像生成网络和目标图像判别网络;及
    基于所述目标图像生成网络得到特征图像生成模型。
  3. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取第一候选图像集;所述第一候选图像集中的各个第一候选图像为所述目标类别对象对应的对象图像;
    基于所述目标类别对象的参考对象部位在第一候选图像中的位置,对所述各个第一候选图像进行图像对齐;
    基于图像对齐后的各个第一候选图像,得到所述第一训练图像集;
    获取第二候选图像集;所述第二候选图像集中的各个第二候选图像为所述目标类别对象对应的、且包含所述目标属性的对象图像;
    基于所述目标类别对象的参考对象部位在第二候选图像中的位置,对所述各个第二候选图像进行图像对齐;及
    基于图像对齐后的各个第二候选图像,得到所述第二训练图像集。
  4. 根据权利要求1所述的方法,其特征在于,所述基于各个所述属性遮罩图像得到目标联合遮罩图像,包括:
    将所述第一初始对象图像对应的属性遮罩图像作为第一遮罩图像,将所述第一特征对象图像对应的属性遮罩图像作为第二遮罩图像;所述第一初始对象图像和所述第一特征对象图像为不同尺寸的图像,所述第一遮罩图像和所述第二遮罩图像为不同尺寸的图像;及
    将所述第一遮罩图像和所述第二遮罩图像进行尺寸对齐,基于尺寸对齐后的第一遮罩图像和第二遮罩图像得到所述目标联合遮罩图像。
  5. 根据权利要求1所述的方法,其特征在于,所述基于所述目标联合遮罩图像,融合所述第二初始对象图像和所述第二特征对象图像,得到参考对象图像,包括:
    从所述第二初始对象图像中,获取与所述目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域;
    融合所述第二初始对象图像和所述第二特征对象图像,得到融合对象图像;
    从所述融合对象图像中,获取与所述目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域;及
    基于所述第一图像区域和所述第二图像区域,得到所述参考对象图像。
  6. 根据权利要求5所述的方法,其特征在于,所述从所述第二初始对象图像中,获取与所述目标联合遮罩图像中遮挡区域匹配的图像区域作为第一图像区域,包括:
    对所述目标联合遮罩图像进行尺寸变换,得到与所述第二初始对象图像对应相同尺寸的变换联合遮罩图像;
    对所述变换联合遮罩图像进行反遮罩处理,得到反联合遮罩图像;
    融合所述第二初始对象图像和所述反联合遮罩图像,得到所述第一图像区域;
    所述从所述融合对象图像中,获取与所述目标联合遮罩图像中非遮挡区域匹配的图像区域作为第二图像区域,包括:
    融合所述融合对象图像和所述变换联合遮罩图像,得到所述第二图像区域。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    将所述参考对象图像替换所述第二初始对象图像输入所述初始图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像;
    其中,所述目标对象图像用于表征对原始对象图像进行目标属性编辑得到的图像,所述原始对象图像是将所述第一随机变量输入所述初始图像生成模型,所述结尾网络层直接输出的图像。
  8. 根据权利要求7所述的方法,其特征在于,所述将所述参考对象图像替换所述第二初始对象图像输入所述初始图像生成模型中目标网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像,包括:
    将所述参考对象图像替换所述第二初始对象图像输入所述初始图像生成模型中目标网络层的后向网络层,获取后向网络层中第三网络层输出的图像作为第三初始对象图像;
    获取所述特征图像生成模型中与所述第三网络层匹配的第四网络层输出的图像作为第三特征对象图像;
    基于当前联合遮罩图像,融合所述第三初始对象图像和所述第三特征对象图像,得到更新对象图像;所述当前联合遮罩图像为所述目标联合遮罩图像或更新联合遮罩图像,所述更新联合遮罩图像是基于所述初始图像生成模型中第五网络层输出的图像和所述特征图像生成模型中第六网络层输出的图像得到的;及
    将所述更新对象图像替换所述第三初始对象图像输入所述初始图像生成模型中第三网络层的后向网络层,获取结尾网络层输出的图像作为目标对象图像。
  9. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    将所述原始对象图像和所述目标对象图像作为训练图像对;及
    基于所述训练图像对,对初始图像属性编辑模型进行模型训练,得到目标图像属性编辑模型;所述目标图像属性编辑模型用于对模型的输入图像进行目标属性编辑。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述初始图像生成模型和所述特征图像生成模型中依次连接的网络层用于输出尺寸依次递增的图像,在所述初始图像生成模型和所述特征图像生成模型中,排序一致的网络层输出的图像对应相同的尺寸。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述目标类别对象为人脸,所述初始图像生成模型为初始人脸图像生成模型,所述特征图像生成模型为特征人脸图像生成模型,所述目标属性为目标局部人脸属性。
  12. 一种图像编辑装置,其特征在于,所述装置包括:
    模型获取模块,用于获取初始图像生成模型和特征图像生成模型;所述初始图像生成模型是基于第一训练图像集训练得到的,所述特征图像生成模型是基于第二训练图像集对所述初始图像生成模型进行训练得到的,所述第一训练图像集和所述第二训练图像集中的对象图像属于目标类别对象,所述第二训练图像集中的对象图像包含目标属性,所述特征图像生成模型输出的图像具有所述目标属性;
    随机变量获取模块,用于获取第一随机变量,将所述第一随机变量分别输入所述初始图像生成模型和所述特征图像生成模型;
    第一图像获取模块,用于获取所述初始图像生成模型中第一网络层输出的图像,得到第一初始对象图像,获取所述特征图像生成模型中第二网络层输出的图像,得到第一特征对象图像;
    遮罩图像获取模块,用于基于所述目标属性所属的属性类型在对象图像中对应的图像区域,获取所述第一初始对象图像和所述第一特征对象图像分别对应的属性遮罩图像,基于各个所述属性遮罩图像得到目标联合遮罩图像;
    第二图像获取模块,用于获取所述初始图像生成模型和所述特征图像生成模型中匹配的目标网络层输出的图像,得到第二初始对象图像和第二特征对象图像;及
    图像融合模块,用于基于所述目标联合遮罩图像,融合所述第二初始对象图像和所述第二特征对象图像,得到参考对象图像;所述参考对象图像用于表征对所述第二初始对象图像进行目标属性编辑得到的图像。
  13. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现权利要求1至11中任一项所述的方法的步骤。
  14. 一种计算机可读存储介质,其上存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现权利要求1至11中任一项所述的方法的步骤。
  15. 一种计算机程序产品,包括计算机可读指令,该计算机可读指令被处理器执行时实现权利要求1至11中任一项所述的方法的步骤。
PCT/CN2023/119716 2022-10-28 2023-09-19 图像编辑方法、装置、计算机设备和存储介质 WO2024087946A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211330943.1A CN115393183B (zh) 2022-10-28 2022-10-28 图像编辑方法、装置、计算机设备和存储介质
CN202211330943.1 2022-10-28

Publications (1)

Publication Number Publication Date
WO2024087946A1 true WO2024087946A1 (zh) 2024-05-02

Family

ID=84115282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/119716 WO2024087946A1 (zh) 2022-10-28 2023-09-19 图像编辑方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN115393183B (zh)
WO (1) WO2024087946A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393183B (zh) * 2022-10-28 2023-02-07 腾讯科技(深圳)有限公司 图像编辑方法、装置、计算机设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941986A (zh) * 2019-10-10 2020-03-31 平安科技(深圳)有限公司 活体检测模型的训练方法、装置、计算机设备和存储介质
CN112734873A (zh) * 2020-12-31 2021-04-30 北京深尚科技有限公司 对抗生成网络的图像属性编辑方法、装置、设备及介质
US20210383154A1 (en) * 2019-05-24 2021-12-09 Shenzhen Sensetime Technology Co., Ltd. Image processing method and apparatus, electronic device and storage medium
CN115223013A (zh) * 2022-07-04 2022-10-21 深圳万兴软件有限公司 基于小数据生成网络的模型训练方法、装置、设备及介质
CN115393183A (zh) * 2022-10-28 2022-11-25 腾讯科技(深圳)有限公司 图像编辑方法、装置、计算机设备和存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197525B (zh) * 2017-11-20 2020-08-11 中国科学院自动化研究所 人脸图像生成方法及装置
CN111553267B (zh) * 2020-04-27 2023-12-01 腾讯科技(深圳)有限公司 图像处理方法、图像处理模型训练方法及设备
CN112001983B (zh) * 2020-10-30 2021-02-09 深圳佑驾创新科技有限公司 生成遮挡图像的方法、装置、计算机设备和存储介质
CN112560758A (zh) * 2020-12-24 2021-03-26 百果园技术(新加坡)有限公司 一种人脸属性编辑方法、系统、电子设备及存储介质
CN113570684A (zh) * 2021-01-22 2021-10-29 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备和存储介质
CN113822953A (zh) * 2021-06-24 2021-12-21 华南理工大学 图像生成器的处理方法、图像生成方法及装置
CN113327221A (zh) * 2021-06-30 2021-08-31 北京工业大学 融合roi区域的图像合成方法、装置、电子设备及介质

Also Published As

Publication number Publication date
CN115393183B (zh) 2023-02-07
CN115393183A (zh) 2022-11-25
