CN113902957A - Image generation method, model training method, device, electronic device and medium


Info

Publication number: CN113902957A
Application number: CN202111184646.6A
Authority: CN (China)
Prior art keywords: age, image, target, sample image, real
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113902957B
Inventors: 尚太章, 何声一, 刘家铭, 洪智滨
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd; priority to CN202111184646.6A
Publication of application: CN113902957A
Publication of grant: CN113902957B

Classifications

    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods


Abstract

The present disclosure provides an image generation method, a model training method, an image generation apparatus, an electronic device, and a medium, relating to the technical field of artificial intelligence, in particular to computer vision and deep learning technology, applicable to scenes such as face image processing and face recognition. The specific implementation scheme is as follows: determining a target age interval corresponding to a target age, wherein the target age interval comprises a start age and an end age; determining a target age representation corresponding to the target age according to a start age representation corresponding to the start age and an end age representation corresponding to the end age; and generating a target image of an object based on the image of the object in an original image according to the target age representation and original image features corresponding to the original image, wherein the target image is an image in which the age of the object is the target age.

Description

Image generation method, model training method, device, electronic device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to computer vision and deep learning technology, and can be applied to scenes such as face image processing and face recognition. More particularly, it relates to an image generation method, a model training method, an apparatus, an electronic device and a medium.
Background
In the era of artificial intelligence, computer vision technology has attracted increasing attention. Object attribute editing is an important research direction in computer vision and has drawn growing interest.
Age transformation is one form of object attribute editing. It may refer to rendering an image that includes an object under a given age condition while leaving the identity characteristics of the object unchanged.
Disclosure of Invention
The disclosure provides an image generation method, a model training method, an image generation device, an electronic device and a medium.
According to an aspect of the present disclosure, there is provided an image generation method including: determining a target age interval corresponding to a target age, wherein the target age interval comprises a start age and an end age; determining a target age representation corresponding to the target age based on a start age representation corresponding to the start age and an end age representation corresponding to the end age; and generating a target image of the object based on an image of the object in the original image according to the target age representation and original image features corresponding to the original image, wherein the target image is an image in which the age of the object is the target age.
According to another aspect of the present disclosure, there is provided a training method of an age transformation model, including: training a first generator and a first discriminator by using the first real sample image set and the first simulation sample image set to obtain a trained first generator and a trained first discriminator; and determining the trained first generator as the age transformation model, wherein the age transformation model is used for generating the target image.
According to another aspect of the present disclosure, there is provided an image generating apparatus including: a first determining module, configured to determine a target age interval corresponding to a target age, where the target age interval includes a start age and an end age; a second determining module for determining a target age representation corresponding to the target age according to a start age representation corresponding to the start age and an end age representation corresponding to the end age; and a first generation module configured to generate a target image of the subject based on an image of the subject in the original image according to the target age characteristic and an original image characteristic corresponding to the original image, wherein the target image is an image in which an age of the subject is the target age.
According to another aspect of the present disclosure, there is provided an age transformation model training apparatus including: the training module is used for training a first generator and a first discriminator by utilizing the first real sample image set and the first simulation sample image set to obtain the trained first generator and the trained first discriminator; and a third determining module, configured to determine the trained first generator as the age transformation model, where the age transformation model is used to generate the target image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture that may be applied to an image generation method, an age transformation model training method, and an apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of an image generation method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of an image generation process according to an embodiment of the disclosure;
FIG. 4A schematically illustrates a flow chart of a method of training an age transformation model according to an embodiment of the present disclosure;
FIG. 4B schematically shows a flow chart of a method of training an age transformation model according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates an example schematic diagram of a training process of an age conversion model according to an embodiment of the disclosure;
FIG. 6A schematically illustrates an example schematic of an image generation process according to an embodiment of this disclosure;
FIG. 6B schematically illustrates an example schematic diagram of a process for processing a starting age using a multi-tier perceptron to arrive at a starting age characterization corresponding to the starting age, in accordance with an embodiment of the present disclosure;
FIG. 6C schematically illustrates an example schematic diagram of a process for processing an original image by an encoding module to obtain original image features corresponding to the original image, according to an embodiment of the disclosure;
FIG. 6D schematically illustrates an example of a process for processing original image features and a target age representation to obtain a target image of a subject using a decoding module according to an embodiment of this disclosure;
fig. 6E schematically illustrates an example schematic diagram of a continuous age transformation according to an embodiment of the disclosure;
fig. 7 schematically shows a block diagram of an image generation apparatus according to an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of a training apparatus of an age conversion model according to an embodiment of the present disclosure; and
fig. 9 schematically shows a block diagram of an electronic device adapted to implement an image generation method and a training method of an age conversion model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Object attribute editing may involve editing different attributes. The attributes may include at least one of facial skin features, hair features, neck skin features, and the like. For example, object attribute editing may include removing or adding glasses, removing or adding bangs, editing hair color, editing facial features, or editing makeup. If the edited attribute is a single attribute whose label is decoupled from the other attributes, targeted editing is easy to achieve. If multiple attributes are edited and those attributes may be coupled with one another, targeted editing is difficult.
Age transformation is an attribute edit involving multiple attributes that are coupled with one another. Age transformation can be implemented in the following ways.
One approach is an age transformation method based on a physical model. That is, by studying the rules of the physical change mechanisms of the subject's aging process, for example the rules governing changes in facial texture, shape and bone structure, these rules are applied to the original image to obtain a target image at the target age.
Another approach is an age transformation method based on deep learning. Specifically, based on a deep learning model, the mapping relationships between different age intervals are learned using sample image sets of different age intervals to obtain an age transformation model, and a target image at the target age is generated using the age transformation model.
Both approaches realize only discrete age transformation; they cannot realize continuous age transformation.
To this end, the embodiments of the present disclosure propose an image generation scheme capable of implementing continuous age transformation. That is, a target age interval corresponding to the target age is determined. The target age interval includes a start age and an end age. A target age characterization corresponding to the target age is determined based on the starting age characterization corresponding to the starting age and the ending age characterization corresponding to the ending age. And generating a target image with the age of the object as the target age based on the image of the object in the original image according to the target age characteristic and the original image characteristic corresponding to the original image.
Because the target age representation corresponding to the target age is determined from the start age representation corresponding to the start age and the end age representation corresponding to the end age, a target age representation can be obtained for any target age; that is, the target age can take continuous values. A target image at any such target age can therefore be generated based on the target age representation and the original image features, realizing continuous age transformation.
Fig. 1 schematically illustrates an exemplary system architecture that may be applied to an image generation method, an age transformation model training method, and an apparatus according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the image generation method and the training method and apparatus for the age transformation model may be applied may include a terminal device, but the terminal device may implement the image generation method and the training method and apparatus for the age transformation model provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be various types of servers providing various services, such as a background management server (for example only) providing support for content browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service extensibility of conventional physical hosts and VPS (Virtual Private Server) services. The server 105 may also be a server of a distributed system or a server incorporating a blockchain.
It should be noted that the image generation method and the training method of the age transformation model provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the image generation apparatus and the training apparatus of the age transformation model provided by the embodiment of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the image generation method and the training method of the age transformation model provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the image generation apparatus and the training apparatus of the age transformation model provided by the embodiments of the present disclosure may generally be provided in the server 105. The image generation method and the training method of the age transformation model may also be performed by a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image generation apparatus and the training apparatus of the age transformation model may also be disposed in a server or server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the server 105 determines a target age interval corresponding to the target age, the target age interval including a start age and an end age. A target age representation corresponding to the target age is determined according to a start age representation corresponding to the start age and an end age representation corresponding to the end age. A target image of the object, i.e., an image in which the age of the object is the target age, is generated based on the image of the object in the original image according to the target age representation and the original image features corresponding to the original image. Alternatively, the target image may be generated by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the server 105 trains a first generator and a first discriminator using a first real sample image set and a first simulated sample image set to obtain a trained first generator and first discriminator, and determines the trained first generator as the age transformation model. The age transformation model is used to generate the target image described in this disclosure. Alternatively, the first generator and the first discriminator may be trained, and the trained first generator determined as the age transformation model, by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of an image generation method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S230.
In operation S210, a target age section corresponding to a target age is determined. The target age interval includes a start age and an end age.
In operation S220, a target age representation corresponding to the target age is determined according to a start age representation corresponding to the start age and an end age representation corresponding to the end age.
In operation S230, a target image of the object is generated based on an image of the object in the original image according to the target age characteristic and the original image characteristic corresponding to the original image. The target image is an image in which the age of the subject is the target age.
According to an embodiment of the present disclosure, the original image may include an object and may be an image in which the age of the object is the original age. The target image may be an image in which the age of the object is the target age. The original image and the target image include the same object. The target age interval may include a start age and an end age, and may be determined according to a first age interval and a second age interval, which may be two adjacent age intervals. The first age interval may include its own starting age and ending age, and likewise for the second age interval. The start age of the target age interval may be determined according to the starting age and ending age of the first age interval, and the end age of the target age interval according to the starting age and ending age of the second age interval. For example, the start age of the target age interval may be the average of the starting and ending ages of the first age interval, and the end age of the target age interval may be the average of the starting and ending ages of the second age interval.
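To make the interval construction concrete, the following is a minimal sketch (not from the patent text; the function name and the representation of an interval as a (start, end) pair are illustrative):

```python
# Hedged sketch: deriving the target age interval from two adjacent age
# intervals by taking the midpoint of each interval as a boundary.
def target_age_interval(first_interval, second_interval):
    """Each argument is a (start_age, end_age) pair; the two intervals are
    assumed to be adjacent."""
    start_age = sum(first_interval) / 2  # average of the first interval's bounds
    end_age = sum(second_interval) / 2   # average of the second interval's bounds
    return start_age, end_age

# Adjacent intervals 25-29 and 30-34 give the target age interval (27.0, 32.0).
print(target_age_interval((25, 29), (30, 34)))
```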
According to embodiments of the present disclosure, the start age representation may be used to characterize the start age, the end age representation to characterize the end age, and the target age representation to characterize the target age. The start age representation may be obtained by performing feature extraction on the start age, and the end age representation by performing feature extraction on the end age. The original image features may be used to characterize the original image and may be obtained by performing feature extraction on the original image. For example, a model obtained by training a preset model with training samples may perform feature extraction on the start age, the end age and the original image respectively, so as to obtain the start age representation, the end age representation and the original image features. The preset model may include at least one of a machine learning model, a deep learning model, a reinforcement learning model, and a transfer learning model. The number of start age representations, end age representations, target age representations and original image features may each be one or more. The above feature extraction manners for the start age representation, the end age representation and the original image features are merely exemplary embodiments; any feature extraction manner known in the art may be used, as long as feature extraction can be achieved.
According to the embodiments of the present disclosure, after the target age is acquired, the target age interval corresponding to the target age, that is, the target age interval to which the target age belongs, can be determined. After determining the target age interval corresponding to the target age, the target age representation corresponding to the target age may be determined based on an interpolation method, from each start age representation corresponding to at least one start age and each end age representation corresponding to at least one end age. The interpolation method may include a linear interpolation method or a non-linear interpolation method.
According to an embodiment of the present disclosure, after determining the target age characterization, a target image including the object in the original image may be generated according to the target age characterization and the original image characteristics corresponding to the original image. The age of the object in the target image is the target age, that is, the age of the object in the original image can be converted from the original age to the target age according to the target age characteristics and the original image characteristics corresponding to the original image, so as to obtain the target image. For example, the model obtained by training the preset model with the training sample as described above may process the original image feature and the target age characterization to obtain the target image. The above-described generation manner of the target image is only an exemplary embodiment, but is not limited thereto, and may include a generation manner known in the art as long as the generation of the target image can be achieved.
According to the embodiment of the disclosure, since the target age representation corresponding to the target age is determined according to the start age representation corresponding to the start age and the end age representation corresponding to the end age, a corresponding target age representation can be obtained for any target age, and the target age can take continuous values. Therefore, a target image at a target age of continuous data type can be obtained based on the target age representation and the original image features, realizing continuous age transformation.
According to an embodiment of the present disclosure, operation S220 may include the following operations.
And according to the difference between the target age and the starting age and the difference between the target age and the ending age, interpolating between the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age to obtain the target age representation corresponding to the target age.
According to an embodiment of the present disclosure, the differences between the target age and the start age and between the target age and the end age may be determined respectively, and the proportions occupied by the start age representation and the end age representation in the target age representation may then be determined according to these differences. Interpolating between the start age representation and the end age representation according to these proportions yields the target age representation. Each proportion characterizes the role the corresponding representation plays in the target age representation: the larger the proportion, the greater the effect; and the larger the corresponding difference, the smaller the proportion. For example, if the difference between the target age and the start age is less than the difference between the target age and the end age, the start age representation occupies a greater proportion of the target age representation than the end age representation.
According to the embodiment of the disclosure, by interpolating between the start age representation corresponding to the start age and the end age representation corresponding to the end age according to the differences between the target age and the start age and between the target age and the end age, a target age representation can be obtained for a continuous range of target ages.
According to an embodiment of the present disclosure, interpolating between a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age according to differences between the target age and the starting age and the ending age respectively to obtain the target age representation corresponding to the target age may include the following operations.
A first difference is determined. The first difference characterizes a difference between the ending age and the target age. A second difference is determined. The second difference characterizes a difference between the target age and the starting age. A first ratio is determined. The first ratio characterizes a ratio of the first difference to a sum of the first difference and the second difference. A second ratio is determined. The second ratio characterizes a ratio of the second difference to a sum of the first difference and the second difference. And a target age representation corresponding to the target age is determined according to the first ratio, the second ratio, the start age representation corresponding to the start age, and the end age representation corresponding to the end age.
According to an embodiment of the present disclosure, a sum of the first difference and the second difference, i.e. the difference between the ending age and the starting age, is determined. A first ratio of the first difference to the difference between the ending age and the starting age is determined. A second ratio of the second difference to the difference between the ending age and the starting age is determined. The first ratio may characterize the proportion of the start age representation in the target age representation, and the second ratio the proportion of the end age representation in the target age representation.
According to the embodiment of the disclosure, the start age characterization and the end age characterization can be processed based on the first ratio and the second ratio to obtain the target age characterization.
According to an embodiment of the present disclosure, determining a target age representation corresponding to the target age from the first ratio, the second ratio, the starting age representation corresponding to the starting age, and the ending age representation corresponding to the ending age may include the following operations.
A first product is determined. The first product characterizes a product between the first ratio and a starting age characterization corresponding to the starting age. A second product is determined. The second product characterizes a product between the second ratio and an end age characterization corresponding to the end age. Determining a sum between the first product and the second product as a target age characterization corresponding to the target age.
According to an embodiment of the present disclosure, the first ratio may be multiplied by the starting age characterization to obtain a first product. And multiplying the second ratio by the ending age characterization to obtain a second product. The sum between the first product and the second product is then determined as the target age characteristic.
According to an embodiment of the present disclosure, the target age characterization may be determined according to the following equations (1) to (5).
d1 = End_age - Target_age (1)
d2 = Target_age - Start_age (2)
α = d1 / (d1 + d2) (3)
β = d2 / (d1 + d2) (4)
Target_age_embedding = α × Start_age_embedding + β × End_age_embedding (5)
According to an embodiment of the present disclosure, Start_age characterizes the start age, End_age characterizes the end age, and Target_age characterizes the target age. Start_age_embedding represents the start age representation corresponding to Start_age, End_age_embedding represents the end age representation corresponding to End_age, and Target_age_embedding represents the target age representation corresponding to Target_age. d1 characterizes the first difference, d2 characterizes the second difference, α characterizes the first ratio, and β characterizes the second ratio.
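For concreteness, a small sketch of equations (1) to (5) follows; the function name and the 512-dimensional embeddings are illustrative assumptions, while the arithmetic mirrors the equations above:

```python
import numpy as np

def interpolate_age_embedding(target_age, start_age, end_age,
                              start_age_embedding, end_age_embedding):
    d1 = end_age - target_age    # equation (1)
    d2 = target_age - start_age  # equation (2)
    alpha = d1 / (d1 + d2)       # equation (3): weight of the start age representation
    beta = d2 / (d1 + d2)        # equation (4): weight of the end age representation
    # equation (5): linear interpolation between the two age representations
    return alpha * start_age_embedding + beta * end_age_embedding

# e.g. target age 29 inside the target age interval (27, 32)
start_emb, end_emb = np.zeros(512), np.ones(512)
print(interpolate_age_embedding(29, 27, 32, start_emb, end_emb)[0])  # 0.4
```

Note that when the target age equals the start age, d1 = End_age - Start_age and d2 = 0, so the target age representation reduces to the start age representation, as expected.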
According to an embodiment of the present disclosure, the start age characterization, the end age characterization, and the original image features respectively include N start age characterizations, N end age characterizations, and N original image features, N being an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the above-described image generation method may further include the following operations.
And extracting the ith latent space representation corresponding to the original image to obtain the ith original image feature corresponding to the original image. And extracting the jth latent vector corresponding to the start age to obtain the jth start age representation corresponding to the start age. And extracting the jth latent vector corresponding to the end age to obtain the jth end age representation corresponding to the end age. i and j are integers greater than or equal to 1 and less than or equal to N. And determining the jth target age representation according to the jth start age representation and the jth end age representation. And generating a target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations.
According to the embodiment of the present disclosure, the value of N may be configured according to actual service requirements, and is not limited herein. For example, N = 6, i ∈ {1, 2, ..., N-1, N}, and j ∈ {1, 2, ..., N-1, N}.
According to an embodiment of the present disclosure, the jth latent vector corresponding to the start age may be extracted using a Multi-Layer Perceptron (MLP) to obtain the jth start age representation corresponding to the start age. Likewise, the jth latent vector corresponding to the end age can be extracted using the multi-layer perceptron to obtain the jth end age representation corresponding to the end age. The ith latent space representation corresponding to the original image can be extracted using an encoding module to obtain the ith original image feature corresponding to the original image. Thereby, the 1st to Nth original image features, the 1st to Nth start age representations, and the 1st to Nth end age representations can all be obtained.
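One possible shape of such an age mapping network is sketched below; the layer sizes, the input dimensionality of the encoded age, and the use of one MLP head per latent level are assumptions, since the patent does not specify the architecture:

```python
import torch
import torch.nn as nn

class AgeMLP(nn.Module):
    """Hedged sketch: maps an encoded age to N age representations,
    one latent vector per level of the decoder."""
    def __init__(self, age_dim=80, embed_dim=512, n_levels=6):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(age_dim, embed_dim), nn.ReLU(),
                          nn.Linear(embed_dim, embed_dim))
            for _ in range(n_levels)
        ])

    def forward(self, age_code):
        # returns the 1st to Nth age representations for one encoded age
        return [head(age_code) for head in self.heads]

age_mlp = AgeMLP()
embeddings = age_mlp(torch.zeros(1, 80))  # N = 6 age representations
```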
According to an embodiment of the present disclosure, generating a target image of a subject based on an image of the subject in an original image according to N original image features and N target age representations may include the following operations.
In the case where i = j = 1, a jth intermediate feature is obtained according to the jth target age representation and the ith original image feature. In the case where i = j > 1, the jth intermediate feature is obtained according to the (j-1)th intermediate feature, the jth target age representation and the ith original image feature. The Nth intermediate feature is then convolved to generate the target image of the object.
According to the embodiment of the disclosure, the jth target age representation and the ith original image feature corresponding to the original image can be processed by the first generator to obtain the jth intermediate feature. The first generator may include a decoding module, which may include N cascaded feature extraction units. Each level of feature extraction unit has an intermediate feature corresponding to it, and the feature extraction units of different levels extract features for their corresponding intermediate features. That is, the feature extraction unit of the pth level has the jth intermediate feature corresponding to it, p ∈ {1, 2, ..., N-1, N}.
According to an embodiment of the present disclosure, for the feature extraction unit with p = 1, the input may include two parts, namely the 1st target age representation and the 1st original image feature corresponding to that unit. Its output is the 1st intermediate feature, obtained by the unit processing the 1st target age representation and the 1st original image feature; that is, the 1st intermediate feature is obtained by processing the 1st target age representation and the 1st original image feature with the 1st feature extraction unit.
According to an embodiment of the present disclosure, for each feature extraction unit other than the one with p = 1, the input may include three parts, namely the intermediate feature of the previous feature extraction unit, the target age representation corresponding to this unit, and the original image feature corresponding to this unit. The output of each such unit is its intermediate feature, obtained by processing these three inputs; that is, the (j-1)th intermediate feature, the jth target age representation and the ith original image feature are processed by the pth feature extraction unit (p > 1) to obtain the jth intermediate feature.
For example, N = 4, i ∈ {1, 2, 3, 4}, and j ∈ {1, 2, 3, 4}. For i = j = 1, the 1st intermediate feature is obtained from the 1st target age representation and the 1st original image feature. For i = j = 2, the 2nd intermediate feature is obtained from the 1st intermediate feature, the 2nd target age representation and the 2nd original image feature. For i = j = 3, the 3rd intermediate feature is obtained from the 2nd intermediate feature, the 3rd target age representation and the 3rd original image feature. For i = j = 4, the 4th intermediate feature is obtained from the 3rd intermediate feature, the 4th target age representation and the 4th original image feature. The 4th intermediate feature is convolved to generate the target image of the object.
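The cascade in this example can be sketched as follows; the use of plain 3x3 convolutions as the fusion units, the channel width, and the assumption that each target age representation has already been broadcast to the spatial size of the corresponding image feature map are all illustrative choices:

```python
import torch
import torch.nn as nn

class CascadeDecoder(nn.Module):
    """Hedged sketch of the decoding module: N cascaded feature extraction
    units followed by a final convolution that produces the target image."""
    def __init__(self, channels=512, n_units=4):
        super().__init__()
        # unit 1 fuses two inputs; later units fuse three (previous
        # intermediate feature, age representation, image feature)
        self.units = nn.ModuleList([
            nn.Conv2d(channels * (2 if k == 0 else 3), channels, 3, padding=1)
            for k in range(n_units)
        ])
        self.to_image = nn.Conv2d(channels, 3, 3, padding=1)  # final convolution

    def forward(self, image_feats, age_feats):
        # image_feats, age_feats: lists of N tensors of shape (B, C, H, W)
        h = self.units[0](torch.cat([age_feats[0], image_feats[0]], dim=1))
        for k in range(1, len(self.units)):
            h = self.units[k](torch.cat([h, age_feats[k], image_feats[k]], dim=1))
        return self.to_image(h)  # target image of the object
```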
The image generation method according to the embodiment of the present disclosure is further described with reference to fig. 3.
Fig. 3 schematically shows a principle schematic of an image generation process according to an embodiment of the present disclosure.
As shown in fig. 3, in the image generation process 300, a target age 301 is acquired, and a target age section corresponding to the target age 301 is determined, the target age section including a start age 302 and an end age 303. A target age representation 306 is determined from a start age representation 304 corresponding to the start age 302 and an end age representation 305 corresponding to the end age 303. A target image 309 of the subject is generated from the target age representation 306 and the original image features 308 corresponding to the original image 307.
According to the embodiment of the disclosure, the starting age representation, the ending age representation and the original image feature are obtained by processing the starting age, the ending age and the original image respectively by using an age transformation model. The target image is obtained by processing the original image characteristics and the target age representation by using an age transformation model.
According to an embodiment of the present disclosure, the age transformation model may be obtained by training a deep learning model using a training sample. The training samples may comprise a sample image set.
Fig. 4A schematically shows a flow chart of a training method of an age transformation model according to an embodiment of the present disclosure.
As shown in FIG. 4A, the method 400A includes operations S410-S420.
In operation S410, a first generator and a first discriminator are trained using the first set of real sample images and the first set of simulated sample images, resulting in a trained first generator and first discriminator.
In operation S420, the first generator whose training is completed is determined as an age transformation model. The age transformation model is used to generate the target image described in the disclosed embodiments.
According to an embodiment of the present disclosure, the first real sample image set may include at least one first real sample image, and the real age set may include at least one real age. Each first real sample image has a corresponding real age, which characterizes the age of the object in that first real sample image after transformation. Each real age may be an encoded age. The real age may be encoded as follows: determine the age interval corresponding to the real age, set each dimension of that age interval to a first preset identifier, and set each dimension of the age intervals other than the one corresponding to the real age to a second preset identifier, to obtain the encoded age. The first preset identifier and the second preset identifier may be configured according to actual service requirements, which is not limited herein. For example, the first preset identifier may be set to 1 and the second preset identifier to 0.
For example, there may be M age intervals, and the dimension of each age interval may be U, so that the dimension of each encoded age is M × U. M is an integer greater than or equal to 1, and U is an integer greater than or equal to 1. Each age may be characterized as A = {a_1, a_2, ..., a_K, ..., a_{M-1}, a_M}, where a_K characterizes the Kth age interval and K ∈ {1, 2, ..., M}. a_K = {a_{K1}, a_{K2}, ..., a_{Kl}, ..., a_{KU-1}, a_{KU}}, where a_{Kl} characterizes the lth dimension of the Kth age interval and l ∈ {1, 2, ..., U}. When the age interval corresponding to the real age is determined to be a_1, each dimension of a_1 is set to 1 and every dimension other than those of a_1 is set to 0.
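A short sketch of this encoding scheme; the value U = 8 is an assumption, and the interval boundaries reuse the example partition given later in this description:

```python
import numpy as np

AGE_INTERVALS = [(0, 5), (6, 9), (10, 19), (20, 24), (25, 29),
                 (30, 34), (35, 39), (40, 49), (50, 69), (70, 110)]  # M = 10

def encode_age(age, intervals=AGE_INTERVALS, u=8):
    """Return an (M x U)-dimensional code: every dimension of the interval
    containing the age is set to the first preset identifier (1), all other
    dimensions to the second preset identifier (0)."""
    code = np.zeros((len(intervals), u))
    for k, (low, high) in enumerate(intervals):
        if low <= age <= high:
            code[k, :] = 1.0  # first preset identifier on the Kth interval
            break
    return code.reshape(-1)  # flatten to the M*U-dimensional encoded age

print(encode_age(27).shape)  # (80,)
```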
According to an embodiment of the present disclosure, the first simulated sample image set includes at least one first simulated sample image. The first generator may be used to generate the first simulated sample image set. The first generator is continuously trained to learn the data distribution of the first real sample image set, so that it can generate, from scratch, samples consistent with the data distribution of the first real sample image set and fool the first discriminator as far as possible. The first discriminator may be used to distinguish between the first real sample image set and the first simulated sample image set.
According to the embodiment of the disclosure, the first generator and the first discriminator are trained alternately and iteratively using the first real sample image set and the first simulated sample image set, so that the first generator and the first discriminator optimize each other through the game between them, until finally the first discriminator cannot accurately distinguish the first real sample image set from the first simulated sample image set, that is, a Nash equilibrium is reached. In this case, the first generator can be considered to have learned the data distribution of the first real sample image set, and the trained first generator is determined as the age transformation model.
According to an embodiment of the present disclosure, training the first generator and the first discriminator alternately using the first real sample image set and the first simulated sample image set may include: in each iteration, with the model parameters of the first generator kept unchanged, training the first discriminator using the first real sample image set and the first simulated sample image set until the number of training steps set for the first discriminator in this iteration is completed; then, with the model parameters of the first discriminator kept unchanged, training the first generator using the first simulated sample image set until the number of training steps set for the first generator in this iteration is completed. It should be noted that during each training step, the first generator may be used to generate the first simulated sample image set corresponding to that step. The above training manner of the first generator and the first discriminator is merely an exemplary embodiment; any training manner known in the art may be used, as long as training of the first generator and the first discriminator can be achieved.
According to the embodiment of the present disclosure, an appropriate training strategy may be selected according to actual business requirements, which is not limited herein. For example, in each iteration the training strategy may be any of the following: the first generator and the first discriminator are each trained once; the first generator is trained once and the first discriminator multiple times; the first generator is trained multiple times and the first discriminator once; or both are trained multiple times.
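One iteration of this alternation might look like the sketch below; only the freeze-one-and-train-the-other alternation comes from the text, while the non-saturating binary cross-entropy losses and the optimizer handling are assumptions:

```python
import torch
import torch.nn.functional as F

def train_one_iteration(generator, discriminator, g_opt, d_opt,
                        real_images, real_age_codes, d_steps=1, g_steps=1):
    # Discriminator step(s): the generator is held fixed (fakes are detached).
    for _ in range(d_steps):
        fake = generator(real_images, real_age_codes).detach()
        real_logits, fake_logits = discriminator(real_images), discriminator(fake)
        d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
                  + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()
    # Generator step(s): the discriminator is held fixed (only g_opt steps).
    for _ in range(g_steps):
        fake_logits = discriminator(generator(real_images, real_age_codes))
        g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
```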
The training method and the image generation method of the age conversion model according to the embodiment of the present disclosure are further explained with reference to fig. 4B, fig. 5, fig. 6A, fig. 6B, fig. 6C, fig. 6D, and fig. 6E.
In the two age transformation implementations described above, the number of sample images in some age intervals is small; for example, there are few sample images in the elderly age intervals and the infant age intervals. As a result, the prediction accuracy of an age transformation model obtained by deep learning training on such data is low. To address this, the embodiment of the present disclosure provides a training scheme for the age transformation model, which is explained below with reference to fig. 4B.
Fig. 4B schematically shows a flow chart of a training method of an age transformation model according to another embodiment of the present disclosure.
As shown in fig. 4B, the method 400B includes operation S411, operation S412, operation S413, operation S414, operation S415, and operation S420.
In operation S411, an additional sample image set is generated. The additional sample image set characterizes a sample image set of at least one preset age interval.
In operation S412, a first true sample image set is obtained according to the initial sample image set and the additional sample image set.
In operation S413, the first real sample image set and the real age set corresponding to the first real sample image set are processed respectively, so as to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set.
In operation S414, a first simulated sample image set is obtained according to the real sample image feature set and the real age characteristic set.
In operation S415, a first generator and a first discriminator are trained using the first real sample image set and the first simulated sample image set, resulting in a trained first generator and first discriminator.
In operation S420, the first generator whose training is completed is determined as an age transformation model.
According to an embodiment of the present disclosure, a preset age interval may refer to an age interval in which the number of sample images is less than or equal to a number threshold, that is, an age interval with relatively few sample images. A preset age interval includes a start age and an end age. The at least one preset age interval may comprise at least one of: a preset age interval whose starting age is greater than or equal to a first age threshold, and a preset age interval whose ending age is less than or equal to a second age threshold, where the first age threshold is greater than the second age threshold. A preset age interval whose starting age is greater than or equal to the first age threshold may be an elderly age interval, i.e., an age interval corresponding to the elderly. A preset age interval whose ending age is less than or equal to the second age threshold may be an infant age interval, i.e., an age interval corresponding to infants. The first age threshold and the second age threshold may be configured according to actual service requirements, and are not limited herein. For example, the first age threshold may be 70 years and the second age threshold 5 years; the elderly age interval may then start at 70 years and end at 110 years, and the infant age interval may start at 0 years and end at 5 years.
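As an illustrative check (the sample-count threshold value is an assumption; the age thresholds reuse the example values above):

```python
FIRST_AGE_THRESHOLD = 70   # example value from the text (elderly intervals)
SECOND_AGE_THRESHOLD = 5   # example value from the text (infant intervals)

def is_preset_interval(start_age, end_age, sample_count, count_threshold=1000):
    # An interval counts as "preset" when it has few samples; in practice this
    # covers the elderly and infant intervals named above. count_threshold is
    # an assumption, the text only requires "less than or equal to the number
    # threshold".
    return (sample_count <= count_threshold
            or start_age >= FIRST_AGE_THRESHOLD
            or end_age <= SECOND_AGE_THRESHOLD)

print(is_preset_interval(70, 110, sample_count=120))  # True
```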
According to an embodiment of the present disclosure, the initial sample image set may include sample image sets of at least one preset age interval and of at least one other age interval, or it may include only sample image sets of at least one other age interval. The other age intervals refer to age intervals other than the preset age intervals.
For example, the age intervals may include 0-5 years, 6-9 years, 10-19 years, 20-24 years, 25-29 years, 30-34 years, 35-39 years, 40-49 years, 50-69 years, and 70-110 years. The at least one preset age interval may include 0-5 years and 70-110 years. The at least one other age interval may include 6-9 years, 10-19 years, 20-24 years, 25-29 years, 30-34 years, 35-39 years, 40-49 years, and 50-69 years.
According to an embodiment of the present disclosure, a first generative adversarial network model may include the first generator and the first discriminator. The first real sample image set may be processed using the first generator to obtain the real sample image features corresponding to each first real sample image included in the first real sample image set. The real age set corresponding to the first real sample image set may be processed using the first generator to obtain the real age representation corresponding to each real age included in the real age set.
According to an embodiment of the disclosure, after obtaining the real sample image features and the real age representations, the real sample image features and the real age representations may be processed by the first generator to obtain a first simulated sample image. The first simulated sample image may be an image in which the age of the object in the first real sample image has been transformed to the real age.
According to an embodiment of the present disclosure, a preset age interval may refer to an age interval with relatively few sample images, and the number of sample images in the preset age intervals is increased by generating an additional sample image set covering at least one preset age interval. On this basis, the additional sample image set participates in the training of the first generator, thereby improving the prediction accuracy of the age transformation model.
According to an embodiment of the present disclosure, operation S413 may include the following operations.
And processing the first real sample image set by using a coding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set. Processing the real age set corresponding to the first real sample image set by using a multi-layer perceptron comprised by the first generator to obtain a real age characteristic set corresponding to the real age set.
According to an embodiment of the present disclosure, the first generator may include an encoding module and a multi-layer perceptron. The first generator may comprise an encoding module operable to process the first set of authentic sample images to obtain authentic sample image features corresponding to each of the first authentic sample images comprised in the first set of authentic sample images. The first generator may comprise a multi-layered perceptron operable to process a set of true ages corresponding to the first set of true sample images to derive a representation of true age corresponding to each true age comprised in the set of true ages.
Operation S414 may include the following operations according to an embodiment of the present disclosure.
And processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain a first simulation sample image set.
According to an embodiment of the present disclosure, the first generator may further include a decoding module. The first generator may include a decoding module for processing the real sample image features and the real age characterization to obtain a first simulated sample image.
According to an embodiment of the present disclosure, there may be N real age characterizations corresponding to each real age and N real sample image features corresponding to each first real sample image, where N is an integer greater than or equal to 1. The first real sample image set may include T first real sample images, where T is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, for the qth first real sample image and the qth real age corresponding to the qth first real sample image, processing the qth first real sample image with the encoding module included in the first generator to obtain the N real sample image features corresponding to the qth first real sample image may include: extracting the ith latent space representation corresponding to the qth first real sample image with the encoding module to obtain the ith real sample image feature corresponding to the qth first real sample image. The jth latent vector corresponding to the qth real age may be extracted with the multi-layer perceptron included in the first generator to obtain the jth real age characterization corresponding to the qth real age, where i ∈ {1, 2, ..., N-1, N}, j ∈ {1, 2, ..., N-1, N}, and q ∈ {1, 2, ..., T-1, T}.
According to an embodiment of the present disclosure, for the qth first real sample image and the qth real age corresponding to it, processing the N real sample image features corresponding to the qth first real sample image and the N real age features corresponding to the qth real age with the decoding module included in the first generator to obtain the qth first simulated sample image may include: when i = j = 1, obtaining the jth sample intermediate feature from the jth real age feature corresponding to the qth real age and the ith real sample image feature corresponding to the qth first real sample image; when i = j > 1, obtaining the jth sample intermediate feature from the (j-1)th sample intermediate feature, the jth real age feature, and the ith real sample image feature; and convolving the Nth sample intermediate feature to generate the first simulated sample image.
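The recurrence just described can be sketched as follows in PyTorch. This is a minimal illustration assuming, for simplicity, that each age feature has already been broadcast to the same spatial shape and channel width as the corresponding image feature, and that every stage is a single convolution; the layer choices and shapes are assumptions, not the disclosed architecture.

    import torch
    import torch.nn as nn

    class DecoderSketch(nn.Module):
        """Iteratively fuses the previous intermediate feature with the jth
        age feature and the ith (= jth) image feature, then convolves the
        Nth intermediate feature into an image."""
        def __init__(self, n_stages=6, channels=64):
            super().__init__()
            self.stages = nn.ModuleList(
                nn.Conv2d(3 * channels, channels, 3, padding=1)
                for _ in range(n_stages))
            self.to_image = nn.Conv2d(channels, 3, 3, padding=1)

        def forward(self, img_feats, age_feats):
            m = torch.zeros_like(img_feats[0])  # no previous feature yet
            for stage, img_f, age_f in zip(self.stages, img_feats, age_feats):
                # j = 1 uses the zero placeholder; j > 1 reuses m
                m = stage(torch.cat([m, age_f, img_f], dim=1))
            return self.to_image(m)  # convolve the Nth intermediate feature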
According to an embodiment of the present disclosure, alternately training a first generator and a first discriminator using a first set of real sample images and a first set of simulated sample images to obtain a trained first generator and first discriminator may include the following operations.
If it is determined that the preset condition is not satisfied, the first generator and the first discriminator are alternately trained using the first real sample image set and the first simulated sample image set.
If it is determined that the preset condition is satisfied, a second simulated sample image set is determined from the first simulated sample image set. A third simulated sample image set is obtained according to the first simulated sample image set and the second simulated sample image set. A second real sample image set is obtained according to the second simulated sample image set and the first real sample image set. The first generator and the first discriminator are then alternately trained using the second real sample image set and the third simulated sample image set.
According to an embodiment of the present disclosure, the preset condition may be used to decide whether to train the first generator and the first discriminator with a mixed supervised and unsupervised training method. The preset condition may include the number of iterations being greater than or equal to an iteration threshold. The second simulated sample image set may include at least one second simulated sample image, namely a first simulated sample image with high age prediction accuracy. The age prediction accuracy may be determined from the deviation between the real age corresponding to the first simulated sample image and the simulated age, where the simulated age is obtained by performing age prediction on the first simulated sample image. A first simulated sample image may therefore be selected when the deviation between its real age and its simulated age is less than or equal to a deviation threshold. The deviation threshold may be configured according to actual service requirements and is not limited herein.
According to an embodiment of the present disclosure, it may first be determined whether the preset condition is satisfied. If the preset condition is satisfied, a second simulated sample image set may be determined from the first simulated sample image set. The sample images in the first simulated sample image set other than those in the second simulated sample image set are determined as the third simulated sample image set, and the second simulated sample image set is added to the first real sample image set to obtain the second real sample image set.
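A sketch of this partitioning step follows, assuming a hypothetical age_predictor helper that returns a predicted (simulated) age per image; the deviation threshold value is illustrative only.

    import numpy as np

    def split_for_mixed_training(sim_images, real_ages, age_predictor,
                                 deviation_threshold=3.0):
        """Partition the first simulated sample image set: images whose
        predicted (simulated) age deviates from the corresponding real age
        by at most the threshold form the second set (used as pseudo ground
        truth); the rest form the third set."""
        predicted = np.array([age_predictor(img) for img in sim_images])
        deviations = np.abs(predicted - np.asarray(real_ages))
        selected = deviations <= deviation_threshold
        second_set = [img for img, s in zip(sim_images, selected) if s]
        third_set = [img for img, s in zip(sim_images, selected) if not s]
        return second_set, third_set

    # The second set is then appended to the first real sample image set:
    # second_real_set = list(first_real_set) + second_set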
According to an embodiment of the present disclosure, after the second real sample image set and the third simulated sample image set are obtained, the first generator and the first discriminator may be alternately trained using these two sets.
According to an embodiment of the present disclosure, if it is determined that the preset condition is not satisfied, the first generator and the first discriminator may be alternately trained using the first real sample image set and the first simulated sample image set.
According to the embodiment of the disclosure, the first generator and the first discriminator are trained in an unsupervised manner when the preset condition is not satisfied. When the preset condition is satisfied, they are trained in a mixed unsupervised and supervised manner: the second simulated sample image set is added to the first real sample image set as Ground Truth, and the generation of simulated sample images of the corresponding ages is supervised by gradually adding more second simulated sample images, so that the training process becomes increasingly stable and the model converges faster.
According to an embodiment of the present disclosure, operation S411 may include the following operations.
A direction vector of the attribute axis corresponding to the at least one preset age interval is obtained according to a classifier model. At least one first spare image feature is generated. Based on an image generation model, the additional sample image set is generated using the at least one first spare image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval.
According to an embodiment of the present disclosure, the classifier model may be used to determine the direction vector of the attribute axis corresponding to a preset age interval. The classifier model may include a support vector machine model, a decision tree model, or a neural network model; the support vector machine model may include a linear support vector machine model. The image generation model may be used to generate the additional sample image set and may be the second generator of a second generative adversarial network model. There may be one or more direction vectors of attribute axes corresponding to the at least one preset age interval. The first spare image feature may be an image feature derived from first random noise, which may include Gaussian noise.
According to an embodiment of the present disclosure, after the classifier model is obtained, the direction vector of the attribute axis corresponding to each of the at least one preset age interval may be determined based on the classifier model. For example, the classifier model may include a linear support vector machine model, and the at least one preset age interval may include an elderly age interval and an infant age interval. A classification hyperplane can be obtained from the linear support vector machine model, the normal vector of the classification hyperplane is determined, and this normal vector is determined as the direction vector of the attribute axis corresponding to the elderly age interval and the infant age interval.
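For illustration, with scikit-learn's LinearSVC the normal vector of the classification hyperplane can be read from the fitted coefficients as sketched below; the latent features and labels here are random stand-ins, not real data.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Random stand-ins for the latent features of spare sample images,
    # labelled infant (0) or elderly (1); real features would come from
    # the image generation model's latent space.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 512))
    labels = rng.integers(0, 2, size=200)

    svm = LinearSVC().fit(features, labels)

    # For a linear SVM the classification hyperplane is w·x + b = 0; its
    # normal vector w is taken as the direction vector of the attribute
    # axis between the elderly and infant age intervals.
    direction = svm.coef_[0]
    direction = direction / np.linalg.norm(direction)  # unit-length axis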
According to the embodiment of the disclosure, after the direction vector of the attribute axis corresponding to the at least one preset age interval and the at least one first spare image feature are determined, each first spare image feature is adjusted according to the direction vector to obtain an adjusted first spare image feature. The adjusted first spare image features are input into the image generation model to obtain the additional sample image set.
According to an embodiment of the present disclosure, the training method of the age transformation model may further include the following operations.
A spare sample image set is generated using the image generation model. A spare sample image subset is determined from the spare sample image set, where the subset corresponds to at least one preset age interval and includes at least one spare sample image. A preset model is trained according to the second spare image feature and the age category label corresponding to each spare sample image in the at least one spare sample image, to obtain a classifier model.
According to embodiments of the present disclosure, the image generation model may also be used to generate a spare sample image set. The spare sample image set may include a spare sample image subset corresponding to the at least one preset age interval and a subset corresponding to at least one other age interval. Age category labels may be used to characterize categories of ages and may be determined according to the preset age intervals. For example, the at least one preset age interval may include an elderly age interval and an infant age interval; accordingly, the age category labels may include an elderly label and an infant label.
According to an embodiment of the present disclosure, generating the spare sample image set using the image generation model may include: inputting second random noise data, which may include Gaussian noise, into the image generation model to obtain the spare sample image set. After the spare sample image set is obtained, the age interval corresponding to each spare sample image in the set may be determined, and the spare sample images whose age interval is a preset age interval are determined as the spare sample images included in the spare sample image subset.
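A minimal sketch of this sampling-and-filtering step follows, assuming hypothetical generator and age_estimator callables and the elderly/infant intervals from the example above.

    import numpy as np

    def build_spare_subset(generator, age_estimator, n_samples=1000,
                           latent_dim=512,
                           preset_intervals=((0, 5), (70, 110))):
        """Sample second random noise, generate spare images, and keep
        those whose estimated age falls in a preset age interval."""
        subset = []
        for _ in range(n_samples):
            z = np.random.randn(latent_dim)   # Gaussian second random noise
            image = generator(z)
            age = age_estimator(image)
            for label, (lo, hi) in enumerate(preset_intervals):
                if lo <= age <= hi:           # 0 = infant, 1 = elderly here
                    subset.append((z, image, label))
                    break
        return subset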
In accordance with embodiments of the present disclosure, after the spare sample image subset is determined, a second spare image feature corresponding to each spare sample image in the subset may be determined. An output value may be derived, based on a loss function corresponding to the preset model, from the second spare image feature and the age category label corresponding to each of the at least one spare sample image. The model parameters of the preset model are adjusted according to the output value until the output value converges, and the preset model obtained when the output value converges is determined as the classifier model.
According to an embodiment of the present disclosure, generating the additional sample image set using the at least one first spare image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval based on the image generation model may include the following operations.
Each of the at least one first spare image feature is adjusted to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval, where the expected image feature is the image feature corresponding to an expected age category label. The at least one expected image feature is then input into the image generation model to obtain the additional sample image set.
According to an embodiment of the present disclosure, the expected age category label may be an age category label corresponding to the spare sample image, and the expected age category labels may include the elderly label and the infant label.
According to an embodiment of the present disclosure, adjusting each of the at least one first spare image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval may include: processing the at least one first spare image feature with the classifier model to obtain a predicted age category label corresponding to each first spare image feature; and, for each first spare image feature, in a case where the predicted age category label does not match the expected age category label, moving the first spare image feature, based on the direction vector of the attribute axis corresponding to the at least one preset age interval, in the direction of the expected age category label until the expected image feature corresponding to the first spare image feature is obtained.
For example, the expected age category labels include the elderly label and the infant label, and the expected age category label corresponding to a certain first spare image feature is the elderly label. Using the classifier model, it is determined that the predicted age category label corresponding to this first spare image feature is not the elderly label. Since the predicted age category label does not match the expected age category label, the first spare image feature may be moved in the direction of the elderly label, based on the direction vector of the attribute axis corresponding to the elderly age interval and the infant age interval, until the expected image feature is obtained.
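This movement along the attribute axis can be sketched as an iterative walk that stops once the classifier predicts the expected label; the step size, iteration cap, and sign convention are assumptions.

    import numpy as np

    def move_to_expected_label(feature, direction, classifier,
                               expected_label, step=0.5, max_steps=50):
        """Nudge a first spare image feature along the attribute axis until
        the classifier predicts the expected age category label."""
        z = np.asarray(feature, dtype=float).copy()
        # Assumed convention: label 1 (elderly) lies on the positive side
        # of the hyperplane, so walk along +direction for it, otherwise
        # along -direction.
        sign = 1.0 if expected_label == 1 else -1.0
        for _ in range(max_steps):
            if classifier.predict(z[None, :])[0] == expected_label:
                return z  # the expected image feature has been reached
            z = z + sign * step * direction
        raise RuntimeError("expected age category label was not reached")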
According to an embodiment of the present disclosure, after the expected image features are obtained, they may be input into the image generation model to obtain the additional sample images. Additional sample images with the expected age category labels may be generated in this manner until the required number is reached.
According to an embodiment of the present disclosure, the training method of the age transformation model may further include the following operations.
A fourth simulated sample image set is generated using the second generator. The second generator and a second discriminator are alternately trained using a third real sample image set and the fourth simulated sample image set to obtain the trained second generator and second discriminator. The trained second generator is determined as the image generation model.
According to an embodiment of the present disclosure, the second generative adversarial network model may include the second generator and the second discriminator, and may be a style-based generative adversarial network (StyleGAN) model.
According to an embodiment of the present disclosure, generating the fourth simulated sample image set with the second generator may include: inputting third random noise data into the second generator to obtain the fourth simulated sample image set.
According to an embodiment of the present disclosure, alternately training the second generator and the second discriminator using the third real sample image set and the fourth simulated sample image set may include: in each iteration, training the second discriminator with the third real sample image set and the fourth simulated sample image set while the model parameters of the second generator are held fixed, until the number of training passes set for the second discriminator in this iteration is completed; and then training the second generator with the fourth simulated sample image set while the model parameters of the second discriminator are held fixed, until the number of training passes set for the second generator in this iteration is completed. It should be noted that, for each training pass, a fourth simulated sample image set corresponding to that pass may be generated by the second generator. This training method for the second generator and the second discriminator is only an exemplary embodiment; any training method known in the art may be used, as long as the training of the second generator and the second discriminator can be achieved.
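One common realization of such an alternating schedule is sketched below in PyTorch; the binary cross-entropy loss, single-logit discriminator output, and step counts are assumptions rather than the disclosed training method.

    import torch

    def alternate_train_step(generator, discriminator, real_images,
                             g_opt, d_opt, latent_dim=512,
                             d_steps=1, g_steps=1):
        """One iteration: discriminator passes with the generator frozen,
        then generator passes with the discriminator parameters fixed."""
        bce = torch.nn.BCEWithLogitsLoss()
        b = real_images.size(0)
        ones = torch.ones(b, 1)
        zeros = torch.zeros(b, 1)
        for _ in range(d_steps):
            z = torch.randn(b, latent_dim)
            fake = generator(z).detach()  # detach: generator stays fixed
            d_loss = (bce(discriminator(real_images), ones)
                      + bce(discriminator(fake), zeros))
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()
        for _ in range(g_steps):
            z = torch.randn(b, latent_dim)
            fake = generator(z)  # a fresh simulated batch for this pass
            g_loss = bce(discriminator(fake), ones)  # d_opt is not stepped
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()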
Fig. 5 schematically illustrates an example schematic of a training process of an age transformation model according to an embodiment of the disclosure.
As shown in FIG. 5, in the training process 500, a fourth simulated sample image set 502 is generated using the second generator included in a second generative adversarial network model 503. The second generator and the second discriminator included in the second generative adversarial network model 503 are alternately trained using a third real sample image set 501 and the fourth simulated sample image set 502 to obtain the trained second generator and second discriminator. The trained second generator is determined as the image generation model 504.
A spare sample image set 505 is generated using the image generation model 504. A spare sample image subset 506 is determined from the spare sample image set 505. A preset model 508 is trained according to the second spare image features and age category labels 507 corresponding to each of the at least one spare sample image, resulting in a classifier model 509.
According to the classifier model 509, a direction vector 510 of the attribute axis corresponding to the at least one preset age interval is obtained. At least one first spare image feature 511 is generated. Each first spare image feature 511 of the at least one first spare image feature is adjusted to an expected image feature 512 based on the direction vector 510 of the attribute axis corresponding to the at least one preset age interval. The at least one expected image feature 512 is input into the image generation model 504, resulting in the additional sample image set 513.
From the initial sample image set 514 and the additional sample image set 513, a first real sample image set 515 is obtained. The first real sample image set 515 and the real age set 516 corresponding to it are processed using the first generator included in a first generative adversarial network model 517, resulting in a real sample image feature set 518 corresponding to the first real sample image set 515 and a real age feature set 519 corresponding to the real age set 516. The real sample image feature set 518 and the real age feature set 519 are processed using the first generator included in the first generative adversarial network model 517 to produce a first simulated sample image set 520.
The first generator and the first discriminator included in the first generative adversarial network model 517 are alternately trained using the first real sample image set 515 and the first simulated sample image set 520, resulting in the trained first generator and first discriminator. The trained first generator is determined as the age transformation model 521.
According to an embodiment of the present disclosure, the first generator may include an encoding module, a multi-layer perceptron, and a decoding module. The process of obtaining the target image with the first generator will be described below with reference to fig. 6A to 6E.
Fig. 6A schematically illustrates an example schematic of an image generation process according to an embodiment of the disclosure.
As shown in fig. 6A, in 600A, an original image 601 may be processed by an encoding module 602 of a first generator, resulting in original image features 603 corresponding to the original image 601.
The starting age 604 may be processed using the multi-layer perceptron 606 of the first generator to obtain a starting age characterization 607 corresponding to the starting age 604. The ending age 605 may be processed using the multi-layer perceptron 606 to obtain an ending age characterization 608 corresponding to the ending age 605. A target age characterization 609 corresponding to the target age is then obtained from the starting age characterization 607 and the ending age characterization 608.
The original image features 603 and the target age characterization 609 are processed by the decoding module 610 of the first generator to obtain a target image 611 of the object.
Fig. 6B schematically illustrates an example schematic diagram of a process of processing the starting age with the multi-layer perceptron to obtain the starting age characterization corresponding to the starting age, according to an embodiment of the disclosure.
As shown in fig. 6B, in 600B, the multi-layer perceptron 606 of fig. 6A includes a fully-connected layer 6060 and feature extraction units 6061, 6062, 6063, 6064, 6065, and 6066.
The starting age 604 is processed using the fully-connected layer 6060 to obtain a processing result. The processing result is then processed by the feature extraction units 6061 through 6066, respectively, resulting in the 1st starting age characterization 6070, the 2nd starting age characterization 6071, the 3rd starting age characterization 6072, the 4th starting age characterization 6073, the 5th starting age characterization 6074, and the 6th starting age characterization 6075.
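The structure of FIG. 6B, one shared fully-connected layer feeding several parallel feature extraction units, can be sketched as follows; the layer widths and activations are assumptions.

    import torch
    import torch.nn as nn

    class AgeMLP(nn.Module):
        """One shared fully-connected layer followed by N parallel feature
        extraction units, each emitting one age characterization."""
        def __init__(self, n_heads=6, hidden=256, out_dim=64):
            super().__init__()
            self.fc = nn.Linear(1, hidden)        # the scalar age goes here
            self.heads = nn.ModuleList(
                nn.Sequential(nn.ReLU(), nn.Linear(hidden, out_dim))
                for _ in range(n_heads))

        def forward(self, age):
            h = self.fc(age.view(-1, 1).float())
            return [head(h) for head in self.heads]  # N characterizations

    start_characterizations = AgeMLP()(torch.tensor([25.0]))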
The ending age 605 may be processed in the same manner as shown in fig. 6B to obtain the ending age characterization 608, which is not repeated here.
Fig. 6C schematically illustrates an example schematic diagram of a process of processing an original image by using an encoding module to obtain original image features corresponding to the original image according to an embodiment of the present disclosure.
As shown in fig. 6C, in 600C, the encoding module 602 of fig. 6A includes feature extraction units 6020, 6021, ..., 6024, and 6025.
The original image is processed by the feature extraction units 6020, 6021, ..., 6024, and 6025, respectively, to obtain the 1st original image feature 6030, the 2nd original image feature 6031, ..., the 5th original image feature 6034, and the 6th original image feature 6035.
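The chain of feature extraction units in FIG. 6C, with each stage's output taken as one original image feature, might be sketched as follows; the channel widths, strides, and depth are assumptions.

    import torch
    import torch.nn as nn

    class EncoderSketch(nn.Module):
        """A chain of feature extraction units; the output of each stage is
        taken as one original image feature (latent space representation)."""
        def __init__(self, n_stages=6, base=32):
            super().__init__()
            stages, in_ch = [], 3
            for _ in range(n_stages):
                stages.append(nn.Sequential(
                    nn.Conv2d(in_ch, base, 3, stride=2, padding=1),
                    nn.ReLU()))
                in_ch = base
            self.stages = nn.ModuleList(stages)

        def forward(self, image):
            feats, h = [], image
            for stage in self.stages:
                h = stage(h)
                feats.append(h)  # the ith latent space representation
            return feats         # the N original image features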
Fig. 6D schematically illustrates an example schematic diagram of a process for processing original image features and a target age representation to obtain a target image of a subject by using a decoding module according to an embodiment of the disclosure.
As shown in fig. 6D, in 600D, the decoding module 610 of fig. 6A includes a convolution layer 6106 and feature extraction units 6100, 6101, ..., 6105.
The feature extraction unit 6100 processes the 1st original image feature 6030 and the 1st target age characterization 6090 to obtain the 1st intermediate feature 6120.
The feature extraction unit 6101 processes the 1st intermediate feature 6120, the 2nd original image feature 6031, and the 2nd target age characterization 6091 to obtain the 2nd intermediate feature, and so on, until the feature extraction unit 6105 processes the 5th intermediate feature 6124, the 6th original image feature 6035, and the 6th target age characterization 6095 to obtain the 6th intermediate feature.
The 6th intermediate feature is processed with the convolution layer 6106 to obtain the target image 611 of the object.
Fig. 6E schematically illustrates an example schematic diagram of a continuous age transformation according to an embodiment of the disclosure.
As shown in fig. 6E, in 600E, in the case of different target ages, the original image 601 is processed by using the image generation method according to the embodiment of the present disclosure, resulting in target images 611 corresponding to the different target ages. The target age of the subject in the target image 611 increases in the direction of the arrow in fig. 6E.
The above is only an exemplary embodiment, but is not limited thereto, and other image generation methods and training methods of an age transformation model known in the art may be further included as long as continuous age transformation can be implemented.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
Fig. 7 schematically shows a block diagram of an image generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the image generation apparatus 700 may include a first determination module 710, a second determination module 720, and a first generation module 730.
A first determining module 710 for determining a target age interval corresponding to a target age, wherein the target age interval includes a start age and an end age.
A second determining module 720, configured to determine a target age representation corresponding to the target age according to the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age.
A first generating module 730 for generating a target image of the object based on an image of the object in an original image according to the target age representation and original image features corresponding to the original image, wherein the target image is an image in which the age of the object is the target age.
According to an embodiment of the present disclosure, the second determining module 720 may include a first obtaining submodule.
A first obtaining submodule for interpolating between the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age, according to the differences between the target age and the starting age and between the target age and the ending age respectively, to obtain the target age representation corresponding to the target age.
According to an embodiment of the present disclosure, the first obtaining submodule may include a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, and a fifth determining unit.
A first determining unit for determining a first difference value. The first difference characterizes a difference between the ending age and the target age.
A second determining unit for determining a second difference. The second difference characterizes a difference between the target age and the starting age.
A third determining unit for determining a first ratio. The first ratio characterizes the ratio of the first difference to the sum of the first difference and the second difference.
A fourth determining unit for determining a second ratio. The second ratio characterizes the ratio of the second difference to the sum of the first difference and the second difference.
A fifth determining unit for determining the target age representation corresponding to the target age according to the first ratio, the second ratio, the starting age representation corresponding to the starting age, and the ending age representation corresponding to the ending age.
According to an embodiment of the present disclosure, the fifth determining unit may include a first determining subunit, a second determining subunit, and a third determining subunit.
A first determining subunit for determining the first product. The first product characterizes a product between the first ratio and a starting age characterization corresponding to the starting age.
A second determining subunit for determining a second product. The second product characterizes a product between the second ratio and an end age characterization corresponding to the end age.
A third determining subunit for determining a sum between the first product and the second product as the target age representation corresponding to the target age.
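Taken together, the units above describe a linear interpolation between the starting and ending age representations. A sketch, assuming the representations support elementwise arithmetic (e.g. NumPy arrays or PyTorch tensors), with the function name chosen for illustration:

    def target_age_representations(start_reprs, end_reprs,
                                   start_age, end_age, target_age):
        """Interpolate each starting/ending age representation pair."""
        first_diff = end_age - target_age     # first difference
        second_diff = target_age - start_age  # second difference
        total = first_diff + second_diff      # equals end_age - start_age
        first_ratio = first_diff / total      # weights the starting repr.
        second_ratio = second_diff / total    # weights the ending repr.
        return [first_ratio * s + second_ratio * e
                for s, e in zip(start_reprs, end_reprs)]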
According to an embodiment of the present disclosure, the start age characterization, the end age characterization, and the original image features respectively include N start age characterizations, N end age characterizations, and N original image features, N being an integer greater than or equal to 1.
The first generation module comprises a second obtaining submodule, a third obtaining submodule, a fourth obtaining submodule, a first determining submodule and a first generation submodule.
A second obtaining submodule for extracting the ith latent space representation corresponding to the original image to obtain the ith original image feature corresponding to the original image.
A third obtaining submodule for extracting the jth latent vector corresponding to the starting age to obtain the jth starting age representation corresponding to the starting age.
A fourth obtaining submodule for extracting the jth latent vector corresponding to the ending age to obtain the jth ending age representation corresponding to the ending age, where i and j are integers greater than or equal to 1 and less than or equal to N.
A first determining submodule for determining the jth target age representation according to the jth starting age representation and the jth ending age representation.
A first generating submodule for generating the target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations.
According to an embodiment of the present disclosure, the first generation submodule may include a first obtaining unit, a second obtaining unit, and a first generation unit.
A first obtaining unit for obtaining the jth intermediate feature according to the jth target age representation and the ith original image feature when i = j = 1.
A second obtaining unit for obtaining the jth intermediate feature according to the (j-1)th intermediate feature, the jth target age representation, and the ith original image feature when i = j > 1.
A first generating unit for convolving the Nth intermediate feature to generate the target image of the object.
According to the embodiment of the disclosure, the starting age representation, the ending age representation, and the original image features are obtained by processing the starting age, the ending age, and the original image, respectively, with an age transformation model. The target image is obtained by processing the original image features and the target age representation with the age transformation model.
Fig. 8 schematically shows a block diagram of a training apparatus of an age transformation model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the age transformation model may include a training module 810 and a third determining module 820.
A training module 810 for training a first generator and a first discriminator using a first real sample image set and a first simulated sample image set to obtain a trained first generator and a trained first discriminator.
A third determining module 820 for determining the trained first generator as an age transformation model. The age transformation model is used to generate the target image described in the disclosed embodiments.
According to an embodiment of the present disclosure, the training module 810 may include a second generation sub-module, a fifth obtaining sub-module, a sixth obtaining sub-module, a seventh obtaining sub-module, and an eighth obtaining sub-module.
A second generation submodule for generating an additional sample image set. The additional sample image set characterizes a sample image set of at least one preset age interval.
A fifth obtaining submodule for obtaining the first real sample image set according to an initial sample image set and the additional sample image set.
A sixth obtaining submodule for processing the first real sample image set and the real age set corresponding to the first real sample image set, respectively, to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set.
A seventh obtaining submodule for obtaining the first simulated sample image set according to the real sample image feature set and the real age feature set.
An eighth obtaining submodule for training the first generator and the first discriminator using the first real sample image set and the first simulated sample image set to obtain the trained first generator and first discriminator.
According to an embodiment of the present disclosure, the sixth obtaining sub-module may include a first obtaining unit and a second obtaining unit.
A first obtaining unit for processing the first real sample image set with the encoding module included in the first generator to obtain the real sample image feature set corresponding to the first real sample image set.
A second obtaining unit for processing the real age set corresponding to the first real sample image set with the multi-layer perceptron included in the first generator to obtain the real age feature set corresponding to the real age set.
According to an embodiment of the present disclosure, the seventh obtaining sub-module may include a third obtaining unit.
A third obtaining unit for processing the real sample image feature set and the real age feature set with the decoding module included in the first generator to obtain the first simulated sample image set.
According to an embodiment of the present disclosure, the eighth obtaining submodule may include a first training unit, a sixth determining unit, a fourth obtaining unit, a fifth obtaining unit, and a second training unit.
A first training unit for alternately training the first generator and the first discriminator using the first real sample image set and the first simulated sample image set when the preset condition is not satisfied.
A sixth determining unit for determining the second simulated sample image set from the first simulated sample image set when the preset condition is satisfied.
A fourth obtaining unit for obtaining the third simulated sample image set according to the first simulated sample image set and the second simulated sample image set.
A fifth obtaining unit for obtaining the second real sample image set according to the second simulated sample image set and the first real sample image set.
A second training unit for alternately training the first generator and the first discriminator using the second real sample image set and the third simulated sample image set.
According to an embodiment of the present disclosure, the second generation submodule may include a sixth obtaining unit, a second generating unit, and a third generating unit.
A sixth obtaining unit for obtaining the direction vector of the attribute axis corresponding to the at least one preset age interval according to the classifier model.
A second generating unit for generating at least one first spare image feature.
A third generating unit for generating the additional sample image set, based on the image generation model, using the at least one first spare image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval.
According to an embodiment of the present disclosure, the training apparatus 800 of the age transformation model may further include a second generating module, a fourth determining module, and a first obtaining module.
A second generating module for generating a spare sample image set using the image generation model.
A fourth determining module for determining a spare sample image subset from the spare sample image set. The spare sample image subset corresponds to the at least one preset age interval and includes at least one spare sample image.
A first obtaining module for training a preset model according to the second spare image feature and the age category label corresponding to each of the at least one spare sample image to obtain the classifier model.
According to an embodiment of the present disclosure, the third generating unit may include an adjusting subunit and an obtaining subunit.
An adjusting subunit for adjusting each of the at least one first spare image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval. The expected image feature is the image feature corresponding to the expected age category label.
An obtaining subunit for inputting the at least one expected image feature into the image generation model to obtain the additional sample image set.
According to an embodiment of the present disclosure, the training apparatus 800 of the age transformation model may further include a third generating module, a second obtaining module, and a fifth determining module.
A third generating module for generating a fourth simulated sample image set using the second generator.
A second obtaining module for alternately training the second generator and the second discriminator using the third real sample image set and the fourth simulated sample image set to obtain the trained second generator and second discriminator.
A fifth determining module for determining the trained second generator as the image generation model.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium has stored thereon computer instructions for causing a computer to perform the method described above.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements the method as above.
Fig. 9 schematically shows a block diagram of an electronic device adapted to implement the image generation method and the training method of the age transformation model according to embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the image generation method or the training method of the age transformation model. For example, in some embodiments, the image generation method or the training method of the age transformation model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the image generation method or the training method of the age transformation model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the image generation method or the training method of the age transformation model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (34)

1. An image generation method, comprising:
determining a target age interval corresponding to a target age, wherein the target age interval comprises a start age and an end age;
determining a target age representation corresponding to the target age according to a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age; and
generating a target image of the object based on the image of the object in the original image according to the target age representation and the original image feature corresponding to the original image, wherein the target image is an image in which the age of the object is the target age.
2. The method of claim 1, wherein said determining a target age representation corresponding to the target age from a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age comprises:
and according to the difference between the target age and the starting age and the difference between the target age and the ending age, interpolating between the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age to obtain the target age representation corresponding to the target age.
3. The method of claim 2, wherein said interpolating between a starting age representation corresponding to said starting age and an ending age representation corresponding to said ending age, based on differences between said target age and said starting age and said ending age respectively, resulting in a target age representation corresponding to said target age comprises:
determining a first difference, wherein the first difference characterizes a difference between the ending age and the target age;
determining a second difference, wherein the second difference characterizes a difference between the target age and the starting age;
determining a first ratio, wherein the first ratio characterizes a ratio of the first difference to a sum of the first difference and the second difference;
determining a second ratio, wherein the second ratio characterizes a ratio of the second difference to a sum of the first difference and the second difference; and
determining a target age representation corresponding to the target age according to the first ratio, the second ratio, the starting age representation corresponding to the starting age, and the ending age representation corresponding to the ending age.
4. The method of claim 3, wherein said determining a target age representation corresponding to the target age from the first ratio, the second ratio, a starting age representation corresponding to the starting age, and an ending age representation corresponding to the ending age comprises:
determining a first product, wherein the first product characterizes a product between the first ratio and a starting age characterization corresponding to the starting age;
determining a second product, wherein the second product characterizes a product between the second ratio and an end age characterization corresponding to the end age; and
determining a sum between the first product and the second product as a target age characterization corresponding to the target age.
5. The method according to any one of claims 1 to 4, wherein the starting age representation, the ending age representation, and the original image features respectively comprise N starting age representations, N ending age representations, and N original image features, N being an integer greater than or equal to 1;
generating a target image of an object based on an image of the object in an original image according to the target age characteristic and an original image characteristic corresponding to the original image, including:
extracting the ith latent space representation corresponding to the original image to obtain the ith original image feature corresponding to the original image;
extracting a jth latent vector corresponding to the starting age to obtain a jth starting age representation corresponding to the starting age;
extracting a jth latent vector corresponding to the ending age to obtain a jth ending age representation corresponding to the ending age, wherein i and j are integers greater than or equal to 1 and less than or equal to N;
determining a jth target age representation according to the jth starting age representation and the jth ending age representation; and
generating a target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations.
6. The method of claim 5, wherein said generating a target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations comprises:
under the condition that i = j = 1, obtaining a jth intermediate feature according to the jth target age representation and the ith original image feature;
under the condition that i = j > 1, obtaining a jth intermediate feature according to a (j-1)th intermediate feature, the jth target age representation, and the ith original image feature; and
performing convolution on the Nth intermediate feature to generate the target image of the object.
7. The method according to any one of claims 1 to 6, wherein the starting age representation, the ending age representation and the original image features are obtained by processing the starting age, the ending age and the original image respectively using an age transformation model;
the target image is obtained by processing the original image characteristics and the target age representation by using the age transformation model.
8. A method of training an age transformation model, comprising:
training a first generator and a first discriminator by using a first real sample image set and a first simulated sample image set to obtain a trained first generator and a trained first discriminator; and
determining the trained first generator as the age transformation model, wherein the age transformation model is used for generating the target image according to any one of claims 1-7.
9. The method of claim 8, wherein the training the first generator and the first discriminator by using the first real sample image set and the first simulated sample image set to obtain the trained first generator and first discriminator comprises:
generating an additional sample image set, wherein the additional sample image set characterizes a sample image set of at least one preset age interval;
obtaining the first real sample image set according to the initial sample image set and the additional sample image set;
respectively processing the first real sample image set and a real age set corresponding to the first real sample image set to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set;
obtaining the first simulated sample image set according to the real sample image feature set and the real age feature set; and
training the first generator and the first discriminator by using the first real sample image set and the first simulated sample image set to obtain the trained first generator and first discriminator.
10. The method of claim 9, wherein the respectively processing the first real sample image set and the real age set corresponding to the first real sample image set to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set comprises:
processing the first real sample image set by using a coding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set; and
processing a real age set corresponding to the first real sample image set by using a multi-layer perceptron comprised by the first generator to obtain a real age feature set corresponding to the real age set.
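A hedged sketch of claim 10 follows: an encoding module extracts image features from the real sample images while a multi-layer perceptron embeds the real ages. The layer sizes and the age normalization are illustrative assumptions; the patent does not disclose the architectures.

```python
# Stand-in encoding module and age MLP for claim 10 (architectures are assumptions).
import torch
import torch.nn as nn

encoder = nn.Sequential(                        # hypothetical encoding module
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
)
age_mlp = nn.Sequential(                        # hypothetical multi-layer perceptron
    nn.Linear(1, 128), nn.ReLU(),
    nn.Linear(128, 128),
)

images = torch.randn(8, 3, 64, 64)              # a batch from the first real sample image set
ages = torch.randint(0, 100, (8, 1)).float() / 100.0  # real ages, scaled to [0, 1]
image_features = encoder(images)                # real sample image features
age_features = age_mlp(ages)                    # real age features
```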
11. The method of claim 9 or 10, wherein the obtaining the first simulation sample image set according to the real sample image feature set and the real age feature set comprises:
processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain the first simulation sample image set.
12. The method of any one of claims 9 to 11, wherein the training the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set to obtain the trained first generator and the trained first discriminator comprises:
in the case where it is determined that a preset condition is not met, alternately training the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set;
in the case where it is determined that the preset condition is met, determining a second simulation sample image set from the first simulation sample image set;
obtaining a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set;
obtaining a second real sample image set according to the second simulation sample image set and the first real sample image set; and
alternately training the first generator and the first discriminator by using the second real sample image set and the third simulation sample image set.
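The set bookkeeping of claim 12 can be sketched as below. The "preset condition" is assumed here to be a fixed warm-up step count, and make_fakes, pick_subset, d_step and g_step are hypothetical callables; the claim fixes only how the sample sets are combined, not these details.

```python
# A sketch of the alternating training scheme in claim 12 (condition and helpers assumed).
def train_alternately(real_set, make_fakes, pick_subset, d_step, g_step,
                      steps=1000, warmup=500):
    for step in range(steps):
        fake_set = make_fakes(real_set)          # first simulation sample set
        if step < warmup:                        # preset condition not met
            reals, fakes = real_set, fake_set
        else:                                    # preset condition met
            second_fakes = pick_subset(fake_set) # second simulation sample set
            fakes = fake_set + second_fakes      # third simulation sample set
            reals = real_set + second_fakes      # second real sample set
        d_step(reals, fakes)                     # discriminator update, then
        g_step(reals)                            # generator update (alternating)
```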
13. The method of any one of claims 9 to 12, wherein the generating an additional sample image set comprises:
obtaining a direction vector of an attribute axis corresponding to the at least one preset age interval according to a classifier model;
generating at least one first candidate image feature; and
generating the additional sample image set by using the at least one first candidate image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval based on an image generation model.
14. The method of claim 13, further comprising:
generating a candidate sample image set by using the image generation model;
determining a candidate sample image subset from the candidate sample image set, wherein the candidate sample image subset corresponds to the at least one preset age interval and includes at least one candidate sample image; and
training a preset model according to a second candidate image feature and an age category label corresponding to each candidate sample image in the at least one candidate sample image to obtain the classifier model.
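One plausible reading of claim 14 is sketched below: latent features of the candidate sample images, paired with their age-category labels, train a linear classifier, and the unit normal of its decision boundary serves as the attribute-axis direction vector. The linear model (an InterFaceGAN-style construction) is an assumption; the claim names only "a preset model" and "the classifier model".

```python
# Hedged sketch: fit a linear classifier and take its hyperplane normal as the
# attribute-axis direction (the linear form is an assumption, not disclosed).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_attribute_direction(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """features: (M, D) second candidate image features;
    labels: (M,) binary age-category labels for one preset age interval."""
    clf = LogisticRegression(max_iter=1000).fit(features, labels)
    direction = clf.coef_[0]                      # normal of the decision hyperplane
    return direction / np.linalg.norm(direction)  # unit attribute-axis direction vector
```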
15. The method of claim 13 or 14, wherein the generating the additional sample image set by using the at least one first candidate image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval based on the image generation model comprises:
adjusting each first candidate image feature of the at least one first candidate image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval, wherein the expected image feature is an image feature corresponding to an expected age category label; and
inputting the at least one expected image feature into the image generation model to obtain the additional sample image set.
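A short sketch of claim 15: each first candidate image feature is shifted along the attribute-axis direction and then decoded by the image generation model. The fixed step size alpha is an assumption; the claim only requires the adjusted feature to correspond to the expected age-category label.

```python
# Latent editing along the attribute axis (step size alpha is an assumption).
import numpy as np

def make_additional_samples(features, direction, generate, alpha=3.0):
    """features: iterable of (D,) first candidate image features;
    direction: (D,) unit attribute-axis direction vector;
    generate: the image generation model, mapping a feature to an image."""
    expected = [w + alpha * direction for w in features]  # expected image features
    return [generate(w) for w in expected]                # additional sample images
```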
16. The method of any one of claims 13 to 15, further comprising:
generating a fourth simulation sample image set by using a second generator;
alternately training the second generator and a second discriminator by using a third real sample image set and the fourth simulation sample image set to obtain a trained second generator and a trained second discriminator; and
determining the trained second generator as the image generation model.
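For claim 16, the alternation itself can be illustrated with a standard non-saturating GAN loss on flattened images; the architectures, latent size and optimizers below are illustrative assumptions, not the patent's disclosed configuration.

```python
# Minimal alternating generator/discriminator updates (all hyperparameters assumed).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                            # real: (B, 784) third real sample batch
    b = real.size(0)
    z = torch.randn(b, 64)
    fake = G(z)                                  # fourth simulation sample batch
    # Discriminator step (fake detached so only D is updated):
    d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step:
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.randn(16, 784))                 # one alternating update
```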
17. An image generation apparatus comprising:
a first determining module configured to determine a target age interval corresponding to a target age, wherein the target age interval comprises a starting age and an ending age;
a second determining module, configured to determine a target age representation corresponding to the target age according to a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age; and
a first generation module configured to generate a target image of the object based on an image of the object in an original image according to the target age representation and an original image feature corresponding to the original image, wherein the target image is an image in which the age of the object is the target age.
18. The apparatus of claim 17, wherein the second determining module comprises:
a first obtaining submodule configured to interpolate between the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age according to the difference between the target age and the starting age and the difference between the target age and the ending age, respectively, to obtain the target age representation corresponding to the target age.
19. The apparatus of claim 18, wherein the first obtaining submodule comprises:
a first determining unit configured to determine a first difference, wherein the first difference represents the difference between the ending age and the target age;
a second determining unit configured to determine a second difference, wherein the second difference represents the difference between the target age and the starting age;
a third determining unit configured to determine a first ratio, wherein the first ratio represents the ratio of the first difference to the sum of the first difference and the second difference;
a fourth determining unit configured to determine a second ratio, wherein the second ratio represents the ratio of the second difference to the sum of the first difference and the second difference; and
a fifth determining unit, configured to determine a target age representation corresponding to the target age according to the first ratio, the second ratio, a starting age representation corresponding to the starting age, and an ending age representation corresponding to the ending age.
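The arithmetic of claims 18 and 19 is ordinary linear interpolation, sketched below. The pairing of ratios with endpoints is an inference (the first ratio, which shrinks as the target age approaches the ending age, weights the starting representation); the claims list the quantities without fixing this pairing explicitly.

```python
# Interpolating a target age representation between interval endpoints.
import torch

def target_age_representation(start_rep, end_rep, start_age, end_age, target_age):
    first_diff = end_age - target_age            # first difference
    second_diff = target_age - start_age         # second difference
    total = first_diff + second_diff             # = end_age - start_age
    first_ratio = first_diff / total             # weights the starting representation
    second_ratio = second_diff / total           # weights the ending representation
    return first_ratio * start_rep + second_ratio * end_rep

rep = target_age_representation(torch.zeros(4), torch.ones(4), 30, 40, 33)
print(rep)  # tensor([0.3000, 0.3000, 0.3000, 0.3000]): 30% of the way from 30 to 40
```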
20. The apparatus of any one of claims 17 to 19, wherein the starting age representation, the ending age representation and the original image features respectively comprise N starting age representations, N ending age representations and N original image features, N being an integer greater than or equal to 1;
the first generation module comprises:
a second obtaining submodule configured to extract an ith latent space representation corresponding to the original image to obtain an ith original image feature corresponding to the original image;
a third obtaining submodule configured to extract a jth latent vector corresponding to the starting age to obtain a jth starting age representation corresponding to the starting age;
a fourth obtaining submodule configured to extract a jth latent vector corresponding to the ending age to obtain a jth ending age representation corresponding to the ending age, wherein i and j are integers greater than or equal to 1 and less than or equal to N;
a first determining submodule configured to determine a jth target age representation according to the jth starting age representation and the jth ending age representation; and
a first generation submodule configured to generate a target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations.
21. The apparatus of claim 20, wherein the first generation submodule comprises:
a first obtaining unit configured to obtain, in the case where i = j = 1, a jth intermediate feature according to the jth target age representation and the ith original image feature;
a second obtaining unit configured to obtain, in the case where i = j > 1, a jth intermediate feature according to a (j-1)th intermediate feature, the jth target age representation and the ith original image feature; and
a first generation unit configured to perform convolution on the Nth intermediate feature to generate the target image of the object.
22. The apparatus according to any one of claims 17 to 21, wherein the starting age representation, the ending age representation and the original image features are obtained by processing the starting age, the ending age and the original image respectively using an age transformation model;
the target image is obtained by processing the original image features and the target age representation using the age transformation model.
23. An age transformation model training apparatus, comprising:
a training module configured to train a first generator and a first discriminator by using a first real sample image set and a first simulation sample image set to obtain a trained first generator and a trained first discriminator; and
a third determining module configured to determine the trained first generator as the age transformation model, wherein the age transformation model is used by the apparatus according to any one of claims 17 to 22 to generate the target image.
24. The apparatus of claim 23, wherein the training module comprises:
a second generation submodule configured to generate an additional sample image set, wherein the additional sample image set represents a sample image set of at least one preset age interval;
a fifth obtaining submodule configured to obtain the first real sample image set according to an initial sample image set and the additional sample image set;
a sixth obtaining submodule, configured to process the first real sample image set and the real age set corresponding to the first real sample image set, respectively, to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set;
a seventh obtaining submodule, configured to obtain the first simulated sample image set according to the real sample image feature set and the real age feature set; and
an eighth obtaining submodule configured to train the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set to obtain the trained first generator and the trained first discriminator.
25. The apparatus of claim 24, wherein the sixth obtaining submodule comprises:
a first obtaining unit configured to process the first real sample image set by using an encoding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set; and
a second obtaining unit configured to process the real age set corresponding to the first real sample image set by using a multi-layer perceptron included in the first generator to obtain a real age feature set corresponding to the real age set.
26. The apparatus of claim 24 or 25, wherein the seventh obtaining submodule comprises:
a third obtaining unit configured to process the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain the first simulation sample image set.
27. The apparatus of any one of claims 24 to 26, wherein the eighth obtaining submodule comprises:
a first training unit configured to alternately train the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set in the case where it is determined that a preset condition is not met;
a sixth determining unit configured to determine a second simulation sample image set from the first simulation sample image set in the case where it is determined that the preset condition is met;
a fourth obtaining unit, configured to obtain a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set;
a fifth obtaining unit, configured to obtain a second real sample image set according to the second simulation sample image set and the first real sample image set; and
a second training unit configured to alternately train the first generator and the first discriminator by using the second real sample image set and the third simulation sample image set.
28. The apparatus of any one of claims 23 to 27, wherein the second generation submodule comprises:
a sixth obtaining unit configured to obtain a direction vector of an attribute axis corresponding to the at least one preset age interval according to a classifier model;
a second generating unit configured to generate at least one first candidate image feature; and
a third generating unit configured to generate the additional sample image set by using the at least one first candidate image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval based on an image generation model.
29. The apparatus of claim 28, further comprising:
a second generation module configured to generate a candidate sample image set by using the image generation model;
a fourth determining module configured to determine a candidate sample image subset from the candidate sample image set, wherein the candidate sample image subset corresponds to the at least one preset age interval and includes at least one candidate sample image; and
a first obtaining module configured to train a preset model according to a second candidate image feature and an age category label corresponding to each candidate sample image in the at least one candidate sample image to obtain the classifier model.
30. The apparatus of claim 28 or 29, wherein the third generating unit comprises:
an adjusting subunit configured to adjust each first candidate image feature of the at least one first candidate image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval, wherein the expected image feature is an image feature corresponding to an expected age category label; and
an obtaining subunit configured to input the at least one expected image feature into the image generation model to obtain the additional sample image set.
31. The apparatus of any one of claims 28 to 30, further comprising:
a third generation module configured to generate a fourth simulation sample image set by using a second generator;
a second obtaining module configured to alternately train the second generator and a second discriminator by using a third real sample image set and the fourth simulation sample image set to obtain a trained second generator and a trained second discriminator; and
a fifth determining module configured to determine the trained second generator as the image generation model.
32. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7 or any one of claims 8 to 16.
33. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-7 or any of claims 8-16.
34. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7 or any one of claims 8 to 16.
CN202111184646.6A 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium Active CN113902957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111184646.6A CN113902957B (en) 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111184646.6A CN113902957B (en) 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN113902957A true CN113902957A (en) 2022-01-07
CN113902957B CN113902957B (en) 2024-02-09

Family

ID=79191390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111184646.6A Active CN113902957B (en) 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113902957B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977629A (en) * 2017-12-04 2018-05-01 电子科技大学 Facial image aging synthesis method based on feature-separation adversarial network
CN109308450A (en) * 2018-08-08 2019-02-05 杰创智能科技股份有限公司 Face change prediction method based on a generative adversarial network
CN111402113A (en) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method and device, electronic equipment and computer-readable medium
CN111898482A (en) * 2020-07-14 2020-11-06 贵州大学 Face prediction method based on a progressive generative adversarial network
CN111985405A (en) * 2020-08-21 2020-11-24 南京理工大学 Face age synthesis method and system
CN113392769A (en) * 2021-06-16 2021-09-14 广州繁星互娱信息科技有限公司 Face image synthesis method and device, electronic equipment and storage medium
US11120526B1 (en) * 2019-04-05 2021-09-14 Snap Inc. Deep feature generative adversarial neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHI, Chenglong et al.: "CAN-GAN: Conditioned-attention normalized GAN for face age synthesis", PATTERN RECOGNITION LETTERS, vol. 138, pages 520-526, XP086293551, DOI: 10.1016/j.patrec.2020.08.021 *
LIU, Yang et al.: "Measurement Data Processing and Error Theory", Atomic Energy Press, page 293 *
LIU, Lu: "Research on Face Aging Synthesis Based on Deep Learning", China Doctoral Dissertations Full-text Database, Information Science and Technology, vol. 2021, no. 03, pages 138-26 *

Also Published As

Publication number Publication date
CN113902957B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111695415B (en) Image recognition method and related equipment
US11521221B2 (en) Predictive modeling with entity representations computed from neural network models simultaneously trained on multiple tasks
CN112800292B (en) Cross-modal retrieval method based on modal specific and shared feature learning
CN114612290A (en) Training method of image editing model and image editing method
CN113379059B (en) Model training method for quantum data classification and quantum data classification method
CN112580733A (en) Method, device and equipment for training classification model and storage medium
CN114863229A (en) Image classification method and training method and device of image classification model
CN115796310A (en) Information recommendation method, information recommendation device, information recommendation model training device, information recommendation equipment and storage medium
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN114360027A (en) Training method and device for feature extraction network and electronic equipment
CN113379594A (en) Face shape transformation model training, face shape transformation method and related device
CN112036439B (en) Dependency relationship classification method and related equipment
CN113591881A (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN116089586B (en) Question generation method based on text and training method of question generation model
CN113536845A (en) Face attribute recognition method and device, storage medium and intelligent equipment
CN116975347A (en) Image generation model training method and related device
JP2023017983A (en) Information generation model training method, information generation method, apparatus, electronic device, storage medium, and computer program
CN115879958A (en) Foreign-involved sales call decision method and system based on big data
CN113723111B (en) Small sample intention recognition method, device, equipment and storage medium
CN115481285A (en) Cross-modal video text matching method and device, electronic equipment and storage medium
CN115481255A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN115310590A (en) Graph structure learning method and device
CN113902957A (en) Image generation method, model training method, device, electronic device and medium
CN114969577A (en) Interest point recommendation method and interest point recommendation model training method and device
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant