CN113902957B - Image generation method, training method and device of model, electronic equipment and medium - Google Patents


Info

Publication number
CN113902957B
Authority
CN
China
Prior art keywords
age
image
target
sample image
real
Prior art date
Legal status
Active
Application number
CN202111184646.6A
Other languages
Chinese (zh)
Other versions
CN113902957A (en)
Inventor
尚太章
何声一
刘家铭
洪智滨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111184646.6A
Publication of CN113902957A
Application granted
Publication of CN113902957B
Legal status: Active


Classifications

    • G (Physics) › G06 (Computing; Calculating or Counting) › G06F (Electric Digital Data Processing) › G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G (Physics) › G06 (Computing; Calculating or Counting) › G06N (Computing Arrangements Based on Specific Computational Models) › G06N3/045 — Neural networks; Architecture; Combinations of networks
    • G (Physics) › G06 (Computing; Calculating or Counting) › G06N (Computing Arrangements Based on Specific Computational Models) › G06N3/08 — Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image generation method, a training method and apparatus of a model, an electronic device, and a medium, relating to the field of artificial intelligence, in particular to computer vision and deep learning technology, and applicable to scenes such as face image processing and face recognition. The scheme is as follows: determining a target age interval corresponding to a target age, where the target age interval includes a start age and an end age; determining a target age characterization corresponding to the target age according to a start age characterization corresponding to the start age and an end age characterization corresponding to the end age; and generating, according to the target age characterization and the original image features corresponding to an original image, a target image of the object in the original image based on the image of the object, where the target image is an image in which the age of the object is the target age.

Description

Image generation method, training method and device of model, electronic equipment and medium
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and can be applied to scenes such as face image processing and face recognition. More particularly, it relates to an image generation method, a training method of a model, an apparatus, an electronic device, and a medium.
Background
Computer vision technology has attracted increasing attention in the era of artificial intelligence. Object attribute editing is an important research direction within computer vision.
Age transformation is one form of object attribute editing. It refers to rendering the object in an image at a specified age while leaving the identity of the object unchanged.
Disclosure of Invention
The disclosure provides an image generation method, a training method and device of a model, electronic equipment and a medium.
According to an aspect of the present disclosure, there is provided an image generation method including: determining a target age interval corresponding to a target age, where the target age interval includes a start age and an end age; determining a target age characterization corresponding to the target age according to a start age characterization corresponding to the start age and an end age characterization corresponding to the end age; and generating, according to the target age characterization and the original image features corresponding to an original image, a target image of the object based on the image of the object in the original image, where the target image is an image in which the age of the object is the target age.
According to another aspect of the present disclosure, there is provided a training method of an age transformation model, including: training a first generator and a first discriminator by using a first real sample image set and a first simulated sample image set to obtain a trained first generator and a trained first discriminator; and determining the trained first generator as the age transformation model, where the age transformation model is used to generate the target image described above.
According to another aspect of the present disclosure, there is provided an image generation apparatus including: a first determining module configured to determine a target age interval corresponding to a target age, where the target age interval includes a start age and an end age; a second determining module configured to determine a target age characterization corresponding to the target age according to a start age characterization corresponding to the start age and an end age characterization corresponding to the end age; and a first generation module configured to generate, according to the target age characterization and the original image features corresponding to an original image, a target image of the object based on the image of the object in the original image, where the target image is an image in which the age of the object is the target age.
According to another aspect of the present disclosure, there is provided a training apparatus of an age transformation model, including: a training module configured to train a first generator and a first discriminator by using a first real sample image set and a first simulated sample image set to obtain a trained first generator and a trained first discriminator; and a third determining module configured to determine the trained first generator as the age transformation model, where the age transformation model is used to generate the target image described above.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the image generation method and the training method and apparatus of the age transformation model may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an image generation method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of an image generation process according to an embodiment of the present disclosure;
FIG. 4A schematically illustrates a flowchart of a training method of an age transformation model according to an embodiment of the present disclosure;
FIG. 4B schematically illustrates a flowchart of a training method of an age transformation model according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates an example schematic diagram of a training process of an age transformation model according to an embodiment of the present disclosure;
FIG. 6A schematically illustrates an example schematic diagram of an image generation process according to an embodiment of the disclosure;
FIG. 6B schematically illustrates an example schematic diagram of a process of processing a start age with a multi-layer perceptron to obtain a start age characterization corresponding to the start age, in accordance with an embodiment of the present disclosure;
FIG. 6C schematically illustrates an example schematic diagram of a process of processing an original image with an encoding module to obtain an original image feature corresponding to the original image, according to an embodiment of the disclosure;
FIG. 6D schematically illustrates an example schematic diagram of a process of processing the original image features and the target age characterization with a decoding module to obtain a target image of the object, in accordance with an embodiment of the present disclosure;
FIG. 6E schematically illustrates an example schematic diagram of a continuous age transformation according to an embodiment of the disclosure;
fig. 7 schematically shows a block diagram of an image generating apparatus according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of an age transformation model according to an embodiment of the present disclosure; and
fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement the image generation method and the training method of the age transformation model, according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Object attribute editing may involve editing different attributes. The attributes may include at least one of facial features, facial skin features, hair features, neck skin features, and the like. For example, object attribute editing may include removing or adding glasses, removing or adding bangs, editing hair color, editing facial features, or editing makeup. If the edited attribute is a single attribute whose label is decoupled from other attributes, the editing can be realized relatively easily. If multiple attributes are edited and those attributes may be coupled with one another, the editing is considerably more difficult.
Age transformation is an object attribute edit involving multiple attributes with coupling between them. Age transformation may be implemented in the following ways.
One way is a physical-model-based age transformation method. That is, the laws governing the physical change mechanisms of the subject's aging process, for example, the laws of facial texture, shape, and bone changes, are studied, and these laws are applied to the original image to obtain a target image at the target age.
Another way is a deep-learning-based age transformation method. That is, based on a deep learning model, the mapping relations between different age intervals are learned using sample image sets of those intervals to obtain an age transformation model, and the age transformation model is then used to generate a target image at the target age.
Both of these ways perform discrete age transformation and cannot realize continuous age transformation.
To this end, embodiments of the present disclosure propose an image generation scheme capable of continuous age transformation. That is, a target age interval corresponding to the target age is determined, the target age interval including a start age and an end age. A target age characterization corresponding to the target age is determined according to a start age characterization corresponding to the start age and an end age characterization corresponding to the end age. According to the target age characterization and the original image features corresponding to the original image, a target image in which the age of the object is the target age is generated based on the image of the object in the original image.
Because the target age characterization is determined from the start age characterization corresponding to the start age and the end age characterization corresponding to the end age, a characterization can be obtained for any target age, so the target age may take continuous values. Accordingly, a target image can be obtained for a continuously valued target age, realizing continuous age transformation.
Fig. 1 schematically illustrates an exemplary system architecture to which the image generation method and the training method and apparatus of the age transformation model may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the image generation method, the age transformation model training method, and the apparatus may be applied may include a terminal device, but the terminal device may implement the image generation method, the age transformation model training method, and the apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be various types of servers providing various services, such as a background management server (by way of example only) that provides support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services. The server 105 may also be a server of a distributed system or a server incorporating a blockchain.
It should be noted that the image generation method and the training method of the age transformation model provided by the embodiments of the present disclosure may generally be performed by the terminal device 101, 102, or 103. Accordingly, the image generation apparatus and the training apparatus of the age transformation model provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the image generation method and the training method of the age transformation model provided by the embodiments of the present disclosure may also generally be performed by the server 105. Accordingly, the image generation apparatus and the training apparatus of the age transformation model provided by the embodiments of the present disclosure may generally be provided in the server 105. The image generation method and the training method of the age transformation model provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image generation apparatus and the training apparatus of the age transformation model provided by the embodiments of the present disclosure may also be provided in such a server or server cluster.
For example, the server 105 determines a target age interval corresponding to the target age, the target age interval including a start age and an end age. A target age characterization corresponding to the target age is determined according to a start age characterization corresponding to the start age and an end age characterization corresponding to the end age. A target image of the object is then generated based on the image of the object in the original image, according to the target age characterization and the original image features corresponding to the original image; the target image is an image in which the age of the object is the target age. Alternatively, the target image may be generated by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the server 105 trains the first generator and the first discriminator using the first real sample image set and the first simulated sample image set to obtain a trained first generator and a trained first discriminator, and determines the trained first generator as the age transformation model. The age transformation model is used to generate the target image described in the present disclosure. Alternatively, the training may be performed by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, which likewise determines the trained first generator as the age transformation model.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flowchart of an image generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 includes operations S210 to S230.
In operation S210, a target age interval corresponding to the target age is determined. The target age interval includes a start age and an end age.
In operation S220, a target age characterization corresponding to the target age is determined from a start age characterization corresponding to the start age and an end age characterization corresponding to the end age.
In operation S230, a target image of the object is generated based on the image of the object in the original image, according to the target age characterization and the original image features corresponding to the original image. The target image is an image in which the age of the object is the target age.
According to an embodiment of the present disclosure, the original image may include an object, and may be an image in which the age of the object is the original age. The target image may be an image in which the age of the object is the target age. The original image and the target image include the same object. The target age interval may include a start age and an end age, and may be determined from a first age interval and a second age interval, which may be two adjacent age intervals. The first age interval includes its own start age and end age, as does the second age interval. The start age of the target age interval may be determined from the start age and end age of the first age interval, and the end age of the target age interval may be determined from the start age and end age of the second age interval. For example, the start age of the target age interval may be the average of the start age and end age of the first age interval, and the end age of the target age interval may be the average of the start age and end age of the second age interval.
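The midpoint construction above can be sketched as a short helper. This is an illustrative function, not code from the patent; the interval representation (tuples of bounds) and the midpoint rule follow the example in the text.

```python
def target_interval(first_interval, second_interval):
    """Derive the target age interval from two adjacent age intervals.

    The start of the target interval is the midpoint of the first
    interval, and its end is the midpoint of the second interval,
    following the example described above.
    """
    s1, e1 = first_interval
    s2, e2 = second_interval
    start = (s1 + e1) / 2  # average of the first interval's bounds
    end = (s2 + e2) / 2    # average of the second interval's bounds
    return start, end

# e.g. adjacent intervals [20, 30) and [30, 40) give target interval (25.0, 35.0)
print(target_interval((20, 30), (30, 40)))
```

Under this construction the target interval is centered on the boundary shared by the two adjacent intervals, so every age falls inside exactly one such shifted interval.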
According to embodiments of the present disclosure, a start age characterization may be used to characterize the start age, an end age characterization to characterize the end age, and a target age characterization to characterize the target age. The start age characterization may be obtained by extracting features from the start age, and the end age characterization by extracting features from the end age. The original image features may be used to characterize the original image and may be obtained by extracting features from it. For example, a model obtained by training a preset model with training samples may be used to extract features from the start age, the end age, and the original image, thereby obtaining the start age characterization, the end age characterization, and the original image features. The preset model may include at least one of a machine learning model, a deep learning model, a reinforcement learning model, and a transfer learning model. The number of start age characterizations, end age characterizations, target age characterizations, and original image features may each be one or more. The feature extraction methods described above are merely exemplary; any feature extraction method known in the art may be used, as long as feature extraction can be achieved.
According to the embodiments of the present disclosure, after the target age is acquired, the target age interval corresponding to it, that is, the target age interval to which the target age belongs, may be determined. After determining the target age interval, the target age characterization corresponding to the target age may be determined, based on interpolation, from each start age characterization corresponding to at least one start age and each end age characterization corresponding to at least one end age. The interpolation method may be linear or nonlinear.
According to embodiments of the present disclosure, after the target age characterization is determined, a target image including the object in the original image may be generated according to the target age characterization and the original image features corresponding to the original image. The age of the object in the target image is the target age; that is, the age of the object in the original image can be converted from the original age to the target age according to the target age characterization and the original image features, so as to obtain the target image. For example, the original image features and the target age characterization can be processed by a model obtained by training a preset model with training samples, so as to obtain the target image. The generation method described above is merely exemplary; any generation method known in the art may be used, as long as the target image can be generated.
According to the embodiment of the present disclosure, because the target age characterization is determined from the start age characterization corresponding to the start age and the end age characterization corresponding to the end age, a corresponding characterization can be obtained for any target age, so that the target age may take continuous values. Therefore, based on the target age characterization and the original image features, a target image can be obtained for a continuously valued target age, realizing continuous age transformation.
According to an embodiment of the present disclosure, operation S220 may include the following operations.
Interpolation is performed between the start age characterization corresponding to the start age and the end age characterization corresponding to the end age, according to the differences between the target age and the start and end ages respectively, to obtain the target age characterization corresponding to the target age.

According to embodiments of the present disclosure, the differences between the target age and the start age and between the target age and the end age may be determined, and from these differences the proportions of the start age characterization and the end age characterization in the target age characterization may be determined. Interpolating between the start age characterization and the end age characterization according to these proportions yields the target age characterization. A proportion characterizes the role its corresponding characterization plays in the target age characterization: the larger the proportion, the greater its effect; and the larger the difference, the smaller the proportion. For example, if the difference between the target age and the start age is less than the difference between the target age and the end age, the proportion of the start age characterization in the target age characterization may be greater than that of the end age characterization.

In this way, the target age characterization corresponding to the target age is obtained by interpolating between the start age characterization and the end age characterization according to the differences between the target age and the start and end ages, respectively.
According to an embodiment of the present disclosure, interpolating between the start age characterization and the end age characterization according to the differences between the target age and the start and end ages, to obtain the target age characterization corresponding to the target age, may include the following operations.
A first difference is determined, characterizing the difference between the end age and the target age. A second difference is determined, characterizing the difference between the target age and the start age. A first ratio is determined, characterizing the ratio of the first difference to the sum of the first difference and the second difference. A second ratio is determined, characterizing the ratio of the second difference to the sum of the first difference and the second difference. The target age characterization corresponding to the target age is then determined according to the first ratio, the second ratio, the start age characterization corresponding to the start age, and the end age characterization corresponding to the end age.
According to an embodiment of the present disclosure, the sum of the first difference and the second difference is determined, i.e. the difference between the ending age and the starting age. A first ratio of the first difference to the difference between the ending age and the starting age is determined, and a second ratio of the second difference to the difference between the ending age and the starting age is determined. Since the first difference measures how far the target age is from the ending age, the first ratio may characterize the proportion occupied by the start age characterization in the target age characterization; likewise, the second ratio may characterize the proportion occupied by the end age characterization in the target age characterization.
According to the embodiment of the disclosure, the starting age characteristic and the ending age characteristic can be processed based on the first ratio and the second ratio to obtain the target age characteristic.
According to an embodiment of the present disclosure, determining a target age characteristic corresponding to a target age according to a first ratio, a second ratio, a start age characteristic corresponding to a start age, and an end age characteristic corresponding to an end age may include the following operations.
A first product is determined. The first product characterizes a product between the first ratio and a start age characterization corresponding to the start age. A second product is determined. The second product characterizes a product between the second ratio and an ending age characterization corresponding to the ending age. The sum between the first product and the second product is determined as a target age characteristic corresponding to the target age.
According to embodiments of the present disclosure, the first ratio may be multiplied by the start age characteristic to obtain a first product. Multiplying the second ratio by the end age representation to obtain a second product. And determining the sum of the first product and the second product as the target age characteristic.
According to an embodiment of the present disclosure, the target age characteristic may be determined according to the following formulas (1) to (5).
d1 = End_age - Target_age (1)

d2 = Target_age - Start_age (2)

α = d1 / (d1 + d2) (3)

β = d2 / (d1 + d2) (4)

Target_age_casting = α × Start_age_casting + β × End_age_casting (5)

According to an embodiment of the present disclosure, Start_age characterizes the start age, End_age characterizes the end age, and Target_age characterizes the target age. Start_age_casting represents the start age characterization corresponding to the start age Start_age, End_age_casting represents the end age characterization corresponding to the end age End_age, and Target_age_casting represents the target age characterization corresponding to the target age Target_age. d1 characterizes the first difference, d2 characterizes the second difference, α characterizes the first ratio, and β characterizes the second ratio.
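As a concrete illustration, the interpolation of formulas (1) to (5) can be sketched in a few lines of Python. The function and argument names are illustrative assumptions, not identifiers from the disclosure, and the characterizations are represented as plain lists of floats.

```python
def interpolate_age_embedding(target_age, start_age, end_age,
                              start_emb, end_emb):
    """Linearly interpolate an age characterization inside [start_age, end_age]."""
    d1 = end_age - target_age      # first difference, formula (1)
    d2 = target_age - start_age    # second difference, formula (2)
    alpha = d1 / (d1 + d2)         # first ratio, weight of the start characterization (3)
    beta = d2 / (d1 + d2)          # second ratio, weight of the end characterization (4)
    # Formula (5): element-wise weighted sum of the two characterizations.
    return [alpha * s + beta * e for s, e in zip(start_emb, end_emb)]
```

A target age closer to the start age yields a larger d1 and hence a larger weight on the start age characterization, matching the description above.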
According to an embodiment of the present disclosure, the start age characterizations, the end age characterizations, and the original image features may include N start age characterizations, N end age characterizations, and N original image features, respectively, N being an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the above-described image generation method may further include the following operations.
An ith latent space representation corresponding to the original image is extracted to obtain an ith original image feature corresponding to the original image. A jth latent vector corresponding to the starting age is extracted to obtain a jth starting age characterization corresponding to the starting age. A jth latent vector corresponding to the ending age is extracted to obtain a jth ending age characterization corresponding to the ending age. i and j are integers greater than or equal to 1 and less than or equal to N. A jth target age characterization is determined according to the jth starting age characterization and the jth ending age characterization. A target image of the object is generated based on the image of the object in the original image according to the N original image features and the N target age characterizations.

According to the embodiment of the present disclosure, the value of N may be configured according to actual service requirements, which is not limited herein. For example, N = 6, i ∈ {1, 2, ..., N−1, N}, and j ∈ {1, 2, ..., N−1, N}.
According to embodiments of the present disclosure, the jth latent vector corresponding to the starting age may be extracted using a multi-layer perceptron (MLP) to obtain the jth starting age characterization corresponding to the starting age. The jth latent vector corresponding to the ending age may likewise be extracted with the multi-layer perceptron to obtain the jth ending age characterization corresponding to the ending age. An encoding module may be utilized to extract the ith latent space representation corresponding to the original image, so as to obtain the ith original image feature corresponding to the original image. Thus, the 1st to Nth original image features, the 1st to Nth starting age characterizations, and the 1st to Nth ending age characterizations can be obtained.
According to an embodiment of the present disclosure, generating a target image of an object based on an image of the object in an original image according to N original image features and N target age characterizations may include the following operations.
In the case of i=j=1, the jth intermediate feature is obtained from the jth target age characteristic and the ith original image feature. In the case where i=j > 1, the jth intermediate feature is obtained from the (j-1) th intermediate feature, the jth target age characteristic, and the ith original image feature. And convolving the Nth intermediate feature to generate a target image of the object.
According to the embodiment of the disclosure, the jth target age characterization and the ith original image feature corresponding to the original image may be processed by the first generator to obtain the jth intermediate feature. The first generator may include a decoding module, and the decoding module may include N feature extraction units in cascade. The feature extraction unit of each level has an intermediate feature corresponding thereto, and the feature extraction units of different levels extract features from the intermediate features corresponding thereto. That is, the feature extraction unit of the pth level has the jth intermediate feature corresponding thereto, where p ∈ {1, 2, ..., N−1, N}.
According to an embodiment of the present disclosure, for the p = 1st feature extraction unit, the input may include two parts: the j = 1st target age characterization and the i = 1st original image feature corresponding to that unit. Its output is the j = 1st intermediate feature, obtained by processing those two inputs.

According to an embodiment of the present disclosure, for each feature extraction unit other than the p = 1st one, the input may include three parts: the intermediate feature output by the previous feature extraction unit, the target age characterization corresponding to the unit, and the original image feature corresponding to the unit. Its output is the intermediate feature corresponding to the unit; that is, the pth feature extraction unit with p > 1 processes the (j−1)th intermediate feature, the jth target age characterization, and the ith original image feature to obtain the jth intermediate feature.
For example, N = 4, i ∈ {1, 2, 3, 4}, j ∈ {1, 2, 3, 4}. For i = j = 1, the 1st intermediate feature is obtained from the 1st target age characterization and the 1st original image feature. For i = j = 2, the 2nd intermediate feature is obtained from the 1st intermediate feature, the 2nd target age characterization, and the 2nd original image feature. For i = j = 3, the 3rd intermediate feature is obtained from the 2nd intermediate feature, the 3rd target age characterization, and the 3rd original image feature. For i = j = 4, the 4th intermediate feature is obtained from the 3rd intermediate feature, the 4th target age characterization, and the 4th original image feature. The 4th intermediate feature is convolved to generate the target image of the object.
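The cascaded decoding described above can be sketched as follows, assuming each feature extraction unit is modeled as a callable taking the previous intermediate feature (None for the first unit), an age characterization, and an image feature; all names are illustrative assumptions, not identifiers from the disclosure.

```python
def decode(age_embs, img_feats, units, final_conv):
    """Run N cascaded feature extraction units, then the final convolution.

    age_embs, img_feats: the N target age characterizations and N original
    image features; units: the N feature extraction units; final_conv: the
    convolution applied to the Nth intermediate feature.
    """
    # j = 1: no previous intermediate feature exists yet.
    h = units[0](None, age_embs[0], img_feats[0])
    # j = 2..N: each unit also consumes the (j-1)th intermediate feature.
    for j in range(1, len(units)):
        h = units[j](h, age_embs[j], img_feats[j])
    return final_conv(h)
```

With real tensors the units would be learned modules; here scalar stand-ins suffice to show the data flow.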
An image generation method according to an embodiment of the present disclosure is further described below with reference to fig. 3.
Fig. 3 schematically illustrates a schematic diagram of an image generation process according to an embodiment of the present disclosure.
As shown in fig. 3, in the image generation process 300, a target age 301 is acquired, and a target age interval corresponding to the target age 301 is determined, the target age interval including a start age 302 and an end age 303. A target age characterization 306 is determined from a start age characterization 304 corresponding to the start age 302 and an end age characterization 305 corresponding to the end age 303. A target image 309 of the object is generated from the target age characterization 306 and the original image features 308 corresponding to the original image 307.
According to an embodiment of the present disclosure, the start age characterization, the end age characterization, and the original image features are obtained by processing the start age, the end age, and the original image, respectively, using an age transformation model. The target image is obtained by processing the original image features and the target age characterization with the age transformation model.
According to embodiments of the present disclosure, the age transformation model may be trained on a deep learning model using training samples. The training samples may include a sample image set.
Fig. 4A schematically illustrates a flowchart of a training method of an age transformation model according to an embodiment of the present disclosure.
As shown in fig. 4A, the method 400A includes operations S410-S420.
In operation S410, the first generator and the first discriminator are trained using the first real sample image set and the first simulated sample image set, resulting in a trained first generator and first discriminator.
In operation S420, the trained first generator is determined as an age transformation model. The age transformation model is used to generate the target image according to the disclosed embodiments.
According to an embodiment of the present disclosure, the first set of real sample images may comprise at least one first real sample image, and the set of real ages may include at least one real age. Each first real sample image has a corresponding real age, which characterizes the age to which the object in that image is to be transformed. Each real age may be an encoded age. The real age may be encoded by determining the age interval corresponding to the real age, setting each dimension of that age interval to a first preset identifier, and setting each dimension of every other age interval to a second preset identifier, thereby obtaining the encoded age. The first preset identifier and the second preset identifier may be configured according to actual service requirements, which is not limited herein. For example, the first preset identifier may be set to 1 and the second preset identifier may be set to 0.
For example, there may be M age intervals, each age interval may have U dimensions, and each age may therefore have M × U dimensions, M being an integer greater than or equal to 1 and U being an integer greater than or equal to 1. Each age can be characterized by a = {a1, a2, ..., aK, ..., aM−1, aM}, where aK characterizes the Kth age interval, K ∈ {1, 2, ..., M−1, M}, and aK = {aK1, aK2, ..., aKl, ..., aKU−1, aKU}, where aKl characterizes the lth dimension of the Kth age interval, l ∈ {1, 2, ..., U−1, U}. When the age interval corresponding to the true age is determined to be a1, each dimension of a1 is set to 1 and every other dimension is set to 0.
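A minimal sketch of this encoding, assuming M age intervals of U dimensions each and the preset identifiers 1 and 0 from the example above; the function name and flat-list layout are illustrative assumptions.

```python
def encode_age(interval_index, m, u):
    """Return an M*U-dimensional age code: the U dimensions of the chosen
    age interval are set to 1, every other dimension to 0."""
    code = [0] * (m * u)
    for l in range(u):
        code[interval_index * u + l] = 1
    return code
```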
According to an embodiment of the present disclosure, the first set of simulated sample images comprises at least one first simulated sample image. The first generator may be used to generate the first set of simulated sample images. The first generator is continuously trained to learn the data distribution of the first real sample image set, so that it can gradually generate, from scratch, samples conforming to that data distribution and confuse the first discriminator as far as possible. The first discriminator may be used to distinguish between the first set of real sample images and the first set of simulated sample images.
According to the embodiment of the disclosure, the first generator and the first discriminator are trained in an iterative, alternating manner using the first real sample image set and the first simulated sample image set, so that the two models optimize each other through the game between them, until the first discriminator can no longer accurately distinguish the first real sample image set from the first simulated sample image set, i.e., a Nash equilibrium is reached. In this case, the first generator can be considered to have learned the data distribution of the first real sample image set, and the trained first generator is determined as the age transformation model.
According to an embodiment of the present disclosure, alternately training the first generator and the first discriminator using the first set of real sample images and the first set of simulated sample images may include: in each iteration, while keeping the model parameters of the first generator unchanged, training the first discriminator with the first real sample image set and the first simulated sample image set until the number of training passes set for the first discriminator for that iteration is completed; then, while keeping the model parameters of the first discriminator unchanged, training the first generator with the first simulated sample image set until the number of training passes set for the first generator for that iteration is completed. It should be noted that in each training pass the first generator may be used to generate the first set of simulated sample images corresponding to that pass. The training patterns of the first generator and the first discriminator described above are only exemplary embodiments, but are not limited thereto, and may include training patterns known in the art, as long as training of the first generator and the first discriminator can be achieved.
According to the embodiment of the disclosure, an appropriate training strategy can be selected according to actual service requirements, which is not limited herein. For example, in each iteration the training strategy may be one of the following: the first generator and the first discriminator are each trained once; the first generator is trained once and the first discriminator multiple times; the first generator is trained multiple times and the first discriminator once; or both are trained multiple times.
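The alternating schedule described above can be sketched as follows, with the discriminator and generator update steps abstracted as callables; d_steps and g_steps correspond to the per-iteration training counts of the chosen strategy, and all names are illustrative assumptions.

```python
def train_alternately(d_step, g_step, iterations, d_steps=1, g_steps=1):
    """Alternate d_steps discriminator updates (generator frozen) with
    g_steps generator updates (discriminator frozen) per iteration."""
    history = []
    for _ in range(iterations):
        for _ in range(d_steps):   # generator parameters held fixed
            history.append(d_step())
        for _ in range(g_steps):   # discriminator parameters held fixed
            history.append(g_step())
    return history
```

In a real pipeline the two callables would compute losses and apply gradient updates; the returned history here just records the update order.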
The training method and the image generation method of the age conversion model according to the embodiment of the present disclosure are further described below with reference to fig. 4B, 5, 6A, 6B, 6C, 6D, and 6E.
In the two age transformation implementations described above, the number of sample images for some age intervals is small; for example, the numbers of sample images for the elderly and infant age intervals are small. Therefore, the prediction accuracy of an age transformation model obtained by deep learning training on such data is low. To this end, the embodiments of the present disclosure provide a training scheme for the age transformation model, described below with reference to fig. 4B.
Fig. 4B schematically illustrates a flowchart of a training method of an age transformation model according to another embodiment of the present disclosure.
As shown in fig. 4B, the method 400B includes operations S411, S412, S413, S414, S415, and S420.
In operation S411, an additional sample image set is generated. The additional sample image set characterizes the sample image set of at least one preset age interval.
In operation S412, a first set of true sample images is obtained from the initial set of sample images and the additional set of sample images.
In operation S413, the first real sample image set and the real age set corresponding to the first real sample image set are respectively processed to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set.
In operation S414, a first set of simulated sample images is obtained from the real sample image feature set and the real age feature set.
In operation S415, the first generator and the first discriminator are trained using the first set of real sample images and the first set of simulated sample images, resulting in a trained first generator and first discriminator.
In operation S420, the trained first generator is determined as an age transformation model.
According to an embodiment of the present disclosure, a preset age interval may refer to an age interval in which the number of sample images is less than or equal to a number threshold, i.e., an age interval with few sample images. A preset age interval has a start age and an end age. The at least one preset age interval may include at least one of: an age interval whose start age is greater than or equal to a first age threshold, and an age interval whose end age is less than or equal to a second age threshold, the first age threshold being greater than the second age threshold. An interval whose start age is greater than or equal to the first age threshold may be an elderly age interval, i.e., the age interval corresponding to elderly people. An interval whose end age is less than or equal to the second age threshold may be an infant age interval, i.e., the age interval corresponding to infants. The first age threshold and the second age threshold may be configured according to actual service requirements, and are not limited herein. For example, the first age threshold may be 70 years old and the second age threshold may be 5 years old, so that the elderly age interval runs from 70 to 110 years old and the infant age interval from 0 to 5 years old.
According to embodiments of the present disclosure, the initial sample image set may include a sample image set of at least one preset age interval together with a sample image set of at least one other age interval, or may include only a sample image set of at least one other age interval. Other age intervals refer to age intervals other than the preset age intervals.
For example, age intervals may include 0 to 5 years old, 6 to 9 years old, 10 to 19 years old, 20 to 24 years old, 25 to 29 years old, 30 to 34 years old, 35 to 39 years old, 40 to 49 years old, 50 to 69 years old, and 70 to 110 years old. The at least one preset age interval may include 0 to 5 years old and 70 to 110 years old. At least one other age interval may include 6 to 9 years old, 10 to 19 years old, 20 to 24 years old, 25 to 29 years old, 30 to 34 years old, 35 to 39 years old, 40 to 49 years old, and 50 to 69 years old.
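A hypothetical helper mapping an age to the index of its interval, using the example partition listed above; the boundaries are the ones in the example, not values prescribed by the disclosure.

```python
# Example partition from the text: the first and last intervals are the
# preset (infant and elderly) intervals.
INTERVALS = [(0, 5), (6, 9), (10, 19), (20, 24), (25, 29),
             (30, 34), (35, 39), (40, 49), (50, 69), (70, 110)]

def interval_index(age):
    """Return the index of the age interval containing the given age."""
    for k, (lo, hi) in enumerate(INTERVALS):
        if lo <= age <= hi:
            return k
    raise ValueError("age outside the modeled range")
```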
According to an embodiment of the present disclosure, a first generative adversarial network model may include the first generator and the first discriminator. The first set of real sample images may be processed with the first generator to obtain a real sample image feature corresponding to each first real sample image included in the set. The real age set corresponding to the first real sample image set may be processed with the first generator to obtain a real age characterization corresponding to each real age included in the set.
According to embodiments of the present disclosure, after obtaining the true sample image features and the true age representation, the true sample image features and the true age representation may be processed with a first generator to obtain a first simulated sample image. The first simulation sample image may be an image in which the age of the object in the first real sample image is a real age.
According to embodiments of the present disclosure, since a preset age interval is an age interval with few sample images, generating an additional sample image set covering at least one preset age interval increases the number of sample images for that interval. On this basis, the additional sample image set participates in the training of the first generator, thereby improving the prediction accuracy of the age transformation model.
According to an embodiment of the present disclosure, operation S413 may include the following operations.
And processing the first real sample image set by using an encoding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set. And processing the real age set corresponding to the first real sample image set by using a multi-layer perceptron included in the first generator to obtain a real age characteristic set corresponding to the real age set.
According to an embodiment of the present disclosure, the first generator may include an encoding module and a multi-layer perceptron. The encoding module may be configured to process the first set of real sample images to obtain the real sample image feature corresponding to each first real sample image in the set. The multi-layer perceptron may be configured to process the real age set corresponding to the first set of real sample images to obtain the real age characterization corresponding to each real age in the set.
According to an embodiment of the present disclosure, operation S414 may include the following operations.
And processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain a first simulation sample image set.
According to an embodiment of the present disclosure, the first generator may further include a decoding module. The first generator includes a decoding module operable to process the true sample image characteristics and the true age representation to obtain a first simulated sample image.
According to an embodiment of the present disclosure, each real age may correspond to N real age characterizations, and each first real sample image may correspond to N real sample image features, N being an integer greater than or equal to 1. The first set of real sample images may include T first real sample images, T being an integer greater than or equal to 1.
According to an embodiment of the present disclosure, for the qth first real sample image and the qth real age corresponding to it, processing the qth first real sample image with the encoding module included in the first generator to obtain the N real sample image features corresponding to that image may include: extracting the ith latent space representation corresponding to the qth first real sample image with the encoding module, obtaining the ith real sample image feature corresponding to the image. The jth latent vector corresponding to the real age may be extracted with the multi-layer perceptron included in the first generator to obtain the jth real age characterization corresponding to the real age. i ∈ {1, 2, ..., N−1, N}, j ∈ {1, 2, ..., N−1, N}, and q ∈ {1, 2, ..., T−1, T}.
According to an embodiment of the present disclosure, for the qth first real sample image and the qth real age corresponding to it, processing the N real sample image features corresponding to the qth first real sample image and the N real age characterizations corresponding to the qth real age with the decoding module included in the first generator, to obtain the qth first simulated sample image corresponding to them, may include: in the case of i = j = 1, obtaining the jth sample intermediate feature from the jth real age characterization corresponding to the qth real age and the ith real sample image feature corresponding to the qth first real sample image; in the case of i = j > 1, obtaining the jth sample intermediate feature from the (j−1)th sample intermediate feature, the jth real age characterization, and the ith real sample image feature; and convolving the Nth sample intermediate feature to generate the first simulated sample image.
According to an embodiment of the present disclosure, performing alternating training on the first generator and the first discriminator using the first set of real sample images and the first set of simulated sample images, resulting in a trained first generator and first discriminator, may include the following operations.
And under the condition that the preset condition is not met, using the first real sample image set and the first simulation sample image set to train the first generator and the first discriminator alternately.
In the case that the preset condition is determined to be met, a second simulation sample image set is determined from the first simulation sample image set. And obtaining a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set. And obtaining a second real sample image set according to the second simulation sample image set and the first real sample image set. And using the second real sample image set and the third simulation sample image set to train the first generator and the first discriminator alternately.
According to embodiments of the present disclosure, the preset condition may be used to decide whether to train the first generator and the first discriminator using a mixed supervised and unsupervised training method. The preset condition may include the number of iterations being greater than or equal to an iteration count threshold. The second set of simulated sample images may comprise at least one second simulated sample image, where a second simulated sample image is a first simulated sample image with high age prediction accuracy. Age prediction accuracy may be determined according to the deviation between the real age corresponding to the first simulated sample image and its simulated age, the simulated age being obtained by performing age prediction on the first simulated sample image. Thus, a first simulated sample image may be considered to have high age prediction accuracy when the deviation between its real age and its simulated age is less than or equal to a deviation threshold. The deviation threshold may be configured according to actual service requirements, and is not limited herein.
According to an embodiment of the present disclosure, it may be determined whether the preset condition is satisfied. If it is satisfied, a second set of simulated sample images may be determined from the first set of simulated sample images. The sample images in the first simulated sample image set other than the second simulated sample image set are determined as the third simulated sample image set, and the second simulated sample image set is added to the first real sample image set to obtain the second real sample image set.
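The split into the second and third simulated sample image sets can be sketched as follows, with age prediction abstracted as a callable; the function names and the (image, real age) tuple layout are illustrative assumptions.

```python
def split_by_deviation(samples, predict_age, threshold):
    """Partition simulated samples by age prediction accuracy.

    samples: iterable of (image, real_age) pairs; predict_age: callable
    returning the simulated age of an image; threshold: deviation threshold.
    Returns (second_set, third_set).
    """
    second, third = [], []
    for image, real_age in samples:
        if abs(predict_age(image) - real_age) <= threshold:
            second.append((image, real_age))   # promoted to the real sample set
        else:
            third.append((image, real_age))    # kept as simulated samples
    return second, third
```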
According to embodiments of the present disclosure, after the second set of real sample images and the third set of simulated sample images are obtained, the first generator and the first discriminator may be alternately trained using those two sets.
According to an embodiment of the present disclosure, if it is determined that the preset condition is not satisfied, the first generator and the first discriminator may be alternately trained using the first real sample image set and the first simulated sample image set.
According to an embodiment of the present disclosure, the first generator and the first discriminator are trained in an unsupervised manner while the preset condition is not satisfied. Once the preset condition is satisfied, the first generator and the first discriminator are trained in a mixed unsupervised and supervised manner: the second simulated sample image set is added to the first real sample image set as Ground Truth, and the generation of simulated sample images of the corresponding ages is supervised by gradually adding more second simulated sample images, so that the training process becomes increasingly stable and the model converges faster.
According to an embodiment of the present disclosure, operation S411 may include the following operations.
And obtaining the direction vector of the attribute shaft corresponding to at least one preset age interval according to the classifier model. At least one first alternate image feature is generated. An additional sample image set is generated based on the image generation model using at least one first alternate image feature and a direction vector of an attribute axis corresponding to at least one preset age interval.
According to embodiments of the present disclosure, a classifier model may be used to determine the direction vector of the attribute axis corresponding to a preset age interval. The classifier model may include a support vector machine model, a decision tree model, or a neural network model; the support vector machine model may include a linear support vector machine model. The image generation model may be used to generate the additional sample image set, and may be the second generator of a generative adversarial network model. There may be one or more direction vectors of attribute axes corresponding to the at least one preset age interval. The first alternate image feature may be an image feature derived based on first random noise, and the first random noise may include Gaussian noise.
According to an embodiment of the present disclosure, after obtaining the classifier model, a direction vector of an attribute axis corresponding to each of at least one preset age interval may be determined based on the classifier model. For example, the classifier model may include a linear support vector machine model. The at least one preset age interval includes an aged age interval and an infant age interval. The classification hyperplane can be obtained according to a linear support vector machine model. A normal vector of the classification hyperplane is determined. The normal vector of the classification hyperplane is determined as the direction vector of the attribute axis corresponding to the age interval of the elderly and the age interval of the infants.
According to an embodiment of the present disclosure, after the direction vector of the attribute axis corresponding to the at least one preset age interval and the at least one first standby image feature are determined, each first standby image feature is adjusted according to the direction vector of the attribute axis corresponding to the at least one preset age interval, resulting in an adjusted first standby image feature. The adjusted first standby image features are input into the image generation model to obtain the additional sample image set.
According to an embodiment of the present disclosure, the training method of the age conversion model may further include the following operations.
A standby sample image set is generated using the image generation model. A standby sample image subset is determined from the standby sample image set. The standby sample image subset corresponds to at least one preset age interval and includes at least one standby sample image. A preset model is trained according to the second standby image feature and the age category label corresponding to each standby sample image in the at least one standby sample image, to obtain the classifier model.
According to embodiments of the present disclosure, the image generation model may also be used to generate a standby sample image set. The standby sample image set may include a standby sample image subset corresponding to at least one preset age interval and a standby sample image subset corresponding to at least one other age interval. An age category label may be used to characterize the category of an age and may be determined according to a preset age interval. For example, the at least one preset age interval may include an elderly age interval and an infant age interval; accordingly, the age category labels may include an elderly label and an infant label.
According to an embodiment of the present disclosure, generating the standby sample image set using the image generation model may include: inputting second random noise data into the image generation model to obtain the standby sample image set. The second random noise data may include Gaussian noise. After the standby sample image set is obtained, an age interval corresponding to each standby sample image included in the standby sample image set may be determined. Standby sample images whose age interval is a preset age interval are determined as the standby sample images included in the standby sample image subset.
According to embodiments of the present disclosure, after the standby sample image subset is determined, a second standby image feature corresponding to each standby sample image in the standby sample image subset may be determined. Based on a loss function corresponding to the preset model, an output value may be obtained using the second standby image feature and the age category label corresponding to each of the at least one standby sample image. Model parameters of the preset model are adjusted according to the output value until the output value converges, and the preset model obtained when the output value converges is determined as the classifier model.
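The convergence loop above can be sketched with a simple preset model — logistic regression trained by gradient descent on synthetic features. The feature dimension, learning rate, and convergence tolerance are assumptions for illustration, not values from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: second standby image features with
# age category labels (1.0 = elderly label, 0.0 = infant label).
feats = np.vstack([rng.normal(+1.5, 0.5, size=(50, 4)),
                   rng.normal(-1.5, 0.5, size=(50, 4))])
labels = np.array([1.0] * 50 + [0.0] * 50)

w = np.zeros(4)  # parameters of the preset model (logistic regression)

def output_value(w):
    """Output value of the loss function (binary cross-entropy)."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    return -np.mean(labels * np.log(p + 1e-12)
                    + (1 - labels) * np.log(1 - p + 1e-12))

# Adjust the model parameters according to the output value until it converges.
prev = output_value(w)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w -= 0.1 * (feats.T @ (p - labels)) / len(labels)  # gradient step
    cur = output_value(w)
    if abs(prev - cur) < 1e-8:  # converged: keep w as the classifier model
        break
    prev = cur

classifier_accuracy = np.mean(((feats @ w) > 0) == (labels == 1))
```

The model kept when the loss change falls below the tolerance plays the role of the classifier model in the text.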
According to an embodiment of the present disclosure, generating the additional sample image set, based on the image generation model, using the at least one first standby image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval may include the following operations.
Each of the at least one first standby image feature is adjusted to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval. An expected image feature is an image feature corresponding to an expected age category label. The at least one expected image feature is input into the image generation model to obtain the additional sample image set.

According to an embodiment of the present disclosure, the expected age category label may be the age category label corresponding to a standby sample image. The expected age category labels may include an elderly label and an infant label.
According to an embodiment of the present disclosure, adjusting each of the at least one first standby image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval may include: processing the at least one first standby image feature using the classifier model to obtain a predicted age category label corresponding to each first standby image feature. For each of the at least one first standby image feature, in a case where the predicted age category label corresponding to the first standby image feature is determined to be inconsistent with the expected age category label, the first standby image feature is moved, based on the direction vector of the attribute axis corresponding to the at least one preset age interval, in the direction of the expected age category label corresponding to the first standby image feature, until the expected image feature corresponding to the first standby image feature is obtained.
For example, the expected age category labels include an elderly label and an infant label, and the expected age category label corresponding to a certain first standby image feature is the elderly label. The classifier model determines that the predicted age category label corresponding to the first standby image feature is a young person label. The predicted age category label is inconsistent with the expected age category label; therefore, the first standby image feature may be moved in the direction of the elderly label, based on the direction vector of the attribute axis corresponding to the elderly age interval and the infant age interval, until the expected image feature is obtained.
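The label-driven movement can be sketched as follows. The linear classifier, the direction vector, and the step size below are hypothetical stand-ins for the trained classifier model and the attribute axis:

```python
import numpy as np

# Hypothetical stand-ins: a unit direction vector of the attribute axis
# (pointing toward the "elderly" label) and a linear classifier whose
# sign decides the predicted age category label.
rng = np.random.default_rng(1)
direction = np.array([1.0, 0.0, 0.0, 0.0])  # attribute-axis direction
w, b = direction, -1.0                      # classifier: sign(w @ z + b)

def predicted_label(z):
    """1 = elderly label, 0 = not elderly (e.g. young person label)."""
    return int(w @ z + b > 0)

def move_to_expected(z, expected_label, step=0.1, max_steps=1000):
    """Move a first standby image feature along the attribute axis until
    the classifier predicts the expected age category label."""
    sign = +1.0 if expected_label == 1 else -1.0
    for _ in range(max_steps):
        if predicted_label(z) == expected_label:
            return z                            # expected image feature
        z = z + sign * step * direction         # walk along the axis
    raise RuntimeError("no expected image feature found within max_steps")

z0 = rng.normal(size=4) * 0.1                   # predicted label: young
expected = move_to_expected(z0, expected_label=1)  # want: elderly
```

Feeding the returned expected image feature to the image generation model would then yield an additional sample image with the expected age category label.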
According to embodiments of the present disclosure, after the expected image features are obtained, the expected image features may be input into the image generation model to obtain the additional sample images.
According to embodiments of the present disclosure, a required number of additional sample images with the expected age category labels may be generated in the manner described above.
According to an embodiment of the present disclosure, the training method of the age conversion model may further include the following operations.
A fourth simulation sample image set is generated with the second generator. The second generator and the second discriminator are alternately trained using the third real sample image set and the fourth simulation sample image set, to obtain a trained second generator and a trained second discriminator. The trained second generator is determined as the image generation model.
According to an embodiment of the present disclosure, the second generative adversarial network model may include a second generator and a second discriminator. The second generative adversarial network model may be a style-based generative adversarial network (StyleGAN) model.
According to an embodiment of the present disclosure, generating the fourth simulation sample image set with the second generator may include: inputting third random noise data into the second generator to obtain the fourth simulation sample image set.
According to an embodiment of the present disclosure, alternately training the second generator and the second discriminator using the third real sample image set and the fourth simulation sample image set may include: in each iteration, training the second discriminator, with the model parameters of the second generator kept unchanged, using the third real sample image set and the fourth simulation sample image set, so as to complete the number of training times set for the second discriminator in the iteration. After the number of training times set for the second discriminator in the iteration is completed, the second generator is trained, with the model parameters of the second discriminator kept unchanged, using the fourth simulation sample image set, so as to complete the number of training times set for the second generator in the iteration. During training, the second generator may be used to generate the fourth simulation sample image set corresponding to the training process. The training pattern of the second generator and the second discriminator described above is only an exemplary embodiment, and is not limited thereto; it may also include any training pattern known in the art, as long as training of the second generator and the second discriminator can be achieved.
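The alternating schedule described above can be sketched structurally. The stub update functions and the per-iteration step counts (`d_steps`, `g_steps`) below are assumptions standing in for real StyleGAN-style parameter updates:

```python
# A minimal structural sketch of the alternating schedule; the stubs
# only record which network would be updated at each step.
log = []

def train_discriminator_step(real_batch, fake_batch):
    # Real code would update only the discriminator's parameters here,
    # with the generator's parameters kept unchanged (frozen).
    log.append("D")

def train_generator_step(fake_batch):
    # Real code would update only the generator's parameters here,
    # with the discriminator's parameters kept unchanged (frozen).
    log.append("G")

def generate_fourth_simulation_set(n):
    # Stand-in for the second generator mapping random noise to images;
    # regenerated each iteration so the fakes track the current generator.
    return [f"fake_{i}" for i in range(n)]

def alternate_train(real_set, iterations=3, d_steps=2, g_steps=1):
    for _ in range(iterations):
        fake_set = generate_fourth_simulation_set(len(real_set))
        for _ in range(d_steps):            # discriminator first, G frozen
            train_discriminator_step(real_set, fake_set)
        for _ in range(g_steps):            # then generator, D frozen
            train_generator_step(fake_set)

alternate_train(real_set=["img_a", "img_b"], iterations=3)
```

With these counts the update order per iteration is D, D, G, matching the "discriminator first, then generator" alternation in the text.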
Fig. 5 schematically illustrates an example schematic diagram of a training process of an age conversion model according to an embodiment of the present disclosure.
As shown in fig. 5, in the training process 500, a fourth simulation sample image set 502 is generated using a second generator included in a second generative adversarial network model 503. The second generator and the second discriminator included in the second generative adversarial network model 503 are alternately trained using the third real sample image set 501 and the fourth simulation sample image set 502, to obtain a trained second generator and a trained second discriminator. The trained second generator is determined as the image generation model 504.
A standby sample image set 505 is generated using the image generation model 504. A standby sample image subset 506 is determined from the standby sample image set 505. A preset model 508 is trained according to the second standby image feature and the age category label 507 corresponding to each standby sample image in the at least one standby sample image, to obtain a classifier model 509.
A direction vector 510 of the attribute axis corresponding to at least one preset age interval is obtained according to the classifier model 509. At least one first standby image feature 511 is generated. Each first standby image feature 511 of the at least one first standby image feature is adjusted to an expected image feature 512 based on the direction vector 510 of the attribute axis corresponding to the at least one preset age interval. The at least one expected image feature 512 is input into the image generation model 504 to obtain an additional sample image set 513.
A first real sample image set 515 is obtained from the initial sample image set 514 and the additional sample image set 513. The first real sample image set 515 and the real age set 516 corresponding to the first real sample image set 515 are processed by a first generator included in a first generative adversarial network model 517, to obtain a real sample image feature set 518 corresponding to the first real sample image set 515 and a real age feature set 519 corresponding to the real age set 516. The real sample image feature set 518 and the real age feature set 519 are processed using the first generator included in the first generative adversarial network model 517 to obtain a first simulation sample image set 520.
The first generator and the first discriminator included in the first generative adversarial network model 517 are alternately trained using the first real sample image set 515 and the first simulation sample image set 520, to obtain a trained first generator and a trained first discriminator. The trained first generator is determined as the age conversion model 521.
According to an embodiment of the present disclosure, the first generator may include an encoding module, a multi-layer perceptron, and a decoding module. The process of obtaining the target image by the first generator will be described with reference to fig. 6A to 6E.
Fig. 6A schematically illustrates an example schematic diagram of an image generation process according to an embodiment of the present disclosure.
As shown in fig. 6A, in 600A, an original image 601 may be processed with an encoding module 602 of a first generator, resulting in an original image feature 603 corresponding to the original image 601.
The starting age 604 may be processed by a multi-layer perceptron 606 of the first generator to obtain a starting age representation 607 corresponding to the starting age 604. The ending age 605 may be processed by the multi-layer perceptron 606 to obtain an ending age representation 608 corresponding to the ending age 605. From the start age representation 607 and the end age representation 608, a target age representation 609 corresponding to the target age is obtained.
The original image features 603 and the target age representation 609 are processed by a decoding module 610 of the first generator to obtain a target image 611 of the subject.
Fig. 6B schematically illustrates an example schematic diagram of a process of processing a start age with a multi-layer perceptron to obtain a start age representation corresponding to the start age, according to an embodiment of the present disclosure.
As shown in fig. 6B, in 600B, the multi-layer perceptron 606 of fig. 6A includes the fully connected layer 6060 and the feature extraction units 6061, 6062, 6063, 6064, 6065, and 6066 of fig. 6B.

The start age 604 is processed by the fully connected layer 6060 to obtain a processing result. The processing result is processed by the feature extraction units 6061, 6062, 6063, 6064, 6065, and 6066, respectively, to obtain a 1st start age representation 6070, a 2nd start age representation 6071, a 3rd start age representation 6072, a 4th start age representation 6073, a 5th start age representation 6074, and a 6th start age representation 6075.
The end age 605 may be processed in a manner similar to that shown in fig. 6B to obtain the end age representation 608, which is not described in detail herein.
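A minimal sketch of the multi-layer perceptron of fig. 6B, assuming one shared fully connected layer followed by six per-level heads; all dimensions and weights below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the scalar age is first lifted to a hidden
# vector by one fully connected layer, then six per-level heads produce
# six age representations (one per decoder level), as in fig. 6B.
HIDDEN, REPR_DIM, N_LEVELS = 32, 16, 6

W_fc, b_fc = rng.normal(size=(HIDDEN, 1)), np.zeros(HIDDEN)
heads = [rng.normal(size=(REPR_DIM, HIDDEN)) for _ in range(N_LEVELS)]

def relu(x):
    return np.maximum(x, 0.0)

def age_representations(age):
    """Map a scalar age to N_LEVELS age representations."""
    hidden = relu(W_fc @ np.array([age / 100.0]) + b_fc)  # shared FC layer
    return [head @ hidden for head in heads]              # one head per level

start_reprs = age_representations(20.0)  # 1st..6th start age representations
end_reprs = age_representations(60.0)    # the end age is processed the same way
```

The same module processes the start age and the end age, which is why fig. 6B applies to both.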
Fig. 6C schematically illustrates an example schematic diagram of a process of processing an original image with an encoding module to obtain an original image feature corresponding to the original image according to an embodiment of the present disclosure.
As shown in fig. 6C, in 600C, the encoding module 602 in fig. 6A includes the feature extraction units 6020, 6021, ..., 6024, and 6025 in fig. 6C.

The original image is processed by the feature extraction units 6020, 6021, ..., 6024, and 6025, respectively, to obtain a 1st original image feature 6030, a 2nd original image feature 6031, ..., a 5th original image feature 6034, and a 6th original image feature 6035.
Fig. 6D schematically illustrates an example schematic diagram of a process of processing original image features and target age characterizations with a decoding module to obtain a target image of an object according to an embodiment of the present disclosure.
As shown in fig. 6D, in 600D, the decoding module 610 in fig. 6A includes the convolution layer 6106 and the feature extraction units 6100, 6101, ..., and 6105 in fig. 6D.
The 1st original image feature 6030 and the 1st target age representation 6090 are processed by the feature extraction unit 6100 to obtain a 1st intermediate feature 6120.

The 1st intermediate feature 6120, the 2nd original image feature 6031, and the 2nd target age representation 6091 are processed by the feature extraction unit 6101 to obtain a 2nd intermediate feature. ... The 5th intermediate feature 6124, the 6th original image feature 6035, and the 6th target age representation 6095 are processed by the feature extraction unit 6105 to obtain a 6th intermediate feature.

The 6th intermediate feature is processed with the convolution layer 6106 to obtain a target image 611 of the object.
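The chaining of intermediate features in fig. 6D can be sketched as follows; the feature extraction units and the final convolution are replaced by simple hypothetical stand-ins operating on flat vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 16  # hypothetical: six levels, 16-dim features at every level

orig_feats = [rng.normal(size=D) for _ in range(N)]  # from the encoder
age_reprs = [rng.normal(size=D) for _ in range(N)]   # target age representations

def feature_extraction_unit(parts):
    # Stand-in for a real decoder block: fuse its inputs by averaging
    # and squashing; a real unit would be learned convolutions.
    return np.tanh(sum(parts) / len(parts))

def final_convolution(x):
    # Stand-in for convolution layer 6106 producing the target image.
    return np.clip(x, -1.0, 1.0)

def decode(orig_feats, age_reprs):
    """Chain the intermediate features as in fig. 6D."""
    inter = feature_extraction_unit([orig_feats[0], age_reprs[0]])  # level 1
    for k in range(1, N):                                           # levels 2..N
        inter = feature_extraction_unit([inter, orig_feats[k], age_reprs[k]])
    return final_convolution(inter)                                 # target image

target_image = decode(orig_feats, age_reprs)
```

Each level consumes the previous intermediate feature plus the matching original image feature and target age representation, which mirrors the i = j = 1 base case and the i = j > 1 recurrence of the decoding module.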
Fig. 6E schematically illustrates an example schematic diagram of a continuous age transformation according to an embodiment of the disclosure.
As shown in fig. 6E, in 600E, for different target ages, the original image 601 is processed using the image generation method described in the embodiments of the present disclosure, to obtain target images 611 corresponding to the different target ages. Along the arrow direction in fig. 6E, the target age of the object in the target images 611 increases.
The above is only an exemplary embodiment, but is not limited thereto, and other image generation methods and training methods of age conversion models known in the art may be included as long as continuous age conversion can be achieved.
It should be noted that, in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of user personal information all conform to the provisions of relevant laws and regulations and do not violate public order and good customs.
Fig. 7 schematically shows a block diagram of an image generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the image generating apparatus 700 may include a first determination module 710, a second determination module 720, and a first generation module 730.
The first determining module 710 is configured to determine a target age interval corresponding to a target age, where the target age interval includes a start age and an end age.
The second determining module 720 is configured to determine a target age representation corresponding to the target age according to the start age representation corresponding to the start age and the end age representation corresponding to the end age.

The first generation module 730 is configured to generate a target image of the object based on the image of the object in the original image according to the target age representation and the original image features corresponding to the original image. The target image is an image in which the age of the object is the target age.
According to an embodiment of the present disclosure, the second determination module 720 may include a first obtaining sub-module.
The first obtaining sub-module is configured to interpolate between the start age representation corresponding to the start age and the end age representation corresponding to the end age, according to differences between the target age and each of the start age and the end age, to obtain the target age representation corresponding to the target age.
According to an embodiment of the present disclosure, the first obtaining sub-module may include a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, and a fifth determining unit.

The first determining unit is configured to determine a first difference. The first difference characterizes a difference between the end age and the target age.

The second determining unit is configured to determine a second difference. The second difference characterizes a difference between the target age and the start age.

The third determining unit is configured to determine a first ratio. The first ratio characterizes a ratio of the first difference to a sum of the first difference and the second difference.

The fourth determining unit is configured to determine a second ratio. The second ratio characterizes a ratio of the second difference to the sum of the first difference and the second difference.

The fifth determining unit is configured to determine the target age representation corresponding to the target age according to the first ratio, the second ratio, the start age representation corresponding to the start age, and the end age representation corresponding to the end age.
According to an embodiment of the present disclosure, the fifth determining unit may include a first determining subunit, a second determining subunit, and a third determining subunit.

The first determining subunit is configured to determine a first product. The first product characterizes a product between the first ratio and the start age representation corresponding to the start age.

The second determining subunit is configured to determine a second product. The second product characterizes a product between the second ratio and the end age representation corresponding to the end age.

The third determining subunit is configured to determine a sum of the first product and the second product as the target age representation corresponding to the target age.
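Taken together, the ratios and products above amount to linear interpolation between the two representations. A sketch, assuming the age representations are plain vectors:

```python
import numpy as np

def target_age_representation(start_age, end_age, target_age,
                              start_repr, end_repr):
    """Interpolate the target age representation between the start and
    end age representations, following the determining units' steps."""
    first_diff = end_age - target_age     # first difference
    second_diff = target_age - start_age  # second difference
    total = first_diff + second_diff      # = end_age - start_age
    first_ratio = first_diff / total      # weight of the start representation
    second_ratio = second_diff / total    # weight of the end representation
    return first_ratio * start_repr + second_ratio * end_repr

start_repr = np.array([0.0, 0.0])
end_repr = np.array([10.0, 20.0])

# Target age 25 lies halfway through the interval [20, 30], so the result
# is the midpoint of the two representations.
mid = target_age_representation(20, 30, 25, start_repr, end_repr)  # [5.0, 10.0]
```

Note the cross-pairing: the weight of the start representation is proportional to the distance from the target age to the end age, so the result slides smoothly from the start representation to the end representation as the target age increases.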
According to an embodiment of the present disclosure, the start age representation, the end age representation, and the original image features include N start age representations, N end age representations, and N original image features, respectively, where N is an integer greater than or equal to 1.
The first generation module comprises a second obtaining sub-module, a third obtaining sub-module, a fourth obtaining sub-module, a first determination sub-module and a first generation sub-module.
The second obtaining sub-module is configured to extract an ith latent space representation corresponding to the original image to obtain an ith original image feature corresponding to the original image.

The third obtaining sub-module is configured to extract a jth latent vector corresponding to the start age to obtain a jth start age representation corresponding to the start age.

The fourth obtaining sub-module is configured to extract a jth latent vector corresponding to the end age to obtain a jth end age representation corresponding to the end age. i and j are integers greater than or equal to 1 and less than or equal to N.

The first determining sub-module is configured to determine a jth target age representation according to the jth start age representation and the jth end age representation.

The first generation sub-module is configured to generate the target image of the object based on the image of the object in the original image according to the N original image features and the N target age representations.
According to an embodiment of the present disclosure, the first generation sub-module may include a first obtaining unit, a second obtaining unit, and a first generation unit.
The first obtaining unit is configured to obtain, in a case where i = j = 1, an ith intermediate feature according to the jth target age representation and the ith original image feature.

The second obtaining unit is configured to obtain, in a case where i = j > 1, an ith intermediate feature according to the (j-1)th intermediate feature, the jth target age representation, and the ith original image feature.

The first generating unit is configured to convolve the Nth intermediate feature to generate the target image of the object.
According to an embodiment of the present disclosure, the start age representation, the end age representation, and the original image features are obtained by processing the start age, the end age, and the original image, respectively, using an age conversion model. The target image is obtained by processing the original image features and the target age representation using the age conversion model.
Fig. 8 schematically illustrates a block diagram of a training apparatus of an age conversion model according to an embodiment of the present disclosure.

As shown in fig. 8, the training apparatus 800 of the age conversion model may include a training module 810 and a third determining module 820.

The training module 810 is configured to train the first generator and the first discriminator using the first real sample image set and the first simulation sample image set, to obtain a trained first generator and a trained first discriminator.

The third determining module 820 is configured to determine the trained first generator as the age conversion model. The age conversion model is used to generate the target image according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the training module 810 may include a second generation sub-module, a fifth obtaining sub-module, a sixth obtaining sub-module, a seventh obtaining sub-module, and an eighth obtaining sub-module.
The second generation sub-module is configured to generate an additional sample image set. The additional sample image set characterizes a sample image set of at least one preset age interval.

The fifth obtaining sub-module is configured to obtain a first real sample image set according to the initial sample image set and the additional sample image set.

The sixth obtaining sub-module is configured to process the first real sample image set and the real age set corresponding to the first real sample image set, respectively, to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set.

The seventh obtaining sub-module is configured to obtain a first simulation sample image set according to the real sample image feature set and the real age feature set.

The eighth obtaining sub-module is configured to train the first generator and the first discriminator using the first real sample image set and the first simulation sample image set, to obtain a trained first generator and a trained first discriminator.
According to an embodiment of the present disclosure, the sixth obtaining sub-module may include a first obtaining unit and a second obtaining unit.
The first obtaining unit is used for processing the first real sample image set by using an encoding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set.
The second obtaining unit is used for processing the real age set corresponding to the first real sample image set by using the multi-layer perceptron included in the first generator to obtain a real age characteristic set corresponding to the real age set.
According to an embodiment of the present disclosure, the seventh obtaining sub-module may include a third obtaining unit.
The third obtaining unit is used for processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain a first simulation sample image set.
According to an embodiment of the present disclosure, the eighth obtaining sub-module may include a first training unit, a sixth determining unit, a fourth obtaining unit, a fifth obtaining unit, and a second training unit.

The first training unit is configured to alternately train the first generator and the first discriminator using the first real sample image set and the first simulation sample image set in a case where a preset condition is not satisfied.

The sixth determining unit is configured to determine a second simulation sample image set from the first simulation sample image set in a case where it is determined that the preset condition is satisfied.

The fourth obtaining unit is configured to obtain a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set.

The fifth obtaining unit is configured to obtain a second real sample image set according to the second simulation sample image set and the first real sample image set.

The second training unit is configured to alternately train the first generator and the first discriminator using the second real sample image set and the third simulation sample image set.
According to an embodiment of the present disclosure, the second generation sub-module may include a sixth obtaining unit, a second generating unit, and a third generating unit.
The sixth obtaining unit is configured to obtain, according to the classifier model, a direction vector of the attribute axis corresponding to at least one preset age interval.

The second generating unit is configured to generate at least one first standby image feature.

The third generating unit is configured to generate an additional sample image set, based on the image generation model, using the at least one first standby image feature and the direction vector of the attribute axis corresponding to the at least one preset age interval.
According to an embodiment of the present disclosure, the training apparatus 800 of the age conversion model may further include a second generation module, a fourth determination module, and a first obtaining module.
The second generation module is configured to generate a standby sample image set using the image generation model.

The fourth determining module is configured to determine a standby sample image subset from the standby sample image set. The standby sample image subset corresponds to at least one preset age interval and includes at least one standby sample image.

The first obtaining module is configured to train a preset model according to the second standby image feature and the age category label corresponding to each standby sample image in the at least one standby sample image, to obtain the classifier model.
According to an embodiment of the present disclosure, the third generating unit may include an adjusting subunit and an obtaining subunit.
The adjusting subunit is configured to adjust each of the at least one first standby image feature to an expected image feature based on the direction vector of the attribute axis corresponding to the at least one preset age interval. An expected image feature is an image feature corresponding to an expected age category label.

The obtaining subunit is configured to input the at least one expected image feature into the image generation model to obtain the additional sample image set.
According to an embodiment of the present disclosure, the training apparatus 800 of the age conversion model may further include a third generation module, a second obtaining module, and a fifth determining module.
The third generation module is configured to generate a fourth simulation sample image set using the second generator.

The second obtaining module is configured to alternately train the second generator and the second discriminator using the third real sample image set and the fourth simulation sample image set, to obtain a trained second generator and a trained second discriminator.

The fifth determining module is configured to determine the trained second generator as the image generation model.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described above.

According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method described above.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement the image generation method and the training method of the age conversion model, according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 901 performs the methods and processes described above, for example, the image generation method or the training method of the age transformation model. For example, in some embodiments, the image generation method or the training method of the age transformation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the image generation method or the training method of the age transformation model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the image generation method or the training method of the age transformation model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (31)

1. An image generation method, comprising:
determining a target age interval corresponding to a target age, wherein the target age interval comprises a start age and an end age;
determining a target age characteristic corresponding to the target age according to a starting age characteristic corresponding to the starting age and an ending age characteristic corresponding to the ending age; and
generating a target image of an object based on the image of the object in the original image according to the target age characterization and the original image characteristics corresponding to the original image, wherein the target image is an image of which the age of the object is the target age;
The starting age characterization, the ending age characterization and the original image features respectively comprise N starting age characterizations, N ending age characterizations and N original image features, wherein N is an integer greater than or equal to 1;
the generating a target image of the object based on the image of the object in the original image according to the target age representation and the original image characteristics corresponding to the original image comprises the following steps: extracting an ith latent space representation corresponding to the original image to obtain an ith original image feature corresponding to the original image; extracting a j-th latent vector corresponding to the starting age to obtain a j-th starting age representation corresponding to the starting age; extracting a j-th latent vector corresponding to the ending age to obtain a j-th ending age characterization corresponding to the ending age, wherein i and j are integers greater than or equal to 1 and less than or equal to N; determining a jth target age characteristic according to the jth start age characteristic and the jth end age characteristic; and generating a target image of the object based on the image of the object in the original image according to the N original image features and the N target age characterizations.
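For illustration only, the per-level flow of claim 1 may be sketched as follows. The patent does not disclose concrete functions, so the callables `extract_image_feature`, `extract_age_repr`, and `decode` are hypothetical placeholders standing in for the latent-space extraction and decoding steps of the disclosure; the interpolation weight follows the scheme of claims 2-4.

```python
def generate_target_image(original_image, start_age, end_age, target_age,
                          n_levels, extract_image_feature, extract_age_repr,
                          decode):
    """Per-level sketch of claim 1; all helper callables are hypothetical."""
    image_features, target_age_reprs = [], []
    for level in range(1, n_levels + 1):
        # i-th original image feature from the i-th latent space representation
        image_features.append(extract_image_feature(original_image, level))
        # j-th start / end age representations from the j-th latent vectors
        start_repr = extract_age_repr(start_age, level)
        end_repr = extract_age_repr(end_age, level)
        # j-th target age representation by interpolation between the
        # endpoint representations (claims 2-4)
        w = (end_age - target_age) / (end_age - start_age)
        target_age_reprs.append(w * start_repr + (1.0 - w) * end_repr)
    # generate the target image from the N features and N representations
    return decode(image_features, target_age_reprs)
```

With stub extractors, a target age halfway through the interval yields age representations halfway between the endpoint representations, as the claim intends.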
2. The method of claim 1, wherein the determining a target age characteristic corresponding to the target age from a start age characteristic corresponding to the start age and an end age characteristic corresponding to the end age comprises:
and according to the difference value between the target age and the starting age and the ending age respectively, interpolating between a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age to obtain a target age representation corresponding to the target age.
3. The method of claim 2, wherein interpolating between a start age representation corresponding to the start age and an end age representation corresponding to the end age based on the difference between the target age and the start age and the end age, respectively, results in a target age representation corresponding to the target age, comprising:
determining a first difference, wherein the first difference characterizes a difference between the ending age and the target age;
determining a second difference, wherein the second difference characterizes a difference between the target age and the starting age;
Determining a first ratio, wherein the first ratio characterizes a ratio of the first difference to the sum of the first difference and the second difference;
determining a second ratio, wherein the second ratio characterizes a ratio of the second difference to the sum of the first difference and the second difference; and
and determining a target age representation corresponding to the target age according to the first ratio, the second ratio, the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age.
4. The method of claim 3, wherein the determining a target age characteristic corresponding to the target age from the first ratio, the second ratio, a start age characteristic corresponding to the start age, and an end age characteristic corresponding to the end age comprises:
determining a first product, wherein the first product characterizes a product between the first ratio and a starting age characterization corresponding to the starting age;
determining a second product, wherein the second product characterizes a product between the second ratio and an ending age characterization corresponding to the ending age; and
And determining the sum between the first product and the second product as a target age characteristic corresponding to the target age.
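The interpolation of claims 3 and 4 is a linear blend of the two endpoint representations, weighted by how close the target age lies to each end of the interval. A minimal sketch (the function name and list-of-floats representation are this sketch's own assumptions):

```python
def interpolate_age_representation(start_age, end_age, target_age,
                                   start_repr, end_repr):
    # first difference (claim 3): ending age minus target age
    first_diff = end_age - target_age
    # second difference: target age minus starting age
    second_diff = target_age - start_age
    total = first_diff + second_diff  # equals end_age - start_age
    first_ratio = first_diff / total
    second_ratio = second_diff / total
    # claim 4: sum of the two products is the target age representation
    return [first_ratio * s + second_ratio * e
            for s, e in zip(start_repr, end_repr)]
```

Note that when the target age equals the starting age, the first ratio is 1 and the result reduces to the starting age representation, and symmetrically for the ending age.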
5. The method of claim 1, wherein the generating a target image of the object based on the image of the object in the original image from the N original image features and the N target age characterizations comprises:
under the condition that i=j=1, according to the jth target age characteristic and the ith original image characteristic, obtaining a jth intermediate characteristic;
under the condition that i=j > 1, according to the (j-1) th intermediate feature, the j-th target age characteristic and the i-th original image feature, obtaining the j-th intermediate feature; and
and convolving the Nth intermediate feature to generate a target image of the object.
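The recurrence of claim 5 can be iterated as below. The `combine` callable is a hypothetical stand-in for the model's fusion of the previous intermediate feature, the age representation, and the image feature; the patent does not name such a function.

```python
def nth_intermediate_feature(original_features, target_age_reprs, combine):
    """Iterate claim 5's recurrence over the N levels."""
    intermediate = None
    for age_repr, image_feature in zip(target_age_reprs, original_features):
        if intermediate is None:
            # i = j = 1: only the age representation and image feature
            intermediate = combine(None, age_repr, image_feature)
        else:
            # i = j > 1: the (j-1)-th intermediate feature also contributes
            intermediate = combine(intermediate, age_repr, image_feature)
    # the N-th intermediate feature is then convolved into the target image
    return intermediate
```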
6. A method according to any one of claims 1 to 3, wherein the start age characterization, the end age characterization and the raw image features are derived by processing the start age, the end age and the raw image respectively using an age transformation model;
the target image is obtained by processing the original image features and the target age characterization by using the age transformation model.
7. A method of training an age transformation model, comprising:
training a first generator and a first discriminator by using a first real sample image set and a first simulation sample image set to obtain a first generator and a first discriminator after training is completed; and
determining the trained first generator as the age transformation model, wherein the age transformation model is used to generate the target image of any one of claims 1-6.
8. The method of claim 7, wherein the training the first generator and the first discriminator with the first set of real sample images and the first set of simulated sample images to obtain a trained first generator and first discriminator comprises:
generating an additional sample image set, wherein the additional sample image set characterizes a sample image set of at least one preset age interval;
obtaining the first real sample image set according to the initial sample image set and the additional sample image set;
processing the first real sample image set and a real age set corresponding to the first real sample image set respectively to obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set;
Obtaining the first simulation sample image set according to the real sample image feature set and the real age feature set; and
and training a first generator and a first discriminator by using the first real sample image set and the first simulation sample image set to obtain the trained first generator and first discriminator.
9. The method of claim 8, wherein the processing the first set of real sample images and the set of real ages corresponding to the first set of real sample images, respectively, to obtain the real sample image feature set corresponding to the first set of real sample images and the real age feature set corresponding to the set of real ages, comprises:
processing the first real sample image set by using an encoding module included in the first generator to obtain a real sample image feature set corresponding to the first real sample image set; and
and processing the real age set corresponding to the first real sample image set by using a multi-layer perceptron included in the first generator to obtain a real age characteristic set corresponding to the real age set.
10. The method according to claim 8 or 9, wherein said deriving the first simulated sample image set from the real sample image feature set and the real age feature set comprises:
And processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain the first simulation sample image set.
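Claims 9 and 10 describe the first generator's forward pass as three modules: an encoder for images, a multi-layer perceptron for ages, and a decoder combining both. A minimal sketch, where all three callables are hypothetical placeholders rather than interfaces from the patent:

```python
def first_generator_forward(real_images, real_ages, encoder, mlp, decoder):
    """Claims 9-10: encoder -> image features, MLP -> age features,
    decoder -> simulated sample images."""
    image_features = [encoder(img) for img in real_images]
    age_features = [mlp(age) for age in real_ages]
    # the decoder pairs each image feature with its age feature to
    # produce the first simulated sample image set
    return [decoder(f, a) for f, a in zip(image_features, age_features)]
```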
11. The method of claim 8 or 9, wherein the training the first generator and the first discriminator with the first set of real sample images and the first set of simulated sample images resulting in the trained first generator and first discriminator comprises:
under the condition that the preset condition is not met, the first generator and the first discriminator are trained alternately by using the first real sample image set and the first simulation sample image set;
determining a second simulation sample image set from the first simulation sample image set under the condition that the preset condition is met;
obtaining a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set;
obtaining a second real sample image set according to the second simulation sample image set and the first real sample image set; and
and using the second real sample image set and the third simulation sample image set to train the first generator and the first discriminator alternately.
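The two-phase schedule of claim 11 can be sketched as follows. The helper callables are hypothetical, and the specific set arithmetic (third simulated set as the first minus the promoted second, second real set as the union of the first real set and the promoted samples) is one plausible reading of the claim, not a detail the patent states.

```python
def train_with_sample_promotion(first_real_set, first_sim_set,
                                preset_condition_met, select_subset,
                                train_alternately):
    """Sketch of claim 11's schedule; helpers and set arithmetic are assumed."""
    if not preset_condition_met():
        # before the preset condition holds, alternate on the original sets
        return train_alternately(first_real_set, first_sim_set)
    # once the condition holds, promote a subset of simulated samples
    second_sim_set = select_subset(first_sim_set)
    third_sim_set = [s for s in first_sim_set if s not in second_sim_set]
    second_real_set = first_real_set + second_sim_set
    # continue alternating on the augmented real set and reduced simulated set
    return train_alternately(second_real_set, third_sim_set)
```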
12. The method of claim 8 or 9, wherein the generating an additional set of sample images comprises:
obtaining a direction vector of an attribute shaft corresponding to the at least one preset age interval according to the classifier model;
generating at least one first alternate image feature; and
the additional sample image set is generated based on an image generation model using the at least one first alternate image feature and a direction vector of an attribute axis corresponding to the at least one preset age interval.
13. The method of claim 12, further comprising:
generating a standby sample image set by using the image generation model;
determining a backup sample image subset from the backup sample image set, wherein the backup sample image subset characterizes a backup sample image subset corresponding to the at least one preset age interval, the backup sample image subset comprising at least one backup sample image; and
and training a preset model according to the second standby image characteristics and the age category labels corresponding to each standby sample image in the at least one standby sample image to obtain the classifier model.
14. The method of claim 12, wherein the generating the additional sample image set based on the image generation model using the at least one first alternate image feature and a direction vector of an attribute axis corresponding to the at least one preset age interval comprises:
Adjusting each first standby image feature of the at least one first standby image feature to an expected image feature based on a direction vector of an attribute axis corresponding to the at least one preset age interval, wherein the expected image feature is an image feature corresponding to an expected age category label; and
inputting at least one of the expected image features into the image generation model to obtain the additional sample image set.
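The adjustment of claim 14 amounts to shifting a sampled latent feature along the direction vector of the attribute axis until it corresponds to the expected age category. A minimal sketch, assuming a vector-as-list representation and an explicit `step` knob that the patent does not name:

```python
def shift_along_attribute_axis(latent_feature, direction_vector, step):
    # Move a standby latent feature along the attribute axis of a preset
    # age interval; `step` (how far to shift toward the expected age
    # category label) is an assumed parameter for illustration.
    return [x + step * d for x, d in zip(latent_feature, direction_vector)]
```

The shifted feature would then be fed to the image generation model to produce an additional sample image of the expected age category.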
15. The method of claim 12, further comprising:
generating a fourth set of simulated sample images with the second generator;
alternately training the second generator and the second discriminator by using the third real sample image set and the fourth simulation sample image set to obtain a trained second generator and second discriminator; and
determining a second generator of the training completion as the image generation model.
16. An image generating apparatus comprising:
a first determining module, configured to determine a target age interval corresponding to a target age, where the target age interval includes a start age and an end age;
the second determining module is used for determining a target age representation corresponding to the target age according to the starting age representation corresponding to the starting age and the ending age representation corresponding to the ending age; and
The first generation module is used for generating a target image of the object based on the image of the object in the original image according to the target age representation and the original image characteristics corresponding to the original image, wherein the target image is an image of which the age of the object is the target age;
the starting age characterization, the ending age characterization and the original image features respectively comprise N starting age characterizations, N ending age characterizations and N original image features, wherein N is an integer greater than or equal to 1;
the first generation module includes:
the second obtaining submodule is used for extracting an ith latent space representation corresponding to the original image to obtain an ith original image feature corresponding to the original image;
the third obtaining submodule is used for extracting a j-th latent vector corresponding to the starting age to obtain a j-th starting age representation corresponding to the starting age;
a fourth obtaining submodule, configured to extract a jth latent vector corresponding to the ending age to obtain a jth ending age representation corresponding to the ending age, where i and j are integers greater than or equal to 1 and less than or equal to N;
The first determining submodule is used for determining a jth target age characteristic according to the jth starting age characteristic and the jth ending age characteristic; and
the first generation sub-module is used for generating a target image of the object based on the image of the object in the original image according to N original image features and N target age characterizations.
17. The apparatus of claim 16, wherein the second determination module comprises:
and the first obtaining submodule is used for interpolating between a starting age representation corresponding to the starting age and an ending age representation corresponding to the ending age according to the difference value between the target age and the starting age and the ending age respectively to obtain a target age representation corresponding to the target age.
18. The apparatus of claim 17, wherein the first obtaining sub-module comprises:
a first determining unit configured to determine a first difference value, wherein the first difference value characterizes a difference value between the ending age and the target age;
a second determining unit configured to determine a second difference value, wherein the second difference value characterizes a difference value between the target age and the start age;
A third determining unit configured to determine a first ratio, wherein the first ratio characterizes a ratio of the first difference to the sum of the first difference and the second difference;
a fourth determining unit configured to determine a second ratio, wherein the second ratio characterizes a ratio of the second difference to the sum of the first difference and the second difference; and
a fifth determining unit, configured to determine a target age characteristic corresponding to the target age according to the first ratio, the second ratio, a start age characteristic corresponding to the start age, and an end age characteristic corresponding to the end age.
19. The apparatus of claim 16, wherein the first generation sub-module comprises:
a first obtaining unit, configured to obtain a jth intermediate feature according to a jth target age characteristic and an ith original image feature in a case where i=j=1;
a second obtaining unit, configured to obtain a jth intermediate feature according to the (j-1) th intermediate feature, the jth target age characteristic, and the ith original image feature, where i=j > 1; and
the first generating unit is used for convolving the Nth intermediate feature to generate a target image of the object.
20. The apparatus of claim 16 or 17, wherein the start age characterization, the end age characterization, and the raw image features are derived by processing the start age, the end age, and the raw image, respectively, using an age transformation model;
the target image is obtained by processing the original image features and the target age characterization by using the age transformation model.
21. A training device of an age transformation model, comprising:
the training module is used for training the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set to obtain a first generator and a first discriminator after training; and
a third determining module for determining the trained first generator as the age transformation model, wherein the age transformation model is used to generate the target image of any one of claims 16-20.
22. The apparatus of claim 21, wherein the training module comprises:
a second generation sub-module for generating an additional sample image set, wherein the additional sample image set characterizes a sample image set of at least one preset age interval;
A fifth obtaining sub-module, configured to obtain the first real sample image set according to the initial sample image set and the additional sample image set;
a sixth obtaining submodule, configured to respectively process the first real sample image set and a real age set corresponding to the first real sample image set, and obtain a real sample image feature set corresponding to the first real sample image set and a real age feature set corresponding to the real age set;
a seventh obtaining sub-module, configured to obtain the first simulated sample image set according to the real sample image feature set and the real age feature set; and
and an eighth obtaining sub-module, configured to train the first generator and the first discriminator by using the first real sample image set and the first simulation sample image set, so as to obtain the trained first generator and first discriminator.
23. The apparatus of claim 22, wherein the sixth obtaining sub-module comprises:
a first obtaining unit, configured to process the first real sample image set by using an encoding module included in the first generator, to obtain a real sample image feature set corresponding to the first real sample image set; and
The second obtaining unit is used for processing the real age set corresponding to the first real sample image set by using the multi-layer perceptron included in the first generator to obtain the real age feature set corresponding to the real age set.
24. The apparatus of claim 22 or 23, wherein the seventh obtaining sub-module comprises:
and the third obtaining unit is used for processing the real sample image feature set and the real age feature set by using a decoding module included in the first generator to obtain the first simulation sample image set.
25. The apparatus of claim 22 or 23, wherein the eighth obtaining sub-module comprises:
the first training unit is used for alternately training the first generator and the first discriminator by utilizing the first real sample image set and the first simulation sample image set under the condition that the preset condition is not met;
a sixth determining unit configured to determine a second set of simulation sample images from the first set of simulation sample images in case that it is determined that the preset condition is satisfied;
a fourth obtaining unit, configured to obtain a third simulation sample image set according to the first simulation sample image set and the second simulation sample image set;
A fifth obtaining unit, configured to obtain a second real sample image set according to the second simulated sample image set and the first real sample image set; and
and the second training unit is used for alternately training the first generator and the first discriminator by using the second real sample image set and the third simulation sample image set.
26. The apparatus of claim 22 or 23, wherein the second generation sub-module comprises:
a sixth obtaining unit, configured to obtain, according to the classifier model, a direction vector of an attribute axis corresponding to the at least one preset age interval;
a second generation unit for generating at least one first standby image feature; and
and a third generating unit, configured to generate the additional sample image set based on an image generation model by using the at least one first standby image feature and a direction vector of an attribute axis corresponding to the at least one preset age interval.
27. The apparatus of claim 26, further comprising:
a second generation module for generating a set of backup sample images using the image generation model;
a fourth determining module configured to determine a backup sample image subset from the backup sample image set, wherein the backup sample image subset characterizes a backup sample image subset corresponding to the at least one preset age interval, the backup sample image subset including at least one backup sample image; and
The first obtaining module is used for training a preset model according to the second standby image characteristics and the age category labels corresponding to each standby sample image in the at least one standby sample image, and obtaining the classifier model.
28. The apparatus of claim 26, wherein the third generation unit comprises:
an adjustment subunit, configured to adjust each first standby image feature of the at least one first standby image feature to an expected image feature based on a direction vector of an attribute axis corresponding to the at least one preset age interval, where the expected image feature is an image feature corresponding to an expected age category label; and
an obtaining subunit, configured to input at least one of the expected image features into the image generation model to obtain the additional sample image set.
29. The apparatus of claim 26, further comprising:
a third generation module for generating a fourth set of simulated sample images using the second generator;
the second obtaining module is used for alternately training the second generator and the second discriminator by utilizing the third real sample image set and the fourth simulation sample image set to obtain a trained second generator and second discriminator; and
And a fifth determining module for determining the trained second generator as the image generation model.
30. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6 or any one of claims 7 to 15.
31. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6 or any one of claims 7-15.
CN202111184646.6A 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium Active CN113902957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111184646.6A CN113902957B (en) 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN113902957A CN113902957A (en) 2022-01-07
CN113902957B true CN113902957B (en) 2024-02-09

Family

ID=79191390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111184646.6A Active CN113902957B (en) 2021-10-11 2021-10-11 Image generation method, training method and device of model, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113902957B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977629A (en) * 2017-12-04 2018-05-01 University of Electronic Science and Technology of China A facial image aging synthesis method based on a feature-separation adversarial network
CN109308450A (en) * 2018-08-08 2019-02-05 Jiechuang Intelligent Technology Co., Ltd. A face change prediction method based on a generative adversarial network
CN111402113A (en) * 2020-03-09 2020-07-10 Beijing ByteDance Network Technology Co., Ltd. Image processing method, image processing device, electronic equipment and computer readable medium
CN111898482A (en) * 2020-07-14 2020-11-06 Guizhou University Face prediction method based on a progressive generative adversarial network
CN111985405A (en) * 2020-08-21 2020-11-24 Nanjing University of Science and Technology Face age synthesis method and system
CN113392769A (en) * 2021-06-16 2021-09-14 Guangzhou Fanxing Huyu Information Technology Co., Ltd. Face image synthesis method and device, electronic equipment and storage medium
US11120526B1 (en) * 2019-04-05 2021-09-14 Snap Inc. Deep feature generative adversarial neural networks


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CAN-GAN: Conditioned-attention normalized GAN for face age synthesis; Chenglong Shi et al.; Pattern Recognition Letters; Vol. 138; pp. 520-526 *
Liu Yang et al.; Measurement Data Processing and Error Theory; Atomic Energy Press; 1997; p. 293 *
Research on Face Aging Synthesis Based on Deep Learning; Liu Lu; China Doctoral Dissertations Full-text Database, Information Science and Technology; Vol. 2021, No. 03; pp. I138-26 *

Also Published As

Publication number Publication date
CN113902957A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
CN111695415B (en) Image recognition method and related equipment
CN112800292B (en) Cross-modal retrieval method based on modal specific and shared feature learning
CN112069302A (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN114612290B (en) Training method of image editing model and image editing method
CN113408430A (en) Image Chinese description system and method based on multistage strategy and deep reinforcement learning framework
CN114863229A (en) Image classification method and training method and device of image classification model
CN114840734B (en) Training method of multi-modal representation model, cross-modal retrieval method and device
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN112036439B (en) Dependency relationship classification method and related equipment
CN115269768A (en) Element text processing method and device, electronic equipment and storage medium
CN115248890A (en) User interest portrait generation method and device, electronic equipment and storage medium
WO2024021685A1 (en) Reply content processing method and media content interactive content interaction method
CN116975347A (en) Image generation model training method and related device
CN113902957B (en) Image generation method, training method and device of model, electronic equipment and medium
CN113536845A (en) Face attribute recognition method and device, storage medium and intelligent equipment
CN116091857B (en) Training method of image processing model, image processing method and device
CN117556275B (en) Correlation model data processing method, device, computer equipment and storage medium
WO2023112169A1 (en) Training method, estimation method, training device, estimation device, training program, and estimation program
CN115481285B (en) Cross-modal video text matching method and device, electronic equipment and storage medium
CN113553433B (en) Product classification method, device, medium and terminal equipment based on artificial intelligence
CN113420561B (en) Named entity identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant