CN109800730B - Method and device for generating head portrait generation model

Info

Publication number
CN109800730B
Authority
CN
China
Prior art keywords
avatar
network
sample
generation
training
Legal status
Active
Application number
CN201910090200.3A
Other languages
Chinese (zh)
Other versions
CN109800730A (en)
Inventor
李华夏 (Li Huaxia)
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910090200.3A
Publication of CN109800730A
Application granted
Publication of CN109800730B

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the disclosure disclose a method and a device for generating an avatar generation model. One embodiment of the method comprises: acquiring a preset training sample set; acquiring a pre-established initial generative adversarial network; using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of a generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of a discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model. The avatar generation model generated by this embodiment can generate an avatar from an input vector, which helps to improve the efficiency of generating avatars.

Description

Method and device for generating head portrait generation model
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for generating an avatar generation model.
Background
At present, large numbers of avatars are needed in a variety of situations. For example, when training a model that converts an image of a user's face into an image of another style (e.g., an animation avatar), a large number of avatars is generally required as training samples. Existing methods for generating such training samples mainly obtain avatars from existing avatar resources or draw them manually.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for generating an avatar generation model, and a method and a device for generating an avatar.
In a first aspect, an embodiment of the present disclosure provides a method for generating an avatar generation model, the method including: acquiring a preset training sample set, wherein each training sample comprises a sample vector and a sample avatar corresponding to the sample vector; acquiring a pre-established initial generative adversarial network, wherein the initial generative adversarial network comprises a generation network and a discriminative network, the generation network is used for generating an avatar from an input sample vector, and the discriminative network is used for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network; and performing the following training step: using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model.
In some embodiments, after determining the trained generation network as the avatar generation model, the training step further comprises: for training samples in the training sample set, inputting the sample vectors included in the training samples into the generated avatar generation model to obtain avatars; determining a difference degree value characterizing the degree of difference between the obtained avatars; and in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold, determining the generated avatar generation model as the final avatar generation model.
In some embodiments, after determining the difference degree value characterizing the degree of difference between the obtained avatars, the method further comprises: in response to determining that the difference degree value is less than the difference degree threshold, regenerating the training sample set; and taking the generated avatar generation model as the initial generative adversarial network, and performing the training step again using the regenerated training sample set.
In some embodiments, regenerating the training sample set comprises: for each training sample in the training sample set, regenerating the sample vector corresponding to the sample avatar included in the training sample, and determining the regenerated sample vector and the sample avatar included in the training sample as a regenerated training sample.
In some embodiments, determining the difference degree value characterizing the degree of difference between the obtained avatars includes: determining feature vectors of the obtained avatars; and determining the difference degree value based on the pairwise distances between the determined feature vectors.
In some embodiments, the sample vector is a randomly generated vector of preset dimensions.
In a second aspect, embodiments of the present disclosure provide a method for generating an avatar, the method comprising: acquiring a target vector; inputting the target vector into a pre-trained avatar generation model, obtaining an avatar corresponding to the target vector and outputting the avatar, wherein the avatar generation model is generated according to the method described in any of the embodiments of the first aspect.
In some embodiments, the target vector is a randomly generated vector of preset dimensions.
In a third aspect, an embodiment of the present disclosure provides an apparatus for generating an avatar generation model, the apparatus including: a first acquisition unit configured to acquire a preset training sample set, wherein each training sample comprises a sample vector and a sample avatar corresponding to the sample vector; a second acquisition unit configured to acquire a pre-established initial generative adversarial network, wherein the initial generative adversarial network includes a generation network for generating an avatar from an input sample vector and a discriminative network for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input to the generation network; and a training unit configured to perform the following training step: using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model.
In some embodiments, the training unit comprises: a generating module configured to, for training samples in the training sample set, input the sample vectors included in the training samples into the generated avatar generation model to obtain avatars; a first determining module configured to determine a difference degree value characterizing the degree of difference between the obtained avatars; and a second determining module configured to determine the generated avatar generation model as the final avatar generation model in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold.
In some embodiments, the training unit further comprises: a generation module configured to regenerate the training sample set in response to determining that the difference degree value is less than the difference degree threshold; and a training module configured to take the generated avatar generation model as the initial generative adversarial network and perform the training step again using the regenerated training sample set.
In some embodiments, the generation module is further configured to: for each training sample in the training sample set, regenerate the sample vector corresponding to the sample avatar included in the training sample, and determine the regenerated sample vector and the sample avatar included in the training sample as a regenerated training sample.
In some embodiments, the first determining module comprises: a first determining submodule configured to determine feature vectors of the obtained avatars; and a second determining submodule configured to determine the difference degree value based on the pairwise distances between the determined feature vectors.
In some embodiments, the sample vector is a randomly generated vector of preset dimensions.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for generating an avatar, the apparatus including: a vector acquisition unit configured to acquire a target vector; and an avatar generation unit configured to input the target vector into a pre-trained avatar generation model, obtain an avatar corresponding to the target vector, and output the avatar, wherein the avatar generation model is generated according to the method described in any of the embodiments of the first aspect.
In some embodiments, the target vector is a randomly generated vector of preset dimensions.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first or second aspects.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements a method as described in any of the implementations of the first or second aspects.
According to the method and the device for generating an avatar generation model provided by the embodiments of the disclosure, a preset training sample set is acquired, wherein each training sample comprises a sample vector and a sample avatar corresponding to the sample vector; the sample vectors included in the training samples are used as inputs of a generation network, the avatars output by the generation network together with the sample avatars corresponding to the sample vectors input to the generation network are used as inputs of a discriminative network, the generation network and the discriminative network are trained, and the trained generation network is determined as the avatar generation model. The trained avatar generation model can generate avatars from input vectors, thereby improving the efficiency of generating avatars.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating an avatar generation model, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for generating an avatar generation model in accordance with embodiments of the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a method for generating an avatar according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an embodiment of an apparatus for generating an avatar generation model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an architecture of one embodiment of an apparatus for generating an avatar according to embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method or apparatus for generating an avatar generation model, or of the method or apparatus for generating an avatar, of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as an image processing application, a web browser application, an instant messaging tool, social platform software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices described above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, such as a background server that processes the training sample set uploaded by the terminal devices 101, 102, 103. The background server can train an initial generative adversarial network using the acquired training sample set, thereby obtaining an avatar generation model. In addition, the background server can also process an input vector using the avatar generation model to obtain an avatar.
It should be noted that the method for generating the avatar generation model provided in the embodiment of the present disclosure may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for generating the avatar generation model may be provided in the server 105, or may be provided in the terminal devices 101, 102, and 103. Furthermore, the method for generating the avatar provided by the embodiment of the present disclosure may be executed by the server 105, and may also be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for generating the avatar may be disposed in the server 105, and may also be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the set of training samples or target vectors required to train the model need not be obtained remotely, the system architecture described above may not include a network, but only a server or terminal device.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating an avatar generation model in accordance with the present disclosure is shown. The method for generating an avatar generation model comprises the following steps:
Step 201, a preset training sample set is obtained.
In this embodiment, an execution subject of the method for generating an avatar generation model (for example, the server or a terminal device shown in fig. 1) may obtain a preset training sample set remotely or locally via a wired or wireless connection. Each training sample comprises a sample vector and a sample avatar corresponding to the sample vector.
The sample avatar may be an avatar obtained in advance in various ways, such as an avatar extracted from a preset avatar library or an avatar drawn manually. In practice, the sample avatar is usually a cartoon avatar, and an avatar generation model trained on cartoon avatars can be used to generate cartoon avatars.
The sample vector may be generated in advance in various ways. For example, it may be obtained in advance through manual input by a technician.
In some optional implementations of this embodiment, the sample vector is a randomly generated vector of preset dimensions. Specifically, as an example, the execution subject for generating the sample vector may generate a preset number (e.g., 512) of random numbers conforming to a normal distribution, combine those numbers into one vector, and determine that vector as the sample vector; the dimension of the vector equals the preset number.
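For illustration only (this sketch is not part of the patent disclosure), the following Python snippet shows one way to generate such a sample vector; the use of NumPy, the default dimension of 512, and the standard normal distribution are assumptions consistent with the example above.

```python
import numpy as np

def make_sample_vector(dim=512, seed=None):
    """Generate one sample vector: `dim` random numbers drawn from a
    standard normal distribution, combined into a vector of dimension `dim`."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

# Pair each sample vector with a pre-obtained sample avatar to form a training sample.
sample_vector = make_sample_vector()  # shape: (512,)
```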
Step 202, a pre-established initial generative adversarial network is obtained.
In this embodiment, the execution subject may obtain the pre-established initial generative adversarial network remotely or locally. The initial generative adversarial network may be a generative adversarial network of various structures. For example, it may be a Deep Convolutional Generative Adversarial Network (DCGAN). It should be understood that the initial generative adversarial network may be an untrained generative adversarial network with initialized parameters, or an already trained generative adversarial network.
The initial generative adversarial network comprises a generation network and a discriminative network, wherein the generation network is used for generating an avatar from an input sample vector, and the discriminative network is used for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network.
The generation network may be a convolutional neural network for image processing (e.g., any of various structures including convolutional layers, pooling layers, unpooling layers, and deconvolution layers). The discriminative network may be a convolutional neural network (e.g., any of various structures including a fully connected layer, where the fully connected layer can perform a classification function). Alternatively, the discriminative network may be another model for classification, such as a Support Vector Machine (SVM). Here, for each avatar input to the discriminative network, the discriminative network may output a label 1 (or 0) corresponding to the avatar if it determines that the avatar was output by the generation network, and a label 0 (or 1) otherwise. The discriminative network may also output other preset information and is not limited to the values 1 and 0.
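As a concrete but purely illustrative sketch of the kind of networks described above, the following PyTorch code defines a DCGAN-style generation network built from deconvolution (transposed-convolution) layers and a convolutional discriminative network ending in a classification output. The layer sizes, the 512-dimensional input, and the 64x64 avatar resolution are assumptions, not specifics of the patent.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a sample vector to an avatar image (DCGAN-style, assumed sizes)."""
    def __init__(self, z_dim=512, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # (z_dim, 1, 1) -> (256, 4, 4)
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            # (256, 4, 4) -> (128, 8, 8)
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            # (128, 8, 8) -> (64, 16, 16)
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            # (64, 16, 16) -> (32, 32, 32)
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            # (32, 32, 32) -> (channels, 64, 64)
            nn.ConvTranspose2d(32, channels, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):                       # z: (batch, z_dim)
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Outputs the probability that an avatar is a sample avatar rather than generated."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),               # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),  # -> 16
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True), # -> 8
            nn.Conv2d(256, 1, 8, 1, 0),          # -> 1x1 real-vs-generated score
            nn.Flatten(), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, 64, 64)
        return self.net(x).squeeze(1)           # (batch,) probabilities
```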
Step 203, the following training step is performed: using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model.
In this embodiment, the execution subject may perform the training step described above.
Specifically, the execution subject may first fix the parameters of either of the generation network and the discriminative network (referred to as the first network) and optimize the network whose parameters are not fixed (referred to as the second network); it may then fix the parameters of the second network and optimize the first network. This iteration is repeated until the discriminative network cannot distinguish the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network. At that point, the avatars generated by the generation network are close to the sample avatars, the discriminative network cannot accurately distinguish the two (that is, its discrimination accuracy is 50%), and the generation network at this point can be determined as the avatar generation model. In general, the execution subject may train the generation network and the discriminative network using existing back-propagation and gradient-descent algorithms. The parameters of the generation network and the discriminative network are adjusted after each round of training, and the networks obtained after each parameter adjustment are used as the initial generative adversarial network for the next round. During training, a loss value can be determined using a loss function, and the generation network and the discriminative network are trained iteratively according to the loss value so that the loss value determined in each iteration is minimized.
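A minimal sketch of this alternating optimization, assuming the `Generator` and `Discriminator` classes from the previous sketch, full-batch updates, and a binary cross-entropy loss with the Adam optimizer (the patent only specifies back propagation and gradient descent, so these concrete choices are assumptions):

```python
import torch
import torch.nn as nn

def train_gan(gen, disc, sample_vectors, sample_avatars, epochs=100, lr=2e-4):
    """Alternately optimize the discriminative network and the generation network.

    sample_vectors: (N, 512) float tensor; sample_avatars: (N, 3, 64, 64) float tensor.
    """
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(gen.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(disc.parameters(), lr=lr, betas=(0.5, 0.999))
    n = sample_vectors.size(0)
    real, fake = torch.ones(n), torch.zeros(n)

    for _ in range(epochs):
        # Fix the generation network, optimize the discriminative network:
        # label sample avatars as real and generated avatars as fake.
        opt_d.zero_grad()
        d_loss = (bce(disc(sample_avatars), real)
                  + bce(disc(gen(sample_vectors).detach()), fake))
        d_loss.backward()
        opt_d.step()

        # Fix the discriminative network, optimize the generation network:
        # push generated avatars to be classified as real.
        opt_g.zero_grad()
        g_loss = bce(disc(gen(sample_vectors)), real)
        g_loss.backward()
        opt_g.step()

    return gen  # the trained generation network serves as the avatar generation model
```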
In this way, the parameters of the avatar generation model obtained by training the initial generative adversarial network can be determined by back-propagating the results of the discriminative network, and model training can be achieved without relying on a large number of labeled samples, which reduces labor costs and further improves the efficiency of generating the avatar generation model.
In some optional implementations of this embodiment, after the trained generation network is determined as the avatar generation model, the training step may further include the following sub-steps:
Step one: for training samples in the training sample set, input the sample vectors included in the training samples into the generated avatar generation model to obtain avatars.
Step two: determine a difference degree value characterizing the degree of difference between the obtained avatars. Specifically, as an example, the execution subject may determine the similarity between each pair of obtained avatars using an existing algorithm for measuring image similarity (e.g., a perceptual hash algorithm or the SIFT (Scale-Invariant Feature Transform) algorithm), and determine the minimum (or average) of the obtained similarities as the difference degree value. Generally, the greater the difference degree value, the greater the degree of difference between the obtained avatars, i.e., the greater their diversity.
In some optional implementations of this embodiment, the execution subject may determine the difference degree value according to the following steps:
First, determine the feature vectors of the obtained avatars. Specifically, the execution subject may obtain the feature vector of each avatar using an existing method for determining image feature vectors (e.g., the LBP (Local Binary Pattern) algorithm or a neural-network-based algorithm). The feature vector of an avatar can be used to characterize features of the avatar (e.g., shape, texture, etc.). As an example, the execution subject may determine the feature vector of an avatar using a preset convolutional neural network. In general, such a convolutional neural network may include a convolutional layer for generating a feature map of the avatar and a fully connected layer for connecting the data included in the feature map into a feature vector.
Then, determine the difference degree value based on the pairwise distances between the determined feature vectors. Specifically, the execution subject may determine the minimum (or average) of the obtained distances as the difference degree value. The distance between feature vectors may include, but is not limited to, any of the following: Euclidean distance, Manhattan distance, Chebyshev distance, etc.
Step three: in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold, determine the generated avatar generation model as the final avatar generation model. A difference degree value greater than or equal to the preset threshold indicates that the avatars generated by the current avatar generation model differ substantially from one another, that is, the generated avatars are diverse. This implementation reduces the similarity between avatars generated by the avatar generation model, allowing more diversified avatars to be generated. A code sketch of this check is given below.
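The following sketch illustrates the diversity check, assuming the avatar feature vectors have already been extracted (e.g., by a convolutional neural network) and using the minimum pairwise Euclidean distance as the difference degree value; the threshold value is an arbitrary placeholder:

```python
import itertools
import numpy as np

def difference_degree(feature_vectors):
    """Difference degree value: minimum pairwise Euclidean distance between
    the feature vectors of the generated avatars (assumes at least two)."""
    return min(np.linalg.norm(a - b)
               for a, b in itertools.combinations(feature_vectors, 2))

def is_final_model(feature_vectors, threshold=1.0):
    """Keep the model as final if the generated avatars are diverse enough."""
    return difference_degree(feature_vectors) >= threshold
```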
In some optional implementations of this embodiment, the execution subject may further perform the following steps after determining the difference degree value characterizing the degree of difference between the obtained avatars:
First, in response to determining that the difference degree value is less than the difference degree threshold, regenerate the training sample set. As an example, the execution subject may re-pair the sample vectors and the sample avatars included in the training samples of the training sample set, so as to generate a new training sample set. For example, if sample vector A originally corresponds to sample avatar A and sample vector B originally corresponds to sample avatar B, sample vector A may be adjusted to correspond to sample avatar B and sample vector B to sample avatar A.
In some optional implementations of this embodiment, the execution subject may regenerate the training sample set as follows: for each training sample in the training sample set, regenerate the sample vector corresponding to the sample avatar included in the training sample, and determine the regenerated sample vector and the sample avatar included in the training sample as a regenerated training sample. The sample vector may be generated in various ways, for example through manual input by a technician, or automatically at random. For example, the execution subject may generate a preset number of random numbers conforming to a normal distribution, combine the generated numbers into one vector, and determine that vector as the regenerated sample vector, whose dimension is the same as that of the original sample vector.
Then, the generated avatar generation model is used as the initial generative adversarial network, and the training step is performed again using the regenerated training sample set. Retraining the generated avatar generation model on the regenerated training sample set improves the diversity of the avatars output by the retrained avatar generation model. A code sketch of this regeneration is given below.
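An illustrative sketch of the regeneration step, reusing the hypothetical `make_sample_vector` helper from the earlier sketch and representing each training sample as a (sample vector, sample avatar) pair of NumPy arrays:

```python
def regenerate_training_samples(training_samples):
    """Regenerate the training sample set: for each (sample_vector, sample_avatar)
    pair, draw a fresh random vector of the same dimension and keep the avatar."""
    return [(make_sample_vector(dim=vector.shape[0]), avatar)
            for vector, avatar in training_samples]
```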
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating an avatar generation model according to this embodiment. In the application scenario of fig. 3, the electronic device 301 first obtains a preset training sample set 302, where each training sample in the training sample set 302 includes a sample vector (e.g., a 512-dimensional vector randomly generated in advance) and a sample avatar corresponding to the sample vector. The electronic device 301 then locally retrieves the pre-established initial generative adversarial network 303. The initial generative adversarial network 303 includes a generation network 3031 and a discriminative network 3032; the generation network 3031 is used for generating an avatar from an input sample vector, and the discriminative network 3032 is used for determining whether an avatar input to it was output by the generation network 3031. The following training step is then performed: using a machine learning method, a sample vector included in a training sample in the training sample set 302 is used as an input of the generation network, the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network are used as inputs of the discriminative network, and the two networks are trained. When the discriminative network can no longer accurately distinguish the avatar output by the generation network from the sample avatar (i.e., its discrimination accuracy is 50%), the generation network at that point is determined as the avatar generation model 304.
In the method provided by the above embodiment of the present disclosure, a preset training sample set is obtained, where each training sample includes a sample vector and a sample avatar corresponding to the sample vector; using a machine learning method, the sample vectors included in the training samples are used as inputs of the generation network, the avatars output by the generation network and the sample avatars corresponding to the sample vectors input to the generation network are used as inputs of the discriminative network, the two networks are trained, and the trained generation network is determined as the avatar generation model. The trained avatar generation model can generate an avatar from an input vector, which helps to improve the efficiency of generating avatars.
With further reference to fig. 4, a flow 400 of one embodiment of a method for generating an avatar according to the present disclosure is shown. The flow 400 of the method for generating an avatar comprises the steps of:
Step 401, obtaining a target vector.
In this embodiment, an execution subject of the method for generating an avatar (e.g., the server or a terminal device shown in fig. 1) may acquire the target vector remotely or locally via a wired or wireless connection. The target vector is the vector from which an avatar is to be generated. For example, the target vector may be a vector manually input by a user, or a vector automatically generated by the execution subject or another electronic device.
In some optional implementations of this embodiment, the target vector may be a randomly generated vector of preset dimensions. Specifically, as an example, the execution subject for generating the target vector may generate a preset number (e.g., 512) of random numbers conforming to a normal distribution, combine those numbers into one vector, and determine that vector as the target vector; the dimension of the vector equals the preset number.
Step 402, inputting the target vector into a pre-trained avatar generation model, obtaining an avatar corresponding to the target vector and outputting the avatar.
In this embodiment, the execution subject may input the target vector into a pre-trained avatar generation model to obtain and output an avatar corresponding to the target vector. The avatar generation model is generated according to the method described in the embodiment corresponding to fig. 2. In practice, the sample avatars used for training the avatar generation model may be cartoon avatars, so that the avatar generation model outputs cartoon avatars.
The execution subject may output the generated avatar in various ways. For example, the generated avatar may be displayed on a display screen included in the execution subject, or transmitted to another electronic device communicatively connected to the execution subject.
In practice, the generated avatars may further be used as training samples for training other models. Because training other models usually requires a large number of training samples, this embodiment makes it possible, when a large number of avatars is needed, to obtain them simply by inputting vectors into the avatar generation model, without extracting avatars from an image library or drawing them manually, which effectively improves the efficiency of generating avatars.
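A sketch of this inference step, assuming the trained PyTorch generation network from the earlier sketches serves as the avatar generation model and that the target vector is randomly generated as described above:

```python
import torch

def generate_avatar(avatar_model, z_dim=512):
    """Feed a randomly generated target vector into the avatar generation
    model and return the generated avatar image tensor."""
    avatar_model.eval()                       # inference mode
    target_vector = torch.randn(1, z_dim)     # randomly generated target vector
    with torch.no_grad():
        avatar = avatar_model(target_vector)  # (1, 3, 64, 64) under the assumed sizes
    return avatar

# Usage (assuming `gen` is the generation network returned by train_gan):
# avatar = generate_avatar(gen)
```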
According to the method provided by the embodiment of the disclosure, a target vector is acquired and input into an avatar generation model generated in advance according to the method described in the embodiment corresponding to fig. 2, and an avatar corresponding to the target vector is obtained and output. An avatar can thus be obtained from a vector, improving the efficiency of generating avatars.
With further reference to fig. 5, as an implementation of the method shown in fig. 2, the present disclosure provides an embodiment of an apparatus for generating an avatar generation model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating an avatar generation model of this embodiment includes: a first acquisition unit 501 configured to acquire a preset training sample set, where each training sample includes a sample vector and a sample avatar corresponding to the sample vector; a second acquisition unit 502 configured to acquire a pre-established initial generative adversarial network, where the initial generative adversarial network includes a generation network for generating an avatar from an input sample vector and a discriminative network for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input to the generation network; and a training unit 503 configured to perform the following training step: using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model.
In this embodiment, the first acquisition unit 501 may acquire the preset training sample set remotely or locally via a wired or wireless connection. Each training sample comprises a sample vector and a sample avatar corresponding to the sample vector.
The sample avatar may be an avatar obtained in advance in various ways, such as an avatar extracted from a preset avatar library or an avatar drawn manually. In practice, the sample avatar is usually a cartoon avatar, and an avatar generation model trained on cartoon avatars can be used to generate cartoon avatars.
The sample vector may be generated in advance in various ways. For example, it may be obtained in advance through manual input by a technician.
In this embodiment, the second acquisition unit 502 may acquire the pre-established initial generative adversarial network remotely or locally. The initial generative adversarial network may be a generative adversarial network of various structures, for example a deep convolutional generative adversarial network. It should be understood that the initial generative adversarial network may be an untrained generative adversarial network with initialized parameters, or an already trained generative adversarial network.
The initial generative adversarial network comprises a generation network and a discriminative network, wherein the generation network is used for generating an avatar from an input sample vector, and the discriminative network is used for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network.
The generation network may be a convolutional neural network for image processing (e.g., any of various structures including convolutional layers, pooling layers, unpooling layers, and deconvolution layers). The discriminative network may be a convolutional neural network (e.g., any of various structures including a fully connected layer, where the fully connected layer can perform a classification function). Alternatively, the discriminative network may be another model for classification, such as a support vector machine. Here, for each avatar input to the discriminative network, the discriminative network may output a label 1 (or 0) corresponding to the avatar if it determines that the avatar was output by the generation network, and a label 0 (or 1) otherwise. The discriminative network may also output other preset information and is not limited to the values 1 and 0.
In this embodiment, using a machine learning method, the training unit 503 may train the generation network and the discriminative network by taking a sample vector included in a training sample in the training sample set as an input of the generation network and taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, and determine the trained generation network as the avatar generation model.
Specifically, the training unit 503 may first fix the parameters of either of the generation network and the discriminative network (referred to as the first network) and optimize the network whose parameters are not fixed (referred to as the second network); it may then fix the parameters of the second network and optimize the first network. This iteration is repeated until the discriminative network cannot distinguish the avatar output by the generation network from the sample avatar corresponding to the sample vector input to the generation network. At that point, the avatars generated by the generation network are close to the sample avatars, the discriminative network cannot accurately distinguish the two (that is, its discrimination accuracy is 50%), and the generation network at this point can be determined as the avatar generation model. In general, the training unit 503 may train the generation network and the discriminative network using existing back-propagation and gradient-descent algorithms. The parameters of the generation network and the discriminative network are adjusted after each round of training, and the networks obtained after each parameter adjustment are used as the initial generative adversarial network for the next round. During training, a loss value can be determined using a loss function, and the generation network and the discriminative network are trained iteratively according to the loss value so that the loss value determined in each iteration is minimized.
In this way, the parameters of the avatar generation model obtained by training the initial generative adversarial network can be determined by back-propagating the results of the discriminative network, and model training can be achieved without relying on a large number of labeled samples, which reduces labor costs and further improves the efficiency of generating the avatar generation model.
In some optional implementations of this embodiment, the training unit 503 may include: a generating module (not shown in the figure) configured to, for training samples in the training sample set, input the sample vectors included in the training samples into the generated avatar generation model to obtain avatars; a first determining module (not shown in the figure) configured to determine a difference degree value characterizing the degree of difference between the obtained avatars; and a second determining module (not shown in the figure) configured to determine the generated avatar generation model as the final avatar generation model in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold.
In some optional implementations of this embodiment, the training unit 503 may further include: a generation module (not shown in the figure) configured to regenerate the training sample set in response to determining that the difference degree value is less than the difference degree threshold; and a training module (not shown in the figure) configured to take the generated avatar generation model as the initial generative adversarial network and perform the training step again using the regenerated training sample set.
In some optional implementations of this embodiment, the generation module may be further configured to: for each training sample in the training sample set, regenerate the sample vector corresponding to the sample avatar included in the training sample, and determine the regenerated sample vector and the sample avatar included in the training sample as a regenerated training sample.
In some optional implementations of this embodiment, the first determining module may include: a first determining submodule (not shown in the figure) configured to determine feature vectors of the obtained avatars; and a second determining submodule (not shown in the figure) configured to determine the difference degree value based on the pairwise distances between the determined feature vectors.
In some optional implementations of this embodiment, the sample vector is a randomly generated vector of preset dimensions.
In the apparatus 500 provided by the above embodiment of the present disclosure, a preset training sample set is acquired, where each training sample includes a sample vector and a sample avatar corresponding to the sample vector; using a machine learning method, the sample vectors included in the training samples are used as inputs of the generation network, the avatars output by the generation network and the sample avatars corresponding to the sample vectors input to the generation network are used as inputs of the discriminative network, the two networks are trained, and the trained generation network is determined as the avatar generation model. The trained avatar generation model can generate an avatar from an input vector, which helps to improve the efficiency of generating avatars.
With further reference to fig. 6, as an implementation of the method shown in fig. 4, the present disclosure provides an embodiment of an apparatus for generating an avatar. The embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 4, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the apparatus 600 for generating an avatar of this embodiment includes: a vector acquisition unit 601 configured to acquire a target vector; and an avatar generation unit 602 configured to input the target vector into a pre-trained avatar generation model, obtain an avatar corresponding to the target vector, and output the avatar, wherein the avatar generation model is generated according to the method described in the embodiment corresponding to fig. 2.
In this embodiment, the vector acquisition unit 601 may acquire the target vector from a remote location or from a local location by a wired connection or a wireless connection. Wherein the target vector is the vector with which the avatar is to be generated. For example, the target vector may be a vector manually input by a user, or a vector automatically generated by the apparatus 600 or other electronic device described above.
In this embodiment, the avatar generation unit 602 may input the target vector into a pre-trained avatar generation model to obtain and output an avatar corresponding to the target vector. The avatar generation model is generated according to the method described in the embodiment corresponding to fig. 2. In practice, the sample avatars used for training the avatar generation model may be cartoon avatars, so that the avatar generation model outputs cartoon avatars.
In some optional implementations of this embodiment, the target vector is a randomly generated vector of preset dimensions.
The apparatus 600 provided in the foregoing embodiment of the present disclosure obtains the target vector, inputs the target vector into the avatar generation model generated in advance according to the method described in the embodiment corresponding to fig. 2, obtains the avatar corresponding to the target vector, and outputs the avatar, thereby obtaining the avatar according to the vector, and improving the efficiency of generating the avatar.
Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage device 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a preset training sample set, wherein each training sample comprises a sample vector and a sample avatar corresponding to the sample vector; acquire a pre-established initial generative adversarial network, wherein the initial generative adversarial network comprises a generation network and a discriminative network, the generation network is used for generating an avatar from an input sample vector, and the discriminative network is used for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network; and perform the following training step: using a machine learning method, taking a sample vector included in a training sample in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discriminative network, training the generation network and the discriminative network, and determining the trained generation network as the avatar generation model.
Further, the one or more programs, when executed by the electronic device, may cause the electronic device to: acquire a target vector; and input the target vector into a pre-trained avatar generation model to obtain and output an avatar corresponding to the target vector.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising a first acquisition unit, a second acquisition unit, and a training unit. The names of these units do not, in some cases, constitute a limitation of the units themselves; for example, the first acquisition unit may also be described as "a unit that acquires a preset training sample set".
The foregoing description is only illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned technical features, and also encompasses other technical solutions formed by any combination of the above-mentioned technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (16)

1. A method for generating an avatar generation model, comprising:
acquiring a preset training sample set, wherein the training samples comprise sample vectors and sample avatars corresponding to the sample vectors;
acquiring a pre-established initial generation countermeasure network, wherein the initial generation countermeasure network comprises a generation network and a discrimination network, the generation network is used for generating an avatar by using the input sample vector, and the discrimination network is used for distinguishing the avatar output by the generation network from the sample avatar corresponding to the sample vector input into the generation network;
the following training step is performed: by using a machine learning method, taking a sample vector included in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discrimination network, training the generation network and the discrimination network, and determining the trained generation network as an avatar generation model, wherein the avatar generation model is used for generating an avatar based on a random vector input to the avatar generation model;
after determining the trained generation network as an avatar generation model, the training step further includes:
for the training samples in the training sample set, inputting the sample vectors included in the training samples into the generated avatar generation model to obtain avatars;
determining a difference degree value for characterizing a degree of difference between the obtained avatars;
and in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold value, determining the generated avatar generation model as a final avatar generation model.
2. The method of claim 1, wherein after said determining a difference degree value characterizing a degree of difference between the obtained avatars, the method further comprises:
in response to determining that the difference degree value is less than the difference degree threshold, regenerating a set of training samples;
and taking the generated avatar generation model as an initial generation countermeasure network and re-executing the training step by using the regenerated training sample set.
3. The method of claim 2, wherein the regenerating a set of training samples comprises:
and for the training samples in the training sample set, regenerating the sample vectors corresponding to the sample avatars included in the training samples, and determining the regenerated sample vectors and the sample avatars included in the training samples as regenerated training samples.
4. The method of claim 1, wherein said determining a difference degree value characterizing a degree of difference between the obtained avatars comprises:
determining a feature vector of each of the obtained avatars;
and determining the difference degree value based on the pairwise distances between the determined feature vectors (an illustrative sketch of this computation follows the claims).
5. The method according to one of claims 1 to 4, wherein the sample vector is a randomly generated vector of preset dimensions.
6. A method for generating an avatar, comprising:
acquiring a target vector;
inputting the target vector into a pre-trained avatar generation model to obtain an avatar corresponding to the target vector and outputting the avatar, wherein the avatar generation model is generated according to the method of any one of claims 1-5.
7. The method of claim 6, wherein the target vector is a randomly generated vector of preset dimensions.
8. An apparatus for generating an avatar generation model, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is configured to acquire a preset training sample set, and the training samples comprise sample vectors and sample head portraits corresponding to the sample vectors;
a second obtaining unit configured to obtain a pre-established initial generation countermeasure network, wherein the initial generation countermeasure network includes a generation network for generating an avatar using the input sample vector and a discrimination network for discriminating an avatar output by the generation network from a sample avatar corresponding to the sample vector input to the generation network;
a training unit configured to perform the following training step: by using a machine learning method, taking a sample vector included in the training sample set as an input of the generation network, taking the avatar output by the generation network and the sample avatar corresponding to the sample vector input to the generation network as inputs of the discrimination network, training the generation network and the discrimination network, and determining the trained generation network as an avatar generation model, wherein the avatar generation model is used for generating an avatar based on a random vector input to the avatar generation model;
the training unit includes:
the generating module is configured to input sample vectors included in training samples into the generated head portrait generating model for the training samples in the training sample set to obtain head portraits;
a first determining module configured to determine a difference degree value characterizing a degree of difference between the obtained avatars;
a second determination module configured to determine the generated avatar generation model as a final avatar generation model in response to determining that the difference degree value is greater than or equal to a preset difference degree threshold value.
9. The apparatus of claim 8, wherein the training unit further comprises:
a generation module configured to regenerate a set of training samples in response to determining that the difference degree value is less than the difference degree threshold;
a training module configured to take the generated avatar generation model as an initial generation countermeasure network and re-execute the training step using the regenerated training sample set.
10. The apparatus of claim 9, wherein the generation module is further configured to:
and for the training samples in the training sample set, regenerating the sample vectors corresponding to the sample avatars included in the training samples, and determining the regenerated sample vectors and the sample avatars included in the training samples as regenerated training samples.
11. The apparatus of claim 8, wherein the first determining module comprises:
a first determining submodule configured to determine a feature vector of each of the obtained avatars;
a second determining submodule configured to determine the difference degree value based on the pairwise distances between the determined feature vectors.
12. The apparatus according to one of claims 8 to 11, wherein the sample vector is a randomly generated vector of preset dimensions.
13. An apparatus for generating an avatar, comprising:
a vector acquisition unit configured to acquire a target vector;
an avatar generation unit configured to input the target vector into a pre-trained avatar generation model to obtain an avatar corresponding to the target vector and output the avatar, wherein the avatar generation model is generated according to the method of any one of claims 1-5.
14. The apparatus of claim 13, wherein the target vector is a randomly generated vector of preset dimensions.
15. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
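As a worked illustration of the difference-degree check recited in claims 1 and 4, the sketch below (hypothetical, reusing the `gen` network and `latent_dim` assumed earlier) derives a difference degree value from pairwise distances between feature vectors of the generated avatars, and either accepts the model as final or signals that the training sample set should be regenerated; here the flattened pixels stand in for the feature vectors, and the threshold value is an arbitrary assumption.

```python
# Hypothetical sketch of the difference-degree check in claims 1 and 4.
def difference_degree(avatars: torch.Tensor) -> float:
    feats = avatars.flatten(1)         # assumed feature vectors: flattened pixels
    dists = torch.cdist(feats, feats)  # pairwise Euclidean distances
    n = feats.size(0)
    return (dists.sum() / (n * (n - 1))).item()  # mean over distinct ordered pairs

DIFF_THRESHOLD = 0.5  # assumed preset difference degree threshold

sample_vectors = torch.randn(16, latent_dim)
with torch.no_grad():
    avatars = gen(sample_vectors)
if difference_degree(avatars) >= DIFF_THRESHOLD:
    final_model = gen  # accept the generated model as the final avatar generation model
else:
    # regenerate the sample vectors for the sample avatars and re-execute
    # the training step with the regenerated training sample set (claims 2-3)
    pass
```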
CN201910090200.3A 2019-01-30 2019-01-30 Method and device for generating head portrait generation model Active CN109800730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910090200.3A CN109800730B (en) 2019-01-30 2019-01-30 Method and device for generating head portrait generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910090200.3A CN109800730B (en) 2019-01-30 2019-01-30 Method and device for generating head portrait generation model

Publications (2)

Publication Number Publication Date
CN109800730A CN109800730A (en) 2019-05-24
CN109800730B true CN109800730B (en) 2022-03-08

Family

ID=66559311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910090200.3A Active CN109800730B (en) 2019-01-30 2019-01-30 Method and device for generating head portrait generation model

Country Status (1)

Country Link
CN (1) CN109800730B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674335B (en) * 2019-09-16 2022-08-23 重庆邮电大学 Hash code and image bidirectional conversion method based on multiple generation and multiple countermeasures
CN111091493B (en) * 2019-12-24 2023-10-31 北京达佳互联信息技术有限公司 Image translation model training method, image translation method and device and electronic equipment
CN111126503B (en) * 2019-12-27 2023-09-26 北京同邦卓益科技有限公司 Training sample generation method and device
CN111369468B (en) * 2020-03-09 2022-02-01 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111402112B (en) * 2020-03-09 2024-07-26 抖音视界有限公司 Image processing method, device, electronic equipment and computer readable medium
CN113689527B (en) * 2020-05-15 2024-02-20 武汉Tcl集团工业研究院有限公司 Training method of face conversion model and face image conversion method
US12062121B2 (en) 2021-10-02 2024-08-13 Toyota Research Institute, Inc. System and method of a digital persona for empathy and understanding

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101677390A (en) * 2008-09-19 2010-03-24 佛山普立华科技有限公司 Digital photographic arrangement and head portrait picture shooting method thereof
US9971958B2 (en) * 2016-06-01 2018-05-15 Mitsubishi Electric Research Laboratories, Inc. Method and system for generating multimodal digital images
US10074038B2 (en) * 2016-11-23 2018-09-11 General Electric Company Deep learning medical systems and methods for image reconstruction and quality evaluation
CN107016406A (en) * 2017-02-24 2017-08-04 中国科学院合肥物质科学研究院 The pest and disease damage image generating method of network is resisted based on production
CN107194418B (en) * 2017-05-10 2021-09-28 中国科学院合肥物质科学研究院 Rice aphid detection method based on antagonistic characteristic learning
CN107577985B (en) * 2017-07-18 2019-10-15 南京邮电大学 The implementation method of the face head portrait cartooning of confrontation network is generated based on circulation
CN108197525B (en) * 2017-11-20 2020-08-11 中国科学院自动化研究所 Face image generation method and device
CN108121975B (en) * 2018-01-04 2022-04-19 中科汇通投资控股有限公司 Face recognition method combining original data and generated data
CN108154547B (en) * 2018-01-17 2019-08-09 百度在线网络技术(北京)有限公司 Image generating method and device
CN108364029A (en) * 2018-03-19 2018-08-03 百度在线网络技术(北京)有限公司 Method and apparatus for generating model
CN108537152B (en) * 2018-03-27 2022-01-25 百度在线网络技术(北京)有限公司 Method and apparatus for detecting living body
CN108491809B (en) * 2018-03-28 2023-09-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating near infrared image generation model
CN109190750B (en) * 2018-07-06 2021-06-08 国家计算机网络与信息安全管理中心 Small sample generation method and device based on countermeasure generation network
CN109255364B (en) * 2018-07-12 2021-06-08 杭州电子科技大学 Scene recognition method for generating countermeasure network based on deep convolution
CN109218629B (en) * 2018-09-14 2021-02-05 三星电子(中国)研发中心 Video generation method, storage medium and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Customizing face images is not that hard! Easily change faces using the TL-GAN model; Shaobo GUAN; Tencent News – Synced (机器之心) official account; 2018-10-27; pp. 1-10 *
Some applications of generative adversarial networks (GANs) in image generation; fledsu; Zhihu (知乎); 2018-03-26; pp. 1-6 *

Also Published As

Publication number Publication date
CN109800730A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN109816589B (en) Method and apparatus for generating cartoon style conversion model
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN109800730B (en) Method and device for generating head portrait generation model
CN109858445B (en) Method and apparatus for generating a model
CN108898186B (en) Method and device for extracting image
CN109214343B (en) Method and device for generating face key point detection model
CN107633218B (en) Method and apparatus for generating image
CN109101919B (en) Method and apparatus for generating information
CN107609506B (en) Method and apparatus for generating image
CN109993150B (en) Method and device for identifying age
CN110009059B (en) Method and apparatus for generating a model
CN109829432B (en) Method and apparatus for generating information
CN109981787B (en) Method and device for displaying information
CN109754464B (en) Method and apparatus for generating information
CN111524216B (en) Method and device for generating three-dimensional face data
WO2020211573A1 (en) Method and device for processing image
CN112149699B (en) Method and device for generating model and method and device for identifying image
CN109145783B (en) Method and apparatus for generating information
CN110059623B (en) Method and apparatus for generating information
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN111539903B (en) Method and device for training face image synthesis model
CN113505848A (en) Model training method and device
CN111311480A (en) Image fusion method and device
CN115937033A (en) Image generation method and device and electronic equipment
CN111539287B (en) Method and device for training face image generation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant