WO2020139054A1 - Apparatus and method for generating a virtual avatar - Google Patents

Apparatus and method for generating a virtual avatar

Info

Publication number
WO2020139054A1
WO2020139054A1 (PCT/KR2019/018710)
Authority
WO
WIPO (PCT)
Prior art keywords
target object
occlusion
virtual avatar
image
occlusion objects
Prior art date
Application number
PCT/KR2019/018710
Other languages
French (fr)
Inventor
Yanqing Lu
Xiufen CUI
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Publication of WO2020139054A1 publication Critical patent/WO2020139054A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses a method and apparatus for generating a virtual avatar, comprising: detecting whether a target object to be virtualized in an image wears preset occlusion objects; when the target object wears at least one of the occlusion objects: removing the detected occlusion objects from the image of the target object by a pre-trained neural network model; generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects and loading them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object; and when the target object does not wear the occlusion objects: directly generating the corresponding virtual avatar according to the image of the target object. Through the present disclosure, the similarity between the virtual avatar and the real image may be improved.

Description

APPARATUS AND METHOD FOR GENERATING A VIRTUAL AVATAR
The disclosure relates to the field of image processing technologies. More particularly, the disclosure relates to an apparatus and method for generating a virtual avatar.
With the increasing popularity of virtual avatars in terminal devices such as mobile phones, generation methods based on models of facial expressions and actions have become mainstream. A virtual avatar is mostly generated by selecting an existing selfie or by taking a selfie.
The related virtual avatar generation scheme performs model matching directly on the user's self-photograph: for each facial feature, the corresponding facial feature texture is loaded from the model, and the virtual avatar is composed from these textures.
In the process of implementing the present disclosure, the inventor found that the related virtual avatar generation scheme may generate an erroneous three-dimensional virtualized avatar in many cases. In particular, when a user takes an image while wearing decorative objects such as glasses or earrings, the virtual avatar generated from that image has many errors and a low degree of similarity with the user, so that the virtual avatar cannot accurately reflect the user's appearance characteristics, which impairs the recognizability of the virtual avatar.
An aspect of the present disclosure is to provide a method and device for generating a virtual avatar, which may improve the similarity between the virtual avatar and the real image.
In order to achieve the object above, the present disclosure proposes the following technical solution:
A method for generating a virtual avatar, comprising:
detecting whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects:
removing, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and
selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and loading them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects:
directly generating the corresponding virtual avatar according to the image of the target object.
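For illustration only, a minimal Python sketch of this pipeline follows. Every helper in it (detect_occlusions, remove_occlusions, generate_avatar, match_occlusion_image, attach_to_avatar) is a hypothetical placeholder for the corresponding step above, not an API defined by the disclosure.

```python
# Hypothetical sketch of the disclosed pipeline; all helpers are placeholders.

def generate_virtual_avatar(image, occlusion_library):
    # Detect preset occlusion objects (e.g., glasses) on the target object.
    occlusions = detect_occlusions(image)

    if not occlusions:
        # No occlusion objects: generate the avatar directly from the image.
        return generate_avatar(image)

    # Remove the detected occlusion objects with the pre-trained neural
    # network model, then generate the avatar from the restored image.
    clean_image = remove_occlusions(image, occlusions)
    avatar = generate_avatar(clean_image)

    # Match each removed object's external features (e.g., shape, color)
    # against the preset 3D library and load the match at the corresponding
    # position of the avatar.
    for obj in occlusions:
        match = match_occlusion_image(obj.external_features, occlusion_library)
        attach_to_avatar(avatar, match, position=obj.position)
    return avatar
```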
Preferably, when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
Preferably, the neural network model comprises a convolutional neural network model.
Preferably, the external features comprise shape and/or color.
A device for generating a virtual avatar, comprising a processor, wherein the processor is configured to:
detect whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects:
remove, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
generate a corresponding virtual avatar according to the obtained image without the occlusion objects; and
select, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and load them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects:
directly generate the corresponding virtual avatar according to the image of the target object.
Preferably, when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
Preferably, the neural network model comprises a convolutional neural network model.
Preferably, the external features comprise shape and/or color.
A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method for generating a virtual avatar as described above.
An electronic device, comprising: a non-transitory computer readable storage medium, and a processor capable of accessing the non-transitory computer readable storage medium.
In summary, the method and device for generating a virtual avatar provided by the embodiments of the present disclosure generate the avatar according to the wearing condition of the occlusion objects on the target object. When the occlusion objects are worn, they are removed from the image of the target object by Artificial Intelligence (AI) technology, to restore the image of the target object to an ideal input state for generating the virtual avatar. After the virtual avatar is generated, the external features of the removed occlusion objects are matched to corresponding occlusion object images, and these images are loaded onto the virtual avatar to obtain the final virtual avatar of the target object.
In this way, various embodiments of the present disclosure achieve a better 3D display effect of the virtual avatar, thereby effectively avoiding the errors caused by the occlusion objects when generating the virtual avatar in the related technology, and improving the similarity between the virtual avatar and the real image of the target object.
Figure 1 is a flowchart for a method according to various embodiments of the present disclosure.
In order to make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below with reference to drawings and specific embodiments.
Figure 1 is a flowchart for a method according to various embodiments of the present disclosure. As shown in Figure 1, the method for generating a user virtual avatar in an embodiment includes:
Step 101: Whether a target object to be virtualized in an image wears preset occlusion objects is detected.
In this step, it is necessary to detect whether the target object in the image is wearing the preset occlusion objects, so that when there is an occlusion object, the occlusion object is processed first, and then the virtual avatar is generated to improve the similarity between the virtual avatar and the real target object.
The image may specifically be a self-photograph of a user or another image designated by the user, which is not limited herein.
Preferably, when the target object is a portrait, the occlusion objects may include glasses and/or an item or hair that blocks facial features. For example, the occlusion objects may be decorative glasses, sunglasses, or the like, and may also be earrings or another ornament that blocks the facial features; the occlusion objects are not limited thereto.
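The disclosure does not prescribe a particular detector. As one hedged example, a small image classifier could flag whether the face wears glasses; the PyTorch sketch below assumes a hypothetical fine-tuned MobileNetV2 whose weights are saved as glasses_detector.pt.

```python
import torch
import torchvision.transforms as T
from torchvision import models

# Hypothetical binary glasses detector; the weights file is an assumption.
detector = models.mobilenet_v2(num_classes=2)
detector.load_state_dict(torch.load("glasses_detector.pt"))
detector.eval()

preprocess = T.Compose([
    T.Resize((128, 128)),  # same standard input size as used in training below
    T.ToTensor(),
])

def wears_glasses(pil_image):
    x = preprocess(pil_image).unsqueeze(0)   # add a batch dimension
    with torch.no_grad():
        logits = detector(x)
    return logits.argmax(dim=1).item() == 1  # class 1 = "wearing glasses"
```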
Step 102: When the target object wears at least one of the occlusion objects, the detected occlusion objects are removed from the image of the target object by a pre-trained neural network model, and a corresponding virtual avatar is generated according to the obtained image without the occlusion objects. Then, according to the preset external features of each removed occlusion object, occlusion object images matching the preset external features are selected from the preset three-dimensional (3D) image library of occlusion objects and loaded to corresponding positions of the virtual avatar, so that the final virtual avatar of the target object is obtained.
In this step, the pre-trained neural network model is used to remove the occlusion objects one by one from the image of the target object, so that the image wearing the occlusion objects is restored to an image without the occlusion objects. The corresponding virtual avatar is then generated based on the image without the occlusion objects, which improves the similarity between the virtual avatar and the real image and avoids the influence of the occlusion objects on the accuracy of the virtual avatar.
In practical applications, the target object may be determined by a person skilled in the art according to the requirements of the actual virtualized avatar; for example, the target object may be determined to be a portrait or an image of another creature.
Assuming that the target object is a portrait, a specific training method of the neural network model may include the following steps:
X1, generating a training data set (in the following, the occlusion object is taken to be glasses as an example; other occlusion objects are handled similarly):
Select people with different skin colors and different genders, in different environments, and shoot one group of images without glasses and another group of images wearing glasses (alternatively, a glasses image of the right size may be overlaid on the group of images without glasses, and the resulting images are taken as the group wearing glasses). Of the two groups of images saved, the group of images wearing glasses is used as the input for deep learning, and the group of images without glasses is the ground-truth data. 80% of the image pairs may be randomly selected as a training set, and the remaining 20% are used as a test set.
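As a minimal sketch of the pairing and the 80%/20% split described above (the directory layout and file naming are assumptions):

```python
import random
from pathlib import Path

# Paired data: for each person, an image wearing glasses (network input) and
# the matching image without glasses (ground truth), paired by sorted filename.
with_glasses = sorted(Path("data/with_glasses").glob("*.jpg"))
without_glasses = sorted(Path("data/without_glasses").glob("*.jpg"))
pairs = list(zip(with_glasses, without_glasses))

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(pairs)
split = int(0.8 * len(pairs))
train_pairs = pairs[:split]   # 80% randomly selected as the training set
test_pairs = pairs[split:]    # the remaining 20% as the test set
```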
X2, the training of the neural network model:
A codec (encoder-decoder) network model based on Context Encoders may be used to repair and reconstruct the input images wearing glasses. In the training process, the input images are first scaled to a preset standard size (e.g., 128×128), and the final reconstructed images are then generated by the codec network model, which is composed of multi-layer convolutional neural networks.
The specific training process includes the following stages:
Coding stage: the original input images are encoded by an encoder network composed of a multi-layer convolutional neural network (such as a 5-layer convolutional neural network) to obtain coding features of a certain dimension (for example, when an encoder network composed of a 5-layer convolutional neural network is used, coding features of 4000 dimensions are obtained).
Decoding stage: the encoded result obtained in the coding stage is input to a decoder based on the deep convolutional generative adversarial network (DCGAN) structure to generate the reconstructed images.
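As a hedged PyTorch sketch of this codec structure: a 5-layer convolutional encoder maps a 128×128 input to a compact code (4000 dimensions in the example above), and a DCGAN-style transposed-convolution decoder reconstructs the image. Channel counts, kernel sizes, and normalization choices are assumptions; the disclosure fixes only the overall encoder-decoder structure.

```python
import torch.nn as nn

def down(cin, cout):  # stride-2 convolutional downsampling block
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2, inplace=True))

def up(cin, cout):    # stride-2 transposed-convolution upsampling block
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class GlassesRemovalNet(nn.Module):
    """Encoder-decoder in the spirit of Context Encoders; sizes are assumptions."""
    def __init__(self, code_dim=4000):
        super().__init__()
        self.encoder = nn.Sequential(              # 3x128x128 -> 512x4x4
            down(3, 64), down(64, 128), down(128, 256),
            down(256, 512), down(512, 512),
            nn.Conv2d(512, code_dim, 4),           # -> code_dim x 1 x 1 bottleneck
        )
        self.decoder = nn.Sequential(              # code -> 3x128x128 reconstruction
            nn.ConvTranspose2d(code_dim, 512, 4),  # -> 512x4x4
            up(512, 512), up(512, 256), up(256, 128), up(128, 64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),                             # pixel values in [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```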
Calculation of the loss value and adjustment of the model parameters: an error value for each generated reconstructed image is calculated according to a loss function, and the model parameters of the neural network model are adjusted so as to minimize the error value.
Here, regarding the loss function used in the training above: in addition to the commonly used Mean Square Error (MSE), that is, the squared error between the pixels of the real image and of the generated reconstructed image, an adversarial loss term is added, derived from the error between the discriminator's judgment of the reconstructed image (judging it to be false) and the true label in the generative adversarial network; a better reconstruction effect is thereby obtained.
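A sketch of this combined objective, assuming a separate DCGAN-style discriminator disc whose output is a sigmoid probability; the adversarial weight is an assumption (Context Encoders weight the adversarial term much lower than the reconstruction term):

```python
import torch
import torch.nn.functional as F

def generator_loss(recon, real, disc, adv_weight=0.001):
    # Reconstruction term: per-pixel Mean Square Error against the ground truth.
    mse = F.mse_loss(recon, real)
    # Adversarial term: penalize the generator when the discriminator judges
    # the reconstruction to be fake (target label 1.0 means "real").
    pred = disc(recon)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    return mse + adv_weight * adv
```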
Preferably, the neural network model includes, but is not limited to, a convolutional neural network model and a generative adversarial network model.
Preferably, the external features in this step may be set by those skilled in the art according to actual requirements, and may include features such as shape and/or color, but are not limited thereto, and for example, the external features may also be, a pattern or the like.
The generation of the virtual avatar in step 102 may be implemented by a related method, and details of the generation of the virtual avatar are not described herein again.
Specifically, in step 102, the matched occlusion object images can be worn on the virtual avatar by three-dimensional image technology; the specific method is known to those skilled in the art, and the details are not described herein again.
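One way to realize the matching, sketched under the assumption that every library entry carries a precomputed feature vector (e.g., a shape descriptor concatenated with a mean RGB color):

```python
import numpy as np

def match_occlusion_image(removed_features, library):
    """Return the 3D occlusion-object image whose external features are closest.

    removed_features: feature vector of the removed occlusion object.
    library: list of dicts with keys "features" (np.ndarray) and "image_3d".
    Both the feature encoding and the dict layout are assumptions.
    """
    best = min(library,
               key=lambda e: np.linalg.norm(e["features"] - removed_features))
    return best["image_3d"]
```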
Step 103: When the target object does not wear the occlusion object, the corresponding virtual avatar is generated directly according to the image of the target object.
This step may be implemented by using related methods, and details are not described herein again.
According to the method for generating a virtual avatar in the embodiment of the present disclosure, the wearing condition of the occlusion objects on the target object is detected before the virtual avatar is generated, and different generating modes are adopted according to whether the occlusion objects are worn. When the occlusion objects are worn, they are removed from the image of the target object by Artificial Intelligence (AI) technology, to restore the image of the target object to an ideal input state for generating the virtual avatar. Then, the corresponding virtual avatar is generated based on the image without the occlusion objects, and finally, the images of the occlusion objects are matched according to the external features of the removed occlusion objects and loaded onto the virtual avatar to obtain the final virtual avatar of the target object. In this way, when the target object wears the occlusion objects, the virtual avatar is generated based on the reconstructed image without the occlusion objects, thereby ensuring the 3D display effect of the virtual avatar, effectively avoiding the errors caused by the occlusion objects when generating the virtual avatar in the related technology, and improving the similarity between the virtual avatar and the real image of the target object.
A schematic diagram illustrates the structure of a device for generating a virtual avatar corresponding to the method in the embodiment of the present disclosure. The device includes a processor, wherein the processor is configured to:
detect whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects, the detected occlusion objects are removed from the image of the target object by a pre-trained neural network model, and a corresponding virtual avatar is generated according to the obtained image without the occlusion objects; and, according to the preset external features of each removed occlusion object, the occlusion object images matching the external features are selected from a preset 3D image library of occlusion objects and loaded onto the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects, a corresponding virtual avatar is generated directly according to the image of the target object.
Preferably, when the target object is a portrait, the occlusion objects may include glasses and/or an item or hair that blocks facial features.
Preferably, the neural network model includes, but is not limited to, a convolutional neural network model and a generative adversarial network model.
Preferably, the external features may include shape and/or color.
A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method for generating the user virtual avatar as described above.
An electronic device, comprising: a non-transitory computer readable storage medium, and a processor capable of accessing the non-transitory computer readable storage medium.
In conclusion, the embodiments above are only the preferred embodiments of the present disclosure and are not intended to limit the scope of the present disclosure. Any modifications, equivalent substitutions, improvements and so on made within the spirit and scope of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

  1. A method performed by an electronic device for generating a virtual avatar, comprising:
    detecting whether a target object to be virtualized in an image wears preset occlusion objects;
    when the target object wears at least one of the occlusion objects:
    removing, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
    generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and
    selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and loading the selected images to a corresponding position of the virtual avatar to obtain the virtual avatar of the target object;
    when the target object does not wear the occlusion objects:
    directly generating the corresponding virtual avatar according to the image of the target object.
  2. The method of claim 1, wherein when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
  3. The method of claim 1, wherein the neural network model comprises a convolutional neural network model.
  4. The method of claim 1, wherein the external features comprise shape and/or color.
  5. An apparatus for generating a virtual avatar, comprising: a processor, wherein the processor is configured to:
    detect whether a target object to be virtualized in an image wears preset occlusion objects;
    when the target object wears at least one of the occlusion objects:
    remove, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
    generate a corresponding virtual avatar according to the obtained image without the occlusion objects; and
    select, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and load the selected images to a corresponding position of the virtual avatar to obtain the virtual avatar of the target object;
    when the target object does not wear the occlusion objects:
    directly generate the corresponding virtual avatar according to the image of the target object.
  6. The apparatus of claim 5, wherein when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
  7. The apparatus of claim 5, wherein the neural network model comprises a convolutional neural network model.
  8. The apparatus of claim 5, wherein the external features comprise shape and/or color.
  9. A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 4.
  10. An electronic device, comprising the non-transitory computer readable storage medium of claim 9, and a processor capable of accessing the non-transitory computer-readable storage medium.
PCT/KR2019/018710 2018-12-29 2019-12-30 Apparatus and method for generating a virtual avatar WO2020139054A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811632024.3A CN109727320A (en) 2018-12-29 2018-12-29 A kind of generation method and equipment of avatar
CN201811632024.3 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020139054A1 true WO2020139054A1 (en) 2020-07-02

Family

ID=66297899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/018710 WO2020139054A1 (en) 2018-12-29 2019-12-30 Apparatus and method for generating a virtual avatar

Country Status (2)

Country Link
CN (1) CN109727320A (en)
WO (1) WO2020139054A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008940B (en) * 2019-06-04 2020-02-11 深兰人工智能芯片研究院(江苏)有限公司 Method and device for removing target object in image and electronic equipment
CN113344776B (en) * 2021-06-30 2023-06-27 北京字跳网络技术有限公司 Image processing method, model training method, device, electronic equipment and medium
CN115174985B (en) * 2022-08-05 2024-01-30 北京字跳网络技术有限公司 Special effect display method, device, equipment and storage medium
CN115019401B (en) * 2022-08-05 2022-11-11 上海英立视电子有限公司 Prop generation method and system based on image matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080050336A (en) * 2006-12-02 2008-06-05 한국전자통신연구원 A mobile communication terminal having a function of the creating 3d avata model and the method thereof
US20150312523A1 (en) * 2012-04-09 2015-10-29 Wenlong Li System and method for avatar management and selection
US20170054945A1 (en) * 2011-12-29 2017-02-23 Intel Corporation Communication using avatar
US20180374251A1 (en) * 2017-06-23 2018-12-27 Disney Enterprises, Inc. Single shot capture to animated vr avatar
US20180374242A1 (en) * 2016-12-01 2018-12-27 Pinscreen, Inc. Avatar digitization from a single image for real-time rendering

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469379B (en) * 2014-09-04 2020-07-28 广东中星微电子有限公司 Video target area shielding method and device
CN106204423B (en) * 2016-06-28 2019-09-27 Oppo广东移动通信有限公司 A kind of picture-adjusting method based on augmented reality, device and terminal
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and face occluder detection method based on multitask deep learning

Also Published As

Publication number Publication date
CN109727320A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
WO2020139054A1 (en) Apparatus and method for generating a virtual avatar
CN113569791B (en) Image processing method and device, processor, electronic device and storage medium
RU2679986C2 (en) Facial expression tracking
CN104599284B (en) Three-dimensional facial reconstruction method based on various visual angles mobile phone auto heterodyne image
WO2010005251A2 (en) Multiple object tracking method, device and storage medium
WO2020247174A1 (en) Single image-based real-time body animation
WO2022260386A1 (en) Method and apparatus for composing background and face by using deep learning network
CN111047509A (en) Image special effect processing method and device and terminal
US11758295B2 (en) Methods, systems, and media for generating compressed images
WO2022250401A1 (en) Methods and systems for generating three dimensional (3d) models of objects
CN108762508A (en) A kind of human body and virtual thermal system system and method for experiencing cabin based on VR
CN116634242A (en) Speech-driven speaking video generation method, system, equipment and storage medium
CN110610191A (en) Elevator floor identification method and device and terminal equipment
CN116051439A (en) Method, equipment and storage medium for removing rainbow-like glare of under-screen RGB image by utilizing infrared image
CN108241855A (en) image generating method and device
WO2024014819A1 (en) Multimodal disentanglement for generating virtual human avatars
WO2023075508A1 (en) Electronic device and control method therefor
CN112489144A (en) Image processing method, image processing apparatus, terminal device, and storage medium
WO2021261687A1 (en) Device and method for reconstructing three-dimensional human posture and shape model on basis of image
WO2022108275A1 (en) Method and device for generating virtual face by using artificial intelligence
CN106101489B (en) Template matching monitor video defogging system and its defogging method based on cloud platform
CN114758354A (en) Sitting posture detection method and device, electronic equipment, storage medium and program product
CN111429363A (en) Video noise reduction method based on video coding
WO2017150847A2 (en) Wide viewing angle image processing system, wide viewing angle image transmitting and reproducing method, and computer program therefor
WO2022158890A1 (en) Systems and methods for reconstruction of dense depth maps

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19903039; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19903039; Country of ref document: EP; Kind code of ref document: A1)