WO2020139054A1 - Apparatus and method for generating a virtual avatar - Google Patents

Apparatus and method for generating a virtual avatar

Info

Publication number
WO2020139054A1
WO2020139054A1 (PCT/KR2019/018710)
Authority
WO
WIPO (PCT)
Prior art keywords
target object
occlusion
virtual avatar
image
occlusion objects
Prior art date
Application number
PCT/KR2019/018710
Other languages
French (fr)
Inventor
Yanqing Lu
Xiufen CUI
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Publication of WO2020139054A1 publication Critical patent/WO2020139054A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses a method and apparatus for generating a virtual avatar, comprising: detecting whether a target object to be virtualized in an image wears preset occlusion objects; when the target object wears at least one of the occlusion objects: removing the detected occlusion objects from the image of the target object by a pre-trained neural network model; generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects and loading them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object; and when the target object does not wear the occlusion objects: directly generating the corresponding virtual avatar according to the image of the target object. Through the present disclosure, the similarity between the virtual avatar and the real image may be improved.

Description

APPARATUS AND METHOD FOR GENERATING A VIRTUAL AVATAR
The disclosure relates to the field of image processing technologies. More particularly, the disclosure relates to an apparatus and method for generating a virtual avatar.
With the increasing popularity of virtual avatars in terminal devices such as mobile phones, generation methods based on models of facial expressions and actions have become mainstream. A virtual avatar is mostly generated by selecting an existing selfie or by taking a selfie.
The related virtual avatar generation scheme performs model matching directly on the user's self-photograph: for each facial feature, the corresponding facial feature texture is loaded from the model, and the virtual avatar is composed from these textures.
In the process of implementing the present disclosure, the inventor found that the related virtual avatar generation scheme may generate an erroneous three-dimensional virtualized avatar in many cases. In particular, when a user takes an image while wearing decorative objects such as glasses or earrings, the virtual avatar generated from that image has many errors and a low degree of similarity with the user, so that the virtual avatar cannot accurately reflect the user's appearance characteristics, which impairs the recognizability of the virtual avatar.
An aspect of the present disclosure is to provide a method and device for generating a virtual avatar, which may improve the similarity between the virtual avatar and the real image.
In order to achieve the object above, the present disclosure proposes the following technical solution:
A method for generating a virtual avatar, comprising:
detecting whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects:
removing, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and
selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and loading them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects:
directly generating the corresponding virtual avatar according to the image of the target object.
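For illustration only, a minimal Python sketch of this pipeline follows. Every helper in it (detect_occlusions, remove_occlusions, generate_avatar, match_occlusion_image, attach_to_avatar) is a hypothetical placeholder for the corresponding step above, not an API defined by the disclosure.

```python
# Hypothetical sketch of the disclosed pipeline; all helpers are placeholders.

def generate_virtual_avatar(image, occlusion_library):
    # Detect preset occlusion objects (e.g., glasses) on the target object.
    occlusions = detect_occlusions(image)

    if not occlusions:
        # No occlusion objects: generate the avatar directly from the image.
        return generate_avatar(image)

    # Remove the detected occlusion objects with the pre-trained neural
    # network model, then generate the avatar from the restored image.
    clean_image = remove_occlusions(image, occlusions)
    avatar = generate_avatar(clean_image)

    # Match each removed object's external features (e.g., shape, color)
    # against the preset 3D library and load the match at the corresponding
    # position of the avatar.
    for obj in occlusions:
        match = match_occlusion_image(obj.external_features, occlusion_library)
        attach_to_avatar(avatar, match, position=obj.position)
    return avatar
```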
Preferably, when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
Preferably, the neural network model comprises a convolutional neural network model.
Preferably, the external features comprise shape and/or color.
A device for generating a virtual avatar, comprising a processor, wherein the processor is configured to:
detect whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects:
remove, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
generate a corresponding virtual avatar according to the obtained image without the occlusion objects; and
select, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and load them to corresponding positions of the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects:
directly generate the corresponding virtual avatar according to the image of the target object.
Preferably, when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
Preferably, the neural network model comprises a convolutional neural network model.
Preferably, the external features comprise shape and/or color.
A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method for generating a virtual avatar as described above.
An electronic device, comprising: a non-transitory computer readable storage medium, and a processor capable of accessing the non-transitory computer readable storage medium.
In summary, the method and device for generating a virtual avatar provided by the embodiments of the present disclosure generate the avatar according to the wearing condition of the occlusion objects on the target object. When the occlusion objects are worn, they are removed from the image of the target object by Artificial Intelligence (AI) technology, to restore the image of the target object to an ideal input state for generating the virtual avatar. After the virtual avatar is generated, the external features of the removed occlusion objects are matched to corresponding occlusion object images, and these images are loaded onto the virtual avatar to obtain the final virtual avatar of the target object.
In this way, various embodiments of the present disclosure achieve a better 3D display effect of the virtual avatar, thereby effectively avoiding the errors caused by the occlusion objects when generating the virtual avatar in the related technology, and improving the similarity between the virtual avatar and the real image of the target object.
Figure 1 is a flowchart for a method according to various embodiments of the present disclosure.
In order to make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below with reference to drawings and specific embodiments.
Figure 1 is a flowchart for a method according to various embodiments of the present disclosure. As shown in Figure 1, the method for generating a user virtual avatar in an embodiment includes:
Step 101: Whether a target object to be virtualized in an image wears preset occlusion objects is detected.
In this step, it is necessary to detect whether the target object in the image is wearing the preset occlusion objects, so that when there is an occlusion object, the occlusion object is processed first, and then the virtual avatar is generated to improve the similarity between the virtual avatar and the real target object.
The image may specifically be a self-photograph of a user or another image designated by the user, which is not limited herein.
Preferably, when the target object is a portrait, the occlusion objects may include glasses and/or an item or hair that blocks facial features. For example, the occlusion objects may be decorative glasses, sunglasses, or the like, and may also be earrings or another ornament that blocks the facial features; the occlusion objects are not limited thereto.
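The disclosure does not prescribe a particular detector. As one hedged example, a small image classifier could flag whether the face wears glasses; the PyTorch sketch below assumes a hypothetical fine-tuned MobileNetV2 whose weights are saved as glasses_detector.pt.

```python
import torch
import torchvision.transforms as T
from torchvision import models

# Hypothetical binary glasses detector; the weights file is an assumption.
detector = models.mobilenet_v2(num_classes=2)
detector.load_state_dict(torch.load("glasses_detector.pt"))
detector.eval()

preprocess = T.Compose([
    T.Resize((128, 128)),  # same standard input size as used in training below
    T.ToTensor(),
])

def wears_glasses(pil_image):
    x = preprocess(pil_image).unsqueeze(0)   # add a batch dimension
    with torch.no_grad():
        logits = detector(x)
    return logits.argmax(dim=1).item() == 1  # class 1 = "wearing glasses"
```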
Step 102: When the target object wears at least one of the occlusion objects, the detected occlusion objects are removed from the image of the target object by a pre-trained neural network model, and a corresponding virtual avatar is generated according to the obtained image without the occlusion objects. Then, according to the preset external features of each removed occlusion object, occlusion object images matching the preset external features are selected from the preset three-dimensional (3D) image library of occlusion objects and loaded to corresponding positions of the virtual avatar, so that the final virtual avatar of the target object is obtained.
In this step, the pre-trained neural network model is used to remove the occlusion objects one by one from the image of the target object, so that the image wearing the occlusion objects is restored to an image without the occlusion objects. The corresponding virtual avatar is then generated based on the image without the occlusion objects, which improves the similarity between the virtual avatar and the real image and avoids the influence of the occlusion objects on the accuracy of the virtual avatar.
In practical applications, the target object may be determined by a person skilled in the art according to the requirements of the actual virtualized avatar; for example, the target object may be determined to be a portrait or an image of another creature.
Assuming that the target object is a portrait, a specific training method of the neural network model may include the following steps:
X1, generating a training data set (in the following, the occlusion object is taken to be glasses as an example; other occlusion objects are handled similarly):
Select people with different skin colors and different genders, in different environments, and shoot one group of images without glasses and another group of images wearing glasses (alternatively, a glasses image of the right size may be overlaid on the group of images without glasses, and the resulting images are taken as the group wearing glasses). Of the two groups of images saved, the group of images wearing glasses is used as the input for deep learning, and the group of images without glasses is the ground-truth data. 80% of the image pairs may be randomly selected as a training set, and the remaining 20% are used as a test set.
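As a minimal sketch of the pairing and the 80%/20% split described above (the directory layout and file naming are assumptions):

```python
import random
from pathlib import Path

# Paired data: for each person, an image wearing glasses (network input) and
# the matching image without glasses (ground truth), paired by sorted filename.
with_glasses = sorted(Path("data/with_glasses").glob("*.jpg"))
without_glasses = sorted(Path("data/without_glasses").glob("*.jpg"))
pairs = list(zip(with_glasses, without_glasses))

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(pairs)
split = int(0.8 * len(pairs))
train_pairs = pairs[:split]   # 80% randomly selected as the training set
test_pairs = pairs[split:]    # the remaining 20% as the test set
```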
X2, the training of the neural network model:
A codec (encoder-decoder) network model based on Context Encoders may be used to repair and reconstruct the input images wearing glasses. In the training process, the input images are first scaled to a preset standard size (e.g., 128×128), and the final reconstructed images are then generated by the codec network model, which is composed of multi-layer convolutional neural networks.
The specific training process includes the following stages:
Coding stage: the original input images are encoded by an encoder network composed of a multi-layer convolutional neural network (such as a 5-layer convolutional neural network) to obtain coding features of a certain dimension (for example, when an encoder network composed of a 5-layer convolutional neural network is used, coding features of 4000 dimensions are obtained).
Decoding stage: the encoded result obtained in the coding stage is input to a decoder based on the deep convolutional generative adversarial network (DCGAN) structure to generate the reconstructed images.
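As a hedged PyTorch sketch of this codec structure: a 5-layer convolutional encoder maps a 128×128 input to a compact code (4000 dimensions in the example above), and a DCGAN-style transposed-convolution decoder reconstructs the image. Channel counts, kernel sizes, and normalization choices are assumptions; the disclosure fixes only the overall encoder-decoder structure.

```python
import torch.nn as nn

def down(cin, cout):  # stride-2 convolutional downsampling block
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2, inplace=True))

def up(cin, cout):    # stride-2 transposed-convolution upsampling block
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class GlassesRemovalNet(nn.Module):
    """Encoder-decoder in the spirit of Context Encoders; sizes are assumptions."""
    def __init__(self, code_dim=4000):
        super().__init__()
        self.encoder = nn.Sequential(              # 3x128x128 -> 512x4x4
            down(3, 64), down(64, 128), down(128, 256),
            down(256, 512), down(512, 512),
            nn.Conv2d(512, code_dim, 4),           # -> code_dim x 1 x 1 bottleneck
        )
        self.decoder = nn.Sequential(              # code -> 3x128x128 reconstruction
            nn.ConvTranspose2d(code_dim, 512, 4),  # -> 512x4x4
            up(512, 512), up(512, 256), up(256, 128), up(128, 64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),                             # pixel values in [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```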
Calculation of the loss value and adjustment of the model parameters: an error value for each generated reconstructed image is calculated according to a loss function, and the model parameters of the neural network model are adjusted so as to minimize the error value.
Here, regarding the loss function used in the training above: in addition to the commonly used Mean Square Error (MSE), that is, the squared error between the pixels of the real image and of the generated reconstructed image, an adversarial loss term is added, derived from the error between the discriminator's judgment of the reconstructed image (judging it to be false) and the true label in the generative adversarial network; a better reconstruction effect is thereby obtained.
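A sketch of this combined objective, assuming a separate DCGAN-style discriminator disc whose output is a sigmoid probability; the adversarial weight is an assumption (Context Encoders weight the adversarial term much lower than the reconstruction term):

```python
import torch
import torch.nn.functional as F

def generator_loss(recon, real, disc, adv_weight=0.001):
    # Reconstruction term: per-pixel Mean Square Error against the ground truth.
    mse = F.mse_loss(recon, real)
    # Adversarial term: penalize the generator when the discriminator judges
    # the reconstruction to be fake (target label 1.0 means "real").
    pred = disc(recon)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    return mse + adv_weight * adv
```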
Preferably, the neural network model includes, but is not limited to, a convolutional neural network model and a generative adversarial network model.
Preferably, the external features in this step may be set by those skilled in the art according to actual requirements, and may include features such as shape and/or color, but are not limited thereto, and for example, the external features may also be, a pattern or the like.
The generation of the virtual avatar in step 102 may be implemented by a related method, and details of the generation of the virtual avatar are not described herein again.
Specifically, in step 102, the matched occlusion object images can be worn on the virtual avatar by three-dimensional image technology; the specific method is known to those skilled in the art, and the details are not described herein again.
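One way to realize the matching, sketched under the assumption that every library entry carries a precomputed feature vector (e.g., a shape descriptor concatenated with a mean RGB color):

```python
import numpy as np

def match_occlusion_image(removed_features, library):
    """Return the 3D occlusion-object image whose external features are closest.

    removed_features: feature vector of the removed occlusion object.
    library: list of dicts with keys "features" (np.ndarray) and "image_3d".
    Both the feature encoding and the dict layout are assumptions.
    """
    best = min(library,
               key=lambda e: np.linalg.norm(e["features"] - removed_features))
    return best["image_3d"]
```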
Step 103: When the target object does not wear the occlusion object, the corresponding virtual avatar is generated directly according to the image of the target object.
This step may be implemented by using related methods, and details are not described herein again.
According to the method for generating a virtual avatar in the embodiment of the present disclosure, the wearing condition of the occlusion objects on the target object is detected before the virtual avatar is generated, and different generating modes are adopted according to whether the occlusion objects are worn. When the occlusion objects are worn, they are removed from the image of the target object by Artificial Intelligence (AI) technology, to restore the image of the target object to an ideal input state for generating the virtual avatar. Then, the corresponding virtual avatar is generated based on the image without the occlusion objects, and finally, the images of the occlusion objects are matched according to the external features of the removed occlusion objects and loaded onto the virtual avatar to obtain the final virtual avatar of the target object. In this way, when the target object wears the occlusion objects, the virtual avatar is generated based on the reconstructed image without the occlusion objects, thereby ensuring the 3D display effect of the virtual avatar, effectively avoiding the errors caused by the occlusion objects when generating the virtual avatar in the related technology, and improving the similarity between the virtual avatar and the real image of the target object.
A schematic diagram illustrates the structure of a device for generating a virtual avatar corresponding to the method in the embodiment of the present disclosure. The device includes a processor, wherein the processor is configured to:
detect whether a target object to be virtualized in an image wears preset occlusion objects;
when the target object wears at least one of the occlusion objects, the detected occlusion objects are removed from the image of the target object by a pre-trained neural network model, and a corresponding virtual avatar is generated according to the obtained image without the occlusion objects; and, according to the preset external features of each removed occlusion object, the occlusion object images matching the external features are selected from a preset 3D image library of occlusion objects and loaded onto the virtual avatar to obtain the virtual avatar of the target object;
when the target object does not wear the occlusion objects, a corresponding virtual avatar is generated directly according to the image of the target object.
Preferably, when the target object is a portrait, the occlusion objects may include glasses and/or an item or hair that blocks facial features.
Preferably, the neural network model includes, but is not limited to, a convolutional neural network model and a generative adversarial network model.
Preferably, the external features may include shape and/or color.
A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method for generating the user virtual avatar as described above.
An electronic device, comprising: a non-transitory computer readable storage medium, and a processor capable of accessing the non-transitory computer readable storage medium.
In conclusion, the embodiments above are only the preferred embodiments of the present disclosure and are not intended to limit the scope of the present disclosure. Any modifications, equivalent substitutions, improvements and so on made within the spirit and scope of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

  1. A method performed by an electronic device for generating a virtual avatar, comprising:
    detecting whether a target object to be virtualized in an image wears preset occlusion objects;
    when the target object wears at least one of the occlusion objects:
    removing, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
    generating a corresponding virtual avatar according to the obtained image without the occlusion objects; and
    selecting, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and loading the selected images to a corresponding position of the virtual avatar to obtain the virtual avatar of the target object;
    when the target object does not wear the occlusion objects:
    directly generating the corresponding virtual avatar according to the image of the target object.
  2. The method of claim 1, wherein when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
  3. The method of claim 1, wherein the neural network model comprises a convolutional neural network model.
  4. The method of claim 1, wherein the external features comprise shape and/or color.
  5. An apparatus for generating a virtual avatar, comprising: a processor, wherein the processor is configured to:
    detect whether a target object to be virtualized in an image wears preset occlusion objects;
    when the target object wears at least one of the occlusion objects:
    remove, by a pre-trained neural network model, the detected occlusion objects from the image of the target object;
    generate a corresponding virtual avatar according to the obtained image without the occlusion objects; and
    select, according to preset external features of each removed occlusion object, occlusion object images matching the preset external features from a preset three-dimensional (3D) image library of occlusion objects, and load the selected images to a corresponding position of the virtual avatar to obtain the virtual avatar of the target object;
    when the target object does not wear the occlusion objects:
    directly generate the corresponding virtual avatar according to the image of the target object.
  6. The apparatus of claim 5, wherein when the target object is a portrait, the occlusion objects comprise glasses and/or an item or hair blocking facial features.
  7. The apparatus of claim 5, wherein the neural network model comprises a convolutional neural network model.
  8. The apparatus of claim 5, wherein the external features comprise shape and/or color.
  9. A non-transitory computer readable storage medium, storing instructions, wherein the instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 4.
  10. An electronic device, comprising the non-transitory computer readable storage medium of claim 9, and a processor capable of accessing the non-transitory computer-readable storage medium.
PCT/KR2019/018710 2018-12-29 2019-12-30 Apparatus and method for generating a virtual avatar WO2020139054A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811632024.3A CN109727320A (en) 2018-12-29 2018-12-29 A kind of generation method and equipment of avatar
CN201811632024.3 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020139054A1 true WO2020139054A1 (en) 2020-07-02

Family

ID=66297899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/018710 WO2020139054A1 (en) 2018-12-29 2019-12-30 Apparatus and method for generating a virtual avatar

Country Status (2)

Country Link
CN (1) CN109727320A (en)
WO (1) WO2020139054A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008940B (en) * 2019-06-04 2020-02-11 深兰人工智能芯片研究院(江苏)有限公司 Method and device for removing target object in image and electronic equipment
CN113344776B (en) * 2021-06-30 2023-06-27 北京字跳网络技术有限公司 Image processing method, model training method, device, electronic equipment and medium
CN115174985B (en) * 2022-08-05 2024-01-30 北京字跳网络技术有限公司 Special effect display method, device, equipment and storage medium
CN115019401B (en) * 2022-08-05 2022-11-11 上海英立视电子有限公司 Prop generation method and system based on image matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080050336A (en) * 2006-12-02 2008-06-05 한국전자통신연구원 A mobile communication terminal having a function of the creating 3d avata model and the method thereof
US20150312523A1 (en) * 2012-04-09 2015-10-29 Wenlong Li System and method for avatar management and selection
US20170054945A1 (en) * 2011-12-29 2017-02-23 Intel Corporation Communication using avatar
US20180374251A1 (en) * 2017-06-23 2018-12-27 Disney Enterprises, Inc. Single shot capture to animated vr avatar
US20180374242A1 (en) * 2016-12-01 2018-12-27 Pinscreen, Inc. Avatar digitization from a single image for real-time rendering

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469379B (en) * 2014-09-04 2020-07-28 广东中星微电子有限公司 Video target area shielding method and device
CN106204423B (en) * 2016-06-28 2019-09-27 Oppo广东移动通信有限公司 A kind of picture-adjusting method based on augmented reality, device and terminal
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and face occluder detection method based on multitask deep learning

Also Published As

Publication number Publication date
CN109727320A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
WO2020139054A1 (en) Apparatus and method for generating a virtual avatar
CN113569791B (en) Image processing method and device, processor, electronic device and storage medium
RU2679986C2 (en) Facial expression tracking
CN104599284B (en) Three-dimensional facial reconstruction method based on various visual angles mobile phone auto heterodyne image
WO2010005251A2 (en) Multiple object tracking method, device and storage medium
WO2020247174A1 (en) Single image-based real-time body animation
WO2022260386A1 (en) Method and apparatus for composing background and face by using deep learning network
CN111047509A (en) Image special effect processing method and device and terminal
US11758295B2 (en) Methods, systems, and media for generating compressed images
WO2022250401A1 (en) Methods and systems for generating three dimensional (3d) models of objects
CN108762508A (en) A kind of human body and virtual thermal system system and method for experiencing cabin based on VR
CN116634242A (en) Speech-driven speaking video generation method, system, equipment and storage medium
CN110610191A (en) Elevator floor identification method and device and terminal equipment
CN116051439A (en) Method, equipment and storage medium for removing rainbow-like glare of under-screen RGB image by utilizing infrared image
CN108241855A (en) image generating method and device
WO2024014819A1 (en) Multimodal disentanglement for generating virtual human avatars
WO2023075508A1 (en) Electronic device and control method therefor
CN112489144A (en) Image processing method, image processing apparatus, terminal device, and storage medium
WO2021261687A1 (en) Device and method for reconstructing three-dimensional human posture and shape model on basis of image
WO2022108275A1 (en) Method and device for generating virtual face by using artificial intelligence
CN106101489B (en) Template matching monitor video defogging system and its defogging method based on cloud platform
CN114758354A (en) Sitting posture detection method and device, electronic equipment, storage medium and program product
CN111429363A (en) Video noise reduction method based on video coding
WO2017150847A2 (en) Wide viewing angle image processing system, wide viewing angle image transmitting and reproducing method, and computer program therefor
WO2022158890A1 (en) Systems and methods for reconstruction of dense depth maps

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19903039; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19903039; Country of ref document: EP; Kind code of ref document: A1)