WO2020228384A1 - Virtual avatar generation method and apparatus, and storage medium - Google Patents

Virtual avatar generation method and apparatus, and storage medium

Info

Publication number
WO2020228384A1
WO2020228384A1 (PCT/CN2020/074597)
Authority
WO
WIPO (PCT)
Prior art keywords
target
face
face attribute
image
neural network
Prior art date
Application number
PCT/CN2020/074597
Other languages
English (en)
French (fr)
Inventor
刘庭皓
赵立晨
王权
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to JP2020558437A priority Critical patent/JP2021528719A/ja
Priority to KR1020207015327A priority patent/KR102443026B1/ko
Priority to SG11202008025QA priority patent/SG11202008025QA/en
Priority to US16/994,148 priority patent/US11403874B2/en
Publication of WO2020228384A1 publication Critical patent/WO2020228384A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • G06T13/403D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the present disclosure relates to the field of image processing, and in particular to a method and device for generating a virtual avatar, and a storage medium.
  • Facial feature point positioning is to calculate the positions of several pre-defined feature points, such as the corners of the eyes, the corners of the mouth, and the tip of the nose, on a picture including the face.
  • Current facial feature point positioning can define some simple features on the face, such as eye contours, mouth contours, etc., but cannot accurately locate more detailed feature information.
  • the present disclosure provides a method and device for generating a virtual avatar, and a storage medium.
  • a method for generating a virtual avatar, the method comprising: determining a target task associated with at least one target face attribute, wherein the target face attribute is one of a plurality of predefined face attributes; analyzing, according to the target task, a target image that includes at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; determining, according to a correspondence between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and generating a virtual avatar on the target image based on the target virtual avatar template.
  • the analyzing, according to the target task, the target image including at least a face with respect to the target face attribute includes: determining a target neural network corresponding to the target face attribute; inputting the target image into the target neural network to obtain estimated values output by the target neural network, the estimated values indicating the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute; and using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
  • the target face attribute includes at least one predefined category; the using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter includes: for a first category, using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first category as the target face attribute parameter corresponding to the first category, the first category being any one of the at least one category included in the target face attribute.
  • the target neural network is trained in the following manner: at least one sample image including at least a face is input into a first neural network, wherein each of the sample images is labelled with face attribute parameters associated with a first face attribute, the first neural network includes a first sub-network corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network is used as a predicted value, the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image is used as a true value, the first sub-network is trained, and the target neural network is obtained after the training is completed.
  • the first sub-network adopts a residual neural network structure and includes at least one residual unit.
  • each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and, when the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of convolutional layers and the number of batch normalization layers included in a first residual unit of the plurality of residual units.
  • the first sub-network further includes an output segmentation layer, the output segmentation layer being used to segment the feature information extracted from the sample image according to the at least one predefined category included in the first face attribute, to obtain estimated values of the at least one face attribute parameter associated with each of the at least one category.
  • the method further includes: performing an affine transformation on an image of interest to obtain an image in which the face has been normalized; and cutting out an image of a target area from the face-normalized image to obtain the target image or the sample image, wherein the target area includes at least the area where the key points of the face are located.
  • the target area further includes a region of preset area located outside the face part corresponding to the target face attribute.
  • a virtual avatar generating device, comprising: a task determination module configured to determine a target task associated with at least one target face attribute, wherein the target face attribute is one of a plurality of predefined face attributes; a face attribute analysis module configured to analyze, according to the target task, a target image including at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; a virtual avatar template determination module configured to determine, according to a correspondence between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and an avatar generating module configured to generate a virtual avatar on the target image based on the target virtual avatar template.
  • the face attribute analysis module includes: a network determining sub-module configured to determine a target neural network corresponding to the target face attribute; an estimated value determining sub-module configured to input the target image into the target neural network to obtain estimated values output by the target neural network, the estimated values indicating the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute; and a parameter determination sub-module configured to use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
  • the target face attribute includes at least one predefined category
  • the parameter determination sub-module is configured to: for a first category, use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first category as the target face attribute parameter corresponding to the first category, wherein the first category is any one of the at least one category included in the target face attribute.
  • the apparatus further includes a training module configured to: input at least one sample image including at least a face into a first neural network, wherein each sample image is labelled with face attribute parameters associated with a first face attribute, the first neural network includes a first sub-network corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and train the first sub-network by using the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as a predicted value, and using the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image as a true value; the first sub-network obtained after the training can be used as the target neural network.
  • the first sub-network adopts a residual neural network structure and includes at least one residual unit.
  • each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and, when the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of convolutional layers and the number of batch normalization layers included in a first residual unit of the plurality of residual units.
  • the first sub-network further includes an output segmentation layer, the output segmentation layer being used to segment the feature information extracted from the sample image according to the at least one predefined category included in the first face attribute, to obtain estimated values of the at least one face attribute parameter associated with each of the at least one category.
  • the device further includes: a face correction processing module configured to perform an affine transformation on an image of interest to obtain a face-normalized image; and an image interception module configured to cut out an image of a target area from the face-normalized image to obtain the target image or the sample image, wherein the target area includes at least the area where the key points of the face are located.
  • the target area further includes an area of a preset area located outside the face part corresponding to the target face attribute.
  • a computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is used to execute the virtual avatar generation method described above.
  • a device for generating a virtual avatar, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the executable instructions stored in the memory to implement the method for generating a virtual avatar according to any one of the first aspects.
  • according to the target task associated with at least one face attribute, the face attribute parameters on the target image including at least a face can be extracted; the corresponding target virtual avatar template is then determined for the target image in combination with the correspondence between predefined face attribute parameters and virtual avatar templates, and the virtual avatar is generated on the target image based on the target virtual avatar template.
  • Fig. 1 is a flow chart showing a method for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 2 is a diagram showing an example of generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 3 is a diagram showing another example of generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 4 is a flow chart showing another method for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 5 is a flowchart of another method for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 6 is a flowchart showing another method for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 7 is an example diagram of a neural network according to an exemplary embodiment of the present disclosure.
  • Fig. 8 is a block diagram showing a device for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 9 is a block diagram showing another device for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 10 is a block diagram showing another device for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 11 is a block diagram showing another device for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • Fig. 12 is a schematic structural diagram of a device for generating a virtual avatar according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited to these terms; these terms are only used to distinguish information of the same type from each other.
  • first information may also be referred to as second information, and similarly, the second information may also be referred to as first information.
  • the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
  • the embodiment of the present disclosure provides a method for generating a virtual avatar, which may be executed by a face driving device, such as an avatar.
  • Fig. 1 shows an example of a method for generating a virtual avatar according to an embodiment of the present disclosure.
  • step 101 a target task associated with at least one target face attribute is determined.
  • the at least one target face attribute is one of multiple predefined face attributes.
  • face properties may include, but are not limited to, hairstyle, beard, glasses, eyelids, and the like.
  • the face driving device can provide the user with multiple predefined face attributes, and the user can determine the target face attribute to be analyzed in these face attributes according to his own needs.
  • the target task can analyze at least one face attribute mentioned above.
  • the target task can be to analyze hairstyles, or analyze beards and eyelids, and so on.
  • step 102 the target image including at least the face is analyzed on the target face attribute according to the target task, and the target face attribute parameter associated with the target face attribute on the target image is obtained.
  • when the face driving device performs face attribute analysis on the target image captured by the camera, it may analyze only the target face attribute corresponding to the target task, so as to obtain the target face attribute parameter (target face property feature) associated with the target face attribute on the target image.
  • the eyelid can be associated with multiple predefined parameters, such as single eyelid, fan-shaped double eyelid, parallel double eyelid, European-style double eyelid, etc.
  • the face driving device can obtain the target face attribute parameters, such as parallel double eyelids, by analyzing the face attributes of the eyelids on the target image.
  • step 103 the target virtual avatar template corresponding to the target face attribute parameter is determined according to the correspondence between the predefined face attribute parameters and the virtual avatar template.
  • the face driving device can store the correspondence between the predefined face attribute parameters and the virtual avatar template.
  • each virtual avatar template corresponds to at least one face attribute parameter.
  • the virtual avatar template may adopt cartoon characters.
  • the face driving device may determine the target virtual avatar template corresponding to the target face attribute parameter in the pre-stored virtual avatar template library.
  • the number of target virtual avatar templates may be one or more, which is not limited in the present disclosure.
  • step 104 a virtual avatar is generated on the target image based on the target virtual avatar template.
  • the target virtual avatar template can be directly used as the virtual avatar to be generated on the target image.
  • the user can select one as the virtual avatar, and the face driving device generates the virtual avatar on the target image.
  • for example, if the target image is as shown in Fig. 2 and the target task is to analyze the hairstyle, the generated virtual avatar can be shown in the upper right corner of Fig. 3.
  • the face attribute parameters on the target image including at least a face can be extracted according to the target task associated with at least one face attribute, the corresponding target virtual avatar template is then determined in combination with the correspondence between predefined face attribute parameters and virtual avatar templates, and a virtual avatar is generated on the target image based on the target virtual avatar template.
  • in this way, a more accurate face attribute analysis can be performed on an image including at least a face, and the correspondingly generated virtual avatar can provide users with richer initial face attribute styles; a minimal sketch of the template lookup is given below.
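The correspondence between face attribute parameters and virtual avatar templates described above can be illustrated with a minimal lookup sketch. The attribute names and template identifiers below are illustrative assumptions rather than values from the patent; the mapping could equally be backed by a database or configuration file.

```python
# Minimal sketch of the parameter-to-template correspondence described above.
# The attribute parameter names and template identifiers are hypothetical;
# the patent only requires that each virtual avatar template corresponds to
# at least one face attribute parameter.
TEMPLATE_LIBRARY = {
    ("eyelid", "parallel double eyelid"): ["template_eyelid_03"],
    ("eyelid", "fan-shaped double eyelid"): ["template_eyelid_02"],
    ("hairstyle", "no bangs"): ["template_hair_01", "template_hair_07"],
}

def find_target_templates(target_params):
    """Return candidate avatar templates for the analysed attribute parameters.

    target_params: iterable of (attribute, parameter) pairs produced by the
    face attribute analysis, e.g. [("eyelid", "parallel double eyelid")].
    """
    templates = []
    for key in target_params:
        templates.extend(TEMPLATE_LIBRARY.get(key, []))
    return templates

print(find_target_templates([("eyelid", "parallel double eyelid")]))
```

If several templates match, the device can present them all and let the user pick one, as described for step 104.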
  • step 102 may include:
  • step 102-1 a target neural network corresponding to the target face attribute is determined.
  • a corresponding target neural network may be trained for a target task associated with at least one face attribute.
  • the target neural network can be used as the corresponding neural network for subsequent face attribute analysis.
  • step 102-2 the target image is input to the target neural network to obtain estimated values output by the target neural network, the estimated values indicating the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute.
  • the face driving device may input a target image including at least a human face into the target neural network, and the target neural network outputs an estimated value indicating the probability that the target image has at least one face attribute parameter associated with the target face attribute .
  • the estimated value of the probability that the target image has a certain face attribute parameter can also be simplified as the estimated value for the face attribute parameter or the estimated value of the face attribute parameter.
  • the target neural network can output the estimated values representing the probability that the target image has at least one face attribute parameter associated with the eyelids as shown in Table 1, respectively.
  • step 102-3 the face attribute parameter corresponding to the maximum value among the estimated values output by the target neural network is used as the target face attribute parameter.
  • for example, the face attribute parameter corresponding to the maximum estimated value of 0.6, that is, the fan-shaped double eyelid, can be used as the target face attribute parameter, as in the sketch below.
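A short sketch of this selection step, using the eyelid estimates quoted above (0.1, 0.6, 0.2, 0.1 from Table 1 of the description); the parameter with the largest estimated probability becomes the target face attribute parameter.

```python
# Sketch of selecting the target face attribute parameter from the
# network's estimated values (the eyelid example from Table 1).
estimates = {
    "single eyelid": 0.1,
    "fan-shaped double eyelid": 0.6,
    "parallel double eyelid": 0.2,
    "European-style double eyelid": 0.1,
}

# The parameter with the maximum estimated probability is the target parameter.
target_parameter = max(estimates, key=estimates.get)
print(target_parameter)  # -> "fan-shaped double eyelid"
```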
  • the aforementioned face attributes may be divided to obtain at least one subclass included in each face attribute, so as to more accurately describe the face attributes.
  • the division method may include but is not limited to the method shown in Table 2.
  • Table 2:
    Face attribute | Classifications of the face attribute
    Hairstyle | Bangs, curls, hair length
    Beard | On the philtrum, at the center of the chin, on both sides of the chin
    Glasses | Glasses type, frame type, lens shape, frame thickness
    Eyelid | Single eyelid, fan-shaped double eyelid, parallel double eyelid, European-style double eyelid
  • classification of each face attribute may include at least one face attribute parameter, as shown in Table 3.
  • the classification of face attributes in the present disclosure includes but is not limited to the above methods.
  • the target face attributes that need to be analyzed for the target task may include at least one predefined category.
  • the face attribute of hairstyle includes three categories, namely bangs, curly hair, and hair length.
  • Each different category also includes at least one face attribute parameter.
  • the sum of the estimated values of all face attribute parameters associated with each category is 1.
  • the estimated values corresponding to the four face attribute parameters of no hair, straight hair, large curly hair, and small curly hair can be 0.1, 0.6, 0.2, and 0.1, and the sum value is 1.
  • accordingly, for any one of the at least one category of the target face attribute, after the target neural network outputs the estimated values of the probabilities of the at least one face attribute parameter associated with that category, the face driving device may use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for that category as the target face attribute parameter corresponding to that category.
  • for example, suppose the target face attribute is hairstyle, where the face attribute parameter corresponding to the maximum estimated value for the category of bangs is no bangs, the face attribute parameter corresponding to the maximum estimated value for the category of curls is straight hair, and the face attribute parameter corresponding to the maximum estimated value for the category of hair length is short hair that does not reach the shoulders.
  • the face driving device can then use the three face attribute parameters of no bangs, straight hair, and short hair not reaching the shoulders as the target face attribute parameters corresponding to the three categories of bangs, curls, and hair length, respectively, as in the per-category sketch below.
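A sketch of this per-category selection, assuming the network's estimates have already been grouped by category; the numeric values and parameter names are illustrative, not taken from the patent.

```python
# Sketch of the per-category selection for a multi-category attribute such as
# hairstyle. Within each category the estimated values sum to 1, as described.
hairstyle_estimates = {
    "bangs": {"no bangs": 0.7, "with bangs": 0.3},
    "curls": {"no hair": 0.1, "straight hair": 0.6,
              "large curls": 0.2, "small curls": 0.1},
    "hair length": {"short hair, not shoulder-length": 0.8,
                    "shoulder-length or longer": 0.2},
}

# For each category, the parameter with the maximum estimated value becomes
# the target face attribute parameter of that category.
target_params = {
    category: max(params, key=params.get)
    for category, params in hairstyle_estimates.items()
}
print(target_params)
```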
  • the above-mentioned virtual avatar generating method may further include the following:
  • step 100-1 an affine transformation is performed on the image of interest to obtain an image after the face is normalized.
  • the image of interest may be a pre-photographed image including a face
  • the face driving device may perform affine transformation on the image of interest, so as to correct the deflected face in the image of interest.
  • step 100-2 the image of the target area is cut out from the image after the face is corrected to obtain the target image.
  • the face driving device may adopt a facial feature point positioning method, and the target area includes at least the area where the face key points are located; in this way, an image including at least the area where the face key points are located can be cut out from the face-normalized image as the target image.
  • the key points of the human face include but are not limited to eyebrows, eyes, nose, mouth, facial contours, etc.
  • the face driving device may perform face attribute analysis on the target image to obtain target face attribute parameters associated with the target face attributes on the target image.
  • the method for obtaining the target face attribute parameters is the same as the method in the foregoing embodiment, and will not be repeated here.
  • the target image including the area where the key points of the face are located can be intercepted, and then the face attribute analysis is performed on the target image, so that the result of the face attribute analysis is more accurate.
  • when the face driving device cuts the image of the target area out of the face-normalized image, the target area may include not only the area where the key points of the face are located but also a region of preset area located outside the face part corresponding to the target face attribute.
  • the preset areas outside the face parts corresponding to different target face attributes may be different.
  • the corresponding face parts are mouth, eyebrows, eyes, etc.
  • the preset area may be less than half of the area occupied by the corresponding face part.
  • for example, when the corresponding face part is the mouth, the target image may cover not only the area where the mouth is located but also a preset area outside the mouth, and the preset area may be less than half of the area where the mouth is located.
  • if the target face attribute is hairstyle, the corresponding face part is the face contour; in this case, to avoid bias when extracting the hairstyle, the preset area may be half of the entire face contour area or more.
  • the target area thus includes not only the area where the key points of the face are located but also a preset area outside the face part corresponding to the target face attribute, which improves the accuracy of the target face attribute analysis; a sketch of the normalization and cropping steps is given below.
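A sketch of the normalization and cropping described in steps 100-1 and 100-2, assuming three facial landmarks are already available from a separate detector (the patent does not name one); cv2.getAffineTransform and cv2.warpAffine are standard OpenCV calls used here only for illustration, and the crop box is expected to cover the key-point region plus the preset margin discussed above.

```python
import cv2
import numpy as np

def normalize_and_crop(image, landmarks, canonical_points, crop_box,
                       size=(256, 256)):
    """Sketch of the face-normalization and target-area cropping step.

    image: image containing at least one face (the "image of interest").
    landmarks: three detected key points (e.g. both eye centers and the mouth
        center) as a float32 array of shape (3, 2); landmark detection itself
        is outside the scope of this sketch.
    canonical_points: where the three landmarks should land after
        normalization, float32 array of shape (3, 2).
    crop_box: (x, y, w, h) of the target area in the normalized image; it
        should cover the key-point region plus a preset margin outside the
        face part corresponding to the target face attribute.
    """
    # Affine transform that rotates/scales the deflected face into a frontal,
    # upright ("normalized") position.
    matrix = cv2.getAffineTransform(landmarks.astype(np.float32),
                                    canonical_points.astype(np.float32))
    normalized = cv2.warpAffine(image, matrix, size)

    # Cut out the target area from the normalized image.
    x, y, w, h = crop_box
    return normalized[y:y + h, x:x + w]
```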
  • the target neural network may include multiple sub-networks corresponding to different face attributes.
  • the target neural network includes 4 sub-networks corresponding to hair style, beard, glasses, and eyelids.
  • the process of training the target neural network may include:
  • step 201 at least one sample image including at least a human face is input to the first neural network.
  • each of the sample images is marked with a face attribute parameter associated with a first face attribute
  • the first neural network includes a first sub-network corresponding to the first face attribute.
  • the at least one sample image may be obtained by performing affine transformation on at least one pre-collected image including at least a human face, and then cutting out the image of the target area.
  • each pre-collected image including at least a human face can be subjected to affine transformation to correct the human face, and then an image of the target area can be intercepted from the image after the human face is normalized to obtain a corresponding sample image .
  • the target area at least includes the area where the key points of the human face are located.
  • step 202 at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network is used as a predicted value, and the at least one sample image is marked At least one face attribute parameter corresponding to the first face attribute is used as a true value, and the first sub-network is trained. In this way, the first sub-network after training can be used as the target neural network.
  • the first face attribute may be any one of multiple predefined face attributes, for example, may be any one of eyelids, hairstyle, beard, and glasses.
  • the face attribute parameters associated with the first face attribute on the sample image are known.
  • the sample image may be marked with face attribute parameters associated with the first face attribute.
  • for example, if the first face attribute is the beard, the face attribute parameters corresponding to the beard in a sample image may include no beard on the philtrum, no beard at the center of the chin, and no beard on both sides of the chin.
  • at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the target neural network can be used as the predicted value of the neural network, and the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image can be used as the true value, to optimize and adjust the network parameters of the first sub-network, so as to obtain the first sub-network corresponding to the first face attribute.
  • the sub-network corresponding to any face attribute can be obtained by training in the above-mentioned manner.
  • multiple such sub-networks constitute the target neural network; a training sketch for a single sub-network is given below.
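A minimal training sketch for one such sub-network, assuming PyTorch; the patent only states that the network's predictions are compared with the labelled true values to adjust the network parameters, so the per-category cross-entropy loss and the Adam optimizer below are assumptions.

```python
import torch
import torch.nn as nn

def train_subnetwork(subnet, data_loader, epochs=10, lr=1e-3):
    """Sketch of training one first sub-network (e.g. the eyelid branch).

    data_loader yields (images, labels), where labels[:, i] is the integer
    index of the true face attribute parameter for the i-th category of the
    first face attribute, and subnet(images) returns one logit tensor per
    category (the output segmentation described later in the text).
    """
    optimizer = torch.optim.Adam(subnet.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in data_loader:
            per_category_logits = subnet(images)       # predicted values
            # Compare predictions with the labelled true values per category.
            loss = sum(criterion(logits, labels[:, i])
                       for i, logits in enumerate(per_category_logits))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return subnet
```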
  • the sub-network included in the target neural network in the embodiment of the present disclosure may adopt a residual network (Res Net).
  • the network structure of the residual network can be shown in Figure 7.
  • the residual network may include a single convolutional layer 710.
  • the convolutional layer 710 can be used to extract basic information and reduce the feature map dimension of the input image (for example, at least a target image or sample image including a human face). For example, reducing from 3 dimensions to 2 dimensions.
  • the deep residual network may also include two residual network blocks (ResNet Blob) 721 and 722.
  • ResNet Blob 721 may include a convolutional layer and a batch normalization (Batch Normalization, BN) layer, which can be used to extract feature information.
  • ResNet Blob 722 can also include a convolutional layer and a BN layer, and can also be used to extract feature information.
  • ResNet Blob 722 can have one more convolutional layer and BN layer than ResNet Blob 721 in structure. Therefore, ResNet Blob 722 can also be used to reduce the dimension of the feature map.
  • a deep residual network can be used to obtain the facial feature information of the target image more accurately.
  • any convolutional neural network structure can be used to perform feature extraction processing on the target region of the target image to obtain feature information of the face image of the target region, which is not limited in the present disclosure.
  • the deep residual network may further include a fully connected layer 730.
  • the deep residual network may include 3 fully connected layers.
  • the fully connected layer 730 can perform dimensionality reduction processing on the feature information of the face image, and at the same time retain useful information related to the face attributes.
  • the deep residual network may also include an output segmentation layer 740.
  • the output segmentation layer 740 may perform output segmentation processing on the output of the fully connected layers 730, specifically of the last fully connected layer, to obtain estimated values of the at least one face attribute parameter associated with each of the at least one face attribute category. For example, after the output of the last fully connected layer is processed by the output segmentation layer 740, the estimated values of the at least one face attribute parameter corresponding to each of the four categories included when the first face attribute is glasses (specifically, glasses type, frame type, lens shape, and frame thickness) are obtained; a structural sketch of such a sub-network is given below.
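A structural sketch of one such sub-network, assuming PyTorch. The channel widths, kernel sizes, and fully connected layer sizes are assumptions; only the overall shape follows the description above: a single convolution (710), two residual blocks (721 and 722, the second with one extra convolution/BN pair and a feature-map size reduction), fully connected layers (730), and a per-category output split (740).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual unit sketch: convolution + batch-normalization layers with a
    skip connection. extra=True adds one more conv/BN pair and strides the
    input, mirroring the statement that the second block has one more
    convolutional layer and BN layer than the first and also reduces the
    feature-map dimension."""
    def __init__(self, channels, out_channels, extra=False):
        super().__init__()
        stride = 2 if extra else 1
        layers = [nn.Conv2d(channels, out_channels, 3, stride, 1),
                  nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
                  nn.Conv2d(out_channels, out_channels, 3, 1, 1),
                  nn.BatchNorm2d(out_channels)]
        if extra:
            layers += [nn.ReLU(inplace=True),
                       nn.Conv2d(out_channels, out_channels, 3, 1, 1),
                       nn.BatchNorm2d(out_channels)]
        self.body = nn.Sequential(*layers)
        self.skip = nn.Conv2d(channels, out_channels, 1, stride)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class AttributeSubNetwork(nn.Module):
    """Sketch of one first sub-network: single convolution (710), two residual
    blocks (721, 722), fully connected layers (730), and an output that is
    split per predefined category (740)."""
    def __init__(self, category_sizes, in_channels=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 32, 3, stride=2, padding=1)   # 710
        self.block1 = ResidualBlock(32, 64)                               # 721
        self.block2 = ResidualBlock(64, 128, extra=True)                  # 722
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(inplace=True),
                                nn.Linear(64, 64), nn.ReLU(inplace=True),
                                nn.Linear(64, sum(category_sizes)))       # 730
        self.category_sizes = list(category_sizes)

    def forward(self, x):
        x = self.pool(self.block2(self.block1(self.stem(x)))).flatten(1)
        logits = self.fc(x)
        # Output segmentation layer (740): one estimate vector per category.
        return list(torch.split(logits, self.category_sizes, dim=1))
```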
  • before step 201, the pre-collected image of interest may likewise be processed first, for example by normalizing the face, and the image of the target area is then cut out from the face-normalized image to obtain the corresponding sample image.
  • the target area includes at least the area where the face key points are located in the face-normalized image; this process is basically the same as that described in the face attribute analysis process and is not repeated here.
  • when the sample image is cropped, the target area may include not only the area where the face key points are located but also a region of preset area located outside the face part corresponding to each target face attribute; this process is also basically the same as that described in the face attribute analysis process and is not repeated here.
  • after face normalization and target-area cropping, the pre-collected images of interest may further be processed by at least one of translation, rotation, scaling, and horizontal flipping, and the resulting processed images are also used as sample images for subsequent network training.
  • in this way, the set of sample images is effectively expanded, and the target neural network obtained by subsequent training can adapt to more complex face attribute analysis scenes; a small augmentation sketch is given below.
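A small sketch of the four augmentation operations named above (translation, rotation, zoom, horizontal flip), using standard OpenCV calls; the concrete angle, shift, and scale values are arbitrary illustrations, not values from the patent.

```python
import cv2
import numpy as np

def augment(sample, angle=5.0, shift=(4, 2), scale=1.05, flip=True):
    """Expand the sample set with translated, rotated/zoomed and flipped
    copies of one face-normalized, cropped sample image."""
    h, w = sample.shape[:2]
    augmented = []
    # Rotation and zoom about the image centre.
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    augmented.append(cv2.warpAffine(sample, m, (w, h)))
    # Translation.
    m = np.float32([[1, 0, shift[0]], [0, 1, shift[1]]])
    augmented.append(cv2.warpAffine(sample, m, (w, h)))
    # Horizontal flip.
    if flip:
        augmented.append(cv2.flip(sample, 1))
    return augmented
```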
  • the present disclosure also provides apparatus embodiments.
  • FIG. 8 is a block diagram of a virtual avatar generating device provided by some embodiments of the present disclosure.
  • the device may include: a task determination module 810 configured to determine a target task associated with at least one target face attribute, wherein the at least one target face attribute is one of a plurality of predefined face attributes; a face attribute analysis module 820 configured to analyze, according to the target task, a target image including at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; a virtual avatar template determination module 830 configured to determine, according to a correspondence between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and an avatar generation module 840 configured to generate a virtual avatar on the target image based on the target virtual avatar template.
  • the face attribute analysis module 820 includes: a network determining sub-module 821 configured to determine a target neural network corresponding to the target face attribute; an estimated value determining sub-module 822 configured to input the target image into the target neural network to obtain estimated values output by the target neural network, the estimated values indicating the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute; and a parameter determination sub-module 823 configured to use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
  • the target face attribute includes at least one predefined category.
  • the parameter determination sub-module 823 may be configured to: for a first category, use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first category as the target face attribute parameter corresponding to the first category, wherein the first category is any one of the at least one category included in the target face attribute.
  • the device further includes: a face correction processing module 850 configured to perform an affine transformation on the image of interest to obtain a face-normalized image; and an image interception module 860 configured to cut out an image of a target area from the face-normalized image to obtain the target image, wherein the target area includes at least the area where the key points of the face are located.
  • the target area further includes a predetermined area outside the face part corresponding to the target face attribute.
  • the target neural network includes multiple sub-networks corresponding to different face attributes.
  • the device further includes a training module 870 configured to: input at least one sample image including at least a face into the first neural network, wherein each sample image is labelled with face attribute parameters associated with a first face attribute, the first neural network includes a first sub-network corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and train the first sub-network by using the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as a predicted value, and using the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image as a true value. In this way, the first sub-network obtained after the training can be used as the target neural network.
  • the first sub-network may adopt a residual neural network structure and includes at least one residual unit.
  • each of the at least one residual unit may include at least one convolutional layer and at least one batch normalization layer.
  • when the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of convolutional layers and the number of batch normalization layers included in a first residual unit of the plurality of residual units.
  • the first sub-network further includes an output segmentation layer, the output segmentation layer being used to segment the feature information extracted from the sample image according to the at least one predefined category included in the first face attribute, to obtain estimated values of the at least one face attribute parameter associated with each of the at least one category.
  • the target area includes at least the area where the face key points are located in the face-normalized image; this process is basically the same as that described in the process of obtaining the target image from the image of interest and is not repeated here.
  • the target area includes not only the area where the face key points are located but also a region of preset area located outside the face part corresponding to each target face attribute.
  • for details, reference may be made to the description of the corresponding part of the method embodiments.
  • the device embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the present disclosure. Those of ordinary skill in the art can understand and implement them without creative effort.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to execute any one of the aforementioned methods for generating a virtual avatar.
  • the embodiment of the present disclosure also provides a virtual avatar generating device, the device comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the executable instructions stored in the memory to implement any of the aforementioned methods for generating a virtual avatar.
  • FIG. 12 is a schematic structural diagram of a virtual avatar generating apparatus 1200 provided by some embodiments.
  • the apparatus 1200 may be provided as a virtual avatar generating apparatus, which is applied to a face driving device.
  • the apparatus 1200 includes a processing component 1222, which further includes one or more processors, and a memory resource represented by a memory 1232, for storing instructions executable by the processing component 1222, such as application programs.
  • the application program stored in the memory 1232 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1222 is configured to execute instructions to execute any of the aforementioned methods for generating a virtual avatar.
  • the device 1200 may further include a power component 1226 configured to perform power management of the device 1200, a wired or wireless network interface 1250 configured to connect the device 1200 to a network, and an input output (I/O) interface 1258.
  • the device 1200 can operate based on an operating system stored in the memory 1232, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a virtual avatar generation method and apparatus, and a storage medium. One example of the method includes: determining a target task associated with at least one target face attribute, where each of the at least one target face attribute is one of a plurality of predefined face attributes; analyzing, according to the target task, a target image that includes at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; determining, according to correspondences between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and generating a virtual avatar on the target image based on the target virtual avatar template.

Description

Virtual avatar generation method and apparatus, and storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular to a virtual avatar generation method and apparatus, and a storage medium.
Background
Facial feature point positioning is to compute, on a picture that includes a face, the positions of several predefined feature points, such as the corners of the eyes, the corners of the mouth, and the tip of the nose. Current facial feature point positioning can define some simple features on the face, such as eye contours and mouth contours, but cannot accurately locate finer feature information.
Summary
In view of this, the present disclosure provides a virtual avatar generation method and apparatus, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, a virtual avatar generation method is provided. The method includes: determining a target task associated with at least one target face attribute, where the target face attribute is one of a plurality of predefined face attributes; analyzing, according to the target task, a target image that includes at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; determining, according to correspondences between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and generating a virtual avatar on the target image based on the target virtual avatar template.
In some optional embodiments, analyzing the target image that includes at least a face with respect to the target face attribute according to the target task includes: determining a target neural network corresponding to the target face attribute; inputting the target image into the target neural network to obtain estimated values output by the target neural network, where the estimated values indicate the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute; and using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
In some optional embodiments, the target face attribute includes at least one predefined category; and using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter includes: for a first category, using the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first category as the target face attribute parameter corresponding to the first category, where the first category is any one of the at least one category included in the target face attribute.
In some optional embodiments, the target neural network is obtained by training in the following manner: inputting at least one sample image that includes at least a face into a first neural network, where each sample image is labelled with face attribute parameters associated with a first face attribute, the first neural network includes a first sub-network corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and training the first sub-network by using the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as predicted values, and using the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image as true values; the target neural network is obtained after the training is completed.
In some optional embodiments, the first sub-network adopts a residual neural network structure and includes at least one residual unit.
In some optional embodiments, each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and when the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of convolutional layers and the number of batch normalization layers included in a first residual unit of the plurality of residual units.
In some optional embodiments, the first sub-network further includes an output segmentation layer, which is used to segment the feature information extracted from the sample image according to the at least one predefined category included in the first face attribute, so as to obtain estimated values of the at least one face attribute parameter associated with each of the at least one category.
In some optional embodiments, the method further includes: performing an affine transformation on an image of interest to obtain an image in which the face has been normalized; and cutting out an image of a target area from the face-normalized image to obtain the target image or the sample image, where the target area includes at least the area where the face key points are located. In addition, optionally, the target area further includes a region of preset area located outside the face part corresponding to the target face attribute.
According to a second aspect of the embodiments of the present disclosure, a virtual avatar generation apparatus is provided. The apparatus includes: a task determination module configured to determine a target task associated with at least one target face attribute, where the target face attribute is one of a plurality of predefined face attributes; a face attribute analysis module configured to analyze, according to the target task, a target image that includes at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; a virtual avatar template determination module configured to determine, according to correspondences between predefined face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and an avatar generation module configured to generate a virtual avatar on the target image based on the target virtual avatar template.
In some optional embodiments, the face attribute analysis module includes: a network determination sub-module configured to determine a target neural network corresponding to the target face attribute; an estimated value determination sub-module configured to input the target image into the target neural network to obtain estimated values output by the target neural network, where the estimated values indicate the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute; and a parameter determination sub-module configured to use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
In some optional embodiments, the target face attribute includes at least one predefined category, and the parameter determination sub-module is configured to: for a first category, use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first category as the target face attribute parameter corresponding to the first category, where the first category is any one of the at least one category included in the target face attribute.
In some optional embodiments, the apparatus further includes a training module configured to: input at least one sample image that includes at least a face into a first neural network, where each sample image is labelled with face attribute parameters associated with a first face attribute, the first neural network includes a first sub-network corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and train the first sub-network by using the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as predicted values, and using the at least one face attribute parameter corresponding to the first face attribute labelled on the at least one sample image as true values; the first sub-network obtained after the training is completed can be used as the target neural network.
In some optional embodiments, the first sub-network adopts a residual neural network structure and includes at least one residual unit.
In some optional embodiments, each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and when the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of convolutional layers and the number of batch normalization layers included in a first residual unit of the plurality of residual units.
In some optional embodiments, the first sub-network further includes an output segmentation layer, which is used to segment the feature information extracted from the sample image according to the at least one predefined category included in the first face attribute, so as to obtain estimated values of the at least one face attribute parameter associated with each of the at least one category.
In some optional embodiments, the apparatus further includes: a face normalization processing module configured to perform an affine transformation on an image of interest to obtain a face-normalized image; and an image cropping module configured to cut out an image of a target area from the face-normalized image to obtain the target image or the sample image, where the target area includes at least the area where the face key points are located. In addition, optionally, the target area further includes a region of preset area located outside the face part corresponding to the target face attribute.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided. The storage medium stores a computer program, and the computer program is used to perform the virtual avatar generation method according to any one of the above first aspect.
According to a fourth aspect of the embodiments of the present disclosure, a virtual avatar generation apparatus is provided. The apparatus includes: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the executable instructions stored in the memory to implement the virtual avatar generation method according to any one of the above first aspect.
In the embodiments of the present disclosure, face attribute parameters on a target image that includes at least a face can be extracted according to a target task associated with at least one face attribute; a corresponding target virtual avatar template is then determined for the target image in combination with the correspondences between predefined face attribute parameters and virtual avatar templates, and a virtual avatar is generated on the target image based on the target virtual avatar template. Through the target task associated with at least one face attribute, a more accurate face attribute analysis is performed on the image that includes at least a face, so that the generated virtual avatar provides the user with rich initial face attribute styles.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a virtual avatar generation method according to an exemplary embodiment of the present disclosure.
Fig. 2 is an example diagram of virtual avatar generation according to an exemplary embodiment of the present disclosure.
Fig. 3 is another example diagram of virtual avatar generation according to an exemplary embodiment of the present disclosure.
Fig. 4 is a flowchart of another virtual avatar generation method according to an exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart of another virtual avatar generation method according to an exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart of another virtual avatar generation method according to an exemplary embodiment of the present disclosure.
Fig. 7 is an example diagram of a neural network according to an exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram of a virtual avatar generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram of another virtual avatar generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 10 is a block diagram of another virtual avatar generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 11 is a block diagram of another virtual avatar generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 12 is a schematic structural diagram of an apparatus for virtual avatar generation according to an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a/an", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, first information may also be called second information, and, similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
An embodiment of the present disclosure provides a virtual avatar generation method, which may be executed by a face driving device, for example, an avatar.
As shown in Fig. 1, Fig. 1 illustrates an example of a virtual avatar generation method according to an embodiment of the present disclosure.
In step 101, a target task associated with at least one target face attribute is determined, where each of the at least one target face attribute is one of a plurality of predefined face attributes.
In the embodiments of the present disclosure, face properties may include, but are not limited to, hairstyle, beard, glasses, eyelids, and the like.
The face driving device may provide the user with the plurality of predefined face attributes, and the user determines, among these face attributes, the target face attribute to be analyzed according to his or her own needs. The target task analyzes the above at least one face attribute; for example, the target task may be to analyze the hairstyle, or to analyze the beard and the eyelids, and so on.
In step 102, the target image that includes at least a face is analyzed with respect to the target face attribute according to the target task, and the target face attribute parameter associated with the target face attribute on the target image is obtained.
When performing face attribute analysis on the target image captured by a camera, the face driving device may analyze only the target face attribute corresponding to the target task, so as to obtain the target face attribute parameter (target face property feature) associated with the target face attribute on the target image.
For example, assuming that the target face attribute is the eyelid, the eyelid may be associated with a plurality of predefined parameters, such as single eyelid, fan-shaped double eyelid, parallel double eyelid, and European-style double eyelid. By performing face attribute analysis on the target image with respect to the eyelid, the face driving device can obtain the target face attribute parameter, for example, parallel double eyelid.
In step 103, the target virtual avatar template corresponding to the target face attribute parameter is determined according to the correspondences between predefined face attribute parameters and virtual avatar templates.
The face driving device may store the correspondences between predefined face attribute parameters and virtual avatar templates; for example, each virtual avatar template corresponds to at least one face attribute parameter. Optionally, the virtual avatar templates may adopt cartoon characters.
After determining the target face attribute parameter, the face driving device may determine, in a pre-stored virtual avatar template library, the target virtual avatar template corresponding to the target face attribute parameter. The number of target virtual avatar templates may be one or more, which is not limited in the present disclosure.
In step 104, a virtual avatar is generated on the target image based on the target virtual avatar template.
In a face driving device such as an avatar, the target virtual avatar template may be directly used as the virtual avatar to be generated on the target image.
If there are multiple target virtual avatar templates, the user may select one of them as the virtual avatar, and the face driving device generates that virtual avatar on the target image.
For example, assuming that the target image is as shown in Fig. 2 and the target task is to analyze the face attribute of hairstyle, the generated virtual avatar may be as shown in the upper right corner of Fig. 3.
In the above embodiments, the face attribute parameters on the target image that includes at least a face can be extracted according to the target task associated with at least one face attribute, the corresponding target virtual avatar template is then determined in combination with the correspondences between predefined face attribute parameters and virtual avatar templates, and the virtual avatar is generated on the target image based on the target virtual avatar template. In this way, a more accurate face attribute analysis can be performed on an image that includes at least a face, and the correspondingly generated virtual avatar can provide the user with richer initial face attribute styles. A minimal sketch of this overall flow is given below.
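The following is a minimal sketch of the control flow of steps 101 to 104 only, assuming three injected callables that the patent does not name (an attribute analyser, a template lookup, and a renderer); it is an illustration, not an implementation of the disclosed method.

```python
def generate_virtual_avatar(target_image, target_attributes,
                            analyse, template_lookup, render):
    """Sketch of the overall flow of steps 101-104.

    analyse(image, attributes) -> target face attribute parameters,
    template_lookup(params) -> candidate target virtual avatar templates,
    render(image, template) -> image with the virtual avatar generated on it.
    All three callables are hypothetical placeholders.
    """
    # Step 101: the target task is defined by the selected target attributes.
    # Step 102: analyse only the attributes associated with the target task.
    target_params = analyse(target_image, target_attributes)
    # Step 103: map the parameters to the corresponding avatar template(s).
    templates = template_lookup(target_params)
    # Step 104: generate the virtual avatar on the target image
    # (if several templates match, the user may pick one).
    return render(target_image, templates[0])
```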
In some optional embodiments, for example as shown in Fig. 4, step 102 may include:
In step 102-1, the target neural network corresponding to the target face attribute is determined.
In the embodiments of the present disclosure, a corresponding target neural network may be obtained by training for the target task associated with at least one face attribute. When the target task needs to be executed, the target neural network may be used as the corresponding neural network for subsequent face attribute analysis.
In step 102-2, the target image is input into the target neural network to obtain the estimated values output by the target neural network, where the estimated values indicate the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute.
The face driving device may input the target image that includes at least a face into the target neural network, and the target neural network outputs estimated values indicating the probabilities that the target image has the at least one face attribute parameter associated with the target face attribute. The estimated value indicating the probability that the target image has a certain face attribute parameter may also be referred to, in a simplified way, as the estimated value for that face attribute parameter or the estimated value of that face attribute parameter.
For example, assuming that the target face attribute is the eyelid, the target neural network may output the estimated values, shown in Table 1, indicating the probabilities that the target image has each of the face attribute parameters associated with the eyelid.
Table 1
Face attribute parameter | Estimated probability
Single eyelid | 0.1
Fan-shaped double eyelid | 0.6
Parallel double eyelid | 0.2
European-style double eyelid | 0.1
In step 102-3, the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network is used as the target face attribute parameter.
For example, according to Table 1, the face attribute parameter corresponding to the maximum estimated value of 0.6, that is, the fan-shaped double eyelid, may be used as the target face attribute parameter.
In some optional embodiments, the above face attributes may be divided to obtain at least one classification (subclass) included in each face attribute, so as to describe the face attributes more precisely. The division may include, but is not limited to, the manner shown in Table 2.
Table 2
Face attribute | Classifications of the face attribute
Hairstyle | Bangs, curls, hair length
Beard | On the philtrum, at the center of the chin, on both sides of the chin
Glasses | Glasses type, frame type, lens shape, frame thickness
Eyelid | Single eyelid, fan-shaped double eyelid, parallel double eyelid, European-style double eyelid
Further, each classification of a face attribute may include at least one face attribute parameter, as shown in Table 3.
Table 3
Figure PCTCN2020074597-appb-000001
The division of face attributes in the present disclosure includes, but is not limited to, the above manner.
In the embodiments of the present disclosure, according to the division of Table 3, the target face attribute to be analyzed for the target task may include at least one predefined classification. For example, if the target face attribute is hairstyle, the hairstyle attribute includes three classifications, namely bangs, curls, and hair length, and each classification further includes at least one face attribute parameter. The sum of the estimated values of all face attribute parameters associated with each classification is 1. For example, for the classification of curls, the estimated values corresponding to the four face attribute parameters of no hair, straight hair, large curls, and small curls may be 0.1, 0.6, 0.2, and 0.1, respectively, and their sum is 1.
Correspondingly, in the above embodiments, for any one of the at least one classification of the target face attribute, after the target neural network outputs the estimated values of the probabilities of the at least one face attribute parameter associated with that classification, the face driving device may use the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for that classification as the target face attribute parameter corresponding to that classification. For example, assume that the target face attribute is hairstyle, where the face attribute parameter corresponding to the maximum estimated value for the classification of bangs is no bangs, the face attribute parameter corresponding to the maximum estimated value for the classification of curls is straight hair, and the face attribute parameter corresponding to the maximum estimated value for the classification of hair length is short hair that does not reach the shoulders. The face driving device may then use the three face attribute parameters of no bangs, straight hair, and short hair not reaching the shoulders as the target face attribute parameters corresponding to the three classifications of bangs, curls, and hair length, respectively. A small per-classification normalization sketch follows.
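The sketch below shows one way of producing per-classification estimated values that sum to 1 within each classification, as stated above. The patent does not name the normalization, so the per-group softmax and the raw scores used here are assumptions.

```python
import numpy as np

def per_category_probabilities(raw_scores, category_sizes):
    """Turn raw network outputs into estimated values that sum to 1 within
    each classification (softmax applied per category group)."""
    probs, offset = [], 0
    for size in category_sizes:
        group = np.asarray(raw_scores[offset:offset + size], dtype=float)
        e = np.exp(group - group.max())
        probs.append(e / e.sum())
        offset += size
    return probs

# Example: hairstyle with categories bangs (2), curls (4), hair length (3);
# the raw scores are arbitrary illustrative numbers.
print(per_category_probabilities(
    [1.2, -0.3, 0.1, 2.0, 0.4, -1.0, 0.5, 0.2, 1.5], [2, 4, 3]))
```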
在一些可选实施例中,例如图5所示,至少在执行步骤102之前,上述虚拟头像生成方法还可包括以下:
在步骤100-1中,对关注图像进行仿射变换,得到人脸转正后的图像。
本公开实施例中,所述关注图像(image of interest)可以是预先拍摄的包括人脸的图像,人脸驱动设备可以对关注图像进行仿射变换,从而将关注图像中偏转的人脸转正。
在步骤100-2中,从所述人脸转正后的图像中截取出目标区域的图像,得到所述目标图像。
本公开实施例中,人脸驱动设备可以采用人脸特征点定位方法,并且所述目标区域至少包括人脸关键点所在的区域。这样,可从所述人脸转正后的图像中截取出至少包括人脸关键点所在区域的图像,作为所述目标图像。其中,人脸关键点包括但不限于眉毛、眼睛、鼻子、嘴巴、脸部轮廓等。
相应地,人脸驱动设备在执行上述步骤102时,可以对所述目标图像进行人脸属性分析,获得所述目标图像上与所述目标人脸属性关联的目标人脸属性参数。获得目标人脸属性参数的方法与上述实施例中的方法一致,在此不再赘述。
本公开实施例中,可以对关注图像进行人脸转正后,再截取出包括人脸关键点所在区域的目标图像,后续针对目标图像进行人脸属性分析,使得人脸属性分析的结果更加 准确。
In some optional embodiments, with regard to the above step 100-2, when the face-driven device crops the image of the target region from the face-frontalized image, the target region may include, in addition to the region where the facial landmarks are located, a region of a preset area outside the face part corresponding to the target face attribute. Optionally, the preset areas outside the face parts corresponding to different target face attributes may be different.
For example, when the target face attribute is the beard, glasses or eyelid, the corresponding face parts are the mouth, eyebrows, eyes and the like, and the preset area may be less than half of the area occupied by the corresponding face part. For example, when the target face attribute corresponds to the mouth, the target image may cover, in addition to the region where the mouth is located, a region of a preset area outside the mouth, and this preset area may be less than half of the area of the cropped mouth region.
As another example, if the target face attribute is the hairstyle, the corresponding face part is the face contour. In this case, in order to avoid deviations when extracting the hairstyle, the preset area may be half of the entire face contour area or more.
In the above embodiment, the target region includes, in addition to the region where the facial landmarks are located, a region of a preset area outside the face part corresponding to the target face attribute, which improves the accuracy of the target face attribute analysis. A margin-expansion step of this kind is sketched below.
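As an illustration only, the following sketch expands a face-part bounding box by a per-attribute margin before cropping. The margin ratios are assumptions chosen for the example; the disclosure only requires that the extra area stay below half of the part area for beard, glasses or eyelid, and reach at least half of the face-contour area for hairstyle.

```python
# Minimal sketch of expanding the crop region by a preset margin outside the
# face part that corresponds to the target attribute. Ratios are illustrative.
def expand_box(box, ratio, image_w, image_h):
    """Grow an (x, y, w, h) box by `ratio` of its own size per side, clipped to the image."""
    x, y, w, h = box
    dx, dy = int(w * ratio), int(h * ratio)
    nx, ny = max(0, x - dx), max(0, y - dy)
    nw = min(image_w - nx, w + 2 * dx)
    nh = min(image_h - ny, h + 2 * dy)
    return nx, ny, nw, nh

MARGIN_RATIO = {"beard": 0.2, "glasses": 0.2, "eyelid": 0.2, "hairstyle": 0.5}  # assumptions
```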
In some optional embodiments, the target neural network may include a plurality of subnetworks corresponding to different face attributes. For example, the target neural network includes four subnetworks corresponding to the hairstyle, beard, glasses and eyelid, respectively.
In the embodiments of the present disclosure, as shown in FIG. 6, the process of training the target neural network may include the following.
In step 201, at least one sample image including at least a face is input into a first neural network, where each sample image is labeled with face attribute parameters associated with a first face attribute, and the first neural network includes a first subnetwork corresponding to the first face attribute.
The at least one sample image may be obtained by applying an affine transformation to at least one previously collected image including at least a face and then cropping out the image of the target region.
In the embodiments of the present disclosure, an affine transformation may be applied to each previously collected image including at least a face so as to frontalize the face, and the image of the target region is then cropped from the face-frontalized image to obtain a corresponding sample image, where the target region includes at least the region where the facial landmarks are located.
In step 202, the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network is taken as the predicted value, the at least one face attribute parameter corresponding to the first face attribute labeled on the at least one sample image is taken as the ground truth, and the first subnetwork is trained accordingly. In this way, the trained first subnetwork can be used as the target neural network.
In the embodiments of the present disclosure, the first face attribute may be any one of the plurality of predefined face attributes, for example any one of the eyelid, hairstyle, beard and glasses.
As described above, the face attribute parameters associated with the first face attribute on a sample image are known. In other words, a sample image may be labeled with the face attribute parameters associated with the first face attribute. For example, assuming the first face attribute is the beard, the face attribute parameters corresponding to the beard on a certain sample image may include no beard on the philtrum, no beard at the center of the chin, and no beard on both sides of the chin.
In the embodiments of the present disclosure, the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the target neural network may be taken as the predicted value of the neural network, and the at least one face attribute parameter corresponding to the first face attribute labeled on the at least one sample image may be taken as the ground truth, so as to optimize and adjust the network parameters of the first subnetwork, thereby obtaining the first subnetwork corresponding to the first face attribute.
In the embodiments of the present disclosure, the subnetwork corresponding to any face attribute can be obtained by training in the above manner, and the plurality of subnetworks constitute the target neural network. A training loop of this kind is sketched below.
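By way of illustration, the following PyTorch sketch trains one attribute subnetwork. It assumes a dataloader that yields a cropped sample image together with one integer label per subclass of the first face attribute, and a model that returns one logit tensor per subclass (see the output splitting layer described below); the optimizer, learning rate and loss choice are assumptions for the example.

```python
# Minimal training sketch for one attribute subnetwork (PyTorch).
import torch
import torch.nn as nn

def train_subnetwork(model, dataloader, num_epochs=10, lr=1e-3, device="cpu"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        for images, labels_per_subclass in dataloader:
            images = images.to(device)
            optimizer.zero_grad()
            # The model returns one logit tensor per subclass; the per-subclass
            # cross-entropy losses are summed into a single training loss.
            logits_per_subclass = model(images)
            loss = sum(
                criterion(logits, labels.to(device))
                for logits, labels in zip(logits_per_subclass, labels_per_subclass)
            )
            loss.backward()
            optimizer.step()
    return model
```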
In some optional embodiments, the subnetworks included in the target neural network of the embodiments of the present disclosure may adopt a residual network (ResNet). The network structure of the residual network may be as shown in FIG. 7.
The residual network may include a single convolutional layer 710. The convolutional layer 710 may be used to extract basic information and to reduce the dimensionality of the feature map of the input image (for example, a target image or a sample image including at least a face), for example from 3 dimensions to 2 dimensions.
As shown in FIG. 7, the deep residual network may further include two residual network blocks (ResNet Blobs) 721 and 722. A ResNet Blob is structurally characterized by having a residual unit, so that the complexity of the task can be reduced without changing the overall input and output of the task. The ResNet Blob 721 may include convolutional layers and batch normalization (BN) layers, and may be used to extract feature information. The ResNet Blob 722 may also include convolutional layers and BN layers, and may likewise be used to extract feature information. However, the ResNet Blob 722 may structurally have one more convolutional layer and one more BN layer than the ResNet Blob 721, and therefore the ResNet Blob 722 may also be used to reduce the dimensionality of the feature map.
In this way, the facial feature information of the target image can be obtained relatively accurately by using the deep residual network. It should be understood that any convolutional neural network structure may be used to perform feature extraction on the target region of the target image to obtain the feature information of the face image in the target region, which is not limited in the present disclosure.
As shown in FIG. 7, the deep residual network may further include fully connected layers 730. For example, the deep residual network may include three fully connected layers. The fully connected layers 730 may reduce the dimensionality of the feature information of the face image while retaining useful information related to the face attributes.
The deep residual network may further include an output splitting layer 740. The output splitting layer 740 may apply output splitting to the output of the fully connected layers 730, specifically the output of the last fully connected layer, to obtain the estimated values of the at least one face attribute parameter associated with at least one face attribute subclass. For example, after the output of the last fully connected layer is processed by the output splitting layer 740, the estimated values of the at least one face attribute parameter corresponding to each of the four subclasses included when the first face attribute is glasses (specifically, type of glasses, frame type, lens shape and frame thickness) can be obtained. A network of this general shape is sketched below.
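The following PyTorch sketch is a minimal network in the spirit of FIG. 7: one convolutional stem, two residual blocks (the second with an extra convolution and BN stage), three fully connected layers, and an output that is split per subclass. The channel counts, strides, pooling, input size and the per-subclass parameter counts (for example [3, 4, 3, 2] for glasses) are assumptions for illustration, not values disclosed in this publication.

```python
# Minimal sketch of an attribute subnetwork with an output splitting layer.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels, extra_stage=False):
        super().__init__()
        layers = [
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        ]
        if extra_stage:  # the second block has one more conv + BN pair
            layers += [nn.ReLU(inplace=True),
                       nn.Conv2d(channels, channels, 3, padding=1),
                       nn.BatchNorm2d(channels)]
        self.body = nn.Sequential(*layers)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # residual connection

class AttributeSubnetwork(nn.Module):
    def __init__(self, subclass_sizes):
        """subclass_sizes: number of parameters per subclass, e.g. glasses -> [3, 4, 3, 2]."""
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                  nn.BatchNorm2d(16), nn.ReLU(inplace=True))
        self.block1 = ResidualBlock(16)
        self.block2 = ResidualBlock(16, extra_stage=True)
        self.pool = nn.AdaptiveAvgPool2d(4)  # reduce the feature map size
        self.fc = nn.Sequential(nn.Linear(16 * 4 * 4, 128), nn.ReLU(inplace=True),
                                nn.Linear(128, 64), nn.ReLU(inplace=True),
                                nn.Linear(64, sum(subclass_sizes)))
        self.subclass_sizes = subclass_sizes

    def forward(self, x):
        x = self.pool(self.block2(self.block1(self.stem(x))))
        out = self.fc(torch.flatten(x, 1))
        # Output splitting layer: one group of logits per subclass.
        return list(torch.split(out, self.subclass_sizes, dim=1))

model = AttributeSubnetwork(subclass_sizes=[3, 4, 3, 2])
logits = model(torch.randn(1, 3, 64, 64))
print([t.shape for t in logits])
```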
In some optional embodiments, before step 201, the pre-collected image of interest may likewise be processed first, for example by frontalizing the face, and the image of the target region is then cropped from the face-frontalized image to obtain the corresponding sample image, where the target region includes at least the region where the facial landmarks are located on the face-frontalized image. This process is substantially the same as that described for the face attribute analysis and is not repeated here.
In some optional embodiments, when cropping the sample image, the target region may include, in addition to the region where the facial landmarks are located, regions of preset areas outside the face parts corresponding to the different target face attributes. This process is also substantially the same as that described for the face attribute analysis and is likewise not repeated here.
In some optional embodiments, after face frontalization and target region cropping, the at least one pre-collected image of interest including at least a face may further undergo at least one of translation, rotation, scaling and horizontal flipping, and the resulting processed images are also used as sample images for the subsequent network training. This effectively enlarges the set of sample images, allowing the subsequently trained target neural network to adapt to more complex face attribute analysis scenarios. An augmentation pipeline of this kind is sketched below.
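As an illustration, the following torchvision sketch applies the augmentation operations named above (translation, rotation, scaling, horizontal flip) to an already frontalized and cropped image; the specific ranges and probabilities are assumptions chosen for the example.

```python
# Minimal data augmentation sketch matching the operations named in the text.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,              # small rotation
                            translate=(0.05, 0.05),  # small translation
                            scale=(0.9, 1.1)),       # mild scaling
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
# Applying `augment` to a frontalized, cropped PIL image yields an extra sample.
```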
Corresponding to the foregoing method embodiments, the present disclosure further provides apparatus embodiments.
As shown in FIG. 8, FIG. 8 is a block diagram of a virtual avatar generation apparatus provided by some embodiments of the present disclosure. The apparatus may include: a task determination module 810 configured to determine a target task associated with at least one target face attribute, where each of the at least one target face attribute is one of a plurality of predefined face attributes; a face attribute analysis module 820 configured to analyze, according to the target task, a target image including at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image; a virtual avatar template determination module 830 configured to determine, according to predefined correspondences between face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and an avatar generation module 840 configured to generate a virtual avatar on the target image based on the target virtual avatar template.
In some optional embodiments, for example as shown in FIG. 9, the face attribute analysis module 820 includes: a network determination submodule 821 configured to determine a target neural network corresponding to the target face attribute; an estimated value determination submodule 822 configured to input the target image into the target neural network and obtain estimated values output by the target neural network, where the estimated values represent the probabilities that the target image has at least one face attribute parameter associated with the target face attribute; and a parameter determination submodule 823 configured to take the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network as the target face attribute parameter.
In some optional embodiments, the target face attribute includes at least one predefined subclass. In this case, the parameter determination submodule 823 may be configured to: for a first subclass, take the face attribute parameter corresponding to the maximum of the estimated values output by the target neural network for the first subclass as the target face attribute parameter corresponding to the first subclass, where the first subclass is any one of the at least one subclass included in the target face attribute.
In some optional embodiments, for example as shown in FIG. 10, the apparatus further includes: a face frontalization processing module 850 configured to apply an affine transformation to an image of interest to obtain a face-frontalized image; and an image cropping module 860 configured to crop an image of a target region from the face-frontalized image to obtain the target image, where the target region includes at least the region where the facial landmarks are located.
In some optional embodiments, the target region further includes a region of a preset area outside the face part corresponding to the target face attribute.
In some optional embodiments, the target neural network includes a plurality of subnetworks corresponding to different face attributes. In this case, for example as shown in FIG. 11, the apparatus further includes a training module 870 configured to: input at least one sample image including at least a face into a first neural network, where each sample image is labeled with face attribute parameters associated with a first face attribute, the first neural network includes a first subnetwork corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and take the at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as the predicted value and the at least one face attribute parameter corresponding to the first face attribute labeled on the at least one sample image as the ground truth, and train the first subnetwork accordingly. In this way, the first subnetwork obtained after training can be used as the target neural network.
In some optional embodiments, the first subnetwork may adopt the network structure of a residual neural network and include at least one residual unit, where each of the at least one residual unit may include at least one convolutional layer and at least one batch normalization layer.
In some optional embodiments, if the first subnetwork includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of the convolutional layers and the number of the batch normalization layers included in a first residual unit of the plurality of residual units.
In some optional embodiments, the first subnetwork further includes an output splitting layer, and the output splitting layer is used to split the feature information extracted from the sample image according to the at least one predefined subclass included in the first face attribute, to obtain the estimated values of the at least one face attribute parameter associated with each of the at least one subclass.
In some optional embodiments, the pre-collected image of interest may likewise be processed first, for example by frontalizing the face, and the image of the target region is then cropped from the face-frontalized image to obtain the corresponding sample image, where the target region includes at least the region where the facial landmarks are located on the face-frontalized image. This process is substantially the same as the process of obtaining the target image from the image of interest described above and is not repeated here. In addition, when cropping the sample image, the target region may include, in addition to the region where the facial landmarks are located, regions of preset areas outside the face parts corresponding to the different target face attributes.
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant description of the method embodiments. The apparatus embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of the present disclosure, and those of ordinary skill in the art can understand and implement them without creative effort.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to perform any one of the virtual avatar generation methods described above.
An embodiment of the present disclosure further provides a virtual avatar generation apparatus, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the executable instructions stored in the memory to implement any one of the virtual avatar generation methods described above.
As shown in FIG. 12, FIG. 12 is a schematic structural diagram of a virtual avatar generation apparatus 1200 provided by some embodiments. For example, the apparatus 1200 may be provided as a virtual avatar generation apparatus applied to a face-driven device. Referring to FIG. 12, the apparatus 1200 includes a processing component 1222, which further includes one or more processors, and memory resources represented by a memory 1232 for storing instructions executable by the processing component 1222, for example application programs. The application programs stored in the memory 1232 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1222 is configured to execute the instructions to perform any one of the virtual avatar generation methods described above.
The apparatus 1200 may further include a power supply component 1226 configured to perform power management of the apparatus 1200, a wired or wireless network interface 1250 configured to connect the apparatus 1200 to a network, and an input/output (I/O) interface 1258. The apparatus 1200 may operate based on an operating system stored in the memory 1232, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall be included in the scope of protection of the present disclosure.

Claims (20)

  1. A virtual avatar generation method, comprising:
    determining a target task associated with at least one target face attribute, wherein the target face attribute is one of a plurality of predefined face attributes;
    analyzing, according to the target task, a target image including at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image;
    determining, according to predefined correspondences between face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and
    generating a virtual avatar on the target image based on the target virtual avatar template.
  2. The method according to claim 1, wherein the analyzing, according to the target task, a target image including at least a face with respect to the target face attribute comprises:
    determining a target neural network corresponding to the target face attribute;
    inputting the target image into the target neural network, and obtaining estimated values output by the target neural network, wherein the estimated values represent probabilities that the target image has at least one face attribute parameter associated with the target face attribute; and
    taking a face attribute parameter corresponding to a maximum of the estimated values output by the target neural network as the target face attribute parameter.
  3. The method according to claim 2, wherein the target face attribute includes at least one predefined subclass, and the taking a face attribute parameter corresponding to a maximum of the estimated values output by the target neural network as the target face attribute parameter comprises:
    for a first subclass, taking a face attribute parameter corresponding to a maximum of the estimated values output by the target neural network for the first subclass as the target face attribute parameter corresponding to the first subclass, wherein the first subclass is any one of the at least one subclass included in the target face attribute.
  4. The method according to any one of claims 2-3, wherein the target neural network is obtained by training in the following manner:
    inputting at least one sample image including at least a face into a first neural network, wherein each sample image is labeled with face attribute parameters associated with a first face attribute, the first neural network includes a first subnetwork corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and
    taking at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as a predicted value, taking at least one face attribute parameter corresponding to the first face attribute labeled on the at least one sample image as a ground truth, and training the first subnetwork, the target neural network being obtained after the training is completed.
  5. The method according to claim 4, wherein the first subnetwork adopts a structure of a residual neural network and includes at least one residual unit.
  6. The method according to claim 5, wherein
    each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and
    in a case where the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of the convolutional layers and the number of the batch normalization layers included in a first residual unit of the plurality of residual units.
  7. The method according to claim 5 or 6, wherein
    the first subnetwork further includes an output splitting layer, and
    the output splitting layer is used to split feature information extracted from the sample image according to at least one predefined subclass included in the first face attribute, to obtain estimated values of at least one face attribute parameter associated with each of the at least one subclass.
  8. The method according to any one of claims 4-7, further comprising:
    applying an affine transformation to an image of interest to obtain a face-frontalized image; and
    cropping an image of a target region from the face-frontalized image to obtain the target image or the sample image, wherein the target region includes at least a region where facial landmarks are located.
  9. The method according to claim 8, wherein the target region further includes a region of a preset area outside a face part corresponding to the target face attribute.
  10. A virtual avatar generation apparatus, comprising:
    a task determination module configured to determine a target task associated with at least one target face attribute, wherein the target face attribute is one of a plurality of predefined face attributes;
    a face attribute analysis module configured to analyze, according to the target task, a target image including at least a face with respect to the target face attribute, to obtain a target face attribute parameter associated with the target face attribute on the target image;
    a virtual avatar template determination module configured to determine, according to predefined correspondences between face attribute parameters and virtual avatar templates, a target virtual avatar template corresponding to the target face attribute parameter; and
    an avatar generation module configured to generate a virtual avatar on the target image based on the target virtual avatar template.
  11. The apparatus according to claim 10, wherein the face attribute analysis module comprises:
    a network determination submodule configured to determine a target neural network corresponding to the target face attribute;
    an estimated value determination submodule configured to input the target image into the target neural network and obtain estimated values output by the target neural network, wherein the estimated values represent probabilities that the target image has at least one face attribute parameter associated with the target face attribute; and
    a parameter determination submodule configured to take a face attribute parameter corresponding to a maximum of the estimated values output by the target neural network as the target face attribute parameter.
  12. The apparatus according to claim 11, wherein the target face attribute includes at least one predefined subclass, and the parameter determination submodule is configured to:
    for a first subclass, take a face attribute parameter corresponding to a maximum of the estimated values output by the target neural network for the first subclass as the target face attribute parameter corresponding to the first subclass, wherein the first subclass is any one of the at least one subclass included in the target face attribute.
  13. The apparatus according to any one of claims 10-12, wherein the apparatus further comprises a training module configured to:
    input at least one sample image including at least a face into a first neural network, wherein each sample image is labeled with face attribute parameters associated with a first face attribute, the first neural network includes a first subnetwork corresponding to the first face attribute, and the first face attribute is any one of the plurality of predefined face attributes; and
    take at least one face attribute parameter associated with the first face attribute on the at least one sample image output by the first neural network as a predicted value, take at least one face attribute parameter corresponding to the first face attribute labeled on the at least one sample image as a ground truth, and train the first subnetwork, the first subnetwork obtained after the training is completed being used as the target neural network.
  14. The apparatus according to claim 13, wherein the first subnetwork adopts a network structure of a residual neural network and includes at least one residual unit.
  15. The apparatus according to claim 14, wherein
    each of the at least one residual unit includes at least one convolutional layer and at least one batch normalization layer; and
    in a case where the at least one residual unit includes a plurality of residual units, the number of convolutional layers and the number of batch normalization layers included in a second residual unit of the plurality of residual units are both greater than the number of the convolutional layers and the number of the batch normalization layers included in a first residual unit of the plurality of residual units.
  16. The apparatus according to claim 14 or 15, wherein
    the first subnetwork further includes an output splitting layer, and
    the output splitting layer is used to split feature information extracted from the sample image according to at least one predefined subclass included in the first face attribute, to obtain estimated values of at least one face attribute parameter associated with each of the at least one subclass.
  17. The apparatus according to any one of claims 10-16, wherein the apparatus further comprises:
    a face frontalization processing module configured to apply an affine transformation to an image of interest to obtain a face-frontalized image; and
    an image cropping module configured to crop an image of a target region from the face-frontalized image to obtain the target image or the sample image, wherein the target region includes at least a region where facial landmarks are located.
  18. The apparatus according to claim 17, wherein the target region further includes a region of a preset area outside a face part corresponding to the target face attribute.
  19. A computer-readable storage medium storing a computer program, wherein the computer program is used to perform the virtual avatar generation method according to any one of claims 1 to 9.
  20. A virtual avatar generation apparatus, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the virtual avatar generation method according to any one of claims 1 to 9.
PCT/CN2020/074597 2019-05-15 2020-02-10 Virtual avatar generation method and apparatus, and storage medium WO2020228384A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2020558437A JP2021528719A (ja) 2019-05-15 2020-02-10 仮想アバター生成方法および装置、ならびに記憶媒体
KR1020207015327A KR102443026B1 (ko) 2019-05-15 2020-02-10 가상 아바타 발생 방법 및 장치, 및 저장 매체
SG11202008025QA SG11202008025QA (en) 2019-05-15 2020-02-10 Virtual avatar generation method and apparatus, and storage medium
US16/994,148 US11403874B2 (en) 2019-05-15 2020-08-14 Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910403642.9A CN110111246B (zh) 2019-05-15 2019-05-15 一种虚拟头像生成方法及装置、存储介质
CN201910403642.9 2019-05-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/994,148 Continuation US11403874B2 (en) 2019-05-15 2020-08-14 Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium

Publications (1)

Publication Number Publication Date
WO2020228384A1 true WO2020228384A1 (zh) 2020-11-19

Family

ID=67490224

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074597 WO2020228384A1 (zh) 2019-05-15 2020-02-10 虚拟头像生成方法及装置、存储介质

Country Status (7)

Country Link
US (1) US11403874B2 (zh)
JP (1) JP2021528719A (zh)
KR (1) KR102443026B1 (zh)
CN (1) CN110111246B (zh)
SG (1) SG11202008025QA (zh)
TW (1) TW202046249A (zh)
WO (1) WO2020228384A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111246B (zh) 2019-05-15 2022-02-25 北京市商汤科技开发有限公司 一种虚拟头像生成方法及装置、存储介质
CN110728256A (zh) * 2019-10-22 2020-01-24 上海商汤智能科技有限公司 基于车载数字人的交互方法及装置、存储介质
CN111857697A (zh) * 2020-05-29 2020-10-30 北京编程猫科技有限公司 一种基于认知ai的图形化编程实现方法及装置
CN111627086A (zh) * 2020-06-03 2020-09-04 上海商汤智能科技有限公司 一种头像的展示方法、装置、计算机设备及存储介质
CN112529988A (zh) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 一种头像的生成方法、装置、电子设备、介质及产品
CN112734633A (zh) * 2021-01-07 2021-04-30 京东方科技集团股份有限公司 虚拟发型的替换方法、电子设备及存储介质
CN112907708B (zh) * 2021-02-05 2023-09-19 深圳瀚维智能医疗科技有限公司 人脸卡通化方法、设备及计算机存储介质
CN113808010B (zh) * 2021-09-24 2023-08-11 深圳万兴软件有限公司 无属性偏差的卡通人像生成方法、装置、设备及介质
KR102409988B1 (ko) * 2021-11-03 2022-06-16 주식회사 클레온 딥러닝 네트워크를 이용한 얼굴 변환 방법 및 장치
US20240013464A1 (en) * 2022-07-11 2024-01-11 Samsung Electronics Co., Ltd. Multimodal disentanglement for generating virtual human avatars
CN115908119B (zh) * 2023-01-05 2023-06-06 广州佰锐网络科技有限公司 基于人工智能的人脸图像美颜处理方法及系统

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200614094A (en) * 2004-10-18 2006-05-01 Reallusion Inc System and method for processing comic character
US7783135B2 (en) * 2005-05-09 2010-08-24 Like.Com System and method for providing objectified image renderings using recognition information from images
US20070080967A1 (en) * 2005-10-11 2007-04-12 Animetrics Inc. Generation of normalized 2D imagery and ID systems via 2D to 3D lifting of multifeatured objects
KR101558202B1 (ko) * 2011-05-23 2015-10-12 한국전자통신연구원 아바타를 이용한 애니메이션 생성 장치 및 방법
US10095917B2 (en) * 2013-11-04 2018-10-09 Facebook, Inc. Systems and methods for facial representation
CN104091148B (zh) * 2014-06-16 2017-06-27 联想(北京)有限公司 一种人脸特征点定位方法和装置
CN105354869B (zh) * 2015-10-23 2018-04-20 广东小天才科技有限公司 一种将用户真实头部特征化身到虚拟头像上的方法及系统
KR102014093B1 (ko) * 2016-03-28 2019-08-26 영남대학교 산학협력단 얼굴의 특징점 검출 시스템 및 방법
US10339365B2 (en) * 2016-03-31 2019-07-02 Snap Inc. Automated avatar generation
US20180024726A1 (en) * 2016-07-21 2018-01-25 Cives Consulting AS Personified Emoji
KR20180097915A (ko) * 2017-02-24 2018-09-03 트라이큐빅스 인크. 개인 맞춤형 3차원 얼굴 모델 생성 방법 및 그 장치
CN107730573A (zh) * 2017-09-22 2018-02-23 西安交通大学 一种基于特征提取的人物肖像漫画风格化生成方法
CN109345636B (zh) 2018-07-19 2023-10-24 北京永星互动科技有限公司 获取虚拟人脸图的方法和装置
CN109271884A (zh) * 2018-08-29 2019-01-25 厦门理工学院 人脸属性识别方法、装置、终端设备和存储介质
US10650564B1 (en) * 2019-04-21 2020-05-12 XRSpace CO., LTD. Method of generating 3D facial model for an avatar and related device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106652015A (zh) * 2015-10-30 2017-05-10 深圳超多维光电子有限公司 一种虚拟人物头像生成方法及装置
CN106529402A (zh) * 2016-09-27 2017-03-22 中国科学院自动化研究所 基于多任务学习的卷积神经网络的人脸属性分析方法
CN109447895A (zh) * 2018-09-03 2019-03-08 腾讯科技(武汉)有限公司 图片生成方法和装置、存储介质及电子装置
CN110111246A (zh) * 2019-05-15 2019-08-09 北京市商汤科技开发有限公司 一种虚拟头像生成方法及装置、存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3926533A3 (en) * 2020-11-30 2022-04-27 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for changing hairstyle of human object, device and storage medium

Also Published As

Publication number Publication date
KR102443026B1 (ko) 2022-09-13
SG11202008025QA (en) 2020-12-30
TW202046249A (zh) 2020-12-16
US11403874B2 (en) 2022-08-02
JP2021528719A (ja) 2021-10-21
CN110111246A (zh) 2019-08-09
CN110111246B (zh) 2022-02-25
US20200380246A1 (en) 2020-12-03
KR20200132833A (ko) 2020-11-25

Similar Documents

Publication Publication Date Title
WO2020228384A1 (zh) 虚拟头像生成方法及装置、存储介质
US10810742B2 (en) Dynamic and static image processing method and system
TWI714225B (zh) 注視點判斷方法和裝置、電子設備和電腦儲存介質
EP3338217B1 (en) Feature detection and masking in images based on color distributions
US20190138791A1 (en) Key point positioning method, terminal, and computer storage medium
US10452896B1 (en) Technique for creating avatar from image data
TW202044202A (zh) 建立臉部模型的方法、裝置、電腦可讀儲存介質及電子設備
US20170169501A1 (en) Method and system for evaluating fitness between wearer and eyeglasses
US20220148333A1 (en) Method and system for estimating eye-related geometric parameters of a user
CN108629336B (zh) 基于人脸特征点识别的颜值计算方法
CN109344742A (zh) 特征点定位方法、装置、存储介质和计算机设备
CN108829900A (zh) 一种基于深度学习的人脸图像检索方法、装置及终端
WO2022179401A1 (zh) 图像处理方法、装置、计算机设备、存储介质和程序产品
CN110147729A (zh) 用户情绪识别方法、装置、计算机设备及存储介质
WO2022257456A1 (zh) 头发信息识别方法、装置、设备及存储介质
WO2021208767A1 (zh) 人脸轮廓修正方法、装置、设备及存储介质
US9514354B2 (en) Facial analysis by synthesis and biometric matching
CN110598097B (zh) 一种基于cnn的发型推荐系统、方法、设备及存储介质
WO2023124869A1 (zh) 用于活体检测的方法、装置、设备及存储介质
CN115392216B (zh) 一种虚拟形象生成方法、装置、电子设备及存储介质
US11830236B2 (en) Method and device for generating avatar, electronic equipment, medium and product
RU2768797C1 (ru) Способ и система для определения синтетически измененных изображений лиц на видео
KR101382172B1 (ko) 얼굴 영상의 계층적 특징 분류 시스템 및 그 방법
CN114219704A (zh) 动漫形象生成方法及装置
WO2020135286A1 (zh) 整形模拟方法、系统、可读存储介质和设备

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020558437

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20806541

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20806541

Country of ref document: EP

Kind code of ref document: A1