WO2022022043A1 - Face image generation method, apparatus, server and storage medium - Google Patents

Face image generation method, apparatus, server and storage medium

Info

Publication number
WO2022022043A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
face
expression
image
synthesized
Prior art date
Application number
PCT/CN2021/096715
Other languages
English (en)
French (fr)
Inventor
曹辰捷
徐国强
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022022043A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a method, device, server and storage medium for generating a face image.
  • GAN Generative Adversarial Networks
  • The traditional method of locally adjusting a person image through a GAN model has the disadvantage that the generated expressions are unstable, especially in in-the-wild scenes.
  • the number of expressions generated by traditional methods is very limited, and generally only a few expressions are supported, which cannot meet the user's editing needs for expressions, especially for micro-expressions.
  • The embodiments of the present application provide a method, device, server, and storage medium for generating a face image, which can not only make the generated expressions more stable, but also meet users' editing needs for expressions, especially micro-expressions.
  • In a first aspect, an embodiment of the present application provides a method for generating a face image, including:
  • acquiring a face image of a target person and an expression label corresponding to the face image;
  • performing face detection on the face image to obtain a standard face image of the face image;
  • using an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image, where the expression corresponding to the first synthesized face image is the expression indicated by the expression label;
  • performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression corresponding to the second synthesized face image is the expression indicated by the expression label;
  • generating a face image including the second synthesized face image.
  • In a second aspect, an embodiment of the present application provides a device for generating a face image, including:
  • an acquisition module for acquiring a face image of a target person and an expression label corresponding to the face image
  • a processing module for performing face detection on the face image to obtain a standard face image of the face image
  • a synthesis module configured to use an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image; the expression corresponding to the first synthesized face image is the expression indicated by the expression label;
  • the synthesis module is further configured to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; the expression corresponding to the second synthesized face image is the expression indicated by the expression label;
  • the processing module is further configured to generate a face image including the second synthesized face image.
  • In a third aspect, an embodiment of the present application provides a server, including a processor and a memory connected to each other, where the memory is used to store a computer program including program instructions, and the processor is configured to invoke the program instructions to perform the following method:
  • acquiring a face image of a target person and an expression label corresponding to the face image;
  • performing face detection on the face image to obtain a standard face image of the face image;
  • using an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image, where the expression corresponding to the first synthesized face image is the expression indicated by the expression label;
  • performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression corresponding to the second synthesized face image is the expression indicated by the expression label;
  • generating a face image including the second synthesized face image.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method:
  • acquiring a face image of a target person and an expression label corresponding to the face image;
  • performing face detection on the face image to obtain a standard face image of the face image;
  • using an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image, where the expression corresponding to the first synthesized face image is the expression indicated by the expression label;
  • performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression corresponding to the second synthesized face image is the expression indicated by the expression label;
  • generating a face image including the second synthesized face image.
  • Compared with the prior art, in which local adjustment of a face image based on a GAN yields unstable generated expressions, the embodiments of the present application make the generated expressions more stable through expression synthesis and face synthesis, and can satisfy the user's editing needs for expressions, especially micro-expressions.
  • FIG. 1A is a schematic flowchart of a method for generating a face image according to an embodiment of the present application
  • FIG. 1B is a schematic diagram of an image processing process for a face image provided by an embodiment of the application.
  • FIG. 1C is a schematic diagram of a face synthesis process provided by an embodiment of the present application.
  • FIG. 2A is a schematic flowchart of another method for generating a face image according to an embodiment of the present application
  • FIG. 2B is a schematic diagram of an image adjustment process for a face image provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an apparatus for generating a face image according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
  • the technical solution of the present application may relate to the technical field of artificial intelligence, for example, it can be applied to scenarios such as image processing to generate a face image, so that the generated expression is more stable, thereby promoting the construction of a smart city.
  • The data involved in this application, such as various images and/or labels, may be stored in a database or in a blockchain, which is not limited in this application.
  • FIG. 1A is a schematic flowchart of a method for generating a face image according to an embodiment of the present application.
  • The method can be applied to a server, and the server may be a single server or a server cluster. Specifically, the method can include the following steps:
  • S101. Acquire a face image of a target person and an expression label corresponding to the face image.
  • the expression label may be a feature vector formed by the value of each expression unit in the at least one expression unit, that is, the expression label may include the value of each expression unit in the at least one expression unit.
  • Expression units can be used to describe expressions.
  • expression units may be referred to as action units (Action Units, AU).
  • the reference to at least one may be one or more.
  • The at least one expression unit includes, but is not limited to, at least one of the following 17 expression units: inner brow raised, outer brow raised, frown, upper eyelid raised, squint, eyelid tightened, nose wrinkled, upper lip raised, mouth corner raised, cheek raised, mouth corner pulled down, chin raised, lips pressed flat, lips tightened, mouth open, jaw drooped, eyes closed.
  • the expression unit includes but is not limited to the expression units listed above.
  • the server may provide a plurality of emoticon images for the user terminal.
  • the user can use the user terminal to upload the face image of the target person and the target facial expression image among the multiple facial expression images to the server.
  • The target expression image can be any one of the multiple expression images. The server can receive the face image of the target person and the target expression image uploaded by the user through the user terminal and, according to the preset correspondence between expression images and expression labels, determine the expression label corresponding to the target expression image as the expression label corresponding to the face image.
  • the user can click an image synthesis button, and the user terminal can send an image synthesis instruction to the server in response to the click operation of the synthesis button, and the image synthesis instruction can carry the face image of the target person and the target expression image.
  • the server may acquire the face image and the target expression image of the target person carried by the image synthesis instruction when receiving the image synthesis instruction.
  • the server may provide a plurality of expression identification information for the user terminal.
  • Each emoticon identification information may be used to identify an emoticon.
  • the user can use the user terminal to upload the face image of the target person and the target expression identification information among the multiple expression identification information to the server.
  • the target expression identification information may be any expression identification information among the plurality of expression identification information.
  • The server can receive the face image of the target person and the target expression identification information uploaded by the user (which can be the name of the target expression, such as sadness, anger, or happiness), and can determine, according to the preset correspondence between expression identification information and expression labels, the expression label corresponding to the target expression identification information as the expression label corresponding to the face image.
  • In an application scenario, the user can click an image synthesis button, and the user terminal can respond to the click operation on the synthesis button by sending an image synthesis instruction to the server, where the instruction carries the face image of the target person and the target expression identification information. When receiving the image synthesis instruction, the server may obtain the face image of the target person and the target expression identification information carried by the instruction.
  • Both of the foregoing approaches realize facial expression synthesis by setting expression-related data. In addition to these two approaches, the embodiments of the present application can also achieve fine-grained editing of expressions by setting the values of the expression units.
  • the server may provide the user terminal with setting items of each expression unit in the plurality of expression units.
  • the user can set the value of each expression unit based on the setting items of each expression unit, and can use the user terminal to upload the face image of the target person and the value of each expression unit to the server.
  • the server may receive the face image of the target person uploaded by the user and the value of each expression unit, and may construct an expression label corresponding to the face image according to the value of each expression unit.
  • In an application scenario, the user can click an image synthesis button, and the user terminal can respond to the click operation on the synthesis button by sending an image synthesis instruction to the server, where the instruction carries the face image of the target person and the value of each expression unit.
  • When receiving the image synthesis instruction, the server can obtain the face image of the target person and the value of each expression unit carried by the instruction.
  • Here, depending on the actual application scenario, in addition to setting the value of every expression unit in the above manner, the user may also set the values of only some of the expression units, which is not limited in this embodiment of the present application. An illustrative sketch of packing such values into an expression label follows.
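  • For illustration only, the following Python sketch shows one way such an expression label could be assembled as a feature vector of expression-unit values. The AU names, their ordering, and the vector length are assumptions made for this example; the patent does not fix a concrete encoding.

```python
import numpy as np

# Illustrative subset of the 17 expression units (AUs) listed above; the exact
# ordering and naming used by the described system are not specified, so this
# layout is an assumption for demonstration purposes only.
AU_NAMES = ["inner_brow_raiser", "outer_brow_raiser", "brow_lowerer",
            "upper_lid_raiser", "lip_corner_puller", "jaw_drop"]

def build_expression_label(au_values: dict) -> np.ndarray:
    """Pack user-set AU values into the fixed-length feature vector used as an
    expression label; unset units default to 0 (neutral)."""
    label = np.zeros(len(AU_NAMES), dtype=np.float32)
    for name, value in au_values.items():
        label[AU_NAMES.index(name)] = value
    return label

# e.g. a slight smile with slightly raised inner brows
label = build_expression_label({"lip_corner_puller": 0.6, "inner_brow_raiser": 0.3})
```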
  • S102. Perform face detection on the face image to obtain a standard face image of the face image.
  • In order to obtain a relatively standard face image, the server may perform face detection on the face image to obtain the standard face image of the face image.
  • the face image described here is the face image of the target person.
  • the standard face image described here is the standard face image of the face image of the target person.
  • the server may call an image detection library, such as a dlib library, to perform face detection on the face image to obtain a standard face image of the face image.
  • Specifically, the server can call the image detection library to perform face detection on the face image to obtain the original face image of the face image, and then perform face alignment on the original face image to obtain the standard face image of the face image. That is, the server can crop and rectify the face image through the image detection library, thereby obtaining the standard face image corresponding to the face image.
  • the original face image refers to the face image obtained after calling the image detection library to perform face detection on the face image.
  • In one embodiment, the server performs face alignment on the original face image to obtain the standard face image of the face image in the following specific manner: the server determines the coordinates of multiple key points included in the original face image, performs a rigid transformation on the original face image based on the coordinates of the multiple key points and the coordinates of the reference key point corresponding to each key point, and obtains the transformed face image as the standard face image corresponding to the face image.
  • A rigid transformation may also be referred to as a global transformation.
  • Rigid transformations can include translation, rotation, scaling, and more.
  • Rigid transformations are only transformations of position and orientation, and do not change the shape of the face.
  • For example, when the multiple key points are the five key points shown at S1 in FIG. 1B, the effect of the above image adjustment process can be seen at S2 in FIG. 1B, as sketched below.
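  • The following is a minimal sketch of this detection-and-alignment step, assuming the dlib library mentioned above together with OpenCV for the rigid (similarity) transformation. The 5-point landmark model file, the 128x128 output size, and the reference key-point coordinates are illustrative assumptions, not values taken from the patent.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Pretrained 5-point landmark model shipped with dlib (file path assumed).
predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")

# Reference key-point coordinates in a 128x128 crop; these values are
# illustrative placeholders, not the patent's actual template.
REFERENCE_POINTS = np.float32([[38, 52], [90, 52], [64, 78], [44, 100], [84, 100]])

def standard_face(image: np.ndarray) -> np.ndarray:
    face = detector(image, 1)[0]  # face detection; assumes one face is found
    shape = predictor(image, face)
    pts = np.float32([[p.x, p.y] for p in shape.parts()])
    # Rigid (similarity) transform: translation, rotation and uniform scaling
    # only, so the face shape itself is not distorted.
    M, _ = cv2.estimateAffinePartial2D(pts, REFERENCE_POINTS)
    return cv2.warpAffine(image, M, (128, 128))  # aligned standard face image
```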
  • S103. Use the expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image.
  • In this embodiment of the present application, the server may use the standard face image and the expression label as input data of the expression generation model, and perform expression synthesis through the expression generation model to obtain the first synthesized face image.
  • the expression corresponding to the first synthesized face image may be the expression indicated by the expression label.
  • the first synthetic face image may have the facial features of the standard face image.
  • the standard face image mentioned here refers to the standard face image corresponding to the face image of the target person.
  • the expression label described here is the expression label corresponding to the face image of the target person.
  • the expression generation model can be obtained by training a generative adversarial network model.
  • the server may specifically perform expression synthesis through the generator of the expression generation model to obtain the first synthesized face image.
  • In one embodiment, when the expression generation model is obtained by training a generative adversarial network model, it can be obtained in the following manner: the server obtains a training data set that includes multiple face images, each carrying a corresponding expression label; the server performs face detection on each face image in the training data set to obtain a face image set that includes the standard face image corresponding to each face image; the server trains the generative adversarial network model using each standard face image in the face image set and the expression label carried by the face image corresponding to that standard face image, and obtains the trained generative adversarial network model as the expression generation model.
  • the multiple face images mentioned above may be divided into at least one face image set, for example, may be divided into a first face image set, a second face image set, a third face image set, and so on.
  • The first face image set may be a first high-quality face image set, such as FFHQ; the second face image set may be a second high-quality face image set, such as celebA-HQ; and the third face image set may be a facial expression image set, such as Emotion Net.
  • Alternatively, the first face image data set may be a data set formed from face images selected from the first high-quality face image set, the second face image data set may be a data set formed from face images selected from the second high-quality face image set, and the third face image data set may be a data set composed of face images selected from the facial expression image set.
  • the expression label corresponding to the face image may be obtained in the following manner: the server performs algorithm labeling on the face image to obtain the expression label corresponding to the face image.
  • Specifically, the server can call a face recognition tool library to label each face image in the training data set to obtain the first annotation data corresponding to each face image, and, according to the first annotation data corresponding to each face image, obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
  • the face recognition tool library may be an open source face multi-function tool library, such as the Open Face open source face multi-function tool library. The above process can realize automatic expression label labeling, which improves the labeling efficiency of expression labels.
  • In one embodiment, the process by which the server calls the face recognition tool library to label each face image in the training data set and obtain the first annotation data corresponding to each face image may be as follows: the server uses a graphics processing unit (Graphics Processing Unit, GPU) to obtain the coordinates of each of the multiple key points included in each face image in the training data set, inputs each face image together with the coordinates of each of its key points into the face recognition tool library, and labels each face image according to the image and its key-point coordinates through the face recognition tool library, thereby obtaining the first annotation data corresponding to each face image.
  • In one embodiment, the process by which the server obtains, according to the first annotation data corresponding to each face image, the second annotation data corresponding to each face image as the expression label of that face image may specifically be as follows: the server determines the first annotation data corresponding to each face image as the second annotation data corresponding to that face image, and uses this second annotation data as the expression label corresponding to the face image.
  • In one embodiment, this process may also be as follows: the server normalizes the first annotation data corresponding to each face image, and obtains the second annotation data corresponding to each face image as the expression label corresponding to that face image. Since the values of the expression units included in the first annotation data have a large value range, generally 0 to 5, excessively large values may produce face images with exaggerated expressions.
  • Therefore, to control the degree of exaggeration of the expression, the above normalization can be used to reduce the expression-unit values in the first annotation data so that their range is controlled within a smaller interval, such as 0 to 1, as in the sketch below.
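  • A minimal sketch of one plausible normalization, assuming simple division by the maximum raw value of 5; the text states the target interval (e.g., 0 to 1) but not the exact mapping, so this scaling is an assumption.

```python
import numpy as np

def normalize_au(first_annotation: np.ndarray, max_value: float = 5.0) -> np.ndarray:
    """Scale raw AU intensities (roughly 0-5, as noted above) into 0-1 so the
    synthesized expression is not exaggerated."""
    return np.clip(first_annotation / max_value, 0.0, 1.0)

second_annotation = normalize_au(np.array([4.2, 0.8, 2.5]))  # -> [0.84, 0.16, 0.5]
```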
  • In one embodiment, the process by which the server performs face detection on each face image in the training data set to obtain the face image set may be as follows: the server calls the image detection library to perform face detection on each face image in the training data set, obtains the original face image corresponding to each face image, performs face alignment on each original face image to obtain the standard face image corresponding to each face image, and generates a face image set including the standard face images corresponding to the respective face images.
  • The manner of performing face alignment on each original face image to obtain the corresponding standard face image can refer to the aforementioned manner of performing face alignment on the original face image of the target person's face image to obtain its standard face image, which is not repeated in this embodiment of the present application. Since generative adversarial network models are generally sensitive, rectifying face images in this way during the training phase can improve model stability.
  • The general process by which the server trains the generative adversarial network model using each standard face image in the face image set and the expression label carried by the corresponding face image, and obtains the trained model as the expression generation model, can be as follows: the server randomly selects two standard face images from the face image set, such as face_a and face_b, and obtains the expression-unit values (i.e., the AU coefficients) of face_a, such as au_a, and of face_b, such as au_b (face_a and face_b may not be the same person); the server uses the generator included in the generative adversarial network model to generate fake_b from au_b and face_a; the server trains the discriminator included in the generative adversarial network model on fake_b and face_a; the server restores fake_b to rec_a with au_a through the generator, and computes the restoration loss between rec_a and face_a, i.e., the L1 loss; the server trains the model with the L1 loss until the model converges, and takes the trained generative adversarial network model as the expression generation model.
  • In one embodiment, in addition to fake_b, a feature parameter corresponding to fake_b, such as a mask parameter, can also be obtained. The value range of the mask parameter can be 0 to 1, and the magnitude of the mask parameter can reflect the importance of a region. A sketch of one training step is given below.
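  • The following is a hedged PyTorch sketch of one such training step. The interfaces G(face, au) -> (synthesized_face, mask) and D(face) -> score are assumptions standing in for the unspecified generator and discriminator architectures; the hinge form of the adversarial loss anticipates the discriminator improvement described later.

```python
import torch.nn.functional as F

def gan_training_step(G, D, opt_g, opt_d, face_a, au_a, au_b):
    """One training step for the flow described above (assumed interfaces)."""
    # Generator transfers face_b's expression (au_b) onto face_a.
    fake_b, _mask = G(face_a, au_b)

    # Discriminator step on the real face_a and the generated fake_b.
    d_loss = F.relu(1.0 - D(face_a)).mean() + F.relu(1.0 + D(fake_b.detach())).mean()
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: adversarial term plus L1 restoration loss, where fake_b
    # is mapped back to rec_a using face_a's own AU coefficients au_a.
    rec_a, _ = G(fake_b, au_a)
    g_loss = -D(fake_b).mean() + F.l1_loss(rec_a, face_a)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```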
  • S104. Perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image.
  • In order to make the synthesized face image more realistic, the server may further perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image.
  • the expression corresponding to the second synthetic face image may be the expression indicated by the expression label.
  • the second synthetic face image may have the facial features of the standard face image.
  • In one embodiment, after performing expression synthesis with the expression generation model according to the standard face image and the expression label, the server may further obtain the characteristic parameters corresponding to the first synthesized face image.
  • the server may perform face synthesis according to the standard face image, the first synthesized face image, and the characteristic parameters corresponding to the first synthesized face image to obtain a second synthesized face image.
  • In one embodiment, if the characteristic parameter corresponding to the first synthesized face image is a mask parameter, the face synthesis process can refer to the following formula:
  • target image = image 1 × mask + image 2 × (1 − mask)   (Formula 1.1)
  • Here, the aforementioned first synthesized face image corresponds to image 1 in Formula 1.1, the aforementioned standard face image corresponds to image 2, and the aforementioned second synthesized face image corresponds to the target image in Formula 1.1.
  • Taking the face synthesis process in FIG. 1C as an example, the upper-left image of FIG. 1C is the standard face image, the lower-left image is the first synthesized face image, and the right image is the second synthesized face image.
  • The above face synthesis method uses an image attention mechanism so that the generated micro-expression and the face image are blended in a reasonable proportion, making the generated result more stable. Moreover, this method can edit facial expressions in a natural environment without having to consider the background and other factors. A one-function sketch of the blend follows.
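  • A sketch of Formula 1.1, assuming the images and the mask are arrays scaled to [0, 1] with the mask shaped for broadcasting; these conventions are assumptions for the example.

```python
import numpy as np

def blend_faces(first_synth: np.ndarray, standard: np.ndarray,
                mask: np.ndarray) -> np.ndarray:
    """Formula 1.1: target = image1 * mask + image2 * (1 - mask).
    Mask values in [0, 1] act as per-pixel attention weights: regions the
    generator deems important come from the synthesized expression, the rest
    is kept from the standard face. Shapes (H, W, 3) for the images and
    (H, W, 1) for the mask are assumed so that broadcasting works."""
    return first_synth * mask + standard * (1.0 - mask)
```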
  • S105. Generate a face image including the second synthesized face image.
  • In this embodiment of the present application, the server may generate the face image including the second synthesized face image according to the face image of the target person.
  • In simple terms, this process is equivalent to the server replacing the original face image within the target person's face image with the second synthesized face image, so as to obtain a face image including the second synthesized face image; that is, the server restores the second synthesized face image to the position where the original face image of the target person's face image was located.
  • In one embodiment, the server may output the face image including the second synthesized face image, for example, by sending it to the user terminal for the user to view. What the user sees may then be the face image of the target person after the expression change.
  • In one embodiment, during the training of the generative adversarial network model, at least one of the following improvements may be added:
  • 1. Introduce spectral normalization (spectral_norm) to enforce the model's Lipschitz constraint (L constraint for short). Specifically, the parameter w of the discriminator network can be replaced by w / ||w||_2, so that the generative adversarial network model satisfies the L constraint, thereby improving the generalization ability and stability of the model. Here, ||w||_2 is the spectral norm of w. The loss function used by the discriminator can be hinge loss, which can be used in the discriminator's binary classification process. A sketch follows.
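  • A minimal PyTorch sketch of this improvement; the convolution shapes are placeholders, while nn.utils.spectral_norm performs exactly the w / ||w||_2 reparameterization described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Wrapping a discriminator layer with spectral normalization replaces its
# weight w by w / ||w||_2 at every forward pass, enforcing the L constraint.
disc_layer = nn.utils.spectral_norm(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1))

def d_hinge_loss(real_scores: torch.Tensor,
                 fake_scores: torch.Tensor) -> torch.Tensor:
    """Hinge loss for the discriminator's real/fake binary decision."""
    return F.relu(1.0 - real_scores).mean() + F.relu(1.0 + fake_scores).mean()
```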
  • 2. Add residual networks to the generator. Introducing residual networks allows the generated face image to better restore facial details. The generator can use the residual network at each convolutional layer to convolve the input features and obtain the output features; that is, the residual networks can be used to perform convolution operations on the input features to obtain the output features. A sketch of such a block follows.
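  • A sketch of one plausible residual block for the generator; the layer sizes, normalization choice, and activation are assumptions, since the text only states that residual networks with convolutions are used.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generator building block: two convolutions plus a skip connection, so
    fine facial details can pass through unchanged and be better restored."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # skip connection preserves input details
```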
  • 3. Referring to styleGAN, after the convolution operation of each convolutional layer, an Adaptive Instance Normalization (AdaIN) module and uniform random noise are introduced to perform AdaIN processing on the output features obtained after the convolution, so as to improve the quality of the generated image. The original equation image is unavailable here; the standard AdaIN formulation, which matches the surrounding description, is:
  • AdaIN(x, y) = σ(y) × (x − μ(x)) / σ(x) + μ(y)
  • where x represents the input feature, y represents the values of the expression units (i.e., the AU coefficients), and μ(·) and σ(·) denote the per-channel mean and standard deviation. A sketch of this step follows.
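  • A sketch of the AdaIN step under the standard formulation above; how the AU coefficients y are mapped to the per-channel statistics (here assumed to arrive as y_mean and y_std, e.g., from a small affine layer) and the noise strength are assumptions.

```python
import torch

def adain(x: torch.Tensor, y_mean: torch.Tensor, y_std: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Standard AdaIN: normalize each feature map of x, then re-scale and
    re-shift it with statistics derived from the AU coefficients.
    x: (N, C, H, W); y_mean, y_std: (N, C, 1, 1)."""
    n, c = x.shape[:2]
    flat = x.view(n, c, -1)
    mu = flat.mean(-1).view(n, c, 1, 1)
    sigma = flat.std(-1).view(n, c, 1, 1) + eps
    return y_std * (x - mu) / sigma + y_mean

def add_uniform_noise(x: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
    """Uniform random noise in [-strength, strength], added to the convolution
    output before AdaIN as described above (strength is an assumed knob)."""
    return x + strength * (2.0 * torch.rand_like(x) - 1.0)
```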
  • 4. Introduce exponential decay into the generator, with a decay coefficient of 0.9995. That is, at each iteration of the generator, its parameters are updated as: old parameter × 0.9995 + new parameter × 0.0005. Introducing exponential decay makes the generator's updates slower and more stable, as in the sketch below.
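  • A minimal sketch of this exponential-decay (exponential moving average) update, keeping a slow copy of the generator weights as described; maintaining a separate ema_model is an assumed implementation detail.

```python
import torch

@torch.no_grad()
def ema_update(ema_model: torch.nn.Module, model: torch.nn.Module,
               decay: float = 0.9995) -> None:
    """old parameter * 0.9995 + new parameter * 0.0005, applied after every
    generator iteration so the tracked weights change slowly and stably."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```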
  • It can be seen that, in the embodiment shown in FIG. 1A, the server can perform image processing on the face image of the target person to obtain the standard face image of the face image, and use the expression generation model to perform expression synthesis according to the standard face image and the expression label corresponding to the face image to obtain the first synthesized face image, so that face synthesis is performed according to the standard face image and the first synthesized face image to obtain the second synthesized face image, and a face image including the second synthesized face image is generated. This process can not only make the generated expressions more stable and the finally generated face image more stable, but also meet the user's editing needs for expressions, especially micro-expressions.
  • FIG. 2A is a schematic flowchart of another method for generating a face image according to an embodiment of the present application.
  • The method can be applied to a server, the server may be a single server or a server cluster, and the method can include the following steps:
  • For steps S201-S205, reference may be made to steps S101-S105; they are not repeated in this embodiment of the present application.
  • In steps S206-S207, the server may process the face image including the second synthesized face image through an Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model and output a face image with a higher resolution, thereby achieving the purpose of optimizing the face image.
  • This process uses super-resolution technology to perform high-definition enlargement of the generated face image, improving the resolution of the generated expression without affecting the training of the generative adversarial network model.
  • In practice, the above process can increase the resolution of the face image from 128×128 to 512×512 while effectively preserving the clarity of the face.
  • the server may specifically use the face image including the second synthetic face image as the input data of the ESRGAN model, and adjust the face image including the second synthetic face image by using the ESRGAN model to obtain the adjusted face image.
  • the ESRGAN model can be trained by artificially perturbing some high-quality face images.
  • the left image of FIG. 2B shows a human face image including the second synthetic face image
  • The right image of FIG. 2B shows the adjusted face image including the second synthesized face image. That is, the server can use the face image shown in the left image of FIG. 2B as the input data of the ESRGAN model, and adjust it through the ESRGAN model to obtain the face image shown in the right image of FIG. 2B. A hedged sketch of this upscaling step follows.
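  • In this sketch, esrgan_generator stands in for a pretrained 4x ESRGAN generator whose loading code is omitted; the patent does not specify a particular implementation, so this interface is an assumption.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

@torch.no_grad()
def upscale_face(esrgan_generator: torch.nn.Module,
                 image: Image.Image) -> Image.Image:
    """Pass a generated 128x128 face through an (assumed) ESRGAN generator to
    obtain the 512x512 high-definition enlargement described above."""
    x = TF.to_tensor(image).unsqueeze(0)     # (1, 3, 128, 128), values in [0, 1]
    y = esrgan_generator(x).clamp(0.0, 1.0)  # (1, 3, 512, 512) after 4x upscaling
    return TF.to_pil_image(y.squeeze(0))
```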
  • the server may generate first index information of the face image including the second composite face image, and write the first index information into the blockchain.
  • Writing the first index information into the blockchain can facilitate the indexing of the corresponding face image, and can also effectively prevent the corresponding face image from being used for illegal purposes.
  • the first index information refers to the index information of the face image including the second synthesized face image.
  • the first index information may be, for example, a hash value obtained by the server performing a hash calculation on the face image or signature information obtained by the server performing signature processing on the face image.
  • In one embodiment, the server may generate second index information for the adjusted face image including the second synthesized face image, and write the second index information into the blockchain.
  • the second index information refers to the adjusted index information of the face image including the second synthesized face image.
  • the second index information may be, for example, a hash value obtained by the server performing a hash calculation on the face image or signature information obtained by the server performing signature processing on the face image.
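  • A minimal sketch of computing such index information as a hash digest; SHA-256 and the file name are assumptions, as the text only says "a hash value obtained by performing a hash calculation".

```python
import hashlib

def face_image_index(image_bytes: bytes) -> str:
    """First/second index information as described above: a hash digest of the
    generated face image, suitable for writing to a blockchain ledger."""
    return hashlib.sha256(image_bytes).hexdigest()

with open("generated_face.png", "rb") as f:  # file name is illustrative
    index_info = face_image_index(f.read())
```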
  • Likewise, this application can also be used in the construction of smart cities. With the development of deep learning technology, the face images generated by deep learning are becoming more and more realistic, which undoubtedly brings challenges to the face recognition process. Therefore, in order to prevent synthesized face images or related videos from being passed off as genuine during face recognition, face images can be collected in the face recognition process and compared with the multiple synthesized face images stored by the server, and whether to issue a warning can be decided according to the comparison result.
  • It can be seen that, in the embodiment shown in FIG. 2A, after obtaining the face image including the second synthesized face image, the server can also adjust it through the enhanced super-resolution generative adversarial network model, so as to achieve the purpose of optimizing the face image and make the image quality of the output face image higher.
  • FIG. 3 is a schematic structural diagram of a face image generation apparatus according to an embodiment of the present application.
  • the apparatus can be applied to a server.
  • the apparatus may include:
  • the obtaining module 301 is configured to obtain a face image of a target person and an expression label corresponding to the face image.
  • the processing module 302 is configured to perform face detection on the face image to obtain a standard face image of the face image.
  • the synthesis module 303 is configured to use an expression generation model to perform expression synthesis according to the standard facial image and the expression label to obtain a first synthesized facial image; the expression corresponding to the first synthesized facial image is the expression label Indicated expression.
  • the synthesis module 303 is further configured to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; the expression corresponding to the second synthesized face image is the The emoji indicated by the emoji tag.
  • the processing module 302 is further configured to generate a face image including the second synthesized face image.
  • In an optional implementation, the processing module 302 is further configured to, after the face image including the second synthesized face image is generated, perform image adjustment on the face image including the second synthesized face image by using an enhanced super-resolution generative adversarial network model, obtain the adjusted face image including the second synthesized face image, and output the adjusted face image including the second synthesized face image.
  • In an optional implementation, the processing module 302 performs face detection on the face image to obtain the standard face image of the face image specifically by calling an image detection library to perform face detection on the face image to obtain an original face image of the face image, and performing face alignment on the original face image to obtain the standard face image of the face image.
  • In an optional implementation, the processing module 302 is further configured to, before the first synthesized face image is obtained by using the expression generation model to perform expression synthesis according to the standard face image and the expression label, obtain a training data set, where the training data set includes multiple face images and each face image carries a corresponding expression label; perform face detection on each face image in the training data set to obtain a face image set that includes the standard face image corresponding to each face image; and train the generative adversarial network model by using each standard face image in the face image set and the expression label carried by the face image corresponding to that standard face image, obtaining the trained generative adversarial network model as the expression generation model.
  • the processing module 302 is further configured to call a face recognition tool library to label each face image in the training data set, and obtain the first label corresponding to each face image data; according to the first annotation data corresponding to each face image, obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
  • In an optional implementation, the processing module 302 obtains, according to the first annotation data corresponding to each face image, the second annotation data corresponding to each face image as the expression label corresponding to each face image, specifically by normalizing the first annotation data corresponding to each face image to obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
  • the synthesis module 303 is further configured to obtain characteristic parameters corresponding to the first synthesized face image after the expression synthesis is performed according to the standard face image and the expression label by using an expression generation model .
  • In an optional implementation, the synthesis module 303 performs face synthesis according to the standard face image and the first synthesized face image to obtain the second synthesized face image, specifically by performing face synthesis according to the standard face image, the first synthesized face image, and the characteristic parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
  • It can be seen that the face image generation apparatus can perform image processing on the face image of the target person to obtain the standard face image of the face image, use the expression generation model to perform expression synthesis according to the standard face image and the expression label corresponding to the face image to obtain the first synthesized face image, perform face synthesis according to the standard face image and the first synthesized face image to obtain the second synthesized face image, and generate a face image including the second synthesized face image. This process can not only make the generated expressions and the finally generated face image more stable, but also meet the user's editing needs for expressions, especially micro-expressions.
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
  • the server includes a processor and memory.
  • the server may further include an input device and/or an output device.
  • the server described in this embodiment may include: one or more processors 100 , one or more input devices 200 , one or more output devices 300 and a memory 400 .
  • the processor 100, the input device 200, the output device 300, and the memory 400 may be connected through a bus.
  • the input device 200 and the output device 300 are optional devices in this embodiment of the present application.
  • the input device 200 and the output device 300 may be standard wired or wireless communication interfaces.
  • The processor 100 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 400 may be a high-speed RAM memory, or a non-volatile memory, such as a disk memory.
  • The memory 400 is used to store a set of program codes, and the input device 200, the output device 300, and the processor 100 can call the program codes stored in the memory 400. Specifically:
  • the processor 100 is configured to: obtain a face image of a target person and an expression label corresponding to the face image; perform face detection on the face image to obtain a standard face image of the face image; use an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image, where the expression corresponding to the first synthesized face image is the expression indicated by the expression label; perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression corresponding to the second synthesized face image is the expression indicated by the expression label; and generate a face image including the second synthesized face image.
  • In one embodiment, the processor 100 is further configured to, after the face image including the second synthesized face image is generated, perform image adjustment on the face image including the second synthesized face image by using an enhanced super-resolution generative adversarial network model to obtain the adjusted face image including the second synthesized face image, and output the adjusted face image including the second synthesized face image through the output device 300.
  • In one embodiment, the processor 100 performs face detection on the face image to obtain the standard face image of the face image specifically by calling an image detection library to perform face detection on the face image to obtain the original face image of the face image, and performing face alignment on the original face image to obtain the standard face image of the face image.
  • In one embodiment, the processor 100 is further configured to, before the first synthesized face image is obtained by using the expression generation model to perform expression synthesis according to the standard face image and the expression label, obtain a training data set, where the training data set includes multiple face images and each face image carries a corresponding expression label; perform face detection on each face image in the training data set to obtain a face image set that includes the standard face image corresponding to each face image; and train the generative adversarial network model by using each standard face image in the face image set and the expression label carried by the face image corresponding to that standard face image, obtaining the trained generative adversarial network model as the expression generation model.
  • In one embodiment, the processor 100 is further configured to call a face recognition tool library to label each face image in the training data set to obtain the first annotation data corresponding to each face image, and, according to the first annotation data corresponding to each face image, obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
  • the processor 100 obtains the second annotation data corresponding to the face images according to the first annotation data corresponding to the face images as the expression labels corresponding to the face images , specifically performing normalization processing on the first labeling data corresponding to each of the face images to obtain the second labeling data corresponding to each of the face images as the expression labels corresponding to each of the face images.
  • the processor 100 is further configured to obtain feature parameters corresponding to the first synthesized face image after performing expression synthesis according to the standard facial image and the expression label by using an expression generation model.
  • the processor 100 performs face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, specifically according to the standard face image, the The first synthesized face image and the characteristic parameters corresponding to the first synthesized face image are subjected to face synthesis to obtain a second synthesized face image.
  • The processor 100, the input device 200, and the output device 300 described in the embodiments of the present application may perform the implementations described in the embodiments of FIG. 1A and FIG. 2A, and may also perform the implementations described in this embodiment of the present application; details are not repeated here.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the methods in the foregoing embodiments can be implemented, or the functions of each module of the apparatus in the foregoing embodiments can be realized.
  • The computer program, when executed by a processor, can implement the following method:
  • acquiring a face image of a target person and an expression label corresponding to the face image;
  • performing face detection on the face image to obtain a standard face image of the face image;
  • using an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image, where the expression corresponding to the first synthesized face image is the expression indicated by the expression label;
  • performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression corresponding to the second synthesized face image is the expression indicated by the expression label;
  • generating a face image including the second synthesized face image.
  • The storage medium involved in the present application, such as a computer-readable storage medium, may be non-volatile or volatile.
  • Each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the computer storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), and the like.
  • The computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function, and the like, and the storage data area may store data created according to use, and the like.
  • Blockchain, essentially a decentralized database, is a chain of data blocks associated with one another through cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application disclose a face image generation method, apparatus, server, and storage medium. The method includes: acquiring a face image of a target person and an expression label corresponding to the face image; performing face detection on the face image to obtain a standard face image of the face image; using an expression generation model to perform expression synthesis according to the standard face image and the expression label to obtain a first synthesized face image; performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; and generating a face image including the second synthesized face image. The present application can make the generated expressions more stable and can meet the user's editing needs for expressions, especially micro-expressions. The present application also relates to blockchain technology: index information of the face image including the second synthesized face image can be written into a blockchain. The present application further relates to image processing technology in the field of artificial intelligence.

Description

Face image generation method, apparatus, server and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 27, 2020, with application number 202010731169.X and invention title "Face image generation method, apparatus, server and storage medium", the entire contents of which are incorporated herein by reference.
技术领域
本申请涉及人工智能技术领域,尤其涉及一种人脸图像生成方法、装置、服务器及存储介质。
背景技术
深度学习作为一个近年来热门的技术研究领域,相关的应用也层出不穷。例如,生成对抗网络(Generative Adversarial Networks,GAN)的应用一般有生成不同风格图像,对图像进行补全,生成高清数据以用于对其它机器学习模型的数据增强,等等。
GAN广泛而有趣的应用以及较高的技术门槛,使得GAN成为了各大科技公司和学府研究的焦点。这里,我们关注GAN对表情生成方面的应用。发明人意识到,传统的通过GAN模型对人物图像进行局部调整的方法,有着生成表情不稳定,在野外场景下尤其不稳定的缺点。此外,传统方法的表情生成数量是十分有限的,一般只支持少数几种表情,无法满足用户对表情,尤其对微表情的编辑需求。
发明内容
本申请实施例提供了一种人脸图像生成方法、装置、服务器及存储介质,不仅可以使得生成表情更加稳定,还可以满足用户对表情,尤其对微表情的编辑需求。
第一方面,本申请实施例提供了一种人脸图像生成方法,包括:
获取目标人物的人脸图像以及所述人脸图像对应的表情标签;
对所述人脸图像进行人脸检测,得到所述人脸图像的标准脸部图像;
利用表情生成模型根据所述标准脸部图像以及所述表情标签进行表情合成,得到第一合成脸部图像;所述第一合成脸部图像对应的表情为所述表情标签指示的表情;
根据所述标准脸部图像以及所述第一合成脸部图像进行脸部合成,得到第二合成脸部图像;所述第二合成脸部图像对应的表情为所述表情标签指示的表情;
生成包括所述第二合成脸部图像的人脸图像。
第二方面,本申请实施例提供了一种人脸图像生成装置,包括:
获取模块,用于获取目标人物的人脸图像以及所述人脸图像对应的表情标签;
处理模块,用于对所述人脸图像进行人脸检测,得到所述人脸图像的标准脸部图像;
合成模块,用于利用表情生成模型根据所述标准脸部图像以及所述表情标签进行表情合成,得到第一合成脸部图像;所述第一合成脸部图像对应的表情为所述表情标签指示的表情;
所述合成模块,还用于根据所述标准脸部图像以及所述第一合成脸部图像进行脸部合成,得到第二合成脸部图像;所述第二合成脸部图像对应的表情为所述表情标签指示的表情;
所述处理模块,还用于生成包括所述第二合成脸部图像的人脸图像。
第三方面,本申请实施例提供了一种服务器,包括处理器和存储器,所述处理器和所述存储器相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行以下方法:
获取目标人物的人脸图像以及所述人脸图像对应的表情标签;
对所述人脸图像进行人脸检测,得到所述人脸图像的标准脸部图像;
利用表情生成模型根据所述标准脸部图像以及所述表情标签进行表情合成,得到第一合成脸部图像;所述第一合成脸部图像对应的表情为所述表情标签指示的表情;
根据所述标准脸部图像以及所述第一合成脸部图像进行脸部合成,得到第二合成脸部图像;所述第二合成脸部图像对应的表情为所述表情标签指示的表情;
生成包括所述第二合成脸部图像的人脸图像。
第四方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行以实现以下方法:
获取目标人物的人脸图像以及所述人脸图像对应的表情标签;
对所述人脸图像进行人脸检测,得到所述人脸图像的标准脸部图像;
利用表情生成模型根据所述标准脸部图像以及所述表情标签进行表情合成,得到第一合成脸部图像;所述第一合成脸部图像对应的表情为所述表情标签指示的表情;
根据所述标准脸部图像以及所述第一合成脸部图像进行脸部合成,得到第二合成脸部图像;所述第二合成脸部图像对应的表情为所述表情标签指示的表情;
生成包括所述第二合成脸部图像的人脸图像。
本申请实施例相较于现有技术基于GAN局部调整人脸图像导致生成表情不稳定的过程,本申请通过表情合成以及脸部合成能够使得生成表情更加稳定,并且能够满足用户对表情,尤其是微表情的编辑需求。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1A为本申请实施例提供的一种人脸图像生成方法的流程示意图;
图1B为本申请实施例提供的一种对人脸图像的图像处理过程的示意图;
图1C为本申请实施例提供的一种脸部合成过程的示意图;
图2A为本申请实施例提供的另一种人脸图像生成方法的流程示意图;
图2B为本申请实施例提供的一种对人脸图像的图像调整过程的示意图;
图3为本申请实施例提供的一种人脸图像生成装置的结构示意图;
图4为本申请实施例提供的一种服务器的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。
本申请的技术方案可涉及人工智能技术领域,如可应用于图像处理等场景中,以生成人脸图像,使得生成表情更加稳定,从而推动智慧城市的建设。可选的,本申请涉及的数据如各种图像和/或标签等可存储于数据库中,或者可以存储于区块链中,本申请不做限定。
请参阅图1A,为本申请实施例提供的一种人脸图像生成方法的流程示意图。该方法可以应用于服务器,该服务器可以一个服务器或服务器集群,具体地,该方法可以包括以下步骤:
S101、获取目标人物的人脸图像以及所述人脸图像对应的表情标签。
其中,表情标签可以为由至少一个表情单元中每个表情单元的值构成的特征向量,即该表情标签可以包括至少一个表情单元中每个表情单元的值。表情单元能够用于描述表情。在一个实施例中,表情单元可以称为动作单元(Action Units,AU)。所指的至少一个可以为一个或多个。其中,至少一个表情单元包括但不限于以下17种表情单元中的至少一个:内眉上扬、外眉上扬、皱眉、眼皮上扬、眯眼、眼帘锁紧、皱鼻、上唇上提、嘴角上扬、脸颊肌、嘴角下拉、下巴上扬、唇部平压、唇部锁紧、张嘴、下颚下垂、闭眼。所述的表情单元包括但不限于上述列举的表情单元。
在一个实施例中,服务器可以为用户终端提供多个表情图像。用户可以使用用户终端 上传目标人物的人脸图像以及多个表情图像中的目标表情图像至服务器。目标表情图像可以为多个表情图像中的任一表情图像,服务器可以接收用户使用用户终端上传的目标人物的人脸图像以及目标表情图像,并根据预设的表情图像与表情标签的对应关系,确定该目标表情图像对应的表情标签作为该人脸图像对应的表情标签。在一个应用场景中,用户可以点击图像合成按钮,用户终端可以响应于对该合成按钮的点击操作,发送图像合成指令至服务器,该图像合成指令可以携带该目标人物的人脸图像以及目标表情图像,服务器可以在接收到该图像合成指令时,获取该图像合成指令携带的该目标人物的人脸图像以及目标表情图像。
在一个实施例中,服务器可以为用户终端提供多个表情标识信息。每个表情标识信息可以用于标识一种表情。用户可以使用用户终端上传目标人物的人脸图像以及多个表情标识信息中的目标表情标识信息至服务器。其中,目标表情标识信息可以为该多个表情标识信息中的任一表情标识信息。服务器可以接收用户上传的目标人物的人脸图像以及目标表情标识信息(可以为目标表情名称,如悲伤、愤怒、高兴),并可以根据预设的表情图像与表情标识信息的对应关系,确定该目标表情标识信息对应的表情标签作为该人脸图像对应的表情标签。在一个应用场景中,用户可以点击图像合成按钮,用户终端可以响应于对该合成按钮的点击操作,发送图像合成指令至服务器,该图像合成指令携带该目标人物的人脸图像以及目标表情标识信息,服务器可以在接收到该图像合成指令时,获取该图像合成指令携带的该目标人物的人脸图像以及目标表情标识信息。
前述两种方式均可以通过设置表情相关数据来实现人脸表情合成,本申请实施例除了采用这两种方式之外,还可以通过设置表情单元的值达到对表情的精细化编辑的过程。
在一个实施例中,服务器可以为用户终端提供多个表情单元中每个表情单元的设置项。用户可以基于每个表情单元的设置项设置每个表情单元的值,并可以使用用户终端上传目标人物的人脸图像以及每个表情单元的值至服务器。服务器可以接收用户上传的目标人物的人脸图像以及每个表情单元的值,并可以根据每个表情单元的值构建该人脸图像对应的表情标签。在一个应用场景中,用户可以点击图像合成按钮,用户终端可以响应于对该合成按钮的点击操作,发送图像合成指令至服务器,该图像合成指令携带该目标人物的人脸图像以及每个表情单元的值,服务器可以在接收到该图像合成指令时,获取该图像合成指令携带的该目标人物的人脸图像以及每个表情单元的值。此处,根据实际的应用场景的不同,用户除了可以采用上述方式设置每个表情单元的值之外,也可以设置多个表情单元中部分表情单元的值,本申请实施例对其不做限制。
S102、对所述人脸图像进行人脸检测,得到所述人脸图像的标准脸部图像。
为了得到较为标准的脸部图像,服务器可以对人脸图像进行人脸检测,得到该人脸图像的标准脸部图像。其中,此处所述的人脸图像为目标人物的人脸图像。此处所述的标准脸部图像为目标人物的人脸图像的标准脸部图像。
在一个实施例中,服务器可以调用图像检测库,如dlib库来对该人脸图像进行人脸检测,得到该人脸图像的标准脸部图像。
在一个实施例中,服务器具体可以调用图像检测库对该人脸图像进行人脸检测,得到该人脸图像的原始脸部图像,并对该原始脸部图像进行人脸对齐,得到该人脸图像的标准脸部图像。即,服务器可以通过图像检测库来对该人脸图像进行截取和矫正,从而得到该人脸图像对应的标准脸部图像。其中,所述的原始脸部图像指在调用图像检测库对该人脸图像进行人脸检测后得到的脸部图像。
在一个实施例中,服务器对该原始脸部图像进行人脸对齐,得到该人脸图像的标准脸部图像的具体方式如下:服务器确定该原始脸部图像包括的多个关键点的坐标,并基于多个关键点的坐标以及多个关键点中每个关键点对应的基准关键点的坐标对该原始脸部图像 进行刚性变换,得到变换后的脸部图像作为该人脸图像对应的标准脸部图像。其中,刚性变换可以称为全局变换。刚性变换可以包括平移、旋转、缩放,等等。刚性变换只是位置和朝向的变换,不会使得脸部形状发生改变。
例如,请参阅图1B,当多个关键点为图1B中S1所示的5个关键点时,采用上述图像调整过程的效果图可以参见图1B的S2。
S103、利用表情生成模型根据所述标准脸部图像以及所述表情标签进行表情合成,得到第一合成脸部图像。
本申请实施例中,服务器可以将该标准脸部图像以及该表情标签作为表情生成模型的输入数据,通过该表情生成模型来进行表情合成,得到第一合成脸部图像。该第一合成脸部图像对应的表情可以为该表情标签指示的表情。并且,该第一合成脸部图像可以具有该标准脸部图像的脸部特征。其中,此处所述的标准脸部图像是指目标人物的人脸图像对应的标准脸部图像。此处所述的表情标签为目标人物的人脸图像对应的表情标签。
在一个实施例中,所述的表情生成模型可以通过对生成对抗网络模型训练得到。当所述的表情生成模型是通过对生成对抗网络模型训练得到时,服务器具体可以通过表情生成模型的生成器来进行表情合成,得到第一合成脸部图像。
在一个实施例中,当表情生成模型通过对生成对抗网络模型训练得到时,该表情生成模型具体可以采用如下方式得到:服务器获取训练数据集,该训练数据集包括多张人脸图像,该人脸图像携带对应的表情标签;服务器对该训练数据集中的各人脸图像进行人脸检测,得到脸部图像集合,该脸部图像集合包括该各人脸图像各自对应的标准脸部图像;服务器利用该脸部图像集合中的各标准脸部图像以及该标准脸部图像对应的人脸图像携带的表情标签对生成对抗网络模型进行训练,得到训练后的生成对抗网络模型作为表情生成模型。
其中,上述提及的多张人脸图像可以分为至少一个人脸图像集,例如,可以分为第一人脸图像集、第二人脸图像集和第三人脸图像集,等等。其中,该第一人脸图像集,如可以为第一高质量人脸图像集合,如FFHQ,该第二人脸图像集,如可以为第二高质量人脸图像集合,如celebA-HQ,第三人脸图像集,如可以为人脸表情图像集合,如Emotion Net。或,该第一人脸图像数据集可以为根据从第一高质量人脸图像集合中筛选出的人脸图像构成的数据集。该第二人脸图像数据集可以为根据从第二高质量人脸图像集合中筛选出的人脸图像构成的数据集。第三人脸图像数据集可以为根据从人脸表情图像集筛选出的人脸图像构成的数据集。
在一个实施例中,人脸图像对应的表情标签可以通过如下方式得到:服务器对人脸图像进行算法标注,得到人脸图像对应的表情标签。具体地,服务器可以调用人脸识别工具库对该训练数据集中的各人脸图像进行标注,得到该各人脸图像各自对应的第一标注数据,并根据该各人脸图像各自对应的第一标注数据,获得该各人脸图像各自对应的第二标注数据作为该各人脸图像各自对应的表情标签。其中,该人脸识别工具库可以为开源人脸多功能工具库,如Open Face开源人脸多功能工具库。上述过程能够实现自动化的表情标签标注,提升了对表情标签的标注效率。
在一个实施例中,服务器调用人脸识别工具库对该训练数据集中的各人脸图像进行标注,得到该各人脸图像各自对应的第一标注数据的过程可以为服务器通过图形处理器(Graphics Processing Unit,GPU)获取训练数据集中各人脸图像包括的多个关键点中每个关键点的坐标,并将各人脸图像以及各人脸图像包括的多个关键点中每个关键点的坐标输入人脸识别工具库中,通过人脸识别工具库根据各人脸图像以及各人脸图像包括的多个关键点中每个关键点的坐标进行标注,得到各人脸图像各自对应的第一标注数据。
在一个实施例中,服务器根据该各人脸图像各自对应的第一标注数据,获得该各人脸 图像各自对应的第二标注数据作为该各人脸图像各自对应的表情标签的过程具体为服务器将该各人脸图像各自对应的第一标注数据确定为该各人脸图像各自对应的第二标注数据,并将该各人脸图像各自对应的第二标注数据作为该各人脸图像各自对应的表情标签。
在一个实施例中,服务器根据该各人脸图像各自对应的第一标注数据,获得该各人脸图像各自对应的第二标注数据作为该各人脸图像各自对应的表情标签的过程还可以为服务器对该各人脸图像各自对应的第一标注数据进行归一化处理,得到该各人脸图像各自对应的第二标注数据作为该各人脸图像各自对应的表情标签。由于各人脸图像各自对应的第一标注数据包括的表情单元的值的取值范围较大,一般为0~5,过大的取值可能会得到表情较为夸张的人脸图像,因此为了控制表情的夸张程度,可以采用上述归一化处理的方式以缩小第一标注数据中表情单元的值,使其取值范围控制在更小的区间内,如0~1。
在一个实施例中,服务器对该训练数据集中的各人脸图像进行人脸检测,得到脸部图像集合的过程可以为服务器调用图像检测库对该训练数据集中各人脸图像进行人脸检测,得到各人脸图像各自对应的原始脸部图像,并对各人脸图像各自对应的原始脸部图像进行人脸对齐,得到各人脸图像各自对应的标准脸部图像,生成包括各人脸图像各自对应的标准脸部图像的脸部图像集合。其中,对各人脸图像各自对应的原始脸部图像进行人脸对齐,得到各人脸图像各自对应的标准脸部图像的方式可以参见前述提及的对该人脸图像(目标人物的人脸图像)的原始脸部图像进行人脸对齐,得到该人脸图像的标准脸部图像的方式,本申请实施例在此不做赘述。由于生成对抗网络模型一般都比较敏感,因此在训练阶段通过此种方式矫正人脸图像可以提高模型稳定度。
在一个实施例中,服务器利用该脸部图像集合中的各标准脸部图像以及该标准脸部图像对应的人脸图像携带的表情标签对生成对抗网络模型进行训练,得到训练后的生成对抗网络模型作为表情生成模型大致流程可以如下:服务器每次从脸部图像集合随机选两张标准脸部图像,如face_a和face_b,并获取face_a的表情单元的值,即AU系数,如au_a,并获取face_b的表情单元的值,即AU系数,如au_b(face_a和face_b可以不为同一人);服务器利用生成对抗网络模型包括的生成器根据au_b和face_a生成fake_b;服务器通过生成对抗网络模型包括的判别器根据fake_b和face_a进行训练;服务器通过生成器以au_a还原fake_b为rec_a,并计算rec_a和face_a的还原损失,即L1-loss;服务器利用L1-loss进行模型训练,直到模型收敛,得到训练后的生成对抗网络模型作为表情生成模型。
在一个实施例中,除了可以得到fake_b,还可以得到fake_b对应的特征参数,如mask参数。mask参数的取值范围可以为0~1。mask参数的大小能够反映区域的重要程度。
S104、根据所述标准脸部图像以及所述第一合成脸部图像进行脸部合成,得到第二合成脸部图像。
为了使得合成的脸部图像更加逼真,服务器可以进一步根据该标准脸部图像以及该第一合成脸部图像进行脸部合成,得到第二合成脸部图像。其中,该第二合成脸部图像对应的表情可以为该表情标签指示的表情。并且,该第二合成脸部图像可以具有该标准脸部图像的脸部特征。
在一个实施例中,服务器在利用表情生成模型根据该标准脸部图像以及该表情标签进行表情合成之后,还可以得到第一合成脸部图像对应的特征参数。服务器可以根据该标准脸部图像、该第一合成脸部图像以及该第一合成脸部图像对应的特征参数进行脸部合成,得到第二合成脸部图像。
在一个实施例中,若第一合成脸部图像对应的特征参数为mask参数,那么脸部合成的过程可以为参见如下公式:
目标图=图1*mask+图2*(1-mask)公式1.1;
此处,前述的第一合成脸部图像即可为公式1.1中的图1,前述的标准脸部图像即可以 为公式1.1中的图2,前述的第二合成脸部图像即可以为公式1.1中的目标图。
Taking the face synthesis diagram of FIG. 1C as an example, the top-left image in FIG. 1C is the standard face image, the bottom-left image is the first synthesized face image, and the right image is the second synthesized face image.
Through this image attention mechanism, the face synthesis method blends the generated micro-expression and the face image in a reasonable proportion, making the generated result more stable. Moreover, the method can edit facial expressions in natural environments without having to account for factors such as the background.
S105. Generate a face image that includes the second synthesized face image.
In this embodiment of the application, the server may generate, from the target person's face image, a face image that includes the second synthesized face image. Put simply, this amounts to the server replacing the original face region of the target person's face image with the second synthesized face image; that is, the server restores the second synthesized face image to the position of the original face region in the target person's face image.
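A sketch of this paste-back step follows, assuming the affine matrix from the earlier alignment step was kept so that it can be inverted; keeping that matrix is an assumption of the sketch, not something the disclosure specifies.

    import cv2
    import numpy as np

    def paste_back(original, synth_face, align_matrix):
        # Warp the synthesized face back to the original face's position.
        inv = cv2.invertAffineTransform(align_matrix)
        h, w = original.shape[:2]
        warped = cv2.warpAffine(synth_face, inv, (w, h))
        # Warp an all-ones mask the same way to know which pixels to replace.
        region = cv2.warpAffine(np.ones(synth_face.shape[:2], np.float32),
                                inv, (w, h))[..., None]
        return (warped * region + original * (1 - region)).astype(original.dtype)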
In one embodiment, the server may output the face image including the second synthesized face image, for example by sending it to the user terminal for the user to view. From the user's perspective, what is seen is the target person's face image after the expression change.
In one embodiment, at least one of the following improvements may be added when training the GAN model (a combined sketch of several of these techniques follows the list):
1. Introduce spectral normalization (spectral_norm) to enforce the model's Lipschitz constraint (the L constraint for short). Specifically, each parameter w of the discriminator network can be replaced by

w_SN = w / ||w||_2

so that the GAN model satisfies the L constraint, improving the model's generalization and stability, where ||w||_2 is the spectral norm of w. The discriminator may use hinge loss as its loss function, which serves its binary real/fake classification.
2. Add residual networks to the generator. Introducing residual connections helps the generated face image restore facial details more faithfully. The generator can use the residual network of each convolutional layer to convolve the input features and obtain the output features; that is, the residual network performs the convolution operation on the input features to produce the output features.
3. Following styleGAN, the generator may, after the convolution of each convolutional layer, apply an Adaptive Instance Normalization (AdaIN) module together with uniform random noise to the convolved output features, improving the quality of the generated images. The AdaIN operation can be written as:

AdaIN(x, y) = y_s * (x - μ(x)) / σ(x) + y_b

where x denotes the input features, y denotes the AU coefficients (the action-unit values), y_s and y_b are the per-channel scale and bias derived from y, and μ(x) and σ(x) are the channel-wise mean and standard deviation of x.
4. Apply exponential decay to the generator, with a decay coefficient of 0.9995. That is, at each iteration the generator's parameters are updated as: old parameters * 0.9995 + new parameters * 0.0005. This exponential decay makes the generator's updates slower and more stable.
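For illustration, a minimal PyTorch sketch of techniques 1, 3, and 4 follows; the layer sizes, the AU dimension, and the linear AU-to-style mapping are assumptions of the sketch rather than fixed by this disclosure, and the uniform-noise injection is omitted for brevity.

    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    # (1) Spectral normalization on a discriminator layer: w -> w / ||w||_2.
    disc_layer = spectral_norm(
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))

    # (3) AdaIN: scale and shift instance-normalized features with
    # per-channel parameters derived from the AU coefficients.
    class AdaIN(nn.Module):
        def __init__(self, num_features, au_dim):
            super().__init__()
            self.norm = nn.InstanceNorm2d(num_features, affine=False)
            self.style = nn.Linear(au_dim, num_features * 2)  # scale + bias

        def forward(self, x, au):
            scale, bias = self.style(au).chunk(2, dim=1)
            return (scale[:, :, None, None] * self.norm(x)
                    + bias[:, :, None, None])

    # (4) Exponential decay of generator weights, coefficient 0.9995:
    # old parameters * 0.9995 + new parameters * 0.0005.
    @torch.no_grad()
    def ema_update(g_ema, g, decay=0.9995):
        for p_ema, p in zip(g_ema.parameters(), g.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)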
It can be seen that, in the embodiment shown in FIG. 1A, the server can process the target person's face image to obtain its standard face image, perform expression synthesis with the expression generation model according to the standard face image and the expression label corresponding to the face image to obtain the first synthesized face image, then perform face synthesis based on the standard face image and the first synthesized face image to obtain the second synthesized face image, and generate the face image that includes the second synthesized face image. This process not only makes the generated expressions, and thus the finally generated face image, more stable, but also satisfies users' needs for editing expressions, especially micro-expressions.
Please refer to FIG. 2A, a schematic flowchart of another face image generation method provided by an embodiment of this application. The method can be applied to a server, which may be a single server or a server cluster, and may include the following steps:
S201. Acquire the face image of the target person and the expression label corresponding to the face image.
S202. Perform face detection on the face image to obtain the standard face image of the face image.
S203. Perform expression synthesis with the expression generation model according to the standard face image and the expression label to obtain a first synthesized face image.
S204. Perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image.
S205. Generate a face image that includes the second synthesized face image.
For steps S201-S205, refer to steps S101-S105; they are not repeated here.
S206. Perform image adjustment on the face image including the second synthesized face image using an enhanced super-resolution generative adversarial network model, to obtain the adjusted face image including the second synthesized face image.
S207. Output the adjusted face image including the second synthesized face image.
In steps S206-S207, the server can process the face image including the second synthesized face image through an Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) model and output a face image of higher resolution, thereby optimizing the face image. This step uses super-resolution to produce a high-definition enlargement of the generated face image, raising the resolution of the generated expression without affecting the training of the GAN model. In practice, this process can raise the resolution of the face image from 128x128 to 512x512 while effectively preserving facial sharpness.
In this embodiment of the application, the server may specifically use the face image including the second synthesized face image as input data of the ESRGAN model and adjust the face image through the model to obtain the adjusted face image including the second synthesized face image. In one embodiment, the ESRGAN model may be trained by artificially perturbing a set of high-quality face images.
For example, referring to FIG. 2B, the left image of FIG. 2B is the face image including the second synthesized face image, and the right image is the adjusted face image including the second synthesized face image. That is, the server may use the left image of FIG. 2B as input data of the ESRGAN model and adjust it through the model to obtain the right image of FIG. 2B.
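An inference-time sketch of this super-resolution step is shown below, assuming a trained ESRGAN generator has already been loaded as a torch.nn.Module named esrgan; the loading and training details are outside the sketch, and the 4x scale factor is an assumption consistent with the 128x128-to-512x512 figure above.

    import torch
    import numpy as np

    @torch.no_grad()
    def upscale(face_image, esrgan):
        # e.g. 128x128x3 uint8 in, 512x512x3 uint8 out for a 4x model.
        x = torch.from_numpy(face_image).float().permute(2, 0, 1) / 255.0
        y = esrgan(x.unsqueeze(0)).squeeze(0).clamp(0, 1)
        return (y.permute(1, 2, 0).numpy() * 255.0).round().astype(np.uint8)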
In one embodiment, the server may generate first index information for the face image including the second synthesized face image and write the first index information into a blockchain. Writing the first index information into the blockchain makes the corresponding face image convenient to index and also helps prevent the face image from being used for illegal purposes. The first index information refers to the index information of the face image including the second synthesized face image; it may be, for example, a hash value obtained by the server hashing the face image, or signature information obtained by the server signing it.
In one embodiment, the server may likewise generate second index information for the adjusted face image including the second synthesized face image and write the second index information into the blockchain. The second index information refers to the index information of the adjusted face image including the second synthesized face image; it may similarly be a hash value obtained by hashing the face image or signature information obtained by signing it.
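As a sketch, the hash form of the index information could be computed as follows. SHA-256 is an assumed choice (the disclosure only requires some hash of the image), and the blockchain write itself is platform-specific and omitted.

    import hashlib

    def index_info(image_bytes):
        # Hash of the encoded face image, usable as the first or second
        # index information before it is written to the blockchain.
        return hashlib.sha256(image_bytes).hexdigest()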
Likewise, this application can be used in the construction of smart cities. As deep learning technology advances, face images generated with it become increasingly realistic, which undoubtedly challenges face recognition. To avoid spoofing face recognition with synthesized face images or related videos, the face image captured during the recognition process can be compared against the multiple synthesized face images stored on the server, and whether to issue a warning can then be decided according to the comparison result.
It can be seen that, in the embodiment shown in FIG. 2A, after obtaining the face image including the second synthesized face image, the server can further adjust it through the enhanced super-resolution generative adversarial network model, thereby optimizing the face image so that the output face image has higher image quality.
Please refer to FIG. 3, a schematic structural diagram of a face image generation apparatus provided by an embodiment of this application. The apparatus can be applied to a server and may include:
An acquisition module 301, configured to acquire the face image of a target person and the expression label corresponding to the face image.
A processing module 302, configured to perform face detection on the face image to obtain the standard face image of the face image.
A synthesis module 303, configured to perform expression synthesis with the expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, where the expression of the first synthesized face image is the expression indicated by the expression label.
The synthesis module 303 is further configured to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression of the second synthesized face image is the expression indicated by the expression label.
The processing module 302 is further configured to generate a face image that includes the second synthesized face image.
In an optional implementation, the processing module 302 is further configured to, after the face image including the second synthesized face image is generated, perform image adjustment on it using the enhanced super-resolution generative adversarial network model to obtain the adjusted face image including the second synthesized face image, and output the adjusted face image including the second synthesized face image.
In an optional implementation, the processing module 302 performs face detection on the face image to obtain its standard face image specifically by invoking the image detection library to perform face detection on the face image to obtain its original face image, and performing face alignment on the original face image to obtain the standard face image.
In an optional implementation, the processing module 302 is further configured to, before expression synthesis is performed with the expression generation model according to the standard face image and the expression label to obtain the first synthesized face image, acquire a training dataset including multiple face images carrying corresponding expression labels; perform face detection on each face image in the training dataset to obtain a face image collection including the standard face image corresponding to each face image; and train the generative adversarial network model using each standard face image in the collection and the expression label carried by the face image corresponding to the standard face image, taking the trained generative adversarial network model as the expression generation model.
In an optional implementation, the processing module 302 is further configured to invoke the face recognition toolkit to annotate each face image in the training dataset to obtain the first annotation data corresponding to each face image, and to obtain, according to that first annotation data, the second annotation data corresponding to each face image as the expression label corresponding to each face image.
In an optional implementation, the processing module 302 obtains the second annotation data specifically by normalizing the first annotation data corresponding to each face image to obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
In an optional implementation, the synthesis module 303 is further configured to obtain the feature parameters corresponding to the first synthesized face image after expression synthesis is performed with the expression generation model according to the standard face image and the expression label.
In an optional implementation, the synthesis module 303 performs the face synthesis specifically according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
It can be seen that, in the embodiment shown in FIG. 3, the face image generation apparatus can process the target person's face image to obtain its standard face image, perform expression synthesis with the expression generation model according to the standard face image and the expression label corresponding to the face image to obtain the first synthesized face image, then perform face synthesis based on the standard face image and the first synthesized face image to obtain the second synthesized face image, and generate the face image including the second synthesized face image. This process not only makes the generated expressions, and thus the finally generated face image, more stable, but also satisfies users' needs for editing expressions, especially micro-expressions.
Please refer to FIG. 4, a schematic structural diagram of a server provided by an embodiment of this application. The server includes a processor and a memory, and may optionally further include an input device and/or an output device. For example, as shown in FIG. 4, the server described in this embodiment may include: one or more processors 100, one or more input devices 200, one or more output devices 300, and a memory 400, which may be connected through a bus. The input device 200 and output device 300 are optional in this embodiment and may be standard wired or wireless communication interfaces.
The processor 100 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 400 may be a high-speed RAM or a non-volatile memory, such as disk storage. The memory 400 is configured to store a set of program codes, and the input device 200, output device 300, and processor 100 can invoke the program codes stored in the memory 400. Specifically:
The processor 100 is configured to acquire the face image of a target person and the expression label corresponding to the face image; perform face detection on the face image to obtain its standard face image; perform expression synthesis with the expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, where the expression of the first synthesized face image is the expression indicated by the expression label; perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression of the second synthesized face image is the expression indicated by the expression label; and generate a face image including the second synthesized face image.
In one embodiment, the processor 100 is further configured to, after generating the face image including the second synthesized face image, perform image adjustment on it using the enhanced super-resolution generative adversarial network model to obtain the adjusted face image including the second synthesized face image, and output the adjusted face image through the output device 300.
In one embodiment, the processor 100 performs face detection on the face image to obtain its standard face image specifically by invoking the image detection library to perform face detection on the face image to obtain its original face image, and performing face alignment on the original face image to obtain the standard face image.
In one embodiment, the processor 100 is further configured to, before performing expression synthesis with the expression generation model according to the standard face image and the expression label to obtain the first synthesized face image, acquire a training dataset including multiple face images carrying corresponding expression labels; perform face detection on each face image in the training dataset to obtain a face image collection including the standard face image corresponding to each face image; and train the generative adversarial network model using each standard face image in the collection and the expression label carried by the face image corresponding to the standard face image, taking the trained generative adversarial network model as the expression generation model.
In an optional embodiment, the processor 100 is further configured to invoke the face recognition toolkit to annotate each face image in the training dataset to obtain the first annotation data corresponding to each face image, and to obtain, according to that first annotation data, the second annotation data corresponding to each face image as the expression label corresponding to each face image.
In one embodiment, the processor 100 obtains the second annotation data specifically by normalizing the first annotation data corresponding to each face image to obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
In one embodiment, the processor 100 is further configured to obtain the feature parameters corresponding to the first synthesized face image after performing expression synthesis with the expression generation model according to the standard face image and the expression label.
In one embodiment, the processor 100 performs the face synthesis specifically according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
In specific implementations, the processor 100, input device 200, and output device 300 described in this embodiment of the application can carry out the implementations described in the embodiments of FIG. 1A and FIG. 2A, as well as the other implementations described in the embodiments of this application, which are not repeated here.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement the steps of the methods in the above embodiments, or the functions of the modules of the apparatus in the above embodiments. For example, when executed by a processor, the computer program may implement the following method:
acquiring the face image of a target person and the expression label corresponding to the face image;
performing face detection on the face image to obtain the standard face image of the face image;
performing expression synthesis with the expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, where the expression of the first synthesized face image is the expression indicated by the expression label;
performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, where the expression of the second synthesized face image is the expression indicated by the expression label;
generating a face image that includes the second synthesized face image.
Optionally, when executed by a processor, the computer program can also implement the other steps of the methods in the above embodiments, which are not repeated here. Further optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The functional modules in the embodiments of this application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed, the program may include the processes of the embodiments of the above methods. The computer-readable storage medium may be volatile or non-volatile; for example, it may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM). The computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the blockchain nodes.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks produced in association using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include the underlying blockchain platform, the platform product service layer, the application service layer, and so on.
What is disclosed above is only a preferred embodiment of this application and certainly cannot be used to limit the scope of the rights of this application. Those of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments; equivalent changes made according to the claims of this application still fall within the scope covered by this application.

Claims (20)

  1. A face image generation method, comprising:
    acquiring a face image of a target person and an expression label corresponding to the face image;
    performing face detection on the face image to obtain a standard face image of the face image;
    performing expression synthesis with an expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, wherein an expression of the first synthesized face image is the expression indicated by the expression label;
    performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, wherein an expression of the second synthesized face image is the expression indicated by the expression label;
    generating a face image comprising the second synthesized face image.
  2. The method according to claim 1, wherein after generating the face image comprising the second synthesized face image, the method further comprises:
    performing image adjustment on the face image comprising the second synthesized face image using an enhanced super-resolution generative adversarial network model to obtain an adjusted face image comprising the second synthesized face image;
    outputting the adjusted face image comprising the second synthesized face image.
  3. The method according to claim 1, wherein performing face detection on the face image to obtain the standard face image of the face image comprises:
    invoking an image detection library to perform face detection on the face image to obtain an original face image of the face image;
    performing face alignment on the original face image to obtain the standard face image of the face image.
  4. The method according to any one of claims 1-3, wherein before performing expression synthesis with the expression generation model according to the standard face image and the expression label to obtain the first synthesized face image, the method further comprises:
    acquiring a training dataset, the training dataset comprising multiple face images, each face image carrying a corresponding expression label;
    performing face detection on each face image in the training dataset to obtain a face image collection, the face image collection comprising the standard face image corresponding to each face image;
    training a generative adversarial network model using each standard face image in the face image collection and the expression label carried by the face image corresponding to the standard face image, and taking the trained generative adversarial network model as the expression generation model.
  5. The method according to claim 4, wherein the method further comprises:
    invoking a face recognition toolkit to annotate each face image in the training dataset to obtain first annotation data corresponding to each face image;
    obtaining, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as the expression label corresponding to each face image.
  6. The method according to claim 5, wherein obtaining, according to the first annotation data corresponding to each face image, the second annotation data corresponding to each face image as the expression label corresponding to each face image comprises:
    normalizing the first annotation data corresponding to each face image to obtain the second annotation data corresponding to each face image as the expression label corresponding to each face image.
  7. The method according to claim 1, wherein after performing expression synthesis with the expression generation model according to the standard face image and the expression label, the method further comprises:
    obtaining feature parameters corresponding to the first synthesized face image;
    wherein performing face synthesis according to the standard face image and the first synthesized face image to obtain the second synthesized face image comprises:
    performing face synthesis according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
  8. A face image generation apparatus, comprising:
    an acquisition module, configured to acquire a face image of a target person and an expression label corresponding to the face image;
    a processing module, configured to perform face detection on the face image to obtain a standard face image of the face image;
    a synthesis module, configured to perform expression synthesis with an expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, wherein an expression of the first synthesized face image is the expression indicated by the expression label;
    the synthesis module being further configured to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, wherein an expression of the second synthesized face image is the expression indicated by the expression label;
    the processing module being further configured to generate a face image comprising the second synthesized face image.
  9. A server, comprising a processor and a memory connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the following method:
    acquiring a face image of a target person and an expression label corresponding to the face image;
    performing face detection on the face image to obtain a standard face image of the face image;
    performing expression synthesis with an expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, wherein an expression of the first synthesized face image is the expression indicated by the expression label;
    performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, wherein an expression of the second synthesized face image is the expression indicated by the expression label;
    generating a face image comprising the second synthesized face image.
  10. The server according to claim 9, wherein after generating the face image comprising the second synthesized face image, the processor is further configured to perform:
    performing image adjustment on the face image comprising the second synthesized face image using an enhanced super-resolution generative adversarial network model to obtain an adjusted face image comprising the second synthesized face image;
    outputting the adjusted face image comprising the second synthesized face image.
  11. The server according to claim 9, wherein performing face detection on the face image to obtain the standard face image of the face image comprises:
    invoking an image detection library to perform face detection on the face image to obtain an original face image of the face image;
    performing face alignment on the original face image to obtain the standard face image of the face image.
  12. The server according to any one of claims 9-11, wherein before performing expression synthesis with the expression generation model according to the standard face image and the expression label to obtain the first synthesized face image, the processor is further configured to perform:
    acquiring a training dataset, the training dataset comprising multiple face images, each face image carrying a corresponding expression label;
    performing face detection on each face image in the training dataset to obtain a face image collection, the face image collection comprising the standard face image corresponding to each face image;
    training a generative adversarial network model using each standard face image in the face image collection and the expression label carried by the face image corresponding to the standard face image, and taking the trained generative adversarial network model as the expression generation model.
  13. The server according to claim 12, wherein the processor is further configured to perform:
    invoking a face recognition toolkit to annotate each face image in the training dataset to obtain first annotation data corresponding to each face image;
    obtaining, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as the expression label corresponding to each face image.
  14. The server according to claim 9, wherein after performing expression synthesis with the expression generation model according to the standard face image and the expression label, the processor is further configured to perform:
    obtaining feature parameters corresponding to the first synthesized face image;
    wherein performing face synthesis according to the standard face image and the first synthesized face image to obtain the second synthesized face image comprises:
    performing face synthesis according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method:
    acquiring a face image of a target person and an expression label corresponding to the face image;
    performing face detection on the face image to obtain a standard face image of the face image;
    performing expression synthesis with an expression generation model according to the standard face image and the expression label to obtain a first synthesized face image, wherein an expression of the first synthesized face image is the expression indicated by the expression label;
    performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, wherein an expression of the second synthesized face image is the expression indicated by the expression label;
    generating a face image comprising the second synthesized face image.
  16. The computer-readable storage medium according to claim 15, wherein after generating the face image comprising the second synthesized face image, the computer program, when executed by the processor, further implements:
    performing image adjustment on the face image comprising the second synthesized face image using an enhanced super-resolution generative adversarial network model to obtain an adjusted face image comprising the second synthesized face image;
    outputting the adjusted face image comprising the second synthesized face image.
  17. The computer-readable storage medium according to claim 15, wherein performing face detection on the face image to obtain the standard face image of the face image comprises:
    invoking an image detection library to perform face detection on the face image to obtain an original face image of the face image;
    performing face alignment on the original face image to obtain the standard face image of the face image.
  18. The computer-readable storage medium according to any one of claims 15-17, wherein before performing expression synthesis with the expression generation model according to the standard face image and the expression label to obtain the first synthesized face image, the computer program, when executed by the processor, further implements:
    acquiring a training dataset, the training dataset comprising multiple face images, each face image carrying a corresponding expression label;
    performing face detection on each face image in the training dataset to obtain a face image collection, the face image collection comprising the standard face image corresponding to each face image;
    training a generative adversarial network model using each standard face image in the face image collection and the expression label carried by the face image corresponding to the standard face image, and taking the trained generative adversarial network model as the expression generation model.
  19. The computer-readable storage medium according to claim 18, wherein the computer program, when executed by the processor, further implements:
    invoking a face recognition toolkit to annotate each face image in the training dataset to obtain first annotation data corresponding to each face image;
    obtaining, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as the expression label corresponding to each face image.
  20. The computer-readable storage medium according to claim 15, wherein after performing expression synthesis with the expression generation model according to the standard face image and the expression label, the computer program, when executed by the processor, further implements:
    obtaining feature parameters corresponding to the first synthesized face image;
    wherein performing face synthesis according to the standard face image and the first synthesized face image to obtain the second synthesized face image comprises:
    performing face synthesis according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain the second synthesized face image.
PCT/CN2021/096715 2020-07-27 2021-05-28 Face image generation method and apparatus, server, and storage medium WO2022022043A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010731169.XA CN111860380B (zh) 2020-07-27 2020-07-27 Face image generation method and apparatus, server, and storage medium
CN202010731169.X 2020-07-27

Publications (1)

Publication Number Publication Date
WO2022022043A1 true WO2022022043A1 (zh) 2022-02-03

Family

ID=72948327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096715 WO2022022043A1 (zh) 2020-07-27 2021-05-28 Face image generation method and apparatus, server, and storage medium

Country Status (2)

Country Link
CN (1) CN111860380B (zh)
WO (1) WO2022022043A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908655A (zh) * 2022-11-10 2023-04-04 北京鲜衣怒马文化传媒有限公司 Virtual character facial expression processing method and apparatus
CN116091857A (zh) * 2022-10-17 2023-05-09 北京百度网讯科技有限公司 Training method for image processing model, image processing method and apparatus
CN116369918A (zh) * 2023-02-22 2023-07-04 北京决明科技有限公司 Emotion detection method, system, device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860380B (zh) * 2020-07-27 2024-07-23 平安科技(深圳)有限公司 Face image generation method and apparatus, server, and storage medium
CN113688799B (zh) * 2021-09-30 2022-10-04 合肥工业大学 Facial expression recognition method based on an improved deep convolutional generative adversarial network
CN114398449B (zh) * 2021-12-29 2023-01-06 深圳市海清视讯科技有限公司 Data processing method and apparatus, video surveillance system, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257162A (zh) * 2016-12-29 2018-07-06 北京三星通信技术研究有限公司 Method and apparatus for synthesizing facial expression images
CN109325988A (zh) * 2017-07-31 2019-02-12 腾讯科技(深圳)有限公司 Facial expression synthesis method and apparatus, and electronic device
CN110136071A (zh) * 2018-02-02 2019-08-16 杭州海康威视数字技术股份有限公司 Image processing method and apparatus, electronic device, and storage medium
CN110889381A (zh) * 2019-11-29 2020-03-17 广州华多网络科技有限公司 Face swapping method and apparatus, electronic device, and storage medium
CN111028305A (zh) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Expression generation method, apparatus, device, and storage medium
CN111860380A (zh) * 2020-07-27 2020-10-30 平安科技(深圳)有限公司 Face image generation method and apparatus, server, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680069B (zh) * 2017-08-30 2020-09-11 歌尔股份有限公司 Image processing method and apparatus, and terminal device
CN109359575B (zh) * 2018-09-30 2022-05-10 腾讯科技(深圳)有限公司 Face detection method, service processing method, apparatus, terminal, and medium
CN110084121A (zh) * 2019-03-27 2019-08-02 南京邮电大学 Facial expression transfer method based on a spectrally normalized cycle generative adversarial network
CN110222588B (zh) * 2019-05-15 2020-03-27 合肥进毅智能技术有限公司 Face sketch image aging synthesis method, apparatus, and storage medium
CN110517185B (zh) * 2019-07-23 2024-02-09 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111027425A (zh) * 2019-11-28 2020-04-17 深圳市木愚科技有限公司 Intelligent expression synthesis feedback interaction system and method
CN111275779B (zh) * 2020-01-08 2022-12-16 网易(杭州)网络有限公司 Expression transfer method, image generator training method, apparatus, and electronic device
CN111369646B (zh) * 2020-03-09 2023-03-24 南京理工大学 Expression synthesis method incorporating an attention mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091857A (zh) * 2022-10-17 2023-05-09 北京百度网讯科技有限公司 Training method for image processing model, image processing method and apparatus
CN116091857B (zh) * 2022-10-17 2023-10-20 北京百度网讯科技有限公司 Training method for image processing model, image processing method and apparatus
CN115908655A (zh) * 2022-11-10 2023-04-04 北京鲜衣怒马文化传媒有限公司 Virtual character facial expression processing method and apparatus
CN115908655B (zh) * 2022-11-10 2023-07-14 北京鲜衣怒马文化传媒有限公司 Virtual character facial expression processing method and apparatus
CN116369918A (zh) * 2023-02-22 2023-07-04 北京决明科技有限公司 Emotion detection method, system, device, and storage medium
CN116369918B (zh) * 2023-02-22 2024-02-20 北京决明科技有限公司 Emotion detection method, system, device, and storage medium

Also Published As

Publication number Publication date
CN111860380B (zh) 2024-07-23
CN111860380A (zh) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022022043A1 (zh) Face image generation method and apparatus, server, and storage medium
US11410457B2 (en) Face reenactment
CN109376582B (zh) Interactive face cartoon method based on a generative adversarial network
US20220121931A1 (en) Direct regression encoder architecture and training
US11410364B2 (en) Systems and methods for realistic head turns and face animation synthesis on mobile device
WO2022078041A1 (zh) Occlusion detection model training method and face image beautification method
WO2021036059A1 (zh) Image conversion model training method, heterogeneous face recognition method, apparatus, and device
WO2020150689A1 (en) Systems and methods for realistic head turns and face animation synthesis on mobile device
WO2024109374A1 (zh) Face-swapping model training method, apparatus, device, storage medium, and program product
WO2023035531A1 (zh) Text image super-resolution reconstruction method and related device
CN113034355B (zh) Deep-learning-based method for removing double chins from portrait images
US20230058793A1 (en) Retouching digital images utilizing layer specific deep-learning neural networks
Tolosana et al. An introduction to digital face manipulation
Chen et al. Coogan: A memory-efficient framework for high-resolution facial attribute editing
CN117557688B (zh) Portrait generation model training method, apparatus, computer device, and storage medium
US20220101122A1 (en) Energy-based variational autoencoders
WO2022016996A1 (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
WO2024087946A1 (zh) Image editing method, apparatus, computer device, and storage medium
CN115392216B (zh) Virtual avatar generation method and apparatus, electronic device, and storage medium
CN112464924A (zh) Method and apparatus for constructing a training set
US20220101145A1 (en) Training energy-based variational autoencoders
Shi et al. Face deblurring based on regularized structure and enhanced texture information
WO2024130704A1 (zh) Neural-network-based text matting method, apparatus, device, and storage medium
CN117078503A (zh) Style-image-based digital collectible generation method and system
CN117011430A (zh) Game resource processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21849277
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 21849277
Country of ref document: EP
Kind code of ref document: A1