WO2020192568A1 - Face image generation method, apparatus, device, and storage medium

Face image generation method, apparatus, device, and storage medium

Info

Publication number
WO2020192568A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
model
face
optical flow
target
Prior art date
Application number
PCT/CN2020/080335
Other languages
English (en)
French (fr)
Inventor
者雪飞
凌永根
暴林超
宋奕兵
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP20779223.5A (EP3944200B1)
Publication of WO2020192568A1
Priority to US17/235,456 (US11380050B2)

Classifications

    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T5/77: Retouching; Inpainting; Scratch removal
    • G06T7/70: Image analysis; determining position or orientation of objects or cameras
    • G06T13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T13/80: 2D [Two Dimensional] animation, e.g. using sprites
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/02: Computing arrangements based on biological models; neural networks
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/047: Neural network architectures; probabilistic or stochastic networks
    • G06N3/088: Learning methods; non-supervised learning, e.g. competitive learning
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06T2207/10024: Image acquisition modality; color image
    • G06T2207/20081: Special algorithmic details; training; learning
    • G06T2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T2207/20221: Image combination; image fusion; image merging
    • G06T2207/30196: Subject of image; human being; person
    • G06T2207/30201: Subject of image; face

Definitions

  • This application relates to the field of image processing technology, and in particular to a face image generation method, apparatus, device, and storage medium.
  • Face image generation technology is used in many scenarios.
  • One or more face images are used as input to generate other face images with a pose and facial expression similar to the input; for example, taking a smiling face image of a person as a basis, a smiling face image of that person or of other people is generated through face image generation technology.
  • The existing face image generation technology relies directly on a generative adversarial network to synthesize the face image.
  • Because the parameter space of the generative adversarial network is relatively large and the model complexity is relatively high, its actual training effect is not good and it is prone to overfitting.
  • As a result, the synthesized face image is not natural and realistic enough; moreover, this technology only targets a specific face image and cannot achieve personalized face image synthesis.
  • In view of this, the embodiments of the application provide a method for generating a face image, which generates an initial optical flow map through a three-dimensional face variable model and then performs optical flow completion on the initial optical flow map based on a convolutional neural network to obtain the target optical flow map used to synthesize the final target face image. In this way, the contour of the face image in the first reference element is retained and the pose and expression of the target face image represented by the second reference element are preserved, so that the generated target face image is more realistic and natural; moreover, personalized face image synthesis can be realized based on the three-dimensional face variable model.
  • The embodiments of the present application also provide a face image generation apparatus, a device, a computer-readable storage medium, and a computer program product.
  • A first aspect of the present application provides a method for generating a face image, the method including:
  • determining, according to a first face image in a first reference element, a three-dimensional face variable model corresponding to the first face image as a first model;
  • determining, according to a second reference element, a three-dimensional face variable model corresponding to the second reference element as a second model, the second reference element being used to represent the pose and/or expression of the target face image;
  • determining, according to the first model and the second model, an initial optical flow map corresponding to the first face image, and deforming the first face image according to the initial optical flow map to obtain an initial deformation map corresponding to the first face image;
  • obtaining, through a convolutional neural network, an optical flow increment map and a visible probability map corresponding to the first face image according to the first face image and the initial optical flow map and initial deformation map corresponding to the first face image; and
  • generating the target face image according to the first face image and the initial optical flow map, the optical flow increment map, and the visible probability map corresponding to the first face image.
  • a second aspect of the present application provides a face image generation device, the device includes:
  • the first model generation module is configured to determine, according to the first face image in the first reference element, a three-dimensional face variable model corresponding to the first face image as the first model;
  • the second model generation module is configured to determine, according to the second reference element, the three-dimensional face variable model corresponding to the second reference element as the second model; the second reference element is used to characterize the pose and/or expression of the target face image;
  • the determining module is configured to determine the initial optical flow map corresponding to the first face image according to the first model and the second model, and deform the first face image according to the initial optical flow map Obtaining an initial deformation map corresponding to the first face image;
  • the obtaining module is configured to obtain, through a convolutional neural network, the optical flow increment map and the visible probability map corresponding to the first face image according to the first face image and the initial optical flow map and initial deformation map corresponding to the first face image;
  • the target face image generation module is configured to generate the target face image according to the first face image and the initial optical flow map, the optical flow incremental map and the visible probability map corresponding to the first face image.
  • a third aspect of the present application provides a device including a processor and a memory:
  • the memory is used to store program code and transmit the program code to the processor
  • the processor is configured to execute the steps of the method for generating a face image as described in the first aspect according to the instructions in the program code.
  • a fourth aspect of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store program code, and the program code is used to execute the face image generation method described in the first aspect.
  • a fifth aspect of the present application provides a computer program product.
  • the computer program product includes instructions that, when run on a computer, cause the computer to execute the face image generation method described in the first aspect.
  • An embodiment of the application provides a method for generating a face image.
  • According to the first face image in the first reference element, a three-dimensional face variable model corresponding to the first face image is determined as the first model.
  • According to the second reference element that characterizes the pose and/or expression of the target face image, the corresponding three-dimensional face variable model is determined as the second model, and then the initial optical flow map corresponding to the first face image is determined according to the first model and the second model.
  • It can be seen that this method determines the initial optical flow map through the three-dimensional face variable model.
  • The optical flow increment map and the visible probability map corresponding to the first face image are then obtained through a convolutional neural network, and the target face image is generated according to the first face image and its corresponding initial optical flow map, optical flow increment map, and visible probability map, so that more detailed information of the original image is retained, making the result more realistic and natural.
  • In addition, because the method no longer depends on a single large network but realizes the corresponding functions through different small networks, the parameter space is greatly reduced, the model complexity is lowered, the generalization performance is improved, and natural and realistic face images can be generated.
  • FIG. 1 is a scene architecture diagram of a method for generating a face image in an embodiment of the application
  • FIG. 2A is a flowchart of a method for generating a face image in an embodiment of the application;
  • FIG. 2B is an example effect diagram of image synthesis based on FIG. 2A;
  • FIG. 2C is an example effect diagram of generating an initial optical flow diagram based on FIG. 2A;
  • FIG. 3 is a flowchart of determining a three-dimensional face variable model corresponding to a first face image based on a neural network model in an embodiment of the application;
  • FIG. 4 is a schematic diagram of input and output of a convolutional neural network in an embodiment of the application.
  • FIG. 5A is a schematic structural diagram of a generative adversarial network model in an embodiment of this application;
  • FIG. 5B is a flowchart of a method for training a generative adversarial network model in an embodiment of the application;
  • FIG. 6 is a schematic diagram of the effect of generating a target face image in an embodiment of the application.
  • FIG. 7 is a schematic diagram of the effect of generating a target face image in an embodiment of the application.
  • FIG. 8A is a schematic diagram of an application scenario of a method for generating a face image in an embodiment of the application
  • FIG. 8B is a schematic diagram of another application scenario of the method for generating a face image in an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of a face image generating apparatus in an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a server in an embodiment of the application.
  • FIG. 18 is a schematic structural diagram of a terminal in an embodiment of the application.
  • In view of the above, this application provides a face image generation method based on optical flow maps.
  • The method uses a 3D Morphable Model (3DMM) to determine the initial optical flow map.
  • In this way, the method can retain the contour of the first face image in the first reference element as well as the pose and expression of the target face image identified by the second reference element.
  • Moreover, parameterized control can be achieved through the three-dimensional face variable model, which makes it convenient for users to make adjustments according to actual conditions.
  • Based on the first face image, the initial optical flow map, and the initial deformation map, the convolutional neural network is used to obtain the corresponding optical flow increment map and visible probability map, and the target face image is generated based on the first face image and its corresponding initial optical flow map, optical flow increment map, and visible probability map, so that more detailed information of the original image is retained, making the result more realistic and natural.
  • The face image generation method provided in this application can be applied to a processing device with graphics processing capabilities. The processing device can be any terminal or server that includes a central processing unit (CPU) and/or a graphics processing unit (GPU). When executing the face image generation method provided in this application, the processing device can execute it independently or through cluster cooperation.
  • The method may be stored in the processing device in the form of an application program or software, and the processing device implements the face image generation method provided in this application by executing the application program or software.
  • The scene includes a server 10 and a terminal 20.
  • The terminal 20 sends a face image generation request to the server 10, and the face image generation request carries the first reference element and the second reference element, where the first reference element includes the first face image and the second reference element is used to characterize the pose and/or expression of the target face image.
  • The server 10 determines, according to the first face image in the first reference element, the 3DMM corresponding to the first face image as the first model, determines the 3DMM corresponding to the second reference element as the second model according to the second reference element, and then determines the initial optical flow map corresponding to the first face image according to the first model and the second model.
  • The first face image in the first reference element is deformed according to the initial optical flow map to obtain the initial deformation map corresponding to the first face image.
  • The server 10 then obtains the optical flow increment map and the visible probability map corresponding to the first face image through the convolutional neural network according to the first face image and its corresponding initial optical flow map and initial deformation map.
  • Finally, the server 10 generates the target face image according to the first face image in the first reference element and the initial optical flow map, optical flow increment map, and visible probability map corresponding to the first face image, and returns the target face image to the terminal 20.
  • the method includes:
  • S201 Determine a 3DMM corresponding to the first face image as a first model according to the first face image in the first reference element.
  • S202 Determine, according to the second reference element, a 3DMM corresponding to the second reference element as the second model.
  • the first reference element includes a first face image
  • the second reference element is used to characterize the posture and/or expression of the target face image.
  • The face image generation method provided in this embodiment of the application generates, based on the first face image, a target face image with a specified pose and/or a specified expression.
  • Here, the pose refers to the orientation of the body.
  • For a face image, the pose can be understood as the orientation of the head, and the pose can be characterized by the angle between the central axis of the head and the horizontal or vertical direction.
  • For example, the pose may be a left tilt at an angle of 30° from the vertical direction, or a right tilt at an angle of 60° from the vertical direction.
  • Facial expressions can be characterized by the difference between the facial features and their normal state, for example upturned corners of the mouth representing a smile and drooping corners of the mouth representing depression. Of course, some expressions can also be characterized with the help of gestures, for example by scratching one's head.
  • the second reference element can represent the pose and/or expression of the target face image in different forms.
  • the second reference element may include target model parameters that characterize posture and/or expression, and may also include a second face image that is different from the first face image.
  • the posture and/or expression in the second face image represents the posture and/or expression of the target face image.
  • In response to the second reference element including the target model parameters, the 3DMM corresponding to the target model parameters is determined according to the target model parameters as the second model; in response to the second reference element including the second face image, the 3DMM corresponding to the second face image is determined according to the second face image as the second model.
  • For determining the 3DMM corresponding to the first face image, the embodiment of the present application provides two implementations: calculating the model coefficients through a mathematical algorithm, and directly determining the model coefficients through a network.
  • the two implementations are described in detail below.
  • In the first implementation, the server detects the coordinates of key points of the face in the first face image, constructs an initial 3DMM based on the average face, and projects the three-dimensional coordinates of the initial 3DMM onto the two-dimensional image to obtain the projection coordinates.
  • The server then determines the first model parameters that minimize the distance between the face key point coordinates and the projection coordinates, and determines the 3DMM corresponding to the first face image according to the first model parameters.
  • the average face refers to a composite face obtained by extracting facial features from a certain number of ordinary human faces, averaging the measured data, and then using computer technology.
  • The initial 3DMM is a linear model of the 3D face, which can be characterized by the following formula: S = S̄ + A_id·a_id + A_exp·a_exp (1), where S̄ is the average face shape, A_id and A_exp are the identity and expression bases, and a_id and a_exp are the identity and expression coefficients.
  • the initial 3DMM can be projected to the 2D image according to the following weak projection model to obtain the projection coordinates:
  • V(p) = f · Pr · R · S + t_2d (2)
  • f is the focal length of the camera
  • Pr is the orthogonal projection matrix
  • R is the rotation matrix corresponding to the rotation angle
  • t_2d is the pixel translation parameter.
  • In this way, the first model parameters [a_id, a_exp, f, R, t_2d] can be solved, and the parameters in the initial 3DMM can be updated according to the first model parameters to determine the 3DMM corresponding to the first face image.
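The following is a minimal numpy sketch of the linear 3DMM shape model (1) and the weak projection model (2) described above; the array shapes, function names, and basis layout are illustrative assumptions rather than the patent's exact formulation.

```python
import numpy as np

def build_3dmm_shape(s_mean, basis_id, a_id, basis_exp, a_exp):
    """Linear 3DMM: mean face shape plus identity and expression deformations.
    s_mean: (3N,) mean shape; basis_id: (3N, K_id); basis_exp: (3N, K_exp)."""
    s = s_mean + basis_id @ a_id + basis_exp @ a_exp
    return s.reshape(-1, 3)  # N vertices with (x, y, z) coordinates

def weak_projection(vertices, f, R, t_2d):
    """Weak projection model V(p) = f * Pr * R * S + t_2d (formula (2)).
    Pr is the orthographic projection that keeps the x and y components."""
    rotated = vertices @ R.T               # apply the rotation matrix R
    projected = f * rotated[:, :2] + t_2d  # scale by focal length, translate in 2D
    return projected                       # (N, 2) projection coordinates

# Fitting sketch: choose [a_id, a_exp, f, R, t_2d] that minimize the distance
# between detected face key point coordinates and the projected key points.
```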
  • In the second implementation, the server detects the coordinates of the key points of the face in the first face image, obtains the second model parameters through a neural network model according to the face key point coordinates and the first face image, and then determines the 3DMM corresponding to the first face image according to the second model parameters.
  • Figure 3 shows a flow chart of determining the 3DMM corresponding to the first face image based on the neural network model.
  • The neural network model includes a deep encoder and a model-based decoder. After the first face image is input, face feature detection is performed on the first face image to obtain the face key point coordinates.
  • The deep encoder of the neural network model encodes the first face image and the face key point coordinates into a semantic encoding vector, where the encoder can be implemented by AlexNet or VGG-Face, and the semantic encoding vector can be the model parameters of the neural network model [a_id, a_exp, f, R, t_2d].
  • The neural network model then uses the model-based decoder to decode the semantic encoding vector and reconstruct the image, and the server calculates the loss function of the model, which includes at least the distance between the face key point coordinates and the projection coordinates and the projection brightness difference of the face key points. The calculation of the distance between the face key point coordinates and the projection coordinates can refer to formula (3); the projection brightness difference is computed from the following quantities:
  • E_2 represents the projection brightness difference;
  • I represents the brightness;
  • I_u(x,y) is the brightness of the key point u(x,y) obtained by detecting the first face image;
  • I(V(p)) is the brightness of the face key points when they are projected from the 3DMM onto the 2D image.
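A minimal sketch of the two loss terms described above (key-point distance and projection brightness difference), assuming a grayscale image and integer-rounded sampling; the exact norm and weighting used in the patent are not specified here.

```python
import numpy as np

def keypoint_loss(detected_kpts, projected_kpts):
    """Distance between the detected 2D face key points and the 3DMM key points
    projected by the weak projection model (the formula (3) term)."""
    return np.mean(np.sum((detected_kpts - projected_kpts) ** 2, axis=1))

def brightness_loss(image, detected_kpts, projected_kpts):
    """Projection brightness difference E_2: compare the brightness I at the
    detected key points with I at the projected key points."""
    def sample(points):
        pts = np.round(points).astype(int)
        xs = np.clip(pts[:, 0], 0, image.shape[1] - 1)
        ys = np.clip(pts[:, 1], 0, image.shape[0] - 1)
        return image[ys, xs].astype(float)  # image indexed as (row=y, col=x)
    return np.mean(np.abs(sample(detected_kpts) - sample(projected_kpts)))

# total_loss = keypoint_loss(...) + w * brightness_loss(...)  # w: tuning weight
```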
  • In response to the second reference element including the second face image, the process of determining the 3DMM corresponding to the second face image can refer to either of the above two implementations for determining the first model, which is not repeated in this embodiment.
  • the server may directly determine the 3DMM corresponding to the target model parameter based on the target model parameter.
  • When the target model parameters included in the second reference element include only some of the model parameters [a_id, a_exp, f, R, t_2d], those parameters in the second reference element can be used to replace the corresponding parameters of the first model, and the 3DMM corresponding to the target model parameters can thereby be determined.
  • The first reference element may include one first face image, or may include multiple first face images.
  • The server may determine, for each first face image in the first reference element, the three-dimensional face variable model corresponding to that first face image as the first model corresponding to that first face image.
  • In one example, the first reference element includes two first face images 211, and the second reference element includes a second face image 212.
  • The corresponding first models are determined according to the first face images 211, and the corresponding second model is determined according to the second face image 212.
  • It should be noted that, because 211 includes two first face images, the first model includes two first models respectively corresponding to the two first face images.
  • S203 Determine an initial optical flow map corresponding to the first face image according to the first model and the second model, and deform the first face image according to the initial optical flow map to obtain an initial deformation map corresponding to the first face image.
  • the server compares the first model and the second model, and calculates the initial optical flow diagram based on the projection geometric relationship.
  • In some possible implementations, the server may calculate the initial optical flow map through Projected Normalized Coordinate Code (PNCC). Specifically, the server projects the first model according to the projected normalized coordinate coding algorithm to obtain the input PNCC image, projects the second model to obtain the target PNCC image, then finds, as corresponding points, the pixels with the smallest pixel difference between the input PNCC image and the target PNCC image, calculates the positional difference of each group of corresponding points, and generates the initial optical flow map according to the positional differences of the groups of corresponding points.
  • The corresponding PNCC images can be obtained by projecting the above 3DMM models: PNCC 1 is obtained by projecting 3DMM model 1, PNCC 2 is obtained by projecting 3DMM model 2, and PNCC T is obtained by projecting the target 3DMM model.
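Below is a minimal sketch of the correspondence search described above: for each pixel of the target PNCC image, the input pixel with the smallest PNCC difference is taken as the corresponding point, and the positional difference becomes the initial optical flow. The brute-force loop and the zero-background mask are simplifying assumptions; a practical implementation would restrict the search to the face region and use an accelerated nearest-neighbour structure.

```python
import numpy as np

def initial_flow_from_pncc(pncc_input, pncc_target):
    """pncc_input / pncc_target: (H, W, 3) PNCC renderings of the first model and
    of the target (second) model. Returns an (H, W, 2) initial flow map giving,
    for each target pixel, the displacement to its corresponding input pixel."""
    h, w, _ = pncc_target.shape
    src = pncc_input.reshape(-1, 3)
    src_xy = np.stack(np.meshgrid(np.arange(w), np.arange(h)), axis=-1).reshape(-1, 2)
    flow = np.zeros((h, w, 2), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            if not pncc_target[y, x].any():      # skip background pixels
                continue
            diff = np.sum((src - pncc_target[y, x]) ** 2, axis=1)
            best = np.argmin(diff)               # corresponding point in the input
            flow[y, x] = src_xy[best] - (x, y)   # displacement (dx, dy)
    return flow
```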
  • After obtaining the initial optical flow map, the server may deform the first face image according to the initial optical flow map to obtain the initial deformation map corresponding to the first face image.
  • The initial optical flow map describes the pixel correspondence between the first face image in the first reference element and the image represented by the second reference element. Therefore, according to the initial optical flow map, the pixel position on the first face image corresponding to each position in the initial optical flow map is found, and the pixel value at that position is copied, thereby obtaining the initial deformation map corresponding to the first face image.
  • In response to the first reference element including multiple first face images, the initial optical flow map corresponding to each first face image is obtained first, and then each first face image is deformed according to its corresponding initial optical flow map to obtain the initial deformation map corresponding to that face image. That is, when the first reference element includes multiple first face images, the initial deformation map corresponding to each first face image is obtained respectively.
  • the first face image 211 is deformed according to the initial optical flow map 213 to generate a corresponding initial deformation map 214.
  • the initial deformation map 214 includes initial deformation maps respectively corresponding to the two first face images.
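A minimal sketch of the deformation step: each pixel of the deformation map is filled by copying the pixel of the first face image pointed to by the optical flow map. Nearest-neighbour sampling is an assumption; bilinear sampling would normally be used.

```python
import numpy as np

def warp_by_flow(image, flow):
    """image: (H, W, C) first face image; flow: (H, W, 2) optical flow map with a
    (dx, dy) displacement per pixel. Returns the deformed image."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

# initial_deformation = warp_by_flow(first_face_image, initial_flow)
# target_deformation  = warp_by_flow(first_face_image, target_flow)
```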
  • the server can complement and correct the initial optical flow image to generate a realistic and natural face image.
  • S204 Obtain an optical flow increment map and a visible probability map corresponding to the first face image through a convolutional neural network. Specifically, the server can obtain the optical flow increment map and the visible probability map corresponding to the first face image through the convolutional neural network according to the first face image and its corresponding initial optical flow map and initial deformation map.
  • The optical flow increment map is formed from the optical flow increment of each pixel of the first face image. Based on the optical flow increment of each pixel of the first face image and the initial optical flow of each pixel in the initial optical flow map corresponding to the first face image, the optical flow of each pixel of the first face image can be generated, thereby realizing optical flow completion and correction.
  • the visible probability map represents the probability of each pixel in the first face image appearing in the target face image, and based on the visible probability map, the details of the first face image retained in the target face image can be determined.
  • The first face image 211 and its corresponding initial optical flow map 213 and initial deformation map 214 are input into the convolutional neural network to obtain the optical flow increment map 215 and the visible probability map 216 corresponding to the first face image 211, which are output by the convolutional neural network.
  • The optical flow increment map 215 includes optical flow increment maps respectively corresponding to the two first face images, and the visible probability map 216 includes visible probability maps respectively corresponding to the two first face images.
  • the convolutional neural network can adopt a network structure of encoder and decoder.
  • the network structure may specifically be a U-NET structure.
  • U-NET is a convolutional neural network based on the encoder-decoder structure, which is often used for image segmentation tasks.
  • the encoder structure reduces the spatial dimension and extracts image semantic features through the pooling layer, and the decoder structure restores the details of the object and restores the spatial dimension through the deconvolution layer.
  • U-NET takes the first face image and its corresponding initial optical flow map and initial deformation map as input, and takes the optical flow increment map and visible probability map corresponding to the first face image as output.
  • Figure 4 shows a schematic diagram of the input and output of the convolutional neural network.
  • The convolutional neural network adopts the U-NET network structure, and I_0 and I_1 represent the two first face images respectively.
  • The convolutional neural network takes I_0 and I_1, together with their corresponding initial optical flow maps and initial deformation maps, as input, and outputs the optical flow increment maps ΔF_0→t and ΔF_1→t and the visible probability maps V_0→t and V_1→t.
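The sketch below is a toy PyTorch encoder-decoder in the spirit of U-NET, far shallower than a practical network; the channel layout (two RGB input images, their 2-channel initial flows and RGB initial deformation maps stacked along the channel axis) and the output split are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class FlowCompletionUNet(nn.Module):
    """Toy U-NET-style network: input is the two first face images with their
    initial flow maps and initial deformation maps stacked along the channel
    axis; output is a flow increment (2 ch) and a visibility map (1 ch) per image."""
    def __init__(self, in_ch=2 * (3 + 2 + 3), out_ch=2 * (2 + 1)):
        super().__init__()
        self.enc1 = self._block(in_ch, 64)
        self.enc2 = self._block(64, 128)
        self.pool = nn.MaxPool2d(2)                          # reduces spatial dimension
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # restores spatial dimension
        self.dec1 = self._block(128, 64)
        self.head = nn.Conv2d(64, out_ch, 1)

    @staticmethod
    def _block(cin, cout):
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                             nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        out = self.head(d1)
        d_flow0, d_flow1 = out[:, 0:2], out[:, 2:4]           # ΔF_0→t, ΔF_1→t
        vis0, vis1 = torch.sigmoid(out[:, 4:5]), torch.sigmoid(out[:, 5:6])
        return d_flow0, d_flow1, vis0, vis1
```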
  • the embodiment of the present application also provides an implementation manner for training a convolutional neural network.
  • Specifically, the server determines a first training sample set. Each training sample in the first training sample set includes at least one set of image data and the label data corresponding to the image data; the image data includes a first sample face image and the initial optical flow map and initial deformation map corresponding to the first sample face image.
  • The initial deformation map corresponding to the first sample face image is obtained by deforming the first sample face image according to the initial optical flow map corresponding to the first sample face image; the label data includes a calibrated optical flow increment map and a calibrated visible probability map.
  • The server then performs network training using the training samples in the first training sample set to obtain the convolutional neural network.
  • the server trains the U-NET network through training samples in the first training sample set to obtain the convolutional neural network.
  • S205 Generate a target face image according to the first face image and the initial optical flow map, the optical flow incremental map, and the visible probability map corresponding to the first face image.
  • In specific implementation, the server may perform optical flow completion on the initial optical flow map corresponding to the first face image according to the optical flow increment map corresponding to the first face image to obtain the target optical flow map corresponding to the first face image, then deform the first face image according to the target optical flow map corresponding to the first face image to obtain the target deformation map corresponding to the first face image, and then generate the target face image according to the target deformation map and the visible probability map corresponding to the first face image.
  • When the first reference element includes one first face image, the target face image may be determined by the product of the target deformation map corresponding to the first face image and the visible probability map; when the first reference element includes multiple first face images, the target face image may be determined in the following manner:
  • The following description takes the case in which the first reference element includes two different first face images as an example.
  • the initial optical flow map 213 is optical flow complemented according to the optical flow increment map 215 corresponding to the first face image 211 to obtain the target optical flow map corresponding to the first face image 211
  • the first face image 211 is deformed according to the target optical flow map to obtain the target deformation map 217, and then the target face image 218 can be generated according to the target deformation map 217 and the visible probability map 216.
  • Specifically, the server performs optical flow completion on the initial optical flow map corresponding to I_0 according to ΔF_0→t to obtain the target optical flow map F_0→t corresponding to I_0, and likewise performs optical flow completion on the initial optical flow map corresponding to I_1 according to ΔF_1→t to obtain the target optical flow map F_1→t corresponding to I_1.
  • The server then multiplies, position by position, the target deformation map corresponding to each first face image with its visible probability map, sums the results over the first face images, and divides the sum by the sum of the visible probability maps corresponding to the first face images to generate the target face image. See the following formula (5): I_t = ( V_0 ⊙ g(I_0, F_0→t) + V_1 ⊙ g(I_1, F_1→t) ) / ( V_0 + V_1 )
  • where V_0 and V_1 respectively represent the visible probability maps corresponding to the first face images I_0 and I_1, g(I_0, F_0→t) and g(I_1, F_1→t) respectively represent the target deformation maps corresponding to I_0 and I_1, and ⊙ represents the multiplication of the corresponding positions of the two images.
  • In response to the first reference element including n first face images, where n is a positive integer greater than 1, the target face image can be generated by the following formula (6): I_t = ( Σ_{k=0..n-1} V_k ⊙ g(I_k, F_k→t) ) / ( Σ_{k=0..n-1} V_k )
  • where I_{n-1} represents the nth first face image in the first reference element, V_{n-1} represents the visible probability map corresponding to I_{n-1}, g(I_{n-1}, F_{n-1→t}) represents the target deformation map corresponding to I_{n-1}, and the meanings of the other elements in formula (6) refer to formula (5).
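A minimal sketch of formulas (5) and (6): the target deformation maps are blended with their visible probability maps by a per-pixel weighted average. The small epsilon guarding against division by zero is an added assumption.

```python
import numpy as np

def combine_warped_images(warped_images, visible_maps, eps=1e-6):
    """warped_images: list of (H, W, 3) target deformation maps g(I_k, F_k→t);
    visible_maps: list of (H, W, 1) visible probability maps V_k.
    Returns the target face image I_t = Σ V_k ⊙ g(I_k, F_k→t) / Σ V_k."""
    numerator = sum(v * w for v, w in zip(visible_maps, warped_images))
    denominator = sum(visible_maps) + eps
    return numerator / denominator
```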
  • the embodiment of the present application provides a method for generating a face image.
  • The method generates the target face image based on optical flow maps. Specifically, based on the first face image included in the first reference element, the corresponding three-dimensional face variable model is determined as the first model, and the corresponding three-dimensional face variable model is determined as the second model according to the second reference element used to represent the pose and/or expression of the target face image.
  • The initial optical flow map corresponding to the first face image is then determined according to the first model and the second model; on the one hand, the three-dimensional face variable model is used to achieve parameterized control, and on the other hand, the identity and shape information of the original image is retained through the initial optical flow map.
  • The first face image is then deformed according to the initial optical flow map to obtain the corresponding initial deformation map, the optical flow increment map and the visible probability map corresponding to the first face image are obtained through the convolutional neural network, and the target face image is generated according to the first face image and its corresponding initial optical flow map, optical flow increment map, and visible probability map, which retains the detail information of the original image and is therefore more realistic and natural.
  • In addition, since the method no longer depends on a single large network but realizes the corresponding functions through different small networks, the parameter space is greatly reduced, the model complexity is lowered, the generalization performance is improved, and natural and realistic face images can be generated.
  • In some possible implementations, the server may also optimize the target face image through a generative adversarial network model to obtain the optimized target face image output by the generative adversarial network model.
  • The generative adversarial network model can further improve the artificial textures generated during the deformation process and the invisible areas in the target deformation map, so it can generate natural and realistic face images.
  • The generative adversarial network model structure includes a generator and a discriminator.
  • The generator is used to generate an improved image: it takes the target face image generated by S205 as input and outputs an image in which the artificial textures and invisible areas have been improved.
  • Because the input image, that is, the target face image generated based on S205, has deformation artifacts at boundary positions, a repaired face image can be generated through the generator.
  • the discriminator is used to determine whether the image generated by the generator is real.
  • The discriminator takes the image generated by the generator as input and determines whether it is a real face image; if so, the image is output as the optimized target face image; if not, the generator regenerates an improved image and the corresponding discrimination step is performed again, until the image generated by the generator is judged by the discriminator to be a real face image.
  • For the above generative adversarial network model, the embodiment of the present application also provides an exemplary implementation for training the model.
  • the method includes:
  • S501 Determine a second training sample set.
  • Each training sample in the second training sample set includes the second sample face image and a calibration face image corresponding to the second sample face image.
  • The second sample face image refers to an image generated based on an initial sample face image and its corresponding initial optical flow map, optical flow increment map, and visible probability map;
  • the calibration face image refers to a pre-calibrated real face image.
  • For example, the initial sample face image is a face image in which the head tilts 15° to the left and the facial expression is a smile; the second sample face image is generated based on the face image generation method provided by the embodiment shown in FIG. 2A, in which the head tilts 30° to the right and the facial expression is sad; and the calibration face image is a pre-calibrated real face image in which the head tilts 30° to the right and the facial expression is sad.
  • In specific implementation, the server uses the pixel error and the adversarial loss function as the loss function, and uses the training samples in the second training sample set to train the generative adversarial network, thereby obtaining the generative adversarial network model.
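A minimal PyTorch sketch of this training setup, assuming `generator` and `discriminator` modules and a `loader` yielding (second sample face image, calibration face image) pairs; the L1 pixel error, BCE adversarial loss, learning rates, and loss weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_refinement_gan(generator, discriminator, loader, epochs=10, pixel_weight=10.0):
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    for _ in range(epochs):
        for coarse, target in loader:
            fake = generator(coarse)
            # discriminator step: calibrated real images vs. generated images
            real_logits = discriminator(target)
            fake_logits = discriminator(fake.detach())
            d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
                     bce(fake_logits, torch.zeros_like(fake_logits))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # generator step: adversarial loss plus pixel error to the calibration image
            adv_logits = discriminator(fake)
            g_loss = bce(adv_logits, torch.ones_like(adv_logits)) + pixel_weight * l1(fake, target)
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```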
  • The face image generation method provided by the embodiments of the present application can be applied to many fields through artificial intelligence (AI) technology.
  • For example, it can be applied to the field of social networking or video editing: multiple target face images can be synthesized according to the first face image, and based on these target face images, dynamic short videos or dynamic expressions can be generated and applied to scenes such as virtual anchors, movie special effects, or program synthesis.
  • In some possible implementations, the second reference element includes multiple sets of ordered target model parameters, or multiple ordered second face images derived from a specified video, or multiple ordered second face images derived from a specified animation.
  • In response to the second reference element including multiple sets of ordered target model parameters, for each set of target model parameters in order, the three-dimensional face variable model corresponding to the target model parameters is determined as the second model corresponding to the target model parameters; in response to the second reference element including multiple ordered second face images, for each second face image in order, the three-dimensional face variable model corresponding to the second face image is determined as the second model corresponding to the second face image.
  • Then, for each second model in order, the server executes the step of determining the initial optical flow map corresponding to the first face image according to the first model and the second model. In this way, multiple initial optical flow maps can be obtained, and multiple ordered target face images can be generated.
  • the server can generate an ordered image set according to the order of each second model and the target face images generated based on each second model.
  • the ordered image set may specifically be a video or a dynamic expression, etc., and the form of the set is not limited in this embodiment.
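As an illustration of assembling the ordered image set, the sketch below writes the ordered target face images into a short video clip with OpenCV; the codec, frame rate, file name, and BGR uint8 frame format are assumptions.

```python
import cv2

def frames_to_video(frames, path="target_faces.mp4", fps=8):
    """frames: ordered list of H x W x 3 uint8 (BGR) target face images, all the same size."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()
```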
  • The target face images generated based on the above method and the first face image are face images belonging to the same person, and the server can generate a video or emoticon package corresponding to that person based on multiple ordered target face images.
  • FIG. 6 shows a schematic diagram of the effect of generating target face images.
  • The server takes the images in the first and second columns (shown as 61 in FIG. 6) as input to generate target face images in different poses.
  • Taking the input image in the first row of the first and second columns (shown as 62 in FIG. 6) as an example, the corresponding first model can be determined based on the image 62, and the corresponding second model can be determined based on the target model parameters; the initial optical flow map corresponding to the image 62 can then be determined, and the corresponding initial deformation map can be obtained by deforming the image 62 according to the initial optical flow map.
  • Based on the image 62 and its corresponding initial optical flow map and initial deformation map, the corresponding optical flow increment map and visible probability map can be obtained through the convolutional neural network, and the target face images can be generated according to the image 62 and its corresponding initial optical flow map, optical flow increment map, and visible probability map (as shown at 63 in FIG. 6). Because these multiple target face images in different poses reflect the process of the character's head changing from one angle to another, the server can generate, based on the multiple target face images, a video or dynamic expression of the character's head changing from one angle to another.
  • the second face image and the first face image may be face images belonging to the same person, or face images belonging to different people.
  • When the second face image and the first face image are face images belonging to the same person, the effect of generating the target face image is similar to that in FIG. 6. When the second face image and the first face image belong to different people, the server may generate a target face image that differs in pose and/or expression from the first face image, and based on multiple ordered target face images, may generate a video or emoticon package corresponding to the person in the first face image.
  • For example, the first face image may be a face image of a public figure, such as a face image of a star,
  • and the second face image may be any face image with a designated pose and/or a designated expression, for example, a face image in which a non-public figure presents a designated pose and/or a designated expression.
  • In this way, target face images in which the public figure presents the designated pose and/or designated expression can be generated, and a video or dynamic expression of the public figure can be generated from multiple such target face images.
  • As shown in FIG. 7, the server uses the first two columns of images (shown as 71 in FIG. 7) as input images and the first row of images (shown as 72 in FIG. 7) as the drive to generate target face images.
  • Specifically, the two images included in the first row of 71 (shown as 73 in FIG. 7) are used as the first reference element and the multiple images included in 72 are used as the second reference element, and target face images are generated from the first reference element and the second reference element respectively.
  • Each generated target face image and the first face images in the first reference element belong to the same person, and the pose and expression of the person in the target face image are the same as the pose and expression of the person in the corresponding second face image in the second reference element.
  • the server can generate a video or dynamic expression related to the person in the first face image based on multiple target face images.
  • the server may also receive a sharing instruction for the ordered image set, and share the ordered image set according to the sharing instruction, so as to attract users and increase user activity.
  • In live broadcast application scenarios or social network application scenarios, users are often required to configure avatars. Based on this, the method provided in the embodiments of the present application can also be used to implement personalized customization of avatars according to the actual needs of users.
  • the server generates a corresponding target face image in response to a user request, and then sends the target face image to the terminal, instructing the terminal to display the target face image as the user's social network avatar.
  • When the face image generation method provided by the embodiment of the application is executed by the terminal, after the terminal generates the target face image, the target face image is set as the user's social network avatar, and the social network avatar is displayed on the avatar display interface.
  • the face image generation method provided by the present application will be introduced in conjunction with the application scenario of video editing.
  • the application scenario includes the server 10 and the terminal 20.
  • The user sends a video generation request to the server 10 through the terminal 20, and the video generation request carries the first reference element and the second reference element.
  • The first reference element includes a first face image; in this example, the first face image is specifically a face image of a star.
  • The second reference element includes multiple second face images; each second face image may be a face image taken when the user shows a different pose and expression.
  • the server 10 uses the face image generation method provided in the embodiment of the application to generate a plurality of ordered target face images.
  • The target face images are specifically face images of the star displaying the aforementioned poses and expressions.
  • In this way, the server 10 can use the above multiple target face images to generate a video of the star showing the above poses and expressions.
  • Specifically, the server 10 detects the face key point coordinates in the first face image, obtains the model parameters through the neural network model according to the face key point coordinates and the first face image, and then determines, according to the model parameters, the 3DMM corresponding to the first face image, that is, the first model.
  • The server 10 uses the same method as for generating the first model: it detects the face key point coordinates in each second face image, obtains model parameters through the neural network model according to the face key point coordinates and the second face image, and then determines, according to the model parameters, the 3DMM corresponding to the second face image, that is, the second model.
  • each second face image corresponds to a second model.
  • The server 10 projects the first model according to the projected normalized coordinate coding algorithm to obtain the input PNCC image and projects each second model to obtain a target PNCC image, then finds, as corresponding points, the pixels with the smallest pixel difference between the input PNCC image and the target PNCC image, calculates the positional difference of each group of corresponding points, and generates the initial optical flow map according to these differences.
  • In this way, for each second face image, an initial optical flow map can be obtained.
  • The server 10 can then deform the first face image according to each initial optical flow map to obtain the corresponding initial deformation map; in this way, the server 10 obtains multiple initial deformation maps of the first face image.
  • the server 10 performs optical flow completion and correction through a convolutional neural network.
  • the server 10 adopts the U-NET structure to train a convolutional neural network.
  • The convolutional neural network takes the first face image and its corresponding initial optical flow map and initial deformation map as input, and takes the optical flow increment map and visible probability map corresponding to the first face image as output.
  • The server 10 may superimpose the initial optical flow map and the optical flow increment map to obtain the target optical flow map, deform the first face image according to the target optical flow map corresponding to the first face image to obtain the target deformation map corresponding to the first face image, and generate the target face image according to the target deformation map and the visible probability map corresponding to the first face image.
  • In this example, the server 10 may generate multiple target optical flow maps in one-to-one correspondence with the second face images, and then generate multiple target face images in one-to-one correspondence with the second face images.
  • The server 10 inputs the above multiple target face images into the pre-trained generative adversarial network model to eliminate artificial textures and invisible areas and optimize the target face images.
  • After the server 10 obtains the optimized target face images, it generates, based on the optimized target face images, a video of the star showing the above poses and expressions, and returns the video to the terminal 20.
  • the face image generation method provided by this application can include three stages of optical flow activation, optical flow completion, and deformation improvement.
  • FIG. 8B takes as an example the case in which two face images of a user are used to synthesize the target face image of the user under a target pose and expression, to illustrate the specific realization of the above three stages.
  • In the optical flow activation stage, two first face images (input image 1 and input image 2), that is, the first reference element, are input, together with the second reference element representing the pose and expression of the target face image; the first models corresponding to the two first face images and the second model corresponding to the second reference element are obtained; the above models are projected to obtain the corresponding PNCC images; and, for the PNCC image corresponding to each first model, the points with the smallest pixel difference between it and the target PNCC image are found as corresponding points. Based on the positional differences of the corresponding points, the initial optical flow map corresponding to each first face image (initial optical flow map 1 and initial optical flow map 2) is generated.
  • In the optical flow completion stage, the optical flow increment map and the visible probability map corresponding to each input image are obtained through the convolutional neural network, the target optical flow map corresponding to each input image (optical flow map 1 and optical flow map 2) is obtained, and each input image is deformed according to its target optical flow map to obtain the corresponding target deformation map.
  • The target deformation map 1 and the target deformation map 2 are then combined according to the visible probability maps to obtain the target face image.
  • In the deformation improvement stage, the target face image is input into the generative adversarial network model, which can optimize the artificial textures, artifacts, and invisible areas in the target face image and generate an optimized target face image.
  • the optimized target face image is output.
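Putting the three stages together, the following pseudocode-style sketch reuses the helper functions sketched earlier; `fit_3dmm`, `render_pncc`, `unet`, and `refiner` are hypothetical placeholders for the 3DMM fitting, PNCC projection, flow-completion network, and generative adversarial refinement model, and their call signatures are illustrative only.

```python
def generate_target_face(first_images, second_reference, unet, refiner):
    # Stage 1: optical flow activation
    first_models = [fit_3dmm(img) for img in first_images]
    second_model = fit_3dmm(second_reference)  # or built directly from target model parameters
    target_pncc = render_pncc(second_model)
    init_flows = [initial_flow_from_pncc(render_pncc(m), target_pncc) for m in first_models]
    init_warps = [warp_by_flow(img, f) for img, f in zip(first_images, init_flows)]

    # Stage 2: optical flow completion
    flow_increments, visible_maps = unet(first_images, init_flows, init_warps)
    target_flows = [f + df for f, df in zip(init_flows, flow_increments)]
    target_warps = [warp_by_flow(img, f) for img, f in zip(first_images, target_flows)]
    coarse_target = combine_warped_images(target_warps, visible_maps)

    # Stage 3: deformation improvement with the generative adversarial network model
    return refiner(coarse_target)
```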
  • the embodiment of the present application also provides a corresponding device.
  • the following will introduce the device from the perspective of functional modularity.
  • the apparatus 900 includes:
  • the first model generating module 910 is configured to determine, according to the first face image in the first reference element, a three-dimensional face variable model corresponding to the first face image as the first model;
  • the second model generation module 920 is configured to determine, according to the second reference element, a three-dimensional face variable model corresponding to the second reference element as the second model; the second reference element is used to represent the pose and/or expression of the target face image;
  • the determining module 930 is configured to determine an initial optical flow diagram corresponding to the first face image according to the first model and the second model, and perform processing on the first face image according to the initial optical flow diagram. Transform to an initial deformation map corresponding to the first face image;
  • the obtaining module 940 is configured to obtain, through a convolutional neural network, the optical flow increment map and the visible probability map corresponding to the first face image according to the first face image and the initial optical flow map and initial deformation map corresponding to the first face image;
  • the target face image generation module 950 is configured to generate the target face image according to the first face image and the initial optical flow map, the optical flow incremental map, and the visible probability map corresponding to the first face image .
  • Fig. 10 is a schematic structural diagram of a face image generating apparatus provided by an embodiment of the application. Based on the structure shown in Fig. 9, the apparatus 900 further includes:
  • the optimization module 960 is configured to optimize the target face image by using a generative adversarial network model to obtain an optimized target face image output by the generative adversarial network model.
  • the second model generation module 920 is specifically configured to:
  • determine, in response to the second reference element including a target model parameter, a three-dimensional face variable model corresponding to the target model parameter as the second model.
  • the second model generation module 920 is also specifically configured to:
  • determine, in response to the second reference element including a second face image, a three-dimensional face variable model corresponding to the second face image as the second model, where the second face image is different from the first face image.
  • FIG. 11 is a schematic structural diagram of a face image generating apparatus provided by an embodiment of the application. Based on the structure shown in FIG. 9, the apparatus 900 further includes:
  • the three-dimensional face variable model generation module 970 is configured to detect face key point coordinates in the first face image; construct an initial three-dimensional face variable model based on the average face, and project the three-dimensional coordinates of the initial three-dimensional face variable model to a two-dimensional image to obtain projection coordinates; and determine a first model parameter that minimizes the distance between the face key point coordinates and the projection coordinates, and determine, according to the first model parameter, the three-dimensional face variable model corresponding to the first face image.
  • the face image generating apparatus may also include a three-dimensional face variable model generating module 970 based on the structure shown in FIG. 10, which is not limited in this embodiment.
  • the three-dimensional face variable model generation module 970 may also determine the three-dimensional face variable model corresponding to the first face image in the following manner:
  • detect the face key point coordinates in the first face image; obtain a second model parameter through a neural network model according to the face key point coordinates and the first face image; and determine, according to the second model parameter, the three-dimensional face variable model corresponding to the first face image.
  • the target face image generation module 950 is specifically configured to:
  • perform optical flow completion on the initial optical flow map corresponding to the first face image according to the optical flow increment map corresponding to the first face image to obtain a target optical flow map corresponding to the first face image; deform the first face image according to the target optical flow map to obtain a target deformation map corresponding to the first face image; and generate the target face image according to the target deformation map and the visible probability map corresponding to the first face image.
  • FIG. 12 is a schematic structural diagram of a face image generating apparatus provided by an embodiment of the application.
  • the second reference element includes multiple groups of ordered target model parameters or multiple ordered second face images;
  • the second model generation module 920 is specifically configured to:
  • in response to the second reference element including multiple groups of ordered target model parameters, determine, for each group of target model parameters in order, a three-dimensional face variable model corresponding to the target model parameters as the second model corresponding to those target model parameters; in response to the second reference element including multiple ordered second face images, determine, for each second face image in order, a three-dimensional face variable model corresponding to the second face image as the second model corresponding to that second face image;
  • the determining module 930 is specifically configured to: perform, for each of the second models in order, the step of determining the initial optical flow map corresponding to the first face image according to the first model and the second model.
  • the device 900 further includes:
  • the image set generating module 980 is configured to generate an ordered image set according to the order of each second model and the target face image generated based on each second model.
  • FIG. 13 is a schematic structural diagram of a face image generating apparatus provided by an embodiment of this application. Based on the structure shown in FIG. 12, the apparatus 900 further includes:
  • the image set sharing module 981 is configured to receive a sharing instruction for the ordered image set, and share the ordered image set according to the sharing instruction.
  • FIG. 14 is a schematic structural diagram of a face image generating apparatus according to an embodiment of the application. Based on the structure shown in FIG. 9, the apparatus 900 further includes:
  • the sending module 990 is configured to send the target face image to the terminal, and instruct the terminal to display the target face image as the user's social network avatar.
  • the first reference element includes multiple different first face images belonging to the same person
  • the first model generation module 910 is specifically configured to:
  • for each first face image in the first reference element, a three-dimensional face variable model corresponding to that first face image is determined as the first model corresponding to that first face image.
  • FIG. 15 is a schematic structural diagram of a face image generating apparatus provided by an embodiment of the application. Based on the structure shown in FIG. 9, the convolutional neural network adopts an encoder-decoder network structure;
  • the device 900 further includes:
  • the convolutional neural network training module 991 is configured to determine a first training sample set, where each training sample in the first training sample set includes at least one group of image data and label data corresponding to the image data; the image data includes a first sample face image and an initial optical flow map and an initial deformation map corresponding to the first sample face image, and the initial deformation map corresponding to the first sample face image is obtained by deforming the first sample face image according to the initial optical flow map corresponding to the first sample face image; the label data includes a calibrated optical flow increment map and a calibrated visible probability map; and perform network training using the training samples in the first training sample set to obtain the convolutional neural network.
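Training of this encoder-decoder network might be organized as in the following sketch; the L1 and binary cross-entropy losses, the optimizer, and the data loader format are assumptions, since the embodiment only specifies what each training sample contains.

```python
import torch
import torch.nn.functional as F

def train_flow_net(flow_net, loader, epochs=10, lr=1e-4):
    """flow_net: an encoder-decoder (e.g. U-NET-like) network taking the sample face image,
    its initial optical flow map, and its initial deformation map concatenated on the channel
    axis, and predicting (flow_increment, visibility_logits)."""
    opt = torch.optim.Adam(flow_net.parameters(), lr=lr)
    for _ in range(epochs):
        for face, init_flow, init_warp, gt_delta, gt_vis in loader:
            x = torch.cat([face, init_flow, init_warp], dim=1)
            pred_delta, vis_logits = flow_net(x)
            loss = F.l1_loss(pred_delta, gt_delta) \
                 + F.binary_cross_entropy_with_logits(vis_logits, gt_vis)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return flow_net
```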
  • FIG. 16 is a schematic structural diagram of a face image generating apparatus according to an embodiment of the application. Based on the structure shown in FIG. 10, the apparatus 900 further includes:
  • the generative adversarial network model training module 961 is configured to determine a second training sample set.
  • each training sample in the second training sample set includes a second sample face image and a calibrated face image corresponding to the second sample face image; a generative adversarial network is trained with the training samples in the second training sample set to obtain the generative adversarial network model.
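A minimal training sketch for the generative adversarial network model is given below, using a pixel reconstruction error plus an adversarial loss as stated in the description; the particular adversarial formulation (binary cross-entropy) and the loss weighting are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def train_refiner(gen, disc, loader, epochs=10, lr=2e-4, pix_weight=10.0):
    g_opt = torch.optim.Adam(gen.parameters(), lr=lr)
    d_opt = torch.optim.Adam(disc.parameters(), lr=lr)
    for _ in range(epochs):
        for coarse_face, real_face in loader:   # second sample face image, calibrated face image
            fake = gen(coarse_face)
            # discriminator step: real faces -> 1, generated faces -> 0
            real_logit = disc(real_face)
            fake_logit = disc(fake.detach())
            d_loss = F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) \
                   + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # generator step: pixel error + adversarial loss
            adv_logit = disc(fake)
            g_loss = pix_weight * F.l1_loss(fake, real_face) \
                   + F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return gen, disc
```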
  • the embodiment of the present application also provides a device for generating a face image.
  • the device may be a server or a terminal.
  • the device provided in the embodiments of the present application will be described in detail below from the perspective of hardware implementation.
  • FIG. 17 is a schematic diagram of a server structure provided by an embodiment of the present application.
  • the server 1700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1722 (for example, one or more processors), memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the programs stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processing unit 1722 may be configured to communicate with the storage medium 1730, and execute a series of instruction operations in the storage medium 1730 on the server 1700.
  • the server 1700 may also include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input and output interfaces 1758, and/or one or more operating systems 1741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the steps performed by the server in the foregoing embodiment may be based on the server structure shown in FIG. 17.
  • the CPU 1722 is used to perform the following steps:
  • determining, according to a first face image in a first reference element, a three-dimensional face variable model corresponding to the first face image as a first model; determining, according to a second reference element, a three-dimensional face variable model corresponding to the second reference element as a second model, where the second reference element is used to represent the pose and/or expression of the target face image;
  • determining an initial optical flow map corresponding to the first face image according to the first model and the second model, and deforming the first face image according to the initial optical flow map to obtain an initial deformation map corresponding to the first face image;
  • obtaining, through a convolutional neural network, the optical flow increment map and the visible probability map corresponding to the first face image according to the first face image and the initial optical flow map and initial deformation map corresponding to the first face image;
  • generating the target face image according to the first face image and the initial optical flow map, optical flow increment map, and visible probability map corresponding to the first face image.
  • the CPU 1722 may also be used to execute the steps of any implementation manner of the method for generating a face image in the embodiment of the present application.
  • the embodiment of the present application also provides another device for generating a face image.
  • the device is a terminal, as shown in FIG. 18.
  • the terminal can be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sales (POS) terminal, an on-board computer, and the like. The following description takes a mobile phone as an example:
  • FIG. 18 shows a block diagram of a part of the structure of a mobile phone related to a terminal provided in an embodiment of the present application.
  • the mobile phone includes: a radio frequency (RF) circuit 1810, a memory 1820, an input unit 1830, a display unit 1840, a sensor 1850, an audio circuit 1860, a wireless fidelity (WiFi) module 1870, a processor 1880, and a power supply 1890.
  • the memory 1820 may be used to store software programs and modules.
  • the processor 1880 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1820.
  • the memory 1820 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required by at least one function (such as a sound playback function and an image playback function), and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book).
  • the memory 1820 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the processor 1880 is the control center of the mobile phone; it connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1820 and calling the data stored in the memory 1820, thereby monitoring the mobile phone as a whole.
  • the processor 1880 may include one or more processing units; preferably, the processor 1880 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 1880.
  • the processor 1880 included in the terminal also has the following functions:
  • determining, according to a first face image in a first reference element, a three-dimensional face variable model corresponding to the first face image as a first model; determining, according to a second reference element, a three-dimensional face variable model corresponding to the second reference element as a second model, where the second reference element is used to represent the pose and/or expression of the target face image;
  • determining an initial optical flow map corresponding to the first face image according to the first model and the second model, and deforming the first face image according to the initial optical flow map to obtain an initial deformation map corresponding to the first face image;
  • obtaining, through a convolutional neural network, the optical flow increment map and the visible probability map corresponding to the first face image according to the first face image and the initial optical flow map and initial deformation map corresponding to the first face image;
  • generating the target face image according to the first face image and the initial optical flow map, optical flow increment map, and visible probability map corresponding to the first face image.
  • the processor 1880 may also be configured to execute steps of any implementation manner of the method for generating a face image in the embodiment of the present application.
  • the embodiments of the present application also provide a computer-readable storage medium for storing program code, and the program code is used to execute any implementation of the method for generating a face image described in the foregoing embodiments.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes instructions that, when run on a computer, cause the computer to execute any one of the implementations of the face image generation method described in the foregoing embodiments.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the related technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and various other media that can store program code.
  • It should be understood that in this application, "at least one (item)" refers to one or more, and "multiple" refers to two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three types of relationships can exist; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following item(s)" or similar expressions refers to any combination of these items, including any combination of a single item or plural items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c each can be single or multiple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

本申请公开了一种人脸图像生成方法,包括:根据第一参考元素中的第一人脸图像确定与其对应的三维人脸可变模型作为第一模型;根据第二参考元素确定与其对应的三维人脸可变模型作为第二模型;根据第一模型和第二模型,确定第一人脸图像对应的初始光流图,根据初始光流图对第一人脸图像进行形变得到初始形变图;根据第一人脸图像及其对应的初始光流图和初始形变图,通过卷积神经网络获得光流增量图和可见概率图;根据第一人脸图像及其对应的初始光流图、光流增量图和可见概率图,生成目标人脸图像。该方法一方面实现了参数化控制,另一方面基于光流保留原始图像细节信息,从而使得生成的图像逼真自然。本申请还公开了对应的装置、设备及介质。

Description

人脸图像生成方法、装置、设备及存储介质
本申请要求于2019年03月22日提交的申请号为201910222403.3、发明名称为“一种人脸图像生成方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,尤其涉及一种人脸图像生成方法、装置、设备及存储介质。
背景技术
如今,很多场景下会用到人脸图像生成技术,以一张或者多张人脸图像作为输入,生成与该输入的人脸图像姿态、面部表情相类似的其他人脸图像;例如,以一个人的一张微笑人脸图像作为基础,通过人脸图像生成技术生成该人或者其他人的微笑人脸图像。
现有的人脸图像生成技术直接依赖生成式对抗网络来合成人脸图像,该生成式对抗网络的参数空间比较大、模型复杂性比较高,其实际训练效果并不好,容易出现过拟合,导致合成的人脸图像还不够自然逼真,而且其仅以特定人脸图像为目标,无法实现个性化的人脸图像合成。
发明内容
本申请实施例提供了一种人脸图像生成方法,通过三维人脸可变模型生成初始光流图,再基于卷积神经网络对初始光流图进行光流补全,基于光流补全后的目标光流图最终合成目标人脸图像,如此,既能够保留第一参考元素中人脸图像的轮廓,又能够保留第二参考元素所表征的目标人脸图像的位姿和表情,使得生成的目标人脸图像更逼真自然,而且,基于三维人脸可变模型能实现个性化的人脸图像合成。对应地,本申请实施例还提供了一种人脸图像生成装置、设备、计算机可读存储介质以及计算机程序产品。
有鉴于此,本申请第一方面提供了一种人脸图像生成方法,所述方法包括:
根据第一参考元素中的第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
本申请第二方面提供一种人脸图像生成装置,所述装置包括:
第一模型生成模块,用于根据第一参考元素中的第一人脸图像确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
第二模型生成模块,用于根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
确定模块,用于根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
获取模块,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
目标人脸图像生成模块,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
本申请第三方面提供一种设备,所述设备包括处理器以及存储器:
所述存储器用于存储程序代码,并将所述程序代码传输给所述处理器;
所述处理器用于根据所述程序代码中的指令,执行如上述第一方面所述的人脸图像生成方法的步骤。
本申请第四方面提供一种计算机可读存储介质,所述计算机可读存储介质用于存储程序代码,所述程序代码用于执行上述第一方面所述的人脸图像生成方法。
本申请第五方面提供一种计算机程序产品,所述计算机程序产品包括指令,所述指令在计算机上运行时,使得所述计算机执行上述第一方面所述的人脸图像生成方法。
从以上技术方案可以看出,本申请实施例具有以下优点:
本申请实施例中提供了一种人脸图像生成方法,根据第一参考元素中的第一人脸图像确定与第一人脸图像对应的三维人脸可变模型作为第一模型,根据用于表征目标人脸图像 的姿态和/或表情的第二参考元素确定与其对应的三维人脸可变模型作为第二模型,然后根据第一模型和第二模型确定第一人脸图像对应的初始光流图;可知,该方法通过三维人脸可变模型确定出初始光流图,一方面能够保留第一参考元素中人脸图像的轮廓和第二参考元素所标识的目标人脸的姿态和表情中的至少一个,另一方面能够通过三维人脸可变模型实现参数化控制,方便用户根据实际需求实现个性化的图像合成,接着根据该初始光流图对第一人脸图像进行形变得到对应的初始形变图,通过卷积神经网络获得第一人脸图像对应的光流增量图和可见概率图,再根据第一人脸图像及其对应的初始光流图、光流增量图和可见概率图生成目标人脸图像,使得其保留原始图像更多的细节信息,因而较为逼真和自然。此外,由于不再依赖单一网络,而是通过不同的小型网络分别实现相应的功能,如此大大减小了参数空间,降低了模型复杂性,提高了泛化性能,在实际应用时,能够生成自然逼真的人脸图像。
附图说明
图1为本申请实施例中人脸图像生成方法的场景架构图;
图2A为本申请实施例中人脸图像生成方法的流程图;
图2B为基于图2A进行图像合成的示例效果图;
图2C为基于图2A生成初始光流图的示例效果图;
图3为本申请实施例中基于神经网络模型确定与第一人脸图像对应的三维人脸可变模型的流程图;
图4为本申请实施例中卷积神经网络输入输出示意图;
图5A为本申请实施例中生成式对抗网络模型的结构示意图;
图5B为本申请实施例中生成式对抗网络模型训练方法的流程图;
图6为本申请实施例中生成目标人脸图像的效果示意图;
图7为本申请实施例中生成目标人脸图像的效果示意图;
图8A为本申请实施例中人脸图像生成方法的应用场景示意图;
图8B为本申请实施例中人脸图像生成方法的另一应用场景示意图;
图9为本申请实施例中人脸图像生成装置的一个结构示意图;
图10为本申请实施例中人脸图像生成装置的一个结构示意图;
图11为本申请实施例中人脸图像生成装置的一个结构示意图;
图12为本申请实施例中人脸图像生成装置的一个结构示意图;
图13为本申请实施例中人脸图像生成装置的一个结构示意图;
图14为本申请实施例中人脸图像生成装置的一个结构示意图;
图15为本申请实施例中人脸图像生成装置的一个结构示意图;
图16为本申请实施例中人脸图像生成装置的一个结构示意图;
图17为本申请实施例中服务器的一个结构示意图;
图18为本申请实施例中终端的一个结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
针对相关技术中,基于生成式对抗网络生成人脸图像存在的训练不稳定和模式丢失的问题,以及模型复杂性变高、泛化性能较差导致基于生成对抗神经网络生成的人脸图像不够自然逼真的问题,本申请提供了一种基于光流图的人脸图像生成方法,该方法通过三维人脸可变模型(3D Morphable Models,3DMM)确定初始光流图,如此,该方法一方面能够保留第一参考元素中第一人脸图像的轮廓和第二参考元素所标识的目标人脸图像的姿态和表情,另一方面能够通过三维人脸可变模型实现参数化控制,方便用户根据实际需求实现个性化的图像合成,然后根据该初始光流图对人脸图像进行形变得到初始形变图,根据初始光流图和初始形变图,利用卷积神经网络获取对应的光流增量图和可见概率图,再根据第一人脸图像及其对应的初始光流图、光流增量图和可见概率图生成目标人脸图像,使得其保留原始图像更多的细节信息,因而较为逼真和自然。
由于不再依赖单一网络,而是通过不同的小型网络分别实现相应的功能,如此大大减小了参数空间,降低了模型复杂性,提高了泛化性能,在实际应用时,能够生成自然逼真的人脸图像。
可以理解,本申请提供的人脸图像生成方法可以应用于具有图形处理能力的处理设备,该处理设备可以是任意包括中央处理器(Central Processing Unit,CPU)和/或图形处理器(Graphics Processing Unit,GPU)的终端或服务器,处理设备在执行本申请提供的人脸图像生成方法时,可以是独立执行,也可以通过集群协作的方式执行。需要说明的是,该方法可以采用应用程序或软件的形式存储于处理设备,处理设备通过执行该应用程序或软件实现本申请提供的人脸图像生成方法。
为了使得本申请的技术方案更加清楚、易于理解,下面将结合具体场景对本申请提供的人脸图像生成方法进行介绍。参见图1所示的人脸图像生成方法的场景架构图,该场景中包括服务器10和终端20,终端20向服务器10发送人脸图像生成请求,该人脸图像生成请求中携带有第一参考元素和第二参考元素,其中,第一参考元素包括第一人脸图像,第二参考元素用于表征目标人脸图像的姿态和/或表情,服务器10根据第一参考元素中的第一人脸图像确定与该第一人脸图像对应的3DMM作为第一模型,根据第二参考元素确定与该第二参考元素对应的3DMM作为第二模型,然后根据第一模型和第二模型确定第一人脸图像对应的初始光流图,根据该初始光流图对第一参考元素中的第一人脸图像进行形变得到第一人脸图像对应的初始形变图,服务器10再根据第一参考元素中的第一人脸图像及该第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得第一人脸图像对应的光流增量图和可见概率图,基于第一参考元素中的第一人脸图像及该第一人脸图像对应的初始光流图、光流增量图和可见概率图生成目标人脸图像,然后服务器10向终端20返回目标人脸图像。
接下来,从服务器的角度对本申请实施例提供的人脸图像生成方法的各个步骤进行详细说明。
参见图2A所示的人脸图像生成方法的流程图,该方法包括:
S201:根据第一参考元素中的第一人脸图像确定与所述第一人脸图像对应的3DMM作为第一模型。
S202:根据第二参考元素确定与所述第二参考元素对应的3DMM作为第二模型。
所述第一参考元素包括第一人脸图像,所述第二参考元素用于表征目标人脸图像的姿态和/或表情,本申请实施例提供的人脸图像生成方法即为在第一人脸图像的基础上生成指定姿态和/或指定表情的目标人脸图像。
在一种可能实现方式中,姿态是指身体呈现的样子,具体到本实施例,姿态可以理解为头部呈现的样子,姿态可以通过头部中轴线与水平方向或竖直方向的角度进行表征。作为本申请的一些具体示例,姿态可以包括与竖直方向呈30°夹角左偏,或者与竖直方向呈 60°夹角右偏。
表情是指表达在面部或姿态上的思想感情。针对面部表情,可以通过五官与正常情况下的差异进行表征,如通过嘴角上翘表征微笑、嘴角下垂表征沮丧等,当然,有些表情也可以通过姿态进行表征,例如不知所措的表情可以通过手挠头的姿态进行表征。
基于此,第二参考元素可以通过不同形式表征目标人脸图像的姿态和/或表情。在一些可能的实现方式中,第二参考元素可以包括表征姿态和/或表情的目标模型参数,也可以包括第二人脸图像,该第二人脸图像与第一人脸图像存在差异,在此种情形下,第二人脸图像中的姿态和/或表情即表征目标人脸图像的姿态和/或表情。
在本实施例中,响应于第二参考元素包括目标模型参数,根据所述目标模型参数确定与该目标模型参数对应的3DMM,作为第二模型;响应于所述第二参考元素包括第二人脸图像,根据所述第二人脸图像确定与该第二人脸图像对应的3DMM,作为第二模型。
本申请实施例提供了通过数学算法计算模型系数和通过网络直接确定模型系数两种实现方式确定与第一人脸图像对应的3DMM。下面对这两种实现方式进行详细说明。
一种实现方式为,服务器检测第一人脸图像中的人脸关键点坐标,根据平均脸构建初始3DMM,将初始3DMM的三维坐标投影至二维图像得到投影坐标,然后确定使得所述人脸关键点坐标与所述投影坐标距离最小化的第一模型参数,根据所述第一模型参数确定与所述第一人脸图像对应的3DMM。
具体地，平均脸是指从一定数量的普通人脸提取面部特征，根据测量数据求平均值，再利用计算机技术得到的一张合成脸。根据平均脸构建的初始3DMM可以通过人脸3D点的集合表征，该集合记作S={p=(x,y,z)}，初始3DMM为3D人脸的线性模型，具体可以通过如下公式表征：

S = S̄ + A_id·a_id + A_exp·a_exp      (1)

其中，S̄ 为平均脸，A_id 和 A_exp 为形状基与表情基，a_id 和 a_exp 分别为形状基与表情基各自对应的系数。初始3DMM可以按照如下弱投影模型投影至2D图像得到投影坐标：

V(p) = f*Pr*R*S + t_2d      (2)

其中，f为相机的焦距，Pr为正交投影矩阵，在一个示例中，Pr = [[1, 0, 0], [0, 1, 0]]，R为对应旋转角的旋转矩阵，t_2d为像素平移参数。针对单张第一人脸图像，服务器检测该第一人脸图像中的人脸关键点坐标u(x,y)，则人脸关键点坐标与投影坐标距离E_1可以通过下式表征：

E_1 = Σ||u(x,y) - V(p)||         (3)

通过最小化E_1，可以求解得到第一模型参数[a_id, a_exp, f, R, t_2d]，根据该第一模型参数对初始3DMM中的参数进行更新，可以确定与第一人脸图像对应的3DMM。
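结合上述公式(1)至(3)，下面给出一个用 numpy 演示形状重建、弱投影与关键点拟合误差计算的简化草图（仅为示意，形状基与表情基的维度约定、函数与变量命名均为假设，并非本申请的具体实现）：

```python
import numpy as np

def reconstruct_shape(mean_shape, A_id, A_exp, a_id, a_exp):
    # 公式(1): S = S̄ + A_id·a_id + A_exp·a_exp；此处 S 表示展平后的 (3N,) 顶点向量
    return mean_shape + A_id @ a_id + A_exp @ a_exp

def weak_projection(S, f, R, t_2d):
    # 公式(2): V(p) = f*Pr*R*S + t_2d，Pr 为正交投影矩阵
    Pr = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
    pts3d = S.reshape(-1, 3).T                                # (3, N)
    return f * (Pr @ (R @ pts3d)) + t_2d.reshape(2, 1)        # (2, N) 投影坐标

def landmark_error(landmarks_2d, S, keypoint_idx, f, R, t_2d):
    # 公式(3): E_1 = Σ||u(x,y) - V(p)||，仅在检测到的人脸关键点上计算
    proj = weak_projection(S, f, R, t_2d)[:, keypoint_idx]    # (2, K)
    return np.linalg.norm(landmarks_2d - proj, axis=0).sum()
```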
另一种实现方式为,服务器检测第一人脸图像中的人脸关键点坐标,然后根据所述第一人脸关键点坐标和所述第一人脸图像,通过神经网络模型获得第二模型参数,再根据所述第二模型参数确定与所述第一人脸图像对应的3DMM。
图3示出了基于神经网络模型确定与第一人脸图像对应的3DMM的流程图,如图3所示,该神经网络模型包括深度编码器和基于模型的解码器,输入第一人脸图像后,对该第一人脸图像进行人脸特征检测,得到人脸关键点坐标,神经网络模型的深度编码器(Deep Encoder)可以对第一人脸图像以及人脸关键点坐标编码,然后语义编码向量对编码文本进行语义编码,其中,编码器可以通过alexNet或VGG-Face实现,语义编码向量可以通过神经网络模型的模型参数[a id,a exp,f,R,t 2d]实现,接着,神经网络模型利用基于模型的解码器(Model-based Decoder)对语义编码后的文本解码以重建图像,接着服务器计算模型的损失函数,该损失函数至少包括人脸关键点坐标与投影坐标的距离,以及人脸关键点投影亮度差,其中,人脸关键点坐标与投影坐标的距离的计算可以参见式3,投影亮度差的计算可以参见如下公式:
E_2 = Σ||I(u(x,y)) - I(V(p))||         (4)

其中，E_2表征投影亮度差，I表征亮度，I(u(x,y))即为检测第一人脸图像得到的人脸关键点u(x,y)处的亮度，I(V(p))即为人脸关键点从3DMM投影至2D图像时投影位置处的亮度。
需要说明的是,当第二参考元素包括第二人脸图像时,根据第二人脸图像确定与该第二人脸图像对应的3DMM的过程可以参照上述确定第一模型的两种实现方式中的任一种实现方式,本实施例不再赘述。
当第二参考元素包括表征姿态或表情的目标模型参数时,则服务器可以直接基于该目标模型参数确定与该目标模型参数对应的3DMM。具体地,当第二参考元素包括的目标模型参数中仅包括模型参数[a id,a exp,f,R,t 2d]中的部分参数时,则可以利用第二参考元素中的部分参数替换初始模型参数中的部分参数,将其余参数保持默认值不变,得到更新后的模型参数。根据更新后的参数可以确定与目标模型参数对应的3DMM。
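对于目标模型参数仅包含部分参数的情形，可以用如下简化草图说明参数替换逻辑（字典结构、默认值与示例均为假设）：

```python
def update_model_params(default_params, target_params):
    """仅用第二参考元素中给出的部分模型参数替换默认参数，其余参数保持默认值不变。"""
    params = dict(default_params)
    params.update({k: v for k, v in target_params.items() if k in params})
    return params

# 示例：只指定表情系数与旋转矩阵，其余参数沿用初始模型参数
# new_params = update_model_params(init_params, {"a_exp": a_exp_target, "R": R_target})
```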
还需要说明的是,第一参考元素可以包括一张第一人脸图像,也可以包括多张第一人脸图像。当所述第一参考元素包括属于同一人的多张不同的第一人脸图像时,服务器可以针对所述第一参考元素中的每张第一人脸图像,确定与该第一人脸图像对应的三维人脸可变模型,作为与该第一人脸图像对应的第一模型。
为了便于理解,下面结合图2B对方法实现效果进行示例性说明。如图2B所示,第一参考元素包括两张第一人脸图像211,第二参考元素包括一张第二人脸图像212,根据第一 人脸图像211确定对应的第一模型,根据第二人脸图像212确定对应的第二模型。需要说明的是,由于第一人脸图像211中包括两张第一人脸图像,因此,第一模型中包括分别与两张第一人脸图像对应的两个第一模型。
S203:根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图。
在本实施例中,服务器对比第一模型和第二模型,并基于投影几何关系计算初始光流图。在一些可能的实现方式中,服务器可以通过投影归一化坐标编码图像(Projected Normalized coordinate code,PNCC)计算初始光流图。具体地,服务器根据投影归一化坐标编码算法对第一模型投影得到输入PNCC图像,对第二模型投影得到目标PNCC图像,然后查找输入PNCC图像和目标PNCC图像中像素差最小的像素点作为对应点,计算每组对应点的像素差,根据每组对应点的像素差生成初始光流图。
请参见图2C,输入两张第一人脸图像(输入图像1和输入图像2)以及表征目标人脸图像姿态和表情的第二参考元素,得到分别与两张第一人脸图像对应的3DMM模型(3DMM模型1和3DMM模型2)和与第二参考元素对应的目标3DMM模型(即第二模型)后,可以通过分别对上述3DMM模型进行投影得到对应的PNCC图像,具体地,对3DMM模型1进行投影得到PNCC 1,对3DMM模型2进行投影得到PNCC2,对目标3DMM模型进行投影得到PNCC T。查找PNCC 1和PNCC T中像素差最小的像素点作为对应点,计算每组对应点的像素差,根据每组对应点的像素差生成与输入图像1对应的初始光流图1;类似地,查找PNCC2和PNCC T中像素差最小的像素点作为对应点,计算每组对应点的像素差,根据每组对应点的像素差生成与输入图像2对应的初始光流图2。需要说明的时,初始光流图一般以彩色形式呈现,图2C中的初始光流图仅为将彩色转成灰度的效果。
进一步地,服务器可以根据初始光流图对第一人脸图像进行形变得到第一人脸图像对应的初始形变图。可以理解的是,初始光流图描述的是第一参考元素中的第一人脸图像与第二参考元素所表征的图像之间的像素对应关系,因此,根据初始光流图,找到初始光流图对应的在第一人脸图像上的像素位置,将初始光流图中的像素值复制到第一人脸图像上对应的像素位置处,得到第一人脸图像对应的初始形变图。
需要说明的是,当第一参考元素包括多张第一人脸图像时,先分别得到每张第一人脸图像对应的初始光流图,然后根据每张第一人脸图像对应的初始形变图对该张人脸图像进行形变,得到该张人脸图像对应的初始形变图。也就是说,当第一参考元素包括多张第一人脸图像时,分别得到每张第一人脸图像对应的初始形变图。
请参见图2B,根据第一模型和第二模型确定出初始光流图213后,根据初始光流图213对第一人脸图像211进行形变生成对应的初始形变图214。初始形变图214中包括分别与两张第一人脸图像对应的初始形变图。
S204:根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图。
由于初始光流图中仅包含人脸部分区域,服务器可以对初始光流图进行补全和矫正,以生成逼真自然的人脸图像。在实际应用时,服务器可以通过卷积神经网络获得第一人脸图像对应的光流增量图和可见概率图。
其中,光流增量图是根据第一人脸图像各像素点的光流增量形成的,根据第一人脸图像各像素点的光流增量和第一人脸图像对应的初始光流图中各像素点的初始光流可以生成第一人脸图像各像素点的光流,从而实现光流补全和矫正。可见概率图表征了第一人脸图像中各像素点出现在目标人脸图像中的概率,基于该可见概率图可以确定在目标人脸图像中保留的第一人脸图像细节。
请参见图2B,将第一人脸图像211及其对应的初始光流图213和初始形变图214输入卷积神经网络,获得卷积神经网络输出的第一人脸图像211对应的光流增量图215和可见概率图216。光流增量图215中包括分别与两张第一人脸图像对应的光流增量图,可见概率图216中包括分别与两张第一脸图像对应的可见概率图。
在一种可能实现方式中,卷积神经网络可以采用编码器和解码器的网络结构。作为本申请的一个示例,该网络结构具体可以是U-NET结构。U-NET是一种基于编码器-解码器结构的卷积神经网络,常用于图像分割任务。编码器结构通过池化层降低空间维度并提取图像语义特征,解码器结构通过反卷积层修复物体的细节并恢复空间维度。编码器和解码器之间存在快捷连接,以帮助解码器更好地复原目标的细节信息。
具体到本实施例，U-NET以第一人脸图像及其对应的初始光流图和初始形变图为输入，以第一人脸图像对应的光流增量图和可见概率图为输出。图4示出了卷积神经网络输入输出示意图，在该示例中，该卷积神经网络采用U-NET网络结构，I_0和I_1分别表征两张第一人脸图像，连同I_0和I_1各自对应的初始光流图和初始形变图一起作为网络输入；ΔF_0→t和ΔF_1→t分别表征I_0和I_1各自对应的光流增量图，V_0→t和V_1→t分别表征I_0和I_1各自对应的可见概率图。即该卷积神经网络以I_0、I_1及各自对应的初始光流图和初始形变图为输入，以ΔF_0→t、ΔF_1→t、V_0→t和V_1→t为输出。
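结合图4的输入输出约定，可以用如下示意性代码表示该卷积神经网络一次前向计算时输入的拼接与输出的拆分（仅为草图，通道布局、网络结构与激活方式均为假设，并非本申请的具体实现）：

```python
import torch

def flow_completion_forward(unet, faces, init_flows, init_warps):
    """faces / init_warps: 各第一人脸图像及其初始形变图, 形状 (B, 3, H, W)；
    init_flows: 各初始光流图, 形状 (B, 2, H, W)。"""
    x = torch.cat(list(faces) + list(init_flows) + list(init_warps), dim=1)
    out = unet(x)                                 # 输出通道依次为各光流增量图与可见概率图(假设)
    n = len(faces)
    flow_deltas = [out[:, 2 * i: 2 * i + 2] for i in range(n)]                        # ΔF_i→t
    vis_maps = [torch.sigmoid(out[:, 2 * n + i: 2 * n + i + 1]) for i in range(n)]    # V_i→t
    return flow_deltas, vis_maps
```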
本申请实施例还提供了训练卷积神经网络的一种实现方式,具体地,服务器确定第一 训练样本集,所述第一训练样本集中的每个训练样本包括至少一组图像数据及该图像数据对应的标签数据,所述图像数据包括第一样本人脸图像及该第一样本人脸图像对应的初始光流图和初始形变图,第一样本人脸图像对应的初始形变图根据第一样本人脸图像对应的初始光流图对第一样本人脸图像进行形变得到;所述标签数据包括标定的光流增量图和可见概率图,然后服务器通过所述第一训练样本集中的训练样本进行网络训练,获得所述卷积神经网络。示例性地,服务器通过所述第一训练样本集中的训练样本训练U-NET网络,获得所述卷积神经网络。
S205:根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成目标人脸图像。
在具体实现时,服务器可以根据所述第一人脸图像对应的光流增量图对所述第一人脸图像对应的初始光流图进行光流补全,得到所述第一人脸图像对应的目标光流图,然后根据所述第一人脸图像对应的目标光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的目标形变图,再根据所述第一人脸图像对应的目标形变图和可见概率图,生成目标人脸图像。
在具体实现时,当第一参考元素包括一张第一人脸图像时,所述目标人脸图像可以通过所述第一人脸图像对应的目标形变图和可见概率图的乘积来确定;
当第一参考元素包括多张第一人脸图像时,所述目标人脸图像具体可以通过以下方式确定:
根据各所述第一人脸图像对应的可见概率图确定各所述第一人脸图像对应的目标形变图的权值,利用该权值对各所述第一人脸图像对应的目标形变图进行加权平均,得到目标人脸图像。
下面以第一参考元素包括两张不同的第一人脸图像为例进行示例说明。
仍以图2B作为示例进行说明,根据第一人脸图像211对应的光流增量图215对初始光流图213进行光流补全,得到与第一人脸图像211对应的目标光流图,根据该目标光流图对第一人脸图像211进行形变可以得到目标形变图217,然后,根据目标形变图217和可见概率图216可以生成目标人脸图像218。
以图4的卷积神经网络为例，服务器根据ΔF_0→t对I_0对应的初始光流图进行光流补全得到I_0对应的目标光流图F_0→t，根据ΔF_1→t对I_1对应的初始光流图进行光流补全得到I_1对应的目标光流图F_1→t，然后根据F_0→t对I_0进行形变得到I_0对应的目标形变图g(I_0, F_0→t)，根据F_1→t对I_1进行形变得到I_1对应的目标形变图g(I_1, F_1→t)。服务器可以将各张第一人脸图像对应的可见概率图与其对应的目标形变图对应位置的数值进行乘法运算，针对各张第一人脸图像运算结果进行求和，再将求和结果除以各第一人脸图像对应的可见概率图之和，从而生成目标人脸图像，具体参见如下公式：

Î_t = (V_0 ⊙ g(I_0, F_0→t) + V_1 ⊙ g(I_1, F_1→t)) / (V_0 + V_1)      (5)

其中，Î_t表征目标人脸图像，V_0、V_1分别表征第一人脸图像I_0、I_1对应的可见概率图，g(I_0, F_0→t)、g(I_1, F_1→t)分别表征I_0、I_1对应的目标形变图，⊙表征对两张图像对应位置进行乘法运算。

可以理解的是，当第一参考元素包括n张第一人脸图像时，可以通过如下公式(6)生成目标人脸图像：

Î_t = ( Σ_{i=0}^{n-1} V_i ⊙ g(I_i, F_i→t) ) / ( Σ_{i=0}^{n-1} V_i )      (6)

其中，n为大于1的正整数，I_{n-1}表征第一参考元素中的第n张第一人脸图像，V_{n-1}表征I_{n-1}对应的可见概率图，g(I_{n-1}, F_{n-1}→t)表征I_{n-1}对应的目标形变图，公式(6)中其他元素的含义参见公式(5)。
由上可知,本申请实施例提供了一种人脸图像生成方法,该方法是基于光流图实现生成目标人脸图像的,具体地,根据包括第一人脸图像的第一参考元素确定与其对应的三维人脸可变模型作为第一模型,根据用于表征目标人脸图像的姿态和/或表情的第二参考元素确定与其对应的三维人脸可变模型作为第二模型,然后根据第一模型和第二模型确定第一人脸图像对应的初始光流图,一方面利用三维人脸可变模型实现了参数化控制,另一方面通过该初始光流图保留了原始图像身份形状信息,接着根据该初始光流图对第一人脸图像进行形变得到对应的初始形变图,通过卷积神经网络获得第一人脸图像对应的光流增量图和可见概率图,再根据第一人脸图像及其对应的初始光流图、光流增量图和可见概率图生成目标人脸图像,其保留原始图像细节信息,因而较为逼真和自然。此外,由于不再依赖单一网络,而是通过不同的小型网络分别实现相应的功能,如此大大减小了参数空间,降低了模型复杂性,提高了泛化性能,在实际应用时,能够生成自然逼真的人脸图像。
在一些可能的实现方式中,服务器还可以通过生成式对抗网络模型对所述目标人脸图像进行优化,获得所述生成式对抗网络模型输出的优化后的目标人脸图像。该生成式对抗网络模型能够对形变过程中产生的人工纹理以及目标形变图中存在的不可见区域进行进一步改善,因而能够生成自然、逼真的人脸图像。
在一种可能实现方式中,生成式对抗网络模型结构包括生成器和判别器,参见图5A所示的生成式对抗网络模型的结构示意图,生成器用于生成改善后图像,其以S205生成的目标人脸图像为输入,以改善人工纹理和不可见区域后的图像为输出,如图5A所示,输入图 像为基于S205生成的目标人脸图像,其在边界位置存在形变(伪影),在经过生成对抗网络模型的生成器后,可以生成修复后的人脸图像,判别器用于判别生成器生成的图像是否真实,具体地,其以生成器生成的图像为输入,判别该图像是否为真实人脸图像,若是,则输出该图像作为优化后的目标人脸图像,若否,则重新生成改善后的图像,并执行相应的判别步骤,直至生成器生成的图像被判别器判别为真实人脸图像为止。
在实际应用时,本申请实施例还提供了训练生成式对抗网络模型的示例性实现方式。参见图5B所示的生成式对抗网络模型训练方法的流程图,该方法包括:
S501:确定第二训练样本集。
所述第二训练样本集中的每个训练样本包括所述第二样本人脸图像及该第二样人脸图像对应的标定人脸图像。其中,第二样本人脸图像是指根据初始样本人脸图像及其对应的初始光流图、光流增量图和可见概率图所生成的图像,标定人脸图像是指预先标定的真实人脸图像。
为了便于理解,下面结合具体示例对第二训练样本集中的训练样本进行说明。在一个示例中,初始样本人脸图像为头部左偏15°,面部表情为微笑的人脸图像,第二样本人脸图像是指基于图2A所示实施例提供的人脸图像生成方法所生成的人脸图像,在第二样本人脸图像中,人物头部右偏30°,面部表情为哀伤,而标定人脸图像则是人物头部右偏30°,面部表情为哀伤时拍摄所得图像。
S502:通过所述第二训练样本集中的训练样本训练生成式对抗网络,获得所述生成式对抗网络模型。
在具体实现时,服务器以像素误差和对抗损失函数作为损失函数,利用第二训练样本集中的训练样本训练生成式对抗网络,从而获得生成式对抗网络模型。
以上为本申请实施例提供的人脸图像生成方法的一些具体实现方式,为了便于理解,下面将从产品应用的角度对本申请实施例提供的人脸图像生成方法进行介绍。
可以理解,本申请实施例提供的人脸图像生成方法可以通过人工智能(Artificial Intelligence,AI)技术应用于许多领域,例如,可以运用于网络社交领域或视频剪辑领域,根据第一人脸图像合成多张目标人脸图像,基于多种不同目标人脸图像生成动态短视频或动态表情,并将其应用于虚拟主播、电影特效或者程序式合成等场景中。
下面对本申请实施例提供的人脸图像生成方法在产品侧应用进行详细说明。
在一些可能的实现方式中,所述第二参考元素包括多组有序的目标模型参数或者来源于指定视频的多张有序的第二人脸图像或者来源于指定动画的多张有序的第二人脸图像;响应于所述第二参考元素包括多组有序的目标模型参数,按照顺序针对每组目标模型参数, 确定与该目标模型参数对应的三维人脸可变模型,作为与该目标模型参数对应的第二模型;响应于所述第二参考元素包括多张有序的第二人脸图像,按照顺序针对每张第二人脸图像,确定与该第二人脸图像对应的三维人脸可变模型,作为与该第二人脸图像对应的第二模型。
然后,服务器按照顺序针对每个所述第二模型,执行根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图的步骤,如此,可以得到多张初始光流图,进而可以生成多张有序的目标人脸图像,服务器可以根据每个第二模型的顺序和基于每个第二模型生成的目标人脸图像,生成有序图像集。有序图像集具体可以是视频或动态表情等,本实施例对其形式不作限定。
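下面给出按顺序生成目标人脸图像并合成有序图像集的一个简化草图（generate_target_face 为假设的封装函数，代表前述完整流水线；保存格式仅作示例，并非本申请限定的实现）：

```python
import imageio

def generate_ordered_image_set(first_faces, ordered_refs, generate_target_face, gif_path=None):
    """ordered_refs: 多组有序的目标模型参数或多张有序的第二人脸图像。"""
    frames = [generate_target_face(first_faces, ref) for ref in ordered_refs]  # 每个第二模型生成一帧
    if gif_path is not None:
        imageio.mimsave(gif_path, frames)      # 以动态图形式保存有序图像集
    return frames
```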
第二参考元素包括目标模型参数时,基于上述方法生成的目标人脸图像和第一人脸图像是属于同一人的人脸图像,服务器可以基于多张有序目标人脸图像,生成与该人物对应的视频或表情包。参见图6,其示出了生成目标人脸图像的效果示意图,在该示例中,服务器以第一列和第二列图像(如图6中61所示)作为输入,生成不同姿态下的目标人脸图像。以输入图像为第一列和第二列图像中的第一行图像(如图6中62所示)为例,基于图像62可以确定对应的第一模型,基于目标模型参数可以确定对应的第二模型,根据第一模型和第二模型可以确定与图像62对应的初始光流图,根据该初始光流图对图像62进行形变可以得到对应的初始形变图,然后根据初始光流图和初始形变图,通过卷积神经网络可以获得对应的光流增量图和可见概率图,根据图像62及其对应的初始光流图、光流增量图和可见概率图可以生成目标人脸图像(如图6中63所示)。由于这多张不同姿态下的目标人脸图像体现了人物头部从某一角度向另一角度变化的过程,因此,服务器可以基于多张目标人脸图像生成人物头部从某一角度向另一角度变化的视频或动态表情。
第二参考元素包括第二人脸图像时,第二人脸图像和第一人脸图像可以是属于同一人的人脸图像,也可以是属于不同人的人脸图像。当第二人脸图像和第一人脸图像是属于同一人的人脸图像时,生成目标人脸图像效果与图6类似,当第二人脸图像和第一人脸图像属于不同人的人脸图像时,服务器可以生成与第一人脸图像中姿态和/或表情存在差异的目标人脸图像,基于多张有序目标人脸图像,可以生成第一人脸图像中人物对应的视频或表情包。
具体地,第一人脸图像可以是公众人物的人脸图像,如明星的人脸图像;第二人脸图像可以是任意具有指定姿态和/或指定表情的人脸图像,例如可以是非公众人物呈现指定姿态和/或指定表情的人脸图像。如此,可以基于本申请实施例提供的人脸图像生成方法生成公众人物呈现指定姿态和/或指定表情的目标人脸图像,根据多张公众人物呈现指定姿态和/或指定表情的目标人脸图像可以生成关于该公众人物的视频或动态表情。
参见图7,其示出了生成目标人脸图像的效果示意图,在该示例中,服务器以前两列图像(如图7中71所示)作为输入图像,以第一行图像(如图7中72所示)作为驱动,生成目标人脸图像。具体地,以71中的第一行图像73包括的两个图像作为第一参考元素,以72中包含的多个图像作为第二参考元素,分别针对第一参考元素和上述第二参考元素生成与其对应的多张有序的目标人脸图像74,该目标人脸图像与第一参考元素中的第一人脸图像属于同一人且该目标人脸图像中人物的姿态和表情与第二参考元素中的第二人脸图像中人物的姿态和表情相同,如此,服务器可以基于多张目标人脸图像生成关于第一人脸图像中人物相关的视频或动态表情。
需要说明的是,服务器生成有序图像集后,还可以接收针对所述有序图像集的分享指令,根据所述分享指令,分享所述有序图像集,以便吸引用户,增加用户活跃度。
在直播应用场景或社交网络应用场景中,常常需要用户配置头像,基于此,本申请实施例提供的方法还可以用于根据用户实际需求实现头像个性化定制。具体地,服务器响应于用户请求,生成对应的目标人脸图像,然后向终端发送所述目标人脸图像,指示所述终端将所述目标人脸图像作为用户的社交网络头像进行显示。
需要说明的是,当本申请实施例提供的人脸图像生成方法由终端执行时,终端在生成目标人脸图像后,将该目标人脸图像设置为用户的社交网络头像,并在头像显示界面显示所述社交网络头像。
为了使得本申请的技术方案更加清楚,下面将结合视频剪辑的应用场景对本申请提供的人脸图像生成方法进行介绍。
参见图8A所示的人脸图像生成方法的应用场景示意图,该应用场景包括服务器10和终端20,用户通过终端20向服务器10发送视频生成请求,该视频生成请求携带有第一参考元素和第二参考元素,第一参考元素包括第一人脸图像,该第一人脸图像具体为某明星的人脸图像,第二参考元素包括多张第二人脸图像,该第二人脸图像可以是用户自身展示不同姿态和表情时拍摄得到的人脸图像。服务器10接收到视频生成请求后,利用本申请实施例提供的人脸图像生成方法生成多张有序的目标人脸图像,该目标人脸图像具体为该明星展示上述姿态和表情时的人脸图像,进一步地,服务器10利用上述多张目标人脸图像生成该明星展示上述姿态和表情的视频。
具体地,服务器10检测第一人脸图像中的人脸关键点坐标,然后根据所述人脸关键点坐标和第一人脸图像,通过神经网络模型获得模型参数,再根据所述模型参数确定所述第一人脸图像对应的3DMM,即第一模型。
然后,针对每一张第二人脸图像,服务器10采用与生成第一模型相同的方式,检测第 二人脸图像中的人脸关键点坐标,根据所述人脸关键点坐标和第二人脸图像,通过神经网络模型获得模型参数,再根据所述模型参数确定所述第二人脸图像对应的3DMM,即第二模型。在该示例中,每一张第二人脸图像对应一个第二模型。
接着,服务器10根据投影归一化坐标编码算法对第一模型投影得到输入PNCC图像,对第二模型投影得到目标PNCC图像,然后查找输入PNCC图像和目标PNCC图像中像素差最小的像素点作为对应点,计算每组对应点的像素差,根据每组对应点的像素差生成初始光流图。如此,针对每一个第二模型,可以得到一个初始光流图。针对每一个初始光流图,服务器10可以利用初始光流图对第一人脸图像进行形变得到与该第一人脸图像对应的初始形变图,如此,服务器10可以进行形变得到多个初始形变图。
再次,服务器10通过卷积神经网络进行光流补全和矫正。具体地,服务器10采用U-NET结构训练卷积神经网络,该卷积神经网络以第一人脸图像及其对应的初始光流图和初始形变图为输入,以第一人脸图像对应的光流增量图和可见概率图为输出。如此,服务器10可以将初始光流图和光流增量图叠加得到目标光流图,根据第一人脸图像对应的目标光流图对第一人脸图像进行形变得到第一人脸图像对应的目标形变图,并根据所述第一人脸图像对应的目标形变图和可见概率图,生成目标人脸图像。由于第二参考元素包括多张第二人脸图像,对应地,服务器10可以生成与每张第二人脸图像一一对应的多张目标光流图,进而生成与每张第二人脸图像一一对应的多张目标人脸图像。
最后,服务器10将上述多张目标人脸图像输入至预先训练的生成式对抗网络模型,以消除人工纹理和不可见区域,实现对目标人脸图像的优化,服务器10获取优化后的目标人脸图像,根据优化后的目标人脸图像生成关于该明星展示上述姿态和表情的视频,并向终端20返回该视频。
下面将结合图8B对本申请提供的人脸图像生成方法应用进行另一示例说明。
本申请提供的人脸图像生成方法在实现时可以包括:光流激活、光流补全和形变改善这三个阶段,图8B中以基于用户的两张人脸图像合成该用户在目标姿态和表情下的目标人脸图像为例,对上述三个阶段的具体实现进行说明。
请参见图8B,在光流激活阶段,输入两张第一人脸图像(输入图像1和输入图像2)即第一参考元素,并输入表征目标人脸图像的姿态和表情的第二参考元素,得到与两张第一人脸图像对应的第一模型以及与第二参考元素对应的第二模型,对上述模型进行投影得到对应的PNCC图像,针对每一个第一模型对应的PNCC图像,查找其与目标PNCC图像像素差最小的点作为对应点,基于每组对应的点的像素差可以生成初始光流图,如此可以得到每张第一人脸图像各自对应的初始光流图(初始光流图1和初始光流图2)。
在光流补全阶段,通过卷积神经网络得到各输入图像对应的光流增量图和可见概率图,基于初始光流图和光流增量图可以得到各输入图像对应的光流图(光流图1和光流图2),根据光流图对输入图像进行形变可以得到各输入图像对应的目标形变图。然后基于卷积神经网络输出的可见概率图,将目标形变图1和目标形变图2进行合并得到目标人脸图像。
在形变改善阶段,将目标人脸图像输入生成式对抗网络模型,可以对目标人脸图像中的人工纹理、伪影及不可见区域等进行优化,生成优化后的目标人脸图像,若该优化后的目标人脸图像被判别器判别为真,则输出该优化后的目标人脸图像。
基于本申请实施例提供的人脸图像生成方法的具体实现方式,本申请实施例还提供了对应的装置,下面将从功能模块化的角度对装置进行介绍。
参见图9所示的人脸图像生成装置的结构示意图,该装置900包括:
第一模型生成模块910,用于根据第一参考元素中的第一人脸图像确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
第二模型生成模块920,用于根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情,;
确定模块930,用于根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
获取模块940,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
目标人脸图像生成模块950,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
可选地,参见图10,图10为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图9所示结构的基础上,所述装置900还包括:
优化模块960,用于通过生成式对抗网络模型对所述目标人脸图像进行优化,获得所述生成式对抗神经网络模型输出的优化后的目标人脸图像。
可选地,所述第二模型生成模块920具体用于:
响应于第二参考元素包括目标模型参数,根据所述目标模型参数确定与所述目标模型参数对应的三维人脸可变模型作为第二模型。
可选地,所述第二模型生成模块920具体用于:
响应于所述第二参考元素包括第二人脸图像,根据所述第二人脸图像确定与所述第二 人脸图像对应的三维人脸可变模型作为第二模型;
其中,所述第二人脸图像与所述第一人脸图像存在差异。
可选地,参见图11,图11为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图9所示结构的基础上,所述装置900还包括:
三维人脸可变模型生成模块970,用于检测所述第一人脸图像中的人脸关键点坐标;根据平均脸构建初始三维人脸可变模型,将所述初始三维人脸可变模型的三维坐标投影至二维图像得到投影坐标;确定使得所述人脸关键点坐标与所述投影坐标距离最小化的第一模型参数,根据所述第一模型参数确定与所述第一人脸图像对应的三维人脸可变模型。
需要说明的是,该人脸图像生成装置也可以是在图10所示结构的基础上还包括三维人脸可变模型生成模块970,本实施例对此不作限定。
可选地,所述三维人脸可变模型生成模块970通过以下方式确定与所述第一人脸图像对应的三维人脸可变模型:
检测所述第一人脸图像中的人脸关键点坐标;
根据所述人脸关键点坐标和所述第一人脸图像,通过神经网络模型获得第二模型参数;
根据所述第二模型参数确定与所述第一人脸图像对应的三维人脸可变模型。
可选地,所述目标人脸图像生成模块950具体用于:
根据所述第一人脸图像对应的光流增量图对所述第一人脸图像对应的初始光流图进行光流补全,得到所述第一人脸图像对应的目标光流图;
根据所述第一人脸图像对应的目标光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的目标形变图;
根据所述第一人脸图像对应的目标形变图和可见概率图,生成所述目标人脸图像。
可选地,参见图12,图12为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图9所示结构的基础上,所述第二参考元素包括多组有序的目标模型参数或者多张有序的第二人脸图像;
所述第二模型生成模块920具体用于:
响应于所述第二参考元素包括多组有序的目标模型参数,按照顺序针对每组目标模型参数,确定与所述目标模型参数对应的三维人脸可变模型,作为与该目标模型参数对应的第二模型;
响应于所述第二参考元素包括多张有序的第二人脸图像,按照顺序针对每张第二人脸图像,确定与所述第二人脸图像对应的三维人脸可变模型,作为与该第二人脸图像对应的第二模型;
所述确定模块930具体用于:
按照顺序针对每个所述第二模型,执行所述步骤:根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图。
可选地,所述装置900还包括:
图像集生成模块980,用于根据每个第二模型的顺序和基于每个第二模型生成的目标人脸图像,生成有序图像集。
可选地,参见图13,图13为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图12所示结构的基础上,所述装置900还包括:
图像集分享模块981,用于接收针对所述有序图像集的分享指令,根据所述分享指令,分享所述有序图像集。
可选地,参见图14,图14为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图9所示结构的基础上,所述装置900还包括:
发送模块990,用于向终端发送所述目标人脸图像,指示所述终端将所述目标人脸图像作为用户的社交网络头像进行显示。
可选地,所述第一参考元素包括属于同一人的多张不同的第一人脸图像;
所述第一模型生成模块910具体用于:
针对所述第一参考元素中的每张第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型,作为与所述第一人脸图像对应的第一模型。
可选地,参见图15,图15为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图9所示结构的基础上,所述卷积神经网络采用编码器和解码器的网络结构;
所述装置900还包括:
卷积神经网络训练模块991,用于确定第一训练样本集,所述第一训练样本集中的每个训练样本包括至少一组图像数据及所述图像数据对应的标签数据,所述图像数据包括第一样本人脸图像及所述第一样本人脸图像对应的初始光流图和初始形变图,所述第一样本图像对应的初始形变图根据所述第一样本人脸图像对应的初始光流图对所述第一样本人脸图像进行形变得到;所述标签数据包括标定的光流增量图和可见概率图;通过所述第一训练样本集中的训练样本进行网络训练,获得所述卷积神经网络。
可选地,参见图16,图16为本申请实施例提供的人脸图像生成装置的一个结构示意图,在图10所示结构的基础上,所述装置900还包括:
生成式对抗网络模型训练模块961,用于确定第二训练样本集,所述第二训练样本集中的每个训练样本包括第二样本人脸图像及所述第二样本人脸图像对应的标定人脸图像;通 过所述第二训练样本集中的训练样本训练生成式对抗网络,获得所述生成式对抗网络模型。
本申请实施例还提供了一种用于生成人脸图像的设备,该设备可以是服务器,也可以是终端,下面将从硬件实体化的角度对本申请实施例提供的设备进行详细说明。
图17是本申请实施例提供的一种服务器结构示意图,该服务器1700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1722(例如,一个或一个以上处理器)和存储器1732,一个或一个以上存储应用程序1742或数据1744的存储介质1730(例如一个或一个以上海量存储设备)。其中,存储器1732和存储介质1730可以是短暂存储或持久存储。存储在存储介质1730的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1722可以设置为与存储介质1730通信,在服务器1700上执行存储介质1730中的一系列指令操作。
服务器1700还可以包括一个或一个以上电源1726,一个或一个以上有线或无线网络接口1750,一个或一个以上输入输出接口1758,和/或,一个或一个以上操作系统1741,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由服务器所执行的步骤可以基于该图17所示的服务器结构。
其中,CPU 1722用于执行如下步骤:
根据第一参考元素中的第一人脸图像确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
可选的,CPU 1722还可以用于执行本申请实施例中人脸图像生成方法的任意一种实现方式的步骤。
本申请实施例还提供了另一种用于生成人脸图像的设备,该设备为终端,如图18所示,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本 申请实施例方法部分。该终端可以为包括手机、平板电脑、个人数字助理(英文全称:Personal Digital Assistant,英文缩写:PDA)、销售终端(英文全称:Point of Sales,英文缩写:POS)、车载电脑等任意终端设备,以终端为手机为例:
图18示出的是与本申请实施例提供的终端相关的手机的部分结构的框图。参考图18,手机包括:射频(英文全称:Radio Frequency,英文缩写:RF)电路1810、存储器1820、输入单元1830、显示单元1840、传感器1850、音频电路1860、无线保真(英文全称:wireless fidelity,英文缩写:WiFi)模块1870、处理器1880、以及电源1890等部件。本领域技术人员可以理解,图18中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
存储器1820可用于存储软件程序以及模块,处理器1880通过运行存储在存储器1820的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1820可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1820可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
处理器1880是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1820内的软件程序和/或模块,以及调用存储在存储器1820内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1880可包括一个或多个处理单元;优选的,处理器1880可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1880中。
在本申请实施例中,该终端所包括的处理器1880还具有以下功能:
根据第一参考元素中的第一人脸图像确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
可选的,处理器1880还可以用于执行本申请实施例中人脸图像生成方法的任意一种实现方式的步骤。
本申请实施例还提供一种计算机可读存储介质,用于存储程序代码,该程序代码用于执行前述各个实施例所述的一种人脸图像生成方法中的任意一种实施方式。
本申请实施例还提供一种计算机程序产品,该计算机程序产品包括指令,该指令在计算机上运行时,使得计算机执行前述各个实施例所述的一种人脸图像生成方法中的任意一种实施方式。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(英文全称:Read-Only Memory,英文缩写:ROM)、随机存取存储器(英文全称:Random Access Memory,英文缩写:RAM)、 磁碟或者光盘等各种可以存储程序代码的介质。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (31)

  1. 一种人脸图像生成方法,其特征在于,所述方法应用于处理设备,所述方法包括:
    根据第一参考元素中的第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
    根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
    根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
    根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
    根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
  2. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述方法还包括:
    通过生成式对抗网络模型对所述目标人脸图像进行优化,获得所述生成式对抗网络模型输出的优化后的目标人脸图像。
  3. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型,包括:
    响应于所述第二参考元素包括目标模型参数,根据所述目标模型参数确定与所述目标模型参数对应的三维人脸可变模型作为第二模型。
  4. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型,包括:
    响应于所述第二参考元素包括第二人脸图像,根据所述第二人脸图像确定与所述第二人脸图像对应的三维人脸可变模型作为第二模型;
    其中,所述第二人脸图像与所述第一人脸图像存在差异。
  5. 根据权利要求1至4中任一项所述的人脸图像生成方法,其特征在于,通过以下方式确定与所述第一人脸图像对应的三维人脸可变模型:
    检测所述第一人脸图像中的人脸关键点坐标;
    根据平均脸构建初始三维人脸可变模型,将所述初始三维人脸可变模型的三维坐标投影至二维图像得到投影坐标;
    确定使得所述人脸关键点坐标与所述投影坐标距离最小化的第一模型参数,根据所述 第一模型参数确定与所述第一人脸图像对应的三维人脸可变模型。
  6. 根据权利要求1至4中任一项所述的人脸图像生成方法,其特征在于,通过以下方式确定与所述第一人脸图像对应的三维人脸可变模型:
    检测所述第一人脸图像中的人脸关键点坐标;
    根据所述人脸关键点坐标和所述第一人脸图像,通过神经网络模型获得第二模型参数;
    根据所述第二模型参数确定与所述第一人脸图像对应的三维人脸可变模型。
  7. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像,包括:
    根据所述第一人脸图像对应的光流增量图对所述第一人脸图像对应的初始光流图进行光流补全,得到所述第一人脸图像对应的目标光流图;
    根据所述第一人脸图像对应的目标光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的目标形变图;
    根据所述第一人脸图像对应的目标形变图和可见概率图,生成所述目标人脸图像。
  8. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述第二参考元素包括多组有序的目标模型参数或者多张有序的第二人脸图像;
    所述根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型,包括:
    响应于所述第二参考元素包括多组有序的目标模型参数,按照顺序针对每组目标模型参数,确定与所述目标模型参数对应的三维人脸可变模型,作为与所述目标模型参数对应的第二模型;
    响应于所述第二参考元素包括多张有序的第二人脸图像,按照顺序针对每张第二人脸图像,确定与所述第二人脸图像对应的三维人脸可变模型,作为与所述第二人脸图像对应的第二模型;
    按照顺序针对每个第二模型,执行所述步骤:根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图。
  9. 根据权利要求8所述的人脸图像生成方法,其特征在于,所述方法还包括:
    根据每个第二模型的顺序和基于每个第二模型生成的目标人脸图像,生成有序图像集。
  10. 根据权利要求9所述的人脸图像生成方法,其特征在于,所述方法还包括:
    接收针对所述有序图像集的分享指令;
    根据所述分享指令,分享所述有序图像集。
  11. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述方法还包括:
    向终端发送所述目标人脸图像,指示所述终端将所述目标人脸图像作为用户的社交网络头像进行显示。
  12. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述第一参考元素包括属于同一人的多张不同的第一人脸图像;
    所述根据第一参考元素中的第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型,包括:
    针对所述第一参考元素中的每张第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型,作为与所述第一人脸图像对应的第一模型。
  13. 根据权利要求1所述的人脸图像生成方法,其特征在于,所述卷积神经网络采用编码器和解码器的网络结构;
    所述卷积神经网络通过以下方式训练生成:
    确定第一训练样本集,所述第一训练样本集中的每个训练样本包括至少一组图像数据及所述图像数据对应的标签数据,所述图像数据包括第一样本人脸图像及所述第一样本人脸图像对应的初始光流图和初始形变图,所述第一样本人脸图像对应的初始形变图根据所述第一样本人脸图像对应的初始光流图对所述第一样本人脸图像进行形变得到;所述标签数据包括标定的光流增量图和可见概率图;
    通过所述第一训练样本集中的训练样本进行网络训练,获得所述卷积神经网络。
  14. 根据权利要求2所述的人脸图像生成方法,其特征在于,所述生成式对抗网络模型通过以下方式训练生成:
    确定第二训练样本集,所述第二训练样本集中的每个训练样本包括第二样本人脸图像及所述第二样本人脸图像对应的标定人脸图像;
    通过所述第二训练样本集中的训练样本训练生成式对抗网络,获得所述生成式对抗网络模型。
  15. 一种人脸图像生成装置,其特征在于,所述装置包括:
    第一模型生成模块,用于根据第一参考元素中的第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
    第二模型生成模块,用于根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
    确定模块,用于根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应 的初始形变图;
    获取模块,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络获得所述第一人脸图像对应的光流增量图和可见概率图;
    目标人脸图像生成模块,用于根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
  16. 一种设备,其特征在于,所述设备包括处理器以及存储器:所述存储器用于存储程序代码;所述处理器用于根据所述程序代码中的指令执行下述人脸图像生成方法的步骤:
    根据第一参考元素中的第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型;
    根据第二参考元素确定与所述第二参考元素对应的三维人脸可变模型作为第二模型;所述第二参考元素用于表征目标人脸图像的姿态和/或表情;
    根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图,根据所述初始光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的初始形变图;
    根据所述第一人脸图像及所述第一人脸图像对应的初始光流图和初始形变图,通过卷积神经网络模型获得所述第一人脸图像对应的光流增量图和可见概率图;
    根据所述第一人脸图像及所述第一人脸图像对应的初始光流图、光流增量图和可见概率图,生成所述目标人脸图像。
  17. 根据权利要求16所述的设备,其特征在于,所述处理器还用于执行下述步骤:
    通过生成式对抗网络模型对所述目标人脸图像进行优化,获得所述生成式对抗网络模型输出的优化后的目标人脸图像。
  18. 根据权利要求16所述的设备,其特征在于,所述处理器用于执行下述步骤:
    响应于所述第二参考元素包括目标模型参数,根据所述目标模型参数确定与所述目标模型参数对应的三维人脸可变模型作为第二模型。
  19. 根据权利要求16所述的设备,其特征在于,所述处理器用于执行下述步骤:
    响应于所述第二参考元素包括第二人脸图像,根据所述第二人脸图像确定与所述第二人脸图像对应的三维人脸可变模型作为第二模型;
    其中,所述第二人脸图像与所述第一人脸图像存在差异。
  20. 根据权利要求16至19中任一项所述的设备,其特征在于,所述处理器用于执行下述步骤:
    检测所述第一人脸图像中的人脸关键点坐标;
    根据平均脸构建初始三维人脸可变模型,将所述初始三维人脸可变模型的三维坐标投 影至二维图像得到投影坐标;
    确定使得所述人脸关键点坐标与所述投影坐标距离最小化的第一模型参数,根据所述第一模型参数确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型。
  21. 根据权利要求16至19中任一项所述的设备,其特征在于,所述处理器用于执行下述步骤:
    检测所述第一人脸图像中的人脸关键点坐标;
    根据所述人脸关键点坐标和所述第一人脸图像,通过神经网络模型获得第二模型参数;
    根据所述第二模型参数确定与所述第一人脸图像对应的三维人脸可变模型作为第一模型。
  22. 根据权利要求16所述的设备,其特征在于,所述处理器用于执行下述步骤:
    根据所述第一人脸图像对应的光流增量图对所述第一人脸图像对应的初始光流图进行光流补全,得到所述第一人脸图像对应的目标光流图;
    根据所述第一人脸图像对应的目标光流图对所述第一人脸图像进行形变得到所述第一人脸图像对应的目标形变图;
    根据所述第一人脸图像对应的目标形变图和可见概率图,生成所述目标人脸图像。
  23. 根据权利要求16所述的设备,其特征在于,所述第二参考元素包括多组有序的目标模型参数或者多张有序的第二人脸图像;
    所述处理器用于执行下述步骤:
    响应于所述第二参考元素包括多组有序的目标模型参数,按照顺序针对每组目标模型参数,确定与所述目标模型参数对应的三维人脸可变模型,作为与所述目标模型参数对应的第二模型;
    响应于所述第二参考元素包括多张有序的第二人脸图像,按照顺序针对每张第二人脸图像,确定与所述第二人脸图像对应的三维人脸可变模型,作为与所述第二人脸图像对应的第二模型;
    按照顺序针对每个第二模型,执行下述步骤:根据所述第一模型和所述第二模型,确定所述第一人脸图像对应的初始光流图。
  24. 根据权利要求23所述的设备,其特征在于,所述处理器还用于执行下述步骤:
    根据每个第二模型的顺序和基于每个第二模型生成的目标人脸图像,生成有序图像集。
  25. 根据权利要求24所述的设备,其特征在于,所述处理器还用于执行下述步骤:
    接收针对所述有序图像集的分享指令;
    根据所述分享指令,分享所述有序图像集。
  26. 根据权利要求16所述的设备,其特征在于,所述处理器还用于执行下述步骤:
    向终端发送所述目标人脸图像,指示所述终端将所述目标人脸图像作为用户的社交网络头像进行显示。
  27. 根据权利要求16所述的设备,其特征在于,所述第一参考元素包括属于同一人的多张不同的第一人脸图像;
    所述处理器用于执行下述步骤:
    针对所述第一参考元素中的每张第一人脸图像,确定与所述第一人脸图像对应的三维人脸可变模型,作为与所述第一人脸图像对应的第一模型。
  28. 根据权利要求16所述的设备,其特征在于,所述卷积神经网络采用编码器和解码器的网络结构;所述处理器用于执行下述步骤:
    确定第一训练样本集,所述第一训练样本集中的每个训练样本包括至少一组图像数据及所述图像数据对应的标签数据,所述图像数据包括第一样本人脸图像及所述第一样本人脸图像对应的初始光流图和初始形变图,所述第一样本人脸图像对应的初始形变图根据所述第一样本人脸图像对应的初始光流图对所述第一样本人脸图像进行形变得到;所述标签数据包括标定的光流增量图和可见概率图;
    通过所述第一训练样本集中的训练样本进行网络训练,获得所述卷积神经网络。
  29. 根据权利要求17所述的设备,其特征在于,所述处理器用于执行下述步骤:
    确定第二训练样本集,所述第二训练样本集中的每个训练样本包括第二样本人脸图像及所述第二样本人脸图像对应的标定人脸图像;
    通过所述第二训练样本集中的训练样本训练生成式对抗网络,获得所述生成式对抗网络模型。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质用于存储程序代码,所述程序代码用于执行权利要求1-14任一项所述的人脸图像生成方法。
  31. 一种计算机程序产品,其特征在于,所述计算机程序产品包括指令,所述指令在计算机上运行时,使得所述计算机执行权利要求1-14任一项所述的人脸图像生成方法。
PCT/CN2020/080335 2019-03-22 2020-03-20 人脸图像生成方法、装置、设备及存储介质 WO2020192568A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20779223.5A EP3944200B1 (en) 2019-03-22 2020-03-20 Facial image generation method and apparatus, device and storage medium
US17/235,456 US11380050B2 (en) 2019-03-22 2021-04-20 Face image generation method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910222403.3A CN109961507B (zh) 2019-03-22 2019-03-22 一种人脸图像生成方法、装置、设备及存储介质
CN201910222403.3 2019-03-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/235,456 Continuation US11380050B2 (en) 2019-03-22 2021-04-20 Face image generation method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020192568A1 true WO2020192568A1 (zh) 2020-10-01

Family

ID=67024657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080335 WO2020192568A1 (zh) 2019-03-22 2020-03-20 人脸图像生成方法、装置、设备及存储介质

Country Status (4)

Country Link
US (1) US11380050B2 (zh)
EP (1) EP3944200B1 (zh)
CN (1) CN109961507B (zh)
WO (1) WO2020192568A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239857A (zh) * 2021-05-27 2021-08-10 京东科技控股股份有限公司 视频合成方法及其装置
CN113506232A (zh) * 2021-07-02 2021-10-15 清华大学 图像生成方法、装置、电子设备以及存储介质
CN113609900A (zh) * 2021-06-25 2021-11-05 南京信息工程大学 局部生成人脸定位方法、装置、计算机设备和存储介质

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3671660A1 (en) * 2018-12-20 2020-06-24 Dassault Systèmes Designing a 3d modeled object via user-interaction
CN109977847B (zh) * 2019-03-22 2021-07-16 北京市商汤科技开发有限公司 图像生成方法及装置、电子设备和存储介质
CN109961507B (zh) 2019-03-22 2020-12-18 腾讯科技(深圳)有限公司 一种人脸图像生成方法、装置、设备及存储介质
CN110390291B (zh) * 2019-07-18 2021-10-08 北京字节跳动网络技术有限公司 数据处理方法、装置及电子设备
CN110427864B (zh) * 2019-07-29 2023-04-21 腾讯科技(深圳)有限公司 一种图像处理方法、装置及电子设备
CN113569790B (zh) * 2019-07-30 2022-07-29 北京市商汤科技开发有限公司 图像处理方法及装置、处理器、电子设备及存储介质
CN110675475B (zh) * 2019-08-19 2024-02-20 腾讯科技(深圳)有限公司 一种人脸模型生成方法、装置、设备及存储介质
CN111079507B (zh) * 2019-10-18 2023-09-01 深兰科技(重庆)有限公司 一种行为识别方法及装置、计算机装置及可读存储介质
CN111028343B (zh) 2019-12-16 2020-12-11 腾讯科技(深圳)有限公司 三维人脸模型的生成方法、装置、设备及介质
CN111476749B (zh) * 2020-04-03 2023-02-28 陕西师范大学 基于人脸关键点引导式生成对抗网络的人脸修复方法
CN111539903B (zh) * 2020-04-16 2023-04-07 北京百度网讯科技有限公司 训练人脸图像合成模型的方法和装置
CN111523497B (zh) * 2020-04-27 2024-02-27 深圳市捷顺科技实业股份有限公司 一种人脸纠正方法、装置和电子设备
CN112652058B (zh) * 2020-12-31 2024-05-31 广州华多网络科技有限公司 人脸图像重演方法、装置、计算机设备及存储介质
CN112990097B (zh) * 2021-04-13 2022-11-04 电子科技大学 一种基于对抗消除的人脸表情识别方法
CA3215196A1 (en) * 2021-04-30 2022-11-03 Ethan Rublee Real time localization with image data
CN113223137B (zh) * 2021-05-13 2023-03-24 广州虎牙科技有限公司 透视投影人脸点云图的生成方法、装置及电子设备
US11900534B2 (en) * 2021-07-30 2024-02-13 The Boeing Company Systems and methods for synthetic image generation
CN113724163B (zh) * 2021-08-31 2024-06-07 平安科技(深圳)有限公司 基于神经网络的图像矫正方法、装置、设备及介质
CN113744129A (zh) * 2021-09-08 2021-12-03 深圳龙岗智能视听研究院 一种基于语义神经渲染的人脸图像生成方法及系统
WO2023225825A1 (zh) * 2022-05-23 2023-11-30 上海玄戒技术有限公司 位置差异图生成方法及装置、电子设备、芯片及介质
CN115171196B (zh) * 2022-08-25 2023-03-28 北京瑞莱智慧科技有限公司 人脸图像处理方法、相关装置及存储介质
CN116152122B (zh) * 2023-04-21 2023-08-25 荣耀终端有限公司 图像处理方法和电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6492986B1 (en) * 1997-06-02 2002-12-10 The Trustees Of The University Of Pennsylvania Method for human face shape and motion estimation based on integrating optical flow and deformable models
CN106910247A (zh) * 2017-03-20 2017-06-30 厦门幻世网络科技有限公司 用于生成三维头像模型的方法和装置
US20180197330A1 (en) * 2017-01-10 2018-07-12 Ditto Technologies, Inc. Modeling of a user's face
CN109410253A (zh) * 2018-11-06 2019-03-01 北京字节跳动网络技术有限公司 用于生成信息的方法和装置
CN109961507A (zh) * 2019-03-22 2019-07-02 腾讯科技(深圳)有限公司 一种人脸图像生成方法、装置、设备及存储介质

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8391642B1 (en) * 2008-05-12 2013-03-05 Hewlett-Packard Development Company, L.P. Method and system for creating a custom image
KR20120005587A (ko) * 2010-07-09 2012-01-17 삼성전자주식회사 컴퓨터 시스템에서 얼굴 애니메이션 생성 방법 및 장치
US10521892B2 (en) * 2016-08-31 2019-12-31 Adobe Inc. Image lighting transfer via multi-dimensional histogram matching
CN114694221B (zh) * 2016-10-31 2024-06-18 谷歌有限责任公司 基于学习的面部重建方法
US10878612B2 (en) * 2017-04-04 2020-12-29 Intel Corporation Facial image replacement using 3-dimensional modelling techniques
CN107292813B (zh) * 2017-05-17 2019-10-22 浙江大学 一种基于生成对抗网络的多姿态人脸生成方法
US10565758B2 (en) * 2017-06-14 2020-02-18 Adobe Inc. Neural face editing with intrinsic image disentangling
CN108229381B (zh) * 2017-12-29 2021-01-08 湖南视觉伟业智能科技有限公司 人脸图像生成方法、装置、存储介质和计算机设备
US10839585B2 (en) * 2018-01-05 2020-11-17 Vangogh Imaging, Inc. 4D hologram: real-time remote avatar creation and animation control
US10896535B2 (en) * 2018-08-13 2021-01-19 Pinscreen, Inc. Real-time avatars using dynamic textures
WO2020037679A1 (zh) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 视频处理方法、装置及电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6492986B1 (en) * 1997-06-02 2002-12-10 The Trustees Of The University Of Pennsylvania Method for human face shape and motion estimation based on integrating optical flow and deformable models
US20180197330A1 (en) * 2017-01-10 2018-07-12 Ditto Technologies, Inc. Modeling of a user's face
CN106910247A (zh) * 2017-03-20 2017-06-30 厦门幻世网络科技有限公司 用于生成三维头像模型的方法和装置
CN109410253A (zh) * 2018-11-06 2019-03-01 北京字节跳动网络技术有限公司 用于生成信息的方法和装置
CN109961507A (zh) * 2019-03-22 2019-07-02 腾讯科技(深圳)有限公司 一种人脸图像生成方法、装置、设备及存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239857A (zh) * 2021-05-27 2021-08-10 京东科技控股股份有限公司 视频合成方法及其装置
CN113239857B (zh) * 2021-05-27 2023-11-03 京东科技控股股份有限公司 视频合成方法及其装置
CN113609900A (zh) * 2021-06-25 2021-11-05 南京信息工程大学 局部生成人脸定位方法、装置、计算机设备和存储介质
CN113609900B (zh) * 2021-06-25 2023-09-12 南京信息工程大学 局部生成人脸定位方法、装置、计算机设备和存储介质
CN113506232A (zh) * 2021-07-02 2021-10-15 清华大学 图像生成方法、装置、电子设备以及存储介质

Also Published As

Publication number Publication date
EP3944200B1 (en) 2024-06-19
CN109961507A (zh) 2019-07-02
US11380050B2 (en) 2022-07-05
EP3944200A4 (en) 2022-05-18
EP3944200A1 (en) 2022-01-26
CN109961507B (zh) 2020-12-18
US20210241521A1 (en) 2021-08-05

Similar Documents

Publication Publication Date Title
WO2020192568A1 (zh) 人脸图像生成方法、装置、设备及存储介质
US10789453B2 (en) Face reenactment
US10540817B2 (en) System and method for creating a full head 3D morphable model
US11393152B2 (en) Photorealistic real-time portrait animation
US10839586B1 (en) Single image-based real-time body animation
US9314692B2 (en) Method of creating avatar from user submitted image
CN113838176B (zh) 模型的训练方法、三维人脸图像生成方法及设备
CN109325990B (zh) 图像处理方法及图像处理装置、存储介质
KR20230165350A (ko) 모바일 디바이스에서 사실적인 머리 회전들 및 얼굴 애니메이션 합성을 위한 방법들 및 시스템들
CN114429518A (zh) 人脸模型重建方法、装置、设备和存储介质
CN116051722A (zh) 三维头部模型重建方法、装置及终端
WO2021173489A1 (en) Apparatus, method, and system for providing a three-dimensional texture using uv representation
US20240233146A1 (en) Image processing using neural networks, with image registration
CN111047509B (zh) 一种图像特效处理方法、装置及终端
CN110689602A (zh) 三维人脸重建方法、装置、终端及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20779223

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2020779223

Country of ref document: EP