CN111461959B - Face emotion synthesis method and device - Google Patents

Face emotion synthesis method and device

Info

Publication number: CN111461959B
Application number: CN202010095755.XA
Authority: CN (China)
Prior art keywords: image, face, face image, contour, synthesized
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111461959A
Inventors: 沈海斌 (Shen Haibin), 孔家慧 (Kong Jiahui), 黄科杰 (Huang Kejie)
Current Assignee: Zhejiang University ZJU
Original Assignee: Zhejiang University ZJU
Priority and filing date: 2020-02-17
Application filed by Zhejiang University ZJU
Publication of CN111461959A: 2020-07-28
Publication of CN111461959B (grant): 2023-04-25


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a face emotion synthesis method and device. The method acquires a color image of the current frame, extracts a face image, and adjusts it to a preset size; detects a plurality of preset key point positions of the face image and draws the contour of each facial part according to the key point positions, obtaining a face contour image; inputs the face image, the face contour image, and a target emotion label into a first-stage convolutional neural network to obtain a coarse synthesized face image; feeds the residual image between the coarse synthesized face image and the original face image into a second-stage convolutional neural network, which predicts an image mask; and computes a corrected synthesized face image from the coarse synthesized face image, the face image, and the image mask. Natural and lifelike face images or face videos with the target emotion can be synthesized under a variety of ambient illumination, face occlusion, and extreme pose conditions.

Description

Face emotion synthesis method and device
Technical Field
The invention belongs to the technical field of facial emotion synthesis, and particularly relates to a facial emotion synthesis method and device.
Background
Facial emotion synthesis refers to changing the emotional expression of a person in a given image or video by technical means, for example to neutral, happy, surprised, or sad. Facial emotion synthesis has many entertainment applications in image editing, photography, and short-video software, and has commercial value in picture production and in film and television production. However, existing face emotion synthesis is not mature enough: it is mainly used in a few special-effects apps, and its practical applicability is weak. The prior art mainly has the following defects: (1) the synthesized emotions are not rich enough; (2) videos synthesized by frame-by-frame processing lack temporal continuity, and the emotion expressions that existing methods can synthesize are relatively uniform; for example, after a video of a person giving a speech is processed, the content of the original speech cannot be preserved, so the result is not natural enough, which limits applications in short-video and in film and television production; (3) under complex illumination, face occlusion, and large head poses, the synthesis is unstable and robustness is poor.
Disclosure of Invention
To solve the above technical problems, the invention provides a method that synthesizes natural and lifelike face images or face videos with a target emotion under a variety of ambient illumination, face occlusion, and extreme pose conditions, and designs a device implementing the method.
The invention adopts the following technical scheme:
a face emotion synthesis method, comprising:
step S101, acquiring a current frame color image;
step S102, extracting a face image from the current frame color image, and adjusting the face image to a preset size;
step S103, obtaining a face contour image according to a plurality of preset key point positions of the face image;
step S104, setting a target emotion label and inputting the face image, the face contour image, and the target emotion label into a first-stage convolutional neural network to obtain a coarse synthesized face image; the target emotion label specifies the desired emotion of the coarse synthesized face image;
step S105, taking the difference between the coarse synthesized face image and the face image to obtain a residual image, and inputting the residual image into a second-stage convolutional neural network to obtain an image mask;
and step S106, combining the coarse synthesized face image and the face image using the image mask to obtain the final corrected synthesized face image.
As a preferred aspect of the present invention, the first-stage convolutional neural network comprises an image encoder, a contour encoder, an image decoder, and a contour decoder. The image encoder and the contour encoder each consist of several downsampling layers; the adjusted face image is input to the image encoder, while the face contour image is concatenated with the emotion label and input to the contour encoder; the encoded features output by the two encoders are concatenated and processed by several cascaded residual blocks to obtain mixed features. The image decoder comprises several upsampling layers and concatenation layers, where each upsampling layer is followed by a concatenation layer and the last concatenation layer is connected to the output layer; the contour decoder consists of several upsampling layers, with the last upsampling layer connected to the output layer. The mixed features are input to the image decoder; after each upsampling layer, the resulting features are concatenated with the same-size features computed by the image encoder, yielding the coarse synthesized face image. The mixed features are also input to the contour decoder to obtain a synthesized face contour image.
As a preferred aspect of the present invention, the second-stage convolutional neural network comprises several residual blocks and a convolutional layer. The adjusted face image is subtracted from the coarse synthesized face image to obtain a residual image; the residual image is input to several cascaded residual blocks and finally processed by a convolutional layer to obtain the image mask.
For the above face emotion synthesis method, the invention further discloses a face emotion synthesis device comprising an image acquisition module, a face extraction module, a contour extraction module, a coarse synthesis module, and a correction module. The image acquisition module acquires a color image of the current frame; the face extraction module extracts a face image from the current frame color image and adjusts its size; the contour extraction module detects a plurality of key point coordinates in the face image and draws the face contour image; the coarse synthesis module processes the adjusted face image, the face contour image, and a target emotion label with the first-stage convolutional neural network to obtain a coarse synthesized face image, where the target emotion label specifies the desired emotion of the coarse synthesized face image; the correction module processes the residual between the adjusted face image and the coarse synthesized face image with the second-stage convolutional neural network to obtain an image mask, and computes the final corrected synthesized face image from the image mask.
Compared with the prior art, the invention has the beneficial effects that:
the scheme of the invention acquires a current frame color image, extracts a face image from the image, detects a plurality of key point coordinates of the face image, draws a face contour image, processes the face image by using a first-stage convolution neural network, acquires a rough synthesized face image by using the face contour image and a target emotion label, processes the rough synthesized face image and a residual image of the face image by using a second-stage convolution neural network, acquires an image mask, and finally calculates to acquire a final corrected synthesized face image. According to the scheme, the robustness of the rough synthesized face image under the conditions of complex illumination, face shielding and extreme gesture is improved through the face contour image and the first-stage convolution neural network. In addition, the cascade connection of the first-stage convolutional neural network and the second-stage convolutional neural network improves the image consistency of the synthesized video obtained after video processing. According to the scheme, natural and lifelike emotion expression of the person can be synthesized under any image or video shooting environment and any gesture.
Drawings
FIG. 1 is a flowchart of a face emotion synthesis method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a face emotion synthesis device according to an embodiment of the present invention;
FIG. 3 is a block diagram of the first-stage convolutional neural network of the present invention; in the figure: 31 image encoder, 32 contour encoder, 33 image decoder, 34 contour decoder;
FIG. 4 is a block diagram of the second-stage convolutional neural network of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
As shown in FIG. 1, a first aspect of the present invention provides a face emotion synthesis method S100, comprising:
step S101, acquiring a current frame color image;
in a specific embodiment of the present invention, a previously photographed image or video clip may be provided, or a current frame color image may be directly obtained through a camera.
Step S102, extracting a face image from the current frame color image, and adjusting the face image to a preset size;
in a specific embodiment of the present invention, face detection is performed using a face detector of a machine learning library such as OpenCV or Dlib, and the detected face image is adjusted to a preset size after the detected face image is acquired. The preset size may be set to m×m (e.g., 128×128), where M is an integer greater than zero.
It should be noted that the extracted face image may contain some background information; it is not restricted to the face region alone, and in general it should include the person's head and part of the background.
Step S103, detecting a plurality of preset key point positions in the face image and drawing the contour of each facial part on a blank image according to the key point positions, obtaining a face contour image;
in one embodiment of the present invention, the face_alignment library is a machine learning library written by python and dedicated to detecting face keypoints, which is used to obtain the coordinates of 68 face keypoints and then to draw a face contour image. The face contour image should correspond to the adjusted face image, i.e. the same size, and the coordinates of the key points.
Step S104, setting a target emotion label and inputting the face image, the face contour image, and the target emotion label into the first-stage convolutional neural network to obtain a coarse synthesized face image; the target emotion label specifies the desired emotion of the coarse synthesized face image;
in a specific embodiment of the present invention, the structure of the first-stage convolutional neural network is shown in fig. 3, the face image is input to an image encoder, and the face contour image and the target emotion label are spliced and then input to the contour encoder; splicing the coding vectors obtained by coding the two encoders, and then processing a plurality of residual blocks to obtain a mixed characteristic, wherein the number of the residual blocks is 3; the mixed features are input to an image decoder, the features obtained at present and the features with the same size obtained by encoding of the encoder are spliced after each layer of up-sampling, the spliced features are input to the next layer of up-sampling layer, and finally a coarse synthesized face image is obtained; optionally, the mixed features are input to a contour decoder to obtain a synthesized face contour image.
The training method of the first-stage convolutional neural network is as follows. A public dataset with expression labels is acquired, and every image in the dataset is preprocessed (the face image is extracted, scaled to the preset size, and the corresponding face contour image is drawn) to obtain face images and face contour images of the preset size. The first-stage convolutional neural network comprises an image encoder, a contour encoder, an image decoder, a contour decoder, and several residual blocks. During training, the network receives a face image, a face contour image, and a target emotion label as inputs, and outputs a coarse synthesized face image together with its corresponding synthesized face contour image. Training follows the adversarial (GAN) paradigm: in addition to the first-stage network, two further convolutional neural networks act as discriminators that judge the realism and the emotion label of the coarse synthesized face image and of its contour image. The coarse synthesized face image, its contour image, and the emotion label of the original face image are also fed back into the first-stage network so that it learns to recover the original face image and face contour image. The loss function is then computed and the model is optimized with the Adam optimizer; the learning rate of all networks may be 0.0001, the total number of iterations may be 300,000, and results may be output every 1,000 iterations for inspection on a test dataset preprocessed into face images and face contour images of the preset size. The contour decoder of the first-stage network can be discarded at test time and in actual use.
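Below is a heavily condensed sketch of one generator update under this training scheme, assuming the FirstStageNet of the previous sketch. The two discriminators D_img and D_cnt, the binary cross-entropy losses, and the cycle-loss weight are assumptions; the discriminators' own updates and their emotion-label supervision are omitted for brevity.

```python
import torch
import torch.nn.functional as F

# G is the FirstStageNet from the previous sketch; D_img and D_cnt are assumed
# discriminators returning real/fake logits for the face and contour outputs.
G = FirstStageNet()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)  # learning rate per the description

def generator_step(face, contour, src_label, tgt_label, D_img, D_cnt):
    coarse, coarse_cnt = G(face, contour, tgt_label)
    # Adversarial terms: the generator tries to make both outputs look real.
    s_img, s_cnt = D_img(coarse), D_cnt(coarse_cnt)
    loss_adv = (F.binary_cross_entropy_with_logits(s_img, torch.ones_like(s_img)) +
                F.binary_cross_entropy_with_logits(s_cnt, torch.ones_like(s_cnt)))
    # Cycle term: feeding the coarse outputs and the source emotion label back
    # through G must recover the original face and contour images.
    recon, recon_cnt = G(coarse, coarse_cnt, src_label)
    loss_cyc = F.l1_loss(recon, face) + F.l1_loss(recon_cnt, contour)
    loss = loss_adv + 10.0 * loss_cyc  # cycle weight is an assumption
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```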
Here the target emotion label specifies the emotional expression of the corresponding face image, including but not limited to neutral, happy, surprised, sad, angry, disgusted, and fearful. For example, the target emotion label may be the one-hot vector (0 neutral, 1 happy, 0 surprised, 0 sad, 0 angry, 0 disgusted, 0 fearful), in which case the emotional expression of the corresponding face image is happy.
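For illustration, the label of the example above can be built as a one-hot tensor suitable for the first-stage network; the class ordering here is an assumption, as the patent fixes seven emotions but not their order.

```python
import torch

# Assumed class ordering for the seven emotions.
EMOTIONS = ["neutral", "happy", "surprised", "sad", "angry", "disgusted", "fearful"]

def emotion_label(name):
    label = torch.zeros(1, len(EMOTIONS))   # a batch of one
    label[0, EMOTIONS.index(name)] = 1.0    # "happy" -> [0, 1, 0, 0, 0, 0, 0]
    return label
```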
Step S105, subtracting the face image from the coarse synthesized face image to obtain a residual image, and processing the residual image with the second-stage convolutional neural network to predict an image mask. The structure of the second-stage convolutional neural network, shown in FIG. 4, consists of several residual blocks and a convolutional layer.
In a specific embodiment of the present invention, the residual image obtained by differencing the coarse synthesized face image and the face image is processed by several residual blocks and then by a convolutional layer, which predicts the final image mask.
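A minimal PyTorch sketch of this second-stage network, following FIG. 4, is given below; the block count, channel width, input-lifting convolution, and sigmoid output are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):  # plain residual block, as in the first-stage sketch
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class SecondStageNet(nn.Module):
    def __init__(self, n_blocks=3, width=64):
        super().__init__()
        self.head = nn.Conv2d(3, width, 3, padding=1)   # lift the residual image
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(n_blocks)])
        self.mask = nn.Conv2d(width, 1, 3, padding=1)   # single-channel mask

    def forward(self, coarse, face):
        residual = coarse - face                        # step S105 input
        x = self.blocks(self.head(residual))
        return torch.sigmoid(self.mask(x))              # mask values in [0, 1]
```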
The training method of the second-stage convolutional neural network is as follows. A public dataset with expression labels is acquired, and a first-stage convolutional neural network trained according to the method above is used to produce coarse synthesized images. The corresponding face image is subtracted from each coarse synthesized image to obtain a residual image; the residual image is processed by the second-stage convolutional neural network (several residual blocks followed by one convolutional layer) to obtain an image mask, and the final corrected synthesized face image is computed as in step S106. Training again follows the adversarial (GAN) paradigm: an additional convolutional neural network judges the realism of the final corrected synthesized face image. The loss function is then computed and the model is optimized with the Adam optimizer; the learning rate of all networks may be 0.0001, the total number of iterations may be 10,000, and results may be output every 1,000 iterations for inspection on residual images obtained from a preprocessed test dataset.
Step S106, combining the coarse synthesized face image and the face image using the image mask to obtain the final corrected synthesized face image;
the final corrected composite face image satisfies the following relationship:
I=Isrc*(1-Mask)+Isyn*Mask
where I is the final corrected synthesized face image, Isrc is the adjusted face image, Isyn is the coarse synthesized face image, and Mask is the image mask.
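In code, this correction reduces to a single broadcasted tensor operation; the sketch assumes mask values in [0, 1] and arrays whose channel axes broadcast (e.g. a 1-channel mask against a 3-channel image).

```python
def correct(face, coarse, mask):
    """I = Isrc * (1 - Mask) + Isyn * Mask; the 1-channel mask broadcasts over RGB."""
    return face * (1 - mask) + coarse * mask
```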
The face emotion synthesis method of the invention can synthesize richer emotions. Because it is assisted by face keypoints and face contour information, it adapts to complex illumination environments, face occlusion, and extreme pose conditions with good robustness. The cascade of two convolutional neural networks further refines the result, so that more natural and lifelike images, or more coherent videos, can be synthesized.
As shown in FIG. 2, a second aspect of the present invention provides a face emotion synthesis device 20, comprising:
an image acquisition module 21 for acquiring a color image of a current frame;
a face extraction module 22, configured to extract a face image from the color image of the current frame and adjust the size;
the contour extraction module 23 is used for detecting a plurality of key point coordinates from the face image and drawing the face contour image;
the coarse synthesis module 24, configured to process the adjusted face image, the face contour image, and a target emotion label with the first-stage convolutional neural network to obtain a coarse synthesized face image, where the target emotion label specifies the desired emotion of the coarse synthesized face image;
and the correction module 25, configured to process the residual between the adjusted face image and the coarse synthesized face image with the second-stage convolutional neural network to obtain an image mask, and to compute the final corrected synthesized face image from the mask.
In a specific embodiment of the present invention, the face extraction module includes:
an extraction unit, for extracting the face image from the current frame color image;
an adjusting unit, for adjusting the face image to a preset size.
In a specific embodiment of the present invention, the contour extraction module includes:
a detection unit, for detecting the coordinates of 68 key points in the adjusted face image;
a drawing unit, for creating a blank image of the preset size and drawing the contours of the corresponding facial parts according to the 68 key point coordinates.
In a specific embodiment of the present invention, the coarse synthesis module includes:
a synthesis unit, for processing the adjusted face image, the face contour image, and the target emotion label with the first-stage convolutional neural network to obtain a coarse synthesized face image; the target emotion label is preset and input to the synthesis unit, and the coarse synthesized face image carries the emotion corresponding to the target emotion label.
In a specific embodiment of the present invention, the correction module includes:
a residual calculation unit, for computing the residual image between the coarse synthesized face image and the adjusted face image;
a prediction unit, for processing the residual image with the second-stage convolutional neural network and predicting the image mask;
a correction unit, for computing the final corrected synthesized face image from the coarse synthesized face image, the adjusted face image, and the predicted image mask.
In one embodiment of the present invention, the face emotion synthesis device 20 works as follows. The image acquisition module acquires a color image of the current frame; the extraction unit and the adjusting unit, connected in sequence, extract a face image from the current frame color image and adjust it to the preset size. The output of the adjusting unit is connected to the input of the detection unit; the detection unit obtains the key point coordinates of the adjusted face image and passes them to the drawing unit, which draws the contours of the corresponding facial parts from the face key point coordinates. The outputs of the drawing unit and the adjusting unit are both connected to the synthesis unit, which additionally has an input port for the target emotion label and is loaded with the trained first-stage convolutional neural network model. The output of the synthesis unit is connected to the correction module, which is loaded with the trained second-stage convolutional neural network model and finally produces the corrected synthesized face image.
The face emotion synthesis device provided by the embodiment of the invention can be applied in the related embodiments of the face emotion synthesis method; the details of the method are described above and are not repeated here.
It is to be understood that the above embodiments merely illustrate the application of the principles of the present invention and do not limit it. Modifications of the above-described embodiments, or equivalent substitutions of some of their features, will be apparent to those of ordinary skill in the art and are intended to fall within the scope of the present invention.

Claims (10)

1. A method of face emotion synthesis, comprising:
step S101, acquiring a current frame color image;
step S102, extracting a face image from the current frame color image, and adjusting the face image to a preset size;
step S103, obtaining a face contour image according to a plurality of preset key point positions of the face image;
step S104, setting a target emotion label and inputting the face image, the face contour image, and the target emotion label into a first-stage convolutional neural network to obtain a coarse synthesized face image; the target emotion label specifies the desired emotion of the coarse synthesized face image;
step S105, taking the difference between the coarse synthesized face image and the face image to obtain a residual image, and inputting the residual image into a second-stage convolutional neural network to obtain an image mask;
and step S106, combining the coarse synthesized face image and the face image using the image mask to obtain the final corrected synthesized face image.
2. The face emotion synthesis method according to claim 1, wherein step S103 specifically comprises: detecting the coordinates of 68 key points of the adjusted face image, and drawing the contour of each facial part on a blank image of the preset size according to the 68 key point coordinates, obtaining the face contour image.
3. The face emotion synthesis method according to claim 1, wherein the first-stage convolutional neural network comprises an image encoder, a contour encoder, an image decoder, and a contour decoder; the image encoder and the contour encoder each consist of several downsampling layers; the adjusted face image is input to the image encoder, while the face contour image is concatenated with the emotion label and input to the contour encoder; the encoded features output by the two encoders are concatenated and processed by several cascaded residual blocks to obtain mixed features;
the image decoder comprises several upsampling layers and concatenation layers, where each upsampling layer is followed by a concatenation layer and the last concatenation layer is connected to the output layer; the contour decoder consists of several upsampling layers, with the last upsampling layer connected to the output layer; the mixed features are input to the image decoder, and after each upsampling layer the resulting features are concatenated with the same-size features computed by the image encoder, yielding the coarse synthesized face image; the mixed features are also input to the contour decoder to obtain a synthesized face contour image.
4. The face emotion synthesis method according to claim 1, wherein the second-stage convolutional neural network comprises several residual blocks and a convolutional layer; the adjusted face image is subtracted from the coarse synthesized face image to obtain a residual image; the residual image is input to several cascaded residual blocks and finally processed by a convolutional layer to obtain the image mask.
5. The face emotion synthesis method according to claim 1, characterized in that in step S106, the final corrected synthesized face image satisfies the following relation:
I=Isrc*(1-Mask)+Isyn*Mask
where I is the final corrected synthesized face image, Isrc is the adjusted face image, Isyn is the coarse synthesized face image, and Mask is the image mask.
6. A facial emotion synthesizing device, characterized by comprising:
the image acquisition module is used for acquiring a color image of the current frame;
the face extraction module is used for extracting a face image from the color image of the current frame and adjusting the size of the face image;
the contour extraction module is used for detecting a plurality of key point coordinates from the face image and drawing the face contour image;
the coarse synthesis module, for processing the adjusted face image, the face contour image, and a target emotion label with the first-stage convolutional neural network to obtain a coarse synthesized face image, wherein the target emotion label specifies the desired emotion of the coarse synthesized face image;
and the correction module, for processing the residual between the adjusted face image and the coarse synthesized face image with the second-stage convolutional neural network to obtain an image mask, and computing the final corrected synthesized face image from the image mask.
7. The facial emotion synthesis apparatus of claim 6, wherein said facial extraction module comprises:
an extraction unit, for extracting the face image from the current frame color image;
an adjusting unit, for adjusting the face image to a preset size.
8. The facial emotion synthesis device as recited in claim 6, said contour extraction module comprising:
a detection unit, for detecting key point coordinates in the adjusted face image;
a drawing unit, for creating a blank image of the preset size and drawing the contours of the corresponding facial parts according to the key point coordinates.
9. The facial emotion synthesis device as recited in claim 6, said correction module comprising:
a residual calculation unit, for computing the residual image between the coarse synthesized face image and the adjusted face image;
a prediction unit, for processing the residual image with the second-stage convolutional neural network and predicting the image mask;
a correction unit, for computing the final corrected synthesized face image from the coarse synthesized face image, the adjusted face image, and the predicted image mask.
10. The facial emotion synthesis device as recited in claim 6, wherein said first level convolutional neural network comprises an image encoder, a contour encoder, an image decoder, and a contour decoder;
the image encoder and the contour encoder each consist of several downsampling layers, and the outputs of the image encoder and the contour encoder are connected in sequence to a concatenation layer and several cascaded residual blocks;
the image decoder comprises several upsampling layers and concatenation layers, where each upsampling layer is followed by a concatenation layer and the last concatenation layer is connected to the output layer; the contour decoder consists of several upsampling layers, with the last upsampling layer connected to the output layer;
the second-stage convolutional neural network is formed by sequentially connecting an input layer, a plurality of cascaded residual blocks, a convolutional layer and an output layer.
CN202010095755.XA (priority and filing date 2020-02-17): Face emotion synthesis method and device. Status: Active. Granted as CN111461959B.

Priority Applications (1)

CN202010095755.XA (priority date 2020-02-17, filing date 2020-02-17): Face emotion synthesis method and device

Applications Claiming Priority (1)

CN202010095755.XA (priority date 2020-02-17, filing date 2020-02-17): Face emotion synthesis method and device

Publications (2)

CN111461959A: published 2020-07-28
CN111461959B: granted and published 2023-04-25

Family

ID=71680899

Family Applications (1)

CN202010095755.XA (Active), priority and filing date 2020-02-17: Face emotion synthesis method and device

Country Status (1)

CN: CN111461959B

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101320A *, priority date 2020-11-18, published 2020-12-18, 北京世纪好未来教育科技有限公司: Model training method, image generation method, device, equipment and storage medium


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number; Priority date; Publication date; Assignee; Title
CN107067429A *; 2017-03-17; 2017-08-18; 徐迪; Video editing system and method for deep-learning-based face three-dimensional reconstruction and face replacement
US10552977B1 *; 2017-04-18; 2020-02-04; Twitter, Inc.; Fast face-morphing using neural networks
CN108460812A *; 2018-04-04; 2018-08-28; 北京红云智胜科技有限公司; Deep-learning-based expression pack generation system and method
CN109087379A *; 2018-08-09; 2018-12-25; 北京华捷艾米科技有限公司; Facial expression transfer method and facial expression transfer device
CN109151340A *; 2018-08-24; 2019-01-04; 太平洋未来科技(深圳)有限公司; Video processing method and device, and electronic equipment
CN109840477A *; 2019-01-04; 2019-06-04; 苏州飞搜科技有限公司; Occluded face recognition method and device based on feature transformation
CN110046551A *; 2019-03-18; 2019-07-23; 中国科学院深圳先进技术研究院; Generation method and equipment for a face recognition model
CN110427867A *; 2019-07-30; 2019-11-08; 华中科技大学; Facial expression recognition method and system based on a residual attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
井长兴; 章东平; 杨力. Research on face key point localization with cascaded neural networks (级联神经网络人脸关键点定位研究). Journal of China Jiliang University, 2018, No. 2, pp. 81-87. *

Also Published As

CN111461959A: published 2020-07-28

Similar Documents

Publication; Title
CN110490896B (en) Video frame image processing method and device
US8655152B2 (en) Method and system of presenting foreign films in a native language
CN110659573B (en) Face recognition method and device, electronic equipment and storage medium
CN110033463B (en) Foreground data generation and application method thereof, and related device and system
CN109117753B (en) Part recognition method, device, terminal and storage medium
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
WO2023066173A1 (en) Image processing method and apparatus, and storage medium and electronic device
CN111988657A (en) Advertisement insertion method and device
CN113808005A (en) Video-driving-based face pose migration method and device
CN111461959B (en) Face emotion synthesis method and device
CN115471886A (en) Digital person generation method and system
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
Mattos et al. Multi-view mouth renderization for assisting lip-reading
KR101124560B1 (en) Automatic object processing method in movie and authoring apparatus for object service
CN110062132B (en) Theater performance reconstruction method and device
CN114898447B (en) Personalized fixation point detection method and device based on self-attention mechanism
CN114943746A (en) Motion migration method utilizing depth information assistance and contour enhancement loss
CN110569707A (en) identity recognition method and electronic equipment
CN113766130B (en) Video shooting method, electronic equipment and device
CN113256541B (en) Method for removing water mist from drilling platform monitoring picture by machine learning
US20220207261A1 (en) Method and apparatus for detecting associated objects
CN114627404A (en) Intelligent video character replacing method and system
Liu et al. Image inpainting algorithm based on KSVD and improved CDD
CN112232302A (en) Face recognition method
Zeng et al. Highly fluent sign language synthesis based on variable motion frame interpolation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant