WO2023020358A1 - Facial image processing method, training method for facial image processing model, apparatus, device, storage medium, and program product - Google Patents

Facial image processing method, training method for facial image processing model, apparatus, device, storage medium, and program product

Info

Publication number: WO2023020358A1 (PCT/CN2022/111744)
Authority: WO (WIPO/PCT)
Prior art keywords: facial, image, face, feature, dimensional
Other languages: English (en), French (fr)
Inventors: 贺珂珂, 朱俊伟, 赵艳丹, 陈旭, 邰颖, 汪铖杰, 李季檩, 黄飞跃
Original assignee: 腾讯科技(深圳)有限公司
Application: PCT/CN2022/111744, filed by 腾讯科技(深圳)有限公司
Priority: JP2022565902A (JP7500768B2), KR1020227041706A (KR20230028253A), US18/070,301 (US20230100427A1)
Publication: WO2023020358A1

Classifications

    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 2207/30201 Subject of image: Face
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/755 Deformable models or variational models, e.g. snakes or active contours
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 20/653 Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of computer technology, and in particular to a facial image processing method, a training method for a facial image processing model, an apparatus, a device, a storage medium, and a program product.
  • Facial image processing methods in the related art consider only the preservation of facial identity, so the accuracy of facial image processing is low.
  • The embodiments of the present application propose a facial image processing method, a training method for a facial image processing model, an apparatus, a computer device, a computer-readable storage medium, and a computer program product, which improve the accuracy of facial image processing.
  • The embodiment of the present application provides a facial image processing method, executed by a computer device, including:
  • acquiring a facial image of a source face and a facial template image of a template face;
  • performing three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image;
  • fusing the three-dimensional facial image features and the three-dimensional facial template image features to obtain three-dimensional fusion features;
  • performing face replacement feature extraction on the facial image based on the facial template image to obtain initial face replacement features;
  • converting the initial face replacement features based on the three-dimensional fusion features to obtain target face replacement features;
  • replacing the template face in the facial template image with the source face based on the target face replacement features to obtain a replaced facial image.
  • The embodiment of the present application also provides a facial image processing device, including:
  • a first acquiring unit configured to acquire a facial image of a source face and a facial template image of a template face;
  • a three-dimensional facial modeling unit configured to perform three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image;
  • a first fusion unit configured to fuse the three-dimensional facial image features and the three-dimensional facial template image features to obtain three-dimensional fusion features;
  • a feature extraction unit configured to perform face replacement feature extraction on the facial image based on the facial template image to obtain initial face replacement features;
  • a conversion unit configured to convert the initial face replacement features based on the three-dimensional fusion features to obtain target face replacement features;
  • a first replacement unit configured to replace the template face in the facial template image with the source face based on the target face replacement features to obtain a replaced facial image.
  • The embodiment of the present application also provides a training method for a facial image processing model, including:
  • acquiring a training image sample group, the training image sample group including facial image samples, facial template image samples, and facial reference image samples;
  • replacing, by the facial image processing model, the template face in the facial template image sample with the source face in the facial image sample to obtain a predicted facial image;
  • performing three-dimensional facial contour point detection on the predicted facial image and on the facial reference image sample to obtain their respective three-dimensional facial contour points;
  • obtaining the difference between the two sets of three-dimensional facial contour points as the facial contour loss between the predicted facial image and the facial reference image sample;
  • updating the model parameters of the facial image processing model based on the facial contour loss.
  • the embodiment of the present application also provides a training device for a facial image processing model, including:
  • the second acquisition unit is configured to acquire a training image sample group, the training image sample group including facial image samples, facial template image samples and facial reference image samples;
  • the second replacement unit is configured to use a face image processing model to replace the template face in the face template image sample with the source face in the face image sample to obtain a predicted face image;
  • a three-dimensional facial contour point detection unit configured to perform three-dimensional facial contour point detection on the predicted facial image to obtain the three-dimensional facial contour points of the predicted facial image, and to perform three-dimensional facial contour point detection on the facial reference image sample to obtain the three-dimensional facial contour points of the facial reference image sample;
  • a calculation unit configured to obtain the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample, so as to obtain the facial contour loss between the predicted facial image and the facial reference image sample;
  • an adjustment unit configured to update the model parameters of the facial image processing model based on the facial contour loss.
  • The embodiment of the present application also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them to implement the method provided by the embodiments of the present application.
  • the embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the foregoing method provided in the embodiment of the present application is implemented.
  • The embodiment of the present application also provides a computer device, and the computer device includes:
  • a memory configured to store executable instructions;
  • a processor configured to implement the method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • Through three-dimensional facial modeling, the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image are obtained. Since the extracted image features are three-dimensional, the facial contour in the facial image, that is, the facial form, can be preserved, ensuring that the facial contour is consistent before and after face replacement. The three-dimensional facial image features and the three-dimensional facial template image features are fused to obtain three-dimensional fusion features, so that the facial image obtained after face replacement carries both the characteristics of the facial image and the characteristics of the facial template image.
  • Based on the facial template image, face replacement features are extracted from the facial image to obtain initial face replacement features, and the initial face replacement features are converted based on the three-dimensional fusion features to obtain target face replacement features.
  • Since the conversion is based on the three-dimensional fusion features obtained by fusing the three-dimensional facial image features and the three-dimensional facial template image features, the converted target face replacement features carry both facial contour features and facial identity features. The facial image obtained by replacing the template face in the facial template image with the source face based on the target face replacement features therefore has a more natural and realistic appearance, which improves the accuracy of facial image processing.
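Taken together, these steps form a single inference pipeline. The sketch below is a minimal orchestration of that flow, assuming hypothetical callables (`reconstruct_3d`, `encoder`, `adain`, `generator`) for the networks the application describes; the names and the dict-based coefficient layout are illustrative assumptions, not details from the application.

```python
import torch

def swap_face(face_img, template_img, reconstruct_3d, encoder, adain, generator):
    """Hedged sketch of the claimed pipeline; all module names are assumptions."""
    # 3D facial modeling of both images (e.g. 3DMM-style coefficients).
    src_3d = reconstruct_3d(face_img)       # source: identity, expression, ...
    tpl_3d = reconstruct_3d(template_img)   # template: expression, texture, ...

    # Fuse the source identity with the template's attribute coefficients.
    fused_3d = torch.cat([src_3d["id"], tpl_3d["exp"], tpl_3d["tex"],
                          tpl_3d["angle"], tpl_3d["light"]], dim=-1)

    # Extract the initial face replacement features in the 2D latent space.
    z = encoder(face_img, template_img)

    # Convert them with the 3D fusion features (AdaIN-style injection).
    z = adain(z, fused_3d)

    # Decode into the replaced facial image.
    return generator(z)
```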
  • FIG. 1 is a schematic diagram of a scene of the method provided by the embodiment of the present application.
  • FIG. 2 is a schematic flow chart of the facial image processing method provided by the embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scene of the facial image processing method provided by the embodiment of the present application.
  • FIG. 4 is another schematic diagram of a scene of the facial image processing method provided by the embodiment of the present application.
  • FIG. 5 is a schematic flow chart of the training method of the facial image processing model provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a scene of the training method of the facial image processing model provided by the embodiment of the present application.
  • FIG. 7 is another schematic flow chart of the facial image processing method provided by the embodiment of the present application.
  • FIG. 8 is another schematic flow chart of the training method of the facial image processing model provided by the embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a facial image processing device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a training device for a facial image processing model provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the embodiment of the present application proposes a facial image processing method, which can be executed by a facial image processing device, and the facial image processing device can be integrated into a computer device.
  • the computer device may include at least one of a terminal and a server.
  • The embodiment of the present application also proposes a training method for the facial image processing model, so that the trained facial image processing model can be used to perform the facial image processing method proposed by the embodiment of the present application.
  • the facial image processing model training method provided in the embodiment of the present application may be executed by a facial image processing model training device, and the facial image processing model training device may be integrated in a computer device.
  • the computer device may include at least one of a terminal and a server.
  • The terminal can be a smartphone, a tablet computer, a notebook computer, a personal computer (PC), a smart home device, a wearable electronic device, a virtual reality (VR) or augmented reality (AR) device, an on-board computer, and so on.
  • The server can be an interworking server or background server among multiple heterogeneous systems, an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
  • the facial image processing apparatus can be integrated on a computer device such as a terminal or a server to implement the facial image processing method proposed in the embodiment of the present application.
  • The computer device can acquire the facial image of the source face and the facial template image of the template face, and perform three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image;
  • the three-dimensional facial image features and the three-dimensional facial template image features are fused to obtain three-dimensional fusion features; based on the facial template image, face replacement feature extraction is performed on the facial image to obtain initial face replacement features; the initial face replacement features are converted based on the three-dimensional fusion features to obtain target face replacement features; and based on the target face replacement features, the template face in the facial template image is replaced with the source face to obtain the replaced facial image.
  • the training device for the facial image processing model can be integrated in a computer device such as a terminal or a server, so as to implement the training method for the facial image processing model proposed in the embodiment of the present application.
  • The computer device can acquire a training image sample group, the training image sample group comprising facial image samples, facial template image samples, and facial reference image samples;
  • the facial image processing model replaces the template face in the facial template image sample with the source face in the facial image sample to obtain a predicted facial image;
  • three-dimensional facial contour point detection is performed on the predicted facial image to obtain its three-dimensional facial contour points, and on the facial reference image sample to obtain its three-dimensional facial contour points;
  • the difference between the three-dimensional facial contour points of the predicted facial image and those of the facial reference image sample is obtained as the facial contour loss between the predicted facial image and the facial reference image sample; based on the facial contour loss, the model parameters of the facial image processing model are updated, that is, the facial image processing model is adjusted to obtain the trained facial image processing model.
  • the process of facial image processing can be regarded as replacing the face of the object in the face template image with the face of the object in the source image, which can be understood as changing the face of the face object in the face template image.
  • the so-called face-changing refers to changing the identity of the face in the face template image to the person in the source image, while keeping at least one of the elements in the face template image, such as posture, expression, makeup, and background, unchanged.
  • The facial image processing method proposed in the embodiment of this application can generally be applied in scenarios such as ID photo production, film and television portrait production, game character design, virtual avatars, and privacy protection.
  • The training process of the facial image processing model can be regarded as inputting multiple training image sample groups into the facial image processing model, so that the model can continuously learn from the sample groups and summarize the learned rules, until the template face in the facial template image can be accurately replaced with the source face of the facial image.
  • The image processing method and the training method of the facial image processing model provided by the embodiment of the present application relate to computer vision technology in the field of artificial intelligence; that is, in the embodiment of the present application, computer vision technology can be used to replace the object in the facial template image with the source object of the facial image to obtain the replaced facial image.
  • the embodiment of the present application will be described from the perspective of a facial image processing device.
  • the facial image processing device may be integrated in a computer device, and the computer device may be a server or a terminal or other equipment.
  • The facial image processing method provided by the embodiment of the present application can be implemented by the terminal or the server alone, or by the terminal and the server in cooperation. Taking the server alone as an example, as shown in FIG. 2, a facial image processing method is provided, the method comprising:
  • the server acquires the face image of the source face and the face template image of the template face.
  • the facial image includes a source object
  • the so-called source object may be an object contained in the facial image.
  • the source object may be the person corresponding to the human face image.
  • The source face is the face that provides the identity for the replacement.
  • The template face is the face object to be replaced; the facial template image also contains other elements to be maintained, such as the facial background.
  • 001 in FIG. 3 may be a face image
  • 002 in FIG. 3 may be a face template image
  • 003 in the figure may be a face image after replacement. It can be seen from FIG. 3 that the facial features of the facial image 001 are maintained in the facial image 003 after replacement, and elements such as posture, expression, makeup and background of the facial template image 002 are also maintained.
  • The facial image and the facial template image can be obtained directly, or, when their number is large or they occupy significant memory, they can also be obtained indirectly.
  • The facial image and the facial template image can be obtained as follows:
  • For direct acquisition, the original facial images uploaded by the user and the image processing information corresponding to the original facial images can be received, and the facial image of the source face and the facial template image of the template face are screened out of the original facial images according to the image processing information. Alternatively, a facial image pair is obtained from an image database or the network, one image of the pair is arbitrarily selected as the facial image of the source face, and the other image of the pair is used as the facial template image.
  • For indirect acquisition, an image processing request sent by the terminal can be received, the request carrying the storage address of the original facial images and the image processing information. According to the storage address, the original facial images are obtained from memory, cache, or a third-party database, and according to the image processing information, the facial image of the source face and the facial template image of the template face are screened out of the original facial images.
  • prompt information may also be sent to the terminal to remind the terminal that the original facial image has been successfully acquired.
  • Preprocessing can also be performed on the original facial images, thereby obtaining the facial image and the facial template image.
  • For example, the size of the original facial image is adjusted to a preset size, or facial keypoint registration can be used to align the facial objects in the original facial image to a uniform position.
  • the three-dimensional facial image features may include information indicating characteristics of the facial image in three-dimensional space.
  • the characteristics of the source facial features, facial contour, texture, angle and illumination in the facial image can be indicated by the features of the 3D facial image.
  • The three-dimensional facial template image features may include information describing the characteristics of the facial template image in three-dimensional space.
  • the features of the three-dimensional facial template image may include: at least one of the expression feature, texture feature, angle feature and illumination feature of the template face in the face template image.
  • the three-dimensional facial image features may include multiple (ie at least two) features, and these features together constitute the three-dimensional facial image features.
  • the 3D facial image features may include source facial identity features, facial contour features, source facial expression features, source facial texture features, source facial angle features, and source facial illumination features, and so on.
  • the source facial identity features include features that can explain the source facial identity in the 3D facial image.
  • Through the identity features, the source face of the facial image can be distinguished from the source faces of other facial images; that is, when the facial image includes the face of a target object, the identity features can identify the target object, making it possible to know whose face the source face of the facial image is.
  • The source facial expression features include features that can describe the expression of the source face in the facial image;
  • the source facial texture features include features that can describe the texture of the facial image;
  • the source facial angle features include features that can describe the angle of the source face in the facial image; that is to say, the source facial angle features can indicate the orientation of the source face in the facial image. For example, through the angle features, it can be known whether the source face is looking to the left or the right, or facing straight ahead;
  • the source facial illumination features include features that can describe the lightness and darkness of the facial image.
  • the three-dimensional facial template image feature may also include multiple features, and these features together constitute the three-dimensional facial template image feature.
  • the three-dimensional facial template image features may also include template facial identity features, template facial expression features, template facial texture features, template facial angle features, template facial illumination features, and so on.
  • When the template face in the facial template image is replaced, information such as the expression, texture, angle, and orientation of the template face is retained, and the identity of the template face is replaced with the identity of the source face.
  • For example, if the template face in facial template image 002 is Li Si and the source face is Zhang San, the face replacement only replaces Li Si with Zhang San; information such as Li Si's expression, texture, angle, and lighting in facial template image 002 is still preserved.
  • the three-dimensional facial image features and the three-dimensional facial template image features may have multiple representation forms.
  • the 3D facial image features and the 3D facial template image features may be vectors.
  • the three-dimensional facial image features and the three-dimensional facial template image features may be matrices, and so on.
  • the face image and the face template image may be modeled in various ways to obtain the three-dimensional face image features of the face image and the three-dimensional face template image features of the face template image.
  • the 3D face modeling model can be used to perform 3D face modeling on the face image and the face template image, so as to obtain the 3D face image features of the face image and the 3D face template image features of the face template image.
  • the three-dimensional facial modeling model may include a model for performing three-dimensional modeling on an image and obtaining three-dimensional features of the image therefrom.
  • the three-dimensional facial modeling model may include at least one of a convolutional neural network (Convolutional Neural Networks, CNN), a deep residual network (Deep residual network, ResNet), a three-dimensional facial reconstruction model (3DMM), and the like.
  • For example, a three-dimensional facial reconstruction model can be used to perform regression on the facial image and the facial template image respectively, thereby performing facial modeling on the source face and the template face and obtaining multiple source facial features of the facial image and multiple template facial features of the facial template image in the three-dimensional facial reconstruction model. Then, the multiple source facial features can be fused to obtain the three-dimensional facial image features, and the multiple template facial features can be fused to obtain the three-dimensional facial template image features.
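A regression backbone of this kind can be sketched as follows. This is an assumption-laden illustration: the coefficient split sizes (80/64/80/3/27) follow common open-source 3DMM conventions, not values stated in the application.

```python
import torch.nn as nn
from torchvision.models import resnet18

class FaceRecon3D(nn.Module):
    """Hedged sketch: regress 3DMM-style coefficients and split them by name."""
    SPLITS = {"id": 80, "exp": 64, "tex": 80, "angle": 3, "light": 27}  # assumed sizes

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, sum(self.SPLITS.values()))
        self.backbone = backbone

    def forward(self, img):
        coeffs = self.backbone(img)             # (B, 254) coefficient vector
        out, start = {}, 0
        for name, size in self.SPLITS.items():  # carve out each named feature
            out[name] = coeffs[:, start:start + size]
            start += size
        return out
```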
  • the 3D fusion features are obtained by fusing the features of the 3D face image and the 3D face template image, so that the replaced face image has both the features of the face image and the features of the face template image.
  • In some embodiments, the source facial identity features and the template facial image features can be fused to obtain the three-dimensional fusion features.
  • the source facial identity features include features capable of characterizing the source facial identity in the three-dimensional facial image
  • the template facial image features may include template facial expression features, template facial texture features, template facial angle features, and template facial illumination features.
  • the source facial identity features and the template facial image features can be fused in various ways to obtain a three-dimensional fused feature.
  • For example, the source facial identity features and the template facial image features can be spliced; for another example, they can be added; for yet another example, they can be combined by weighted summation, and so on.
  • the source facial identity feature and the template facial image feature can be added to obtain a three-dimensional fusion feature.
  • the source facial identity feature, the template facial expression feature, the template facial texture feature, the template facial angle feature and the template facial illumination feature can be added to obtain a 3D fusion feature.
  • The addition process can be shown in the following formula:

    $f_{\text{fuse}} = f^{s}_{\text{id}} + f^{t}_{\text{exp}} + f^{t}_{\text{tex}} + f^{t}_{\text{angle}} + f^{t}_{\text{light}}$

  • where $f^{s}_{\text{id}}$ represents the source facial identity features, $f^{t}_{\text{exp}}$ represents the template facial expression features, $f^{t}_{\text{tex}}$ represents the template facial texture features, $f^{t}_{\text{angle}}$ represents the template facial angle features, and $f^{t}_{\text{light}}$ represents the template facial illumination features.
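The text above allows several fusion schemes (splicing, addition, weighted summation). A small sketch of all three follows; the weights in the weighted variant are arbitrary placeholders, and plain addition assumes the five vectors share one length, which real 3DMM splits generally do not, so concatenation is the safe default.

```python
import torch

def fuse_3d_features(src_id, tpl_exp, tpl_tex, tpl_angle, tpl_light, mode="concat"):
    """Fuse source identity with template attribute features (hedged sketch)."""
    parts = [src_id, tpl_exp, tpl_tex, tpl_angle, tpl_light]
    if mode == "concat":      # splicing
        return torch.cat(parts, dim=-1)
    if mode == "sum":         # plain addition, as in the formula above
        return sum(parts)
    if mode == "weighted":    # weighted summation; weights are placeholders
        weights = [1.0, 0.5, 0.5, 0.5, 0.5]
        return sum(w * p for w, p in zip(weights, parts))
    raise ValueError(f"unknown mode: {mode}")
```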
  • performing three-dimensional facial modeling on the facial image and the facial template image is equivalent to processing the facial image and the facial template image from the perspective of three-dimensional space.
  • the face replacement feature extraction process on the face image is equivalent to processing the face template image and the face image from the perspective of two-dimensional space.
  • the initial face replacement feature may include a feature that forms a mapping relationship between the face image and the face template image.
  • multiple methods may be used to extract face replacement features from the face image based on the face template image to obtain initial face replacement features.
  • For example, machine learning networks such as convolutional neural networks (CNN) and generative adversarial networks (GAN) can be used to perform face replacement feature extraction on the facial image based on the facial template image to obtain the initial face replacement features.
  • As a deep learning method, GAN differs from ordinary neural networks.
  • GAN is composed of two main networks: a generator, also called a generator network (Generator Network), and a discriminator, also called a discriminator network (Discriminator Network).
  • the core logic of GAN is that the generator and the discriminator compete with each other and play games with each other.
  • the generator may be a neural network whose function is to generate content.
  • a generator can generate a picture, a piece of text, or a video, and so on.
  • the discriminator may also be a neural network, and its function is to discriminate the content input into the discriminator. For example, taking pictures as an example, the goal of the discriminator is to judge whether the picture input to the discriminator is a picture generated by the generator or a real picture.
  • the generator may include a decoder and an encoder.
  • the role of the encoder is to compress real data, compressing high-dimensional data into low-dimensional data.
  • the role of the decoder is to restore the compressed data to the original data.
  • a machine learning network composed of multiple convolutional neural networks may be used to extract face replacement features from the face image based on the face template image to obtain initial face replacement features.
  • For example, a facial image and a facial template image can be fed into a machine learning network consisting of multiple convolutional neural networks, which then continuously reduces the resolution of the facial image and the facial template image and encodes the two into the initial face replacement features in the latent space.
  • the latent space includes the space formed by the structure of the machine learning network.
  • the machine learning network includes an input layer, an output layer, and several convolutional layers between the input layer and the output layer, and these several convolutional layers can form a latent space.
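A minimal sketch of such an encoder is given below, assuming RGB inputs concatenated on the channel axis and three downsampling stages; all layer widths are illustrative assumptions, not details from the application.

```python
import torch
import torch.nn as nn

class ReplacementEncoder(nn.Module):
    """Hedged sketch: encode source + template images into latent replacement features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                       # each stage halves H and W
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, face_img, template_img):
        x = torch.cat([face_img, template_img], dim=1)  # (B, 6, H, W)
        return self.net(x)                              # initial face replacement features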
  • For example, the initial face replacement features can be obtained in the following manner: the facial image is encoded based on the facial template image to obtain a first encoding feature, and the first encoding feature is adjusted to obtain the initial face replacement features.
  • the 3D fusion feature may indicate the relationship between the face image and the face template image in 3D space
  • the initial face replacement feature may indicate the relationship between the face image and the face template image in 2D space.
  • the initial face replacement feature can be converted based on the 3D fusion feature to obtain the target face replacement feature; for example, the 3D fusion feature and the initial face replacement feature can be mapped to the same space to obtain the target face replacement feature .
  • the initial face replacement features can be converted in multiple ways to obtain the target face replacement features.
  • For example, the initial face replacement features can be converted by using a norm to obtain the target face replacement features.
  • For example, based on the three-dimensional fusion features, the initial face replacement features can be converted by means of the L1 norm or the L2 norm to obtain the target face replacement features.
  • In some embodiments, the initial face replacement features can also be converted in the following manner to obtain the target face replacement features:
  • a first logical operation is performed on the three-dimensional fusion features to obtain the computed three-dimensional facial image features; a second logical operation is performed on the initial face replacement features to obtain the computed face replacement features; a third logical operation is performed on the initial face replacement features and the computed face replacement features; and a fourth logical operation is performed on the resulting face replacement features and the computed three-dimensional facial image features to obtain the target face replacement features.
  • the first logic operation and the second logic operation may include calculating the mean value of the data, calculating the variance of the data, calculating the standard deviation of the data, or calculating the covariance of the data, and so on.
  • the third logical operation and the fourth logical operation may include methods of processing data by using addition, subtraction, multiplication, and division.
  • the third logic operation and the fourth logic operation may include dividing data
  • the third logic operation and the fourth logic operation may include subtracting data, and so on.
  • In some embodiments, the initial face replacement features can also be converted by Adaptive Instance Normalization (AdaIN) to obtain the target face replacement features.
  • AdaIN is a method that aligns the mean and variance of the initial face replacement features to the mean and variance of the three-dimensional fusion features, thereby realizing the feature conversion of the image.
  • Using AdaIN to convert the initial face replacement features into the target face replacement features can be expressed as:

    $\mathrm{AdaIN}(x, y) = \sigma(y)\,\dfrac{x - \mu(x)}{\sigma(x)} + \mu(y)$

  • where $x$ represents the initial face replacement features, $y$ represents the three-dimensional fusion features, $\mu(\cdot)$ and $\sigma(\cdot)$ represent the mean and the standard deviation respectively, and $\mathrm{AdaIN}(x, y)$ represents the target face replacement features.
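The formula translates directly into code. The sketch below assumes both inputs are 4D feature maps of the same shape; in practice the 3D fusion vector would first be projected or broadcast to that shape, a step omitted here.

```python
import torch

def adain(x, y, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), per channel."""
    mu_x, std_x = x.mean(dim=(2, 3), keepdim=True), x.std(dim=(2, 3), keepdim=True) + eps
    mu_y, std_y = y.mean(dim=(2, 3), keepdim=True), y.std(dim=(2, 3), keepdim=True) + eps
    return std_y * (x - mu_x) / std_x + mu_y
```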
  • In some embodiments, according to AdaIN, logical operations can be performed on the three-dimensional fusion features to obtain the computed three-dimensional facial image features, and logical operations can be performed on the initial face replacement features to obtain the computed face replacement features.
  • For example, the logical operations in AdaIN can include addition, subtraction, multiplication, division, and so on.
  • For example, according to AdaIN, the mean and standard deviation of the three-dimensional fusion features can be calculated, and the mean and standard deviation of the initial face replacement features can be calculated.
  • In some embodiments, the three-dimensional fusion features include at least two sub-three-dimensional fusion features (for example, three-dimensional facial identity features, three-dimensional facial expression features, three-dimensional facial texture features, three-dimensional facial angle features, and facial illumination features), and each sub-three-dimensional fusion feature corresponds to a feature dimension. Correspondingly, the first logical operation can be performed on the three-dimensional fusion features as follows: determine the standard deviation of the at least two sub-three-dimensional fusion features, and take the standard deviation as the computed three-dimensional facial image features.
  • In some embodiments, the initial face replacement features and the computed face replacement features can be processed by logical operations to obtain intermediate face replacement features.
  • For example, the mean of the initial face replacement features may be subtracted from them, and the result divided by their standard deviation, to obtain the intermediate face replacement features.
  • In some embodiments, the intermediate face replacement features and the computed three-dimensional facial image features can be processed by logical operations to obtain the target face replacement features.
  • For example, the intermediate face replacement features may be multiplied by the standard deviation of the three-dimensional fusion features and then added to the mean of the three-dimensional fusion features to obtain the target face replacement features.
  • In some embodiments, feature extraction can be performed on the facial image to obtain the facial features of the facial image; then, based on the target face replacement features and the facial features of the facial image, the template face in the facial template image is replaced with the source face to obtain the replaced facial image.
  • Performing feature extraction on the facial image means representing facial information through numbers; these numbers constitute the facial features of the facial image.
  • The facial features in the embodiment of the present application can be the geometric features of the source face or the characterization features of the source face. The geometric features refer to the geometric relationships between facial parts such as the eyes, nose, and mouth, for example distances, areas, and angles; the characterization features of the source face are global or local features extracted from the gray-level information of the facial image through certain algorithms.
  • the feature extraction of the facial image may be performed in various ways to obtain the facial features of the facial image.
  • For example, facial feature point detection can be performed on the facial image to obtain the position information of the facial feature points in the facial image; then, the spatial differences between the position information of the facial feature points can be calculated separately, and these spatial differences can be used as the facial features.
  • the convolution kernel can be used to perform convolution extraction on the facial image to obtain the convolution result of the facial image, and then perform regression processing on the convolution result of the facial image to obtain the facial feature.
  • In some embodiments, a machine learning network capable of extracting facial features, such as a convolutional neural network (CNN) or a deep neural network (DNN), can be used to perform feature extraction on the facial image.
  • the trained face image processing model can be used to replace the template face in the face template image with the source face based on the target face replacement features and the facial features of the face image to obtain a replaced face image. For example, using GAN, based on the target face replacement features and the facial features of the face image, the template face in the face template image is replaced with the source face to obtain the replaced face image.
  • the model structure of the trained image processing model may include a three-dimensional face modeling network, a facial feature extraction network, and an adversarial generation network.
  • The adversarial generation network can include a generator and a discriminator.
  • the generator may include an encoder and a decoder.
  • The three-dimensional facial modeling network can be a ResNet network, and it can be configured to perform three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image.
  • the 3D facial modeling network can also be configured to fuse 3D facial image features and 3D facial template image features to obtain 3D fusion features.
  • the facial feature extraction network may be a CNN network, and the facial feature extraction network may be configured to perform feature extraction on facial images to obtain facial features.
  • the encoder may perform face replacement feature extraction on the face image based on the face template image to obtain initial face replacement features.
  • the decoder can replace the template face in the face template image with the source face based on the target face replacement feature and the facial features of the face image to obtain the replaced face image.
  • a decoder may be used to decode the target face replacement feature and the facial features of the face image to obtain the decoded face replacement feature and the decoded facial feature.
  • the decoded facial replacement feature and the decoded facial feature are fused to obtain the fused facial feature.
  • the fused facial features can be mapped to a preset probability distribution space to obtain the probability distribution of the fused facial features.
  • a replacement face image is generated based on the probability distribution of the fused facial features.
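As a rough illustration, the decoding path can be sketched as a stack of transposed convolutions; the application only describes mapping fused features through a learned probability-distribution space, so the layer sizes and the Tanh output below are assumptions.

```python
import torch.nn as nn

class FaceDecoder(nn.Module):
    """Hedged sketch: upsample fused facial features back to an RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                        # each stage doubles H and W
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),  # RGB in [-1, 1]
        )

    def forward(self, fused_features):
        return self.net(fused_features)
```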
  • the preset probability distribution space is a mathematical space in the decoder.
  • the preset probability distribution space is a space that is continuously formed during the training process of the facial image processing model, and content that matches the training purpose can be generated based on features.
  • The facial image processing model is trained using three-dimensional facial contour points, so that the replaced facial image produced by the trained facial image processing model can maintain the facial contour of the source face in the facial image, making the facial image processing effect more realistic and thereby improving the accuracy of facial image processing.
  • 011 in FIG. 4 may be a facial image;
  • 012 in FIG. 4 may be a facial template image;
  • 013 in FIG. 4 may be a replaced facial image obtained by using related technologies;
  • 014 in FIG. 4 may be a replaced facial image obtained by the trained facial image processing model. Comparing 013 and 014 in FIG. 4, it can be clearly seen that the face in 014 is more similar to the face in 011.
  • To sum up, the embodiment of the present application proposes a facial image processing method including: acquiring the facial image of the source face and the facial template image of the template face; performing three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image; fusing the three-dimensional facial image features and the three-dimensional facial template image features to obtain three-dimensional fusion features; extracting face replacement features from the facial image based on the facial template image to obtain initial face replacement features; converting the initial face replacement features based on the three-dimensional fusion features to obtain target face replacement features; and replacing the template face in the facial template image with the source face based on the target face replacement features and the facial features of the facial image to obtain the replaced facial image.
  • the embodiment of the present application also correspondingly proposes a training method for a facial image processing model.
  • a training device for a facial image processing model can be integrated in a computer device, which can be a server or a terminal.
  • The training method of the facial image processing model provided by the embodiment of the present application can be implemented by the terminal or the server alone, or by the terminal and the server in cooperation. Taking the server alone as an example, as shown in FIG. 5, a training method for a facial image processing model is provided, comprising:
  • the server acquires a training image sample group, where the training image sample group includes facial image samples, facial template image samples, and facial reference image samples.
  • the training image sample group includes data used for training the preset facial image processing model.
  • the set of training image samples includes facial image samples, facial template image samples, and facial reference image samples.
  • the face image sample may correspond to a face image.
  • the facial template image samples may correspond to facial template images.
  • the face image sample also includes the source face, and the face template image sample includes the template face.
  • 015 in FIG. 6 may be a face image sample, and 016 may be a face template image sample.
  • the face reference image samples include reference images synthesized using face images and face template image samples.
  • the face reference image sample not only has the information of the source face in the face image, but also has image information such as texture, angle, illumination and expression of the template face in the face template image sample.
  • the facial reference image sample may be equivalent to the replaced facial image, the difference being that the facial reference image sample is artificially synthesized.
  • The facial reference image sample embodies the developer's training objective. Its function is to serve as a reference for the facial image processing model during the training process, so that the trained facial image processing model can generate content that meets the requirements.
  • training image sample sets can be obtained directly from open source websites.
  • the training image sample group may be obtained by collecting face image samples and face template image samples, and synthesizing face reference image samples by using the face image samples and the face template image samples.
  • the predicted face image may include an image obtained after replacing the template face in the face template image sample with the source face in the face image sample.
  • 017 in FIG. 6 may be a predicted facial image.
  • the facial image processing model includes a facial image processing model to be trained.
  • the structure of the preset facial image processing model is the same as that of the trained facial image processing model in step 106, except that the predicted facial image generated by the facial image processing model to be trained does not meet the training purpose.
  • In some embodiments, replacing the template face in the facial template image sample with the source face in the facial image sample to obtain the predicted facial image includes:
  • the facial image processing model performs three-dimensional facial modeling on the facial image sample and the facial template image sample to obtain the three-dimensional facial image sample features of the facial image sample and the three-dimensional facial template image sample features of the facial template image sample;
  • the facial image processing model fuses the three-dimensional facial image sample features and the three-dimensional facial template image sample features to obtain the fused three-dimensional facial image sample features;
  • face replacement feature extraction is performed on the facial image sample based on the facial template image sample to obtain the initial face replacement sample features;
  • the initial face replacement sample features are converted based on the fused three-dimensional facial image sample features to obtain the target face replacement sample features;
  • based on the target face replacement sample features, the template face in the facial template image sample is replaced with the source face of the facial image sample to obtain the predicted facial image.
  • three-dimensional face modeling can be performed on the face template image sample and the face image sample to obtain the three-dimensional face image sample features of the face image sample and the three-dimensional face template image sample features of the face template image sample.
  • the facial image processing model can fuse the sample features of the 3D facial image and the sample features of the 3D facial template image to obtain the fused sample features of the 3D facial image.
  • the face image processing model is based on the face template image sample, and extracts the face replacement feature from the face image sample to obtain the initial face replacement sample feature.
  • the preset facial image processing model can use AdaIN to inject the fused 3D facial image sample features into the initial face replacement sample features.
  • the 3D features in FIG. 6 may include fused 3D face image sample features.
  • the three-dimensional facial contour points are information points that describe the facial contour of an image in three-dimensional space. For example, through the three-dimensional facial contour points of the predicted facial image, the contour information of the predicted face in the predicted facial image can be known. For another example, the contour information of the face in the face reference image sample can be known through the three-dimensional facial contour points of the face reference image sample.
  • three-dimensional facial contour points can be used to train the facial image processing model. Therefore, 3D facial contour point detection can be performed on the predicted facial image to obtain the 3D facial contour points of the predicted facial image, and 3D facial contour point detection can be performed on the facial reference image sample to obtain the 3D facial contour points of the facial reference image sample.
  • the predicted facial image may be projected into a three-dimensional space, and three-dimensional facial contour points of the predicted facial image may be searched in the three-dimensional space.
  • 3D facial modeling can be performed on the predicted facial image to obtain the 3D predicted facial image features of the predicted facial image;
  • 3D key point projection can then be performed on the 3D predicted facial image features to obtain the 3D facial key points of the predicted facial image.
  • the three-dimensional predicted facial image feature may indicate the predicted facial image feature in three-dimensional space.
  • the features of the three-dimensional predicted facial image may indicate information such as features of the predicted facial features, facial contour, texture, angle, and orientation in the predicted facial image.
  • the three-dimensional predicted facial image feature may include multiple features, and these features together constitute the three-dimensional predicted facial image feature.
  • the three-dimensional predicted facial image features may include predicted facial identity features, predicted facial expression features, predicted facial texture features, predicted facial angle features, predicted facial lighting features, and so on.
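  • for illustration only, if the three-dimensional predicted facial image features were a single flat coefficient vector, its components could be sliced out as sketched below; the dimension sizes are hypothetical and are not given in this disclosure:

```python
import numpy as np

# Hypothetical layout of a flat 3DMM-style coefficient vector; the
# actual sizes used by the model are not specified in this disclosure.
SPLITS = {
    "id_coeff": 80,    # predicted facial identity features
    "ex_coeff": 64,    # predicted facial expression features
    "tex_coeff": 80,   # predicted facial texture features
    "angles": 3,       # predicted facial angle (pose) features
    "gamma": 27,       # predicted facial lighting features
}

def split_coefficients(result_3d_feature: np.ndarray) -> dict:
    """Slice a flat coefficient vector into its named feature groups."""
    parts, start = {}, 0
    for name, size in SPLITS.items():
        parts[name] = result_3d_feature[start:start + size]
        start += size
    return parts

coeffs = split_coefficients(np.random.randn(sum(SPLITS.values())))
```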
  • the three-dimensional facial key points include multiple key points carrying the information of the predicted face; for example, all key points used to indicate the features of the predicted face are included.
  • 018 in Figure 6 may represent performing three-dimensional facial modeling on the predicted facial image to obtain the three-dimensional predicted facial image features of the predicted facial image, and then performing three-dimensional key point projection on the three-dimensional predicted facial image features to obtain the 3D facial key points of the predicted facial image. Then, based on the 3D facial key points as shown in 019 in Figure 6, the 3D facial contour points can be screened out from the 3D facial key points.
  • the projection function may be used to project the 3D key points on the features of the 3D predicted facial image to obtain the 3D facial key points of the predicted facial image.
  • the three-dimensional predicted facial image features can be projected according to the following formula to obtain the three-dimensional facial key points of the predicted facial image:
  • result_3d_points = reconstruction_without_tex(result_3d_feature)
  • result_3d_points can be the 3D facial key points
  • result_3d_feature can be the 3D predicted facial image features
  • reconstruction_without_tex() can be the projection function.
  • the projection function may be of various types.
  • the projection function can be glOrtho(), glFrustum(), or gluPerspective() in the Open Graphics Library (OpenGL), etc.
  • the three-dimensional facial key points of the predicted facial image can be obtained.
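  • the internals of the projection function are not fixed by this disclosure (the OpenGL functions above are only examples); as one hedged sketch, a projection step could apply the predicted pose to the reconstructed shape as a rigid transform:

```python
import numpy as np

def project_3d_keypoints(shape_points, angles, translation):
    """Hypothetical projection step: rotate the reconstructed shape by
    the predicted pose angles and translate it, keeping the result in
    three-dimensional space (shape_points: (N, 3), angles: (3,))."""
    rx, ry, rz = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    return shape_points @ (Rz @ Ry @ Rx).T + translation
```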
  • the 3D key point projection is performed on the features of the 3D predicted facial image to obtain the 3D facial key points of the predicted facial image, including:
  • Preset transfer parameters are used to project the predicted facial identity features and predicted facial expression features to three-dimensional key points to obtain the three-dimensional facial key points of the predicted facial image.
  • the preset transfer parameters include preset parameters that can realize information transfer.
  • the predicted facial identity features and the predicted facial expression features can be projected into three-dimensional key points according to a formula of the following form, to obtain the three-dimensional facial key points of the predicted facial image:
  • result_3d_points = idBase * id_coeff + exBase * ex_coeff + meanshape
  • id_coeff can be the predicted facial identity feature
  • ex_coeff can be the predicted facial expression feature
  • idBase, exBase, and meanshape can be preset transfer parameters.
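  • a minimal sketch of this linear-combination step follows, assuming idBase, exBase and meanshape are precomputed basis arrays in the usual 3DMM style; the array shapes are illustrative, and random values stand in for the preset transfer parameters:

```python
import numpy as np

N_POINTS = 68  # number of 3D facial key points (illustrative)

# Preset transfer parameters: in a typical 3DMM these are precomputed
# basis arrays; random placeholders stand in for them here.
idBase = np.random.randn(N_POINTS * 3, 80)    # identity basis
exBase = np.random.randn(N_POINTS * 3, 64)    # expression basis
meanshape = np.random.randn(N_POINTS * 3)     # mean face shape

def keypoints_from_coeffs(id_coeff, ex_coeff):
    """Combine identity and expression coefficients with the preset
    transfer parameters to obtain 3D facial key points."""
    flat = meanshape + idBase @ id_coeff + exBase @ ex_coeff
    return flat.reshape(N_POINTS, 3)

result_3d_points = keypoints_from_coeffs(np.random.randn(80),
                                         np.random.randn(64))
```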
  • the 3D facial contour points may be screened out from the 3D facial key points based on the 3D facial key points.
  • the location information of the key points of the 3D face can be obtained, and then the contour points of the 3D face can be filtered out from the key points of the 3D face based on the location information of the key points of the 3D face.
  • the 3D facial key points whose position information is at the edge can be determined as the 3D facial contour points.
  • the three-dimensional facial contour points may be filtered out according to the output sequence of the three-dimensional facial key points.
  • the output order of the 3D facial key points is specified in advance. For example, if there are 68 3D facial key points, of which the first 17 are 3D facial contour points, the 3D facial key points whose output order falls in the first 17 positions can be determined as the 3D facial contour points.
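  • under the 68-point example above, selecting the contour points by output order reduces to a slice, as in this sketch (the actual landmark ordering is model-dependent):

```python
def select_contour_points(keypoints_3d):
    """Keep the first 17 of 68 key points, which in the common 68-point
    layout trace the facial contour (the jawline)."""
    return keypoints_3d[:17]
```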
  • the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample can be calculated to obtain the facial contour loss between the predicted facial image and the facial reference image sample.
  • for example, the difference between the three-dimensional facial contour points of the predicted facial image and those of the facial reference image sample can be calculated to obtain the facial contour loss; for another example, the spatial similarity between the three-dimensional facial contour points of the predicted facial image and those of the facial reference image sample can be calculated to obtain the facial contour loss.
  • 3d_point_loss = abs(gt_3d_OutlookPoint - result_3d_OutlookPoint)
  • gt_3d_OutlookPoint can be the 3D facial contour point of the face reference image sample
  • result_3d_OutlookPoint can be the 3D facial contour point of the predicted facial image
  • 3d_point_loss can be the facial contour loss
  • abs() can denote taking the absolute value.
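  • read as code, the facial contour loss above might look like the following sketch (reducing by the mean is one possible choice; the disclosure only specifies the absolute difference):

```python
import numpy as np

def contour_loss(gt_3d_outline_points, result_3d_outline_points):
    """Absolute difference between the 3D facial contour points of the
    face reference image sample and of the predicted facial image,
    reduced here by the mean."""
    return np.abs(gt_3d_outline_points - result_3d_outline_points).mean()
```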
  • in order to improve the performance of the trained facial image processing model so that it can generate images that meet the requirements, losses can also be calculated in multiple other dimensions, and the facial contour loss together with the losses in the other dimensions can be used to adjust the preset facial image processing model; that is, the model parameters of the facial image processing model are updated by combining the facial contour loss with the losses in the other dimensions.
  • for example, the facial feature loss between the face image sample and the predicted face image can be calculated, so that in addition to the face contour loss, other losses can also be used to adjust the preset face image processing model.
  • for example, the differences between the facial reference image sample and the predicted facial image other than the three-dimensional facial contour point difference are calculated to obtain a first loss between the facial reference image sample and the predicted facial image, the first loss including losses other than the facial contour loss; the first loss and the facial feature loss are then fused to obtain a second loss;
  • the model parameters of the facial image processing model can be updated based on the second loss.
  • the loss in addition to the facial contour loss can include losses in other dimensions.
  • the loss in other dimensions may include at least one of pixel loss, feature loss and discriminative loss.
  • the pixel loss may include the loss between the face reference image sample and the predicted face image at the pixel level; the feature loss may include the loss between the face reference image sample and the predicted face image at the feature level.
  • the feature loss may reflect the difference between the face in the face reference image sample and the predicted face in the predicted face image.
  • the facial image processing model has a discriminator, whose role is to identify whether the image generated by the generator is a real image. Therefore, the discriminative loss can include the information generated by the discriminator after discriminating the face reference image sample and the predicted face image.
  • the first loss includes pixel loss, feature loss, and discriminative loss
  • calculating the differences between the face reference image sample and the predicted face image other than the three-dimensional facial contour point difference to obtain the first loss may include:
  • calculating the pixel difference between the face reference image sample and the predicted face image to obtain the pixel loss; calculating the feature difference between the face reference image sample and the predicted face image to obtain the feature loss; and calculating the discriminative difference between the face reference image sample and the predicted face image to obtain the discriminative loss.
  • for example, the pixel information of the face reference image sample and the predicted face image can be extracted, and the difference between the two sets of pixel information can then be calculated to obtain the pixel loss.
  • for example, the values of the face reference image sample on the color channels and the values of the predicted face image on the color channels can be extracted, and the absolute value of their difference can be taken to obtain the pixel loss, for example in the form:
  • Reconstruction_loss = abs(result - gt_img)
  • result can be the pixel information of the predicted facial image
  • gt_img can be the pixel information of the facial reference image sample
  • Reconstruction_loss can be the pixel loss
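  • the pixel loss likewise reads as an elementwise absolute difference, as in this sketch (mean reduction is again an assumption):

```python
import numpy as np

def pixel_loss(result, gt_img):
    """L1 difference between the color-channel values of the predicted
    facial image and of the face reference image sample."""
    diff = result.astype(np.float32) - gt_img.astype(np.float32)
    return np.abs(diff).mean()
```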
  • the feature loss can include a two-dimensional feature loss and a three-dimensional feature loss; correspondingly, calculating the feature difference between the face reference image sample and the predicted face image to obtain the feature loss can include:
  • calculating the two-dimensional feature difference between the face reference image sample and the predicted face image to obtain the two-dimensional feature loss; calculating the three-dimensional feature difference between the face reference image sample and the predicted face image to obtain the three-dimensional feature loss; and fusing the two-dimensional feature loss and the three-dimensional feature loss to obtain the feature loss.
  • the two-dimensional feature difference may include the feature difference between the face reference image sample and the predicted face image in two-dimensional space.
  • a two-dimensional feature difference may include the difference between the image features of the face reference image sample and the image features of the predicted face image.
  • the three-dimensional feature difference may include the feature difference between the face reference image sample and the predicted face image in three-dimensional space.
  • a 3D feature difference may include a difference between a 3D facial reference image sample feature of a facial reference image sample and a 3D predicted facial image feature of a predicted facial image.
  • for example, feature extraction can be performed on the facial reference image sample and the predicted facial image respectively to obtain the image features of the facial reference image sample and the image features of the predicted facial image, and the difference between the two sets of image features can then be calculated.
  • the Alexnet network can be used to extract features from the face reference image sample and the predicted face image, to obtain the image features of the face reference image sample and the image features of the predicted face image.
  • the Alexnet network consists of 5 convolutional layers and 3 fully connected layers.
  • the five convolutional layers can be configured to extract features from images, with information passed from each layer to the next. For example, after the first convolutional layer extracts features from the image, it passes the extracted information to the second convolutional layer; the second convolutional layer continues feature extraction on the features extracted by the first convolutional layer and passes its output to the third convolutional layer; and so on, until the fifth convolutional layer passes its extracted features on to the fully connected layers.
  • the difference between the image features of the face reference image samples and the image features of the predicted face image may be calculated in a number of ways.
  • for example, the image perceptual similarity index (Learned Perceptual Image Patch Similarity, LPIPS) can be used to calculate the difference between the image features of the face reference image sample and the image features of the predicted face image.
  • the two-dimensional feature loss can be calculated by means of difference.
  • the two-dimensional feature loss can be calculated by means of cosine similarity, and so on.
  • the difference between the image features of the face reference image sample and the image features of the predicted face image can be calculated at each convolutional layer.
  • gt_img_feal1, gt_img_feal2, gt_img_feal3 and gt_img_feal4 can respectively refer to the features of the facial reference image sample output by four convolutional layers in the Alexnet network.
  • the Alexnet network can be used to extract the features of the predicted facial image to obtain result_feal1, result_feal2, result_feal3 and result_feal4.
  • result_feal1, result_feal2, result_feal3 and result_feal4 can respectively refer to the features of the predicted facial images output by the four convolutional layers in the Alexnet network.
  • the two-dimensional feature loss can be calculated according to the following formula:
  • Two_loss = abs(result_feal1 - gt_img_feal1) + abs(result_feal2 - gt_img_feal2) + abs(result_feal3 - gt_img_feal3) + abs(result_feal4 - gt_img_feal4)
  • Two_loss can be a two-dimensional feature loss.
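  • a sketch of this multi-layer comparison using torchvision's AlexNet implementation as the feature extractor is given below; which four activations are tapped is an assumption, since the disclosure only states that four convolutional layers are used:

```python
import torch
import torchvision.models as models

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features.eval()

def conv_features(img, taps=(1, 4, 7, 9)):
    """Run an (N, 3, H, W) batch through alexnet.features and collect
    the activations after selected layers (indices are illustrative)."""
    feats, x = [], img
    for i, layer in enumerate(alexnet):
        x = layer(x)
        if i in taps:
            feats.append(x)
    return feats

def two_d_feature_loss(result, gt_img):
    """Sum of L1 differences between corresponding layer activations of
    the predicted facial image and of the face reference image sample."""
    with torch.no_grad():
        gt_feats = conv_features(gt_img)
    return sum(torch.abs(r - g).mean()
               for r, g in zip(conv_features(result), gt_feats))
```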
  • three-dimensional face modeling can be performed on the face reference image sample and the predicted face image to obtain the three-dimensional face reference image sample features of the face reference image sample and the three-dimensional predicted face image features of the predicted face image, and the difference between the three-dimensional face reference image sample features and the three-dimensional predicted face image features can then be calculated.
  • the difference between the sample features of the 3D facial reference image and the features of the 3D predicted facial image may also be calculated in various ways.
  • for example, the image perceptual similarity index (LPIPS) can also be used to calculate the difference between the 3D facial reference image sample features and the 3D predicted facial image features.
  • the three-dimensional feature loss can be calculated by means of difference.
  • the three-dimensional feature loss can be calculated by means of cosine similarity, and so on.
  • for example, the three-dimensional feature loss can be calculated by a difference of the following form:
  • Three_loss = abs(gt_3d_feature - result_3d_feature)
  • gt_3d_feature can represent the 3D facial reference image sample features, and result_3d_feature can represent the 3D predicted facial image features.
  • the two-dimensional feature loss and the three-dimensional feature loss may be fused to obtain the feature loss.
  • the two-dimensional feature loss and the three-dimensional feature loss can be added to obtain the feature loss.
  • the two-dimensional feature loss and the three-dimensional feature loss can be weighted and summed to obtain the feature loss.
  • the facial reference image sample and the predicted facial image may be subjected to scale transformation, and a discriminator may be used to discriminate the scale-transformed images, which improves the richness of the discriminative loss. For example, scale transformation processing is performed on the facial reference image sample and the predicted facial image respectively, to obtain at least one scale-transformed facial reference image sample and at least one scale-transformed predicted facial image;
  • Discriminating processing is performed on at least one scale-transformed facial reference image sample and at least one scale-transformed predicted facial image, respectively, to obtain a first discriminant feature of the scale-transformed facial reference image sample and a second discriminant feature of the scale-transformed predicted facial image;
  • the discriminative loss is calculated based on the first discriminative feature and the second discriminative feature.
  • scaling may refer to changing the size of an image.
  • for example, if the size of an image is 256 × 256 (length × width), the size of the image can be changed to 128 × 128 through scale transformation.
  • for example, if the original size of the face reference image sample is a, a face reference image sample with size 1/2a and a face reference image sample with size 1/4a can be obtained through scale transformation.
  • similarly, if the original size of the predicted facial image is b, a predicted facial image with size 1/2b and a predicted facial image with size 1/4b can be obtained through scale transformation.
  • at least one scale-transformed facial reference image sample and at least one scale-transformed predicted facial image can then be discriminated respectively, to obtain the first discriminant features of the scale-transformed facial reference image samples and the second discriminant features of the scale-transformed predicted facial images.
  • the face reference image sample with the original size a, the face reference image sample with the size 1/2a and the face reference image sample with the size 1/4a can be input into the discriminator to obtain the discriminant result.
  • the obtained discriminant results are respectively D(gt_img), D(gt_img_1/2) and D(gt_img_1/4).
  • the symbol D() can represent the discrimination result of the discriminator.
  • gt_img can refer to the face reference image sample whose original size is a
  • gt_img_1/2 can refer to the face reference image sample whose size is 1/2a
  • gt_img_1/4 can refer to the face reference image sample whose size is 1/4a.
  • the discriminant results are generally represented by features.
  • D() generally takes a value between 0 and 1, where a discriminant result of 1 means the image is judged to be real, and a discriminant result of 0 means the image is judged not to be real.
  • the first discriminant features may include D(gt_img), D(gt_img_1/2) and D(gt_img_1/4).
  • the predicted facial image with the original size b, the predicted facial image with the size 1/2b and the predicted facial image with the size 1/4b can be input into the discriminator to obtain the discriminant result.
  • result may refer to the predicted facial image with original size b
  • result_1/2 may refer to the predicted facial image with size 1/2b
  • result_1/4 may refer to the predicted facial image with size 1/4b.
  • the second discriminant feature may include a discriminant result of the discriminator.
  • the second discriminant features may include D(result), D(result_1/2) and D(result_1/4).
  • the discriminant loss can be calculated by means of difference.
  • the discriminant loss can be calculated by means of cosine similarity, and so on.
  • the discriminative loss can be calculated, for example, as the difference of the discriminant features at each scale:
  • D_loss = abs(D(gt_img) - D(result)) + abs(D(gt_img_1/2) - D(result_1/2)) + abs(D(gt_img_1/4) - D(result_1/4))
  • D_loss can be the discriminative loss.
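  • a sketch of the multi-scale discrimination follows, treating the discriminative loss as the difference of discriminant features at each scale; the discriminator architecture and the exact loss form are assumptions (the disclosure also allows, e.g., cosine similarity):

```python
import torch
import torch.nn.functional as F

def multiscale_d_loss(discriminator, gt_img, result,
                      scales=(1.0, 0.5, 0.25)):
    """Discriminate the face reference image sample and the predicted
    facial image at several scales and accumulate the difference of
    their discriminant features."""
    d_loss = 0.0
    for s in scales:
        if s != 1.0:
            gt_s = F.interpolate(gt_img, scale_factor=s, mode="bilinear",
                                 align_corners=False)
            res_s = F.interpolate(result, scale_factor=s, mode="bilinear",
                                  align_corners=False)
        else:
            gt_s, res_s = gt_img, result
        d_loss = d_loss + torch.abs(discriminator(gt_s)
                                    - discriminator(res_s)).mean()
    return d_loss
```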
  • the facial feature loss between the face image sample and the predicted face image may also be calculated.
  • for example, facial features may be extracted from the predicted facial image and the facial image sample to obtain the facial features of the predicted facial image and the facial features of the facial image sample, and the facial feature loss between the two sets of facial features is then calculated.
  • the facial feature loss can be calculated by means of difference.
  • the facial feature loss can be calculated by means of cosine similarity, and so on.
  • for example, the facial feature loss can be calculated according to a formula of the following form:
  • id_loss = 1 - cosine_similarity(result_id_feature, source_id_feature)
  • id_loss can be the facial feature loss, result_id_feature (a name used here for illustration) can be the facial features of the predicted facial image, and source_id_feature can be the facial features of the facial image sample.
  • cosine_similarity can denote the cosine similarity calculation, where the cosine similarity of two vectors can be expressed as:
  • cos(A, B) = (Σ_{i=1..n} A_i · B_i) / (sqrt(Σ_{i=1..n} A_i^2) · sqrt(Σ_{i=1..n} B_i^2))
  • A and B can be vectors, A_i can be a component of vector A, and B_i can be a component of vector B.
  • i can refer to the i-th component, and n can refer to the total number of components in each vector.
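  • as a sketch, using the common "1 minus cosine similarity" convention (the disclosure only states that cosine similarity may be used):

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """cos(A, B) = sum_i(A_i * B_i) / (|A| * |B|)."""
    return float(np.dot(a, b) /
                 (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def facial_feature_loss(result_id_feature, source_id_feature):
    """id_loss: grows as the identity features of the predicted facial
    image diverge from those of the facial image sample."""
    return 1.0 - cosine_similarity(result_id_feature, source_id_feature)
```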
  • the first loss and the facial feature loss may be fused to obtain the second loss.
  • the first loss and the facial feature loss may be added to obtain the second loss.
  • the first loss and the facial feature loss may be weighted and summed to obtain the second loss.
  • the model parameters of the facial image processing model may be updated based on the facial contour loss, that is, the facial image processing model may be adjusted to obtain a trained image processing model.
  • the facial contour loss can be used to constrain the 3D facial contour points of the predicted facial image to be consistent with the 3D facial contour points of the facial reference image sample.
  • for example, the model parameters in the preset facial image processing model can be adjusted based on the facial contour loss to obtain an adjusted facial image processing model, and the adjusted facial image processing model can then continue to be trained with training image sample groups.
  • when the facial contour loss is less than a certain level, or otherwise meets the requirements, the training has achieved its goal, and a trained image processing model whose performance meets the requirements is obtained.
  • the facial contour loss and the second loss can also be fused to obtain a third loss, and the third loss is then used to adjust the image processing model; that is, the third loss is used to update the model parameters of the facial image processing model to obtain the trained facial image processing model. For example, the model parameters of the facial image processing model are obtained;
  • the facial contour loss and the second loss are fused to obtain the third loss;
  • and the third loss is used to adjust the model parameters to obtain the trained facial image processing model.
  • for example, the facial contour loss and the second loss can be added to obtain the third loss.
  • the model parameters of the preset facial image processing model are adjusted by using the third loss to obtain the trained facial image processing model.
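  • putting the pieces together, one training step might fuse the losses and update the model parameters as sketched below; the loss weights and the optimizer choice are assumptions:

```python
def training_step(optimizer, losses, weights=None):
    """Fuse the facial contour loss with the other losses (one possible
    reading of the 'third loss') and update the model parameters."""
    weights = weights or {name: 1.0 for name in losses}
    third_loss = sum(weights[n] * v for n, v in losses.items())
    optimizer.zero_grad()
    third_loss.backward()
    optimizer.step()
    return third_loss.item()

# Usage, with tensors produced by the loss sketches above:
# losses = {"contour": contour_l, "pixel": pixel_l, "feature": feat_l,
#           "discriminative": d_l, "facial_feature": id_l}
# training_step(optimizer, losses)
```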
  • in addition to learning how to perform face replacement, the preset facial image processing model also learns three-dimensional features, so that it can predict the three-dimensional features of an image, for example, as shown in Figure 6.
  • in summary, a training image sample group can be obtained, the training image sample group including a face image sample, a face template image sample and a face reference image sample; the template face in the face template image sample is replaced with the source face in the face image sample by using the face image processing model, to obtain a predicted facial image;
  • 3D facial contour point detection is performed on the predicted facial image to obtain the 3D facial contour points of the predicted facial image, and 3D facial contour point detection is performed on the facial reference image sample to obtain the 3D facial contour points of the facial reference image sample;
  • the difference between the 3D facial contour points of the predicted facial image and the 3D facial contour points of the facial reference image sample is calculated to obtain the facial contour loss between the predicted facial image and the facial reference image sample; and the facial image processing model is adjusted based on the facial contour loss to obtain the trained facial image processing model.
  • in this way, when the trained facial image processing model is used to replace the template face in a facial template image with a source face, the replaced facial image can maintain the facial contour of the source face, thereby improving the accuracy of the facial image processing method.
  • in addition, by calculating losses in multiple different dimensions and using the losses in those dimensions to adjust the preset facial image processing model, the model's parameters are adjusted in the different dimensions, so that the trained facial image processing model has better performance.
  • the method in the embodiment of the present application will be introduced by taking the training of the facial image processing model integrated on the computer device as an example.
  • a training method for a facial image processing model includes:
  • the computer device acquires a training image sample group, where the training image sample group includes facial image samples, facial template image samples, and facial reference image samples.
  • a face image sample may be represented as source
  • a face template image sample may be represented as target
  • a face reference image sample may be represented as gt_img.
  • the computer device replaces the template face in the face template image sample with the source face in the face image sample by using a preset facial image processing model to obtain a predicted facial image.
  • a computer device may input source and target to an encoder in a preset facial image processing model.
  • the encoder will continuously reduce the resolution of the source and target, and encode them into the initial face replacement sample features in the latent space.
  • the computer device can use a facial feature extraction network to perform feature extraction on source to obtain the facial feature source_id_feature of source.
  • the computer device can also perform three-dimensional facial modeling on the source and the target to obtain the three-dimensional facial image sample features of the facial image sample and the three-dimensional facial template image sample features of the facial template image sample.
  • the computer device can fuse the three-dimensional facial image sample features and the three-dimensional facial template image sample features, and then use the preset face image processing model to convert the initial face replacement sample features based on the fused three-dimensional facial image sample features, to obtain the target face replacement sample features.
  • the computer device can use the preset face image processing model to replace the template face in the face template image sample with the source face of the face image sample based on the target face replacement sample feature and the face feature of the face image sample to obtain a predicted face image.
  • the computer device performs three-dimensional facial contour point detection on the predicted facial image to obtain three-dimensional facial contour points of the predicted facial image, and performs three-dimensional facial contour point detection on the facial reference image sample to obtain three-dimensional facial contour points of the facial reference image sample.
  • the computer device can calculate the three-dimensional predicted facial image features of result (which may be represented as result_3d_feature).
  • the computer device can perform three-dimensional key point projection on the three-dimensional predicted facial image features to obtain the three-dimensional facial key points of the predicted facial image, for example as shown in the following formula:
  • result_3d_points = reconstruction_without_tex(result_3d_feature)
  • the computer device can filter out the three-dimensional facial contour points from the three-dimensional facial key points based on the three-dimensional facial key points.
  • similarly, the computer device can calculate the three-dimensional facial reference image sample features of gt_img (which may be represented as gt_3d_feature).
  • the computer device can perform three-dimensional key point projection on the three-dimensional facial reference image sample features to obtain the three-dimensional facial key points of the facial reference image sample, for example as shown in the following formula:
  • gt_3d_points = reconstruction_without_tex(gt_3d_feature)
  • gt_3d_points may be the 3D facial key points of the face reference image sample.
  • the computer device calculates the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample, and obtains the facial contour loss between the predicted facial image and the facial reference image sample.
  • the facial contour loss can be calculated according to the following formula:
  • 3d_point_loss = abs(gt_3d_OutlookPoint - result_3d_OutlookPoint)
  • other losses may also be calculated, and the face contour loss and other losses may be used to jointly adjust the preset facial image processing model to obtain a trained facial image processing model.
  • for example, the facial feature loss, pixel loss, feature loss, discriminative loss and facial contour loss can be added together, and the preset facial image processing model is then adjusted by using the summed loss to obtain the trained facial image processing model.
  • the computer device adjusts the model parameters of the facial image processing model based on the facial contour loss to obtain a trained facial image processing model.
  • in summary, the computer device can obtain the training image sample group, which includes a face image sample, a face template image sample and a face reference image sample; the computer device uses the preset facial image processing model to replace the template face in the face template image sample with the source face in the facial image sample, to obtain a predicted facial image;
  • the computer device can perform three-dimensional facial contour point detection on the predicted facial image to obtain the three-dimensional facial contour points of the predicted facial image, and perform three-dimensional facial contour point detection on the facial reference image sample to obtain the three-dimensional facial contour points of the facial reference image sample; the computer device calculates the difference between the two sets of three-dimensional facial contour points to obtain the facial contour loss between the predicted facial image and the facial reference image sample; and the computer device adjusts the preset facial image processing model based on the facial contour loss to obtain a trained facial image processing model.
  • by using the three-dimensional facial contour points to train the preset facial image processing model, when the trained facial image processing model is used to replace the template face in a facial template image with a source face, the replaced facial image can maintain the facial contour of the source face, thus improving the accuracy of the facial image processing method.
  • next, the method in the embodiment of the present application will be introduced by taking facial image processing performed by the facial image processing model integrated on the computer device as an example.
  • a facial image processing method comprising:
  • the computer device acquires a facial image of a source face and a facial template image of a template face.
  • the computer device performs three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image.
  • the computer device fuses the features of the three-dimensional facial image and the features of the three-dimensional facial template image to obtain three-dimensional fusion features.
  • based on the face template image, the computer device extracts face replacement features from the face image to obtain initial face replacement features.
  • the computer device converts the initial face replacement feature based on the three-dimensional fusion feature to obtain the target face replacement feature.
  • the computer device uses the trained face image processing model to replace the template face in the face template image with the source face based on the target face replacement feature and the face features of the face image, to obtain a replaced face image.
  • in summary, the computer device obtains the facial image of the source face and the facial template image of the template face; the computer device performs three-dimensional facial modeling on the facial image and the facial template image to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image;
  • the computer device fuses the three-dimensional facial image features and the three-dimensional facial template image features to obtain a three-dimensional fusion feature; the computer device extracts face replacement features from the facial image based on the facial template image to obtain initial face replacement features;
  • the computer device converts the initial face replacement features based on the three-dimensional fusion feature to obtain target face replacement features; and the computer device uses the trained facial image processing model to replace the template face in the facial template image with the source face based on the target face replacement features and the facial features of the facial image, to obtain the replaced facial image.
  • in this way, the embodiment of the present application can obtain more features of the facial image and the facial template image from the perspectives of both two-dimensional space and three-dimensional space, so that more information is available when performing the face replacement processing, thereby improving the accuracy of the facial image processing method.
  • in order to better implement the above method, the embodiment of the present application further provides a facial image processing device, which can be integrated into a computer device.
  • the meanings of the terms are the same as those in the above facial image processing method; for specific implementation details, refer to the description in the method embodiments.
  • a facial image processing device is provided, and the facial image processing device can be integrated in a computer device.
  • the facial image processing device includes: a first acquisition unit 501, a three-dimensional facial modeling unit 502, a first fusion unit 503, a feature extraction unit 504, a conversion unit 505 and a first replacement unit 506, specifically as follows:
  • the first acquiring unit 501 is configured to acquire the facial image of the source face and the facial template image of the template face;
  • the three-dimensional facial modeling unit 502 is configured to perform three-dimensional facial modeling on the facial image and the facial template image, to obtain the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image;
  • the first fusion unit 503 is configured to fuse the 3D facial image features and the 3D facial template image features to obtain 3D fusion features;
  • the feature extraction unit 504 is configured to perform facial replacement feature extraction on the facial image based on the facial template image to obtain initial facial replacement features
  • the conversion unit 505 is configured to convert the initial face replacement feature based on the three-dimensional fusion feature to obtain a target face replacement feature
  • the first replacement unit 506 is configured to replace the template face in the face template image with the source face based on the target face replacement feature to obtain a replaced face image.
  • the first fusion unit 503 includes:
  • the first extraction subunit is configured to extract the source facial identity features corresponding to the facial image from the three-dimensional facial image features;
  • the second extraction subunit is configured to extract the template facial image features corresponding to the facial template image from the three-dimensional facial template image features;
  • the first fusion subunit is configured to fuse the source facial identity features and the template facial image features to obtain the three-dimensional fusion features.
  • the feature extraction unit 504 includes:
  • the first encoding subunit is configured to perform encoding processing on the facial template image to obtain a first encoding feature of the facial template image;
  • the second encoding subunit is configured to perform encoding processing on the facial image to obtain a second encoding feature of the facial image
  • the first adjustment subunit is configured to adjust the first encoding feature based on the second encoding feature to obtain the initial face replacement feature.
  • the conversion unit 505 includes:
  • the first statistical subunit is configured to perform a first logical operation on the three-dimensional fusion features to obtain statistical three-dimensional facial image features, and to perform a second logical operation on the initial face replacement features to obtain statistical face replacement features;
  • the second statistical subunit is configured to perform a third logical operation on the initial face replacement features and the statistical face replacement features to obtain operated face replacement features;
  • the logical operation processing subunit is configured to perform logical operations on the operated face replacement features and the statistical three-dimensional facial image features to obtain the target face replacement features.
  • Each of the above units can be implemented as an independent entity, or can be combined arbitrarily as the same or several entities.
  • the specific implementation of each of the above units can refer to the previous method embodiments, and will not be repeated here.
  • the accuracy of replacing facial images can be improved by the above-mentioned facial image processing device.
  • a training device for a facial image processing model is also provided, and the training device for a facial image processing model can be integrated into a computer device.
  • the meanings of the terms are the same as those in the above training method of the facial image processing model; for specific implementation details, refer to the description in the method embodiments.
  • a training device for a facial image processing model is provided.
  • the training device for the facial image processing model can be integrated in a computer device.
  • the training device for the facial image processing model includes: a second acquisition unit 601, a second replacement unit 602, a three-dimensional facial contour point detection unit 603, a calculation unit 604 and an adjustment unit 605, wherein:
  • the second acquiring unit 601 is configured to acquire a training image sample group, the training image sample group including facial image samples, facial template image samples and facial reference image samples;
  • the second replacement unit 602 is configured to use a face image processing model to replace the template face in the face template image sample with the source face in the face image sample to obtain a predicted face image;
  • the three-dimensional facial contour point detection unit 603 is configured to perform three-dimensional facial contour point detection on the predicted facial image, obtain the three-dimensional facial contour point of the predicted facial image, and perform three-dimensional facial contour point detection on the facial reference image sample, obtaining the three-dimensional facial contour points of the facial reference image sample;
  • the calculation unit 604 is configured to calculate the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample, and obtain the difference between the predicted facial image and the facial reference image sample. loss of facial contour;
  • the adjustment unit 605 is configured to adjust the facial image processing model based on the facial contour loss to obtain a trained facial image processing model.
  • the three-dimensional facial contour point detection unit 603 includes:
  • the three-dimensional facial modeling subunit is configured to perform three-dimensional facial modeling on the predicted facial image to obtain the three-dimensional predicted facial image features of the predicted facial image;
  • the three-dimensional key point projection subunit is configured to perform three-dimensional key point projection on the features of the three-dimensional predicted facial image to obtain the three-dimensional facial key points of the predicted facial image;
  • the screening subunit is configured to filter out the 3D facial contour points from the 3D facial key points based on the 3D facial key points.
  • the three-dimensional key point projection subunit includes:
  • An extraction module configured to extract predicted facial identity features and predicted facial expression features of the predicted facial image from the three-dimensional predicted facial image features
  • the three-dimensional key point projection module is configured to perform three-dimensional key point projection on the predicted facial identity features and predicted facial expression features by using preset transfer parameters to obtain three-dimensional facial key points of the predicted facial image.
  • the training device of the facial image processing model also includes:
  • a first calculation unit configured to calculate the differences between the facial reference image sample and the predicted facial image other than the three-dimensional facial contour point difference, to obtain a first loss between the facial reference image sample and the predicted facial image, where the first loss includes losses other than the facial contour loss;
  • a second computing unit configured to compute a facial feature loss between said facial image sample and said predicted facial image
  • the second fusion unit is configured to perform fusion processing on the first loss and the facial feature loss to obtain a second loss.
  • the first computing unit includes:
  • the first calculation subunit is configured to calculate the pixel difference between the facial reference image sample and the predicted facial image to obtain a pixel loss
  • the second calculation subunit is configured to calculate the feature difference between the facial reference image sample and the predicted facial image to obtain a feature loss
  • the third calculation subunit is configured to calculate the discriminant difference between the face reference image sample and the predicted face image to obtain a discriminant loss.
  • the second computing subunit includes:
  • a first calculation module configured to calculate a two-dimensional feature difference between the facial reference image sample and the predicted facial image to obtain a two-dimensional feature loss
  • the second calculation module is configured to calculate the three-dimensional feature difference between the facial reference image sample and the predicted facial image to obtain a three-dimensional feature loss
  • the first fusion module is configured to fuse the two-dimensional feature loss and the three-dimensional feature loss to obtain the feature loss.
  • the third computing subunit includes:
  • a scaling module configured to perform scaling processing on the facial reference image sample and the predicted facial image respectively to obtain at least one scaled facial reference image sample and at least one scaled predicted facial image
  • a discrimination module configured to discriminate the at least one scale-transformed face reference image sample and the at least one scale-transformed predicted face image respectively, to obtain the first discriminant features of the scale-transformed face reference image samples and the second discriminant features of the scale-transformed predicted face images;
  • a third calculation module configured to calculate the discriminant loss based on the first discriminant feature and the second discriminant feature.
  • the adjustment unit 605 includes:
  • the obtaining subunit is configured to obtain the model parameters of the facial image processing model;
  • the second fusion subunit is configured to perform fusion processing on the facial contour loss and the second loss to obtain a third loss
  • the parameter adjustment subunit is configured to use the third loss to adjust the model parameters to obtain the trained facial image processing model.
  • the embodiment of the present application also provides a computer device.
  • the computer device may include a terminal or a server.
  • for example, the terminal may be a mobile phone, a tablet computer, or the like; for another example, the computer device may be a server, or the like.
  • referring to FIG. 11, it shows a schematic structural diagram of the terminal involved in the embodiment of the present application.
  • the computer device may include a processor 701 of one or more processing cores, a memory 702 of one or more computer-readable storage media, a power supply 703, an input unit 704 and other components.
  • those skilled in the art can understand that the structure shown in FIG. 11 does not constitute a limitation on the computer device, which may include more or fewer components than shown in the figure, combine some components, or use a different arrangement of components, wherein:
  • the processor 701 is the control center of the computer device; it connects various parts of the entire computer device through various interfaces and lines, and performs various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 702 and calling the data stored in the memory 702, so as to monitor the computer device as a whole.
  • the processor 701 may include one or more processing cores; preferably, the processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 701.
  • the memory 702 can be configured to store software programs and modules, and the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702 .
  • the memory 702 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data created according to the use of the computer device, and the like.
  • the memory 702 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage devices.
  • the memory 702 may further include a memory controller to provide the processor 701 with access to the memory 702 .
  • the computer device also includes a power supply 703 for supplying power to various components.
  • the power supply 703 can be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • the power supply 703 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.
  • the computer device may also include an input unit 704, which may be configured to receive input numeric or character information, and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the computer device may also include a display unit, etc., which will not be repeated here.
  • specifically, in this embodiment, the processor 701 in the computer device loads the executable files corresponding to the processes of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application programs stored in the memory 702, so as to implement the facial image processing method or the training method of the facial image processing model provided by the embodiments of the present application.
  • since the computer program stored in the storage medium can execute the steps of any facial image processing method provided by the embodiments of the present application, it can achieve the beneficial effects achievable by any facial image processing method provided by the embodiments of the present application; for details, see the previous embodiments, which will not be repeated here.

Abstract

The present application discloses a facial image processing method and a training method for a facial image processing model. A facial image of a source face and a facial template image of a template face can be acquired; three-dimensional facial modeling is performed on the facial image and the facial template image to obtain three-dimensional facial image features of the facial image and three-dimensional facial template image features of the facial template image; the three-dimensional facial image features and the three-dimensional facial template image features are fused to obtain a three-dimensional fusion feature; face replacement feature extraction is performed on the facial image based on the facial template image to obtain initial face replacement features; the initial face replacement features are converted based on the three-dimensional fusion feature to obtain target face replacement features; and based on the target face replacement features, the template face in the facial template image is replaced with the source face to obtain a replaced facial image.

Description

Facial image processing method, training method for facial image processing model, apparatus, device, storage medium and program product

CROSS-REFERENCE TO RELATED APPLICATION

This application is filed based on, and claims priority to, Chinese Patent Application No. 202110963370.5 filed on August 20, 2021, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

This application relates to the field of computer technology, and in particular to a facial image processing method, a training method for a facial image processing model, an apparatus, a device, a storage medium and a program product.

BACKGROUND

With the development of technology, in applications such as film special effects and Internet social networking, there is a need to replace an object's face with the face of another object while maintaining the style of the object in the facial image. To meet this need, facial images need to be processed.

Facial image processing methods in the related art only consider preserving the identity of the face, so the accuracy of facial image processing is low.
SUMMARY

The embodiments of the present application provide a facial image processing method, a training method for a facial image processing model, an apparatus, a computer device, a computer-readable storage medium and a computer program product, which improve the accuracy of facial image processing.

An embodiment of the present application provides a facial image processing method, executed by a computer device, including:

acquiring a facial image of a source face and a facial template image of a template face;

performing three-dimensional facial modeling on the facial image and the facial template image to obtain three-dimensional facial image features of the facial image and three-dimensional facial template image features of the facial template image;

fusing the three-dimensional facial image features and the three-dimensional facial template image features to obtain a three-dimensional fusion feature;

performing face replacement feature extraction on the facial image based on the facial template image to obtain initial face replacement features;

converting the initial face replacement features based on the three-dimensional fusion feature to obtain target face replacement features;

replacing, based on the target face replacement features, the template face in the facial template image with the source face to obtain a replaced facial image.

An embodiment of the present application further provides a facial image processing apparatus, including:

a first acquisition unit configured to acquire a facial image of a source face and a facial template image of a template face;

a three-dimensional facial modeling unit configured to perform three-dimensional facial modeling on the facial image and the facial template image to obtain three-dimensional facial image features of the facial image and three-dimensional facial template image features of the facial template image;

a first fusion unit configured to fuse the three-dimensional facial image features and the three-dimensional facial template image features to obtain a three-dimensional fusion feature;

a feature extraction unit configured to perform face replacement feature extraction on the facial image based on the facial template image to obtain initial face replacement features;

a conversion unit configured to convert the initial face replacement features based on the three-dimensional fusion feature to obtain target face replacement features;

a first replacement unit configured to replace, based on the target face replacement features, the template face in the facial template image with the source face to obtain a replaced facial image.

An embodiment of the present application further provides a training method for a facial image processing model, including:

acquiring a training image sample group, the training image sample group including a facial image sample, a facial template image sample and a facial reference image sample;

replacing, by using a facial image processing model, the template face in the facial template image sample with the source face in the facial image sample to obtain a predicted facial image;

performing three-dimensional facial contour point detection on the predicted facial image to obtain three-dimensional facial contour points of the predicted facial image, and performing three-dimensional facial contour point detection on the facial reference image sample to obtain three-dimensional facial contour points of the facial reference image sample;

acquiring the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample to obtain a facial contour loss between the predicted facial image and the facial reference image sample;

updating model parameters of the facial image processing model based on the facial contour loss.

An embodiment of the present application further provides a training apparatus for a facial image processing model, including:

a second acquisition unit configured to acquire a training image sample group, the training image sample group including a facial image sample, a facial template image sample and a facial reference image sample;

a second replacement unit configured to replace, by using a facial image processing model, the template face in the facial template image sample with the source face in the facial image sample to obtain a predicted facial image;

a three-dimensional facial contour point detection unit configured to perform three-dimensional facial contour point detection on the predicted facial image to obtain three-dimensional facial contour points of the predicted facial image, and perform three-dimensional facial contour point detection on the facial reference image sample to obtain three-dimensional facial contour points of the facial reference image sample;

a calculation unit configured to acquire the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample to obtain a facial contour loss between the predicted facial image and the facial reference image sample;

an adjustment unit configured to update model parameters of the facial image processing model based on the facial contour loss.

An embodiment of the present application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to implement the above methods provided by the embodiments of the present application.

An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above methods provided by the embodiments of the present application.

An embodiment of the present application further provides a computer device, including:

a memory configured to store executable instructions;

a processor configured to implement the above methods provided by the embodiments of the present application when executing the executable instructions stored in the memory.

The embodiments of the present application have the following beneficial effects:

Through three-dimensional facial modeling, the three-dimensional facial image features of the facial image and the three-dimensional facial template image features of the facial template image are obtained. Since the extracted image features are three-dimensional, the facial contour, that is, the face shape features, in the facial image can be preserved, thereby ensuring that the facial contours before and after face replacement are consistent. The three-dimensional facial image features and the three-dimensional facial template image features are fused to obtain a three-dimensional fusion feature, so that the facial image obtained after face replacement has both the features of the facial image and the features of the facial template image. Face replacement feature extraction is performed on the facial image based on the facial template image to obtain initial face replacement features, and the initial face replacement features are converted based on the three-dimensional fusion feature to obtain target face replacement features. Since the conversion of the initial face replacement features is based on the three-dimensional fusion feature obtained by fusing the three-dimensional facial image features and the three-dimensional facial template image features, the converted target face replacement features can carry both the facial contour features and the identity features of the face. As a result, the facial image obtained after replacing the template face in the facial template image with the source face based on the target face replacement features is more natural and realistic in its display effect, thereby improving the accuracy of facial image processing.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a scenario of the method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of the facial image processing method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a scenario of the facial image processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of another scenario of the facial image processing method provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of the training method for the facial image processing model provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a scenario of the training method for the facial image processing model provided by an embodiment of the present application;

FIG. 7 is another schematic flowchart of the facial image processing method provided by an embodiment of the present application;

FIG. 8 is another schematic flowchart of the training method for the facial image processing model provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of the facial image processing apparatus provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of the training apparatus for the facial image processing model provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of the terminal provided by an embodiment of the present application.
DETAILED DESCRIPTION

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. The described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present application.

In the following description, "some embodiments" are mentioned, which describe a subset of all possible embodiments. It can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

The embodiments of the present application provide a facial image processing method, which can be executed by a facial image processing apparatus, and the facial image processing apparatus can be integrated in a computer device. The computer device may include at least one of a terminal, a server, and the like.

In some embodiments, in order to better implement the facial image processing method provided by the embodiments of the present application, the embodiments of the present application correspondingly further provide a training method for a facial image processing model, so that the facial image processing model can be used to execute the facial image processing method provided by the embodiments of the present application. The training method for the facial image processing model provided by the embodiments of the present application can be executed by a training apparatus for the facial image processing model, and the training apparatus can be integrated in a computer device. The computer device may include at least one of a terminal, a server, and the like.

The terminal may be a smartphone, a tablet computer, a notebook computer, a personal computer (PC), a smart home device, a wearable electronic device, a virtual reality (VR)/augmented reality (AR) device, an in-vehicle computer, or the like. The server may be an interworking server or a background server between multiple heterogeneous systems, an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms, or the like.

In some embodiments, as shown in FIG. 1, the facial image processing apparatus can be integrated on a computer device such as a terminal or a server to implement the facial image processing method provided by the embodiments of the present application. For example, the computer device can acquire a facial image of a source face and a facial template image of a template face; perform three-dimensional facial modeling on the facial image and the facial template image to obtain three-dimensional facial image features of the facial image and three-dimensional facial template image features of the facial template image; fuse the three-dimensional facial image features and the three-dimensional facial template image features to obtain a three-dimensional fusion feature; perform face replacement feature extraction on the facial image based on the facial template image to obtain initial face replacement features; convert the initial face replacement features based on the three-dimensional fusion feature to obtain target face replacement features; and replace the template face in the facial template image with the source face based on the target face replacement features to obtain a replaced facial image.

The training apparatus for the facial image processing model can be integrated in a computer device such as a terminal or a server to implement the training method for the facial image processing model provided by the embodiments of the present application. For example, the computer device can acquire a training image sample group, the training image sample group including a facial image sample, a facial template image sample and a facial reference image sample; replace the template face in the facial template image sample with the source face in the facial image sample by using a facial image processing model to obtain a predicted facial image; perform three-dimensional facial contour point detection on the predicted facial image to obtain three-dimensional facial contour points of the predicted facial image, and perform three-dimensional facial contour point detection on the facial reference image sample to obtain three-dimensional facial contour points of the facial reference image sample; acquire the difference between the three-dimensional facial contour points of the predicted facial image and the three-dimensional facial contour points of the facial reference image sample to obtain a facial contour loss between the predicted facial image and the facial reference image sample; and update model parameters of the facial image processing model based on the facial contour loss, that is, adjust the facial image processing model to obtain a trained facial image processing model.

The process of facial image processing can be regarded as replacing the face of the object in the facial template image with the face of the object in the source image, which can be understood as performing face swapping on the face object in the facial template image. Taking the face being a human face as an example, face swapping refers to replacing the identity of the human face in the facial template image with that of the person in the source image, while keeping at least one of elements such as the pose, expression, makeup and background of the human face in the facial template image unchanged. The facial image processing method provided by the embodiments of the present application can generally be applied in scenarios such as ID photo production, film and television portrait production, game character design, virtual avatars and privacy protection.

The training process of the facial image processing model can be regarded as inputting multiple training image sample groups into the facial image processing model, so that the facial image processing model can continuously learn from the multiple training image sample groups and continuously summarize rules, and can finally accurately replace the template face in the facial template image with the source face of the facial image.

It should be noted that the image processing method and the training method for the facial image processing model provided by the embodiments of the present application involve computer vision technology in the field of artificial intelligence; that is, in the embodiments of the present application, the computer vision technology of artificial intelligence can be used to replace the object in the facial template image with the source object of the facial image to obtain the replaced facial image.

The embodiments of the present application will be described from the perspective of a facial image processing apparatus, which can be integrated in a computer device, and the computer device may be a server, a terminal or another device.
在一些实施例中,本申请实施例提供的面部图像处理方法可以由终端或服务器单独实现,或由终端及服务器协同实现,以服务器单独实现为例,如图2所述,提供了 一种面部图像处理方法,该方法包括:
101、服务器获取源面部的面部图像和模板面部的面部模板图像。
其中,面部图像包括源对象,所谓源对象可以为面部图像中包含对象,以面部图像为人脸图像为例,则源对象就可以为该人脸图像对应的人。源面部为提供面部对象替换的面部对象源,与之相对应的是模板面部,该模板面部中包含了需要替换的面部对象以及需要保持的面部背景等其他元素。
例如,如图3所示,图3中的001可以是面部图像,图3中的002可以是面部模板图像,图中的003可以是替换后面部图像。从图3中可以看出,替换后面部图像003中保持了面部图像001脸部特征,同时还保持了面部模板图像002的姿态、表情、妆容和背景等元素。
其中,获取面部图像和面部模板图像的方式可以有多种,比如,可以直接获取面部图像和面部模板图像,或者,当面部图像和面部模板图像的数量较多或者内存较大时,还可以间接获取面部图像和面部模板图像,具体可以如下:
(1)直接获取面部图像和面部模板图像
例如,可以直接接收用户上传的原始面部图像和原始面部图像对应的图像处理信息,根据图像处理信息,在原始面部图像中筛选出源面部的面部图像和模板面部的面部模板图像,或者,在图像数据库或者网络上获取面部图像对,在面部图像对中任意筛选出一个面部图像作为源面部的面部图像,将面部图像对中的另一个面部图像作为面部模板图像。
(2)间接获取面部图像和面部模板图像
例如,可以接收终端发送的图像处理请求,该图像处理请求中携带原始面部图像的存储地址和图像处理信息,根据存储地址,在内存、缓存或第三方数据库中获取原始面部图像,根据图像处理信息,在原始面部图像中筛选出源面部的面部图像和模板面部的面部模板图像。
在一些实施例中,在成功获取到原始面部图像之后,还可以向终端发送提示信息,以提示终端成功获取到原始面部图像。
在一些实施例中,在成功获取到原始面部图像之后,还可以对原始面部图像进行预处理,从而得到面部图像和面部模板图像,预处理的方式可以有多种,比如,可以将原始面部图像的尺寸调整为预设尺寸,或者,还可以使用面部关键点配准将原始面部图像中面部对象对齐到统一位置。
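作为一种示意性的实现,下面给出一段预处理代码草图(其中使用OpenCV进行关键点配准,标准关键点模板的坐标、图像尺寸与关键点数量均为示例性的假设,并非本申请限定的实现):

import cv2
import numpy as np

# 假设的标准五官关键点模板(在256x256图像中的坐标),仅作示例
TEMPLATE_5PTS = np.float32([
    [89, 110], [167, 110],   # 左右眼中心
    [128, 152],              # 鼻尖
    [98, 196], [158, 196],   # 左右嘴角
])

def preprocess(img, landmarks_5pts, size=256):
    # 估计从检测到的关键点到模板关键点的相似变换
    m, _ = cv2.estimateAffinePartial2D(
        np.float32(landmarks_5pts), TEMPLATE_5PTS, method=cv2.LMEDS)
    # 应用仿射变换:将面部对象对齐到统一位置,并调整为预设尺寸
    return cv2.warpAffine(img, m, (size, size))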
102、对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征。
其中,三维面部图像特征可以包括在三维空间中指示面部图像特点的信息。例如,通过三维面部图像特征可以指示面部图像中源面部五官的特点、面部轮廓、纹理、角度和光照等信息。
其中,三维面部模板图像特征可以包括在三维空间中说明面部模板图像特点的特征。例如,三维面部模板图像特征可以包括:面部模板图像中模板面部的表情特征、纹理特征、角度特征和光照特征中至少之一。
在一些实施例中,三维面部图像特征中可以包括多个(即至少两个)特征,这些特征共同构成了三维面部图像特征。例如,三维面部图像特征可以包括源面部身份特征、面部轮廓特征、源面部表情特征、源面部纹理特征、源面部角度特征和源面部光照特征,等等。
其中,源面部身份特征包括能够说明三维面部图像中源面部身份的特征。譬如,通过身份特征,可以使得面部图像的源面部和其他面部图像的源面部区分开来,也即,当面部图像中包括目标对象的面部时,该身份特征能够标识该目标对象,通过身份特征可以知道面部图像的源面部是谁。
其中,源面部表情特征包括能够说明面部图像中源面部的表情的特征;源面部纹理特征包括能够说明面部图像的纹理的特征;源面部角度特征包括能够说明面部图像中源面部的角度的特征,也即该源面部角度特征能够指示面部图像中源面部的朝向,例如,通过角度特征可以知道源面部是向左看还是向右看,或者是朝向正前方;源面部光照特征包括能够说明面部图像的明暗程度的特征。
在一些实施例中,三维面部模板图像特征也可以包括多个特征,这些特征共同构成了三维面部模板图像特征。例如,三维面部模板图像特征也可以包括模板面部身份特征、模板面部表情特征、模板面部纹理特征、模板面部角度特征和模板面部光照特征,等等。
在一些实施例中,当三维面部模板图像特征包括模板面部身份特征、模板面部表情特征、模板面部纹理特征、模板面部角度特征和模板面部光照特征时,能够在将面部模板图像中的模板面部替换为面部图像的源面部时,保留模板面部的表情、纹理、角度和光照等信息,而将模板面部中的身份替换为源面部的身份。例如,如图3所示,假设面部图像001中的人物为张三,而面部模板图像002中的人物为李四,面部替换只是将李四替换为张三,但是李四在面部模板图像002中表情、纹理、角度和光照等信息仍是保留的。
在一些实施例中,三维面部图像特征和三维面部模板图像特征可以有多种表现形式。例如,三维面部图像特征和三维面部模板图像特征可以是向量。又例如,三维面部图像特征和三维面部模板图像特征可以是矩阵,等等。
在一些实施例中,可以采用多种方式对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征。
例如,可以利用三维面部建模模型分别对面部图像和面部模板图像进行三维面部建模,从而得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征。
其中,三维面部建模模型可以包括对图像进行三维建模,并从中获取到图像的三维特征的模型。
例如,三维面部建模模型可以包括卷积神经网络(Convolutional Neural Networks,CNN)、深度残差网络(Deep residual network,ResNet)、三维面部重建模型(3DMM)等中的至少一种。
又例如,可以采用三维面部重建模型分别对面部图像和面部模板图像进行回归,从而对源面部和模板面部进行面部建模,并在三维面部重建模型中获取面部图像的多个源面部特征以及模板面部图像的多个模板面部特征。然后可以将多个源面部特征进行融合,从而得到三维面部图像特征,以及将多个模板面部特征进行融合,从而得到三维面部模板图像特征。
103、将三维面部图像特征和三维面部模板图像特征进行融合,得到三维融合特征。
在实际应用中,通过将三维面部图像特征和三维面部模板图像特征进行融合处理,得到三维融合特征,使得替换后面部图像既具有面部图像的特征,又具有面部模板图像的特征。
在一些实施例中,为了在将面部模板图像中的模板面部替换为面部图像的源面部时,能够保留模板面部的表情、纹理、角度和光照等信息,而将模板面部中的身份替换为源面部的身份,可以采用如下方式将源面部身份特征和模板面部图像特征进行融合,得到三维融合特征:
在三维面部图像特征中提取出面部图像的源面部身份特征;
在三维面部模板图像特征中提取出面部模板图像的模板面部图像特征;
将源面部身份特征和模板面部图像特征进行融合,得到三维融合特征。
其中,源面部身份特征包括能够表征三维面部图像中源面部身份的特征;模板面部图像特征可以包括模板面部表情特征、模板面部纹理特征、模板面部角度特征和模板面部光照特征。
在一些实施例中,可以利用多种方式将源面部身份特征和模板面部图像特征进行融合,得到三维融合特征。
例如,可以将源面部身份特征和模板面部图像特征进行拼接,又例如,可以将源面部身份特征和模板面部图像特征进行相加,又例如,可以将源面部身份特征和模板面部图像特征进行加权求和,等等。
在一些实施例中,可以将源面部身份特征和模板面部图像特征进行相加,从而得到三维融合特征。
例如,可以将源面部身份特征、模板面部表情特征、模板面部纹理特征、模板面部角度特征和模板面部光照特征进行相加,从而得到三维融合特征。其中,相加过程可以如下公式所示:
fuse_3d_feature=source_id_3d_feature+template_exp_3d_feature+template_tex_3d_feature+template_angle_3d_feature+template_light_3d_feature
其中,source_id_3d_feature可以表示源面部身份特征,template_exp_3d_feature可以表示模板面部表情特征,template_tex_3d_feature可以表示模板面部纹理特征,template_angle_3d_feature可以表示模板面部角度特征,template_light_3d_feature可以表示模板面部光照特征,fuse_3d_feature可以表示三维融合特征。
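作为上式的一个示意性实现,下面给出将源面部身份特征与模板面部的表情、纹理、角度、光照特征相加得到三维融合特征的代码草图(假设各特征已被表示为同一维度的张量,字典键名为示例性的假设):

import torch

def fuse_3d_features(src_feats, tmp_feats):
    # 按上式将各特征相加,得到三维融合特征
    fused = (src_feats["id"]        # 源面部身份特征
             + tmp_feats["exp"]     # 模板面部表情特征
             + tmp_feats["tex"]     # 模板面部纹理特征
             + tmp_feats["angle"]   # 模板面部角度特征
             + tmp_feats["light"])  # 模板面部光照特征
    return fused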
104、基于面部模板图像,对面部图像进行面部替换特征提取,得到初始面部替换特征。
在一些实施例中,对面部图像和面部模板图像进行三维面部建模,相当于从三维空间的角度出发对面部图像以及面部模板图像进行处理。而基于面部模板图像,对面部图像进行面部替换特征提取处理则相当于从二维空间的角度出发对面部模板图像以及面部图像进行处理。
通过分别从二维空间以及三维空间的角度出发,可以获取到面部图像和面部模板图像更多的特征,从而使得在进行面部替换处理时,有更加多的信息依据,从而提高了面部图像处理方法的准确性。
其中,初始面部替换特征可以包括在面部图像和面部模板图像之间形成映射关系的特征。
在一些实施例中,可以利用多种方式基于面部模板图像,对面部图像进行面部替换特征提取处理,得到初始面部替换特征。
例如,可以利用卷积神经网络(Convolutional Neural Networks,CNN)、生成对抗网络(Generative Adversarial Networks,GAN)等机器学习网络基于面部模板图像,对面部图像进行面部替换特征提取处理,得到初始面部替换特征。
GAN作为深度学习的一种方法,与普通神经网络不同,GAN由两个主要网络构成,一个是生成器或称为生成网络(Generator Network),另一个是判别器或称为判别网络(Discriminator Network)。而GAN的核心逻辑就是生成器和判别器相互对抗、相互博弈。
其中,生成器可以是神经网络,其作用是生成内容。例如,生成器可以生成一张图片、一段文字或一段视频等等。
其中,判别器也可以是神经网络,其作用是对输入判别器中的内容进行判别。例如,以图片为例,判别器的目标是判断输入判别器的图片是生成器生成的图片还是真实图片。
其中,生成器可以包括编码器和解码器。编码器的作用是实现数据的压缩,将高维数据压缩成低维数据。而解码器的作用是将压缩数据还原成原始数据。
又例如,可以利用由多个卷积神经网络构成的机器学习网络基于面部模板图像,对面部图像进行面部替换特征提取处理,得到初始面部替换特征。
例如,可以将面部图像和面部模板图像输入到由多个卷积神经网络构成的机器学习网络中,然后,该机器学习网络会将面部图像和面部模板图像的分辨率不断降低,并在隐空间中将两者编码成初始面部替换特征。
其中,隐空间包括机器学习网络的结构所形成的空间。例如,该机器学习网络包括输入层、输出层以及在输入层和输出层中间的若干个卷积层,则这若干个卷积层就可以构成隐空间。
在一些实施例中,基于面部模板图像,可采用如下方式对面部图像进行面部替换特征提取,得到初始面部替换特征:
对面部模板图像进行编码,得到面部模板图像的第一编码特征;
对面部图像进行编码,得到面部图像的第二编码特征;
基于第二编码特征,对第一编码特征进行调整,得到初始面部替换特征。
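作为一种示意性的实现,下面给出上述编码与调整过程的代码草图(网络的层数、通道数,以及以拼接加卷积实现的调整方式均为假设,仅用于说明流程):

import torch
import torch.nn as nn

class SwapFeatureEncoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        def down(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2))
        self.enc_t = nn.Sequential(down(3, ch), down(ch, ch * 2))  # 编码面部模板图像
        self.enc_s = nn.Sequential(down(3, ch), down(ch, ch * 2))  # 编码面部图像
        self.adjust = nn.Conv2d(ch * 4, ch * 2, 3, padding=1)      # 基于第二编码特征调整第一编码特征

    def forward(self, template_img, face_img):
        f1 = self.enc_t(template_img)   # 第一编码特征
        f2 = self.enc_s(face_img)       # 第二编码特征
        # 拼接后卷积,得到初始面部替换特征
        return self.adjust(torch.cat([f1, f2], dim=1))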
105、基于三维融合特征对初始面部替换特征进行转换,得到目标面部替换特征。
在一些实施例中,三维融合特征可以指示面部图像和面部模板图像在三维空间中的关系,而初始面部替换特征可以指示面部图像和面部模板图像在二维空间中的关系。为了提高面部替换的准确度,可以基于三维融合特征,对初始面部替换特征进行转换,得到目标面部替换特征;例如,可以将三维融合特征和初始面部替换特征映射到同一空间,得到目标面部替换特征。
在一些实施例中,基于三维融合特征,可以利用多种方式对初始面部替换特征进行转换,得到目标面部替换特征。
示例性地,基于三维融合特征,可以利用范数的方式对初始面部替换特征进行转换,得到目标面部替换特征。例如,基于三维融合特征,可以利用L1范数或L2范数等方式对初始面部替换特征进行转换,得到目标面部替换特征。
在一些实施例中,基于所述三维融合特征,还可以通过如下方式对所述初始面部替换特征进行转换,得到目标面部替换特征:
对三维融合特征进行第一逻辑运算,得到运算后的三维面部图像特征,以及对初始面部替换特征进行第二逻辑运算,得到运算后的面部替换特征;
对初始面部替换特征和运算后的面部替换特征,进行第三逻辑运算,得到运算后的面部替换特征;
将运算后的面部替换特征和运算后的三维面部图像特征进行第四逻辑运算,得到目标面部替换特征。
其中,第一逻辑运算及第二逻辑运算可以包括求数据的均值、求数据的方差、求数据的标准差或者求数据的协方差,等等。
其中,第三逻辑运算及第四逻辑运算可以包括利用加、减、乘、除等对数据进行 处理的方式。例如,第三逻辑运算及第四逻辑运算可以包括将数据进行相除,又例如,第三逻辑运算及第四逻辑运算可以包括将数据进行相减,等等。
在一些实施例中,基于三维融合特征,还可以利用自适应实例正则化(Adaptive Instance Normalization,AdaIN)对初始面部替换特征进行转换,得到目标面部替换特征。
其中,AdaIN是一种可以将三维融合特征的均值和方差对齐到初始面部替换特征的均值和方差上的方法,从而实现图像的特征转换。例如,AdaIN可以将三维融合特征的均值和方差对齐到初始面部替换特征的均值和方差上,从而得到特征的替换。
其中,基于三维融合特征,利用AdaIN对初始面部替换特征进行转换处理,得到目标面部替换特征,可以如下:
AdaIN(x,y)=σ(y)*((x-μ(x))/σ(x))+μ(y)
其中,x可以表示初始面部替换特征,y可以表示三维融合特征,μ()和σ()可以分别表示均值和标准差,AdaIN(x,y)可以表示目标面部替换特征。
在一些实施例中,可以根据AdaIN对三维融合特征进行逻辑运算(如加、减、乘、除等),得到运算后的三维面部图像特征,以及对初始面部替换特征进行逻辑运算,得到运算后的面部替换特征。例如,可以根据AdaIN求三维融合特征的均值和标准差,以及求初始面部替换特征的均值和标准差。
这里,三维融合特征包括至少两个子三维融合特征(如子三维融合特征可以为三维面部身份特征、三维面部表情特征、三维面部纹理特征、三维面部角度特征及面部光照特征),每个所述子三维融合特征对应一个特征维度;相应的,可采用如下方式对三维融合特征进行第一逻辑运算,得到运算后的三维面部图像特征:确定至少两个子三维融合特征的标准差,并将该标准差作为运算后的三维面部图像特征。
然后,可以根据AdaIN将初始面部替换特征和运算后的面部替换特征进行逻辑运算处理,得到运算后的面部替换特征。例如,可以将初始面部替换特征与其均值相减之后,再除以初始面部替换特征的标准差,从而得到运算后的面部替换特征。
接下来,可以根据AdaIN将运算后的面部替换特征和运算后的三维面部图像特征进行逻辑运算处理,得到目标面部替换特征。例如,可以将运算后的面部替换特征与三维融合特征的标准差相乘之后,再加上三维融合特征的均值,从而得到目标面部替换特征。
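下面给出AdaIN的一个最小化代码草图(假设初始面部替换特征x为(N, C, H, W)的特征图,三维融合特征y为(N, C)的向量;这些形状约定为示例性的假设):

import torch

def adain(x, y, eps=1e-5):
    # 第一步:对x做实例级标准化,(x - μ(x)) / σ(x)
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    sigma_x = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mu_x) / sigma_x
    # 第二步:求y的均值与标准差,并对齐到标准化后的x上:σ(y)*x_norm + μ(y)
    mu_y = y.mean(dim=1).view(-1, 1, 1, 1)
    sigma_y = y.std(dim=1).view(-1, 1, 1, 1)
    return sigma_y * x_norm + mu_y   # 目标面部替换特征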
106、基于目标面部替换特征,将面部模板图像中的模板面部替换为源面部,得到替换后的面部图像。
在一些实施例中,为了提高替换后面部图像中的面部和原面部的相似程度,从而提高面部图像处理方法的精确度,可以对面部图像进行特征提取,得到面部图像的面部特征,然后,基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后的面部图像。
其中,对面部图像进行特征提取可以包括通过一些数字来表征面部信息,而这些数字可以指面部图像的面部特征。
在一些实施例中,本申请实施例中的面部特征可以是源面部的几何特征,也可以是源面部的表征特征;其中,几何特征是指眼睛、鼻子和嘴等面部特征之间的几何关系,如距离、面积和角度等;源面部的表征特征是利用面部图像的灰度信息,通过一些算法提取全局或局部特征。
在一些实施例中,可以利用多种方式对面部图像进行特征提取,得到面部图像的面部特征。
例如,当面部特征是源面部的几何特征时,可以对面部图像进行点位生成,得到面部图像中面部部位特征点的位置信息;然后,可以分别计算面部部位特征点的位置信息之间的空间差异,并将该空间差异作为面部特征。
又例如,当面部特征是源面部的表征特征时,可以利用卷积核对面部图像进行卷积提取,得到面部图像的卷积结果,然后将面部图像的卷积结果进行回归处理,得到面部特征。
又例如,还可以利用可以提取面部特征的机器学习网络对面部图像进行特征提取。譬如,可以利用CNN、深度神经网络(Deep Neural Networks,DNN)等对面部图像进行特征提取,得到面部图像的面部特征。
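针对上面第一个例子中源面部的几何特征,下面给出一段由面部部位特征点的位置信息计算空间差异的示意代码(以两两欧氏距离为例,特征点的数量与含义为假设):

import numpy as np

def geometric_features(points):
    # points: (K, 2)的特征点坐标,如眼睛、鼻子和嘴等部位的点位
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)     # K x K的两两距离矩阵
    iu = np.triu_indices(len(points), k=1)   # 取上三角,避免重复
    return dist[iu]                          # 作为面部特征的几何特征向量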
在一些实施例中,可以利用训练得到的面部图像处理模型,基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后的面部图像。例如,利用GAN,基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后的面部图像。
在一些实施例中,训练得到的图像处理模型的模型结构可以包括三维面部建模网络、面部特征提取网络和生成对抗网络。其中,生成对抗网络可以包括生成器和判别器。其中,生成器可以包括编码器和解码器。
其中,三维面部建模网络可以是ResNet网络,三维面部建模网络可以配置为对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征。此外,三维面部建模网络还可以配置为将三维面部图像特征和三维面部模板图像特征进行融合,得到三维融合特征。
其中,面部特征提取网络可以是CNN网络,面部特征提取网络可以配置为对面部图像进行特征提取,得到面部特征。
其中,编码器可以基于面部模板图像,对面部图像进行面部替换特征提取,得到初始面部替换特征。
其中,解码器可以基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后的面部图像。例如,可以利用解码器对目标面部替换特征和面部图像的面部特征进行解码,得到解码后面部替换特征和解码后面部特征。然后将解码后面部替换特征和解码后面部特征进行融合,得到融合后的面部特征。接下来,可以将融合后的面部特征映射到预设概率分布空间中,得到融合后面部特征的概率分布。并基于融合后面部特征的概率分布生成替换后面部图像。
其中,预设概率分布空间是解码器中的一个数学空间。该预设概率分布空间是在对面部图像处理模型进行训练的过程中不断形成的空间,可以依据特征生成和训练目的相符的内容。
在本申请实施例中,利用了三维面部轮廓点对面部图像处理模型进行训练,从而使得通过训练后面部图像处理模型得到的替换后面部图像可以保持面部图像中源面部的面部轮廓,使得面部图像处理的效果更加逼真,从而提高了面部图像处理的准确度。
例如,如图4所示。其中,图4中的011可以是面部图像,图4中的012可以是面部模板图像,图4中的013可以是利用相关技术得到的替换后面部图像,图4中的014可以是利用训练后的面部图像处理模型得到的替换后面部图像,通过将图4中的013和014进行对比,可以明确地看出014中的面部和011中的面部更加相似。
本申请实施例提出了一种面部图像处理方法,该面部图像处理方法包括:获取源面部的面部图像和模板面部的面部模板图像;对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征;将三维面部图像特征和三维面部模板图像特征进行融合,得到三维融合特征;基于面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;基于三维融合特征对初始面部替换特征进行转换,得到目标面部替换特征;基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后面部图像。本申请实施例通过分别从二维空间以及三维空间的角度出发,结合了三维融合特征及二维的面部特征,可以获取到面部图像和面部模板图像更多的特征,从而使得在进行面部替换处理时,有更加多的信息依据,从而提高了面部图像处理方法的准确性。
本申请实施例还相应地提出了一个面部图像处理模型的训练方法。接下来,本申请实施例将从面部图像处理模型的训练装置的角度进行描述,该面部图像处理模型的训练装置可以集成在计算机设备中,该计算机设备可以是服务器,也可以是终端等设备。
在一些实施例中,本申请实施例提供的面部图像处理模型的训练方法可以由终端或服务器单独实现,或由终端及服务器协同实现,以服务器单独实现为例,如图5所示,提供了一种面部图像处理模型的训练方法,包括:
201、服务器获取训练图像样本组,训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本。
其中,训练图像样本组包括对预设面部图像处理模型进行训练时用到的数据。
在一些实施例中,训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本。
其中,面部图像样本可以对应面部图像。面部模板图像样本可以对应面部模板图像。其中,面部图像样本中也包括源面部,而面部模板图像样本中包括模板面部。例如,如图6所示,图6中的015可以是面部图像样本,016可以是面部模板图像样本。
面部参考图像样本包括利用面部图像样本和面部模板图像样本合成的参考图像。该面部参考图像样本不仅具有面部图像样本中源面部的信息,还具有面部模板图像样本中模板面部的纹理、角度、光照和表情等图像信息。该面部参考图像样本可以相当于替换后面部图像,区别点在于面部参考图像样本是人工合成的。此外,面部参考图像样本相当于开发人员的训练目的,其作用是在对图像处理模型进行训练的过程中,给图像处理模型起一个参考作用,从而使得训练后图像处理模型可以生成符合要求的内容。
在一些实施例中,可以有多种方式获取训练图像样本组。例如,可以直接从开源网站中获取训练图像样本组。又例如,可以通过收集面部图像样本和面部模板图像样本,并利用面部图像样本和面部模板图像样本合成面部参考图像样本,从而得到训练图像样本组。
202、利用面部图像处理模型,将面部模板图像样本中的模板面部替换为面部图像样本中的源面部,得到预测面部图像。
其中,预测面部图像可以包括将面部模板图像样本中的模板面部替换为面部图像样本中的源面部之后得到的图像。例如,如图6所示,图6中的017可以是预测面部图像。
其中,面部图像处理模型包括准备训练的面部图像处理模型。
其中,预设面部图像处理模型的结构和步骤106中训练后的面部图像处理模型相同,只是该准备训练的面部图像处理模型生成的预测面部图像还不符合训练目的。
在一些实施例中,利用面部图像处理模型,将面部模板图像样本中的模板面部替换为面部图像样本中的源面部,得到预测面部图像包括:
利用面部图像处理模型,对面部图像样本和面部模板图像样本进行三维面部建模,得到面部图像样本的三维面部图像样本特征和面部模板图像样本的三维面部模板图像样本特征;
利用面部图像处理模型将三维面部图像样本特征和三维面部模板图像样本特征进行融合,得到融合后的三维面部图像样本特征;
利用面部图像处理模型基于面部模板图像样本,对面部图像样本进行面部替换特征提取,得到初始面部替换样本特征;
利用面部图像处理模型基于融合后的三维面部图像样本特征,对初始面部替换样本特征进行转换,得到目标面部替换样本特征;
利用面部图像处理模型,基于目标面部替换样本特征和面部图像样本的面部特征,将面部模板图像样本中的模板面部替换为面部图像样本的源面部,得到预测面部图像。
例如,如图6所示,可以对面部模板图像样本和面部图像样本进行三维面部建模,得到面部图像样本的三维面部图像样本特征和面部模板图像样本的三维面部模板图像样本特征。然后,面部图像处理模型可以将三维面部图像样本特征和三维面部模板图像样本特征进行融合,得到融合后的三维面部图像样本特征。此外,面部图像处理模型基于面部模板图像样本,对面部图像样本进行面部替换特征提取,得到初始面部替换样本特征。然后,预设面部图像处理模型可以利用AdaIN将融合后三维面部图像样本特征注入到初始面部替换样本特征。其中,图6中的三维特征可以包括融合后三维面部图像样本特征。
203、对预测面部图像进行三维面部轮廓点检测,得到预测面部图像的三维面部轮廓点,以及对面部参考图像样本进行三维面部轮廓点检测,得到面部参考图像样本的三维面部轮廓点。
其中,三维面部轮廓点包括从三维空间中说明图像中面部轮廓的信息点。例如,通过预测面部图像的三维面部轮廓点,可以知道预测面部图像中预测面部的轮廓信息。又例如,通过面部参考图像样本的三维面部轮廓点,可以知道面部参考图像样本中面部的轮廓信息。
在一些实施例中,为了提高面部图像处理的准确性,在本申请实施例可以利用三维面部轮廓点对面部图像处理模型进行训练。所以,可以对预测面部图像进行三维面部轮廓点检测,得到预测面部图像的三维面部轮廓点,以及对面部参考图像样本进行三维面部轮廓点检测,得到面部参考图像样本的三维面部轮廓点。
例如,可以将预测面部图像投影到三维空间中,并在三维空间中搜索预测面部图像的三维面部轮廓点。
又例如,可以对预测面部图像进行三维面部建模,得到预测面部图像的三维预测面部图像特征;对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点;基于三维面部关键点,从三维面部关键点中筛选出三维面部轮廓点。
其中,三维预测面部图像特征可以在三维空间中指示预测面部图像的特点。例如,三维预测面部图像特征可以指示预测面部图像中预测面部五官的特点、面部轮廓、纹理、角度和光照等信息。
其中,三维预测面部图像特征中可以包括多个特征,这些特征共同构成了三维预测面部图像特征。例如,三维预测面部图像特征可以包括预测面部身份特征、预测面部表情特征、预测面部纹理特征、预测面部角度特征和预测面部光照特征,等等。
其中,三维面部关键点包括具有预测面部的信息的多个关键点,如包括用于指示预测面部特征的所有关键点。
在一些实施例中,如图6所示,可以如图6中的018对预测面部图像进行三维面部建模,得到预测面部图像的三维预测面部图像特征;然后对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点。然后,可以如图6中的019基于三维面部关键点,从三维面部关键点中筛选出三维面部轮廓点。
在一些实施例中,可以利用投影函数对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点。例如,可以根据下述公式对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点:
result_3d_points=reconstruction_without_tex(result_3d_feature)
其中,result_3d_points可以是三维面部关键点;result_3d_feature可以是三维预测面部图像特征;reconstruction_without_tex()可以是投影函数。
其中,投影函数可以是多种类型的函数。例如,投影函数可以是开放图形库(Open Graphics Library,OpenGL)中的glOrtho()、glFrustum()或gluPerspective()等等。
在一些实施例中,可以基于预测面部身份特征和预测面部表情特征,得到预测面部图像的三维面部关键点。例如,对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点,包括:
从三维预测面部图像特征中提取出预测面部图像的预测面部身份特征和预测面部表情特征;
利用预设传递参数将预测面部身份特征和预测面部表情特征进行三维关键点投影,得到预测面部图像的三维面部关键点。
其中,预设传递参数包括预先设置好的,可以实现信息传递的参数。
在一些实施例中,可以按照下列公式将预测面部身份特征和预测面部表情特征进行三维关键点投影,得到预测面部图像的三维面部关键点:
result_3d_points=idBase*id_coeff+exBase*ex_coeff+meanshape
其中,id_coeff可以是预测面部身份特征,ex_coeff可以是预测面部表情特征,idBase、exBase和meanshape可以是预设传递参数。
在一些实施例中,在得到三维面部关键点之后,可以基于三维面部关键点,从三维面部关键点中筛选出三维面部轮廓点。
例如,可以获取三维面部关键点的位置信息,然后基于三维面部关键点的位置信息从三维面部关键点中筛选出三维面部轮廓点。譬如,可以将位置信息处于边缘的三维面部关键点确定为三维面部轮廓点。
又例如,可以根据三维面部关键点的输出顺序筛选出三维面部轮廓点。在一些实施例中,三维面部关键点的输出顺序是预先设置好的。例如,有68个三维面部关键点,其中前17个可以是三维面部轮廓点,所以可以将输出顺序在前17位的三维面部关键点确定为三维面部轮廓点。
204、获取预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的差异,得到预测面部图像和所述面部参考图像样本之间的面部轮廓损失。
在一些实施例中,在得到三维面部轮廓点之后,可以计算预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的差异,得到预测面部图像和所述面部参考图像样本之间的面部轮廓损失。
例如,可以将预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点进行求差,从而得到面部轮廓损失;又例如,可以求预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的余弦相似度,从而得到面部轮廓损失。
譬如,当将预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点进行求差时,可以按照下列公式进行:
3d_point_loss=abs(gt_3d_OutlookPoint-result_3d_OutlookPoint)
其中,gt_3d_OutlookPoint可以是面部参考图像样本的三维面部轮廓点,result_3d_OutlookPoint可以是预测面部图像的三维面部轮廓点,3d_point_loss可以是面部轮廓损失,abs()可以是绝对值符号。
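结合上文的关键点投影公式与轮廓点筛选方式,下面给出面部轮廓损失计算的代码草图(其中各基底矩阵的形状、68点中前17点为轮廓点的约定均为假设):

import numpy as np

def reconstruct_points(id_coeff, ex_coeff, idBase, exBase, meanshape):
    # result_3d_points = idBase*id_coeff + exBase*ex_coeff + meanshape
    pts = idBase @ id_coeff + exBase @ ex_coeff + meanshape
    return pts.reshape(-1, 3)   # (68, 3)的三维面部关键点

def contour_loss(result_pts, gt_pts, n_outline=17):
    # 按输出顺序筛选出三维面部轮廓点(假设前n_outline个为轮廓点)
    result_outline = result_pts[:n_outline]
    gt_outline = gt_pts[:n_outline]
    # 3d_point_loss = abs(gt_3d_OutlookPoint - result_3d_OutlookPoint)
    return np.abs(gt_outline - result_outline).mean()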
在一些实施例中,为了提高训练后的面部图像处理模型的性能,使得训练后的面部图像处理模型能够生成符合要求的图像,可以从其他的多个维度上计算损失,并利用面部轮廓损失和其他维度上的损失对预设面部图像处理模型进行调整,也即结合面部轮廓损失和其他维度上的损失,更新面部图像处理模型的模型参数。
例如,如图6所示,可以计算面部图像样本和预测面部图像之间的面部特征损失,从而使得除了利用面部轮廓损失对预设面部图像处理模型进行调整,还可以利用其他损失对预设面部图像处理模型进行调整。
例如,计算面部参考图像样本和预测面部图像之间除了三维面部轮廓点以外的差异,得到面部参考图像样本和预测面部图像的第一损失,第一损失包括除了面部轮廓损失以外的损失;
计算面部图像样本和预测面部图像之间的面部特征损失,并将第一损失和面部特征损失进行融合,得到第二损失;如此,可基于第二损失更新面部图像处理模型的模型参数。
其中,除了面部轮廓损失以外的损失可以包括其他维度上的损失。例如,其他维度上的损失可以包括像素损失、特征损失和判别损失中至少之一。
其中,像素损失可以包括面部参考图像样本和预测面部图像在像素级别上的损失;特征损失可以包括面部参考图像样本和预测面部图像在特征级别上的损失。例如,特征损失可以指面部参考图像样本的面部和预测面部图像的预测面部之间的差异。
在一些实施例中,面部图像处理模型中具有判别器,判别器的作用是识别出生成器生成的图像是否是真实的图像。所以,判别损失可以包括判别器对面部参考图像样本和预测面部图像进行判别后生成的信息。
在一些实施例中,当第一损失包括像素损失、特征损失和判别损失时,计算面部参考图像样本和预测面部图像之间除了三维面部轮廓点以外的差异,得到面部参考图像样本和预测面部图像的第一损失,包括:
计算面部参考图像样本和预测面部图像之间的像素差异,得到像素损失;
计算面部参考图像样本和预测面部图像之间的特征差异,得到特征损失;
计算面部参考图像样本和预测面部图像之间的判别差异,得到判别损失。
在一些实施例中,在计算面部参考图像样本和预测面部图像之间的像素差异时,可以提取面部参考图像和预测面部图像的像素信息,然后计算像素信息之间的差异,从而得到像素损失。
例如,可以提取面部参考图像样本在颜色通道上的值,以及提取预测面部图像在颜色通道上的值,并将两者进行相减后求绝对值,从而得到像素损失。譬如,可以如下公式所示:
Reconstruction_loss=abs(result-gt_img)
其中,result可以是预测面部图像的像素信息,gt_img可以是面部参考图像样本的像素信息,Reconstruction_loss可以是像素损失。
在一些实施例中,由于面部图像处理模型中可以对图像在三维空间中进行特征提取以及在二维空间中进行特征提取,所以特征损失可以包括二维特征损失和三维特征损失,相应的,计算面部参考图像样本和预测面部图像之间的特征差异,得到特征损失,可以包括:
计算面部参考图像样本和预测面部图像之间的二维特征差异,得到二维特征损失;
计算面部参考图像样本和预测面部图像之间的三维特征差异,得到三维特征损失;
将二维特征损失和三维特征损失进行融合,得到特征损失。
其中,二维特征差异可以包括面部参考图像样本和预测面部图像在二维空间中特征的差异。例如,二维特征差异可以包括面部参考图像样本的图像特征和预测面部图像的图像特征上的差异。
其中,三维特征差异可以包括面部参考图像样本和预测面部图像在三维空间中特征的差异。例如,三维特征差异可以包括面部参考图像样本的三维面部参考图像样本特征和预测面部图像的三维预测面部图像特征之间的差异。
在一些实施例中,在计算面部参考图像样本和预测面部图像之间的二维特征差异时,可以分别对面部参考图像样本和预测面部图像进行特征提取,得到面部参考图像样本的图像特征以及预测面部图像的图像特征,然后,计算面部参考图像样本的图像特征以及预测面部图像的图像特征之间的差异。
例如,可以利用Alexnet网络对面部参考图像样本和预测面部图像进行特征提取,得到面部参考图像样本的图像特征以及预测面部图像的图像特征。
其中,Alexnet网络由5个卷积层和3个全连接层组成。其中,5个卷积层可以配置为对图像进行特征提取,且每个层之间具有信息传递的关系。例如,第一个卷积层对图像进行特征提取之后,会将提取得到的信息传递给第二个卷积层。然后,第二个卷积层会将对第一个卷积层提取到的特征继续进行特征提取,并将提取到的特征传递给第三个卷积层。以此类推,最后第五个卷积层会将提取到的特征传递给全连接层。
在一些实施例中,可以利用多种方式计算面部参考图像样本的图像特征以及预测面部图像的图像特征之间的差异。
例如,可以利用图像感知相似度指标(LPIPS)计算面部参考图像样本的图像特征以及预测面部图像的图像特征之间的差异。又例如,可以利用求差的方式计算二维特征损失。又例如,可以利用余弦相似度的方式计算二维特征损失,等等。
在一些实施例中,当利用Alexnet网络对图像进行特征提取时,可以计算每个卷积层中面部参考图像样本的图像特征以及预测面部图像的图像特征之间的差异。
例如,利用Alexnet网络对面部参考图像样本进行特征提取,得到gt_img_feal1、gt_img_feal2、gt_img_feal3和gt_img_feal4。其中,gt_img_feal1、gt_img_feal2、gt_img_feal3和gt_img_feal4可以分别指Alexnet网络中其中4个卷积层输出的面部参考图像样本的特征。
同理,可以利用Alexnet网络对预测面部图像进行特征提取,得到result_feal1、result_feal2、result_feal3和result_feal4。其中,result_feal1、result_feal2、result_feal3和result_feal4可以分别指Alexnet网络中其中4个卷积层输出的预测面部图像的特征。
然后,可以依据下列公式计算二维特征损失:
Two_loss=abs(result_feal1-gt_img_feal1)+abs(result_feal2-gt_img_feal2)+abs(result_feal3-gt_img_feal3)+abs(result_feal4-gt_img_feal4)
其中,Two_loss可以是二维特征损失。
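作为一种示意性的实现,下面给出利用Alexnet中间卷积层特征计算二维特征损失的代码草图(选取哪几层输出作为特征为假设;实际中也可以直接使用现成的LPIPS实现):

import torch
import torchvision

class TwoDFeatureLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.features = torchvision.models.alexnet(weights="DEFAULT").features.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.taps = {1, 4, 7, 9}   # 假设:取4个卷积层(ReLU后)的输出

    def extract(self, x):
        feats = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.taps:
                feats.append(x)
        return feats

    def forward(self, result, gt_img):
        # Two_loss = Σ abs(result_feal_k - gt_img_feal_k)
        loss = 0.0
        for fr, fg in zip(self.extract(result), self.extract(gt_img)):
            loss = loss + (fr - fg).abs().mean()
        return loss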
在一些实施例中,可以对面部参考图像样本和预测面部图像进行面部建模,得到面部参考图像样本的三维面部参考图像样本特征和预测面部图像的三维预测面部图像特征,然后可以计算三维面部参考图像样本特征和三维预测面部图像特征之间的差异。
在一些实施例中,也可以利用多种方式计算三维面部参考图像样本特征和三维预测面部图像特征之间的差异。例如,可以利用图像感知相似度指标(LPIPS)计算三维特征损失。又例如,可以利用求差的方式计算三维特征损失。又例如,可以利用余弦相似度的方式计算三维特征损失,等等。
在一些实施例中,可以按照下列公式计算三维特征损失:
3d_feature_loss=abs(gt_3d_feature-result_3d_feature)
其中,3d_feature_loss可以表示三维特征损失,gt_3d_feature可以表示三维面部参考图像样本特征,result_3d_feature可以表示三维预测面部图像特征。
在一些实施例中,在得到二维特征损失和三维特征损失之后,可以将二维特征损失和三维特征损失进行融合,得到特征损失。例如,可以将二维特征损失和三维特征损失进行相加,得到特征损失。又例如,可以将二维特征损失和三维特征损失加权后求和,得到特征损失。
在一些实施例中,在计算面部参考图像样本和预测面部图像之间的判别差异时,可以将面部参考图像样本和预测面部图像进行尺度变换,并利用判别器对尺度变换后的图像进行判别,从而提高判别损失的丰富性。例如,分别对面部参考图像样本和预测面部图像进行尺度变换处理,得到至少一个尺度变换后的面部参考图像样本和至少一个尺度变换后的预测面部图像;
分别对至少一个尺度变换后面部参考图像样本和至少一个尺度变换后预测面部图像进行判别处理,得到尺度变换后面部参考图像样本的第一判别特征和尺度变换后预测面部图像的第二判别特征;
基于第一判别特征和第二判别特征计算所述判别损失。
其中,尺度变换可以指改变图像的尺寸。例如,图像的尺寸为长256×宽256,通过尺度变换,可以将图像的尺寸改变为长128×宽128。
例如,面部参考图像样本的原尺寸为a,可以通过尺度变换处理,得到尺寸为1/2a的面部参考图像样本,和尺寸为1/4a的面部参考图像样本。
同理,假设预测面部图像的原尺寸为b,可以通过尺度变换处理,得到尺寸为1/2b的预测面部图像和尺寸为1/4b的预测面部图像。
接下来,可以分别对至少一个尺度变换后面部参考图像样本和至少一个尺度变换后预测面部图像进行判别,得到尺度变换后的面部参考图像样本的第一判别特征和尺度变换后的预测面部图像的第二判别特征。
例如,可以将原尺寸为a的面部参考图像样本、尺寸为1/2a的面部参考图像样本和尺寸为1/4a的面部参考图像样本输入到判别器中,得到判别结果。
譬如,将原尺寸为a的面部参考图像样本、尺寸为1/2a的面部参考图像样本和尺寸为1/4a的面部参考图像样本输入到判别器之后,得到的判别结果分别为D(gt_img)、D(gt_img_1/2)和D(gt_img_1/4)。其中,符号D()可以表示判别器的判别结果。gt_img可以指原尺寸为a的面部参考图像样本,gt_img_1/2可以指尺寸为1/2a的面部参考图像样本,gt_img_1/4可以指尺寸为1/4a的面部参考图像样本。
在一些实施例中,判别结果一般用特征来表示。例如,D()一般是一个在0至1之间的数值,其中,当判别结果为1时,说明图像通过判别,而当判别结果为0时,说明图像未通过判别。
例如,第一判别特征可以包括D(gt_img)、D(gt_img_1/2)和D(gt_img_1/4)。
又例如,可以将原尺寸为b的预测面部图像、尺寸为1/2b的预测面部图像和尺寸为1/4b的预测面部图像输入到判别器中,得到判别结果。
譬如,将原尺寸为b的预测面部图像、尺寸为1/2b的预测面部图像和尺寸为1/4b的预测面部图像输入到判别器中,得到判别结果分别为D(result)、D(result_1/2)和D(result_1/4)。其中,result可以指原尺寸为b的预测面部图像,result_1/2可以指尺寸为1/2b的预测面部图像,result_1/4可以指尺寸为1/4b的预测面部图像。
其中,第二判别特征可以包括判别器的判别结果。例如,第二判别特征可以包括D(result)、D(result_1/2)和D(result_1/4)。
在一些实施例中,可以利用求差的方式计算判别损失。又例如,可以利用余弦相似度的方式计算判别损失,等等。
在一些实施例中,可以根据下列方式计算判别损失:
D_loss=1/3*(-logD(gt_img)-logD(result)-logD(gt_img_1/2)-logD(result_1/2)-logD(gt_img_1/4)-logD(result_1/4))
其中,D_loss可以是判别损失。
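按上式,下面给出多尺度判别损失计算的代码草图(假设discriminator对输入图像输出0至1之间的判别结果;尺度的选取与插值方式为示例性的假设):

import torch
import torch.nn.functional as F

def multi_scale_d_loss(discriminator, gt_img, result):
    loss, scales = 0.0, [1.0, 0.5, 0.25]
    for s in scales:
        g = gt_img if s == 1.0 else F.interpolate(gt_img, scale_factor=s)
        r = result if s == 1.0 else F.interpolate(result, scale_factor=s)
        # 对每个尺度累加 -logD(gt) - logD(result)
        loss = loss - torch.log(discriminator(g) + 1e-8).mean() \
                    - torch.log(discriminator(r) + 1e-8).mean()
    return loss / len(scales)   # 对应公式中的1/3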
在一些实施例中,为了使得预测面部图像中预测面部的身份特征和面部图像样本中源面部的身份特征尽量相似,还可以计算面部图像样本和预测面部图像之间的面部特征损失。
例如,可以对预测面部图像和面部图像样本进行面部特征提取,得到预测面部图像的面部特征和面部图像样本的面部特征。然后,计算预测面部图像的面部特征和面部图像样本的面部特征间的面部特征损失。
在一些实施例中,可以利用求差的方式计算面部特征损失。又例如,可以利用余弦相似度的方式计算面部特征损失,等等。
在一些实施例中,可以按照下式计算面部特征损失:
id_loss=1-cosine_similarity(result_id_feature,source_id_feature)
其中,id_loss可以是面部特征损失,result_id_feature可以是预测面部图像的面部特征,source_id_feature可以是面部图像样本的面部特征。cosine similarity可以是余弦相似度的计算方式,其中,cosine similarity的表达式可以如下:
cosine_similarity(A,B)=(Σ_(i=1)^n A_i*B_i)/(sqrt(Σ_(i=1)^n A_i^2)*sqrt(Σ_(i=1)^n B_i^2))
其中,A和B可以是向量,A_i可以是向量A中的分量,B_i可以是向量B中的分量。i可以指第i个分量,n可以指向量中分量的总数。
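按上式,下面给出面部特征损失计算的代码草图(两个输入分别为预测面部图像与面部图像样本经面部特征提取网络得到的特征向量,函数名为示例性的假设):

import torch
import torch.nn.functional as F

def face_id_loss(result_id_feature, source_id_feature):
    # 余弦相似度越高,说明两者身份越接近,损失越小
    cos = F.cosine_similarity(result_id_feature, source_id_feature, dim=-1)
    return (1.0 - cos).mean()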
在一些实施例中,在得到第一损失和面部特征损失之后,可以将第一损失和面部特征损失进行融合,得到第二损失。例如,可以将第一损失和面部特征损失进行相加,得到第二损失。又例如,可以将第一损失和面部特征进行加权求和,得到第二损失。
205、基于面部轮廓损失,更新所述面部图像处理模型的模型参数。
在一些实施例中,可以基于面部轮廓损失,更新所述面部图像处理模型的模型参数,也即对面部图像处理模型进行调整,得到训练后的图像处理模型。
例如,如图6所示,在得到面部轮廓损失之后,可以利用面部轮廓损失约束预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点一致。
例如,可以基于面部轮廓损失对预设面部图像处理模型中的模型参数进行调整,得到调整后的面部图像处理模型。然后,又利用训练图像样本组对调整后面部图像处理模型进行训练,通过上述反复的操作,当面部轮廓损失小于一定的程度,或者符合要求时,说明训练达到了目的。此时,便可得到性能符合要求的训练后的图像处理模型。
在一些实施例中,还可以将面部轮廓损失和第二损失进行融合,得到第三损失,然后利用第三损失对图像处理模型进行调整,即利用第三损失更新面部图像处理模型的模型参数,得到训练后的面部图像处理模型。例如,获取面部图像处理模型的模型参数;
将面部轮廓损失和第二损失进行融合处理,得到第三损失;
利用第三损失对模型参数进行调整,得到训练后面部图像处理模型。
例如,可以将面部轮廓损失和第二损失进行相加,得到第三损失。然后,利用第三损失对预设面部图像处理模型的模型参数进行调整,得到训练后的面部图像处理模型。
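作为一种示意性的实现,下面给出将面部轮廓损失与第二损失融合为第三损失并更新模型参数的代码草图(losses中各项损失的计算方式见前文,直接相加的融合方式只是其中一种示例):

import torch

def train_step(model, optimizer, losses):
    # 第二损失:第一损失(像素、特征、判别损失)与面部特征损失的融合
    second_loss = (losses["pixel"] + losses["feature"]
                   + losses["discriminator"] + losses["id"])
    third_loss = losses["contour"] + second_loss   # 第三损失
    optimizer.zero_grad()
    third_loss.backward()    # 反向传播
    optimizer.step()         # 利用第三损失更新面部图像处理模型的模型参数
    return third_loss.detach()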
在一些实施例中,预设面部图像处理模型在训练的过程中,除了会学习如何实现面部替换之外,还会对三维特征进行学习,从而预测出图像的三维特征,例如,如图6所示。
在本申请实施例中,可以获取训练图像样本组,训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本;利用面部图像处理模型将面部模板图像样本中的模板面部替换为面部图像样本中的源面部,得到预测面部图像;对预测面部图像进行三维面部轮廓点检测,得到预测面部图像的三维面部轮廓点,以及对面部参考图像样本进行三维面部轮廓点检测,得到面部参考图像样本的三维面部轮廓点;计算预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的差异,得到预测面部图像和所述面部参考图像样本之间的面部轮廓损失;基于面部轮廓损失对面部图像处理模型进行调整,得到训练后的面部图像处理模型。通过利用三维面部轮廓点对面部图像处理模型进行训练,使得利用训练后的面部图像处理模型将面部模板图像中的模板面部替换为源面部时,得到的替换后面部图像可以保持源面部的面部轮廓,从而提高了面部图像处理方法的准确性。
此外,在本申请实施例中,通过在多个不同维度上计算损失,并利用多个维度上的损失对预设面部图像处理模型进行调整,从而使得预设面部图像处理模型可以利用多个维度上的损失在不同的维度上进行参数的调整,使得训练后面部图像处理模型具有更好的性能。
根据上面实施例所描述的方法,以下将举例进行详细说明。
本申请实施例将以面部图像处理模型的训练集成在计算机设备上为例来介绍本申请实施例方法。
在一些实施例中,如图7所示,一种面部图像处理模型的训练方法,包括:
301、计算机设备获取训练图像样本组,训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本。
例如,可以将面部图像样本表示为source,将面部模板图像样本表示为target,将面部参考图像样本表示为gt_img。
302、计算机设备利用预设面部图像处理模型将面部模板图像样本中的模板面部替换为面部图像样本中的源面部,得到预测面部图像。
例如,计算机设备可以将source和target输入到预设面部图像处理模型中的编码器。编码器会将source和target的分辨率不断降低,并在隐空间将两者编码为初始面部替换样本特征。
此外,计算机设备可以利用面部特征提取网络对source进行特征提取,得到source的面部特征source_id_feature。
此外,计算机设备还可以对source和target进行三维面部建模,得到面部图像样本的三维面部图像样本特征和面部模板图像样本的三维面部模板图像样本特征。
然后计算机设备可以利用预设面部图像处理模型基于融合后三维面部图像样本特征对初始面部替换样本特征进行转换,得到目标面部替换样本特征。
接下来,计算机设备可以利用预设面部图像处理模型基于目标面部替换样本特征和面部图像样本的面部特征,将面部模板图像样本中的模板面部替换为面部图像样本的源面部,得到预测面部图像。
其中,可以将预测面部图像表示为result。
303、计算机设备对预测面部图像进行三维面部轮廓点检测,得到预测面部图像的三维面部轮廓点,以及对面部参考图像样本进行三维面部轮廓点检测,得到面部参考图像样本的三维面部轮廓点。
例如,计算机设备可以计算result的三维预测面部图像特征(可以表示为result_3d_feature)。
然后,计算机设备可以对三维预测面部图像特征进行三维关键点投影,得到预测面部图像的三维面部关键点。例如,如下公式所示:
result_3d_points=reconstruction_without_tex(result_3d_feature)
接下来,计算机设备可以基于三维面部关键点,从三维面部关键点中筛选出三维面部轮廓点。
同理,计算机设备可以计算gt_img的三维面部参考图像样本特征(可以表示为gt_3d_feature)。
然后,计算机设备可以对三维面部参考图像样本特征进行三维关键点投影,得到面部参考图像样本的三维面部关键点。例如,如下公式所示:
gt_3d_points=reconstruction_without_tex(gt_3d_feature)
其中,gt_3d_points可以为面部参考图像样本的三维面部关键点。
304、计算机设备计算预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的差异,得到预测面部图像和所述面部参考图像样本之间的面部轮廓损失。
例如,可以根据下述公式计算面部轮廓损失:
3d_point_loss=abs(gt_3d_OutlookPoint-result_3d_OutlookPoint)
在一些实施例中,还可以计算其它损失,并利用面部轮廓损失和其它损失共同对预设面部图像处理模型进行调整,得到训练后面部图像处理模型。
例如,可以将面部特征损失、像素损失、特征损失、判别损失以及面部轮廓损失进行相加。然后,利用相加得到的损失对预设面部图像处理模型进行调整,得到训练后的面部图像处理模型。
305、计算机设备基于面部轮廓损失,对面部图像处理模型的模型参数进行调整,得到训练后的面部图像处理模型。
本申请实施例中计算机设备可以获取训练图像样本组,训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本;计算机设备可以利用预设面部图像处理模型将面部模板图像样本中的模板面部替换为面部图像样本中的源面部,得到预测面部图像;计算机设备可以对预测面部图像进行三维面部轮廓点检测,得到预测面部图像的三维面部轮廓点,以及对面部参考图像样本进行三维面部轮廓点检测,得到面部参考图像样本的三维面部轮廓点;计算机设备计算预测面部图像的三维面部轮廓点和面部参考图像样本的三维面部轮廓点之间的差异,得到预测面部图像和所述面部参考图像样本之间的面部轮廓损失;计算机设备基于面部轮廓损失对预设面部图像处理模型进行调整,得到训练后面部图像处理模型。通过利用三维面部轮廓点对预设面部图像处理模型进行训练,使得利用训练后面部图像处理模型将面部模板图像中的模板面部替换为源面部时,得到的替换后面部图像可以保持源面部的面部轮廓,从而提高了面部图像处理方法的准确性。
本申请实施例将以面部图像处理方法集成在计算机设备上为例来介绍本申请实施例方法。
在一些实施例中,如图8所示,一种面部图像处理方法,包括:
401、计算机设备获取源面部的面部图像和模板面部的面部模板图像。
402、计算机设备对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征。
403、计算机设备将三维面部图像特征和三维面部模板图像特征进行融合,得到三维融合特征。
404、计算机设备基于面部模板图像,对面部图像进行面部替换特征提取,得到初始面部替换特征。
405、计算机设备基于三维融合特征对初始面部替换特征进行转换,得到目标面部替换特征。
406、计算机设备利用训练后的面部图像处理模型基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后面部图像。
本申请实施例中,计算机设备获取源面部的面部图像和模板面部的面部模板图像;计算机设备对面部图像和面部模板图像进行三维面部建模,得到面部图像的三维面部图像特征和面部模板图像的三维面部模板图像特征;计算机设备将三维面部图像特征和三维面部模板图像特征进行融合,得到三维融合特征;计算机设备基于面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;计算机设备基于三维融合特征对初始面部替换特征进行转换,得到目标面部替换特征;计算机设备利用训练后面部图像处理模型基于目标面部替换特征和面部图像的面部特征,将面部模板图像中的模板面部替换为源面部,得到替换后面部图像。本申请实施例通过分别从二维空间以及三维空间的角度出发,可以获取到面部图像和面部模板图像更多的特征,从而使得在进行面部替换处理时,有更加多的信息依据,从而提高了面部图像处理方法的准确性。
为了更好地实施本申请实施例提供的面部图像处理方法,本申请实施例还提供了一种面部图像处理装置,该面部图像处理装置可以集成于计算机设备中。其中名词的含义与上述面部图像处理方法中相同,具体实现细节可以参考方法实施例中的说明。
在一些实施例中,提供了一种面部图像处理装置,该面部图像处理装置可以集成在计算机设备中,如图9所示,该面部图像处理装置包括:第一获取单元501、三维面部建模单元502、第一融合单元503、特征提取单元504、转换单元505和第一替换单元506,具体如下:
第一获取单元501,配置为获取源面部的面部图像和模板面部的面部模板图像;
三维面部建模单元502,配置为对所述面部图像和所述面部模板图像进行三维面部建模,得到所述面部图像的三维面部图像特征和所述面部模板图像的三维面部模板图像特征;
第一融合单元503,配置为将所述三维面部图像特征和所述三维面部模板图像特征进行融合,得到三维融合特征;
特征提取单元504,配置为基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;
转换单元505,配置为基于所述三维融合特征对所述初始面部替换特征进行转换,得到目标面部替换特征;
第一替换单元506,配置为基于所述目标面部替换特征,将所述面部模板图像中的模板面部替换为所述源面部,得到替换后面部图像。
在一些实施例中,所述第一融合单元503,包括:
第一提取子单元,配置为在所述三维面部图像特征中提取出所述面部图像对应的源面部身份特征;
第二提取子单元,配置为在所述三维面部模板图像特征中提取出所述面部模板图像对应的模板面部图像特征;
第一融合子单元,配置为将所述源面部身份特征和所述模板面部图像特征进行融合,得到所述三维融合特征。
在一些实施例中,所述特征提取单元504,包括:
第一编码子单元,配置为对所述面部模板图像进行编码处理,得到所述面部模板图像的第一编码特征;
第二编码子单元,配置为对所述面部图像进行编码处理,得到所述面部图像的第二编码特征;
第一调整子单元,配置为基于所述第二编码特征,对所述第一编码特征进行调整,得到所述初始面部替换特征。
在一些实施例中,所述转换单元505,包括:
第一统计子单元,配置为对所述三维融合特征进行第一逻辑运算,得到运算后的三维面部图像特征,以及对所述初始面部替换特征进行第二逻辑运算,得到运算后的面部替换特征;
第二统计子单元,配置为将所述初始面部替换特征和所述运算后的面部替换特征进行第三逻辑运算,得到运算后的面部替换特征;
逻辑运算处理子单元,配置为将所述运算后的面部替换特征和所述运算后的三维面部图像特征进行逻辑运算,得到所述目标面部替换特征。
以上各个单元可以作为独立的实体来实现,也可以进行任意组合,作为同一或若干个实体来实现,以上各个单元的具体实施可参见前面的方法实施例,在此不再赘述。
通过上述的面部图像处理装置可以提高了对面部图像进行替换的准确度。
此外,在一些实施例中还提供了一种面部图像处理模型的训练装置,该面部图像处理模型的训练装置可以集成于计算机设备中。其中名词的含义与上述面部图像处理模型的训练方法中相同,具体实现细节可以参考方法实施例中的说明。
在一些实施例中,提供了一种面部图像处理模型的训练装置,该面部图像处理模型的训练装置可以集成在计算机设备中,如图10所示,该面部图像处理模型的训练装置包括:第二获取单元601、第二替换单元602、三维面部轮廓点检测单元603、计算单元604和调整单元605,其中:
第二获取单元601,配置为获取训练图像样本组,所述训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本;
第二替换单元602,配置为利用面部图像处理模型将所述面部模板图像样本中的模板面部替换为所述面部图像样本中的源面部,得到预测面部图像;
三维面部轮廓点检测单元603,配置为对所述预测面部图像进行三维面部轮廓点检测,得到所述预测面部图像的三维面部轮廓点,以及对所述面部参考图像样本进行三维面部轮廓点检测,得到所述面部参考图像样本的三维面部轮廓点;
计算单元604,配置为计算所述预测面部图像的三维面部轮廓点和所述面部参考图像样本的三维面部轮廓点之间的差异,得到所述预测面部图像和所述面部参考图像样本之间的面部轮廓损失;
调整单元605,配置为基于所述面部轮廓损失对所述面部图像处理模型进行调整,得到训练后的面部图像处理模型。
在一些实施例中,所述三维面部轮廓点检测单元603,包括:
三维面部建模子单元,配置为对所述预测面部图像进行三维面部建模,得到所述预测面部图像的三维预测面部图像特征;
三维关键点投影子单元,配置为对所述三维预测面部图像特征进行三维关键点投影,得到所述预测面部图像的三维面部关键点;
筛选子单元,配置为基于所述三维面部关键点,从所述三维面部关键点中筛选出所述三维面部轮廓点。
在一些实施例中,所述三维关键点投影子单元,包括:
提取模块,配置为从所述三维预测面部图像特征中提取出所述预测面部图像的预测面部身份特征和预测面部表情特征;
三维关键点投影模块,配置为利用预设传递参数将所述预测面部身份特征和预测面部表情特征进行三维关键点投影,得到所述预测面部图像的三维面部关键点。
在一些实施例中,所述面部图像处理模型的训练装置还包括:
第一计算单元,配置为计算所述面部参考图像样本和所述预测面部图像之间除了三维面部轮廓点以外的差异,得到所述面部参考图像样本和所述预测面部图像的第一损失,所述第一损失包括除了面部轮廓损失以外的损失;
第二计算单元,配置为计算所述面部图像样本和所述预测面部图像之间的面部特征损失;
第二融合单元,配置为将所述第一损失和所述面部特征损失进行融合处理,得到第二损失。
在一些实施例中,所述第一计算单元,包括:
第一计算子单元,配置为计算所述面部参考图像样本和所述预测面部图像之间的像素差异,得到像素损失;
第二计算子单元,配置为计算所述面部参考图像样本和所述预测面部图像之间的特征差异,得到特征损失;
第三计算子单元,配置为计算所述面部参考图像样本和所述预测面部图像之间的判别差异,得到判别损失。
在一些实施例中,所述第二计算子单元,包括:
第一计算模块,配置为计算所述面部参考图像样本和所述预测面部图像之间的二维特征差异,得到二维特征损失;
第二计算模块,配置为计算所述面部参考图像样本和所述预测面部图像之间的三维特征差异,得到三维特征损失;
第一融合模块,配置为将所述二维特征损失和所述三维特征损失进行融合,得到所述特征损失。
在一些实施例中,所述第三计算子单元,包括:
尺度变换模块,配置为分别对所述面部参考图像样本和所述预测面部图像进行尺度变换处理,得到至少一个尺度变换后面部参考图像样本和至少一个尺度变换后预测面部图像;
判别模块,配置为分别对所述至少一个尺度变换后面部参考图像样本和所述至少一个尺度变换后预测面部图像进行判别,得到所述尺度变换后面部参考图像样本的第一判别特征和所述尺度变换后预测面部图像的第二判别特征;
第三计算模块,配置为基于所述第一判别特征和所述第二判别特征计算所述判别损失。
在一些实施例中,所述调整单元605,包括:
获取子单元,配置为获取所述面部图像处理模型的模型参数;
第二融合子单元,配置为将所述面部轮廓损失和所述第二损失进行融合处理,得到第三损失;
参数调整单元,配置为利用所述第三损失对所述模型参数进行调整,得到所述训练后面部图像处理模型。
本申请实施例还提供一种计算机设备,该计算机设备可以包括终端或服务器,比如,该终端可以为手机、平板电脑等;又比如,该计算机设备可以为服务器等。如图11所示,其示出了本申请实施例所涉及的终端的结构示意图。
该计算机设备可以包括一个或者一个以上处理核心的处理器701、一个或一个以上计算机可读存储介质的存储器702、电源703和输入单元704等部件。本领域技术人员可以理解,图11中示出的计算机设备结构并不构成对计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:
处理器701是该计算机设备的控制中心,利用各种接口和线路连接整个计算机设备的各个部分,通过运行或执行存储在存储器702内的软件程序和/或模块,以及调用存储在存储器702内的数据,执行计算机设备的各种功能和处理数据,从而对计算机设备进行整体监控。在一些实施例中,处理器701可包括一个或多个处理核心;优选的,处理器701可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户页面和应用程序等,调制解调处理器主要处理无线通讯。可以理解的是,上述调制解调处理器也可以不集成到处理器701中。
存储器702可配置为存储软件程序以及模块,处理器701通过运行存储在存储器702的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器702可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机设备的使用所创建的数据等。此外,存储器702可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器702还可以包括存储器控制器,以提供处理器701对存储器702的访问。
计算机设备还包括给各个部件供电的电源703,例如,电源703可以通过电源管理系统与处理器701逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源703还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。
该计算机设备还可包括输入单元704,该输入单元704可配置为接收输入的数字或字符信息,以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。
尽管未示出,计算机设备还可以包括显示单元等,在此不再赘述。具体在本实施例中,计算机设备中的处理器701会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器702中,并由处理器701来运行存储在存储器702中的应用程序,从而实现本申请实施例提供的上述面部图像处理方法或面部图像处理模型的训练方法。
由于该存储介质中所存储的计算机程序,可以执行本申请实施例所提供的任一种面部图像处理方法中的步骤,因此,可以实现本申请实施例所提供的任一种面部图像处理方法所能实现的有益效果,详见前面的实施例,在此不再赘述。
以上对本申请实施例所提供的一种面部图像处理方法和面部图像处理模型进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (20)

  1. 一种面部图像处理方法,所述方法由计算机设备执行,包括:
    获取源面部的面部图像和模板面部的面部模板图像;
    对所述面部图像和所述面部模板图像进行三维面部建模,得到所述面部图像的三维面部图像特征和所述面部模板图像的三维面部模板图像特征;
    将所述三维面部图像特征和所述三维面部模板图像特征进行融合,得到三维融合特征;
    基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;
    基于所述三维融合特征,对所述初始面部替换特征进行转换,得到目标面部替换特征;
    基于所述目标面部替换特征,将所述面部模板图像中的模板面部替换为所述源面部,得到替换后的面部图像。
  2. 如权利要求1所述的方法,其中,所述将所述三维面部图像特征和所述三维面部模板图像特征进行融合,得到三维融合特征,包括:
    在所述三维面部图像特征中,提取所述面部图像的源面部身份特征;
    在所述三维面部模板图像特征中,提取所述面部模板图像的模板面部图像特征;
    将所述源面部身份特征和所述模板面部图像特征进行融合,得到所述三维融合特征。
  3. 如权利要求1所述的方法,其中,所述基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征,包括:
    对所述面部模板图像进行编码,得到所述面部模板图像的第一编码特征;
    对所述面部图像进行编码,得到所述面部图像的第二编码特征;
    基于所述第二编码特征,对所述第一编码特征进行调整,得到所述初始面部替换特征。
  4. 如权利要求1所述的方法,其中,所述基于所述三维融合特征,对所述初始面部替换特征进行转换,得到目标面部替换特征,包括:
    对所述三维融合特征进行第一逻辑运算,得到运算后的三维面部图像特征,以及对所述初始面部替换特征进行第二逻辑运算,得到运算后的面部替换特征;
    对所述初始面部替换特征和所述运算后的面部替换特征,进行第三逻辑运算,得到运算后的面部替换特征;
    将所述运算后的面部替换特征和所述运算后的三维面部图像特征,进行第四逻辑运算,得到所述目标面部替换特征。
  5. 如权利要求4所述的方法,其中,所述三维融合特征包括至少两个子三维融合特征,每个所述子三维融合特征对应一个特征维度;
    所述对所述三维融合特征进行第一逻辑运算,得到运算后的三维面部图像特征,包括:
    确定所述至少两个子三维融合特征的标准差,并将所述标准差作为所述运算后的三维面部图像特征。
  6. 如权利要求1所述的方法,其中,所述将所述三维面部图像特征和所述三维面部模板图像特征进行融合,得到三维融合特征,包括:
    利用面部图像处理模型,将所述三维面部图像特征和所述三维面部模板图像特 征进行融合,得到三维融合特征;
    所述基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征,包括:
    利用所述面部图像处理模型,基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;
    所述基于所述三维融合特征,对所述初始面部替换特征进行转换,得到目标面部替换特征,包括:
    利用所述面部图像处理模型,基于所述三维融合特征,对所述初始面部替换特征进行转换,得到目标面部替换特征。
  7. 如权利要求1所述的方法,其中,所述基于所述目标面部替换特征,将所述面部模板图像中的模板面部替换为所述源面部,得到替换后的面部图像,包括:
    对所述面部图像进行特征提取,得到面部图像的面部特征;
    基于所述目标面部替换特征和所述面部图像的面部特征,将所述面部模板图像中的模板面部替换为所述源面部,得到替换后的面部图像。
  8. 一种面部图像处理模型的训练方法,所述方法由计算机设备执行,包括:
    获取训练图像样本组,所述训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本;
    利用面部图像处理模型,将所述面部模板图像样本中的模板面部替换为所述面部图像样本中的源面部,得到预测面部图像;
    对所述预测面部图像进行三维面部轮廓点检测,得到所述预测面部图像的三维面部轮廓点,以及对所述面部参考图像样本进行三维面部轮廓点检测,得到所述面部参考图像样本的三维面部轮廓点;
    获取所述预测面部图像的三维面部轮廓点和所述面部参考图像样本的三维面部轮廓点之间的差异,得到所述预测面部图像和所述面部参考图像样本之间的面部轮廓损失;
    基于所述面部轮廓损失,更新所述面部图像处理模型的模型参数。
  9. 如权利要求8所述的方法,其中,所述对所述预测面部图像进行三维面部轮廓点检测,得到所述预测面部图像的三维面部轮廓点,包括:
    对所述预测面部图像进行三维面部建模,得到所述预测面部图像的三维预测面部图像特征;
    对所述三维预测面部图像特征进行三维关键点投影,得到所述预测面部图像的三维面部关键点;
    从所述三维面部关键点中筛选出所述三维面部轮廓点。
  10. 如权利要求9所述的方法,其中,所述对所述三维预测面部图像特征进行三维关键点投影,得到所述预测面部图像的三维面部关键点,包括:
    从所述三维预测面部图像特征中,提取出所述预测面部图像的预测面部身份特征和预测面部表情特征;
    利用预设传递参数,将所述预测面部身份特征和预测面部表情特征进行三维关键点投影,得到所述预测面部图像的三维面部关键点。
  11. 如权利要求8所述的方法,其中,所述获取所述预测面部图像的三维面部轮廓点和所述面部参考图像样本的三维面部轮廓点之间的差异之后,还包括:
    获取所述面部参考图像样本和所述预测面部图像之间除了三维面部轮廓点以外的差异,得到所述面部参考图像样本和所述预测面部图像之间的第一损失,所述第一损失包括除了面部轮廓损失以外的损失;
    获取所述面部图像样本和所述预测面部图像之间的面部特征损失;
    将所述第一损失和所述面部特征损失进行融合,得到第二损失。
  12. 如权利要求11所述的方法,其中,所述第一损失包括像素损失、特征损失和判别损失;
    所述获取所述面部参考图像样本和所述预测面部图像之间除了三维面部轮廓点以外的差异,得到所述面部参考图像样本和所述预测面部图像之间的第一损失,包括:
    获取所述面部参考图像样本和所述预测面部图像之间的像素差异,得到像素损失;
    获取所述面部参考图像样本和所述预测面部图像之间的特征差异,得到特征损失;
    获取所述面部参考图像样本和所述预测面部图像之间的判别差异,得到判别损失。
  13. 如权利要求12所述的方法,其中,所述特征损失包括三维特征损失和二维特征损失;
    所述获取所述面部参考图像样本和所述预测面部图像之间的特征差异,得到特征损失,包括:
    获取所述面部参考图像样本和所述预测面部图像之间的二维特征差异,得到二维特征损失;
    获取所述面部参考图像样本和所述预测面部图像之间的三维特征差异,得到三维特征损失;
    将所述二维特征损失和所述三维特征损失进行融合,得到所述特征损失。
  14. 如权利要求12所述的方法,其中,所述获取所述面部参考图像样本和所述预测面部图像之间的判别差异,得到判别损失,包括:
    分别对所述面部参考图像样本和所述预测面部图像进行尺度变换,得到尺度变换后的面部参考图像样本和尺度变换后的预测面部图像;
    对所述尺度变换后的面部参考图像样本进行判别,得到第一判别特征,并对所述尺度变换后的预测面部图像进行判别,得到第二判别特征;
    基于所述第一判别特征和所述第二判别特征,确定所述判别损失。
  15. 如权利要求11所述的方法,其中,所述基于所述面部轮廓损失,更新所述面部图像处理模型的模型参数,包括:
    获取所述面部图像处理模型的模型参数;
    将所述面部轮廓损失和所述第二损失进行融合,得到第三损失;
    利用所述第三损失,更新所述面部图像处理模型的模型参数。
  16. 一种面部图像处理装置,所述装置包括:
    第一获取单元,配置为获取源面部的面部图像和模板面部的面部模板图像;
    三维面部建模单元,配置为对所述面部图像和所述面部模板图像进行三维面部建模,得到所述面部图像的三维面部图像特征和所述面部模板图像的三维面部模板图像特征;
    融合单元,配置为将所述三维面部图像特征和所述三维面部模板图像特征进行融合,得到三维融合特征;
    特征提取单元,配置为基于所述面部模板图像,对所述面部图像进行面部替换特征提取,得到初始面部替换特征;
    转换单元,配置为基于所述三维融合特征对所述初始面部替换特征进行转换处理,得到目标面部替换特征;
    第一替换单元,配置为基于所述目标面部替换特征,将所述面部模板图像中的模板面部替换为所述源面部,得到替换后的面部图像。
  17. 一种面部图像处理模型的训练装置,所述装置包括:
    第二获取单元,配置为获取训练图像样本组,所述训练图像样本组包括面部图像样本、面部模板图像样本和面部参考图像样本;
    第二替换单元,配置为利用面部图像处理模型,将所述面部模板图像样本中的模板面部替换为所述面部图像样本中的源面部,得到预测面部图像;
    三维面部轮廓点检测单元,配置为对所述预测面部图像进行三维面部轮廓点检测,得到所述预测面部图像的三维面部轮廓点,以及对所述面部参考图像样本进行三维面部轮廓点检测,得到所述面部参考图像样本的三维面部轮廓点;
    计算单元,配置为获取所述预测面部图像的三维面部轮廓点和所述面部参考图像样本的三维面部轮廓点之间的差异,得到所述预测面部图像和所述面部参考图像样本之间的面部轮廓损失;
    调整单元,配置为基于所述面部轮廓损失,更新所述面部图像处理模型的模型参数。
  18. 一种计算机设备,所述计算机设备包括:
    存储器,配置为存储可执行指令;
    处理器,配置为执行所述存储器中存储的可执行指令时,实现权利要求1至7任一项所述的方法,或者实现权利要求8至15任一项所述的方法。
  19. 一种计算机可读存储介质,存储有计算机可执行指令,所述计算机可执行指令被处理器执行时,实现权利要求1至7任一项所述的方法,或者实现权利要求8至15任一项所述的方法。
  20. 一种计算机程序产品,包括计算机程序或计算机可执行指令,所述计算机程序或计算机可执行指令被处理器执行时,实现权利要求1至7任一项所述的方法,或者实现权利要求8至15任一项所述的方法。
PCT/CN2022/111744 2021-08-20 2022-08-11 面部图像处理方法、面部图像处理模型的训练方法、装置、设备、存储介质及程序产品 WO2023020358A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022565902A JP7500768B2 (ja) 2021-08-20 2022-08-11 顔画像処理方法、顔画像処理モデルの訓練方法、装置、機器、及びコンピュータプログラム
KR1020227041706A KR20230028253A (ko) 2021-08-20 2022-08-11 얼굴 이미지 처리 방법, 얼굴 이미지 처리 모델 훈련 방법, 장치, 디바이스, 저장 매체 및 프로그램 제품
US18/070,301 US20230100427A1 (en) 2021-08-20 2022-11-28 Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110963370.5A CN114973349A (zh) 2021-08-20 2021-08-20 面部图像处理方法和面部图像处理模型的训练方法
CN202110963370.5 2021-08-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/070,301 Continuation US20230100427A1 (en) 2021-08-20 2022-11-28 Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
WO2023020358A1 true WO2023020358A1 (zh) 2023-02-23

Family

ID=82972978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111744 WO2023020358A1 (zh) 2021-08-20 2022-08-11 面部图像处理方法、面部图像处理模型的训练方法、装置、设备、存储介质及程序产品

Country Status (4)

Country Link
US (1) US20230100427A1 (zh)
KR (1) KR20230028253A (zh)
CN (1) CN114973349A (zh)
WO (1) WO2023020358A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171198B (zh) * 2022-09-02 2022-11-25 腾讯科技(深圳)有限公司 模型质量评估方法、装置、设备及存储介质
KR20240071778A (ko) * 2022-11-16 2024-05-23 전준혁 2d 얼굴 이미지로부터 3d 얼굴 모델을 생성하는 시스템 및 방법
CN116386121B (zh) * 2023-05-30 2023-08-11 湖北华中电力科技开发有限责任公司 一种基于电网安全生产的人员识别方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200302670A1 (en) * 2018-02-12 2020-09-24 Tencent Technology (Shenzhen) Company Limited Image processing method, electronic device, and storage medium
CN111783603A (zh) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 生成对抗网络训练方法、图像换脸、视频换脸方法及装置
CN111860167A (zh) * 2020-06-18 2020-10-30 北京百度网讯科技有限公司 人脸融合模型获取及人脸融合方法、装置及存储介质
CN113240792A (zh) * 2021-04-29 2021-08-10 浙江大学 一种基于人脸重建的图像融合生成式换脸方法

Also Published As

Publication number Publication date
KR20230028253A (ko) 2023-02-28
CN114973349A (zh) 2022-08-30
US20230100427A1 (en) 2023-03-30
JP2023541745A (ja) 2023-10-04

Similar Documents

Publication Publication Date Title
US11610122B2 (en) Generative adversarial neural network assisted reconstruction
US11610435B2 (en) Generative adversarial neural network assisted video compression and broadcast
US11983850B2 (en) Image processing method and apparatus, device, and storage medium
WO2023020358A1 (zh) 面部图像处理方法、面部图像处理模型的训练方法、装置、设备、存储介质及程序产品
CN111325851B (zh) 图像处理方法及装置、电子设备和计算机可读存储介质
CN112990054B (zh) 紧凑的无语言面部表情嵌入和新颖三元组的训练方案
JP7373554B2 (ja) クロスドメイン画像変換
WO2020228525A1 (zh) 地点识别及其模型训练的方法和装置以及电子设备
WO2022156640A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
WO2022143645A1 (zh) 三维人脸重建的方法、装置、设备和存储介质
CN110555896B (zh) 一种图像生成方法、装置以及存储介质
Yang et al. MPED: Quantifying point cloud distortion based on multiscale potential energy discrepancy
CN114648613A (zh) 基于可变形神经辐射场的三维头部模型重建方法及装置
CN111402403A (zh) 高精度三维人脸重建方法
CN113902848A (zh) 对象重建方法、装置、电子设备及存储介质
Purps et al. Reconstructing facial expressions of hmd users for avatars in vr
JP7500768B2 (ja) 顔画像処理方法、顔画像処理モデルの訓練方法、装置、機器、及びコンピュータプログラム
CN115359508A (zh) 通过专家的神经元优化以提高的效率执行复杂优化任务
Ho et al. Advances in Multimedia Information Processing--PCM 2015: 16th Pacific-Rim Conference on Multimedia, Gwangju, South Korea, September 16-18, 2015, Proceedings, Part I
CN116152399A (zh) 三维人脸形状生成方法、装置、设备及存储介质
JP7479507B2 (ja) 画像処理方法及び装置、コンピューター機器、並びにコンピュータープログラム
WO2024066549A1 (zh) 一种数据处理方法及相关设备
Zhao et al. The Method of Reconstructing Three‐Dimensional Human Posture from Two‐Dimensional Images in Film and Television Production
Zhao et al. 3D Face Reconstruction with Geometry Details from a Single Color Image Under Occluded Scenes
CN116630138A (zh) 图像处理方法、装置、电子设备和计算机可读存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022565902

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22857670

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE