WO2023184817A1 - Image processing method, device, computer equipment, computer-readable storage medium, and computer program product


Info

Publication number
WO2023184817A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
face
sample
target
features
Prior art date
Application number
PCT/CN2022/111774
Other languages
English (en)
French (fr)
Inventor
贺珂珂
朱俊伟
张昕昳
邰颖
汪铖杰
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to KR1020227040870A (published as KR20230141429A)
Priority to JP2022565680A (published as JP7479507B2)
Priority to US17/984,110 (published as US20230316607A1)
Publication of WO2023184817A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • This application relates to technical fields such as artificial intelligence and computer vision.
  • This application relates to an image processing method, device, computer equipment, computer-readable storage medium and computer program product.
  • Face swapping is an important technology in the field of computer vision, widely used in content production, film and television portrait production, entertainment video production, and other scenarios.
  • Face swapping refers to the process of transferring the facial features in image A onto image B to obtain a face-swapped image.
  • In the related art, face swapping is usually implemented by shape fitting: for example, a shape transformation relationship between regions such as the facial features and contours of the two images is computed from the facial key points detected in image A and in image B, and the faces in the two images are then fused according to this transformation to obtain the face-swapped image.
  • This shape-fitting approach realizes face swapping purely through facial deformation and fusion.
  • However, simple shape fitting cannot handle faces with large pose differences and easily produces unnatural facial deformation in the face-swapped image; that is, the similarity between the face-swapped image and the face in image A is low, resulting in low face-swapping accuracy.
  • Embodiments of the present application provide an image processing method, device, computer equipment, computer-readable storage medium, and computer program product, which can improve the similarity between the faces before and after replacement, thereby improving the accuracy of face swapping.
  • An embodiment of the present application provides an image processing method, which includes:
  • receiving a face-swapping request, where the face-swapping request is used to request that the face in an image to be face-swapped be replaced with a target face;
  • acquiring attribute parameters of the image to be face-swapped, attribute parameters of the target face, and facial features of the target face, where the attribute parameters of the image to be face-swapped indicate the three-dimensional attributes of the face in that image, and determining target attribute parameters based on the attribute parameters of the image to be face-swapped and the attribute parameters of the target face;
  • determining target comprehensive features based on the target attribute parameters and the facial features of the target face;
  • encoding the image to be face-swapped to obtain image coding features of the image to be face-swapped;
  • migrating, through a regularization method, the target comprehensive features into the image coding features to obtain fused coding features;
  • decoding the fused coding features to obtain a target face-swapped image including a fused face, where the fused face is a fusion of the face in the image to be face-swapped and the target face.
  • An embodiment of the present application provides an image processing device, which includes:
  • an attribute parameter acquisition module, configured to receive a face-swapping request, where the face-swapping request is used to request that the face in an image to be face-swapped be replaced with a target face;
  • a target attribute parameter determination module, configured to acquire the attribute parameters of the image to be face-swapped, the attribute parameters of the target face, and the facial features of the target face, where the attribute parameters of the image to be face-swapped indicate the three-dimensional attributes of the face in that image, and to determine target attribute parameters based on the attribute parameters of the image to be face-swapped and the attribute parameters of the target face;
  • a comprehensive feature determination module, configured to determine target comprehensive features based on the target attribute parameters and the facial features of the target face;
  • an encoding module, configured to encode the image to be face-swapped to obtain the image coding features of the image to be face-swapped;
  • a migration module, configured to migrate, through a regularization method, the target comprehensive features into the image coding features to obtain fused coding features;
  • a decoding module, configured to decode the fused coding features to obtain a target face-swapped image including a fused face, where the fused face is a fusion of the face in the image to be face-swapped and the target face.
  • An embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory.
  • the processor executes the computer program to implement the above image processing method.
  • Embodiments of the present application provide a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the above image processing method is implemented.
  • An embodiment of the present application provides a computer program product, which includes a computer program.
  • When the computer program is executed by a processor, the above image processing method is implemented.
  • In the embodiments of the present application, the target attribute parameters are determined based on the attribute parameters of the image to be face-swapped and the attribute parameters of the target face, thereby locating the three-dimensional attributes of the face in the image to be generated; the target comprehensive features, which jointly characterize the image to be face-swapped and the target face, are then obtained based on the target attribute parameters and the facial features of the target face.
  • The image to be face-swapped is encoded to obtain its image coding features, i.e. refined, pixel-level features of the image; the target comprehensive features are migrated into the image coding features through regularization to obtain the fused coding features.
  • In this way, the coding features, refined to the pixel level, are mixed with the global comprehensive features, and the distribution of the image coding features is aligned to the target comprehensive features, which improves the accuracy of the generated fused coding features.
  • By decoding the fused coding features, the target face-swapped image containing the fused face is obtained; each pixel of the decoded image can thus reflect the target comprehensive features, making the fused face look closer to the target face and improving the sensory similarity between the fused face and the target face, thereby improving the accuracy of face swapping (a minimal sketch of this flow follows).
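  • For illustration, the following is a minimal sketch of this inference flow, assuming hypothetical components (param_estimator, face_recognizer, encoder, style_mlp, decoder) and an AdaIN-style regularization transfer; all names, dictionary keys, and dimensions are assumptions, not taken from the patent.

```python
import torch

def adain(x, mean_y, std_y, eps=1e-5):
    # Align the per-channel mean/std of the coding features x (N, C, H, W)
    # with statistics derived from the target comprehensive features.
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    sigma_x = x.std(dim=(2, 3), keepdim=True) + eps
    return std_y[..., None, None] * (x - mu_x) / sigma_x + mean_y[..., None, None]

def swap_face(image, target_face, param_estimator, face_recognizer,
              encoder, style_mlp, decoder):
    # 1. Three-dimensional attribute parameters of both faces (dicts of tensors).
    src = param_estimator(image)        # image to be face-swapped: expression, angle, ...
    tgt = param_estimator(target_face)  # target face: shape coefficient, ...
    # 2. Target attribute parameters: shape from the target face,
    #    expression/angle from the image to be face-swapped.
    target_attr = torch.cat([tgt["id"], src["exp"], src["angle"]], dim=1)
    # 3. Target comprehensive features: attribute parameters spliced with the
    #    two-dimensional facial identity features of the target face.
    comprehensive = torch.cat([target_attr, face_recognizer(target_face)], dim=1)
    # 4. Pixel-level image coding features of the image to be face-swapped.
    coding = encoder(image)             # (N, C, H, W)
    # 5. Migrate the comprehensive features into the coding features through
    #    regularization: an MLP predicts per-channel statistics for AdaIN.
    mean_y, std_y = style_mlp(comprehensive).chunk(2, dim=1)  # each (N, C)
    fused = adain(coding, mean_y, std_y)
    # 6. Decode the fused coding features into the target face-swapped image.
    return decoder(fused)
```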
  • Figure 1 is a schematic diagram of an implementation environment for implementing an image processing method provided by an embodiment of the present application
  • Figure 2 is a schematic flowchart of a training method for a face-changing model provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of the training process framework of a face-changing model provided by an embodiment of the present application.
  • Figure 4 is a signaling interaction diagram of an image processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The facial images involved, such as the first sample image, the second sample image, the posture images, and the video of the target object used in training the face-swap model, as well as any object-related data such as the image to be face-swapped, the facial features of the target face, and the attribute parameters used when performing face swapping with the model, are all obtained after the consent or permission of the relevant subjects.
  • When the following embodiments of this application are applied to specific products or technologies, the permission or consent of the subject is required, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • The face-swapping performed on the facial image of any subject using the image processing method of this application is based on a face-swapping service or request triggered by the relevant object, and is carried out only after the permission or consent of the relevant object.
  • Face replacement: using the target face in one facial image to replace the face in another image.
  • Face-swap model: calling the face-swap model can swap the target face into any image to be face-swapped, based on the attribute parameters and facial features of the target face; the image processing method provided by the embodiments of the present application can use this face-swap model to replace the face in the image to be face-swapped with an exclusive target face.
  • Image to be face-swapped: an image in which the face needs to be replaced; the target face can replace the face in this image.
  • Target face-swapped image: obtained by face-swapping the image to be face-swapped using the image processing method of the embodiments of the present application.
  • The fused face included in the target face-swapped image is a fusion of the face in the image to be face-swapped and the target face, so the sensory similarity between the fused face and the target face is high.
  • The fused face also carries over the expression, angle, and other posture attributes of the face in the image to be face-swapped, making the target face-swapped image more vivid and realistic.
  • Attribute parameters: used to indicate the three-dimensional attributes of the face in an image; they can represent the posture of the face in three-dimensional space, the spatial environment, and other attributes.
  • Facial features: characterize the features of the face in an image on a two-dimensional plane, such as the distance between the eyes and the size of the nose; facial features can represent the identity of the object to whom the face belongs.
  • Target face: an exclusive face used to replace the face in an image; the target face can be a face specified by a user's selection operation.
  • The embodiments of the present application provide a face-replacement service that treats the target face as an exclusive face, meaning the exclusive target face can be swapped into any image to be face-swapped; for example, target face A can replace the face in image B, and target face A can also replace the face in image C.
  • First sample image: an image including the target face, used when training the face-swap model.
  • Second sample image: an image including the face to be replaced, used when training the face-swap model.
  • The target face in the first sample image can be used as the exclusive face and swapped into the second sample image; based on this process, a face-swap model can be trained.
  • Figure 1 is a schematic diagram of the implementation environment of an image processing method provided by an embodiment of the present application. As shown in Figure 1, the implementation environment includes: a server 11 and a terminal 12.
  • The server 11 is configured with a trained face-swap model, and can provide a face-swapping service to the terminal 12 based on this model.
  • The face-swapping service refers to swapping the face in the image to be face-swapped based on the target face, so that the fused face in the generated target face-swapped image fuses the original face in the image with the target face.
  • The terminal 12 can send a face-swapping request, which can carry the image to be face-swapped, to the server 11.
  • Based on the face-swapping request, the server 11 can execute the image processing method of the present application to generate the target face-swapped image and return it to the terminal 12.
  • the server 11 may be a backend server for the application.
  • the terminal 12 is installed with an application program, and the terminal 12 and the server 11 can interact with each other based on the application program to implement the face-changing process.
  • the application can be configured with face-swapping functionality.
  • the application is any application that supports the face-changing function.
  • the application includes but is not limited to: video editing applications, image processing tools, video applications, live broadcast applications, social applications, content interaction platforms, game applications, etc.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server or server cluster that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN, Content Delivery Network), and big data and artificial intelligence platforms.
  • the above-mentioned networks may include, but are not limited to: wired networks, and wireless networks.
  • the wired networks include local area networks, metropolitan area networks, and wide area networks.
  • the wireless networks include Bluetooth, Wi-Fi, and other networks that implement wireless communication.
  • the terminal can be a smartphone (such as Android phone, iOS phone, etc.), tablet computer, laptop computer, digital broadcast receiver, mobile Internet device (MID, Mobile Internet Devices), personal digital assistant, desktop computer, vehicle terminal (such as vehicle navigation terminals, vehicle-mounted computers, etc.), smart home appliances, aircraft, smart speakers, smart watches, etc.
  • the terminals and servers can be connected directly or indirectly through wired or wireless communication methods, but are not limited to this.
  • the image processing method provided by the embodiment of this application involves the following artificial intelligence, computer vision and other technologies.
  • cloud computing, big data processing and other technologies in artificial intelligence technology are used to realize attribute parameter extraction and face replacement in the first sample image.
  • computer vision technology is used to perform face recognition on image frames in the video to crop out a first sample image including the target face.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and other technologies.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, autonomous driving, smart transportation and other major directions.
  • Computer vision (CV) is a science that studies how to make machines "see"; it refers to machine vision such as using cameras and computers instead of human eyes to identify and measure targets, and further performing graphics processing so that the computer-processed image becomes more suitable for human observation or for transmission to instruments for detection.
  • computer vision studies related theories and technologies trying to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, smart transportation, and other technologies, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Figure 2 is a schematic flowchart of a face-swap model training method provided by an embodiment of the present application.
  • the execution subject of this method may be a computer device (such as the server 11 shown in Figure 1).
  • the method includes the following steps 201 to 208.
  • Step 201 The computer device obtains the facial features and attribute parameters of the first sample image, and obtains the attribute parameters of the second sample image.
  • the first sample image includes the target face
  • the second sample image includes the face to be replaced.
  • The computer device may collect an image including an arbitrary face as the second sample image, and collect images of the target face at multiple posture angles as the first sample image.
  • the computer device can obtain the attribute parameters of the first sample image and the attribute parameters of the second sample image through the facial parameter estimation model.
  • the computer device can obtain the facial features of the first sample image through the facial recognition model.
  • the facial parameter estimation model is used to estimate three-dimensional attribute parameters of the face based on the input two-dimensional facial image.
  • the facial parameter estimation model can be a model with a convolutional neural network structure.
  • the facial parameter estimation model can be a three-dimensional deformable face model (3DMM, 3D Morphable models).
  • The embodiment of the present application can use the residual network (ResNet, Residual Network) part of the 3DMM to regress the three-dimensional attribute parameters of the input two-dimensional facial image.
  • the facial parameter estimation model can also be any other model that has the function of extracting the three-dimensional attribute parameters of the face in the two-dimensional image.
  • Attribute parameters are used to indicate the three-dimensional attributes of the face in the image and can represent the posture of the face in three-dimensional space, the spatial environment, and other attributes; attribute parameters include but are not limited to: the shape coefficient (id_coeff), expression coefficient (expression_coeff), texture coefficient (texture_coeff), angle coefficient (angles_coeff), lighting coefficient (gamma_coeff), etc.
  • The shape coefficient represents the shape of the face and of the facial features; the angle coefficient represents the pitch and left-right deflection angles of the face; the texture coefficient can represent the skin, hair, etc. of the face; the lighting coefficient can represent the lighting conditions of the environment surrounding the face in the image.
  • The computer device provided by the embodiments of the present application can extract one or more specified items among the shape, expression, texture, angle, and lighting coefficients as the attribute parameters of each sample image, or can extract all items as the attribute parameters of the corresponding sample image; the extraction can follow the three methods below.
  • Method 1: the computer device extracts the shape coefficient of the target face in the first sample image as the attribute parameter of the first sample image, and extracts the expression coefficient and angle coefficient of the second sample image as the attribute parameters of the second sample image.
  • the attribute parameters of the first sample image include the shape coefficient of the target face in the first sample image.
  • the attribute parameters of the second sample image include the expression coefficient and angle coefficient of the face in the second sample image.
  • Method 2: for the second sample image, the computer device can obtain preconfigured parameters of the second sample image as its attribute parameters; for the first sample image, the computer device extracts the shape coefficient of the target face as the attribute parameter of the first sample image.
  • The computer device configures, based on needs, which attribute parameters of the second sample image are included; the attribute parameters of the second sample image may include the preconfigured parameters.
  • The preconfigured parameters may include at least one of the expression coefficient, texture coefficient, angle coefficient, and lighting coefficient.
  • The preconfigured parameters are set according to which attributes need to be carried over; for example, with preconfigured parameters including the lighting coefficient and expression coefficient, the final fused face has the lighting of the environment surrounding the face to be replaced, its expression, and other characteristics. Preconfigured parameters including the texture coefficient, angle coefficient, etc. can also be configured, which will not be described here.
  • Method 3: the computer device can also extract multiple parameters of the first sample image and of the second sample image as the corresponding attribute parameters; in subsequent steps, the required parameters can be further selected from these.
  • the attribute parameters of the first sample image may include five parameters including the shape coefficient, expression coefficient, texture coefficient, angle coefficient, and lighting coefficient of the target face in the first sample image.
  • the attribute parameters can be represented by vectors.
  • When the attribute parameters of the first sample image include the above five parameters, they can be represented as a 257-dimensional feature vector.
  • Similarly, the attribute parameters of the second sample image may also include the shape, expression, texture, angle, and lighting coefficients of the second sample image, and may likewise be represented as a 257-dimensional feature vector (an assumed layout of such a vector is sketched below).
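  • The patent gives only the total dimension (257); the split below follows the widely used Deep3DFaceRecon-style 3DMM parameterization (80 shape, 64 expression, 80 texture, 3 angles, 27 lighting, 3 translation) and should be read as an assumption about the layout, not as the patent's specification.

```python
import numpy as np

def split_3dmm_coeffs(coeffs: np.ndarray) -> dict:
    # Assumed layout of a 257-dim 3DMM coefficient vector; the patent only
    # states the total dimension, so the boundaries below are illustrative.
    assert coeffs.shape[-1] == 257
    return {
        "id_coeff":         coeffs[..., 0:80],     # shape of face / facial features
        "expression_coeff": coeffs[..., 80:144],   # expression
        "texture_coeff":    coeffs[..., 144:224],  # skin, hair texture
        "angles_coeff":     coeffs[..., 224:227],  # pitch / yaw / roll
        "gamma_coeff":      coeffs[..., 227:254],  # spherical-harmonics lighting
        "translation":      coeffs[..., 254:257],  # 3-D head translation
    }
```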
  • the computer device can acquire posture images of the target face at multiple posture angles, and extract facial features and attribute parameters of the first sample image based on the multiple posture images.
  • The process of the computer device acquiring the facial features and attribute parameters of the first sample image can be implemented as follows: the computer device acquires at least two posture images as the first sample image, where the at least two posture images include the target face in at least two facial postures.
  • The computer device obtains the facial features and attribute parameters corresponding to the at least two facial postures from the at least two posture images; it uses the mean of the facial features corresponding to the at least two facial postures as the facial features of the first sample image, and the mean of the attribute parameters corresponding to the at least two facial postures as the attribute parameters of the first sample image.
  • In some embodiments, the computer device may call the facial parameter estimation model to extract the attribute parameters of each posture image, calculate the mean of these attribute parameters, and use this mean as the attribute parameters of the first sample image.
  • Similarly, the computer device can call the facial recognition model to extract the two-dimensional facial features of each posture image, calculate the mean of these facial features, and use this mean as the facial features of the first sample image; for example, the facial features of the first sample image can be a 512-dimensional feature vector (a small averaging sketch is given below).
  • the facial features represent the identity of the target object, and the target face is the face of the target object.
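  • A small sketch of this per-pose averaging, assuming face_recognizer returns a 512-dim identity vector and param_estimator a 257-dim attribute vector per image (both hypothetical callables):

```python
import numpy as np

def aggregate_first_sample(pose_images, face_recognizer, param_estimator):
    # Extract facial features and attribute parameters per posture image,
    # then use the element-wise mean as the first sample image's features.
    id_feats = np.stack([face_recognizer(img) for img in pose_images])     # (K, 512)
    attr_params = np.stack([param_estimator(img) for img in pose_images])  # (K, 257)
    return id_feats.mean(axis=0), attr_params.mean(axis=0)
```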
  • the computer device may extract multiple pose images including the target's face from the video.
  • The computer device acquiring at least two posture images as the first sample image can be implemented as follows: the computer device performs face recognition on at least two image frames of a video of the target object to obtain at least two image frames containing the target face (the face of the target object being the target face); it then performs face cropping on these image frames to obtain at least two posture images, which are used as the first sample image.
  • Facial postures may include but are not limited to: facial expression, angle, the shape and movement of the facial features, glasses worn on the face, facial makeup, and other attributes; the computer device can use any of these attributes to distinguish postures.
  • For example, a face with a smiling expression and a face with an angry expression can be treated as faces in two postures; a face wearing glasses and a face without glasses can also be treated as faces in two postures; and the target face pitched 45° upward with eyes closed versus tilted 30° downward with eyes open can likewise be treated as faces in two postures.
  • the computer device may also acquire a plurality of independent still images of the target's face and extract a plurality of pose images from the plurality of independent still images.
  • the computer device may also perform face cropping processing on multiple still images to obtain at least two posture images, and use the at least two posture images as the first sample image.
  • The computer device can crop the face from an image frame to obtain a posture image through the following technical solution.
  • First, the computer device performs face detection on the image frame to obtain a face coordinate box for the frame; the face coordinate box encloses the face region where the target face is located in the image frame.
  • Then, the computer device performs face registration on the image frame according to the face coordinate box to obtain the target-face key points in the image frame.
  • The target-face key points may include the key points of the facial features and facial contour of the target face in the image frame, and may also include key points for hair details.
  • In some embodiments, the computer device can implement the key-point detection on the image frame through a target detection network, such as a YOLO network.
  • The input of the target detection network is the face image and the face coordinate box of the face image in the image frame, and its output is a facial key-point coordinate sequence including the target-face key points.
  • The number of key points in the facial key-point coordinate sequence can be preconfigured based on the required level of facial detail; for example, it can be a fixed value such as 5, 68, or 90 points.
  • Finally, the computer device crops the image frame to obtain the posture image: it connects the target-face key points in the order given by the facial key-point coordinate sequence and uses the resulting closed figure as the posture image (a simplified cropping sketch is given below).
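  • A simplified sketch of the detect → register → crop sequence; detect_face_box and detect_landmarks are hypothetical stand-ins for the face detector and the target detection network (e.g. a YOLO-style model), and the closed key-point figure is approximated here by the key points' bounding box plus a margin:

```python
import numpy as np

def crop_pose_image(frame: np.ndarray, detect_face_box, detect_landmarks,
                    margin: float = 0.1) -> np.ndarray:
    # 1. Face detection: coordinate box enclosing the face region.
    x0, y0, x1, y1 = detect_face_box(frame)
    # 2. Face registration: key-point coordinate sequence (e.g. 5/68/90 points).
    landmarks = detect_landmarks(frame, (x0, y0, x1, y1))  # (K, 2) array
    # 3. Crop the region spanned by the key points (the closed figure is
    #    approximated by their bounding box plus a small margin).
    lx0, ly0 = landmarks.min(axis=0)
    lx1, ly1 = landmarks.max(axis=0)
    mx, my = margin * (lx1 - lx0), margin * (ly1 - ly0)
    h, w = frame.shape[:2]
    top, bottom = max(int(ly0 - my), 0), min(int(ly1 + my), h)
    left, right = max(int(lx0 - mx), 0), min(int(lx1 + mx), w)
    return frame[top:bottom, left:right]
```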
  • the process of obtaining the second sample image is similar to the process of obtaining the first sample image.
  • the computer device may acquire an object image including an arbitrary object, perform face cropping processing on the object image, obtain an image including the object's face, and use the image including the object's face as the second sample image.
  • the face cropping method is similar to the technical solution of cropping the face of the image frame to obtain the posture image, and will not be described in detail here.
  • the computer device may call the facial parameter estimation model to extract the attribute parameters of the second sample image.
  • In addition, the computer device can store the facial features and attribute parameters of the first sample image; specifically, it stores them to a target address, where the target address is a preconfigured storage address.
  • Step 202 The computer device determines the sample attribute parameters based on the attribute parameters of the first sample image and the attribute parameters of the second sample image.
  • the sample attribute parameters are used to indicate the expected attributes of the face in the sample face-swapping image to be generated.
  • the computer device may determine the shape coefficient of the first sample image and the expression coefficient and angle coefficient of the second sample image as sample attribute parameters.
  • the computer device may select various attribute parameters of the first sample image and the second sample image as the sample attribute parameters based on needs.
  • Step 202 can be implemented as follows: the computer device determines the shape coefficient of the first sample image and the preconfigured parameters of the second sample image as the sample attribute parameters; the preconfigured parameters of the second sample image include at least one of the expression coefficient, angle coefficient, texture coefficient, and lighting coefficient.
  • The preconfigured parameters may be those obtained in method 2 of step 201, in which case the computer device can directly obtain the preconfigured parameters of the second sample image.
  • The preconfigured parameters may also be extracted from attribute parameters that include all five coefficients; in this case, the computer device extracts, from the attribute parameters of the second sample image, the parameters corresponding to a preconfigured parameter identifier.
  • The preconfigured parameter identifier may include the parameter identifier of at least one of the expression coefficient, angle coefficient, texture coefficient, and lighting coefficient.
  • For example, the preconfigured parameters may include the expression coefficient and angle coefficient; that is, the face in the sample face-swapped image to be generated is expected to have the face shape and facial features of the target face, together with the expression and angle of the face in the second sample image; accordingly, the computer device can determine the shape coefficient of the target face and the expression coefficient and angle coefficient of the second sample image as the sample attribute parameters.
  • The preconfigured parameters may also include the texture coefficient and lighting coefficient; that is, the face in the sample face-swapped image is expected to have the shape of the target face together with the texture and lighting of the face in the second sample image; in that case, the computer device determines the shape coefficient of the target face and the texture coefficient and lighting coefficient of the second sample image as the sample attribute parameters.
  • Step 203 The computer device determines the comprehensive characteristics of the sample based on the sample attribute parameters and the facial features of the first sample image.
  • The computer device can splice the sample attribute parameters and the facial features of the first sample image, and use the resulting spliced features as the sample comprehensive features.
  • The sample comprehensive features can comprehensively characterize the expected features of the face in the sample face-swapped image to be generated.
  • The sample attribute parameters and the facial features can both be expressed as feature vectors; the computer device can concatenate the first feature vector corresponding to the sample attribute parameters with the second feature vector corresponding to the facial features to obtain a third feature vector corresponding to the sample comprehensive features (see the concatenation sketch below).
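  • A minimal sketch of this splicing, with assumed dimensions (80 + 64 + 3 attribute coefficients and a 512-dim facial feature vector):

```python
import torch

first_vector = torch.randn(1, 80 + 64 + 3)   # sample attribute parameters (assumed dims)
second_vector = torch.randn(1, 512)          # facial features of the first sample image
third_vector = torch.cat([first_vector, second_vector], dim=1)  # sample comprehensive features
print(third_vector.shape)                    # torch.Size([1, 659])
```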
  • Step 204 The computer device performs encoding processing on the second sample image to obtain sample encoding features.
  • the computer device inputs the second sample image into the encoder of the initialized face-changing model, encodes the second sample image through the encoder, obtains the encoding vector corresponding to the second sample image, and uses the encoding vector as the sample encoding feature.
  • the sample encoding features are obtained by encoding the second sample image, thereby accurately refining the pixel-level information of each pixel included in the second sample image.
  • In some embodiments, the encoder includes multiple cascaded convolutional layers, and the second sample image is convolved through these layers in sequence.
  • Each convolutional layer feeds its convolution result to the next convolutional layer for further processing, and the output of the last convolutional layer is the sample coding feature (a minimal encoder sketch follows).
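  • A minimal encoder sketch with cascaded convolutional layers; the channel counts, kernel sizes, and depth are illustrative assumptions, not taken from the patent:

```python
import torch.nn as nn

class Encoder(nn.Module):
    # Cascaded convolutional layers; each layer feeds its output to the next,
    # and the last layer's output is the sample coding feature map.
    def __init__(self, in_ch=3, base=64, num_layers=4):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(num_layers):
            out = base * (2 ** i)
            layers += [nn.Conv2d(ch, out, kernel_size=4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```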
  • Step 205 The computer device transfers the sample comprehensive features to the sample coding features of the second sample image through a regularization method to obtain the sample fusion features.
  • the computer device may use step 205 to integrate the sample comprehensive features and the sample coding features.
  • The computer device can use a regularization method to align the third feature distribution of the sample coding features with the fourth feature distribution of the sample comprehensive features to obtain the sample fusion features.
  • feature distributions may include means and standard deviations.
  • In some embodiments, step 205 can be implemented as follows: the computer device obtains the third mean and third standard deviation of the sample coding features over at least one feature channel, and takes the normal distribution matching them as the third feature distribution; it obtains the fourth mean and fourth standard deviation of the sample comprehensive features over at least one feature channel, and takes the normal distribution matching them as the fourth feature distribution.
  • The computer device aligns the mean and standard deviation of the sample coding features in each feature channel (the third feature distribution) with the mean and standard deviation of the sample comprehensive features in the corresponding feature channel (the fourth feature distribution) to obtain the sample fusion features: it can normalize each feature channel of the sample coding features and then match the normalized mean and standard deviation to those of the sample comprehensive features.
  • Based on the sample coding features and the sample comprehensive features, the computer device can implement this alignment from the third feature distribution to the fourth feature distribution, and calculate the sample fusion features, through the following formula (1):
  • AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y)  (1)
  • where x represents the sample coding features, y represents the sample comprehensive features, μ(x) and σ(x) respectively represent the mean and standard deviation of the sample coding features, and μ(y) and σ(y) respectively represent the mean and standard deviation of the sample comprehensive features.
  • Using the adaptive instance regularization method means using the adaptive instance normalization (AdaIN, Adaptive Instance Normalization) algorithm, and AdaIN(x, y) represents the sample fusion features generated by it; in some embodiments, the instance normalization (IN, Instance Normalization) algorithm can also be used, without limitation (an implementation of formula (1) is sketched below).
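  • A direct implementation of formula (1), assuming the sample comprehensive features have already been mapped to the same feature-map layout (N, C, H, W) as the coding features:

```python
import torch

def adain(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Formula (1): AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y),
    # with mu/sigma computed per feature channel over spatial positions.
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    sigma_x = x.std(dim=(2, 3), keepdim=True) + eps
    mu_y = y.mean(dim=(2, 3), keepdim=True)
    sigma_y = y.std(dim=(2, 3), keepdim=True) + eps
    return sigma_y * (x - mu_x) / sigma_x + mu_y
```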
  • Step 206 The computer device decodes the sample fusion features to obtain a sample face-changing image.
  • The computer device injects the sample fusion features into the decoder of the initialized face-swap model, and restores the image corresponding to the sample fusion features through the decoder.
  • the computer device uses the image output by the decoder as the sample face-swapping image.
  • the decoder can restore the image corresponding to the injected features based on the injected features.
  • The computer device decodes the sample fusion features through the decoder to obtain the sample face-swapped image; for example, since the encoder performs convolution operations on the input image, the decoder can perform the reverse operation when running, i.e. a deconvolution operation, to restore the image corresponding to the sample fusion features.
  • the encoder can be an autoencoder (AE, AutoEncoder), and the decoder can be a decoder corresponding to the autoencoder.
  • In some embodiments, the decoder includes multiple cascaded convolutional layers, and the sample fusion features are deconvolved through these layers in sequence.
  • Each layer feeds its deconvolution result to the next layer, and the output of the last layer is the sample face-swapped image (a mirrored decoder sketch follows).
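  • A decoder sketch mirroring the encoder above with cascaded transposed-convolution (deconvolution) layers; again, the layer configuration is an illustrative assumption:

```python
import torch.nn as nn

class Decoder(nn.Module):
    # Mirror of the encoder: cascaded transposed convolutions restore the
    # fused features back to an image in the input resolution.
    def __init__(self, in_ch=512, base=64, num_layers=4, out_ch=3):
        super().__init__()
        layers, ch = [], in_ch
        for i in reversed(range(num_layers - 1)):
            out = base * (2 ** i)
            layers += [nn.ConvTranspose2d(ch, out, kernel_size=4, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch = out
        layers += [nn.ConvTranspose2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, fused):
        return self.net(fused)
```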
  • In step 205, using regularization for feature migration supports migrating the sample comprehensive features into the coding features of any image, realizing the mixing of the sample comprehensive features and the sample coding features. The sample coding features represent the characteristics of each pixel of the second sample image, while the sample comprehensive features synthesize the characteristics of the first and second sample images from a global perspective.
  • In this way, the pixel-level coding features are mixed with the global comprehensive features, and the feature distribution of the sample coding features is aligned to that of the sample comprehensive features, improving the accuracy of the generated sample fusion features.
  • Through step 206, the image is decoded from the sample fusion features, so that every pixel of the decoded image can reflect the sample comprehensive features; this improves the sensory similarity between the face in the decoded image and the target face, and thus the accuracy of face swapping.
  • Step 207: The computer device determines the total loss of the initialized face-swap model based on a first difference between the attribute parameters of the sample face-swapped image and the sample attribute parameters, a second difference between the facial features of the sample face-swapped image and the facial features of the first sample image, and a third difference between the sample face-swapped image and the second sample image.
  • the three differences are weighted and averaged to obtain the total loss.
  • the weight corresponding to each difference can be a preconfigured value.
  • Step 208: The initialized face-swap model is trained based on the total loss until the target conditions are met, and the model obtained when the target conditions are met is used as the face-swap model.
  • the computer device can respectively determine multiple similarities between the sample face-changing image and the sample attribute parameters, facial features of the first sample image, and the second sample image, and obtain the total loss based on the multiple similarities.
  • the initialized face-swapping model may include a discriminator, and the computer device may use the discriminator to determine the authenticity of the sample face-swapping image.
  • In some embodiments, the process of determining the total loss may include the following steps: the computer device obtains the first similarity between the attribute parameters of the sample face-swapped image and the sample attribute parameters, and uses the first similarity as the first difference; it obtains the second similarity between the facial features of the sample face-swapped image and the facial features of the first sample image, and uses the second similarity as the second difference; it obtains, through the discriminator of the initialized face-swap model, the third similarity between the second sample image and the sample face-swapped image, and uses the third similarity as the third difference; it then determines the total loss based on the first, second, and third similarities.
  • The computer device can extract the attribute parameters of the sample face-swapped image and determine the first similarity between these attribute parameters and the sample attribute parameters through the following formula (2):
  • 3d feature loss = abs(gt 3d feature − result 3d feature)  (2)
  • where 3d feature loss represents the first similarity, result 3d feature represents the attribute parameters of the sample face-swapped image, gt 3d feature represents the sample attribute parameters, and abs denotes the absolute value of (gt 3d feature − result 3d feature).
  • The sample attribute parameters can be the shape coefficient of the target face together with the expression coefficient and angle coefficient of the second sample image; in that case, gt 3d feature can be expressed as the following formula (3):
  • gt 3d feature = source 3d feature id + target 3d feature expression + target 3d feature angles  (3)
  • where source 3d feature id represents the shape coefficient of the first sample image, target 3d feature expression represents the expression coefficient of the second sample image, and target 3d feature angles represents the angle coefficient of the second sample image.
  • the computer device can extract the facial features of the sample face-swap image, and determine the second degree of similarity between the facial features of the sample face-swap image and the facial features of the first sample image through the following formula (4).
  • id loss = 1 − cosine similarity(result id feature, Mean Source ID)  (4)
  • where id loss represents the second similarity, result id feature represents the facial features of the sample face-swapped image, Mean Source ID represents the facial features of the first sample image, and cosine similarity(result id feature, Mean Source ID) is the cosine similarity between the two; the cosine similarity can be determined through the following formula (5):
  • cos(θ) = (A · B) / (‖A‖ ‖B‖) = Σ_i (A_i × B_i) / (√(Σ_i A_i²) × √(Σ_i B_i²))  (5)
  • where A and B respectively represent the feature vector of the facial features of the sample face-swapped image and the feature vector of the facial features of the first sample image; θ represents the angle between vectors A and B; A_i represents the component of the i-th feature channel of the facial features of the sample face-swapped image; B_i represents the component of the i-th feature channel of the facial features of the first sample image; and cos(θ) represents the cosine similarity.
  • In some embodiments, the computer device can input the second sample image into the discriminator as the real image, and also input the sample face-swapped image into the discriminator; through the discriminator it obtains third-scale images of the second sample image at one or more scales, and fourth-scale images of the sample face-swapped image at the corresponding scales.
  • The computer device obtains the discrimination probability corresponding to each third-scale image and each fourth-scale image, where the discrimination probability of an image indicates the probability that the image is a real image.
  • The initialized face-swap model may include a generator and a discriminator.
  • the computer device obtains the discrimination loss value corresponding to the discriminator and the generation loss value corresponding to the generator, and determines the third similarity based on the generation loss value and the discrimination loss value.
  • the generator is used to generate a sample face-changing image based on the second sample image and the first sample image.
  • the generator may include the encoder and decoder used in the above steps 204 to 206.
  • The third similarity may include a generation loss value and a discrimination loss value; the computer device may use the discrimination probability of the sample face-swapped image to represent the generation loss value.
  • In some embodiments, the computer device may calculate the generation loss value from the discrimination probability of the sample face-swapped image through formula (6), where D represents the discrimination probability of the sample face-swapped image, i.e. the probability that the sample face-swapped image belongs to the real images, and G loss represents the generation loss value.
  • In some embodiments, the generator includes multiple cascaded convolutional layers and can have a U-shaped network structure: the second sample image and the first sample image are downsampled through the U-shaped network, the downsampling result is then upsampled, and the sample face-swapped image is obtained.
  • The discriminator also includes multiple cascaded convolutional layers; it consists of the downsampling structure of the U-shaped network followed by a fully connected layer.
  • The downsampling structure convolves the sample face-swapped image, and the convolution result is then mapped through the fully connected layer to obtain the discrimination probability of the sample face-swapped image.
  • In some embodiments, the discriminator can be a multi-scale discriminator: the computer device can scale the sample face-swapped image through the discriminator to obtain fourth-scale images at multiple scales, for example fourth-scale images at a first scale, a second scale, and a third scale; similarly, it can obtain third-scale images of the second sample image at the first, second, and third scales.
  • The first, second, and third scales can be set as needed; for example, the first scale can be the original scale of the sample face-swapped image or the second sample image, the second scale can be 1/2 of the original scale, and the third scale can be 1/4 of the original scale.
  • The computer device can obtain, through the multi-scale discriminator, the discrimination probability corresponding to the scaled image at each scale, and calculate the discrimination loss value from the discrimination probabilities at the multiple scales.
  • In some embodiments, the computer device obtains the discrimination loss value through formula (7) based on the discrimination probability corresponding to each third-scale image and each fourth-scale image, where D(template img), D(template img 1/2), and D(template img 1/4) respectively represent the discrimination probabilities of the second sample image at the original, 1/2, and 1/4 scales, and D(result), D(result 1/2), and D(result 1/4) respectively represent the discrimination probabilities of the sample face-swapped image at the corresponding scales (a downsampling sketch follows).
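  • A sketch of the multi-scale evaluation, assuming disc is a hypothetical per-scale discriminator returning a probability; the 1/2 and 1/4 scales follow the example in the text:

```python
import torch.nn.functional as F

def multiscale_probs(disc, image, scales=(1.0, 0.5, 0.25)):
    # Evaluate the discriminator on the original, 1/2-scale and 1/4-scale images.
    probs = []
    for s in scales:
        scaled = image if s == 1.0 else F.interpolate(
            image, scale_factor=s, mode="bilinear", align_corners=False)
        probs.append(disc(scaled))  # discrimination probability at this scale
    return probs
```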
  • the second sample image frame may be used as the real image.
  • When a balance is reached between the generation loss value and the discrimination loss value, the discriminator can be considered to have reached the training stop condition and needs no further training.
  • In some embodiments, the computer device may determine the total loss from the above first, second, and third similarities through the following formula (8):
  • loss = 3d feature loss + id loss + (D loss + G loss)  (8)
  • where loss represents the total loss, 3d feature loss represents the first similarity, id loss represents the second similarity, and (D loss + G loss) represents the third similarity (a combined loss sketch follows).
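  • A sketch combining formulas (2), (3), (4), (5), and (8); the patent also mentions per-difference weights, which are assumed to be 1 here, and the adversarial terms d_loss/g_loss are taken as given since formulas (6) and (7) are not reproduced in the text:

```python
import torch
import torch.nn.functional as F

def total_loss(result_3d, src_id_coeff, tgt_exp_coeff, tgt_angle_coeff,
               result_id_feat, mean_source_id, d_loss, g_loss):
    # Formula (3): expected attribute parameters of the sample face-swapped image.
    gt_3d = torch.cat([src_id_coeff, tgt_exp_coeff, tgt_angle_coeff], dim=-1)
    # Formula (2): first similarity (attribute-parameter difference).
    feat_3d_loss = (gt_3d - result_3d).abs().mean()
    # Formulas (4)/(5): second similarity from cosine similarity of identity features.
    id_loss = 1.0 - F.cosine_similarity(result_id_feat, mean_source_id, dim=-1).mean()
    # Formula (8): total loss (unweighted sum; per-term weights assumed to be 1).
    return feat_3d_loss + id_loss + (d_loss + g_loss)
```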
  • The computer device can iteratively train the initialized face-swap model based on the above steps 201 to 206, obtain the total loss of each training iteration, and adjust the parameters of the initialized model based on each iteration's total loss.
  • The parameters of the encoder, decoder, discriminator, etc. in the initialized face-swap model are optimized over multiple iterations; when the total loss meets the target conditions, the computer device stops training and uses the model obtained in the last optimization as the face-swap model.
  • The target condition can be that the value of the total loss falls within a target value range preset on the basis of multiple experiments, for example a total loss of no more than 0.5; or that the total time consumed by the training iterations exceeds a maximum duration.
  • For example, the maximum duration can be 70% of the time budgeted from training to online deployment; if that budget is 1 hour, then the training iterations consuming more than 0.7 hours satisfies the target condition.
  • Figure 3 is a schematic framework diagram of a dedicated face-swap model training process provided by an embodiment of the present application.
  • The computer device can use the face of subject A as the dedicated target face: it obtains facial images of subject A in multiple postures as the first sample image, extracts the attribute parameters of the first sample image through the 3D facial parameter estimation model, extracts the facial features of the first sample image through the face recognition model, and extracts the attribute parameters of the second sample image through the 3D facial parameter estimation model.
  • The computer device combines the shape coefficient of the first sample image with the preconfigured parameters of the second sample image (such as the expression coefficient and angle coefficient) into the sample attribute parameters, and splices these with the facial features of the first sample image to form the sample comprehensive features.
  • the computer device can input the second sample image into the initialized face-swapping model.
  • the initialized face-swapping model can include an encoder and a decoder.
  • the computer device can encode the second sample image through the encoder to obtain the encoding characteristics of the second sample image. , for example, encoding the second sample image into the corresponding feature vector.
  • the computer device obtains the sample fusion features based on the sample attribute parameters and the encoding features of the second sample image, and injects the sample fusion features into the decoder in the initialized face-changing model.
  • the decoder can restore the image corresponding to the injected features based on the injected features. .
  • the computer device decodes the sample fusion image through the decoder to obtain the sample face-changing image; for example, the encoder performs a deconvolution operation according to the operating principle of the encoder to restore the image corresponding to the sample fusion feature.
  • the computer device obtains the third degree of similarity through the multi-scale discriminator, and obtains the first degree of similarity and the second degree of similarity based on the facial features and attribute parameters of the extracted sample face-changing image, based on the first degree of similarity and the second degree of similarity , the third similarity, calculates the total loss to optimize the model parameters based on the total loss; the computer equipment performs iterative training with the above process, until the training is stopped when the target conditions are met, and the face in any image can be replaced with an exclusive target face Face-swap model.
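• To make the Figure 3 pipeline concrete, here is a minimal sketch of one training iteration under the assumptions above. The callables `encoder`, `decoder`, `adain` (the formula (1) transfer, sketched near step 407 below), and `criterion` (the total loss of formula (8)) are illustrative stand-ins passed in from outside, not names fixed by this application.

```python
import torch

def train_step(encoder, decoder, adain, criterion, optimizer,
               first_sample_feats, sample_attr_params, second_img):
    """One training iteration of the Figure 3 pipeline (a sketch, not the
    embodiment's exact implementation)."""
    # sample comprehensive features: concatenation of the sample attribute
    # parameters and the facial features of the first sample image
    comprehensive = torch.cat([sample_attr_params, first_sample_feats], dim=-1)
    coding = encoder(second_img)          # sample coding features (pixel level)
    fused = adain(coding, comprehensive)  # inject the global features into the pixel-level features
    result = decoder(fused)               # sample face-swapped image
    loss = criterion(result, second_img, sample_attr_params, first_sample_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```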
• Figure 4 is a signaling interaction diagram of an image processing method provided by an embodiment of the present application. As shown in Figure 4, the image processing method can be implemented through interaction between the server and the terminal; the interaction process can be seen in steps 401 to 410.
  • Step 401 The terminal displays the application page of the target application.
  • the application page includes a target trigger control.
  • the target trigger control is used to trigger a face change request for the image to be changed.
  • the target application can provide a face-changing function, and the face-changing function can be a function of replacing the face in the image to be changed with a dedicated target face.
• The application page of the target application can provide a target trigger control, and the terminal can send a face swap request to the server based on the object's trigger operation on the target trigger control.
  • the target application can be an image processing application, a live broadcast application, a camera tool, a video editing application, etc.
• The server can be a background server of the target application, or the server can also be any computer device used to provide the face swap function, for example, a cloud computing center device configured with a face swap model.
• Step 402: In response to receiving a trigger operation on the target trigger control in the application page, the terminal obtains the face-to-be-swapped image and sends a face swap request to the server based on that image.
• In some embodiments, the target application may provide a face swap function for a single image; for example, the target application may be an image processing application, a live broadcast application, a social networking application, etc., and the face-to-be-swapped image may be an image selected by the terminal from its local storage space, or an image captured by the terminal photographing the object in real time.
  • the target application may provide a face-swapping function for each image frame included in a video.
  • the target application may be a video editing application, a live broadcast application, etc.
• The server may replace the face in each image frame of the video that includes the face of object A with the target face, frame by frame.
• The face-to-be-swapped images may include every image frame in the video; alternatively, the terminal may perform initial face detection on each image frame in the video and use each image frame that includes the face of object A as a face-to-be-swapped image.
  • Step 403 The server receives the face-changing request sent by the terminal.
• Step 404: The server obtains the attribute parameters of the face-to-be-swapped image, the attribute parameters of the target face, and the facial features of the target face; the attribute parameters of the face-to-be-swapped image indicate the three-dimensional attributes of the face in that image. Based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face, the server determines the target attribute parameters.
• The face swap request is used to request that the face in the face-to-be-swapped image be replaced with the target face.
• The server can obtain the attribute parameters of the face-to-be-swapped image through the 3D facial parameter estimation model.
• The attribute parameters of an image include at least one of a shape coefficient, an expression coefficient, an angle coefficient, a texture coefficient, and an illumination coefficient.
  • the attribute parameters of the target face and the facial features of the target face may be stored in advance.
• The server may determine the shape coefficient of the target face and the preconfigured parameters of the face-to-be-swapped image as the target attribute parameters; the preconfigured parameters include at least one of an expression coefficient, an angle coefficient, a texture coefficient, and an illumination coefficient.
• For example, the preconfigured parameters may include the expression coefficient and the angle coefficient; alternatively, the preconfigured parameters may also include the texture coefficient, the illumination coefficient, etc.
• Step 405: The server determines the target comprehensive features based on the target attribute parameters and the facial features of the target face.
• The server can concatenate the target attribute parameters and the facial features of the target face to obtain the target comprehensive features.
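• A one-line sketch of this concatenation follows, assuming both inputs are flat tensors. The 257- and 512-dimensional sizes are taken from the training description of the sample images and are shown here only as illustrative shapes, not as sizes prescribed for inference.

```python
import torch

# hypothetical shapes: attribute parameters and facial features as 1-D vectors
target_attr_params = torch.randn(257)  # e.g., 3DMM-style coefficient vector (illustrative size)
target_face_feats = torch.randn(512)   # e.g., face-recognition embedding (illustrative size)

# the target comprehensive feature is simply the concatenation of the two
target_comprehensive = torch.cat([target_attr_params, target_face_feats], dim=0)  # shape: (769,)
```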
  • the server can be configured with a trained face-swapping model, and the server can use the face-swapping model to perform the above-mentioned steps 404 to 405.
  • the face-changing model is trained based on the above steps 201 to 208.
• When the server has obtained the face swap model through training, it can store the facial features and attribute parameters of the target face in a fixed location, for example, at a target address.
• When executing step 404, the server may extract the attribute parameters of the target face from the target address; when executing step 405, the server may extract the facial features of the target face from the target address.
  • the server may perform the following processes from step 406 to step 408 through the face-changing model.
  • Step 406 The server performs coding processing on the face-swapping image to obtain the image coding features of the face-swapping image.
• Step 407: The server transfers the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization, to obtain the fused coding features.
• In a possible implementation, the server can align the mean and standard deviation of the image coding features with those of the target comprehensive features.
• Step 407 can be implemented through the following technical solution: the server obtains the first mean and the first standard deviation of the image coding features in at least one feature channel and uses the normal distribution conforming to the first mean and the first standard deviation as the first feature distribution; the server obtains the second mean and the second standard deviation of the target comprehensive features in at least one feature channel and uses the normal distribution conforming to the second mean and the second standard deviation as the second feature distribution; the server then aligns the image coding features from the first feature distribution to the second feature distribution to obtain the fused coding features.
• Specifically, the server maps the image coding features so that their mean and standard deviation in each feature channel are aligned with the mean and standard deviation of the target comprehensive features in the corresponding feature channel, obtaining the fused coding features.
• The server can also use formula (1) in step 205 above to calculate the fused coding features.
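• The channel-wise alignment of step 407 matches the standard adaptive instance normalization (AdaIN) computation. Below is a minimal PyTorch-style sketch, assuming the image coding features are a (batch, channels, height, width) tensor and that the target comprehensive feature has already been projected to one (mean, std) pair per channel of shape (batch, channels); that projection is an assumption for illustration, since the embodiment only specifies the alignment itself.

```python
import torch

def adain(content_feat, target_mean, target_std, eps=1e-5):
    """Align each feature channel of `content_feat` (the image coding features)
    from its own (mean, std) to the target (mean, std), i.e. formula (1):
    sigma(y) * (x - mu(x)) / sigma(x) + mu(y)."""
    b, c, h, w = content_feat.shape
    x = content_feat.view(b, c, -1)
    mu_x = x.mean(dim=2, keepdim=True)          # first mean, per channel
    sigma_x = x.std(dim=2, keepdim=True) + eps  # first standard deviation, per channel
    normalized = (x - mu_x) / sigma_x           # normalize to zero mean, unit std
    out = normalized * target_std.view(b, c, 1) + target_mean.view(b, c, 1)
    return out.view(b, c, h, w)
```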
  • Step 408 The server decodes the fused coding features to obtain a target face-swapping image including a fused face.
  • the fused face is a fusion of the face in the face-swapping image and the target face.
• The implementation in which the server performs the above steps 403 to 408 to obtain the target face-swapped image is similar to the implementation in which the computer device performs the above steps 201 to 206 to obtain the sample face-swapped image, and is not described again here.
  • Step 409 The server returns the target face-changing image to the terminal.
• When the face-to-be-swapped image is a single image, the server can return to the terminal the target face-swapped image corresponding to that single image.
• When the face-to-be-swapped images are the image frames included in a video, the server can generate, for each such image frame, the corresponding target face-swapped image through the above steps 403 to 408.
• The server can then return to the terminal the face-swapped video corresponding to the video, which includes the target face-swapped image corresponding to each image frame.
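• As a concrete reading of the per-frame branch above, a minimal sketch follows; `swap_face` stands for the whole step 404 to 408 pipeline and `includes_face_A` for the initial face detection, both hypothetical names rather than functions defined by this application.

```python
def swap_video(frames, swap_face, includes_face_A):
    """Apply the face swap pipeline to every frame containing object A's face;
    frames without the face pass through unchanged."""
    return [swap_face(frame) if includes_face_A(frame) else frame for frame in frames]
```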
  • Step 410 The terminal receives the target face-swapping image returned by the server, and displays the target face-swapping image.
• The terminal can display the target face-swapped image in the application page; alternatively, the terminal can play, in the application page, each target face-swapped image of the face-swapped video.
• The image processing method provided by the embodiments of the present application obtains the attribute parameters of the face-to-be-swapped image, which indicate the three-dimensional attributes of the face in the image, and determines the target attribute parameters based on the attribute parameters of the face-to-be-swapped image and those of the target face, thereby locating the three-dimensional attribute characteristics of the face in the image expected to be generated; based on the target attribute parameters and the facial features of the target face, target comprehensive features that can comprehensively characterize the face-to-be-swapped image and the target face are obtained.
• By decoding the fused coding features, a target face-swapped image including a fused face is obtained; the decoded image can present the target comprehensive features down to each pixel, making the appearance of the fused face in the decoded image closer to the target face, improving the perceptual similarity between the fused face and the target face, and thereby improving the accuracy of face replacement.
  • FIG. 5 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
• As shown in Figure 5, the device includes: an attribute parameter acquisition module 501, configured to receive a face swap request, which is used to request that the face in the face-to-be-swapped image be replaced with a target face; a target attribute parameter determination module 502, configured to obtain the attribute parameters of the face-to-be-swapped image, the attribute parameters of the target face, and the facial features of the target face, the attribute parameters of the face-to-be-swapped image indicating the three-dimensional attributes of the face in that image, and to determine the target attribute parameters based on the attribute parameters of the face-to-be-swapped image and those of the target face; a comprehensive feature determination module 503, configured to determine the target comprehensive features based on the target attribute parameters and the facial features of the target face; an encoding module 504, configured to encode the face-to-be-swapped image to obtain its image coding features; a transfer module 505, configured to transfer the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain the fused coding features; and a decoding module 506, configured to decode the fused coding features to obtain a target face-swapped image including a fused face, the fused face being a fusion of the face in the face-to-be-swapped image and the target face.
• In some embodiments, the attribute parameter of the target face is a shape coefficient and the attribute parameters of the face-to-be-swapped image are preconfigured parameters; the target attribute parameter determination module is configured to determine the shape coefficient of the target face and the preconfigured parameters of the face-to-be-swapped image as the target attribute parameters, the preconfigured parameters including at least one of an expression coefficient, an angle coefficient, a texture coefficient, and an illumination coefficient.
• In some embodiments, the transfer module is configured to: obtain the first mean and the first standard deviation of the image coding features in at least one feature channel, and use the normal distribution conforming to the first mean and the first standard deviation as the first feature distribution; obtain the second mean and the second standard deviation of the target comprehensive features in at least one feature channel, and use the normal distribution conforming to the second mean and the second standard deviation as the second feature distribution; and align the image coding features from the first feature distribution to the second feature distribution to obtain the fused coding features.
• In some embodiments, the target face-swapped image is obtained by invoking a trained face swap model; the face swap model is used to swap the target face into any face image based on the attribute data and facial features of the target face.
• The device also includes a model training module, which includes: an acquisition unit configured to acquire the facial features and attribute parameters of the first sample image and the attribute parameters of the second sample image, the first sample image including the target face and the second sample image including the face to be replaced; a sample attribute parameter determination unit configured to determine the sample attribute parameters based on the attribute parameters of the first sample image and those of the second sample image, the sample attribute parameters indicating the expected attributes of the face in the sample face-swapped image to be generated; a sample comprehensive feature acquisition unit configured to determine the sample comprehensive features based on the sample attribute parameters and the facial features of the first sample image; an encoding unit configured to encode the second sample image to obtain the sample coding features; a transfer unit configured to transfer the sample comprehensive features into the sample coding features of the second sample image through regularization to obtain the sample fusion features; a decoding unit configured to decode the sample fusion features to obtain the sample face-swapped image; and a training unit configured to determine the total loss of the initialized face swap model based on the first difference between the sample face-swapped image and the sample attribute parameters, the second difference between the facial features of the sample face-swapped image and those of the first sample image, and the third difference between the sample face-swapped image and the second sample image, to train the initialized face swap model based on the total loss until the target condition is met, and to use the model obtained when the target condition is met as the face swap model.
• In some embodiments, the training unit is further configured to: obtain the first similarity between the attribute parameters of the sample face-swapped image and the sample attribute parameters, and use the first similarity as the first difference; obtain the second similarity between the facial features of the sample face-swapped image and the facial features of the first sample image, and use the second similarity as the second difference; and obtain the third similarity between the second sample image and the sample face-swapped image, and use the third similarity as the third difference.
• In some embodiments, the training unit is further configured to: obtain first-scale images of the second sample image at at least one scale and second-scale images of the sample face-swapped image at the at least one scale; use the second sample image as the real image; obtain the discrimination probability corresponding to each first-scale image and the discrimination probability corresponding to each second-scale image, where the discrimination probability of an image indicates the probability of judging that image to be the real image, the image being a first-scale image or a second-scale image; and determine the third similarity based on the discrimination probability corresponding to each first-scale image and the at least one discrimination probability corresponding to each second-scale image.
• In some embodiments, the acquisition unit is further configured to: acquire at least two pose images and use the at least two pose images as the first sample image, the at least two pose images including at least two facial poses of the target face; obtain, based on the at least two pose images, the facial features and attribute parameters corresponding to the at least two facial poses; and use the mean of the facial features corresponding to the at least two facial poses as the facial features of the first sample image and the mean of the attribute parameters corresponding to the at least two facial poses as the attribute parameters of the first sample image.
• Correspondingly, the device further includes a storage unit configured to store the facial features and attribute parameters of the first sample image.
• In some embodiments, the acquisition unit is further configured to: perform face recognition on at least two image frames included in the video of the target object to obtain at least two image frames including the target face, the target face being the face of the target object; and perform face cropping on the at least two image frames to obtain the at least two pose images.
• The image processing device provided by the embodiments of the present application obtains the attribute parameters of the face-to-be-swapped image, which indicate the three-dimensional attributes of the face in the image, and determines the target attribute parameters based on the attribute parameters of the face-to-be-swapped image and those of the target face, thereby locating the three-dimensional attribute characteristics of the face in the image expected to be generated; based on the target attribute parameters and the facial features of the target face, target comprehensive features that comprehensively characterize the face-to-be-swapped image and the target face are obtained, and the face-to-be-swapped image is encoded to obtain its image coding features, which provide the pixel-level refined features of the image.
• The target comprehensive features are then transferred into the image coding features of the face-to-be-swapped image through regularization to obtain the fused coding features; this application mixes the pixel-level coding features with the global comprehensive features and aligns the feature distribution of the image coding features to the target comprehensive features, thereby improving the accuracy of the generated fused coding features.
• By decoding the fused coding features, a target face-swapped image including a fused face is obtained; the decoded image can present the target comprehensive features down to each pixel, making the appearance of the fused face in the decoded image closer to the target face, improving the perceptual similarity between the fused face and the target face, and thereby improving the accuracy of face replacement.
• The device of the embodiments of the present application can execute the image processing method provided by the embodiments of the present application, and its implementation principle is similar; the actions performed by each module in the image processing device correspond to the steps in the image processing method of the embodiments of the present application, and the detailed functional description of each module can be found in the description of the corresponding method above, which is not repeated here.
  • Figure 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
  • the computer device includes: a memory, a processor, and a computer program stored on the memory.
  • the processor executes the above computer program to implement the steps of the image processing method.
• The image processing device obtains the attribute parameters of the face-to-be-swapped image, which indicate the three-dimensional attributes of the face in the image, and determines the target attribute parameters based on the attribute parameters of the face-to-be-swapped image and those of the target face, thereby locating the three-dimensional attribute characteristics of the face in the image expected to be generated; based on the target attribute parameters and the facial features of the target face, target comprehensive features that comprehensively characterize the face-to-be-swapped image and the target face are obtained, and the face-to-be-swapped image is encoded to obtain its image coding features, which provide the pixel-level refined features of the image.
• The target comprehensive features are then transferred into the image coding features of the face-to-be-swapped image through regularization to obtain the fused coding features; this application mixes the pixel-level coding features with the global comprehensive features and aligns the feature distribution of the image coding features to the target comprehensive features, thereby improving the accuracy of the generated fused coding features.
• By decoding the fused coding features, a target face-swapped image including a fused face is obtained, so that the decoded image can present the target comprehensive features down to each pixel, making the appearance of the fused face in the decoded image closer to the target face, improving the perceptual similarity between the fused face and the target face, and thereby improving the accuracy of face replacement.
• In an optional embodiment, a computer device is provided; as shown in Figure 6, the computer device 600 includes a processor 601 and a memory 603, where the processor 601 and the memory 603 are connected, for example, through a bus 602.
• Optionally, the computer device 600 may also include a transceiver 604, which may be used for data interaction between the computer device and other computer devices, such as data transmission and/or data reception. It should be noted that, in practical applications, the number of transceivers 604 is not limited to one, and the structure of the computer device 600 does not constitute a limitation on the embodiments of the present application.
• The processor 601 can be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof; it may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure.
• The processor 601 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
• The bus 602 may include a path that carries information between the above-mentioned components; the bus 602 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
• The bus 602 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 6, but this does not mean that there is only one bus or one type of bus.
• The memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store a computer program and can be read by a computer, without limitation here.
• The memory 603 is used to store the computer program for executing the embodiments of the present application, the execution of which is controlled by the processor 601; the processor 601 executes the computer program stored in the memory 603 to implement the steps shown in the foregoing method embodiments.
• The computer device includes but is not limited to: a server, a cloud computing center device, etc.
• Embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
• Embodiments of the present application also provide a computer program product, including a computer program; when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
• Although each operation step is indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of the embodiments of the present application, the implementation steps in each flowchart may be executed in other orders as required.
• In addition, some or all of the steps in each flowchart may, based on actual implementation scenarios, include multiple sub-steps or multiple stages; some or all of these sub-steps or stages may be executed at the same moment, and each of them may also be executed at a different moment. In scenarios with different execution moments, the execution order of these sub-steps or stages can be flexibly configured according to requirements, and the embodiments of the present application do not limit this.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application provide an image processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product, relating to the fields of artificial intelligence, computer vision, maps, and intelligent transportation. A face swap request is received; attribute parameters of a face-to-be-swapped image, attribute parameters of a target face, and facial features of the target face are obtained, the attribute parameters of the face-to-be-swapped image indicating three-dimensional attributes of the face in the image; target attribute parameters are determined based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face; target comprehensive features are determined based on the target attribute parameters and the facial features of the target face; the face-to-be-swapped image is encoded to obtain its image coding features; the target comprehensive features are transferred into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features; and the fused coding features are decoded to obtain a target face-swapped image including a fused face.

Description

Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 202210334052.7, filed on March 30, 2022, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to technical fields such as artificial intelligence and computer vision, and in particular to an image processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product.
Background
Face swapping is an important technology in the field of computer vision and is widely used in scenarios such as content production, film and television portrait production, and entertainment video production. Given an image A and an image B, face swapping refers to the process of transferring the facial features in image A into image B to obtain a face-swapped image.
In the related art, face swapping is usually implemented through shape fitting; for example, based on the detected facial key points in image A and in image B, the shape change relationship between the two images for regions such as the facial features and contours can be calculated, and the faces in image A and image B are fused according to this shape transformation relationship to obtain the face-swapped image.
The above shape fitting process implements face swapping through facial deformation and fusion. However, when there is a large pose difference between the faces in image A and image B, simple shape fitting cannot handle faces with large pose differences, and the facial deformation in the face-swapped image easily looks unnatural; that is, the similarity between the face-swapped image and the face in image A is low, resulting in low face swap accuracy.
Summary
Embodiments of the present application provide an image processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product, which can improve the similarity before and after face swapping and thereby improve face swap accuracy.
An embodiment of the present application provides an image processing method, the method including:
receiving a face swap request, the face swap request being used to request that the face in a face-to-be-swapped image be replaced with a target face;
obtaining attribute parameters of the face-to-be-swapped image, attribute parameters of the target face, and facial features of the target face, the attribute parameters of the face-to-be-swapped image indicating three-dimensional attributes of the face in the face-to-be-swapped image;
determining target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face;
determining target comprehensive features based on the target attribute parameters and the facial features of the target face;
encoding the face-to-be-swapped image to obtain image coding features of the face-to-be-swapped image;
transferring the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features;
decoding the fused coding features to obtain a target face-swapped image including a fused face, the fused face being a fusion of the face in the face-to-be-swapped image and the target face.
An embodiment of the present application provides an image processing apparatus, the apparatus including:
an attribute parameter acquisition module, configured to receive a face swap request, the face swap request being used to request that the face in a face-to-be-swapped image be replaced with a target face;
a target attribute parameter determination module, configured to obtain attribute parameters of the face-to-be-swapped image, attribute parameters of the target face, and facial features of the target face, the attribute parameters of the face-to-be-swapped image indicating three-dimensional attributes of the face in the face-to-be-swapped image, and to determine target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face;
a comprehensive feature determination module, configured to determine target comprehensive features based on the target attribute parameters and the facial features of the target face;
an encoding module, configured to encode the face-to-be-swapped image to obtain image coding features of the face-to-be-swapped image;
a transfer module, configured to transfer the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features;
a decoding module, configured to decode the fused coding features to obtain a target face-swapped image including a fused face, the fused face being a fusion of the face in the face-to-be-swapped image and the target face.
An embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the above image processing method.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the above image processing method.
An embodiment of the present application provides a computer program product, including a computer program that, when executed by a processor, implements the above image processing method.
The beneficial effects brought by the technical solutions provided by the embodiments of the present application are as follows:
The embodiments of the present application determine target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face, thereby locating the three-dimensional attribute characteristics of the face in the image expected to be generated; based on the target attribute parameters and the facial features of the target face, target comprehensive features that can comprehensively characterize the face-to-be-swapped image and the target face are obtained; the face-to-be-swapped image is encoded to obtain its image coding features, through which the pixel-level refined features of the face-to-be-swapped image are obtained; the target comprehensive features are transferred into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features, so that the embodiments of the present application mix the pixel-level refined coding features with the global comprehensive features and align the image coding features to the target comprehensive features, thereby improving the accuracy of the generated fused coding features; by decoding the fused coding features, a target face-swapped image including a fused face is obtained, and the decoded image can present the target comprehensive features down to each pixel, making the appearance of the fused face in the decoded image closer to the target face, improving the perceptual similarity between the fused face and the target face, and thereby improving face swap accuracy.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of an implementation environment for an image processing method provided by an embodiment of the present application;
Figure 2 is a schematic flowchart of a training method for a face swap model provided by an embodiment of the present application;
Figure 3 is a schematic framework diagram of a training process of a face swap model provided by an embodiment of the present application;
Figure 4 is a signaling interaction diagram of an image processing method provided by an embodiment of the present application;
Figure 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed description
The embodiments of the present application are described below with reference to the drawings. It should be understood that the implementations set forth below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application and do not constitute a limitation on those technical solutions.
It can be understood that, in the specific implementations of the present application, any object-related data involved, such as the facial images used in training the face swap model (the first sample image, the second sample image, the pose images, and the video of the target object) and any object-related data used when performing face swapping with the face swap model (the face-to-be-swapped image, the facial features and attribute parameters of the target face, etc.), is obtained after the consent or permission of the relevant objects; when the following embodiments of the present application are applied to specific products or technologies, the objects' permission or consent needs to be obtained, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. In addition, any face swap performed on an object's facial image using the image processing method of the present application is performed on the basis of a face swap service or face swap request triggered by the relevant object and only after the relevant object's permission or consent.
The technical terms involved in the present application are introduced below:
1) Face swapping: replacing the face in one image with the target face from another facial image.
2) Face swap model: by invoking the face swap model, the target face can be swapped into any face-to-be-swapped image based on the attribute data and facial features of the target face; the image processing method provided by the embodiments of the present application may use this face swap model to replace the face in the face-to-be-swapped image with the dedicated target face.
3) Face-to-be-swapped image: an image in which the face needs to be replaced; for example, the target face can be swapped onto the face in the face-to-be-swapped image. It should be noted that performing face swapping on the face-to-be-swapped image using the image processing method of the embodiments of the present application yields a target face-swapped image; the fused face included in the target face-swapped image is a fusion of the face in the face-to-be-swapped image and the target face, the perceptual similarity between the fused face and the target face is higher, and the fused face also incorporates poses such as the expression and angle of the face in the face-to-be-swapped image, making the target facial image more vivid and realistic.
4) Attribute parameters: the attribute parameters of an image indicate the three-dimensional attributes of the face in the image and can represent attributes such as the pose of the face in three-dimensional space and the spatial environment.
5) Facial features: features characterizing the face in an image in the two-dimensional plane, for example, the distance between the eyes or the size of the nose; facial features can characterize the identity of the object possessing those features.
6) Target face: the dedicated face used to replace the face in an image; the target face can be a face specified based on a user's selection operation. Embodiments of the present application provide a face swap service with this target face as the dedicated face, that is, the dedicated target face can be swapped into any face-to-be-swapped image; for example, target face A can replace the face in image B, and target face A can also replace the face in image C.
7) First sample image: the first sample image includes the target face and is an image used in training the face swap model.
8) Second sample image: the second sample image includes the face to be replaced and is an image used in training the face swap model. During training, the target face in the first sample image can be used as the dedicated face and swapped into the second sample image, and the face swap model is trained based on this process.
Figure 1 is a schematic diagram of an implementation environment for an image processing method provided by an embodiment of the present application. As shown in Figure 1, the implementation environment includes a server 11 and a terminal 12.
The server 11 is configured with a trained face swap model, and the server 11 can provide a face swap function to the terminal 12 based on the face swap model. The face swap service refers to swapping the face in the face-to-be-swapped image based on the target face, so that the fused face in the generated target facial image fuses the original face in that image with the target face. In some embodiments, the terminal 12 can send a face swap request to the server 11, and the face swap request can carry the face-to-be-swapped image; based on the face swap request, the server 11 can execute the image processing method of the present application to generate the target face-swapped image and return it to the terminal 12. In some embodiments, the server 11 can be the background server of an application program. The terminal 12 is installed with the application program, and the terminal 12 and the server 11 can perform data interaction based on the application program to implement the face swap process. The application program can be configured with a face swap function and can be any application supporting it, including but not limited to: video editing applications, image processing tools, video applications, live broadcast applications, social applications, content interaction platforms, game applications, and so on.
The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server or server cluster providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The above network may include but is not limited to: wired networks (including local area networks, metropolitan area networks, and wide area networks) and wireless networks (including Bluetooth, Wi-Fi, and other networks implementing wireless communication). The terminal may be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a notebook computer, a digital broadcast receiver, a mobile Internet device (MID), a personal digital assistant, a desktop computer, a vehicle-mounted terminal (such as a vehicle navigation terminal or a vehicle-mounted computer), a smart home appliance, an aircraft, a smart speaker, a smart watch, etc. The terminal and the server may be connected directly or indirectly through wired or wireless communication, but are not limited thereto.
The image processing method provided by the embodiments of the present application involves technologies such as artificial intelligence and computer vision; for example, cloud computing and big data processing in artificial intelligence are used to implement processes such as extracting the attribute parameters of the first sample image and training the face swap model, and computer vision is used to perform face recognition on the image frames of a video in order to crop out the first sample image including the target face.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and so on. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and other major directions.
It should be understood that computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to machine vision such as using cameras and computers instead of human eyes to identify and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
To make the technical problems solved, the technical solutions implemented, and the technical effects achieved by the embodiments of the present application clearer, the implementations of the present application are further described in detail below with reference to the drawings.
Figure 2 is a schematic flowchart of a training method for a face swap model provided by an embodiment of the present application. The method can be executed by a computer device (for example, the server 11 shown in Figure 1). As shown in Figure 2, the method includes the following steps 201 to 208.
Step 201: The computer device obtains the facial features and attribute parameters of the first sample image, and obtains the attribute parameters of the second sample image.
The first sample image includes the target face, and the second sample image includes the face to be replaced. The computer device can collect data including any face as the second sample image, and collect images of the target face at multiple pose angles as the first sample image. The computer device can obtain the attribute parameters of the first sample image and of the second sample image through a facial parameter estimation model, and can obtain the facial features of the first sample image through a face recognition model.
The facial parameter estimation model is used to estimate the three-dimensional attribute parameters of a face based on an input two-dimensional facial image. The facial parameter estimation model can be a model with a convolutional neural network structure; for example, it can be a 3D morphable face model (3DMM), and the embodiments of the present application can regress the three-dimensional attribute parameters of the input two-dimensional facial image through the residual network (ResNet) part of the 3DMM. The facial parameter estimation model can also be any other model capable of extracting the three-dimensional attribute parameters of a face in a two-dimensional image; the 3DMM model is used here only as an example.
The attribute parameters indicate the three-dimensional attributes of the face in an image and can characterize attributes such as the pose of the face in three-dimensional space and the spatial environment; they include but are not limited to: a shape coefficient (id_coeff), an expression coefficient (expression_coeff), a texture coefficient (texture_coeff), an angle coefficient (angles_coeff), and an illumination coefficient (gamma_coeff). The shape coefficient represents the shape of the face and of the facial features; the angle coefficient represents angles such as the pitch angle and the left-right deflection angle of the face; the texture coefficient can represent the skin, hair, and so on of the face; and the illumination coefficient can represent the illumination of the surrounding environment of the face in the image.
The computer device provided by the embodiments of the present application can extract one or more specified items among the shape coefficient, expression coefficient, texture coefficient, angle coefficient, and illumination coefficient as the attribute parameters of each sample image, or can extract all items as the attribute parameters of the corresponding sample image. Accordingly, the attribute parameters of the first sample image and the second sample image can be obtained in the following three ways.
Way 1: The computer device extracts the shape coefficient of the target face in the first sample image as the attribute parameters of the first sample image, and extracts the expression coefficient and angle coefficient of the second sample image as the attribute parameters of the second sample image.
In Way 1, the attribute parameters of the first sample image include the shape coefficient of the target face, and the attribute parameters of the second sample image include the expression coefficient and angle coefficient of the face in the second sample image. By obtaining the shape coefficient of the first sample image and the expression coefficient and angle coefficient of the second sample image, the shape features of the target face can subsequently be fused with features such as the expression and angle of the face to be replaced, so that the face in the fused sample face-swapped image possesses the facial-feature shapes of the target face as well as the expression, angle, and so on of the face to be replaced, thereby improving the similarity in facial-feature shape between the fused face and the target face.
Way 2: For the second sample image, the computer device can obtain preconfigured parameters of the second sample image as its attribute parameters; for the first sample image, the computer device extracts the shape coefficient of the target face in the first sample image as its attribute parameters.
In Way 2, the computer device configures, as needed, which items the attribute parameters of the second sample image include; the attribute parameters of the second sample image can include preconfigured parameters, for example, at least one of the expression coefficient, texture coefficient, angle coefficient, and illumination coefficient. The preconfigured parameters are parameters configured in advance as needed; for example, through preconfigured parameters including the illumination coefficient and expression coefficient, the finally fused face possesses features such as the environmental illumination and expression of the face to be replaced; the preconfigured parameters can also be configured to include the texture coefficient, angle coefficient, and so on, which is not repeated here.
Way 3: The computer device can also extract multiple parameters of the first sample image and the second sample image as the corresponding attribute parameters, and the required parameters can be further extracted from these in subsequent steps.
As an example, the attribute parameters of the first sample image can include the five parameters of shape coefficient, expression coefficient, texture coefficient, angle coefficient, and illumination coefficient of the target face in the first sample image. For example, the attribute parameters can be represented as a vector; when the attribute parameters of the first sample image include the above five parameters, they can be represented as a 257-dimensional feature vector. The attribute parameters of the second sample image can likewise include its five coefficients and can likewise be represented as a 257-dimensional feature vector.
In some embodiments, the computer device can obtain pose images of the target face at multiple pose angles and extract the facial features and attribute parameters of the first sample image based on the multiple pose images. This process can be implemented through the following technical solution: the computer device obtains at least two pose images as the first sample image, the at least two pose images including at least two facial poses of the target face; based on the at least two pose images, the computer device obtains the facial features and attribute parameters corresponding to the at least two facial poses; the computer device uses the mean of the facial features corresponding to the at least two facial poses as the facial features of the first sample image, and the mean of the attribute parameters corresponding to the at least two facial poses as the attribute parameters of the first sample image. The computer device can invoke the facial parameter estimation model to extract the attribute parameters of each pose image and use the mean of the attribute parameters of the at least two pose images as the attribute parameters of the first sample image; the computer device can invoke the face recognition model to extract the two-dimensional facial features of each pose image and use the mean of the facial features of the at least two pose images as the facial features of the first sample image; for example, the facial features of the first sample image can be a 512-dimensional feature vector. The facial features characterize the identity of the target object, and the target face is the face of the target object.
In some embodiments, the computer device can extract multiple pose images including the target face from a video. Obtaining at least two pose images as the first sample image can be implemented through the following technical solution: the computer device performs face recognition on at least two image frames included in the video of the target object to obtain at least two image frames including the target face, the face of the target object being the target face; the computer device performs face cropping on the at least two image frames to obtain at least two pose images and uses them as the first sample image. The facial pose can include but is not limited to any attribute such as the expression, angle, shape of the facial features, action, glasses worn, or facial makeup, and the computer device can distinguish poses by any of these attributes; for example, a smiling face and an angry face can serve as faces of two poses; a face with glasses and a face without glasses can also serve as faces of two poses; a face pitched upward by 45° with eyes closed and a face pitched downward by 30° with eyes open can also serve as faces of two poses. The computer device can also obtain multiple independent still images of the target face and extract multiple pose images from them; the computer device can perform face cropping on the multiple still images to obtain at least two pose images and use them as the first sample image.
In some embodiments, the computer device can crop a face from an image frame to obtain a pose image through the following technical solution. First, the computer device performs face detection on the image frame to obtain the face coordinate box of the image frame; specifically, the face coordinate box delineates the facial region where the target face is located in the image frame. Next, the computer device performs face registration on the image frame according to its face coordinate box to obtain the target facial key points in the image frame; specifically, the target facial key points can include the facial-feature key points and facial contour key points of the target face in the image frame, and can also include hair key points, etc. The computer device can implement key point detection on the image frame through a target detection network, for example, a YOLO network; the input information of the target detection network is the facial image and its face coordinate box in the image frame, and the output information is a facial key point coordinate sequence including the target facial key points; the number of key points included in the sequence can be preconfigured based on different requirements for facial detail, for example, fixed values such as 5, 68, or 90 points. Finally, the computer device performs face cropping on the image frame based on the target facial key points to obtain the pose image: the target facial key points are connected in the order represented by the facial key point coordinate sequence, and the closed figure obtained by the connection is used as the pose image.
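A schematic sketch of the detection, registration, and cropping sequence just described follows; the two helper callables mirror the detection and registration stages and are illustrative names only (any detector and landmark model with these contracts would do; OpenCV is used here only for the polygon mask and crop).

```python
import numpy as np
import cv2  # OpenCV

def crop_pose_image(frame, detect_face_box, detect_keypoints):
    """Face detection -> face registration -> crop of the closed key-point figure."""
    box = detect_face_box(frame)           # face coordinate box of the target face
    points = detect_keypoints(frame, box)  # e.g., a 5-, 68-, or 90-point coordinate sequence
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.asarray(points, dtype=np.int32)], 255)  # closed figure from key points
    ys, xs = np.nonzero(mask)
    cropped = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1].copy()
    return cropped, mask
```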
In some embodiments, the process of obtaining the second sample image is similar to that of obtaining the first sample image. For example, the computer device can obtain an object image including any object, perform face cropping on the object image to obtain an image including the object's face, and use that image as the second sample image. The face cropping method is similar to the technical solution of cropping a face from an image frame to obtain a pose image and is not repeated here. In addition, the computer device can invoke the facial parameter estimation model to extract the attribute parameters of the second sample image.
In some embodiments, the computer device can store the facial features and attribute parameters of the first sample image; specifically, the computer device stores the facial features and attribute parameters of the first sample image at a target address, which is a preconfigured storage address. Storing the facial features and attribute parameters of the target face in a fixed manner facilitates direct data extraction from the target address in subsequent use; for example, when using the trained face swap model to provide a dedicated face swap service, fixed storage allows the computer device to directly extract the stored facial features and attribute parameters of the target face, implementing the dedicated face swap process of swapping the dedicated target face into any facial image; for another example, in the iterative training stage, the facial features and attribute parameters of the target face can be extracted directly from the target address for training.
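Step 201's mean-pooling over poses and fixed storage could look like the following sketch; `face_id_features` and `estimate_3d_params` again stand in for the face recognition and facial parameter estimation models, and the target address is shown simply as a file path, all as illustrative assumptions.

```python
import torch

def build_and_store_target_face(pose_images, face_id_features, estimate_3d_params,
                                target_address="target_face.pt"):
    feats = torch.stack([face_id_features(img) for img in pose_images])     # e.g. (N, 512)
    params = torch.stack([estimate_3d_params(img) for img in pose_images])  # e.g. (N, 257)
    record = {"face_features": feats.mean(dim=0),      # mean over at least two facial poses
              "attribute_params": params.mean(dim=0)}
    torch.save(record, target_address)                 # fixed storage at the target address
    return record
```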
Step 202: The computer device determines the sample attribute parameters based on the attribute parameters of the first sample image and the attribute parameters of the second sample image.
The sample attribute parameters are used to indicate the expected attributes of the face in the sample face-swapped image to be generated.
Corresponding to Way 1 of step 201, the computer device can determine the shape coefficient of the first sample image and the expression coefficient and angle coefficient of the second sample image as the sample attribute parameters.
Corresponding to Ways 2 and 3 of step 201, the computer device can select the attribute parameters of the first sample image and the second sample image as needed as the sample attribute parameters. Step 202 can be implemented through the following technical solution: the computer device determines the shape coefficient of the first sample image and the preconfigured parameters of the second sample image as the target attribute parameters, the preconfigured parameters of the second sample image including at least one of the expression coefficient, angle coefficient, texture coefficient, and illumination coefficient. Corresponding to Way 2, the preconfigured parameters can be those already obtained in Way 2 of step 201; in this step, the computer device can directly obtain the preconfigured parameters of the second sample image. Corresponding to Way 3, the preconfigured parameters can also be extracted from the attribute parameters including the five coefficients; in this step, the computer device can extract, according to preconfigured parameter identifiers, the preconfigured parameters corresponding to those identifiers from the second sample image. For example, the preconfigured parameter identifiers can include the identifier of at least one of the expression coefficient, angle coefficient, texture coefficient, and illumination coefficient. For example, the preconfigured parameters can include the expression coefficient and angle, that is, the face in the sample face-swapped image to be generated is expected to have the shape of the target face and its facial features as well as the expression, angle, and so on of the face in the second sample image; the computer device can determine the shape coefficient of the target face and the expression coefficient and angle of the second sample image as the target attribute parameters. For another example, the preconfigured parameters can also include the texture coefficient and illumination coefficient, that is, the face in the sample face-swapped image is expected to have the shape of the target face and the texture coefficient, illumination coefficient, and so on of the face in the second sample image; the computer device can also determine the shape coefficient of the target face and the texture coefficient and illumination coefficient of the second sample image as the sample attribute parameters.
Step 203: The computer device determines the sample comprehensive features based on the sample attribute parameters and the facial features of the first sample image.
The computer device can concatenate the sample attribute parameters and the facial features of the first sample image and use the resulting concatenated features as the sample comprehensive features. The sample comprehensive features can characterize the comprehensive features of the face expected to be generated. For example, the sample attribute parameters and the facial features can be represented in the form of feature vectors; the computer device can perform a concatenation operation on the first feature vector corresponding to the sample attribute parameters and the second feature vector corresponding to the facial features to obtain the third feature vector corresponding to the sample comprehensive features.
Step 204: The computer device encodes the second sample image to obtain the sample coding features.
The computer device inputs the second sample image into the encoder of the initialized face swap model, encodes the second sample image through the encoder to obtain the coding vector corresponding to the second sample image, and uses the coding vector as the sample coding features. Encoding the second sample image into the sample coding features accurately refines the pixel-level information of each pixel included in the second sample image.
The encoder includes multiple cascaded convolutional layers through which the second sample image is convolved; each convolutional layer inputs the result of its convolution into the next convolutional layer for further convolution, and the output of the last convolutional layer is the sample coding features.
Step 205: The computer device transfers the sample comprehensive features into the sample coding features of the second sample image through regularization to obtain the sample fusion features.
The computer device can use step 205 to implement the fusion of the sample comprehensive features and the sample coding features. Using regularization, the computer device aligns the sample coding features with the feature distribution of the sample comprehensive features to obtain the sample fusion features. In some embodiments, a feature distribution can include a mean and a standard deviation. Accordingly, step 205 can be implemented through the following technical solution: the computer device obtains the third mean and third standard deviation of the sample coding features in at least one feature channel and uses the normal distribution conforming to the third mean and third standard deviation as the third feature distribution, and obtains the fourth mean and fourth standard deviation of the sample comprehensive features in at least one feature channel and uses the normal distribution conforming to the fourth mean and fourth standard deviation as the fourth feature distribution; the computer device aligns the mean and standard deviation of the sample coding features in each feature channel (the third feature distribution) with the mean and standard deviation of the sample comprehensive features in the corresponding feature channel (the fourth feature distribution) to obtain the sample fusion features. The computer device can normalize each feature channel of the sample coding features and align the mean and standard deviation of the normalized sample coding features with those of the sample comprehensive features to generate the sample fusion features.
As an example, the computer device can, based on the sample coding features and the sample comprehensive features, implement the above alignment from the third feature distribution to the fourth feature distribution through the following formula (1) to calculate the sample fusion features:
AdaIN(x, y) = σ(y) × ((x - μ(x)) / σ(x)) + μ(y)    (1);
where x represents the sample coding features, y represents the sample comprehensive features, μ(x) and σ(x) respectively represent the mean and standard deviation of the sample coding features, and μ(y) and σ(y) respectively represent the mean and standard deviation of the sample comprehensive features. Using adaptive instance regularization means using the adaptive instance normalization (AdaIN) algorithm, and AdaIN(x, y) represents the sample fusion features generated based on adaptive instance normalization.
As an example, besides the above adaptive instance normalization, the instance normalization (IN) algorithm can also be used, which is not limited here.
Step 206: The computer device decodes the sample fusion features to obtain the sample face-swapped image.
The computer device passes the sample fusion features through the decoder of the initialized face swap model, restores the image corresponding to the sample fusion features through the decoder, and uses the image output by the decoder as the sample face-swapped image. The decoder can restore, from injected features, the image corresponding to those features. For example, since the encoder performs convolution operations on the input image, the decoder at run time can perform the reverse operation according to the encoder's operating principle, namely deconvolution, to restore the image corresponding to the sample fusion features. For example, the encoder can be the encoder of an autoencoder (AE), in which case the decoder can be the decoder corresponding to that autoencoder.
The decoder includes multiple cascaded convolutional layers through which the sample fusion features are deconvolved; each layer inputs the result of its deconvolution into the next layer for further deconvolution, and the output of the last layer is the sample face-swapped image.
Through the above step 205, feature transfer by regularization supports transferring the sample comprehensive features into the coding features of any image, implementing the mixture of the sample comprehensive features and the sample coding features; moreover, the sample coding features characterize each pixel of the second sample image, while the sample comprehensive features integrate the features of the first and second sample images from a global perspective. Therefore, regularization implements the mixture between pixel-level refined coding features and global comprehensive features and aligns the feature distribution of the sample coding features to the sample comprehensive features, thereby improving the accuracy of the generated sample fusion features; through step 206, decoding an image from the sample fusion features allows the decoded image to present the sample comprehensive features down to each pixel, improving the perceptual similarity between the face in the decoded image and the target face and improving face swap accuracy.
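A deliberately small encoder/decoder pair in the spirit of steps 204 and 206 (cascaded convolutions down, cascaded deconvolutions back up) is sketched below; the channel counts and depth are arbitrary illustrative choices, not the embodiment's architecture.

```python
import torch.nn as nn

encoder = nn.Sequential(  # step 204: cascaded convolutional layers
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
)
decoder = nn.Sequential(  # step 206: the reverse (deconvolution) operations
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Tanh(),
)
```

In a real system the AdaIN injection of step 205 would sit between these two halves, operating on the encoder's output feature map before it enters the decoder.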
Step 207: Determine the total loss of the initialized face swap model based on the first difference between the sample face-swapped image and the sample attribute parameters, the second difference between the facial features of the sample face-swapped image and the facial features of the first sample image, and the third difference between the sample face-swapped image and the second sample image.
A first weight corresponding to the first difference, a second weight corresponding to the second difference, and a third weight corresponding to the third difference are obtained, and the first, second, and third differences are subjected to weighted averaging based on the first, second, and third weights to obtain the total loss; the weight corresponding to each difference can be a preconfigured value.
Step 208: Train the initialized face swap model based on the total loss until the target condition is met, and use the model obtained when the target condition is met as the face swap model.
The computer device can respectively determine the multiple similarities between the sample face-swapped image and the sample attribute parameters, the facial features of the first sample image, and the second sample image, and obtain the total loss based on the multiple similarities. In some embodiments, the initialized face swap model can include a discriminator, which the computer device can use to judge the authenticity of the sample face-swapped image. The process by which the computer device determines the total loss can include the following steps: the computer device obtains the first similarity between the attribute parameters of the sample face-swapped image and the sample attribute parameters and uses the first similarity as the first difference; the computer device obtains the second similarity between the facial features of the sample face-swapped image and the facial features of the first sample image and uses the second similarity as the second difference; the computer device obtains, through the discriminator of the initialized face swap model, the third similarity between the second sample image and the sample face-swapped image and uses the third similarity as the third difference; the computer device determines the total loss based on the first, second, and third similarities.
The computer device can extract the attribute parameters of the sample face-swapped image and determine the first similarity between the attribute parameters of the sample face-swapped image and the sample attribute parameters through the following formula (2):
3d feature loss = abs(gt 3d feature - result 3d feature)    (2);
where 3d feature loss represents the first similarity (the smaller its value, the closer the attribute parameters of the sample face-swapped image are to the sample attribute parameters), result 3d feature represents the attribute parameters of the sample face-swapped image, gt 3d feature represents the sample attribute parameters, and abs denotes taking the absolute value of (gt 3d feature - result 3d feature). The sample attribute parameters can be the shape coefficient of the target face and the expression coefficient and angle of the second sample image; accordingly, gt 3d feature can be expressed as the following formula (3):
gt 3d feature = source 3d feature id + target 3d feature expression + target 3d feature angles    (3);
where source 3d feature id represents the shape coefficient of the first sample image, target 3d feature expression represents the expression coefficient of the second sample image, and target 3d feature angles represents the angle of the second sample image.
The computer device can extract the facial features of the sample face-swapped image and determine the second similarity between the facial features of the sample face-swapped image and the facial features of the first sample image through the following formula (4):
id loss = 1 - cosine similarity(result id feature, Mean Source ID)    (4);
where id loss represents the second similarity (the smaller its value, the closer the facial features of the sample face-swapped image are to those of the first sample image), result id feature represents the facial features of the sample face-swapped image, Mean Source ID represents the facial features of the first sample image, and cosine similarity(result id feature, Mean Source ID) represents the cosine similarity between result id feature and Mean Source ID; the cosine similarity can be determined by the process shown in the following formula (5):
similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖) = (Σ_{i=1}^{n} A_i × B_i) / (√(Σ_{i=1}^{n} A_i²) × √(Σ_{i=1}^{n} B_i²))    (5);
where A and B can respectively represent the feature vector corresponding to the facial features of the sample face-swapped image and the feature vector corresponding to the facial features of the first sample image; θ represents the angle between the two feature vectors A and B; A_i represents the component of the i-th feature channel of the facial features of the sample face-swapped image; B_i represents the component of the i-th feature channel of the facial features of the first sample image; similarity and cos(θ) represent the cosine similarity.
The computer device can input the second sample image into the discriminator as the real image and input the sample face-swapped image into the discriminator; through the discriminator, the computer device respectively obtains third-scale images of the second sample image at at least one scale and fourth-scale images of the sample face-swapped image at the corresponding at least one scale; the computer device obtains the discrimination probability corresponding to each third-scale image and the discrimination probability corresponding to each fourth-scale image, where the discrimination probability of an image indicates the probability of judging the image to be the real image, the image being a third-scale image or a fourth-scale image; the computer device determines the third similarity based on the discrimination probability corresponding to each third-scale image and the at least one discrimination probability corresponding to each fourth-scale image. For example, the initialized face swap model can include a generator and a discriminator; the computer device obtains the discrimination loss value corresponding to the discriminator and the generation loss value corresponding to the generator, and determines the third similarity based on the generation loss value and the discrimination loss value. The generator is used to generate the sample face-swapped image based on the second sample image and the first sample image; for example, the generator can include the encoder and decoder used in steps 204 to 206 above. The third similarity can include the generation loss value and the discrimination loss value; the computer device can represent the generation loss value by the discrimination probability of the sample face-swapped image, for example, calculating the generation loss value through the following formula (6):
G loss = log(1 - D(result))    (6);
where D(result) represents the discrimination probability of the sample face-swapped image, that is, the probability that the sample face-swapped image is a real image, and G loss represents the generation loss value.
The generator includes multiple cascaded convolutional layers; for example, the generator can be a U-shaped network structure that downsamples the second sample image and the first sample image and then upsamples the downsampling result to obtain the sample face-swapped image. The discriminator also includes multiple cascaded convolutional layers: the discriminator is the downsampling structure of the U-shaped network plus a fully connected layer; the downsampling structure convolves the sample face-swapped image, and the fully connected layer then maps the convolution result to obtain the discrimination probability of the sample face-swapped image.
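Matching the description of the discriminator as a downsampling convolutional stack followed by a fully connected layer, one possible minimal module is sketched below; the layer sizes are assumptions for illustration. The adaptive pooling before the fully connected layer is a design choice that makes the same head work at every input scale, which is convenient because the multi-scale loss feeds the same discriminator images at the original, 1/2, and 1/4 scales.

```python
import torch
import torch.nn as nn

class ProbDiscriminator(nn.Module):
    """Downsampling convolutions followed by a fully connected layer that maps
    the result to a single discrimination probability."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),  # makes the head independent of the input scale
        )
        self.fc = nn.Linear(256, 1)

    def forward(self, x):
        h = self.down(x).flatten(1)
        return torch.sigmoid(self.fc(h))  # discrimination probability in (0, 1)
```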
Those skilled in the art can understand that, unless specifically stated, the singular forms "a", "an", "the", and "this" used herein may also include the plural forms. The terms "include" and "comprise" used in the embodiments of the present application mean that the corresponding features can be implemented as the presented features, information, data, steps, and operations, but do not exclude implementation as other features, information, data, steps, operations, and so on supported in the technical field.
The terms "first", "second", "third", "fourth", "1", "2", etc. (if present) in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described.
It should be understood that, although the operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be executed in other orders as required. In addition, some or all of the steps in the flowcharts may, based on actual implementation scenarios, include multiple sub-steps or stages; some or all of these sub-steps or stages may be executed at the same moment, and each may also be executed at a different moment; in scenarios with different execution moments, their execution order can be flexibly configured as required, which is not limited by the embodiments of the present application.
The above are only optional implementations of some implementation scenarios of the present application. It should be noted that, for those of ordinary skill in the art, adopting other similar implementation means based on the technical ideas of the present application, without departing from the technical concept of the solutions of the present application, also falls within the protection scope of the embodiments of the present application.

Claims (12)

  1. An image processing method, the method being executed by a computer device, the method comprising:
    receiving a face swap request, the face swap request being used to request that a face in a face-to-be-swapped image be replaced with a target face;
    obtaining attribute parameters of the face-to-be-swapped image, attribute parameters of the target face, and facial features of the target face, the attribute parameters of the face-to-be-swapped image indicating three-dimensional attributes of the face in the face-to-be-swapped image;
    determining target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face;
    determining target comprehensive features based on the target attribute parameters and the facial features of the target face;
    encoding the face-to-be-swapped image to obtain image coding features of the face-to-be-swapped image;
    transferring the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features;
    decoding the fused coding features to obtain a target face-swapped image comprising a fused face, the fused face being a fusion of the face in the face-to-be-swapped image and the target face.
  2. The image processing method according to claim 1, wherein the attribute parameter of the target face is a shape coefficient and the attribute parameters of the face-to-be-swapped image are preconfigured parameters; and
    the determining target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face comprises:
    determining the shape coefficient of the target face and the preconfigured parameters of the face-to-be-swapped image as the target attribute parameters, the preconfigured parameters comprising at least one of an expression coefficient, an angle coefficient, a texture coefficient, and an illumination coefficient.
  3. The image processing method according to claim 1, wherein the transferring the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features comprises:
    obtaining a first mean and a first standard deviation of the image coding features in at least one feature channel, and using a normal distribution conforming to the first mean and the first standard deviation as a first feature distribution; and obtaining a second mean and a second standard deviation of the target comprehensive features in at least one feature channel, and using a normal distribution conforming to the second mean and the second standard deviation as a second feature distribution;
    aligning the image coding features from the first feature distribution to the second feature distribution to obtain the fused coding features.
  4. The image processing method according to claim 1, wherein the target face-swapped image is obtained by invoking a trained face swap model, the face swap model being used to swap the target face into any face image based on attribute data and facial features of the target face; and
    the method further comprises:
    obtaining facial features and attribute parameters of a first sample image, and obtaining attribute parameters of a second sample image, the first sample image comprising the target face and the second sample image comprising a face to be replaced;
    determining sample attribute parameters based on the attribute parameters of the first sample image and the attribute parameters of the second sample image, the sample attribute parameters being used to indicate expected attributes of a face in a sample face-swapped image to be generated;
    performing the following processing through an initialized face swap model:
    determining sample comprehensive features based on the sample attribute parameters and the facial features of the first sample image;
    encoding the second sample image to obtain sample coding features;
    transferring the sample comprehensive features into the sample coding features of the second sample image through regularization to obtain sample fusion features;
    decoding the sample fusion features to obtain the sample face-swapped image;
    determining a total loss of the initialized face swap model based on a first difference between the sample face-swapped image and the sample attribute parameters, a second difference between facial features of the sample face-swapped image and the facial features of the first sample image, and a third difference between the sample face-swapped image and the second sample image;
    training the initialized face swap model based on the total loss until a target condition is met, and using the model obtained when the target condition is met as the face swap model.
  5. The image processing method according to claim 4, wherein before determining the total loss of the initialized face swap model based on the first difference between the sample face-swapped image and the sample attribute parameters, the second difference between the facial features of the sample face-swapped image and the facial features of the first sample image, and the third difference between the sample face-swapped image and the second sample image, the method further comprises:
    obtaining a first similarity between attribute parameters of the sample face-swapped image and the sample attribute parameters, and using the first similarity as the first difference;
    obtaining a second similarity between the facial features of the sample face-swapped image and the facial features of the first sample image, and using the second similarity as the second difference;
    obtaining a third similarity between the second sample image and the sample face-swapped image, and using the third similarity as the third difference.
  6. The image processing method according to claim 5, wherein the obtaining a third similarity between the second sample image and the sample face-swapped image comprises:
    obtaining first-scale images of the second sample image at at least one scale, and second-scale images of the sample face-swapped image at the at least one scale;
    using the second sample image as a real image;
    obtaining a discrimination probability corresponding to each first-scale image, and obtaining a discrimination probability corresponding to each second-scale image, a discrimination probability of an image being used to indicate a probability of judging the image to be the real image, the image being a first-scale image or a second-scale image;
    determining the third similarity based on the discrimination probability corresponding to each first-scale image and the at least one discrimination probability corresponding to each second-scale image.
  7. The image processing method according to claim 4, wherein the obtaining facial features and attribute parameters of a first sample image comprises:
    obtaining at least two pose images, and using the at least two pose images as the first sample image, the at least two pose images comprising at least two facial poses of the target face;
    obtaining, based on the at least two pose images, facial features and attribute parameters corresponding to the at least two facial poses;
    using a mean of the facial features corresponding to the at least two facial poses as the facial features of the first sample image, and using a mean of the attribute parameters corresponding to the at least two facial poses as the attribute parameters of the first sample image;
    correspondingly, after the obtaining facial features and attribute parameters of a first sample image, the method further comprises:
    storing the facial features and attribute parameters of the first sample image.
  8. The image processing method according to claim 7, wherein the obtaining at least two pose images comprises:
    performing face recognition on at least two image frames included in a video of a target object to obtain at least two image frames including the target face, the target face being the face of the target object;
    performing face cropping on the at least two image frames to obtain the at least two pose images.
  9. An image processing apparatus, the apparatus comprising:
    an attribute parameter acquisition module, configured to receive a face swap request, the face swap request being used to request that a face in a face-to-be-swapped image be replaced with a target face;
    a target attribute parameter determination module, configured to obtain attribute parameters of the face-to-be-swapped image, attribute parameters of the target face, and facial features of the target face, the attribute parameters of the face-to-be-swapped image indicating three-dimensional attributes of the face in the face-to-be-swapped image, and to determine target attribute parameters based on the attribute parameters of the face-to-be-swapped image and the attribute parameters of the target face;
    a comprehensive feature determination module, configured to determine target comprehensive features based on the target attribute parameters and the facial features of the target face;
    an encoding module, configured to encode the face-to-be-swapped image to obtain image coding features of the face-to-be-swapped image;
    a transfer module, configured to transfer the target comprehensive features into the image coding features of the face-to-be-swapped image through regularization to obtain fused coding features;
    a decoding module, configured to decode the fused coding features to obtain a target face-swapped image comprising a fused face, the fused face being a fusion of the face in the face-to-be-swapped image and the target face.
  10. A computer device, comprising a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the image processing method according to any one of claims 1 to 8.
  11. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the image processing method according to any one of claims 1 to 8.
  12. A computer program product, comprising a computer program that, when executed by a processor, implements the image processing method according to any one of claims 1 to 8.
PCT/CN2022/111774 2022-03-30 2022-08-11 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product WO2023184817A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020227040870A KR20230141429A (ko) 2022-03-30 2022-08-11 이미지 프로세싱 방법 및 장치, 컴퓨터 디바이스, 컴퓨터-판독가능 저장 매체, 및 컴퓨터 프로그램 제품
JP2022565680A JP7479507B2 (ja) 2022-03-30 2022-08-11 画像処理方法及び装置、コンピューター機器、並びにコンピュータープログラム
US17/984,110 US20230316607A1 (en) 2022-03-30 2022-11-09 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210334052.7 2022-03-30
CN202210334052.7A CN114972010A (zh) 2022-03-30 2022-03-30 Image processing method and apparatus, computer device, storage medium, and program product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/984,110 Continuation US20230316607A1 (en) 2022-03-30 2022-11-09 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2023184817A1 true WO2023184817A1 (zh) 2023-10-05

Family

ID=82976353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111774 WO2023184817A1 (zh) 2022-03-30 2022-08-11 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN114972010A (zh)
WO (1) WO2023184817A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540789A (zh) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Model training method, facial expression transfer method, apparatus, device, and medium
CN117808854A (zh) * 2024-02-29 2024-04-02 腾讯科技(深圳)有限公司 Image generation method, model training method, apparatus, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080511A (zh) * 2019-11-18 2020-04-28 杭州时光坐标影视传媒股份有限公司 End-to-end high-resolution multi-feature-extraction face swapping method
US20200151424A1 (en) * 2018-11-09 2020-05-14 Sap Se Landmark-free face attribute prediction
CN113642491A (zh) * 2021-08-20 2021-11-12 北京百度网讯科技有限公司 Face fusion method, and training method and apparatus for face fusion model
CN113762022A (zh) * 2021-02-09 2021-12-07 北京沃东天骏信息技术有限公司 Face image fusion method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151424A1 (en) * 2018-11-09 2020-05-14 Sap Se Landmark-free face attribute prediction
CN111080511A (zh) * 2019-11-18 2020-04-28 杭州时光坐标影视传媒股份有限公司 End-to-end high-resolution multi-feature-extraction face swapping method
CN113762022A (zh) * 2021-02-09 2021-12-07 北京沃东天骏信息技术有限公司 Face image fusion method and apparatus
CN113642491A (zh) * 2021-08-20 2021-11-12 北京百度网讯科技有限公司 Face fusion method, and training method and apparatus for face fusion model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540789A (zh) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Model training method, facial expression transfer method, apparatus, device, and medium
CN117540789B (zh) * 2024-01-09 2024-04-26 腾讯科技(深圳)有限公司 Model training method, facial expression transfer method, apparatus, device, and medium
CN117808854A (zh) * 2024-02-29 2024-04-02 腾讯科技(深圳)有限公司 Image generation method, model training method, apparatus, and electronic device
CN117808854B (zh) * 2024-02-29 2024-05-14 腾讯科技(深圳)有限公司 Image generation method, model training method, apparatus, and electronic device

Also Published As

Publication number Publication date
CN114972010A (zh) 2022-08-30

Similar Documents

Publication Publication Date Title
US11232286B2 Method and apparatus for generating face rotation image
CN112052839B Image data processing method, apparatus, device, and medium
JP7373554B2 Cross-domain image conversion
WO2023184817A1 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product
US20220084163A1 Target image generation method and apparatus, server, and storage medium
CN111401216B Image processing and model training method and apparatus, computer device, and storage medium
WO2020103700A1 Micro-expression-based image recognition method and apparatus, and related device
CN111553267B Image processing method, image processing model training method, and device
CN109684969B Gaze position estimation method, computer device, and storage medium
CN110288513B Method, apparatus, device, and storage medium for changing face attributes
CN113850168A Face image fusion method, apparatus, device, and storage medium
US20230143452A1 Method and apparatus for generating image, electronic device and storage medium
CN110728319B Image generation method and apparatus, and computer storage medium
WO2023231182A1 Image processing method and apparatus, computer device, storage medium, and program product
US20230100427A1 Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product
CN115359219A Virtual image processing method and apparatus for a virtual world
CN112036284B Image processing method, apparatus, device, and storage medium
JP2023131117A Joint perception model training, joint perception method, apparatus, device, and medium
US20230316607A1 Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product
CN117115900B Image segmentation method, apparatus, device, and storage medium
CN113822114A Image processing method, related device, and computer-readable storage medium
CN117011449A Method and apparatus for reconstructing a three-dimensional face model, storage medium, and electronic device
CN116883770A Training method and apparatus for depth estimation model, electronic device, and storage medium
CN115708135A Face recognition model processing method, face recognition method, and apparatus
CN114639132A Feature extraction model processing method, apparatus, and device for face recognition scenarios

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022565680

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934637

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022934637

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022934637

Country of ref document: EP

Effective date: 20240328