US12277738B2 - Method and system for latent-space facial feature editing in deep learning based face swapping - Google Patents
Method and system for latent-space facial feature editing in deep learning based face swapping
- Publication number
- US12277738B2 (Application No. US17/707,782)
- Authority
- US
- United States
- Prior art keywords
- image
- latent space
- space point
- face
- output image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/24—Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- Face swapping is the process of replacing an actor's face in a plate with another person's face.
- Face swapping is desirable for many creative goals, such as replacing the face of a stunt double with that of the main actor, or achieving de-aging by swapping the face of a present-day actor with a younger-looking face learned from archival footage.
- Face-swapping techniques based on deep learning have become popular and are starting to see adoption for high-quality visual effects production. These techniques typically employ encoder-decoder neural networks in which the encoder ingests images of the actor to be replaced (e.g., the stunt double) and outputs a “latent space point” (a lower-dimensional abstract representation of that input data). An identity-specific decoder can then transform this latent space point back into an image in which the stunt double's face is replaced with the main actor's face.
- Embodiments set forth in the present disclosure are directed to methods and systems for performing face swapping.
- Embodiments of the present disclosure enable face swapping to be performed with a high degree of accuracy and can generate high resolution output images that are sufficient to use in the generation of film-production quality images and videos.
- In some embodiments, a progressively trained, multi-way neural network is provided.
- The network can embed input faces in a shared latent space and can decode an embedded face as an output face selected from any of the different facial identity options supported by the network while maintaining the facial expression of the input face.
- In some embodiments, instead of a single encoder that encodes the entire input image, the neural network includes multiple encoders that encode different parts of an input image into separate latent space vectors, each representative of one part. When concatenated together, the separate latent space vectors represent the entire image.
- In some instances, such embodiments can enable the expressions of an output image generated by a decoder to be more faithful to the original expression in the input image than when a single encoder is employed to encode the entirety of the input image.
- In some embodiments, a computer-implemented method of changing a face within an output image or video frame includes: receiving an input image that includes a face presenting a facial expression in a pose; separately encoding different portions of the image by, for each separately encoded portion, generating a latent space point of the portion, thereby generating a plurality of multi-dimensional vectors in which each multi-dimensional vector is an encoded representation of a different portion of the input image; concatenating the plurality of multi-dimensional vectors into a combined latent space vector; and decoding the combined latent space vector to generate the output image in accordance with a desired facial identity but with the facial expression and pose of the face in the input image.
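- For illustration only, the following minimal PyTorch sketch (not taken from the patent; the class, the layer sizes, and the stand-in decoder are all hypothetical) shows one way these steps could be organized: portions are encoded separately, the resulting latent space vectors are concatenated, and the combined vector is decoded as the desired identity.

```python
import torch
import torch.nn as nn

class PartEncoder(nn.Module):
    """Encodes one portion of the input image into a latent space vector."""
    def __init__(self, in_ch: int = 3, latent_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))  # (B, latent_dim)

def swap_face(portions, encoders, decoder, identity):
    """Separately encode each portion, concatenate, decode as `identity`."""
    latents = [enc(p) for enc, p in zip(encoders, portions)]
    combined = torch.cat(latents, dim=1)   # the combined latent space vector
    return decoder(combined, identity)     # expression/pose come from the input

if __name__ == "__main__":
    encoders = [PartEncoder() for _ in range(4)]
    portions = [torch.randn(1, 3, 64, 64) for _ in range(4)]
    decoder = lambda z, ident: z           # stand-in for a trained decoder
    print(swap_face(portions, encoders, decoder, identity=0).shape)  # (1, 512)
```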
- Various implementations of the method can include one or more of the following features or additional steps: after receiving the input image and prior to separately encoding, identifying different features within the image that correlate to the different portions of the image; normalizing the input image prior to the receiving step; resizing the input image prior to the receiving step; for each of the different features identified, extracting from the input image an image segment that comprises the identified feature, thereby generating a plurality of image segments, each of which can be a predetermined size; and incorporating the output image into one or more of a movie, a video, a video game, or virtual or augmented reality content.
- The plurality of image segments can include: a first image segment that contains a portion of the input image with a left eye of the face, a second image segment that contains a portion of the input image with a right eye of the face, a third image segment that contains a portion of the input image with a mouth of the face, and a fourth image segment that contains a remaining portion of the input image not included in the first, second or third image segments.
- Each of the first, second, third and fourth image segments can comprise a predetermined size.
- Various implementations of the method can include one or more of the following features or additional steps: repeating the steps of applying an adjustment vector to the latent space point corresponding to the initial output image to generate an adjusted latent space point, and decoding the adjusted latent space point to generate an adjusted output image, until the adjusted output image has the desired facial expression.
- The adjustment vector can be generated from a plurality of key poses from selected images having a facial expression with a selected trait.
- In such implementations, the method can include calculating latent space points for the selected images, and generating the adjustment vectors by computing differences between an average of the latent space points for the selected images and a neutral latent space point.
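- As a hedged sketch of that computation (assuming an `encode` function that maps an image to its latent space point; nothing here is the patent's actual code):

```python
import torch

def adjustment_vector(encode, trait_images, neutral_image):
    """Mean latent space point of the selected trait-exhibiting images
    minus a neutral latent space point, per the description above."""
    trait_latents = torch.stack([encode(img) for img in trait_images])
    return trait_latents.mean(dim=0) - encode(neutral_image)

# Applying it nudges a decoded face toward the trait:
#   adjusted_latent = initial_latent + weight * adjustment_vector(...)
```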
- In some embodiments, the neural network is trained to be identity-agnostic.
- Some embodiments pertain to a non-transitory computer-readable medium that stores instructions for performing any of the above methods. Additional embodiments pertain to a computer system that includes one or more processors that execute such computer-readable instructions to perform any of the above methods.
- FIG. 3 is a simplified block diagram of a multi-encoder system according to some embodiments.
- FIG. 4 is a simplified flow diagram of steps associated with a method of changing a face appearing in an image according to some embodiments.
- FIG. 8 is a simplified diagram depicting an example of a user interface according to some embodiments that can facilitate editing one or more features of a face generated by a deep learning system.
- FIG. 1 A is a simplified block diagram of a system 100 a according to some embodiments.
- System 100 a includes a machine learning server 110 , a data store 120 , and a computing device 140 in communication over a network 130 , which can be a wide area network (WAN) such as the Internet, a local area network (LAN), or any other suitable network.
- Machine learning server 110 can include a processor 112 , a system memory 114 and a model trainer 116 .
- The model trainer 116 executes on processor 112 and can be stored in system memory 114 .
- The processor 112 can receive user input from input devices (not shown), such as a keyboard or a mouse.
- System memory 114 can store content, such as software applications and data, for use by processor 112 and the GPU.
- The system memory 114 can be any type of memory capable of storing data and software applications, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash ROM), or any suitable combination of the foregoing or other suitable memory components.
- In some embodiments, a computer-readable storage unit (not shown) can supplement or replace the computer-readable system memory 114 .
- The computer-readable storage unit can include any number and type of external memories that are accessible to the processor 112 and/or the GPU.
- For example, the storage unit can include a Secure Digital Card, an external Flash memory, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing or other suitable storage devices.
- Machine learning server 110 is illustrative, and variations and modifications are possible.
- The number of processors 112 , the number of GPUs, the number of system memories 114 , and the number of applications included in system memory 114 can vary or be modified as desired.
- The connection topology between the various units in FIG. 1 A can be modified as desired.
- Any combination of processor 112 , system memory 114 , and a GPU can be replaced with any type of virtual computing system, distributed computing system, or cloud computing environment, such as a public, private, or hybrid cloud.
- Operation(s) performed by encoder 152 to encode an image into a latent space point (i.e., a representation of compressed data in which similar data points are closer together in space) are sometimes referred to herein as “encoding operation(s).” Operation(s) performed to generate an output image from a latent space point using decoder 154 a are sometimes referred to herein as “decoding operation(s).”
- An image can be normalized in any technically feasible manner, including using face alignment techniques to compute an affine transformation that rotates, scales, translates, etc. the image, and/or cropping the image.
- An affine transformation is a linear mapping that preserves points, straight lines, and planes.
- In some embodiments, normalizing an image includes detecting the largest face in the image and determining the locations of facial landmarks using a modified Deep Alignment Network (DAN). In such cases, the image is then rotated and scaled so that the eyes of the largest face lie on a predefined horizontal line and have a predefined ocular distance. The image can then be cropped and resized to a predetermined size, e.g., 1024×1024 pixels.
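- The following OpenCV-based sketch illustrates such a normalization under stated assumptions: eye centers are already available from a landmark detector (the patent uses a modified DAN), and the output size, ocular distance, and eye-row placement are arbitrary illustrative values.

```python
import cv2
import numpy as np

def normalize_face(img, left_eye, right_eye,
                   out_size=1024, ocular_dist=256.0, eye_row=0.4):
    """Rotate/scale so the eyes lie on a horizontal line with a fixed
    ocular distance, then warp into an out_size x out_size crop."""
    lx, ly = float(left_eye[0]), float(left_eye[1])
    rx, ry = float(right_eye[0]), float(right_eye[1])
    dx, dy = rx - lx, ry - ly
    angle = float(np.degrees(np.arctan2(dy, dx)))  # tilt of the eye line
    scale = ocular_dist / float(np.hypot(dx, dy))  # enforce ocular distance
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0      # eye midpoint
    M = cv2.getRotationMatrix2D((cx, cy), angle, scale)
    # Translate so the eye midpoint lands at a predefined image location.
    M[0, 2] += out_size / 2.0 - cx
    M[1, 2] += out_size * eye_row - cy
    return cv2.warpAffine(img, M, (out_size, out_size), flags=cv2.INTER_AREA)
```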
- Encoder 152 can perform an encoding operation that outputs an encoded representation of the normalized image, which is also referred to herein as a “latent space point” of the normalized image.
- The latent space point can be the most compressed version of the normalized image generated by encoder 152 .
- Encoder 152 can learn to generate such a latent space point during training and can also generate such an encoding from previously unseen data.
- Decoder 154 a can then take as input the latent space point output by encoder 152 and perform a decoding operation that outputs a 2D image including a face.
- Training data and/or trained machine learning models can be stored in the data store 120 and deployed in any suitable application, such as a face changing application 146 a .
- In some embodiments, the training data includes videos in which multiple facial identities appear under similar environmental and lighting conditions.
- For example, the environmental conditions can include the same setup, with the same background behind the individuals who are recorded in the videos.
- Frames in which faces are partially covered or blurred due to motion can be removed.
- ML model 150 a can be trained using progressive training techniques that minimize reconstruction loss, as described in greater detail in U.S. Pat. No. 10,902,571 referenced above.
- FIG. 1 B is a simplified block diagram of a system 100 b according to some additional embodiments.
- System 100 b is similar to system 100 a and includes many of the same elements as system 100 a .
- Like reference numbers in FIG. 1 B represent like elements in FIG. 1 A and thus descriptions of some such elements are not repeated herein for the sake of brevity.
- One difference between systems 100 a and 100 b is that system 100 b includes a machine learning (ML) model 150 b that is part of face changing application 146 b .
- ML model 150 b includes a single, large decoder 154 b , which can output any of the facial identities that system 100 b can generate, instead of having a separate decoder for each identity.
- In some embodiments, a set of dense layers converts a vector identifying a given facial identity into adaptive instance normalization (AdaIN) coefficients that are applied to convolution layers within decoder 154 so that the decoder 154 generates images including the given facial identity.
- AdaIN coefficients are coefficients that can be used to perform multiplications and/or additions on activations of convolution layers, which is similar to performing an affine transformation and can cause decoder 154 to generate images including different facial identities. Doing so essentially creates multiple “virtual” decoders, one for each of the different facial identities used to train ML model 150 b .
- In some embodiments, a single dense layer may be used in lieu of a set of dense layers.
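- As a hedged illustration (an assumption about one plausible realization, not the patent's implementation), such a layer could look like the following, where a dense layer maps an identity vector to per-channel scale and bias coefficients that multiply and add onto normalized convolution activations:

```python
import torch
import torch.nn as nn

class AdaINBlock(nn.Module):
    """Identity-conditioned AdaIN: a dense layer yields per-channel
    scale/bias coefficients that modulate conv activations."""
    def __init__(self, id_dim: int, channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_coeffs = nn.Linear(id_dim, 2 * channels)

    def forward(self, x: torch.Tensor, identity_vec: torch.Tensor):
        gamma, beta = self.to_coeffs(identity_vec).chunk(2, dim=1)
        gamma = gamma[..., None, None]      # reshape to (B, C, 1, 1)
        beta = beta[..., None, None]
        return gamma * self.norm(x) + beta  # multiply/add on activations
```

- Feeding different identity vectors through the same convolutional decoder then steers it toward different facial identities, which is what creates the “virtual” decoders described above.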
- FIG. 2 is a simplified block diagram of a machine learning (ML) model 200 according to some embodiments.
- ML model 200 can be representative of ML model 150 a and ML model 150 b shown in FIGS. 1 A and 1 B , respectively, and can thus be part of face changing applications 146 a and 146 b .
- ML model 200 includes an encoder 220 and a decoder 240 .
- Decoder 240 can be implemented in a variety of different ways.
- decoder 240 can include multiple, separate decoders, each of which can generate images of a different facial identity as described above with respect to decoder 154 a .
- Encoder 152 can output an encoded representation of the normalized image, which is also referred to herein as a “latent space point” of the normalized image.
- The latent space point can be in the form of a multi-dimensional vector that includes hundreds of dimensions (and thus is sometimes referred to herein as a “latent space vector”).
- For example, the latent space point can be a 512-dimensional vector.
- In some embodiments, a single encoder encodes the entire face within image 210 as a single latent space point. Other embodiments, however, can include multiple encoders, each of which encodes a portion of image 210 .
- FIG. 3 is a simplified block diagram of a multi-encoder system 300 according to some embodiments, which can be representative of encoder 220 .
- The four separate encoders 322 , 324 , 326 , 328 represent an illustrative embodiment only. In other embodiments, fewer or more than four encoders can be included in multi-encoder system 300 , and individual encoders in the system can encode different portions of an input image. For example, in some embodiments a single encoder can be trained to encode both the left and right eyes.
- Method 400 can be initiated when an image, such as image 210 shown in FIG. 2 , and a selection of a facial identity are received by a face changing application, such as face changing application 146 a or 146 b , in which multi-encoder system 300 is included ( FIG. 4 , block 410 ).
- The selected facial identity can be one of the facial identities that the ML model within face changing application 146 a or 146 b (e.g., ML model 150 a or ML model 150 b ) was trained for.
- Alternatively, the selected facial identity could be an interpolation that combines facial identities that the ML model was trained for.
- The face changing application 146 a or 146 b can also receive a video including multiple frames that include faces and process each frame according to the steps of method 400 .
- The identifying, separating and cropping process can be done using any known technique for identifying information and/or facial features within an image, including deep learning techniques.
- The predetermined dimensions (e.g., a predetermined pixel resolution) can differ between encoders. For example, the portions cropped for encoders 322 and 324 , which have been trained on left and right eyes, respectively, can be identically sized but smaller than the portion cropped for encoder 326 , which can be trained on a mouth.
- Block 430 can include identifying a portion of the input image that corresponds to a left eye and thus correlates with encoder 322 , a portion that corresponds to a right eye and thus correlates with encoder 324 , a portion that corresponds to a mouth and thus correlates with encoder 326 , and the remainder of the face within the image, which correlates with encoder 328 .
- The portion of the image that correlates with encoder 328 can be the entirety of the face within the image minus each of the portions that correlate with encoders 322 , 324 and 326 .
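- Purely for illustration, a sketch of such a landmark-driven split into the four portions (the landmark keys, crop sizes, and the zero-masking used to form the remainder are all assumptions, not the patent's method):

```python
import numpy as np

CROPS = {"left_eye": (64, 64), "right_eye": (64, 64), "mouth": (96, 128)}

def crop_at(img, center, size):
    """Fixed-size crop around a landmark center (bounds checks omitted)."""
    x, y = int(center[0]), int(center[1])
    h, w = size
    return img[y - h // 2: y + h // 2, x - w // 2: x + w // 2]

def split_portions(img, landmarks):
    """Left-eye, right-eye, and mouth crops, plus the remainder of the
    face with those three regions masked out."""
    parts = {k: crop_at(img, landmarks[k], s) for k, s in CROPS.items()}
    rest = img.copy()
    for k, (h, w) in CROPS.items():
        x, y = int(landmarks[k][0]), int(landmarks[k][1])
        rest[y - h // 2: y + h // 2, x - w // 2: x + w // 2] = 0
    return parts["left_eye"], parts["right_eye"], parts["mouth"], rest

# Example with hypothetical landmark centers on a blank 512x512 image:
img = np.zeros((512, 512, 3), dtype=np.uint8)
lms = {"left_eye": (180, 200), "right_eye": (332, 200), "mouth": (256, 380)}
left, right, mouth, rest = split_portions(img, lms)
```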
- The encoded representation generated by each of the encoders 322 - 328 can be in the form of a multi-dimensional vector that includes many dimensions.
- For example, the encoded representation output from each of encoders 322 , 324 , 326 , 328 can be a 128-dimensional vector.
- The ML model can generate the overall image latent space point (e.g., latent space vector 330 shown in FIG. 3 ) as a concatenation of the separate latent space points 332 , 334 , 336 , 338 from each of the separate encoders 322 , 324 , 326 , 328 , respectively (block 450 ).
- Some embodiments disclosed herein provide a system that can solve such problems by enabling the latent space point of the source facial shape generated by encoder 220 to be edited prior to being decoded and transformed by decoder 240 into an image that represents the target character's face.
- FIG. 5 is a simplified block diagram of a deep learning system 500 according to some embodiments.
- System 500 can be similar to either of systems 100 a or 100 b discussed above and can include many of the same elements as those systems.
- like reference numbers in FIG. 5 represent like elements discussed above with respect to FIGS. 1 A and 1 B , and thus descriptions of some such like elements are not repeated herein for the sake of brevity.
- One difference between system 500 and systems 100 a and 100 b is that system 500 includes a machine learning (ML) model 150 c that includes a latent space editor 510 in addition to an encoder 152 and decoder 154 .
- Encoder 220 and decoder 240 shown in FIG. 2 can be representative of encoder 152 and decoder 154 , respectively.
- Latent space editor 510 allows a user of face swapping application 146 to control certain aspects of the output image generated by decoder 154 as described in detail below. For example, in some embodiments, latent space editor 510 allows a user to control the direction in which the eyes of a target facial identity are directed in the output image. As another example, in some embodiments, latent space editor 510 allows a user to control the degree at which the mouth of a target facial identity is opened or closed in the output image.
- The selected images can be a subset of the images and identities from the set of images and identities used to train the neural network. Alternatively, as long as the network is trained to be identity-agnostic, the selected images can be images of any identity, provided the facial features in the selected images exhibit the desired feature or trait.
- For example, a first set of images can be identified in block 620 in which the mouth is opened, and a second set of images can be selected in which the mouth is closed.
- As another example, images can be identified in block 620 in which the eyes are looking in a particular direction. Since left is the opposite of right and up is the opposite of down, in order to generate adjustment vectors that allow the eyes to be altered in the left/right directions as well as the up/down directions, sets of training images can be selected that include eyes looking in each of the four directions.
- Any reasonable number of images can be selected in block 620 to generate the adjustment vectors.
- For example, a set of between five and ten images that exhibit the selected characteristic or trait can be selected.
- In other embodiments, fewer than five images or more than ten images can be selected to train the latent space editor 510 .
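- Following the averaging scheme described earlier (mean latent of the selected images minus a neutral latent space point), one illustrative way to build one adjustment vector per gaze direction from such small sets is sketched below; the set names and the `encode` function are assumptions:

```python
import torch

def direction_vectors(encode, image_sets, neutral_image):
    """One adjustment vector per direction: the mean latent space point
    of that direction's images minus a neutral latent space point."""
    z_neutral = encode(neutral_image)
    return {name: torch.stack([encode(im) for im in imgs]).mean(0) - z_neutral
            for name, imgs in image_sets.items()}

# e.g. image_sets = {"left": [...], "right": [...], "up": [...], "down": [...]},
# each holding roughly five to ten images of eyes looking that way.
```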
- The selected adjustment vector is applied to the latent space point representative of the initial output image to generate an adjusted latent space point (block 780 ).
- For example, the adjustment vector can be added to the latent space point representative of the initial output image to nudge the feature or characteristic being adjusted in the output image in the desired direction.
- A new, adjusted output image can then be generated by the decoder from the adjusted latent space point (block 790 ).
- The new, adjusted output image can then be reviewed and evaluated to determine whether additional adjustments are desired (block 750 ).
- Blocks 770 , 780 and 790 can then be repeated as many times as necessary until the image generated by the decoder is accepted as a final output image (block 760 ).
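- Taken together, blocks 750-790 amount to a simple interactive edit loop. The sketch below is illustrative only; `decode`, `review_fn`, and the vector dictionary are stand-ins for the decoder, the human review step, and the precomputed adjustment vectors:

```python
def refine_output(decode, initial_latent, adjustment_vectors, review_fn):
    """Repeat select/apply/decode until the reviewer accepts the image.
    review_fn returns (vector_name, weight), or None when satisfied."""
    latent = initial_latent.clone()              # a torch latent space point
    image = decode(latent)
    while True:
        choice = review_fn(image)                # block 750: evaluate output
        if choice is None:                       # block 760: accept as final
            return image
        name, weight = choice                    # block 770: pick adjustment
        latent = latent + weight * adjustment_vectors[name]  # block 780
        image = decode(latent)                   # block 790: decode again
```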
- Adjustment vectors can be calculated and applied to the image as a whole in some embodiments.
- In other embodiments, adjustment vectors can be calculated and applied to each individual portion of the overall image generated by the separate encoders. Combining the multiple encoders 322 - 328 of FIG. 3 and latent space editor 510 of FIG. 5 in the same face changing application can allow a higher degree of control over the output image than is possible when the adjustment vectors generated by the latent space editor 510 are applied to the image as a whole.
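- With the multi-encoder layout of FIG. 3 , the combined latent vector is a concatenation of per-portion slices, so an adjustment can be confined to a single portion. The slice order and sizes below are illustrative assumptions (four 128-dimensional slices of a 512-dimensional vector):

```python
SLICES = {"left_eye": slice(0, 128), "right_eye": slice(128, 256),
          "mouth": slice(256, 384), "rest": slice(384, 512)}

def adjust_portion(latent, portion, vector, weight):
    """Apply a weighted adjustment vector to one portion's slice only."""
    adjusted = latent.clone()
    adjusted[..., SLICES[portion]] += weight * vector
    return adjusted
```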
- Latent space editor 510 can include a user interface that enables a user to easily select, on a sliding scale, how much of an adjustment to a particular feature is desired.
- FIG. 8 is a simplified diagram depicting an example of a user interface 800 according to some embodiments, along with three separate output images 810 , 820 and 830 , each of which has been generated from the same input image encoded by encoder 152 , in which the eyes of the subject are in a neutral position, looking neither left nor right, i.e., looking straight ahead.
- User interface 800 can be a slider that enables a user to select, on a sliding scale (e.g., from −1.0 to 1.0), a weight that will be given to a selected adjustment vector and applied by latent space editor 510 to the latent space point that represents the initial output image.
- An adjustment vector has been calculated, as described above, that can nudge the eyes of a face within an output image generated by a decoder, such as decoder 154 , to the right when the vector is added to the latent space point representing the initial output image.
- When the adjustment vector is subtracted from that latent space point, the eyes of the output image can be nudged to the left.
- Image 830 has been decoded by decoder 154 after a positive value of the adjustment vector has been applied to the initial latent space point representative of the initial output image (e.g., output image 810 ), changing the direction of the eyes from looking straight ahead to looking right.
- The amount of change in the rightward direction can be controlled by moving the slider further right or less far right than is depicted.
- In some embodiments, user interface 800 can include additional sliders, such as one for each adjustment vector that has been identified, to enable additional adjustments to the output image.
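- In code, the slider semantics reduce to weighting each selected adjustment vector and summing the results onto the initial latent space point; the sketch below is a hedged illustration with hypothetical slider names:

```python
def apply_sliders(initial_latent, sliders, adjustment_vectors):
    """Each slider value in [-1.0, 1.0] weights one adjustment vector.
    Positive weights nudge toward the trait (e.g., eyes right); negative
    weights nudge the opposite way (eyes left), as in FIG. 8."""
    adjusted = initial_latent.clone()
    for name, weight in sliders.items():   # e.g. {"eyes_left_right": 0.6}
        adjusted = adjusted + weight * adjustment_vectors[name]
    return adjusted
```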
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Claims (22)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/707,782 US12277738B2 (en) | 2022-03-29 | 2022-03-29 | Method and system for latent-space facial feature editing in deep learning based face swapping |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/707,782 US12277738B2 (en) | 2022-03-29 | 2022-03-29 | Method and system for latent-space facial feature editing in deep learning based face swapping |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230316587A1 (en) | 2023-10-05 |
| US12277738B2 (en) | 2025-04-15 |
Family
ID=88193205
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/707,782 Active 2042-11-05 US12277738B2 (en) | 2022-03-29 | 2022-03-29 | Method and system for latent-space facial feature editing in deep learning based face swapping |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12277738B2 (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11875600B2 (en) * | 2021-03-31 | 2024-01-16 | Snap Inc. | Facial synthesis in augmented reality content for online communities |
| US12198225B2 (en) * | 2021-10-01 | 2025-01-14 | Disney Enterprises, Inc. | Transformer-based shape models |
| JP7479507B2 (en) * | 2022-03-30 | 2024-05-08 | Tencent Technology (Shenzhen) Company Limited | Image processing method and device, computer device, and computer program |
| US20250245886A1 (en) * | 2024-01-31 | 2025-07-31 | Google Llc | Optimization of overall editing vector to achieve target expression photo editing effect |
| KR102873262B1 (en) * | 2024-11-18 | 2025-10-20 | 주식회사 오핌디지털 | Method and apparatus for swapping face of character in video |
- 2022-03-29: US application US17/707,782 filed; granted as US12277738B2 (status: Active)
Patent Citations (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170076142A1 (en) | 2015-09-15 | 2017-03-16 | Google Inc. | Feature detection and masking in images based on color distributions |
| US20190251707A1 (en) | 2018-02-15 | 2019-08-15 | Adobe Inc. | Saliency prediction for a mobile user interface |
| US20210295483A1 (en) | 2019-02-26 | 2021-09-23 | Tencent Technology (Shenzhen) Company Limited | Image fusion method, model training method, and related apparatuses |
| US10902571B2 (en) | 2019-05-20 | 2021-01-26 | Disney Enterprises, Inc. | Automated image synthesis using a comb neural network architecture |
| JP2021000224A (en) | 2019-06-20 | 2021-01-07 | The University of Tokyo | Information processing device, information processing method, and program |
| KR20210033781A (en) | 2019-09-19 | 2021-03-29 | KT Corp. | System and method for analyzing face |
| US20210142440A1 (en) | 2019-11-07 | 2021-05-13 | Hyperconnect, Inc. | Image conversion apparatus and method, and computer-readable recording medium |
| US20210192684A1 (en) | 2019-12-24 | 2021-06-24 | Nvidia Corporation | Panorama generation using one or more neural networks |
| US20210327038A1 (en) | 2020-04-16 | 2021-10-21 | Disney Enterprises, Inc. | Tunable models for changing faces in images |
| US20220036534A1 (en) | 2020-07-31 | 2022-02-03 | Adobe Inc. | Facial reconstruction network |
| US11222466B1 (en) * | 2020-09-30 | 2022-01-11 | Disney Enterprises, Inc. | Three-dimensional geometry-based models for changing facial identities in video frames and images |
| CN112766160A (en) | 2021-01-20 | 2021-05-07 | 西安电子科技大学 | Face replacement method based on multi-stage attribute encoder and attention mechanism |
| US20220374649A1 (en) | 2021-05-20 | 2022-11-24 | Disney Enterprises, Inc. | Face swapping with neural network-based geometry refining |
| US20220391611A1 (en) * | 2021-06-08 | 2022-12-08 | Adobe Inc. | Non-linear latent to latent model for multi-attribute face editing |
| CN113420703A (en) | 2021-07-03 | 2021-09-21 | 西北工业大学 | Dynamic facial expression recognition method based on multi-scale feature extraction and multi-attention mechanism modeling |
| US11308657B1 (en) | 2021-08-11 | 2022-04-19 | Neon Evolution Inc. | Methods and systems for image processing using a learning engine |
| US20230049729A1 (en) | 2021-08-11 | 2023-02-16 | Neon Evolution Inc. | Methods and systems for image processing using a learning engine |
| US20230086807A1 (en) | 2021-09-17 | 2023-03-23 | Adobe Inc. | Segmented differentiable optimization with multiple generators |
| CN113592982A (en) | 2021-09-29 | 2021-11-02 | 北京奇艺世纪科技有限公司 | Identity migration model construction method and device, electronic equipment and readable storage medium |
| US20230162407A1 (en) * | 2021-11-19 | 2023-05-25 | Adobe Inc. | High resolution conditional face generation |
Non-Patent Citations (4)
| Title |
|---|
| U.S. Appl. No. 17/707,785, "Final Office Action", Nov. 29, 2024, 27 pages. |
| U.S. Appl. No. 17/707,785, "Non-Final Office Action", Jun. 13, 2024, 24 pages. |
| Xu, et al., "Face Shape Gene: A Disentangled Shape Representation for Flexible Face Image Editing", Computer Science, Computer Vision and Pattern Recognition Available online at : https://arxiv.org/abs/1905.01920, May 6, 2019, 10 pages. |
| Zeng, et al., "Facial Expression Transfer from Video via Deep Learning", SCA '21: The ACM SIGGRAPH / Eurographics Symposium on Computer Animation, Sep. 6-9, 2021, 2 pages. |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230281979A1 (en) * | 2020-08-03 | 2023-09-07 | Xuhui Jia | Systems and Methods for Training Machine-Learned Visual Attention Models |
| US12406487B2 (en) * | 2020-08-03 | 2025-09-02 | Google Llc | Systems and methods for training machine-learned visual attention models |
| US20250029208A1 (en) * | 2023-07-17 | 2025-01-23 | Metaphysic.AI | Modifying source data to generate hyperreal synthetic content |
| US12541818B2 (en) * | 2023-07-17 | 2026-02-03 | Metaphysic Limited | Modifying source data to generate hyperreal synthetic content |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230316587A1 (en) | 2023-10-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12452385B2 (en) | Method and system for deep learning based face swapping with multiple encoders | |
| US12277738B2 (en) | Method and system for latent-space facial feature editing in deep learning based face swapping | |
| US12475537B2 (en) | Method and system for image processing | |
| US20240169701A1 (en) | Affordance-based reposing of an object in a scene | |
| Logacheva et al. | Deeplandscape: Adversarial modeling of landscape videos | |
| CN118175324A (en) | A multi-dimensional generative framework for video generation | |
| US20250209707A1 (en) | Techniques for generating dubbed media content items | |
| WO2024164030A2 (en) | Photorealistic content generation from animated content by neural radiance field diffusion guided by vision-language models | |
| Gowda et al. | From pixels to portraits: A comprehensive survey of talking head generation techniques and applications | |
| WO2024103190A1 (en) | Method and system for image processing across multiple frames using machine learning | |
| Yang et al. | Semantic layout-guided diffusion model for high-fidelity image synthesis in ‘The Thousand Li of Rivers and Mountains’ | |
| Huang et al. | 360° Stereo Image Composition With Depth Adaption | |
| US20250209759A1 (en) | Techniques for generating dubbed media content items | |
| US12541818B2 (en) | Modifying source data to generate hyperreal synthetic content | |
| US20250218109A1 (en) | Rendering Videos with Novel Views from Near-Duplicate Photos | |
| Teng et al. | Blind face restoration via multi-prior collaboration and adaptive feature fusion | |
| Ma et al. | Decoupled two-stage talking head generation via Gaussian-landmark-based neural radiance fields | |
| Bin et al. | FSA-Net: A Cost-efficient Face Swapping Attention Network with Occlusion-Aware Normalization. | |
| US20250209708A1 (en) | Techniques for generating dubbed media content items | |
| Kicanaoglu et al. | Unsupervised Facial Performance Editing via Vector-Quantized StyleGAN Representations | |
| Kurisaki et al. | Animating cloud images with flow style transfer | |
| Park | Machine Learning for Deep Image Synthesis | |
| Dornier | Transfer learning for facial analysis with limited and inconsistent annotations | |
| Huang et al. | 360° Stereo Image Composition with Depth Adaption | |
| Li | Semantically Aware Style-Controlled Animation Line Art Colorization Using Conditional GANs with GCN and Attention Mechanisms |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STCV | Information on status: appeal procedure |
Free format text: NOTICE OF APPEAL FILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| AS | Assignment |
Owner name: THE WALT DISNEY COMPANY (SWITZERLAND) GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NARUNIEC, JACEK KRZYSZTOF;WEBER, ROMANN MATTHEW;SCHROERS, CHRISTOPHER RICHARD;SIGNING DATES FROM 20220324 TO 20220325;REEL/FRAME:070449/0554 Owner name: LUCASFILM ENTERTAINMENT COMPANY LTD. LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GHEBREMUSSE, SIRAK;GRABLI, STEPHANE;REEL/FRAME:070449/0551 Effective date: 20220325 |
|
| AS | Assignment |
Owner name: DISNEY ENTERPRISES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THE WALT DISNEY COMPANY (SWITZERLAND) GMBH;REEL/FRAME:070461/0155 Effective date: 20250310 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |