WO2018102880A1 - Systems and methods for replacing faces in videos - Google Patents

Systems and methods for replacing faces in videos

Info

Publication number
WO2018102880A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
facial image
video
facial
target
Prior art date
2016-12-09
Application number
PCT/AU2017/051353
Other languages
French (fr)
Inventor
Marcus George FRANGOS
Original Assignee
Frangos Marcus George
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-09
Filing date
2017-12-08
Publication date
2018-06-14
Priority claimed from AU2016905100A0
Application filed by Frangos Marcus George
Publication of WO2018102880A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 - Matching configurations of points or features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06V 40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method of replacing faces in videos is disclosed. The method comprises providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model and receiving, from a user device, a user's facial image. The method further comprises automatically identifying facial landmarks in the user's facial image, automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture and automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively. The method also comprises displaying the composited video on the user device. A system for replacing faces in videos is also disclosed.

Description

SYSTEMS AND METHODS FOR REPLACING FACES IN VIDEOS
Field
[0001] The present invention relates to systems and methods for replacing facial images in videos.
Background
[0002] With increasingly more content being shared to social media, the ability to personalise and modify content has become an important differentiating factor.
[0003] For example, several popular existing mobile applications allow the user to personalise a photograph for entertainment purposes, by replacing a face in the photograph with the user's portrait.
[0004] It is much more challenging to implement a face replacement in a video, due to the changing position and angle of the target face throughout the video. Conventionally, such video editing would require substantial skill, time and processing power for frame-by-frame processing, which would inhibit implementation on a mobile device such as a smartphone.
[0005] In this context, there is a need for an improved system and method for replacing faces in videos.
Summary
[0006] According to the present invention, there is provided a method comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image;
automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and displaying the composited video on the user device.
[0007] Pre-processing of the video may comprise:
selecting a frame from the video;
detecting the target face in the selected frame;
mapping facial landmarks of the target face to a 3D target model;
automatically tracking the target face and fitting the 3D target model to the target face in remaining frames of the video.
[0008] The target face in the selected frame may be automatically detected by facial image recognition.
[0009] The steps of identifying the facial landmarks of the user's facial image, mapping the facial landmarks to the 3D subject model and compositing the user's facial image texture into each video frame may be performed by a processor of the user device.
[0010] The user's facial image may be obtained via a camera of the user device.
[0011] The method may further comprise displaying a head positioning guide on the user device while capturing the user's facial image.
[0012] The method may further comprise receiving front and profile facial images of the user and combining the front and profile facial images into a single facial image texture.
[0013] The user device may be a mobile device comprising a tablet or a smartphone.
[0014] The pre-processed video may comprise two or more different target faces appearing in multiple frames of the video, each target face being mapped to a 3D target model, wherein the user selects, via the user device, one of the target faces for compositing with the user's facial image.
[0015] The method may further comprise processing each composited video frame by texture blending, alpha blending, pixel intensity blending, luminescence blending, hue manipulation, applying blur filters, or a combination thereof.
[0016] In another aspect of the present invention, there is provided a system comprising:
a processor; and a non-transitory computer-readable medium coupled to the processor and having instructions stored thereon, which, when executed by the processor, cause the processor to perform operations comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image; automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and
displaying the composited video on the user device.
[0017] The pre-processed video may be stored on a server.
[0018] In another aspect of the present invention, there is provided a non-transitory computer-readable medium having instructions stored thereon, which, when executed by a processor, cause the processor to perform operations comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image;
automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and
displaying the composited video on the user device.
Brief Description of Drawings
[0019] Embodiments of the invention will now be described by way of example only with reference to the accompanying drawings, in which:
Figure 1 is a flowchart of a method for replacing faces in videos according to an embodiment;
Figure 2 is a block diagram illustrating the system and method for replacing faces in videos according to an embodiment;
Figures 3a to 3i are screenshots of the method and system implemented on a smartphone;
Figures 4a to 4d are screenshots illustrating the video processing method in more detail; and
Figure 5 illustrates a composited frame, in which the target face has been replaced with a user image.
Description of Embodiments
[0020] Figure 1 illustrates a method 10 for replacing faces in videos according to one embodiment. The method may be performed by one or more specially programmed computing devices. The method comprises three main components: video processing 30, user image processing (to a 3D model) 40, and video compositing 50. The method may involve receiving input from the user and displaying results to the user via a user device 2. The user device 2 may generally include a memory for storing instructions and data, and a processor for executing stored instructions. The memory may include both read-only and writable memory. For example, the user device 2 may be a mobile device such as a smartphone or tablet coupled to the one or more specially programmed computing devices through a data communication network, eg a local area network (LAN) or wide area network (WAN), eg the Internet, or a combination of networks, any of which may include wireless links.
[0021] The method 10 starts by providing a video that has been processed to map a target face 4 (eg an actor's or actress' face) to a 3D target model 8, in each frame of the video where the target face 4 appears.
[0022] Next, a user's facial image 11 is received via the user device 2. In some embodiments, the image 11 is obtained via a camera of the user device 2, eg an integrated smartphone camera, or a web camera connected to the user's device, etc. In other embodiments, the user may upload an image 11 or may select an image that has been previously uploaded and saved. In some embodiments, at least two user facial images are obtained, each from a different angle (preferably orthogonal to each other), such as front and profile views. Images from multiple angles allow for the generation of a combined facial image texture, which retains high image quality and fidelity when viewed from any angle.
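By way of illustration only (the specification does not prescribe a blending scheme; the weight ramp and helper below are assumptions), two facial images that have already been warped into a common texture layout could be combined along the following lines:

    import numpy as np

    def combine_textures(front_uv, profile_uv):
        """Blend front and profile textures of equal size, already warped into a
        common UV layout. Hypothetical helper: the front view is weighted most
        heavily near the centre of the texture, the profile view toward the edge."""
        h, w = front_uv.shape[:2]
        x = np.linspace(-1.0, 1.0, w, dtype=np.float32)
        front_weight = np.tile(np.clip(1.0 - np.abs(x), 0.0, 1.0), (h, 1))[..., None]
        blended = (front_weight * front_uv.astype(np.float32)
                   + (1.0 - front_weight) * profile_uv.astype(np.float32))
        return blended.astype(np.uint8)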
[0023] The method then automatically identifies a plurality of facial landmarks (not shown) in the user's facial image 11. These facial landmarks preferably correspond to naturally occurring facial features that are common to all or most people, for example corners of the eyes and eyebrows, tip of nose, corners of mouth, etc. In some embodiments, the facial landmarks in the user's facial image 11 may be automatically identified via face-fitting techniques. For example, cascade classifiers or stochastic methods, and an Active Shape Model (ASM) or Active Appearance Model (AAM) may be used to locate the facial features.
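For example, a minimal landmark-detection sketch (dlib and its publicly distributed 68-point shape predictor are not named in the specification and are assumed here purely for illustration) might be:

    import cv2
    import dlib

    # Assumed model file: dlib's publicly distributed 68-landmark shape predictor.
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_landmarks(image_bgr):
        """Return (x, y) landmark coordinates for the first face found, or []."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)  # upsample once to help with small faces
        if not faces:
            return []
        shape = predictor(gray, faces[0])
        return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]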
[0024] In some embodiments, the method further involves displaying a head positioning guide 18 on the screen of the user device while the user's facial image is being captured by the camera of the user device. The positioning guide approximately aligns the facial landmarks on the user's face at specific locations and orientations, to assist with the step of detecting the user's facial landmarks.
[0025] Next, the method automatically maps the identified facial landmarks to a 3D subject mesh model, via mesh generation, face-fitting techniques, etc, to generate a user's facial image texture. In the embodiment where face-fitting is used for automatic facial feature detection, the fitted ASM or AAM may provide the starting point for fitting the image to a suitable face model.
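One way this step could be realised (illustrative only; the specification does not prescribe a particular solver or canonical mesh, and the function below is a hypothetical sketch) is to estimate the pose that aligns a canonical 3D face model with the detected 2D landmarks and then project the model back into the image to sample the facial image texture:

    import numpy as np
    import cv2

    def fit_subject_model(landmarks_2d, model_points_3d, image):
        """Estimate the pose of a canonical 3D face model from detected 2D landmarks.

        landmarks_2d    : Nx2 array of detected image landmarks
        model_points_3d : Nx3 array of the corresponding points on the canonical model
        Both arrays are assumed to be in corresponding order.
        """
        h, w = image.shape[:2]
        focal = float(w)  # crude focal-length assumption
        camera_matrix = np.array([[focal, 0, w / 2.0],
                                  [0, focal, h / 2.0],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros((4, 1))  # assume negligible lens distortion

        ok, rvec, tvec = cv2.solvePnP(
            model_points_3d.astype(np.float64),
            landmarks_2d.astype(np.float64),
            camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        if not ok:
            raise RuntimeError("pose estimation failed")

        # Project the model points back into the image; with a full vertex set, the
        # image colours at the projected positions form the user's facial image texture.
        projected, _ = cv2.projectPoints(model_points_3d.astype(np.float64),
                                         rvec, tvec, camera_matrix, dist_coeffs)
        return rvec, tvec, projected.reshape(-1, 2)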
[0026] Next, the method automatically composites the user's facial image texture into each frame of the video by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model 8 and pixels of the target face 4 respectively. In some embodiments, the user's facial image texture is combined from multiple facial images of the user (eg front and profile views), such that the user's image texture is accurately composited into the video, regardless of the angle of the target face. Subsequent steps of texture blending, eg alpha, pixel intensity, luminescence blending, hue manipulation, applying blur filters, etc, may be implemented.
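One common realisation of this per-frame transfer (an assumption, not the only possibility) is to warp the user's texture triangle by triangle onto the target mesh coordinates in the frame and then blend the seam:

    import numpy as np
    import cv2

    def warp_triangle(src_img, dst_img, tri_src, tri_dst):
        """Affine-warp one texture triangle from src_img into dst_img (in place).

        tri_src / tri_dst: three (x, y) points in the source texture and in the
        target frame, taken from corresponding cells of the subject and target meshes.
        """
        r_src = cv2.boundingRect(np.float32([tri_src]))
        r_dst = cv2.boundingRect(np.float32([tri_dst]))
        src_local = [(p[0] - r_src[0], p[1] - r_src[1]) for p in tri_src]
        dst_local = [(p[0] - r_dst[0], p[1] - r_dst[1]) for p in tri_dst]

        patch = src_img[r_src[1]:r_src[1] + r_src[3], r_src[0]:r_src[0] + r_src[2]]
        m = cv2.getAffineTransform(np.float32(src_local), np.float32(dst_local))
        warped = cv2.warpAffine(patch, m, (r_dst[2], r_dst[3]),
                                flags=cv2.INTER_LINEAR,
                                borderMode=cv2.BORDER_REFLECT_101)

        mask = np.zeros((r_dst[3], r_dst[2], 3), dtype=np.float32)
        cv2.fillConvexPoly(mask, np.int32(dst_local), (1.0, 1.0, 1.0), cv2.LINE_AA, 0)
        roi = dst_img[r_dst[1]:r_dst[1] + r_dst[3], r_dst[0]:r_dst[0] + r_dst[2]]
        roi[:] = roi * (1.0 - mask) + warped * mask

    # The blending mentioned above (alpha blending, seamless cloning, blur filters,
    # etc.) can then soften the result, eg with
    # cv2.seamlessClone(warped_face, frame, face_mask, centre, cv2.NORMAL_CLONE).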
[0027] The method ends by displaying the composited video on the user device 2. The user may also share the composited video on social media platforms.
[0028] In preferred embodiments, the user image processing 40 and video compositing 50 components (ie the steps of identifying the facial landmarks of the user's face, mapping the facial landmarks of the user's face to the 3D subject model and compositing the user's facial image into each video frame) are all performed by the processor of the user device 2. In other embodiments, the user's facial image 11 is uploaded to a server, and these automatic processing steps may instead be performed by an external computing device, eg via cloud computing.
[0029] Figure 2 illustrates a system 100 for replacing faces in videos according to one embodiment, and the associated method steps that may be implemented by and/or on one or more specially programmed computing devices of the system 100. The system 100 may comprise one or more servers 20. A user may interact with the system 100 through the user device 2.
[0030] In some embodiments, the video processing component 30 is performed externally of the user device 2. That is, a provider/curator 22 may select, process and upload videos onto server 20. The user may then access the library of pre-processed videos via a mobile application running on the user device 2, as illustrated in Figure 3a.
[0031] Figures 4a to 4c illustrate exemplary steps of the video processing component 30 in more detail. First, a frame 14 of the video displaying the target face 4 is selected, and the target face 4 is detected. In some cases, the selection of the frame 14 may be automated, for example by facial image recognition of the or a target face 4. This may be implemented as face detection tool 32 of system 100.
[0032] Next, facial landmarks 6 of the target face 4 are identified. These facial landmarks are preferably the same landmarks detected in the user's facial image during user image processing 40. In some embodiments, this step is performed manually by the curator 22. In other embodiments, the facial landmarks may be automatically detected, for example via face-fitting techniques discussed above. However, it will be appreciated that because the target face 4 in the selected frame 14 could be in any orientation, it may be more challenging to apply face-fitting techniques which typically rely on the facial image being in a known orientation, eg a front or profile view. Accordingly, after automatic detection, the curator 22 may review the frame to ensure that the facial landmarks have been correctly identified. If not, as shown in Figures 4b to 4d, the curator may reposition the facial landmarks appropriately. The facial landmarks 6 are fitted to a 3D target model 8, via mesh generation, face-fitting techniques, etc, as described above. The 3D target model 8 has the same parameters as the 3D subject model, eg the same number of nodes, cell type, node number associated with a facial landmark, etc.
[0033] Next, the method automatically tracks the target face 4 and fits the 3D target model to the target face in remaining frames of the video. Automatic tracking may be performed by using the fitted model in the previous frame as the initialising conditions for the current frame, since there is typically minimal change and movement of the target face from frame to frame. After automatic tracking, the curator 22 may review the frames to ensure that the target face 4 has been fitted correctly across the entire video. Accordingly, the time-varying or frame-varying mesh coordinates of the 3D target model may be obtained.
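As one illustrative realisation of this initialisation (the specification only requires that the previous frame's fitted model seeds the current frame; the optical-flow step below is an assumption), the fitted landmark positions could be propagated frame to frame before re-fitting the model:

    import numpy as np
    import cv2

    def track_landmarks(prev_gray, curr_gray, prev_landmarks):
        """Propagate landmark positions to the next frame with Lucas-Kanade optical flow."""
        prev_pts = np.float32(prev_landmarks).reshape(-1, 1, 2)
        curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
        # Where tracking failed, fall back to the previous position; the curator
        # can correct any residual drift manually, as described above.
        curr_pts = np.where(status.reshape(-1, 1, 1) == 1, curr_pts, prev_pts)
        return curr_pts.reshape(-1, 2)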
[0034] The processed video and the associated 3D target model coordinates for each frame may then be uploaded to server 20 and may subsequently be accessed by the user device 2 for video compositing 50, as described above. Figure 5 is an exemplary composited frame 16 illustrating results from the face replacement method, in which the target face 4 shown in Figure 4 has been replaced with a user's image.
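The per-frame data accompanying a processed video could take a form along the following lines (a hypothetical layout for illustration; the specification does not define a storage format, and the field names are assumptions):

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class FrameModelData:
        frame_index: int
        # Fitted 3D target model vertices (x, y, z) for this frame, in the same
        # node order and count as the 3D subject model so that coordinates and
        # texture pixels can be transferred one-to-one during compositing.
        vertices: List[Tuple[float, float, float]]

    @dataclass
    class PreprocessedVideo:
        video_url: str                # location of the uploaded video on the server
        target_face_id: str           # which target face these coordinates belong to
        frames: List[FrameModelData]  # one entry per frame in which the face appears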
[0035] In some embodiments, the video may comprise two or more different target faces 4a, 4b appearing in multiple frames of the video. The video is processed to map each target face to separate 3D target models. The user may then select, via the user device 2, one of the target faces for compositing with the user's facial image 11, as illustrated in Figures 3b and 3c.
[0036] Figures 3a to 3i are example user interfaces that are displayed on the user device 2 to enable the user to replace a target face in a pre-processed video with the user's facial image. The resulting composited video may be stored on a video content platform for sharing with other users.
[0037] Embodiments of the present invention provide systems and methods that are useful for implementing a face replacement in a video.
[0038] For the purpose of this specification, the word "comprising" means "including but not limited to", and the word "comprises" has a corresponding meaning.
[0039] The above embodiments have been described by way of example only and modifications are possible within the scope of the claims that follow.

Claims

1. A method comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image;
automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and
displaying the composited video on the user device.
2. The method of claim 1, wherein pre-processing of the video comprises:
selecting a frame from the video;
detecting the target face in the selected frame;
mapping facial landmarks of the target face to a 3D target model;
automatically tracking the target face and fitting the 3D target model to the target face in remaining frames of the video.
3. The method of claim 2, wherein the target face in the selected frame is automatically detected by facial image recognition.
4. The method of any one of the preceding claims, wherein the steps of identifying the facial landmarks of the user's facial image, mapping the facial landmarks to the 3D subject model and compositing the user's facial image texture into each video frame are performed by a processor of the user device.
5. The method of any one of the preceding claims, wherein the user's facial image is obtained via a camera of the user device.
6. The method of claim 5, further comprising displaying a head positioning guide on the user device while capturing the user's facial image.
7. The method of any one of the preceding claims, comprising receiving front and profile facial images of the user.
8. The method of any one of the preceding claims, wherein the user device comprises a computer, a tablet or a smartphone.
9. The method of any one of the preceding claims, wherein the pre-processed video comprises two or more different target faces appearing in multiple frames of the video, each target face being mapped to a 3D target model, and
wherein the user selects, via the user device, one of the target faces for compositing with the user's facial image.
10. The method of any one of the preceding claims, further comprising processing each composited video frame by texture blending, alpha blending, pixel intensity blending, luminescence blending, hue manipulation, applying blur filters, or a combination thereof.
11. A system, comprising:
a processor; and
a non-transitory computer-readable medium coupled to the processor and having instructions stored thereon, which, when executed by the processor, cause the processor to perform operations comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image; automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and
displaying the composited video on the user device.
12. The system of claim 11, wherein the pre-processed video is stored on a server.
13. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by a processor, cause the processor to perform operations comprising:
providing a pre-processed video wherein a target face appearing in multiple frames of the video has been mapped to a 3D target model;
receiving, from a user device, a user's facial image;
automatically identifying facial landmarks in the user's facial image;
automatically mapping the facial landmarks to a 3D subject model to generate a user's facial image texture;
automatically compositing the user's facial image texture into each video frame by transforming coordinates of the 3D subject model and corresponding pixels of the user's facial image texture to coordinates of the 3D target model and pixels of the target face respectively; and
displaying the composited video on the user device.
PCT/AU2017/051353 2016-12-09 2017-12-08 Systems and methods for replacing faces in videos WO2018102880A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2016905100A AU2016905100A0 (en) 2016-12-09 Systems and methods for replacing faces in videos
AU2016905100 2016-12-09

Publications (1)

Publication Number Publication Date
WO2018102880A1 true WO2018102880A1 (en) 2018-06-14

Family

ID=62490568

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2017/051353 WO2018102880A1 (en) 2016-12-09 2017-12-08 Systems and methods for replacing faces in videos

Country Status (1)

Country Link
WO (1) WO2018102880A1 (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHENG, Y. ET AL.: "3D-model-based face replacement in video", CONFERENCE PROCEEDINGS ARTICLE, January 2009 (2009-01-01), XP058201254, Retrieved from the Internet <URL:https://www.researchgate.net/publication/220720785> [retrieved on 20150107] *
DALE, K. ET AL.: "Video Face Replacement", YOUTUBE, 12 December 2011 (2011-12-12), Hong Kong, China. Proceedings of the 2011 SIGGRAPH Asia Conference, XP054978747, Retrieved from the Internet <URL:https://www.youtube.com/watch?v=rTvdvNNiCVI> *
GARRIDO, P. ET AL.: "Automatic Face Reenactment", YOUTUBE, 4 May 2016 (2016-05-04), The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4217 - 4224, XP054978748, Retrieved from the Internet <URL:https://www.youtube.com/watch?v=rGiFi4Kqk3s> *
MIN, F. ET AL.: "Automatic Face Replacement in Video Based on 2D Morphable Model", 2010 20TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 23 August 2010 (2010-08-23), Istanbul, Turkey, pages 2250 - 2253, XP031771081 *
NISWAR, A. ET AL.: "Face replacement in video from a single image", PROCEEDING SIGGRAPH ASIA 2012 POSTERS, 28 November 2012 (2012-11-28), Singapore, pages 1, XP058010100 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210314498A1 (en) * 2019-01-18 2021-10-07 Snap Inc. Personalized videos featuring multiple persons
KR102605077B1 (en) 2019-01-18 2023-11-23 스냅 아이엔씨 Methods and systems for compositing realistic head rotations and facial animation on mobile devices
WO2020150692A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for template-based generation of personalized videos
WO2020150686A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for face reenactment
WO2020150690A3 (en) * 2019-01-18 2020-09-10 Snap Inc. Systems and methods for providing personalized videos
US11089238B2 (en) 2019-01-18 2021-08-10 Snap Inc. Personalized videos featuring multiple persons
CN113287118A (en) * 2019-01-18 2021-08-20 斯纳普公司 System and method for face reproduction
CN113302694A (en) * 2019-01-18 2021-08-24 斯纳普公司 System and method for generating personalized video based on template
CN113330453A (en) * 2019-01-18 2021-08-31 斯纳普公司 System and method for providing personalized video for multiple persons
KR20210117304A (en) * 2019-01-18 2021-09-28 스냅 아이엔씨 Methods and systems for realistic head rotations and facial animation compositing on a mobile device
KR20210118428A (en) * 2019-01-18 2021-09-30 스냅 아이엔씨 Systems and methods for providing personalized video
KR20210119440A (en) * 2019-01-18 2021-10-05 스냅 아이엔씨 Systems and methods for creating personalized videos with custom text messages
KR102658104B1 (en) * 2019-01-18 2024-04-17 스냅 아이엔씨 Template-based personalized video creation system and method
WO2020150691A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for providing personalized videos featuring multiple persons
KR20210119439A (en) * 2019-01-18 2021-10-05 스냅 아이엔씨 Template-based personalized video creation system and method
KR102616013B1 (en) 2019-01-18 2023-12-21 스냅 아이엔씨 System and method for creating personalized video with customized text message
KR102546016B1 (en) 2019-01-18 2023-06-22 스냅 아이엔씨 Systems and methods for providing personalized video
US11288880B2 (en) 2019-01-18 2022-03-29 Snap Inc. Template-based generation of personalized videos
US20230049489A1 (en) * 2019-01-18 2023-02-16 Snap Inc. Personalized videos featuring multiple persons
US11394888B2 (en) 2019-01-18 2022-07-19 Snap Inc. Personalized videos
US11558561B2 (en) * 2019-01-18 2023-01-17 Snap Inc. Personalized videos featuring multiple persons
CN111291218A (en) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 Video fusion method and device, electronic equipment and readable storage medium
CN111291218B (en) * 2020-01-20 2023-09-08 北京百度网讯科技有限公司 Video fusion method, device, electronic equipment and readable storage medium
US11477366B2 (en) 2020-03-31 2022-10-18 Snap Inc. Selfie setup and stock videos creation
US11263260B2 (en) 2020-03-31 2022-03-01 Snap Inc. Searching and ranking modifiable videos in multimedia messaging application
WO2021202042A1 (en) * 2020-03-31 2021-10-07 Snap Inc. Searching and ranking modifiable videos in multimedia messaging application
WO2021202039A1 (en) * 2020-03-31 2021-10-07 Snap Inc. Selfie setup and stock videos creation
CN114666622A (en) * 2022-04-02 2022-06-24 北京字跳网络技术有限公司 Special effect video determination method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2018102880A1 (en) Systems and methods for replacing faces in videos
CN109922355B (en) Live virtual image broadcasting method, live virtual image broadcasting device and electronic equipment
CN108154518B (en) Image processing method and device, storage medium and electronic equipment
EP3457683B1 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
US10573018B2 (en) Three dimensional scene reconstruction based on contextual analysis
US9699380B2 (en) Fusion of panoramic background images using color and depth data
US11176355B2 (en) Facial image processing method and apparatus, electronic device and computer readable storage medium
US10762649B2 (en) Methods and systems for providing selective disparity refinement
US20140085398A1 (en) Real-time automatic scene relighting in video conference sessions
US9052740B2 (en) Adaptive data path for computer-vision applications
DE202017105899U1 (en) Camera adjustment adjustment based on predicted environmental factors and tracking systems using them
US10580143B2 (en) High-fidelity 3D reconstruction using facial features lookup and skeletal poses in voxel models
EP3038056A1 (en) Method and system for processing video content
CN113973190A (en) Video virtual background image processing method and device and computer equipment
CN105701762B (en) Picture processing method and electronic equipment
WO2019084712A1 (en) Image processing method and apparatus, and terminal
US9524540B2 (en) Techniques for automatically correcting groups of images
CN111079535B (en) Human skeleton action recognition method and device and terminal
US9171357B2 (en) Method, apparatus and computer-readable recording medium for refocusing photographed image
CN111080546A (en) Picture processing method and device
KR20160062665A (en) Apparatus and method for analyzing motion
US10282633B2 (en) Cross-asset media analysis and processing
JP2017021430A (en) Panoramic video data processing device, processing method, and program
CN110298229B (en) Video image processing method and device
KR20150011714A (en) Device for determining orientation of picture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17877612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17877612

Country of ref document: EP

Kind code of ref document: A1