WO2018018076A1 - Creating videos with facial expressions - Google Patents

Creating videos with facial expressions

Info

Publication number
WO2018018076A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial
user
character
facial feature
feature
Prior art date
Application number
PCT/AU2017/050763
Other languages
English (en)
Inventor
Ron FORTUNE
Bill Bailey
Original Assignee
BGR Technologies Pty Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2016902919A external-priority patent/AU2016902919A0/en
Application filed by BGR Technologies Pty Limited filed Critical BGR Technologies Pty Limited
Priority to US16/320,966 priority Critical patent/US11003898B2/en
Publication of WO2018018076A1 publication Critical patent/WO2018018076A1/fr
Priority to US17/187,604 priority patent/US20210264139A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/175Static expression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/24Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]

Definitions

  • the present disclosure generally relates to creating videos.
  • the present disclosure includes computer-implemented methods, software, and computer systems for creating videos with facial expressions to reflect styles of individual persons.
  • a video document is often used to present content in relation to a "story".
  • the content typically consists of visual content, audio content, or both, for example, the video documents available at Youtube.
  • the content presented in the video document often involves at least one character and a storyline associated with the character.
  • the storyline is used to represent how the story develops with respect to the character over time, including what the character does and the interactions of the character with other characters in the story.
  • a method for creating a video on a mobile device that comprises a camera, the method comprising: creating a graphic user interface on the mobile device to capture by the camera multiple photographic facial images of a user for respective multiple facial expressions of a character in the video;
  • a method for creating a video including a character on a mobile device that comprises a camera, the method comprising:
  • the first frame of the video is created by modifying the reference facial image of the character with reference to the corresponding user facial feature. Therefore, the original character's visual style is not replaced by a given user's visual style. Instead, the facial expression of the user is used to influence the facial expression of the character.
  • This method enables replacement of certain visual style elements with a given user's own style elements. Although this method is described with reference to facial expressions, the method is also applicable to skin tone, eye colour, etc.
  • the method may further comprise:
  • the user facial feature may comprise a set of control points.
  • the graphic user interface may comprise the reference facial image of the character.
  • the graphic user interface may comprise a live view of each of the multiple photographic facial images.
  • the live view may be positioned next to the camera.
  • the live view may be positioned next to the reference facial image of the character.
  • the method may further comprise superimposing the live view on the reference facial image of the character.
  • the method may further comprise selecting the character from a plurality of characters in the video.
  • the method may further comprise recording audio data associated with the user facial feature.
  • a computer software product including machine-readable instructions that, when executed by a processor of a mobile device, cause the processor to perform any one of the methods described above.
  • a mobile device for creating a video including a character comprising:
  • a processor configured to:
  • (c) store, in association with a respective facial expression identifier, the user facial feature from each of the multiple photographic facial images;
  • this method determines the estimated reference facial feature of the character and the estimated user facial feature of the user, and determines the transformation based on the estimated reference facial feature of the character and the estimated user facial feature of the user. This dramatically reduces the time required to create the output frame.
  • Determining the estimated reference facial feature of the character may comprise: determining a first distance between a first reference facial feature of the first reference facial image of the character and a second reference facial feature of the second reference facial image of the character; and determining the estimated reference facial feature of the character based on the first distance, the first reference facial feature and the second reference facial feature.
  • Determining the estimated reference facial feature of the character may comprise performing an interpolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.
  • Determining the estimated reference facial feature of the character may comprise performing an extrapolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.
  • the first reference facial feature may include a first set of control points
  • the second reference facial feature may include a second set of control points
  • the first distance may be indicative of a distance between the first set of control points and the second set of control points.
  • Determining the estimated user facial feature of the user may comprise: determining a second distance between a user first facial feature of the first photographic facial image of the user and a user second facial feature of the second photographic facial image of the user; and determining the estimated user facial feature of the user based on the second distance, the user first facial feature and the user second facial feature.
  • Determining the estimated user facial feature of the user may comprise performing an interpolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.
  • Determining the estimated user facial feature of the user may comprise performing an extrapolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.
  • the user first facial feature may include a third set of control points
  • the user second facial feature may include a fourth set of control points
  • the second distance may be indicative of a distance between the third set of control points and the fourth set of control points.
  • Modifying the third reference facial image of the character may comprise transforming a first spline curve represented by the estimated reference facial feature of the character into an approximation or representation of a second spline curve represented by the estimated user facial feature of the user.
  • a computer software product including machine-readable instructions that, when executed by a processor of a mobile device, cause the processor to perform any one of the methods described above.
  • a mobile device for creating an output frame for a character in a video comprising:
  • a camera to capture a first photographic facial image and a second photographic facial image of the user
  • Fig. 1 illustrates an example mobile device for creating a video including a character in accordance with the present disclosure
  • Figs. 2(a) and 2(b) illustrate example methods for creating a video including a character on the mobile device in accordance with the present disclosure
  • Fig. 3 illustrates a graphic user interface in accordance with the present disclosure
  • Figs. 4 and 5 illustrate facial features in accordance with the present disclosure
  • Fig. 6 illustrates a detailed process for creating a video including a character on the mobile device in accordance with the present disclosure
  • Fig. 7 illustrates an example mobile device for creating an output frame for a character in a video in accordance with the present disclosure
  • Fig. 8 illustrates an example method for creating an output frame for a character in a video in accordance with the present disclosure.
  • a video in the present disclosure consists of a sequence of images, i.e., "frames". Each frame differs in content from its adjacent frames (i.e., previous and next frames) by a small amount in terms of appearance.
  • when the frames are displayed at a high rate, e.g. 30 frames per second, a viewer of the sequence is given the impression of viewing a "movie clip".
  • a frame of the video includes at least two "layers" of visual content.
  • One or more layers represent the non-replaceable content.
  • One or more layers represent replaceable characters.
  • a replaceable character may be replaced with user-supplied content according to the method(s) as described in the present disclosure. All layers are composited together in order to produce a processed frame, or an output frame, associated with the frame.
  • the video may also include one or more audio tracks.
  • the non-replaceable audio content occupies one single audio track. Additional audio tracks are used to store audio content for each replaceable character.
  • This per-character content is then further subdivided into individual elements, each representing a "sound bite" (e.g. a short voiceover speech element, or a noise element) for that character in that specific story.
  • an original video document contains only original, or "reference”, material.
  • the replaceable reference content consists of some or all of the graphical elements for each replaceable character, saved on a frame-by-frame basis. At a minimum, this content consists of the replaceable character's head or face as it appears in each frame of the reference video content.
  • Replaceable reference content may also include elements such as hands, feet, etc. where it may be desirable to offer the users a selectable set of display options (e.g. skin colour).
  • the non-replaceable visual content may consist of graphical assets, arranged as sets of assets on a per-frame basis in an animation sequence that are normally used to generate video content, but with all replaceable content removed.
  • This form of non-replaceable visual content is packaged as a number of asset layers per frame which, when combined with the associated per-frame replaceable content, forms a complete sequence of video frames.
  • the non-replaceable content may alternatively consist of standard video content, with replaceable reference content masked (or removed) from each video frame.
  • replaceable character audio content is extracted from the original video content.
  • the video is deconstructed on a frame-by-frame basis, either in real time or as a separate pre-processing stage where the frames are stored in a database. In either case, the deconstructed video frames are then subsequently combined with the associated per-frame replaceable content, forming a complete sequence of video frames.
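  • The per-frame compositing of non-replaceable and replaceable layers described above can be sketched as a simple back-to-front alpha blend. The following is a minimal illustration only, assuming each layer is delivered as an RGB image plus an alpha mask; the function and variable names are not taken from the disclosure.
```python
import numpy as np

def composite(base, layer, alpha):
    """Alpha-blend a replaceable-content layer over the layers beneath it.
    base, layer: H x W x 3 float arrays in [0, 1]; alpha: H x W x 1 mask."""
    return alpha * layer + (1.0 - alpha) * base

def build_output_frame(layers):
    """Composite an ordered list of (image, alpha) layers, back to front,
    into a single processed (output) frame."""
    frame, _ = layers[0]
    for image, alpha in layers[1:]:
        frame = composite(frame, image, alpha)
    return frame

# Hypothetical usage: one non-replaceable background layer plus one
# replaceable character layer whose face has been personalised.
h, w = 720, 1280
background = (np.zeros((h, w, 3)), np.ones((h, w, 1)))
character = (np.random.rand(h, w, 3), np.random.rand(h, w, 1))
output_frame = build_output_frame([background, character])
```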
  • a user provides material for all replaceable content (i.e., audio and visual) for a given story.
  • for audio material, the user typically provides their own "sound bite" (voiceover, etc.) for each element in a replaceable character's audio track.
  • for visual material, the user produces a facial expression identified by a facial expression identifier or mimics the original replaceable character's video sequence, particularly a facial expression of the original replaceable character in a key frame at a time instant.
  • the feature of the facial expression of the user is extracted from the user photographic image captured by the camera 101 of the mobile device 100.
  • the feature of facial expression of the character in the key frame is also extracted.
  • the mathematical difference between the character's features and the user's features is then used to modify the original character's facial appearance in order to better resemble the user's facial appearance.
  • This resemblance includes, but is not limited to, the position and shape of the eyes, eyebrows, nose, mouth, and facial outline/jawline, as described with reference to Figs. 2(a) and 2(b) and Figs. 3 to 6.
  • the user produces distinctive or representative facial expressions identified by facial expression identifiers or mimics distinctive or representative facial expressions in different key frames at different time instants.
  • the features of the facial expression of both the user and the original replaceable character at the different time instants are extracted.
  • the method(s) described in the present disclosure then dynamically create a facial image of the character using an algorithm, for example interpolation and/or extrapolation, based on these facial expression features, as described with reference to Figs. 7 and 8.
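  • One simple way to turn the "mathematical difference" mentioned above into a modification of the character's appearance is to apply the user's expression offset to the character's control points. The sketch below is illustrative only: it assumes corresponding (N, 2) control-point arrays and a neutral pose for both faces, none of which is prescribed by the disclosure.
```python
import numpy as np

def personalise_control_points(char_points, char_neutral, user_points, user_neutral):
    """Shift the character's control points by the user's expression offset,
    i.e. how far the user's control points moved away from a neutral pose,
    rescaled to the size of the character's face.

    All arguments are (N, 2) arrays of corresponding control points."""
    user_offset = user_points - user_neutral
    # Rescale from the user's face dimensions to the character's face dimensions.
    scale = np.ptp(char_neutral, axis=0) / np.ptp(user_neutral, axis=0)
    return char_points + user_offset * scale
```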
  • Fig. 1 illustrates an example mobile device 100 for creating a video including a character in accordance with the present disclosure.
  • the mobile device includes a camera 101, a display 103, and a processor 105.
  • the camera 101, the display 103 and the processor 105 are connected to each other via a bus 107.
  • the mobile device 100 may also include a microphone 109, and a memory device 111.
  • the camera 101 is an optical device that captures photographic images of the user of the mobile device 100.
  • the photographic images captured by the camera 101 are transmitted from the camera 101 to the processor 105 for further processing, or to the memory device 111 for storage.
  • the display 103 in this example is a screen to present visual content to the user under control of the processor 105.
  • the display 103 displays images to the user of the mobile device 100.
  • the images can be those captured by the camera 101, or processed by the processor 105, or retrieved from the memory device 111.
  • the display 103 is able to present a graphic user interface to the user, as shown in Fig. 1.
  • the graphic user interface includes one or more "pages".
  • Each of the pages includes one or more graphic user interface elements, for example buttons, menus, drop-down lists, text boxes, picture boxes, etc., to present visual content to the user or to receive commands from the user, as shown in Fig. 1, which represents one of the pages included in the graphic user interface.
  • the display 103 can also be a screen with a touch-sensitive device (not shown in Fig. 1).
  • a virtual keyboard is displayed on the display 103, and the display 103 is able to receive commands through the touch-sensitive device when the user touches the virtual keys of the virtual keyboard, as shown in Fig. 3(c).
  • the memory device 111 is a computer-readable medium that stores a computer software product.
  • the memory device 111 can be part of the processor 105, for example, a Random Access Memory (RAM) device, a Read Only Memory (ROM) device, a FLASH memory device, which is integrated with the processor 105.
  • the memory device 111 can also be a device separate from the processor, for example, a floppy disk, a hard disk, an optical disk, a USB stick.
  • the memory device 111 can be directly connected to the bus 107 by inserting the memory device 111 into an appropriate interface provided by the bus 107.
  • the memory device 111 is located remotely and connected to the bus 107 through a communication network (not shown in Fig. 1).
  • the computer software product stored in the memory device 111 is downloaded, through the communication network, to the processor 105 for execution.
  • the computer software product includes machine-readable instructions.
  • the processor 105 of the mobile device 100 loads the computer software product from the memory device 111 and reads the machine-readable instructions included in the computer software product. When these machine-readable instructions are executed by the processor 105, these instructions cause the processor 105 to perform one or more method steps described below.
  • Fig. 2(a) illustrates an example method 200 for creating a video including a character on the mobile device 100.
  • the method 200 is performed by the processor 105 of the mobile device 100.
  • the processor 105 is configured to
  • Fig. 2(b) illustrates another example method 210 for creating a video including a character on the mobile device 100.
  • the method 210 is performed by the processor 105 of the mobile device 100.
  • the processor 105 is configured to
  • the processor 105 is also configured to present, on the display 103, the first frame of the video in the graphic user interface.
  • the processor 105 repeats steps (d) to (g) to create the second frame of the video.
  • the first frame of the video is created by modifying the reference facial image of the character with reference to the corresponding user facial feature. Therefore, the original character's visual style is not replaced by a given user's visual style. Instead, the facial expression of the user is used to influence the facial expression of the character.
  • This method enables replacement of certain visual style elements with a given user's own style elements. Although this method is described with reference to facial expressions, the method is also applicable to skin tone, eye colour, etc.
  • the content generated by the method(s) described in the present disclosure is significantly personalised for each user, and it is constructed "on demand" in real time from sets of associated asset elements.
  • the resulting content (a sequence of frames) can then be immediately displayed on a device.
  • the generated content may be used to produce a final multimedia asset such as a static, viewable Youtube asset.
  • Fig. 3 illustrates the graphic user interface in accordance with the present disclosure.
  • the processor 105 creates 211 a graphic user interface on the mobile device 100 to capture by the camera 101 multiple photographic facial images of a user for respective multiple facial expressions.
  • the graphic user interface starts with page (a) as shown in Fig. 3, which presents on the display 103 a movie library consisting of one or more movies. As shown in page (a), there are multiple movies available for the user to choose to work on, for example, "Kong Fu Panda", “Fast Friends", “Frozen”, etc. The user chooses "Fast Friends", and the graphic user interface proceeds to page (b).
  • Page (b) shows a plurality of characters in this movie, for example, a boy, a turtle, and a worm. The user can select one of the characters by touching the character. The user can also select one of the characters by entering the name of the character through a virtual keyboard presented in the graphic user interface, as shown in page (c). In the example shown in page (c), the character of the boy is selected by the user. Upon selection of the character, the graphic user interface proceeds to page (d).
  • Page (d) shows a list of facial expression identifiers to identify facial expressions.
  • the facial expression identifiers serve the purpose of guiding the user to produce facial expressions identified by the facial expression identifiers.
  • a facial expression identifier can be a text string indicative of the name of a facial expression, for example, "Smile”, “Frown”, “Gaze”, “Surprise”, and “Grave”, as shown in page (d).
  • the facial expression identifier can include an icon, for example, the icon for facial expression "Gaze”.
  • the facial expression identifier can also include a reference facial image of the character extracted from the movie, for example, the facial image of the character in a frame of the movie where the character is "surprised", which makes it easier for the user to produce the corresponding facial expression.
  • the facial expression identifier can take other forms without departing from the scope of the present disclosure.
  • the user is producing a facial expression identified by a text string "Surprise” with a reference facial image of the character.
  • the user recognises the facial expression identifier and produces the corresponding facial expression.
  • the facial image of the user is captured by the camera 101 and presented in a live view of the graphic user interface.
  • the live view of the user's facial image is positioned next to the camera 101 to alleviate the issue where the user does not appear to look at the camera 101 when the user is looking at the live view.
  • the processor 105 also displays the reference facial image of the character in a character view of the graphic user interface.
  • the live view is also positioned next to the reference facial image of the character to make it easier for the user to compare the facial expression of the user and the facial expression of the character.
  • the processor 105 further superimposes the live view of the facial image of the user on the reference facial image of the character to make it even easier for the user to compare the facial expression of the user and the facial expression of the character.
  • the user or another person clicks on the shutter button of the graphic user interface to capture the photographic facial image of the user.
  • the photographic facial image of the user can be displayed in a picture box associated with the facial expression identifier.
  • the photographic facial images of the user for facial expressions "Smile” and "Frown” are displayed in respective picture boxes, as shown in page (d).
  • the processor 105 may retrieve photographic facial images of the user that have been stored in the memory device 111 and associate the photographic facial images with the corresponding facial expression identifiers.
  • the photographic facial image of the user is transmitted from the camera 101 to the processor 105.
  • the processor 105 extracts 213 a user facial feature "U4" from the photographic facial image of the user.
  • the processor 105 stores 215 in a user feature table associated with the facial expression identifier "Surprise” the user facial feature "U4", as shown in the fourth entry of the user feature table below.
  • the processor 105 records, through the microphone 109, audio data "S4" associated with the user facial feature "U4".
  • the processor 105 further stores the audio data "S4" in the user feature table in association with the facial expression identifier "Surprise", as shown in the fourth entry of the user feature table below.
  • the processor 105 repeats the above steps for each expression identifier in page (d), and populates the user feature table for the character of the boy, which associates the facial expression identifiers with the corresponding user facial features and audio data. For other characters in the movie, the processor 105 can similarly generate respective user feature tables for those characters.
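  • The user feature table described above can be modelled as a small per-character mapping from expression identifier to the captured feature and optional audio. The sketch below is one possible in-memory representation, assuming the user facial feature is stored as an (N, 2) control-point array; the class and field names are illustrative, not from the disclosure.
```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import numpy as np

@dataclass
class UserFeatureEntry:
    control_points: np.ndarray          # user facial feature, e.g. an (N, 2) array
    audio: Optional[bytes] = None       # recorded audio data, if any

@dataclass
class UserFeatureTable:
    character: str
    entries: Dict[str, UserFeatureEntry] = field(default_factory=dict)

    def store(self, expression_id: str, control_points: np.ndarray,
              audio: Optional[bytes] = None) -> None:
        """Store a user facial feature (and optional audio) under its expression identifier."""
        self.entries[expression_id] = UserFeatureEntry(control_points, audio)

# Hypothetical usage: populate the table for the "boy" character.
table = UserFeatureTable(character="boy")
table.store("Smile", np.zeros((68, 2)), audio=b"...")       # e.g. U1, S1
table.store("Surprise", np.zeros((68, 2)), audio=b"...")    # e.g. U4, S4
```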
  • Figs. 4 and 5 illustrate facial features in accordance with the present disclosure.
  • Facial features in the present disclosure include a set of control points.
  • Fig. 4(a) represents a facial image of an object, which is captured by a camera.
  • the object in the present disclosure can be a user or a character in a movie.
  • the facial image in Fig. 4(a) shows the object to be generally front-facing such that all key areas of the face are visible: both eyes, both eyebrows, nose, mouth, and jawline. Ideally, these areas should be largely unobstructed.
  • the dots in Fig. 4(b) represent a set of control points extracted by the processor 105.
  • a third party software library is used to extract the set of points from the facial image shown in Fig. 4(a).
  • the set of control points that are extracted from the facial image may comply with an industry standard, for example, MPEG-4, ISO/IEC 14496-1, 14496-2, etc.
  • the control points shown in Fig. 5 comply with the MPEG-4 standard.
  • the facial shape of the object may be reconstructed by connecting those control points with segments.
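  • The disclosure leaves the choice of the third-party landmark library open. As a hedged illustration only, the sketch below uses dlib's 68-point landmark predictor as a stand-in; note that dlib's point set is not the MPEG-4 feature point set mentioned above, and the model file path is an assumption.
```python
import dlib
import numpy as np

# dlib is used purely as an example third-party landmark library.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model path

def extract_control_points(image_path: str) -> np.ndarray:
    """Return an (N, 2) array of facial control points for the first detected,
    generally front-facing face in the image."""
    image = dlib.load_rgb_image(image_path)
    faces = detector(image, 1)
    if not faces:
        raise ValueError("no front-facing face detected")
    shape = predictor(image, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()])
```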
  • Fig. 6 illustrates a detailed process 400 for creating a video including a character on the mobile device 100 in accordance with the present disclosure.
  • a storyline is shown in Fig. 6 to indicate a sequence of facial expression identifiers of the character of the boy over time.
  • there are five facial expression identifiers labelled along the storyline at five time instants "A" to "E": "Smile", "Gaze", "Frown", "Grave", and "Smile".
  • These facial expression identifiers indicate the facial expressions of the character in the frames at the five time instants.
  • the processor 105 also extracts frames at the five time instants from the video document of the movie "Fast Friends".
  • the frame at time instant "A" contains a facial image of the character that corresponds to the facial expression identified by the facial expression identifier "Smile”.
  • the facial image of the character at time instant "A” is also shown in Fig. 6 for description purposes.
  • the processor 105 extracts a facial expression feature "R1" from the facial image of the character as a reference facial feature associated with the facial expression identifier "Smile".
  • the facial image of the character at time instant "A" is used as a reference facial image associated with the facial expression identifier "Smile" and the reference facial feature "R1".
  • the processor 105 selects 217 one of the multiple user facial features based on the facial expression identifier "Smile" associated with the frame at time instant "A" in the video.
  • the processor 105 selects a user facial feature "U1" since the user facial feature "U1" is associated with the facial expression identifier "Smile" in the user feature table.
  • the processor 105 may further select audio data "S1" associated with the facial expression identifier "Smile".
  • the processor 105 determines 219 a transformation that transforms the reference facial feature "R1" associated with the facial expression identifier "Smile" into an approximation or representation of the selected user facial feature "U1".
  • the transformation can be a transformation matrix that transforms the control points of the reference facial feature "R1" into an approximation or representation of the control points of the selected user facial feature "U1".
  • the processor 105 modifies 221, based on the transformation, the reference facial image associated with the facial expression identifier "Smile" and the reference facial feature "R1". Particularly, the processor 105 may modify the reference facial image by changing the positions of pixels in the reference facial image based on the transformation. The processor 105 then creates 223 the frame at time instant "A" of the video based on the modified reference facial image by, for example, combining the modified reference facial image and the selected audio data "S1" associated with the facial expression identifier "Smile".
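  • The disclosure does not fix the form of the transformation matrix. One simple realisation, shown below as an assumption-laden sketch, is a single least-squares 2D affine fit over corresponding control points (in practice a piecewise or per-region warp would follow the face shape more closely).
```python
import numpy as np

def fit_affine(reference_points, user_points):
    """Least-squares 2D affine transform mapping the reference control points
    (e.g. "R1") towards the corresponding user control points (e.g. "U1").
    Both inputs are (N, 2) arrays of corresponding points; returns a 3 x 2 matrix."""
    ones = np.ones((len(reference_points), 1))
    src = np.hstack([reference_points, ones])          # homogeneous coordinates
    A, *_ = np.linalg.lstsq(src, user_points, rcond=None)
    return A

def apply_affine(A, points):
    """Apply the fitted transform to control-point or pixel coordinates."""
    ones = np.ones((len(points), 1))
    return np.hstack([points, ones]) @ A
```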
  • the user-recorded audio data may be associated with a facial expression
  • the audio data may equally be independent from the facial expressions but otherwise associated with the story line.
  • the user may record audio data for what the character says in a particular scene where no facial expression identifier is associated with frames in that scene.
  • the proposed methods and systems may perform only the disclosed face modification techniques or only the audio voice-over techniques or both.
  • the processor 105 repeats the above process for each of the characters contained in the frame at time instant "A" and/or each of the frames at the five time instants "A" to "E” along the storyline.
  • the frames at those time instants in the video contain personal expression features of the user, and thus the video becomes more personalised and user-friendly when played, as shown on page (f) of the graphic user interface shown in Fig. 3. It can be seen from page (f) that the shape of the face of the character is more like the user's actual face than the original character's face is.
  • Fig. 7 illustrates an example mobile device 700 for creating an output frame for a character in a video in accordance with the present disclosure.
  • the mobile device 700 includes a camera 701, a display 703, and a processor 705.
  • the camera 701, the display 703 and the processor 705 are connected to each other via a bus 707.
  • the mobile device 700 may also include a microphone 709, and a memory device 711.
  • the camera 701 is an optical device that captures photographic images of the user of the mobile device 700.
  • the photographic images captured by the camera 701 are transmitted from the camera 701 to the processor 705 for further processing, or to the memory device 711 for storage.
  • the display 703 in this example is a screen to present visual content to the user under control of the processor 705.
  • the display 703 displays images to the user of the mobile device 700.
  • the images can be those captured by the camera 701, or processed by the processor 705, or retrieved from the memory device 711.
  • the display 703 is able to present a graphic user interface to the user, as shown in Fig. 7.
  • the memory device 711 is a computer-readable medium that stores a computer software product.
  • the memory device 711 can be part of the processor 705, for example, a Random Access Memory (RAM) device, a Read Only Memory (ROM) device, or a FLASH memory device, which is integrated with the processor 705.
  • the memory device 711 can also be a device separate from the processor, for example, a floppy disk, a hard disk, an optical disk, a USB stick.
  • the memory device 711 can be directly connected to the bus 707 by inserting the memory device 711 into an appropriate interface provided by the bus 707.
  • the memory device 711 is located remotely and connected to the bus 707 through a communication network (not shown in Fig. 7).
  • the computer software product stored in the memory device 711 is downloaded, through the communication network, to the processor 705 for execution.
  • the computer software product includes machine-readable instructions.
  • the processor 705 of the mobile device 700 loads the computer software product from the memory device 711 and reads the machine-readable instructions included in the computer software product. When these machine-readable instructions are executed by the processor 705, these instructions cause the processor 705 to perform one or more method steps described below.
  • Fig. 8 illustrates an example method 800 for creating an output frame for a character in a video in accordance with the present disclosure.
  • the method 800 is used to create an output frame based on a first reference facial image and a second reference facial image of the character.
  • the first reference facial image of the character is in a first key frame of the video
  • the second reference facial image of the character is in a second key frame of the video.
  • the output frame can be a frame between the first key frame and the second key frame along the storyline, or outside the first key frame and the second key frame along the storyline.
  • the method 800 is performed by the processor 705 of the mobile device 700.
  • the camera 701 of the mobile device 700 captures a first photographic facial image and a second photographic facial image of the user, and the processor 705 is configured to determine 810 an estimated reference facial feature of the character based on the first reference facial image and the second reference facial image of the character;
  • the processor 705 is further configured to present the output frame on the display 703.
  • the method 800 determines the estimated reference facial feature of the character and the estimated user facial feature of the user, and determines the transformation based on the estimated reference facial feature of the character and the estimated user facial feature of the user. This dramatically reduces the time required to create the output frame. A detailed process for creating the output frame is described below.
  • two time instants "A", “B” along the storyline are selected by the user or the director as the facial expressions of the character at these time instants are distinctive or representative.
  • the facial expressions of the character at the time instants "A", “B” are identified as “Surprise” and “Grave”, respectively.
  • a facial image of the character is extracted from the first key frame at time instant "A”, and is referred to as a first reference facial image.
  • a facial image of the character is extracted from the second key frame at time instant "B”, and is referred to as a second reference facial image. Both reference facial images of the character are shown in the graphic user interface for the user's reference.
  • the processor 705 determines 810 an estimated reference facial feature of the character based on the first reference facial image and the second reference facial image of the character. Particularly, the processor 705 extracts a reference facial feature of the character from the first reference facial image of the character, referred to as a first reference facial feature. The processor 705 also extracts a reference facial feature of the character from the second reference facial image of the character, referred to as a second reference facial feature.
  • the processor 705 further determines a first distance between the first reference facial feature of the first reference facial image and the second reference facial feature of the second reference facial image.
  • the processor 705 determines the estimated reference facial feature of the character based on the first distance, the first reference facial feature and the second reference facial feature.
  • the first reference facial feature includes a first set of control points
  • the second reference facial feature includes a second set of control points.
  • the first distance is indicative of a distance between the first set of control points and the second set of control points.
  • the processor 705 determines the estimated reference facial feature of the character by performing an interpolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.
  • the processor 705 determines the estimated reference facial feature of the character by performing an extrapolation operation based on the first reference facial feature and the second reference facial feature with respect to the first distance.
  • the user recognises the first facial expression identifier "Surprise" and/or observes the first reference facial image of the character (i.e., the facial image of the character at time instant "A"), and produces a facial expression that corresponds to the first facial expression identifier "Surprise". If the user or the director is satisfied with the facial expression of the user, a facial image of the user is captured by the camera 701, referred to as a first photographic facial image.
  • the user recognises the second facial expression identifier "Grave” and/or observes the second reference facial image of the character (i.e., the facial image of the character at time instant "B"), and produces a facial expression that corresponds to the second facial expression identifier "Grave”. If the user or the director is satisfied with the facial expression of the user, a facial image of the user is captured by the camera 701, referred to as a second photographic facial image.
  • the processor 705 may retrieve photographic facial images of the user that have been stored in the memory device 711 and associate the photographic facial images with the corresponding facial expression identifiers.
  • Both the first photographic facial image and the second photographic facial image of the user are transmitted from the camera 701 to the processor 705.
  • the processor 705 determines 820 an estimated user facial feature of a user based on the first photographic facial image and the second photographic facial image of the user. Particularly, the processor 705 extracts a facial feature from the first photographic facial image of the user, referred to as a user first facial feature. The processor 705 also extracts a facial feature from the second photographic facial image of the user, referred to as a user second facial feature.
  • the processor 705 further determines a second distance between the user first facial feature and the user second facial feature.
  • the processor 705 determines the estimated user facial feature of the user based on the second distance, the user first facial feature and the user second facial feature.
  • the user first facial feature includes a third set of control points
  • the user second facial feature includes a fourth set of control points.
  • the second distance is indicative of a distance between the third set of control points and the fourth set of control points.
  • the processor 705 determines the estimated user facial feature of the user by performing an interpolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.
  • the processor 705 determines the estimated user facial feature of the user by performing an extrapolation operation based on the user first facial feature and the user second facial feature with respect to the second distance.
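  • The estimation steps above (for both the character's reference features and the user's features) amount to interpolating or extrapolating between two control-point sets with respect to a distance. The sketch below is a minimal linear version under assumed names and shapes; the parameter t could instead be derived from the first and second distances, and the disclosure does not require a linear scheme.
```python
import numpy as np

def feature_distance(points_a, points_b):
    """Scalar distance between two corresponding control-point sets, taken here
    as the mean Euclidean point-to-point distance."""
    return float(np.mean(np.linalg.norm(points_a - points_b, axis=1)))

def estimate_feature(points_a, points_b, t):
    """Linear interpolation (0 <= t <= 1) or extrapolation (t outside [0, 1])
    between two control-point sets."""
    return (1.0 - t) * points_a + t * points_b

# Hypothetical usage for a frame 30% of the way from key frame "A" to key frame "B".
ref_a, ref_b = np.zeros((68, 2)), np.ones((68, 2))       # reference features at "A" and "B"
user_a, user_b = np.zeros((68, 2)), 2 * np.ones((68, 2))  # user features for "A" and "B"
t = 0.3
estimated_reference_feature = estimate_feature(ref_a, ref_b, t)
estimated_user_feature = estimate_feature(user_a, user_b, t)
first_distance = feature_distance(ref_a, ref_b)    # the "first distance"
second_distance = feature_distance(user_a, user_b) # the "second distance"
```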
  • Fig. 9 illustrates the interpolation process 900 in more detail.
  • the storyline 901 is annotated with facial expression identifiers and Fig. 9 also shows the corresponding control points of the facial features.
  • the y-axis 902 indicates the y-position of the central control point 903 of the lips.
  • the storyline evolves from a smile 911 to a frown 912 back to a smile 913 and finally into a frown 914 again.
  • the control point 903 starts from a low position 921 into a high position 922, back to a low position 923 and finally into a high position 924.
  • processor 705 may interpolate the y-position of control point 903 using a linear interpolation method. In some examples, however, this may lead to an unnatural appearance at the actual transition points, such as a sharp corner at point 922. Therefore, processor 705 may generate a spline interpolation 904 using the y-coordinates of the points 921, 922, 923 and 924 as knots. This results in a smooth transition between the facial expressions. While control point 903 moves only in the y-direction in this example, control points are generally allowed to move in both dimensions. Therefore, the spline curve 904 may be a two-dimensional spline approximation of the knots, allowing the processor 705 to interpolate both x- and y-coordinates.
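  • The knot-based spline interpolation described above can be sketched with a standard cubic spline. The numbers below are illustrative stand-ins for the y-positions at points 921 to 924 and are not taken from the disclosure.
```python
import numpy as np
from scipy.interpolate import CubicSpline

# Story-line times of the four key expressions and the y-position of control
# point 903 at those times (points 921-924); the values are illustrative.
key_times = np.array([0.0, 1.0, 2.0, 3.0])
key_y = np.array([10.0, 40.0, 12.0, 42.0])        # low, high, low, high

spline = CubicSpline(key_times, key_y)

# Evaluate a smooth trajectory for the in-between frames, avoiding the sharp
# corner that purely linear interpolation produces at each key frame.
frame_times = np.linspace(0.0, 3.0, 91)           # e.g. 30 frames per second
smooth_y = spline(frame_times)

# For control points that move in both dimensions, CubicSpline also accepts a
# 2D array of knot coordinates and interpolates x and y together.
key_xy = np.column_stack([np.array([5.0, 6.0, 5.5, 6.5]), key_y])
smooth_xy = CubicSpline(key_times, key_xy)(frame_times)
```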
  • the processor 705 determines 830 a transformation that transforms the estimated reference facial feature of the character into an approximation or representation of the estimated user facial feature of the user.
  • the transformation can be a transformation matrix that transforms the control points of the estimated reference facial feature of the character into an approximation or representation of the control points of the estimated user facial feature of the user.
  • the processor 705 determines a further reference facial image of the character by performing an interpolation operation based on the first reference facial image and the second reference facial image of the character, referred to as a third reference facial image.
  • the third reference facial image is associated with the estimated reference facial feature of the character.
  • the processor 705 determines the third reference facial image of the character by performing an extrapolation operation based on the first reference facial image and the second facial image.
  • the processor 705 modifies 840, based on the transformation, the third reference facial image of the character by for example changing the positions of pixels in the third reference facial image. Since the estimated reference facial feature of the character may represent a spline curve, referred to as a first spline curve, and the estimated user facial feature of the user may represent another spline curve, referred to as a second spline curve, modifying the third reference facial image of the character also results in transforming the first spline curve into an approximation or representation of the second spline curve.
  • the processor 705 repeats the above steps for each of the characters in the first key frame and the second key frame, and creates 850 the output frame for the characters in the video based on the modified third reference facial images for those characters. For example, the processor 705 may create the output frame by combining the modified third reference facial images into the output frame.
  • processor 705 may apply a perspective transformation.
  • processor 705 applies the transformation to the 2D coordinates of the control points to create the impression of a 3D rotation.
  • Fig. 10(a) shows a transformation of the 2D coordinates of the control points to create the impression of a 3D rotation of the character's face. The degree of rotation may be known from the storyline and therefore processor 705 calculates a transformation that creates the corresponding impression. This transformation may also be integrated into the previous transformation applied to the reference image. Processor 705 may also create the impression of perspective by down-scaling points that are further away from the virtual camera.
  • Fig. 10b illustrates a simplified 3D model of a character's head.
  • This 3D model may be created by a designer or developer once for each character.
  • processor 705 can calculate which control points are not visible because they are occluded by other parts of the head. In the example of Fig. 10(b), the right eye is occluded and not visible. Applying this calculation to the output image, to hide the parts of the image that are not visible according to the 3D model, increases the realistic impression of the created video.
  • the calculation may be based on an assumed pivot point, which may be the top of the neck.
  • the processor 705 can then perform the transformation based on rotation and tilt around the pivot point.
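  • The rotation about a pivot, perspective down-scaling and occlusion test described above can be sketched with plain numpy under strong simplifying assumptions: a convex head model, a pin-hole style projection, and a camera looking along the +z axis. None of these specifics, nor the function names, come from the disclosure.
```python
import numpy as np

def rotate_about_pivot(points3d, pivot, yaw, tilt):
    """Rotate the 3D control points of the simplified head model about a pivot
    point (e.g. the top of the neck). Angles in radians: yaw about the vertical
    axis, tilt about the horizontal axis."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    ct, st = np.cos(tilt), np.sin(tilt)
    r_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_tilt = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
    return (points3d - pivot) @ (r_yaw @ r_tilt).T + pivot

def project(points3d, focal=500.0):
    """Pin-hole style projection: points further from the virtual camera
    (larger z) are scaled down, creating the impression of perspective."""
    scale = focal / points3d[:, 2]
    return points3d[:, :2] * scale[:, None]

def roughly_visible(points3d, head_centre):
    """Very crude occlusion test for a convex head model with the camera
    looking along +z: only points on the camera-facing half are kept."""
    return points3d[:, 2] <= head_centre[2]
```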
  • Suitable computer readable media may include volatile (e.g. RAM) and/or non-volatile (e.g. ROM, disk) memory, carrier waves and transmission media.
  • Exemplary carrier waves may take the form of electrical, electromagnetic or optical signals conveying digital data streams along a local network or a publicly accessible network such as the Internet.
  • authentication refers to the action and processes of a computer system, or similar electronic computing device, that processes and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to creating videos. A mobile device creates a graphic user interface to capture, by the camera of the device, multiple photographic facial images of a user for respective multiple facial expressions of a character in the video. Using the multiple photographic facial images, the device modifies stored images of the character by matching facial features of the character with facial features of the user for the multiple facial expressions of the character in the video, and creates the video based on the modified images of the character. The facial expression of the user is used to influence the facial expression of the character. This method enables the replacement of certain visual style elements with a given user's own style elements.
PCT/AU2017/050763 2016-07-25 2017-07-25 Creating videos with facial expressions WO2018018076A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/320,966 US11003898B2 (en) 2016-07-25 2017-07-25 Creating videos with facial expressions
US17/187,604 US20210264139A1 (en) 2016-07-25 2021-02-26 Creating videos with facial expressions

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201662366406P 2016-07-25 2016-07-25
US201662366375P 2016-07-25 2016-07-25
AU2016902921 2016-07-25
AU2016902919A AU2016902919A0 (en) 2016-07-25 Creating videos with facial expressions
US62/366,375 2016-07-25
AU2016902921A AU2016902921A0 (en) 2016-07-25 Modifying facial expressions in videos
US62/366,406 2016-07-25
AU2016902919 2016-07-25

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US16/320,966 A-371-Of-International US11003898B2 (en) 2016-07-25 2017-07-25 Creating videos with facial expressions
US17/187,604 Continuation US20210264139A1 (en) 2016-07-25 2021-02-26 Creating videos with facial expressions

Publications (1)

Publication Number Publication Date
WO2018018076A1 true WO2018018076A1 (fr) 2018-02-01

Family

ID=61015160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2017/050763 WO2018018076A1 (fr) Creating videos with facial expressions

Country Status (1)

Country Link
WO (1) WO2018018076A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476871A (zh) * 2020-04-02 2020-07-31 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130147788A1 (en) * 2011-12-12 2013-06-13 Thibaut WEISE Method for facial animation
US20130215113A1 (en) * 2012-02-21 2013-08-22 Mixamo, Inc. Systems and methods for animating the faces of 3d characters using images of human faces
US20160275341A1 (en) * 2015-03-18 2016-09-22 Adobe Systems Incorporated Facial Expression Capture for Character Animation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476871A (zh) * 2020-04-02 2020-07-31 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating video
US11670015B2 (en) 2020-04-02 2023-06-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating video
CN111476871B (zh) * 2020-04-02 2023-10-03 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating video

Similar Documents

Publication Publication Date Title
US20210264139A1 (en) Creating videos with facial expressions
US9626788B2 (en) Systems and methods for creating animations using human faces
US11410457B2 (en) Face reenactment
US8988436B2 (en) Training system and methods for dynamically injecting expression information into an animated facial mesh
US7859551B2 (en) Object customization and presentation system
US8655152B2 (en) Method and system of presenting foreign films in a native language
CN112822542A (zh) Video synthesis method and apparatus, computer device and storage medium
US8135724B2 (en) Digital media recasting
US20100079491A1 (en) Image compositing apparatus and method of controlling same
WO2019089097A1 (fr) Systems and methods for generating a summary storyboard from a plurality of image frames
CN108846886B (zh) AR expression generation method, client, terminal and storage medium
US20180143741A1 (en) Intelligent graphical feature generation for user content
US10748579B2 (en) Employing live camera feeds to edit facial expressions
US20240087204A1 (en) Generating personalized videos with customized text messages
EP3912136A1 (fr) Systems and methods for generating personalized videos with customized text messages
US11582519B1 (en) Person replacement utilizing deferred neural rendering
US11581020B1 (en) Facial synchronization utilizing deferred neural rendering
CN113542624A (zh) Method and apparatus for generating a commodity object explanation video
KR20160010810A (ko) Method and system for generating a photorealistic character capable of expressing real voice
WO2018018076A1 (fr) Creating videos with facial expressions
Seymour et al. Beyond deep fakes
KR102622709B1 (ko) Method and apparatus for generating a 360-degree image including a 3D virtual object based on a 2D image
KR100965622B1 (ko) Method and apparatus for generating an emotional character and animation
Hetayothin Protanopia: An Alternative Reading Experience of a Digital Comic
KR20200076234A (ko) 3D VR content production system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17833091

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17833091

Country of ref document: EP

Kind code of ref document: A1