NZ762338B2 - On-set facial performance capture and transfer to a three-dimensional computer-generated model - Google Patents

On-set facial performance capture and transfer to a three-dimensional computer-generated model

Info

Publication number
NZ762338B2
Authority
NZ
New Zealand
Prior art keywords
pump
conveying
model
conveyed
interval
Prior art date
Application number
NZ762338A
Other versions
NZ762338A (en)
Inventor
Ferrall Nunge Adam
Phillips Cary
Yost Jeffery
Estebecorena Leandro
Bao Michael
Helman Pablo
Karefelt Per
Fedkiw Ronald
Grabli Stephane
Original Assignee
Lucasfilm Entertainment Company Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/681,300 external-priority patent/US11069135B2/en
Application filed by Lucasfilm Entertainment Company Ltd filed Critical Lucasfilm Entertainment Company Ltd
Publication of NZ762338A publication Critical patent/NZ762338A/en
Publication of NZ762338B2 publication Critical patent/NZ762338B2/en

Abstract

A method of transferring a facial expression from a subject to a computer-generated character includes receiving a plate with an image of the subject’s facial expression, a three-dimensional parameterized deformable model of the subject’s face where different facial expressions of the subject can be obtained by varying values of the model parameters, a model of a camera rig used to capture the plate, and a virtual lighting model that estimates lighting conditions when the image on the plate was captured. The method can solve for the facial expression in the plate by executing a deformation solver to solve for at least some parameters of the deformable model with a differentiable renderer and shape from shading techniques, using, as inputs, the three-dimensional parameterized deformable model, the model of the camera rig and the virtual lighting model over a series of iterations to infer geometry of the facial expression and generate a final facial mesh using the set of parameter values of the deformable model which result in a facial expression that closely matches the expression of the subject in the plate.

Description

ON-SET FACIAL PERFORMANCE CAPTURE AND TRANSFER TO A THREE-DIMENSIONAL COMPUTER-GENERATED MODEL
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Patent Application No. 62/814,994, filed March 7, 2019 and U.S. Patent Application No. 16/681,300, filed November 12, 2019, the disclosures of which are incorporated by reference herein in their entirety for all purposes.
FIELD
The present disclosure relates generally to performance capture, and more specifically to methods, techniques and systems for capturing facial expressions from a subject during a performance and transferring the captured expressions to a three-dimensional model of the subject.
BACKGROUND OF THE INVENTION
Facial expression transfer is the act of adapting the facial expressions of a subject, such as an actor, to a three-dimensional computer-generated (CG) model that can be used to create visual effects that can then be incorporated into animations, movies, video games and the like. Mastering facial expression transfer and other aspects of facial animation is a longstanding challenge in computer graphics. The face can describe the emotions of a character, convey their state of mind, and hint at their future actions. Audiences are particularly trained to look at faces and identify these subtle characteristics. Accurately capturing the shape and motion of real human faces in the expression transfer process can play an important role in transferring subtle facial expressions of the subject to a computer-generated character, giving the computer-generated character natural, life-like expressions.
In order to transfer facial expressions from a subject to a computer-generated model, the subject’s facial expressions first have to be captured, for example, on digital film or another appropriate medium. Some traditional techniques that are used to capture facial expressions of a subject (e.g., of an actor during a performance) rely on numerous markers positioned at strategic locations on an actor’s face and a head-mounted, high-resolution camera that is directed towards the actor’s face. The camera can then be used to film the actor’s face during his or her performance. Software can track movement of the markers as the actor’s face displays different expressions during the performance and translate the marker movement into a computer-generated model that mimics the actor’s facial expressions.
[0005] While such techniques have been successfully used in a variety of different situations, including in various well-known movies, it can be cumbersome and distracting for actors to wear a head-mounted camera during a performance and to have their faces covered with dozens of markers.
SUMMARY OF THE INVENTION
[0006] Embodiments of the disclosure pertain to methods and systems for capturing the facial expressions of an actor or other subject without the use of a head-mounted camera and in film set conditions. The captured facial expressions can be transferred to a three-dimensional parameterized deformable model of the actor or subject and used in the context of visual effects production, including but not limited to, animations, movies, video clips, video games, and virtual and/or augmented reality content. In some embodiments the method iteratively deforms a three-dimensional mesh with the goal to minimize the difference between a 3D render of that mesh and the plate (i.e., a frame from the captured footage). A differentiable renderer can be used to generate the 3D face renders making it possible to leverage well-known derivative-based minimization techniques to meet the goal.
[0007] Some embodiments of the invention provide a method of transferring a facial expression from a subject to a computer-generated character. The method includes receiving a plate with an image of the subject’s facial expression, a three-dimensional parameterized deformable model of the subject’s face where different facial expressions of the subject can be obtained by varying values of the model parameters, a model of a camera rig used to capture the plate, and a virtual lighting model that estimates lighting conditions when the image on the plate was captured. The method can solve for the facial expression in the plate by executing a deformation solver to solve for at least some parameters of the deformable model with a differentiable renderer and shape from shading techniques, using, as inputs, the three-dimensional parameterized deformable model, the model of the camera rig and the virtual lighting model over a series of iterations to infer geometry of the facial expression and generate a final facial mesh using the set of parameter values of the deformable model which result in a facial expression that closely matches the expression of the subject in the plate.
In some embodiments the three-dimensional parameterized deformable model can include a plurality of blendshapes that represent different facial expressions of the subject and include a set of blendshape weight values, one per blendshape. The final facial mesh is obtained by determining a set of weighted blendshapes that best mimic the facial expression in the plate. In various embodiments the deformable model can also include rotation and translation values that represent a rigid adjustment of the subject’s head as well as a delta vector that represents a per-vertex displacement used in transferring the facial expression of the subject to the computer-generated character, which can be particularly useful where the computer-generated character has a head sized or shaped differently than the head of the subject.
In some embodiments the plate can be an image made up of thousands or even more than a million pixels. Each pixel can have a particular RGB value. During each iteration of the series of iterations the differentiable renderer can generate a rendering of the deformable model and a solver can then try to minimize differences between the RGB values of the plate and the RGB values of corresponding pixels in the rendered version of the deformable model.
An initial iteration of the solving can include: receiving an initial facial mesh generated from the three-dimensional deformable model representing a neutral expression of the subject; trying to minimize differences between RGB values of the plate and RGB values of the rendered initial facial mesh representing the neutral expression; and generating an updated facial mesh including a set of weighted blendshapes that represents a facial expression of the subject that is more similar to the facial expression of the subject in the plate than is the initial facial mesh. In each additional iteration of the solving step, an output of that iteration can be generated that is closer to the actual representation of the subject in the plate than an output of the previous iteration.
[0010] In some embodiments, solving for the facial expression in the plate can include executing a plurality of different solvers where each solver executes multiple iterations before the next solver is run and where each solver has at least one cost function associated with it that defines an objective that the solver tries to minimize. The plurality of different solvers can be executed in a predetermined sequence that is defined by a recipe selected from a library that stores multiple predetermined recipes. Each predetermined recipe in the library can include one or more deformation solvers each of which has at least one cost function associated with it.
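For illustration only, such a recipe can be thought of as an ordered list of solver stages, each carrying the cost functions it minimizes and an iteration budget. The sketch below shows one possible representation; the names (SolverStage, run_recipe, step) and the Python structure are assumptions for exposition, not the implementation described in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class SolverStage:
    """One deformation solver in a recipe and the objectives it tries to minimize."""
    name: str
    cost_functions: Sequence[Callable]  # objectives this solver minimizes
    iterations: int                     # iterations to run before the next solver starts

def run_recipe(stages: List[SolverStage], params, step: Callable):
    """Run the solvers in their predetermined order.

    `step(params, cost_functions)` is a caller-supplied function that performs one
    minimization iteration and returns updated deformable-model parameters.
    """
    for stage in stages:
        for _ in range(stage.iterations):
            params = step(params, stage.cost_functions)
    return params
```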
In additional embodiments, a method of transferring a facial expression from a subject during a performance to a computer-generated character can include obtaining at least: (i) digital video footage of the performance in the format of a plurality of sequentially ordered plates each of which includes an image of the subject’s facial expression during the performance; (ii) a three-dimensional parameterized deformable model of the subject’s face where different facial expressions of the subject can be obtained by varying the values of the model parameters; (iii) a model of a camera rig used to capture the performance; and (iv) a virtual lighting model that estimates lighting conditions used during the performance. The method can further include generating a computer model of the performance by, for each individual plate in the plurality of sequentially ordered plates, processing the individual plate independently of other plates in the plurality to solve for the facial expression in the plate being processed using a differentiable renderer with shape from shading techniques over a series of iterations to infer geometry of the facial expression and generate a final facial mesh using the set of parameter values for the deformable model which result in a facial expression that closely matches the expression of the subject in the plate being processed, where the solving uses the three-dimensional deformable model, the camera rig and the virtual lighting model as inputs.
To better understand the nature and advantages of the present invention, reference should be made to the following description and the accompanying figures. It is to be understood, however, that each of the figures is provided for the purpose of illustration only and is not intended as a definition of the limits of the scope of the present invention. Also, as a general rule, and unless it is evident to the contrary from the description, where elements in different figures use identical reference numbers, the elements are generally either identical or at least similar in function or purpose.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a simplified diagram of an exemplary environment in which embodiments of the present invention can be employed;
Fig. 2 is a simplified diagram of an exemplary configuration of a camera system that can be used for facial performance capture according to some embodiments of the invention;
Fig. 3 is a simplified flowchart depicting a facial performance capture and expression transfer method according to some embodiments of the invention;
Fig. 4 is a simplified illustration of exemplary positions for a small set of gel-based markers that enable motion capture of the skull of an actor during a performance according to some embodiments of the invention;
Fig. 5 is a simplified flowchart of steps associated with matching facial expressions of an actor captured during a performance to facial expressions of a computer-generated model of the actor according to some embodiments of the invention;
Fig. 6 is a simplified block diagram of an exemplary recipe that can be used in block 510 of the method shown in Fig. 5 according to some embodiments of the invention;
[0019] Fig. 7 is a simplified block diagram of a system for creating computer generated imagery (CGI) and computer-aided animation that can implement or incorporate various embodiments in accordance with the disclosure; and
Fig. 8 is a block diagram of an exemplary computer system according to some embodiments of the invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Embodiments of the invention pertain to methods and systems for capturing the facial expressions of an actor or other subject during a performance without the use of a head-mounted camera and in film set conditions, allowing the actor to have full freedom of motion and full freedom of interaction with the other actors and with the set. In this manner, embodiments of the invention enable film-production quality facial motion capture with as little encumbrance as possible for the actor so as to not compromise his or her performance.
Embodiments also impose as little restriction as possible on the on-set filming conditions, e.g. location, illumination conditions, camera models and settings, and the like.
Once captured, embodiments of the invention further pertain to transferring the captured facial expressions to a three-dimensional parameterized deformable model of the actor that can be used in the context of visual effects production. In some embodiments the method iteratively deforms a three-dimensional mesh with the goal to minimize the difference between a 3D render of that mesh and the plate (i.e., a frame from the captured footage). A differentiable renderer can be used by itself or in conjunction with other elements to generate the 3D face renders making it possible to leverage well-known derivative-based minimization techniques to meet the goal.
Example Performance Environment: In order to better understand and appreciate embodiments of the invention, reference is made below to Fig. 1, which is a simplified diagram of an exemplary environment 100 in which embodiments of the present invention can be employed.
Environment 100 can include a performance area 102 and a backdrop 104. Performance area 102 can be a stage or any area in which one or more actors can carry out a performance. Backdrop 104 can be a green screen that facilitates post-production work or can include scenery that is appropriate for the performance. For example, and solely for illustrative purposes, in Fig. 1 backdrop 104 is an outdoor scene that includes mountains and clouds. In some embodiments, some or all of the scenery of backdrop 104 can be generated on a computer and displayed on one or more displays, such as large LCD or LED displays, that surround performance area 102.
One or more cameras 106 can be positioned at strategic locations (e.g., locations that help with the capture and/or locations that are desirable for the director cinematically) within environment 100 to capture the performance of an actor 110. Additionally, one or more lights, for example LED lights, can be placed around stage 102 in order to project visible light onto the stage to establish desired lighting effects for the performance.
Embodiments of the invention can be used with a variety of different cameras and are not limited to any number of cameras or to any particular camera type. For the purpose of facial motion capture, some embodiments can include a camera system that includes at least two different types of cameras. For example, in some embodiments each camera 106 (or a subset of cameras 106) can include a first camera that is set up and configured to capture images of an actor in the visible light wavelength spectrum and one or more second cameras that are set up and configured to capture images of a small set of markers placed on the actor’s face in an invisible light wavelength spectrum, e.g., infrared (IR) or ultraviolet (UV) light wavelength spectrum. The first camera is sometimes referred to herein as a “taking camera” and the second cameras are sometimes referred to as “witness cameras”. It is to be appreciated that the words “visible” and “invisible” used herein are to be interpreted in relation to what is detectable by the naked eye. By being configured to capture light in different spectrums, the taking camera and the one or more witness cameras can simultaneously capture different aspects of a scene based on their respective light wavelengths, thereby eliminating the need to capture two separate performances of the same scene to generate content.
Example Camera System: An example of a camera system that can be used as one or more of the cameras 106 is discussed in U.S. Patent Application 16/102,556 (“the ‘556 application”), filed on Aug. 13, 2018 and entitled “Camera Systems for Motion Capture”. The ‘556 application published on April 25, 2019 as U.S. Publication 2019-0124244 and is incorporated herein by reference in its entirety. For convenience, an abbreviated description of an example of a camera system described in the ‘556 application is also depicted in Fig. 2, which is a simplified diagram of an exemplary configuration of a camera system 200 that can be used for facial performance capture according to some embodiments of the invention.
[0027] As shown in Fig. 2, camera system 200 can include a taking camera 202 along with two infrared (IR) witness cameras 204, 206 positioned on opposite sides of the taking camera.
In some embodiments the cameras can all be mounted to a moveable rig 210 and can be pointing in the same general direction such that all three cameras can capture the same scene but at different angles. Rig 210 can include wheels 208 that enable the rig to be easily moved around performance area 102 to capture different angles of an actor 110. In some embodiments, witness cameras 204, 206 can be rotated around respective pivot points 212 so that the witness cameras can be positioned at different angles with respect to support structure 305. For instance, as shown in Fig. 2, when actor 110 is positioned close to camera system 200, witness cameras 204, 206 can each be pivoted around respective pivot points 212 and be oriented at appropriate angles 214 so that the witness cameras are pointed at actor 110. On the other hand, when actor 110 is positioned further away from camera system 200, the witness cameras can be oriented at an increased angle 214 to actor 110.
A band-pass filter (not shown) can be placed on each IR camera 204, 206 such that each IR camera only captures a narrow spectrum in the IR domain. Additionally, each IR camera can be fitted with an “IR ring-shaped light” (not shown) made of a set of IR LEDs emitting in the desired spectrum. The light emitted by these rings is invisible to main camera 202 but produces a consistent “flat” illumination for IR cameras 204, 206 – a type of imagery that is friendlier to computer processing. Finally, the type of shading produced by these rings on the face is highly predictable since the light types and positions are precisely known, which can be used to solve for facial deformation based on shading observed on the plate by the witness camera. Other embodiments of camera system 200 do not require the witness cameras 204, 206 to be IR cameras and can instead employ witness cameras that use different spectra, but IR cameras can make the data captured relatively easy to process.
On-set Facial Performance Capture and Transfer: Embodiments of the invention and operation of camera system 200 can be better understood from an exemplary use case scenario described with respect to Fig. 3, which is a simplified flowchart depicting an on-set facial performance capture and transfer method 300 according to some embodiments of the invention. For example, unlike when capturing and transferring a facial performance from archival footage, on-set facial motion capture and transfer process 300 allows for access to the actors, the cameras and the set as part of set-up and/or initiation tasks (step 310). Such initiation tasks can be performed prior to filming a motion picture or other video sequence. For example, the actor’s face can be “scanned” in a few predetermined positions through image-based multi-view stereo techniques; the actor’s face reflectance properties (e.g. a diffuse color map) can be measured; the set or performance area (e.g., area 102) can be scanned and measured (e.g., by scanning the set with a LIDAR unit); lenses for the taking and witness cameras can be selected based on the conditions of the set along with appropriate aperture, ISO and other settings; distortion for the lenses used in camera system 200 can be measured and estimates of the camera’s intrinsic and extrinsic parameters can be made using known calibration techniques and known approaches to calibrating color; lighting for the set and cameras can be adjusted as appropriate; and the on-set illumination can be captured for each different lighting configuration that will be used during the shoot (e.g., by capturing a stereo pair of HDRI light probes in key locations, such as where the actor(s) will stand). Additionally, sensors can be placed at various locations on set to gather useful information for the reconstruction, most commonly in the form of the witness cameras 204, 206.
Deformation Model
Step 310 can also include building a facial rig for each actor. The facial rig can be a three-dimensional parameterized deformable model of the actor’s face. Parameters of the deformable model can be varied to generate different facial expressions of the actor, allowing the deformable model to be manipulated to mimic the actor’s facial expressions. Building the facial rig typically includes “scanning” the actor’s geometry in a set of predetermined poses.
For example, some embodiments can use Disney Research’s Medusa system to do the capture and rely on artists to clean up the capture result into a usable film-quality facial rig. In some embodiments, the facial rig can be made of a simple set of linear blendshapes as described generally in U.S. Patent No. 8,207,971, entitled “Controlling Animated Character Expressions”, which is incorporated by reference herein in its entirety. Other embodiments of the invention also support solving for a more complex rig with rotational-translational joints and skinning as well as arbitrary functional mapping between rig controls and final blend shape weights. Embodiments of the invention are not limited to deformable models based on blendshapes. For example, in other embodiments the three-dimensional parameterized deformable model can be made purely of per-vertex displacements. In still other embodiments, more sophisticated models that rely on per-patch deformation and don’t use blendshapes in the traditional sense of the term can be used. In various embodiments, different facial expressions can be attained by setting different parameter values for the deformable model. For example, for a three-dimensional parameterized deformable model based on blendshapes, different facial expressions can be attained from a linear combination of a selected set of facial expressions (i.e., blendshapes) from the facial rig. By varying one or more parameters associated with the linear combination, a range of facial expressions can be created while utilizing relatively small amounts of computational resources.
[0032] As an example, some embodiments of the invention use a deformation function that produces a facial expression mesh M by combining linearly a set of m three-dimensional blendshapes B0, B1, B2, … Bm, where each Bj is made of n vertices and represents a predefined canonical expression (e.g., inspired from Facial Action Coding System (FACS) shapes), where B0 is the neutral expression, and where per-vertex displacements δ are added.
A rotation R and translation t can also be applied to the resulting geometry. Thus, the deformation for a vertex of index i can be written as follows:

M(i) = [ B0(i) + Σj=1..m wj (Bj(i) − B0(i)) + δ(i) ] · R + t    (1)

where the wj are the blend shape weights, i.e. the weights used to combine the blend shapes linearly. The rotation R, the translation t, the blendshape weights wj and the per-vertex displacements δ(i) are the parameters of the deformable model.
This deformation is versatile in the sense that it incorporates both a strong prior in the form of blend shapes and a less constrained deformation component through the deltas (per-vertex 3D displacements), which enables expressions to be matched which, expectedly, go beyond the abilities of the blend shapes alone. Some embodiments also support more complex facial rigs and deformation functions which include rotational and/or translational joints and skinning (e.g., for the jaw) in addition to blendshapes and deltas. Some embodiments also support arbitrary functional mapping between a set of user-facing controls and final shape (or joint) weights.
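As a concrete illustration of equation (1), the following NumPy sketch evaluates the deformation for all vertices at once, assuming the neutral mesh, blendshapes and deltas are stored as arrays; the function and variable names are hypothetical and this is not the code of the embodiments described here.

```python
import numpy as np

def deform_mesh(B0, B, w, delta, R, t):
    """Evaluate equation (1) for every vertex of the mesh.

    B0    : (n, 3) neutral mesh B0.
    B     : (m, n, 3) blendshape targets B1 ... Bm.
    w     : (m,) blendshape weights wj.
    delta : (n, 3) per-vertex displacements.
    R     : (3, 3) rotation matrix, t : (3,) translation vector.
    Returns the deformed mesh M with shape (n, 3).
    """
    # Linear combination of blendshape offsets plus the unconstrained per-vertex deltas.
    offsets = np.tensordot(w, B - B0[None, :, :], axes=(0, 0))  # (n, 3)
    return (B0 + offsets + delta) @ R + t
```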
Referring again to step 310, the director or other party planning the performance capture (e.g., a Director of Photography) can also select various configurations for the cameras 106 to be used in the capture session including the type of camera and lens used in the shoot for each camera 106 (or where a camera 106 is a system including both taking and witness cameras, the type of camera and lens used with each such camera in the system), the settings on the camera(s), luminosity levels, etc. In some embodiments all cameras are jam-synchronized, but other embodiments are able to work with systems where only time code synchronization is available between the main camera and the witness cameras. In some embodiments, the witness cameras can be set to a short exposure time to limit the amount of motion blur in witness camera images.
As mentioned above, in some embodiments, a small number of markers (e.g., the markers can be applied to two, three, six, eight, or more points on an actor’s face) can be positioned on an actor’s face to assist in the motion capture process as described in U.S.
Patent Application No. 16/102,556, which as noted above is incorporated by reference herein.
The markers can be positioned on substantially rigid parts of an actor’s face to minimize distortion caused by facial movement during a performance. The markers enable motion capture of the actor’s skull as he or she is performing and can also be used for deformation tracking as well. The data generated from tracking the markers can be used for determining rotation and translation of the actor’s skull in each plate, as opposed to being used for tracking movements in the actor’s facial expressions in accordance with some traditional techniques as mentioned in the Background of the Invention section above.
[0036] Fig. 4 is a simplified illustration 400 of exemplary positions for markers 402a-g that enable motion capture of the skull of an actor 404 during a performance according to some embodiments of the present disclosure. As shown in Fig. 4, markers 402a and 402d can be positioned at the temples of actor 404, markers 402b and 402c can be positioned along the hairline of actor 404, marker 402e can be positioned on the nose bridge, and marker 402f can be positioned on the chin of actor 404. These positions are selected because they are generally substantially free of movement caused by facial expressions and/or talking. That way, the positions can closely track the movement of the skull of actor 404. By tracking these positions, the witness cameras can more accurately capture the movement of the actor’s head.
In some embodiments the markers can be retroreflective gel-based markers that reflect the invisible light (e.g., IR or UV light in the bandwidth captured by the witness cameras) but are not visible to the taking camera as there is generally no visible light emitted near the optical axis of the taking camera. The gel-based markers can be applied to an actor’s face as if they were makeup. As a retroreflective substance, each marker, when applied to an actor’s face, can act as a surface that reflects light back to its source with a minimum of scattering, along a vector that is parallel but opposite in direction from the light’s source. By being retroreflective, each marker can effectively negate any noise from ambient light. For instance, under normal lighting conditions indoors (i.e., absent lights directly beaming at the markers), the markers may not be visible or have negligible visibility. For instances where a set is positioned outside, the sun can emit vast amounts of IR light.
However, because the markers are retroreflective, the IR light emitted from the sun may not reflect back to the witness cameras. Instead, only the IR light emitted from the witness camera light sources (e.g., the ring of IR LEDs around the witness cameras’ lenses) will get reflected back to the witness cameras. Thus, even though a taking camera and one or more witness cameras are filming an actor with markers 402a-g, only the witness cameras will capture the markers.
[0038] By having two types of cameras 202 and 204, 206 with their respective light sources and using markers that are only visible to witness cameras and not a taking camera, camera system 200 can effectively and efficiently capture two different motion picture compositions with one shoot, i.e., one act of filming. Thus, with a single performance by actor 110, camera system 200 can capture images that are directly usable: (1) for an item of content (e.g., content that can be used in a cinematic release) and/or driving a digital character in a virtual environment, and (2) for accurately determining the location of a digital character mapped to the head of actor 110 in a virtual environment.
Capturing the Performance
After set-up and initiation tasks have been completed, camera system 200 can be used to capture the entire composition of a set, such as set 100, during a performance (step 320). For example, the taking camera 202 can be used to film actor 110 during a performance in which the actor is surrounded by a backdrop of rural mountains. Light sources 108 can be flood lights emitting white, visible light that illuminates the scene with visible light so that taking camera 202 can capture footage of actor 110 as he or she looks around, as well as any extras present in the scene and props, such as farm equipment that may be near the actor. Meanwhile, the IR lights associated with witness cameras 204, 206 can project onto the scene invisible IR light so that the witness cameras can simultaneously capture footage of markers 112 (which, as discussed in more detail herein, can be configured as retroreflectors that can substantially reflect IR light) on the face of actor 110. Accordingly, the markers may appear as bright dots in the images captured by witness cameras 204, 206.
Because taking camera 202 is generally unable to detect IR light, the images captured by taking camera 202 will likely not include portions of reflected IR light from markers 112. As a result, the images captured by taking camera 202 can be used directly in an item of content (e.g., as footage in a movie) and/or used to drive a digital replica of actor 110 based on a markerless motion solving process. In some embodiments, markers 112 can be detectable in both visible and invisible light spectrums. For instance, markers 112 can be black dots that are detectable in both visible light and IR light. In such instances, taking camera 202 and witness cameras 204, 206 can both capture the positions of markers 112, thereby enabling a more robust triangulation of the face of actor 110 during the performance.
Once the desired facial motion capture footage has been obtained, the footage can be used to generate a computer model of the performance thereby transferring the captured movement of the actor during the performance, including the actor’s facial expressions, to a three-dimensional model of the actor. The three-dimensional model can, in turn, be used to create visual effects that can be incorporated into animations, movies, video games and the like (step 330). A number of different inputs and data sets can be used to solve for the actor’s performance, i.e., to identify and transfer the facial expressions of the actor to those of a three-dimensional computer model. Some of the models and/or data that can be used to solve the performance can be built or otherwise compiled or created independent of the performance and thus can be done either before, during or after step 320. Other inputs that are used to solve for the performance (e.g., the tracked location of markers 402a-g during the performance) are created based on the performance itself and thus can be generated either during performance 320 or after the performance.
Transferring the Captured Performance to a Computer-Generated Model
Fig. 5 is a simplified flowchart of a method 500 of post-capture processing that can be performed as part of step 330 according to some embodiments of the invention. Method 500 can match facial expressions of an actor captured during a performance (e.g., step 320) to facial expressions of a computer-generated model of the actor. Method 500 can be performed on each and every plate in a sequence of video so that the facial expressions of a computer-generated model of the actor match the facial expressions of the actor throughout the entire video sequence. In some embodiments method 500 can be performed such that each plate in the sequence of video frames can be processed independently without depending on the processing or solving of one or more previous plates. Thus, some embodiments of the method 500 allow each plate of a filmed video sequence to be processed in parallel, taking advantage of the parallelization offered by computer clusters.
For each plate processed on a plate-by-plate basis, method 500 can start with various inputs including a plate from the performance capture session (block 502) and an initial facial mesh (block 504) representing a neutral geometry of a deformable model generated, for example, as described above with respect to Fig. 3, step 310. The initial facial mesh (i.e., initial deformable model) can include the rigid adjustment (rotation and translation), the blend shape weights and the per-vertex deltas for the deformable model that define the neutral geometry. A differentiable renderer (block 506) can render the initial facial mesh and then method 500 can solve the deformation from the plate (block 510) by trying to minimize the differences between the initial deformable model (i.e., neutral expression) and the actor’s actual facial expression in the plate using a recipe (i.e., a sequence of deformation solvers as discussed below) based on various inputs as described below over a series of n iterations. Thus, the solver in block 510 calculates an expression of the deformable model that is closest to the expression of the actor in the plate.
Each of the n iterations involved with solving the deformation in block 510 generates a revised version of the deformable model (i.e., updated values for the parameters of the deformable model) that changes in each iteration from the initial neutral expression of block 504 to an expression that comes closer and closer to resembling the actor’s actual facial expression in the plate. The plate can be an image made up of millions of pixels where each pixel has a particular RGB value. In each iteration, block 510 uses the differentiable renderer (block 506) to generate a rendering of the deformable model for the particular iteration along with derivatives. The differentiable render is an image made up of pixels and, having access to derivatives of pixel color values with respect to parameters of the model generated by the differentiable renderer, the solver tries to minimize the differences between the RGB values of the plate and the RGB values of corresponding pixels in the rendered version of the deformable model. In each iteration the output of the solver (block 510) will get closer and closer to the actual expression of the actor in the plate until the final iteration produces a final facial mesh (block 520) in which the parameters of the deformable model (e.g., the various weights of the blendshapes and the values of the rigid rotation, translation and the per-vertex displacements) result in a facial expression that very closely matches the expression of the actor in the plate. Since embodiments of the invention provide the solver with a very dense set of pixels in each iteration, the solver can produce a more detailed solution for the performance compared to solutions calculated by traditional marker-based systems that are limited in the detail they capture by the number of markers being tracked.
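As a rough sketch of this kind of iterative minimization (a generic gradient-descent loop, not the specific solvers or recipes used in these embodiments), the code below repeatedly renders the deformable model and nudges its parameters to reduce the per-pixel RGB difference to the plate. `render_with_grads` is a hypothetical stand-in for a differentiable renderer that also returns derivatives of pixel colors with respect to the model parameters.

```python
import numpy as np

def solve_expression(plate_rgb, params, render_with_grads, n_iters=100, lr=1e-3):
    """Fit deformable-model parameters to a plate by gradient descent.

    plate_rgb         : (H, W, 3) RGB image of the plate.
    params            : (P,) current model parameters (blendshape weights, R, t, deltas).
    render_with_grads : assumed callable returning the rendered (H, W, 3) image and the
                        (H*W*3, P) Jacobian of pixel colors w.r.t. the parameters.
    """
    for _ in range(n_iters):
        rendered, jacobian = render_with_grads(params)
        residual = (rendered - plate_rgb).ravel()  # per-pixel RGB differences
        grad = jacobian.T @ residual               # gradient of 0.5 * ||residual||^2
        params = params - lr * grad                # step the model closer to the plate
    return params
```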
Inputs for the Transferring Process
[0045] When solving the deformation in block 510, embodiments of the invention can use some or all of the following inputs (block 502) in addition to the footage of the actor whose facial expressions are being captured (i.e., the plate, also in block 502):
1) A facial rig of the actor, which includes a 3D mesh of the actor’s face with a neutral expression and a set of canonical expressions also represented as 3D meshes (also known as blend shapes). The facial rig can be made up of the following components and can be built as described above: a three-dimensional mesh B0 of the face in a neutral pose comprising n vertices, and a set of m three-dimensional meshes B1, B2, … Bm, where each Bj is made of n vertices and represents a predefined canonical expression (e.g., inspired from Facial Action Coding System (FACS) shapes).
2) The camera rig – calibrated and match-moved as described below.
3) A small set (e.g., 4-8) of 2D markers chosen on as rigid as possible places of the face and tracked throughout the footage as described above, and/or a set of virtual landmarks added to the face in various predetermined locations using known estimation techniques.
4) The rigid motion of the 3D facial mesh throughout the footage, i.e., an estimate of the rotational and translational components of the head for each frame. The IR dots visible in the witness cameras can be used to triangulate the positions of these markers in 3D and solve for the rigid head motion which best satisfies the 3D dot positions at every frame. While this rigid motion is not expected to be perfectly accurate, it can be refined later during facial capture.
5) A hand-matched pose for a reference frame – i.e. for one of the frames of the footage an artist manually dials in facial rig controls to best match the expression from the plate. In the case of strong head rotation, it can be useful to produce two or three reference frames rather than one to improve the albedo and lighting estimate (described below). This pose matching can also be done automatically (albeit more approximately) leveraging machine-learning-based virtual facial landmarks.
6) A virtual light rig built as described below.
7) Flattened rotoscoping splines and masks as described below.
8) The albedo measured on the light stage.
Some of the above inputs can be generated from data processed on a per-shot (i.e., a continuous sequence of frames of digital film) basis as opposed to a per-plate basis. For example, for each shot, one or more of the following can be done, several of which can be required for lighting of the final frame and shared with the lighting department:
1) A virtual light rig can be built by, for example: stitching the HDRI light probes into lat-long images; using stereo view geometry, turning the key lights in the HDRI probes into actual virtual lights (e.g., rectangular area lights); and using the gray sphere, chrome sphere and/or Macbeth chart, adjusting light intensities and colors.
2) Match move the camera rig by, for example, using footage of calibration devices, solving for camera intrinsic and extrinsic parameters and the relative transformations between the main camera and the witness cameras. And, using standard match-moving techniques (markers on set, etc.), solving for the rig transformation matrix during the shot.
3) Rotoscoping splines by, for example, drawing eye lid and outer lip splines as view-independent splines (i.e., that are “drawn” on the mesh) and inner lip view-dependent splines (i.e., which delineate the occluding contours of the inner lips). Rotoscoping splines can be replaced by machine-learning based facial virtual landmarks if desired.
4) Rotoscoping shapes by, for example, drawing shapes to define occluding masks – any object which occludes the face at any point during a shot can be drawn as a closed 2D shape.
5) Flatten all two-dimensional elements by, for example, using the lens distortion measurements done earlier. The lens distortion as an image-space map can be inverted and applied to the 2D elements (such as the plate, the rotoscoping splines, occluding masks, etc.).
6) Model occluding geometry by, for example, producing, for any object that casts a significant shadow on the actor’s face, a 3D mesh that approximates the occluding geometry.
Embodiments of the invention can solve for the performance in block 510 with a differentiable renderer based on some or all of the above inputs, using appearance and/or shading to infer geometry as opposed to using a standard VFX rendering system. For example, some embodiments can employ shape from shading techniques that leverage gradient patterns on the image to provide clues as to what the actor’s face is doing at the time the image was taken and use the gradient patterns to estimate what deformation the actor’s face is undergoing based on the image.
A simplified shading model can accommodate the differentiability constraints imposed by an optimization framework while maintaining acceptable performance. In some embodiments the surface reflectance model can be a simple diffuse Lambertian model and four types of lights can be supported, including: environment light, rectangular area light, directional light and point light. Embodiments can represent the environmental illumination using a second order Spherical Harmonics basis representation (i.e. nine components) or a higher order basis representation.
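For reference, diffuse irradiance under a nine-component (second order) spherical harmonics environment light can be evaluated in closed form with the standard Lambertian convolution weights. The sketch below is a generic implementation of that well-known formula, included only to illustrate the idea; it is not the shading code of these embodiments.

```python
import numpy as np

# Per-band Lambertian (cosine) convolution weights for bands l = 0, 1, 2,
# expanded to one weight per SH coefficient.
BAND = np.array([np.pi, 2.0 * np.pi / 3.0, np.pi / 4.0])
A = BAND[[0, 1, 1, 1, 2, 2, 2, 2, 2]]

def sh_irradiance(L, n):
    """Diffuse irradiance at a point with unit normal n.

    L : (9, 3) RGB spherical harmonics coefficients of the environment light.
    """
    x, y, z = n
    Y = np.array([
        0.282095,                                   # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # l = 1
        1.092548 * x * y, 1.092548 * y * z,         # l = 2
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ])
    return (L * (A * Y)[:, None]).sum(axis=0)       # RGB irradiance
```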
[0049] All lights can be initialized using the light rig measured on set. In particular, the Spherical Harmonics components of the environment light can be initialized by projecting the HDRI measured on set onto the Spherical Harmonics basis. For rectangular area lights, the light geometry can be known from the stereo pair of HDRI images (and potentially with the help of the scan). Their emission color can be approximated by averaging the full emission texture as photographed on set. Directional lights can be used to model illumination from the sun, and point lights can occasionally be used as a simpler approximation for finite size lights which are far away from the subject.
For all these lights, irradiance can be computed analytically using closed-form differentiable expressions, as described further. Shadows can be approximated using stochastic Monte-Carlo integration and multiplied with the unshadowed irradiance to get the final reflected radiance. While this approximation may not be entirely correct (taking the visibility term outside of the rendering integral), it is often good enough for the purpose it is required for and makes the approach practical.
For the environment light, efficiency can be improved by computing a visibility term V as the proportion of samples for which the environment is unoccluded, where the light samples are importance-sampled according to the energy defined by the Spherical Harmonics components. For rectangular area lights, samples on the light geometry can be distributed and, again, the proportion of occluded shadow rays against the full set of samples drawn can be computed. Shadowing for directional and point lights can also be done. Note that, in some embodiments, the visibility term is not easily differentiable and can be considered a constant term in the optimization. Its value can be updated at every step of the iterative solve.
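A minimal sketch of that visibility estimate might look like the following, where `occludes` is an assumed ray-occlusion query against the scene geometry and `sample_dirs` are the importance-sampled light directions; the actual sampling and ray-tracing machinery of the embodiments is not reproduced here.

```python
def visibility(point, sample_dirs, occludes):
    """Estimate the visibility term V at a shading point.

    point       : 3D shading point on the face mesh.
    sample_dirs : light directions importance-sampled from the environment's SH energy.
    occludes    : assumed callable (origin, direction) -> bool for a shadow ray.
    Returns the fraction of sample directions that reach the light unoccluded.
    """
    unoccluded = sum(1 for d in sample_dirs if not occludes(point, d))
    return unoccluded / len(sample_dirs)
```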
[0052] In some embodiments the model includes an albedo term α represented as an RGB color for each vertex of the mesh. The albedo value at an arbitrary point on the surface of the mesh can be obtained through barycentric interpolation of the albedo at the triangle vertices where the point lies. With this model, the radiance scattering off a 3D point p of normal n on the mesh under an illumination defined by the Spherical Harmonics components …
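For completeness, the barycentric interpolation of per-vertex albedo described above can be written as a simple weighted sum; the names below are illustrative only.

```python
import numpy as np

def interpolate_albedo(bary, vertex_albedos):
    """Albedo at a surface point inside a triangle.

    bary           : (3,) barycentric coordinates of the point within its triangle.
    vertex_albedos : (3, 3) per-vertex RGB albedo values of that triangle.
    """
    return np.asarray(bary) @ np.asarray(vertex_albedos)  # weighted sum of vertex colors
```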

Claims (19)

1. A sampling apparatus (100) for taking a representative milk sample in a predetermined quantity range from a conveying line (10) in which milk is conveyed at conveying intervals of unknown length, comprising: a pump (108), a controller (109) of the pump (108), and a sample container connection element (112) connected to the pump (108), wherein the controller (109) is configured to control a pulsed operation of the pump (108) in a first mode of operation in a conveying interval based on the predetermined quantity range and a measured quantity indicating a flow rate of the milk conveyed in the conveying line (10) and/or a measured variable indicating a flow rate of the milk conveyed in the conveying line (10) and/or a predetermined variable indicating a total quantity of milk to be conveyed through the conveying line (10) in a conveying interval, wherein the pump (108) conveys a discrete sample subset along a first conveying direction (122) of the pump (108) during pulsed operation in each sample pulse interval, wherein the predetermined quantity range is greater than or equal to a total quantity corresponding to a total number of discrete sample subsets in the conveying interval, wherein the controller (109) is further configured to reduce a delivery velocity of the pump (106) and increase a frequency of sample pulse intervals as the flow rate of the milk conveyed in the conveying line (10) decreases.
2. Sampling apparatus (100) according to claim 1, wherein the controller (109) is further configured to control the operation of the pump (108) based on a comparison of an actual quantity corresponding to an actual number of sample subsets conveyed and the predetermined quantity range.
3. Sampling apparatus (100) according to claim 2, wherein the controller (109) is configured to control the operation of the pump (108) to interrupt the first mode of operation and convey a subset of the actual quantity of the actual number of sample subsets conveyed at an interruption interval in a second mode of operation in a second conveying direction (124) oriented opposite the first conveying direction (122).
4. Sampling apparatus (100) according to claim 3, wherein the controller (109) is further configured to control the pulsed operation of the pump (108) to update, in the first mode of operation, a number of sample pulse intervals for the remaining duration of the conveying interval and/or a size of the sample pulse intervals after the interruption interval based on a remaining actual quantity.
5. Sampling apparatus (100) according to claim 3 or 4, further comprising a mixing device (116, 118) adapted to mix the actual quantity collected in a sample container prior to the interruption interval, wherein the controller (109) is further configured to activate the mixing device (116, 118) in the second mode of operation prior to operation of the pump (108) in the interruption interval.
6. Sampling apparatus (100) according to one of claims 1 to 5, wherein the controller (109) is configured to update a number of sample pulse intervals based on the measured size after at least one sample pulse interval.
7. Sampling apparatus (100) according to one of claims 1 to 6, further comprising a pair of ring electrodes (14) adapted to detect a conductance of milk conveyed in the conveying line (10).
8. Sampling apparatus (100) according to claim 7, further comprising a settling chamber and a pair of electrodes arranged in the settling chamber for detecting a conductivity of milk in the settling chamber.
9. Sampling apparatus (100) according to claim 7 or 8, further comprising a further pair of ring electrodes (14') adapted to detect a conductance of milk conveyed in the conveying line (10), wherein the controller (109) is configured to control operation of the pump (108) based on the flow rate of milk conveyed in the conveying line (10) based on conductance values detected by the pair of ring electrodes (14) and the further pair of ring electrodes (14').
10. Sampling apparatus according to one of claims 1 to 9, further comprising an optical flow sensor having a light source and a light detector connected to the controller (109), wherein the controller (109) is further configured to control the pump (108) based on data output from the optical flow sensor.
11. Method for taking representative milk samples in a predetermined quantity range from a conveying line (10) in which milk is conveyed in conveying intervals of unknown length, wherein the predetermined quantity range is greater than or equal to a total quantity corresponding to a total number of discrete sample subsets in the conveying interval, and wherein the method comprises in a first mode of operation determining an operating rate of a pump (108) based on the predetermined quantity range and a predetermined value that estimates a length of a conveying interval and/or estimates a total quantity conveyed through the conveying line (10) in the conveying interval and/or estimates a quantity that indicates a flow rate or change thereof conveyed in the conveying line (10) in the conveying interval, the operating rate defining a certain number of sample pulse intervals, wherein the pump (108) conveys a discrete sample subset along a first conveying direction (122) in each sample pulse interval, operating the pump (108) based on the determined operating rate in the first conveying direction (122), wherein with decreasing flow rate of the milk conveyed in the conveying line (10), a conveying velocity of the pump (108) is reduced and a frequency of sample pulse intervals is increased.
12. Method according to claim 11, the method comprising during operation of the pump (108): detecting an actual value indicating the flow rate in the conveying line (10) and/or an actual value indicating a flow velocity of the milk conveyed in the conveying line (10), updating the operating rate based on the detected actual value, and operating the pump (108) based on the updated operating rate in the first conveying direction (122).
13. Method according to claim 11 or 12, wherein operation of the pump (108) in the first mode of operation is interrupted when an actual quantity of an actual number of sample subsets conveyed approaches the predetermined quantity range up to a predetermined distance and it is determined on the basis of the detected actual value that a current conveying interval has not yet ended.
14. Method according to claim 13, further comprising operating the pump (108) in a second mode of operation during an interruption interval in a second conveying direction (124) opposite the first conveying direction (122) to convey a subset of the actual quantity of the conveyed actual number of sample subsets in the second conveying direction (124).
15. Method according to claim 13, wherein the second mode of operation further comprises mixing the actual quantity collected in a sample container prior to operating the pump (108) in the interruption interval.
16. Method according to one of claims 13 to 15, further comprising updating the operating rate of the pump (108) based on a remaining actual quantity after the interruption interval.
17. Method according to one of claims 11 to 16, further comprising updating a number of sample pulse intervals based on the measured quantity after at least one sample pulse interval in the first mode of operation.
18. Method according to one of claims 11 to 17, where the determined operating rate in the first mode of operation specifies a sample subset to be conveyed three times per minute.
19. Method according to one of claims 11 to 18, further comprising operating the pump (108) in a third mode of operation after the first mode of operation has been completed, wherein the pump (108) is operated in the third mode of operation at a flushing interval for delivery along the second direction of delivery (124).
NZ762338A 2019-03-07 2020-03-04 On-set facial performance capture and transfer to a three-dimensional computer-generated model NZ762338B2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962814994P 2019-03-07 2019-03-07
US62/814,994 2019-03-07
US16/681,300 US11069135B2 (en) 2019-03-07 2019-11-12 On-set facial performance capture and transfer to a three-dimensional computer-generated model
US16/681,300 2019-11-12

Publications (2)

Publication Number Publication Date
NZ762338A NZ762338A (en) 2021-09-24
NZ762338B2 true NZ762338B2 (en) 2022-01-06


Similar Documents

Publication Publication Date Title
AU2020201618B2 (en) On-set facial performance capture and transfer to a three-dimensional computer-generated model
Meka et al. Lime: Live intrinsic material estimation
US11115633B2 (en) Method and system for projector calibration
US11671717B2 (en) Camera systems for motion capture
US10380802B2 (en) Projecting augmentation images onto moving objects
US20200013223A1 (en) Method and System for Representing a Virtual Object in a View of a Real Environment
US10133171B2 (en) Augmenting physical appearance using illumination
Martull et al. Realistic CG stereo image dataset with ground truth disparity maps
US8207963B2 (en) System and method for performing motion capture and image reconstruction
US11425283B1 (en) Blending real and virtual focus in a virtual display environment
Fender et al. Optispace: Automated placement of interactive 3d projection mapping content
CN110458964B (en) Real-time calculation method for dynamic illumination of real environment
Fukamizu et al. Elamorph projection: Deformation of 3d shape by dynamic projection mapping
JP7333437B2 (en) 3D digital model surface rendering and conversion
NZ762338B2 (en) On-set facial performance capture and transfer to a three-dimensional computer-generated model
GB2584192A (en) On-set facial performance capture and transfer to a three-dimensional computer-generated model
AU2020368983B2 (en) Method and system for rendering
Lensch et al. A framework for the acquisition, processing, transmission, and interactive display of high quality 3D models on the web
Abad et al. Integrating synthetic objects into real scenes
CN116368350A (en) Motion capture calibration using targets