NZ762338B2 - On-set facial performance capture and transfer to a three-dimensional computer-generated model - Google Patents
- Publication number
- NZ762338B2
- Authority
- NZ
- New Zealand
Abstract
A method of transferring a facial expression from a subject to a computer-generated character that includes receiving a plate with an image of the subject’s facial expression, a three-dimensional parameterized deformable model of the subject’s face where different facial expressions of the subject can be obtained by varying values of the model parameters, a model of a camera rig used to capture the plate, and a virtual lighting model that estimates lighting conditions when the image on the plate was captured. The method can solve for the facial expression in the plate by executing a deformation solver to solve for at least some parameters of the deformable model with a differentiable renderer and shape-from-shading techniques, using, as inputs, the three-dimensional parameterized deformable model, the model of the camera rig and the virtual lighting model over a series of iterations to infer geometry of the facial expression and generate a final facial mesh using the set of parameter values of the deformable model which result in a facial expression that closely matches the expression of the subject in the plate.
Description
ON-SET FACIAL PERFORMANCE CAPTURE AND TRANSFER TO A
THREE-DIMENSIONAL COMPUTER-GENERATED MODEL
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Patent Application No. 62/814,994, filed
March 7, 2019 and U.S. Patent Application No. 16/681,300, filed November 12, 2019, the
disclosures of which are incorporated by reference herein in their entirety for all purposes.
FIELD
The present disclosure relates generally to performance capture, and more
specifically to methods, techniques and systems for capturing facial expressions from a
subject during a performance and transferring the captured expressions to a three-dimensional
model of the subject.
BACKGROUND OF THE INVENTION
Facial expression transfer is the act of adapting the facial expressions of a subject,
such as an actor, to a three-dimensional computer-generated (CG) model that can be used to
create visual effects that can then be incorporated into animations, movies, video games and
the like. Mastering facial expression transfer and other aspects of facial animation is a long-standing
challenge in computer graphics. The face can describe the emotions of a character,
convey their state of mind, and hint at their future actions. Audiences are particularly trained
to look at faces and identify these subtle characteristics. Accurately capturing the shape and
motion of real human faces in the expression transfer process can play an important role in
transferring subtle facial expressions of the subject to a computer-generated character giving
the computer-generated character natural, life-like expressions.
In order to transfer facial expressions from a subject to a computer-generated
model, the subject’s facial expressions first have to be captured, for example, on digital film
or another appropriate medium. Some traditional techniques that are used to capture facial
expressions of a subject (e.g., of an actor during a performance) rely on numerous markers
positioned at strategic locations on an actor’s face and a head-mounted, high-resolution
camera that is directed towards the actor’s face. The camera can then be used to film the
actor’s face during his or her performance. Software can track movement of the markers as
the actor’s face displays different expressions during the performance and translate the
marker movement into a computer-generated model that mimics the actor’s facial
expressions.
[0005] While such techniques have been successfully used in a variety of different
situations including in various well-known films, it can be cumbersome and distracting to
actors to wear a head-mounted camera during a performance and to have their faces covered
with dozens of markers.
SUMMARY OF THE INVENTION
[0006] Embodiments of the disclosure pertain to methods and systems for capturing the
facial expressions of an actor or other subject without the use of a head-mounted camera and
in film set conditions. The captured facial expressions can be transferred to a three-dimensional
parameterized deformable model of the actor or subject and used in the context
of visual effects production, including but not limited to, animations, movies, video clips,
video games, and virtual and/or augmented reality content. In some embodiments the method
iteratively deforms a three-dimensional mesh with the goal to minimize the difference
between a 3D render of that mesh and the plate (i.e., a frame from the captured footage). A
differentiable renderer can be used to generate the 3D face renders making it possible to
leverage well-known derivative-based minimization techniques to meet the goal.
[0007] Some embodiments of the invention provide a method of transferring a facial
expression from a subject to a computer generated character. The method includes receiving
a plate with an image of the subject’s facial expression, a three-dimensional parameterized
deformable model of the subject’s face where different facial expressions of the subject can
be obtained by varying values of the model parameters, a model of a camera rig used to
capture the plate, and a virtual lighting model that estimates lighting conditions when the
image on the plate was captured. The method can solve for the facial expression in the plate
by executing a deformation solver to solve for at least some parameters of the deformable
model with a differentiable renderer and shape from shading techniques, using, as inputs, the
three-dimensional parameterized deformable model, the model of the camera rig and the
virtual lighting model over a series of iterations to infer geometry of the facial expression and
generate a final facial mesh using the set of parameter values of the deformable model which
result in a facial expression that closely matches the expression of the subject in the plate.
In some embodiments the three-dimensional parameterized deformable model can
include a plurality of blendshapes that represent different facial expressions of the subject and
include a set of blendshape weight values, one per blendshape. The final facial mesh is
obtained by combining a set of weighted blendshapes that best mimic the facial expression in
the plate. In various embodiments the deformable model can also include rotation and
translation values that represent a rigid adjustment of the subject’s head as well as a delta
vector that represents a per vertex displacement used in transferring the facial expression of
the subject to the computer-generated character, which can be particularly useful where the
computer-generated character has a head sized or shaped differently than the head of the
subject.
In some embodiments the plate can be an image made up of thousands or even more
than a million pixels. Each pixel can have a particular RGB value. During each iteration of
the series of iterations the differentiable renderer can generate a rendering of the deformable
model and a solver can then try to minimize differences between the RGB values of the plate
and the RGB values of corresponding pixels in the rendered version of the deformable model.
An initial iteration of the solving can include: receiving an initial facial mesh generated from
the three-dimensional deformable model representing a neutral expression of the subject;
trying to minimize differences between RGB values of the plate and RGB values of the
rendered initial facial mesh representing the neutral expression; and generating an updated
facial mesh including a set of weighted blendshapes that represents a facial expression of the
subject that is more similar to the facial expression of the subject in the plate than is the initial
facial mesh. In each additional iteration of the solving step, an output of that iteration can be
generated that is closer to the actual representation of the subject in the plate than an output
of the previous iteration.
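The iteration loop described above can be sketched in code. The sketch below is illustrative only: it replaces the differentiable renderer with a toy linear function of the model parameters (the matrix `A` and all values are invented for illustration), since a real renderer is far more complex, but the derivative-based update that shrinks the per-pixel RGB difference between render and plate has the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the differentiable renderer: "pixel" values are a
# linear function of the deformable-model parameters.  A real renderer
# is nonlinear, but the update loop has the same structure.
A = rng.normal(size=(12, 4))            # 12 pixel values, 4 parameters
true_params = np.array([0.8, -0.3, 0.5, 0.1])
plate = A @ true_params                 # RGB values observed on the plate

params = np.zeros(4)                    # start from the neutral expression
losses = []
for _ in range(200):                    # the "series of iterations"
    rendered = A @ params               # differentiable render of the mesh
    residual = rendered - plate         # per-pixel difference to the plate
    losses.append(float(residual @ residual))
    grad = 2.0 * A.T @ residual         # analytic gradient of the loss
    params -= 0.01 * grad               # derivative-based minimization step
```

Each pass through the loop produces a parameter set whose render is closer to the plate than the last, mirroring the iteration-by-iteration refinement described above.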
[0010] In some embodiments, solving for the facial expression in the plate can include
executing a plurality of different solvers where each solver executes multiple iterations before
the next solver is run and where each solver has at least one cost function associated with it
that defines an objective that the solver tries to minimize. The plurality of different solvers
can be executed in a predetermined sequence that is defined by a recipe selected from a
library that stores multiple predetermined recipes. Each predetermined recipe in the library
can include one or more deformation solvers each of which has at least one cost function
associated with it.
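A recipe of this kind can be represented as an ordered list of solver steps, each carrying its own cost function and iteration budget. The sketch below is a hedged illustration, not the patent's implementation: the solver is a trivial one-dimensional coordinate-descent stand-in, and the recipe names and cost functions are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SolverStep:
    """One deformation solver in a recipe: a cost function defining the
    objective it tries to minimize, run for a fixed number of iterations
    before the next solver starts."""
    name: str
    cost: Callable[[dict], float]
    iterations: int

@dataclass
class Recipe:
    steps: List[SolverStep] = field(default_factory=list)

    def run(self, params: dict, minimize_once: Callable) -> dict:
        # Execute the solvers in their predetermined sequence.
        for step in self.steps:
            for _ in range(step.iterations):
                params = minimize_once(params, step.cost)
        return params

# Hypothetical library of predetermined recipes, keyed by name.
RECIPE_LIBRARY = {
    "default": Recipe([
        SolverStep("rigid_align", lambda p: p["x"] ** 2, iterations=5),
        SolverStep("blendshape_fit", lambda p: (p["x"] - 1.0) ** 2, iterations=5),
    ]),
}

def coordinate_descent_step(params, cost, eps=0.25):
    # Minimal stand-in for a real deformation solver: try a small move in
    # each direction and keep whichever lowers the cost.
    best = dict(params)
    for trial in (params["x"] - eps, params["x"] + eps):
        if cost({"x": trial}) < cost(best):
            best = {"x": trial}
    return best

solved = RECIPE_LIBRARY["default"].run({"x": 2.0}, coordinate_descent_step)
```

Running all iterations of one solver before starting the next matches the sequencing described above; swapping in a different entry from the library changes the objectives without changing the execution machinery.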
In additional embodiments, a method of transferring a facial expression from a
subject during a performance to a computer generated character can include obtaining at
least: (i) digital video footage of the performance in the format of a plurality of sequentially
ordered plates each of which includes an image of the subject’s facial expression during the
performance; (ii) a three-dimensional parameterized deformable model of the subject’s face
where different facial expressions of the subject can be obtained by varying the values of the
model parameters; (iii) a model of a camera rig used to capture the performance; and (iv) a
virtual lighting model that estimates lighting conditions used during the performance. The
method can further include generating a computer model of the performance by, for each
individual plate in the plurality of sequentially ordered plates, processing the individual plate
independently of other plates in the plurality to solve for the facial expression in the plate
being processed using a differentiable renderer with shape-from-shading techniques over a
series of iterations to infer geometry of the facial expression and generate a final facial mesh
using the set of parameter values for the deformable model which result in a facial expression
that closely matches the expression of the subject in the plate being processed where the
solving uses the three-dimensional deformable model, the camera rig and the virtual lighting
model as inputs.
To better understand the nature and advantages of the present invention, reference
should be made to the following description and the accompanying figures. It is to be
understood, however, that each of the figures is provided for the purpose of illustration only
and is not intended as a definition of the limits of the scope of the present invention. Also, as
a general rule, and unless it is evident to the contrary from the description, where elements in
different figures use identical reference numbers, the elements are generally either identical
or at least similar in function or purpose.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified diagram of an exemplary environment in which embodiments
of the present invention can be employed;
FIG. 2 is a simplified diagram of an exemplary configuration of a camera system
that can be used for facial performance capture according to some embodiments of the
invention;
FIG. 3 is a simplified flowchart depicting a facial performance capture and
expression transfer method according to some embodiments of the invention;
FIG. 4 is a simplified illustration of exemplary positions for a small set of gel-based
markers that enable motion capture of the skull of an actor during a performance according to
some embodiments of the invention;
FIG. 5 is a simplified flowchart of steps associated with matching facial
expressions of an actor captured during a performance to facial expressions of a computer-generated
model of the actor according to some embodiments of the invention;
FIG. 6 is a simplified block diagram of an exemplary recipe that can be executed
in block 510 of the method shown in FIG. 5 according to some embodiments of the
invention;
[0019] FIG. 7 is a simplified block diagram of a system for creating computer generated
imagery (CGI) and computer-aided animation that can implement or incorporate various
embodiments in accordance with the disclosure; and
FIG. 8 is a block diagram of an exemplary computer system according to some
embodiments of the invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Embodiments of the invention pertain to methods and systems for capturing the
facial expressions of an actor or other subject during a performance without the use of a
head-mounted camera and in film set conditions allowing the actor to have full freedom of motion
and full freedom of interaction with the other actors and with the set. In this manner,
embodiments of the invention enable film-production quality facial motion capture with as
little encumbrance as possible for the actor so as to not compromise his or her performance.
Embodiments also impose as little restriction as possible on the on-set filming conditions, e.g.
location, illumination conditions, camera models and settings, and the like.
Once captured, embodiments of the invention further pertain to transferring the
captured facial expressions to a three-dimensional parameterized deformable model of the
actor that can be used in the context of visual effects production. In some embodiments the
method iteratively deforms a three-dimensional mesh with the goal to minimize the
difference between a 3D render of that mesh and the plate (i.e., a frame from the captured
footage). A differentiable renderer can be used by itself or in conjunction with other
elements to generate the 3D face renders making it possible to leverage well-known
derivative-based minimization techniques to meet the goal.
Example Performance Environment:
In order to better understand and appreciate embodiments of the invention,
reference is made below to FIG. 1, which is a simplified diagram of an exemplary
environment 100 in which embodiments of the present invention can be employed.
Environment 100 can include a performance area 102 and a backdrop 104. Performance area
102 can be a stage or any area in which one or more actors can carry out a performance.
Backdrop 104 can be a green screen that facilitates post-production work or can include
scenery that is appropriate for the performance. For example and solely for illustrative
purposes, in FIG. 1 backdrop 104 is an outdoor scene that includes mountains and clouds. In
some embodiments, some or all of the scenery of backdrop 104 can be generated on a
computer and displayed on one or more displays, such as large LCD or LED displays, that
surround performance area 102.
One or more cameras 106 can be positioned at strategic locations (e.g., locations
that help with the capture and/or locations that are desirable for the director cinematically)
within environment 100 to capture the performance of an actor 110. Additionally, one or
more lights, for example LED lights, can be placed around stage 102 in order to project
visible light onto the stage to establish desired lighting effects for the performance.
Embodiments of the invention can be used with a variety of different cameras and
are not limited to any number of cameras or to any particular camera type. For the purpose of
facial motion capture, some embodiments can include a camera system that includes at least
two different types of cameras. For example, in some embodiments each camera 106 (or a
subset of cameras 106) can include a first camera that is set up and configured to capture
images of an actor in the visible light wavelength spectrum and one or more second cameras
that are set up and configured to capture images of a small set of markers placed on the
actor’s face in an invisible light wavelength spectrum, e.g., infrared (IR) or ultraviolet (UV)
light wavelength spectrum. The first camera is sometimes referred to herein as a “taking
camera” and the second cameras are sometimes referred to as “witness cameras”. It is to be
appreciated that the words “visible” and “invisible” used herein are to be interpreted in
relation to what is detectable by the naked eye. By being configured to capture light in
different spectrums, the taking camera and the one or more witness cameras can
simultaneously capture different aspects of a scene based on their respective light
wavelengths, thereby eliminating the need to capture two separate performances of the same
scene to generate content.
Example Camera System:
An example of a camera system that can be used as one or more of the cameras 106
is discussed in U.S. Patent Application 16/102,556 (“the ‘556 application”), filed on Aug. 13,
2018 and entitled “Camera Systems for Motion Capture”. The ‘556 application published on
April 25, 2019 as U.S. Publication 2019-0124244 and is incorporated herein by reference in
its entirety. For convenience, an abbreviated description of an example of a camera system
described in the ‘556 application is also depicted in FIG. 2, which is a simplified diagram of
an exemplary configuration of a camera system 200 that can be used for facial performance
capture according to some embodiments of the invention.
[0027] As shown in FIG. 2, camera system 200 can include a taking camera 202 along with
two infrared (IR) witness cameras 204, 206 positioned on opposite sides of the taking camera.
In some embodiments the cameras can all be mounted to a moveable rig 210 and can be
pointing in the same general direction such that all three cameras can capture the same scene
but at different angles. Rig 210 can include wheels 208 that enable the rig to be easily moved
around performance area 102 to capture different angles of an actor 110. In some
embodiments, witness cameras 204, 206 can be rotated around respective pivot points 212 so
that witness cameras can be positioned at different angles with respect to support structure
305. For instance, as shown in FIG. 2, when actor 110 is positioned close to camera system
200, witness cameras 204, 206 can each be pivoted around respective pivot points 212 and be
oriented at appropriate angles 214 so that the witness cameras are pointed at actor 110. On
the other hand, when actor 110 is positioned further away from camera system 200, the
witness cameras can be oriented at an increased angle 214 to actor 110.
A band-pass filter (not shown) can be placed on each IR camera 204, 206 such
that each IR camera only captures a narrow spectrum in the IR domain. Additionally, each
IR camera can be fitted with an “IR ring-shaped light” (not shown) made of a set of IR LEDs
emitting in the desired spectrum. The light emitted by these rings is invisible to main camera
202 but produces a consistent “flat” illumination for IR cameras 204, 206 – a type of imagery
that is friendlier to computer processing. Finally, the type of shading produced by these rings
on the face is highly predictable since the light types and positions are precisely known,
which can be used to solve for facial deformation based on shading observed on the plate by
the witness camera. Other embodiments of camera system 200 do not require the witness
cameras 204, 206 to be IR cameras and can instead employ witness cameras that use a
different spectrum, but IR cameras can make the data captured relatively easy to process.
On-set Facial Performance Capture and Transfer:
Embodiments of the invention and operation of camera system 200 can be better
understood from an exemplary use case scenario described with respect to FIG. 3, which is a
simplified flowchart depicting an on-set facial performance capture and transfer method 300
according to some embodiments of the invention. For example, unlike when capturing and
transferring a facial performance from archival footage, on-set facial motion capture and
transfer process 300 allows for access to the actor, the cameras and the set as part of set-up
and/or initiation tasks (step 310). Such initiation tasks can be performed prior to filming a
motion picture or other video sequence. For example, the actor’s face can be “scanned” in a
few predetermined positions through image-based multi-view stereo techniques, the actor’s
face reflectance properties (e.g. a diffuse color map) can be measured; the set or performance
area (e.g., area 102) can be scanned and measured (e.g., by scanning the set with a LIDAR
unit); lenses for the taking and witness cameras can be selected based on the conditions of the
set along with appropriate aperture, ISO and other settings; distortion for the lenses used in
camera system 200 can be measured and estimates of the camera’s intrinsic and extrinsic
parameters can be made using known calibration techniques and known approaches to
calibrating color; lighting for the set and cameras can be adjusted as appropriate; and the on-set
illumination can be captured for each different lighting configuration that will be used
during the shoot (e.g., by capturing a stereo pair of HDRI light probes in key locations, such
as where the actor(s) will stand). Additionally, sensors can be placed at various locations on
set to gather useful information for the reconstruction, most commonly in the form of the
witness cameras 204, 206.
Deformation Model
Step 310 can also include building a facial rig for each actor. The facial rig can be a
three-dimensional parameterized deformable model of the actor’s face. Parameters of the
deformable model can be varied to generate different facial expressions of the actor allowing
the deformable model to be lated to mimic the actor’s facial expressions. Building the
facial rig typically es ing” the actor’s geometry in a set of predetermined poses.
For example, some ments can use Disney Research’s Medusa system to do the capture
and rely on artists to clean up the capture result into a usable film-quality facial rig. In some
embodiments, the facial rig can be made of a simple set of linear blendshapes as described
generally in U.S. Patent No. 8,207,971, entitled “Controlling Animated Character
Expressions”, which is incorporated by reference herein in its entirety. Other embodiments
of the invention also support solving for a more complex rig with rotational-translational
joints and skinning as well as arbitrary functional mapping between rig controls and final
blend shape weights.
Embodiments of the invention are not limited to deformable models based on
blendshapes. For example, in other embodiments the three-dimensional parameterized
deformable model can be made purely of per-vertex displacements. In still other
embodiments, more sophisticated models that rely on per-patch deformation and don’t use
blendshapes in the traditional sense of the term can be used. In various embodiments,
different facial expressions can be attained by setting different parameter values for the
deformable model. For example, for a three-dimensional parameterized deformable model
based on blendshapes, different facial expressions can be attained from a linear combination
of a selected set of facial expressions (i.e., blendshapes) from the facial rig. By varying one
or more parameters associated with the linear combination, a range of facial expressions can
be created while utilizing relatively small amounts of computational resources.
[0032] As an example, some embodiments of the invention use a deformation function that
produces a facial expression mesh M by combining linearly a set of m three-dimensional
blendshapes B0, B1, B2, … Bm, where each Bj is made of n vertices and represents a
predefined canonical expression (e.g., inspired from Facial Action Coding System (FACS)
shapes), where B0 is the neutral expression, and where per-vertex displacements δ are added.
A rotation R and translation t can also be applied to the resulting geometry. Thus, the
deformation for a vertex of index i can be as follows:
M(i) = [B0(i) + Σ_{j=1..m} w_j (B_j(i) − B0(i)) + δ(i)] · R + t        (1)

where the w_j are the blendshape weights, i.e. the weights used to combine the blendshapes
linearly. The rotation R, the translation t, the blendshape weights w_j and the per-vertex
displacements δ(i) are the parameters of the deformable model.
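Equation (1) can be evaluated for all vertices at once. The sketch below is an illustrative NumPy implementation under the row-vector convention implied by the "· R + t" ordering; the array shapes and the toy example values are assumptions for illustration, not data from the patent.

```python
import numpy as np

def deform(B, w, delta, R, t):
    """Evaluate equation (1) for every vertex at once.

    B     : (m+1, n, 3) array of blendshapes; B[0] is the neutral shape B0.
    w     : (m,) blendshape weights w_1..w_m.
    delta : (n, 3) per-vertex displacements.
    R     : (3, 3) rotation matrix applied to the combined geometry.
    t     : (3,) translation vector.
    """
    B0 = B[0]
    # Linear combination of blendshape offsets relative to the neutral pose.
    offsets = np.tensordot(w, B[1:] - B0, axes=(0, 0))   # shape (n, 3)
    # Row-vector convention matches the "[...] . R + t" form of equation (1).
    return (B0 + offsets + delta) @ R + t

# Toy example: 2 vertices, a neutral shape plus one blendshape.
B = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],      # B0 (neutral)
              [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]])     # B1
w = np.array([0.5])                  # half-activate B1
delta = np.zeros((2, 3))             # no per-vertex displacements
R = np.eye(3)                        # identity rotation
t = np.array([0.0, 0.0, 1.0])        # translate along z
mesh = deform(B, w, delta, R, t)
```

Setting w = 0 and δ = 0 recovers the rigidly transformed neutral shape, which is the starting point of the solve described above.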
This deformation is versatile in the sense that it incorporates both a strong prior in
the form of blend shapes and a less constrained deformation component through the deltas
(per-vertex 3D displacements), which enables expressions to be matched that, as expected,
go beyond the abilities of the blendshapes alone. Some embodiments also support more complex
facial rigs and deformation functions which include rotational and/or translational joints and
skinning (e.g., for the jaw) in addition to blendshapes and deltas. Some embodiments also
support arbitrary functional mapping between a set of user-facing controls and final shape (or
joint) weights.
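Such a mapping can be as simple as a dictionary of functions from control values to weights. The control and weight names below are hypothetical, chosen only to illustrate that the mapping can be arbitrary (nonlinear, and coupled across controls):

```python
# Hypothetical user-facing controls; the names are illustrative only.
controls = {"jaw_open": 0.6, "smile": 0.4}

# Arbitrary functional mapping from controls to final blendshape weights.
# Each weight may depend nonlinearly on one or more controls.
weight_mapping = {
    "jaw_lower":   lambda c: c["jaw_open"] ** 2,                # eases in
    "lip_corner":  lambda c: 0.8 * c["smile"],                  # scaled
    "cheek_raise": lambda c: c["smile"] * (1 - c["jaw_open"]),  # coupled
}

blend_weights = {name: f(controls) for name, f in weight_mapping.items()}
```

The solver can then optimize over the small set of controls while the mapping produces the final blendshape (or joint) weights.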
Referring again to step 310, the director or other party planning the performance
capture (e.g., a Director of Photography) can also select various configurations for the
cameras 106 to be used in the capture session including the type of camera and lens used in
the shoot for each camera 106 (or where a camera 106 is a system including both taking and
witness cameras, the type of camera and lens used with each such camera in the system), the
settings on the camera(s), luminosity levels, etc. In some embodiments all cameras are jam-
synchronized but other embodiments are able to work with systems where only time code
synchronization is possible between the main camera and the witness cameras. In some
embodiments, the witness cameras can be set to a short exposure time to limit the amount of
motion blur in witness camera images.
As mentioned above, in some embodiments, a small number of markers (e.g., the
markers can be applied to two, three, six, eight, or more points on an actor’s face) can be
positioned on an actor’s face to assist in the motion capture process as described in U.S.
Patent Application No. 16/102,556, which as noted above is incorporated by reference herein.
The markers can be positioned on substantially rigid parts of an actor’s face to minimize
distortion caused by facial movement during a performance. The markers enable motion
capture of the actor’s skull as he or she is performing and can also be used for deformation
tracking as well. The data generated from tracking the markers can be used for determining
rotation and translation of the actor’s skull in each plate as opposed to being used for tracking
movements in the actor’s facial expressions in accordance with some traditional techniques as
mentioned in the Background of the Invention section above.
[0036] FIG. 4 is a simplified illustration 400 of exemplary positions for markers 402a-g
that enable motion capture of the skull of an actor 404 during a performance according to
some embodiments of the present disclosure. As shown in FIG. 4, markers 402a and 402d
can be positioned at the temples of actor 404, markers 402b and 402c can be positioned along
the hairline of actor 404, marker 402e can be positioned on the nose bridge, and marker 402f
can be positioned on the chin of actor 404. These positions are selected because they are
generally substantially free of movement caused by facial expressions and/or talking. That
way, the positions can closely track the movement of the skull of actor 404. By tracking
these positions, the witness cameras can more accurately capture the movement of the actor’s
head.
In some embodiments the markers can be retroreflective gel-based markers that
reflect the invisible light (e.g., IR or UV light in the bandwidth captured by the witness
cameras) but are not visible to the taking camera as there is generally no visible light emitted
near the optical axis of the taking camera. The gel-based markers can be applied to
an actor’s face as if it were makeup. As a retroreflective substance, each marker, when
applied to an actor’s face, can act as a surface that reflects light back to its source with a
minimum of scattering along a vector that is parallel but opposite in direction from the light
source. By being retroreflective, each marker can effectively negate any noise from ambient
light. For instance, under normal lighting conditions indoors (i.e., absent lights directly
beaming at the markers), the markers may not be visible or have negligible visibility. For
instances where a set is positioned outside, the sun can emit vast amounts of IR light.
However, because the markers are retroreflective, the IR light emitted from the sun may not
reflect back to the witness cameras. Instead, only the IR light emitted from the witness
camera light sources (e.g., ring of IR LEDs around the witness cameras’ lenses) will get
reflected back to the witness cameras. Thus, even though a taking camera and one or more
witness cameras are filming an actor with markers 402a-g, only the witness cameras will
capture the markers.
[0038] By having two types of cameras 202 and 204, 206 with their respective light sources
and using markers that are only visible to witness cameras and not a taking camera,
camera system 200 can effectively and efficiently capture two different motion picture
compositions with one shoot, i.e., act of filming. Thus, with a single performance by actor
110, camera system 200 can capture images that are directly usable: (1) for an item of
content (e.g., content that can be used in a cinematic release) and/or driving a digital character
in a virtual environment; and (2) for accurately determining the location of a digital character
mapped to the head of actor 110 in a virtual environment.
Capturing the Performance
After set-up and initiation tasks have been completed, camera system 200 can be
used to capture the entire composition of a set, such as set 100, during a performance (FIG. 3,
step 320). For example, the taking camera 202 can be used to film actor 110 during a
performance in which the actor is surrounded by a backdrop of rural mountains. Light
sources 108 can be flood lights emitting white, visible light that illuminates the scene with
visible light so that taking camera 202 can capture footage of actor 110 as he or she looks
around, as well as any extras present in the scene and props, such as farm equipment that may
be near the actor. Meanwhile, the IR lights associated with witness cameras 204, 206 can
project onto the scene invisible, IR light so that witness cameras can simultaneously capture
footage of markers 112 (which as discussed in more detail herein can be configured as
retroreflectors that can substantially reflect IR light) on the face of actor 110. Accordingly, the
markers may appear as bright dots in the images captured by witness cameras 204, 206.
Because taking camera 202 is generally unable to detect IR light, the images
captured by taking camera 202 will likely not include portions of reflected IR light from
markers 112. As a result, the images captured by taking camera 202 can be used directly in
an item of content (e.g., as footage in a movie) and/or used to drive a digital replica of actor
110 based on a markerless motion solving process. In some embodiments, markers 112 can
be detectable in both visible and invisible light spectrums. For instance, markers 112 can be
black dots that are detectable in both visible light and IR light. In such instances, taking
camera 202 and witness cameras 204, 206 can both capture the positions of markers 112,
thereby enabling a more robust triangulation of the face of actor 110 during the performance.
Once the desired facial motion capture footage has been obtained, the footage can
be used to generate a computer model of the performance, thereby transferring the captured
movement of the actor during the performance, including the actor’s facial expressions, to a
three-dimensional model of the actor. The three-dimensional model can, in turn, be used to
create visual effects that can be incorporated into animations, movies, video games and the
like (step 330). A number of different inputs and data sets can be used to solve for
the actor’s performance, i.e., to identify and transfer the facial expressions of the actor to
those of a three-dimensional computer model. Some of the models and/or data that can be
used to solve the performance can be built or otherwise compiled or created independent of
the performance and thus can be done either before, during or after step 320. Other inputs
that are used to solve for the performance (e.g., the tracked location of markers 402a-g during
the performance) are created based on the performance itself and thus can be generated either
during the performance of step 320 or after the performance.
Transferring the Captured Performance to a Computer-Generated Model
Fig. 5 is a simplified flowchart of a method 500 of capture processing that can
be performed as part of step 330 according to some embodiments of the invention. Method
500 can match facial expressions of an actor captured during a performance (e.g., step 320) to
facial expressions of a computer-generated model of the actor. Method 500 can be performed
on each and every plate in a sequence of video so that the facial expressions of a computer-generated
model of the actor match the facial expressions of the actor throughout the entire
video sequence. In some embodiments method 500 can be performed such that each plate in
the sequence of video frames can be processed independently without depending on the
processing or solving of one or more previous plates. Thus, some embodiments of the
method 500 allow each plate of a filmed video sequence to be processed in parallel, taking
advantage of the parallelization offered by computer clusters.
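The per-plate independence described above can be sketched with a worker pool. This is a minimal illustration only: `solve_plate` is a hypothetical stand-in for the per-plate solve, not code from the patent, and threads stand in for the processes or cluster nodes that would be used in practice.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_plate(plate_index):
    """Hypothetical stand-in for the per-plate solve of method 500.
    Each plate starts from the same neutral mesh, so no call depends
    on the result of any other call."""
    return {"plate": plate_index, "mesh": None}  # would hold the solved mesh

# Because no plate depends on a previous plate's solve, the whole
# sequence can be fanned out across workers in any order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(solve_plate, range(240)))
```

Each worker receives only the plate index (and, in a real system, the shared read-only inputs), which is what makes cluster-scale parallelism straightforward.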
For each plate processed on a plate-by-plate basis, method 500 can start with
various inputs including a plate from the performance capture session (block 502) and an
initial facial mesh (block 504) representing a neutral geometry of a deformable model
generated, for example, as described above with respect to Fig. 3, step 310. The initial facial
mesh (i.e., initial deformable model) can include the rigid adjustment (rotation and
translation), the blend shape weights and the per-vertex deltas for the deformable model that
define the neutral geometry. A differentiable renderer (block 506) can render the initial facial
mesh and then method 500 can solve the deformation from the plate (block 510) by trying to
minimize the differences between the initial deformable model (i.e., neutral expression) and
the actor’s actual facial expression in the plate using a recipe (i.e., a sequence of deformation
solvers as discussed below) based on various inputs as described below over a series of n
iterations. Thus, the solver in block 510 calculates an expression of the deformable model
that is closest to the expression of the actor in the plate.
Each of the n iterations involved with solving the deformation in block 510
generates a revised version of the deformable model (i.e., updated values for the parameters
of the deformable model) that changes in each iteration from the initial neutral expression of
block 504 to an expression that comes closer and closer to resembling the actor’s actual facial
expression in the plate. The plate can be an image made up of millions of pixels where each
pixel has a particular RGB value. In each iteration, block 510 uses the differentiable renderer
(block 506) to generate a rendering of the deformable model for the particular iteration along
with derivatives. The differentiable rendering is an image made up of pixels and, having access
to derivatives of pixel color values with respect to parameters of the model generated by the
differentiable renderer, the solver tries to minimize the differences between the RGB values
of the plate and the RGB values of corresponding pixels in the rendered version of the
deformable model. In each iteration the output of the solver (block 510) will get closer and
closer to the actual expression of the actor in the plate until the final iteration produces a final
facial mesh (block 520) in which the parameters of the deformable model (e.g., the various
weights of the blendshapes and the values of the rigid rotation, translation and the per-vertex
displacements) result in a facial expression that very closely matches the expression of the
actor in the plate. Since embodiments of the invention provide the solver with a very dense
set of pixels in each iteration, the solver can produce a more detailed solution for the
performance compared to solutions calculated by traditional marker-based systems that are
limited in the detail they capture by the number of markers being tracked.
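The iterative solve described above can be illustrated with a toy example. Here a trivial linear "renderer" maps parameters to pixel values, so its derivatives (the Jacobian `A`) are exact; the real system instead obtains derivatives from the differentiable renderer. Gradient descent on the per-pixel squared difference against the plate recovers the parameters. All names and sizes are illustrative, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))          # d(pixels)/d(params), known exactly here
true_params = np.array([0.3, -1.2, 0.7, 0.0, 2.0])
plate = A @ true_params                  # "observed" pixel values

params = np.zeros(5)                     # start from the neutral pose
for _ in range(500):                     # n solver iterations
    residual = A @ params - plate        # rendered pixels minus plate pixels
    grad = 2.0 * A.T @ residual          # gradient of the squared-difference loss
    params -= 1e-4 * grad                # step toward the actor's expression
```

After the loop, `params` closely matches `true_params`, mirroring how each iteration of block 510 moves the deformable model's parameters toward the expression in the plate.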
Inputs for the Transferring Process
[0045] When solving the deformation in block 510, embodiments of the invention can use
some or all of the following inputs (block 502) in addition to the footage of the actor whose
facial expressions are being captured (i.e., the plate, also in block 502):
1) A facial rig of the actor which includes a 3D mesh of the actor’s face with neutral
expression and a set of canonical expressions also represented as 3D meshes (also
known as blend shapes). The facial rig can be made up of the following components
and can be built as described above: a three-dimensional mesh B0 of the face in a
neutral pose comprising n vertices, and a set of m three-dimensional meshes B1, B2, …
Bm, where each Bj is made of n vertices and represents a predefined canonical
expression (e.g., inspired from Facial Action Coding System (FACS) shapes).
2) The camera rig – calibrated and match-moved as described below.
3) A small set (e.g., 4-8) of 2D markers chosen on as-rigid-as-possible places of the face
and tracked throughout the footage as described above, and/or a set of virtual
landmarks can be added to the face in various predetermined locations using known
estimation techniques.
4) The rigid motion of the 3D facial mesh throughout the footage, i.e., an estimate of the
rotational and translational components of the head for each frame. The IR dots
visible in the witness cameras can be used to triangulate the positions of these dots
in 3D and solve for the rigid head motion which best satisfies the 3D dot positions at
every frame. While this rigid motion is not expected to be perfectly accurate, it can
be refined later during facial capture.
5) A hand-matched pose for a reference frame – i.e., for one of the frames of the footage
an artist manually dials in facial rig controls to best match the expression from the
plate. In the case of strong head rotation, it can be useful to produce two or three
reference frames rather than one to improve the albedo and lighting estimate
(described below). This pose matching can also be done automatically (albeit more
approximately) leveraging machine-learning-based virtual facial landmarks.
6) A virtual light rig built as described below.
7) Flattened rotoscoping splines and masks as described below.
8) The albedo measured on the light stage.
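As a sketch of how the facial rig of input 1) can be evaluated, the blend shape combination of B0, B1, …, Bm with per-vertex deltas and a rigid transform might look as follows. The function name and exact parameterization are illustrative assumptions, not code from the patent.

```python
import numpy as np

def evaluate_facial_mesh(B0, blendshapes, weights, deltas, R, t):
    """Evaluate a blendshape-based deformable model (illustrative):
      B0          -- (n, 3) neutral mesh vertices
      blendshapes -- (m, n, 3) canonical expression meshes B1..Bm
      weights     -- (m,) blend shape weights
      deltas      -- (n, 3) per-vertex displacement corrections
      R, t        -- rigid rotation (3x3) and translation (3,)
    """
    # Linear blendshape combination: offsets from the neutral pose,
    # weighted and summed, with per-vertex deltas added on top.
    offsets = np.tensordot(weights, blendshapes - B0[None, :, :], axes=1)
    deformed = B0 + offsets + deltas
    # Apply the rigid head motion last.
    return deformed @ R.T + t

# Example: one blendshape at half weight moves its vertex halfway.
B0 = np.zeros((4, 3))
B1 = np.zeros((1, 4, 3)); B1[0, 0, 0] = 2.0
mesh = evaluate_facial_mesh(B0, B1, np.array([0.5]), np.zeros((4, 3)),
                            np.eye(3), np.zeros(3))
```

The weights, deltas, rotation and translation here are exactly the kinds of parameters the block-510 solver adjusts each iteration.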
Some of the above inputs can be generated from data processed on a per-shot (i.e., a
continuous sequence of frames of digital film) basis as opposed to a per-plate basis. For
example, for each shot, one or more of the following can be done, several of which can be
required for lighting of the final frame and shared with the lighting department:
1) A virtual light rig can be built by, for example: stitching the HDRI light probes into
lat-long images; using stereo view geometry, turning the key lights in the HDRI
probes into actual virtual lights (e.g., rectangular area lights); and using the gray
sphere, chrome sphere and/or McBeth chart, adjusting light intensities and colors.
2) Match-move the camera rig by, for example, using footage of calibration devices,
solving for camera intrinsic and extrinsic parameters and the relative transformations
between the main camera and the witness cameras, and, using standard match-moving
techniques (markers on set, etc.), solving for the rig transformation matrix
during the shot.
3) Rotoscoping splines by, for example, drawing eye lid and outer lip splines as view-independent
splines (i.e., that are “drawn” on the mesh) and inner lip splines as view-dependent
splines (i.e., which delineate the occluding contours of the inner lips). Rotoscoping
splines can be replaced by machine-learning based facial virtual landmarks if desired.
4) Rotoscoping shapes by, for example, drawing shapes to define occluding masks – any
object which occludes the face at any point during a shot can be drawn as a closed 2D
shape.
5) Flatten all two-dimensional elements by, for example, using the lens distortion
measurements done earlier. The lens distortion as an image-space map can be
inverted and applied to the 2D elements (such as the plate, the rotoscoping splines,
occluding masks, etc.).
6) Model occluding geometry by, for example, producing, for any object that casts a
significant shadow on the actor’s face, a 3D mesh that approximates the occluding
geometry.
Embodiments of the invention can solve for the performance in block 510 with a
differentiable renderer based on some or all of the above inputs using appearance and/or
shading to infer geometry, as opposed to using a standard VFX rendering system. For
example, some embodiments can employ shape from shading techniques that leverage
gradient patterns in the image to provide clues as to what the actor’s face is doing at the time
the image was taken and use those gradient patterns to estimate the deformation of the actor’s
face based on the image.
A simplified shading model can accommodate the differentiability constraints
imposed by an optimization framework while maintaining acceptable performance. In some
embodiments the surface reflectance model can be a simple diffuse Lambertian model and
four types of lights can be supported, including: environment light, rectangular area light,
directional light and point light. Embodiments can represent the environmental illumination
using a second order Spherical Harmonics basis representation (i.e., nine components) or a
higher order basis representation.
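A second order Spherical Harmonics representation, as mentioned above, uses nine basis functions evaluated at the surface normal. The sketch below uses the standard real SH constants and the common Lambertian cosine-lobe convolution weights for diffuse irradiance; whether the patented system uses exactly this formulation is an assumption of the example.

```python
import numpy as np

def sh_basis_order2(n):
    """Real spherical harmonics basis up to order 2 (9 components)
    evaluated at unit normal n = (x, y, z), standard normalization."""
    x, y, z = n
    return np.array([
        0.282095,                       # Y_00
        0.488603 * y,                   # Y_1,-1
        0.488603 * z,                   # Y_1,0
        0.488603 * x,                   # Y_1,1
        1.092548 * x * y,               # Y_2,-2
        1.092548 * y * z,               # Y_2,-1
        0.315392 * (3 * z * z - 1),     # Y_2,0
        1.092548 * x * z,               # Y_2,1
        0.546274 * (x * x - y * y),     # Y_2,2
    ])

def irradiance(n, L):
    """Diffuse irradiance at normal n from 9 SH environment
    coefficients L, using the cosine-lobe weights A_l per band."""
    A = np.array([np.pi, *([2 * np.pi / 3] * 3), *([np.pi / 4] * 5)])
    return float(np.dot(A * sh_basis_order2(n), L))
```

Because the irradiance is a closed-form polynomial in the normal, it is differentiable with respect to the mesh parameters, which is what the optimization framework requires.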
[0049] All lights can be initialized using the light rig measured on set. In particular, the
Spherical Harmonics components of the environment light can be initialized by projecting the
HDRI measured on set onto the Spherical Harmonics basis. For rectangular area lights, the
light geometry can be known from the stereo pair of HDRI images (and potentially with the
help of the scan). Their emission color can be approximated by averaging the full emission
texture as photographed on set. Directional lights can be used to model illumination from the
sun, and point lights can occasionally be used as a cheaper approximation for finite size lights
which are far away from the subject.
For all these lights, irradiance can be computed analytically using closed-form
differentiable expressions, as described further below. Shadows can be approximated using
stochastic Monte-Carlo integration and multiplied with the unshadowed irradiance to get the
final reflected radiance. While this approximation may not be entirely correct (taking the
visibility term outside of the rendering integral), it is often good enough for the purpose it is
required for and makes the approach practical.
For the environment light, efficiency can be improved by computing a visibility term
V as the proportion of samples for which the environment is unoccluded, where the light
samples are importance-sampled according to the energy defined by the Spherical Harmonics
components. For rectangular area lights, samples on the light geometry can be distributed
and, again, the proportion of occluded shadow rays against the full set of samples drawn can
be computed. Shadowing for directional and point lights can be handled similarly. Note that, in
some embodiments, the visibility term is not easily differentiable and can be considered a
constant term in the optimization. Its value can be updated at every step of the iterative solve.
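The visibility term V described above, the fraction of shadow rays that reach the light unoccluded, can be sketched as follows. The occluder test and the sampling scheme are illustrative placeholders; a production system would importance-sample according to the SH energy and trace rays against real geometry.

```python
import numpy as np

def visibility_term(point, sample_dirs, occluder_test):
    """Stochastic estimate of V: the proportion of sampled shadow rays
    from `point` that are NOT blocked. `occluder_test(origin, direction)`
    returns True when a ray is blocked (an illustrative placeholder)."""
    hits = sum(occluder_test(point, d) for d in sample_dirs)
    return 1.0 - hits / len(sample_dirs)

# Toy occluder: a half-space blocking every downward ray, so exactly
# the upward hemisphere of an isotropic sample set is visible.
rng = np.random.default_rng(1)
dirs = rng.normal(size=(4096, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
V = visibility_term(np.zeros(3), dirs, lambda o, d: d[2] < 0.0)
```

Here `V` comes out close to 0.5, matching the analytically occluded fraction; in the solve, V would simply multiply the unshadowed irradiance and be held constant within each iteration.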
[0052] In some embodiments the model includes an albedo term α represented as an RGB
color for each vertex of the mesh. The albedo value at an arbitrary point on the surface of the
mesh can be obtained through barycentric interpolation of the albedo at the vertices of the
triangle in which the point lies. With this model, the radiance scattering off a 3D point p of normal
n on the mesh under an illumination defined by the Spherical Harmonics components
{L_lm}
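The barycentric interpolation of per-vertex albedo described in this paragraph can be sketched as follows, using a generic barycentric-coordinate computation; function and variable names are illustrative, not from the patent.

```python
import numpy as np

def albedo_at_point(p, tri_vertices, tri_albedos):
    """Interpolate per-vertex RGB albedo at point p inside a triangle
    via barycentric coordinates (solving the standard 2x2 system)."""
    a, b, c = tri_vertices
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom     # weight of vertex b
    w = (d00 * d21 - d01 * d20) / denom     # weight of vertex c
    u = 1.0 - v - w                         # weight of vertex a
    return u * tri_albedos[0] + v * tri_albedos[1] + w * tri_albedos[2]

# Example: at the centroid, the three vertex albedos mix equally.
tri = (np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
rgb = albedo_at_point(np.array([1 / 3, 1 / 3]), tri,
                      np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], float))
```

Because the interpolation is linear in the vertex albedos, this term is also differentiable, fitting the same optimization framework as the shading model.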
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
US201962814994P | 2019-03-07 | 2019-03-07 |
US62/814,994 | 2019-03-07 | |
US16/681,300 US11069135B2 (en) | 2019-03-07 | 2019-11-12 | On-set facial performance capture and transfer to a three-dimensional computer-generated model
US16/681,300 | 2019-11-12 | |
Publications (2)
Publication Number | Publication Date |
---|---|
NZ762338A NZ762338A (en) | 2021-09-24 |
NZ762338B2 true NZ762338B2 (en) | 2022-01-06 |