MORPHING OF AN OBJECT USING THE
MORPHOLOGICAL BEHAVIOUR OF ANOTHER OBJECT
The present invention generally relates to the
morphing of an image and in particular relates to the
morphing of an image of an object using a model of
morphological behaviour obtained from another object.
Computer-based morphing of objects, in particular
faces, is being actively pursued in a number of academic and industrial centres. A particular area of activity is the morphing of faces in order to provide animation. In
essence, many prior art methods operate by displacing a
series of control points representing parts or the whole
of the face. This displacement is so defined that the
desired facial morphing, expression or phoneme production
is achieved.
Many different ways of achieving this have been used
in the prior art.
In the field of animation of talking faces, principal component analysis has frequently been used in the prior art. For example, in a paper by Kuratate et al. ("Kinematics based synthesis of realistic talking
faces" Proceedings of "Auditory-visual Speech Processing"
Conference, Terrigal, NSW, Australia, pp 185-190), a
method is described for animating talking faces in which
each point on the face is represented in terms of x, y
and z coordinates. Samples of the form of the face are taken for a number of phonemes. A generic mesh is
applied to each face scan and the meshes are lined up
along feature contours. Certain nodes are adjusted to
match eighteen facial control points. The remainder of
the nodes are matched via a field morphing technique.
The mean mesh in this particular registration is
estimated and the principal components of the covariance
matrix of the node x, y coordinates with respect to the
mean are extracted as eigenvectors. This results in a
set of principal component values for each phoneme and thus facial movement can be driven by inputting phonemes
to derive principal components. Back projection can be
applied to the principal components in order to derive
the displacements in the space of the original
coordinates. The face can thus be animated in synchrony with speech by inputting the phonemes derived from that speech. In
this technique, separate analyses are used to drive the
face and the lips.
A similar technique has been applied to the analysis
of lip movements by Reveret and Benoit (Proceedings of
"Auditory-visual Speech Processing" Conference, Terrigal,
NSW, Australia 1998). In the technique disclosed in this
document, lips in various phoneme positions are digitised
and thirty landmarks are positioned through a mixed
manual/polynomial interpolation approach. The
coordinates of these landmarks are subjected to principal component analysis after alignment to a reference
position and the first three principal components which
account for most of the differences between positions are
used to drive an animation through back projection using
eigenvectors and the mean.
Similar techniques are used by L. M. Arslan and D.
Talkin ("3-D face point trajectory synthesis using an
automatically derived visual phoneme similarity matrix"
Proceedings of "Auditory-visual Speech Processing" Conference, Terrigal, NSW, Australia 1998). In their technique they take 54 samples in three dimensions and use principal component analysis to reduce the number of
dimensions to 20.
A similar technique is also disclosed in the work by
Galanes et al. ("Generation of lip-synched synthetic
faces from phonetically clustered face movement data"
proceedings of "Auditory-visual Speech Processing"
Conference, Terrigal, NSW, Australia 1998). In this work
58 three-dimensional samples, non-uniformly distributed
around the face, are taken. Principal component
analysis is then carried out to reduce this to 20
dimensions. Back projection driven by phonemes then provides synchronised animated face movement.
The inventors have realised that these techniques of
the prior art only allow the morphing or animation of an
object, e.g. a face, based on training data obtained from
that object. The inventors have realised that this is a
limitation, since the object to be animated, such as the
face, must be available for training
in order to obtain the principal component data required
for the morphing necessary for animation by back projection.
The present invention provides a solution to this problem by providing a technique in which samples are
taken defining different forms of an object, e.g. due to different facial expressions or due to form changes
during pronunciation of phonemes. The measurements taken
are not restricted to measurements from a single
reference object e.g. measurements may be taken for the
form of the faces of a number of subjects during the
pronunciation of a phoneme. The sets of measurements
(the training set) are subjected to registration in order
to remove the effects of differences in size, location
and rotation. This has the effect of providing data in a lower dimensional shape space. Each shape defined by
a set of data comprises a point in the shape space. The
shape space is then subject to statistical processing in
order to identify the directions in the shape space of
the most correlated changes in shape. This results in a
determined coordinate system and coordinates defining the
data sets within the coordinate system. Each coordinate
defining a data set is indexed by a form reference e.g.
a phoneme or a facial expression where the object is a
face. Thus these data comprise a model of the morphological behaviour of an object with the effects of size, translation and rotation removed. The model is
purely a model of the change in shape of an object relative to a reference shape.
The inventors have realised that because of the
removal of size, location and rotation effects, this
model can be applied to the back projection and animation
of another object. A reference form for the target
object to be morphed using the model is obtained and when
the coordinates are determined using input form
references such as phonemes, instead of morphing the
reference form of the reference object, the reference form of the desired object is morphed by the amount determined from the coordinates.
Thus in this way a desired object can be made to
have the morphological behaviour of a reference object.
The present invention is applicable to the morphing
of the whole of an object or to any one or more parts of
the object. If more than one part of the object is to be
morphed, the parts can be treated separately.
Preferably, the present invention employs data which
comprise vector data defining the form of the object.
Thus, the data can comprise landmark data providing three-dimensional coordinates of morphologically
important features of the object, or data comprising
control points of splines which define the form of the
object.
In an embodiment of the present invention the
reference shape is determined from the mean of the
reference data.
In an embodiment the processing to remove the
effects of size, location and rotation comprises scaling
the sets of data according to centroid size, translating
the sets of data to make the centroids of the sets of data coincident, and rotating the data sets about the
centroids to minimise the sum of the squares of the
differences between equivalent members of the data sets.
This technique is, in one embodiment, performed as a
generalised Procrustes analysis.
In general, the shape space defined by the sets of
samples in two or three dimensions is non-Euclidean.
Although it is possible to
perform the statistical analysis, e.g. principal component analysis, in the non-Euclidean shape space, it is convenient to transform the data into a Euclidean space, such as the tangent space to the shape space, to make the computation simpler.
Embodiments of the present invention will now be described with reference to the accompanying drawings in
which:
Figure 1 is a flow diagram of the operation of a
general embodiment of the present invention;
Figure 2 is a schematic diagram of the general
embodiment of the present invention;
Figure 3 is a schematic diagram of a computer system
implementing the technique to build the model in accordance with a first embodiment of the present invention;
Figure 4 is a flow diagram illustrating the
operation of the embodiment of Figure 3;
Figure 5 is a diagram explaining the concepts of
principal component analysis in the tangent space to the shape space;
Figures 6a and 6b illustrate two example facial
postures;
Figure 7a is a diagram of superimposed sets of data points for the forms of the training face;
Figure 7b illustrates the mean coordinates;
Figure 7c is a wire frame drawn between the mean
coordinates;
Figure 7d illustrates the rendering of the polygons
between the coordinates;
Figure 8 is a graph of the first two principal
components in an embodiment showing the effects of
changes in principal component scores on the training
face;
Figure 9 is a plot of the third and fourth principal
components showing the effects of the principal component scores on the shape of the training face;
Figure 10 is a diagram of a computer system for
inputting and registering the reference form of the desired object to be morphed or animated;
Figure 11 is a flow diagram illustrating the
operation of the embodiment of Figure 10;
Figure 12 is a plot of the first two principal
components illustrating the score of the target face to
be morphed using the model of the training face;
Figure 13 is a diagram of a computer system for
morphing or animating the target face using the training face in accordance with an embodiment of the present invention;
Figure 14 is a flow diagram illustrating the
operation of the embodiment of Figure 13;
Figure 15 is a diagram illustrating how the
principal component score is used together with the
values for the eigenvectors in order to back project and
provide coordinates in shape space;
Figure 16 is a diagram of an example of grid warping
using a triplet of thin plate splines;
Figure 17a is a diagram of a NURBS curve; and
Figure 17b is a diagram of the NURBS curve after deformation produced in translating control point B7.
The general embodiment of the present invention will
now be described with reference to Figures 1 and 2.
A schematic diagram of the general embodiment is
illustrated in Figure 2. This embodiment can be
implemented on a general purpose computer using software
or in a special purpose hardware machine.
In step S1 of Figure 1, reference data sets for
a plurality of forms of a reference object, indexed by a
form reference, are input into the reference object data store 1 shown in Figure 2. Such data can comprise spline control points or landmarks in two- or three-dimensional
coordinates for a number of forms of the object. Each
form is characterised by a form reference which
identifies the form. For example, for a face, the facial
expressions can be indexed by an expression identifier,
e.g. happy, sad, or crying, or by a phoneme identifier
which identifies the phoneme being pronounced resulting
in the facial expression. Thus the indexing of the data
sets by the form reference enables the data to be
referenced for back projection as will be described in
more detail hereinafter.
In order to remove the effects of size, translation
and rotation from the data sets, in step S2 the data sets
are registered against a reference data set for a
reference form to thus define a point in shape space for
each form. This is achieved using the reference object
data registrar 2 illustrated in Figure 2. Thus by
reference to the reference data set the effects of size,
location and rotation are removed thus converting the
data from data on the form of the object to data on the
shape of the object (where form = shape + size).
In step S3 the sets of data then undergo statistical analysis using the statistical analyser 3 to determine a space of
reduced dimensionality based on directions in the shape space of the most correlated changes in shape. The
technique used for the statistical analysis can be
principal component analysis or a regression technique
for example. Having determined the coordinate system,
the coordinates for each form with respect to the
reference form are then determined in step S4 and this
data is then stored in the data store 4 illustrated in
Figure 2.
So far the steps have resulted in the formation of a model of the morphological behaviour of the reference
object with reference to the indexed form reference.
Thus if the form reference comprises phonemes, and the
reference data sets comprise data sets for a face, the
model comprises a model of the relative movement of the face when pronouncing phonemes, i.e. it defines what
features of the face move during the pronunciation of a
phoneme. This model can thus be used for morphing or
animating.
In order for this model to be applied to another object, e.g. another face, a reference data set for a reference form of the target object is input in step S5
and this is stored in the target object reference data store 5 of Figure 2. In order to remove the effects of
size, location and rotation from the input data set, it
is registered in step S6 against the reference data for
the reference object by the target object registrar 6 in
Figure 2.
The system now has all the data necessary to perform
a morphing or animation operation on the reference data
for the target object simply from input form reference data. Thus in step S7 the process awaits the receipt of form reference data from the form reference receiver 7.
Once this is received, in step S8 coordinates in the
space of reduced dimensionality are looked up in the data
store 4 in Figure 2 and in step S9 these are applied to
the registered data set for the reference form of the
target object by the target object reference data
transformer 8. In step S10 the transformed data set and
the morphed target object are then output using the
transformed target object data set output device 9. It can thus be seen that because the effects of size, translation and rotation are removed, the target object reference data can be used to replace the
reference object reference data for the back projection
process thus enabling the morphing or animation of the
target object instead of the reference object.
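The look-up and transformation of steps S7 to S9 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes the model is stored as principal component eigenvectors in a tangent space plus one score vector per form reference, that those scores describe a displacement from the mean shape, and that the target's registered reference form has already been expressed as a vector in the same space. The names `MorphModel` and `morph_target` are hypothetical.

```python
import numpy as np
from dataclasses import dataclass


# Illustrative sketch; MorphModel and morph_target are hypothetical names.
@dataclass
class MorphModel:
    eigenvectors: np.ndarray  # (km, p) principal directions in tangent space
    scores: dict              # form reference (e.g. phoneme) -> (p,) score vector

    def displacement(self, form_reference):
        """Tangent-space displacement from the mean shape for one form reference."""
        return self.eigenvectors @ self.scores[form_reference]


def morph_target(model, target_tangent, form_references):
    """Apply the reference object's behaviour to a target object (steps S7 to S9).

    target_tangent is the target's registered reference form, expressed as a
    vector in the same tangent space as the model. One morphed vector is
    returned per input form reference, e.g. per phoneme in an utterance.
    """
    return [target_tangent + model.displacement(ref) for ref in form_references]
```

Because the displacement is taken from the reference object's model but added to the target's own reference form, the target keeps its shape while acquiring the reference object's movements. The returned vectors are still in tangent space and would need to be mapped back to landmark coordinates before output.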
A specific embodiment of the present invention, in
which the model of the morphological behaviour of an
object is formed, will now be described with reference to
Figures 3 to 9.
Figure 3 is a schematic diagram of a computer system
embodying the invention in which an input device 11 allows the input of n sets of coordinate data (landmark
matrices) in accordance with step S20 of the flow diagram
of Figure 4. A display unit 12 is provided for the
display of the coordinate data sets. Processor 13 is
provided for the implementation of computer program
modules provided to the computer system by a storage
medium (floppy disc) 16 and stored in the program storage
15. Within the program storage 15 there is stored
program code for carrying out generalised Procrustes
analysis which when loaded in the processor 13 results in the implementation of the generalised Procrustes analysis module 13a. Also, the program storage 15 stores program
code for calculating the mean of the reference data which
when loaded in the processor 13 causes the implementation of the mean calculator 13b. Program code for
implementing the tangent projection is also provided in
the program storage 15 and when loaded in the processor
13 implements the tangent projection module 13c. Program
code for implementing a principal component analysis is
also provided in the program storage 15 which, when
used by the processor 13, implements the principal component analysis module 13d.
The program storage 15 in this embodiment can be
provided by a conventional mass-storage device such as a
hard disc of a general purpose computer. The processor
13 comprises the microprocessor of a general purpose computer.
The computer system also comprises a working memory
14 in which data is stored. The processor 13 uses the
working memory 14 to store and retrieve data used by the
various modules. The working memory 14 is provided by
the random access memory (RAM) of a general purpose computer.
All of the components of the computer system are
linked together by the data and control bus 10.
The operation of the embodiment of Figure 3 will now
be described in more detail with reference to Figures 4
to 9.
In step S20 a plurality (n) of coordinate data sets
are input to form n landmark matrices. In
this embodiment of the present invention the data
comprises x, y, z coordinates for facial landmarks from
a reference face: k landmarks are taken in m dimensions. Thus there are n x k x m data values. For example, where m = 3
(i.e. measurements are taken in three dimensions), k = 31
(i.e. 31 landmarks are used) and n = 15 (i.e. there are
15 specimens (forms) taken), the form space has a
dimensionality of 93 (k x m) and contains 15 points, one per specimen.
The coordinates for each specimen identify important
morphological features which will move.
In order to remove the effects of size, location and
rotation and thus define the data sets as points in shape
space, the data sets are registered with respect to each other in step S21. Differences in size, translation and rotation between specimens can occur, e.g. because the person moves the face to a different position or because
the camera used to obtain the coordinate data is rotated.
This is achieved by minimising the sum of squared
distances between the equivalent landmarks of forms.
This is termed "generalised Procrustes analysis" (GPA).
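Where there are n specimens, each represented by a k x m
matrix of landmark coordinates X_i, where i = 1, ..., n,
registration results in registered training faces, denoted X'_i, for
which the sum of squared differences, d_P^2, between them
is minimised, subject to a constraint on overall size that excludes the trivial solution \beta_i = 0:
X'_i = \beta_i X_i \Gamma_i + 1_k \gamma_i^T
d_P^2 = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \| X'_i - X'_j \|^2
where \beta_i > 0 is a scale factor, \Gamma_i is an m x m rotation matrix, \gamma_i is an m-vector of translations and 1_k is a k-vector of ones.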
Scaling is thus according to centroid size (the
square root of the sum of squared Euclidean distances
from each landmark to the centroid which is the mean of
landmark coordinates).
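The registration described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the code of the embodiment: each specimen is scaled to unit centroid size rather than having its scale factor optimised jointly, rotations are restricted to proper rotations (no reflections) and are found with the singular value decomposition solution of the orthogonal Procrustes problem, and the function names are hypothetical.

```python
import numpy as np


# Illustrative sketch; all function names are hypothetical.
def centroid_size(X):
    """Square root of the summed squared distances from each landmark to the centroid."""
    return float(np.sqrt(((X - X.mean(axis=0)) ** 2).sum()))


def optimal_rotation(X, Y):
    """Proper rotation R minimising ||X @ R - Y||^2 for centred k x m matrices X, Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # flip the last axis to exclude reflections
    return U @ D @ Vt


def generalised_procrustes(shapes, tol=1e-10, max_iter=100):
    """Register n landmark matrices (each k x m); return registered shapes and their mean."""
    # Remove location (centre on the centroid) and size (unit centroid size).
    Xs = [X - X.mean(axis=0) for X in np.asarray(shapes, dtype=float)]
    Xs = [X / centroid_size(X) for X in Xs]
    mean = Xs[0]
    for _ in range(max_iter):
        # Remove rotation: rotate every specimen onto the current mean estimate.
        Xs = [X @ optimal_rotation(X, mean) for X in Xs]
        new_mean = np.mean(Xs, axis=0)
        new_mean /= centroid_size(new_mean)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return np.array(Xs), mean
```

The iteration alternates between rotating every specimen onto the current mean and re-estimating the mean, which is the usual way of solving the generalised problem when there is no closed-form solution for more than two specimens.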
Having carried out registration, the mean shape of
the specimens is determined as the mean of the landmark
coordinates in step S22. This is used as the reference in shape space for the tangent projection and for the principal component analysis as will be described in more
detail hereinafter.
Once registration is carried out, each shape
(specimen) can be represented as a point in a "shape
space". Because the effects of size, location and
rotation have been removed, the shape space has a
dimensionality of km - m - m(m - 1)/2 - 1, since rotations
provide m(m - 1)/2 degrees of freedom, locations provide m
degrees of freedom and scaling provides just one degree
of freedom. The shape space equipped with the Procrustes distance resulting from the generalised Procrustes analysis is termed "Kendall's
shape space" (Kendall DG (1984) "Shape manifolds,
Procrustean metrics and complex projective spaces"
Bulletin of the London Mathematical Society, volume 16
pages 81-121). Kendall's shape space has the desirable
property that independent isotropic distribution of
landmarks results in isotropic distribution of points
representing specimens in the shape space. This means
that if landmarks vary in location according to an isotropic model we can expect to find an isotropic
distribution of specimens in the shape space. Conversely, deviations of the landmark variations from an independent isotropic
distribution will lead to a non-isotropic distribution of specimens in the shape space.
This also means that shapes separated by a particular
Procrustes distance anywhere in the shape space will have
the same net difference in landmark coordinates in the
space of the original object (i.e. the real space in
which the object exists: termed "figure space"). The
principal directions of variations of shape in the shape
space are often of interest since they indicate correlated landmark differences.
Kendall's shape space is non-Euclidean (i.e. it is
curved). Thus, although it is possible to perform
analysis of the principal directions of variations of
shape in the non-Euclidean shape space, it is not
straightforward. For example, for triangles the space is
equivalent to the surface of a sphere of unit diameter as
illustrated in Figure 5. As can be seen in Figure 5,
equilateral triangles lie at the poles: the southern
hemisphere being a reflection of the northern hemisphere.
The sphere is divided into twelve equal half lunes (six in each hemisphere). If the apices of the triangles are unlabelled and reflections are ignored all the triangles
lie in one half lune. Isosceles triangles lie along the lines dividing the lunes and flat triangles lie at the
equator.
For more than three landmarks (k landmarks in m
dimensions) the space is high dimensional and more
complex. Because of this, great care is needed in
carrying out analyses and modelling of movements. For
this reason, in this embodiment the analysis of the
principal directions of variation of shape is carried out in the tangent space to Kendall's shape space. As can be seen in Figure 5 for triangles, the scattered
points on the spherical shape space representing
variation within the range of samples are projected into
a Euclidean tangent plane that touches the shape space at the mean of the
sets of samples. Thus the coordinates of the points
representing specimens are no longer given in terms of
the sphere, but rather as coordinates in the plane. As
long as the projection has not resulted in excessive
distortion (as might occur if the projection encompasses
a large proportion of the sphere), useful analyses can be carried out in this plane. For higher dimensions, the tangent plane to the shape space can be imagined as a
space of km - m - m(m - 1)/2 - 1 dimensions.
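As a check on the dimension formula, the following sketch evaluates km - m - m(m - 1)/2 - 1 for the examples in this description, assuming k > m as in all of them; the function name is hypothetical. For triangles in the plane (k = 3, m = 2) it gives 2, consistent with the two-dimensional spherical surface of Figure 5; for the 31-landmark example it gives 86, and for the 107-landmark faces 314.

```python
# Illustrative sketch; shape_space_dimension is a hypothetical name.
def shape_space_dimension(k, m):
    """Dimension of Kendall's shape space for k landmarks in m dimensions (k > m)."""
    translation = m               # location: one degree of freedom per axis
    rotation = m * (m - 1) // 2   # rotation: m(m - 1)/2 degrees of freedom
    scale = 1                     # size: a single degree of freedom
    return k * m - translation - rotation - scale


for k, m in [(3, 2), (31, 3), (107, 3)]:
    print(f"k={k}, m={m}: {shape_space_dimension(k, m)}")
# prints:
# k=3, m=2: 2
# k=31, m=3: 86
# k=107, m=3: 314
```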
Figure 5 illustrates three steps (as numbered). The
first step is the generalised Procrustes analysis to
obtain Kendall's shape space from the landmark matrices.
The second step is the transformation of the data from
the non-Euclidean Kendall's shape space to the Euclidean
tangent space in which principal component analysis takes
place as will be described in more detail hereinafter.
The third step is the visualisation of the data by back
projection using the principal component scores, as will be described in more detail hereinafter.
The Procrustes tangent coordinates are estimated in
step S23 using a technique which would be apparent to a
skilled person in the art such as the technique used in
the paper by Dryden and Mardia ("Multivariate shape
analysis" Sankhya volume 55(A) pages 460-480 1993), the
content of which is hereby incorporated by reference.
This projection results in a (k - 1)m vector of tangent
space shape coordinates with respect to the mean for each specimen. These tangent space vectors span a space of dimension km - m - m(m - 1)/2 - 1. Principal component analysis is then carried out in step S24 using the tangent space coordinates
to extract km - m - m(m - 1)/2 - 1 eigenvectors (of which at most n - 1 have non-zero eigenvalues), which are the principal components of variation of shape. Principal
component scores for the coordinates of the tangent
matrices are then determined in step S25 so that the
points in tangent space can be identified by principal
components and their scores. Thus in this way a model of
the morphological behaviour with respect to form
references is provided as principal components and
principal component scores. The principal components are
all orthogonal and pass through the origin, which corresponds to the mean of the specimens.
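Steps S23 to S25 can be sketched as follows, assuming the specimens have already been registered, centred and aligned to the mean. The sketch uses partial Procrustes tangent coordinates: each specimen is scaled to unit centroid size and projected orthogonally onto the plane tangent to the unit-size mean. For simplicity it keeps the full km coordinates rather than the (k - 1)m vector mentioned above, and the function names are hypothetical.

```python
import numpy as np


# Illustrative sketch; function names are hypothetical.
def tangent_coordinates(registered, mean):
    """Partial Procrustes tangent coordinates (n x km) of registered, centred specimens."""
    gamma = mean.ravel() / np.linalg.norm(mean)        # unit-size mean: point of tangency
    P = np.eye(gamma.size) - np.outer(gamma, gamma)    # projector onto the tangent plane
    V = np.array([P @ (X.ravel() / np.linalg.norm(X)) for X in registered])
    return V, gamma


def principal_components(V):
    """Eigenvectors (columns, by decreasing eigenvalue), eigenvalues and scores of V."""
    Vc = V - V.mean(axis=0)                    # close to zero already when the pole is the mean
    cov = Vc.T @ Vc / (len(V) - 1)             # km x km covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Vc @ eigvecs                      # principal component scores per specimen
    return eigvecs, eigvals, scores
```

Each row of `scores` locates one specimen along the principal components, so indexing the rows by form reference (phoneme or expression) gives the model described above.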
The principal components comprise eigenvectors
defined in the tangent space. Each eigenvector has an
eigenvalue which gives the statistical significance of
the eigenvector. Where there exist significant
correlations between landmark displacements amongst the
sample of shapes, it is reasonable to expect that the
first few principal components will serve as an adequate
representation of shape differences among that sample.
The method of generation of the model as principal components and principal component scores will now be described in more detail with reference to specimens as
illustrated in Figures 6 to 9.
A set of training faces is used to develop models of
facial posture deformation such as between phonemes and
expressions. These training faces represent the
movements of one particular face, classified for example in
terms of phonemes and standard expressions, and are chosen so
that they encompass the range of all possible facial
movements. Alternatively, the training set could consist
of several different faces in different postures.
Each face is defined in terms of landmarks, hereinafter termed "control points".
In the example illustrated in Figures 6a and 6b, two
example facial postures are shown each defined by 107
anatomical landmarks in three dimensions. A mesh is
drawn between the landmarks to deliver a visual impression.
Figure 7a illustrates superimposed coordinates of
control points of the set of training faces after
generalised Procrustes analysis. Figure 7b illustrates the mean coordinates for the mean facial posture and
Figure 7c illustrates a wire frame drawn between the mean coordinates. Figure 7d illustrates the polygons rendered
between the control points.
The next phase involves the extraction of principal
components of the covariance matrix between registered
control point coordinates. The principal components
represent, within the training faces, aspects of
correlated variation amongst the control points. Since
the face is only capable of a limited range of movements
(many of which are correlated through physical,
neurological and anatomical constraints), it is expected
that many fewer principal components than control point coordinates (in this example 107 landmarks x three
dimensions = 321) will encompass the whole range of
movements. It is found in this example that 9 principal
components account for 91% of the total variance in face
shapes within the training set.
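The number of principal components to retain can be chosen by accumulating eigenvalues until a chosen fraction of the total variance is reached, as with the 9 components accounting for 91% above. A minimal sketch; the function name is hypothetical and the threshold is a free parameter.

```python
# Illustrative sketch; components_for_variance is a hypothetical name.
def components_for_variance(eigenvalues, fraction):
    """Smallest number of leading principal components whose eigenvalues
    account for at least `fraction` of the total variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for count, value in enumerate(vals, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(vals)
```

For example, with hypothetical eigenvalues [5, 2, 1, 0.5, 0.3, 0.2] a 90% threshold is first reached by 4 components (8.5 of 9, about 94%).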
The principal components in this training set
clearly represent aspects of facial movement which make
sense intuitively as well as numerically. In effect they
are control parameters for the set of possible integrated movements of the training faces.
Figure 8 is a plot of the first and second principal components. As can be seen, the linear scattering of
points passing diagonally along the plot represents variability amongst the training faces in mouth opening
and its correlated movements. In Figure 8 these
movements are visualised by back projection from the
principal component scores to warp the mean training
face. The upper left diagonal arrow indicates the
direction along which the warped mean faces showing mouth opening were
computed. The second diagonal arrow, lower right, shows
the direction in which the warped mean face was computed to indicate coordinated eye closure and lip movement.
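The visualisations of Figures 8 and 9 warp the mean face by back projecting from principal component scores. Assuming partial Procrustes tangent coordinates about a unit-size mean (the pole), and eigenvectors lying in the tangent plane, the tangent-plane point is formed from the eigenvectors and scores, and the inverse of the orthogonal projection lifts it back to a unit-size landmark matrix. A minimal sketch; `back_project` and its parameters are hypothetical.

```python
import numpy as np


# Illustrative sketch; back_project is a hypothetical name.
def back_project(pole, eigenvectors, scores, k, m):
    """Landmark matrix (k x m) corresponding to given principal component scores.

    pole: unit vector (km,) of the mean shape, the point of tangency.
    eigenvectors: (km, p) principal components lying in the tangent plane.
    scores: (p,) principal component scores.
    """
    v = eigenvectors @ scores                    # point in the tangent plane
    norm_sq = float(v @ v)
    if norm_sq >= 1.0:
        raise ValueError("scores lie too far from the mean to invert the projection")
    x = np.sqrt(1.0 - norm_sq) * pole + v        # inverse of the orthogonal tangent projection
    return x.reshape(k, m)
```

Evaluating this over a grid of scores along one component, with the others held at zero, produces warped mean faces of the kind plotted in Figures 8 and 9; the result has unit centroid size and can be rescaled for display.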
Figure 9 is a plot of the third principal component versus the fourth principal component. It can be seen
that these principal components represent asymmetric
aspects of facial shape variability. The four faces have
been obtained by back projecting from the principal
component scores in their respective positions in the plot.
Thus the embodiment described with reference to
Figures 3 to 9 results in the generation of a model for
the training faces as principal components and principal component scores. These are defined in relation to the shape of an object and not the form of an object.
An embodiment of the present invention will now be described with reference to Figures 10 to 12 in which a
reference set of data for a target object is entered and
stored. This process can be carried out as illustrated
on a separate computer system to the computer system used
for the generation of the model. Alternatively, it can
be carried out on the same computer system and thus the
systems illustrated in Figures 3 and 10 would be
combined. Figure 10 illustrates a computer system for generating the reference shape data. An input device 110
is provided for the input of a set of reference
coordinates which define a reference shape to be warped
or animated. For best results, this reference shape should
match as closely as possible the reference shape, i.e. the
mean shape, of the training data. A display unit
120 is provided for display of the coordinate data. A
processor 130 is provided for implementing program code
stored in the program storage 150. The program storage
150 stores program code for carrying out the generalised
Procrustes analysis and when loaded in the processor 130 implements the generalised Procrustes analysis module
130a. Also, the program storage 150 stores program code for implementing the tangent projection and when this is
loaded in the processor 130 it implements the tangent
projection module 130b.
The program code can be provided to the computer
system via a storage medium such as a floppy disc 160 and
loaded into the program storage 150 for the
implementation of the process.
A working memory 140 is provided for use by the
processor 130. The working memory stores the data necessary for processing by the processor 130.
The components of the computer system are all linked
by a control and data bus 100.
The computer system of Figure 10 is implemented by
a general purpose computer under software control. The
processor 130 comprises the microprocessor of a general
purpose computer. The program storage 150 comprises the
mass-storage medium, e.g. hard disc, of the general purpose computer. The working memory 140 comprises the
random access memory (RAM) of the general purpose computer. The input device 110 can comprise a keyboard, disc drive or modem, for example, or any other means of
inputting the reference data for the target object to be
warped or animated.
The operation of the system will now be described
with reference to Figures 11 and 12.
In step S30 the target object landmarks are input as
a set of reference coordinates. In step S31 these are
registered against the mean matrix for the training data
stored in the working memory 140 in order to remove the
effects of size, translation and rotation. This is performed by the generalised Procrustes analysis module
130a.
In order to transform the data matrix from non-Euclidean
Kendall's shape space into the Euclidean
tangent space, the tangent projection module 130b carries
out the tangent projection process using the mean matrix
to form the tangent matrix for the object, i.e. to provide
a point in tangent shape space. This can be visualised
in Figure 12, where for illustrative purposes a principal
component analysis has been carried out for the training
faces and for the target object, i.e. the frog. The first principal component PC1 shows the change from the mean shape of the training faces to the shape of the
frog face. The second principal component (PC2) shows the change in shape from a face having an open mouth and open eyes
to one having a closed mouth and closed eyes. The faces illustrated
in the diagram are shown at their relative
principal component positions on the plot. It can thus
be seen that by changing principal component 2, starting
from the principal component value for the frog,
principal component values can be obtained which, when projected
back, result in an image of a frog which has the animated properties of the training faces.
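The registration of the target (step S31), its tangent projection and the walk along a principal component described for Figure 12 can be sketched as follows. Assumptions: a single ordinary Procrustes fit of the target onto the training mean (unit centroid size, proper rotation only), partial Procrustes tangent coordinates about that mean, and orthonormal eigenvectors taken from the training model. The function names are hypothetical.

```python
import numpy as np


# Illustrative sketch; all function names are hypothetical.
def register_to_mean(target, mean):
    """Ordinary Procrustes fit of one landmark matrix (k x m) onto the training mean."""
    X = target - target.mean(axis=0)              # remove location
    X = X / np.linalg.norm(X)                     # remove size (unit centroid size)
    M = mean - mean.mean(axis=0)
    M = M / np.linalg.norm(M)
    U, _, Vt = np.linalg.svd(X.T @ M)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))    # proper rotation only
    return X @ U @ D @ Vt, M                      # remove rotation


def target_in_pc_space(target, mean, eigenvectors):
    """Tangent vector and principal component scores of the target object."""
    X, M = register_to_mean(target, mean)
    gamma = M.ravel()
    v = X.ravel() - (X.ravel() @ gamma) * gamma   # project onto the tangent plane at the mean
    return v, eigenvectors.T @ v


def walk_component(v, eigenvectors, index, deltas):
    """Move the target's tangent vector along one principal component."""
    return [v + d * eigenvectors[:, index] for d in deltas]
```

In the terms of Figure 12, `walk_component(v, eigenvectors, 1, deltas)` moves the frog along PC2 (index 1 with zero-based indexing) starting from its own position, and each resulting vector can then be back projected to landmark coordinates.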
The process of generating data for the morphed or
animated target object will now be described with reference to Figures 13 to 15.
Figure 13 is a schematic diagram of a computer
system for generating a morphed image or an animated
image from a reference data set for the image and using
a model of morphological behaviour of a reference object.
Since the system is provided with the principal
components, principal component scores, and mean matrix provided by the system of Figure 3, and the tangent matrix for the object provided by the system of Figure 10, the system for animating or warping can be separate
from the systems of Figures 3 and 10. Alternatively, the systems of Figures 3, 10 and 13 could be combined in a
general purpose computer to provide a computer system
having the functionality of all three separate systems.
The computer system of Figure 13 includes an input
device for inputting a form reference such as a phoneme
or facial expression which has been used to index the
principal component scores for the training faces. A
display unit 1200 is provided for displaying the warped or animated image. A processor 1300 is provided for implementing program code stored in the program storage
1500. In the program storage 1500 there is provided
program code for implementing the model look-up module
1300a in order to look up the principal components and
principal component scores using the input form
reference. Also within the program storage 1500 there is
provided program code for implementing the matrix warp
module 1300b in order to use the principal component
scores and principal components to generate a warped
matrix. Further within the program storage 1500 there is provided program code for implementing the inverse tangent projection module 1300c within the processor 1300
in order to transform the warped matrix into Kendall's shape space. Further within the program storage 1500
there is provided program code for implementing the wire
frame and render processing module 1300d in order to wire
frame and render the landmark matrix for the object for
display on the display unit 1200.
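The data handled by the model look-up module 1300a can be pictured as a table of principal component scores indexed by form reference. The sketch below is hypothetical: the `MorphModel` class, its field names and the phoneme keys are illustrative assumptions, not structures defined in this specification.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MorphModel:
    """Hypothetical container for a stored model of morphological behaviour."""

    mean: np.ndarray     # (k, m) mean matrix of the training faces
    eigvecs: np.ndarray  # (n_pc, k*m) principal components, one per row
    scores: dict = field(default_factory=dict)  # form reference -> (n_pc,) scores

    def look_up(self, form_reference):
        """Return the principal components and the scores for one form reference."""
        if form_reference not in self.scores:
            raise KeyError(f"no scores stored for {form_reference!r}")
        return self.eigvecs, np.asarray(self.scores[form_reference])


# Toy model: two principal components for a two-landmark 3D shape.
model = MorphModel(mean=np.zeros((2, 3)), eigvecs=np.eye(6)[:2],
                   scores={"aa": [0.3, -0.1], "oo": [-0.2, 0.4]})
pcs, s = model.look_up("oo")
```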
A working memory 1400 is provided for storing each
of the data matrices and PC scores used by the processor
1300.
The program code for the implementation of the process in the computer system can be provided on a
storage medium such as floppy disc 1600 for loading into
the program storage 1500.
The computer system is implemented using a general
purpose computer and an appropriate program. Thus the
processor 1300 comprises the microprocessor of the
general purpose computer. The program storage 1500
comprises a mass-storage device e.g. hard disc drive of
the general purpose computer. The working memory 1400 comprises the random access memory (RAM) of the general purpose computer.
The components of the computer system are
interconnected via the data and control bus 1000.
The method of operation of the system of Figure 13
will now be described with reference to Figures 14 and
15. In step S40 the shape space reference is input.
This can comprise a phoneme or facial expression which
has been used as an index for the principal components
stored in the model. In step S41 the principal component
scores and vector matrix for each principal component are
looked up using the input shape space reference. Thus what is now obtained is the change in shape from the
reference for the training faces. In order to morph or
animate the target face, it is however necessary to apply
this change to the reference face for the target object
rather than to the reference face for the training faces.
This is achieved in step S42 by taking, for each
principal component, the eigenvector multiplied by the
desired principal component score and adding this to the
tangent matrix for the object. This is illustrated for
principal components 1 and 2 in Figure 15. Principal components 1 and 2 comprise eigenvectors in the tangent shape space and thus the principal component scores
identify coordinates in the tangent shape space relative
to the mean for the training faces. Thus this change can
simply be applied to the tangent matrix for the target
object. For example, in Figure 12, applying the change
illustrated in Figure 15 to the point for the frog face
would move that point vertically, i.e. towards an open
mouth posture, because of the change in PC2.
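Because the principal component scores are coordinates in tangent space, step S42 reduces to a weighted sum of eigenvectors added to the target's tangent matrix. A minimal NumPy sketch, assuming the eigenvectors are stored one per row as flattened (k, m) matrices (a storage layout chosen here for illustration; the function name is hypothetical):

```python
import numpy as np


def matrix_warp(tangent_obj, eigvecs, scores):
    """Step S42 (sketch): add sum_k scores[k] * eigvecs[k] to the target's tangent matrix.

    tangent_obj: (k, m) tangent matrix of the target object (e.g. the frog)
    eigvecs:     (n_pc, k*m) principal components of the training faces
    scores:      (n_pc,) desired principal component scores
    """
    displacement = np.asarray(scores, dtype=float) @ eigvecs  # weighted sum of PCs
    return tangent_obj + displacement.reshape(tangent_obj.shape)


# Toy example: two principal components acting on a two-landmark 3D shape.
frog = np.zeros((2, 3))
pcs = np.eye(6)[:2]
warped = matrix_warp(frog, pcs, [0.5, -1.0])
```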
In order to obtain coordinates in shape space it is
then necessary in step S43 to inverse tangent project the tangent matrix to obtain the landmark matrix for the
target object. In step S44 the landmarks are then wire
framed and rendered in order to generate an image for
display on the display unit 1200.
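If the tangent projection is taken to be the orthogonal projection onto the tangent plane at the unit-size mean (an assumption; the projection formula is not fixed here), step S43 has a closed form: a tangent matrix T maps back to the pre-shape sqrt(1 - |T|^2) * M + T, where M is the mean matrix and |T| is the Frobenius norm of T. A sketch with a hypothetical function name:

```python
import numpy as np


def inverse_tangent_project(tangent, mean):
    """Step S43 (sketch): map a tangent matrix back to a unit-size landmark matrix.

    Assumes `tangent` was produced by orthogonal projection at the centred,
    unit-norm pole `mean`, so that it is orthogonal to `mean`.
    """
    sq_norm = float(np.sum(tangent ** 2))
    if sq_norm >= 1.0:
        raise ValueError("tangent matrix too large to map back onto the pre-shape sphere")
    return np.sqrt(1.0 - sq_norm) * mean + tangent
```

The result has unit centroid size, so size still has to be restored (for example as discussed later in this description for centroid size) before wire framing and rendering in step S44.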
The embodiments described hereinabove have so far only
described how control points might be animated. In
practical applications, however, it is likely that the
rendering achieved just using polygons drawn between
control points will be inadequate for high quality
animation and graphics. In these cases it will be
necessary to develop a polygon mesh for rendering in
concert with the control points. This can be achieved by
use of a splining function such as the thin plate spline.
A triplet of splines (one for each of the x, y and z
coordinates) will warp all 3D mesh coordinates from frame
to frame. This warping is such that grid points whose
coordinates are identical to control points
in the reference will map directly into the equivalent
(matching) control point coordinates in the target. In
between control points the mapping of grid points between
reference and target shapes will be such that the grid is
minimally bent. Figure 16 illustrates the use of triplet splines to warp a regular grid between the reference and target
images as described hereinabove.
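The grid warp can be sketched with a three-dimensional thin plate spline in NumPy. Assumptions: the radial kernel is U(r) = r (the form commonly used for thin plate splines in three dimensions), the control points are distinct and not all coplanar, and the function names are hypothetical. The three output columns share one kernel matrix, so one linear solve yields the triplet of splines for x, y and z.

```python
import numpy as np


def tps_fit(src, dst):
    """Fit a 3D thin plate spline that maps control points `src` onto `dst`.

    src, dst: (n, 3) arrays of matching control points.
    Returns (src, w, a): kernel weights w (n, 3) and affine part a (4, 3).
    """
    n = len(src)
    K = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)  # U(r) = r
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 4, n + 4))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([dst, np.zeros((4, 3))])
    coeffs = np.linalg.solve(A, b)  # one solve covers the x, y and z splines
    return src, coeffs[:n], coeffs[n:]


def tps_apply(model, pts):
    """Warp arbitrary points (e.g. polygon-mesh or grid vertices) with a fitted spline."""
    src, w, a = model
    U = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a


# Warp a regular 5 x 5 x 5 grid using 8 displaced control points.
rng = np.random.default_rng(0)
ref_ctrl = rng.uniform(0.0, 1.0, size=(8, 3))
tgt_ctrl = ref_ctrl + 0.05 * rng.normal(size=(8, 3))
axis = np.linspace(0.0, 1.0, 5)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
warped_grid = tps_apply(tps_fit(ref_ctrl, tgt_ctrl), grid)
```

Grid points that coincide with reference control points land exactly on the target control points; if the control-point motion is purely affine, the kernel weights vanish and the whole grid moves affinely.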
Although landmark data is used in the specific embodiment
described hereinabove, the present invention is also
applicable to the use of splines.
The thin plate spline method provides a very
effective means of calculating the vector of motion of
the vertices of a polygon mesh representing a face based
on the displacement of control points. Thus a complex
rendering model can be animated from the analysis of only
a few coordinates. The polygon mesh itself can be derived
by connecting x, y and z coordinate information gathered
from a 3D camera, laser scanner etc. in a simple linear
fashion, usually to form a network of planar triangular
or quadrilateral facets (generally these facets are an
approximation to a curved surface). A typical face might
require over 1000 polygon facets to avoid a crude
appearance. Therefore the ability to summarise and
analyse facial movement using the motion of key control
points common to all faces, and then to reapply those
movements to different, final, high polygon-count render
objects, is highly desirable. This can be achieved by
representing the face or any object in the
form of connected spline patches. Curves can be
represented mathematically as piecewise parametric
functions of third-order (cubic) polynomials. The class
of such curves is known as B-splines, and the most
powerful are non-uniform rational B-splines (NURBS), which
are in fact generalisations of all B-splines. B-splines
consist of curve segments whose polynomial coefficients
depend on tangent vectors to the segments. The end points
of these vectors define the shape of the splines and are
therefore called control points. Figure 17a illustrates a
NURBS curve with control points and Figure 17b illustrates
the same curve after the deformation produced by
translating control point B7.
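The local effect shown in Figures 17a and 17b can be reproduced with a short NURBS evaluator. This is a sketch under illustrative assumptions not taken from the figures: a cubic curve with nine control points (indexed B0 to B8, so that B7 exists), unit weights and a clamped uniform knot vector. The basis functions use the standard Cox-de Boor recursion; function names are hypothetical.

```python
import numpy as np


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right


def nurbs_point(u, ctrl, weights, knots, p=3):
    """Evaluate a NURBS curve at u in [knots[p], knots[-p-1]) (right end excluded)."""
    N = np.array([basis(i, p, u, knots) for i in range(len(ctrl))])
    wN = N * weights
    return wN @ ctrl / wN.sum()


ctrl = np.array([[i, (-1.0) ** i] for i in range(9)], dtype=float)  # B0..B8
weights = np.ones(9)
knots = [0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6, 6]  # clamped, cubic
moved = ctrl.copy()
moved[7] += [0.0, 2.0]  # translate control point B7

# B7 only influences u in [knots[7], knots[11]) = [4, 6).
before = nurbs_point(2.5, ctrl, weights, knots)
after = nurbs_point(2.5, moved, weights, knots)
```

Translating B7 leaves the curve unchanged for u < 4 and deforms it only where B7's basis function is non-zero, which is the local control that makes control points convenient animation handles.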
Thus, instead of generating polygon facets from the
sampled vertices of a 3D digitizer, it is well known in
the prior art to fit splines to these vertices, producing
a spline mesh that has minimum bending energy.
There are a number of advantages to working with
NURBS. They can represent virtually any desired shape,
from points, straight lines and polylines to conic
sections (circles, ellipses, parabolas and hyperbolas) to
freeform curves with arbitrary shapes. They provide a
great deal of control over the shape of the curve: a set
of control points and knots (points where one spline
joins another), which guide the curve's shape, can be
directly manipulated to control its smoothness and
curvature. They can represent very complex shapes with
remarkably little data, so the use of NURBS greatly
reduces the amount of information needed to preserve
accuracy. Thus the present invention encompasses the use
of spline control points as the data sets instead of
landmark data.
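The statement that NURBS represent conic sections exactly can be checked numerically. The sketch below (control points and weights chosen for illustration; the function name is hypothetical) evaluates the simplest case, a single rational quadratic segment with weights 1, sqrt(2)/2 and 1, which traces an exact quarter of the unit circle from three control points. With all weights set to 1 the same control points give a non-rational arc that leaves the circle.

```python
import numpy as np


def rational_quadratic(t, ctrl, weights):
    """Evaluate a rational quadratic Bezier segment (the simplest NURBS) at t in [0, 1]."""
    b = np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])  # Bernstein basis
    wb = b * weights
    return wb @ ctrl / wb.sum()


ctrl = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
weights = np.array([1.0, np.sqrt(2) / 2, 1.0])
pts = np.array([rational_quadratic(t, ctrl, weights) for t in np.linspace(0, 1, 11)])
radii = np.linalg.norm(pts, axis=1)  # all equal to 1 up to rounding
```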
Although in the specific embodiment described hereinabove
the whole image of the object was morphed or animated,
the present invention is applicable to any part of the
image of an object. For example, the lips and eyes of a
face can be animated independently using the same
technique. The separate animations of adjacent or
overlapping anatomical features can then be reconstructed
into whole faces through least squares registration of
their shared landmarks against a neutral face containing
the same landmarks. This approach allows easier control
of the individual movements of these features but will
lose information about the correlation of movements
between features.
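The least squares registration of a separately animated feature can be sketched with the SVD-based (Kabsch) solution for the best rigid rotation and translation. The function name and the choice of a rigid fit without scaling are assumptions made for illustration; the transform is estimated from the shared landmarks only and then applied to every point of the feature.

```python
import numpy as np


def register_feature(feature, shared_idx, neutral_shared):
    """Rigidly register an animated feature onto a neutral face by least squares.

    feature:        (k, 3) landmarks of the separately animated feature (e.g. lips)
    shared_idx:     indices of the landmarks the feature shares with the neutral face
    neutral_shared: (len(shared_idx), 3) positions of those landmarks on the neutral face
    """
    A = feature[shared_idx]
    ca, cb = A.mean(axis=0), neutral_shared.mean(axis=0)
    H = (A - ca).T @ (neutral_shared - cb)  # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (feature - ca) @ R.T + cb
```

A uniform scale factor could be added to make this a full Procrustes (similarity) fit if the separately animated features differ in size.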
In the embodiment described hereinabove only one
face in different postures is used in the training set
and thus a set of principal components is determined in
which each posture is defined in terms of the coordinates
of a single point. If several faces are used in the
training set, then each posture will be represented by
several points in the shape space. The difference
between any two points representing the same posture
would be due in part to differences in face shape and in
part to differences between faces in the way the facial
posture is adopted. The differences due to face shape
should be factored out by regression or by disregarding
principal components that contain only information
relating to face differences. Residual variation on the
principal components could then be used as facial control
parameters, as in the case of a single-face training set.
A computationally simpler alternative would be to take
the Procrustes mean of all the training faces in each
posture. These averages would then be subjected to
principal component analysis as above and used to
generate principal component control parameters.
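The simpler alternative, Procrustes means per posture followed by principal component analysis, can be sketched as follows. This is a minimal sketch under assumptions: landmark matrices are (k, 3) arrays, the generalised Procrustes fit removes translation, size and rotation with a fixed iteration cap, and the principal component analysis is applied directly to the aligned posture means rather than to their tangent projections. All names are hypothetical.

```python
import numpy as np


def rotate_onto(X, Y):
    """Rotate centred X to best match centred Y (orthogonal Procrustes, no reflection)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0] * (X.shape[1] - 1) + [d])
    return X @ U @ D @ Vt


def procrustes_mean(configs, iters=50):
    """Generalised Procrustes mean of a list of (k, m) landmark matrices."""
    Xs = [c - c.mean(axis=0) for c in configs]
    Xs = [x / np.linalg.norm(x) for x in Xs]  # remove location and size
    mean = Xs[0]
    for _ in range(iters):
        Xs = [rotate_onto(x, mean) for x in Xs]  # remove rotation
        new_mean = np.mean(Xs, axis=0)
        new_mean /= np.linalg.norm(new_mean)
        converged = np.allclose(new_mean, mean, atol=1e-12)
        mean = new_mean
        if converged:
            break
    return mean


def posture_pca(faces_by_posture):
    """faces_by_posture: dict mapping posture -> list of (k, m) matrices, one per face."""
    postures = list(faces_by_posture)
    means = [procrustes_mean(faces_by_posture[p]) for p in postures]
    grand = procrustes_mean(means)
    aligned = np.stack([rotate_onto(m, grand).ravel() for m in means])
    centred = aligned - aligned.mean(axis=0)
    _, _, eigvecs = np.linalg.svd(centred, full_matrices=False)
    return postures, eigvecs, centred @ eigvecs.T  # PCs and control-parameter scores
```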
Since the method for generating movement through
principal component analysis of tangent coordinates
returns displacements in x, y and z which are applied to
the mean or alternative face, it matters only a little if
these are very different in shape (e.g. a human mean
training face and a frog alternative face), since the
displacements still apply. If, however, very large
discrepancies between relative facial proportions lead to
movements which are relatively too small or too large for
the alternative face, these can be corrected by
differentially multiplying the movements applied to sets
of control points in particular anatomical regions.
Alternatively, this could be dealt with by the use of
separate principal component analyses of possibly
overlapping anatomical regions, with subsequent
registration of these regions to each other.
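Differential multiplication of movements by anatomical region can be sketched as a per-point gain applied to the displacement matrix before it is added to the alternative face. The region labels, gain values and function name here are hypothetical; how regions and gains are chosen is left open.

```python
import numpy as np


def scale_movements_by_region(displacements, region_of_point, gains):
    """Multiply each control point's displacement by the gain for its region.

    displacements:   (k, m) movements generated for the alternative face
    region_of_point: length-k sequence of region labels, e.g. "mouth", "eyes"
    gains:           dict region -> multiplier; regions not listed keep gain 1.0
    """
    g = np.array([gains.get(r, 1.0) for r in region_of_point], dtype=float)
    return displacements * g[:, None]


moves = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.0]])
regions = ["mouth", "mouth", "eyes"]
adjusted = scale_movements_by_region(moves, regions, {"mouth": 1.5})
```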
In order for the movements of alternative faces to look
acceptable, it is preferable that these faces are
presented in a posture that is like that of the mean
training face. This is because the movements are
calculated as displacements of the mean coordinates.
This posture can be achieved by careful control at the
time of digitization of the alternatives, or by taking
control point coordinates from the alternatives in
numerous poses and using back projection from principal
component analyses of the shapes of these postures to
generate the most satisfactory ones.
Although in the specific embodiments described hereinabove the object to be morphed or animated is
unrelated to the training object, this need not be the case. It is possible for the object to be animated to be
a caricature of the training object. For example, the principal components can be weighted to increase the
emphasis of certain features. This modified or warped
object can then be used as the reference data for the
target object. The weighting can be achieved by taking
the weighted average of more than one object. For
example, as shown in Figure 12, the reference face could
be an average of the frog face and the training face.
Another method of achieving a caricature is to
weight the principal components which are applied to the
reference data for the target object. In this way the
movements applied in morphing the object can be
exaggerated or reduced.
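Both caricature routes amount to simple weightings in tangent space. In the sketch below (hypothetical names; tangent matrices as (k, m) arrays and eigenvectors as flattened rows), the reference for the target is a weighted average of two tangent matrices, and per-component emphasis factors scale the scores before they are applied. If the tangent projection is taken at the training mean, the tangent matrix of the training mean itself is the zero matrix, so averaging with the training face simply shrinks the frog's tangent matrix towards zero.

```python
import numpy as np


def blended_reference(tangent_a, tangent_b, weight):
    """Weighted average of two reference tangent matrices (weight applies to tangent_a)."""
    return weight * tangent_a + (1.0 - weight) * tangent_b


def caricature_warp(tangent_ref, eigvecs, scores, emphasis):
    """Apply principal component scores scaled by per-component emphasis.

    emphasis > 1 exaggerates the movement along a component, < 1 reduces it.
    """
    weighted = np.asarray(scores, dtype=float) * np.asarray(emphasis, dtype=float)
    return tangent_ref + (weighted @ eigvecs).reshape(tangent_ref.shape)


frog = np.array([[0.2, 0.0, 0.0], [0.0, -0.2, 0.0]])
training_mean = np.zeros_like(frog)  # the mean's own tangent matrix is zero
ref = blended_reference(frog, training_mean, 0.5)
pcs = np.eye(6)[:2]
exaggerated = caricature_warp(ref, pcs, [0.1, 0.1], [2.0, 1.0])
```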
It can thus be seen from the embodiments hereinabove
described of the present invention that the present
invention is widely applicable to the morphing and
animation of an object using the referenced movement of
another object. The object can comprise any moveable
object e.g. the body of a person or animal or a
mechanical object.
The present invention is ideally suited to implementation by a computer program implemented on a
general purpose computer. Thus the present invention can
be embodied as a carrier such as a storage medium, e.g. a
floppy disc, a signal, e.g. a signal carrying the software
over a network such as the Internet, or an electronic
device, e.g. a programmable read only memory.
Since in the embodiments described hereinabove the
Procrustes analysis eliminates scaling, the variations
quantified by the principal component analysis are shape
rather than form variations. In order to restore size
for the training set, regressions of principal component
scores versus centroid size can be carried out for the
significant principal components. Alternatively, the
distance between two standard points in the object, e.g.
the interpupillary distance for a face, could be used to
quickly rescale the generated morph of the target object
once the shape has been reconstructed through back
projection from principal component scores to landmark
(control point) configurations. Where morphing or
animation of only part of an object has taken place, a
third point can be taken and used to allow not only
rescaling but also re-registering.
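Both size-restoration routes are easy to sketch. Below, `rescale_by_distance` scales a reconstructed configuration about its centroid so that two chosen landmarks (e.g. the pupils) are a given distance apart, and `score_size_regression` fits a straight line of one principal component's scores against centroid size. The names, landmark indices and the linear form of the regression are assumptions for illustration.

```python
import numpy as np


def centroid_size(X):
    """Square root of the summed squared distances of landmarks from their centroid."""
    return float(np.linalg.norm(X - X.mean(axis=0)))


def rescale_by_distance(X, i, j, target_distance):
    """Scale X about its centroid so that landmarks i and j are target_distance apart."""
    c = X.mean(axis=0)
    current = np.linalg.norm(X[i] - X[j])
    return c + (X - c) * (target_distance / current)


def score_size_regression(scores, sizes):
    """Least-squares line: scores ~ intercept + slope * centroid size, for one PC."""
    slope, intercept = np.polyfit(sizes, scores, 1)
    return intercept, slope
```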
In the present invention the model of the
morphological behaviour of a group of objects, e.g.
faces, or parts of objects, e.g. the mouth of one or more
faces, can be generated in one computer as illustrated in
Figure 3 and stored for later use on the same computer or
transmitted to another computer, e.g. electronically over
a network such as the Internet, or via any other storage
medium such as a floppy disc. The recipient of the model
can use this model to change the morphological behaviour
of another object. For example, a person could generate a model of their own facial morphological behaviour and transmit this together with a message, e.g. text or
speech, to a recipient. The data transmitted using this
technique is far less than the data which would be
required to be transmitted to provide a moving image of
the face, e.g. video. The recipient of the model and the
message can then use the model and the message to
generate a morph or animation of another object, e.g. a
cartoon character. Thus using this technique an audio
message can be replayed either as a direct recording of the speech or as a result of speech generation from the text, in synchronism with an animated morph of an object such as a cartoon character having the morphological
behaviour of the originator of the message.
Although the present invention has been described
hereinabove with reference to specific embodiments, the
present invention is not limited to such embodiments and
it would be apparent to a person skilled in the art that
modifications are possible within the spirit and scope of
the present invention.