WO2009030636A1 - Method and system for synthesis of non-primary facial expressions - Google Patents

Method and system for synthesis of non-primary facial expressions

Info

Publication number
WO2009030636A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial
expression
primary
facial image
face
Prior art date
Application number
PCT/EP2008/061319
Other languages
French (fr)
Inventor
John Ghent
John Mcdonald
Original Assignee
National University Of Ireland, Maynooth
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University Of Ireland, Maynooth filed Critical National University Of Ireland, Maynooth
Publication of WO2009030636A1 publication Critical patent/WO2009030636A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/175 Static expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships


Abstract

The present invention relates to a method for synthesis of a non-primary facial expression. The method comprises acquiring a facial image and calculating a shape and a texture for each of a plurality of primary facial expressions for the facial image. The method further comprises the step of generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures. The method also comprises selecting an expression vector corresponding to a non-primary facial expression to be synthesised and applying the expression vector to the model to synthesise a facial image having the selected non-primary facial expression. The invention also relates to a system for synthesis of a non-primary facial expression and to a method for generating an expression look-up table.

Description

Title
METHOD AND SYSTEM FOR SYNTHESIS OF NON-PRIMARY FACIAL EXPRESSIONS
Field of the Invention
The present invention relates to methods and systems for synthesising images of facial expressions, and in particular, to a method and system for synthesising photo-realistic images of facial expressions of an individual from a neutral image of the individual.
Background to the Invention
Techniques for generating images representing different facial expressions of an individual are known. These techniques have applications in many areas of technology, in particular, in animation and computer gaming applications.
One technique used to synthesise facial expressions is to manually alter the facial expression of an image of an individual using image processing software packages. However, the process of altering an expression can take several hours and requires a highly skilled graphic artist. Furthermore, it is difficult to output a photo-realistic image using this technique.
Several attempts have been made to automate facial expression synthesis over the past 10 years. One approach is described in "Geometric driven photorealistic facial expression synthesis", Q. Zhang, Z. Liu, B. Guo, and H. Shum, SIGGRAPH Symposium on Computer Animation, 2003. This approach takes several images of a single face, warps each image to a mean shape, divides the face into 14 regions and uses these regions as examples of texture change for each expression. A difference vector of feature points between neutral and non-neutral expressions is then calculated. This difference vector is used to synthesise a shape depicting a specific expression. However, this approach has limitations, as it is subject- or face-specific and computationally expensive.
Parameterisation techniques change the appearance of an individual's expression by varying independent parameter values as described in "Parameterized facial expression synthesis based on mpeg-4", A. Raouzaiou, N. Tsapatsoulis, and K. Karpouzis, Journal on Applied Signal Processing, 2002. The advantage of this approach is the ability to synthesise numerous expressions by combining different parameter values. However, this technique is dependent on the facial mesh topology and therefore completely generic parameterisation is not feasible.
Realistic 3D synthetic facial expressions can be modelled using a physics-based approach. This technique creates geometric models of the eyes, eyelids, teeth (incisor, canine and molar teeth), and hair and neck as described in "Realistic modelling for facial animation", Y. Lee, D. Terzopoulos, and K. Waters, ACM, 1995. This approach can produce excellent results. However, the images are not photorealistic and it requires intensive calculations to complete a synthetic expression. Similarly, the vast majority of 3D approaches to facial expression synthesis are computationally complex and can require extensive calculations.
There have been several attempts to model muscular behaviour for facial expression synthesis. However, these approaches lack the ability to model the exact structure of the human face. Muscle movements are simulated in the form of splines, wires and free form deformations. An example of free form deformations can be seen in "Model based face reconstruction for animation", Y. Lee, P. Kalra and N. Magnenat-Thalmann, Proceedings of Multimedia Modelling, 1997. A disadvantage of this approach is that the synthesised images are not photo-realistic.
US Patent No. 6,940,454 describes a method for generating facial expression values. However, the outputs are not photo-realistic images but a coded description of the audio and visual data contained in the input sequence. US Patent No. 6,088,040 describes a facial image conversion method. This method takes images of a number of expressions from one face and synthesises expressions for that face. In US Patent No. 7,123,263, a method for modelling a 3-dimensional facial system is described. However, this method requires information from the user to guess which 3-dimensional model of a face to use and hence is not fully automatic. Furthermore, this approach is more complex due to its 3-dimensional nature. A method for synthesis of primary facial expressions (fear, anger, surprise, happiness, sadness and disgust) is described in "A Computational Model of Facial Expression", J. Ghent, PhD Thesis, 2005 and "Photo-realistic facial expression synthesis", J. Ghent and J. McDonald, Image and Vision Computing Journal, 2005. This method involves generation of a subject-independent matrix-based shape and texture model for each of the primary facial expressions.
A shape model of a facial expression describes how the outline shape of the face, as well as the position and shape of the features on the face, should change to portray that expression. A texture model of a facial expression describes how the image pixel values (that is, the intensity or darkness of each pixel of the facial image) should change to portray the expression. Thus, a shape and texture model may be used to accurately predict how the shape of the face, the shape and position of the features on the face and the shadows, wrinkles and creases on the face will change for a particular expression.
When a facial expression is to be generated for an individual face, the subject- independent model is applied to a neutral image of the face to generate an image with the required facial expression. However, this subject-independent model is only useful for synthesising primary facial expressions, and is inadequate for synthesis of subtle, non-primary facial expressions (that is, expressions other than the six primary expressions). Furthermore, because the models for each of the expressions are entirely separate, this approach provides no means of interpolating between expressions, and synthesis of hybrid or dynamic facial expressions is therefore not possible.
Hybrid facial expressions are a collection of expressions at different intensities and/or expressions with a mixture of two or more of the primary facial expressions. The term hybrid expression includes intermediate expressions exhibited during the transition from one primary expression to another. Dynamic facial expressions are facial expressions which change over time, for example, in an animation or movie. For example, to create a movie of an individual forming a smile from a surprised expression would require a method of synthesising dynamic and hybrid facial expressions.
Summary of the Invention
According to a first aspect of the present invention, there is provided a method for synthesis of a non-primary facial expression, comprising: acquiring a facial image; calculating a shape and a texture for each of a plurality of primary facial expressions for the facial image; generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures; selecting an expression vector corresponding to a non-primary facial expression to be synthesised; applying the expression vector to the model to synthesise a facial image having the selected non-primary facial expression.
An advantage of this arrangement is that photo-realistic non-primary facial expressions may be synthesised using a single acquired facial image. The term non-primary facial expression includes hybrid and dynamic facial expressions. Because this method provides a single shape and texture model for each subject, rather than for each expression, it is possible to interpolate between the primary expressions, thereby allowing synthesis of non-primary facial expressions.
Preferably, the acquired facial image portrays a substantially neutral expression. This allows an accurate result to be achieved.
Preferably, the subject-specific model of the face is a matrix-based model. The step of applying the expression vector to the model may comprise multiplying the expression vector by the matrix to obtain an output vector that describes the selected non-primary facial expression. Each of the columns of the matrix may correspond to a primary facial expression. The output vector may be a weighted linear combination of the columns of the matrix. The output vector thus represents a weighted combination of each of the primary facial expressions.

The method may include the step of generating an estimate of the size and location of the face within the acquired facial image. The method may further include the step of scaling and aligning the estimate to an initial mean face shape. The scaling and aligning may be done using Procrustes alignment. Procrustes alignment is a standard algorithm for aligning two shapes to each other by adjusting only the scale, rotation, and translation, so that the actual shape remains unchanged. The initial mean face shape may be pre-calculated based on a collection of facial images.
The method may include the step of locating at least one feature of the face. This step may include locating features such as the eyes, eyebrows, pupils, nose, and/or mouth. The locating step may include identifying a plurality of landmark points associated with at least one feature of the face.
The step of calculating the shape and texture of the plurality of primary facial expressions for the facial image may comprise applying subject-independent Facial Expression Shape Model and Facial Expression Texture Model algorithms to the facial image to calculate a shape and texture for the face for each of the primary facial expressions.
The method may further include the step of aligning the calculated shapes for each of the primary expressions to each other. The alignment may be done using Procrustes alignment. A new mean face shape may be calculated from the aligned shapes. The method may also include the step of warping each of the calculated textures to the new mean face shape. The warping may be done using a piece-wise affine transformation algorithm.
The method may further include the step of scaling the synthesised face shape to its original size (thus creating a new synthesised shape vector) and warping the synthesised texture to the new synthesised shape vector. The method may further comprise combining the synthesised facial image with the acquired facial image. The step of combining the synthesised facial image with the acquired facial image may comprise: superimposing the synthesised facial image over the acquired facial image; and blending the synthesised facial image with the acquired facial image.
According to another aspect of the invention, there is provided a method for generating an expression look-up table, comprising: analysing a plurality of images, each of which depicts a facial expression; for each image, determining the relationship between a neutral expression and the facial expression depicted therein to create an expression vector associated with the facial expression; storing each expression vector indexed by an expression description descriptive of the associated facial expression.
The expression look-up table may be used to synthesise a facial expression in accordance with the method described above. An expression vector corresponding to the expression to be synthesised may be selected from the expression look-up table and applied to the subject-specific model to synthesise a facial image having the desired facial expression.
The expression look-up table may be generated once, during a set-up phase, using images of a single subject (or of a plurality of subjects). The expression vectors generated from these images may then be applied to subject- specific models of other subjects in accordance with the method described above.
According to a further aspect of the present invention, there is provided a system for synthesis of a non-primary facial expression, comprising: means for acquiring a facial image; means for calculating a shape and texture of a plurality of primary facial expressions for the facial image; means for generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures; means for selecting an expression vector corresponding to a non-primary facial expression to be synthesised; means for applying the expression vector to the model to synthesise a facial image having the selected non-primary facial expression.
Brief Description of the Drawings
Figure 1 is a flow chart for a method of synthesis of a non-primary facial expression in accordance with the present invention;
Figure 2 shows a model of a face for use in the method of the invention;
Figure 3 is a flow chart for a method of generation of a facial expression shape model according to the present invention; and
Figure 4 is a flow chart for a method of generation of a facial expression texture model according to the present invention.
Detailed Description of the Drawings
A method for synthesis of a non-primary facial expression is shown in Figure 1. Step 1 comprises acquiring a facial image. The image acquisition is performed using a standard digital camera. The acquired image preferably portrays the subject with a neutral facial expression. In step 2, an estimate of the size and location of the face within the image is generated. The face is assumed to be free from occlusions and located so that it is the most prominent object in the image. These constraints ensure that it can be found using standard statistical methods.
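By way of illustration only, the following Python sketch shows one way the size-and-location estimate of step 2 might be obtained. The patent does not name a detector ("standard statistical methods"), so the use of OpenCV's Haar-cascade face detector here is an assumption rather than part of the invention:

    # Hypothetical sketch of step 2: estimate the size and location of the face.
    # The Haar cascade is a stand-in for the unspecified "standard statistical methods".
    import cv2

    def estimate_face_box(image_path):
        image = cv2.imread(image_path)  # acquired facial image (step 1)
        grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        boxes = detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) == 0:
            raise ValueError("no face found; the face is assumed to be unoccluded and prominent")
        # Keep the largest detection, consistent with the prominence assumption.
        x, y, w, h = max(boxes, key=lambda box: box[2] * box[3])
        return x, y, w, h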
In step 3, the estimate is scaled and aligned to a predetermined mean face shape using Procrustes alignment. The mean face shape is an average face shape calculated based on a collection of facial images. 122 landmark points located around the eye, the eyebrow, the pupil, the nose, and the mouth represent each shape, i.e. the acquired face shape and the mean face shape. Procrustes alignment is used to minimise the weighted sum of the squared distances between the corresponding landmark points. In step 4 of the method, an active shape model search is performed to locate all features of the face (eyes, eyebrows, pupils, nose, and mouth). The features of the face are located using the landmark points discussed above. For example, eight landmark points around the eye accurately describe the location of the eye on the face. The landmark points are used as inputs to the subsequent steps of the method.
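By way of illustration, an ordinary (unweighted) Procrustes alignment of one 122-point landmark set to another can be sketched as follows; the weighted variant described above additionally weights each squared distance per landmark. The sketch assumes landmarks stored as N x 2 NumPy arrays and is not taken from the patent itself:

    import numpy as np

    def procrustes_align(shape, target):
        """Align `shape` (N x 2 landmarks) to `target` using only scale,
        rotation and translation, minimising the sum of squared distances.
        Unweighted sketch; the patent uses a weighted sum over 122 points."""
        mu_s, mu_t = shape.mean(axis=0), target.mean(axis=0)
        s_c, t_c = shape - mu_s, target - mu_t
        # Optimal rotation from the SVD of the cross-covariance matrix.
        u, _, vt = np.linalg.svd(s_c.T @ t_c)
        rotation = u @ vt
        if np.linalg.det(rotation) < 0:  # avoid reflections
            u[:, -1] *= -1
            rotation = u @ vt
        scale = np.trace((s_c @ rotation).T @ t_c) / np.trace(s_c.T @ s_c)
        return scale * (s_c @ rotation) + mu_t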
Step 5 comprises calculating the shape and texture of a plurality of primary facial expressions for the facial image. This is done using the techniques described in "A Computational Model of Facial Expression", J. Ghent, PhD Thesis, 2005 and "Photo-realistic facial expression synthesis", J. Ghent and J. McDonald, Image and Vision Computing Journal, 2005.
As shown in Figures 3 and 4, subject-independent shape and texture models, denoted the Facial Expression Shape Model (FESM) and the Facial Expression Texture Model (FETM) respectively, are used to calculate the shape and the texture of the six primary facial expressions (fear, anger, surprise, happiness, sadness and disgust) for the face. The FESM and FETM are generated (step 14) from a database of images of faces depicting each of the primary facial expressions as described by the Facial Action Coding System (FACS). In steps 5a and 5b, the FESM and FETM algorithms are applied to the acquired facial image to calculate a shape and texture for the face for each of the primary facial expressions.
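The construction of the FESM and FETM themselves is given in the cited Ghent publications rather than in this document. Purely as a hedged illustration of the general idea of a statistical model learned from a database of FACS-coded expression images, the following builds a generic PCA point-distribution model over aligned landmark vectors; it is a stand-in sketch under that assumption, not the published FESM algorithm:

    import numpy as np

    def build_shape_model(aligned_shapes, variance_kept=0.95):
        """Generic PCA point-distribution model over aligned landmark vectors.
        `aligned_shapes` is an (n_images, 244) array: 122 (x, y) points per face.
        Illustrative stand-in for the subject-independent shape model of step 14."""
        mean_shape = aligned_shapes.mean(axis=0)
        centred = aligned_shapes - mean_shape
        _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
        explained = (singular_values ** 2) / np.sum(singular_values ** 2)
        n_modes = int(np.searchsorted(np.cumsum(explained), variance_kept)) + 1
        return mean_shape, components[:n_modes]  # mean + principal modes of variation

    def project_shape(shape, mean_shape, modes):
        """Express a face shape as coefficients of the model's modes."""
        return modes @ (shape - mean_shape)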
In step 6, the calculated shapes for each of the primary expressions are aligned to each other using Procrustes alignment and a new mean face shape is calculated. In step 7, each of the calculated textures is warped to the newly calculated mean face shape using a piece-wise affine transformation algorithm.
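A minimal sketch of the piece-wise affine warp of step 7, assuming scikit-image is available (the patent does not name an implementation) and that landmarks are stored as 122 x 2 arrays in (x, y) order:

    import numpy as np
    from skimage.transform import PiecewiseAffineTransform, warp

    def warp_texture_to_mean(image, shape_points, mean_shape_points):
        """Warp `image` so that its landmarks `shape_points` move onto
        `mean_shape_points`. Piece-wise affine: each triangle of the landmark
        mesh is mapped by its own affine transform."""
        transform = PiecewiseAffineTransform()
        # warp() expects the inverse mapping (output coordinates to input
        # coordinates), so the transform is estimated from the mean shape
        # back to the subject's shape.
        transform.estimate(mean_shape_points, shape_points)
        return warp(image, transform, output_shape=image.shape[:2])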
Step 8 comprises generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures. As shown in Figures 3 and 4, this is achieved by generating a shape and texture model that contains only the neutral expression and the 6 primary facial expressions for the individual face. The model is a matrix-based model denoted as iSpace (or identity-Space). The iSpace is generated using the warped textures and the aligned shapes generated in steps 6 and 7. iSpace is a unified expression model of the face which allows primary and non-primary (that is, hybrid or dynamic) expressions to be generated for a specific face.
Thus, a subject-specific FESM and FETM are generated for the face, denoted iFESM and iFETM respectively. Together, the iFESM and iFETM form the iSpace for the face as shown in Figure 2. The iSpace allows for the synthesis of non-primary facial expressions that retain the identity of an individual and, by interpolation, the generation of dynamic expression image sequences of that individual. The primary facial expressions form the basis of the iSpace and the iSpace consists of all linear combinations of this basis. Particular combinations will result in particular non-primary facial expressions.
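By way of illustration, the iSpace matrix can be pictured as one column per expression, each column a concatenated shape-and-texture vector built from the aligned shapes and warped textures of steps 6 and 7. The ordering below (neutral followed by the six primary expressions) is an assumption made for the sketch and is not specified in the patent:

    import numpy as np

    PRIMARY_ORDER = ["neutral", "fear", "anger", "surprise", "happiness", "sadness", "disgust"]

    def build_ispace(shapes, textures):
        """Stack one concatenated shape-and-texture column per expression.
        `shapes` and `textures` are dicts keyed by the names in PRIMARY_ORDER;
        the result is the M x 7 iSpace matrix S_I whose columns are the basis
        vectors of the subject-specific expression space."""
        columns = [np.concatenate([shapes[name].ravel(), textures[name].ravel()])
                   for name in PRIMARY_ORDER]
        return np.column_stack(columns)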
In step 9, an expression vector corresponding to a non-primary facial expression to be synthesised is selected. The expression vector is selected from an Expression Code Book (ECB). The ECB is a look-up table that relates high-level expression descriptions (e.g. pleasantly surprised) to 7-dimensional expression vectors. The ECB is generated by analysing a plurality of images, each of which depicts a facial expression. For each image, the relationship between a neutral expression and the facial expression depicted therein is determined to create an expression vector associated with the facial expression. The expression vectors are stored so that they may be indexed by an expression description descriptive of the associated facial expression.
Essentially, the ECB provides coordinates of particular expressions within iSpace.
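A toy illustration of such a look-up table is given below; the descriptions and the 7-dimensional weight values are invented purely for illustration and do not come from the patent:

    # Toy Expression Code Book: description -> 7-D expression vector.
    # Component order assumed: neutral, fear, anger, surprise, happiness, sadness, disgust.
    expression_code_book = {
        "neutral":              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "broad smile":          [0.1, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0],
        "pleasantly surprised": [0.1, 0.0, 0.0, 0.5, 0.4, 0.0, 0.0],
        "mild concern":         [0.5, 0.3, 0.0, 0.0, 0.0, 0.2, 0.0],
    }

    def lookup_expression_vector(description):
        return expression_code_book[description]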
The expression vector for a particular non-primary expression describes the amount of each primary expression in the non-primary expression. Each of the columns of the iSpace matrix corresponds to a primary facial expression (or the neutral expression). A particular output expression is generated as a weighted linear combination of the column vectors of the iSpace matrix, where the weights are given by the particular expression vector. Thus, the ECB can be used in conjunction with iSpace to generate dynamic, hybrid, and non-neutral facial expressions. This is shown in Equation 1 below, where V_o is an output vector, S_I is an M x N iSpace matrix (i.e. the column vectors of this matrix are the basis vectors) and E_v is the expression vector.

V_o = S_I E_v    (Equation 1)

Thus, the output vector V_o is a linear combination of the columns of S_I. Any non-primary facial expression may be approximated by a linear combination of the basis vectors (i.e. the primary expressions). There are potentially an infinite number of weighted linear combinations that may be generated, and thus there is no limit to the degree of subtlety of the hybrid expressions that may be synthesised using the method of the present invention.
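Equation 1 is a single matrix-vector product, as in the following sketch; the split of the output vector back into a shape part and a texture part assumes the concatenated column layout illustrated earlier:

    import numpy as np

    def synthesise(ispace, expression_vector, shape_length):
        """Equation 1: V_o = S_I @ E_v. The first `shape_length` entries of the
        output are taken as the synthesised shape, the rest as the texture
        (layout assumed from the concatenation used when building S_I)."""
        output = ispace @ np.asarray(expression_vector)
        return output[:shape_length], output[shape_length:]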
The ECB is generated by using images portraying specific facial expressions to calculate the expression vector as follows:
S_I^inv V_t = E_t    (Equation 2)
where S_I^inv is the pseudo-inverse of the iSpace matrix, V_t is the shape-and-texture vector extracted from a training image, and E_t is the resulting training expression vector. The ECB is generated only once using a single exemplar individual and may then be applied to other faces according to the method of the invention.
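A minimal NumPy sketch of Equation 2, assuming the shape-and-texture vector V_t has already been measured from each exemplar training image:

    import numpy as np

    def train_expression_vector(ispace, training_vector):
        """Equation 2: E_t = pinv(S_I) @ V_t, where V_t is the shape-and-texture
        vector measured from a training image of the exemplar individual."""
        return np.linalg.pinv(ispace) @ training_vector

    def build_ecb(ispace, labelled_training_vectors):
        """Build the Expression Code Book once from exemplar images:
        {description: training vector} -> {description: 7-D expression vector}."""
        return {label: train_expression_vector(ispace, vector)
                for label, vector in labelled_training_vectors.items()}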
In step 10, the expression vector is applied to the iSpace model to synthesise a facial image having the selected non-primary facial expression. In step 11, the shape and textures are projected out of iSpace to generate the output vector V_o as given by Equation 1. The synthesised shape is rescaled to its original size and the synthesised texture is warped to the new synthesised shape vector.
In steps 12 and 13, the synthesised facial image is combined with the acquired facial image. Step 12 comprises superimposing the synthesised facial image over the acquired facial image. In step 13, the synthesised image is blended with the acquired image using standard image processing techniques for added realism. The method and system of the present invention may be implemented within a mobile telephone, personal digital assistant (PDA), camera or other similar device and used to generate images and/or animations portraying a variety of facial expressions of a subject. The background of the image may also be varied, so as to depict the subject in various locations or situations. Where the invention is implemented in a mobile telephone, PDA or other mobile telecommunications device, the created images or animations may be shared or sent via the Internet or transmitted using wireless telecommunication networks.
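Steps 12 and 13 can be illustrated with a simple feathered alpha-blend; the Gaussian feathering used below is an assumption standing in for the unspecified "standard image processing techniques":

    import numpy as np
    import cv2

    def blend_face(acquired, synthesised, face_mask, feather=15):
        """Superimpose the synthesised face region onto the acquired colour image
        and blend the seam. `face_mask` is a float mask in [0, 1] marking the
        synthesised face region; feathering softens the boundary."""
        alpha = cv2.GaussianBlur(face_mask.astype(np.float32), (0, 0), feather)
        alpha = np.clip(alpha, 0.0, 1.0)[..., None]  # broadcast over colour channels
        blended = (alpha * synthesised.astype(np.float32)
                   + (1.0 - alpha) * acquired.astype(np.float32))
        return blended.astype(np.uint8)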
The words "comprises/comprising" and the words "having/including" when used herein with reference to the present invention are used to specify the presence of stated features, integers, steps or components but do not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.

Claims

Claims
1. A method for synthesis of a non-primary facial expression, comprising: acquiring a facial image; calculating a shape and a texture for each of a plurality of primary facial expressions for the facial image; generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures; selecting an expression vector corresponding to a non-primary facial expression to be synthesised; and applying the expression vector to the model to synthesise a facial image having the selected non-primary facial expression.
2. A method as claimed in claim 1, wherein the subject-specific model of the face is a matrix-based model.
3. A method as claimed in claim 2, wherein the step of applying the expression vector to the model comprises multiplying the expression vector by the matrix to obtain an output vector that describes the selected non-primary facial expression.
4. A method as claimed in claim 2 or claim 3, wherein each of the columns of the matrix corresponds to a primary facial expression.
5. A method as claimed in claim 4, wherein the output vector is a weighted linear combination of the columns of the matrix.
6. A method as claimed in any preceding claim, further comprising: generating an estimate of the size and location of the face within the acquired facial image.
7. A method as claimed in claim 6, further comprising: scaling and aligning the estimate to a mean face shape.
8. A method as claimed in claim 7, wherein the steps of scaling and aligning are done using Procrustes alignment.
9. A method as claimed in any preceding claim, further comprising: locating at least one feature of the face.
10. A method as claimed in claim 9, wherein the locating step includes locating at least one of: an eye, an eyebrow, a pupil, a nose, or a mouth.
11. A method as claimed in claim 9 or claim 10, wherein the locating step comprises: identifying a plurality of landmark points associated with at least one feature of the face.
12. A method as claimed in any preceding claim, wherein the step of calculating the shape of the plurality of primary facial expressions for the facial image comprises: applying a subject-independent Facial Expression Shape Model algorithm to the facial image to calculate a shape for the face for each of the primary facial expressions.
13. A method as claimed in any preceding claim, wherein the step of calculating the texture of the plurality of primary facial expressions for the facial image comprises: applying a subject-independent Facial Expression Texture Model algorithm to the facial image to calculate a texture for the face for each of the primary facial expressions.
14. A method as claimed in any preceding claim, further comprising: aligning the calculated shapes for each of the primary expressions to each other.
15. A method as claimed in claim 14, wherein the step of aligning the calculated shapes is done using Procrustes alignment.
16. A method as claimed in claim 14 or claim 15, further comprising: calculating a new mean face shape from the aligned shapes.
17. A method as claimed in claim 16, further comprising: warping each of the calculated textures to the new mean face shape.
18. A method as claimed in claim 17, wherein the step of warping is done using a piece-wise affine transformation algorithm.
19. A method as claimed in any preceding claim, further comprising: combining the synthesised facial image with the acquired facial image.
20. A method as claimed in claim 19, wherein the step of combining the synthesised facial image with the acquired facial image comprises: superimposing the synthesised facial image over the acquired facial image; and blending the synthesised facial image with the acquired facial image.
21. A system for synthesis of a non-primary facial expression, comprising: means for acquiring a facial image; means for calculating a shape and texture for each of a plurality of primary facial expressions for the facial image; means for generating a subject-specific model of the face including each of the primary facial expressions using the calculated shapes and textures; means for selecting an expression vector corresponding to a non-primary facial expression to be synthesised; and means for applying the expression vector to the model to synthesise a facial image having the selected non-primary facial expression.
22. A method for generating an expression look-up table, comprising: analysing a plurality of images, each of which depicts a facial expression; for each image, determining the relationship between a neutral expression and the facial expression depicted therein to create an expression vector associated with the facial expression; storing each expression vector indexed by an expression description descriptive of the associated facial expression.
23. A method for synthesis of a non-primary facial expression substantially as hereinbefore described with reference to and/or as illustrated in the accompanying drawings.
24. A method for generating an expression look-up table substantially as hereinbefore described with reference to and/or as illustrated in the accompanying drawings.
PCT/EP2008/061319 2007-09-05 2008-08-28 Method and system for synthesis of non-primary facial expressions WO2009030636A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IE2007/0634 2007-09-05
IE20070634A IE20070634A1 (en) 2007-09-05 2007-09-05 Method and system for synthesis of non-primary facial expressions

Publications (1)

Publication Number Publication Date
WO2009030636A1 (en)

Family

ID=40225272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/061319 WO2009030636A1 (en) 2007-09-05 2008-08-28 Method and system for synthesis of non-primary facial expressions

Country Status (2)

Country Link
IE (1) IE20070634A1 (en)
WO (1) WO2009030636A1 (en)


Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
COOTES T F ET AL: "ACTIVE APPEARANCE MODELS", EUROPEAN CONFERENCE ON COMPUTER VISION, BERLIN, DE, vol. 2, no. 1, 1 January 1998 (1998-01-01), pages 484 - 498, XP000884426 *
DU Y ET AL: "Emotional facial expression model building", PATTERN RECOGNITION LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 24, no. 16, 1 December 2003 (2003-12-01), pages 2923 - 2934, XP004463254, ISSN: 0167-8655 *
GHENT ET AL: "Photo-realistic facial expression synthesis", IMAGE AND VISION COMPUTING, GUILDFORD, GB, vol. 23, no. 12, 1 November 2005 (2005-11-01), pages 1041 - 1050, XP005060790, ISSN: 0262-8856 *
GROSS ET AL: "Generic vs. person specific active appearance models", IMAGE AND VISION COMPUTING, GUILDFORD, GB, vol. 23, no. 12, 1 November 2005 (2005-11-01), pages 1080 - 1093, XP005060793, ISSN: 0262-8856 *
IAIN MATTHEWS ET AL: "Active Appearance Models Revisited", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, BO, vol. 60, no. 2, 1 November 2004 (2004-11-01), pages 135 - 164, XP019216428, ISSN: 1573-1405 *
LEI XIONG ET AL: "Facial Expression Sequence Synthesis Based on Shape and Texture Fusion Model", IMAGE PROCESSING, 2007. ICIP 2007. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 September 2007 (2007-09-01), pages IV - 473, XP031158758, ISBN: 978-1-4244-1436-9 *
XIAOGUANG LU ET AL: "Integrating Range and Texture Information for 3D Face Recognition", APPLICATION OF COMPUTER VISION, 2005. WACV/MOTIONS '05 VOLUME 1. SEVENTH IEEE WORKSHOPS ON, IEEE, PI, 1 January 2005 (2005-01-01), pages 156 - 163, XP031059084, ISBN: 978-0-7695-2271-5 *
ZHOU ET AL: "Facial expressional image synthesis controlled by emotional parameters", PATTERN RECOGNITION LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 26, no. 16, 1 December 2005 (2005-12-01), pages 2611 - 2627, XP005136270, ISSN: 0167-8655 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130002669A1 (en) * 2011-06-30 2013-01-03 Samsung Electronics Co., Ltd. Method and apparatus for expressing rigid area based on expression control points
US9454839B2 (en) * 2011-06-30 2016-09-27 Samsung Electronics Co., Ltd. Method and apparatus for expressing rigid area based on expression control points
US8593452B2 (en) 2011-12-20 2013-11-26 Apple Inc. Face feature vector construction
AU2012227166B2 (en) * 2011-12-20 2014-05-22 Apple Inc. Face feature vector construction

Also Published As

Publication number Publication date
IE20070634A1 (en) 2009-04-15

Similar Documents

Publication Publication Date Title
Pyun et al. An example-based approach for facial expression cloning
Pighin et al. Modeling and animating realistic faces from images
US8624901B2 (en) Apparatus and method for generating facial animation
CN111833236B (en) Method and device for generating three-dimensional face model for simulating user
Sharma et al. 3d face reconstruction in deep learning era: A survey
US11587288B2 (en) Methods and systems for constructing facial position map
WO2021228183A1 (en) Facial re-enactment
KR100900823B1 (en) An efficient real-time skin wrinkle rendering method and apparatus in character animation
Ahmed et al. Automatic generation of personalized human avatars from multi-view video
KR20230085931A (en) Method and system for extracting color from face images
Song et al. A generic framework for efficient 2-D and 3-D facial expression analogy
CN115393480A (en) Speaker synthesis method, device and storage medium based on dynamic nerve texture
Theobald et al. Real-time expression cloning using appearance models
JP2024506170A (en) Methods, electronic devices, and programs for forming personalized 3D head and face models
Xu et al. Efficient 3d articulated human generation with layered surface volumes
WO2009030636A1 (en) Method and system for synthesis of non-primary facial expressions
CN116863044A (en) Face model generation method and device, electronic equipment and readable storage medium
KR100792704B1 (en) A Method of Retargeting A Facial Animation Based on Wire Curves And Example Expression Models
CN115023742A (en) Facial mesh deformation with detailed wrinkles
Park et al. A feature‐based approach to facial expression cloning
Sun et al. SSAT $++ $: A Semantic-Aware and Versatile Makeup Transfer Network With Local Color Consistency Constraint
Wang et al. Uncouple generative adversarial networks for transferring stylized portraits to realistic faces
Jiang et al. Animating arbitrary topology 3D facial model using the MPEG-4 FaceDefTables
Tu et al. Expression detail mapping for realistic facial animation
Zhang et al. Fast individual face modeling and animation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08787552

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08787552

Country of ref document: EP

Kind code of ref document: A1