WO1998001830A1 - Image processing - Google Patents

Image processing

Info

Publication number
WO1998001830A1
Authority
WO
WIPO (PCT)
Prior art keywords
muscle
model
shape
muscles
generic
Prior art date
Application number
PCT/GB1997/001834
Other languages
French (fr)
Inventor
William John Welsh
Cecile PINCEMAIL
Original Assignee
British Telecommunications Public Limited Company
Priority date
Filing date
Publication date
Priority claimed from GBGB9614194.0A (GB9614194D0)
Application filed by British Telecommunications Public Limited Company
Priority to AU34526/97A (AU3452697A)
Publication of WO1998001830A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Abstract

A method of coding an image of an animated object, an object being represented by a shape model defining the generic shape of the object and a muscle model defining the generic arrangement of muscles associated with the object, said image being coded in terms of movement of the shape and/or muscle model, the shape and muscle model for an object having a predefined interrelationship such that when one of the models is conformed to the shape of a specific example of the object, the other of said models is also conformed accordingly. The muscle model comprises information relating to predefined expressions, which information relates to which muscles are activated for each predefined expression and the degree of activation required, wherein, when the shape model is conformed to an object, the degree of activation is adapted in accordance with the changes made to the shape model.

Description

IMAGE PROCESSING
This invention relates to image processing and in particular image processing for use in real time applications such as video conferencing. Broadcast quality TV requires a transmission bandwidth in excess of 100 Mbit/s when coded in digital form, which is both expensive to transmit and requires high-bandwidth links. When images are coded digitally for transmission, it is therefore desirable to keep the amount of data generated as low as possible.
This may be done using predictive coding techniques which exploit the correlation between the picture elements (pixels) of the digitised sequence of images or frames. Alternatively, model-based or knowledge-based image coding may be used, in which a model of an object in a scene is used. As an example, consider a scene containing a cube moving around against a plain background: the information needed to generate a moving sequence at the receiver would be a description of the cube (which only needs to be sent once at the beginning of the transmission) followed by a description of its motion.
It is very straightforward to produce computer animated sequences of cubes using model-based coding, but for video conferencing the modelling of people in the scene is more complicated. In "Parameterized Models for Facial Animation", F. I. Parke, IEEE Computer Graphics and Applications, November 1982, a parameterised model of a face was proposed for use in videophony. A computer model of a head comprising a net of inter-connected polygons is stored in a computer as a set of linked lists or linked arrays.
In order to make the models appear more realistic, the polygon net can be shaded. The shading is made to depend on the presence of imaginary light sources in the model world and the assumption of reflectance properties of the model surfaces. A so-called smooth shading technique such as Gouraud or Phong shading can be used to improve realism. In these techniques the shading of each polygon is effectively interpolated from the shading values at the vertices. This gives the impression of a smoothly curved surface.
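The interpolation these smooth-shading techniques perform can be sketched in a few lines. The following Python fragment is illustrative only: the vertex intensities, interpolation parameters and function names are assumptions, not part of the patent. It shades one pixel Gouraud-style by interpolating vertex intensities down two polygon edges and then across the scanline between them.

```python
def lerp(a, b, t):
    """Linear interpolation between two shading values."""
    return (1.0 - t) * a + t * b

def gouraud_pixel(i_v0, i_v1, i_v2, t_left, t_right, t_scan):
    """Shade one pixel Gouraud-style: vertex intensities are interpolated
    down the polygon's two edges, then across the scanline between them."""
    i_left = lerp(i_v0, i_v1, t_left)    # intensity where the scanline meets the left edge
    i_right = lerp(i_v0, i_v2, t_right)  # intensity where it meets the right edge
    return lerp(i_left, i_right, t_scan)

# Vertex intensities 0.2, 0.9 and 0.5; sample halfway down both edges,
# a quarter of the way along the scanline.
print(gouraud_pixel(0.2, 0.9, 0.5, 0.5, 0.5, 0.25))  # -> 0.5
```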
The face can be made to move in a global way by applying a set of rotations and translations to all the vertices as a whole. In addition the vertices of certain polygons can be translated locally in order to change the facial expression. This may be in accordance with a system that has been developed for categorising facial expressions in terms of about 50 independent facial actions (the facial action coding system or FACS) as described in "Manual for the Facial Action Coding System", P. Ekman and W. V. Friesen, Consulting Psychologists Press, Palo Alto, California, 1977.
The diversity of facial forms in terms of sex, age and race is enormous. It is these diversities that allow us to recognise individuals and send complex nonverbal signals to one another. However, the anatomy of a human head is the same for everyone. It differs in size, but the structure, the arrangement of the bones and their roles are the same. Generally everyone has the same bones in the same place and the same muscles.
In accordance with the invention there is provided a method and apparatus for coding an image of an animated object, an object being represented by a shape model defining the generic shape of the object and a muscle model defining the generic arrangement of muscles associated with the object, said image being coded in terms of movement of the shape and/or muscle model, the shape and muscle model for an object having a predefined interrelationship such that when one of the models is conformed to the shape of a specific example of the object, the other of said models is also conformed accordingly. Thus this generic structure of a shape and a muscle model can be easily manipulated for a specific object.
Preferably, the muscle model comprises information relating to predefined expressions, which information relates to which muscles are activated for each predefined expression and the degree of activation required, wherein, when the shape model is conformed to an object, the degree of activation is adapted in accordance with the changes made to the shape model.
The generic shape model and the generic muscle model may share common points within an object.
A set of action units may be generated for a specific object, each action unit defining the displacement of points within the specific object necessary to produce a required animated expression. The invention is particularly suitable for use in video conferencing applications and will also find use in graphical user interfaces which require animation.
The invention will now be described further, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows an example of a generic shape model for a human face; Figure 2 shows an example of a generic muscle model for a human face; Figure 3 shows the logical structure of the muscle model of Figure 2; Figure 4 shows a structure of the expressions associated with the muscle model;
Figures 5a and 5b show the generic shape model of Figure 1 and the conformed shape model respectively;
Figure 6 shows a conformed muscle model added onto a face image; and Figures 7a-f show various expressions applied to the face.

The shape model, as shown in Figure 1, represents a human head and comprises a so-called wire frame of interconnected polygons 10. The x, y and z components of each polygon vertex in the object's face are stored in a computer as arrays X(V), Y(V), Z(V), where V is the vertex address. In addition, a pair of two-dimensional arrays are used: LINV(L)(E) gives the addresses of the vertices at the ends of a line L, where E is either 0 or 1 depending on which end of the line is being considered; LINL(P)(S) gives the line address for each side S of a polygon P. The wire frame is drawn on the screen by iterating through all the values of L in LINV(L)(E), giving the vertex addresses which, in turn, yield the co-ordinates of the vertices using the arrays X(V), Y(V) and Z(V). Each side of a polygon is projected onto the screen using perspective or orthographic projection, as is known in the art of model-based image coding.
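A minimal Python sketch of these tables may help. The array names follow the patent's X(V), Y(V), Z(V), LINV(L)(E) and LINL(P)(S), but the concrete vertex data and the print-based "drawing" are placeholder assumptions.

```python
import numpy as np

# Per-vertex coordinates indexed by vertex address V (the patent's X(V), Y(V), Z(V)).
X = np.array([0.0, 1.0, 1.0, 0.0])
Y = np.array([0.0, 0.0, 1.0, 1.0])
Z = np.array([0.5, 0.5, 0.5, 0.5])

# LINV[L][E]: address of the vertex at end E (0 or 1) of line L.
LINV = np.array([[0, 1], [1, 2], [2, 3], [3, 0]])

# LINL[P][S]: line address for each side S of polygon P.
LINL = np.array([[0, 1, 2, 3]])

def draw_wireframe():
    """Iterate through all lines, look up the end vertices and project them
    orthographically (drop z); printing stands in for actual drawing."""
    for L in range(len(LINV)):
        v0, v1 = LINV[L]
        print(f"line {L}: ({X[v0]}, {Y[v0]}) -> ({X[v1]}, {Y[v1]})")

draw_wireframe()
```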
The muscles of the human face are embedded in the superficial fascia, the majority extending from the bones of the skull and into the skin. The orifices of the face, namely the eye orbits, nose and mouth, are guarded by the eyelids, nostrils and lips. It is the function of the facial muscles to serve as sphincters or dilators of these structures. A secondary function of the facial muscles is to modify the expression of the face. Each muscle can be defined according to the orientation of the fasciculi (the individual fibres of the muscles), which may be parallel/linear, oblique or spiralized relative to the direction of pull at their attachment. They can be split into two groups: the upper and lower face. In the lower face there are five major groups:
• Uppers and downers, that move the face upwards towards the brow and conversely towards the chin
• Those that contract horizontally towards the ears and conversely towards the centre line of the face
• Oblique muscles that contract in an angular direction from the lips, upwards and outwards to the cheek bones
• Orbitals, that are circular or elliptical in nature, and run round the eyes and mouth
• Sheet muscle, that carries out miscellaneous actions, particularly over the temporal zones
The upper facial muscles are responsible for the changing appearance of the eyebrows, forehead and the upper and lower lids of the eyes. The muscles contract isotonically towards the static insertion into the cranium; consequently the surface tissue bunches and wrinkles perpendicularly to the direction of the muscle.
The muscles of the mouth have the most complex muscular interaction. The Obicularis Oris is a sphincter muscle with no bony attachment.
The facial muscle system is very complex. For videoconferencing purposes a very complicated model was undesirable due to the small size of the image and the real time nature of a videoconferencing application. A simple muscle model for the face is here described which, for the sake of simplicity only, does not include any muscles on the mouth.
Figure 2 shows an example of a generic muscle model for the muscles of the face. In Figure 2 the origin of each muscle is represented by a solid circle.
• Muscles 0 and 1 represent the frontalis muscle and are used, for example, during the raised eyebrow expression. Each muscle is composed of two lines to represent the width of the frontalis muscle. As these muscles are attached to a bone, the upper vertices are not allowed to move, and the origin is located between the two upper vertices of each muscle.
• Muscle 2 represents the corrugator muscle. When it is contracted, the surface tissue bunches and wrinkles perpendicularly to the direction of the muscle. This muscle is composed of two segments whose extremities are moving vertices. The origin is positioned under the muscle to emphasize the slope of the eyebrow for an expression like anger.
• Muscles 3 and 4 represent the orbicularis oculi, which are the sphincter muscles of the eyelids. In the model, each muscle describes a large circle around the eye to improve the realism of each motion. Each muscle contains 7 vertices, but just the upper vertices are allowed to move, to represent the movement of the eyelid. The origin is in the middle of the circle.
• Muscles 5 and 6 comprise one segment which creates a link between the eye and the mouth. This is used, for example, during a smile to contract the skin between the eye and the mouth.
• Muscle 7 represents the muscles of the nostrils, used both as a sphincter muscle and as a dilator muscle. It contains two segments alongside the nose. The two lower vertices are used to represent the different motions of the nostrils. The origin is the middle point of the four extremities of the segments.
• Muscles 8 and 9 can be compared to the zygomaticus major, and are very useful for the smile. Each muscle is represented by two segments to show the width of the muscle. To represent the attachment to the bone, only the lower vertices are allowed to move, and the origins are between the two upper extremities.
• Muscles 10 and 11 can be compared, at one and the same time, to the buccinator and the m. levator. Attached to the bone at one extremity, just the vertices nearest to the mouth are allowed to move.
• Muscle 12 is the obicularis oris. This is a sphincter muscle with no bony attachment, so each vertex is allowed to move. This muscle contains 8 segments and is used in most of the expressions. It is used not only as a sphincter muscle which compresses the lips together, but also as a dilator muscle which separates the lips. It just defines the shape of the external mouth. The internal mouth is not modified by this muscle.
• Muscles 13 and 14 can be compared, at one and the same time, to the m. depressor and the labii inferioris. They contain two segments to represent the width of the muscle, and two moving vertices to represent the bony attachment.
• Muscles 15 and 16 are used to move the eyelids. Their contraction gives better accuracy than the orbicularis oculi. On the single segment, the upper vertex is used as the origin and the lower is allowed to move.
• Muscles 17 and 18 are used to represent the small contraction of the skin under the eyes. As with the two previous muscles, the accuracy is better. They are composed on the same basis.
The face wire frame as shown in Figure 1 is modified to add new vertices defining the muscles' extremities, i.e. the muscle model of Figure 2 is added to the shape model of Figure 1. The face wire frame is a set of 144 vertices, and some of them are used as muscle points too. 54 other points are defined as muscles' extremities only, because no vertex in the shape model corresponds to the position of the muscle extremity. The information is stored in a file. The first line of this file contains the identification number of a face wire frame (e.g. no. 0001), then the number of vertices in the face wire frame (144), and before the number of facets (217), there is the number of vertices used only as muscles' extremities (54).
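A sketch of parsing that header line, under the field order just described; the function name and the dictionary layout are assumptions, not part of the patent.

```python
def read_wireframe_header(line):
    """Parse the first line of a face wire-frame file.

    Field order per the description: wire-frame id (e.g. 0001), number of
    vertices (144), number of muscle-extremity-only vertices (54), and
    number of facets (217).
    """
    fields = line.split()
    return {
        "frame_id": fields[0],
        "num_vertices": int(fields[1]),
        "num_muscle_only_vertices": int(fields[2]),
        "num_facets": int(fields[3]),
    }

print(read_wireframe_header("0001 144 54 217"))
```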
The muscle model and the shape model are locked together such that, when the generic shape model is conformed to fit a particular face, the muscle model is also conformed. As well as positioning the muscle attachment points this also scales the muscle lengths. When an expression is used to activate a set of muscles - for example, to create an impression of anger - the muscles thus contract to an appropriate degree for the specific face, rather than for the generic face where different sizes and locations of the features (eyes, mouth, nose etc.) would require a different set of characteristics.
The logical structure of the muscle model is defined as a tree, as shown in Figure 3.
On the top level 31, there is the muscle model of the face (as shown in Figure 2), which represents the set of all the muscles of the face, which are linked together. This set is characterised by the number of muscles 31a that it contains and by a list 31b of all the individual muscles. Each muscle is composed of one or more segments (1 to n). For example the Orbicularis Oris, which is around the mouth, contains 8 segments. Each segment is defined by the vertices defining its extremities. These vertices belong to the face wire frame, as will be described later. The segments of each muscle are defined in the next level 32. Two other parameters used to characterise a muscle are:
• the origin 32a, which is used when the muscle is pulled or squeezed. The origin is defined by x, y, z co-ordinates or a vertex number which define(s) the centre of the scale function corresponding to the muscle activation (the origins of the muscle model are indicated by dots in Figure 2).
• the moving vertices 32b, which characterise the extremities of the muscle attached to the skin (i.e. the position of the moving vertices). Those vertices which represent muscle extremities which are attached to the bones of the skull, and hence which do not move, are not defined in the muscle model.
All the muscles are described in a computer file by their segments 32, moving vertices 32b and origin 32a. This file contains one line per muscle. For each muscle, there is the number of segments and the list of segment extremities; then the number of moving vertices and the list of moving vertices; and the number and list of vertices used to calculate the position of the origin. The origin is the barycentre of these vertices.
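A sketch of one per-muscle record and the barycentre computation it implies; the container types and names are assumptions, and only the field layout and the barycentre rule come from the text.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Muscle:
    segments: list          # (v_start, v_end) vertex-address pairs
    moving_vertices: list   # skin-attached vertex addresses (allowed to move)
    origin_vertices: list   # vertex addresses whose barycentre is the origin

def muscle_origin(muscle, X, Y, Z):
    """The origin is the barycentre of the listed vertices."""
    idx = muscle.origin_vertices
    return np.array([X[idx].mean(), Y[idx].mean(), Z[idx].mean()])

# Toy coordinate arrays and a one-segment muscle.
X = np.array([0.0, 2.0, 4.0])
Y = np.array([0.0, 1.0, 2.0])
Z = np.zeros(3)
m = Muscle(segments=[(0, 2)], moving_vertices=[2], origin_vertices=[0, 1])
print(muscle_origin(m, X, Y, Z))  # -> [1.  0.5 0. ]
```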
Each expression is described by a list of muscles to contract. In a file named expression, each expression is defined in terms of the number of muscles to scale, the muscle number identification, and the X and Y factors by which the muscles are to be expanded or contracted with respect to the origin to produce the desired expression.
The logical structure of the expressions is shown in Figure 4. The X and Y factors determine the displacement (in the X and Y directions) of the moving vertices of the associated muscle with respect to the origin. For instance, consider muscles 5 and 6 of Figure 2, from the eye to the mouth. The generic muscle model may require these muscles to contract by 30% in the Y direction and 2% in the X direction for the expression of a smile. Thus the X and Y factors for the expression "smile" for muscles 5 and 6 are -2 and -30 respectively (or 98% and 70% of the initial position with respect to the origin). Rather than defining the muscle contraction factors as factors of X and Y, the muscle contraction factors may represent the proportion by which the muscle length has to contract for a given expression, the orientation of the moving vertices with respect to the origin remaining the same.
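A sketch of applying the X and Y factors as a scaling of the moving vertices about the muscle origin, following the smile example above; the array layout and function name are assumptions.

```python
import numpy as np

def apply_contraction(points, origin, x_factor, y_factor):
    """Scale a muscle's moving vertices about its origin.

    points: (N, 2) array of the muscle's moving vertices (x, y).
    x_factor, y_factor: percentage change; e.g. -2 and -30 for a smile
    leave the vertices at 98% and 70% of their offset from the origin.
    """
    scale = np.array([1.0 + x_factor / 100.0, 1.0 + y_factor / 100.0])
    return origin + (points - origin) * scale

origin = np.array([50.0, 80.0])
moving = np.array([[60.0, 120.0]])
print(apply_contraction(moving, origin, -2.0, -30.0))  # -> [[ 59.8 108. ]]
```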
The above has assumed that the degree of activation of the muscle is defined once for each muscle used in an expression. Alternatively, however, the degree of activation may be defined for each vertex of the muscle.
Thus the generic muscle model comprises a set of muscle definitions (as shown logically in Figure 3) and a set of expressions (as shown logically in Figure 4). To convert these expressions to so-called action units, the initial position of each point of the face wire frame is registered, together with the position when the expression is maximal. A program then determines the difference between the positions and stores it as an action unit. The last step consists of adding those lines to the action unit file.
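A sketch of that conversion step: record the neutral vertex positions, apply the expression at maximal strength, and store the per-vertex differences as an action unit. The apply_expression callable is a stand-in for the muscle-scaling machinery, not the patent's own program.

```python
import numpy as np

def make_action_unit(neutral_positions, apply_expression):
    """Difference between neutral and maximal-expression vertex positions.

    neutral_positions: (N, 3) array of wire-frame vertex coordinates.
    apply_expression: callable mapping positions to their maximal-expression
    positions. Returns the (N, 3) displacement array forming one action unit.
    """
    maximal = apply_expression(neutral_positions)
    return maximal - neutral_positions

# Toy example: an "expression" that lifts every vertex by 2 units in y.
neutral = np.zeros((4, 3))
au = make_action_unit(neutral, lambda p: p + np.array([0.0, 2.0, 0.0]))
print(au)
```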
As for the shape wire frame, this generic muscle model has to be manipulated for a specific head. The picture in Figure 6 shows a manipulated muscle model added onto a face image, as will be described below.
To conform the shape and muscle models to a particular user, firstly at least two images of the person are grabbed by a video camera, one image with the mouth open and the other with the mouth closed. Both of these images are loaded into a computer, and the generic shape and muscle models are adapted by the conform program, from the generic model, to the specified person. The conform program is also used to define and test all the action units involved in the animation. Once the tests are realised, the files are sent to a receiver. The person at the transmitter is then able to speak on demand via the adapted talking head at the receiver. The head is produced from two still pictures, one with the mouth closed, the other with the mouth open. These pictures have the format *.yuv. Loading the data files means defining the wire frames. So this part is divided into two sections: work on the face wire frame with the mouth-closed image and work on the mouth wire frame with the mouth-open image. First, to adapt the shape wire frame for the head, the face.yuv file is loaded together with the generic face wire frame. Figure 5a shows an image of a participant with the generic shape model superposed thereon. The wire frame is then scaled and rotated by an operator to conform the generic model to the object. An example of the conformed shape model is shown in Figure 5b. Then the conformed wire frame can be made more accurate by the use of key points. The key points are special vertices of the wire frame which can be moved by the mouse. A triangulation is made from all the key points; moving one key point modifies one or more triangles and moves all the vertices of the wire frame contained in those triangles. So moving the key points allows better accuracy for the wire frame.
The same actions are then carried out for the mouth wire frame (not shown) .
The details sent to the receiver comprise six files:
• Two files containing the two pictures used, named face.fy for the picture with the mouth closed and mouth.my for the picture with the mouth open. These pictures have the same size as the talking head window.
• Two files describing the vertices of each wire frame, face.fv for the face wire frame and mouth.mv for the mouth wire frame.
• Two files for the facets of the wire frames, face.ff and mouth.mf.
All these files are created by the conformation program from the pictures produced by the video, and from the manipulated wire frames. Then they are used to produce an animated image at the transmitter, which is then used by the talking head at the receiver.
After conformation of both the generic wireframe model and the generic muscle model to a specific face, synthesised facial expressions can be produced on the face. This can be achieved by applying the set of contraction factors to certain muscles as explained with reference to Figure 4. This has two effects: first, other attached muscles are tugged along, since all the muscles form a linked set; second, the positions of the wireframe vertices are affected by the muscle movements. The method used to move the wireframe vertices is the same as that used during the conformation process, in which some of the vertices are designated as keypoints which the user can reposition and the remaining vertices are moved automatically using the interpolation process, as described in relation to the shape model. In the case of movement due to muscle action, the endpoints of the muscles take the place of the keypoints in the conformation process and the same interpolation method is used.
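A sketch of the interpolation step: a non-key vertex inside a triangle of keypoints (or muscle endpoints) is moved by the barycentric blend of the three keypoint displacements. The helper below is an illustrative reading of the scheme, not the patent's code.

```python
import numpy as np

def interpolate_displacement(vertex, tri, tri_disp):
    """Move a non-key vertex by the barycentric blend of its enclosing
    triangle's key-point displacements (keypoints during conformation,
    muscle endpoints during expression playback).

    vertex: (2,) position inside the triangle.
    tri: (3, 2) key-point positions; tri_disp: (3, 2) their displacements.
    """
    a, b, c = tri
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (vertex[0] - c[0]) + (c[0] - b[0]) * (vertex[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (vertex[0] - c[0]) + (a[0] - c[0]) * (vertex[1] - c[1])) / det
    w2 = 1.0 - w0 - w1
    return vertex + w0 * tri_disp[0] + w1 * tri_disp[1] + w2 * tri_disp[2]

tri = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
disp = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # key-point displacements
print(interpolate_displacement(np.array([5.0, 5.0]), tri, disp))  # -> [6. 6.]
```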
The method described to produce the synthesised facial expressions may be used on any face. To determine one set of muscle contraction factors which produce satisfactory facial expressions for most faces, it is necessary to consider a plurality of different faces to obtain average factors which produce a desired result. Thus a head-independent set of muscle contraction factors is obtained.
This method of determining muscle contraction factors is used off-line since it is more computationally intensive than the method of using action units because of the interpolation process. The method of action units is fast because it is based on a set of displacements applied to a subset of the vertices of the original wireframe model; only three additions or subtractions (in x, y, z directions) are required per vertex. Interpolation requires a number of multiplications to be carried out per vertex.
The same set of muscle contraction factors can then be applied to any new face for each of the expressions required and the displacement of each vertex of the wireframe model from the neutral position is recorded. The set of displacements obtained is then identical to an action unit which, if applied to a real-time player, will result in exactly the same expression being generated. Because the action unit is composed of a set of absolute displacements of the wireframe vertices, it is specific to a particular head, unlike the set of muscle contractions which is generic. The action unit can be used in a real-time head player.
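Applying a stored action unit is then just a per-vertex vector addition, which is what makes the real-time player cheap. A sketch follows; in practice only the affected subset of vertices would be stored, as the text notes.

```python
import numpy as np

def apply_action_unit(positions, action_unit):
    """Per vertex this costs just three additions (x, y and z), which is
    why action units suit the real-time head player."""
    return positions + action_unit

neutral = np.zeros((3, 3))
au = np.array([[0.0, 1.0, 0.0],   # displacement of vertex 0
               [0.0, 2.0, 0.0],   # displacement of vertex 1
               [0.0, 0.0, 0.0]])  # vertex 2 does not move
print(apply_action_unit(neutral, au))
```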
Although the preferred embodiment of the invention uses muscle contraction factors, the degree of activation of a muscle may alternatively be stored in absolute pixel values, i.e. as so-called action units. In this case, the degree of activation may be adapted, in the X direction, in dependence on the ratio of the width of the generic shape model and the conformed shape model and similarly in the Y direction, in dependence on the ratio of the length of the two models. Alternatively, those vertices of the shape model which comprise areas of the object which are associated with muscles may be pre-defined and the relative displacement of the vertices (either individually or on average for a given muscle) of the generic and conformed shape model used to adapt the degree of activation.
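A sketch of that ratio-based adaptation, with hypothetical width and length measurements standing in for the generic and conformed models.

```python
def adapt_activation(dx, dy, generic_size, conformed_size):
    """Scale an absolute activation (pixel displacement) by the ratio of
    the conformed model's width and length to the generic model's.

    generic_size, conformed_size: (width, length) of each shape model.
    """
    gw, gl = generic_size
    cw, cl = conformed_size
    return dx * cw / gw, dy * cl / gl

# A conformed head 20% wider and 10% shorter than the generic one.
print(adapt_activation(10.0, -30.0, (100.0, 140.0), (120.0, 126.0)))  # -> (12.0, -27.0)
```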
An operator at a transmitter may check that the thus conformed animated head provides a good enough representation of the specific person. To do this each expression is tested. The operator is provided with a list of all the possible expressions. To activate an action unit, a cursor for the corresponding action is selected and the degree of the action unit to be applied adjusted if necessary.
Figures 7 a-f show different expressions produced using a generic muscle model as described:
• Figure 7a shows a smile. For a smile, 10 muscles are activated. The most important contractions are the zygomatic contractions. However, the mouth is also moving and stretching towards the side. The smile has repercussions for the eyes too.
• Figure 7b shows surprise. This expression activates 11 muscles. Most of the modifications are around the eyes, to open them. Then the muscle along the nose and the obicularis oris around the mouth are stretched down.
• Figure 7c shows anger. This activates 10 muscles: the muscles near the eyes for frowning, the muscle along the nose to enlarge the nostrils, and the muscle on the chin to contract the skin on it.
• Figure 7d shows disgust. It involves 9 muscles. The same muscles as in the anger expression are activated for frowning, but to a lesser degree.
However, the most important transformations concern the scale of the mouth.
• Figure 7e shows a raised eyebrow. This expression concerns just 5 muscles near the eyes. The transformations are exactly the same as for the surprise expression, but limited to the upper part of the face and with a higher degree of application.
• Figure 7f shows sadness. This expression involves 6 muscles. Four muscles are used to lower the corners of the mouth. The other two are used to raise the eyebrows.
Once those actions are done, the vertices are moved (if necessary), the displacements are coded and the texture mapping is activated to show the animation. Several action units can be activated together.
In the Figures, the muscle model is implemented on the face wire frame only, and displayed on the picture with the mouth closed. In practice, the mouth would also be animated but this has not been discussed here for the sake of simplicity.
At this point, the muscles' vertices are defined for a particular user, but they need to be stored for the particular user in the structure defined in Figure 3. This operation is done when the face wire frame is loaded, by running a procedure named read_muscle. This procedure uses the face wire frame and the file describing the muscles, muscle.wfm. Two other procedures were written to display the mesh of the muscles and their origins, disp_muscle_wf and disp_muscle_origin. When an operator modifies the muscle model, the muscles' extremities may be loaded as key points and a triangulation run. Displacing one extremity, with the left button of a mouse, will modify the vertices contained in the modified triangles. Alternatively a muscle extremity may be selected with the right button of the mouse and then moved. This modification will have no repercussion on the whole face wire frame. In the same way, an origin of a muscle may be moved to adjust the centre of the scaling function.
It is possible to select an expression with an amount ranging from 0 to 100. This will run all the muscles' activation on the X and Y axes. An amount of 100 will activate the contraction of the muscles with the pre-defined factor; an amount of 50 will activate the contraction by half as much, etc. Once the scale function is activated for all the muscles concerned by the expression, the conformation is loaded and the texture mapping is activated.
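A sketch of the amount control: the predefined factors are simply scaled by amount/100 before the scaling about the origin is applied. Names and array layout are assumed, as in the earlier contraction sketch.

```python
import numpy as np

def apply_expression_amount(points, origin, x_factor, y_factor, amount):
    """Contract a muscle at a fraction of its predefined factors:
    amount=100 applies the full predefined contraction, amount=50 half, etc."""
    s = amount / 100.0
    scale = np.array([1.0 + x_factor * s / 100.0, 1.0 + y_factor * s / 100.0])
    return origin + (points - origin) * scale

origin = np.array([50.0, 80.0])
moving = np.array([[60.0, 120.0]])
print(apply_expression_amount(moving, origin, -2.0, -30.0, 50))  # half-strength smile -> [[ 59.9 114. ]]
```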
The degree of activation of a muscle for a given expression is defined generically. That means that the same vertices are moved for a smile whoever the person is. However, the amount of movement is adapted for a particular person by means of the conforming of the muscle model in accordance with the shape model.
The action units and images may be transmitted from a transmitter to a remote receiver, e.g. in a video-conferencing application. Alternatively, the action units and images may be used locally to provide an animated interface for a service or the like.
Whilst this description has described the invention in relation to an object representing a head, it will be clear to the reader that the invention is applicable to the animation of any object.

Claims

1. A method of coding an image of an animated object, an object being represented by a shape model defining the generic shape of the object and a muscle model defining the generic arrangement of muscles associated with the object, said image being coded in terms of movement of the shape and/or muscle model, the shape and muscle model for an object having a predefined interrelationship such that when one of the models is conformed to the shape of a specific example of the object, the other of said models is also conformed accordingly, the muscle model also comprising information relating to predefined expressions, which information relates to which muscles are activated for each predefined expression and the degree of activation required, wherein, when the shape model is conformed to an object, the degree of activation is adapted in accordance with the changes made to the shape model.
2. A method according to claim 1, wherein the generic shape model and the generic muscle model share at least one common point within an animated object.
3. A method according to claim 1 or 2, wherein a set of action units are generated for a specific object, each action unit defining the displacement of points within the specific object necessary to produce a required animated expression.
4. A method according to claim 1 or 2, wherein the degree of activation is represented as a proportion of the length of the associated muscle.
5. An image reproduced from a signal coded according to any of claims 1 to 4.
6. An image processing apparatus comprising a stored shape model defining the generic shape of an object and a stored muscle model defining the generic arrangement of muscles associated with the object, means for coding an image of the object in terms of movement of the shape and/or muscle model, the shape and muscle model for an object having a predefined interrelationship such that when one of the models is conformed to the shape of a specific example of the object, the other of said models is also conformed accordingly, the muscle model comprising information relating to predefined expressions, which information relates to which muscles are activated for each predefined expression and the degree of activation required, wherein, when the shape model is conformed to an object, the degree of activation is adapted in accordance with the changes made to the shape model.
7. Apparatus according to claim 6, wherein the generic shape model and the generic muscle model share at least one common point within an animated object.
8. Apparatus according to claim 6 or 7, wherein a set of action units are generated for a specific object, each action unit defining the displacement of points within the specific object necessary to produce a required animated expression.
9. Video conferencing apparatus including image processing apparatus according to any of Claims 6 to 8.
PCT/GB1997/001834 1996-07-05 1997-07-07 Image processing WO1998001830A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU34526/97A AU3452697A (en) 1996-07-05 1997-07-07 Image processing

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GBGB9614194.0A GB9614194D0 (en) 1996-07-05 1996-07-05 Image processing
GB9614194.0 1996-07-05
EP96306184 1996-08-23
EP96306184.1 1996-08-23

Publications (1)

Publication Number Publication Date
WO1998001830A1 true WO1998001830A1 (en) 1998-01-15

Family

ID=26143855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1997/001834 WO1998001830A1 (en) 1996-07-05 1997-07-07 Image processing

Country Status (2)

Country Link
AU (1) AU3452697A (en)
WO (1) WO1998001830A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002009037A2 (en) * 2000-07-24 2002-01-31 Reflex Systems Inc. Modeling human beings by symbol manipulation
WO2002009040A1 (en) * 2000-07-24 2002-01-31 Eyematic Interfaces, Inc. Method and system for generating an avatar animation transform using a neutral face image
WO2002030171A2 (en) * 2000-10-12 2002-04-18 Erdem Tanju A Facial animation of a personalized 3-d face model using a control mesh
US7127081B1 (en) 2000-10-12 2006-10-24 Momentum Bilgisayar, Yazilim, Danismanlik, Ticaret, A.S. Method for tracking motion of a face

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHOI C S ET AL: "ANALYSIS AND SYNTHESIS OF FACIAL IMAGE SEQUENCES IN MODEL-BASED IMAGE CODING", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 4, no. 3, 1 June 1994 (1994-06-01), pages 257 - 275, XP000460758 *
TERZOPOULOS D ET AL: "ANALYSIS AND SYNTHESIS OF FACIAL IMAGE SEQUENCES USING PHYSICAL AND ANATOMICAL MODELS", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 15, no. 6, 1 June 1993 (1993-06-01), pages 569 - 579, XP000369961 *
YUENCHENG LEE ET AL: "REALISTIC MODELING FOR FACIAL ANIMATION", COMPUTER GRAPHICS PROCEEDINGS, LOS ANGELES, AUG. 6 - 11, 1995, 6 August 1995 (1995-08-06), COOK R, pages 55 - 62, XP000546216 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002009037A2 (en) * 2000-07-24 2002-01-31 Reflex Systems Inc. Modeling human beings by symbol manipulation
WO2002009040A1 (en) * 2000-07-24 2002-01-31 Eyematic Interfaces, Inc. Method and system for generating an avatar animation transform using a neutral face image
WO2002009037A3 (en) * 2000-07-24 2002-04-04 Reflex Systems Inc Modeling human beings by symbol manipulation
WO2002030171A2 (en) * 2000-10-12 2002-04-18 Erdem Tanju A Facial animation of a personalized 3-d face model using a control mesh
WO2002030171A3 (en) * 2000-10-12 2002-10-31 Tanju A Erdem Facial animation of a personalized 3-d face model using a control mesh
US6664956B1 (en) 2000-10-12 2003-12-16 Momentum Bilgisayar, Yazilim, Danismanlik, Ticaret A. S. Method for generating a personalized 3-D face model
US7127081B1 (en) 2000-10-12 2006-10-24 Momentum Bilgisayar, Yazilim, Danismanlik, Ticaret, A.S. Method for tracking motion of a face

Also Published As

Publication number Publication date
AU3452697A (en) 1998-02-02

Similar Documents

Publication Publication Date Title
Noh et al. A survey of facial modeling and animation techniques
US7116330B2 (en) Approximating motion using a three-dimensional model
Buck et al. Performance-driven hand-drawn animation
US11778002B2 (en) Three dimensional modeling and rendering of head hair
Parke Control parameterization for facial animation
Pandzic et al. Towards natural communication in networked collaborative virtual environments
WO1998001830A1 (en) Image processing
US20230106330A1 (en) Method for creating a variable model of a face of a person
Otsuka et al. Extracting facial motion parameters by tracking feature points
US20230281901A1 (en) Moving a direction of gaze of an avatar
US20220076409A1 (en) Systems and Methods for Building a Skin-to-Muscle Transformation in Computer Animation
KR100229538B1 (en) Apparatus and method for encoding a facial movement
JP2001231037A (en) Image processing system, image processing unit, and storage medium
JP2843262B2 (en) Facial expression reproduction device
Dugelay et al. Synthetic/natural hybrid video processings for virtual teleconferencing systems
US11158103B1 (en) Systems and methods for data bundles in computer animation
Cowe Example-based computer-generated facial mimicry
US11875504B2 (en) Systems and methods for building a muscle-to-skin transformation in computer animation
US20230247180A1 (en) Updating a model of a participant of a three dimensional video conference call
de Dinechin et al. Automatic generation of interactive 3D characters and scenes for virtual reality from a single-viewpoint 360-degree video
Jiang et al. Animating arbitrary topology 3D facial model using the MPEG-4 FaceDefTables
US20230070853A1 (en) Creating a non-riggable model of a face of a person
US20230085339A1 (en) Generating an avatar having expressions that mimics expressions of a person
Mortlock et al. Virtual conferencing
Burford et al. Face-to-face implies no interface

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 09043424

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98503646

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA