GB2320839A - Encoding facial movement in a 3D model-based image coding system - Google Patents

Encoding facial movement in a 3D model-based image coding system

Info

Publication number
GB2320839A
GB2320839A (application GB9726058A)
Authority
GB
United Kingdom
Prior art keywords
transformation parameters
model
parameters
chin
mouth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9726058A
Other versions
GB9726058D0 (en)
Inventor
Min-Sup Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WiniaDaewoo Co Ltd
Original Assignee
Daewoo Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daewoo Electronics Co Ltd filed Critical Daewoo Electronics Co Ltd
Publication of GB9726058D0 publication Critical patent/GB9726058D0/en
Publication of GB2320839A publication Critical patent/GB2320839A/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/001 - Model-based coding, e.g. wire frame
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method is provided for encoding a facial movement of a new face based on a voice signal and a 2-dimensional (2D) image signal in a 3-dimensional (3D) model-based coding system. An adaptive 3D model is generated 10 from initial data of the new face based on a basic 3D model 14 of a common face of a human being; and a basic pattern 18 for the 2D image signal is produced by a rotation correlation between the 2D image signal 16 and the adaptive 3D model 10. One or more feature regions of the new face extracted from the 2D image signal are compared 20-28 with the basic pattern so that a plurality of transformation parameters are detected. The transformation parameters are modified 36 based on the voice signal.

Description

METHOD AND APPARATUS FOR ENCODING A FACIAL MOVEMENT

The present invention relates to a method and apparatus for encoding moving objects; and, more particularly, to a method and apparatus capable of encoding and decoding a facial movement by using a 3 dimensional face model.
In digitally televised systems such as video-telephone, teleconference and high definition television systems, a large amount of digital data is needed to define each video frame signal since a video line signal in the video frame signal comprises a sequence of digital data referred to as pixel values. Since, however, the available frequency bandwidth of a conventional transmission channel is limited, it is necessary to compress or reduce the volume of data through the use of various data compression techniques in order to transmit the large amount of digital data therethrough. This is especially true for such low bit-rate video signal encoders as video-telephone and teleconference systems employed for transmitting images of a human figure.
In a video coding system, the images to be transmitted generally consist of pixels which vary continuously. In a 3 dimensional model-based coding system, however, a particular movement parameter is extracted from the images and transmitted to a receiving end. At the receiving end, in order to reconstruct the images, for example, facial images, the received movement parameter is combined with data such as the shape of the basic face of the person, which is transmitted to the receiving end in advance, and a general 3 dimensional model of a head.
In the video-telephone and teleconference systems, the video images are primarily comprised of head-and-shoulder images, i.e., the upper body of a person. Furthermore, the most likely object of interest to a viewer is the face of the person, and the viewer will focus his/her attention on the moving parts, i.e., the person's mouth area including the lips, chin and head, especially when the person is talking in a video scene, rather than on the background scenery or other details. Therefore, if only general information on the shape of the face needs to be transmitted, the amount of digital data can be substantially reduced.
It is, therefore, an object of the invention to provide a method and apparatus capable of encoding and decoding a facial movement by using a 3 dimensional face model with a reduced amount of transmission data.
In accordance with the present invention, there is provided a method for encoding a facial movement of a new face based on a voice signal and a 2-dimensional (2D) image signal in a 3-dimensional (3D) model-based coding system, wherein the voice signal and the 2D image signal of the new face are provided either on a frame-by-frame basis or on a field-by-field basis, the method comprising the steps of: (a) generating an adaptive 3D model from initial data of the new face based on a basic 3D model, wherein the initial data represents one or more 2D facial images of the new face and the basic 3D model represents a 3D model of a common face of a human being; (b) producing a basic pattern for the 2D image signal based on the adaptive 3D model, wherein the basic pattern represents a 2D picture obtained by a rotational correlation between the 2D image signal and the adaptive 3D model; (c) extracting one or more feature regions of the new face from the 2D image signal, wherein the feature regions represent one or more regions in which lots of transformation takes place; (d) comparing the feature regions with the basic pattern to detect a plurality of transformation parameters, wherein the transformation parameters represent the comparison results; (e) modifying the transformation parameters based on the voice signal to generate modified transformation parameters; and (f) encoding the initial data and the modified transformation parameters.
The above and other objects and features of the present invention will become apparent from the following description of preferred embodiments given in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates a block diagram of an apparatus 100 for encoding a facial movement in accordance with the present invention;
Fig. 2 provides a block diagram of an apparatus 200 for decoding the facial movement in accordance with the present invention;
Fig. 3A depicts a multiplicity of eye and eyebrow transformation parameters in accordance with the present invention;
Fig. 3B shows a plurality of mouth transformation parameters in accordance with the present invention;
Fig. 3C presents 3 chin transformation parameters in accordance with the present invention; and
Fig. 3D represents 3 head transformation parameters in accordance with the present invention.
In an embodiment of the present invention, it is assumed that an input image of concern is a face of a human being and predetermined feature parts of a facial image to be encoded are parts of a head, a mouth, a chin, eyebrows and eyes.
Referring to Fig. 1, there is shown a block diagram of an apparatus 100 for encoding facial movements in accordance with the embodiment of the present invention, wherein the facial movements are classified based on a method further explained in the following.
For the sake of convenience, a human body will be divided into an upper body including the waist and a lower body below the waist. The upper body is in turn divided into a head, a trunk, arms and the like, and the head is further divided into eyes, a nose, a mouth, ears and the like. If the eyes, the nose, the mouth and the ears are considered as basic patterns, a hierarchical system for the human body can be organized based on such basic patterns, and transformation parameters representing transformation of the basic patterns can be extracted. Hereinafter, the transformation parameters and their data structure will be explained for the head.
Basic patterns of the head may be divided into two categories. A first category corresponds to regions where lots of transformation of the basic patterns takes place and a second category corresponds to regions where the transformation of the basic patterns takes place very rarely.
The former corresponds to the regions of the eyes, eyebrows, mouth, chin, cheeks and forehead, and the latter to the regions of the hair, nose, ears and the like. The basic patterns used in extracting the transformation parameters correspond to the actively moving regions. Accordingly, the selected transformation parameters include parameters for the eyes, the eyebrows, the mouth, the chin and the head, which will be described in detail in the following. Wrinkles in the forehead and the cheeks move passively, following the movements of the eyebrows and the chin, respectively.
1) Eyebrow : As shown in Fig. 3A, eyebrows are divided into left and right eyebrows, and left and right eyebrows transformation parameters include inner eyebrow up-down movement parameters (EB1, EB2), eyebrow left-right movement parameters (EB3, EB4) and outer eyebrow up-down movement parameters (EB5, EB6), respectively.
2) Eye : As also shown in Fig. 3A, left and right eyes transformation parameters include eyelid up-down movement parameters (EL1, EL2), pupil up-down movement parameters (E1, E2) and pupil left-right movement parameters (E3, E4), respectively.
3) Mouth : As shown in Fig. 3B, a mouth movement is dependent on a lip movement. Mouth transformation parameters include left-right movement parameters of both end points of the lip (L1, L2), up-down movement parameters in uppermost and lowermost points of a central region of the lip (L3, L4), forward-backward movement parameters in the uppermost and lowermost points of the central region of the lip (L5, L6) and up-down movement parameters of the end points of the lip (L7, L8).
4) Chin : As shown in Fig. 3C, chin transformation parameters include an up-down movement parameter (C1), a left-right movement parameter (C2) and a forward-backward movement parameter (C3).
5) Head : As shown in Fig. 3D, a 3 dimensional coordinate frame is defined such that a facial plane, a virtual plane parallel to the face, is perpendicular to the x axis, the z axis passes through the center of the crown of the head and the y axis is perpendicular to both the x and z axes, the origin of the coordinate frame being located at or around the atlas. In other words, the x, y and z axes run parallel to the movement directions of the parameters L5, L1 and L3 shown in Fig. 3B, respectively. Head transformation parameters include 3 rotation parameters, i.e., a yawing parameter (H1) indicating a left-right rotation around the z axis, a pitching parameter (H2) indicating an up-down rotation around the y axis and a rolling parameter (H3) indicating a left-right slant, i.e., a rotation around the x axis.
These parameters have 0 values at their respective basic positions on the basic patterns.
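By way of illustration only, the transformation parameters enumerated above may be collected in a single record whose fields default to 0, i.e., the basic positions. The following Python sketch is a hypothetical data structure for such a record; it mirrors the parameter labels of Figs. 3A to 3D but is not part of the described apparatus.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FacialTransformationParameters:
    """Hypothetical container for the parameters of Figs. 3A-3D.
    All values are 0 at the basic (neutral) positions on the basic patterns."""
    # Eyebrows: inner up-down (EB1, EB2), left-right (EB3, EB4), outer up-down (EB5, EB6)
    eyebrows: List[float] = field(default_factory=lambda: [0.0] * 6)
    # Eyes: pupil up-down (E1, E2), pupil left-right (E3, E4), eyelid up-down (EL1, EL2)
    eyes: List[float] = field(default_factory=lambda: [0.0] * 6)
    # Mouth: lip end-point and central-region movements (L1 .. L8)
    mouth: List[float] = field(default_factory=lambda: [0.0] * 8)
    # Chin: up-down (C1), left-right (C2), forward-backward (C3)
    chin: List[float] = field(default_factory=lambda: [0.0] * 3)
    # Head rotation in degrees: yaw (H1), pitch (H2), roll (H3)
    head: List[float] = field(default_factory=lambda: [0.0] * 3)

params = FacialTransformationParameters()
params.head = [30.0, 0.0, 0.0]   # e.g., a 30-degree yaw (H1)
```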
The transformation parameters are stored and transmitted in a data format given below, wherein the transformation parameters of each basic pattern are represented by independent items.
TITLE                          CODE                     BIT-NUMBER
START CODE                     head                     3
HEAD ORIENTATION BIT           head_orientation_bit     1
HEAD ORIENTATION ITEMS         head_orientation_items   3
  head_orientation_items[0]    H1                       8
  head_orientation_items[1]    H2                       7
  head_orientation_items[2]    H3                       5
EYEBROW TRANSFORMATION BIT     eyebrow_bit              1
LEFT-RIGHT EYEBROW ITEMS       eyebrows                 2
LEFT EYEBROW ITEMS             lefteyebrow_items        3
  lefteyebrow_items[0]         EB1                      3
  lefteyebrow_items[1]         EB3                      3
  lefteyebrow_items[2]         EB5                      3
RIGHT EYEBROW ITEMS            righteyebrow_items       3
  righteyebrow_items[0]        EB2                      3
  righteyebrow_items[1]        EB4                      3
  righteyebrow_items[2]        EB6                      3
EYE TRANSFORMATION BIT         eye_bit                  1
                               E1                       3
                               E2                       3
                               E3                       3
                               E4                       3
                               EL1                      3
                               EL2                      3
MOUTH TRANSFORMATION BIT       mouth_bit                1
TRANSFORMATION SELECTION BIT   speech_bit               1
                               L1                       3
                               L2                       3
                               L3                       3
                               L4                       3
                               L5                       3
                               L6                       3
                               L7                       3
                               L8                       3
                               sound                    8
                               pace                     4
                               accent                   3
CHIN TRANSFORMATION BIT        chin_bit                 1
                               C1                       4
                               C2                       3
                               C3                       3
FACE TEXTURE BIT               face_texture_bit         1
  FACE DATA                    face_data                VLB

An explanation will be given for each item hereinafter.
1. START CODE (head) : A 3 bit code that represents the start of head data is set to, e.g., "001". If the start code is other than "001", the head data will not follow.
2. HEAD ORIENTATION BIT (head_orientation_bit) : A 1 bit code that indicates whether the head is rotated or not. Its value 1 indicates that the head is rotated and that the head orientation parameters will follow. Its value 0 tells that the head is not rotated.
1) HEAD ORIENTATION ITEMS (head_orientation_items) : A 3 bit code that indicates in which direction the head is rotated. The respective 3 bits of the code represent the existence of their corresponding orientation items. A bit of 1 indicates that the head is rotated in its corresponding direction. For example, the value "110" indicates that a yawing, i.e., a left-right rotation, and a pitching, i.e., an up-down rotation, of the head have taken place.
a) head_orientation_items[0] : An 8 bit head yawing parameter (H1) indicates an integer value given in 181 steps from -90 degrees to 90 degrees.
b) head_orientation_items[1] : A 7 bit head pitching parameter (H2) represents an integer value given in 121 steps from -60 degrees to 60 degrees.
c) head_orientation_items[2] : A 5 bit head rolling parameter (H3) represents an integer value given in 31 steps from -15 degrees to 15 degrees.
3. EYEBROW TRANSFORMATION BIT (eyebrow_bit) : A 1 bit code that tells whether the eyebrows move or not. Its value 1 indicates that the eyebrows move and its value 0 means that the eyebrows do not move.
1) LEFT-RIGHT EYEBROW ITEMS (eyebrows) : A 2 bit code that indicates which eyebrow moves.
a) 00 : None are moved.
b) 01 : Left eyebrow moves.
c) 10 : Right eyebrow moves.
d) 11 : Both eyebrows move.
2) LEFT EYEBROW ITEMS (lefteyebrow_items) : A 3 bit code that indicates in which direction a left eyebrow moves. The 3 bits of the code represent the existence of the three movement parameters below.
a) lefteyebrow_items[0] : Its value 1 in the code lefteyebrow_items indicates that the inner left eyebrow moves up or down. A 3 bit inner left eyebrow up-down movement parameter (EB1) is given in 7 steps from -1.0 to 1.0.
STEP           1     2     3     4     5     6     7
WEIGHTING (w)  -1.0  -0.6  -0.3  0.0   0.3   0.6   1.0
The 4th step, wherein there usually is almost no movement, and the two extreme steps, i.e., the 1st and 7th steps, are given predetermined absolute 3 dimensional coordinates, and the locations of the remaining steps are calculated by applying the predetermined weighting factors, e.g., those in the table shown above. The coordinates of the 2nd, 3rd, 5th and 6th steps may be computed as:
for the 2nd and 3rd steps,
x(j) = |w(j)|*x(step 1) + (1.0-|w(j)|)*x(step 4)
y(j) = |w(j)|*y(step 1) + (1.0-|w(j)|)*y(step 4)
z(j) = |w(j)|*z(step 1) + (1.0-|w(j)|)*z(step 4)
and for the 5th and 6th steps,
x(j) = w(j)*x(step 7) + (1.0-w(j))*x(step 4)
y(j) = w(j)*y(step 7) + (1.0-w(j))*y(step 4)
z(j) = w(j)*z(step 7) + (1.0-w(j))*z(step 4)
wherein x(j), y(j) and z(j) represent the x, y and z coordinates for the jth step; w(j) is the predetermined weighting factor for the jth step; and x(step i), y(step i) and z(step i) are the x, y and z coordinates at the ith step.
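As an illustrative sketch only, the step interpolation described above may be written as follows; the weighting table and the blending rule are taken from the description, whereas the function name and the example coordinates are assumptions made for illustration.

```python
# Weighting factors per step, taken from the table above (steps 1..7).
WEIGHTS = [-1.0, -0.6, -0.3, 0.0, 0.3, 0.6, 1.0]

def interpolate_step(j, p_step1, p_step4, p_step7):
    """Return the (x, y, z) position of step j (1-based) for a feature point,
    given predetermined positions for the extreme steps 1 and 7 and the
    rest step 4.  Steps 2-3 blend steps 1 and 4; steps 5-6 blend steps 7 and 4."""
    if j in (1, 4, 7):
        return {1: p_step1, 4: p_step4, 7: p_step7}[j]
    w = abs(WEIGHTS[j - 1])
    anchor = p_step1 if j < 4 else p_step7
    return tuple(w * pa + (1.0 - w) * p4 for pa, p4 in zip(anchor, p_step4))

# Example with made-up coordinates for an inner-eyebrow point (EB1):
lowest, rest, highest = (0.0, 1.0, -0.5), (0.0, 1.0, 0.0), (0.0, 1.0, 0.5)
print([interpolate_step(j, lowest, rest, highest) for j in range(1, 8)])
```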
b) lefteyebrow_items[1] : Its value 1 indicates that the left eyebrow moves in the left or right direction. A 3 bit left eyebrow left-right movement parameter (EB3) is given in 7 steps from -1.0 to 1.0. Locations of the steps are determined in a similar manner as in the case of EB1.
c) lefteyebrow_items[2] : Its value 1 indicates that the outer left eyebrow moves up or down. A 3 bit outer left eyebrow up-down movement parameter (EB5) is given in 7 steps from -1.0 to 1.0. The weightings used for (EB1) are applied to (EB5) and the locations of the steps are determined in a similar manner as in the case of EB1.
3) RIGHT EYEBROW ITEMS (righteyebrow_items) : A code that indicates in which direction a right eyebrow moves. The functions of the right eyebrow transformation parameters (EB2, EB4, EB6) are the same as those of the left eyebrow transformation parameters (EB1, EB3, EB5).
4. EYE TRANSFORMATION BIT (eye_bit) : A 1 bit code that indicates whether eyes move or not. Its value 1 indicates that the eyes move and its value 0 indicates that the eyes do not move.
1) PUPIL UP-DOWN MOVEMENT PARAMETERS (E1, E2) : (E1) and (E2) represent up-down movements of the left and the right eyes, respectively. (E1) and (E2) have 7 steps, respectively.
The 4th step, wherein there usually is almost no movement, and the two extreme steps, i.e., the 1st and 7th steps, are given predetermined absolute 3 dimensional coordinates, and the locations of the remaining steps are calculated as in the case of EB1.
2) PUPIL LEFT-RIGHT MOVEMENT PARAMETERS (E3, E4) : (E3) and (E4) represent left-right movements of the left and the right eyes, respectively. (E3) and (E4) have 7 steps, respectively, and the locations of the steps are calculated as in the case of E1 and E2.
3) OUTER EYELID UP-DOWN MOVEMENT PARAMETERS (EL1, EL2) : (EL1) and (EL2) represent up-down movements of the left and right eyelids, respectively. (EL1) and (EL2) have 7 steps, respectively, and the locations of the steps are calculated as in the case of E1 and E2.
5. MOUTH TRANSFORMATION BIT (mouth_bit) : A 1 bit code that indicates whether the shape of the mouth changes or not. Its value 1 indicates that the shape of the mouth changes and its value 0 indicates that the shape of the mouth does not change.
1) TRANSFORMATION SELECTION BIT (speech_bit) : A code that indicates which transformation parameters are selected for the mouth. The shape of the lip is classified into two cases, i.e., one is a case when a person is talking, and the other is a case when the person expresses emotion. Generally, since the shape of the lip depends very much on the sound pronounced when the person is talking, the shape of the lip can be constructed by using characteristic features of the sound pronounced, the pace and the accent of the voice. When the person expresses his/her emotion, however, the shape of the lip will not have any such characteristic features. Therefore, the shape of the lip should be constructed by using all of the mouth transformation parameters, i.e., the left-right movement parameters of both end points of the lip (L1, L2), the up-down movement parameters in the uppermost and lowermost points of a central region of the lip (L3, L4), the forward-backward movement parameters in the uppermost and lowermost points of the central region of the lip (L5, L6) and the up-down movement parameters of the end points of the lip (L7, L8) shown in Fig. 3B. Its value 1 indicates that the person is talking and that the sound, pace and accent codes will follow thereafter. If the code is of a zero value, the L1 to L8 codes follow. The L1 to L8 movement parameters are of 3 bits each and have 7 steps, respectively. Locations of the steps are calculated based on the scheme described above with respect to the movement parameter EB1. Characteristic features of the sound pronounced, the pace and the accent of the voice are expressed in 8, 4 and 3 bits, respectively.
6. CHIN TRANSFORMATION BIT (chin_bit) : A 1 bit code that indicates whether the chin moves or not. Its value 1 indicates that the chin moves and its value 0 indicates that the chin does not.
1) CHIN UP-DOWN MOVEMENT PARAMETER (C1) : A 4 bit chin up-down movement parameter (C1) represents the amount of displacement of the chin from the position corresponding to a closed mouth and is given in 16 steps, the 0th step representing a closed mouth and the 15th step representing a largest open mouth.
Locations of the chin for the 16 steps are calculated in a similar manner as in the EB1.
2) CHIN LEFT-RIGHT MOVEMENT PARAMETER (C2) : A 3 bit chin left-right movement parameter (C2) represents left-right movements of the chin and is given in 3 steps toward the left direction and in 3 steps toward the right direction from the base at the central region. Locations of the steps are calculated as in the case of EB1.
3) CHIN FORWARD-BACKWARD MOVEMENT PARAMETER (C3) : A 3 bit chin forward-backward movement parameter (C3) represents forward-backward movements of the chin and is given in 3 steps in the forward direction and in 3 steps in the backward direction from the base at the central region. Locations of the steps are calculated as in the case of EB1.
7. FACE TEXTURE BIT (face_texture_bit) : When a new face participates in communication, this code will be set to 1.
1) FACE DATA (face_data) : It represents the compressed basic facial image data of the new face and its length varies.
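To make the data format above concrete, the following sketch packs a subset of the fields, i.e., the start code, the head orientation items and the eyebrow transformation bit, into a bit stream with the bit numbers listed in the table. The BitWriter class and the angle quantization rule are illustrative assumptions, not the patent's own implementation.

```python
class BitWriter:
    """Minimal MSB-first bit packer (illustrative only)."""
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        for i in reversed(range(nbits)):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            int("".join(map(str, padded[i:i + 8])), 2)
            for i in range(0, len(padded), 8)
        )

def quantize_angle(angle, lo, hi, steps):
    """Map an angle in [lo, hi] degrees to an integer step index (0 .. steps-1)."""
    angle = max(lo, min(hi, angle))
    return round((angle - lo) * (steps - 1) / (hi - lo))

def pack_head_and_eyebrows(h1, h2, h3, eyebrows_moved):
    w = BitWriter()
    w.write(0b001, 3)                               # START CODE "001"
    w.write(1, 1)                                   # head_orientation_bit: head is rotated
    w.write(0b111, 3)                               # head_orientation_items: H1, H2 and H3 follow
    w.write(quantize_angle(h1, -90, 90, 181), 8)    # H1, 181 steps in 8 bits
    w.write(quantize_angle(h2, -60, 60, 121), 7)    # H2, 121 steps in 7 bits
    w.write(quantize_angle(h3, -15, 15, 31), 5)     # H3, 31 steps in 5 bits
    w.write(1 if eyebrows_moved else 0, 1)          # eyebrow_bit
    return w.to_bytes()

print(pack_head_and_eyebrows(10.0, -5.0, 2.0, eyebrows_moved=False).hex())
```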
Referring back to Fig. 1, initial data is applied to an adaptive 3-dimensional (3D) model block 10 and an encoder 12, wherein the initial data indicates one or more 2-dimensional (2D) expressionless and mute facial images of a new face that has just appeared on the screen, i.e., one or more still pictures of the new face. The encoder 12 encodes the initial data of the 2D facial images by a conventional encoding discipline to provide the encoded facial image as face_data to a formatter 36.
Meanwhile, a basic 3D model stored in a basic 3D model block 14 is applied to the adaptive 3D model block 10, wherein the basic 3D model represents a 3D model of a common face of a human being. The adaptive 3D model block 10 creates an adaptive 3D model similar to the new face by modifying the 2D initial data based on the basic 3D model and provides the adaptive 3D model to a head parameter block 16 and a basic pattern generation block 18.
In the meantime, image signals of the new face are provided to the head parameter block 16 and a feature extraction block 20 from, e.g., a camera (not shown); and voice signals of the new face are successively inputted to a voice analyzer 30 from, e.g., a microphone (not shown), wherein the image signals and the voice signals of the new face are inputted successively either on a frame-by-frame basis or on a field-by-field basis.
First of all, the head parameter block 16 detects the head yawing, pitching and rolling parameters H1 to H3 from the image signals of the new face by applying a conventional affine transform discipline to the adaptive 3D model of the new face. The head yawing, pitching and rolling parameters H1 to H3 are provided to the basic pattern generation block 18 and the formatter 36. The basic pattern generation block 18 generates a basic pattern of the new face, wherein the basic pattern represents a 2D adaptive image of the new face obtained by rotating the adaptive 3D model according to the yawing, pitching and rolling parameters and, then, projecting the rotated adaptive 3D model to a screen; and indexes the left and right eyebrows, the left and right eyes, the mouth and the chin in the basic pattern to generate basic eyebrows, eyes, mouth and chin patterns. The basic pattern generation block 18 provides the indexed eyebrows, eyes, mouth and chin to an eyebrow extraction block 22, an eye extraction block 24, a mouth 1 extraction block 26 and a chin 1 extraction block 28, respectively.
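A minimal sketch of such basic pattern generation is given below, assuming the adaptive 3D model is represented as a set of (x, y, z) vertices and assuming an orthographic projection onto the facial plane; the rotation order is an illustrative choice, since it is not fixed by the description.

```python
import numpy as np

def rotation_matrix(h1_yaw, h2_pitch, h3_roll):
    """Rotation about the z (yaw, H1), y (pitch, H2) and x (roll, H3) axes,
    angles in degrees; the composition order is an illustrative assumption."""
    a, b, c = np.radians([h1_yaw, h2_pitch, h3_roll])
    rz = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1]])
    ry = np.array([[np.cos(b), 0, np.sin(b)],
                   [0, 1, 0],
                   [-np.sin(b), 0, np.cos(b)]])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(c), -np.sin(c)],
                   [0, np.sin(c),  np.cos(c)]])
    return rz @ ry @ rx

def project_basic_pattern(vertices, h1, h2, h3):
    """Rotate the adaptive 3D model vertices (N x 3, ordered x, y, z) and project
    them orthographically onto the facial plane (the plane perpendicular to x)."""
    rotated = vertices @ rotation_matrix(h1, h2, h3).T
    return rotated[:, 1:3]      # keep (y, z) as 2D screen coordinates

# Example with a few made-up model vertices:
model = np.array([[0.1, -0.3, 0.2], [0.1, 0.3, 0.2], [0.0, 0.0, -0.4]])
print(project_basic_pattern(model, h1=15.0, h2=-5.0, h3=0.0))
```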
In the meantime, the feature extraction block 20 extracts edges of predetermined feature regions from the image signals of the new face by using a conventional edge detector such as a Sobel operator, wherein the feature regions include the left and the right eyebrows, the left and the right eyes, the mouth and the chin of the new face, and provides contour information of the feature regions, e.g., the left and the right eyebrows, the left and the right eyes, the mouth and the chin, to the eyebrow extraction block 22, the eye extraction block 24, the mouth 1 extraction block 26 and the chin 1 extraction block 28, respectively, wherein the contour information represents the shape and the position of each of the feature regions.
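For illustration, a generic Sobel edge magnitude computation on a grayscale image is sketched below; it stands in for the conventional edge detector mentioned above and is not the specific implementation of the feature extraction block 20.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(image, kernel):
    """Naive 'same'-size 2D cross-correlation with zero padding (illustrative only)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def sobel_edges(gray_image):
    """Return the gradient magnitude of a 2D grayscale image."""
    gx = filter2d(gray_image, SOBEL_X)
    gy = filter2d(gray_image, SOBEL_Y)
    return np.hypot(gx, gy)

# Example on a tiny synthetic image containing a vertical step edge:
img = np.zeros((8, 8)); img[:, 4:] = 1.0
print(sobel_edges(img).round(1))
```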
The eyebrow extraction block 22 detects movements of the left and the right eyebrows based on the basic eyebrows pattern fed from the basic pattern generation block 18. If the left and the right eyebrows move, the left and the right eyebrows transformation parameters EB1 to EB6 are calculated in 3 bits, respectively. A 3 bit lefteyebrow_items signal, telling which of the left eyebrow transformation parameters EB1, EB3 and EB5 will be encoded, and a 3 bit righteyebrow_items signal, telling which of the right eyebrow transformation parameters EB2, EB4 and EB6 will be encoded, are generated. A 2 bit eyebrows signal, telling which eyebrow moves, is generated based on the lefteyebrow_items and the righteyebrow_items signals. Eyebrow data is successively provided to the formatter 36 in the data format given above, wherein the eyebrow data includes the eyebrows signal, the lefteyebrow_items signal, the left eyebrow transformation parameters EB1, EB3 and EB5, the righteyebrow_items signal, and the right eyebrow transformation parameters EB2, EB4 and EB6, if any.
The eye extraction block 24 detects movements of the left and the right eyes based on the basic eyes pattern fed from the basic pattern generation block 18 and generates the pupil up-down movement parameters E1 and E2, the pupil left-right movement parameters E3 and E4 and the outer eyelid up-down movement parameters EL1 and EL2 based on the movements of the left and the right eyes, respectively. The eye extraction block 24 provides eye data, which includes the pupil up-down movement parameters E1 and E2, the pupil left-right movement parameters E3 and E4 and the outer eyelid up-down movement parameters EL1 and EL2, if any, to the formatter 36.
The mouth 1 extraction block 26 detects a movement of the mouth under the new face's emotional expression based on the basic mouth pattern fed from the basic pattern generation block 18 and generates the mouth transformation parameters L1 to L8. The mouth transformation parameters L1 to L8 are provided to the formatter 36.
The chin 1 extraction block 28 detects a movement of the chin under the new face's emotional expression based on the basic chin pattern fed from the basic pattern generation block 18 and generates the chin transformation parameters C1 to C3.
The chin transformation parameters C1 to C3 are provided to the formatter 36.
In the meantime, the voice analyzer 30 compares the voice signal with a predetermined threshold to determine whether the new face is talking or expressing his/her emotion. A speech_bit signal, telling whether or not the new face is talking for communication, is provided to the formatter 36.
If the new face is talking, the sound pronounced, the pace and the accent are extracted from the voice signals to be provided to a mouth 2 extraction block 32 and a chin 2 extraction block 34.
The mouth 2 extraction block 32 generates an 8 bit sound parameter, a 4 bit pace parameter and a 3 bit accent parameter with respect to the sound pronounced, the pace and the accent, respectively, for determining the shape of the mouth and provides the sound parameter, the pace parameter and the accent parameter to the formatter 36. If necessary, the 3 chin transformation parameters C1 to C3 may be generated in the chin 2 extraction block 34 and provided to the formatter 36.
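A hypothetical sketch of the talking/emotion decision made by the voice analyzer 30 is given below; it simply compares the short-term energy of a voice frame with a predetermined threshold to set the speech_bit. The energy measure and the threshold value are assumptions, since the description does not specify how the voice signal is analyzed or how the sound, pace and accent features are computed.

```python
import numpy as np

def speech_bit(voice_frame, threshold=1e-3):
    """Return 1 if the frame's mean energy exceeds the threshold (the person is
    assumed to be talking), otherwise 0 (the mouth shape expresses emotion only).
    Both the energy measure and the threshold value are illustrative assumptions."""
    energy = float(np.mean(np.square(voice_frame)))
    return 1 if energy > threshold else 0

# Example: a near-silent frame versus a louder synthetic frame.
silence = np.random.normal(0.0, 0.001, 160)   # ~10 ms at 16 kHz
speech = 0.2 * np.sin(2 * np.pi * 200 * np.arange(160) / 16000)
print(speech_bit(silence), speech_bit(speech))
```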
The formatter 36 generates a 1 bit face_texture_bit whenever a new face appears in the scene, wherein the face_texture_bit indicates that the face_data of the new face follows. The formatter 36 also generates the 3 bit start code signal, the 1 bit head_orientation_bit signal, the 1 bit eyebrow_bit signal, the 1 bit eye_bit signal, the 1 bit mouth_bit signal and the 1 bit chin_bit signal, wherein the 1 bit eyebrow_bit signal, generated based on the eyebrows signal, tells whether or not there exists any movement in the left or the right eyebrow; the 1 bit eye_bit signal, generated based on the parameters E1 to E4, EL1 and EL2, tells whether or not there exists any movement in the left or the right eye; the mouth_bit signal, generated based on the mouth transformation parameters L1 to L8 or the sound, the pace and the accent parameters, tells whether or not there exists any movement in the mouth; and the 1 bit chin_bit signal, generated based on the chin transformation parameters C1 to C3, tells whether or not there exists any movement in the chin. The formatter 36 multiplexes all the signals, the parameters and the face_data in accordance with the data format given above and provides the multiplexed result to a buffer 38 for storage; the stored data are then provided to a transmitter (not shown) for transmission.
Referring to Fig. 2, there is shown a block diagram of an apparatus 200 for decoding the facial movement in accordance with the present invention, wherein the transmitted data is temporarily stored in a buffer 50 and is provided to an initial data decoder 52 and a parameter decoder 54.
The initial data decoder 52 decodes the face_data among the transmitted data to provide the 2D initial data of the new face to an adaptive 3D model block 58 within an adaptive 3D model generation block 57.
The adaptive 3D model block 58 creates an adaptive 3D model similar to the new face by modifying the 2D initial data based on the basic 3D model fed from a basic 3D model block 60 within the adaptive 3D model generation block 57, wherein the basic 3D model is identical to the basic 3D model of the encoding apparatus 100. The adaptive 3D model is provided to a pattern generation block 62.
In the meantime, the parameter decoder 54 decodes all the transmitted data except the face_data to generate all the transformation parameters, which include the head transformation parameters H1 to H3; the left and the right eyebrows transformation parameters EB1 to EB6; the left and the right eyes transformation parameters E1 to E4, EL1 and EL2; either the mouth transformation parameters L1 to L8 or the sound, the pace and the accent parameters; and the chin transformation parameters C1 to C3. The head transformation parameters H1 to H3 are provided to the pattern generation block 62 via a line L62; the left and the right eyebrows, the left and the right eyes, the mouth and the chin transformation parameters are provided to an eyebrow reconstruction block 64, an eye reconstruction block 66, a mouth reconstruction block 68 and a chin reconstruction block 70, respectively. The pattern generation block 62 generates a basic pattern of the new face, wherein the basic pattern represents a 2D adaptive image of the new face obtained by rotating the adaptive 3D model according to the head transformation parameters, i.e., the yawing, pitching and rolling parameters H1 to H3, and, then, projecting the rotated adaptive 3D model to a screen; and the left and the right eyebrows, the left and the right eyes, the mouth and the chin in the basic pattern are indexed. The pattern generation block 62 provides the indexed eyebrows, eyes, mouth and chin to the eyebrow reconstruction block 64, the eye reconstruction block 66, the mouth reconstruction block 68 and the chin reconstruction block 70, respectively.
The eyebrow reconstruction block 64 reconstructs left and right eyebrows from the indexed eyebrows based on the left and the right eyebrows transformation parameters EB1 to EB6 to provide the reconstructed left and right eyebrows to the image reconstruction block 56. The eye reconstruction block 66 reconstructs left and right eyes from the indexed eyes based on the left and the right eyes transformation parameters E1 to E4, EL1 and EL2 to provide the reconstructed left and right eyes to the image reconstruction block 56. The mouth reconstruction block 68 reconstructs the mouth from the indexed mouth based on either the mouth transformation parameters L1 to L8 or the sound, the pace and the accent parameters to provide the reconstructed mouth to the image reconstruction block 56. The chin reconstruction block 70 reconstructs the chin from the indexed chin based on the chin transformation parameters C1 to C3 to provide the reconstructed chin to the image reconstruction block 56.
The image reconstruction block 56 reconstructs a new image of the new face either on a frame-by-frame basis or on a field-by-field basis by replacing the eyebrows, the eyes, the mouth and the chin in the basic pattern fed from the pattern generation block 62 with those from the reconstruction block 63.
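As a closing illustration, the replacement of the feature regions in the basic pattern may be sketched as a simple mask-based composite; the masks and pixel arrays below are hypothetical stand-ins for the outputs of the pattern generation block 62 and the reconstruction blocks.

```python
import numpy as np

def composite(basic_pattern, reconstructed_regions):
    """Replace pixels of the basic pattern with reconstructed feature regions.
    `reconstructed_regions` is a list of (mask, pixels) pairs, where mask is a
    boolean array marking the indexed region (eyebrows, eyes, mouth or chin)."""
    out = basic_pattern.copy()
    for mask, pixels in reconstructed_regions:
        out[mask] = pixels[mask]
    return out

# Example with an 8x8 dummy pattern and one reconstructed 'mouth' region.
pattern = np.zeros((8, 8))
mouth_pixels = np.full((8, 8), 0.5)
mouth_mask = np.zeros((8, 8), dtype=bool); mouth_mask[5:7, 2:6] = True
print(composite(pattern, [(mouth_mask, mouth_pixels)]))
```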
While the present invention has been described with respect to the particular embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims (19)

Claims:
1. A method for encoding a facial movement of a new face based on a voice signal and a 2-dimensional (2D) image signal in a 3-dimensional (3D) model-based coding system, wherein the voice signal and the 2D image signal of the new face are provided either on a frame-by-frame basis or on a field-by-field basis, the method comprising the steps of: (a) generating an adaptive 3D model from initial data of the new face based on a basic 3D model, wherein the initial data represents one or more 2D facial images of the new face and the basic 3D model represents a 3D model of a common face of a human being; (b) producing a basic pattern for the 2D image signal based on the adaptive 3D model, wherein the basic pattern represents a 2D picture obtained by a rotational correlation between the 2D image signal and the adaptive 3D model; (c) extracting one or more feature regions of the new face from the 2D image signal, wherein the feature regions represent one or more regions in which lots of transformation takes place; (d) comparing the feature regions with the basic pattern to detect a plurality of transformation parameters, wherein the transformation parameters represent the comparison results; (e) modifying the transformation parameters based on the voice signal to generate modified transformation parameters; and (f) encoding the initial data and the modified transformation parameters.
2. The method according to claim 1, wherein the step (b) includes the steps of: (b1) determining head parameters for the 2D image signal based on the adaptive 3D model, wherein the head parameters represent a rotating condition for a projection image of the adaptive 3D model to be similar to the 2D image signal; and (b2) replacing the basic pattern with the projection image corresponding to the head parameters.
3. The method according to claim 2, wherein the projection image is determined by an affine transform discipline.
4. The method according to claim 1, wherein the step (d) includes the steps of: (d1) matching the feature regions with the basic pattern to calculate each moving quantity for each of the feature regions; and (d2) storing said each moving quantity in a corresponding transformation parameter.
5. The method according to claim 4, wherein the feature regions include left and right eyebrows, left and right eyes, mouth and chin; and the transformation parameters include eyebrows transformation parameters, eyes transformation parameters, mouth transformation parameters and chin transformation parameters.
6. The method according to claim 5, wherein the feature regions are determined by a Sobel operator.
7. The method according to claim 1, wherein the step (e) includes the steps of: (e1) obtaining a sound pronounced, a pace and an accent from the voice signal; (e2) comparing the sound pronounced, the pace and the accent with predetermined thresholds, respectively, to generate a sound_bit signal, wherein the sound_bit signal tells whether the new face either talks or expresses emotion; and (e3) if the new face is determined to be talking, modulating mouth transformation parameters based on the sound pronounced, the pace and the accent to generate modified mouth transformation parameters.
8. The method according to claim 7, wherein, if the new face is determined to be talking, the step (e3) further has the step of modulating chin transformation parameters based on the sound pronounced, the pace and the accent to generate modified chin transformation parameters.
9. The method according to claim 8, wherein the modified transformation parameters include the head, the eyebrows, the eyes, the mouth and the chin transformation parameters.
10. An apparatus for encoding a facial movement of a new face based on a voice signal and a 2-dimensional (2D) image signal in a 3-dimensional (3D) model-based coding system, wherein the voice signal and the 2D image signal of the new face are provided either on a frame-by-frame basis or on a field-by-field basis, the apparatus comprising: an adaptive 3D model generator for generating an adaptive 3D model from initial data of the new face based on a basic 3D model, wherein the initial data represents one or more 2D facial images of the new face and the basic 3D model represents a 3D model of a common face of a human being; a basic pattern generator for producing a basic pattern for the 2D image signal based on the adaptive 3D model, wherein the basic pattern represents a 2D picture obtained by a rotational correlation between the 2D image signal and the adaptive 3D model; a feature extractor for extracting one or more feature regions of the new face from the 2D image signal, wherein the feature regions represent one or more regions in which lots of transformation takes place; a parameter generator for comparing the feature regions with the basic pattern to detect a plurality of transformation parameters, wherein the transformation parameters represent the comparison results; a voice analyzer for generating a sound_bit signal based on the voice signal, wherein the sound_bit signal tells whether the new face either talks or expresses emotion; a parameter modulator, responsive to the sound_bit signal, for modulating the transformation parameters based on the voice signal to generate modified transformation parameters; and a formatter for encoding the initial data and the transformation parameters or the modified transformation parameters.
11. The apparatus according to claim 10, wherein the basic pattern generator includes: means for determining head parameters for the 2D image signal based on the adaptive 3D model, wherein the head parameters represent a rotating condition for a projection image of the adaptive 3D model to be similar to the 2D image signal; and means for replacing the basic pattern with the projection image corresponding to the head parameters.
12. The apparatus according to claim 11, wherein the projection image is determined by an affine transform discipline.
13. The apparatus according to claim 10, wherein the parameter generator includes: means for matching the feature regions with the basic pattern to calculate each moving quantity for each of the feature regions; and means for storing said each moving quantity in a corresponding transformation parameter.
14. The apparatus according to claim 13, wherein the feature regions include left and right eyebrows, left and right eyes, mouth and chin; and the transformation parameters include eyebrows transformation parameters, eyes transformation parameters, mouth transformation parameters and chin transformation parameters.
15. The apparatus according to claim 14, wherein the feature regions are determined by a Sobel operator.
16. The apparatus according to claim 14, wherein the voice analyzer includes: means for obtaining a sound pronounced, a pace and an accent from the voice signal; and means for comparing the sound pronounced, the pace and the accent with predetermined thresholds, respectively, to generate the sound_bit signal.
17. The apparatus according to claim 16, wherein, in response to the sound_bit signal, the mouth and the chin transformation parameters are modulated based on the voice signal to generate modified mouth and chin transformation parameters, respectively.
18. The apparatus according to claim 17, wherein the modified transformation parameters include the head, the eyebrows, the eyes, the modified mouth and the modified chin transformation parameters.
19. An apparatus constructed and arranged substantially as herein described with reference to or as shown in Figures 1, 2 and 3A to 3D of the accompanying drawings.
GB9726058A 1996-12-27 1997-12-09 Encoding facial movement in a 3D model-based image coding system Withdrawn GB2320839A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1019960072671A KR100229538B1 (en) 1996-12-27 1996-12-27 Apparatus and method for encoding a facial movement

Publications (2)

Publication Number Publication Date
GB9726058D0 GB9726058D0 (en) 1998-02-04
GB2320839A true GB2320839A (en) 1998-07-01

Family

ID=19491167

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9726058A Withdrawn GB2320839A (en) 1996-12-27 1997-12-09 Encoding facial movement in a 3D model-based image coding system

Country Status (3)

Country Link
JP (1) JPH10215452A (en)
KR (1) KR100229538B1 (en)
GB (1) GB2320839A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2395410A (en) * 2002-11-08 2004-05-19 Peter David Hurley Transmission of reduced amount of data depicting 3D model
AU2003203644B2 (en) * 2002-04-12 2005-10-06 Canon Kabushiki Kaisha Face detection and tracking in a video sequence
US7146028B2 (en) 2002-04-12 2006-12-05 Canon Kabushiki Kaisha Face detection and tracking in a video sequence

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100361497B1 (en) * 1999-01-08 2002-11-18 엘지전자 주식회사 Method of extraction of face from video image
JP2003117241A (en) 2001-10-11 2003-04-22 Sega Corp Character arranging method, data managing method and model forming method
JP3822482B2 (en) * 2001-10-31 2006-09-20 株式会社東芝 Face orientation calculation method and apparatus
KR100438841B1 (en) * 2002-04-23 2004-07-05 삼성전자주식회사 Method for verifying users and updating the data base, and face verification system using thereof


Also Published As

Publication number Publication date
JPH10215452A (en) 1998-08-11
KR19980053565A (en) 1998-09-25
KR100229538B1 (en) 1999-11-15
GB9726058D0 (en) 1998-02-04

Similar Documents

Publication Publication Date Title
Pearson Developments in model-based video coding
JP4335449B2 (en) Method and system for capturing and representing 3D geometry, color, and shading of facial expressions
Aizawa et al. Model-based image coding advanced video coding techniques for very low bit-rate applications
Guenter et al. Making faces
US20110292054A1 (en) System and Method for Low Bandwidth Image Transmission
Harashima et al. Model-Based Analysis Synthesis Coding of Videotelephone Images--Conception and Basic Study of Intelligent Image Coding--
US6014625A (en) Method and apparatus for producing lip-movement parameters in a three-dimensional-lip-model
EP0673170A2 (en) Video signal processing systems and methods utilizing automated speech analysis
JPH09135447A (en) Intelligent encoding/decoding method, feature point display method and interactive intelligent encoding supporting device
KR102643604B1 (en) Method and apparatus for generating speech video
Yin et al. Generating realistic facial expressions with wrinkles for model-based coding
WO2002091749A1 (en) Model switching in a communication system
EP0710929A2 (en) Acoustic-assisted image processing
Isikdogan et al. Eye contact correction using deep neural networks
Kaneko et al. Coding of facial image sequence based on a 3-D model of the head and motion detection
GB2320839A (en) Encoding facial movement in a 3D model-based image coding system
Huang et al. Automatic feature point extraction on a human face in model-based image coding
Lavagetto et al. Object-oriented scene modeling for interpersonal video communication at very low bit-rate
Ladwig et al. Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence
Girod Image sequence coding using 3D scene models
Pearson Model-based image coding
US20020054039A1 (en) 2.5 dimensional head modeling method
WO1998001830A1 (en) Image processing
Aizawa et al. Human Facial Motion
Lavagetto et al. Videophone coding based on 3d modeling of facial muscles

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)