WO2002031772A2 - Procede de suivi du mouvement d'un visage - Google Patents

Procede de suivi du mouvement d'un visage

Info

Publication number
WO2002031772A2
Authority
WO
WIPO (PCT)
Prior art keywords
markers
face
locations
local
motion
Prior art date
Application number
PCT/IB2001/002736
Other languages
English (en)
Other versions
WO2002031772A8 (fr)
WO2002031772A3 (fr)
Inventor
Tanju A. Erdem
Original Assignee
Erdem Tanju A
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/689,595 external-priority patent/US6294157B1/en
Application filed by Erdem Tanju A filed Critical Erdem Tanju A
Publication of WO2002031772A2 publication Critical patent/WO2002031772A2/fr
Publication of WO2002031772A8 publication Critical patent/WO2002031772A8/fr
Publication of WO2002031772A3 publication Critical patent/WO2002031772A3/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning

Definitions

  • the present invention is related to the field of digital video processing and analysis, and more specifically, to a technique for tracking the three-dimensional (3-D) motion of a person's face from a sequence of two-dimensional (2-D) images of the person's face that are sequentially received in chronological order.
  • Tracking the 3-D motion of a face in a sequence of 2-D images of the face is an important problem with applications to facial animation, hands-free human-computer interaction environment, and lip-reading.
  • Tracking the motion of the face involves tracking the 2-D positions of salient features on the face.
  • the salient features could be in the form of (i) points, such as the corners of the mouth, the eye pupils, or external markers placed on the face; (ii) lines, such as the hair-line, the boundary of the lips, and the boundary of eyebrows; and (iii) regions, such as the eyes, the nose, and the mouth.
  • the salient features can also be synthetically created by placing markers on the face. Tracking of salient features is generally accomplished by detecting and matching a plurality of salient features of the face in a sequence of 2-D images of the face. The problem of detecting and matching the salient features is made difficult by variations in illumination, occlusion of the features, poor video quality, and the real-time constraint on the computer processing of the 2-D images.
  • the present invention provides an improvement designed to satisfy the aforementioned needs.
  • the present invention is directed to a computer program product for tracking the motion of a person's face from a chronologically ordered sequence of images of the person's face for the purpose of animating a 3-D model of the same or another person's face, by performing the steps of: (a) receiving a sequence of 2-D images of a person's face; (b) tracking the salient features of the person's face in the 2-D images; and (c) obtaining the 3-D global and local motion of the face from the tracked 2-D location of the salient features.
  • FIG. 1 is a perspective view of a computer system for implementing the present invention
  • FIG. 2 is a first flowchart for the method of the present invention
  • FIG. 3 is a second flowchart for the method of the present invention
  • FIG. 4 is a diagram illustrating the method of placing markers on a person's face
  • FIG. 5 is a diagram further illustrating the method of placing markers on a person's face
  • FIG. 6a is a diagram illustrating the method of calculating the calibration parameter of the camera with a target object
  • FIG. 6b is a diagram illustrating the image of the target object captured by the camera
  • FIG. 7 is a diagram illustrating the method of acquiring a plurality of neutral images of a person's face using the camera
  • FIG. 8 is a diagram further illustrating the method of acquiring a plurality of action images of a person's face using the camera
  • FIG. 9 is a first table illustrating the method of locating global and local markers on the person's face
  • FIG. 10 is a second table illustrating the method of locating global and local markers on the person's face
  • FIG. 11 is a table illustrating the method of determining the surface normals of the global markers
  • FIG. 12 is a table illustrating the method of determining the surface normals and the motion planes of the local markers
  • the computer system 10 includes a microprocessor-based unit 12 for receiving and processing software programs and for performing other well known processing functions.
  • the software programs are contained on a computer useable medium 14, typically a compact disk, and are input into the microprocessor based unit 12 via the compact disk player 16 electronically connected to the microprocessor-based unit 12.
  • programs could also be contained in an Internet server 18 and input into the microprocessor-based unit 12 via an Internet connection 20.
  • a camera 22 is electronically connected to the microprocessor-based unit 12 to capture the 2-D images of a person's face.
  • a display 24 is electronically connected to the microprocessor-based unit 12 for displaying the images and user related information associated with the software.
  • a keyboard 26 is connected to the microprocessor based unit 12 for allowing a user to input information to the software.
  • a mouse 28 is also connected to the microprocessor based unit 12 for selecting items on the display 24 or for entering 2-D position information to the software, as is well known in the art.
  • a digital pen 30 and a digital pad 32 may be used for selecting items on the display 24 and entering position information to the software.
  • the output of the computer system is either stored on a hard disk 34 connected to the microprocessor unit 12, or uploaded to the Internet server 18 via the Internet connection 20. Alternatively, the output of the computer system can be stored on another computer useable medium 14, typically a compact disk, via a compact disk writer 36.
  • the first five steps are the initialization steps of the invention. Briefly stated, the first five steps are as follows: (a) selecting or placing salient features on the person's face (Step 100); (b) calculating the calibration parameter of the camera (Step 110); (c) acquiring a plurality of images of the person's face using the camera (Step 120); (d) calculating the 3-D positions of the salient features (Step 130); and (e) determining the surface normals and motion planes for the salient features (Step 140).
  • the second five steps are the tracking steps of the invention.
  • the second five steps are as follows: (f) acquiring a chronologically ordered sequence of 2-D images of the person's face in action (Step 150); (g) locking onto the salient features (Step 160); (h) tracking the global and local motion of the face (Step 170); (i) determining tracking failure (Step 180); and (j) storing or transmitting the global and local motion values (Step 190).
  • salient features are selected or placed on the person's face for tracking the global and local motion of the face.
  • Salient features that can be selected for tracking the global motion are the hairline, the corners of the eyes, the nostrils, and contours of the ears.
  • Salient features that can be selected for tracking the local motion are the eyebrows, eyelids, pupils, and the lips.
  • Methods have been proposed in the prior art for using the aforementioned salient features to track the global and local motion of the face.
  • salient features are designed and placed on the face rather than selected from what is naturally available on the face.
  • circular markers are placed on a head-set that is worn by the person.
  • the head-set may comprise a strap 206 for the skull, a strap 207 for the chin, and a strap 208 for the eyebrows.
  • two concentric circles are used to create the markers; one having twice the diameter of the other one, and the small one placed on top of the larger one.
  • the circles are painted in black and white.
  • the two marker types are hereinafter referred to as black-on-white 213 and white-on-black 214 markers.
  • other types of markers may be used, including but not limited to fluorescent dyes and contrasting paints.
  • FIG. 5 in a second preferred embodiment of the invention, circular markers are placed directly on the person's face.
  • Markers are placed on the following ten locations on the person's face for tracking the global motion of the face, henceforth they are referred to as the global markers: right-ear-base 251, left-ear-base 252, right-temple 253, left-temple 254, right-outer-forehead 255, left-outer-forehead 256, right-central-forehead 257, left-central-forehead 258, nose-base 259, and nose-tip 260.
  • Markers are placed on the following six locations on the person's face for tracking the local motion of the face, henceforth they are referred to as the local markers: right-lip-corner 261, left-lip-corner 262, upper-lip-center 263, lower-lip-center 264, right-central-eyebrow 265, and left-central-eyebrow 266.
  • a perspective image of a target object is captured with the camera with the target object being placed at approximately the same distance from the camera as the person's face.
  • the method of the present invention uses the perspective image of the target object to calculate a camera parameter that is used in the subsequent steps, hereinafter referred to as the E parameter.
  • the E parameter has a non-negative value and it is a measure of the amount of perspective deformation caused by the camera. A zero value indicates no perspective deformation and the larger the value of the E parameter the more the perspective deformation caused by the camera.
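  • The exact formula for the E parameter is not reproduced above. As an illustration only, the sketch below computes a simple perspective-deformation measure from the four imaged corners of the square target: it is zero under pure orthography and grows with perspective, which is the qualitative behaviour described here. The function name and the opposite-side comparison are assumptions, not the patent's definition.

```python
import numpy as np

def perspective_deformation(corners_2d):
    """Illustrative perspective-deformation measure from the four imaged
    corners of a planar square target, listed in order around the square.

    Under orthographic projection the opposite sides of the square project
    to equal lengths, so the measure is zero; the stronger the perspective,
    the larger it gets.  This is NOT the patent's E-parameter formula, only
    a stand-in with the same qualitative behaviour.
    """
    c = np.asarray(corners_2d, dtype=float)                    # shape (4, 2)
    sides = [np.linalg.norm(c[(k + 1) % 4] - c[k]) for k in range(4)]
    d1 = abs(sides[0] - sides[2]) / max(sides[0], sides[2])    # first pair of opposite sides
    d2 = abs(sides[1] - sides[3]) / max(sides[1], sides[3])    # second pair
    return 0.5 * (d1 + d2)

# a slightly keystoned square gives a small positive value
print(perspective_deformation([(0, 0), (100, 2), (98, 100), (2, 98)]))
```
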
  • the target object is square shaped and planar; hence, letting aI denote the 3-D vector from (X_1, Y_1, Z_1) to (X_2, Y_2, Z_2) and aJ denote the 3-D vector from (X_1, Y_1, Z_1) to (X_4, Y_4, Z_4), where I and J are orthonormal vectors and a is the size of the square, we have the following mathematical expressions for the 3-D positions of the corners of the square object:
  • the method of acquiring a plurality of images of a person's face using the camera comprises the steps of (1) acquiring neutral images of the face (Step 121); and (2) acquiring action images of the face (Step 122). In the following, a detailed description of these steps is given.
  • a plurality of 2-D images of the person's face in the same neutral state are captured with the camera from different directions.
  • the neutral state for the face means that all face muscles are relaxed, eyes are normally open, mouth is closed and lips are in contact. These images are subsequently used to obtain the neutral 3-D positions of the salient features of the face, hence, hereinafter they are referred to as the neutral images.
  • the camera directions to capture neutral images are selected so that the majority of salient features are visible in all images.
  • the face is not required to be at the same distance from the camera in all the neutral images.
  • markers are placed on the person's face as described in Step 100, and fifteen camera directions are selected for obtaining the neutral images. In order to obtain the neutral images, the camera remains fixed and the person rotates his/her head to realize the following fifteen different directions: front 221, forehead 222, chin 223, angled-right 224, angled-right-tilted-down 225, angled-right-tilted-up 226, angled-left 227, angled-left-tilted-down 228, angled-left-tilted-up 229, full-right-profile 230, full-right-profile-tilted-down 231, full-right-profile-tilted-up 232, full-left-profile 233, full-left-profile-tilted-down 234, and full-left-profile-tilted-up 235.
  • a plurality of 2-D images of the person's face in action states are captured with the camera from different directions.
  • the action states for the face include faces with a smiling mouth, a yawning mouth, raised eyebrows, etc. These images are subsequently used to obtain the 3-D position of the local salient features when the face is in action states, hence, hereinafter they are referred to as the action images.
  • the camera directions to capture the action images are selected so that the majority of salient features are visible in all images.
  • the face is not required to be at the same distance from the camera in all the action images.
  • markers are placed on the person's face as described in Step 100 and five facial action states and two camera directions for each action are selected.
  • the facial action states are as follows: smiling mouth, yawning mouth, kissing mouth, raised eyebrows, and squeezed eyebrows.
  • the camera directions are front and right.
  • the method of calculating the neutral 3-D positions of the salient features comprises the steps of (1) locating the global and local salient features in the neutral and action images (Step 131); (2) calculating the 3-D positions of the global and local salient features for the neutral face (Step 132); and (3) calculating the 3-D positions of the local salient features for the action faces (Step 133).
  • Step 131 locating the global and local salient features in the neutral and action images
  • Step 132 calculating the 3-D positions of the global and local salient features for the neutral face
  • Step 133 calculating the 3-D positions of the local salient features for the action faces
  • Step 131 The salient features are automatically or manually located on the acquired images. It is important to note that not all of the salient features may be visible in all neutral and action images and some salient features may not be in their neutral position in some action images. Thus, in the present invention, the location of only the visible salient features and salient features that are in their neutral position are automatically or manually located in each neutral and action image.
  • markers that are placed on the face are used as the salient features as described in Step 100. These markers are manually located in the neutral images that are indicated with an X in the table in FIG. 9, and are manually located in action images that are indicated with an X in FIG. 10. The markers are assumed as invisible in those neutral images that are not indicated with an X in the table in FIG. 9. The markers are not in their neutral position in those action images that are not indicated with an X in the table in FIG. 10. In operation, the computer program prompts the user to manually locate only the visible markers and markers that are in their neutral position in each image.
  • the 3-D positions of the salient features of the person's face are calculated using a modified version of the method in "Shape and Motion from Image Streams under Orthography: A Factorization Method" by Carlo Tomasi and Takeo Kanade, International Journal of Computer Vision, vol. 9, no. 2, pp. 137-154, 1992.
  • global and local markers placed on the person's face as described in Step 100 are used as the salient features.
  • a general mathematical analysis of 2-D image projections of 3-D marker positions is given.
  • the method of "Shape and Motion from Image Streams under Orthography” is reviewed.
  • the proposed modification to the method of "Factorization of Shape and Motion” is presented.
  • the image plane passes through (0, 0, -E) and is perpendicular to k.
  • Let N denote the number of global markers and P_n, n = 1, ..., N, denote the coordinates of the global markers with respect to the origin (0, 0, 0) of the camera system.
  • Let M denote the number of local markers.
  • the coordinates of all the markers are changed. It is therefore more appropriate to use a local coordinate system for the face to represent the coordinates of the markers.
  • the unit vectors i , j , and k denote the coordinate axes for an arbitrary local coordinate system for the face.
  • the origin C_0 of the local coordinate system is defined to be the centroid of the markers and is given by
  • the origin of the local coordinate system is changed but the local coordinates of the markers always remain fixed.
  • W is some constant in units of meters that will be defined shortly.
  • the quantities on the left hand side are measured quantities while the quantities on the right hand side are unknown quantities.
  • the method of "Factorization of Shape and Motion" solves the above equations for the 3-D local coordinates S H and L n of the global and local markers, respectively, the orientation vectors I f and J f , and the 2-D position (c f o,x,c f o, y ) of the centroid of the markers in all images in terms of the 2-D projected positions (p f n , x ,p f n , y ) and (q f n,x,q f n,y) of the global and local markers, respectively, in all images.
  • the third orientation vector K^f is uniquely defined by the first two orientation vectors I^f and J^f simply as
  • K^f = I^f × J^f.
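  • For readers unfamiliar with the factorization step, the sketch below shows a minimal weak-perspective, rank-3 factorization in the spirit of Tomasi and Kanade: the registered 2-D measurements are stacked into one matrix and split by SVD into motion and shape factors. It omits the metric-upgrade constraints and the handling of invisible markers, and it is not the modified procedure claimed by the invention; the function name and array layout are assumptions.

```python
import numpy as np

def factorize_orthographic(tracks):
    """Rank-3 factorization of 2-D marker tracks (Tomasi-Kanade style).

    tracks: array of shape (F, N, 2) holding the 2-D positions of N markers
    in F images, with every marker assumed visible in every image.
    Returns the per-image centroids, a (2F x 3) motion matrix whose rows
    correspond to the (not yet metric) orientation vectors I^f and J^f, and
    a (3 x N) shape matrix of 3-D marker coordinates.  The metric upgrade
    (|I| = |J| = 1, I.J = 0) is omitted.
    """
    tracks = np.asarray(tracks, dtype=float)
    centroids = tracks.mean(axis=1)                          # (F, 2) per-image centroid c^f_0
    registered = tracks - centroids[:, None, :]              # subtract the centroid
    W = np.concatenate([registered[..., 0], registered[..., 1]], axis=0)   # (2F, N)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U3, s3, V3t = U[:, :3], s[:3], Vt[:3, :]                 # keep the rank-3 part
    motion = U3 * np.sqrt(s3)                                # (2F, 3): rows ~ I^f, J^f
    shape = np.sqrt(s3)[:, None] * V3t                       # (3, N): marker coordinates
    return centroids, motion, shape
```
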
  • the number of iterations is selected to be 50 and the threshold is selected to be 1 pixel.
  • the 3-D positions of the global markers right-ear-base 251, left-ear-base 252, nose-base 259, and nose-tip 260 are used to globally translate and rotate the 3-D positions of the global and local markers so that they correspond to a frontal-looking face.
  • Let r_1 and r_2 denote the 3-D positions of the right-ear-base 251 and left-ear-base 252, respectively; t denote the 3-D position of the nose-base 259; and b denote the 3-D position of the nose-tip 260. Then, the following procedure is used to globally translate the positions of the markers: 1. Define the following vector
  • the following procedure is used to globally rotate the marker positions so that they correspond to a frontal-looking face:
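  • The individual steps of the translation and rotation procedures are not reproduced above. The sketch below shows one plausible construction, assuming the ear-to-ear direction defines the x-axis and the nose-base-to-nose-tip direction defines the out-of-face axis; the choice of origin and axes is an assumption made for illustration, not the patent's procedure.

```python
import numpy as np

def to_frontal_frame(points, r1, r2, nose_base, nose_tip):
    """Translate and rotate 3-D marker positions into a frontal-looking frame.

    Assumed convention (not taken from the patent text): the origin sits
    midway between the ear bases, the x-axis points from the right ear base
    r1 to the left ear base r2, and the z-axis points from the nose base
    towards the nose tip (out of the face).
    """
    pts = np.asarray(points, dtype=float)
    r1, r2 = np.asarray(r1, dtype=float), np.asarray(r2, dtype=float)
    origin = 0.5 * (r1 + r2)
    x = r2 - r1
    x /= np.linalg.norm(x)
    z = np.asarray(nose_tip, dtype=float) - np.asarray(nose_base, dtype=float)
    z -= z.dot(x) * x                        # make the nose direction orthogonal to x
    z /= np.linalg.norm(z)
    y = np.cross(z, x)                       # completes a right-handed frame (x cross y = z)
    R = np.stack([x, y, z])                  # rows are the new coordinate axes
    return (pts - origin) @ R.T              # marker positions in the frontal frame
```
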
  • the method of calculating the 3-D positions of the local salient features for the action faces is disclosed in the following.
  • global and local markers placed on the person's face as described in Step 100 are used as the salient features.
  • the position and orientation of the person's face in the action images are calculated using the 3-D positions S_n of the global markers and the 2-D measurements (p^f_{n,x}, p^f_{n,y}) of the global markers in the action images.
  • the 3-D positions L_n of the local markers in the action states are calculated using the position and orientation of the person's face in the action images and the 2-D measurements (q^f_{n,x}, q^f_{n,y}) of the local markers in the action images.
  • the 3-D position of the face in an image f is described by the centroid (c^f_{0,x}, c^f_{0,y}) of the markers and the camera-distance-ratio of the face in that image.
  • the 3-D orientation of the face in an image f is described by the vectors I^f and J^f in that image.
  • the 3-D position and orientation parameters (c^f_{0,x}, c^f_{0,y}), the camera-distance-ratio, I^f and J^f in the action images are calculated using the following steps:
  • the number of iterations is selected to be 50 and the threshold is selected to be 1 pixel.
  • action state i = 1 corresponds to a yawning mouth 241 and 242
  • Steps 1 and 2 Repeat Steps 1 and 2 until a predetermined number of iterations has been reached, or the following average measurement of matching error
  • the number of iterations is selected to be 50 and the threshold is selected to be 1 pixel.
  • a surface normal is defined for each marker.
  • the surface normals are used during the tracking process to determine if a marker is visible in a 2-D image.
  • the surface normal for a marker is defined to be the vector perpendicular to the surface of the face at the location of the marker. In a preferred embodiment of the invention, the vectors given in the table in FIG. 11 are defined as the surface normals for the global markers.
  • the surface normals for local markers are given in FIG. 12. It should be noted that the surface normals given in the tables in FIGS. 11 and 12 are not necessarily normalized. They can be normalized so that they all have unit length.
  • the surface normals for the markers are used later in Step 170 to determine the visibilities of the markers in a 2-D image.
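  • A minimal sketch of how such a visibility test could be implemented: a marker is treated as visible when its surface normal, rotated by the current head rotation, still points towards the camera. The viewing direction and the zero dot-product threshold are assumptions, not values taken from the text.

```python
import numpy as np

def visible_markers(normals, R, view_dir=(0.0, 0.0, -1.0), thresh=0.0):
    """Visibility test for markers from their surface normals.

    normals:  (N, 3) surface normals of the markers in the face frame
              (the tables need not be normalized, so they are normalized here).
    R:        3x3 rotation of the face in the current image.
    view_dir: assumed unit vector pointing from the face towards the camera.
    A marker counts as visible when its rotated normal makes an acute enough
    angle with view_dir; the zero threshold is illustrative only.
    """
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    rotated = n @ np.asarray(R, dtype=float).T
    return rotated @ np.asarray(view_dir, dtype=float) > thresh   # boolean mask per marker
```
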
  • a video of the face of the person in action is received.
  • the 2-D images of the video are processed to track the salient features on the face and to calculate the global and local motion of the face in the order they are received.
  • a locking method is used to start tracking the salient features of the face.
  • the locking method is used at the very beginning of the tracking process or whenever the tracking is lost, as described in Step 190. Initial images of the video are used to lock the tracking process onto the salient features on the face.
  • cross-like signs are displayed on top of the 2-D image to be associated with the markers on the face.
  • the locations of the signs are determined by projecting the 3-D positions of the markers obtained in Step 132 assuming a frontal orientation of the face.
  • the person looks directly at the camera so as to produce a frontal view positioned at the center of the image. The person moves his/her face back and forth and also rotates his/her face if necessary until the markers on the face coincide with the displayed cross-like signs.
  • the method of the present invention considers the locations of the cross-like signs as the predicted locations for the features and uses the method of Step 173 to calculate the current motion in the 2-D image. If the calculated motion corresponds to a frontal orientation at the center of the display, then the method of the present invention considers a lock onto the features has been achieved.
  • the method of finding the 3-D global and local motion of the face in each 2-D image comprises the steps of (1) predicting the global motion (Step 171); (2) detecting the global salient features (Step 172); (3) estimating the global motion (Step 173); (4) predicting the local motion (Step 174); (5) detecting the local salient features (Step 175); and (6) estimating the local motion (Step 176).
  • Step 171 Predicting The Locations of Global Salient Features (Step 171)
  • the global motion of the face in a 2-D image is defined to be the 3-D orientation and position of the face in the 2-D image.
  • the global motion of the face in a 2-D image that is currently processed is predicted from the motion of the face in the previously processed 2-D images.
  • the calculated position and orientation of the face in the immediate previous 2-D image is used as the prediction for the global motion in the current 2-D image.
  • the method of detecting the global markers in the current 2-D image is comprised of the following steps:
  • a 2-D correlation filter is designed that has the support given by the outer ellipse and having the value of 1 inside the inner ellipse and the value of 0 elsewhere.
  • let the coefficients of the 2-D correlation filter for the global marker n be given by c_n(x, y).
  • f_n(i, j) = Σ_{x,y} c_n(x, y) · I(x + i + p_{n,x}, y + j + p_{n,y}),   -W/2 ≤ i, j ≤ W/2, where the summation is over the support of the correlation filter c_n(x, y).
  • I(x,y) denotes the intensity distribution of the 2-D image with the center of the image being at (0,0).
  • the visibility threshold is selected as 0.25 and the size W of the square region is selected as 20 pixels.
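  • As an illustration of the detection step described above, the sketch below builds a circular (rather than elliptical) version of the concentric-circle correlation mask and scans a W x W window around the predicted marker position for the strongest response. The radii, the array-coordinate convention, the function names and the absence of normalization are assumptions.

```python
import numpy as np

def ring_marker_filter(radius, white_on_black=False):
    """Correlation mask for a concentric-circle marker (circles are used here
    instead of the ellipses in the text).  The support is the outer circle,
    whose diameter is assumed to be twice the inner one; the mask is 1 inside
    the inner circle and 0 in the ring for a black-on-white marker, and the
    other way round for a white-on-black marker."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.hypot(x, y)
    inner = (r <= radius / 2.0).astype(float)
    ring = ((r > radius / 2.0) & (r <= radius)).astype(float)
    return ring if white_on_black else inner

def detect_marker(image, predicted_xy, filt, W=20):
    """Scan a W x W window around the predicted (x, y) position and return the
    offset (i, j) of the strongest correlation response and its peak value.
    Coordinates are plain array indices, and the search window is assumed to
    stay inside the image (no border handling)."""
    px, py = predicted_xy
    fr, half = filt.shape[0] // 2, W // 2
    best, best_ij = -np.inf, (0, 0)
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            patch = image[py + j - fr: py + j + fr + 1,
                          px + i - fr: px + i + fr + 1]
            score = float((filt * patch).sum())
            if score > best:
                best, best_ij = score, (i, j)
    return best_ij, best
```
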
  • Step 172 there are L valid detected locations assigned to L global markers.
  • the 3-D orientation vectors I and J, and the 3-D position (c_{0,x}, c_{0,y}) and camera-distance-ratio of the face in the current 2-D image are then calculated from these L detected locations using the following steps:
  • the number of iterations is selected to be 50 and the threshold is selected to be 1 pixel.
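  • The individual iteration steps are not reproduced above. As a stand-in, the sketch below estimates the weak-perspective pose (2-D centroid, scale, and orientation vectors I and J) from the L detected marker locations and their known neutral 3-D coordinates by plain linear least squares; this is an assumed formulation, not the iterative 50-step procedure referred to in the text.

```python
import numpy as np

def estimate_global_pose(S, p):
    """Weak-perspective pose from L detected global markers.

    S: (L, 3) neutral 3-D coordinates of the detected markers (face frame).
    p: (L, 2) detected 2-D image locations of the same markers.
    Returns the 2-D centroid, a scale factor, and unit orientation vectors
    I and J, obtained by plain linear least squares.
    """
    S = np.asarray(S, dtype=float)
    p = np.asarray(p, dtype=float)
    c0 = p.mean(axis=0)                          # 2-D centroid of the detections
    Sc = S - S.mean(axis=0)                      # centred 3-D coordinates
    A = np.hstack([Sc, np.ones((len(S), 1))])    # allow a small constant offset
    x_sol, *_ = np.linalg.lstsq(A, p[:, 0] - c0[0], rcond=None)
    y_sol, *_ = np.linalg.lstsq(A, p[:, 1] - c0[1], rcond=None)
    sI, sJ = x_sol[:3], y_sol[:3]                # scaled orientation vectors
    scale = 0.5 * (np.linalg.norm(sI) + np.linalg.norm(sJ))
    return c0, scale, sI / np.linalg.norm(sI), sJ / np.linalg.norm(sJ)
```
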
  • the local motion of the face in a 2-D image is defined through an action vector that represents the actions of the face in the 2-D image.
  • a_MY being the amount of yawning-mouth action
  • a_MS being the amount of smiling-mouth action
  • a_MK being the amount of kissing-mouth action
  • a_ER being the amount of raised-eyebrows action
  • a_ES being the amount of squeezed-eyebrows action.
  • an action vector A = (0.5, 0.0, 0.0, 1.0, 0.0) represents a half-yawning mouth and fully raised eyebrows.
  • A = (0.0, 0.0, 0.0, 0.0, 0.0).
  • the action vector found for the previous image is used as the predicted action vector for the current image.
  • A denote the predicted action vector for the current image.
  • A vAi ⁇ A > A ⁇ Ai ⁇ — I > AI ⁇ Ai > A - A j
  • the method of detecting the local markers in the current 2-D image is comprised of the following steps:
  • a 2-D correlation filter is designed that has the support given by the outer ellipse and having the value of 0 inside the inner ellipse, the value of 1 in the outer ellipse, and the value of 0 elsewhere.
  • h_n(i, j) = Σ_{x,y} d_n(x, y) · I(x + i + q_{n,x}, y + j + q_{n,y}),   -W/2 ≤ i, j ≤ W/2, where the summation is over the support of the correlation filter d_n(x, y) and
  • I(x,y) denotes the intensity distribution of the 2-D image with the center of the image being at (0,0).
  • the visibility threshold is selected as 0.25 and the size W of the square region is selected as 20 pixels.
  • Eliminate superfluous and multiple detected locations: If the distance between any two detected locations is less than a distance threshold, but larger than zero, then discard the detected location that has the smaller peak value. On the other hand, if the exact same location is detected for more than one local marker, then assign the detected location only to the local marker that has the largest visibility index.
  • the distance threshold is selected to be 1 pixel. All local markers that are not assigned a valid detected location are assumed invisible for the purpose of estimating the local motion that is done in the following Step 176.
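  • A minimal sketch of the two pruning rules just described, assuming each candidate detection carries its peak value and visibility index; the dictionary layout and field names are illustrative assumptions.

```python
import math

def prune_detections(detections, dist_thresh=1.0):
    """Apply the two pruning rules to candidate local-marker detections.

    detections: dict marker_name -> (x, y, peak_value, visibility_index).
    If two detections are closer than dist_thresh (but not identical), the
    one with the smaller peak value is discarded; if the exact same location
    was detected for several markers, only the marker with the largest
    visibility index keeps it.  Markers that lose their detection are treated
    as invisible for the local-motion estimate.
    """
    kept = dict(detections)
    names = sorted(kept)
    for a in names:
        for b in names:
            if a >= b or a not in kept or b not in kept:
                continue
            xa, ya, pa, va = kept[a]
            xb, yb, pb, vb = kept[b]
            d = math.hypot(xa - xb, ya - yb)
            if d == 0.0:
                del kept[a if va < vb else b]    # same location: keep higher visibility
            elif d < dist_thresh:
                del kept[a if pa < pb else b]    # too close: keep higher peak value
    return kept
```
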
  • the local motion of the face is represented by an action vector as described in Step 174.
  • the action vector for the current image is calculated using the following steps:
  • the 3-D displacements of the local markers are calculated from the 2-D displacements of the local markers, the 2-D motion planes of the local markers, and the global motion of the face in the current image.
  • the 2-D motion plane of a local marker passes through the neutral 3-D position of the local marker and approximates the motion space of the local marker with a plane.
  • Two basis vectors are used to define each motion plane. Let B^1_n and B^2_n denote the basis vectors 1 and 2 for the local marker n.
  • the basis vectors for the motion planes of the local markers are given in FIG. 12.
  • the 3-D displacements of the local markers are then calculated as follows. Form the matrix M_n for each local marker n,
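  • The matrix referred to above is not legible in this copy. One consistent way to carry out the computation is sketched below: the marker's 3-D displacement is constrained to the plane spanned by its basis vectors and chosen so that its weak-perspective projection matches the measured 2-D displacement. The least-squares formulation and argument names are assumptions, not the patent's matrix.

```python
import numpy as np

def local_3d_displacement(d2, B1, B2, I, J, scale):
    """Recover a local marker's 3-D displacement from its 2-D displacement.

    The displacement is constrained to the motion plane spanned by the basis
    vectors B1 and B2; I and J are the face orientation vectors and `scale`
    the weak-perspective scale of the current image.  The plane coefficients
    are chosen so that the projected displacement matches the measured d2.
    """
    B1, B2 = np.asarray(B1, dtype=float), np.asarray(B2, dtype=float)
    I, J = np.asarray(I, dtype=float), np.asarray(J, dtype=float)
    # D = a1*B1 + a2*B2 must satisfy  scale * [I.D, J.D] = d2
    M = scale * np.array([[I @ B1, I @ B2],
                          [J @ B1, J @ B2]])
    a, *_ = np.linalg.lstsq(M, np.asarray(d2, dtype=float), rcond=None)
    return a[0] * B1 + a[1] * B2                 # 3-D displacement in the face frame
```
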
  • the 3-D moved positions of the markers can be modified so as to satisfy the motion symmetries of the face.
  • Examples of motion symmetries of the face are as follows: the right and the left eyebrows move simultaneously and by the same amount, and the right and the left comers of the mouth move simultaneously and by the same amount.
  • the calculated 3-D displacements of the markers can be further modified to enforce motion dependencies of the face.
  • An example of a motion dependency of the face is as follows: as the comers of the mouth move towards the center of the mouth, the centers of the top and bottom lips move forward.
  • the calculated 3-D displacements of the markers can be still further modified by filtering.
  • the filtering of the calculated 3-D displacements of the face smooth out the jitter in the calculated 3-D positions that can be caused by errors in the detected 2-D positions of the markers.
  • the action vector for the current image is calculated using the following steps:
  • n is set to 9 for the global marker Nose-base 259.
  • action state i = 1 corresponds to a yawning mouth 241 and 242
  • action state i = 3 corresponds to a kissing mouth 245 and 246
  • the fractional displacement f(5) is determined based on the distance between the Right-central-eyebrow 265 and Left-central-eyebrow 266 in the squeezed-eyebrows action state i = 5 of the face, and in the neutral state of the face, and the distance between the detected positions of those markers:
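  • The formula itself is not legible in this copy. A plausible reading, consistent with the description, is a linear interpolation between the neutral and fully squeezed eyebrow distances, as sketched below; this is an assumption, not the patent's expression.

```python
import numpy as np

def squeeze_fraction(left_eyebrow_xy, right_eyebrow_xy, d_neutral, d_squeezed):
    """Fraction of the squeezed-eyebrows action from the detected distance
    between the two central-eyebrow markers: 0 at the neutral distance
    d_neutral, 1 at the fully squeezed distance d_squeezed (d_squeezed is
    assumed to be smaller than d_neutral)."""
    d = np.linalg.norm(np.asarray(left_eyebrow_xy, dtype=float)
                       - np.asarray(right_eyebrow_xy, dtype=float))
    f = (d_neutral - d) / (d_neutral - d_squeezed)
    return float(np.clip(f, 0.0, 1.0))           # clamp to a valid action amount
```
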
  • if this error measure is greater than 50, then it is concluded that there is a motion failure.
  • the calculated global motion of the face is in terms of the 3-D orientation vectors I^f and J^f, the 2-D centroid (c^f_{0,x}, c^f_{0,y}) of the face, and the camera-distance-ratio of the face.
  • the superscript f denotes the chronological order number for the motion values.
  • the following equations are used to convert the calculated global motion parameters into a more direct representation that uses a 3-D rotation matrix R^f and a 3-D position vector T^f
  • K^f = I^f × J^f
  • subscripts x, y, and z denote the x-, y-, and z- components of a vector.
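  • The conversion equations appear only partially above. The sketch below implements the stated relation K^f = I^f × J^f and one plausible assembly of R^f and T^f; in particular, the depth component of T^f built from the camera-distance-ratio and the E parameter is an assumption made only for illustration.

```python
import numpy as np

def to_rotation_and_translation(I, J, c0, dist_ratio, E):
    """Assemble a rotation matrix R^f and position vector T^f for one image.

    I, J:       orientation vectors of the face in the current image.
    c0:         2-D centroid (c0x, c0y) of the face.
    dist_ratio: camera-distance-ratio of the face in the current image.
    E:          camera calibration parameter.
    K = I x J follows the text; the depth (z) component of T, built from
    dist_ratio and E, is an assumed relation, not the patent's equation.
    """
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    I = I / np.linalg.norm(I)
    J = J / np.linalg.norm(J)
    K = np.cross(I, J)                            # third orientation vector
    R = np.stack([I, J, K])                       # rows of R are I^f, J^f, K^f
    T = np.array([c0[0], c0[1], E * (dist_ratio - 1.0)])   # z term is an assumption
    return R, T
```
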

Abstract

The present invention relates to a method for tracking the motion of a person's face for the purpose of animating a 3-D model of that face or of another person's face. The 3-D face model carries both the geometric characteristics (shape) of the person's face and its texture characteristics (color). The shape of the face model is represented by a 3-D triangular mesh (geometry mesh), while the texture of the face model is represented by a 2-D composite image (texture image). The method tracks both the global motion and the local motion of the person's face. The global motion of the face involves rotation and translation of the face in three dimensions. The local motion of the face involves the three-dimensional motion of the lips, the eyebrows, etc., caused by speech and facial expressions. The 2-D positions of the salient features of a person's face and/or of markers placed on a person's face are automatically determined in a chronologically ordered sequence of 2-D images of the face. The global and local motion of the face are calculated separately using the determined 2-D positions of the salient features or markers. In a 2-D image, the global motion is represented by rotation and position vectors, while the local motion is represented by an action vector that specifies the amount of facial actions, such as a smiling mouth or raised eyebrows.
PCT/IB2001/002736 2000-10-13 2001-10-09 Procede de suivi du mouvement d'un visage WO2002031772A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/689,595 US6294157B1 (en) 1999-10-14 2000-10-13 Composition containing sapogenin
US09/689,595 2000-10-13

Publications (3)

Publication Number Publication Date
WO2002031772A2 true WO2002031772A2 (fr) 2002-04-18
WO2002031772A8 WO2002031772A8 (fr) 2002-07-04
WO2002031772A3 WO2002031772A3 (fr) 2002-10-31

Family

ID=24769119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2001/002736 WO2002031772A2 (fr) 2000-10-13 2001-10-09 Procede de suivi du mouvement d'un visage

Country Status (1)

Country Link
WO (1) WO2002031772A2 (fr)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6664956B1 (en) 2000-10-12 2003-12-16 Momentum Bilgisayar, Yazilim, Danismanlik, Ticaret A. S. Method for generating a personalized 3-D face model
WO2006015809A2 (fr) * 2004-08-06 2006-02-16 Peters Heiko Procede de determination de position et systemes de mesure de position
WO2009056919A1 (fr) * 2007-10-30 2009-05-07 Sony Ericsson Mobile Communications Ab Système et procédé de rendu et de sélection d'une partie discrète d'une image numérique à des fins de manipulation
EP2191445A2 (fr) * 2007-09-04 2010-06-02 Sony Corporation Capture de mouvements intégrée
EP3454250A4 (fr) * 2016-05-04 2020-02-26 Tencent Technology (Shenzhen) Company Limited Procédé et appareil de traitement d'image de visage et support d'informations
WO2023146019A1 (fr) * 2022-01-25 2023-08-03 주식회사 딥브레인에이아이 Dispositif et procédé de génération d'images vocales synthétisées
WO2023153555A1 (fr) * 2022-02-14 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image de synthèse vocale
WO2023153554A1 (fr) * 2022-02-14 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image vocale synthétisée
WO2023153553A1 (fr) * 2022-02-09 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image vocale synthétisée

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5802220A (en) * 1995-12-15 1998-09-01 Xerox Corporation Apparatus and method for tracking facial motion through a sequence of images

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2894241B2 (ja) * 1995-04-21 1999-05-24 村田機械株式会社 画像認識装置

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5802220A (en) * 1995-12-15 1998-09-01 Xerox Corporation Apparatus and method for tracking facial motion through a sequence of images

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
COSI P ET AL: "Phonetic recognition by recurrent neural networks working on audio and visual information" SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 19, no. 3, 1 September 1996 (1996-09-01), pages 245-252, XP004013654 ISSN: 0167-6393 *
DATABASE WPI Section EI, Week 199703 Derwent Publications Ltd., London, GB; Class T01, AN 1997-031076 XP002205957 & JP 08 293026 A (MURATA KIKAI KK), 5 November 1996 (1996-11-05) *
EBIHARA K ET AL: "REAL-TIME 3-D FACIAL IMAGE RECONSTRUCTION FOR VIRTUAL SPACE TELECONFERENCING" ELECTRONICS & COMMUNICATIONS IN JAPAN, PART III - FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA. NEW YORK, US, vol. 82, no. 5, May 1999 (1999-05), pages 80-90, XP000875659 ISSN: 1042-0967 *
GUENTER, BRIAN; GRIMM, CINDY; WOOD, DANIEL; MALVAR, HENRIQUE; PIGHIN FREDRICK: "Making faces" COMPUTER GRAPHICS. PROCEEDINGS. SIGGRAPH 98 CONFERENCE PROCEEDINGS, PROCEEDINGS OF SIGGRAPH 98: 25TH INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, ORLANDO, FL, USA, 19-24 JULY 1998, pages 55-66, XP002205956 1998, New York, NY, USA, ACM, USA ISBN: 0-89791-999-8 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6664956B1 (en) 2000-10-12 2003-12-16 Momentum Bilgisayar, Yazilim, Danismanlik, Ticaret A. S. Method for generating a personalized 3-D face model
WO2006015809A2 (fr) * 2004-08-06 2006-02-16 Peters Heiko Procede de determination de position et systemes de mesure de position
WO2006015809A3 (fr) * 2004-08-06 2006-05-18 Peters Heiko Procede de determination de position et systemes de mesure de position
EP2191445A2 (fr) * 2007-09-04 2010-06-02 Sony Corporation Capture de mouvements intégrée
EP2191445A4 (fr) * 2007-09-04 2011-11-30 Sony Corp Capture de mouvements intégrée
WO2009056919A1 (fr) * 2007-10-30 2009-05-07 Sony Ericsson Mobile Communications Ab Système et procédé de rendu et de sélection d'une partie discrète d'une image numérique à des fins de manipulation
EP3454250A4 (fr) * 2016-05-04 2020-02-26 Tencent Technology (Shenzhen) Company Limited Procédé et appareil de traitement d'image de visage et support d'informations
WO2023146019A1 (fr) * 2022-01-25 2023-08-03 주식회사 딥브레인에이아이 Dispositif et procédé de génération d'images vocales synthétisées
WO2023153553A1 (fr) * 2022-02-09 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image vocale synthétisée
WO2023153555A1 (fr) * 2022-02-14 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image de synthèse vocale
WO2023153554A1 (fr) * 2022-02-14 2023-08-17 주식회사 딥브레인에이아이 Appareil et procédé de génération d'image vocale synthétisée

Also Published As

Publication number Publication date
WO2002031772A8 (fr) 2002-07-04
WO2002031772A3 (fr) 2002-10-31

Similar Documents

Publication Publication Date Title
US7127081B1 (en) Method for tracking motion of a face
Tjaden et al. A region-based gauss-newton approach to real-time monocular multiple object tracking
US6664956B1 (en) Method for generating a personalized 3-D face model
Bottino et al. A silhouette based technique for the reconstruction of human movement
Rhodin et al. General automatic human shape and motion capture using volumetric contour cues
Stoll et al. Fast articulated motion tracking using a sums of gaussians body model
Kakadiaris et al. Model-based estimation of 3D human motion
Plänkers et al. Tracking and modeling people in video sequences
DeCarlo et al. The integration of optical flow and deformable models with applications to human face shape and motion estimation
US9235928B2 (en) 3D body modeling, from a single or multiple 3D cameras, in the presence of motion
US6492986B1 (en) Method for human face shape and motion estimation based on integrating optical flow and deformable models
Neumann et al. Spatio-temporal stereo using multi-resolution subdivision surfaces
US20150347833A1 (en) Noncontact Biometrics with Small Footprint
JP4692526B2 (ja) 視線方向の推定装置、視線方向の推定方法およびコンピュータに当該視線方向の推定方法を実行させるためのプログラム
CN106796449A (zh) 视线追踪方法及装置
US10755433B2 (en) Method and system for scanning an object using an RGB-D sensor
JP4936491B2 (ja) 視線方向の推定装置、視線方向の推定方法およびコンピュータに当該視線方向の推定方法を実行させるためのプログラム
Plankers et al. Automated body modeling from video sequences
WO2002031772A2 (fr) Procede de suivi du mouvement d'un visage
Ribnick et al. 3D reconstruction of periodic motion from a single view
Malleson et al. Single-view RGBD-based reconstruction of dynamic human geometry
CN113298953A (zh) 用于定位铰接关节的旋转的中心的方法
He Generation of human body models
Najafi et al. Automated initialization for marker-less tracking: A sensor fusion approach
Pogalin et al. Gaze tracking by using factorized likelihoods particle filtering and stereo vision

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: C1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

CFP Corrected version of a pamphlet front page
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP