GB2360183A - Image processing using parametric models - Google Patents

Image processing using parametric models

Info

Publication number
GB2360183A
Authority
GB
United Kingdom
Prior art keywords
appearance parameters
appearance
image
parameters
current set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9927311A
Other versions
GB9927311D0 (en)
Inventor
Rhys Andrew Newman
Charles Stephen Wiles
Mark Jonathan Williams
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anthropics Technology Ltd
Original Assignee
Anthropics Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anthropics Technology Ltd filed Critical Anthropics Technology Ltd
Priority to GB9927311A
Publication of GB9927311D0
Publication of GB2360183A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/755Deformable models or variational models, e.g. snakes or active contours
    • G06V10/7553Deformable models or variational models, e.g. snakes or active contours based on shape, e.g. active shape models [ASM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus are provided for determining a set of appearance parameters representative of the appearance of an object within an image. The system employs a parametric model which relates appearance parameters to corresponding image data of the object as well as a number of matrices, each of which relates a change in the appearance parameters to an image error between image data of the object from the image and image data generated from the appearance parameters and the parametric model. The system uses these matrices to iteratively modify an initial estimate of the appearance parameters until convergence has been reached. The parametric model is preferably obtained through a principal component analysis of shape and texture data extracted from a large number of training images.

Description

IMAGE PROCESSING SYSTEM

The present invention relates to a
method of and apparatus for image processing. The invention has particular, although not exclusive relevance to the tracking of a deformable object in a sequence of images. The invention has applications in human face tracking in video sequences and in computer animation.
The use of parametric models for image interpretation and synthesis has become increasingly popular. Cootes et al have shown in their paper entitled "Active Shape Models - Their Training and Application", Computer Vision and Image Understanding, Volume 61, No. 1, January, pages 38-59, 1995, how such parametric models can be used to model the variability of the shape and texture of human faces. They have mainly used these models for face recognition and tracking within video sequences, although they have also demonstrated that their model can be used to model the variability of other deformable objects, such as MRI scans of knee joints. The use of these models provides a basis for a broad range of applications since they explain the appearance of a given image in terms of a compact set of model parameters which can be used for higher levels of interpretation of the image. For example, when analysing face images, they can be used to characterise the identity, pose or expression of a face.
Using such models for image interpretation requires, however, a method of fitting them to new image data. This involves identifying the model parameters that generate an image which best fits (according to some measure) the new input image. Typically this problem is one of minimising the sum of squares of pixel errors between the generated image and the input image. In their paper entitled "Estimating Coloured 3D Face Models from Single Images: An Example-Based Approach", Vetter and Blanz have proposed a stochastic gradient descent optimisation technique to identify the optimum model parameters for the new image. Although this technique can give very accurate results by finding a locally optimal solution, it generally gets stuck in local minima since the error surface for the problem of fitting an appearance model to an image is particularly rough, containing many local minima. Therefore, this minimisation technique often fails to converge on the global minimum. An additional drawback of this technique is that it is very slow, requiring several minutes to achieve convergence.
A faster, more robust technique known as the active appearance model was proposed by Edwards et al in the paper entitled "Interpreting Face Images using Active Appearance Models", published in the Third International Conference on Automatic Face and Gesture Recognition 1998, pages 300-305, Japan, April 1998. This technique uses a prior training stage in which the relationship between model parameter displacements and the resulting change in image error is learnt. Although the method is much faster than direct optimisation techniques, it also requires fairly accurate initial model parameters if the search is to converge. Additionally, this technique does not guarantee that the optimum parameters will be found.
The present invention aims to provide an alternative technique for efficiently and accurately finding the model parameters that minimise the image error between the image generated by the model parameters and the actual input image.
According to one aspect, the present invention provides a method of determining a set of parameters representative of the appearance of an object, the method comprising the steps of: storing a parametric model which relates parameters to corresponding appearance data of the object; storing a non-constant function which relates a change in the parameters to an error between actual appearance data of the object and appearance data determined from the parameters and the parametric model; receiving an initial estimate of a current set of parameters for the object; determining appearance data for the object from the current set of parameters and the stored parametric model; determining an error between the actual appearance data for the object and the appearance data determined from the current set of appearance parameters and the stored parametric model; determining a change in the parameters using said non-constant function and said determined image error; and updating the current set of parameters with said change in the appearance parameters.
An exemplary embodiment of the present invention will now be described with reference to the accompanying drawings in which:
Figure 1 is a schematic block diagram illustrating a general arrangement of a computer system which can be programmed to implement the present invention;
Figure 2 is a block diagram of an appearance model generation unit which receives some of the image frames of a source video sequence together with a target image frame and generates therefrom an appearance model;
Figure 3 is a block diagram of a target video sequence generation unit which generates a target video sequence from a source video sequence using a set of stored difference parameters;
Figure 4 is a flow chart illustrating the processing steps which the target video sequence generation unit shown in Figure 3 performs to generate the target video sequence;
Figure 5a is a plot which schematically illustrates the way in which an error function varies with the value of a model parameter together with a quadratic approximation of the plot;
Figure 5b is a plot illustrating the way in which the error function varies over the parameter space and also illustrates the way in which the parameter space can be partitioned to provide for more accurate optimisation;
Figure 6 is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix for each parameter partition shown in Figure 5b;
Figure 7 is a flow chart illustrating the processing steps performed to determine a set of appearance parameters for a current source video frame;
Figure 8 is a schematic diagram illustrating a multidimensional binary tree which is used to partition a parameter space;
Figure 9a is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix associated with small parameter changes;
Figure 9b is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix associated with medium sized parameter changes;
Figure 9c is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix associated with large parameter changes;
Figure 10a shows three frames of an example source video sequence which is applied to the target video sequence generation unit shown in Figure 4;
Figure 10b shows an example target image used to generate a set of difference parameters used by the target video sequence generation unit shown in Figure 4;
Figure 10c shows a corresponding three frames from a target video sequence generated by the target video sequence generation unit shown in Figure 4 from the three frames of the source video sequence shown in Figure 10a using the difference parameters generated using the target image shown in Figure 10b;
Figure 10d shows a second example of a target image used to generate a set of difference parameters for use by the target video sequence generation unit shown in Figure 4; and
Figure 10e shows the corresponding three frames from the target video sequence generated by the target video sequence generation unit shown in Figure 4 when the three frames of the source video sequence shown in Figure 10a are input to the target video sequence generation unit together with the difference parameters calculated using the target image shown in Figure 10d.
Figure 1 shows an image processing apparatus according to an embodiment of the present invention. The apparatus comprises a computer 1 having a central processing unit (CPU) 3 connected to a memory 5 which is operable to store a program defining the sequence of operations of the CPU 3 and to store object and image data used in calculations by the CPU 3. Coupled to an input port of the CPU 3 there is an input device 7, which in this embodiment comprises a keyboard and a computer mouse. Instead of, or in addition to the computer mouse, another position sensitive input device (pointing device) such as a digitiser with associated stylus may be used.
A frame buffer 9 is also provided and is coupled to the CPU 3 and comprises a memory unit (not shown) arranged to store image data relating to at least one image, for example by providing one (or several) memory location(s) per pixel of the image. The value stored in the frame buffer for each pixel defines the colour or intensity of that pixel in the image. In this embodiment, the images are represented by 2-D arrays of pixels, and are conveniently described in terms of Cartesian coordinates, so that the position of a given pixel can be described by a pair of x-y coordinates. This representation is convenient since the image is displayed on a raster scan display 11. Therefore, the x-coordinate maps to the distance along the line of the display and the y-coordinate maps to the number of the line. The frame buffer 9 has sufficient memory capacity to store at least one image. For example, for an image having a resolution of 1000 x 1000 pixels, the frame buffer 9 includes 10^6 pixel locations, each addressable directly or indirectly in terms of a pixel coordinate x,y.
In this embodiment, a video tape recorder (VTR) 13 is also coupled to the frame buffer 9, for recording the image or sequence of images displayed on the display 11. A mass storage device 15, such as a hard disc drive, having a high data storage capacity is also provided and coupled to the memory 5. Also coupled to the memory 5 is a floppy disc drive 17 which is operable to accept removable data storage media, such as a floppy disc 19, and to transfer data stored thereon to the memory 5. The memory 5 is also coupled to a printer 21 so that generated images can be output in paper form, an image input device 23 such as a scanner or video camera, and a modem 25 so that input images and output images can be received from and transmitted to remote computer terminals via a data network, such as the Internet. The CPU 3, memory 5, frame buffer 9, display unit 11 and mass storage device 15 may be commercially available as a complete system, for example as an IBM-compatible personal computer (PC) or a workstation such as the SPARCstation available from Sun Microsystems.
A number of embodiments of the invention can be supplied commercially in the form of programs stored on a floppy disc 19 or on other media, or as signals transmitted over a data link, such as the Internet, so that the receiving hardware becomes reconfigured into an apparatus embodying the present invention.
In this embodiment, the computer 1 is programmed to receive a source video sequence input by the image input device 23 and to generate a target video sequence from the source video sequence using a target image. In this embodiment, the source video sequence is a video clip of an actor acting out a scene, the target image is an image of a second actor and the resulting target video sequence is a video sequence showing the second actor acting out the scene. The way in which this is achieved will now be briefly described with reference to Figures 2 to 4. A more detailed description can be found in the applicant's earlier International Application PCT/GB99/03161, the content of which is incorporated herein by reference.
In order to generate the target video sequence from the source video sequence, an appearance model which models the variability of shape and texture of the head images is used. The appearance model makes use of the fact that some prior knowledge is available about the contents of head images in order to facilitate the modelling of the head images. For example, it can be assumed that two frontal images of a human face will each include eyes, a nose and a mouth. In order to create the appearance model, a number of landmark points are identified on a training image and then the same landmark points are identified on the other training images in order to represent how the location and pixel value of the landmark points vary within the training images. A principal component analysis is then performed on the matrix which consists of vectors of the landmark points. This principal component analysis yields a set of Eigenvectors which describe the directions of greatest variation along which the landmark points change. The appearance model includes the linear combination of Eigenvectors plus parameters for translation, rotation and scaling.
In the Applicant's earlier International application mentioned above, an appearance model for the specific source video sequence and target image was determined. However, in this embodiment, an appearance model which models the variability in shape and texture of all human heads is determined. This appearance model is determined in advance during a training routine using the same procedure described in the Applicant's earlier International application, but using a large collection of training images of different individuals of all nationalities and images showing the greatest variation in facial expressions and 3D pose. The same appearance model can then be used to generate a target video sequence from any source video sequence and any target image. Figure 2 schematically illustrates the appearance model generation unit 31 which generates the appearance model 35 using the images stored in an image database 32 and user input via the user interface 33. In this embodiment, all the training images are black and white images having 500 x 500 pixels, whose value indicates the luminance of the image at that point. The resulting appearance model 35 is a parameterisation of the appearance of the class of head images defined by the heads in the training images, so that a relatively small number of parameters (for example 80) can describe the detailed (pixel level) appearance of a head image from the class. In particular, the appearance model 35 defines a function (F) such that:
I = F(p)     (1)

where p is the set of appearance parameters (written in vector notation) which generates, through the appearance model (F), the face image I. For more information on this appearance model and how it can be used to parameterise an input image or generate an output image from an input set of parameters, the reader is referred to the above mentioned paper by Cootes et al, the content of which is incorporated herein by reference.
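To make the mapping in equation (1) concrete, the following Python sketch builds a simple linear appearance model by principal component analysis of training shape-and-texture vectors and then synthesises data from a parameter vector. It is an illustrative approximation only: the function names (build_appearance_model, synthesise) and the choice of a purely linear model F(p) = mean + modes.p are assumptions made for this example, not the exact formulation of the patent or of Cootes et al.

    import numpy as np

    def build_appearance_model(training_vectors, num_params=80):
        # training_vectors: one row per training image, each row a concatenated
        # shape-and-texture vector extracted from the landmark points.
        X = np.asarray(training_vectors, dtype=float)
        mean = X.mean(axis=0)
        # Principal component analysis via SVD of the mean-centred data.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        modes = Vt[:num_params].T          # columns are the eigenvector modes
        return mean, modes

    def synthesise(mean, modes, p):
        # Equation (1): I = F(p) for a linear model, i.e. mean + modes @ p.
        return mean + modes @ np.asarray(p, dtype=float)

    # Example usage: regenerate appearance data from a small parameter vector.
    # (Random data stands in for real training images here.)
    rng = np.random.default_rng(0)
    mean, modes = build_appearance_model(rng.normal(size=(50, 200)), num_params=10)
    image_vector = synthesise(mean, modes, rng.normal(size=10))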
Once the appearance model 35 has been determined, a target video sequence can be generated from a source video sequence. As shown in Figure 3, the source video sequence is input to a target video sequence generation unit 51 which processes the source video sequence using a set of difference parameters 53 to generate and to output the target video sequence. The difference parameters 53 are determined by subtracting the appearance parameters which are generated for the first actor's head in one of the source video frames, from the appearance parameters which are generated for the second actor's head in the target image. The way in which these appearance parameters are determined for these images will be described later. In order that these difference parameters only represent differences in the general shape and grey level of the two actors' heads, the pose and facial expression of the first actor's head in the source video frame used should match, as closely as possible, the pose and facial expression of the second actor's head in the target image.
The processing steps required to generate the target video sequence from the source video sequence will now be described in more detail with reference to Figure 4. As shown, in step s1, the appearance parameters (p_s^i) for the first actor's head in the current video frame (f_s^i) are automatically calculated. The way that this is achieved will be described later. Then, in step s3, the difference parameters (p_dif) are added to the appearance parameters for the first actor's head in the current video frame to generate:
p_mod^i = p_s^i + p_dif     (2)

The resulting appearance parameters (p_mod^i) are then used, in step s5, to regenerate the head for the current target video frame. In particular, the modified appearance parameters are inserted into equation (1) above to regenerate a modified head image which is then composited, in step s7, into the source video frame to generate the corresponding target video frame. A check is then made, in step s9, to determine whether or not there are any more source video frames. If there are, then the processing returns to step s1 where the procedure described above is repeated for the next source video frame. If there are no more source video frames, then the processing ends.
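The per-frame loop of Figure 4 can be summarised with the short Python sketch below. The callables fit, render and composite are placeholders for the parameter fitting, model synthesis and compositing stages described elsewhere in this specification, and the loop structure is an assumption made purely for illustration.

    def generate_target_sequence(source_frames, p_dif, fit, render, composite):
        # fit(frame, initial) -> appearance parameters for the head in `frame`
        # render(p)           -> head image generated from appearance parameters p
        # composite(frame, head) -> frame with the regenerated head blended in
        target_frames = []
        p_prev = None
        for frame in source_frames:
            p_src = fit(frame, p_prev)                     # step s1
            p_mod = p_src + p_dif                          # step s3, equation (2)
            head = render(p_mod)                           # step s5, equation (1)
            target_frames.append(composite(frame, head))   # step s7
            p_prev = p_src           # initial estimate for the next frame
        return target_frames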
Figure 10 illustrates the results of this animation technique. In particular, Figure 10a shows three frames of the source video sequence, Figure 10b shows the target image (which in this embodiment is computer generated) and Figure 10c shows the corresponding three frames of the target video sequence obtained in the manner described above. As can be seen, an animated sequence of the computer generated character has been generated from a video clip of a real person and a single image of the computer generated character.
AUTOMATIC GENERATION OF APPEARANCE PARAMETERS

During the calculation of the difference parameters (p_dif) and in step s1 discussed above, appearance parameters for heads in input images were automatically calculated. This task involves finding the set of appearance parameters p which best describes the pixels in view. This problem is complicated because the inverse of the appearance model function F is not one-to-one. In this embodiment, the appearance parameters for the head in an input image are calculated in a two-step process. In the first step, an initial set of appearance parameters for the current head is found using a simple and rapid technique. For all but the first frame of the source video sequence, this is achieved by simply using the appearance parameters from the preceding video frame before modification in step s3 (i.e. parameters p_s^(i-1)). In this embodiment, the appearance parameters (p) effectively define the shape and grey level of the head, but they do not define the scale, position and orientation of the head within the video frame. For all but the first frame in the source video sequence, these also can be initially estimated to be the same as those for the head in the preceding frame.
For the first frame and for the target image, the initial estimate of the appearance parameters is set to the mean set of appearance parameters, and the scale, position and orientation are initially estimated by the user manually placing the mean head over the head in the images.
In the second step, an iterative technique is used in order to make fine adjustments to the initial estimate of the appearance parameters. The adjustments are made in an attempt to minimise the difference between the head described by the appearance parameters (the model head) and the head in the current video frame (the image head). With 80 appearance parameters, this represents a difficult optimisation problem. This can be performed by using a standard steepest descent optimisation technique to iteratively reduce the mean squared error between the given image pixels and those predicted by a particular set of appearance parameter values. In particular, this involves minimising the following error function E(p):
E(p) = [I_a - F(p)]^T [I_a - F(p)]     (3)

where I_a is a vector of actual image pixels at the locations where the appearance model predicts a value (the appearance model does not predict all pixel values since it ignores background pixels and usually only predicts a subsample of pixel values within the object being modelled). As those skilled in the art will appreciate, E(p) will only be zero when the model head (i.e. F(p)) predicts the actual image head (I_a) exactly. Standard steepest descent optimisation techniques stipulate that a step in the direction -∇E(p) should result in a reduction in the error function E(p), provided the error function is well behaved. Therefore, the change (Δp) in the set of parameter values should be:
Δp = 2 [∇F(p)]^T [I_a - F(p)]     (4)

which requires the calculation of the differential of the appearance model, i.e. ∇F(p). In the system proposed by Blanz et al, the differential of the appearance model is calculated analytically. This technique is therefore very slow and not suited to "real time" applications. The technique described by Edwards et al assumes that, on average over the whole parameter space, ∇F(p) is constant. The update equation then becomes:
Δp = A [I_a - F(p)]     (5)

for some constant matrix A (referred to as the "Active matrix") which is determined beforehand during a training routine after the appearance model 35 has been determined, using the following procedure:
i) choose a random parameter vector p;
ii) perturb p by a small random vector to create p + Δp;
iii) use the appearance model (F) to compute the model image associated with p and label this I_0;
iv) similarly compute the image associated with p + Δp and label this I_1;
v) record the actual parameter change Δp and the actual image difference I_1 - I_0;
vi) return to step (i) until sufficient data over the entire parameter space has been collected; and
vii) determine the Active matrix A from the data using multiple multivariate linear regressions on the data.
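A compact Python sketch of this training procedure, assuming the linear appearance model of the earlier example, is given below. Stacking the recorded perturbations and image differences and solving a single least-squares problem stands in for the "multiple multivariate linear regressions" named in step (vii); the sampling ranges, sample count and function name are illustrative assumptions.

    import numpy as np

    def train_active_matrix(F, num_params, num_samples=2000, perturb_scale=0.05,
                            param_scale=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        dPs, dIs = [], []
        for _ in range(num_samples):
            p = rng.normal(scale=param_scale, size=num_params)      # step i)
            dp = rng.normal(scale=perturb_scale, size=num_params)   # step ii)
            I0 = F(p)                                                # step iii)
            I1 = F(p + dp)                                           # step iv)
            dPs.append(dp)                                           # step v)
            dIs.append(I1 - I0)
        dP = np.stack(dPs)           # num_samples x num_params
        dI = np.stack(dIs)           # num_samples x num_pixels
        # Step vii): least-squares fit of dp ~= A (I1 - I0), i.e. solve dI A^T = dP.
        A, *_ = np.linalg.lstsq(dI, dP, rcond=None)
        return A.T                   # A maps an image difference to a parameter change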
However, in practice, this technique does not guarantee convergence to an error that is sufficiently small.
The reason for this is schematically illustrated in Figure 5a which shows, for convenience, a plot of the error function E(p) for a single appearance parameter p^r. As shown, the error function 61 has a very rough surface with many local minima and a global minimum at parameter value p_o^r. Assuming that the differential of the appearance model is constant results in the error function being effectively approximated by the dotted quadratic function 63 shown in Figure 5a. However, as shown, the minimum of this smoothed error function 63 occurs at parameter value p_e^r which is offset from the actual optimum parameter value p_o^r. As a result, the image predicted by the parameters does not exactly match the actual image.
The way in which the present embodiment alleviates these problems with the prior art techniques will now be described with reference to Figures 5b to 8. In this embodiment, rather than using a single Active matrix covering the entire space of all the input parameters, the space of the input parameters is partitioned and a separate Active matrix is calculated and used for each partition. This is schematically illustrated in Figure 5b which shows the same error plot 61 shown in Figure 5a and that the parameter space (which is defined by the range of values that the appearance parameter p^r can take) is partitioned into six partitions: partition 1 between parameter values p_1^r and p_2^r, partition 2 between parameter values p_2^r and p_3^r, partition 3 between parameter values p_3^r and p_4^r, partition 4 between parameter values p_4^r and p_5^r, partition 5 between parameter values p_5^r and p_6^r and partition 6 between parameter values p_6^r and p_7^r. As those skilled in the art will appreciate, although six partitions are shown in Figure 5b, the invention is not limited to this. Partitioning the parameter space into two or more partitions may result in an improvement over the system proposed by Edwards et al.
The way in which the Active matrix for each partition is calculated in this embodiment is similar to the technique used by Edwards et al for determining the Active matrix for the single partition and is shown in Figure 6. In particular, in step s21, the system chooses a random parameter vector (p) and determines the partition in which it is located. Then, in step s23, the system perturbs the parameter vector p by a small random amount to create p + Δp. The processing then proceeds to step s25 where the system uses the parameter vector p and the perturbed parameter vector p + Δp to create model images I_0 and I_1 respectively. The processing then proceeds to step s27 where the system records the parameter change Δp and image difference I_1 - I_0 and labels them with the partition determined in step s21. Then in step s29, the system determines whether there is sufficient training data in each partition. If there is not, then the processing returns to step s21. Once sufficient training data has been generated, the processing proceeds to step s31 where the system performs multiple multivariate linear regressions on the data in each partition to identify an Active matrix for each parameter partition. In the illustration shown in Figure 5b, this results in the generation of six Active matrices which effectively results in the approximation of the error function to the piecewise linear function 65 shown in Figure 5b. As shown, the minimum of this piecewise linear function more accurately corresponds to the minimum of the true error function E(p).
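Extending the earlier training sketch to per-partition Active matrices mainly adds bookkeeping: each (Δp, I_1 - I_0) pair is filed under the partition of the unperturbed parameter vector, and one regression is solved per partition. In the Python sketch below, find_partition is a stand-in for whatever partitioning scheme is used (for example the hyper-plane tree described later), and the simplified stopping rule and sample counts are assumptions for illustration.

    import numpy as np

    def train_partitioned_active_matrices(F, find_partition, num_params,
                                          samples_per_partition=2000, rng=None):
        rng = rng or np.random.default_rng(0)
        data = {}                                        # partition id -> (dPs, dIs)
        while True:
            p = rng.normal(size=num_params)              # step s21
            part = find_partition(p)
            dp = rng.normal(scale=0.05, size=num_params) # step s23
            dI = F(p + dp) - F(p)                        # step s25
            dPs, dIs = data.setdefault(part, ([], []))
            dPs.append(dp)                               # step s27
            dIs.append(dI)
            # Step s29 (simplified): stop once every partition seen has enough data.
            if all(len(v[0]) >= samples_per_partition for v in data.values()):
                break
        # Step s31: one least-squares regression per partition.
        matrices = {}
        for part, (dPs, dIs) in data.items():
            At, *_ = np.linalg.lstsq(np.stack(dIs), np.stack(dPs), rcond=None)
            matrices[part] = At.T
        return matrices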
Once the Active matrices have been determined they are used in the automatic calculation of the appearance parameters for a head in an input image. The way in which this is achieved for the source video sequence is illustrated in the flow chart of Figure 7. As shown, in step s41 the system determines the initial estimate of the appearance parameters for the head in the current source video frame. The processing then proceeds to step s43 where the system determines in which partition of the parameter space the current appearance parameters lie, in order to determine which Active matrix to use in the determination, in step s45, of the change (Δp) in appearance parameters using equation (5) above. The processing then proceeds to step s47 where the change in parameters is added to the current appearance parameters. The system then determines, in step s49, whether or not convergence has been reached by comparing the image error for the new set of parameters with a predetermined threshold (Th). If convergence has not been reached, then the processing returns to step s43. Once convergence is reached, the processing proceeds to step s51 where the current appearance parameters are output as the appearance parameters for the current source video frame and then the processing ends.
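Putting the pieces together, the iterative search of Figure 7 can be sketched in Python as follows. The convergence test on the squared pixel error, the iteration cap and the helper names are assumptions made for illustration; the essential point is that the Active matrix applied at each step is chosen from the partition in which the current parameter estimate lies.

    import numpy as np

    def fit_appearance_parameters(F, matrices, find_partition, I_a, p_init,
                                  threshold=1e-3, max_iters=50):
        p = np.array(p_init, dtype=float)             # step s41
        for _ in range(max_iters):
            residual = I_a - F(p)                     # image error for current p
            if residual @ residual < threshold:       # step s49 convergence test
                break
            A = matrices[find_partition(p)]           # steps s43/s45: pick Active matrix
            p = p + A @ residual                      # equation (5), steps s45/s47
        return p                                      # step s51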
PARTITIONING THE PARAMETER SPACE

In the above description, it is assumed that the parameter space has been partitioned. There are many different ways in which this can be done. A naive way to partition the parameter space would be to divide the range of variation of each appearance parameter into equal partitions and to provide an Active matrix for each possible parameter partition combination. However, for a practical application, such a partitioning technique would result in too many Active matrices to consider. For example, if there are 80 appearance parameters and if the range of variation of each parameter is divided into 6 partitions, then this would involve 6^80 Active matrices.
Therefore, in this embodiment, the partitioning of the parameter space is determined during the training of the appearance model 35 using user defined labels. In particular, each training image is labelled with identifying information which describes the person in the image. For example, a training image might be labelled "young white Caucasian male with dark hair". Then the system uses a linear discriminant analysis of the training data to identify a binary tree structure which separates the training images into different categories. In particular, in this embodiment, the system performs a linear discriminant analysis on all the training data to identify the hyper-plane within the parameter space which separates male training examples from female training examples. The equation for this hyper-plane (which is given in equation (6) below) is then stored.
a_1 p_1 + a_2 p_2 + a_3 p_3 + a_4 p_4 + ... + a_80 p_80 = K_1     (6)

where a_1, a_2 ... a_80 and K_1 are constants and p_1, p_2 ... p_80 are the appearance parameters. The system then performs another linear discriminant analysis on the data for the male training images to identify the hyper-plane which separates white Caucasians from black Africans and stores the corresponding hyper-plane. A similar analysis is performed on the female training images. In this way, the parameter space is hierarchically divided up by a multidimensional binary tree, with an Active matrix being associated with the end of each leaf of the tree. This is schematically illustrated for this embodiment in Figure 8. By hierarchically dividing the training data in this way a simple technique is provided to identify the partitions in the training data and, from the data in the partitions, the Active matrix for each partition. This technique also allows the determination of the partition in which a given set of appearance parameters belongs, by inserting the appearance parameters into the hyper-plane equations, starting at the top of the binary tree with the hyper-plane equation which separates males from females and progressing down the appropriate branches until the appropriate partition is found.
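The traversal of such a hyper-plane tree is straightforward to express in code. In the Python sketch below, each internal node stores the coefficients a and constant K of equation (6) for its hyper-plane and each leaf stores a partition identifier; the node layout and class names are assumptions made purely for illustration.

    from dataclasses import dataclass
    from typing import Union
    import numpy as np

    @dataclass
    class Leaf:
        partition_id: int

    @dataclass
    class Node:
        a: np.ndarray                 # hyper-plane coefficients a_1 ... a_80
        K: float                      # hyper-plane constant K (equation (6))
        below: "Union[Node, Leaf]"    # branch for a . p <  K
        above: "Union[Node, Leaf]"    # branch for a . p >= K

    def find_partition(tree: "Union[Node, Leaf]", p: np.ndarray) -> int:
        # Walk from the root (e.g. the male/female hyper-plane) down to a leaf.
        node = tree
        while isinstance(node, Node):
            node = node.above if node.a @ p >= node.K else node.below
        return node.partition_id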
ALTERNATIVE EMBODIMENTS

In the first embodiment, the parameter space was partitioned using a multi-dimensional binary tree. As those skilled in the art will appreciate, there are various other ways in which the parameter space can be partitioned. For example, a standard clustering algorithm could be used to identify clusters within the training data. An Active matrix for each cluster could then be calculated. Alternatively, a neural network may be trained and used to identify the partitions and subsequently to classify an input set of parameters into one of the partitions.
In the above embodiment, the appearance parameter space was partitioned into a plurality of partitions and a respective Active matrix was assigned to each partition. The particular Active matrix used during an iterative routine for updating a set of appearance parameters for an input image was then determined by determining the partition in which the current set of appearance parameters belongs. Rather than storing Active matrices for a partition, an Active matrix may be stored for a particular point in the parameter space and then, for a given set of appearance parameters, the function used to relate the image error to the change in appearance parameters can be calculated from a weighted combination of the Active matrices in the vicinity of the point in the parameter space defined by the current set of appearance parameters. In such an embodiment, the weighting applied to the matrices can depend upon the distance between the point in parameter space corresponding to the current set of appearance parameters and the point in parameter space associated with the respective Active matrices.
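One possible distance-based weighting for this alternative is an inverse-distance scheme over the nearest stored anchor points, sketched below in Python. The inverse-distance weights and the number of neighbours are illustrative assumptions; the embodiment only requires that the weights depend on distance in parameter space.

    import numpy as np

    def blended_active_matrix(p, anchor_points, anchor_matrices, k=3, eps=1e-8):
        # anchor_points: N x num_params array of points with stored Active matrices.
        # anchor_matrices: list of N matrices, each num_params x num_pixels.
        d = np.linalg.norm(anchor_points - p, axis=1)
        nearest = np.argsort(d)[:k]
        w = 1.0 / (d[nearest] + eps)          # closer anchors get larger weights
        w = w / w.sum()
        return sum(wi * anchor_matrices[i] for wi, i in zip(w, nearest))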
In the above embodiment, the appearance parameter space was partitioned into a plurality of partitions and a respective Active matrix was assigned to each partition. By partitioning the parameter space in this way, a set of appearance parameters describing an input image can be determined relatively accurately and quickly as compared with the technique proposed by Edwards et al. Another problem with the technique proposed by Edwards et al is that the Active matrix is determined by learning the relationship between small parameter changes and corresponding image errors. As a result, the parameter changes obtained using their appearance model are relatively small and therefore many iterations are necessary before convergence is reached. In this embodiment, to overcome this problem, rather than having a single Active matrix associated with small parameter changes, several Active matrices are used, each associated with a different amount of parameter change.
Figures 9a, 9b and 9c illustrate the processing steps involved in determining these different Active matrices. In particular, Figure 9a shows the processing steps s61 to s71 for generating an Active matrix associated with small parameter changes. As those skilled in the art will appreciate, this is achieved by ensuring that in step s63, the random parameter vector p is perturbed by small random amounts. Similarly, Figure 9b shows the processing steps s73 to s83 required to generate an Active matrix associated with medium sized parameter changes and Figure 9c shows the processing steps s85 to s95 required to generate an Active matrix associated with large parameter changes. As can be seen from a comparison of Figures 9a, 9b and 9c with Figure 6, the technique for generating each of these Active matrices is similar to the technique described above in the first embodiment. In use, the Active matrix associated with large parameter changes would initially be used, followed by the Active matrix associated with medium sized parameter changes and followed after that by the Active matrix associated with small parameter changes. In this way, convergence can be reached more quickly whilst maintaining the accuracy of the resulting appearance parameters.
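A minimal way to realise this coarse-to-fine schedule in code is to train one Active matrix per perturbation scale with the earlier training routine and then apply them in order of decreasing scale. In the Python sketch below, the scales and the fixed iteration count per stage are illustrative assumptions.

    import numpy as np

    def fit_with_magnitude_schedule(F, I_a, p_init, matrices_by_scale,
                                    iters_per_scale=10):
        # matrices_by_scale: Active matrices ordered from large to small
        # parameter changes, e.g. trained with perturb_scale = 0.5, 0.1, 0.02.
        p = np.array(p_init, dtype=float)
        for A in matrices_by_scale:
            for _ in range(iters_per_scale):
                p = p + A @ (I_a - F(p))     # equation (5) with the current matrix
        return p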
As those skilled in the art will appreciate, the technique described above in this second embodiment can be used in addition to the technique described in the first embodiment. In this way, Active matrices would be generated for each partition which characterise not only the parameter changes typical of the partition but also different magnitudes of such changes within each partition. Therefore, the Active matrix used at any stage of the minimisation process can be controlled not only by the current values of the appearance parameters but also by the desired magnitude of parameter change (initially large and then getting smaller).
In the above embodiments, a plurality of Active matrices were determined and used in order to improve the accuracy and/or speed of the minimisation process proposed by Edwards et al. Rather than using Active matrices, which define a linear relationship between the image error and the change in the appearance parameters, one or more non-linear functions may be used instead. For example, a neural network may be used to define the relationship between the image error and the change in appearance parameters.
In the above embodiments, multiple Active matrices were determined and used in order to improve the accuracy and/or speed of the minimisation process proposed by Edwards et al. As an alternative, rather than using multiple Active matrices, the technique proposed by Edwards et al could be used in an initial minimisation process followed by a more accurate error minimisation technique which uses the true error surface (for example using gradient descent) to find the optimum solution. With this combined technique, the appearance parameters for a given input image can be determined accurately and relatively quickly as compared with the technique described by Blanz et al.
In the first embodiment, the target image illustrated a computer generated head. This is not essential. For example, the target image might be a hand-drawn head or an image of a real person. Figures 10d and 10e illustrate how an embodiment with a hand-drawn character might be used in character animation. In particular, Figure 10d shows a hand-drawn sketch of a character which, when combined with the images from the source video sequence (some of which are shown in Figure 10a), generates a target video sequence, some frames of which are shown in Figure 10e. As can be seen from a comparison of the corresponding frames in the source and target video frames, the hand-drawn sketch has been animated automatically using this technique. As those skilled in the art will appreciate, this is a much quicker and simpler technique for achieving computer animation as compared with existing systems which require the animator to manually create each frame of the animation. In particular, in this embodiment, all that is required is a video sequence of a real life actor acting out the scene to be animated, together with a single sketch of the character to be animated.
The above embodiment has described the way in which a target image can be used to modify a source video sequence. In order to do this, a set of appearance parameters has to be automatically calculated for each frame in the video sequence. This involved the use of a number of Active matrices which relate image errors to appearance parameter changes. As those skilled in the art will appreciate, similar processing is required in other applications, such as the tracking of an object within a video sequence, for example the tracking of a human face within a video sequence or the tracking of a knee joint in an MRI scan.
In the above embodiment, the appearance model was used to model the variations in facial expressions and 3D pose of human heads. As those skilled in the art will appreciate, the appearance model can be used to model the appearance of any deformable object such as parts of the body and other animals and objects. For example, the above techniques can be used to track the movement of lips in a video sequence. Such an embodiment could be used in film dubbing applications in order to synchronise the lip movements with the dubbed sound. This animation technique might also be used to give animals and other objects human-like characteristics by combining images of them with a video sequence of an actor. This technique can also be used for monitoring the shape and appearance of objects passing along a production line for quality control purposes.
In the above embodiment, the appearance model was generated by using a principal component analysis of shape and grey level data which is extracted from the training images. As those skilled in the art will appreciate, by modelling the features of the training heads in this way, it is possible to accurately model each head by just a small number of parameters. However, other modelling techniques, such as vector quantisation and wavelet techniques can be used.
In the above embodiments, the training images used to generate the appearance model were all black and white images. As those skilled in the art will appreciate, the appearance model can be generated from colour images. In such an embodiment, instead of sampling the grey level of the heads within the training images, the colour embodiment would sample each of the red, green and blue values at the corresponding points. Further, as those skilled in the art will appreciate, the way in which the colour is represented in such an embodiment is not important. In particular, rather than each pixel having red, green and blue values, it might be represented by a chrominance and a luminance component or by hue, saturation and value components. These other colour embodiments would be simpler than the red, green and blue embodiment, since the image search which is required during the automatic calculation of the appearance parameters could be performed using only the luminance or value component. In contrast, with the red, green and blue colour embodiment, each of these terms would have to be considered in the image search.
In the above embodiment, during the automatic generation of the appearance parameters, and in particular during the iterative updating of these appearance parameters, the error between the input image and the model image was generated using the appearance model. Since this iterative technique still requires a relatively accurate initial estimate for the appearance parameters, it is possible to initially perform the iterations using lower resolution images and, once convergence has been reached for the lower resolutions, to then increase the resolution of the images and to repeat the iterations for the higher resolutions. In such an embodiment, separate Active matrices would be required for each of the resolutions.
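A sketch of this coarse-to-fine, multi-resolution variant might look as follows in Python, assuming per-resolution Active matrices and reuse of a fitting routine such as the one sketched earlier; the pyramid ordering and the callable signature are assumptions for illustration.

    def fit_multiresolution(fit, I_a_by_resolution, matrices_by_resolution, p_init):
        # I_a_by_resolution: actual image data ordered from lowest to highest resolution.
        # matrices_by_resolution: the Active matrices trained for each resolution.
        # fit(I_a, matrices, p) -> converged appearance parameters at one resolution.
        p = p_init
        for I_a, matrices in zip(I_a_by_resolution, matrices_by_resolution):
            # Converge at this resolution, then reuse the result as the initial
            # estimate for the next, higher resolution.
            p = fit(I_a, matrices, p)
        return p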
In the above embodiment, the difference parameters were determined by comparing the image of the first actor from one of the frames of the source video sequence with the image of the second actor in the target image. In an alternative embodiment, a separate image of the first actor may be provided which does not form part of the source video sequence.
In the above embodiments, each of the images in the source video sequence and the target image were two-dimensional images. The above technique could be adapted to work with 3D modelling and animations. In such an embodiment, the training images used to generate the appearance model would also have to be 3D images instead of 2D images. The three-dimensional models may be obtained using a three-dimensional scanner which typically works either by using laser range-finding over the object or by using one or more stereo pairs of cameras. Once a 3D appearance model has been created from the training models, new 3D models can be generated by adjusting the appearance parameters and existing 3D models can be animated using the same differencing technique that was used in the two-dimensional embodiment described above.
In the above embodiment, a set of difference parameters were identified which describe the main differences between the head in the video sequence and the head in the target image, which difference parameters were used to modify the video sequence so as to generate a target video sequence showing the second head. In the embodiment, the set of difference parameters were added to a set of appearance parameters for the current frame being processed. In an alternative embodiment, the difference parameters may be weighted so that, for example, the target video sequence shows a head having characteristics from both the first and second actors.
In the above embodiment, an iterative process was used to update an estimated set of appearance parameters for an input image. This iterative process continued until an error between the actual image and the image predicted by the model was below a predetermined threshold. In an alternative embodiment, where there is only a predetermined amount of time available for determining a set of appearance parameters for the input image, this iterative routine may be performed for a predetermined period of time or for a predetermined number of iterations.

Claims (67)

CLAIMS:
1. A method of determining a set of appearance parameters representative of the appearance of an object within an input image, the method comprising the steps of:
(i) storing a parametric model which relates appearance parameters to corresponding image data of the object; (ii) storing a non-constant function which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model; (iii) initially estimating a current set of appearance parameters for the object in the input image; (iv) determining image data for the object from the current set of appearance parameters and said stored parametric model; (v) determining an image error between the image data determined for the current set of appearance parameters and image data of the object from the input image; (vi) determining a change in the appearance parameters using said non-constant function and said determined image error; and (vii) updating the current set of appearance parameters with said change in the appearance parameters.
2. A method according to claim 1, wherein said non- constant function is non-constant with respect to time.
3. A method according to claim 2, wherein said non-constant function is defined by a plurality of data matrices, each of which relates a change in the appearance parameters to an image error and further comprising the step of selecting one of said stored data matrices and wherein said step of determining a change in the appearance parameters uses said selected one of said data matrices.
4. A method according to claim 3, further comprising the step of repeating steps (iv) to (vii) in order to reduce the determined error.
5. A method according to claim 4, wherein said selecting step selects said one of said stored data matrices in dependence upon the number of times said steps (iv) to (vii) have been repeated.
6. A method according to claim 5, wherein the data matrix selected in said selecting step for initial repeats of said steps (iv) to (vii) are operable to generate larger changes in the appearance parameters than the data matrix selected for later repeats of said steps (iv) to (vii).
7. A method according to any preceding claim, wherein said non-constant function is non-constant with respect to the values of the appearance parameters.
8. A method according to claim 7, wherein said non-constant function is defined by a plurality of data matrices, each of which relates a change in the appearance parameters to an image error and further comprising the step of selecting one of said stored data matrices and wherein said step of determining a change in the appearance parameters uses said selected one of said data matrices and wherein said selecting step selects said one of said stored data matrices in dependence upon the current set of appearance parameters.
9. A method according to claim 8, wherein a parameter space, defined by the range of values each appearance parameter can have, is partitioned into a plurality of partitions and at least one of said data matrices is associated with each partition, and wherein said selecting step selects said one of said stored data matrices in dependence upon the partition in which the current set of appearance parameters is located.
10. A method according to claim 9, further comprising the steps of storing a mathematical function which identifies each partition and wherein said selecting step identifies the partition in which the current set of 36 appearance parameters is located by applying them to said stored mathematical function.
11. A method according to claim 10, wherein said mathematical function defines hyper-planes within said parameter space.
12. A method according to claim 10 or 11, wherein said mathematical function comprises a multi-dimensional binary decision tree which operates to hierarchically divide said parameter space into said partitions.
13. A method according to any preceding claim, wherein said parametric model is determined in advance during a training routine in which a statistical analysis is performed of a pointwise correspondence between corresponding points on the object in each of a plurality of training images.
14. A method according to claim 13, wherein said statistical analysis includes a principal component analysis of said pointwise correspondences.
15. A method according to claim 13 or 14, wherein shape and texture data are extracted from said training images to generate said parametric model.
16. A method according to claim 13, 14 or 15, wherein each of said data matrices is determined during said training routine.
17. A method according to any of claims 13 to 16, wherein said estimating step estimates said current set of appearance parameters to be a mean set of appearance parameters determined during said training routine for the training images.
18. A method according to any preceding claim, wherein said image data of the object from the input image comprises a plurality of sampled pixel values from the input image.
19. A method according to any preceding claim, wherein said image error comprises a vector of pixel errors, wherein the parameter change for each appearance parameter is determined from a weighted sum of the pixel errors in said image error and wherein the weightings are defined by data in the non-constant function.
20. A method according to claim 19 when dependent upon claim 3 or claim 8, wherein the change in appearance parameters is determined by calculating:
Δp = A ΔI
where Δp is a vector of appearance parameter changes, A is the selected data matrix and ΔI is the vector of pixel errors.
21. A method according to any preceding claim, wherein said object is a deformable object.
22. A method according to claim 21, wherein said deformable object comprises a face or a part thereof.
23. A method according to claim 1, wherein said non-constant function is a non-linear function.
24. A method according to claim 23, wherein said non-linear function comprises a neural network.
25. A method according to any preceding claim, wherein said parametric model models the two-dimensional appearance of the object.
26. A method according to any of claims 1 to 24, wherein said parametric model models the three-dimensional appearance of said object.
27. A method according to any preceding claim, wherein said parametric model models the shape of the object.
28. A method of determining a set of appearance parameters representative of the appearance of an object, the method comprising the steps of:
(i) storing a parametric model which relates appearance parameters to a determined appearance of the object; (ii) storing a non-constant function which relates a change in the appearance parameters to an error between the actual appearance of the object and the appearance of the object determined from a set of appearance parameters and the parametric model; (iii) receiving an initial estimate of a current set of appearance parameters for the object; (iv) determining the appearance of the object from the current set of appearance parameters and the stored parametric model; (v) determining an error between the actual appearance of the object and the appearance of the object determined from the current set of appearance parameters; (vi) determining a change in the appearance parameters using said non-constant function and said determined error; and (vii) updating the current set of appearance parameters with said change in the appearance parameters.
29. A method of determining a set of appearance parameters representative of the appearance of an object within an input image, the method comprising the steps of:
(i) storing a parametric model which relates appearance parameters to corresponding image data of the object; (ii) storing data defining a plurality of sets of weights, each set of weights relating a change of each parameter in a set of appearance parameters to a plurality of image errors between image data of the object from the input image and image data of the object determined from the set of appearance parameters and said parametric model; (iii) initially estimating a current set of appearance parameters for the object in the input image; (iv) determining image data for the object from the current set of appearance parameters and said stored parametric model; (v) determining said plurality of image errors between the image data determined for the current set of appearance parameters and image data of the object from the input image; (vi) selecting one of said sets of weights; (vii) determining a change to each parameter in the current set of appearance parameters using said selected set of weights and said plurality of image errors; and (viii) updating each parameter in the current set of appearance parameters with the respective change thereto determined in step (vii).
30. A method of tracking an object within an input sequence of images, the method comprising the steps of:
(i) storing a parametric model which relates appearance parameters to corresponding image data of the object;
(ii) storing a plurality of data matrices, each of which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model;
(iii) initially estimating a current set of appearance parameters for the object in a current image;
(iv) determining image data for the object from the current set of appearance parameters and said stored parametric model;
(v) determining an image error between the image data generated for the current set of appearance parameters and image data of the object from the current image;
(vi) selecting one of said stored data matrices;
(vii) determining a change in the appearance parameters using said selected one of said data matrices and said determined image error;
(viii) updating the current set of appearance parameters with said change in the appearance parameters;
(ix) repeating steps (iv) to (viii) until the determined change in the appearance parameters is less than a predetermined threshold; and
(x) repeating steps (iii) to (ix) for the next image in the sequence of images until there are no more images in the sequence.
31. A method according to claim 30, wherein said initially estimating step estimates the current set of appearance parameters for the object in the current image as being the appearance parameters determined for the preceding image in the sequence, for all images except the first image in the sequence.
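A sketch of the tracking loop of claims 30 and 31 follows; `sample_fn`, the convergence tolerance and the matrix-selection policy are illustrative assumptions, not features taken from the claims:

```python
import numpy as np

def track_sequence(model, sample_fn, images, data_matrices, p_init, tol=1e-3, max_iters=50):
    # Claims 30-31: fit each frame iteratively, seeding each frame with the
    # parameters found for the preceding one.
    results = []
    p = np.array(p_init, dtype=float)
    for image in images:                                         # step (x)
        for it in range(max_iters):                              # steps (iv)-(ix)
            delta_e = sample_fn(image, p) - model(p)             # steps (iv)-(v)
            A = data_matrices[min(it, len(data_matrices) - 1)]   # step (vi)
            dp = A @ delta_e                                     # step (vii)
            p += dp                                              # step (viii)
            if np.linalg.norm(dp) < tol:                         # step (ix)
                break
        results.append(p.copy())                                 # claim 31 seeding
    return results
```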
32. A method of determining a set of appearance parameters representative of the appearance of an object within an input image, the method comprising the steps of:
(i) storing a parametric model which relates appearance parameters to corresponding image data of the object;
(ii) storing a data matrix which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model;
(iii) initially estimating a current set of appearance parameters for the object in the input image;
(iv) determining image data for the object from the current set of appearance parameters and said stored parametric model;
(v) determining an image error between the image data determined for the current set of appearance parameters and image data of the object from the input image;
(vi) determining a change in the appearance parameters using said data matrix and said determined image error;
(vii) updating the current set of appearance parameters with said change in the appearance parameters;
(viii) repeating steps (iv) to (vii) until the determined change in the appearance parameters is less than a first predetermined threshold;
(ix) determining a differential of the parametric model for the current set of appearance parameters and using this differential to calculate a further change in the appearance parameters;
(x) updating the current set of appearance parameters with the change in the appearance parameters determined in step (ix); and
(xi) repeating steps (ix) and (x) until the determined change in the appearance parameters is less than a second predetermined threshold which is less than said first predetermined threshold.
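The two-stage structure of claim 32, fast data-matrix updates to a first threshold followed by updates derived from the model differential to a tighter second threshold, can be sketched as below; `jacobian(p)` and the least-squares step are assumptions about one possible embodiment rather than the claimed method itself:

```python
import numpy as np

def two_stage_fit(model, jacobian, image_sample, p0, A, tol1=1e-2, tol2=1e-4, max_iters=50):
    p = np.array(p0, dtype=float)
    # Stage 1: steps (iv)-(viii), cheap data-matrix updates.
    for _ in range(max_iters):
        dp = A @ (image_sample - model(p))
        p += dp
        if np.linalg.norm(dp) < tol1:
            break
    # Stage 2: steps (ix)-(xi), refinement from the differential of the model.
    for _ in range(max_iters):
        J = jacobian(p)                                  # (n_pixels, n_params) differential
        residual = image_sample - model(p)
        dp, *_ = np.linalg.lstsq(J, residual, rcond=None)
        p += dp
        if np.linalg.norm(dp) < tol2:
            break
    return p
```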
33. A method according to any preceding claim, wherein the steps are performed in the order in which they are claimed.
34. A storage medium storing processor implementable instructions for causing a processor to implement the method of any one of claims 1 to 33.
35. Processor implementable instructions for causing a processor to implement the method of any one of claims 1 to 33.
36. An apparatus for determining a set of appearance parameters representative of the appearance of an object within an input image, the apparatus comprising:
means for storing: (i) a parametric model which relates appearance parameters to corresponding image data of the object; and (ii) a non-constant function which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model;
means for receiving an initial estimate of a current set of appearance parameters for the object in the input image; and
means for updating the current set of appearance parameters comprising:
(i) means for determining image data for the object from the current set of appearance parameters and said stored parametric model;
(ii) means for determining an image error between the image data determined for the current set of appearance parameters and image data of the object from the input image;
(iii) means for determining a change in the appearance parameters using said stored non-constant function and said determined image error; and
(iv) means for updating the current set of appearance parameters with said change in the appearance parameters.
37. An apparatus according to claim 36, wherein said non-constant function is non-constant with respect to time.
38. An apparatus according to claim 37, wherein said non-constant function is defined by a plurality of data matrices, each of which relates a change in the appearance parameters to an image error and further comprising means for selecting one of said stored data matrices and wherein said means for determining a change in the appearance parameters uses said selected one of said data matrices.
39. An apparatus according to claim 38, wherein said updating means is operable to update iteratively the current set of appearance parameters until the determined change in the appearance parameters is less than a predetermined threshold.
40. An apparatus according to claim 39, wherein said selecting means is operable to select said one of said stored data matrices in dependence upon the number of times said updating means has updated the current set of appearance parameters.
41. An apparatus according to claim 40, wherein the data matrix selected by said selecting means for initial updates of said appearance parameters is operable to generate larger changes in the appearance parameters than the data matrix selected by said selecting means for later updates of said appearance parameters.
42. An apparatus according to any of claims 36 to 41, wherein said non-constant function is non-constant with respect to the values of the appearance parameters.
43. An apparatus according to claim 42, wherein said non-constant function is defined by a plurality of data matrices, each of which relates a change in the appearance parameters to an image error and further comprising means for selecting one of said stored data matrices and wherein said means for determining a change of the appearance parameters uses said selected one of said data matrices and wherein said selecting means is operable to select said one of said stored data matrices in dependence upon the current set of appearance parameters.
44. An apparatus according to claim 43, wherein a parameter space, defined by the range of values each appearance parameter can have, is partitioned into a plurality of partitions and at least one of said data matrices is associated with each partition, wherein said updating means further comprises means for determining the partition in which the current set of appearance parameters is located and wherein said selecting means is operable to select said one of said stored data matrices in dependence upon the partition in which the current set of appearance parameters is located.
45. An apparatus according to claim 44, wherein said storing means stores a mathematical function which identifies each partition and wherein said updating means is operable to identify the partition in which the current set of appearance parameters is located by applying them to said stored mathematical function.
46. An apparatus according to claim 45, wherein said mathematical function defines hyper-planes within said parameter space.
47. An apparatus according to claim 45 or 46, wherein said mathematical function comprises a multi-dimensional binary decision tree which operates to hierarchically divide said parameter space into said partitions.
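A hypothetical sketch of the partitioned parameter space of claims 44 to 47: a binary decision tree whose internal nodes test hyper-planes and whose leaves hold the data matrix for their partition. The class and field names below are illustrative, not terms used in the claims:

```python
import numpy as np

class PartitionNode:
    # Internal nodes split parameter space with a hyper-plane test w . p >= c;
    # leaf nodes store the data matrix associated with their partition.
    def __init__(self, w=None, c=0.0, left=None, right=None, matrix=None):
        self.w, self.c = w, c
        self.left, self.right = left, right
        self.matrix = matrix  # set only at leaf nodes

    def select(self, p):
        # Descend the tree to find the partition containing p (claims 44-45)
        # and return that partition's data matrix (claims 43, 47).
        if self.matrix is not None:
            return self.matrix
        branch = self.right if np.dot(self.w, p) >= self.c else self.left
        return branch.select(p)
```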
48. An apparatus according to any of claims 36 to 47, further comprising training means for determining said parametric model by performing a statistical analysis of a pointwise correspondence between corresponding points on the object in each of a plurality of training images.
49. An apparatus according to claim 48, wherein said training means is operable to perform a principal component analysis of said pointwise correspondences.
50. An apparatus according to claim 48 or 49, wherein said training means is operable to extract shape and texture data from said training images to generate said parametric model.
51. An apparatus according to claim 48, 49 or 50, wherein said training means is operable to determine each of said data matrices.
52. An apparatus according to any of claims 48 to 51, wherein said estimating means is operable to estimate said current set of appearance parameters to be a mean set of appearance parameters determined by said training means for the training images.
53. An apparatus according to any of claims 36 to 52, wherein said image data of the object from the input image comprises a plurality of sampled pixel values from the input image.
54. An apparatus according to any of claims 36 to 53, wherein said image error comprises a vector of pixel errors, wherein the parameter change determining means is operable to determine the parameter change for each appearance parameter from a weighted sum of the pixel errors in said image error and wherein the weightings are defined by data in the non-constant function.
55. An apparatus according to claim 54 when dependent upon claim 38 or 43, wherein the parameter change determining means is operable to determine the change in appearance parameters by calculating:
Δp = AΔe, where Δp is a vector of appearance parameter changes, A is the selected data matrix and Δe is the vector of pixel errors.
56. An apparatus according to any of claims 36 to 55, wherein said object is a deformable object.
57. An apparatus according to claim 56, wherein said deformable object comprises a face or a part thereof.
58. An apparatus according to claim 36, wherein said non-constant function is a non-linear function.
59. An apparatus according to claim 58, wherein said non-linear function comprises a neural network.
60. An apparatus according to any of claims 36 to 59, wherein said parametric model models the two-dimensional appearance of the object.
61. An apparatus according to any of claims 36 to 59, wherein said parametric model models the three-dimensional appearance of said object.
62. An apparatus according to any of claims 36 to 61, wherein said parametric model models the shape of the object.
63. An apparatus for determining a set of appearance parameters representative of the appearance of an object within an input image, the apparatus comprising:
means for storing (i) a parametric model which relates appearance parameters to corresponding image data of the object; and (ii) data defining a plurality of sets of weights, each set of weights relating a change of each parameter in a set of appearance parameters to a plurality of image errors between image data of the object from the input image and image data of the object determined from the set of appearance parameters and said parametric model;
means for receiving an initial estimate of a current set of appearance parameters for the object in the input image; and
means for updating the current set of appearance parameters comprising:
(i) means for determining image data for the object from the current set of appearance parameters and said stored parametric model;
(ii) means for determining said plurality of image errors between the image data determined for the current set of appearance parameters and image data of the object from the input image;
(iii) means for selecting one of said sets of weights;
(iv) means for determining a change to each parameter in the current set of appearance parameters using said selected set of weights and said plurality of image errors; and
(v) means for updating each parameter in the current set of appearance parameters with the respective change thereto determined by said parameter change determining means.
64. An apparatus for tracking an object within an input sequence of images, the apparatus comprising:
means for storing (i) a parametric model which relates appearance parameters to corresponding image data of the object; and (ii) a non-constant function which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model;
means for receiving an initial estimate of a current set of appearance parameters for the object in a current image; and
means for updating the current set of appearance parameters comprising:
(i) means for determining image data for the object from the current set of appearance parameters and said stored parametric model;
(ii) means for determining an image error between the image data generated for the current set of appearance parameters and image data of the object from the current image;
(iii) means for determining a change in the appearance parameters using said stored non-constant function and said determined image error; and
(iv) means for updating the current set of appearance parameters with said change in the appearance parameters;
wherein said updating means is operable to update iteratively the current set of appearance parameters in order to reduce the image error determined by said image error determining means, wherein said receiving means is operable to receive an initial estimate of each image in the sequence of images and wherein said updating means is operable to update the current set of appearance parameters for each image in the sequence of images.
65. An apparatus according to claim 64, wherein said updating means is operable to update the current set of appearance parameters for a current image from the sequence of images after it has iteratively updated the appearance parameters for the preceding image in the sequence of images.
66. An apparatus according to claim 64 or 65, wherein said receiving means is operable to receive the set of appearance parameters for the object in the preceding image in the sequence of images as the estimate of the current set of appearance parameters for the object in the current image.
67. An apparatus for determining a set of appearance parameters representative of the appearance of an object within an input image, the apparatus comprising:
means for storing (i) a parametric model which relates appearance parameters to corresponding image data of the object; and (ii) a data matrix which relates a change in the appearance parameters to an image error between image data of the object from the input image and image data of the object determined from a set of appearance parameters and said parametric model;
means for receiving an initial estimate of a current set of appearance parameters for the object in the input image; and
first parameter update means for updating the current set of appearance parameters comprising:
(i) means for determining image data for the object from the current set of appearance parameters and said stored parametric model;
(ii) means for determining an image error between the image data determined for the current set of appearance parameters and image data of the object from the input image;
(iii) means for determining a change in the appearance parameters using said data matrix and said determined image error; and
(iv) means for updating the current set of appearance parameters with said change in the appearance parameters;
wherein said updating means is operable to update iteratively the current set of appearance parameters in order to reduce the determined error; and
second parameter update means for updating the current set of appearance parameters after said first parameter updating means has updated the current set of appearance parameters, comprising:
means for determining a measure of the differential of the parametric model for the current set of appearance parameters and for using this measure to calculate a further change in the appearance parameters; and
means for updating the current set of appearance parameters with the further change in the appearance parameters;
and wherein said second parameter update means is operable to update iteratively the current set of appearance parameters in order to reduce the determined error.
GB9927311A 1999-11-18 1999-11-18 Image processing using parametric models Withdrawn GB2360183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB9927311A GB2360183A (en) 1999-11-18 1999-11-18 Image processing using parametric models

Publications (2)

Publication Number Publication Date
GB9927311D0 GB9927311D0 (en) 2000-01-12
GB2360183A true GB2360183A (en) 2001-09-12

Family

ID=10864760

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9927311A Withdrawn GB2360183A (en) 1999-11-18 1999-11-18 Image processing using parametric models

Country Status (1)

Country Link
GB (1) GB2360183A (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JP 10074271A *
SIGGRAPH 98 Conference Proceedings, pp 75-84 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1627282A2 (en) * 2003-05-14 2006-02-22 Pixar Rig baking
EP1627282A4 (en) * 2003-05-14 2011-08-17 Pixar Rig baking
WO2007116208A1 (en) * 2006-04-08 2007-10-18 The University Of Manchester Method of locating features of an object
US8594430B2 (en) 2006-04-08 2013-11-26 The University Of Manchester Method of locating features of an object
EP2672423A1 (en) 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus for locating features of an object using deformable models
EP2672425A1 (en) 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus with deformable model fitting using high-precision approximation
EP2672424A1 (en) 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus using adaptive face registration method with constrained local models and dynamic model switching
EP3096292A1 (en) * 2015-05-18 2016-11-23 Xerox Corporation Multi-object tracking with generic object proposals

Also Published As

Publication number Publication date
GB9927311D0 (en) 2000-01-12

Similar Documents

Publication Publication Date Title
Blanz et al. A morphable model for the synthesis of 3D faces
Bhagavatula et al. Faster than real-time facial alignment: A 3d spatial transformer network approach in unconstrained poses
EP1039417B1 (en) Method and device for the processing of images based on morphable models
Romdhani et al. Face identification by fitting a 3d morphable model using linear shape and texture error functions
US5745668A (en) Example-based image analysis and synthesis using pixelwise correspondence
US7254256B2 (en) Method and computer program product for locating facial features
KR100571115B1 (en) Method and system using a data-driven model for monocular face tracking
KR100530812B1 (en) Wavelet-based facial motion capture for avatar animation
US20240037852A1 (en) Method and device for reconstructing three-dimensional faces and storage medium
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
US11587288B2 (en) Methods and systems for constructing facial position map
US11417053B1 (en) Methods and systems for forming personalized 3D head and facial models
US11461970B1 (en) Methods and systems for extracting color from facial image
GB2360183A (en) Image processing using parametric models
US20030146918A1 (en) Appearance modelling
Bowden Learning non-linear Models of Shape and Motion
EP1272979A1 (en) Image processing system
Paterson et al. 3D head tracking using non-linear optimization.
WO2000017820A1 (en) Graphics and image processing system
US6356669B1 (en) Example-based image synthesis suitable for articulated figures

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)