WO2009143163A2  Face relighting from a single image  Google Patents
Publication number: WO2009143163A2 (PCT/US2009/044533)
Authority: WO
Grant status: Application
Prior art keywords: face, images, method, field, function
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T15/00—3D [Three Dimensional] image rendering
 G06T15/50—Lighting effects

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
 G06K9/00268—Feature extraction; Face representation

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
 G06K9/46—Extraction of features or characteristics of the image
 G06K9/4661—Extraction of features or characteristics of the image related to illumination properties, e.g. according to a reflectance or lighting model
Abstract
Description
FACE RELIGHTING FROM A SINGLE IMAGE
[001] Statement Regarding Federally Sponsored Research Or Development
[002] There is no federal government sponsorship associated with this invention.
[003] Technical Field
[004] The present invention relates to face recognition, facial relighting, and specifically to methods and techniques for synthesizing facial images under novel illumination conditions.
[005] Background Art
[006] Due to important applications like face recognition and facial relighting, synthesis of facial images under novel illumination conditions has attracted immense interest, particularly in the fields of computer vision and computer graphics. The challenge presented is the following: given only a few example images of a face, generate images of that face under novel illumination conditions. This challenge is particularly difficult when only one example image is available, which is the most common and realistic scenario in the very important application of face recognition. This special circumstance is a more difficult scenario than the typical graphics relighting problem, which generally does not have a limitation on the number of example images that can be considered. Solving this challenge is especially attractive because if multiple images under novel illumination can be generated from a single example image, the images can be used to enhance recognition performance of any learning based face recognition method.
[007] The literature is replete with proposals to solve this challenge. However, each of the existing solutions works only under certain assumptions (e.g. the convex-Lambertian assumption) or requires specific kinds of data (e.g. 3D face scans) and/or manual intervention. Thus, it is important to compare these methods in light of their assumptions and requirements and not just by their claimed results. The method of the present invention produces results that are better than or comparable to those of the existing methods, even though it works under a minimal set of requirements. It is a completely automatic method that works with a single 2D image, does not require any 3D information, seamlessly handles cast shadows and specularities (i.e. does not make a convex-Lambertian assumption) and does not require any specially acquired information (i.e. works well with existing benchmark databases like Extended Yale B).
[008] The convex-Lambertian assumption is inaccurate, as human faces are neither exactly Lambertian nor exactly convex. It is common to see cast shadows (e.g. in the perinasal region, due to non-convexity) and specularities (e.g. on an oily forehead or nose tip, due to non-Lambertian reflectance) in facial images. Any method that fails to take these effects into account is limited in its applicability. Furthermore, some of these methods use 3-dimensional information that is expensive to acquire and/or require undesirable manual intervention. Though the cost of acquiring 3-dimensional geometry is decreasing, most of the existing benchmark face databases (e.g. Extended Yale B and CMU PIE) consist of single or multiple 2-dimensional face images; therefore, it is less pragmatic, if not less accurate, to use 3-dimensional information as input to systems dealing with facial illumination problems. Moreover, recent systems that do use 3D information directly (based on morphable models) require manual intervention at various stages, which is undesirable. At the same time, techniques that require specially acquired 2D information or an exorbitant amount of 2D information are also unattractive. Hence, a method that does not make these limiting assumptions and still produces good results is highly desirable.
[009] Disclosure of Invention
[010] The present invention provides a novel antisymmetric higher-order Cartesian tensor spline based method for the estimation of the Apparent Bidirectional Reflectance Distribution Function (ABRDF) field for human faces that seamlessly accounts for specularities and cast shadows.
[011] Brief Description of Drawings
[012] Fig. 1 shows a plot of an ABRDF function according to varying methodologies;
[013] Fig. 2 shows images under novel illumination directions synthesized from the estimated ABRDF field;
[014] Fig. 3 depicts an estimation of facial features that arise from cast shadows and specularities according to the present invention;
[015] Fig. 4 is a comparison of the present invention with two other known methods;
[016] Fig. 5 depicts the registration results of the present invention and provides a comparison to two known methods;
[017] Fig. 6 depicts an image under novel illumination directions synthesized from a single example image according to the method of the present invention;
[018] Figs. 7 and 8 depict the present invention as applied to several reference human faces;
[019] Fig. 8 depicts the present invention as applied to several reference human faces;
[020] Fig. 9 is a quantitative comparison of the method of the present invention according to a varying number of reference images;
[021] Fig. 10 is an illustration of the method of the present invention using a 3rd-order antisymmetric tensor spline estimation;
[022] Fig. 11 is a plot of the lighting directions of images in the training sets;
[023] Fig. 12 is a comparison of the average intensity value errors of the Lambertian model and the method of the present invention;
[024] Fig. 13 depicts synthesized images under several different lighting directions for a randomly selected subject;
[025] Fig. 14 depicts a comparison of the synthesized images using the Lambertian model and the method of the present invention;
[026] Fig. 15 depicts the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows;
[027] Fig. 16 is an intensity value error comparison of the method of the present invention and several known methods.
[028] Modes of Carrying Out the Invention
[029] The present invention is composed of two stages. The first stage comprises learning the Apparent Bidirectional Reflectance Distribution Function (ABRDF) field of a reference face from nine images of that face taken under different illumination conditions. The ABRDF is a spherical function that gives the image intensity value at each pixel for each illumination direction. Three novel methods are set forth below for estimating this ABRDF field from nine or, if available, more images.
[030] The second stage comprises transferring the ABRDF field from a reference face to a new target face, using just one 2-dimensional image of the target face and a novel ABRDF transfer algorithm. Hence, once the reference ABRDF field has been captured, images of a novel face under a new illumination direction can be rendered by first transferring the ABRDF field and then sampling the field in the appropriate illumination direction.
[031] Learning the ABRDF Field Using Tensor Splines
[032] The present invention provides a novel antisymmetric higher-order Cartesian tensor spline based method for the estimation of the ABRDF field for human faces that seamlessly accounts for specularities and cast shadows.
[033] Spherical Functions Modeled as Tensors
[034] In general, a spherical function can be approximated by an nth-order Cartesian tensor, which can be expressed in the following form:
$$T(\mathbf{v}) = \sum_{k+l+m=n} T_{k,l,m}\, v_1^{k}\, v_2^{l}\, v_3^{m} \tag{1}$$
where $\mathbf{v} = [v_1\ v_2\ v_3]^T$ is a unit vector and $T_{k,l,m}$ are the real-valued tensor coefficients. It should be noted that the spherical functions modeled by Eq. 1 are symmetric (i.e. $T(-\mathbf{v}) = T(\mathbf{v})$) for even orders, and antisymmetric (i.e. $T(-\mathbf{v}) = -T(\mathbf{v})$) for odd orders. As a special case of Eq. 1, the 1st-order tensors take the form $T(\mathbf{v}) = \mathbf{T} \cdot \mathbf{v}$, where $\mathbf{T} = [T_{1,0,0}\ T_{0,1,0}\ T_{0,0,1}]$, and the 2nd-order tensors take the form $T(\mathbf{v}) = \mathbf{v}^T \mathbf{T} \mathbf{v}$, where $\mathbf{T}$ is a 3 x 3 matrix. It should also be noted that in the case of 3rd-order tensors, there are 10 unique coefficients $T_{k,l,m}$ in Eq. 1, while in the case of 5th-order antisymmetric tensors, there are 21 unique coefficients $T_{k,l,m}$.
The ability of a Cartesian tensor to approximate the complex geometry of a spherical function with multiple lobes increases with its order. A 1st-order tensor can only be used to approximate single-lobed antisymmetric spherical functions. In order to approximate a function with more lobes, higher-order tensors are required. However, higher-order tensors can be perceived to be more sensitive to noise, simply by virtue of their ability to model high-frequency detail. In contrast, the lower-order tensors are incapable of modeling high-frequency detail. Since it is impossible to discriminate between high-frequency detail in the data and high-frequency noise in the data, it is reasonable to say that the high-order tensors possess higher noise sensitivity. Therefore, a balance between the accuracy of the approximation and the noise sensitivity must be found in determining the best suited tensor order.
[035] Tensor Splines
[036] A tensor spline can be defined by combining the Cartesian tensor basis within a single pixel, as set forth above, with the well-known B-spline basis across the image lattice. Preferably, the degree of the spline is fixed at 3 (i.e. a cubic spline) for simplicity, since this degree of continuity is commonly used in the literature. In general, a tensor spline is defined as a B-spline on multilinear functions of any order. In a tensor spline, the multilinear functions, which are antisymmetric tensors in the present invention, are weighted by the B-spline basis $N_{i,k+1}$, where:
$$N_{i,1}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
and
$$N_{i,k}(t) = N_{i,k-1}(t)\,\frac{t - t_i}{t_{i+k-1} - t_i} + N_{i+1,k-1}(t)\,\frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} \tag{3}$$
where the $N_{i,k+1}(t)$ functions are polynomials of degree k and are associated with n+k+2 monotonically increasing numbers called "knots" $(t_{-k}, t_{-k+1}, \ldots, t_{n+1})$. By using the above equations, the bicubic (i.e. k = 3) tensor spline is given by:
$$S(\mathbf{t}, \mathbf{v}) = \sum_{i,j} N_{i,4}(t_x)\, N_{j,4}(t_y)\, T_{i,j}(\mathbf{v}) \tag{4}$$
where $\mathbf{t} = [t_x\ t_y]$, $\mathbf{v} = [v_1\ v_2\ v_3]^T$ is a unit vector, and $T_{i,j}(\mathbf{v})$ is given by Eq. 1. It should be noted that in Eq. 4, there is a field of control tensors $T_{i,j}(\mathbf{v})$ instead of the control points used in a regular B-spline. Below, the bicubic tensor splines are employed for approximating the ABRDF field of a human face given a set of fixed-pose images under different known lighting directions.
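The following is a minimal sketch of how Eqs. 1 to 4 fit together, not the implementation of the invention. It assumes uniform integer knots (t_i = i) and stores each control tensor as a dictionary keyed by the exponents (k, l, m); the function names and the toy control field are hypothetical.

```python
def tensor_exponents(order):
    """All (k, l, m) with k + l + m == order: one coefficient T_{k,l,m} each."""
    return [(k, l, order - k - l)
            for k in range(order + 1)
            for l in range(order - k + 1)]


def eval_tensor(coeffs, v):
    """Eq. 1: T(v) = sum of T_{k,l,m} * v1^k * v2^l * v3^m over k + l + m = n."""
    return sum(c * v[0] ** k * v[1] ** l * v[2] ** m
               for (k, l, m), c in coeffs.items())


def bspline_basis(i, k, t):
    """Eqs. 2-3 (Cox-de Boor recursion) with assumed uniform knots t_i = i.

    k is the order, so the polynomial degree is k - 1; k = 4 is cubic.
    """
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0
    left = (t - i) / (k - 1) * bspline_basis(i, k - 1, t)
    right = (i + k - t) / (k - 1) * bspline_basis(i + 1, k - 1, t)
    return left + right


def tensor_spline(control, t, v):
    """Eq. 4: S(t, v) = sum_{i,j} N_{i,4}(t_x) N_{j,4}(t_y) T_{i,j}(v)."""
    tx, ty = t
    total = 0.0
    for (i, j), coeffs in control.items():
        weight = bspline_basis(i, 4, tx) * bspline_basis(j, 4, ty)
        if weight:
            total += weight * eval_tensor(coeffs, v)
    return total


# Toy control field (hypothetical): the same 3rd-order tensor at every node.
toy = {e: 0.1 * (n + 1) for n, e in enumerate(tensor_exponents(3))}
control = {(i, j): toy for i in range(-3, 8) for j in range(-3, 8)}
v = (0.6, 0.0, 0.8)
neg_v = (-0.6, -0.0, -0.8)
value = tensor_spline(control, (2.3, 3.7), v)
```

Because the cubic B-spline weights sum to one, a constant control field reproduces the control tensor itself, and an odd-order tensor changes sign when v is negated. The exponent enumeration also yields the 10 and 21 coefficient counts quoted above for orders 3 and 5.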
[037] Apparent BRDF Approximation By Tensor Splines
[038] The BRDF of a Lambertian surface is given by:
$$B(\mathbf{v}) = \alpha\, (\mathbf{n} \cdot \mathbf{v}) \tag{5}$$
where $\mathbf{v}$ is the light source direction, $\mathbf{n}$ is the normal vector at a particular point of the surface and $\alpha$ is a constant. It is immediate that the Lambertian model is a 1st-order tensor (i.e. order n = 1 in Eq. 1) with $T_{1,0,0} = \alpha n_x$, $T_{0,1,0} = \alpha n_y$ and $T_{0,0,1} = \alpha n_z$. As a 1st-order tensor, the Lambertian model is antisymmetric and has a single peak.
[039] Human faces, however, are not exactly Lambertian, since specularity can be observed in certain regions (e.g. nose and forehead). Moreover, the non-convex shapes on the face (e.g. lips and nose) can create cast shadows. The shadows and specularities of the human face are indicative of a multi-lobed apparent BRDF. Therefore, in these cases the ABRDF cannot be modeled successfully by a 1st-order tensor, and higher-order antisymmetric tensors should be employed instead.
[040] As described above, the challenge is as follows: given a set of N face images of a given human subject with a fixed pose, $I_n$, n = 1 ... N, with associated lighting directions $\mathbf{v}_n$, one wants to estimate the ABRDF field of the face using a bicubic tensor spline. The fitting of the tensor spline to the given data can be done by minimizing the following energy:
$$E = \sum_{n=1}^{N} \sum_{t_x, t_y} \Big( \sum_{i,j} N_{i,4}(t_x)\, N_{j,4}(t_y)\, T_{i,j}(\mathbf{v}_n) - I_n(t_x, t_y) \Big)^2 \tag{6}$$
where $t_x$, $t_y$ run through the lattice of the given images. The minimization of Eq. 6 is done with respect to the unknown tensor coefficients $T_{i,j,k,l,m}$ that correspond to the control tensor $T_{i,j}(\mathbf{v}_n)$.
[041] For example, uniform grid knots 1, 2, 3 ... can be used in both lattice coordinates. Accordingly, there are (M + 2) x (M + 2) control tensors, where M x M is the lattice size of each given image. Under this configuration, in the case of 3rd-order antisymmetric tensors, there are 10 unique coefficients for each control tensor. Therefore, the number of unknowns in Eq. 6 is equal to $10(M + 2)^2$, and in the case of a 5th-order antisymmetric tensor, the number of unknowns is $21(M + 2)^2$.
[042] From Eq. 6, the derivatives $\partial E / \partial T_{i,j,k,l,m}$ are analytically computed and thus any gradient-based functional minimization method can be used. For example, a nonlinear conjugate gradient method with a randomly initialized control tensor coefficient field can be used. After the tensor field has been estimated, images are synthesized under a new lighting direction $\mathbf{v}$ by evaluating the apparent BRDF field in the direction $\mathbf{v}$, whereby each apparent BRDF is given by Eq. 4. The generated images can also be upsampled directly by evaluating Eq. 4 on a denser sampling grid, since the tensor spline is a continuous function.
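As a rough illustration of the fitting step, the sketch below estimates the 10 coefficients of a 3rd-order tensor at a single pixel. It is a simplification under stated assumptions: the B-spline coupling between neighbouring pixels is dropped, and because the resulting per-pixel energy is quadratic, a direct solve of the normal equations is swapped in for the gradient-based minimizer named in the text. The lighting directions and the coefficient values are synthetic and hypothetical.

```python
import math


def monomials(v, order=3):
    """Row of the design matrix: v1^k v2^l v3^m for every k + l + m = order."""
    return [v[0] ** k * v[1] ** l * v[2] ** (order - k - l)
            for k in range(order + 1)
            for l in range(order - k + 1)]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_pixel(directions, intensities, order=3):
    """Minimise sum_n (T(v_n) - I_n)^2 for one pixel via the normal equations."""
    rows = [monomials(v, order) for v in directions]
    n = len(rows[0])
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    atb = [sum(r[p] * y for r, y in zip(rows, intensities)) for p in range(n)]
    return solve_linear(ata, atb)


def hemisphere_directions(count):
    """Hypothetical lighting directions: a Fibonacci spiral on the z > 0 side."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for n in range(count):
        z = (n + 0.5) / count
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden * n), r * math.sin(golden * n), z))
    return dirs


# Synthetic check: recover known coefficients from 30 noiseless samples.
true_coeffs = [0.3, -0.2, 0.5, 0.1, 0.0, 0.4, -0.1, 0.2, 0.6, 0.9]
lights = hemisphere_directions(30)
samples = [sum(c * b for c, b in zip(true_coeffs, monomials(v))) for v in lights]
estimate = fit_pixel(lights, samples)
```

In the full method the unknowns are the control tensor coefficients of the whole spline, so neighbouring pixels share information; this per-pixel version only shows why the data term is a linear least-squares problem in the coefficients.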
[043] Learning the ABRDF Field Using Spline Modulated Spherical Harmonics
[044] Our goal is to generate images of a face under various illumination conditions using a single example 2-dimensional image. This can be achieved by acquiring a reference ABRDF field once and then transferring it to new faces using a single image of each. The ABRDF represents the response of the object at a point to light from each direction in the presence of the rest of the scene, not merely the surface reflectivity. Hence, by acquiring the ABRDF field of an object, cast shadows, which are image artifacts caused by scene objects obstructing the light from reaching otherwise visible scene regions, can be easily captured. Note that since we want to analyze the effects of a change in illumination direction, we assume the ABRDF to be a function of the illumination direction alone by fixing the viewing direction, though it is sometimes defined to be a function of both the illumination and viewing directions. Below, the first part of this process, i.e. the reference ABRDF field estimation, is described using novel bicubic B-spline modulated antisymmetric spherical harmonics.
[045] Surface Spherical Harmonic Basis
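[046] The surface spherical harmonic basis, the analog of the Fourier basis for Cartesian signals, provides a natural orthonormal basis for functions defined on a sphere. In general, the spherical harmonic bases are defined for complex-valued functions, but as the apparent BRDF is a real-valued function, the real-valued spherical harmonic bases are used to represent the apparent BRDF functions. The spherical harmonic basis function $\psi_l^m$ (order = l, degree = m), with l = 0, 1, 2, ... and $-l \le m \le l$, is defined as follows:
$$\psi_l^m(\theta, \phi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-|m|)!}{(l+|m|)!}}\; P_l^{|m|}(\cos\theta)\, \Phi_m(\theta, \phi) \tag{7}$$
where $P_l^{|m|}$ are the associated Legendre functions and $\Phi_m(\theta, \phi)$ is defined as:
$$\Phi_m(\theta, \phi) = \begin{cases} \sqrt{2}\cos(m\phi) & \text{if } m > 0 \\ 1 & \text{if } m = 0 \\ \sqrt{2}\sin(|m|\phi) & \text{if } m < 0 \end{cases} \tag{8}$$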
[047] Note that even orders of the spherical harmonic basis functions are antipodally symmetric while odd orders are antisymmetric. Perceptually speaking, given a limited number of data samples, the ABRDFs are best approximated using only the antipodally antisymmetric components of the spherical harmonic bases. To see why, two crucial questions must be examined: first, whether using only even-ordered or only odd-ordered bases drastically limits the approximation of the ABRDF, and second, whether symmetric or antisymmetric bases are more suitable.
[048] With respect to the first question, even though the ABRDF is a function defined on a sphere, for the purposes of the present invention interest lies only in its behavior on the frontal hemisphere. Hence, if the function's behavior at the extreme angles (0° and 180°) is ignored, once the ABRDF has been modeled accurately on the frontal hemisphere, the rear hemisphere can be filled in appropriately to make the function either symmetric or antisymmetric. To visualize this, the polar plots in the first column of FIG. 1 show a typical ABRDF function defined on a semicircle. The second column shows the same function being approximated by an antipodally symmetric (row 1) and an antipodally antisymmetric (row 2) function. Not much approximation power is lost by not using both types of components. For visualization, the problem has been scaled down to two dimensions, and the blue circle represents the zero value in these polar plots. A more important reason for not using the complete set of bases is that, for a fixed number of given example images, using just the symmetric or just the antisymmetric components makes it possible to go to higher orders, which are necessary to approximate discontinuities like cast shadows and specularities in the image.
[049] With respect to the second question, one must observe the function's behavior at the extreme angles (0° and 180°). In reality, most facial ABRDF functions have a positive value near one of the extreme angles (as they face the light source) and a very small (≈ 0) value near the other extreme angle (as they go into attached shadows). Hence, the function in column 1 of FIG. 1 is very close to physical ABRDFs. Clearly, the function's behavior at 0° and 180° is neither antipodally symmetric nor antisymmetric, and hence using just one of the two leads to approximation errors at these extreme angles. The error caused by the symmetric approximation is perceptually very noticeable, as it gives the function a positive value where it should be 0 [see FIG. 1, last column, first row, and the regions marked by arrows in FIG. 1, last row, which are unnaturally bright]. The error caused by the antisymmetric approximation is not perceptually noticeable: it gives the function a negative value where it should be 0, and that value can simply be set to 0 because the ABRDF is known never to be negative [see FIG. 1, last column, second row, and FIG. 1, last row, last two images]. Non-negativity is achieved similarly in the Lambertian model using the max function. Errors at the non-zero end of the function are not perceptually noticeable, as can be seen from the last row of FIG. 1.
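The parity argument and the clamping step can be checked numerically with a short sketch. It assumes the standard orthonormal real spherical harmonics (the normalisation and the sign convention of the Legendre recurrence are assumptions, not taken from the text), and the helper names are hypothetical.

```python
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for 0 <= m <= l, by recurrence."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt(max(0.0, 1.0 - x * x))
        for n in range(1, m + 1):
            pmm *= -(2 * n - 1) * root
    if l == m:
        return pmm
    prev, cur = pmm, x * (2 * m + 1) * pmm
    for n in range(m + 2, l + 1):
        prev, cur = cur, ((2 * n - 1) * x * cur - (n + m - 1) * prev) / (n - m)
    return cur


def real_sh(l, m, theta, phi):
    """Real spherical harmonic of order l and degree m (assumed normalisation)."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    leg = assoc_legendre(l, am, math.cos(theta))
    if m > 0:
        return math.sqrt(2.0) * norm * leg * math.cos(m * phi)
    if m < 0:
        return math.sqrt(2.0) * norm * leg * math.sin(am * phi)
    return norm * leg


def abrdf_value(coeffs, theta, phi):
    """Weighted sum of basis functions, with negative values clamped to zero."""
    value = sum(w * real_sh(l, m, theta, phi) for (l, m), w in coeffs.items())
    return max(0.0, value)


def parity_error(l, m, theta=0.7, phi=1.9):
    """|psi(-v) - (-1)^l psi(v)|, where -v is (pi - theta, phi + pi)."""
    here = real_sh(l, m, theta, phi)
    there = real_sh(l, m, math.pi - theta, phi + math.pi)
    return abs(there - (-1) ** l * here)


worst_parity_error = max(parity_error(l, m)
                         for l in range(6) for m in range(-l, l + 1))
lit_side = abrdf_value({(1, 0): 1.0}, 0.0, 0.0)
shadow_side = abrdf_value({(1, 0): 1.0}, math.pi, 0.0)
```

With a single odd-order term, the value is positive on the side facing the light and would be negative at the antipode; the clamp sets it to zero there, which is the behaviour the paragraph above describes for attached shadows.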
[050] Bicubic B-spline Modulated Spherical Harmonics
[051] For a fixed pose, each pixel location has an associated ABRDF, and across the whole face there is a field of such ABRDFs. To model such a field of spherical functions ($S^2 \times \mathbb{R}^2 \to \mathbb{R}$), modulated spherical harmonics are used, combining the spherical harmonic basis within a single pixel with the B-spline basis across the field. The B-spline basis $N_{i,k}$, where:
$$N_{i,1}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
and
$$N_{i,k}(t) = N_{i,k-1}(t)\,\frac{t - t_i}{t_{i+k-1} - t_i} + N_{i+1,k-1}(t)\,\frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} \tag{10}$$
acts as a weight on the spherical harmonic basis. Here, $N_{i,k}(t)$ is the spline basis of order k (i.e. degree k - 1, so that $N_{i,4}$ is cubic) with associated knots $(t_{-k}, t_{-k+1}, \ldots, t_{n+1})$. Hence, the expression for the modulated spherical harmonics is given by:
$$\Psi_l^m(\theta, \phi, \mathbf{x}, i, j) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-|m|)!}{(l+|m|)!}}\; N_{i,4}(x_1)\, N_{j,4}(x_2)\, P_l^{|m|}(\cos\theta)\, \Phi_m(\theta, \phi) \tag{11}$$
with $\Phi_m(\theta, \phi)$ and $P_l^{|m|}$ as defined before, $\mathbf{x} = (x_1, x_2)$ are the spline control points, and i and j are the basis indices. The bicubic spline is chosen because it is one of the most commonly used in the literature and, more importantly, because it provides enough smoothness for the ABRDF field that the discontinuities present in the field due to cast shadows are appropriately approximated, as demonstrated in the results shown below.
[052] There are three distinct advantages of using these novel bicubic B-spline modulated spherical harmonics for ABRDF field estimation. First, the built-in smoothness provides a degree of robustness against noise, which is very common when dealing with image data. Second, it allows neighborhood information to be used while estimating the ABRDF at each pixel location. Finally, it provides a continuous representation of the spherical harmonic coefficient field, which is exploited during the ABRDF transfer defined further below.
[053] ABRDF Field Estimation
[054] If the ABRDF field is available for a face, images of the face under novel illumination directions can be rendered by simply sampling the ABRDF at each location in the appropriate directions. In a realistic setting, however, only a few images of a face (samples of the ABRDF field) are given. Hence, the problem at hand is that of estimating the ABRDF field from these few samples. Motivated by the reasoning outlined above, the present invention employs bicubic B-spline modulated antisymmetric spherical harmonic functions for this task.
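[055] Using $S_{\mathbf{x}}(\theta, \phi)$, the given data samples (intensity values) in direction $(\theta, \phi)$ at location $\mathbf{x}$, the ABRDF field can be estimated by minimizing the following error function:
$$E = \sum_{\mathbf{x}} \sum_{(\theta, \phi)} \Big( \sum_{l \in T} \sum_{m=-l}^{l} \sum_{i,j} w_{ijlm}\, \Psi_l^m(\theta, \phi, \mathbf{x}, i, j) - S_{\mathbf{x}}(\theta, \phi) \Big)^2 \tag{12}$$
where the first term in the summation is the representation of the ABRDF function using modulated antisymmetric spherical harmonic functions, T is the set of odd natural numbers and $w_{ijlm}$ are the unknown coefficients of the apparent BRDF function that is being sought. Here, the spline control grid is overlaid on the data grid (pixels) and the inner summation over i and j is over the bicubic B-spline basis domain. This objective function is minimized using the nonlinear conjugate gradient method initialized with a unit vector, for which the derivative of the error function with respect to $w_{ijlm}$ can be computed in analytic form as:
$$\frac{\partial E}{\partial w_{ijlm}} = 2 \sum_{\mathbf{x}} \sum_{(\theta, \phi)} \Big( \sum_{l' \in T} \sum_{m'=-l'}^{l'} \sum_{i',j'} w_{i'j'l'm'}\, \Psi_{l'}^{m'}(\theta, \phi, \mathbf{x}, i', j') - S_{\mathbf{x}}(\theta, \phi) \Big)\, \Psi_l^m(\theta, \phi, \mathbf{x}, i, j) \tag{13}$$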
[056] Both odd orders 3 and 5 yield sufficiently good synthesis results, with order 3 performing slightly better than order 5 because the order 5 approximation overfits the data. Therefore, order 3 is preferred. In an order 3 (value of l) modulated antisymmetric spherical harmonic approximation, values of the unknown coefficients can be recovered with just 9 images under different illumination conditions. Estimation is better if the given 9 images sample the illumination directions somewhat uniformly, and it improves if more images are present. As the ABRDF is a non-negative function, any negative values produced by the model are set to 0 (as is also done by the max function in the Lambertian model).
[057] Learning ABRDF Field Using Continuous Mixture of Single Lobed Functions
[058] The method described above can be quantitatively compared to a more general model that, in theory, can approximate spherical functions using a continuous mixture of single-lobed spherical functions. There are various spherical functions with a single lobe that can be used in a continuous mixture. For this application, it is desirable to choose a function that leads to an analytic solution, such as the following:
$$S(\mathbf{v}) = e^{-\mathbf{u} \cdot \mathbf{v}} - 1 \tag{14}$$
where $\mathbf{u}$ and $\mathbf{v}$ are unit vectors. Eq. 14 has the following desirable properties: 1) it has a single peak, and 2) $S(\mathbf{v}) = 0$ for all $\mathbf{v}$ such that $\mathbf{v} \cdot \mathbf{u} = 0$ (because if the viewing and illumination directions are perpendicular, we expect zero intensity). These properties are also valid for the Lambertian model.
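[059] Given the single-lobed function in Eq. 14, any spherical function can be written as a continuous mixture of such functions. Accordingly, the apparent BRDF, a spherical function, can be modeled as a continuous mixture of functions $S(\mathbf{v})$ as follows:
$$B(\mathbf{v}) = \int_{S^2} f(\mathbf{u}) \left( e^{-\mathbf{u} \cdot \mathbf{v}} - 1 \right) d\mathbf{u} \tag{15}$$
where the integration is over the set of all unit vectors $\mathbf{u}$ (i.e. the unit sphere) and $f(\mathbf{u})$ is a distribution on orientations. The von Mises-Fisher distribution is chosen as the mixing density, as it is the analog of the Gaussian distribution on $S^2$. The von Mises-Fisher distribution is given by:
$$f(\mathbf{u}; \boldsymbol{\mu}, \kappa) = \frac{\kappa}{4\pi \sinh \kappa}\, e^{\kappa\, \boldsymbol{\mu} \cdot \mathbf{u}} \tag{16}$$
where $\boldsymbol{\mu}$ is a unit vector defining the orientation and $\kappa$ is a scalar governing the concentration of the distribution. Substituting Eq. 16 into Eq. 15 yields an integral that is the Laplace transform of the von Mises-Fisher distribution, which is analytically computed to be:
$$B(\mathbf{v}) = \frac{\kappa}{\sinh \kappa}\, \frac{\sinh \lVert \kappa \boldsymbol{\mu} - \mathbf{v} \rVert}{\lVert \kappa \boldsymbol{\mu} - \mathbf{v} \rVert} - 1 \tag{17}$$
However, the single von Mises-Fisher distribution model cannot approximate angular distributions with several peaks, such as the apparent BRDF fields of human faces. Therefore, a finite mixture of von Mises-Fisher distributions can be used, which leads to the following alternate definition of Eq. 15:
$$B(\mathbf{v}) = \int_{S^2} \sum_{i} w_i\, f(\mathbf{u}; \boldsymbol{\mu}_i, \kappa) \left( e^{-\mathbf{u} \cdot \mathbf{v}} - 1 \right) d\mathbf{u} \tag{18}$$
where $w_i$ are the mixture weights.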
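[060] In order to use this mixture of von Mises-Fisher distributions to obtain an expression for the apparent BRDF, a dense sampling of 642 directions of the unit sphere, obtained by the 4th-order tessellation of the icosahedron, can be used. The result is the following expression:
$$B(\mathbf{v}) = \sum_{i=1}^{642} w_i \left( \frac{\kappa}{\sinh \kappa}\, \frac{\sinh \lVert \kappa \boldsymbol{\mu}_i - \mathbf{v} \rVert}{\lVert \kappa \boldsymbol{\mu}_i - \mathbf{v} \rVert} - 1 \right) \tag{19}$$
Although $f(\mathbf{u})$ has the form of a discrete mixture, the approximating function $B(\mathbf{v})$ is still a continuous mixture of single-lobed functions expressed by Eq. 15.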
[061] Given a set of N facial images of a given human subject with a fixed pose, $I_n$, n = 1 ... N, associated with lighting directions $\mathbf{v}_n$, an N x 642 matrix $A_{n,i}$ can be set up by evaluating Eq. 17 for every pair $\mathbf{v}_n$ and $\boldsymbol{\mu}_i$. Then, for each pixel, the unknown weights of Eq. 19 can be estimated by solving the system:
$$A W = B \tag{20}$$
where B is an N-dimensional vector that consists of the intensities of a fixed pixel in the N given images, and W is the vector of the unknown weights. This system can be solved efficiently to obtain a sparse solution by the non-negative least squares minimization algorithm. The general model just described is used as a benchmark for quantitatively evaluating the ability of the antisymmetric tensor spline model of the present invention to approximate the apparent BRDF of human faces.
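The closed-form response of one mixture component can be sanity-checked against a direct numerical integration over the sphere. The sketch below assumes the single-lobed function S(v) = exp(-u.v) - 1 and the standard von Mises-Fisher density on the unit sphere; the Fibonacci point set stands in for the icosahedral tessellation, and the value of kappa, the sample count and all function names are hypothetical.

```python
import math


def lobe_response(v, mu, kappa):
    """Closed form of the mixture integral for one vMF component."""
    r = math.sqrt(sum((kappa * m - x) ** 2 for m, x in zip(mu, v)))
    ratio = math.sinh(r) / r if r > 1e-12 else 1.0
    return kappa / math.sinh(kappa) * ratio - 1.0


def fibonacci_sphere(count):
    """Roughly uniform unit vectors (an assumed stand-in for the tessellation)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for n in range(count):
        z = 1.0 - 2.0 * (n + 0.5) / count
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * n), r * math.sin(golden * n), z))
    return points


def lobe_response_numeric(v, mu, kappa, count=20000):
    """Equal-weight quadrature of f(u; mu, kappa) * (exp(-u.v) - 1) over u."""
    const = kappa / (4.0 * math.pi * math.sinh(kappa))
    area = 4.0 * math.pi / count
    total = 0.0
    for u in fibonacci_sphere(count):
        density = const * math.exp(kappa * sum(a * b for a, b in zip(mu, u)))
        single_lobe = math.exp(-sum(a * b for a, b in zip(u, v))) - 1.0
        total += density * single_lobe * area
    return total


def design_matrix(lights, centers, kappa):
    """A[n][i] = lobe_response(v_n, mu_i, kappa), the matrix of A W = B."""
    return [[lobe_response(v, mu, kappa) for mu in centers] for v in lights]


v = (0.6, 0.0, 0.8)
mu = (0.0, 0.0, 1.0)
closed = lobe_response(v, mu, 2.0)
numeric = lobe_response_numeric(v, mu, 2.0)
A = design_matrix(fibonacci_sphere(9), fibonacci_sphere(42), 2.0)
```

Each row of the matrix corresponds to one lighting direction and each column to one mixture centre; a non-negative least squares solver (not shown here) would then be applied per pixel to obtain the weights.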
[062] ABRDF Transfer Algorithm
[063] The second part of the method of the present invention deals with transferring the ABRDF field from one face (reference) to another (target) and thus generating images under various novel illuminations for the target face using just one exemplar image. The basic shapes of features are more or less the same on all faces and thus the optical artifacts, e.g. cast and attached shadows, created by these features are also similar on all faces. Accordingly, the nature of the ABRDFs on various faces is also similar and hence, one should be able to derive the ABRDF field of the target face using a given reference ABRDF field.
[064] First, the non-rigid warping field between the reference and the target face images must be estimated. This can be formalized as the estimation of a non-rigid coordinate transformation T such that:
$$\sum_{\mathbf{x} \in I} M\big( I_{ref}(T(\mathbf{x})),\ I_{target}(\mathbf{x}) \big) \tag{21}$$
is minimized. M is a general matching criterion which depends on the registration technique, $I_{ref}$ and $I_{target}$ are the reference and target images respectively, and $\mathbf{x}$ is the location on the image domain I. Preferably, a registration technique based on an information theoretic match measure should be used so that the registration can be done across different faces with possibly different illuminations (e.g. Mutual Information (MI) and Cross-Cumulative Residual Entropy (CCRE) based registration, described respectively in the following publications: 1) "Alignment by maximization of mutual information," P. Viola and W. M. Wells III, IJCV, 24(2):137-154, 1997 and 2) "Non-rigid multi-modal image registration using cross-cumulative residual entropy," F. Wang and B. C. Vemuri, IJCV, 74(2):201-205, 2007). The CCRE registration technique works with cumulative distributions rather than probability densities and hence is more robust to noise. Therefore, CCRE is able to produce better results when applied to faces. FIG. 5 depicts the results produced by MI and CCRE for the purpose of visual comparison. The first and second columns contain the reference image and the target image respectively. The third and fourth columns contain deformed faces produced by CCRE and MI respectively.
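To illustrate why an information theoretic match measure tolerates illumination differences, the sketch below computes mutual information from a joint histogram. It covers only the basic MI measure, not CCRE and not the optimization over the transformation T; the bin count, intensity range and function name are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=255):
    """MI (in nats) of two equal-length intensity lists via a joint histogram."""
    def quantise(x):
        return min(bins - 1, int(x * bins / (max_value + 1)))

    pairs = [(quantise(x), quantise(y)) for x, y in zip(a, b)]
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(p for p, _ in pairs)
    count_b = Counter(q for _, q in pairs)
    # sum of p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts
    return sum(c / n * math.log(c * n / (count_a[p] * count_b[q]))
               for (p, q), c in joint.items())


# Toy "images": a ramp, its intensity-inverted copy, and a flat image.
ramp = [32 * (n % 8) for n in range(64)]
inverted = [255 - x for x in ramp]
flat = [128] * 64

mi_self = mutual_information(ramp, ramp)
mi_inverted = mutual_information(ramp, inverted)
mi_flat = mutual_information(ramp, flat)
```

The inverted copy scores as high as the image matched with itself because MI depends only on the statistical dependence between intensities, not on their values, whereas a sum of squared differences would treat the two as badly mismatched.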
[065] Once the deformation field has been recovered, it is used to warp the source image's apparent BRDF field coefficients, displacing the apparent BRDFs into the appropriate locations for the target image. As described above, by using modulated spherical harmonic functions, a continuous representation of the coefficient field can be obtained, which is written explicitly as:
$$w_{lm}(\mathbf{x}) = \sum_{i,j} N_{i,4}(x_1)\, N_{j,4}(x_2)\, w_{ijlm} \tag{22}$$
As defined above, $w_{lm}(\mathbf{x})$ are the coefficients for the order l and degree m spherical harmonic basis at location $\mathbf{x}$. The apparent BRDF field coefficients for the target image, $\tilde{w}_{lm}(\mathbf{x})$, can be computed using Eq. 22 as $\tilde{w}_{lm}(\mathbf{x}) = w_{lm}(T(\mathbf{x}))$, where T is the deformation field recovered by minimization of Eq. 21. Using $\tilde{w}_{lm}(\mathbf{x})$, the apparent BRDF field can be readily computed using the spherical harmonic basis. As can be noted from FIG. 5, though the locations of the apparent BRDFs have been changed to match the target face image, they are still the source (reference) image's apparent BRDFs, and thus the images obtained by sampling them look like the source (reference) image, as can be seen in columns three and four of FIG. 5.
[066] This discrepancy can be fixed by using the following intensity mapping technique. A separate transformation can be chosen for each pixel. Based upon the geometric transformation between the reference and the target image, the intensity mapping quotient, $Q(\mathbf{x})$, for each location $\mathbf{x}$ can be defined as:
$$Q(\mathbf{x}) = \frac{I_{target}(\mathbf{x})}{I_{ref}(T(\mathbf{x}))} \tag{23}$$
[067] Because the images are known to be noisy and the division operation accentuates that noise, a Gaussian kernel $G_\sigma$ can be used to smooth the intensity mapping quotient field. As a result, the intensity value at location $\mathbf{x}$ of an image of the target face under a novel illumination direction $(\theta, \phi)$ can be computed as:
$$I(\theta, \phi, \mathbf{x}) = (G_\sigma * Q)(\mathbf{x})\; B(\theta, \phi, \mathbf{x}), \qquad B(\theta, \phi, \mathbf{x}) = \sum_{l \in T} \sum_{m=-l}^{l} \tilde{w}_{lm}(\mathbf{x})\, \psi_l^m(\theta, \phi) \tag{24}$$
where the argument $(\theta, \phi, \mathbf{x})$ indicates that the apparent BRDF at location $\mathbf{x}$ is being queried in direction $(\theta, \phi)$, and $B(\theta, \phi)$ can be obtained from any of the three methods described above.
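The intensity mapping step can be sketched on small arrays as follows. The sketch assumes the reference image and the sampled reference ABRDF have already been warped onto the target grid; the small floor on the denominator, the kernel radius and the border renormalisation are choices made for this example only, and the function names are hypothetical.

```python
import math


def gaussian_smooth(img, sigma=1.0, radius=2):
    """Small Gaussian blur; weights are renormalised near the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        g = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                        acc += g * img[yy][xx]
                        norm += g
            out[y][x] = acc / norm
    return out


def relight(target, warped_ref, warped_abrdf, sigma=1.0, eps=1e-6):
    """Smooth Q = I_target / I_ref(T(x)), then scale the sampled ABRDF by it."""
    quotient = [[t / max(r, eps) for t, r in zip(t_row, r_row)]
                for t_row, r_row in zip(target, warped_ref)]
    smooth_q = gaussian_smooth(quotient, sigma)
    return [[q * b for q, b in zip(q_row, b_row)]
            for q_row, b_row in zip(smooth_q, warped_abrdf)]


# Toy data: the target is uniformly twice as bright as the warped reference.
warped_ref = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0], [70.0, 80.0, 90.0]]
target = [[2.0 * p for p in row] for row in warped_ref]
sampled_abrdf = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
relit = relight(target, warped_ref, sampled_abrdf)
```

In this toy case the quotient is 2 everywhere, so smoothing leaves it unchanged and the relit values are exactly twice the sampled reference ABRDF; with real images the smoothing suppresses the noise that the per-pixel division would otherwise amplify.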
[068] The intensity mapping quotient (Eq. 23) is not the same as the quotient image proposed by Riklin-Raviv and Shashua, as they make an explicit Lambertian assumption and define their quotient image to be the ratio of the albedos, which is not the case here.
[069] Experimental Results
[070] All the experiments in this section used the Extended Yale B database, which has 64 different images per subject under known illumination directions. In order to test the sensitivity of our antisymmetric tensor spline model to the selection of the training set, we constructed three different training sets, each consisting of 9 facial pictures per subject taken under different lighting directions. The lighting directions of the 9 images for the three selected training sets (A, B and C) are shown in FIG. 11, where the lighting direction of each image is presented as a point in the azimuth-elevation plane. Training set 'A' shows a case where the 9 lighting directions do not span the azimuth-elevation plane in a symmetric and uniform manner, and therefore the input dataset does not represent the underlying ABRDF well. In training set 'B', the lighting directions cover the azimuth-elevation plane better; however, there is no lighting direction at an extreme high angle. Finally, training set 'C' samples the azimuth-elevation plane even better, including high-angle lighting directions along the elevation axis.
[071] To see the impact of different training sets on the approximated ABRDF field, the ABRDF of 10 different subjects from the Extended Yale B dataset was computed under the lighting configurations described in FIG. 11, using: a) the antisymmetric tensor spline model of the present invention of order 3 and b) the Lambertian version of our framework using 1st-order tensors. After the training was performed using only 9 images per subject according to the method described above, 64 facial images per subject were synthesized by evaluating Eq. 4 for the 64 lighting directions provided in the Yale B database.
[072] FIG. 13 presents the synthesized images under several different lighting directions for a randomly selected subject. The images demonstrate that our proposed model approximated well the underlying ABRDF, producing realistic images. The 9 input images used here are shown in FIG. 10.
[073] FIG. 12 shows the average error in the intensity value, defined as the absolute distance between the intensity values of the synthesized images and the ground truth images in the database. Based on the reported errors, it can be concluded that the method of the present invention performs significantly better than the Lambertian model. Moreover, in all three training set configurations, the performance remained approximately the same, which conclusively demonstrates that the method of the present invention approximates well the underlying ABRDF regardless of the lighting directions of the 9 input images.
[074] In FIG. 14, examples of the synthesized images using the Lambertian model and the antisymmetric tensor spline method of the present invention are visually compared. The first column shows the ground truth image from the Extended Yale B dataset. Note that the ground truth images presented in FIG. 14 were not a part of the training set used for the synthesis of the images presented in the second and third columns of FIG. 14. By visual comparison, one can conclude that the 3^{rd}-order tensorial model can accommodate cast shadows and approximate well the specular components of the underlying ABRDFs. In contrast, specularity and shadows are missing from the images synthesized under the Lambertian model, which demonstrates the invalidity of the Lambertian assumption.
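The error measure described above can be written out as a short sketch. It assumes that each image is a list of rows of numeric intensities and that the per-image errors are averaged with equal weight over the synthesized set; the function names are illustrative and do not come from the original experiments.

```python
def mean_absolute_error(synthesized, ground_truth):
    """Average absolute intensity difference over all pixels of one image pair."""
    total = 0.0
    count = 0
    for s_row, t_row in zip(synthesized, ground_truth):
        for s, t in zip(s_row, t_row):
            total += abs(s - t)
            count += 1
    return total / count


def average_error_over_set(pairs):
    """Average the per-image error over (synthesized, ground_truth) pairs."""
    errors = [mean_absolute_error(s, t) for s, t in pairs]
    return sum(errors) / len(errors)
```

For the experiment described here, `pairs` would hold the 64 synthesized images of a subject together with the corresponding database images.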
[075] FIG. 15 shows the approximated ABRDFs plotted as spherical functions in a region of interest that has specularities and shadows. The shapes of the plotted functions contain up to three lobes and show complexities that cannot be approximated under the Lambertian assumption.
[076] Next, the continuous mixture of single-lobed functions was employed to approximate the underlying ABRDF by using all 64 given images as the training set. This model, although less efficient (since it requires a much larger training set of 64 images) than the antisymmetric tensor spline method of the present invention (which uses only 9 images), can approximate spherical functions with a very complex structure characterized by a large number of lobes. In contrast, the 3^{rd}-order antisymmetric tensor spline model can approximate functions whose shape complexity consists of at most three lobes. By comparing the performance of the continuous mixture of exponential functions with that of the antisymmetric tensor spline, both presented in FIG. 16, one can conclude that they yield similar intensity values. This quantitatively demonstrates that in spite of the limitations of the 3^{rd}-order antisymmetric tensor spline model, we can still capture and approximate the shape of the underlying facial ABRDFs.
[077] In FIG. 2A, we present the novel images synthesized from the learnt ABRDF field using spline modulated spherical harmonics, which clearly demonstrate that photorealistic images can be generated by our model. Note the sharpness of the cast shadows in the last row. The presented technique is capable of both extrapolating and interpolating illumination directions from the sample images provided to it [see FIG. 2B]. In FIG. 3 (left), we present the estimated ABRDF field overlaid on a face, and in FIG. 3 (right), the method of the present invention can be seen to capture multiple bumps with varying sharpness to account for shadows and specularities. The ability of the method of the present invention to capture cast shadows and specularities in images is clearly demonstrated in FIG. 4.
[078] In FIG. 6, we present a set of images generated under novel illumination conditions of the target face [see 2^{nd} row and 2^{nd} column in FIG. 5] using just one image. It can be noted that the specularity of the nose tip and cast shadows have been captured to produce photorealistic results. Next, in FIG. 7, we present novel images of the same subject using three different reference faces. Discounting minor artifacts, it can be noted that these images are perceptually similar.
[079] In the next set of experiments, we demonstrate the robustness and versatility of the method of the present invention. First, we demonstrate that we can produce good results even when parts of the face in the target image are occluded [see FIG. 8]. This is accomplished by setting the intensity mapping quotient to unity and performing a histogram equalization in the occluded regions. The results show that our framework can handle larger occlusions than what was demonstrated recently by Wang et al. Second, even though we do not use any three-dimensional information, the technique of the present invention is capable of generating photorealistic images of faces in poses different from that of the reference face under novel illumination directions. At this stage, our framework can handle poses that differ by up to 12°. In FIG. 8, we look at the quantitative error introduced by our method as a function of the number of images used for the ABRDF field estimation. We compare the synthesized novel images to the ground truth images present in the Extended Yale B database. We observe that the quantitative error increases with the harshness of illumination direction, which we attribute to the lack of accurate texture information for extreme illumination directions.
[080] Finally, we present two sets of results for the application of the proposed techniques to face recognition. First, using a simple Nearest Neighbor classifier, we compare the results of our ABRDF estimation techniques using 9 sample images with those of existing techniques which use multiple (from 4 to 9) images [see TABLE I]. For this experiment, we assume that 9 gallery images with known illumination directions per person are available (from subsets 1 and 2). We estimate the ABRDF field of each face using the techniques described (Tensor Spline and Spline Modulated Spherical Harmonics), generate a number of images under novel illumination directions (defined on a grid), and then use all of them in our Nearest Neighbor classifier as gallery images. The results demonstrate that our technique can produce competitive results even when used with a naive classifier like Nearest Neighbor. To make the results comparable to the competing methods, we used the 10 subjects from the Yale B face database. Results were averaged over 5 independent runs of the recognition algorithm. The results pertaining to the other techniques are summarized in the publication "Acquiring linear subspaces for face recognition under variable lighting," K. Lee, J. Ho, and D. J. Kriegman, PAMI, 27(5):684-698, 2005.
Table I: Recognition results of various existing techniques on the Yale B Face Database.
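The recognition setup described above, in which images synthesized under novel illumination directions are pooled into the gallery of a Nearest Neighbor classifier, can be sketched as follows. The sketch assumes that images are flattened into equal-length intensity vectors and that squared Euclidean distance is the matching criterion; the description does not state the distance measure, so that choice and all function names are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length intensity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_gallery(images_by_subject):
    """Flatten {subject: [image vectors]} into a list of (vector, subject) pairs.

    The per-subject lists would hold the images synthesized from the
    estimated ABRDF field on a grid of illumination directions.
    """
    return [(vec, subject)
            for subject, vecs in images_by_subject.items()
            for vec in vecs]


def nearest_neighbor(probe, gallery):
    """Return the subject label of the gallery image closest to the probe."""
    _, label = min(gallery, key=lambda item: squared_distance(probe, item[0]))
    return label


def recognition_rate(probes, gallery):
    """Fraction of (vector, true_label) probes that are classified correctly."""
    correct = sum(1 for vec, label in probes
                  if nearest_neighbor(vec, gallery) == label)
    return correct / len(probes)
```

Because the classifier itself is unchanged, any gain in recognition rate under this setup comes from the wider illumination coverage of the synthesized gallery.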
[081] A second set of experiments demonstrates how the ABRDF transfer technique, which works with a single image, can be used to enhance various existing benchmark face recognition techniques [see TABLE 2]. For this, we make use of the fact that the performance of most recognition systems is improved when a better training set is present. We present results for Nearest Neighbor (NN), Eigenfaces and Fisherfaces, where we assume that only a single near-frontal illumination image of each subject is available in the gallery set. For ABRDF+NN, ABRDF+Eigenfaces and ABRDF+Fisherfaces, we use this single image to generate more images and then use all of them to train the classifiers. Experiments were carried out using 10 randomly selected subjects from the Extended Yale B Database. Results were computed using 3 different reference faces (other than the 10 selected subjects) over 5 independent runs each of the recognition algorithms and then averaged.
Table 2: Recognition results of various benchmark methods on the Extended Yale Face Database.
[082] Attached as Exhibits to the instant application, and incorporated by reference hereto are the following articles authored by the inventors herein:
Beyond the Lambertian Assumption: A generative model for Apparent BRDF fields of Faces using Anti-Symmetric Tensor Splines
From one to many: A generative model for face image synthesis under varying illumination
[083] Accordingly, it will be understood that embodiments of the present invention have been disclosed by way of example and that other modifications and alterations may occur to those skilled in the art without departing from the scope and spirit of the above description or appended claims.
Claims
Priority Applications (2)
Application Number  Priority Date  Filing Date  Title 

US5500208  20080521  20080521  
US61/055,002  20080521 
Publications (2)
Publication Number  Publication Date 

WO2009143163A2 (en)  20091126 
WO2009143163A3 (en)  20120426 
Family
ID=41340824
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

PCT/US2009/044533 WO2009143163A3 (en)  20080521  20090519  Face relighting from a single image 
Country Status (1)
Country  Link 

WO (1)  WO2009143163A3 (en) 
Cited By (2)
Publication number  Priority date  Publication date  Assignee  Title 

CN102163330A (en) *  20110402  20110824  西安电子科技大学  Multiview face synthesis method based on tensor resolution and Delaunay triangulation 
CN105447906A (en) *  20151112  20160330  浙江大学  Method for calculating lighting parameters and carrying out relighting rendering based on image and model 
Citations (4)
Publication number  Priority date  Publication date  Assignee  Title 

US20060280342A1 (en) *  20050614  20061214  Jinho Lee  Method and system for generating bilinear models for faces 
US20070014435A1 (en) *  20050713  20070118  Schlumberger Technology Corporation  Computerbased generation and validation of training images for multipoint geostatistical analysis 
US20070031028A1 (en) *  20050620  20070208  Thomas Vetter  Estimating 3d shape and texture of a 3d object based on a 2d image of the 3d object 
US7215802B2 (en) *  20040304  20070508  The Cleveland Clinic Foundation  System and method for vascular border detection 
Also Published As
Publication number  Publication date  Type 

WO2009143163A3 (en)  20120426  application 
Legal Events
Date  Code  Title  Description 

121  Ep: the epo has been informed by wipo that ep was designated in this application 
Ref document number: 09751391 Country of ref document: EP Kind code of ref document: A2 

NENP  Nonentry into the national phase in: 
Ref country code: DE 

122  Ep: pct application nonentry in european phase 
Ref document number: 09751391 Country of ref document: EP Kind code of ref document: A1 