IMAGE CAPTURE APPARATUS
This invention relates generally to image capture apparatus and, more particularly, to image capture apparatus including means for estimating lens distortion parameters from two or more images captured by a camera.
There are many circumstances, for example within the film industry, in which it may be required to reconstruct a three-dimensional scene which is observed in one or more captured images. For instance, it may be required to combine a three-dimensional video sequence of a scene captured within the real world with one or more three-dimensional virtual buildings or other objects incorporated therein.
The lens of an image capturing device is generally curved and, as such, an image captured therewith will exhibit a certain amount of "curvature" caused by lens distortion, which curvature does not substantially adversely affect the quality of the captured image itself, but causes the image not to obey the laws of perspective projection, and therefore needs to be accounted for. In order to achieve this, it is known in the art to calibrate the camera capturing the images of the scene so as to apply one or more lens distortion terms to the captured images and thus substantially eliminate, or at least reduce, the above-mentioned curvature therefrom. The result of this (if viewed in terms of a single frame) is the production of an essentially "perspective" image of the same scene in which the corners of the original image have been effectively "pushed out" or "pushed in", as illustrated in Figure 1 of the drawings.
The amount of curvature created in an image because of lens distortion may typically be represented by a single value, and the addition of such a single distortion term within the image processing function is known to significantly improve the results of scene reconstruction, particularly over long video sequences.
It should be noted that the largest body of work on the correction of lens distortion deals with camera precalibration, i.e. where the camera is calibrated offline (before the image sequence is captured therewith). However, the present invention is concerned primarily with the situation where the original camera lens is not available, for example in the case of archive footage, or when using variable lens geometries. Thus, the present invention is concerned with the problem of nonlinear lens distortion in the context of camera self-calibration and structure from motion, and in particular with the recovery of three-dimensional camera motion from two-dimensional point tracks where there is moderate to severe lens distortion.
Known techniques concerned with online estimation of lens distortion can be divided into two strategies. The first, known as the plumb line method, uses straight lines in a scene to provide constraints on the distortion parameters. However, straight lines are not always available in a scene and, when present, are not necessarily trivial to detect. As a result, extreme care must often be taken to ensure that real-world curves are not confused with distorted straight lines.
It is known in the art that lens distortion can be computed given two or more images of the same scene, captured from two or more different respective angles, and no other information; the second known method, bundle adjustment, involves the computation of the fundamental matrix of a pair of cameras. An example of this method is given in "On the Epipolar Geometry Between Two Images With Lens Distortion", Z. Zhang, Proceedings of the International Conference on Pattern Recognition (Proc. ICPR), 1996, in which lens distortion is considered as an integral part of the camera and the rigidity constraints or assumptions required to compute the fundamental matrix are extended to include the parameters of the distortion model. The epipolar geometry between two images with lens distortion is described, and it is established that, for a point in one image, its corresponding point in the other image should lie on the so-called epipolar curve. The paper then goes on to investigate the possibility of estimating the distortion parameters and the fundamental matrix based on this generalised epipolar constraint. However, experimental results with computer simulation have shown that the distortion parameters can only be estimated correctly using the disclosed techniques if the noise in the image points is low and the lens distortion is severe. Otherwise it is considered to be better to treat the camera(s) as being free of distortion.
Further, the above-described technique relies on iterative methods to find the distortion parameters. As is usual with such iterative methods, their convergence is not guaranteed, initial estimates must be found, and - although fast within this class of nonlinear techniques - they remain too slow to place in the inner loop of any hypothesise-and-test architecture.
Thus, some known systems are only able to estimate lens distortion parameters satisfactorily in the extreme conditions whereby noise in the image points is low and lens distortion is severe; otherwise lens distortion is ignored. However, in many cases, ignoring the lens distortion parameters produces unsatisfactory results and, if accurate camera information is required, there is considered to be no recourse but to bundle adjustment (also described in the chapter 'Bundle Adjustment: A Modern Synthesis', by B. Triggs, P. McLauchlan, R. Hartley and A. Fitzgibbon in Vision Algorithms: Theory and Practice, LNCS, Springer Verlag, 2000), initialised with reasonable estimates of camera geometry in the presence of lens distortion, which have until now necessarily been partially estimated manually by an expert before being inserted into one or more iterative algorithms for a more accurate calculation of these parameters. As a result, currently-known bundle adjustment techniques cannot be used to compute lens distortion parameters simultaneously with unknown camera motion and unknown scene geometry; instead the distortion parameters must be set by guesswork, which is slow, requires expert manual input and frequently fails altogether. We have now devised an arrangement which overcomes the problems outlined above.
In accordance with the present invention, there is provided image data processing apparatus, comprising means for receiving two or more images of a scene captured by one or more image capturing devices, means for identifying point correspondences between said captured images, means for determining a lens distortion parameter value corresponding to lens distortion exhibited in said captured images and adjusting said point correspondences according to said determined lens distortion parameter value such that
said point correspondences lie on substantially the same epipolar curve, said lens distortion parameter value being determined so as to satisfy an epipolar constraint equation defining said epipolar curve, the lens distortion parameter comprising an element of said epipolar constraint equation, the apparatus further comprising means for computing one or more suitable estimated lens distortion parameter values for insertion in said epipolar constraint equation.
In a preferred embodiment of the invention, the means for computing one or more suitable estimated lens distortion parameter values is arranged to compute said values using one or more design matrices, said one or more design matrices preferably being derived from two-dimensional point coordinates, beneficially obtained directly from the captured images.
Each point correspondence obtained directly from the captured images comprises a pair of image points (xᵢ, yᵢ) and (xᵢ′, yᵢ′), where i ranges from 1 to the number of point correspondences, m. Define rᵢ² := xᵢ² + yᵢ² and rᵢ′² := xᵢ′² + yᵢ′².
In one preferred embodiment, from each such point correspondence, three design matrix rows may be defined as follows:
Rᵢ(1) = [ xᵢ′xᵢ   xᵢ′yᵢ   xᵢ′   yᵢ′xᵢ   yᵢ′yᵢ   yᵢ′   xᵢ   yᵢ   1 ]

Rᵢ(2) = [ 0   0   xᵢ′rᵢ²   0   0   yᵢ′rᵢ²   xᵢrᵢ′²   yᵢrᵢ′²   rᵢ² + rᵢ′² ]

Rᵢ(3) = [ 0   0   0   0   0   0   0   0   rᵢ′²rᵢ² ]
These rows are assembled into m × 9 design matrices D₁, D₂, D₃ (i.e. Dα, α ∈ {1, 2, 3}), the i-th row of Dα being the row Rᵢ(α) defined above.
The three design matrices are inserted into a quadratic eigenvalue problem of the form:
(D₁ + λD₂ + λ²D₃) f = 0
where λ denotes the lens distortion parameter and f denotes a 9-vector containing the elements of the fundamental matrix. It is known how to solve such problems.
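By way of illustration only, and not forming part of the claimed subject matter, the following is a minimal sketch (in Python/NumPy, with an illustrative function name and array layout chosen purely for the example) of how the three m × 9 design matrices of the first embodiment might be assembled from measured point correspondences, assumed to be expressed relative to the distortion centre.

```python
import numpy as np

def design_matrices(pts1, pts2):
    """Assemble the m x 9 design matrices D1, D2, D3 of the first embodiment.

    pts1, pts2 : (m, 2) arrays of corresponding image points (x_i, y_i) and
    (x_i', y_i'), in coordinates whose origin is the distortion centre.
    """
    x, y = pts1[:, 0], pts1[:, 1]
    xp, yp = pts2[:, 0], pts2[:, 1]
    r2 = x**2 + y**2           # r_i^2
    rp2 = xp**2 + yp**2        # r_i'^2
    zero = np.zeros_like(x)
    one = np.ones_like(x)

    # Rows R_i(1), R_i(2), R_i(3), stacked for all i.
    D1 = np.column_stack([xp*x, xp*y, xp, yp*x, yp*y, yp, x, y, one])
    D2 = np.column_stack([zero, zero, xp*r2, zero, zero, yp*r2,
                          x*rp2, y*rp2, r2 + rp2])
    D3 = np.column_stack([zero, zero, zero, zero, zero, zero,
                          zero, zero, rp2*r2])
    return D1, D2, D3
```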
In a second embodiment, from each such point correspondence, three pairs of design matrix rows may be defined as follows:
Dᵢ(1) = [  0     0     0    −xᵢ′   −yᵢ′   −1      yᵢxᵢ′    yᵢyᵢ′    yᵢ
          xᵢ′   yᵢ′    1     0      0      0     −xᵢxᵢ′   −xᵢyᵢ′   −xᵢ ]

Dᵢ(2) = [  0       0       0          −rᵢ²xᵢ′   −rᵢ²yᵢ′   −(rᵢ² + rᵢ′²)   0   0    yᵢrᵢ′²
          rᵢ²xᵢ′  rᵢ²yᵢ′  rᵢ² + rᵢ′²    0         0           0           0   0   −xᵢrᵢ′² ]

Dᵢ(3) = [  0   0     0       0   0   −rᵢ²rᵢ′²   0   0   0
           0   0   rᵢ²rᵢ′²   0   0      0       0   0   0 ]
These pairs of rows are assembled into three design matrices D₁, D₂, D₃, each of size 2m × 9, as above. The three design matrices are inserted into a quadratic eigenvalue problem of the form:

(D₁ + λD₂ + λ²D₃) h = 0
where λ denotes the lens distortion parameter and h denotes a 9-vector containing the elements of the planar homography.
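Again purely by way of example, and under the same illustrative conventions as the sketch given for the first embodiment, the 2m × 9 design matrices of the second embodiment might be assembled as follows; here the planar homography H is taken to map points in the second view to points in the first.

```python
import numpy as np

def homography_design_matrices(pts1, pts2):
    """Assemble the 2m x 9 design matrices D1, D2, D3 of the second embodiment,
    two rows per point correspondence."""
    m = pts1.shape[0]
    x, y = pts1[:, 0], pts1[:, 1]
    xp, yp = pts2[:, 0], pts2[:, 1]
    r2, rp2 = x**2 + y**2, xp**2 + yp**2
    z, o = np.zeros(m), np.ones(m)

    D1, D2, D3 = (np.empty((2*m, 9)) for _ in range(3))
    D1[0::2] = np.column_stack([z, z, z, -xp, -yp, -o, y*xp, y*yp, y])
    D1[1::2] = np.column_stack([xp, yp, o, z, z, z, -x*xp, -x*yp, -x])
    D2[0::2] = np.column_stack([z, z, z, -r2*xp, -r2*yp, -(r2 + rp2), z, z, y*rp2])
    D2[1::2] = np.column_stack([r2*xp, r2*yp, r2 + rp2, z, z, z, z, z, -x*rp2])
    D3[0::2] = np.column_stack([z, z, z, z, z, -r2*rp2, z, z, z])
    D3[1::2] = np.column_stack([z, z, r2*rp2, z, z, z, z, z, z])
    return D1, D2, D3
```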
An embodiment of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram illustrating the effect of applying lens distortion correction to a captured image. In the following it will be shown that lens distortion can be computed given two or more images of the scene and no other information.
The following description of an exemplary embodiment of the present invention relates primarily to allowing the matching of image pairs via interest-point correspondences. As explained above, the most successful prior art techniques for matching interest points are based on the geometric constraints offered by multiple-view geometry. These tend to be effective because fast linear algorithms exist for the computation of the two-view relationships, allowing their computation to form the kernel of RANSAC-based matching algorithms. However, when images exhibit strong lens distortion, these constraints cannot be applied because the two-view relationships (fundamental matrix, planar homography) are not accurate in the image periphery.
Thus, an object of the present invention is to develop a model for the between-view relations which incorporates lens distortion. In particular, a model is required which admits a direct solution, i.e. computation from point correspondences via well understood, fast, globally convergent numerical algorithms such as singular value decomposition (SVD) or eigenvalue extraction, and the present invention is concerned with the calculation of one or more realistic estimates of camera geometry and correspondences for use in bundle adjustment techniques generally.
In the following, 2D points in non-homogeneous coordinates will be denoted by x = (x, y), and the same symbol will also be used for a general vector, including a 2D point in homogeneous coordinates, where the meaning is clear from the context. The data used in the algorithm employed in this exemplary embodiment of the invention comprises point correspondences between lens-distorted images. As the following deals almost entirely with two-view geometry, primes will be used to indicate a corresponding point in the second view. Thus, as input we have a set of two-view point correspondences, denoted x ↔ x′.
The image points observed will be distorted functions of some perspective pinhole points, which shall be denoted by p, so the image point x is the distorted version of the perfect point p. The present invention is only concerned with radial distortion, so that the relationship between x and p is dependent on their distances from the image centre. Throughout the following description, all these points are expressed in a 2D coordinate system with origin at the distortion centre. In the absence of any other information, one would fix the distortion centre at the centre of the image, which is considered to be a reasonable approximation.
Given that the distortion centre may be assumed known, it is known in the art how to write the distortion correction (e.g. C. Slama, "Manual of Photogrammetry, 4th Edition", American Society of Photogrammetry, Falls Church, VA, USA, 1980) in one of several ways.
The distortion models used in this description of an exemplary embodiment of the present invention are:

(1)    p = x / (1 + λ|x|²)

and:

(2)    p = x (1 + κ|x|²)
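Purely as an illustrative sketch (the function names are chosen for the example, and the distortion centre is either subtracted beforehand or supplied explicitly), the two corrections may be applied to a distorted image point as follows.

```python
import numpy as np

def undistort_division(x, lam, centre=(0.0, 0.0)):
    """Equation (1): map a distorted point x to the perfect pinhole point
    p = x / (1 + lam * |x|^2), with x taken relative to the distortion centre."""
    v = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    return v / (1.0 + lam * np.dot(v, v))

def undistort_polynomial(x, kappa, centre=(0.0, 0.0)):
    """Equation (2): the corresponding correction p = x * (1 + kappa * |x|^2)."""
    v = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    return v * (1.0 + kappa * np.dot(v, v))
```

To first order in the distortion the two models agree (with κ ≈ −λ), which is why either may be carried through the derivation below.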
In order to compute the fundamental matrix from perfect point correspondences p ↔ p′, an algorithm may be derived and used which can be modified to include λ, the distortion parameter, such that the fundamental matrix F may be computed from distorted, measurable points x ↔ x′, as follows.

A point correspondence in pinhole coordinates p ↔ p′ which corresponds to a real 3D point which has been imaged by a pair of cameras will satisfy the epipolar constraint. This is embodied in the fundamental matrix, F, for the pair of cameras:

p′ᵀ F p = 0    (3)
It is the task of the apparatus of this exemplary embodiment of the present invention to recover F from point correspondences. Writing p = (p, q, 1)ᵀ and p′ = (p′, q′, 1)ᵀ, and concatenating the rows of F into a nine-vector f, we may rewrite the above constraint as:

[ p′p   p′q   p′   q′p   q′q   q′   p   q   1 ] · f = 0
Collecting eight such rows into a design matrix D, we obtain an estimate for f by solving Df = 0. This estimate will be greatly improved by truncating the resulting matrix to rank 2.
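As an illustrative sketch only (the helper name and NumPy conventions are assumptions of the example, not part of the described apparatus), this linear estimate of F from perfect pinhole correspondences, including the rank-2 truncation, might be implemented as follows.

```python
import numpy as np

def fundamental_from_pinhole(p1, p2):
    """Estimate F from perfect correspondences p <-> p' (equation (3)).

    p1, p2 : (m, 2) arrays of pinhole points (p, q) and (p', q'), m >= 8.
    Each correspondence contributes one row of the design matrix D; Df = 0
    is solved by SVD and the estimate is truncated to rank 2.
    """
    p, q = p1[:, 0], p1[:, 1]
    pp, qp = p2[:, 0], p2[:, 1]
    D = np.column_stack([pp*p, pp*q, pp, qp*p, qp*q, qp, p, q, np.ones_like(p)])
    _, _, Vt = np.linalg.svd(D)
    F = Vt[-1].reshape(3, 3)   # right singular vector of the smallest singular value
    U, s, Vt2 = np.linalg.svd(F)
    s[2] = 0.0                 # enforce rank 2
    return U @ np.diag(s) @ Vt2
```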
In order to compute F from the known image coordinates x, we must express equation (3) in terms of x. Writing the distortion equation (1) projectively, we obtain:

p ~ (x, y, 1 + λ(x² + y²))ᵀ = x + λz,   where x = (x, y, 1)ᵀ and z = (0, 0, x² + y²)ᵀ

or, writing equation (2) projectively:

p ~ (x(1 + κ(x² + y²)), y(1 + κ(x² + y²)), 1)ᵀ = x + κz,   where z = (x(x² + y²), y(x² + y²), 0)ᵀ

where in both cases x and z are known (i.e. can be computed from the image coordinates alone). Then the epipolar constraint is:

(x′ + λz′)ᵀ F (x + λz) = 0

x′ᵀ F x + λ (z′ᵀ F x + x′ᵀ F z) + λ² z′ᵀ F z = 0

or:

(x′ + κz′)ᵀ F (x + κz) = 0

x′ᵀ F x + κ (z′ᵀ F x + x′ᵀ F z) + κ² z′ᵀ F z = 0
which is quadratic in λ (or κ) and linear in F. Indeed, expanding everything out, we obtain (with r = ||x|| and r′ = ||x′||):

   [ x′x   x′y   x′   y′x   y′y   y′   x   y   1 ] · f
+ λ [ 0   0   x′r²   0   0   y′r²   xr′²   yr′²   r² + r′² ] · f
+ λ² [ 0   0   0   0   0   0   0   0   r′²r² ] · f = 0
Gathering the three row vectors into three design matrices, we obtain the following quadratic eigenvalue problem (QEP):
(D₁ + λD₂ + λ²D₃) f = 0    (4)
Such problems are analogous to standard second-order ordinary differential equations (ODEs) (replace λ with a differential operator), and efficient numerical algorithms are already available in the art for their solution. Equation (4) has a maximum of 18 solutions, which can be used as the distortion parameters in the bundle adjustment estimation described above, with one of said solutions providing the closest approximation of the lens distortion. However, it has been found that there are at most only 10 solutions, and in practice probably no more than 6 solutions, that are real.
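For completeness, and again only as a hedged sketch, the quadratic eigenvalue problem (4) may be solved numerically by the standard companion linearisation and a generalised eigensolver (here SciPy's scipy.linalg.eig). The sketch assumes square 9 × 9 design matrices, as obtained from a minimal set of nine correspondences of the first embodiment; for larger sets one possible (approximate) reduction is to premultiply each Dα by D₁ᵀ before calling the routine. The candidate-selection criterion shown is likewise only one reasonable choice.

```python
import numpy as np
from scipy.linalg import eig

def solve_qep(D1, D2, D3):
    """Return the real, finite solutions (lam, f) of (D1 + lam*D2 + lam^2*D3) f = 0
    for square n x n matrices, via the companion linearisation
    [[0, I], [-D1, -D2]] u = lam [[I, 0], [0, D3]] u  with  u = [f; lam*f]."""
    n = D1.shape[0]
    L = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-D1,              -D2]])
    R = np.block([[np.eye(n),        np.zeros((n, n))],
                  [np.zeros((n, n)), D3]])
    w, V = eig(L, R)   # generalised eigenproblem; D3 may be rank-deficient
    sols = []
    for lam, u in zip(w, V.T):
        if np.isfinite(lam) and abs(lam.imag) <= 1e-8 * max(1.0, abs(lam.real)):
            f = np.real(u[:n])
            sols.append((lam.real, f / np.linalg.norm(f)))
    return sols

def pick_solution(D1, D2, D3, sols):
    """Select the candidate with the smallest algebraic residual."""
    return min(sols, key=lambda s: np.linalg.norm((D1 + s[0]*D2 + s[0]**2*D3) @ s[1]))
```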
The preceding analysis applies also to the estimation of a plane projective transformation between the images. In this case, each point correspondence adds two rows to the design matrices:
D₁ = [  0    0    0   −x′   −y′   −1     yx′    yy′    y
        x′   y′   1    0     0     0    −xx′   −xy′   −x ]

D₂ = [  0     0     0          −r²x′   −r²y′   −(r² + r′²)   0   0    yr′²
        r²x′  r²y′  r² + r′²     0       0         0          0   0   −xr′² ]

D₃ = [  0   0    0      0   0   −r²r′²   0   0   0
        0   0   r²r′²   0   0     0      0   0   0 ]
The analogous computation for the trifocal tensor leads to a cubic eigenvalue problem, which is again readily solved.
Thus, in summary, the present invention extends the uncalibrated estimation of geometry from multiple images to include a correction for lens distortion. Its main contribution is considered to be a linear algorithm for the simultaneous estimation of this model and the fundamental matrix. All conventional algorithms for this purpose are iterative.
The present invention enables images which exhibit lens distortion to be matched with the same ease as those which accurately fit the pinhole model. Furthermore, it is possible to use the distortion-aware model to match even low distortion images without overfitting.
A specific embodiment of the present invention has been described herein by way of example only and it will be apparent to a person skilled in the art that modifications and variations may be made to the described embodiment without departing from the scope of the invention.