AU748148B2 - A method for single camera range finding using a wavelet decomposition
- Publication number
- AU748148B2 (Application AU24870/01A)
- Authority
- AU
- Australia
- Prior art keywords
- images
- level
- error criterion
- range
- lower level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Landscapes
- Image Processing (AREA)
- Studio Devices (AREA)
Description
S&F Ref: 541241
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan

Actual Inventor(s): Julian Frank Andrew Magarey

Address for Service: Spruson Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000

Invention Title: A Method for Single Camera Range Finding Using a Wavelet Decomposition

Associated Provisional Application Details: [33] Country: AU; [31] Applic. No(s): PQ6303; [32] Application Date: 16 Mar 2000

The following statement is a full description of this invention, including the best method of performing it known to me/us:-

A METHOD FOR SINGLE CAMERA RANGE FINDING USING A WAVELET DECOMPOSITION
Technical Field of the Invention

The present invention relates generally to automatic scene analysis and, in particular, to the estimation of object range by means of a single-lens image sensor.
Background Art

A two-dimensional image of a three-dimensional world scene is formed in a standard camera by a process of projection, which means the third dimension of range is lost during acquisition. The aim of range estimation is to restore this lost dimension and thus enable a 3D representation of the scene being captured.
Range estimation methods are generally classified into two categories, namely active and passive range sensing. Active range sensing involves broadcasting some kind of exploratory signal into a scene. Reflections from objects within the scene are then analysed to gain an understanding of the scene structure. An example of active range sensing is the ultrasonic technique employed by bats. Passive sensors, by contrast, analyse the scene based only on the signals already present, for example ambient light. The human visual system falls into the passive sensing category. Passive sensing has the potential advantage of cost and practicality, but is heavily dependent on the presence of sufficient intrinsic detail in the scene. A benefit of passive sensing is the ability of light to reflect minute properties of the object or scene under analysis.
Restoring the range information from a single image under passive sensing is possible, but this requires prior knowledge of the content of the scene captured in the image. In the absence of such knowledge, a differential method may be employed. The image of the scene varies with the relative position and intrinsic parameters of the camera used for capturing the image, as well as with the range (as a function of image plane coordinates, also termed a range map). If a detailed model for the expected mode of variation can be constructed, the range map may be inferred from two (or more) images, captured with different camera parameters, in a manner independent of the actual scene content.
Several parameters of the camera are potentially controllable for this purpose. If the camera position is fixed while the focus settings are varied, the resulting image processing algorithm is known as depth from defocus (DFD). If the focus settings are fixed while the camera position is varied, the algorithm may be referred to as depth from stereo (DFS).
Historically these two approaches were treated separately, but more recently their common basis in geometric optics has been recognised and analysed for comparison purposes.
However, a truly general treatment encompassing both algorithms has not yet been proposed. This may be because of the different optical set-up required in each case: the DFD algorithm typically uses a single camera, whereas the DFS algorithm typically requires two (or more) cameras.
Disclosure of the Invention

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the invention, there is provided a method of determining range information from a plurality of images of a visual scene, said plurality of images being images of said visual scene captured using a single-lens camera with different optical configurations, said method comprising the steps of: calculating each of a first predetermined number of levels of multi-level Gabor Transforms for each of said plurality of images; determining an error criterion from level N Gabor Transforms of said images; refining said error criterion using a lower level of said Gabor Transforms of said images to form a refined error criterion; and determining said range information from said refined error criterion.
According to another aspect of the invention, there is provided an apparatus for determining range information from a plurality of images of a visual scene, said plurality of images being images of said visual scene captured using a single-lens camera with different optical configurations, said apparatus comprising: means for calculating each of a first predetermined number of levels of multi-level Gabor Transforms for each of said plurality of images; means for determining an error criterion from level N Gabor Transforms of said images; means for refining said error criterion using a lower level of said Gabor Transforms of said images to form a refined error criterion; and means for determining said range information from said refined error criterion.
Brief Description of the Drawings

An embodiment of the present invention will now be described with reference to the drawings, in which:

Fig. 1 is a schematic block diagram of a single-camera passive range finding system;

Fig. 2 is a schematic diagram showing the geometry of the optical system of the single-camera passive range finding system in Fig. 1;

Figs. 3A to 3C show mask offsets that may be used;

Fig. 4 represents the image sampling grids defined by each level of the wavelet transform;

Fig. 5 is a graphical representation of the processing elements of the general Complex Discrete Wavelet Transform;

Fig. 6 is a graphical representation of the processing elements of the Complex Discrete Wavelet Transform in accordance with the preferred implementation;

Fig. 7 is a graphical representation of the processing stages of the range-finding algorithm from two images captured by the single-camera passive range finding system in Fig. 1; and

Fig. 8 is a schematic diagram showing the "region of accuracy" of the combined system of Fig. 1 and the image processing algorithms of Fig. 7.
Detailed Description including Best Mode

A problem of range map estimation in the unified framework may be reduced to finding the (space-varying) parameters of a linear-space-variant transformation between two images. A linear-space-invariant transformation is readily modelled as a multiplication in the Fourier transform domain, so localised versions of the Fourier transform, such as the Gabor transform, have often been utilised to estimate the space-variant transformation in both DFD and DFS applications. To allow sufficient local adaptivity in the range map, high spatial resolution is needed, but this increases the computational requirements of previous Gabor-based algorithms beyond practical limits.
A multi-resolution Gabor-like image decomposition based on the Complex Discrete Wavelet Transform (CDWT) is therefore proposed. The multi-resolution nature of the CDWT lends itself naturally to a coarse-to-fine solution strategy. Under this strategy, a low-resolution estimate of the range map is found using large-scale features of a scene. The estimated map is gradually refined and condensed by appropriately incorporating information at progressively finer scales. This allows for sensible range estimates in regions lacking intrinsic texture, a weakness of most passive range estimation strategies. In addition, the proposed algorithm allows accurate estimates of range maps with rapid variations; even discontinuities in range at the edges of objects may be handled reasonably well.
Fig. 1 illustrates a single-camera passive range finding system 10. The range finding system 10 includes an optical system 200 and a processor module 100. Reflected light from an object 300 passes along an optical axis 400, through a programmable spatial light modulator (SLM) 210 and an image forming device 220 such as a lens or compound lens, and is projected onto an image capture mechanism 230. The programmable SLM 210 is preferably placed on a principal plane 240 of the optical system 200 and centred on the optical axis 400. The image capture mechanism 230 consists of a 2D array of photosensitive elements. The processor module 100 controls the programmable SLM 210 and analyses captured images from the image capture mechanism 230.
The processor module 100 typically includes at least one processor unit 114, a memory unit 118, for example formed from semiconductor random access memory (RAM) and read only memory (ROM), input/output (I/O) interfaces including an image interface 122 for receiving image data from the image capture mechanism 230, and an I/O interface 116 for controlling the programmable SLM 210. A storage device 120 is also provided. The components 114 to 120 of the processor module 100 typically communicate via an interconnected bus 130 and in a manner which results in a conventional mode of operation of the processor module 100 known to those in the relevant art.
In a preferred implementation, the processor module 100 is implemented together with the optical system 200 in a camera (not illustrated). Alternatively, a conventional computer (not illustrated) may be connected to the optical system 200 for controlling the programmable SLM 210 and analysing captured images from the image capture mechanism 230.
The processor module 100 programs the programmable SLM 210 with a desired function of two-dimensional position. Referring to Fig. 2, the programmable SLM 210 imposes the chosen function as a mask function M_1(X) on the optical system 200 over a wavelength operating range of the programmable SLM 210. The mask function M_1(X) is realised as a template mask function M(X) with a known transverse offset τ_1 perpendicular to the optical axis 400. The wavelength operating range of the programmable SLM 210 should at least cover the wavelength range of ambient light. A photosensitive array 230 at a distance d_1 from the principal plane 240 is also set.
A projected image g_1 (not illustrated) of a scene including the object 300 is captured using the photosensitive array 230, and stored in the memory unit 118 or storage device 120 of the processor module 100. The image g_1 may be captured in either monochrome or colour. The processor module 100 then programs the programmable SLM 210 with a second mask function M_2(X), realised as the template mask function M(X) with a known transverse offset τ_2 perpendicular to the optical axis 400. In addition, the photosensitive array 230 is set at a distance d_2 from the principal plane 240. The photosensitive array distance d_2 may be different to the photosensitive array distance d_1. A second image g_2 is captured using the photosensitive array 230, and stored in the memory unit 118. In the preferred implementation only two images g_1 and g_2 are used, but this process may be repeated as many times as desired, with each image g_j having associated optical system parameters, namely a photosensitive array distance d_j and a transverse offset τ_j. Figs. 3A to 3C show preferred arrangements of mask offsets τ_j for capturing 2, 3 and 4 images respectively. By symmetrically spacing the mask offsets τ_j around the origin 410, a maximum average baseline length is achieved, which improves resolution.
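As an illustration of such symmetric spacing, the following is a minimal sketch in Python (the patent itself contains no code); the function name mask_offsets and the radius parameter are illustrative assumptions, not part of the specification:

```python
import numpy as np

def mask_offsets(J, radius):
    """Place J mask offsets tau_j symmetrically around the SLM origin.

    For J = 2, 3 and 4 this reproduces the kind of symmetric arrangements
    shown in Figs. 3A to 3C (a diametric pair, an equilateral triangle and
    a square), which maximises the average baseline between captures.
    """
    angles = 2.0 * np.pi * np.arange(J) / J
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

print(mask_offsets(3, 1.0))  # three offsets at 120 degree spacing
```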
Since the programming of the SLM 210 is substantially instantaneous, in principle the interval between captures of the successive images g_j is typically limited only by the scanning and storage period of the photosensitive array 230. This allows for the minimisation of errors caused by motion in the scene. Multiple images add robustness to the range estimation of a static scene, at the obvious risk that the scene does not remain static over the period of capture.
The mask functions M_j(X) loaded into the SLM 210 are all realised as the template mask function M(X), but all have different transverse offsets τ_j, where X is the coordinate vector in the plane of the SLM 210. Therefore, the following applies:

M_j(X) = M(X - \tau_j)    (1)

The template mask function M(X) may take any convenient form, but in the preferred implementation, because of the simplicity of the mathematics, a non-normalised Gaussian function is used:

M(X) = \exp\left(-\frac{X^T X}{2\gamma^2}\right)    (2)

where γ is a mask standard deviation.
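A minimal sketch of Equations (1) and (2), evaluating one offset Gaussian mask on a grid of SLM-plane coordinates; the grid extent and the values of tau and gamma below are arbitrary illustrative choices:

```python
import numpy as np

def template_mask(X, gamma):
    """Non-normalised Gaussian template mask M(X) of Equation (2)."""
    return np.exp(-np.sum(X * X, axis=-1) / (2.0 * gamma**2))

def mask_function(X, tau, gamma):
    """Offset mask M_j(X) = M(X - tau_j) of Equation (1)."""
    return template_mask(X - tau, gamma)

# Evaluate one offset mask over the SLM plane.
xs = np.linspace(-4.0, 4.0, 65)
X = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)  # shape (65, 65, 2)
M1 = mask_function(X, tau=np.array([1.0, 0.0]), gamma=1.5)
```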
The effect of the mask function M_j(X) is to impose a known point spread function (PSF) h_j(x) on the optical system 200. The PSF h_j(x) is defined as the image g_j of a point source on the optical axis 400 with range Z, formed on the photosensitive array 230 at distance d_j from the principal plane 240, and is a dilated, offset copy of the mask:

h_j(x) = M\left(\frac{x}{a_j \beta} - \tau_j\right)    (3)

where x is the coordinate vector in the plane of the photosensitive array 230. The dilation factor a_j from the principal plane 240 to the plane of the photosensitive array 230 and the dilation factor β from the plane of the SLM 210 to the principal plane 240 are given by

a_j = 1 - d_j\left(\frac{1}{f} - \frac{1}{Z}\right)    (4)

and

\beta = \frac{Z}{Z - \Delta}    (5)

where the longitudinal offset Δ is the distance towards the object of the programmable SLM 210 from the principal plane 240, and f is the focal length of the lens 220. The captured image g_j is formed as a weighted sum of point images:

g_j(x) = \iint f_j(x - t)\, h_j(t)\, dt    (6)

where f_j(x) is a perfectly focussed image as would be formed by a pinhole camera with focal length d_j. Note that the PSF h_j(x) of Equation (3) varies over the plane of the photosensitive array 230, because of the variation of the range Z of the imaged points, which in turn means the dilation parameters a_j and β vary over the image g_j. Equation (6) reduces to a simple convolution only when all objects in the scene have a constant range Z, thereby all being in the object plane 310. If it is assumed that this indeed is the case, then Equation (6) may be expressed in the Fourier transform domain as:

G_j(u) = F_j(u)\, H_j(u)    (7)

where u is the spatial frequency vector.
It may be shown that the focussed images f_j(x) differ only by a dilation factor K_j related to the absolute magnification of the optical system 200. This is defined as the height of the image of an object of unit size, and is given by:

K_j = \frac{d_j(1 - \Delta/f) + \Delta}{Z - \Delta}    (8)

The focussed images f_j(x) satisfy

f_j(x) = f_1\left(\frac{K_1}{K_j}\, x\right)    (9)

With the photosensitive array distances d_j chosen appropriately, the magnification differences between the focussed images f_j(x) may be reduced to insignificance. In the preferred implementation, the longitudinal offset Δ is set equal to the focal length f of the lens 220. From Equation (8), the dilation factor K_j is then the same for all images g_j. This is known as telecentric optics. Alternatively, it should be ensured that the magnification changes very little; changes to the dilation factor K_j should not exceed 1% over the set of images g_j. This condition ensures that the effect of the scene content cancels out and the following holds for all images g_j, for each j ≥ 2:

\frac{G_j(u)}{G_1(u)} = \frac{H_j(u)}{H_1(u)}    (10)

If the magnification changes significantly between images g_j, each image g_j may be resampled by a factor of K_1/K_j to form a resampled image g'_j before proceeding as described below. The only effect the resampling has is that the dilation factor a_j is replaced by a scaled dilation factor a'_j defined as:

a'_j = \frac{K_1}{K_j}\, a_j    (11)

Therefore, dependent on whether image g_j is thus resampled, either the dilation factor a_j or the scaled dilation factor a'_j would be used in what follows.
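The dilation factors and magnification can be computed directly from Equations (4), (5) and (8) (the latter as reconstructed above). The sketch below also demonstrates the telecentric property, that with Δ = f the magnification is independent of the sensor distance d_j; all numerical values are illustrative:

```python
def optical_factors(Z, d_j, f, delta):
    """Dilation factors and magnification for one capture."""
    a_j = 1.0 - d_j * (1.0 / f - 1.0 / Z)                  # Equation (4)
    beta = Z / (Z - delta)                                 # Equation (5)
    K_j = (d_j * (1.0 - delta / f) + delta) / (Z - delta)  # Equation (8)
    return a_j, beta, K_j

# Telecentric check: with delta == f, K_j is the same for every d_j.
for d_j in (1.00, 1.02, 1.05):
    print(optical_factors(Z=50.0, d_j=d_j, f=1.0, delta=1.0))
```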
The right-hand side of Equation (10) involves the point spread functions H_j, which are derived from the (known) mask functions M_j(X), with dependence on the range Z and the (known) optical parameters: the longitudinal offset Δ, the focal length f of the lens 220 and the photosensitive array distances d_j. The left-hand side of Equation (10) depends only on the captured images g_j. Equation (10) is therefore a constraint sufficient to form an independent range estimate at each spatial frequency. A single estimate for the range Z may be found by a maximum-likelihood formulation over all spatial frequencies u. The maximum likelihood estimate of range may be found as follows:

1. Form an error criterion based on Equation (10):

E(Z) = \frac{2}{N} \sum_{j=2}^{J} \sum_{u} \left| \log\frac{G_j(u)}{G_1(u)} - \log\frac{H_j(u)}{H_1(u)} \right|^2 w_j(u)    (12)

where: the weight function w_j(u) is inversely proportional to the variance of the first log term when the captured images are subject to noise of a given statistical model; N/2 is the number of frequency channels; and the logs are complex-valued quantities (hence the absolute value sign).

2. From Equation (12), find the location Z_0 of the minimum of the error criterion E using standard minimisation methods. This value Z_0 is the maximum likelihood estimate of the true range Z.

The above two steps form the essence of the general Fourier-domain depth from defocus method of range estimation.

It may be shown that, if the correct weighting function w_j(u) is used, the variance of the range estimate Z_0 is proportional to the variance σ² of the additive noise. The constant of proportionality, known as the sensitivity, is given by:

\frac{\sigma_{Z_0}^2}{\sigma^2} \propto \frac{1}{E''(Z_0)}    (13)

The inverse of the sensitivity may be used as a scalar measure of confidence in the range estimate.
Two distinct special cases of this general approach, which may be solved analytically, are now described. The two special cases are called variable-distance and variable-position depth from defocus (DFD).
1. VD (variable-distance) -DFD.
In this case, the mask function M(X) is Gaussian and the transverse offset τ_j remains the same for all captured images g_j, i.e. the mask remains fixed for all image captures. Combining Equations (2) to (5) with Equation (12), the error criterion E_VD of Equation (12) may be written as:

E_{VD}(Z) = \frac{2}{N} \sum_{j=2}^{J} \sum_{u} \left[ \Lambda_j(u) - \frac{(a_1^2 - a_j^2)\,\beta^2 \gamma^2 |u|^2}{2} \right]^2 w_j(u)    (14)

where

\Lambda_j(u) = \log\left|\frac{G_j(u)}{G_1(u)}\right| \quad \text{and} \quad w_j(u) = \frac{|G_1(u)|^2 |G_j(u)|^2}{|G_1(u)|^2 + |G_j(u)|^2}    (15)

is the weight factor derived from an additive white Gaussian noise model.

The error criterion E_VD may be written as a function of s, where

s = \frac{z - f}{f(z - \Delta)}    (16)

Using this change of variable, together with Equations (4) and (5), the error criterion E_VD becomes:

E_{VD}(s) = \frac{2}{N} \sum_{j=2}^{J} \sum_{u} \left[ \Lambda_j(u) - \frac{\gamma^2 |u|^2}{2}\, s\,(d_j - d_1)\left(2\xi - s\,(d_j + d_1 + 2\xi\Delta)\right) \right]^2 w_j(u)    (17)

where ξ = f/(f - Δ).

Since E_VD(s) is a quartic (polynomial of degree 4) in s, there exists a closed form solution for its minimising argument over any given interval. Inverting Equation (16), the minimising argument s_0 is related to the range estimate Z_0 as follows:

Z_0 = \frac{f(1 - s_0 \Delta)}{1 - f s_0}    (18)

The sensitivity, following Equation (13), is given by

\frac{\sigma_{Z_0}^2}{\sigma^2} \propto \left(\frac{dZ}{ds}\right)^2\bigg|_{s = s_0} \frac{1}{E''(s_0)}    (19)

The inverse of the sensitivity may be used as a scalar measure of confidence in the range estimate Z_0.
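Because the derivative of a quartic is a cubic with closed-form roots, the minimisation can be sketched as below; the quartic coefficients of E_VD(s) in Equation (17) are assumed to have been expanded beforehand:

```python
import numpy as np

def minimise_quartic(coeffs, s_lo, s_hi):
    """Minimise a quartic over [s_lo, s_hi] via the roots of its derivative.

    coeffs are polynomial coefficients, highest degree first (np.polyval
    convention). The minimiser is either a real stationary point inside the
    interval or one of the end points.
    """
    candidates = [s_lo, s_hi]
    for r in np.roots(np.polyder(coeffs)):
        if abs(r.imag) < 1e-12 and s_lo <= r.real <= s_hi:
            candidates.append(r.real)
    values = [np.polyval(coeffs, s) for s in candidates]
    return candidates[int(np.argmin(values))]

def vd_range(s0, f, delta):
    """Invert the change of variable: Equation (18)."""
    return f * (1.0 - s0 * delta) / (1.0 - f * s0)
```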
2. VP (variable-position) -DFD.
In this special case the photosensitive array distance d_j is kept unchanged (d_j = d) for all captured images g_j. From Equation (8), this implies that the magnification or dilation factor K_j is also constant for all images g_j, thereby eliminating the need for prior resampling. Combining Equations (2) to (5) with Equation (12), the error criterion E_VP of Equation (12) may be written as:

E_{VP}(Z) = \frac{2}{N} \sum_{j=2}^{J} \sum_{u} \left[ \phi_j(u) - a\beta\, u^T(\tau_1 - \tau_j) \right]^2 w_j(u)    (20)

where φ_j(u) is the phase angle (in the range (-π, π]) of the ratio G_j(u)/G_1(u), and w_j(u) is defined as in Equation (15). Again it is convenient to change variables, this time using

s = \frac{z(f - d) + fd}{f(z - \Delta)} = a(z)\,\beta(z)    (21)

to obtain

E_{VP}(s) = \frac{2}{N} \sum_{j=2}^{J} \sum_{u} \left[ \phi_j(u) - s\, u^T(\tau_1 - \tau_j) \right]^2 w_j(u)    (22)

Since E_VP(s) is a quadratic equation in s, there exists a simple closed form solution for its minimising argument over any given interval (a minimal numerical sketch is given after the windowing discussion below). The minimising argument s_0 is related to the range estimate Z_0 by inverting Equation (21):

Z_0 = \frac{\Delta s_0 + d}{s_0 + d/f - 1}    (23)

The sensitivity is given, following Equation (13), by

\frac{\sigma_{Z_0}^2}{\sigma^2} \propto \left(\frac{dZ}{ds}\right)^2\bigg|_{s = s_0} \frac{1}{E''(s_0)}    (24)

In the foregoing, it was assumed that the scene is in the object plane 310, which is perpendicular to the optical axis 400. In order to handle scenes of varying depth Z, localised versions of the above algorithms are needed. A known method is to partition the images g_j into (possibly overlapping) tiles and carry out the computation independently on each tile, resulting in a field of range estimates with spacing equal to that of the tile centres. For reasons related to the theory of discrete Fourier transforms (DFTs), a "window function" is usually applied to each tile before the DFT is applied. The effect of the window, which is normally a smoothly varying function symmetrical about the tile centre, is to de-emphasise the contribution of those image samples near the edge of the tile to the DFT. If the chosen window function is Gaussian, the resulting transform is known as the Gabor transform, an example of a space-frequency decomposition of an image g(t):

G(x, u) = \int g(t)\, \psi(x - t, u)\, dt    (25)

where

\psi(x, u) = \exp\left(-\tfrac{1}{2}\, x^T \Lambda(u)^{-1} x\right) \exp(i u^T x)    (26)

with Λ(u) the Gaussian window covariance matrix.
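As flagged above, here is a minimal numerical sketch of the closed-form VP-DFD minimisation of Equation (22); the arrays phase, b and w are assumed to hold, per frequency channel, the measured phase differences φ_j(u), the factors u^T(τ_1 - τ_j) and the weights w_j(u):

```python
import numpy as np

def vp_dfd_solve(phase, b, w):
    """Weighted least-squares minimiser of the quadratic E_VP(s)."""
    s0 = np.sum(w * phase * b) / np.sum(w * b**2)
    curvature = 2.0 * np.sum(w * b**2)  # E''(s0), for the confidence measure
    return s0, curvature

def vp_range(s0, f, d, delta):
    """Map s0 to a range estimate via Equation (23), as reconstructed above."""
    return (delta * s0 + d) / (s0 + d / f - 1.0)
```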
Prior art methods have been devoted to choosing the optimal characteristics of the Gabor transform for the purposes of passive range estimation. The parameters determine the shape and size of the Gaussian window, encapsulated in the symmetric 2 by 2 covariance matrix and the relative density of spatial and frequency sampling.
The aim is to ensure that the Gabor transform G_j(x,u) of the image g_j satisfies a localised equivalent of Equation (10) as follows:

\frac{G_1(x,u)}{G_j(x,u)} = \frac{H_1(x,u)}{H_j(x,u)}    (27)

In the same manner in which Equation (10) leads to Equation (12), Equation (27) leads to an error criterion separately defined at each sampling point x. Each error criterion E(x,s) can be analytically minimised (in the VD- and VP-DFD implementations) with respect to s by the methods described above, to yield a range estimate Z(x) of the scene portion corresponding to the image region around x, and a corresponding confidence measure. The set of range estimates Z(x) is also known as a range map.
The choice of Gabor parameters involves design trade-offs, which include:

- High spatial resolution, required for high-density range maps, requires a small spacing of window centres. This increases the computation requirements exponentially;

- Spatial adaptivity, a measure of how rapidly the estimated range map Z(x) may change to handle discontinuities in the scene, is aided by low overlap between windows, and therefore small window size compared to spacing. However, as the window size is decreased, the risk that no useful image texture is contained in the window increases, and so therefore does the susceptibility of the range estimates to noise; and

- Adherence to Equation (27) requires that the Gabor analysis use narrow-band frequency channels. In the global Fourier transform the channels have zero width. However, the uncertainty principle means that narrow frequency channels and small spatial window sizes are conflicting goals.
In known Gabor-based DFD methods, these trade-offs have been handled by using Gabor channels whose spatial extent varies inversely with their centre frequency. In such a paradigm the low-frequency channels have large window sizes and consequently narrow bandwidth, while the high-frequency channels have small window sizes and broad bandwidth. The choice of filters, however, has remained somewhat arbitrary, and the computational requirements somewhat heavy. The problem of achieving spatial adaptivity remains.
These deficiencies are addressed by the use of the Complex Discrete Wavelet Transform (CDWT) to implement the variable-window Gabor analysis efficiently and systematically. In addition, the proposed algorithm uses a coarse-to-fine estimation strategy that meshes naturally with the multi-resolution nature of the wavelet transform.
Referring to Fig. 5, the CDWT 500 is shown for two levels. Subsequent levels are performed by repeating the level 2 processing on the intermediate outputs. Each processing block comprises 1-dimensional convolutions by FIR filters b_0 and b_1, followed by downsampling by 2. The FIR filters b_0 and b_1 are discrete Gabor filters.
Columns of an image g are filtered, in parallel, by both the "lowpass" filter b_0 and the "highpass" filter b_1. The filters b_0 and b_1 are followed by a downsampling by 2 (not illustrated) in the same direction. Rows of the resulting images 510 and 520 are then filtered by the lowpass filter b_0 and the highpass filter b_1, as well as by the complex conjugates of the lowpass filter b_0 and the highpass filter b_1, again followed by a downsampling (not illustrated) in the same direction, to form eight images. The eight images comprise 2 images 515 and 525, which are lower resolution, complex-valued versions of the input image g, and 6 subimages D^(1,1), ..., D^(6,1). This concludes the level 1 CDWT processing.

Level 2 CDWT processing operates in a similar manner, in that columns of images 515 and 525 from the level 1 CDWT are each filtered by both the lowpass filter b_0 and the highpass filter b_1, again followed by a downsampling (not illustrated) in the same direction. The rows of the resulting images 530 and 540 from image 515 are then filtered by both the lowpass filter b_0 and the highpass filter b_1, followed by a downsampling in the same direction, to form image 535 and subimages D^(1,2), D^(2,2) and D^(3,2). In a similar manner, the rows of the resulting images 550 and 560 from image 525 are filtered by both the complex conjugate of the lowpass filter b_0 and the complex conjugate of the highpass filter b_1, followed by a downsampling in the same direction, to form image 555 and subimages D^(4,2), D^(5,2) and D^(6,2). A level 3 CDWT (not illustrated) may be added in a similar manner as level 2 by further processing images 535 and 555, to form again 2 images (not illustrated) and 6 subimages D^(1,3), ..., D^(6,3). The level 2 structure may be repeated l_max - 1 times, where l_max is the maximum pyramid level, decided prior to processing in a manner described below.
In the preferred implementation, the CDWT 501 of Fig. 6 is used. In addition to the FIR filters b_0 and b_1, the first level of the CDWT 501 also utilises an FIR filter f. The preferred filter coefficients are given in Table 1. The FIR filters b_0 and b_1 are sampled Gabor functions having harmonic frequencies π/6 and 5π/6 respectively, and having a Gaussian envelope with standard deviation of 1.27.
n | f(n) | b_0(n) | b_1(n)
---|---|---|---
-4 | 0 | -0.0023 - 0.0084i | 0.0023 - 0.0084i
-3 | 0 | 0.0145 - 0.0543i | 0.0145 + 0.0543i
-2 | -0.0257 - 0.0009i | 0.1373 - 0.1373i | -0.1373 - 0.1373i
-1 | -0.0014 - 0.4247i | 0.3486 - 0.0934i | 0.3486 + 0.0934i
0 | 1.0620 + 0.0000i | 0.3486 + 0.0934i | -0.3486 + 0.0934i
1 | -0.0014 + 0.4247i | 0.1373 + 0.1373i | 0.1373 - 0.1373i
2 | -0.0257 + 0.0009i | 0.0145 + 0.0543i | -0.0145 + 0.0543i
3 | 0 | -0.0023 + 0.0084i | -0.0023 - 0.0084i

Table 1: Filter coefficients for implementing the preferred CDWT (the signs joining real and imaginary parts, lost in the print, are reconstructed from the stated Gabor form of the filters).
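One level of the separable filter-and-downsample structure of Fig. 5 can be sketched as follows; filt_down is an illustrative helper name, and b0 and b1 would be the complex taps of Table 1:

```python
import numpy as np

def filt_down(x, h, axis):
    """Convolve along one axis with FIR filter h, then downsample by 2."""
    y = np.apply_along_axis(lambda v: np.convolve(v, h, mode="same"), axis, x)
    return np.take(y, np.arange(0, y.shape[axis], 2), axis=axis)

def cdwt_level(image, b0, b1):
    """One CDWT level: column filtering by b0/b1, then row filtering by
    b0, b1 and their complex conjugates, with downsampling throughout.

    Of the eight outputs, two are the lower-resolution complex images
    passed to the next level and six are the oriented subimages D^(n,l).
    """
    lo = filt_down(image, b0, axis=0)   # "lowpass" column branch
    hi = filt_down(image, b1, axis=0)   # "highpass" column branch
    next_images = (filt_down(lo, b0, axis=1), filt_down(lo, np.conj(b0), axis=1))
    subimages = (
        filt_down(lo, b1, axis=1), filt_down(lo, np.conj(b1), axis=1),
        filt_down(hi, b0, axis=1), filt_down(hi, np.conj(b0), axis=1),
        filt_down(hi, b1, axis=1), filt_down(hi, np.conj(b1), axis=1),
    )
    return next_images, subimages
```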
The effect of the CDWT is to efficiently implement an analysis filter bank consisting of a hierarchy of 2-D Gabor filters. The outputs of the filters at level l of the hierarchy are effectively downsampled by 2^l in each direction (horizontal and vertical), wherein l may range from 1 to l_max. The result is a set of complex-valued sub-images effectively obtained as follows:

D^{(n,l)}(x) = \sum_{k} g(k)\, \psi_{n,l}(2^l x - k)    (28)

wherein n ∈ {1, ..., 6}. The six wavelet filters ψ_{n,l} at level l may be written in the form of Equation (26). They are distinguished by their characteristic orientations, which are approximately evenly spaced between 0° and 180°. Each wavelet filter ψ_{n,l}(x) emphasises image features and textures of the image g that are aligned with its characteristic orientation. Table 2 contains the horizontal and vertical components of the characteristic spatial frequencies u_{n,1} for level 1. The values of c_1 and c_2 for the preferred implementation of the CDWT are π/4 and 3π/4 respectively.
The order of row and column processing in the CDWT may be reversed, so that, within each level, column processing follows row processing. The algorithm is unchanged, except that the negative signs of the horizontal components of u_{n,1} for n = 4, 5 and 6, as shown in Table 2, are transferred to the corresponding vertical components of u_{n,1}.

n | horizontal component of u_{n,1} | vertical component of u_{n,1}
---|---|---
1 | c_1 | c_2
2 | c_2 | c_2
3 | c_2 | c_1
4 | -c_2 | c_1
5 | -c_2 | c_2
6 | -c_1 | c_2

Table 2: Spatial frequencies u_{n,1} for the six wavelet filters ψ_n at level 1 of the CDWT hierarchy (row entries reconstructed from the stated evenly-spaced orientations; the printed table was garbled).
*S The subimages may be identified with the ntGabor channel outputs G(x,u) of Equation with frequencies u indexed by the pair They may therefore be used in analogous fashion to find a spatially-varying range map Z(x) as described above.
However, instead of summing over all orientations and scales at once, the strategy However, instead of summing over all orientations and scales at once, the strategy 541241au.doc 17 followed is to build up the error criterion cumulatively, by summing over orientations at each level from /max to 1 from coarse to fine). This strategy arises naturally from consideration of the different spacings of subpixels 610, 620 and 630 at each level 1 produced by the CDWT.
The cumulative error criterion CEi(xt,s) is defined recursively on the grid Xi as follows: CE, E, s) (29) SCEt, l <lax where (in the VP-DFD case) the level-i error criterion E is defined in analogy with Equation (22) by the following: N 6 i j=2 n=l In the VD-DFD case, in analogy with Equation the level-i error criterion El is defined by EVD(Xs)= 6 [A 2 ±Iu 2 y2(d, 2A -2)1 w,(x) j=2 n= 1 Dn I 2 32 1 w. (32) D I n 2 IDn P n,I oo* *To form the cumulative error criterion CEi (xs) requires an interpolation from 20 the grid X+ 1 to the grid X. This may be done by writing the cumulative error criterion as a second-order Taylor series around its minimum location sl+i(xi+2) as follows: CE, s 4 (x, 1 s 1 (xt (33) followed by interpolating the Taylor coefficients and the minimum locations bilinearly to the finer grid X 1 the finer grid Xi.
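One coarse-to-fine refinement step of Equations (29) and (33) can be sketched as below; the bilinear interpolation is done here with scipy.ndimage.zoom as a stand-in for whatever interpolator an implementation prefers, and E_l is assumed to evaluate the level-l criterion of Equation (30) or (31) on the finer grid:

```python
import numpy as np
from scipy.ndimage import zoom

def refine_cumulative(E_l, ce_min, curvature, s_min):
    """Combine the coarser cumulative criterion with the level-l criterion.

    ce_min, curvature and s_min hold, per coarser-grid subpixel, the Taylor
    coefficients of CE_{l+1} from Equation (33): its minimum value, its
    curvature and its minimum location. They are interpolated bilinearly to
    the twice-denser grid X_l and added to E_l, per Equation (29).
    """
    ce_f, cv_f, sm_f = (zoom(a, 2, order=1) for a in (ce_min, curvature, s_min))

    def CE_l(s):
        # Quadratic model of the interpolated coarser criterion, plus E_l.
        return ce_f + 0.5 * cv_f * (s - sm_f) ** 2 + E_l(s)

    return CE_l
```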
Fig. 7 shows the processing steps of an algorithm 800 for calculating the range from two images g_1 and g_2 captured with the single-camera passive range finding system 10 shown in Fig. 1. Steps 802 and 804 calculate the level 1 CDWT of images g_1 and g_2.
In order to obtain better accuracy in the range map Z_1(x_1), further levels up to level l_max may be computed. For instance, to add a level 2, steps 814 and 815, using the outputs of the precursor level 1 CDWT calculations in steps 802 and 804 respectively, calculate the level 2 CDWT of images g_1 and g_2.
Starting at level !max, the cumulative error criterion CEI(xt,s) is calculated using Equation For levels Imax-1 to 1, the cumulative error criterion CEI(xt,s) also includes the error criterion El of the next level. For instance, step 818 calculates the cumulative error criterion CE 2 (x 2 for level 2 by adding the level 3 interpolated cumulative error criterion CE 3 (x 2 to the level 2 error criterion E 2 Once the level I cumulative error criterion CEi(xt,s) has been calculated, an interpolation from grid X to grid Xi-1 is performed. For instance, step 819 performs an interpolation from grid X 2 to grid X 1 The interpolated cumulative error criterion CE 2 (xi,s) may now be added in step 808 to the level 1 error criterion E 1 to calculate the level 1 cumulative error criterion CEi(xi,s).
By calculating the minima of the level 1 cumulative error criterion CEi(xi,s) in step 810, thereby calculating an s value for each grid point xl in the level 1 grid X 1 a range Zi(xi) is calculated for each grid point xi in step 812.
Each level 1 yields a range map Z four times denser than the range map Zi-I above. Algorithm 800 therefore may be used to build a multi-resolution representation of the range map. The highest levels of the hierarchy contain coarse approximations to the range derived from the largest scale features of the images gi and g2. The range map Zi is gradually refined and condensed as more levels of detail (finer scale features) are incorporated. This coarse-to-fine strategy accords with an intuitive understanding of how 541241au.doc 19 the human visual system perceives the depth of a scene. In addition, it reveals a clear relationship between resolution and computation: each added level of detail requires as much computation again as has already been carried out over previous levels. An implementation can therefore trade off computation load against the required accuracy of the range map by varying the halting level, m.in.
There are fundamental geometric limits on the performance of the range sensing apparatus 10. Firstly, there is a finite "field of view" for the apparatus 10. Referring to Fig. 8, if a bright point 320 is further than some limit Hmax from the optical axis 400, its range cannot be estimated. This occurs for one of two reasons, each of which imposes its own Hmax: a) The blurred, shifted image (not illustrated) of the point 320 is not wholly contained within the dimensions of the image sensor 230. This gives a limit D -cz HK. (34) Ki where Kj is the absolute magnification of the optical system, given by Equation and c is the lens aperture radius.
b) The lens aperture c truncates the optical path. In other words, the PSF hi(x) is not completely determined by the SLM 210, so that the model of Equation is not valid. In S•this case the limit is z
A
S• 20 where a is the radius of lens 220. The lesser of these two limits gives the radius of the field of view at a given range Z.
It is noted from Equation that Hmax(Z) 0 when Z Zmin given by ZtinZ (36)
C
1a a This is the absolute lower limit of usefulness for the range sensing apparatus Further limits on Z are imposed by the actual method of computation outlined above. The VP-DFD algorithm depends on the measurement of a phase difference between two corresponding image coefficients. The phase difference increases as range Z 541241au.doc 1 1 1 deviates further from a focal plane range Z 1 defined by the Lens Law as f d, Z, where the phase shift is zero. Because the phase difference can only be measured modulo 27r, ranges greater than a certain distance from the reference range cannot be measured unambiguously.
From Equation an expected phase difference li is given by lj alpu'(T -Tj) (37) so the limits on Z are specified by apr r(38) II Umax I ijl.max Substituting for a, and P and inserting the maximum frequency from Table 1, we obtain the limits as follows: Z max (39) -1+sax Z ASmax 1 Smax where 2min+l Smax 3" jn 2 15 It is noted that the limits Zmin and Zmax are dependent on the lowest CDWT level /min incorporated (subject to the absolute limit in Equation These limits Zmin and Zmax may be widened by halting at a higher level /min during the algorithm 800, at the cost of resolution in the final range map Z 1 No such limits are imposed by the VD-DFD algorithm it merely degrades as the object moves away from the object plane 310. This degradation is reflected in the gradually decreasing confidence measure resulting from the lower image energy as blurring increases.
A cone-shaped "region of accuracy" 420 may be defined incorporating the limits of the range sensing apparatus 10 and those of the algorithm 800.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have corresponding meanings.
Claims (15)
1. A method of determining range information from a plurality of images of a visual scene, said plurality of images being images of said visual scene captured using a single- lens camera with different optical configurations, said method comprising the steps of: calculating each of a first predetermined number of levels of multi-level Gabor Transforms for each of said plurality of images; determining an error criterion from level N of said multi-level Gabor Transforms of said images; refining said error criterion using a lower level of said multi-level Gabor Transforms of said images to form a refined error criterion; and determining said range information from said refined error criterion.
2. A method as claimed in claim 1, wherein step (c) comprises the sub-steps of: (c1) interpolating said error criterion to said lower level; (c2) determining a lower level error criterion from said lower level of said multi-level Gabor Transforms of said images; and (c3) adding said interpolated error criterion to said lower level error criterion to form said refined error criterion.
3. A method as claimed in claim 1 or 2, wherein said refining step is repeated for a second predetermined number of levels lower than level N of said multi-level Gabor Transforms of said images.

4. A method as claimed in any one of claims 1 to 3, wherein said range information is determined by calculating a maximum likelihood range map from said error criterion.

5. A method as claimed in any one of claims 1 to 4, wherein said multi-level Gabor Transform is a Complex Discrete Wavelet Transform.
6. A method as claimed in any one of claims 1 to 5, wherein said different optical configurations are different focal distances.
7. A method as claimed in any one of claims 1 to 5 wherein said different optical configurations are different transverse offsets of an optical mask.
8. A method as claimed in claim 7, wherein each of said plurality of images is captured using steps comprising: controlling an aperture controlling means with a mask function for preventing a portion of light intensity from reaching an optical sensor, wherein said mask function is a repositioned version of a template mask function; and capturing said image of said visual scene on said optical sensor.
9. A method as claimed in claim 8, wherein said optical sensor comprises a CCD array.

10. A method as claimed in claim 8 or 9, wherein said aperture controlling means comprises a programmable spatial light modulator having spatially varying opacity which is controllable to form said mask functions.

11. An apparatus for determining range information from a plurality of images of a visual scene, said plurality of images being images of said visual scene captured using a single-lens camera with different optical configurations, said apparatus comprising: means for calculating each of a first predetermined number of levels of multi-level Gabor Transforms for each of said plurality of images; means for determining an error criterion from level N of said multi-level Gabor Transforms of said images; means for refining said error criterion using a lower level of said multi-level Gabor Transforms of said images to form a refined error criterion; and means for determining said range information from said refined error criterion.
12. An apparatus as claimed in claim 11, wherein said means for refining comprises: means for interpolating said error criterion to said lower level; means for determining a lower level error criterion from said lower level of said multi-level Gabor Transforms of said images; and means for adding said interpolated error criterion to said lower level error criterion to form said refined error criterion.
13. An apparatus as claimed in claim 11 or 12, wherein said means for refining operates on a second predetermined number of levels lower than level N of said multi-level Gabor Transforms of said images.
14. An apparatus as claimed in any one of claims 11 to 13, wherein said range information is determined by calculating a maximum likelihood range map from said error criterion.

15. An apparatus as claimed in any one of claims 11 to 14, wherein said multi-level Gabor Transform is a Complex Discrete Wavelet Transform.
16. An apparatus as claimed in any one of claims 11 to 15, wherein said different optical configurations are different focal distances.

17. An apparatus as claimed in any one of claims 11 to 15, wherein said different optical configurations are different transverse offsets of an optical mask.
18. An apparatus as claimed in claim 15, further comprising: a single lens; an optical sensor for capturing said images by recording optical intensity of light travelling from said visual scene through said lens; and an aperture controlling means for providing mask functions for masking said lens to prevent a portion of said optical intensity of said light from reaching said sensor, wherein said mask functions are repositioned versions of a template mask function.
19. An apparatus as claimed in claim 18, wherein said optical sensor comprises a CCD array.

20. An apparatus as claimed in claim 18 or 19, wherein said aperture controlling means comprises a programmable spatial light modulator having spatially varying opacity which is controllable to form said mask functions.
21. A method of determining range information from at least two images of a visual scene, said method substantially as described herein with reference to the accompanying drawings.
22. A range sensor substantially as described herein with reference to the accompanying drawings.

23. A camera comprising an apparatus according to any one of claims 11 to 20.

Dated 4 February 2002
Canon Kabushiki Kaisha
Patent Attorneys for the Applicant/Nominated Person
SPRUSON FERGUSON
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU24870/01A AU748148B2 (en) | 2000-03-16 | 2001-03-05 | A method for single camera range finding using a wavelet decomposition |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPQ6303A AUPQ630300A0 (en) | 2000-03-16 | 2000-03-16 | A method for single camera range finding using a wavelet decomposition |
AUPQ6303 | 2000-03-16 | ||
AU24870/01A AU748148B2 (en) | 2000-03-16 | 2001-03-05 | A method for single camera range finding using a wavelet decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
AU2487001A AU2487001A (en) | 2001-09-20 |
AU748148B2 true AU748148B2 (en) | 2002-05-30 |
Family
ID=25619470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU24870/01A Ceased AU748148B2 (en) | 2000-03-16 | 2001-03-05 | A method for single camera range finding using a wavelet decomposition |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU748148B2 (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4789898A (en) * | 1988-01-19 | 1988-12-06 | Hughes Aircraft Company | Demand auto focus driven by scene information only |
US5210564A (en) * | 1990-01-22 | 1993-05-11 | Ricoh Company, Ltd. | Automatic focusing device employing a frequency domain representation of a digital video signal |
US5231443A (en) * | 1991-12-16 | 1993-07-27 | The Research Foundation Of State University Of New York | Automatic ranging and automatic focusing |
Also Published As
Publication number | Publication date |
---|---|
AU2487001A (en) | 2001-09-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | TH | Corrigenda | Free format text: IN VOL 15, NO 9, PAGE(S) 1589 UNDER THE HEADING COMPLETE APPLICATIONS FILED THE FILING DATE FOR 24870/01 SHOULD READ 20010305
 | FGA | Letters patent sealed or granted (standard patent) |