CA2196563A1 - Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus - Google Patents

Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus

Info

Publication number
CA2196563A1
CA2196563A1 CA002196563A CA2196563A CA2196563A1 CA 2196563 A1 CA2196563 A1 CA 2196563A1 CA 002196563 A CA002196563 A CA 002196563A CA 2196563 A CA2196563 A CA 2196563A CA 2196563 A1 CA2196563 A1 CA 2196563A1
Authority
CA
Canada
Prior art keywords
scene
images
depth
pixel
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002196563A
Other languages
French (fr)
Inventor
Shree K. Nayar
Minori Noguchi
Masahiro Wantanabe
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Columbia University of New York
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CA002196563A priority Critical patent/CA2196563A1/en
Publication of CA2196563A1 publication Critical patent/CA2196563A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/521Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/24Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G01B11/25Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/24Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G01B11/25Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object
    • G01B11/2513Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object with several lines being projected in more than one direction, e.g. grids, patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/571Depth or shape recovery from multiple images from focus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/88Image or video recognition using optical means, e.g. reference filters, holographic masks, frequency domain filters or spatial domain filters

Abstract

A method and apparatus for mapping depth of an object (22) in a preferred arrangement uses a projected light pattern to provide a selected texture to the object (22) along the optical axis (24) of observation. An imaging system senses (32, 34) first and second images of the object (22) with the projected light pattern and compares the defocused of the projected pattern in the images to determine relative depth of elemental portions of the object (22).

Description

96/41304 ~ b~ ~ 3 PCT~S95~07890 Descri~tion Apparatus and Methodg for D~t~rm;n;ng the Three-Dimensio~al Shape o~ an Object ~sing Active Tllnm;n~t;nn and Relative Rlnrr;n~ in Two Imaqes Due To Defocus .

~ackqround of ~he Invention I. Field o~~the invention.
The present invention relates to t~n; all~Q for mapping a three~ ;nn~l 8tructure or object from two-dimensional images, and more particularly relatea to t~hn;~l~q employing active ;ll n~tion to retrieve depth in~~rr~t;~n, II. Description o~ the related art.
A pertinent problem in ~ ~~tinn~l vision is the recovery of three-dimensional mea~uL~ 8 of a structure ~rom two~ n~l imageg. There have been many proposed solutions to this problem that can be broadly rl~QQ;fi~ into two categories; passive and active. Passive techniques such a~ shape from shading and texture attempt to extract ~tructure from a single image. Such t~n;~l~c are still under investigation and it ia expected they will prove complem.-ntary to other techniques but cannot serve as stand alone approaches. Other passive methods, such as stereo and structure from motion, use multiple views to resolve shape ambiguities inherent in a single image. The primary problem ~n~ollnt~red by these methods has proved to be corresp~nA~n~ and ieature tracking. In ~A~t;~n, paggive algorithmE have yet to demonstrate the accuracy and robustne 8 required for high-level perception tasks such as object recognition and pose estimation.
Hitherto, high-quality three-~; q;~n~l mapping o~ object~ has resulted only from the use of active .

W096/41304 l~ 1 96~ 63 r~ Y,lu~

sen30rs based on time of flight or light striping.
Prom a practical perspective, light stripe range finding has been the preferred approach. In structured enviL, ~~, where active riq~;iqt;~n of a scene is feasible, it offers a robust yet ;n~r~nR;ve solution to a variety of problems. ~owever, it has suffered from one inherent drawback, namely, speed. To achieve depth maps with sufficient spatial resolution, a large number (say, ~ of closely spaced stripes are used. If all stripes are projected simultiqn~nl~ly it is ~ 8;hle to aggociate a unique stripe with any given image point, a process that is necessary to compute depth by triangulation. The cliq~1qirivl approach is to obtain multiple images, one for each stripe. The requirement for multiple images increases the required time for mapping.
F4cus analysis has a major advantage over stereo and structure from motion, as two or more images of a scene are taken under different optical settings but from the same viewpoint, which, in turn, CiL~ La the need for Co~ A~n~ or feature tracking.
~owever, differences between the two images tend to be very subtle and previous ~ol-lt;~n~ to depth from defocu~ have met with limited success as they are based on rough appr~;r-t;~r~ to the optical and sensing -hiqn;~ involved in focus analysis.
Fnnl~ ~iq1 to depth from defocu~ is the relaticn-ship between focused and defocused images. Figure 1 shows the basic image formation geometry. All~light rays that are~radiated by object point P and pass through aperture A, having an aperture ~ r a, are r~fra~t~ by the lens to converge at point Q on the focus image plane I~. For a~thin lens, the relationship between the object distance d, focal length of the lens f, and the image distance d1 is given by the Gau3~ian lens law:

~ WO 96141304 PCI-/US95~07890 2i 9~5/~3 ' ; ' i -~ 1 +'-1 1 d di f ~ ach polnt on an object plane that includes point P is projected onto a single point on the image plane I~, causing a clear or focused image to be formed. If, however, a sensor plane such as I1or I1, does not ~oin~ with the image focus plane and is displaced from it, the energy received from P by the lens is distributed over a patch on the sensor plane. The result is a blurred image of P. It is clear that a single image does not include sufficient information for depth est;~t;nn a3 two scenes defocused to different degrees can produce ~nt;~l image ~
A solution to depth is achieved by using two images iormed on image planes Il and I2 separated by a known physical distance ~. The problem i8 reduced to analyzing the relative blurring of each scene point in the two images and , , ;ng the distance ~ to the focused image for each image point. Then, using d~ , the lens law (1) yields depth d of the scene point. Simple as this procedure may appear, several technical problems emerge when ; ,1~ ;n~ a method of practical value.
First, there i8 the problem of det~rm;n;ng relative defocus. In frequency domain, blurring can be viewed as low-pass filtering of the scene texture.
Relative blllrr;ns can thus in principle be estimated by frequency analysis. However, the_local object texture is unknown and variable. Since the effect of blurring is fre~uency ~p~n~nt, it is not I ~n~ngful to investigate the net blurring of the entire rnl~ nn 3C o+ fr~ nr;~ that constitute scene texture. This observation has forced lnvestigators to use narrow-band filters that isolate more or less single frequencies and estimate their relative attPnn~t;nn due to defocus in two or more images. Given that the d ~n~nt W09~41304 PCT~S95/07890 21 ~6563 ~ ~
fre~uencies of the scene are unknown and possibly spatially varying, one is forced to use compleY
filtering teahn;~-~a that add to the complexity of the process. This ~ lPY;ty makes the approach impractical for any real-time appl;rPt;nn.
A second problem with the depth from defocus terhn;~ is with respect to textureless surfaces. If the imaged surface is textureless (a white sheet of paper, for instance) defocus and focus produce ~
;~nt;~ images and any number of filters would prove ineffective in est;m-t;ng relative blurring.
Particularly in structured envi~. a this problem can be ~ csed by projecting an illl n~timn~pattern on the scene of interest, i.e. forcing scene texture.
Indeed, ;llllm;n~t;~n projection has been suggested in the past for both depth from defocus and depth from pattern size distortion under perspective projection.
For example, Girod et al., "Depth From Defocus of Structured ~ight, n Pro~ ;ngq of the SPIE - The Int'l Soc~y for Optical Eng'g, vol. 1194, pp. 209-215 (1990) ~; arl os~q the uge of a structured light source iL a depth from defocus range ~ensing system. Girod projects a structured light pattern levenly spaced vertical lines) through a large ~elLuL~ lens onto the object surface. Girod detects a single image which has image characteristics derived from the defor--a8;n~
effects of the large dye~Lu~ light source. Girod also suggests use of an anisotropic aperture, e.g., a slit or T-shaped aperture, in conn~at;~n with the light source to produce orth~g~n~l patterrs that can be compared to remove systemic errors due to the limited depth of field of the camera.
Similarly, A. Pentland et al., "Simple Range Cameras Based OA Focal Error," J. Optical Soc~y of America, vol. 11, pp. 2925-3~4 (1994) discloses a structured light sensor which projects a pattern of light (evenly spaced vertical~lines) via a simple slide projector o~to a scene, measures the appare~t blurring ~ w096i4~304 2 1 ~ 6 5 6 3 ~ . ~ /vby~

of the pattern, and compares it to the known (focused) original light pattern to estimate depth.
~ otably, these proposed snl~lt;nnq rely on ev~ln~;nr, defocus from a slngle image. A~q a result, they do not take into account variations in the defocus eV~ t;nn that can arise from the natural textural characteristics of the object.
When cnnq;~r;ng a multiple image system, the relation between magnification and focus must be taken into account. ,In the imaging system shown in Figure 1, the effective image lor5t;nn of point P moves along ray R as the sensor plane is ~;qpl~rPd Accordingly the ~foc~qe~ image formed on plane I~ 1G larger than the focused image that would be iormed on plane I~ and both of these image6 are larger than that formed on plane I2.
This causes a shift in image coor~;n~tes of P that in turn depends on the u~Xnown scene coordinates of P.
This variation in image r-gn;f;r~t;nn with defocus manifests as a corr~ ce-like problem in depth from defocus since it is n~r~RsAry to compare the defocus o_ ~ULL. ~ ;nrJ image ~l~ c in image planes Il and I~ to estimate blurring. This problem has been und~L _~qized in much of the previous work where a precise focuq-magnification calibration of motorized zoom-lensea is suggested and where a registration-like correction in lmage domain is proposed. The r~l ;hr~t;nn approach, while effective, is cumbersome and not viable for many of~-the-shelf lenses.

SummarY of the Invention An object of the present invention is to provide an apparatus for mapping a three-dimensional object from two-~; ~;nn~l images.
A further object of the present invention is to provide an apparatus and method employing active ;ll n~t;nn to retrieve depth ;nfnrr~t;nn based on focus analysis.

WO9~41304 2 1 ~6~63 ~ PCT~S9~07890 ~

A further object of the present invention is to provide a system that uses ;nPYp~nAlve o~-the-shelf imaging and processing hardware.
A further object of the present invention is to provide a system for determining depth information with improved accuracy.
Still a further object of the present invPnt;o~ ia to provide a depth from defocus method that uses two scene images which correspond to different levels of focus to retrieve depth information.
Still a further object of the present invention is to provide an apparatus and method for determining depth from defocus analysis based on a careful analysis of the optical, sensing, and , _ ~t;rnal elements required.
In order to meet these and other objects which will become apparent with reference to further disclosure set forth below, the present invention broadly provides a method for measuring the three-~ aion~l structure of a object by depth fromdefocus. The method re~uires that the scene be ;11 n~tP~ with a preselected ;11 n~tirn pattern, and that at least two images of the scene be sensed, where the sensed images are formed with different imaging parameters. The relative blur~between corr~apon~;ng ~ 1 portions of the sensed images is measured, thereby to ~t~rm; n~ the relative depth of corr~aprn~;ng elemental portions of said three ~ irnAl structure.
The present invention also provides an ~dL~Lus for mapping the three-~; a;rn~l structure of an object by depth from defocus. The apparatus ;rrl~ a active ;lll1min~t;rn means for ;llnm;n~ting the object with a preselected ;11 n~t-on pattern and sensor means, optically coupled to the ;ll11min~t;ng means, for sensing at least two images of the object, wherein the sensed images are taken with different imaging parameters. The apparatus also inrlll~pa depth ~ Wos6141304 2 1 9 6 5 6 3 mea~u,. means, coupled to the sensor means, for measuring the relative blur between the sensed images, thereby to ~t~rm; n~ the depth of the portions of the object.
Preferably, the images are sensed via constant image magnification sensing as an array having X 1 Y
pixels of predetermined width, and the relative blur is measured on a pixel by pixel basis over the X ~ Y pixel array, so that depth ;nforr-t;nn is ~htA;n~ on a pixel by pixel basis over the X ~ Y array.
In one ~ , the scene is ill1 n~t~ by a Xenon light source filtered by a spectral filter having the preselected ;ll1lm;nAt;ng pattern. ~he spectral filter may be selected 80 a~ to produce an ;11 nAt;~n l~ pattern which g~n~r~t~ multiple spatial frequencies ~or each image element sensed. In a different : '-'; ~, the light source is a mono~l," ;c laser light source, and at least two depth images of the scene formed by the laser light and at least one brightness image of the scene formed by ambient light are sensed. In this : '~';- , the relative blur between the sensed laser light images is measured.
Preferably, the preselected illnm;nAt;~n pattern is o~t; m; 7~ 80 that a small variation in the degree of defocus results in a large variation in the measured relative blur. The optimized ;11 nAt;on pattern advantageously takes the form of a r~ctAn~-l ~r grid pattern, and is preferably selected to have a grid period which is substAnt;Ally egual to twice the pixel array element width and a registration phase shift being subst~nt;Ally equal to zero with respect to the sensing array, in two orth~g~nAl directions.
~lt~rn~t;vely, the optimized ;llnm;nAt;~n pattern may have a period which is subst~nt;Ally equal to four 3~ times the pixel element width and a registration pha~e shift being substAntiAlly equal to ~/4 with respect to the sensing array in two orth~g~n~l directions.

, ~ .

WO9~41304 2 1 9 6 5 6 3 8 PCT~S95/07890 In a preferred ~ , two images are sensed, a first image at a position ~rr~p~n~ i ng to a near focused plane in the sensed sce~e, and a second image at a position ~LL ~ A;ng to a far focused plane in the sensed scene.
~ alf-mirror optics may be provided so that the ;lln-;n~ n pattern and the scene images pass along the same optical axis.
Polarization optics which polarize both the 10 i 11 nm; n~ n pattern and the scene images in controlled polarization ~;rPrt;~nq may be used to filter specular r~fl~r~; ~n q from the object.
Adv~n~geo~qly, the sensed images are converted into digital signals on a pixel by pixel basis, and are then convolved to determine power mea~ul t signals that corre~pond to the f, ' ~1 fre~uency of the ;llll~;n~ n pattern at each pixel for each sensed scene image. The power mea~uL, signals are preferably corrected for mis-registration on a pixel by pixel ba~is, such that any errors introduced into the power mea~uL signals because of misalignment between the sensing pixels of the grid and the illnm;n~ n pattern is corrected. Correction may be effP~n~tP~ by multiplying each of the power mea~uL ' signals, on a pixel by pixel basis, by the sum of the squares of the power mea~uL siB al's four r~;3hhnring power mea~uL signals. In addition, the power mea~uL ~ signals are preferably normalized on a pixel by pixel basis The power measurement signals for one of the sensed images are ,:~ ~d, on a pixel by pixel basis, with ~Pt~rm;nPd power mea-uL, ~ for a second sensed image to determine depth information at each pixels.
Look-up table mapping may be used for such comparison.
The pixel by pixel depth information is preferably arranged as a depth map and may be displayed as a wireframe on a bi~rarr~d workstation. If both laser-light depth images and ambient light brightness images ~ WO96/41304 2 1 PCr/US95~07~90 9~63 9~
are sensed, the brightn~8 zmage may be preferably displayed concurrently with the depth map 80 that the three-fl; ~; nn~ structure of the sensed scene and the actual scene picture are concurrently displayed as a t~LuL _.
Based on the above t~rhni~l~, both textured and textureless surfaces can be recovered by uaing an optimized ;lll-m;nR~ion pattern that i8 registered with an image sensor. Further, constant magnification defocusing is provided in a depth-from-defocus imaging t~rhn;~l~, ~rror~;ngly, techniques for real-time three-fl; tnn~l imaging are provided herein which produce precise, high resolution depth maps at frame rate.
The ac~ ying drawings, which are incorporated and constitute part of this disclosure, illustrate a preferred '~ of the invention and serve to explain the pr;nr;r~l~ of the invention.

Brief De8cri~tion of the Drawinqs Figure 1 is a ~; ~l;fied diagram illustration the _ormation of images with a lens.
Figure 2 is a ~; _1; f; Pd diagram showing the a~ L~l~. ' of the optical portions of the present tn~r~nt; nn, Figure 2A is a plan view of the image detectors used in the Figure 1 apparatus.
Flgure 3 is a plan view of the image detectors used in the Figure 2 apparatus.
Figures 4A and 4B show the preferred ~LL~ily of rectangular grids for the projection screen used in the Figure 2 : ' ~fl; .
Figure 5 is a diagram showing the spatial and fn~l~nry domain fnnrt;nn for op~;m;78~;nn of the system of the present invention.
Figure 6 are diagrams showing the determ; n~; nn of the tuned focus operator.

WO9~41304 ~ =, PCTNS9~07890 2 , 9 6 5 6 3 ~ ~
Figure 7 iB a graph of the normalized defocus factor as a function of distance between the image plane and the focused image plane.
Flgure 8 iB a drawing of the optics of the apparatus of Figure 2 showing alternate filter locations.
Figure 9 i8 a plan view of a BPl Prt~hl P filter.
Figures lOA to lOD show alternate aLL~u~ for polarization control in the apparatus of the invention.
Figures llA to F show an alternate grid pattern and the use of tuned f;ltPr; ng to provide two ranges of depth determination therefrom.
Figures 12A through D show further ~ltPrn~te grid patterns and the frequency response thereto.
Figure 13 is a 8; ,l;f;P~ drawing of an apparatus for detprm;n;ng depth of imaged object Pl~ ~ using two dif~erent apertures.
Figure 14 is a plan view of a pair of constant area, variable geometry apertures.
Figure 15 showa a registration shift of the grid pattern with respect to a pixel array.
Figure 15 i8 ~n ~'lJ-~ dLUs for recovering depth infnrr~t;nn and image3 of an object.
Figure 17A through E illustrates a phase ahift grid screen and the resulting ;~ m1n~t;on and spatial frequency respon~e.
Figure 18 i5 a computation ~1OW diagram showing the derivation of depth map ;nfnr~-t;nn from two defocused images.

Descri~tion of the Pre~erred Embodiments Reference will now be made in detail to the present preferred Pmhn~; ' of the ;n~Pnt;nn as illustrated in t~he drawings.
Figure 2 is a simplified optical diagram showing an ~pp~r~tll~ 10 for mapping the three ~ ;nn~l image of an ob~ect 22 within its field of view. The apparatus 10 ;nr1ll~P~ a light source 12, preferably of ~ wo96l4~304 2 1 96 5 63 ;~ PClnrsss/n7~s~

high intensity, such aG a Xenon arc lamp, a strobe light or a laser source. Light source 12 ;11 nm; nAtes a partially transparent screen 14 having a two nAl grid to be projected as ;~ min~ti~ onto object 22. A lens 16 aperture 18 and beam splitter 20 are provided such that an image of the screen 14 is projected onto object 22 along optical axis 24. Such active ;11 n~ti,-n provides a well defined texture to object 22 as viewed from the apparatus 10. In C~mn~t;rrl with measuring depth by defocus ~
image portions of the object 22 can only be measured where they have a defined texture, the defocus of which can be observed in the Apr~rAtll~ 10. The grid ;lln~l;n~tirm pattern provides a forced and well de$ined texture onto object 22.
The ;-~,L~A~ 10 is arranged so that light from light source 12 is projected along the same optical axis 24 îrom which object 22 is observed. For this reason a beam aplltting rf~f~ctor 20 is provided to project the ;lln~;nAt;r~n in the direction of object 22.
R~fl e~-tor 20, which is illustrated as a plane mirror may pre$erably be a prism beam splitter.
Light from the ;11 nAt;r~n grid pattern as r~flecte~ $rom object 22 along the optical axis 24 passes through beam splitter 20 into a detection ;~ppzlrAtll~ 25. Apparatus 25 includes a focal plane aperture 26 and lens 28 for providing invariance of image r-~nif;~At;r~n with vAr;Atirm in the image plane, also referred to as a t~l~c~ntr;r lens system. A
second beam splitter 30 ~ fl~'tR images of objeot 22 onto image detecting arrays 32 and 34, each of which has a different spacing along the optical axis from lens 2 8 to provide a pair of detected images which have different degrees of de$ocus, but identical image ~gn;f;c~t;on~
The characteristics of a tele~.ontr; r lens system using a single lens will be rl~ r;h~d in ~ nn~rt;~ln with the diagram of Figure 3. As previously discussed W096141304 21 96563 ~ i ' r~
/~ .
with respect to Figure l, a point P on an object i6 imaged by a lens 28 onto a point Q on the ~OCU3 plane euLL ~ ;rg to the image distance di from lens 28.
Image distance dl is related to the object distance d from lens 28 to point P on the object by the len6 formula (l). At image plane6 Il which is 6paced by distance ~ from the focu6 plane I~, a blurred image corresponding to an enlarged circular spot will result from the defocusing of object point P. Bikewise, a blurred image of point P will be formed on image plan I" located a distance ~ from image plane Il. Image planes Il and ~ are on opposite sides of focuG plane I~. The objective of the tPl ~c~ntric len6 system, wherein aperture 26 is located a distance f, CULL ~ ;ng to the focal length of lens 28, in front of the lens 28 is that the size of the resulting images on image planes Il and I2 will be the same as the ideal image that would ~e projected on focus plane Ii. This is evident from the fact that the optical center of the 20 refracted image path behind lens 28, which is designated by R' is p~rAllrl to the optical axis 24.
Of course, those skilled in the art will appreciate that the location of the apparatus will depend on the optical properties of the lens choaen. Indeed, certain lenses are r-m1f~t11red to be tPl~rrntric v-nd do not reciuire an ~;t;~n~l d~elLure~ .
In the ~rp~r~tnq lO of ~igure 2 image planes 32 and 34 are located at ~;ff~r~nt optical distances from lens 28 corr~p~n~;ng, for example, to image planes Il 30 and I2 respectively. Because a t~l~c~ntric lens system is used the images ~t~rt~ on these image planes will have ;~nt; r~ 1 image size and corresponding locations of image ~ with different amounts of defocus =
resulting from the different distances from lens 28 along the optical paths to the image planes.
~ igure 2A is a plan view from the ;nr;~nt optical axis of image sensors 32, 34, which are preferably charge coupled devices (CCD). In a preferred O WO 96/41304 2 ~ 9 6 5 6 3 ~ r~
~ ~ ~ C ~
aLL~n~ , the image planes consists of image sensing cells 36 which are preferably arranged in an array of 512 by 480 Pl ~ at regular spacings Px and py along the x and y airections. The CCD image ~PtPrtnr is preferred for use in the systems of the~present invention because of the digital nature of signal i;ng elements, which facilitates precise ;~pnt;f;r~t;nn o_ picture Pl~ ~A in two image planes 32 and 34 so that a correlation can be made between the image Pl ~. Alternatively other image sampling devices, such as a television receptor tubes can be used with ~Lu~Llate analog-to-digital conversion.
~owever, the use of such analog devices may result in possihlP 1088 of precision in detPrm;n;nr~ the defocus effect upon which the present system depends for depth pPrrPpt;nn.
Figure 4A is a plan view of a screen 14 according to one of the preferred ~ ; R of the present invention. The screen 14 may be formed on a glass sub~trate, for example by photo-etching techniques, and has a rhprkprho~rd grid of ~ltPrn~t;nn~ transparent and opaque P~ q of a size bx times by and periodicity tx ind ~, which is sPle~te~ such that the individual black and white rhPrkPrho~rd square~ of grid 14 project images which, after rPflPrt;ng ovf_ the object 22, are ~nr;~Pnt o~ the photo image detectors 32 and 34 with an image grid element size rnrrPRpon~ln3 to the size and periodicity of image ~PtPCt;ng Pl ~ 36 in arrays 32 and 34. Accordingly, the illnm;n~t;Gn grid period (tx, ty) of the projected grid image on the photo ~Ptectnr arrays is twice the spacing (Px, py) of the imaging Pl ~ of the array, which uuLla~ullds to the detected pixel ~ of the image. Those skilled in the art will recognize that the defocused images of grid 14 on ~n~;vidual pixel Pl s of ~Pterton~ 32 and 34 will be defocused ~rom the ideal image by reason of the spacing o~ arrays 32 and 34 from the ideal ~ocusing plane ovf the image. The amount of such W096l4l304 2 1 ~ 6 5 6 3 ~t~ PCT~S95/07890 O

defocusing for each image element and for each detector array ia a function of the spacing S of the corrP~ptntl;ng object portion from the ideal focused plane F in the object field for the corrP~pt~rt~;nt3 tlPte~t~r array. By ~P~Pct;ng image intensity in each of the detecting pixel Pl~ ~ 36 of detecting arrays 32, 34, it i8 possible to derive for each defocused image on ~P~P~; ng arrays 32, 34 the amount of defocusing ofjthe projected grid which:results from the spacing S of the ~tJL~ n~;nJ object portion from the plane of focus F in the object field for that array.
Accordingly, two mea~uL ~ of the defocus factor, as described below are obtained for each Pl Rl area of the object, and such defocus factors are used to compute a normalized defocus measure which can be directly mapped to the depth of the corrP~p~nt1;ng object portion corrP~p~rtl;ng to the pixel element, as will be described below.

Opt;~n;7~tlrn Proce8s Ir. order to better describe the principles of operation of the method and apparatus of the present invention, it is useful to have an understanding of the analytical tPrhn;t~P~ by which the optimum parameters of the apparatus and method were determined. Such analytical tP~hn;~P~ take into account ~~llng of both the ;~ m;n~t;~n and imaging fnntt;tnt~ to analyze the physical and , ~t;~n~l p~ tPrs involved in the depth from defocus determination, and its application to varying object field re~uirements.
There are five different Pl~ t~t or ~ Antc, that play a critical role. We briefly describe them before proceediug to model them.
1. Tl 1 In~tion Pattern: The exact pattern used to ~llnmin~tt~ the gcene ~P~PrminP~ its final texture. The spatial and spatial ~re~uency characteristics of this texture determine the behavior of the focus measure and hence the accuracy of depth est;r-t;tn. It is the ~ Wo9~41304 2 1 9 6 ~ 6 3 ,~ ~ ~ i?; ' PCT~S9Yo7890 parameters of this cnmrnnPnt that we set out to optimize 30 as to achieve maximum depth accuracy.
2. Optical Transfer Function: The finite size of the lens aperture 26 imposes restrictiona on the range of spatial frequencies that are ~tent~h1P by the imaging system. These restr;rt;on~ play a critical role in the opt;m;~t;nn of the ;llllm;n~tion pattern. Upon initial ;n~pe~tjnn~ the optical transfer function (OTF) seems to severely constrain the range of use~ul ;llnm;n~t;nn patterns. ~owever, as we shall see, the OTF's limited range also enables us to avoid serious problems such as image aliasing.
3. D~oc~Q;ng: The depth d of a surface point is directly related to its defocus (or lack of it) on the image plane. It is this rh that enables us to recover depth from defocus. It is well-known that defocus is ess~nt;~lly a low-pass spatial filter.
Eowever, a realistic model for this rhPr~ is imperative for focus analysis. Our objective is to determine depth from two images on planes Il and I~ by est;m-t;ng the plane of best focus I~ for each scene polnt P.
4. Image Sensing: The two images used for shape recovery are of course discrete. The r~l~t;nn~h;r between the cnnt;nnnll~ image formed on the sensor plane and the array discrete image used in computations is determined by the shape and spatial arrangement of sensing ~ 36 (pixels) on the image ~tpctnrs 32, 34. As will be ~hown, the final ;ll n~t;on pattern 14 will inclnde Pl~ ~nt~ that are comparable to the size Px, py of each pixel on the sensor array.
Therefore, an accurate model for image sensing is e~sential for ;llllm;n~tion nrt;m;7~tjnn 5. Focus Operator: The relative degree of defocus in two images is estimated by using a focus operator.
Such an operator is typically a high-pass filter and is applied to ;screte image P~ . Interestingly, the W096/41304 2 1 q 6 5 6 3 , ~

optimal ;11 n~tion pattern is also ~L~n~l "I on the parameters of the focus operator used.
All the above factors together determine the relation between the depth d of a scene point P and its two focus measures. Therefore, the optimal ~ min~t;on grid 14 ig viewed as one that ~-~;mi7 the sensitivity and robustness of the focus measure function. To achieve this each is modeled in spatial as well as Fourier domains. Since we have used the t~ler~ntric lens (Figure 3) in our impl: ~t;nn, it~s parameters are used in developing each model.
However, all of the following expressions can be made valid for the classical lens system (Figure 2) by f : di simply replaclng the factor , by a Ill_ n~tion Pattern Before the parameters of the illumination grid 14 can be ~t~rmi n~, an ill n~tion model must be defined. Such a model must be flexible in that it must subsume a large enough variety of possible ;11 ;n~tion ao patterns. In ~f;n;ng the model, it is meaningful to take the characteristics of the other r~ ~nn~nt~ into cnnR;~r~tinn. As we will describe shortly, the image sensors 32, 34 used have rectangular sensing ~
36 arranged on a rrrt~n~l~r spatial array as shown in Figure 2A. With this in mind, we define the following i11 nmi n~tion model. The basic building block of the model is a rectangular ill in~ted patch, or cell, with uniform intensity:
iC(x,y) =iC~x,y; bx~by) =2II ( bl x, b Y) ~ 12) where, ~II() is the two-~i inn~l Rectangular function. The unk~own parameters of this illnmin~tion cell are bX and i~, the length~and width of the cell.

_ Wo96/41304 -17- p ~ 2 ~ 95563 This cell is assumed to be repeated on a two-dimensio~al grid to obtain a periodic pattern as shown in Figures 4A and 4B. This periodicity is essential since our goal is to achieve spatial invariance in depth accuracy,~ i.e. all image regions, irrespective o_ their distance irom each other, must pos6elss the same textural characteristics. The periodic grid i~ de~ined as:
~ ,y) i~(x,y; tX,ty) =2III( 2( 1 x+ 1 y~ x 1 )) where, IIII() is the 2-dimensional Shah fnn~t;nn, and 2tX and 2ty determine the periods o~ the grid in the x and y directions. ~ote that this grid is not rectangular but has vertical and horizontal symmetry on the x-y plane. The +inal ;ll ;n~t;~n pattern i(x,y) is ~btA;n~ by convolving the cell iC(x~y) with the grid ig(x~y) i(X,y)=i(X,y; bx~by~tx~ty) =iC(x~y) ~ig(X,y) (4) The exact pattern is therefore ~t~rm;n~d by +our parameters, namely, bx, by, tx and ty. The above ~1nm;n~t;~n grid ig not as restrictive as it may appear upon i~ltial ~n~p~rtj~n. For instance, bx~ by, 2tX and 2ty can each be stretched to obtain repeated ~l1nm;n~tion and non-il1-lmin~tion stripes in the horizontal and vertical directions, respectively.
~lt~rn~t;vely, they can also be adjusted to obtain a ~h~k~rh~rd ;llnm;n~t;~n pattern with large or small ;11 n~t~ patches. The exact values for bx, by, tx and ty will be evaluated by the optimization procedure described later. I~ practice, the ;ll nAt;~n pattern determined by the optim;7~t;nn is used to f~hr;~te a screen with the same pattern.
The opt;m;~ti~n ~L~ceduL~ requires the analysis o~ each -nt c~ the 8ystem in spatial domain as . . ~ . ~., WO96/41304 PCT~S9~07890 2 1 96563 , 8' ' '" ' well as frequency domain (u,v). The Fourier tran3forms of the ;~ m;nation cell, grid, and pattern are denoted as Ic(u,v), Tq(u,v), and I(u,v), respectively, and found to be:
r (u v) ~ Ic(u~v; bD~ by) = bs ~b5~ by (b (u~ v) = I~(u~ v; t=~ ty) = 2lIl((tsu + tl~v)~ (tru--tvv)) (~

I(u~ v) = I(u~ v; bS, by~ tS~ ty) = Ic(u~ v) I9(u~ v) O~tical Tr~r~f~r F~m~t;nn Ad~acent points on the illl n~t~ surface reflect light waves that interfere with each other to produce diffraction effects. The angle of diffraction increases with the spatial frequency of surface texture. Since the lens aperture 26 of the imaging system 25 (Fl~ure 2) i8 of finite radius a', it does not capture the higher order ~;ffr~tionR r~ te~ by the surface. This effect places a limit on the optical r~nl-~tion of the imaging system, which is characterized by the optical transfer function lOTF):

O(U,V) = O(u,lli a/ f) (s) = 1 ($) (7--sin~~ fi~< zr' l~~ v~ J
where~=2cos~'(~

where, (u,v) is the spatial frequency of the two-n~l surface texture as seen from the image side of the lens, f is the focal length of the lens, and A
is the wavelength of ;n~ n~ light. It is clear from 2C the above expression that only spatial frequencies ~ W096/41~.04 ~ .f~,r~
2 1 96563 ,~

below the limit ~f will be imaged by the optical system (Figure 5). This in turn places restrictions on the frequency of the ;1l ;nFt;r~l pattern. Further, the above fr~lr~nry limit can be used to "cut off'~ any desired r~mber of higher harmonics produced by the ~ ;n-~t;on pattern. In short, the OTF is a curse and a blessing; it limits the ~r~tr-~rtAhle range of frequencies and at the same time can be used to 7n;n;7n;7e t~e detrimental effects of ;1;A~;ng and high-order ~ ;rA.

Pefocus~inr~
~ ,7~r~rr;nrJ to Figure 3, a i8 the distance betweenthe focus image plane Ir of a surface point P and its defocused image formed on the sensor plane Il. The light energy radiated by the surface point and cr7l1~rt~d by the imaging o~tics is nn;fornly distributed over a circular patch on the sensor plane.
This patch, also called the p~llbox, is the defocus function (Figure 7~:

7(T y) = h(r~y;cr~a~f) = 21rar2C~ (2ac~) (s7 where, once again, a' is the radius of the t~l~r~ntric lens d~eLLuL--. In Fourier domain, the above defocus fllnrt;rn is given by:

I(U~V) = H (u,v;cr~a'~f) = ~a~ ~ 5J (2~ra~r ~ ) (10 where J1 is the first-order Bessel function. As is evident from the above expressior., defocus serves as a low-pass filter. The bandwidth of the filter increases a~ a x decreases, i.e. as the sensor plane Il gets closer to the focus plane I~ In the extreme case of = O, ~(u,vJ passes all frequencieg without attr~nn tirn .~ ; .~ . . .- ;

Wo96/4l304 2 1 9 6 5 6 3 r~

producing a perfectly focused image. Note that in a defocused image, all frequencies are attenuated at the same time. In the ca3e of passive depth from focus or defocus, this poses a serious problem; different frequencies in an unknown scene are bound to have different (and unknown) magnitudes and phases. It is al;ff;rnl~ therefore to estimate the degree of defocus of an image region without the use of a large set of narrow-band focus operators that analyze each Erequency in ;qrl~t;rn This again ;n~;ratrR that it would be desirable to have an ;ll n~t;nn pattern that has a single la n~nt frequency, rn~hl;ng robust estimation of defocus and hence depth.

~maqe Senslnq 15 We assume the image sensor to be a typical CCD
sensor array. Such a sensor can be modeled as a rrct~n~l ~r array 32, 34 of rectangular sensing rl~ 'q 36 (pixels). The quantum rff;r;rnry of each sensor 36 is assumed to be uniform over the area of the pixel. Let m(x,y) be the rr,nt;nnnllq image formed on~
the sensor plane. The _inite pixel area has the effect of averaging the crnt;nnrnq image m(x,y). In spatial domain, the averaging function is the rect~n~l ~r cell:

~2~y) = ~SC( tyiwr1wy) = ~ It--y) ( r Y

where, wx, and wy are the width and height of the said 2~ sensing element 36, respectively. The discrete image is obtained by sampling the convolution of m(x,y) with (x,y). This sampling function is a rectangular grid:

9( 1 Y; P=~ Pv~ Ov) IIl(p~ (Y -- YJv)) ( 1'~

where, Px and py are spar;ngq between discrete samples in the two spatial dimensions, and (~ y) is phase shift of the grid. The final discrete image is therefore:

~ Wo9~41304 PCT~S9~07890 2 196563 ' - ~
r~L( I ~ Y~ = (.7~( 1, y~ * m( ~ Y)) D~ . Y) ( 13 The p- ~rs wx, wy, p~, and py are all ~t~rmin~ by the particular image sensor used. These parameters are therefore Ymown and their values are substituted after array opt;m;7rt;c-n is done. On the other hand, the phase shift (~ y) of the sampiing function is with respect to the ;11 'n ~t;on pattern and will also be viewed ag ;11nm;n~t;nn F-- t~rs during opt;m;~t;~n To r~rorJn; ~r the importance of these phase parameters one can visualize the variations in a discrete image that arise from simply tr~n~1~t;ng a high-frequency illl n-tir~n pattern with respect to the sensing grid.
In Fourier domain, the above averaging and sampling fl1n~t;rtn~ are:
r.in(-rw~u) sin~wyu) ~(u,~= SC(u,v;w~,wj)= w;, ~w~ (14 ~rwru ~WyU

~,7(U, v) = Sg(ut v; pz~ pyt ~7z7 ~7~ 15 = 21II(pzu~pyv)~ 2~ u+~ v) The ~in,1 discrete image is:
~fL(U~ V) = (SC(U, V) ' M(U, V)) * SD(U~ V) (16 1~ Focus O~erator ~ ince de~ocusing has the ef~ect of suppressing high-frequency c ~ '~ in the focused image, it desirable that the focus operator respond to high frequencles in the image. For the purpose of ;11 n~tion cpt;mi7~ti~n we use the T.~p1~ n.
However, the derived pattern will remain optimal for a large class of symmetric focus operators. In spatial domain, the digcrete T.~pl~ri ~n i8 WO96141304 PCT~S95/07890 T~y) = I(~,y; qr~ qli) ~17 40(T) ~ ~i(y) -- [~j(T) ~(y qy) + ( ) (Y q") +~ qs~ ~5(Y) + ~5( + qs) ~(Y)]
Here, ~, and qy, are the ~pa~;ngq between neighboring ~1~ ' R of the diacrete rAp1~;An kernel. In the opt;m;7At;rn, these sr~c;ngq will be related to the ; l l n~t; on parameters. The Fourier t~ansform of the discrete T.AP1 A~i An is:
(u,v) = L(u,v;qr,qv) (18 = 2(1-cos(2~q~u))*~(u)+ 2(1 - cos (2~q~v)) * ~(v) ~--2 COB (2trqSu) --2 cos (2rq~u) The required discrete nature of the focus operator comes with a price. It tends to broaden the bandwidth of the op~r~tnr~ Once the pattern has been determined, the _bove filter will be tuned to maximize sensitivity to the fnnfl ~Al ;ll lnAt;nn frequency while m;n;m;7;n~ the effects of spurious frp~l~n~;pn caused either by the scene's inherent texture or image noise.

Focus Measure The focus measure is simply the output of:the focus operator. It is related to defocus ~ ~and hence depth d) via all of the c _ -nt ~ modeled above. ~ote that the illl ;n~t; nn pattern~ *i~) is projected through optics that is similar to that used for image formation. Consequently, the pattern is also subjected to the limits imposed by the optical transfer function o and array defocus function h. There~ore, the texture projected o~ the scene is:
i(T~ y; bS, b", tJ, t~) * O(I, y; a, f) * ~ y; ~ a ~ f) ( 19) where, ~' represents defocus of the ;llnm;nAt;nn itself that depends on the depth of the ;11 nm; n~t~ point.
However, the ;11 nAt; nn pattern once ; nrifl~nt on a ~ W096/41304 PCT~S9~07890 2196563 a 3 '~
sur~ace patch plays the role of surface texture and hence de~ocus ~' o~ ill ;nAt;~n does not have any sign;f;~nt effect on depth est;r-t;nn. The projected texture is reflected by the scene and projected by the optics back onto the image plane to produce the discrete image.

{i(:~,y;bz,bv,tr,tv)~o(r,y;a',f)'2~h'(~,y;~',a',f)*h(T,y;~,a',J) (20) *~5C(~y;2~l~wv)} ~ 5g(7~YiPs~Pv~0=~50v) where, oA2 = o ~ o. The final focus measure function g(x,y) is the result of applying the discrete T.ArlAr;~n to the above discrete image:
T, y) = ~, (i (T, y; bs~ by~ tr~ tv) $ o~ r~ y; a, f ) (2 1 ~h'(I,y;cr~,a~lf) ~ h'(~y;c~,a',f~s~(7,y;w5~wv)) 5~/(2~ y; pr~ Pv~ s- Yv) } 'I' l(2~ y; qr~ QV) = {(i~o'2~1 2~5C)-lo}~l Since the distance between adjacent weights of the T,~rl ~ n kernel must be integer multiples of the period o~ the=image sampling function ~, the above expression can be rearranged as 2~y) -- (i * o'~ * h~ * h * sc * 1) ~ sg (2 = Yo SQ
where, gO = ~ ~ oA~ ~ h'~ h ~ 80 ~ l. The same can be expressed in Fourier domain as:
G(u, v) = (I o2 ~ H' H Sc L) ~ 59 (23) Go ~ 5~

The above expression gives us the final output of the focus~operator ~or any value of the defocus F~ a_ It will be used in the following section to determine the optimal ;ll nm; n~t; nn pattern.

W096/41304 P~~
21 96563 a~
o~tim;7At;on In our impl At;~n, the ;~ m;n~t;on grid projected on the object 22 using a high power light aource and a t~1~r~ntric lens identical to the one used to image the scene. This allowa ua to assume that the projected ;11 rAt;~n ia the primary cause for surface texture and is ~Lully~r than the natural texture of the surface. Conse~uently our results are applicable not only to textureless surfaces but also textured ones.
The ;11 n~t;on opt;m;z~ti~n problem is full lAt~ as follows: Establish closed-form relationships between the ;11 n~t;~n parameters (b~, byl tx, ty)l sensor ~a~ t~rs (wx, wy, ~x, Pyl ~x, ~y~/ and discrete T~rlA~;An parameters (9~, qy) so as to maximize the sensitivity, robustness, and spatial resolution of the focus measure g(x,y). High sensitivity implies that a small variation in the degree of focus results in a large variation in g(x,y). This would ensure high depth estimation accuracy in the presence of image noise, i.e. high signal-to-noise ratio. By robustness we mean that all pixel sensors 36 with the same degree of defocus produce the same focus measure ;n~r~n~nt of their ]o~At;~n on the image plane. This ensures that depth estimation accuracy is invariant to lo~at;~n on the image plane. ~astly, high spatial resolution is achieved by m;n;m;7;ng the size of the focus operator This ensures that rapid depth variations (surface nt;n-l;ties) can be detected with high accuracy In order to minimize smoûthing effects and ~-~;m; ~e spatial resolution of computed depth, the support (or span) of the ~ r~t~ Laplacian must be as small as possible This in turn requires the frequency of the ;11 ;n~tion pattern be as high as possible.
However, the optical transfer function described in section 4.2 imposes limits on the highest spatial fre~uency that can be imaged by the optical system.

~ wo 96141304 r ~ ~ . j r~ ,s ~S
This m~imum allowable frequency i8 Af ~ determined by the numerical aperture of the t~l~rPntric lens.
Since the ;11 'n~t;nn grid pattern is periodic, its Fourier transform must be ~iscrete. It may have a zero-fnpqu~nry _l~n~nt, but this can be safely ignored since the T~rl~r;~n operator, being a sum of second-order derivatives, will eventually remove any zero-fnP~l~nry : _ -nt in the ~inal image. Our objective then i8 to m ~imize the fnn~ ~l spatial frequency tl/tX, l/ty) of the illnm;n~t;~n pattern. In order to m~imize this freguency while r-;n~t~;n;ng high ~tert~h-l;ty, we must have ~(l/tx) + (l/t~ close to the optical limit Af - This in turn pushes all higher ~,~ A in the ill~~m;n~t;~n pattern outside the optical limit. What we are left with is a surface texture whose image has only the quadruple fnn~ ~l fr~ nr;~r (~l/tx~ ~ l/ty)~ As a result, these are the only frequencies we need consider in our analysis of the focus measure fnnct;~n G(u,v).
Before we c~nR;~r the final measure G(u,v), we examine Go(u~v) the focus measure prior to image ! ~ 1; ng For the reasons given above, the two-dimensional Go (u, v) i5 reduced to four discrete spikes at (l/tX, l/tyj, (l/tX, -l/ty), (-l/tX, l/ty) and (-l/tX, -l/ty)~ Since all r ~ ~ (I, o, ~, 5~ and ~) of Go ~ are reflectio~ ~y trir about u = O and v = O, we have:

~(--,--) = Go(--,---) = Go~---,--) = G~(--1,--1 ) (2 ts ty t3: ty ts ty s y where wo 96141304 2 1 9 6 5 6 3 ~ ~ ~ PCT~S9V07890 Go~ t ~t ) -- I(t ~t ibr7 by~ tr7tv) o2( t ~tl ;a~J) (25 ( t ~ t; ~ a', f ~ H( t ~ t; ~, a~, f ) S'(t ~ t ;ws~ W9) L(t ~ t; q '59)' Therefore, in frequency domain the _ocus measure function prior to image ~ _l;ng reduce3 to:
Go(u~v) = G~(t ~ t ) (26) {~(u--t ~v ~ t ) +~i(u + t ~u--t ) +~i(u----~v+--~+~5(u+ t ~v+ t )}

The fllnrt;~n gO(x,y) in image domain, is aimply the inverse Pourier transform of Go(u,v):

go(I~y) = G~(t' t ) ~4cos2~ ~-cos2~t Yl (27 Note that gO(x,y) is the product of cogine fl~n~t;~n~
weighted by the co~ff;~;~nt Go(l/tx, l/ty)~ The defocus function h has the effect of reducing the co~ff;ri~nt Go(l/tX, l/ty) in the focus measure gO(x,y). Clearly, the sensitivity of the focus measure to depth (or defocus) is opt;m;7~ by m-~;m~7;ng the ~ff;~;~nt .
Gofl/tX, l/ty) with respect to the unknown parameters of the system. This opt;m; 7~t;on procedure can be summarized as:

~tr ~(tr'tv) ' ~tv ~(tr'tv) ' (2~) ab G~(t ~ t ) = ~~ ~b G~(t--' t ) = ~~ (29) - Go(-~-)=0~ - Go(-~-)=0~ (30) ~qr tr ty ~7qu tr tv ~ Wo96/4l304 2 1 9 6 5 6 3 t ~ 9 PC~U595~7890 Since tx, and ty show :p in all the ,lullenL8 in ~25), the first two partial dêrivatives (~ t;nn (28)) are ~;ff~rlllt to evaluate. Fortunately, the clerivatives in (29) and (30) are sufficient to obtain r~l~t;nn~ between the system ~ rS. The iollowing rêsult --~;m~7~ sensitivity ~n~ gpAtial r~nlnt;nn of the iocu8 measure g(x,y):
br = 2t~ ~Y = 2tY ~31) qs = 2t-~ qY = 2tY (32, Next, we examine the spatial robustnesa o~ g(x,y).
Imagine the imaged surface to be planar and parallel to the image sensor. Then, we would like the image r ,l;ng to produce the same absolute value of g(x,y) at all ~ r~t~ sampling points on the image. This entails relating the ;ll n~t; nn and sensing p~ rg so as to f~;l; tAte careful sampling o_ the product of coaine flln~t;nnq in (27). Note that the final ~ocu~ measure is:

(2,y) = go ~0 = G~(t ,t)-{4cos2~t 2-cos2~t y} (33 -lIll(pl ~2 - yz)~ pl~V - Y~Y)) All samples of g(x,y) have the same ab601ute value when the two cosines in the above êxpression are sampled at their peak values. Such a sampling is pn~5;hl~ when:
pr = 2t'~ PY = 2tY (3~) and ~ = 0, ~y = 0 (35) W096/41304 2 1 9 6 5 6 3 ~ ' ~' " P~

Alternatively, the cosines can be sampled with a period of ~r/2 and phase shift of ~T/4. This yields the second solution:
pr = 4tr~ py = 4tv7 (36) ~r = +8tr~ Y~y = +8tV- (37) The above er~uations give two solutions, shown in Figures 4A and 4B both are r~k~.l,o~ d ;7l~l~;nAtinn patterns but differ i~ their flln~ ~1 freriuencies, size of the illumination cell, and the phase shift with respect to the image sensor. Equations (31), (32), ~34), (35) yield the grid pattern shown in Fi~ure 4A.
In this case the grid image and detector are registered with zero phase shift, and the image of the ;llllm;n~t;~n cell hag the same size and shape as the sensor ~ (pixels). The second rnllltin~, shown in Figure 4B, is nht~;nP~ using the e _linrJ solutions (36) and (37), yieldirg a filter pattern with ;llllm;n~t;nn cell image two times the size of the sensor element and phase shift of half the sensor element size.

~uned FQCU8 O~erator For the purpose of illl n~t;~n optimization, we used the T~rl~r;~n op~r~tnr~ The resulting ~ ;n~t;nn pattern has only a single ' r~nt absolute Lre~u~n~y, (l/tx, l/ty)~ Given this, we are in a position to further refine our focus operator so as to m;n;m;7e the effects of all other fr~r~ nr;~c caused either by the physical texture of the scene or image noise. To this end, let us consider the properties of the 3x3 discrete Laplacian (see Figure 6A and 6B~. We see that though the T.~pl ~r;~n does have peaks exactly at (1/tx, l~ty), (l/tX, -l/ty), (-l/tX, l/ty) and (-l/tx, -l/ty), it has a fairly broad bandwidth allowing other WO 961413~14 2 1 9 6 5 6 3 a~
spurious frequencie3 to contribute to the focus measure G in (23), as shown in Figure 6B. ~ere, we aeek a narrow band operator wlth sharp peaks at the above four coordinatea in frequency apace.
Given that the operator muat eventually be discrete and of finite aupport, there ia a limit to the extent to which it can be tuned. To constrain the problem, we impose the following conditiona. (a) To maximize apatial r~nll~t;nn in computed depth we force the operator kernel to be 3x3. (b) Since the f ' ~l frequency of the ;l1 'n~t;nn pattern has a symmetric guadruple dLL~ly , the focus operator must be rotAt;nn~lly symmetric. These two conditiona force the operator to have the atructure shown in~5 ~igure 6~ ~ (c~ The operator must not re3pond to any DC
in image brightnesa. This last rnn~;t;nn ia ~At; ~; e~ if the aum of all Pl ~ of the operator equals zero: ~
a + 4b + 4c = O (38) It ia also imperative that the response Lfu,vJ of the~0 operator to the ~ ' ~1 frequency not be zero:

( 1 1 ) = a + 2b(cc~27rq.t + co~27rq~tJ ) Given (32), the above reduces to:
a - 4b + 4c ~ O . (40) Expresaions (38) and (40) imply that b ~ 0. Without loas of generality, we aet b = ~ ence, (38) gives a 4 (1 -C) . Therefore, the tuned operator is ~t~rmi n~
by a single unknown rA -t~r~ c~ ag ghown in Figure 6D. The problem then is to find c such that the operator~s Fourier transform has a sharp peak at (1/ty~
1/~). A rough measure of sharpness is given by the :; :

WO96141304 PCT~S95/07890 ~
2 1 965~3 , ~
3~
second-order moment of the powerl1~(u, v)211 with respect to ( l~tX, l/ty):
L(~ 12 ¦ O ¦ O [(u _ ~ )2 + (v _ ~ )2] 1¦ L(u-t,v-~ du ( ~1) 7~ 201r2c2 + 6c2 + 48C -- 327r c + 201r2 -- 93) The above measure is minimized when ~c = ~, i.e.

when c = O.658 as shown in Figure 6E. The resulting tuned focus operator has the respon3e shown in Figure 6F, it has subst~nti~lly sharper peaks than the di6crete T.~pl ~r; ~n . Given that the operator is 3x3 and discrete, the sharpne33 of the peak3 is limited. The above derivation brings to light the ~1 ' nl di~~erence between ~P~ign;ng tuned operators in ~nt;nn~ and discrete domains. In general, an operator that i3 deemed optimal in ~nnt; nn~ll~ domain is most likely cub-optimal for discrete images.

DePth from Two Imaqes Depth e3t; r-t; ~n uses two images of the scene I1(x, y) and I2(x, y) that corre3pond to different e~fective focal lengths as shown in Figure 3. Depth of each scene point is ~Ptprm;no~ by e~t;r-t;nS the ~;~pl~1 a of the ~ocused plane If for the scene point. The tuned focus operator is applied to both ~
images to get focus measure images g1(x, y) and g2fx,y).
From (33) we see that: ~

g1(Z y) G~(ttrl~J1~) (42) 92(~,Y) Gb(t~
From (23) we see that the only factor in G~ a~fected by parameter a is de~0cu3 ~unction H.

~ WO96/41304 ~ 96563 ~ ~/US95/07890 91(2~y~ H(~t ~ s~ ) 43 92(:1:.y) H(~,, sl i'~--13) ( ) Note that the above measure is not bounded. Thi3 poses a problem from a c _- ~t;nn~l viewpoint which is easily L~ -~;P~ by using the following norm-1;7~t;nn:

91(2~Y)--g2t2~y) ~ )--H(sl ~ t~ d) ~71(2,y) + 92~2,y) ~(~ I) + H(~ - d) As shown in Figure 7, g is a monotonic function of such that -p c q ~ p, p ~ 1. In ~ractice, the above relation can be pre- _ Pd and stored as a look-up table that maps q _ e~ at each image point to a unique ~. Since a represents the position of the focused image, the lens law ~1) yields the depth d of the ccrrp~pnn~; ng scene point. Note that the tuned focus operator ~P~;gnPd in the previous section is a linear filter. making it feasible to compute depth maps of scenes in real-time using simple image processing hardware.

~ Real Time Ranqe 5ensor Based on ,the above results, we have ; _1~ P~
the real-time focus range sensor 25 shown in Figure 2.
The scene is imaged using a standard 12.5 mm Fujinon lens 28 with an ~;t;nnAl l~eLLuL~ 26 added to convert it to telecentric. Light rays passing through the lens 28 are split in two directions, using a beam-splitting prism 30. This produces two images that are simultaneously detected using two Sony XC-77-RR 8-bit CCD cameras 32, 34. The positions of the two cameras are precisely fixed such that one obtains a near-focus image while the other a far-focus image. In this setup a physical displ~ of 0.25mm between the effective iocal lengths of the two CCD cameras translates to a ~ensor depth of iield o~ apprn~-tPly 30 cms. This detectable range of the sensor can be varied elther by . , ~

W096~4l304 _ PCT~S95/07890 2 1 9 fJ 5 ~ 3 3a~
changing the sensor displacement or the focal length of the imaging optics.
The ;~ min~tion grid shown in Figure 4B waa etched on a glass plate using microlithography, a process widely used in VBSI. The grid 14 was then placed in the path of a 300 W Xenon arc lamp. The ~llllm;n~t;nn pattern generated is projected using a telecentric lens 16 ;A~ntir~1 to the one used for image formation. A half-mirror 20 is used to ensure that the ;11 n~t;rn pattern projects onto the scene via the same optical path 24 used to acquire images. As a result, the pattern is almost perfectly registered with respect to the pixels 36 of the two CCD cameras 32, 34.
Furthermore, the a~ove ~ _ ensures that every scene point that i9 visible to the sensor is also ;1lnm;n~ted by it, avoiding shadows and thus nArtert~hle regions.
Images from the two CCD cameras 32, 34 are digitized and processed using MV200 Datacube image processing hardware. The present configuration includes the equivalent of two 8-bit digitizers, two A/D convertors, and one 12-bit convolver. This hardware enables simultaneous digitization of the two images, convoIution of both images with the tuned focus operator, and~the ~ ~t;rn of a 256x240 depth map, all within a single frametime of 33 msec with a lag of 33 msec. A look-up table is used to map each pair of focus measures (gl and g,) to a unique depth estimate d.
Alternatively, a 512x480 depth map can be computed at the same rate if the two images are taken in s~lrr~a~irn. Simultaneous image acquisition is clearly advantageous since it makes the sensor less sensitive to variations in both illnm;natirn and scene structure between frames. With minor additions to the present processing hardware, it is easy to obtain 512x480 depth maps at 30 ~z usi~g simultaneous image grabbing. Depth maps produced by the sensors 25 can be displayed as wireframes at frame rate on a DEC Alpha workstation.

Variations In The Preferred Embodiments
One variation of the sensor 10 addresses the fact that the defocus effect is a function of the chromatic content of the illuminating light. Most lenses have slightly different focal lengths for different light wavelengths; accordingly, the accuracy of determination of depth from defocus can vary with the spectral characteristics of the illumination and the color of the reflecting surface of the object, since depth determination relies on prior knowledge of the focal length f of lens 28. This source of error can be avoided by providing a spectral band-pass filter 38, shown in Figure 2, to allow only certain wavelengths of reflected light to be imaged. A band-pass filter would limit the range of the wavelengths to be imaged and thereby limit the chromatic variation in the focal length of the lens. Other possible locations for such a filter are shown in Figure 8 at 38', 38'' and 38'''.
In the case where the illumination source 12 is a laser, the filter is preferably narrow band, passing the laser frequency and eliminating most ambient light, thereby both eliminating the effects of chromatic aberration of the lens and texture variations from ambient light not resulting from the projected grid pattern.
In multicolor scenes with non-overlapping spectral characteristics, the pass-band of the spectral filter may be changed or controlled to use an appropriate pass-band to measure depth in different object areas.
For this purpose an electrically controllable filter, or a filter wheel 101, shown in Figure 9, may be used.
In some instances objects to be mapped may include surfaces or structures that provide specular reflections as well as diffuse reflections. Specular reflections can produce negative effects in a depth from defocus measurement system. First, specular reflections tend to saturate the image sensors 32, 34, whereby focus and defocus information is lost. Second, the depth from defocus values derived from specular reflections represent the depth of the reflected source, not the reflecting surface. Finally, if the normal at a specular surface point does not bisect the illumination direction and the optical axis 24, then the surface point will not produce reflections of the illumination light in the direction of the sensor.
When required, polarization filters, as shown in Figures 10A to 10D, can be used to remove the effects of specular reflection from the sensor images. In Figure 10A a polarizing filter 44 polarizes the illumination light in a vertical direction indicated by arrowhead V.
Specular reflections would therefore have primarily vertical polarization and would be filtered by horizontally polarized filter 42 arranged to provide horizontal polarization H in the sensor imaging system.
An alternate, illustrated in Figure 10B, uses a vertically polarized laser source 45 which is projected onto grid 14 by lens 47 to provide vertically polarized illumination. A polarizing filter 42 protects the imaging optics from specular reflections. Another alternate, shown in Figure 10C, uses the polarizing effect of a prism semi-reflective beam splitter 46, which causes vertically polarized illumination V to be reflected toward the object, but allows horizontally polarized reflections H to pass to the imaging optics.
A final arrangement, shown in Figure 10D, uses a vertical polarizer 48 followed by a quarter wave plate 50 to produce circularly polarized light. Illumination light passing through polarizer 48 becomes vertically polarized and is converted to right-hand circular polarization by circular polarizer 50. Specular reflections, which have left-hand circular polarization, are converted to horizontal polarization by polarizer 50 and are filtered out by vertical polarizer 48.
Diffuse reflections include right-hand circular polarized components that are converted to vertical polarization by polarizer 50 and pass polarizer 48 to the sensor system.
As described with respect to the preferred embodiment, the illumination patterns shown in Figures 4A and 4B include a single fundamental spatial frequency in the x and y coordinates, with harmonic frequencies outside the limits imposed by the optical transfer function. It is, however, possible to use illumination grid patterns that have multiple measurable spatial frequencies within the limits of the optical transfer function. One such multiple frequency grid pattern is shown in Figure 11A, wherein two checkerboard grids, one with twice the spatial frequency of the other, are superimposed. The resulting sensing of the defocus function, Figure 11B, can be filtered in the frequency domain by tuned filters to result in multiple tuned focus operators that detect power variations for different defocus frequencies on a pixel by pixel basis, as shown in Figures 11C and 11D. The defocus discrimination functions q for sensitivity of depth from defocus are shown in Figures 11E and 11F respectively. The high frequency defocus function yields greater depth sensitivity, but reduced range. The lower frequency defocus function yields lower depth sensitivity, but increased range. Accordingly, using the multiple frequency grid of Figure 11A can provide variable resolution depth from defocus.
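One plausible way to exploit the two channels, sketched below in Python, is to let the low-frequency channel provide a coarse, wide-range depth estimate and refine it with the high-frequency channel where that channel is usable. The thresholds, table names, and selection rule are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def variable_resolution_depth(q_low, q_high, lut_low, lut_high, fine_range):
    # lut_low and lut_high are (q_samples, depth_samples) calibration pairs
    # for the low- and high-frequency tuned operators (assumed names).
    d_coarse = np.interp(q_low, lut_low[0], lut_low[1])
    d_fine = np.interp(q_high, lut_high[0], lut_high[1])

    # Use the fine (high-sensitivity, short-range) estimate only where the
    # coarse estimate says the surface lies inside the fine channel's range.
    near, far = fine_range
    use_fine = (d_coarse >= near) & (d_coarse <= far)
    return np.where(use_fine, d_fine, d_coarse)
```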
Still other grid patterns are shown in Figures 12A and 12C, with their respective frequency responses in Figures 12B and 12D. The pattern of Figure 12A has dual frequency response characteristics similar to the pattern of Figure 11A, but using a different pattern arrangement. The grid of Figure 12C has different frequency responses in the x and y coordinates.

Aperture Variation
The apparatus and method described thus far use two sensor images taken at different distances from the imaging lens to generate different amounts of defocus in the images. It is also possible to provide images with different defocus by using different aperture sizes or shapes for the two images, which are formed on substantially the same image plane with respect to the imaging lens. It is well recognized, for example, that a small aperture opening will cause less defocus effect in an image than a large aperture opening.
One approach to aperture variation is to use the apparatus of Figure 2, eliminating beam splitter 30 and sensor array 32. A first image of an object 22 is formed on sensor array 34 with a first setting of aperture 26 and a second image is sequentially formed using a different setting of aperture 26. Preferably a neutral density filter is used with the larger aperture setting to compensate for the greater amount of light.
The variation in the defocus factor between the two aperture settings can then be used to determine the depth of an image element by defocus.
Another approach, shown in Figure 13, provides a beam splitter 57 which is external to a pair of sensor units 60, 62. Units 60, 62 have identical sensor arrays 64, 66 and lenses 68, 70. Unit 60 has a small aperture opening 72 while unit 62 has a large aperture opening 74 and a neutral density filter 76 to compensate for the increased light from the larger aperture. Alternately, in either a sequential or simultaneous aperture-based arrangement, two apertures having similar transparent area size but different shape, such as apertures 110 and 112, shown in Figure 14, can be used. The difference in aperture shape changes the optical transfer function, i.e., depth of focus, while maintaining the same image brightness.

Registration
While those skilled in the art will recognize that calibration of the system 10 of Figure 2 can be achieved by aligning the sensor arrays 32, 34 with the image grid as projected onto a plane surface located at the field focal plane of the sensor, it is also possible to compensate for mis-alignment in the computation of the defocus function. Misregistration of an illumination pattern of the type shown in Figure 4B with respect to a sensor array is shown in Figure 15, wherein the illumination pattern is mis-registered by amounts Δx and Δy from the nominal phase offset values in x and y given by equation (37). In this case the output of each sensor element in the misaligned sensor will have an error factor of the form cos Δφx · cos Δφy (where Δφx and Δφy are the phase errors corresponding to Δx and Δy), which will cause a depth map error.
It is possible to compensate for this alignment error by applying an additional operator to the convolved image, taking the sum of squared data of the convolved image at four adjacent elements which correspond to the phase shifts of (φx, φy) = (0, 0), (0, π/2), (π/2, 0) and (π/2, π/2). This results in a power measurement that can be directly applied to a power look-up table for defocus, or can be modified by the square root function before being applied to the normalization or look-up table.
In the case of a one dimensional pattern (stripes) it is only necessary to apply the above procedure to two adjacent element points in the direction transverse to the pattern stripes.
It is also possible to numerically construct two tuned operators which produce focus measure data whose phases differ by π/2 (sine and cosine). In the case of the two dimensional pattern, it is likewise possible to numerically construct four tuned operators which produce focus measure data whose phases differ by (φx, φy) = (0, 0), (0, π/2), (π/2, 0) and (π/2, π/2). These convolved images can be combined to calculate the sum of squares at positions corresponding to the image elements to get a focus measure that is independent of alignment phase in either one or two dimensional grid patterns.
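A minimal sketch of the first, single-operator variant is given below, assuming an illumination pattern whose period on the sensor is four pixels so that adjacent samples are π/2 apart in phase; summing the squared filter response over a 2x2 neighborhood then gives a power measure that is insensitive to the registration phase. The period and kernel size are assumptions for illustration, not the patent's exact operator.

```python
import numpy as np
from scipy.ndimage import convolve

def phase_invariant_power(focus_response):
    # focus_response: image already convolved with the tuned focus operator.
    # Sum of squares over the four adjacent samples whose pattern phases are
    # (0, 0), (0, pi/2), (pi/2, 0) and (pi/2, pi/2) for a 4-pixel-period grid.
    squared = focus_response.astype(float) ** 2
    return convolve(squared, np.ones((2, 2)))
```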

Concurrent Imaging
In some applications it is desirable to have both a depth map and a brightness image of a scene. In this respect the images used to compute depth from defocus can be used to computationally reconstruct a normal brightness image by removing the spatial frequencies associated with the projected illumination. This can be achieved using a simple convolution operation to yield an image under ambient illumination. Further, since the depth of each image point is known, a de-blurring operation, which can also be accomplished as a convolution, can be applied to the brightness image so that it has the highest degree of focus at all points. In the case of coaxial illumination and imaging, the computed focused brightness image is registered with the computed depth map and may be stored in a suitable memory. This enables the use of not only fast texture mapping, but also the joint recovery of geometric and photometric scene properties for visual processing, such as object recognition. Three-dimensional texture maps may be displayed as wireframes at frame rate on a bitmapped workstation.
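As a rough illustration of the pattern-removal step, the following sketch averages each pattern-lit frame over one full grid period, which suppresses the projected checkerboard's fundamental and leaves an approximation of the ambient-illumination brightness image. The period value and the box filter are assumptions, since the text above only states that the removal can be performed with a simple convolution.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ambient_brightness(pattern_lit_img, grid_period_pixels=2):
    # Averaging over one period of the projected grid suppresses the
    # pattern's fundamental spatial frequency, approximating the image
    # that would be seen under uniform (ambient) illumination.
    return uniform_filter(pattern_lit_img.astype(float), size=grid_period_pixels)
```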
Figure 16 shows an arrangement for separate detection of brightness images in a television camera 80 and depth by sensor 25, which may be the embodiment of Figure 2. In this arrangement various filter or sequencing techniques may be used to remove the effect of the illumination pattern in the brightness image.
For example, beam splitter 82 may be formed as a selective reflector allowing frequencies corresponding to the illumination pattern to pass to depth sensor 25 and reflecting other light to camera 80.

Alternatively, filter 84 can be arranged to selectively absorb the illumination frequency and pass other frequencies, while filter 86 passes the illumination frequency to depth sensor 25. Such filtering is especially practical in the case of narrow band, e.g. laser, pattern illumination.
An alternate to using a transmission grid screen, as shown in Figures 4A and 4B, is to use a phase pattern grid, wherein there is provided a checkerboard grid of rectangular elements, with transmission phase shifted by 90° in alternate grid elements as shown in Figure 17A. This "phase grid" provides a projected pattern of alternating constructive and destructive interference as shown in Figure 17B and results in a frequency domain pattern, Figure 17C, that can be analyzed by tuned filter convolution to provide alternate separate frequency responses for defocus analysis as shown in Figures 17D and 17E. The advantage of a phase shift grid is that there is little loss of energy from the grid illumination as compared to the transmission grid pattern.
In connection with the provision of an illuminated grid pattern, as noted above, a laser is the preferred source for several reasons, including (1) narrow band illumination, providing ease of filtering and absence of chromatic aberration in detected images, (2) better control of surfaces, including lens, filter and mirror coatings, for single frequency light, (3) polarized light without loss of energy, and (4) a bright and controllable light source using low power.
Figure 18 is a flow diagram showing the determination of the image depth map from the image information received in sensor arrays 32 and 34. The image sensor data is converted to digital format, and then convolved in accordance with the methods described herein, to result in a determination of the defocus measures for each element of each image. Optionally, registration correction, as described above, can be performed in the process of arriving at defocus measures g0 and g1. The defocus measures are then combined in a point-by-point manner to determine the normalized relative blur of the two images and, using computation or a look-up table, to determine the depth of the object on a point-by-point basis, resulting in the desired depth map.
Further, while some embodiments of the invention indicate simultaneous generation of images, depending on the dynamics of the application it should be recognized that the invention can be practiced with sequentially formed images, wherein the image spacing, lens position and/or aperture are varied between images, but the object position remains constant.
While we have described what we believe to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further changes and modifications can be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes as fall within the true scope of the invention.

Claims (51)

Claims
1. A method for mapping a three-dimensional structure by depth from defocus, comprising the steps of:
(a) illuminating said structure with a _ preselected illumination pattern;
(b) sensing at least two images of said illuminated structure each of said images being formed with different imaging parameters; and (c) determining a relative blur between corresponding elemental portions of said sensed images thereby determining the depth of corresponding elemental portions of said three-dimensional structure.
2. The method of claim 1 wherein said illumination comprises illumination with a two dimensional illumination pattern.
3. The method of claim 2 wherein said illumination is with a rectangular grid having selected horizontal and vertical grid spacing of rectangular transparent and opaque elements forming a checkerboard pattern.
4. The method of claim 3 wherein said sensing comprises sensing using an array of sensing elements having horizontal and vertical array spacings that are integral sub-multiples of said horizontal and vertical grid spacing in said sensed images.
5. The method of claim 1 wherein said at least two images are formed with a telecentric lens system.
6. A method for mapping a three-dimensional structure by depth from defocus, comprising the steps of:

(a) illuminating said structure with an illumination pattern comprising a rectangular grid projected along an optical axis;
(b) sensing at least two images of said illuminated structure from said optical axis using a constant magnification imaging system, said images being sensed at least two imaging planes with different locations with respect to the focal plane of said imaging system;
(c) determining the relative blur between corresponding elemental portions of said illumination patterns in said sensed images thereby determining the depth of corresponding elemental portions of said three dimensional structure.
7. The method of claim 6 wherein said images are sensed using first and second sensing arrays of sensing elements arranged in a rectangular pattern with selected element spacing in each direction of said array.
8. The method of claim 7 wherein said rectangular grid has a checkerboard pattern with selected grid periodicity.
9. The method of claim 8 wherein said grid periodicity is selected to provide a grid image on said sensing arrays wherein said grid periodicity is an integral multiple of said corresponding element spacing.
10. The method of claim 9, wherein said grid periodicity is selected to provide a grid image with a period substantially equal to twice said element spacing, and said grid image is aligned with said array in two orthogonal directions.
11. The method of claim 9, wherein said grid image periodicity is substantially equal to four times said pixel width and said grid image is shifted on said array by one eighth of said grid image periodicity.
12. The method of claim 6, wherein said light source is a monochromatic laser light source.
13. The method of claim 9, wherein said sensing step comprises sensing at least two depth images of said scene formed by said laser light and at least one brightness image of said scene formed by ambient light, and said determining step comprises measuring a relative blur between said sensed laser light images.
14. The method of claim 6, wherein a first image is sensed at a position corresponding to a near focused plane in said scene and a second image is sensed at a position corresponding to a far focused plane.
15. The method of claim 6, wherein said illumination grid is selected so as to produce an illumination pattern which generates multiple spatial frequencies.
16. The method of claim 6, wherein said illuminating step further comprises using half-mirror optics to reflect said illumination pattern prior to illuminating said scene, and said sensing step further comprises passing said scene images through said half-mirror optics prior to sensing said scene, such that said illumination pattern and said scene images pass along a common optical axis.
17. The method of claim 6, wherein said illuminating step further comprises using polarization optics to polarize said illumination pattern prior to illuminating said scene, and said sensing step further comprises passing said scene images through polarization optics prior to sensing said scene.
18. The method of claim 6, wherein said determining step further comprises:
(i) converting said sensed images into digital signals on a pixel by pixel basis; and (ii) convolving said digital signals on a pixel by pixel basis to determine power measurement signals that correspond to the fundamental frequency of said illumination pattern at each of said pixels for each sensed scene image.
19. The method of claim 18, wherein said measuring step further comprises:
(iii) correcting said power measurement signals for mis-registration on a pixel by pixel basis, such that any errors introduced into said power measurement signals because of misalignment between said sensing pixels of said array and said illumination pattern is corrected.
20. The method of claim 19, wherein said correcting step comprises taking the sum of the squares of said measurement signal at four neighboring pixels.
21. The method of claim 18, wherein said measuring step further comprises:
(iii) normalizing said power measurement signals on a pixel by pixel basis.
22. The method of claim 18, wherein said measuring step further comprises:
(iii) comparing said power measurement signals for one of said sensed images, on a pixel by pixel basis, with determined power measurements for a second of said sensed images to determine said depth information at each of said pixels.
23. The method of claim 6, wherein said determination step comprises arranging said pixel by pixel depth information as a depth map.
24. The method of claim 23, further comprising the step of displaying said depth map as a wireframe image.
25. The method of claim 13, wherein said determination step comprises arranging said pixel by pixel depth information as a depth map, further comprising the step of constructing a texture mapped three-dimensional display from said sensed brightness image and said depth map.
26. Apparatus for measuring a three-dimensional structure of a scene by depth from defocus, comprising:
(a) active illumination means for illuminating the scene with a preselected illumination pattern;
(b) sensor means, optically coupled to said illuminating means, for sensing at least two images of the scene, wherein at least one of said sensed images is taken with optical or imaging parameters that are different from at least one other of said sensed images;

(c) depth measurement means, coupled to said sensor means, for measuring a relative blur between said sensed images; and (d) scene recovery means, coupled to said depth measurement means, for reconstructing said three-dimensional structure of said sensed scene from said measured relative blur of said sensed images.
27. The apparatus of claim 26, wherein said sensor means comprises a plurality of sensors, each sensor having X * Y pixels of predetermined width to form an X * Y sensing grid, said depth measurement means measuring said relative blur on a pixel by pixel basis over said X * Y pixel grid, such that depth information is obtained for each of said pixels within said X * Y grid.
28. The apparatus of claim 27, wherein said active illumination means comprises:
(i) an illumination base;
(ii) a light source coupled to said illumination base; and (iii) a spectral filter having said preselected illuminating pattern coupled to said illumination base, such that light from said light source passes through said spectral filter to form said preselected illumination pattern.
29. The apparatus of claim 28, wherein said preselected illumination pattern of said spectral filter is optimized so that a small variation in the degree of defocus sensed by said sensor means results in a large variation in the relative blur measured by said depth measurement means.
30. The apparatus of claim 29, wherein said optimized illumination pattern is a rectangular grid pattern.
31. The apparatus of claim 30, wherein said optimized illumination pattern comprises a pattern having a period being substantially equal to twice said pixel width and a phase shift being substantially equal to zero with respect to said sensing grid, in two orthogonal directions.
32. The apparatus of claim 30, wherein said optimized illumination pattern comprises a pattern having a period being substantially equal to four times said pixel width and a phase shift being substantially equal to one eighth of said pixel width with respect to said sensing grid, in two orthogonal directions.
33. The apparatus of claim 28, wherein said light source is a Xenon lamp.
34. The apparatus of claim 28, wherein said light source is a monochromatic laser.
35. The apparatus of claim 34, wherein said sensor means further comprises:
(i) a sensor base;
(ii) first and second depth sensors, coupled to said sensor base, for sensing depth images of said scene formed by said laser light, such that said depth measurement means measure a relative blur between said sensed laser light images; and (iii) at least one brightness sensor, coupled to said sensor base, for sensing an image of said scene formed by ambient light.
36. The apparatus of claim 26, wherein said sensor means comprises:
(i) a sensor base;
(ii) a lens, coupled to said sensor base and optically coupled to said illuminating means, for receiving scene images;
(iii) a beamsplitter, coupled to said sensor base and optically coupled to said lens, for splitting said scene images into two split scene images; and (iv) first and second sensors, coupled to said sensor base, wherein said first sensor is optically coupled to said beamsplitter such that a first of said split scene images is incident on said first sensor and said second sensor is optically coupled to said beamsplitter such that a second of said split scene images is incident on said second sensor.
37. The apparatus of claim 36, wherein said sensor means further comprises:
(v) an optical member having an aperture, coupled to said sensor base in a position between said lens and said beamsplitter, being optically coupled to both said lens and said beamsplitter such that images received by said lens are passed through said aperture and are directed toward said beamsplitter.
38. The apparatus of claim 36, wherein said first sensor is at a position corresponding to a near focused plane in said sensed scene, and said second sensor is at a position corresponding to a far focused plane in said sensed scene.
39. The apparatus of claim 38, wherein said spectral filter includes an illumination pattern capable of generating multiple spatial frequencies for each image sensed by said first and second sensors.
40. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active illumination means and said sensor means; and (f) a half-mirror, coupled to said support member at an optical intersection of said active illumination means and said sensor means, such that said preselected illumination pattern is reflected by said half-mirror prior to illuminating said scene, and such that said scene images pass through said half-mirror prior to being sensed by said sensor means, whereby said illumination pattern and said scene images pass through coaxial optical paths.
41. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active illumination means and said sensor means; and (f) a half-mirror, coupled to said support member at an optical intersection of said active illumination means and said sensor means, such that said preselected illumination pattern passes through said half-mirror prior to illuminating said scene, and such that said scene images are reflected by said half-mirror prior to being sensed by said sensor means, whereby said illumination pattern and said scene images pass through coaxial optical paths.
42. The apparatus of claim 26, further comprising:
(e) a support member, coupled to said active illumination means and said sensor means; and (f) a polarization filter, coupled to said support member at an optical intersection of said active illumination means and said sensor means, such that said preselected illumination pattern is reflected by said polarization filter prior to illuminating said scene, and such that said scene images pass through said polarization filter prior to being sensed by said sensor means, whereby said illumination pattern incident on said scene and said sensed scene images are both polarized in controlled polarization directions.
43. The apparatus of claim 27, wherein said depth measurement means further comprises:
(i) analog to digital converting means, coupled to said sensor means, for converting sensed images into digital signals on a pixel by pixel basis; and (ii) convolving means, coupled to said analog to digital converting means, for convolving said digital signals on a pixel by pixel basis to derive power measurement signals that correspond to the fundamental frequency of said illumination pattern at each of said pixels for each sensed scene image.
44. The apparatus of claim 43, wherein said depth measurement means further comprises:
(iii) registration correction means, coupled to said convolving means, for correcting said power measurement signals for mis-registration on a pixel by pixel basis, such that any errors introduced into said power measurement signals because of misalignment between said sensing pixels of said grid and said illumination pattern is corrected.
45. The apparatus of claim 44, wherein said registration correction means further include arithmetic means for multiplying each of said power measurement signals, on a pixel by pixel basis, by the sum of the squares of said power measurement signal's four neighboring power measurement signals.
46. The apparatus of claim 43, wherein said depth measurement means further comprises:
(iii) normalizing means, coupled to said convolving means, for normalizing said power measurement signals on a pixel by pixel basis.
47. The apparatus of claim 43, wherein said depth measurement means further comprises:
(iii) comparator means, coupled to said convolving means, for comparing said power measurement signals for one of said sensed images, on a pixel by pixel basis, with determined power measurements for a second of said sensed images, to determine said depth information at each of said pixels.
48. The apparatus of claim 47, wherein said comparator means includes a look-up table.
49. The apparatus of claim 27, wherein said scene recovery means comprises depth map storage means, coupled to said depth measurement means, for storing derived pixel by pixel depth information for said scene as a depth map.
50. The apparatus of claim 49, further comprising:
(e) display means, coupled to said scene recovery means, for displaying said depth map as a wireframe on a bitmapped workstation.
51. The apparatus of claim 35, wherein said scene recovery means comprises three-dimensional texturemap storage means, coupled to said depth measurement means and said brightness sensor, for storing derived pixel by pixel depth information and brightness information for said scene, further comprising:
(e) display means, coupled to said scene recovery means, for displaying said three-dimensional texturemap as a wireframe on a bitmapped workstation.
CA002196563A 1995-06-07 1995-06-07 Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus Abandoned CA2196563A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002196563A CA2196563A1 (en) 1995-06-07 1995-06-07 Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002196563A CA2196563A1 (en) 1995-06-07 1995-06-07 Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus

Publications (1)

Publication Number Publication Date
CA2196563A1 true CA2196563A1 (en) 1996-12-19

Family

ID=4159805

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002196563A Abandoned CA2196563A1 (en) 1995-06-07 1995-06-07 Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus

Country Status (1)

Country Link
CA (1) CA2196563A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014198623A1 (en) * 2013-06-13 2014-12-18 Basf Se Detector for optically detecting at least one object
WO2016005893A1 (en) * 2014-07-08 2016-01-14 Basf Se Detector for determining a position of at least one object
WO2016092449A1 (en) * 2014-12-09 2016-06-16 Basf Se Optical detector
US9557856B2 (en) 2013-08-19 2017-01-31 Basf Se Optical detector
US9665182B2 (en) 2013-08-19 2017-05-30 Basf Se Detector for determining a position of at least one object
US9741954B2 (en) 2013-06-13 2017-08-22 Basf Se Optical detector and method for manufacturing the same
US10094927B2 (en) 2014-09-29 2018-10-09 Basf Se Detector for optically determining a position of at least one object
US10120078B2 (en) 2012-12-19 2018-11-06 Basf Se Detector having a transversal optical sensor and a longitudinal optical sensor
US10353049B2 (en) 2013-06-13 2019-07-16 Basf Se Detector for optically detecting an orientation of at least one object
US10412283B2 (en) 2015-09-14 2019-09-10 Trinamix Gmbh Dual aperture 3D camera and method using differing aperture areas
US10775505B2 (en) 2015-01-30 2020-09-15 Trinamix Gmbh Detector for an optical detection of at least one object
US10890491B2 (en) 2016-10-25 2021-01-12 Trinamix Gmbh Optical detector for an optical detection
US10948567B2 (en) 2016-11-17 2021-03-16 Trinamix Gmbh Detector for optically detecting at least one object
US11060922B2 (en) 2017-04-20 2021-07-13 Trinamix Gmbh Optical detector
US11067692B2 (en) 2017-06-26 2021-07-20 Trinamix Gmbh Detector for determining a position of at least one object
US11125880B2 (en) 2014-12-09 2021-09-21 Basf Se Optical detector
CN113570650A (en) * 2020-04-28 2021-10-29 合肥美亚光电技术股份有限公司 Depth of field judgment method and device, electronic equipment and storage medium
US11211513B2 (en) 2016-07-29 2021-12-28 Trinamix Gmbh Optical sensor and detector for an optical detection
US11428787B2 (en) 2016-10-25 2022-08-30 Trinamix Gmbh Detector for an optical detection of at least one object
US11860292B2 (en) 2016-11-17 2024-01-02 Trinamix Gmbh Detector and methods for authenticating at least one object

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120078B2 (en) 2012-12-19 2018-11-06 Basf Se Detector having a transversal optical sensor and a longitudinal optical sensor
AU2014280332B2 (en) * 2013-06-13 2017-09-07 Basf Se Detector for optically detecting at least one object
US10845459B2 (en) 2013-06-13 2020-11-24 Basf Se Detector for optically detecting at least one object
US9741954B2 (en) 2013-06-13 2017-08-22 Basf Se Optical detector and method for manufacturing the same
CN105452894A (en) * 2013-06-13 2016-03-30 巴斯夫欧洲公司 Detector for optically detecting at least one object
WO2014198623A1 (en) * 2013-06-13 2014-12-18 Basf Se Detector for optically detecting at least one object
JP2016529473A (en) * 2013-06-13 2016-09-23 ビーエーエスエフ ソシエタス・ヨーロピアBasf Se Detector for optically detecting at least one object
US10823818B2 (en) 2013-06-13 2020-11-03 Basf Se Detector for optically detecting at least one object
US10353049B2 (en) 2013-06-13 2019-07-16 Basf Se Detector for optically detecting an orientation of at least one object
CN105452895A (en) * 2013-06-13 2016-03-30 巴斯夫欧洲公司 Detector for optically detecting at least one object
WO2014198629A1 (en) * 2013-06-13 2014-12-18 Basf Se Detector for optically detecting at least one object
US9989623B2 (en) 2013-06-13 2018-06-05 Basf Se Detector for determining a longitudinal coordinate of an object via an intensity distribution of illuminated pixels
US9829564B2 (en) 2013-06-13 2017-11-28 Basf Se Detector for optically detecting at least one longitudinal coordinate of one object by determining a number of illuminated pixels
US9958535B2 (en) 2013-08-19 2018-05-01 Basf Se Detector for determining a position of at least one object
US10012532B2 (en) 2013-08-19 2018-07-03 Basf Se Optical detector
US9665182B2 (en) 2013-08-19 2017-05-30 Basf Se Detector for determining a position of at least one object
US9557856B2 (en) 2013-08-19 2017-01-31 Basf Se Optical detector
WO2016005893A1 (en) * 2014-07-08 2016-01-14 Basf Se Detector for determining a position of at least one object
US11041718B2 (en) 2014-07-08 2021-06-22 Basf Se Detector for determining a position of at least one object
US10094927B2 (en) 2014-09-29 2018-10-09 Basf Se Detector for optically determining a position of at least one object
US11125880B2 (en) 2014-12-09 2021-09-21 Basf Se Optical detector
WO2016092449A1 (en) * 2014-12-09 2016-06-16 Basf Se Optical detector
US10775505B2 (en) 2015-01-30 2020-09-15 Trinamix Gmbh Detector for an optical detection of at least one object
US10412283B2 (en) 2015-09-14 2019-09-10 Trinamix Gmbh Dual aperture 3D camera and method using differing aperture areas
US11211513B2 (en) 2016-07-29 2021-12-28 Trinamix Gmbh Optical sensor and detector for an optical detection
US10890491B2 (en) 2016-10-25 2021-01-12 Trinamix Gmbh Optical detector for an optical detection
US11428787B2 (en) 2016-10-25 2022-08-30 Trinamix Gmbh Detector for an optical detection of at least one object
US10948567B2 (en) 2016-11-17 2021-03-16 Trinamix Gmbh Detector for optically detecting at least one object
US11415661B2 (en) 2016-11-17 2022-08-16 Trinamix Gmbh Detector for optically detecting at least one object
US11635486B2 (en) 2016-11-17 2023-04-25 Trinamix Gmbh Detector for optically detecting at least one object
US11698435B2 (en) 2016-11-17 2023-07-11 Trinamix Gmbh Detector for optically detecting at least one object
US11860292B2 (en) 2016-11-17 2024-01-02 Trinamix Gmbh Detector and methods for authenticating at least one object
US11060922B2 (en) 2017-04-20 2021-07-13 Trinamix Gmbh Optical detector
US11067692B2 (en) 2017-06-26 2021-07-20 Trinamix Gmbh Detector for determining a position of at least one object
CN113570650A (en) * 2020-04-28 2021-10-29 合肥美亚光电技术股份有限公司 Depth of field judgment method and device, electronic equipment and storage medium
CN113570650B (en) * 2020-04-28 2024-02-02 合肥美亚光电技术股份有限公司 Depth of field judging method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US6229913B1 (en) Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two-images due to defocus
CA2196563A1 (en) Apparatus and methods for determining the three-dimensional shape of an object using active illumination and relative blurring in two images due to defocus
US10317205B2 (en) Depth measurement using a phase grating
US5675407A (en) Color ranging method for high speed low-cost three dimensional surface profile measurement
US7440590B1 (en) System and technique for retrieving depth information about a surface by projecting a composite image of modulated light patterns
Nayar et al. Real-time focus range sensor
CN100485312C (en) Methods and apparatus for wavefront manipulations and improved three-dimension measurements
US5270756A (en) Method and apparatus for generating high resolution vidicon camera images
EP2313737B1 (en) System for adaptive three-dimensional scanning of surface characteristics
EP2104365A1 (en) Method and apparatus for rapid three-dimensional restoration
Watanabe et al. Real-time computation of depth from defocus
CN105869160A (en) Method and system for implementing 3D modeling and holographic display by using Kinect
EP0408224B1 (en) Computational methods and electronic camera apparatus for determining distance of objects, rapid autofocusing and obtaining improved focus images
CA2245044C (en) Stereovision procedure for producing cartographic data
US5020111A (en) Spatial symmetry cueing image processing method and apparatus
US6717661B1 (en) Fourier moire wavefront sensor
CA1319546C (en) System for output plane calibration of an optical correlator
CN117053716A (en) Automatic detection method for outline parameters of circular aperture interferogram
US20020181761A1 (en) Method for processing spatial-phase characteristics of electromagnetic energy and information conveyed therein
US5475584A (en) Method and apparatus for orientating a camera
JPH1114327A (en) Three-dimensional shape measuring method and device therefor
CN114143426A (en) Three-dimensional reconstruction system and method based on panoramic structured light
JP2001356010A (en) Three-dimensional shape measuring apparatus
CN112750098B (en) Depth map optimization method, device, system, electronic device and storage medium
Partridge et al. Three-dimensional surface reconstruction using emission polarization

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued