AU2011265379A1 - Single shot image based depth mapping - Google Patents

Single shot image based depth mapping

Info

Publication number
AU2011265379A1
Authority
AU
Australia
Prior art keywords
image
autocorrelation
depth
distance
ipsf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2011265379A
Inventor
Matthew R. Arnison
Donald James Bone
Ben Yip
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2011265379A priority Critical patent/AU2011265379A1/en
Publication of AU2011265379A1 publication Critical patent/AU2011265379A1/en
Abandoned legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

SINGLE SHOT IMAGE BASED DEPTH MAPPING

Disclosed is a method (100) for determining distance (140) to an object (235) in a scene from a camera (1501, 200), the camera having captured an image of the scene. The method provides (110) the image of the scene, the image being captured through a masked lens (205, 215) of the camera, the masked lens having an intensity point spread function (IPSF), the autocorrelation of which produces an image with a plurality of compact elements (611-616) wherein a position for at least one of the compact elements varies according to the distance to be determined. A set of reference data (150) is received (700) with each data of the set having an associated distance value and being formed from an autocorrelation (725) of the point spread function (720) for a point object at the associated distance value. The method forms test data (828) based on an autocorrelation (820) of a selected region (810) of the captured image (120) and forms (830) a series of distance estimates (835) based on matching measures determined between the test data and a subset of the reference data at each of the associated distance values. The method estimates (840) the distance (850) of the object using the set of distance estimates from the selected subsets of reference data.

Description

S&F Ref: P015750
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Donald James Bone, Ben Yip, Matthew R. Arnison
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Single shot image based depth mapping

The following statement is a full description of this invention, including the best method of performing it known to me/us:

SINGLE SHOT IMAGE BASED DEPTH MAPPING

TECHNICAL FIELD

The present invention relates to the design of depth imaging systems and, in particular, to pupil plane mask based depth imaging systems.

BACKGROUND

Passive optical methods for estimating the depth of an object in a scene captured by a camera rely on the natural lighting of the scene and can be broadly classified into single camera and multi-camera methods.

One type of passive optical method uses an aperture mask which incorporates a complex (amplitude and/or phase) mask into the pupil plane of the camera and modifies the Intensity Point Spread Function (IPSF) of the resulting imaging system in a way that codes the depth of the scene. Whilst even a simple lens based imaging system produces an IPSF that varies with depth, the masked aperture methods seek to produce an IPSF that improves the recovery of the depth information in the scene.

Another kind of passive optical method is known as monocular stereo and uses additional optical devices, such as mirrors or prisms, to produce two or more views through one lens. Considered from the perspective of the effect on the IPSF of the imaging system, these monocular stereo methods fall into the same class as the masked aperture methods. Monocular stereo methods also change the IPSF in a way that codes the depth of the scene. For the methods that produce disjoint stereo images, the IPSF consists of two or more bright spots that are well separated but whose relative position on the image plane changes with depth. For methods that produce overlapping stereo images, the similarity to the aperture masked methods is more apparent, since the resulting IPSF consists of a compact set of bright spots whose configuration changes with depth.

Depth estimation using aperture mask based methods

Aperture masks can modify the field in the pupil plane in two basic ways. They can shape the amplitude of the field by attenuating the amplitude (amplitude masks), or they can change the phase of the field by imposing some non-uniform variation in the optical path length through the pupil plane (phase masks), or they can impose some combination of the two.

A single binary aperture mask, known as a coded aperture mask, can be designed to modulate the amplitude of a light field in a camera aperture in a way that the mask creates an IPSF that codes the depth of the point source by significantly changing the IPSF with changes in the depth of the point source. Such masks have been designed to improve the sensitivity to depth relative to a simple open aperture, and simultaneously allow the recovery of both a depth map and a high resolution image from a single captured image.

The main disadvantage of amplitude masks for coding depth into the IPSF of an imaging system is that amplitude masks are optically inefficient, since they reduce the amount of light through the aperture.
This means that the signal to noise ratio (SNR) of the captured image, for a given lighting environment, will be reduced. Computational methods used to recover information from the captured image will therefore generally result in increased errors in the recovered image and depth map because of the decreased SNR. These computational methods are also computationally expensive.

The other disadvantage is that diffraction means that the light passing through different elements of the mask will arrive in close proximity on the sensor, and diffraction effects will then result in complex interactions between these components. Those interactions are not simply modelled, and thus accommodating such interactions increases the complexity of the analysis required to recover the depth information from the image.

Methods that combine amplitude and phase elements in the aperture mask have the potential to increase the sensitivity to depth and to improve the optical efficiency compared to a pure amplitude mask.

One proposal used an optical ranging approach that produced nulls in the optical transfer function of the system, in which the nulls coded the depth of the objects in the scene. In broad effect, this proposal resembled two shaped amplitude apertures with approximately linear phase shifts imposed across the apertures. Image analysis under that proposal relied on identifying the location of the nulls of the optical transfer function from the spectrum of a scene. Unfortunately, noise made it difficult to accurately locate the spectral nulls. Nulls that result from the image data itself can interfere with the analysis, so the proposal has not been widely used.

Pure phase masks do not exclude light from the aperture; they simply rearrange the light. This can improve the SNR performance relative to a system such as that described above that imposes some amplitude masking.

Another approach described the concept of a lattice lens. The advantage of this approach is that the multiple lens elements each focus at different depths, ensuring that at some depth some subset of the lens elements will be close to focus, preserving the spatial frequency information of the scene over a range of depths. Because each of the elements is oriented somewhat arbitrarily and offset from its natural position in the aperture, the IPSF of such an approach showed considerable variation with depth, thereby giving good depth discrimination. The captured image can be deconvolved with the IPSF for each of a series of scene depths to form a sequence of deblurred images, and the local depth and deblurred image are then obtained by analysing each local region in the set of deblurred images and using an image quality criterion to select the depth that provides the best local image region. The disadvantage of this approach is that the analysis is computationally very expensive and is currently quite impractical for use in an embedded system, such as in a camera.

There has been some work by various authors to develop phase masks that produce an IPSF consisting of a pair of point-like features that rotate around each other with defocus. A major disadvantage of these various approaches is that they are not generally able to analyse the depth in a natural scene and have only been successfully applied to situations where the scene consists of isolated features, such as points, corners or edges, or a contrived intensity distribution such as synthetic random noise.
These phase masks produce only two bright spots in their point spread function, which means that for some alignments of edge features in the scene there will be no apparent change in the disparity of the two views of the edge features in the image as a function of depth. These methods are therefore insensitive to depth for some alignments of edge features in the scene.

SUMMARY

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of prior systems.

Disclosed is an apparatus and method for depth imaging where an aperture mask is used in or near the pupil plane of a single imaging system, to modify the captured image in such a way that a depth map may be recovered from the captured image. The recovery is performed by a method based on a simple and computationally inexpensive autocorrelation of local regions of the image in which the depth is substantially uniform. By combining three or more views of the scene, the presently disclosed approach is able to ensure that there is always a pair of views of the scene that produces a useable disparity, thus avoiding the problem with depth sensitivity that is inherent with only two views of the scene.

Preferably the three or more views are obtained using what the present inventors term a "trefoil" aperture mask, giving three views. Masks with a higher number of apertures may be used, but the use of three apertures has been found by the present inventors to always provide a pair of views that provide some disparity for any edge orientation. Increasing the number of views will reduce the amount of light and the resolution for each view, which is generally not desired in the imaging arts. The present inventors have found three views to be optimal. By capturing a single image with a single optical system, the present approach avoids problems of feature correspondence and image alignment which plague multi-image systems. By capturing a single image of the scene, the present approach is able to capture depth of non-static scenes. By using a robust matching process, the present approach is able to preferentially weight the depth estimates from pairs of views in which the epipolar geometry provides a more accurate depth estimate for linear features such as edges.

According to one aspect of the present disclosure there is provided a method for determining distance to an object in a scene from a camera, the camera having captured an image of the scene, the method comprising:

providing the image of the scene, the image being captured through a masked lens of the camera, the masked lens having an intensity point spread function (IPSF), the autocorrelation of which produces an image with a plurality of compact elements wherein a position for at least one of the compact elements varies according to the distance to be determined;

receiving a set of reference data with each data of the set having an associated distance value and being formed from an autocorrelation of the point spread function for a point object at the associated distance value;

forming test data based on an autocorrelation of a selected region of the captured image;

forming a series of distance estimates based on matching measures determined between the test data and a subset of the reference data at each of the associated distance values; and

estimating the distance of said object using the set of distance estimates from the selected subsets of reference data.
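As an illustration only, the claimed steps can be outlined in a few lines of Python/NumPy. This is a hedged sketch, not the disclosed implementation: the function names are invented, the matching measure shown is the simple inner product option described later in this summary, the feature-subset handling of the full method is omitted, and the reference and test autocorrelation images are assumed to have the same size.

import numpy as np

def autocorrelate(region):
    # Autocorrelation as the inverse Fourier transform of the squared magnitude
    # of the Fourier transform, after removal of the local mean.
    f = np.fft.fft2(region - region.mean())
    return np.fft.fftshift(np.fft.ifft2(np.abs(f) ** 2).real)

def estimate_distance(region, reference_set):
    # reference_set: {distance value: reference autocorrelation image of the
    # IPSF for a point object at that distance}.
    test = autocorrelate(region)                 # test data for the region
    scores = {d: float(np.vdot(ref, test))       # matching measure per distance
              for d, ref in reference_set.items()}
    return max(scores, key=scores.get)           # best-matching distance value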
Preferably the autocorrelation of the forming of the set of reference data comprises calculating a sum, over all of the pixels, of a pixel-wise product of a shifted copy of the IPSF and the un-shifted IPSF, for all possible pixel shifts. Alternatively the autocorrelation of the forming of the set of reference data uses an inverse Fourier transform of a squared magnitude of a Fourier transform of the IPSF.

Typically the forming of the set of reference data comprises analysing an autocorrelation image formed from the autocorrelation thereof to identify isolated intensity peaks in the autocorrelation image, and recording the peaks by at least one of: (i) a corresponding peak position, (ii) normalised segments of the autocorrelation image, and (iii) parametric fits to the peaks.

The method may further comprise filtering the captured image by removing a local mean value from well-decorrelated data of the captured image. Alternatively the method may filter the captured image using an image gradient filter for image data that is not well-decorrelated, the image gradient filter being a gradient magnitude filter.

Preferably the test data is formed from a set of autocorrelation values from an autocorrelation image of the selected region. Alternatively the test data is formed as a parametric representation of principal features of an autocorrelation image of the selected region based on a fitting process.

Desirably the forming of the series of distance estimates comprises selecting a feature subset from the set of reference data by associating a feature subset with each isolated peak in an autocorrelation. The method may further comprise pairing symmetric components of the autocorrelation into subsets consisting of two peaks. The method may include determining a set of said matching measures which are minimally affected by noise and for which a subset is minimally affected by structure in the scene.

Advantageously the matching measures are calculated as an inner product of a selected region corresponding to a feature subset in the reference data for a selected depth with a corresponding region of the autocorrelation of the captured image. Alternatively the matching measures are calculated as a function of autocorrelation image values at the positions in the autocorrelation image for the selected image region of the captured image corresponding to feature locations in the reference data for a selected depth.

Generally the estimating of the distance comprises averaging the set of distance estimates. Desirably the estimating of the distance comprises determining a median of the set of distance estimates. In another implementation the estimating of the distance comprises calculating the distance by fitting a quadratic to the match scores at a set of discrete depths and using that quadratic fit to estimate the depth corresponding to a peak in the match score. Typically the estimating of the distance comprises calculating a confidence measure based on the matching measure at the local peak and a curvature of the matching measure at the local peak.

Preferably the providing of the image comprises capturing the image through the masked lens of the camera.

Other aspects are also disclosed, including a camera for performing the methods and a computer program by which the methods may be implemented.
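The quadratic fit and curvature-based confidence just mentioned can be sketched as follows. This is an illustrative reading only, assuming evenly spaced discrete depths and a single match-score curve; the names are not taken from the specification.

import numpy as np

def refine_depth(depths, scores):
    # Locate the best discrete depth, fit a quadratic through it and its two
    # neighbours, and return the depth at the vertex of that parabola together
    # with a simple confidence based on peak height and peak curvature.
    i = int(np.argmax(scores))
    if i == 0 or i == len(scores) - 1:
        return float(depths[i]), 0.0          # peak at the edge: no refinement
    y0, y1, y2 = scores[i - 1], scores[i], scores[i + 1]
    denom = y0 - 2.0 * y1 + y2                # proportional to the curvature
    offset = 0.5 * (y0 - y2) / denom if denom != 0.0 else 0.0
    step = depths[i + 1] - depths[i]
    refined = depths[i] + offset * step
    curvature = -denom / (step * step)        # larger for sharper peaks
    confidence = float(y1) * max(curvature, 0.0)
    return float(refined), float(confidence)

# Example: match scores sampled at five discrete depths (illustrative values).
print(refine_depth(np.array([1.0, 1.5, 2.0, 2.5, 3.0]),
                   np.array([0.1, 0.4, 0.9, 0.6, 0.2])))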
BRIEF DESCRIPTION OF THE DRAWINGS

At least one embodiment of the present invention will now be described with reference to the following drawings, in which:

Fig. 1 is a flow chart of an overall data processing architecture for depth map imaging;

Fig. 2 is a schematic diagram of an example optical arrangement with which image capture for the architecture of Fig. 1 may be obtained;

Fig. 3 is an illustration of the phase distribution for one example of an aperture mask according to the present disclosure which produces an IPSF with three well localised intensity peaks which are substantially circularly symmetric;

Fig. 4 is an illustration of the phase distribution for an example phase-only aperture mask according to the present disclosure that produces an IPSF with three well localised intensity peaks;

Fig. 5 is an example of the behaviour of the IPSF as a function of the depth of a point object according to the present disclosure;

Fig. 6 is an example of the behaviour of the autocorrelation of the IPSF as a function of the depth of a point object according to the present disclosure;

Fig. 7 is a flow chart of a method for the production of a set of reference IPSF autocorrelation data;

Fig. 8 is a flow chart of a method for the production of a depth map;

Fig. 9 is a flow chart of a method for the production of a Depth Match Score array for a single region in an image using a matching process based on the reference autocorrelation data from the IPSF and test data from an autocorrelation of the image region;

Figs. 10a, 10b and 10c illustrate the Depth Match Score calculation for a single region as a graphical overlay of the image region autocorrelation and the feature subsets in the reference IPSF autocorrelation for a series of depths;

Figs. 11a, 11b and 11c illustrate the Depth Match Score results for a single region as plots for each of the feature subsets in the reference IPSF autocorrelation for a series of depths for the example of Figs. 10a, 10b and 10c;

Figs. 12a, 12b and 12c illustrate the Depth Match Score calculation and results for a further example;

Figs. 13a, 13b and 13c illustrate the Depth Match Score results for a single region as plots for each of the feature subsets in the reference IPSF autocorrelation for a series of depths for the further example of Figs. 12a, 12b and 12c;

Fig. 14 is a flow chart of a method for the estimation of the depth of a single region in an image from a match score array for that region; and

Figs. 15A and 15B collectively form a schematic block diagram representation of an electronic device upon which described arrangements can be practised.

DETAILED DESCRIPTION INCLUDING BEST MODE

Fig. 1 is a flow chart 100 of a data processing architecture by which an image capture 110, performed by an optical system, such as that included in a camera, and which contains a mask, is able to create a depth map recording a distance from the optical system to one or more points in an image. During the image capture 110 a copy of the captured image 120 may be saved to a memory for later reference. A depth map creation step 130 takes the captured image 120 and a reference data set 150 as input to determine a depth map 140. The reference data set 150 provides information about depth dependent behaviour of the imaging performance of the optical system.

Figs.
15A and 15B collectively form a schematic block diagram of a general purpose electronic device 1501 including embedded components, upon which the depth map methods to be described are desirably practiced. A preferred example of the device 1501 is a digital camera. Nevertheless, the methods to be described may also be performed on higher-level devices such as desktop computers, server computers, tablet computers and other such devices with significantly larger processing resources.

As seen in Fig. 15A, the electronic device 1501 comprises an embedded controller 1502. Accordingly, the electronic device 1501 may be referred to as an "embedded device". In the present example, the controller 1502 has a processing unit (or processor) 1505 which is bi-directionally coupled to an internal storage module 1509. The storage module 1509 may be formed from non-volatile semiconductor read only memory (ROM) 1560 and semiconductor random access memory (RAM) 1570, as seen in Fig. 15B. The RAM 1570 may be volatile, non-volatile or a combination of volatile and non-volatile memory.

The electronic device 1501 includes a display controller 1507, which is connected to a display 1514, such as a liquid crystal display (LCD) panel or the like. The display controller 1507 is configured for displaying captured images and, where desired, graphical images on the display 1514 in accordance with instructions received from the embedded controller 1502, to which the display controller 1507 is connected.

The electronic device 1501 also includes user input devices 1513 which are typically formed by keys, a keypad or like controls. In some implementations, the user input devices 1513 may include a touch sensitive panel physically associated with the display 1514 to collectively form a touch-screen. Such a touch-screen may thus operate as one form of graphical user interface (GUI) as opposed to a prompt or menu driven GUI typically used with keypad-display combinations. Other forms of user input devices may also be used, such as a microphone (not illustrated) for voice commands or a joystick/thumb wheel (not illustrated) for ease of navigation about menus.

As seen in Fig. 15A, the electronic device 1501 also comprises a portable memory interface 1506, which is coupled to the processor 1505 via a connection 1519. The portable memory interface 1506 allows a complementary portable memory device 1525 to be coupled to the electronic device 1501 to act as a source or destination of data or to supplement the internal storage module 1509. Examples of such interfaces permit coupling with portable memory devices such as Universal Serial Bus (USB) memory devices, Secure Digital (SD) cards, Personal Computer Memory Card International Association (PCMCIA) cards, optical disks and magnetic disks.

The electronic device 1501 also has a communications interface 1508 to permit coupling of the device 1501 to a computer or communications network 1520 via a connection 1521. The connection 1521 may be wired or wireless. For example, the connection 1521 may be radio frequency or optical. An example of a wired connection includes Ethernet. Examples of wireless connections include Bluetooth™ type local interconnection, Wi-Fi (including protocols based on the standards of the IEEE 802.11 family), Infrared Data Association (IrDA) and the like.

Typically, the electronic device 1501 is configured to perform some special function.
The embedded controller 1502, possibly in conjunction with further special function components 1510, is provided to perform that special function. In the present example, where the device 1501 is a digital camera, the components 1510 may represent an optical system of the camera including lens, focus control and image sensor, and specifically the arrangements illustrated in Fig. 2 to be described. The special function components 1510 may also include a number of image encoders and decoders of a type including Joint Photographic Experts Group (JPEG), Moving Picture Experts Group (MPEG), MPEG-1 Audio Layer 3 (MP3), and the like.

The methods described hereinafter may be implemented using the embedded controller 1502, where the processes of Figs. 1 and 3 to 14 may be implemented as one or more software application programs 1533 executable within the embedded controller 1502. The electronic device 1501 of Fig. 15A implements the described methods. In particular, with reference to Fig. 15B, the steps of the described depth calculation methods are effected by instructions in the software 1533 that are carried out within the controller 1502. The software instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software 1533 of the embedded controller 1502 is typically stored in the non-volatile ROM 1560 of the internal storage module 1509. The software 1533 stored in the ROM 1560 can be updated when required from a computer readable medium. The software 1533 can be loaded into and executed by the processor 1505. In some instances, the processor 1505 may execute software instructions that are located in RAM 1570. Software instructions may be loaded into the RAM 1570 by the processor 1505 initiating a copy of one or more code modules from ROM 1560 into RAM 1570. Alternatively, the software instructions of one or more code modules may be pre-installed in a non-volatile region of RAM 1570 by a manufacturer. After one or more code modules have been located in RAM 1570, the processor 1505 may execute software instructions of the one or more code modules.

The application program 1533 is typically pre-installed and stored in the ROM 1560 by a manufacturer, prior to distribution of the electronic device 1501. However, in some instances, the application programs 1533 may be supplied to the user encoded on one or more CD-ROMs (not shown) and read via the portable memory interface 1506 of Fig. 15A prior to storage in the internal storage module 1509 or in the portable memory 1525. In another alternative, the software application program 1533 may be read by the processor 1505 from the network 1520, or loaded into the controller 1502 or the portable storage medium 1525 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that participates in providing instructions and/or data to the controller 1502 for execution and/or processing.
Examples of such storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, flash memory, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the device 1501. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the device 1501 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like. A computer readable medium having such software or computer program recorded on it is a computer program product.

The second part of the application programs 1533 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1514 of Fig. 15A. Through manipulation of the user input device 1513 (e.g., the keypad), a user of the device 1501 and the application programs 1533 may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via loudspeakers (not illustrated) and user voice commands input via the microphone (not illustrated).

Fig. 15B illustrates in detail the embedded controller 1502 having the processor 1505 for executing the application programs 1533 and the internal storage 1509. The internal storage 1509 comprises read only memory (ROM) 1560 and random access memory (RAM) 1570. The processor 1505 is able to execute the application programs 1533 stored in one or both of the connected memories 1560 and 1570. When the electronic device 1501 is initially powered up, a system program resident in the ROM 1560 is executed. The application program 1533 permanently stored in the ROM 1560 is sometimes referred to as "firmware". Execution of the firmware by the processor 1505 may fulfil various functions, including processor management, memory management, device management, storage management and user interface.

The processor 1505 typically includes a number of functional modules including a control unit (CU) 1551, an arithmetic logic unit (ALU) 1552 and a local or internal memory comprising a set of registers 1554 which typically contain atomic data elements 1556, 1557, along with internal buffer or cache memory 1555. One or more internal buses 1559 interconnect these functional modules. The processor 1505 typically also has one or more interfaces 1558 for communicating with external devices via the system bus 1581, using a connection 1561.

The application program 1533 includes a sequence of instructions 1562 through 1563 that may include conditional branch and loop instructions. The program 1533 may also include data, which is used in execution of the program 1533. This data may be stored as part of the instruction or in a separate location 1564 within the ROM 1560 or RAM 1570.

In general, the processor 1505 is given a set of instructions, which are executed therein.
This set of instructions may be organised into blocks, which perform specific tasks or handle specific events that occur in the electronic device 1501. Typically, the application program 1533 waits for events and subsequently executes the block of code associated with that event. Events may be triggered in response to input from a user, via the user input devices 1513 of Fig. 15A, as detected by the processor 1505. Events may also be triggered in response to other sensors and interfaces in the electronic device 1501.

The execution of a set of the instructions may require numeric variables to be read and modified. Such numeric variables are stored in the RAM 1570. The disclosed method uses input variables 1571 that are stored in known locations 1572, 1573 in the memory 1570. The input variables 1571 are processed to produce output variables 1577 that are stored in known locations 1578, 1579 in the memory 1570. Intermediate variables 1574 may be stored in additional memory locations 1575, 1576 of the memory 1570. Alternatively, some intermediate variables may only exist in the registers 1554 of the processor 1505.

The execution of a sequence of instructions is achieved in the processor 1505 by repeated application of a fetch-execute cycle. The control unit 1551 of the processor 1505 maintains a register called the program counter, which contains the address in ROM 1560 or RAM 1570 of the next instruction to be executed. At the start of the fetch-execute cycle, the contents of the memory address indexed by the program counter are loaded into the control unit 1551. The instruction thus loaded controls the subsequent operation of the processor 1505, causing, for example, data to be loaded from ROM memory 1560 into processor registers 1554, the contents of a register to be arithmetically combined with the contents of another register, the contents of a register to be written to the location stored in another register, and so on. At the end of the fetch-execute cycle the program counter is updated to point to the next instruction in the system program code. Depending on the instruction just executed, this may involve incrementing the address contained in the program counter or loading the program counter with a new address in order to achieve a branch operation.

Each step or sub-process in the processes of the methods described below is associated with one or more segments of the application program 1533, and is performed by repeated execution of a fetch-execute cycle in the processor 1505 or similar programmatic operation of other independent processor blocks in the electronic device 1501.

The captured image 120 may be essentially a "live" image captured by the camera 1501 and processed in the manner to be described in a post-capture processing configuration. Alternatively, the image 120 may be a prior captured image and provided for subsequent (off-line) processing for depth. Such an implementation may be suited to desk-top computing environments. Similarly, the set of reference data 150 may be determined within the camera 1501 according to "live" capture conditions, or alternatively may be prior determined and subsequently provided to the processing architecture 100 where the data 150 is received for depth map creation.

Fig. 2 shows a schematic diagram of an optical system 200 by which image capture, for the arrangements and methods presently described, may be performed.
The optical system 200 may form part of a camera system such as the camera 1501 and includes a lens 205, having a focal length 208, and a pupil plane 210 in or close to which there is an aperture mask 215. The optical system 200 images the light from objects in a scene captured by the camera onto a sensor 220. The aperture mask 215 is made according to one of the implementations to be described. Light 230 emanating from a point 235 of the scene on the optical axis 240 of the system 200 and in the in-focus object plane 245 at the object depth 250 propagates to the pupil plane 210. At the pupil plane 210 the light 230 passes through the aperture mask 215 to then form an image on the sensor 220 located in the image plane 225, corresponding to the in-focus object plane 245. The image plane 225 is positioned at the image depth 255 from the pupil plane of the lens 205. The point 235 is for example a location of an object in the scene being imaged and captured by the camera. Typical captured images will include many objects on, in front of, and behind the in-focus object plane 245. The image of the object point is known as the Intensity Point Spread Function (IPSF), consisting of three bright spots on the image sensor and indicated in the example of Fig. 2 by the three small solid circles 258 and further illustrated in Fig. 5. For a point 260 on the optical axis 240 in some out-of-focus object plane 265, the light 270 will be in best focus at some distance 275 displaced from the sensor plane 220. On the sensor plane 220, the image of this out-of-focus point will form a modified IPSF consisting of three bright spots on the image sensor and indicated in Fig. 2 by the three small dotted circles 278, where the modification depends only on the displacement of the out-of-focus object plane from the in-focus object plane.

The mask 215 of the optical system 200 can take a number of forms. In one implementation, the mask 215 includes three sub-apertures 216, 217, 218, which transmit substantially all the light that falls on the sub-apertures 216, 217, 218, but where the mask within each sub-aperture acts to change the optical path length for the light that passes through the sub-aperture in a way that varies across the sub-aperture. The present inventors have termed such an arrangement of the sub-apertures a "trefoil aperture". Operation of the mask 215 having, in this example, three sub-apertures results in the formation of three bright spots represented by the circles 258 seen in Fig. 2. A mask with four sub-apertures would create a corresponding number of four bright spots on the sensor plane 220.

Fig. 3 depicts the phase shift 300 induced by the mask 215 on a planar wave propagating in a direction parallel to the optical axis 240 of the optical system 200. In a region 310 outside the influence of the three sub-apertures 216, 217, 218, all light is blocked. Within regions influenced by each sub-aperture 216-218, the induced phase shift 300 exhibits a linear gradient indicated in Fig. 3 by contour lines 320. This gradient is chosen to produce a deflection of the light passing through the sub-aperture, where that deflection is out of the plane defined by a central point of the segment and the optical axis 240.

Fig. 4 depicts the phase shift induced by a further implementation of the aperture mask 215 on a plane wave propagating in a direction parallel to the optical axis 240 of the optical system 200.
In this implementation, corresponding sub-apertures 410, 411, 412 divide a circular pupil plane aperture 425 into three segments which fill the circular aperture of the pupil plane 210 of the optical system 200. This approach has the advantage of increasing the optical efficiency of the phase mask and the spatial resolution, as well as allowing the diameter of the pupil plane aperture of the optical system to be adjusted without changing the shape of the trefoil apertures, but it results in the individual peaks in the IPSF being distinctly non-circular. Within each sub-aperture the induced phase shift is again chosen to have a linear gradient, indicated in Fig. 4 by the contour lines 420.

The effect on the IPSF of the imaging system of a trefoil aperture mask such as that in Figs. 3 and 4 is illustrated in Fig. 5. For an in-focus object point 235 on the optical axis, the image point produced in the image plane 220 of a perfect lens with no aperture mask will also be on the optical axis 505. The effect of a trefoil aperture mask, like that illustrated in Figs. 3 and 4, is to split the IPSF into three bright spots 510, 520, 530, or IPSF elements, displaced from the optical axis 505 by corresponding vectors v1, v2, v3 by the shift effect of the linear phase gradient in each of the three sub-apertures of the aperture mask. These spots or IPSF elements are usually compactly supported in the sense that they extend over only a few pixels on the sensor and are thus referred to as 'compact elements'. In a real system with vignetting or aberration, or for non-circular sub-aperture shapes such as in Fig. 4, the shape of these elements may be non-circular, but they will remain more or less compact provided the imaging system is capable of capturing an image. These compact elements are representative of image content by virtue of the fact that they are the images, through each of the apertures, of a single small point on an object in the scene.

In the absence of the trefoil mask, the effect of shifting the point out of the in-focus object plane is to both induce some defocus (which increases the spot size if the defocus is sufficiently large) and to shift the spot on the sensor plane along a straight line in the plane containing a central point of the sub-aperture and the optical axis, indicated in Fig. 5 by the vectors P1, P2, P3 for the corresponding sub-apertures 216, 217, 218. When combined with the displacement induced by the trefoil phase mask, indicated in Fig. 5 by the vectors v1, v2, v3, the movement of each of the three spots 510, 520, 530 is along a different straight line 515, 525, 535 and is depicted in Fig. 5 by adjacent unlabelled spots along the corresponding line 515, 525, 535. These lines 515, 525, 535 are the epipolar lines of the view of the scene through each of the three apertures relative to the view obtained with the full aperture (which would form a point spread function with a single spot centred at 505). A change in the depth of the object point therefore has the effect of causing a triangle (not illustrated for clarity), formed by the three spots 510, 520, 530 of the IPSF, to appear to rotate and change in size. Note that changing the depth of a point while keeping the capturing camera 1501/200 focussed at a fixed depth will move the point in and out of focus. The points with different grey tone illustrated in Fig. 5 represent different depths
Changing the focal depth for a fixed point depth will also move the point in and out of focus, but this is not illustrated in Fig. 5. The motion of the IPSF spots 510, 520, 530 along their respective epipolar lines 515, 525, 535 as the depth of the object point changes is monotonic with increasing depth, 5 in the directions indicated by the arrows 517, 527, 537. For object points off the optical axis 505, the IPSF is substantially the same but with a corresponding translation of the IPSF away from the optical axis 505. Most natural scenes do not contain isolated object points, so direct imaging of the IPSF is not usually possible. In the paraxial approximation, a natural image captured by 10 the camera 1501 with the optical system 200 will be a geometric image of the scene convolved with the IPSF of the optical system 200. Because the IPSF of the optical system 200 has a number of isolated peaks (three peaks in the example illustrated by Fig. 5), the captured image will look like several overlapped copies of the scene with a small displacement of each copy of the scene image. To recover the depth dependent parameters 15 for a region of the captured image, an autocorrelation of the image region is calculated. The autocorrelation gives a measure of the similarity of an image to shifted copies of itself. Use of the trefoil aperture mask 215 provides effectively three views of the scene, the interrelationship between which provides for determining a depth map for the captured image. A mask having more than three sub-apertures may alternatively be used. The use 20 of three or more views of a scene will always produce at least one pair of views of a linear feature in the scene that exhibits a change in disparity (separation of the images of the linear feature on the sensor) with depth of the feature. Equivalently, the epipolar line for at least one pair of views will cross any linear feature at an oblique angle. For a mask with only two apertures and therefore with only two views of the scene, when a linear feature is 25 aligned with the epipolar line for this pair of views, it will not be possible to determine the depth of the feature. For a trefoil aperture masked imaging system 200 such as that illustrated in Fig. 2, Fig. 6 schematically illustrates the behaviour of an autocorrelation of the IPSF of the system 200 as the depth of the point object changes. Each aperture of the trefoil mask may 30 be considered as forming a copy image, where the actual captured image 120 obtained using the imaging system 200 represents a summation or merging of the three copy images. The relative alignment of the copies of features in that summation may be considered as representative of depth information for the captured image. For a point in the in-focus object plane 245, an autocorrelation consists of a central spot 610 (the small P015750_speci-lodge - 16 lighter spot in Fig. 6) corresponding to the strong similarity of the image to an unshifted copy of itself, and a set of six spots 611, 612, 613, 614, 615, 616, which occur at shifts which align the two images formed by two different sub-apertures. For example, spot 611 corresponds to a shift v 12 which aligns the spot from the first aperture in the IPSF to the 5 spot from the second aperture in the copy of the image. The opposite or symmetric spot 614 corresponds to a shift v 2 1 which aligns the spot from the second aperture in the IPSF to the spot formed by the first aperture in the copy of the image. 
The effect of defocus is to enlarge the spots in the IPSF, in addition to moving the position of the spots, and therefore also to enlarge the spots in an autocorrelation. The spot 611 in the autocorrelation moves along an 'epipolar' line 621 as the defocus changes, corresponding to the epipolar line of the view through aperture 1 with respect to the view through aperture 2 of the trefoil aperture mask. The spot 631 at a position a12 is the sum of the shift vector v12 (which results from the difference in the shifts induced by the prismatic elements in the two corresponding apertures of the trefoil mask and is a constant vector that does not change with depth) and a vector P12 (which is determined by the different shifts induced by the defocus in the two views and is dependent on the depth of the local region in the scene). The movement of the other spots is similarly constrained to lie along 'epipolar' lines 622, 623, 624, 625, 626 corresponding to the six possible pairs of views.

Fig. 7 is a flow chart 700 of a method for the production of the reference data set 150. The flow chart 700 represents a method of processing preferably implemented using software stored in the memory 1509 and executable by the processor 1505 to determine reference data for a set of depths. The method 700 can take as input captured image data in the form of measured IPSFs at different depths (as used in step 720 to be described), or the method 700 can process simulation image data in the form of calculated IPSFs at different depths.

A first step 710 of the method 700 selects an appropriate depth range and generates discrete depth values over the range that includes most of the object depths 250 of a scene. The discrete depth values can be evenly spaced over the selected depth range, or chosen to produce equal increments in defocus, or spaced to concentrate depth samples around known object depths. The result of step 710 is a list of discrete depth values.

In a second step 720, the IPSF for one of the depth values from step 710 is measured or calculated by the processor 1505. Standard methods exist for measuring the IPSF given the optical system and for calculating the IPSF given a mathematical model of the optical system. The IPSF is stored as a two-dimensional array of intensity values in the memory 1509, and has the sample spacing of the optical sensor 220.

In a next step 725, an autocorrelation of the IPSF is calculated. Any measure of the similarity of the IPSF to a shifted copy of itself as a function of the vertical and horizontal shifts can be used as an autocorrelation. One approach that may be used is to calculate the sum, over all of the pixels in a shifted copy of the IPSF, of the pixel-wise product of the shifted copy of the IPSF and the un-shifted IPSF, for all possible pixel shifts. An autocorrelation can also be calculated as the sum of the absolute differences of the IPSF and the shifted copy of the IPSF, calculated for all possible pixel unit shifts of the IPSF relative to the copy of the IPSF. Another approach for calculating an autocorrelation of the IPSF is to use the inverse Fourier transform of the squared magnitude of the Fourier transform of the IPSF.
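Both autocorrelation calculations described for step 725 are straightforward to implement; a brief sketch assuming NumPy is given below. The direct form follows the shifted pixel-wise product description above, and the Fourier form uses the inverse transform of the squared spectrum magnitude (zero padding is added so the two give the same result); the function names are illustrative.

import numpy as np

def autocorrelation_direct(ipsf):
    # Sum, over the overlapping pixels, of the pixel-wise product of a shifted
    # copy of the IPSF and the un-shifted IPSF, for every integer pixel shift.
    h, w = ipsf.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    for dy in range(-h + 1, h):
        for dx in range(-w + 1, w):
            a = ipsf[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = ipsf[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            out[dy + h - 1, dx + w - 1] = np.sum(a * b)
    return out

def autocorrelation_fft(ipsf):
    # Inverse Fourier transform of the squared magnitude of the Fourier
    # transform of the IPSF, zero padded to avoid circular wrap-around.
    h, w = ipsf.shape
    f = np.fft.fft2(ipsf, s=(2 * h - 1, 2 * w - 1))
    return np.fft.fftshift(np.fft.ifft2(np.abs(f) ** 2).real)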
An autocorrelation of the IPSF can be represented as an image where each pixel value represents the strength of the similarity of the IPSF and the shifted copy of the IPSF, and where the location of the pixel in this autocorrelation image corresponds to the vertical and horizontal shift of the copy of the IPSF relative to the IPSF.

In the next step 728 the autocorrelation is analysed to identify principal features in the autocorrelation. Typically, these features are the isolated intensity peaks in the autocorrelation, such as illustrated in Fig. 6, which can be recorded simply by their peak position, but alternatively or additionally may be recorded as normalised segments of the autocorrelation image or parametric fits to the peaks. In one implementation the trefoil mask illustrated in Fig. 3 is used as the aperture mask. The behaviour of the autocorrelation of the IPSF as the depth of the object point is varied for this implementation is illustrated in Fig. 6. At any given depth the autocorrelation will contain seven principal features, represented by the spots for the given object depth in Fig. 6. The principal features are identified, and the autocorrelation for each depth is segmented and labelled to identify these features.

In the next step 730, the IPSF autocorrelation for a selected depth value is used to calculate a reference data set 740 for the current depth value, which is saved for later use. The reference data set 740 is used in the process described in Fig. 8 to determine the match score for a subregion of the captured image for each of the discrete depths. In one implementation the reference data for the current depth value takes the form of the autocorrelation image itself, and includes a segmentation which identifies a series of subsets of the principal features of the autocorrelation image. These subsets can contain isolated features or multiple features. In another implementation, the reference data takes the form of a parametric representation of the series of subsets of principal features of the autocorrelation image obtained by a fitting process.

Step 750 tests whether all depth values have been processed, causing (No) a repeat of steps 720, 725, 730 and 740 for each of the depth values originally determined in step 710. When all depth values have been processed, the collection of the reference data 740 for all the depth values forms the reference data set 150.

Fig. 8 is a flow chart representing a preferred method 130 for creating a depth map for the architecture of Fig. 1 from a captured image 120.

In a first step 805, a filter is applied to the captured image 120 to reduce the effect of low spatial frequency terms on the autocorrelation. Step 805 reduces the correlation in the data. For a scene whose image is well-decorrelated (such that an image of the scene does not resemble shifted copies of itself, as for example with a noise-like texture), step 805 may simply involve removing the local mean value in the data. If the data is not well-decorrelated, then the filter of step 805 must operate to reduce the strong local correlation of low spatial frequency components (which will tend to dominate the autocorrelation) so that sharp, well-defined peaks are formed in the autocorrelation. One form of image filter that is well-suited to step 805 where the data is not well-decorrelated is an image gradient
Gradient magnitude filters can be created as the magnitude of a vector 20 gradient filter such as the Sobel operator, the Prewitt operator or the Roberts Cross operator. Many other filters can also be used, such as the Canny edge detector or gradient operators based on mathematical morphology. In the next step 810, an image region from the filtered captured image is selected. An image region typically consists of a selected pixel and a set of pixels in a 25 neighbourhood around the selected pixel. The region for example may be a 32x32 block of pixels or a 64x64 block of pixels. The selected pixels and associated regions or blocks define the set of image regions. The selected pixels may be chosen by a regular sampling of the captured image, or the selection may be based on the properties of the image region (such as intensity, or gradient or texture or some other image feature). Many methods for 30 selecting a neighbourhood around the selected pixel are known. One form of neighbourhood is the set of pixels within a defined distance of the selected pixel, where the distance is one of the metric distances: the Manhattan distance; the Euclidean distance or the Chebyshev distance. The depth of the object that corresponds to the selected image pixel is estimated using all the pixel values in the image region. P015750_speci-lodge - 19 In the next step 820, an autocorrelation of the selected region is calculated from the filtered captured image. Standard methods for calculating an autocorrelation exist. The autocorrelation method is usually chosen to match the method used in the calculation step 725 of Fig. 7 where the autocorrelation for the reference data from the IPSF is determined. 5 In the next step 825, a test data set is calculated from the selected region. In one implementation this is the set of autocorrelation values (in the form of an autocorrelation image), which may include a segmentation of the autocorrelation image identifying the principal features. In an alternate implementation, the test data set may be a parametric representation of the principal features of the autocorrelation image based on a fitting 10 process. This alternate implementation would be chosen to match the step 730 used in the calculation of the reference data set from the IPSF. In the next step 830, a depth match score array 835 is created from the test data set of a selected region 828 and the reference data set 150 and stored in the memory 1509. The details of this step are further illustrated in Fig 9. 15 In the next step 840, a scene depth of the selected image region is estimated from the depth match score array 830 and the depth value is saved in a depth map 850 formed in the memory 1509. This is further illustrated in Fig. 14. Step 860 then tests that all regions have been analysed, and where not, steps 810 to 840 are performed for every image region until all regions are determined to have been 20 analysed. At such a stage, the depth map 850 then contains a scene depth estimate from each selected image region. When all regions have been analysed, the method 130 ends. Fig. 9 is a flow chart of a method 830 for the production of the match score array 835 for a test data set of an image region 828 according to a preferred implementation. The depth estimation process determines the depth from the position of features in the 25 autocorrelation of the test data for the selected region. 
However, the captured image data can have some degree of self-similarity arising from self-similarity in the scene (for example along a high contrast edge in the scene). This will result in extended features in the autocorrelation of the image region which may not have a well-defined central peak. Nevertheless, matching the features in the IPSF autocorrelation to the autocorrelation of the selected region of the filtered captured image makes it possible to determine the position along the epipolar line intersected by the extended feature in the autocorrelation of the selected region of the filtered captured image. This position can be used to infer the depth of the selected region.

The method 830 has a first step 910, where a feature subset, si, is selected or chosen from the set of reference features. In one implementation, this feature subset corresponds to one or more of the spots 611 to 616 that are identified and segmented (step 728) in the IPSF autocorrelation image for each depth dj. The positions of these spots move along the 'epipolar' lines 621 to 626 as the depth dj of the object point changes. The number and form of the features depends on the design of the aperture mask 215.

The selection of the subsets in step 910 may involve associating a subset with each isolated peak in the autocorrelation. However, the autocorrelation is approximately symmetric, so there is a computational advantage to pairing the symmetric components into subsets consisting of two peaks. Furthermore, the effect of noise on the matching score is reduced by using more than one peak in each subset. However, some peaks are likely to be affected by interactions between structure in the image data and the configuration of the points in the point spread functions. If it is assumed that no more than n of the peaks are significantly impacted by such interactions, then one solution would be to choose all possible unique unordered subsets of size N-n, where N is the number of peaks whose position in the autocorrelation of the IPSF is dependent on the depth of the point in the scene. In general, there will be NCn such subsets. For example, with three peaks in the IPSF whose position depends on the depth of the point in the scene, there will be six peaks in the autocorrelation of the IPSF whose position is sensitive to depth. If up to two of these points are likely to be adversely affected by structure in the scene, then choosing all possible subsets of four points (of which there will be 6C4 = 15 possible subsets) will provide a set of matching scores which are minimally affected by noise and for which a subset is minimally affected by the structure in the scene. This process will make the depth analysis robust to both noise and structure in the scene.

In the next step 920, a depth value, dj, is selected. A depth value, dj, is one of the discrete depth values selected in step 710.

In the next step 930, a matching score, M(i,j), is calculated. The matching score is a measure of the quality of the match between the feature subset si in the reference data 150 for depth dj, and the test data 828 for the selected region. In one implementation, the matching score may be calculated as an inner product of the reference data for depth dj in the region of the feature si with the corresponding region of the autocorrelation image for the selected image region of the filtered captured image. In another example, the feature subset si may be simply the locations of a subset of the peaks in the corresponding IPSF autocorrelation image for depth dj, in which case the match score may be a function of autocorrelation values at the corresponding positions of the autocorrelation image for the selected image region of the filtered captured image. The match score of depth value dj and feature subset si is then stored in the match score array 835 within the memory 1509.

The steps 920 to 950 are repeated until all depths have been tested, as determined in step 950 for the current feature subset. If all depths have been tested, then processing proceeds to test in step 960 whether all feature subsets have been tested. If they have not, then the next feature subset is selected (step 910) and steps 920 to 950 are repeated for all depths for the next selected feature subset. If all feature subsets have been tested, as determined at step 960, then the processing of the method 830 terminates, at which point the match score array 835 contains match scores for all features for all depths.

The calculation of the match score array is further illustrated in Figs. 10a, 10b and 10c. In Fig. 10a the autocorrelation of the selected region of the filtered captured image is shown as a contour plot 1010 of the autocorrelation image and exhibits seven extended features 1011 to 1017 with high autocorrelation values, embedded within a larger area of low autocorrelation values 1018. A similar plot is shown for each of the features in the IPSF autocorrelation 1024, as well as IPSF autocorrelations 1021, 1022, 1023, 1025 and 1026, of Figs. 10b and 10c to illustrate the matching process. Each plot shown in Figs. 10a to 10c shows one of the IPSF autocorrelation features 1031 to 1036 for a series of possible depths along the epipolar line for the corresponding pair of scene views. The IPSF autocorrelation features for only four discrete depths are shown in each plot; however, for the purposes of the Depth Match Score calculation of step 930, more discrete depths would be used to obtain greater depth discrimination. The overlap of the IPSF autocorrelation feature with the autocorrelation of the selected region of the filtered captured image is a visual indication of the strength of the match score.

The depth match scores produced by the process of Fig. 9 from the data illustrated in Figs. 10a, 10b and 10c are plotted in Figs. 11a, 11b and 11c.
Each of the plots 1121 to 1126 presents the match score results from the corresponding autocorrelation figures 1021 to 1026, with the black spots 1130 giving the match score for one of the IPSF autocorrelation features for one of the illustrated depths in the corresponding figure. The open circles 1140 represent the match score for the same IPSF autocorrelation features for one of the intermediate depths that is not graphically depicted in Figs. 10a, 10b and 10c. It should be noted that the sharpness of the peak depends on the relative angle with which the corresponding epipolar line in Figs. 10a, 10b and 10c crosses the autocorrelation feature. In the presence of noise in the data, this means that the sharper peaks will return a smaller depth error. Depending on the nature of the image feature, it may even be that one or more of the features will not return a reliable depth estimate.

The calculation of the depth match score array is further illustrated in Figs. 12a, 12b and 12c. In this example, the isolated features in the autocorrelation of the IPSF are grouped into subsets. This grouping takes advantage of the near symmetry of the autocorrelation to reduce the number of feature subsets that need to be considered. The autocorrelation of the selected region of the filtered captured image is shown in Fig. 12a as a contour plot 1210 of the autocorrelation image and exhibits seven extended features 1211 to 1217 with high autocorrelation values embedded within a larger area of low autocorrelation 1218. A similar plot is shown for each of the feature subsets in the IPSF autocorrelation 1222 and 1223 in Figs. 12b and 12c to illustrate the matching process. Each plot in Figs. 12a, 12b and 12c shows one of the IPSF autocorrelation feature subsets 1231 to 1233 for a series of possible depths along the epipolar lines for the corresponding pair of scene views. The IPSF autocorrelation feature subsets for only four discrete depths are shown in each plot; however, for the purposes of the Depth Match Score calculation of step 930, more discrete depths would be used to obtain greater depth discrimination. The overlap of the IPSF autocorrelation feature with the autocorrelation of the selected region of the filtered captured image is a visual indication of the strength of the match score.

The depth match scores produced by the process of Fig. 9 from the data illustrated in Figs. 12a, 12b and 12c are plotted in Figs. 13a, 13b and 13c. Each of the plots 1321 to 1323 presents the match score results from the corresponding autocorrelation figures 1221 to 1223, with the black spots 1330 giving the match score for one of the IPSF autocorrelation feature subsets for one of the illustrated depths in the corresponding figure. The open circles 1340 represent match score results for the same IPSF autocorrelation feature subset for one of the intermediate depths that is not graphically depicted in Figs. 12a to 12c. It should be noted that the sharpness of the peak depends on the relative angle with which the corresponding epipolar lines in Figs. 12a to 12c cross the autocorrelation features. In the presence of noise in the data, this means that the sharper peaks will return a smaller depth error. Depending on the nature of the image feature, it may even be that one or more of the features will not return a reliable depth estimate.
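The subset selection of step 910 and the match score loop of steps 920 to 960 might be sketched as follows. This is a minimal sketch assuming numpy; the names peak_subsets, match_score and score_array, and the representation of reference peaks as (row, column) positions, are illustrative assumptions rather than part of the described method.

```python
from itertools import combinations
import numpy as np

def peak_subsets(num_peaks, n_outliers):
    # Step 910 (sketch): all unique unordered subsets of size N - n of the
    # depth-sensitive peaks; with N = 6 and n = 2 this yields 6C4 = 15
    # subsets, so some subsets avoid peaks corrupted by scene structure.
    return list(combinations(range(num_peaks), num_peaks - n_outliers))

def match_score(test_acf, peak_positions):
    # Step 930 (sketch): sample the autocorrelation of the selected region at
    # the peak positions predicted by the reference IPSF autocorrelation for
    # one feature subset and one candidate depth, and sum the sampled values.
    return float(sum(test_acf[r, c] for (r, c) in peak_positions))

def score_array(test_acf, reference_peaks, subsets):
    # Steps 920 to 960 (sketch): reference_peaks[j][k] is the (row, column)
    # position of peak k for discrete depth index j; M[i, j] holds the score
    # for feature subset i at depth index j.
    M = np.zeros((len(subsets), len(reference_peaks)))
    for i, subset in enumerate(subsets):
        for j, peaks in enumerate(reference_peaks):
            M[i, j] = match_score(test_acf, [peaks[k] for k in subset])
    return M
```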
There are a number of ways to combine the data from the match scores for each feature subset to arrive at a depth estimate. Fig. 14 is a detailed flow chart of a preferred method 840 of Fig. 8 for the estimation of the depth for a selected region given the match score array 835 for the selected image region. This process 840 examines the depth match scores 835 for each of the features in the autocorrelation of the IPSF for a series of depths (corresponding to one of the plots in Figs. 11a to 11c) and identifies the depth D(i) that produces a maximum value in the match score for that feature, as well as some measure of the confidence in that depth estimate C(i). The depths that produce the maximum score for each of the features are then combined, along with the confidence measure, to produce a depth estimate.

In the implementation illustrated in Fig. 14, a first step 1410 selects the next feature in the IPSF autocorrelation. In the next step 1420, the maximum score value is set to zero. In the next step 1430, the next depth value is selected, and then the match score value for the selected depth is compared at step 1440 to the maximum depth score value. If the current depth score value is greater than the maximum depth score value, then the maximum depth score value is set to the current depth score value, the depth index J(i) for the current feature subset is set to the current depth index, and processing proceeds to step 1460. If the current depth score value is not greater than the maximum depth score value, then processing proceeds directly to step 1460. Step 1460 tests if all depths have been tested and, where untested depths remain, the next depth is selected at step 1430 and subsequent steps are processed until step 1460 determines that all depths have been tested. Processing then proceeds to step 1470 which tests if all features have been tested and, if not, processing proceeds to step 1410 to select the next feature. Subsequent steps are taken until step 1470 determines that all features have been processed. At this point the depth index array J will contain the depth indices of the best discrete depth estimates from each of the IPSF autocorrelation features.

Processing then proceeds to step 1480 where the set of depth indices in the depth index array J and the match score array M is processed to return a single depth estimate. In one implementation this may be a simple average of the discrete depths dD(i). In another implementation this can be a median or similar robust average of the depth estimates dD(i) to make the process robust to large errors in the estimated depths that will arise when the extended feature in the autocorrelation of the selected image region aligns with the epipolar line for the IPSF autocorrelation (which would result in a weak dependence of the response on the depth of the object point and a highly inaccurate depth estimate). In one implementation, a robust average might detect that one or more of the depths are unreliable based on the strength of the matching score or the sensitivity of the matching score to depth changes, and would then place more weight on the more reliable depths in the averaging process. In another implementation, a refined depth can be calculated by fitting a quadratic to the points around the best discrete depth for each feature subset i to estimate the depth corresponding to the local peak in matching score.
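The per-feature peak search of steps 1410 to 1470, together with quadratic refinement and a robust combination at step 1480, might be sketched as follows. This is a minimal sketch assuming numpy and uniformly spaced discrete depths; refine_depth and combine_depths are illustrative names only.

```python
import numpy as np

def refine_depth(scores, depths):
    # Steps 1410 to 1470 (sketch): take the discrete depth with the highest
    # match score for one feature subset, then refine it by fitting a
    # quadratic through the scores at that depth and its two neighbours and
    # taking the vertex of the parabola.
    j = int(np.argmax(scores))
    if 0 < j < len(scores) - 1:
        y0, y1, y2 = scores[j - 1], scores[j], scores[j + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            offset = 0.5 * (y0 - y2) / denom          # vertex offset in depth steps
            return depths[j] + offset * (depths[1] - depths[0])
    return depths[j]

def combine_depths(scores_per_subset, depths):
    # Step 1480 (sketch): a median across the per-subset estimates discounts
    # the occasional gross error from a feature that aligns with its epipolar
    # line and so depends only weakly on depth.
    estimates = [refine_depth(s, depths) for s in scores_per_subset]
    return float(np.median(estimates))
```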
A confidence measure can also be calculated based on the matching score at the peak and the curvature of the matching score at the peak (since a higher score suggests a better match and a lower curvature indicates higher uncertainty in the location of the peak and therefore the depth). This confidence measure can be used when combining the depth values from each subset to discount the contributions of less reliable subsets.

INDUSTRIAL APPLICABILITY

The arrangements described are applicable to the computer and image processing industries and particularly for the determination of depth-of-field maps in images.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

(Australia Only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (20)

1. A method for determining distance to an object in a scene from a camera, the camera having captured an image of the scene, the method comprising:
providing the image of the scene, the image being captured through a masked lens of the camera, the masked lens having an intensity point spread function (IPSF), the autocorrelation of which produces an image with a plurality of compact elements wherein a position for at least one of the compact elements varies according to the distance to be determined;
receiving a set of reference data with each data of the set having an associated distance value and being formed from an autocorrelation of the point spread function for a point object at the associated distance value;
forming test data based on an autocorrelation of a selected region of the captured image;
forming a series of distance estimates based on matching measures determined between the test data and a subset of the reference data at each of the associated distance values; and
estimating the distance of said object using the set of distance estimates from the selected subsets of reference data.
2. A method according to claim 1, wherein the autocorrelation of the forming of the set of reference data comprises calculating a sum, over all of the pixels in a pixel-wise product of a shifted copy of the IPSF and an un-shifted IPSF for all possible pixel shifts.
3. A method according to claim 1, wherein the autocorrelation of the forming of the set of reference data uses an inverse Fourier transform of a squared magnitude of a Fourier transform of the IPSF.
4. A method according to claim 1, wherein the forming of the set of reference data comprises analysing an autocorrelation image formed from the autocorrelation thereof to identify isolated intensity peaks in the autocorrelation image, and recording the peaks by at least one of: (i) a corresponding peak position, (ii) as normalised segments of the autocorrelation image, and (iii) as parametric fits to the peaks.
5. A method according to claim 1, further comprising filtering the captured image, by removing a local mean value from well-decorrelated data of the captured image.
6. A method according to claim 1, further comprising filtering the captured image using an image gradient filter for image data that is not well-decorrelated, the image gradient filter being a gradient magnitude filter.
7. A method according to claim 1, wherein the test data is formed from a set of autocorrelation values from an autocorrelation image of the selected region.
8. A method according to claim 1, wherein the test data is formed as a parametric representation of principal features of an autocorrelation image of the selected region based on a fitting process.
9. A method according to claim 1, wherein the forming of the series of distance estimates comprises selecting a feature subset from the set of reference data by associating a feature subset with each isolated peak in an autocorrelation.
10. A method according to claim 9, further comprising pairing symmetric components of the autocorrelation into subsets consisting of two peaks.
11. A method according to claim 9, further comprising determining a set of said matching measures which are minimally affected by noise and for which a subset are minimally affected by structure in the scene.
12. A method according to claim 1, wherein the matching measures are calculated as an inner product of a selected region corresponding to a feature subset in the reference data for a selected depth with a corresponding region of the autocorrelation of the captured image.
13. A method according to claim 1, wherein the matching measures are calculated as a function of autocorrelation image values at the positions in the autocorrelation image for the selected image region of the captured image corresponding to feature locations in the reference data for a selected depth.
14. A method according to claim 1, wherein the estimating of the distance comprises averaging the set of distance estimates.
15. A method according to claim 1, wherein the estimating of the distance comprises determining a median of the set of distance estimates.
16. A method according to claim 1, wherein the estimating of the distance comprises calculating the distance by fitting a quadratic to the match scores at a set of discrete depths and using that quadratic fit to estimate the depth corresponding to a peak in the match score.
17. A method according to claim 16, wherein the estimating of the distance comprises calculating a confidence measure based on the matching measure at the local peak and a curvature of the matching measure at the local peak.
18. A method according to claim 1, wherein the providing of the image comprises capturing the image through the masked lens of the camera.
19. A camera adapted to perform the method of any one of the preceding claims.
20. A computer-readable storage medium having a program recorded thereon, the program being executable by a processor to perform the method of any one of the preceding claims.

Dated this 20th day of December 2011
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2011265379A 2011-12-20 2011-12-20 Single shot image based depth mapping Abandoned AU2011265379A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2011265379A AU2011265379A1 (en) 2011-12-20 2011-12-20 Single shot image based depth mapping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2011265379A AU2011265379A1 (en) 2011-12-20 2011-12-20 Single shot image based depth mapping

Publications (1)

Publication Number Publication Date
AU2011265379A1 true AU2011265379A1 (en) 2013-07-04

Family

ID=48700111

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2011265379A Abandoned AU2011265379A1 (en) 2011-12-20 2011-12-20 Single shot image based depth mapping

Country Status (1)

Country Link
AU (1) AU2011265379A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015090611A1 (en) * 2013-12-20 2015-06-25 Sony Corporation Apparatus and optical system including an optical element
US10151933B2 (en) 2013-12-20 2018-12-11 Sony Corporation Apparatus and optical system including an optical element
CN103824318A (en) * 2014-02-13 2014-05-28 Xi'an Jiaotong University Multi-camera-array depth perception method
CN103824318B * 2014-02-13 2016-11-23 Xi'an Jiaotong University A kind of depth perception method of multi-cam array
WO2019116364A1 (en) * 2017-12-12 2019-06-20 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University Partial aperture imaging system
US11445125B2 (en) 2017-12-12 2022-09-13 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University Partial aperture imaging system
CN113099120A (en) * 2021-04-13 2021-07-09 南昌虚拟现实研究院股份有限公司 Depth information acquisition method and device, readable storage medium and depth camera
CN113099120B (en) * 2021-04-13 2023-04-18 南昌虚拟现实研究院股份有限公司 Depth information acquisition method and device, readable storage medium and depth camera

Similar Documents

Publication Publication Date Title
EP3101624B1 (en) Image processing method and image processing device
JP5362087B2 (en) Method for determining distance information, method for determining distance map, computer apparatus, imaging system, and computer program
Kakar et al. Exposing digital image forgeries by detecting discrepancies in motion blur
KR102480245B1 (en) Automated generation of panning shots
EP3127320B1 (en) System and method for multi-focus imaging
Oliveira et al. Parametric blur estimation for blind restoration of natural images: Linear motion and out-of-focus
Zhuo et al. Defocus map estimation from a single image
US10621729B2 (en) Adaptive focus sweep techniques for foreground/background separation
US9704063B2 (en) Method of sampling feature points, image matching method using the same, and image matching apparatus
JP2019079553A (en) System and method for detecting line in vision system
Rashidi et al. Optimized selection of key frames for monocular videogrammetric surveying of civil infrastructure
US8330852B2 (en) Range measurement using symmetric coded apertures
AU2013263760A1 (en) Method, system and apparatus for determining a depth value of a pixel
US11042984B2 (en) Systems and methods for providing image depth information
WO2011137140A1 (en) Range measurement using a coded aperture
US20190102056A1 (en) User interface for manipulating light-field images
Wu et al. Blind blur assessment for vision-based applications
WO2018098862A1 (en) Gesture recognition method and device for virtual reality apparatus, and virtual reality apparatus
Bailey et al. Fast depth from defocus from focal stacks
AU2011265379A1 (en) Single shot image based depth mapping
US20150278600A1 (en) Displaying relative motion of objects in an image
Lopez-Ramirez et al. FPGA-based methodology for depth-of-field extension in a single image
US9232132B1 (en) Light field image processing
JP6623419B2 (en) Display control device, imaging device, smartphone, display control method, and program
JP2019219501A (en) Determination device, imaging device, determination method and program

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application