AU2016273979A1 - System and method for adjusting perceived depth of an image - Google Patents


Info

Publication number
AU2016273979A1
Authority
AU
Australia
Prior art keywords
image
depth
perceived
perceived depth
scene
Prior art date
Legal status
Abandoned
Application number
AU2016273979A
Inventor
Matthew Raphael Arnison
Nicolas Pierre Marie Frederic Bonnier
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Priority to AU2016273979A
Publication of AU2016273979A1
Legal status: Abandoned

Abstract

A method of modifying the perceived depth of an image capturing a scene. A physical depth map of the scene captured in the image is received. A perceived depth map is generated from the image. Based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image is determined. A first local image adjustment process is applied to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.

Description

SYSTEM AND METHOD FOR ADJUSTING PERCEIVED DEPTH OF AN IMAGE
TECHNICAL FIELD
The present disclosure relates generally to the field of image processing, and, in particular, to the field of processing images according to how they are perceived by humans. The present disclosure also relates to a system and method for applying image processing to an image which adjusts the perceived depth of the objects in the image.
BACKGROUND
Still photography and video cameras typically capture a two-dimensional (2D) image of a real world scene which is inherently three-dimensional (3D). Some of the scene's depth information is perceivable in the captured image through various monocular depth cues present in the image, such as relative size, familiar size, texture gradient, aerial perspective, linear perspective, overlap (or interposition), height in the visual field, and shade and shadows. However, other visual cues, such as stereo cues (stereopsis or retinal (binocular) disparity, convergence and shadow stereopsis), are usually not perceivable in an image captured using a typical image capture device which has a single sensor and lens system. Similarly, depth cues from movement, such as depth from motion and the kinetic depth effect available in a video capture, are not available in still images. Since human observers rely on the various monocular cues of a two-dimensional image to mentally construct a representation of the perceived depth of the original three-dimensional scene, the perceived depth from the two-dimensional image may differ significantly from the actual physical depth in the three-dimensional scene when depth cues are lost in the mapping of the three-dimensional scene to a two-dimensional image. This can often result in an image perceived as flat when compared to the actual three-dimensional scene.
For example, when two objects A and B are at different depths in the scene, it is often difficult to estimate the respective depths of objects A and B, and/or the relative difference in depth between object A and object B, in the absence of monocular cues in the two-dimensional image. Accordingly, when objects appear to be at the same distance in a two-dimensional image, the viewer of the two-dimensional image may find it difficult to determine if the objects were actually at the same distance in the scene.
Some attempts to address these deficiencies in image quality have included capture systems which rely on depth information of the scene being recorded at image capture time using, for example, an infrared sensor coupled with an infrared light illuminating the scene, a laser scanning device or a stereo camera arrangement. In order to maximise the perceived scene depth for a viewer of the captured image, a three-dimensional reproduction of the image is created for viewing on a three-dimensional display with appropriate viewing glasses. While this provides a high level of perceived depth, the specialised hardware requirements for properly viewing such images can be too onerous in many situations. Accordingly, in situations where a two-dimensional image is more suitable, there is a need to enhance the depth that can be perceived by a viewer of the two-dimensional image to more accurately represent the actual depth in the three-dimensional scene being captured.
An experienced user may use one or multiple image processing tools such as unsharp masks, artificial bokeh, or aerial perspective to post-process a captured image and enhance its depth by enhancing one or multiple depth cues in one or several areas of the image. For example, the user may manually apply artificial blur to the background of a model in a portrait to increase the perceived separation between the model and the background, or apply unsharp masking to the area of the image corresponding to the model to increase the perceived texture of their clothes, skin and hair.
One problem with post-processing a captured image after the image has been captured in order to enhance the perceived depth in the image is that the user may only have access to the two-dimensional image and cannot easily determine if that two-dimensional image is a good representation of the actual three-dimensional scene. This problem is made even more difficult when the user applying the post-processing was not involved in the image capture at the scene location and is therefore not aware of the actual geometry of the scene. Some examples in which such a situation may arise are when the image was captured by another person, or by an automatic system such as a photo booth, or a camera on a drone or a satellite. In some circumstances the scene may be moving so fast that observers cannot get a good understanding of the scene composition at the image capture time. Examples include images captured at a sporting event or from a moving vehicle. A lack of awareness of actual scene depth can also occur in contexts where it is not possible for the user to rely on stereo cues in the scene. For example, when the objects in the scene are too far away (e.g. mountains, stars) or too small (e.g. can only be seen through a monocular microscope), or more generally, when stereo cues from the scene are either non-existent or cannot be relied upon.
Without this understanding of the scene, it is difficult for a user intending to post-process a captured image to identify how to apply the post-processing to produce a modified image that is perceptually closer to the real physical depth in the scene.
Accordingly, there exists a need for an improved method of modifying the perceived depth in a captured image in order to produce a modified image that conveys a sense of depth perceptually closer to the real physical depth of the scene.
SUMMARY
It is an object of the present disclosure to substantially overcome, or at least ameliorate, at least one disadvantage of present arrangements.
According to one aspect of the present disclosure, there is provided a method of modifying the perceived depth of an image capturing a scene, said method comprising: receiving a physical depth map of the scene captured in the image; generating a perceived depth map from the image; determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to another aspect of the present disclosure, there is provided an apparatus for modifying the perceived depth of an image capturing a scene, said apparatus comprising: means for receiving a physical depth map of the scene captured in the image; means for generating a perceived depth map from the image; means for determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and means for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided a system for modifying the perceived depth of an image capturing a scene, said system comprising: a memory for storing data and a computer program; a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: receiving a physical depth map of the scene captured in the image; generating a perceived depth map from the image; determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided a computer readable medium having a computer program stored on the medium for modifying the perceived depth of an image capturing a scene, said program comprising: code for receiving a physical depth map of the scene captured in the image; code for generating a perceived depth map from the image; code for determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and code for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided a method of modifying perceived depth of an image capturing a scene, said method comprising: receiving physical depth measurements of a first object and a second object in the scene captured in the image; generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided an apparatus for modifying perceived depth of an image capturing a scene, said apparatus comprising: means for receiving physical depth measurements of a first object and a second object in the scene captured in the image; means for generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; means for determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and means for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided a system for modifying perceived depth of an image capturing a scene, said system comprising: a memory for storing data and a computer program; a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: receiving physical depth measurements of a first object and a second object in the scene captured in the image; generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
According to still another aspect of the present disclosure, there is provided a computer readable medium having a computer program stored on the medium for modifying perceived depth of an image capturing a scene, said program comprising: code for receiving physical depth measurements of a first object and a second object in the scene captured in the image; code for generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; code for determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and code for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Figs. 1A and 1B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced.
Fig. 2 is a schematic representation of the described arrangements;
Fig. 3 is a schematic block diagram of a data processing architecture;
Figs. 4A and 4B are schematic block diagrams of the process of extracting features from the physical and perceived depth maps;
Fig. 5 is a schematic block diagram of the process of comparing the physical and perceived depth map to calculate scores;
Figs. 6A, 6B and 6C are schematic representations of an example application of the described arrangements in which an image is post processed to modify perceived depth quality of the image; and
Fig. 7 is a schematic block diagram of the process of adjusting the camera capture settings using the rittai-kan scores.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the "Background" section and the section above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventor(s) or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
The present disclosure is directed at post-processing a captured image in order to modify the perceived depth quality of the image and better convey the real physical depth in the real world three-dimensional scene portrayed in the image. This perceived depth quality will be referred to below by the Japanese term “rittai-kan”.
Rittai-kan represents “a feeling of three-dimensionality”, or a “sense of depth” of the content of a two-dimensional image of a three-dimensional scene, and is a perceptual property of an image which is highly valued by Japanese professional photographers. Some aspects of rittai-kan are compositional and are therefore set at the time of capture of the image and cannot be easily adjusted later. However, other aspects of rittai-kan may be adjusted after capture through post-processing.
Through the use of a post-processing tool which allows photorealistic manipulations of rittai-kan in captured two-dimensional images, users are able to directly modify the perceived sense of depth in a two-dimensional image by selecting a desired strength of change of rittai-kan. This tool is used to produce a modified image that is perceptually closer to the real physical depth of the scene, provided the measured physical depths determined for the scene are accurate.
In described arrangements, an image capture device captures an image of a scene together with additional depth data for the scene. For example, as illustrated in Fig. 2, an image capturing device (such as a digital still camera or video camera) 210, captures a scene 220. The scene is composed of a foreground object 230, a mid-ground object 240, and a background object 250. The image capturing device 210 captures a two-dimensional photographic image 221 and a physical depth map 222 of the scene 220. The foreground object 230, mid-ground object 240 and background object 250 are captured in the image 221 as represented by circles 231, 241 and 251 respectively, and in the physical depth map 222 as represented by shaded circles 232, 242 and 252 respectively.
An evaluation algorithm then determines the perceived quality of the captured image, and applies post-capture image adjustment processing on the image to optimise the image quality in a manner that improves the perceived quality of the image. These evaluation and image adjustment processes are discussed in more detail below.
Figs. 1A and 1B depict a general-purpose computer system 100, upon which the various arrangements described can be practiced.
As seen in Fig. 1A, the computer system 100 includes: a computer module 101; input devices such as a keyboard 102, a mouse pointer device 103, a scanner 126, a camera 127, and a microphone 180; and output devices including a printer 115, a display device 114 and loudspeakers 117. An external Modulator-Demodulator (Modem) transceiver device 116 may be used by the computer module 101 for communicating to and from a communications network 120 via a connection 121. The communications network 120 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 121 is a telephone line, the modem 116 may be a traditional “dial-up” modem. Alternatively, where the connection 121 is a high capacity (e.g., cable) connection, the modem 116 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 120.
The computer module 101 typically includes at least one processor unit 105, and a memory unit 106. For example, the memory unit 106 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 101 also includes a number of input/output (I/O) interfaces including: an audio-video interface 107 that couples to the video display 114, loudspeakers 117 and microphone 180; an I/O interface 113 that couples to the keyboard 102, mouse 103, scanner 126, camera 127 and optionally a joystick or other human interface device (not illustrated); and an interface 108 for the external modem 116 and printer 115. In some implementations, the modem 116 may be incorporated within the computer module 101, for example within the interface 108. The computer module 101 also has a local network interface 111, which permits coupling of the computer system 100 via a connection 123 to a local-area communications network 122, known as a Local Area Network (LAN). As illustrated in Fig. 1A, the local communications network 122 may also couple to the wide network 120 via a connection 124, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 111 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 111.
The I/O interfaces 108 and 113 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 109 are provided and typically include a hard disk drive (HDD) 110. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 112 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 100.
The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner that results in a conventional mode of operation of the computer system 100 known to those in the relevant art. For example, the processor 105 is coupled to the system bus 104 using a connection 118. Likewise, the memory 106 and optical disk drive 112 are coupled to the system bus 104 by connections 119. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or like computer systems.
Processes to be described may be implemented using the computer system 100 wherein the processes, to be described, may be implemented as one or more software application programs 133 executable within the computer system 100. In particular, the steps of the current method are effected by instructions 131 (see Fig. 1B) in the software 133 that are carried out within the computer system 100. The software instructions 131 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the current methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 100 from the computer readable medium, and then executed by the computer system 100. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 100 preferably effects an advantageous apparatus for implementing the described arrangements.
The software 133 is typically stored in the HDD 110 or the memory 106. The software is loaded into the computer system 100 from a computer readable medium, and executed by the computer system 100. Thus, for example, the software 133 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 125 that is read by the optical disk drive 112. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 100 preferably effects an apparatus for implementing the described arrangements.
In some instances, the application programs 133 may be supplied to the user encoded on one or more CD-ROMs 125 and read via the corresponding drive 112, or alternatively may be read by the user from the networks 120 or 122. Still further, the software can also be loaded into the computer system 100 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 100 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 101. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 101 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 133 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 114. Through manipulation of typically the keyboard 102 and the mouse 103, a user of the computer system 100 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 117 and user voice commands input via the microphone 180.
Fig. 1B is a detailed schematic block diagram of the processor 105 and a “memory” 134. The memory 134 represents a logical aggregation of all the memory modules (including the HDD 109 and semiconductor memory 106) that can be accessed by the computer module 101 in Fig. 1A.
When the computer module 101 is initially powered up, a power-on self-test (POST) program 150 executes. The POST program 150 is typically stored in a ROM 149 of the semiconductor memory 106 of Fig. 1A. A hardware device such as the ROM 149 storing software is sometimes referred to as firmware. The POST program 150 examines hardware within the computer module 101 to ensure proper functioning and typically checks the processor 105, the memory 134 (109, 106), and a basic input-output systems software (BIOS) module 151, also typically stored in the ROM 149, for correct operation. Once the POST program 150 has run successfully, the BIOS 151 activates the hard disk drive 110 of Fig. 1A. Activation of the hard disk drive 110 causes a bootstrap loader program 152 that is resident on the hard disk drive 110 to execute via the processor 105. This loads an operating system 153 into the RAM memory 106, upon which the operating system 153 commences operation. The operating system 153 is a system level application, executable by the processor 105, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 153 manages the memory 134 (109, 106) to ensure that each process or application running on the computer module 101 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 100 of Fig. 1A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 134 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 100 and how such is used.
As shown in Fig. 1B, the processor 105 includes a number of functional modules including a control unit 139, an arithmetic logic unit (ALU) 140, and a local or internal memory 148, sometimes called a cache memory. The cache memory 148 typically includes a number of storage registers 144 - 146 in a register section. One or more internal busses 141 functionally interconnect these functional modules. The processor 105 typically also has one or more interfaces 142 for communicating with external devices via the system bus 104, using a connection 118. The memory 134 is coupled to the bus 104 using a connection 119.
The application program 133 includes a sequence of instructions 131 that may include conditional branch and loop instructions. The program 133 may also include data 132 which is used in execution of the program 133. The instructions 131 and the data 132 are stored in memory locations 128, 129, 130 and 135, 136, 137, respectively. Depending upon the relative size of the instructions 131 and the memory locations 128-130, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 130. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 128 and 129.
In general, the processor 105 is given a set of instructions which are executed therein. The processor 105 waits for a subsequent input, to which the processor 105 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 102, 103, data received from an external source across one of the networks 120, 122, data retrieved from one of the storage devices 106, 109 or data retrieved from a storage medium 125 inserted into the corresponding reader 112, all depicted in Fig. 1A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 134.
The disclosed arrangements use input variables 154, which are stored in the memory 134 in corresponding memory locations 155, 156, 157. The arrangements produce output variables 161, which are stored in the memory 134 in corresponding memory locations 162, 163, 164. Intermediate variables 158 may be stored in memory locations 159, 160, 166 and 167.
Referring to the processor 105 of Fig. 1B, the registers 144, 145, 146, the arithmetic logic unit (ALU) 140, and the control unit 139 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 133. Each fetch, decode, and execute cycle comprises:
• a fetch operation, which fetches or reads an instruction 131 from a memory location 128, 129, 130;
• a decode operation in which the control unit 139 determines which instruction has been fetched; and
• an execute operation in which the control unit 139 and/or the ALU 140 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 139 stores or writes a value to a memory location 132. Each step or sub-process in the processes of Figs. 3, 4A, 4B and 5 is associated with one or more segments of the program 133 and is performed by the register section 144, 145, 146, the ALU 140, and the control unit 139 in the processor 105 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 133.
The current method may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the required functions or sub functions. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories. A system in accordance with the described arrangements will now be described with reference to Fig. 3. The system is composed of an image capturing device (a camera) 301, a rittai-kan quantification algorithm 302 to evaluate the quality of the captured image and produce scores 303, and a post-processing algorithm to optimise the image quality 304 and produce an optimised image 380.
The image capturing device (a camera) 301 captures red-green-blue-depth (RGBD) data including an image 312 along with additional data 318 of a scene 310. A perceived depth map 325 is generated from the image by an algorithm estimating perceived depth 320 and a perceptual filter 322. A physical depth map 335 of the scene is calculated by a depth calculation algorithm 330 from the additional data 318 and, in some arrangements, the image 312 as well. For example, in one arrangement, the additional data is recorded by a time-of-flight camera (ToF camera), which is a range imaging camera system that resolves distance based on the known speed of light, measuring the time-of-flight required for a pulse of infrared light to leave the camera, reflect from the scene, and return to the camera. A physical depth map 335 is formed which associates distances from the camera to objects in the image of the scene. For example, the physical depth map 335 may associate a distance with each pixel in the RGB image 312. In another arrangement, the physical depth map 335 may associate a distance with regions of the RGB image 312, for example the RGB image may be segmented into regions corresponding to objects in the scene, and each region may be assigned a depth corresponding to the estimated distance of the corresponding object. In another arrangement, the physical depth map 335 may associate distances with local features in the RGB image 312, such as Harris-Stephens corner features.
Depth information 337, representing the depth of regions in the RGB image 312, is generated from the physical depth map 335 and the perceived depth map 325. In one arrangement, the depth information is the same as the physical depth map. In another arrangement, the depth information is the same as the perceived depth map.
In another arrangement, the depth information includes both the physical and perceived depth maps. In another arrangement, the depth information includes additional information related to the depth of regions in the scene. In one arrangement, the additional information includes the focal length and focus distance of the camera lens used to capture the RGB image. In another arrangement, the additional information includes a confidence map which estimates the accuracy of the depth estimates in the physical depth map or the perceived depth map.
The physical depth map of the scene is compared with the perceived depth map by the calculate scores algorithm 345, using a rittai-kan quality model 340. A rittai-kan quality model 340 describes the mathematical relation between a set of image feature values extracted from the image and the quality of the perceived “sense of depth” of a 2D image, i.e. rittai-kan. For example, the rittai-kan quality model is a nonlinear combination of different features, local (e.g. pixel color values, local contrast) and global (e.g. mean chroma, mean contrast). The rittai-kan quality model is determined offline, e.g. by building a statistical model from a set of data collected through a psychophysical experiment. The comparison of the physical depth map with the perceived depth map determines a plurality of perceived depth characteristics defining a perceived depth quality of the image and produces rittai-kan scores 303, such as for example the position score 352, the texture score 355 and the shape score 358.
The position score 352 is related to the perceived sense of depth separation between the different objects in the scene.
The texture score 355 is related to the perceived sense of depth of the texture of the different objects in the scene.
The shape score 358 is related to the perceived sense of 3D shape of the different objects in the scene.
The scores 303 are used as an input for rittai-kan post-processes 304, for example the position process 362, the shape process 365 or the texture process 368, which are local image adjustment processes applied to the image using depth information 337, so as to modify the determined perceived depth characteristics of the image. In one arrangement, the local image adjustment processes are applied to the image 312 without user intervention, but in other arrangements the user is asked to confirm application of the image adjustment processes or provided with a number of options for suitable image adjustments which can then be activated or deactivated as desired.
In one arrangement, the position process 362 affects chroma of the pixels. The process increases chroma of the foreground objects while decreasing chroma of the background objects. The process is local in that only a subset of the image pixels corresponding to foreground objects have their chroma increased, and only a subset of the image pixels corresponding to background objects have their chroma decreased, as opposed to a global process which affects all image pixels in the same way. The process 362 uses the physical depth map contained in the depth information to segment the foreground and background objects, identifying foreground objects as having a depth map value greater than the mean value of the depth map, and background objects as having a depth map value lower than the mean value of the depth map. Similarly, the shape process 365 and the texture process 368 affect attributes of the pixels such as chroma, hue, contrast or sharpness, and use the physical depth map contained in the depth information to adapt the processing strength according to the depth of objects.
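A minimal sketch of such a position process is given below (Python with numpy assumed). It operates on a chroma channel and the physical depth map, and follows the segmentation rule stated above, i.e. it treats values above the map mean as foreground; the function name and the gain parameter are illustrative only and not part of the described arrangement.

import numpy as np

def position_process(chroma, depth_map, gain=0.1):
    # Segment foreground/background by comparing each depth value with the map mean,
    # as described above (values above the mean are treated as foreground here).
    mean_depth = depth_map.mean()
    foreground = depth_map > mean_depth
    background = depth_map < mean_depth
    adjusted = chroma.astype(float).copy()
    adjusted[foreground] *= (1.0 + gain)   # local chroma increase on foreground pixels
    adjusted[background] *= (1.0 - gain)   # local chroma decrease on background pixels
    return adjusted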
These rittai-kan post-processes 304 produce a modified image 380 that is perceptually closer to the physical depth of the scene.
The described arrangements can typically be applied in situations where a camera user is capturing an image or a sequence of images of a scene with a camera. For example, the scene is a portrait, or a group of people, or a landscape, but many other scenes would also be suitable.
The user expects the 2D image captured with the camera to be a good representation of the 3D scene. In particular, the user expects perceptual attributes such as for example the colours, or the contrast in the 2D image to match the colours or the contrast in the 3D scene. For example, the colour of a blue sky captured in the 2D image should match the colour of the actual blue sky in a 3D landscape scene. Similarly, the user also expects the depth information perceived from the 2D image to match the depth in the 3D scene.
With reference to the previously discussed scene examples:
• When capturing a portrait of an individual, the 3D shape of the person and their features and the relative depth distance between them and the background should be perceived in the 2D image.
• When capturing a group of people, the relative shape of each individual, the texture of their clothing, and the 3D composition of the group should be perceived in the 2D image.
• When capturing a landscape image, the 3D shape of the different objects in the landscape (e.g. trees, sheds, etc.), their textures, and the relative depth distance between these objects should be perceived in the 2D image.
When a person looks at a 2D image, their human visual system uses various monocular cues from that 2D image to mentally construct a representation of the perceived depth of the original 3D scene. However, the human visual system is unable to perceive all the fine details of a 2D image. In order to create a perceived depth map, a perceptual filter 322 is required to simulate the limitations of the human visual system.
In some arrangements, one or more perceptual monocular cues are extracted from the 2D image by a depth estimation algorithm 320, a perceptual filter 322 is applied and a perceived depth map 325 is generated.
For example, for the monocular blur cue, a depth estimation algorithm 320 first extracts blur caused by the optical system of the camera to produce a local blur map, using (Zhou, 2011). This blur map is then processed by a perceptual filter 322 which is a model of the human visual system (HVS). In one arrangement, this HVS model is an image convolution by the contrast sensitivity function of the HVS which removes information that cannot be perceived by the human visual system. In another arrangement, the convolution removes information that cannot be perceived by the human visual system, and also amplifies the information that will be highly perceivable, in order to produce a perceived depth map 325.
For the monocular aerial perspective cue, a depth estimation algorithm 320 first extracts the chroma value of the 2D image, by converting the RGB image to the CIELCH color space, then extracting the C chroma channel. The C chroma channel is then filtered using a perceptual filter 322 which is a chroma difference model that will filter away any subtle difference not perceivable by the HVS, and increase differences that are highly perceivable. The filtered output is the perceived depth map 325.
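The chroma extraction step can be sketched as follows, assuming an RGB input image and using scikit-image for the colour conversion; the perceptual filtering stage is omitted and the function name is illustrative only.

import numpy as np
from skimage import color

def extract_chroma(rgb):
    # Convert RGB to CIELAB and take chroma C = sqrt(a*^2 + b*^2),
    # which is the C channel of the cylindrical CIELCH representation.
    lab = color.rgb2lab(rgb)
    return np.hypot(lab[..., 1], lab[..., 2])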
In another arrangement, multiple depths are estimated using different monocular cues, a perceptual filter is applied to each estimated depth, and the outputs are combined into a single perceived depth map using a weighted average.
In the depth estimation algorithm 320, the captured image is blurred using convolution with a Gaussian kernel with a standard deviation equal to a predetermined blur radius σ0, forming a reblurred image. An example value for σ0 in the equation below is 1 pixel. The gradient magnitude of the captured image is divided by the gradient magnitude of the reblurred image, forming a gradient magnitude ratio image. Edge locations in the captured image are detected, for example using Canny edge detection. For each edge location, the gradient magnitude ratio image is used to estimate the blur radius in the captured image using the equation: σ = σ0 / sqrt(R² − 1), where σ is the estimated blur radius, σ0 is the predetermined reblur radius and R is the gradient magnitude ratio image. The result is a sparse perceived depth map 325, where the depth is expressed in the form of blur radius amounts.
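A sketch of this blur-based estimate, assuming a 2D float greyscale image and using scipy and scikit-image for the Gaussian reblur and Canny edge detection; parameter values are illustrative.

import numpy as np
from scipy import ndimage
from skimage import feature

def estimate_blur_map(gray, sigma0=1.0):
    reblurred = ndimage.gaussian_filter(gray, sigma0)
    # Gradient magnitudes of the captured and reblurred images.
    g0, g1 = np.gradient(gray)
    r0, r1 = np.gradient(reblurred)
    ratio = np.hypot(g0, g1) / np.maximum(np.hypot(r0, r1), 1e-8)
    edges = feature.canny(gray)
    blur = np.full(gray.shape, np.nan)          # sparse map: NaN away from edges
    valid = edges & (ratio > 1.0)               # sigma is only defined where R > 1
    blur[valid] = sigma0 / np.sqrt(ratio[valid] ** 2 - 1.0)
    return blur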
In another arrangement, other monocular cues are extracted including but not limited to: texture gradients, relative size, aerial perspective, linear perspective and overlap into maps representing these cues. The maps are processed by a model of the human visual system. Depth is then locally estimated from the processed maps to produce a perceived depth map 325.
The perceived depth from the 2D image may differ significantly from the physical depth of the 3D scene. Two main factors contributing to that difference include:
1. Only monocular cues, rather than binocular cues, from the 2D image can be used to build the perceived depth map. This is because only non-moving monocular cues are captured in a still 2D image. The binocular cues are not captured in the 2D image, as a 2D image is recorded from one unique viewpoint while binocular cues result (by definition) from the capture of two images by the two eyes from two separate viewpoints.
2. Each monocular cue map used to build the perceived depth map is first processed by a perceptual filter which is a model of the human visual system.
Besides the image, additional data is captured from the 3D scene, and is used to build a physical map, as described below.
In one arrangement, this additional data is a second image captured from a viewpoint adjacent to the primary viewpoint. The two images constitute a stereo image pair (i.e. left and right images), from which a depth map is computed. Depth from stereo processing is performed on the stereo pair, using an implementation of a block-matching algorithm (MVTools2, 2014) for motion estimation (similar methods are used in video encoding standards such as MPEG2 and MPEG4). The two images are divided into small blocks and for every block in one of the two images (i.e. the first image), a matching or the most similar block in the second image is located. In the arrangement where the additional data is a second image, the quality measure of block similarity is the sum of absolute differences (SAD) of all pixels of these two blocks compared. Iteratively for each block in the first image, the block in the second image with the smallest SAD value is selected as the matching block. The relative shift of these blocks is an estimate of the stereo disparity, and is directly related to the physical depth. In the arrangement where the additional data is a second image, the stereo disparity map is rescaled between 0 and 1, and used as the physical depth map. In an alternative arrangement, the stereo disparity map is rescaled into a physical distance from the camera to the objects in the scene and used as the physical depth map, where the physical distance is calculated using the lens focal length and stereo baseline from the additional data.
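A naive SAD block-matching sketch is shown below; it assumes rectified greyscale left/right numpy arrays, a horizontal-only search, and illustrative block size and search range (it is far slower than the optimised block matchers referred to above).

import numpy as np

def block_matching_disparity(left, right, block=8, max_disp=32):
    h, w = left.shape
    disp = np.zeros((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y + block, x:x + block].astype(float)
            best_sad, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(float)
                sad = np.abs(ref - cand).sum()        # sum of absolute differences
                if sad < best_sad:
                    best_sad, best_d = sad, d         # keep the best-matching shift
            disp[by, bx] = best_d                     # block shift = disparity estimate
    return disp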
Alternatively, depth is estimated using a Depth From Defocus (DFD) process from a focus bracket of images, i.e. a set of images having varying capture parameters such as aperture. DFD uses a small number of images shot at different focus positions and extracts depth information from variation in blur with object distance. Depth from defocus is more practical than other methods for many applications because DFD relies on as few as two images to estimate depth.
Several suitable DFD methods are known. Such DFD methods typically rely on correspondences between regions of pixels in multiple images of the same scene to extract depth information about the object imaged at that image region. The depth information is extracted by quantifying the amount of blur difference between the images of an object. For a static object and camera, this blur difference is caused by a change in the focal quality of the image captures, which is governed by a change in camera parameters such as focus, aperture, or zoom. Alternatively, depth is estimated by a stereo system setup with two cameras that capture two substantially concurrent images from two nearby viewpoints. A disparity map is calculated from the two captured images. Each pixel value in this map corresponds to the disparity between the two images at that pixel location. This disparity is proportional to the depth and is used as an estimate of the scene depth.
Some of these imaging systems capture an RGB image along with depth information and are often referred to as RGB-Z or RGB-D cameras.
In one arrangement, the additional data is a measure of depth using a tool such as a laser, a time of flight device, or a direct measure of depth obtained using a ruler.
Several depth map features, which are discussed in further detail below, are extracted from the physical depth map to form a vector of features for the physical depth map. A set of features is then extracted from the perceived depth map to form a vector of features for the perceived depth map, as illustrated by Fig. 4A.
In one arrangement, the physical depth features and the perceived depth features are compared to select the subset of features which correspond between the physical depth map and the perceived depth map. A physical depth map 401 is an input to a feature extraction process 402. The output is a series of features of the physical depth map 403, 404, 405 and 406. Similarly, a perceived depth map 451 is an input to a feature extraction process 452. The output is a series of features of the perceived depth map 453, 454, 455 and 456.
In one arrangement, in order to allow a comparison of the two maps, the physical depth map 401 and the perceived depth map 451 are both scaled to the same range of values, such as for example a continuous range of [0,1], by scaling algorithms 410 and 460. For example, the scaling algorithm applied to the two maps is a histogram scaling algorithm that maps the minimum value of each depth map to 0, the largest value of each map to 1, and linearly maps intermediate values between 0 and 1.
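A minimal sketch of this scaling step (numpy assumed; a constant-valued map would need a guard against division by zero):

import numpy as np

def rescale_depth_map(depth_map):
    # Map the minimum depth to 0, the maximum to 1, and intermediate values linearly.
    d_min, d_max = float(depth_map.min()), float(depth_map.max())
    return (depth_map - d_min) / (d_max - d_min)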
In another arrangement, in order to allow a comparison of the two maps, one of the two maps is scaled so that its range of values matches the range of values of the other map.
In another arrangement, each depth map is scaled such that the minimum and maximum possible depths are set to 0 and 1 respectively, with the actual depths occupying a reduced range according to the scene being captured. For example, for a physical depth map corresponding to an RGB image captured using a 50 mm lens on to a 36x24 mm sensor, a minimum depth value of 0 could be assigned to a distance 0 m from the camera to an object, and a maximum depth value of 1 could be assigned to the hyperfocal distance, corresponding to a nominal lens aperture of f/8, which is 10 m from the camera to an object. The scene being captured may have distances over a reduced range, such as from 2 m to 8 m, in which case the physical depth map values would be in the range 0.2 to 0.8 after scaling. For a perceived depth map, considering the monocular blur cue, a minimum depth value of 0 could be assigned to a blur size of 0.5 pixels and a maximum depth value of 1 could be assigned to a blur size of 20 pixels. The scene being captured may have monocular blurs over a reduced range, such as from 0.5 pixels to 15 pixels, in which case the perceived depth map values would be in the range 0 to 0.75 after scaling.
In one arrangement, features are then extracted at the pixel level, at the region level and at the image level. In another arrangement, features are extracted only at one or two of these levels.
The physical depth map is segmented into regions corresponding to different objects, using the physical depth map information. For example, the segmentation is achieved by a K-means algorithm as follows:
1. Pick K cluster centres randomly.
2. Assign each pixel in the image to the cluster that minimises the distance between the pixel and the cluster centre, where the distance is the sum of the squared Euclidean pixel distance in the image and the squared depth distance in the depth map.
3. Re-compute the cluster centres by averaging all of the pixels in the cluster.
4. Repeat steps 2 and 3 until convergence is attained (i.e. no pixels change clusters).
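The sketch below follows these steps on (row, column, depth) features, assuming numpy and a depth map small enough for the brute-force distance computation; in practice the depth values would need to be scaled relative to the pixel coordinates so that they meaningfully influence the clustering, and the function name is illustrative.

import numpy as np

def kmeans_depth_segmentation(depth_map, k=3, iters=20, seed=0):
    rows, cols = np.indices(depth_map.shape)
    feats = np.stack([rows.ravel(), cols.ravel(), depth_map.ravel()], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    centres = feats[rng.choice(len(feats), size=k, replace=False)]   # step 1: random centres
    labels = np.full(len(feats), -1)
    for _ in range(iters):
        # step 2: assign each pixel to the nearest centre; the squared distance is the
        # sum of the squared pixel-coordinate distance and the squared depth distance
        d2 = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                                    # step 4: converged
        labels = new_labels
        for c in range(k):                                           # step 3: recompute centres
            if np.any(labels == c):
                centres[c] = feats[labels == c].mean(axis=0)
    return labels.reshape(depth_map.shape)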
In another arrangement, the physical depth map is segmented into regions corresponding to different objects, using the image information. A K-means algorithm is used, where the distance is the sum of the squared Euclidean pixel distance in the image and the squared Euclidean colour distance.
In yet another arrangement, the physical depth map is segmented into regions corresponding to different objects, using the physical depth map and the image information. A K-means algorithm is used, where the distance is the sum of the squared Euclidean pixel distance in the image, the squared depth distance, and the squared Euclidean colour distance.
In still another arrangement, the physical depth map is segmented into regions corresponding to different objects, using the physical depth map, the perceived depth map, and the image information. A K-means algorithm is used, where the distance is the sum of the squared Euclidean pixel distance in the image, the squared depth distance in the physical depth map, the squared depth distance in the perceptual depth map, and the squared Euclidean colour distance.
Once the segmentation is completed, objects are labelled with different categories corresponding to depth layers, such as foreground, midground, and background, for example by quantizing the depth map into these 3 levels.
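One simple way to assign these layer labels, sketched here under the assumption that smaller (scaled) depth values are closer and using depth terciles as the quantisation boundaries (the choice of boundaries is not specified above and is illustrative):

import numpy as np

def label_depth_layers(depth_map):
    # Quantise the depth map into three levels using its terciles:
    # 0 = foreground, 1 = midground, 2 = background.
    edges = np.quantile(depth_map, [1.0 / 3.0, 2.0 / 3.0])
    return np.digitize(depth_map, edges)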
The features are related to the relative depth distance between the different objects captured in the image and identified by the depth map segmentation. The depth distance between objects identified as being in the foreground and objects identified as the background is computed for each of the physical and perceived depth maps. For example, D_bg-fg_Perceived = 0.5 and D_bg-fg_Physical = 1.
For example, consider a scene with three (3) objects, foreground object A, midground object B and background object C.
The depth distance between each and every pair of objects in the scene (in the above example, A-B, A-C and B-C), the total depth of the scene (A-C), that is the distance from the closest point in the scene to the furthest point in the scene, and the depth rankings of the objects (A:1, B:2, C:3, giving the ranking vector [1,2,3]) are also computed.
In one arrangement, once the segmentation is completed, objects are labelled. Each of the objects is assigned an index letter, e.g. A, B and C. Then for each object, the associated mean depth is computed from the pixels in the segmented region of the depth map corresponding to this object, e.g. mean_depth_A=0.1, mean_depth_B=0.2 and mean_depth_C=0.6. Then the objects are ranked according to their mean depths, from the closest to the most distant one, and the result is stored in a vector of ranked labels, e.g. [A,B,C].
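A sketch of the per-object mean depth and ranking computation, assuming a numpy depth map and a label image from the segmentation step; it treats smaller depth values as closer, consistent with the example above.

import numpy as np

def object_depth_ranking(depth_map, labels):
    object_ids = np.unique(labels)
    # Mean depth of each segmented object.
    mean_depths = {obj: float(depth_map[labels == obj].mean()) for obj in object_ids}
    # Vector of labels ranked from the closest object to the most distant one.
    ranked = sorted(object_ids, key=lambda obj: mean_depths[obj])
    return mean_depths, ranked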
In another arrangement, the features are related to the shape and volume of the different objects captured in the image. The perception of the shape and volume of all the objects in the scene is used by the human visual system to estimate the sense of 3D in a scene or in an image. Measures of the volume of each of the objects captured in the image are computed from the surface depths of the objects. This is achieved by first segmenting the depth maps into regions corresponding to different objects, as described previously. Then the volume of each object is computed by finding the minimum depth value of the object, then subtracting this minimum depth value from all the depth values of the object, then summing all the resulting depth values.
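The volume measure can be sketched as follows (numpy assumed; function and argument names are illustrative):

import numpy as np

def object_volume(depth_map, labels, obj):
    # Subtract the object's minimum depth from its depth values and sum the result,
    # giving a simple volume measure for the segmented object.
    d = depth_map[labels == obj].astype(float)
    return float((d - d.min()).sum())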
In another arrangement, the features are related to the texture of the different objects captured in the image. Local texture of objects such as, for example, skin, wood, grass, or fabric impacts the sense of 3D in a scene or an image. Measures of the texture of each object or region of the scene are computed. For example, texture is computed for each pixel of a region by computing the sum of the squared differences between neighbouring pixels within a sliding local window of 10x10 pixels.
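A sketch of this texture measure, assuming a greyscale numpy image; squared differences between horizontally and vertically adjacent pixels are accumulated over the sliding window with a uniform filter.

import numpy as np
from scipy import ndimage

def texture_map(gray, window=10):
    g = gray.astype(float)
    dx2 = np.zeros_like(g)
    dy2 = np.zeros_like(g)
    dx2[:, :-1] = (g[:, 1:] - g[:, :-1]) ** 2   # squared differences, horizontal neighbours
    dy2[:-1, :] = (g[1:, :] - g[:-1, :]) ** 2   # squared differences, vertical neighbours
    # uniform_filter computes a local mean; multiply by the window area to obtain a sum.
    return ndimage.uniform_filter(dx2 + dy2, size=window) * window * window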
In yet another arrangement, all the above features are extracted from each depth map to form the two key features vectors.
The following table is an illustration of a list of features extracted from both the physical map and the perceived map, as discussed here.
Each of the key features of the physical depth map of the 3D scene (e.g. 501) is then compared with the equivalent key feature of the perceived depth map computed from the 2D image (e.g. 502), by comparing the perceived depth map scalar or vector with the physical depth map scalar or vector, as illustrated by Fig. 5.
In one arrangement, the comparison scalar or vector is the difference between the feature scalar or vector of the physical map, and the scalar or vector of the same feature in the perceived map.
In another arrangement, the comparison scalar or vector is a relative comparison obtained by dividing the difference between the feature scalar or vector of the physical map and the scalar or vector of the same feature in the perceived map by the scalar or vector of the feature of the physical map.
In yet another arrangement, the comparison scalar or vector is the ratio between the feature scalar or vector of the physical map, and the scalar or vector of the same feature in the perceived map.
Features of the physical depth map F1_phy 501, F2_phy 503 and FN_phy 505 are compared respectively with features of the perceived depth map F1_per 502, F2_per 504, and FN_per 506. The comparison of F1_phy 501 with F1_per 502 by an algorithm 511 results in a score Delta_1 512. The comparison of F2_phy 503 with F2_per 504 by an algorithm 513 results in a score Delta_2 514. The comparison of FN_phy 505 with FN_per 506 by an algorithm 515 results in a score Delta_N 516.
Each comparison is a quality score of the 2D image for that feature. For example, the depth distance between objects identified as being in the foreground and objects identified as being in the background is D_bg-fg_Perceived = 0.5 in the perceived depth map and D_bg-fg_Physical = 1 in the physical depth map, giving Score_bg-fg = 0.5 - 1 = -0.5.
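A minimal sketch of the three comparison modes described above is given below; the sign and ordering conventions (perceived value relative to physical value) follow the worked example and are otherwise an assumption.

```python
import numpy as np

def compare_feature(f_phy, f_per, mode='difference'):
    f_phy = np.asarray(f_phy, dtype=np.float64)
    f_per = np.asarray(f_per, dtype=np.float64)
    if mode == 'difference':          # e.g. Score_bg-fg = 0.5 - 1 = -0.5
        return f_per - f_phy
    if mode == 'relative':            # difference normalised by the physical value
        return (f_per - f_phy) / f_phy
    if mode == 'ratio':               # perceived value over physical value
        return f_per / f_phy
    raise ValueError('unknown comparison mode: %s' % mode)

# Worked example from the text: foreground-background depth distances.
score_bg_fg = compare_feature(f_phy=1.0, f_per=0.5, mode='difference')   # -> -0.5
```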
In another arrangement, one of the features is a vector of ranked labels. The two vectors of ranked labels are compared, using for example a quality score based on depth ordering similarity such as Spearman's rank correlation coefficient or the Mann-Whitney-Wilcoxon test, and a score is determined. The score value is 1 if the two vectors match, the score value is 0 if the two vectors totally mismatch, and the value is intermediate if the two vectors somewhat match.
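For illustration, a sketch of such an ordering score using Spearman's rank correlation coefficient follows; mapping the correlation range of -1..1 onto the 0..1 score range is an assumption made here to match the described score values.

```python
from scipy.stats import spearmanr

def ordering_score(ranked_phy, ranked_per):
    # Position of each object in the physical-map ordering ...
    rank_phy = {label: i for i, label in enumerate(ranked_phy)}
    # ... looked up in the order given by the perceived map.
    ranks_per = [rank_phy[label] for label in ranked_per]
    rho, _ = spearmanr(list(range(len(ranks_per))), ranks_per)
    return (rho + 1.0) / 2.0

print(ordering_score(['A', 'B', 'C'], ['A', 'B', 'C']))   # matching orderings -> 1.0
print(ordering_score(['A', 'B', 'C'], ['C', 'B', 'A']))   # reversed orderings -> 0.0
```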
Each feature is compared independently, with the results of the comparison being communicated to the user. Alternatively, metrics are computed to evaluate a set of related features together and create comparison values, such as the depth distances between each and every pair of objects in the scene, e.g. comparison values Delta_1 512, Delta_2 514 and Delta_N 516, and deliver a subset of depth quality scores.
In one arrangement, related scores are combined into a subset of scores, such as for example the Position score 531, the Shape score 532 and the Texture score 533. Each score is a weighted mean of several comparison values, or the median, or a non-linear combination of several comparison scalars or vectors.
In another arrangement, the rittai-kan quality model 520 is an input to the pooling algorithms 521, 522 and 523. The pooling algorithms 521, 522 and 523 combine Delta_1 512, Delta_2 514 and Delta_N 516 according to the rittai-kan quality model 520. The output of the pooling algorithms is one quality score, or several, such as for example the Position score 531, the Shape score 532 and the Texture score 533.
The rittai-kan quality model is a set of coefficients of a mathematical model describing a nonlinear combination of several different comparison values, in the set of values from Delta_1 512 to Delta_N 516. Alternatively, the rittai-kan quality model is a set of coefficients of a mathematical model describing a nonlinear combination of several different comparison values along with values of the features of the physical depth map and the perceived depth map. The rittai-kan model can also be obtained by statistical analysis of data gathered in psychophysical experiments.
In one arrangement, related scores are combined into a subset of scores, such as for example the Position score 531, the Shape score 532 and the Texture score 533, using several predetermined rittai-kan quality models that describe the relation between an aspect of the quality of rittai-kan and a set of feature scores.
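As a simple illustration of the weighted-mean pooling variant, the sketch below combines the comparison values into Position, Shape and Texture scores; the weight values stand in for a rittai-kan quality model 520 and are purely illustrative assumptions.

```python
import numpy as np

def pool_scores(deltas, model_weights):
    deltas = np.asarray(deltas, dtype=np.float64)
    # One weight vector per subset score; each score is the weighted mean of
    # the comparison values it depends on.
    return {name: float(np.dot(weights, deltas) / np.sum(weights))
            for name, weights in model_weights.items()}

scores = pool_scores(
    deltas=[-0.5, 0.2, 0.1],                       # Delta_1, Delta_2, Delta_N
    model_weights={'position': [1.0, 0.5, 0.0],
                   'shape':    [0.0, 1.0, 0.5],
                   'texture':  [0.0, 0.0, 1.0]})
```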
In one arrangement, the position, shape and texture scores are used as inputs to rittai-kan post-processes, including the position process, the shape process and the texture process, which are local image adjustment processes applied to the 2D image using depth information, so as to modify the determined perceived depth characteristics of the image. A process selection is performed to determine which post-processes should be applied. In one arrangement, the position, shape and texture scores are compared with predetermined thresholds for each score, and processes are selected where the score is below the corresponding threshold. In another arrangement, a single process is selected which will give the strongest enhancement, based on which rittai-kan score is lowest. The selected processes are then applied to the RGB image.
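The process selection logic might be sketched as follows; the dictionary keys and the `single` flag are illustrative names for the two selection arrangements described above.

```python
def select_processes(scores, thresholds, single=False):
    # Processes whose score falls below the corresponding threshold.
    below = {name: score for name, score in scores.items()
             if score < thresholds[name]}
    if not below:
        return []
    if single:
        # Only the process giving the strongest enhancement (lowest score).
        return [min(below, key=below.get)]
    return sorted(below)

# Example: the shape and texture post-processes would be selected.
selected = select_processes(scores={'position': 0.8, 'shape': 0.3, 'texture': 0.4},
                            thresholds={'position': 0.5, 'shape': 0.5, 'texture': 0.5})
```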
These rittai-kan post-processes produce a modified image that is perceptually closer to the physical depth of the scene.
The described arrangements enable the produced images to be perceptually closer to the physical depth of the scene.
The scene is for example a portrait, or a group of people, or a landscape with several objects at different depths, captured by a DSLR camera.
Figs 6A, 6B and 6C illustrate an example application and will be described below. In Fig 6A a scene is represented, with elements at different depths: 601 in the foreground, 602 in the middle ground, and 603 in the background.
Fig 6B is a reproduction of the captured 2D image of the 3D scene. In the captured 2D image of the 3D scene, the region 611 corresponds to the scene element 601 in the foreground, the region 612 corresponds to the scene element 602 in the middle ground, the region 613 corresponds to the scene element 603 in the background.
When looking at the captured image Fig 6B, the regions 611 and 612 appear to be at a similar depth even though they are at different depths in the 3D scene. In order to improve the accuracy of the captured image, a rittai-kan process is applied to this image according to the workflow described in Fig. 3.
Fig 6C is an optimized reproduction of the captured 2D image of the 3D scene. In this processed 2D image of the 3D scene, the region 621 corresponds to the scene element 601 in the foreground, the region 622 corresponds to the scene element 602 in the middle ground, the region 623 corresponds to the scene element 603 in the background.
When looking at the optimized reproduction of the captured 2D image in Fig 6C, the regions 621 and 622 appear to be at different depths, accurately matching their different depths in the 3D scene.
Described arrangements can be applied in digital cinema, where the quality scores are used to reduce the difference in perceived depth between the physical 3D scene and the 2D release of the movie. The described arrangements can also be applied in digital cinema such that the quality scores are used to reduce the difference in perceived depth between a 3D release of a movie and the 2D release of the same movie, by adjusting the 2D version of the movie in line with the processes discussed above so that the 2D version of the movie perceptually matches the 3D version.
In another application, the quality scores are determined in a digital camera or a connected processing device such as a cloud server, and shown to the user during capture of an image. The user is then able to use this information to manually modify the capture settings such as the position of the camera (its view point), the lens focal length, the lens aperture, the ISO sensitivity and/or the colour rendering in an effort to increase the quality score for the captured image.
The scores of each captured image can also be saved as metadata, either in the image file or in another file associated with the captured image. The set of captured images with associated scores can then be ranked according to one of the scores such as, for example, the texture score. Alternatively, when a user is looking for images with specific characteristics (e.g. a “flat” image with a low score), the user can search within the set of captured images for those images having associated scores matching the user’s needs.
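A minimal sketch of such ranking and searching over stored score metadata is shown below; the file names, score values and dictionary layout are illustrative assumptions only.

```python
def rank_by_score(captured_images, key='texture'):
    # Highest-scoring images first, e.g. for browsing by texture quality.
    return sorted(captured_images, key=lambda item: item['scores'][key], reverse=True)

def find_flat_images(captured_images, key='position', max_score=0.2):
    # Images whose stored score for the chosen aspect is at or below a limit.
    return [item for item in captured_images if item['scores'][key] <= max_score]

library = [{'file': 'IMG_0001.jpg', 'scores': {'position': 0.1, 'shape': 0.6, 'texture': 0.4}},
           {'file': 'IMG_0002.jpg', 'scores': {'position': 0.7, 'shape': 0.5, 'texture': 0.9}}]
best_texture_first = rank_by_score(library)
flat_candidates = find_flat_images(library)
```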
In another arrangement, the scores are used automatically by the camera during capture to modify the capture settings such as the position of the camera (its view point), the lens focal length, the lens aperture, the ISO sensitivity and the colour rendering, in order to increase the score. A flow chart for this arrangement is shown in Fig. 7 as the process 700 of adjusting the camera capture settings using the rittai-kan scores. In step 710 the camera capture settings are selected, for example using manual controls operated by the user, or automatically by the camera using information from sensors for exposure and focus. In step 720 the camera captures an RGB image and additional data in response to the user pressing the shutter button. In step 730 the camera processor performs rittai-kan quantification 302, including a comparison between a physical depth map and a perceived depth map using a rittai-kan quality model. In step 740 the camera processor performs rittai-kan scoring 303 for the position, texture and shape of objects in the RGB image. In step 750 a decision is made whether to improve the rittai-kan scores, for example by comparing the rittai-kan scores against a predetermined threshold. If the rittai-kan scores are above the threshold, then the process 700 stops.
If the rittai-kan scores are below the threshold, then control moves to step 760 where the capture settings are adjusted to improve the rittai-kan score. For example, if the position score is below the threshold, then the camera may open up the aperture to increase the monocular blur cue in the captured RGB image. As another example, if the shape score is below the threshold in an indoor portrait image, then the camera may disable the flash, which will tend to increase the directional lighting on the subject and therefore increase the sense of shape and volume. Control then passes to step 720 where a new RGB image is captured with increased rittai-kan quality.
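The control flow of the process 700 might be sketched as below. The three callables are placeholders for camera capture (step 720), rittai-kan quantification and scoring (steps 730-740) and capture-setting adjustment (step 760); their names and the default threshold are assumptions for illustration only.

```python
def capture_with_rittaikan_feedback(capture_fn, score_fn, adjust_fn,
                                    threshold=0.5, max_attempts=5):
    image, depth_data = capture_fn()            # step 720: RGB image + additional data
    scores = score_fn(image, depth_data)        # steps 730-740: quantification and scoring
    for _ in range(max_attempts):
        if min(scores.values()) >= threshold:   # step 750: scores good enough, stop
            break
        adjust_fn(scores)                       # step 760: e.g. open the aperture if the
                                                # position score is low, disable the flash
                                                # if the shape score is low
        image, depth_data = capture_fn()        # step 720: capture again with new settings
        scores = score_fn(image, depth_data)
    return image, scores
```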
One example of a camera implemented arrangement is described below.
Firstly, a user captures a 2D image of a 3D scene using an RGBZ camera. The RGBZ camera captures a 2D RGB image along with a physical depth map of the 3D scene. An image processing system embedded in the RGBZ camera generates a perceived depth map from the 2D RGB image. The system extracts features from the physical and perceived depth maps and determines the perceived depth quality of the image in the form of a series of quality scores associated with the captured image.
These determined quality scores are used as input values by embedded local image adjustment processes, which process the captured image using associated depth information, which, as discussed above, is preferably derived from the physical depth map, so as to modify determined perceived depth characteristics of the image.
The system determines that the image processing is completed when the perceived depth of the 2D image matches the physical depth of the 3D scene, as described with reference to Fig. 6C. Alternatively, if matching the physical depth of the 3D scene is not possible, the system stops when the perceived depth of the 2D image has been brought suitably closer to the physical depth of the 3D scene.
The arrangements described herein provide for technical improvements in adjusting the perceived depth of a captured image. The arrangements described compare physical depth values for the scene captured in the image with a determination of initial perceived depth values of the captured image to form perceived depth scores defining a perceived depth quality for the image.
The arrangements base the determination of local post-processing of the image on the depth information and perceived depth scores.
The arrangements described are applicable to the computer and data processing industries and particularly for image processing and image editing.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.

Claims (16)

1. A method of modifying the perceived depth of an image capturing a scene, said method comprising: receiving a physical depth map of the scene captured in the image; generating a perceived depth map from the image; determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
2. A method according to claim 1, wherein the perceived depth map is generated using a predetermined perceptual filter.
3. A method according to claim 1, wherein a second local image adjustment process is applied to the image to modify at least one of the plurality of the determined perceived depth scores of the image not modified by the first local image adjustment process.
4. A method according to claim 1, wherein the plurality of perceived depth scores are formed by a combination of different comparison values.
5. A method according to claim 1, wherein a correspondence between the physical depth map and the perceived depth map is determined with reference to a predetermined rittai-kan model.
6. A method according to claim 1, wherein a second local image adjustment process is applied to a local region of the image which is different to a local region in which the first local image adjustment process is applied, said local region corresponding to distinct depth layers as represented in the depth information.
7. A method according to claim 1, wherein the determined perceived depth scores of the image are used to rank the image with respect to other images.
8. A method according to claim 1, wherein the determined perceived depth scores of the image are used to search for the image within a plurality of images.
9. A method according to claim 1, wherein the predetermined perceptual filter is a model associated with perceptual attributes of the human visual system.
10. An apparatus for modifying the perceived depth of an image capturing a scene, said apparatus comprising: means for receiving a physical depth map of the scene captured in the image; means for generating a perceived depth map from the image; means for determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and means for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
11. A system for modifying the perceived depth of an image capturing a scene, said system comprising: a memory for storing data and a computer program; a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: receiving a physical depth map of the scene captured in the image; generating a perceived depth map from the image; determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
12. A computer readable medium having a computer program stored on the medium for modifying the perceived depth of an image capturing a scene, said program comprising: code for receiving a physical depth map of the scene captured in the image; code for generating a perceived depth map from the image; code for determining, based on a correspondence between the physical depth map and the perceived depth map, a plurality of perceived depth scores defining a perceived depth quality of the image; and code for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
13. A method of modifying perceived depth of an image capturing a scene, said method comprising: receiving physical depth measurements of a first object and a second object in the scene captured in the image; generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
14. An apparatus for modifying perceived depth of an image capturing a scene, said apparatus comprising: means for receiving physical depth measurements of a first object and a second object in the scene captured in the image; means for generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; means for determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and means for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
15. A system for modifying perceived depth of an image capturing a scene, said system comprising: a memory for storing data and a computer program; a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: receiving physical depth measurements of a first object and a second object in the scene captured in the image; generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
16. A computer readable medium having a computer program stored on the medium for modifying perceived depth of an image capturing a scene, said program comprising: code for receiving physical depth measurements of a first object and a second object in the scene captured in the image; code for generating perceived depth information from the image, said perceived depth information identifying a perceptual depth of the first object relative to the second object; code for determining, based on a correspondence between the physical depth measurements and the perceived depth information, a plurality of perceived depth scores defining a perceived depth quality of the image; and code for applying a first local image adjustment process to the image using depth information, so as to modify at least one of the plurality of the determined perceived depth scores of the image.
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant/Nominated Person
SPRUSON & FERGUSON
AU2016273979A 2016-12-16 2016-12-16 System and method for adjusting perceived depth of an image Abandoned AU2016273979A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2016273979A AU2016273979A1 (en) 2016-12-16 2016-12-16 System and method for adjusting perceived depth of an image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2016273979A AU2016273979A1 (en) 2016-12-16 2016-12-16 System and method for adjusting perceived depth of an image

Publications (1)

Publication Number Publication Date
AU2016273979A1 true AU2016273979A1 (en) 2018-07-05

Family

ID=62748593

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2016273979A Abandoned AU2016273979A1 (en) 2016-12-16 2016-12-16 System and method for adjusting perceived depth of an image

Country Status (1)

Country Link
AU (1) AU2016273979A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784561A (en) * 2019-06-14 2020-10-16 北京沃东天骏信息技术有限公司 Method, apparatus and storage medium for extracting object from image
US11893668B2 (en) 2021-03-31 2024-02-06 Leica Camera Ag Imaging system and method for generating a final digital image via applying a profile to image information
CN113781424A (en) * 2021-09-03 2021-12-10 苏州凌云光工业智能技术有限公司 Surface defect detection method, device and equipment
CN113781424B (en) * 2021-09-03 2024-02-27 苏州凌云光工业智能技术有限公司 Surface defect detection method, device and equipment

Similar Documents

Publication Publication Date Title
US11756223B2 (en) Depth-aware photo editing
US11877086B2 (en) Method and system for generating at least one image of a real environment
Wan et al. Benchmarking single-image reflection removal algorithms
US10540806B2 (en) Systems and methods for depth-assisted perspective distortion correction
JP6438403B2 (en) Generation of depth maps from planar images based on combined depth cues
Bando et al. Extracting depth and matte using a color-filtered aperture
Tao et al. Depth from combining defocus and correspondence using light-field cameras
Battiato et al. 3D stereoscopic image pairs by depth-map generation
CN106899781B (en) Image processing method and electronic equipment
US9042662B2 (en) Method and system for segmenting an image
JP5197279B2 (en) Method for tracking the 3D position of an object moving in a scene implemented by a computer
AU2014218390A1 (en) Method, system and apparatus for forming a high resolution depth map
Li et al. HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor
KR101983586B1 (en) Method of stitching depth maps for stereo images
Kuo et al. Depth estimation from a monocular view of the outdoors
AU2016273979A1 (en) System and method for adjusting perceived depth of an image
WO2015175907A1 (en) Three dimensional moving pictures with a single imager and microfluidic lens
JP5914046B2 (en) Image processing apparatus and image processing method
WO2016113805A1 (en) Image processing method, image processing apparatus, image pickup apparatus, program, and storage medium
AU2016273984A1 (en) Modifying a perceptual attribute of an image using an inaccurate depth map
CN109961422B (en) Determination of contrast values for digital images
Beigpour et al. Multi-view Multi-illuminant Intrinsic Dataset.
GB2585197A (en) Method and system for obtaining depth data
Khan et al. Offset aperture: A passive single-lens camera for depth sensing
AU2015271981A1 (en) Method, system and apparatus for modifying a perceptual attribute for at least a part of an image

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application