AU2016273974A1 - Method, system and apparatus for identifying an image - Google Patents

Method, system and apparatus for identifying an image

Info

Publication number
AU2016273974A1
Authority
AU
Australia
Prior art keywords
image
depth
regions
query image
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2016273974A
Inventor
Timothy Stephen Mason
Alex Nyit Choy Yee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2016273974A
Publication of AU2016273974A1
Status: Abandoned

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

A method of identifying an image. A plurality of regions of a query image are identified, the regions being located at different depth planes within the query image. A content depth profile is determined for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes. An image is identified based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.

[Abstract drawing: flowchart of Fig. 4 (Start; Segment regions; Characterise regions; Identify proximate regions; Form content depth profile; End).]

Description

METHOD, SYSTEM AND APPARATUS FOR IDENTIFYING AN IMAGE
TECHNICAL FIELD
The present invention relates generally to the field of image processing and, in particular, to the measurement of similarity between images. The present invention also relates to a method and apparatus for identifying an image, and to a computer program product including a computer readable medium having recorded thereon a computer program for identifying an image.
BACKGROUND
When visual artists such as photographers consider the beauty of an image, one of the aspects they may consider is how well the image communicates a sense of depth. The human perception of depth in a conventional still image is influenced by “monocular depth cues”, i.e. aspects of the image that suggest depth even when viewed by a single eye. An example monocular depth cue is occlusion: if a first object obscures vision of a second object, this is a strong cue that the first object is in front of the second object. Other monocular cues are more subtle and may create an impression of depth without the viewer being able to specify how that impression was formed.
Image similarity measurements are used in image quality measurements and image retrieval applications.
In image quality measurements, methods generally quantify the degradation between a high-quality reference image and a processed version of that image. One such method measures local luminance and contrast across both images, and forms a combined comparison in terms of means, standard deviations and covariance of the images according to the local measurements. Other methods include Peak Signal-to-Noise Ratio (PSNR) and distortion based Structural Similarity (SSIM). Such methods are not intended for comparing any two arbitrary images, but rather images that have related content in related spatial locations.
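As a point of reference, PSNR is the simplest of the reference-based measures mentioned above; a minimal sketch follows, assuming 8-bit images represented as NumPy arrays (the peak value and the array representation are assumptions of this example, not part of the description).

```python
import numpy as np

def psnr(reference, processed, peak=255.0):
    """Peak Signal-to-Noise Ratio (dB) between a reference image and a processed version."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(processed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```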
Image retrieval methods find images in an image database that are similar to a query image. An “image lookup” is performed, whereby the database is searched for images that are sufficiently similar to the query image. For example, a lookup may determine a histogram of visual words (i.e. the ‘Bag of Visual Words’ method) for the query image (“query histogram”); and compare the query histogram to each respective histogram of the images in the database (“database histogram”). The lookup may then determine a histogram difference (such as a χ2 distance) between the query histogram and each database histogram; and identify each histogram difference that is less than a given threshold as indicating a sufficiently similar image. These sufficiently similar images are said to be “found” by the lookup for that query image.
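To make the lookup concrete, a minimal sketch of the histogram comparison described above is given below; the function names, the epsilon guard against division by zero, and the list-based database are illustrative assumptions rather than details from the text.

```python
import numpy as np

def chi_squared_distance(query_hist, db_hist, eps=1e-10):
    # Symmetric chi-squared distance between two visual-word histograms.
    q = np.asarray(query_hist, dtype=np.float64)
    d = np.asarray(db_hist, dtype=np.float64)
    return 0.5 * np.sum((q - d) ** 2 / (q + d + eps))

def find_similar_images(query_hist, database_hists, threshold):
    """Return indices of database images whose histogram distance is below threshold."""
    return [i for i, hist in enumerate(database_hists)
            if chi_squared_distance(query_hist, hist) < threshold]
```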
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed is an image similarity measurement method that is suited for problems where three-dimensional spatial composition of the images is significantly relevant. One such problem is the manipulation of perceived depth of an image.
According to one aspect of the present disclosure, there is provided a method of identifying an image, said method comprising: identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image; determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
According to another aspect of the present disclosure, there is provided an apparatus for identifying an image, said apparatus comprising: means for identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image; means for determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and means for identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
According to still another aspect of the present disclosure, there is provided a system for identifying an image, said system comprising: a memory for storing data and a computer program; and a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image; determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
According to still another aspect of the present disclosure, there is provided a computer program product having a computer program stored thereon for identifying an image, said program comprising: code for identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image; code for determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and code for identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
Other aspects are also disclosed.
DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Fig. 1A shows an example photographic image captured by a RGBD camera.
Fig. 1B shows an example depth map of the photographic image in Fig. 1A.
Fig. 1C shows an example photographic image, with an enhanced sense of depth, derived from the photographic image of Fig. 1A.
Fig. 2 shows an example depth enhancement software structure.
Fig. 3 is a schematic flow diagram showing a method of determining compositional similarity of two images;
Fig. 4 is a schematic flow diagram showing a method of determining a content depth profile as used in the method of Fig. 3;
Fig. 5 is a schematic flow diagram showing a method of segmenting an image into regions, as used in the method of Fig. 4;
Fig. 6A illustrates the region segmentation corresponding to the image 120 of Fig. 1A;
Fig. 6B illustrates the content depth profile corresponding to the image 120 of Fig. 1A;
Fig. 7 illustrates another arrangement of the current invention; and
Figs. 8A and 8B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears. A distinction may be made between perceived depth (i.e. depth as understood by a human viewer of an image) and physical depth (i.e. actual depth of the scene that the image depicts). Depth sensors may be used for measuring the depth of a scene using methods such as time-of-flight imaging, stereo-pair imaging to calculate object disparities, or imaging of projected light patterns. The physical depth may also be estimated from a single image. The physical depth may be represented by a spatial array of values called a depth map, where each value of the depth map is a distance between the depth sensor and the nearest surface along a ray. These measurements can be combined with a photographic image of the scene to form a RGBD image (i.e. RGB denoting the colour channels Red, Green, and Blue of the photographic image, and D denoting the measured depth of the scene), such that each pixel of the image has a paired colour value representing visible light; and a depth value representing the distance from a viewpoint. Other representations and colour spaces may also be used for an image.
The perceived depth of an image may be affected by altering the monocular cues of the image using image processing. A human observer, when viewing an image of a scene, will have some perception of the depth of the scene based on the depth cues of the scene. For example, one monocular depth cue is aerial perspective, which is associated with light scattering in an atmosphere. According to aerial perspective, distant objects are hazier and have less contrast. If the image of the scene is processed as a function of a depth map, such that closer regions have a relative contrast increase and distant regions have a relative contrast decrease compared with the original image, the perception of the depth of the scene by an observer will typically be enhanced. Thus, image processing can affect the perceived depth of an image.
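As an illustration only, a minimal sketch of such depth-dependent contrast processing is given below. The function name, the linear gain model and the choice of a global mean as the contrast pivot are assumptions made for this example; the description above does not prescribe a particular formula.

```python
import numpy as np

def enhance_aerial_perspective(image, depth_map, strength=0.3):
    """Raise local contrast in near regions and lower it in far regions.

    image     : uint8 or float array of shape (H, W, 3)
    depth_map : array of shape (H, W), larger values meaning further away
    strength  : proportion of contrast change applied at the nearest/farthest depths
    """
    # Normalise depth to [0, 1], where 0 is nearest and 1 is farthest.
    d = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min() + 1e-10)
    # Contrast gain: greater than 1 for near pixels, less than 1 for far pixels.
    gain = 1.0 + strength * (1.0 - 2.0 * d)
    # Scale pixel deviations from a global mean by the per-pixel gain.
    mean = image.astype(np.float64).mean(axis=(0, 1), keepdims=True)
    out = mean + (image - mean) * gain[..., np.newaxis]
    return np.clip(out, 0, 255)
```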
Fig. 1A shows an example photographic image 120 captured by a RGBD camera that has an image sensor that captures the photographic image 120 of a scene. In the example of Fig. 1A, the RGBD camera also has a depth sensor that measures the depth of the scene to produce a depth map 150 of the scene as shown in Fig. 1B. Alternatively, the depth map 150 may also be obtained using two image sensors placed side-by-side to measure stereo disparity (i.e. the depth sensor in the RGBD camera is replaced by the two image sensors). In the photographic image 120 of Fig. 1A, a person 130 is visible in the foreground and a tree 140 is visible in the background.
The depth map 150 is shown in the example of Fig. 1B such that shorter distances are significantly indicated by lighter colouring (i.e., by sparser hatching) and longer distances are significantly indicated by darker colouring (i.e., by denser hatching). A first region 160 of the depth map 150, corresponding to the person 130 of the photographic image 120, is coloured lightly, indicating close proximity to the RGBD camera. A second region 170 of the depth map 150, corresponding to the tree 140 of the photographic image 120, is coloured more darkly, indicating a greater distance from the RGBD camera. A photographer, having captured an image of the scene of the photograph 120, may wish to enhance the sense of depth conveyed by the photographic image 120. For a portrait photograph such as the photographic image 120, the depth enhancement may involve emphasising the person 130 situated in the foreground and deemphasising the tree 140 situated in the background, resulting in an enhanced image 180 as shown in Fig. 1C. The depth enhancement may be achieved by processing the photographic image 120 as a function of the depth map 150 corresponding to the image 120. However, it is a difficult task to select which image processes to apply (such as contrast adjustment, colour adjustments, noise addition or removal, or other such processes), and in what proportions and what order. Unfortunately, the same specific combination of image processes applied to two compositionally different images does not reliably yield the same result on the sense of depth (i.e., perceived depth) conveyed by the two images. Further, not all specific combinations of image processes will produce a visually appealing result. As a result, an experienced photographer may find it time-consuming and arduous to manually adjust the photographic image 120 to achieve the desired enhanced image 180; whereas a novice photographer may not have sufficient experience to successfully achieve the desired outcome. The terms “perceived depth” and “conveyed depth” can be used interchangeably. A method 300 of determining compositional similarity of two images will be described in detail below with reference to Fig. 3.
Figs. 8A and 8B depict a general-purpose computer system 800, upon which the various arrangements described can be practiced.
As seen in Fig. 8A, the computer system 800 includes: a computer module 801; input devices such as a keyboard 802, a mouse pointer device 803, a scanner 826, a camera 827, and a microphone 880; and output devices including a printer 815, a display device 814 and loudspeakers 817. An external Modulator-Demodulator (Modem) transceiver device 816 may be used by the computer module 801 for communicating to and from a communications network 820 via a connection 821. The communications network 820 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 821 is a telephone line, the modem 816 may be a traditional “dial-up” modem. Alternatively, where the connection 821 is a high capacity (e.g., cable) connection, the modem 816 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 820.
The computer module 801 typically includes at least one processor unit 805, and a memory unit 806. For example, the memory unit 806 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 801 also includes a number of input/output (I/O) interfaces including: an audio-video interface 807 that couples to the video display 814, loudspeakers 817 and microphone 880; an I/O interface 813 that couples to the keyboard 802, mouse 803, scanner 826, camera 827 and optionally a joystick or other human interface device (not illustrated); and an interface 808 for the external modem 816 and printer 815. In some implementations, the modem 816 may be incorporated within the computer module 801, for example within the interface 808. The computer module 801 also has a local network interface 811, which permits coupling of the computer system 800 via a connection 823 to a local-area communications network 822, known as a Local Area Network (LAN). As illustrated in Fig. 8A, the local communications network 822 may also couple to the wide network 820 via a connection 824, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 811 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 811.
The I/O interfaces 808 and 813 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 809 are provided and typically include a hard disk drive (HDD) 810. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 812 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 800.
The components 805 to 813 of the computer module 801 typically communicate via an interconnected bus 804 and in a manner that results in a conventional mode of operation of the computer system 800 known to those in the relevant art. For example, the processor 805 is coupled to the system bus 804 using a connection 818. Likewise, the memory 806 and optical disk drive 812 are coupled to the system bus 804 by connections 819. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.
The method 300 and other methods described here may be implemented using the computer system 800 wherein the processes of Figs. 3 to 7, to be described, may be implemented as one or more software application programs 833 executable within the computer system 800. In particular, the steps of the described methods are effected by instructions 831 (see Fig. 8B) in the software 833 that are carried out within the computer system 800. The software instructions 831 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods, and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software 833 is typically stored in the HDD 810 or the memory 806. The software is loaded into the computer system 800 from the computer readable medium, and then executed by the computer system 800. Thus, for example, the software 833 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 825 that is read by the optical disk drive 812. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 800 preferably effects an advantageous apparatus for implementing the described methods.
In some instances, the application programs 833 may be supplied to the user encoded on one or more CD-ROMs 825 and read via the corresponding drive 812, or alternatively may be read by the user from the networks 820 or 822. Still further, the software can also be loaded into the computer system 800 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 800 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 801. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 801 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 833 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 814. Through manipulation of typically the keyboard 802 and the mouse 803, a user of the computer system 800 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 817 and user voice commands input via the microphone 880.
Fig. 8B is a detailed schematic block diagram of the processor 805 and a “memory” 834. The memory 834 represents a logical aggregation of all the memory modules (including the HDD 809 and semiconductor memory 806) that can be accessed by the computer module 801 in Fig. 8A.
When the computer module 801 is initially powered up, a power-on self-test (POST) program 850 executes. The POST program 850 is typically stored in a ROM 849 of the semiconductor memory 806 of Fig. 8A. A hardware device such as the ROM 849 storing software is sometimes referred to as firmware. The POST program 850 examines hardware within the computer module 801 to ensure proper functioning and typically checks the processor 805, the memory 834 (809, 806), and a basic input-output systems software (BIOS) module 851, also typically stored in the ROM 849, for correct operation. Once the POST program 850 has run successfully, the BIOS 851 activates the hard disk drive 810 of Fig. 8A. Activation of the hard disk drive 810 causes a bootstrap loader program 852 that is resident on the hard disk drive 810 to execute via the processor 805. This loads an operating system 853 into the RAM memory 806, upon which the operating system 853 commences operation. The operating system 853 is a system level application, executable by the processor 805, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 853 manages the memory 834 (809, 806) to ensure that each process or application running on the computer module 801 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 800 of Fig. 8A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 834 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 800 and how that memory is used.
As shown in Fig. 8B, the processor 805 includes a number of functional modules including a control unit 839, an arithmetic logic unit (ALU) 840, and a local or internal memory 848, sometimes called a cache memory. The cache memory 848 typically includes a number of storage registers 844-846 in a register section. One or more internal busses 841 functionally interconnect these functional modules. The processor 805 typically also has one or more interfaces 842 for communicating with external devices via the system bus 804, using a connection 818. The memory 834 is coupled to the bus 804 using a connection 819.
The application program 833 includes a sequence of instructions 831 that may include conditional branch and loop instructions. The program 833 may also include data 832 which is used in execution of the program 833. The instructions 831 and the data 832 are stored in memory locations 828, 829, 830 and 835, 836, 837, respectively. Depending upon the relative size of the instructions 831 and the memory locations 828-830, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 830. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 828 and 829.
In general, the processor 805 is given a set of instructions which are executed therein. The processor 805 waits for a subsequent input, to which it reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 802, 803, data received from an external source across one of the networks 820, 822, data retrieved from one of the storage devices 806, 809 or data retrieved from a storage medium 825 inserted into the corresponding reader 812, all depicted in Fig. 8A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 834.
The disclosed arrangements use input variables 854, which are stored in the memory 834 in corresponding memory locations 855, 856, 857. The disclosed arrangements produce output variables 861, which are stored in the memory 834 in corresponding memory locations 862, 863, 864. Intermediate variables 858 may be stored in memory locations 859, 860, 866 and 867.
Referring to the processor 805 of Fig. 8B, the registers 844, 845, 846, the arithmetic logic unit (ALU) 840, and the control unit 839 work together to perform sequences of microoperations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 833. Each fetch, decode, and execute cycle comprises: a fetch operation, which fetches or reads an instruction 831 from a memory location 828, 829, 830; a decode operation in which the control unit 839 determines which instruction has been fetched; and an execute operation in which the control unit 839 and/or the ALU 840 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 839 stores or writes a value to a memory location 832.
Each step or sub-process in the processes of Figs. 3 to 7 is associated with one or more segments of the program 833 and is performed by the register section 844, 845, 846, the ALU 840, and the control unit 839 in the processor 805 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 833.
The described methods may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Fig. 2 shows an example depth enhancement software structure 200, which may be used to aid a photographer with the task of enhancing the depth conveyed by an image 210. An example of image 210 is the photographic image 120 shown in Fig. 1A. The depth enhancement software structure 200 comprises a database 220 that contains a collection of “depth enhancement packages” such as depth enhancement package 230. Each depth enhancement package 230 contains an image such as image 240. The database 220 may be configured, for example, within the HDD 810. In accordance with the described method 300, each depth enhancement package 230 comprises a description of a specific combination of image processes, such as the description 245 associated with image 240. The description 245 may be used for applying the specific image processes to the associated image 240.
The description 245 includes details of the proportions of the image processes that can be applied to enhance the conveyed depth of the associated image 240.
The depth enhancement packages 230 may be created by an experienced photographer who has manually enhanced the depth conveyed by the images such as the image 240. Alternatively, the associated description 245 may be obtained from perceptual experiments set up to determine the optimal combination of image processes to enhance the depth conveyed by the image 240.
As seen in Fig. 2, an example lookup 250 into the database 220 may be performed, to determine an appropriate depth enhancement package 260. The lookup 250 is performed by using the image 210 to be enhanced as a query image in an image lookup for a similar example image 270. The depth enhancement package 260 contains the similar image 270 and the corresponding description of the specific combination of image processes 275 for the image 270.
If that image lookup 250 uses an image similarity measurement method that is well suited to the task of depth enhancement, then the example specific combination of image processes 275 of the determined image 270 will be closely related to the specific combination of image processes to be applied to the image 210 to be enhanced. Known image similarity measurement methods, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and Bag of Visual Words, are not suitable for depth enhancement, because such methods do not take into consideration the compositional aspect of the image.
The compositional aspect of the image may be used in the task of depth enhancement of the image. The combination of image processes 275 is useful as a starting point for enhancing the depth of the image 210, whether the enhancing is performed by the photographer, an automated system, or otherwise.
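For illustration only, a minimal data-structure sketch of a depth enhancement package and its lookup is given below. The class and function names, and the representation of the process description as a list of strings, are assumptions for this example rather than definitions taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class DepthEnhancementPackage:
    """Hypothetical container mirroring package 230: a cached content depth profile for the
    stored image plus the description 245 of the image processes to apply."""
    content_depth_profile: object
    process_description: List[str]   # e.g. ["raise foreground contrast", "desaturate background"]

def lookup_package(query_profile,
                   database: Sequence[DepthEnhancementPackage],
                   similarity: Callable[[object, object], float]) -> DepthEnhancementPackage:
    # Return the stored package whose cached profile is most similar to the query profile.
    return max(database, key=lambda pkg: similarity(query_profile, pkg.content_depth_profile))
```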
The method 300 may be used for identifying images with similar three-dimensional spatial composition. The method 300 may be implemented, for example, using the depth enhancement software structure 200 depicted in Fig. 2. The method 300 will now be described by way of example with reference to two images 310 and 320 as shown in Fig. 3. The method 300 may be implemented as one or more software code modules of the software application program 833 resident in the hard disk drive 810 and being controlled in its execution by the processor 805.
As seen in Fig. 3, the two images 310 and 320 are provided. The three-dimensional spatial composition of each image 310 and 320 is summarised in a content depth profile, as determined at determining steps 330A and 330B, respectively. As described below, the content depth profile determined at step 330A defines content and depth structure within the image 310. A method 400 of determining a content profile, as executed at each of steps 330A and 330B, will be described below with reference to Fig. 4. The method 400 results in a respective content depth profile 340 and 350 for each of the images 310 and 320, respectively. The content depth profiles 340 and 350 determined at steps 330A and 330B, respectively, may be stored in the memory 806 and/or the hard disk drive 810.
Then, the method 300 proceeds from steps 330A and 330B to similarity determining step 360, where the similarity of the two images 310 and 320 is determined by comparing the respective content depth profiles of each of the images 310 and 320. Step 360 will be described in detail below.
In the context of the depth enhancement software structure 200, the first image 310 corresponds to the query image 210, and the second image 320 corresponds to an image such as the image 240 stored within the database 220. Typically, the content depth profile 350 of the second image 320 is predetermined when the associated image (e.g., the image 240 of the package 230) is inserted into the database 220, and that content depth profile is cached in the database 220. The content depth profile 340 of the first image 310 is determined when the first image lookup associated with the lookup 250 is performed, and the content depth profile 340 is cached for reuse in later image lookups associated with the lookup 250. For both images 310 and 320, both a photographic image and a depth map are obtainable. How the photographic images and depth maps are obtained depends on the arrangement of the described methods. For example, RGBD images may be used directly in the described methods. Alternatively, the method 300 may receive photographic images from an imaging system such as the camera 827 or the scanner 826, or from a data source such as a personal computing device, a portable memory card, a network data stream, etc.
Depth maps for use in the described methods may be received from depth measurement sensors or other depth measurement methods. Alternatively, the depth of an image may be estimated from the image.
Each image 240 of the depth enhancement packages (e.g., 230) stored in the database 220 may be inserted into the database as an RGBD image (i.e., an associated photographic image and depth map may be obtained at insertion time and converted into a common RGBD format). The query image 210 is typically less controlled, as the query image 210 may be provided by an end-user. A photographic image and depth map for the query image 210 may be obtained as appropriate for the specific image.
In another arrangement, the image 240 is not required to be present in the database 220 and is replaced by the content depth profile 350. An arrangement where the image 240 is replaced by the content depth profile 350 is possible because the content depth profile 350 of the second image 320 is typically pre-determined, when the associated image 240 is inserted into the database 220. Hence, each depth enhancement package (e.g., 230 and 260) may have their images (e.g., 240) replaced with the content depth profile of the respective image. An arrangement where the image 240 is replaced by the content depth profile 350 significantly reduces the size of the database 220 and results in significant efficiency improvements.
The method 400 of determining a content profile, as executed at each of steps 330A and 330B, according to one arrangement will now be described with reference to Fig. 4. The method 400 processes an image (e.g., 310 or 320), comprising a photographic image and a depth map, and produces a content depth profile 460 corresponding to the image (e.g. 310 or 320). The method 400 will be described by way of example with reference to the image 310.
The method 400 may be implemented as one or more software code modules of the software application program 833 resident in the hard disk drive 810 and being controlled in its execution by the processor 805.
The method 400 begins at a region segmenting step 420, where the image 310 is segmented into regions, under execution of the processor 805, such that each region has a significantly homogeneous visual appearance and depth. In many cases, regions will correspond with semantic objects of the image 310. The segmented image determined at step 420, may be stored within the memory 806, under execution of the processor 805.
The segmentation of the image 310 is in some arrangements determined based on a superpixel segmentation of the image to identify compact regions of similar visual appearance and depth. A method 500 of segmenting an image into regions, as executed at step 420, will now be described with reference to Fig. 5. The method 500 processes the image 310 to produce segmented regions 550.
The method 500 may be implemented as one or more software code modules of the software application program 833 resident in the hard disk drive 810 and being controlled in its execution by the processor 805.
The method 500 begins at a superpixel segmenting step 520, where the image 310 is segmented into superpixels, under execution of the processor 805 and the segmented image may be stored, for example, in the memory 806. At step 520, an approximate grid of initial candidate points is assigned, sparsely covering the image 310. Each pixel of the image 310 is associated with a most similar candidate point according to a dissimilarity measure between that pixel and nearby candidate points. The dissimilarity measure is selected such that visually similar and proximate image regions are associated with the same candidate point.
To achieve association of visually similar and proximate image regions, the dissimilarity measure used at step 520 comprises a measurement of local image characteristics about the pixel and the candidate point being considered. The dissimilarity measure also comprises a measure of three-dimensional spatial distance (being x, y and depth) between the pixel and the candidate point being considered. For example, the colour distance in the CIELAB colour space between the pixel and the candidate point is a useful local image characteristic to measure and may be used to determine the dissimilarity measure at step 520. According to an arrangement, the local image characteristics determined at step 520 are measures of the local colour saturation, local colour hue relative to red and blue, and local image gradient (an aspect of image texture) as measured by a Gaussian derivative filter. After each pixel has been associated with a candidate point, the set of pixels associated with each candidate point is called a “superpixel”. As the depth is a contributor to the dissimilarity measure, each resulting superpixel is situated significantly on a single depth plane.
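By way of illustration only, the dissimilarity measure between a pixel and a candidate point could be sketched as below; the weighting scheme and function signature are assumptions, and the feature vector stands in for whichever local image characteristics (e.g. CIELAB colour, saturation, gradient) are chosen.

```python
import numpy as np

def dissimilarity(pixel_features, candidate_features, pixel_xyd, candidate_xyd,
                  feature_weight=1.0, spatial_weight=1.0):
    """Dissimilarity between a pixel and a nearby candidate point.

    pixel_features / candidate_features : local image characteristics (e.g. CIELAB colour)
    pixel_xyd / candidate_xyd           : (x, y, depth) coordinates of pixel and candidate
    """
    feature_term = np.linalg.norm(np.asarray(pixel_features, dtype=float)
                                  - np.asarray(candidate_features, dtype=float))
    spatial_term = np.linalg.norm(np.asarray(pixel_xyd, dtype=float)
                                  - np.asarray(candidate_xyd, dtype=float))
    return feature_weight * feature_term + spatial_weight * spatial_term
```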
The method 500 continues at a superpixel characterising step 530, where the local visual appearance and depth of each superpixel produced from the superpixel segmenting step 520 is characterised. At step 530, the local image characteristics and local depth characteristics within each superpixel are determined under execution of the processor 805, and may be stored, for example, in the memory 806. The local image characteristics determined at step 530 are attributes of each superpixel that describe the visual appearance of that superpixel such as, for example, colour and frequency distribution of the superpixel. According to an arrangement, the local image characteristics determined at step 530 are the same as the local image characteristics determined at the superpixel segmenting step 520. The determined local image characteristics and local depth characteristics are associated with the appropriate superpixels and may be stored in the memory 806.
Typically, the superpixels produced at the superpixel segmenting step 520 are locally compact, and smaller than semantic objects of the image 310.
The method 500 continues at a superpixel merging step 540, where adjacent superpixels that have similar characterisations, as determined by the superpixel characterising step 530, are merged, under execution of the processor 805. The merged superpixels resulting from step 540 are segmented regions approximately corresponding to semantic objects of the image 310. According to one arrangement, a comparison is made between each pair of adjacent superpixels at step 540. A mean local image characteristic is determined for each superpixel, and the difference of the determined mean local characteristics is stored, for example, within the memory 806 under execution of the processor 805. Similarly, the difference of the mean depths is stored within the memory 806. The dissimilarity of adjacent superpixels is measured as the sum of the mean local image characteristic differences and the mean local depth characteristic difference. Sufficiently similar adjacent superpixels are merged at step 540.
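A minimal sketch of this pairwise merging, using a union-find structure over superpixel indices, is shown below; the union-find bookkeeping and the single threshold parameter are assumptions made for the example.

```python
import numpy as np

def merge_superpixels(adjacent_pairs, mean_features, mean_depths, threshold):
    """Merge adjacent superpixels whose mean characteristics and depths are sufficiently similar.

    adjacent_pairs : iterable of (i, j) index pairs of adjacent superpixels
    mean_features  : array of shape (n_superpixels, n_features) of mean local image characteristics
    mean_depths    : array of shape (n_superpixels,) of mean depths
    Returns an array mapping each superpixel index to the label of its merged region.
    """
    parent = list(range(len(mean_depths)))

    def find(i):                        # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacent_pairs:
        d = (np.abs(mean_features[i] - mean_features[j]).sum()
             + abs(mean_depths[i] - mean_depths[j]))
        if d < threshold:
            parent[find(i)] = find(j)   # sufficiently similar: merge their regions

    return np.array([find(i) for i in range(len(parent))])
```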
According to another arrangement, a graphical model such as a Markov random field with vertices at superpixels and edges between adjacent superpixels may be used at step 540 to identify which superpixels should be merged. Learning may be performed in a supervised manner; and a learning database may be created to contain images for which superpixel segmentation has been performed according to execution of the superpixel segmenting step 520. The associated superpixel characterisations may then be determined by the execution of the superpixel characterising step 530. The superpixels may be manually merged by human observers, effectively “supervisors”, into appropriate segmented regions, and stored in the learning database alongside the images, such that there is an association between the characteristics of superpixels and the appropriate superpixel merging. This association may then be learned by the graphical model and applied by the superpixel merging step 540.
Following the superpixel merging step 540, the segmented regions 550 produced in accordance with the method 500 may be stored in the memory 806. Each region identifies an associated collection of pixels of the image 310.
Once segmented regions have been produced, the method 400 continues to a region characterising step 430, where the visual appearance and depth of each region determined at step 420 is characterised. According to an arrangement, each region is characterised by the same local image characteristics and local depth characteristics used by the superpixel characterising step 530, calculated over the region rather than over a superpixel. As each region has a significantly homogeneous depth, each region significantly occupies a single depth plane. Each of the regions is located at different depth planes. Therefore, the local depth characteristics of the region can be stored as the depth of the depth plane that the region significantly occupies. The local image characteristics and local depth characteristics can be represented in normalised forms that are suitable for each characteristic.
Next, at a proximate region identifying step 440, pairs (i.e., a plurality) of the regions determined at step 420 which are two-dimensionally spatially proximate in the image 310 (i.e., in x and y, disregarding depth), are identified under execution of the processor 805. Depth is disregarded in identifying the pairs of regions at step 440. However, each of the identified regions is located in neighbouring depth planes within the image 310 as will be described further below. According to an arrangement, adjacent pairs of regions in the image 310 are identified as being proximate. According to another arrangement, pairs of regions separated by less than a threshold number of pixels in the image 310 are identified as being proximate. According to yet another arrangement, pairs of regions separated by less than a threshold number of degrees of foveal vision in the image 310 are identified at step 440 as being proximate. According to some arrangements, the proximity of the regions is also determined and stored at step 440.
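As an illustrative sketch only, the following shows one way the adjacency and pixel-gap variants of the proximity test could be implemented over a label map produced by the segmentation step. The function name, the use of SciPy's binary dilation, and the max_gap parameter are assumptions for this example.

```python
import numpy as np
from scipy import ndimage

def proximate_region_pairs(label_map, max_gap=0):
    """Return pairs of region labels that are two-dimensionally proximate.

    label_map : 2-D integer array of region labels from the segmentation step
    max_gap   : regions separated by up to this many pixels count as proximate
                (0 means only directly adjacent regions qualify)
    """
    pairs = set()
    for lab in np.unique(label_map):
        # Grow the region and record every other label the grown mask overlaps.
        grown = ndimage.binary_dilation(label_map == lab, iterations=max_gap + 1)
        for other in np.unique(label_map[grown]):
            if other != lab:
                pairs.add(tuple(sorted((int(lab), int(other)))))
    return pairs
```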
Next, the method 400 proceeds from step 440 to a content depth profile determining step 450. At step 450, the content depth profile 460 of the image 310 is determined under execution of the processor 805. The content depth profile 460 of the image 310 is determined based on a set of differences in depth and appearance in the pairs of proximate regions located in neighbouring depth planes. The differences in depth and appearance represent relationships between the pairs of proximate regions, the relationships defining a difference in depth of a proximate region pair together with a difference in a content characteristic of the proximate region pair.
The determination of the content depth profile at step 450, according to an arrangement, will now be described by way of example with reference to Fig. 6A and Fig. 6B.
Fig. 6A shows an example region segmentation 610 corresponding to the image 120 of Fig. 1A, as may be determined at region segmenting step 420. Regions of Fig. 6A are each labelled with a letter, such as the tree (Region B) 620, the sky to the right of the tree (Region C) 630 and the person (Region E) 640.
Fig. 6B shows a content depth profile 650 corresponding to the image 120 of Fig. 1A and the region segmentation 610 of Fig 6A, as may be determined by the content depth profile determining step 450. The content depth profile 650 is a graph with a vertex for each region of the segmented regions 610. For example, the tree (Region B) 620 is represented by vertex B 660; the sky to the right of the tree (Region C) 630 is represented by vertex C 670; and the person (Region E) 640 is represented by vertex E 680. The content depth profile 650 has an edge between each pair of proximate regions, as determined by the proximate region identifying step 440. For example, the tree (Region B) 620 and the sky to the right of the tree (Region C) 630 are adjacent to each other and are therefore proximate, so the content depth graph 650 has an edge 690 joining vertex B 660 and vertex C 670. However, the tree (Region B) 620 and the person (Region E) 640 are not proximate, so the content depth graph 650 has no edge joining vertex B 660 and vertex E 680.
Further, each of the regions in a pair of proximate regions is located in neighbouring depth planes. For example, the proximate tree (Region B) 620 and sky (Region C) 630 are each located in neighbouring depth planes. However, the tree (Region B) 620 and the person (Region E) 640 are not proximate, and so the tree (Region B) 620 and the person (Region E) 640 are located in depth planes which are not neighbouring. The depth planes of the tree (Region B) 620 and the person (Region E) 640 may be separated by one or more further depth planes.
The content depth profile 650 of Fig. 6B shows that vertex A 675 is not connected by an edge to vertex C 670. Vertex A 675 corresponds to the sky to the left of the tree (Region A) 635 of Fig. 6A, and vertex C 670 corresponds to the sky to the right of the tree (Region C) 630 of Fig. 6A. Therefore, in the illustrated arrangement, Regions A and C are not proximate.
Alternatively, according to some arrangements of the proximate region identifying step 440, the sky to the left of the tree (Region A) 635 may be proximate to the sky to the right of the tree (Region C) 630, as they are separated by a small number of pixels. In such arrangements, both sky regions (Regions A and C) are also located in neighbouring depth planes. In such arrangements, Vertex A 675 and Vertex C 670 are connected by an edge.
The content depth profile 650 describes the three-dimensional spatial composition of the image 120 by encoding information in the vertices and edges of the graph. Each vertex encodes the characterisation of the associated region, as determined by the region characterising step 430. Each edge encodes the difference of characterisations encoded in the pair of vertices that the edge connects. As a result, each edge contains a difference of local image characteristics representative of a difference in visual appearance of the proximate regions, and a difference of depths of the proximate regions.
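Purely as an illustrative sketch, the graph described above could be assembled as follows. The use of the networkx library, the dictionary inputs, and the absolute-difference edge attributes are assumptions for this example; the disclosure does not mandate a particular graph library or difference measure.

```python
import numpy as np
import networkx as nx

def build_content_depth_profile(region_features, region_depths, proximate_pairs):
    """Build a content depth profile graph.

    region_features : dict mapping region label -> vector of local image characteristics
    region_depths   : dict mapping region label -> depth of the plane the region occupies
    proximate_pairs : iterable of (label_a, label_b) pairs of proximate regions
    """
    profile = nx.Graph()
    # Each vertex encodes the characterisation of its region.
    for label, features in region_features.items():
        profile.add_node(label, features=np.asarray(features, dtype=float),
                         depth=float(region_depths[label]))
    # Each edge encodes the difference in appearance and depth of a proximate pair.
    for a, b in proximate_pairs:
        profile.add_edge(a, b,
                         appearance_diff=np.abs(profile.nodes[a]["features"]
                                                - profile.nodes[b]["features"]),
                         depth_diff=abs(profile.nodes[a]["depth"] - profile.nodes[b]["depth"]))
    return profile
```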
In another arrangement of the content depth profile 650, only one vertex of the graph is encoded with the characterization of the region associated with the one vertex, while all the edges are encoded with the difference of local image characteristics and difference of depths. The values of the other non-encoded vertices may be determined from the differences encoded in the edges and the one encoded vertex.
When compared to the image 120, the content depth profile 650 determined in accordance with the method 400 is many orders of magnitude more compact and informative. The content depth profile 650 advantageously describes the three-dimensional spatial composition of the image 120.
As described above, at similarity determining step 360, the content depth profiles corresponding to the two images 310 and 320 are compared in order to determine the compositional similarity of the two images 310 and 320. According to an arrangement, a graph similarity measure using random walk graph kernels is used at step 360 to determine the compositional similarity of the two images 310 and 320. The sum of absolute differences of local image characteristics and differences of depth is treated as an edge weighting, and the nodes are treated as being unlabelled. As a result of using the edge-encoded information of difference of local image characteristics and difference of depth, the three-dimensional spatial composition of each image may be compared. As described above, typically the content depth profile 350 of the second image 320 is predetermined. The similarity of the two images 310 and 320 may be used for identifying the image 320 based on the content depth profile determined at step 330A for the image 310 and the predetermined content depth profile associated with the image 320.
In yet another arrangement, a graph similarity measure based on feature extraction (features of the graph) is used to determine the similarity of the two images 310 and 320 at determining step 360. For example, the eigenvalues of the graphs may be compared to determine the similarity of the two images 310 and 320 at determining step 360. As an example, let A1 and A2 represent the adjacency matrices of two graphs G1 and G2 respectively. Also let L1 = D1 - A1 and L2 = D2 - A2 be the Laplacians of the graphs, where D1 and D2 are the corresponding diagonal matrices of degrees. The eigenvalues of the Laplacians may be determined, and the similarity measure may be determined as the sum of squared differences of the eigenvalues. A number close to zero (0) means that the graphs are very similar.
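A minimal sketch of this eigenvalue-based comparison follows, assuming the graphs are supplied as dense adjacency matrices and that graphs of different sizes are compared over their smallest common set of eigenvalues; both assumptions go beyond the text above.

```python
import numpy as np

def laplacian_similarity(adjacency1, adjacency2):
    """Sum of squared differences of Laplacian eigenvalues; values near 0 indicate similar graphs."""
    def laplacian_eigenvalues(adjacency):
        a = np.asarray(adjacency, dtype=np.float64)
        laplacian = np.diag(a.sum(axis=1)) - a          # L = D - A
        return np.sort(np.linalg.eigvalsh(laplacian))   # Laplacians are symmetric

    e1 = laplacian_eigenvalues(adjacency1)
    e2 = laplacian_eigenvalues(adjacency2)
    k = min(len(e1), len(e2))
    return float(np.sum((e1[:k] - e2[:k]) ** 2))
```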
According to an arrangement, a captured image, such as the image 120, is created by an image capture system that includes a depth sensor, and the image capture system selects appropriate post-processing to apply to the captured image in order to modify the perceived depth of the image. The image capture system may be the system 800 implemented using a single electronic device such as the camera 827.
Other implementations may comprise a stereo-pair of digital SLR cameras operating synchronously as a unit, and associated hardware and software. Stereo-pair disparity may be used as a mechanism for measuring depth of elements of a captured image in the corresponding scene. Disparities may be measured from locations of objects in the image of the first camera to the locations of the same objects in the image of the second camera. Thus, the depth map aligns to object locations in the image of the first camera. The image of the first camera may therefore be used as the photographic image.
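For context, the standard pinhole-stereo relation behind this disparity-based depth measurement is sketched below; the function name and the assumption of rectified images with disparity in pixels, focal length in pixels and baseline in metres are illustrative additions rather than details from the description above.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Classic rectified-stereo relation: depth = f * B / d.

    disparity_px    : per-pixel disparity between the first and second camera images (pixels)
    focal_length_px : focal length expressed in pixels
    baseline_m      : distance between the two camera centres (metres)
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return focal_length_px * baseline_m / disparity_px
```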
In other arrangements, camera systems that use a different principle for measuring depth may be used to implement the described methods. For example, the camera 827 may be in the form of a single digital SLR camera (producing a photographic image) coupled with a time-of-flight depth sensor (producing a depth map).
As an example, the captured image may be transferred from the camera 827 to the computer module 801, where the captured image is added to the image collection of a photographer. The photographer may then wish to post-process the captured image to enhance the depth of the captured image. In such an example, the photographer may have previously performed post-processing to enhance the depth of images using the computer module 801, where the images and associated combinations of processes have been stored by post-processing software as depth enhancement packages 230 in the database 220 configured within the hard disk drive 810. The photographer can then perform a lookup 250 using the captured image as a query image 210 to determine a sufficiently similar depth enhancement package 260, to identify a significantly relevant combination of image processes 275 for applying to the query image 210 to enhance the depth of the query image 210. The photographer may then make minor adjustments to the significantly relevant combination of image processes 275 to enhance the image 210. The resulting adjusted set of image processes may then be inserted, associated with the captured image, as a new image enhancement package in the database 220. The photographer may retain the resulting enhanced image as part of the image collection of the photographer. Advantageously, the enhanced image has been created with relatively minor time and effort from the photographer relative to existing techniques.
According to another arrangement, the captured image may be created by an image capture system in the form of the camera 827, in the same manner as the arrangement described above. The captured image may then reside in memory of the camera 827, without being transferred off that camera 827. The camera 827 may also have networking functionality for connecting to the network 820. The database 220 may reside in a location accessible by the networking functionality of the camera 827, such as a Cloud storage facility or social media network. The database 220 configured within the Cloud storage facility or social media network may contain depth enhancement packages 230 which have been made available to the photographer of the captured image for some reason. For example, the depth enhancement packages may have been created by the photographer and so are made available to the photographer. Alternatively, the creator or provider of the image enhancement package 230 may grant the photographer permission to access the image enhancement package 230 due to a relationship between the creator/provider and the photographer (such as friendship, family, belonging to a common interest group, commercial, etc.).
In one example, the photographer may wish to enhance the depth of the captured image, and press a button on a user interface of the image capture system in the form of the camera 827. The camera 827 may then perform a lookup 250 using the captured image as the query image 210, by transmitting the associated content depth profile 460 for the query image 210, via the network 820, to the network-accessible provider of the database 220. The query image 210 does not need to be transferred to complete the lookup 250. In the example of Fig. 2, a sufficiently similar depth enhancement package 260 may be found, and the associated combination of image processes 275 may then be transmitted from the network-accessible provider of the database 220 to the camera 827. The combination of image processes 275 may then be applied by the camera 827, and the resulting enhanced image may be presented to the photographer on a display of the camera 827. The photographer can then select whether the enhancement is acceptable; and if the enhancement is not acceptable, then a further lookup may be performed, to determine a different depth enhancement package of the database 220.
Advantageously, as described above, no full image is required to be transferred via the network 820 according to the arrangements described above. The content depth profile 460 is compact. The compactness of the content depth profile 460 is efficient in terms of data transfer and bandwidth, and provides an amount of image privacy for both the photographer of the query image 210 and the creator of the determined image enhancement package 260.
Another arrangement will now be described with reference to Fig. 7. In the arrangement of Fig. 7, an image database 720 configured within the hard disk drive 810 contains images 740. A webpage interface, displayed on the display 814, may be configured to allow users to upload an image and find similarly-composed images. In the example of Fig. 7, a user uploads an image 710. An image lookup 750 is then performed using the uploaded image as a query image 710. The image lookup 750 uses the method 300 (e.g., executed by the processor 805) to determine the most similar images 770. A response webpage, displaying images ordered by their similarity to the image 710 as found by the lookup 750, may then be returned to the user and displayed on the display 814.
Industrial Applicability
The arrangements described are applicable to the computer and data processing industries and particularly to the image processing industry.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.

Claims (18)

1. A method of identifying an image, said method comprising:
identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image;
determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and
identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
2. The method according to claim 1, wherein the content depth profile for the query image is determined based on a set of relationships between pairs of proximate regions located in neighbouring depth planes, the relationships defining a difference in depth of the proximate region pair together with a difference in content characteristic of the proximate region pair.
3. The method according to claim 1, further comprising:
determining image processes associated with the identified image; and
applying the determined image processes to the query image.
4. The method according to claim 1, wherein the content depth profile defines content and depth structure within the query image.
5. The method according to claim 1, wherein the selected regions are superpixels determined based on a colour.
6. The method according to claim 1, wherein the selected regions are superpixels determined based on texture.
7. The method according to claim 1, wherein the selected regions are superpixels determined based on depth information.
8. The method according to claim 1, further comprising segmenting the query image into superpixels.
9. The method according to claim 8, wherein attributes of each superpixel represent local image characteristics.
10. The method according to claim 1, wherein each pixel of the query image is associated with a candidate point according to a dissimilarity measure.
11. The method according to claim 10, wherein the dissimilarity measure comprises a measurement of local image characteristics about each pixel and the candidate point.
12. The method according to claim 8, further comprising merging adjacent ones of the superpixels.
13. The method according to claim 12, wherein the adjacent superpixels are merged using a Markov random field.
14. The method according to claim 1, further comprising characterising the depth and appearance of each of said regions.
15. The method according to claim 1, further comprising identifying the regions which are two-dimensionally spatially proximate in the query image.
16. An apparatus for identifying an image, said apparatus comprising:
means for identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image;
means for determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and
means for identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
17. A system for identifying an image, said system comprising:
a memory for storing data and a computer program; and
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image;
determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and
identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.
18. A computer program product having a computer program stored thereon for identifying an image, said program comprising:
code for identifying a plurality of regions of a query image, the regions being located at different depth planes within the query image;
code for determining a content depth profile for the query image based on a set of differences in depth and appearance in pairs of proximate regions located in neighbouring depth planes; and
code for identifying an image based on a similarity of the determined content depth profile of the query image and a predetermined content depth profile associated with the identified image.

CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant/Nominated Person
SPRUSON & FERGUSON
AU2016273974A 2016-12-16 2016-12-16 Method, system and apparatus for identifying an image Abandoned AU2016273974A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2016273974A AU2016273974A1 (en) 2016-12-16 2016-12-16 Method, system and apparatus for identifying an image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2016273974A AU2016273974A1 (en) 2016-12-16 2016-12-16 Method, system and apparatus for identifying an image

Publications (1)

Publication Number Publication Date
AU2016273974A1 true AU2016273974A1 (en) 2018-07-05

Family

ID=62748586

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2016273974A Abandoned AU2016273974A1 (en) 2016-12-16 2016-12-16 Method, system and apparatus for identifying an image

Country Status (1)

Country Link
AU (1) AU2016273974A1 (en)

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application