AU2012261494A1 - Saliency refinement method - Google Patents

Saliency refinement method

Info

Publication number
AU2012261494A1
Authority
AU
Australia
Prior art keywords
image
saliency map
user
map
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2012261494A
Inventor
Clement Fredembach
Jue Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2012261494A priority Critical patent/AU2012261494A1/en
Publication of AU2012261494A1 publication Critical patent/AU2012261494A1/en
Abandoned legal-status Critical Current


Abstract

A method (100) of forming a refined saliency map (180) of an image (110) captured by an image capture device (1001), said method comprising determining (120) a global saliency map (150) of the captured image defining regions of relative significance across the image; determining (130) user interaction data (135) navigating to a part of the image on a display screen (1014); identifying (130), in the image, at least one region of significance based on the user interaction data; determining (140) a local saliency map (160) based on the at least one identified region of significance, said local saliency map defining regions of relative significance across the at least one identified region of the image independent of other regions of the image; and refining (170) the global saliency map dependent upon the local saliency map to form the refined saliency map of the captured image.

[Fig. 1: user input acquisition process (130), global saliency calculation process (120) producing the global saliency map (150), local saliency calculation process (140) producing the local saliency map (160), and map combination process (170) producing the refined map (180).]

Description

S&F Ref: P053185 AUSTRALIA PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT Name and Address Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3 of Applicant : chome, Ohta-ku, Tokyo, 146, Japan Actual Inventor(s): Jue Wang Clement Fredembach Address for Service: Spruson & Ferguson St Martins Tower Level 35 31 Market Street Sydney NSW 2000 (CCN 3710000177) Invention Title: Saliency refinement method The following statement is a full description of this invention, including the best method of performing it known to me/us: 5845c(6917594_1) -1 SALIENCY MAP REFINEMENT METHOD TECHNICAL FIELD [0001] The current invention relates to image processing and in particular to image quality assessment or image quality enhancement. BACKGROUND [0002] Salient features in an image, i.e. features which are particularly noticeable to a person viewing the image, can be useful factors in performing different operations on the image. Salient features may be positive attractive features, such as the face of a dear friend, or may be merely noteworthy, such as a hurricane, and the term attractive is used in both contexts in this specification unless otherwise noted. The term "saliency detection" in the present description is used to denote (a) the detection, by either manual, automatic, or hybrid automatic/manual methods, of these salient features and may also determine (b) the extent to which these salient features attract the attention of a viewer. [0003] An accurate automatic saliency detection algorithm is an important enabler for many technologies such as image enhancement and object detection. [0004] Automatic saliency prediction methods aim to predict the visual attention attracting capability/saliency of an image for a group of observers, in other words, for an "average" observer. These techniques typically obtain the visual attention attracting capability ground truth of an image by averaging visual attention maps of multiple observers looking at an image in a controlled environment, and develop saliency prediction models that generate saliency maps that are as close as possible to the averaged visual attention maps. [0005] Determining the visual attention paid by an average observer to a particular image is often not good enough, as one individual can significantly differ from another individual in terms of the visual attention they pay to the same image or to parts thereof. [0006] Eye trackers are the most accurate devices for determining the visual attention paid by individual observers to an image or to parts thereof, but these devices are not always available, they can be cumbersome, and they require careful explicit calibration. Moreover, performance of these devices is highly restricted by viewing conditions, e.g., observer's movement, image size, and lighting environment. 69137161 P053185_specilodge -2 [0007] An alternative to using eye-tracker devices involves explicitly asking observers to select regions or objects of interest. However, this adds an often unacceptable burden for the observer and requires dedicated user interfaces and input techniques. SUMMARY [0008] It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements. [0009] Disclosed are arrangements, referred to as User Interaction Refinement (i.e. 
UIR) arrangements, which seek to address the above problems by identifying features of significance to a user by exploiting the user's interaction with an image capturing device, and using those identified features to refine an automatically generated saliency map, thereby generating an improved saliency map of a particular image with respect to the specific user. [00010] According to a first aspect of the present invention, there is provided a method of forming a refined saliency map of an image captured by an image capture device, said method comprising the steps of: determining a global saliency map of the captured image, said global saliency map defining regions of relative significance across the image; determining user interaction data depending upon interactions of a user with the image capture device, said user interactions navigating to a part of the image on a display screen; identifying, in the image, at least one region of significance based on the user interaction data; determining a local saliency map based on the at least one identified region of significance, said local saliency map defining regions of relative significance across the at least one identified region of the image independent of other regions of the image; and refining the global saliency map dependent upon the local saliency map to form the refined saliency map of the captured image. (00011] According to another aspect of the present invention, there is provided an apparatus for implementing any one of the aforementioned methods. [00012] According to another aspect of the present invention there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above. 6913716_1 P053185_specilodge -3 [00013] Other aspects of the invention are also disclosed. BRIEF DESCRIPTION OF THE DRAWINGS [00014] At least one embodiment of the invention will now be described with reference to the following drawings, in which: [00015] Fig. 1 is a schematic flow chart showing an example of a saliency map refinement method according to one UIR arrangement; [00016] Fig. 2 is a schematic flow diagram illustrating an example of a method for implementing a user input acquisition process 130 used in the method of Fig. 1; [00017] Fig. 3 is a schematic flow diagram illustrating an example of a method for implementing a local saliency determination process 140 used in the method of Fig. 1; [00018] Fig. 4 is a schematic flow diagram illustrating an example of a method used for implementing a map combination process 170 used in the method of Fig. 1; [00019] Fig. 5 is a schematic flow diagram illustrating an example of a method for implementing a weight determination process 410 used in the method of Fig. 4; [00020] Fig. 6 is a schematic flow diagram depicting an example of a saliency map refinement method according to one UIR arrangement; [00021] Figs. 7(a) - 7(f) show six figures illustrating an example of a saliency map refinement method based on user input in the method of Fig. 1; [00022] Fig. 8 is a schematic flow diagram depicting another example of a saliency refinement method according to one UIR arrangement; [00023] Fig. 9 is a schematic flow diagram illustrating an example of a method for implementing a refined saliency calculation process 410 used in the method of Fig. 8; and [00024] Figs. 
10A and 1OB collectively form a schematic block diagram representation of an electronic device upon which described UIR arrangements can be practised; 6913716_1 P053 185_spec _lodge -4 DETAILED DESCRIPTION INCLUDING BEST MODE [00025] Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears. [00026] It is to be noted that the discussions contained in the "Background" section and the section above relating to prior art arrangements relate to discussions of arrangements which may form public knowledge through their use. Such discussions should not be interpreted as a representation by the present inventor(s) or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art. [00027] Figs. 1 0A and 10B collectively form a schematic block diagram of a general purpose electronic apparatus 1001 including embedded components, upon which the UIR methods to be described are desirably practiced. The electronic device 1001 may be, for example, a mobile phone, a portable media player or a digital camera, in which processing resources are limited. Nevertheless, the methods to be described may also be performed on higher-level devices such as desktop computers, server computers, and other such devices with significantly larger processing resources. [00028] As seen in Fig. 10A, the electronic device 1001 comprises an embedded controller 1002. Accordingly, the electronic device 1001 may be referred to as an "embedded device." In the present example, the controller 1002 has a processing unit (or processor) 1005 which is bi-directionally coupled to an internal storage module 1009. The storage module 1009 may be formed from non-volatile semiconductor read only memory (ROM) 1060 and semiconductor random access memory (RAM) 1070, as seen in Fig. 10B. The RAM 1070 may be volatile, non-volatile or a combination of volatile and non-volatile memory. [00029] The electronic device 1001 includes a display controller 1007, which is connected to a video display 1014, such as a liquid crystal display (LCD) panel or the like. The display controller 1007 is configured for displaying graphical images on the video display 1014 in accordance with instructions received from the embedded controller 1002, to which the display controller 1007 is connected. [00030] The electronic device 1001 also includes user input devices 1013 which are typically formed by keys, a keypad or like controls. In some implementations, the user 6913716_1 P053185_specilodge -5 input devices 1013 may include a touch sensitive panel physically associated with the display 1014 to collectively form a touch-screen. Such a touch-screen may thus operate as one form of graphical user interface (GUI) as opposed to a prompt or menu driven GUI typically used with keypad-display combinations. Other forms of user input devices may also be used, such as a microphone (not illustrated) for voice commands or a joystick/thumb wheel (not illustrated) for ease of navigation about menus. [00031] As seen in Fig. 10A, the electronic device 1001 also comprises a portable memory interface 1006, which is coupled to the processor 1005 via a connection 1019. 
The portable memory interface 1006 allows a complementary portable memory device 1025 to be coupled to the electronic device 1001 to act as a source or destination of data or to supplement the internal storage module 1009. Examples of such interfaces permit coupling with portable memory devices such as Universal Serial Bus (USB) memory devices, Secure Digital (SD) cards, Personal Computer Memory Card International Association (PCMIA) cards, optical disks and magnetic disks. [00032] The electronic device 1001 also has a communications interface 1008 to permit coupling of the device 1001 to a computer or communications network 1020 via a connection 1021. The connection 1021 may be wired or wireless. For example, the connection 1021 may be radio frequency or optical. An example of a wired connection includes Ethernet. Further, an example of wireless connection includes BluetoothTM type local interconnection, Wi-Fi (including protocols based on the standards of the IEEE 802.11 family), Infrared Data Association (IrDa) and the like. (00033] Typically, the electronic device 1001 is configured to perform some special function. The embedded controller 1002, possibly in conjunction with further special function components 1010, is provided to perform that special function. For example, where the device 1001 is a digital camera, the components 1010 may represent a lens, focus control and image sensor of the camera. The special function components 1010 is connected to the embedded controller 1002. As another example, the device 1001 may be a mobile telephone handset. In this instance, the components 1010 may represent those components required for communications in a cellular telephone environment. Where the device 1001 is a portable device, the special function components 1010 may represent a number of encoders and decoders of a type including Joint Photographic Experts Group (JPEG), (Moving Picture Experts Group) MPEG, MPEG-1 Audio Layer 3 (MP3), and the like. 69137161 P053185_specijlodge -6 [00034] The UIR methods described hereinafter may be implemented using the embedded controller 1002, where the processes of Figs. 1-6 and 8-9 may be implemented as one or more software application programs 1033 executable within the embedded controller 1002. The electronic device 1001 of Fig. 10A implements the described UIR methods. In particular, with reference to Fig. 10B, the steps of the described methods are effected by instructions in the software 1033 that are carried out within the controller 1002. The software instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user. [00035] The UIR software 1033 of the embedded controller 1002 is typically stored in the non-volatile ROM 1060 of the internal storage module 1009. The software 1033 stored in the ROM 1060 can be updated when required from a computer readable medium. The software 1033 can be loaded into and executed by the processor 1005. In some instances, the processor 1005 may execute software instructions that are located in RAM 1070. Software instructions may be loaded into the RAM 1070 by the processor 1005 initiating a copy of one or more code modules from ROM 1060 into RAM 1070. 
Alternatively, the software instructions of one or more code modules may be pre-installed in a non-volatile region of RAM 1070 by a manufacturer. After one or more code modules have been located in RAM 1070, the processor 1005 may execute software instructions of the one or more code modules. [00036] The UIR application program 1033 is typically pre-installed and stored in the ROM 1060 by a manufacturer, prior to distribution of the electronic device 1001. However, in some instances, the application programs 1033 may be supplied to the user encoded on one or more CD-ROM (not shown) and read via the portable memory interface 1006 of Fig. 10A prior to storage in the internal storage module 1009 or in the portable memory 1025. In another alternative, the software application program 1033 may be read by the processor 1005 from the network 1020, or loaded into the controller 1002 or the portable storage medium 1025 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that participates in providing instructions and/or data to the controller 1002 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, flash memory, or a computer readable card such as a PCMCIA card and the like, whether or not 6913716_1 P053185_specilodge -7 such devices are internal or external of the device 1001. Examples of transitory or non tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the device 1001 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like. A computer readable medium having such software or computer program recorded on it is a computer program product. [00037] The second part of the application programs 1033 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUls) to be rendered or otherwise represented upon the display 1014 of Fig. 10A. Through manipulation of the user input device 1013 (e.g., the keypad), a user of the device 1001 and the application programs 1033 may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via loudspeakers (not illustrated) and user voice commands input via the microphone (not illustrated). [00038] Fig. 1OB illustrates in detail the embedded controller 1002 having the processor 1005 for executing the application programs 1033 and the internal storage 1009. The internal storage 1009 comprises read only memory (ROM) 1060 and random access memory (RAM) 1070. The processor 1005 is able to execute the application programs 1033 stored in one or both of the connected memories 1060 and 1070. When the electronic device 1001 is initially powered up, a system program resident in the ROM 1060 is executed. The application program 1033 permanently stored in the ROM 1060 is sometimes referred to as "firmware". 
Execution of the firmware by the processor 1005 may fulfil various functions, including processor management, memory management, device management, storage management and user interface. [00039] The processor 1005 typically includes a number of functional modules including a control unit (CU) 1051, an arithmetic logic unit (ALU) 1052 and a local or internal memory comprising a set of registers 1054 which typically contain atomic data elements 1056, 1057, along with internal buffer or cache memory 1055. One or more internal buses 1059 interconnect these functional modules. The processor 1005 typically also has one or more interfaces 1058 for communicating with external devices via system bus 1081, using a connection 1061. 6913716_1 P053185_specijodge - 8 [00040] The UIR application program 1033 includes a sequence of instructions 1062 through 1063 that may include conditional branch and loop instructions. The program 1033 may also include data, which is used in execution of the program 1033. This data may be stored as part of the instruction or in a separate location 1064 within the ROM 1060 or RAM 1070. [00041] In general, the processor 1005 is given a set of instructions, which are executed therein. This set of instructions may be organised into blocks, which perform specific tasks or handle specific events that occur in the electronic device 1001. Typically, the application program 1033 waits for events and subsequently executes the block of code associated with that event. Events may be triggered in response to input from a user, via the user input devices 1013 of Fig. 10A, as detected by the processor 1005. Events may also be triggered in response to other sensors and interfaces in the electronic device 1001. [00042] The execution of a set of the instructions may require numeric variables to be read and modified. Such numeric variables are stored in the RAM 1070. The disclosed method uses input variables 1071 that are stored in known locations 1072, 1073 in the memory 1070. The input variables 1071 are processed to produce output variables 1077 that are stored in known locations 1078, 1079 in the memory 1070. Intermediate variables 1074 may be stored in additional memory locations in locations 1075, 1076 of the memory 1070. Alternatively, some intermediate variables may only exist in the registers 1054 of the processor 1005. [00043] The execution of a sequence of instructions is achieved in the processor 1005 by repeated application of a fetch-execute cycle. The control unit 1051 of the processor 1005 maintains a register called the program counter, which contains the address in ROM 1060 or RAM 1070 of the next instruction to be executed. At the start of the fetch execute cycle, the contents of the memory address indexed by the program counter is loaded into the control unit 1051. The instruction thus loaded controls the subsequent operation of the processor 1005, causing for example, data to be loaded from ROM memory 1060 into processor registers 1054, the contents of a register to be arithmetically combined with the contents of another register, the contents of a register to be written to the location stored in another register and so on. At the end of the fetch execute cycle the program counter is updated to point to the next instruction in the system program code. Depending on the instruction just executed this may involve incrementing the address contained in the program counter or loading the program counter with a new address in order to achieve a branch operation. 
69137161 P053185_specilodge -9 [00044] Each step or sub-process in the processes of the UIR methods described below is associated with one or more segments of the UIR application program 1033, and is performed by repeated execution of a fetch-execute cycle in the processor 1005 or similar programmatic operation of other independent processor blocks in the electronic device 1001. Context [00045] When a photographer inspects the quality of a photo on a display screen such as the screen 1014 at the back of the camera 1001 or a computer monitor, for assessing the quality, pleasantness or noteworthiness of an image, the photographer will typically want to review the image at full resolution, which often requires him to navigate to pan to, or zoom in to, particular regions of the image. The photographer may also adjust a focal point on the displayed image to review in-focus and out-of-focus regions of the image, thereby identifying regions of significance. The regions to which the photographer navigates reflect the individual photographer's interest due to content of the regions, and their relative significance and importance should be taken into account when calculating an average saliency map of the image. The disclosed UIR arrangements incorporate that implicit interaction data and determines refined saliency maps that are, in effect, customised for each observer/image pair. The aforementioned interaction data is referred to as being "implicit" to reflect the fact that although the interaction data is generated because the user is navigating to and viewing different parts of the photograph, the user is not performing the navigation and viewing in order to provide this data. In other words the data is generated as a by-product, and not explicitly because the user has been directed to produce the data. Overview [00046] Given the photo, the photographer may not navigate to every single region in which he is interested, and thus the regions to which the photographer does in fact navigate and inspect may be a only subset of regions of interest in the image. Therefore, the visual attention paid to the regions to which the photographer does navigate cannot directly be used as an accurate means to predict the visual attention attractiveness of an entire image. [00047] The UIR arrangements employ navigation data, generated by the photographer when he inspects a photo, as an input to refine automatically predicted saliency information derived from the photo. The resultant refined photo saliency information, also referred to as 6913716_1 P053185_specilodge -10 a refined salience map, is a more accurate measurement of the visual attention paid by the photographer for a given image. It is noted that the refined salience map is specific to the image in question, and to the user in question. [00048] The UIR arrangements can produce a saliency map of an image that more closely reflects the visual attention paid by an individual user to an image. Compared with eye tracking methods and other techniques that require explicit user input, the UIR arrangements are relatively easy to set up, and the user input acquisition process is transparent to the user. The term "transparent" denotes the fact that the interaction data is gathered not because the user is specifically asked to generate this data, but rather as a by-product of another activity of the user, i.e. navigating to, and viewing, different parts of an image of interest. 
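The overall flow of Fig. 1 can be illustrated with a short Python sketch. It is a sketch only, not the specification's implementation: the function names, the tuple format used for recorded regions, and the centre-weighted placeholder standing in for the global saliency predictor are assumptions, since the specification deliberately leaves the choice of automatic predictor open.

import numpy as np

def compute_global_saliency(image):
    """Stand-in for the global saliency calculation (step 120).

    Any automatic saliency predictor may be used here; a centre-weighted
    Gaussian is a placeholder only.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    g = np.exp(-(((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2) / 0.08)
    return g / g.sum()

def build_local_map(image, inspected_regions):
    """Local saliency calculation (step 140): accumulate inspected regions.

    `inspected_regions` is a list of (x0, y0, x1, y1, viewing_time) tuples
    recorded by the user input acquisition process (step 130).  Each region
    is assumed to contribute uniformly, weighted by its viewing time.
    """
    h, w = image.shape[:2]
    V = np.zeros((h, w), dtype=np.float64)
    for x0, y0, x1, y1, t in inspected_regions:
        region_area = max((y1 - y0) * (x1 - x0), 1)
        V[y0:y1, x0:x1] += t / region_area
    total = V.sum()
    return V / total if total > 0 else V

def refine_saliency(image, inspected_regions, weight=0.5):
    """Map combination (step 170): blend the global and local maps."""
    S = compute_global_saliency(image)
    V = build_local_map(image, inspected_regions)
    return weight * V + (1.0 - weight) * S

# Example: the user dwelt 3 s on a 100x100 patch of a 480x640 image.
if __name__ == "__main__":
    img = np.zeros((480, 640, 3), dtype=np.uint8)
    refined = refine_saliency(img, [(200, 150, 300, 250, 3.0)])
    print(refined.shape, round(refined.sum(), 3))

The same three ingredients, an automatically computed global map, viewing-time-weighted local evidence, and a combination step, are elaborated in the embodiments that follow.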
[00049] In the present specification, the terms "image" and "photograph" are used interchangeably unless otherwise noted. The terms "user", "people", and "photographer" are also used interchangeably unless otherwise noted. Embodiment 1 [00050] Fig. 1 is a schematic flow chart showing an example of a process 100 for performing the UIR arrangement. The process 100 generates a refined saliency map 180 for an input image 110, the generated refined saliency map 180 being custom ised for and specific to an individual viewer 115 and the given image 110, based on input from the viewer 115. [00051] When the photographer 115 views the photo 110 that he has captured, he often interacts, as depicted by an arrow 181, with the display screen 1014 upon which the image 110 is displayed, by zooming in and navigating to particular regions of the image 110, using the user interface 1013, in order to inspect the image 110 and/or parts thereof in more detail. When the user 115 is inspecting the input image 110 a user input acquisition process 130, typically implemented as part of the UIR application program 1033 executing on the processor 1005, gathers, as depicted by an arrow 192, interaction data 135 generated by the user 115 this interaction data 135 resulting from control actions taken by the user when operating the user interface 1013 in order to navigate the image in question. The process 100 follows an arrow 190 to a local saliency calculation process 140 that generates, as depicted by a dashed arrow 186, a local saliency map 160 based on the input image 110 and the acquired user interaction data 135. A global saliency calculation process 120, implemented as part of the UIR application program 1033 executing on the 6913716_1 P053185_speci_lodge - 11 processor 1005, generates, as depicted by a dashed arrow 184, a global saliency map 150 based, as depicted by an arrow 183, on the input image 110. The global saliency map defines regions of relative significance across the image 110. The process 100 follows respective arrows 193 and 194 from the steps 140 and 150 to a map combination process 170, implemented as part of the UIR application program 1033 executing on the processor 1005, which uses, as depicted by respective dashed arrows 188 and 187, the local saliency map 160 and the global saliency map 150, the local saliency map 160 being used as a clue, reflecting the visual attention of the user 115 to the image 110, in order to refine the global saliency map 150. The combination process 170 also uses, as depicted by an arrow 185, the acquired user interaction data 135 in order to refine the global saliency map 150 The combination process 170 produces, as depicted by an arrow 189, the refined saliency map 180 which is a better representation of the visual attention paid by the user 115 to the input image 110. Details of the each step will be described in the following paragraphs. [00052] The global saliency calculation process 120 determines the global saliency map 150 of the input image 110. The global saliency calculation process 120 is not limited to a particular saliency prediction method and can use any existing automatic saliency prediction algorithm. Thus for example existing object detection or face detection approaches can be used to detect objects/faces and define these objects/faces as the significant salient objects in the input image 110 and these salient features can be incorporated into the global saliency map. 
The global saliency map 150 is a measure of the visual attention paid by an average observer to the input image 110 [00053] Fig. 2 is a schematic flow diagram illustrating an example of a method for implementing the user input acquisition process 130 depicted in the method of Fig. 1. When the user 115 manipulates, as depicted by an arrow 281, the user interface 1013 and navigates to a specific region of the input image 110 on the display 1014, the relative position of the specific region to which the user 115 has navigated and/or zoomed can be determined. A timer is used in the user input acquisition process 130. As soon as the user 115 stops manipulating the screen after he/she navigates to the specific region, the timer is started. After a certain time, during which the user 115 is presumed to visually inspect all the relevant details of interest in the specific region, the user 115 manipulates the display again and navigates away from the region, and the timer is stopped, and a period of time from when the timer was started last time is determined. The calculated time is the time the user has spent viewing the specific region in question. The user, input acquisition process 6913716_1 P053185_specilodge - 12 130 records the position of each region to which the user navigates, and the time that the user has spent viewing the region in question. [00054] When the input image 110 is first displayed on the screen 1014, an initialization process 200 initializes, as depicted by an arrow 282, parameter values used in the user input acquisition process 130. In the initialization process 200, before the user 115 starts to manipulate the screen, the timer is stopped, and a time t which the user 115 spends on a specific region of the input image 110 is set to 0. A checking step 210, performed by the processor 1005 directed by the UIR software application 1033, constantly checks to determine if the user 115 has started to manipulate the display of the image 110. As soon as the step 210 detects that the user 115 has started manipulating the screen, e.g., zooming in to a particular region, the process 130 follows a YES arrow 285 to a timer stopping process 220 which stops the timer. The time t which the user 115 has spent on the specific region of the input image 110 is determined. This identifies a region of significance in the image to the user. If it is the first time that the user 115 has started to manipulate the display, then since the timer has never been started, the value of the variable t is still 0. Otherwise, the variable t is the time between the time the timer was started last time and the time the timer stops. [00055] The process 130 then follows an arrow 287 to a time checking unit 230, performed by the processor 1005 directed by the UIR software application 1033, which determines if t is long enough for the user 115 to have inspected details of the region. In one UIR arrangement, t is compared with a pre-determined threshold T. If t is greater than T, the process 130 follows a YES arrow 288 to an information storing process 240 that stores the position of the inspected region and the associated time t in a list stored in the memory 1009, otherwise, the process 130 follows a NO arrow 289 to a step 250 and no information is stored. In one UIR arrangement, the position of the inspected region is defined by coordinates of the top left and bottom right corners of the region displayed on the screen. 
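A minimal Python sketch of the recording logic just described follows, assuming an event-driven user interface that signals when manipulation of the display starts and stops and which region is currently shown. The class and method names are illustrative, not part of the specification, and the threshold T is an arbitrary example value.

import time

class InteractionRecorder:
    """Records (region, dwell time) pairs while a user inspects an image.

    Mirrors the timer logic of Fig. 2: the timer starts when manipulation
    stops (the user is presumed to be viewing the displayed region) and
    stops when manipulation resumes; dwell times below the threshold T are
    discarded.
    """

    def __init__(self, threshold_seconds=1.0):
        self.threshold = threshold_seconds   # T in the description
        self._view_start = None              # timer stopped initially
        self._current_region = None          # (x0, y0, x1, y1) on screen
        self.records = []                    # manipulation information 135

    def on_manipulation_stopped(self, region):
        """User has navigated to `region` and stopped; start the timer."""
        self._current_region = region
        self._view_start = time.monotonic()

    def on_manipulation_started(self):
        """User starts manipulating again; stop the timer and maybe record."""
        if self._view_start is None:
            return
        dwell = time.monotonic() - self._view_start
        if dwell > self.threshold and self._current_region is not None:
            self.records.append((self._current_region, dwell))
        self._view_start = None

    def on_viewing_finished(self):
        """User leaves the image; flush any pending dwell and return records."""
        self.on_manipulation_started()
        return list(self.records)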
The checking unit 250, performed by the processor 1005 directed by the UIR software application 1033, determines if the user 115 has finished viewing the image. If the user 115 is still inspecting the image, the process 130 follows a NO arrow 292 to a checking process 260, performed by the processor 1005 directed by the UIR software application 1033, which constantly determines if the user 115 has navigated to the region that he/she desires and stopped manipulating the screen. If the step 260 determines that the user has not stopped manipulating the image, the process 130 follows a NO arrow 296 cack to the step 260 in a looping manner. If the user 115 has stopped manipulating the screen, then the process 130 follows a YES arrow 293 to a timer starting process 280, 6913716_1 P053185_speci lodge -13 performed by the processor 1005 directed by the UIR software application 1033, which starts the timer. The process 130 then follows an-arrow 284 to the step 210. The timer will be stopped when the checking unit 210 detects the next commencement of the user's manipulation of the screen. [00056] Returning to the step 250, when the checking process 250 determines that the user 115 has left the input image 110 the process 130 follows a YES arrow 291 to an information output process 270 which outputs, as depicted by an arrow 295, the manipulation information 135 which is made up of the positions of all the regions that the user 115 has inspected and the time the user 115 has spent on each region. Typically the user will inspect at least one or more parts of the image. [00057] Fig. 3 is a schematic flow diagram illustrating an example of a method for implementing the local saliency calculation process 140 used in the method of Fig. 1. The local saliency calculation process 140 generates the local saliency map 160 which comprises the saliency information of all the regions the user 115 has inspected in the image 110. [00058] First of all, a map initialization process 310 creates, as depicted by a dashed line 392, a map V (i.e. 393) of the same size as the input image 110. In one example, all the pixel values in V are initialized to be the same value, which may, for example be the value "0". A region selection process 320, performed by the processor 1005 executing the UIR software program 1033, selects, as depicted by an arrow 381, a region R; (i.e. 394) from the manipulation information 135. In one UIR arrangement, the region selection process 320 randomly selects R; on the basis that this region has not yet been processed. When the region R; is selected, the position of the region R; in the input image 110 and the time the user spent in viewing R, can be obtained from the manipulation information 135. The process 140 then follows an arrow 383 to a viewing time extraction process 330, performed by the processor 1005 executing the UIR software program 1033, which extracts a viewing time spent by the user in viewing the region R;, this viewing time denoted as ti. Returning to the step 320 the process 140 also follows an arrow 384 to a region extraction process 340, performed by the processor 1005 executing the UIR software program 1033, which extracts, as depicted by an arrow 382, the region R; from the input image 110 based on the coordinates of the region R;. The process 140 then follows an arrow 388 to a saliency prediction process 350, performed by the processor 1005 executing the UIR software program 1033, which constructs a saliency map V; (i.e. 
395) associated with the selected region R; without reference to the rest of the input image 110. 69137161 P053185_speci_lodge -14 [00059] In one UIR arrangement, the approach used in the saliency prediction process 350 is the same as used in the global saliency calculation process 120. There are various approaches that can be used in the step 350 for predicting the saliency information of R;, for the saliency map V;. For example, existing object detection or face detection approaches can be used to detect objects/faces and define these objects/faces as the salient objects in R;. Alternately, it can be presumed that the user 115 looks at every pixel in R; equally, and thus every pixel in R; is defined to have the same saliency value 1/(h; * wi), where: 1. hi and w, are the height and width in pixel of region R;. [00060] Returning to the process 140 , the process follows an arrow 387 to a map modification process 360, performed by the processor 1005 executing the UIR software program 1033, which modifies the map V based on the saliency map Vi that has been determined by the step 350 based upon the selected region R;. [00061] In one UIR arrangement, if the user inspects the region R, of the input image 110, the saliency values of pixels in the corresponding region in the map V are boosted accordingly. In one UIR arrangement, for each selected region Ri, the corresponding map V is modified as Equation (1) shows. V(x+ x1,y +y') =V(x+ xi, y+yi )+ t;*V;(x,y) f or xEG(1, w), yEG(1,hi) (1)' [00062] Where: 1. (xi, y ) are the coordinates of the top left corner of region Ri, and 2. t; is the time the user 115 has spent in viewing region R,. [00063] After the map modification process 360 modifies the map V based on the saliency map of a selected region, the process 140 follows an arrow 390 to a checking step 370, performed by the processor 1005 executing the UIR software program 1033, which checks to determine if all regions in the manipulation information 135 have been processed. If this is not the case, the process 140 follows a NO arrow 385 back to the region selection process 320 which will select another region from the manipulation information 135 for processing. If however the step 370 determines that all the regions in 69137161 P053185_specilodge - 15 the manipulation information 135 have been processed, then the process 140 follows a YES arrow 396 to a normalization process 380, performed by the processor 1005 executing the UIR software program 1033, which normalizes the map V. The process 140 then follows an arrow 302 and the normalized map V is output as the local saliency map 160. There are many ways of normalizing a map. In one UIR arrangement, the map V is normalized as depicted by Equation (2) as follows. V(x, y) = V(x, y)/ ZX'=1 Z'=1V(x, y) (2) [00064] Where: 1. w and h are the width and height of map V respectively. [00065] Fig. 4 is a schematic flow diagram illustrating an example of a method used for implementing the map combination process 170 used in the method of Fig. 1. The weight determination process 410, described hereinafter in more detail with reference to Fig. 5, determines the weight w of the local saliency map 160. The map merging process 420, performed by the processor 1005 executing the UIR software program 1033, receives (i) the local saliency map 160 as depicted by arrows 431 and 434, and (ii) the global saliency map 150 as depicted by arrows 432 and 434, and merges the local saliency map 160 and the global saliency map 150 together to form the refined map 180. 
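Before turning to the map combination, the local saliency calculation of Fig. 3 can be sketched in Python. Equations (1) and (2) are difficult to read as printed; the sketch assumes they express, respectively, accumulation of each region map V_i into V scaled by the viewing time t_i, that is V(x + x_i, y + y_i) += t_i * V_i(x, y) over the extent of R_i, and normalisation of V so that its values sum to one. The uniform per-pixel saliency 1/(h_i * w_i) mentioned above is used as the region-level predictor.

import numpy as np

def region_saliency(region_image):
    """Saliency prediction for a single inspected region R_i (step 350).

    Assumption: every pixel of R_i is equally salient, i.e. 1 / (h_i * w_i).
    An object or face detector could be substituted here.
    """
    h_i, w_i = region_image.shape[:2]
    return np.full((h_i, w_i), 1.0 / (h_i * w_i), dtype=np.float64)

def local_saliency_map(image, manipulation_info):
    """Local saliency calculation (Fig. 3) over the regions the user inspected.

    `manipulation_info` holds ((x0, y0, x1, y1), dwell_time) pairs.
    Equation (1), as reconstructed: V[y0:y1, x0:x1] += t_i * V_i
    Equation (2), as reconstructed: V /= V.sum()
    """
    h, w = image.shape[:2]
    V = np.zeros((h, w), dtype=np.float64)          # map initialisation 310
    for (x0, y0, x1, y1), t_i in manipulation_info:
        R_i = image[y0:y1, x0:x1]                   # region extraction 340
        V_i = region_saliency(R_i)                  # saliency prediction 350
        V[y0:y1, x0:x1] += t_i * V_i                # map modification 360
    total = V.sum()
    return V / total if total > 0 else V            # normalisation 380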
In one UIR arrangement, in the map merging process 420, each pixel value in the refined map 180 is calculated as the weighted average of the pixel values in the local saliency map 160 and the global saliency map 150, shown in Equation (3) as follows: S'(x, y) = w * V(x, y) + (1 - w) * S(x, y) (3) [00066] Where: 1. S'is the refined map (e.g. see 180 in Fig. 1), 2. V is the local saliency map (e.g. see 160 in Fig. 1), and 3. S is the global saliency map (e.g. see 150 in Fig. 1); 4. w is the weight of the local saliency map; and 6913716_1 P053185_specilodge - 16 5. X(x,y) is the value of pixel (x,y) in map X(X = S', V, S [00067] In the weight determination process 410, the importance of the local saliency map 160 is determined. A simple way of specifying the weight of the local saliency map 160 is to assign a constant weight, for example, 0.5, to the local saliency map. A basis for the weight being 0.5 is that the local saliency map and the global saliency map are of equal importance. An alternate way of determining the weight of the local saliency map 160 is to calculate the weight based on the viewing time of the input image 110. The longer time the user 115 spends in viewing the input image 110, the more careful the user 115 is clearly being during the image inspection, and thus the obtained local saliency map 160 from the user manipulation data 135 is deemed to be more reliable for predicting the visual attention of the user 115 for the input image 110 and consequently the greater weight is ascribed to the local saliency map 160. [00068] Based on the alternative way mentioned above, in one UIR arrangement, an example of a method for implementing the weight determination process 410 is depicted in the schematic flow diagram in Fig. 5. [00069] Fig. 5 is a schematic flow diagram illustrating an example of a method for implementing the weight determination process 410 used in the method of Fig. 4. In Fig. 5, based on the manipulation information 135 as depicted by an arrow 541, an image viewing time acquisition process 520, performed by the processor 1005 executing the UIR software program 1033, determines the total viewing time the user 115 has spent viewing the input image 110. The process 410 then follows an arrow 542 from the process 520 to a user's average viewing time calculation process 510, performed by the processor 1005 executing the UIR software program 1033, which updates the average viewing time of the individual user 115. The process 410 then follows respective arrows 544 from the step 510, and 543 from the step 520, to a weight calculation process 530, performed by the processor 1005 executing the UIR software program 1033, which calculates the weight of the local saliency map 415 based on the viewing time of the user 115 inspecting the input image 110 (from the step 520) and the average viewing time of the user 115 (from the step 510). [00070] Returning to the step 520, the manipulation information 135 stores the positions of all the regions and the viewing times associated with each viewed region. The image viewing time acquisition process 520, performed by the processor 1005 executing the UIR software program 1033, sums up the viewing time of all the regions, and the total time is the viewing time of the user 115 inspecting the input image 110. 
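The combination and weighting just described can be sketched as follows. Equation (3) above is taken as written. The running-average update of the user's viewing times (Equation (4) below) is read as a standard incremental mean, and the mapping from viewing time to weight (Equation (5) below) is not legible in the text, so a clamped ratio is used as an illustrative assumption that preserves only the stated behaviour: a longer viewing time yields a larger weight. A pixel-wise product corresponding to the alternative arrangement of Fig. 6, described later, is also shown for comparison.

import numpy as np

def update_average_viewing_time(prev_average, new_time, count):
    """Running average of the user's viewing times (process 510, Equation (4)
    read as an incremental mean): F_i = ((i - 1) * F_{i-1} + t_i) / i."""
    return ((count - 1) * prev_average + new_time) / count

def combination_weight(total_viewing_time, average_viewing_time):
    """Weight of the local saliency map (process 410 / Fig. 5).

    Only the qualitative rule is taken from the description: the longer the
    user has viewed this image relative to their personal average, the larger
    the weight.  The clamped ratio below is an illustrative stand-in for
    Equation (5), whose exact form is not relied upon here.
    """
    if average_viewing_time <= 0:
        return 0.5                        # default: equal importance
    return min(1.0, total_viewing_time / (2.0 * average_viewing_time))

def combine_weighted(local_map, global_map, w):
    """Equation (3): S'(x, y) = w * V(x, y) + (1 - w) * S(x, y)."""
    return w * local_map + (1.0 - w) * global_map

def combine_multiplicative(local_map, global_map):
    """Pixel-wise weighting of the alternative arrangement (Fig. 6,
    Equation (6)): S'(x, y) = V(x, y) * S(x, y)."""
    return local_map * global_map

# Example: a total dwell of 8 s against a 5 s personal average gives w = 0.8.
if __name__ == "__main__":
    V = np.random.rand(4, 4); V /= V.sum()
    S = np.random.rand(4, 4); S /= S.sum()
    w = combination_weight(8.0, 5.0)
    S_refined = combine_weighted(V, S, w)
    print(round(w, 2), S_refined.shape)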
69137161 P053185_specilodge - 17 [00071] Returning to the step 510, the user's average viewing time calculation process 510 obtains the individual user's average viewing time by learning the behaviour of the user 115, based upon the past behaviour of the user 115. In one UIR arrangement, when a user 115 uses the system for the first time, the user's average viewing time calculation process 510 set the user's average viewing time as the output of the image viewing time acquisition process 520. Subsequently, the user's average viewing time calculation process 510 calculates the user's average viewing time using Equation (4) as follows. i - 1) *Fi_1 + t)i (4) [00072] Where: 1. i is the number of data items which the user's average viewing time calculation process 510 has obtained from the image viewing time acquisition process 520. Each data item is a viewing time obtained from the user viewing an image.; and 1. F is the average viewing time based upon i viewing times obtained from the user 115 inspecting an image. F is calculated as Equation 4 shows, and represents the average viewing time based on I data items (viewing times). [00073] Another way of determining the user's average viewing time (i.e. referring to the step 510) is to set the initial average viewing time to the average viewing time of a group of users, which can be learnt offline. Following each time a viewing time of the user 115 is obtained by the image viewing time acquisition process 520, the user's average viewing time calculation process 510 updates the user's average viewing time. [00074] The weight calculation process 530 determines the weight of the local saliency map by comparing the total viewing time of the user 115 for the input image 110 with the average viewing time of the user 115. A longer total viewing time for the user 115 for an input image 110 corresponds to a larger weight for the local saliency map. In one UIR arrangement, the weight calculation process 530 calculates the weight of the local saliency map 415 as according to Equation (5) as follows; W ,2_ (5) [00075] Where: 6913716_1 P053185_specijlodge - 18 1. t is the total viewing time of the user 115 for the input image 110, and 2. t is the average viewing time of the user 115. Embodiment 2 [00076] Fig. 6 is a schematic flow diagram depicting an example of a saliency map refinement method 600 according to another UIR arrangement. When the user 115 is manipulating the display to inspect the input image 110, the local saliency map 160 and the global saliency map 150 are generated in the same manner described in Embodiment 1 which is depicted in Fig. 1. However, a map combination process 610, which operates in a different manner to the process 170 in Fig. 1, uses the pixel values in the local saliency map 160 to weight the pixels in the global saliency map 150, as depicted in Equation (6). S'(x, y) = V (x, y) * S(x, y) (6) [00077] Where: 1. S', V and S denote for the refined map 620, the local saliency map 160 and the global saliency map 150 respectively, and 2. X(x,y) is the value of pixel (x,y) in map X(X = S', V, S). Embodiment 3 [00078] Fig. 8 is a schematic flow diagram depicting another example of a saliency map refinement method 800 according to another UIR arrangement. When the user 115 is manipulating the screen to inspect the input image 110, the local saliency map 160 and the global saliency map are generated, in the same manner as described in Embodiment 1 as depicted in Fig. 1. 
However a refined saliency calculation process 810, described hereinafter in more detail in regard to Fig. 9, compares the salient regions of the input image 110 (via an arrow 811) in the local saliency map 160 and the salient regions of the input image 110 in the global saliency map 150, and thus refines the global saliency calculation process 120 The refined saliency calculation process 810 then generates a refined map 820 of the input image 110. [00079] In most automatic saliency calculation approaches, an input image 110 is decomposed into various channels based on the different attributes and features employed 6913716_1 P053185_speci_lodge -19 to calculate saliency, for example sharpness, colour, and luminance. The salience information in each channel is analysed independently and combined together as the last step to form a saliency map of the input image 110. [00080] Fig. 9 is a schematic flow diagram illustrating an example of a method for implementing the refined saliency calculation process 810 as used in the method of Fig. 8. A first salient region acquisition process 910, performed by the processor 1005 executing the UIR software program 1033, extracts, as depicted by respective arrows 971 and 972, local salient regions 920 of the input image 110 based on the local saliency map 160 and the input image 110. [00081] One way of extracting the local salient regions 920 is to find the regions in the input image 110 having corresponding salient regions in the local saliency map 160. This occurs, for example, when the pixel values in the corresponding regions in the local saliency map 160 are greater than a pre-determined threshold. [00082] A second salient region acquisition process 985, performed by the processor 1005 executing the UIR software program 1033, extracts, as depicted by respective arrows 973 and 972, global salient regions 930 using the same method used by the step 910, based on the input image 110 and the global saliency map 150. [00083] The process 810 then follows respective arrows 983 and 984 from the steps 910 and 985 to a feature comparison process 950, performed by the processor 1005 executing the UIR software program 1033. [00084] As described hereinafter in more detail, for each feature processed by the global saliency calculation process 120 (see Fig. 1), feature values are determined in the local salient regions 920 and the global salient regions 930. In one UIR arrangement, the average feature value in a region is used to represent the feature value of the region. [00085] The feature comparison process 950 compares the feature values in the local salient regions 920 and the global salient regions 930. The process 810 then follows an arrow to a step 960, performed by the processor 1005 executing the UIR software program 1033. If the feature values in the two regions as determined by the step 950 are similar, the feature weight will not be changed in the feature weight refinement process 960. Otherwise, the feature weight refinement process 960 will refine the weight of the feature. 69137161 P053185_specilodge -20 In one UIR arrangement, the weight refinement process 960 refines the weight of the feature according to Equation (7) as follows. wi =L*wi (7) [000861 Where: 1. w; is the weight of feature i in the global saliency calculation process 120, 2. w,'is the refined weight of feature i, and 3. f, and fg are the feature values in the local salient regions 920 and the global salient regions 930 respectively. [00087] If a feature value, e.g. 
sharpness, is higher in the local salient regions 920 than the global salient regions 930, this indicates that the user 115 is more interested in sharp regions, and thus the weight of sharpness feature should be increased accordingly, and vice versa. [00088] After the weight of each feature is analysed and refined in the respective steps 950 and 960, the process 810 follows an arrow 981 to a saliency calculation process 970, performed by the processor 1005 executing the UIR software program 1033, which decomposes the input image 110, received via an arrow 978, into different channels based on different features. The step 970 further analyses the salience information in each channel, and weight salience information based on the refined weights calculated in the feature weight refinement process 960, and combines them together to form, as depicted by an arrow 982, a refined map 820. Example(s)/User Case(s) [00089] One use case of the UIR arrangement is digital cameras. Due to the size constraint of the camera back screen, a photographer often manipulates the screen, such as zooming in and navigating to the left or right, to see the details of a certain area of an image that he took. The UIR arrangement can record the user manipulation of the screen (which results in the user manipulation information 135) and use the information to refine the saliency map of a photo. The refined saliency map more closely reflects the visual 69137161 P053185_speci_lodge - 21 attention paid by the individual user when viewing a photo compared with an automatically computed saliency map. [00090] Figs. 7(a) - 7(f) show figures illustrating an example of a saliency map refinement method used in a digital camera 1001 according to a UIR arrangement. Fig. 7(a) illustrates a typical image displayed on the rear display of the digital camera. A reference numeral 710 refers to the LED display screen 1014. A reference numeral 720 refers to a button (which is part of the user interface 1013) for manipulating an image on the display screen 710. A reference numeral 730 depicts an example of an input image displayed on the display screen 710. [00091] The user can manipulate the image 730 on the display screen 710 using the button 720 to inspect a certain part of the image 730 for more details. Fig. 7(b) shows an example where the screen 710 shows a part 740 of the input image 730 to which the user has navigated as a result of the user's manipulation. [00092] Fig. 7(c) depicts a global saliency map 150 generated by the global saliency calculation process 120. Darker regions represent higher saliency values. Regions 751 to 754 correspond to regions 731 to 734 in the input image 730. In the global saliency map example shown in Fig. 7(c), regions 751 and 752 are darker than, and therefore more relatively significant (i.e. salient) than regions 753 and 754. [00093] Fig. 7(d) illustrates an example of a local saliency map (e.g. see VI, i.e. 395 in Fig. 3) associated with the corresponding region 740 (e.g. see R;, i.e. 394 in Fig. 3) determined by the saliency prediction process 350. Fig. 7(e) illustrates an example of the local saliency map 160 calculated based on the local saliency calculation process 140. The saliency map of region 740 shown in Fig. 7(d) corresponds to the dotted region 760 in Fig. 7(e). [00094] The map combination process 170 combines the local saliency map 160 (depicted in Fig. 7(e)) and the global saliency map 150 (depicted in Fig. 7(c)). Fig. 7(f) illustrates an example of the refined saliency map 180. 
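Returning to the feature-weight refinement of Embodiment 3, Equation (7) is incomplete as printed; the Python sketch below assumes the refined weight scales the original weight by the ratio of the mean feature value inside the locally salient regions to that inside the globally salient regions, w_i' = (f_l / f_g) * w_i, which is consistent with the sharpness example given above. The thresholding used to extract salient regions and all names are illustrative.

import numpy as np

def salient_mask(saliency_map, threshold=0.5):
    """Binary mask of salient regions: pixels above a fraction of the map's
    maximum (the thresholding scheme is an illustrative assumption)."""
    return saliency_map >= threshold * saliency_map.max()

def refine_feature_weights(feature_channels, weights, local_map, global_map):
    """Feature-weight refinement (Fig. 9, Equation (7) as reconstructed).

    `feature_channels` maps a feature name (e.g. 'sharpness') to its
    per-pixel response; `weights` maps the same names to the weights used by
    the global saliency calculation.  For each feature the mean response
    inside the locally salient regions (f_l) is compared with the mean inside
    the globally salient regions (f_g), and the weight is rescaled as
    w_i' = (f_l / f_g) * w_i.
    """
    local_mask = salient_mask(local_map)
    global_mask = salient_mask(global_map)
    refined = {}
    for name, channel in feature_channels.items():
        f_l = channel[local_mask].mean() if local_mask.any() else 0.0
        f_g = channel[global_mask].mean() if global_mask.any() else 0.0
        if f_g > 0:
            refined[name] = (f_l / f_g) * weights[name]
        else:
            refined[name] = weights[name]
    return refined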
Darker regions have higher saliency values. Regions 771 to 774 correspond to regions 751 to 754 respectively. In the example of the refined saliency map 180 shown in Fig. 7(f), the regions 773 and 774 become more salient than the regions 771and 772. This is because the user is more interested in regions 773 and 774. The refined saliency map more closely reflects the attention paid by the individual user to the specific image in question. 6913716_1 P053185_specilodge - 22 Industrial Applicability [00095] The arrangements described are applicable to the computer and data processing industries and particularly for the processing of images to determine a prime subject of interest. [00096] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive. [00097] In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of. Variations of the word "comprising", such as "comprise" and "comprises" have correspondingly varied meanings. 6913716_1 P053185_specilodge

Claims (8)

1. A method of forming a refined saliency map of an image captured by an image capture device , said method comprising the steps of: determining a global saliency map of the captured image, said global saliency map defining regions of relative significance across the image; determining user interaction data depending upon interactions of a user with the image capture device, said user interactions navigating to a part of the image on a display screen; identifying, in the image, at least one region of significance based on the user interaction data; determining a local saliency map based on the at least one identified region of significance, said local saliency map defining regions of relative significance across the at least one identified region of the image independent of other regions of the image; and refining the global saliency map dependent upon the local saliency map to form the refined saliency map of the captured image.
2. A method according to claim 1, wherein the display screen forms part of the image capture device.
3. A method according to claim 1, wherein the refining step comprises weighting the global saliency map dependent upon a time during which said part of the image, to which the user interaction has navigated, is present on the display screen.
4. A method according to claim 1, wherein at least one of the steps of determining the global saliency map and determining the local saliency map comprises the steps of: detecting an object in the captured image; and defining the detected object as a region of significance.
5. A method according to claim 1, wherein the user interaction navigating to the part of the image on a display screen comprises at least one of the steps of: zooming in on the part of the image; and adjusting a focal point on the displayed image to effect in-focus and out-of-focus regions of the image.
6. A method according to claim 1, wherein the refining step comprises the steps of: determining one or more visual attributes of the image, such as colour and sharpness; and performing the refining step depending upon relative values of the determined attributes in the local saliency map compared with the global saliency map.
7. An apparatus for forming a refined saliency map of an image captured by an image capture device, said apparatus comprising: a processor; and a non-transitory computer readable medium storing a computer executable program for directing the processor to execute a method comprising the steps of: determining a global saliency map of the captured image, said global saliency map defining regions of relative significance across the image; determining user interaction data depending upon interactions of a user with the image capture device, said user interactions navigating to a part of the image on a display screen; identifying, in the image, at least one region of significance based on the user interaction data; determining a local saliency map based on the at least one identified region of significance, said local saliency map defining regions of relative significance across the at least one identified region of the image independent of other regions of the image; and refining the global saliency map dependent upon the local saliency map to form the refined saliency map of the captured image.
8. A non-transitory computer readable medium storing a computer executable program for directing a processor to execute a method for forming a refined saliency map of an image captured by an image capture device, said program comprising: computer executable code for determining a global saliency map of the captured image, said global saliency map defining regions of relative significance across the image; computer executable code for determining user interaction data depending upon interactions of a user with the image capture device, said user interactions navigating to a part of the image on a display screen; computer executable code for identifying, in the image, at least one region of significance based on the user interaction data; computer executable code for determining a local saliency map based on the at least one identified region of significance, said local saliency map defining regions of relative significance across the at least one identified region of the image independent of other regions of the image; and computer executable code for refining the global saliency map dependent upon the local saliency map to form the refined saliency map of the captured image.

Dated 30th day of November 2012 CANON KABUSHIKI KAISHA Patent Attorneys for the Applicant/Nominated Person SPRUSON & FERGUSON
AU2012261494A 2012-11-30 2012-11-30 Saliency refinement method Abandoned AU2012261494A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2012261494A AU2012261494A1 (en) 2012-11-30 2012-11-30 Saliency refinement method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2012261494A AU2012261494A1 (en) 2012-11-30 2012-11-30 Saliency refinement method

Publications (1)

Publication Number Publication Date
AU2012261494A1 true AU2012261494A1 (en) 2014-06-19

Family

ID=51059701

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2012261494A Abandoned AU2012261494A1 (en) 2012-11-30 2012-11-30 Saliency refinement method

Country Status (1)

Country Link
AU (1) AU2012261494A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220092331A1 (en) * 2020-09-23 2022-03-24 Apple Inc. Systems and Methods for Providing Personalized Saliency Models
US11854242B2 (en) * 2020-09-23 2023-12-26 Apple Inc. Systems and methods for providing personalized saliency models

Similar Documents

Publication Publication Date Title
US8547449B2 (en) Image processing apparatus with function for specifying image quality, and method and storage medium
US10534972B2 (en) Image processing method, device and medium
EP2567536B1 (en) Generating a combined image from multiple images
TWI706379B (en) Method, apparatus and electronic device for image processing and storage medium thereof
RU2612892C2 (en) Method and device of auto focus
CN107920211A (en) A kind of photographic method, terminal and computer-readable recording medium
US20100302393A1 (en) Self-portrait assistance in image capturing devices
KR101930460B1 (en) Photographing apparatusand method for controlling thereof
WO2016061011A2 (en) Camera capture recommendation for applications
US11070717B2 (en) Context-aware image filtering
CN105230001A (en) Method, the image processing program of image processing equipment, process image, and imaging device
WO2016127671A1 (en) Image filter generating method and device
AU2009243486A1 (en) Processing captured images having geolocations
US8913150B2 (en) Dynamic image capture utilizing prior capture settings and user behaviors
US9286509B1 (en) Image optimization during facial recognition
WO2018098860A1 (en) Photograph synthesis method and apparatus
CN105391940B (en) A kind of image recommendation method and device
KR20180078596A (en) Method and electronic device for auto focus
TWI546726B (en) Image processing methods and systems in accordance with depth information, and computer program prodcuts
CN108040204A (en) A kind of image capturing method based on multi-cam, device and storage medium
WO2020233201A1 (en) Icon position determination method and device
AU2012261494A1 (en) Saliency refinement method
CN104954683B (en) Determine the method and device of photographic device
WO2021008208A1 (en) System and method for recommending a media capturing mode
US11962908B2 (en) Automatic image frame processing possibility detection

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period