WO2018046959A1

WO2018046959A1 - Image storage and retrieval

Info

Publication number: WO2018046959A1
Application number: PCT/GB2017/052658
Authority: WO
Inventors: Mark LANSDALE
Original assignee: University Of Leicester
Priority date: 2016-09-12
Filing date: 2017-09-11
Publication date: 2018-03-15
Also published as: GB201615451D0

Abstract

A method for processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database: comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships.

Description

Image Storage and Retrieval

Technical Field

[0001] The present invention relates to methods, devices, apparatuses and systems for storing images in a database and retrieving images from a database. Specifically, the present invention relates to methods and apparatuses for searching and displaying images from an image database and for storing images in a database according to the spatial configuration of determined points of interest input by a user.

Background

[0002] In recent years, there has been an increasing popularity of digital cameras and a tremendous expansion in the power and storage capacity of computing devices. More and more people use electronic devices equipped with digital cameras in their everyday lives. The quantity and variety of digital images produced by these electronic devices have increased immensely, and many computing devices are now equipped with sufficient power and storage capacity to store, retrieve, view, and edit large numbers of high resolution digital images. In addition, the exponential growth in wireless networking as well as the improvement in connection speeds allows users to access large databases of images stored remotely over networks. Accordingly, there is a need for efficient image storage and retrieval techniques so as to provide users with a way to easily navigate through the growing numbers of available digital images.

[0003] Currently known image retrieval techniques allow users to retrieve images in one of two ways: keyword-based image retrieval and content-based image retrieval. Keyword-based image retrieval performs searches for images by matching keywords input by a user to keywords that have been pre-assigned to the images. However, with keyword-based image retrieval the retrieval efficiency may be limited, due to the inability to match images for queries which include ambiguous descriptions. Content-based image retrieval performs searches for images that are similar to an example image in terms of low-level image features, such as colour histogram, texture, shape, etc. Accordingly, queries based on content-based image retrieval may return completely irrelevant images which happen to contain similar low-level image features.

Summary of Invention

[0004] In a first aspect, this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database, comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculating a similarity ranking value for each database image based on the comparison of spatial relationships.

[0005] The method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.

[0006] A spatial relationship along the x-axis and a spatial relationship along the y-axis may be evaluated between each received point of interest and each other point of interest.

[0007] Each spatial relationship may be evaluated as a first point of interest having a coordinate value which is higher, lower or equal to that of a second point of interest with respect to the x-axis or the y-axis; and the evaluation of each spatial relationship may be represented with a numerical value of 1, -1 or 0 respectively.

[0008] The comparison of spatial relationships may comprise determining whether the numerical values representing the evaluation of the two spatial relationships are a match, and assigning a score to the comparison based on the determination.

[0009] The method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.

[0010] Receiving the word labels may comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.

[0011] The method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.

[0012] The corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels.

[0013] The method may further comprise: assigning a numerical label to each of the plurality of received points of interest; wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.

[0014] The plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.

[0015] The plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.

[0016] Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in an order of highest similarity ranking value to lowest similarity ranking value.

[0017] In a second aspect, this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; for each image in the database, calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances.

[0018] The method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.

[0019] The method may further comprise: calculating an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area; wherein the similarity ranking of database images is based on the calculated accuracy metric for each database image.

[0020] The method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.

[0021] The method may further comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.

[0022] The method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.

[0023] The corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels.

[0024] The method may further comprise assigning a numerical label to each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.

[0025] The plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.

[0026] The plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.

[0027] Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in order of highest similarity ranking value to lowest similarity ranking value.

[0028] In a third aspect, this specification describes a method for processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database: comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships.

[0029] In a fourth aspect, this specification describes a device for processing an interrogative query for an image database storing a plurality of images, comprising: a display for displaying an input area; an input module for receiving an input specifying locations for a plurality of points of interest within the input area; and a query module configured to: evaluate a spatial relationship between each received point of interest and each other point of interest; for each image in the database, compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculate a similarity ranking value for each database image based on the comparison of spatial relationships.

[0030] The display may be further configured to display the plurality of database images in a list according to the calculated similarity ranking values.

[0031] The input module may comprise at least one of: a mouse, a touch screen display apparatus, and an eye-tracking apparatus.

[0032] In a fifth aspect, this specification describes a method of structurally storing an image in an image database, comprising: displaying the image, which is pending storage, to a user for a predetermined period of time; tracking an eye movement of the user over the time period; determining one or more points of interest which are fixated upon by the user during the time period; storing the image with data representing the one or more points of interest.

[0033] The method may further comprise: displaying the one or more determined points of interest to the user; receiving at least one word label for each of the displayed points of interest; and storing the image with the at least one word label for each of the one or more points of interest.

[0034] Receiving the at least one word label may comprise receiving a voice input for each determined point of interest and generating a word label corresponding to each voice input.

[0035] The method may further comprise: assigning a numerical label to each of the one or more determined points of interest; and storing the image with the numerical label for each of the one or more points of interest.

[0036] The one or more determined points of interest may be assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user during the time period.

[0037] The one or more determined points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.

Brief Description of Drawings

[0038] For a more complete understanding of the methods, apparatuses, devices, and systems described herein, reference is made to the following descriptions taken in connection with the accompanying drawings in which:

Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to an embodiment of the present invention;

Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another exemplary embodiment of the present invention;

Figure 3 is a block diagram illustrating a system for searching an image database which stores a plurality of images, according to another embodiment of the present invention;

Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention;

Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention; and

Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.

Description of Embodiments

[0039] Embodiments of the invention allow users to store a plurality of images based on points of interest in an image and their relative locations in the image, and to retrieve and display images based on queries containing points of interest and their relative locations in the image.

[0040] The invention utilises the spatial memory of users to manage an image database, which provides a more intuitive way of storing and retrieving images. The invention exploits a theoretical understanding of human spatial memory to manage an image database so as to enable distinct and useful channels of query not available to other query methods. Additionally, because the invention exploits natural psychological competences, it enables functionalities which allow users effectively to manipulate trade-offs between the cost of search and its efficiency in a task-appropriate manner, making it an entirely novel and appropriate approach to a wide range of application domains.

[0041] Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to another exemplary embodiment.

[0042] As shown in Figure 1, the system 1 comprises a device 100 and an image database 200. The device 100 comprises a display 110, a tracking module 120, a control module 130, a storage module 140, and an input module 150.

[0043] The display 110 is arranged to display an image to a user for a predetermined period of time. Over the time period in which the image is displayed to the user, the tracking module 120 is arranged to track eye movement of the user. The tracking module 120 comprises an eye tracker including a camera which records movement of one or both eyes of the user as the user looks at the displayed image at the display 110.

[0044] The control module 130 is arranged to determine one or more points of interest which are fixated upon by the user during the time period in which the image is displayed to the user. Specifically, the control module 130 receives the eye movement data recorded by the eye tracker of the tracking module 120 and determines, based on where the user focused their gaze and a duration for which they focused their gaze, one or more points of interest which are fixated upon by the user.

[0045] After the one or more points of interest fixated upon by the user have been determined, the display 110 in the present embodiment is further arranged to display the one or more points of interest to the user, and the input module 150 is arranged to receive at least one word label from the user for each of the displayed points of interest. Specifically, the input module 150 of the present embodiment comprises a microphone to receive voice input from the user, as the one or more determined points of interest are displayed. The input module 150 then generates a word label corresponding to each voice input and assigns it to the determined point of interest which is displayed by the display 110.

[0046] The displayed image is then stored along with data representing the one or more points of interest in the storage module 140. In the present embodiment, the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a duration for which the user focused their gaze on the one or more determined points of interest, a frequency with which the user focused their gaze on the one or more determined points of interest, a word label assigned to the one or more points of interest. In some alternative embodiments, numerical labels may be assigned to the determined points of interest, and the data representing one or more points of interest may include a numerical label assigned to the one or more points of interest.
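By way of illustration only, the data stored with each image could be held in a record such as the following. This is a minimal sketch and not the format used by the storage module 140; the class names `PointOfInterest` and `StoredImage` and all field names are hypothetical and are introduced here purely for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PointOfInterest:
    """One determined point of interest (hypothetical record layout)."""
    location: Tuple[int, int]                  # (x, y) position in the image
    gaze_duration_s: Optional[float] = None    # how long the gaze was held
    fixation_count: Optional[int] = None       # how often the gaze returned
    word_labels: List[str] = field(default_factory=list)
    numerical_label: Optional[int] = None      # used in alternative embodiments


@dataclass
class StoredImage:
    """An image together with the data representing its points of interest."""
    image_id: str
    points_of_interest: List[PointOfInterest] = field(default_factory=list)


# Example record for the English village image, with a single labelled point.
record = StoredImage(
    image_id="village",
    points_of_interest=[PointOfInterest((5, 5), 1.2, 4, ["tree"])],
)
```

Every field other than the location is optional in this sketch, reflecting that the stored data includes "at least one of" the listed items.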

[0047] In alternative embodiments, the device 100 may not be arranged to display the one or more determined points of interest to the user or to receive word labels for each determined point of interest. In these alternative embodiments, once the control module determines the one or more points of interest which are fixated upon by the user, the storage module may be arranged to store the image along with locations of the determined points of interest.

[0048] Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another embodiment of the present invention.

[0049] The process starts at step 21. In step 21, the display 110 displays an image to a user. By way of example, the display 110 may be arranged to display an image of an English village. In the centre of the English village there is a small park with a tree, a lawn, and a hedge. The park itself is surrounded by old buildings with shops and small streets. Old street lights and benches in the park round off the image of a cosy, small, old village.

[0050] Although it is shown in the flowchart that step 22 follows step 21, it is noted that steps 21 and 22 occur simultaneously in the present embodiment. In other words, as the image is being displayed to the user in step 21, the eye tracker of the tracking module 120 tracks movement of one or both eyes of the user and records the movement. The eye movement data, which includes data related to a path that indicates the movement of the eye(s) and data related to a time duration the user focused their gaze on a point in the image, is recorded in the storage module 140.

[0051] Subsequently, in step 23, the control module 130 determines the one or more points of interest which are fixated upon by the user. This determination is based on the eye movement data recorded by the eye tracker of the tracking module 120. In the present example, as the image of the English village is displayed to the user, the user may focus on the following objects: the tree, the lawn, the hedge, the street lights, the benches, the shops, the small streets. The user may fixate on each of the objects a plurality of times and, for example, may focus his/her gaze a greater number of times on each of the first three objects and a lesser number of times on each of the rest of the objects in the image. The control module 130 analyses the recorded eye movement data stored in the storage module 140, for example by overlaying the path of the eye movement over the image displayed in step 21, so as to determine at least one point of interest which is fixated upon by the user. In the example used above, the control module 130 may determine the locations of the tree, the lawn, and the hedge in the displayed image as three different points of interest. In step 23, the control module 130 may be arranged to assign a numerical label to each of the determined points of interest according to at least one of: an order in which the user focuses his/her gaze on the determined points of interest, a degree of visual attention from the user, and a user input.
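One possible way of selecting points of interest from recorded fixations is sketched below. It is an assumption-laden illustration rather than the analysis performed by the control module 130: it assumes the eye movement data has already been reduced to a list of fixated locations, it uses fixation count as the measure of visual attention, and the function name `points_of_interest` and the threshold `min_count` are hypothetical.

```python
from collections import Counter


def points_of_interest(fixations, min_count=3):
    """Keep locations fixated at least min_count times (hypothetical threshold).

    fixations: list of (x, y) locations, one entry per recorded fixation.
    Returns a dict mapping a numerical label (1 = most fixated) to a location.
    """
    counts = Counter(fixations)
    kept = [loc for loc, n in counts.most_common() if n >= min_count]
    return {label: loc for label, loc in enumerate(kept, start=1)}


# Tree, lawn and hedge attract repeated fixations; a bench is glanced at once.
fixations = [(5, 5)] * 5 + [(4, 7)] * 4 + [(2, 6)] * 3 + [(8, 8)]
pois = points_of_interest(fixations)
```

In this example the three repeatedly fixated locations are retained and numbered in order of decreasing fixation count, while the single glance at the bench is discarded.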

[0052] Although it is shown in the flowchart that step 24 is followed by step 25, it is noted that these two steps also occur simultaneously in the present embodiment. In steps 24 and 25, the display 110 displays the determined points of interest to the user while the input module 150 receives at least one word label for each of the displayed points of interest.

[0053] Continuing with the above example, the display 110 is arranged to display a portion of the database image which contains the tree, and the user may utter the word "tree" towards the microphone of the input module 150. This voice input is then processed by the input module 150 to generate the word label "tree" which becomes assigned to the determined point of interest, either automatically by the input module 150 or manually by the user. The same process may be repeated for other portions of the image, e.g. "lawn", "hedge", etc.

[0054] After the word labels are received and assigned to respective determined points of interest in step 25, in the subsequent step 26 the storage module 140 stores the image with data representing the one or more determined points of interest. In the present embodiment, the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a frequency with which the user focused their gaze on each of the one or more determined points of interest, a word label assigned to the one or more points of interest.

[0055] It is noted that the above steps can be applied to different images, and in some embodiments the device 100 may be arranged to display a predetermined set of images to the user sequentially in order to determine the points of interest in each one in the predetermined set of images. An example of a predetermined set of images provided by way of description is included in Appendix A.

[0056] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined.

[0057] Figure 3 is a block diagram illustrating a system for processing an interrogative query for a plurality of images, according to another embodiment of the present invention.

[0058] As shown in Figure 3, the system 2 comprises an image database 200 and a device 300. The device 300 comprises a display 310, an input module 320, a query module 330, and a storage module 340. The image database 200 stores a plurality of database images which can be retrieved based on a user-input interrogative query, as explained in more detail in the following. The image database 200 of the present embodiment may be the same one that is part of system 1 as illustrated in Figure 1. In other words, the images stored in the image database 200 can be searched and retrieved by the device 300 in system 2.

[0059] In the present embodiment, the display 310 and the input module 320 are integrated as a touch screen display that is arranged to initially display an input area by presenting a pro-forma blank region so as to allow a user to specify, as touch screen input, respective locations for a plurality of points of interest within the input area. The received points of interest at the input module 320 are considered as at least a part of the interrogative query for searching the image database 200.

[0060] In the present embodiment, the input module 320, which is integrated as part of a touch screen display as described above, is further arranged to receive at least one word label for each of the plurality of received points of interest. The at least one word label is provided as at least one respective virtual tag displayed at the display 310 for selection by the user. By manipulating the touch screen display, the user is able to relocate the at least one virtual tag to respective location(s) so as to assign each of the at least one virtual tag to the received points of interest that have been input by the user within the displayed input area.

[0061] As mentioned above, the image database 200 of the system 2 comprises a plurality of database images which can be searched and retrieved by the device 300 based on the interrogative query. Each database image stored in the image database 200 comprises a plurality of points of interest. Moreover, in the present embodiment each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image. This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc.

[0062] After the input module 320 receives the input specifying locations for a plurality of points of interest within the input area, the query module 330 is arranged to perform a number of method steps, comprising: (1) evaluating a spatial relationship between each received point of interest and each other point of interest; (2) comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and (3) calculating a similarity ranking value for each database image based on the comparison of spatial relationships. Steps (1) through (3) will be explained in further detail in the following.

Step (1):

[0063] The query module 330 is arranged to evaluate a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest.

[0064] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9. In the context of step (1), the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
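The quantisation of the input area can be illustrated with a short sketch. It assumes, purely for the example, that a touch location arrives as a pixel position within an input area of known width and height and that y is measured in the same direction as the pixel rows; neither the function name `to_cell` nor these conventions come from the embodiment itself.

```python
def to_cell(px, py, width, height, n=9):
    """Map a pixel position to a 1-based [x, y] cell of an n x n grid.

    px, py: pixel position inside the input area (0 <= px <= width, etc.).
    Positions on the far edge are clamped into the last cell.
    """
    x = min(int(px * n / width), n - 1) + 1
    y = min(int(py * n / height), n - 1) + 1
    return x, y


# A touch at the centre of a 900 x 900 input area falls in cell [5, 5].
centre = to_cell(450, 450, 900, 900)
```

Because every location is reduced to one of 81 cells, two points that fall in the same column or row of the grid share an x-coordinate or a y-coordinate respectively.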

[0065] In this embodiment, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis. Each spatial relationship is evaluated as one of "higher", "lower", or "equal", respectively representing scenarios in which the first point of interest has a higher coordinate value than that of the second point of interest, the first point of interest has a lower coordinate value than that of the second point of interest, and the first point of interest has an equal coordinate value to that of the second point of interest. The evaluation of each spatial relationship is represented with a numerical value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest. The same evaluation technique is applied to the images stored in the image database 200 so as to obtain evaluated spatial relationships of the determined points of interest in each of the images stored in the image database 200 for the comparison step in step (2).

Step (2):

[0066] The query module 330 is arranged to compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of the points of interest can be evaluated in a quantifiable manner.
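Before any comparison can be made, the evaluated relationships of step (1) are needed for both the query and the database image. A minimal sketch of that evaluation is given below, assuming points of interest are held as a mapping from a label to a grid coordinate; the function names and the choice of keying each relationship by a sorted label pair and an axis are assumptions made for the example.

```python
from itertools import combinations


def sign(a, b):
    """Return 1, -1 or 0 when a is higher than, lower than or equal to b."""
    return (a > b) - (a < b)


def evaluate_relationships(points):
    """points: dict mapping label -> (x, y) grid coordinate.

    Returns a dict keyed by (first_label, second_label, axis) whose values
    are 1, -1 or 0, one entry per pair of points and per axis.
    """
    rel = {}
    for a, b in combinations(sorted(points), 2):
        rel[(a, b, "x")] = sign(points[a][0], points[b][0])
        rel[(a, b, "y")] = sign(points[a][1], points[b][1])
    return rel


query = {"tree": (5, 5), "lawn": (4, 7), "hedge": (5, 2)}
relationships = evaluate_relationships(query)
```

Sorting the labels fixes which point of each pair is treated as the "first" point, so that the query and a database image are evaluated in the same orientation.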

[0067] For example, an exemplary database image Ai may have five points of interest that correspond to the received five points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label. The corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "tree" is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "forest", "plant", and "leaves", etc.
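The matching of word labels can be sketched as follows. The lookup table `RELATED` is a hypothetical stand-in for whatever synonym or generalisation resource an implementation might use; only its single entry, taken from the "tree" example above, is grounded in the description, and the function name `match_labels` is likewise an assumption.

```python
# Hypothetical table of labels accepted as corresponding to a received label.
RELATED = {
    "tree": {"tree", "forest", "plant", "leaves"},
}


def match_labels(query_labels, image_labels):
    """Pair each received label with the first acceptable image label.

    Returns a dict mapping received label -> matching label in the image.
    Received labels with no corresponding image label are omitted.
    """
    matches = {}
    for received in query_labels:
        accepted = RELATED.get(received, {received})  # exact match by default
        hit = next((lbl for lbl in image_labels if lbl in accepted), None)
        if hit is not None:
            matches[received] = hit
    return matches


matches = match_labels(["tree", "lawn"], ["plant", "lawn", "bench"])
```

Here "tree" is paired with "plant" through the table, while "lawn" is paired by exact match.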

[0068] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step (1)) with the evaluated spatial relationships of database image Ai and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a corresponding spatial relationship between corresponding points of interest in the database image Ai.

[0069] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and vertical dimensions for the 10 spatial relationship permutations between the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match.

[0070] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image Ai is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image Ai is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.

[0071] For example, if it is evaluated in step (1) that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match with "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score N_Ai, where N for each database image is calculated using the formula N = n_1 + n_2 + ... + n_20 (n_x representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
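As a minimal sketch of the scoring described above, and assuming the 20 spatial relationships of the query and of a database image have already been evaluated as values of 1, -1 or 0 and listed in the same order, the accumulated score N could be computed as follows. The function name `accumulated_score` and the sample values are hypothetical and are not taken from the described embodiment.

```python
from typing import Sequence


def accumulated_score(query_rel: Sequence[int], image_rel: Sequence[int]) -> int:
    """Count matching spatial relationships (n_x = 1 for a match, 0 otherwise)."""
    if len(query_rel) != len(image_rel):
        raise ValueError("both sequences must list the same spatial relationships")
    return sum(1 for q, d in zip(query_rel, image_rel) if q == d)


# Hypothetical example: 20 relationships, of which the last three differ.
query = [-1, 1, 0, 1, -1, -1, 1, 1, 0, -1, 1, 1, -1, 0, 1, -1, 1, -1, 1, 1]
image = query[:17] + [1, -1, 0]
print(accumulated_score(query, image))
```

In this illustrative case 17 of the 20 relationships match, so N = 17.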

[0072] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L is rounded up to 0. Therefore, L is in the range from 0 to 1. In this case, 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases to complete accuracy.

[0073] The same process is repeated for database image Bi, database image Ci, etc. so as to generate a respective spatial composition similarity metric L (i.e. L_Bi, L_Ci, etc.) for each database image stored in the image database 200.
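The chance correction described above can be illustrated with a short sketch. The function name `spatial_composition_metric` and its default arguments are assumptions chosen for illustration; only the values 20 and 7.5 and the flooring at 0 come from the description.

```python
def spatial_composition_metric(n: float, total: int = 20, chance: float = 7.5) -> float:
    """Chance-corrected metric L = (N - chance) / (total - chance), floored at 0."""
    return max(0.0, (n - chance) / (total - chance))


# Worked values: N = 20 gives 1.0, N = 12 gives 4.5 / 12.5 = 0.36,
# and any N at or below the chance level of 7.5 gives 0.0.
for n in (20, 12, 7.5, 5):
    print(n, spatial_composition_metric(n))
```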

[0074] The query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the comparison of step (2). Specifically, the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to one of the word labels received at the input module 320. The query module 330 may be configured to perform the comparison of step (2) on the retrieved subset of database images only.

Step (3):

[0075] The query module 330 is arranged to calculate a similarity ranking value for each database image, based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step (2). In more detail, in the present embodiment the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step (2) for each database image stored in the image database 200.

[0076] For example, database image Ai may have a generated metric L_Ai of 0.37, database image Bi may have a generated metric L_Bi of 0.20, database image Ci may have a generated metric L_Ci of 0.44, and database image Di may have a generated metric L_Di of 0.27. According to the process for generating a spatial composition similarity metric for a database image as described in step (2), the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity. Therefore, in this example, the ranking value for database images Ai to Di would be - database image Ai similarity ranking value: 2; database image Bi similarity ranking value: 4; database image Ci similarity ranking value: 1; and database image Di similarity ranking value: 3. In other words, the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial relationships between the points of interest.

[0077] In the present embodiment, the similarity ranking value for each database image calculated in step (3) is stored in the storage module 340.

[0078] Subsequent to the calculation of ranking values by the query module 330, the display 310 is further arranged to display the plurality of images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example as described above, the display 310 would be arranged to display database images Ai to Di in the order of: Ci, Ai, Di, and Bi. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.
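The ranking and display order in the example above can be reproduced with a short sketch. The function name `rank_images` and the use of a dictionary keyed by image identifier are assumptions for illustration; the description does not specify how ties are handled, and this sketch simply keeps the input order for equal metrics.

```python
from typing import Dict, List


def rank_images(metrics: Dict[str, float]) -> Dict[str, int]:
    """Map each image identifier to its rank (1 = highest metric, most similar)."""
    ordered = sorted(metrics, key=metrics.get, reverse=True)
    return {image_id: rank for rank, image_id in enumerate(ordered, start=1)}


# Metric values taken from the example in the description.
metrics = {"Ai": 0.37, "Bi": 0.20, "Ci": 0.44, "Di": 0.27}
ranks = rank_images(metrics)

# The display order lists the images from rank 1 downwards.
display_order: List[str] = sorted(ranks, key=ranks.get)
print(ranks)
print(display_order)
```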

[0079] Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another exemplary embodiment.

[0080] The present embodiment is similar to that described in relation to Figure 3. Each database image stored in the image database 200 comprises a plurality of points of interest. Moreover, in the present embodiment each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image. This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc.

[0081] The process starts at step 41. In step 41, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described with relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.

[0082] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.

[0083] In addition, the input module 320 in the present embodiment comprises a microphone which is arranged to receive a voice input for each received point of interest, and the input module 320 generates a word label corresponding to each voice input.

[0084] For example, for a first received point of interest the user may utter the word "tree" towards the microphone of the input module 320. This voice input is then processed by the input module 320 to generate the word label "tree" which becomes assigned to the first received point of interest, either automatically by the input module 320 or manually by the user manipulating the touch screen display.

[0085] In the subsequent step 42, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest. In the context of step 42, the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.

[0086] Specifically, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis. The evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.
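The evaluation in step 42 can be sketched as follows, assuming points are given as (x, y) cell coordinates on the 9x9 grid and that each unordered pair of points is evaluated once per axis. The helper names `sign` and `spatial_relationships`, the pair ordering, and the sample coordinates are illustrative assumptions rather than details of the described embodiment.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y), each from 1 to 9


def sign(value: int) -> int:
    """Return 1, -1 or 0 for a positive, negative or zero difference."""
    return (value > 0) - (value < 0)


def spatial_relationships(points: Sequence[Point]) -> List[int]:
    """For each pair (first, second), give the x- then y-relationship of first to second."""
    relations: List[int] = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        relations.append(sign(x1 - x2))  # higher (1), lower (-1) or equal (0) on x
        relations.append(sign(y1 - y2))  # higher (1), lower (-1) or equal (0) on y
    return relations


# Five hypothetical points of interest give 10 pairs and 20 relationships.
points = [(2, 3), (5, 3), (7, 8), (1, 9), (5, 5)]
relations = spatial_relationships(points)
print(len(relations), relations[:2])
```

For the first pair, (2, 3) against (5, 3), the x-relationship is "lower" (-1) and the y-relationship is "equal" (0).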

[0087] The method then proceeds to step 43, in which the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of points of interest can be evaluated in a quantifiable manner.

[0088] For example, an exemplary database image Ai may have five points of interest that correspond to the received five points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label. The corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "sky" is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "sun", "blue", and "cloud", etc.
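One way to picture the matching of points of interest by word label is the sketch below. The table `RELATED_LABELS` is a hypothetical stand-in for whatever synonym or generalisation resource an implementation might use, and the function names are likewise assumptions; only the "sky" example is drawn from the description.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[int, int]

# Hypothetical table of related labels; a real system might consult a thesaurus.
RELATED_LABELS: Dict[str, Set[str]] = {"sky": {"sun", "blue", "cloud"}}


def labels_correspond(received: str, stored: str) -> bool:
    """True for an exact match or a stored label related to the received one."""
    received, stored = received.lower(), stored.lower()
    return received == stored or stored in RELATED_LABELS.get(received, set())


def corresponding_point(received_label: str, image_points: Dict[str, Point]) -> Optional[Point]:
    """Return the first point in the image whose label corresponds, if any."""
    for stored_label, point in image_points.items():
        if labels_correspond(received_label, stored_label):
            return point
    return None


# Hypothetical database image with two labelled points of interest.
image_points = {"tree": (2, 7), "cloud": (6, 1)}
print(corresponding_point("sky", image_points))
```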

[0089] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step 42) with the evaluated spatial relationships of database image Ai and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a corresponding spatial relationship between corresponding points of interest in the database image Ai.

[0090] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and vertical dimensions for the 10 spatial relationship permutations between the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match.

[0091] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image Ai is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image Ai is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.

[0092] For example, if it is evaluated in step 42 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match with "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score N_Ai, where N for each database image is calculated using the formula N = n_1 + n_2 + ... + n_20 (n_x representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.

[0093] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L is rounded up to 0. Therefore, L is in the range from 0 to 1. In this case, 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases to complete accuracy.

[0094] The same process is repeated for database image Bi, database image Ci, etc. so as to generate a respective spatial composition similarity metric L (i.e. L_Bi, L_Ci, etc.) for each database image stored in the image database 200.

[0095] The query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the comparison of step 43. Specifically, the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to one of the word labels received at the input module 320. The query module 330 may be configured to perform the comparison of step 43 on the retrieved subset of database images only.
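The optional pre-filtering by word label could look like the sketch below. It assumes that an image qualifies when it carries at least one label equal to a received label; the description leaves open whether one or all labels must correspond, so this threshold, like the function name `candidate_images` and the sample data, is an assumption.

```python
from typing import Dict, Iterable, List, Set


def candidate_images(image_labels: Dict[str, Set[str]], received_labels: Iterable[str]) -> List[str]:
    """Return identifiers of images carrying at least one of the received labels."""
    wanted = {label.lower() for label in received_labels}
    return [
        image_id
        for image_id, labels in image_labels.items()
        if wanted & {label.lower() for label in labels}
    ]


# Hypothetical database: only images sharing a label with the query are kept.
database = {"Ai": {"tree", "house"}, "Bi": {"car"}, "Ci": {"Tree", "sky"}}
subset = candidate_images(database, ["tree", "dog"])
print(subset)
```

The comparison of spatial relationships would then be run on `subset` only, which reduces the number of images to be scored.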

[0096] In step 44, the query module 330 calculates a similarity ranking value for each database image based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step 43. In more detail, in the present embodiment the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step 43 for each database image stored in the image database 200.

[0097] For example, database image Ai may have a generated metric L_Ai of 0.37, database image Bi may have a generated metric L_Bi of 0.20, database image Ci may have a generated metric L_Ci of 0.44, and database image Di may have a generated metric L_Di of 0.27. According to the process for generating a spatial composition similarity metric for a database image as described in step 43, the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity. Therefore, in this example, the ranking value for database images Ai to Di would be - database image Ai similarity ranking value: 2; database image Bi similarity ranking value: 4; database image Ci similarity ranking value: 1; and database image Di similarity ranking value: 3. In other words, the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial relationships between the points of interest.

[0098] In the present embodiment, the similarity ranking value for each database image calculated in step 44 is stored in the storage module 340.

[0099] After the calculation of ranking values in step 44, in step 45 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated ranking values. In the example as described above, in step 45 the display 310 would be arranged to display database images Ai to Di in the order of: Ci, Ai, Di, and Bi. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.

[0100] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined.

[0101] Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.

[0102] The process starts at step 51. In step 51, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described with relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.

[0103] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.

[0104] In the subsequent step 52, the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 51. In the present embodiment, the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 51. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.

[0105] The method then proceeds to step 53, in which the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and a corresponding point of interest in the database image.

[0106] In the present embodiment, each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.

[0107] As such, the query module 330 in the present embodiment is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned the numerical label '1'. The same calculation process is repeated for the other received points of interest and each of the other database images stored in the image database 200.

[0108] In step 54, the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area.

[0109] By way of illustrating how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area. In this particular example, any point of interest of the interrogative query received in step 51 can be at most displaced from the target by 4 cells in the x-axis or the y-axis. Based on the working approximation that any guessed input of a point of interest is equally likely in any location in the input area, the average displacement of a point of interest of an interrogative query (C[x,y]) by chance for a target with the coordinates [5,5] is 2.2 cells - i.e. C[5,5] = 2.2. On the other hand, a target with the coordinates [1,1] has an average displacement of an interrogative query by chance of 4 cells, i.e. C[1,1] = 4, since a guessed input of a point of interest can be at most displaced from the target by 8 cells in the x-axis or the y-axis.

[0110] Accordingly, on the basis of the working approximation that all guessed input of points of interest are equally likely, the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340.
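A look-up table of this kind can be sketched as follows. The sketch assumes that displacement is measured along a single axis as an absolute number of cells and that every cell is an equally likely guess; this interpretation is an assumption, although it does reproduce the quoted values of 2.2 cells for a target coordinate of 5 and 4 cells for a target coordinate of 1. The names `chance_displacement` and `CHANCE_TABLE` are hypothetical.

```python
from typing import Dict

GRID_SIZE = 9  # cells per axis in the 9x9 input area


def chance_displacement(target: int, size: int = GRID_SIZE) -> float:
    """Mean absolute displacement along one axis from `target` for a uniformly random guess."""
    return sum(abs(guess - target) for guess in range(1, size + 1)) / size


# Look-up table keyed by coordinate value 1..9; the same table serves both axes
# under the per-axis assumption stated above.
CHANCE_TABLE: Dict[int, float] = {c: chance_displacement(c) for c in range(1, GRID_SIZE + 1)}

print(round(CHANCE_TABLE[5], 1), CHANCE_TABLE[1])
```

Because the table depends only on the grid size, it can be computed once and stored, as the description suggests for the storage module 340.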

[0111] In step 54, for each point of interest in a database image, the query module 330 retrieves an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generates a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances. Using the formula C = 1 - [normalised calculated distance], C for each of the received points of interest can be calculated, where a value of 0 represents chance values and 1 represents absolutely accurate recall. In this embodiment, any negative value of C is rounded up to 0. Also, in the present embodiment, the accuracy metric D for each database image is the average of C corresponding to all points of interest within the database image. In other words, the calculated similarity ranking value indicates the ranking of overall proximity similarity of a database image to the received points of interest in terms of the locations of the points of interest.
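The calculation of the accuracy metric D can be sketched as below. The description does not give the exact form of the weighting factor; this sketch assumes it is the reciprocal of the chance displacement retrieved from the look-up table, so that a distance equal to the chance displacement yields C = 0. The function name `accuracy_metric` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def accuracy_metric(distances: Sequence[float], chance: Sequence[float]) -> float:
    """Average of C = max(0, 1 - distance * weight) over all points of interest."""
    if not distances or len(distances) != len(chance):
        raise ValueError("need one chance displacement per calculated distance")
    scores: List[float] = []
    for distance, chance_value in zip(distances, chance):
        weight = 1.0 / chance_value  # assumed form of the weighting factor
        normalised = distance * weight
        scores.append(max(0.0, 1.0 - normalised))  # negative C is rounded up to 0
    return sum(scores) / len(scores)


# Hypothetical distances (in cells) and chance displacements for three points:
# C values are 1.0, 0.75 and 0.0, so D = 1.75 / 3.
print(accuracy_metric([0.0, 1.0, 4.0], [2.0, 4.0, 2.0]))
```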

[0112] The method then proceeds to step 55, in which the query module 330 calculates a similarity ranking value for each database image based on the calculated distances. Specifically, in the present embodiment, the similarity ranking value for each of the plurality of database images is calculated based on the accuracy metrics calculated in step 54.

[0113] For example, database image A2 may have an accuracy metric D_A2 of 0.29, database image B2 may have an accuracy metric D_B2 of 0.37, database image C2 may have an accuracy metric D_C2 of 0.34, and database image D2 may have an accuracy metric D_D2 of 0.28. According to the process for calculating an accuracy metric for a database image as described in step 54, the calculated accuracy metric for a database image indicates a degree of similarity of the database image with the received points of interest in terms of the proximity of the locations of corresponding points of interest, wherein a lower accuracy metric indicates a lower degree of similarity and a higher accuracy metric indicates a higher degree of similarity. Therefore, in this example, the similarity ranking value for database images A2 to D2 would be - database image A2 similarity ranking value: 3; database image B2 similarity ranking value: 1; database image C2 similarity ranking value: 2; and database image D2 similarity ranking value: 4.

[0114] After the calculation of ranking values in step 55, in step 56 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example as described above, in step 56 the display 310 would be arranged to display database images A2 to D2 in the order of: B2, C2, A2, and D2. The user is therefore presented with a ranked list of database images according to a level of similarity in terms of the proximity of the locations of corresponding points of interest.

[0115] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined. For example, step 52 may be omitted from the method described above, such that no numerical labels are assigned to the points of interest and no numerical labels are pre-assigned to the points of interest of the images stored in the image database 200. Also, as another example, step 54 may be omitted from the method described above. In this case, the calculation of ranking values would not be based on the accuracy metric, but directly on the calculated distances instead.

[0116] Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.

[0117] The process starts at step 61. In step 61, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described with relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.

[0118] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.

[0119] In the subsequent step 62, the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 61. In the present embodiment, the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 61. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.

[0120] In the subsequent step 63, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest. In the context of step 63, the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.

[0121] Specifically, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis. The evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.

[0122] The method then proceeds to step 64, in which the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of points of interest can be evaluated in a quantifiable manner.

[0123] In the present embodiment, each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.

[0124] The exemplary database image A3 may have five points of interest that correspond to the received five points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned numerical label with a point of interest in the database image with the same numerical label.

[0125] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step 63) with the evaluated spatial relationships of database image A3 and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a corresponding spatial relationship between corresponding points of interest in the database image A3.

[0126] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and vertical dimensions for the 10 spatial relationship permutations between the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match.

[0127] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image A3 is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image A3 is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.

[0128] For example, if it is evaluated in step 63 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image A3 is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image A3 is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match with "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score N_A3, where N for each database image is calculated using the formula N = n_1 + n_2 + ... + n_20 (n_x representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.

[0129] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L is rounded up to 0. Therefore, L is in the range from 0 to 1. In this case, 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases to complete accuracy.

[0130] The same process is repeated for database image B3, database image C3, etc. so as to generate a respective spatial composition similarity metric L (i.e. L_B3, L_C3, etc.) for each database image stored in the image database 200.

[0131] The method then proceeds to step 65, in which the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and a corresponding point of interest in the database image.

[0132] As mentioned, each image in the image database 200 in the present embodiment comprises at least one numerical label which is pre-assigned to a point of interest in the image. The query module 330 is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned the numerical label '1'. The same calculation process is repeated for the other received points of interest and each of the other database images stored in the image database 200.

[0133] In step 66, the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area.

[0134] By way of illustrating how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area. In this particular example, any point of interest of the interrogative query received in step 51 can be at most displaced from the target by 4 cells in the x-axis or the y-axis. Based on the working approximation that any guessed input of a point of interest is equally likely in any location in the input area, the average displacement of a point of interest of an interrogative query (C[x,y]) by chance for a target with the coordinates [5,5] is 2.2 cells, i.e. C[5,5] = 2.2. On the other hand, a target with the coordinates [1,1] has an average displacement of an interrogative query by chance of 4 cells, i.e. C[1,1] = 4, since a guessed input of a point of interest can be at most displaced from the target by 8 cells in the x-axis or the y-axis.

[0135] Accordingly, on the basis of the working approximation that all guessed inputs of points of interest are equally likely, the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340.

[0136] In step 66, for each point of interest in a database image, the query module 330 can retrieve an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generate a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances. Using the formula C = 1 - [normalised calculated distance], C for each of the received points of interest can be calculated, where a value of 0 represents chance values and 1 represents absolutely accurate recall. In this embodiment, any negative value of C is rounded up to 0. Also, in the present embodiment, the accuracy metric D for each database image is the average of C corresponding to all points of interest within the database image.
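The look-up table and accuracy metric described above may be sketched as follows, under stated assumptions. The description does not fix the distance measure or the exact form of the weighting factor. In this sketch the distance is assumed to be the mean of the x-axis and y-axis displacements in cells (a choice which reproduces the quoted chance values of about 2.2 for [5,5] and 4 for [1,1]), and the weighting factor is assumed to be the reciprocal of the chance displacement C[x,y]. All function names are hypothetical.

```python
GRID = 9  # 9x9 cellular grid, cells numbered 1..9 along each axis

def displacement(a, b):
    """Assumed distance: mean of the x-axis and y-axis displacements in cells."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / 2

def chance_displacement(target, grid=GRID):
    """Average displacement from `target` of a guess equally likely in any cell."""
    cells = [(x, y) for x in range(1, grid + 1) for y in range(1, grid + 1)]
    return sum(displacement(c, target) for c in cells) / len(cells)

# Pre-calculated look-up table with one entry per possible target cell.
LOOKUP = {(x, y): chance_displacement((x, y))
          for x in range(1, GRID + 1) for y in range(1, GRID + 1)}

def point_score(received, target):
    """C = 1 - normalised distance, with negative values set to 0."""
    weight = 1 / LOOKUP[target]  # assumed form of the weighting factor
    return max(0.0, 1 - displacement(received, target) * weight)

def accuracy_metric(query, image):
    """D: average of C over the labelled points of interest of the image."""
    return sum(point_score(query[k], image[k]) for k in image) / len(image)
```

With these assumptions, a received point that coincides with its target scores 1, and a received point displaced by the chance average or more scores 0. It is noted that the description uses C both for the chance displacement C[x,y] and for the per-point score; the sketch separates the two as `LOOKUP` and `point_score`.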

[0137] The method then proceeds to step 67, in which the query module 330 calculates a similarity ranking value for each database image based on the calculated distances in step 65 and the comparison of spatial relationships in step 64. In the present embodiment, the calculation of similarity ranking values is based on a composite metric Q which is evaluated using the formula Q = (L+D)/2.

[0138] For example, database image A3 may have a generated metric L_A3 of 0.31 and an accuracy metric D_A3 of 0.37, which gives a composite metric Q_A3 of 0.34; database image B3 may have a generated metric L_B3 of 0.29 and an accuracy metric D_B3 of 0.29, which gives a composite metric Q_B3 of 0.29; database image C3 may have a generated metric L_C3 of 0.37 and an accuracy metric D_C3 of 0.35, which gives a composite metric Q_C3 of 0.36; and database image D3 may have a generated metric L_D3 of 0.36 and an accuracy metric D_D3 of 0.34, which gives a composite metric Q_D3 of 0.35. The composite metric indicates a degree of similarity of the database image with the received points of interest both in terms of the proximity of the locations of corresponding points of interest ("proximity similarity") and the spatial relationships between the points of interest ("configural similarity"), wherein a higher composite metric indicates a higher degree of similarity and a lower composite metric indicates a lower degree of similarity. Therefore, in this example, the similarity ranking values for database images A3 to D3 would be - database image A3 similarity ranking value: 3; database image B3 similarity ranking value: 4; database image C3 similarity ranking value: 1; and database image D3 similarity ranking value: 2.
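The worked example above may be reproduced with the following short sketch, which evaluates Q = (L+D)/2 for the four database images and ranks them from highest to lowest Q. The variable names are hypothetical; the metric values are those given in the example.

```python
def composite(l, d):
    """Composite metric Q = (L + D) / 2."""
    return (l + d) / 2

# (L, D) pairs for database images A3 to D3, taken from the example.
metrics = {"A3": (0.31, 0.37), "B3": (0.29, 0.29),
           "C3": (0.37, 0.35), "D3": (0.36, 0.34)}

q = {name: composite(l, d) for name, (l, d) in metrics.items()}

# A higher Q indicates a higher similarity, so rank 1 goes to the largest Q.
ordered = sorted(q, key=q.get, reverse=True)
ranks = {name: i for i, name in enumerate(ordered, start=1)}

print(ordered)  # ['C3', 'D3', 'A3', 'B3']
```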

[0139] After the calculation of ranking values in step 67, in step 68 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example as described above, in step 68 the display 310 would be arranged to display database images A3 to D3 in the order of: C3, D3, A3, and B3. The user is therefore presented with a ranked list of database images according to both proximity similarity and configural similarity. By employing both object-centred recall, in which the proximity of individual points of interest in relation to their notional correspondent in a database image is quantified, and scene-configural recall, in which recall of the relative locations of a set of points of interest is quantified, the present embodiment provides a synergistic effect in query-processing in terms of image retrieval accuracy, for example compared to the embodiments described in relation to Figures 4 and 5.

[0140] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined. For example, step 68 may be omitted from the method described above, such that the database images are not displayed.

[0141] Although it is described in an embodiment above that the display and the input module of the device are integrated as a touch screen display, in alternative embodiments the display and the input module may not be integrated as a single component. In addition, in alternative embodiments, the display may comprise other types of display devices, and the input module may comprise other types of input devices, e.g. a mouse, a keyboard, etc.

[0142] In alternative embodiments, the storage modules of the devices in Figure 1 and Figure 3 may be implemented as external components of the devices. Also, in alternative embodiments, the image database may be integrated into the devices of Figure 1 and Figure 3.

[0143] In alternative embodiments, the device may not be arranged to receive word labels that correspond to received/determined points of interest.

[0144] Although it is described in an embodiment above that corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label, in alternative embodiments, corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned numerical label with a point of interest in the database image with the same pre-assigned numerical label.

[0145] Although it is described in an embodiment above that the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received as at least part of a user query, in alternative embodiments the received points of interest may be assigned with numerical labels according to other factors, e.g. the location of each point of interest within the input area. Also, in other alternative embodiments, the received points of interest may not be assigned with numerical labels at all.

[0146] Although it is described in an embodiment above that the plurality of determined points of interest are assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user as an image is being displayed to the user, in alternative embodiments the determined points of interest may be assigned with numerical labels according to other factors, e.g. the location of each point of interest within the input area, or a duration for which the user focused their gaze on the one or more determined points of interest. Also, in other alternative embodiments, the determined points of interest may not be assigned with numerical labels at all.

[0147] Although it is described in an embodiment above that receiving the at least one word label comprises receiving a voice input for each received/determined point of interest, in alternative embodiments the at least one word label may be received by other methods, e.g. keyboard input. In these alternative embodiments, the input module may comprise a keyboard.

[0148] Although it is described in an embodiment above that the input area presented to the user at the display is quantised into a uniform 9x9 cellular grid, in alternative embodiments the input area may be quantised with a different grain of analysis, e.g. a 15x15 cellular grid, a 6x8 cellular grid, etc., according to requirements of the system and the sizes/dimensions of the database images in the image database.
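The quantisation of the input area into a cellular grid may be illustrated with the following minimal sketch. It is assumed, purely for illustration, that positions are given in pixels with the origin at the top-left corner of the input area and that cells are numbered from 1; the function name and parameters are hypothetical.

```python
def to_cell(px, py, width, height, cols=9, rows=9):
    """Map a pixel position to a 1-based (column, row) cell of the grid."""
    # Positions on the far edge are clamped into the last column or row.
    col = min(int(px * cols / width), cols - 1) + 1
    row = min(int(py * rows / height), rows - 1) + 1
    return col, row
```

With the default 9x9 grid, the centre of a 900x900 pixel input area maps to cell (5, 5); a different grain of analysis is obtained by passing other values of `cols` and `rows`.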

[0149] In alternative embodiments, the system may not be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels prior to a comparison of spatial relationships of points of interest.

[0150] Although it is described in an embodiment above that the weighting factor associated with a received point of interest is generated based on the average displacement of a point of interest of an interrogative query from a specific target, in alternative embodiments the weighting factor may be generated based on other factors.

[0151] In alternative embodiments, the images stored in the image database may each comprise an identification number, and the display may be arranged to display the plurality of database images in a list according to the calculated ranking values together with their respective identification numbers.

[0152] In alternative embodiments, for each database image, a spatial relationship between each point of interest and each other point of interest may be pre-evaluated and stored together with the database image.

[0153] Although it is described in an embodiment above that the tracking module comprises an eye tracker including a camera for recording movement of one or both eyes of the user, in alternative embodiments the tracking module may comprise other types of tracking device such as an eye-attached tracking device or an electrical potential measuring device.

[0154] In alternative embodiments, the control module of the device may be arranged to perform evaluation of a spatial relationship between each determined point of interest and each other determined point of interest in the manner as described in relation to Figure 3. In these alternative embodiments, the storage module of the device may be arranged to store a numerical representation of the evaluated spatial relationship of a database image together with the database image.

[0155] Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware, and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an exemplary embodiment, the application software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate, or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

[0156] Although various aspects of the present disclosure are set out in the independent claims, other aspects of the present invention may comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

[0157] It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims. For example, in some of the embodiments described throughout this description, word labels are assigned to received points of interest and in some other embodiments numerical labels are assigned to the received points of interest. It will be appreciated that in some alternative embodiments, the step of assigning word labels may be replaced with the step of assigning numerical labels, and vice versa. In some further alternative embodiments, the method may comprise assigning word labels and numerical labels to the received points of interest. In addition, in some further alternative embodiments, the method may comprise assigning other types of alphanumeric descriptors to the received points of interest.

Appendix A

In order to illustrate methods of the present invention in more detail, descriptions of a set of 27 images that can be used in relation to the method shown in Figure 3 are included as follows. It is noted that the descriptions provided in the following merely serve as examples of image content and potential points of interest that a user may focus on during an eye-tracking operation in the method of Figure 3. It will be understood that other images with different points of interest may be shown to users for the purpose of storing the images in a database.

Image 1: University Campus

During the last sunshine of autumn, a student is riding a bike to a lecture, others are walking to the nearby dining hall or walking home. Another student is sitting on a bench rehearsing the last seminar. The leaves of the trees are shining in autumnal colours above the green lawn.

Image 2: Basketball Game

An offensive player is shooting the ball for three points while the defensive effort of the player with the headband will be too late. The other players are trying to get into a good position for a potential rebound. The display panel under the roof is also showing the game from a different viewpoint.

Image 3: Building Site

In a small town, preparations are being made to renew the pipeline system. The workers are currently having their lunch break in a red van, while the man in charge is sitting in a white van behind it. The material, which includes bricks and red and black connectors, is placed where the works are set to continue, along with a traffic sign that was removed from a nearby street.

Image 4: Cargo Ship

One of the world's largest cargo ships, the "Colombo Express" is approaching a harbour to have its hundreds of containers unloaded. The pilot vessel from its side looks small. The life boat at the stern is capable of transporting all of the crew.

Image 5: Car Park

This place is usually a car park for a logistics company. Due to additional works, also indicated by the shovel in the front, a green tractor with a drill at the back is left during a break at the car park. The other cars, most of them middle class compact cars, are undamaged.

Image 6: Cafe in the Desert

Somewhere in the desert, a traveller is parking his red convertible outside a derelict cafe. At this place, where two houses and an ice cream shop once stood, he is hoping to find some leftover refreshment, but the only things that still remain are the telephone and light posts, which still connect the cafe to the rest of the world.

Image 7: Fjord in Spring

This Fjord is the place where many fishermen start their day. One of them lives in the old grey house with the back garden. From here he has a wonderful view across the Fjord with its blue water, the hills at the shores and fishing boats.

Image 8: Flock of Sheep

A flock of sheep are grazing in front of a grey wooden house. The house was built on a slope and has two storeys. It also has a veranda on the right. There is even a small hut hidden in the forest.

Image 9: Hill in Cloud

A red vintage tractor is left on top of three garages which were built into the field in front of this mixed forest. The hill on which this forest is growing is engulfed in rain clouds, which might be a reason for the driver to leave the tractor outside.

Image 10: School in field

Before the 19th century, this old single storey school with iron swings in the back garden was the home to the children of settlers in the west of the USA. The view from the school goes beyond the bell tower to the mountains in the far distance, back to the swings next to the school, where the children used to play during school breaks.

Image 11: Italian Port

This small port in Italy is dominated by the yellow four-storey house right at the port's footbridge. Some of the house's green window shutters are left open. The blue sign indicates a nautical shop. All places at the footbridge are occupied by small sailing and motor boats.

Image 12: Mountain Panorama

A mountaineer is refilling his water bottle at high altitude in the Swiss Alps. In the yellow rucksack are all the necessary items for a long hike. The drinking trough is usually for the cows which spend the summer months on higher grasslands. In the background are the snow covered mountain tops.

Image 13: Narrow Street

The narrow street, which leads down to the port of an old fishing village, is just wide enough for the white Range Rover. A tourist at the side gives way. A bin protrudes from one house entrance at the side, and the old church tower can be seen in the distance.

Image 14: NASA Hangar

In front of this NASA Hangar is a selection of planes, ranging from small prop airliners in the back via a blue jet fighter in the middle to larger passenger airplanes in the front. Even a helicopter is on display to the left. Behind the hangar are a few radio towers.

Image 15: Old Settlement

This is one of the first settlements in North America. The sign gives the date of the foundation and the building behind it later housed the town mayor, as the American Flag at the entrance shows. The Road in front of the house, the Bayley-Hazen Road, leads to the town's school.

Image 16: Palace Gardens

These palace gardens in France open their gates to the public in summer to show what lies behind the richly decorated iron garden fence. The white fence encloses a collection of red tulips, green hedges and a tree which have grown on the premises for decades. The fence also has stone crowns to show the wealth of the former owners.

Image 17: Paris

Art dealers set up their shops alongside a busy road in the French capital with the most famous Parisian cathedral in the background. Behind the trees are the towers and spires of the cathedral. Spectators walk past the exhibited pictures and reprints that are on display.

Image 18: Pedestrian Crossing

A pedestrian is patiently waiting at this crossing in Vienna. The red sign indicates the Museums quarter in this city, which seems to start behind the pedestrian. The traffic on this day is very busy, as is obvious from the cars standing on and around the crossing.

Image 19: Icy River

This picture of a river was taken in January. The low temperatures at this time of the year make it form icicles where the water falls several centimetres. Nonetheless, underneath the snow mantle, there are the first signs of Spring.

Image 20: Sailing Vessel

An officer on the deck of a sailing vessel is silently watching another vessel which is closer to the port. Visitors are on board the ship with the golden ornaments while sailors are climbing on the masts. The ropes on the officer's vessel are neatly aligned on the wooden deck.

Image 21: Space Engineers

Some space engineers are discussing a problem with the casing of a communication satellite. Their blue badges identify them as NASA employees. The two engineers in the front are talking about the thermal protection shield whereas the engineers in the back are inspecting the bolts of the same shield.

Image 22: Space Observatory

To be unaffected by other light sources space observatories are built far away from modern civilization. These two observatories with their big domes house the telescopes and are connected via a small road.

Image 23: San Francisco Suburb

As night falls people arrive home in their yellow, blue, white or green wooden houses in a suburb of this famous American city. As the street lights are turned on, life still goes on downtown in the skyscrapers and nightclubs. The tallest building in the centre is overlooking the rest of the city.

Image 24: Tower House

This newly refurbished tower house is rehousing nurses. Some nurses have just come home from their day shift with their file folders. Its three storeys were the home of the biggest vase manufacturing company in Europe. The tower with its rounded silver top still remains the symbol of the town.

Image 25: English Village

In the centre of an English village lies a small park with a tree, a lawn and a hedge. The park itself is surrounded by old buildings with shops and small streets. Old street lights and benches in the park round up the image of a cosy, small, old village.

Image 26: Windmill

A Mediterranean windmill with a pitched red roof and small rectangular windows is set on top of a hill overlooking the surrounding areas. It also houses a restaurant with outside dining area, popular with tourists and locals. The green door at the front leads to another garden area adjoining the brown bricks of the building's foundation.

Image 27: Windsurfing

A strong wind blows the contestants of a regional windsurfing championship to high speed. With the rocks in the back the surfer at the front has taken the lead by more than 15 lengths ahead of the last surfers with the red and green sails who have just passed the surface marker buoy.

Claims

1. A method of processing an interrogative query for a plurality of images stored in an image database, comprising:

receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area;

evaluating a spatial relationship between each received point of interest and each other point of interest;

for each image in the database, comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and

calculating a similarity ranking value for each database image based on the comparison of spatial relationships.

2. The method of claim 1, further comprising displaying the plurality of database images in a list according to the calculated similarity ranking values.

3. The method of claim 1 or claim 2, wherein a spatial relationship along the x-axis and a spatial relationship along the y-axis are evaluated between each received point of interest and each other point of interest.

4. The method of claim 3, wherein each spatial relationship is evaluated as a first point of interest having a coordinate value which is higher, lower or equal to that of a second point of interest with respect to the x-axis or the y-axis; and

the evaluation of each spatial relationship is represented with a numerical value of 1, -1 or 0 respectively.

5. The method of claim 4, wherein the comparison of spatial relationships comprises determining whether the numerical values representing the evaluation of the two spatial relationships are a match, and assigning a score to the comparison based on the determination.

6. The method of any preceding claim, further comprising receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.

7. The method of claim 6, wherein the receiving word labels comprises receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.

8. The method of claim 6 or claim 7, further comprising generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.

9. The method of any one of claims 6 to 8, wherein the corresponding word labels in the plurality of database images are synonyms or generalisations of the received word labels.

10. The method of any one of claims 1 to 5, further comprising:

assigning a numerical label to each of the plurality of received points of interest;

wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.

11. The method of claim 10, wherein the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received.

12. The method of claim 10, wherein the plurality of received points of interest are assigned with numerical labels according to the location of each point of interest within the input area.

13. The method of claim 2 and of any of claims 3 to 12 when dependent on claim 2, wherein displaying the plurality of database images in a list according to the calculated similarity ranking values comprises displaying the plurality of database images in an order of highest similarity ranking value to lowest similarity ranking value.

14. A method of processing an interrogative query for a plurality of images stored in an image database, comprising:

receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area;

for each image in the database, calculating a distance between each received point of interest and a corresponding point of interest in the database image; and

calculating a similarity ranking value for each database image based on the plurality of calculated distances.

15. The method of claim 14, further comprising displaying the plurality of database images in a list according to the calculated similarity ranking values.

16. The method of claim 14 or claim 15, further comprising:

calculating an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area;

wherein the similarity ranking of database images is based on the calculated accuracy metric for each database image.

17. The method of any of claims 14 to 16, further comprising receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.

18. The method of claim 17, wherein the receiving word labels comprises receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.

19. The method of claim 17 or claim 18, further comprising:

generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.

20. The method of any one of claims 17 to 19, wherein the corresponding word labels in the plurality of database images are synonyms or generalisations of the received word labels.

21. The method of any of claims 14 to 16, further comprising assigning a numerical label to each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.

22. The method of claim 21, wherein the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received.

23. The method of claim 21, wherein the plurality of received points of interest are assigned with numerical labels according to the location of each point of interest within the input area.

24. The method of claim 15 and of any of claims 16 to 23 when dependent on claim 15, wherein displaying the plurality of database images in a list according to the calculated similarity ranking values comprises displaying the plurality of database images in order of highest similarity ranking value to lowest similarity ranking value.

25. A method for processing an interrogative query for a plurality of images stored in an image database, comprising:

for each image in the database:

comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image;

calculating a distance between each received point of interest and a corresponding point of interest in the database image; and

calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships.

26. A device for processing an interrogative query for an image database storing a plurality of images, comprising:

a display for displaying an input area;

an input module for receiving an input specifying locations for a plurality of points of interest within the input area; and

a query module configured to:

evaluate a spatial relationship between each received point of interest and each other point of interest;

for each image in the database, compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and

calculate a similarity ranking value for each database image based on the comparison of spatial relationships.

27. The device of claim 26, wherein the display is further configured to display the plurality of database images in a list according to the calculated similarity ranking values.

28. The device of claim 26 or claim 27, wherein the input module comprises at least one of: a mouse, a touch screen display apparatus, and an eye-tracking apparatus.

29. A method of structurally storing an image in an image database, comprising:

displaying the image, which is pending storage, to a user for a predetermined period of time;

tracking an eye movement of the user over the time period;

determining one or more points of interest which are fixated upon by the user during the time period;

storing the image with data representing the one or more points of interest.

30. The method of claim 29, further comprising:

displaying the one or more determined points of interest to the user;

receiving at least one word label for each of the displayed points of interest; and

storing the image with the at least one word label for each of the one or more points of interest.

31. The method of claim 30, wherein the receiving at least one word label comprises receiving a voice input for each determined point of interest and generating a word label corresponding to each voice input.

32. The method of claim 29, further comprising:

assigning a numerical label to each of the one or more determined points of interest; and

storing the image with the numerical label for each of the one or more points of interest.

33. The method of claim 32, wherein the one or more determined points of interest are assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user during the time period.

34. The method of claim 32, wherein the one or more determined points of interest are assigned with numerical labels according to the location of each point of interest within the input area.