WO2018046959A1 - Image storage and retrieval - Google Patents

Image storage and retrieval

Info

Publication number
WO2018046959A1
Authority
WO
WIPO (PCT)
Prior art keywords
interest
image
points
database
point
Application number
PCT/GB2017/052658
Other languages
French (fr)
Inventor
Mark LANSDALE
Original Assignee
University Of Leicester
Application filed by University Of Leicester filed Critical University Of Leicester
Publication of WO2018046959A1 publication Critical patent/WO2018046959A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/14: Details of searching files based on file metadata
    • G06F16/148: File search processing
    • G06F16/152: File search processing using file content signatures, e.g. hash values
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • the present invention relates to methods, devices, apparatuses and systems for storing images in a database and retrieving images from a database. Specifically, the present invention relates to methods and apparatuses for searching and displaying images from an image database, and for storing images in a database according to the spatial configuration of determined points of interest input by a user.

Background
  • Keyword-based image retrieval performs searches for images by matching keywords input by a user to keywords that have been pre-assigned to the images.
  • In keyword-based image retrieval, the retrieval efficiency may be limited due to the inability to match images for queries which include ambiguous descriptions.
  • Content-based image retrieval performs searches for images that are similar to an example image in terms of low-level image features, such as colour histogram, texture, shape, etc. Accordingly, queries based on content-based image retrieval may return completely irrelevant images which happen to contain similar low-level image features.
  • this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database, comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculating a similarity ranking value for each database image based on the comparison of spatial relationships.
  • the method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.
  • a spatial relationship along the x-axis and a spatial relationship along the y-axis may be evaluated between each received point of interest and each other point of interest.
  • Each spatial relationship may be evaluated as a first point of interest having a coordinate value which is higher, lower or equal to that of a second point of interest with respect to the x-axis or the y-axis; and the evaluation of each spatial relationship may be represented with a numerical value of 1, -1 or 0 respectively.
  • the comparison of spatial relationships may comprise determining whether the numerical values representing the evaluation of the two spatial relationships match.
  • the method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.
  • the receiving word labels may comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.
  • the method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.
  • the corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels.
  • the method may further comprise: assigning a numerical label to each of the plurality of received points of interest; wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.
  • the plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.
  • the plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.
  • Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in an order of highest similarity ranking value to lowest similarity ranking value.
  • this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; for each image in the database, calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances.
  • the method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.
  • the method may further comprise: calculating an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area; wherein the similarity ranking of database images is based on the calculated accuracy metric for each database image.
  • the method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.
  • the method may further comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.
  • the method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.
  • the corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels.
  • the method may further comprise assigning a numerical label to each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.
  • the plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.
  • the plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.
  • Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in order of highest similarity ranking value to lowest similarity ranking value.
  • this specification describes a method for processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database: comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image, and calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the comparison of spatial relationships and the plurality of calculated distances.
  • this specification describes a device for processing an interrogative query for an image database storing a plurality of images, comprising: a display for displaying an input area; an input module for receiving an input specifying locations for a plurality of points of interest within the input area; and a query module configured to: evaluate a spatial relationship between each received point of interest and each other point of interest; for each image in the database, compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculate a similarity ranking value for each database image based on the comparison of spatial relationships.
  • the display may be further configured to display the plurality of database images in a list according to the calculated similarity ranking values.
  • the input module may comprise at least one of: a mouse, a touch screen display apparatus, and an eye-tracking apparatus.
  • this specification describes a method of structurally storing an image in an image database, comprising: displaying the image, which is pending storage, to a user for a predetermined period of time; tracking an eye movement of the user over the time period; determining one or more points of interest which are fixated upon by the user during the time period; storing the image with data representing the one or more points of interest.
  • the method may further comprise: displaying the one or more determined points of interest to the user; receiving at least one word label for each of the displayed points of interest; and storing the image with the at least one word label for each of the one or more points of interest.
  • the receiving at least one word label may comprise receiving a voice input for each determined point of interest and generating a word label corresponding to each voice input.
  • the method may further comprise: assigning a numerical label to each of the one or more determined points of interest; and storing the image with the numerical label for each of the one or more points of interest.
  • the one or more determined points of interest may be assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user during the time period.
  • the one or more determined points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.
  • Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to an embodiment of the present invention;
  • Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another exemplary embodiment of the present invention;
  • Figure 3 is a block diagram illustrating a system for searching an image database which stores a plurality of images, according to another embodiment of the present invention;
  • Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention;
  • Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention; and
  • Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.
Description of Embodiments

  • Embodiments of the invention allow users to store a plurality of images based on points of interest in an image and their relative locations in the image, and to retrieve and display images based on queries containing points of interest and their relative locations in the image.
  • the invention utilises the spatial memory of users to manage an image database, which provides a more intuitive way of storing and retrieving images.
  • the invention exploits a theoretical understanding of human spatial memory to manage an image database, enabling distinct and useful channels of query not available to other query methods. Additionally, because the invention exploits natural psychological competences, it enables functionalities which allow users to effectively manipulate trade-offs between the cost of a search and its efficiency in a task-appropriate manner, making it an entirely novel and appropriate approach to a wide range of application domains.
  • Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to another exemplary embodiment.
  • the system 1 comprises a device 100 and an image database 200.
  • the device 100 comprises a display 110, a tracking module 120, a control module 130, a storage module 140, and an input module 150.
  • the display 110 is arranged to display an image to a user for a predetermined period of time.
  • the tracking module 120 is arranged to track eye movement of the user.
  • the tracking module 120 comprises an eye tracker including a camera which records movement of one or both eyes of the user as the user looks at the displayed image at the display 110.
  • the control module 130 is arranged to determine one or more points of interest which are fixated upon by the user during the time period in which the image is displayed to the user. Specifically, the control module 130 receives the recorded eye movement data by the eye tracker of the tracking module 120 and determines, based on where the user focused their gaze and a duration for which they focused their gaze, one or more points of interest which are fixated upon by the user.
  • the display 110 in the present embodiment is further arranged to display the one or more points of interest to the user, and the input module 150 is arranged to receive at least one word label from the user for each of the displayed points of interest.
  • the input module 150 of the present embodiment comprises a microphone to receive voice input from the user as the one or more determined points of interest are displayed. The input module 150 then generates a word label corresponding to each voice input and assigns it to the determined point of interest which is displayed by the display 110.
  • the displayed image is then stored along with data representing the one or more points of interest in the storage module 140.
  • the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a duration for which the user focused their gaze on the one or more determined points of interest, a frequency with which the user focused their gaze on the one or more determined points of interest, a word label assigned to the one or more points of interest.
  • numerical labels may be assigned to the determined points of interest, and the data representing one or more points of interest may include a numerical label assigned to the one or more points of interest.
  • the device 100 may not be arranged to display the one or more determined points of interest to the user or to receive word labels for each determined point of interest.
  • the storage module may be arranged to store the image along with locations of the determined points of interest.
  • Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another embodiment of the present invention.
  • In step 21, the display 110 displays an image to a user.
  • the display 110 may be arranged to display an image of an English village. In the centre of the English village there is a small park with a tree, a lawn, and a hedge. The park itself is surrounded by old buildings with shops and small streets. Old street lights and benches in the park complete the impression of a cosy, small, old village.
  • Although step 22 follows step 21 in the flowchart, it is noted that steps 21 and 22 occur simultaneously in the present embodiment.
  • the eye tracker of the tracking module 120 tracks movement of one or both eyes of the user and records the movement.
  • the eye movement data, which includes data related to a path that indicates the movement of the eye(s) and data related to the time duration for which the user focused their gaze on a point in the image, is recorded in the storage module 140.
  • the control module 130 determines the one or more points of interest which are fixated upon by the user. This determination is based on the recorded eye movement data by the eye tracker of the tracking module 120.
  • the user may focus on the following objects: the tree, the lawn, the hedge, the street lights, the benches, the shops, the small streets.
  • the user may fixate on each of the objects a plurality of times and, for example, may focus his/her gaze a greater number of times on each of the first three objects and a lesser number of times on each of the rest of the objects in the image.
  • the control module 130 analyses the recorded eye movement data stored in the storage module 140, for example by overlaying the path of the eye movement over the image displayed in step 21, so as to determine at least one point of interest which is fixated upon by the user.
  • the control module 130 may determine the locations of the tree, the lawn, and the hedge in the displayed image as three different points of interest.
  • the control module 130 may be arranged to assign a numerical label to each of the determined points of interest according to at least one of: an order in which the user focuses his/her gaze on the determined points of interest, a degree of visual attention from the user, and a user input.
  • In step 24, the display 110 displays the determined points of interest to the user while the input module 150 receives at least one word label for each of the displayed points of interest.
  • the display 110 is arranged to display a portion of the database image which contains the tree, and the user may utter the word "tree” towards the microphone of the input module 150.
  • This voice input is then processed by the input module 150 to generate the word label "tree" which becomes assigned to the determined point of interest, either automatically by the input module 150 or manually by the user.
  • the same process may be repeated for other portions of the image, e.g. "lawn", "hedge”, etc.
  • the storage module 140 stores the image with data representing the one or more determined points of interest.
  • the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a frequency with which the user focused their gaze on each of the one or more determined points of interest, a word label assigned to the one or more points of interest.
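  • For illustration only, the following is a minimal sketch of how such a stored record might be represented. The Python types and field names (PointOfInterest, StoredImage, and so on) are assumptions made for this sketch and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PointOfInterest:
    # Grid location of the fixated point within the image.
    x: int
    y: int
    # How often the user's gaze returned to this point.
    fixation_count: int = 0
    # Optional labels assigned at storage time.
    word_label: Optional[str] = None
    numerical_label: Optional[int] = None

@dataclass
class StoredImage:
    path: str
    points_of_interest: list = field(default_factory=list)

# Example: the English-village image with three determined points of interest.
village = StoredImage(
    path="village.jpg",
    points_of_interest=[
        PointOfInterest(x=4, y=3, fixation_count=7, word_label="tree", numerical_label=1),
        PointOfInterest(x=5, y=6, fixation_count=6, word_label="lawn", numerical_label=2),
        PointOfInterest(x=7, y=5, fixation_count=5, word_label="hedge", numerical_label=3),
    ],
)
```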
  • the device 100 may be arranged to display a predetermined set of images to the user sequentially in order to determine the points of interest in each one in the predetermined set of images.
  • a predetermined set of images provided by way of description is included in Appendix A.
  • Figure 3 is a block diagram illustrating a system for processing an interrogative query for a plurality of images, according to another embodiment of the present invention.
  • the system 2 comprises an image database 200 and a device 300.
  • the device 300 comprises a display 310, an input module 320, a query module 330, and a storage module 340.
  • the image database 200 stores a plurality of database images which can be retrieved based on a user-input interrogative query, as explained in more detail in the following.
  • the image database 200 of the present embodiment may be the same one that is part of system 1 as illustrated in Figure 1. In other words, the images stored in the image database 200 can be searched and retrieved by the device 300 in system 2.
  • the display 310 and the input module 320 are integrated as a touch screen display that is arranged to initially display an input area by presenting a pro-forma blank region so as to allow a user to specify, as touch screen input, respective locations for a plurality of points of interest within the input area.
  • the received points of interest at the input module 320 are considered as at least a part of the interrogative query for searching the image database 200.
  • the input module 320, which is integrated as part of a touch screen display as described above, is further arranged to receive at least one word label for each of the plurality of received points of interest.
  • the at least one word label is provided as at least one respective virtual tag displayed at the display 310 for selection by the user.
  • the user is able to relocate the at least one virtual tag to respective location(s) so as to assign each of the at least one virtual tag to the received points of interests that have been input by the user within the displayed input area.
  • the image database 200 of the system 2 comprises a plurality of database images which can be searched and retrieved by the device 300 based on the interrogative query.
  • Each database image stored in the image database 200 comprises a plurality of points of interest. Moreover, in the present embodiment each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image. This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc.
  • the query module 330 is arranged to perform a number of method steps, comprising: (1) evaluating a spatial relationship between each received point of interest and each other point of interest; (2) comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and (3) calculating a similarity ranking value for each database image based on the comparison of spatial relationships.
  • Steps (1) through (3) will be explained in further detail in the following.
  • the query module 330 is arranged to evaluate a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest.
  • the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid.
  • the present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image.
  • the spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.
  • the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
  • the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis.
  • Each spatial relationship is evaluated as one of "higher", "lower", or "equal", respectively representing scenarios in which the first point of interest has a higher coordinate value than that of the second point of interest, the first point of interest has a lower coordinate value than that of the second point of interest, and the first point of interest has an equal coordinate value to that of the second point of interest.
  • each spatial relationship is represented with a numerical value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.
  • the same evaluation technique is applied to the images stored in the image database 200 so as to obtain evaluated spatial relationships of the determined points of interests in each of the images stored in the image database 200 for the comparison step in step (2).
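  • By way of illustration, the following minimal Python sketch encodes each pairwise relationship as 1, -1 or 0 along each axis; the function names and data layout are assumptions made for this sketch.

```python
from itertools import combinations

def sign(a: int, b: int) -> int:
    """Return 1 if a > b, -1 if a < b, and 0 if equal (the 1/-1/0 encoding above)."""
    return (a > b) - (a < b)

def evaluate_relationships(points):
    """Evaluate the x-axis and y-axis spatial relationship for every pair of points.

    `points` maps a label to an (x, y) coordinate on the 9x9 grid (values 1 to 9).
    Returns a dict {(label_i, label_j): (sx, sy)} of signed comparisons.
    """
    rels = {}
    for (li, (xi, yi)), (lj, (xj, yj)) in combinations(sorted(points.items()), 2):
        rels[(li, lj)] = (sign(xi, xj), sign(yi, yj))
    return rels

# Five points of interest give 10 pairs, each compared on two axes: 20 relationships.
query_points = {1: (2, 3), 2: (5, 5), 3: (7, 2), 4: (4, 8), 5: (8, 6)}
assert len(evaluate_relationships(query_points)) * 2 == 20
```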
  • the query module 330 is arranged to compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of the points of interest can be evaluated in a quantifiable manner.
  • an exemplary database image A1 may have five points of interest that correspond to the received five points of interest input by the user at the input module 320.
  • Corresponding points of interest in a database image may be determined by matching a received point of interest that has an assigned word label to a point of interest in the database image that has the same or a similar associated word label.
  • the corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "tree" is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "forest", "plant", and "leaves", etc.
  • the query module 330 compares each evaluated spatial relationship of the five received points of interest (from step (1)) with the evaluated spatial relationships of the corresponding points of interest in the database image A1.
  • For each of the evaluated spatial relationships, the query module 330 determines whether there is a match. If the spatial relationship between two received points of interest matches the corresponding spatial relationship between the corresponding two points of interest in the database image A1, the query module 330 assigns a value of '1' to the individual spatial relationship comparison; if it is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.
  • For example, if it is evaluated in step (1) that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image A1 is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison; if the corresponding spatial relationship is instead "higher" or "equal" (i.e. having a value of "1" or "0"), the query module 330 assigns a value of '0'.
  • An accumulated score N is then generated by summing the individual spatial relationship comparison values over each of the 20 possible spatial relationships (the ten pairs of the five points of interest, each evaluated along both the x-axis and the y-axis).
  • the generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
  • a proportion of spatial relationships can be expected to be matched by chance.
  • the modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20.
  • any negative value of L becomes rounded up to 0. Therefore, L is in the range from 0 to 1.
  • A value of L equal to 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases towards complete accuracy.
  • the same process is repeated for database image B1, database image C1, etc. so as to generate a respective spatial composition similarity metric L (i.e. LB1, LC1, etc.) for each database image stored in the image database 200.
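  • A sketch of accumulating N and converting it to L follows. The text above gives the chance level (7.5 out of 20) and states that negative values round up to 0, but does not spell out the normalisation; the linear form L = (N - 7.5)/(20 - 7.5) used here is an assumption.

```python
def spatial_similarity(query_rels, image_rels, chance=7.5, total=20):
    """Spatial composition similarity metric L for one database image.

    Both inputs are {(label_i, label_j): (sx, sy)} dicts as produced by
    evaluate_relationships(); pairs are matched via corresponding labels.
    """
    n = 0  # accumulated score N
    for pair, (qx, qy) in query_rels.items():
        ix, iy = image_rels[pair]
        n += (qx == ix) + (qy == iy)  # 1 per matching relationship, 0 otherwise
    # Assumed normalisation: chance-level agreement (N = 7.5) maps to 0,
    # perfect agreement (N = 20) maps to 1, and negative values round up to 0.
    return max(0.0, (n - chance) / (total - chance))
```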
  • the query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the comparison of step (2). Specifically, the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to the word labels received at the input module 320. The query module 330 may be configured to perform the comparison of step (2) on the retrieved subset of database images only.
  • Step (3)
  • the query module 330 is arranged to calculate a similarity ranking value for each database image, based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step (2).
  • the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step (2) for each of the database images stored in the image database 200.
  • database image A1 may have a generated metric LA1 of 0.37;
  • database image B1 may have a generated metric LB1 of 0.20;
  • database image C1 may have a generated metric LC1 of 0.44; and
  • database image D1 may have a generated metric LD1 of 0.27.
  • the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity. Therefore, in this example, the similarity ranking values for database images A1 to D1 would be: database image A1 similarity ranking value: 2; database image B1 similarity ranking value: 4; database image C1 similarity ranking value: 1; and database image D1 similarity ranking value: 3. In other words, the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial relationships between the points of interest.
  • the similarity ranking value for each database image calculated in step (3) is stored in the storage module 340.
  • the display 310 is further arranged to display the plurality of images stored in the image database 200 in a list according to the calculated similarity ranking values.
  • the display 310 would be arranged to display database images A1 to D1 in the order of: C1, A1, D1, and B1. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.
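  • Ranking then reduces to sorting the database images by their L metrics, as in this short usage sketch with the example values above:

```python
metrics = {"A1": 0.37, "B1": 0.20, "C1": 0.44, "D1": 0.27}
ranked = sorted(metrics, key=metrics.get, reverse=True)
print(ranked)  # ['C1', 'A1', 'D1', 'B1'] -- the display order described above
```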
  • Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another exemplary embodiment.
  • the present embodiment is similar to that described in relation to Figure 3.
  • Each database image stored in the image database 200 comprises a plurality of points of interest.
  • each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image.
  • This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc.
  • In step 41, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area.
  • the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.
  • the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid.
  • the present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image.
  • the spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.
  • the input module 320 in the present embodiment comprises a microphone which is arranged to receive a voice input for each received point of interest, and the input module 320 generates a word label corresponding to each voice input.
  • the user may utter the word "tree” towards the microphone of the input module 320.
  • This voice input is then processed by the input module 320 to generate the word label "tree" which becomes assigned to the first received point of interest, either automatically by the input module 320 or manually by the user manipulating the touch screen display.
  • In step 42, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest.
  • the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
  • the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis.
  • the evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.
  • In step 43, the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image.
  • an exemplary database image A1 may have five points of interest that correspond to the received five points of interest input by the user at the input module 320.
  • Corresponding points of interest in a database image may be determined by matching a received point of interest that has an assigned word label to a point of interest in the database image that has the same or a similar associated word label.
  • the corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "sky” is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "sun", “blue", and “cloud”, etc.
  • the query module 330 compares each evaluated spatial relationship of the five received points of interest (from step 42) with the evaluated spatial relationships of the corresponding points of interest in the database image A1.
  • For each of the evaluated spatial relationships, the query module 330 determines whether there is a match. If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image A1 is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If it is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.
  • For example, if it is evaluated in step 42 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image A1 is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image A1 is "higher" or "equal" (i.e. having a value of "1" or "0"), the query module 330 assigns a value of '0' to the comparison.
  • An accumulated score N is then generated by summing the individual spatial relationship comparison values over each of the 20 possible spatial relationships.
  • the generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
  • a proportion of spatial relationships can be expected to be matched by chance.
  • the modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20.
  • any negative value of L becomes rounded up to 0. Therefore, L is in the range from 0 to 1.
  • A value of L equal to 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases towards complete accuracy.
  • the query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the comparison of step 43.
  • the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to the word labels received at the input module 320.
  • the query module 330 may be configured to perform the comparison of step 43 on the retrieved subset of database images only.
  • In step 44, the query module 330 calculates a similarity ranking value for each database image based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step 43.
  • the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step 43 for each of the database images stored in the image database 200.
  • database image A1 may have a generated metric LA1 of 0.37;
  • database image B1 may have a generated metric LB1 of 0.20;
  • database image C1 may have a generated metric LC1 of 0.44; and
  • database image D1 may have a generated metric LD1 of 0.27.
  • the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity.
  • the ranking values for database images A1 to D1 would be: database image A1 similarity ranking value: 2; database image B1 similarity ranking value: 4; database image C1 similarity ranking value: 1; and database image D1 similarity ranking value: 3.
  • the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial relationships between the points of interest.
  • the similarity ranking value for each database image calculated in step 44 is stored in the storage module 340.
  • In step 45, the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated ranking values.
  • the display 310 would be arranged to display database images A1 to D1 in the order of: C1, A1, D1, and B1. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.
  • Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.
  • In step 51, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described with relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.
  • the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid.
  • the present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image.
  • the spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.
  • In step 52, the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 51.
  • the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 51. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.
  • In step 53, the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and a corresponding point of interest in the database image.
  • each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.
  • the query module 330 in the present embodiment is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned with the numerical label '1'. The same calculation process is repeated for the other received points of interest and each of the other database images stored in the image database 200.
  • In step 54, the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area.
  • To explain how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area. In this case, any point of interest of the interrogative query received in step 51 can be at most displaced from the target by 4 cells in the x-axis or the y-axis.
  • the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340.
  • In step 54, for each point of interest in a database image, the query module 330 retrieves an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generates a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances.
  • C = 1 - [normalised calculated distance]
  • C for each of the received points of interest can be calculated, where a value of 0 represents chance values and 1 represents absolutely accurate recall. In this embodiment, any negative value of C becomes rounded up to 0.
  • the accuracy metric D for each database image is the average of C corresponding to all points of interest within the database image.
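  • A sketch of this proximity calculation follows, under two assumptions that the text leaves open: city-block distance between grid cells as the distance measure, and a look-up table approximated by the mean distance from the target cell to every cell of the 9x9 grid.

```python
def city_block(p, q):
    """Assumed distance measure between two cells of the 9x9 grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def average_displacement(target):
    """Mean distance from `target` to every cell of the grid, i.e. the expected
    displacement of a point placed by chance; stands in for the look-up table C[x,y]."""
    cells = [(x, y) for x in range(1, 10) for y in range(1, 10)]
    return sum(city_block(target, c) for c in cells) / len(cells)

def accuracy_metric(query_points, image_points):
    """Accuracy metric D: the average over all corresponding points of
    C = 1 - (distance / chance displacement), with negatives rounded up to 0."""
    scores = []
    for label, target in image_points.items():
        d = city_block(query_points[label], target)
        c = max(0.0, 1.0 - d / average_displacement(target))
        scores.append(c)
    return sum(scores) / len(scores)
```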
  • the calculated similarity ranking value indicates the ranking of overall proximity similarity of a database image to the received points of interest in terms of the locations of the points of interest.
  • In step 55, the query module 330 calculates a similarity ranking value for each database image based on the calculated distances.
  • the similarity ranking value for each of the plurality of database images is calculated based on the accuracy metrics calculated in step 54.
  • database image A2 may have an accuracy metric DA2 of 0.29;
  • database image B2 may have an accuracy metric DB2 of 0.37;
  • database image C2 may have an accuracy metric DC2 of 0.34; and
  • database image D2 may have an accuracy metric DD2 of 0.28.
  • the calculated accuracy metric for a database image indicates a degree of similarity of the database image with the received points of interest in terms of the proximity of the locations of corresponding points of interest, wherein a lower accuracy metric indicates a lower degree of similarity and a higher accuracy metric indicates a higher degree of similarity.
  • the similarity ranking values for database images A2 to D2 would be: database image A2 similarity ranking value: 3; database image B2 similarity ranking value: 1; database image C2 similarity ranking value: 2; and database image D2 similarity ranking value: 4.
  • In step 56, the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values.
  • the display 310 would be arranged to display database images A2 to D2 in the order of: B2, C2, A2, and D2. The user is therefore presented with a ranked list of database images according to a level of similarity in terms of the proximity of the locations of corresponding points of interest.
  • the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described steps may be optional or may be combined.
  • step 52 may be omitted from the method described above, such that no numerical labels are assigned to the points of interest and no numerical labels are pre-assigned to the points of interest of the images stored in the image database 200.
  • step 54 may be omitted from the method described above. In this case, the calculation of ranking values would not be based on accuracy metric, but directly based on calculated distances instead.
  • Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.
  • In step 61, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area.
  • the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.
  • the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid.
  • the present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image.
  • the spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.
  • the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 61.
  • the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 61. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.
  • In step 63, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest.
  • the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
  • the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis.
  • the evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.
  • In step 64, the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image.
  • each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.
  • the exemplary database image A3 may have five points of interest that correspond to the received five points of interest input by the user at the input module 320.
  • Corresponding points of interest in a database image may be determined by matching a received point of interest that has an assigned numerical label to a point of interest in the database image that has the same numerical label.
  • the query module 330 compares each evaluated spatial relationship of the five received points of interest (from step 63) with the evaluated spatial relationships of the corresponding points of interest in the database image A3.
  • For each of the evaluated spatial relationships, the query module 330 determines whether there is a match. If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image A3 is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If it is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.
  • For example, if it is evaluated in step 63 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the spatial relationship of the corresponding two points of interest in database image A3 is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image A3 is "higher" or "equal" (i.e. having a value of "1" or "0"), the query module 330 assigns a value of '0' to the comparison.
  • An accumulated score N is then generated by summing the individual spatial relationship comparison values over each of the 20 possible spatial relationships.
  • the generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
  • a proportion of spatial relationships can be expected to be matched by chance.
  • the modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20.
  • any negative value of L becomes rounded up to 0. Therefore, L is in the range from 0 to 1.
  • A value of L equal to 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases towards complete accuracy.
  • the same process is repeated for database image B3, database image C3, etc. so as to generate a respective spatial composition similarity metric L (i.e. LB3, LC3, etc.) for each database image stored in the image database 200.
  • In step 65, the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and a corresponding point of interest in the database image.
  • each image in the image database 200 in the present embodiment comprises at least one numerical label which is pre-assigned to a point of interest in the image.
  • the query module 330 is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned with the numerical label '1'. The same calculation process is repeated for the other received points of interest and each of the other database images stored in the image database 200.
  • the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area.
  • To explain how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area.
  • any point of interest of the interrogative query received in step 61 can be at most displaced from the target by 4 cells in the x-axis or the y-axis.
  • the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340.
  • the query module 330 can retrieve an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generate a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances.
  • C = 1 - [normalised calculated distance]
  • C for each of the received points of interest can be calculated, where a value of 0 represents chance values and 1 represents absolutely accurate recall. In this embodiment, any negative value of C becomes rounded up to 0.
  • the accuracy metric D for each database image is the average of C corresponding to all points of interest within the database image.
  • In step 66, the query module 330 calculates a similarity ranking value for each database image based on the calculated distances in step 65 and the comparison of spatial relationships in step 64.
  • database image A3 may have a generated metric LA3 of 0.31 and an accuracy metric DA3 of 0.37, which gives a composite metric QA3 of 0.34;
  • database image B3 may have a generated metric LB3 of 0.29 and an accuracy metric DB3 of 0.29, which gives a composite metric QB3 of 0.29;
  • database image C3 may have a generated metric LC3 of 0.37 and an accuracy metric DC3 of 0.35, which gives a composite metric QC3 of 0.36; and
  • database image D3 may have a generated metric LD3 of 0.36 and an accuracy metric DD3 of 0.34, which gives a composite metric QD3 of 0.35.
  • the composite metric indicates a degree of similarity of the database image with the received points of interest both in terms of the proximity of the locations of corresponding points of interest ("proximity similarity") and the spatial relationships between the points of interest ("configural similarity"), wherein a higher composite metric indicates a higher degree of similarity and a lower composite metric indicates a lower degree of similarity. Therefore, in this example, the similarity ranking values for database images A3 to D3 would be: database image A3 similarity ranking value: 3; database image B3 similarity ranking value: 4; database image C3 similarity ranking value: 1; and database image D3 similarity ranking value: 2. (An illustrative sketch of this composite ranking is given after this list.)
  • in step 68 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values.
  • the display 310 would be arranged to display database images A3 to D3 in the order of: C3, D3, A3, and B3. The user is therefore presented with a ranked list of database images according to both proximity similarity and configural similarity.
  • the present embodiment provides a synergistic effect in query-processing in terms of image retrieval accuracy, for example compared to the embodiments described in relation to Figures 4 and 5.
  • step 68 may be omitted from the method described above, such that the database images are not displayed.
  • although the display and the input module of the device are integrated as a touch screen display in the embodiments described above, in alternative embodiments the display and the input module may not be integrated as a single component.
  • the display may comprise other types of display devices, and the input module may comprise other types of input devices, e.g. a mouse, a keyboard, etc.
  • the storage modules of the devices in Figure 1 and Figure 3 may be implemented as external components of the devices.
  • the image database may be integrated into the devices of Figure 1 and Figure 3.
  • the device may not be arranged to receive word labels that correspond to received/ determined points of interest.
  • corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label.
  • corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned numerical label with a point of interest in the database image with the same pre-assigned numerical label.
  • the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received as at least part of a user query.
  • the received points of interest may be assigned with numerical labels according to other factors, e.g. the location of each point of interest within the input area.
  • the received points of interest may not be assigned with numerical labels at all.
  • the determined points of interest are assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user as an image is being displayed to the user.
  • the determined points of interest may be assigned with numerical labels according to other factors, e.g. the location of each point of interest within the input area, or a duration for which the user focused their gaze on the one or more determined points of interest.
  • the determined points of interest may not be assigned with numerical labels at all.
  • receiving the at least one word label comprises receiving a voice input for each received/determined point of interest.
  • the at least one word label may be received by other methods, e.g. keyboard input.
  • the input module may comprise a keyboard.
  • the input area presented to the user at the display is quantised into a uniform 9x9 cellular grid.
  • the input area may be quantised with a different grain of analysis, e.g. a 15x15 cellular grid, a 6x8 cellular grid, etc., according to requirements of the system and the sizes/dimensions of the database images in the image database.
  • the system may not be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels prior to a comparison of spatial relationships of points of interest.
  • the weighting factor associated with a received point of interest is generated based on the average displacement of a point of interest of an interrogative query from a specific target, in alternative embodiments the weighting factor may be generated based on other factors.
  • the images stored in the image database may each comprise an identification number.
  • the display may be arranged to display the plurality of database images in a list according to the calculated ranking values together with their respective identification numbers.
  • the tracking module comprises an eye tracker including a camera for recording movement of one or both eyes of the user.
  • the tracking module may comprise other types of tracking device such as an eye-attached tracking device or an electrical potential measuring device.
  • the control module of the device may be arranged to perform evaluation of a spatial relationship between each determined point of interest and each other determined point of interest in the manner as described in relation to Figure 3.
  • the storage module of the device may be arranged to store a numerical representation of the evaluated spatial relationship of a database image together with the database image.
  • Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware, and application logic.
  • the software, application logic and/or hardware may reside on memory, or any computer media.
  • the application software or an instruction set is maintained on any one of various conventional computer-readable media.
  • a "memory” or “computer-readable medium” may be any media or means that can contain, store, communicate, propagate, or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
  • in some embodiments word labels are assigned to received points of interest and in some other embodiments numerical labels are assigned to the received points of interest.
  • the step of assigning word labels may be replaced with the step of assigning numerical labels, and vice versa.
  • the method may comprise assigning word labels and numerical labels to the received points of interest.
  • the method may comprise assigning other types of alphanumeric descriptors to the received points of interest.
  • An offensive player is shooting the ball for three points while the defensive effort of the player with the headband will be too late.
  • the other players are trying to get into a good position for a potential rebound.
  • the display panel under the roof is also showing the game from a different viewpoint.
  • a flock of sheep are grazing in front of a grey wooden house.
  • the house was built on a slope and has two storeys. It also has a veranda on the right. There is even a small hut hidden in the forest.
  • Image Q Hill in Cloud
  • a mountaineer is refilling his water bottle at high altitude in the Swiss Alps.
  • in the yellow rucksack are all the necessary items for a long hike.
  • the drinking trough is usually for the cows which spend the summer months on higher grasslands.
  • In the background are the snow covered mountain tops.
  • the narrow street, which leads down to the port of an old fishing village, is just wide enough for the white Range Rover.
  • a tourist at the side gives way.
  • a bin protrudes from one house entrance at the side, and the old church tower can be seen in the distance.
  • the white fence encloses a collection of red tulips, green hedges and a tree which have grown on the premises for decades.
  • the fence also has stone crowns to show the wealth of the former owners.
  • An officer on the deck of a sailing vessel is silently watching another vessel which is closer to the port. Visitors are on board the ship with the golden ornaments while sailors are climbing on the masts. The ropes on the officer's vessel are neatly aligned on the wooden deck.
  • This newly refurbished tower house is rehousing nurses. Some nurses have just come home from their day shift with their file folders. Its three storeys were the home of the biggest vase manufacturing company in Europe. The tower with its rounded silver top still remains the symbol of the town.
  • a Mediterranean windmill with a pitched red roof and small rectangular windows is set on top of a hill overlooking the surrounding areas. It also houses a restaurant with outside dining area, popular with tourists and locals. The green door at the front leads to another garden area adjoining the brown bricks of the building's foundation.
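
By way of illustration only, the following Python sketch ties the two metrics together in the manner of the worked example above, in which each composite metric Q equals the mean of the spatial composition similarity metric L and the accuracy metric D. The equal weighting of the two metrics and the function names are assumptions inferred from the example figures rather than something the description states explicitly.

    def composite_metric(l_metric, d_metric):
        # Composite similarity Q, taken here as the mean of L and D; this
        # is an assumption consistent with the worked figures, e.g.
        # QA3 = (0.31 + 0.37) / 2 = 0.34.
        return (l_metric + d_metric) / 2

    def rank_images(metrics):
        # metrics maps image identifiers to (L, D) pairs; returns the
        # identifiers ordered from the highest composite metric
        # (similarity ranking value 1) downwards.
        return sorted(metrics, key=lambda i: composite_metric(*metrics[i]),
                      reverse=True)

    example = {"A3": (0.31, 0.37), "B3": (0.29, 0.29),
               "C3": (0.37, 0.35), "D3": (0.36, 0.34)}
    print(rank_images(example))  # prints ['C3', 'D3', 'A3', 'B3']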


Abstract

A method for processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database: comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships.

Description

Image Storage and Retrieval
Technical Field
[0001] The present invention relates to methods, devices, apparatuses and systems for storing images in a database and retrieving images from a database. Specifically, the present invention relates to methods and apparatuses for searching and displaying images from an image database and for storing images in a database according to the spatial configuration of determined points of interest input by a user.

Background
[0002] In recent years, there has been an increasing popularity of digital cameras and a tremendous expansion in the power and storage capacity of computing devices. More and more people use electronic devices equipped with digital cameras in their everyday lives. The quantity and variety of digital images produced by these electronic devices have increased immensely, and many computing devices are now equipped with sufficient power and storage capacity to store, retrieve, view, and edit large numbers of high resolution digital images. In addition, the exponential growth in wireless networking as well as the improvement in connection speeds allows users to access large databases of images stored remotely over networks. Accordingly, there is a need for efficient image storage and retrieval techniques so as to provide users with a way to easily navigate through the growing numbers of available digital images.
[0003] Currently known image retrieval techniques allow users to retrieve images in one of two ways: keyword-based image retrieval and content-based image retrieval. Keyword-based image retrieval performs searches for images by matching keywords input by a user to keywords that have been pre-assigned to the images. However, with keyword-based image retrieval the retrieval efficiency may be limited, due to the inability to match images for queries which include ambiguous descriptions. Content- based image retrieval performs searches for images that are similar to an example image in terms of low-level image features, such as colour histogram, texture, shape, etc. Accordingly, queries based on content-based image retrieval may return completely irrelevant images which happen to contain similar low-level image features.
Summary of Invention
[0004] In a first aspect, this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database, comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculating a similarity ranking value for each database image based on the comparison of spatial relationships.
[0005] The method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.
[0006] A spatial relationship along the x-axis and a spatial relationship along the y-axis may be evaluated between each received point of interest and each other point of interest.
[0007] Each spatial relationship may be evaluated as a first point of interest having a coordinate value which is higher, lower or equal to that of a second point of interest with respect to the x-axis or the y-axis; and the evaluation of each spatial relationship may be represented with a numerical value of 1, -1 or 0 respectively.
[0008] The comparison of spatial relationships may comprise determining whether the numerical values representing the evaluation of the two spatial
relationships are a match, and assigning a score to the comparison based on the determination.
[0009] The method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels. [0010] The receiving word labels may comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.
[0011] The method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels. [0012] The corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels. [0013] The method may further comprise: assigning a numerical label to each of the plurality of received points of interest; wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels. [0014] The plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.
[0015] The plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.
[0016] Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in an order of highest similarity ranking value to lowest similarity ranking value. [0017] In a second aspect, this specification describes a method of processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; for each image in the database, calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances.
[0018] The method may further comprise displaying the plurality of database images in a list according to the calculated similarity ranking values.
[0019] The method may further comprise: calculating an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area; wherein the similarity ranking of database images is based on the calculated accuracy metric for each database image. [0020] The method may further comprise receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels. [0021] The method may further comprise receiving a voice input for each received point of interest and generating a word label corresponding to each voice input.
[0022] The method may further comprise generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.
[0023] The corresponding word labels in the plurality of database images may be synonyms or generalisations of the received word labels. [0024] The method may further comprise assigning a numerical label to each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.
[0025] The plurality of received points of interest may be assigned with numerical labels according to the order in which the points of interest are received.
[0026] The plurality of received points of interest may be assigned with numerical labels according to the location of each point of interest within the input area. [0027] Displaying the plurality of database images in a list according to the calculated similarity ranking values may comprise displaying the plurality of database images in order of highest similarity ranking value to lowest similarity ranking value.
[0028] In a third aspect, this specification describes a method for processing an interrogative query for a plurality of images stored in an image database, comprising: receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; evaluating a spatial relationship between each received point of interest and each other point of interest; for each image in the database:
comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; calculating a distance between each received point of interest and a corresponding point of interest in the database image; and calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships. [0029] In a fourth aspect, this specification describes a device for processing an interrogative query for an image database storing a plurality of images, comprising: a display for displaying an input area; an input module for receiving an input specifying locations for a plurality of points of interest within the input area; and a query module configured to: evaluate a spatial relationship between each received point of interest and each other point of interest; for each image in the database, compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and calculate a similarity ranking value for each database image based on the comparison of spatial
relationships.
[0030] The display may be further configured to display the plurality of database images in a list according to the calculated similarity ranking values.
[0031] The input module may comprise at least one of: a mouse, a touch screen display apparatus, and an eye-tracking apparatus.
[0032] In a fifth aspect, this specification describes a method of structurally storing an image in an image database, comprising: displaying the image, which is pending storage, to a user for a predetermined period of time; tracking an eye movement of the user over the time period; determining one or more points of interest which are fixated upon by the user during the time period; storing the image with data representing the one or more points of interest.
[0033] The method may further comprise: displaying the one or more determined points of interest to the user; receiving at least one word label for each of the displayed points of interest; and storing the image with the at least one word label for each of the one or more points of interest.
[0034] The receiving at least one word label may comprise receiving a voice input for each determined point of interest and generating a word label corresponding to each voice input. [0035] The method may further comprise: assigning a numerical label to each of the one or more determined points of interest; and storing the image with the numerical label for each of the one or more points of interest.
[0036] The one or more determined points of interest may be assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user during the time period. [0037] The one or more determined points of interest may be assigned with numerical labels according to the location of each point of interest within the input area.
Brief Description of Drawings
[0038] For a more complete understanding of the methods, apparatuses, devices, and systems described herein, reference is made to the following descriptions taken in connection with the accompanying drawings in which:
Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to an embodiment of the present invention;
Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another exemplary embodiment of the present invention;
Figure 3 is a block diagram illustrating a system for searching an image database which stores a plurality of images, according to another embodiment of the present invention; Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention;
Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention; and
Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention. Description of Embodiments [0039] Embodiments of the invention allow users to store a plurality of images based on points of interest in an image and their relative locations in the image, and to retrieve and display images based on queries containing points of interest and their relative locations in the image.
[0040] The invention utilises the spatial memory of users to manage an image database, which provides a more intuitive way for image storage and retrieval. The invention exploits a theoretical understanding of human spatial memory to manage an image database to enable distinct and useful channels of query not available to other query methods. Additionally, because the invention exploits natural psychological competences, it enables functionalities which allow users effectively to manipulate trade-offs between the cost of search and its efficiency in a task-appropriate manner, making it an entirely novel and appropriate approach to a wide range of application domains.
[0041] Figure 1 is a block diagram illustrating a system for storing an image in an image database, according to another exemplary embodiment.
[0042] As shown in Figure 1, the system 1 comprises a device 100 and an image database 200. The device 100 comprises a display 110, a tracking module 120, a control module 130, a storage module 140, and an input module 150.
[0043] The display 110 is arranged to display an image to a user for a
predetermined period of time. Over the time period in which the image is displayed to the user, the tracking module 120 is arranged to track eye movement of the user. The tracking module 120 comprises an eye tracker including a camera which records movement of one or both eyes of the user as the user looks at the displayed image at the display 110. [0044] The control module 130 is arranged to determine one or more points of interest which are fixated upon by the user during the time period in which the image is displayed to the user. Specifically, the control module 130 receives the eye movement data recorded by the eye tracker of the tracking module 120 and determines, based on where the user focused their gaze and a duration for which they focused their gaze, one or more points of interest which are fixated upon by the user. [0045] After the one or more points of interest fixated upon by the user have been determined, the display 110 in the present embodiment is further arranged to display the one or more points of interest to the user, and the input module 150 is arranged to receive at least one word label from the user for each of the displayed points of interest. Specifically, the input module 150 of the present embodiment comprises a microphone to receive voice input from the user, as the one or more determined points of interest are displayed. The input module 150 then generates a word label corresponding to each voice input and assigns it to the determined point of interest which is displayed by the display 110.
[0046] The displayed image is then stored along with data representing the one or more points of interest in the storage module 140. In the present embodiment, the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a duration for which the user focused their gaze on the one or more determined points of interest, a frequency with which the user focused their gaze on the one or more determined points of interest, a word label assigned to the one or more points of interest. In some alternative embodiments, numerical labels may be assigned to the determined points of interest, and the data representing one or more points of interest may include a numerical label assigned to the one or more points of interest.
[0047] In alternative embodiments, the device 100 may not be arranged to display the one or more determined points of interest to the user or to receive word labels for each determined point of interest. In these alternative embodiments, once the control module determines the one or more points of interest which are fixated upon by the user, the storage module may be arranged to store the image along with locations of the determined points of interest.
[0048] Figure 2 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 1, according to another embodiment of the present invention.
[0049] The process starts at step 21. In step 21, the display 110 displays an image to a user. By way of example, the display 110 may be arranged to display an image of an English village. In the centre of the English village there is a small park with a tree, a lawn, and a hedge. The park itself is surrounded by old buildings with shops and small streets. Old street lights and benches in the park round off the image of a cosy, small, old village.
[0050] Although it is shown in the flowchart that step 22 follows step 21, it is noted that steps 21 and 22 occur simultaneously in the present embodiment. In other words, as the image is being displayed to the user in step 21, the eye tracker of the tracking module 120 tracks movement of one or both eyes of the user and records the movement. The eye movement data, which includes data related to a path that indicates the movement of the eye(s) and data related to a time duration the user focused their gaze on a point in the image, is recorded in the storage module 140.
[0051] Subsequently, in step 23, the control module 130 determines the one or more points of interest which are fixated upon by the user. This determination is based on the eye movement data recorded by the eye tracker of the tracking module 120. In the present example, as the image of the English village is displayed to the user, the user may focus on the following objects: the tree, the lawn, the hedge, the street lights, the benches, the shops, the small streets. The user may fixate on each of the objects a plurality of times and, for example, may focus his/her gaze a greater number of times on each of the first three objects and a lesser number of times on each of the rest of the objects in the image. The control module 130 analyses the recorded eye movement data stored in the storage module 140, for example by overlaying the path of the eye movement over the image displayed in step 21, so as to determine at least one point of interest which is fixated upon by the user. In the example used above, the control module 130 may determine the locations of the tree, the lawn, and the hedge in the displayed image as three different points of interest. In step 23, the control module 130 may be arranged to assign a numerical label to each of the determined points of interest according to at least one of: an order in which the user focuses his/her gaze on the determined points of interest, a degree of visual attention from the user, and a user input.
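Purely as an illustrative sketch of how the control module 130 might turn recorded gaze samples into points of interest, the following Python fragment groups consecutive gaze samples into fixations; the sampling format, dwell threshold and cluster radius are assumptions for illustration, not values given in this description.

    def determine_points_of_interest(samples, min_dwell=0.3, radius=30):
        # samples: non-empty list of (t_seconds, x, y) gaze positions in
        # image coordinates. Consecutive samples that stay within `radius`
        # pixels of the first sample of the current cluster are grouped
        # into a candidate fixation; clusters lasting at least `min_dwell`
        # seconds are kept as points of interest.
        fixations = []
        cluster = [samples[0]]
        for sample in samples[1:]:
            t, x, y = sample
            t0, x0, y0 = cluster[0]
            if (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2:
                cluster.append(sample)
            else:
                if cluster[-1][0] - t0 >= min_dwell:
                    fixations.append((x0, y0))
                cluster = [sample]
        if cluster[-1][0] - cluster[0][0] >= min_dwell:
            fixations.append((cluster[0][1], cluster[0][2]))
        return fixations

Counting how many separate fixations fall on the same object would then give the fixation frequency used for numerical labelling.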
[0052] Although it is shown in the flowchart that step 24 is followed by step 25, it is noted that these two steps also occur simultaneously in the present embodiment. In steps 24 and 25, the display 110 displays the determined points of interest to the user while the input module 150 receives at least one word label for each of the displayed points of interest. [0053] Continuing with the above example, the display 110 is arranged to display a portion of the database image which contains the tree, and the user may utter the word "tree" towards the microphone of the input module 150. This voice input is then processed by the input module 150 to generate the word label "tree" which becomes assigned to the determined point of interest, either automatically by the input module 150 or manually by the user. The same process may be repeated for other portions of the image, e.g. "lawn", "hedge", etc.
[0054] After the word labels are received and assigned to respective determined points of interest in step 25, in the subsequent step 26 the storage module 140 stores the image with data representing the one or more determined points of interest. In the present embodiment, the data representing one or more points of interest includes at least one of: a location of the one or more determined points of interest, a frequency with which the user focused their gaze on each of the one or more determined points of interest, a word label assigned to the one or more points of interest.
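The description leaves the storage format open; one plausible shape for the record written in step 26, shown here as a plain Python dictionary with illustrative values only, would be:

    stored_record = {
        "image_file": "english_village.jpg",  # illustrative file name
        "points_of_interest": [
            {"location": (3, 4),              # grid cell of the fixation
             "word_label": "tree",
             "numerical_label": 1,
             "fixation_frequency": 7},
            {"location": (5, 6),
             "word_label": "lawn",
             "numerical_label": 2,
             "fixation_frequency": 5},
        ],
    }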
[0055] It is noted that the above steps can be applied to different images, and in some embodiments the device 100 may be arranged to display a predetermined set of images to the user sequentially in order to determine the points of interest in each one in the predetermined set of images. An example of a predetermined set of images provided by way of description is included in Appendix A.
[0056] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined.
[0057] Figure 3 is a block diagram illustrating a system for processing an interrogative query for a plurality of images, according to another embodiment of the present invention.
[0058] As shown in Figure 3, the system 2 comprises an image database 200 and a device 300. The device 300 comprises a display 310, an input module 320, a query module 330, and a storage module 340. The image database 200 stores a plurality of database images which can be retrieved based on a user-input interrogative query, as explained in more detail in the following. The image database 200 of the present embodiment may be the same one that is part of system 1 as illustrated in Figure 1. In other words, the images stored in the image database 200 can be searched and retrieved by the device 300 in system 2.
[0059] In the present embodiment, the display 310 and the input module 320 are integrated as a touch screen display that is arranged to initially display an input area by presenting a pro-forma blank region so as to allow a user to specify, as touch screen input, respective locations for a plurality of points of interest within the input area. The received points of interest at the input module 320 are considered as at least a part of the interrogative query for searching the image database 200.
[0060] In the present embodiment, the input module 320, which is integrated as part of a touch screen display as described above, is further arranged to receive at least one word label for each of the plurality of received points of interest. The at least one word label is provided as at least one respective virtual tag displayed at the display 310 for selection by the user. By manipulating the touch screen display, the user is able to relocate the at least one virtual tag to respective location(s) so as to assign each of the at least one virtual tag to the received points of interest that have been input by the user within the displayed input area. [0061] As mentioned above, the image database 200 of the system 2 comprises a plurality of database images which can be searched and retrieved by the device 300 based on the interrogative query. Each database image stored in the image database 200 comprises a plurality of points of interest. Moreover, in the present embodiment each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image. This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc.
[0062] After the input module 320 receives the input specifying locations for a plurality of points of interest within the input area, the query module 330 is arranged to perform a number of method steps, comprising: (1) evaluating a spatial relationship between each received point of interest and each other point of interest; (2) comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and (3) calculating a similarity ranking value for each database image based on the comparison of spatial
relationships. Steps (1) through (3) will be explained in further detail in the following.
Step (1):
[0063] The query module 330 is arranged to evaluate a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest.
[0064] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9. In the context of step (1), the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
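For instance, a touch position in display pixels might be quantised to a cell coordinate as in the following sketch, where the input-area dimensions and the function name are illustrative assumptions:

    def to_grid_cell(px, py, width, height, cells=9):
        # Quantise a touch position (px, py) on a width x height input
        # area to a cell coordinate [x, y], with x and y running from 1
        # to `cells` (here the uniform 9x9 cellular grid).
        x = min(cells, int(px * cells / width) + 1)
        y = min(cells, int(py * cells / height) + 1)
        return (x, y)

    # e.g. on a 1080 x 1080 pixel input area, a touch at (540, 90)
    # falls in cell [5, 1]
    print(to_grid_cell(540, 90, 1080, 1080))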
[0065] In this embodiment, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis. Each spatial relationship is evaluated as one of "higher", "lower", or "equal", respectively representing scenarios in which the first point of interest has a higher coordinate value than that of the second point of interest, a lower coordinate value than that of the second point of interest, or an equal coordinate value to that of the second point of interest. The evaluation of each spatial relationship is represented with a numerical value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest. The same evaluation technique is applied to the images stored in the image database 200 so as to obtain evaluated spatial relationships of the determined points of interest in each of the images stored in the image database 200 for the comparison step in step (2).

Step (2):
[0066] The query module 330 is arranged to compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of the points of interest can be evaluated in a quantifiable manner.
[0067] For example, an exemplary database image Ai may have five points of interest that correspond to the received five points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label. The corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "tree" is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "forest", "plant", and "leaves", etc.
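How a received word label finds its corresponding point of interest is left open by the description; one minimal reading is sketched below, with an illustrative hand-built table of synonyms and generalisations standing in for whatever thesaurus or ontology an implementation would actually use:

    # Illustrative synonym/generalisation table only; a real system might
    # consult a thesaurus or ontology instead.
    RELATED = {
        "tree": {"tree", "forest", "plant", "leaves"},
        "sky": {"sky", "sun", "blue", "cloud"},
    }

    def labels_match(query_label, image_label):
        # True if the database image's label is an exact match, synonym
        # or generalisation of the received query label.
        query_label, image_label = query_label.lower(), image_label.lower()
        return image_label in RELATED.get(query_label, {query_label})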
[0068] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step (1)) with the evaluated spatial
relationships of database image Ai and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a
corresponding spatial relationship between corresponding points of interest in the database image Ai.
[0069] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and vertical dimensions for the 10 spatial relationship permutations between the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match.
[0070] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image Ai is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image Ai is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison. [0071] For example, if it is evaluated in step (1) that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the corresponding two points of interest in database image Ai is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match with "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score NAi, where N for each database image is calculated using the formula N = n1 + n2 + ... + n20 (nx representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
[0072] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L becomes rounded up to 0. Therefore, L is in the range from 0 to 1. In this case, 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases to complete accuracy. [0073] The same process is repeated for database image Bi, database image Ci, etc. so as to generate a respective spatial composition similarity metric L (i.e. LBi, LCi, etc.) for each database image stored in the image database 200.
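By way of illustration only, steps (1) and (2) might be realised along the lines of the following Python sketch; the function names and the representation of points of interest as label-to-coordinate dictionaries are assumptions made for this example:

    from itertools import combinations

    def evaluate_relationships(points):
        # points maps a label to its (x, y) cell on the 9x9 grid. For
        # every pair of points and each axis (0 = x, 1 = y), record the
        # sign of the coordinate difference: 1 (higher), -1 (lower) or
        # 0 (equal), as in step (1).
        relationships = {}
        for (a, pa), (b, pb) in combinations(sorted(points.items()), 2):
            for axis in (0, 1):
                diff = pa[axis] - pb[axis]
                relationships[(a, b, axis)] = (diff > 0) - (diff < 0)
        return relationships

    def spatial_similarity(query_points, image_points, chance=7.5):
        # Count matching relationships (N) over corresponding points
        # (assumed here to share the same labels) and normalise for
        # chance: L = (N - 7.5) / (20 - 7.5), rounded up to 0 if negative.
        q = evaluate_relationships(query_points)
        d = evaluate_relationships(image_points)
        n = sum(1 for key, value in q.items() if d.get(key) == value)
        return max(0.0, (n - chance) / (len(q) - chance))

For five points of interest there are 10 pairs and therefore 20 relationships, so a query sharing 12 of the 20 relationships with a database image would give L = (12 - 7.5)/12.5 = 0.36.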
[0074] The query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the comparison of step (2). Specifically, the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to the word labels received at the input module 320. The query module 330 may be configured to perform the comparison of step (2) on the retrieved subset of database images only.
Step (3):
[0075] The query module 330 is arranged to calculate a similarity ranking value for each database image, based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step (2). In more detail, in the present embodiment the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step (2) for each of the database images stored in the image database 200. [0076] For example, database image Ai may have a generated metric LAi of 0.37, database image Bi may have a generated metric LBi of 0.20, database image Ci may have a generated metric LCi of 0.44, and database image Di may have a generated metric LDi of 0.27. According to the process for generating a spatial composition similarity metric for a database image as described in step (2), the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity. Therefore, in this example, the ranking values for database images Ai to Di would be: database image Ai similarity ranking value: 2; database image Bi similarity ranking value: 4; database image Ci similarity ranking value: 1; and database image Di similarity ranking value: 3. In other words, the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial
relationships between the points of interest.
[0077] In the present embodiment, the similarity ranking value for each database image calculated in step (3) is stored in the storage module 340.
[0078] Subsequent to the calculation of ranking values by the query module 330, the display 310 is further arranged to display the plurality of images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example as described above, the display 310 would be arranged to display database images Ai to Di in the order of: Ci, Ai, Di, and Bi. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.
[0079] Figure 4 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another exemplary embodiment. [0080] The present embodiment is similar to that described in relation to Figure 3. Each database image stored in the image database 200 comprises a plurality of points of interest. Moreover, in the present embodiment each database image stored in the image database 200 comprises at least one word label which is associated with a point of interest in the database image. This associated word label is a description of an object in the image represented by the point of interest, and it may relate to a name of an object in the image represented by the point of interest, a shape of an object in the image represented by the point of interest, or a colour of an object represented by the point of interest, etc. [0081] The process starts at step 41. In step 41, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described with relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.
[0082] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9. [0083] In addition, the input module 320 in the present embodiment comprises a microphone which is arranged to receive a voice input for each received point of interest, and the input module 320 generates a word label corresponding to each voice input.
[0084] For example, for a first received point of interest the user may utter the word "tree" towards the microphone of the input module 320. This voice input is then processed by the input module 320 to generate the word label "tree" which becomes assigned to the first received point of interest, either automatically by the input module 320 or manually by the user manipulating the touch screen display.
[0085] In the subsequent step 42, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest. In the context of step 42, the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area.
[0086] Specifically, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher, lower, or equal to that of a second point of interest with respect to the x-axis or the y-axis. The evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher, lower, or equal to that of the second point of interest.
[0087] The method then proceeds to step 43, in which the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of points of interest can be evaluated in a quantifiable manner.
[0088] For example, an exemplary database image Ai may have five points of interest that correspond to the received five points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label with a point of interest in the database image with the same/similar associated word label. The corresponding word labels in the plurality of database images can be exact matches, synonyms, or generalisations of the received word labels. For example, if the word "sky" is received as voice input at the input module 320, the query module 330 is able to retrieve any image(s) from the image database 200 which comprises word labels such as "sun", "blue", and "cloud", etc.
[0089] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step 42) with the evaluated spatial
relationships of database image Ai and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a
corresponding spatial relationship between corresponding points of interest in the database image Ai. [0090] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and vertical dimensions for the 10 spatial relationship permutations between the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match. [0091] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image Ai is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image Ai is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.
[0092] For example, if it is evaluated in step 42 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the corresponding two points of interest in database image Ai is also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image Ai is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match with "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score NAi, where N for each database image is calculated using the formula N = n1 + n2 + ... + n20 (nx representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
[0093] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L becomes rounded up to 0. Therefore, L is in the range from 0 to 1. In this case, 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, and as the value of L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query increases to complete accuracy.
[0094] The same process is repeated for database image Bi, database image Ci, etc. so as to generate a respective spatial composition similarity metric L (i.e. LBi, LCi, etc.) for each database image stored in the image database 200.
[0095] The query module 330 of the present embodiment may be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels, prior to the
comparison of step 43. Specifically, the query module 330 is arranged to retrieve any image(s) from the image database 200 which comprise an associated word label equivalent to the word labels received at the input module 320. The query module 330 may be configured to perform the comparison of step 43 on the retrieved subset of database images only.
[0096] In step 44, the query module 330 calculates a similarity ranking value for each database image based on the comparison of spatial relationships and generation of spatial composition similarity metrics in step 43. In more detail, in the present embodiment the query module 330 calculates a similarity ranking value based on the spatial composition similarity metric L in step 43 for each of the database images stored in the image database 200.
[0097] For example, database image Ai may have a generated metric LAi of 0.37, database image Bi may have a generated metric LBi of 0.20, database image Ci may have a generated metric LCi of 0.44, and database image Di may have a generated metric LDi of 0.27. According to the process for generating a spatial composition similarity metric for a database image as described in step 43, the generated metric for a database image indicates a degree of configural similarity of the database image with the received points of interest in terms of spatial relationships between the points of interest, wherein a lower score indicates a lower degree of similarity and a higher score indicates a higher degree of similarity. Therefore, in this example, the ranking values for database images Ai to Di would be: database image Ai similarity ranking value: 2; database image Bi similarity ranking value: 4; database image Ci similarity ranking value: 1; and database image Di similarity ranking value: 3. In other words, the calculated similarity ranking value indicates the ranking of overall configural similarity of a database image to the received points of interest in terms of the spatial
relationships between the points of interest. [0098] In the present embodiment, the similarity ranking value for each database image calculated in step 44 is stored in the storage module 340.
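Purely as an illustration of the example in paragraph [0097] above, the conversion from spatial composition similarity metrics to ranking values might be sketched as follows; the function name similarity_rankings is our own.

```python
def similarity_rankings(metrics):
    """Map image identifiers to ranking values: 1 for the highest
    spatial composition similarity metric L, 2 for the next, etc."""
    ordered = sorted(metrics, key=metrics.get, reverse=True)
    return {image: rank for rank, image in enumerate(ordered, start=1)}


# Worked example from paragraph [0097]:
print(similarity_rankings({'A1': 0.37, 'B1': 0.20, 'C1': 0.44, 'D1': 0.27}))
# -> {'C1': 1, 'A1': 2, 'D1': 3, 'B1': 4}
```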
[0099] After the calculation of ranking values in step 44, in step 45 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated ranking values. In the example described above, in step 45 the display 310 would be arranged to display database images A1 to D1 in the order: C1, A1, D1, and B1. The user is therefore presented with a ranked list of database images according to a level of configural similarity in terms of the spatial relationships between the points of interest.
[0100] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined. [0101] Figure 5 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention. [0102] The process starts at step 51. In step 51, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described in relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area.
[0103] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9.
[0104] In the subsequent step 52, the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 51. In the present embodiment, the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 51. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.
[0105] The method then proceeds to step 53, in which the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and the corresponding point of interest in the database image.
[0106] In the present embodiment, each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.
[0107] As such, the query module 330 in the present embodiment is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned with the numerical label '1'. The same calculation process is repeated for the other received points of interest and for each of the other database images stored in the image database 200.
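The distance calculation of step 53 might look as follows. This is a sketch only: the text does not fix a particular distance measure, so Euclidean distance over the 9x9 grid coordinates is assumed here, and the function name point_distance is our own.

```python
import math

def point_distance(query_point, image_point):
    """Distance between a received point of interest and the point of
    interest carrying the same numerical label in a database image.

    Both points are (x, y) cell coordinates on the 9x9 input grid.
    Euclidean distance is an assumption; a city-block measure would
    work equally well with the rest of the scheme.
    """
    (qx, qy), (ix, iy) = query_point, image_point
    return math.hypot(qx - ix, qy - iy)
```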
[0108] In step 54, the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area.
[0109] By way of illustrating how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area. In this particular example, any point of interest of the interrogative query received in step 51 can be at most displaced from the target by 4 cells along the x-axis or the y-axis. Based on the working approximation that any guessed input of a point of interest is equally likely to be in any location in the input area, the average displacement of a point of interest of an interrogative query (C[x,y]) by chance for a target with the coordinates [5,5] is 2.2 cells - i.e. C[5,5] = 2.2. On the other hand, a target with the coordinates [1,1] has an average displacement of an interrogative query by chance of 4 cells, i.e. C[1,1] = 4, since a guessed input of a point of interest can be at most displaced from the target by 8 cells along the x-axis or the y-axis.
[0110] Accordingly, on the basis of the working approximation that all guessed inputs of points of interest are equally likely, the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340.
[0111] In step 54, for each point of interest in a database image, the query module 330 retrieves an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generates a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances. Using the formula C = 1 - [normalised calculated distance], C for each of the received points of interest can be calculated, where a value of 0 represents chance and a value of 1 represents absolutely accurate recall. In this embodiment, any negative value of C is rounded up to 0. Also, in the present embodiment, the accuracy metric D for each database image is the average of C over all points of interest within the database image. In other words, the calculated similarity ranking value indicates the ranking of overall proximity similarity of a database image to the received points of interest in terms of the locations of the points of interest.
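The chance normalisation described in paragraphs [0109] to [0111] can be reproduced in a short sketch. The per-axis averaging convention below is inferred from the worked values C[5,5] = 2.2 (= 20/9) and C[1,1] = 4 on a 9x9 grid; the function names are our own, and the look-up table of the embodiment would simply cache these pre-computed values.

```python
def chance_displacement(tx, ty, grid=9):
    """Average displacement C[x,y] of a uniformly random guess from the
    target cell (tx, ty), computed per axis and averaged over both axes.

    Reproduces the worked values: chance_displacement(5, 5) = 20/9 ~ 2.2
    and chance_displacement(1, 1) = 4.
    """
    def axis_average(t):
        return sum(abs(t - g) for g in range(1, grid + 1)) / grid
    return (axis_average(tx) + axis_average(ty)) / 2


def accuracy_metric(distances, targets):
    """Accuracy metric D: average of C = 1 - d/C[x,y] over all points of
    interest, with any negative value of C rounded up to 0."""
    scores = [max(0.0, 1.0 - d / chance_displacement(tx, ty))
              for d, (tx, ty) in zip(distances, targets)]
    return sum(scores) / len(scores)
```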
[0112] The method then proceeds to step 55, in which the query module 330 calculates a similarity ranking value for each database image based on the calculated distances. Specifically, in the present embodiment, the similarity ranking value for each of the plurality of database images is calculated based on the accuracy metrics calculated in step 54.
[0113] For example, database image A2 may have an accuracy metric DA2 of 0.29, database image B2 may have an accuracy metric DB2 of 0.37, database image C2 may have an accuracy metric DC2 of 0.34, and database image D2 may have an accuracy metric DD2 of 0.28. According to the process for calculating an accuracy metric for a database image as described in step 54, the calculated accuracy metric for a database image indicates a degree of similarity of the database image with the received points of interest in terms of the proximity of the locations of corresponding points of interest, wherein a lower accuracy metric indicates a lower degree of similarity and a higher accuracy metric indicates a higher degree of similarity. Therefore, in this example, the similarity ranking values for database images A2 to D2 would be: database image A2 similarity ranking value: 3; database image B2 similarity ranking value: 1; database image C2 similarity ranking value: 2; and database image D2 similarity ranking value: 4.
[0114] After the calculation of ranking values in step 55, in step 56 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example described above, in step 56 the display 310 would be arranged to display database images A2 to D2 in the order: B2, C2, A2, and D2. The user is therefore presented with a ranked list of database images according to a level of similarity in terms of the proximity of the locations of corresponding points of interest. [0115] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined. For example, step 52 may be omitted from the method described above, such that no numerical labels are assigned to the points of interest and no numerical labels are pre-assigned to the points of interest of the images stored in the image database 200. Also, as another example, step 54 may be omitted from the method described above. In this case, the calculation of ranking values would not be based on the accuracy metric, but directly on the calculated distances instead. [0116] Figure 6 is a flowchart schematically illustrating various functionalities which may be provided by the system of Figure 3, according to another embodiment of the present invention.
[0117] The process starts at step 61. In step 61, the input module 320 receives input which specifies the locations for a plurality of points of interest within a displayed input area. As described in relation to Figure 3 above, the input area is displayed at the display 310 of the device 300 so as to allow a user to specify, as touch screen input, locations for a plurality of points of interest within the input area. [0118] In the present embodiment, the input area presented to the user at the display 310 is quantised into a uniform 9x9 cellular grid. The present embodiment uses the Cartesian coordinate system, wherein each point has an x-coordinate representing its horizontal position in the image and a y-coordinate representing its vertical position in the image. The spatial relationship between each received point of interest and each other point of interest comprises: a spatial relationship along the x-axis and a spatial relationship along the y-axis between each point of interest in the database image and each other point of interest in the database image. Therefore, in this embodiment, any received point of interest can be represented by a coordinate [x,y], where x and y take values from 1 to 9. [0119] In the subsequent step 62, the query module 330 assigns a numerical label to each of the plurality of points of interest received in step 61. In the present embodiment, the query module 330 is arranged to assign numerical labels according to the order in which the points of interest are received in step 61. In other words, the first received point of interest is assigned '1', the second received point of interest is assigned '2', and so forth.
[0120] In the subsequent step 63, the query module 330 evaluates a spatial relationship along the x-axis and a spatial relationship along the y-axis between each received point of interest and each other point of interest. In the context of step 63, the x-coordinate represents the horizontal position of a point of interest in the input area while the y-coordinate represents the vertical position of the point of interest in the input area. [0121] Specifically, the query module 330 evaluates each spatial relationship as a first point of interest having a coordinate value which is higher than, lower than, or equal to that of a second point of interest with respect to the x-axis or the y-axis. The evaluation of each spatial relationship is represented with a value of 1, -1, or 0, which respectively corresponds to a coordinate value of the first point of interest which is higher than, lower than, or equal to that of the second point of interest.
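The sign-based evaluation of step 63 might be sketched as follows; evaluate_relationships is our own name, and the only requirement is that the pairs are enumerated in the same order for the query as for each database image.

```python
from itertools import combinations

def evaluate_relationships(points):
    """Evaluate pairwise spatial relationships along both axes.

    points: list of (x, y) grid coordinates in numerical-label order.
    For each pairing of points, two values are produced (x-axis first,
    then y-axis), each being 1, -1 or 0 according to whether the first
    point's coordinate is higher than, lower than or equal to the
    second's. Five points yield 10 pairings and hence 20 values.
    """
    def sign(a, b):
        return (a > b) - (a < b)  # 1, -1 or 0

    rels = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        rels.append(sign(x1, x2))
        rels.append(sign(y1, y2))
    return rels
```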
[0122] The method then proceeds to step 64, in which the query module 330 compares, for each image in the image database 200, each spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image. By performing this comparison step, a configural similarity between the interrogative query and the database image in terms of locations of points of interest can be evaluated in a quantifiable manner.
[0123] In the present embodiment, each image in the image database 200 comprises at least one numerical label which is pre-assigned to a point of interest in the image according to at least one of: an order in which a user focuses his/her gaze on the determined points of interest, a degree of visual attention from a user, and a user input.
[0124] The exemplary database image A3 may have five points of interest that correspond to the five received points of interest input by the user at the input module 320. Corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned numerical label to a point of interest in the database image with the same numerical label.
[0125] The query module 330 then compares each evaluated spatial relationship of the five received points of interest (from step 63) with the evaluated spatial
relationships of database image A3 and assigns an individual score to the comparison of each evaluated spatial relationship of the received points of interest with a
corresponding spatial relationship between corresponding points of interest in the database image A3.
[0126] In this example, there are 20 possible spatial relationships, i.e. both the horizontal and the vertical dimension for each of the 10 possible pairings of the five points of interest. For each possible spatial relationship, the query module 330 determines whether there is a match.
[0127] If the corresponding spatial relationship of two received points of interest and two corresponding points of interest in the database image A3 is a match, the query module 330 assigns a value of '1' to the individual spatial relationship comparison. If the corresponding spatial relationship of two received points of interest and two points of interest in the database image A3 is not a match, the query module 330 assigns a value of '0' to the individual spatial relationship comparison.
[0128] For example, if it is evaluated in step 63 that the spatial relationship between the x-coordinate of a first received point of interest and the x-coordinate of a second received point of interest is "lower" (i.e. having a value of "-1"), and it is evaluated that the corresponding two points of interest in database image A3 are also "lower" (i.e. having a value of "-1"), then the query module 330 assigns a value of '1' to this particular comparison. However, if it is evaluated that the spatial relationship of the corresponding two points of interest in database image A3 is "higher" or "equal" (i.e. having a value of "1" or "0" which does not match "-1"), then the query module 330 assigns a value of '0' to the comparison. This comparison and value assignment step is applied to all 20 possible spatial relationships to generate an accumulated score NA3, where N for each of the database images is calculated using the formula N = n1 + n2 + ... + n20 (nx representing the individual spatial relationship comparison value for each of the 20 possible spatial relationships). The generated accumulated score N is in the range of 0 to 20, where 0 describes a complete logical reversal between the interrogative query and the database image, and 20 describes all spatial relationships matching between the interrogative query and the database image.
[0129] In practice, a proportion of spatial relationships can be expected to be matched by chance. The modal chance frequency of correct guesses is readily computable and in this case is estimated to be 7.5/20. In order to take into account the likelihood of matching spatial relationships by chance, a spatial composition similarity metric L for each database image is calculated using: L = (N - 7.5)/(20 - 7.5). In this embodiment, any negative value of L is rounded up to 0, so L lies in the range from 0 to 1. A value of 0 implies that an association between the interrogative query and the database image is as unlikely as it could be, while as L approaches 1, the likelihood that the database image is the one the user had in mind when formulating the interrogative query approaches complete accuracy. [0130] The same process is repeated for database image B3, database image C3, etc. so as to generate a respective spatial composition similarity metric L (i.e. LB3, LC3, etc.) for each database image stored in the image database 200.
[0131] The method then proceeds to step 65, in which the query module 330 calculates, for each database image stored in the image database 200, a distance between each received point of interest and the corresponding point of interest in the database image.
[0132] As mentioned, each image in the image database 200 in the present embodiment comprises at least one numerical label which is pre-assigned to a point of interest in the image. The query module 330 is arranged to calculate, for each database image, a distance between the received point of interest which is assigned the numerical label '1' and the point of interest in the database image which is pre-assigned with the numerical label '1'. The same calculation process is repeated for the other received points of interest and for each of the other database images stored in the image database 200.
[0133] In step 66, the query module 330 calculates, for each database image, an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area. [0134] By way of illustrating how a weighting factor is generated, consider a point of interest of a database image (herein referred to as a "target") which has the coordinates [5,5], i.e. corresponding to the centre cell of the 9x9 cellular grid input area. In this particular example, any point of interest of the interrogative query received in step 61 can be at most displaced from the target by 4 cells along the x-axis or the y-axis. Based on the working approximation that any guessed input of a point of interest is equally likely to be in any location in the input area, the average displacement of a point of interest of an interrogative query (C[x,y]) by chance for a target with the coordinates [5,5] is 2.2 cells - i.e. C[5,5] = 2.2. On the other hand, a target with the coordinates [1,1] has an average displacement of an interrogative query by chance of 4 cells, i.e. C[1,1] = 4, since a guessed input of a point of interest can be at most displaced from the target by 8 cells along the x-axis or the y-axis. [0135] Accordingly, on the basis of the working approximation that all guessed inputs of points of interest are equally likely, the average displacement of a point of interest of an interrogative query from a specific target can be pre-calculated and stored as a look-up table in the storage module 340. [0136] In step 66, for each point of interest in a database image, the query module 330 can retrieve an average displacement of a point of interest of an interrogative query (C[x,y]) from the look-up table and generate a corresponding weighting factor so as to normalise the corresponding calculated distance for chance. In other words, the query module 330 multiplies each of the calculated distances by its respective weighting factor so as to obtain values of normalised calculated distances. Using the formula C = 1 - [normalised calculated distance], C for each of the received points of interest can be calculated, where a value of 0 represents chance and a value of 1 represents absolutely accurate recall. In this embodiment, any negative value of C is rounded up to 0. Also, in the present embodiment, the accuracy metric D for each database image is the average of C over all points of interest within the database image.
[0137] The method then proceeds to step 67, in which the query module 330 calculates a similarity ranking value for each database image based on the distances calculated in step 65 and the comparison of spatial relationships in step 64. In the present embodiment, the calculation of similarity ranking values is based on a composite metric Q which is evaluated using the formula Q = (L + D)/2. [0138] For example, database image A3 may have a generated metric LA3 of 0.31 and an accuracy metric DA3 of 0.37, which gives a composite metric QA3 of 0.34;
database image B3 may have a generated metric LB3 of 0.29 and an accuracy metric DB3 of 0.29, which gives a composite metric QB3 of 0.29; database image C3 may have a generated metric LC3 of 0.37 and an accuracy metric DC3 of 0.35, which gives a composite metric QC3 of 0.36; and database image D3 may have a generated metric LD3 of 0.36 and an accuracy metric DD3 of 0.34, which gives a composite metric QD3 of 0.35. The composite metric indicates a degree of similarity of the database image with the received points of interest both in terms of the proximity of the locations of corresponding points of interest ("proximity similarity") and the spatial relationships between the points of interest ("configural similarity"), wherein a lower composite metric indicates a lower degree of similarity and a higher composite metric indicates a higher degree of similarity. Therefore, in this example, the similarity ranking values for database images A3 to D3 would be: database image A3 similarity ranking value: 3; database image B3 similarity ranking value: 4; database image C3 similarity ranking value: 1; and database image D3 similarity ranking value: 2.
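Again purely for illustration, the composite metric of paragraph [0137] reduces to a one-line combination of the two sketches given earlier; composite_metric is our own name.

```python
def composite_metric(l_metric, d_metric):
    """Composite metric Q = (L + D)/2, combining configural similarity
    (L) with proximity similarity (D)."""
    return (l_metric + d_metric) / 2


# Worked example from paragraph [0138]:
print(round(composite_metric(0.31, 0.37), 2))  # QA3 = 0.34
```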
[0139] After the calculation of ranking values in step 67, in step 68 the display 310 displays the plurality of database images stored in the image database 200 in a list according to the calculated similarity ranking values. In the example described above, in step 68 the display 310 would be arranged to display database images A3 to D3 in the order: C3, D3, A3, and B3. The user is therefore presented with a ranked list of database images according to both proximity similarity and configural similarity. By employing both object-centred recall, in which the proximity of individual points of interest in relation to their notional correspondents in a database image is quantified, and scene-configural recall, in which recall of the relative locations of a set of points of interest is quantified, the present embodiment provides a synergistic effect in query-processing in terms of image retrieval accuracy, for example compared to the embodiments described in relation to Figures 4 and 5.
[0140] If desired, the different steps discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more above-described steps may be optional or may be combined. For example, step 68 may be omitted from the method described above, such that the database images are not displayed. [0141] Although it is described in an embodiment above that the display and the input module of the device are integrated as a touch screen display, in alternative embodiments the display and the input module may not be integrated as a single component. In addition, in alternative embodiments, the display may comprise other types of display devices, and the input module may comprise other types of input devices, e.g. a mouse, a keyboard, etc.
[0142] In alternative embodiments, the storage modules of the devices in Figure 1 and Figure 3 may be implemented as external components of the devices. Also, in alternative embodiments, the image database may be integrated into the devices of Figure 1 and Figure 3.
[0143] In alternative embodiments, the device may not be arranged to receive word labels that correspond to received/determined points of interest.
[0144] Although it is described in an embodiment above that corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned word label to a point of interest in the database image with the same/similar associated word label, in alternative embodiments, corresponding points of interest in a database image may be determined by matching a received point of interest with an assigned numerical label to a point of interest in the database image with the same pre-assigned numerical label. [0145] Although it is described in an embodiment above that the plurality of received points of interest are assigned numerical labels according to the order in which the points of interest are received as at least part of a user query, in alternative embodiments the received points of interest may be assigned numerical labels according to other factors, e.g. the location of each point of interest within the input area. Also, in other alternative embodiments, the received points of interest may not be assigned numerical labels at all.
[0146] Although it is described in an embodiment above that the plurality of determined points of interest are assigned numerical labels according to the frequency with which each point of interest is fixated upon by the user as an image is being displayed to the user, in alternative embodiments the determined points of interest may be assigned numerical labels according to other factors, e.g. the location of each point of interest within the input area, or a duration for which the user focused their gaze on the one or more determined points of interest. Also, in other alternative embodiments, the determined points of interest may not be assigned numerical labels at all.
[0147] Although it is described in an embodiment above that receiving the at least one word label comprises receiving a voice input for each received/determined point of interest, in alternative embodiments the at least one word label may be received by other methods, e.g. keyboard input. In these alternative embodiments, the input module may comprise a keyboard.
[0148] Although it is described in an embodiment above that the input area presented to the user at the display is quantised into a uniform 9x9 cellular grid, in alternative embodiments the input area may be quantised with a different grain of analysis, e.g. a 15x15 cellular grid, a 6x8 cellular grid, etc., according to requirements of the system and the sizes/dimensions of the database images in the image database.
[0149] In alternative embodiments, the system may not be further arranged to generate a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels prior to a comparison of spatial relationships of points of interest.
[0150] Although it is described in an embodiment above that the weighting factor associated with a received point of interest is generated based on the average displacement of a point of interest of an interrogative query from a specific target, in alternative embodiments the weighting factor may be generated based on other factors.
[0151] In alternative embodiments, the images stored in the image database may each comprise an identification number, and the display may be arranged to display the plurality of database images in a list according to the calculated ranking values together with their respective identification numbers.
[0152] In alternative embodiments, for each database image, a spatial relationship between each point of interest and each other point of interest may be pre-evaluated and stored together with the database image. [0153] Although it is described in an embodiment above that the tracking module comprises an eye tracker including a camera for recording movement of one or both eyes of the user, in alternative embodiments the tracking module may comprise other types of tracking device such as an eye-attached tracking device or an electrical potential measuring device.
[0154] In alternative embodiments, the control module of the device may be arranged to perform evaluation of a spatial relationship between each determined point of interest and each other determined point of interest in the manner described in relation to Figure 3. In these alternative embodiments, the storage module of the device may be arranged to store a numerical representation of the evaluated spatial relationship of a database image together with the database image. [0155] Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware, and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an exemplary embodiment, the application software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate, or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. [0156] Although various aspects of the present disclosure are set out in the independent claims, other aspects of the present invention may comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
[0157] It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims. For example, in some of the embodiments described throughout this description, word labels are assigned to received points of interest and in some other embodiments numerical labels are assigned to the received points of interest. It will be appreciated that in some alternative embodiments, the step of assigning word labels may be replaced with the step of assigning numerical labels, and vice versa. In some further alternative embodiments, the method may comprise assigning word labels and numerical labels to the received points of interest. In addition, in some further alternative embodiments, the method may comprise assigning other types of alphanumeric descriptors to the received points of interest.
Appendix A
In order to illustrate methods of the present invention in more detail, descriptions of a set of 27 images that can be used in relation to the method shown in Figure 3 are included as follows. It is noted that the descriptions provided in the following merely serve as examples of image content and potential points of interest that a user may focus on during an eye-tracking operation in the method of Figure 3. It will be understood that other images with different points of interest may be shown to users for the purpose of storing the images in a database. Image 1: University Campus
During the last sunshine of autumn, a student is riding a bike to a lecture, others are walking to the nearby dining hall or walking home. Another student is sitting on a bench rehearsing the last seminar. The leaves of the trees are shining in autumnal colours above the green lawn.
Image 2: Basketball Game
An offensive player is shooting the ball for three points while the defensive effort of the player with the headband will be too late. The other players are trying to get into a good position for a potential rebound. The display panel under the roof is also showing the game from a different viewpoint.
Image 3: Building Site
In a small town, preparations are being made to renew the pipeline system. The workers are currently having their lunch break in a red van, while the man in charge is sitting in a white van behind it. The materials, which include bricks and red and black connectors, are placed where the works are set to continue, along with a traffic sign that was removed from a nearby street.
Image 4: Cargo Ship
One of the world's largest cargo ships, the "Colombo Express", is approaching a harbour to have its hundreds of containers unloaded. The pilot vessel at its side looks small. The lifeboat at the stern is capable of transporting all of the crew.
Image 5: Car Park
This place is usually a car park for a logistics company. Due to additional works, also indicated by the shovel at the front, a green tractor with a drill at the back has been left at the car park during a break. The other cars, most of them middle-class compact cars, are undamaged.
Image 6: Cafe in the Desert
Somewhere in the desert, a traveller is parking his red convertible outside a derelict cafe. At this place, where two houses and an ice cream shop once stood, he is hoping to find some leftover refreshment, but the only things that still remain are the telephone and light posts, which still connect the cafe to the rest of the world. Image 7: Fjord in Spring
This Fjord is the place where many fishermen start their day. One of them lives in the old grey house with the back garden. From here he has a wonderful view across the Fjord with its blue water, the hills at the shores and fishing boats. Image 8: Flock of Sheep
A flock of sheep are grazing in front of a grey wooden house. The house was built on a slope and has two storeys. It also has a veranda on the right. There is even a small hut hidden in the forest. Image 9: Hill in Cloud
A red vintage tractor is left on top of three garages which were built into the field in front of this mixed forest. The hill on which this forest is growing is engulfed in rain clouds, which might be a reason for the driver to leave the tractor outside. Image 10: School in Field
Before the 19th century, this old single storey school with iron swings in the back garden was home to the children of settlers in the west of the USA. The view from the school goes beyond the bell tower to the mountains in the far distance, back to the swings next to the school, where the children used to play during school breaks.
Image 11: Italian Port
This small port in Italy is dominated by the yellow four-storey house right at the port's footbridge. Some of the house's green window shutters are left open. The blue sign indicates a nautical shop. All places at the footbridge are occupied by small sailing and motor boats. Image 12: Mountain Panorama
A mountaineer is refilling his water bottle at high altitude in the Swiss Alps. In the yellow rucksack are all the necessary items for a long hike. The drinking trough is usually for the cows which spend the summer months on higher grasslands. In the background are the snow covered mountain tops.
Image 13: Narrow Street
The narrow street, which leads down to the port of an old fishing village, is just wide enough for the white Range Rover. A tourist at the side gives way. A bin protrudes from one house entrance at the side, and the old church tower can be seen in the distance.
Image 14: NASA Hangar
In front of this NASA Hangar is a selection of planes, ranging from small prop airliners in the back via a blue jet fighter in the middle to larger passenger airplanes in the front. Even a helicopter is on display to the left. Behind the hangar are a few radio towers.
Image 15: Old Settlement
This is one of the first settlements in North America. The sign gives the date of the foundation, and the building behind it later housed the town mayor, as the American flag at the entrance shows. The road in front of the house, the Bayley-Hazen Road, leads to the town's school.
Image 16: Palace Gardens
These palace gardens in France open their gates to the public in summer to show what lies behind the richly decorated iron garden fence. The white fence encloses a collection of red tulips, green hedges and a tree which have grown on the premises for decades. The fence also has stone crowns to show the wealth of the former owners.
Image 17: Paris
Art dealers set up their shops alongside a busy road in the French capital with the most famous Parisian cathedral in the background. Behind the trees are the towers and spires of the cathedral. Spectators walk past the exhibited pictures and reprints that are on display. Image 18: Pedestrian Crossing A pedestrian is patiently waiting at this crossing in Vienna. The red sign indicates the Museums Quarter in this city, which seems to start behind the pedestrian. The traffic on this day is very busy, as is obvious from the cars standing on and around the crossing.
Image 19: Icy River
This picture of a river was taken in January. The low temperatures at this time of the year make it form icicles where the water falls several centimetres. Nonetheless, underneath the snow mantle, there are the first signs of Spring.
Image 20: Sailing Vessel
An officer on the deck of a sailing vessel is silently watching another vessel which is closer to the port. Visitors are on board the ship with the golden ornaments while sailors are climbing on the masts. The ropes on the officer's vessel are neatly aligned on the wooden deck.
Image 21: Space Engineers
Some space engineers are discussing a problem with the casing of a communication satellite. Their blue badges identify them as NASA employees. The two engineers in the front are talking about the thermal protection shield whereas the engineers in the back are inspecting the bolts of the same shield.
Image 22: Space Observatory
To be unaffected by other light sources, space observatories are built far away from modern civilisation. These two observatories with their big domes house the telescopes and are connected via a small road.
Image 23: San Francisco Suburb
As night falls people arrive home in their yellow, blue, white or green wooden houses in a suburb of this famous American city. As the street lights are turned on, life still goes on downtown in the skyscrapers and nightclubs. The tallest building in the centre is overlooking the rest of the city.
Image 24: Tower House
This newly refurbished tower house is rehousing nurses. Some nurses have just come home from their day shift with their file folders. Its three storeys were the home of the biggest vase manufacturing company in Europe. The tower with its rounded silver top still remains the symbol of the town.
Image 25: English Village
In the centre of an English village lies a small park with a tree, a lawn and a hedge. The park itself is surrounded by old buildings with shops and small streets. Old street lights and benches in the park round off the image of a cosy, small, old village.
Image 26: Windmill
A Mediterranean windmill with a pitched red roof and small rectangular windows is set on top of a hill overlooking the surrounding areas. It also houses a restaurant with outside dining area, popular with tourists and locals. The green door at the front leads to another garden area adjoining the brown bricks of the building's foundation. Image 27: Windsurfing
A strong wind blows the contestants of a regional windsurfing championship to high speed. With the rocks in the back the surfer at the front has taken the lead by more than 15 lengths ahead of the last surfers with the red and green sails who have just passed the surface marker buoy.

Claims

1. A method of processing an interrogative query for a plurality of images stored in an image database, comprising:
receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area;
evaluating a spatial relationship between each received point of interest and each other point of interest;
for each image in the database, comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and
calculating a similarity ranking value for each database image based on the comparison of spatial relationships.
2. The method of claim 1, further comprising displaying the plurality of database images in a list according to the calculated similarity ranking values.
3. The method of claim 1 or claim 2, wherein a spatial relationship along the x-axis and a spatial relationship along the y-axis are evaluated between each received point of interest and each other point of interest.
4. The method of claim 3, wherein each spatial relationship is evaluated as a first point of interest having a coordinate value which is higher, lower or equal to that of a second point of interest with respect to the x-axis or the y-axis; and
the evaluation of each spatial relationship is represented with a numerical value of 1, -1 or 0 respectively.
5. The method of claim 4, wherein the comparison of spatial relationships comprises determining whether the numerical values representing the evaluation of the two spatial relationships are a match, and assigning a score to the comparison based on the determination.
6. The method of any preceding claim, further comprising receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.
7. The method of claim 6, wherein the receiving word labels comprises receiving a voice input for each received point of interest and generating a word label
corresponding to each voice input.
8. The method of claim 6 or claim 7, further comprising generating a subset of database images which comprise points of interest having word labels which
correspond to the plurality of received word labels.
9. The method of any one of claims 6 to 8, wherein the corresponding word labels in the plurality of database images are synonyms or generalisations of the received word labels.
10. The method of any one of claims 1 to 5, further comprising:
assigning a numerical label to each of the plurality of received points of interest;
wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.
11. The method of claim 10, wherein the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received.
12. The method of claim 10, wherein the plurality of received points of interest are assigned with numerical labels according to the location of each point of interest within the input area.
13. The method of claim 2 and of any of claims 3 to 12 when dependent on claim 2, wherein displaying the plurality of database images in a list according to the calculated similarity ranking values comprises displaying the plurality of database images in an order of highest similarity ranking value to lowest similarity ranking value.
14. A method of processing an interrogative query for a plurality of images stored in an image database, comprising:
receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area; for each image in the database, calculating a distance between each received point of interest and a corresponding point of interest in the database image; and
calculating a similarity ranking value for each database image based on the plurality of calculated distances.
15. The method of claim 14, further comprising displaying the plurality of database images in a list according to the calculated similarity ranking values.
16. The method of claim 14 or claim 15, further comprising:
calculating an accuracy metric based on the plurality of calculated distances and a weighting factor which is generated according to the location of the received point of interest within the input area;
wherein the similarity ranking of database images is based on the calculated accuracy metric for each database image.
17. The method of any of claims 14 to 16, further comprising receiving at least one word label for each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding word labels.
18. The method of claim 17, wherein the receiving word labels comprises receiving a voice input for each received point of interest and generating a word label
corresponding to each voice input.
19. The method of claim 17 or claim 18, further comprising:
generating a subset of database images which comprise points of interest having word labels which correspond to the plurality of received word labels.
20. The method of any one of claims 17 to 19, wherein the corresponding word labels in the plurality of database images are synonyms or generalisations of the received word labels.
21. The method of any of claims 14 to 16, further comprising assigning a numerical label to each of the plurality of received points of interest, wherein the corresponding points of interest in the database image are points of interest having corresponding numerical labels.
22. The method of claim 21, wherein the plurality of received points of interest are assigned with numerical labels according to the order in which the points of interest are received.
23. The method of claim 21, wherein the plurality of received points of interest are assigned with numerical labels according to the location of each point of interest within the input area.
24. The method of claim 15 and of any of claims 16 to 23 when dependent on claim 15, wherein displaying the plurality of database images in a list according to the calculated similarity ranking values comprises displaying the plurality of database images in order of highest similarity ranking value to lowest similarity ranking value.
25. A method for processing an interrogative query for a plurality of images stored in an image database, comprising:
receiving a query comprising a user input specifying locations for a plurality of points of interest within an input area;
evaluating a spatial relationship between each received point of interest and each other point of interest;
for each image in the database:
comparing each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image;
calculating a distance between each received point of interest and a corresponding point of interest in the database image; and
calculating a similarity ranking value for each database image based on the plurality of calculated distances and the comparison of spatial relationships.
26. A device for processing an interrogative query for an image database storing a plurality of images, comprising:
a display for displaying an input area;
an input module for receiving an input specifying locations for a plurality of points of interest within the input area; and
a query module configured to:
evaluate a spatial relationship between each received point of interest and each other point of interest; for each image in the database, compare each evaluated spatial relationship with a corresponding spatial relationship between corresponding points of interest in the database image; and
calculate a similarity ranking value for each database image based on the comparison of spatial relationships.
27. The device of claim 26, wherein the display is further configured to display the plurality of database images in a list according to the calculated similarity ranking values.
28. The device of claim 26 or claim 27, wherein the input module comprises at least one of: a mouse, a touch screen display apparatus, and an eye-tracking apparatus.
29. A method of structurally storing an image in an image database, comprising: displaying the image, which is pending storage, to a user for a predetermined period of time;
tracking an eye movement of the user over the time period;
determining one or more points of interest which are fixated upon by the user during the time period;
storing the image with data representing the one or more points of interest.
30. The method of claim 29, further comprising:
displaying the one or more determined points of interest to the user;
receiving at least one word label for each of the displayed points of interest; and storing the image with the at least one word label for each of the one or more points of interest.
31. The method of claim 30, wherein the receiving at least one word label comprises receiving a voice input for each determined point of interest and generating a word label corresponding to each voice input.
32. The method of claim 29, further comprising:
assigning a numerical label to each of the one or more determined points of interest; and
storing the image with the numerical label for each of the one or more points of interest.
33. The method of claim 32, wherein the one or more determined points of interest are assigned with numerical labels according to the frequency with which each point of interest is fixated upon by the user during the time period.
34. The method of claim 32, wherein the one or more determined points of interest are assigned with numerical labels according to the location of each point of interest within the input area.
PCT/GB2017/052658 2016-09-12 2017-09-11 Image storage and retrieval WO2018046959A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1615451.0 2016-09-12
GBGB1615451.0A GB201615451D0 (en) 2016-09-12 2016-09-12 Image storage and retrieval

Publications (1)

Publication Number Publication Date
WO2018046959A1 true WO2018046959A1 (en) 2018-03-15

Family

ID=57234768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2017/052658 WO2018046959A1 (en) 2016-09-12 2017-09-11 Image storage and retrieval

Country Status (2)

Country Link
GB (1) GB201615451D0 (en)
WO (1) WO2018046959A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060251292A1 (en) * 2005-05-09 2006-11-09 Salih Burak Gokturk System and method for recognizing objects from images and identifying relevancy amongst images and information
WO2008142675A1 (en) * 2007-05-17 2008-11-27 Link-It Ltd. A method and a system for organizing an image database
WO2010128511A1 (en) * 2009-05-06 2010-11-11 Superfish Ltd. Method for organizing a database of images and retrieving images from that database according to a query image
US20130083999A1 (en) * 2011-09-30 2013-04-04 Anurag Bhardwaj Extraction of image feature data from images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHIOU-TING HSU ET AL: "Content-based image retrieval by interest-points matching and geometric hashing", VISUAL COMMUNICATIONS AND IMAGE PROCESSING; 20-1-2004 - 20-1-2004; SAN JOSE,, vol. 4925, 1 September 2002 (2002-09-01), pages 80 - 90, XP002305780, ISBN: 978-1-62841-730-2, DOI: 10.1117/12.481572 *
KEVIN JING: "Our crazy-fun new visual search tool | Blog", 8 November 2015 (2015-11-08), XP055430774, Retrieved from the Internet <URL:https://blog.pinterest.com/en/our-crazy-fun-new-visual-search-tool> [retrieved on 20171130] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221025A (en) * 2020-01-21 2021-08-06 百度在线网络技术(北京)有限公司 Interest point recall method, device, equipment and medium
CN113221025B (en) * 2020-01-21 2024-04-02 百度在线网络技术(北京)有限公司 Point-of-interest recall method, device, equipment and medium
CN112287055A (en) * 2020-11-03 2021-01-29 亿景智联(北京)科技有限公司 Algorithm for calculating redundant POI data according to cosine similarity and Buffer

Also Published As

Publication number Publication date
GB201615451D0 (en) 2016-10-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17768211

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17768211

Country of ref document: EP

Kind code of ref document: A1