EP1820125A1 - Adaptation of time similarity threshold in associative content retrieval - Google Patents

Adaptation of time similarity threshold in associative content retrieval

Info

Publication number
EP1820125A1
EP1820125A1 EP05821605A EP05821605A EP1820125A1 EP 1820125 A1 EP1820125 A1 EP 1820125A1 EP 05821605 A EP05821605 A EP 05821605A EP 05821605 A EP05821605 A EP 05821605A EP 1820125 A1 EP1820125 A1 EP 1820125A1
Authority
EP
European Patent Office
Prior art keywords
time
content item
candidate
distance
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05821605A
Other languages
German (de)
French (fr)
Inventor
Elmo M.A. Diederiks
Bartel M. Van De Sluis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV
Publication of EP1820125A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/93 - Document management systems

Definitions

  • the present invention relates to the field of content retrieval, management and presentation, and to content item similarity threshold determination based on time usage and metadata.
  • a base time is determined. Such a base time may, for example, be a current time.
  • a first time is identified by extracting time data for a first identified content item. Then a first threshold may be set based on a criterion distance in time determined between the base time and the first time.
  • a candidate time may be identified and the time data for candidate content item extracted.
  • a distance between the base time and the first candidate time may be determined as a candidate distance.
  • a candidate content item may be selected as similar for organization of the database or for retrieval based on the first candidate distance in time and the first threshold, and a selection signal for the selected candidate content is output, accordingly.
  • criterion distance in time-determined granularity for setting a threshold is provided, according to which the threshold is set such that distance granularity is higher for times closer to the base time than for times further away from the base time.
  • a second threshold based on the criterion distance in time may be set, which second threshold together with the first threshold comprises a range, and then candidate content items are selected if the first candidate distance in time is within the range.
  • the first times may include a time of content item acquisition, a time of content item last usage, or a time of content item most usage.
  • the time may be a content item base time, a content item most recent modification time, or a content item creation time. Further, additional content items may be identified and their times and distances determined, so that the first threshold may also be set based on these criterion distances in time.
  • Figure 1 is a schematic view of a retrieval system according to an embodiment of the present invention.
  • Figure 2 is a flowchart of an operation of a system according to an embodiment of the present invention.
  • the retrieval system 1-1 includes several modules, which will be described below. Modules of the retrieval system 1-1, or portions thereof, and/or the retrieval system as a whole, may be comprised of hardware, software, firmware, or a combination of the foregoing; for example, some modules may be comprised of hardware, while other modules may be comprised of software, firmware or a combination thereof. It is to be understood that modules of the retrieval system need not all be located or integrated with the same device. A distributed architecture is also contemplated for the retrieval system, which may "piggy-back" off of suitable modules provided by existing devices.
  • the following description will refer to a retrieval system 1-1 that is physically integrated with or connected to a database 1-2 via a wired or wireless connection thereto.
  • a clock (not shown) may also be integrated with or connected to the retrieval system 1-1.
  • the database 1-2 may be embodied on a storage device such as on a hard drive of a personal computer, a personal video recorder, an entertainment system, an electronic organizer, a personal handheld device, a Jaz drive, or may be embodied as a commercial storage facility, such as a disk drive. It will be understood that the database 1-2 may include several storage devices that are connected, such that organization or grouping of content items on two or more of such devices is possible.
  • the database may be understood to include one or more storage media, such as disks, including CDs, DVDs, zip disks, floppy disks, data cartridges, or the like, which can be loaded onto and retrieved by the database 1-2.
  • the retrieval system 1-1 is also capable of retrieving content via a network 1-9, such as a LAN, WAN, the internet, or the like.
  • the retrieval system 1-1 includes a time data extractor 1-11, which is a module that collects certain types of data from a content item.
  • the content item may be a video, or a video clip, a movie, a photo, a text file, music data, an audio file, or other type of multimedia data, a JPEG file, or XML data.
  • the video may be a home video shot on a digital video recorder
  • the movie may be commercially distributed film data, such as a film encoded as MPEG (including MPEG-2, MPEG-3, or the like)
  • the photo may be digital photograph data, or a series of photographs or a photograph album
  • the text file may be a word processor produced file, a spreadsheet, or a computer code file
  • the music data may be an MP3 file or the like, and so forth.
  • the description data extracted by the time data extractor 1-11 includes information, such as metadata or usage data about the content item.
  • information may also include time data for the content item, such as the time of creation of the item; the time of acquisition of the item; the last, first, penultimate, et cetera, time of playback and/or editing of the content item; and a time of most usage, for example, the item is mostly used around 8 PM, or on a given day of the week, month, or year, or the item is mostly used at night, or the like. "Mostly" as used herein may be based on an average use time, a median use time, a mode of use time, or the like.
  • usage history data is sometimes known as metadata, and conversely, types of metadata are sometimes referred to as usage history data.
  • the time information discussed herein may be one or many such similarity dimensions, or it may be the only or the most weighty dimension. The degree to which such factors are weighted (if at all) would depend on the application and the needs of the user.
  • the identified content item may be identified in one of several ways.
  • a user may designate the item based on which other items, sometimes referred to as "candidate content items", are to be retrieved.
  • a content item newly added or created may automatically be designated as an identified content item based on which other items are to be retrieved.
  • a base time is determined by base time determiner 1-13.
  • Such a base time may be a current time entered or set by the user, previously programmed, determined with reference to a clock (not shown), or determined from the internet or another network, or by a combination of the foregoing.
  • the base time, the time associated with the identified content items and the candidate content items may each include a date and/or time.
  • a date without time will be sufficient, or even more relevant.
  • both time and date would be used. It will be understood that such date information and the time information could be converted to a format that will facilitate computation of a distance in time and comparison with other dates and times.
  • the time data extractor 1-11 determines a time associated with the identified content item or items and determines the distance in time (that is, the amount of time that has elapsed) between the time associated with the identified content item(s) and the base time. This distance is sometimes called a first criterion distance in time.
  • the time associated with the identified content item or items may be determined by reference to metadata associated with the content item, a database index, or by reference to the network 1-9, including for example the world wide web, by requesting user input, or a combination of the foregoing.
  • the distance in time may be determined by referring to a table, by computation, by requesting user input, or by a combination of the foregoing.
  • Threshold setter 1-14 sets a threshold or range that candidate content items must meet to be selected. The threshold or range is set by threshold setter 1-14 based on the first criterion distance in time.
  • Candidate content item identifier 1-12 identifies candidate content items in the database, over the network connection or from other sources, that are similar with respect to their metadata or other information and/or based on how their distance in time from the base time compares with the distance in time of the first identified content item from the base time.
  • Controller 1-15 coordinates overall functioning of the retrieval system 1-1 and interacts with user interface 1-3, the database 1-2, the network 1-9, and the outside generally, and handles system settings.
  • Selector 1-16 selects qualifying candidate content items and result output 1-17 provides a results signal for the selected and/or the rejected candidate content items.
  • Result output 1-17 interfaces with other devices and handles communication with the outside, including interfacing with a user (not shown).
  • retrieval result output 1-17 signals to the user interface 1-3 the content items retrieved by the retrieval system 1-1.
  • User interface 1-3 may be a separate device or may be integrated with another device or system, such as a personal computer or a personal video recorder, or one or more of the storage and other devices enumerated above.
  • this process of time metadata and/or usage extraction and distance in time determination may be repeated for any number of available identified content items 1-N, N being a positive integer greater than 1. Then, the candidate content item selection is performed based on an average of all such criterion distances in time.
  • a first content item is identified, as described above, by a user via user interface 1-3 shown in Figure 1, or automatically by the system, for example by a detection of a newly added content item or an isolated content item in database 1-2.
  • base time determiner 1-13 determines a base time, as discussed above.
  • Time data extractor 1-11 of retrieval system 1-1 extracts first time data for the first content item identified, as stated at S2 of Figure 2.
  • additional identified content items may be similarly processed (time data extracted for first-Nth identified content items), for example if the user or the system designates several "anchor" documents based on which target documents are to be retrieved.
  • a criterion distance in time is determined by time data extractor 1-11 for each identified content item, by determining a distance in time between the base time and the time of the identified content item.
  • such criterion distances in time may be averaged to arrive at an average criterion distance in time.
  • average may be determined based on a computation of the arithmetic mean, mode, or median. Further, a simple sum of the values may be used, as may some other statistical function suitably selected to provide a composite view of the selected values.
  • a threshold or range is set based on the criterion distances in time or the average criterion distance in time.
  • a threshold may be assigned such that a value of 1 or 0 may indicate a very small distance in time between the base time and the first identified content item, while a value of 9 or 10 may indicate a great distance in time.
  • thresholds may represent for example, “identical time”, “very close time”, or “close time,” “distant time” or “very distant time” or some such designation. It will be understood that numerous other schemes for such values may be used without departing from the spirit of the present invention.
  • a second threshold may similarly be chosen.
  • the first threshold thus may represent a maximum distance in time
  • the second threshold may represent a minimum distance in time, thus together the thresholds comprising a range.
  • Candidate items would be selected only if their distance (the distance between the base time and the candidate content item time) falls within the range.
  • candidate content item identifier 1-12 of Figure 1 identifies candidate content items in the database 1-2, over a network or elsewhere, while time data extractor 1-11 (Figure 2) extracts time data for each of the candidate content items.
  • the process of distance in time determination for the candidate content item is then performed at S7. Further candidate content items may also be available, and the process of extracting the time data and determining distance in time values would continue for candidate content items 1-M.
  • a distance in time of the candidate content item is compared to the threshold or range by selector 1-16. If the distance is under the threshold or within the range then it is selected.
  • a distance in time of two hours, representing the distance between the base time and the first identified content item is the first criterion distance in time.
  • a threshold distance in time is set, for example, as "4 hours", as "same day", as "same period of day", as "close distance in time", or as "4" (4 being an integer assigned from 0-9, where 0 means substantially the same time and 9 means very distant in time).
  • a distance of a candidate content item is compared with this threshold, and the candidate item is selected, at S8, if the distance in time of the candidate content item from the base time is within 4 hours, within "same day", within "same period of day", within the "close distance in time", or within the distance in time scale "4" threshold.
  • the threshold is set such that distance granularity is higher for times closer to the base time than for times further away from the base time. Therefore, for example, if the distance from the base time were to be ranked on a scale of 1 to 10, then as the distance from the base time increased, longer distances would be encompassed by fewer gradations of the scale.
  • the criterion distance in time (the distance between the base time and an identified document or content item)
  • a first candidate content item might be judged not similar if the candidate distance from the base is 5 hours.
  • a second candidate document might be judged similar even if the second candidate distance from the base is 6000 days.
  • Such thresholding is based on the idea that often people intuitively think of differences in distance in time between instances in the more distant past as less important than between equally distant times in the more recent past: the farther in distance in time one moves from the relevant base time, the less important, in terms of determining similarity, are the distances in time between instances.
  • Such thresholding is sometimes referred to herein as criterion distance in time-determined granularity thresholding.
  • a range may also be generated at S5 by threshold setter 1-14, using a maximum and a minimum threshold, based on the spread in the set of identified content item distance values.
  • the maximum threshold would be as described, and a minimum threshold, for example, “different hour,” “at least 1 hour,” “very close in time,” or as distance ranking 2.
  • the range for content items that are selected would be "different hour, same day,” “different hour, same period of day,” “1-4 hours,” “very close distance in time-close distance in time,” or “2-4” scale, depending on the system of thresholding/ranges used.
  • multiple “base” times could be used, and the criterion distance in time-determined granularity would be applied for each such base time separately.
  • an actual current time could be a first base time, and the time of a significant event in the past could be a second base time.
  • the level of granularity would decrease with distance from base time 1 (the further in the past the candidate document time, the greater the amount of elapsed time that would be considered similar), and would similarly decrease with distance in time from base time 2.
  • the idea is that for a person who, for example, had a wedding on a particular day, the differences in time closer to that second base time would matter more and therefore would need a higher granularity.
  • Such second, third, L-th, et cetera, (L being an integer greater than 3) base time could be set by the user or determined by the system according, for example, to the ways of determining base times discussed above.
  • a significant number or percentage of documents associated with the user, for example, documents residing in the user's computer, database, handheld, et cetera
  • a content item time for example, date/time of creation or last use or the like
  • a significant number of content items for example, using a threshold or based on a statistical function showing a significantly higher than normal concentration of content items, for purposes of illustration, wedding pictures, wedding video, music, e-mails, on or near the date of the user's wedding in the past, could be determined as such an additional base time.
  • Criterion distances in time could then be determined, and thresholds set according to the criterion distance in time-determined granularity, based on such additional base times.
  • the content item retrieved may be of a content type different from the content type of the user-selected content item.
  • the user-selected content item is of the type music file, or MP3
  • the retrieved content item may be of the content type photograph data. In this way, for example, pictures of a certain genre may be retrieved to match user-selected music based on similarity in time.
  • This (or these) selected candidate content item(s) are provided to the user or to the user interface 1-3 at S9.
  • a signal may be provided directly to the database 1-2 to cause retrieval of the selected candidate item to the database or to the user interface 1-3.
  • a signal may be provided if a candidate content item is rejected.
  • a notification may be provided to user interface 1-3 to notify a user (not shown) of a retrievable content item.
  • the notification may consist of an identification of the content item to be retrieved, a description of the content item, a URL or a link to the content item, a retrieval of the entire content item or a portion thereof, or a combination of the foregoing.
  • the system may also be used to group the retrieved item selected with the anchor item to organize a database. At S10, processing terminates.
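The bullets above mention averaging the criterion distances of several identified ("anchor") items and deriving a range from the spread of those distances. The following is a minimal sketch of that idea, not the applicants' implementation. The function names are hypothetical, and taking the smallest and largest anchor distances as the minimum and maximum thresholds is an assumption, since the source does not state how the spread is converted into a range.

```python
from datetime import timedelta
from statistics import mean, median
from typing import List, Tuple


def average_criterion_distance(distances: List[timedelta], how: str = "mean") -> timedelta:
    """Combine several criterion distances into one (mean or median)."""
    seconds = [d.total_seconds() for d in distances]
    value = median(seconds) if how == "median" else mean(seconds)
    return timedelta(seconds=value)


def range_from_spread(distances: List[timedelta]) -> Tuple[timedelta, timedelta]:
    """Assumed rule: minimum and maximum thresholds are the extreme anchor distances."""
    return min(distances), max(distances)


# Three anchor items, 1, 2 and 6 hours away from the base time.
anchor_distances = [timedelta(hours=1), timedelta(hours=2), timedelta(hours=6)]
mean_distance = average_criterion_distance(anchor_distances)
median_distance = average_criterion_distance(anchor_distances, how="median")
low, high = range_from_spread(anchor_distances)
```

A candidate would then be selected if its own distance from the base time lies between `low` and `high`.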

Abstract

Retrieval of similar content items or documents is provided based on similarity of an associated time, such as time of creation or usage. A time of one or more identified anchor document(s), in a database for example, is determined by extracting time metadata. Then a first threshold is set based on a criterion distance in time determined between the base time or current time and the anchor item time. A candidate content item time is identified and the time metadata for the candidate content item may be extracted. A distance in time between the base time and the candidate time may be determined as a candidate distance in time. A candidate content item may be selected as similar for retrieval based on the first candidate distance in time and the first threshold. The notion of criterion distance in time-determined granularity for setting a threshold is provided.

Description

ADAPTATION OF TIME SIMILARITY THRESHOLD IN ASSOCIATIVE CONTENT RETRIEVAL
The present invention relates to the field of content retrieval, management and presentation, and to content item similarity threshold determination based on time usage and metadata.
In recent years, the storage capacity of storage devices and databases, including hard drives on personal computers and other types of storage media, has been rapidly increasing. Storage capacity, by some estimates, doubles approximately every year, while network bandwidth has also been increasing very rapidly. As a result, storage devices store a greater amount of content to which user access needs to be facilitated. A user can be overwhelmed with content stored on a storage device or database, even on the user's own hard drive, and may not be able to retrieve content that is available on a network, such as the internet, unless the content is somehow managed or organized to provide convenient access for the user. Content that is not indexed or organized in a manner transparent to the user may be "lost" as far as the needs of the user are concerned and be unlikely to be retrieved.
Many data retrieval schemes are known. Farnham et al., U.S. Patent Application Publication No. 2003/0158855, discloses automatic context associations, in which associations are dynamically generated between objects or metadata, such that a degree of similarity, represented as a numeric value, between computer files is determined. Stubler et al., U.S. Patent Application Publication No. 2002/0188602, discloses generation of captions or semantic labels for acquired images based on similarity of the acquired image with stored images by extracting metadata for the acquired image. Platt, U.S. Patent Publication No. 2003/0221541, discloses an automatic playlist generator, in which several seed songs, including "undesirable seed" songs, are used to generate songs on a playlist. Cluts, U.S. Patent No. 5,616,876, discloses selecting additional songs that are like a first set of songs, based on "style labels" for each song previously written by an editor. Gargi, U.S. Patent Publication No. 2004/0098362, discloses an automated propagation of document metadata, including a time of creation. However, none of these references discloses setting a threshold for time similarity in selecting or rejecting target items. Prince, U.S. Patent Application Publication No. 2002/0099696, discloses fuzzy database retrieval in which a degree of similarity is given a score and thresholds are used to select items to be retrieved. However, neither Prince nor the other references discloses or suggests setting a threshold based on a base time, nor setting the threshold based on a distance in time between the base time and an identified item time.
It is also possible, of course, for a user to retrieve content items; however, attempting to locate similar items can be a time-consuming and onerous job, particularly if the content type of desirable items is not known or specified by the user. Further, as content items continue to accumulate in a storage device or database controlled by the user, the job of retrieving content items becomes ever more difficult.
Provided are a method, system, device, apparatus, and computer-readable media that embody or carry out the functions of a retrieval system. The selected candidate content item or items are provided. A base time is determined. Such a base time may, for example, be a current time. A first time is identified by extracting time data for a first identified content item. Then a first threshold may be set based on a criterion distance in time determined between the base time and the first time. A candidate time may be identified and the time data for the candidate content item extracted. A distance between the base time and the first candidate time may be determined as a candidate distance. A candidate content item may be selected as similar for organization of the database or for retrieval based on the first candidate distance in time and the first threshold, and a selection signal for the selected candidate content is output accordingly.
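The steps summarized above (determine a base time, derive a criterion distance from the identified item, set a threshold, compare each candidate distance) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed method: the function names are hypothetical, and the rule that the threshold is twice the criterion distance is an assumption chosen only because it reproduces the source's example of a 2-hour criterion distance with a 4-hour threshold.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical factor: the source does not state a formula for the threshold.
THRESHOLD_FACTOR = 2.0


def distance_in_time(base_time: datetime, item_time: datetime) -> timedelta:
    """Elapsed time between an item's time and the base time (never negative)."""
    return abs(base_time - item_time)


def set_threshold(criterion_distance: timedelta) -> timedelta:
    """Derive the first threshold from the criterion distance (assumed rule)."""
    return criterion_distance * THRESHOLD_FACTOR


def select_candidates(base_time: datetime, anchor_time: datetime,
                      candidates: Dict[str, datetime]) -> List[str]:
    """Return the candidates whose distance from the base time is within the threshold."""
    threshold = set_threshold(distance_in_time(base_time, anchor_time))
    return [name for name, item_time in candidates.items()
            if distance_in_time(base_time, item_time) <= threshold]


base = datetime(2005, 12, 1, 12, 0)
anchor = base - timedelta(hours=2)               # criterion distance: 2 hours
candidates = {
    "photo_a.jpg": base - timedelta(hours=3),    # inside the 4-hour threshold
    "song_b.mp3": base - timedelta(hours=5),     # outside the 4-hour threshold
}
selected = select_candidates(base, anchor, candidates)
```

Note that the candidates may be of a different content type from the identified item; only their associated times are compared.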
The notion of criterion distance in time-determined granularity for setting a threshold is provided, according to which the threshold is set such that distance granularity is higher for times closer to the base time than for times further away from the base time. Further, a second threshold based on the criterion distance in time may be set, which second threshold together with the first threshold comprises a range, and then candidate content items are selected if the first candidate distance in time is within the range.
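One way to obtain finer granularity near the base time and coarser granularity far from it is to rank distances on a logarithmic scale. The sketch below assumes a 0-9 scale with rank = floor(log4(1 + hours)); both the formula and the function names are hypothetical illustrations, as the source does not prescribe a particular mapping. The 5000-day figure is likewise an illustrative value.

```python
import math
from datetime import timedelta

MAX_RANK = 9  # 0 = substantially the same time, 9 = very distant in time


def distance_rank(distance: timedelta) -> int:
    """Map a distance in time to a 0-9 rank on a logarithmic scale.

    Assumed rule: floor(log4(1 + hours)), capped at MAX_RANK, so each step
    of the scale spans four times as much elapsed time as the step before.
    """
    hours = distance.total_seconds() / 3600.0
    return min(MAX_RANK, int(math.log2(1.0 + hours) / 2.0))


def in_range(distance: timedelta, min_rank: int, max_rank: int) -> bool:
    """Select a candidate only if its rank lies between two thresholds."""
    return min_rank <= distance_rank(distance) <= max_rank


# Close to the base time, 2 hours and 5 hours fall on different ranks.
near_ranks = (distance_rank(timedelta(hours=2)), distance_rank(timedelta(hours=5)))
# Far from the base time, 5000 days and 6000 days share one rank.
far_ranks = (distance_rank(timedelta(days=5000)), distance_rank(timedelta(days=6000)))
```

With this mapping, a three-hour difference separates two recent items, while a thousand-day difference between two old items does not.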
The first times may include a time of content item acquisition, a time of content item last usage, or a time of content item most usage. The time may be a content item base time, a content item most recent modification time, or a content item creation time. Further, additional content items may be identified and their times and distances determined, so that the first threshold may also be set based on these criterion distances in time.
Figure 1 is a schematic view of a retrieval system according to an embodiment of the present invention.
Figure 2 is a flowchart of an operation of a system according to an embodiment of the present invention.
The following discussion and the foregoing figures describe embodiments of Applicant's invention as best understood presently by the inventors; however, it will be appreciated that numerous modifications of the invention are possible and that the invention may be embodied in other forms and practiced in other ways without departing from the spirit of the invention. Further, features of embodiments described may be omitted, combined selectively or as a whole with other embodiments, or used to replace features of other embodiments, or parts thereof, without departing from the spirit of the invention. The figures and the detailed description are therefore to be considered as an illustrative explanation of aspects of the invention, but should not be construed to limit the scope of the invention.
As shown in Figure 1, the retrieval system 1-1 includes several modules, which will be described below. Modules of the retrieval system 1-1, or portions thereof, and/or the retrieval system as a whole, may be comprised of hardware, software, firmware, or a combination of the foregoing; for example, some modules may be comprised of hardware, while other modules may be comprised of software, firmware or a combination thereof. It is to be understood that modules of the retrieval system need not all be located or integrated with the same device. A distributed architecture is also contemplated for the retrieval system, which may "piggy-back" off of suitable modules provided by existing devices.
The following description will refer to a retrieval system 1-1 that is physically integrated with or connected to a database 1-2 via a wired or wireless connection thereto. A clock (not shown) may also be integrated with or connected to the retrieval system 1-1. The database 1-2 may be embodied on a storage device such as on a hard drive of a personal computer, a personal video recorder, an entertainment system, an electronic organizer, a personal handheld device, a Jaz drive, or may be embodied as a commercial storage facility, such as a disk drive. It will be understood that the database 1-2 may include several storage devices that are connected, such that organization or grouping of content items on two or more of such devices is possible. The database may be understood to include one or more storage media, such as disks, including CDs, DVDs, zip disks, floppy disks, data cartridges, or the like, which can be loaded onto and retrieved by the database 1-2. However, it will be understood that the retrieval system 1-1 is also capable of retrieving content via a network 1-9, such as a LAN, WAN, the internet, or the like.
As shown in Figure 1, the retrieval system 1-1 includes a time data extractor 1-11, which is a module that collects certain types of data from a content item. The content item may be a video, or a video clip, a movie, a photo, a text file, music data, an audio file, or other type of multimedia data, a JPEG file, or XML data. For example, the video may be a home video shot on a digital video recorder; the movie may be commercially distributed film data, such as a film encoded as MPEG (including MPEG-2, MPEG-3, or the like); the photo may be digital photograph data, or a series of photographs or a photograph album; the text file may be a word processor produced file, a spreadsheet, or a computer code file; the music data may be an MP3 file or the like; and so forth.
The description data extracted by the time data extractor 1-11 includes information, such as metadata or usage data about the content item. Such information may also include time data for the content item, such as time of the creation of the item, time of acquisition of the item; the last/first/penultimate et cetera time of playback and/or editing of the content item; and, a time of most usage, for example, the item is mostly used around 8 PM, or on a given day of the week, month, or year, the item is mostly used at night, or the like. "Mostly" as used herein may be based on an average use time, median use time, a mode of use time, or the like. Such usage history data is sometimes known as metadata, and conversely, types of metadata are sometimes referred to as usage history data.
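A "time of most usage" such as "mostly used around 8 PM" could be derived from a usage history along the following lines. This is a sketch only: the function names and the sample playback times are hypothetical, and the median calculation ignores the wrap-around of hours at midnight.

```python
from collections import Counter
from datetime import datetime
from statistics import median
from typing import List


def most_used_hour(usage_times: List[datetime]) -> int:
    """Mode of the hour of day across recorded usage events."""
    return Counter(t.hour for t in usage_times).most_common(1)[0][0]


def median_use_hour(usage_times: List[datetime]) -> float:
    """Median hour of day; wrap-around at midnight is ignored for simplicity."""
    return median(t.hour for t in usage_times)


# Hypothetical playback history for one content item.
playbacks = [
    datetime(2005, 11, 1, 20, 5),
    datetime(2005, 11, 2, 20, 40),
    datetime(2005, 11, 3, 21, 10),
    datetime(2005, 11, 5, 8, 0),
]
mode_hour = most_used_hour(playbacks)
median_hour = median_use_hour(playbacks)
```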
The time information discussed herein may be one or many such similarity dimensions, or it may be the only or the most weighty dimension. The degree to which such factors are weighted (if at all) would depend on the application and the needs of the user.
It will be understood that such description data about the item may be located and extracted in a variety of ways, including from the item, from an index or database management file, or from an outside source such as from the World Wide Web connected to the retrieval system 1-1 via a wired or a wireless connection to the Internet 1-9.
The identified content item may be identified in one of several ways. A user may designate the item based on which other items, sometimes referred to as "candidate content items", are to be retrieved. Alternatively, a content item newly added or created may automatically be designated as an identified content item based on which other items are to be retrieved.
A base time is determined by base time determiner 1-13. Such a base time may be a current time entered or set by the user, previously programmed, determined with reference to a clock (not shown), or determined from the internet or another network, or by a combination of the foregoing. It will be understood that the base time, the time associated with the identified content items and the candidate content items may each include a date and/or time. For some applications, a date without time will be sufficient, or even more relevant. For many applications, both time and date would be used. It will be understood that such date information and the time information could be converted to a format that will facilitate computation of a distance in time and comparison with other dates and times.
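As one illustration of converting date and time information to a comparable format, the sketch below parses ISO 8601 strings into `datetime` values and optionally discards the time of day for applications where the date alone matters. The function names and the choice of ISO 8601 input are assumptions; time zones are not handled, and `datetime.fromisoformat` requires Python 3.7 or later.

```python
from datetime import datetime, timedelta


def parse_time(text: str, date_only: bool = False) -> datetime:
    """Parse an ISO 8601 date or date-time; optionally truncate to the date."""
    parsed = datetime.fromisoformat(text)
    if date_only:
        parsed = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    return parsed


def distance_in_time(base: str, item: str, date_only: bool = False) -> timedelta:
    """Distance in time between two timestamps in the chosen format."""
    return abs(parse_time(base, date_only) - parse_time(item, date_only))


with_time = distance_in_time("2005-12-01T18:00:00", "2005-12-01T09:30:00")
same_day = distance_in_time("2005-12-01T18:00:00", "2005-12-01T09:30:00", date_only=True)
two_days = distance_in_time("2005-12-03", "2005-12-01T09:30:00", date_only=True)
```

In date-only mode, two items from the same calendar day have a distance of zero regardless of the hour.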
The time data extractor 1-11 determines a time associated with the identified content item or items and determines the distance in time (that is, the amount of time that has elapsed) between the time associated with the identified content item(s) and the base time. This distance is sometimes called a first criterion distance in time. The time associated with the identified content item or items may be determined by reference to metadata associated with the content item, a database index, or by reference to the network 1-9, including for example the world wide web, by requesting user input, or a combination of the foregoing. The distance in time may be determined by referring to a table, by computation, by requesting user input, or by a combination of the foregoing. Threshold setter 1-14 sets a threshold or range that candidate content items must meet to be selected. The threshold or range is set by threshold setter 1-14 based on the first criterion distance in time.
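The conversion of dates and times into a comparable format, and the computation of a distance in time, can be illustrated with a minimal Python sketch. The function names `parse_time` and `distance_in_time`, the ISO 8601 input format, and the example dates are assumptions made for illustration; the description does not prescribe any particular representation.

```python
from datetime import datetime, timedelta


def parse_time(text: str) -> datetime:
    """Convert an ISO 8601 string into a datetime (assumed input format).

    A date without a time, e.g. '2005-02-27', is read as midnight of that day.
    """
    return datetime.fromisoformat(text)


def distance_in_time(base_time: datetime, item_time: datetime) -> timedelta:
    """Return the amount of time between base_time and item_time (never negative)."""
    return abs(base_time - item_time)


# Hypothetical values: a base time with date and time, an item with a date only.
base_time = parse_time("2005-03-01T12:00")
first_item_time = parse_time("2005-02-27")
criterion_distance = distance_in_time(base_time, first_item_time)
print(criterion_distance)  # 2 days, 12:00:00
```

Using a single comparable type for the base time, the identified content item time, and the candidate content item time means the same subtraction serves for both the criterion distance and the candidate distances.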
Candidate content item identifier 1-12 identifies candidate content items in the database, over the network connection or from other sources, that are similar to the identified content item with respect to their metadata or other information, and/or whose distance in time from the base time is similar to the distance in time between the first identified content item and the base time.
Controller 1-15 coordinates overall functioning of the retrieval system 1-1 and interacts with user interface 1-3, the database 1-2, the server 1-9, and the outside generally, and handles system settings.
Selector 1-16 selects qualifying candidate content items, and result output 1-17 provides a results signal for the selected and/or the rejected candidate content items. Result output 1-17 interfaces with other devices and communicates with the outside, including interfacing with a user (not shown). In particular, result output 1-17 signals the user interface 1-3 about content items retrieved by the retrieval system 1-1. User interface 1-3 may be a separate device or may be integrated with another device or system, such as a personal computer or a personal video recorder, or one or more of the storage and other devices enumerated above.
There may be additional available identified content items. Thus, this process of time metadata and/or usage extraction and distance in time determination may be repeated for any number of available identified content items 1-N, N being a positive integer greater than 1. Then, the candidate content item selection is performed based on an average of all such criterion distances in time.
An operation of an embodiment of the present invention will now be described with reference to Figures 1-2. A first content item is identified, as described above, by a user via user interface 1-3 shown in Figure 1, or automatically by the system, for example by a detection of a newly added content item or an isolated content item in database 1-2.
At S1, base time determiner 1-13 determines a base time, as discussed above. Time data extractor 1-11 of retrieval system 1-1 extracts first time data for the first content item identified, as stated at S2 of Figure 2. At S2, additional identified content items may be similarly processed (time data extracted for first-Nth identified content items), for example if the user or the system designates several "anchor" documents based on which target documents are to be retrieved.
At S3, a criterion distance in time is determined by time metadata extractor 1-11 for each identified content item, by determining a distance in time between the base time and the time of the identified content item. At S4, such criterion distances in time may be averaged to arrive at an average criterion distance in time. As used herein, average may be determined based on a computation of the arithmetic mean, mode, or median. Further, a simple sum of the values may be used as well as some such statistical function suitably selected to provide a composite view of the selected values. At S5, a threshold or range is set based on the criterion distances in time or the average criterion distance in time. For example, a threshold may be assigned such that a value of 1 or 0 may indicate a very small distance in time between the base time and the first identified content item, while a value of 9 or 10 may indicate a great distance in time. Alternatively, instead of using a scale of 1 to 10, thresholds may represent for example, "identical time", "very close time", or "close time," "distant time" or "very distant time" or some such designation. It will be understood that numerous other schemes for such values may be used without departing from the spirit of the present invention.
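The combination of several criterion distances in time at S4 can be sketched as follows. The function name `composite_distance` and the example distances are hypothetical; the choice among mean, median, mode, and sum follows the alternatives listed above, and nothing here is meant as the definitive form of the computation.

```python
from datetime import timedelta
from statistics import mean, median, mode


def composite_distance(distances: list[timedelta], method: str = "mean") -> timedelta:
    """Combine several criterion distances in time into one composite value."""
    if not distances:
        raise ValueError("at least one criterion distance is required")
    # Work in seconds so that the statistics functions operate on plain numbers.
    seconds = [d.total_seconds() for d in distances]
    combine = {"mean": mean, "median": median, "mode": mode, "sum": sum}[method]
    return timedelta(seconds=combine(seconds))


# Hypothetical criterion distances for three identified content items.
criterion_distances = [timedelta(hours=1), timedelta(hours=2), timedelta(hours=9)]
print(composite_distance(criterion_distances, "mean"))    # 4:00:00
print(composite_distance(criterion_distances, "median"))  # 2:00:00
print(composite_distance(criterion_distances, "sum"))     # 12:00:00
```

The example shows why the choice of statistic matters: one distant anchor item pulls the mean to four hours, while the median stays at two.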
Further, a second threshold may similarly be chosen. The first threshold thus may represent a maximum distance in time, while the second threshold may represent a minimum distance in time, thus together the thresholds comprising a range. Candidate items would be selected only if their distance (the distance between the base time and the candidate content item time) falls within the range.
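A range formed by a maximum and a minimum threshold can be tested with a short predicate, sketched below. The name `within_range` is hypothetical, and treating both bounds as inclusive is an assumption, since the description does not say whether the boundary values themselves qualify.

```python
from datetime import timedelta
from typing import Optional


def within_range(candidate_distance: timedelta,
                 max_threshold: timedelta,
                 min_threshold: Optional[timedelta] = None) -> bool:
    """Return True if the candidate distance satisfies the threshold or range.

    With only max_threshold, this is a single-threshold test.
    With min_threshold as well, both bounds are inclusive (an assumption).
    """
    if candidate_distance > max_threshold:
        return False
    return min_threshold is None or candidate_distance >= min_threshold


# Hypothetical range of 1 to 4 hours.
low, high = timedelta(hours=1), timedelta(hours=4)
print(within_range(timedelta(hours=3), high, low))     # True
print(within_range(timedelta(minutes=30), high, low))  # False
print(within_range(timedelta(hours=5), high, low))     # False
```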
At S6, candidate content item identifier 1-12 of Figure 1 identifies candidate content items in the database 1-2, over a network or elsewhere, while time data extractor 1-11 (Figure 2) extracts time data for each of the candidate content items. The process of distance in time determination for the candidate content item is then performed at S7. Further identified content items may also be available, and the process of extracting the time data and determining distance in time values would continue for candidate content items 1-M. At S8, a distance in time of the candidate content item is compared to the threshold or range by selector 1-16. If the distance is under the threshold or within the range then it is selected.
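Steps S6 to S8 can be sketched as a loop over candidate content items. The `ContentItem` class, the `select_candidates` function, and the item names are hypothetical, and the comparison treats a distance equal to the threshold as qualifying, which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentItem:
    name: str
    time: datetime  # time data extracted for the item (creation, last use, ...)


def select_candidates(base_time, candidates, threshold):
    """Split candidates into (selected, rejected) lists."""
    selected, rejected = [], []
    for item in candidates:                              # S6: candidate time data
        candidate_distance = abs(base_time - item.time)  # S7: distance in time
        if candidate_distance <= threshold:              # S8: compare to threshold
            selected.append(item)
        else:
            rejected.append(item)
    return selected, rejected


base_time = datetime(2004, 11, 22, 8, 0)
candidates = [
    ContentItem("photo_a", datetime(2004, 11, 22, 5, 30)),  # 2.5 hours from base
    ContentItem("song_b", datetime(2004, 11, 21, 20, 0)),   # 12 hours from base
]
selected, rejected = select_candidates(base_time, candidates, timedelta(hours=4))
print([item.name for item in selected])  # ['photo_a']
print([item.name for item in rejected])  # ['song_b']
```

Returning the rejected items alongside the selected ones mirrors the description of a results signal for selected and/or rejected candidates.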
For instance, if the base time is determined as 8 AM, November 22, 2004, and the first identified content item time is determined to be 6 AM, November 22, 2004, then a distance in time of two hours, representing the distance between the base time and the first identified content item, is the first criterion distance in time. Based on this 2 hour distance in time, or based on an average of the criterion distances in time determined from identified content items 1-N, a threshold distance in time is set, for example, as "4 hours," as "same day," as "same period of day," as "close distance in time," or as "4" (4 being an integer assigned from 0-9, where 0 means substantially the same time and 9 means very distant in time). Then, a distance of a candidate content item is compared with this threshold, and the candidate item is selected, at S8, if the distance in time of the candidate content item from the base time is within 4 hours, within "same day," within "same period of day," within the "close distance in time," or within the distance in time scale "4" threshold.
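The worked example above can be reproduced with two of the threshold representations it mentions, a numeric "4 hours" threshold and a "same day" threshold. The helper names and the three candidate times are hypothetical and chosen only to show that the two representations can disagree.

```python
from datetime import datetime, timedelta

base_time = datetime(2004, 11, 22, 8, 0)        # 8 AM, November 22, 2004
first_item_time = datetime(2004, 11, 22, 6, 0)  # 6 AM, November 22, 2004
criterion_distance = abs(base_time - first_item_time)
print(criterion_distance)  # 2:00:00


def within_four_hours(candidate_time: datetime) -> bool:
    """Numeric threshold: candidate distance from the base time is at most 4 hours."""
    return abs(base_time - candidate_time) <= timedelta(hours=4)


def same_day(candidate_time: datetime) -> bool:
    """Coarse threshold: candidate falls on the same calendar day as the base time."""
    return candidate_time.date() == base_time.date()


# Hypothetical candidate times.
five_am = datetime(2004, 11, 22, 5, 0)            # 3 hours before the base time
one_am = datetime(2004, 11, 22, 1, 0)             # 7 hours before the base time
previous_evening = datetime(2004, 11, 21, 23, 0)  # 9 hours before the base time

print(within_four_hours(five_am), same_day(five_am))                    # True True
print(within_four_hours(one_am), same_day(one_am))                      # False True
print(within_four_hours(previous_evening), same_day(previous_evening))  # False False
```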
According to an aspect of the present invention, the threshold is set such that distance granularity is higher for times closer to the base time than for times further away from the base time. Therefore, for example, if the distance from the base time were to be ranked on a scale of 1 to 10, then as the distance from the base time increased, longer distances would be encompassed by fewer gradations of the scale. Thus, if the criterion distance in time (the distance between the base time and an identified document or content item) is 1 hour, then a first candidate content item might be judged not similar if the candidate distance from the base time is 5 hours. However, if the criterion distance in time is 1000 days, then a second candidate document might be judged similar even if the second candidate distance from the base time is 6000 days. Such thresholding is based on the idea that often people intuitively think of differences in distance in time between instances in the more distant past as less important than between equally distant times in the more recent past: the farther in distance in time one moves from the relevant base time, the less important, in terms of determining similarity, are the distances in time between instances. Such thresholding is sometimes referred to herein as criterion distance in time-determined granularity thresholding.
As discussed, a range may also be generated at S5 by threshold setter 1-14, using a maximum and a minimum threshold, based on the spread in the set of identified content item distance values. Thus, using the above-discussed example with just one identified content item, the maximum threshold would be as described, and a minimum threshold could be set as, for example, "different hour," "at least 1 hour," "very close in time," or distance ranking 2. Then, the range for content items that are selected would be "different hour, same day," "different hour, same period of day," "1-4 hours," "very close distance in time-close distance in time," or "2-4" scale, depending on the system of thresholding/ranges used.
According to an aspect of the invention, multiple "base" times could be used, and the criterion distance in time-determined granularity would be applied for each such base time separately. For instance, an actual current time could be a first base time, and the time of a significant event in the past, for example, a date of a user's wedding, birth of a child, anniversary, or the like, could be a second base time. Thus, the level of granularity would decrease with distance from base time 1 (the further in the past the candidate document time, the greater the amount of elapsed time that would be considered similar), and would similarly decrease with distance in time from base time 2. The idea is that for a person who, for example, had a wedding on a particular day, the differences in time closer to that second base time would matter more and therefore would need a higher granularity.
Such second, third, and further base times up to an L-th base time (L being an integer greater than 3) could be set by the user or determined by the system according, for example, to the ways of determining base times discussed above. Thus, for example, if the system detects that a significant number or percentage of documents associated with the user (for example, documents residing in the user's computer, database, handheld, et cetera) have a content item time (for example, date/time of creation or last use or the like) at a particular time, then such an additional base time could be set. For instance, if a significant number of content items (determined, for example, using a threshold or a statistical function showing a significantly higher than normal concentration of content items), for purposes of illustration wedding pictures, wedding video, music, and e-mails, fall on or near the date of the user's wedding in the past, then that date could be determined as such an additional base time. Criterion distances in time could then be determined, and thresholds set according to the criterion distance in time-determined granularity, based on such additional base times.
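One way to realize criterion distance in time-determined granularity, applied separately for each base time, is sketched below. The tier boundaries, the multipliers, the function names, and the second base time are all hypothetical; the description gives only the qualitative rule that the tolerated distance grows as the criterion distance grows. How the per-base-time results are combined is not specified in the description, so the sketch simply returns one result per base time.

```python
from datetime import datetime, timedelta

# Hypothetical tiers: (upper bound on the criterion distance, multiplier).
# The multiplier grows with the criterion distance, so granularity is
# finer near the base time and coarser far from it.
TIERS = [
    (timedelta(days=1), 2),
    (timedelta(days=30), 3),
    (timedelta(days=365), 5),
]
FAR_MULTIPLIER = 10  # used beyond the last tier (also an assumption)


def granular_threshold(criterion_distance: timedelta) -> timedelta:
    """Maximum candidate distance treated as similar for this criterion distance."""
    for upper_bound, multiplier in TIERS:
        if criterion_distance <= upper_bound:
            return criterion_distance * multiplier
    return criterion_distance * FAR_MULTIPLIER


def similar_per_base_time(base_times, identified_time, candidate_time):
    """Apply the granularity rule once per base time; return one bool per base time."""
    results = []
    for base in base_times:
        threshold = granular_threshold(abs(base - identified_time))
        results.append(abs(base - candidate_time) <= threshold)
    return results


# The 1 hour / 5 hours and 1000 days / 6000 days cases from the description.
print(timedelta(hours=5) <= granular_threshold(timedelta(hours=1)))      # False
print(timedelta(days=6000) <= granular_threshold(timedelta(days=1000)))  # True

# Two base times: a current time and a hypothetical wedding date.
base_times = [datetime(2004, 11, 22, 8, 0), datetime(1999, 6, 12, 15, 0)]
identified = datetime(1999, 6, 12, 14, 0)  # 1 hour before the second base time
candidate = datetime(1999, 6, 12, 11, 0)   # 4 hours before the second base time
print(similar_per_base_time(base_times, identified, candidate))  # [True, False]
```

In the last call the candidate is similar when judged from the current time, where a three-hour difference more than five years in the past is negligible, but not when judged from the second base time, where the finer granularity applies.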
According to an aspect of the present invention, the content item retrieved may be of a content type different from the content type of the user-selected content item. For example, if the user-selected content item is of the type music file, or MP3, then the retrieved content item may be of the content type photograph data. In this way, for example, pictures of a certain genre may be retrieved to match user-selected music based on similarity in time.
This (or these) selected candidate content item(s) are provided to the user or to the user interface 1-3 at S9. A signal may be provided directly to the database 1-2 to cause retrieval of the selected candidate item to the database or to the user interface 1-3.
Alternatively (or additionally), a signal may be provided if a candidate content item is rejected.
A notification may be provided to user interface 1-3 to notify a user (not shown) of a retrievable content item. The notification may consist of an identification of the content item to be retrieved, a description of the content item, a URL or a link to the content item, a retrieval of the entire content item or a portion thereof, or a combination of the foregoing. The system may also be used to group the retrieved item selected with the anchor item to organize a database. At S10, processing terminates.
Embodiments of the present invention provided in the foregoing written description are intended merely as illustrative examples. It will be understood, however, that the scope of the invention is provided in the claims.

Claims

1. A content item retrieval method comprising: determining (S1) a base time; extracting (S2), as a first time, time metadata for a first identified content item; setting (S5) a first threshold based on a criterion distance in time determined (S3) between the base time and the first time; extracting (S6), as first candidate time, time metadata for a first candidate content item, and determining (S7), as a first candidate distance in time, the distance between the base time and the first candidate time; selecting (S8) the first candidate content item based on the first candidate distance in time and the first threshold; and outputting (S9) a selection signal for the first candidate content item when the first candidate content item is selected.
2. The method of claim 1, wherein the first threshold is set based on criterion distance in time-determined granularity.
3. The method of claim 1, further comprising setting a second threshold based on the criterion distance in time, which second threshold together with the first threshold comprises a range, and selecting the first candidate content item when the first candidate distance in time is within the range.
4. The method of claim 1, wherein at least one of the first time and the first candidate time comprises a time of content item acquisition, a time of content item last usage, a time of content item first usage, and a time of content item most usage.
5. The method of claim 1, wherein at least one of the first time and the first candidate time comprises a content item most recent modification time, and a content item creation time.
6. The method of claim 1, further comprising: extracting (S2), as a second time, time metadata for a second identified content item; and setting (S5) the first threshold based also on a second criterion distance in time determined as the distance between the base time and the second time.
7. A content item retrieval system comprising: a base time extractor (1-13) configured to determine a base time; a time metadata extractor (1-11) configured to extract, as a first time, time metadata for a first identified content item and to determine, as a criterion distance in time, a distance in time between the base time and the first time; a threshold setter (1-14) configured to set a first threshold, based on a criterion distance in time determined as a distance between the base time and the first time; said metadata extractor (1-11) configured to extract, as first candidate time, the metadata for a first candidate content item, and to determine, as a first candidate distance in time, the distance in time between the base time and the first candidate time; a selector (1-16) configured to select the first candidate content item based on the first candidate distance in time and the first threshold; and a result output (1-17) configured to output a selection signal for the first candidate content item when the first candidate time is selected.
8. The system of claim 7, wherein the first threshold is set based on criterion distance in time-determined granularity.
9. The system of claim 7, further comprising said threshold setter (1-14) setting a second threshold based on the criterion distance in time, which second threshold together with the first threshold comprises a range, and said selector selecting the first candidate content item when the first candidate distance in time is within the range.
10. The system of claim 7, wherein at least one of the first time and the first candidate time comprises a time of content item acquisition, a time of content item last usage, a time of content item first usage, and a time of content item most usage.
11. The system of claim 7, wherein at least one of the first time and the first candidate time comprises a content item most recent modification time, and a content item creation time.
12. The system of claim 7, wherein at least one of the first time and the first candidate time comprises one of a frequently used time period, a recently used time period by a user, and a time period of most usage by a user.
13. The system of claim 7, comprising: said time data extractor (1-11) is configured to extract, as a second time, time metadata for a second identified content item; and said threshold setter (1-14) is configured to set the first threshold based also on a second criterion distance in time determined as the distance between the base time and the second time.
EP05821605A 2004-12-01 2005-11-30 Adaptation of time similarity threshold in associative content retrieval Withdrawn EP1820125A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US63213604P 2004-12-01 2004-12-01
PCT/IB2005/053983 WO2006059293A1 (en) 2004-12-01 2005-11-30 Adaptation of time similarity threshold in associative content retrieval

Publications (1)

Publication Number Publication Date
EP1820125A1 (en) 2007-08-22

Family

ID=36169210

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05821605A Withdrawn EP1820125A1 (en) 2004-12-01 2005-11-30 Adaptation of time similarity threshold in associative content retrieval

Country Status (5)

Country Link
EP (1) EP1820125A1 (en)
JP (1) JP2008522309A (en)
KR (1) KR20070086805A (en)
CN (1) CN101069180A (en)
WO (1) WO2006059293A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4812031B2 (en) * 2007-03-28 2011-11-09 Kddi株式会社 Recommender system
KR102659788B1 (en) * 2021-11-02 2024-04-23 주식회사 엠클라우독 System for recommending document using dynamic change of time-series pattern information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003019560A2 (en) * 2001-08-27 2003-03-06 Gracenote, Inc. Playlist generation, delivery and navigation
US6987221B2 (en) * 2002-05-30 2006-01-17 Microsoft Corporation Auto playlist generation with multiple seed songs
US6996390B2 (en) * 2002-06-26 2006-02-07 Microsoft Corporation Smart car radio
US7228054B2 (en) * 2002-07-29 2007-06-05 Sigmatel, Inc. Automated playlist generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006059293A1 *

Also Published As

Publication number Publication date
WO2006059293A1 (en) 2006-06-08
CN101069180A (en) 2007-11-07
JP2008522309A (en) 2008-06-26
KR20070086805A (en) 2007-08-27

Similar Documents

Publication Publication Date Title
US8442976B2 (en) Adaptation of location similarity threshold in associative content retrieval
US9524349B2 (en) Identifying particular images from a collection
US8171016B2 (en) System and method for using content features and metadata of digital images to find related audio accompaniment
US8117210B2 (en) Sampling image records from a collection based on a change metric
EP2405371A1 (en) Method for grouping events detected in an image collection
US7788267B2 (en) Image metadata action tagging
US20080306995A1 (en) Automatic story creation using semantic classifiers for images and associated meta data
US20090043811A1 (en) Information processing apparatus, method and program
EP2510464B1 (en) Lazy evaluation of semantic indexing
US20100217755A1 (en) Classifying a set of content items
US8356034B2 (en) Image management apparatus, control method thereof and storage medium storing program
CN101755303A (en) Automatic story creation using semantic classifiers
EP2070087A2 (en) Method of creating a summary
US20080306930A1 (en) Automatic Content Organization Based On Content Item Association
WO2006059295A1 (en) Associative content retrieval
US7698296B2 (en) Content-reproducing apparatus
JP2006094018A (en) Program recommending apparatus, program recommending method, program, and recoding medium with the program recorded
EP1820125A1 (en) Adaptation of time similarity threshold in associative content retrieval
US20070156844A1 (en) Apparatus and method for storing content, and apparatus and method for displaying content

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

17P Request for examination filed

Effective date: 20070702

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

18W Application withdrawn

Effective date: 20070801