EP1820126A1 - Associative content retrieval - Google Patents

Associative content retrieval

Info

Publication number
EP1820126A1
EP1820126A1 EP05821618A EP05821618A EP1820126A1 EP 1820126 A1 EP1820126 A1 EP 1820126A1 EP 05821618 A EP05821618 A EP 05821618A EP 05821618 A EP05821618 A EP 05821618A EP 1820126 A1 EP1820126 A1 EP 1820126A1
Authority
EP
European Patent Office
Prior art keywords
item
content item
candidate
vector
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05821618A
Other languages
German (de)
English (en)
French (fr)
Inventor
Elmo M.A. Diederiks
Bartel M. Van De Sluis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arris Global Ltd
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model

Definitions

  • the present invention relates to the field of content retrieval, management and presentation.
  • First description data including dimension data for a first user-selected content item is extracted.
  • candidate description data including corresponding dimension data for candidate content items is extracted, each candidate content item being of a content type different from the content type of the user-selected content item.
  • a first set of vector values for each candidate content item may be generated, each vector value representing a degree of similarity between the dimension data for a dimension of the first description data and the corresponding dimension data of the candidate description data.
  • a candidate content item from the candidate content items can then be selected based on the degrees of similarity represented by the generated first set of vector values.
  • the selected candidate content item or items are then provided by the retrieval system, such as via a user interface.
  • a dimension of the dimension data represents a content type of the item, a content style for the item, a genre of the item, item metadata, usage history of the item, a performer performing in the item, a director associated with the item, a creator associated with the item, or rendering requirements for the item.
  • the metadata may include a time of creation of the item, a place of creation of the item, a time of acquisition of the item, and/or a place of acquisition of the item.
  • the candidate content item may be selected only if a total degree of similarity represented by the first set of vector values surpasses a minimum threshold.
  • the candidate content item with the highest total degree of similarity as represented by the first set of vector values may be selected.
  • Additional content items may be identified. Description data including the dimension data for a second identified content item grouped with the first identified content item is extracted. The candidate content item is then selected based also on a second set of vector values representing degrees of similarity between the dimension data for the second identified content item and the dimension data of the similar candidate content item. Accordingly, the candidate content item may be selected such that the first set of vector values and the second set of vector values are averaged, weighted averaged, or added. A commonality vector may also be chosen for weighting results.
  • as a commonality vector, a vector that represents a dimension for which dimension data of the first identified content item is closest to that of the second identified content item is selected, and in selecting the candidate content item, a value of the commonality vector may be weighted more than the remaining vector values of the first set of vector values and the second set of vector values.
  • a virtual content item may be constructed. Description data including dimension data for a first and a second user-selected content item are extracted. Candidate description data including corresponding dimension data for candidate content items are extracted, each candidate content item being of a content type different from the content type of the user-selected content item.
  • a virtual item may be constructed by averaging or weighted averaging a virtual item set of vector values, each vector value of the virtual item set of vector values representing a degree of similarity between a dimension of the dimension data of the first description data and a corresponding dimension of the dimension data of the second description data.
  • a set of vector values for each candidate content item can be generated, each vector value representing a degree of similarity between the dimension data for a dimension of the virtual content item and corresponding dimension data for the candidate content item.
  • a candidate content item from the candidate content items may thus be selected by computing as a testing value one of an average, a weighted average, and a sum for each set of vector values of the candidate content items, and determining as the selected candidate content item the candidate content item whose testing value surpasses a threshold.
  • the selected candidate content item or items are provided.
  • Figure 1 is a schematic view of a retrieval system according to an embodiment of the present invention.
  • FIGS. 2A-2C are flowcharts of operations of a system according to the present invention.
  • Figure 3 shows a data chart of vector value alignment according to an embodiment of the present invention.
  • the retrieval system 1-1 includes several modules, which will be described below. Modules of the retrieval system 1-1, or portions thereof, and/or the retrieval system as a whole, may be comprised of hardware, software, firmware, or a combination of the foregoing. Some modules may, for example, be comprised of hardware, while other modules may be comprised of software, firmware, or a combination thereof. It is to be understood that modules of the retrieval system need not all be located or integrated with the same device. A distributed architecture is also contemplated for the retrieval system, which may "piggy-back" off of suitable modules provided by existing devices.
  • the following description will refer to a retrieval system 1-1 that is physically integrated with or connected to a database 1-2 via a wired or wireless connection thereto.
  • the database 1-2 may be embodied on a storage device such as on a hard drive of a personal computer, a personal video recorder, an entertainment system, an electronic organizer, a personal handheld device, a Jaz drive, or may be embodied as a commercial storage facility, such as a disk drive. It will be understood that the database 1-2 may include several storage devices that are connected, such that organization or grouping of content items on two or more of such devices is possible.
  • the database may be understood to include one or more storage media, such as disks, including CDs, DVDs, zip disks, floppy disks, data cartridges, or the like, which can be loaded onto and retrieved by the database 1-2.
  • the retrieval system 1-1 is also capable of retrieving content via a network 1-9, such as a LAN, WAN, the Internet, or the like.
  • the retrieval system 1-1 includes a description data extractor 1-11, which is a module that collects certain types of data from a content item.
  • the content item may be a video, or a video clip, a movie, a photo, a text file, music data, an audio file, or other type of multimedia data, a JPEG file, or XML data.
  • the video may be a home video shot on a digital video recorder
  • the movie may be commercially distributed film data, such as a film encoded as MPEG (including MPEG-2, MPEG-3, or the like)
  • the photo may be digital photograph data, a series of photographs, or a photograph album
  • the text file may be a word processor produced file, a spreadsheet, or a computer code file
  • the music data may be an MP3 file or the like, and so forth.
  • the description data extracted by the description data extractor 1-11 includes information about the content item.
  • Such description data describe the dimensions of the content item.
  • Such dimensions may include any one or more of the following: the content type, including the medium, such as the video, audio, photo, text file, et cetera; the content style or genre, such as holiday movie, personal landscape photography, jazz music or the like; metadata for the item, such as time and/or location of the creation of the item, time and/or place of acquisition of the item; usage history of the item, such as the last/first/penultimate etc.
  • a time period of most usage, for example, the item is mostly used at night, or on Monday afternoons, or 6-8 AM, or the like
  • a time of acquisition of the item for example, the item is mostly used at night, or on Monday afternoons, or 6-8 AM or the like
  • a place of creation of the item for example, the item is mostly used at night, or on Monday afternoons, or 6-8 AM or the like
  • a place of acquisition of the item for example, the item is mostly used in the living room, or in the user's home, or the like
  • usage history data is sometimes known as metadata
  • types of metadata are sometimes referred to as usage history data
  • such description data about the item may be located and extracted in a variety of ways, including from the item, from an index or database management file, or from an outside source such as from the World Wide Web connected to the retrieval system 1-1 via a wired or a wireless connection to the Internet 1-9.
  • the identified content item may be identified in one of several ways.
  • a user may designate the item based on which other items, sometimes referred to as "candidate content items", are to be retrieved.
  • a content item newly added or created may automatically be designated as an identified content item based on which other items are to be retrieved.
  • content item identifier 1-12 identifies candidate content items in the database, over the network connection or from other sources that are similar with respect to these dimensions of their description data to the first identified content item.
  • Vector constructor 1-13 then creates a first set of vector values by assigning vector values to each of a number of vectors as follows: each vector corresponds to a dimension, and a value for the vector reflects a degree of similarity or matching of a dimension of the first identified content item with the candidate content item.
  • a vector that corresponds to the dimension of the content item termed style or genre would get a high value if both the identified content item and the candidate content item are of the same genre, such as "Spanish holiday."
  • a vector value of 1 or 0 may indicate little or no correlation or matching for the particular dimension between the first identified content item and the candidate content item, while a vector value of 9 or 10 may indicate a high degree of similarity or match. For example, when both content items have a genre of "Spanish holiday" then for the vector corresponding to the genre dimension, a 9 or 10 value would be assigned.
  • vector values may merely represent a "strong", "normal", or "weak" match for the dimension.
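The per-dimension scoring described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dimension names, the exact-match scoring rule (10 for a match, 0 otherwise), and the cut-offs used for the "strong"/"normal"/"weak" labels are all assumptions introduced here.

```python
# Hypothetical dimension names; the source only requires one vector per dimension.
DIMENSIONS = ("genre", "creator", "place_of_creation")


def similarity_vector(identified, candidate, dimensions=DIMENSIONS):
    """Return one value per dimension on a 0-10 scale.

    Assumed rule: 10 when both items carry the same value for the
    dimension, 0 when the values differ or either one is missing.
    """
    values = []
    for dim in dimensions:
        a, b = identified.get(dim), candidate.get(dim)
        values.append(10 if a is not None and a == b else 0)
    return values


def to_label(value):
    """Map a 0-10 value to a coarse label (cut-offs are assumptions)."""
    if value >= 8:
        return "strong"
    if value >= 4:
        return "normal"
    return "weak"


photo = {"type": "photo", "genre": "Spanish holiday", "place_of_creation": "Spain"}
song = {"type": "music", "genre": "Spanish holiday", "creator": "unknown"}

vector = similarity_vector(photo, song)
labels = [to_label(v) for v in vector]
```

Only the genre dimension matches in this example, so only that vector receives a high value. A real system would likely use graded rather than exact-match comparisons.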
  • if a second identified content item is available, then a second set of vector values may be similarly constructed by vector constructor 1-13 based on description data extracted by description data extractor 1-11 for the second content item, such that this second set represents a degree of similarity between corresponding dimensions of this second identified content item and a candidate content item. There may be additional available identified content items. Thus, this process of description data extraction and vector value set generation may be repeated for any number of available identified content items 1-N, N being a positive integer greater than 1. Then, the candidate content item selection is performed based on all such generated vector value sets, or their average.
  • a commonality vector generator/threshold setter 1-14 may select one or more vectors for which the vector values of the first set and the second set are consistently high. Such vector values may then be weighted more than values for the other vectors in the average or sum of the set of vector values representing the overall degree of similarity between the two items. In this way, a dimension which is representative of the first and second identified content item, or which tends to capture the similarity between the first and second identified content item and is therefore characteristic of the group, would be weighted more than other vector values.
  • a commonality vector generator module and a threshold setter module may be constructed as part of the retrieval system 1-1, or such modules may be incorporated into other modules.
  • Virtual item constructor 1-15 will be described below in the context of a discussion of an operation of an embodiment of the present invention.
  • Content item selector 1-16 selects the candidate content item or items to be provided to the user. This module may also handle other tasks necessary for the operation of the retrieval system, such as overall control and coordination of the modules of the retrieval system 1-1.
  • Retrieval result output 1-17 interfaces with other devices and handles communication with the outside, including interfacing with a user (not shown). In particular, retrieval result output 1-17 signals the user interface about content items retrieved by the retrieval system 1-1.
  • User interface 1-3 may be a separate device or may be integrated with another device or system, such as a personal computer or a personal video recorder, or one or more of the storage and other devices enumerated above.
  • a first content item is identified, as described above, by a user via user interface 1-3 shown in Figure 1, or automatically by the system, for example by a detection of a newly added content item or an isolated content item in database 1-2.
  • Description data extractor 1-11 of retrieval system 1-1 extracts first description data for the first content item identified, as stated at S1 of Figure 2A.
  • Figure 3 shows a box labeled 6-11 referencing identified content item 1.
  • dimension data for each of the dimensions for the first identified content item are compiled.
  • steps S3 and S4 are performed: at S3, description data for the second identified content item is extracted, and at S4, dimension data for each of the dimensions for the second identified content item are compiled.
  • a number of content items may be identified as bases of content retrieval.
  • Figure 3 shows first identified content item, 6-11, second identified content item, 6-12, and identified content item N, 6-14. Therefore, this process would be repeated for each of the first through N-th identified content items.
  • Content item identifier 1-12 of Figure 1 identifies candidate content items in the database 1-2, over a network or elsewhere, while description data extractor 1-11 at S5 (Figure 2A) extracts description data for each of the candidate content items and, at S6, compiles the dimension data for each of the content items.
  • the process of extracting the corresponding description data of a second candidate content item (represented in box 6-22), if found, is performed at S7, and the compilation of the dimension data for the second candidate content item is then performed at S8.
  • a virtual item is to be constructed as a basis for determining the similarity of candidate content items, in which case processing will proceed as shown in Fig. 2C. Otherwise, processing would proceed as shown in Figure 2B.
  • a vector value is constructed by a vector constructor 1-13 as shown in S11 of Figure 2B.
  • Figure 3 shows a table 6-1 with a set of vectors 6-3 with values that reflect the degree of similarity for corresponding dimensions of first identified content item 6-11 with the first candidate content item 6-21.
  • a set of vector values 6-4 reflects the similarity of the dimensions of first identified content item, 6-11, with second candidate content item, 6-22.
  • the set of vector values 6-5 reflects the degrees of similarity for corresponding dimensions of second identified content item 6-12 with first candidate content item 6-21, while the set of vector values 6-6 reflects the degree of similarity between dimensions of second identified content item 6-12 and second candidate content item 6-22.
  • Each set of vector values also may include an average vector value determined at S12, based on computation of the arithmetic mean, mode, median or sum of the vector values of this set, that reflects the average similarity for the pair of content items.
  • vector values 6-3 of Figure 3 may include a first vector value, a second vector value, an h-th vector value, and an average value for the set.
  • Box 6-23 references such a candidate content item M.
  • a commonality vector value set is determined based on the similarity of dimensions between identified content items.
  • dimensions that are most similar are identified, and representative vectors can be weighted more than the other vectors, or can be used exclusively.
  • a dimension which is representative of the first and second (and additional) identified content items, and which therefore tends to capture the similarity between the identified content items and is therefore characteristic of the group being formed, would be weighted more than other vector values, or would be used exclusively to determine similar candidate content items.
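Commonality weighting can be illustrated with a short sketch. The assumptions here are not from the source: two identified items are treated as "closest" on a dimension when their values match exactly, and the boost factor of 2.0 is an arbitrary illustrative weight.

```python
def commonality_dimensions(item_a, item_b, dimensions):
    """Dimensions on which two identified items agree (exact match assumed)."""
    return [
        d for d in dimensions
        if item_a.get(d) is not None and item_a.get(d) == item_b.get(d)
    ]


def weighted_score(vector, dimensions, common, boost=2.0):
    """Weighted average in which commonality dimensions count `boost` times."""
    weights = [boost if d in common else 1.0 for d in dimensions]
    return sum(w * v for w, v in zip(weights, vector)) / sum(weights)


dims = ["genre", "creator", "place_of_creation"]
photo1 = {"genre": "Spanish holiday", "creator": "Ann", "place_of_creation": "Madrid"}
photo2 = {"genre": "Spanish holiday", "creator": "Bob", "place_of_creation": "Seville"}

common = commonality_dimensions(photo1, photo2, dims)
# Candidate vector values on a 0-10 scale, one per entry in dims.
score = weighted_score([10, 0, 0], dims, common)
```

To use the commonality vectors exclusively, as the text also allows, one could instead average only the values whose dimensions appear in `common`.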
  • a further set of vector values 6-8 may be computed that reflect the overall similarity for each of the dimensions for each candidate content item, by averaging or adding corresponding vector values of the candidate content item 6-21.
  • by averaging or adding corresponding vector values for each set of vector values for that candidate content item for the column 6-2, an overall degree of similarity with the identified content items for the dimension is attained for the first candidate content item.
  • all of the vector values of the set 6-8 may be added or averaged to obtain a total similarity value for that candidate content item.
  • average as used herein may include an arithmetic mean, a mode, a median or some such other statistical function suitably selected to provide a composite view of the selected values. Further, a simple sum of the values may be used as well as some such statistical function.
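The aggregation step can be sketched as below, assuming each identified content item contributes one list of vector values for a given candidate and that the lists are aligned by dimension. The function names and sample values are hypothetical. The statistical functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# The "average" may be any of these, per the text; sum is also permitted.
AGGREGATORS = {"mean": mean, "median": median, "mode": mode, "sum": sum}


def per_dimension(vector_sets, how="mean"):
    """Combine corresponding values across identified items, dimension by dimension."""
    agg = AGGREGATORS[how]
    return [agg(column) for column in zip(*vector_sets)]


def total_similarity(vector_sets, how="mean"):
    """Collapse the per-dimension values into one total for the candidate."""
    return AGGREGATORS[how](per_dimension(vector_sets, how))


# One row per identified content item, one column per dimension.
sets_for_candidate = [
    [10, 2, 6],  # identified item 1 vs. the candidate
    [8, 4, 6],   # identified item 2 vs. the candidate
]

combined = per_dimension(sets_for_candidate)
total = total_similarity(sets_for_candidate)
total_sum = total_similarity(sets_for_candidate, how="sum")
```

Which aggregator suits a given application is left open by the text. Note that a sum grows with the number of dimensions and identified items, so any threshold would need to be scaled accordingly.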
  • certain dimensions of the content item may be more important than others, and for this reason it may be helpful to weight vectors corresponding to certain dimensions more than others. The degree to which such factors are weighted would depend on the application and the needs of the user.
  • a minimal similarity threshold may be used to eliminate non-similar candidate content items, as shown at S15 of Figure 2B.
  • thresholds may be employed for the various vectors, depending on the needs of the user and the application. Accordingly, candidate content items for which the vector values meet or surpass the threshold value are grouped with the identified content items by group organizer 1-17, while other candidate content items are rejected. Alternatively, the most similar candidate content item, or predetermined number of the most similar candidate content items may be selected for grouping with the identified content items, while the remainder of the candidate content items may be rejected.
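Both selection strategies, keeping every candidate that meets a threshold or keeping a predetermined number of the most similar candidates, can be sketched as follows. The file names, scores, and threshold value are made up for illustration.

```python
def select_by_threshold(scores, threshold):
    """Keep candidates whose total similarity meets or surpasses the threshold."""
    return [name for name, s in scores.items() if s >= threshold]


def select_top(scores, n=1):
    """Keep the n most similar candidates, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical total similarity values per candidate content item.
scores = {"flamenco.mp3": 8.5, "budget.xls": 1.0, "paella_recipe.txt": 6.0}

kept = select_by_threshold(scores, threshold=6.0)
best = select_top(scores, n=1)
```

The two strategies can also be combined, for example by taking the top n only among candidates that pass the threshold.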
  • the content item retrieved is of a content type different from the content type of the user-selected content item.
  • the user-selected content item is of the type music file, or MP3
  • the retrieved content item may be of the content type photograph data. In this way, for example, pictures of a certain genre may be retrieved to match user-selected music of the same genre.
  • This (or these) selected candidate content item(s) are provided to the user or to the user interface 1-3 at S16.
  • a signal may be provided directly to the database 1-2 to cause retrieval of the selected candidate item to the database or to the user interface 1-3.
  • a notification may be provided to user interface 1-3 to notify a user (not shown) of a retrievable content item. The notification may consist of an identification of the content item to be retrieved, a description of the content item, a URL or a link to the content item, a retrieval of the entire content item or a portion thereof, or a combination of the foregoing.
  • At S17, processing terminates.
  • Figure 2C shows a further process according to an aspect of the present invention, using a virtual content item.
  • virtual item constructor 1-15 analyzes the dimensions of the identified content items based on which a grouping is sought.
  • a representative content item for all of the identified content items, called a virtual content item 6-15, is then constructed based on the average or weighted average dimensions of the identified content items. For example, if all of the identified content items are of the genre "Spanish holiday," then the virtual content item would also have as its genre "Spanish holiday."
  • sets of vector values 6-7 are generated based on the similarity with the dimensions of this virtual content item with the candidate content items.
  • the threshold is applied in selecting similar candidate content items, or the highest scoring candidate content item or items are selected.
  • a notification signal is provided by retrieval result output 1-17.
  • processing terminates.
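The virtual content item process of Figure 2C can be sketched as below. The text describes averaging the dimensions of the identified items. Because the example dimensions here are categorical, this sketch substitutes the most common value per dimension, which is an assumption and not the patent's stated method. All names and data are hypothetical.

```python
from collections import Counter


def build_virtual_item(items, dimensions):
    """Build a representative item from the identified items.

    Assumption: for categorical dimensions, the most frequent value stands
    in for the "average" described in the text.
    """
    virtual = {}
    for dim in dimensions:
        values = [item[dim] for item in items if dim in item]
        if values:
            virtual[dim] = Counter(values).most_common(1)[0][0]
    return virtual


def score_against(virtual, candidate, dimensions):
    """Average of per-dimension values: 10 for an exact match, 0 otherwise."""
    values = [
        10 if virtual.get(d) is not None and virtual.get(d) == candidate.get(d) else 0
        for d in dimensions
    ]
    return sum(values) / len(values)


dims = ["genre", "place_of_creation"]
photos = [
    {"genre": "Spanish holiday", "place_of_creation": "Madrid"},
    {"genre": "Spanish holiday", "place_of_creation": "Seville"},
    {"genre": "Spanish holiday", "place_of_creation": "Madrid"},
]

virtual = build_virtual_item(photos, dims)
music_score = score_against(
    virtual, {"genre": "Spanish holiday", "place_of_creation": "Madrid"}, dims
)
jazz_score = score_against(virtual, {"genre": "jazz"}, dims)
```

Scoring candidates once against the virtual item avoids building a separate set of vector values for every identified item, which is the practical difference from the Figure 2B process.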
  • a user is compiling digital data representing photographs of a recent holiday in Spain in a database and would like to retrieve other content items with a Spanish theme available in the database, in another connected storage medium, or available over the Internet.
  • the user may select the three photos as identified content item 1, identified content item 2, and identified content item 3, respectively, via user interface 1-3.
  • the retrieval system would then retrieve a data file representing Spanish music found as the selected candidate content item.
  • the user may not have remembered the existence of the Spanish music, or where to look for it in the database 1-2, and indeed the data file may have been added by another user with access to the database 1-2, or may have been retrieved by the retrieval system 1-1 from another storage device or from the World Wide Web.
  • the user would now be notified of the retrieved content item and/or the retrieved content item would be associated with the user-selected content items. The user would then be able to accompany the viewing of the Spanish holiday photographs with Spanish music.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US63213504P 2004-12-01 2004-12-01
PCT/IB2005/053986 WO2006059295A1 (en) 2004-12-01 2005-11-30 Associative content retrieval

Publications (1)

Publication Number Publication Date
EP1820126A1 (en) 2007-08-22

Family

ID=36088607

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05821618A Withdrawn EP1820126A1 (en) 2004-12-01 2005-11-30 Associative content retrieval

Country Status (5)

Country Link
EP (1) EP1820126A1 (zh)
JP (1) JP2008522310A (zh)
KR (1) KR20070086806A (zh)
CN (1) CN101069183A (zh)
WO (1) WO2006059295A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
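JP4645676B2 (ja) 2008-04-28 2011-03-09 Sony Corporation Information processing apparatus, related item providing method, and program
CN101378358B (zh) 2008-09-19 2010-12-15 Chengdu Huawei Symantec Technologies Co., Ltd. Method, system, and server for implementing secure access control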
AU2016250475B2 (en) * 2010-07-21 2018-11-15 Samsung Electronics Co., Ltd. Method and apparatus for sharing content
KR101775027B1 (ko) * 2010-07-21 2017-09-06 Samsung Electronics Co., Ltd. Method and apparatus for sharing content
US10031968B2 (en) * 2012-10-11 2018-07-24 Veveo, Inc. Method for adaptive conversation state management with filtering operators applied dynamically as part of a conversational interface
US20150032609A1 (en) * 2013-07-29 2015-01-29 International Business Machines Corporation Correlation of data sets using determined data types

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6173275B1 (en) * 1993-09-20 2001-01-09 Hnc Software, Inc. Representation and retrieval of images using context vectors derived from image information elements
WO2001046858A1 (fr) * 1999-12-21 2001-06-28 Matsushita Electric Industrial Co., Ltd. Creation d'un indice vectoriel, recherche de vecteurs similaires et dispositifs correspondants
US20030018617A1 (en) * 2001-07-18 2003-01-23 Holger Schwedes Information retrieval using enhanced document vectors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006059295A1 *

Also Published As

Publication number Publication date
CN101069183A (zh) 2007-11-07
KR20070086806A (ko) 2007-08-27
WO2006059295A1 (en) 2006-06-08
JP2008522310A (ja) 2008-06-26

Similar Documents

Publication Publication Date Title
US8442976B2 (en) Adaptation of location similarity threshold in associative content retrieval
RU2444072C2 (ru) System and method for using content features and metadata of digital images to find related audio accompaniment
US20080162435A1 (en) Retrieving Content Items For A Playlist Based On Universal Content Id
US7953735B2 (en) Information processing apparatus, method and program
JP5340517B2 (ja) Meta-descriptor for multimedia information
KR20070095282A (ko) Network-based data collection, including local data attributes, enabling media management without requiring a network connection
US20080306930A1 (en) Automatic Content Organization Based On Content Item Association
EP2208149A2 (en) Classifying a set of content items
CN100546356C (zh) Apparatus and method for processing information
WO2006059295A1 (en) Associative content retrieval
KR20090118752A (ko) Method and apparatus for providing a content playlist
CN103530311A (zh) Method and apparatus for prioritizing metadata
US20060031212A1 (en) Method and system for sorting, storing, accessing and searching a plurality of audiovisual recordings
CN101088088A (zh) Method and apparatus for editing program retrieval information
EP1820125A1 (en) Adaptation of time similarity threshold in associative content retrieval
KR20070066509A (ko) Method and apparatus for managing image files
US20070078847A1 (en) System and method for generating a play-list
Khoja et al. Thematic video indexing to support video database retrieval and query processing
Rehatschek et al. An innovative system for formulating complex, combined content-based and keyword-based queries
Abdel-Mottaleb et al. Multimedia content management

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070702

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: PACE MICROTECHNOLOGY PLC

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: PACE PLC

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20081220