EP1066572A1 - System and method for generating semantic visual templates for image and video retrieval - Google Patents

System and method for generating semantic visual templates for image and video retrieval

Info

Publication number
EP1066572A1
Authority
EP
European Patent Office
Prior art keywords
query
visual
concept
subset
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99911110A
Other languages
German (de)
English (en)
Inventor
Shih-Fu Chang
William Chen
Hari Sundaram
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Columbia University in the City of New York
Original Assignee
Columbia University in the City of New York
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Columbia University in the City of New York filed Critical Columbia University in the City of New York
Publication of EP1066572A1
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5862Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/786Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion

Definitions

  • the invention relates to database still image, video and audio retrieval and, more particularly, to techniques which facilitate access to database items.
  • the database can be indexed using a collection of visual templates.
  • the visual templates represent semantic concepts or categories, e.g. skiing, sunset and the like.
  • These are referred to as semantic visual templates ("SVT's").
  • Semantic visual templates can be established by an interactive process between a user and a system.
  • the user can provide the system with an initial sketch or example image, as a seed to the system to automatically generate other representations of the same concept.
  • the user then can pick those views for inclusion that are plausible for representing the concept.
  • Once an SVT has been defined, the database can be searched with it, and the user can provide relevancy feedback on the returned results.
  • SVT's the user can interact with the system at concept level. In forming new concepts, pre-existing SVT's can be used..
  • Fig. 1 is a schematic of an interactive technique for generating a library or collection of semantic visual templates in accordance with a preferred embodiment of the invention.
  • Fig. 2 is a diagram which illustrates a concept having necessary and sufficient conditions.
  • Fig. 3 is a diagram which illustrates query generation.
  • Fig. 4 is a schematic of an interactive system in accordance with a preferred further embodiment of the invention, including audio processing.
  • Fig. 5 shows a set of icons exemplifying the concept "high jump”.
  • Fig. 6 shows a set of icons exemplifying the concept "sunset”.
  • Fig. 7 shows a semantic visual template for the concept "slalom".
  • VideoQ: in the VideoQ video search system, each object can be characterized by salient attributes such as color, texture, size, shape and motion, for example.
  • A video object database consists of all the objects extracted from the scenes and their attributes.
  • Visual templates: a visual template represents an idea, in the form of a sketch or an animated sketch. As a single visual template may be a poor representative of a class of interest, a library of visual templates can be assembled, containing representative templates for different semantic classes. For example, when searching for video clips of the class Sunset, one could select one or more visual templates corresponding to the class and use similarity-based querying to find video clips of sunsets.
  • An important advantage of using a visual template library lies in linkage of a low-level visual feature representation to high-level semantic concepts. For example, if a user enters a query in a constrained natural language form as described in the above-referenced patent applications, visual templates can be used to transform the natural language query into automated queries specified by visual attributes and constraints. When visual content in the repository or database is not indexed textually, customary textual search methods cannot be applied directly.
  • a semantic visual template is the set of visual templates associated with a particular semantic. This notion of an SVT has certain key properties as follows: Semantic visual templates are general in nature. For a given concept, there should be a set of visual templates that cover that concept well. Examples of successful SVT's are Sunset, High Jump, Down-hill Skiing.
  • a semantic visual template for a concept should be small but cover a large percentage of relevant images and videos in the collection, for high precision-recall performance.
  • a semantic visual template can be understood further as a set of icons or example scenes/objects that represent the semantic with which the template is associated. From a semantic visual template, feature vectors can be extracted for querying. The icons are animated sketches.
  • the features associated with each object and their spatial and temporal relationships are important. Histograms, texture and structural information are examples of global features that can be part of such a template. The choice between an icon-based realization versus a feature vector set formed out of global characteristics depends upon the semantic to be represented.
  • each template contains multiple icons, example scenes/objects to represent a concept.
  • the elements of the set can overlap in their coverage. Desirably, coverage is maximized with a minimal template set.
  • Each icon for a concept e.g. down-hill ski, sunset, beach crowd, is a visual representation consisting of graphic objects resembling the actual objects in a scene.
  • Each object is associated with a set of visual attributes, e.g. color, shape, texture, motion. The relevancy of each attribute and each object to the concept is also specified.
  • the sun object may be optional, as there may be sunset videos in which the sun is not visible.
  • for "high jump", the motion attribute of the foreground object is mandatory, the texture attribute of the background is non-mandatory, and both are more relevant than other attributes.
  • Fig. 5 shows several potential icons for "high jump", and Fig. 6 for "sunset".
  • the optimal set of icons should be chosen based on relevancy feedback and maximal coverage in terms of recall as described below in further detail.
  • the positive coverage sets for different visual templates may overlap. Therefore, it is an objective to find a small set of visual templates with large, minimally overlapping positive coverage.
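  • One plausible way to approximate such a selection is a greedy set-cover heuristic, sketched below; this is an illustration under assumed data structures (each template is reduced to the set of relevant shots it retrieves), not an algorithm prescribed by the patent.

      def greedy_template_selection(coverage, max_templates):
          # coverage: dict mapping template id -> set of relevant shot ids it retrieves
          covered, chosen = set(), []
          for _ in range(max_templates):
              # pick the template that adds the most not-yet-covered relevant shots
              best = max(coverage, key=lambda t: len(coverage[t] - covered), default=None)
              if best is None or not (coverage[best] - covered):
                  break  # no template adds new coverage
              chosen.append(best)
              covered |= coverage[best]
          return chosen, covered

    For example, greedy_template_selection({"t1": {1, 2}, "t2": {2, 3, 4}}, 2) picks "t2" first (three new shots), then "t1" (one more), giving large coverage with little overlap.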
  • Users may provide initial conditions for effective visual templates. For example, a user may use a yellow circle (foreground) and a light-red rectangle (background) as an initial template for retrieving sunset scenes. Also, users may indicate weights and relevancy of different objects, attributes, and necessary conditions pertaining to the context by answering an interactive questionnaire. The questionnaire is sensitive to the current query that the user has sketched out on a sketchpad, for example.
  • given the initial visual template and the relevancy of all visual attributes in the template, the search system will return a set of the most similar images/videos to the user. Given the returned results, the user can provide a subjective evaluation of them. The precision of the results and the positive coverage, i.e. recall, can then be computed, as in the sketch below.
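  • As a minimal sketch of this bookkeeping (the identifiers and data structures are assumptions, not the patent's), precision and positive coverage can be computed from the returned shots and the shots the user marks as relevant:

      def precision_recall(returned_ids, relevant_ids):
          # returned_ids: shots returned by querying with the template
          # relevant_ids: shots the user judges relevant to the concept
          returned, relevant = set(returned_ids), set(relevant_ids)
          hits = returned & relevant
          precision = len(hits) / len(returned) if returned else 0.0
          recall = len(hits) / len(relevant) if relevant else 0.0
          return precision, recall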
  • the system can determine an optimal strategy for altering the initial visual query and generate modified queries.
  • Such features are embodied in a technique as conceptually exemplified by Fig. 1, with specific illustration of a query for the concept "high jump".
  • the query includes three objects, namely two stationary rectangular background fields and an object which moves to the lower right.
  • four qualities are specified with associated weights, e.g. color, texture, shape and size, represented in Fig. 1 by vertical bars.
  • a new query can be formed by stepping at least one of the qualities, at which point user interaction can be invoked for deciding as to plausibility for inclusion as an icon in the template.
  • this template can be used for a database search. The results of the search can be evaluated for recall and precision. If acceptable, the template can be stored as a semantic visual template for "high jump".
  • the fundamental video data unit may be termed a video shot, comprising multiple segmented video objects.
  • the lifetime of any particular video object may be equal to or less than the duration of the video shot.
  • a similarity measure D between a member of the SVT set and a video shot can be defined as
    D = ω_f · d_f(O, O') + ω_s · d_s(O, O')     (Equation 1)
    where O are the objects specified in the template, O' are the corresponding matched objects in the video shot, d_f is the feature distance between its arguments, d_s is the dissimilarity between the spatial-temporal structure in the template and that among the matched objects in the video shot, and ω_f and ω_s are the normalized weights for the feature distance and the structure dissimilarity.
  • the query procedure is to generate a candidate list for each object in the query.
  • the distance D is the minimum over all possible sets of matched objects that satisfy the spatial-temporal restrictions. For example, if the semantic template has three objects and two candidate objects are kept for each single-object query, there will be at most eight potential candidate sets of objects considered in computing the minimal distance in Equation 1. Given N objects in the query, this appears to require searching over all combinations of candidate objects, which grows exponentially with N.
  • each video object O is used to query the entire object database, resulting in a list of matched objects which can be kept short by using a threshold. Only objects included in this list are then considered as candidate objects matching O.
  • the candidate objects on the lists are then joined, resulting in the final sets of matched objects on which the spatial-temporal structure relationships are verified, as in the sketch below.
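  • The two-step matching described above can be sketched as follows, under assumed data structures: each object is a numeric feature vector, feature_distance, threshold, omega_f and omega_s are placeholders, and the spatial-temporal structure dissimilarity is supplied by the caller. The sketch mirrors Equation 1 by taking the minimum of the weighted sum over the joined candidate sets.

      from itertools import product

      def feature_distance(template_obj, candidate_obj):
          # placeholder: Euclidean distance between assumed feature vectors
          return sum((a - b) ** 2 for a, b in zip(template_obj, candidate_obj)) ** 0.5

      def query_shot(template_objects, shot_objects,
                     structure_dissimilarity=lambda t, m: 0.0,
                     omega_f=0.5, omega_s=0.5, threshold=1.0):
          # step 1: a short candidate list per template object, pruned by a threshold
          candidate_lists = []
          for t_obj in template_objects:
              candidates = [s for s in shot_objects
                            if feature_distance(t_obj, s) <= threshold]
              if not candidates:
                  return float("inf")  # an object in the template found no match
              candidate_lists.append(candidates)
          # step 2: join the candidate lists and keep the combination minimizing D
          best = float("inf")
          for matched in product(*candidate_lists):
              d_f = sum(feature_distance(t, m)
                        for t, m in zip(template_objects, matched))
              d_s = structure_dissimilarity(template_objects, matched)
              best = min(best, omega_f * d_f + omega_s * d_s)
          return best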
  • Template generation: two-way interaction is used between a user and the system for generating the templates. Given the initial scenario and using relevancy feedback, the technique converges on a small set of icons that gives maximum recall.
  • a user furnishes an initial query as a sketch of the concept for which a template is to be generated, consisting of objects with spatial and temporal constraints. The user can also specify whether the object is mandatory. Each object has features to which the user assigns relevancy weights.
  • the initial query can be regarded as a point in a high-dimensional feature space into which all videos in the database can be mapped.
  • a step size can be determined with the help of the weight that the user has specified along with the initial query; this weight can be regarded as a measure of the degree of relevancy attributed by the user to the feature of the object. Accordingly, a low weight results in coarse quantization and vice versa: the jump distance δ(f) corresponding to a feature f decreases as the weight w(f) associated with that feature increases.
  • the feature space is quantized into hyper-rectangles. For example, for color the cuboids can be generated using the metric for the LUV space along with δ(f). A sketch of this weight-driven stepping is given below.
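  • The sketch below assumes, purely for illustration, a simple inverse relationship between the weight and the jump distance (the exact mapping is not reproduced here); it then generates neighbouring query points by stepping each feature by its weight-dependent step size.

      def jump_distance(weight, max_step=1.0, eps=1e-6):
          # assumed inverse relationship: a low relevancy weight gives a coarse
          # quantization (large step), a high weight a fine one
          return max_step * (1.0 - weight) + eps

      def perturbed_queries(feature_vector, weights):
          # step each feature up and down by its weight-dependent jump distance
          queries = []
          for i, (value, weight) in enumerate(zip(feature_vector, weights)):
              step = jump_distance(weight)
              for direction in (-1.0, +1.0):
                  neighbour = list(feature_vector)
                  neighbour[i] = value + direction * step
                  queries.append(tuple(neighbour))
          return queries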
  • an additional join can be included in step 2 with respect to the candidate lists for each object.
  • a “sunset” may be described at the object level as well as the global level.
  • a global-level description may take the form of a color or texture histogram.
  • An object-level description may be a collection of two objects such as the sky and the sun. These objects may be further quantified using feature level descriptors.
  • a concept, e.g. Sunset, can be characterized in terms of necessary (N) and sufficient (S) conditions, as illustrated by Fig. 2.
  • Additional templates may be generated manually i.e. as the user inputs additional queries. The task is undertaken for each concept. Necessary conditions can be imposed on a concept, thereby automatically generating additional templates, given an initial query template.
  • the user interacts with the system through a "concept questionnaire", to specify necessary conditions for the semantic searched for. These conditions may also be global, e.g. the global color distribution, the relative spatial and temporal interrelationships etc.
  • the system moves in the feature space to generate additional templates, with the user's original one as a starting point.
  • This generation is also modified by the relevancy feedback given to the system by the user.
  • new rules can be determined pertaining to the necessary conditions. These can be used further to modify the template generation procedure.
  • the rules are generated by looking at the correlation between the conditions deemed necessary for a concept with the videos that have been marked as relevant by the user.
  • a query, for a "crowd of people", in VideoQ is in the form of a sketch.
  • the user has specified a visual query with an object, giving weights for color and size, but is unable to specify a more detailed description in the form of either texture (of the crowd) or the relative spatial and temporal movements that characterize the concept of a crowd of people.
  • the system identifies the video clips relevant to the concept that the user is interested in. Now, since the system knows that texture and the spatial and temporal arrangements are necessary to the concept, it seeks to determine consistent patterns amongst the features deemed necessary, amongst the relevant videos. These patterns are then returned to the user, who is asked if they are consistent with the concept that he is searching for. If the user accepts these patterns as consistent with the concept, then they will be used to generate new query templates, as illustrated by Fig. 3. Including this new rule has two-fold impact on query template generation, namely it improves the speed of the search and increases the precision of the returned results.
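  • A toy illustration of such pattern mining is sketched below; representing shots as dictionaries of scalar feature values and using a fixed spread threshold are assumptions made only for the example.

      from statistics import mean, pstdev

      def consistent_patterns(relevant_shots, necessary_features, max_spread=0.1):
          # relevant_shots: list of dicts mapping feature name -> scalar value
          # necessary_features: features the user declared necessary to the concept
          patterns = {}
          for feature in necessary_features:
              values = [shot[feature] for shot in relevant_shots if feature in shot]
              if len(values) >= 2 and pstdev(values) <= max_spread:
                  # the relevant shots agree on this feature: a candidate new rule
                  patterns[feature] = mean(values)
          return patterns

    Patterns returned this way would then be shown to the user and, if accepted, used to generate new query templates.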
  • the query defines a feature space where the search is executed.
  • the feature space is defined by the attributes and relevancy weights of the visual template.
  • the attributes define the axes of the feature space, and the relevancy weights stretch/compress the associated axes.
  • each video shot can be represented as a point in this space.
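  • A minimal way to picture the stretched or compressed axes is a weighted distance in which a larger relevancy weight makes differences along that axis cost more; the quadratic form below is an illustrative assumption, not a metric specified by the patent.

      def weighted_distance(shot_point, query_point, axis_weights):
          # axes with larger weights are "stretched": differences along them
          # contribute more to the distance between a shot and the query point
          return sum(w * (s - q) ** 2
                     for s, q, w in zip(shot_point, query_point, axis_weights)) ** 0.5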
  • the visual template covers a portion of this space. Since the visual template can differ in feature and in character (global against object level), the spaces that are defined by the templates differ and are non-overlapping.
  • A selection of only a few features may be insufficient to determine a concept, but the concept may be adequately represented by a suitable selection of features with differing weights, for example. Thus, a concept can be mapped into a feature space.
  • a concept is not limited to a single feature space nor to a single cluster.
  • sunsets cannot be totally characterized by a single color or a single shape.
  • it is important to determine not only the global static features and weights relating to a concept, but also those features and weights that can vary.
  • the search for concepts starts by specifying certain global constants. Through a context questionnaire, the number of objects in the search and the global features necessary to each object are determined. These represent constraints in the search process that do not vary.
  • a user gives an initial query specifying features and setting weights.
  • a set intersection is taken with the set of necessary conditions defined by the user. The necessary conditions are left unchanged. Changes are made to the template based on changes to those features deemed sufficient. If the sets do not intersect, rules are derived that characterize the concept based on the necessary conditions and relevancy feedback.
  • the threshold determines the number of non-overlapping coverings possible. The number of coverings determines the size and number of jumps possible along that particular feature.
  • the algorithm performs a breadth-first search and is guided by three criteria:
  • the greedy algorithm moves in the direction of increasing recall: compute all possible initial jumps, convert each jump into the corresponding visual template, execute the query and collate all the results, show the results to the user for relevancy feedback, and choose those results that maximize incremental recall as possible points of subsequent query (see the sketch below).
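  • The loop below sketches a simplified greedy variant of this feedback-driven exploration; the helper functions (all_jumps, jump_to_template, run_query, ask_user_relevant) are hypothetical placeholders for the steps named above, not part of the patent.

      def grow_template_set(initial_query, all_jumps, jump_to_template,
                            run_query, ask_user_relevant, max_rounds=5):
          # all_jumps(query)           -> neighbouring query points (candidate jumps)
          # jump_to_template(point)    -> the visual template for that query point
          # run_query(template)        -> shots returned by the database
          # ask_user_relevant(results) -> set of shots the user marks as relevant
          templates, relevant_shots = [], set()
          current = initial_query
          for _ in range(max_rounds):
              best_gain, best_choice = 0, None
              for point in all_jumps(current):
                  template = jump_to_template(point)
                  results = run_query(template)
                  gained = ask_user_relevant(results) - relevant_shots
                  if len(gained) > best_gain:
                      best_gain, best_choice = len(gained), (point, template, gained)
              if best_choice is None:
                  break  # no jump adds incremental recall
              point, template, gained = best_choice
              templates.append(template)
              relevant_shots |= gained
              current = point  # continue the search from the chosen point
          return templates, relevant_shots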
  • Keywords accompanying the data can either be generated manually or are obtained by association, i.e. keywords are extracted from the accompanying text (in the case of an image) or the captions that accompany videos.
  • VideoQ provides a "language" for inputting the query, in terms of a sketch. There is a simple correspondence between what exists in VideoQ and its natural language counterpart, as illustrated by Table 1.
  • a constrained language set can be used, with a set of allowable words.
  • a sentence is parsed into classes such as nouns, verbs, adjectives, and adverbs to generate a motion model of the video sequence.
  • nouns correspond to the objects, for example.
  • A noun (i.e. scenario/object) database may initially include a hundred scenes or so, and be extensible by user interaction. Each object may have a shape description that is modified by the various modifiers such as adjectives (color, texture), verbs (walked), adverbs (slowly). This can then be inserted into the VideoQ palette, where it may be subject to further refinement.
  • when the parser encounters a word that is absent from its modifier database (i.e. the databases corresponding respectively to verbs, adverbs, prepositions, adjectives), it looks up a thesaurus to determine whether synonyms of that word are present in its database, and uses them instead. If that fails, it returns a message to indicate an invalid string.
  • when the parser encounters a word that it cannot classify, the user must either modify the text or, if the word is a noun (like "Bill"), indicate to the system the class (in this case a noun), and additionally indicate that the word refers to a human being. If the user indicates a noun that is absent from the system databases, then the user is prompted to draw that object in the sketch pad so that the system can learn about the object. In the database, attributes such as motion, color, texture and shape can be generated at the object level, so that one level of matching can be at that level.
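  • A toy sketch of this lookup-with-thesaurus-fallback behaviour follows; the data structures are invented placeholders rather than the patent's vocabulary databases.

      def classify_word(word, databases, thesaurus):
          # databases: dict mapping a class ("noun", "verb", "adjective", "adverb",
          #            "preposition") to the set of words the system knows
          # thesaurus: dict mapping a word to a list of synonyms
          for word_class, known in databases.items():
              if word in known:
                  return word_class, word
          # unknown word: try its synonyms before giving up
          for synonym in thesaurus.get(word, []):
              for word_class, known in databases.items():
                  if synonym in known:
                      return word_class, synonym
          raise ValueError(f"invalid string: {word!r} is not in the constrained vocabulary")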
  • the audio stream that accompanies a video can also be used, as illustrated by Fig. 4. Indeed, if the audio is closely correlated with the video, it may be the single most important source of the semantic content of the video.
  • a set of keywords can be generated, 10 to 20 per video sequence, for example. The search at the keyword level can then be joined with the search at the model level, and those videos which match at the keyword (semantic) level as well as at the motion-model level can be ranked highest.
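  • One hypothetical way to realize such a joint ranking (the patent does not fix a formula here) is to fuse normalized scores from the two levels, so that videos matching at both levels rise to the top:

      def fuse_rankings(keyword_scores, model_scores, alpha=0.5):
          # keyword_scores, model_scores: dicts mapping video id -> score in [0, 1]
          ids = set(keyword_scores) | set(model_scores)
          combined = {v: alpha * keyword_scores.get(v, 0.0)
                         + (1 - alpha) * model_scores.get(v, 0.0)
                      for v in ids}
          # videos scoring well at both the keyword and the motion-model level rank first
          return sorted(ids, key=lambda v: combined[v], reverse=True)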
  • Semantic visual templates for retrieving video shots of slalom skiers.
  • the system asks and the user answers questions regarding context.
  • the semantic visual template is labeled "slalom”.
  • the query is specified as object-based, including two objects.
  • the large blank background represents the ski slope and the smaller foreground object the skier with its characteristic zigzag motion trail.
  • the system generates a set of test icons from which the user selects plausible feature variations in the skier's color and motion trajectory.
  • the four selected colors and the three selected motion trails are joined to form 12 possible skiers.
  • the list of skiers is joined with the single background, resulting in the 12 icons of Fig. 7, where groups of three adjacent icons are understood as having the same color.
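  • The join described here is essentially a Cartesian product; the sketch below, with placeholder color and trajectory labels, reproduces the 4 x 3 = 12 count:

      from itertools import product

      colors = ["color_1", "color_2", "color_3", "color_4"]  # 4 user-selected skier colors (placeholders)
      trails = ["trail_a", "trail_b", "trail_c"]              # 3 selected zigzag motion trails (placeholders)
      background = "ski_slope"

      # 4 colors x 3 trails = 12 skiers, each joined with the single background
      icons = [{"background": background, "skier_color": c, "skier_trail": t}
               for c, t in product(colors, trails)]
      assert len(icons) == 12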
  • the user chooses a candidate set to query the system.
  • the system retrieves the 20 closest video shots.
  • the user provides relevancy feedback to guide the system to a small set of exemplars for slalom skiers.
  • the database contains nine high jumpers in 2589 video shots.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

For image or video retrieval from a database, a semantic visual template (SVT) is a set of icons of example scenes or objects characterizing a concept such as "skiing", "sunset" and the like. SVT's provide two-way interaction between a user and a system. The user can provide the system with an initial sketch or an example image, which the system uses as a seed to automatically generate other representations of the same concept. The user can then pick, for inclusion, those views which give a plausible representation of the concept. Once the SVT has been defined, it can be used to search the database and to let the user provide relevancy feedback on the returned results. With SVT's defined, the user can interact with the system at the concept level, and existing SVT's can be used to form new concepts. Queries to the system can draw on a limited vocabulary linked to the semantic visual templates.
EP99911110A 1998-03-04 1999-03-04 System and method for generating semantic visual templates for image and video retrieval Withdrawn EP1066572A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US7678198P 1998-03-04 1998-03-04
US76781P 1998-03-04
PCT/US1999/004776 WO1999045483A1 (fr) 1998-03-04 1999-03-04 System and method for generating semantic visual templates for image and video retrieval

Publications (1)

Publication Number Publication Date
EP1066572A1 true EP1066572A1 (fr) 2001-01-10

Family

ID=22134152

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99911110A Withdrawn EP1066572A1 (fr) 1998-03-04 1999-03-04 Systeme et procede de generation de gabarits semantiques visuels pour l'extraction d'images et de video

Country Status (5)

Country Link
EP (1) EP1066572A1 (fr)
JP (1) JP2002506255A (fr)
KR (1) KR20010041607A (fr)
CA (1) CA2322448A1 (fr)
WO (1) WO1999045483A1 (fr)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU6503800A (en) 1999-07-30 2001-02-19 Pixlogic Llc Perceptual similarity image retrieval
US6563959B1 (en) 1999-07-30 2003-05-13 Pixlogic Llc Perceptual similarity image retrieval method
EP1320992A2 (fr) * 2000-09-13 2003-06-25 Koninklijke Philips Electronics N.V. Procede de mise en evidence d'information importante dans un programme video par utilisation de reperes
US8085995B2 (en) * 2006-12-01 2011-12-27 Google Inc. Identifying images using face recognition
US8190604B2 (en) 2008-04-03 2012-05-29 Microsoft Corporation User intention modeling for interactive image retrieval
GB2466245A (en) * 2008-12-15 2010-06-23 Univ Sheffield Crime Scene Mark Identification System
US9317533B2 (en) 2010-11-02 2016-04-19 Microsoft Technology Licensing, Inc. Adaptive image retrieval database
US8463045B2 (en) 2010-11-10 2013-06-11 Microsoft Corporation Hierarchical sparse representation for image retrieval
US9147125B2 (en) 2013-05-03 2015-09-29 Microsoft Technology Licensing, Llc Hand-drawn sketch recognition
KR101912794B1 (ko) 2013-11-27 2018-10-29 한화테크윈 주식회사 영상 검색 시스템 및 영상 검색 방법
CN106126581B (zh) * 2016-06-20 2019-07-05 复旦大学 基于深度学习的手绘草图图像检索方法
CN116992294B (zh) * 2023-09-26 2023-12-19 成都国恒空间技术工程股份有限公司 卫星测控训练评估方法、装置、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3177746B2 (ja) * 1991-03-20 2001-06-18 株式会社日立製作所 デ−タ処理システム及び方法
JP2903904B2 (ja) * 1992-10-09 1999-06-14 松下電器産業株式会社 画像検索装置
US5493677A (en) * 1994-06-08 1996-02-20 Systems Research & Applications Corporation Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9945483A1 *

Also Published As

Publication number Publication date
CA2322448A1 (fr) 1999-09-10
WO1999045483A9 (fr) 2000-10-12
JP2002506255A (ja) 2002-02-26
WO1999045483A1 (fr) 1999-09-10
KR20010041607A (ko) 2001-05-25

Similar Documents

Publication Publication Date Title
Clinchant et al. Semantic combination of textual and visual information in multimedia retrieval
Cheng et al. Semantic visual templates: linking visual features to semantics
Meghini et al. A model of multimedia information retrieval
Chen et al. A novel video summarization based on mining the story-structure and semantic relations among concept entities
US7502780B2 (en) Information storage and retrieval
EP1565846B1 (fr) Stockage et extraction d'informations
US7610306B2 (en) Multi-modal fusion in content-based retrieval
Hsu et al. Reranking methods for visual search
US8140550B2 (en) System and method for bounded analysis of multimedia using multiple correlations
US20040107221A1 (en) Information storage and retrieval
EP1066572A1 (fr) Systeme et procede de generation de gabarits semantiques visuels pour l'extraction d'images et de video
Ciocca et al. Quicklook2: An integrated multimedia system
Chang et al. Multimedia search and retrieval
Paz-Trillo et al. An information retrieval application using ontologies
Pradhan et al. A query model to synthesize answer intervals from indexed video units
Baan et al. Lazy users and automatic video retrieval tools in (the) lowlands
Srihari et al. A model for multimodal information retrieval
Doulaverakis et al. Ontology-based access to multimedia cultural heritage collections-The REACH project
Mallik et al. Multimedia ontology learning for automatic annotation and video browsing
Kutics et al. Use of adaptive still image descriptors for annotation of video frames
Sacco Uniform access to multimedia information bases through dynamic taxonomies
Tanaka et al. Organization and retrieval of video data
Chen et al. Generating semantic visual templates for video databases
Vogel et al. Performance prediction for vocabulary-supported image retrieval
Faudemay et al. Intelligent delivery of personalised video programmes from a video database

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000908

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20040203