CN1372669A - Fundamental entity-relationship models for the generic audio visual data signal description



Publication number: CN1372669A
Application number: CN00812462A
Language: Chinese (zh)
Other version: CN1312615C (granted)
Priority: US14232599P
Applicants: The Trustees of Columbia University in the City of New York; IBM Corporation



    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data


Systems and methods are provided for generating standard description records from multimedia information. The invention uses fundamental entity-relationship models for the Generic AVDS that classify entities, entity attributes, and relationships into relevant types in order to describe visual data. It also involves classifying entity attributes into syntactic and semantic attributes. Syntactic attributes can be categorized into different levels: type/technique, global distribution, local structure, and global composition. Semantic attributes can be categorized into different levels: generic object, generic scene, specific object, specific scene, abstract object, and abstract scene. The invention further classifies entity relationships into syntactic and semantic categories. Syntactic relationships can be categorized into spatial, temporal, and visual categories. Semantic relationships can be categorized into lexical and predicative categories.


Fundamental entity-relationship models for the generic audio visual data signal description

Cross-reference to related applications

This application is based on, and claims priority to, U.S. Provisional Patent Application No. 60/142,325, filed July 3, 1999.

Background of the invention

I. Field of the invention

The present invention relates to techniques for describing multimedia information, and more specifically to techniques for describing video and image information, or audio information, as well as the content of such information. The disclosed techniques are intended for content-sensitive indexing and classification of digital data signals (e.g., multimedia signals).

II. Description of the related art

With the maturation of the global Internet and the widespread use of regional and local area networks, digital multimedia information has become increasingly accepted by consumers and businesses. It has therefore become progressively more important to develop systems that process, filter, search, and organize digital multimedia information, so that useful information can be culled from an ever larger mass of raw information.

At the time of filing the present application, solutions exist that allow consumers and businesses to search for textual information. Indeed, many text-based search engines, such as those provided by excite.com and others, are available on the World Wide Web and are among the most visited web sites, which indicates significant demand for such information retrieval technology.

Unfortunately, the same is not true for multimedia content, because no generally recognized description of this kind of material exists.

The recent proliferation of digital images and video has brought new opportunities to end users, who now have a large amount of resources available when searching for content. Visual information is widely available on diverse topics, from many different sources, and in many different formats. This is an advantage, but at the same time a challenge, because users cannot review large amounts of data when searching for such content. Users must therefore be able to browse content efficiently, or to perform queries based on their specific needs. To provide such functionality in a digital library, however, it is essential to understand the data and to index it appropriately. The index must be structured, and it must be built according to how users will want to access the information.

In traditional approaches, textual annotations are used for indexing: a cataloguer manually assigns a set of keywords or expressions to describe an image. Users can then perform text-based queries or browse through the manually assigned categories. In contrast to text-based approaches, recent techniques in content-based retrieval have focused on indexing images based on their visual content. Users can perform queries by example (e.g., images that look like this one) or by user sketch (e.g., images that look like this sketch). More recent efforts attempt to classify images automatically based on their content: a system classifies each image and assigns it a label (e.g., indoor, outdoor, contains a face, etc.).

In both paradigms there are classification issues that are often overlooked, particularly in content-based retrieval. The main difficulties in appropriately indexing visual information can be summarized as follows: (1) there is a large amount of information in a single image (i.e., what to index?), and (2) different levels of description are possible (i.e., how to index?). Consider, for example, a portrait of a man wearing a uniform. The image could be labeled with the terms "uniform" or "man". The term "man", in turn, can carry information at multiple levels: conceptual (e.g., the dictionary definition of man), physical (size, weight), visual (hair color, clothing), and others. A category label therefore conveys explicit information (e.g., the person in the image is a man, not a woman) as well as implicit or undefined information (e.g., from the term alone it is not possible to know what the man is wearing).

Past attempts in this area have provided multimedia databases that allow users to search for images using features such as the color, texture, and shape of the video objects embedded in them. At the close of the 20th century, however, it is still not possible to search the Internet, or most regional or local networks, for multimedia content, because no broadly recognized description of this material exists. Moreover, the need to search for multimedia content is not limited to databases; it extends to other applications, such as digital broadcast television and multimedia telephony.

One industry-wide attempt to develop such a standard multimedia description framework has been made through the MPEG-7 standardization effort of the Moving Picture Experts Group ("MPEG"). Launched in October 1996, MPEG-7 aims to standardize content descriptions of multimedia data in order to facilitate content-focused applications such as multimedia searching, filtering, browsing, and summarization. A more complete description of the objectives of the MPEG-7 standard is contained in International Organisation for Standardisation document ISO/IEC JTC1/SC29/WG11 N2460 (October 1998), the content of which is incorporated herein by reference.

The MPEG-7 standard has the objective of specifying a standard set of descriptors, as well as structures for the descriptors and their relationships (referred to as "description schemes"), to describe various types of multimedia information. MPEG-7 also proposes to standardize ways of defining other descriptors and description schemes for the descriptors and their relationships. This description, i.e., the combination of descriptors and description schemes, is to be associated with the content itself, to allow fast and efficient searching and filtering for material of interest to the user. MPEG-7 further proposes to standardize a language for specifying description schemes, the Description Definition Language ("DDL"), and schemes for the binary encoding of multimedia content descriptions.

At the time of filing the present application, MPEG is soliciting proposals for techniques that will optimally implement the necessary description schemes for future integration into the MPEG-7 standard. To provide such optimized description schemes, three different multimedia application scenarios can be considered: the distributed processing scenario, the content exchange scenario, and a format that permits personalized viewing of multimedia content.

Regarding distributed processing, a description scheme must make it possible to interchange descriptions of multimedia material independently of any platform, any vendor, and any application, so that multimedia content can be processed in a distributed fashion. Standardized, interoperable content descriptions mean that data from a variety of sources can be plugged into a variety of distributed applications, such as multimedia processors, editors, retrieval systems, and filtering tools. Some of these applications may be provided by third parties, giving rise to a sub-industry of providers of multimedia tools that work with the standardized descriptions of multimedia data.

A user should be able to access various content providers' web sites to download content and the associated indexing data obtained by some low-level or high-level processing, and then access several tool providers' web sites to download tools (e.g., Java applets) that manipulate the heterogeneous data descriptions in particular ways according to the user's personal interests. An example of such a multimedia tool is a video editor. An MPEG-7 compliant video editor can manage and process video content from a variety of sources, provided that the description associated with each video is MPEG-7 compliant. Each video may come with a varying degree of description detail, such as camera motion, scene cuts, annotations, and object segmentations.

A second scenario that will benefit greatly from an interoperable content description standard is the exchange of multimedia content among heterogeneous multimedia databases. MPEG-7 aims to provide the means to express, exchange, translate, and reuse existing descriptions of multimedia material.

Currently, TV broadcasters, radio broadcasters, and other content providers manage and store an enormous amount of multimedia material. This material is currently described manually, using textual information and proprietary databases. Without an interoperable content description, content users must invest manpower to translate the descriptions used by each broadcaster manually into their own proprietary scheme. Interchange of multimedia content descriptions would be possible if all content providers adopted the same content description schemes.

Finally, multimedia players and viewers that employ the description schemes must provide users with innovative capabilities, such as multiple views of the data configured by the user. The user should be able to change the display configuration without downloading the data again in a different format from the content broadcaster.

The foregoing examples only hint at the possible uses of richly structured data delivered in a standardized way based on MPEG-7. Unfortunately, no technique currently available is able to satisfy, in a general way, the distributed processing, content exchange, or personalized viewing scenarios. In particular, the prior art fails to provide a technique for capturing content embedded in multimedia information based on either generic characteristics or semantic relationships, or a technique for organizing such content. There is therefore a need for efficient content description schemes for generic multimedia information.

During the MPEG Seoul meeting (March 1999), a Generic Visual Description Scheme (Video Group, "Generic Visual Description Scheme for MPEG-7", ISO/IEC JTC1/SC29/WG11 MPEG99/N2694, Seoul, Korea, March 1999) was generated from some of the proposals of the DS1 (still images), DS3++ (multimedia), DS4 (applications), and especially DS2 (video) groups of the MPEG-7 Evaluation AHG (Lancaster, UK, February 1999) (AHG on MPEG-7 Evaluation Logistics, "Report of the Ad-hoc Group on MPEG-7 Evaluation Logistics", ISO/IEC JTC1/SC29/WG11 MPEG99/N4524, Seoul, Korea, March 1999). The Generic Visual DS has since evolved into the AHG's Generic Audio Visual Description Scheme ("AVDS") (AHG on Description Scheme, "Generic Audio Visual Description Scheme for MPEG-7 (V0.3)", ISO/IEC JTC1/SC29/WG11 MPEG99/M4677, Vancouver, Canada, July 1999). The Generic AVDS describes the visual content of video sequences or images and, in part, the content of audio sequences; it does not address multimedia or archive content.

The basic components of the Generic AVDS are the syntactic structure DS, the semantic structure DS, the syntactic-semantic links DS, and the analysis/synthesis model DS. The syntactic structure DS is composed of region trees, segment trees, and segment/region relation graphs. Similarly, the semantic structure DS is composed of object trees, event trees, and object/event relation graphs. The syntactic-semantic links DS provides bidirectional links between the syntactic elements (regions, segments, and segment/region relations) and the semantic elements (objects, events, and event/object relations). The analysis/synthesis model DS specifies the design/registration/conceptual correspondence between the syntactic and the semantic structures. The semantic and syntactic elements, which are generally referred to as content elements, have associated attributes. For example, a region is described by color/texture, shape, 2-D geometry, motion, and deformation. An object is described by type, object behavior, and semantic annotation DSs.

We have recognized possible shortcomings in the current specification of the Generic AVDS. The Generic AVDS includes content elements and entity-relationship graphs. The content elements have associated features, and the entity-relationship graphs describe general relationships among the content elements. This follows the entity-relationship (ER) modeling technique (P. P.-S. Chen, "The Entity-Relationship Model: Toward a Unified View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1, pp. 9-36, March 1976). The current specification of these elements in the Generic AVDS, however, is too generic to serve as a powerful tool for describing audio-visual content. The Generic AVDS also includes hierarchies and links between the hierarchies, which is typical of physical hierarchical models. The Generic AVDS is consequently a mixture of different conceptual and physical models. Other limitations of this DS are the rigid separation of the semantic and syntactic structures and the lack of explicit and unified definitions of its content elements.

Following the traditional way of describing the content of written documents, the Generic AVDS describes images, video sequences, and, in part, audio sequences by: (1) defining the physical or syntactic structure of the document, i.e., the table of contents; (2) defining the semantic structure, i.e., the index; and (3) defining the locations where semantic notions appear. It comprises: (1) the syntactic structure DS; (2) the semantic structure DS; (3) the syntactic-semantic links DS; (4) the analysis/synthesis model DS; (5) the visualization DS; (6) the meta information DS; and (7) the media information DS.

The syntactic DS is used to specify the physical structure and signal properties of an image or a video sequence, defining the table of contents of the document. It consists of: (1) the segment DS; (2) the region DS; and (3) the segment/region relation graph DS. The segment DS may be used to define segment trees that specify the linear temporal structure of the video program. A segment is a group of consecutive frames in a video sequence with associated features: the time DS, meta information DS, and media information DS. A special type of segment, the shot, includes the editing effect DS, key frame DS, mosaic DS, and camera motion DS. Similarly, the region DS may be used to define region trees. A region is defined as a group of connected pixels in an image of a video sequence with associated features: the geometry DS, color/texture DS, motion DS, deformation DS, media information DS, and meta information DS. The segment/region relation graph DS specifies general relationships among segments and regions, e.g., spatial relationships such as "To The Left Of", temporal relationships such as "Sequential To", and semantic relationships such as "Consist Of".
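The structure of the syntactic DS described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the MPEG-7 syntax: the class names `Segment`, `Region`, and `RelationGraph`, the bounding-box simplification of a region, and the baseball element names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """A run of consecutive frames; children form a segment tree (hypothetical class)."""
    name: str
    start_frame: int
    end_frame: int
    children: List["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        # A child segment must lie inside its parent's frame range.
        if not (self.start_frame <= child.start_frame
                <= child.end_frame <= self.end_frame):
            raise ValueError(f"{child.name} is not contained in {self.name}")
        self.children.append(child)
        return child


@dataclass
class Region:
    """A group of connected pixels, simplified here to a bounding box (x, y, w, h)."""
    name: str
    bbox: Tuple[int, int, int, int]


@dataclass
class RelationGraph:
    """Segment/region relation graph: labelled, directed edges between element names."""
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def relations_of(self, source: str) -> List[Tuple[str, str]]:
        return [(rel, tgt) for src, rel, tgt in self.edges if src == source]


# Hypothetical example data, loosely inspired by the baseball example.
program = Segment("program", 0, 999)
shot1 = program.add(Segment("shot1", 0, 399))
shot2 = program.add(Segment("shot2", 400, 999))
pitcher = Region("pitcher", (10, 20, 40, 80))
batter = Region("batter", (200, 25, 40, 80))

graph = RelationGraph()
graph.relate(pitcher.name, "To The Left Of", batter.name)  # spatial relationship
graph.relate(shot2.name, "Sequential To", shot1.name)      # temporal relationship
graph.relate(program.name, "Consist Of", shot1.name)       # semantic relationship
```

Keeping the trees (containment) separate from the relation graph (arbitrary labelled edges) mirrors the split between the segment DS, the region DS, and the segment/region relation graph DS in the text.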

The semantic DS is used to specify the semantic features of an image or a video sequence in terms of semantic objects and events. It can be viewed as a set of indexes. It consists of: (1) the event DS; (2) the object DS; and (3) the event/object relation graph DS. The event DS may be used to construct event trees that define a semantic index table for the segments in the segment DS. Events include an annotation DS. Similarly, the object DS may be used to construct object trees that define a semantic index table for the objects in the object DS. The event/object relation graph DS specifies general relationships among events and objects.

The syntactic-semantic links DS provides bidirectional links between the syntactic elements (segments, regions, or segment/region relations) and the semantic elements (events, objects, or event/object relations). The analysis/synthesis model DS specifies the design/registration/conceptual correspondence between the syntactic and the semantic structure DSs. The media and meta information DSs contain descriptors of the storage media and of the author-generated information, respectively. The visualization DS contains a set of view DSs that enable efficient visualization of a video program. It includes the following views: multi-resolution space-frequency thumbnail, key frame, highlight, event, and other views. Each of these views is defined independently.

Shortcomings of the Generic AV DS

The Generic AVDS includes content elements (i.e., regions, objects, segments, and events) with associated features. It also includes entity-relationship graphs that describe general relationships among the content elements, following the entity-relationship model. A drawback of the current DS is that the features and relationships of the elements can take a broad range of values, which reduces their usefulness and expressive power. A clear example is the semantic annotation feature of the object element. The value of the semantic annotation can be a generic ("Man"), a specific ("John Doe"), or an abstract ("Happiness") concept.
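The bidirectional syntactic-semantic links DS described in this passage can be sketched as a pair of lookup tables that are always updated together. This is a minimal sketch under assumptions: the class `SyntacticSemanticLink`, its method names, and the identifier strings are hypothetical and do not come from the Generic AVDS specification.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SyntacticSemanticLink:
    """Hypothetical two-way index between syntactic and semantic element ids."""

    def __init__(self) -> None:
        self._to_semantic: DefaultDict[str, Set[str]] = defaultdict(set)
        self._to_syntactic: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, syntactic_id: str, semantic_id: str) -> None:
        # Record the link in both directions so either side can be queried.
        self._to_semantic[syntactic_id].add(semantic_id)
        self._to_syntactic[semantic_id].add(syntactic_id)

    def semantic_for(self, syntactic_id: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self._to_semantic.get(syntactic_id, set()))

    def syntactic_for(self, semantic_id: str) -> Set[str]:
        return set(self._to_syntactic.get(semantic_id, set()))


# Hypothetical identifiers: one segment linked to an event, two regions to one object.
links = SyntacticSemanticLink()
links.link("segment:shot2", "event:batting")
links.link("region:r1", "object:batter")
links.link("region:r2", "object:batter")

print(links.syntactic_for("object:batter"))
print(links.semantic_for("segment:shot2"))
```

The sketch also makes the later criticism concrete: an object's full description is only recoverable by following links out of the object structure into the region structure.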

The initial goal of the development leading to the present invention was to define explicit entity-relationship structures for the Generic AVDS in order to address this drawback. The explicit entity-relationship structures categorize the attributes and the relationships into relevant classes. During this process, and especially while generating concrete examples (see the baseball example shown in FIGS. 6-9), we became aware of other shortcomings of the current Generic AV DS, this time related to the global design of the DS. We discuss them in this section. In this application, we propose complete fundamental entity-relationship models that attempt to address these issues.

First, the full specification of the Generic DS could be represented using an entity-relationship model. As an example, the entity-relationship models provided in FIGS. 7-9 for the baseball example of FIG. 6 include the functionality addressed by most of the components of the Generic AV DS (e.g., the event DS, segment DS, object DS, region DS, syntactic-semantic links DS, segment/region relation graph DS, and event/object relation graph DS), and more. The entity-relationship (ER) model is a popular high-level conceptual data model that is independent of the actual implementation as a hierarchical, relational, or object-oriented model, among others. The current version of the Generic DS appears to be a mix of several conceptual and implementation data models: the entity-relationship model (e.g., the segment/region relation graph), the hierarchical model (e.g., the region DS, object DS, and syntactic-semantic links DS), and the object-oriented model (e.g., the segment DS, visual segment DS, and audio segment DS).

Second, the separation between syntax and semantics in the current Generic DS is too rigid. For the example in FIG. 6, we have separated the descriptions of the batting event and the batting segment (see FIG. 7), as the current Generic AV DS proposes. In this case, however, it would have been more convenient to merge the two elements into a single batting event with both semantic and syntactic features. Many groups working on video indexing have advocated this separation of the syntactic structures (table of contents: segments and shots) from the semantic structures (semantic index: events), but the value of the distinction is less clear when describing images or animated objects in a video sequence. "Real objects" are usually described both by their semantic features (e.g., a semantic class such as person or cat) and by their syntactic features (e.g., color, texture, and motion). The current Generic AVDS splits the definition of "real objects" between the region DS and the object DS, which may lead to inefficient handling of the descriptions.

Finally, the content elements of the Generic DS, especially objects and events, lack explicit and unified definitions. For example, the current Generic DS defines an object as something that has some semantic meaning and contains other objects. Although objects are defined in the object DS, event/object relation graphs can describe general relationships among objects and events. In addition, objects are linked to their corresponding regions in the syntactic DS by the syntactic-semantic links DS. An object therefore has a definition that is distributed across many components of the Generic Visual DS, which is less than clear. The definition of an event is very similar and equally vague.

Entity-relationship models for the Generic AV DS

The entity-relationship (ER) model, first proposed in P. P.-S. Chen, "The Entity-Relationship Model: Toward a Unified View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1, pp. 9-36, March 1976, describes data in terms of entities and their relationships. Both entities and relationships can be described by attributes. The basic components of the entity-relationship model are shown in FIG. 1. The entity, the entity attribute, the relationship, and the relationship attribute correspond very closely to the noun (e.g., a boy and an apple), the adjective (e.g., young), the verb (e.g., eats), and the verb complement (e.g., slowly), which are the essential components for describing general data. "A young boy eats an apple slowly", which could be the description of a video shot, is represented as an entity-relationship model in FIG. 2. This modeling technique has been used to model the content of images and their features for image retrieval.
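The correspondence between ER components and parts of speech can be made concrete with the sentence used in the text, "A young boy eats an apple slowly". The sketch below is illustrative only; the class names `Entity` and `Relationship`, the attribute keys `age` and `manner`, and the `describe` helper are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    """An entity (noun) with entity attributes (adjectives)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship (verb) with relationship attributes (verb complements)."""
    name: str
    source: Entity
    target: Entity
    attributes: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        def phrase(entity: Entity) -> str:
            # Adjectives precede the noun; strip() handles entities without attributes.
            adjectives = " ".join(entity.attributes.values())
            return f"{adjectives} {entity.name}".strip()

        complements = " ".join(self.attributes.values())
        text = f"{phrase(self.source)} {self.name} {phrase(self.target)} {complements}"
        return text.strip()


boy = Entity("boy", {"age": "young"})                    # noun + adjective
apple = Entity("apple")                                  # noun
eats = Relationship("eats", boy, apple, {"manner": "slowly"})  # verb + complement

print(eats.describe())  # young boy eats apple slowly
```

Attaching attributes to the relationship as well as to the entities is what separates this model from a plain labelled graph: "slowly" describes the eating, not the boy or the apple.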

In this section, we propose fundamental entity-relationship models for the current Generic AV DS to address the shortcomings discussed above. The fundamental entity-relationship models index (1) the attributes of the content elements, (2) the relationships among the content elements, and (3) the content elements themselves. These models are depicted in FIG. 5. Our proposal builds on the conceptual framework for indexing visual information presented in A. Jaimes and S.-F. Chang, "A Conceptual Framework for Indexing Visual Information at Multiple Levels", submitted to Internet Imaging 2000.

Summary of the Invention

An object of the present invention is to provide content description schemes for generic multimedia information.

Another object of the present invention is to provide techniques for implementing standardized multimedia content description schemes.

A further object of the present invention is to provide an apparatus that enables users to perform enhanced, content-sensitive general searches for multimedia on the Internet or on regional or local networks.

Another object of the present invention is to provide systems and techniques for capturing content embedded in multimedia information based on generic features or semantic relationships.

Still another object of the present invention is to provide a technique for organizing content embedded in multimedia information based on the distinction between the syntactic and semantic attributes of entities. Syntactic attributes can be categorized into different levels: type/technique, global distribution, local structure, and global composition. Semantic attributes can be categorized into different levels: generic object, generic scene, specific object, specific scene, abstract object, and abstract scene.

Yet another object of the present invention is to classify entity relationships into syntactic and semantic categories. Syntactic relationships can be categorized into spatial, temporal, and audio categories. Semantic relationships can be categorized into lexical and predicative categories. Spatial and temporal relationships can be topological or directional; audio relationships can be global, local, or composition; lexical relationships can be synonymy, antonymy, hyponymy/hypernymy, or meronymy/holonymy; and predicative relationships can be actions (events) or states.

A further object of the present invention is to describe each level, and the entity relationships, in terms of a classification of video and audio signals.

Another object of the present invention is to provide fundamental and explicit entity-relationship models that address these problems by indexing the attributes of content elements, the relationships among content elements, and the content elements themselves.

This work is based on the conceptual framework for indexing visual information presented in A. Jaimes and S.-F. Chang, "A Conceptual Framework for Indexing Visual Information at Multiple Levels", submitted to Internet Imaging 2000, which has been adapted and extended for the Generic AV DS. The work in other references (e.g., S. Paek, A. B. Benitez, S.-F. Chang, C.-S. Li, J. R. Smith, L. D. Bergman, A. Puri, C. Swain, and J. Osterman, "Proposal for MPEG-7 Image Description Scheme", ISO/IEC JTC1/SC29/WG11 MPEG99/P480, Lancaster, U.K., February 1999) is relevant because it separates the description of the content elements from the specification of the relationships among them (using entity-relation graphs and hierarchies, the latter being a special case of entity-relation graphs). By doing so, it clearly specifies an ER model.

We focus on the problem of multi-level descriptions for indexing visual information. We present a novel conceptual framework that unifies concepts from the literature of diverse fields such as cognitive psychology, library science, art, and the more recent content-based retrieval. We distinguish between visual and non-visual information and provide the appropriate structures. The proposed ten-level visual structure provides a systematic way of indexing images based on syntax (e.g., color, texture) and semantics (e.g., objects, events), and includes distinctions between general concepts and visual concepts. We define different types of relations (e.g., syntactic, semantic) at different levels of the visual structure, and we also use a semantic information table to summarize important aspects of an image (e.g., those that appear in the non-visual structure).

Our structure places state-of-the-art content-based retrieval techniques in perspective, relating them to real user needs and to research in other fields. Using structures such as the ones proposed is beneficial not only for understanding users and their interests, but also for characterizing the content-based retrieval problem according to the levels of description used to access visual information.

The present invention proposes to index the attributes of content elements based on the ten-level conceptual structure presented in A. Jaimes and S.-F. Chang, "A Conceptual Framework for Indexing Visual Information at Multiple Levels", submitted to Internet Imaging 2000. As shown in Figure 3, that structure distinguishes attributes based on syntax (e.g., color and texture) and semantics (e.g., semantic annotations). The first four levels of the visual structure refer to syntax, and the remaining six refer to semantics. The syntactic levels are type/technique, global distribution, local structure, and global composition. The semantic levels are generic object, generic scene, specific object, specific scene, abstract object, and abstract scene.

We also propose explicit types of relationships among content elements in the entity-relation graphs of the Generic AV DS. We distinguish between syntactic and semantic relationships, as shown in Figure 4. Syntactic relationships are divided into spatial, temporal, and visual. Spatial and temporal relationships are classified into topological and directional classes. Relationships based on syntactic (visual) attributes can be further indexed as global, local, and composition. Semantic relationships are divided into lexical and predicative. Lexical relationships are classified into synonymy, antonymy, hyponymy/hypernymy, and meronymy/holonymy. Predicative relationships can be further indexed as actions (events) and states.
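The classification of relationships in Figure 4 can be written down as a small lookup structure. This is a minimal sketch, assuming a nested dictionary is an acceptable representation; the name `RELATIONSHIP_TAXONOMY` and the function `classify` are hypothetical and are not defined by the Generic AV DS. The predicative subtypes follow the "action (event) or state" wording used elsewhere in this description.

```python
# Two-level taxonomy: category -> kind -> subkinds.
RELATIONSHIP_TAXONOMY = {
    "syntactic": {
        "spatial": ["topological", "directional"],
        "temporal": ["topological", "directional"],
        "visual": ["global", "local", "composition"],
    },
    "semantic": {
        "lexical": [
            "synonymy",
            "antonymy",
            "hyponymy/hypernymy",
            "meronymy/holonymy",
        ],
        "predicative": ["action (event)", "state"],
    },
}


def classify(kind, subkind):
    """Return 'syntactic' or 'semantic' for a valid (kind, subkind) pair, else None."""
    for category, kinds in RELATIONSHIP_TAXONOMY.items():
        if subkind in kinds.get(kind, []):
            return category
    return None


print(classify("spatial", "topological"))
print(classify("lexical", "synonymy"))
```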

With respect to the types of content elements, we propose to classify them into syntactic and semantic elements. Syntactic elements can be divided into region, animated-region, and segment elements; semantic elements can be indexed into object, animated-object, and event elements. We provide explicit and unified definitions of these elements, which are represented in the proposed fundamental models in terms of their attributes and their relationships with other elements. Inheritance relationships among some of these elements are also specified.

The accompanying drawings, which are incorporated in and constitute part of this disclosure, illustrate preferred embodiments of the invention and serve to explain the principles of the invention.

Brief Description of the Drawings

Figure 1 is a generic entity-relationship (ER) model;
Figure 2 provides an example of an entity-relationship model for the scenario "A young boy eats an apple in 4 minutes";
Figure 3 represents the indexing visual structure as a pyramid;
Figure 4 shows the relationships proposed at different levels of the visual structure;
Figure 5 shows the fundamental model of each proposed type of content element;
Figure 6 illustrates an image of a baseball batting event;
Figure 7 is a conceptual description of the Batting Event for the baseball batting event image shown in Figure 6;
Figure 8 is a conceptual description of the Hit and Throw Events for the Batting Event of Figure 6;
Figure 9 is a conceptual description of the Field Object for the Batting Event of Figure 6;
Figure 10 conceptually represents the analysis of non-visual information;
Figure 11 shows how visual and non-visual information can be used semantically to characterize an image or its parts; and
Figure 12 illustrates relationships at different levels of the audio structure. Elements within the syntactic levels are related according to syntactic relationships. Elements within the semantic levels are related according to syntactic and semantic relationships.

Description of the Preferred Embodiments

We chose the modeling technique used here because entity-relationship models are the most widely used conceptual models. They provide a high degree of abstraction and are independent of hardware and software. There are specific procedures for transforming these models into physical models for implementation, which are hardware and software dependent. Examples of physical models are the hierarchical model, the relational model, and the object-oriented model. An ER conceptual framework in the context of MPEG-7 is discussed in J. R. Smith and C.-S. Li, "An ER Conceptual Modeling Framework for MPEG-7", contribution to ISO/IEC JTC1/SC29/WG11 MPEG99, Vancouver, Canada, July 1999.

As shown in Figure 5, we distinguish between syntax and semantics for attributes (or MPEG-7 descriptors), relationships, and content elements. Syntax refers to the way the content elements are arranged, without considering the meaning of such arrangements. Semantics, on the other hand, deals with the meaning of those elements and of their arrangements. As will be discussed in the remainder of this section, syntactic and semantic attributes can refer to several levels (the syntactic levels are type, global distribution, local structure, and global composition; the semantic levels are generic object/scene, specific object/scene, and abstract object/scene), as shown in Figure 3. Similarly, syntactic and semantic relationships can be further divided into sub-types referring to different levels: syntactic relationships are categorized into spatial, temporal, and visual relationships at the generic and specific levels, and semantic relationships are categorized into lexical and predicative classes (see Figure 4). We provide compact and clear definitions of the syntactic and semantic elements based on their associated types of attributes and their relationships with other elements. An important difference from the Generic AV DS, however, is that our semantic elements include not only semantic attributes but also syntactic attributes. Therefore, an application that would rather not distinguish between syntactic and semantic elements can do so by implementing all elements as semantic elements.
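The statement that semantic elements carry syntactic attributes as well as semantic ones can be sketched with two classes. This is only an illustration: modeling the semantic element as a subclass of the syntactic one is an assumption made for brevity, not the inheritance scheme of Figure 5, and the class names, attribute keys, and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyntacticElement:
    """Region, animated region, or segment: syntactic attributes only."""
    kind: str
    syntactic_attributes: dict = field(default_factory=dict)


@dataclass
class SemanticElement(SyntacticElement):
    """Object, animated object, or event: syntactic and semantic attributes."""
    semantic_attributes: dict = field(default_factory=dict)


# Hypothetical instances loosely modeled on the batting example.
segment = SyntacticElement("segment", {"color_histogram": [0.5, 0.3, 0.2]})
event = SemanticElement(
    "event",
    syntactic_attributes={"duration_s": 4.0},
    semantic_attributes={"generic_scene": "Batting"},
)

print(event)
```

An application that does not want the distinction could, under this sketch, instantiate only `SemanticElement` and leave `semantic_attributes` empty where nothing is known.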

To clarify the explanation of the fundamental entity-relationship models, we will use the example in Figures 6-9. Figure 6 shows a video shot of a baseball game represented as a Batting Event and a Batting Segment (segment and event as defined in the Generic AV DS). Figure 7 includes a possible description of the Batting Event as composed of a Field Object, a Hit Event, a Throw Event, a temporal relationship "Before" between the Throw and Hit Events, and some visual attributes. Figure 8 presents descriptions of the Throw and Hit Events and the relationships among them. The Throw Event is the action that the Pitcher Object executes on the Ball Object toward the Batter Object: "Throws". We provide some semantic attributes for the Pitcher Object. The Hit Event is the action that the Batter Object executes on the same Ball Object: "Hits". Figure 9 shows the decomposition of the Field Object into three different regions, one of which is related to the Pitcher Object by the spatial relationship "On top of". Some visual attributes of one of these regions are provided.

Types of Attributes

We propose a ten-level conceptual structure for indexing the visual content elements (e.g., regions, entire images, and events) in image and video descriptions. This structure is valid only for the information explicitly depicted in the actual image or video sequence (e.g., the price of a painting would not be part of the visual content).

The proposed visual structure contains ten levels: the first four refer to syntax, and the remaining six refer to semantics. An overview of the visual structure is given in Figure 3. The lower the level in the pyramid, the more knowledge is required to perform indexing. The width of each level indicates the amount of knowledge required. The indexing cost of an attribute can be included as a sub-attribute of that attribute. The syntactic levels are type/technique, global distribution, local structure, and global composition. The semantic levels are generic object, generic scene, specific object, specific scene, abstract object, and abstract scene. Although some of these divisions may not be strict, they should be considered because they have a direct impact on understanding what the user is searching for and how he or she tries to find it in a database. They also emphasize the limitations of different indexing techniques (manual and automatic) in terms of the knowledge required.

In Figure 3, the indexing visual structure is represented by a pyramid. The lower the level in the pyramid, the more knowledge and information are required to perform the indexing. The width of each level indicates the amount of knowledge required; for example, more information is needed to name the specific objects in the same scene.

In Figure 5, the syntactic attribute (Syntactic DS) includes an enumerated attribute, level, whose value is its corresponding syntactic level in the visual structure (Figure 3), i.e., type, global distribution, local structure, or global composition, or "not specified". The semantic attribute likewise includes an enumerated attribute, level, whose value is its corresponding semantic level in the structure (Figure 3), i.e., generic object, generic scene, specific object, specific scene, abstract object, or abstract scene, or "not specified". Another possibility for modeling the different types of syntactic and semantic attributes would be to subclass the syntactic and semantic attribute elements, creating type, global distribution, local structure, and global composition syntactic attributes, and generic object, generic scene, specific object, specific scene, abstract object, and abstract scene semantic attributes (some of these types do not apply to all objects, animated objects, and events).
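The enumerated level attribute described above maps naturally onto an enumeration type. The sketch below assumes Python enums and a dataclass; the class names `SyntacticLevel`, `SemanticLevel`, and `SyntacticAttribute` are illustrative and are not identifiers from the Generic AV DS.

```python
from dataclasses import dataclass
from enum import Enum


class SyntacticLevel(Enum):
    TYPE_TECHNIQUE = "type/technique"
    GLOBAL_DISTRIBUTION = "global distribution"
    LOCAL_STRUCTURE = "local structure"
    GLOBAL_COMPOSITION = "global composition"
    NOT_SPECIFIED = "not specified"


class SemanticLevel(Enum):
    GENERIC_OBJECT = "generic object"
    GENERIC_SCENE = "generic scene"
    SPECIFIC_OBJECT = "specific object"
    SPECIFIC_SCENE = "specific scene"
    ABSTRACT_OBJECT = "abstract object"
    ABSTRACT_SCENE = "abstract scene"
    NOT_SPECIFIED = "not specified"


@dataclass
class SyntacticAttribute:
    name: str
    value: object
    # The level defaults to "not specified", as the enumeration allows.
    level: SyntacticLevel = SyntacticLevel.NOT_SPECIFIED


histogram = SyntacticAttribute(
    "color_histogram", [0.5, 0.5], SyntacticLevel.GLOBAL_DISTRIBUTION
)
print(histogram.level.value)
```

The alternative mentioned in the text, subclassing per level, would replace the `level` field with one attribute class per level.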

Each level of the visual structure is explained below. The relationships among levels are discussed afterwards. Based on this visual structure and the relationships among levels, we define the types of content elements in the next section.

Type/Technique

At the most basic level, we are interested in the general visual characteristics of the image or video sequence. Descriptions of the type of image or video sequence, or of the technique used to produce it, are very general but prove to be of great importance in organizing a visual database. Images, for example, may be placed in categories such as painting, black and white, color photograph, and drawing. Related classification schemes at this level have been performed automatically in WebSEEK. The type of the example in Figure 6 is color video sequence.

Global Distribution

The type/technique of the previous level gives general information about the visual characteristics of the image or video sequence, but it gives little information about the visual content. Global distribution aims to classify images or video sequences based on their global content and is measured in terms of low-level perceptual features such as spectral sensitivity (color) and frequency sensitivity (texture). Individual components of the content are not processed at this level (i.e., no "form" is given to these distributions, in the sense that the measures are taken globally). Global distribution features may therefore include global color (e.g., dominant color, average, histogram), global texture (e.g., coarseness, directionality, contrast), global shape (e.g., aspect ratio), global motion (e.g., speed and acceleration), camera motion, global deformation (e.g., growing speed), and temporal/spatial dimensions (e.g., spatial area and temporal dimension). For the Batting Segment in Figure 6, the color histogram and the time duration, which are global distribution attributes, are specified (see Figure 7).
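A color histogram is a typical global distribution attribute because it ignores where in the frame each pixel lies. The following sketch assumes 8-bit RGB pixels supplied as a flat list of tuples; the function name `color_histogram` and the default of two bins per channel are illustrative choices, not values taken from this description.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized joint RGB histogram over all pixels, ignoring pixel position."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    step = 256 // bins_per_channel
    counts = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Clamp so that channel values near 255 fall into the last bin.
        ri = min(r // step, bins_per_channel - 1)
        gi = min(g // step, bins_per_channel - 1)
        bi = min(b // step, bins_per_channel - 1)
        counts[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    total = len(pixels)
    return [count / total for count in counts]


# Four hypothetical pixels: two reddish, one black, one blue.
frame = [(255, 0, 0), (250, 10, 10), (0, 0, 0), (0, 0, 255)]
print(color_histogram(frame))
```

Because the measure is taken over the whole frame, two images with the same colors in different arrangements yield the same histogram, which is what distinguishes this level from local structure.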

Even though some of these measures are difficult for a human observer to quantify, these global low-level features have been used successfully in various content-based retrieval systems to organize the contents of a database for browsing and to perform query by example.

Local Structure

In processing the information of an image or video sequence, we perform different levels of grouping. In contrast to global distribution, which does not provide any information about the individual parts of the image or video sequence, the local structure level is concerned with the extraction and characterization of the components. At the most basic level, those components result from low-level processing and include elements such as dots, lines, tone, color, and texture. As an example, a binary shape mask describes the Batting Segment in Figure 6 (see Figure 7). Other examples of local structure attributes are temporal/spatial position (e.g., start time and centroid), local color (e.g., M×N layout), local motion, local deformation, and local shape/2D geometry (e.g., bounding box).
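Two of the local structure attributes named above, the bounding box and the centroid, can be derived directly from a binary shape mask. The sketch below assumes the mask is a list of rows of 0/1 values; the function names and the (row, column) convention are illustrative assumptions.

```python
def mask_cells(mask):
    """List the (row, col) coordinates of every nonzero cell in the mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def bounding_box(mask):
    """Return (min_row, min_col, max_row, max_col) of the nonzero cells."""
    cells = mask_cells(mask)
    if not cells:
        raise ValueError("mask has no nonzero cells")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def centroid(mask):
    """Return the mean (row, col) position of the nonzero cells."""
    cells = mask_cells(mask)
    if not cells:
        raise ValueError("mask has no nonzero cells")
    n = len(cells)
    return sum(r for r, _ in cells) / n, sum(c for _, c in cells) / n


# Hypothetical 4x4 mask with a 2x2 foreground block in the middle.
shape_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(bounding_box(shape_mask), centroid(shape_mask))
```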

Such elements have also been used in content-based retrieval systems, mainly in query-by-user-sketch interfaces such as VisualSEEk. The concern here is not with objects, but with the basic elements that represent them and with combinations of such elements; a square, for example, is formed by four lines.

Global Composition

At this level, we focus on the specific arrangement or composition of the basic elements given by the local structure. In other words, we analyze the image as a whole, but use only the basic elements described at the previous level (e.g., lines and circles) for the analysis. Global composition refers to the arrangement or spatial layout of elements in the image. Traditional analysis in art describes composition concepts such as balance, symmetry, center of interest (center of attention or focus), leading line, and viewing angle. At this level, however, there is no knowledge of specific objects; only basic elements (e.g., dots, lines, and circles) or groups of basic elements are considered. The 2D geometry of the Sand1 region in Figure 6 is a global composition attribute (see Figure 9).

Generic Objects

Up to the previous level, no world knowledge is required to perform indexing, so automatic techniques can be used to extract relevant information at these levels. Several studies, however, have demonstrated that humans mainly use higher-level attributes to describe, classify, and search for visual material.
See C. Jorgensen, "Image Attributes in Describing Tasks: an Investigation", Information Processing & Management, 34, (2/3), pp. 161-17, 1998, and C. Jorgensen, "Retrieving the Unretrievable: Art, Aesthetics, and Emotion in Image Retrieval Systems", SPIE Conference on Human Vision and Electronic Imaging, IS&T/SPIE99, Vol. 3644, San Jose, CA, Jan. 1999. Objects are of particular importance, but they can also be placed in categories at different levels: an apple can be classified as a Macintosh apple, as an apple, or as a fruit. When referring to generic objects, we are interested in the basic-level categories: the most general level of object description, which can be recognized with everyday knowledge. For the Pitcher Object in Figure 6, a generic object attribute could be the annotation "Man" (see Figure 8).

Generic Scene

Just as an image or video sequence can be indexed according to the individual objects that appear in it, it is possible to index the visual material as a whole based on the set of all the objects it contains and their arrangement. Examples of scene classes include city, landscape, indoor, outdoor, still life, and portrait. The guideline for this level is that only general knowledge is required.
It is not necessary to know a specific street or building name to determine that an image is a city scene, nor is it necessary to know the name of an individual to know that the image is a portrait. For the Batting Event in Figure 6, a generic scene attribute with value "Batting" is specified (see Figure 7).

Specific Objects

In contrast to the previous level, specific objects refer to objects that have been identified and named. Specific knowledge of the objects in the image or video sequence is required, and such knowledge is objective since it relies on known facts. Examples include individual persons (e.g., the semantic annotation "Peter Who, player #3 of the Yankees" in Figure 6) or objects (e.g., the stadium name).

Specific Scene

This level is analogous to generic scene, with the difference that here there is specific knowledge about the scene. While different objects in the visual material may contribute in different ways to determining the specific scene depicted, a single object is sometimes enough. A picture that clearly shows the White House, for example, can be classified as a scene of the White House based on that object alone. For the Batting Event in Figure 7, a specific scene attribute with value "Bat by player #32, Yankees" is specified.

Abstract Objects

At this level, specialized knowledge about what the objects represent is used. This indexing level is the most difficult in the sense that it is completely subjective, and assessments among different users may vary greatly. The importance of this level was shown in experiments in which viewers used abstract attributes to describe images. For example, a woman in a picture may represent anger to one observer and melancholy to another. For the Pitcher Object in Figure 8, an abstract object attribute with value "Speed" is specified.

Abstract Scene

The abstract scene level refers to what the image as a whole represents. It may be very subjective. Users sometimes describe images in abstract terms such as sadness, happiness, power, heaven, and paradise, as they do for objects. For the Batting Event in Figure 7, an abstract scene attribute with value "Good strategy" is specified.

Types of Relationships

In this section, we present explicit types of relationships among the content elements included in the Generic AV DS. As shown in Figure 4, relationships are defined at the different levels of the visual structure presented earlier. To represent relationships among content elements, we consider the division into syntax and semantics in the visual structure. As with the levels of the visual structure, some of the boundaries among the relationship types that we propose are not rigid.

Relationships at the syntactic levels of the visual structure can only occur in 2D space, because no knowledge of objects exists at these levels to determine 3D relationships. At the syntactic levels, there can only be syntactic relationships, i.e., spatial (e.g., "Next to"), temporal (e.g., "Simultaneously"), and visual (e.g., "Darker than") relationships, which are based solely on syntactic knowledge. Spatial and temporal relationships are classified into topological and directional classes. Visual relationships can be further indexed as global, local, and composition.

At the semantic levels of the visual structure, relationships among content elements can occur in 3D. As shown in Figure 4, elements within these levels can be associated not only with semantic relationships but also with syntactic relationships (e.g., "One person is next to another person" and "One person is a friend of another person"). We distinguish between two different types of semantic relationships: lexical relationships, such as synonymy, antonymy, hyponymy/hypernymy, and meronymy/holonymy; and predicative relationships, which refer to actions (events) or states.
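The rule that syntactic-level elements admit only syntactic relationships, while semantic-level elements admit both kinds, can be expressed as a small check. This is a sketch under an explicit assumption: a semantic relationship is accepted only when both endpoints are semantic elements, which is one reading of Figure 4 rather than something the text states. The function and constant names are hypothetical.

```python
SYNTACTIC_KINDS = {"spatial", "temporal", "visual"}
SEMANTIC_KINDS = {"lexical", "predicative"}


def relationship_allowed(source_is_semantic, target_is_semantic, kind):
    """Check whether a relationship kind may link two elements."""
    if kind in SYNTACTIC_KINDS:
        # Syntactic relationships are available at every level.
        return True
    if kind in SEMANTIC_KINDS:
        # Assumption: both endpoints must be semantic elements.
        return source_is_semantic and target_is_semantic
    raise ValueError(f"unknown relationship kind: {kind}")


# Throw Event "Before" Hit Event: temporal relationship between two events.
print(relationship_allowed(True, True, "temporal"))
# A lexical relationship between two plain regions is rejected.
print(relationship_allowed(False, False, "lexical"))
```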

In Figure 4, relationships are proposed at the different levels of the visual structure. Elements within the syntactic levels are related according to one type of relationship: syntactic. Elements within the semantic levels are related according to two types of relationships: syntactic and semantic. We explain the syntactic and semantic relationships more extensively, with examples, in the following sections. Tables 1 and 2 summarize the indexing structures for these relationships, including examples.

Syntactic Relationships

We divide syntactic relationships into three classes: spatial, temporal, and visual. One could argue that spatial and temporal relationships are just special cases of visual relationships. We define spatial and temporal relationships in a particular way: for these relationships, we consider the elements as boundaries in space or time, respectively, with no information about size or duration. Table 1 gives a summary of the proposed types of syntactic relationships, with examples.

Following the work in D. Hernandez, "Qualitative Representation of Spatial Knowledge", Lecture Notes in Artificial Intelligence, 804, Springer-Verlag, Berlin, 1994, we divide spatial relationships into the following classes: (1) topological, i.e., how the boundaries of elements relate; and (2) orientation or directional, i.e., where the elements are placed relative to each other (see Table 1). Examples of topological relationships are "To be near to", "To be within", and "To be adjacent to"; examples of directional relationships are "To be in front of", "To be to the left of", and "To be on top of". Well-known spatial relationship graphs are 2D strings, R2, and attributed relational graphs.
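For 2D regions summarized by bounding boxes, a few of the topological and directional relationships named above can be tested as follows. This is a sketch with several assumptions: boxes are `(x_min, y_min, x_max, y_max)` in image coordinates with y growing downward, "on top of" is read as "entirely above" in the 2D frame, and the `max_gap` threshold for "near" is supplied by the caller. None of these conventions comes from this description.

```python
def is_within(a, b):
    """Topological: box a lies entirely inside box b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def is_near(a, b, max_gap):
    """Topological: the gap between the two boxes is at most max_gap."""
    gap_x = max(b[0] - a[2], a[0] - b[2], 0)
    gap_y = max(b[1] - a[3], a[1] - b[3], 0)
    return (gap_x ** 2 + gap_y ** 2) ** 0.5 <= max_gap


def is_left_of(a, b):
    """Directional: box a ends before box b starts along x."""
    return a[2] <= b[0]


def is_on_top_of(a, b):
    """Directional: box a ends before box b starts along y (y grows downward)."""
    return a[3] <= b[1]


# Hypothetical boxes.
left_box = (0, 0, 2, 2)
right_box = (3, 0, 5, 2)
print(is_left_of(left_box, right_box), is_near(left_box, right_box, max_gap=1))
```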

In a similar fashion, we classify temporal relationships into topological and directional classes (see Table 1). Examples of temporal topological relationships are "To happen in parallel", "To overlap", and "To happen within"; examples of directional temporal relationships are "To happen before" and "To happen after". The parallel and sequential relationships of SMIL (World Wide Web Consortium, SMIL Web Site Video/#SMIL) are examples of temporal topological relationships.
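The split between topological and directional temporal relationships can be illustrated with a small sketch. It is not part of the described method: the `Interval` type, the function names, and the exact boundary conventions (for instance, treating intervals that merely touch as "before" rather than "overlapping") are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical time span of an element, in seconds."""
    start: float
    end: float


def temporal_topological(a, b):
    """Return how the time boundaries of a and b relate, or None if they do not meet."""
    if a.start == b.start and a.end == b.end:
        return "happens in parallel with"
    if b.start <= a.start and a.end <= b.end:
        return "happens within"
    if a.start < b.end and b.start < a.end:
        return "overlaps"
    return None


def temporal_directional(a, b):
    """Return the ordering of a relative to b, or None if neither strictly precedes."""
    if a.end <= b.start:
        return "happens before"
    if b.end <= a.start:
        return "happens after"
    return None
```

With these assumed conventions, an interval covering seconds 0 to 2 "happens before" one covering seconds 2 to 3 and has no topological relationship with it, while an interval from 1 to 2 "happens within" one from 0 to 5.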

Visual relationships relate elements based on their visual attributes or features. These relationships can be indexed into global, local, and composition classes (see Table 1). For example, a visual global relationship could be "To be smoother than" (based on a global texture feature), a visual local relationship could be "To accelerate faster than" (based on a motion feature), and a visual composition relationship could be "To be more symmetric than" (based on a 2D geometry feature). Visual relationships can be used to group video shots or key frames based on any combination of visual features, including color, texture, 2D geometry, time, motion, deformation, and camera motion.

Table 1: Indexing structure for syntactic relationships and examples

In the same way that the elements of the visual structure have different levels (generic, specific, and abstract), these types of syntactic relationships (see Table 1) can be defined at a generic level ("Near") or at a specific level ("0.5 feet from"). For example, operational relationships such as "To be the union of", "To be the intersection of", and "To be the negation of" are topological, specific relationships, either spatial or temporal (see Table 1).

Continuing with the baseball game example, FIG. 7 shows how the Batting Event is defined by its composing elements (i.e. the Batting Segment, the Field Object, the Hit Event, and the Throw Event) and by the relationships among them (i.e. the temporal relationship "Before" from the Hit Event to the Throw Event). The Batting Event and its composing elements are related to each other by the spatio-temporal relationship "Composed of".

Semantic Relationships

Semantic relationships can occur only among content elements at the semantic levels of the ten-level conceptual structure. We divide semantic relationships into lexical semantic relationships and predicative relationships. Table 2 summarizes the semantic relationships and includes examples.

Table 2: Indexing structure for semantic relationships and examples

Lexical semantic relationships correspond to the semantic relationships among nouns used in WordNet. These relationships are synonymy (a pipe is similar to a tube), antonymy (happy is the opposite of sad), hyponymy (a dog is an animal), hypernymy (an animal and a dog), meronymy (a musician is a member of a band), and holonymy (a band is composed of musicians).
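The six lexical relationships come in symmetric or inverse pairs, which a description could exploit to avoid storing both directions. The sketch below is only an illustration: the triple layout `(subject, relationship, object)` and the names `INVERSE` and `with_inverses` are assumptions, and it does not use the WordNet database or any WordNet library.

```python
# Each lexical relationship and the relationship that holds in the opposite direction.
INVERSE = {
    "synonym": "synonym",    # symmetric
    "antonym": "antonym",    # symmetric
    "hyponym": "hypernym",   # a dog is a hyponym of animal
    "hypernym": "hyponym",   # animal is a hypernym of dog
    "meronym": "holonym",    # a musician is a meronym (member) of a band
    "holonym": "meronym",    # a band is a holonym of musician
}


def with_inverses(triples):
    """Return the given (subject, relationship, object) triples plus their inverses."""
    closed = set(triples)
    for subject, relationship, obj in triples:
        closed.add((obj, INVERSE[relationship], subject))
    return closed


# Examples taken from the text above.
relationships = with_inverses({
    ("dog", "hyponym", "animal"),
    ("musician", "meronym", "band"),
    ("happy", "antonym", "sad"),
})
```

Storing one direction and deriving the other keeps the description smaller; whether a real description scheme would do so is a design choice not addressed in the text.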

Predicative semantic relationships refer to actions (events) or states among two or more elements. Examples of action relationships are "To throw" and "To hit". Examples of state relationships are "To belong" and "To own". FIG. 8 includes two action relationships: "Throw" and "Hit". Instead of dividing predicative semantics only into actions and states, we could use the partial relational semantic decomposition used in WordNet. WordNet divides verbs into 15 semantic domains: verbs of bodily care and functions, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social interaction, and weather verbs. Only those domains that are relevant to the description of visual concepts would be used.

As with the ten-level visual structure presented here, we can define semantic relationships at different levels: generic, specific, and abstract. For example, a generic action relationship is "To own stock", a specific action relationship is "To own 80% of the stock", and an abstract semantic relationship is "To control the company".

For the Throw and Hit Events in FIG. 6, FIG. 8 shows the use of semantic relationships to describe the actions of two objects: the Pitcher Object "Throws" the Ball Object to the Batter Object, and the Batter Object "Hits" the Ball Object.

Types of Entities

Up to this point, we have proposed explicit types of attributes and of relationships among content elements. In this section, we propose new types of content elements (the fundamental entities of the ER models) and provide explicit and unified definitions of each content-element type.

We define types of content elements based on (1) the attributes that describe them and (2) the relationships that relate them to other content elements. Previously, we indexed the visual attributes of content elements in a ten-level visual structure. The first four levels of the pyramid correspond to syntax, and the other six levels correspond to semantics. In addition, we divided relationships into two classes: syntactic and semantic. Consequently, we propose two basic types of content elements: syntactic and semantic elements (see FIG. 5). Syntactic elements can have only syntactic attributes and relationships (e.g. a color histogram attribute and the spatial relationship "On top of"). Semantic elements can have not only semantic attributes and relationships but also syntactic attributes and relationships (e.g. an object can be described by a color histogram and by a semantic annotation descriptor). Our approach differs from the current Generic AV DS in that our semantic (or high-level) elements include both syntactic and semantic information, which resolves the rigid separation of the syntactic and semantic structures.

As shown in FIG. 5, we further classify syntactic elements into region, animated region, and segment elements. Similarly, semantic elements are classified into the following types: object, animated object, and event. Regions and objects are spatial entities. Segments and events are temporal entities. Finally, animated regions and animated objects are hybrid spatio-temporal entities. We explain each type in the sections that follow.

Syntactic Entities

A syntactic element is a content element in image or video data that is described only by syntactic attributes, i.e. type, global distribution, local structure, or global composition attributes (see FIG. 5). Syntactic elements can be related to other elements only by visual relationships. We further categorize syntactic elements into region, animated region, and segment elements. These elements are derived from the syntactic element through inheritance relationships.

The region element is a purely spatial entity that refers to an arbitrary, contiguous or non-contiguous section of an image or of a video. A region is defined by a set of syntactic attributes and by a graph of regions that are related by spatial and visual relationships (see FIG. 5). It is important to point out that the composition relationship is of type spatial, topological. Possible attributes of a region are color, texture, and 2D geometry.

The segment element is a purely temporal entity that refers to an arbitrary set of contiguous or non-contiguous frames of a video sequence. A segment is defined by a set of syntactic features and by a graph of segments, animated regions, and regions that are related by temporal and visual relationships (see FIG. 5). The composition relationship is of type temporal, topological. Possible segment attributes are camera motion and the syntactic features. For example, the Batting Segment in FIG. 7 is a segment element described by a temporal duration (global distribution, syntactic) and a shape mask (local structure, syntactic) attribute. This segment has an "Includes" relationship with the Batting Event (spatio-temporal relationship, syntactic).

The animated region element is a hybrid spatio-temporal entity that refers to an arbitrary section of an arbitrary set of frames of a video. An animated region is defined by a set of syntactic features and by a graph of animated regions and regions that are related by composition, spatio-temporal, and visual relationships (see FIG. 5). Animated regions can contain any features of the region and segment elements. An animated region is a segment and a region at the same time. For example, the Pitcher Region in FIG. 8 is an animated region described by an aspect ratio (global distribution, syntactic), a shape mask (local structure, syntactic), and a symmetry (global composition, syntactic) attribute. This animated region is "On top of" the Sand 3 Region (spatio-temporal relationship, syntactic).

Semantic Entities

A semantic element is a content element that is described not only by semantic features but also by syntactic features. Semantic elements are related to other elements by semantic and visual relationships (see FIG. 5). Therefore, we derive semantic elements from syntactic elements using inheritance relationships. We further categorize semantic elements into object, animated object, and event elements. Purely semantic attributes are annotations, which are usually in text format (e.g. 6-W semantic annotations, free-text annotations).

The object element is a semantic and spatial entity; it refers to an arbitrary section of an image or of a frame of a video. An object is defined by a set of syntactic and semantic features and by a graph of objects and regions that are related by spatial (composition is a spatial relationship), visual, and semantic relationships (see FIG. 5). An object is a region. The event element is a semantic and temporal entity; it refers to an arbitrary section of a video sequence. An event is defined by a set of syntactic and semantic features and by a graph of events, segments, animated regions, animated objects, regions, and objects that are related by temporal (composition is a temporal relationship), visual, and semantic relationships. An event is a segment with semantic attributes and relationships. For example, the Batting Event in FIG. 7 is an event element described by a "Batting" (generic scene, semantic), a "Batting by player #32, Yankees" (specific scene, semantic), and a "Good strategy" (abstract scene, semantic) attribute. The syntactic attributes of the Batting Segment can apply to the Batting Event (i.e. we could choose not to distinguish between the Batting Event and the Batting Segment, and assign the syntactic attributes of the Batting Segment to the Batting Event). The Batting Event is composed of the Field Object and of the Throw and Hit Events, which represent the two main actions in the Batting Event (i.e. throwing and hitting the ball). The Throw and Hit Events are related by a "Before" relationship (temporal relationship, syntactic).

Finally, the animated object element is a semantic and spatio-temporal entity; it refers to an arbitrary section of an arbitrary set of frames of a video sequence. An animated object is defined by a set of syntactic and semantic features and by a graph of animated objects, animated regions, regions, and objects that are related by composition, spatio-temporal, visual, and semantic relationships (see FIG. 5). An animated object is an event and an object at the same time. For example, the Pitcher Object in FIG. 8 is an animated object described by a "Man" (generic object, semantic), a "Player #3, Yankees" (specific object, semantic), and a "Speed" (abstract object, semantic) attribute. This animated object is "On top of" the Sand 3 Region shown in FIG. 9 (spatio-temporal relationship, syntactic). The syntactic features of the Pitcher Region may apply to the Pitcher Object. We could separate the syntactic and semantic attributes of this animated object, as specified in the Generic AV DS. In doing so, however, we would lose flexibility and efficiency, because the definition of the "real" object would be spread across different elements.

FIG. 5 provides the fundamental models of each proposed type of content element. Attributes, elements, and relationships are categorized into the following classes: syntactic and semantic. Syntactic and semantic attributes have an associated attribute, level, whose value corresponds to the level of the visual structure to which they refer. Syntactic elements are further divided into region, segment, and animated region classes. Semantic elements are categorized into object, animated object, and event classes.
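The entity types and the rule that only semantic elements may carry semantic attributes can be sketched as a small class hierarchy. This is one possible reading of the model, not the implementation of the invention: the class names, the `add` method, the string labels for the levels, and the numeric attribute value used in the test are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Level labels follow the ten-level visual structure; the spelling is an assumption.
SYNTACTIC_LEVELS = ("type/technique", "global distribution",
                    "local structure", "global composition")
SEMANTIC_LEVELS = ("generic object", "generic scene", "specific object",
                   "specific scene", "abstract object", "abstract scene")


@dataclass(frozen=True)
class Attribute:
    name: str
    value: object
    level: str  # the level of the visual structure the attribute refers to


class SyntacticElement:
    allows_semantic = False  # syntactic elements take syntactic attributes only

    def __init__(self, name):
        self.name = name
        self.attributes = []

    def add(self, attribute):
        if attribute.level in SEMANTIC_LEVELS:
            if not self.allows_semantic:
                raise ValueError(f"{self.name} cannot carry semantic attributes")
        elif attribute.level not in SYNTACTIC_LEVELS:
            raise ValueError(f"unknown level: {attribute.level}")
        self.attributes.append(attribute)
        return self


class Region(SyntacticElement): pass              # spatial
class Segment(SyntacticElement): pass             # temporal
class AnimatedRegion(Region, Segment): pass       # spatio-temporal


class SemanticElement(SyntacticElement):
    allows_semantic = True  # semantic elements take both kinds of attributes


class ObjectElement(SemanticElement, Region): pass            # spatial
class EventElement(SemanticElement, Segment): pass            # temporal
class AnimatedObject(ObjectElement, EventElement): pass       # spatio-temporal
```

Under this reading, the Pitcher Object can hold both its "Man" annotation and an aspect ratio, whereas attaching a semantic annotation to the Sand 3 Region raises an error.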

FIG. 6 depicts an exemplary baseball batting event.

FIG. 7 provides a conceptual description of the Batting Event for the baseball game of FIG. 6, in accordance with the present invention.

FIG. 8 provides a conceptual description of the Hit and Throw Events for the Batting Event of FIG. 6, in accordance with the present invention.

FIG. 9 provides a conceptual description of the Field Object for the Batting Event of FIG. 6, in accordance with the present invention.

Percept vs. Concept

The present invention may also be illustrated in connection with the distinction between percept and concept in analyzing and classifying the features of an image. One difficulty inherent in indexing images is the number of ways in which they can be analyzed. A single image may represent many things, not only because it contains a lot of information, but also because what we see in the image can be mapped to a large number of abstract concepts. Distinguishing those possible abstract descriptions from more concrete descriptions based only on the visual aspects of the image is an important step in indexing.

In the following sections, we distinguish between percept and concept. We then provide definitions for syntax and semantics, and finally discuss the general concept space and the visual concept space. The importance of these definitions in the context of content-based retrieval will become apparent when we define our indexing structures.

Percept vs. Concept

Images are multi-dimensional representations of information, but at the most basic level they simply cause a response to light (tonal light or the absence of light). At the most complex level, however, images represent abstract ideas that largely depend on each individual's knowledge, experience, and even particular mood. We can therefore distinguish between percept and concept.

The percept refers to what our senses perceive, which in the visual system is light. These patterns of light produce the perception of different elements such as texture and color. No interpretation process takes place when we refer to the percept; no knowledge is required.

A concept, on the other hand, refers to an abstract or generic idea generalized from particular instances. As such, it implies the use of background knowledge and an inherent interpretation of what is perceived. Concepts can be very abstract in the sense that they depend on an individual's knowledge and interpretation, which tends to be very subjective.

Syntax and Semantics

In the same way that a percept requires no interpretation, syntax refers to the way visual elements are arranged, without considering the meaning of such arrangements. Semantics, on the other hand, deals with the meaning of those elements and of their arrangements. As shown in the discussion below, syntax can refer to several perceptual levels, from simple global color and texture to local geometric forms such as lines and circles. Semantics can also be treated at different levels.

General vs. Visual Concepts

Here we wish to emphasize that general concepts and visual concepts are different, and that both may vary among individuals.

Using a ball as an example, we see that while one possible general concept describes a ball as a round mass, different people may have different general concepts. A volleyball player may have a different concept of a ball than a baseball player because, as described above, a concept implies background knowledge and interpretation. It is natural for different individuals to have very different interpretations of ideas (or, in this case, of concrete objects). We divide concepts into general concepts and visual concepts. It can be recognized that the attributes used for the general concept and for the visual concept of a ball are different (rules could be used to describe concepts, but we use attributes instead to simplify the explanation).

These definitions are useful because they point out a very important issue in content-based retrieval: different users have different concepts (even of simple objects), and even simple objects can be seen at different conceptual levels. In particular, there is an important distinction between general concepts (i.e. those that help answer the question "what is it?") and visual concepts (i.e. those that help answer the question "what does it look like?"), and this distinction must be considered when designing an image database. We apply these ideas to the construction of our indexing structures. The conceptual category structure can be based on the perceptual structure.

Visual and Non-Visual Content

As noted in the previous section, many levels of information are present in images, and their multi-dimensionality must be taken into account when organizing them in a digital library. The first step in creating a conceptual indexing structure is to distinguish between visual and non-visual content. The visual content of an image corresponds to what is directly perceived when the image is observed (i.e. descriptors stimulated directly by the visual content of the image or video in question: lines, shapes, colors, objects, etc.). Non-visual content corresponds to information that is closely related to the image but is not explicitly given by its appearance. In a painting, for example, the price, the current owner, etc. belong to the non-visual category. We next present an indexing structure for the visual content of the image, followed by the structure for non-visual information.

Visual Content

Each of the levels of analysis that follows is obtained only from the image. The viewer's knowledge always plays a role, but the general rule here is that information not explicitly obtained from the image does not go into this category (e.g. the price of a painting is not part of the visual content). In other words, any descriptor used for visual content is stimulated by the visual content of the image or video in question.

Our visual structure contains ten levels: the first four refer to syntax, and the remaining six refer to semantics. In addition, levels one to four are directly related to percept, and levels five through ten to visual concept. While some of these divisions may not be strict, they should be considered because they directly affect our understanding of what users are searching for and of how they try to find it in a database. They also emphasize the limitations of different indexing techniques (manual and automatic) in terms of the knowledge required. An overview of the structure is given in FIG. 3. Viewing the figure from top to bottom, it is clear that more knowledge and information are required to perform indexing at the lower levels of the pyramid. The width of each level indicates the amount of knowledge required there; for example, more knowledge is needed to name specific objects in a scene. Each level is explained below, followed by a discussion of the relationships between levels.
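The ten levels and the two splits described above (syntax versus semantics, percept versus visual concept) can be captured in a small enumeration. The enumeration below is a sketch: the level names and their order are taken from the sections that follow, while the identifiers and the property names are assumptions.

```python
from enum import IntEnum


class VisualLevel(IntEnum):
    """The ten levels of the visual structure, numbered from the top of the pyramid."""
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    GENERIC_OBJECTS = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECTS = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECTS = 9
    ABSTRACT_SCENE = 10

    @property
    def is_syntactic(self):
        # Levels one to four refer to syntax and relate directly to percept.
        return self <= VisualLevel.GLOBAL_COMPOSITION

    @property
    def is_semantic(self):
        # Levels five to ten refer to semantics and relate to visual concept.
        return not self.is_syntactic


syntactic_levels = [level for level in VisualLevel if level.is_syntactic]
semantic_levels = [level for level in VisualLevel if level.is_semantic]
```

Tagging each descriptor with such a level makes it straightforward to check which part of the pyramid a given indexing technique or annotation actually covers.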

Observing this structure, it is apparent that most of the efforts in content-based retrieval have focused on syntax (i.e. levels one through four). Techniques that perform semantic classification at levels five through ten, however, are highly desirable. The structure we present helps identify the level of the attributes handled by a specific technique or provided by a given description (e.g. MPEG-7 annotations).

Type/Technique

At the most basic level, we are interested in the general visual characteristics of the image or video sequence. Descriptions of the type of image or video sequence, or of the technique used to produce it, are very general but prove to be of great importance. Images, for example, may be placed in categories such as painting, black and white (b&w), color photograph, and drawing. Related classification schemes at this level have been defined conceptually, and have been applied automatically in WebSEEK.

In the case of digital photographs, the two main categories could be color and grayscale, with additional categories or descriptions that affect general visual characteristics. These could include the number of colors, the compression scheme, the resolution, etc. We note that some of these may overlap with the non-visual indexing aspects described herein.

Global Distribution

The type/technique in the previous level gives general information about the visual characteristics of the image or video sequence, but it gives little information about the visual content. Global distribution aims to classify images or video sequences based on their global content, measured in terms of low-level perceptual features such as spectral sensitivity (color) and frequency sensitivity (texture). Individual components of the content are not processed at this level (i.e. no "form" is given to these distributions, in the sense that the measures are taken globally). Global distribution features may therefore include global color (e.g. dominant color, contrast), global shape (e.g. aspect ratio), global motion (e.g. speed, acceleration, and trajectory), and temporal/spatial dimensions (e.g. spatial area and temporal dimension), among others. For example, consider two images that have similar texture and color. In this particular case these attributes are quite useful, but they are much less useful if the user is searching for an object.
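A global color histogram is a common example of a feature measured globally with no "form". The sketch below is an illustration only: the pixel representation (a flat list of 8-bit RGB tuples), the bin count, and the use of histogram intersection as the similarity measure are assumptions and are not specified in the text.

```python
from collections import Counter


def global_color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of quantized RGB colors over the whole image.

    pixels: non-empty iterable of (r, g, b) tuples with 8-bit channel values.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical global color distributions."""
    return sum(min(p, h2.get(color_bin, 0.0)) for color_bin, p in h1.items())


# Two hypothetical images with the same colors in a different spatial layout.
image_a = [(255, 0, 0), (0, 0, 255)]
image_b = [(0, 0, 255), (255, 0, 0)]
similarity = histogram_intersection(global_color_histogram(image_a),
                                    global_color_histogram(image_b))
```

Because the histogram discards pixel positions, the two images above are indistinguishable at this level, which is why such features help when matching overall appearance but not when searching for a particular object.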

Although some of these measures are difficult for a human observer to quantify, these global low-level features have been used successfully in various content-based retrieval systems to perform query by example (QBIC, WebSEEK, Virage) and to organize the contents of a database for browsing.

Local Structure

In contrast to global structure, which provides no information about the individual parts of the image or video sequence, the local structure level is concerned with the extraction and characterization of the components of the image. At the most basic level, those components result from low-level processing and include elements such as the dot, line, tone, color, and texture. In the Visual Literacy literature, some of these are referred to as the "basic elements" of visual communication and are regarded as the basic syntax symbols. Other examples of local structure attributes are temporal/spatial position (e.g. start time and centroid), local color (e.g. MxN layout), local motion, local deformation, and local shape/2D geometry (e.g. bounding box). There are various kinds of images in which attributes of this type are important. In x-rays and microscopic images, for example, there is often a strong concern for local details. Such elements have also been used in content-based retrieval systems, mainly in query-by-user-sketch interfaces. The concern here is not with objects, but with the basic elements that represent them and with combinations of such elements; a square, for example, is formed by four lines. In that sense, we can include here some "basic shapes" such as the circle, ellipse, and polygon. Note that this can be considered a very basic level of "grouping" as performed by humans when perceiving visual information.

Global Composition

At this level, we are interested in the specific arrangement of the basic elements given by the local structure, but the focus is on the global composition. In other words, we analyze the image as a whole, but we use the basic elements described above (line, circle, etc.) for the analysis.

Global composition refers to the arrangement or spatial layout of the elements in the image. Traditional analysis in art describes composition concepts such as balance, symmetry, center of interest (e.g. center of attention or focus), leading line, viewing angle, etc. At this level, however, there is no knowledge of specific objects; only basic elements (i.e. dot, line, etc.) and groups of basic elements are considered. In that sense, the view of an image is simplified to an image that contains only basic syntax symbols: the image is represented by a set of lines, circles, squares, etc.

Generic Objects

Up to the previous level, the emphasis has been on the perceptual aspects of the image. No knowledge of the real world is required to perform indexing at any of the levels above, and automatic techniques rely only on low-level processing. While this is an advantage for automatic indexing and classification, studies have demonstrated that humans mainly use higher-level attributes to describe, classify, and search for images. Objects are of particular interest, but they can also be placed in categories at different levels: an apple can be classified as a Macintosh apple, as an apple, or as a fruit. When referring to generic objects, we are interested in the basic-level categories: the most general level of object description. In the study of art, this level corresponds to pre-Iconography, and in information science it is referred to as the "generic of" level. The common idea underlying these concepts and our definition of generic objects is that only general everyday knowledge is necessary to recognize the objects. A Macintosh apple, for example, would be classified as an apple at this level; that is the most general level of description of that object.

A possible difference between our definition and the definitions previously used in art lies in the fact that we define visual objects as entities that can be seen, which sometimes differs from the traditional definition of an object. Objects like the sky or the ocean would perhaps not be considered objects under the traditional definition, but they correspond to our visual objects (as do traditional objects like a car, a house, etc.).

Generic Scene

Just as an image can be indexed according to the individual objects that appear in it, it is possible to index the image as a whole, based on the set of all the objects it contains and on their arrangement. Examples of scene classes include city, landscape, indoor, outdoor, still life, portrait, etc. Some work on automatic scene classification has been performed, and studies of basic scene categories also exist.

The guideline for this level is that only general knowledge is required. It is not necessary to know the name of a specific street or building to determine that a scene is a city scene, nor to know the name of a person to determine that an image is a portrait.

Specific objects. In contrast to the previous levels, specific objects refers to objects that can be identified and named. Shatford calls this level Specific: specific knowledge of the objects in the image is required, and such knowledge is usually objective because it relies on known facts. Examples include individual persons and objects.

Specific scene. This level is analogous to the generic scene level, the difference being that there is specific knowledge about the scene. Although different objects in the image may contribute in different ways to determining that the image depicts a particular scene, a single object is sometimes enough. A picture that clearly shows the Eiffel Tower, for example, can be classified as a scene of Paris based on that object alone.

Abstract objects. At this level, specialized or interpretive knowledge of what the objects represent is used. In the field this is referred to as Iconology (interpretation), or as the "about" level. It is entirely subjective, and assessments vary greatly among users; in this sense it is the most difficult indexing level. The importance of this level has been shown in experiments in which viewers used abstract attributes to describe images. For example, a woman in a picture may represent anger to one observer and melancholy to another.

Abstract scene. The abstract scene level concerns what the image as a whole represents. This can be very subjective. Users sometimes describe images in affective (e.g., emotion) or abstract (e.g., atmosphere, theme) terms. Other examples at the abstract scene level include sadness, happiness, power, heaven, and paradise.

Relationships between levels. We chose a pyramid representation because it directly reflects several important properties inherent in our structure. Clearly, more knowledge and information are needed to index at the lower levels of the pyramid; this is represented by the width of each level. It is important to point out, however, that this assumption may have exceptions. An average observer may not be able to determine the technique used to produce a painting, whereas an expert in art would be able to determine exactly what was used. Indexing in this particular case requires more knowledge at the type/technique level than at the generic object level (since specialized knowledge of art techniques is needed). In most cases, however, the knowledge required for indexing increases in our structure from top to bottom: more knowledge is needed to recognize a specific scene (e.g., Central Park in New York) than to determine the generic scene level (e.g., park).

Although dependencies exist between the levels, each level can be seen as an independent perspective or dimension when observing an image, and the way each level is treated depends on the nature of the database, the users, and the purpose.

Visual content relationships. In this section we briefly present a representation of the relationships between image elements. This structure accommodates relationships at different levels and is based on the visual structure presented earlier. We note that relationships at some levels are most useful when applied between the entities to which the structure is applied (e.g., scenes from different images can be compared). Elements within each level are related according to two types of relationships: syntactic, and semantic (only for levels 5 through 10). For example, two circles (local structure) can be related spatially (e.g., adjacent to), temporally (e.g., before), and/or visually (e.g., darker than). Elements at the semantic levels (e.g., objects) can have both syntactic and semantic relationships (e.g., two people are next to each other, and they are friends). In addition, each relationship can be described at different levels (generic, specific, and abstract). We note that relationships at levels 1, 6, 8, and 10 are most useful between entities represented by the structure (e.g., between images, between parts of images, between scenes).

The visual structure is divided into syntax/percept and visual concept/semantics. To represent relationships, we observe this division and take the following into account: (1) knowledge of an object includes knowledge of the object's spatial dimensions, that is, of the gradable characteristics of its typical, possible, or actual extension in space; (2) knowledge of space implies the availability of some system of axes that determines the designation of certain dimensions of, and distances between, objects in space. We use this to argue that relationships occurring at the syntactic levels of the visual structure can only occur in 2D space, since no knowledge of objects exists there (i.e., relationships in 3D space cannot be determined). At the local structure level, for example, only the basic elements of visual literacy are considered, so relationships at that level are considered only between such elements (i.e., they do not include 3D information). Relationships between elements at levels 5 through 10, however, can be described in terms of 2D or 3D.

In a similar way, the relationships themselves are divided into syntactic (i.e., related to perception) and semantic (i.e., related to meaning) classes. Syntactic relationships can occur between elements at any level, but semantic relationships occur only between elements at levels 5 through 10. Semantic relationships between different colors in a painting, for example, could be determined (e.g., the combination of colors is warm), but we do not include these at that level of our model.
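The level constraints described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Level`, `is_semantic`, `relation_allowed`, and `spatial_dimensions` are hypothetical and do not come from the described system, and treating a relationship that involves any syntactic-level element as 2D-only is one possible reading of the text.

```python
from enum import IntEnum


class Level(IntEnum):
    """The ten levels of the structure, numbered 1 (top) to 10 (bottom)."""
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    GENERIC_OBJECT = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECT = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECT = 9
    ABSTRACT_SCENE = 10


def is_semantic(level: Level) -> bool:
    # Levels 1-4 are syntactic; levels 5-10 are semantic.
    return level >= Level.GENERIC_OBJECT


def relation_allowed(kind: str, a: Level, b: Level) -> bool:
    """Syntactic relations may hold at any level; semantic ones only at 5-10."""
    if kind == "syntactic":
        return True
    if kind == "semantic":
        return is_semantic(a) and is_semantic(b)
    raise ValueError(f"unknown relation kind: {kind!r}")


def spatial_dimensions(a: Level, b: Level) -> tuple:
    """Assumption: 3D descriptions need object knowledge on both sides."""
    return (2, 3) if is_semantic(a) and is_semantic(b) else (2,)
```

With these definitions, a "friend of" relation between two generic objects is accepted, while a semantic relation involving a local-structure element is rejected.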

We divide spatial relationships into the following classes: (1) topological (i.e., how the boundaries of elements relate) and (2) orientational (i.e., how the elements are placed relative to each other). Topological relationships include near, far, touching, and so on; orientational relationships include diagonal to, in front of, and so on.

Temporal relationships concern the connection between elements with respect to time (in video, for example, these include before, after, and between). Visual relationships concern visual features (e.g., bluer, darker). Semantic relationships are associated with meaning (e.g., owner of, friend of).

Just as the elements of the visual structure have different levels (generic, specific, abstract), relationships can be defined at different levels. Syntactic relationships can be generic (e.g., near) or specific (e.g., a numerical distance measurement). Semantic relationships can be generic, specific, or abstract.

As an example, spatial global distribution can be represented by a distance histogram, local structure by relationships between local components (e.g., the distance between visual elements), and global composition by global relationships between visual elements.

Non-visual information. As explained at the beginning of this section, non-visual information refers to information that is not directly part of the image but is associated with it in some way. Attributes can be divided into biographical and relational attributes. Although non-visual information may consist of sound, text, hyperlinked text, and so on, our purpose here is to present a simple structure that gives general guidelines for indexing, and we focus briefly on text information only. FIG. 10 gives an overview of this structure.

Biographical information. The source of the actual image may be direct (e.g., a photograph of a natural scene) or indirect (e.g., an image of a sculpture, painting, building, or drawing). In either case, biographical information may be associated with the image. This information may repeat itself for several objects in the image (e.g., an image of the ceiling of the Sistine Chapel in Rome may carry information about the painting and about the chapel itself), may exist for the image only, or may not exist at all. In most cases, biographical information relates not to the subject of the image directly but to the image as a whole. Examples include author, date, title, material, and technique.

Associated information. The second class of non-visual information is directly linked to the image in some way. Associated information includes captions, articles, sound recordings, and so on.

As discussed above, in many cases this information helps to perform some of the indexing in the visual structure, because it contains specific information about what is depicted in the image (i.e., the subject). In that case it is usually very useful at the semantic levels, since those levels require more knowledge, which is often not present in the image alone. In some cases, however, the information does not relate directly to the subject of the image but is associated with the image in some other way. A sound recording accompanying a portrait, for example, may include sounds that have nothing to do with the person depicted; they are nevertheless associated with the image and can be indexed if desired.

Physical attributes. Physical attributes simply concern those things that have to do with the image as a physical object. They may include the location of the image, the location of the original source, storage (e.g., size, compression), and so on.

Relationships between indexing structures. We define a semantic information table to gather high-level information about the image (see FIG. 11). The table can be used for individual objects, groups of objects, the entire scene, or parts of the image. In most cases both visual and non-visual information are used to fill in the table: a scene class as simple as indoor/outdoor may not be easy to determine from the visual content alone, the location may not be evident from the image, and so on. Individual objects can be classified and named on the basis of non-visual information, which serves as a mapping between visual objects and conceptual objects.

In FIG. 11, visual and non-visual information can be used to characterize an image, or parts of it, semantically. The way in which these two forms of information contribute to answering the questions in the semantic table may vary with the content. The table helps to answer questions such as: What is the subject (person, object, etc.)? What is the subject doing? Where is the subject? When? Why? The table can be applied to individual objects, groups of objects, the entire scene, or parts of the image.
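One way to picture the semantic information table is as a small record type. The sketch below is illustrative only: the class names, the field names, and the choice of generic/specific/abstract rows are assumptions made for the example, not the layout of FIG. 11.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed row labels: one row per kind of semantic description.
SEMANTIC_ROWS = ("generic", "specific", "abstract")


@dataclass
class SemanticEntry:
    """Answers to the table's questions for one row; None means unknown."""
    subject: Optional[str] = None   # what is the subject (person, object)?
    action: Optional[str] = None    # what is the subject doing?
    where: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None


@dataclass
class SemanticTable:
    """Table for one target: an object, object group, scene, or image part."""
    target: str
    rows: Dict[str, SemanticEntry] = field(default_factory=dict)

    def fill(self, row: str, **answers: str) -> SemanticEntry:
        """Record answers obtained from visual or non-visual information."""
        if row not in SEMANTIC_ROWS:
            raise ValueError(f"unknown row: {row!r}")
        entry = self.rows.setdefault(row, SemanticEntry())
        for name, value in answers.items():
            if not hasattr(entry, name):
                raise KeyError(name)
            setattr(entry, name, value)
        return entry


table = SemanticTable(target="scene")
table.fill("generic", subject="tower", where="city")           # from pixels
table.fill("specific", subject="Eiffel Tower", where="Paris")  # from a caption
```

Unanswered questions simply stay `None`, which matches the observation that some answers (location, time) may not be evident from the image alone.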

The relationship between this structure and the visual structure becomes apparent when the table is applied at each level starting from level 5. We also note that, although the table provides a compact representation of some information about the image, it does not replace the indexing structures presented. The set of structures together provides the most complete description.

With suitable indexing structures in place, we can turn to how the contents of a digital library may be organized. In the next section we analyze the issues that play a key role in organizing and retrieving images.

Features, similarity, and categorization. To build an image digital library successfully, it is important to understand not only the data but also the human issues related to classification. In this section we discuss the issues that are important in this respect and explain how we applied these concepts in building our image indexing testbed. We first discuss categorization, then levels and structure in categorization, and finally some issues concerning attributes and similarity.

Categorization and classification. Categorization can be defined as treating a group of entities as equivalent. A category is any of several fundamental and distinct classes to which entities or concepts belong; entities within a category appear more similar to one another, and entities in different categories appear less similar. Before categorizing, however, it is important to understand the nature of the data being categorized. We now focus on the types of categories that can be used. In the categorization literature, researchers have identified two kinds of categories: (1) sensory perception categories (e.g., texture, color, or the speech sound /e/), and (2) generic knowledge (GK) categories (e.g., natural kinds such as birds, artifacts such as cars, and events such as eating).

In our structure we recognize sensory perception categories such as color and texture. GK categories play a very important role, however, because users are mainly interested in the objects that appear in the image and in what those objects represent. Some theories in cognitive psychology hold that classification into GK categories is done as follows:

Rules: attribute values of the entity are used (e.g., rule: an image in the people category should have a person in it).

Prototypes: a prototype of a category contains the characteristic attributes of the category's exemplars. These are attributes that are highly probable across category members but are neither necessary nor sufficient for category membership. A new image is classified according to how similar it is to the category's prototype (e.g., a prototype for the landscape category could be a simple sketch of a sunset).

Exemplars: an instance is classified according to the category of its most similar exemplar (e.g., instead of having a rule for the people category, we could have a set of example images in that category and use those for classification).
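The three classification strategies listed above (rules, prototypes, exemplars) can be contrasted in a minimal Python sketch. The feature vectors, the use of Euclidean distance as the similarity measure, and all function names are assumptions chosen for illustration; the text does not prescribe any of them.

```python
import math


def classify_by_rule(attributes, rules):
    """Return every category whose rule (a predicate) accepts the attributes."""
    return [category for category, rule in rules.items() if rule(attributes)]


def classify_by_prototype(features, prototypes):
    """Pick the category whose single prototype vector is closest."""
    return min(prototypes, key=lambda c: math.dist(features, prototypes[c]))


def classify_by_exemplar(features, exemplars):
    """Pick the category of the single most similar stored example."""
    _, category = min(exemplars, key=lambda e: math.dist(features, e[0]))
    return category


# Rule: an image in the "people" category should have a person in it.
rules = {"people": lambda a: a.get("contains_person", False)}

# Hypothetical 2-D features (e.g., fraction of sky, fraction of straight edges).
prototypes = {"landscape": (0.2, 0.8), "city": (0.9, 0.1)}
exemplars = [((0.1, 0.9), "landscape"),
             ((0.8, 0.2), "city"),
             ((0.85, 0.3), "city")]

by_rule = classify_by_rule({"contains_person": True}, rules)
by_prototype = classify_by_prototype((0.3, 0.7), prototypes)
by_exemplar = classify_by_exemplar((0.7, 0.3), exemplars)
```

The prototype approach stores one summary per category, while the exemplar approach keeps the examples themselves and so can follow categories whose members are not clustered around a single point.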

This evidence is useful for organizing images in a database, because we can use these techniques to perform classification and to present results to the user. These concepts have been used in developing our image indexing testbed.

Category structure. Category structure is a key factor in a digital library and raises several important issues, which we discuss briefly here. The following should be considered: the relationships between categories (e.g., hierarchical or entity-relationship); the levels of abstraction at which classification is performed (e.g., the studies by Rosch suggest the existence of a basic level and of subordinate/superordinate level categories); horizontal category structure (i.e., how each category should be organized, and whether the degree of membership of elements in each category is fuzzy or binary); and so on.

Beyond considering different levels of analysis when indexing visual information, the way in which similarity is measured is very important. Issues related to similarity measurement include the level considered (e.g., part versus whole), the attributes examined, the type of attributes (e.g., the levels of our structure), and whether the dimensions are separable.

Image indexing testbed. We developed an image indexing testbed that incorporates the concepts presented here and uses different techniques to index images according to the structures described. In particular, for the type/technique level we use discriminant analysis. For the global distribution level we use global color histograms and Tamura texture measures. At the local structure level, we allow sketch queries as in VideoQ, using automatic segmentation together with multi-scale curvature histograms of edge maps and projection histograms. Global composition is obtained by performing automatic segmentation and merging the resulting regions to yield an iconic representation of the image.
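As an illustration of the global distribution level, a global color histogram can be computed as sketched below. This is a minimal pure-Python sketch and not the testbed's implementation: the function names, the uniform RGB quantization, and the use of histogram intersection as the similarity measure are all assumptions made for the example.

```python
def global_color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an iterable of (r, g, b) in 0..255.

    No spatial information is kept, which is what makes the descriptor
    "global": two images with the same colors in different places match.
    """
    bins = bins_per_channel
    hist = [0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
h_red = global_color_histogram(red_image, bins_per_channel=2)
h_blue = global_color_histogram(blue_image, bins_per_channel=2)
```

Two images of a single identical color have an intersection of 1, and images with no colors in common have an intersection of 0.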

Generic objects are detected automatically using the Visual Apprentice. In the Visual Apprentice, a visual object detector is built by defining an object definition hierarchy (i.e., specifying a model of an object and its parts) and providing the system with examples. Multiple classifiers are learned automatically by the system at different levels of the hierarchy (region, perceptual, object-part, and object), and the best classifiers are automatically selected and combined when performing automatic classification. We also use the AMOS system to perform manual labeling of objects and object search.

At the generic scene level we perform city versus landscape and indoor versus outdoor classification. This is done automatically using the OF*IIF technique, in which clustering and classification of image regions is performed in conjunction with textual features (e.g., from the image caption), where available, and specialized object detectors (e.g., face or sky detectors).

Information about specific objects and scenes is obtained from the associated information using a system that extracts the names of people, places, and so on. Labeling at the abstract levels is done manually.

Audio. Another illustrative discussion of the advantages of the present invention is provided by an exemplary description of its use in conjunction with digital signals representing audio content.

We previously presented a ten-level conceptual structure for indexing the visual content elements of images (e.g., regions, entire images, events). The classification in that work applied only to descriptors of visual content (i.e., it was not intended for "metadata"; the name of the person who took a photograph, for example, is not a visual descriptor).

Here we propose classifying audio descriptors (to be included in the audio part of the MPEG-7 standard) according to the ten-level conceptual structure presented earlier. The pyramid structure we propose contains exactly the same levels as the visual structure described above in connection with FIGS. 3 and 4. Each level, however, refers to audio elements rather than visual elements. In the original structure an object corresponds to a visual entity; in the new structure an object corresponds to an audio entity (e.g., a person's voice).

The importance of the distinction between syntax and semantics has been widely recognized by researchers in the field of image and video indexing. Although we are not aware of similar studies for audio content, the results of the studies examined suggest that this distinction is also useful for audio indexing. Studies in information retrieval and cognitive psychology, for example, have shown how individuals use different levels to describe (or index) images and objects. While some of the divisions we propose are not strict, they should be considered because they have a direct impact on how audio content is indexed, processed, and presented to users (e.g., applications or observers).

The structure presented earlier for visual attributes draws on research from different fields related to image indexing, and it also provides a compact and organized classification that can be easily applied to audio. The structure is intuitive and highly functional, and it highlights the needs, requirements, and limitations of different indexing techniques (manual and automatic). The indexing cost (computational, or in terms of human effort) for an audio segment, for example, is generally higher at the lower levels of the pyramid: compare automatically determining the type of content (music versus voice) with recognizing a generic object (e.g., a man's voice) and with recognizing a specific object (e.g., Bill Clinton's voice). This also implies that more information/knowledge is required at the lower levels, and that when one user (e.g., an application) makes a request to another user (e.g., an application), there is a question of clarity as to how much additional information is needed, or what level of "service" a user can expect from, say, a level-5 audio classifier. In addition, this decomposition into attributes and relationships is of great value, since humans often make comparisons based on attributes. The benefits of the proposed structure have been shown in preliminary experiments on visual content, and efforts to conduct core experiments are under way. These experiments, together with the flexibility that allows the structure to be used for audio indexing, suggest the benefits of applying this type of descriptor classification to both audio and visual content.

In this example we describe the classification of audio attributes. We also describe audio relationships.

Classification of descriptors. The proposed audio structure contains ten levels: the first four refer to syntax, and the remaining six refer to semantics. An overview of the audio structure can be drawn from FIG. 3. The width of each level indicates the amount of knowledge/information required. The syntactic levels are type/technique, global distribution, local structure, and global composition. The semantic levels are generic object, generic scene, specific object, specific scene, abstract object, and abstract scene.
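The classification of audio descriptors by level can be sketched as a simple lookup. The sketch below is illustrative only: the names `AUDIO_LEVELS`, `level_number`, `kind`, and `group_descriptors` are hypothetical, and the sample descriptor placements are taken from the level descriptions in this section rather than from any normative list.

```python
# The ten levels in pyramid order; position 0 corresponds to level 1.
AUDIO_LEVELS = (
    "type/technique", "global distribution", "local structure",
    "global composition", "generic object", "generic scene",
    "specific object", "specific scene", "abstract object", "abstract scene",
)


def level_number(level_name):
    """1-based level number; raises ValueError for an unknown level name."""
    return AUDIO_LEVELS.index(level_name) + 1


def kind(level_name):
    """The first four levels are syntactic, the remaining six semantic."""
    return "syntactic" if level_number(level_name) <= 4 else "semantic"


def group_descriptors(descriptors):
    """Group {descriptor: level name} into syntactic and semantic lists,
    each sorted by level number."""
    groups = {"syntactic": [], "semantic": []}
    for descriptor, level_name in descriptors.items():
        groups[kind(level_name)].append((level_number(level_name), descriptor))
    for entries in groups.values():
        entries.sort()
    return groups


groups = group_descriptors({
    "Gaussian noise": "global distribution",
    "man's voice": "generic object",
    "number of channels": "type/technique",
    "Bill Clinton's voice": "specific object",
})
```

Sorting by level number keeps the pyramid order, so descriptors that need less knowledge to extract come first within each group.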

The syntactic levels classify syntactic descriptors, that is, descriptors that describe the content in terms of low-level features. In the visual structure these refer to the colors and textures present in the image. In the audio structure described here they refer to low-level features of the audio signal (whether it is music or speech, etc.). Examples include fundamental frequency and harmonic peaks.

The semantic levels of the visual structure classify attributes that relate to objects and scenes. The semantic levels of the audio structure are analogous, the difference being that classification is based on attributes extracted from the audio signal itself. As in the visual case, in audio it is possible to identify objects (e.g., a man's voice, the sound of a trumpet) and scenes (e.g., street noise, opera).

可视结构的每层是类似的以前已经予以解释。 Each visual structure that has to be explained like before. 接着,我们简单地解释每层,并描述它如何能用于声频描述符的分类。 Next, we simply interpret each, and describe how it can be used to classify the audio descriptor. 我们可交换地使用词:属性和描述符,并对每一层给出直观的例子,作出与可视结构的模仿以帮助阐明此解释。 We use the terms interchangeably: attribute and descriptor, and intuitive example given each layer, and to mimic the visual structure to help clarify this explanation. 对于语义层设想典型的无线电新闻广播是有用的,其中不同的实体可交换地使用:个人,噪音,音乐,和场景(如在现场报导,在记者报导前,后或期间常听到背景噪音或音乐)。 For the semantic layer envisioned a typical radio news broadcasts are useful where different entities are used interchangeably: personal, noise, music, and the scene (as in live coverage before reporters reported often hear background noise or during or after music). 类型/技术声频序列的类型的一般描述。 Type / audio technology in general type described sequence. 例如音乐,噪声,语音或它们的任意组合;立体声,声道数,等。 Such as music, noise, voice, or any combination thereof; stereo, number of channels, and the like. 全局分布描述声频的全局内容的属性,通过低层特征测量。 Global distribution describes global properties of the audio content, measured by low-level features. 在此层的属性是全局的,因为它们不涉及信号的各个别分量而涉及全局的描述。 Properties of this layer are global, because they do not relate to each of the individual components of the signals involving a global description. 例如,一个信号可以描述成高斯噪音,这种描述是全局性的,因为它不考虑任何局部分量(如什么单元或低层特征描述此噪声信号)。 For example, a signal can be described as Gaussian noise, such description is global because it does not consider any local component (such as what units or low-level noise signal characteristics described herein). 局部结构涉及在声频段中各个低层语法部的提取和特征。 Extraction and partial structure relates to low-level features of each portion of the syntax acoustic band. 与以前的层相反,这里的属性意味着描述信号的的局部结构。 In contrast to the previous layer, the properties described herein means a partial structure of the signal. 
在一图象中,局部单元由在该图象中出现的基本语法符号给出(如线,圆等)。 In one image, the local unit is substantially given by the grammar symbol appears in the image (such as lines, circles, etc.). 此层在声频中用作同样的功能,所以任何低层(即不是如单词说话内容中的字母那样语义的)局部描述符应在此层分类。 This layer serves the same function in the audio, so any lower layer (i.e., not to speak a word such as the contents of the letter semantics) local descriptors to be classified in this layer. 全局组成根据基本单元(即局部结构描述符)的特定安排或组成的一个声频片段的全局描述。 The overall composition is described particular arrangement of the global base unit (i.e., a partial structure descriptors) or composed of an acoustic frequency segment. 虽然局部结构着眼于声频的特定局部特征,全局组成着眼于局部单元的结构(如它们是如何安排的)。 Although the partial structure to focus on a specific local features of the audio, the overall composition of the local unit, focusing on structures (e.g., how they are arranged). 例如,一个声频序列可用马尔科夫键表示(建模),或用任何其他使用低层局部特征的结构表示。 For example, an audio sequences can be expressed Markov key (modeling), or represent any other local features using low-level structure. 普通对象直到前一层,为实现索引不需大量的知识,定量特能自动从声频片段提取并分类成所描述的语法层,但是,当前声频片段借助语义(如认识)描述时,对象起了重要的作用。 Common Object until the layer, to achieve an index without a large amount of knowledge, quantitative Laid automatically from the frequency sound fragments extracted and classified into syntax layer described, however, the current audio segment by means of a semantic (e.g. recognized) describing the object from the an important role. 然而,对象能放在不同层次的类别中,一个苹果能分类成,Macintosh苹果,苹果,或水果。 However, the object can be placed at different levels of classes, it can be classified as an apple, Macintosh apples, apple, or fruit. 能基于声频片段识别一个对象,因而我们能作出相似的分类。 It can identify an object based audio clips, so that we can make a similar classification. 例如,我们能说一个声频实体(如语音)对应一个男人,或对应比尔.克林顿。 For example, we can say that an audio entity (such as voice) corresponds to a man, or the corresponding Bill Clinton. 
在讨论普通对象时,我们的兴趣在于基本层类别:这是用日常知识能识别的对象描述的最一般的层。 In discussing ordinary objects, our interest lies in the base layer category: This is the most general level of knowledge can identify with everyday objects described. 这就意味着没有所谈论的对象的特定识别的知识(如爆炸声,雨声,敲击声,男人的语音,女人的语音等)。 This means that the knowledge of specific identification of the object is not talking about (such as explosions, rain, percussion, voice of a man, a woman's voice, etc.). 能在此层分类声频实体描述符。 Descriptor in this layer can be classified audio entity. 一般场景正如声频片段能按照各个对象索引,也可能根据其它包含的所有对象的集以及它们的安排作为整体索引该声频片段。 As a general scenario according to each audio object index fragment can also index as a whole may be set according to other contains all the objects and their arrangement of the audio segment. 声频场景类的例子包括街道噪声,运动场,办公室,人们交谈,音乐会,新闻编辑室等。 Examples include audio scenes like street noise, playground, office, people talking, concerts, and other newsroom. 这层的准则是只需要一般知识。 This layer needs only criterion is general knowledge. 不需要识别特定的声频实体(如是谁的语音),或特定的声频场景(如是哪个音乐会)来获得在此层的描述符。 Need not identify a particular entity audio (voice who case), or specific audio scene (in the case of which a concert) to obtain a descriptor in this layer. 特定对象与以前的层相反,特定对象涉及已识别及已命名的声频实体。 A particular object layer opposite to the previous, relates to the specific object identified audio and named entities. 需要特定的知识,且那样的知识通常是客观的,因为它依赖已知的事实,在此层识别和命名品噪声或声音。 Require specific knowledge, and that knowledge is generally objective, because it relies on known facts, in this layer and name recognition product noise or sound. 例子包括个人的语音(如“比尔.克林顿”)或特征噪声(如,纽约证券交易所的铃声),等。 Examples include personal voice (such as "Bill Clinton") or feature noise (such as the New York Stock Exchange bell), and so on. 特定场景此层类似于普通场景,基差别在于存在有关在声频片段的场景的特定知识。 This particular scene is similar to an ordinary scene layer, wherein there is a certain difference between the base knowledge about the scene in the audio fragments. 
For example, the audio scene of Martin Luther King's "I Have a Dream" speech can be specifically identified and named, as can the 1969 moon landing.

Abstract Objects

At this level, subjective knowledge about what the audio entities represent is used. This indexing level is the most difficult in the sense that it is completely subjective, and assessments vary greatly between users. For images, the importance of this level was shown in experiments in which viewers used abstract attributes, among others, to describe images. Emotive attributes can also be assigned to objects in an audio segment. For example, a sound (as in a movie or in music) may be described as scary, happy, and so on.

Abstract Scene

The abstract scene level refers to what the audio segment as a whole represents. This can be very subjective. For images, for example, it has been shown that users sometimes describe images in affective (e.g., emotion) or abstract (e.g., atmosphere, theme) terms. Similar descriptions can be assigned to audio segments; for example, attributes describing an audio scene may include sadness (e.g., people crying) and happiness (e.g., people laughing).

Relationships

Types of Relationships

In this section, we present the explicit types of relationships between the content elements that we have proposed. These relationships are analogous to those previously presented for visual content. As shown in FIG. 12, relationships are defined at the different levels of the audio structure presented earlier in connection with FIG. 3.
To represent relationships between content elements, we consider a division into syntactic and semantic relationships.

At the syntactic levels, there can be syntactic relationships, i.e., spatial (e.g., "sound A is near sound B"), temporal (e.g., "in parallel"), and audio (e.g., "louder than") relationships, which are based solely on syntactic knowledge. Spatial and temporal relationships are classified into topological and directional classes. Audio relationships can be further indexed into global, local, and composition classes. As shown in FIG. 12, elements at the semantic levels can be associated not only by semantic relationships but also by syntactic relationships (e.g., "the trumpet sounds near the violin", "the trumpet notes complement the violin notes"). We distinguish between two different types of semantic relationships: lexical relationships such as synonymy, antonymy, hyponymy/hypernymy, and meronymy/holonymy; and predicative relationships referring to actions (events) or states.

The relationships presented here mirror those presented for video signals; the only difference between the two cases lies in the attributes used, not in the relationships. For example, from an image it is not possible to say that element A is louder than element B. From an audio segment it is not possible to say (unless it is explicitly stated in the audio content itself) that element A is darker than element B. The type of relationship, however, is the same: one is audio and the other is visual, but both are global generic (see Table 4).

We explain the syntactic and semantic relationships in more detail with examples. Tables 3 and 4 below summarize the indexing structures for the relationships and include examples.

Syntactic Relationships

We divide the syntactic relationships into three classes: spatial, temporal, and audio. One could argue that spatial and temporal relationships are just special cases of audio relationships. We define spatial and temporal relationships in a special way, however, because we treat the elements as boundaries in space and time, respectively, with no "about" information and no duration information. See Table 3 for a summary of the proposed types of syntactic relationships and examples.

We divide the spatial relationships into the following types: (1) topological, i.e., how the boundaries of the elements relate; and (2) orientational or directional, i.e., where the elements are placed relative to each other (see Table 3). Note that these relationships can often be extracted from an audio segment: in a stereo broadcast of a news report, for example, it is often easy to assign syntactic attributes to audio entities. It is possible, for instance, to assess that one sound is near another, or more precisely to assess the syntactic relationships between different sound sources. In this respect, one can determine somewhat detailed topological and directional relationships that may not be explicit in the signal. Examples of topological relationships are "to be near to", "to be within", and "to be adjacent to"; examples of directional relationships are "to be in front of" and "to be to the left of". Note that the difference between these relationships and those obtained from visual information depends on the nature of the relationships themselves: determining some spatial relationships from audio alone may be more difficult, but these relationships play a very important role when creating synthetic audio models.

Similarly, we classify temporal relationships into topological and directional classes (see Table 3). Examples of temporal topological relationships are "to happen in parallel", "to overlap", and "to happen within"; examples of temporal directional relationships are "to happen before" and "to happen after". The parallel and sequential relationships of SMIL are examples of temporal topological relationships.
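
The two temporal classes can be illustrated with a minimal Python sketch. It assumes that each audio entity is reduced to a start and end time in seconds; the `Interval` type, the function names, and the fallback labels "is disjoint from" and "neither before nor after" are hypothetical and are not part of the description scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Time span of an audio entity in seconds (hypothetical representation)."""
    start: float
    end: float


def topological_relation(a: Interval, b: Interval) -> str:
    """Topological class: how the temporal boundaries of a and b relate."""
    if a.start == b.start and a.end == b.end:
        return "happens in parallel with"
    if b.start <= a.start and a.end <= b.end:
        return "happens within"
    if a.start < b.end and b.start < a.end:
        return "overlaps"
    return "is disjoint from"  # fallback label, not taken from the text


def directional_relation(a: Interval, b: Interval) -> str:
    """Directional class: where a lies relative to b on the time axis."""
    if a.end <= b.start:
        return "happens before"
    if a.start >= b.end:
        return "happens after"
    return "neither before nor after"  # fallback label, not taken from the text
```

Only the boundaries enter the comparison, which matches the treatment of elements as boundaries in time; nothing about the content of either entity is consulted.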

Audio relationships relate audio entities based on their audio attributes or features. These relationships can be indexed into global, local, and composition classes (see Table 3). For example, an audio global relationship could be "to be less noisy than" (based on a global noise feature), an audio local relationship could be "is louder than" (based on a local loudness measure), and an audio composition relationship could be based on comparing the structures of Hidden Markov Models.
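
The difference between a global and a local audio relationship can be sketched as follows. The sketch assumes that each audio entity is available as a sequence of samples and uses root-mean-square amplitude as a stand-in loudness feature; the feature choice and all function names are illustrative assumptions, and the composition class (comparison of model structures) is not covered.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as a stand-in loudness feature."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_louder_overall(a: Sequence[float], b: Sequence[float]) -> bool:
    """Global relationship: one feature value computed over each whole entity."""
    return rms(a) > rms(b)


def is_louder_locally(a: Sequence[float], b: Sequence[float],
                      start: int, length: int) -> bool:
    """Local relationship: the same feature compared over one window only."""
    return rms(a[start:start + length]) > rms(b[start:start + length])
```

The same pair of entities can satisfy the global relationship while failing the local one in a particular window, which is why the two classes are indexed separately.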

In much the same way that the elements of the audio structure have different levels (generic, specific, and abstract), these types of syntactic relationships can be defined at a generic level ("near") or a specific level ("10 meters from") (see Table 3). For example, operational relationships such as "to be the union of", "to be the intersection of", and "to be the negation of" are topological, specific relationships, either spatial or temporal (see Table 3).

Semantic Relationships

Semantic relationships can occur only between content elements at the semantic levels of the ten-level conceptual structure. We divide semantic relationships into lexical and predicative relationships. Table 4 summarizes the semantic relationships and includes examples. Note that because semantic relationships are based on an understanding of the content, we can classify the relationships obtained from audio content in the same way as those obtained from visual content. The semantic relationships here are therefore identical to those described in connection with visual signals; only the way the semantic content is acquired differs (i.e., understanding audio as opposed to understanding an image or video). Although the original examples could be applied, we use audio-related examples to explain more clearly. For example, as a generic synonymy example, "that apple is like that orange": the apple and the orange can be "recognized" from the audio if the speaker talks about them.

Lexical semantic relationships correspond to the semantic relationships among nouns used in WordNet. These relationships are synonymy (a violin is similar to a viola), antonymy (a flute is the opposite of a drum), hyponymy (a guitar is a string instrument), hypernymy (a string instrument is a generalization of a guitar), meronymy (a musician is a member of a band), and holonymy (a band is composed of musicians).
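
The six lexical relationships come in mutually inverse pairs, which a short Python sketch can make explicit. The enumeration, the inverse table, and the `invert` helper are hypothetical names introduced for illustration; the sketch does not use the WordNet database or any WordNet API.

```python
from enum import Enum


class LexicalRelation(Enum):
    SYNONYMY = "is similar to"
    ANTONYMY = "is opposite to"
    HYPONYMY = "is a"
    HYPERNYMY = "is a generalization of"
    MERONYMY = "is a member of"
    HOLONYMY = "is composed of"


# Symmetric relations map to themselves; the others swap with their counterpart.
INVERSE = {
    LexicalRelation.SYNONYMY: LexicalRelation.SYNONYMY,
    LexicalRelation.ANTONYMY: LexicalRelation.ANTONYMY,
    LexicalRelation.HYPONYMY: LexicalRelation.HYPERNYMY,
    LexicalRelation.HYPERNYMY: LexicalRelation.HYPONYMY,
    LexicalRelation.MERONYMY: LexicalRelation.HOLONYMY,
    LexicalRelation.HOLONYMY: LexicalRelation.MERONYMY,
}


def invert(source: str, relation: LexicalRelation, target: str):
    """Return the same fact stated from the target's point of view."""
    return target, INVERSE[relation], source
```

With this table, an index that stores "guitar is a string instrument" can also answer queries phrased from the hypernym side without storing a second record.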

Predicative semantic relationships refer to actions (events) or states among two or more elements. Examples of action relationships are "to yell at" and "to hit" (e.g., to hit a ball). Examples of state relationships are "to belong to" and "to own". Instead of only dividing predicative semantics into actions and states, we could use the partial relational semantic decomposition used in WordNet. WordNet divides verbs into 15 semantic domains: verbs of bodily care and functions, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social interaction, and weather verbs. Only those domains that are relevant to the description of visual concepts could be used.

As with the ten-level audio structure presented here, semantic relationships can be defined at different levels: generic, specific, and abstract. For example, a generic action relationship is "to own stock", a specific action relationship is "to own 80% of the stock", and an abstract semantic relationship is "to control the company".

Table 3: Indexing structure for syntactic relationships and examples

Table 4: Indexing structure for semantic relationships and examples

The present invention includes not only methods for classifying digital signals (e.g., multimedia signals) at multiple levels for the purposes of indexing and/or classification, but also computer-implemented systems. The methods described above have been set out in terms of general principles because they can be used in any system that processes digital signals of the type discussed here, such as any system known in the art (or developed in the future) that is compatible with the processing of digital multimedia signals or files under the MPEG-7 standard.

Because the purpose of standards for digital signals is generally understood to be promoting cross-platform compatibility for the transmission, archiving, and output of such signals, it is neither necessary nor desirable to give system-specific specifications for the systems that could be built to implement the invention. Rather, those of ordinary skill in the art will recognize how to implement the generic techniques presented here using whatever hardware and software techniques known in the art are desired.

To give a broad example, one can consider an embodiment of a system implementing the invention in combination with any compatible device for processing, displaying, archiving, or transmitting digital signals (including, but not limited to, video, audio, still images, and other digital signals carrying content perceptible to humans). Such a system could be a personal computer workstation including a Pentium processor, memory (e.g., hard disk drive and random access memory capacity), a video display, and suitable multimedia accessories.

Summary

The present invention proposes fundamental entity-relationship models for the current Generic AV DS in order to address shortcomings related to its overall design. The fundamental entity-relationship models index (1) the attributes of the content elements, (2) the relationships among content elements, and (3) the content elements themselves. We chose this modeling technique because entity-relationship models are the most widely used conceptual models. They provide a high degree of abstraction and are independent of hardware and software.

We make a distinction between syntax and semantics for attributes (or MPEG-7 descriptors), relationships, and content elements. Syntax refers to the way the content elements are arranged without considering the meaning of such arrangements. Semantics, on the other hand, deals with the meaning of those elements and of their arrangements. Syntactic and semantic attributes can refer to several levels. Similarly, syntactic and semantic relationships can be further divided into sub-types referring to different levels. We provide compact and clear definitions of the syntactic and semantic elements based on their attributes and on the types of their relationships with other elements. An important difference from the Generic AV DS, however, is that our semantic elements include not only semantic attributes but also syntactic attributes. Therefore, an application that prefers not to distinguish between syntactic and semantic elements can do so by using only semantic elements.
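
The three indexed items (attributes, relationships, and content elements) can be sketched as a small Python data model. All class and field names are hypothetical. The two checks encode one reading of the text: that syntactic elements carry only syntactic attributes while semantic elements may carry both kinds, and that semantic relationships occur only between semantic elements. This is a sketch under those assumptions, not the description scheme itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    SYNTACTIC = "syntactic"
    SEMANTIC = "semantic"


@dataclass
class Attribute:
    name: str
    value: object
    kind: Kind


@dataclass
class ContentElement:
    label: str
    kind: Kind
    attributes: list = field(default_factory=list)

    def add(self, attribute: Attribute) -> None:
        # Assumed rule: a syntactic element rejects semantic attributes,
        # whereas a semantic element accepts both kinds.
        if self.kind is Kind.SYNTACTIC and attribute.kind is Kind.SEMANTIC:
            raise ValueError("syntactic elements take only syntactic attributes")
        self.attributes.append(attribute)


@dataclass
class Relationship:
    source: ContentElement
    target: ContentElement
    name: str
    kind: Kind

    def __post_init__(self) -> None:
        # Semantic relationships are allowed only between semantic elements.
        if self.kind is Kind.SEMANTIC and Kind.SYNTACTIC in (
                self.source.kind, self.target.kind):
            raise ValueError("semantic relationships need semantic elements")
```

An application that does not want to separate the two kinds of element could, under this reading, create every element with `Kind.SEMANTIC` and still attach syntactic attributes such as loudness to it.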

The above examples and illustrative embodiments of the invention are set forth for purposes of explanation. Those of ordinary skill in the art will recognize that these taught examples do not define the limits of the spirit and scope of the invention, which is limited only by the appended claims.

Claims (18)

1. A method for indexing a plurality of digital information signals, comprising the steps of: (a) for each signal, (i) defining a plurality of indexing levels for the content of the signal; (ii) selecting at least one of said indexing levels; and (iii) extracting features from the signal pertaining to each said selected indexing level; (b) for each signal, classifying relationships (among the signals) between said extracted features at the same selected indexing level; and (c) for the signals, organizing said extracted features and relationships into a higher-level description structure.
2. The method of claim 1, wherein said indexing levels include syntax-related levels and semantics-related levels.
3. The method of claim 2, wherein the syntax-related levels include at least one level selected from the group of levels pertaining to: (i) type/technique; (ii) global distribution; (iii) local structure; and (iv) global composition.
4. The method of claim 2, wherein said semantics-related levels include at least one level selected from the group of levels pertaining to: (i) generic object; (ii) generic scene; (iii) specific object; (iv) specific scene; (v) abstract object; and (vi) abstract scene.
5. The method of claim 1, wherein said relationships include semantic relationships.
6. The method of claim 5, wherein said semantic relationships include at least one relationship selected from the group consisting of: (a) lexical relationships; and (b) predicative relationships.
7. The method of claim 1, wherein said relationships include syntactic relationships.
8. The method of claim 7, wherein said syntactic relationships include a relationship selected from the group consisting of: (a) spatial; (b) temporal; and (c) visual relationships.
9. The method of claim 1, wherein said digital information signals include multimedia data files.
10. The method of claim 9, wherein said method is applied to organizing said data files in a digital library.
11. The method of claim 9, wherein said data files include video files.
12. The method of claim 9, wherein said data files include audio files.
13. The method of claim 1, wherein at least one of said digital information signals comprises a segmented portion of a multimedia data file.
14. The method of claim 13, wherein the segmented portion of said data file corresponds to a human-perceptible sub-portion of the multimedia data file as presented to a user's senses.
15. The method of claim 14, wherein said human-perceptible sub-portion comprises an image of a particular person or object in a video image file.
16. A system for indexing a plurality of digital information signals, comprising: (a) at least one multimedia information input interface for receiving the signals; and (b) a computer processor, coupled to said at least one multimedia information input interface, for (for each signal): (i) defining a plurality of indexing levels for the content of the signals; (ii) selecting at least one of said indexing levels; and (iii) extracting features from the signal pertaining to each said selected indexing level; and for classifying, for each of the signals, relationships (among the signals) between said extracted features at the same selected indexing level; and for organizing, for the signals, said extracted features and relationships into a higher-level description structure.
17. The system of claim 16, further comprising: (c) a data storage system, operatively coupled to said processor, for storing information related to the index.
18. A method for classifying a plurality of digital information signals, comprising the steps of: (a) for each of the signals: (i) defining a plurality of classification levels for the content of the signals, said classification levels including classification levels pertaining to concepts and percepts; (ii) selecting at least one of said classification levels; and (iii) extracting features from the signal pertaining to each said selected classification level; (b) for each of the signals, classifying relationships (among the signals) between said extracted features at the same selected classification level; and (c) for the signals, organizing said extracted features and relationships into a higher-level description structure.
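
For orientation only, steps (a) to (c) of claim 1 can be sketched in Python. The level names follow claims 3 and 4; the function `index_signals`, its parameters, and the shape of the returned dictionary are hypothetical choices, and the feature extractors and the relationship classifier are supplied by the caller rather than prescribed here.

```python
from enum import Enum
from itertools import combinations


class IndexLevel(Enum):
    # Syntax-related levels (claim 3)
    TYPE_TECHNIQUE = "type/technique"
    GLOBAL_DISTRIBUTION = "global distribution"
    LOCAL_STRUCTURE = "local structure"
    GLOBAL_COMPOSITION = "global composition"
    # Semantics-related levels (claim 4)
    GENERIC_OBJECT = "generic object"
    GENERIC_SCENE = "generic scene"
    SPECIFIC_OBJECT = "specific object"
    SPECIFIC_SCENE = "specific scene"
    ABSTRACT_OBJECT = "abstract object"
    ABSTRACT_SCENE = "abstract scene"


def index_signals(signals, selected_levels, extractors, classify):
    """Sketch of steps (a)-(c).

    signals:         dict mapping a signal name to its raw data
    selected_levels: iterable of IndexLevel values chosen in step (a)(ii)
    extractors:      dict mapping IndexLevel to a function(signal) -> feature
    classify:        function(level, feature_a, feature_b) -> relationship label
    """
    levels = list(selected_levels)
    # Step (a)(iii): extract one feature per signal and selected level.
    features = {
        name: {level: extractors[level](signal) for level in levels}
        for name, signal in signals.items()
    }
    # Step (b): classify relationships between features at the same level.
    relationships = []
    for a, b in combinations(signals, 2):
        for level in levels:
            label = classify(level, features[a][level], features[b][level])
            relationships.append((a, label, b, level))
    # Step (c): organize features and relationships into one description.
    return {"features": features, "relationships": relationships}
```

Relationships are classified only between features taken at the same selected level, mirroring the wording of step (b); how features at different levels might be related is left open by this sketch.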
CNB008124620A 1999-07-03 2000-06-30 Method and apparatus for indexing digital information signal for media contents management system CN1312615C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14232599P 1999-07-03 1999-07-03

Publications (2)

Publication Number Publication Date
CN1372669A CN1372669A (en) 2002-10-02
CN1312615C CN1312615C (en) 2007-04-25



Family Applications (1)

Application Number Title Priority Date Filing Date
CNB008124620A CN1312615C (en) 1999-07-03 2000-06-30 Method and apparatus for indexing digital information signal for media contents management system

Country Status (6)

Country Link
EP (1) EP1194870A4 (en)
JP (1) JP4643099B2 (en)
CN (1) CN1312615C (en)
AU (1) AU6065400A (en)
MX (1) MXPA02000040A (en)
WO (1) WO2001003008A1 (en)

Also Published As

Publication number Publication date
WO2001003008A1 (en) 2001-01-11
EP1194870A1 (en) 2002-04-10
AU6065400A (en) 2001-01-22
EP1194870A4 (en) 2008-03-26
MXPA02000040A (en) 2003-07-21
CN1312615C (en) 2007-04-25
JP2003507808A (en) 2003-02-25
JP4643099B2 (en) 2011-03-02

Legal Events

Date Code Title Description
C10 Entry into substantive examination
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
EXPY Termination of patent right or utility model