KR20000033410A

KR20000033410A - Image data retrieval method by partial result matrix and flexible attribute tree

Info

Publication number: KR20000033410A
Application number: KR1019980050255A
Authority: KR
Inventors: 이원석
Original assignee: 이원석
Priority date: 1998-11-23
Filing date: 1998-11-23
Publication date: 2000-06-15
Also published as: KR100322300B1

Abstract

PURPOSE: An image data retrieval method is provided that groups interrelated attributes into fixed and flexible attributes, structures a user's content-based queries as a set of various attributes, simplifies browsing of the structured flexible attribute trees by means of a dictionary browser, and lets the contents of a video file be browsed in an easily understandable way. CONSTITUTION: The image data retrieval method comprises the steps of expressing dynamic information on the contents of a video file as flexible attributes, defining a flexible attribute tree for a flexible attribute A, structuring the flexible attribute tree TA as a set consisting of the flexible attribute A and a plurality of child nodes (n1, n2, n3, ...), and letting each element of the set have a unique value vi (1≤i≤p). The flexible attribute tree has the following characteristics: 1) The node A is the root of the tree. 2) The node ni has a more detailed value vi than its parent node. 3) If the node ni is an internal node, the domain of the node ni is vi. 4) If the node ni is a leaf node, the node ni can have a set W of detailed values defined by a user, and the domain of the node ni is the union of {vi} and the user-defined value set W. 5) A new node can be inserted into the tree, and a node can be deleted.

Description

Image data retrieval method using a flexible attribute tree and a partial result matrix

The present invention relates to a method of retrieving image data, and more particularly, to an image data retrieval method based on flexible attribute trees and a partial result matrix, in which video is retrieved by classifying its attributes into fixed attributes and flexible attributes.

Advances in computing, communication, and data compression technologies have enabled a variety of services based on video data. Compared with other data types, video data is difficult to manage because, even when compressed, it is very large and must be accessed continuously in real time. In addition, a continuous video stream carries many kinds of information with no clear boundaries between them. Nevertheless, because video data can readily represent complex information, its use is expected to grow. To support effective content-based retrieval, a unit of description is defined that can adequately describe the information in any part of a video file.

In addition, the unit of description does not physically reorganize the parts of a video file; instead, it allows parts that are physically separate to be managed logically as one continuous bitstream.

There are two types of information about video data. One is information about the content that appears in a part of a video file; the other is general information about the video file itself or about the objects that appear in it. The former expresses the position of the objects appearing in the video file and the form and meaning of their movement. The latter expresses the physical properties of the video file and the general characteristics of the objects. To distinguish the two, the former is called dynamic information and the latter static information. Most research on content-based retrieval in video databases has focused on representing dynamic information effectively, but because a query can generally be expressed as a combination of static and dynamic information, a model must also be able to express the organic relationship between the two.

The efficiency of content-based retrieval in a video database system depends not only on the structure of the unit of description but also on how dynamic information is represented, and the form of the dynamic information to be represented can vary with the needs of the target application. Fixing the form of dynamic information required by a particular field may therefore be more efficient. However, because the modeling elements change from one application to another, this approach is of limited use as a general video database model. Most studies therefore use a keyword method, which can freely model various forms without the notion of an attribute used in conventional database management systems. This method can be implemented with a simple structure for storing keywords, but it has the disadvantage that query results are highly sensitive to the choice of words used during annotation and the words used to express the query.

Methods that support the notion of an attribute can be divided into those that represent the set of attributes describing the meaning of a scene with a fixed database schema and those that construct it dynamically without a schema. Representing dynamic information with a fixed schema, however, has the following disadvantages.

First, because an object appearing in a single scene can move in countless ways, fixing the forms of movement in advance cannot be a general solution. Second, because the kinds of scenes and the number of objects stored in a video database can be enormous, and the importance of objects and scenes can change with the user's perspective, fixing the kinds of objects or scenes to be modeled is not a good approach. Finally, descriptions of the same dynamic information may differ depending on the user's point of view. For these reasons, unless a video database is designed for a specific purpose, it is nearly impossible to define a fixed set of modeling elements.

Despite this requirement of video database systems, no database management system manages attributes without a schema. Most studies therefore provide their own video data model, each with a different query language and index structure. To overcome this difficulty, the present invention efficiently manages sets of flexibly defined attributes and attribute values within an existing database model. Furthermore, unlike in a conventional database system, a user of a video database must play back the resulting video data in order to make use of a query result. When the number of query results is large, however, it is impossible to play every video one by one.

In general, a query for content-based retrieval tends to extract data that satisfies similar conditions rather than data that matches exactly. Presenting the user with the expected number of results for each condition of a query before the query is processed can therefore effectively support the user's query formulation. The present invention proposes a result browser that provides this function so that the user can restructure the query conditions.

Various methods for supporting content-based retrieval in video databases have been studied. In one approach, video files and symbol objects are organized into a video class hierarchy and a symbol class hierarchy, respectively, for content-based retrieval. Video information described on the basis of these hierarchies is maintained in a video_symbol_object table, and queries are expressed in a query language called CVQL.

The spatial or temporal relationships among symbol objects are defined by predefined spatio-temporal functions. In another approach, domain-specific information is expressed using a class hierarchy called topical categories. Keywords are used to describe the video information and are stored in a domain-specific schema, whereas queries are expressed in free-text form. Because both approaches rely on information specific to the application domain for annotation and indexing, they are suited only to restricted applications. Moreover, neither considers how to browse the tree structure when the semantic dictionary tree of semantic words grows large.

For content-based retrieval of television news, another approach describes news information using text broadcasting (teletext). In this method a query is expressed with keywords, and the results are displayed in order of a matching score that indicates how well the query matches the broadcast text. Expressing queries with keywords or free text has the disadvantage that low-level information such as the position, size, and color of an object is difficult to express.

Another approach uses an existing database management system to store the structure of the video data and annotates the content of the video with keywords. Predefined functions are used to express the temporal relationships among video data and to share video data.

Feature information of a video stream can be extracted automatically using color histograms, hue, and average brightness, and this information can be used to find the key frames needed for retrieving video data. To structure video data, the video information for a given scene can also be converted into a vector expression composed of a fixed set of modeling elements; this vector expression can be used to find similarities between neighboring frames and segments in order to detect and extract scene boundaries.

OVID and the Algebraic Video Model describe the dynamic information of video data as flexibly defined attribute and attribute-value pairs. The unit of description in OVID is the video object, and all attribute values used to describe content are maintained in a generalization hierarchy. Consequently, the tree grows large when many attribute values are defined. OVID retrieves video objects using its own query language, Video SQL. The unit of description in the Algebraic Video Model is a video expression built from video algebra operators. This structure has the advantage that it can easily be shared by other video expressions, but when a video expression contains multiple algebraic operations, determining the actual video data that corresponds to the expression becomes complicated. In addition, although a query is expressed as a logical combination of attribute and attribute-value pairs, there is no semantic dictionary structure for browsing, so the user must search the lists of attributes and attribute values sequentially in order to express a query.

Static information about video data can be represented most effectively in an existing database management system (DBMS), yet most studies provide their own query language for expressing content-based queries. Because these query languages are not compatible with SQL, it is difficult to express a video query as a combination of dynamic and static information. Such approaches not only burden users with learning a new query language but can also cause compatibility problems when the video database must be integrated with an existing database.

Various temporal relationships can exist between two units of description. Among them, the interval inclusion relationship arises when one interval is contained in another, and the OVID system uses this relationship to inherit dynamic descriptions. Attributes are classified by the user as inheritable or non-inheritable, and inheritable attribute and attribute-value pairs are inherited only when a merge or overlap operation is performed. The Algebraic Video Model, by contrast, can dynamically inherit the described meaning between video expressions related by inclusion, but it does not support selective inheritance as OVID does.

Another approach uses a hierarchical temporal language for content-based retrieval of video data, in which the spatio-temporal description of video data can be expressed as logical formulas. A probabilistic retrieval model can also be used to retrieve multimedia objects. Unlike other methods, which return results according to an exact match between the query and the content of a video clip, this method expresses relevance to the query as a probability. Because the modeling elements for video data can be defined differently depending on the application of the video database, the range of queries that can be expressed is limited to the modeled elements.

The present invention is intended to solve the conventional problems described above.

An object of the present invention is to provide a method for efficiently managing sets of flexibly defined attributes and attribute values within an existing database model.

Another object of the present invention is to provide a result browser that lets a user restructure query conditions.

Still another object of the present invention is to provide a method for presenting the user with the expected number of results for each condition of a query before the query is processed.

Still another object of the present invention is to provide a method for expressing the organic relationship between static information and dynamic information.

Still another object of the present invention is to provide a method by which parts of video files that are physically separate can be managed logically as one continuous bitstream.

To achieve the above objects, the present invention defines a flexible attribute tree (FAT) for a flexible attribute A in order to manage flexible attributes that can be interpreted at various levels of abstraction, takes into account that the same concept can be modeled as two different flexible attribute trees or attribute values, and proposes a schema structure for representing the various types of information that are modeled in a video database. In addition, to support effective browsing, it provides a dictionary browser that simplifies the user's query formulation and semantic description work, and a result browser with which the user can analyze the query results for various combinations of query conditions. The result browser helps the user restructure the query conditions.

FIG. 1 shows the attribute types that can describe the content of video data.

FIG. 2 shows the flexible attribute tree of the semantic attribute "action" and the flexible attribute tree of the syntactic attribute "position", which represents the position of an object in x-y coordinates.

FIG. 3 shows a method of browsing the attribute values in a FAT.

FIG. 4 shows the overall structure of the dictionary browser according to the present invention.

FIG. 5 shows a result viewer that displays the result of a query.

FIG. 6 shows the result browser according to the present invention.

Hereinafter, the operation and effects of an embodiment according to the technical idea of the present invention, configured as described above, are described in detail with reference to the accompanying drawings.

Static information about video data can be represented easily in existing relational and object-oriented database models. Attributes associated with the general properties of an object are defined in a fixed schema when the database is designed. For example, general attributes of an object such as name, age, and sex, or information about the video file itself such as title, file format, size, and resolution, can easily be modeled with a fixed schema. Dynamic information, by contrast, is difficult to model with a fixed schema: when different users abstract information to different levels from different viewpoints, a common set of attributes cannot be defined. Therefore, as in the OVID system, dynamic information is represented as attribute and attribute-value pairs. A set of attribute and attribute-value pairs is called a dynamic description, and its elements can be added or deleted dynamically as needed. Because of this flexibility, conventional methods of defining a schema cannot be used, and because it is not easy to define the attributes and their domains precisely, flexibly defined attributes and attribute values require special management.

Dynamic information can be classified into two levels: the syntactic level and the semantic level. A syntactic-level description captures the lowest level of information appearing in a unit of description, such as the size and position of an object appearing in the video data or the direction of its continuous movement, and is used to express the exact shape, color, or movement of an object. Syntactic-level descriptions are therefore usually applied to physically contiguous units of description. Many existing studies have investigated techniques for automatically annotating video data using only syntactic-level information.

A semantic-level description is a higher level of abstraction and is used to model the conceptual meaning of the syntactic-level information appearing in a unit of description. For example, for a scene in which a person is reading, a syntactic-level description can model the position and size of each object in the scene, whereas a semantic-level description can model the scene as "reading". Unlike a syntactic-level description, a semantic-level description can be abstracted to several levels and can apply not only to a contiguous bitstream but also to one large unit of description composed of one or more smaller units. In that case, the semantic-level description of the composite unit expresses a meaning abstracted to a higher level than those of its smaller constituent units.

An attribute that represents dynamic information is called a flexible attribute; any other attribute is called a fixed attribute.

Managing the set of flexible attributes and attribute values is not easy, because the names of flexible attributes and attribute values can differ with each user's point of view. For example, the position of an object can be named "position" or "center", and its value can be expressed in x-y coordinates or in r-θ coordinates.

To manage flexible attributes that can be interpreted at various levels of abstraction, a flexible attribute tree (FAT) is defined for a flexible attribute A. The flexible attribute tree T_A consists of a set of nodes N = {A, n_1, n_2, ..., n_p}, and each node n_i has a unique value v_i (1 ≤ i ≤ p). The flexible attribute tree T_A has the following properties.

(i) Node A is the root of the tree.

(ii) Node n_i has a value v_i that is more detailed than that of its parent.

(iii) If node n_i is an internal node, the domain of n_i is D(n_i) = v_i.

(iv) If node n_i is a leaf node,

- n_i can have a set of detailed values W = {w_1, w_2, ..., w_i} defined by the user.

- the domain of n_i is D(n_i) = {v_i} ∪ W.

(v) The domain of T_A is D(T_A) = ∪_{1 ≤ i ≤ p} D(n_i), the union of the domains of its nodes.

(vi) A new node can be inserted into the tree, and a node of the tree can be deleted. When node n_i is deleted, the children of n_i become children of the parent of n_i.
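The tree properties listed above can be illustrated with a minimal Python sketch. The class and function names (`FatNode`, `tree_domain`, `delete_node`) are hypothetical and do not come from the patent. The sketch assumes that property (v) means the union of the node domains D(n_i), and that the root, being the attribute A itself, cannot be deleted. The intermediate node "public facility" is invented for the example; "building", "government agency", and "Blue House" follow the example given in the description.

```python
class FatNode:
    """One node of a flexible attribute tree (illustrative structure only)."""

    def __init__(self, value, parent=None):
        self.value = value              # v_i, or the attribute name A for the root
        self.parent = parent
        self.children = []
        self.detailed_values = set()    # W, meaningful only while the node is a leaf
        if parent is not None:
            parent.children.append(self)  # property (vi): insertion

    def is_leaf(self):
        return not self.children

    def domain(self):
        # Property (iii): an internal node contributes only v_i.
        # Property (iv): a leaf contributes v_i plus its detailed values W.
        if self.is_leaf():
            return {self.value} | self.detailed_values
        return {self.value}


def tree_domain(root):
    """Union of D(n_i) over all nodes below the root (assumed reading of (v))."""
    result = set()
    stack = list(root.children)
    while stack:
        node = stack.pop()
        result |= node.domain()
        stack.extend(node.children)
    return result


def delete_node(node):
    """Property (vi): the children of a deleted node move to its parent."""
    parent = node.parent
    if parent is None:
        raise ValueError("the root (attribute A) cannot be deleted in this sketch")
    index = parent.children.index(node)
    for child in node.children:
        child.parent = parent
    parent.children[index:index + 1] = node.children
    node.parent, node.children = None, []


building = FatNode("building")
facility = FatNode("public facility", building)     # hypothetical internal node
agency = FatNode("government agency", facility)
agency.detailed_values.add("Blue House")             # proper noun kept at the leaf
delete_node(facility)                                # agency is now a child of building
```

Keeping proper nouns in `detailed_values` rather than as nodes mirrors the rule that specific values are managed only at leaf nodes.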

A specific value such as a proper noun cannot be used as the value of a node in a flexible attribute tree; instead, it is managed as a detailed value at a leaf node. For example, if there is a flexible attribute tree "building" and one of its leaf nodes is "government agency", then "Blue House" is a detailed value of "government agency". FIG. 2 shows the flexible attribute tree of the semantic attribute "action" and the flexible attribute tree of the syntactic attribute "position", which represents the position of an object in x-y coordinates.

Because the same concept could otherwise be modeled as two different flexible attribute trees or attribute values, only an administrator can define a new flexible attribute tree or change the structure of an existing one, so that flexible attribute trees and node values are used consistently. Based on the given flexible attribute trees, users can annotate scenes and express content-based queries.

As described above, a video database contains various types of information to be modeled. The following schema structure is proposed to represent this information in an existing database model.

· Physical segment (P_id, file name, file location, file size, file format, ..., physical or static attributes)

· Logical segment (L_id, P_id, start offset, end offset)

· Logical segment list (V_id, L_id, Sequence#)

· Video clip (V_id, flexible attribute name, attribute value, inheritance information, [object attribute name])

· View (V_id, fixed attributes)

· Flexible attribute (flexible attribute name, type, inheritable?, FAT pointer)

Each video file inserted into the video database has a unique physical segment P_id, and a logical segment L_id = (P_id = [s, e]) is defined as the interval from offset s to offset e within the physical segment P_id. Many logical segments can be defined on one physical segment, but not the reverse, so the relationship between a physical segment and the set of logical segments is 1:N.

In addition, one or more logical segments are defined as a logical segment list [P_1 = [s_1, e_1], P_2 = [s_2, e_2], ..., P_n = [s_n, e_n]], which is used to define a video clip or a view.

A video clip is the unit for describing content-based meaning, whereas a view is used to compose a logical video file; that is, a view logically concatenates parts of several physical segments into a single logical video file. A video clip or view can be defined by one or more logical segments, and the same logical segment can be shared by several views or video clips, so the set of video clips (or views) and the set of logical segments form an M:N relationship.
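The relationships among physical segments, logical segments, and logical segment lists can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the names `LogicalSegment`, `build_segment_list`, and `resolve` are hypothetical, identifiers are plain integers, and Sequence# is assumed to start at 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalSegment:
    l_id: int
    p_id: int           # the one physical segment this interval lies in (1:N)
    start_offset: int
    end_offset: int


def build_segment_list(v_id, logical_ids):
    """Rows of the logical segment list relation: (V_id, L_id, Sequence#)."""
    return [(v_id, l_id, seq) for seq, l_id in enumerate(logical_ids, start=1)]


def resolve(v_id, segment_list, logical_segments):
    """Return the (P_id, start, end) ranges of a clip or view in sequence order."""
    rows = sorted((r for r in segment_list if r[0] == v_id), key=lambda r: r[2])
    return [
        (logical_segments[l_id].p_id,
         logical_segments[l_id].start_offset,
         logical_segments[l_id].end_offset)
        for _, l_id, _ in rows
    ]


# Two logical segments on physical segment 10, one on physical segment 20.
segments = {
    1: LogicalSegment(1, 10, 0, 100),
    2: LogicalSegment(2, 10, 100, 250),
    3: LogicalSegment(3, 20, 40, 90),
}
# A view (V_id 1) joins parts of two files; a clip (V_id 2) shares segment 3.
rows = build_segment_list(1, [1, 3]) + build_segment_list(2, [3])
view_ranges = resolve(1, rows, segments)
```

Because logical segment 3 appears in the lists of both V_id 1 and V_id 2, the example shows the M:N sharing described above without copying any video data.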

The dynamic meaning of a video clip is expressed by selecting node values from flexible attribute trees and is linked to the actual video clip. Users can create and delete video clips, but logical segments and logical segment lists are maintained by the system. The fixed attributes of the physical segment and view schemas and the object attribute of the video clip schema are defined according to the application of the video database. The object attribute usually gives the name of the object that is the subject of an attribute and attribute-value pair. Because more than one object can appear in a video clip at the same time, this allows each object to be described independently.

Let video clips v_i and v_j be the logical segment lists v_i = [l₁, l₂, ..., l_x] and v_j = [l₁', l₂', ..., l_y'], and write v_i ∠ v_j when the video stream of v_j contains the entire stream of v_i. Then v_i ∠ v_j requires that, for each logical segment l_i = [p_i, (s_i, e_i)] in v_i, there exist a logical segment l_j' = [p_j', (s_j', e_j')] in v_j satisfying the following conditions (i) and (ii).

(i) p_i = p_j'

(ii) s_j' < s_i and e_i < e_j'
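The inclusion test above can be sketched in Python. This is a minimal illustration, not the implementation of the invention: the names `LogicalSegment` and `contains` and the sample offsets are hypothetical, and the strict inequalities follow condition (ii) as written.

```python
from typing import List, NamedTuple


class LogicalSegment(NamedTuple):
    p_id: str   # physical segment the interval lies in
    start: int  # offset s
    end: int    # offset e


def contains(v_i: List[LogicalSegment], v_j: List[LogicalSegment]) -> bool:
    """Return True when v_i ∠ v_j, i.e. every segment of v_i lies strictly
    inside some segment of v_j on the same physical segment."""
    return all(
        any(lj.p_id == li.p_id and lj.start < li.start and li.end < lj.end
            for lj in v_j)
        for li in v_i
    )


# Hypothetical clips: a speech inside a longer broadcast.
speech = [LogicalSegment("p1", 120, 300)]
broadcast = [LogicalSegment("p1", 0, 900), LogicalSegment("p2", 0, 50)]
```

With these sample clips, `contains(speech, broadcast)` holds while `contains(broadcast, speech)` does not. Because the inequalities are strict, a clip is not considered to be included in itself.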

Given a set of form (syntactic) attributes A_syn and a set of semantic attributes A_sem, the form description v_i^{syn} and the semantic description v_i^{sem} of a video clip v_i are defined as follows.

v_i^{syn} = {(attribute, attribute value) | attribute ∈ A_syn, attribute value ∈ D(T_attr)}

v_i^{sem} = {(attribute, attribute value) | attribute ∈ A_sem, attribute value ∈ D(T_attr)}

Accordingly, the meaning that users have described for video clip v_i is:

v_i^{self} = v_i^{syn} ∪ v_i^{sem}

To account for descriptions obtained through the interval-inclusion relation, the set of semantic attributes A_sem is partitioned into inheritable attributes (A_{I_sem}) and non-inheritable attributes (A_{NI_sem}) as follows.

A_sem = A_{I_sem} ∪ A_{NI_sem}, A_{I_sem} ∩ A_{NI_sem} = Ø

The schema administrator decides whether each semantic attribute is inheritable or non-inheritable. Accordingly, the semantic description of video clip v_i is divided into an inheritable description (v_i^{I_sem}) and a non-inheritable description (v_i^{NI_sem}).

v_i^{I_sem} = {(attribute, attribute value) | attribute ∈ A_{I_sem}, attribute value ∈ D(T_attr)}

v_i^{NI_sem} = {(attribute, attribute value) | attribute ∈ A_{NI_sem}, attribute value ∈ D(T_attr)}

Since every form description of a video clip can be inherited, the description inherited by video clip v_i comprises the form descriptions and inheritable semantic descriptions of every v_k satisfying v_i ∠ v_k:

v_i^{inh} = ∪_k (v_k^{syn} ∪ v_k^{I_sem})

Therefore the meaning described for video clip v_i, including inherited descriptions, is:

v_i^{description} = v_i^{self} ∪ v_i^{inh}
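The set definitions above can be illustrated with a short Python sketch. All names (`inherited`, `description`, the attribute kinds, and the sample clips) are hypothetical, and the mapping `containers` is assumed to already hold, for each clip v_i, the clips v_k with v_i ∠ v_k.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (attribute, attribute value)


def inherited(clip: str, self_desc: Dict[str, Set[Pair]],
              containers: Dict[str, Set[str]], kind: Dict[str, str]) -> Set[Pair]:
    """v_i^{inh}: form and inheritable semantic pairs of every containing clip."""
    inh: Set[Pair] = set()
    for k in containers.get(clip, set()):
        inh |= {(a, v) for (a, v) in self_desc[k] if kind[a] in ("syn", "I_sem")}
    return inh


def description(clip: str, self_desc: Dict[str, Set[Pair]],
                containers: Dict[str, Set[str]], kind: Dict[str, str]) -> Set[Pair]:
    """v_i^{description} = v_i^{self} ∪ v_i^{inh}."""
    return self_desc[clip] | inherited(clip, self_desc, containers, kind)


# Assumed classification of attributes by the schema administrator.
kind = {"position": "syn", "place": "I_sem", "action": "NI_sem"}
self_desc = {
    "speech": {("action", "speech")},
    "broadcast": {("place", "government organization"),
                  ("action", "news"),
                  ("position", "x<Xmax/2")},
}
containers = {"speech": {"broadcast"}}  # speech ∠ broadcast
result = description("speech", self_desc, containers, kind)
```

In this example the clip `speech` inherits the place and position pairs of `broadcast`, but not its non-inheritable action pair.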

The name of every flexible attribute, its flexible attribute tree (FAT), and the type of its dynamic information (form or semantic) are stored in the flexible attribute schema, which also records whether the attribute is inheritable. Semantic inheritance between video clips is represented in the video clip schema. A new form description or inheritable semantic description that a user attaches to video clip v_s is recorded in the video clip schema as follows.

(v_s, attribute; attribute value; inheritance information = no, [la])

Here (attribute; attribute value) ∈ {(attribute, attribute value) | attribute ∈ A_syn or attribute ∈ A_{I_sem}, attribute value ∈ D(T_attr)}, and [la] is the object attribute value.

Then, for every video clip v_t satisfying v_t ∠ v_s, a new tuple (v_t, attribute; attribute value; inheritance information = yes, [la]) is additionally added to the video clip schema. When many video clips satisfy the inclusion (∠) relation, semantic inheritance processing takes a long time. However, because inherited descriptions can be regarded as supplementary meaning of a video clip, it is more efficient to run inheritance processing periodically than every time a new description is inserted.

When semantic descriptions are inherited, a partially overlapped relationship arises between two clips that share part of an interval. Because the overlapped sub-interval is in an inclusion relationship with each of the two video clips, this can be treated as a special case of the interval-inclusion relation. The system therefore defines the overlapped part as a new video clip, which makes content-based retrieval possible even for parts that no user has defined as a video clip.

Simple content-based queries can be expressed through a graphical user interface (GUI), but complex queries must be expressed in a query language based on the video data model. Most query languages used in previous work fall into three categories.

The first is an ad-hoc solution that offers no formal way of expressing queries; it is used in systems that annotate video with keywords. The second uses a standard query language such as SQL. However, because the schema cannot change dynamically, this approach is limited to domains where the modeling elements are fixed. The third provides a proprietary query language tailored to the proposed system.

Suppose video files about politicians are stored in the video database. With the flexible attribute tree of FIG. 2, a query to find video clips in which President Kim Young-sam is giving a speech is expressed in SQL as follows.
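As a rough illustration of the partially overlapped case, the shared sub-interval of two clips can be computed as below. The sketch assumes each clip consists of a single logical segment; `Interval` and `overlap` are hypothetical names, not part of the described system.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    p_id: str   # physical segment
    start: int  # start offset
    end: int    # end offset


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the sub-interval shared by a and b, or None when the two
    intervals are on different physical segments or do not overlap."""
    if a.p_id != b.p_id:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(a.p_id, start, end) if start < end else None


# Two user-defined clips sharing offsets 60..100 of physical segment p1.
shared = overlap(Interval("p1", 0, 100), Interval("p1", 60, 200))
```

The returned interval is the candidate that the system could register as a new video clip so that descriptions of both overlapping clips can be attached to it.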

Q1: select V_id
    from video clip
    where flexible attribute = action and value = speech and object name = Kim Young-sam

To find all video clips in which Kim Young-sam appears in the screen region x < X_max/2 and y < Y_max/2, the query is as follows.

Q2: (select V_id
     from video clip
     where flexible attribute = position and value = x < X_max/2 and object name = Kim Young-sam)
    intersect
    (select V_id
     from video clip
     where flexible attribute = position and value = y < Y_max/2 and object name = Kim Young-sam)
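A query of the Q2 form can be tried against an in-memory SQLite database from Python, as sketched below. The table name `video_clip`, its column names, and the sample rows are assumptions made only for this illustration; position values are stored as plain strings. SQLite does not accept parentheses around the operands of `INTERSECT`, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rendering of the video clip schema: one attribute/value pair per row.
conn.execute("""CREATE TABLE video_clip (
    v_id TEXT, attr TEXT, value TEXT, inherited TEXT, object_name TEXT)""")
conn.executemany("INSERT INTO video_clip VALUES (?, ?, ?, ?, ?)", [
    ("v1", "position", "x<Xmax/2", "no", "Kim Young-sam"),
    ("v1", "position", "y<Ymax/2", "no", "Kim Young-sam"),
    ("v2", "position", "x<Xmax/2", "no", "Kim Young-sam"),
    ("v3", "position", "y<Ymax/2", "yes", "Kim Young-sam"),
])

# Each row carries a single pair, so two conditions on the same attribute
# are combined by intersecting two selections.
Q2 = """
SELECT v_id FROM video_clip
 WHERE attr = 'position' AND value = 'x<Xmax/2' AND object_name = ?
INTERSECT
SELECT v_id FROM video_clip
 WHERE attr = 'position' AND value = 'y<Ymax/2' AND object_name = ?
"""
rows = [r[0] for r in conn.execute(Q2, ("Kim Young-sam", "Kim Young-sam"))]
```

Only `v1` carries both position values for the named object, so it is the only clip returned for this sample data.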

Given a relational schema politician (name, age, gender, nationality, career), a query to retrieve clips in which a Korean male politician is giving a speech is as follows.

Q3: select V_id
    from video clip
    where flexible attribute = action and value = speech and
          object name in (select name
                          from politician
                          where nationality = Korea and gender = male)

To determine the inclusion (∠) relation between video clips, it is necessary to check whether one interval is contained in another. For example, if query Q4 retrieves the logical segments defined within the interval from t1 to t2 of physical segment p, and query Q5 retrieves all segments of video clip v, the queries are as follows.

Q4: select L_id
    from logical segment
    where P_id = p and start offset > t1 and end offset < t2

Q5: select L_id
    from logical segment list
    where V_id = v
    order by Sequence#

To retrieve video clips while excluding inherited descriptions, the inheritance information attribute of the video clip schema is used. For example, a query of the same form as Q2 that excludes inherited descriptions is written as follows.

Q6: (select V_id
     from video clip
     where flexible attribute = position and value = x < X_max/2 and object name = Kennedy and inheritance information = no)
    intersect
    (select V_id
     from video clip
     where flexible attribute = position and value = y < Y_max/2 and object name = Kennedy and inheritance information = no)

As the number of video clips and their descriptions grows, clip retrieval requires a great deal of search time, yet OVID and the Algebraic Model propose no access structure to address this. When an existing database management system is used, as here, access structures such as hashing or B-trees can improve retrieval efficiency.

Because a content-based query can include many conditions on attributes, users of a video database find it hard to express complex queries, and the problem becomes more serious when attributes and attribute values are managed dynamically. Moreover, even when the user has specified every necessary query condition, the user must play back the result videos before using them.

The structure of a FAT is not suited to direct use by users. Because nodes are added and deleted, a FAT can become an unbalanced tree, which makes it hard to display the overall tree structure. In addition, since one FAT represents the attribute/attribute-value pairs of a single attribute, most queries need more than one FAT to compose complex query conditions.

FIG. 3 shows how attribute values in a FAT are browsed. To show the related attribute values within one FAT, the nodes adjacent to the path from the root of the FAT to the currently selected node are the ones the user works with. When the user selects a navigation path in the tree, the browser displays only the attribute values at the same level as each node on the path. In the browser, an attribute value box represents the attribute value of a node in the FAT, and the levels of the FAT are distinguished by giving the attribute value boxes of each level a different color. The two arrows on an attribute value box are controls for moving to the parent or to a child. Selecting the child control of an attribute value box in FIG. 3 replaces the box with the attribute value boxes of its child nodes; the parent control does the opposite.
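The browsing rule described above, showing at each level only the values that sit beside the selected path, can be sketched as follows. The tree contents, the child-to-parent dictionary representation, and the function names `children` and `palette` are assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical ACTION tree stored as child -> parent.
PARENT: Dict[str, str] = {
    "speech": "ACTION",
    "greeting": "ACTION",
    "handshake": "greeting",
    "bow": "greeting",
    "campaign speech": "speech",
}


def children(parent_of: Dict[str, str], node: str) -> List[str]:
    """Attribute values shown when the child control of `node` is selected."""
    return sorted(c for c, p in parent_of.items() if p == node)


def palette(parent_of: Dict[str, str], selected: str) -> List[List[str]]:
    """One row of attribute value boxes per level along the path from the
    root to `selected`; each row lists the nodes sharing that level's parent."""
    path = [selected]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    path.reverse()  # root first, selected node last
    return [children(parent_of, p) for p in path[:-1]]


rows = palette(PARENT, "handshake")
```

For the selected node `handshake`, the first row holds the children of the root and the second row holds `handshake` together with its sibling. Subtrees away from the path, such as the children of `speech`, are not displayed.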

The browser of the present invention can browse more than one FAT at the same time. As shown in FIG. 3, three FATs A^i, A^j, and A^k can be browsed in one workspace. Each column of the browser that represents a flexible attribute is called an attribute palette. On the initial screen the names of all palettes defined in the system are presented as icons, and after the user selects the set of icons to browse, the corresponding attribute palettes appear in the browser.

The administrator can modify the structure of a FAT. When a sub-tree of a FAT grows large, it can be defined as a new FAT by modifying the video clip schema: the flexible attribute field of every tuple whose attribute value belongs to that sub-tree is changed to the name of the new FAT. When the number of FATs becomes too large to manage, they can be sub-grouped by semantic similarity, as in a clip map.

While browsing FATs, the user can describe the meaning of a video clip. After selecting the desired flexible attribute value in a FAT, the user describes the meaning by clicking the attribute value box, which enters a basic condition of the form attribute = attribute value. The user enters the detailed attribute value of a leaf node in the text box showing the query condition in FIG. 3(d), and enters the object attribute value in a similar way. Information entered in this way is stored in the video clip schema.
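The tuple update that accompanies such a split might look like the sketch below. It assumes that video clip tuples are simplified to (V_id, flexible attribute name, attribute value), that the root value of the sub-tree moves together with its descendants, and that the names `subtree_values` and `promote_subtree` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str]  # (V_id, flexible attribute name, attribute value)


def subtree_values(parent_of: Dict[str, str], root: str) -> Set[str]:
    """Collect the value of `root` and of every node below it."""
    values, frontier = {root}, [root]
    while frontier:
        node = frontier.pop()
        for child, parent in parent_of.items():
            if parent == node and child not in values:
                values.add(child)
                frontier.append(child)
    return values


def promote_subtree(rows: List[Row], parent_of: Dict[str, str],
                    old_fat: str, root: str, new_fat: str) -> List[Row]:
    """Rename the flexible attribute field of tuples whose value lies in the
    sub-tree that becomes the new FAT; other tuples are left unchanged."""
    moved = subtree_values(parent_of, root)
    return [(v, new_fat if attr == old_fat and val in moved else attr, val)
            for v, attr, val in rows]


parent_of = {"speech": "ACTION", "greeting": "ACTION",
             "handshake": "greeting", "bow": "greeting"}
rows = [("v1", "ACTION", "handshake"),
        ("v2", "ACTION", "speech"),
        ("v3", "PLACE", "handshake")]
updated = promote_subtree(rows, parent_of, "ACTION", "greeting", "GREETING")
```

Only the tuple of `v1` is renamed in this example: `v2` lies outside the sub-tree and `v3` belongs to a different flexible attribute.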

In existing methods of expressing queries on video data, interactive queries were composed as a list of keywords or a combination of logical operators. Some of them provided a dedicated interface for restricting query conditions or for supporting color, texture, and spatial/temporal conditions. In query mode, queries can be expressed using the dictionary browser, and query conditions on detailed attribute values are expressed in the same way as in the semantic description method described above. The query conditions composed in this way are interpreted as logical operations and converted internally into an SQL statement. Query conditions selected in the same palette are combined with OR between attribute values, and query conditions selected in different palettes are combined with AND. Therefore, to express an AND condition between values of the same FAT, the same attribute palette must be selected more than once.
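The OR-within-a-palette and AND-across-palettes rule can be illustrated by evaluating selections directly over in-memory descriptions instead of generating SQL. This is a simplified sketch: the names `matches` and `search`, the `Selection` type, and the sample clips are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]            # (attribute, attribute value)
Selection = Tuple[str, Set[str]]  # one palette: attribute name and chosen values


def matches(desc: Set[Pair], palettes: List[Selection]) -> bool:
    """AND across palettes, OR among the values chosen within one palette."""
    return all(any((attr, v) in desc for v in values)
               for attr, values in palettes)


def search(clips: Dict[str, Set[Pair]], palettes: List[Selection]) -> List[str]:
    return sorted(c for c, d in clips.items() if matches(d, palettes))


clips = {
    "v1": {("ACTION", "handshake"), ("PLACE", "BLUE HOUSE")},
    "v2": {("ACTION", "speech"), ("PLACE", "BLUE HOUSE")},
    "v3": {("ACTION", "handshake"), ("ACTION", "speech")},
}

# One ACTION palette with two values (OR), combined with a PLACE palette (AND).
either_action = search(clips, [("ACTION", {"handshake", "speech"}),
                               ("PLACE", {"BLUE HOUSE"})])
# The ACTION palette selected twice expresses AND between two ACTION values.
both_actions = search(clips, [("ACTION", {"handshake"}),
                              ("ACTION", {"speech"})])
```

Here `either_action` contains `v1` and `v2`, while `both_actions` contains only `v3`, which shows why the same palette has to be selected twice to require two values of one FAT at once.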

The overall structure of the dictionary browser is shown in FIG. 4. FIG. 4 shows the browser after a query has been composed by selecting four fixed attributes and five flexible attributes. The subject of the two flexible attribute/value pairs ACTION/handshake and POSITION/Xmax/2 is given as President Kim Young-sam, and for the PLACE attribute, BLUE HOUSE is a detailed attribute value of the leaf node government organization. Selecting the Inh button provided for each query condition yields query results that include inherited meanings. Query results appear in the result viewer as shown in FIG. 5, where the representative frame of each video clip satisfying the query is shown as an icon. This icon is the same as a micon: the user can click it to play the video data corresponding to the representative frame, and can use the slide show function to view all representative frames in sequence in a short time.

Query results are displayed as a list ordered by matching score, and for news broadcasts a timebar structure is provided to express how well the results match the query. A video database, however, needs to provide more analytical functions to help users formulate queries. If there are too many or too few results, the user must reformulate the query conditions to obtain suitable results. Because the number of results depends not only on how the query is composed but also on the contents of the database, a special browser is needed that can show the expected number of partial results for the query conditions the user has composed.

Let s_i, 1 ≤ i ≤ c, be the query conditions of the form "attribute = attribute value" expressed by the user. To analyze the partial results of a query, a c × c matrix R called the partial result matrix is used. Each element of this matrix gives the number of video clips satisfying a combination of query conditions, as follows.

(i) Diagonal elements R(i, i), 1 ≤ i ≤ c: the number of video clips satisfying s_i.

(ii) Upper-diagonal elements R(i, j), 1 ≤ i ≤ c, 1 ≤ j ≤ c, with i < j: the number of video clips satisfying s_i ∧ s_{i+1} ∧ ... ∧ s_j.

(iii) Lower-diagonal elements R(i, j), 1 ≤ i ≤ c, 1 ≤ j ≤ c, with i > j: the number of video clips satisfying s_i ∧ s_j.
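The three element types of R can be computed as in the following sketch. It assumes each query condition is a single (attribute, attribute value) pair tested against a clip's description set, and it uses 0-based indices in place of the 1-based indices of the text; `partial_result_matrix` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (attribute, attribute value)


def partial_result_matrix(clips: Dict[str, Set[Pair]],
                          conditions: List[Pair]) -> List[List[int]]:
    c = len(conditions)
    # sat[i]: ids of the clips satisfying condition s_i alone.
    sat = [{cid for cid, d in clips.items() if cond in d} for cond in conditions]
    R = [[0] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            if i <= j:
                # Diagonal and upper part: s_i AND s_{i+1} AND ... AND s_j.
                hits = set.intersection(*sat[i:j + 1])
            else:
                # Lower part: only the pair s_i AND s_j.
                hits = sat[i] & sat[j]
            R[i][j] = len(hits)
    return R


a, b, c_ = ("ACTION", "speech"), ("PLACE", "BLUE HOUSE"), ("POSITION", "x<Xmax/2")
clips = {"v1": {a, b, c_}, "v2": {a, c_}, "v3": {b, c_}, "v4": {a}}
R = partial_result_matrix(clips, [a, b, c_])
```

For this sample data the top-right element counts clips satisfying all three conditions (one clip), whereas the bottom-left element counts clips satisfying only the first and third conditions (two clips). The difference hints at which condition is narrowing the result.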

For intuitive interpretation, the number of result video clips is indicated by the color brightness of each element of the partial result matrix.

The combinations of query conditions give the user enough information to understand how well the query matches. Because the order of the elements in the matrix is an important factor in analyzing query results, the user can reorder them by drag and drop. Based on the numbers of video clips shown in the result browser, the user can reformulate the query.

A structure browser is provided to support browsing the physical structure of a single video file; its layout is similar to the result browser of FIG. 5. In the structure browser, a two-dimensional structure is presented as a three-level tree showing the representative frames of shots, scenes, and sequences. This function resembles a hierarchical browser; that is, the tree represents the physical structure of the video file. The structure browser can also be used to find the video clips defined on the video file.

As described above, according to the present invention, related attributes are grouped and defined as a fixed schema, or attributes are defined dynamically without a fixed schema. Because dynamically defined attributes are themselves maintained in a fixed schema, users can compose content-based queries from a variety of attribute sets. The dictionary browser simplifies browsing of the flexible attribute trees, which are structured internally as trees, and the structure browser provides a simple way to understand the contents of a video file.

Claims

An image data retrieval method using a flexible attribute tree and a partial result matrix, wherein dynamic information describing the contents of a video file is expressed as flexible attributes, a flexible attribute tree (FAT) T_A is defined for a flexible attribute A, the flexible attribute tree T_A is composed of a node set N = {A, n₁, n₂, ..., n_p} in which each element n_i has a unique value v_i (1 ≤ i ≤ p), and the flexible attribute tree T_A has the following properties:

(i) Node A is the root of the tree.

(ii) Node n_i has a more detailed value v_i than its parent node.

(iii) If node n_i is an internal node, the domain of n_i is D(n_i) = v_i.

(iv) If node n_i is a leaf node:

- n_i may have a set of detailed values W = {w₁, w₂, ..., w_i} defined by the user.

- The domain of n_i is D(n_i) = {v_i} ∪ W.

(v) The domain of T_A is D(T_A) = ∪_{i=1}^{p} D(n_i).

(vi) New nodes can be inserted into the tree and nodes in the tree can be deleted. When node n_i is deleted, the children of n_i become children of the parent of n_i.

The method of claim 1, wherein a specific value such as a proper noun is managed as a detailed value of a leaf node in the flexible attribute tree.

The method of claim 1, wherein, when a sub-tree of a FAT becomes large, the contents of the video clip schema are modified so that the flexible attribute field of every tuple having an attribute value of the sub-tree to be defined as a new flexible attribute tree is changed to the name of the new flexible attribute tree.

An image data retrieval method using a flexible attribute tree and a partial result matrix, wherein each video file inserted into the video database has a unique physical segment P_id, a logical segment L_id = (P_id = [s, e]) denotes the interval from offset s to offset e in physical segment P_id, and one or more logical segments are defined as a logical segment list [P₁ = [s₁, e₁], P₂ = [s₂, e₂], ..., P_n = [s_n, e_n]] that is used to define a video clip or a view.

The method of claim 4, wherein the schema of the video database consists of:

Physical segment (P_id, file name, file location, file size, file format, ..., physical or static attributes)
Logical segment (L_id, P_id, start offset, end offset)
Logical segment list (V_id, L_id, Sequence#)
Video clip (V_id, flexible attribute name, attribute value, inheritance information, [object attribute name])
View (V_id, fixed attributes)
Flexible attribute (flexible attribute name, type, inheritable?, FAT pointer)

The method of claim 4, wherein the set of video clips or the set of views forms an M:N relationship with the set of logical segments.

An image data retrieval method using a flexible attribute tree and a partial result matrix, wherein, to show the related attribute values in one FAT, the nodes adjacent to the path from the root of the FAT to the currently selected node are the user's working targets; when the user selects a navigation path in the tree, the browser displays only the attribute values at the same level along the path; in the browser, an attribute value box represents the attribute value of a node in the FAT; the levels of the FAT are distinguished by a different attribute value box color for each level; and the two arrows on an attribute value box are controls for moving to the parent or to a child.

The method of claim 7, wherein a structure browser is provided to support browsing of the physical structure of one video file, the structure browser presenting a two-dimensional structure as a three-level tree showing representative frames of shots, scenes, and sequences.

The method of claim 7, wherein selecting the child control of an attribute value box in the browser replaces the attribute value box with the attribute value boxes of its child nodes, and the parent control of the attribute value box performs the opposite function.

The method of claim 7, wherein, to describe the meaning of a video clip while browsing the flexible attribute tree, the user selects the desired flexible attribute value in the flexible attribute tree and clicks the attribute value box, which enters a basic query condition of the form attribute = attribute value; enters the detailed attribute value of a leaf node in the text box showing the query condition; and enters the object attribute value in a similar manner.

The method of claim 10, wherein the query result is displayed in a result viewer, and the representative frame of each video clip satisfying the given query is shown as an icon.

The method of claim 11, wherein a c × c matrix R called a partial result matrix is used for partial result analysis of the query, each element of the matrix giving the number of video clips that satisfy a combination of query conditions as follows:

(i) Diagonal elements R(i, i), 1 ≤ i ≤ c: the number of video clips satisfying s_i.

(ii) Upper-diagonal elements R(i, j), 1 ≤ i ≤ c, 1 ≤ j ≤ c, with i < j: the number of video clips satisfying s_i ∧ s_{i+1} ∧ ... ∧ s_j.

(iii) Lower-diagonal elements R(i, j), 1 ≤ i ≤ c, 1 ≤ j ≤ c, with i > j: the number of video clips satisfying s_i ∧ s_j.