KR20050021571A - Database model for hierarchical data formats

Info

Publication number: KR20050021571A
Application number: KR10-2005-7001447A
Authority: KR
Inventors: Marco Winter; Meinolf Blawat; Uwe Janssen; Hui Li; Ralf Ostermann
Original assignee: Thomson Licensing Société Anonyme
Priority date: 2002-07-29
Filing date: 2003-07-16
Publication date: 2005-03-07

Abstract

The present invention relates to a method of mapping a hierarchical data format to a relational database management system. The object of the invention is to provide a method of mapping a hierarchical data format comprising descriptors (1, 10, 11) to a relational database management system that can handle various types of hierarchical descriptors (1, 10, 11) quickly when inserting descriptors (1, 10, 11), reading parts of descriptors (1, 10, 11), reading entire descriptors (1, 10, 11), and performing fast text queries. According to the invention, the descriptors (1, 10, 11) are separated into parts of a common format, which are stored in relations (20, 21, 22, ...) in the relational database.

Description

DATABASE MODEL FOR HIERARCHICAL DATA FORMATS

The present invention relates to a method of mapping a hierarchical data format to a relational database management system. The invention further relates to a database model and to an apparatus for reading from and/or writing to a recording medium using such a method.

The future of digital recording will be characterized by the preparation, presentation and acquisition of value-added data services; that is, recorders such as DVRs (digital video recorders) will store and process additional information delivered by content providers such as broadcasters or by special services, or even assembled by the users themselves. Added value (metadata) is created to give the user further information. For example, the added value may be a movie summary explaining the plot, a list of the actors, and so on. Providing additional information that facilitates navigation within a movie also constitutes added value. For example, a movie can be structured into sections, subsections and so on, each with its own title and possibly further useful information.

To provide structural information and to convey other metadata for multimedia objects such as video or audio streams, hierarchical data formats are generally used. A well-known and widely accepted hierarchical data format is XML (Extensible Markup Language). XML is a system for defining specialized markup languages that are used to transmit formatted data. It is therefore a so-called meta-language, a language used to create other specialized languages. XML data consists of text organized in the form of a plurality of descriptors. The text itself comprises elements, attributes and content, i.e. the remaining text. Besides its use for multimedia objects, many other applications of XML are known.

It is foreseeable that digital recorders will store very large amounts of data in XML or other hierarchical data formats in relational databases, because these databases are widely used and highly sophisticated. However, this raises the problem that the hierarchical data format must be mapped to a relational database management system (RDBMS) for storage. A number of database models for XML have already been proposed. See, for example, Rahayu et al., "Representation of multilevel composite objects in relational database" (Proceedings of the 1998 International Conference on Object-Oriented Information Systems, OOIS'98, pp. 221-238), or Zhang et al., "On Supporting Containment Queries in Relational Database Management Systems" (ACM SIGMOD Record, Vol. 30, No. 2, 2001, pp. 425-36). However, no database model is known that can handle various types of hierarchical descriptors quickly when inserting descriptors, reading parts of descriptors, reading entire descriptors, and performing fast text queries.

FIG. 1a) and FIG. 1b) show a schematic XML descriptor and its representation as an XML tree.

FIG. 2 shows a database model according to the invention using a single relation.

FIG. 3 shows a database model as in FIG. 2, in which additional information on the descriptor structure is stored.

FIG. 4a) and FIG. 4b) show an XML descriptor as in FIG. 1, in which the text comprises string values and integer values.

FIG. 5 shows a database model similar to FIG. 2, in which elements, attributes, integer values and string values are separated into different relations.

FIG. 6 shows a database model similar to FIG. 3, in which repetitions within the relation are removed by providing additional relations.

FIG. 7 shows a general metadata descriptor comprising namespace information, a unique identifier, and links to other metadata descriptors.

FIG. 8 shows a metadata stream comprising a plurality of metadata descriptors.

FIG. 9 shows the database model according to FIG. 6 including a descriptor index.

It is therefore an object of the present invention to provide a method of mapping a hierarchical data format comprising descriptors to a relational database management system. It is a further object of the invention to provide a database model and an apparatus for reading from and/or writing to a recording medium using such a method.

According to the invention, the descriptors are separated into parts of a common format, and the parts of the common format are stored in relations in a relational database. The method has the advantage of being independent of the structure of the stored descriptors. Only a limited number of common formats is needed to store all types of descriptor formats. The common formats include, for example, elements, attributes, text, and so on. In this way, each descriptor is analyzed word by word, separated into its different components, and stored in relations, which are preferably tables.
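The separation step described above can be sketched in Python. This is only an illustration under stated assumptions, not the implementation from the patent: the function name `split_descriptor` and the sample XML are hypothetical, and storing an attribute value as a separate "text" part is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET


def split_descriptor(xml_text):
    """Split one XML descriptor into (type, value) parts in document order.

    Assumption: an attribute name becomes an "attribute" part and its
    value becomes a separate "text" part. Text that follows a child
    element (tail text) is ignored in this sketch.
    """
    parts = []

    def walk(node):
        parts.append(("element", node.tag))
        for name, value in node.attrib.items():
            parts.append(("attribute", name))
            parts.append(("text", value))
        if node.text and node.text.strip():
            parts.append(("text", node.text.strip()))
        for child in node:
            walk(child)

    walk(ET.fromstring(xml_text))
    return parts


# Hypothetical sample descriptor, not the one shown in the figures.
parts = split_descriptor(
    '<section><title arrow="down">Leonardo is swimming</title></section>'
)
```

Whatever the descriptor looks like, the result only ever contains the same few part types, which is what makes the storage scheme independent of the descriptor structure.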

The method can be further improved by providing independent relations for the common formats. All queries then use only these relations. For example, a first relation contains only text, while a second relation contains only elements, and so on. Because the number of relations is limited, queries are fast and simple. If, for example, a text query is to be performed, only the relations containing text need to be searched. Providing an independent relation for every common format is advantageous, but it is equally possible to use one relation for more than one common format. For example, elements and attributes may be stored together in a first relation, while text is stored in a second relation.

According to a variant of the invention, the method further comprises the step of storing in the relations information that allows recovery of the descriptor structure. When a query returns only a single database entry, the complete structure of the descriptor to which that entry belongs can be recovered.

Advantageously, the information allowing recovery of the descriptor structure comprises the descriptor number and the relative and/or absolute position of the parts of the common format. Using this information, it is possible to collect the appropriate values from the database and to sort them in a useful way. Each time a descriptor is stored in the database, it receives a unique descriptor number. Furthermore, for every part of the common format of the descriptor, the relative position within the descriptor and/or the absolute position within the relation is derived. The descriptor number and the relative and/or absolute position are stored in the relation together with the part of the common format.

Advantageously, the information allowing recovery of the descriptor structure further comprises an indicator for the next upper hierarchical level of the part of the common format within the descriptor. This facilitates fast reconstruction of a part of a descriptor, starting from any part of the descriptor and moving level by level toward the head of the descriptor. The next upper hierarchical level is information that helps to reconstruct a descriptor part when only the relative or absolute word position of a part of the common format is known, for example as a query result.

According to another aspect of the invention, the method further comprises the step of storing a descriptor index in the relational database. Such a descriptor index stores additional information about every descriptor and makes it easy to locate a specific descriptor in the database.

Advantageously, the descriptor index comprises at least the descriptor number, the absolute position of the descriptor within the relation, and/or a unique identifier for the descriptor. Storing this information in the descriptor index allows fast access to a specific descriptor within the relation. The absolute position of a descriptor within the relation is advantageously defined as the absolute position of its first part of the common format. Since the unique identifier is needed frequently, storing it in the descriptor index provides faster access to this kind of data. In addition to the information mentioned, other types of information, such as the number of levels of the descriptor or other useful data, may be stored in the descriptor index.

Advantageously, the hierarchical data format comprising descriptors is XML. Since XML is widely used and widely accepted, this allows a broad range of applications of the method according to the invention.

According to the invention, the common formats comprise at least elements, attributes and text. These types of common format are sufficient for many applications. While elements are mainly used to structure a descriptor, text generally contains the information that is searched for in a query. Attributes are mostly used to characterize elements.

Advantageously, the common format text is further divided into string values and integer values. In this way faster searches can be achieved, because the relation that has to be searched for a query becomes smaller. For example, a query for a string value is performed in a relation containing only string values, which has fewer entries than a relation containing both string and integer values.

Advantageously, the common formats further comprise namespaces. This feature is of particular interest for XML. It makes it possible to avoid collisions between documents when markup intended for one document uses the same element type or attribute name as another document for a different purpose.

Advantageously, a database model for mapping a hierarchical data format comprising descriptors to a relational database management system uses the method according to the invention. Such a database model allows simple and fast queries, flexible handling of various descriptor formats, simple and fast reconstruction of descriptors, and simple and fast insertion of descriptors. Moreover, such a database model can easily be implemented with existing relational database management systems.

Advantageously, an apparatus for reading from and/or writing to a recording medium uses the method or the database model according to the invention for mapping a hierarchical data format comprising descriptors to a relational database management system. Such an apparatus makes it possible to store value-added information in an existing relational database. The user of the apparatus can easily use and/or edit the value-added information.

For a better understanding of the invention, exemplary embodiments are described below with reference to the figures, using XML as an example of a hierarchical data format. It is understood that the invention is not limited to these exemplary embodiments and that the specified features can also be expediently combined and/or modified without departing from the scope of the invention.

FIG. 1a) shows a schematic example of an XML descriptor 10, and FIG. 1b) shows the corresponding representation as an XML tree. As can be seen from the figure, the exemplary descriptor 10 comprises a section, a subsection and a sub-subsection, each of which has a title. The title of the sub-subsection has an attribute "arrow" with the value "down". The descriptor 10 consists of 17 words in total, where the text of each title is counted as a single word regardless of the actual number of words. For example, "Leonardo is swimming" contains three "real" words but is a single "logical" word. The number given for each line of the descriptor 10 in FIG. 1a) is the relative word position of the first word of that line within the descriptor 10. From the corresponding tree structure in FIG. 1b) it can be seen that the descriptor 10 has five levels, namely level 0 to level 4. The tree structure is a tool that helps to illustrate the hierarchical relations between the different words of the descriptor 10.

FIG. 2 shows a database model according to the invention in which a single relation 20 is used. The relation 20 is represented as a table. The first column ("value") holds the stored part (XML string) itself. The second column ("Descr#") holds the unique descriptor number within the database management system. The column "word position" contains the relative position of the stored part within the specific descriptor 10. Taken together, "Descr#" and "word position" form the primary key of the relation 20, which allows complete recovery of the descriptor 10. The type of each XML string is stored in the relation in the column "type". In the example, the types are "element", "attribute" and "text". The last column ("level") contains the hierarchical level of each XML string as shown in FIG. 1b). As can be seen, not all words of the descriptor 10 are stored in the relation 20. "Closing" words such as </title> and </section> carry no additional information and are not strictly necessary for recovering the descriptor 10. These "closing" words are therefore not stored in the database. Of course, it is also possible to store these words if necessary.
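A single-relation model of this kind can be sketched with Python's built-in `sqlite3` module. The table name, column names and sample XML below are hypothetical. Two things are assumptions made for this sketch rather than details confirmed by the text: text and attribute parts are placed one level below their element, and an attribute name and its value count as two words. Closing tags advance the word position but are not stored, so the stored positions contain gaps.

```python
import sqlite3
import xml.etree.ElementTree as ET


def descriptor_rows(xml_text, descr_no):
    """Return (value, descr_no, word_pos, type, level) rows for one descriptor."""
    rows = []
    pos = 0

    def walk(node, level):
        nonlocal pos
        pos += 1
        rows.append((node.tag, descr_no, pos, "element", level))
        for name, value in node.attrib.items():
            pos += 1
            rows.append((name, descr_no, pos, "attribute", level + 1))
            pos += 1
            rows.append((value, descr_no, pos, "text", level + 1))
        if node.text and node.text.strip():
            pos += 1
            rows.append((node.text.strip(), descr_no, pos, "text", level + 1))
        for child in node:
            walk(child, level + 1)
        pos += 1  # the closing tag counts as a word but is not stored

    walk(ET.fromstring(xml_text), 0)
    return rows


# Hypothetical descriptor with a section, subsection and sub-subsection.
XML = (
    "<section><title>Leonardo</title>"
    "<subsection><title>Leonardo is swimming</title>"
    '<subsubsection><title arrow="down">Deep water</title></subsubsection>'
    "</subsection></section>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE relation (
           value    TEXT    NOT NULL,
           descr_no INTEGER NOT NULL,
           word_pos INTEGER NOT NULL,
           type     TEXT    NOT NULL,
           level    INTEGER NOT NULL,
           PRIMARY KEY (descr_no, word_pos))"""
)
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?, ?, ?)", descriptor_rows(XML, 1)
)

# Recover the stored parts of descriptor 1 in their original order.
stored = conn.execute(
    "SELECT word_pos, type, value, level FROM relation "
    "WHERE descr_no = 1 ORDER BY word_pos"
).fetchall()
```

Sorting by word position restores document order, and the level column is enough to decide where the omitted closing tags have to be re-inserted.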

FIG. 3 shows a database model similar to FIG. 2, but here the relation 21 includes an additional column ("next upper word position"), which contains an indicator for the next upper hierarchical word of the XML string within the specific descriptor 10. This information helps to recover a part of a descriptor when only the word position of a part of the common format is known, for example as a query result. Providing this additional information facilitates fast reconstruction of descriptor parts.
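How the "next upper word position" supports partial recovery can be illustrated with a small Python sketch. The data layout, the function name `path_to_head` and the use of `None` for the head of the descriptor are assumptions made for this example.

```python
# word_pos -> (value, type, next_upper_word_pos) for one hypothetical descriptor.
# Assumption: the head of the descriptor has no upper word, marked with None.
relation = {
    1: ("section", "element", None),
    2: ("title", "element", 1),
    3: ("Leonardo", "text", 2),
    5: ("subsection", "element", 1),
    6: ("title", "element", 5),
    7: ("Leonardo is swimming", "text", 6),
}


def path_to_head(relation, word_pos):
    """Follow the next-upper pointers from one part up to the descriptor head."""
    path = []
    while word_pos is not None:
        value, kind, upper = relation[word_pos]
        path.append((word_pos, kind, value))
        word_pos = upper
    return path


# A text query returned word position 7; rebuild its enclosing structure.
path = path_to_head(relation, 7)
```

Each step is a single key lookup, so the enclosing elements of a hit are found without scanning the rest of the descriptor.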

FIG. 4 shows another schematic descriptor 11 similar to that of FIG. 1. In this example, however, the text consists of string values and integer values. As can be seen in FIG. 4b), the string values and integer values are separated and counted as separate "logical" words.
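The text does not say how string values and integer values are told apart. The following Python sketch shows one possible rule, assumed purely for illustration: runs of ASCII digits become integer values and everything else becomes string values. The function name `split_text` is hypothetical.

```python
import re

# Assumption: a run of ASCII digits is an integer value, anything else is a string.
TOKEN = re.compile(r"(?P<integer>[0-9]+)|(?P<string>[^0-9]+)")


def split_text(text):
    """Split element text into ("string", str) and ("integer", int) parts."""
    parts = []
    for match in TOKEN.finditer(text):
        if match.lastgroup == "integer":
            parts.append(("integer", int(match.group())))
        else:
            value = match.group().strip()
            if value:  # skip runs that contain only whitespace
                parts.append(("string", value))
    return parts


parts = split_text("Chapter 7 of 12")
```

Each resulting part would then be counted as its own "logical" word when word positions are assigned.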

FIG. 5 shows a database model similar to the one shown in FIG. 2. In this example, however, the XML strings are separated into elements, attributes, string values and integer values and stored in different relations 22, 23, 24, 25. This allows faster searches within the relations 22, 23, 24, 25. Thanks to the descriptor number and the word position, the complete descriptor 11 can still be recovered from the different relations 22, 23, 24, 25. The "type" value is not strictly necessary in this embodiment, since each of the relations 22, 23, 24, 25 contains only one specific type.
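A model with one relation per type can be sketched with `sqlite3` as follows. The table names, column names and sample rows are hypothetical. The sketch shows a text query that touches only the string relation, and a recovery query that merges all four relations by word position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One relation per common format; no "type" column is needed.
for name, value_type in [
    ("elements", "TEXT"),
    ("attributes", "TEXT"),
    ("string_values", "TEXT"),
    ("integer_values", "INTEGER"),
]:
    conn.execute(
        f"CREATE TABLE {name} ("
        f"value {value_type} NOT NULL, descr_no INTEGER NOT NULL, "
        "word_pos INTEGER NOT NULL, level INTEGER NOT NULL, "
        "PRIMARY KEY (descr_no, word_pos))"
    )

conn.executemany(
    "INSERT INTO elements VALUES (?, ?, ?, ?)",
    [("section", 1, 1, 0), ("title", 1, 2, 1)],
)
conn.execute("INSERT INTO string_values VALUES (?, ?, ?, ?)", ("Chapter", 1, 3, 2))
conn.execute("INSERT INTO integer_values VALUES (?, ?, ?, ?)", (7, 1, 4, 2))

# A text query searches only the relation that holds string values.
hits = conn.execute(
    "SELECT descr_no, word_pos FROM string_values WHERE value LIKE ?",
    ("%apt%",),
).fetchall()

# The full descriptor is recovered by merging all relations on word position.
recovered = conn.execute(
    """SELECT word_pos, 'element' AS type, value FROM elements WHERE descr_no = 1
       UNION ALL
       SELECT word_pos, 'attribute', value FROM attributes WHERE descr_no = 1
       UNION ALL
       SELECT word_pos, 'string', value FROM string_values WHERE descr_no = 1
       UNION ALL
       SELECT word_pos, 'integer', value FROM integer_values WHERE descr_no = 1
       ORDER BY word_pos"""
).fetchall()
```

The type of each part is implied by the relation it came from, which is why the recovery query can supply it as a constant.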

FIG. 6 shows a further variant of the database model according to the invention. The database model is similar to the one shown in FIG. 3, but repetitions within the relation 31 are removed. This is achieved by providing additional relations 32, 33, 34, 35 ("secondary relations") for elements, string values, integer values and attributes. For each XML string, the "type" value and the corresponding descriptor key ("Descr.key") are included in the "primary" relation 31. The descriptor key points to the corresponding entry in the additional relation 32, 33, 34, 35 for the specific type of XML string. Taken together, the columns "type" and "Descr.key" can be regarded as a secondary key, since they link each XML string specified by the primary key to a specific value.
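The removal of repetitions can be sketched with `sqlite3`. All table, column and function names below are hypothetical, and the level and "next upper word position" columns are left out to keep the example short. Each distinct value is stored once in a secondary relation, and the primary relation refers to it through the pair of type and descriptor key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE primary_relation (
        descr_no  INTEGER NOT NULL,
        word_pos  INTEGER NOT NULL,
        type      TEXT    NOT NULL,
        descr_key INTEGER NOT NULL,
        PRIMARY KEY (descr_no, word_pos));
    CREATE TABLE elements (
        descr_key INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    CREATE TABLE string_values (
        descr_key INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    """
)

# Maps the "type" value to the secondary relation that holds its values.
SECONDARY = {"element": "elements", "string": "string_values"}


def store_part(descr_no, word_pos, kind, value):
    """Store a value once in its secondary relation and reference it by key."""
    table = SECONDARY[kind]
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    (key,) = conn.execute(
        f"SELECT descr_key FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    conn.execute(
        "INSERT INTO primary_relation VALUES (?, ?, ?, ?)",
        (descr_no, word_pos, kind, key),
    )


for pos, kind, value in [
    (1, "element", "section"),
    (2, "element", "title"),
    (3, "string", "Intro"),
    (5, "element", "subsection"),
    (6, "element", "title"),
    (7, "string", "Intro"),
]:
    store_part(1, pos, kind, value)

element_count = conn.execute("SELECT COUNT(*) FROM elements").fetchone()[0]
string_count = conn.execute("SELECT COUNT(*) FROM string_values").fetchone()[0]
primary_count = conn.execute("SELECT COUNT(*) FROM primary_relation").fetchone()[0]
keys = dict(conn.execute("SELECT word_pos, descr_key FROM primary_relation"))
```

In this sample, six parts are stored in the primary relation, but the repeated element "title" and the repeated string "Intro" each occupy a single row in their secondary relation.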

FIG. 7 shows a general metadata descriptor 1. The actual content of the metadata descriptor is contained in the core 6. In addition, the metadata descriptor 1 comprises a namespace declaration 2, a unique identifier 4 and links 5 to other metadata descriptors. Because they are needed frequently, the namespace declaration 2 and the unique identifier 4 are stored at a specific location within the database management system. The intention is to provide fast access to this kind of data. The namespace declaration 2 is valid only for the specific metadata descriptor 1. The unique identifier 4 allows unambiguous identification of the metadata descriptor 1.

FIG. 8 shows a metadata stream 7 comprising a plurality of metadata descriptors 1 such as the one shown in FIG. 7. In addition, the metadata stream 7 comprises a namespace declaration 2, which is valid for all metadata descriptors 1 within the specific metadata stream 7.

FIG. 9 shows the use of a descriptor index 40. For each descriptor stored in the database, the descriptor index 40 contains the descriptor number, the number of levels of the descriptor ("Max Level"), the unique identifier ("UUID") and the absolute position within the relation 41 ("Abs.Pos"). The corresponding relation 41 is similar to the one shown in FIG. 6, but additionally contains the absolute position of each XML string and the namespace declaration. The additional relations containing elements, string values, integer values and so on, which are addressed by the secondary key, are not shown for the sake of simplicity.
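The role of the descriptor index can be sketched with `sqlite3`. The schema below is a simplification assumed for this example: the relation stores values directly instead of type and descriptor key, and the namespace column is omitted. All names, including `store_descriptor` and `first_part`, are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE relation (
        abs_pos  INTEGER PRIMARY KEY,
        descr_no INTEGER NOT NULL,
        word_pos INTEGER NOT NULL,
        value    TEXT    NOT NULL);
    CREATE TABLE descriptor_index (
        descr_no  INTEGER PRIMARY KEY,
        max_level INTEGER NOT NULL,
        uuid      TEXT UNIQUE NOT NULL,
        abs_pos   INTEGER NOT NULL);
    """
)


def store_descriptor(descr_no, parts, max_level):
    """Append the parts to the relation and register the descriptor in the index.

    parts is a list of (word_pos, value). The absolute position recorded in
    the index is that of the first part. Returns the generated identifier.
    """
    start = conn.execute(
        "SELECT COALESCE(MAX(abs_pos), 0) + 1 FROM relation"
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?, ?)",
        [(start + i, descr_no, pos, value) for i, (pos, value) in enumerate(parts)],
    )
    ident = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO descriptor_index VALUES (?, ?, ?, ?)",
        (descr_no, max_level, ident, start),
    )
    return ident


def first_part(ident):
    """Jump from a unique identifier straight to the descriptor's first part."""
    return conn.execute(
        "SELECT r.abs_pos, r.value FROM descriptor_index AS d "
        "JOIN relation AS r ON r.abs_pos = d.abs_pos WHERE d.uuid = ?",
        (ident,),
    ).fetchone()


first = store_descriptor(1, [(1, "section"), (2, "title"), (3, "Intro")], 2)
second = store_descriptor(2, [(1, "section"), (2, "title"), (3, "Outro")], 2)
```

Looking up a descriptor by its identifier then costs one index lookup plus one primary-key access, regardless of how many descriptors the relation holds.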

The database models shown in the figures have a number of advantages:

- Flexibility to store all kinds of descriptors, by separating the incoming XML stream into the common formats.

- Fast queries due to the limited number of relations. For example, a text query needs to be performed only in a small number of relations such as "string values" or "elements", i.e. only in those relations in which strings are stored.

- Fast implementation of such a database model within a database management system due to the limited number of relations. Other database models require at least one relation for each descriptor type.

- Fast recovery of descriptors back into XML format due to the specific modeling of the database, i.e. by using the attributes "Descr#" and "word position".

- Fast recovery of descriptor parts by providing the additional information "next upper word position". This helps when starting from any part of the descriptor and moving level by level toward the head of the descriptor.

As described above, the present invention relates to a method of mapping a hierarchical data format to a relational database management system. The invention can furthermore be used in a database model and in an apparatus for reading from and/or writing to a recording medium using such a method.

Claims

1. A method of mapping a hierarchical data format comprising descriptors (1, 10, 11) to a relational database management system, comprising the steps of:

separating the descriptors (1, 10, 11) into parts of a common format; and

storing the parts of the common format in relations (20, 21, 22, ...) in the relational database.

2. The method of claim 1, further comprising providing independent relations (22, 23, ..., 32, 33, ...) for the common formats.

3. The method of claim 1 or 2, further comprising the step of storing in the relations (20, 21, 22, ...) information that allows recovery of the descriptor structure.

4. The method of claim 3, wherein the information allowing recovery of the descriptor structure comprises the descriptor number and the relative and/or absolute position of the parts of the common format within the descriptor (1, 10, 11).

5. The method of claim 4, wherein the information allowing recovery of the descriptor structure further comprises an indicator for the next upper hierarchical level of the part of the common format within the descriptor (1, 10, 11).

6. A method as claimed in claim 4 or 5, further comprising storing a descriptor index (40) in the relational database.

7. The method of claim 6, wherein the descriptor index (40) comprises at least the descriptor number, the absolute position of the descriptor (1, 10, 11) within the relation (20, 21, 22, ...), and/or the unique identifier (4) of the descriptor (1, 10, 11).

8. The method of any one of the preceding claims, wherein the hierarchical data format comprising the descriptors (1, 10, 11) is XML (Extensible Markup Language).

9. A method as claimed in any preceding claim, wherein said common format includes at least elements, attributes and text.

10. The method of claim 9, wherein the common format text is divided into a string value and an integer value.

11. A method according to claim 9 or 10, wherein the common format further comprises namespace information (2).

12. A database model for mapping a hierarchical data format comprising descriptors (1, 10, 11) to a relational database management system, wherein the database model uses the method of any one of claims 1 to 11.

13. An apparatus for reading from and/or writing to a recording medium, wherein the apparatus uses the method of any one of claims 1 to 11, or the database model of claim 12, for mapping a hierarchical data format comprising descriptors (1, 10, 11) to a relational database management system.