KR100660028B1

KR100660028B1 - A Scheme of Indexing and Query of XML Tree based Concept Structure of Database

Info

Publication number: KR100660028B1
Application number: KR1020050029342A
Authority: KR
Inventors: 추교남; 김일진; 우요섭
Original assignee: 인천대학교 산학협력단
Priority date: 2005-02-23
Filing date: 2005-04-08
Publication date: 2006-12-20
Also published as: KR20060094000A

Abstract

The present invention relates to a method of indexing an XML tree based on a database conceptual structure, for extracting the structural information in an eXtensible Markup Language (XML) document, and to a method of querying an XML tree based on a database conceptual structure, for executing queries on XML document structure information. The indexing method according to the present invention comprises the steps of: a) converting an XML document into a tree; b) reconstructing the tree into an extended tree by sequentially assigning each node a number corresponding to its level; c) indexing the data of the extended tree and converting it into bit streams; and d) storing the indexed data and bit streams in a database. According to the present invention, all structural information is converted into efficient bit streams, which addresses the complexity and inefficient operations of earlier indexing algorithms. When a query on XML document structure information is executed, it is quickly converted through a query schema based on the index file; the index file is then accessed directly, avoiding repeated accesses, and the result is retrieved quickly. This enables effective analysis and retrieval of XML documents and provides smooth compatibility between XML documents and databases.

XML, tree, index, query, search, database

Description

A Scheme of Indexing and Query of XML Tree based Concept Structure of Database

FIG. 1 is a flowchart illustrating a method of indexing an XML tree based on a database conceptual structure according to an embodiment of the present invention.

FIG. 2 is a table showing the index schema of the database according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating the sequential numbering of an XML document tree according to an embodiment of the present invention.

FIG. 4 is a table illustrating the bit stream of each node according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an index file structure based on a database conceptual structure according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a method of querying an XML tree based on a database conceptual structure according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating XML queries in graph form according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating the bit stream of a query according to an embodiment of the present invention.

FIG. 9 is a table illustrating the bit stream conversion of a query according to an embodiment of the present invention.

FIG. 10 is a table illustrating the bit streams for queries according to an embodiment of the present invention.

The present invention relates to a method of indexing and querying an XML tree based on a database conceptual structure, and more particularly, to a method of indexing an XML tree based on a database conceptual structure for extracting the structural information in an XML document, and a method of querying an XML tree based on a database conceptual structure for executing queries on XML document structure information.

XML (eXtensible Markup Language), proposed by the World Wide Web Consortium (W3C) in 1996, improves on HTML (HyperText Markup Language) for building web sites and for search, and makes complex data on client systems easy to process. XML also makes it easy for Internet users to create and manage the content they add to the Web.

In addition, whereas HTML cannot represent database-like structured data in web pages, XML lets users manipulate structured data as they wish. Structurally, XML documents follow the SGML (Standard Generalized Markup Language) document format.

In other words, XML not only facilitates the storage, management, and retrieval of complex, structured documents on the Internet, such as those in medicine, business, law, and academic research, but is also expected to play an important role in building core application systems such as e-commerce, digital libraries, and virtual universities. Research on building support systems for the effective processing of XML documents is therefore of great economic and social importance.

The element-based indexing technique and the K-ary tree-based indexing technique of the prior art are described below.

1) Element-based indexing technique

The element-based indexing technique extracts element types from the DTD (Document Type Definition) of an XML document, extracts structural information from an arbitrary XML document to build a mapping table, and then uses the extracted structural information to build a content index, a structure index, and an attribute index. It also supports mixed searches that combine these indexes.

This technique builds an ETID (Element Type ID) that links each extracted element name to an ID, represents the structural information of the XML document as a tree, and assigns an ETID value to each element. Because the ETID value encodes not only the parent element but also the order among sibling elements, it is easy to locate a particular element starting from any other element. The element-based technique therefore allows direct access to specific elements and handles a variety of structural queries efficiently.

By their nature, XML documents form collections of documents of similar types, so the kind of content stored in each element can be predicted; taking the characteristics of each element's content into account yields more accurate search results. The element-based technique, however, simply indexes the content of an XML document in the same way as a plain text document.

2) K-ary tree-based indexing technique

The K-ary tree-based indexing technique supports structural search of SGML documents by mapping them onto a complete K-ary tree. Using the properties of a complete K-ary tree, the logical parent-child relationship between nodes can be computed with a simple formula, which simplifies the operations. Because the operations are simple, element retrieval is very fast.

This technique builds the syntax tree of the document, finds the largest degree K among its nodes, reconstructs the tree as a complete K-ary tree, and then numbers every node. In other words, the hierarchical relationships of the document structure are expressed by node numbers that include the virtual nodes of the complete tree.

However, the complete K-ary tree technique has the drawback that the number of nodes grows sharply as null nodes lie deeper in the mapping, which means that more unused virtual nodes are created and the amount of data increases.
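The parent/child arithmetic that the K-ary technique relies on, and the virtual-node overhead just described, can be illustrated with a short Python sketch. It assumes the classic 1-based, level-order numbering of a complete K-ary tree (root = 1); the text does not state the exact formula, and the function names are hypothetical.

```python
# Assumed scheme: complete k-ary tree, nodes numbered 1, 2, 3, ... in level order.

def parent(i: int, k: int) -> int:
    """Number of node i's parent; the root (node 1) has none."""
    if i == 1:
        raise ValueError("the root has no parent")
    return (i - 2) // k + 1


def children(i: int, k: int) -> list:
    """Numbers of node i's k child slots, whether real or virtual."""
    first = k * (i - 1) + 2
    return list(range(first, first + k))


def slots_up_to_depth(k: int, depth: int) -> int:
    """Total node numbers reserved by a complete k-ary tree with `depth` levels."""
    return sum(k ** d for d in range(depth))
```

For a document whose largest fan-out is 3 and whose depth is 3, the complete tree reserves 13 node numbers however few real elements exist; every unused slot is a virtual node, and one extra level multiplies the reservation roughly by K.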

In addition, although the documents are structured, elements of many different types can appear on a single level, which makes range-restricted search difficult. Moreover, when the document structure changes and every node number available for the given degree has already been used, so that no number remains to assign, the structural information of the document must be extracted again from scratch to rebuild the index.

In short, in the prior art the indexing algorithms for extracting structural information from XML documents are complex, and because they create many database tables, a query requires many operations.

An object of the present invention, which addresses the above problems, is to provide a method of indexing and querying an XML tree based on a database conceptual structure that enables effective analysis and retrieval of XML documents and smooth compatibility between XML documents and databases.

To achieve this object, the indexing method of an XML (eXtensible Markup Language) tree based on a database conceptual structure according to the present invention comprises: a) converting an XML document into a tree based on a predetermined index file schema structure, the index file schema structure being an index schema structure that includes a plurality of fields based on the database conceptual structure; b) reconstructing the tree into an extended tree by sequentially assigning each node a number corresponding to its level; c) indexing the data of the extended tree and converting it into bit streams, each bit stream being a unique value in one-to-one correspondence with a node; and d) storing the indexed data and bit streams in a database.

As another means of achieving this object, the querying method of an XML tree based on a database conceptual structure according to the present invention comprises: a) when a query is received, restoring the full query path for matching against the database; b) converting the query into a bit stream according to the restored full query path, based on a pre-stored index table that holds, for each node, the name, unique bit stream value, parent node's bit stream value, and level value needed for query analysis; and c) returning the query result for the converted bit stream to the user.

The method of indexing and querying an XML tree based on a database conceptual structure according to an embodiment of the present invention is described in detail below with reference to the accompanying drawings.

In general, an XML document is structured by entering content into each element according to a DTD (Document Type Definition), which defines a fixed document format.

Therefore, knowing the structure of XML documents makes it possible to search very effectively for the documents a user wants. That is, by identifying the characteristics of the content entered into each element and the hierarchical relationships between elements, and taking them into account when extracting index terms, more accurate and more varied searches can be performed to meet users' different needs.

The embodiment of the present invention discloses a method of indexing an XML tree based on a database conceptual structure for extracting the structural information in an XML document, and a method of querying an XML tree based on a database conceptual structure for executing queries on XML document structure information. The indexing method and the querying method are described in turn below.

1) Index schema

Referring to FIG. 1, the indexing method of an XML tree according to the embodiment first converts an XML document into a tree (S110) and sequentially assigns each node of the XML tree a number corresponding to its level (S120).

Next, by assigning the level numbers consecutively, the tree is reconstructed into an extended tree (S130).

Next, the data of the extended tree is indexed and converted into bit streams (S140).

Finally, the indexed data and bit streams are stored in a database (S150).
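Steps S110 to S130 can be sketched in Python with the standard library's xml.etree.ElementTree. The sketch assumes that numbering "by level" means a breadth-first, left-to-right sequence number that restarts at 0 on every level, and it records each node's parent so that later steps can build on it; the function name number_by_level is hypothetical, and text values (such as V₁ in FIG. 3) are not numbered here.

```python
import xml.etree.ElementTree as ET
from collections import deque


def number_by_level(xml_text: str):
    """Return (tag, level, seq, parent_index) tuples in breadth-first order.

    level 1 is the root; seq is the node's 0-based position among all nodes on
    its level; parent_index points into the returned list (None for the root).
    """
    root = ET.fromstring(xml_text)            # S110: XML document -> element tree
    nodes = []
    next_seq = {}                             # level -> next unused sequence number
    queue = deque([(root, 1, None)])          # (element, level, parent's list index)
    while queue:                              # breadth-first, left to right
        elem, level, parent = queue.popleft()
        seq = next_seq.get(level, 0)
        next_seq[level] = seq + 1
        me = len(nodes)
        nodes.append((elem.tag, level, seq, parent))   # S120: per-level number
        for child in elem:
            queue.append((child, level + 1, me))
    return nodes
```

For the hypothetical document `<D><R><N>Kim</N><A/></R><S/></D>`, the result is `[("D", 1, 0, None), ("R", 2, 0, 0), ("S", 2, 1, 0), ("N", 3, 0, 1), ("A", 3, 1, 1)]`.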

The indexing method of an XML tree according to the embodiment is described in more detail below with reference to FIGS. 2 to 5.

FIG. 2 is a table showing the index schema of the database according to the embodiment, that is, the schema structure of the index file for an XML tree.

Referring to FIG. 2, the fields are defined as follows: the N_name field 210 holds the name of the XML tree node; the B_Value field 220 holds the bit stream value of the node; the Tb_len field 230 holds the total length of the node's bit stream; the Belen field 240 holds the length of the bit stream variably allocated to the current node; the Level field 250 holds the level at which the current node lies in the XML tree; the P_value field 260 holds the bit stream value of the current node's parent; and the Data field 270 holds the value of the node.
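As a rough illustration of the FIG. 2 schema and of step S150, the fields can be declared as a relational table, here with Python's built-in sqlite3 module. The column names follow FIG. 2; the table name xml_index, the SQL types, and storing bit streams as hexadecimal text are assumptions made for this sketch.

```python
import sqlite3

# Column names follow FIG. 2; the table name and SQL types are assumptions.
SCHEMA = """
CREATE TABLE xml_index (
    N_name  TEXT NOT NULL,     -- name of the XML tree node
    B_Value TEXT PRIMARY KEY,  -- node bit stream in hex; unique per node
    Tb_len  INTEGER NOT NULL,  -- total length of the node's bit stream
    Belen   INTEGER NOT NULL,  -- bits variably allocated to the node
    Level   INTEGER NOT NULL,  -- level of the node in the XML tree (root = 1)
    P_value TEXT,              -- parent node's bit stream in hex; NULL for the root
    Data    TEXT               -- value held by the node, if any
)
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create the index table (in memory by default) and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, rows) -> None:
    """S150: persist (N_name, B_Value, Tb_len, Belen, Level, P_value, Data) rows."""
    conn.executemany("INSERT INTO xml_index VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Declaring B_Value as the key reflects the text's claim that each bit stream is unique, so the database itself rejects a duplicate.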

FIG. 3 is a diagram illustrating the sequential numbering of an XML document tree according to the embodiment: after a basic document is generated from a DTD, sequential binary numbers are assigned to the XML document tree. Reference numerals 301 to 312 denote the sequentially numbered nodes, and reference numeral 313 denotes a value.

Referring to FIG. 3, the nodes on each level are numbered sequentially, and each level is allocated as many bits as it has nodes. Following the path from the root node to each node then yields a unique bit stream for each path, and this value itself carries the structural information of every node in the tree, such as parent-child, ancestor-descendant, and sibling relationships.

That is, the nodes are visited successively from the root node, the bit stream value assigned to each node is obtained, and each level is allocated its bit space within the fixed-size bit field, starting from the least significant bit. For example, for the path D-R-N-V₁, 1 is assigned to D (delivery, 301), 01 to R (receiver, 302), and 001 to N (name, 307).
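The exact bit layout of FIGS. 3 and 4 cannot be recovered from the text, so the following Python sketch of step S140 rests on assumptions: each level receives one bit per node (a one-hot position inside the level's field), level fields are packed upward from the least significant bit with the root level first, and a node's bit stream is its parent's bit stream plus its own bit. That cumulative property is what the AND test of FIG. 9 requires. The function name is hypothetical.

```python
from collections import Counter


def assign_bit_streams(nodes):
    """S140 sketch: compute one integer bit stream per node.

    nodes: (name, level, seq, parent_index) tuples in breadth-first order, so a
    parent always precedes its children; parent_index is None for the root.
    """
    # Each level gets as many bits as it has nodes (assumed one-hot per level).
    per_level = Counter(level for _, level, _, _ in nodes)
    # Level fields are packed upward from the least significant bit, level 1 first.
    offset, start = {}, 0
    for level in sorted(per_level):
        offset[level] = start
        start += per_level[level]
    codes = []
    for _, level, seq, parent in nodes:
        own_bit = 1 << (offset[level] + seq)   # this node's bit inside its level field
        # Cumulative code: the parent's bits plus this node's own bit.
        codes.append(own_bit if parent is None else codes[parent] | own_bit)
    return codes
```

For a hypothetical tree D(R(N, A), S), given as `[("D", 1, 0, None), ("R", 2, 0, 0), ("S", 2, 1, 0), ("N", 3, 0, 1), ("A", 3, 1, 1)]`, the codes are 1, 3, 5, 11, 19 (binary 1, 11, 101, 1011, 10011): every value is distinct, and each contains all of its ancestors' bits.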

FIG. 4 is a table illustrating the bit stream assigned to each node according to the embodiment. Reference numeral 410 denotes the level, 420 the total number of bits, and 430 the bit stream of each node, where 0_i denotes i consecutive zeros.

Referring to FIG. 4, the bit stream values correspond one-to-one with the nodes and are unique, so if the bit value for each node name is known, the entire XML document can be reconstructed.

The bit stream is allocated a fixed space of 32 bits in total. If the number of levels in an XML document or the number of nodes per level grows, a larger space can be allocated, so that the bit stream can be extended efficiently without changing the overall algorithm. The bit streams obtained for the nodes are each converted to hexadecimal, and all of them are stored in the index schema table shown in FIG. 5, described below.
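The fixed 32-bit space and the hexadecimal conversion can be sketched as follows. Widening in whole multiples of 32 bits is an assumption; the text says only that a larger space is allocated. Both function names are hypothetical.

```python
def to_hex(code: int, width_bits: int = 32) -> str:
    """Render a node bit stream as fixed-width hexadecimal (32 bits by default)."""
    if code < 0 or code.bit_length() > width_bits:
        raise ValueError(f"bit stream needs {code.bit_length()} bits; "
                         f"a field wider than {width_bits} bits is required")
    return f"0x{code:0{width_bits // 4}X}"


def required_width(total_bits: int, base: int = 32) -> int:
    """Smallest multiple of `base` bits that holds `total_bits` (assumed policy)."""
    return max(base, -(-total_bits // base) * base)
```

A node whose bit stream is 19 (binary 10011) is stored as `0x00000013`. If the levels of a document together need 33 bits, `required_width(33)` returns 64, and only the width changes while the encoding itself stays the same.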

FIG. 5 is a diagram illustrating an index file structure based on a database conceptual structure according to the embodiment; reference numerals 510 to 570 denote the fields of FIG. 2.

Referring to FIG. 5, when a user query is received, the converted form of the query can be generated by consulting, in the index table, the name, unique bit stream value, parent node's bit stream value, and level value of each node needed for query analysis.

Because the converted query itself contains the parent-child, ancestor-descendant, and sibling relationships among nodes, fewer operations are needed to recover structural information.

Moreover, when the converted bit stream is matched against the index table, the bit stream representing the query is a unique value without duplicates, so the desired value and related results can be returned without additional operations. Bit stream matching is also much faster than conventional string matching, which shortens execution time.
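Because each bit stream is unique, matching a converted query against the index reduces to a single keyed lookup. The Python sketch below keys an in-memory copy of the index rows by B_Value. The IndexRow type holds only a subset of the FIG. 2 fields, and it, like the helper names, is hypothetical.

```python
from typing import NamedTuple, Optional


class IndexRow(NamedTuple):
    """One row of the FIG. 5 index table (a subset of the FIG. 2 fields)."""
    n_name: str
    b_value: int              # node bit stream, unique per node
    level: int
    p_value: Optional[int]    # parent's bit stream; None for the root
    data: Optional[str]


def build_lookup(rows):
    """Key the rows by B_Value; uniqueness is what makes a single probe sufficient."""
    table = {}
    for row in rows:
        if row.b_value in table:
            raise ValueError(f"duplicate bit stream {row.b_value:#x}")
        table[row.b_value] = row
    return table


def match(table, query_code: int) -> Optional[IndexRow]:
    """Return the row whose B_Value equals the query bit stream, or None."""
    return table.get(query_code)
```

The comparison is between integers rather than element-name strings, which is where the speed-up over string matching claimed above would come from.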

2) Query schema

Referring to FIG. 6, the querying method of an XML tree according to the embodiment first checks whether a query has been received (S610). When a query arrives from a user, the full query path is restored for matching against the index database (S620); that is, the full query path of the given XPath expression is restored in normalized form so that it can be matched against the database.

Next, the query along the restored full query path is converted into a bit stream (S630) by looking up the full query path in the index table that stores the data values obtained from the extended tree.
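Step S630 can be sketched as a walk down the index rows: at each step, keep the nodes whose name and level match and whose P_value is the bit stream reached at the previous step. The sketch handles only absolute child-axis paths such as /D/R/N and does not perform the S620 restoration of abbreviated XPath. The row layout and the function name are assumptions.

```python
def path_to_bit_streams(rows, path: str):
    """S630 sketch: resolve an absolute child-axis path such as '/D/R/N'.

    rows: (n_name, b_value, level, p_value) tuples taken from the index table,
    with integer bit streams and p_value None for the root. Returns the bit
    streams of every node reached by the full path (repeated elements can
    yield several).
    """
    steps = [s for s in path.split("/") if s]
    if not steps:
        return []
    frontier = [None]                       # the root's parent value is None
    for depth, name in enumerate(steps, start=1):
        # Keep nodes with the right name and level whose parent is in the frontier.
        frontier = [b for (n, b, lvl, p) in rows
                    if n == name and lvl == depth and p in frontier]
        if not frontier:
            break
    return frontier
```

With hypothetical rows for the tree D(R(N, A), S), whose bit streams are D=1, R=3, S=5, N=11, A=19, the path /D/R/N resolves to [11], while /D/S/N resolves to an empty list because no N lies under S.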

Next, the converted bit stream is looked up in the index table and the query result is returned (S640); that is, the database built from the index schema is searched and the result is returned to the user.

Finally, the method checks whether there is another query (S650); if so, steps S620 to S640 are repeated.

FIG. 7 is a diagram illustrating XML queries in graph form according to the embodiment. Reference numerals 711 to 714 denote the query for Q₁, "Find all customers who ordered a book"; 721 to 726 denote the query for Q₂, "Find the orders placed in Incheon on April 1, 2004"; 731 to 735 denote the query for Q₃, "Find the orders for the book titled Introduction to XML"; and 741 to 745 denote the query for Q₄, "Find the orders in which a book was paid for in advance."

Many earlier studies have proposed indexing techniques based on the nodes or paths of the DTD tree. Path-based indexing techniques can return results for a simple query such as Q₁ in FIG. 7 (for example, /D/R/N/V1), but a query with a branching structure such as Q₂ must be decomposed into several sub-queries, each processed separately, whose results are then combined through several costly operations to produce the final result. These methods are very inefficient for complex queries such as Q₃ and Q₄.

Specific examples of processing each query through the query schema based on the index file structure of the embodiment are described below with reference to FIGS. 3 and 5.

FIG. 8 is a diagram illustrating the bit stream of a query according to the embodiment: for query Q₁ (for example, /D/R/N), the XPath expression can be converted node by node (810, 820, 830) into bit streams by referring to FIGS. 3 and 5.

FIG. 9 is a table illustrating the bit stream conversion of a query according to the embodiment; reference numeral 910 denotes operation result (a), and 920 denotes operation result (b).

Referring to FIG. 9, the first step in converting query Q₁ of FIG. 7 into a bit stream is the AND operation of node D and node R. The result (a) equals the bit stream of node D. A partial overlap would show only that parts of the two nodes' bit streams coincide; because the entire result equals node D's bit stream, node D is an upper node (ancestor) of node R.

The second operation is the AND of node R and node N. Its result (b) equals the bit stream of node R, which shows that node R is an upper node of node N.

Results (a) and (b) therefore show that node D is an upper node of both node R and node N, and that node R is an upper node of node N. Query Q₁ has thus been successfully converted into a bit stream.
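The AND test of FIG. 9 can be written in a few lines. The sketch assumes cumulative bit streams, in which each node's bit stream includes all of its ancestors' bits; under that assumption, "a AND d equals a" means that a's bits are a subset of d's. The values used below (D=1, R=3, S=5, N=11) come from a hypothetical tree, not from FIG. 4.

```python
def is_ancestor(a: int, d: int) -> bool:
    """FIG. 9 test: a is an upper node of d if a AND d reproduces a exactly."""
    return a != d and (a & d) == a


def path_holds(codes) -> bool:
    """Check each consecutive pair along a path, as results (a) and (b) do."""
    return all(is_ancestor(u, v) for u, v in zip(codes, codes[1:]))
```

For /D/R/N, both `1 & 3 == 1` (result (a)) and `3 & 11 == 3` (result (b)) hold, so `path_holds([1, 3, 11])` is True. For the sibling S, `5 & 11 == 1`, which is not 5, so S is correctly rejected as an ancestor of N.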

FIG. 10 is a table illustrating the bit streams for the queries according to the embodiment; reference numeral 1010 denotes the query, 1020 the XPath expression, and 1030 the bit stream.

Referring to FIG. 10, matching the bit stream values 1030 against the B_Value field 520 of the database schema table of FIG. 5 returns the result of the desired query. For example, the results of queries Q₂, Q₃, and Q₄ can be obtained by applying bitwise OR or XOR operations to the bit streams 1030 shown in FIG. 10.
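The text does not spell out how OR and XOR combine the bit streams of branching queries such as Q₂ to Q₄. The Python sketch below is one plausible reading, not the patent's own procedure. Under the assumed cumulative, one-hot-per-level layout, AND of two nodes yields their lowest common ancestor, XOR yields the bits of the nodes below that ancestor, and OR yields every node a query touches. A simple join keeps the candidate branch points that have a descendant in every branch. All function names and example values are hypothetical.

```python
def common_ancestor(a: int, b: int) -> int:
    """Shared bits are shared ancestors, so AND yields the lowest common ancestor."""
    return a & b


def branch_bits(a: int, b: int) -> int:
    """XOR leaves only the bits of the nodes below the lowest common ancestor."""
    return a ^ b


def pattern_mask(*codes: int) -> int:
    """OR collects every node touched by the branches of a query."""
    mask = 0
    for c in codes:
        mask |= c
    return mask


def twig_join(points, branches):
    """Keep candidate branch points that have a descendant in every branch list
    (descendant // semantics; a child-only query would also compare levels)."""
    return [p for p in points
            if all(any(d != p and (d & p) == p for d in codes) for codes in branches)]
```

In a hypothetical tree D(R(N, A), S) with bit streams D=1, R=3, S=5, N=11, A=19, `common_ancestor(11, 19)` returns 3 (node R), and `twig_join([1, 3, 5], [[11], [19]])` returns [1, 3]: both D and R have N and A as descendants, while S has neither.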

In summary, the embodiment extracts searchable elements using XML structural information and builds an efficient index that takes their characteristics into account. Users can search element by element, and because the characteristics of the content entered into each element are taken into account, searches are more accurate.

In other words, according to the embodiment, all structural information is converted into efficient bit streams, which addresses the complexity and inefficient operations of earlier indexing algorithms. When a query on XML document structure information is executed, it is quickly converted through the query schema based on the index file; the index file is then accessed directly, avoiding repeated accesses, and the results are retrieved quickly.

Although the present invention has been shown and described with reference to specific embodiments, those of ordinary skill in the art will readily appreciate that various modifications and changes can be made without departing from the spirit and scope of the invention as defined by the claims.

According to the present invention, XML documents can be analyzed and retrieved effectively, and smooth compatibility between XML documents and databases can be provided.

Claims

In the indexing method of the XML (eXtensible Markup Language) tree based on the database conceptual structure,

a) converting an XML document into a tree based on a predetermined index file schema structure, wherein the index file schema structure is an index schema structure comprising a plurality of fields based on the database conceptual structure;

b) reconstructing the tree into an extended tree by sequentially assigning each node of the tree a number corresponding to its level;

c) indexing data of the extended tree and converting it into a bit stream, wherein the bit stream comprises a unique value in one-to-one correspondence with each node; and

d) storing the indexed data and bit stream in a database

An indexing method of an XML tree comprising the above steps.

The method of claim 1,

wherein the index file schema structure of step a) comprises: an N_name field indicating the name of an XML tree node, a B_Value field indicating the bit stream value of each node, a Tb_len field indicating the total length of the node's bit stream, a Belen field indicating the length of the bit stream allocated to the current node, a Level field indicating the level at which the current node lies in the XML tree, a P_value field indicating the bit stream value of the current node's parent, and a Data field indicating the value of each node.

The method according to claim 1 or 2,

wherein in step b), after a basic document based on a DTD (Document Type Definition) is generated, the nodes on each level of the XML document tree are numbered sequentially, and each level is allocated as many bits as it has nodes.

The method according to claim 1 or 2,

wherein step c) generates a unique bit stream for the path from the root node to each node, the bit stream value itself carrying the structural information of all nodes of the tree, such as parent-child, ancestor-descendant, and sibling relationships, and maps this structural information to the index schema structure.

The method according to claim 1 or 2,

wherein step c) obtains the bit stream value assigned to each node while visiting the nodes successively from the root node, and allocates bit space to each level starting from the least significant bit of the allocated fixed-size bit space.

(Deleted)

In a query method of an XML tree based on a database conceptual structure,

a) when a query is received, restoring the full query path for matching against the database;

b) converting the query into a bit stream according to the restored full query path, based on a pre-stored index table, wherein the index table includes the name, unique bit stream value, parent node's bit stream value, and level value of each node needed for query analysis; and

c) returning a query result value for the converted bit stream to a user

A query method of an XML tree comprising the above steps.

The method of claim 7, wherein

step c) returns the query result to the user when the converted bit stream value matches the bit stream value (B_Value) of a node in the index table.

The method according to claim 7 or 8,

wherein the converted bit stream of step b) is itself a unique, non-duplicated value that contains the parent-child, ancestor-descendant, and sibling relationships among nodes.

(Deleted)