KR20190089420A

KR20190089420A - Data construction and management system of sub index storage method

Info

Publication number: KR20190089420A
Application number: KR1020180007878A
Authority: KR
Inventors: 최성원
Original assignee: 주식회사 수퍼트리랩스
Priority date: 2018-01-22
Filing date: 2018-01-22
Publication date: 2019-07-31

Abstract

According to one embodiment of the present invention, a data construction system using a sub-index storage method is provided. It is an index structure that secures a space in which multiple types of data are stored and indexes each data value to the storage space of the corresponding data. Specifically, the index storage method makes it possible to find specific data quickly among the stored data, which improves the performance of adding, deleting, and searching data, and it lets the stored data efficiently remain sorted in data identifier order. The technical solution is to store the data in a large object and to match and connect to each data list a sub-index that indexes its identifiers.

Description

The present invention relates to a data construction and management system using a sub-index storage method. More specifically, it relates to an index structure that, in order to store and retrieve data efficiently given limited memory and other system resources, secures a space in which a plurality of data items are stored and indexes each data value to the storage space of the corresponding data.

As the Internet has developed and the number of users has grown, a variety of services are provided over the Internet, and users make use of many types of services.

Meanwhile, as usage of the services offered by Internet service providers rises, vast amounts of data are being created, modified, or deleted. As the volume of this information grows, efficient information management and systems that can quickly deliver the information users request become increasingly important to service providers.

Accordingly, the index structure commonly used in recent information retrieval systems is one that, given a keyword, finds the information in which that keyword appears. To this end, the index maintains data for each keyword.

This data consists of the identifier of a data item in which the keyword occurs and the positions of the occurrences within that item, so its contents vary with how often the keyword occurs. Moreover, even for the same keyword the occurrence positions differ from one data item to the next, so the contents of the individual data entries are not uniform.

Therefore, the index must have a storage structure that can be updated efficiently as data is added, deleted, or modified.

Given this need, the simplest way to store an index is to use a database table whose records consist of a keyword and its data. However, because the same keyword is stored redundantly as many times as it occurs, this method is known not only to require a large amount of storage space but also to perform poorly.

For this reason, much research has gone into methods that store the index in an index structure rather than in a database table, that is, methods that build an index over the keywords and have the pointer field of each leaf node point to the data.
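As a rough illustration of the keyword index described above, the following Python sketch maps each keyword to a list of (data identifier, occurrence positions) pairs. The function name `build_inverted_index` and the sample documents are assumptions made for this example; the patent does not prescribe any particular code.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each keyword to a list of (data_id, positions), sorted by data_id."""
    index = defaultdict(dict)
    for data_id, text in documents.items():
        for position, word in enumerate(text.split()):
            index[word].setdefault(data_id, []).append(position)
    # Each keyword is stored once, not once per occurrence as in a flat table.
    return {word: sorted(postings.items()) for word, postings in index.items()}

# Hypothetical sample data: data identifier -> text.
docs = {2: "index stores data", 1: "data about data"}
inverted = build_inverted_index(docs)
```

Compared with a table of (keyword, data) records, the keyword appears only once as a key, which is the storage saving the paragraph refers to.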

In the attached FIG. 1, reference numeral 10 denotes the index over keywords, and reference numeral 11 denotes the storage space in which the data is stored.

The inverted index storage structure shown in FIG. 1 does not use the database table approach described above, and experiments show it to be superior in both storage efficiency and performance. Much research has been conducted on how to allocate the storage space 11 when the data keeps growing because of dynamic additions.

However, the schemes proposed so far have not considered the problem of finding specific data quickly, nor data management and retrieval in an environment where data is dynamically updated through additions, deletions, and modifications.

As a result, keyword search performance for specific data is poor.

To solve the above problems, an object of the present invention is to provide an index storage structure using sub-indexes and large objects in which, to improve data retrieval performance, specific data can be found quickly in a data list and the data list can efficiently be kept sorted in data identifier order.

To solve the above problem, a feature of the present invention is an index structure that secures a space in which a plurality of data lists are stored and indexes each keyword to the storage space of the corresponding data list, wherein each data list is stored in a large object and is matched and connected to its own sub-index, which indexes the identifiers in that data list.

As an additional feature of the present invention for achieving the above object, the sub-index is connected to the data list and, in indexing data identifiers, points to the data indirectly through an offset array.

As a further additional feature for achieving the above object, the data list consists of a physical layer that stores an object identifier, a data identifier, and occurrence position information.

Another feature of the present invention for achieving the above object is a method of inserting new data in an inverted index storage structure using sub-indexes and large objects, in which a plurality of data lists, each consisting of a physical layer that stores an object identifier, a data identifier, and occurrence position information, are stored in large objects, and a plurality of sub-indexes that index data identifiers indirectly through an offset array are matched and connected to the respective data lists. The method comprises: a first step of checking whether the offset array has enough space for the new data; a second step of doubling the size of the current offset array if the first step finds the space insufficient; a third step of, if the first step finds the space sufficient or after the second step has been performed, allocating one empty offset array element for the identifier of the new data and determining whether that identifier is larger than the identifiers stored so far; a fourth step of appending the new data to the end of the existing data list if the third step finds the identifier larger than the identifiers stored so far; a fifth step of inserting the new data at the byte offset determined by searching the sub-index if the third step finds the identifier not larger than the identifiers stored so far; and a seventh step of, after the fourth or fifth step, updating the offset array elements that record the byte offsets of the new data and of all data whose byte offsets have changed, and then inserting the added identifier and the number of the allocated offset array element into the sub-index.

Yet another feature of the present invention for achieving the above object is a method of deleting data in an inverted index storage structure using sub-indexes and large objects, in which a plurality of data lists, each consisting of a physical layer that stores an object identifier, a data identifier, and occurrence position information, are stored in large objects, and a plurality of sub-indexes that index data identifiers indirectly through an offset array are matched and connected to the respective data lists. The method comprises: a first step of searching the sub-index to find the storage location of the data containing the identifier of the deleted document and deleting that data; a second step of updating the offset array elements that record the byte offsets of the data deleted in the first step and of all data whose storage location has changed; and a third step of deleting the sub-index entry that pointed to the offset array element of the deleted data.

Yet another feature of the present invention for achieving the above object is a method of retrieving data in the same kind of inverted index storage structure, in which a plurality of data lists, each consisting of a physical layer that stores an object identifier, a data identifier, and occurrence position information, are stored in large objects, and a plurality of sub-indexes that index data identifiers indirectly through an offset array are matched and connected to the respective data lists. The method comprises: a first step of searching the keyword index by keyword to find the large object in which the data list corresponding to that keyword is stored; a second step of determining whether a specific data identifier is present; a third step of, if a specific data identifier is present, searching the sub-index to find the given data, in order to check whether the given keyword is contained in it, and reading and returning the data found; and a fourth step of, if no specific data identifier is present, sequentially reading and returning the data list stored in the large object.

Yet another feature of the present invention for achieving the above object is the structure of a database system matched with an inverted index storage structure using sub-indexes and large objects, in which a plurality of data lists, each consisting of a physical layer that stores an object identifier, a data identifier, and occurrence position information, are stored in large objects, and a plurality of sub-indexes that index data identifiers indirectly through an offset array are matched and connected to the respective data lists. The database system holds data identifiers and a plurality of database objects, and each database object uses the object identifier and the data identifier as addresses, giving it a physical form that matches the data in the data list.

The index storage structure using sub-indexes and large objects according to the present invention, operating as described above, offers the following advantages.

First, it efficiently supports dynamic updates of the index. The data list of one keyword is treated as one large object and managed with a large object management technique, so storage space can be managed at minimal cost even when data is dynamically added, deleted, or modified and the length of the data list varies widely. That is, even when data is inserted into or deleted from the middle of a large object, the other data before and after it need not be physically moved.

Second, data deletion performance improves. With the sub-index, the entry containing the identifier of deleted data can be found quickly in the data list and removed, which speeds up document deletion. In particular, a keyword with a long data list is one that appears in most data, so deletion from long data lists is a significant factor in overall deletion performance.

Third, data modification performance improves. Modifying data amounts to deleting the existing data and adding the modified data, so better deletion performance also means better modification performance. In addition, when the modified data is inserted into the data list, it must go at the position of the existing data identifier to keep the list sorted by data identifier. Inserting through the sub-index maintains that order automatically, which improves modification performance.

Fourth, keyword search performance for specific data improves. With the sub-index, specific data can be found quickly in the data list for a given keyword, which speeds up checking whether a given keyword is contained in a given data item.

Fifth, database retrieval and information retrieval can be processed together. A database management system combined with information retrieval must be able to integrate database retrieval over structured data with information retrieval over the content of the data. In the new index storage structure, the data identifier slot holds the identifier of the database object that stores the structured data, which allows information retrieval and database retrieval to be integrated.

FIG. 1 shows the storage structure of a conventional index scheme.
FIG. 2 is a simplified schematic of a system in which the index storage structure according to the present invention is implemented.
FIG. 3 shows the storage structure of the index scheme according to the present invention.
FIG. 4 illustrates an example of the offset method used in the sub-index shown in FIG. 3.
FIG. 5 illustrates the process of adding data using the index storage structure according to the present invention.
FIG. 6 illustrates the process of deleting data using the index storage structure according to the present invention.
FIG. 7 illustrates the process of retrieving data using the index storage structure according to the present invention.
FIGS. 8(a) and 8(b) illustrate the data matching relationship between the index storage structure according to the present invention and a database system.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

First, to summarize the technical gist of the present invention: in an environment where data is frequently added, deleted, and modified dynamically, consistent management of the data is important, and this requires database management system functions such as concurrency control and crash recovery.

Retrieval of data comprises retrieval over the structured data that the data carries and retrieval over unstructured data. Database management systems support the former well, and information retrieval systems support the latter well. Consistent management and integrated retrieval of data therefore require combining data retrieval with a database management system.

In such a combined system, the index structure must be updated efficiently when data is dynamically added to, deleted from, or modified in the database, and data retrieval through the index and database retrieval over structured data must be processed in an integrated manner.

The index storage structure of the present invention, proposed in response to this need, is applied in the hardware environment shown in the attached FIG. 2. The attached FIG. 3 shows the index storage structure newly proposed by the present invention.

The index storage structure shown in FIG. 3 differs from the conventional index structure in that each data list is stored in a large object and each is provided with a sub-index keyed on data identifiers.

Specifically, reference numeral 20 in FIG. 3 denotes a B+ tree index whose key is the keyword. Given a keyword, it finds the large object (reference numeral 21) in which that keyword's data list is stored. A large object is an object whose size exceeds the disk page size, such as full text, image, audio, or video data, and the data list for one keyword is treated as one large object.

When data lists are stored in large objects in this way, data lists of diverse and variable length are managed automatically by the large object management technique.

Reference numeral 22 in FIG. 3 denotes the sub-index devised in the present invention, one of which is built for each large object. The sub-index 22 is an index built over the data list stored in the large object with the data identifier as its key. Its purpose is to keep the data list automatically sorted in data identifier order and to allow specific data to be found quickly in the data list.

That is, when data is modified, the modified data can be inserted quickly into the data list in data identifier order through the sub-index; when data is deleted, the deleted data can be removed quickly from the data list through the sub-index; and when a keyword search is run against specific data, the entry containing that data's identifier can be found quickly in the data list through the sub-index.

Using the sub-index therefore improves the performance of data modification, deletion, and keyword search for specific data. The sub-index need not always be built: to reduce overhead, it can be built only for data lists longer than a certain length, and it can be implemented as a B+ tree.
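The idea of building a sub-index only for sufficiently long data lists could look like the following minimal sketch. The threshold `SUB_INDEX_MIN_LENGTH` and both function names are hypothetical, and a plain dictionary stands in for the B+ tree mentioned in the text.

```python
SUB_INDEX_MIN_LENGTH = 4  # hypothetical cut-off; the text does not give a value

def build_sub_index(entries):
    """Return data_id -> position for long lists, or None when the list is short."""
    if len(entries) < SUB_INDEX_MIN_LENGTH:
        return None
    return {data_id: i for i, (data_id, _) in enumerate(entries)}

def find(entries, data_id, sub_index=None):
    """Locate one entry, using the sub-index when it exists and a scan otherwise."""
    if sub_index is not None:
        i = sub_index.get(data_id)
        return None if i is None else entries[i]
    return next((e for e in entries if e[0] == data_id), None)

# Hypothetical data lists of (data_id, positions) pairs.
short = [(1, [0]), (2, [5])]
long_ = [(i, [i]) for i in range(1, 7)]
```

Short lists are cheap to scan, so skipping the sub-index for them avoids the maintenance overhead without hurting lookups much.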

The detailed storage structure of the sub-index in the proposed index structure is described with reference to the attached FIG. 4. The key of the sub-index is the data identifier, and what the sub-index points to is the location within the large object where each data entry is stored.

However, if the sub-index pointed directly to the storage location of each entry, then inserting new data or deleting existing data in the middle of the large object would change the storage locations of all entries stored after it, and every sub-index entry pointing to them would have to be updated.

To solve this, the present invention maintains an offset array at the front of the data list. Each element of the offset array points to the storage location of the corresponding data. The storage location is a byte offset measured from the end of the offset array, so even if the size of the offset array changes, the storage location of the data, that is, the byte offset value, does not change. In addition, because the sub-index points to the offset array element that records each entry's storage location, the sub-index is unaffected even when the storage location of the data changes.
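The indirection through the offset array can be sketched as follows. This is a toy in-memory model under stated assumptions: fixed 4-byte entries, a `bytearray` for the data area, and a dictionary in place of the sub-index; none of these names or sizes come from the patent.

```python
# Hypothetical model of one large object: an offset array followed by the data area.
offsets = [0, 4, 8]                 # slot -> byte offset, measured from the end of the offset array
body = bytearray(b"AAAABBBBDDDD")   # three 4-byte entries for data IDs 10, 20, 40
sub_index = {10: 0, 20: 1, 40: 2}   # data ID -> slot number (a B+ tree in the text)

def read(data_id, size=4):
    """Follow sub-index -> offset array element -> bytes in the data area."""
    start = offsets[sub_index[data_id]]
    return bytes(body[start:start + size])

# Insert a 4-byte entry for data ID 30 between the entries for 20 and 40.
insert_at = offsets[sub_index[40]]
body[insert_at:insert_at] = b"CCCC"
snapshot = dict(sub_index)
# Only offset array elements change: entries at or after the insertion point shift by 4.
offsets = [off + 4 if off >= insert_at else off for off in offsets]
offsets.append(insert_at)
sub_index[30] = len(offsets) - 1
```

After the insertion, the existing sub-index entries still hold the same slot numbers; only the offset array was rewritten, which is the point of the indirection.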

FIG. 5 shows the algorithm for inserting one new data entry when data is added or modified, using the index structure shown in FIGS. 3 and 4.

Referring to FIG. 5, step S101 determines whether the offset array has enough space for storing the new data. If the space is found to be insufficient, the process proceeds to step S102, where the current offset array is doubled in size to make room.

If step S101 finds the space sufficient, or after step S102 has doubled the offset array, the process proceeds to step S103, where one empty offset array element is allocated for the identifier of the new data.

While the new element is being allocated in step S103, step S104 checks whether the identifier of the new data is larger than the identifiers stored so far. If it is larger than the existing identifiers, the process proceeds to step S106 and the new data is appended to the end of the list.

If step S104 finds that the identifier is not larger than the existing data identifiers, the process proceeds to step S105, where the new data is inserted at the byte offset determined by searching the sub-index. The byte offsets of all data stored after the inserted data then grow by the size of the inserted data. Because of the way large objects are managed, however, the physical location of the data itself does not change.

After the new data has been inserted or appended in step S105 or step S106, step S107 updates the offset array elements that record the byte offsets of the inserted (or appended) data and of all data whose byte offsets have changed.

Finally, step S108 inserts the identifier of the added data and the number of the allocated offset array element into the sub-index.
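Steps S101 to S108 can be sketched roughly as below. This is an illustrative model, not the patented implementation: the class name `DataList`, the `bytearray` data area, and the sorted list of (data_id, slot) pairs standing in for the B+ tree sub-index are all assumptions, and data identifiers are assumed to be unique. Unlike a real large object manager, a `bytearray` does move bytes physically on insertion.

```python
import bisect

class DataList:
    """Toy model of one keyword's data list held in a large object."""

    def __init__(self, capacity=2):
        self.offsets = [None] * capacity   # offset array: slot -> byte offset into body
        self.body = bytearray()            # data area that follows the offset array
        self.sub_index = []                # sorted (data_id, slot) pairs, stand-in for a B+ tree

    def insert(self, data_id, payload):
        # S101/S102: double the offset array when no free element is left.
        if None not in self.offsets:
            self.offsets.extend([None] * max(1, len(self.offsets)))
        # S103: allocate one free offset array element.
        slot = self.offsets.index(None)
        # S104: compare the new identifier with the largest existing one.
        if not self.sub_index or data_id > self.sub_index[-1][0]:
            position = len(self.body)      # S106: append at the end of the list
        else:
            # S105: the sub-index search yields the byte offset of the successor entry.
            i = bisect.bisect_left(self.sub_index, (data_id,))
            position = self.offsets[self.sub_index[i][1]]
        self.body[position:position] = payload
        # S107: shift the recorded offsets of entries at or after the insertion point.
        for s, off in enumerate(self.offsets):
            if off is not None and off >= position:
                self.offsets[s] = off + len(payload)
        self.offsets[slot] = position
        # S108: register (identifier, slot number) in the sub-index.
        bisect.insort(self.sub_index, (data_id, slot))
        return slot

dl = DataList(capacity=2)
dl.insert(10, b"aa")
dl.insert(30, b"cccc")
dl.insert(20, b"bbb")   # triggers doubling, then a mid-list insertion
```

In this run the third insertion changes the offset recorded for data ID 30, while the slot number stored for it in the sub-index stays the same.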

The above describes inserting one new data entry. FIG. 6 shows the process for deletion, that is, removing from the list the one entry that contains the identifier of data that has been deleted.

Step S201 searches the sub-index to find the storage location of the entry containing the identifier of the deleted data, and step S202 deletes that entry.

The byte offsets of all data stored after the deleted entry then shrink by the size of the deleted entry. As with insertion, the physical location of the data itself does not change.

Accordingly, step S203 updates the offset array elements that record the byte offsets of the deleted entry and of all data whose storage location has changed, and step S204 deletes the sub-index entry that pointed to the offset array element of the deleted entry. The offset array element that pointed to the deleted entry can be reused when new data is inserted.
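Steps S201 to S204 might be sketched as follows. The function name `delete_entry` and the in-memory layout are hypothetical; in particular, the rule that an entry extends up to the next recorded offset is an assumption made so that the example can determine the entry's size.

```python
def delete_entry(body, offsets, sub_index, data_id):
    """Remove one entry; body is a bytearray, offsets a list, sub_index a dict data_id -> slot."""
    # S201: search the sub-index to locate the entry through its offset array element.
    slot = sub_index[data_id]
    start = offsets[slot]
    # Assumption: an entry runs up to the next recorded offset (or the end of the data area).
    later = [off for off in offsets if off is not None and off > start]
    end = min(later) if later else len(body)
    # S202: delete the entry's bytes.
    del body[start:end]
    # S203: every entry stored after the deleted one moves down by the deleted size.
    for s, off in enumerate(offsets):
        if off is not None and off > start:
            offsets[s] = off - (end - start)
    # S204: drop the sub-index entry; the freed element can be reused by a later insertion.
    offsets[slot] = None
    del sub_index[data_id]

# Hypothetical state: entries for data IDs 10, 20, 30 stored as "aa", "bbb", "cccc".
body = bytearray(b"aabbbcccc")
offsets = [0, 5, 2, None]
sub_index = {10: 0, 30: 1, 20: 2}
delete_entry(body, offsets, sub_index, 20)
```

Marking the freed element as `None` is one simple way to make it available for reuse.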

FIG. 7 shows the process of retrieving data using the index storage structure of the present invention, in which data can be added or deleted through the processes of FIGS. 5 and 6.

First, in step S301, the keyword index is searched with the given keyword as the key to find the large object in which that keyword's data list is stored.

Next, step S302 branches on whether a specific data identifier has been given. If one has been given, the process proceeds to step S303. Because the query must determine whether the given keyword occurs in the given data, step S303 searches the sub-index to find that data, and step S304 reads and returns the data found.

If no specific data identifier has been given, the process proceeds to step S305. Because all data in which the given keyword occurs must be returned, step S305 reads the data list stored in the large object sequentially and returns it.
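The two retrieval paths of steps S301 to S305 can be sketched as below. The dictionaries are simplified stand-ins for the keyword B+ tree, the large object, and the sub-index, and the offset array is omitted for brevity; the function name `search` and the sample data are assumptions for illustration.

```python
def search(keyword_index, keyword, data_id=None):
    # S301: look up the keyword to find its large object (None if the keyword is unknown).
    large_object = keyword_index.get(keyword)
    if large_object is None:
        return []
    entries, sub_index = large_object["entries"], large_object["sub_index"]
    # S302: branch on whether a specific data identifier was supplied.
    if data_id is not None:
        # S303/S304: probe the sub-index and return only the matching entry.
        slot = sub_index.get(data_id)
        return [] if slot is None else [entries[slot]]
    # S305: no identifier given, so read the whole data list sequentially.
    return list(entries)

# Hypothetical index: one keyword whose data list holds (data_id, positions) pairs.
keyword_index = {
    "index": {"entries": [(1, [4]), (3, [0, 7])], "sub_index": {1: 0, 3: 1}},
}
```

Calling `search(keyword_index, "index", 3)` follows the sub-index path, while `search(keyword_index, "index")` returns the full list in data identifier order.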

The processes of adding new data and of deleting and retrieving specific data have been described above.

Next, the method of processing information retrieval and database retrieval together using data identifiers and object identifiers is described with reference to the attached FIGS. 8(a) and 8(b).

FIG. 8(a) shows the matching relationship between a data list in the index structure of the present invention and a database object in the database system. A data list entry consists of an object identifier, a data identifier, and occurrence position information; a database object consists of a data identifier, a data number, an author, a creation date, and content.

Here, the database object uses the object identifier and the data identifier as addresses, and has a physical form that matches the data in the data list.

The interconnection and usage relationship between the data list and the database object having the matching relationship shown in FIG. 8(a) is illustrated in FIG. 8(b).

FIG. 8(b) is drawn under the assumption that two data entries and two database objects exist. Data 1 (400a) and data 2 (400b) each store a data identifier together with an object identifier and are physically sorted by data identifier, and their object identifiers point to object 1 (500b) and object 2 (500a), respectively.

In addition, object 1 (500b) and object 2 (500a) each store a data identifier, whose values equal the data identifiers of data 1 (400a) and data 2 (400b), respectively. When an information retrieval query is processed and data is obtained, the corresponding database object can be read through the object identifier stored in that data, so that a database query can be processed. Conversely, when a database query is processed and a database object is obtained, an information retrieval query can be processed by searching the sub-index with the corresponding data identifier as the key value. Information retrieval and database search can therefore be processed in an integrated manner.
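The two directions of this cross-reference can be sketched as follows. The identifiers, field names, and sample values (`"oid-A"`, `author`, and so on) are hypothetical and chosen only for illustration. The sub-index search is again approximated by a binary search over the sorted data identifiers.

```python
from bisect import bisect_left

# Data list for one keyword, physically sorted by data identifier.
# Each entry is (data identifier, object identifier).
data_list = [(1, "oid-A"), (2, "oid-B")]

# Database objects addressed by object identifier; each object also
# stores the data identifier of its matching data-list entry.
db_objects = {
    "oid-A": {"data_id": 1, "author": "kim", "contents": "first"},
    "oid-B": {"data_id": 2, "author": "lee", "contents": "second"},
}


def objects_for_hits(hits):
    """Retrieval result -> database objects, via stored object identifiers."""
    return [db_objects[oid] for _, oid in hits]


def hit_for_object(obj):
    """Database object -> data-list entry, using its data identifier as key."""
    ids = [d for d, _ in data_list]
    i = bisect_left(ids, obj["data_id"])
    if i < len(ids) and ids[i] == obj["data_id"]:
        return data_list[i]
    return None


print(objects_for_hits(data_list)[1]["author"])  # lee
print(hit_for_object(db_objects["oid-A"]))       # (1, 'oid-A')
```

Because each side stores the other side's identifier, neither direction requires a scan of the other store.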

15: CPU
10: Memory
25: Application program
30: Database management system area
35: Database query processing area
40: Information retrieval system area
45: Database object buffer
50: Data list buffer
55: Database

Claims

1. An index storage structure using sub-indexes and a large-capacity object, the index structure securing a space in which a plurality of data lists are stored and indexing, according to an input keyword, into the storage space of the corresponding data list, wherein a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists.

2. The reverse index storage structure according to claim 1, wherein, when the sub-indexes are connected to the data lists stored in the large-capacity object, a sub-index is selectively connected only to a data list of a predetermined length or longer.

3. A method of inserting new data in an index storage structure using sub-indexes and a large-capacity object, in which a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists, the method comprising: a first step of determining whether the space in the offset array for the new data is insufficient; a second step of doubling the size of the current offset array if the first step determines that the space in the offset array is insufficient; a third step of, if the first step determines that the space is not insufficient or after the second step has been performed, allocating one empty offset array element to the new data and determining whether the new data identifier is larger than the existing data identifiers; a fourth step of adding the new data to the end of the existing data list if the third step determines that the new data identifier is larger than the existing data identifiers; a fifth step of inserting the new data at the byte offset determined through a search of the sub-index if the third step determines that the new data identifier is not larger than the existing data identifiers; a sixth step of, after the fourth step or the fifth step has been performed, correcting the values of the offset array elements that record the byte offsets of the new data and of all data whose byte offsets have changed; and a seventh step of inserting the identifier of the added data and the allocated offset array element number into the sub-index.

4. A method of deleting data in an index storage structure using sub-indexes and a large-capacity object, in which a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists, the method comprising: a first step of searching the sub-index to find the storage location of the data having the identifier of the data to be deleted, and deleting that data; a second step of correcting the values of the offset array elements that record the byte offsets of the data deleted in the first step and of all data whose storage locations have changed; and a third step of deleting the sub-index entry pointing to the offset array element of the deleted data.

5. A method of searching for data in an index storage structure using sub-indexes and a large-capacity object, in which a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists, the method comprising: a first step of searching a keyword index based on an input keyword to find the large-capacity object in which the data list corresponding to the keyword is stored; a second step of determining whether a specific data identifier has been given; a third step of, if the second step determines that a specific data identifier has been given, searching the sub-index to locate the given specific data and reading and returning the located data; and a fourth step of, if the second step determines that no specific data identifier has been given, sequentially reading and returning the data list stored in the large-capacity object.

6. A structure of a database system matched with an index storage structure using sub-indexes and a large-capacity object, in which a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists, wherein a plurality of database objects, each composed of a data identifier, a data number, an author, a creation date, and contents, exist in the database system, and each database object uses an object identifier and a data identifier as addresses and has a physical form matching a specific object identifier.

7. A method of integrating database management and information retrieval by means of an index storage structure using sub-indexes and a large-capacity object, the method operating in a database management system comprising an index storage structure in which a plurality of data lists, each including an object identifier, a physical identifier for storing a data identifier, and occurrence position information, are stored in a large-capacity object, and a plurality of sub-indexes, each indirectly indexing data identifiers by using an offset array, are matched and connected to the respective data lists, and a database system in which a plurality of database objects, each composed of a data identifier, a data number, an author, a creation date, and contents, exist, each database object using an object identifier and a data identifier as addresses and having a physical form matching a specific object identifier, the method comprising: a new data insertion mode of doubling the size of the current offset array when the space in the offset array for the new data is insufficient, adding the new data to the end of the existing data list if the new data identifier is larger than the existing data identifiers and otherwise inserting the new data at the byte offset determined through a search of the sub-index, then correcting the values of the offset array elements that record the byte offsets of all data whose byte offsets have changed, and inserting the identifier of the added data and the allocated offset array element number into the sub-index; a data deletion mode of searching the sub-index to find the storage location of the data having the identifier of the data to be deleted, deleting that data, correcting the values of the offset array elements that record the byte offsets of the deleted data and of all data whose storage locations have changed, and deleting the corresponding sub-index entry; and a data retrieval mode of searching a keyword index based on an input keyword to find the large-capacity object in which the data list corresponding to the keyword is stored, determining whether a specific data identifier has been given, searching the sub-index to locate, read, and return the specific data if the identifier has been given, and sequentially reading and returning the data list stored in the large-capacity object if it has not.