KR102104496B1

KR102104496B1 - Method and apparatus of searching data

Info

Publication number: KR102104496B1
Application number: KR1020130107503A
Authority: KR
Inventors: 김항규; 김기성; 이형동; 김형주; 문봉기
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2013-09-06
Filing date: 2013-09-06
Publication date: 2020-04-24
Also published as: KR20150028934A

Abstract

An embodiment of the present invention relates to a data retrieval method and apparatus capable of reducing the time required to search Resource Description Framework (RDF) data. The data retrieval method may include: receiving at least one piece of triple data corresponding to a graph including nodes and edges; generating an index for a subgraph of the graph using the triple data; receiving query triple data corresponding to a query graph to be searched for in the graph; filtering the triple data using the query triple data and the generated index; searching for the query triple data using the filtered triple data; and outputting the search result.

Description

Data retrieval method and device {METHOD AND APPARATUS OF SEARCHING DATA}

An embodiment of the present invention relates to a data retrieval method and apparatus, and more particularly, to a data retrieval method and apparatus capable of reducing the time required to search Resource Description Framework (RDF) data.

The Resource Description Framework (RDF) was introduced as a standard for the Semantic Web. RDF data is widely used in fields that require flexible, schema-free data description. Accordingly, RDF data is widely used to describe large volumes of data in fields that require graph-shaped data description, such as bioinformatics, metadata, Wikipedia, and social networks.

In addition, the SPARQL standard was introduced as a query language for large accumulated volumes of RDF data. Accordingly, interest in methods of processing SPARQL queries over RDF data is increasing.

Most RDF stores keep data in units of triples, each of which includes three pieces of information (Subject, Predicate, Object) corresponding to a subject, a verb, and an object. Accordingly, when a SPARQL query is input to an RDF store, a number of join operations may be performed over the data stored as triples in order to process the query. A join operation can take longer than other kinds of operations. Therefore, the speed of the join operations can determine the query processing speed of the RDF store.

To reduce the execution time of the join operations performed when processing a query over graph-shaped data, one can reduce the number of join operations, improve the efficiency of each join operation, or reduce the amount of data subject to the join operations.

Reducing the number of join operations is the approach mainly used in Jena or Oracle. In this approach, tables produced by joining frequently used patterns may be stored separately in advance. However, it is very difficult to determine in advance the range over which joins should be performed, and null values and multi-valued entries may arise, which can degrade performance.

Improving the efficiency of join operations is the approach mainly used in SW-Store, Hexastore, and RDF-3X. In this approach, several indexes, such as SPO (Subject-Predicate-Object), PSO (Predicate-Subject-Object), and OPS (Object-Predicate-Subject), may be generated in advance for the triples. This can speed up access to the triples. In addition, join operations can be performed more efficiently when merge joins are performed over these indexes.

In the approach of reducing the amount of data subject to join operations, a filter such as U-SIP (Ubiquitous Sideways Information Passing) may be used to exclude from the input data any data that will not participate in the join. RDF-3X also uses this approach.

In addition, SPARQL query processing systems such as GRIN index, DOGMA, PIG, and gStore propose indexing techniques. However, the indexes in these systems serve to narrow the search range, and there may be limits to applying them to reduce the amount of data subject to join operations.

An embodiment of the present invention may provide a data retrieval method and apparatus capable of reducing the time required to search Resource Description Framework (RDF) data.

An embodiment of the present invention may provide a data retrieval method and apparatus capable of reducing the execution time of the join operations performed when processing a query over graph-shaped data.

An embodiment of the present invention may provide a data retrieval method and apparatus capable of reducing the amount of data subject to the join operations performed when processing a query over graph-shaped data.

An embodiment of the present invention may provide a data retrieval method and apparatus capable of improving the speed at which join operations over graph-shaped data are performed.

An embodiment of the present invention may provide a data retrieval method and apparatus capable of reducing the amount of graph-shaped data subject to join operations by using information about the structure of a graph including nodes and edges.

A data retrieval method according to an embodiment of the present invention may include: receiving at least one piece of triple data corresponding to a graph including nodes and edges; generating an index for a subgraph of the graph using the triple data; receiving query triple data corresponding to a query graph to be searched for in the graph; filtering the triple data using the query triple data and the generated index; searching for the query triple data using the filtered triple data; and outputting the search result.

In addition, generating the index for a subgraph of the graph using the triple data may include: reconstructing the graph using the triple data; selecting an arbitrary subgraph included in the graph; generating the index for the selected subgraph; and repeating the steps from selecting an arbitrary subgraph included in the graph through generating the index for the selected subgraph.

In addition, generating the index for the selected subgraph may include: calculating the number of times the selected subgraph is matched in the graph; and generating the index for the selected subgraph when the calculated count is greater than or equal to a reference count.

In addition, the repeating may include repeating the steps from selecting the subgraph through generating the index for a subgraph that is included in the graph and differs from the selected subgraph.

In addition, generating the index for a subgraph of the graph using the triple data may include generating a list of the nodes of the graph that correspond to each node included in the subgraph.

In addition, generating the index for a subgraph of the graph using the triple data may include generating a list of the nodes of the graph that correspond to the nodes adjacent to each edge included in the subgraph.

In addition, the subgraph may include at least two of the edges included in the graph.

In addition, filtering the triple data using the query triple data and the generated index may include: reconstructing the query graph using the query triple data; determining the order in which join operations are performed on the triple data corresponding to each edge included in the query graph; and filtering the triple data subject to each join operation using the generated index.

In addition, filtering the triple data subject to each join operation using the generated index may include: generating, using the generated index, at least one corresponding node list containing the nodes of the graph that correspond to the nodes adjacent to each edge included in the query graph; generating a common node list containing the nodes that appear in every corresponding node list; and filtering out, from among the triple data corresponding to each edge included in the query graph, all triple data other than the triple data corresponding to the nodes included in the common node list.
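The filtering described above can be illustrated with a minimal Python sketch. It assumes that the index has already produced several corresponding node lists for one query node; the function names `common_nodes` and `filter_triples`, the second candidate list, and the sample triples are hypothetical and are used only for illustration.

```python
def common_nodes(candidate_lists):
    """Intersect the corresponding node lists obtained for one query node."""
    sets = [set(candidates) for candidates in candidate_lists]
    return set.intersection(*sets) if sets else set()


def filter_triples(triples, predicate, position, allowed):
    """Keep triples with the given predicate whose node at `position`
    (0 = subject, 2 = object) is in the common node list."""
    return [t for t in triples if t[1] == predicate and t[position] in allowed]


# First list: Vlist(gp1, ?v1) as given in the description of FIG. 4.
# Second list: hypothetical, standing in for a list from another indexed subgraph.
lists_for_query_node = [{"v3", "v8", "v14"}, {"v3", "v20"}]
common = common_nodes(lists_for_query_node)

p1_triples = [("v3", "p1", "v4"), ("v8", "p1", "v9"), ("v14", "p1", "v15")]
kept = filter_triples(p1_triples, "p1", 0, common)
```

Under these assumptions only the triple whose subject survives the intersection is passed on to the join operation, so the join input shrinks from three triples to one.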

In addition, searching for the query triple data using the filtered triple data may include: performing the join operations using the filtered triple data; and searching for the query triple data in the result of the join operations.

A data retrieval apparatus according to an embodiment of the present invention may include: an input unit that receives at least one piece of triple data corresponding to a graph including nodes and edges, and receives query triple data corresponding to a query graph to be searched for in the graph; an index generation unit that generates an index for a subgraph of the graph using the triple data; a filtering unit that filters the triple data using the query triple data and the generated index; a search unit that searches for the query triple data using the filtered triple data; and an output unit that outputs the search result.

In addition, the index generation unit may reconstruct the graph using the triple data, select an arbitrary subgraph included in the graph, generate an index for the selected subgraph, and repeat the steps from selecting an arbitrary subgraph included in the graph through generating the index for the selected subgraph.

In addition, the index generation unit may calculate the number of times the selected subgraph is matched in the graph, and may generate the index for the selected subgraph when the calculated count is greater than or equal to a reference count.

In addition, the index generation unit may repeat the steps from selecting the subgraph through generating the index for a subgraph that is included in the graph and differs from the selected subgraph.

In addition, the index may include a list of the nodes of the graph that correspond to each node included in the subgraph.

In addition, the index may include a list of the nodes of the graph that correspond to the nodes adjacent to each edge included in the subgraph.

In addition, the filtering unit may reconstruct the query graph using the query triple data, determine the order in which join operations are performed on the triple data corresponding to each edge included in the query graph, and filter the triple data subject to each join operation using the generated index.

In addition, the filtering unit may generate, using the generated index, at least one corresponding node list containing the nodes of the graph that correspond to the nodes adjacent to each edge included in the query graph; may generate a common node list containing the nodes that appear in every corresponding node list; and may filter out, from among the triple data corresponding to each edge included in the query graph, all triple data other than the triple data corresponding to the nodes included in the common node list.

In addition, the search unit may perform the join operations using the filtered triple data, and may search for the query triple data in the result of the join operations.

According to an embodiment of the present invention, the time required to search Resource Description Framework (RDF) data can be reduced.

According to an embodiment of the present invention, the execution time of the join operations performed when processing a query over graph-shaped data can be reduced.

According to an embodiment of the present invention, the amount of data subject to the join operations performed when processing a query over graph-shaped data can be reduced.

According to an embodiment of the present invention, the speed at which join operations over graph-shaped data are performed can be improved.

According to an embodiment of the present invention, the amount of graph-shaped data subject to join operations can be reduced by using information about the structure of a graph including nodes and edges.

FIG. 1 is a flowchart illustrating a process in which a data retrieval method according to an embodiment of the present invention is performed.
FIG. 2 is a diagram showing an entire graph according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process in which the step of generating an index for a subgraph is performed according to an embodiment of the present invention.
FIG. 4 is a diagram showing an index generated using the graph shown in FIG. 2 according to an embodiment of the present invention.
FIG. 5 is a diagram showing a query graph according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a process in which the step of filtering triple data is performed according to an embodiment of the present invention.
FIG. 7 is a reference diagram for explaining the step of determining the order in which join operations are performed on query triple data according to an embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of a data retrieval apparatus according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims. The same reference numerals refer to the same components throughout the specification.

Although terms such as "first" and "second" are used to describe various components, the components are not limited by these terms. These terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" or "comprising" means that the stated component or step does not exclude the presence or addition of one or more other components or steps.

Unless otherwise defined, all terms used in this specification may be interpreted with the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

Hereinafter, a data retrieval method and a data retrieval apparatus 100 according to an embodiment of the present invention will be described in detail with reference to FIGS. 1 to 8.

FIG. 1 is a flowchart illustrating a process in which a data retrieval method according to an embodiment of the present invention is performed. Referring to FIG. 1, in the data retrieval method according to an embodiment of the present invention, a step (S100) of receiving at least one piece of triple data corresponding to a graph including nodes and edges may be performed first.

A graph may include nodes and edges that connect the nodes to each other. An edge included in the graph may have a direction. For example, a first node and a second node may be connected to each other by an edge, and the edge may be directed from the first node toward the second node. In this case, the first node may be referred to as a departure node, and the second node may be referred to as an arrival node. When the graph includes a plurality of edges, the graph may include at least three nodes.

Each piece of data stored in a Resource Description Framework (RDF) store may have a triple format that includes three pieces of information (Subject, Predicate, Object) corresponding to a subject, a verb, and an object. Accordingly, each piece of data stored in the RDF store may be referred to as triple data.

Each edge included in the graph may be represented by triple data. For example, the departure node may correspond to the subject among the three pieces of information included in the triple data. The edge may correspond to the verb among the three pieces of information. The arrival node may correspond to the object among the three pieces of information. Since one edge can be represented by one piece of triple data, the step (S100) of receiving triple data may receive as many pieces of triple data as there are edges in the graph.

FIG. 2 is a diagram showing an entire graph according to an embodiment of the present invention. Referring to FIG. 2, the entire graph may include four separate graphs R1, R2, R3, and R4. In the step (S100) of receiving triple data, the triple data corresponding to the entire graph may be received. In other words, the triple data corresponding to each edge included in the entire graph may be received, and the number of pieces of triple data received may equal the number of edges included in the entire graph.

Fifteen pieces of triple data, one for each edge included in the entire graph shown in FIG. 2, may be received. For example, four triples T1, T2, T3, and T4 corresponding to the edges included in graph R1 may be received. These may be expressed as, for example, T1 = {v1, p3, v2}, T2 = {v5, p4, v2}, T3 = {v2, p2, v3}, and T4 = {v3, p1, v4}.
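As an illustration, the four triples of graph R1 can be written down and turned back into a directed, edge-labelled graph with a few lines of Python. This is only a sketch: the tuple layout, the adjacency-dictionary representation, and the name `build_graph` are assumptions made for the example and are not taken from the specification.

```python
from collections import defaultdict

# Triples of graph R1 as listed above: (subject, predicate, object).
R1_TRIPLES = [
    ("v1", "p3", "v2"),  # T1
    ("v5", "p4", "v2"),  # T2
    ("v2", "p2", "v3"),  # T3
    ("v3", "p1", "v4"),  # T4
]


def build_graph(triples):
    """Rebuild the graph as {departure node: [(edge label, arrival node), ...]}."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(R1_TRIPLES)
# One triple per edge, so the edge count equals the number of triples.
edge_count = sum(len(out_edges) for out_edges in graph.values())
```

Nodes that only appear as arrival nodes, such as v4, have no entry of their own in this representation because they have no outgoing edge.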

Referring back to FIG. 1, a step (S110) of generating an index for a subgraph of the graph using the triple data may be performed next. FIG. 3 is a flowchart illustrating a process in which the step (S110) of generating an index for a subgraph is performed according to an embodiment of the present invention.

Referring to FIG. 3, in the step (S110) of generating an index for a subgraph, a step (S111) of reconstructing the graph using the triple data may be performed first. For example, the entire graph shown in FIG. 2 may be reconstructed using the triple data.

Next, a step (S112) of selecting an arbitrary subgraph included in the graph may be performed. For example, from the entire graph shown in FIG. 2, a subgraph including only edge p1 of graph R1 may be selected. The size of a subgraph may be equal to the number of edges included in the subgraph. Accordingly, the size of the subgraph including only edge p1 may be 1.

Referring back to FIG. 3, a step (S113) of calculating the number of times the selected subgraph is matched in the graph may be performed next. For example, the number of times edge p1 is matched in the entire graph shown in FIG. 2 may be calculated. Referring to FIG. 2, graphs R1, R2, and R3 each include one p1 edge, so the number of times edge p1 is matched in the entire graph may be 3.
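For a subgraph of size 1, counting matches reduces to counting the triples that carry the given edge label, as the following Python sketch shows. The three p1 edges use the departure and arrival nodes that the description gives for R1, R2, and R3, and the p2 edge is T3 of R1; the remaining edges of FIG. 2 are not reproduced here, so the list is a partial, illustrative data set, and the function name is hypothetical.

```python
# Partial data: the three p1 edges of R1, R2, R3 plus one p2 edge of R1.
TRIPLES = [
    ("v2", "p2", "v3"),
    ("v3", "p1", "v4"),
    ("v8", "p1", "v9"),
    ("v14", "p1", "v15"),
]


def count_single_edge_matches(triples, predicate):
    """Count how many edges in the graph carry the given label."""
    return sum(1 for _, p, _ in triples if p == predicate)


p1_matches = count_single_edge_matches(TRIPLES, "p1")
p2_matches = count_single_edge_matches(TRIPLES, "p2")
```

Counting matches of larger subgraphs requires pattern matching over several connected edges and is not covered by this sketch.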

Referring back to FIG. 3, a step (S114) of generating an index for the selected subgraph when the calculated count is greater than or equal to a reference count may be performed next. Because an index is generated only in that case, indexes are generated only for subgraphs that occur relatively frequently in the entire graph. Allocating the limited index storage space to such frequently occurring subgraphs can improve efficiency.

For example, when the reference count is 3, an index may be generated for the subgraph including only edge p1. FIG. 4 is a diagram showing an index generated using the graph shown in FIG. 2 according to an embodiment of the present invention. For example, as the index for the subgraph including only edge p1, a corresponding node list may be generated, that is, a list of the nodes of the graph that can correspond to each node included in the subgraph. In other words, a list may be generated of the nodes of the graph that correspond to the nodes adjacent to each edge included in the subgraph.

For example, since the departure nodes of the p1 edges included in graphs R1, R2, and R3 are v3, v8, and v14, respectively, a corresponding node list Vlist(gp1, ?v1) = {v3, v8, v14}, which lists the nodes that can correspond to the first node of subgraph gp1, may be generated as shown in FIG. 4. Likewise, since the arrival nodes of the p1 edges included in graphs R1, R2, and R3 are v4, v9, and v15, respectively, a corresponding node list Vlist(gp1, ?v2) = {v4, v9, v15}, which lists the nodes that can correspond to the second node of subgraph gp1, may be generated as shown in FIG. 4.
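The two corresponding node lists for gp1 can be reproduced with a short Python sketch that also applies the reference-count check of step S114. The function name `build_single_edge_index`, the dictionary layout, and the use of sets are assumptions made for illustration; only the p1 edges and T3 of R1 are included in the sample data.

```python
# Partial data: the three p1 edges named in the description plus T3 of R1.
TRIPLES = [
    ("v2", "p2", "v3"),
    ("v3", "p1", "v4"),
    ("v8", "p1", "v9"),
    ("v14", "p1", "v15"),
]


def build_single_edge_index(triples, predicate, reference_count):
    """Return the Vlists for a size-1 subgraph, or None if it is too rare."""
    matches = [(s, o) for s, p, o in triples if p == predicate]
    if len(matches) < reference_count:
        return None  # below the reference count: no index is generated
    return {
        "?v1": {s for s, _ in matches},  # candidates for the first node
        "?v2": {o for _, o in matches},  # candidates for the second node
    }


gp1_index = build_single_edge_index(TRIPLES, "p1", reference_count=3)
gp2_index = build_single_edge_index(TRIPLES, "p2", reference_count=3)
```

With a reference count of 3, the p1 subgraph is indexed, while the p2 edge, which occurs only once in this partial data, is skipped.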

Referring back to FIG. 3, an operation (S115) of repeating the operations from selecting an arbitrary subgraph included in the graph (S112) through generating an index for the selected subgraph (S114) may then be performed. In other words, the process may be repeated for a subgraph that is included in the graph and differs from the previously selected subgraph. In this way, operations S112 through S114 may be repeated for each of the subgraphs included in the entire graph.

The finally completed index may be as shown in FIG. 4. Referring to FIG. 4, the completed index may be sorted by subgraph size. The maximum size of a subgraph for which an index is generated may be determined in advance. For example, as shown in FIG. 4, the maximum size of a subgraph for which an index is generated may be set to 3 in advance.
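The size cap and the size-based ordering can be illustrated with the short sketch below. It assumes that subgraph size is measured as the number of edges; `g_large` is a hypothetical candidate added only to show the cap, and the function name is likewise an assumption.

```python
# Assumption: subgraph size is measured as its number of edges.
MAX_SUBGRAPH_SIZE = 3

# gp1 and gp5 follow the text; g_large is a hypothetical oversized candidate.
candidates = {
    "gp5": ("p3", "p4", "p2"),
    "g_large": ("p1", "p2", "p3", "p4"),
    "gp1": ("p1",),
}

def order_index_entries(candidates, max_size):
    """Drop subgraphs above the size cap, then sort the rest by size."""
    kept = {name: edges for name, edges in candidates.items()
            if len(edges) <= max_size}
    return sorted(kept, key=lambda name: (len(kept[name]), name))

index_order = order_index_entries(candidates, MAX_SUBGRAPH_SIZE)
```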

As described above, because the maximum size of a subgraph for which an index is generated is determined in advance and indexes are generated only for subgraphs that occur relatively often in the entire graph, the time and space required to generate the index can be reduced.

Referring back to FIG. 3, an operation (S120) of receiving query triple data corresponding to a query graph to be searched for in the graph may then be performed. FIG. 5 is a diagram showing a query graph according to an embodiment of the present invention. Query triple data corresponding to each edge included in the query graph may thus be received. That is, the number of query triples received may equal the number of edges included in the query graph.

Referring back to FIG. 3, an operation (S130) of filtering the triple data using the query triple data and the generated index may then be performed. FIG. 6 is a flowchart showing how the operation (S130) of filtering the triple data is performed according to an embodiment of the present invention.

Referring to FIG. 6, in the operation (S130) of filtering the triple data, an operation (S131) of regenerating the query graph using the query triple data may be performed first. For example, the query graph shown in FIG. 5 may be regenerated using the query triple data.
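Regenerating a query graph from query triple data can be sketched as building adjacency maps, one entry per triple. The four query triples below are an assumption: they are inferred from the edges p1 to p4 discussed in this description rather than copied from FIG. 5, and the variable name `?v5` in particular is hypothetical.

```python
from collections import defaultdict

# Assumed query triples (one per query edge); not taken verbatim from FIG. 5.
query_triples = [
    ("?v1", "p3", "?v2"),
    ("?v5", "p4", "?v2"),
    ("?v2", "p2", "?v3"),
    ("?v3", "p1", "?v4"),
]

def build_query_graph(query_triples):
    """Return outgoing and incoming adjacency maps of the query graph."""
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for subject, predicate, obj in query_triples:
        out_edges[subject].append((predicate, obj))
        in_edges[obj].append((predicate, subject))
    return out_edges, in_edges

out_edges, in_edges = build_query_graph(query_triples)
edge_count = sum(len(edges) for edges in out_edges.values())
```

The edge count of the rebuilt graph equals the number of query triples, matching the one-triple-per-edge correspondence described for operation S120.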

Referring back to FIG. 6, an operation (S132) of determining the order in which join operations are performed on the triple data corresponding to each edge included in the query graph may then be performed. FIG. 7 is a reference diagram for explaining the operation (S132) of determining the order of the join operations on the triple data according to an embodiment of the present invention.

Referring to FIG. 7, for example, a join operation may first be performed on the triple data corresponding to edge p1 and the triple data corresponding to edge p2. Next, a join operation may be performed on the result of that join and the triple data corresponding to edge p3. Finally, a join operation may be performed on the result of the second join and the triple data corresponding to edge p4.
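The join order described for FIG. 7 has the shape of a left-deep plan, in which each join takes the previous result as its left input. A minimal sketch, assuming the plan is represented as nested tuples (a representation chosen here for illustration only):

```python
from functools import reduce

# Join order from the text: p1 with p2, then p3, then p4.
join_order = ["p1", "p2", "p3", "p4"]

def left_deep_plan(order):
    """Nest the inputs so each join consumes the previous join's result."""
    return reduce(lambda left, right: (left, right), order)

plan = left_deep_plan(join_order)
```

The innermost pair is evaluated first, so `plan` reads from the inside out as (p1 join p2), then p3, then p4.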

Referring back to FIG. 6, an operation (S133) of filtering the triple data subject to each join operation using the generated index may then be performed.

For example, the triple data corresponding to edge p1 may be filtered first. Referring to FIG. 5, the third node (?v3) is located at the start of edge p1 and the fourth node (?v4) is located at its end. In addition, since the subgraph including edges p3, p4, and p2 is located at the point corresponding to the start node of edge p1, the third node (?v3) is located at one end of the subgraph including edges p3, p4, and p2.

Referring to the index shown in FIG. 4, to compute the list of nodes that can correspond to the start node of edge p1, the corresponding node list Vlist(gp1, ?v1) for the first node (?v1) of subgraph gp1 and the corresponding node list Vlist(gp5, ?v4) for the fourth node (?v4) of subgraph gp5 may be referenced. Vlist(gp5, ?v4) is referenced because the subgraph including edges p3, p4, and p2 is located at the point corresponding to the start node of edge p1.

The corresponding node list Vlist(gp1, ?v1) contains v3, v8, and v14, and the corresponding node list Vlist(gp5, ?v4) contains v3 and v18. In the query graph, the start node of edge p1 is both the first node (?v1) of subgraph gp1 and the fourth node (?v4) of subgraph gp5, so only nodes contained in both Vlist(gp1, ?v1) and Vlist(gp5, ?v4) can correspond to the start node of edge p1. Therefore, only v3 can correspond to the start node of edge p1.
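The narrowing step above is a set intersection of the two corresponding node lists. A minimal sketch using the values given in the text (the helper name `common_nodes` is an assumption):

```python
# Corresponding node lists as given in the text.
vlist_gp1_v1 = ["v3", "v8", "v14"]
vlist_gp5_v4 = ["v3", "v18"]

def common_nodes(*node_lists):
    """Keep only nodes present in every list, in the order of the first list."""
    shared = set(node_lists[0]).intersection(*node_lists[1:])
    return [node for node in node_lists[0] if node in shared]

start_candidates = common_nodes(vlist_gp1_v1, vlist_gp5_v4)
```

Only v3 survives, since v8 and v14 are missing from Vlist(gp5, ?v4) and v18 is missing from Vlist(gp1, ?v1).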

In addition, to compute the list of nodes that can correspond to the end node of edge p1, the corresponding node list Vlist(gp1, ?v2) for the second node (?v2) of subgraph gp1 may be referenced. Vlist(gp1, ?v2) contains v4, v9, and v15. Of these, the only end node paired with v3, the node corresponding to the start node of edge p1, is v4 of graph R1, so all triple data for edge p1 other than {v3, p1, v4} may be filtered out.
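Filtering the p1 triples with the candidate lists can be sketched as follows. The triples and candidate sets come from the text; the function name `filter_triples` is hypothetical.

```python
# All triple data for edge p1, as listed in the text.
p1_triples = [("v3", "p1", "v4"), ("v8", "p1", "v9"), ("v14", "p1", "v15")]

def filter_triples(triples, start_candidates, end_candidates):
    """Keep triples whose start and end nodes are both still candidates."""
    return [
        (subject, predicate, obj)
        for subject, predicate, obj in triples
        if subject in start_candidates and obj in end_candidates
    ]

# Start candidates: the intersection {v3}; end candidates: Vlist(gp1, ?v2).
kept_p1 = filter_triples(p1_triples, {"v3"}, {"v4", "v9", "v15"})
```

Of the three p1 triples, only {v3, p1, v4} remains as input to the join.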

When the triple data for edge p2 is filtered in a similar manner, all triple data for edge p2 other than {v2, p2, v3} and {v17, p2, v18} may be filtered out. Likewise, all triple data for edge p3 other than {v1, p3, v2} and {v16, p3, v17} may be filtered out, and all triple data for edge p4 other than {v5, p4, v2} and {v19, p4, v17} may be filtered out.

Referring back to FIG. 3, an operation (S140) of performing the join operations using the filtered triple data may then be performed. In the above example, the number of triples subject to the join operations may be 1 + 2 + 2 + 2 = 7.

If the join operations are performed without filtering the triple data as described above, they are performed using all the triple data for each edge. In this case, the number of triples subject to the join operations may be 3 + 5 + 4 + 3 = 15.

Now suppose that the triple data is filtered not with the index generated as described above, but simply with lists of the nodes that can correspond to the start node and the end node of each edge. In this case, the number of triples subject to the join operations may be 3 + 4 + 3 + 1 = 11.

Therefore, when an index is generated for each of the subgraphs including different numbers of edges and the triple data is filtered using the generated index, as in the embodiment of the present invention, the number of triples subject to the join operations can be the smallest among the compared methods.
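The three totals compared above can be checked with a few lines of arithmetic. The per-edge counts are copied from the sums in the text in the order they are written; the strategy labels are descriptive names chosen here.

```python
# Per-edge triple counts, in the order written in the text.
join_inputs = {
    "no filtering": [3, 5, 4, 3],
    "start/end node lists only": [3, 4, 3, 1],
    "subgraph index": [1, 2, 2, 2],
}

totals = {strategy: sum(counts) for strategy, counts in join_inputs.items()}
smallest = min(totals, key=totals.get)

# Share of join input removed by the subgraph index relative to no filtering.
reduction = 1 - totals["subgraph index"] / totals["no filtering"]
```

In this example the subgraph index cuts the join input from 15 triples to 7, a reduction of roughly 53 percent.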

Referring back to FIG. 3, an operation (S150) of retrieving the query triple data from the result of the join operations may then be performed. In the above example, the triple data corresponding to graph R1 of FIG. 2 may be retrieved from the result of the join operations.
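The join and retrieval operations (S140, S150) can be sketched over the seven filtered triples listed earlier. This is a simple nested-loop binding join written for illustration, not the join algorithm of the embodiment; the query pattern is an assumption inferred from the description, and the variable name `?v5` is hypothetical.

```python
# Filtered triple data per edge, as (start node, end node) pairs from the text.
filtered = {
    "p1": [("v3", "v4")],
    "p2": [("v2", "v3"), ("v17", "v18")],
    "p3": [("v1", "v2"), ("v16", "v17")],
    "p4": [("v5", "v2"), ("v19", "v17")],
}

# Assumed query pattern, listed in the join order p1, p2, p3, p4.
query = [
    ("?v3", "p1", "?v4"),
    ("?v2", "p2", "?v3"),
    ("?v1", "p3", "?v2"),
    ("?v5", "p4", "?v2"),
]

def join(query, filtered):
    """Extend variable bindings edge by edge, dropping inconsistent ones."""
    results = [{}]
    for start_var, predicate, end_var in query:
        extended = []
        for binding in results:
            for start, end in filtered[predicate]:
                # An unbound variable accepts any node; a bound one must agree.
                if (binding.get(start_var, start) == start
                        and binding.get(end_var, end) == end):
                    extended.append({**binding, start_var: start, end_var: end})
        results = extended
    return results

matches = join(query, filtered)
```

Under these assumptions a single binding survives, covering v1 through v5, which is consistent with the statement that the match lies in graph R1.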

Referring back to FIG. 3, an operation (S160) of outputting the retrieved result may then be performed.

FIG. 8 is a block diagram showing the configuration of a data retrieval apparatus 100 according to an embodiment of the present invention. Referring to FIG. 8, the data retrieval apparatus 100 according to an embodiment of the present invention may include an input unit 110, an index generation unit 120, a filtering unit 130, a search unit 140, and an output unit 150.

The input unit 110 may receive at least one piece of triple data corresponding to a graph including nodes and edges. The input unit 110 may also receive query triple data corresponding to a query graph to be searched for in the graph. The input unit 110 may be, for example, a communication device such as a network adapter, or another input device. Details of the input unit 110 correspond to the operation (S100) of receiving the triple data and the operation (S120) of receiving the query triple data described above, so a detailed description is omitted.

The index generation unit 120 may generate an index for a subgraph of the graph using the triple data. The index generation unit 120 may be, for example, a computing device such as a central processing unit, a database, a server, or a terminal device. Details of the index generation unit 120 correspond to the operation (S110) of generating the index described above, so a detailed description is omitted.

The filtering unit 130 may filter the triple data using the query triple data and the generated index. The filtering unit 130 may be, for example, a computing device such as a central processing unit, a database, a server, or a terminal device. Details of the filtering unit 130 correspond to the operation (S130) of filtering the triple data described above, so a detailed description is omitted.

The search unit 140 may retrieve the query triple data using the filtered triple data. The search unit 140 may be, for example, a computing device such as a central processing unit, a database, a server, or a terminal device. Details of the search unit 140 correspond to the operation (S140) of performing the join operations and the operation (S150) of retrieving the query triple data described above, so a detailed description is omitted.

The output unit 150 may output the retrieved result. The output unit 150 may be, for example, a communication device such as a network adapter, or another output device. Details of the output unit 150 correspond to the operation (S160) of outputting the retrieved result described above, so a detailed description is omitted.
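The division of work among units 110 to 150 can be pictured as a small pipeline. The class below is a hypothetical skeleton with one method per unit; it is deliberately simplified (the index covers single-edge subgraphs only, and the search is a plain lookup instead of a join) and does not represent the apparatus of the embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class DataSearchDevice:
    """Hypothetical skeleton: one method per unit 110 to 150."""
    triples: list = field(default_factory=list)
    index: dict = field(default_factory=dict)

    def receive(self, triples):
        """Input unit (110): accept (subject, predicate, object) triples."""
        self.triples.extend(triples)

    def generate_index(self):
        """Index generation unit (120): start/end node sets per edge label."""
        self.index.clear()
        for subject, predicate, obj in self.triples:
            starts, ends = self.index.setdefault(predicate, (set(), set()))
            starts.add(subject)
            ends.add(obj)

    def filter(self, predicate, allowed_starts):
        """Filtering unit (130): keep triples whose start node is a candidate."""
        starts, _ = self.index.get(predicate, (set(), set()))
        candidates = starts & set(allowed_starts)
        return [t for t in self.triples
                if t[1] == predicate and t[0] in candidates]

    def search(self, filtered, end_node):
        """Search unit (140): look up matches among the filtered triples."""
        return [t for t in filtered if t[2] == end_node]

    def output(self, results):
        """Output unit (150): format the results as plain text."""
        return "\n".join(" ".join(t) for t in results)

device = DataSearchDevice()
device.receive([("v3", "p1", "v4"), ("v8", "p1", "v9"), ("v14", "p1", "v15")])
device.generate_index()
report = device.output(device.search(device.filter("p1", {"v3", "v18"}), "v4"))
```

Keeping each unit behind its own method mirrors the block diagram of FIG. 8, so any one stage could be replaced (for example, with a real join in the search unit) without touching the others.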

According to the embodiments of the present invention described above, the time required to retrieve Resource Description Framework (RDF) data can be reduced. The execution time of the join operations performed when processing a query on graph-structured data can also be reduced, as can the amount of data subject to those join operations. The speed of join operations on graph-structured data can thereby be improved. Furthermore, by using information about the structure of a graph including nodes and edges, the amount of graph-structured data subject to the join operations can be reduced.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: data retrieval apparatus
110: input unit
120: index generation unit
130: filtering unit
140: search unit
150: output unit

Claims

A data retrieval method performed by a data retrieval apparatus, the method comprising:
receiving at least one piece of triple data corresponding to a graph including nodes and edges;
generating an index for a subgraph of the graph using the triple data;
receiving query triple data corresponding to a query graph to be searched for in the graph;
filtering the triple data using the query triple data and the generated index;
retrieving the query triple data using the filtered triple data; and
outputting the retrieved result.

The method of claim 1,
wherein generating the index for the subgraph of the graph using the triple data comprises:
regenerating the graph using the triple data;
selecting an arbitrary subgraph included in the graph;
generating an index for the selected subgraph; and
repeating the selecting of an arbitrary subgraph included in the graph through the generating of the index for the selected subgraph.

The method of claim 2,
wherein generating the index for the selected subgraph comprises:
calculating the number of times the selected subgraph is matched in the graph; and
generating the index for the selected subgraph when the calculated number of times is greater than or equal to a reference number.

The method of claim 2,
wherein the repeating comprises:
repeating the selecting of the subgraph through the generating of the index for a subgraph that is included in the graph and differs from the selected subgraph.

The method of claim 1,
wherein generating the index for the subgraph of the graph using the triple data comprises:
generating a list including, among the nodes included in the graph, the nodes corresponding to each node included in the subgraph.

The method of claim 1,
wherein generating the index for the subgraph of the graph using the triple data comprises:
generating a list including, among the nodes included in the graph, the nodes corresponding to the nodes adjacent to each edge included in the subgraph.

The method of claim 1,
wherein the subgraph includes at least two of the edges included in the graph.

The method of claim 1,
wherein filtering the triple data using the query triple data and the generated index comprises:
regenerating the query graph using the query triple data;
determining an order of performing join operations on the triple data corresponding to each edge included in the query graph; and
filtering the triple data subject to each join operation using the generated index.

The method of claim 8,
wherein filtering the triple data subject to each join operation using the generated index comprises:
generating, using the generated index, at least one corresponding node list including, among the nodes included in the graph, the nodes corresponding to the nodes adjacent to each edge included in the query graph;
generating a common node list including the nodes commonly included in the respective corresponding node lists; and
filtering out, from the triple data corresponding to each edge included in the query graph, the triple data other than the triple data corresponding to the nodes included in the common node list.

The method of claim 8,
wherein retrieving the query triple data using the filtered triple data comprises:
performing the join operations using the filtered triple data; and
retrieving the query triple data from the result of the join operations.

A data retrieval apparatus comprising:
an input unit configured to receive at least one piece of triple data corresponding to a graph including nodes and edges, and to receive query triple data corresponding to a query graph to be searched for in the graph;
an index generation unit configured to generate an index for a subgraph of the graph using the triple data;
a filtering unit configured to filter the triple data using the query triple data and the generated index;
a search unit configured to retrieve the query triple data using the filtered triple data; and
an output unit configured to output the retrieved result.

The data retrieval apparatus of claim 11,
wherein the index generation unit regenerates the graph using the triple data, selects an arbitrary subgraph included in the graph, generates an index for the selected subgraph, and repeats the selecting of an arbitrary subgraph included in the graph through the generating of the index for the selected subgraph.

The data retrieval apparatus of claim 12,
wherein the index generation unit calculates the number of times the selected subgraph is matched in the graph, and generates the index for the selected subgraph when the calculated number of times is greater than or equal to a reference number.

The data retrieval apparatus of claim 12,
wherein the index generation unit repeats the selecting of the subgraph through the generating of the index for a subgraph that is included in the graph and differs from the selected subgraph.

The data retrieval apparatus of claim 11,
wherein the index includes a list including, among the nodes included in the graph, the nodes corresponding to each node included in the subgraph.

The data retrieval apparatus of claim 11,
wherein the index includes a list including, among the nodes included in the graph, the nodes corresponding to the nodes adjacent to each edge included in the subgraph.

The data retrieval apparatus of claim 11,
wherein the subgraph includes at least two of the edges included in the graph.

The data retrieval apparatus of claim 11,
wherein the filtering unit regenerates the query graph using the query triple data, determines an order of performing join operations on the triple data corresponding to each edge included in the query graph, and filters the triple data subject to each join operation using the generated index.

The data retrieval apparatus of claim 18,
wherein the filtering unit generates, using the generated index, at least one corresponding node list including, among the nodes included in the graph, the nodes corresponding to the nodes adjacent to each edge included in the query graph, generates a common node list including the nodes commonly included in the respective corresponding node lists, and filters out, from the triple data corresponding to each edge included in the query graph, the triple data other than the triple data corresponding to the nodes included in the common node list.

The data retrieval apparatus of claim 18,
wherein the search unit performs the join operations using the filtered triple data, and retrieves the query triple data from the result of the join operations.