KR101714435B1

KR101714435B1 - Method of mining a pattern related to input pattern, apparatus performing the same and storage medium storing a program performing the same

Info

Publication number: KR101714435B1
Application number: KR1020150052735A
Authority: KR
Inventors: 윤은일; 양흥모; 이강인; 김동규; 황진규
Original assignee: 세종대학교산학협력단
Priority date: 2015-04-14
Filing date: 2015-04-14
Publication date: 2017-03-09
Also published as: KR20160122589A

Abstract

The present invention relates to an associative pattern mining method and server. The associative pattern mining method includes: allocating batches based on a set time interval; generating a tree-based data structure from data that occurred within the time range corresponding to each allocated batch; receiving a search time range and a search pattern; determining, based on the input search time range, the batches in which to search for the search pattern, and restructuring the tree-based data structure of each determined batch based on the input search pattern; and searching for associated patterns within the batch based on the restructured tree-based data structure.

Description

Method of mining a pattern related to an input pattern, apparatus performing the same, and storage medium storing a program performing the same

The present invention relates to an associative pattern mining method, an associative pattern mining server, and a recording medium recording a program that performs the method. More particularly, it relates to a method and apparatus for mining, within a specified time range, the patterns associated with an input pattern on the basis of a time-sensitive sliding window, and to a recording medium recording a program that performs the method.

In a data stream environment, mining patterns on the basis of a user-defined minimum threshold can find the patterns that satisfy that condition in real time. However, this approach cannot restrict the mining result to only those patterns associated with the specific pattern the user wants to analyze.

To mine the patterns associated with a specific pattern, the mining process must be run several times while changing the minimum threshold, and the user must inspect the mining result each time. Obtaining the desired result in this way can take a long time.

Meanwhile, approaches that mine patterns on the basis of a time interval or time difference do not let the user set the time range over which mining is performed and, like minimum-threshold-based approaches, cannot mine the patterns associated with an input pattern. In other words, these approaches can mine the patterns that satisfy a condition based on a minimum threshold, but they have difficulty efficiently discovering the patterns associated with an input pattern when no minimum threshold is set.

Korean Patent No. 10-1376444 relates to a pattern mining method that traverses a tree top-down while taking weights in a data stream into account. In that method, a pattern mining apparatus generates a first transaction in which the data stream is sorted in descending order of weight, inserts the first transaction into a WP-tree (Weighed Pattern-tree), traverses the WP-tree top-down, generates a second transaction using the traversed path, generates a conditional database that uses the second transaction as its database, and performs projection using the conditional database.

Korean Patent No. 10-0812378 relates to a method of finding frequent itemsets using a compressed prefix tree in a data stream environment, that is, in a continuously generated set of transaction data. It defines a data structure that is effective for finding frequent items in a data stream environment and uses this data structure to find the required information. The patent calls this data structure a compressed prefix tree structure. Compared with the prefix tree structures applied in conventional data mining, the compressed prefix tree merges or splits nodes during mining so that a single node manages multiple items, which allows the size of the tree to be adjusted dynamically and flexibly. When changes in the data stream cause large swings in the number of itemsets that are likely to become frequent, this dynamic adjustment merges and splits nodes of the prefix tree so as to maximize the accuracy of the mining result, that is, of the frequent itemsets found, within a limited memory space.

Korean Patent No. 10-1376444 (Mar. 13, 2014), pages 3 to 7
Korean Patent No. 10-0812378 (Mar. 4, 2008), pages 6 to 8

One embodiment of the present invention aims to provide an associative pattern mining method and server that can search for patterns within a set time range by placing data generated continuously in a stream environment into batches allocated at fixed time intervals, according to the time at which the data was generated.

One embodiment of the present invention aims to provide an associative pattern mining method and server that allocate batches at fixed time intervals using a time-sensitive sliding window and place the stream data generated within the time corresponding to each batch into that batch, enabling pattern search by time period.

One embodiment of the present invention aims to provide an associative pattern mining method and apparatus that, when a time range is set and a pattern is input, search for and provide the patterns associated with the input pattern batch by batch (that is, by time period), using only the stream data that occurred within that time range.

Among the embodiments, an associative pattern mining method includes: allocating batches based on a set time interval; generating a tree-based data structure from data that occurred within the time range corresponding to each allocated batch; receiving a search time range and a search pattern; determining, based on the input search time range, the batches in which to search for the search pattern, and restructuring the tree-based data structure of each determined batch based on the input search pattern; and searching for associated patterns within the batch based on the restructured tree-based data structure.

In one embodiment, allocating the batches may include: receiving a time interval for the batches; comparing the time difference between the start time of the most recently allocated batch and the current time with the input time interval; and, if the time difference is greater than the input time interval, allocating a new batch for the time following the point at which the input time interval has elapsed from the start time of the most recently allocated batch.
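
The allocation step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Batch` and `allocate_batches`, the use of plain numeric timestamps, the choice to start the first batch at the current time, and the loop that allocates several batches when more than one interval has elapsed are all hypothetical and are not taken from the specification.

```python
class Batch:
    """One batch of the sliding window (hypothetical helper type)."""

    def __init__(self, start):
        self.start = start        # start time of the batch
        self.transactions = []    # stream data that falls into this batch


def allocate_batches(batches, interval, now):
    """Allocate batches until `now` is covered by the most recent batch."""
    if not batches:
        # Assumption: the very first batch starts at the current time.
        batches.append(Batch(now))
        return batches
    # Compare (current time - start of the latest batch) with the interval;
    # while the gap is larger, allocate a batch starting one interval later.
    while now - batches[-1].start > interval:
        batches.append(Batch(batches[-1].start + interval))
    return batches


batches = allocate_batches([], interval=10, now=0)
batches = allocate_batches(batches, interval=10, now=25)
print([b.start for b in batches])  # [0, 10, 20]
```

Because the comparison is strictly "greater than", as in the text, a timestamp that lands exactly one interval after a batch start is still treated as belonging to the existing batch in this sketch.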

In one embodiment, the tree-based data structure may be organized separately for each batch.

In one embodiment, the tree-based data structure may include a header table made up of item entries that store item information, and a prefix tree made up of item nodes that store information about items.

In one embodiment, the header table may include an item name, a frequency count, and a link pointing to the node for that item in the prefix tree.

In one embodiment, an item node may include an item name, a node link pointing to the node for the same item that was created immediately before it, a pointer to its parent node, and a list of its child nodes.
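
A minimal sketch of the header table and prefix tree described above might look like this. The class names `ItemNode`, `HeaderEntry`, and `BatchTree` are hypothetical. The per-node `count` field and the lexicographic insertion order are assumptions borrowed from the usual FP-tree construction; the text above does not list a count among the node fields or state the sort order.

```python
class ItemNode:
    """Prefix-tree node: item name, node link, parent pointer, child list."""

    def __init__(self, item, parent=None, node_link=None):
        self.item = item
        self.parent = parent
        self.node_link = node_link  # previously created node for the same item
        self.children = []
        self.count = 0              # assumption: FP-tree style support count


class HeaderEntry:
    """Header-table entry: item name, frequency, link into the prefix tree."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.link = None            # most recently created node for this item


class BatchTree:
    """Tree-based data structure kept for a single batch."""

    def __init__(self):
        self.root = ItemNode(None)
        self.header = {}

    def insert(self, transaction):
        node = self.root
        # Assumption: items are inserted in lexicographic order.
        for item in sorted(set(transaction)):
            entry = self.header.setdefault(item, HeaderEntry(item))
            entry.count += 1
            child = next((c for c in node.children if c.item == item), None)
            if child is None:
                # Chain the new node to the previous node of the same item.
                child = ItemNode(item, parent=node, node_link=entry.link)
                node.children.append(child)
                entry.link = child
            child.count += 1
            node = child


tree = BatchTree()
for tx in (["a", "b", "c"], ["a", "b"], ["b", "c"]):
    tree.insert(tx)
print(tree.header["b"].count)  # 3
```

Following `header[item].link` and then each `node_link` visits every node of one item, which is what the later restructuring and search steps rely on.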

In one embodiment, restructuring the tree-based data structure based on the input search pattern may include restructuring it so that the item nodes corresponding to items in the input pattern become lower-level nodes of the item nodes corresponding to the other items.

In one embodiment, restructuring so that these nodes become lower-level nodes of the item nodes corresponding to the other items may include: sorting the items of the input pattern according to the sort order of the header table in the tree data structure; adjusting the order within the header table so that the items of the input pattern, keeping their sorted order, move below the other items; and selecting the sorted items of the input pattern in reverse order and changing the node order while traversing the item nodes in the prefix tree that correspond to the selected item.
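
The first two steps above, sorting the input pattern and moving its items to the bottom of the header order, can be illustrated with a short sketch. The function name `reorder_header` and the representation of the header table as a plain list of item names are assumptions made for this example.

```python
def reorder_header(header_order, pattern):
    """Move the pattern items below all other items.

    Returns the adjusted header order and the pattern sorted by the
    original header order. Relative order is preserved on both sides.
    """
    pattern_set = set(pattern)
    sorted_pattern = [item for item in header_order if item in pattern_set]
    others = [item for item in header_order if item not in pattern_set]
    return others + sorted_pattern, sorted_pattern


new_order, sorted_pattern = reorder_header(["a", "b", "c", "d", "e"], ["d", "b"])
print(new_order)       # ['a', 'c', 'e', 'b', 'd']
print(sorted_pattern)  # ['b', 'd']
```

The third step would then walk `sorted_pattern` in reverse, starting from `'d'` in this example.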

In one embodiment, changing the node order while traversing the item nodes corresponding to the selected item may include, if the item node corresponding to the selected item has child nodes, swapping the position of that item node with the child node that comes earliest in the sort order.

In one embodiment, changing the node order while traversing the item nodes corresponding to the selected item may further include, after the swap, merging the child node with a sibling node if a sibling node holding the same item as the child node exists.

In one embodiment, changing the node order while traversing the item nodes corresponding to the selected item may include, if the item node corresponding to the selected item has no child node, following the node link to visit the next item node.
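
The in-place swap-and-merge traversal described above is not reproduced here. As a simpler stand-in, the sketch below rebuilds a prefix tree from scratch by re-inserting every stored path under the adjusted item order; this is a different technique, and it is only expected to arrive at the same kind of final shape (pattern items at the bottom, equal siblings merged). The function name `restructure`, the nested-dictionary node layout, and the `paths` input format are all assumptions made for this example.

```python
def restructure(paths, new_order):
    """Rebuild a prefix tree with items ordered by `new_order`.

    `paths` maps each stored item tuple to the number of transactions
    that produced it. Each node is {"count": int, "children": {item: node}}.
    """
    rank = {item: i for i, item in enumerate(new_order)}
    root = {"count": 0, "children": {}}
    for path, count in paths.items():
        node = root
        # Re-sort the path so that pattern items come last.
        for item in sorted(path, key=rank.__getitem__):
            # setdefault reuses an existing sibling for the same item,
            # which plays the role of the merge step.
            node = node["children"].setdefault(
                item, {"count": 0, "children": {}}
            )
            node["count"] += count
    return root


paths = {("a", "b", "c"): 1, ("a", "b"): 1, ("b", "c"): 1}
tree = restructure(paths, new_order=["a", "c", "b"])  # pattern item "b" last
print(list(tree["children"]))  # ['a', 'c']
```

In the rebuilt tree every node for the pattern item `b` is a leaf, so all items that co-occur with `b` lie on the path between that leaf and the root.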

In one embodiment, searching for associated patterns within the batch based on the restructured tree-based data structure may include: accessing the header table entry that corresponds to the last item of the sorted input pattern; extracting every path from the item nodes linked from that entry up to the root node; and generating associated patterns by combining the input pattern with the remaining items present in the extracted paths.
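
The search step above can be sketched as follows, assuming a tree that has already been restructured. The names `Node`, `extract_paths`, and `associated_patterns` are hypothetical, the per-node `count` is an assumption, and enumerating every non-empty combination of the remaining items is one possible reading of "combining the input pattern with the remaining items".

```python
from collections import Counter
from itertools import combinations


class Node:
    """Minimal item node with a parent pointer and a node link."""

    def __init__(self, item, parent=None, node_link=None, count=0):
        self.item = item
        self.parent = parent
        self.node_link = node_link
        self.count = count  # assumption: support count of the path


def extract_paths(head):
    """Follow node links from `head`; collect each path up to the root."""
    paths = []
    node = head
    while node is not None:
        items, cur = [], node
        while cur is not None and cur.item is not None:
            items.append(cur.item)
            cur = cur.parent
        paths.append((items, node.count))
        node = node.node_link
    return paths


def associated_patterns(paths, pattern):
    """Combine the input pattern with the remaining items on each path."""
    pattern = frozenset(pattern)
    result = Counter()
    for items, count in paths:
        if not pattern <= set(items):
            continue  # this path does not contain the whole input pattern
        rest = sorted(set(items) - pattern)
        for size in range(1, len(rest) + 1):
            for combo in combinations(rest, size):
                result[pattern | frozenset(combo)] += count
    return result


# Restructured tree for {a,b,c}, {a,b}, {b,c} with pattern item "b" lowest.
root = Node(None)
a = Node("a", root)
c1 = Node("c", a)
b1 = Node("b", c1, count=1)
b2 = Node("b", a, node_link=b1, count=1)
c2 = Node("c", root)
b3 = Node("b", c2, node_link=b2, count=1)  # b3 is what the header links to

found = associated_patterns(extract_paths(b3), ["b"])
print(found[frozenset("ab")], found[frozenset("bc")], found[frozenset("abc")])  # 2 2 1
```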

In one embodiment, searching for associated patterns within the batch based on the restructured tree-based data structure may further include: summing, for each generated associated pattern, its frequency in each batch within the search time range; and removing any associated pattern whose summed frequency is 0.
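
Summing per-batch frequencies and dropping zero-frequency patterns can be illustrated as below. The function name `merge_batches` and the representation of each batch result as a dictionary from pattern to frequency are assumptions for this sketch.

```python
from collections import Counter


def merge_batches(per_batch_counts):
    """Sum each pattern's frequency over the batches; drop zero totals."""
    total = Counter()
    for counts in per_batch_counts:
        total.update(counts)  # adds the counts, keeping zero entries
    return {pattern: n for pattern, n in total.items() if n != 0}


per_batch = [
    {frozenset("ab"): 2, frozenset("bc"): 0},
    {frozenset("ab"): 1, frozenset("bc"): 0, frozenset("bd"): 3},
]
total = merge_batches(per_batch)
print(total[frozenset("ab")], total[frozenset("bd")])  # 3 3
```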

Among the embodiments, an associative pattern mining server includes a memory that stores computer-readable instructions and a processor that is electrically connected to the memory and, at a user's request, executes the following process through the stored instructions. The processor executes the steps of: allocating batches based on a set time interval; generating a tree-based data structure from data that occurred within the time range corresponding to each allocated batch; receiving a search time range and a search pattern; determining, based on the input search time range, the batches in which to search for the search pattern, and restructuring the tree-based data structure of each determined batch based on the input search pattern; and searching for associated patterns within the batch based on the restructured tree-based data structure.

In one embodiment, the processor may restructure the tree-based data structure so that the item nodes corresponding to items in the input pattern become lower-level nodes of the item nodes corresponding to the other items.

In one embodiment, the processor may sort the items of the input pattern according to the sort order of the header table in the tree data structure, adjust the order within the header table so that the items of the input pattern, keeping their sorted order, move below the other items, and select the sorted items of the input pattern in reverse order, changing the node order while traversing the item nodes in the prefix tree that correspond to the selected item.

In one embodiment, the processor may access the header table entry that corresponds to the last item of the sorted input pattern, extract every path from the item nodes linked from that entry up to the root node, and generate associated patterns by combining the input pattern with the remaining items present in the extracted paths.

In one embodiment, the processor may sum, for each generated associated pattern, its frequency in each batch within the search time range, and remove any associated pattern whose summed frequency is 0.

Among the embodiments, a recording medium storing a computer program for an associative pattern mining method contains a computer program that performs: a function of allocating batches based on a set time interval; a function of generating a tree-based data structure from data that occurred within the time range corresponding to each allocated batch; a function of receiving a search time range and a search pattern; a function of determining, based on the input search time range, the batches in which to search for the search pattern, and restructuring the tree-based data structure of each determined batch based on the input search pattern; and a function of searching for associated patterns within the batch based on the restructured tree-based data structure.

The associative pattern mining method and server according to one embodiment of the present invention can search for patterns within a set time range by placing data generated continuously in a stream environment into batches allocated at fixed time intervals, according to the time at which the data was generated.

The associative pattern mining method and server according to one embodiment of the present invention allocate batches at fixed time intervals using a time-sensitive sliding window and place the stream data generated within the time corresponding to each batch into that batch, which enables pattern search by time period.

When a time range is set and a pattern is input, the associative pattern mining method and server according to one embodiment of the present invention can search for and provide the patterns associated with the input pattern batch by batch (that is, by time period), using only the stream data that occurred within that time range.

FIG. 1 is a block diagram illustrating an associative pattern mining system according to one embodiment of the present invention.
FIG. 2 is a block diagram illustrating the associative pattern mining server of FIG. 1.
FIG. 3 is a flowchart illustrating the associative pattern mining method executed by the associative pattern mining server of FIG. 2.
FIG. 4 is a diagram illustrating the tree-based data structure.
FIG. 5 is a flowchart illustrating the process of building the tree-based data structure.
FIG. 6 is a diagram illustrating the process of restructuring the tree-based data structure.
FIG. 7 is a flowchart illustrating the process of searching, within a set time range, for associated patterns related to an input pattern.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described herein. That is, since the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. Likewise, the objects and effects presented in the present invention do not mean that a particular embodiment must include all of them or only them, so the scope of the present invention should not be understood as limited thereby.

Meanwhile, the terms used in this application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is described as being "connected" to another component, it may be connected directly to that component, but it should be understood that other components may exist in between. By contrast, when a component is described as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions that describe the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes for the steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, which includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art and, unless expressly defined in this application, should not be interpreted as having an ideal or excessively formal meaning.

FIG. 1 is a block diagram illustrating an associative pattern mining system according to one embodiment of the present invention.

Referring to FIG. 1, the associative pattern mining system (100) includes a user terminal (110) and an associative pattern mining server (120).

Under the user's control, the user terminal (110) can connect to the associative pattern mining server (120) and request associative pattern mining. The user terminal (110) displays on its screen the progress information or result information provided by the associative pattern mining server (120). The user terminal (110) may be a mobile terminal, a tablet PC, a laptop PC, or a desktop PC.

In a stream environment in which data is generated continuously, the associative pattern mining server (120) mines the associated patterns related to an input pattern at the request of the user terminal (110).

The associative pattern mining server (120) processes the data generated continuously in the stream environment on the basis of a time-sensitive sliding window, and the window consists of a user-defined number of batches. Each batch is allocated at a fixed time interval, and the stream data generated during the time corresponding to that interval is inserted into the batch.

The associative pattern mining server (120) inserts stream data into batches according to its generation time and generates a tree-based data structure from that data. Based on the generated tree-based data structure, the server can mine patterns within a set time range. That is, the associative pattern mining server (120) allocates batches at fixed time intervals using a time-sensitive sliding window and places the stream data generated within the time corresponding to each batch into that batch, thereby performing pattern mining by time period.

For example, the associative pattern mining server (120) allocates batches based on a set time interval and generates a tree-based data structure from the data that occurred within the time range corresponding to each allocated batch. When a search time range and a search pattern are input, the server determines, based on the input search time range, the batches in which to search for the search pattern and restructures the tree-based data structure of each determined batch based on the input search pattern. The server then searches for associated patterns within the batch based on the restructured tree-based data structure. The associative pattern mining server (120) can provide mining progress information and result information to the user terminal (110).
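
Choosing which batches to search for a given search time range might be sketched as follows. The function name `select_batches`, the half-open batch ranges `[start, start + interval)`, and the rule that any batch overlapping the search range is selected are assumptions; the description does not state how partially covered batches are handled.

```python
def select_batches(batch_starts, interval, search_start, search_end):
    """Return indices of batches whose time range overlaps the search range."""
    return [
        i
        for i, start in enumerate(batch_starts)
        if start < search_end and start + interval > search_start
    ]


# Batches start at 0, 10, 20 and 30; the search range is 12 to 25.
print(select_batches([0, 10, 20, 30], 10, 12, 25))  # [1, 2]
```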

FIG. 2 is a block diagram illustrating the associative pattern mining server of FIG. 1.

Referring to FIG. 2, the associative pattern mining server (120) includes a processor (or CPU) (210) that communicates with the other components through a bus (220). The processor (210) controls the operation of the other components and performs associative pattern mining together with them. The processor (210) is also electrically connected to the memory (230) and can perform associative pattern mining, at a user's request, through the instructions stored in the memory (230).

The associative pattern mining server (120) includes a memory (230) and a storage device (240), and the memory (230) includes a read-only memory (ROM) (232) and a random access memory (RAM) (234). Here, the memory (230) corresponds to a temporary or permanent computer-readable recording medium, and the storage device (240) corresponds to a permanent computer-readable recording medium. At least one of the memory (230) and the storage device (240) stores a computer program containing instructions for associative pattern mining.

The associative pattern mining server (120) includes a network interface (270) for communicating with a network (280). The network interface (270) sets up an environment in which information, data, and signals can be transmitted between the associative pattern mining server (120) and the network (280). Referring to FIG. 1, the associative pattern mining server (120) can be connected to the user terminal (110) through the network (280).

The association pattern mining server 120 may receive information from a user through a user interface input device 250 (for example, a mouse, a trackball, a touch pad, a graphic tablet, a scanner, a barcode scanner for scanning product barcodes, a touch screen, a keyboard, or a pointing device). The user interface input device 250 includes any mechanism by which information (for example, a transaction) can be entered into the association pattern mining server 120 or the network.

The association pattern mining server 120 may output information through a user interface output device 260. The user interface output device 260 may include, but is not limited to, a visual output device such as a display screen. The user interface output device 260 includes any mechanism capable of outputting information to a user, and may be combined with a single video output device or with an output device such as a speaker.

In one embodiment, the display screen may display information received from the association pattern mining server 120 and receive input from an operator. That is, the display screen may be implemented as both the user interface input device 250 and the user interface output device 260.

FIG. 3 is a flowchart illustrating the association pattern mining method executed by the association pattern mining server of FIG. 2.

Referring to FIG. 3, in a stream environment the association pattern mining server 120 allocates a batch (314) for every set time interval T_T (312), counted from the start time. In one embodiment, the association pattern mining server 120 may receive the time interval T_T through the user terminal 110.

Once a batch is allocated, the association pattern mining server 120 continuously reflects the stream data generated within the time range corresponding to that batch in the matching batch of the tree-based data structure 324 (125). In one embodiment, the tree-based data structure 324 may have a structure partitioned by batch.

Under the user's control, the user terminal 110 may receive a search time range and a search pattern and transmit them to the association pattern mining server 120. When the search time range and the search pattern are input (330), the association pattern mining server 120 first reconstructs the tree-based data structure 324 so that association patterns can be searched for with the input search pattern (hereinafter, the input pattern) as the reference (332).

Next, the association pattern mining server 120 calculates the batches corresponding to the set search time range. That is, the association pattern mining server 120 determines, based on the input search time range, the batches in which the search pattern is to be searched for.

The association pattern mining server 120 searches those batches of the tree-based data structure for at least one association pattern related to the input pattern (336) and provides the search results per batch (338). The user terminal 110 may display the received search results on a screen for the user.

Hereinafter, the association pattern mining method is described in detail with reference to FIG. 3.

Under the user's control, the user terminal 110 may receive the batch time interval T_T and transmit it to the association pattern mining server 120. When the batch time interval T_T is input (312), the association pattern mining server 120 allocates a batch for the current time (314). Then, in a stream environment in which data is generated continuously, the association pattern mining server 120 stores the start time of the most recently allocated batch and the current time in the variables T_P and T_C, respectively (316).

The association pattern mining server 120 continuously calculates the time difference between the start time T_P of the most recently allocated batch and the current time T_C, and compares the calculated difference with the set time interval T_T (318). If the difference between T_P and T_C is not smaller than (that is, is greater than) the set time interval T_T, the server allocates a batch again, this time for the time that lies T_T after the start time T_P of the previous batch (314).

If the time difference between the start time T_P and the current time T_C is smaller than the set time interval T_T, the association pattern mining server 120 checks whether new data has been received (320). If no new data has been received, the server returns to step 318 and again compares the time difference between T_P and T_C with the set time interval T_T.

When new stream data is received, the association pattern mining server 120 inserts the received data into the time-sensitive sliding window-based tree data structure 324 (322). The inserted stream data is reflected in the current-batch information of the tree data structure 324. Through this process, the association pattern mining server 120 can generate a tree-based data structure from the data generated within the time range corresponding to each allocated batch.
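The allocation loop of steps 312 to 318 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `BatchAllocator`, its methods, and the use of plain numeric timestamps are hypothetical and do not appear in the specification.

```python
class BatchAllocator:
    """Allocates fixed-length batches on a time axis (hypothetical helper)."""

    def __init__(self, start_time: float, interval: float):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval          # T_T: the set time interval
        self.batch_starts = [start_time]  # first batch allocated at the start (314)

    @property
    def t_p(self) -> float:
        """Start time of the most recently allocated batch (T_P)."""
        return self.batch_starts[-1]

    def advance(self, t_c: float) -> int:
        """Allocate batches until the current time t_c (T_C) falls in the latest one.

        Returns the index of the batch that covers t_c.
        """
        # Step 318: while T_C - T_P is not smaller than T_T, allocate the
        # next batch, which starts T_T after the previous batch's start.
        while t_c - self.t_p >= self.interval:
            self.batch_starts.append(self.t_p + self.interval)
        return len(self.batch_starts) - 1


allocator = BatchAllocator(start_time=0.0, interval=10.0)
first = allocator.advance(3.0)    # still inside the first batch
later = allocator.advance(25.0)   # two further batches are allocated
```

Because each new batch starts exactly T_T after the previous one, batch boundaries stay aligned to the start time even when data arrives late.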

FIG. 4 is a diagram illustrating the tree-based data structure.

Referring to FIG. 4, the time-sensitive sliding window-based tree data structure 410 consists of a header table 410 and a prefix tree 420. The tree data structure of FIG. 4 is an example in which two batches have been allocated.

The header table 410 consists of entries that store information about items, and each entry consists of an item name, a frequency (support), and a link. The link points to the most recently created node for that item in the prefix tree 420.

The prefix tree 420 consists of at least one item node. Each item node includes an item name, a node link pointing to the node created immediately before it for the same item, a pointer to its parent node, and a child node list for managing its set of child nodes. In addition, to support the time-sensitive sliding window, each item node includes a batch frequency array 430 that stores frequency information per batch.
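The fields listed above map naturally onto two small record types. The sketch below is only an illustration: the names `ItemNode` and `HeaderEntry` are hypothetical, the child node list is modelled as a dictionary keyed by item name for convenience, and setting an entry's support to the sum of the batch counts is an assumption rather than something the specification states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class ItemNode:
    item: str
    # Pointer to the parent node (excluded from repr to avoid cycles).
    parent: Optional["ItemNode"] = field(default=None, repr=False)
    # Node link: the node created immediately before this one for the same item.
    node_link: Optional["ItemNode"] = field(default=None, repr=False)
    # Child node list, modelled here as item name -> node (assumption).
    children: Dict[str, "ItemNode"] = field(default_factory=dict)
    # Batch frequency array (430): one counter per allocated batch.
    batch_counts: List[int] = field(default_factory=list)


@dataclass(eq=False)
class HeaderEntry:
    item: str
    support: int = 0
    link: Optional[ItemNode] = None  # most recently created node for the item


# A one-node tree holding item D with counts 7 and 5 in two batches.
root = ItemNode(item="<root>")
d = ItemNode(item="D", parent=root, batch_counts=[7, 5])
root.children["D"] = d
header = {"D": HeaderEntry(item="D", support=sum(d.batch_counts), link=d)}
```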

FIG. 5 is a flowchart illustrating the process of building the tree-based data structure.

Referring to FIG. 5, when a batch for the current time is allocated (512), the association pattern mining server 120 visits each item node in the tree data structure 324 (514), appends a new element to the end of the visited node's batch frequency array 430 (516), and sets its value to 0 (518). Through this process, a tree data structure to which the new batch has been allocated is built.

When new stream data is subsequently received (530), the association pattern mining server 120 sorts the items in the data according to the sort order of the header table 410 in the tree data structure 324 (532) and inserts each item into the tree data structure in turn (534). For the item node corresponding to each inserted item, the association pattern mining server 120 increments the last value of the batch frequency array by 1 (536).
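The two halves of FIG. 5 (opening a batch, then inserting transactions) can be sketched as below. This is a simplified illustration under assumptions: the class names `Node` and `BatchTree` are hypothetical, the header-table sort order is fixed up front, node links and the header table itself are omitted, and a node created in a later batch is given zero counts for the earlier batches.

```python
class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}      # item name -> Node
        self.batch_counts = []  # one counter per allocated batch


class BatchTree:
    def __init__(self, order):
        # Rank of each item in the (assumed fixed) header-table sort order.
        self.rank = {item: i for i, item in enumerate(order)}
        self.root = Node(None)
        self.num_batches = 0

    def _nodes(self):
        """Yield every item node in the tree (the root is excluded)."""
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children.values())

    def new_batch(self):
        """Steps 514-518: append a counter set to 0 to every item node."""
        for node in self._nodes():
            node.batch_counts.append(0)
        self.num_batches += 1

    def insert(self, transaction):
        """Steps 532-536: sort the items, walk the path, bump the last counter."""
        if self.num_batches == 0:
            raise RuntimeError("allocate a batch before inserting data")
        node = self.root
        for item in sorted(set(transaction), key=self.rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                child.batch_counts = [0] * self.num_batches
                node.children[item] = child
            child.batch_counts[-1] += 1
            node = child


tree = BatchTree(order=["D", "C", "A", "B", "E"])
tree.new_batch()
tree.insert(["C", "D"])
tree.new_batch()
tree.insert(["A", "D", "C"])
tree.insert(["D"])
d = tree.root.children["D"]
c = d.children["C"]
a = c.children["A"]
```

After these calls, each node's array holds its count in the first and second batch in that order.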

Referring again to FIG. 3, under the user's control the user terminal 110 may receive a search time range and a search pattern and transmit them to the association pattern mining server 120. When the search time range and the search pattern are input (330), the association pattern mining server 120 reconstructs the tree-based data structure so that the item nodes for the items in the input pattern become lower nodes of the item nodes for the other items (332).

FIG. 6 is a diagram illustrating the process of reconstructing the tree-based data structure.

Referring to FIG. 6, the association pattern mining server 120 first sorts the items of the input pattern (612) according to the sort order of the header table 410 in the tree data structure 324 (614).

The association pattern mining server 120 then adjusts the order of the header table 410 so that the items in the input pattern move below the other items while keeping their relative sort order (616). For example, if the sort order of the header table 410 of FIG. 4 is {D, C, A, B, E} and the sorted input pattern is {A, B}, the association pattern mining server 120 adjusts the sort order of the header table 410 to {D, C, E, A, B}.
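The order adjustment of step 616 amounts to a stable partition of the header-table order. A minimal sketch, in which the function name `reorder_header` is hypothetical and the header table is reduced to a plain list of item names:

```python
def reorder_header(order, pattern):
    """Move the pattern's items to the end of `order`, keeping relative order."""
    in_pattern = set(pattern)
    others = [item for item in order if item not in in_pattern]
    moved = [item for item in order if item in in_pattern]
    return others + moved


# The example from the text: order {D, C, A, B, E} with input pattern {A, B}.
new_order = reorder_header(["D", "C", "A", "B", "E"], ["B", "A"])
```

Because both list comprehensions iterate over `order`, the input pattern does not need to be sorted beforehand for this step.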

Next, the association pattern mining server 120 selects the items of the sorted pattern in reverse order (618) and accesses the entry for the selected item in the header table 410 (620).

The association pattern mining server 120 reaches each item node for that item through the link of the accessed entry and the node links of the item nodes connected to it (622). The association pattern mining server 120 checks whether the accessed item node has a child node (624).

If the node has no child node, the association pattern mining server 120 follows the node link to the next item node (622). If the node has child nodes, the association pattern mining server 120 selects the child whose item comes earliest in the sort order and exchanges (switches) the positions of the two nodes (626). That is, the association pattern mining server 120 exchanges the position of the earliest-ordered child item node with that of the item node corresponding to the selected item.

After the exchange, if a sibling node holding the same item as the child node exists, the association pattern mining server 120 merges the two item nodes. For example, if item node C: {1, 2} in the prefix tree 420 of FIG. 4 exchanges positions with its parent node D: {7, 5}, the moved item node C: {1, 2} becomes a sibling of item node C: {2, 3}, so the association pattern mining server 120 merges the two nodes. The association pattern mining server 120 performs this process recursively on the lower nodes (628).

When this processing has been completed for every item in the sorted input pattern, the reconstruction of the tree-based data structure is complete.

Referring again to FIG. 3, in order to search the reconstructed tree data structure 334 for association patterns of the input pattern within the input time range, the association pattern mining server 120 calculates the batches that fall within that time range (510). That is, the association pattern mining server 120 determines, based on the input search time range, the batches in which the input pattern is to be searched for.
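The specification does not spell out how the batches are calculated from the search time range. One plausible reading, sketched below purely as an assumption, is that batch i covers the half-open interval from `first_start + i * interval` to `first_start + (i + 1) * interval`, and that every batch overlapping the query range is selected. The function name `batches_in_range` and its parameters are hypothetical.

```python
import math


def batches_in_range(first_start, interval, num_batches, query_start, query_end):
    """Return the indices of batches overlapping [query_start, query_end).

    Assumes batch i covers
    [first_start + i * interval, first_start + (i + 1) * interval).
    """
    if query_end <= query_start or num_batches == 0:
        return []
    lo = max(0, math.floor((query_start - first_start) / interval))
    hi = min(num_batches - 1,
             math.ceil((query_end - first_start) / interval) - 1)
    return list(range(lo, hi + 1))


# Three 10-unit batches starting at time 0; the query spans times 12 to 25.
selected = batches_in_range(0, 10, 3, 12, 25)
```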

The association pattern mining server 120 searches for the association patterns of each determined batch (336) and provides the resulting per-batch association patterns (338).

FIG. 7 is a flowchart illustrating the process of searching for association patterns related to the input pattern within the set time range.

Hereinafter, the process of searching for association patterns is described in detail with reference to FIG. 7.

When the reconstruction of the tree data structure is complete, the association pattern mining server 120 first calculates the batches within the input search time range (714). The association pattern mining server 120 then accesses the header table 410 entry corresponding to the last item of the sorted input pattern (716).

The association pattern mining server 120 accesses the item node connected to that entry's link (718) and extracts the full path from that item node up to the root node (720). Each item node on the extracted path carries the frequency values for the relevant batches in its batch frequency array 430.

For example, assuming that only the second batch of FIG. 4 falls within the input search time range, the item nodes on the path {B, A, C, D}, extracted starting from item node B: {0, 1}, have batch frequency arrays 230 of {1}, {2}, {2}, and {5}, respectively. If instead both batches fall within the input search time range, the item nodes on the extracted path have frequencies of {0, 1}, {0, 2}, {1, 2}, and {7, 5}, respectively.

After extracting a path, the association pattern mining server 120 follows the node link to the next item node (718) and repeats the process until no linked item node remains (722).
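Steps 718 to 722 can be sketched as a walk along the node links, with a climb to the root from each node reached. The code below is an illustration only: the `Node` class and `extract_paths` function are hypothetical, and the tree is limited to the single FIG. 4 path quoted in the text.

```python
class Node:
    def __init__(self, item, counts, parent=None, node_link=None):
        self.item = item
        self.counts = counts        # batch frequency array
        self.parent = parent
        self.node_link = node_link  # previously created node for the same item


def extract_paths(link, batches):
    """Follow node links from `link`; from each node climb to the root.

    Returns one path per visited node, as a list of
    (item, counts restricted to the batch indices in `batches`).
    """
    paths = []
    node = link
    while node is not None:                              # steps 718 and 722
        path, cur = [], node
        while cur is not None and cur.item is not None:  # step 720
            path.append((cur.item, [cur.counts[b] for b in batches]))
            cur = cur.parent
        paths.append(path)
        node = node.node_link
    return paths


# The path quoted from FIG. 4: D {7, 5} -> C {1, 2} -> A {0, 2} -> B {0, 1}.
root = Node(None, [])
d = Node("D", [7, 5], parent=root)
c = Node("C", [1, 2], parent=d)
a = Node("A", [0, 2], parent=c)
b = Node("B", [0, 1], parent=a)

second_only = extract_paths(b, batches=[1])
both = extract_paths(b, batches=[0, 1])
```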

When all paths have been extracted, the association pattern mining server 120 generates association patterns by combining the input pattern with the remaining items present in the extracted paths (724). For example, for the extracted path {B, A, C, D} of the example above and the input pattern {B, A}, combining the input pattern with the remaining items {C, D} yields {B, A, C}, {B, A, D}, and {B, A, C, D}.
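The combination step 724 enumerates every non-empty subset of the remaining path items and appends it to the input pattern. A minimal sketch using the standard library; the function name `combine` is hypothetical:

```python
from itertools import combinations


def combine(pattern, path_items):
    """Extend `pattern` with every non-empty subset of the other path items."""
    rest = [item for item in path_items if item not in pattern]
    results = []
    for size in range(1, len(rest) + 1):
        for extra in combinations(rest, size):
            results.append(list(pattern) + list(extra))
    return results


# The example from the text: input pattern {B, A} on the path {B, A, C, D}.
patterns = combine(["B", "A"], ["B", "A", "C", "D"])
```

A path with k remaining items produces 2^k - 1 candidate patterns, so long paths can make this step expensive.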

The frequency of each combined association pattern is determined, for each selected batch, as the smallest frequency among its items in that batch. For example, the items of the combined pattern {B, A, C} in FIG. 4 have frequencies of {0, 1}, {0, 2}, and {1, 2}, respectively, so the smallest frequencies for the two batches are 0 and 1. Thus {B, A, C} is a pattern that does not appear in the first batch and appears with a frequency of 1 in the second batch.
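The per-batch minimum can be computed by reading the items' batch frequency arrays column by column. In this sketch the function name `pattern_counts` and the dictionary input format are assumptions made for illustration:

```python
def pattern_counts(item_counts):
    """Per-batch frequency of a pattern: the minimum over its items per batch."""
    return [min(column) for column in zip(*item_counts.values())]


# The example from the text: pattern {B, A, C} over two batches.
per_batch = pattern_counts({"B": [0, 1], "A": [0, 2], "C": [1, 2]})
```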

For each association pattern, the association pattern mining server 120 integrates the frequencies by summing the pattern's frequency in every batch within the search time range (726). For example, if the pattern {B, A, C}, one of the patterns related to the input pattern {B, A}, has frequencies of 1 and 2 in the first and second batches, respectively, the frequency of that combined association pattern is 3.

From the results integrated in this way, combined patterns whose frequency is 0 are removed (728), and the association pattern mining server 120 provides the combined patterns whose frequency is 1 or more.
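Steps 726 and 728 reduce to a sum followed by a filter. A minimal sketch, assuming the per-batch frequencies are held in a dictionary keyed by pattern tuples; the function name `integrate` is hypothetical:

```python
def integrate(per_batch_counts):
    """Sum each pattern's per-batch frequencies and drop patterns totalling 0."""
    totals = {pattern: sum(counts) for pattern, counts in per_batch_counts.items()}
    return {pattern: total for pattern, total in totals.items() if total > 0}


result = integrate({
    ("B", "A", "C"): [1, 2],  # the example from the text: 1 + 2 = 3
    ("B", "A", "D"): [0, 0],  # removed because its total is 0
})
```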

Through the above process, in a stream environment in which data is generated continuously, the association pattern mining server 120 can mine the association patterns of an input pattern by searching for them per fixed time interval, based on a time-sensitive sliding window, and integrating the results.

Although the present application has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: association pattern mining system
110: user terminal
120: association pattern mining server

Claims

1. An association pattern mining method performed by an association pattern mining server, the method comprising:
allocating a batch based on a set time interval;
generating a tree-based data structure based on data generated within a time range corresponding to the allocated batch;
receiving a search time range and a search pattern;
determining a batch in which to search for the search pattern based on the input search time range, and reconstructing the tree-based data structure for the determined batch based on the input search pattern; and
searching for an association pattern in the determined batch based on the reconstructed tree-based data structure.

2. The method of claim 1, wherein allocating the batch comprises:
receiving a time interval for the batch;
comparing the time difference between the start time of the most recently allocated batch and the current time with the input time interval; and
if the time difference is greater than the input time interval, allocating a batch again for the time at which the input time interval has elapsed from the start time of the most recently allocated batch.

3. The method of claim 1, wherein the tree-based data structure
has a structure partitioned by batch.

4. The method of claim 3, wherein the tree-based data structure comprises
a header table composed of entries that store item information, and a prefix tree composed of item nodes.

5. The method of claim 4, wherein the header table comprises
an item name, a frequency, and a link pointing to a node for the item in the prefix tree.

6. The method of claim 4, wherein the item node comprises
an item name, a node link pointing to the node created immediately before for the same item, a pointer to a parent node, and a child node list.

7. The method of claim 4, wherein reconstructing the tree-based data structure based on the input search pattern comprises
reconstructing the tree-based data structure so that item nodes corresponding to items in the input search pattern become lower nodes of item nodes corresponding to other items.

8. The method of claim 7, wherein reconstructing the item nodes to be lower nodes of the item nodes corresponding to the other items comprises:
sorting the items in the input search pattern according to the sort order of the header table in the tree-based data structure;
adjusting the order of the header table so that the items in the input search pattern move below the other items while keeping their sort order; and
selecting the items in the sorted search pattern in reverse order and changing the node order while searching the prefix tree for item nodes corresponding to the selected item.

9. The method of claim 8, wherein changing the node order while searching for an item node corresponding to the selected item comprises
if the item node corresponding to the selected item has child nodes, exchanging the position of the child item node that comes earliest in the sort order with the position of the item node corresponding to the selected item.

10. The method of claim 9, wherein changing the node order while searching for an item node corresponding to the selected item
further comprises, after the positions are exchanged, merging the child node and a sibling node if a sibling node having the same item as the child node exists.

11. The method of claim 8, wherein changing the node order while searching for an item node corresponding to the selected item comprises
if the item node corresponding to the selected item has no child node, moving to the next item node along the node link.

12. The method of claim 8, wherein searching for an association pattern in the determined batch based on the reconstructed tree-based data structure comprises:
accessing the entry in the header table that corresponds to the last item in the sorted search pattern;
extracting all paths to the root node from the item nodes connected to that entry through the link; and
generating association patterns by combining the input search pattern with the remaining items present in the extracted paths.

13. The method of claim 12, wherein searching for an association pattern in the determined batch based on the reconstructed tree-based data structure further comprises:
summing, for each generated association pattern, its frequencies in each batch within the search time range; and
removing any association pattern whose summed frequency is zero.

14. An association pattern mining apparatus comprising:
a memory storing computer-readable instructions; and
a processor electrically connected to the memory and configured to execute the following steps through the stored instructions at a user's request,
wherein the processor performs:
allocating a batch based on a set time interval;
generating a tree-based data structure based on data generated within a time range corresponding to the allocated batch;
receiving a search time range and a search pattern;
determining a batch in which to search for the search pattern based on the input search time range, and reconstructing the tree-based data structure for the determined batch based on the input search pattern; and
searching for an association pattern in the determined batch based on the reconstructed tree-based data structure.

15. The apparatus of claim 14, wherein the processor
reconstructs the tree-based data structure so that item nodes corresponding to items in the input search pattern become lower nodes of item nodes corresponding to other items.

16. The apparatus of claim 15, wherein the processor
sorts the items in the input search pattern according to the sort order of the header table in the tree-based data structure, adjusts the order of the header table so that the items in the input search pattern move below the other items while keeping their sort order, and selects the items in the sorted search pattern in reverse order and changes the node order while searching the prefix tree for item nodes corresponding to the selected item.

17. The apparatus of claim 16, wherein the processor
accesses the entry in the header table that corresponds to the last item in the sorted search pattern, extracts all paths to the root node from the item nodes connected to that entry through the link, and generates association patterns by combining the input search pattern with the remaining items present in the extracted paths.

18. The apparatus of claim 17, wherein the processor
sums, for each generated association pattern, its frequencies in each batch within the search time range, and removes any association pattern whose summed frequency is zero.

19. A computer-readable recording medium on which a program for implementing an association pattern mining method is recorded, the method comprising:
allocating a batch based on a set time interval;
generating a tree-based data structure based on data generated within a time range corresponding to the allocated batch;
receiving a search time range and a search pattern;
determining a batch in which to search for the search pattern based on the input search time range, and reconstructing the tree-based data structure for the determined batch based on the input search pattern; and
searching for an association pattern in the determined batch based on the reconstructed tree-based data structure.