KR20140120711A

KR20140120711A - Method and apparatus for mining closed frequent pattern using parallel processing

Info

Publication number: KR20140120711A
Application number: KR1020130036933A
Authority: KR
Inventors: 박형민
Original assignee: 삼성전자주식회사
Priority date: 2013-04-04
Filing date: 2013-04-04
Publication date: 2014-10-14
Also published as: KR102081722B1

Abstract

The present invention relates to a method and an apparatus for mining closed frequent patterns using parallel processing. According to an aspect of the present invention, a method of mining closed frequent patterns comprises the steps of: generating and storing projection data that has a frequent item in a database as its prefix pattern; allocating the stored projection data to a plurality of processing nodes according to the priority of the stored projection data; generating a frequent pattern using the allocated projection data; comparing the frequent pattern with a closed frequent pattern list to update the closed frequent pattern list; and, when the frequent pattern is added to the closed frequent pattern list, generating and storing projection data that has the frequent pattern as its prefix pattern.

Description

METHOD AND APPARATUS FOR MINING CLOSED FREQUENT PATTERN USING PARALLEL PROCESSING

The present disclosure relates to a method and an apparatus for mining closed frequent patterns using parallel processing.

With the development of IT, including the Internet, the amount of accumulated data is growing exponentially. Analyzing such massive data to extract useful information is becoming increasingly difficult. As part of the effort to address this, data mining techniques have been researched and developed and are attracting attention.

Among data mining techniques, association rule mining finds rules that describe how the items in the data are related. To do so, frequent patterns are found first, and association rules are then derived from them. However, as the amount of data grows, the number of extracted frequent patterns also grows, and recently the number of frequent patterns itself has become too large to analyze. The concept of a closed frequent pattern, which represents many frequent patterns by a single pattern, was therefore proposed.

To extract closed frequent patterns, early approaches first found all frequent patterns and then identified the closed frequent patterns among them. This was inefficient because extracting the frequent patterns takes too long and many frequent patterns are eventually compressed into a single closed frequent pattern. Techniques that extract closed frequent patterns directly from the data were therefore developed.

Algorithms that find closed frequent patterns basically have exponential time complexity. They explore all possible patterns by depth-first search or breadth-first search and check whether each pattern is a closed frequent pattern. The patterns examined can be arranged in a tree according to the search order, and, by the definition of a closed frequent pattern, it may be that none of the patterns in the subtree rooted at a particular pattern is a closed frequent pattern. The performance of an algorithm depends heavily on how quickly this can be determined without searching the subtree directly.

An object is to provide a method and an apparatus for mining closed frequent patterns using parallel processing.

According to one aspect, a method of mining closed frequent patterns using parallel processing may include: generating and storing projection data that has a frequent item in a database as its prefix pattern; allocating the stored projection data to a plurality of processing nodes according to a priority among the stored projection data; generating a frequent pattern using the allocated projection data; updating a closed frequent pattern list by comparing the frequent pattern with the closed frequent pattern list; and, when the frequent pattern has been added to the closed frequent pattern list, generating and storing projection data that has the frequent pattern as its prefix pattern.

According to one aspect, the allocating may allocate the stored projection data in depth-first search order based on the prefix patterns of the stored projection data.

According to one aspect, generating and storing the projection data that has a frequent item in the database as its prefix pattern may include: finding a frequent item in the database; generating projection data that has the frequent item as its prefix pattern; and storing the generated projection data.

According to one aspect, generating the frequent pattern may include: finding a frequent item in the allocated projection data; and generating a frequent pattern by combining the prefix pattern of the allocated projection data with the frequent item of the allocated projection data.

According to one aspect, updating the closed frequent pattern list may update the list depending on whether the list contains a closed frequent pattern candidate having the same support as the frequent pattern.

According to one aspect, updating the closed frequent pattern list may add the frequent pattern to the list when the list contains no closed frequent pattern candidate having the same support as the frequent pattern.

According to one aspect, updating the closed frequent pattern list may add the frequent pattern to the list when, among the closed pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate that contains the frequent pattern as a subset.

According to one aspect, updating the closed frequent pattern list may add the frequent pattern to the list when, among the closed pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate whose supporting records are identical to the supporting records of the frequent pattern.

According to one aspect, updating the closed frequent pattern list may, when the support of the frequent pattern is less than a preset value, determine whether, among the closed pattern candidates having the same support as the frequent pattern, there is a closed frequent pattern candidate whose supporting records are identical to those of the frequent pattern, and may, when the support of the frequent pattern is greater than or equal to the preset value, determine whether, among the closed pattern candidates having the same support as the frequent pattern, there is a closed frequent pattern candidate that contains the frequent pattern as a subset.

According to one aspect, generating and storing the projection data that has the frequent pattern as its prefix pattern may, when the number of stored projection data is greater than or equal to a preset value, generate only one projection data from the allocated projection data and store the allocated projection data together with the frequent item to be projected next.

According to one aspect, an apparatus for mining closed frequent patterns using parallel processing may include: a projection data providing unit that stores projection data generated by a plurality of processing nodes and allocates the stored projection data to the plurality of processing nodes according to priority; a closed frequent pattern management unit that stores a closed frequent pattern list including closed frequent pattern candidates; and a plurality of processing nodes that generate a frequent pattern using the projection data allocated from the projection data providing unit, update the closed frequent pattern list, and, when the frequent pattern has been added to the closed frequent pattern list, generate projection data that has the frequent pattern as its prefix pattern.

According to one aspect, the projection data providing unit may allocate the stored projection data in depth-first search order based on the prefix patterns of the stored projection data.

According to one aspect, any one of the plurality of processing nodes may find frequent items in a database and then generate projection data that has each frequent item as its prefix pattern.

According to one aspect, the plurality of processing nodes may find a frequent item in the allocated projection data and then generate a frequent pattern by combining the found frequent item with the prefix pattern of the allocated projection data.

According to one aspect, the plurality of processing nodes may update the closed frequent pattern list depending on whether the list contains a closed frequent pattern candidate having the same support as the frequent pattern.

According to one aspect, the plurality of processing nodes may add the frequent pattern to the closed frequent pattern list when the list contains no closed frequent pattern candidate having the same support as the frequent pattern.

According to one aspect, the plurality of processing nodes may add the frequent pattern to the closed frequent pattern list when, among the closed frequent pattern candidates having the same support as the frequent pattern, there is no candidate that contains the frequent pattern as a subset.

According to one aspect, the plurality of processing nodes may add the frequent pattern to the closed frequent pattern list when, among the closed frequent pattern candidates having the same support as the frequent pattern, there is no candidate whose supporting records are identical to the supporting records of the frequent pattern.

According to one aspect, the plurality of processing nodes may, when the support of the frequent pattern is less than a preset value, determine whether, among the closed pattern candidates having the same support as the frequent pattern, there is a closed frequent pattern candidate whose supporting records are identical to those of the frequent pattern, and may, when the support of the frequent pattern is greater than or equal to the preset value, determine whether, among the closed pattern candidates having the same support as the frequent pattern, there is a closed frequent pattern candidate that contains the frequent pattern as a subset.

According to one aspect, the plurality of processing nodes may, when the number of projection data stored in the projection data providing unit is greater than or equal to a preset value, generate only one projection data from the allocated projection data and store the allocated projection data together with the frequent item to be projected next.

Parallelization can reduce the time required for closed frequent pattern mining.

Furthermore, by preferentially generating frequent patterns that are likely to be closed frequent patterns, the search space for closed frequent pattern mining can be reduced.

FIG. 1 is an exemplary diagram for explaining projection data.
FIG. 2 is a block diagram of an apparatus for mining closed frequent patterns using parallel processing according to an embodiment.
FIGS. 3A to 3F are exemplary diagrams for explaining parallel processing according to a depth-first search method.
FIG. 4 is an exemplary diagram of a closed frequent pattern list.
FIG. 5 is a flowchart showing a process of generating projection data from a database.
FIG. 6 is a flowchart showing a closed frequent pattern mining process performed by a plurality of processing nodes.

Hereinafter, the present invention is described in detail through preferred embodiments, with reference to the accompanying drawings, so that those skilled in the art can readily understand and reproduce it.

FIG. 1 is an exemplary diagram for explaining projection data.

Referring to FIG. 1, the illustrated example shows an exemplary database 110, projection data 120 that has frequent item a as its prefix pattern, projection data 130 that has frequent item b as its prefix pattern, and projection data 140 that has frequent item c as its prefix pattern.

A frequent item is an item that satisfies a user-defined minimum support. A pattern that satisfies the minimum support is called a frequent pattern, where a pattern is a set of items.

Support is the number of records that contain a particular pattern, and minimum support is the minimum support required for a pattern to be a frequent pattern.

In the illustrated example, when the minimum support is 2, the frequent items in the database 110 are a, b, and c.
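The definitions of support and frequent item can be illustrated with a minimal Python sketch. The database contents below are a hypothetical toy example chosen so that a, b, and c are frequent at a minimum support of 2; they are not the contents of FIG. 1, and the names `DATABASE`, `support`, and `frequent_items` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy database (record id -> set of items); not taken from FIG. 1.
DATABASE = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c"},
    3: {"a", "c", "d"},
    4: {"b", "c"},
    5: {"c", "e"},
}


def support(pattern, database):
    """Number of records that contain every item of the pattern."""
    pattern = set(pattern)
    return sum(1 for items in database.values() if pattern <= items)


def frequent_items(database, min_support):
    """Items whose support is at least min_support, in alphabetical order."""
    counts = Counter(item for items in database.values() for item in items)
    return sorted(item for item, n in counts.items() if n >= min_support)


if __name__ == "__main__":
    print(frequent_items(DATABASE, 2))
    print(support({"a", "b"}, DATABASE))
```

In this toy database, items d and e each occur in only one record and therefore fall below the minimum support.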

Projection data is the set of records in the database that have patterns with a particular frequent item or a particular frequent pattern as their prefix pattern.

For example, the projection data 120 that has a as its prefix pattern in FIG. 1 includes the patterns in the database that contain a as a prefix pattern, together with the records that have each of those patterns. Because the prefix pattern a appears in every pattern in the projection data, it need not be shown separately.
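A projection can be sketched as follows. This is only an illustration under stated assumptions: the database is a hypothetical toy example (not FIG. 1), items are ordered alphabetically, and the projection keeps only the items that come after the last prefix item, which is one common way to realize prefix-based projection. The function name `project` is hypothetical.

```python
# Hypothetical toy database (record id -> set of items); not taken from FIG. 1.
DATABASE = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c"},
    3: {"a", "c", "d"},
    4: {"b", "c"},
    5: {"c", "e"},
}


def project(database, prefix):
    """Projection data for a prefix pattern.

    Keeps the records that contain every prefix item and, for each of them,
    only the items ordered after the last prefix item (assumed alphabetical
    order). The prefix itself is left implicit.
    """
    prefix = tuple(sorted(prefix))
    last = prefix[-1]
    projected = {}
    for record_id, items in database.items():
        if set(prefix) <= items:
            projected[record_id] = {item for item in items if item > last}
    return projected


if __name__ == "__main__":
    print(project(DATABASE, ("a",)))
    print(project(DATABASE, ("b",)))
```

Dropping items that precede the prefix avoids generating the same pattern from two different prefixes, for example {a, b} from both a and b.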

The database 110 may be, for example, a set of items purchased by each customer. In this case, a record may be a customer's identification information, and each item may be a product purchased by the customer.

As another example, the database 110 may represent patients' health checkup data. In this case, a record is a patient's identification information, and each item may represent health checkup data. The database 110 is not limited to these examples.

FIG. 2 is a block diagram of an apparatus for mining closed frequent patterns using parallel processing according to an embodiment.

Referring to FIG. 2, a frequent pattern mining apparatus 200 according to an embodiment may include a plurality of processing nodes 210.

According to one embodiment, each processing node 210 may be a core of a multicore processor.

According to another embodiment, each processing node 210 may be a single-core processor.

According to yet another embodiment, each processing node 210 may be a computing device in a distributed computing environment.

The plurality of processing nodes 210 may be assigned jobs for mining closed frequent patterns by the projection data providing unit 230 and may process them in parallel.

Specifically, the plurality of processing nodes 210 may be allocated projection data by the projection data providing unit 230, generate frequent patterns, and determine whether each generated frequent pattern is a closed frequent pattern.

Here, a closed frequent pattern may be defined as a frequent pattern that is not included in any other frequent pattern having the same support.
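The definition of a closed frequent pattern can be checked by brute force on a small example. The sketch below enumerates every frequent pattern first and then filters, which is the inefficient early approach mentioned in the background rather than the method of this disclosure; it is shown only to make the definition concrete. The toy database and all function names are assumptions.

```python
from itertools import combinations

# Hypothetical toy database (record id -> set of items); not taken from FIG. 1.
DATABASE = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c"},
    3: {"a", "c", "d"},
    4: {"b", "c"},
    5: {"c", "e"},
}


def all_frequent_patterns(database, min_support):
    """Map each frequent pattern (frozenset) to its support, by exhaustive search."""
    items = sorted(set().union(*database.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            count = sum(1 for rec in database.values() if pattern <= rec)
            if count >= min_support:
                result[pattern] = count
    return result


def closed_only(frequent):
    """Keep patterns that have no proper superset with the same support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == t for q, t in frequent.items())
    }


if __name__ == "__main__":
    closed = closed_only(all_frequent_patterns(DATABASE, 2))
    for pattern, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pattern), count)
```

In this toy database, {a} is not closed because {a, c} has the same support of 3, whereas {a, c} is closed because its only frequent superset, {a, b, c}, has support 2.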

According to one embodiment, any one of the plurality of processing nodes 210 may find all frequent items in the database and then generate projection data that has each frequent item as its prefix pattern.

For example, given a database 110 such as the one shown in FIG. 1 and a minimum support of 2, the frequent items are a, b, and c, and projection data 120 to 140 having a, b, and c as their prefix patterns may be generated.

Finding the frequent items in the database and generating the projection data that has each frequent item as its prefix pattern may be performed by any one of the plurality of processing nodes 210.

For example, one processing node may be selected arbitrarily in consideration of the current operating state or the performance of each processing node.

The projection data providing unit 230 may store the projection data generated by the processing nodes 210 and may preferentially provide to the plurality of processing nodes 210 the stored projection data that has a high priority according to a pseudo depth-first search method.

According to one embodiment, the projection data providing unit 230 may be implemented as a priority queue. The priority queue may be implemented on a data structure such as a heap that uses the prefix pattern of the projection data as its key.
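A heap-based priority queue keyed by prefix pattern can be sketched with Python's standard `heapq` module. This is a single-threaded illustration, not the disclosed implementation: the class name `ProjectionQueue` is hypothetical, prefix patterns are assumed to be tuples of items in alphabetical order, and a real multi-node setting would additionally need locking or a distributed queue. Python compares tuples lexicographically, so popping the smallest key yields prefixes in depth-first order.

```python
import heapq
import itertools


class ProjectionQueue:
    """Min-heap of (prefix pattern, projection data), ordered by prefix pattern."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that equal prefixes never fall back to comparing payloads.
        self._counter = itertools.count()

    def put(self, prefix, projection):
        heapq.heappush(self._heap, (tuple(prefix), next(self._counter), projection))

    def get(self):
        prefix, _, projection = heapq.heappop(self._heap)
        return prefix, projection

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = ProjectionQueue()
    for prefix in [("c",), ("a",), ("b",), ("a", "b")]:
        queue.put(prefix, {})  # empty payloads, for illustration only
    while len(queue):
        print(queue.get()[0])
```

Because ("a", "b") sorts before ("b",), projection data for a deeper pattern under a is handed out before projection data for b, even if it was stored later.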

Each of the plurality of processing nodes 210 may be provided with projection data by the projection data providing unit 230 and generate frequent patterns.

According to one embodiment, each of the plurality of processing nodes 210 may find the frequent items in the projection data provided by the projection data providing unit 230 and then generate frequent patterns by combining each frequent item with the prefix pattern of the projection data.

For example, referring to FIG. 1, the frequent items in the projection data 120 for prefix pattern a are b and c. Accordingly, a processing node provided with the projection data 120 for prefix pattern a may combine prefix pattern a with frequent item b or c to generate the frequent patterns {a, b} and {a, c}.

Likewise, the frequent item in the projection data 130 for prefix pattern b is c. Accordingly, a processing node provided with the projection data for prefix pattern b may combine prefix pattern b with frequent item c to generate the frequent pattern {b, c}.
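The step of combining a prefix pattern with the frequent items of its projection data can be sketched as below. The projection contents are hypothetical values chosen to agree with the frequent items named in the text (b and c under prefix a, c under prefix b); they are not copied from FIG. 1, and `extend_prefix` is an illustrative name.

```python
from collections import Counter

# Hypothetical projection data (record id -> remaining items); not taken from FIG. 1.
PROJECTION_A = {1: {"b", "c"}, 2: {"b", "c"}, 3: {"c", "d"}}
PROJECTION_B = {1: {"c"}, 2: {"c"}, 4: {"c"}}


def extend_prefix(prefix, projection, min_support):
    """Return (pattern, support) for each frequent item of the projection.

    Each pattern is the prefix pattern extended by one frequent item.
    """
    counts = Counter(item for items in projection.values() for item in items)
    return [
        (tuple(prefix) + (item,), count)
        for item, count in sorted(counts.items())
        if count >= min_support
    ]


if __name__ == "__main__":
    print(extend_prefix(("a",), PROJECTION_A, 2))
    print(extend_prefix(("b",), PROJECTION_B, 2))
```

The count of an item inside the projection equals the support of the extended pattern in the full database, because every record in the projection already contains the prefix.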

Each of the plurality of processing nodes 210 may generate projection data that has the generated frequent pattern as its prefix pattern and store it in the projection data providing unit 230.

A processing node that has finished generating frequent patterns and projection data may be provided with the highest-priority projection data among the projection data stored in the projection data providing unit 230 and repeat the same process.

The closed frequent pattern management unit 250 may store the closed frequent pattern list.

According to one embodiment, each of the plurality of processing nodes 210 may update the closed frequent pattern list by comparing the generated frequent pattern with the closed frequent pattern candidates included in the list.

Specifically, each of the plurality of processing nodes 210 may update the closed frequent pattern list by determining whether the closed frequent pattern candidates in the list include a candidate having the same support as the frequent pattern generated by that processing node.

If the closed frequent pattern list contains no candidate having the same support as the generated frequent pattern, the generated frequent pattern is likely to be a closed frequent pattern and may therefore be added to the list.

In addition, according to one embodiment, when, among the closed frequent pattern candidates having the same support as the generated frequent pattern, there is no candidate that contains the generated frequent pattern as a subset, the generated frequent pattern may be added to the closed frequent pattern list.

That is, if the list contains no closed frequent pattern candidate having the same support as the generated frequent pattern, or contains such candidates but none that includes the generated frequent pattern as a subset, the generated frequent pattern may be a closed frequent pattern and may therefore be added to the list.

Meanwhile, a closed frequent pattern candidate that has the same support as the generated frequent pattern and is included in the generated frequent pattern as a subset is confirmed not to be a closed frequent pattern and may therefore be deleted from the closed frequent pattern list.
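The subset-based update rule can be sketched as follows. This is a minimal single-threaded illustration under assumptions: the closed frequent pattern list is modeled as a dictionary from pattern (a `frozenset`) to support, the function name `update_closed_list` is hypothetical, and no synchronization between processing nodes is shown.

```python
def update_closed_list(closed, pattern, support):
    """Try to add a frequent pattern to the candidate list.

    closed: dict mapping frozenset pattern -> support (modified in place).
    Returns True if the pattern was added, False if a candidate with the
    same support already contains it.
    """
    pattern = frozenset(pattern)
    same_support = [c for c, s in closed.items() if s == support]

    # A same-support candidate that contains the pattern rules it out.
    if any(pattern <= candidate for candidate in same_support):
        return False

    # Same-support candidates contained in the new pattern are no longer closed.
    for candidate in same_support:
        if candidate < pattern:
            del closed[candidate]

    closed[pattern] = support
    return True


if __name__ == "__main__":
    closed = {}
    print(update_closed_list(closed, {"a"}, 3))       # added
    print(update_closed_list(closed, {"a", "c"}, 3))  # added, replaces {a}
    print(update_closed_list(closed, {"a"}, 3))       # rejected
    print(closed)
```

The return value can serve as the signal for whether projection data should be generated for the pattern, since only patterns added to the list are projected further.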

According to another embodiment, when, among the closed frequent pattern candidates having the same support as the generated frequent pattern, there is no candidate whose supporting records are identical to those of the generated frequent pattern, the generated frequent pattern may be added to the closed frequent pattern list.

Conversely, when, among the closed frequent pattern candidates having the same support as the generated frequent pattern, there is a candidate that has the same supporting records as the generated frequent pattern and contains the generated frequent pattern as a subset, the generated frequent pattern cannot be a closed frequent pattern and is therefore not added to the closed frequent pattern list.

According to one embodiment, when the support of the generated frequent pattern is less than a preset value, the plurality of processing nodes 210 may determine whether, among the closed pattern candidates having the same support as the generated frequent pattern, there is a closed frequent pattern candidate whose supporting records are identical to those of the generated frequent pattern.

When the support of the generated frequent pattern is greater than or equal to the preset value, the processing nodes may instead determine whether, among the closed pattern candidates having the same support as the generated frequent pattern, there is a closed frequent pattern candidate that contains the generated frequent pattern as a subset.

That is, because support generally decreases as a frequent pattern becomes longer and increases as it becomes shorter, the amount of computation can be reduced by comparing the supporting records of the frequent pattern with those of the closed frequent pattern candidates when the frequency of the frequent pattern is below a certain value.
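The choice between the two checks can be sketched as below. This is an illustration under assumptions: the candidate list is modeled as a dictionary from pattern to the set of supporting record ids, the support is taken to be the number of supporting records, and the names `is_covered` and `threshold` are hypothetical stand-ins for the check and the preset value.

```python
def is_covered(closed, pattern, records, threshold):
    """Decide whether an existing same-support candidate rules out the pattern.

    closed: dict mapping frozenset pattern -> frozenset of supporting record ids.
    Below the threshold, supporting-record sets are compared; at or above it,
    the subset relation between patterns is tested.
    """
    pattern = frozenset(pattern)
    records = frozenset(records)
    support = len(records)
    same_support = [(c, r) for c, r in closed.items() if len(r) == support]

    if support < threshold:
        # Low support (typically long patterns): compare the small record sets.
        return any(r == records for _, r in same_support)
    # High support (typically short patterns): compare the short patterns.
    return any(pattern <= c for c, _ in same_support)


if __name__ == "__main__":
    closed = {
        frozenset("abc"): frozenset({1, 2}),
        frozenset("ac"): frozenset({1, 2, 3}),
    }
    print(is_covered(closed, "ab", {1, 2}, threshold=3))     # record comparison
    print(is_covered(closed, "a", {1, 2, 3}, threshold=3))   # subset comparison
    print(is_covered(closed, "bc", {1, 2, 4}, threshold=3))  # not covered
```

The intuition is that a record set of a low-support pattern has few elements, so comparing record sets is cheap there, while a high-support pattern tends to have few items, so the subset test is cheap there.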

According to one embodiment, when the generated frequent pattern is not added to the closed frequent pattern list, the plurality of processing nodes 210 may reduce the search space by not generating projection data that has that frequent pattern as its prefix pattern.

FIGS. 3A to 3F are exemplary diagrams for explaining parallel processing according to a depth-first search method.

In FIGS. 3A to 3F, the same database as the database 110 shown in FIG. 1 is assumed to be given, and the minimum support is assumed to be 2. It is also assumed that there are two processing nodes.

FIG. 3A shows the search space formed by the frequent patterns that can be generated from the database 110. In the illustrated example, each node of the tree represents a frequent pattern that can be generated from the database 110, which can serve as the prefix pattern of projection data.

According to the depth-first search method, the priority in the illustrated example decreases in the order {a} -> {a, b} -> {a, b, c} -> {a, c} -> {b} -> {b, c} -> {c}.

이때, 일 실시예에 따르면, 빈발 아이템 간의 우선 순위는 지지도가 가장 낮은 빈발 아이템이 높은 우선 순위를 가지도록 정할 수 있다. 다만, 반드시 이에 한정되는 것은 아니며, 알파벳 순서 등 다양한 방법에 의해 정해질 수 있다.At this time, according to one embodiment, the priority among the frequent items can be set so that the frequent items having the lowest support have the highest priority. However, the present invention is not limited thereto, and it can be determined by various methods such as alphabetical order.
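One way to obtain this depth-first priority is to give each item a rank and sort the patterns by their tuples of ranks. The Python sketch below illustrates that idea only. The helper names `rank_by_support` and `dfs_key`, the support values, and the alphabetical tie-break between items of equal support are assumptions that do not appear in the description.

```python
def rank_by_support(supports):
    """Rank items so that the item with the lowest support comes first."""
    ordered = sorted(supports, key=lambda item: (supports[item], item))
    return {item: position for position, item in enumerate(ordered)}


def dfs_key(pattern, rank):
    """Sort key whose ascending order is the depth-first (preorder) visit order."""
    return tuple(sorted(rank[item] for item in pattern))


patterns = [{"c"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"}, {"a"}]

# Alphabetical item priority.
alphabetical = {"a": 0, "b": 1, "c": 2}
by_alphabet = sorted(patterns, key=lambda p: dfs_key(p, alphabetical))

# Lowest-support-first item priority, with hypothetical supports.
support_rank = rank_by_support({"a": 4, "b": 4, "c": 3})
by_support = sorted(patterns, key=lambda p: dfs_key(p, support_rank))
```

With alphabetical ranks, `by_alphabet` follows the order {a}, {a, b}, {a, b, c}, {a, c}, {b}, {b, c}, {c} given above. With the assumed supports, `by_support` starts from {c} instead.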

Meanwhile, in the examples shown in FIGS. 3B to 3F, it is assumed that the priority among frequent items is determined in alphabetical order.

Referring to FIG. 3B, processing node 1 may find the frequent items in the database 110. In the illustrated example, the frequent items are a, b, and c.

Projection data Da, Db, and Dc, each having one of the frequent items as its prefix pattern, may then be generated. The generated projection data are the same as the projection data 120 to 140 shown in FIG. 1. The generated projection data may be stored in the projection data providing unit 230.

Here, according to one embodiment, the projection data providing unit 230 may use a priority queue 300 to provide the stored projection data to processing nodes 1 and 2 in the priority order given by the depth-first search method.

Referring to FIG. 3C, among the projection data stored in the priority queue 300, the projection data Da and Db, whose prefix patterns {a} and {b} have the highest priority under the depth-first search method, are allocated first to processing nodes 1 and 2.

Referring to FIGS. 3D and 3E, processing nodes 1 and 2, to which the projection data Da and Db have respectively been allocated, may generate frequent patterns using the allocated projection data.

Specifically, referring to FIG. 1, b and c are frequent items in the projection data Da (120) because their support is equal to or greater than the minimum support. Accordingly, the patterns {a, b} and {a, c}, obtained by combining the prefix pattern a with the frequent item b or c of the projection data, are frequent patterns.

Processing node 1 may then generate projection data Dab and Dac, which have the generated frequent patterns as their prefix patterns, and store them in the priority queue 300.

Similarly, processing node 2 may find the frequent items in the projection data Db and then generate a frequent pattern. Referring to FIG. 1, c is a frequent item in the projection data Db (130). Accordingly, {b, c} is a frequent pattern.

Processing node 2 may then generate projection data Dbc, which has the generated frequent pattern as its prefix pattern, and store it in the priority queue 300.
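The work done by one processing node on its allocated projection data can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the toy database is hypothetical and is not the database 110 of FIG. 1, the function names are invented for the example, and projection data is assumed to consist of the records that contain the prefix, reduced to the items ordered after the last prefix item.

```python
from collections import Counter


def frequent_items(data, min_support):
    """Return {item: support} for the items meeting the minimum support in `data`."""
    counts = Counter(item for record in data for item in record)
    return {item: n for item, n in sorted(counts.items()) if n >= min_support}


def project(data, item):
    """Keep the records that contain `item`, reduced to the items ordered after it."""
    return [frozenset(i for i in record if i > item)
            for record in data if item in record]


def extend(prefix, data, min_support):
    """Map each frequent pattern grown from `prefix` to its projection data."""
    return {prefix + (item,): project(data, item)
            for item in frequent_items(data, min_support)}


# Hypothetical toy database (assumption, not taken from FIG. 1).
database = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
            frozenset("ab"), frozenset("c")]

Da = project(database, "a")                    # projection data with prefix {a}
children = extend(("a",), Da, min_support=2)   # patterns {a, b} and {a, c}
```

In this toy data both b and c meet the minimum support of 2 inside `Da`, so `children` holds projection data for the prefixes `("a", "b")` and `("a", "c")`, mirroring how Dab and Dac are derived from Da.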

Thereafter, as in the example shown in FIG. 3F, the projection data Dab and Dac, which have the highest priority under the depth-first search method among the projection data stored in the priority queue 300, may be allocated to processing nodes 1 and 2.
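The allocation sequence of FIGS. 3B to 3F can be imitated with a small heap-based queue. The Python sketch below is an assumption-laden illustration: the class `ProjectionQueue` and its methods `store` and `allocate` are hypothetical names, the projection data themselves are omitted, and alphabetical item priority is assumed so that plain tuple ordering gives the depth-first priority.

```python
import heapq
import itertools


class ProjectionQueue:
    """Priority queue keyed by prefix pattern (hypothetical helper)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # keeps heap entries comparable

    def store(self, prefix, data=None):
        """Store projection data under its prefix pattern."""
        heapq.heappush(self._heap, (tuple(prefix), next(self._tie), data))

    def allocate(self, node_count):
        """Hand out up to `node_count` entries, highest priority first."""
        batch = []
        while self._heap and len(batch) < node_count:
            prefix, _, data = heapq.heappop(self._heap)
            batch.append((prefix, data))
        return batch


queue = ProjectionQueue()
for prefix in ("a", "b", "c"):        # Da, Db, Dc are stored
    queue.store(prefix)
first = [prefix for prefix, _ in queue.allocate(2)]

for prefix in ("ab", "ac", "bc"):     # Dab, Dac, Dbc are stored
    queue.store(prefix)
second = [prefix for prefix, _ in queue.allocate(2)]
```

Here `first` holds the prefixes of Da and Db and `second` holds those of Dab and Dac, while Dbc and Dc stay queued. A real deployment with several nodes would also need synchronized access to the queue, which this sketch leaves out.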

Meanwhile, according to one embodiment, when the number of projection data stored in the priority queue is equal to or greater than a preset value, the plurality of processing nodes 210 may each generate only one piece of projection data.

Specifically, in the example shown in FIG. 3A, when the preset number of projection data is 2, the number of projection data stored in the priority queue 300 exceeds the preset value. Therefore, unlike the example shown in FIGS. 3C and 3D, processing node 1 and processing node 2, which have received Da and Db, respectively, from the priority queue 300, may each generate only one piece of projection data.

That is, in FIGS. 3C and 3D, processing node 1 may generate only the projection data Dab and store it in the priority queue 300. Instead of generating Dac, processing node 1 may store Da together with the frequent item c in the priority queue 300, so that Da can later be provided to processing node 1 or 2 and Dac can be generated at that time.
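This deferral can be sketched in Python as follows. The sketch rests on several assumptions that the description does not specify: the entry layout `(key, data, pending)`, the function name `store_projections`, the rule that a deferred entry is keyed by the pattern it will produce next, and the stand-in contents used for Da, Db, and Dc.

```python
import heapq


def project(data, item):
    """Keep the records that contain `item`, reduced to the items ordered after it."""
    return [frozenset(i for i in record if i > item)
            for record in data if item in record]


def store_projections(queue, prefix, data, items, limit):
    """Store projections of `data` for its frequent `items` (in priority order).

    Queue entries are (key, data, pending). A non-empty `pending` list marks
    deferred work: `data` is still the parent's projection data and `key` is
    the pattern that will be produced next. Keys are assumed to be unique.
    """
    if len(queue) >= limit:
        now, later = items[:1], items[1:]   # queue is full: project one item only
    else:
        now, later = items, []
    for item in now:
        heapq.heappush(queue, (prefix + (item,), project(data, item), []))
    if later:
        heapq.heappush(queue, (prefix + (later[0],), data, later))


# Hypothetical contents standing in for Da, Db and Dc.
Da = [frozenset("bc"), frozenset("bc"), frozenset("b"), frozenset("b")]
queue = [(("b",), [], []), (("c",), [], [])]
heapq.heapify(queue)

store_projections(queue, ("a",), Da, ["b", "c"], limit=2)
dab = heapq.heappop(queue)        # Dab was generated immediately
deferred = heapq.heappop(queue)   # Da stored together with the pending item c

# A node that receives the deferred entry generates Dac from it.
key, data, pending = deferred
store_projections(queue, key[:-1], data, pending, limit=2)
dac = heapq.heappop(queue)
```

Keying the deferred entry by the next pattern keeps it at the position Dac would have held in the depth-first order. That is a design choice of this sketch rather than something the description states.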

Meanwhile, processing nodes 1 and 2 may update the closed frequent patterns by comparing each generated frequent pattern with the closed frequent pattern candidates in the closed frequent pattern list. When a generated frequent pattern is not added to the closed frequent pattern list, projection data having that frequent pattern as a prefix pattern need not be generated.

That is, when a generated frequent pattern is not a closed frequent pattern, the frequent patterns in the subtree rooted at that frequent pattern cannot be closed frequent patterns either. Therefore, by not generating projection data having that frequent pattern as a prefix pattern, the frequent patterns in the subtree rooted at that frequent pattern are pruned and the search space is reduced.

FIG. 4 is an exemplary diagram of a closed frequent pattern list.

Referring to FIG. 4, the closed frequent pattern list 400 may store the closed frequent pattern candidates found so far. The closed frequent pattern list 400 may include the support of each closed frequent pattern candidate.

In addition, according to one embodiment, the closed frequent pattern list 400 may further include the support records of each closed frequent pattern candidate.

According to one embodiment, the closed frequent pattern management unit 250 may keep the closed frequent pattern list 400 continuously updated as the plurality of processing nodes 210 compare each generated frequent pattern with the closed frequent pattern candidates.

For example, when a frequent pattern {b, c} with a support of 2 is generated, the closed frequent pattern candidate {a, b, c}, which has the same support as the generated frequent pattern, can be found in the closed frequent pattern list 400. Because {b, c} is a subset of {a, b, c}, it cannot be added to the closed frequent pattern list.

On the other hand, if the support of the generated frequent pattern {b, c} is 3, there is no closed frequent pattern candidate with the same support, so the frequent pattern {b, c} can be added to the closed frequent pattern list.

Meanwhile, when the generated frequent pattern is {a, b, d} with a support of 4, the closed frequent pattern candidate {a, b} in the closed frequent pattern list has the same support as {a, b, d} and is a subset of {a, b, d}. Therefore, {a, b} is deleted from the closed frequent pattern list and {a, b, d} is added to it.
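The three cases above can be reproduced with a short subset-based update routine. The Python sketch below is illustrative only: the function name `update_by_subset` and the dictionary layout are assumptions, and the starting list holds only the two candidates named in the examples, {a, b, c} with support 2 and {a, b} with support 4, since any other contents of the list 400 are not given here.

```python
def update_by_subset(closed, pattern, support):
    """Try to add `pattern` to `closed` (a dict: item set -> support).

    Returns True when the pattern was added, False when a candidate with the
    same support already contains it.
    """
    pattern = frozenset(pattern)
    same_support = [c for c, s in closed.items() if s == support]
    if any(pattern <= c for c in same_support):
        return False
    for candidate in same_support:
        if candidate < pattern:
            del closed[candidate]   # the longer pattern replaces its subset
    closed[pattern] = support
    return True


def example_list():
    """Assumed starting list, limited to the candidates named in the text."""
    return {frozenset("abc"): 2, frozenset("ab"): 4}


closed_1 = example_list()
added_1 = update_by_subset(closed_1, "bc", 2)    # {b, c} with support 2

closed_2 = example_list()
added_2 = update_by_subset(closed_2, "bc", 3)    # {b, c} with support 3

closed_3 = example_list()
added_3 = update_by_subset(closed_3, "abd", 4)   # {a, b, d} with support 4
```

Each scenario starts from a fresh copy of the list because the text presents them as alternatives rather than as a sequence.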

As another example, when a frequent pattern {b} with a support of 4 is generated, the closed frequent pattern candidate {a, b}, which has a support of 4, can be found in the closed frequent pattern list.

If the records supporting the frequent pattern {b} are {R1, R2, R3, R4}, its support records are identical to those of the closed frequent pattern candidate {a, b}, so the frequent pattern {b} cannot be added to the closed frequent pattern list 400.

On the other hand, when a frequent pattern {d} with a support of 2 is generated and its support records are {R5, R6}, there is no closed frequent pattern candidate with the same support records, so the frequent pattern {d} can be added to the closed frequent pattern list.

Meanwhile, when a frequent pattern {a, b, c, d} with a support of 2 is generated and its support records are {R1, R2}, the closed frequent pattern candidate {a, b, c} has the same support records. However, because {a, b, c} is a subset of {a, b, c, d}, the closed frequent pattern candidate {a, b, c} is deleted and the frequent pattern {a, b, c, d} is added to the closed frequent pattern list.
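The record-based comparison in these examples can be sketched in Python as follows. The function name `update_by_records`, the dictionary layout, and the choice to reject a pattern whenever a candidate with identical support records is not a proper subset of it are assumptions made for the illustration. The support records of {a, b, c} and {a, b} are the ones implied by the examples.

```python
def update_by_records(closed, pattern, records):
    """Try to add `pattern` to `closed` (a dict: item set -> support records).

    Returns True when the pattern was added, False when an existing candidate
    with identical support records keeps its place.
    """
    pattern, records = frozenset(pattern), frozenset(records)
    for candidate, candidate_records in list(closed.items()):
        if candidate_records != records:
            continue
        if candidate < pattern:
            del closed[candidate]     # the longer pattern replaces its subset
            closed[pattern] = records
            return True
        return False                  # same records and not extended: reject
    closed[pattern] = records
    return True


closed = {
    frozenset("abc"): frozenset({"R1", "R2"}),
    frozenset("ab"): frozenset({"R1", "R2", "R3", "R4"}),
}
added_b = update_by_records(closed, "b", {"R1", "R2", "R3", "R4"})
added_d = update_by_records(closed, "d", {"R5", "R6"})
added_abcd = update_by_records(closed, "abcd", {"R1", "R2"})
```

After these calls the list in the sketch holds {a, b}, {d}, and {a, b, c, d}: {b} was rejected, {d} was added, and {a, b, c, d} took the place of {a, b, c}.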

FIG. 5 is a flowchart showing a process of generating projection data from a database.

Referring to FIG. 5, any one of the plurality of processing nodes 210 may find the frequent items in the database whose support is equal to or greater than the minimum support (510) and then generate projection data having each frequent item as its prefix pattern (530).

The generated projection data may then be stored in the projection data providing unit 230 (550).

FIG. 6 is a flowchart showing a closed frequent pattern mining process performed by a plurality of processing nodes.

Referring to FIG. 6, each of the plurality of processing nodes 210 may be allocated, in parallel, the projection data with the highest priority among the projection data stored in the projection data providing unit 230 (610).

The priority of the projection data may be determined by the depth-first search order of their prefix patterns.

Meanwhile, each of the plurality of processing nodes 210 may search the allocated projection data for frequent items (620) and generate frequent patterns (630). A frequent pattern may be generated by combining a frequent item found in the projection data with the prefix pattern of the projection data.

Once a frequent pattern has been generated, it may be determined whether the closed frequent pattern list contains a closed frequent pattern candidate that has the same support as the generated frequent pattern and includes the frequent pattern as a subset (640).

If the closed frequent pattern list contains no closed frequent pattern candidate that has the same support as the generated frequent pattern and includes it as a subset, the generated frequent pattern is added to the closed frequent pattern list (650), and projection data having the generated frequent pattern as a prefix pattern may be generated and stored in the projection data providing unit 230 (660).

Meanwhile, when frequent pattern generation and projection data generation have been completed for all frequent items of the allocated projection data (670), the projection data with the highest priority under the depth-first search method is extracted from the projection data stored in the projection data providing unit 230 (680, 610), and steps 620 to 680 may be repeated.

When no projection data remains stored in the projection data providing unit 230, the entire procedure ends, and the closed frequent pattern candidates in the closed frequent pattern list are confirmed as closed frequent patterns.
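The flow of FIGS. 5 and 6 can be followed end to end in a short single-process Python sketch. It is not the claimed parallel implementation: one loop stands in for all processing nodes, the toy database is hypothetical, and the function names, the treatment of the whole database as projection data with an empty prefix, the checking of single-item patterns against the list, and the removal of same-support subsets on insertion are assumptions made for the illustration.

```python
import heapq
from collections import Counter


def update_closed(closed, pattern, support):
    """Steps 640 and 650: add `pattern` unless a same-support superset exists."""
    same_support = [c for c, s in closed.items() if s == support]
    if any(pattern <= c for c in same_support):
        return False
    for candidate in same_support:
        if candidate < pattern:
            del closed[candidate]
    closed[pattern] = support
    return True


def mine_closed(database, min_support):
    """Return {closed pattern: support}, using one worker instead of many nodes."""
    closed = {}
    # The whole database is treated as projection data with an empty prefix.
    queue = [((), [frozenset(record) for record in database])]
    while queue:                                      # ends when nothing is stored
        prefix, data = heapq.heappop(queue)           # step 610: highest priority
        counts = Counter(i for record in data for i in record)
        for item in sorted(counts):                   # step 620: frequent items
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)                # step 630: frequent pattern
            if update_closed(closed, frozenset(pattern), support):
                projected = [frozenset(i for i in record if i > item)
                             for record in data if item in record]
                heapq.heappush(queue, (pattern, projected))   # step 660
            # A pattern that was not added is pruned: no projection is stored.
    return closed


# Hypothetical toy database (assumption, not taken from FIG. 1).
database = ["abc", "abc", "ab", "ab", "c"]
result = mine_closed(database, min_support=2)
```

For this toy database the sketch ends with {a, b} at support 4, {c} at support 3, and {a, b, c} at support 2. The pattern {b, c} is rejected because {a, b, c} has the same support, so no projection data is stored for it.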

Meanwhile, the embodiments of the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

In addition, functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention pertains.

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered from an illustrative rather than a restrictive point of view. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

210: processing node
230: projection data providing unit
250: closed frequent pattern management unit

Claims

A closed frequent pattern mining method using parallel processing, the method comprising:
generating and storing projection data having a frequent item in a database as a prefix pattern;
allocating the stored projection data to a plurality of processing nodes according to a priority among the stored projection data;
generating a frequent pattern using the allocated projection data;
updating a closed frequent pattern list by comparing the frequent pattern with the closed frequent pattern list; and
generating and storing projection data having the frequent pattern as a prefix pattern when the frequent pattern is added to the closed frequent pattern list.

The method according to claim 1,
wherein the allocating comprises:
allocating the stored projection data according to a priority given by a depth-first search method based on the prefix pattern of the stored projection data.

The method according to claim 1,
wherein the generating and storing of projection data having a frequent item in the database as a prefix pattern comprises:
finding a frequent item in the database;
generating projection data having the frequent item as a prefix pattern; and
storing the generated projection data.

The method according to claim 1,
wherein the generating of the frequent pattern comprises:
finding a frequent item in the allocated projection data; and
generating the frequent pattern by combining the prefix pattern of the allocated projection data with the frequent item found in the allocated projection data.

The method according to claim 1,
wherein the updating of the closed frequent pattern list comprises:
updating the closed frequent pattern list according to whether a closed frequent pattern candidate having the same support as the frequent pattern exists in the closed frequent pattern list.

The method according to claim 5,
wherein the updating of the closed frequent pattern list comprises:
adding the frequent pattern to the closed frequent pattern list when no closed frequent pattern candidate having the same support as the frequent pattern exists in the closed frequent pattern list.

The method according to claim 5,
wherein the updating of the closed frequent pattern list comprises:
adding the frequent pattern to the closed frequent pattern list when, among the closed pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate that includes the frequent pattern as a subset.

The method according to claim 5,
wherein the updating of the closed frequent pattern list comprises:
adding the frequent pattern to the closed frequent pattern list when, among the closed pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate having the same support records as the frequent pattern.

The method according to claim 5,
wherein the updating of the closed frequent pattern list comprises:
determining, when the support of the frequent pattern is less than a preset value, whether there is, among the closed pattern candidates having the same support as the frequent pattern, a closed frequent pattern candidate having the same support records as the frequent pattern; and
determining, when the support of the frequent pattern is equal to or greater than the preset value, whether there is, among the closed pattern candidates having the same support as the frequent pattern, a closed frequent pattern candidate that includes the frequent pattern as a subset.

The method according to claim 1,
wherein the generating and storing of projection data having the frequent pattern as a prefix pattern comprises:
when the number of stored projection data is equal to or greater than a preset value, generating only one piece of projection data from the allocated projection data and storing the allocated projection data together with the frequent item to be projected next.

A closed frequent pattern mining apparatus using parallel processing, the apparatus comprising:
a projection data providing unit that stores projection data generated by a plurality of processing nodes and allocates the stored projection data to the plurality of processing nodes according to a priority;
a closed frequent pattern management unit that stores a closed frequent pattern list containing closed frequent pattern candidates; and
the plurality of processing nodes, which generate a frequent pattern using the projection data allocated by the projection data providing unit, update the closed frequent pattern list, and, when the frequent pattern is added to the closed frequent pattern list, generate projection data having the frequent pattern as a prefix pattern.

The apparatus according to claim 11,
wherein the projection data providing unit
allocates the stored projection data according to a priority given by a depth-first search method based on the prefix pattern of the stored projection data.

The apparatus according to claim 11,
wherein any one of the plurality of processing nodes
finds a frequent item in a database and then generates projection data having the frequent item as a prefix pattern.

The apparatus according to claim 11,
wherein the plurality of processing nodes
find a frequent item in the allocated projection data and then generate the frequent pattern by combining the frequent item with the prefix pattern of the allocated projection data.

The apparatus according to claim 11,
wherein the plurality of processing nodes
update the closed frequent pattern list according to whether a closed frequent pattern candidate having the same support as the frequent pattern exists in the closed frequent pattern list.

The apparatus according to claim 15,
wherein the plurality of processing nodes
add the frequent pattern to the closed frequent pattern list when no closed frequent pattern candidate having the same support as the frequent pattern exists in the closed frequent pattern list.

The apparatus according to claim 15,
wherein the plurality of processing nodes
add the frequent pattern to the closed frequent pattern list when, among the closed frequent pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate that includes the frequent pattern as a subset.

The apparatus according to claim 15,
wherein the plurality of processing nodes
add the frequent pattern to the closed frequent pattern list when, among the closed frequent pattern candidates having the same support as the frequent pattern, there is no closed frequent pattern candidate having the same support records as the frequent pattern.

The apparatus according to claim 15,
wherein the plurality of processing nodes
determine, when the support of the frequent pattern is less than a preset value, whether there is, among the closed pattern candidates having the same support as the frequent pattern, a closed frequent pattern candidate having the same support records as the frequent pattern, and
determine, when the support of the frequent pattern is equal to or greater than the preset value, whether there is, among the closed pattern candidates having the same support as the frequent pattern, a closed frequent pattern candidate that includes the frequent pattern as a subset.

The apparatus according to claim 11,
wherein the plurality of processing nodes,
when the number of projection data stored in the projection data providing unit is equal to or greater than a preset value, generate only one piece of projection data from the allocated projection data and store the allocated projection data together with the frequent item to be projected next.