KR100792307B1

KR100792307B1 - Maximal Frequent Search Path Pattern Extracting System

Info

Publication number: KR100792307B1
Application number: KR1020050054259A
Authority: KR
Inventors: 장민석
Original assignee: 장민석
Priority date: 2005-06-23
Filing date: 2005-06-23
Publication date: 2008-01-07
Also published as: KR20050079966A

Abstract

The present invention relates to a system and method that receive the results of a search application representable as a graph model (nodes and directed edges) and extract the most frequent search path patterns from them. By providing the search paths that users (or systems) prefer, it supports the efficient design of information retrieval systems, including web search, and the formulation of effective marketing strategies.

In particular, the invention comprises a conversion module that converts actual search information into a graph model, a database (or external file) that stores the graph model produced by the conversion module, a most frequent search path mining module that mines most frequent search path information from the converted graph model, and a result path display GUI module that converts the mining results back into actual search information and displays them.

Its advantage is that it can be applied to any search application information that can be modeled as a directed, unweighted graph (nodes and edges).

Search pattern, path search pattern, data mining, web path search pattern, most frequent search path, pattern mining, directed unweighted graph

Description

Maximal Frequent Search Path Pattern Extracting System

FIG. 1 is a block diagram schematically showing the configuration of the most frequent search path pattern extraction system according to the present invention.

FIG. 2 is an example of a graph model from which the present invention extracts mining information.

FIG. 3 is an operation flowchart explaining the search path mining method of the present invention.

FIG. 4 is an example showing the operations of the flowchart of FIG. 3 applied to sample data.

<Explanation of reference numerals for the main parts of the drawings>

100: search information input

101: graph modeling module

102: database or external file

103: most frequent search path pattern mining module

104: result path display GUI module

300: step 1 of the method; input the graph model

301: step 2 of the method; check the occurrence count of each node

302: step 3 of the method; extract only the subpaths at or above a threshold occurrence count

303: step 4 of the method; take the Cartesian product of subpath elements to generate subpaths that are one node longer

304: step 5 of the method; search the graph model to check the occurrence count of each subpath

305: step 6 of the method; repeat steps 3 through 5

306: step 7 of the method; output the subpath with the greatest occurrence count

400: example of information search paths

401: Lk (subpaths whose frequency is at or above the threshold occurrence count)

402: Ck (subpath candidates generated from Lk)

403: subpath candidates generated from L2

Identifying quickly and accurately how users search for information about products, for example, translates directly into revenue in marketing. Methods for finding users' information search patterns currently fall into two broad groups: one finds rule-based patterns, and the other finds shape-based patterns. The present invention belongs to the latter: users' information search behavior is expressed as a graph of a particular form, and patterns are computed from that graph. Existing methods for finding the information search patterns that users follow most often have the following limitations. First, they apply only to simple cases rather than to general path patterns. Second, because they ignore the order of the search, they find only forward patterns and do not consider reverse search patterns. Third, because no module exists for converting the results of various search applications into graphs, they cannot be applied to other, similar applications. The present invention concerns a system that overcomes these problems by applying an efficient algorithm for finding users' frequent information search patterns.

The present invention proposes a method and system that overcome the above problems. Its object is to provide operators with a system that extracts users' general information search patterns, so that they can learn consumer (user) preferences accurately and quickly and plan their strategy accordingly.

To achieve the above object, one embodiment of the present invention provides a most frequent search path pattern extraction system comprising: a module that receives the search information of users (or systems) as input and converts it into a graph model before search path patterns are extracted from it; a database or external file store in which the converted information is stored; a most frequent search path pattern mining module that takes the converted information as input and extracts the users' frequent search preferences; and a result path display GUI module that visualizes the mining module's results in the form of the original search information or as a graph, so that the search patterns can be readily understood visually.

To achieve the above object, another embodiment of the present invention provides a most frequent search path pattern extraction method, applied in the mining module, comprising: a first step of receiving, from a database or a file, a graph model of the search information obtained from an application; a second step of searching the graph model for the occurrence count of each node, that is, each path of length 1; a third step of extracting only the subpaths whose occurrence count is at or above a threshold initialized in the method; a fourth step of generating, from the extracted subpaths, subpaths whose length is increased by 1; a fifth step of searching the graph model to check the occurrence count of the newly generated subpaths; a sixth step of repeating the preceding steps until the occurrence count of every subpath is 0; and a final step of outputting the subpath with the greatest occurrence count.
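The seven steps can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation: a subpath is taken to "occur" in a recorded path when it appears there as a contiguous, ordered run; the occurrence count is the number of recorded paths containing it; candidates are formed by appending each frequent node to each frequent subpath, skipping an immediate repeat of the same node; and the result is taken to be the last non-empty level. The names `count_support` and `mine_frequent_paths` and the sample paths are hypothetical and are not the data of FIG. 4.

```python
from itertools import product


def count_support(paths, sub):
    """Count the recorded paths that contain `sub` as a contiguous, ordered run."""
    k = len(sub)
    return sum(
        any(tuple(p[i:i + k]) == sub for i in range(len(p) - k + 1))
        for p in paths
    )


def mine_frequent_paths(paths, min_count=2):
    """Level-wise search for the longest subpaths meeting `min_count`."""
    # Steps 1-2: start from paths of length 1, i.e. the individual nodes.
    nodes = sorted({n for p in paths for n in p})
    candidates = [(n,) for n in nodes]
    frequent_nodes = []
    last_level = {}
    while candidates:
        # Steps 2 and 5: count each candidate; step 3: keep those at or above the threshold.
        level = {}
        for cand in candidates:
            support = count_support(paths, cand)
            if support >= min_count:
                level[cand] = support
        if not level:
            break  # Step 6: stop once no candidate survives.
        if not frequent_nodes:
            frequent_nodes = [cand[0] for cand in level]
        last_level = level
        # Step 4: Cartesian product of frequent subpaths and frequent nodes
        # (assumption: an immediate repeat such as A -> A is skipped).
        candidates = [
            sub + (n,) for sub, n in product(level, frequent_nodes) if sub[-1] != n
        ]
    # Step 7: the surviving subpaths with their occurrence counts.
    return last_level


# Hypothetical recorded search paths, one string of node labels per session.
paths = ["ABCD", "CBABC", "ABCE"]
result = mine_frequent_paths(paths, min_count=2)
print(result)
```

Because each round lengthens the candidates by one node, the loop ends at the latest when candidates become longer than every recorded path.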

In each of the above embodiments, the order within each subpath is taken into account when it is compared against the graph model, and the threshold count in the third step may be specified by the user instead of being initialized by the method.

The objects, features, and advantages of the present invention will be more readily understood by reference to the accompanying drawings and the following detailed description.

FIG. 1 is a block diagram schematically showing the most frequent search path pattern extraction system according to the present invention. The system comprises a module (101) that receives the search information (100) of users (or systems) as input and converts it into a graph model before search path patterns are extracted from it; a database or external file store (102) in which the converted information is stored; a most frequent search path pattern mining module (103) that takes the converted information as input and extracts the users' frequent search preferences; and a result path display GUI module (104) that visualizes the mining module's results in the form of the original search information or as a graph, so that the search patterns can be readily understood visually.
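As a minimal sketch of how the conversion module (101) and the result display module (104) might relate, the Python below assigns a node label to each distinct searched item and maps a mined path back to the original items. The function names `encode_sessions` and `decode_path`, the integer labels, and the example page names are assumptions made for illustration; the patent does not specify a data format.

```python
def encode_sessions(sessions):
    """Replace each searched item with a node label; also return the reverse lookup."""
    label_of = {}
    encoded = []
    for session in sessions:
        path = []
        for item in session:
            if item not in label_of:
                label_of[item] = len(label_of)  # next unused integer label
            path.append(label_of[item])
        encoded.append(tuple(path))
    item_of = {label: item for item, label in label_of.items()}
    return encoded, item_of


def decode_path(path, item_of):
    """Turn a path of node labels back into the original searched items."""
    return [item_of[label] for label in path]


# Hypothetical sessions; each entry is the ordered list of pages one user visited.
sessions = [["/home", "/search", "/item"], ["/search", "/home"]]
encoded, item_of = encode_sessions(sessions)
print(encoded)
print(decode_path((1, 2), item_of))
```

Keeping the reverse lookup alongside the encoded paths is what lets a display step present mined results in the form of the original search information.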

FIG. 2 illustrates, as a graph, a general path network for information retrieval, which is the output of the graph modeling module (101). It shows an unweighted graph, in which neither nodes nor edges carry weights. Such a graph is suitable for modeling information search paths; it is not suitable for cases such as transportation or telephone networks, where nodes (or edges) carry weights (traffic volume or call volume). Each node (A, B, C, D, E) in FIG. 2 represents a unit of information (or a website) that a user searches. Each edge may be bidirectional. Taking direction into account, there are n(n-1) edges, twice nC2 (20 here, for n = 5 nodes). Existing work does not consider reverse paths. For example, methods exist that handle a path such as 'ABC', but there is no effective method for a path such as 'ABCAE'.
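The graph model described above can be sketched as follows, assuming each recorded search path is a string of single-character node labels. `to_graph` and `all_directed_edges` are hypothetical helper names; the sketch only illustrates that edges are directed and unweighted, and that five nodes admit 5 × 4 = 20 directed edges.

```python
from itertools import permutations


def to_graph(paths):
    """Collect the nodes and the directed, unweighted edges seen in the paths."""
    nodes = set()
    edges = set()
    for path in paths:
        nodes.update(path)
        # Each consecutive pair (u, v) is a directed edge u -> v; no weight is kept.
        edges.update(zip(path, path[1:]))
    return nodes, edges


def all_directed_edges(nodes):
    """Every ordered pair of distinct nodes, i.e. n * (n - 1) possible edges."""
    return list(permutations(nodes, 2))


nodes, edges = to_graph(["ABCAE"])
print(sorted(edges))
print(len(all_directed_edges("ABCDE")))
```

Storing edges as ordered pairs keeps A -> B distinct from B -> A, which is what allows a path such as 'ABCAE' to revisit node A.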

FIG. 3 is an operation flowchart explaining the method applied in the most frequent search path pattern mining module (103), and FIG. 4 shows an example of its operation.

Referring to FIG. 4, the information search path table (400) illustrates a number of search paths that may be stored in the database or external file (102). These values are what the steps of FIG. 3 (300 to 306) scan when checking occurrence counts. Here Lk (401) denotes the k-th set of frequent traversal paths, and Ck (402) is the candidate set that contains Lk: Lk consists of the element paths in Ck whose occurrence count meets the threshold. The final result of the method is the Lk that remains after the last step (306).

C1 denotes the shortest paths and therefore lists every node. The frequency of each is computed by scanning D, the recorded search paths (301), and the traversal paths whose support is at or above a frequency that is either initialized in advance or chosen by the user are selected as L1 (302). Here L1 is obtained by removing nodes D and E, which do not meet the minimum frequency of 2 (401). Next, the traversal paths of length 2 built from the paths in L1 are generated (303). Unlike existing methods, direction (including the reverse direction) is taken into account, so each path is written out with its order preserved. Thus, where an existing method would consider AB, BC, and AC as Ck, six paths are considered here: AB, AC, BC, BA, CA, and CB (402). When their frequencies are computed over D, what is counted is how often the path itself occurs, not how often its nodes occur together as a set. For the path CB, for example, C and B appear in every path, but the path CB appears only in case 2 (400), so its frequency is just 1. Selecting the paths in C2 whose occurrence count is 2 or more yields L2 (401, 304). Continuing these steps (305) finally yields a result path such as L3 (401): removing from C3 the paths that do not meet the support of 2 leaves the path ABC, which is output as the result (401, 306). Among the extended subpaths (403) obtained from L2, those in which the same node is repeated are meaningless, so paths of length 3, such as the subpaths in C3, also exist (402).
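The difference between counting a path "as a set" and counting the path itself can be illustrated with a short Python sketch. The three sample paths are hypothetical (they are not the table of FIG. 4), and treating an occurrence as a contiguous substring match is an assumption; `set_support` and `ordered_support` are names introduced here for illustration.

```python
def set_support(paths, items):
    """Set-style count: paths that contain all the items, in any order."""
    return sum(set(items) <= set(path) for path in paths)


def ordered_support(paths, sub):
    """Order-aware count: paths that contain `sub` as a contiguous run."""
    return sum(sub in path for path in paths)


# Hypothetical recorded paths; only the second one goes from C directly to B.
paths = ["ABCD", "ACBE", "ABCE"]

print(set_support(paths, "CB"))      # C and B both appear in every path
print(ordered_support(paths, "CB"))  # the path C -> B itself appears once
print(ordered_support(paths, "BC"))  # the reverse direction is counted separately
```

With a threshold of 2, the set-style count would keep {B, C} as frequent, while the order-aware count keeps BC and rejects CB.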

The present invention is not limited to the embodiments described above. Those skilled in the art may make various modifications and changes, which fall within the spirit and scope of the invention as defined in the appended claims.

The present invention thus proposes an efficient method and system for extracting frequent search paths from general information search paths. Its distinguishing features are that it takes reverse paths into account and that, by treating paths as ordered sequences rather than sets, it can extract the most general shape-based patterns.

Claims

A system that receives the results of an information retrieval application representable as a graph model and extracts the most frequent search path pattern from them, the system comprising:

a module that receives the search information of users (or systems) as input and converts it into a graph model before search path patterns are extracted from it;

a database or external file store in which the converted information is stored;

a most frequent search path pattern mining module that takes the converted information as input and extracts the users' frequent search preferences; and

a result path display GUI module that visualizes the results of the mining module in the form of the original search information or as a graph, so that the search patterns can be readily understood visually.

The system of claim 1, wherein the graph model

is a directed, unweighted graph in which the information or websites searched by a user (or system) correspond to nodes and the searched paths correspond to directed edges.

A search path mining method applied in the mining module, comprising:

a first step of receiving, from a database or a file, a graph model of the search information obtained from an application;

a second step of searching the graph model for the occurrence count of each node, that is, each path of length 1;

a third step of extracting only the subpaths whose occurrence count is at or above a threshold initialized in the method;

a fourth step of generating, from the extracted subpaths, subpaths whose length is increased by 1;

a fifth step of searching the graph model to check the occurrence count of the newly generated subpaths;

a sixth step of repeating the preceding steps until the occurrence count of every subpath is 0; and

a seventh step of finally outputting the subpath with the greatest occurrence count.

The method of claim 3, wherein, in the third step,

the threshold occurrence count is either initialized by the method or designated separately by a user.

The method of claim 3, wherein the subpaths of the third, fourth, fifth, sixth, and seventh steps

are handled as ordered sequences rather than as sets, so that the order of their nodes is preserved.

The most frequent search path pattern mining method, wherein reverse paths as well as forward paths are included, so that node duplication within a subpath is allowed.