KR101761177B1 - Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same - Google Patents

Info

Publication number
KR101761177B1
Authority
KR
South Korea
Prior art keywords
item
user
data
mining
tree
Prior art date
Application number
KR1020150167957A
Other languages
Korean (ko)
Other versions
KR20170062308A (en)
Inventor
윤은일
김동규
양흥모
이강인
신동춘
Original Assignee
세종대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 세종대학교산학협력단
Priority to KR1020150167957A
Publication of KR20170062308A
Application granted
Publication of KR101761177B1

Classifications

    • G06F17/30539
    • G06F17/30327
    • G06F17/30339
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The top-K important pattern mining method comprises the steps of: (a) generating a user header table including a creation time, user characteristic information, and a user data link; (b) receiving user data, performing a data tree visit based on each of at least one item, and updating the data tree path; and (c) adding the user data as a tail node to the tree end node on the data tree path and sequentially connecting the tail nodes associated with the user, with the corresponding user data link as the start link. The method can therefore mine the top-K important patterns corresponding to a specific time range of the generation time of the user data, and can provide average characteristic information for the data group or transactions to which each pattern belongs.

Description

The present invention relates to a top-K important pattern mining technique and, more particularly, to a top-K important pattern mining method that mines the top-K important patterns corresponding to a specific time range of the generation time of user data, an apparatus for performing the method, and a recording medium storing the same.

Conventional mining techniques receive a minimum frequency threshold from the user in order to mine frequent patterns from a database. Setting an appropriate threshold is difficult, however. If the threshold is set too high, no frequent patterns may be mined at all. Conversely, if it is set too low, too many frequent patterns are extracted, which makes analysis difficult and makes the mining process take considerable time. As a result, the user must repeat the mining operation several times to find an appropriate threshold.
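The threshold problem described above can be illustrated with a small sketch. The toy database, the function names `frequent_items` and `top_k_items`, and the tie-breaking rule by item name are all assumptions made for illustration; they are not taken from the patent.

```python
from collections import Counter

# Hypothetical toy database: each transaction is a set of items.
transactions = [{"A", "C"}, {"A", "B", "D"}, {"A", "C"}, {"B"}]

# Support of each single item = number of transactions that contain it.
support = Counter(item for t in transactions for item in t)


def frequent_items(min_support):
    """Threshold-based selection: the result size depends on the guess."""
    return sorted(i for i, s in support.items() if s >= min_support)


def top_k_items(k):
    """Top-K selection: the caller fixes the result size instead."""
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


too_high = frequent_items(4)  # threshold too high: nothing is returned
too_low = frequent_items(1)   # threshold too low: every item is returned
best_two = top_k_items(2)     # exactly two results, no threshold needed
```

With a top-K request the user states how many results are wanted rather than guessing a frequency cut-off, which is the motivation the text gives for the top-K approach.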

The prior art has also proposed various top-K frequent pattern mining methods, but these techniques simply mine the top-K frequent patterns of each database without considering the generation time of the data or the user characteristic information attached to each piece of data. Because they use only the basic information in the database, they cannot produce effective results for recent data types such as SNS data, in which user characteristics are an important factor.

Korean Patent Registration No. 10-0913027 relates to a data mining method that analyzes a large data set to find feature information, and to a data mining system implemented using the method. It presents a method for obtaining mining results in real time by optimizing the memory usage and mining time required in the sequential pattern search process, and discloses a method for effectively tracking changes in the mining results as the data set changes.

Korean Patent Registration No. 10-1317540 relates to a method for mining maximal frequent patterns in consideration of weights. To mine meaningful patterns effectively from a large amount of transaction data, it provides a maximal frequent pattern mining method that mines quickly and efficiently by avoiding duplication between frequent patterns, and a weighted frequent pattern mining method that excludes patterns composed of relatively less important items.

Korean Registered Patent No. 10-0913027 (registered on Aug. 12, 2009)
Korean Registered Patent No. 10-1317540 (registered on Mar. 10, 2014)

One embodiment of the present invention provides a top-K important pattern mining method and apparatus that mine the top-K important patterns corresponding to a specific time range of the generation time of user data and provide average characteristic information for the data group or transactions to which each pattern belongs.

One embodiment of the present invention provides a top-K important pattern mining method and apparatus that mine the top-K important patterns according to the generation time of user data without requiring a minimum frequency threshold to be set.

One embodiment of the present invention provides a top-K important pattern mining method and apparatus that provide mining results for the top-K important patterns of each period, based on the time over which the data has accumulated.

Among the embodiments, the top-K important pattern mining method comprises the steps of: (a) generating a user header table including a creation time, user characteristic information, and a user data link; (b) receiving, from the user header table, user data that can be expressed as a combination of at least one item, performing a data tree visit based on each of the at least one item, and updating a data tree path; and (c) additionally associating the user data as a tail node with a tree end node on the data tree path, and sequentially connecting the tail nodes associated with the user, with the corresponding user data link as the start link.

The top-K important pattern mining method may further include: (d) selecting user data for a specific time range from the data tree path to generate an item header table, performing a mining tree visit based on the item information in the selected user data, and updating the mining tree path. The method may further include: (e) additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path, and sequentially connecting at least the general nodes or tail nodes associated with the item, with the corresponding item link as the start link.

In one embodiment, step (e) may include generating a tail node table based on the tail nodes associated with tree end nodes on the mining tree path, and connecting those tail nodes with the corresponding tail node link as the start link. Step (e) may also include extracting a characteristic value for each path based on the characteristic information stored in the tail node of each mining tree path, and deriving a representative characteristic value of the mining tree path.

The specific time range may be specified by setting a start time and an end time for the generation time of the user data. In one embodiment, the item header table may include an item name, an item frequency, and an item link. In one embodiment, the tail node on the mining tree path may store property information and tail node link information for the top K data groups or transactions for a particular time span.
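The time range selection described above can be sketched as a simple filter. The record layout `(creation time, user, items)`, the function name `select_time_range`, and the choice of an inclusive range are assumptions for illustration; the text only states that a start time and an end time are set.

```python
from datetime import datetime

# Hypothetical user data records: (creation time, user, items).
records = [
    (datetime(2015, 1, 5), "user1", ("A", "C")),
    (datetime(2015, 3, 9), "user2", ("A", "B", "D")),
    (datetime(2015, 6, 1), "user1", ("B",)),
]


def select_time_range(records, start, end):
    """Keep records whose creation time lies in [start, end] (assumed inclusive)."""
    return [r for r in records if start <= r[0] <= end]


selected = select_time_range(
    records, datetime(2015, 1, 1), datetime(2015, 3, 31)
)
selected_users = [user for _, user, _ in selected]
```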

Step (d) may include updating the item header table with the selected user data in descending order of item frequency, and updating the mining tree path by performing a mining tree visit in descending order of item frequency. Step (d) may further include additionally associating the items in the user data as tail nodes until all the selected user data has been processed in the mining tree path.

In one embodiment, step (a) may include scanning the user data stored in the database and updating the user header table based on the scanned data until all the user data has been processed. In step (a), if the user corresponding to the scanned data does not exist in the user header table, characteristic information about the user is added to the user header table; if the user already exists, the characteristic information for that user is updated.

In one embodiment, the user characteristic information may include a data creation time, a data length, a data creation count, and at least one item of item information.

In an embodiment, the top-K important pattern mining apparatus includes: a user header table generation unit that generates a user header table including a generation time, user characteristic information, and a user data link; a data tree path update unit that receives, from the user header table generation unit, user data that can be expressed as a combination of at least one item, performs a data tree visit based on each of the at least one item, and updates a data tree path; and a user data link connection unit that additionally associates the user data as a tail node with a tree end node on the data tree path and sequentially connects the tail nodes related to the user, with the corresponding user data link as the start link.

The top-K important pattern mining apparatus may further include a mining tree path update unit that selects user data for a specific time range from the data tree path to generate an item header table, performs a mining tree visit based on the item information in the selected user data, and updates the mining tree path. The apparatus may further include an item link connection unit that additionally associates an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connects at least the general nodes or tail nodes related to the item, with the corresponding item link as the start link.

Among the embodiments, the computer-readable recording medium on which a program implementing the top-K important pattern mining method is recorded includes: a function of generating a user header table including a generation time, user characteristic information, and a user data link; a function of receiving, from the user header table, user data that can be expressed as a combination of at least one item, performing a data tree visit based on each of the at least one item, and updating a data tree path; a function of additionally associating the user data as a tail node with a tree end node on the data tree path and sequentially connecting the tail nodes related to the user, with the corresponding user data link as the start link; a function of selecting user data for a specific time range from the data tree path to generate an item header table, performing a mining tree visit based on the item information in the selected user data, and updating a mining tree path; and a function of additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connecting at least the general nodes or tail nodes of the item, with the corresponding item link as the start link.

The disclosed technique may have the following effects. It is to be understood, however, that the scope of the disclosed technology is not to be construed as limited thereby, as it is not meant to imply that a particular embodiment should include all of the following effects or only the following effects.

The top-K important pattern mining method according to an embodiment of the present invention can mine the top-K important patterns corresponding to a specific time range of the generation time of the user data and provide average characteristic information for the data group or transactions to which each pattern belongs.

The top-K important pattern mining method according to an embodiment of the present invention can mine the top-K important patterns according to the generation time of the user data without requiring a minimum frequency threshold to be set.

The top-K important pattern mining method according to an embodiment of the present invention can provide mining results for the top-K important patterns of each period, based on the data accumulation time.

FIG. 1 is a block diagram illustrating a top-K important pattern mining system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the top-K important pattern mining apparatus of FIG. 1.
FIG. 3 is a flowchart illustrating the top-K important pattern mining process performed by the top-K important pattern mining apparatus of FIG. 1.
FIG. 4 is a flowchart illustrating the process of updating a data tree path performed by the top-K important pattern mining apparatus of FIG. 1.
FIG. 5 is a diagram showing a user header table and a data tree generated and updated by the top-K important pattern mining apparatus of FIG. 1.
FIG. 6 is a flowchart illustrating the process of updating a mining tree path performed by the top-K important pattern mining apparatus of FIG. 1.
FIG. 7 is a diagram showing an item header table and a mining tree generated and updated by the top-K important pattern mining apparatus of FIG. 1.

The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should not be construed as being limited by the embodiments described in the text. That is, the embodiments are to be construed as being variously embodied and having various forms, so that the scope of the present invention should be understood to include equivalents capable of realizing technical ideas. Also, the purpose or effect of the present invention should not be construed as limiting the scope of the present invention, since it does not mean that a specific embodiment should include all or only such effect.

Meanwhile, the meaning of the terms described in the present application should be understood as follows.

The terms "first", "second", and the like are intended to distinguish one element from another, and the scope of the right should not be limited by these terms. For example, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

It is to be understood that when an element is referred to as being "connected" to another element, it may be directly connected to the other element, but there may be other elements in between. On the other hand, when an element is referred to as being "directly connected" to another element, it should be understood that there are no other elements in between. Other expressions that describe the relationship between components, such as "between" and "directly between" or "neighboring" and "directly adjacent to", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the presence of a stated feature, number, step, operation, element, component, or combination thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

In each step, identification codes (e.g., a, b, c) are used for convenience of explanation. The identification codes do not describe the order of the steps, and unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may be performed in the stated order, substantially concurrently, or in reverse order.

The present invention can be embodied as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code can be stored and executed in a distributed manner.

All terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. Terms defined in commonly used dictionaries should be interpreted consistently with their meanings in the context of the related art and cannot be interpreted as having an ideal or overly formal meaning unless explicitly defined in the present application.

FIG. 1 is a block diagram illustrating a top-K important pattern mining system according to an embodiment of the present invention.

Referring to FIG. 1, a top-K important pattern mining system 100 includes a user terminal 110, a top-K important pattern mining apparatus 120, and a database 130.

The user terminal 110 may access the top-K important pattern mining apparatus 120 under the control of the user and request the mining of top-K important patterns. Here, a top-K important pattern may correspond to an important pattern whose item frequency ranks within the top K, based on the item information included in a plurality of user data. The user terminal 110 may display on its screen the mining result information or the characteristic information of the data group provided by the top-K important pattern mining apparatus 120. The user terminal 110 may correspond to a mobile terminal, a tablet PC, a laptop PC, or a desktop PC.

The top-K important pattern mining apparatus 120 may perform important pattern mining on the data stored in the database 130 at the request of the user terminal 110. More specifically, the top-K important pattern mining apparatus 120 extracts the user data within a specific time range based on the generation time of the user data, and generates and updates the item header table and the mining tree with the extracted user data. The top-K important pattern mining apparatus 120 may extract a characteristic value for each path based on the item link of the mining tree path, and derive a representative characteristic value of the mining tree path. The top-K important pattern mining apparatus 120 may provide the user terminal 110 with the mining result information or the characteristic information of the data group.

FIG. 2 is a block diagram showing the top-K important pattern mining apparatus of FIG. 1.

Referring to FIG. 2, the top-K important pattern mining apparatus 120 includes a user header table generation unit 210, a data tree path update unit 220, a user data link connection unit 230, a mining tree path update unit 240, an item link connection unit 250, and a control unit 260.

The user header table generation unit 210 may generate a user header table including the generation time, the user characteristic information, and the user data link. Here, the generation time may correspond to the time when the user data was generated and stored in the database 130. In one embodiment, the user characteristic information may include a data creation time, a data length, a data creation count, and at least one item of item information.

The data tree path update unit 220 receives, from the user header table generation unit 210, user data that can be represented by a combination of at least one item, performs a data tree visit based on each of the at least one item, and can update the data tree path. Here, a data tree path may correspond to an arrangement of a plurality of items built on the basis of the user characteristic information. That is, the data tree path may be formed based on a combination of a plurality of items and the frequency of those items.

The user data link connection unit 230 can additionally associate user data as a tail node to a tree end node on the data tree path and sequentially connect a tail node related to the user with the corresponding user data link as a start link.

The mining tree path update unit 240 may generate the item header table by selecting user data for a specific time range from the data tree path. Here, the specific time range can be specified by setting the start time and the end time for the generation time of the user data. In one embodiment, the item header table may include an item name, an item frequency (or item rating), and an item link.

The mining tree path update unit 240 updates the mining tree path by performing a mining tree visit based on the item frequency in the selected user data. In one embodiment, the tail node on the mining tree path may store property information and tail node link information for the top K data groups or transactions for a particular time span.

In one embodiment, the mining tree path update unit 240 updates the item header table in descending order of item frequency and updates the mining tree path by performing a mining tree visit in descending order of item frequency.

The item link connection unit 250 additionally associates an item in the selected user data as a tail node with a tree end node on the mining tree path, and sequentially connects at least the general nodes or tail nodes related to the item, with the corresponding item link as the start link.

The item link connection unit 250 may generate a tail node table based on a tail node associated with a tree end node on a mining tree path and connect the corresponding tail node link to the tail node with a start link. The item link connection unit 250 can extract a characteristic value for a corresponding path based on characteristic information stored in a tail node of each of the mining tree paths, and derive a representative characteristic value of the mining tree path.

The control unit 260 controls the overall operation of the top-K important pattern mining apparatus 120, including the operation of the user header table generation unit 210, the data tree path update unit 220, the user data link connection unit 230, the mining tree path update unit 240, and the item link connection unit 250.

FIG. 3 is a flowchart illustrating the top-K important pattern mining process performed by the top-K important pattern mining apparatus of FIG. 1.

Referring to FIG. 3, the user header table generation unit 210 may scan user data stored in the database (step S301).

The data tree path update unit 220 may generate a data tree including the generation time and the user characteristic information (step S302). The data tree path update unit 220 may perform a data tree visit based on each of the at least one item and update the data tree path (step S303). The process of generating and updating the data tree is described in detail below with reference to FIGS. 4 and 5.

The mining tree path update unit 240 may receive a mining request for top-K important patterns from a user through the user terminal 110. The mining tree path update unit 240 may receive the specific time range for the generation time of the user data and the value of K, both set by the user (step S304).

The mining tree path update unit 240 may generate a mining tree for the top-K important patterns based on the specified time range and the value of K (step S305). The mining tree path update unit 240 may perform a mining tree visit based on the item frequency and update the mining tree path (step S306). The process of generating and updating the mining tree is described in detail below with reference to FIGS. 6 and 7.

The item link connection unit 250 can extract a characteristic value based on the characteristic information of the top-K important patterns (step S307). More specifically, the item link connection unit 250 may connect the item link of the item header table to at least one general node and tail node for the item. The user data may be represented by a combination of at least one item and may include item information associated with each item. That is, the item link connection unit 250 can sequentially connect the at least one general node and tail node for an item, with the corresponding item link as the start link.

The item link connection unit 250 may derive representative characteristic values of the mining tree path to provide mining result information (step S308). More specifically, the item link connection unit 250 extracts a characteristic value for each path based on the item link information (or the characteristic information stored in the tail nodes of the mining tree), derives a representative characteristic value of the mining tree path, and provides the mining result information or the characteristic information of the data group to the user terminal 110.
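As a rough illustration of deriving a representative characteristic value, the sketch below averages numeric characteristics stored in the tail nodes of one path. Using the arithmetic mean is an assumption suggested by the document's mention of "average characteristic information"; the field names `data_length` and `creation_count` and the function `representative_value` are hypothetical.

```python
from statistics import mean

# Hypothetical characteristic information stored in the tail nodes
# reached through one mining tree path.
tail_nodes = [
    {"user": "user1", "data_length": 2, "creation_count": 3},
    {"user": "user2", "data_length": 4, "creation_count": 1},
]


def representative_value(tail_nodes, key):
    """Average one numeric characteristic over the given tail nodes."""
    return mean(node[key] for node in tail_nodes)


avg_length = representative_value(tail_nodes, "data_length")
avg_count = representative_value(tail_nodes, "creation_count")
```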

FIG. 4 is a flowchart illustrating the process of updating a data tree path performed by the top-K important pattern mining apparatus of FIG. 1, and FIG. 5 is a diagram showing a user header table and a data tree generated and updated by that apparatus.

Referring to FIGS. 4 and 5, the user header table generation unit 210 may scan the user data stored in the database (step S401). The user header table generation unit 210 may generate a user header table 510 including the generation time, the user characteristic information, and the user data link based on the scanned user data.

In one embodiment, the user header table 510 may include a user item, a characteristic information item, and a link item. The user item distinguishes users by name or identification code. The characteristic information item can store the data creation time, the data length, the data creation count, and at least one item of item information included in the user data. The link item may include a user data link that is connected to the tail node 524 for that user.

The user header table generation unit 210 may add information on the user data to the user header table 510 and update it (step S402). More specifically, the user header table generation unit 210 may scan the user data stored in the database and update the user header table based on the scanned data until all the user data has been processed. In one embodiment, if the user corresponding to the scanned user data (the scan data) does not exist in the user header table 510, the user header table generation unit 210 adds the characteristic information about that user to the user header table 510. For example, the information for each new user may be added as a new row in the user header table 510.

If the user corresponding to the scan data already exists in the user header table 510, the user header table generation unit 210 can update the characteristic information for that user. For example, new user characteristic information for an existing user may be added to the characteristic information item for that user.
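The add-or-update rule for the user header table can be sketched with a dictionary keyed by user. The record fields, the table layout, and the function name `update_user_header` are assumptions for illustration and are not specified in the document.

```python
def update_user_header(table, record):
    """Add a row for a new user, or extend the row of an existing user."""
    info = {
        "created": record["created"],
        "length": len(record["items"]),
        "items": record["items"],
    }
    user = record["user"]
    if user not in table:
        # New user: add a row with its first characteristic information.
        table[user] = {"characteristics": [info], "data_link": None}
    else:
        # Existing user: append the new characteristic information.
        table[user]["characteristics"].append(info)
    return table


# Hypothetical scan data read from the database.
scan_data = [
    {"user": "user1", "created": 1, "items": ("A", "C")},
    {"user": "user2", "created": 2, "items": ("A", "B", "D")},
    {"user": "user1", "created": 3, "items": ("B",)},
]

user_header = {}
for record in scan_data:
    update_user_header(user_header, record)
```

Here the data creation count for a user is simply the length of that user's `characteristics` list.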

The data tree path update unit 220 may update the data tree path by receiving the user data from the user header table generation unit 210 (step S403). More specifically, the data tree path update unit 220 can receive the user data and construct the data tree 520. The data tree path update unit 220 may perform a data tree visit based on each of at least one item of user data.

In one embodiment, the data tree 520 may include tree intermediate nodes and tree end nodes. The data tree 520 may also include general nodes 522 and tail nodes 524. That is, a tree intermediate node of the data tree 520 may be associated with a general node 522, and a tree end node may be associated with a tail node 524.

In one embodiment, each of the at least one item contained in the user data may be stored in a general node 522 or a tail node 524. More specifically, the item frequency of each general node 522 may correspond to the sum of the item frequencies stored in the at least one tail node 524 connected to it. That is, each data tree path can be represented by a combination of at least one item, and items shared between paths can be stored in an upper node (a general node). For example, the general node 522 and the tail node 524 may store item information in the form [item name: item frequency]. For the item combinations {A, C} and {A, B, D}, for instance, entries such as C: 2 and D: 2 may be stored in tail nodes 524.
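The relationship between general node frequencies and tail node frequencies can be sketched with a small prefix tree. The `Node` class, the `tail_count` field, and the sample combinations are assumptions for illustration; the sketch only demonstrates that a node's frequency equals the sum of the tail frequencies below it.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0       # item frequency stored in this node
        self.tail_count = 0  # number of item combinations ending here
        self.children = {}


def insert(root, items):
    """Walk or extend the path for one item combination."""
    node = root
    for item in items:
        if item not in node.children:
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1
    node.tail_count += 1  # the last node on the path acts as the tail node


def tail_sum(node):
    """Sum of the tail frequencies in the subtree rooted at node."""
    return node.tail_count + sum(tail_sum(c) for c in node.children.values())


root = Node(None)
for combination in [("A", "C"), ("A", "B", "D"), ("A", "C")]:
    insert(root, combination)

node_a = root.children["A"]  # shared item stored in an upper node
```

In this toy tree the shared item A is stored once with frequency 3, while the tail nodes for C and D hold frequencies 2 and 1.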

In one embodiment, the tail node 524 may additionally store various characteristic information included in the scan data. Here, the various characteristic information included in the scan data may include user information, user characteristic information, and user data link information associated with a combination of at least one item.

The data tree path update unit 220 may generate a tail node at the tree end node on the data tree path (step S404). The data tree path update unit 220 may store property information of the corresponding user data in the tail node (step S405).

The user data link connection unit 230 may connect the user data link of the user header table 510 to the tail nodes 524 related to the user (step S406). More specifically, the user data may comprise a plurality of pieces of user characteristic information associated with each user. That is, the user data link connection unit 230 may sequentially connect the tail nodes 524 related to the user, with the corresponding user data link as the start link. The user data link connection unit 230 may provide the user data link information to the mining tree path update unit 240, which can then use that information to extract the user data efficiently.
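One way to picture the user data link is as the head of a singly linked chain through a user's tail nodes. The sketch below assumes that reading; the `TailNode` class, the `last` bookkeeping field, and the function names are hypothetical.

```python
class TailNode:
    def __init__(self, user, items, created):
        self.user = user
        self.items = items
        self.created = created
        self.next_link = None  # next tail node belonging to the same user


def link_tail_node(user_header, tail):
    """Append a tail node to the chain that starts at the user data link."""
    entry = user_header.setdefault(tail.user, {"data_link": None, "last": None})
    if entry["data_link"] is None:
        entry["data_link"] = tail       # the user data link is the start link
    else:
        entry["last"].next_link = tail  # connect sequentially
    entry["last"] = tail


def user_data(user_header, user):
    """Follow the chain to visit every tail node of one user."""
    node = user_header[user]["data_link"]
    while node is not None:
        yield node
        node = node.next_link


header = {}
for user, items, created in [
    ("user1", ("A", "C"), 1),
    ("user2", ("A", "B", "D"), 2),
    ("user1", ("B",), 3),
]:
    link_tail_node(header, TailNode(user, items, created))

user1_items = [tail.items for tail in user_data(header, "user1")]
```

Following the chain visits only the tail nodes of the requested user, which is one plausible reason the links make later extraction efficient.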

The data tree path update unit 220 may scan the user data until there is no user data to be processed in the database (step S407).

If there is no user data to be processed in the database, the data tree path update unit 220 can update the data tree path (step S408).

FIG. 6 is a flowchart illustrating the process of updating a mining tree path performed by the top-K important pattern mining apparatus of FIG. 1, and FIG. 7 is a diagram showing an item header table and a mining tree generated and updated by that apparatus.

Referring to FIGS. 6 and 7, the mining tree path update unit 240 may select user data for a specific time range 710 from the data tree path (step S601). In one embodiment, the mining tree path update unit 240 may select the top-K important patterns using a divide-and-conquer algorithm implemented as a recursive function.
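The document does not spell out the recursive divide-and-conquer procedure, so the following is only a generic sketch of one common approach: recursive pattern growth over projected transaction lists, with a size-K min-heap whose smallest support acts as a rising threshold. The function name `top_k_patterns` and the toy data are assumptions, and the sketch does not use the patent's tree structures.

```python
import heapq
from collections import Counter


def top_k_patterns(transactions, k):
    """Return the k most frequent itemsets as (support, itemset) pairs."""
    heap = []  # min-heap of (support, itemset); heap[0] is the weakest entry

    def threshold():
        # Until k patterns are known, every pattern with support >= 1 qualifies.
        return heap[0][0] if len(heap) == k else 1

    def grow(prefix, projected):
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            support = counts[item]
            if support < threshold():
                continue  # supersets cannot have higher support, so prune
            pattern = prefix + (item,)
            if len(heap) < k:
                heapq.heappush(heap, (support, pattern))
            else:
                heapq.heappushpop(heap, (support, pattern))
            # Divide: recurse on transactions containing the item,
            # restricted to items that sort after it.
            conditional = [
                tuple(i for i in t if i > item) for t in projected if item in t
            ]
            grow(pattern, conditional)

    grow((), [tuple(sorted(set(t))) for t in transactions])
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))


transactions = [("A", "C"), ("A", "B", "D"), ("A", "C"), ("B",)]
top_one = top_k_patterns(transactions, 1)
top_four = top_k_patterns(transactions, 4)
```

Ties at the K-th position are broken arbitrarily by tuple ordering in this sketch; a real implementation would need an explicit tie policy.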

The mining tree path update unit 240 generates an item header table 720 based on the selected user data and generates a mining tree for the top-K important patterns based on the item frequency (or item support) of the selected user data (step S602).

In one embodiment, the item header table 720 may include an item field, a support field, and a link field. The item field distinguishes items by item name or identification code. The support field may indicate the item frequency, that is, the number of times the item appears in the selected user data, or the degree of support of the item. The link field may include an item data link connected to a general node 732 or tail node 734 for that item.

In one embodiment, the user data link is connected only to the tail nodes 524 for a user, whereas the item data link may be connected to a general node 732 or a tail node 734 for an item. That is, the item data link may be associated with at least one general node 732 or tail node 734. For example, the item data link for item C may be associated with a general node 732d and a tail node 734a.

The mining tree path update unit 240 may add the selected user data to the item header table 720 and update the table in descending order of item frequency (step S603). More specifically, the mining tree path update unit 240 may additionally associate the item information in the user data as a tail node until the processing of all the user data selected in the mining tree path is completed. In an embodiment, the mining tree path update unit 240 may add an item to the item header table 720 if the item included in the selected user data (or the selection data) does not exist in the item header table 720. For example, information for each item may be added to a new row of the item header table 720 based on the descending order of the item frequency.

On the other hand, the mining tree path update unit 240 can update the frequency (or the degree of support) and item information of the item if the item corresponding to the selection data exists in the item header table 720. For example, the mining tree path update unit 240 may increase the frequency based on the new item information and add information about the item.
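The add-or-update rule for the item header table can be sketched as below. This is an illustrative sketch only: the dictionary layout, the function names, and the alphabetical tie-break between items of equal frequency are assumptions.

```python
def update_item_header(table, items):
    """Add a row for an unseen item, otherwise increase its frequency."""
    for item in items:
        if item not in table:
            # Item not yet in the header table: create a new row
            table[item] = {"support": 0, "link": None}
        table[item]["support"] += 1


def rows_by_descending_support(table):
    """Return item names ordered by descending frequency (ties broken by name)."""
    return sorted(table, key=lambda item: (-table[item]["support"], item))


header = {}
for selected_items in (["A", "C"], ["A", "B", "D"], ["A", "C"]):
    update_item_header(header, selected_items)
ordered = rows_by_descending_support(header)
```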

The mining tree path update unit 240 can update the mining tree path in descending order of item frequency (step S604). More specifically, the mining tree path update unit 240 updates the mining tree path by performing a mining tree visit based on the item frequency. The mining tree path update unit 240 may perform a mining tree visit based on at least one item of user data.
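Inserting items in descending order of frequency lets frequent items share nodes near the root, as in a prefix tree. The sketch below follows the well-known FP-tree style of insertion as a stand-in; it is not claimed to match the mining tree 730 in every detail, and the class and function names are hypothetical.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_mining_tree(transactions):
    """Insert each transaction along a path ordered by descending item frequency."""
    freq = Counter(item for t in transactions for item in set(t))
    root = Node(None)
    for t in transactions:
        ordered = sorted(set(t), key=lambda item: (-freq[item], item))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1   # one more transaction passes through this node
    return root, freq


root, freq = build_mining_tree([["A", "C"], ["A", "B", "D"], ["C", "A"]])
```

In this example both {A, C} transactions follow the same path, so item A is stored once at the top and shared by all three transactions.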

In one embodiment, the mining tree 730 may include a tree intermediate node and a tree end node. In addition, the mining tree 730 may include a general node 732 and a tail node 734. That is, the tree intermediate node of the mining tree 730 may be associated with the general node 732, and the tree end node may be associated with the tail node 734.

In one embodiment, each of the at least one item included in the user data may be stored in the general node 732 or the tail node 734. Here, the tail node 734 may include an item name, item frequency (or item support), characteristic information of a user data group, and creation time information. More specifically, the item frequency of each of the general nodes 732 may correspond to the sum of the item frequencies stored in at least one tail node 734 associated therewith. That is, the mining tree paths can be represented by a combination of at least one item, and items shared by several paths can be stored in an upper node (or a general node). For example, the general node 732 and the tail node 734 may store item information in the form of [item name: item frequency]. Among the combinations of at least one item (such as {A, C} and {A, B, D}), entries such as C: 2, D: 1, and D: 1 may be stored in tail nodes 734.
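The relation between a general node's frequency and the tail nodes below it can be checked with a small recursive sum. In this sketch the tree shape (A above C, and A above B above D) and the `tail_count` field are assumptions chosen for illustration; only the entries C: 2 and D: 1 echo the example in the text.

```python
class Node:
    def __init__(self, item, tail_count=0):
        self.item = item
        self.tail_count = tail_count  # frequency stored when a tail node is attached here
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def frequency(node):
    """Frequency of a node: its own tail count plus that of all nodes below it."""
    return node.tail_count + sum(frequency(child) for child in node.children)


a = Node("A")
a.add(Node("C", tail_count=2))        # path {A, C}, stored as C: 2
b = a.add(Node("B"))
b.add(Node("D", tail_count=1))        # path {A, B, D}, stored as D: 1
```

Here the general node for A has a frequency of 3, which is the sum of the frequencies of the two tail nodes reachable from it.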

In one embodiment, the mining tree 730 may be created and updated based on user data for a particular time period among the data contained in the data tree 520. For example, when the user sets a specific time range for the generation time of user data from March 01 to March 04, the mining tree 730 is formed from only the user data satisfying the specific time range.
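Selecting user data for a time range amounts to a filter on the creation time. The sketch below assumes an inclusive range, a hypothetical year of 2015, and a dictionary layout for each record; none of these details are stated in the text.

```python
from datetime import date


def select_user_data(user_data, start, end):
    """Keep only records whose creation date lies in [start, end] (inclusive)."""
    return [record for record in user_data if start <= record["created"] <= end]


user_data = [
    {"user": "u1", "created": date(2015, 2, 27), "items": ["A", "B"]},
    {"user": "u2", "created": date(2015, 3, 1), "items": ["A", "C"]},
    {"user": "u1", "created": date(2015, 3, 3), "items": ["A", "B", "D"]},
    {"user": "u3", "created": date(2015, 3, 5), "items": ["C"]},
]

# Time range from March 01 to March 04, as in the example above
selected = select_user_data(user_data, date(2015, 3, 1), date(2015, 3, 4))
```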

In one embodiment, the tail node 734 may additionally store various characteristic information included in the selection data. Here, the various characteristic information included in the selection data may include user information, user characteristic information, and item link information associated with a combination of at least one item.

The mining tree path update unit 240 may add and update the tail node information for each user data (step S605). The mining tree path update unit 240 updates the mining tree path by performing a mining tree visit based on the item information in the selected user data.

The mining tree path update unit 240 may add and update the item header table 720 until there is no data to process among the selected user data (step S606). The mining tree path update unit 240 may update the mining tree path if there is no data to process among the selected user data (step S607).

In one embodiment, the item link connection unit 250 may generate the tail node table 740 based on the tail node 734 associated with the tree end node on the mining tree path. The item link connection unit 250 may connect each tail node link of the tail node table 740 to the corresponding tail node as a start link. Through the tail node table 740, the item link connection unit 250 can efficiently manage the item name, item frequency (or item support), characteristic information of the user data group, and creation time information stored in the tail node 734. That is, the item link connection unit 250 can easily derive representative characteristic values through the tail node table 740.
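Deriving a representative characteristic value from the tail node table can be sketched as an average per mining tree path. The choice of the mean, the characteristic "ages", and all values below are hypothetical; the text states only that representative values are derived from the stored characteristic information.

```python
from statistics import mean

# Hypothetical tail node table: one row per tail node on a mining tree path
tail_node_table = [
    {"path": ("A", "C"), "frequency": 2, "ages": [24, 30]},
    {"path": ("A", "B", "D"), "frequency": 1, "ages": [41]},
]


def representative_values(table):
    """Map each mining tree path to the average of its stored characteristic values."""
    return {row["path"]: mean(row["ages"]) for row in table}


representatives = representative_values(tail_node_table)
```

Because every tail node is reachable from the table, the averages are computed without traversing the mining tree again.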

Therefore, the upper K important pattern mining device 120 does not require a threshold value for the minimum frequency. By mining the upper K important patterns corresponding to the specific time range for the generation time of the user data, it can provide average characteristic information for the data group or transaction to which a pattern belongs. Also, the upper K important pattern mining device 120 can provide mining results for various upper K important patterns for each period based on the data accumulation time.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as set forth in the following claims.

100: Top K important pattern mining system
110: User terminal
120: Top K important pattern mining device
130: Database
210: User header table generation unit
220: Data tree path update unit
230: User data link connection unit
240: Mining tree path update unit
250: Item link connection unit
260:
510: User header table
520: Data tree
522: General node
524: Tail node
710: Specific time range
720: Item header table
730: Mining tree
732: General node
734: Tail node
740: Tail node table

Claims (17)

A high-K important pattern mining method performed in a high-K important pattern mining device, the method comprising:
(a) generating a user header table including a creation time, user characteristic information, and a user data link;
(b) receiving user data representative of a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item, and updating a data tree path;
(c) additionally associating the user data as a tail node with a tree end node on the data tree path and sequentially connecting the tail nodes related to the user, with the corresponding user data link as a start link;
(d) generating an item header table by selecting user data for a specific time range from the data tree path, and performing a mining tree visit based on item information in the selected user data to update a mining tree path; and
(e) additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path, and connecting at least a general node or a tail node related to the item, with the corresponding item link as a start link,
wherein step (e) includes generating a tail node table based on the tail node associated with the tree end node on the mining tree path, and connecting each tail node link to the corresponding tail node as a start link.
(deleted)
(deleted)
(deleted)
The method of claim 1, wherein step (e) further comprises
extracting a characteristic value for the corresponding path based on the characteristic information stored in the tail node of each of the mining tree paths, and deriving a representative characteristic value of the mining tree path.
The method of claim 1,
further comprising setting a start time and an end time for the generation time of the user data.
The method of claim 1, wherein the item header table
includes an item name, an item frequency, and an item link.
The method of claim 1, wherein the tail node on the mining tree path
stores characteristic information and tail node link information for a data group or transaction for a specific time range.
The method of claim 1, wherein step (d) comprises
updating the selected user data in the item header table in descending order of item frequency, and updating the mining tree path by performing a mining tree visit in descending order of item frequency.
The method of claim 9, wherein step (d)
further comprises associating an item in the user data as a tail node until processing of all updated user data in the mining tree path is completed.
The method of claim 1, wherein step (a) comprises
scanning the user data stored in the database and updating the user header table based on the scan data until the processing of all the user data is completed.
The method of claim 11, wherein step (a) comprises
adding the characteristic information for the user to the user header table if the user corresponding to the scan data does not exist in the user header table, and updating the characteristic information for the user if the user exists in the user header table.
The method of claim 1,
wherein a data creation time, a data length, a data creation count, and at least one item information are included.
A high-K important pattern mining device comprising:
a user header table generation unit that generates a user header table including a creation time, user characteristic information, and a user data link;
a data tree path update unit that receives user data representative of a combination of at least one item from the user header table generation unit, performs a data tree visit based on each of the at least one item, and updates a data tree path;
a user data link connection unit that additionally associates the user data as a tail node with a tree end node on the data tree path and sequentially connects the tail nodes related to the user, with the corresponding user data link as a start link;
a mining tree path update unit that generates an item header table by selecting user data for a specific time range from the data tree path, and updates a mining tree path by performing a mining tree visit based on the item information in the selected user data; and
an item link connection unit that additionally associates an item in the selected user data as a tail node with a tree end node on the mining tree path and connects at least a general node or a tail node related to the item, with the corresponding item link as a start link,
wherein the item link connection unit generates a tail node table based on the tail node associated with the tree end node on the mining tree path and connects each tail node link to the corresponding tail node as a start link.
(deleted)
(deleted)
A computer-readable recording medium having recorded thereon a program for implementing a top-K important pattern mining method, the program implementing:
a function of generating a user header table including a creation time, user characteristic information, and a user data link;
a function of receiving user data representative of a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item, and updating a data tree path;
a function of additionally associating the user data as a tail node with a tree end node on the data tree path and sequentially connecting the tail nodes related to the user, with the corresponding user data link as a start link;
a function of generating an item header table by selecting user data for a specific time range from the data tree path, and performing a mining tree visit based on the item information in the selected user data to update a mining tree path; and
a function of additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connecting at least a general node or a tail node related to the item, with the corresponding item link as a start link,
wherein the function of sequentially connecting at least a general node or a tail node related to the item comprises generating a tail node table based on the tail node associated with the tree end node on the mining tree path, and connecting each tail node link to the corresponding tail node as a start link.

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150167957A KR101761177B1 (en) 2015-11-27 2015-11-27 Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150167957A KR101761177B1 (en) 2015-11-27 2015-11-27 Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same

Publications (2)

Publication Number Publication Date
KR20170062308A KR20170062308A (en) 2017-06-07
KR101761177B1 true KR101761177B1 (en) 2017-07-25

Family

ID=59223418

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150167957A KR101761177B1 (en) 2015-11-27 2015-11-27 Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same

Country Status (1)

Country Link
KR (1) KR101761177B1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308292A (en) * 2018-11-27 2019-02-05 北京京东尚科信息技术有限公司 Crowd orients method for digging, device and computer readable storage medium
CN110188174B (en) * 2019-04-19 2021-10-29 浙江工业大学 Professional field FAQ intelligent question and answer method based on professional vocabulary mining
KR102079289B1 (en) * 2019-04-23 2020-04-07 주식회사 비닛 Wine recommendation system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101079063B1 (en) * 2010-02-22 2011-11-07 주식회사 케이티 Apparatus and method for association rule mining using frequent pattern-tree for incremental data processing
KR101443285B1 (en) * 2012-11-19 2014-09-22 충북대학교 산학협력단 Method of mining high utility patterns
KR101567338B1 (en) * 2014-08-26 2015-11-10 연세대학교 산학협력단 Apparatus and Method for frequent sub-graph component mining in graph data

Also Published As

Publication number Publication date
KR20170062308A (en) 2017-06-07

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant