KR101761177B1 - Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same - Google Patents
Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same
- Publication number
- KR101761177B1, KR1020150167957A, KR20150167957A
- Authority
- KR
- South Korea
- Prior art keywords
- item
- user
- data
- mining
- tree
- Prior art date
Classifications
- G06F17/30539
- G06F17/30327
- G06F17/30339
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The top-K important pattern mining method comprises: (a) generating a user header table including a creation time, user characteristic information, and a user data link; (b) receiving user data, performing a data tree visit based on each of the at least one item, and updating the data tree path; and (c) adding the user data as a tail node to the tree end node on the data tree path and sequentially connecting the tail nodes associated with the user, with the corresponding user data link as the start link. The method can therefore mine the top-K important patterns corresponding to a specific time range of user-data generation times and provide average characteristic information for the data group or transaction to which each pattern belongs.
Description
The present invention relates to a top-K important pattern mining technique, and more particularly to a top-K important pattern mining method that mines the top-K important patterns corresponding to a specific time range of user-data generation times, an apparatus performing the method, and a recording medium storing the same.
The mining technique according to the prior art receives a minimum frequency threshold from the user in order to mine frequent patterns from a database. However, it is difficult to set an appropriate threshold. If the threshold is set too high, no frequent patterns may be mined. Conversely, if it is set too low, too many frequent patterns are extracted, which makes analysis difficult and makes the mining process take considerable time. The conventional technique therefore forces the user to repeat the mining run several times to find an appropriate threshold.
Meanwhile, the prior art has proposed a variety of top-K frequent pattern mining methods, but these techniques simply perform top-K frequent pattern mining on each database without considering the generation time of the data or the user characteristic information for each data item. Because the mining uses only the basic information in the database, effective results cannot be obtained for recent data types such as SNS data, in which user characteristics are an important factor.
Korean Patent Registration No. 10-0913027 relates to a data mining method that analyzes large data sets to find feature information, and to a data mining system implemented using that method. It presents a method for obtaining mining results in real time by optimizing the memory usage and mining time required in a sequential pattern search, and discloses a method for effectively tracking how the mining results change as the data set changes.
Korean Patent Registration No. 10-1317540 relates to a method for mining maximal frequent patterns considering weights. To mine meaningful patterns effectively from a large amount of transaction data, it provides a mining method in which maximal frequent pattern mining avoids duplication among frequent patterns and can therefore be performed quickly and efficiently, and in which weighted frequent pattern mining excludes patterns composed of relatively less important items from the frequent patterns.
One embodiment of the present invention is to provide a top-K important pattern mining method and apparatus that mine the top-K important patterns corresponding to a specific time range of user-data generation times and provide average characteristic information for the data group or transaction to which each pattern belongs.
An embodiment of the present invention is to provide an upper K important pattern mining method and apparatus for mining upper K important patterns according to the generation time of user data without setting a threshold value for the minimum frequency.
An embodiment of the present invention is to provide an upper K important pattern mining method and apparatus that provides mining results for various upper K important patterns for each period based on data accumulation time.
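The difference between threshold-based mining and top-K mining can be illustrated with a small sketch. The brute-force enumeration below is not the tree-based method of this disclosure; the function name `top_k_patterns`, the `max_len` cap, the sample transactions, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import heapq

def top_k_patterns(transactions, k, max_len=3):
    """Return the k most frequent itemsets (brute force, illustration only)."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, min(max_len, len(unique)) + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    # No minimum-frequency threshold is needed: keep the k highest counts.
    # Ties prefer shorter, then lexicographically smaller patterns (assumed rule).
    return heapq.nsmallest(
        k, counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0])
    )

transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
result = top_k_patterns(transactions, k=3)
```

The caller chooses only K, so there is no need to rerun the mining with different threshold values to obtain a usable number of patterns.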
Among the embodiments, the top-K important pattern mining method comprises: (a) generating a user header table including a creation time, user characteristic information, and a user data link; (b) receiving user data representing a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item, and updating a data tree path; and (c) additionally associating the user data as a tail node with a tree end node on the data tree path and sequentially connecting the tail nodes related to the user, with the corresponding user data link as a start link.
The method may further include: (d) generating an item header table by selecting user data for a specific time range from the data tree path, performing a mining tree visit based on the item information in the selected user data, and updating the mining tree path. The method may further include: (e) additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connecting at least the general nodes or tail nodes related to that item, with the corresponding item link as a start link.
In one embodiment, step (e) may include generating a tail node table based on the tail nodes associated with tree end nodes on the mining tree path, and connecting the tail nodes with the corresponding tail node link as a start link. Step (e) may also include extracting a characteristic value for each path based on the characteristic information stored in the tail node of that mining tree path, and deriving a representative characteristic value of the mining tree path.
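A minimal sketch of deriving a representative characteristic value per path follows. It assumes the tail node table can be modelled as a mapping from a path to the characteristic records stored at its tail node, and that the representative value is an arithmetic mean; the field names `length` and `created` and the function name `representative_values` are hypothetical.

```python
from statistics import mean

def representative_values(tail_node_table, k):
    """tail_node_table: {path (tuple of items): list of characteristic dicts}."""
    summary = []
    for path, infos in tail_node_table.items():
        summary.append({
            "pattern": path,
            "support": len(infos),  # number of records ending on this path
            "avg_length": mean(info["length"] for info in infos),
            "avg_created": mean(info["created"] for info in infos),
        })
    # Rank by support, highest first; ties broken by pattern (assumed rule).
    summary.sort(key=lambda row: (-row["support"], row["pattern"]))
    return summary[:k]

table = {
    ("a", "c"): [{"length": 2, "created": 10}, {"length": 4, "created": 20}],
    ("a", "b"): [{"length": 3, "created": 12}],
}
top = representative_values(table, k=1)
```

Other aggregates (median, most recent value) would fit the same structure; the mean is used here only because the abstract speaks of average characteristic information.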
The specific time range may be specified by setting a start time and an end time for the generation time of the user data. In one embodiment, the item header table may include an item name, an item frequency, and an item link. In one embodiment, the tail node on the mining tree path may store property information and tail node link information for the top K data groups or transactions for a particular time span.
Step (d) may include updating the item header table with the selected user data in descending order of item frequency, and performing the mining tree visit in descending order of item frequency to update the mining tree path. Step (d) may further include associating an item in the user data as a tail node until all of the updated user data has been processed in the mining tree path.
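The time-range selection and frequency ordering described above can be sketched as follows. This is an illustration under stated assumptions: creation times are plain integers, the range bounds are inclusive, and the helper names `select_by_time`, `build_item_header`, and `reorder_items` are hypothetical.

```python
from collections import Counter

def select_by_time(user_data, start, end):
    """Keep records whose creation time lies in [start, end] (inclusive, assumed)."""
    return [(t, items) for t, items in user_data if start <= t <= end]

def build_item_header(selected):
    """Return [(item, frequency)] in descending frequency, then by item name."""
    freq = Counter(item for _, items in selected for item in set(items))
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))

def reorder_items(selected, header):
    """Sort each record's items in item-header order before tree insertion."""
    rank = {item: i for i, (item, _) in enumerate(header)}
    return [sorted(set(items), key=rank.__getitem__) for _, items in selected]

user_data = [(1, ["b", "a"]), (5, ["c", "a"]), (7, ["b", "c", "a"]), (20, ["d"])]
selected = select_by_time(user_data, start=5, end=10)
header = build_item_header(selected)
ordered = reorder_items(selected, header)
```

Ordering every record by the same frequency ranking lets records with common frequent items share a prefix, which keeps the resulting tree compact.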
In one embodiment, step (a) may include scanning the user data stored in the database and updating the user header table based on the scanned data until all of the user data has been processed. In step (a), if the user corresponding to the scanned data does not exist in the user header table, characteristic information about that user is added to the table; if the user already exists in the table, the characteristic information for that user is updated.
In one embodiment, the user characteristic information may include a data creation time, a data length, a data creation count, and at least one piece of item information.
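The add-or-update behavior of the user header table can be sketched as below. The `UserEntry` fields are one possible reading of the listed characteristic information (creation time, data length, creation count, item information); the field names, the integer timestamps, and the choice to track both earliest and latest creation times are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class UserEntry:
    """One row of the user header table (field names are illustrative)."""
    first_created: int          # earliest data creation time seen
    last_created: int           # latest data creation time seen
    total_length: int = 0       # summed data length (number of items)
    record_count: int = 0       # number of data records created
    items: set = field(default_factory=set)
    data_link: object = None    # would reference the user's first tail node

def build_user_header_table(records):
    """records: iterable of (user, creation_time, items) tuples."""
    table = {}
    for user, created, items in records:
        entry = table.get(user)
        if entry is None:
            # User not yet in the table: add a new row.
            entry = table[user] = UserEntry(first_created=created, last_created=created)
        else:
            # User already present: update the stored characteristic information.
            entry.first_created = min(entry.first_created, created)
            entry.last_created = max(entry.last_created, created)
        entry.total_length += len(items)
        entry.record_count += 1
        entry.items.update(items)
    return table

records = [("u1", 10, ["a", "b"]), ("u2", 12, ["c"]), ("u1", 15, ["a", "c", "d"])]
user_table = build_user_header_table(records)
```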
In an embodiment, the top-K important pattern mining apparatus includes: a user header table generation unit that generates a user header table including a generation time, user characteristic information, and a user data link; a data tree path update unit that receives, from the user header table generation unit, user data representing a combination of at least one item, performs a data tree visit based on each of the at least one item, and updates a data tree path; and a user data link connection unit that additionally associates the user data as a tail node with a tree end node on the data tree path and sequentially connects the tail nodes related to the user, with the corresponding user data link as a start link.
The top-K important pattern mining apparatus may further include a mining tree path update unit that selects user data for a specific time range from the data tree path to generate an item header table, performs a mining tree visit based on the item information in the selected user data, and updates the mining tree path. The apparatus may further include an item link connection unit that additionally associates an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connects at least the general nodes or tail nodes related to that item, with the corresponding item link as a start link.
Among the embodiments, the computer-readable recording medium on which a program for implementing the top-K important pattern mining method is recorded includes: a function of generating a user header table including a generation time, user characteristic information, and a user data link; a function of receiving user data representing a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item, and updating a data tree path; a function of additionally associating the user data as a tail node with a tree end node on the data tree path and sequentially connecting the tail nodes related to the user, with the corresponding user data link as a start link; a function of selecting user data for a specific time range from the data tree path to generate an item header table, performing a mining tree visit based on the item information in the selected user data, and updating a mining tree path; and a function of additionally associating an item in the selected user data as a tail node with a tree end node on the mining tree path and sequentially connecting at least the general nodes or tail nodes related to that item, with the corresponding item link as a start link.
The disclosed technique may have the following effects. It is to be understood, however, that the scope of the disclosed technology is not to be construed as limited thereby, as it is not meant to imply that a particular embodiment should include all of the following effects or only the following effects.
The top-K important pattern mining method according to an embodiment of the present invention can mine the top-K important patterns corresponding to a specific time range of user-data generation times and provide average characteristic information for the data group or transaction to which each pattern belongs.
The top-K important pattern mining method according to an embodiment of the present invention can mine the top-K important patterns according to the generation time of the user data without setting a threshold value for the minimum frequency.
The upper K important pattern mining method according to an embodiment of the present invention can provide mining results for various upper K important patterns for each period based on the data accumulation time.
FIG. 1 is a block diagram illustrating a top-K important pattern mining system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the top-K important pattern mining apparatus in FIG. 1.
FIG. 3 is a flowchart illustrating a top-K important pattern mining process performed in the top-K important pattern mining apparatus.
FIG. 4 is a flowchart illustrating a process of updating a data tree path performed by the top-K important pattern mining apparatus shown in FIG. 1.
FIG. 5 is a diagram showing a user header table and a data tree generated and updated in the top-K important pattern mining apparatus shown in FIG. 1.
FIG. 6 is a flowchart illustrating a process of updating a mining tree path performed by the top-K important pattern mining apparatus shown in FIG. 1.
FIG. 7 is a diagram showing an item header table and a mining tree generated and updated in the top-K important pattern mining apparatus shown in FIG. 1.
The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should not be construed as being limited by the embodiments described in the text. That is, the embodiments are to be construed as being variously embodied and having various forms, so that the scope of the present invention should be understood to include equivalents capable of realizing technical ideas. Also, the purpose or effect of the present invention should not be construed as limiting the scope of the present invention, since it does not mean that a specific embodiment should include all or only such effect.
Meanwhile, the meaning of the terms described in the present application should be understood as follows.
The terms "first", "second", and the like are intended to distinguish one element from another, and the scope of the right should not be limited by these terms. For example, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.
It is to be understood that when an element is referred to as being "connected" to another element, it may be directly connected to the other element, but there may be other elements in between. On the other hand, when an element is referred to as being "directly connected" to another element, it should be understood that there are no other elements in between. Other expressions that describe the relationship between components, such as "between" and "immediately between" or "neighboring to" and "directly adjacent to", should be interpreted in the same way.
Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of the stated feature, number, step, operation, element, component, or combination thereof, and do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.
In each step, the identification code (e.g., a, b, c, etc.) is used for convenience of explanation. The identification code does not describe the order of the steps, and unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, each step may occur in the same order as described, may be performed substantially concurrently, or may be performed in reverse order.
The present invention can be embodied as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all kinds of recording devices for storing data that can be read by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code can be stored and executed in a distributed manner.
All terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. Commonly used predefined terms should be interpreted to be consistent with the meanings in the context of the related art and cannot be interpreted as having ideal or overly formal meaning unless explicitly defined in the present application.
FIG. 1 is a block diagram illustrating a top-K important pattern mining system according to an embodiment of the present invention.
Referring to FIG. 1, a high-K important
The
The upper K important
FIG. 2 is a block diagram showing the top-K important pattern mining apparatus in FIG. 1.
2, the upper K important
The user header
The data tree
The user data
The mining tree
The mining tree
In one embodiment, the mining tree
The item
The item
The
FIG. 3 is a flowchart illustrating a top-K important pattern mining process performed in the top-K important pattern mining apparatus.
Referring to FIG. 3, the user header
The data tree
The mining tree
The mining tree
The item
The item
FIG. 4 is a flowchart illustrating a process of updating a data tree path performed by the top-K important pattern mining apparatus shown in FIG. 1, and FIG. 5 is a diagram showing a user header table and a data tree generated and updated in the top-K important pattern mining apparatus shown in FIG. 1.
4 and 5, the user header
In one embodiment, the user header table 510 may include a user item, a characteristic information item, and a link item. The user item can distinguish a plurality of users through a user's name or an identification code. The characteristic information item can store the data creation time, the data length, the data creation count, and at least one piece of item information included in the user data. The link item may include a user data link that is associated with the tail node 524 for that user.
The user header
Meanwhile, if the user corresponding to the scan data exists in the user header table 510, the user header
The data tree
In one embodiment, the
In one embodiment, each of the at least one item contained in the user data may be stored in a general node 522 or a tail node 524. More specifically, the item frequency of each general node 522 may correspond to the sum of the item frequencies stored in the at least one tail node 524 connected to it. That is, each data tree path can be represented by a combination of at least one item, and items shared between paths can be stored in an upper node (a general node). For example, the general node 522 and the tail node 524 may store item information in the form of [item name: item frequency]. (C: 2,
In one embodiment, the tail node 524 may additionally store various characteristic information included in the scan data. Here, the various characteristic information included in the scan data may include user information, user characteristic information, and user data link information associated with a combination of at least one item.
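The relationship between general nodes and tail nodes can be sketched with a small prefix tree. This is a simplified illustration, not the disclosed implementation: items are inserted in the order given, the user data link is modelled as an ordered list of tail nodes per user instead of node-to-node pointers, and the names `Node`, `DataTree`, and `tail_sum` are hypothetical.

```python
from collections import defaultdict

class Node:
    """Tree node; it acts as a tail node when `tail_info` is non-empty."""
    def __init__(self, item):
        self.item = item
        self.count = 0        # item frequency along this path
        self.children = {}    # item name -> child Node
        self.tail_info = []   # characteristic info of records ending here

class DataTree:
    def __init__(self):
        self.root = Node(None)
        # Stand-in for the user data link: user -> tail nodes in insertion order.
        self.user_links = defaultdict(list)

    def insert(self, user, created, items):
        node = self.root
        for item in items:
            # Visit the existing child, or extend the path by one node.
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
        # The last node on the path serves as the tail node for this record.
        node.tail_info.append({"user": user, "created": created, "length": len(items)})
        self.user_links[user].append(node)
        return node

def tail_sum(node):
    """Number of records that end at this node or anywhere below it."""
    return len(node.tail_info) + sum(tail_sum(c) for c in node.children.values())

tree = DataTree()
tree.insert("u1", 1, ["c", "a"])
tree.insert("u2", 2, ["c", "b"])
tree.insert("u1", 3, ["c", "a"])
c_node = tree.root.children["c"]
```

In this sketch the count of a shared upper node always equals the number of records stored in the tail nodes beneath it, which mirrors the stated relation between a general node's frequency and its connected tail nodes.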
The data tree
The user data
The data tree
If there is no user data to be processed in the database, the data tree
FIG. 6 is a flowchart illustrating a process of updating a mining tree path performed by the top-K important pattern mining apparatus shown in FIG. 1, and FIG. 7 is a diagram showing an item header table and a mining tree generated and updated in the top-K important pattern mining apparatus shown in FIG. 1.
Referring to FIGS. 6 and 7, the mining tree
The mining tree
In one embodiment, the item header table 720 may include an item name entry, a support entry, and a link entry. The item name entry can distinguish a plurality of items through an item name or an identification code. The support entry may indicate the item frequency, i.e., the number of times the item occurs in the selected user data, or the degree of support of the item. The link entry may include an item data link associated with a general node 732 or a tail node 734 for that item.
In one embodiment, the user data link is connected only to the tail node 524 for that user, but the item data link may be associated with a general node 732 or a tail node 734 for that item. That is, the item data link may be associated with at least one general node 732 or tail node 734. For example, an item data link for item C may be associated with a
The mining tree
On the other hand, the mining tree
The mining tree
In one embodiment, the
In one embodiment, each of the at least one item included in the user data may be stored in a general node 732 or a tail node 734. Here, the tail node 734 may include an item name, an item frequency (or item support), characteristic information of a user data group, and creation time information. More specifically, the item frequency of each general node 732 may correspond to the sum of the item frequencies stored in the at least one tail node 734 associated with it. That is, each mining tree path can be represented by a combination of at least one item, and items shared between paths can be stored in an upper node (a general node). For example, the general node 732 and the tail node 734 may store item information in the form of [item name: item frequency]. (C: 2,
In one embodiment, the
In one embodiment, the tail node 734 may additionally store various characteristic information included in the selection data. Here, the various characteristic information included in the selection data may include user information, user characteristic information, and item link information associated with a combination of at least one item.
The mining tree
The mining tree
In one embodiment, the
Therefore, the upper K important
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as set forth in the following claims.
100: Top K Critical Pattern Mining System
110: User terminal 120: Top K important pattern mining device
130: Database
210: user header table generation unit 220: data tree path update unit
230: user data link connection unit 240: mining tree path update unit
250: item link connection unit 260:
510: user header table 520: data tree
522: General node 524: Tail node
710: Specific time range 720: Item header table
730: Mining tree 732: General node
734: Tail node 740: Tail node table
Claims (17)
(a) generating a user header table including creation time, user characteristic information, and a user data link;
(b) receiving user data representative of a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item and updating the data tree path;
(c) further associating the user data as a tail node to a tree end node on the data tree path and sequentially connecting a tail node related to the user with a corresponding user data link as a start link;
(d) generating an item header table by selecting user data for a specific time range from the data tree path, and performing a mining tree visit based on item information in the selected user data to update a mining tree path; And
(e) additionally associating an item in the selected user data as a tail node to a tree end node on the mining tree path, and connecting at least a general node or a tail node related to the item with the corresponding item link as a start link,
Wherein the step (e) includes generating a tail node table based on a tail node associated with a tree end node on the mining tree path, and connecting the tail node with the corresponding tail node link as a start link.
Extracting a characteristic value for the corresponding path based on the characteristic information stored in the tail node of each of the mining tree paths and deriving a representative characteristic value of the mining tree path; Mining method.
And setting a start time and an end time for the generation time of the user data.
Item name, item frequency, and item link.
Characterized by storing property information and tail node link information for a data group or transaction for a specific time range.
Updating the selected user data in the item header table on the basis of the descending order of the item frequency counts and updating the mining tree path by performing a mining tree visit in descending order of the item frequency counts. Important pattern mining methods.
Further comprising associating an item in the user data as a tail node until processing of all updated user data in the mining tree path is completed.
Scanning the user data stored in the database and updating the user header table based on the scan data until the processing of all the user data is completed.
If the user corresponding to the scan data does not exist in the user header table, adding the characteristic information for the user to the user header table and updating the characteristic information for the user if the user header table exists Wherein the upper K important pattern mining method is characterized by:
A data creation time, a data length, a data creation count, and at least one item information.
A data tree path update unit for receiving user data representative of a combination of at least one item from the user header table generation unit, performing a data tree visit based on each of the at least one items, and updating a data tree path;
A user data link connection unit that additionally associates the user data as a tail node to a tree end node on the data tree path and sequentially connects a tail node related to the user with a corresponding user data link as a start link;
A mining tree path update unit for updating a mining tree path by performing a mining tree visit based on the item information in the selected user data to generate an item header table by selecting user data for a specific time range from the data tree path, ; And
Linking an item in the selected user data as a tail node to a tree end node on the mining tree path and linking at least an ordinary node or a tail node related to the item with the corresponding item link as a start link, Comprising a connection,
Wherein the item link connection unit creates a tail node table based on a tail node associated with a tree end node on the mining tree path and connects the tail node link to the tail node with a start link. Pattern mining device.
A function of generating a user header table including a generation time, user characteristic information, and a user data link;
Receiving user data representative of a combination of at least one item from the user header table, performing a data tree visit based on each of the at least one item and updating the data tree path;
A function of additionally associating the user data as a tail node to a tree end node on the data tree path and sequentially connecting a tail node related to the user with a corresponding user data link as a start link;
A function of generating a item header table by selecting user data for a specific time range from the data tree path and performing a mining tree visit based on the item information in the selected user data to update a mining tree path; And
A function of associating an item in the selected user data as a tail node to a tree end node on the mining tree path and sequentially connecting at least a general node or a tail node related to the item with the corresponding item link as a start link And,
Wherein the function of sequentially connecting at least a general node or a tail node with respect to the corresponding item comprises generating a tail node table based on a tail node associated with a tree end node on the mining tree path, A computer-readable recording medium having recorded thereon a program for implementing a top-K important pattern mining method for performing connection to a tail node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150167957A KR101761177B1 (en) | 2015-11-27 | 2015-11-27 | Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150167957A KR101761177B1 (en) | 2015-11-27 | 2015-11-27 | Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170062308A KR20170062308A (en) | 2017-06-07 |
KR101761177B1 (en) | 2017-07-25 |
Family
ID=59223418
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150167957A KR101761177B1 (en) | 2015-11-27 | 2015-11-27 | Method for mining important pattern of high rank k, apparatus performing the same and storage medium storing the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101761177B1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109308292A (en) * | 2018-11-27 | 2019-02-05 | 北京京东尚科信息技术有限公司 | Crowd orients method for digging, device and computer readable storage medium |
CN110188174B (en) * | 2019-04-19 | 2021-10-29 | 浙江工业大学 | Professional field FAQ intelligent question and answer method based on professional vocabulary mining |
KR102079289B1 (en) * | 2019-04-23 | 2020-04-07 | 주식회사 비닛 | Wine recommendation system and method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101079063B1 (en) * | 2010-02-22 | 2011-11-07 | 주식회사 케이티 | Apparatus and method for association rule mining using frequent pattern-tree for incremental data processing |
KR101443285B1 (en) * | 2012-11-19 | 2014-09-22 | 충북대학교 산학협력단 | Method of mining high utility patterns |
KR101567338B1 (en) * | 2014-08-26 | 2015-11-10 | 연세대학교 산학협력단 | Apparatus and Method for frequent sub-graph component mining in graph data |
- 2015-11-27: KR application KR1020150167957A, patent KR101761177B1 (en), status: active, IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
KR20170062308A (en) | 2017-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |