CN106250549B - A memory-based frequent pattern mining method - Google Patents
- Publication number
- CN106250549B (application CN201610662641.2A)
- Authority
- CN
- China
- Prior art keywords
- tree
- frequent
- node
- value
- transaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
Abstract
The invention discloses a memory-based frequent pattern mining method comprising the following steps. Step 1, construct the initial frequent pattern tree: create the root node T of the frequent pattern tree (FP-tree) and label it "null"; scan the database again, select the frequent items in each transaction read, and sort them in the order of list L; after sorting, build one path of the FP-tree starting from the null root, incrementing the count of only the last node on the path by 1 and leaving the counts of the other nodes on the path unchanged; once all the transactions in the database have been scanned, the initial frequent pattern tree is obtained. Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes. The technical effects of the invention are that it reduces the writes to NVM and builds the FP-tree quickly, and that it reduces the large number of dense writes to the count fields of nodes near the root, extending the NVM lifetime.
Description
Technical field
The invention belongs to the field of memory technology and specifically relates to a memory-based frequent pattern mining method.
Background
As computer technology has matured, data analysis has developed considerably since it was established in the 20th century. Data analysis finds and extracts items of interest from massive data sets in order to guide decision makers. Machine learning and data mining reveal the information hidden behind the data and have become key technologies for data analysis.
In data mining, finding the frequent items or frequent patterns in a data set is an important research topic. It underlies many important data mining tasks, such as correlation analysis, sequential patterns, causality, and emerging patterns. Techniques such as Apriori and FP-tree are currently used to solve the frequent pattern mining problem.
A memory-based frequent pattern mining method requires the data to be mined and its data elements to reside in byte-addressable memory, and DRAM needs continuous power to retain data. Efficiency and persistence can therefore become key design issues in a data mining system. To address these issues in memory-based data analysis, non-volatile memories (NVM) such as phase-change memory (PCM) are generally regarded as good substitutes for DRAM because of their non-volatility and efficiency. Using NVM as main memory, however, raises two problems. First, NVM reads and writes are highly asymmetric: a write usually costs more time and energy than a read. Second, NVM endures only a limited number of writes, and unevenly distributed writes usually accelerate the failure of the whole NVM device. Because they do not take these basic NVM characteristics into account, the data mining and machine learning algorithms currently run on NVM seriously degrade the performance and lifetime of the storage system.
The prior art uses a technique known as the FP-tree algorithm, an improvement on the Apriori algorithm. It compresses the key information about frequent patterns into a frequent pattern tree (FP-tree), which eliminates the costly generation of huge candidate sets in Apriori and thereby removes Apriori's performance bottleneck. In short, the FP-tree algorithm accomplishes what Apriori does without generating candidates.
J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 29(2): 1-12, May 2000, describes the steps of the FP-tree algorithm as follows:
(1) Scan the entire transaction database D once to obtain the support count of every item in D. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
(2) Create the root node T of the FP-tree and label it "null". Scan the transaction database again. For each transaction in D, select its frequent items and sort them in the order of L. Let the sorted frequent-item list be [p | P], where p is the first frequent item and P is the list of remaining frequent items, and call insert_tree([p | P], T). The procedure insert_tree([p | P], T) runs as follows: if T has a child N such that N.item_name = p.item_name, increase the count of N by 1; otherwise create a new node N, set its count to 1, and link it to its parent node T. If P is non-empty, recursively call insert_tree(P, N).
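The insert_tree procedure of step (2) can be sketched in Python as follows. This is a minimal illustration rather than the cited paper's code: the class name `FPNode`, the dictionary of children, and the helper `build_classic_fp_tree` are assumptions made for the example, and the transactions are assumed to be already filtered and sorted in the order of L.

```python
class FPNode:
    """One FP-tree node: an item name, a count field, and links to its children."""

    def __init__(self, item_name, count=0, parent=None):
        self.item_name = item_name
        self.count = count
        self.parent = parent
        self.children = {}  # item_name -> FPNode (illustrative layout)


def insert_tree(items, node):
    """Insert the sorted frequent-item list [p | P] below `node`."""
    if not items:
        return
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is None:
        # No child of `node` carries item p: create one with count 1 and link it.
        child = FPNode(p, count=1, parent=node)
        node.children[p] = child
    else:
        # A matching child exists: its count field is written once more.
        child.count += 1
    insert_tree(rest, child)  # recurse on the remaining items P


def build_classic_fp_tree(sorted_transactions):
    """Second database scan: insert every sorted transaction under a null root."""
    root = FPNode(None)  # None stands in for the "null" label
    for items in sorted_transactions:
        insert_tree(items, root)
    return root


# Hypothetical transactions, already restricted to frequent items and sorted by L.
root = build_classic_fp_tree([["a"], ["a", "b", "c", "d"], ["a", "b"]])
```

Note that every item of every transaction ends in a write to some count field, either at node creation or at the increment.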
These steps build a complete FP-tree. The required frequent patterns are then produced by mining the finished FP-tree from the bottom up. In short, the algorithm uses the information in the transaction database to build an FP-tree and then mines the frequent patterns from the FP-tree. Its core idea is to compress the database directly into a frequent pattern tree and then generate association rules from that tree.
Fig. 1 gives an example of building an FP-tree. Fig. 1(a) is the database, in which "transaction ID" is the serial number of each transaction record, "items" lists all the items in each transaction record, and "sorted items" lists the items in descending order of their number of occurrences. First, a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once, as shown in Fig. 1(b). After the second transaction record is scanned, nodes b, c, and d are created in turn, each with a count field of 1, indicating that items b, c, and d have each occurred once, as shown in Fig. 1(c). After all the transaction records in the database have been scanned, the complete FP-tree is as shown in Fig. 1(d), where each letter denotes an item in the database and the number after the letter is the value stored in the count field, that is, the number of times the item occurs in the database.
The FP-tree algorithm, however, has the following problems. While the frequent pattern tree is being built, every item of every scanned transaction triggers an update to the FP-tree, that is, a write to the count field of the corresponding node. This causes a large number of repeated writes and a huge memory overhead. Moreover, the closer a node is to the root, the more writes it receives, and such dense, heavy writing shortens the lifetime of NVM.
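The write pattern described above can be made concrete with a small Python sketch. It only tallies how many times the classic construction writes each node's count field (one write per item per transaction); the function name `classic_write_counts`, the use of a path tuple to identify a node, and the sample transactions are assumptions made for illustration.

```python
from collections import Counter


def classic_write_counts(sorted_transactions):
    """Tally count-field writes per node under the classic FP-tree construction.

    A node is identified by its path from the root, for example ("a", "b").
    Each item of each transaction causes exactly one write to the node it maps to.
    """
    writes = Counter()
    for items in sorted_transactions:
        for depth in range(1, len(items) + 1):
            writes[tuple(items[:depth])] += 1
    return writes


# Hypothetical transactions, already restricted to frequent items and sorted by L.
writes = classic_write_counts([["a"], ["a", "b", "c", "d"], ["a", "b"]])
for path, n in sorted(writes.items()):
    print("/".join(path), n)
```

In this sample the node for `a`, which sits directly below the root, is written by all three transactions, while the deeper nodes are written less often.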
Summary of the invention
In view of the problems of the prior art, the technical problem to be solved by the invention is to provide a memory-based frequent pattern mining method that reduces the writes to NVM while the frequent pattern tree is being built and avoids dense, heavy write activity, thereby extending the NVM lifetime.
The technical problem is solved by the following technical solution, which comprises these steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree (FP-tree) and label it "null";
3) Scan the database again, select the frequent items in each transaction read, and sort them in the order of L. After sorting, build one path of the FP-tree starting from the null root, incrementing the count of only the last node on the path by 1 and leaving the counts of the other nodes on the path unchanged. Once all the transactions in the database have been scanned, the initial frequent pattern tree is obtained;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
In the frequent pattern tree of the invention, the count field of every node holds the number of times the item occurs in the whole database, exactly as in the tree built by the prior-art frequent pattern mining algorithm.
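The claim that the two-step construction ends with the same count fields as the prior-art tree can be checked on a small example. The Python sketch below is an illustration under stated assumptions, not the patented implementation: nodes are represented by their path from the root, `prefix_support` computes the count the classic tree would hold (the number of transactions that start with that path), and `two_phase_counts` mimics Step 1 and Step 2. All names and the sample data are hypothetical.

```python
def prefix_support(sorted_transactions, path):
    """Classic FP-tree count of the node at `path`: transactions that begin with it."""
    return sum(1 for items in sorted_transactions if tuple(items[:len(path)]) == path)


def two_phase_counts(sorted_transactions):
    """Step 1 counts only the last node of each path; Step 2 sums children into parents."""
    counts = {}
    for items in sorted_transactions:
        for depth in range(1, len(items)):
            counts.setdefault(tuple(items[:depth]), 0)  # interior node, created with 0
        if items:
            last = tuple(items)
            counts[last] = counts.get(last, 0) + 1       # only the last node is counted
    # Step 2: visit deeper nodes first so that every child is final before its parent.
    for path in sorted(counts, key=len, reverse=True):
        if len(path) > 1:
            counts[path[:-1]] += counts[path]
    return counts


# Hypothetical transactions, already restricted to frequent items and sorted by L.
transactions = [["a"], ["a", "b", "c", "d"], ["a", "b"], ["b", "c"]]
counts = two_phase_counts(transactions)
same = all(counts[p] == prefix_support(transactions, p) for p in counts)
print(same)
```

The deepest-first loop plays the role of the depth-first traversal here; any order that finalizes children before their parent would give the same totals.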
Compared with the prior art, the technical effects of the invention are as follows:
The invention no longer updates the count fields of all the nodes touched by the current transaction. This avoids the large number of repeated writes incurred while building the frequent pattern tree, reduces the writes to NVM, and builds the frequent pattern tree quickly. It also reduces the dense, heavy writes to the count fields of nodes near the root, extending the NVM lifetime.
Description of the drawings
The drawings of the invention are as follows:
Fig. 1 is an example of building the frequent pattern tree in the background art;
Fig. 2 is the flow chart for building the initial frequent pattern tree according to the invention;
Fig. 3 is an example of building the frequent pattern tree according to the invention;
Fig. 4 compares the read operations in the tests;
Fig. 5 compares the write operations in the tests;
Fig. 6 compares the tree-building times in the tests;
Fig. 7 compares the PCM lifetimes in the tests.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment:
The input of the invention is a database and a minimum support threshold σ; the output is an FP-tree.
The invention comprises the following steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree (FP-tree) and label it "null";
3) Scan the database again, select the frequent items in each transaction read, and sort them in the order of L. After sorting, build one path of the FP-tree starting from the null root, incrementing the count of only the last node on the path by 1 and leaving the counts of the other nodes on the path unchanged. Once all the transactions in the database have been scanned, the initial frequent pattern tree is obtained.
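Sub-step 1) and the select-and-sort part of sub-step 3) can be sketched in Python as follows. The function names, the alphabetical tie-break, and the choice to count an item once per transaction are assumptions made for this illustration; the text above only fixes the threshold test and the descending order of support.

```python
from collections import Counter


def frequent_item_list(transactions, min_support):
    """First scan: return list L, the frequent items in descending order of support."""
    support = Counter()
    for transaction in transactions:
        support.update(set(transaction))  # assumption: an item counts once per transaction
    frequent = [item for item, count in support.items() if count >= min_support]
    # Assumption: ties in support are broken alphabetically to keep the order stable.
    return sorted(frequent, key=lambda item: (-support[item], item))


def select_and_sort(transaction, order):
    """Second scan, per transaction: keep the frequent items, sorted by their rank in L."""
    rank = {item: position for position, item in enumerate(order)}
    kept = (item for item in set(transaction) if item in rank)
    return sorted(kept, key=rank.__getitem__)


# Hypothetical database and threshold.
database = [["a", "b"], ["b", "c"], ["a", "b", "d"]]
L = frequent_item_list(database, min_support=2)
sorted_db = [select_and_sort(t, L) for t in database]
print(L, sorted_db)
```

The sorted transactions produced this way are the input that the path-building part of sub-step 3) consumes.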
Fig. 2 is the flow chart for building the initial frequent pattern tree according to the invention. The flow is as follows:
In step S21, the items of each transaction that do not reach the minimum support are removed, and the remaining items are sorted in descending order of their number of occurrences;
In step S22, the transactions in the database are scanned one by one;
In step S23, the items of the transaction are scanned one by one from front to back, traversing down the tree from the root node;
In step S24, it is determined whether the current item is the last item of the transaction; if so, step S25 is executed; if not, step S27 is executed;
In step S25, it is determined whether a corresponding node exists in the tree; if it exists, step S26 is executed; if it does not exist, step S29 is executed;
In step S26, the count field of the node for this item is incremented; then the flow goes to step S210;
In step S27, it is determined whether a corresponding node exists in the tree; if it exists, the flow returns to step S23; if it does not exist, step S28 is executed;
In step S28, a new node is created with its count field set to 0; then the flow returns to step S23;
In step S29, a new node is created with its count field set to 1; then the flow goes to step S210;
In step S210, it is determined whether all the transactions have been scanned; if not, the flow returns to step S22; if so, step S211 is executed;
In step S211, the procedure ends;
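The flow of steps S22 to S211 can be sketched in Python as follows. This is an illustrative reading of the flow chart, not the patent's code: the class `Node`, the function `build_initial_tree`, and the sample transactions are hypothetical, and step S21 (filtering and sorting) is assumed to have been applied to the input already.

```python
class Node:
    """Tree node with an item label, a count field, and a dictionary of children."""

    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}


def build_initial_tree(sorted_transactions):
    """Build the initial tree: only the node of a transaction's last item is counted."""
    root = Node(None)                              # None stands in for the "null" label
    for items in sorted_transactions:              # S22: next transaction
        node = root
        for index, item in enumerate(items):       # S23: next item, walking down the tree
            is_last = index == len(items) - 1      # S24: last item of the transaction?
            child = node.children.get(item)        # S25 / S27: does the node exist?
            if child is None:
                # S28 creates an interior node with count 0; S29 a last node with count 1.
                child = Node(item, count=1 if is_last else 0)
                node.children[item] = child
            elif is_last:
                child.count += 1                   # S26: the only increment per transaction
            node = child
    return root                                    # S210 / S211: all transactions scanned


# Hypothetical transactions, already filtered and sorted as in S21.
root = build_initial_tree([["a"], ["a", "b", "c", "d"], ["a", "b"]])
```

Existing interior nodes are only read, never written, which is where the reduction in count-field writes near the root comes from.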
Step 2, construct the complete frequent pattern tree
The initial frequent pattern tree is traversed with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
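Step 2 can be sketched as a post-order depth-first pass, in which each child is finalized before its value is added to the parent. That ordering is an interpretation of the text, and the names `Node`, `accumulate`, and `complete_tree`, as well as the hand-built tree, are assumptions made for this illustration.

```python
class Node:
    """Minimal tree node (hypothetical layout): item label, count field, children."""

    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}

    def add(self, item, count=0):
        child = Node(item, count)
        self.children[item] = child
        return child


def accumulate(node):
    """Post-order DFS: return the final count of `node`, writing it at most once."""
    total = node.count
    for child in node.children.values():
        total += accumulate(child)
    if total != node.count:
        node.count = total  # a single write, and only when the value changes
    return total


def complete_tree(root):
    """Run Step 2 below the "null" root, which itself carries no count."""
    for child in root.children.values():
        accumulate(child)
    return root


# Hand-built initial tree for the path a -> b -> c -> d (hypothetical data).
root = Node(None)
a = root.add("a", 1)
b = a.add("b", 1)
c = b.add("c", 0)
d = c.add("d", 1)
complete_tree(root)
print(a.count, b.count, c.count, d.count)  # 3 2 1 1
```

The recursion is kept for brevity; for very deep trees an explicit stack would avoid Python's recursion limit.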
Embodiment
Fig. 3 is an example of building a frequent pattern tree according to the invention. This embodiment comprises the following steps:
Step 1, build the initial frequent pattern tree from the database of Fig. 3(a). The detailed process is as follows:
As shown in Fig. 3(b), a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once;
As shown in Fig. 3(c), after the second transaction record is scanned, nodes b, c, and d are created. The count fields of b and c are set to 0 and the count field of d is set to 1, indicating that item d has occurred once. (To reduce redundant writes while the frequent pattern tree is being built, the occurrences of b and c are not recorded at this point; only the occurrences of item d, which ends this transaction record, are recorded, because the occurrences of b and c can later be derived from the count fields of their child nodes.);
As shown in Fig. 3(d), the initial tree is complete after all the transaction records in the database have been scanned;
Step 2, construct the complete frequent pattern tree
As shown in Fig. 3(e), the initial frequent pattern tree is traversed with a depth-first search, and the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes. For example, the count field of c becomes the sum of c's original value 0 and the value 5 of d's count field, so c occurs 5 times; the count field of f becomes the sum of the values of f's child nodes e and g and f's original value 3, so f occurs 6 times. Once the whole tree has been traversed, the complete frequent pattern tree has been built.
Experimental tests
Data sets of different types were selected for testing. For each data set, the numbers of read and write operations, the total tree-building time, and the PCM lifetime were measured. The data sets are T10I4D100K, T40I10D100K, chess, mushroom, pumsb*, connect, pumsb, accidents, C73D10, and C20D10.
The experimental results are shown in Figs. 4 to 7:
In Fig. 4, the ordinate is the number of reads and the abscissa lists the data sets. Fig. 4 shows that the invention eliminates a large number of read operations;
In Fig. 5, the ordinate is the number of writes and the abscissa lists the data sets. Fig. 5 shows that the invention eliminates a large number of write operations;
In Fig. 6, the ordinate is the total tree-building time and the abscissa lists the data sets. Fig. 6 shows that the invention shortens the tree-building time;
In Fig. 7, the ordinate is the total number of transactions processed before the PCM wears out, and the abscissa lists the data sets. Fig. 7 shows that the invention extends the PCM lifetime by at least 16.67% (on data set T40I10D100K) and by at most 99.05% (on data set connect), which greatly extends the PCM lifetime.
Claims (2)
1. A memory-based frequent pattern mining method, characterized in that it comprises the following steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree (FP-tree) and label it "null";
3) Scan the database again, select the frequent items in each transaction read, and sort them in the order of L. After sorting, build one path of the FP-tree starting from the null root, incrementing the count of only the last node on the path by 1 and leaving the counts of the other nodes on the path unchanged. Once all the transactions in the database have been scanned, the initial frequent pattern tree is obtained;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
2. The memory-based frequent pattern mining method according to claim 1, characterized in that sub-step 3) of step 1 proceeds as follows:
In step S21, the items of each transaction that do not reach the minimum support are removed, and the remaining items are sorted in descending order of their number of occurrences;
In step S22, the transactions in the database are scanned one by one;
In step S23, the items of the transaction are scanned one by one from front to back, traversing down the tree from the root node;
In step S24, it is determined whether the current item is the last item of the transaction; if so, step S25 is executed; if not, step S27 is executed;
In step S25, it is determined whether a corresponding node exists in the tree; if it exists, step S26 is executed; if it does not exist, step S29 is executed;
In step S26, the count field of the node for this item is incremented; then the flow goes to step S210;
In step S27, it is determined whether a corresponding node exists in the tree; if it exists, the flow returns to step S23; if it does not exist, step S28 is executed;
In step S28, a new node is created with its count field set to 0; then the flow returns to step S23;
In step S29, a new node is created with its count field set to 1; then the flow goes to step S210;
In step S210, it is determined whether all the transactions have been scanned; if not, the flow returns to step S22; if so, step S211 is executed;
In step S211, the procedure ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A CN106250549B (en) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A CN106250549B (en) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250549A (en) | 2016-12-21 |
CN106250549B (en) | 2019-09-20 |
Family ID: 57591955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A Active CN106250549B (en) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250549B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874396B (en) * | 2017-01-16 | 2020-04-14 | 重庆大学 | Frequent pattern mining method based on nonvolatile memory |
CN110096629B (en) * | 2019-05-15 | 2023-07-28 | 重庆大学 | Memory optimization method for transaction processing |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101119302A (en) * | 2007-09-06 | 2008-02-06 | 华中科技大学 | Method for digging frequency mode in the lately time window of affair data flow |
CN102662948A (en) * | 2012-02-23 | 2012-09-12 | 浙江工商大学 | Data mining method for quickly finding utility pattern |
CN105589900A (en) * | 2014-11-21 | 2016-05-18 | 中国银联股份有限公司 | Data mining method based on multi-dimensional analysis |
Non-Patent Citations (4)
Title |
---|
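Enhancing Lifetime of NVM-based Main Memory with Bit Shifting and Flipping; Xianlu Luo et al.; Embedded and Real-Time Computing System and Applications; 2014-08-22; pp. 1-7 * |
Frequent itemset mining algorithm based on an array prefix tree (基于数组前缀树的频繁项集挖掘算法); Niu Xinzheng et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2014-08-31; Vol. 35, No. 8; pp. 1693-1698 * |
High-utility pattern mining algorithm based on pattern growth (基于模式增长方式的高效用模式挖掘算法); Wang Le et al.; Acta Automatica Sinica (自动化学报); 2015-09-30; Vol. 41, No. 9; pp. 1616-1626 * |
Frequent graph mining on multi-core processors (多核处理器上的频繁图挖掘方法); Luan Hua et al.; Journal of Computer Research and Development (计算机研究与发展); 2015-12-31; pp. 2844-2856 * |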
Also Published As
Publication number | Publication date |
---|---|
CN106250549A (en) | 2016-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Garcia et al. | Enhancing architectural recovery using concerns | |
Kuramochi et al. | An efficient algorithm for discovering frequent subgraphs | |
CN104715073B (en) | Based on the association rule mining system for improving Apriori algorithm | |
CN101772760B (en) | Database management program and database management device | |
Masseglia et al. | Sequential pattern mining | |
Chu et al. | Density conscious subspace clustering for high-dimensional data | |
CN106250549B (en) | A kind of Frequent Pattern Mining method memory-based | |
Antunes et al. | Sequential pattern mining algorithms: trade-offs between speed and memory | |
CN103136244A (en) | Parallel data mining method and system based on cloud computing platform | |
Tax et al. | Mining local process models with constraints efficiently: applications to the analysis of smart home data | |
US20020032538A1 (en) | Software test system and method | |
Sinha et al. | Identification of best algorithm in association rule mining based on performance | |
CN113220578A (en) | Method for generating function test case | |
CN102214248A (en) | Multi-layer frequent pattern discovery algorithm with high space extensibility and high time efficiency for mining mass data | |
Joshi et al. | An implementation of frequent pattern mining algorithm using dynamic function | |
Sharma et al. | A Performance based Transposition algorithm for Frequent itemsets Generation | |
CN115904970A (en) | Regression testing method and equipment | |
Yang et al. | Stamp: On discovery of statistically important pattern repeats in long sequential data | |
Mokeddem et al. | Distributed classification using class-association rules mining algorithm | |
Lo et al. | Bidirectional mining of non-redundant recurrent rules from a sequence database | |
Chen et al. | Research on association rules mining base on positive and negative items of FP-tree | |
Jing | Set-Based differential evolution algorithm based on guided local exploration for automated process discovery | |
CN109697197A (en) | A method of carving multiple Access database file | |
Chezhian et al. | Hierarchical sequence clustering algorithm for data mining | |
Chen et al. | Towards correlated sequential rules |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||