CN106250549B - A memory-based frequent pattern mining method - Google Patents

A memory-based frequent pattern mining method

Publication number
CN106250549B
Authority
CN
China
Prior art keywords
tree
frequent
node
value
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610662641.2A
Other languages
Chinese (zh)
Other versions
CN106250549A (en)
Inventor
刘铎
林怡
黄柏钧
朱潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201610662641.2A
Publication of CN106250549A
Application granted
Publication of CN106250549B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees

Abstract

The invention discloses a memory-based frequent pattern mining method comprising the following steps. Step 1, construct the initial frequent pattern tree: create the root node T of the frequent pattern tree and label it "null"; scan the database again, and for each transaction read, select its frequent items and sort them in the order of list L; using the sorted items, build a path of the frequent pattern tree starting from the null root, incrementing by one only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; after all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained. Step 2, traverse the initial frequent pattern tree with a depth-first search, setting the counter of each visited node to the node's own value plus the values of all its child nodes. The technical effects of the invention are that it reduces write operations to non-volatile memory (NVM) and builds the frequent pattern tree quickly, and that it reduces the large number of concentrated writes to the count fields of nodes near the root, extending NVM lifetime.

Description

A memory-based frequent pattern mining method
Technical field
The invention belongs to the field of memory technology, and in particular relates to a memory-based frequent pattern mining method.
Background art
As computer technology has matured, data analysis has developed greatly since it was established in the 20th century. Data analysis can find and extract items of interest from massive data and thereby provide guidance to decision makers. Machine learning and data mining reveal the information hidden behind data and have become key technologies for data analysis.
In data mining, finding frequent items or frequent patterns in a data set is an important research topic; it is the basis of many important data mining tasks such as correlation analysis, sequential patterns, causality, and emerging patterns. Techniques such as Apriori and FP-tree currently exist for the frequent pattern mining problem.
Memory-based frequent pattern mining requires that the data to be mined and the data elements be stored in byte-addressable memory, and DRAM needs continuous power to retain data; efficiency and persistence are therefore likely to become key design issues in data mining systems. To address this in memory-based data analysis, non-volatile memories (NVM) such as phase-change memory (PCM) are generally regarded as excellent substitutes for DRAM because of their non-volatility and efficiency. Using NVM as main memory, however, raises two problems: first, read and write costs on NVM differ considerably, with a write typically costing more time and energy than a read; second, the number of writes NVM can endure is limited, and uneven writes usually accelerate the failure of the whole NVM device. Because current data mining and machine learning algorithms do not take these basic characteristics of NVM into account, running them on NVM seriously degrades the performance and lifetime of the storage system.
The prior art uses the FP-tree algorithm, an improvement on Apriori that compresses the key information about frequent patterns into a frequent pattern tree (FP-tree) and thereby avoids the costly candidate itemsets of Apriori, removing its performance bottleneck. In short, the FP-tree algorithm does what Apriori does without generating candidates.
J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 29(2): 1-12, May 2000, describes the steps of the FP-tree algorithm as follows:
(1) Scan the entire transaction database D once to obtain the support count of every item in D. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
(2) Create the root node T of the FP-tree and label it "null". Scan the transaction database again. For each transaction in D, select its frequent items and sort them in the order of L. Let the sorted frequent item list be [p | P], where p is the first frequent item and P is the remaining frequent items, and call insert_tree([p | P], T). The procedure insert_tree([p | P], T) works as follows: if T has a child N such that N.item_name = p.item_name, increase the count of N by 1; otherwise create a new node N, set its count to 1, and link it to its parent T. If P is non-empty, recursively call insert_tree(P, N).
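Steps (1) and (2) can be sketched in Python roughly as follows. This is a minimal illustration, not code from the cited paper: the `Node` class, the function names other than `insert_tree`, the tie-breaking rule for items with equal support, the de-duplication of items within a transaction, and the toy database are all assumptions made for the example.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item name, a count field, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item name -> child Node


def frequent_item_order(transactions, min_support):
    """Step (1): one scan to get support counts, then the ordered list L."""
    support = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in support.items() if c >= min_support]
    # Descending support; ties broken by item name (an assumption) so the
    # order is deterministic.
    return sorted(frequent, key=lambda i: (-support[i], i))


def insert_tree(items, node):
    """insert_tree([p | P], T): every node on the path gets count + 1."""
    if not items:
        return
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is None:
        child = Node(p, parent=node)
        node.children[p] = child
    child.count += 1  # a new node ends up at 1, an existing node grows by 1
    insert_tree(rest, child)


def build_fp_tree(transactions, min_support):
    """Step (2): second scan, inserting each filtered and sorted transaction."""
    order = frequent_item_order(transactions, min_support)
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None)  # the "null" root
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        insert_tree(items, root)
    return root, order


# Hypothetical toy database, not the one shown in Fig. 1.
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["b", "e"]]
root, order = build_fp_tree(db, min_support=2)
```

In this sketch every item of every transaction triggers one update of a `count` field, which is the behaviour the rest of the document sets out to avoid.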
These steps build a complete FP-tree. The established FP-tree is then mined bottom-up to produce the required frequent patterns. In short, the information in the transaction database is used to build the FP-tree, and frequent patterns are then mined from the FP-tree. The core idea is to compress the database directly into a frequent pattern tree and then generate association rules from that tree.
Fig. 1 gives an example of the FP-tree construction process. Fig. 1(a) is the database, where "transaction ID" is the serial number of each transaction record, "items" are all the items in each transaction record, and "sorted items" are those items arranged in descending order of occurrence count. First, a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once, as shown in Fig. 1(b). After the second transaction record is scanned, nodes b, c, and d are created in turn, each with a count field of 1, indicating that items b, c, and d have each occurred once, as shown in Fig. 1(c). After all transaction records in the database have been scanned in turn, the complete FP-tree is as shown in Fig. 1(d), where each letter denotes an item in the database and the number after the letter is the value stored in the count field, that is, the number of times the item occurs in the database.
The FP-tree algorithm, however, has the following problem: while the frequent pattern tree is being built, every item of every scanned transaction triggers an update to the FP-tree, that is, a write to the count field of the corresponding node. This produces a large number of repeated writes and a heavy memory overhead. Moreover, the closer a node is to the root, the more writes it receives, and such concentrated writes shorten the lifetime of NVM.
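The write pattern described here can be made concrete with a simplified accounting model. The sketch below is an assumption-laden illustration, not a measurement from the patent: it counts one count-field write per item of each already filtered and sorted transaction and groups the writes by tree depth. The function name and the sample transactions are hypothetical.

```python
from collections import Counter


def classic_count_writes_by_depth(sorted_transactions):
    """Count-field writes per tree depth under the classic insert_tree.

    The item at position k of a sorted transaction causes exactly one
    write to a node at depth k (depth 1 = children of the null root).
    """
    writes = Counter()
    for items in sorted_transactions:
        for depth in range(1, len(items) + 1):
            writes[depth] += 1
    return writes


# Hypothetical transactions, already filtered and sorted by the list L.
sorted_db = [["b", "a"], ["b", "c"], ["b", "a", "c"], ["b"]]
writes = classic_count_writes_by_depth(sorted_db)
total_writes = sum(writes.values())
```

Every non-empty transaction writes to a depth-1 node, and the most frequent items sit at the shallowest levels, so in this model a handful of nodes near the root absorb a large share of the writes.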
Summary of the invention
In view of the problems of the prior art, the technical problem to be solved by the invention is to provide a memory-based frequent pattern mining method that reduces the writes to NVM while the frequent pattern tree is being built and avoids large numbers of concentrated writes, so as to extend the lifetime of NVM.
The technical problem is solved by a technical solution comprising the following steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Using the sorted items, build a path of the frequent pattern tree starting from the null root, incrementing by one only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter of each visited node becomes the node's own value plus the values of all its child nodes.
In the frequent pattern tree of the invention, the count field of every element equals the number of times the element occurs in the whole database, exactly as in the tree built by the prior-art frequent pattern mining algorithm.
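The claimed equivalence with the prior-art tree can be checked on a small case. The following sketch builds one tree with the classic rule and one with the two-step method, then compares them. It assumes the transactions are already filtered and sorted by L, and it assumes the step 2 sum is taken after the children have been finalized (a post-order accumulation), which is what the equivalence requires. All names and the sample data are hypothetical.

```python
class Node:
    def __init__(self):
        self.count = 0
        self.children = {}  # item name -> child Node


def build_classic(sorted_transactions):
    """Prior-art build: every node on the path is incremented."""
    root = Node()
    for items in sorted_transactions:
        node = root
        for item in items:
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def build_initial(sorted_transactions):
    """Step 1: only the last node on the path is incremented."""
    root = Node()
    for items in sorted_transactions:
        node = root
        for item in items:
            node = node.children.setdefault(item, Node())
        if items:  # skip transactions with no frequent items
            node.count += 1
    return root


def accumulate(node):
    """Step 2: post-order DFS; count becomes own value + children's values."""
    for child in node.children.values():
        node.count += accumulate(child)
    return node.count


def as_dict(node):
    """Nested {item: (count, subtree)} view, used only to compare two trees."""
    return {k: (c.count, as_dict(c)) for k, c in node.children.items()}


# Hypothetical transactions, already filtered and sorted by the list L.
sorted_db = [["b", "a"], ["b", "c"], ["b", "a", "c"], ["b"]]
classic = build_classic(sorted_db)
proposed = build_initial(sorted_db)
accumulate(proposed)
same = as_dict(classic) == as_dict(proposed)
```

The comparison deliberately ignores the null root, whose count field carries no item information in either tree.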
Compared with the prior art, the technical effects of the invention are as follows:
The invention no longer updates the count fields of all nodes touched by the current transaction, which avoids a large number of repeated writes while the frequent pattern tree is being built, reduces writes to NVM, and builds the frequent pattern tree quickly. It also reduces the large number of concentrated writes to the count fields of nodes near the root, extending NVM lifetime.
Brief description of the drawings
The drawings of the invention are as follows:
Fig. 1 is an example of building the frequent pattern tree in the background art;
Fig. 2 is the flow chart of building the initial frequent pattern tree according to the invention;
Fig. 3 is an example of building the frequent pattern tree of the invention;
Fig. 4 compares the number of read operations in the tests;
Fig. 5 compares the number of write operations in the tests;
Fig. 6 compares the tree construction time in the tests;
Fig. 7 compares the PCM lifetime in the tests.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment:
The input of the invention is a database and a minimum support threshold σ, and the output is an FP-tree.
The invention comprises the following steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Using the sorted items, build a path of the frequent pattern tree starting from the null root, incrementing by one only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained.
Fig. 2 is the flow chart of building the initial frequent pattern tree according to the invention. The process is as follows:
In step S21, remove from each transaction the items that do not reach the minimum support, and sort the remaining items in descending order of occurrence count;
In step S22, scan the transactions in the database one by one;
In step S23, scan the items of the transaction one by one from front to back, descending the tree from the root node;
In step S24, determine whether the current item is the last item in the transaction; if so, go to step S25; if not, go to step S27;
In step S25, determine whether a corresponding node exists in the tree; if it does, go to step S26; if not, go to step S29;
In step S26, increment the value of the count field of that node; then go to step S210;
In step S27, determine whether a corresponding node exists in the tree; if it does, return to step S23; if not, go to step S28;
In step S28, create a new node and set its count field to 0; then return to step S23;
In step S29, create a new node and set its count field to 1; then go to step S210;
In step S210, determine whether all transactions have been scanned; if not, return to step S22; if so, go to step S211;
In step S211, the procedure ends;
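The flow chart maps onto a short loop. The sketch below is one possible reading of steps S21 to S211, with the step labels as comments. The class and function names, the tie-breaking rule for items with equal support, the de-duplication of items within a transaction, and the toy database are assumptions made for illustration.

```python
from collections import Counter


class Node:
    def __init__(self, item, count):
        self.item = item
        self.count = count
        self.children = {}  # item name -> child Node


def build_initial_tree(transactions, min_support):
    """Build the initial tree: only the last node of each path is counted."""
    support = Counter(item for t in transactions for item in set(t))
    root = Node(None, 0)  # the "null" root
    for t in transactions:                                  # S22
        # S21: drop infrequent items, sort by descending support
        # (ties broken by item name, an assumption).
        items = sorted(
            (i for i in set(t) if support[i] >= min_support),
            key=lambda i: (-support[i], i),
        )
        node = root
        for pos, item in enumerate(items):                  # S23
            is_last = pos == len(items) - 1                 # S24
            child = node.children.get(item)                 # S25 / S27
            if child is None:
                child = Node(item, 1 if is_last else 0)     # S29 / S28
                node.children[item] = child
            elif is_last:
                child.count += 1                            # S26
            node = child
    return root                                             # S210 / S211


# Hypothetical toy database, not the one shown in Fig. 3.
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["b", "e"]]
root = build_initial_tree(db, min_support=2)
```

With this toy input, node b near the root receives a single count update (from the transaction that ends at b), whereas the classic build would update it once per transaction.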
Step 2, construct the complete frequent pattern tree
Traverse the initial frequent pattern tree with a depth-first search; the counter of each visited node becomes the node's own value plus the values of all its child nodes.
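A minimal sketch of this traversal follows. It uses an explicit stack rather than recursion so that deep trees do not hit Python's recursion limit, and it assumes that each node is updated only after all of its children have been updated (post-order), so that a child's value already includes its own descendants. The `Node` class and the hand-built tree are hypothetical.

```python
class Node:
    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = []


def accumulate_counts(root):
    """Iterative post-order DFS: children are finalized before their parent."""
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            # All children already hold their final values.
            node.count += sum(child.count for child in node.children)
        else:
            stack.append((node, True))
            stack.extend((child, False) for child in node.children)


# Hypothetical initial tree: null -> b(1) -> a(1) -> c(1), and b -> c(1).
root = Node(None)
b = Node("b", 1)
a = Node("a", 1)
c_under_a = Node("c", 1)
c_under_b = Node("c", 1)
root.children.append(b)
b.children.extend([a, c_under_b])
a.children.append(c_under_a)
accumulate_counts(root)
```

In this sketch each count field is written once during the traversal. The null root also receives a sum here (the number of counted transactions), which can simply be ignored.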
Embodiment
Fig. 3 is an example of building the frequent pattern tree according to the invention. This embodiment comprises the following steps:
Step 1, build the initial frequent pattern tree from the database of Fig. 3(a), as follows:
As shown in Fig. 3(b), a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once;
As shown in Fig. 3(c), after the second transaction record is scanned, nodes b, c, and d are created; the count fields of b and c are set to 0 and the count field of d is set to 1, indicating that item d has occurred once. (To reduce redundant writes while the frequent pattern tree is being built, the occurrences of b and c are not recorded at this point; only the occurrences of item d, which is at the end of this transaction record, are recorded, because the occurrence counts of b and c can later be obtained from the count fields of their child nodes.);
As shown in Fig. 3(d), the initial tree is complete after all transaction records in the database have been scanned in turn;
Step 2, construct the complete frequent pattern tree
As shown in Fig. 3(e), the initial frequent pattern tree is traversed with a depth-first search, and the counter of each visited node becomes the node's own value plus the values of all its child nodes. For example, the count field of c becomes the sum of c's original value 0 and the value 5 of d's count field, so c occurs 5 times; the count field of f becomes the sum of the values of f's child nodes e and g and f's original value 3, so f occurs 6 times. Once the frequent pattern tree has been traversed, the complete frequent pattern tree is built.
Experimental tests
Data sets of different types were chosen for testing, and the number of read and write operations, the total tree construction time, and the PCM lifetime were measured for each. The data sets are T10I4D100K, T40I10D100K, chess, mushroom, pumsb*, connect, pumsb, accidents, C73D10, and C20D10.
The experimental results are shown in Fig. 4 to Fig. 7:
In Fig. 4, the ordinate is the number of reads and the abscissa is the data set; Fig. 4 shows that the invention greatly reduces read operations;
In Fig. 5, the ordinate is the number of writes and the abscissa is the data set; Fig. 5 shows that the invention greatly reduces write operations;
In Fig. 6, the ordinate is the total tree construction time and the abscissa is the data set; Fig. 6 shows that the invention reduces the tree construction time;
In Fig. 7, the ordinate is the total number of transactions processed before the PCM wears out, and the abscissa is the data set; Fig. 7 shows that the invention extends PCM lifetime by at least 16.67% (on data set T40I10D100K) and by up to 99.05% (on data set connect), greatly extending the lifetime of the PCM.

Claims (2)

1. A memory-based frequent pattern mining method, characterized by comprising the following steps:
Step 1, construct the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database; discard the items whose support count is below the threshold, the remaining items being the frequent items; sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L; using the sorted items, build a path of the frequent pattern tree starting from the null root, incrementing by one only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; after all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter of each visited node becomes the node's own value plus the values of all its child nodes.
2. The memory-based frequent pattern mining method according to claim 1, characterized in that sub-step 3) of step 1 proceeds as follows:
In step S21, remove from each transaction the items that do not reach the minimum support, and sort the remaining items in descending order of occurrence count;
In step S22, scan the transactions in the database one by one;
In step S23, scan the items of the transaction one by one from front to back, descending the tree from the root node;
In step S24, determine whether the current item is the last item in the transaction; if so, go to step S25; if not, go to step S27;
In step S25, determine whether a corresponding node exists in the tree; if it does, go to step S26; if not, go to step S29;
In step S26, increment the value of the count field of that node; then go to step S210;
In step S27, determine whether a corresponding node exists in the tree; if it does, return to step S23; if not, go to step S28;
In step S28, create a new node and set its count field to 0; then return to step S23;
In step S29, create a new node and set its count field to 1; then go to step S210;
In step S210, determine whether all transactions have been scanned; if not, return to step S22; if so, go to step S211;
In step S211, the procedure ends.
CN201610662641.2A 2016-08-14 2016-08-14 A memory-based frequent pattern mining method Active CN106250549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610662641.2A CN106250549B (en) 2016-08-14 2016-08-14 A memory-based frequent pattern mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610662641.2A CN106250549B (en) 2016-08-14 2016-08-14 A memory-based frequent pattern mining method

Publications (2)

Publication Number Publication Date
CN106250549A CN106250549A (en) 2016-12-21
CN106250549B true CN106250549B (en) 2019-09-20

Family

ID=57591955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610662641.2A Active CN106250549B (en) 2016-08-14 2016-08-14 A memory-based frequent pattern mining method

Country Status (1)

Country Link
CN (1) CN106250549B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874396B (en) * 2017-01-16 2020-04-14 重庆大学 Frequent pattern mining method based on nonvolatile memory
CN110096629B (en) * 2019-05-15 2023-07-28 重庆大学 Memory optimization method for transaction processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101119302A (en) * 2007-09-06 2008-02-06 华中科技大学 Method for digging frequency mode in the lately time window of affair data flow
CN102662948A (en) * 2012-02-23 2012-09-12 浙江工商大学 Data mining method for quickly finding utility pattern
CN105589900A (en) * 2014-11-21 2016-05-18 中国银联股份有限公司 Data mining method based on multi-dimensional analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
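Enhancing Lifetime of NVM-based Main Memory with Bit Shifting and Flipping; Xianlu Luo et al.; 《Embedded and Real-Time Computing System and Applications》; 20140822; pp. 1-7 *
Frequent itemset mining algorithm based on an array prefix tree; 牛新征 et al.; 《小型微型计算机系统》; 20140831; Vol. 35, No. 8; pp. 1693-1698 *
High-utility pattern mining algorithm based on pattern growth; 王乐 et al.; 《自动化学报》; 20150930; Vol. 41, No. 9; pp. 1616-1626 *
Frequent graph mining method on multi-core processors; 栾华 et al.; 《计算机研究与发展》; 20151231; pp. 2844-2856 *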

Also Published As

Publication number Publication date
CN106250549A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
Garcia et al. Enhancing architectural recovery using concerns
Kuramochi et al. An efficient algorithm for discovering frequent subgraphs
CN104715073B (en) Based on the association rule mining system for improving Apriori algorithm
CN101772760B (en) Database management program and database management device
Masseglia et al. Sequential pattern mining
Chu et al. Density conscious subspace clustering for high-dimensional data
CN106250549B (en) A memory-based frequent pattern mining method
Antunes et al. Sequential pattern mining algorithms: trade-offs between speed and memory
CN103136244A (en) Parallel data mining method and system based on cloud computing platform
Tax et al. Mining local process models with constraints efficiently: applications to the analysis of smart home data
US20020032538A1 (en) Software test system and method
Sinha et al. Identification of best algorithm in association rule mining based on performance
CN113220578A (en) Method for generating function test case
CN102214248A (en) Multi-layer frequent pattern discovery algorithm with high space extensibility and high time efficiency for mining mass data
Joshi et al. An implementation of frequent pattern mining algorithm using dynamic function
Sharma et al. A Performance based Transposition algorithm for Frequent itemsets Generation
CN115904970A (en) Regression testing method and equipment
Yang et al. Stamp: On discovery of statistically important pattern repeats in long sequential data
Mokeddem et al. Distributed classification using class-association rules mining algorithm
Lo et al. Bidirectional mining of non-redundant recurrent rules from a sequence database
Chen et al. Research on association rules mining base on positive and negative items of FP-tree
Jing Set-Based differential evolution algorithm based on guided local exploration for automated process discovery
CN109697197A (en) A method of carving multiple Access database file
Chezhian et al. Hierarchical sequence clustering algorithm for data mining
Chen et al. Towards correlated sequential rules

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant