CN106250549A - A memory-based frequent pattern mining method - Google Patents


Info

Publication number
CN106250549A
Authority
CN
China
Prior art keywords
tree
frequent
node
value
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610662641.2A
Other languages
Chinese (zh)
Other versions
CN106250549B (en)
Inventor
刘铎
林怡
黄柏钧
朱潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201610662641.2A
Publication of CN106250549A
Application granted
Publication of CN106250549B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9027: Trees

Abstract

The invention discloses a memory-based frequent pattern mining method comprising the following steps. Step 1, build the initial frequent pattern tree: create the root node T of the frequent pattern tree and label it "null"; scan the database again, select the frequent items in each transaction read, and sort them in the order of list L; insert the sorted items as a path of the frequent pattern tree starting at the null root, incrementing by 1 only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; the initial frequent pattern tree is obtained once all the transactions in the database have been scanned. Step 2, traverse the initial frequent pattern tree with a depth-first search, the counter value of each traversed node becoming the node's own value plus the values of all of its child nodes. The technical effects of the invention are that write operations to NVM are reduced and the frequent pattern tree can be built quickly, and that the overly dense write operations to the count fields of nodes near the root are reduced, extending the NVM lifetime.

Description

A memory-based frequent pattern mining method
Technical field
The invention belongs to the field of memory technology, and specifically relates to a memory-based frequent pattern mining method.
Background art
As computer science and technology have matured, data analysis has developed greatly since it was established in the 20th century. Data analysis can find and extract items of interest from massive data and thereby provide guidance to decision makers. Machine learning and data mining can reveal the information hidden behind data and have become key technologies of data analysis.
In the field of data mining, discovering frequent items or frequent patterns in a data set is an important research topic; it is the basis of many important data mining tasks such as association analysis, sequential patterns, causality, and emerging patterns. Techniques such as Apriori and FP-tree are currently used to handle the frequent pattern mining problem.
Memory-based frequent pattern mining requires that the data to be mined and its data elements be stored in byte-addressable memory, and DRAM needs continuous power to retain data; energy efficiency and persistence are therefore likely to become key design issues in a data mining system. To address these issues in memory-based data analysis, non-volatile memory (NVM) such as phase-change memory (PCM) is widely regarded as an excellent replacement for DRAM because of its non-volatility and energy efficiency. Using NVM as main memory, however, raises the following problems: first, read and write operations on NVM are strongly asymmetric, a write operation generally costing more time and energy than a read operation; second, the number of write operations NVM can endure is limited, and unevenly distributed writes usually accelerate the failure of the whole NVM device. Because these basic characteristics of NVM are not taken into account, the data mining and machine learning algorithms currently run on NVM seriously degrade the performance and lifetime of the storage system.
The prior art uses a technical scheme known as the FP-tree algorithm. It improves on the Apriori algorithm by compressing the key information about frequent patterns into a frequent pattern tree (FP-tree) structure, which avoids the large overhead of the huge number of candidate itemsets in Apriori and thereby removes its performance bottleneck. In short, the FP-tree algorithm accomplishes what the Apriori algorithm does without generating candidate itemsets.
J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 29(2):1-12, May 2000, describes the steps of the FP-tree algorithm as follows:
(1) Scan the whole transaction database D once to obtain the support count of every item in D, discard the items whose support count is below the threshold so that the remaining items are the frequent items, and sort the frequent items in descending order of support count to obtain a list L;
(2) Create the root node T of the FP-tree and label it "null". Scan the transaction database again. For each transaction in D, select its frequent items and sort them in the order of L. Let the sorted frequent-item list be [p | P], where p is the first frequent item and P is the list of remaining frequent items. Call insert_tree([p | P], T), which executes as follows: if T has a child N such that N.item_name = p.item_name, increase the count of N by 1; otherwise create a new node N, set its count to 1, and link it to its parent node T. If P is non-empty, recursively call insert_tree(P, N).
These steps establish a complete FP-tree. The required frequent patterns can then be produced by mining the established FP-tree from the bottom up. In brief, the algorithm uses the information in the transaction database to construct the FP-tree and then mines frequent patterns from it. The core idea is to compress the database directly into a frequent pattern tree and then generate association rules from that tree.
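The two scans and the insert_tree procedure described above can be sketched in Python as follows. This is a minimal illustration, not code from the cited paper or from the patent: the Node class, the build_fp_tree helper, the tie-breaking of equal support counts by item name, and the four-transaction database are all assumptions made for the example.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item name, a count field, and child links."""

    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}  # item name -> Node


def insert_tree(items, node):
    """insert_tree([p | P], T) from the description, written recursively."""
    if not items:
        return
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is not None:
        child.count += 1          # existing child: its count increases by 1
    else:
        child = Node(p, 1)        # new node with count 1, linked to its parent
        node.children[p] = child
    insert_tree(rest, child)      # recurse on the remaining items P


def build_fp_tree(transactions, min_support):
    # Scan 1: support counts, then list L in descending support order
    # (ties broken by item name, which is an assumption of this sketch).
    support = Counter(item for t in transactions for item in set(t))
    frequent = [i for i in support if support[i] >= min_support]
    L = sorted(frequent, key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(L)}
    # Scan 2: insert each transaction's frequent items in the order of L.
    root = Node(None)             # the "null" root T
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        insert_tree(ordered, root)
    return root, L


# Hypothetical database, not the one shown in Fig. 1.
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b"]]
root, L = build_fp_tree(db, min_support=2)
```

Note that every item of every transaction touches a count field here, either when the node is created or when it is incremented.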
Fig. 1 gives an example of building an FP-tree. Fig. 1(a) is the database, in which "transaction ID" is the sequence number of each transaction record, "items" are all the items in each transaction record, and "sorted items" are the items sorted in descending order of their number of occurrences. First, a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and the value of its count field is set to 1, indicating that item a has occurred once, as shown in Fig. 1(b). After the second transaction record is scanned, nodes b, c, and d are created in turn, each with a count field value of 1, indicating that items b, c, and d have each occurred once, as shown in Fig. 1(c). After all the transaction records in the database have been scanned, the complete FP-tree is as shown in Fig. 1(d), where each letter denotes an item in the database and the number after the letter is the value stored in the count field, that is, the number of times the item occurs in the database.
The FP-tree algorithm, however, has the following problems: while the frequent pattern tree is being built, every item scanned in a transaction triggers an update of the FP-tree, that is, a write to the count field of the corresponding node. This leads to a large number of repeated write operations and a huge memory overhead. Moreover, the closer a node is to the root, the more writes it receives, and such dense, heavy writing shortens the service life of NVM.
Summary of the invention
In view of the problems of the prior art, the technical problem to be solved by the invention is to provide a memory-based frequent pattern mining method that reduces the write operations to NVM while the frequent pattern tree is being built and avoids dense, heavy writing, thereby extending the NVM lifetime.
The technical problem is solved by the following technical scheme, which comprises the following steps:
Step 1, build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, discard the items whose support count is below the threshold so that the remaining items are the frequent items, and sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L; insert the sorted items as a path of the frequent pattern tree starting at the null root, incrementing by 1 only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; the initial frequent pattern tree is obtained once all the transactions in the database have been scanned;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
In the frequent pattern tree of the invention, the value of the count field of every element is the number of times that element occurs in the whole database, so the tree is the same as the one built by the prior-art frequent pattern mining algorithm.
Compared with the prior art, the technical effects of the invention are as follows:
The invention no longer updates the count field of the node of every item in the current transaction. This avoids the large number of repeated write operations incurred while the frequent pattern tree is being built, reduces the write operations to NVM, and allows the frequent pattern tree to be built quickly. It also reduces the overly dense write operations to the count fields of nodes near the root, extending the NVM lifetime.
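A rough count of count-field writes makes the claimed saving concrete. The estimate below is a back-of-the-envelope sketch and does not appear in the patent. It assumes that initializing the count of a newly created node costs one write and that step 2 writes only to nodes that have children. The symbols N, t_i, V, and V_int are introduced here purely for illustration.

```latex
% N      : number of transactions in the database
% t_i    : sorted list of frequent items of transaction i
% V      : set of non-root nodes of the final tree
% V_int  : nodes of V that have at least one child
W_{\text{classic}} = \sum_{i=1}^{N} |t_i|
\qquad\qquad
W_{\text{proposed}} \;\le\;
  \underbrace{|V|}_{\text{node creation}}
  + \underbrace{N}_{\text{last-node increments}}
  + \underbrace{|V_{\text{int}}|}_{\text{step 2}}
  \;\le\; N + 2\,|V|
```

Under these assumptions the classic build writes once per item of every transaction, whereas the proposed build writes at most once per transaction plus a bounded number of times per node. The gap is largest when many transactions share long prefixes, which is the case in which the classic algorithm keeps rewriting the counts near the root; when transactions share almost no prefixes, |V| approaches the classic total and the bound shows no saving.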
Brief description of the drawings
The drawings of the invention are described as follows:
Fig. 1 is an example of building the frequent pattern tree in the background art;
Fig. 2 is the flowchart by which the invention builds the initial frequent pattern tree;
Fig. 3 is an example of building the frequent pattern tree of the invention;
Fig. 4 compares the numbers of read operations in the tests;
Fig. 5 compares the numbers of write operations in the tests;
Fig. 6 compares the tree-building times in the tests;
Fig. 7 compares the PCM lifetimes in the tests.
Detailed description
The invention is further described below with reference to the drawings and an embodiment:
The input of the invention is a database and a minimum support threshold σ, and the output is an FP-tree.
The invention comprises the following steps:
Step 1, build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, discard the items whose support count is below the threshold so that the remaining items are the frequent items, and sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L; insert the sorted items as a path of the frequent pattern tree starting at the null root, incrementing by 1 only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; the initial frequent pattern tree is obtained once all the transactions in the database have been scanned.
Fig. 2 is the flowchart by which the invention builds the initial frequent pattern tree. The flow is as follows:
In step S21, delete the items that do not reach the minimum support, and sort the remaining items of each transaction in descending order of occurrence count;
In step S22, scan the transactions in the database in turn;
In step S23, scan each item of the transaction from front to back, traversing down the tree from the root node;
In step S24, judge whether the current item is the last item of the transaction; if so, perform step S25; if not, perform step S27;
In step S25, judge whether the corresponding node exists in the tree; if it exists, perform step S26; if not, perform step S29;
In step S26, increment the value of the count field of that node; then go to step S210;
In step S27, judge whether the corresponding node exists in the tree; if it exists, return to step S23; if not, perform step S28;
In step S28, create a new node and set the value of its count field to 0; then return to step S23;
In step S29, create a new node and set the value of its count field to 1; then go to step S210;
In step S210, judge whether all transactions have been scanned; if not, return to step S22; if so, perform step S211;
In step S211, the procedure ends;
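The flow of steps S21 to S211 can be sketched as a single loop. The code below is one possible reading of the flowchart, not the patentees' implementation: the Node class, the function name build_initial_tree, the tie-breaking by item name, and the sample database are assumptions made for illustration.

```python
from collections import Counter


class Node:
    """Tree node with an item name, a count field, and child links."""

    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}  # item name -> Node


def build_initial_tree(transactions, min_support):
    # S21: drop items below the minimum support and sort what is left of
    # each transaction by descending occurrence count (ties by item name,
    # which is an assumption of this sketch).
    support = Counter(i for t in transactions for i in set(t))
    frequent = sorted((i for i in support if support[i] >= min_support),
                      key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(frequent)}

    root = Node(None)                          # the "null" root
    for t in transactions:                     # S22 / S210: every transaction
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for pos, item in enumerate(items):     # S23: walk down from the root
            is_last = pos == len(items) - 1    # S24: last item of transaction?
            child = node.children.get(item)    # S25 / S27: does the node exist?
            if child is None:
                # S29 (last item): new node with count 1.
                # S28 (any other item): new node with count 0.
                child = Node(item, 1 if is_last else 0)
                node.children[item] = child
            elif is_last:
                child.count += 1               # S26: increment the last node only
            node = child
    return root                                # S211: done


# Hypothetical database, not the one shown in Fig. 3(a).
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b"]]
initial = build_initial_tree(db, min_support=2)
```

In this sketch an existing node that is not the last item of its transaction is only read, never written, which is where the reduction in writes comes from.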
Step 2, build the complete frequent pattern tree
Traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
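Step 2 can be read as a post-order depth-first pass, in which each child's count is completed before it is added to its parent. The sketch below follows that reading, which is an interpretation rather than wording from the patent. The Node class, the accumulate function, and the small tree (including the counts chosen for e and g) are hypothetical.

```python
class Node:
    """Tree node with an item name, a count field, and a list of children."""

    def __init__(self, item, count=0, children=()):
        self.item = item
        self.count = count
        self.children = list(children)


def accumulate(node):
    """Post-order DFS: finish every child, then add their counts to this node."""
    for child in node.children:
        accumulate(child)
    if node.children:  # leaves already hold their final counts, so no write
        node.count += sum(child.count for child in node.children)
    return node.count


# Hypothetical initial tree: null -> a(0) -> c(0) -> d(5)
#                                      \-> f(3) -> e(2), g(1)
d = Node("d", 5)
c = Node("c", 0, [d])
e = Node("e", 2)
g = Node("g", 1)
f = Node("f", 3, [e, g])
a = Node("a", 0, [c, f])
root = Node(None, 0, [a])

# The null root carries no item, so only its subtrees are accumulated.
for child in root.children:
    accumulate(child)
```

For very deep trees the recursion could be replaced by an explicit stack; the recursive form is kept here for readability.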
Embodiment
Fig. 3 is an example of building a frequent pattern tree according to the invention; this embodiment comprises the following steps:
Step 1, build the initial frequent pattern tree from the database of Fig. 3(a), as follows:
As shown in Fig. 3(b), a node labeled null is created as the root of the whole frequent pattern tree; after the first transaction record is scanned, node a is created and the value of its count field is set to 1, indicating that item a has occurred once;
As shown in Fig. 3(c), after the second transaction record is scanned, nodes b, c, and d are created; the count field values of b and c are set to 0 and that of d is set to 1, indicating that item d has occurred once (to reduce redundant writes while building the frequent pattern tree, the numbers of occurrences of b and c are not recorded at this point; only the number of occurrences of item d, which is at the end of this transaction record, is recorded, because the numbers of occurrences of b and c can later be obtained from the count field values of their child nodes);
As shown in Fig. 3(d), the initial tree is obtained after all the transaction records of the database have been scanned in turn;
Step 2, build the complete frequent pattern tree
As shown in Fig. 3(e), the initial frequent pattern tree is traversed with a depth-first search, and the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes. For example, the value of the count field of c is the sum of the original value 0 of c and the value 5 of the count field of d, so c is finally found to occur 5 times; the value of the count field of f is the sum of the values of its child nodes e and g and the original value 3 of f, so f is finally found to occur 6 times. Once the whole tree has been traversed, the complete frequent pattern tree has been built.
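The embodiment can be checked end to end on a toy input: build the tree the prior-art way and the two-step way, compare the resulting count fields, and tally how many times a count field is written. The sketch below is only illustrative. The database is hypothetical (Fig. 3(a) is not reproduced in the text), the nested-dict tree layout and function names are assumptions, and a write is counted whenever a count field is initialized or updated.

```python
def sort_transactions(transactions, min_support):
    """First scan: keep frequent items and order them by descending support."""
    support = {}
    for t in transactions:
        for item in set(t):
            support[item] = support.get(item, 0) + 1
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], i))  # ties by name (assumption)
    rank = {item: r for r, item in enumerate(order)}
    return [sorted((i for i in set(t) if i in rank), key=rank.get)
            for t in transactions]


def classic(sorted_db):
    """Prior-art build: one count-field write per item of every transaction."""
    root, writes = {"count": 0, "kids": {}}, 0
    for items in sorted_db:
        node = root
        for item in items:
            node = node["kids"].setdefault(item, {"count": 0, "kids": {}})
            node["count"] += 1
            writes += 1
    return root, writes


def two_phase(sorted_db):
    """Step 1 counts only the last node of each path; step 2 sums bottom-up."""
    root, writes = {"count": 0, "kids": {}}, 0
    for items in sorted_db:
        node = root
        for pos, item in enumerate(items):
            last = pos == len(items) - 1
            kid = node["kids"].get(item)
            if kid is None:
                kid = {"count": 1 if last else 0, "kids": {}}
                node["kids"][item] = kid
                writes += 1           # initializing the new count field
            elif last:
                kid["count"] += 1
                writes += 1
            node = kid

    def dfs(node):
        nonlocal writes
        for kid in node["kids"].values():
            dfs(kid)
        if node["kids"]:              # only nodes with children are rewritten
            node["count"] += sum(k["count"] for k in node["kids"].values())
            writes += 1

    for kid in root["kids"].values():
        dfs(kid)
    return root, writes


# Hypothetical database with heavily shared prefixes.
db = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"],
      ["a", "b", "c", "d"], ["a", "c"], ["a", "b", "c"]]
sorted_db = sort_transactions(db, min_support=2)
classic_tree, classic_writes = classic(sorted_db)
two_phase_tree, two_phase_writes = two_phase(sorted_db)
```

On this input both builds yield identical trees, while the classic build performs 16 count-field writes and the two-step build performs 10. The figures depend entirely on the toy data and on the write-accounting assumption stated above.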
Experimental tests
Data sets of different types were chosen for testing, and the numbers of read and write operations, the total tree-building time, and the PCM lifetime were recorded for each data set. The data sets are T10I4D100K, T40I10D100K, chess, mushroom, pumsb*, connect, pumsb, accidents, C73D10, and C20D10.
The experimental results are shown in Figs. 4 to 7:
In Fig. 4, the vertical axis is the number of reads and the horizontal axis lists the data sets; Fig. 4 shows that the invention eliminates a large number of read operations;
In Fig. 5, the vertical axis is the number of writes and the horizontal axis lists the data sets; Fig. 5 shows that the invention eliminates a large number of write operations;
In Fig. 6, the vertical axis is the total tree-building time and the horizontal axis lists the data sets; Fig. 6 shows that the invention reduces the tree-building time;
In Fig. 7, the vertical axis is the total number of transactions that can be processed before the PCM wears out and the horizontal axis lists the data sets; Fig. 7 shows that the invention extends the PCM lifetime by at least 16.67% (on data set T40I10D100K) and by up to 99.05% (on data set connect), a substantial extension of the PCM lifetime.

Claims (2)

1. A memory-based frequent pattern mining method, characterized in that it comprises the following steps:
Step 1, build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, discard the items whose support count is below the threshold so that the remaining items are the frequent items, and sort the frequent items in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree and label it "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L; insert the sorted items as a path of the frequent pattern tree starting at the null root, incrementing by 1 only the count of the last node on the path and leaving the counts of the other nodes on the path unchanged; the initial frequent pattern tree is obtained once all the transactions in the database have been scanned;
Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each traversed node becomes the node's own value plus the values of all of its child nodes.
2. The memory-based frequent pattern mining method according to claim 1, characterized in that the specific flow of sub-step 3) of step 1 is as follows:
In step S21, delete the items that do not reach the minimum support, and sort the remaining items of each transaction in descending order of occurrence count;
In step S22, scan the transactions in the database in turn;
In step S23, scan each item of the transaction from front to back, traversing down the tree from the root node;
In step S24, judge whether the current item is the last item of the transaction; if so, perform step S25; if not, perform step S27;
In step S25, judge whether the corresponding node exists in the tree; if it exists, perform step S26; if not, perform step S29;
In step S26, increment the value of the count field of that node; then go to step S210;
In step S27, judge whether the corresponding node exists in the tree; if it exists, return to step S23; if not, perform step S28;
In step S28, create a new node and set the value of its count field to 0; then return to step S23;
In step S29, create a new node and set the value of its count field to 1; then go to step S210;
In step S210, judge whether all transactions have been scanned; if not, return to step S22; if so, perform step S211;
In step S211, the procedure ends.
CN201610662641.2A, priority date 2016-08-14, filing date 2016-08-14: A memory-based frequent pattern mining method (Active; CN106250549B (en))

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610662641.2A CN106250549B (en) 2016-08-14 2016-08-14 A kind of Frequent Pattern Mining method memory-based

Publications (2)

Publication Number Publication Date
CN106250549A true CN106250549A (en) 2016-12-21
CN106250549B CN106250549B (en) 2019-09-20

Family

ID=57591955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610662641.2A Active CN106250549B (en) 2016-08-14 2016-08-14 A kind of Frequent Pattern Mining method memory-based

Country Status (1)

Country Link
CN (1) CN106250549B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101119302A (en) * 2007-09-06 2008-02-06 华中科技大学 Method for digging frequency mode in the lately time window of affair data flow
CN102662948A (en) * 2012-02-23 2012-09-12 浙江工商大学 Data mining method for quickly finding utility pattern
CN105589900A (en) * 2014-11-21 2016-05-18 中国银联股份有限公司 Data mining method based on multi-dimensional analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
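XIANLU LUO et al.: "Enhancing Lifetime of NVM-based Main Memory with Bit Shifting and Flipping", 《EMBEDDED AND REAL-TIME COMPUTING SYSTEM AND APPLICATIONS》 *
栾华 et al.: "Frequent graph mining methods on multi-core processors", 《Journal of Computer Research and Development》 *
牛新征 et al.: "Frequent itemset mining algorithm based on an array prefix tree", 《Journal of Chinese Computer Systems》 *
王乐 et al.: "High-utility pattern mining algorithm based on pattern growth", 《Acta Automatica Sinica》 *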

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874396A (en) * 2017-01-16 2017-06-20 重庆大学 A kind of Frequent Pattern Mining method based on nonvolatile memory
CN110096629A (en) * 2019-05-15 2019-08-06 重庆大学 A method of the Mining Frequent based on effective weight tree weights item collection
CN110096629B (en) * 2019-05-15 2023-07-28 重庆大学 Memory optimization method for transaction processing

Also Published As

Publication number Publication date
CN106250549B (en) 2019-09-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant