CN106250549A - An in-memory frequent pattern mining method

Info

Publication number: CN106250549A
Application number: CN201610662641.2A
Authority: CN
Inventors: 刘铎; 林怡; 黄柏钧; 朱潇
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2016-08-14
Filing date: 2016-08-14
Publication date: 2016-12-21
Anticipated expiration: 2036-08-14
Also published as: CN106250549B

Abstract

The invention discloses an in-memory frequent pattern mining method comprising the following steps. Step 1, build the initial frequent pattern tree: create the root node T of the frequent pattern tree and label it "null"; scan the database again, and for each transaction read, select its frequent items and sort them in the order of list L; insert the sorted items as a path of the frequent pattern tree starting from the null root, incrementing the count only of the last node on the path and leaving the counts of the other nodes on the path unchanged; after all transactions in the database have been scanned, the initial frequent pattern tree is obtained. Step 2, traverse the initial frequent pattern tree with a depth-first search, setting the counter value of each visited node to the node's own value plus the values of all its child nodes. The technical effects of the invention are: write operations to NVM are reduced and the frequent pattern tree can be built quickly; in addition, the especially intensive writes to the count fields of nodes near the root are reduced, which extends NVM lifetime.

Description

An in-memory frequent pattern mining method

Technical field

The invention belongs to the field of memory technology, and specifically relates to an in-memory frequent pattern mining method.

Background

As computer science and technology have matured, data analysis has developed greatly since it was established in the 20th century. Data analysis can find and extract items of interest from massive data, and thereby provide guidance to decision-making bodies. Machine learning and data mining can reveal the information hidden behind the data, and have become key technologies of data analysis.

In data mining, discovering frequent items or frequent patterns in a data set is an important research topic; it is the basis of many important data mining tasks such as association analysis, sequential patterns, causality, and emerging patterns. Techniques such as Apriori and FP-tree currently exist for the frequent pattern mining problem.

Because in-memory frequent pattern mining requires the mined data and the data structures to be stored in byte-addressable memory, and DRAM needs continuous power to retain data, energy efficiency and persistence may become key design issues in data mining systems. To address this in in-memory data analysis, non-volatile memory (NVM) such as phase-change memory (PCM) is generally regarded as a good substitute for DRAM because of its non-volatility and energy efficiency. Using NVM as main memory, however, raises two problems. First, read and write costs on NVM are asymmetric: a write generally takes more time and energy than a read. Second, NVM endures only a limited number of writes, and unevenly distributed writes usually accelerate the failure of the whole NVM device. Because current data mining and machine learning algorithms running on NVM do not take these basic characteristics into account, they seriously degrade the performance and lifetime of the storage system.

The prior art uses the FP-tree algorithm, an improvement on the Apriori algorithm. It compresses the key information about frequent patterns into a frequent pattern tree (FP-tree) structure, which avoids the costly candidate itemsets of Apriori and thereby removes Apriori's performance bottleneck. In short, the FP-tree algorithm performs the function of Apriori without generating candidate itemsets.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 29(2): 1-12, May 2000, describes the steps of the FP-tree algorithm as follows:

(1) Scan the whole transaction database D once to obtain the support count of every item in D. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;

(2) Create the root node T of the FP-tree and label it "null". Scan the transaction database again. For each transaction in D, select its frequent items and sort them in the order of L. Let the sorted frequent-item list be [p | P], where p is the first frequent item and P is the list of remaining frequent items. Call insert_tree([p | P], T), which works as follows: if T has a child N such that N.item_name = p.item_name, increase the count of N by 1; otherwise create a new node N, set its count to 1, and link it to its parent node T. If P is non-empty, recursively call insert_tree(P, N).
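Steps (1) and (2) can be sketched in Python as follows. This is a minimal illustration rather than the cited authors' implementation: the names Node, frequent_item_list, and build_fp_tree are assumptions, ties in support count are broken alphabetically because the source gives no tie rule, and the header table and node-links of a full FP-tree are omitted.

```python
from collections import Counter

class Node:
    """FP-tree node: item name, count, and children keyed by item name."""
    def __init__(self, item=None, parent=None):
        self.item = item            # None stands in for the "null" root label
        self.count = 0
        self.parent = parent
        self.children = {}

def frequent_item_list(database, min_support):
    """Step (1): list L of frequent items, sorted by descending support count."""
    support = Counter(item for t in database for item in set(t))
    frequent = [i for i, c in support.items() if c >= min_support]
    # Ties are broken alphabetically; the source does not specify a tie rule.
    return sorted(frequent, key=lambda i: (-support[i], i))

def insert_tree(items, t):
    """Step (2): insert the sorted frequent items [p | P] below node t."""
    if not items:
        return
    p, rest = items[0], items[1:]
    n = t.children.get(p)
    if n is None:
        n = Node(p, parent=t)       # create N and link it to its parent T
        t.children[p] = n
    n.count += 1                    # a new node ends at 1, an existing one gains 1
    insert_tree(rest, n)            # recurse on the remaining items P

def build_fp_tree(database, min_support):
    order = frequent_item_list(database, min_support)
    rank = {item: i for i, item in enumerate(order)}
    root = Node()
    for t in database:              # second scan of the database
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        insert_tree(items, root)
    return root

# Hypothetical three-transaction database, minimum support count 2.
tree = build_fp_tree([["a", "b", "c"], ["b", "a"], ["a", "d"]], min_support=2)
```

Note that every call to insert_tree writes one count field per item of the transaction, which is the behaviour the invention sets out to change.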

These steps build a complete FP-tree. The required frequent patterns are then produced by mining the FP-tree bottom-up. In short, the information in the transaction database is used to construct an FP-tree, and frequent patterns are then mined from the FP-tree. The core idea is to compress the database into a frequent pattern tree and then generate association rules from that tree.

Fig. 1 gives an example of building an FP-tree. Fig. 1(a) is the database, in which "transaction ID" is the sequence number of each transaction record, "items" are all the items in each transaction record, and "sorted items" are the items arranged in descending order of occurrence count. First, a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once, as shown in Fig. 1(b). After the second transaction record is scanned, nodes b, c, and d are created in turn, each with a count field of 1, indicating that items b, c, and d have each occurred once, as shown in Fig. 1(c). After all transaction records in the database have been scanned, the complete FP-tree is as shown in Fig. 1(d), where each letter denotes an item in the database and the number after the letter is the value stored in the count field, i.e. the number of times the item occurs in the database.

The FP-tree algorithm, however, has the following problem: while the frequent pattern tree is being built, every item of every scanned transaction triggers an update of the FP-tree, i.e. a write to the count field of the corresponding node. This produces a large number of repeated writes, and the memory overhead is high. Moreover, the closer a node is to the root, the more writes it receives, and such intensive writing shortens the service life of NVM.

Summary of the invention

In view of the problems of the prior art, the technical problem to be solved by the invention is to provide an in-memory frequent pattern mining method that reduces the writes to NVM while the frequent pattern tree is being built and avoids large numbers of intensive writes, so as to extend NVM lifetime.

The technical problem is solved by a technical scheme comprising the following steps:

Step 1, build the initial frequent pattern tree

1) Scan each transaction record in the database in turn to obtain the support count of every item in the database. Discard the items whose support count is below the threshold; the remaining items are the frequent items. Sort the frequent items in descending order of support count to obtain a list L;

2) Create the root node T of the frequent pattern tree and label it "null";

3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Insert the sorted items as a path of the frequent pattern tree starting from the null root, incrementing the count only of the last node on the path and leaving the counts of the other nodes on the path unchanged. After all transactions in the database have been scanned, the initial frequent pattern tree is obtained;

Step 2, traverse the initial frequent pattern tree with a depth-first search; the counter value of each visited node is set to the node's own value plus the values of all its child nodes.

In the frequent pattern tree (fp tree) of the present invention, the value of the count field of all elements is this element and occurs in whole data base Number of times, as the tree built with the Mining Algorithms of Frequent Patterns of prior art.

Compared with the prior art, the technical effects of the invention are:

The invention no longer updates the count fields of the nodes of all items in the current transaction. This avoids the many repeated writes incurred while building the frequent pattern tree, reduces the writes to NVM, and allows the frequent pattern tree to be built quickly. It also reduces the especially intensive writes to the count fields of nodes near the root, which extends NVM lifetime.
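To make the effect on nodes near the root concrete, the following sketch tallies count-field writes per tree depth for both schemes. It is a toy accounting model, not the measurement method used in the patent's experiments: the function name, the hypothetical database, and the rule that each set or change of a count field (including the initial value at node creation, and one aggregation write per internal node in Step 2) costs one write are all assumptions.

```python
from collections import Counter

def count_writes_by_depth(transactions):
    """Tally count-field writes per depth for the classic and proposed schemes.

    Assumption: one write is tallied every time a node's count field is set or
    changed, including the initial value written when the node is created.
    """
    classic, proposed = Counter(), Counter()
    seen = set()                       # paths (tuples of items) that already have a node
    for t in transactions:
        for depth in range(1, len(t) + 1):
            path = tuple(t[:depth])
            classic[depth] += 1        # classic FP-tree touches every node on the path
            is_last = depth == len(t)
            if path not in seen or is_last:
                proposed[depth] += 1   # node creation, or increment of the last node
            seen.add(path)
    # Step 2 of the proposed method: one aggregation write per internal node.
    internal = {p[:-1] for p in seen if len(p) > 1}
    for path in internal:
        proposed[len(path)] += 1
    return classic, proposed

# Hypothetical database, already filtered and sorted by L.
db = [["a", "b", "c"]] * 4 + [["a", "b"]] * 2 + [["a", "d"]] * 2
classic, proposed = count_writes_by_depth(db)
for depth in sorted(classic):
    print(depth, classic[depth], proposed[depth])
```

Under this model the saving is concentrated at small depths, because a node close to the root lies on many paths but is rarely the last node of a transaction.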

Brief description of the drawings

The drawings of the invention are described as follows:

Fig. 1 is an example of building the frequent pattern tree in the background art;

Fig. 2 is the flow chart of building the initial frequent pattern tree according to the invention;

Fig. 3 is an example of building the frequent pattern tree of the invention;

Fig. 4 compares the read-operation counts in the tests;

Fig. 5 compares the write-operation counts in the tests;

Fig. 6 compares the tree construction times in the tests;

Fig. 7 compares the PCM lifetimes in the tests.

Detailed description

The invention is further described below with reference to the drawings and an embodiment:

The input of the invention is a database and a minimum support threshold σ; the output is an FP-tree.

The invention comprises the following steps:

Step 1, build the initial frequent pattern tree

3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Insert the sorted items as a path of the frequent pattern tree starting from the null root, incrementing the count only of the last node on the path and leaving the counts of the other nodes on the path unchanged. After all transactions in the database have been scanned, the initial frequent pattern tree is obtained.

Fig. 2 is the flow chart of building the initial frequent pattern tree according to the invention. The flow is as follows:

In step S21, delete from each transaction the items that do not reach the minimum support, and sort the remaining items in descending order of occurrence count;

In step S22, scan the transactions in the database in turn;

In step S23, scan the items of the transaction from front to back, traversing down the tree from the root node;

In step S24, determine whether the current item is the last item of the transaction; if so, perform step S25; if not, perform step S27;

In step S25, determine whether the corresponding node exists in the tree; if it exists, perform step S26; if not, perform step S29;

In step S26, increment the count field of that node; then go to step S210;

In step S27, determine whether the corresponding node exists in the tree; if it exists, return to step S23; if not, perform step S28;

In step S28, create a new node and set its count field to 0; then return to step S23;

In step S29, create a new node and set its count field to 1; then go to step S210;

In step S210, determine whether all transactions have been scanned; if not, return to step S22; if so, perform step S211;

In step S211, the procedure ends.
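The flow of steps S22 to S211 can be sketched as a nested loop. This is an illustrative reading of the flow chart, not code from the patent: the names Node and build_initial_tree are assumptions, and the input is assumed to have already been filtered and sorted as in step S21.

```python
class Node:
    """Tree node with an item label, a count field, and children keyed by item."""
    def __init__(self, item=None, count=0):
        self.item = item                            # None stands in for "null"
        self.count = count
        self.children = {}

def build_initial_tree(sorted_transactions):
    """Build the initial tree; only the last node of each path is incremented."""
    root = Node()
    for transaction in sorted_transactions:         # S22: next transaction
        node = root
        last_index = len(transaction) - 1
        for i, item in enumerate(transaction):      # S23: walk down from the root
            child = node.children.get(item)
            if i == last_index:                     # S24: last item of the transaction
                if child is not None:               # S25: node already exists
                    child.count += 1                # S26: increment its count field
                else:
                    child = Node(item, count=1)     # S29: new node with count 1
                    node.children[item] = child
            elif child is None:                     # S27: intermediate node missing
                child = Node(item, count=0)         # S28: new node with count 0
                node.children[item] = child
            node = child
    return root                                     # S210/S211: all transactions done

# Hypothetical transactions, already filtered and sorted by L.
tree = build_initial_tree([["a"], ["a", "b", "c", "d"], ["a", "b"]])
```

An existing intermediate node is only read, never written, which is where the reduction in count-field writes comes from.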

Step 2, build the complete frequent pattern tree

Traverse the initial frequent pattern tree with a depth-first search; the counter value of each visited node is set to the node's own value plus the values of all its child nodes.

Embodiment

Fig. 3 is an example of building a frequent pattern tree according to the invention. The embodiment comprises the following steps:

Step 1, build the initial frequent pattern tree from the database of Fig. 3(a), as follows:

As shown in Fig. 3(b), a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once;

As shown in Fig. 3(c), after the second transaction record is scanned, nodes b, c, and d are created; the count fields of b and c are set to 0 and that of d to 1, indicating that item d has occurred once. (To reduce redundant writes while building the frequent pattern tree, the occurrences of b and c are not recorded at this point; only the occurrences of item d, at the end of this transaction record, are recorded, because the occurrence counts of b and c can later be obtained from the count fields of their child nodes.);

Fig. 3(d) shows the initial tree constructed after all transaction records in the database have been scanned;

Step 2, build the complete frequent pattern tree

As shown in Fig. 3(e), the initial frequent pattern tree is traversed with a depth-first search, and the counter value of each visited node is set to the node's own value plus the values of all its child nodes. For example, the count field of c becomes the sum of c's original value 0 and the value 5 of d's count field, so c occurs 5 times; the count field of f becomes the sum of f's original value 3 and the values of its child nodes e and g, so f occurs 6 times. Once the traversal is finished, the complete frequent pattern tree has been built.
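The aggregation of Step 2 can be sketched as a post-order depth-first pass. The values for c, d, and f are taken from the embodiment; everything else is an assumption made for illustration, namely the names Node, aggregate, and complete_tree, the split of 3 between e and g (2 and 1), and the placement of c and f directly under the root, since Fig. 3 is not reproduced here.

```python
class Node:
    """Tree node with an item label, a count field, and a list of children."""
    def __init__(self, item, count=0, children=()):
        self.item = item
        self.count = count
        self.children = list(children)

def aggregate(node):
    """Post-order depth-first pass: a node's final count is its own value plus
    the final counts of all its children. Returns the final count."""
    for child in node.children:
        node.count += aggregate(child)
    return node.count

def complete_tree(root):
    """Apply the aggregation below the null root, which carries no count."""
    for child in root.children:
        aggregate(child)
    return root

# c, d and f follow the embodiment; the e/g values and the tree shape are assumed.
d = Node("d", 5)
c = Node("c", 0, [d])
f = Node("f", 3, [Node("e", 2), Node("g", 1)])
root = complete_tree(Node(None, 0, [c, f]))
```

The recursion is kept for brevity; for very long transactions an explicit stack would avoid Python's recursion limit.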

Experimental tests

Different types of data sets were chosen for testing, and for each data set the numbers of read and write operations, the total tree construction time, and the PCM lifetime were recorded. The data sets are T10I4D100K, T40I10D100K, chess, mushroom, pumsb*, connect, pumsb, accidents, C73D10, and C20D10.

The experimental results are shown in Fig. 4 to Fig. 7:

In Fig. 4, the vertical axis is the number of read operations and the horizontal axis lists the data sets. Fig. 4 shows that the invention substantially reduces the number of read operations;

In Fig. 5, the vertical axis is the number of write operations and the horizontal axis lists the data sets. Fig. 5 shows that the invention substantially reduces the number of write operations;

In Fig. 6, the vertical axis is the total tree construction time and the horizontal axis lists the data sets. Fig. 6 shows that the invention reduces the tree construction time;

In Fig. 7, the vertical axis is the total number of transactions that can be processed before the PCM wears out, and the horizontal axis lists the data sets. Fig. 7 shows that the invention extends PCM lifetime by at least 16.67% (on data set T40I10D100K) and by up to 99.05% (on data set connect), a substantial extension of PCM lifetime.

Claims

1. An in-memory frequent pattern mining method, characterized in that it comprises the following steps:

Step 1, build the initial frequent pattern tree

2. The in-memory frequent pattern mining method according to claim 1, characterized in that the specific flow of sub-step 3) of step 1 is as follows:

In step S22, scan the transactions in the database in turn;

In step S211, the procedure ends.