CN106250549A - A memory-based frequent pattern mining method - Google Patents
- Publication number: CN106250549A (application number CN201610662641.2A)
- Authority: CN (China)
- Prior art keywords: tree, frequent, node, value, transaction
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
Abstract
The invention discloses a memory-based frequent pattern mining method comprising the following steps. Step 1: build an initial frequent pattern tree. Create the root node T of the frequent pattern tree (FP-tree), labeled "null". Scan the database again; for each transaction read, select its frequent items and sort them in the order of list L. Using the sorted items, build a path of the FP-tree starting from the null root node, incrementing by 1 only the count of the last node on the path while the counts of the other nodes on the path remain unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained. Step 2: traverse the initial frequent pattern tree depth-first; the counter value of each visited node becomes its own value plus the values of all its child nodes. The advantages of the invention are that writes to NVM are reduced and the FP-tree can be built quickly, and that the especially intensive writes to the count fields of nodes near the root are reduced, extending NVM lifetime.
Description
Technical field
The invention belongs to the field of memory technology and relates specifically to a memory-based frequent pattern mining method.
Background art
As computer science and technology have matured, data analysis has developed greatly since it was established in the 20th century. Data analysis can discover and extract items of interest from massive data and thus provide guidance to decision makers. Machine learning and data mining can reveal the information hidden behind data and have become key technologies of data analysis.
In data mining, discovering the frequent items or frequent patterns in a data set is an important research topic; it is the basis of many significant data mining tasks such as correlation analysis, sequential patterns, causal relationships and emerging patterns. Techniques such as Apriori and FP-tree currently exist to address the frequent pattern mining problem.
Because memory-based frequent pattern mining requires the data to be mined and its elements to be stored in byte-addressable memory, and DRAM needs continuous power to retain data, persistence is likely to become a key design issue for efficient mining systems. To address this in memory-based data analysis, non-volatile memory (NVM) such as phase-change memory (PCM) is generally regarded as an excellent substitute for DRAM because of its non-volatility and performance efficiency. However, using NVM as main memory raises two problems: first, read and write operations on NVM differ considerably in cost, with writes usually taking more time and energy than reads; second, NVM tolerates only a limited number of writes, and uneven writes usually accelerate the failure of the whole NVM device. Precisely because they do not take these essential characteristics of NVM into account, the data mining and machine learning algorithms currently run on NVM seriously degrade the performance and lifetime of the storage system.
The prior art uses a technical scheme called the FP-tree algorithm, an improvement on the Apriori algorithm. It compresses the key information about frequent patterns into a frequent pattern tree (FP-tree) structure, avoiding the costly generation of huge numbers of candidate items in Apriori and thereby removing Apriori's performance bottleneck. In short, the FP-tree algorithm accomplishes what Apriori does without generating candidate items.
J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD International Conference on Management of Data (SIGMOD '00), 29(2): 1-12, May 2000, describes the steps of the FP-tree algorithm as follows:
(1) Scan the whole transaction database D once to obtain the support count of every item contained in D, and discard the items whose support count is below the threshold. The remaining items are the frequent items; sort them in descending order of support count to obtain a list L.
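Step (1) can be sketched in Python as follows. This is a minimal illustration, not code from the patent or from Han et al.: transactions are assumed to be Python lists of item labels, `min_support` is assumed to be an absolute count, `build_frequent_list` is a hypothetical name, and support ties are broken alphabetically only to make L deterministic (the source specifies no tie rule).

```python
from collections import Counter


def build_frequent_list(transactions, min_support):
    """Return the frequent items sorted by descending support count (list L)."""
    # One pass over the database: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    # Discard items whose support count is below the threshold.
    frequent = {item: n for item, n in support.items() if n >= min_support}
    # Descending support; ties broken alphabetically (an assumption of this sketch).
    return sorted(frequent, key=lambda item: (-frequent[item], item))


db = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["b", "e"]]
print(build_frequent_list(db, 2))  # ['b', 'a', 'd']
```

Items exactly at the threshold are kept, matching "discard the items whose support count is below the threshold".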
(2) Create the root node T of the FP-tree, labeled "null", and scan the transaction database again. For each transaction in D, select its frequent items and sort them in the order of L. Let the sorted frequent-item list be [p | P], where p is the first frequent item and P is the list of the remaining frequent items, and call insert_tree([p | P], T). insert_tree([p | P], T) executes as follows: if T has a child N such that N.item_name = p.item_name, increment N's count by 1; otherwise, create a new node N with its count set to 1 and link it to its parent node T. If P is non-empty, recursively call insert_tree(P, N).
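A minimal Python sketch of the recursive `insert_tree` procedure of step (2). The `Node` class, its `children` dictionary keyed by item, and `build_fp_tree` are illustrative assumptions; the header table and node-links that a full FP-tree also maintains are omitted, and each transaction is assumed to be already filtered and sorted in the order of L.

```python
class Node:
    """FP-tree node: item label, count field, and children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def insert_tree(items, t):
    """Classic insert_tree([p | P], T): every node on the path gets +1."""
    if not items:
        return
    p, rest = items[0], items[1:]
    n = t.children.get(p)
    if n is None:
        # No child with item_name == p: create N and link it to its parent T.
        n = Node(p, parent=t)
        t.children[p] = n
    # Existing node: count + 1; new node: 0 + 1 = 1, as the algorithm requires.
    n.count += 1
    insert_tree(rest, n)  # recurse on P with N as the new parent


def build_fp_tree(sorted_transactions):
    root = Node(None)  # the "null" root T
    for items in sorted_transactions:
        insert_tree(items, root)
    return root


root = build_fp_tree([["b", "a"], ["b", "d"], ["b", "a", "d"]])
print(root.children["b"].count)                # 3
print(root.children["b"].children["a"].count)  # 2
```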
After the above steps, a complete FP-tree has been built. The required frequent patterns can then be produced by mining the FP-tree in bottom-up order. In brief, the algorithm uses the information in the transaction database to construct an FP-tree and then mines the frequent patterns from that tree. Its core idea is to compress the database directly into a frequent pattern tree and then generate association rules from this tree.
Fig. 1 shows an example of FP-tree construction. Fig. 1(a) is the database, where "transaction ID" is the sequence number of each transaction record, "items" lists all items in each transaction record, and "sorted items" lists the items sorted in descending order of occurrence count. First, a node labeled null is created as the root of the whole FP-tree. After the 1st transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once, as shown in Fig. 1(b). After the 2nd transaction record is scanned, nodes b, c and d are created in turn, each with count field 1, indicating that items b, c and d have each occurred once, as shown in Fig. 1(c). After all transaction records in the database have been scanned in turn, the complete FP-tree is as shown in Fig. 1(d), where each letter denotes an item in the database and the number after the letter is the value stored in that node's count field, i.e., the number of times the item occurs in the database with that prefix path.
However, the FP-tree algorithm has the following problems. While the FP-tree is being built, every item scanned in a transaction triggers an update of the FP-tree, i.e., a write to the count field of the corresponding node. This causes a large number of repeated writes and a large memory cost; moreover, the closer a node is to the root, the more writes it receives, and such intensive writes shorten the service life of NVM.
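To make this write pattern concrete, the hypothetical helper below tallies how many times the classic algorithm writes each node's count field, identifying a node by its prefix path from the root (a convention assumed only for this sketch). Every transaction that starts with the same item passes through the same first-level node, so that node is written once per such transaction, while deeper nodes are written less often. The four sorted transactions are made-up data.

```python
from collections import Counter


def baseline_writes_per_node(sorted_transactions):
    """Tally classic FP-tree count-field writes per node.

    A node is identified by its prefix path from the root; the classic
    algorithm writes every node on a transaction's path exactly once.
    """
    writes = Counter()
    for items in sorted_transactions:
        for k in range(1, len(items) + 1):
            writes[tuple(items[:k])] += 1
    return writes


db = [["b", "a", "d"], ["b", "a"], ["b", "d"], ["b"]]
w = baseline_writes_per_node(db)
print(w[("b",)], w[("b", "a")], w[("b", "a", "d")])  # 4 2 1
```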
Summary of the invention
In view of the problems of the prior art, the technical problem to be solved by the invention is to provide a memory-based frequent pattern mining method that reduces writes to NVM while the frequent pattern tree is being built and avoids intensive bursts of writes, thereby extending NVM lifetime.
The technical problem is solved by the following technical scheme, which comprises the following steps:
Step 1: build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, and discard the items whose support count is below the threshold. The remaining items are the frequent items; sort them in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree, labeled "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Using the sorted items, build a path of the frequent pattern tree from the null root node, incrementing by 1 only the count of the last node on the path while the counts of the other nodes on the path remain unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained;
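A minimal sketch of sub-step 3) in Python, assuming transactions have already been filtered and sorted in the order of L as in sub-step 1). The `Node` class and `build_initial_tree` are hypothetical names; the function also returns how many count-field increments it performed, which comes to one per transaction that contains at least one frequent item.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0      # nodes start at 0; only path ends are incremented
        self.children = {}


def build_initial_tree(sorted_transactions):
    """Step 1 sketch: walk or extend the path, then +1 only on its last node."""
    root = Node(None)       # the "null" root T
    increments = 0
    for items in sorted_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            node = child
        if items:           # a transaction with no frequent items adds no path
            node.count += 1  # only the last node on the path is written
            increments += 1
    return root, increments


db = [["b", "a", "d"], ["b", "a"], ["b", "d"], ["b"]]
root, increments = build_initial_tree(db)
print(root.children["b"].count, increments)  # 1 4
```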
Step 2: traverse the initial frequent pattern tree depth-first; the counter value of each visited node becomes its own value plus the values of all its child nodes.
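For the sum to use each child's final value, the reading consistent with the embodiment is a post-order depth-first traversal, in which children are finished before their parent. The sketch below assumes that reading; `Node`, `add` and `accumulate` are illustrative names, the small tree is hypothetical, and recursion is used for brevity (an explicit stack would avoid Python's recursion limit on very deep trees). The null root also ends up with a value, which is harmless and can be ignored.

```python
class Node:
    def __init__(self, item, count=0):
        self.item = item
        self.count = count  # value left by Step 1
        self.children = {}

    def add(self, child):
        self.children[child.item] = child
        return child


def accumulate(node):
    """Post-order DFS: new value = own value + (final) values of all children."""
    for child in node.children.values():
        accumulate(child)
    node.count += sum(child.count for child in node.children.values())
    return node.count


# Hypothetical initial tree: null -> b(1) -> a(1) -> d(1), and b -> d(1)
root = Node(None)
b = root.add(Node("b", 1))
a = b.add(Node("a", 1))
a.add(Node("d", 1))
b.add(Node("d", 1))
accumulate(root)
print(b.count, a.count)  # 4 2
```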
In the frequent pattern tree of the invention, the count field of every node ends up with the same value as in the tree built by the prior-art frequent pattern mining algorithm, namely the number of times that item occurs in the database with that node's prefix path.
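This equivalence can be checked on small inputs. The sketch below (hypothetical helper names, made-up transactions) builds one tree with the classic per-node increments and one with the initial-tree method followed by the Step 2 accumulation, then compares them. Nodes are represented as nested dictionaries mapping an item to a `[count, children]` pair, a representation chosen only for brevity.

```python
def classic(sorted_transactions):
    root = {}  # item -> [count, children]
    for items in sorted_transactions:
        level = root
        for item in items:
            entry = level.setdefault(item, [0, {}])
            entry[0] += 1          # every node on the path is written
            level = entry[1]
    return root


def initial(sorted_transactions):
    root = {}
    for items in sorted_transactions:
        level, entry = root, None
        for item in items:
            entry = level.setdefault(item, [0, {}])  # new nodes start at 0
            level = entry[1]
        if entry is not None:
            entry[0] += 1          # only the last node is written
    return root


def accumulate(level):
    """Post-order pass: add the finished children's counts to each node."""
    for entry in level.values():
        accumulate(entry[1])
        entry[0] += sum(child[0] for child in entry[1].values())
    return level


db = [["b", "a", "d"], ["b", "a"], ["b", "d"], ["b"], ["a", "d"]]
print(classic(db) == accumulate(initial(db)))  # True
```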
Compared with the prior art, the invention has the following advantages:
The invention no longer updates the count fields of the nodes of all items in the current transaction, which avoids the large number of repeated writes during frequent pattern tree construction, reduces writes to NVM, and allows the frequent pattern tree to be built quickly. It also reduces the especially intensive writes to the count fields of nodes near the root, extending NVM lifetime.
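A back-of-the-envelope comparison of count-field increments during Step 1, again assuming transactions that are already sorted. It ignores node creation and the Step 2 pass (which updates each node at most once more), so it only illustrates the trend and does not reproduce the patent's measurements. The function name and data are hypothetical.

```python
def count_field_increments(sorted_transactions):
    """Count-field increments: classic FP-tree vs. the initial-tree method.

    Classic: one increment per item, since every node on the path is updated.
    Initial tree: one increment per non-empty transaction (its last node only).
    """
    classic = sum(len(items) for items in sorted_transactions)
    initial = sum(1 for items in sorted_transactions if items)
    return classic, initial


db = [["b", "a", "d"], ["b", "a"], ["b", "d"], ["b"], ["a", "d"]]
print(count_field_increments(db))  # (10, 5)
```

The gap grows with transaction length, and the first-level nodes, which the classic build writes once per passing transaction, are the ones that benefit most.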
Brief description of the drawings
The drawings of the invention are described as follows:
Fig. 1 is an example of the structure of the frequent pattern tree in the background art;
Fig. 2 is a flowchart of building the initial frequent pattern tree according to the invention;
Fig. 3 is an example of the structure of the frequent pattern tree of the invention;
Fig. 4 compares read operations in the tests;
Fig. 5 compares write operations in the tests;
Fig. 6 compares tree-building time in the tests;
Fig. 7 compares PCM lifetime in the tests.
Detailed description of the invention
The invention is further described below with reference to the drawings and embodiments:
The input of the invention is a database and a minimum support threshold σ; the output is an FP-tree.
The invention comprises the following steps:
Step 1: build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, and discard the items whose support count is below the threshold. The remaining items are the frequent items; sort them in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree, labeled "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Using the sorted items, build a path of the frequent pattern tree from the null root node, incrementing by 1 only the count of the last node on the path while the counts of the other nodes on the path remain unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained.
Fig. 2 is a flowchart of building the initial frequent pattern tree according to the invention. The flow is as follows:
In step S21, remove the items that do not reach the minimum support, and sort the remaining items of each transaction in descending order of occurrence count;
In step S22, scan the transactions in the database in turn;
In step S23, scan each item of the transaction from front to back, traversing down the tree from the root node;
In step S24, determine whether the current item is the last item in the transaction; if so, go to step S25; if not, go to step S27;
In step S25, determine whether a corresponding node exists in the tree; if it exists, go to step S26; if not, go to step S29;
In step S26, increment the value of that node's count field; then go to step S210;
In step S27, determine whether a corresponding node exists in the tree; if it exists, return to step S23; if not, go to step S28;
In step S28, create a new node whose count field is 0; then return to step S23;
In step S29, create a new node whose count field is 1; then go to step S210;
In step S210, determine whether all transactions have been scanned; if not, return to step S22; if so, go to step S211;
In step S211, the procedure ends.
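The Fig. 2 flow can be followed step by step in a sketch like the one below, which starts from raw transactions and folds S21 into the loop. All names are hypothetical; duplicate items within a transaction are collapsed with `set`, and support ties are broken alphabetically, both assumptions the source does not state. A new node gets count 1 (S29) when its item is the last in the transaction and count 0 (S28) otherwise.

```python
from collections import Counter


class Node:
    def __init__(self, item, count):
        self.item = item
        self.count = count
        self.children = {}


def build_initial_tree(transactions, min_support):
    """Follow the Fig. 2 flow (S21-S211) on raw transactions."""
    # S21: support counts; items below min_support are removed.
    support = Counter(i for t in transactions for i in set(t))
    keep = {i for i, n in support.items() if n >= min_support}
    root = Node(None, 0)  # "null" root
    for t in transactions:  # S22, repeated until S210 finds none left
        # S21 (per transaction): descending support, alphabetical tie-break.
        items = sorted((i for i in set(t) if i in keep),
                       key=lambda i: (-support[i], i))
        node = root
        for pos, item in enumerate(items):   # S23: front to back, root downward
            last = pos == len(items) - 1      # S24: last item of the transaction?
            child = node.children.get(item)   # S25 / S27: does the node exist?
            if child is None:
                # S29 (last item): count 1; S28 (not last): count 0.
                child = Node(item, 1 if last else 0)
                node.children[item] = child
            elif last:
                child.count += 1              # S26: increment the existing node
            node = child                      # S27 with existing node: descend
    return root                               # S211: end


root = build_initial_tree([["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["b", "e"]], 2)
b = root.children["b"]
print(b.count, b.children["a"].count, b.children["d"].count)  # 1 1 1
```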
Step 2: build the complete frequent pattern tree
Traverse the initial frequent pattern tree depth-first; the counter value of each visited node becomes its own value plus the values of all its child nodes.
Embodiment
Fig. 3 is an example of building a frequent pattern tree according to the invention. This embodiment comprises the following steps:
Step 1: build the initial frequent pattern tree from the database of Fig. 3(a), as follows:
As shown in Fig. 3(b), a node labeled null is created as the root of the whole frequent pattern tree. After the first transaction record is scanned, node a is created and its count field is set to 1, indicating that item a has occurred once;
As shown in Fig. 3(c), after the second transaction record is scanned, nodes b, c and d are created; the count fields of b and c are set to 0 and that of d to 1, indicating that item d has occurred once. (To reduce redundant writes while building the frequent pattern tree, the occurrences of b and c are not recorded at this point; only the occurrence of d, the item at the end of this transaction record, is recorded, because the occurrence counts of b and c can later be derived from the count fields of their child nodes.)
As shown in Fig. 3(d), the initial tree is obtained after all transaction records in the database have been scanned in turn;
Step 2: build the complete frequent pattern tree
As shown in Fig. 3(e), the initial frequent pattern tree is traversed depth-first, and the counter value of each visited node becomes its own value plus the values of all its child nodes. For example, the count of c is the sum of c's original value 0 and the count value 5 of d, so c occurs 5 times; the count of f is the sum of the values of f's child nodes e and g and f's original value 3, so f occurs 6 times. Once the traversal is complete, the complete frequent pattern tree has been built.
Experimental tests
Different types of data sets were selected for testing, and for each data set the number of read and write operations, the total tree-building time and the PCM lifetime were measured. The data sets are T10I4D100K, T40I10D100K, chess, mushroom, pumsb*, connect, pumsb, accidents, C73D10 and C20D10.
The experimental results are shown in Figs. 4 to 7:
In Fig. 4, the vertical axis is the number of reads and the horizontal axis lists the data sets; Fig. 4 shows that the invention substantially reduces read operations;
In Fig. 5, the vertical axis is the number of writes and the horizontal axis lists the data sets; Fig. 5 shows that the invention substantially reduces write operations;
In Fig. 6, the vertical axis is the total tree-building time and the horizontal axis lists the data sets; Fig. 6 shows that the invention reduces tree-building time;
In Fig. 7, the vertical axis is the total number of transactions that can be processed before the PCM wears out, and the horizontal axis lists the data sets. Fig. 7 shows that the invention extends PCM lifetime by at least 16.67% (on data set T40I10D100K) and by up to 99.05% (on data set connect), greatly extending the lifetime of the PCM.
Claims (2)
1. A memory-based frequent pattern mining method, characterized in that it comprises the following steps:
Step 1: build the initial frequent pattern tree
1) Scan each transaction record in the database in turn to obtain the support count of every item in the database, and discard the items whose support count is below the threshold. The remaining items are the frequent items; sort them in descending order of support count to obtain a list L;
2) Create the root node T of the frequent pattern tree, labeled "null";
3) Scan the database again; for each transaction read, select its frequent items and sort them in the order of L. Using the sorted items, build a path of the frequent pattern tree from the null root node, incrementing by 1 only the count of the last node on the path while the counts of the other nodes on the path remain unchanged. After all transactions in the database have been scanned in turn, the initial frequent pattern tree is obtained;
Step 2: traverse the initial frequent pattern tree depth-first; the counter value of each visited node becomes its own value plus the values of all its child nodes.
2. The memory-based frequent pattern mining method according to claim 1, characterized in that the specific flow of sub-step 3) of Step 1 is as follows:
In step S21, remove the items that do not reach the minimum support, and sort the remaining items of each transaction in descending order of occurrence count;
In step S22, scan the transactions in the database in turn;
In step S23, scan each item of the transaction from front to back, traversing down the tree from the root node;
In step S24, determine whether the current item is the last item in the transaction; if so, go to step S25; if not, go to step S27;
In step S25, determine whether a corresponding node exists in the tree; if it exists, go to step S26; if not, go to step S29;
In step S26, increment the value of that node's count field; then go to step S210;
In step S27, determine whether a corresponding node exists in the tree; if it exists, return to step S23; if not, go to step S28;
In step S28, create a new node whose count field is 0; then return to step S23;
In step S29, create a new node whose count field is 1; then go to step S210;
In step S210, determine whether all transactions have been scanned; if not, return to step S22; if so, go to step S211;
In step S211, the procedure ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A CN106250549B (en) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A CN106250549B (en) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250549A (en) | 2016-12-21 |
CN106250549B CN106250549B (en) | 2019-09-20 |
Family ID: 57591955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610662641.2A (active, CN106250549B) | 2016-08-14 | 2016-08-14 | A memory-based frequent pattern mining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250549B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101119302A (en) * | 2007-09-06 | 2008-02-06 | Huazhong University of Science and Technology | Method for mining frequent patterns in the most recent time window of a transaction data stream |
CN102662948A (en) * | 2012-02-23 | 2012-09-12 | Zhejiang Gongshang University | Data mining method for quickly finding utility patterns |
CN105589900A (en) * | 2014-11-21 | 2016-05-18 | China UnionPay Co., Ltd. | Data mining method based on multi-dimensional analysis |
Non-Patent Citations (4)
Title |
---|
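XIANLU LUO et al.: "Enhancing Lifetime of NVM-based Main Memory with Bit Shifting and Flipping", EMBEDDED AND REAL-TIME COMPUTING SYSTEM AND APPLICATIONS * |
Luan Hua et al.: "Frequent graph mining on multi-core processors", Journal of Computer Research and Development * |
Niu Xinzheng et al.: "Frequent itemset mining algorithm based on array prefix tree", Journal of Chinese Computer Systems * |
Wang Le et al.: "High utility pattern mining algorithm based on pattern growth", Acta Automatica Sinica * |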
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874396A (en) * | 2017-01-16 | 2017-06-20 | Chongqing University | A frequent pattern mining method based on non-volatile memory |
CN110096629A (en) * | 2019-05-15 | 2019-08-06 | Chongqing University | Method for mining frequent weighted itemsets based on an effective weight tree |
CN110096629B (en) * | 2019-05-15 | 2023-07-28 | Chongqing University | Memory optimization method for transaction processing |
Also Published As
Publication number | Publication date |
---|---|
CN106250549B (en) | 2019-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200167367A1 (en) | Block chain state data synchronization method, apparatus, and electronic device | |
Willard et al. | Adding range restriction capability to dynamic data structures | |
CN107943777A (en) | A collaborative editing and collaborative processing method, device, equipment and storage medium | |
Joshi et al. | A dynamic approach for frequent pattern mining using transposition of database | |
Gan et al. | Explainable fuzzy utility mining on sequences | |
CN106250549A (en) | A memory-based frequent pattern mining method | |
CN100419750C (en) | Method for converting concatenated join tables into tree structure and conversion program | |
CN112052233A (en) | Multi-angle online business process anomaly detection method based on context awareness | |
Kiran et al. | Finding periodic-frequent patterns in temporal databases using periodic summaries | |
CN102214248A (en) | Multi-layer frequent pattern discovery algorithm with high space extensibility and high time efficiency for mining mass data | |
Guo et al. | High utility episode mining made practical and fast | |
Sinha et al. | Identification of best algorithm in association rule mining based on performance | |
Oguz et al. | Incremental itemset mining based on matrix apriori algorithm | |
CN111078896A (en) | Knowledge base completion method based on PRMATC algorithm | |
Dubey et al. | A novel J2ME service for mining incremental patterns in mobile computing | |
Lin et al. | Efficient mining of high average-utility sequential patterns from uncertain databases | |
Xiong et al. | Mining simple path traversal patterns in knowledge graph | |
Lee et al. | Mining traveling and purchasing behaviors of customers in electronic commerce environment | |
Chen et al. | Research on association rules mining base on positive and negative items of FP-tree | |
CN109697197A (en) | A method for carving multiple Access database files | |
Lin et al. | A share strategy for utility frequent patterns mining | |
CN111369052A (en) | Simplified road network KSP optimization algorithm | |
CN105989117A (en) | Method and system for rapidly and jointly processing semi-structured data | |
Meddah et al. | Mining Patterns Using Business Process Management | |
Zheng et al. | A novel method to generate frequent itemsets in distributed environment |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |