CN105447134A - Optimization method of a frequent item set mining algorithm - Google Patents
Optimization method of a frequent item set mining algorithm
- Publication number: CN105447134A
- Authority: CN (China)
- Prior art keywords: item, item set, set, frequent, data
- Legal status: Granted (status as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
Abstract
The invention discloses an optimization method for a frequent item set mining algorithm. The method comprises: for the received data, traversing the item set tree in preorder, thereby arranging the item sets; performing a parent-set/subset comparison on adjacent item sets in the arranged sequence; and merging item sets whose comparison result is a proper-subset/superset relationship. Compared with existing frequent item set mining algorithms, the method adds the ability to extract proper subsets. Its main advantages are that extracting proper subsets reduces the data volume, the amount of computation, and the storage footprint, and that eliminating invalid item sets prevents duplicated data from being computed repeatedly.
Description
Technical field
The present invention relates to the field of data processing, and in particular to an optimization method for a frequent item set mining algorithm.
Background art
A frequent item set mining algorithm mines sets of items that often occur together (called frequent item sets). Once these frequent item sets have been mined, whenever a transaction contains one item of a frequent item set, the other items of that set can be offered as recommendations.
There are two common classes of frequent item set mining algorithms: the Apriori algorithm and FP-Growth. FP-Growth was developed as an optimization of Apriori. Its biggest improvement over Apriori is that it reduces the number of passes over the data: Apriori needs K-1 rounds of computation to find the frequent item sets, where K is the number of items in a frequent item set, whereas FP-Growth builds an FP-tree and needs only two passes over the data to complete the computation of the frequent item sets.
With the development of informatization, data volumes have grown explosively and data complexity has increased greatly. Although technologies such as Hadoop, Spark, and FP-Growth can shorten the time needed to compute frequent item sets and reduce the number of passes over the data, data from different sources increases both the number of frequent item sets and the number of invalid frequent item sets. In real projects the results are therefore inaccurate, and wrong items are often recommended. Invalid data also inflates the size of the frequent item sets, so the performance and cost requirements of the project cannot be met.
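To make the recommendation idea concrete, here is a minimal Python sketch. The transactions, item names, and the helpers `support` and `recommend` are hypothetical illustrations, not taken from the patent; support is counted as the number of transactions that contain every item of the set.

```python
from typing import FrozenSet, Iterable, List, Set

# Hypothetical transactions: each one is the set of items bought together.
TRANSACTIONS: List[Set[str]] = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset: FrozenSet[str], transactions: Iterable[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def recommend(basket: Set[str], frequent_itemsets: Iterable[FrozenSet[str]]) -> Set[str]:
    """For each frequent item set sharing an item with the basket,
    suggest its remaining items (those not already in the basket)."""
    suggestions: Set[str] = set()
    for fis in frequent_itemsets:
        if fis & basket:
            suggestions |= fis - basket
    return suggestions

if __name__ == "__main__":
    bread_milk = frozenset({"bread", "milk"})
    print(support(bread_milk, TRANSACTIONS))   # 2
    print(recommend({"bread"}, [bread_milk]))  # {'milk'}
```

Here {bread, milk} appears together in two of the four transactions, so a basket containing bread yields milk as a suggestion.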
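The two-pass property of FP-Growth can be illustrated with a simplified FP-tree builder. This is a minimal sketch that omits the header table and node links a full FP-Growth implementation needs for the mining phase; `FPNode`, `build_fp_tree`, and the tie-breaking rule are illustrative assumptions, not part of the patent. Pass 1 counts item frequencies; pass 2 inserts each transaction's frequent items, sorted by descending frequency, into a shared prefix tree.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

class FPNode:
    """Prefix-tree node: one item plus the number of transactions passing through it."""
    def __init__(self, item: Optional[str], parent: Optional["FPNode"] = None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[List[str]], min_support: int) -> Tuple[FPNode, Dict[str, int]]:
    # Pass 1 over the data: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Pass 2 over the data: insert each transaction's frequent items,
    # ordered by descending count (ties broken alphabetically), so that
    # transactions with common prefixes share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

For example, with transactions `[["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]]` and `min_support=2`, item `d` is dropped in pass 1, and the tree has a single branch from the root to `b` (count 3), which splits into `a` (count 2, with child `c` of count 1) and `c` (count 1).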
Summary of the invention
The object of the invention is to, for the problems referred to above, propose a kind of optimization method of Frequent Itemsets Mining Algorithm, to realize reducing data volume size, and the advantage of reduction data calculation process and data storage.
For achieving the above object, the technical solution used in the present invention is:
An optimization method for Frequent Itemsets Mining Algorithm, comprising:
Receive data;
For the data received, use preorder traversal, traversal Xiang Jishu, thus item collection is arranged;
Concentrate adjacent item collection to make father and son's collection to the item after arrangement to compare, and by comparative result be the item set of proper subclass and superset relationship also;
Its middle term integrates the abbreviation as frequent item set.
Preferably, described father and son's collection compares, and the content compared comprises, the subordinate relation of item collection and the support of item collection.
Preferably, the subordinate relation of described item collection is more specific is:
Suppose, two item collection are respectively A item collection and B item collection, and concentrate if the item inside A item collection is all contained in B item, then think that A item collection belongs to B item collection, A item collection is the subset of B item collection.
Preferably, the support of described item collection is more specific is:
Suppose, two item collection are respectively A item collection and B item collection, the support of item collection derives from data, be exactly the number of times that the item of this collection the inside occurs in the data simultaneously in simple terms, if the equal and A item collection of the frequent degree of the frequent degree of A item collection and B item collection is the subset of B item collection, then A item collection is the proper subclass of B item collection; If A item collection is the subset of B item collection, but support is different, then A item collection is the subset of B item collection, but is not proper subclass.
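The two comparison criteria above (containment and support) can be sketched as a single Python function. The name `classify_relation` and its return labels are hypothetical; item sets are modelled as `frozenset`s with an integer support count. Note that the patent's term "proper subset" means a subset with the same support as its superset, which differs from the usual set-theoretic meaning; in the frequent-pattern literature, keeping only item sets that have no superset of equal support yields the closed item sets.

```python
from typing import FrozenSet

def classify_relation(a: FrozenSet[str], support_a: int,
                      b: FrozenSet[str], support_b: int) -> str:
    """Classify item set A against item set B using the two criteria.

    Returns:
      "proper_subset": every item of A is in B and the supports are equal
                       (the patent's sense of "proper subset"; A can be
                       merged into B);
      "subset":        A is contained in B but the supports differ;
      "unrelated":     A is not contained in B.
    """
    if not a <= b:  # containment check: all items of A must be in B
        return "unrelated"
    if support_a == support_b:
        return "proper_subset"
    return "subset"
```

For instance, {a, b} with support 10 against {a, b, c} with support 10 is classified as `"proper_subset"`, while the same pair with supports 12 and 10 is only a `"subset"`.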
The technical solution of the present invention has the following beneficial effects:
Compared with existing frequent item set mining algorithms, the technical solution of the present invention adds the extraction of proper subsets. Its main advantage is that extracting proper subsets reduces the data volume, the amount of computation, and the storage size, and that removing invalid item sets prevents duplicated data from being computed repeatedly. As a result, when the algorithm is used for recommendation, invalid products are not recommended, which effectively improves the user experience. Using proper subsets therefore saves cost while also improving performance and user experience.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
Fig. 1 is a flowchart of the optimization method of the frequent item set mining algorithm according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data structure of frequent item sets according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data structure of frequent item sets that can be merged, according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the data structure of frequent item sets that can be partially merged, according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
As shown in Fig. 1, an optimization method for a frequent item set mining algorithm comprises:
receiving data;
for the received data, traversing the item set tree in preorder, thereby arranging the item sets;
performing a parent-set/subset comparison on adjacent item sets among the arranged item sets, and merging item sets whose comparison result is a proper-subset/superset relationship;
wherein "item set" is short for "frequent item set".
Preferably, the parent-set/subset comparison covers two aspects: the containment relation between the item sets and the support of the item sets.
Preferably, the containment relation between item sets is specifically:
Suppose there are two item sets, item set A and item set B. If every item in A is also contained in B, then A is considered to belong to B, that is, A is a subset of B.
Preferably, the support of the item sets is specifically:
Suppose there are two item sets, item set A and item set B. The support of an item set is derived from the data; simply put, it is the number of times all items of the set occur together in the data. If A is a subset of B and the support of A equals the support of B, then A is a proper subset of B; if A is a subset of B but their supports differ, then A is a subset of B but not a proper subset.
As shown in Fig. 2, the frequent item set result set contains three columns of data. The second and third columns are subsets of the first column, and all three frequent item sets have a support of 10. In this case the first, second, and third columns come from the same data source, which means the second and third columns are proper subsets of the first column. They need not be counted as three separate columns and can be merged into a single column, as shown in Fig. 3.
As shown in Fig. 3, in the frequent item set results, the next column is compared with the current column for support and parent/subset relationship. If the next column is a proper subset of the current column, the two columns are merged into one, and the third column is then compared. If the third column also has a parent/subset relationship with the first column, the three columns are merged into one, and comparison continues down the list in the same way. If the third column and the first column are not in a parent/subset relationship, as shown in Fig. 4, the first and second columns are merged, and comparison continues from the third column onward.
Merging proper subsets and their superset into a single column reduces the amount of repeated data and reduces the number of computations when the frequent item sets are used. In terms of accuracy, it reduces the reuse of frequent item sets that come from the same data source, which improves the accuracy of the results.
Parent/subset comparison: the comparison has two parts. The first is the containment relation between frequent item sets. An item set is a set of frequent items; if every item in frequent item set A is also contained in frequent item set B, then A is considered to belong to B, that is, A is a subset of B. The second part compares support. The support of a frequent item set is derived from the data; simply put, it is the number of times all items of the set occur together in the data. If A is a subset of B and the support of A equals that of B, then A is a proper subset of B; if A is a subset of B but their supports differ, then A is a subset of B but not a proper subset.
Selecting parent/subset candidates: when choosing frequent item sets for the parent/subset comparison, only two adjacent sets need to be compared, because when the item set tree is traversed in preorder, item sets that may stand in a parent/subset relationship are placed next to each other.
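The contents of Fig. 2 are not reproduced in the text, so the following Python sketch uses hypothetical item sets {a, b, c}, {a, b} and {a, c} over hypothetical transactions. It illustrates why equal support signals a shared data source: if A is a subset of B, every transaction containing B also contains A, so when their support counts are equal, both counts are derived from exactly the same transactions.

```python
from typing import FrozenSet, List, Set

# Hypothetical data: ten transactions that contain a, b and c,
# plus two transactions with unrelated items.
transactions: List[Set[str]] = [{"a", "b", "c"} for _ in range(10)] + [{"d"}, {"d", "e"}]

def matching(itemset: FrozenSet[str]) -> List[int]:
    """Indices of the transactions that contain every item of `itemset`."""
    return [i for i, t in enumerate(transactions) if itemset <= t]

# Column 1 is the superset; columns 2 and 3 are its subsets.
columns = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
supports = [len(matching(c)) for c in columns]
print(supports)  # [10, 10, 10]

# Equal support plus containment: the subsets are backed by exactly the
# same transactions as the superset, so counting them separately adds
# no new information.
same_source = all(matching(c) == matching(columns[0]) for c in columns[1:])
print(same_source)  # True
```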
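The adjacent-merge procedure just described can be sketched in Python. This is a minimal interpretation under stated assumptions, not the patent's own implementation: the input is a list of `(item set, support)` pairs already in traversal order; each group keeps its largest item set as the representative (the "first column"); the next item set joins the group if it is contained in, or contains, the representative and has the same support; otherwise the current group is closed and a new group starts at that item set, as in the Fig. 4 case. The function name `merge_adjacent` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Entry = Tuple[FrozenSet[str], int]  # (item set, support)

def merge_adjacent(ordered: List[Entry]) -> List[Tuple[Entry, List[FrozenSet[str]]]]:
    """Merge runs of adjacent item sets that are in a subset/superset
    relation with the group's representative and have equal support.

    Returns one (representative, members) pair per merged group, where the
    representative is the largest item set seen in the group so far.
    """
    groups: List[Tuple[Entry, List[FrozenSet[str]]]] = []
    for itemset, sup in ordered:
        if groups:
            (rep, rep_sup), members = groups[-1]
            related = itemset <= rep or rep <= itemset
            if related and sup == rep_sup:
                members.append(itemset)
                # Keep the superset as the group's representative.
                if rep <= itemset:
                    groups[-1] = ((itemset, sup), members)
                continue
        # Not mergeable: close the current group and start a new one.
        groups.append(((itemset, sup), [itemset]))
    return groups
```

With the Fig. 3 pattern `[({a,b,c}, 10), ({a,b}, 10), ({a}, 10)]` the sketch returns a single group represented by `({a,b,c}, 10)`; with the Fig. 4 pattern, where the third entry is an unrelated item set such as `({d,e}, 7)`, it returns two groups. Checking containment in both directions is a design choice so that the sketch works whether the traversal emits the superset or the subset first.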
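A minimal Python sketch of why preorder traversal helps, under assumptions the patent does not spell out: frequent item sets are stored in a prefix (set-enumeration) tree in which each node extends its parent's item set by one item, so a parent's item set is always a subset of its children's. Preorder emits each node immediately before its first child, so many subset/superset pairs become neighbours, although not every pair does, since sibling subtrees can come in between. The `ItemSetNode` class, the `preorder` function, and the sample supports are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Tuple

@dataclass
class ItemSetNode:
    """Prefix-tree node; its item set is the path of items from the root."""
    item: str
    support: int
    children: List["ItemSetNode"] = field(default_factory=list)

def preorder(nodes: List[ItemSetNode],
             prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[FrozenSet[str], int]]:
    """Yield (item set, support) pairs, emitting each node before its children."""
    for node in nodes:
        itemset = prefix + (node.item,)
        yield frozenset(itemset), node.support
        yield from preorder(node.children, itemset)

if __name__ == "__main__":
    # Hypothetical tree: a(10) -> b(10) -> c(10), a(10) -> d(6), and e(4).
    roots = [
        ItemSetNode("a", 10, [ItemSetNode("b", 10, [ItemSetNode("c", 10)]),
                              ItemSetNode("d", 6)]),
        ItemSetNode("e", 4),
    ]
    for itemset, sup in preorder(roots):
        print(sorted(itemset), sup)
```

This prints `['a'] 10`, `['a', 'b'] 10`, `['a', 'b', 'c'] 10`, `['a', 'd'] 6`, `['e'] 4`. The first three entries are adjacent and have equal support, so a comparison of neighbours would merge them; {a, d} has a different support and would be kept separate.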
Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (4)
1. An optimization method for a frequent item set mining algorithm, characterized by comprising:
receiving data;
for the received data, traversing the item set tree in preorder, thereby arranging the item sets;
performing a parent-set/subset comparison on adjacent item sets among the arranged item sets, and merging item sets whose comparison result is a proper-subset/superset relationship;
wherein "item set" is short for "frequent item set".
2. The optimization method for a frequent item set mining algorithm according to claim 1, characterized in that the parent-set/subset comparison covers the containment relation between the item sets and the support of the item sets.
3. The optimization method for a frequent item set mining algorithm according to claim 2, characterized in that the containment relation between the item sets is specifically:
Suppose there are two item sets, item set A and item set B. If every item in A is also contained in B, then A is considered to belong to B, that is, A is a subset of B.
4. The optimization method for a frequent item set mining algorithm according to claim 3, characterized in that the support of the item sets is specifically:
Suppose there are two item sets, item set A and item set B. The support of an item set is derived from the data; simply put, it is the number of times all items of the set occur together in the data. If A is a subset of B and the support of A equals the support of B, then A is a proper subset of B; if A is a subset of B but their supports differ, then A is a subset of B but not a proper subset.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510806032.5A CN105447134B (en) | 2015-11-20 | 2015-11-20 | The optimization method of Frequent Itemsets Mining Algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510806032.5A CN105447134B (en) | 2015-11-20 | 2015-11-20 | The optimization method of Frequent Itemsets Mining Algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105447134A | 2016-03-30 |
CN105447134B | 2019-03-08 |
Family ID: 55557311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510806032.5A Active CN105447134B (en) | 2015-11-20 | 2015-11-20 | The optimization method of Frequent Itemsets Mining Algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105447134B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101937447A (en) * | 2010-06-07 | 2011-01-05 | Huawei Technologies Co., Ltd. | Alarm association rule mining method, and rule mining engine and system |
US20130332431A1 (en) * | 2012-06-12 | 2013-12-12 | International Business Machines Corporation | Closed itemset mining using difference update |
CN103678530A (en) * | 2013-11-30 | 2014-03-26 | Wuhan Transn Information Technology Co., Ltd. | Rapid detection method of frequent item sets |
CN104850577A (en) * | 2015-03-19 | 2015-08-19 | Zhejiang Gongshang University | Data flow maximal frequent item set mining method based on ordered composite tree structure |
- 2015-11-20: CN201510806032.5A filed in CN (CN105447134B, active)
Non-Patent Citations (1)
Title |
---|
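Lin Senmei: "Frequent pattern mining algorithm based on merged FP-trees" (基于合并FP树的频繁模式挖掘算法), Journal of Guangxi Normal University (Natural Science Edition) * |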
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021412A (en) * | 2016-05-13 | 2016-10-12 | Shanghai Institute of Computing Technology | Large-scale vehicle-passing data oriented accompanying vehicle identification method |
CN109300014A (en) * | 2018-10-24 | 2019-02-01 | South-Central Minzu University | Method of Commodity Recommendation, device, server and storage medium based on Web log mining |
CN109300014B (en) * | 2018-10-24 | 2020-09-08 | South-Central Minzu University | Commodity recommendation method and device based on log mining, server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105447134B (en) | 2019-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kantor et al. | Coreference resolution with entity equalization | |
US11263247B2 (en) | Regular expression generation using longest common subsequence algorithm on spans | |
Song et al. | RP-DBSCAN: A superfast parallel DBSCAN algorithm based on random partitioning | |
US9372928B2 (en) | System and method for parallel search on explicitly represented graphs | |
US20150370838A1 (en) | Index structure to accelerate graph traversal | |
CN103761236A (en) | Incremental frequent pattern increase data mining method | |
US9275111B2 (en) | Minimizing result set size when converting from asymmetric to symmetric requests | |
CN105631068B (en) | A kind of net boundary conditional processing method that unstrctured grid CFD is calculated | |
CN102279738A (en) | Identifying entries and exits of strongly connected components | |
US9183598B2 (en) | Identifying event-specific social discussion threads | |
Kovács et al. | Frequent itemset mining on hadoop | |
CN104137095A (en) | System for evolutionary analytics | |
CN103995827B (en) | High-performance sort method in MapReduce Computational frames | |
CN103092992A (en) | Vector data preorder quadtree coding and indexing method based on Key / Value type NoSQL (Not only SQL) | |
CN103455534A (en) | Document clustering method and device | |
CN105447134A (en) | Optimization method of a frequent item set mining algorithm | |
CN109828965B (en) | Data processing method and electronic equipment | |
CN104462095A (en) | Extraction method and device of common pars of query statements | |
CN103853554A (en) | Software reconstruction position determination method and software reconstruction position identification device | |
US9262492B2 (en) | Dividing and combining operations | |
KR101348849B1 (en) | Method for mining of frequent subgraph | |
CN105095239A (en) | Uncertain graph query method and device | |
CN101359337A (en) | Method for interactively editing GIS topological data set | |
EP3488359A1 (en) | Systems and methods for database compression and evaluation | |
Thayasivam et al. | Improved convergence of iterative ontology alignment using block-coordinate descent |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |