CN101799810A - Association rule mining method and system thereof - Google Patents
- Publication number
- CN101799810A (application CN200910077996A)
- Authority
- CN
- China
- Prior art keywords
- frequent
- item collection
- data
- count value
- item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an association rule mining method and a system thereof. The method comprises: generating (K+1)-itemsets from the frequent K-itemsets; executing a plurality of parallel processing tasks, wherein each processing task obtains its corresponding part of the transaction data set and counts the occurrences of each (K+1)-itemset in that part; aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset over the whole transaction data set; generating, from these frequency counts, the frequent (K+1)-itemsets that satisfy the support requirement; and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists, outputting that rule. The invention can improve the processing efficiency of association rule mining.
Description
Technical field
The present invention relates to data mining technology in the communications field, and in particular to an association rule mining method and a system thereof.
Background technology
In data mining, the purpose of association rule mining is to discover noteworthy associations or correlations among a large number of data items. A typical application is market basket analysis in retail. Market basket analysis studies association rules over a transaction database in order to find relationships between different goods (or items) and to identify patterns in customer purchasing behavior. For example, if bread and milk are often bought together, placing them next to each other helps increase the sales of both. To measure the importance of a rule, association rule mining usually uses support and confidence as metrics. Support indicates how important the goods are in the store's sales, and confidence reflects the degree of association between the goods. If 60% of the transactions that include bread also include milk, the confidence of the association rule "bread => milk" (if bread is bought, then milk is bought) is 60%.
The support of an association rule A => B (A and B occur together) in a transaction database D can be expressed as the probability P(A ∪ B), that is, the probability that a transaction contains every item of both A and B.
The confidence of an association rule A => B in a transaction database D is the probability that B also occurs in those transactions of D that contain A, that is, the conditional probability P(B|A).
The support of an itemset X in a transaction database D is the percentage of the N transactions in D that contain X, that is, count(X)/N, or the probability P(X). If the support of an itemset X is greater than or equal to a given support threshold min_sup, X is called a frequent itemset (FI) or frequent pattern.
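The definitions of support and confidence above can be illustrated with a small Python sketch. The function names and the toy database are hypothetical and chosen only for illustration; the database is built so that it reproduces the 60% bread and milk example.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate P(consequent | antecedent) as support(A and B) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy database D of five shopping lists.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread", "eggs"}),
    frozenset({"bread"}),
]

print(support({"bread", "milk"}, D))       # support of the rule bread => milk
print(confidence({"bread"}, {"milk"}, D))  # confidence of bread => milk
```

With min_sup set to, say, 0.5, the itemset {bread, milk} in this toy database would count as frequent because three of the five transactions contain it.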
In the prior art, association rule mining generally comprises two parts:
Part one: find all frequent itemsets, that is, all itemsets whose support is greater than or equal to the minimum support threshold;
Part two: generate, from the frequent itemsets, the association rules that satisfy the confidence threshold.
Part one is quite time-consuming, whereas part two is comparatively easy to carry out once part one is done, so the overall performance of an association rule mining algorithm is determined mainly by part one.
The Apriori algorithm of the prior art is a classic algorithm for mining the frequent itemsets of Boolean association rules. To carry out part one, that is, to find the frequent itemsets, Apriori must scan the database repeatedly. When mining massive data, limited memory capacity means that the data cannot all be loaded into memory for computation, and the data may not even fit in the storage of a single machine (single node). Moreover, Apriori is a serial algorithm, which limits mining efficiency to some extent.
Summary of the invention
Embodiments of the invention provide an association rule mining method and a system thereof, to solve the problem of low processing efficiency in existing association rule mining.
The association rule mining method provided by an embodiment of the invention comprises:
generating (K+1)-itemsets from the frequent K-itemsets;
executing a plurality of parallel processing tasks, wherein each processing task obtains its corresponding part of the transaction data set and counts the frequency of each (K+1)-itemset in that part;
aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set, generating from these frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and outputting an association rule when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists.
The association rule mining system provided by an embodiment of the invention comprises:
a calling module, configured to call a plurality of parallel processing tasks after (K+1)-itemsets have been generated from the frequent K-itemsets, and to call an aggregation task after the parallel processing tasks have completed;
processing task execution modules in one-to-one correspondence with the parallel processing tasks, each configured to execute a processing task, comprising: obtaining the corresponding part of the transaction data set and counting the frequency of each (K+1)-itemset in that part;
an aggregation task execution module, configured to execute the aggregation task, comprising: aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set, generating from these frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and outputting an association rule when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists.
In the above embodiments of the invention, while the frequent (K+1)-itemsets are generated from the frequent K-itemsets, a plurality of processing tasks executed in parallel each obtain one part of the transaction data set and count the frequency of the (K+1)-itemsets in that part. The partial results are then aggregated to obtain the frequency counts of the (K+1)-itemsets over the whole transaction data set, from which the frequent (K+1)-itemsets that satisfy the support requirement are generated and the association rules that satisfy the confidence requirement are output. Because the processing tasks are executed in parallel, the processing efficiency of association rule mining is improved compared with the prior art.
Description of drawings
Fig. 1 is a schematic flowchart of parallel association rule mining in an embodiment of the invention;
Fig. 2 is a schematic diagram of the parallel association rule mining flow implemented with the Map/Reduce mechanism in an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the data mining system in an embodiment of the invention.
Embodiment
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
In association rule mining, frequent itemsets are generated iteratively: each new set of frequent itemsets is generated from the previous one.
Referring to Fig. 1, the association rule mining flow provided by an embodiment of the invention comprises:
In step 103 of the above flow, the termination condition may be any of the following: the configured maximum number of iterations has been reached; the number of association rules output has reached the configured threshold; or the generated set of frequent (k+1)-itemsets is empty.
In step 102 of the above flow, the generation of the frequent (k+1)-itemsets that satisfy the support requirement from the frequent k-itemsets can be implemented with the Map/Reduce mechanism. Map/Reduce is a programming model for the distributed processing of massive data sets; it allows a program to be distributed automatically to, and executed concurrently on, a very large cluster of commodity machines. The generation of frequent (k+1)-itemsets with Map/Reduce can be as shown in Fig. 2.
Referring to Fig. 2, a schematic flowchart of parallel association rule mining implemented with the Map/Reduce mechanism in an embodiment of the invention, take a shopping basket application as an example. I: {i1, i2, ...} is the set of goods, D: {T1, T2, ...} is the set of shopping lists, the minimum support is min_sup and the minimum confidence is min_conf. As shown in the figure, the flow for mining association rules with a maximum iteration count of k comprises:
Generate, from set D, the frequent 1-itemsets whose support is greater than or equal to min_sup. In this step, the frequent 1-itemsets that meet the support threshold min_sup can be generated by scanning set D. An itemset is a set of goods, that is, a subset of I. A 1-itemset contains exactly one kind of goods (such as i1). The support of an itemset is the number of transactions in D in which the itemset occurs divided by the total number of transactions in D (for example, if itemset {i1} occurs in 30 of the 100 transactions in D, its support is 30%). If the support threshold is 20%, this 1-itemset is output as a frequent 1-itemset.
Generate 2-itemsets from the frequent 1-itemsets. A 2-itemset contains two kinds of goods (such as {i2, i3}). Because not every possible 2-itemset needs to be counted, the candidates can be pruned.
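The text does not say how the pruning is done. The following Python sketch assumes the classic Apriori rule: a candidate (k+1)-itemset is kept only if every one of its k-item subsets is frequent. The function name `generate_candidates` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set

Itemset = FrozenSet[str]


def generate_candidates(frequent_k: Set[Itemset]) -> Set[Itemset]:
    """Join frequent k-itemsets into (k+1)-itemsets and prune the result.

    Assumes all itemsets in `frequent_k` have the same size k.
    """
    candidates: Set[Itemset] = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            # Join step: keep only unions that grow by exactly one item.
            if len(union) != len(a) + 1:
                continue
            # Prune step (assumed Apriori rule): every k-item subset of the
            # candidate must itself be a frequent k-itemset.
            if all(frozenset(s) in frequent_k
                   for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


frequent_1 = {frozenset({"i1"}), frozenset({"i2"}), frozenset({"i3"})}
candidates_2 = generate_candidates(frequent_1)  # all pairs of frequent items
```

For 1-itemsets the prune step never removes anything, because both single-item subsets of a joined pair are frequent by construction; it starts to matter from the 3-itemsets onward.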
Generate a plurality of parallel Map tasks and a Reduce task. Each Map task is responsible for obtaining its corresponding part of set D and counting the frequency of each 2-itemset in that part. The Reduce task is responsible for aggregating the results of all Map tasks to obtain the frequency count of each 2-itemset in set D (for example, the number of shopping lists in D in which i1 and i2 appear together is the frequency count of the 2-itemset {i1, i2} in D), generating from these counts the frequent 2-itemsets that satisfy the support requirement, and outputting an association rule when it is determined from the frequent 2-itemsets that a rule satisfying the confidence requirement exists.
The Map tasks are executed in parallel. Each Map task performs the following:
Obtain the data of the corresponding range from set D according to the range of data line offsets assigned to the task. Specifically, the task reads data from set D according to the preset range of data line offsets and converts the data into <key, value> pairs, where key is the identifier that the Map task assigns to the data read and value is the content of the data read. From these <key, value> pairs the task counts the frequency of each 2-itemset and outputs the result as new <key, value> pairs, where key is a 2-itemset and value is its frequency count.
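A single Map task as described above might be sketched as follows in plain Python, without any Map/Reduce framework. The record format is an assumption: each input pair is (line offset, line content), and a line is taken to hold one shopping list with whitespace-separated item names. The name `map_task` is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]


def map_task(records: Iterable[Tuple[int, str]],
             candidates: Set[Itemset]) -> List[Tuple[Itemset, int]]:
    """Count, within one data partition, how often each candidate occurs.

    `records` are the input <key, value> pairs: key is the line offset,
    value is the line content (assumed format: items separated by spaces).
    The result is the output <key, value> pairs: key is a candidate
    itemset, value is its count in this partition.
    """
    counts: Counter = Counter()
    for _offset, line in records:
        basket = frozenset(line.split())
        for candidate in candidates:
            if candidate <= basket:  # the basket contains the whole candidate
                counts[candidate] += 1
    return list(counts.items())


partition = [(0, "i1 i2 i3"), (9, "i2 i3"), (15, "i1 i4")]
candidates = {frozenset({"i2", "i3"}), frozenset({"i1", "i2"})}
pairs = map_task(partition, candidates)
```

Candidates that never occur in the partition are simply not emitted, which keeps the intermediate output small.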
Execute the Reduce task. For the <key, value> pairs output by all Map tasks, the Reduce task adds up the values of the pairs that share the same key, which gives the frequency count of each 2-itemset in set D. It then computes the support of each 2-itemset from its frequency count; for example, the support of {i2, i3} is the number of shopping lists in which i2 and i3 appear together divided by the total number of shopping lists. The 2-itemsets whose support is less than min_sup are deleted and those whose support is greater than or equal to min_sup are kept, which gives the frequent 2-itemsets. The Reduce task can also determine from the frequent 2-itemsets whether there is an association rule whose confidence is greater than or equal to min_conf and, if so, output it. For example, when the probability P(i3|i2) that i3 also appears in a shopping list containing i2 is greater than or equal to min_conf, the association rule i2=>i3 is output.
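The Reduce step can be sketched in Python as below. The names `reduce_task` and `rules_from` are hypothetical. As a simplification that the text does not prescribe, rules are only derived with a single item on the right-hand side, and the count of each left-hand side is taken from the counts of the previous level.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def reduce_task(map_outputs: Iterable[Iterable[Tuple[Itemset, int]]],
                n_transactions: int, min_sup: float) -> Dict[Itemset, int]:
    """Sum the counts of equal keys, then keep itemsets meeting min_sup."""
    totals: Counter = Counter()
    for pairs in map_outputs:
        for itemset, count in pairs:
            totals[itemset] += count
    return {s: c for s, c in totals.items() if c / n_transactions >= min_sup}


def rules_from(frequent_next: Dict[Itemset, int],
               counts_prev: Dict[Itemset, int],
               min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Derive rules X => {y} with confidence count(X and y) / count(X)."""
    rules = []
    for itemset, count in frequent_next.items():
        for item in itemset:
            antecedent = itemset - {item}
            base = counts_prev.get(antecedent)
            if base and count / base >= min_conf:
                rules.append((antecedent, frozenset({item}), count / base))
    return rules


i1i2, i2i3 = frozenset({"i1", "i2"}), frozenset({"i2", "i3"})
map_outputs = [[(i2i3, 2), (i1i2, 1)], [(i2i3, 1)]]   # two Map task outputs
frequent_2 = reduce_task(map_outputs, n_transactions=5, min_sup=0.4)
counts_1 = {frozenset({"i2"}): 4, frozenset({"i3"}): 3}
rules = rules_from(frequent_2, counts_1, min_conf=0.8)
```

In this toy input {i2, i3} has a total count of 3 out of 5 transactions and is kept, while {i1, i2} falls below min_sup and is dropped. Only i3=>i2 reaches the confidence threshold of 0.8.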
Determine whether the current number of iterations has reached k. If so, the flow ends; otherwise the next iteration is carried out, that is, frequent 3-itemsets are generated from the frequent 2-itemsets, and so on until the termination condition is satisfied.
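The iteration as a whole can be sketched as a serial Python driver in which each call to `count_in_partition` stands in for one Map task and the summation stands in for the Reduce task. All names are hypothetical. The sketch counts the 1-itemset pass as the first iteration, which is an assumption, and it implements only two of the termination conditions: the maximum iteration count and an empty result.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def count_in_partition(partition, candidates) -> Counter:
    """Stand-in for one Map task: count candidates in one partition."""
    counts: Counter = Counter()
    for basket in partition:
        for c in candidates:
            if c <= basket:
                counts[c] += 1
    return counts


def mine_frequent_itemsets(partitions: List[List[Itemset]], min_sup: float,
                           max_iterations: int) -> List[Dict[Itemset, int]]:
    """Return one dict of frequent itemsets (with counts) per level."""
    n = sum(len(p) for p in partitions)
    items = {item for p in partitions for basket in p for item in basket}
    candidates = {frozenset({item}) for item in items}
    levels: List[Dict[Itemset, int]] = []
    for _ in range(max_iterations):
        totals: Counter = Counter()
        for p in partitions:                      # "Map" over partitions
            totals.update(count_in_partition(p, candidates))  # "Reduce"
        frequent = {s: c for s, c in totals.items() if c / n >= min_sup}
        if not frequent:                          # empty result: stop
            break
        levels.append(frequent)
        size = len(next(iter(frequent))) + 1
        pool = sorted(frozenset().union(*frequent))
        # Next candidates: itemsets whose smaller subsets are all frequent.
        candidates = {
            frozenset(c) for c in combinations(pool, size)
            if all(frozenset(s) in frequent
                   for s in combinations(c, size - 1))
        }
        if not candidates:
            break
    return levels


fs = frozenset
partitions = [
    [fs({"i1", "i2", "i3"}), fs({"i1", "i2"})],
    [fs({"i2", "i3"}), fs({"i1", "i2", "i3"}), fs({"i4"})],
]
levels = mine_frequent_itemsets(partitions, min_sup=0.4, max_iterations=5)
```

Here `levels[0]` holds the frequent 1-itemsets, `levels[1]` the frequent 2-itemsets, and so on. A termination check on the number of rules output could be added in the same loop.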
The above flow may use one Reduce task or several. If there are several, they can be executed in parallel, and each Reduce task finds, among the results of all Map tasks, the <key, value> pairs with the same key and aggregates them.
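With several Reduce tasks, all pairs with the same key must reach the same task. The text does not say how keys are assigned; the Python sketch below assumes a hash partitioner, which is a common choice in Map/Reduce frameworks. The names `reducer_index`, `shuffle` and `reduce_bucket` are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def reducer_index(itemset: Itemset, n_reducers: int) -> int:
    """Map an itemset to a Reduce task with a stable hash (assumed scheme)."""
    key = ",".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % n_reducers


def shuffle(map_outputs: Iterable[Iterable[Tuple[Itemset, int]]],
            n_reducers: int) -> List[Dict[Itemset, List[int]]]:
    """Group Map output by key and route each key to exactly one bucket."""
    buckets: List[Dict[Itemset, List[int]]] = [
        defaultdict(list) for _ in range(n_reducers)
    ]
    for pairs in map_outputs:
        for itemset, count in pairs:
            buckets[reducer_index(itemset, n_reducers)][itemset].append(count)
    return buckets


def reduce_bucket(bucket: Dict[Itemset, List[int]]) -> Dict[Itemset, int]:
    """One Reduce task: add up the values that share the same key."""
    return {itemset: sum(counts) for itemset, counts in bucket.items()}


i1i2, i2i3 = frozenset({"i1", "i2"}), frozenset({"i2", "i3"})
map_outputs = [[(i2i3, 2), (i1i2, 1)], [(i2i3, 1)]]
buckets = shuffle(map_outputs, n_reducers=3)
totals: Dict[Itemset, int] = {}
for bucket in buckets:          # each bucket could be reduced in parallel
    totals.update(reduce_bucket(bucket))
```

Because a key never appears in more than one bucket, the per-bucket results can be merged without further addition.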
In the above flow, a plurality of Map tasks executed in parallel each count the frequency of the (K+1)-itemsets in one part of the transaction database, and the results of all Map tasks are then aggregated to obtain the frequency counts of the (K+1)-itemsets, so the data processing is carried out by multiple processing tasks in parallel.
In the above flow, the Map tasks are preferably assigned to a plurality of execution nodes, and one or more Map tasks can be assigned to a node according to its load. During processing, the execution nodes run their assigned Map tasks in parallel; if one execution node has been assigned several Map tasks, those Map tasks are also executed in parallel on that node.
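Running several Map tasks concurrently on one node can be imitated on a single machine with a worker pool. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for the execution nodes; it is not a cluster scheduler and it ignores load-based assignment. The names `count_partition` and `run_map_tasks` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def count_partition(partition: List[Itemset],
                    candidates: Set[Itemset]) -> Counter:
    """One Map task: count each candidate within one partition."""
    counts: Counter = Counter()
    for basket in partition:
        for c in candidates:
            if c <= basket:
                counts[c] += 1
    return counts


def run_map_tasks(partitions: List[List[Itemset]], candidates: Set[Itemset],
                  max_workers: int = 4) -> Counter:
    """Submit one Map task per partition and sum the partial counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(count_partition, p, candidates)
                   for p in partitions]
        for future in futures:
            total.update(future.result())  # Counter.update adds counts
    return total


fs = frozenset
partitions = [
    [fs({"i1", "i2", "i3"}), fs({"i1", "i2"})],
    [fs({"i2", "i3"}), fs({"i1", "i2", "i3"})],
    [fs({"i4"})],
]
candidates = {fs({"i1", "i2"}), fs({"i2", "i3"})}
total = run_map_tasks(partitions, candidates, max_workers=2)
```

Threads in CPython do not speed up pure-Python counting, so this only shows the task structure; a process pool or a real cluster would be needed for an actual speedup.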
Based on the same technical concept, an embodiment of the invention provides an association rule mining system.
Referring to Fig. 3, a schematic structural diagram of the association rule mining system provided by an embodiment of the invention, the system comprises a calling module 31, a plurality of processing task execution modules 32 (only three are shown in the figure) and an aggregation task execution module 33, and may further comprise a judging module 34, wherein:
The processing task execution modules 32 are in one-to-one correspondence with the plurality of parallel processing tasks and are configured to execute the processing tasks, comprising: obtaining the corresponding part of the transaction data set and counting the frequency of each (K+1)-itemset in that part;
The aggregation task execution module 33 is configured to execute the aggregation task, comprising: aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set, computing the support of each entry from the frequency counts of the (K+1)-itemsets, forming the frequent (K+1)-itemsets from the entries whose support is greater than or equal to the support threshold, and outputting an association rule when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists.
The above system can use the Map/Reduce mechanism. In that case, the processing task execution module 32 can be a Map task execution module. During processing, the Map task execution module reads the data of the corresponding range from the transaction data set according to the data line offset range assigned to it and converts the data read into <key, value> pairs, where key is the identifier that the Map task assigns to the data read and value is the data content read. From these <key, value> pairs it counts the frequency of each (K+1)-itemset and outputs the result as new <key, value> pairs, where key is a (K+1)-itemset and value is its frequency count. The aggregation task execution module 33 can be a Reduce task execution module. During processing, this module obtains the <key, value> pairs output by all Map tasks and adds up the values of the pairs with the same key to obtain the frequency count of each (K+1)-itemset.
It should be noted that embodiments of the invention can be applied to implementations of the Apriori algorithm and of other similar algorithms.
As can be seen from the above description, embodiments of the invention implement a parallel association rule mining method based on Map/Reduce. Compared with the prior art, the technical effects include:
(1) Improved algorithm efficiency. To address the serial nature of the classic Apriori algorithm, the main work of the algorithm (computing the frequent itemsets) is carried out with the Map/Reduce mechanism, which effectively addresses the performance and efficiency problems of association rule mining over massive data. Computing the frequent itemsets is both the most expensive part of the algorithm and highly parallelizable, so parallelizing it can yield high parallel efficiency and speedup.
(2) Improved storage capacity. A distributed file system is used to address technical problems such as efficient storage, redundant backup, load balancing and concurrent access when mining association rules over massive data.
(3) Larger computing scale and better scalability. A cluster environment combining Map/Reduce with a distributed file system (DFS) provides a solid computing platform for large-scale parallel data mining and scales well; deployments of about 256 nodes are expected. The substantial increase in computing scale helps to resolve many bottlenecks in massive data mining and can further improve mining results and practicality.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.
Claims (13)
1. An association rule mining method, characterized by comprising:
generating (K+1)-itemsets from the frequent K-itemsets;
executing a plurality of parallel processing tasks, wherein each processing task obtains its corresponding part of the transaction data set and counts the frequency of each (K+1)-itemset in that part;
aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set, generating from these frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and outputting an association rule when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists.
2. The method of claim 1, characterized in that the processing tasks are Map tasks;
each processing task obtaining its corresponding part of the transaction data set and counting the frequency of each (K+1)-itemset in that part specifically comprises:
each Map task reads the data of the corresponding range from the transaction data set according to the data line offset range assigned to it, and converts the data read into <key1, value1> pairs, where key1 is the identifier that the Map task assigns to the data read and value1 is the data content read; and counts the frequency of each (K+1)-itemset in the <key1, value1> pairs and outputs the result as <key2, value2> pairs, where key2 is a (K+1)-itemset and value2 is its frequency count.
3. The method of claim 2, characterized in that aggregating the results of all Map tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set specifically comprises:
executing a Reduce task that obtains the <key2, value2> pairs output by all Map tasks and adds up the value2 values of the pairs that have the same key2, to obtain the frequency count of each (K+1)-itemset in the transaction data set.
4. The method of claim 1, characterized in that generating, from the frequency counts of the (K+1)-itemsets, the frequent (K+1)-itemsets that satisfy the support requirement specifically comprises:
computing the support of each entry from the frequency counts of the (K+1)-itemsets, and forming the frequent (K+1)-itemsets from the entries whose support is greater than or equal to the support threshold.
5. The method of claim 1, characterized in that, after the frequent (K+1)-itemsets are generated, the method further comprises: ending the association rule mining flow if a termination condition is satisfied.
6. The method of claim 5, characterized in that satisfying the termination condition comprises:
the maximum number of iterations is reached; or the number of association rules output exceeds the association rule quantity threshold; or the generated set of frequent (K+1)-itemsets is empty.
7. The method of claim 1, characterized in that the processing tasks are assigned to a plurality of execution nodes for execution, wherein one execution node executes one or more processing tasks.
8. An association rule mining system, characterized by comprising:
a calling module, configured to call a plurality of parallel processing tasks after (K+1)-itemsets have been generated from the frequent K-itemsets, and to call an aggregation task after the parallel processing tasks have completed;
processing task execution modules in one-to-one correspondence with the parallel processing tasks, each configured to execute a processing task, comprising: obtaining the corresponding part of the transaction data set and counting the frequency of each (K+1)-itemset in that part;
an aggregation task execution module, configured to execute the aggregation task, comprising: aggregating the results of all processing tasks to obtain the frequency count of each (K+1)-itemset in the transaction data set, generating from these frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and outputting an association rule when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists.
9. The system of claim 8, characterized in that the processing task execution module is a Map task execution module, and the Map task execution module is further configured to read the data of the corresponding range from the transaction data set according to the data line offset range assigned to it, and to convert the data read into <key1, value1> pairs, where key1 is the identifier that the Map task assigns to the data read and value1 is the data content read; and to count the frequency of each (K+1)-itemset in the <key1, value1> pairs and output the result as <key2, value2> pairs, where key2 is a (K+1)-itemset and value2 is its frequency count.
10. The system of claim 9, characterized in that the aggregation task execution module is a Reduce task execution module, and the Reduce task execution module is further configured to obtain the <key2, value2> pairs output by all Map tasks and to add up the value2 values of the pairs that have the same key2, to obtain the frequency count of each (K+1)-itemset in the transaction data set.
11. The system of claim 8, characterized in that the aggregation task execution module is further configured to compute the support of each entry from the frequency counts of the (K+1)-itemsets, and to form the frequent (K+1)-itemsets from the entries whose support is greater than or equal to the support threshold.
12. The system of claim 8, characterized by further comprising:
a judging module, configured to end the association rule mining flow, after the frequent (K+1)-itemsets are generated, if it determines that a termination condition is satisfied.
13. The system of claim 12, characterized in that the judging module is further configured to end the association rule mining flow when it determines that the maximum number of iterations is reached, or that the number of association rules output exceeds the association rule quantity threshold, or that the generated set of frequent (K+1)-itemsets is empty.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910077996A CN101799810B (en) | 2009-02-06 | 2009-02-06 | Association rule mining method and system thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910077996A CN101799810B (en) | 2009-02-06 | 2009-02-06 | Association rule mining method and system thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101799810A (en) | 2010-08-11 |
CN101799810B (en) | 2012-09-26 |
Family
ID=42595488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910077996A Active CN101799810B (en) | 2009-02-06 | 2009-02-06 | Association rule mining method and system thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101799810B (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622447A (en) * | 2012-03-19 | 2012-08-01 | 南京大学 | Hadoop-based frequent closed itemset mining method |
CN102945240A (en) * | 2012-09-11 | 2013-02-27 | 杭州斯凯网络科技有限公司 | Method and device for realizing association rule mining algorithm supporting distributed computation |
CN103077183A (en) * | 2012-12-14 | 2013-05-01 | 北京普泽天玑数据技术有限公司 | Data importing method and system for distributed sequence list |
CN103093369A (en) * | 2011-11-03 | 2013-05-08 | 阿里巴巴集团控股有限公司 | Method and device for offering matched product based on correlation degree between products |
CN103136244A (en) * | 2011-11-29 | 2013-06-05 | 中国电信股份有限公司 | Parallel data mining method and system based on cloud computing platform |
CN103324712A (en) * | 2013-06-19 | 2013-09-25 | 西北工业大学 | Extraction method for non-redundancy plot rule |
CN103744842A (en) * | 2013-12-23 | 2014-04-23 | 武汉传神信息技术有限公司 | Method for translation error data analysis |
CN104054075A (en) * | 2011-12-06 | 2014-09-17 | 派赛普申合伙公司 | Text mining, analysis and output system |
CN104123504A (en) * | 2014-06-27 | 2014-10-29 | 武汉理工大学 | Cloud platform privacy protection method based on frequent item retrieval |
CN104573124A (en) * | 2015-02-09 | 2015-04-29 | 山东大学 | Education cloud application statistics method based on parallelized association rule algorithm |
CN104834733A (en) * | 2015-05-18 | 2015-08-12 | 成都博元科技有限公司 | Big data mining and analyzing method |
CN105302894A (en) * | 2015-10-21 | 2016-02-03 | 中国石油大学(华东) | Parallel association rule based tracking method and tracking apparatus for hotspots of public opinions |
CN105760279A (en) * | 2016-03-09 | 2016-07-13 | 北京国电通网络技术有限公司 | Method and system for generating fault early warning relevance tree of distributed database cluster |
CN105989095A (en) * | 2015-02-12 | 2016-10-05 | 香港理工大学深圳研究院 | Association rule significance test method and device capable of considering data uncertainty |
CN103093369B (en) * | 2011-11-03 | 2016-12-14 | 阿里巴巴集团控股有限公司 | A kind of method and device that collocation product is provided based on the degree of association between product |
CN106327323A (en) * | 2016-08-19 | 2017-01-11 | 清华大学 | Bank frequent item mode mining method and bank frequent item mode mining system |
CN103914528B (en) * | 2014-03-28 | 2017-02-15 | 南京邮电大学 | Parallelizing method of association analytical algorithm |
CN106844550A (en) * | 2016-12-30 | 2017-06-13 | 郑州云海信息技术有限公司 | Method and device is recommended in a kind of virtual platform operation |
CN106897293A (en) * | 2015-12-17 | 2017-06-27 | 中国移动通信集团公司 | A kind of data processing method and device |
CN106952198A (en) * | 2017-03-23 | 2017-07-14 | 阜阳职业技术学院 | A kind of Students ' Employment data analysing method based on Apriori algorithm |
CN110244184A (en) * | 2019-07-04 | 2019-09-17 | 国网江苏省电力有限公司 | A kind of distribution line fault observer method for digging, system and the medium of frequent item set |
US10467236B2 (en) | 2014-09-29 | 2019-11-05 | International Business Machines Corporation | Mining association rules in the map-reduce framework |
CN110489448A (en) * | 2019-07-24 | 2019-11-22 | 西安理工大学 | The method for digging of big data correlation rule based on Hadoop |
CN112434104A (en) * | 2020-12-04 | 2021-03-02 | 东北大学 | Redundant rule screening method and device for association rule mining |
CN112597215A (en) * | 2020-12-29 | 2021-04-02 | 科技谷(厦门)信息技术有限公司 | Data mining method based on Flink platform and parallel Apriori algorithm |
CN113420066A (en) * | 2021-06-18 | 2021-09-21 | 南京苏同科技有限公司 | Optimization method based on parallel association rules |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7836004B2 (en) * | 2006-12-11 | 2010-11-16 | International Business Machines Corporation | Using data mining algorithms including association rules and tree classifications to discover data rules |
CN101344902B (en) * | 2008-07-15 | 2010-07-28 | 北京科技大学 | Secondary protein structure forecasting technique based on association analysis and association classification |
- 2009-02-06: CN application CN200910077996A, granted as CN101799810B (en), status: Active
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093369B (en) * | 2011-11-03 | 2016-12-14 | 阿里巴巴集团控股有限公司 | Method and device for providing matched products based on the degree of association between products |
CN103093369A (en) * | 2011-11-03 | 2013-05-08 | 阿里巴巴集团控股有限公司 | Method and device for offering matched product based on correlation degree between products |
CN103136244A (en) * | 2011-11-29 | 2013-06-05 | 中国电信股份有限公司 | Parallel data mining method and system based on cloud computing platform |
CN104054075A (en) * | 2011-12-06 | 2014-09-17 | 派赛普申合伙公司 | Text mining, analysis and output system |
CN102622447A (en) * | 2012-03-19 | 2012-08-01 | 南京大学 | Hadoop-based frequent closed itemset mining method |
CN102945240A (en) * | 2012-09-11 | 2013-02-27 | 杭州斯凯网络科技有限公司 | Method and device for realizing association rule mining algorithm supporting distributed computation |
CN103077183A (en) * | 2012-12-14 | 2013-05-01 | 北京普泽天玑数据技术有限公司 | Data importing method and system for distributed sequence list |
CN103077183B (en) * | 2012-12-14 | 2017-11-17 | 北京普泽创智数据技术有限公司 | Data importing method and system for distributed sequence list |
CN103324712A (en) * | 2013-06-19 | 2013-09-25 | 西北工业大学 | Extraction method for non-redundant episode rules |
CN103744842A (en) * | 2013-12-23 | 2014-04-23 | 武汉传神信息技术有限公司 | Method for translation error data analysis |
CN103914528B (en) * | 2014-03-28 | 2017-02-15 | 南京邮电大学 | Parallelizing method of association analytical algorithm |
CN104123504A (en) * | 2014-06-27 | 2014-10-29 | 武汉理工大学 | Cloud platform privacy protection method based on frequent item retrieval |
CN104123504B (en) * | 2014-06-27 | 2017-07-28 | 武汉理工大学 | Cloud platform privacy protection method based on frequent item retrieval |
US10467236B2 (en) | 2014-09-29 | 2019-11-05 | International Business Machines Corporation | Mining association rules in the map-reduce framework |
CN104573124B (en) * | 2015-02-09 | 2018-04-10 | 山东大学 | Education cloud application statistics method based on parallelized association rule algorithm |
CN104573124A (en) * | 2015-02-09 | 2015-04-29 | 山东大学 | Education cloud application statistics method based on parallelized association rule algorithm |
CN105989095A (en) * | 2015-02-12 | 2016-10-05 | 香港理工大学深圳研究院 | Association rule significance test method and device capable of considering data uncertainty |
CN105989095B (en) * | 2015-02-12 | 2019-09-06 | 香港理工大学深圳研究院 | Association rule significance test method and device capable of considering data uncertainty |
CN104834733A (en) * | 2015-05-18 | 2015-08-12 | 成都博元科技有限公司 | Big data mining and analyzing method |
CN105302894A (en) * | 2015-10-21 | 2016-02-03 | 中国石油大学(华东) | Parallel association rule based tracking method and tracking apparatus for hotspots of public opinions |
CN106897293A (en) * | 2015-12-17 | 2017-06-27 | 中国移动通信集团公司 | Data processing method and device |
CN106897293B (en) * | 2015-12-17 | 2020-09-11 | 中国移动通信集团公司 | Data processing method and device |
CN105760279A (en) * | 2016-03-09 | 2016-07-13 | 北京国电通网络技术有限公司 | Method and system for generating fault early warning relevance tree of distributed database cluster |
CN105760279B (en) * | 2016-03-09 | 2018-09-07 | 北京国电通网络技术有限公司 | Method and system for generating fault early warning relevance tree of distributed database cluster |
CN106327323A (en) * | 2016-08-19 | 2017-01-11 | 清华大学 | Bank frequent item mode mining method and bank frequent item mode mining system |
CN106844550A (en) * | 2016-12-30 | 2017-06-13 | 郑州云海信息技术有限公司 | Virtual platform operation recommendation method and device |
CN106952198A (en) * | 2017-03-23 | 2017-07-14 | 阜阳职业技术学院 | Student employment data analysis method based on Apriori algorithm |
CN110244184A (en) * | 2019-07-04 | 2019-09-17 | 国网江苏省电力有限公司 | Frequent item set-based distribution line fault observer mining method, system and medium |
CN110489448A (en) * | 2019-07-24 | 2019-11-22 | 西安理工大学 | Hadoop-based big data association rule mining method |
CN112434104A (en) * | 2020-12-04 | 2021-03-02 | 东北大学 | Redundant rule screening method and device for association rule mining |
CN112434104B (en) * | 2020-12-04 | 2023-10-20 | 东北大学 | Redundant rule screening method and device for association rule mining |
CN112597215A (en) * | 2020-12-29 | 2021-04-02 | 科技谷(厦门)信息技术有限公司 | Data mining method based on Flink platform and parallel Apriori algorithm |
CN113420066A (en) * | 2021-06-18 | 2021-09-21 | 南京苏同科技有限公司 | Optimization method based on parallel association rules |
Also Published As
Publication number | Publication date |
---|---|
CN101799810B (en) | 2012-09-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101799810B (en) | | Association rule mining method and system thereof |
Ahmed et al. | | Efficient tree structures for high utility pattern mining in incremental databases |
CN103930888B | | Multi-granularity group aggregation selection based on update, storage, and response constraints |
CN103258049A | | Association rule mining method based on mass data |
CN101178727A | | Method of querying relational database management systems |
EP3924837A1 | | Materialized graph views for efficient graph analysis |
CN101226557A | | Method and system for processing efficient relating subject model data |
CN101133414A | | Multiprocessor system, and its information processing method |
Singh et al. | | Mining of high‐utility itemsets with negative utility |
Osman et al. | | Towards real-time analytics in the cloud |
Öztaş et al. | | A hybrid metaheuristic algorithm based on iterated local search for vehicle routing problem with simultaneous pickup and delivery |
Karim et al. | | An efficient distributed programming model for mining useful patterns in big datasets |
Lin et al. | | Maintenance algorithm for high average-utility itemsets with transaction deletion |
CN107368501A | | Data processing method and device |
Salah et al. | | A highly scalable parallel algorithm for maximally informative k-itemset mining |
Anwar et al. | | Optimization of many objective pickup and delivery problem with delay time of vehicle using memetic decomposition based evolutionary algorithm |
CN110471960B | | High-utility item set mining method containing negative utility |
US7647333B2 | | Cube-based percentile calculation |
Nightingale | | The extended global cardinality constraint: An empirical survey |
US6219672B1 | | Distributed shared memory system and method of controlling distributed shared memory |
CN115391047A | | Resource scheduling method and device |
CN115936875A | | Financial product form hanging processing method and device |
Scheinert et al. | | Karasu: A collaborative approach to efficient cluster configuration for big data analytics |
CN110648103B | | Method, apparatus, medium, and computing device for selecting items in a warehouse |
CN113918561A | | Hybrid query method and system based on-cloud analysis scene and storage medium |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |