CN101799810A - Association rule mining method and system thereof - Google Patents

Publication number
CN101799810A
Authority
CN
China
Prior art keywords
frequent
item collection
data
count value
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910077996A
Other languages
Chinese (zh)
Other versions
CN101799810B (en)
Inventor
高丹
邓超
徐萌
罗治国
周文辉
何清
曾立
郑诗豪
沈亚飞
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN200910077996A
Publication of CN101799810A
Application granted
Publication of CN101799810B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses an association rule mining method and system. The method comprises: generating (K+1)-itemsets from the frequent K-itemsets; executing a plurality of parallel processing tasks, wherein each processing task obtains its corresponding portion of a transaction data set and counts the occurrences of the (K+1)-itemsets in that portion; aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the whole transaction data set; generating, from those frequency counts, the frequent (K+1)-itemsets that satisfy the support requirement; and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists, outputting that association rule. The invention can improve the processing efficiency of association rule mining.

Description

Association rule mining method and system thereof
Technical field
The present invention relates to data mining technology in the communications field, and in particular to an association rule mining method and system.
Background art
In data mining, the goal of association rule mining is to discover noteworthy associations or correlations among large numbers of data items; a typical application is market basket analysis in retail. Market basket analysis studies association rules in a transaction database to uncover relationships between different goods (or items) and thereby identify patterns in customer purchasing behavior. For example, if customers often buy bread and milk together, placing the two side by side helps increase sales of both. To measure the importance of a rule, association rule mining usually uses support and confidence as metrics. Support indicates how significant the goods are among the supermarket's sales, and confidence reflects the degree of association between the goods. If 60% of the transactions that include bread also include milk, the confidence of the association rule "bread ⇒ milk" (if bread is bought, then milk is bought) is 60%.
The support of the association rule A ⇒ B (A and B occur together) in transaction database D can be expressed as the probability P(A ∪ B), i.e., the probability that a transaction contains every item of both A and B.
The confidence of the association rule A ⇒ B in transaction database D is the probability that B also occurs in those transactions of D that contain A, i.e., the conditional probability P(B|A).
The support of an itemset X in transaction database D is the percentage of the N transactions in D that contain X, i.e., count(X)/N, which is the probability P(X). If the support of an itemset X is greater than or equal to a predefined support threshold min_sup, X is called a frequent itemset (FI) or frequent pattern.
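The definitions above can be illustrated with a minimal Python sketch. The function names `support` and `confidence` and the toy shopping lists are assumptions made for illustration; they do not come from the patent. The data is chosen so that the bread and milk rule reproduces the 60% confidence mentioned in the background.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """P(consequent | antecedent) = support(A and B together) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy data: five shopping lists, all of which contain bread.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread"}),
    frozenset({"bread", "eggs"}),
]

print(support({"bread", "milk"}, transactions))       # share of lists with both
print(confidence({"bread"}, {"milk"}, transactions))  # confidence of bread => milk
```

With a min_sup of, say, 0.5, the itemset {bread, milk} would count as frequent in this toy data set.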
In the prior art, association rule mining generally consists of two parts:
Part 1: find all frequent itemsets, i.e., itemsets whose support is greater than or equal to the minimum support threshold;
Part 2: generate, from the frequent itemsets, the association rules that satisfy the confidence threshold.
Part 1 is quite time-consuming, whereas Part 2 is comparatively easy to carry out once Part 1 is done, so the overall performance of an association rule mining algorithm is determined mainly by Part 1.
The prior-art Apriori algorithm is a classic algorithm for mining the frequent itemsets of Boolean association rules. To carry out Part 1, i.e., to find the frequent itemsets, the Apriori algorithm must scan the database repeatedly. When mining massive data, limited memory capacity means that the data cannot all be loaded into memory for computation, and may not even fit in the storage of a single machine (a single node). Moreover, as a serial algorithm, Apriori limits mining efficiency to some extent.
Summary of the invention
Embodiments of the present invention provide an association rule mining method and system to solve the problem of low processing efficiency in existing association rule mining.
The association rule mining method provided by an embodiment of the invention comprises:
generating (K+1)-itemsets from the frequent K-itemsets;
executing a plurality of parallel processing tasks, wherein each processing task obtains the corresponding portion of the transaction data set and counts the occurrences of the (K+1)-itemsets in that portion;
aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set, generating from those frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists, outputting that association rule.
The association rule mining system provided by an embodiment of the invention comprises:
a calling module, configured to invoke a plurality of parallel processing tasks after the (K+1)-itemsets have been generated from the frequent K-itemsets, and to invoke an aggregation task after the parallel processing tasks have finished;
processing task execution modules in one-to-one correspondence with the parallel processing tasks, each configured to execute a processing task, which comprises: obtaining the corresponding portion of the transaction data set and counting the occurrences of the (K+1)-itemsets in that portion;
an aggregation task execution module, configured to execute the aggregation task, which comprises: aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set, generating from those frequency counts the frequent (K+1)-itemsets that satisfy the support requirement, and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists, outputting that association rule.
In the above embodiments, while the frequent (K+1)-itemsets are being generated from the frequent K-itemsets, a plurality of processing tasks executing in parallel each obtain a portion of the transaction data set and count the occurrences of the (K+1)-itemsets in that portion. The partial counts are then aggregated to obtain the frequency counts of the (K+1)-itemsets over the whole transaction data set, from which the frequent (K+1)-itemsets that satisfy the support requirement are generated and the association rules that satisfy the confidence requirement are output. Because the processing tasks execute in parallel, the processing efficiency of association rule mining is improved compared with the prior art.
Description of drawings
Fig. 1 is a schematic flowchart of parallel association rule mining in an embodiment of the invention;
Fig. 2 is a schematic diagram of implementing the parallel association rule mining flow with the Map/Reduce mechanism in an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the data mining system in an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
In association rule mining, the frequent itemsets of each level are generated from the frequent itemsets of the previous level.
Referring to Fig. 1, the association rule mining flow provided by an embodiment of the invention comprises:
Step 101: generate the frequent k-itemsets;
Step 102: use the frequent k-itemsets to generate the frequent (k+1)-itemsets that satisfy the support requirement, and, when it is determined from the frequent (k+1)-itemsets that an association rule satisfying the confidence requirement exists, output that association rule; preferably, the results may be written to a distributed file system for storage;
Step 103: determine whether a termination condition is satisfied; if so, end the flow; otherwise, increment k and return to step 102 for the next iteration.
In step 103 of the above flow, the termination condition may be any of the following: the configured maximum number of iterations has been reached; the number of association rules output has reached a configured threshold; or the generated set of frequent (k+1)-itemsets is empty.
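The control flow of steps 101 to 103 can be sketched as a small driver loop. This is only an illustration under stated assumptions: `run_iterations`, `mine_level`, `max_iterations` and `max_rules` are hypothetical names, and `mine_level` stands in for whatever implements step 102 (for example, a Map/Reduce job).

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def run_iterations(
    frequent_1: Set[Itemset],
    mine_level: Callable[[Set[Itemset]], Tuple[Set[Itemset], List[Rule]]],
    max_iterations: int,
    max_rules: int,
) -> List[Rule]:
    """Repeat step 102 until one of the three termination conditions holds."""
    frequent_k = frequent_1          # step 101: start from the frequent 1-itemsets
    rules: List[Rule] = []
    iterations = 0
    while True:
        # Step 102: frequent (k+1)-itemsets plus any rules that pass min_conf.
        frequent_next, new_rules = mine_level(frequent_k)
        rules.extend(new_rules)
        iterations += 1
        # Step 103: any one of the three conditions ends the flow.
        if (iterations >= max_iterations
                or len(rules) >= max_rules
                or not frequent_next):
            return rules
        frequent_k = frequent_next   # k is incremented implicitly


# Hypothetical stand-in for step 102: finds one 2-itemset, then nothing.
def fake_level(frequent_k):
    size = len(next(iter(frequent_k)))
    if size >= 2:
        return set(), []
    return ({frozenset({"a", "b"})},
            [(frozenset({"a"}), frozenset({"b"}))])


found = run_iterations({frozenset({"a"}), frozenset({"b"})}, fake_level, 10, 100)
```

In this sketch the loop ends on the second pass because the stand-in returns an empty set of frequent 3-itemsets.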
The process in step 102 of using the frequent k-itemsets to generate the frequent (k+1)-itemsets that satisfy the support requirement may be implemented with the Map/Reduce mechanism. Map/Reduce is a programming model for distributed processing of massive data sets; with it, a program can be automatically distributed across a very large cluster of commodity machines and executed concurrently. The process of generating the frequent (k+1)-itemsets with the Map/Reduce mechanism may be as shown in Fig. 2.
Fig. 2 is a schematic flowchart of parallel association rule mining using the Map/Reduce mechanism in an embodiment of the invention. Taking market basket analysis as an example, let I = {i1, i2, ...} be the set of goods, D = {T1, T2, ...} the set of shopping lists, min_sup the minimum support, and min_conf the minimum confidence. As shown in the figure, the flow for mining association rules with a maximum of k iterations comprises:
Generate from D the frequent 1-itemsets, i.e., the 1-itemsets whose support is greater than or equal to min_sup. In this step, the frequent 1-itemsets can be generated by scanning D. An itemset is a set of goods, i.e., a subset of I. A 1-itemset contains exactly one kind of goods (e.g., {i1}). The support of an itemset is the number of transactions in D in which it occurs divided by the total number of transactions in D (e.g., if the itemset {i1} occurs in 30 of the 100 transactions in D, its support is 30%). If the support threshold is 20%, this 1-itemset is output as a frequent 1-itemset.
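As a rough illustration of this first scan, the sketch below counts single items and keeps those that meet min_sup. The function name `frequent_1_itemsets` and the ten toy shopping lists are assumptions for illustration only.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set


def frequent_1_itemsets(transactions: List[Iterable[str]],
                        min_sup: float) -> Set[FrozenSet[str]]:
    """One scan over D: count each item once per transaction, then filter."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {frozenset({item}) for item, c in counts.items() if c / n >= min_sup}


# Hypothetical data: i1 appears in 3 of 10 lists, i2 in 1, i3 in all 10.
shopping_lists = [["i1", "i3"]] * 3 + [["i2", "i3"]] + [["i3"]] * 6
result = frequent_1_itemsets(shopping_lists, min_sup=0.2)
```

Here {i1} has support 30% and is kept at a 20% threshold, while {i2} has support 10% and is dropped.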
Generate 2-itemsets from the frequent 1-itemsets; a 2-itemset contains two kinds of goods (e.g., {i2, i3}). Because not every possible 2-itemset needs to be counted, pruning may be applied.
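The text does not say which pruning is used. A minimal sketch, assuming the standard Apriori property (a (k+1)-itemset can only be frequent if all of its k-subsets are frequent), might look as follows; `generate_candidates` is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def generate_candidates(frequent_k: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Join frequent k-itemsets into (k+1)-itemsets and prune by subsets."""
    frequent = set(frequent_k)
    if not frequent:
        return set()
    k = len(next(iter(frequent)))
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) != k + 1:
            continue  # the two itemsets differ in more than one item
        # Prune: keep the candidate only if every k-subset is frequent.
        if all(frozenset(s) in frequent for s in combinations(union, k)):
            candidates.add(union)
    return candidates


fs = frozenset
pairs = generate_candidates({fs({"i1"}), fs({"i2"}), fs({"i3"})})
triples = generate_candidates(
    {fs({"a", "b"}), fs({"a", "c"}), fs({"b", "c"}), fs({"b", "d"})})
```

For 1-itemsets the pruning step has no effect, so every pair of frequent items becomes a candidate; from k = 2 onward it removes candidates such as {a, b, d} whose subset {a, d} is not frequent.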
Create a plurality of parallel Map tasks and a Reduce task. Each Map task is responsible for obtaining its portion of D and counting the occurrences of the 2-itemsets in that portion. The Reduce task is responsible for aggregating the counts of all Map tasks to obtain the frequency counts of the 2-itemsets over D (for example, the number of shopping lists in D in which i1 and i2 both appear is the frequency count of the 2-itemset {i1, i2} in D), generating from those frequency counts the frequent 2-itemsets that satisfy the support requirement, and, when it is determined from the frequent 2-itemsets that an association rule satisfying the confidence requirement exists, outputting that association rule.
The Map tasks execute in parallel. Each Map task performs the following:
It obtains the data of its range from D according to the range of data-line offsets assigned to it. Specifically, according to the preset range of data-line offsets (the key), it reads data from D and converts the data into <key, value> pairs, where key is the identifier assigned to the data read by the Map task and value is the content of the data read. From the <key, value> pairs it has read, it counts the occurrences of the 2-itemsets and outputs the result as new <key, value> pairs, where key is a 2-itemset and value is the frequency count obtained.
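A single Map task can be sketched in plain Python, without any particular Map/Reduce framework. The name `map_task`, the (offset, text) input pairs and the whitespace-separated line format are assumptions made for illustration; the patent does not specify the record format.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def map_task(lines: Iterable[Tuple[int, str]],
             candidates: Iterable[Itemset]) -> List[Tuple[Itemset, int]]:
    """Count candidate itemsets in one split of D.

    `lines` holds the input <key, value> pairs: key is the line offset,
    value is the line content (assumed here to be space-separated items).
    The result holds the output pairs: key is an itemset, value its count.
    """
    candidates = list(candidates)
    counts: Counter = Counter()
    for _offset, text in lines:
        basket = frozenset(text.split())
        for candidate in candidates:
            if candidate <= basket:      # the whole itemset is in this list
                counts[candidate] += 1
    return list(counts.items())


fs = frozenset
split = [(0, "i1 i2 i3"), (9, "i1 i2"), (15, "i2 i3")]
two_itemsets = [fs({"i1", "i2"}), fs({"i2", "i3"}), fs({"i1", "i3"})]
partial = dict(map_task(split, two_itemsets))
```

Summing inside the task before emitting keeps the output small; emitting one pair per occurrence and leaving all summation to the Reduce task would be an equally valid reading of the text.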
Execute the Reduce task. The Reduce task adds up the values of all <key, value> pairs output by the Map tasks that share the same key, obtaining the frequency count of each 2-itemset over D. It then computes the support of each 2-itemset from its frequency count (for example, the support of {i2, i3} is the number of shopping lists in which i2 and i3 appear together divided by the total number of shopping lists), deletes the 2-itemsets whose support is less than min_sup, and keeps those whose support is greater than or equal to min_sup, thereby obtaining the frequent 2-itemsets. The Reduce task may also determine from the frequent 2-itemsets whether any association rule has a confidence greater than or equal to min_conf and, if so, output that rule. For example, when the probability P(i3|i2) that i3 also appears in a shopping list containing i2 is greater than or equal to min_conf, the association rule i2 ⇒ i3 is output.
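The Reduce step described above can be sketched as two small functions: one that sums the partial counts and filters by min_sup, and one that derives rules from frequent 2-itemsets. The names `reduce_task` and `rules_from_pairs` are hypothetical, and the sketch assumes the frequency counts of the single items are available from the previous iteration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def reduce_task(map_outputs: Iterable[Iterable[Tuple[Itemset, int]]],
                n_transactions: int, min_sup: float) -> Dict[Itemset, int]:
    """Add up the values that share a key, then keep support >= min_sup."""
    totals: Dict[Itemset, int] = defaultdict(int)
    for output in map_outputs:
        for itemset, count in output:
            totals[itemset] += count
    return {s: c for s, c in totals.items() if c / n_transactions >= min_sup}


def rules_from_pairs(frequent_2: Dict[Itemset, int],
                     item_counts: Dict[str, int],
                     min_conf: float) -> List[Tuple[str, str, float]]:
    """Emit (a, b, confidence) for every rule a => b with P(b|a) >= min_conf."""
    rules = []
    for itemset, pair_count in frequent_2.items():
        for a in itemset:
            (b,) = itemset - {a}                   # the other item of the pair
            conf = pair_count / item_counts[a]     # count(a, b) / count(a)
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules


fs = frozenset
outputs = [[(fs({"i2", "i3"}), 2), (fs({"i1", "i2"}), 1)],   # from Map task 1
           [(fs({"i2", "i3"}), 1)]]                          # from Map task 2
frequent_2 = reduce_task(outputs, n_transactions=5, min_sup=0.4)
rules = rules_from_pairs(frequent_2, {"i2": 4, "i3": 3}, min_conf=0.8)
```

In this toy run {i2, i3} is counted 3 times in 5 lists and is kept, while {i1, i2} is dropped; only i3 ⇒ i2 passes the 0.8 confidence threshold, because P(i3|i2) is 3/4.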
Determine whether the current number of iterations has reached k. If so, end the flow; if not, perform the next iteration, i.e., use the frequent 2-itemsets to generate the frequent 3-itemsets, and so on, until a termination condition is satisfied.
There may be one Reduce task or several in the above flow. If there are several, they can execute in parallel, and each Reduce task can look up, among the results of all Map tasks, the <key, value> pairs that share the same key and aggregate them.
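One way to let several Reduce tasks work independently is to route every key to exactly one of them. The sketch below assumes a CRC32-based partitioning of the itemset key; the patent does not prescribe a partitioning scheme, and `partition`, `shuffle` and `reduce_bucket` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def partition(itemset: Itemset, n_reducers: int) -> int:
    """Map an itemset to a reducer index, stable across processes."""
    key = ",".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % n_reducers


def shuffle(map_outputs: Iterable[Iterable[Tuple[Itemset, int]]],
            n_reducers: int) -> List[Dict[Itemset, List[int]]]:
    """Group all values by key and hand each key to exactly one reducer."""
    buckets: List[Dict[Itemset, List[int]]] = [
        defaultdict(list) for _ in range(n_reducers)]
    for output in map_outputs:
        for itemset, count in output:
            buckets[partition(itemset, n_reducers)][itemset].append(count)
    return buckets


def reduce_bucket(bucket: Dict[Itemset, List[int]]) -> Dict[Itemset, int]:
    """What one Reduce task does: sum the values of each key it owns."""
    return {itemset: sum(counts) for itemset, counts in bucket.items()}


fs = frozenset
outputs = [[(fs({"i1", "i2"}), 2), (fs({"i2", "i3"}), 1)],
           [(fs({"i1", "i2"}), 1), (fs({"i1", "i3"}), 4)]]
per_reducer = [reduce_bucket(b) for b in shuffle(outputs, n_reducers=3)]
```

Because all pairs with the same key land in the same bucket, the reducers never need to exchange partial sums.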
In the above flow, a plurality of Map tasks executing in parallel count the occurrences of the (K+1)-itemsets in portions of the transaction database, and the counts of all Map tasks are then aggregated to obtain the frequency counts of the (K+1)-itemsets, so that the data processing is carried out by a plurality of processing tasks in parallel.
In the above flow, the Map tasks are preferably distributed across a plurality of execution nodes; one or more Map tasks may be assigned to a node according to its load. During processing, the execution nodes run their assigned Map tasks in parallel; if an execution node has been assigned several Map tasks, those tasks also run in parallel on that node.
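The split-count-aggregate pattern can be imitated on a single machine with a thread pool. This is only a stand-in: threads here play the role of execution nodes, whereas a real deployment would rely on a Map/Reduce cluster. The names `count_in_split` and `parallel_count` and the round-robin splitting are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def count_in_split(split: Sequence[Itemset],
                   candidates: Sequence[Itemset]) -> Counter:
    """Work of one Map task: count candidates within its own split."""
    counts: Counter = Counter()
    for basket in split:
        for candidate in candidates:
            if candidate <= basket:
                counts[candidate] += 1
    return counts


def parallel_count(transactions: List[Itemset],
                   candidates: Sequence[Itemset], n_tasks: int) -> Counter:
    """Split D into n_tasks parts, count each part concurrently, then merge."""
    splits = [transactions[i::n_tasks] for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(lambda s: count_in_split(s, candidates), splits))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)        # Counter.update adds the counts
    return total


fs = frozenset
data = [fs({"i1", "i2"}), fs({"i1", "i2", "i3"}), fs({"i2", "i3"}),
        fs({"i1"}), fs({"i1", "i3"})]
cands = [fs({"i1", "i2"}), fs({"i2", "i3"}), fs({"i1", "i3"})]
totals = parallel_count(data, cands, n_tasks=3)
```

The merged totals do not depend on how the transactions are split, which is what allows the Map tasks to be assigned to nodes purely by load.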
Based on the same technical concept, an embodiment of the invention provides an association rule mining system.
Fig. 3 is a schematic structural diagram of the association rule mining system provided by an embodiment of the invention. The system comprises a calling module 31, a plurality of processing task execution modules 32 (only three are shown in the figure), and an aggregation task execution module 33, and may further comprise a judging module 34, wherein:
The calling module 31 is configured to invoke a plurality of parallel processing tasks after the (K+1)-itemsets have been generated from the frequent K-itemsets, and to invoke an aggregation task after the parallel processing tasks have finished;
The processing task execution modules 32 are in one-to-one correspondence with the parallel processing tasks, and each is configured to execute a processing task, which comprises: obtaining the corresponding portion of the transaction data set and counting the occurrences of the (K+1)-itemsets in that portion;
The aggregation task execution module 33 is configured to execute the aggregation task, which comprises: aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set; computing the support of each (K+1)-itemset from its frequency count; forming the frequent (K+1)-itemsets from those whose support is greater than or equal to the support threshold; and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying the confidence requirement exists, outputting that association rule.
The above system may adopt the Map/Reduce mechanism. In that case, each processing task execution module 32 may be a Map task execution module. When processing, the Map task execution module reads the data of its range from the transaction data set according to the range of data-line offsets assigned to it, and converts the data read into <key, value> pairs, where key is the identifier assigned to the data read by the Map task and value is the content of the data read. From these <key, value> pairs it counts the occurrences of the (K+1)-itemsets and outputs the result as new <key, value> pairs, where key is a (K+1)-itemset and value is the frequency count obtained. The aggregation task execution module 33 may be a Reduce task execution module. During processing, this module obtains the <key, value> pairs output by all Map tasks and adds up the values of the pairs that share the same key, obtaining the frequency counts of the (K+1)-itemsets.
The judging module 34 is configured to end the association rule mining flow if, after the frequent (K+1)-itemsets have been generated, it determines that a termination condition is satisfied. For example, the judging module 34 ends the flow when it determines that the maximum number of iterations has been reached, that the number of association rules output exceeds the rule-count threshold, or that the generated set of frequent (K+1)-itemsets is empty.
It should be noted that embodiments of the invention can be applied to implementations of the Apriori algorithm and of other similar algorithms.
As can be seen from the above description, embodiments of the invention implement a parallel association rule mining method based on Map/Reduce. Compared with the prior art, the technical effects include:
(1) Improved algorithm efficiency. To address the serial nature of the classic Apriori algorithm, the main work of the algorithm (computing the frequent itemsets) is carried out with the Map/Reduce mechanism, which effectively addresses performance and efficiency problems in association rule mining over massive data. Parallelizing the computation of frequent itemsets, which is both the most expensive part of the algorithm and highly parallelizable, can yield high parallel efficiency and speedup.
(2) Improved storage capacity. A distributed file system is used to address technical problems that arise in association rule mining over massive data, such as efficient storage, redundant backup, load balancing, and concurrent access.
(3) Increased computing scale and better scalability. A cluster environment that combines Map/Reduce with a distributed file system (DFS) provides a solid computing platform for large-scale parallel data mining and scales well; the number of deployed nodes is expected to reach about 256. The substantial increase in computing scale helps resolve many bottlenecks in massive data mining and can further improve mining results and practicality.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (13)

1. An association rule mining method, characterized by comprising:
generating (K+1)-itemsets from the frequent K-itemsets;
executing a plurality of parallel processing tasks, wherein each processing task obtains the corresponding portion of a transaction data set and counts the occurrences of the (K+1)-itemsets in that portion;
aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set, generating from those frequency counts the frequent (K+1)-itemsets that satisfy a support requirement, and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying a confidence requirement exists, outputting that association rule.
2. The method of claim 1, characterized in that the processing tasks are Map tasks;
each processing task obtaining the corresponding portion of the transaction data set and counting the occurrences of the (K+1)-itemsets in that portion specifically comprises:
each Map task reads the data of its range from the transaction data set according to the range of data-line offsets assigned to it, and converts the data read into <key1, value1> pairs, where key1 is the identifier assigned to the data read by the Map task and value1 is the content of the data read; and counts the occurrences of the (K+1)-itemsets in the <key1, value1> pairs and outputs the result as <key2, value2> pairs, where key2 is a (K+1)-itemset and value2 is the frequency count obtained.
3. The method of claim 2, characterized in that aggregating the counting results of all Map tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set specifically comprises:
executing a Reduce task that obtains the <key2, value2> pairs output by all Map tasks and adds up the value2 values of the pairs that share the same key2, obtaining the frequency counts of the (K+1)-itemsets over the transaction data set.
4. The method of claim 1, characterized in that generating, from the frequency counts of the (K+1)-itemsets, the frequent (K+1)-itemsets that satisfy the support requirement specifically comprises:
computing the support of each (K+1)-itemset from its frequency count, and forming the frequent (K+1)-itemsets from those whose support is greater than or equal to a support threshold.
5. The method of claim 1, characterized in that, after the frequent (K+1)-itemsets are generated, the method further comprises: ending the association rule mining flow if a termination condition is satisfied.
6. The method of claim 5, characterized in that the termination condition being satisfied comprises:
the maximum number of iterations being reached; or the number of association rules output exceeding a rule-count threshold; or the generated set of frequent (K+1)-itemsets being empty.
7. The method of claim 1, characterized in that the processing tasks are distributed to a plurality of execution nodes for execution, wherein each execution node executes one or more processing tasks.
8. An association rule mining system, characterized by comprising:
a calling module, configured to invoke a plurality of parallel processing tasks after the (K+1)-itemsets have been generated from the frequent K-itemsets, and to invoke an aggregation task after the parallel processing tasks have finished;
processing task execution modules in one-to-one correspondence with the parallel processing tasks, each configured to execute a processing task, which comprises: obtaining the corresponding portion of a transaction data set and counting the occurrences of the (K+1)-itemsets in that portion;
an aggregation task execution module, configured to execute the aggregation task, which comprises: aggregating the counting results of all processing tasks to obtain the frequency counts of the (K+1)-itemsets over the transaction data set, generating from those frequency counts the frequent (K+1)-itemsets that satisfy a support requirement, and, when it is determined from the frequent (K+1)-itemsets that an association rule satisfying a confidence requirement exists, outputting that association rule.
9. The system of claim 8, characterized in that each processing task execution module is a Map task execution module, which is further configured to read the data of its range from the transaction data set according to the range of data-line offsets assigned to it, and to convert the data read into <key1, value1> pairs, where key1 is the identifier assigned to the data read by the Map task and value1 is the content of the data read; and to count the occurrences of the (K+1)-itemsets in the <key1, value1> pairs and output the result as <key2, value2> pairs, where key2 is a (K+1)-itemset and value2 is the frequency count obtained.
10. The system of claim 9, characterized in that the aggregation task execution module is a Reduce task execution module, which is further configured to obtain the <key2, value2> pairs output by all Map tasks and to add up the value2 values of the pairs that share the same key2, obtaining the frequency counts of the (K+1)-itemsets over the transaction data set.
11. The system of claim 8, characterized in that the aggregation task execution module is further configured to compute the support of each (K+1)-itemset from its frequency count, and to form the frequent (K+1)-itemsets from those whose support is greater than or equal to a support threshold.
12. The system of claim 8, characterized by further comprising:
a judging module, configured to end the association rule mining flow if, after the frequent (K+1)-itemsets have been generated, it determines that a termination condition is satisfied.
13. The system of claim 12, characterized in that the judging module is further configured to end the association rule mining flow when it determines that the maximum number of iterations has been reached, or that the number of association rules output exceeds a rule-count threshold, or that the generated set of frequent (K+1)-itemsets is empty.
CN200910077996A 2009-02-06 2009-02-06 Association rule mining method and system thereof Active CN101799810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910077996A CN101799810B (en) 2009-02-06 2009-02-06 Association rule mining method and system thereof

Publications (2)

Publication Number Publication Date
CN101799810A true CN101799810A (en) 2010-08-11
CN101799810B CN101799810B (en) 2012-09-26

Family

ID=42595488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910077996A Active CN101799810B (en) 2009-02-06 2009-02-06 Association rule mining method and system thereof

Country Status (1)

Country Link
CN (1) CN101799810B (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622447A (en) * 2012-03-19 2012-08-01 南京大学 Hadoop-based frequent closed itemset mining method
CN102945240A (en) * 2012-09-11 2013-02-27 杭州斯凯网络科技有限公司 Method and device for realizing association rule mining algorithm supporting distributed computation
CN103077183A (en) * 2012-12-14 2013-05-01 北京普泽天玑数据技术有限公司 Data importing method and system for distributed sequence list
CN103093369A (en) * 2011-11-03 2013-05-08 阿里巴巴集团控股有限公司 Method and device for offering matched product based on correlation degree between products
CN103136244A (en) * 2011-11-29 2013-06-05 中国电信股份有限公司 Parallel data mining method and system based on cloud computing platform
CN103324712A (en) * 2013-06-19 2013-09-25 西北工业大学 Extraction method for non-redundancy plot rule
CN103744842A (en) * 2013-12-23 2014-04-23 武汉传神信息技术有限公司 Method for translation error data analysis
CN104054075A (en) * 2011-12-06 2014-09-17 派赛普申合伙公司 Text mining, analysis and output system
CN104123504A (en) * 2014-06-27 2014-10-29 武汉理工大学 Cloud platform privacy protection method based on frequent item retrieval
CN104573124A (en) * 2015-02-09 2015-04-29 山东大学 Education cloud application statistics method based on parallelized association rule algorithm
CN104834733A (en) * 2015-05-18 2015-08-12 成都博元科技有限公司 Big data mining and analyzing method
CN105302894A (en) * 2015-10-21 2016-02-03 中国石油大学(华东) Parallel association rule based tracking method and tracking apparatus for hotspots of public opinions
CN105760279A (en) * 2016-03-09 2016-07-13 北京国电通网络技术有限公司 Method and system for generating fault early warning relevance tree of distributed database cluster
CN105989095A (en) * 2015-02-12 2016-10-05 香港理工大学深圳研究院 Association rule significance test method and device capable of considering data uncertainty
CN103093369B (en) * 2011-11-03 2016-12-14 阿里巴巴集团控股有限公司 Method and device for offering matched product based on correlation degree between products
CN106327323A (en) * 2016-08-19 2017-01-11 清华大学 Bank frequent item mode mining method and bank frequent item mode mining system
CN103914528B (en) * 2014-03-28 2017-02-15 南京邮电大学 Parallelizing method of association analytical algorithm
CN106844550A (en) * 2016-12-30 2017-06-13 郑州云海信息技术有限公司 Virtualization platform operation recommendation method and device
CN106897293A (en) * 2015-12-17 2017-06-27 中国移动通信集团公司 Data processing method and device
CN106952198A (en) * 2017-03-23 2017-07-14 阜阳职业技术学院 Student employment data analysis method based on Apriori algorithm
CN110244184A (en) * 2019-07-04 2019-09-17 国网江苏省电力有限公司 Frequent-item-set-based distribution line fault observer mining method, system and medium
US10467236B2 (en) 2014-09-29 2019-11-05 International Business Machines Corporation Mining association rules in the map-reduce framework
CN110489448A (en) * 2019-07-24 2019-11-22 西安理工大学 Hadoop-based big data association rule mining method
CN112434104A (en) * 2020-12-04 2021-03-02 东北大学 Redundant rule screening method and device for association rule mining
CN112597215A (en) * 2020-12-29 2021-04-02 科技谷(厦门)信息技术有限公司 Data mining method based on Flink platform and parallel Apriori algorithm
CN113420066A (en) * 2021-06-18 2021-09-21 南京苏同科技有限公司 Optimization method based on parallel association rules

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7836004B2 (en) * 2006-12-11 2010-11-16 International Business Machines Corporation Using data mining algorithms including association rules and tree classifications to discover data rules
CN101344902B (en) * 2008-07-15 2010-07-28 北京科技大学 Secondary protein structure forecasting technique based on association analysis and association classification

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093369B (en) * 2011-11-03 2016-12-14 阿里巴巴集团控股有限公司 Method and device for offering matched product based on correlation degree between products
CN103093369A (en) * 2011-11-03 2013-05-08 阿里巴巴集团控股有限公司 Method and device for offering matched product based on correlation degree between products
CN103136244A (en) * 2011-11-29 2013-06-05 中国电信股份有限公司 Parallel data mining method and system based on cloud computing platform
CN104054075A (en) * 2011-12-06 2014-09-17 派赛普申合伙公司 Text mining, analysis and output system
CN102622447A (en) * 2012-03-19 2012-08-01 南京大学 Hadoop-based frequent closed itemset mining method
CN102945240A (en) * 2012-09-11 2013-02-27 杭州斯凯网络科技有限公司 Method and device for realizing association rule mining algorithm supporting distributed computation
CN103077183A (en) * 2012-12-14 2013-05-01 北京普泽天玑数据技术有限公司 Data importing method and system for distributed sequence list
CN103077183B (en) * 2012-12-14 2017-11-17 北京普泽创智数据技术有限公司 Data importing method and system for distributed sequence list
CN103324712A (en) * 2013-06-19 2013-09-25 西北工业大学 Extraction method for non-redundancy plot rule
CN103744842A (en) * 2013-12-23 2014-04-23 武汉传神信息技术有限公司 Method for translation error data analysis
CN103914528B (en) * 2014-03-28 2017-02-15 南京邮电大学 Parallelizing method of association analytical algorithm
CN104123504A (en) * 2014-06-27 2014-10-29 武汉理工大学 Cloud platform privacy protection method based on frequent item retrieval
CN104123504B (en) * 2014-06-27 2017-07-28 武汉理工大学 Cloud platform privacy protection method based on frequent item retrieval
US10467236B2 (en) 2014-09-29 2019-11-05 International Business Machines Corporation Mining association rules in the map-reduce framework
CN104573124B (en) * 2015-02-09 2018-04-10 山东大学 Education cloud application statistics method based on parallelized association rule algorithm
CN104573124A (en) * 2015-02-09 2015-04-29 山东大学 Education cloud application statistics method based on parallelized association rule algorithm
CN105989095A (en) * 2015-02-12 2016-10-05 香港理工大学深圳研究院 Association rule significance test method and device capable of considering data uncertainty
CN105989095B (en) * 2015-02-12 2019-09-06 香港理工大学深圳研究院 Association rule significance test method and device capable of considering data uncertainty
CN104834733A (en) * 2015-05-18 2015-08-12 成都博元科技有限公司 Big data mining and analyzing method
CN105302894A (en) * 2015-10-21 2016-02-03 中国石油大学(华东) Parallel association rule based tracking method and tracking apparatus for hotspots of public opinions
CN106897293A (en) * 2015-12-17 2017-06-27 中国移动通信集团公司 Data processing method and device
CN106897293B (en) * 2015-12-17 2020-09-11 中国移动通信集团公司 Data processing method and device
CN105760279A (en) * 2016-03-09 2016-07-13 北京国电通网络技术有限公司 Method and system for generating fault early warning relevance tree of distributed database cluster
CN105760279B (en) * 2016-03-09 2018-09-07 北京国电通网络技术有限公司 Method and system for generating fault early warning relevance tree of distributed database cluster
CN106327323A (en) * 2016-08-19 2017-01-11 清华大学 Bank frequent item mode mining method and bank frequent item mode mining system
CN106844550A (en) * 2016-12-30 2017-06-13 郑州云海信息技术有限公司 Virtualization platform operation recommendation method and device
CN106952198A (en) * 2017-03-23 2017-07-14 阜阳职业技术学院 Student employment data analysis method based on Apriori algorithm
CN110244184A (en) * 2019-07-04 2019-09-17 国网江苏省电力有限公司 Frequent-item-set-based distribution line fault observer mining method, system and medium
CN110489448A (en) * 2019-07-24 2019-11-22 西安理工大学 Hadoop-based big data association rule mining method
CN112434104A (en) * 2020-12-04 2021-03-02 东北大学 Redundant rule screening method and device for association rule mining
CN112434104B (en) * 2020-12-04 2023-10-20 东北大学 Redundant rule screening method and device for association rule mining
CN112597215A (en) * 2020-12-29 2021-04-02 科技谷(厦门)信息技术有限公司 Data mining method based on Flink platform and parallel Apriori algorithm
CN113420066A (en) * 2021-06-18 2021-09-21 南京苏同科技有限公司 Optimization method based on parallel association rules

Also Published As

Publication number Publication date
CN101799810B (en) 2012-09-26

Similar Documents

Publication Publication Date Title
CN101799810B (en) Association rule mining method and system thereof
Ahmed et al. Efficient tree structures for high utility pattern mining in incremental databases
CN103930888B (en) Multi-granularity grouping aggregate selection based on update, storage and response constraints
CN103258049A (en) Association rule mining method based on mass data
CN101178727A (en) Method of querying relational database management systems
EP3924837A1 (en) Materialized graph views for efficient graph analysis
CN101226557A (en) Method and system for processing efficient relating subject model data
CN101133414A (en) Multiprocessor system, and its information processing method
Singh et al. Mining of high‐utility itemsets with negative utility
Osman et al. Towards real-time analytics in the cloud
Öztaş et al. A hybrid metaheuristic algorithm based on iterated local search for vehicle routing problem with simultaneous pickup and delivery
Karim et al. An efficient distributed programming model for mining useful patterns in big datasets
Lin et al. Maintenance algorithm for high average-utility itemsets with transaction deletion
CN107368501A (en) The processing method and processing device of data
Salah et al. A highly scalable parallel algorithm for maximally informative k-itemset mining
Anwar et al. Optimization of many objective pickup and delivery problem with delay time of vehicle using memetic decomposition based evolutionary algorithm
CN110471960B (en) High-utility item set mining method containing negative utility
US7647333B2 (en) Cube-based percentile calculation
Nightingale The extended global cardinality constraint: An empirical survey
US6219672B1 (en) Distributed shared memory system and method of controlling distributed shared memory
CN115391047A (en) Resource scheduling method and device
CN115936875A (en) Financial product form hanging processing method and device
Scheinert et al. Karasu: A collaborative approach to efficient cluster configuration for big data analytics
CN110648103B (en) Method, apparatus, medium, and computing device for selecting items in a warehouse
CN113918561A (en) Hybrid query method and system based on-cloud analysis scene and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant