CN104731925A - MapReduce-based FP-Growth load balance parallel computing method - Google Patents
- Publication number
- CN104731925A (application CN201510138318.0A)
- Authority
- CN
- China
- Prior art keywords
- collection
- frequent
- glist
- item
- new list
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/176—Support for shared access to files; File sharing support
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a MapReduce-based FP-Growth load-balancing parallel computing method. The method comprises the following steps: (1) the database transaction set D is divided into distinct contiguous partitions, and the sub-transaction sets are stored on multiple nodes; (2) support counts are computed in parallel to obtain all frequent 1-itemsets, FList; (3) the items of FList are divided into M groups by a load-balancing method to obtain a new list GList; (4) D is also divided into M groups according to GList; once the division is finished, a local FP-Tree is built for each transaction set DB, and the corresponding GList[gid_i] is mined on each local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets; (5) the frequent patterns obtained on each node are aggregated and output. The method has good load-balancing capability and execution efficiency.
Description
Technical field
The present invention relates to a load-balancing parallel computing method, in particular to a MapReduce-based FP-Growth load-balancing parallel computing method, and belongs to the technical field of data mining.
Background
Association rule mining reveals the interdependence and correlation between one thing and others, and is an important topic in data mining. It involves two steps, namely generating frequent itemsets and generating association rules, and its overall performance is determined mainly by the first step. The classical association rule mining algorithms are Apriori, FP-Growth and Eclat; the first two mine data in horizontal format, while Eclat uses vertical format. Compared with Apriori, FP-Growth mines the database with a divide-and-conquer strategy and generates no candidate sets: it stores the essential information of the database in an FP-Tree, so the database needs to be scanned only twice, after which the key information is kept in memory as an FP-Tree. This avoids the large cost of scanning the database many times.
Hadoop is an open-source distributed computing platform that can process large-scale data in parallel. MapReduce, one of the core components of Hadoop, is a high-performance distributed programming model and computing framework for the parallel analysis and processing of massive data. MapReduce handles all tasks uniformly, i.e., it decomposes tasks and merges results, through two core operations, Map and Reduce: the Map function splits large-scale data into many small data sets that are sent to multiple machines (nodes) for concurrent processing, and the Reduce function merges the Map outputs from all machines (nodes) into a final result.
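To make the Map/shuffle/Reduce division concrete, the following is a minimal single-process Python sketch of the programming model described above. It is not Hadoop's API: the function name `run_mapreduce`, the in-memory "splits" standing in for nodes, and the dictionary used as the shuffle are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]

def run_mapreduce(splits: Iterable[Iterable[Pair]],
                  mapper: Callable[[Any, Any], Iterable[Pair]],
                  reducer: Callable[[Any, List[Any]], Iterable[Pair]]) -> Dict[Any, Any]:
    """Simulate Map -> shuffle -> Reduce in one process (no cluster involved)."""
    shuffled: Dict[Any, List[Any]] = defaultdict(list)
    for split in splits:                        # each split ~ one Map task on one node
        for key, value in split:
            for out_key, out_value in mapper(key, value):
                shuffled[out_key].append(out_value)   # shuffle: group values by key
    output: Dict[Any, Any] = {}
    for key, values in shuffled.items():        # one Reduce call per distinct key
        for out_key, out_value in reducer(key, values):
            output[out_key] = out_value
    return output

if __name__ == "__main__":
    splits = [[(0, "a b"), (1, "b c")], [(2, "b")]]
    counts = run_mapreduce(
        splits,
        mapper=lambda line_no, text: ((word, 1) for word in text.split()),
        reducer=lambda word, ones: [(word, sum(ones))],
    )
    print(counts)  # {'a': 1, 'b': 3, 'c': 1}
```

In a real Hadoop job the splits live on different nodes and the shuffle moves data over the network; the sketch only preserves the data flow.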
With social progress and technological development, data volumes are growing explosively, and a single-machine FP-Growth algorithm for association rule mining falls far short of the storage and mining requirements of massive data. Some existing parallel FP-Growth algorithms address both the partitioning of the database and the subsequent parallel computation, but they show clear differences and shortcomings in parallel computing efficiency, memory consumption, communication cost, and in the performance differences caused by varying FP-Tree sparsity. These problems are closely related to the lack of load-balancing considerations when the database transaction set is partitioned.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a MapReduce-based FP-Growth load-balancing parallel computing method with good load-balancing capability and execution efficiency.
According to the technical solution provided by the invention, the MapReduce-based FP-Growth load-balancing parallel computing method comprises the following steps:
Step 1: Input the database transaction set D and the minimum support count; divide D into distinct contiguous partitions and store the resulting sub-transaction sets of D on multiple nodes;
Step 2: Scan D for the first time, compute the support counts of the items on each node in parallel, and merge the support counts computed by all nodes to obtain all frequent 1-itemsets, FList;
Step 3: Divide the items of FList into M groups according to a load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
Step 4: Scan D for the second time and, according to GList, also divide D into M groups whose group numbers correspond to those in GList: if a transaction contains an item of GList[gid_i], the corresponding part of the transaction is sent to the transaction set DB(gid_i) with group number gid_i. After the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
Step 5: Aggregate and output the frequent patterns of all frequent items obtained on every node.
Step 3 comprises the following steps:
Step 3.1: Compute the load of each item in FList and sort the items in descending order of load to obtain the sorted list SList;
Step 3.2: According to the specified number of groups M, initialize the M groups of GList with the first M items of SList, one item per group;
Step 3.3: Add the first item of SList that has not yet been assigned to a group to the group of GList with the smallest current load, add that item's load to the group's accumulated load, and update the group's load;
Step 3.4: Repeat Step 3.3 until every item in SList has been assigned to a group;
Step 3.5: Save the resulting GList to an HDFS file so that it can be shared by multiple nodes.
Compared with the prior art, the advantages of the present invention are as follows. The invention uses the total length of the prefix paths in the conditional pattern tree of each item in FList as that item's load and sorts the items in descending order of load; then, given the specified number of groups M, it makes the sum of the item loads in each group roughly equal. This balanced division of FList achieves load balancing among the computing nodes, resolves the uneven distribution of load between them, and yields better load-balancing capability and execution efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Detailed description of embodiments
The invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, to achieve good load-balancing capability and execution efficiency, the load-balancing parallel computing method of the present invention comprises the following steps:
Step 1: Input the database transaction set D and the minimum support count; divide D into distinct contiguous partitions and store the resulting sub-transaction sets of D on multiple nodes;
D is divided into several contiguous parts that are stored on different computing nodes. Each resulting partial transaction set is called a data shard. This process is handled directly by Hadoop: the user only needs to copy the transaction set to HDFS, and the Hadoop framework splits the input file into multiple data blocks stored on different nodes and keeps replicas of each block, thereby completing the sharding automatically.
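A rough Python sketch of what "contiguous partitions" means for D. The helper `split_contiguous` is hypothetical; real Hadoop splits the input file into byte-sized HDFS blocks and replicates them, whereas this sketch cuts by record count only, to show that each shard is a contiguous, non-overlapping run of transactions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_contiguous(records: Sequence[T], num_shards: int) -> List[List[T]]:
    """Cut records into num_shards contiguous, non-overlapping shards.

    Shard sizes differ by at most one record (a simplification of
    Hadoop's block-based splitting).
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(len(records), num_shards)
    shards: List[List[T]] = []
    start = 0
    for s in range(num_shards):
        size = base + (1 if s < extra else 0)   # the first `extra` shards get one more
        shards.append(list(records[start:start + size]))
        start += size
    return shards

if __name__ == "__main__":
    D = [["a", "b"], ["b", "c"], ["a", "c"], ["a", "b", "c"], ["b"]]
    for shard_id, shard in enumerate(split_contiguous(D, 2)):
        print(shard_id, shard)
```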
Step 2: Scan D for the first time, compute the support counts of the items on each node in parallel, and merge the support counts computed by all nodes to obtain all frequent 1-itemsets, FList;
In this embodiment, the first MapReduce job counts the support of each item over the whole transaction set D, which yields FList. The input of each Map function corresponds to one data shard (Shard). The Map input key-value format is <key=lineNo, value=T>, where lineNo is the current line number and T is the transaction on that line. For each transaction T, the Map output format is <key=item, value=1>, where item ranges over the items occurring in T. Hadoop combines all Map outputs with the same key and uses them as the Reduce input, whose format is <key=item, value={1, 1, 1, ...}>. The Reduce output format is <key=item, value=itemCount>, where itemCount is the number of times item occurs, i.e., its support count.
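The first MapReduce job can be sketched in Python as follows, simulated in one process. It assumes transactions are lists of item strings and counts each distinct item once per transaction; keeping only items whose count reaches the minimum support count and ordering FList by descending support follow standard FP-Growth practice, while tie-breaking by item name is an added assumption. The names `map_count`, `reduce_count` and `build_flist` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

def map_count(line_no: int, transaction: Sequence[str]) -> Iterator[Tuple[str, int]]:
    """<key=lineNo, value=T>  ->  <key=item, value=1> for each distinct item of T."""
    for item in set(transaction):
        yield item, 1

def reduce_count(item: str, ones: List[int]) -> Tuple[str, int]:
    """<key=item, value={1, 1, ...}>  ->  <key=item, value=itemCount>."""
    return item, sum(ones)

def build_flist(shards: Iterable[Sequence[Sequence[str]]],
                min_support: int) -> List[Tuple[str, int]]:
    shuffled: Dict[str, List[int]] = defaultdict(list)
    for shard in shards:                                  # one Map task per shard
        for line_no, transaction in enumerate(shard):
            for item, one in map_count(line_no, transaction):
                shuffled[item].append(one)                # shuffle: group by item
    counts = [reduce_count(item, ones) for item, ones in shuffled.items()]
    frequent = [(item, c) for item, c in counts if c >= min_support]
    # Assumed FList order: descending support, ties broken by item name.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

if __name__ == "__main__":
    shards = [[["a", "b"], ["b", "c"]], [["a", "b", "d"]]]
    print(build_flist(shards, min_support=2))  # [('b', 3), ('a', 2)]
```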
Step 3: Divide the items of FList into M groups according to a load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
In this embodiment, FList is divided because D must then be grouped according to GList. Whether the division of FList is balanced directly determines whether the loads of the transaction sets produced in the next step are balanced, and therefore affects the execution efficiency of the entire parallel algorithm. The invention divides FList with the goal of balancing the load between the resulting transaction sets: the original large database is broken into loosely coupled parts distributed over the nodes so that they can be processed in parallel. Therefore, before FList is divided, the load of each transaction set must first be estimated.
For transaction set DB(gid_i), the load of the group is taken to be the total number of recursive calls on the conditional pattern trees of all items contained in the corresponding GList[gid_i]. Therefore, the load of each item in FList must be estimated first, and only then is FList divided.
The maximum length of the prefix path in the conditional pattern tree of an item equals the item's position n in FList. If the maximum prefix-path length for an item is n, then the maximum number of recursions needed to mine that item's frequent patterns is (n-1) + (n-2) + ... + 1 = n(n-1)/2; that is, the mining load of each item can be estimated as n(n-1)/2.
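The load estimate can be computed and checked numerically. This Python sketch assumes positions in FList are 1-based (so the first, most frequent item gets load 0); the names `estimate_item_loads` and `max_recursions` are illustrative, not from the patent.

```python
from typing import Dict, List

def estimate_item_loads(flist: List[str]) -> Dict[str, int]:
    """Estimated mining load n*(n-1)/2 for the item at 1-based position n in FList."""
    return {item: n * (n - 1) // 2 for n, item in enumerate(flist, start=1)}

def max_recursions(n: int) -> int:
    """Direct sum (n-1) + (n-2) + ... + 1, used to check the closed form."""
    return sum(range(1, n))

if __name__ == "__main__":
    print(estimate_item_loads(["b", "a", "c", "d"]))  # {'b': 0, 'a': 1, 'c': 3, 'd': 6}
    assert all(max_recursions(n) == n * (n - 1) // 2 for n in range(1, 100))
```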
Based on the above, FList is divided and GList is obtained through the following steps:
Step 3.1: Compute the load of each item in FList and sort the items in descending order of load to obtain the sorted list SList;
Step 3.2: According to the specified number of groups M, initialize the M groups of GList with the first M items of SList, one item per group;
Step 3.3: Add the first item of SList that has not yet been assigned to a group to the group of GList with the smallest current load, add that item's load to the group's accumulated load, and update the group's load;
Step 3.4: Repeat Step 3.3 until every item in SList has been assigned to a group;
Step 3.5: Save the resulting GList to an HDFS file so that it can be shared by multiple nodes.
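Steps 3.1 to 3.4 amount to a greedy "largest load first, into the least-loaded group" assignment. The Python sketch below uses a `heapq` min-heap to find the least-loaded group. Its assumptions: group numbers are 0-based rather than gid_1..gid_M, ties are broken by item name and by lower group index, at least M items are required, and Step 3.5 (writing GList to HDFS) is omitted. `build_glist` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple

def build_glist(item_loads: Dict[str, int], m: int) -> List[List[str]]:
    """Greedy grouping of FList items into m groups (Steps 3.1-3.4)."""
    if m <= 0:
        raise ValueError("m must be positive")
    # Step 3.1: SList = items in descending order of load (ties by item name).
    slist = sorted(item_loads, key=lambda item: (-item_loads[item], item))
    if len(slist) < m:
        raise ValueError("FList has fewer items than the number of groups M")
    # Step 3.2: the first m items of SList each seed one group.
    glist: List[List[str]] = [[item] for item in slist[:m]]
    heap: List[Tuple[int, int]] = [(item_loads[item], gid) for gid, item in enumerate(slist[:m])]
    heapq.heapify(heap)
    # Steps 3.3-3.4: each remaining item goes to the currently least-loaded group.
    for item in slist[m:]:
        load, gid = heapq.heappop(heap)
        glist[gid].append(item)
        heapq.heappush(heap, (load + item_loads[item], gid))
    return glist

if __name__ == "__main__":
    loads = {"a": 0, "b": 1, "c": 3, "d": 6, "e": 10, "f": 15}
    glist = build_glist(loads, 2)
    print(glist)                                            # [['f', 'c'], ['e', 'd', 'b', 'a']]
    print([sum(loads[i] for i in group) for group in glist])  # [18, 17]
```

The heap makes each "least-loaded group" lookup O(log M) instead of scanning all M groups, which matters little for small M but keeps the sketch faithful to Step 3.3.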
In this embodiment, the group corresponding to gid_i is denoted GList[gid_i], and each item in GList[gid_i] is denoted α_j, where α_j ∈ GList[gid_i] and 1 ≤ j ≤ GList[gid_i].length.
Step 4: Scan D for the second time and, according to GList, also divide D into M groups whose group numbers correspond to those in GList: if a transaction contains an item of GList[gid_i], the corresponding part of the transaction is sent to the transaction set DB(gid_i) with group number gid_i. After the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
This step is performed by the second MapReduce job. The Map function groups D according to the division of FList, producing mutually independent transaction sets DB, and the Reduce function performs FP-Growth mining on the independent transaction set on its node.
Map function: generates M mutually independent transaction sets DB by sending every transaction on the local node to the appropriate group(s). The Map input key-value pair is still <key=lineNo, value=T>. The Map function operates as follows:
1) Load GList onto the local node and build a hashMap from it, whose key is an item in GList and whose value is that item's group number gid_i.
2) For each transaction T read in, sort its items in FList order and delete the items of T that do not appear in FList.
3) Let the sorted transaction be T = {item_1, item_2, ..., item_n}. Traverse the items item_j of T from back to front, with j running from n down to 1. If item_j is the key of some key-value pair in hashMap, delete from hashMap all key-value pairs whose value equals the value of that pair, and then send the first j items of T to the group given by that value.
The Map output key-value pair is <key=gid_i, value={item_1, ..., item_j}>, where gid_i is the group number of the transaction set to which the transaction is distributed, and {item_1, ..., item_j} indicates that not the whole transaction but only its part up to and including item_j is sent to the corresponding group. The sending principle is: for each group of GList that contains an item of T, the corresponding part of T is sent to that group. Deleting entries from the hash table ensures that the same transaction is not sent to the same group more than once. In this way, for every transaction containing an item of GList[gid_i], its corresponding part is sent to the transaction set DB(gid_i) with group number gid_i, so mining the FP-Tree of DB(gid_i) yields all patterns of the items in GList[gid_i]. Since different groups GList[gid_i] contain different items, the frequent patterns obtained from each group are different; hence each transaction set DB is independent and the groups do not depend on one another.
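The Map function of the second job can be sketched in Python as below. As an assumption of this sketch, instead of deleting entries from the shared hashMap it keeps a per-transaction set of groups already emitted, which is one way to realize the stated goal that a transaction reaches each group at most once without altering the map for later transactions. Group numbers are 0-based, and `build_item_to_gid` / `map_group` are hypothetical names.

```python
from typing import Dict, Iterator, List, Sequence, Set, Tuple

def build_item_to_gid(glist: List[List[str]]) -> Dict[str, int]:
    """Map step 1): the hashMap from each GList item to its group number (0-based)."""
    return {item: gid for gid, items in enumerate(glist) for item in items}

def map_group(transaction: Sequence[str], rank: Dict[str, int],
              item_to_gid: Dict[str, int]) -> Iterator[Tuple[int, List[str]]]:
    """Emit <gid, {item_1, ..., item_j}> pairs for one transaction T."""
    # Map step 2): drop items not in FList, then sort by FList position (rank).
    t = sorted((item for item in set(transaction) if item in rank), key=rank.__getitem__)
    # Map step 3): scan from the back; the first time a group is met, emit the
    # prefix of T ending at that item (item_1 .. item_j).
    emitted: Set[int] = set()
    for j in range(len(t) - 1, -1, -1):
        gid = item_to_gid[t[j]]
        if gid not in emitted:
            emitted.add(gid)
            yield gid, t[: j + 1]

if __name__ == "__main__":
    flist = ["b", "a", "c", "d"]
    rank = {item: pos for pos, item in enumerate(flist)}
    item_to_gid = build_item_to_gid([["b", "d"], ["a", "c"]])
    print(list(map_group(["d", "a", "b", "e"], rank, item_to_gid)))
    # [(0, ['b', 'a', 'd']), (1, ['b', 'a'])]
```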
Reduce function: mines frequent patterns from the local transaction set. After all Map tasks have finished, Hadoop automatically merges the Map results with the same key, so the Reduce input is <key=gid_i, value=DB(gid_i)>, where DB(gid_i) is the independent transaction set of the group with group number gid_i, consisting of all transactions distributed to this group. Each Reduce task processes, one by one, the transaction sets that Hadoop assigns to it. The Reduce function operates as follows:
1) Load GList and build groupMap, whose key is the group number gid_i and whose value is the list of all items GList[gid_i] of that group.
2) Scan each record in transaction set DB(gid_i) and build a local FP-Tree: localFP-Tree.
3) Call the Growth algorithm recursively. Unlike the traditional Growth algorithm, the first call Growth(FP-Tree, null) traverses only the items of group GList[gid_i] rather than the whole header table, because each transaction set only needs to mine the frequent patterns of the items contained in its own group GList[gid_i].
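The Reduce steps can be sketched in Python with a compact FP-Tree and a recursive Growth whose top-level loop covers only the items of GList[gid_i]. This is an illustrative sketch under assumptions, not the patent's code: transactions in DB(gid_i) are assumed to arrive already sorted in FList order (as produced by the Map step), that order is kept in every tree, `min_sup` is the global minimum support count, and the names `Node`, `build_tree`, `growth` and `reduce_group` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Patterns = Dict[FrozenSet[str], int]

class Node:
    """FP-Tree node: item label, count, parent link and children keyed by item."""
    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "Node"] = {}

def build_tree(paths: List[Tuple[List[str], int]],
               min_sup: int) -> Tuple[Dict[str, List[Node]], Dict[str, int]]:
    """Build an FP-Tree from (ordered path, count) pairs; return header table and supports."""
    support: Dict[str, int] = defaultdict(int)
    for path, cnt in paths:
        for item in path:
            support[item] += cnt
    frequent = {item: s for item, s in support.items() if s >= min_sup}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)
    for path, cnt in paths:
        node = root
        for item in path:                      # paths already follow FList order
            if item not in frequent:
                continue
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += cnt
            node = child
    return header, frequent

def growth(header: Dict[str, List[Node]], frequent: Dict[str, int], suffix: Tuple[str, ...],
           min_sup: int, targets: Optional[Sequence[str]], out: Patterns) -> None:
    """Recursive FP-Growth; at the top level only `targets` (GList[gid_i]) are mined."""
    for item in (targets if targets is not None else list(header)):
        if item not in frequent:
            continue
        pattern = (item,) + suffix
        out[frozenset(pattern)] = frequent[item]
        # Conditional pattern base: the prefix path above every node holding `item`.
        base: List[Tuple[List[str], int]] = []
        for node in header[item]:
            path: List[str] = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path[::-1], node.count))
        cond_header, cond_frequent = build_tree(base, min_sup)
        if cond_frequent:
            growth(cond_header, cond_frequent, pattern, min_sup, None, out)

def reduce_group(gid: int, transactions: List[List[str]],
                 glist: List[List[str]], min_sup: int) -> Patterns:
    """Reduce steps 1)-3) for DB(gid): build localFP-Tree, mine only GList[gid]."""
    header, frequent = build_tree([(list(t), 1) for t in transactions], min_sup)
    out: Patterns = {}
    growth(header, frequent, (), min_sup, glist[gid], out)
    return out

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]  # DB(1) for GList[1] = ["c"]
    result = reduce_group(1, db, [["a", "b"], ["c"]], min_sup=2)
    for pattern, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), sup)
```

Patterns that do not contain any item of GList[gid_i] (for example {a, b} above) are deliberately not produced here; under this scheme they are mined in the group that owns their last item.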
The Reduce output is <key=pattern, value=sup(pattern)>, where pattern is a frequent pattern and sup(pattern) is the number of times it occurs.
Step 5: Aggregate and output the frequent patterns of all frequent items obtained on every node.
The results of all computing nodes are merged once to obtain the final result of the parallel FP-Growth algorithm.
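Because the top-level Growth call in each group starts only from that group's items, and conditional trees contain only items that precede the starting item in FList order, each pattern is expected to be produced only in the group that owns its last item. Step 5 then reduces to a union of the per-node outputs. The Python sketch below (hypothetical name `merge_results`) performs that union and flags any pattern that arrives with two different supports.

```python
from typing import Dict, FrozenSet, Iterable

Patterns = Dict[FrozenSet[str], int]

def merge_results(per_node: Iterable[Patterns]) -> Patterns:
    """Union of per-node <pattern, sup(pattern)> outputs (Step 5)."""
    merged: Patterns = {}
    for result in per_node:
        for pattern, sup in result.items():
            previous = merged.setdefault(pattern, sup)
            # The same pattern should never arrive with two different supports.
            if previous != sup:
                raise ValueError(f"conflicting supports for {sorted(pattern)}: {previous} vs {sup}")
    return merged

if __name__ == "__main__":
    node1 = {frozenset({"c"}): 4, frozenset({"a", "c"}): 3}
    node2 = {frozenset({"b"}): 4, frozenset({"a", "b"}): 3}
    print(len(merge_results([node1, node2])))  # 4
```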
Addressing the limited computing power and storage capacity of the traditional FP-Growth algorithm on a single computing node, the present invention proposes a MapReduce-based parallel computing method. It further addresses problems that arise during parallelization from imprecise data partitioning, namely marked differences between computing nodes in computing efficiency, memory consumption and communication cost caused by differing FP-Tree sparsity across data blocks, by proposing a MapReduce-based FP-Growth load-balancing parallel algorithm.
Compared with conventional single-machine algorithms and common parallel algorithms, the present invention uses the total length of the prefix paths in the conditional pattern tree of each item in FList as that item's load and sorts the items in descending order of load; then, given the specified number of groups M, it makes the sum of the item loads in each group roughly equal. This balanced division of FList achieves load balancing among the computing nodes, resolves the uneven distribution of load between them, and yields better load-balancing capability and execution efficiency.
Claims (2)
1. A MapReduce-based FP-Growth load-balancing parallel computing method, characterized in that the method comprises the following steps:
Step 1: Input the database transaction set D and the minimum support count; divide D into distinct contiguous partitions and store the resulting sub-transaction sets of D on multiple nodes;
Step 2: Scan D for the first time, compute the support counts of the items on each node in parallel, and merge the support counts computed by all nodes to obtain all frequent 1-itemsets, FList;
Step 3: Divide the items of FList into M groups according to a load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
Step 4: Scan D for the second time and, according to GList, also divide D into M groups whose group numbers correspond to those in GList: if a transaction contains an item of GList[gid_i], the corresponding part of the transaction is sent to the transaction set DB(gid_i) with group number gid_i. After the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
Step 5: Aggregate and output the frequent patterns of all frequent items obtained on every node.
2. The MapReduce-based FP-Growth load-balancing parallel computing method according to claim 1, characterized in that Step 3 comprises the following steps:
Step 3.1: Compute the load of each item in FList and sort the items in descending order of load to obtain the sorted list SList;
Step 3.2: According to the specified number of groups M, initialize the M groups of GList with the first M items of SList, one item per group;
Step 3.3: Add the first item of SList that has not yet been assigned to a group to the group of GList with the smallest current load, add that item's load to the group's accumulated load, and update the group's load;
Step 3.4: Repeat Step 3.3 until every item in SList has been assigned to a group;
Step 3.5: Save the resulting GList to an HDFS file so that it can be shared by multiple nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510138318.0A CN104731925A (en) | 2015-03-26 | 2015-03-26 | MapReduce-based FP-Growth load balance parallel computing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510138318.0A CN104731925A (en) | 2015-03-26 | 2015-03-26 | MapReduce-based FP-Growth load balance parallel computing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104731925A true CN104731925A (en) | 2015-06-24 |
Family
ID=53455812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510138318.0A Pending CN104731925A (en) | 2015-03-26 | 2015-03-26 | MapReduce-based FP-Growth load balance parallel computing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104731925A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105183875A (en) * | 2015-09-21 | 2015-12-23 | 南京邮电大学 | FP-Growth data mining method based on shared path |
CN106503218A (en) * | 2016-10-27 | 2017-03-15 | 北京邮电大学 | A kind of parallelization Workflow association data find method |
CN106874479A (en) * | 2017-02-19 | 2017-06-20 | 郑州云海信息技术有限公司 | The improved method and device of the FP Growth algorithms based on FPGA |
CN107045512A (en) * | 2016-02-05 | 2017-08-15 | 北京京东尚科信息技术有限公司 | A kind of method for interchanging data and system |
CN108153589A (en) * | 2016-12-06 | 2018-06-12 | 国际商业机器公司 | For the method and system of the data processing in the processing arrangement of multithreading |
CN110232079A (en) * | 2019-05-08 | 2019-09-13 | 江苏理工学院 | A kind of modified FP-Growth data digging method based on Hadoop |
CN110990434A (en) * | 2019-11-29 | 2020-04-10 | 国网四川省电力公司信息通信公司 | Spark platform grouping and Fp-Growth association rule mining method |
CN111107493A (en) * | 2018-10-25 | 2020-05-05 | 中国电力科学研究院有限公司 | Method and system for predicting position of mobile user |
CN111309786A (en) * | 2020-02-20 | 2020-06-19 | 江西理工大学 | Parallel frequent item set mining method based on MapReduce |
CN113672665A (en) * | 2021-08-18 | 2021-11-19 | Oppo广东移动通信有限公司 | Data processing method, data acquisition system, electronic device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101127037A (en) * | 2006-08-15 | 2008-02-20 | 临安微创网格信息工程有限公司 | Periodic associated rule discovery algorithm based on time sequence vector diverse sequence method clustering |
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
US20150019562A1 (en) * | 2011-04-26 | 2015-01-15 | Brian J. Bulkowski | Method and system of mapreduce implementations on indexed datasets in a distributed database environment |
Timeline: 2015-03-26, application CN201510138318.0A filed in CN (publication CN104731925A, Pending)
Non-Patent Citations (1)
Title |
---|
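Zhou Shihui (周诗慧): "An Improved Parallel FP-Growth Algorithm Based on Hadoop", China Master's Theses Full-text Database, Information Science and Technology Series * |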
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105183875A (en) * | 2015-09-21 | 2015-12-23 | 南京邮电大学 | FP-Growth data mining method based on shared path |
CN107045512A (en) * | 2016-02-05 | 2017-08-15 | 北京京东尚科信息技术有限公司 | A kind of method for interchanging data and system |
CN106503218A (en) * | 2016-10-27 | 2017-03-15 | 北京邮电大学 | A kind of parallelization Workflow association data find method |
US11036558B2 (en) | 2016-12-06 | 2021-06-15 | International Business Machines Corporation | Data processing |
CN108153589A (en) * | 2016-12-06 | 2018-06-12 | 国际商业机器公司 | For the method and system of the data processing in the processing arrangement of multithreading |
CN108153589B (en) * | 2016-12-06 | 2021-12-07 | 国际商业机器公司 | Method and system for data processing in a multi-threaded processing arrangement |
CN106874479A (en) * | 2017-02-19 | 2017-06-20 | 郑州云海信息技术有限公司 | The improved method and device of the FP Growth algorithms based on FPGA |
CN111107493A (en) * | 2018-10-25 | 2020-05-05 | 中国电力科学研究院有限公司 | Method and system for predicting position of mobile user |
CN111107493B (en) * | 2018-10-25 | 2022-09-02 | 中国电力科学研究院有限公司 | Method and system for predicting position of mobile user |
CN110232079A (en) * | 2019-05-08 | 2019-09-13 | 江苏理工学院 | A kind of modified FP-Growth data digging method based on Hadoop |
CN110990434A (en) * | 2019-11-29 | 2020-04-10 | 国网四川省电力公司信息通信公司 | Spark platform grouping and Fp-Growth association rule mining method |
CN110990434B (en) * | 2019-11-29 | 2023-04-18 | 国网四川省电力公司信息通信公司 | Spark platform grouping and Fp-Growth association rule mining method |
CN111309786A (en) * | 2020-02-20 | 2020-06-19 | 江西理工大学 | Parallel frequent item set mining method based on MapReduce |
CN111309786B (en) * | 2020-02-20 | 2023-09-15 | 韶关学院 | Parallel frequent item set mining method based on MapReduce |
CN113672665A (en) * | 2021-08-18 | 2021-11-19 | Oppo广东移动通信有限公司 | Data processing method, data acquisition system, electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104731925A (en) | MapReduce-based FP-Growth load balance parallel computing method | |
CN103020256B (en) | A kind of association rule mining method of large-scale data | |
CN103258049A (en) | Association rule mining method based on mass data | |
CN102662639A (en) | Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method | |
CN109408521A (en) | A kind of method and device thereof for more new block chain global data state | |
CN103729478A (en) | LBS (Location Based Service) interest point discovery method based on MapReduce | |
Liao et al. | MRPrePost—A parallel algorithm adapted for mining big data | |
CN103617162A (en) | Method of constructing Hilbert R-tree index on equivalent cloud platform | |
CN103678550A (en) | Mass data real-time query method based on dynamic index structure | |
CN104834709B (en) | A kind of parallel cosine mode method for digging based on load balancing | |
CN112015741A (en) | Method and device for storing massive data in different databases and tables | |
CN101499097B (en) | Hash table based data stream frequent pattern internal memory compression and storage method | |
CN105045806A (en) | Dynamic splitting and maintenance method of quantile query oriented summary data | |
CN104679966B (en) | Empowerment hypergraph optimization division methods based on Hierarchy Method and discrete particle cluster | |
CN106815302A (en) | A kind of Mining Frequent Itemsets for being applied to game item recommendation | |
CN104933143A (en) | Method and device for acquiring recommended object | |
CN102207935A (en) | Method and system for establishing index | |
CN105138607B (en) | A kind of KNN querying methods based on combination grain distributed memory grid index | |
CN110232079A (en) | A kind of modified FP-Growth data digging method based on Hadoop | |
CN106874479A (en) | The improved method and device of the FP Growth algorithms based on FPGA | |
CN102043857A (en) | All-nearest-neighbor query method and system | |
Al-Hamodi et al. | An enhanced frequent pattern growth based on MapReduce for mining association rules | |
CN103761298A (en) | Distributed-architecture-based entity matching method | |
CN109254962A (en) | A kind of optimiged index method and device based on T- tree | |
CN102722546B (en) | The querying method of shortest path in relational database environment figure below |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150624 |