CN104731925A - MapReduce-based FP-Growth load balance parallel computing method - Google Patents

MapReduce-based FP-Growth load balance parallel computing method

Info

Publication number
CN104731925A
CN104731925A CN201510138318.0A CN201510138318A CN104731925A CN 104731925 A CN104731925 A CN 104731925A CN 201510138318 A CN201510138318 A CN 201510138318A CN 104731925 A CN104731925 A CN 104731925A
Authority
CN
China
Prior art keywords
collection
frequent
glist
item
new list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510138318.0A
Other languages
Chinese (zh)
Inventor
杨勇
陈曙东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu IoT Research and Development Center
Original Assignee
Jiangsu IoT Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu IoT Research and Development Center
Priority to CN201510138318.0A
Publication of CN104731925A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/176: Support for shared access to files; File sharing support
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a MapReduce-based FP-Growth load-balancing parallel computing method. The method comprises the following steps: 1, a database transaction set D is divided into contiguous partitions, and the resulting sub-transaction sets are stored on multiple nodes; 2, support counts are computed in parallel to obtain the list of all frequent 1-itemsets, FList; 3, the items of the frequent 1-itemsets are divided into M groups according to a load-balancing method to obtain a new list GList; 4, the database transaction set D is likewise divided into M groups according to the new list GList, a local FP-Tree is built for each transaction set DB once the division of D is finished, and the corresponding GList[gid_i] is mined on each local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets; 5, the frequent patterns of all items in the frequent 1-itemsets obtained on each node are aggregated and output. The method has good load-balancing capability and execution efficiency.

Description

MapReduce-based FP-Growth load-balancing parallel computing method
Technical field
The present invention relates to a load-balancing parallel computing method, in particular to a MapReduce-based FP-Growth load-balancing parallel computing method, and belongs to the technical field of data mining.
Background art
Association rule mining reveals the interdependence and correlation between one thing and others and is an important topic in data mining. It proceeds in two stages, the generation of frequent itemsets and the generation of association rules, and its overall performance is determined mainly by the first stage. The classical association rule mining algorithms are Apriori, FP-Growth and Eclat; the first two mine data in horizontal format, the last in vertical format. Compared with Apriori, FP-Growth mines the database with a divide-and-conquer strategy and generates no candidate itemsets. It stores the essential information of the database in an FP-Tree, needs only two database scans, and then keeps the key information in memory in FP-Tree form, avoiding the heavy cost of scanning the database many times.
Hadoop is an open-source distributed computing platform that can process large-scale data in parallel. MapReduce, one of Hadoop's core components, is a high-performance distributed programming model and computing framework for parallel analysis and processing of massive data. MapReduce manages all tasks uniformly, that is, the decomposition of tasks and the merging of results, and comprises two core operations, Map and Reduce. The Map function splits large-scale data into many small data sets and sends them to multiple machines (nodes) for parallel computation; the Reduce function then merges the results of the Map functions on the machines (nodes) into the final result.
With the progress of society and the development of science and technology, data is growing explosively, and single-machine FP-Growth association rule mining falls far short of the storage and mining demands of massive data. Some existing parallel FP-Growth algorithms solve the two problems of partitioning the database and carrying out the subsequent parallel computation, but they show obvious differences and deficiencies in parallel computing efficiency, memory consumption, communication cost, and the performance differences caused by differing FP-Tree sparsity. All of these are closely related to the lack of load-balancing considerations when the database transaction set is divided.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a MapReduce-based FP-Growth load-balancing parallel computing method with good load-balancing capability and execution efficiency.
According to the technical solution provided by the invention, the MapReduce-based FP-Growth load-balancing parallel computing method comprises the following steps:
Step 1: input the required database transaction set D and the minimum support count, divide the database transaction set D into contiguous partitions, and store the resulting sub-transaction sets of D on multiple nodes;
Step 2: scan the database transaction set D for the first time, compute in parallel the support counts of the items on each node, and merge the support counts computed by all nodes to obtain the list of all frequent 1-itemsets, FList;
Step 3: divide the items of the frequent 1-itemset list FList into M groups according to the load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
Step 4: scan the database transaction set D for the second time and divide it into M groups according to the new list GList, the group numbers of the divided database transaction set D corresponding to the group numbers in GList; if a transaction contains an item in GList[gid_i], send the corresponding part of that transaction to the transaction set DB whose group number is gid_i; after the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
Step 5: aggregate and output the frequent patterns of all items in the frequent 1-itemsets obtained on each node.
Step 3 comprises the following steps:
Step 3.1: compute the load of each item in the frequent 1-itemset list FList and sort the items by load in descending order to obtain a sorted list SList;
Step 3.2: according to the specified number of groups M, initialize the M groups of the new list GList with the first M items of SList, so that each group in GList corresponds one-to-one with one of those items;
Step 3.3: add the first item of SList that has not yet been assigned to a group to the group in GList with the smallest load, add the load of that item to the group, and update the group loads in GList;
Step 3.4: repeat step 3.3 until every item in SList has been assigned to a group;
Step 3.5: save the resulting new list GList to an HDFS file so that it can be shared by multiple nodes.
Compared with the prior art, the present invention has the following advantage: it takes the total length of the prefix paths in the conditional pattern tree of each item in the frequent 1-itemset list FList as the load of that item, sorts the items by load in descending order, and, for the specified number of groups M, makes the sum of the item loads in each group substantially equal. The balanced division of FList thereby balances the load among the computing nodes, resolves load imbalance between them, and yields better load-balancing capability and execution efficiency.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Embodiment
The invention is further described below with reference to the drawing and a specific embodiment.
As shown in Fig. 1, in order to achieve good load-balancing capability and execution efficiency, the load-balancing parallel computing method of the present invention comprises the following steps:
Step 1: input the required database transaction set D and the minimum support count, divide the database transaction set D into contiguous partitions, and store the resulting sub-transaction sets of D on multiple nodes;
The database transaction set D is divided into several contiguous parts, which are stored on different computing nodes. Each sub-transaction set produced by the division is called a data shard. This process is carried out directly by Hadoop: the user only needs to copy the database transaction set to HDFS, and the Hadoop framework splits the input file into multiple data shards (blocks), stores them on different nodes, and keeps replicas of each shard, so that data sharding is completed automatically.
Step 2: scan the database transaction set D for the first time, compute in parallel the support counts of the items on each node, and merge the support counts computed by all nodes to obtain the list of all frequent 1-itemsets, FList;
In this embodiment of the invention, a first MapReduce job (one Map function and one Reduce function) counts the support of every item in the whole database transaction set D and thereby obtains the frequent 1-itemset list FList. The input of each Map function corresponds to one data shard. The input key-value pair of the Map function has the format <key=lineNo, value=T>, where lineNo is the current line number and T is the transaction on that line. For each transaction T, the Map function outputs <key=item, value=1>, where item is each item occurring in T. Hadoop combines all Map outputs that share the same key and passes them to Reduce, so the input format of the Reduce function is <key=item, value={1, 1, 1, ...}>. The output format of Reduce is <key=item, value=itemCount>, where itemCount is the number of times the item occurs, that is, its support count.
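The counting job described above can be sketched in plain Python without a Hadoop cluster. This is a minimal single-process simulation, not the patented implementation: the names map_count, shuffle, reduce_count and build_flist are hypothetical, the shuffle step stands in for Hadoop's grouping by key, and ordering FList by descending support with an alphabetical tie-break is an assumption (the text only states that FList has an order).

```python
from collections import defaultdict

def map_count(line_no, transaction):
    """Map: for <lineNo, T>, emit <item, 1> for every item in T."""
    for item in transaction:
        yield item, 1

def shuffle(pairs):
    """Stand-in for Hadoop's shuffle: group all values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_count(item, ones):
    """Reduce: for <item, {1, 1, ...}>, emit <item, itemCount>."""
    return item, sum(ones)

def build_flist(shards, min_support):
    """Run the simulated job over all shards and keep the frequent items.

    Assumes each item occurs at most once per transaction.
    """
    pairs = []
    for shard in shards:
        for line_no, transaction in enumerate(shard):
            pairs.extend(map_count(line_no, transaction))
    counts = dict(reduce_count(k, v) for k, v in shuffle(pairs).items())
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    # Assumed order: descending support, ties broken alphabetically.
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return frequent

# Two shards, as if stored on two nodes (illustrative data only).
shards = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c", "d"], ["a"]],
]
flist = build_flist(shards, min_support=2)
```

With the illustrative shards above and a minimum support count of 2, item d is dropped and the remaining items are ordered a, b, c.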
Step 3: divide the items of the frequent 1-itemset list FList into M groups according to the load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
In this embodiment of the invention, the purpose of dividing the frequent 1-itemset list FList is to group the database transaction set D according to the new list GList. How FList is divided directly determines whether the loads of the transaction sets produced in the next step are balanced, and therefore affects the execution efficiency of the whole parallel algorithm. The invention divides FList on the premise of balancing the load among the resulting transaction sets, breaking the originally large database into parts distributed over the nodes so that computation can proceed in parallel. The load of each transaction set must therefore be estimated before FList is divided.
For a transaction set DB(gid_i), the load of the group is taken to be the sum of the numbers of recursions over the conditional pattern trees of all items contained in the corresponding GList[gid_i]. The load of each item in FList must therefore be estimated first, and FList divided afterwards.
The maximum length of the prefix path in the conditional pattern tree of an item equals the position n of that item in the frequent 1-itemset list FList. If the maximum prefix-path length for an item is n, then mining the frequent patterns of that item requires at most (n-1) + (n-2) + ... + 1 = n × (n-1) / 2 recursions, so the mining load of each item can be estimated as n × (n-1) / 2.
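The load estimate can be checked with a few lines of Python. This is only an illustrative sketch: the function names estimate_load and item_loads are hypothetical, and positions are taken to be 1-based, which is an assumption consistent with the sum (n-1) + ... + 1.

```python
def estimate_load(position):
    """Estimated mining load n * (n - 1) / 2 for the item at 1-based position n."""
    if position < 1:
        raise ValueError("position in FList is 1-based")
    return position * (position - 1) // 2

def item_loads(flist_items):
    """Map each item of FList (given in FList order) to its estimated load."""
    return {item: estimate_load(n) for n, item in enumerate(flist_items, start=1)}

loads = item_loads(["a", "b", "c", "d"])
# The closed form agrees with the explicit sum (n-1) + (n-2) + ... + 1.
assert all(estimate_load(n) == sum(range(1, n)) for n in range(1, 50))
```

For a four-item FList the estimated loads are 0, 1, 3 and 6, so items late in the list dominate the cost, which is why they are placed first when groups are formed.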
Following the description above, dividing the frequent 1-itemset list FList to obtain the new list GList comprises the following steps:
Step 3.1: compute the load of each item in the frequent 1-itemset list FList and sort the items by load in descending order to obtain a sorted list SList;
Step 3.2: according to the specified number of groups M, initialize the M groups of the new list GList with the first M items of SList, so that each group in GList corresponds one-to-one with one of those items;
Step 3.3: add the first item of SList that has not yet been assigned to a group to the group in GList with the smallest load, add the load of that item to the group, and update the group loads in GList;
Step 3.4: repeat step 3.3 until every item in SList has been assigned to a group;
Step 3.5: save the resulting new list GList to an HDFS file so that it can be shared by multiple nodes.
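Steps 3.1 to 3.4 amount to a greedy assignment of the heaviest remaining item to the currently lightest group. The following sketch illustrates that idea under stated assumptions: the function name balance_groups is hypothetical, a binary heap is used to find the lightest group (the text does not prescribe a data structure), group numbers are 0-based here rather than 1 ≤ i ≤ M, ties are broken alphabetically, and step 3.5 (writing GList to HDFS) is omitted.

```python
import heapq

def balance_groups(loads, m):
    """Greedily split items into at most m groups with near-equal total load.

    loads: dict mapping item -> estimated load.
    Returns (groups, group_loads), both indexed by 0-based group number.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # Step 3.1: sort by load, descending (alphabetical tie-break is an assumption).
    slist = sorted(loads.items(), key=lambda kv: (-kv[1], kv[0]))
    m = min(m, len(slist))
    # Step 3.2: the first m items each open one group.
    groups = [[item] for item, _ in slist[:m]]
    heap = [(load, gid) for gid, (_, load) in enumerate(slist[:m])]
    heapq.heapify(heap)
    # Steps 3.3 and 3.4: give each remaining item to the lightest group.
    for item, load in slist[m:]:
        group_load, gid = heapq.heappop(heap)
        groups[gid].append(item)
        heapq.heappush(heap, (group_load + load, gid))
    group_loads = [0] * m
    for group_load, gid in heap:
        group_loads[gid] = group_load
    return groups, group_loads

example_loads = {"a": 0, "b": 1, "c": 3, "d": 6, "e": 10, "f": 15}
glist, glist_loads = balance_groups(example_loads, m=2)
```

With six items whose loads follow n × (n-1) / 2 and M = 2, the two groups end up with total loads 18 and 17, whereas splitting FList into two contiguous halves would give 4 and 31.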
In this embodiment of the invention, the group corresponding to gid_i is denoted GList[gid_i], and each item in group GList[gid_i] is denoted α_j, where α_j ∈ GList[gid_i] and 1 ≤ j ≤ GList[gid_i].length.
Step 4: scan the database transaction set D for the second time and divide it into M groups according to the new list GList, the group numbers of the divided database transaction set D corresponding to the group numbers in GList; if a transaction contains an item in GList[gid_i], send the corresponding part of that transaction to the transaction set DB whose group number is gid_i; after the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
This step is carried out by a second MapReduce job. The Map function groups the database transaction set D according to the division of the frequent 1-itemset list FList, producing a set of mutually independent transaction sets DB; the Reduce function performs FP-Growth mining on the independent transaction sets on its node.
Map function: generate M mutually independent transaction sets DB by sending every transaction on the local node to the appropriate groups. The input key-value pair of the Map function is again <key=lineNo, value=T>. The Map function operates as follows:
1) Load the new list GList onto the local node and build a hashMap from it, in which the key is an item in GList and the value is the group number gid_i of that item.
2) For each transaction T that is read in, sort its items in the order of the frequent 1-itemset list FList and delete from T any item that is not in FList.
3) Let the sorted transaction be T = {item_1, item_2, ..., item_n}. Traverse the items item_j of T from back to front, with j running from n down to 1. If item_j is the key of some key-value pair in the hashMap, delete from the hashMap all key-value pairs whose value equals the value of that pair, then send the first j items of T to the group identified by that value.
The output key-value pair of the Map function is <key=gid_i, value={item_1, ..., item_j}>, where gid_i is the group number of the transaction set to which the transaction is distributed and {item_1, ..., item_j} indicates that the part of the transaction up to and including item_j, not the whole transaction, is sent to the group. The sending rule is that the corresponding part of T is sent to every group in GList to which an item of T belongs. Deleting entries from the hash table ensures that the same transaction is not sent to the same group more than once. In this way, for every transaction that contains an item of group GList[gid_i], the corresponding part is sent to the transaction set DB(gid_i) with group number gid_i, so FP-Tree mining on DB(gid_i) yields all patterns of the items in GList[gid_i]. Different groups GList[gid_i] contain different items and therefore produce different frequent patterns, so the transaction sets DB are independent and the groups do not depend on one another.
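The behaviour of this Map function on a single transaction can be sketched as follows. This is an illustration under assumptions, not the patented code: the names build_item_to_gid and map_partition are hypothetical, group numbers are 0-based, and instead of deleting entries from the hashMap (which would require a fresh copy per transaction) the sketch records the groups already served in a set, which has the same effect of sending at most one prefix per group.

```python
def build_item_to_gid(glist):
    """Operation 1: hashMap from item to its group number (0-based here)."""
    return {item: gid for gid, items in enumerate(glist) for item in items}

def map_partition(transaction, flist_order, item_to_gid):
    """Emit <gid, prefix of T> pairs for one transaction T."""
    rank = {item: r for r, item in enumerate(flist_order)}
    # Operation 2: drop infrequent items and sort in FList order.
    t = sorted((i for i in set(transaction) if i in rank), key=rank.__getitem__)
    emitted = set()   # groups that already received a part of this transaction
    output = []
    # Operation 3: scan from the last item to the first.
    for j in range(len(t) - 1, -1, -1):
        gid = item_to_gid.get(t[j])
        if gid is not None and gid not in emitted:
            emitted.add(gid)
            output.append((gid, t[: j + 1]))  # items up to and including item_j
    return output

flist_order = ["a", "b", "c", "d"]
item_to_gid = build_item_to_gid([["d", "a"], ["c", "b"]])
pairs = map_partition(["d", "b", "x", "a"], flist_order, item_to_gid)
```

In this example the infrequent item x is dropped, group 0 receives the prefix a, b, d (because of d), group 1 receives a, b (because of b), and a triggers nothing further because group 0 has already been served with a longer prefix that contains it.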
Reduce function: mine frequent patterns from the local transaction set. After all Map tasks have finished, Hadoop automatically merges the Map results that share the same key, so the input of Reduce is <key=gid_i, value=DB(gid_i)>, where DB(gid_i) is the independent transaction set corresponding to the group with group number gid_i, consisting of all the transaction parts distributed to that group. Each Reduce task processes, one by one, the transaction sets that Hadoop assigns to it. The Reduce function operates as follows:
1) Load the new list GList and use it to build a groupMap, in which the key is the group number gid_i and the value is the set of items GList[gid_i] belonging to that group.
2) Scan every record in the transaction set DB(gid_i) and build the local FP-Tree, localFP-Tree.
3) Call the Growth algorithm recursively. Unlike the traditional Growth algorithm, the first call Growth(FP-Tree, null) traverses only the items in group GList[gid_i] rather than the whole header table, because each transaction set only needs to mine the frequent patterns of the items in its corresponding group GList[gid_i].
The output of Reduce is <key=pattern, value=sup(pattern)>, where pattern is a frequent pattern and sup(pattern) is the number of times that pattern occurs.
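The restriction of the first Growth call to the items of one group can be illustrated with a simplified pattern-growth routine. Note the substitution: this sketch does not build an FP-Tree; it recurses directly over conditional pattern bases held as lists, which yields the same patterns and supports but without the tree's compression. The name mine_group and its parameters are hypothetical, and the input transactions are assumed to be already sorted in FList order, free of duplicates, and free of items outside FList, as produced by the Map function.

```python
from collections import defaultdict

def mine_group(db, group_items, flist_order, min_support):
    """Return {pattern: support} for patterns whose last item is in the group."""
    rank = {item: r for r, item in enumerate(flist_order)}
    patterns = {}

    def grow(cond_db, suffix, allowed):
        # cond_db: list of (items, count), items sorted in FList order.
        counts = defaultdict(int)
        for items, count in cond_db:
            for item in items:
                counts[item] += count
        for item, support in counts.items():
            if support < min_support:
                continue
            # First call only: traverse the group's items, not every item.
            if allowed is not None and item not in allowed:
                continue
            pattern = tuple(sorted((item,) + suffix, key=rank.__getitem__))
            patterns[pattern] = support
            # Conditional pattern base: the part of each record before `item`.
            base = [(items[: items.index(item)], count)
                    for items, count in cond_db if item in items]
            grow(base, (item,) + suffix, None)

    grow([(list(t), 1) for t in db], (), set(group_items))
    return patterns

db_c = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
result = mine_group(db_c, group_items={"c"}, flist_order=["a", "b", "c"],
                    min_support=2)
```

Here the group holds only item c, so patterns such as (a, b) are not reported even though they are frequent in this transaction set; under the scheme described above they would be reported by the group that owns b.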
Step 5: aggregate and output the frequent patterns of all items in the frequent 1-itemsets obtained on each node.
The results of the computing nodes are merged once to obtain the final result of the parallel FP-Growth algorithm.
The present invention addresses the limited computing power and storage capacity of the traditional FP-Growth algorithm on a single computing node by proposing a MapReduce-based parallel computing method. It also addresses the problem that, during parallelization, imprecise data partitioning and differences in FP-Tree sparsity between data blocks cause marked differences in computing efficiency, memory consumption and communication cost among the computing nodes, by proposing a MapReduce-based FP-Growth load-balancing parallel algorithm.
Compared with conventional single-machine algorithms and ordinary parallel algorithms, the present invention takes the total length of the prefix paths in the conditional pattern tree of each item in the frequent 1-itemset list FList as the load of that item, sorts the items by load in descending order, and, for the specified number of groups M, makes the sum of the item loads in each group substantially equal. The balanced division of FList thereby balances the load among the computing nodes, resolves load imbalance between them, and yields better load-balancing capability and execution efficiency.

Claims (2)

1. A MapReduce-based FP-Growth load-balancing parallel computing method, characterized in that the load-balancing parallel computing method comprises the following steps:
Step 1: input the required database transaction set D and the minimum support count, divide the database transaction set D into contiguous partitions, and store the resulting sub-transaction sets of D on multiple nodes;
Step 2: scan the database transaction set D for the first time, compute in parallel the support counts of the items on each node, and merge the support counts computed by all nodes to obtain the list of all frequent 1-itemsets, FList;
Step 3: divide the items of the frequent 1-itemset list FList into M groups according to the load-balancing method to obtain a new list GList of length M, in which the group number of each group is gid_i (1 ≤ i ≤ M);
Step 4: scan the database transaction set D for the second time and divide it into M groups according to the new list GList, the group numbers of the divided database transaction set D corresponding to the group numbers in GList; if a transaction contains an item in GList[gid_i], send the corresponding part of that transaction to the transaction set DB whose group number is gid_i; after the division of D is finished, build a local FP-Tree for each transaction set DB and mine the corresponding GList[gid_i] on that local FP-Tree to obtain the frequent patterns of all items in the frequent 1-itemsets;
Step 5: aggregate and output the frequent patterns of all items in the frequent 1-itemsets obtained on each node.
2. The MapReduce-based FP-Growth load-balancing parallel computing method according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1: compute the load of each item in the frequent 1-itemset list FList and sort the items by load in descending order to obtain a sorted list SList;
Step 3.2: according to the specified number of groups M, initialize the M groups of the new list GList with the first M items of SList, so that each group in GList corresponds one-to-one with one of those items;
Step 3.3: add the first item of SList that has not yet been assigned to a group to the group in GList with the smallest load, add the load of that item to the group, and update the group loads in GList;
Step 3.4: repeat step 3.3 until every item in SList has been assigned to a group;
Step 3.5: save the resulting new list GList to an HDFS file so that it can be shared by multiple nodes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510138318.0A CN104731925A (en) 2015-03-26 2015-03-26 MapReduce-based FP-Growth load balance parallel computing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510138318.0A CN104731925A (en) 2015-03-26 2015-03-26 MapReduce-based FP-Growth load balance parallel computing method

Publications (1)

Publication Number Publication Date
CN104731925A true CN104731925A (en) 2015-06-24

Family

ID=53455812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510138318.0A Pending CN104731925A (en) 2015-03-26 2015-03-26 MapReduce-based FP-Growth load balance parallel computing method

Country Status (1)

Country Link
CN (1) CN104731925A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127037A (en) * 2006-08-15 2008-02-20 临安微创网格信息工程有限公司 Periodic associated rule discovery algorithm based on time sequence vector diverse sequence method clustering
CN101655857A (en) * 2009-09-18 2010-02-24 西安建筑科技大学 Method for mining data in construction regulation field based on associative regulation mining technology
US20150019562A1 (en) * 2011-04-26 2015-01-15 Brian J. Bulkowski Method and system of mapreduce implementations on indexed datasets in a distributed database environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周诗慧: "基于Hadoop的改进的并行Fp-Growth算法", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183875A (en) * 2015-09-21 2015-12-23 南京邮电大学 FP-Growth data mining method based on shared path
CN107045512A (en) * 2016-02-05 2017-08-15 北京京东尚科信息技术有限公司 A kind of method for interchanging data and system
CN106503218A (en) * 2016-10-27 2017-03-15 北京邮电大学 A kind of parallelization Workflow association data find method
US11036558B2 (en) 2016-12-06 2021-06-15 International Business Machines Corporation Data processing
CN108153589A (en) * 2016-12-06 2018-06-12 国际商业机器公司 For the method and system of the data processing in the processing arrangement of multithreading
CN108153589B (en) * 2016-12-06 2021-12-07 国际商业机器公司 Method and system for data processing in a multi-threaded processing arrangement
CN106874479A (en) * 2017-02-19 2017-06-20 郑州云海信息技术有限公司 The improved method and device of the FP Growth algorithms based on FPGA
CN111107493A (en) * 2018-10-25 2020-05-05 中国电力科学研究院有限公司 Method and system for predicting position of mobile user
CN111107493B (en) * 2018-10-25 2022-09-02 中国电力科学研究院有限公司 Method and system for predicting position of mobile user
CN110232079A (en) * 2019-05-08 2019-09-13 江苏理工学院 A kind of modified FP-Growth data digging method based on Hadoop
CN110990434A (en) * 2019-11-29 2020-04-10 国网四川省电力公司信息通信公司 Spark platform grouping and Fp-Growth association rule mining method
CN110990434B (en) * 2019-11-29 2023-04-18 国网四川省电力公司信息通信公司 Spark platform grouping and Fp-Growth association rule mining method
CN111309786A (en) * 2020-02-20 2020-06-19 江西理工大学 Parallel frequent item set mining method based on MapReduce
CN111309786B (en) * 2020-02-20 2023-09-15 韶关学院 Parallel frequent item set mining method based on MapReduce
CN113672665A (en) * 2021-08-18 2021-11-19 Oppo广东移动通信有限公司 Data processing method, data acquisition system, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
CN103020256B (en) A kind of association rule mining method of large-scale data
CN103258049A (en) Association rule mining method based on mass data
CN102662639A (en) Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN109408521A (en) A kind of method and device thereof for more new block chain global data state
CN103729478A (en) LBS (Location Based Service) interest point discovery method based on MapReduce
Liao et al. MRPrePost—A parallel algorithm adapted for mining big data
CN103617162A (en) Method of constructing Hilbert R-tree index on equivalent cloud platform
CN103678550A (en) Mass data real-time query method based on dynamic index structure
CN104834709B (en) A kind of parallel cosine mode method for digging based on load balancing
CN112015741A (en) Method and device for storing massive data in different databases and tables
CN101499097B (en) Hash table based data stream frequent pattern internal memory compression and storage method
CN105045806A (en) Dynamic splitting and maintenance method of quantile query oriented summary data
CN104679966B (en) Empowerment hypergraph optimization division methods based on Hierarchy Method and discrete particle cluster
CN106815302A (en) A kind of Mining Frequent Itemsets for being applied to game item recommendation
CN104933143A (en) Method and device for acquiring recommended object
CN102207935A (en) Method and system for establishing index
CN105138607B (en) A kind of KNN querying methods based on combination grain distributed memory grid index
CN110232079A (en) A kind of modified FP-Growth data digging method based on Hadoop
CN106874479A (en) The improved method and device of the FP Growth algorithms based on FPGA
CN102043857A (en) All-nearest-neighbor query method and system
Al-Hamodi et al. An enhanced frequent pattern growth based on MapReduce for mining association rules
CN103761298A (en) Distributed-architecture-based entity matching method
CN109254962A (en) A kind of optimiged index method and device based on T- tree
CN102722546B (en) The querying method of shortest path in relational database environment figure below

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150624