CN103136244A - Parallel data mining method and system based on cloud computing platform - Google Patents

Info

Publication number
CN103136244A
Authority
CN
China
Prior art keywords
frequent item
item set
dimension table
contacts list
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110386148XA
Other languages
Chinese (zh)
Inventor
顾茜
赵鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201110386148XA priority Critical patent/CN103136244A/en
Publication of CN103136244A publication Critical patent/CN103136244A/en
Pending legal-status Critical Current

Abstract

The invention relates to a parallel data mining method based on a cloud computing platform that has a Map-Reduce framework. The method comprises the following steps: distributed nodes establish a fact association table for a Software as a Service (SAAS) application database; according to the fact association table, the distributed nodes extract data from each individual dimension table to find the frequent item sets of that table, and/or find the frequent item sets across dimension tables; the frequent item sets found by all distributed nodes are input to a reduce task node as intermediate files; and the reduce task node merges the received intermediate files and outputs the merged frequent item sets as the data mining result. Based on the Map-Reduce framework, the method carries out the mining of large-scale data sets in cloud computing on a plurality of distributed nodes and finally merges the frequent item sets at the reduce task node to output the final data mining result, which achieves efficient mining of massive data and greatly improves the efficiency of data mining.

Description

Parallel data mining method and system based on cloud computing platform
Technical field
The present invention relates to the field of data mining, and in particular to a parallel data mining method and system based on a cloud computing platform.
Background art
With the development of cloud computing, Software as a Service (SAAS) applications have become widespread, and mining SAAS application data is an important technical problem that enterprises currently need to solve. The traditional Apriori algorithm and its improved variants are suitable only for small data volumes. For the massive data brought by cloud computing, the efficiency of existing data mining algorithms and their improvements is unsatisfactory, so existing data mining systems cannot meet enterprises' need to mine that data quickly and effectively.
Summary of the invention
The object of the invention is to provide a parallel data mining method and system based on a cloud computing platform that can mine massive data efficiently.
To achieve this object, the invention provides a parallel data mining method based on a cloud computing platform. The cloud computing platform has a Map-Reduce framework comprising a plurality of distributed nodes for mapping and a reduce task node. The parallel data mining method comprises:
the distributed nodes establish a fact association table for an already established distributed SAAS application database, the SAAS application database comprising a plurality of individual dimension tables;
the distributed nodes extract data from each individual dimension table in the distributed SAAS application database according to the fact association table and find the frequent item sets of each individual dimension table; and/or find the frequent item sets across the dimension tables of the distributed SAAS application database according to the fact association table;
all the distributed nodes input the frequent item sets they have found to the reduce task node as intermediate files;
the reduce task node merges the received intermediate files and outputs the merged frequent item sets as the data mining result.
To achieve this object, the invention also provides a parallel data mining system based on a cloud computing platform. The cloud computing platform has a Map-Reduce framework comprising a plurality of distributed nodes for mapping and a reduce task node. Each distributed node comprises an already established distributed SAAS application database, and the SAAS application database comprises a plurality of individual dimension tables.
Each distributed node further comprises:
a fact association table establishment unit, configured to establish a fact association table for the already established distributed SAAS application database;
a single-dimension-table frequent item set acquisition unit, configured to extract data from each individual dimension table in the distributed SAAS application database according to the fact association table and find the frequent item sets of each individual dimension table;
a cross-dimension-table frequent item set acquisition unit, configured to find the frequent item sets across the dimension tables of the distributed SAAS application database according to the fact association table;
a data input unit, configured to input the frequent item sets found to the reduce task node as intermediate files.
The reduce task node is configured to merge the intermediate files received from each distributed node and output the merged frequent item sets as the data mining result.
Based on the above technical solution, the invention uses the Map-Reduce framework to carry out the mining of large-scale data sets in cloud computing on a plurality of distributed nodes, and finally merges the frequent item sets at the reduce task node to output the final data mining result, thereby achieving efficient mining of massive data and greatly improving the efficiency of data mining.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of one embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 2 is a flow chart of finding the frequent item sets of an individual dimension table in another embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 3 is a flow chart of finding the frequent item sets across dimension tables in yet another embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 4 is a structural diagram of one embodiment of the parallel data mining system based on a cloud computing platform according to the invention.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of one embodiment of the parallel data mining method based on a cloud computing platform according to the invention. The cloud computing platform in this embodiment has a Map-Reduce framework, which comprises a plurality of distributed nodes for mapping and a reduce task node. The parallel data mining flow comprises the following steps:
Step 101: the distributed nodes establish a fact association table for the already established distributed SAAS application database, the SAAS application database comprising a plurality of individual dimension tables;
Step 102: the distributed nodes extract data from each individual dimension table in the distributed SAAS application database according to the fact association table and find the frequent item sets of each individual dimension table; and/or find the frequent item sets across the dimension tables of the distributed SAAS application database according to the fact association table;
Step 103: all the distributed nodes input the frequent item sets they have found to the reduce task node as intermediate files;
Step 104: the reduce task node merges the received intermediate files and outputs the merged frequent item sets as the data mining result.
In this embodiment the cloud computing platform adopts the Map-Reduce framework, which is suited to parallel operations on large-scale data sets (for example, larger than 1 TB). Distributing the operations on a large data set across the distributed nodes of the network makes the operation reliable, and each distributed node periodically reports its completed work and status updates. Because the reduce operation parallelizes poorly across the distributed nodes, it can be scheduled onto a single reduce node as far as possible.
This embodiment combines the Map-Reduce framework of cloud computing with a data mining algorithm to mine large-scale data sets on a distributed system, which can greatly improve the efficiency of data mining.
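Steps 101 to 104 can be sketched roughly as follows. This is a minimal single-process illustration, not the patented implementation: the function names `map_node` and `reduce_node`, the per-node threshold `local_min_support`, the cap `max_len`, and the choice to merge by summing local counts are all assumptions, since the text does not specify how the reduce task node combines the intermediate files.

```python
from collections import Counter
from itertools import combinations


def map_node(transactions, local_min_support, max_len=2):
    """Map side (hypothetical): find the frequent item sets of one node's local data."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for k in range(1, max_len + 1):
            for itemset in combinations(unique, k):
                counts[itemset] += 1
    # The returned dict plays the role of the node's "intermediate file".
    return {s: n for s, n in counts.items() if n >= local_min_support}


def reduce_node(intermediate_files):
    """Reduce side (hypothetical): merge intermediate files by summing their counts."""
    merged = Counter()
    for local_frequent in intermediate_files:
        merged.update(local_frequent)
    return dict(merged)


# Two partitions stand in for the data held by two distributed nodes.
partitions = [
    [["a", "b"], ["a", "c"], ["a", "b"]],
    [["a", "b"], ["b", "c"]],
]
files = [map_node(p, local_min_support=2) for p in partitions]
result = reduce_node(files)
```

Under this assumed merge rule, an item set that is infrequent on a node contributes nothing from that node, so the merged counts are lower bounds on the true global support.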
In step 101, the distributed node can extract the table primary keys from the individual dimension tables in the already established distributed SAAS application database and establish the fact association table from those primary keys. The fact association table and the individual dimension tables form a star schema.
An example of the table model inside the distributed SAAS application data is given below:
Table 1
id_1    at_1
a1      ...
a2      ...
Table 2
id_2    at_2    at_3
b1      ...     ...
b2      ...     ...
Table 3
id_3    at_4
c1      ...
c2      ...
The table primary key id_t is extracted from each of the three individual dimension tables T_t above, and the fact association table T_1n is established from the primary keys id_t, as in the following table:
id_1    id_2    id_3
a1      b2      c1
a2      b1      c2
...     ...     ...
In the fact association table T_1n(id_1, id_2, ..., id_n), each id_t is a foreign key of T_1n and is also the primary key of the corresponding dimension table T_t.
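The star schema above can be pictured with plain Python dictionaries. The sketch below is illustrative only: the attribute values, the `links` input, and the helper `build_fact_table` are hypothetical, and the text does not say where the row associations between primary keys come from. The sketch simply assumes they are supplied and checks that every foreign key is a primary key of its dimension table.

```python
# Hypothetical dimension tables: primary key -> attribute values.
table_1 = {"a1": {"at_1": "x"}, "a2": {"at_1": "y"}}
table_2 = {"b1": {"at_2": "red", "at_3": "L"}, "b2": {"at_2": "red", "at_3": "S"}}
table_3 = {"c1": {"at_4": "p"}, "c2": {"at_4": "q"}}
dimension_tables = [table_1, table_2, table_3]


def build_fact_table(links, dimension_tables):
    """Keep each row (id_1, ..., id_n) whose keys all exist in their dimension tables."""
    fact_table = []
    for row in links:
        if len(row) != len(dimension_tables):
            raise ValueError("each row needs one foreign key per dimension table")
        if all(key in dim for key, dim in zip(row, dimension_tables)):
            fact_table.append(tuple(row))
    return fact_table


# The last row is dropped because "a9" is not a primary key of table_1.
links = [("a1", "b2", "c1"), ("a2", "b1", "c2"), ("a9", "b1", "c1")]
fact_table = build_fact_table(links, dimension_tables)
```

Each column of `fact_table` is then a foreign key into exactly one dimension table, which is the property the later steps rely on.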
Fig. 2 is a flow chart of finding the frequent item sets of an individual dimension table in another embodiment of the parallel data mining method based on a cloud computing platform according to the invention. Compared with the previous embodiment, the flow for finding the frequent item sets of an individual dimension table in step 102 of this embodiment is as follows:
Step 201: count the number of times each distinct value of the foreign key id_t, which corresponds to the primary key of each individual dimension table T_t, occurs in the fact association table T_1n, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
Step 202: store the count for each value of the foreign key id_t in a vector;
Step 203: find the frequent item sets according to the occurrence counts of the foreign key id_t in the vector.
The vector formed in step 202 stores, for a dimension table, the number of times each value of its primary key occurs in the fact association table. In step 203, whether an item set is frequent is decided from the occurrence counts of the foreign key id_t in this vector. The frequent item sets found form the frequent item set collection of the table, and the length of a frequent item set in that collection can range from 1 to m_t, where m_t is the number of attribute values in dimension table T_t.
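Steps 201 to 203 can be sketched as follows. This is one possible reading rather than the patent's definitive procedure: it assumes an item is an (attribute, value) pair, that each dimension row contributes its foreign key count to every item set it contains, and that `min_support` is an absolute threshold. The function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_in_dimension(fact_table, t, dim_table, min_support):
    """Find frequent item sets of dimension table t using foreign key counts."""
    # Steps 201-202: count how often each value of foreign key id_t occurs
    # in column t of the fact table; the Counter stands in for the vector.
    key_counts = Counter(row[t] for row in fact_table)

    # Step 203: weight every item set of a dimension row by that row's count.
    support = Counter()
    for key, attrs in dim_table.items():
        weight = key_counts.get(key, 0)
        if weight == 0:
            continue
        items = sorted(attrs.items())
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                support[itemset] += weight
    return {s: n for s, n in support.items() if n >= min_support}


dim_table = {
    "b1": {"at_2": "red", "at_3": "L"},
    "b2": {"at_2": "red", "at_3": "S"},
}
fact_table = [("a1", "b2", "c1"), ("a2", "b1", "c2"), ("a1", "b1", "c1")]
frequent = frequent_in_dimension(fact_table, t=1, dim_table=dim_table, min_support=2)
```

The point of the vector is that the dimension table is scanned once and the fact table contributes only counts, so no join is needed for single-table item sets.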
Fig. 3 is a flow chart of finding the frequent item sets across dimension tables in yet another embodiment of the parallel data mining method based on a cloud computing platform according to the invention. Compared with the previous embodiment, the flow for finding the frequent item sets across dimension tables in step 102 of this embodiment comprises:
Step 301: set up an n-dimensional counting array used to record the candidate item sets across the individual dimension tables T_t. An element along dimension t of the counting array corresponds to an item set in the frequent item set collection of individual dimension table T_t or to the empty item set, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
Step 302: scan the fact association table T_1n and the n individual dimension tables T_t, generate the join tuples of the join table T of the universal relation row by row, project each row onto the corresponding individual dimension tables T_t to obtain the corresponding candidate item sets, and count each candidate item set at the corresponding position of the counting array;
Step 303: after all rows of the join table T have been processed, obtain the counting array, which now records the support of every candidate item set;
Step 304: determine the frequent item sets according to the counting array.
After the fact association table T_1n and the n individual dimension tables T_t have been scanned, step 302 can comprise the following steps:
The join tuples of the join table T of the universal relation are generated row by row. After the join tuple of the current row has been processed, it is not saved, and generation continues with the join tuple of the next row. In this way no row of the join table T needs to be materialized. When each row of the join table T is processed, the current row r is projected onto the attributes of each individual dimension table T_t, giving π_{T_t}(r); all item sets contained in the projection that belong to the frequent item set collection of T_t, together with the empty item set, form the set i_t. The item sets and empty item sets in the sets i_t obtained for all individual dimension tables T_t are then combined to obtain all candidate item sets, and each candidate item set is counted at the corresponding position of the counting array.
Through the above steps, once all rows of the join table T have been processed, the counting array records the support of every candidate item set, and the frequent item sets can be determined from it. The process completes its different tasks in two successive steps, and the join table T needs to be computed and processed only once, so it never has to be fully generated and stored. This saves processing resources and improves processing efficiency.
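Steps 301 to 304 can be sketched as follows, under stated assumptions. A `Counter` keyed by tuples of item sets stands in for the n-dimensional counting array, a generator stands in for row-by-row generation without materialization, and a candidate is treated as a cross-table item set only when at least two of its components are non-empty. None of these choices is fixed by the text, and all names are hypothetical.

```python
from collections import Counter
from itertools import product

EMPTY = frozenset()  # the empty item set


def join_tuples(fact_table, dim_tables):
    """Yield the projections of one join tuple at a time; nothing is stored."""
    for row in fact_table:
        yield [frozenset(dim[key].items()) for key, dim in zip(row, dim_tables)]


def cross_table_frequent(fact_table, dim_tables, frequent_sets, min_support):
    counts = Counter()  # sparse stand-in for the n-dimensional counting array
    for projections in join_tuples(fact_table, dim_tables):
        per_table = []
        for projection, frequent in zip(projections, frequent_sets):
            # i_t: the empty item set plus every frequent item set of T_t
            # contained in the projection of the current row onto T_t.
            per_table.append([EMPTY] + [s for s in frequent if s <= projection])
        for candidate in product(*per_table):
            counts[candidate] += 1
    return {
        c: n for c, n in counts.items()
        if n >= min_support and sum(1 for s in c if s) >= 2
    }


dim_1 = {"a1": {"at_1": "x"}, "a2": {"at_1": "y"}}
dim_2 = {"b1": {"at_2": "red"}, "b2": {"at_2": "red"}}
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
X = frozenset({("at_1", "x")})
R = frozenset({("at_2", "red")})
result = cross_table_frequent(fact_table, [dim_1, dim_2], [{X}, {R}], min_support=2)
```

Because `join_tuples` is a generator, each join tuple is discarded as soon as its candidates are counted, which mirrors the single pass over the join table described above.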
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Fig. 4 is a structural diagram of one embodiment of the parallel data mining system based on a cloud computing platform according to the invention. In this embodiment the cloud computing platform has a Map-Reduce framework, which comprises a plurality of distributed nodes 1 for mapping and a reduce task node 2. Each distributed node comprises an already established distributed SAAS application database 11, and the SAAS application database 11 comprises a plurality of individual dimension tables.
The distributed node 1 further comprises a fact association table establishment unit 12, a single-dimension-table frequent item set acquisition unit 13, a cross-dimension-table frequent item set acquisition unit 14, and a data input unit 15. The fact association table establishment unit 12 establishes a fact association table for the already established distributed SAAS application database 11. The single-dimension-table frequent item set acquisition unit 13 extracts data from each individual dimension table in the distributed SAAS application database 11 according to the fact association table and finds the frequent item sets of each individual dimension table. The cross-dimension-table frequent item set acquisition unit 14 finds the frequent item sets across the dimension tables of the distributed SAAS application database 11 according to the fact association table. The data input unit 15 inputs the frequent item sets found to the reduce task node 2 as intermediate files.
The reduce task node 2 merges the intermediate files received from each distributed node 1 and outputs the merged frequent item sets as the data mining result.
In another embodiment, the fact association table establishment unit can comprise: a table primary key extraction component, configured to extract the table primary keys from the individual dimension tables in the distributed SAAS application database; and a star schema establishment component, configured to establish the fact association table from the table primary keys, the fact association table and the individual dimension tables forming a star schema.
In another embodiment, the single-dimension-table frequent item set acquisition unit can comprise:
a foreign key statistics component, configured to count the number of times each distinct value of the foreign key id_t, which corresponds to the primary key of each individual dimension table T_t, occurs in the fact association table T_1n, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
a vector storage component, configured to store the count for each value of the foreign key id_t in a vector;
a frequent item set search component, configured to find the frequent item sets according to the occurrence counts of the foreign key id_t in the vector.
In another embodiment, the cross-dimension-table frequent item set acquisition unit can comprise:
a counting array setup component, configured to set up an n-dimensional counting array used to record the candidate item sets across the individual dimension tables T_t, an element along dimension t of the counting array corresponding to an item set in the frequent item set collection of individual dimension table T_t or to the empty item set, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
a row-by-row join table generation component, configured to scan the fact association table T_1n and the n individual dimension tables T_t and generate the join tuples of the join table T of the universal relation row by row;
a projection processing component, configured to project each row onto the corresponding individual dimension tables T_t to obtain the corresponding candidate item sets;
a frequent item set counting component, configured to count each candidate item set at the corresponding position of the counting array and, after all rows of the join table T have been processed, obtain the counting array that records the support of every candidate item set;
a frequent item set determination component, configured to determine the frequent item sets according to the counting array.
In the preceding embodiment, the row-by-row join table generation component can be configured so that, after the join tuple of the current row has been processed, that join tuple is not saved and generation continues with the join tuple of the next row. The projection processing component is configured, when each row of the join table T is processed, to project the current row r onto the attributes of each individual dimension table T_t, giving π_{T_t}(r); to find all item sets contained in the projection that belong to the frequent item set collection, which together with the empty item set form the set i_t; and to combine the item sets and empty item sets in the sets i_t obtained for all individual dimension tables T_t to obtain all candidate item sets.
The embodiments of the invention use the Map-Reduce framework to carry out the mining of large-scale data sets in cloud computing on a plurality of distributed nodes, and finally merge the frequent item sets at the reduce task node to output the final data mining result, thereby achieving efficient mining of massive data and greatly improving the efficiency of data mining.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments can still be modified, or some of their technical features replaced with equivalents, without departing from the spirit of the technical solution of the invention; all such modifications and replacements fall within the scope of the technical solution claimed by the invention.

Claims (10)

1. A parallel data mining method based on a cloud computing platform, the cloud computing platform having a Map-Reduce framework that comprises a plurality of distributed nodes for mapping and a reduce task node, the parallel data mining method comprising:
the distributed nodes establishing a fact association table for an already established distributed SAAS application database, the SAAS application database comprising a plurality of individual dimension tables;
the distributed nodes extracting data from each individual dimension table in the distributed SAAS application database according to the fact association table and finding the frequent item sets of each individual dimension table; and/or finding the frequent item sets across the dimension tables of the distributed SAAS application database according to the fact association table;
all the distributed nodes inputting the frequent item sets found to the reduce task node as intermediate files;
the reduce task node merging the received intermediate files and outputting the merged frequent item sets as the data mining result.
2. The parallel data mining method according to claim 1, wherein the operation of the distributed nodes establishing a fact association table for the already established distributed SAAS application database comprises:
the distributed nodes extracting the table primary keys from the individual dimension tables in the distributed SAAS application database and establishing the fact association table from the table primary keys, the fact association table and the individual dimension tables forming a star schema.
3. The parallel data mining method according to claim 2, wherein the operation of the distributed nodes extracting data from each individual dimension table in the distributed SAAS application database according to the fact association table and finding the frequent item sets of each individual dimension table comprises:
counting the number of times each distinct value of the foreign key id_t, which corresponds to the primary key of each individual dimension table T_t, occurs in the fact association table T_1n, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
storing the count for each value of the foreign key id_t in a vector;
finding the frequent item sets according to the occurrence counts of the foreign key id_t in the vector.
4. The parallel data mining method according to claim 2, wherein the operation of finding the frequent item sets across the dimension tables of the distributed SAAS application database according to the fact association table comprises:
setting up an n-dimensional counting array used to record the candidate item sets across the individual dimension tables T_t, an element along dimension t of the counting array corresponding to an item set in the frequent item set collection of individual dimension table T_t or to the empty item set, where t ranges from 1 to n and n is the number of individual dimension tables in the distributed SAAS application database;
scanning the fact association table T_1n and the n individual dimension tables T_t, generating the join tuples of the join table T of the universal relation row by row, projecting each row onto the corresponding individual dimension tables T_t to obtain the corresponding candidate item sets, and counting each candidate item set at the corresponding position of the counting array;
after all rows of the join table T have been processed, obtaining the counting array that records the support of every candidate item set;
determining the frequent item sets according to the counting array.
5. The parallel data mining method according to claim 4, wherein the operation of generating the join tuples of the join table T of the universal relation row by row, projecting each row onto the corresponding individual dimension tables T_t to obtain the corresponding candidate item sets, and counting each candidate item set at the corresponding position of the counting array comprises:
generating the join tuples of the join table T of the universal relation row by row, and, after the join tuple of the current row has been processed, not saving that join tuple and continuing with the generation of the join tuple of the next row;
when each row of the join table T is processed, projecting the current row r onto the attributes of each individual dimension table T_t, giving π_{T_t}(r), and finding all item sets contained in the projection that belong to the frequent item set collection, which together with the empty item set form the set i_t;
combining the item sets and empty item sets in the sets i_t obtained for all individual dimension tables T_t to obtain all candidate item sets, and counting each candidate item set at the corresponding position of the counting array.
6. parallel data mining system based on cloud computing platform, described cloud computing platform has mapping-abbreviation framework, described mapping-abbreviation framework comprises distributed node and the abbreviation task node of a plurality of mappings, described distributed node comprises the distributed SAAS application data base of having set up, and described SAAS application data base comprises a plurality of independent dimension tables;
Described distributed node also comprises:
True contacts list is set up the unit, is used for the distributed SAAS application data base of having set up is set up true contacts list;
One-dimensional table frequent item set acquiring unit is used for according to described true contacts list, each independent dimension table of described distributed SAAS application data base being carried out data pick-up, finds out the frequent item set of described each independent dimension table;
Across dimension table frequent item set acquiring unit, be used for finding out according to described true contacts list the frequent item set across the dimension table of described distributed SAAS application data base;
Data input cell is used for the frequent item set that finds is input to described abbreviation task node as intermediate file;
Described abbreviation task node is used for the intermediate file that receives from each distributed node is merged, and the frequent item set after the output merging is as data mining results.
7. parallel data mining according to claim 6 system, wherein, described true contacts list is set up the unit and is specifically comprised:
Form major key extraction assembly is used for a plurality of independent dimension table of described distributed SAAS application data base is extracted the form major key;
Hub-and-spoke configuration is set up assembly, is used for setting up true contacts list according to described form major key, and described true contacts list and described a plurality of independent dimension table are hub-and-spoke configuration.
8. parallel data mining according to claim 7 system, wherein, described one-dimensional table frequent item set acquiring unit specifically comprises:
Outer chain statistics component is used for adding up described true contacts list T 1nIn with each independent dimension table T tOuter chain id corresponding to major key tThe number of times of different value occurs, the value of t is 1~n, and n is the number of the described independent dimension table in described distributed SAAS application data base;
The vector memory module is used for storing described outer chain id by vector tEvery kind of value;
Frequent item set is searched assembly, is used for according to the described vector chain id of China and foreign countries tThe number of times statistics that occurs is found out frequent item set.
9. The parallel data mining system according to claim 7, wherein the cross-dimension table frequent item set acquiring unit specifically comprises:
A counting array setting component, configured to set up an n-dimensional counting array, wherein the counting array is used to record the candidate item sets of the independent dimension tables T_t, and the element in the t-th dimension of the counting array corresponds to an item set in the frequent item set collection of the independent dimension table T_t or to the empty item set, where t takes the values 1 to n and n is the number of independent dimension tables in the distributed SAAS application database;
A row-by-row connection table generation component, configured to scan the true contacts list T_1n and the n independent dimension tables T_t, and to generate, row by row, the connection tuples of the connection table T of the universal relation;
A projection processing component, configured to perform projection processing on each row with respect to each corresponding independent dimension table T_t to obtain the corresponding candidates;
A frequent item set counting component, configured to count each candidate at the corresponding position of the counting array, so that after all rows of the connection table T have been processed, the counting array records the support of all candidates;
A frequent item set determining component, configured to determine the frequent item sets according to the counting array.
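One possible reading of the counting array in claim 9 is sketched below: an n-dimensional array stored flat, where index 0 along each dimension stands for the empty item set and the remaining indices stand for the frequent item sets of the corresponding dimension table. The class name `CountingArray`, the flat storage with strides, and the rule that the all-empty cell is ignored are assumptions made for illustration.

```python
from itertools import product


class CountingArray:
    """n-dimensional support counter stored as a flat list."""

    def __init__(self, frequent_sets_per_table):
        empty = frozenset()
        # Axis t lists the empty item set followed by the frequent item sets of T_t.
        self.axes = [[empty] + list(fs) for fs in frequent_sets_per_table]
        self.index = [{s: i for i, s in enumerate(axis)} for axis in self.axes]
        # Row-major strides, computed from the last dimension backwards.
        self.strides = []
        size = 1
        for axis in reversed(self.axes):
            self.strides.insert(0, size)
            size *= len(axis)
        self.cells = [0] * size

    def _offset(self, candidate):
        return sum(self.index[t][s] * self.strides[t] for t, s in enumerate(candidate))

    def increment(self, candidate):
        """Count one occurrence of a candidate (one item set per dimension)."""
        self.cells[self._offset(candidate)] += 1

    def frequent(self, min_support):
        """Return candidates whose count reaches min_support, skipping the all-empty cell."""
        result = {}
        for candidate in product(*self.axes):
            count = self.cells[self._offset(candidate)]
            if any(candidate) and count >= min_support:
                result[candidate] = count
        return result


# Hypothetical usage with two dimension tables, one frequent item set each.
A, B, E = frozenset({"a"}), frozenset({"b"}), frozenset()
array = CountingArray([[A], [B]])
for cand in [(A, B), (A, E), (A, B), (E, B)]:
    array.increment(cand)
result = array.frequent(min_support=2)
```

The array has one cell per combination of item sets, so its size is the product of (number of frequent item sets + 1) over all dimension tables; this is why the single-table frequent item sets are found first.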
10. The parallel data mining system according to claim 9, wherein the row-by-row connection table generation component is specifically configured to, after the connection tuple of the current row has been processed, not save the connection tuple of the current row, and to continue by generating the connection tuple of the next row;
The projection processing component is specifically configured to, when processing each row of the connection table T, make the projection π_{T_t}(r) of the current row r onto the attributes of the independent dimension table T_t, find all item sets contained in the projection that belong to the frequent item set collection, together with the empty item set, to form the set i_t, and, after the sets i_t have been obtained by projection processing for all independent dimension tables T_t, combine all item sets and empty item sets in the sets i_t to obtain all candidates.
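The row-by-row behaviour of claim 10 maps naturally onto a generator, as in the minimal sketch below. It assumes each dimension table is a mapping from primary key to the set of attribute items in that row; `join_rows`, `project`, and `candidates_for_row` are hypothetical names, and dropping the all-empty combination is an assumption of this sketch.

```python
from itertools import product


def join_rows(fact_rows, dim_tables):
    """Yield one connection tuple at a time; earlier tuples are not kept."""
    for row in fact_rows:
        yield tuple(dim_tables[t][key] for t, key in enumerate(row))


def project(attributes, frequent_sets):
    """Build i_t: the empty item set plus every frequent item set contained in the row."""
    return [frozenset()] + [s for s in frequent_sets if s <= attributes]


def candidates_for_row(join_tuple, frequent_sets_per_table):
    """Combine the sets i_t of all dimension tables into cross-table candidates."""
    parts = [project(attrs, fs) for attrs, fs in zip(join_tuple, frequent_sets_per_table)]
    return [c for c in product(*parts) if any(c)]  # skip the all-empty combination


# Hypothetical sample data: two dimension tables and two rows in the true contacts list.
A, B, E = frozenset({"a"}), frozenset({"b"}), frozenset()
dim_tables = [
    {1: frozenset({"a", "x"}), 2: frozenset({"x"})},
    {10: frozenset({"b"})},
]
frequent = [[A], [B]]
fact_rows = [(1, 10), (2, 10)]

rows = join_rows(fact_rows, dim_tables)
first = candidates_for_row(next(rows), frequent)
second = candidates_for_row(next(rows), frequent)
```

Only one connection tuple exists in memory at any time, so the full connection table T is never materialized; each candidate returned for a row would then be counted at its position in the counting array.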
CN201110386148XA 2011-11-29 2011-11-29 Parallel data mining method and system based on cloud computing platform Pending CN103136244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110386148XA CN103136244A (en) 2011-11-29 2011-11-29 Parallel data mining method and system based on cloud computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110386148XA CN103136244A (en) 2011-11-29 2011-11-29 Parallel data mining method and system based on cloud computing platform

Publications (1)

Publication Number Publication Date
CN103136244A true CN103136244A (en) 2013-06-05

Family

ID=48496079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110386148XA Pending CN103136244A (en) 2011-11-29 2011-11-29 Parallel data mining method and system based on cloud computing platform

Country Status (1)

Country Link
CN (1) CN103136244A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593418A (en) * 2013-10-30 2014-02-19 中国科学院计算技术研究所 Distributed subject finding method and system for big data
CN104754034A (en) * 2014-04-07 2015-07-01 惠州Tcl移动通信有限公司 An end-to-end cloud service system and a data mining method thereof
WO2016123808A1 (en) * 2015-02-06 2016-08-11 华为技术有限公司 Data processing system, calculation node and data processing method
CN103914528B (en) * 2014-03-28 2017-02-15 南京邮电大学 Parallelizing method of association analytical algorithm
CN107451290A (en) * 2017-08-15 2017-12-08 电子科技大学 Parallelized data stream frequent item set mining method
CN110175198A (en) * 2019-05-30 2019-08-27 禤世丽 Frequent item set mining method and device based on MapReduce and array
CN110785749A (en) * 2018-06-25 2020-02-11 北京嘀嘀无限科技发展有限公司 System and method for generating wide tables
CN112819404A (en) * 2021-01-13 2021-05-18 中国联合网络通信集团有限公司 Data processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5842200A (en) * 1995-03-31 1998-11-24 International Business Machines Corporation System and method for parallel mining of association rules in databases
CN101799810A (en) * 2009-02-06 2010-08-11 中国移动通信集团公司 Association rule mining method and system thereof
CN101887450A (en) * 2010-05-19 2010-11-17 东北电力大学 Stochastic distributed data stream frequent item set mining system and method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
VIVIANE CRESTANA JENSEN, NANDIT SOPARKAR: "Frequent itemset counting across multiple tables", 《KNOWLEDGE DISCOVERY AND DATA MINING》, 20 April 2000 (2000-04-20), pages 49 - 61 *
何军 et al.: "Mining multi-relational association rules", 《软件学报》, vol. 18, no. 11, 30 November 2007 (2007-11-30), pages 2753 - 2765 *
戎翔 et al.: "A frequent item set mining method based on MapReduce", 《西安邮电学院学报》, vol. 16, no. 4, 10 July 2011 (2011-07-10), pages 37 - 38 *
栾鸾 et al.: "Parallel acquisition of multi-relational frequent item sets", 《微电子学与计算机》, vol. 25, no. 10, 30 November 2008 (2008-11-30), pages 94 - 96 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593418B (en) * 2013-10-30 2017-03-29 中国科学院计算技术研究所 Distributed subject finding method and system for big data
CN103593418A (en) * 2013-10-30 2014-02-19 中国科学院计算技术研究所 Distributed subject finding method and system for big data
CN103914528B (en) * 2014-03-28 2017-02-15 南京邮电大学 Parallelizing method of association analytical algorithm
CN104754034A (en) * 2014-04-07 2015-07-01 惠州Tcl移动通信有限公司 An end-to-end cloud service system and a data mining method thereof
US10567494B2 (en) 2015-02-06 2020-02-18 Huawei Technologies Co., Ltd. Data processing system, computing node, and data processing method
WO2016123808A1 (en) * 2015-02-06 2016-08-11 华为技术有限公司 Data processing system, calculation node and data processing method
CN106062732A (en) * 2015-02-06 2016-10-26 华为技术有限公司 Data processing system, calculation node and data processing method
CN106062732B (en) * 2015-02-06 2019-03-01 华为技术有限公司 Data processing system, calculation node and data processing method
CN107451290A (en) * 2017-08-15 2017-12-08 电子科技大学 Parallelized data stream frequent item set mining method
CN107451290B (en) * 2017-08-15 2020-03-10 电子科技大学 Parallelized data stream frequent item set mining method
CN110785749B (en) * 2018-06-25 2020-08-21 北京嘀嘀无限科技发展有限公司 System and method for generating wide tables
CN110785749A (en) * 2018-06-25 2020-02-11 北京嘀嘀无限科技发展有限公司 System and method for generating wide tables
US11061882B2 (en) 2018-06-25 2021-07-13 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for generating a wide table
CN110175198A (en) * 2019-05-30 2019-08-27 禤世丽 Frequent item set mining method and device based on MapReduce and array
CN110175198B (en) * 2019-05-30 2023-05-05 禤世丽 Frequent item set mining method and device based on MapReduce and array
CN112819404A (en) * 2021-01-13 2021-05-18 中国联合网络通信集团有限公司 Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130605