CN103136244A - Parallel data mining method and system based on cloud computing platform - Google Patents
- Publication number
- CN103136244A
- Authority
- CN
- China
- Prior art keywords
- frequent item
- item set
- dimension table
- contacts list
- distributed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a parallel data mining method based on a cloud computing platform. The cloud computing platform has a Map-Reduce framework. The parallel data mining method comprises the following steps: the distributed nodes build a fact link table for a software as a service (SaaS) application database; the distributed nodes extract data from each individual dimension table according to the fact link table to find the frequent itemsets of each individual dimension table, and/or find the frequent itemsets across dimension tables according to the fact link table; the frequent itemsets found by all the distributed nodes are input, as intermediate files, to the reduce task node; and the reduce task node merges the received intermediate files and outputs the merged frequent itemsets as the data mining result. Based on the Map-Reduce framework, the method carries out the mining of a large-scale data set in cloud computing on a plurality of distributed nodes, and the reduce task node finally merges the frequent itemsets to output the final data mining result, thereby achieving efficient mining of massive data and greatly improving data mining efficiency.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a parallel data mining method and system based on a cloud computing platform.
Background technology
With the development of cloud computing and the spread of software as a service (SaaS) applications, mining SaaS application data has become an important technical problem that enterprises need to solve. The traditional Apriori algorithm and its improved variants are suited only to relatively small data volumes. For the massive data that cloud computing brings, the efficiency of the existing data mining algorithms and their improved versions is unsatisfactory, and existing data mining systems therefore cannot meet the enterprise requirement of mining that data quickly and effectively.
Summary of the invention
The object of the invention is to provide a parallel data mining method and system based on a cloud computing platform that can mine massive data efficiently.
To achieve the above object, the invention provides a parallel data mining method based on a cloud computing platform. The cloud computing platform has a Map-Reduce framework, which comprises a plurality of distributed nodes for the map tasks and a reduce task node. The parallel data mining method comprises:
The distributed nodes build a fact link table for an established distributed SaaS application database, the SaaS application database comprising a plurality of individual dimension tables;
The distributed nodes extract data from each individual dimension table in the distributed SaaS application database according to the fact link table and find the frequent itemsets of each individual dimension table; and/or find the frequent itemsets across the dimension tables of the distributed SaaS application database according to the fact link table;
All the distributed nodes input the frequent itemsets they have found to the reduce task node as intermediate files;
The reduce task node merges the received intermediate files and outputs the merged frequent itemsets as the data mining result.
To achieve the above object, the invention also provides a parallel data mining system based on a cloud computing platform. The cloud computing platform has a Map-Reduce framework, which comprises a plurality of distributed nodes for the map tasks and a reduce task node. The distributed nodes comprise an established distributed SaaS application database, and the SaaS application database comprises a plurality of individual dimension tables;
Each distributed node further comprises:
A fact link table building unit, configured to build a fact link table for the established distributed SaaS application database;
A single-dimension-table frequent itemset acquiring unit, configured to extract data from each individual dimension table of the distributed SaaS application database according to the fact link table and find the frequent itemsets of each individual dimension table;
A cross-dimension-table frequent itemset acquiring unit, configured to find the frequent itemsets across the dimension tables of the distributed SaaS application database according to the fact link table;
A data input unit, configured to input the frequent itemsets found to the reduce task node as intermediate files;
The reduce task node is configured to merge the intermediate files received from the distributed nodes and to output the merged frequent itemsets as the data mining result.
Based on the above technical solution, the invention uses the Map-Reduce framework to carry out the mining of a large-scale data set in cloud computing on a plurality of distributed nodes, and the reduce task node finally merges the frequent itemsets to output the final data mining result. Efficient mining of massive data is thereby achieved, and data mining efficiency is greatly improved.
Description of drawings
The accompanying drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flow chart of one embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 2 is a schematic flow chart of finding the frequent itemsets of the individual dimension tables in another embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 3 is a schematic flow chart of finding the frequent itemsets across dimension tables in yet another embodiment of the parallel data mining method based on a cloud computing platform according to the invention.
Fig. 4 is a schematic structural diagram of one embodiment of the parallel data mining system based on a cloud computing platform according to the invention.
Embodiment
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a schematic flow chart of one embodiment of the parallel data mining method based on a cloud computing platform according to the invention. The cloud computing platform in this embodiment has a Map-Reduce framework, which comprises a plurality of distributed nodes for the map tasks and a reduce task node. The steps of the parallel data mining flow are described below.
In this embodiment the cloud computing platform adopts the Map-Reduce framework, which is suited to parallel operations on large-scale data sets (for example, larger than 1 TB). By distributing the large-scale operations on the data set to the distributed nodes on the network, the reliability of the operation can be ensured, and each distributed node can periodically report back the work it has completed and its updated status. Because the reduce operation parallelizes poorly across the distributed nodes, the reduce operation can, as far as possible, be scheduled on a single reduce node.
This embodiment combines the Map-Reduce framework of cloud computing with a data mining algorithm to mine large-scale data sets on a distributed system, which can greatly improve data mining efficiency.
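The division of work between the distributed map nodes and the single reduce node can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `map_node` and `reduce_node`, the parameters `local_min_support`, `min_support`, and `max_len`, the brute-force itemset enumeration, and the choice to merge intermediate files by summing support counts are all assumptions, since the text does not say how the merge is performed.

```python
from collections import Counter
from itertools import combinations


def map_node(transactions, local_min_support=1, max_len=2):
    """Map step (hypothetical): find the itemsets that are frequent in one node's partition."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for k in range(1, max_len + 1):
            for combo in combinations(unique, k):
                counts[frozenset(combo)] += 1
    # The returned dict plays the role of the node's intermediate file.
    return {s: n for s, n in counts.items() if n >= local_min_support}


def reduce_node(intermediate_files, min_support=1):
    """Reduce step (assumed): merge intermediate files by summing support counts."""
    merged = Counter()
    for local in intermediate_files:
        merged.update(local)  # Counter.update adds counts key by key
    return {s: n for s, n in merged.items() if n >= min_support}


# Two partitions of a toy data set, one per distributed node.
partitions = [
    [["x", "y"], ["x", "z"]],
    [["x", "y"], ["y", "z"]],
]
intermediate = [map_node(p) for p in partitions]
result = reduce_node(intermediate, min_support=2)
```

Note that if `local_min_support` is set above 1, an itemset that is infrequent on every node but frequent overall would be lost; the text does not address this, so the sketch defaults to keeping every local count.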
In step 101, the distributed nodes may extract the table primary keys from the plurality of individual dimension tables in the established distributed SaaS application database and build the fact link table from those primary keys. The fact link table and the individual dimension tables form a star schema.
An example of the table model inside the distributed SaaS application data is given below:
Table 1
id_1 | at_1 |
---|---|
a1 | ... |
a2 | ... |
Table 2
id_2 | at_2 | at_3 |
---|---|---|
b1 | ... | ... |
b2 | ... | ... |
Table 3
id_3 | at_4 |
---|---|
c1 | ... |
c2 | ... |
From the three individual dimension tables T_t above, the table primary keys id_t are extracted, and the fact link table T_1n is built from the primary keys id_t, as in the following table:
id_1 | id_2 | id_3 |
---|---|---|
a1 | b2 | c1 |
a2 | b1 | c2 |
... | ... | ... |
Each id_t in the fact link table T_1n(id_1, id_2, ..., id_n) is a foreign key of the fact link table T_1n and is also the primary key of the corresponding dimension table T_t.
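The star schema described above can be sketched in a few lines. The attribute values ("p", "q", and so on) are invented placeholders, because the example tables elide them with "..."; the function names `extract_primary_keys` and `build_fact_link_table` are hypothetical; and the text does not say how the rows of the fact link table are produced, so the sketch simply takes the two sample rows as given and checks each foreign key against the primary keys of its dimension table.

```python
# Individual dimension tables, keyed first by primary key column, then by primary key.
# Attribute values are made-up placeholders (the source shows only "...").
dim_tables = {
    "id_1": {"a1": {"at_1": "p"}, "a2": {"at_1": "q"}},
    "id_2": {"b1": {"at_2": "u", "at_3": "v"}, "b2": {"at_2": "u", "at_3": "w"}},
    "id_3": {"c1": {"at_4": "m"}, "c2": {"at_4": "n"}},
}


def extract_primary_keys(tables):
    """Collect the set of primary key values of every individual dimension table."""
    return {column: set(rows) for column, rows in tables.items()}


def build_fact_link_table(links, tables):
    """Build fact link table rows whose columns are foreign keys into the dimension tables."""
    keys = extract_primary_keys(tables)
    columns = list(tables)
    fact = []
    for link in links:
        if len(link) != len(columns):
            raise ValueError(f"expected {len(columns)} keys, got {len(link)}")
        for column, value in zip(columns, link):
            if value not in keys[column]:
                raise KeyError(f"{value!r} is not a primary key of {column}")
        fact.append(dict(zip(columns, link)))
    return fact


# The two sample rows shown in the fact link table above.
fact_link_table = build_fact_link_table(
    [("a1", "b2", "c1"), ("a2", "b1", "c2")], dim_tables
)
```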
Fig. 2 is a schematic flow chart of finding the frequent itemsets of the individual dimension tables in another embodiment of the parallel data mining method based on a cloud computing platform according to the invention. Compared with the previous embodiment, the flow for finding the frequent itemsets of the individual dimension tables in step 102 of this embodiment is as follows:
The vector formed in step 202 stores, for a dimension table, the number of times each value of its primary key occurs in the fact link table. In step 203, whether an itemset is frequent is determined from the occurrence counts of the foreign key id_t recorded in this vector. The frequent itemsets found make up the frequent itemset collection, and the length of a frequent itemset in this collection can range from 1 to m_t, where m_t is the number of attributes in dimension table T_t.
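A minimal sketch of this per-table step, under stated assumptions: an item is taken to be an (attribute, value) pair, the support of an itemset is taken to be the sum of the foreign key counts of the dimension rows that contain it, and candidate itemsets of length 1 to m_t are enumerated by brute force. The names `foreign_key_counts` and `frequent_itemsets_single_table` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def foreign_key_counts(fact_rows, column):
    """Vector of occurrence counts of each foreign key value in the fact link table."""
    return Counter(row[column] for row in fact_rows)


def frequent_itemsets_single_table(dim_rows, fk_counts, min_support):
    """Find frequent itemsets of one dimension table, weighting rows by foreign key count."""
    support = Counter()
    for primary_key, attributes in dim_rows.items():
        weight = fk_counts.get(primary_key, 0)
        if weight == 0:
            continue  # row never referenced from the fact link table
        items = sorted(attributes.items())  # (attribute, value) pairs
        for k in range(1, len(items) + 1):  # itemset lengths 1..m_t
            for combo in combinations(items, k):
                support[frozenset(combo)] += weight
    return {s: n for s, n in support.items() if n >= min_support}


# Toy data: b1 is referenced twice from the fact link table, b2 once.
fact_rows = [{"id_2": "b1"}, {"id_2": "b1"}, {"id_2": "b2"}]
table_2 = {"b1": {"at_2": "u", "at_3": "v"}, "b2": {"at_2": "u", "at_3": "w"}}

counts = foreign_key_counts(fact_rows, "id_2")
frequent_2 = frequent_itemsets_single_table(table_2, counts, min_support=2)
```

Weighting by the count vector means the dimension table is read once and the fact link table is never joined for this step.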
Fig. 3 is a schematic flow chart of finding the frequent itemsets across dimension tables in yet another embodiment of the parallel data mining method based on a cloud computing platform according to the invention. Compared with the previous embodiment, the flow for finding the frequent itemsets across dimension tables in step 102 of this embodiment comprises:
After the fact link table T_1n and the n individual dimension tables T_t have been scanned, step 302 may specifically comprise the following:
The join tuples of the join table T of the universal relation are generated row by row. After the join tuple of the current row has been processed, it is not saved, and generation continues with the join tuple of the next row. This approach does not need to materialize every row of the join table T. In processing each row of the join table T, for the current row r the projection π_{T_t}(r) onto the attributes of each individual dimension table T_t is computed, and all the itemsets it contains that belong to the frequent itemset collection, together with the empty itemset, are found and form the set i_t. The itemsets and empty itemsets in the sets i_t obtained by projection for all the individual dimension tables T_t are then combined to obtain all the candidate itemsets, and each candidate itemset is counted at the corresponding position of the counting array.
Through the above steps, once all rows of the join table T have been processed, the counting array records the support of every candidate itemset, and which itemsets are frequent can then be determined from the counting array. The process completes two different processing tasks in succession, and the join table T needs to be computed and processed only once, so it need not be stored or generated in full. This saves processing resources and in turn improves processing efficiency.
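The row-by-row counting described above can be sketched as follows. The sketch makes several assumptions that the text does not state: a `Counter` keyed by index tuples stands in for the n-dimensional counting array, index 0 on every axis represents the empty itemset, attribute names are distinct across dimension tables, and a cross-table itemset is reported only if it draws items from at least two tables. The function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import product

EMPTY = frozenset()


def count_cross_table(fact_rows, dim_tables, frequent):
    """Count candidate itemsets across tables, one join tuple at a time."""
    columns = list(dim_tables)
    # Axis t lists the empty itemset followed by the frequent itemsets of T_t.
    axes = {c: [EMPTY] + list(frequent[c]) for c in columns}
    counts = Counter()  # sparse stand-in for the n-dimensional counting array
    for row in fact_rows:
        per_table = []
        for c in columns:
            projected = set(dim_tables[c][row[c]].items())  # projection onto T_t
            # i_t: positions of the itemsets contained in the projection
            # (the empty itemset at position 0 always qualifies).
            per_table.append(
                [i for i, itemset in enumerate(axes[c]) if itemset <= projected]
            )
        for position in product(*per_table):
            counts[position] += 1
        # The join tuple is not kept; the loop moves on to the next row.
    return axes, counts


def frequent_cross_itemsets(axes, counts, min_support):
    """Read the frequent itemsets spanning two or more tables off the counting array."""
    columns = list(axes)
    found = {}
    for position, n in counts.items():
        parts = [axes[c][i] for c, i in zip(columns, position)]
        if n >= min_support and sum(1 for p in parts if p) >= 2:
            found[frozenset().union(*parts)] = n
    return found


dims = {
    "id_1": {"a1": {"at_1": "p"}, "a2": {"at_1": "q"}},
    "id_2": {"b1": {"at_2": "u"}, "b2": {"at_2": "u"}},
}
facts = [
    {"id_1": "a1", "id_2": "b1"},
    {"id_1": "a1", "id_2": "b2"},
    {"id_1": "a2", "id_2": "b1"},
]
per_table_frequent = {
    "id_1": [frozenset({("at_1", "p")})],
    "id_2": [frozenset({("at_2", "u")})],
}
axes, counting_array = count_cross_table(facts, dims, per_table_frequent)
cross = frequent_cross_itemsets(axes, counting_array, min_support=2)
```

Each join tuple exists only for the duration of one loop iteration, which mirrors the point made above that the join table T is never stored in full.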
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Fig. 4 is a schematic structural diagram of one embodiment of the parallel data mining system based on a cloud computing platform according to the invention. In this embodiment the cloud computing platform has a Map-Reduce framework, which comprises a plurality of distributed nodes 1 for the map tasks and a reduce task node 2. Each distributed node comprises an established distributed SaaS application database 11, and the SaaS application database 11 contains a plurality of individual dimension tables.
The distributed node 1 further comprises a fact link table building unit 12, a single-dimension-table frequent itemset acquiring unit 13, a cross-dimension-table frequent itemset acquiring unit 14, and a data input unit 15. The fact link table building unit 12 builds the fact link table for the established distributed SaaS application database 11. The single-dimension-table frequent itemset acquiring unit 13 extracts data from each individual dimension table in the distributed SaaS application database 11 according to the fact link table and finds the frequent itemsets of each individual dimension table. The cross-dimension-table frequent itemset acquiring unit 14 finds the frequent itemsets across the dimension tables of the distributed SaaS application database 11 according to the fact link table. The data input unit 15 inputs the frequent itemsets found to the reduce task node 2 as intermediate files.
In another embodiment, the fact link table building unit may specifically comprise: a table primary key extraction component, configured to extract the table primary keys from the plurality of individual dimension tables in the distributed SaaS application database; and a star schema building component, configured to build the fact link table from the table primary keys, the fact link table and the plurality of individual dimension tables forming a star schema.
In another embodiment, the single-dimension-table frequent itemset acquiring unit may specifically comprise:
A foreign key statistics component, configured to count, in the fact link table T_1n, the number of times each distinct value of the foreign key id_t corresponding to the primary key of each individual dimension table T_t occurs, where t takes the values 1 to n and n is the number of individual dimension tables in the distributed SaaS application database;
A vector storage component, configured to store, by means of a vector, each value of the foreign key id_t;
A frequent itemset lookup component, configured to find the frequent itemsets according to the occurrence counts of the foreign key id_t in the vector.
In another embodiment, the cross-dimension-table frequent itemset acquiring unit may specifically comprise:
A counting array setting component, configured to set up an n-dimensional counting array, where the counting array records the candidate itemsets across the individual dimension tables T_t, the elements along the t-th dimension of the counting array correspond to the itemsets in the frequent itemset collection of the individual dimension table T_t or to the empty itemset, t takes the values 1 to n, and n is the number of individual dimension tables in the distributed SaaS application database;
A join table row-by-row generation component, configured to scan the fact link table T_1n and the n individual dimension tables T_t and to generate, row by row, the join tuples of the join table T of the universal relation;
A projection processing component, configured to obtain the corresponding candidate itemsets by projecting each row onto the individual dimension tables T_t;
A frequent itemset counting component, configured to count each candidate itemset at the corresponding position of the counting array and, after all rows of the join table T have been processed, to obtain the counting array recording the support of every candidate itemset;
A frequent itemset determining component, configured to determine the frequent itemsets according to the counting array.
In the preceding embodiment, the join table row-by-row generation component may specifically be configured not to save the join tuple of the current row after it has been processed, and to continue by generating the join tuple of the next row. The projection processing component is specifically configured, in processing each row of the join table T, to compute for the current row r the projection π_{T_t}(r) onto the attributes of each individual dimension table T_t, to find all the itemsets it contains that belong to the frequent itemset collection, together with the empty itemset, which form the set i_t, and to combine the itemsets and empty itemsets in the sets i_t obtained by projection for all the individual dimension tables T_t to obtain all the candidate itemsets.
The embodiments of the invention use the Map-Reduce framework to carry out the mining of a large-scale data set in cloud computing on a plurality of distributed nodes, and the reduce task node finally merges the frequent itemsets to output the final data mining result. Efficient mining of massive data is thereby achieved, and data mining efficiency is greatly improved.
Finally, it should be noted that the above embodiments serve only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified, or some of their technical features replaced by equivalents, without departing from the spirit of the technical solution of the invention, and all such changes fall within the scope of the technical solution claimed by the invention.
Claims (10)
1. A parallel data mining method based on a cloud computing platform, the cloud computing platform having a Map-Reduce framework, the Map-Reduce framework comprising a plurality of distributed nodes for the map tasks and a reduce task node, the parallel data mining method comprising:
The distributed nodes build a fact link table for an established distributed SaaS application database, the SaaS application database comprising a plurality of individual dimension tables;
The distributed nodes extract data from each individual dimension table in the distributed SaaS application database according to the fact link table and find the frequent itemsets of each individual dimension table; and/or find the frequent itemsets across the dimension tables of the distributed SaaS application database according to the fact link table;
All the distributed nodes input the frequent itemsets they have found to the reduce task node as intermediate files;
The reduce task node merges the received intermediate files and outputs the merged frequent itemsets as the data mining result.
2. The parallel data mining method according to claim 1, wherein the operation in which the distributed nodes build a fact link table for the established distributed SaaS application database specifically comprises:
The distributed nodes extract the table primary keys from the plurality of individual dimension tables in the distributed SaaS application database and build the fact link table from the table primary keys, the fact link table and the plurality of individual dimension tables forming a star schema.
3. The parallel data mining method according to claim 2, wherein the operation in which the distributed nodes extract data from each individual dimension table in the distributed SaaS application database according to the fact link table and find the frequent itemsets of each individual dimension table specifically comprises:
Counting, in the fact link table T_1n, the number of times each distinct value of the foreign key id_t corresponding to the primary key of each individual dimension table T_t occurs, where t takes the values 1 to n and n is the number of individual dimension tables in the distributed SaaS application database;
Storing, by means of a vector, each value of the foreign key id_t;
Finding the frequent itemsets according to the occurrence counts of the foreign key id_t in the vector.
4. The parallel data mining method according to claim 2, wherein the operation of finding the frequent itemsets across the dimension tables of the distributed SaaS application database according to the fact link table specifically comprises:
Setting up an n-dimensional counting array, where the counting array records the candidate itemsets across the individual dimension tables T_t, the elements along the t-th dimension of the counting array correspond to the itemsets in the frequent itemset collection of the individual dimension table T_t or to the empty itemset, t takes the values 1 to n, and n is the number of individual dimension tables in the distributed SaaS application database;
Scanning the fact link table T_1n and the n individual dimension tables T_t, generating row by row the join tuples of the join table T of the universal relation, obtaining the corresponding candidate itemsets by projecting each row onto the individual dimension tables T_t, and counting each candidate itemset at the corresponding position of the counting array;
After all rows of the join table T have been processed, obtaining the counting array recording the support of every candidate itemset;
Determining the frequent itemsets according to the counting array.
5. The parallel data mining method according to claim 4, wherein the operation of generating row by row the join tuples of the join table T of the universal relation, obtaining the corresponding candidate itemsets by projecting each row onto the individual dimension tables T_t, and counting each candidate itemset at the corresponding position of the counting array specifically comprises:
Generating row by row the join tuples of the join table T of the universal relation, where after the join tuple of the current row has been processed it is not saved, and generation continues with the join tuple of the next row;
In processing each row of the join table T, computing for the current row r the projection π_{T_t}(r) onto the attributes of each individual dimension table T_t, and finding all the itemsets it contains that belong to the frequent itemset collection, together with the empty itemset, which form the set i_t;
Combining the itemsets and empty itemsets in the sets i_t obtained by projection for all the individual dimension tables T_t to obtain all the candidate itemsets, and counting each candidate itemset at the corresponding position of the counting array.
6. A parallel data mining system based on a cloud computing platform, the cloud computing platform having a Map-Reduce framework, the Map-Reduce framework comprising a plurality of distributed nodes for the map tasks and a reduce task node, the distributed nodes comprising an established distributed SaaS application database, and the SaaS application database comprising a plurality of individual dimension tables;
Each distributed node further comprises:
A fact link table building unit, configured to build a fact link table for the established distributed SaaS application database;
A single-dimension-table frequent itemset acquiring unit, configured to extract data from each individual dimension table of the distributed SaaS application database according to the fact link table and find the frequent itemsets of each individual dimension table;
A cross-dimension-table frequent itemset acquiring unit, configured to find the frequent itemsets across the dimension tables of the distributed SaaS application database according to the fact link table;
A data input unit, configured to input the frequent itemsets found to the reduce task node as intermediate files;
The reduce task node is configured to merge the intermediate files received from the distributed nodes and to output the merged frequent itemsets as the data mining result.
7. The parallel data mining system according to claim 6, wherein the fact link table building unit specifically comprises:
A table primary key extraction component, configured to extract the table primary keys from the plurality of individual dimension tables in the distributed SaaS application database;
A star schema building component, configured to build the fact link table from the table primary keys, the fact link table and the plurality of individual dimension tables forming a star schema.
8. The parallel data mining system according to claim 7, wherein the single-dimension-table frequent itemset acquiring unit specifically comprises:
A foreign key statistics component, configured to count, in the fact link table T_1n, the number of times each distinct value of the foreign key id_t corresponding to the primary key of each individual dimension table T_t occurs, where t takes the values 1 to n and n is the number of individual dimension tables in the distributed SaaS application database;
A vector storage component, configured to store, by means of a vector, each value of the foreign key id_t;
A frequent itemset lookup component, configured to find the frequent itemsets according to the occurrence counts of the foreign key id_t in the vector.
9. The parallel data mining system according to claim 7, wherein the cross-dimension-table frequent itemset acquiring unit specifically comprises:
A counting array setting component, configured to set up an n-dimensional counting array, where the counting array records the candidate itemsets across the individual dimension tables T_t, the elements along the t-th dimension of the counting array correspond to the itemsets in the frequent itemset collection of the individual dimension table T_t or to the empty itemset, t takes the values 1 to n, and n is the number of individual dimension tables in the distributed SaaS application database;
A join table row-by-row generation component, configured to scan the fact link table T_1n and the n individual dimension tables T_t and to generate, row by row, the join tuples of the join table T of the universal relation;
A projection processing component, configured to obtain the corresponding candidate itemsets by projecting each row onto the individual dimension tables T_t;
A frequent itemset counting component, configured to count each candidate itemset at the corresponding position of the counting array and, after all rows of the join table T have been processed, to obtain the counting array recording the support of every candidate itemset;
A frequent itemset determining component, configured to determine the frequent itemsets according to the counting array.
10. The parallel data mining system according to claim 9, wherein the join table row-by-row generation component is specifically configured not to save the join tuple of the current row after it has been processed, and to continue by generating the join tuple of the next row;
The projection processing component is specifically configured, in processing each row of the join table T, to compute for the current row r the projection π_{T_t}(r) onto the attributes of each individual dimension table T_t, to find all the itemsets it contains that belong to the frequent itemset collection, together with the empty itemset, which form the set i_t, and to combine the itemsets and empty itemsets in the sets i_t obtained by projection for all the individual dimension tables T_t to obtain all the candidate itemsets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110386148XA CN103136244A (en) | 2011-11-29 | 2011-11-29 | Parallel data mining method and system based on cloud computing platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110386148XA CN103136244A (en) | 2011-11-29 | 2011-11-29 | Parallel data mining method and system based on cloud computing platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103136244A true CN103136244A (en) | 2013-06-05 |
Family
ID=48496079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110386148XA Pending CN103136244A (en) | 2011-11-29 | 2011-11-29 | Parallel data mining method and system based on cloud computing platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103136244A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593418A (en) * | 2013-10-30 | 2014-02-19 | 中国科学院计算技术研究所 | Distributed subject finding method and system for big data |
CN104754034A (en) * | 2014-04-07 | 2015-07-01 | 惠州Tcl移动通信有限公司 | An end-to-end cloud service system and a data mining method thereof |
WO2016123808A1 (en) * | 2015-02-06 | 2016-08-11 | 华为技术有限公司 | Data processing system, calculation node and data processing method |
CN103914528B (en) * | 2014-03-28 | 2017-02-15 | 南京邮电大学 | Parallelizing method of association analytical algorithm |
CN107451290A (en) * | 2017-08-15 | 2017-12-08 | 电子科技大学 | A kind of data stream frequent item set mining method of parallelization |
CN110175198A (en) * | 2019-05-30 | 2019-08-27 | 禤世丽 | Mining Frequent Itemsets and device based on MapReduce and array |
CN110785749A (en) * | 2018-06-25 | 2020-02-11 | 北京嘀嘀无限科技发展有限公司 | System and method for generating wide tables |
CN112819404A (en) * | 2021-01-13 | 2021-05-18 | 中国联合网络通信集团有限公司 | Data processing method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5842200A (en) * | 1995-03-31 | 1998-11-24 | International Business Machines Corporation | System and method for parallel mining of association rules in databases |
CN101799810A (en) * | 2009-02-06 | 2010-08-11 | 中国移动通信集团公司 | Association rule mining method and system thereof |
CN101887450A (en) * | 2010-05-19 | 2010-11-17 | 东北电力大学 | Stochastic distributed data stream frequent item set mining system and method thereof |
Non-Patent Citations (4)
Title |
---|
VIVIANE CRESTANA JENSEN, NANDIT SOPARKAR: "Frequent itemset counting across multiple tables", 《KNOWLEDGE DISCOVERY AND DATA MINING》, 20 April 2000 (2000-04-20), pages 49 - 61 * |
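He Jun (何军) et al.: "Mining multi-relational association rules" (挖掘多关系关联规则), Journal of Software (《软件学报》), vol. 18, no. 11, 30 November 2007 (2007-11-30), pages 2753 - 2765 * |
Rong Xiang (戎翔) et al.: "Frequent item set mining method based on MapReduce" (基于MapReduce的频繁项集挖掘方法), Journal of Xi'an University of Posts and Telecommunications (《西安邮电学院学报》), vol. 16, no. 4, 10 July 2011 (2011-07-10), pages 37 - 38 * |
Luan Luan (栾鸾) et al.: "Parallel acquisition of multi-relational frequent item sets" (多关系频繁项集的并行获取), Microelectronics & Computer (《微电子学与计算机》), vol. 25, no. 10, 30 November 2008 (2008-11-30), pages 94 - 96 * |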
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593418B (en) * | 2013-10-30 | 2017-03-29 | 中国科学院计算技术研究所 | A kind of distributed motif discovery method and system towards big data |
CN103593418A (en) * | 2013-10-30 | 2014-02-19 | 中国科学院计算技术研究所 | Distributed subject finding method and system for big data |
CN103914528B (en) * | 2014-03-28 | 2017-02-15 | 南京邮电大学 | Parallelizing method of association analytical algorithm |
CN104754034A (en) * | 2014-04-07 | 2015-07-01 | 惠州Tcl移动通信有限公司 | An end-to-end cloud service system and a data mining method thereof |
US10567494B2 (en) | 2015-02-06 | 2020-02-18 | Huawei Technologies Co., Ltd. | Data processing system, computing node, and data processing method |
WO2016123808A1 (en) * | 2015-02-06 | 2016-08-11 | 华为技术有限公司 | Data processing system, calculation node and data processing method |
CN106062732A (en) * | 2015-02-06 | 2016-10-26 | 华为技术有限公司 | Data processing system, calculation node and data processing method |
CN106062732B (en) * | 2015-02-06 | 2019-03-01 | 华为技术有限公司 | Data processing system, calculate node and the method for data processing |
CN107451290A (en) * | 2017-08-15 | 2017-12-08 | 电子科技大学 | A kind of data stream frequent item set mining method of parallelization |
CN107451290B (en) * | 2017-08-15 | 2020-03-10 | 电子科技大学 | Parallelized data stream frequent item set mining method |
CN110785749B (en) * | 2018-06-25 | 2020-08-21 | 北京嘀嘀无限科技发展有限公司 | System and method for generating wide tables |
CN110785749A (en) * | 2018-06-25 | 2020-02-11 | 北京嘀嘀无限科技发展有限公司 | System and method for generating wide tables |
US11061882B2 (en) | 2018-06-25 | 2021-07-13 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for generating a wide table |
CN110175198A (en) * | 2019-05-30 | 2019-08-27 | 禤世丽 | Mining Frequent Itemsets and device based on MapReduce and array |
CN110175198B (en) * | 2019-05-30 | 2023-05-05 | 禤世丽 | Frequent item set mining method and device based on MapReduce and array |
CN112819404A (en) * | 2021-01-13 | 2021-05-18 | 中国联合网络通信集团有限公司 | Data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103136244A (en) | Parallel data mining method and system based on cloud computing platform | |
CN101446962B (en) | Data conversion method, device thereof and data processing system | |
CN104899295B (en) | A kind of heterogeneous data source data relation analysis method | |
CN102541757B (en) | Write cache method, cache synchronization method and device | |
KR20100070968A (en) | Cluster data management system and method for data recovery using parallel processing in cluster data management system | |
US20170011082A1 (en) | Mechanisms for merging index structures in molap while preserving query consistency | |
CN106126601A (en) | A kind of social security distributed preprocess method of big data and system | |
CN103778133A (en) | Database object changing method and device | |
CN102682108B (en) | Row and line mixed database storage method | |
CN104866580A (en) | Method for quickly detecting impact caused by database modification to current service | |
CN104346458A (en) | Data storage method and device | |
CN102314485A (en) | Method and device for adding, searching and deleting hash table | |
CN102222099A (en) | Methods and devices for storing and searching data | |
CN104731896A (en) | Data processing method and system | |
CN105630934A (en) | Data statistic method and system | |
CN104484131B (en) | The data processing equipment of multiple disks server and corresponding processing method | |
CN107451233A (en) | Storage method of the preferential space-time trajectory data file of time attribute in auxiliary storage device | |
CN104298736A (en) | Method and device for aggregating and connecting data as well as database system | |
CN105095247A (en) | Symbolic data analysis method and system | |
CN105678323A (en) | Image-based-on method and system for analysis of users | |
CN105740462A (en) | Method for supporting data migration between different environments | |
CN104317850A (en) | Data processing method and device | |
CN101963993B (en) | Method for fast searching database sheet table record | |
CN103902582A (en) | Data warehouse redundancy reduction method and device | |
CN103543959A (en) | Method and device for mass data caching |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130605 |