CN112925821A - MapReduce-based parallel incremental data mining method for frequent itemsets - Google Patents
MapReduce-based parallel incremental data mining method for frequent itemsets
- Publication number
- CN112925821A
- Authority
- CN
- China
- Prior art keywords
- data
- items
- mapreduce
- value
- mining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F16/00: Information retrieval; database structures therefor; file system structures therefor > G06F16/20: of structured data, e.g. relational data > G06F16/24: Querying > G06F16/245: Query processing > G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries)
- G06F18/22: Matching criteria, e.g. proximity measures (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F18/00: Pattern recognition > G06F18/20: Analysing)
- G06F18/24: Classification techniques (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F18/00: Pattern recognition > G06F18/20: Analysing)
- G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming (G: Physics > G06: Computing; calculating or counting > G06N: Computing arrangements based on specific computational models > G06N3/00: Computing arrangements based on biological models > G06N3/12: using genetic models)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Physiology (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Dataset | RetailRocket | Susy | Jester |
---|---|---|---|
Records | 2756101 | 5000000 | 4100000 |
Items | 5 | 28 | 13 |
Items | 299.6 | 880.5 | 96 |
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110177059.8A CN112925821B (zh) | 2021-02-07 | 2021-02-07 | MapReduce-based parallel incremental data mining method for frequent itemsets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110177059.8A CN112925821B (zh) | 2021-02-07 | 2021-02-07 | MapReduce-based parallel incremental data mining method for frequent itemsets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112925821A (zh) | 2021-06-08 |
CN112925821B (zh) | 2022-05-13 |
Family ID: 76171345
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110177059.8A CN112925821B (zh), Active | 2021-02-07 | 2021-02-07 | MapReduce-based parallel incremental data mining method for frequent itemsets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925821B (zh) |
2021
- 2021-02-07: application CN202110177059.8A filed in CN; patent CN112925821B (zh), status: Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
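CN104123216A (zh) * | 2014-07-24 | 2014-10-29 | 深圳职业技术学院 | Method for generating intelligent workflows of roles and activities based on role-weighted association rules |
CN105930375A (zh) * | 2016-04-13 | 2016-09-07 | 云南财经大学 | A data mining method based on XBRL files |
CN106599122A (zh) * | 2016-12-01 | 2017-04-26 | 东北大学 | A parallel frequent closed sequence mining method based on vertical decomposition |
CN106815302A (zh) * | 2016-12-13 | 2017-06-09 | 华中科技大学 | A frequent itemset mining method applied to game item recommendation |
CN106709822A (zh) * | 2017-03-14 | 2017-05-24 | 国家电网公司 | Method and device for mining association relationships in industry electricity consumption data |
CN110532739A (zh) * | 2019-08-30 | 2019-12-03 | 西安邮电大学 | A multithreaded program plagiarism detection method based on frequent pattern mining |
CN111597322A (zh) * | 2019-12-28 | 2020-08-28 | 华南理工大学 | Frequent-itemset-based automatic template mining system and method |
CN111309786A (zh) * | 2020-02-20 | 2020-06-19 | 江西理工大学 | MapReduce-based parallel frequent itemset mining method |
CN111488675A (zh) * | 2020-03-18 | 2020-08-04 | 四川大学 | Method for mining potential triggering patterns of cascading failures in power systems |
CN111309827A (zh) * | 2020-03-23 | 2020-06-19 | 平安医疗健康管理股份有限公司 | Knowledge graph construction method and apparatus, computer system, and readable storage medium |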
Non-Patent Citations (2)
Title |
---|
DAWEN XIA et al.: "A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data", Complexity * |
ZHANG Weidong (张维东) et al.: "Information entropy calculation in data mining using decision trees", Computer Engineering (计算机工程) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
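CN113254755A (zh) * | 2021-07-19 | 2021-08-13 | 南京烽火星空通信发展有限公司 | A parallel association mining method for public opinion based on a distributed framework |
CN115525695A (zh) * | 2022-10-08 | 2022-12-27 | 广东工业大学 | An incremental frequent itemset mining method for real-time streaming data in Internet finance |
CN117111845A (zh) * | 2023-08-18 | 2023-11-24 | 中电云计算技术有限公司 | Data compression method, apparatus, device, and storage medium |
CN117556289A (zh) * | 2024-01-12 | 2024-02-13 | 山东杰出人才发展集团有限公司 | Data-mining-based enterprise digital intelligent operation method and system |
CN117556289B (zh) * | 2024-01-12 | 2024-04-16 | 山东杰出人才发展集团有限公司 | Data-mining-based enterprise digital intelligent operation method and system |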
Also Published As
Publication number | Publication date |
---|---|
CN112925821B (zh) | 2022-05-13 |
Similar Documents
Publication | Title |
---|---|
CN112925821B (zh) | MapReduce-based parallel incremental data mining method for frequent itemsets |
Ramírez-Gallego et al. | An information theory-based feature selection framework for big data under Apache Spark |
US10810169B1 (en) | Hybrid file system architecture, file storage, dynamic migration, and application thereof |
Yagoubi et al. | DPiSAX: Massively distributed partitioned iSAX |
US8935233B2 (en) | Approximate index in relational databases |
US8666971B2 (en) | Intelligent adaptive index density in a database management system |
Zeng | Metric divergence measures and information value in credit scoring |
US10877973B2 (en) | Method for efficient one-to-one join |
Lin et al. | Maintenance algorithm for high average-utility itemsets with transaction deletion |
Qu et al. | Efficient mining of frequent itemsets using only one dynamic prefix tree |
Mao et al. | A MapReduce-based K-means clustering algorithm |
CN113010597B (zh) | A parallel association rule mining method for marine big data |
CN108389152B (zh) | A graph-structure-aware graph processing method and device |
Singh et al. | High average-utility itemsets mining: a survey |
Zhang et al. | Logistics service supply chain order allocation mixed K-Means and QoS matching |
Elmeiligy et al. | An efficient parallel indexing structure for multi-dimensional big data using Spark |
Brabant et al. | k-maxitive Sugeno integrals as aggregation models for ordinal preferences |
CN111309786B (zh) | MapReduce-based parallel frequent itemset mining method |
JP2016095561A (ja) | Control device, distributed database system, method, and program |
Yang et al. | A dynamic balanced quadtree for real-time streaming data |
Rochd et al. | An Efficient Distributed Frequent Itemset Mining Algorithm Based on Spark for Big Data |
Wijayanto et al. | Upgrading products based on existing dominant competitors |
Choi et al. | Optimization of Dominance Testing in Skyline Queries Using Decision Trees |
Han et al. | An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scan |
Ju et al. | Data Cleaning Optimization for Grain Big Data Processing using Task Merging |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20220413. Applicant after: SHAOGUAN University, No. 288 Daxue Road, Zhenjiang District, Shaoguan City, Guangdong Province, 512023. Applicant before: Jiangxi University of Science and Technology, No. 86 Hongqi Avenue, Zhanggong District, Ganzhou City, Jiangxi Province, 341000. |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20231206. Patentee after: Dragon totem Technology (Hefei) Co.,Ltd., Floor 1, Building 2, Phase I, E-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province, 230000. Patentee before: SHAOGUAN University, No. 288 Daxue Road, Zhenjiang District, Shaoguan City, Guangdong Province, 512023. |
TR01 | Transfer of patent right | Effective date of registration: 20231206. Patentee after: BENXI STEEL (GROUP) INFORMATION AUTOMATION CO.,LTD., No. 130 Guangyu Road, Pingshan District, Benxi City, Liaoning Province, 117000. Patentee before: Dragon totem Technology (Hefei) Co.,Ltd., Floor 1, Building 2, Phase I, E-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province, 230000. |