CN114691749B - Method for parallel incremental mining of frequent item sets based on sliding window - Google Patents

Method for parallel incremental mining of frequent item sets based on sliding window

Info

Publication number
CN114691749B
CN114691749B (application CN202210077060.8A)
Authority
CN
China
Prior art keywords
item
frequent
data
data set
sets
Prior art date
Legal status
Active
Application number
CN202210077060.8A
Other languages
Chinese (zh)
Other versions
CN114691749A (en)
Inventor
马汉达
方伟
Current Assignee
Jiangsu University
Original Assignee
Jiangsu University
Priority date
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202210077060.8A priority Critical patent/CN114691749B/en
Publication of CN114691749A publication Critical patent/CN114691749A/en
Application granted granted Critical
Publication of CN114691749B publication Critical patent/CN114691749B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of data processing and analysis, and specifically relates to a method for parallel incremental mining of frequent item sets based on a sliding window, addressing the low operating efficiency of existing parallel incremental mining methods in big data environments. The main implementation steps are: acquire and preprocess the data set; divide it into several incremental batch data sets; mine the frequent item sets and quasi-frequent item sets of a single batch data set; if preceding batch data sets exist in the current window, merge the mining results of the current batch with those of the preceding batches; otherwise persist the incrementally updated frequent and quasi-frequent item sets in the current window and output the frequent item sets; then continue inputting incremental data sets and repeat the incremental mining steps. By introducing techniques such as the sliding window, the invention speeds up the decision of whether an item set is frequent, and by combining Spark parallel computing with Hadoop distributed storage it achieves good mining efficiency.

Description

Method for parallel incremental mining of frequent item sets based on sliding window
Technical Field
The invention belongs to the field of data processing analysis, and particularly relates to a method for parallel incremental mining of frequent item sets based on a sliding window.
Background
Association rules are an important research area in data mining, aimed at finding frequent patterns in a data set. Association rule mining is widely used in shopping recommendation, website click analysis, e-commerce, finance, medical diagnosis, and other fields. Static association rule mining discovers frequent item sets over a fixed data set with a fixed support threshold. In practice, however, the support threshold and the data set change most of the time: incremental association rule mining is frequent pattern mining under a growing data set, and incremental mining of frequent item sets is its main component. When facing large-scale data sets, reading the whole data set into memory at once is often infeasible, since it requires large memory space, incurs huge I/O overhead, and offers poor scalability and performance.
One remedy is to read the data into memory in batches and mine frequent item sets incrementally, but this approach depends heavily on the historical data set when recomputing the candidate item sets after each incremental update, and as historical data keeps accumulating, the task of scanning the whole enlarged data set becomes exceedingly heavy. Other methods accelerate whole-data incremental mining through the Hadoop and Spark distributed computing frameworks. In addition, when frequent item sets are updated incrementally, if the pattern tree is constructed with items ordered by support count in the traditional way, the internal ordering within a mined frequent item set is no longer guaranteed once the supports of its items change, which makes it difficult to match incremental item sets against historical item sets during the update.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method for parallel incremental mining of frequent item sets based on a sliding window, which optimizes the data structures and reduces data scanning work while further improving efficiency on large-scale incremental data by combining a parallelized computing framework.
The technical scheme of the invention is as follows:
a method for parallel incremental mining of frequent item sets based on sliding windows specifically comprises the following steps:
step 1, acquiring a data set;
step 2, data preprocessing is carried out on the acquired data set;
step 3, dividing the data set into n incremental data sets DB_k;
step 4, inputting the divided data sets DB_k into the sliding window batch by batch for incremental mining;
step 5, mining the frequent item sets and quasi-frequent item sets of the current single-batch data set DB_k;
step 6, treating the current batch data set DB_k as the increment of the preceding batch data sets DB_1…k-1, and merging the frequent and quasi-frequent item sets mined from the current batch with those of the preceding batch data sets in the sliding window;
and step 7, acquiring all frequent item sets in the updated current sliding window.
As a further preferable scheme of the method for parallel incremental mining of frequent item sets based on a sliding window of the present invention, in step 2, the data preprocessing comprises numerical encoding of the transaction items in the transaction data set and removal of dirty data.
As a further preferable scheme of the method of the present invention, in step 3, the data set is divided into n parts according to the total number of transactions in the data set, each part denoted DB_k, k ∈ [1, n]; since each part contains the same number of transaction records but the records contain different numbers of transaction items, the resulting data sets DB_k are not exactly equal in size. As a further preferable scheme of the method of the present invention, in step 4, the following definitions are given:
Definition 4.1: the sliding window is a fixed-size window containing m batches of data sets; it behaves like a fixed-length queue of length m, entering at one end and leaving at the other; only m batches of data sets are retained in the window, and when the (m+1)-th incremental batch is input, the 1st batch of data set at the other end of the window is removed, ensuring that the window always holds exactly m batches.
Definition 4.2: incremental mining in the sliding window means that each single-batch data set DB_k input into the window is mined incrementally on the basis of its preceding m-1 batch data sets.
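Definitions 4.1 and 4.2 describe the window as a fixed-length queue. As an illustration only (not part of the patent), the eviction behaviour can be sketched in a few lines of plain Python, with batch labels standing in for whole mined batch results:

```python
from collections import deque

# Sketch of Definition 4.1: a window holding at most m batches; when the
# (m+1)-th batch arrives, the 1st batch at the other end is removed.
# Labels like "DB1" stand in for real batch data sets here.
class SlidingWindow:
    def __init__(self, m):
        self.batches = deque(maxlen=m)  # deque with maxlen drops the oldest entry

    def push(self, batch):
        # Remember the batch about to fall out of the window, if any.
        evicted = self.batches[0] if len(self.batches) == self.batches.maxlen else None
        self.batches.append(batch)
        return evicted

w = SlidingWindow(m=3)
assert w.push("DB1") is None
assert w.push("DB2") is None
assert w.push("DB3") is None
assert w.push("DB4") == "DB1"          # the (m+1)-th input evicts the 1st batch
assert list(w.batches) == ["DB2", "DB3", "DB4"]
```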
As a further preferable scheme of the method for parallel incremental mining of frequent item sets based on sliding window of the present invention, in step 5, the following definitions and steps are included:
Definition 5.1: a frequent item set is an item set whose support sup(items) exceeds the frequent minimum support minsup;
Definition 5.2: a quasi-frequent item set is an item set whose support sup(items) exceeds the quasi-frequent minimum support semisup but is smaller than the frequent minimum support minsup; semisup < minsup must hold;
step 5.3, taking the single-batch data set DB_k as the current window input and reading the data set through textFile;
step 5.4, counting the frequent 1-item sets and quasi-frequent 1-item sets: a flatMap operation maps each transaction record tran(item1, item2, …, itemq) in the data set to 2-tuples (itemq, 1); a reduceByKey aggregation then accumulates the count itemcount of each 1-item set, producing new 2-tuples (itemq, itemcount); finally, a filter operation selects the 1-item sets whose count satisfies semisup*|DB_k| <= itemcount < minsup*|DB_k| as the quasi-frequent 1-item sets L'_D1, and another filter operation selects those with itemcount >= minsup*|DB_k| as the frequent 1-item sets L_D1;
step 5.5, merging the frequent and quasi-frequent 1-item sets into L_s1 = L_D1 + L'_D1, sorting them in the dictionary order of the items, and broadcasting the result to each computing node for use;
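Steps 5.4 and 5.5 amount to a count-filter-merge pass over the batch. The patent performs it with Spark's flatMap/reduceByKey/filter operators; the plain-Python sketch below reproduces only the logic, with an illustrative toy data set and thresholds:

```python
from collections import Counter

# Sketch of steps 5.4-5.5 in plain Python (the patent uses Spark operators).
# minsup and semisup are fractions of the batch size |DB_k|, with semisup < minsup.
def one_itemsets(transactions, minsup, semisup):
    n = len(transactions)
    # Step 5.4: count each 1-item set (flatMap to (item, 1), then reduceByKey).
    counts = Counter(item for tran in transactions for item in set(tran))
    # filter: frequent 1-item sets L_D1 and quasi-frequent 1-item sets L'_D1.
    frequent = {i: c for i, c in counts.items() if c >= minsup * n}
    quasi = {i: c for i, c in counts.items() if semisup * n <= c < minsup * n}
    # Step 5.5: merge into L_s1 and sort in dictionary order (Spark then broadcasts it).
    ls1 = sorted(set(frequent) | set(quasi))
    return frequent, quasi, ls1

db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
freq, quasi, ls1 = one_itemsets(db, minsup=0.6, semisup=0.4)
assert freq == {"a": 3, "c": 3}     # count >= 0.6 * 4 = 2.4
assert quasi == {"b": 2}            # 1.6 <= count < 2.4
assert ls1 == ["a", "b", "c"]       # merged and dictionary-ordered
```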
step 5.6, pruning the transaction data set: according to the counted 1-item sets L_s1, the data set DB_k is rescanned, and every transaction record item not in the 1-item sets L_s1 is removed;
step 5.7, grouping the transaction records by their dictionary-order prefix item: the pruned transaction set DB_k is reread, the items in each transaction record are sorted in dictionary order, and a flatMap operation is executed on each record, enumerating it into one record per suffix; for example, (item1, item2, item3) is enumerated into the three same-suffix records (item1, item2, item3), (item2, item3), (item3); a groupByKey operation then aggregates the transaction records sharing the same prefix item into the same group;
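The suffix enumeration and prefix grouping of step 5.7 can likewise be sketched without Spark; the hypothetical helper below (not from the patent) combines the step 5.6 pruning with the flatMap-plus-groupByKey pattern:

```python
from collections import defaultdict

# Sketch of steps 5.6-5.7: drop items not in L_s1, sort each record in
# dictionary order, enumerate one record per suffix, and group the records
# by their prefix (first) item. Plain Python stands in for flatMap/groupByKey.
def group_by_prefix(transactions, ls1):
    keep = set(ls1)
    groups = defaultdict(list)
    for tran in transactions:
        items = sorted(set(tran) & keep)       # step 5.6 pruning + dictionary order
        for i in range(len(items)):            # enumerate every suffix record
            suffix = tuple(items[i:])
            groups[suffix[0]].append(suffix)   # the prefix item is the group key
    return dict(groups)

g = group_by_prefix([["item1", "item2", "item3"]], ["item1", "item2", "item3"])
assert g["item1"] == [("item1", "item2", "item3")]
assert g["item2"] == [("item2", "item3")]
assert g["item3"] == [("item3",)]
```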
step 5.8, performing, through a foreach operation, the mining of the frequent item sets L_D and quasi-frequent item sets L'_D on the pattern tree Fp-tree constructed from the transaction records in each prefix group; the mining process is the same as in the FP-growth algorithm; during the construction of the pattern tree, the 1-item sets L_s1 broadcast in step 5.5 are read and used as the pattern tree's header table;
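Step 5.8 mines each prefix group with an FP-tree, exactly as in FP-growth. A full FP-growth implementation is beyond a short sketch, so the stand-in below obtains the same frequent and quasi-frequent item sets for a small group by direct subset counting; it illustrates only the two-threshold output of step 5.8, not the patent's tree-based mining, and the records and thresholds (absolute counts here) are illustrative:

```python
from collections import Counter
from itertools import combinations

# Illustrative stand-in for step 5.8: enumerate every subset of every record
# in the prefix group and count it, then split the counts by the two
# thresholds. Equivalent in result to FP-growth on a small group, but not
# the patent's algorithm.
def mine_group(records, minsup_count, semisup_count):
    counts = Counter()
    for rec in records:
        for size in range(1, len(rec) + 1):
            for combo in combinations(rec, size):
                counts[combo] += 1
    frequent = {s: c for s, c in counts.items() if c >= minsup_count}
    quasi = {s: c for s, c in counts.items() if semisup_count <= c < minsup_count}
    return frequent, quasi

recs = [("a", "b"), ("a", "b"), ("a", "c")]
freq, quasi = mine_group(recs, minsup_count=3, semisup_count=2)
assert freq == {("a",): 3}
assert quasi == {("b",): 2, ("a", "b"): 2}
```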
as a further preferable scheme of the method for parallel incremental mining of frequent item sets based on sliding window of the present invention, in step 6, the method comprises the following steps:
step 6.1, on each computing node, according to the prefix item group, reading the frequent item sets PL_D and quasi-frequent item sets PL'_D mined from the preceding batch data sets DB_1…k-1 in the window to which the current prefix belongs, merging the two into PL_s = PL_D + PL'_D, and constructing the item set prefix tree Item-PLTree by sharing prefix paths;
step 6.2, merging the incremental frequent item sets L_D and quasi-frequent item sets L'_D mined in step 5 into the item set prefix tree Item-PLTree along shared prefix paths;
step 6.3, traversing the item set prefix tree Item-PLTree in preorder and pruning the node branches whose support count is smaller than the quasi-frequent support semisup;
step 6.4, traversing the item set prefix tree in preorder, and for each node whose support count is greater than or equal to the frequent support minsup, outputting the path from the root node to that node; these paths are the incrementally updated frequent item sets WL_D of all batch data sets in the current window;
step 6.5, persisting the frequent item sets WL_D and quasi-frequent item sets WL'_D stored in the per-group item set prefix trees in the current window;
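Steps 6.1 to 6.4 hinge on the item set prefix tree. The sketch below (plain Python; the class and function names are illustrative, not the patent's, and counts are absolute) merges itemset-to-count mappings along shared prefix paths, prunes branches below the quasi-frequent count, and collects the frequent item sets by preorder traversal:

```python
# Sketch of steps 6.1-6.4: an item-set prefix tree in the spirit of the
# patent's Item-PLTree. Each node's count is the support of the itemset
# spelled by the path from the root to that node.
class Node:
    def __init__(self):
        self.children = {}
        self.count = 0

def insert(root, itemsets):
    for items, count in itemsets.items():
        node = root
        for item in items:              # itemsets share prefix paths
            node = node.children.setdefault(item, Node())
        node.count += count             # merging adds the increment's count

def prune(node, semisup_count):
    # Simplified pruning: only childless branches below the threshold are cut.
    for item in list(node.children):
        child = node.children[item]
        prune(child, semisup_count)
        if child.count < semisup_count and not child.children:
            del node.children[item]

def frequent_itemsets(node, minsup_count, path=()):
    result = {}
    for item, child in sorted(node.children.items()):   # preorder walk
        new_path = path + (item,)
        if child.count >= minsup_count:
            result[new_path] = child.count              # root-to-node path = itemset
        result.update(frequent_itemsets(child, minsup_count, new_path))
    return result

root = Node()
insert(root, {("a",): 3, ("a", "b"): 2})                 # preceding batches' result
insert(root, {("a",): 2, ("a", "b"): 1, ("c",): 1})      # current batch increment
prune(root, semisup_count=2)                             # step 6.3: drops ("c",)
wl = frequent_itemsets(root, minsup_count=3)             # step 6.4
assert wl == {("a",): 5, ("a", "b"): 3}
```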
Compared with the prior art, the invention has the following advantages:
On the basis of the traditional parallel incremental mining of frequent item sets, the invention introduces the sliding window, dictionary-order prefix grouping of the transaction set, prefix tree updating of the item sets, and persistence of quasi-frequent item sets, which weakens the dependence of the incrementally updated candidate item sets on rescanning the original data set and greatly speeds up deciding whether these candidate item sets are frequent; at the same time, by combining Spark parallel computing with Hadoop distributed storage, the method achieves good mining efficiency and scalability.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of an item set prefix tree constructed during an item set update of the present invention;
fig. 3 is a schematic view of a sliding window according to the present invention.
FIG. 4 is a graph of run-time simulation results comparing the incremental mining of different methods on batched incremental data sets.
FIG. 5 is a graph of run-time simulation results of the method on batched incremental data sets with different numbers of computing nodes.
Detailed Description
As shown in fig. 1, the implementation steps of the specific technology of the present invention are as follows:
and (3) setting up an environment, namely setting up a distributed environment of Spark 2.4.3 and Hadoop 2.9.2 on a Linux cloud platform, wherein the distributed environment comprises a Master node Master and 5 Slave nodes Slave. Step 1, acquiring a data set webdocs. Dat, wherein the size of the data set webdocs. Dat is 1.37GB;
step 2, preprocessing the data of the acquired data set, including the numerical processing of transaction items in the transaction data set, and removing dirty data;
step 3, dividing the data set into n incremental data sets DB k The data set dividing mode is divided into n parts according to the total number of data set transactions, and each part of data set is marked as DB k ,k∈[1,n]The method comprises the steps of carrying out a first treatment on the surface of the Since the transaction records of each data set have the same number, the transaction items of each transaction record have different numbers, and thus each data set DB is finally obtained k Is not absolutely equal in size; currently, n=4 is adopted, and each data set is about 340MB in size and is uniformly stored on a distributed file system HDFS;
step 4, for the divided data set DB k Incremental excavation is carried out according to the batch input sliding window;
definition 4.1, sliding window is defined as a fixed size window containing m batches of data sets, which behaves like a fixed size queue of length m, one head in and the other head out; only m batches of data sets are reserved in the sliding window, when the increment data set of the (m+1) th batch is input, the 1 st batch of data set at the other end of the window needs to be removed, and only m fixed batches of data sets are ensured in the window.
Definition 4.2 incremental mining in sliding Window is defined as single batch dataset DB in each input Window k The incremental mining is to be performed on the basis of its preamble m-1 batch data sets.
Step 5, excavating the current single batchSecondary data set DB k Frequent item sets and quasi-frequent item sets;
definition 5.1: a frequent item set is an item set whose support sup(items) exceeds the frequent minimum support minsup;
definition 5.2: a quasi-frequent item set is an item set whose support sup(items) exceeds the quasi-frequent minimum support semisup but is smaller than the frequent minimum support minsup; semisup < minsup must hold;
step 5.3, taking the single-batch data set DB_k as the current window input and reading the data set through textFile;
step 5.4, counting the frequent 1-item sets and quasi-frequent 1-item sets: a flatMap operation maps each transaction record tran(item1, item2, …, itemq) in the data set to 2-tuples (itemq, 1); a reduceByKey aggregation then accumulates the count itemcount of each 1-item set, producing new 2-tuples (itemq, itemcount); finally, a filter operation selects the 1-item sets whose count satisfies semisup*|DB_k| <= itemcount < minsup*|DB_k| as the quasi-frequent 1-item sets L'_D1, and another filter operation selects those with itemcount >= minsup*|DB_k| as the frequent 1-item sets L_D1;
step 5.5, merging the frequent and quasi-frequent 1-item sets into L_s1 = L_D1 + L'_D1, sorting them in the dictionary order of the items, and broadcasting the result to each computing node for use;
step 5.6, pruning the transaction data set: according to the counted 1-item sets L_s1, the data set DB_k is rescanned, and every transaction record item not in the 1-item sets L_s1 is removed;
step 5.7, grouping the transaction records by their dictionary-order prefix item: the pruned transaction set DB_k is reread, the items in each transaction record are sorted in dictionary order, and a flatMap operation is executed on each record, enumerating it into one record per suffix; for example, (item1, item2, item3) is enumerated into the three same-suffix records (item1, item2, item3), (item2, item3), (item3); a groupByKey operation then aggregates the transaction records sharing the same prefix item into the same group;
step 5.8, performing, through a foreach operation, the mining of the frequent item sets L_D and quasi-frequent item sets L'_D on the pattern tree Fp-tree constructed from the transaction records in each prefix group; the mining process is the same as in the FP-growth algorithm; during the construction of the pattern tree, the 1-item sets L_s1 broadcast in step 5.5 are read and used as the pattern tree's header table;
step 6, treating the current batch data set DB_k as the increment of the preceding batch data sets DB_1…k-1, and merging the frequent and quasi-frequent item sets mined from the current batch with those of the preceding batch data sets in the sliding window;
step 6.1, on each computing node, according to the prefix item group, reading the frequent item sets PL_D and quasi-frequent item sets PL'_D mined from the preceding batch data sets DB_1…k-1 in the window to which the current prefix belongs, merging the two into PL_s = PL_D + PL'_D, and constructing the item set prefix tree Item-PLTree by sharing prefix paths;
step 6.2, merging the incremental frequent item sets L_D and quasi-frequent item sets L'_D mined in step 5 into the item set prefix tree Item-PLTree along shared prefix paths;
step 6.3, traversing the item set prefix tree Item-PLTree in preorder and pruning the node branches whose support count is smaller than the quasi-frequent support semisup;
step 6.4, traversing the item set prefix tree in preorder, and for each node whose support count is greater than or equal to the frequent support minsup, outputting the path from the root node to that node; these paths are the incrementally updated frequent item sets WL_D of all batch data sets in the current window;
step 6.5, persisting the frequent item sets WL_D and quasi-frequent item sets WL'_D stored in the per-group item set prefix trees in the current window;
step 7, acquiring all the frequent item sets WL_D in the updated current sliding window.
Simulation results:
As can be seen from the results of FIG. 4, as incremental data sets DB_k keep arriving, the historical data of the traditional incremental mining method grows, and the total data set that must be rescanned to confirm the supports of the candidate sets after each incremental update grows with it, so the traditional method spends more and more time on this confirmation, as shown by the rising curve in the figure. In the present method, owing to the strategies of the sliding window, the quasi-frequent item sets, and the prefix-group updating, the dependence of the incremental candidate sets on the historical data set decreases as incremental data sets DB_k arrive, and the run time tends to stabilize rather than increase linearly, as shown by the declining curve.
From the results of FIG. 5, it can be seen that on the batched incremental data sets the running time of the method tends to decrease as the number of computing nodes increases, which demonstrates the effectiveness and scalability of the method's distributed parallel design.

Claims (3)

1. The method for parallel incremental mining of frequent item sets based on the sliding window is characterized by comprising the following steps:
step 1, acquiring a data set;
step 2, data preprocessing is carried out on the acquired data set;
step 3, dividing the data set into n incremental data sets DB_k;
step 4, inputting the divided data sets DB_k into the sliding window batch by batch for incremental mining;
in step 4, there are the following definitions:
definition 4.1: the sliding window is a fixed-size window containing m batches of data sets; it behaves like a fixed-length queue of length m, entering at one end and leaving at the other; only m batches of data sets are retained in the window, and when the (m+1)-th incremental batch is input, the 1st batch of data set at the other end of the window is removed, ensuring that the window always holds exactly m batches;
definition 4.2: incremental mining in the sliding window means that each single-batch data set DB_k input into the window is mined incrementally on the basis of its preceding m-1 batch data sets;
step 5, mining the frequent item sets and quasi-frequent item sets of the current single-batch data set DB_k;
in step 5, the following definitions and steps are included:
definition 5.1: a frequent item set is an item set whose support sup(items) exceeds the frequent minimum support minsup;
definition 5.2: a quasi-frequent item set is an item set whose support sup(items) exceeds the quasi-frequent minimum support semisup but is smaller than the frequent minimum support minsup; semisup < minsup must hold;
step 5.3, taking the single-batch data set DB_k as the current window input and reading the data set through textFile;
step 5.4, counting the frequent 1-item sets and quasi-frequent 1-item sets: a flatMap operation maps each transaction record tran(item1, item2, …, itemq) in the data set to 2-tuples (itemq, 1); a reduceByKey aggregation then accumulates the count itemcount of each 1-item set, producing new 2-tuples (itemq, itemcount); finally, a filter operation selects the 1-item sets whose count satisfies semisup*|DB_k| <= itemcount < minsup*|DB_k| as the quasi-frequent 1-item sets L'_D1, and another filter operation selects those with itemcount >= minsup*|DB_k| as the frequent 1-item sets L_D1;
step 5.5, merging the frequent and quasi-frequent 1-item sets into L_s1 = L_D1 + L'_D1, sorting them in the dictionary order of the items, and broadcasting the result to each computing node for use;
step 5.6, pruning the transaction data set: according to the counted 1-item sets L_s1, the data set DB_k is rescanned, and every transaction record item not in the 1-item sets L_s1 is removed;
step 5.7, grouping the transaction records by their dictionary-order prefix item: the pruned transaction set DB_k is reread, the items in each transaction record are sorted in dictionary order, and a flatMap operation is executed on each record, enumerating it into one record per suffix, for example (item1, item2, item3) can be enumerated as the three same-suffix records (item1, item2, item3), (item2, item3), (item3); a groupByKey operation then aggregates the transaction records sharing the same prefix item into the same group;
step 5.8, performing, through a foreach operation, the mining of the frequent item sets L_D and quasi-frequent item sets L'_D on the pattern tree Fp-tree constructed from the transaction records in each prefix group; the mining process is the same as in the FP-growth algorithm; during the construction of the pattern tree, the 1-item sets L_s1 broadcast in step 5.5 are read and used as the pattern tree's header table;
step 6, treating the current batch data set DB_k as the increment of the preceding batch data sets DB_1…k-1, and merging the frequent and quasi-frequent item sets mined from the current batch with those of the preceding batch data sets in the sliding window;
in step 6, the method comprises the following steps:
step 6.1, on each computing node, according to the prefix item group, reading the frequent item sets PL_D and quasi-frequent item sets PL'_D mined from the preceding batch data sets DB_1…k-1 in the window to which the current prefix belongs, merging the two into PL_s = PL_D + PL'_D, and constructing the item set prefix tree Item-PLTree by sharing prefix paths;
step 6.2, merging the incremental frequent item sets L_D and quasi-frequent item sets L'_D mined in step 5 into the item set prefix tree Item-PLTree along shared prefix paths;
step 6.3, traversing the item set prefix tree Item-PLTree in preorder and pruning the node branches whose support count is smaller than the quasi-frequent support semisup;
step 6.4, traversing the item set prefix tree in preorder, and for each node whose support count is greater than or equal to the frequent support minsup, outputting the path from the root node to that node; these paths are the incrementally updated frequent item sets WL_D of all batch data sets in the current window;
step 6.5, persisting the frequent item sets WL_D and quasi-frequent item sets WL'_D stored in the per-group item set prefix trees in the current window;
and step 7, acquiring all frequent item sets in the updated current sliding window.
2. The method for parallel incremental mining of frequent item sets based on sliding windows according to claim 1, wherein: in step 2, the data preprocessing includes numerical encoding of the transaction items in the transaction data set and removal of dirty data.
3. The method for parallel incremental mining of frequent item sets based on sliding windows according to claim 1, wherein: in step 3, the data set is divided into n parts based on the total number of transactions in the data set, each part denoted DB_k, k ∈ [1, n]; since each part contains the same number of transaction records but the records contain different numbers of transaction items, the resulting data sets DB_k are not exactly equal in size.
CN202210077060.8A 2022-05-11 2022-05-11 Method for parallel incremental mining of frequent item sets based on sliding window Active CN114691749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210077060.8A CN114691749B (en) 2022-05-11 2022-05-11 Method for parallel incremental mining of frequent item sets based on sliding window

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210077060.8A CN114691749B (en) 2022-05-11 2022-05-11 Method for parallel incremental mining of frequent item sets based on sliding window

Publications (2)

Publication Number Publication Date
CN114691749A (en) 2022-07-01
CN114691749B (en) 2024-03-19

Family

ID=82137948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210077060.8A Active CN114691749B (en) 2022-05-11 2022-05-11 Method for parallel incremental mining of frequent item sets based on sliding window

Country Status (1)

Country Link
CN (1) CN114691749B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110096302A (en) * 2010-02-22 2011-08-30 숭실대학교산학협력단 Apparatus and method for association rule mining using frequent pattern-tree for incremental data processing
CN103984723A (en) * 2014-05-15 2014-08-13 江苏易酒在线电子商务有限公司 Method used for updating data mining for frequent item by incremental data
CN107391621A (en) * 2017-07-06 2017-11-24 南京邮电大学 A kind of parallel association rule increment updating method based on Spark
CN109471877A (en) * 2018-11-01 2019-03-15 中南大学 Increment type tense frequent mode P mining method towards flow data
CN110222090A (en) * 2019-06-03 2019-09-10 哈尔滨工业大学(威海) A kind of mass data Mining Frequent Itemsets


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallel incremental updating algorithm for association rules based on MapReduce; Cheng Guang; Wang Xiaofeng; Computer Engineering (Issue 02); 27-31+38 *

Also Published As

Publication number Publication date
CN114691749A (en) 2022-07-01

Similar Documents

Publication Publication Date Title
Holley et al. Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs
US11232085B2 (en) Outlier detection for streaming data
Davidson et al. Efficient parallel merge sort for fixed and variable length keys
US10049134B2 (en) Method and system for processing queries over datasets stored using hierarchical data structures
US10042914B2 (en) Database index for constructing large scale data level of details
US20140214334A1 (en) Efficient genomic read alignment in an in-memory database
US8935233B2 (en) Approximate index in relational databases
CN103761236A (en) Incremental frequent pattern increase data mining method
CN114168608B (en) Data processing system for updating knowledge graph
Xu et al. Distributed maximal clique computation and management
CN112925821B (en) MapReduce-based parallel frequent item set incremental data mining method
Wheatman et al. A parallel packed memory array to store dynamic graphs
Gazzarri et al. End-to-end task based parallelization for entity resolution on dynamic data
Kim et al. Real-time stream data mining based on CanTree and Gtree
Lee et al. Efficient approach of sliding window-based high average-utility pattern mining with list structures
CN113761390B (en) Method and system for analyzing attribute intimacy
CN106599122B (en) Parallel frequent closed sequence mining method based on vertical decomposition
CN108334532B (en) Spark-based Eclat parallelization method, system and device
Singh et al. High average-utility itemsets mining: a survey
US20190073195A1 (en) Computing device sort function
CN114691749B (en) Method for parallel incremental mining of frequent item sets based on sliding window
CN108596390B (en) Method for solving vehicle path problem
Wang et al. Improving online aggregation performance for skewed data distribution
Ceccarello et al. Distributed graph diameter approximation
Nowakiewicz et al. BIPie: fast selection and aggregation on encoded data using operator specialization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant