CN108520027B - GPU accelerated frequent item set mining method based on CUDA framework - Google Patents
- Publication number
- CN108520027B (application CN201810255238.7A)
- Authority
- CN
- China
- Prior art keywords
- item
- item set
- gpu
- queue
- frequent item
- Prior art date
- Legal status
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of computer applications and provides a GPU-accelerated frequent item set mining method based on the CUDA framework, using graph connection and a dynamic queue. The method combines the strengths of the Apriori and Eclat algorithms: it converts the logically complex candidate-set generation task into a compute-intensive task suited to the CUDA computation model, and it uses a dynamic queue to work around the GPU's global memory limit, for example when the memory required after converting a discrete data set into a vertical bitmap exceeds that limit. Experiments show that on large data sets of various types the method outperforms the serial algorithms in acceleration, with markedly higher throughput, and that the extracted frequent item sets are accurate and reliable. In practical engineering applications the method offers advantages that other algorithms cannot match.
Description
Technical Field
The invention belongs to the technical field of computer applications and relates to a frequent item set mining method within association rule mining; in particular, it concerns a method that uses a GPU to accelerate the Apriori algorithm for frequent item set mining on large-scale data sets.
Background
Frequent item set mining, the main step of association rule mining, is used to derive association rules; it plays a major role in discovering associations between items and is widely applied in financial prediction, medical diagnosis, business recommendation, and so on. Its purpose is to find potential item combinations, the frequent item sets, in a data set; the criterion is that an item set is frequent if its support is no smaller than a minimum support threshold.
The problem is described as follows: I = {i1, i2, …, im} denotes the set of items, and D = {T1, T2, …, Tn} is a data set of n transactions. A k-item set is an item set containing k elements. An association rule is an implication of the form X => Y, where X ⊆ I, Y ⊆ I, and X ∩ Y = ∅. The support of the rule X => Y in D is the percentage of transactions in the whole transaction data set that contain X ∪ Y. The main association rule mining algorithms are Apriori, Eclat, and FP-growth.
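The support definition above can be illustrated with a small Python sketch (a toy example, not part of the patent; the data and names are illustrative):

```python
# Toy illustration of the support definition: D is a list of transactions,
# each transaction a set of items; support(X) is the fraction of
# transactions that contain every item of X.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# The support of a rule X => Y is the support of X ∪ Y.
print(support({"bread", "milk"}, D))  # 0.5 (transactions 1 and 3 of 4)
```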
The Apriori algorithm generates frequent item sets from a horizontally stored data structure in breadth-first order. Apriori generates the (k+1)-item candidates from the k-item sets, called the join step, and then obtains the support of each candidate in the support computation step. The candidate item sets in the support computation step are mutually independent, so the parallelism is high. The Eclat algorithm uses a bitmap-style vertically stored data structure with depth-first search over a prefix tree, achieving efficient queries through longest-common-prefix matching. FP-growth is a tree-based algorithm that requires no candidate generation. Although performance analysis and comparison show FP-growth to be the best serial performer among the three, its data structure and computation order do not adapt well to parallel computation. Apriori, despite its poorer serial performance, adopts a generate-then-count processing mode, so it contains data-independent computation tasks and is better suited to parallel computation under the CUDA framework.
Although the Apriori algorithm can be accelerated in parallel under the CUDA framework, when many candidates are generated or the minimum support threshold is low, the algorithm still has to generate a large number of candidate sets and compute their supports by repeatedly scanning the database, and the frequent external-memory I/O makes it inefficient. The Eclat algorithm avoids repeated database scans when computing support, but it does not exploit the Apriori property and therefore generates many candidates; moreover, for a discrete data set, the memory required by the generated bitmap may exceed the GPU's global memory limit. Finally, neither method can map the candidate-generation step, which is logically complex and has data interdependencies, onto the CUDA framework.
Disclosure of Invention
In the existing association rule mining algorithms, when applied in practice to large-scale data sets, the processing efficiency cannot meet real-time requirements, the algorithms do not adapt well to sparse data sets, and the candidate-set generation step of the original algorithms is difficult to port directly to the CUDA (Compute Unified Device Architecture) framework for acceleration.
In order to solve these problems in the prior art, the invention provides a GPU-accelerated frequent item set mining method based on the CUDA framework, using graph connection and a dynamic queue. The method combines the strengths of the Apriori and Eclat algorithms: it converts the logically complex candidate-set generation task into a compute-intensive task suited to the CUDA computation model, and it uses a dynamic queue to work around the GPU's global memory limit, for example when the memory required after converting a discrete data set into a vertical bitmap exceeds that limit.
The technical scheme of the invention is as follows:
a GPU accelerated frequent item set mining method based on a CUDA framework comprises the following steps:
(1) In a conventional transaction data set, each transaction lists all the items it contains. This is converted so that each item maps to the IDs (Tids) of all transactions containing it, with presence encoded in bit-table form; that is, the originally horizontal transaction data set is converted into a vertical bit-table data set <item, bitmap>, and the vertical bit table is copied to the GPU;
(2) Count the number of 1 bits in each item's bit table to obtain that item's support; compare each item's support with the required minimum support threshold and complete the pruning operation to obtain L1;
(3) From the L1 obtained in step (2), generate the candidate set C2 by graph connection based on an adjacency matrix. The L1 item sets form the vertex set V, and the adjacency matrix is a |V| × |V| square matrix S. If element S_IJ is 1, item set I and item set J are in the same equivalence class and can be joined to yield C2; if S_IJ is 0, they are not in the same equivalence class and cannot be joined. Because the connection of an item set with itself is invalid and connections between item sets are undirected, the diagonal and lower-triangular elements of S are all 0;
(4) For C2, append the bit tables corresponding to the frequent item sets, in dictionary order, to the tails of queue I and queue II, as shown in FIG. 2(e). Dynamically adjust the maximum length of the bit-table queue according to the GPU's global memory size, copy the whole queue to the GPU, AND the bit tables of queue I and queue II to obtain a result bit-table queue, and apply step (2) to the result queue to obtain L2; the part exceeding the maximum length is added to a new queue to await the next pass;
(5) Apply connection preprocessing to each (k-1)-order frequent item set of L_{k-1}, packing each one into a single number. Let P be the number of digits of the largest element in the frequent item sets; the item set I = {i1, i2, …, i_{k-1}} is packed by formula (1) into a number N:

    N = i1·10^(P·(k-2)) + i2·10^(P·(k-3)) + … + i_{k-1}    (1)

After connection preprocessing, L_{k-1} = {N1, N2, …, Nm}, where m is the number of (k-1)-order frequent item sets;
(6) Copy the preprocessed L_{k-1} into the GPU's shared memory to reduce access latency and execute step (3), generating C_k by graph connection, as shown in FIG. 3(a); the L_{k-1} item sets form the vertex set V, and the adjacency matrix is a |V| × |V| square matrix S;
(7) In the subsequent rounds, execute step (4) according to the S matrix of C_{k-1} to obtain L_k, as shown in FIG. 3(b);
(8) In the subsequent rounds with k > 4, to prevent the N value obtained from the connection preprocessing of step (5) from exceeding the range of long int, replace each equivalence class in the item sets with an index number, establishing an equivalence-class index, so that generating L_k from L_{k-1} follows the same computation process as generating L3 from L2, as shown in FIG. 4;
(9) Repeat steps (5), (6), (7), and (8) until all elements of the generated S matrix are 0, that is, until no new frequent item set can be found.
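Steps (1) and (2) above can be sketched on the CPU in a few lines of Python (an illustrative stand-in for the GPU implementation; the data and names are not from the patent):

```python
# CPU sketch of steps (1)-(2): horizontal transactions -> vertical bit table,
# then popcount-based support counting and pruning to obtain L1.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
min_support_count = 2

# Step (1): build <item, bitmap>; bit t of an item's bitmap is 1
# iff transaction t contains the item.
bitmaps = {}
for tid, t in enumerate(transactions):
    for item in t:
        bitmaps[item] = bitmaps.setdefault(item, 0) | (1 << tid)

# Step (2): support = number of 1 bits; prune against the threshold.
L1 = {item: bm for item, bm in bitmaps.items()
      if bin(bm).count("1") >= min_support_count}
print(sorted(L1))  # ['a', 'b', 'c']
```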
The beneficial effects of the invention are as follows: through graph connection and equivalence-class indexing, the method converts the logically intensive candidate-generation task into a compute-intensive one, makes full use of GPU shared memory, and completes efficient candidate-set generation while avoiding branch divergence. Meanwhile, the dynamic queue exploits the computational strengths of both the original Apriori and Eclat algorithms, works around the GPU global memory limit, and, by dynamically adjusting the queue length, reduces the number of CUDA kernel launches, so the GPU delivers good acceleration when applied to the algorithm. Experiments show that on large data sets of various types the method outperforms the serial algorithms in acceleration, with markedly higher throughput, and that the extracted frequent item sets are accurate and reliable. In practical engineering applications the method offers advantages that other algorithms cannot match. For example, in a medical big-data leukemia-diagnosis project, association rules can be generated from the valuable extracted frequent item sets for efficient disease diagnosis.
Drawings
FIG. 1 is a flow chart of the method;
FIG. 2 is a pictorial representation of the overall data conversion step;
FIG. 3 is a schematic diagram of a candidate generation and support calculation step algorithm process;
FIG. 4 is a diagram of equivalence class indexing in a graph-join iterative implementation.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described in detail below with reference to the drawings. The overall method flow is shown in FIG. 1.
In the first step, a conventional horizontally stored data set is converted into a vertically stored bit-table data structure:
1) In the original transaction data set each row represents one transaction, and each transaction has a transaction ID (Tid); the original database is scanned and converted to vertical storage;
2) The vertically stored transaction data set is bitized into a vertical bit-table storage structure <item, bitmap>: each item maps to the Tid set of the transactions containing it, and that Tid set is bitized. If the item appears in a transaction, the bit at that transaction's position in the item's row of the two-dimensional bit table is 1; otherwise it is 0.
In the second step, the graph connection matrix is initialized in preparation for the CUDA kernel computation:
graphLength: the square of the number of L_{k-1} item sets
graph_d: a one-dimensional device-side array of length graphLength
The kernel is launched as initializeMaskArray<<<grid_dim, block_dim>>>(graphLength, graph_d).
Each thread computes index = threadIdx.x + blockDim.x * blockIdx.x and, when index < graphLength, initializes graph_d[index] to -1.
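The indexing scheme of this initialization kernel can be emulated on the CPU as follows (a Python stand-in for the CUDA kernel; the grid and block sizes are illustrative, not prescribed by the patent):

```python
# CPU emulation of initializeMaskArray: each (blockIdx, threadIdx) pair maps
# to index = threadIdx.x + blockDim.x * blockIdx.x, and in-range entries of
# graph_d are initialized to -1 (meaning "no join decided yet").
def initialize_mask_array(graph_length, block_dim, grid_dim):
    graph_d = [None] * graph_length
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            index = thread_idx + block_dim * block_idx
            if index < graph_length:  # guard against out-of-range threads
                graph_d[index] = -1
    return graph_d

print(initialize_mask_array(5, 4, 2))  # [-1, -1, -1, -1, -1]
```

Note that with 2 blocks of 4 threads, only the first 8 entries of a longer array would be written; a real launch would size the grid to cover graphLength.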
In the third step, the data for the graph join operation are initialized and candidate generation starts:
1) When k > 4, an equivalence-class index is established so that generating L_k from L_{k-1} follows the same computation process as generating L3 from L2; the specific process is shown in FIG. 4. Starting from the first item of the item set, each of the items up to item k-2 is compared with the corresponding items of the next item set to judge whether they are the same.
2) Apply connection preprocessing to each (k-1)-order frequent item set of L_{k-1}:
P: the number of digits of the largest element of the data set
mul_factor: 10 to the power P
The concatenation preprocessing is then performed accordingly; for example, when generating C3 from L2, new_patterns[i] = C2_list[i][0] * mul_factor + C2_list[i][1].
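The concatenation preprocessing can be sketched in Python as follows (illustrative names; the packing follows formula (1), base 10^P):

```python
# Sketch of the connection preprocessing: pack a (k-1)-item set into a single
# integer N in base 10**P, where P is the digit width of the largest item.
def concat_itemset(itemset, P):
    mul_factor = 10 ** P
    N = 0
    for item in itemset:
        N = N * mul_factor + item  # shift left by P digits, append next item
    return N

# e.g. L2 -> C3 with P = 2: [12, 34] packs to 12*100 + 34
print(concat_itemset([12, 34], 2))  # 1234
```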
3) The arrays and variables required by the following graph join operations are first defined and initialized:
Lk-1_new_pattern_d: the L_{k-1} item sets after connection preprocessing, {N1, N2, …, N_item_index}; the array formed after equivalence-class-index preprocessing, copied to the device
graphk-1_d: a one-dimensional device-side array of length graphLength
item_index: the number of L_{k-1} frequent item sets
4) Candidates are generated by the graph connection method:
graphJoinKernel<<<grid_dim, block_dim>>>(Lk-1_new_pattern_d, graphk-1_d, item_index, power)
Lk-1_new_pattern_d is stored in shared memory. If item_index is smaller than the maximum number of threads in a thread block, a one-dimensional thread block is used and Lk-1_new_pattern_d is placed into __shared__ side1[MAX_SIDE_LEN]. Each element Ei of side1 is traversed with j = i + 1; if side1[Ei] / power == side1[Ej] / power, i.e. the two have the same equivalence class and can be graph-joined, then S[i * item_index + j] = 0. If item_index is greater than the maximum number of threads in a thread block, a two-dimensional thread block is used and Lk-1_new_pattern_d is placed into __shared__ side1[MAX_SIDE_LEN] and __shared__ side2[MAX_SIDE_LEN]. Each element Ei of side1 and each element Ej of side2 are traversed; if side1[Ei] / P == side2[Ej] / P and i < j, i.e. the two have the same equivalence class and can be graph-joined, then S[i * item_index + j] = 0, generating the candidate set. The specific process is shown in FIG. 3(a).
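The graph join can be sketched serially in Python (a CPU stand-in for graphJoinKernel; note the kernel described here writes 0 into the matrix to mark a joinable pair, after initializing entries to -1, while the claims describe the abstract matrix with 1 meaning joinable; this sketch follows the kernel's convention):

```python
# CPU sketch of the graph-join step on packed itemsets: two packed numbers
# are joinable iff they share the same (k-2)-prefix, i.e. N // mul_factor
# is equal ("same equivalence class"); only the upper triangle is checked.
def graph_join(packed, P):
    mul = 10 ** P
    n = len(packed)
    S = [[-1] * n for _ in range(n)]          # -1: no join decided yet
    candidates = []
    for i in range(n):
        for j in range(i + 1, n):             # i < j: undirected, no self-join
            if packed[i] // mul == packed[j] // mul:
                S[i][j] = 0                   # 0 marks "joinable" per the kernel
                candidates.append(packed[i] * mul + packed[j] % mul)
    return candidates

# L2 = {12, 13, 14, 23} packed with P = 1:
# {12,13}->123, {12,14}->124, {13,14}->134; 23 shares no prefix with the rest.
print(graph_join([12, 13, 14, 23], 1))  # [123, 124, 134]
```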
In the fourth step, the bit tables corresponding to the generated candidate sets are stored into the dynamic queue:
1) Determine the queue length MAX_LENGTH according to the size of the GPU global memory;
2) Determine the required number of passes, iter, according to MAX_LENGTH and the bit-table length corresponding to the actual candidate set;
3) Initialize the frequent-item bit-table queues src_list_1[MAX_LENGTH] and src_list_2[MAX_LENGTH] and the result queue dst_list[MAX_LENGTH]; the specific process is shown in FIG. 2(e).
In the fifth step, the support of the bit tables in the dynamic queue is computed:
1) Store the dst_list obtained in round k-1 into src_list_1 and src_list_2 according to the queue length, in preparation for the round-k support computation, and copy src_list_1 and src_list_2 to the device;
2) Each thread processes one 32-bit int: the bit tables of src_list_1 and src_list_2 are ANDed to obtain tmp[threadIdx.x], __popc(tmp[threadIdx.x]) counts the 1 bits in each thread's result, and the per-thread counts are summed into sup[0], which is the occurrence count of the candidate; the specific process is shown in FIG. 3(b).
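The queued support computation reduces to an elementwise AND followed by a popcount (the role CUDA's __popc intrinsic plays); a serial Python sketch with illustrative 4-bit bitmaps:

```python
# CPU sketch of the queued support step: AND the two bitmap queues
# elementwise, then count the 1 bits of each result (popcount).
def queue_support(src_list_1, src_list_2):
    dst_list = [a & b for a, b in zip(src_list_1, src_list_2)]
    supports = [bin(x).count("1") for x in dst_list]
    return dst_list, supports

dst, sup = queue_support([0b1101, 0b0111], [0b1011, 0b0110])
print(sup)  # [2, 2]: 0b1101 & 0b1011 = 0b1001, 0b0111 & 0b0110 = 0b0110
```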
In the sixth step, the pruning operation is completed according to the minimum support, and the next round of frequent-item generation continues:
When the computation finishes, the occurrence count of each candidate item set in the transaction data set is converted into its support according to the total number of transactions. Item sets below the minimum support are deleted by comparison with the minimum support threshold, forming a new dst_list for the next round of frequent item set generation, until no new frequent item set can be found. From the frequent item sets finally obtained, the corresponding association rules can be conveniently derived.
Claims (1)
1. A GPU accelerated frequent item set mining method based on a CUDA framework is characterized by comprising the following steps:
(1) In a conventional transaction data set, each transaction lists all the items it contains. This is converted so that each item maps to the IDs (Tids) of all transactions containing it, with presence encoded in bit-table form; that is, the originally horizontal transaction data set is converted into a vertical bit-table data set <item, bitmap>, and the vertical bit table is copied to the GPU;
(2) Count the number of 1 bits in each item's bit table to obtain that item's support; compare each item's support with the required minimum support threshold and complete the pruning operation to obtain L1;
(3) From the L1 obtained in step (2), generate the candidate set C2 by graph connection based on an adjacency matrix. The L1 item sets form the vertex set V, and the adjacency matrix is a |V| × |V| square matrix S. If element S_IJ is 1, item set I and item set J are in the same equivalence class and can be joined to yield C2; if S_IJ is 0, they are not in the same equivalence class and cannot be joined. Because the connection of an item set with itself is invalid and connections between item sets are undirected, the diagonal and lower-triangular elements of S are all 0;
(4) For C2, append the bit tables corresponding to the frequent item sets, in the dictionary order of the square matrix S, to the tails of queue I and queue II; dynamically adjust the maximum length of the bit-table queue according to the GPU's global memory size, copy the whole queue to the GPU, AND the bit tables of queue I and queue II to obtain a result bit-table queue, and apply step (2) to the result queue to obtain L2; the part exceeding the maximum length is added to a new queue to await the next pass;
(5) Apply connection preprocessing to each (k-1)-order frequent item set of L_{k-1}, packing each one into a single number. Let P be the number of digits of the largest element in the frequent item sets; the item set I = {i1, i2, …, i_{k-1}} is packed by formula (1) into a number N:

    N = i1·10^(P·(k-2)) + i2·10^(P·(k-3)) + … + i_{k-1}    (1)

After connection preprocessing, L_{k-1} = {N1, N2, …, Nm}, where m is the number of (k-1)-order frequent item sets;
(6) Copy the preprocessed L_{k-1} into the GPU's shared memory to reduce access latency, execute step (3), and generate C_k by graph connection; the L_{k-1} item sets form the vertex set V, and the adjacency matrix is a |V| × |V| square matrix S;
(7) In the subsequent rounds, execute step (4) according to C_{k-1} to obtain L_k;
(8) In the subsequent rounds with k > 4, to prevent the N value obtained from the connection preprocessing of step (5) from exceeding the range of long int, replace each equivalence class in the item sets with an index number to establish an equivalence-class index, so that generating L_k from L_{k-1} follows the same computation process as generating L3 from L2;
(9) Repeat steps (5), (6), (7), and (8) until all elements of the generated square matrix S are 0, that is, until no new frequent item set can be found.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810255238.7A CN108520027B (en) | 2018-03-20 | 2018-03-20 | GPU accelerated frequent item set mining method based on CUDA framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108520027A CN108520027A (en) | 2018-09-11 |
CN108520027B true CN108520027B (en) | 2020-09-29 |
Family
ID=63434383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810255238.7A Active CN108520027B (en) | 2018-03-20 | 2018-03-20 | GPU accelerated frequent item set mining method based on CUDA framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108520027B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783464B (en) * | 2018-12-21 | 2022-11-04 | 昆明理工大学 | Spark platform-based frequent item set mining method |
CN110489411B (en) * | 2019-07-11 | 2023-08-22 | 齐鲁工业大学 | Association rule mining method based on effective value storage and operation mode |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102662642A (en) * | 2012-04-20 | 2012-09-12 | 浪潮电子信息产业股份有限公司 | Parallel processing method based on nested sliding window and genetic algorithm |
CN103279332A (en) * | 2013-06-09 | 2013-09-04 | 浪潮电子信息产业股份有限公司 | Data flow parallel processing method based on GPU-CUDA platform and genetic algorithm |
CN103559016A (en) * | 2013-10-23 | 2014-02-05 | 江西理工大学 | Frequent subgraph excavating method based on graphic processor parallel computing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10445323B2 (en) * | 2015-09-30 | 2019-10-15 | University Of Virginia Patent Foundation | Association rule mining with the micron automata processor |
- 2018-03-20: CN CN201810255238.7A patent/CN108520027B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102662642A (en) * | 2012-04-20 | 2012-09-12 | 浪潮电子信息产业股份有限公司 | Parallel processing method based on nested sliding window and genetic algorithm |
CN103279332A (en) * | 2013-06-09 | 2013-09-04 | 浪潮电子信息产业股份有限公司 | Data flow parallel processing method based on GPU-CUDA platform and genetic algorithm |
CN103559016A (en) * | 2013-10-23 | 2014-02-05 | 江西理工大学 | Frequent subgraph excavating method based on graphic processor parallel computing |
Non-Patent Citations (3)
Title |
---|
GMiner: A fast GPU-based frequent itemset mining method for large-scale data; Kang-Wook Chon; Information Sciences; 2018-01-31; pp. 19-38 *
Parallel Frequent Patterns Mining Algorithm on GPU; Jiayi Zhou; IEEE; 2010-11-22; pp. 435-440 *
GPU-based closed frequent itemset mining method (基于GPU的闭合频繁项集挖掘方法); Li Haifeng; Computer Engineering; 2011-07-31; pp. 59-61 *
Also Published As
Publication number | Publication date |
---|---|
CN108520027A (en) | 2018-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||