CN106874479A

CN106874479A - Improved method and device for the FP-Growth algorithm based on FPGA

Info

Publication number: CN106874479A
Application number: CN201710088020.2A
Authority: CN
Inventors: 曹芳; 陈继承; 王洪伟
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2017-02-19
Filing date: 2017-02-19
Publication date: 2017-06-20

Abstract

The present invention relates to the FP-Growth algorithm, and more particularly to an FPGA-based method and device for improving the FP-Growth algorithm, in the field of machine learning algorithm processing. First, the database in a Spark cluster is scanned to obtain the frequent item sets; the frequent item sets are divided into groups; one FPGA board is added to each node of the Spark cluster; an FP-tree is built on the FPGA board for each group of frequent item sets; each constructed FP-tree is mined recursively; and the results of recursive mining for all groups are merged. The invention improves the efficiency of the FP-Growth algorithm: adding FPGA boards to the cluster nodes raises the computing capability of each single Spark node while retaining Spark's own parallel computing framework, which effectively improves the overall performance of the FP-Growth algorithm in big-data environments.

Description

Improved method and device for the FP-Growth algorithm based on FPGA

Technical field

The present invention relates to the field of machine learning algorithm processing, and more particularly to an FPGA-based method and device for improving the FP-Growth algorithm.

Background

The Spark-based FP-Growth algorithm uses a MapReduce-style distributed computing platform with in-memory computation to parallelize the algorithm, which improves its mining efficiency to some extent. However, with the arrival of the big-data era, data volumes in scientific and engineering computing have grown sharply and computational complexity keeps increasing, which poses a great challenge to the performance of Spark-based FP-Growth. Because the processing capability of a single node is limited, Spark improves algorithm performance by scaling out the number of cluster nodes. Such scaling not only increases system cost and energy consumption quickly, but also makes the cluster network more complex and sharply increases the data-transfer overhead between nodes, which erodes the performance gain brought by adding nodes. How to strengthen single-node processing capability, thereby reducing the network transfer overhead caused by rapid cluster expansion, and ultimately improve the performance of the FP-Growth algorithm, has therefore become a pressing problem.

Summary of the invention

The FPGA-based method and device for improving the FP-Growth algorithm provided by the present invention overcome the shortcomings of the prior art and significantly improve the computing performance of the FP-Growth algorithm.

To achieve the above object, the present invention adopts the following technical solutions:

The present invention provides an FPGA-based method for improving the FP-Growth algorithm, comprising the following steps:

Scanning the database in the Spark cluster to obtain the frequent item sets;

Dividing the frequent item sets into groups;

Adding one FPGA board to each node of the Spark cluster;

Building, on the FPGA board, an FP-tree for each group of frequent item sets;

Recursively mining, on the FPGA board, each FP-tree that has been built;

Merging the results of recursive mining for all groups.
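The first step, scanning the database, amounts to counting for every item how many transactions contain it and keeping the items that reach a minimum support threshold. The sketch below shows that logic on a single machine; the function name `frequent_items` and the `min_support` parameter (an absolute transaction count) are assumptions for illustration, and on a real Spark cluster the counting would be distributed across partitions rather than run in one loop.

```python
from collections import Counter
from typing import Dict, Iterable, List


def frequent_items(transactions: Iterable[List[str]], min_support: int) -> Dict[str, int]:
    """Return {item: support} for items contained in at least `min_support` transactions.

    Hypothetical single-machine illustration of the database scan step.
    """
    counts: Counter = Counter()
    for t in transactions:
        counts.update(set(t))  # count an item at most once per transaction
    return {item: c for item, c in counts.items() if c >= min_support}


db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
print(frequent_items(db, 2))  # a: 2, b: 4, c: 2 (key order may vary)
```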

Further, dividing the frequent item sets into groups comprises:

Sorting the frequent 1-itemsets in descending order of frequency;

Determining the number of groups according to the size of the database, and dividing the items into that many groups according to a preset grouping rule.
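The text does not state the database-size rule or the grouping rule, so the sketch below fills both with hypothetical choices to show the grouping step end to end: one group per `per_group` transactions, the item of rank r assigned to group r % n_groups, and each transaction projected onto every group it touches by keeping the prefix that ends at its last item from that group (similar to the projection used in parallel FP-Growth schemes). All names and both rules are assumptions, not the patent's method.

```python
import math
from typing import Dict, List


def build_flist(counts: Dict[str, int]) -> List[str]:
    """Frequent 1-items in descending order of support; ties broken by item name."""
    return sorted(counts, key=lambda item: (-counts[item], item))


def num_groups(db_size: int, per_group: int = 1000) -> int:
    """Hypothetical rule: one group per `per_group` transactions, at least one group."""
    return max(1, math.ceil(db_size / per_group))


def group_transactions(transactions: List[List[str]], flist: List[str],
                       n_groups: int) -> Dict[int, List[List[str]]]:
    """Hypothetical grouping rule: the item of rank r belongs to group r % n_groups.

    Each transaction is projected onto every group it touches.
    """
    rank = {item: r for r, item in enumerate(flist)}
    groups: Dict[int, List[List[str]]] = {g: [] for g in range(n_groups)}
    for t in transactions:
        # keep frequent items only, sorted in F-list (descending support) order
        items = sorted({i for i in t if i in rank}, key=rank.__getitem__)
        seen = set()
        # scan from the least frequent item; the first item met from a group
        # marks the end of the prefix that group needs
        for pos in range(len(items) - 1, -1, -1):
            g = rank[items[pos]] % n_groups
            if g not in seen:
                seen.add(g)
                groups[g].append(items[:pos + 1])
    return groups


flist = build_flist({"a": 2, "b": 4, "c": 2})  # ["b", "a", "c"]
db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
print(group_transactions(db, flist, 2))
# {0: [['b'], ['b', 'c'], ['b', 'a', 'c'], ['b']], 1: [['b', 'a'], ['b', 'a']]}
```

Ranking by descending support keeps the frequent items near the front of every projected prefix, which is what lets the per-group FP-trees share paths.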

Further, building, on the FPGA board, an FP-tree for each group comprises:

Creating an FP-tree whose root node is NULL, together with a Tab table (header table) for storing node information;

Inserting the data items of each processed transaction in the frequent-item table into the FP-tree in descending order, thereby constructing one path of the FP-tree;

During the insertion, pointing the Tab pointer of each item to the node of the corresponding item, and increasing the count of each node on the path by 1.
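The tree-building step can be sketched in Python as follows. `FPNode` and `FPTree` are hypothetical names; the Tab table is modeled as a dictionary mapping each item to the list of tree nodes that carry it, which plays the role of the pointers described above. Inputs are assumed to be transactions already filtered to frequent items and sorted in descending order of frequency. This is a CPU illustration of the data structure, not an FPGA implementation.

```python
from typing import Dict, List, Optional


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item              # None for the NULL root node
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


class FPTree:
    def __init__(self) -> None:
        self.root = FPNode(None, None)               # root node is NULL
        self.header: Dict[str, List[FPNode]] = {}    # Tab table: item -> nodes holding it

    def insert(self, items: List[str], count: int = 1) -> None:
        """Insert one transaction whose items are already in descending-frequency order."""
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                self.header.setdefault(item, []).append(child)  # Tab pointer to new node
            child.count += count      # every node on the path is incremented
            node = child


tree = FPTree()
for t in [["b", "a"], ["b", "c"], ["b", "a", "c"], ["b"]]:
    tree.insert(t)
print(tree.root.children["b"].count)  # 4
```

Shared prefixes reuse existing nodes, so the four transactions above produce only one `b` node under the root, with count 4.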

Further, recursively mining, on the FPGA board, each FP-tree that has been built comprises:

A: Starting from the item at the tail of the Tab table, traversing the FP-tree upward; each traversal yields the conditional pattern base of that item;

B: Converting the conditional pattern base into a conditional FP-tree;

C: Repeating steps A and B iteratively until the FP-tree contains only a single item.
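Steps A to C can be sketched as a recursive pattern-growth routine. To keep the block self-contained it includes a compact `build_tree` helper, so the same code builds both the initial FP-tree and every conditional FP-tree (step B). The names, the tie-breaking rule, and the absolute `min_support` are assumptions. This version stops when a conditional pattern base is empty or has no frequent items, rather than special-casing single-path trees.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Weighted = List[Tuple[List[str], int]]


class Node:
    def __init__(self, item: Optional[str], parent: Optional["Node"]):
        self.item, self.parent, self.count = item, parent, 0
        self.children: Dict[str, "Node"] = {}


def build_tree(weighted: Weighted, min_support: int):
    """Build an FP-tree (NULL root) and its header table from (items, count) pairs."""
    support: Dict[str, int] = defaultdict(int)
    for items, c in weighted:
        for i in set(items):
            support[i] += c
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root = Node(None, None)
    header: Dict[str, List[Node]] = {i: [] for i in order}
    for items, c in weighted:
        node = root
        for i in sorted({i for i in items if i in rank}, key=rank.__getitem__):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return order, header


def fp_growth(weighted: Weighted, min_support: int,
              suffix: FrozenSet[str] = frozenset(),
              out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    """Recursively mine all frequent itemsets (steps A to C)."""
    out = {} if out is None else out
    order, header = build_tree(weighted, min_support)
    for item in reversed(order):                 # start from the tail of the header table
        itemset = suffix | {item}
        out[itemset] = sum(n.count for n in header[item])
        base: Weighted = []                      # step A: conditional pattern base
        for n in header[item]:
            path, p = [], n.parent
            while p.item is not None:            # walk upward to the root
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:                                 # steps B and C: conditional tree, recurse
            fp_growth(base, min_support, itemset, out)
    return out


db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
print(len(fp_growth([(t, 1) for t in db], 2)))  # 5
```

With `min_support = 2` the five frequent itemsets are {a}, {b}, {c}, {a, b} and {b, c}.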

Further, merging the results of recursive mining for all groups comprises:

For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the item set of each path.
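When a conditional FP-tree reduces to a single root-to-leaf path, every non-empty subset of that path, joined with the current suffix, is a frequent itemset. Its support is the smallest node count among the chosen items, because counts never increase going down a path. A minimal sketch, with `single_path_patterns` as a hypothetical name:

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def single_path_patterns(path: List[Tuple[str, int]],
                         suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """All non-empty subsets of a single-path conditional FP-tree, joined with `suffix`.

    `path` lists (item, count) pairs from just below the root down to the leaf.
    """
    patterns: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            itemset = frozenset(item for item, _ in combo) | suffix
            # counts only shrink toward the leaf, so the minimum is the deepest chosen node
            patterns[itemset] = min(count for _, count in combo)
    return patterns


print(len(single_path_patterns([("b", 3), ("a", 2)], frozenset({"c"}))))  # 3
```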

Based on the FPGA-based method for improving the FP-Growth algorithm described in any of the above, the invention further provides an FPGA-based device for improving the FP-Growth algorithm, comprising:

An acquisition module, for scanning the database in the Spark cluster to obtain the frequent item sets;

A grouping module, for dividing the frequent item sets into groups;

A board card module, for adding one FPGA board to each node of the Spark cluster;

A tree-building module, for building, on the FPGA board, an FP-tree for each group of frequent item sets;

A tree-mining module, for recursively mining, on the FPGA board, each FP-tree that has been built;

A result module, for merging the results of recursive mining for all groups.

The FPGA-based method for improving the FP-Growth algorithm provided by the present invention has the following advantages:

1. The present invention adds FPGAs to the original Spark cluster, giving each cluster node one FPGA board. Because FPGA boards offer high performance, low power consumption, ease of programming, and dynamic reconfigurability, they serve as a new kind of heterogeneous computing accelerator. Each FPGA board and its original node together form a new Spark cluster node serving the whole Spark cluster. This improves the computing capability of each single node while retaining Spark's own parallel computing framework, and effectively improves the overall performance of the FP-Growth algorithm in big-data environments. Moreover, the FPGA works with the CPU as an acceleration device to form a heterogeneous computing platform, which can effectively improve the overall performance of the Spark cluster;

2. The present invention combines FPGA with Spark: the most time-consuming and computation-heavy parts of the algorithm, namely building and mining the trees, are separated from the Spark source code and implemented and optimized on the FPGA, while the other, less computation-intensive parts, such as grouping and result merging, still run under Spark's original mechanism. This takes full advantage of both, and on this basis the FP-Growth algorithm is optimized, effectively improving its computing performance.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention.

Fig. 1 is a first schematic flowchart of the FPGA-based method for improving the FP-Growth algorithm according to Embodiment 1 of the present invention.

Fig. 2 is a second schematic flowchart of the FPGA-based method for improving the FP-Growth algorithm according to Embodiment 2 of the present invention.

Fig. 3 is a schematic structural diagram of the FPGA-based device for improving the FP-Growth algorithm according to Embodiment 3 of the present invention.

Detailed description of the embodiments

Some technical terms used in the present invention are explained below:

Frequent item: an element (item) that occurs frequently across multiple sets.

Frequent item set: a subset formed by elements that occur together with high frequency across a series of sets, i.e., an item set whose frequency of occurrence meets a given threshold condition.

Conditional pattern base: the set of ancestor (prefix) paths of all nodes in the FP-tree that carry the same frequent item.

Spark: a general-purpose parallel computing framework that supports in-memory distributed datasets. Besides providing interactive queries, it can optimize iterative workloads, and it shares the advantages of Hadoop MapReduce; unlike MapReduce, however, intermediate job output can be kept in memory, so HDFS no longer needs to be read and written between steps.

FPGA: short for Field-Programmable Gate Array. It is a logic device developed further on the basis of programmable devices such as PAL, GAL, and CPLD.

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

Embodiment 1

Referring to Fig. 1, the FPGA-based method for improving the FP-Growth algorithm of the present invention comprises the following steps:

S101: Scanning the database in the Spark cluster to obtain the frequent item sets;

S102: Dividing the frequent item sets into groups;

S103: Adding one FPGA board to each node of the Spark cluster;

S104: Building, on the FPGA board, an FP-tree for each group of frequent item sets;

S105: Recursively mining, on the FPGA board, each FP-tree that has been built;

S106: Merging the results of recursive mining for all groups.

Embodiment 2

Referring to Fig. 2, the FPGA-based method for improving the FP-Growth algorithm of the present invention comprises the following steps:

S201: Scanning the database in the Spark cluster to obtain the frequent item sets;

S202: Sorting the frequent 1-itemsets in descending order of frequency;

S203: Determining the number of groups according to the size of the database, and dividing the items into that many groups according to a preset grouping rule;

S204: Adding one FPGA board to each node of the Spark cluster;

S205: Creating an FP-tree whose root node is NULL, together with a Tab table (header table) for storing node information;

S206: Inserting the data items of each processed transaction in the frequent-item table into the FP-tree in descending order, thereby constructing one path of the FP-tree;

S207: During the insertion, pointing the Tab pointer of each item to the node of the corresponding item, and increasing the count of each node on the path by 1;

S208: Starting from the item at the tail of the Tab table, traversing the FP-tree upward; each traversal yields the conditional pattern base of that item;

S209: Converting the conditional pattern base into a conditional FP-tree;

S210: Repeating steps S208 and S209 iteratively until the FP-tree contains only a single item;

S211: For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the item set of each path.

Embodiment 3

Referring to Fig. 3, the FPGA-based device for improving the FP-Growth algorithm of the present invention comprises an acquisition module 101, a grouping module 102, a board card module 103, a tree-building module 104, a tree-mining module 105, and a result module 106. The acquisition module 101 is connected in sequence to the grouping module 102, the board card module 103, the tree-building module 104, the tree-mining module 105, and the result module 106.

The acquisition module 101 scans the database in the Spark cluster to obtain the frequent item sets; the grouping module 102 divides the frequent item sets into groups; the board card module 103 adds one FPGA board to each node of the Spark cluster; the tree-building module 104 builds, on the FPGA board, an FP-tree for each group of frequent item sets; the tree-mining module 105 recursively mines, on the FPGA board, each FP-tree that has been built; and the result module 106 merges the results of recursive mining for all groups.
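Embodiment 3 chains the six modules in sequence. The sketch below models only the part specific to this design: one board per node (board card module 103), per-group work running on the board that owns the group (modules 104 and 105), and the merge done by the result module 106. `Board`, `attach_boards`, `run_on_boards`, the round-robin assignment policy, and the pluggable `miner` callable (standing in for the FPGA tree-building and mining kernel) are all hypothetical. The merge keeps the largest support seen per itemset, assuming a prefix projection in which a group's database holds at most one prefix per original transaction and therefore never over-counts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

Itemsets = Dict[FrozenSet[str], int]


@dataclass
class Board:
    """Hypothetical stand-in for the FPGA board attached to one Spark node."""
    node_id: int
    groups: List[int] = field(default_factory=list)


def attach_boards(n_nodes: int, n_groups: int) -> List[Board]:
    """Board card module: one board per node; groups dealt out round-robin (assumed policy)."""
    boards = [Board(node_id=i) for i in range(n_nodes)]
    for g in range(n_groups):
        boards[g % n_nodes].groups.append(g)
    return boards


def run_on_boards(boards: List[Board],
                  group_dbs: Dict[int, List[List[str]]],
                  miner: Callable[[List[List[str]]], Itemsets]) -> Itemsets:
    """Mine each group on the board that owns it, then merge (result module)."""
    merged: Itemsets = {}
    for board in boards:
        for g in board.groups:
            for itemset, support in miner(group_dbs.get(g, [])).items():
                # a projected group database never over-counts, so keep the maximum
                merged[itemset] = max(support, merged.get(itemset, 0))
    return merged


print([b.groups for b in attach_boards(2, 5)])  # [[0, 2, 4], [1, 3]]
```

In a real deployment `miner` would call into the FPGA; for testing, any Python function with the same signature can be plugged in.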

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An FPGA-based method for improving the FP-Growth algorithm, characterized by comprising the following steps:

Scanning the database in the Spark cluster to obtain the frequent item sets;

Dividing the frequent item sets into groups;

Adding one FPGA board to each node of the Spark cluster;

Building, on the FPGA board, an FP-tree for each group of frequent item sets;

Merging the results of recursive mining for all groups.

2. The FPGA-based method for improving the FP-Growth algorithm according to claim 1, characterized in that dividing the frequent item sets into groups comprises:

Sorting the frequent 1-itemsets in descending order of frequency;

3. The FPGA-based method for improving the FP-Growth algorithm according to claim 1, characterized in that building, on the FPGA board, an FP-tree for each group comprises:

4. The FPGA-based method for improving the FP-Growth algorithm according to claim 3, characterized in that recursively mining, on the FPGA board, each FP-tree that has been built comprises:

B: Converting the conditional pattern base into a conditional FP-tree;

5. The FPGA-based method for improving the FP-Growth algorithm according to claim 4, characterized in that merging the results of recursive mining for all groups comprises:

6. An FPGA-based device for improving the FP-Growth algorithm, based on the FPGA-based method for improving the FP-Growth algorithm according to any one of claims 1 to 5, characterized by comprising:

A grouping module, for dividing the frequent item sets into groups;