CN106874479A - Improved method and device for the FP-Growth algorithm based on FPGA - Google Patents

Improved method and device for the FP-Growth algorithm based on FPGA

Info

Publication number
CN106874479A
CN106874479A CN201710088020.2A CN201710088020A CN106874479A CN 106874479 A CN106874479 A CN 106874479A CN 201710088020 A CN201710088020 A CN 201710088020A CN 106874479 A CN106874479 A CN 106874479A
Authority
CN
China
Prior art keywords
fpga
trees
node
growth algorithms
frequent item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710088020.2A
Other languages
Chinese (zh)
Inventor
曹芳
陈继承
王洪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710088020.2A
Publication of CN106874479A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/24566 Recursive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The present invention relates to the FP-Growth algorithm, and in particular to an improved method and device for the FP-Growth algorithm based on FPGA, and belongs to the field of machine learning algorithm processing. The method first scans the database in a Spark cluster to obtain frequent itemsets; groups the frequent itemsets; adds one FPGA board to each node in the Spark cluster; builds an FP-tree for each group of frequent itemsets on the FPGA board; recursively mines each FP-tree that has been built; and merges the recursive mining results of all groups. The invention improves the efficiency of the FP-Growth algorithm: adding FPGA boards to the cluster nodes raises the computing capability of each single node of the Spark cluster while retaining Spark's own parallel computing framework, which effectively improves the overall performance of the FP-Growth algorithm in big-data environments.

Description

Improved method and device for the FP-Growth algorithm based on FPGA
Technical field
The present invention relates to the field of machine learning algorithm processing, and in particular to an improved method and device for the FP-Growth algorithm based on FPGA.
Background
The FP-Growth algorithm on the Spark platform uses the MapReduce distributed computing platform and in-memory computing to parallelize the algorithm, which improves its mining efficiency to a certain extent. With the arrival of the big-data era, however, data volumes in scientific and engineering computing have grown sharply and computational complexity keeps increasing, which poses a great challenge to the computing performance of the Spark-based FP-Growth algorithm. Because the processing capability of a single node is limited, Spark improves algorithm performance by expanding the number of cluster nodes. Such cluster expansion not only makes system cost and energy consumption grow quickly, but also sharply increases the complexity of the cluster network and the data transfer overhead between nodes, which reduces the performance gain that the expansion brings. How to solve these problems, that is, how to strengthen single-node processing capability, reduce the network transmission overhead caused by rapid expansion of the computing cluster, and ultimately improve the performance of the FP-Growth algorithm, has become a pressing issue.
Summary of the invention
The improved method and device for the FP-Growth algorithm based on FPGA provided by the present invention overcome the deficiencies of the prior art and significantly improve the computing performance of the FP-Growth algorithm.
To achieve the above object, the present invention adopts the following technical solutions:
The present invention provides an improved method for the FP-Growth algorithm based on FPGA, comprising the following steps:
Scanning the database in a Spark cluster to obtain frequent itemsets;
Grouping the frequent itemsets;
Adding one FPGA board to each node in the Spark cluster;
Building an FP-tree for each group of frequent itemsets on the FPGA board;
Recursively mining, on the FPGA board, each FP-tree that has been built;
Merging the recursive mining results of all groups.
Further, grouping the frequent itemsets comprises:
Sorting the frequent 1-itemsets in descending order;
Determining the number of groups according to the size of the database, and dividing the itemsets into several groups according to a preset grouping rule.
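The grouping step above can be sketched in software as follows. This is only an illustration under stated assumptions: the source does not disclose the preset grouping rule or how the database size maps to a group count, so the round-robin rule, the `transactions_per_group` parameter and all function names here are hypothetical.

```python
import math
from collections import Counter

def frequent_items_descending(transactions, min_support):
    """Return [(item, count)] for the frequent 1-itemsets, highest count first."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(i, c) for i, c in counts.items() if c >= min_support]
    # Ties are broken by item name so that the order is deterministic.
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))

def choose_num_groups(num_transactions, transactions_per_group, num_items):
    """Assumed sizing rule: one group per block of transactions, capped by item count."""
    wanted = math.ceil(num_transactions / transactions_per_group)
    return max(1, min(num_items, wanted))

def assign_groups(ordered_items, num_groups):
    """Assumed grouping rule: deal the sorted items out round-robin."""
    groups = [[] for _ in range(num_groups)]
    for rank, (item, _count) in enumerate(ordered_items):
        groups[rank % num_groups].append(item)
    return groups

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
ordered = frequent_items_descending(transactions, min_support=2)
num_groups = choose_num_groups(len(transactions), 3, len(ordered))
groups = assign_groups(ordered, num_groups)
```

Round-robin assignment over the sorted list spreads the most frequent items across groups, which is one simple way to balance the work per node; other rules would fit the text equally well.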
Further, building the FP-tree for each group on the FPGA board comprises:
Creating an FP-tree whose root node is NULL, and a Tab table for storing node information;
Inserting the data items of each processed transaction in the frequent-item table into the FP-tree one by one in descending order, thereby constructing one path of the FP-tree;
During this insertion, pointing the pointers of the Tab table at the nodes of the corresponding items, and incrementing the count of each node by 1.
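A minimal software sketch of the tree-building step described above is given below. It runs on a CPU and says nothing about how the FPGA implementation lays out nodes in hardware; the class name `FPNode`, the function `build_fp_tree` and the representation of the Tab table as a dictionary of node lists are assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None marks the NULL root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode

def build_fp_tree(transactions, min_support):
    """Build an FP-tree and its Tab (header) table from raw transactions."""
    counts = Counter(i for t in transactions for i in set(t))
    # Sort key per frequent item: higher count first, then item name.
    rank = {i: (-c, i) for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    tab = {item: [] for item in rank}   # item -> nodes carrying that item
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                tab[item].append(child)   # Tab pointer to the new node
            child.count += 1              # each visited node counts this transaction
            node = child
    return root, tab

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
root, tab = build_fp_tree(transactions, min_support=2)
```

With this input, the transactions that share the prefix `a` share one branch, so the node for `a` directly under the root ends with a count of 4.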
Further, recursively mining each FP-tree on the FPGA board comprises:
A: Starting from the item at the tail of the Tab table, traversing the FP-tree upward; each traversal yields the conditional pattern base of that item;
B: Converting the conditional pattern base into a conditional FP-tree;
C: Repeating steps A and B iteratively until the FP-tree contains only one element item.
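Steps A to C can be illustrated with the following CPU-only sketch. It is an assumption-laden simplification rather than the patented FPGA design: all names are hypothetical, and the recursion here simply continues until a conditional pattern base is empty, instead of stopping early when a tree has shrunk to a single element item.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build(weighted_paths, min_support):
    """Build an FP-tree and Tab table from (items, count) pairs (step B)."""
    freq = defaultdict(int)
    for items, n in weighted_paths:
        for i in set(items):
            freq[i] += n
    keep = {i: c for i, c in freq.items() if c >= min_support}
    root, tab = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        ordered = sorted((i for i in set(items) if i in keep),
                         key=lambda i: (-keep[i], i))
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                tab[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return tab, keep

def conditional_pattern_base(tab, item):
    """Step A: walk upward from every node of `item`, collecting prefix paths."""
    base = []
    for node in tab[item]:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
    return base

def fp_growth(weighted_paths, min_support, suffix=()):
    """Return {itemset (sorted tuple): support count}."""
    tab, keep = build(weighted_paths, min_support)
    result = {}
    # Start from the tail of the Tab table: least frequent item first.
    for item in sorted(keep, key=lambda i: (keep[i], i)):
        result[tuple(sorted(suffix + (item,)))] = keep[item]
        base = conditional_pattern_base(tab, item)
        # Step C: repeat A and B on the conditional pattern base.
        result.update(fp_growth(base, min_support, suffix + (item,)))
    return result

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
```

Each raw transaction enters with a weight of 1, while prefix paths in a conditional pattern base carry the count of the node they were collected from, which is why `build` accepts weighted paths.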
Further, merging the recursive mining results of all groups comprises:
For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the set formed by each path.
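As an illustration of generating non-empty subsets from a path, the sketch below handles the common case of a conditional FP-tree that consists of a single path. The function name and the convention that a subset's support is the smallest node count among its chosen nodes follow the usual FP-Growth treatment and are assumptions here, since the source does not spell them out.

```python
from itertools import combinations

def patterns_from_single_path(path, suffix=()):
    """path: [(item, count)] from root to leaf of a single-path conditional FP-tree.

    Every non-empty subset of the path, joined with the suffix item(s) the tree
    is conditioned on, is a frequent pattern; its support is the minimum count
    among the chosen nodes.
    """
    out = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(sorted(tuple(i for i, _ in combo) + tuple(suffix)))
            out[items] = min(c for _, c in combo)
    return out

result = patterns_from_single_path([("a", 3), ("b", 2)], suffix=("c",))
```

A path with n nodes yields 2^n - 1 subsets, so this enumeration is only practical because conditional trees are normally short.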
An improvement device for the FP-Growth algorithm based on FPGA, based on any one of the above improved methods for the FP-Growth algorithm based on FPGA, comprising:
An acquisition module, for scanning the database in a Spark cluster to obtain frequent itemsets;
A grouping module, for grouping the frequent itemsets;
A board module, for adding one FPGA board to each node in the Spark cluster;
A tree-building module, for building an FP-tree for each group of frequent itemsets on the FPGA board;
A tree-mining module, for recursively mining, on the FPGA board, each FP-tree that has been built;
An object module, for merging the recursive mining results of all groups.
The improved method for the FP-Growth algorithm based on FPGA provided by the present invention has the following advantages:
1. By adding FPGAs to the original Spark cluster, the invention gives each cluster node one FPGA board. FPGA boards offer high performance, low power consumption, ease of programming and dynamic reconfigurability, which makes them a new kind of heterogeneous computing accelerator. The FPGA board and the original node together form a new Spark cluster node that serves the whole Spark cluster. This raises the computing capability of each single node while retaining Spark's own parallel computing framework, and so effectively improves the overall performance of the FP-Growth algorithm in big-data environments. As an acceleration device, the FPGA cooperates with the CPU to form a heterogeneous computing platform that can effectively improve the overall performance of the Spark cluster;
2. The invention combines FPGA with Spark. Tree building and tree mining, the most time-consuming and computation-heavy parts of the algorithm, are separated from the Spark source code and implemented and optimized on the FPGA, while the less computation-heavy parts of the algorithm, such as grouping and result merging, still run under Spark's original mechanism. This makes full use of the strengths of both, and on this basis the FP-Growth algorithm is optimized and improved, effectively raising its computing performance.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention.
Fig. 1 is the first schematic flow chart of the improved method for the FP-Growth algorithm based on FPGA, according to Embodiment 1 of the present invention.
Fig. 2 is the second schematic flow chart of the improved method for the FP-Growth algorithm based on FPGA, according to Embodiment 2 of the present invention.
Fig. 3 is a structural diagram of the improvement device for the FP-Growth algorithm based on FPGA, according to Embodiment 3 of the present invention.
Detailed description
Some technical terms used in the present invention are explained below:
Frequent item: an element/item that occurs frequently across multiple sets.
Frequent itemset: given a series of sets that share some elements, the elements that occur with high frequency in those sets form a subset; an itemset that meets a given threshold condition is a frequent itemset.
Conditional pattern base: the set of ancestor paths of all nodes of the same frequent item in the FP-tree.
Spark: a general-purpose parallel framework that supports in-memory distributed datasets. Besides providing interactive queries, it can optimize iterative workloads. It has the advantages of Hadoop MapReduce; unlike MapReduce, however, the intermediate output of a Job can be kept in memory, so reading and writing HDFS is no longer needed.
FPGA: short for Field-Programmable Gate Array. It is a logic device developed further from programmable devices such as PAL, GAL and CPLD.
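To make the terms "frequent item" and "frequent itemset" concrete, the short sketch below counts how many transactions contain a given itemset and compares that count with a threshold. The function names and the use of an absolute count as the threshold condition are assumptions for illustration; the source only speaks of "a given threshold condition".

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent if its support count reaches the threshold."""
    return support_count(itemset, transactions) >= min_support

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
ab_support = support_count({"a", "b"}, transactions)
bc_frequent = is_frequent({"b", "c"}, transactions, min_support=2)
```

Here `{"a", "b"}` appears in two transactions, while `{"b", "c"}` appears in only one and is therefore not frequent at a threshold of 2.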
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative work fall within the scope of protection of the present invention.
The present invention is described in detail below with reference to the drawings and embodiments.
Embodiment 1
With reference to Fig. 1, an improved method for the FP-Growth algorithm based on FPGA according to the present invention comprises the following steps:
S101: Scan the database in a Spark cluster to obtain frequent itemsets;
S102: Group the frequent itemsets;
S103: Add one FPGA board to each node in the Spark cluster;
S104: Build an FP-tree for each group of frequent itemsets on the FPGA board;
S105: Recursively mine, on the FPGA board, each FP-tree that has been built;
S106: Merge the recursive mining results of all groups.
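The overall flow of steps S101 to S106 can be mimicked on a single machine as sketched below. Everything here is a stand-in: no Spark or FPGA API is used, each worker thread plays the role of one cluster node with its board, and `mine_group` is a hypothetical placeholder that counts itemsets by brute force where the real system would build and mine an FP-tree on the FPGA. Assigning an itemset to the group of its lowest-ranked item is also an assumption, chosen so that every itemset is mined by exactly one group.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def scan(transactions, min_support):                       # S101
    counts = Counter(i for t in transactions for i in set(t))
    return sorted((i for i in counts if counts[i] >= min_support),
                  key=lambda i: (-counts[i], i))

def group_items(order, num_groups):                        # S102
    return [order[g::num_groups] for g in range(num_groups)]

def mine_group(group, order, transactions, min_support):   # S104-S105 stand-in
    """Placeholder for the per-node accelerator call (hypothetical).

    Counts, by brute force, the itemsets whose lowest-ranked item is in `group`.
    """
    rank = {i: r for r, i in enumerate(order)}
    counts = Counter()
    for t in transactions:
        kept = sorted((i for i in set(t) if i in rank), key=rank.get)
        for r in range(1, len(kept) + 1):
            for combo in combinations(kept, r):
                if combo[-1] in group:   # last element has the lowest rank
                    counts[tuple(sorted(combo))] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def run(transactions, min_support, num_groups):
    order = scan(transactions, min_support)
    groups = group_items(order, num_groups)
    merged = {}
    # S103 is a hardware step; here one worker thread models one node.
    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        parts = pool.map(
            lambda g: mine_group(set(g), order, transactions, min_support),
            groups)
        for part in parts:                                  # S106
            merged.update(part)
    return merged

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
frequent = run(transactions, min_support=2, num_groups=2)
```

Because the groups produce disjoint sets of itemsets, the merge in S106 reduces to a plain dictionary union in this sketch.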
Embodiment 2
With reference to Fig. 2, an improved method for the FP-Growth algorithm based on FPGA according to the present invention comprises the following steps:
S201: Scan the database in a Spark cluster to obtain frequent itemsets;
S202: Sort the frequent 1-itemsets in descending order;
S203: Determine the number of groups according to the size of the database, and divide the itemsets into several groups according to a preset grouping rule;
S204: Add one FPGA board to each node in the Spark cluster;
S205: Create an FP-tree whose root node is NULL, and a Tab table for storing node information;
S206: Insert the data items of each processed transaction in the frequent-item table into the FP-tree one by one in descending order, thereby constructing one path of the FP-tree;
S207: During this insertion, point the pointers of the Tab table at the nodes of the corresponding items, and increment the count of each node by 1;
S208: Starting from the item at the tail of the Tab table, traverse the FP-tree upward; each traversal yields the conditional pattern base of that item;
S209: Convert the conditional pattern base into a conditional FP-tree;
S210: Repeat steps S208 and S209 iteratively until the FP-tree contains only one element item;
S211: For each conditional FP-tree, generate all paths from the root node to the leaf nodes, and generate all non-empty subsets of the set formed by each path.
Embodiment 3
With reference to Fig. 3, the improvement device for the FP-Growth algorithm based on FPGA according to the present invention comprises: an acquisition module 101, a grouping module 102, a board module 103, a tree-building module 104, a tree-mining module 105 and an object module 106. The acquisition module 101 is connected in sequence to the grouping module 102, the board module 103, the tree-building module 104, the tree-mining module 105 and the object module 106.
The acquisition module 101 scans the database in a Spark cluster to obtain frequent itemsets; the grouping module 102 groups the frequent itemsets; the board module 103 adds one FPGA board to each node in the Spark cluster; the tree-building module 104 builds an FP-tree for each group of frequent itemsets on the FPGA board; the tree-mining module 105 recursively mines, on the FPGA board, each FP-tree that has been built; and the object module 106 merges the recursive mining results of all groups.
The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (6)

1. An improved method for the FP-Growth algorithm based on FPGA, characterized by comprising the following steps:
Scanning the database in a Spark cluster to obtain frequent itemsets;
Grouping the frequent itemsets;
Adding one FPGA board to each node in the Spark cluster;
Building an FP-tree for each group of frequent itemsets on the FPGA board;
Recursively mining, on the FPGA board, each FP-tree that has been built;
Merging the recursive mining results of all groups.
2. The improved method for the FP-Growth algorithm based on FPGA according to claim 1, characterized in that grouping the frequent itemsets comprises:
Sorting the frequent 1-itemsets in descending order;
Determining the number of groups according to the size of the database, and dividing the itemsets into several groups according to a preset grouping rule.
3. The improved method for the FP-Growth algorithm based on FPGA according to claim 1, characterized in that building the FP-tree for each group on the FPGA board comprises:
Creating an FP-tree whose root node is NULL, and a Tab table for storing node information;
Inserting the data items of each processed transaction in the frequent-item table into the FP-tree one by one in descending order, thereby constructing one path of the FP-tree;
During this insertion, pointing the pointers of the Tab table at the nodes of the corresponding items, and incrementing the count of each node by 1.
4. The improved method for the FP-Growth algorithm based on FPGA according to claim 3, characterized in that recursively mining each FP-tree on the FPGA board comprises:
A: Starting from the item at the tail of the Tab table, traversing the FP-tree upward; each traversal yields the conditional pattern base of that item;
B: Converting the conditional pattern base into a conditional FP-tree;
C: Repeating steps A and B iteratively until the FP-tree contains only one element item.
5. The improved method for the FP-Growth algorithm based on FPGA according to claim 4, characterized in that merging the recursive mining results of all groups comprises:
For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the set formed by each path.
6. An improvement device for the FP-Growth algorithm based on FPGA, based on the improved method for the FP-Growth algorithm based on FPGA according to any one of claims 1 to 5, characterized by comprising:
An acquisition module, for scanning the database in a Spark cluster to obtain frequent itemsets;
A grouping module, for grouping the frequent itemsets;
A board module, for adding one FPGA board to each node in the Spark cluster;
A tree-building module, for building an FP-tree for each group of frequent itemsets on the FPGA board;
A tree-mining module, for recursively mining, on the FPGA board, each FP-tree that has been built;
An object module, for merging the recursive mining results of all groups.
CN201710088020.2A 2017-02-19 2017-02-19 Improved method and device for the FP-Growth algorithm based on FPGA Pending CN106874479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710088020.2A CN106874479A (en) 2017-02-19 2017-02-19 Improved method and device for the FP-Growth algorithm based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710088020.2A CN106874479A (en) 2017-02-19 2017-02-19 Improved method and device for the FP-Growth algorithm based on FPGA

Publications (1)

Publication Number Publication Date
CN106874479A true CN106874479A (en) 2017-06-20

Family

ID=59167465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710088020.2A Pending CN106874479A (en) 2017-02-19 2017-02-19 Improved method and device for the FP-Growth algorithm based on FPGA

Country Status (1)

Country Link
CN (1) CN106874479A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612682A (en) * 2017-09-25 2018-01-19 郑州云海信息技术有限公司 A kind of data processing method based on SHA512 algorithms, apparatus and system
CN107612681A (en) * 2017-09-25 2018-01-19 郑州云海信息技术有限公司 A kind of data processing method based on SM3 algorithms, apparatus and system
CN110209698A (en) * 2019-05-13 2019-09-06 浙江大学 A kind of textile design creative design method based on silk relics data
CN112561089A (en) * 2020-11-27 2021-03-26 成都飞机工业(集团)有限责任公司 Correlation analysis and prediction method for vulnerable spare parts

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665669B2 (en) * 2000-01-03 2003-12-16 Db Miner Technology Inc. Methods and system for mining frequent patterns
CN104679773A (en) * 2013-11-29 2015-06-03 中国科学院深圳先进技术研究院 Mass transaction data frequent itemset mining method and querying method
CN104731925A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 MapReduce-based FP-Growth load balance parallel computing method
CN105183875A (en) * 2015-09-21 2015-12-23 南京邮电大学 FP-Growth data mining method based on shared path
CN105468750A (en) * 2015-11-26 2016-04-06 央视国际网络无锡有限公司 Data dimension reduction and compression method for correlation rule algorithm

Similar Documents

Publication Publication Date Title
CN106874479A (en) The improved method and device of the FP Growth algorithms based on FPGA
CN109359172B (en) Entity alignment optimization method based on graph partitioning
CN103258049A (en) Association rule mining method based on mass data
Liu et al. Efficient mining of large maximal bicliques
Liu et al. Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network
CN103116625A (en) Volume radio direction finde (RDF) data distribution type query processing method based on Hadoop
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
CN102073700A (en) Discovery method of complex network community
Chen et al. Metric similarity joins using MapReduce
Behrens et al. Parallelizing an unstructured grid generator with a space-filling curve approach
Hu et al. Output-optimal massively parallel algorithms for similarity joins
CN105589908A (en) Association rule computing method for transaction set
CN108563715A (en) A kind of distributed convergence method for digging and system
CN102663083A (en) Large-scale social network information extraction method based on distributed computation
CN102915344A (en) SQL (structured query language) statement processing method and device
CN106503811A (en) A kind of infrastructure full life cycle management method based on big data
Sampath et al. An efficient weighted rule mining for web logs using systolic tree
CN107590225A (en) A kind of Visualized management system based on distributed data digging algorithm
CN104462095A (en) Extraction method and device of common pars of query statements
US20200104425A1 (en) Techniques for lossless and lossy large-scale graph summarization
Rajput et al. An efficient parallel searching algorithm on Hypercube Interconnection network
Senthilkumar et al. An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce
CN102708285A (en) Coremedicine excavation method based on complex network model parallelizing PageRank algorithm
CN103885834A (en) Pattern matching processor used in distributed environment
Abdolazimi et al. Connected components of big graphs in fixed mapreduce rounds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170620)