CN106874479A - Improved method and device for the FPGA-based FP-Growth algorithm - Google Patents
Improved method and device for the FPGA-based FP-Growth algorithm
- Publication number
- CN106874479A
- Authority
- CN
- China
- Prior art keywords
- fpga
- trees
- node
- growth algorithms
- frequent item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
- G06F16/24566—Recursive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/24569—Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The present invention relates to the FP-Growth algorithm, and in particular to an improved method and device for an FPGA-based FP-Growth algorithm, and belongs to the field of machine learning algorithm processing. The method first scans the database in a Spark cluster to obtain the frequent itemsets; groups the frequent itemsets; equips each node in the Spark cluster with an FPGA board; builds an FP-tree for each group of frequent itemsets on the FPGA board; recursively mines each FP-tree so built; and merges the recursive-mining results of all groups. The invention improves the efficiency of the FP-Growth algorithm: adding an FPGA board to each cluster node raises the computing capability of a single Spark node while retaining Spark's own parallel computing framework, which effectively improves the overall performance of the FP-Growth algorithm in big-data environments.
Description
Technical field
The present invention relates to the field of machine learning algorithm processing, and in particular to an improved method and device for an FPGA-based FP-Growth algorithm.
Background art
The FP-Growth algorithm on the Spark platform builds on a MapReduce-style distributed computing platform and in-memory computation to parallelize the algorithm, which improves its mining efficiency to some extent. With the arrival of the big-data era, however, data volumes in scientific and engineering computing have grown sharply and computational complexity keeps increasing, which poses a great challenge to the computing performance of the Spark-based FP-Growth algorithm. Because the processing capability of a single node is limited, Spark improves algorithm performance by scaling out the number of cluster nodes. This cluster expansion not only makes system cost and energy consumption grow quickly, but also makes cluster network complexity and inter-node data transfer overhead rise steeply, which reduces the performance gain that the expansion brings. How to solve these problems, by strengthening single-node processing capability and thereby reducing the network transfer overhead caused by rapid cluster expansion, and ultimately improve the performance of the FP-Growth algorithm, has become a pressing issue.
Summary of the invention
The improved method and device for the FPGA-based FP-Growth algorithm provided by the present invention overcome the deficiencies of the prior art and significantly improve the computing performance of the FP-Growth algorithm.
To achieve the above object, the present invention adopts the following technical solutions:
The present invention provides an improved method for an FPGA-based FP-Growth algorithm, comprising the following steps:
Scanning the database in the Spark cluster to obtain the frequent itemsets;
Grouping the frequent itemsets;
Equipping each node in the Spark cluster with an FPGA board;
Building an FP-tree on the FPGA board for each group of frequent itemsets;
Recursively mining, on the FPGA board, each FP-tree so built;
Merging the recursive-mining results of all groups.
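The first step, scanning the database to find the frequent items, can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the function name `frequent_items`, the `min_support` threshold and the sample transactions are assumptions, and the text does not state which Spark operations perform the scan.

```python
from collections import Counter

def frequent_items(transactions, min_support):
    """Return {item: support} for items found in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # Use a set so an item is counted once per transaction.
        counts.update(set(transaction))
    return {item: n for item, n in counts.items() if n >= min_support}

# Hypothetical sample database of four transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
]
freq = frequent_items(transactions, min_support=2)
```

With this sample data the item `d` occurs only once, so it is dropped when the threshold is 2.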
Further, grouping the frequent itemsets comprises:
Arranging the frequent 1-itemsets in descending order;
Determining the number of groups according to the size of the database, and dividing the itemsets into several groups according to a preset grouping rule.
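The grouping step can be sketched as below. The text does not disclose the preset grouping rule or how the group count is derived from the database size, so this sketch assumes that the descending order is by support count, takes the number of groups as a parameter, and uses a round-robin rule; `group_frequent_items` is a hypothetical name.

```python
def group_frequent_items(support, num_groups):
    """Sort items by descending support, then deal them into num_groups groups."""
    # Ties are broken by item name so the order is deterministic.
    ordered = sorted(support, key=lambda item: (-support[item], item))
    groups = [[] for _ in range(num_groups)]
    for rank, item in enumerate(ordered):
        # Round-robin assignment is an assumed rule, not taken from the source.
        groups[rank % num_groups].append(item)
    return ordered, groups

# Hypothetical support counts for five frequent items.
support = {"a": 3, "b": 3, "c": 2, "e": 2, "f": 4}
ordered, groups = group_frequent_items(support, num_groups=2)
```

Round-robin over the sorted list tends to spread high-support items across groups, which is one simple way to balance the per-group workload.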
Further, building an FP-tree for each group on the FPGA board comprises:
Creating an FP-tree whose root node is NULL and a table Tab for storing node information;
Inserting the data items of each processed transaction in the frequent-item table into the FP-tree in descending order, thereby constructing one path of the FP-tree;
During this insertion, pointing the pointer in Tab to the node of the corresponding item, and incrementing the count of each node by 1.
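The tree construction described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the FPGA design: `FPNode` and `build_fp_tree` are hypothetical names, and the Tab table is modelled as a dictionary that maps each item to the list of its nodes rather than as a chain of pointers.

```python
class FPNode:
    """One FP-tree node: an item, its count, a parent link and child links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, order):
    """Build an FP-tree; `order` lists the frequent items in descending order."""
    rank = {item: i for i, item in enumerate(order)}
    root = FPNode(None, None)            # root node is NULL (item None)
    tab = {item: [] for item in order}   # Tab: item -> nodes holding that item
    for transaction in transactions:
        # Drop infrequent items and sort the rest in the agreed order.
        items = sorted((i for i in set(transaction) if i in rank),
                       key=rank.__getitem__)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                tab[item].append(child)  # Tab points at the new node
            child.count += 1             # count increases by 1 per insertion
            node = child
    return root, tab

# Hypothetical sample data; "d" is infrequent and therefore not in `order`.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
root, tab = build_fp_tree(transactions, ["a", "b", "c"])
```

Transactions that share a prefix share a path, so the three transactions containing `a` all pass through one `a` node whose count ends at 3.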
Further, recursively mining each FP-tree on the FPGA board comprises:
A: Traversing the FP-tree upward starting from the item at the tail of the Tab table, each traversal yielding the conditional pattern base of that item;
B: Converting the conditional pattern base into a conditional FP-tree;
C: Repeating steps A and B iteratively until the FP-tree contains a single item.
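Steps A to C can be sketched as a compact software FP-Growth routine. The names `Node`, `build` and `mine`, the `min_support` threshold and the sample data are assumptions for illustration. The recursion here stops when an item has no ancestor paths left, which is a simplification of the single-item stopping condition stated in the text.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return (Tab, item order)."""
    support = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    keep = {i: s for i, s in support.items() if s >= min_support}
    order = sorted(keep, key=lambda i: (-keep[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root, tab = Node(None, None), {item: [] for item in order}
    for items, count in weighted:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                tab[item].append(child)
            child.count += count
            node = child
    return tab, order

def mine(weighted, min_support, suffix=()):
    """Return {frozenset(pattern): support} for all frequent patterns."""
    tab, order = build(weighted, min_support)
    results = {}
    for item in reversed(order):               # step A: start at the tail of Tab
        pattern = (item,) + suffix
        results[frozenset(pattern)] = sum(n.count for n in tab[item])
        base = []                              # conditional pattern base of item
        for node in tab[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)       # walk upward toward the root
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:                               # steps B and C: recurse on the base
            results.update(mine(base, min_support, pattern))
    return results

transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
patterns = mine([(t, 1) for t in transactions], min_support=2)
```

Each recursive call rebuilds a smaller tree from the conditional pattern base, so the conditional FP-tree of step B exists only for the duration of that call.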
Further, merging the recursive-mining results of all groups comprises:
For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the set of items on each path.
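The subset generation in the merging step can be sketched with the standard library. The helper names `nonempty_subsets` and `merge_paths` and the sample paths are illustrative assumptions; how the patented method attaches support counts to the merged subsets is not described in the text and is left out here.

```python
from itertools import combinations

def nonempty_subsets(path):
    """Return every non-empty subset of the items on one root-to-leaf path."""
    items = list(path)
    return [frozenset(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]

def merge_paths(paths):
    """Union the non-empty subsets generated from several paths."""
    merged = set()
    for path in paths:
        merged.update(nonempty_subsets(path))
    return merged

subsets = nonempty_subsets(["a", "b", "c"])
merged = merge_paths([["a", "b"], ["b", "c"]])
```

A path of n items yields 2^n - 1 non-empty subsets, so this step is only cheap when the conditional trees have short paths.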
An improved device for an FPGA-based FP-Growth algorithm, based on any one of the above improved methods for an FPGA-based FP-Growth algorithm, comprising:
An acquisition module, for scanning the database in the Spark cluster to obtain the frequent itemsets;
A grouping module, for grouping the frequent itemsets;
A board module, for equipping each node in the Spark cluster with an FPGA board;
A tree-building module, for building an FP-tree on the FPGA board for each group of frequent itemsets;
A tree-mining module, for recursively mining, on the FPGA board, each FP-tree so built;
An object module, for merging the recursive-mining results of all groups.
The improved method for an FPGA-based FP-Growth algorithm provided by the present invention has the following advantages:
1. The present invention adds FPGAs to an existing Spark cluster by equipping each cluster node with an FPGA board. Because FPGA boards offer outstanding advantages such as high performance, low power consumption, ease of programming and dynamic reconfigurability, they are a new kind of heterogeneous computing accelerator. The FPGA board and the original node together form a new Spark cluster node that serves the whole Spark cluster and raises the computing capability of a single cluster node, while Spark's own parallel computing framework is retained. This effectively improves the overall performance of the FP-Growth algorithm in big-data environments. With the FPGA acting as an acceleration device that cooperates with the CPU to form a heterogeneous computing platform, the overall performance of the Spark cluster can be effectively improved;
2. The present invention combines FPGA with Spark. The tree-building and tree-mining parts, which are the most time-consuming and computation-heavy parts of the algorithm, are separated from the Spark source code and implemented and optimized on the FPGA, while the other, less computation-heavy parts of the algorithm, such as grouping and result merging, still run under Spark's original mechanism. This gives full play to the advantages of both, and on this basis the FP-Growth algorithm is optimized and improved, effectively raising its computing performance.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention.
Fig. 1 is a first schematic flowchart of the improved method for an FPGA-based FP-Growth algorithm according to Embodiment 1 of the present invention.
Fig. 2 is a second schematic flowchart of the improved method for an FPGA-based FP-Growth algorithm according to Embodiment 2 of the present invention.
Fig. 3 is a schematic structural diagram of the improved device for an FPGA-based FP-Growth algorithm according to Embodiment 3 of the present invention.
Detailed description
Some technical terms used in the present invention are explained below:
Frequent item: an element/item that occurs frequently across multiple sets.
Frequent itemset: given a series of sets that share some elements, the elements that occur with high frequency form a subset; an itemset that satisfies a certain threshold condition is a frequent itemset.
Conditional pattern base: the set of ancestor paths of all nodes of the same frequent item in the FP-tree.
Spark: a general-purpose parallel framework that supports in-memory distributed datasets. Besides providing interactive queries, it can also optimize iterative workloads. It has the advantages of Hadoop MapReduce, but unlike MapReduce, intermediate job output can be kept in memory, so HDFS no longer needs to be read and written.
FPGA: Field-Programmable Gate Array, a logic device developed further from programmable devices such as PAL, GAL and CPLD.
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The present invention is described in detail below with reference to the drawings and embodiments.
Embodiment 1
With reference to Fig. 1, an improved method for an FPGA-based FP-Growth algorithm of the present invention comprises the following steps:
S101: Scanning the database in the Spark cluster to obtain the frequent itemsets;
S102: Grouping the frequent itemsets;
S103: Equipping each node in the Spark cluster with an FPGA board;
S104: Building an FP-tree on the FPGA board for each group of frequent itemsets;
S105: Recursively mining, on the FPGA board, each FP-tree so built;
S106: Merging the recursive-mining results of all groups.
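The overall shape of steps S104 to S106, in which each group is processed independently and the per-group results are then merged, can be sketched as follows. Everything here is a software stand-in: `mine_group` merely counts item supports in place of the FPGA tree-building and mining stage, a thread pool stands in for the cluster nodes, and all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def mine_group(transactions, group):
    """Stand-in for the per-group FPGA stage: count supports of the group's items."""
    members = set(group)
    counts = {}
    for transaction in transactions:
        for item in set(transaction) & members:
            counts[item] = counts.get(item, 0) + 1
    return counts

def run_groups(transactions, groups):
    """Process every group in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        partials = list(pool.map(lambda g: mine_group(transactions, g), groups))
    merged = {}
    for partial in partials:
        # Groups are disjoint in this sketch, so keys never collide.
        merged.update(partial)
    return merged

transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
merged = run_groups(transactions, [["a", "c"], ["b"]])
```

Because no group depends on another group's output, the only synchronisation point is the final merge, which matches the division of labour between the accelerated per-group stage and the lightweight merging stage described in the text.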
Embodiment 2
With reference to Fig. 2, an improved method for an FPGA-based FP-Growth algorithm of the present invention comprises the following steps:
S201: Scanning the database in the Spark cluster to obtain the frequent itemsets;
S202: Arranging the frequent 1-itemsets in descending order;
S203: Determining the number of groups according to the size of the database, and dividing the itemsets into several groups according to a preset grouping rule;
S204: Equipping each node in the Spark cluster with an FPGA board;
S205: Creating an FP-tree whose root node is NULL and a table Tab for storing node information;
S206: Inserting the data items of each processed transaction in the frequent-item table into the FP-tree in descending order, thereby constructing one path of the FP-tree;
S207: During this insertion, pointing the pointer in Tab to the node of the corresponding item, and incrementing the count of each node by 1;
S208: Traversing the FP-tree upward starting from the item at the tail of the Tab table, each traversal yielding the conditional pattern base of that item;
S209: Converting the conditional pattern base into a conditional FP-tree;
S210: Repeating steps S208 and S209 iteratively until the FP-tree contains a single item;
S211: For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the set of items on each path.
Embodiment 3
With reference to Fig. 3, the improved device for an FPGA-based FP-Growth algorithm of the present invention comprises: an acquisition module 101, a grouping module 102, a board module 103, a tree-building module 104, a tree-mining module 105 and an object module 106. The acquisition module 101 is connected in sequence to the grouping module 102, the board module 103, the tree-building module 104, the tree-mining module 105 and the object module 106.
The acquisition module 101 is used to scan the database in the Spark cluster and obtain the frequent itemsets; the grouping module 102 is used to group the frequent itemsets; the board module 103 is used to equip each node in the Spark cluster with an FPGA board; the tree-building module 104 is used to build an FP-tree on the FPGA board for each group of frequent itemsets; the tree-mining module 105 is used to recursively mine, on the FPGA board, each FP-tree so built; the object module 106 is used to merge the recursive-mining results of all groups.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the claims.
Claims (6)
1. An improved method for an FPGA-based FP-Growth algorithm, characterized by comprising the following steps:
Scanning the database in the Spark cluster to obtain the frequent itemsets;
Grouping the frequent itemsets;
Equipping each node in the Spark cluster with an FPGA board;
Building an FP-tree on the FPGA board for each group of frequent itemsets;
Recursively mining, on the FPGA board, each FP-tree so built;
Merging the recursive-mining results of all groups.
2. The improved method for an FPGA-based FP-Growth algorithm according to claim 1, characterized in that grouping the frequent itemsets comprises:
Arranging the frequent 1-itemsets in descending order;
Determining the number of groups according to the size of the database, and dividing the itemsets into several groups according to a preset grouping rule.
3. The improved method for an FPGA-based FP-Growth algorithm according to claim 1, characterized in that building an FP-tree for each group on the FPGA board comprises:
Creating an FP-tree whose root node is NULL and a table Tab for storing node information;
Inserting the data items of each processed transaction in the frequent-item table into the FP-tree in descending order, thereby constructing one path of the FP-tree;
During this insertion, pointing the pointer in Tab to the node of the corresponding item, and incrementing the count of each node by 1.
4. The improved method for an FPGA-based FP-Growth algorithm according to claim 3, characterized in that recursively mining each FP-tree on the FPGA board comprises:
A: Traversing the FP-tree upward starting from the item at the tail of the Tab table, each traversal yielding the conditional pattern base of that item;
B: Converting the conditional pattern base into a conditional FP-tree;
C: Repeating steps A and B iteratively until the FP-tree contains a single item.
5. The improved method for an FPGA-based FP-Growth algorithm according to claim 4, characterized in that merging the recursive-mining results of all groups comprises:
For each conditional FP-tree, generating all paths from the root node to the leaf nodes, and generating all non-empty subsets of the set of items on each path.
6. An improved device for an FPGA-based FP-Growth algorithm, based on the improved method for an FPGA-based FP-Growth algorithm according to any one of claims 1 to 5, characterized by comprising:
An acquisition module, for scanning the database in the Spark cluster to obtain the frequent itemsets;
A grouping module, for grouping the frequent itemsets;
A board module, for equipping each node in the Spark cluster with an FPGA board;
A tree-building module, for building an FP-tree on the FPGA board for each group of frequent itemsets;
A tree-mining module, for recursively mining, on the FPGA board, each FP-tree so built;
An object module, for merging the recursive-mining results of all groups.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710088020.2A CN106874479A (en) | 2017-02-19 | 2017-02-19 | The improved method and device of the FP Growth algorithms based on FPGA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710088020.2A CN106874479A (en) | 2017-02-19 | 2017-02-19 | The improved method and device of the FP Growth algorithms based on FPGA |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874479A (en) | 2017-06-20 |
Family
ID=59167465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710088020.2A Pending CN106874479A (en) | 2017-02-19 | 2017-02-19 | The improved method and device of the FP Growth algorithms based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874479A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107612682A (en) * | 2017-09-25 | 2018-01-19 | 郑州云海信息技术有限公司 | A kind of data processing method based on SHA512 algorithms, apparatus and system |
CN107612681A (en) * | 2017-09-25 | 2018-01-19 | 郑州云海信息技术有限公司 | A kind of data processing method based on SM3 algorithms, apparatus and system |
CN110209698A (en) * | 2019-05-13 | 2019-09-06 | 浙江大学 | A kind of textile design creative design method based on silk relics data |
CN112561089A (en) * | 2020-11-27 | 2021-03-26 | 成都飞机工业(集团)有限责任公司 | Correlation analysis and prediction method for vulnerable spare parts |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6665669B2 (en) * | 2000-01-03 | 2003-12-16 | Db Miner Technology Inc. | Methods and system for mining frequent patterns |
CN104679773A (en) * | 2013-11-29 | 2015-06-03 | 中国科学院深圳先进技术研究院 | Mass transaction data frequent itemset mining method and querying method |
CN104731925A (en) * | 2015-03-26 | 2015-06-24 | 江苏物联网研究发展中心 | MapReduce-based FP-Growth load balance parallel computing method |
CN105183875A (en) * | 2015-09-21 | 2015-12-23 | 南京邮电大学 | FP-Growth data mining method based on shared path |
CN105468750A (en) * | 2015-11-26 | 2016-04-06 | 央视国际网络无锡有限公司 | Data dimension reduction and compression method for correlation rule algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106874479A (en) | The improved method and device of the FP Growth algorithms based on FPGA | |
CN109359172B (en) | Entity alignment optimization method based on graph partitioning | |
CN103258049A (en) | Association rule mining method based on mass data | |
Liu et al. | Efficient mining of large maximal bicliques | |
Liu et al. | Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network | |
CN103116625A (en) | Volume radio direction finde (RDF) data distribution type query processing method based on Hadoop | |
CN104731925A (en) | MapReduce-based FP-Growth load balance parallel computing method | |
CN102073700A (en) | Discovery method of complex network community | |
Chen et al. | Metric similarity joins using MapReduce | |
Behrens et al. | Parallelizing an unstructured grid generator with a space-filling curve approach | |
Hu et al. | Output-optimal massively parallel algorithms for similarity joins | |
CN105589908A (en) | Association rule computing method for transaction set | |
CN108563715A (en) | A kind of distributed convergence method for digging and system | |
CN102663083A (en) | Large-scale social network information extraction method based on distributed computation | |
CN102915344A (en) | SQL (structured query language) statement processing method and device | |
CN106503811A (en) | A kind of infrastructure full life cycle management method based on big data | |
Sampath et al. | An efficient weighted rule mining for web logs using systolic tree | |
CN107590225A (en) | A kind of Visualized management system based on distributed data digging algorithm | |
CN104462095A (en) | Extraction method and device of common pars of query statements | |
US20200104425A1 (en) | Techniques for lossless and lossy large-scale graph summarization | |
Rajput et al. | An efficient parallel searching algorithm on Hypercube Interconnection network | |
Senthilkumar et al. | An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce | |
CN102708285A (en) | Coremedicine excavation method based on complex network model parallelizing PageRank algorithm | |
CN103885834A (en) | Pattern matching processor used in distributed environment | |
Abdolazimi et al. | Connected components of big graphs in fixed mapreduce rounds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |