CN104850866B

CN104850866B - Method for implementing self-reconfigurable K-means clustering based on SoC-FPGA

Info

Publication number: CN104850866B
Application number: CN201510308043.0A
Authority: CN
Inventors: 蒲宇亮; 黄乐天; 彭军; 贺江
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2015-06-08
Filing date: 2015-06-08
Publication date: 2018-05-01
Anticipated expiration: 2035-06-08
Also published as: CN104850866A

Abstract

The invention discloses a method for implementing self-reconfigurable K-means clustering based on SoC-FPGA, comprising the following steps: S1: build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; S2: the ARM host builds an OpenCL host program, creates the kernels, and completes memory allocation and mapping; S3: the host program invokes the kernel program on the FPGA device and sends the data to the FPGA device; S4: a first OpenCL kernel program computes Euclidean distances in a parallel, pipelined manner, producing a distance matrix; S5: a second OpenCL kernel program is loaded by self-reconfiguration, selects the minimum element of each row, and records its corresponding centroid; S6: a third OpenCL kernel program is loaded by self-reconfiguration and accumulates the distances of, and counts, all sample points in each centroid cluster; S7: the host program computes the new centroid data; S8: the host program makes the iteration decision. The invention not only improves the execution speed of the K-means clustering algorithm and achieves higher energy efficiency, but also solves the problem of insufficient FPGA hardware resources through kernel self-reconfiguration.

Description

Method for implementing self-reconfigurable K-means clustering based on SoC-FPGA

Technical field

The present invention relates to the field of data mining, and more particularly to a technique for implementing self-reconfigurable K-means clustering based on SoC-FPGA.

Background technology

As the most common clustering algorithm, K-means is widely used in pattern recognition, machine learning, data mining and other fields because it is simple and effective; specific applications include automatic document classification, determining the basis-function centers of neural networks, and segmenting magnetic resonance images. K-means clustering is a form of unsupervised learning, and its flow is shown in Fig. 1. The basic idea is to take K centroids in the space as references, assign each sample to a class according to its distance from each centroid, and then compute the new centroid of each current class. The cluster centroids are updated over many iterations until they converge. A schematic diagram of the K-means clustering process is shown in Fig. 2. The algorithm clearly involves a large amount of computation; computing the sample-to-centroid distances and updating the centroids in particular incur a very large computational overhead. In practical data mining tasks the number of samples is often huge, so the corresponding consumption of computing and storage resources is very large. Because FPGA internal resources are limited, optimizing the K-means clustering process in both time and space is an open research problem in this field.
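To make the iteration described above concrete, the following is a minimal pure-Python sketch of the standard (Lloyd-style) K-means loop running on a CPU. It is a reference model only, not the SoC-FPGA implementation of this invention; the function names, the caller-supplied initial centroids, the centroid-shift stopping rule and the `threshold` and `max_iter` parameters are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, centroids, threshold=1e-6, max_iter=100):
    """Plain Lloyd-style K-means: assign, update, repeat until centroids stop moving."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment: each sample joins the cluster of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: euclidean(s, centroids[j]))
                  for s in samples]
        # Update: each centroid becomes the mean of the samples assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        # Convergence: stop once no centroid moved more than the threshold.
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break
    return centroids, labels
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` assigns the labels `[0, 0, 1, 1]` and converges to the centroids `[[0.0, 0.5], [10.0, 10.5]]` on the second pass.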

A search of the existing literature shows that articles using FPGAs to optimize K-means clustering concentrate mainly on time optimization, and that the implementations adopted suffer from drawbacks such as long development cycles, poor cross-platform portability, and unsuitability for cooperative acceleration on multiprocessor heterogeneous platforms. In 2013, Kutty, Boussaid et al. published the article "A high speed configurable FPGA architecture for k-mean clustering" at the International Symposium on Circuits and Systems (ISCAS), accelerating the K-means clustering algorithm on an FPGA with a globally configurable method. Developing in a traditional HDL hardware language in this way forces developers to spend great effort on difficult modules such as sequential logic instead of concentrating on the algorithm itself, which makes development inefficient. In addition, because HDL hardware languages can only target FPGA platforms, the resulting systems have low portability and compatibility.

In 2008, Apple first proposed an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems, named Open Computing Language (OpenCL). OpenCL supports cooperative parallel computing across different processors; the heterogeneous coprocessors it supports include CPUs, GPUs, DSPs and others, and its code is general enough to be ported easily between devices. In 2011, Altera announced its OpenCL development program for FPGAs, and in 2013 it released FPGA products based on the OpenCL framework, extending the cross-platform parallel programming standard OpenCL into the FPGA domain.

In terms of programming model, all FPGA programs are developed in the C/C++-style OpenCL language, which makes development easy and modification flexible, greatly shortens the R&D cycle, and reduces the R&D cost of product maintenance and upgrades. In addition, because the new method is based on the OpenCL framework, the code can be ported rapidly across platforms and is suitable for extension to cooperative acceleration on multiprocessor heterogeneous platforms. At the same time, because a kernel self-reconfiguration mechanism is introduced on the FPGA device, the optimization techniques for FPGA-OpenCL development provided by Altera can be fully used at compile time, so that the FPGA hardware resources are fully exploited.

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art by providing a method for implementing self-reconfigurable K-means clustering based on SoC-FPGA that improves the execution speed of the K-means clustering algorithm and achieves higher energy efficiency, solving problems of existing K-means clustering implementations such as a large amount of computation, high hardware resource occupancy, high power consumption, and large system latency.

The object of the present invention is achieved by the following technical solution: a method for implementing self-reconfigurable K-means clustering based on SoC-FPGA, comprising the following steps:

S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which the ARM host and the FPGA device cooperate; the ARM host configures the environment parameters and completes initialization; the ARM host and the FPGA device are connected by an AXI on-chip bus;

S2: The ARM host builds an OpenCL host program; the host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the ARM host completes the memory mapping between itself and the OpenCL device by passing parameters;

S3: The host program on the ARM host configures the number of work-groups, the work-group size and the compute-unit dimensions of the FPGA device, and invokes the kernel program on the FPGA; the sample set data and the initial centroid data are transferred to the FPGA device over the AXI on-chip bus, where the sample set data are stored in global memory and the centroid data are stored in local memory;
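The work-group configuration in S3 can be illustrated with a small sizing helper. This is a sketch under assumptions: a one-dimensional NDRange over the samples and a hypothetical helper name `ndrange_config`; the patent does not state its actual work-group sizes. Under OpenCL 1.x the global work size must be a multiple of the work-group size, so the global size is rounded up and the surplus work-items must exit early inside the kernel.

```python
def ndrange_config(n_items, local_size):
    """Return (global_size, num_groups) for a hypothetical 1-D NDRange.

    The global size is rounded up to a multiple of local_size; work-items
    whose global index is >= n_items are expected to return immediately.
    """
    if n_items <= 0 or local_size <= 0:
        raise ValueError("n_items and local_size must be positive")
    num_groups = -(-n_items // local_size)  # ceiling division
    return num_groups * local_size, num_groups
```

For 1000 samples and work-groups of 64, `ndrange_config(1000, 64)` gives a global size of 1024 in 16 work-groups, leaving 24 idle work-items.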

S4: The FPGA device builds a first OpenCL kernel program, which computes, in a parallel, pipelined manner, the Euclidean distance from each sample in the sample set to each centroid, producing a distance matrix;
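As a behavioural model of S4, the sketch below computes the N x K distance matrix (row = sample, column = centroid) in plain Python, with `distance_work_item` standing for what one work-item at index (i, j) would compute. It is a sequential emulation with assumed names, not the patent's OpenCL kernel; the point it shows is that every (i, j) entry is independent of the others, which is what allows the device to pipeline and parallelize them.

```python
import math


def distance_work_item(samples, centroids, i, j):
    """What one hypothetical work-item (i, j) computes: distance from sample i to centroid j."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(samples[i], centroids[j])))


def distance_matrix(samples, centroids):
    """Sequentially emulate a 2-D NDRange of size N x K."""
    return [[distance_work_item(samples, centroids, i, j) for j in range(len(centroids))]
            for i in range(len(samples))]
```

For example, `distance_matrix([[0, 0], [3, 4]], [[0, 0], [3, 0]])` returns `[[0.0, 3.0], [5.0, 4.0]]`.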

S5: The FPGA device loads a second OpenCL kernel program by self-reconfiguration; the second OpenCL kernel program receives the distance matrix produced by the first OpenCL kernel program and processes every row of the distance matrix in parallel using a merge search, selecting the minimum element of each row and recording its corresponding centroid, which completes the classification of the sample set;
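The merge search of S5 can be modelled as a pairwise (tree) reduction over one row: at each level adjacent (value, index) pairs are compared and the smaller one survives, so a row of K distances needs about log2(K) levels, and the comparisons within one level are independent of each other. The sketch below assumes that ties go to the lower centroid index; the function names are hypothetical, and the code is a reference model rather than the FPGA kernel.

```python
def merge_argmin(row):
    """Return (min_value, column_index) of one distance-matrix row by pairwise reduction."""
    if not row:
        raise ValueError("row must not be empty")
    candidates = [(value, idx) for idx, value in enumerate(row)]
    while len(candidates) > 1:
        merged = []
        # Comparisons within one level are independent, so a level can run in parallel.
        for k in range(0, len(candidates) - 1, 2):
            left, right = candidates[k], candidates[k + 1]
            merged.append(left if left[0] <= right[0] else right)  # ties keep the lower index
        if len(candidates) % 2:
            merged.append(candidates[-1])  # an odd leftover passes to the next level
        candidates = merged
    return candidates[0]


def classify(dist_matrix):
    """Label each sample (row) with the index of its nearest centroid (column)."""
    return [merge_argmin(row)[1] for row in dist_matrix]
```

For the row `[5, 2, 7, 2, 9]`, the levels reduce five candidates to three, then two, then one, and `merge_argmin` returns `(2, 1)`.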

S6: The FPGA device loads a third OpenCL kernel program by self-reconfiguration; the third OpenCL kernel program accumulates, in parallel, the sample-point distances in each centroid cluster, counts the number of sample points in each centroid cluster, and returns the results to the ARM host over the AXI bus;
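A sequential sketch of the S6 statistics follows. It assumes that the quantity accumulated per cluster is the coordinate vector of each sample point, since dividing that sum by the member count in S7 yields the cluster mean; this interpretation and the name `accumulate_clusters` are assumptions. A parallel device version would typically keep per-work-group partial sums and combine them afterwards, which this sketch does not model.

```python
def accumulate_clusters(samples, labels, k):
    """Per-cluster coordinate sums and member counts for k clusters."""
    dims = len(samples[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, label in zip(samples, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += point[d]
    return sums, counts
```

For example, `accumulate_clusters([[1, 1], [3, 3], [10, 0]], [0, 0, 1], 2)` returns the sums `[[4.0, 4.0], [10.0, 0.0]]` and the counts `[2, 1]`.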

S7: The host program on the ARM host divides the accumulated distance of each centroid cluster by the number of sample points in that cluster to compute the new centroid data;
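On the host, step S7 then reduces to one division per coordinate. The sketch below assumes that an empty cluster (count 0) keeps its previous centroid, a common convention that the patent does not specify; the function name is hypothetical.

```python
def update_centroids(sums, counts, old_centroids):
    """New centroid = accumulated sum / member count; empty clusters keep the old centroid."""
    new_centroids = []
    for s, n, old in zip(sums, counts, old_centroids):
        new_centroids.append([v / n for v in s] if n > 0 else list(old))
    return new_centroids
```

With the sums `[[4.0, 4.0], [10.0, 0.0], [0.0, 0.0]]` and counts `[2, 1, 0]`, the first two centroids become `[2.0, 2.0]` and `[10.0, 0.0]`, and the third, empty cluster keeps its old value.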

S8: The host program on the ARM host computes the difference between the new centroids and the original centroids and compares it with a given standard:

(1) if the result is greater than the given standard, clustering iteration continues and the method jumps to step S2;

(2) if the result is less than the given standard, the centroids have converged, no further clustering iteration is performed, and the whole K-means clustering task is complete.

The method for implementing self-reconfigurable K-means clustering based on SoC-FPGA further includes a step S9 of releasing kernel and memory resources: after step S8 is completed, all kernels and memory resources are released.

In step S8, the difference between the new centroids and the original centroids is measured by the variance between the new centroid data and the original centroid data, which characterizes the degree of difference; the given standard is a known threshold.
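A minimal sketch of the S8 decision follows. It assumes that the "variance" between new and old centroids is the mean squared difference over all centroid coordinates; the exact statistic, the helper names, and the treatment of a result exactly equal to the threshold (treated here as converged) are assumptions.

```python
def centroid_variance(new_centroids, old_centroids):
    """Mean squared difference between corresponding centroid coordinates."""
    diffs = [(a - b) ** 2
             for new, old in zip(new_centroids, old_centroids)
             for a, b in zip(new, old)]
    return sum(diffs) / len(diffs)


def should_continue(new_centroids, old_centroids, threshold):
    """True: iterate again (back to S2). False: centroids have converged."""
    return centroid_variance(new_centroids, old_centroids) > threshold
```

For one centroid moving from `[1.0, 0.0]` to `[1.0, 2.0]` the squared differences are 0 and 4, so the variance is 2.0: iteration continues under a threshold of 1.0 and stops under a threshold of 3.0.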

The beneficial effects of the present invention are as follows:

(1) To address the shortage of FPGA internal resources during computation, the present invention designs a kernel self-reconfiguration mechanism on the FPGA device, so that kernel modules are loaded into the FPGA dynamically, in a time-shared manner, according to the progress of the current task, which improves the utilization of FPGA hardware resources. This solves a problem of the optimization methods for FPGA-OpenCL development provided by the prior art, such as vectorization and pipeline replication: although these give the system more computing power, they raise hardware resource occupancy and can even make the design impossible to configure onto the device.
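The resource argument in (1) can be illustrated with a deliberately simplified model: a static configuration keeps all kernels resident at once, so their resource costs add up, whereas time-shared self-reconfiguration keeps only the kernel needed by the current step, so the peak is the largest single kernel. The function names and the cost figures in the usage note are hypothetical, and the model ignores reconfiguration time and any logic shared between kernels.

```python
def peak_resources(kernel_costs, time_shared):
    """Peak FPGA resources needed by a set of kernels under a simplified model.

    Static: every kernel is resident at once, so costs add up.
    Time-shared: one kernel is resident at a time, so the peak is the largest one.
    """
    return max(kernel_costs) if time_shared else sum(kernel_costs)


def fits(kernel_costs, capacity, time_shared):
    """Whether the kernels fit on a device with the given resource capacity."""
    return peak_resources(kernel_costs, time_shared) <= capacity
```

With three hypothetical kernels costing 60, 30 and 40 units on a 100-unit device, the static configuration needs 130 units and does not fit, while the time-shared configuration peaks at 60 units and fits.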

(2) The SoC-FPGA system used in the present invention consists of two subsystems, an ARM-architecture subsystem and an FPGA-architecture subsystem. Because the two are integrated on the same chip, the high bandwidth of the AXI on-chip bus greatly shortens the communication latency between host and device and improves data throughput.

(3) The present invention optimizes computation by assigning each computation to a suitable location: according to the characteristics of the K-means clustering algorithm, the modules that are computationally intensive and suited to parallelism, such as distance-matrix computation, sample classification, distance accumulation and sample counting, are executed on the FPGA as kernel programs, while the modules that involve little computation and are hard to parallelize, such as centroid update and iteration control, are executed on the ARM.

(4) Owing to the fine-grained architecture of the FPGA device, compilation generates only the logic structures that are needed, which reduces system energy consumption and achieves high-performance, low-power computing.

(5) The present invention optimizes data access by placing data in suitable storage: the memory model provided by the OpenCL standard includes global memory, local memory, private memory and so on. Global memory offers large capacity but slow access, whereas local memory is fast but small; therefore the data to be classified, which are relatively small in volume, are stored in local memory, and the training set data, which are large in volume, are stored in global memory.

(6) Because the system is developed with the OpenCL standard, it has strong portability and compatibility.

(7) All FPGA programs are developed in the C/C++-style OpenCL language, so development is easy and modification is flexible, the R&D cycle is greatly shortened, and the R&D cost of product maintenance and upgrades is reduced.

Brief description of the drawings

Fig. 1 is a flow chart of the K-means clustering algorithm in the prior art;

Fig. 2 is a schematic diagram of the K-means clustering process in the prior art;

Fig. 3 is a flow chart of the method of the present invention;

Fig. 4 is a diagram of the SoC-FPGA top-level module architecture;

Fig. 5 is a schematic diagram of the merge search;

Fig. 6 is a flow chart of the iteration control on the ARM host.

Embodiment

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

The system architecture of the present invention is shown in Fig. 4. The ARM acts as the host and is connected to the FPGA device through the AXI bus; the high bandwidth of the AXI on-chip bus greatly shortens the communication latency between host and device and improves system throughput. According to the characteristics of the K-means clustering algorithm, the modules that are computationally intensive and suited to parallelism, such as distance-matrix computation, sample classification, distance accumulation and sample counting, are executed on the FPGA as kernel programs, while the modules that involve little computation and are hard to parallelize, such as centroid update and iteration control, are executed on the ARM.

The memory model provided by the OpenCL standard includes global memory, local memory, private memory and so on. Global memory offers large capacity but slow access, whereas local memory is fast but small; therefore the centroid data, which are relatively small in volume, are stored in local memory, and the sample set data, which are large in volume, are stored in global memory. By placing data storage appropriately in this way, the design optimizes data access.
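The placement rule above can be sketched as a sizing check. The function name, the 4-byte element size (single-precision floats) and the 32 KiB local-memory budget in the usage note are assumptions; on a real device the budget would be obtained by querying `CL_DEVICE_LOCAL_MEM_SIZE` through `clGetDeviceInfo`.

```python
def place_buffers(n_samples, k, dims, local_mem_bytes, elem_bytes=4):
    """Choose OpenCL address spaces for the sample set and the centroid table.

    The large sample set always goes to global memory; the small centroid
    table goes to local memory whenever it fits within the local budget.
    """
    centroid_bytes = k * dims * elem_bytes
    return {
        "samples": "global",
        "centroids": "local" if centroid_bytes <= local_mem_bytes else "global",
        "centroid_bytes": centroid_bytes,
        "sample_bytes": n_samples * dims * elem_bytes,
    }
```

With 100,000 samples, K = 8 and 16 dimensions, the centroid table occupies 512 bytes and fits in a 32 KiB local memory, whereas the sample set occupies 6,400,000 bytes and stays in global memory.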

In the whole system, the ARM host manages all computing resources on the platform, and the host program manages the kernel programs by defining a context and a command queue. Unlike traditional heterogeneous computing systems that achieve parallelism through SIMD (Single Instruction Multiple Data), the SoC-FPGA used in this design achieves parallelism through pipelining, and can therefore better handle modules that contain many branch instructions, such as the merge search.

As shown in Fig. 3, the method for implementing self-reconfigurable K-means clustering based on SoC-FPGA comprises the following steps:

S5: The FPGA device loads a second OpenCL kernel program by self-reconfiguration; the second OpenCL kernel program receives the distance matrix produced by the first OpenCL kernel program and, as shown in Fig. 5, processes every row of the distance matrix in parallel using a merge search, selecting the minimum element of each row and recording its corresponding centroid, which completes the classification of the sample set;

S8: The iteration-control flow of the ARM host is shown in Fig. 6; the host program on the ARM host computes the difference between the new centroids and the original centroids and compares the result:

Claims

1. A method for implementing self-reconfigurable K-means clustering based on SoC-FPGA, characterized in that it comprises the following steps:

S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which the ARM host and the FPGA device cooperate; the ARM host configures the environment parameters and completes initialization; the ARM host and the FPGA device are connected by an AXI on-chip bus;

S2: The ARM host builds an OpenCL host program; the host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the ARM host completes the memory mapping between itself and the OpenCL device by passing parameters;

S3: The host program on the ARM host configures the number of work-groups, the work-group size and the compute-unit dimensions of the FPGA device, and invokes the kernel program on the FPGA; the sample set data and the initial centroid data are transferred to the FPGA device over the AXI on-chip bus, where the sample set data are stored in global memory and the centroid data are stored in local memory;

S4: The FPGA device builds a first OpenCL kernel program, which computes, in a parallel, pipelined manner, the Euclidean distance from each sample in the sample set to each centroid, producing a distance matrix;

S5: The FPGA device loads a second OpenCL kernel program by self-reconfiguration; the second OpenCL kernel program receives the distance matrix produced by the first OpenCL kernel program and processes every row of the distance matrix in parallel using a merge search, selecting the minimum element of each row and recording its corresponding centroid, which completes the classification of the sample set;

S6: The FPGA device loads a third OpenCL kernel program by self-reconfiguration; the third OpenCL kernel program accumulates, in parallel, the sample-point distances in each centroid cluster, counts the number of sample points in each centroid cluster, and returns the results to the ARM host over the AXI bus;

S7: The host program on the ARM host divides the accumulated distance of each centroid cluster by the number of sample points in that cluster to compute the new centroid data;

(2) if the result is less than the given standard, the centroids have converged, no further clustering iteration is performed, and the whole K-means clustering task is complete.

2. The method for implementing self-reconfigurable K-means clustering based on SoC-FPGA according to claim 1, characterized in that it further comprises a step S9 of releasing kernel and memory resources: after step S8 is completed, all kernels and memory resources are released.

3. The method for implementing self-reconfigurable K-means clustering based on SoC-FPGA according to claim 1, characterized in that, in step S8, the difference between the new centroids and the original centroids is measured by the variance between the new centroid data and the original centroid data, which characterizes the degree of difference; the given standard is a known threshold.