CN104850866B - Self-reconfiguring K-means clustering implementation method based on SoC-FPGA - Google Patents


Info

Publication number
CN104850866B
CN104850866B CN201510308043.0A CN201510308043A CN104850866B CN 104850866 B CN104850866 B CN 104850866B CN 201510308043 A CN201510308043 A CN 201510308043A CN 104850866 B CN104850866 B CN 104850866B
Authority
CN
China
Prior art keywords
centroid
fpga
opencl
data
kernel
Prior art date
Legal status
Expired - Fee Related
Application number
CN201510308043.0A
Other languages
Chinese (zh)
Other versions
CN104850866A (en)
Inventor
蒲宇亮
黄乐天
彭军
贺江
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201510308043.0A
Publication of CN104850866A
Application granted
Publication of CN104850866B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a self-reconfiguring K-means clustering implementation method based on an SoC-FPGA, comprising the following steps. S1: build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate. S2: the ARM host builds the OpenCL host program, creates the kernels, and completes memory allocation and mapping. S3: the host program invokes the kernel program on the FPGA device and sends the data to the FPGA device. S4: a first OpenCL kernel program computes Euclidean distances in a parallel, pipelined manner and produces a distance matrix. S5: the device reconfigures itself with a second OpenCL kernel program, which selects the minimum element of each row and records its corresponding centroid. S6: the device reconfigures itself with a third OpenCL kernel program, which accumulates the distances of all sample points in each centroid's cluster and counts them. S7: the host program computes the new centroid data. S8: the host program performs the iteration check. The invention not only speeds up execution of the K-means clustering algorithm and achieves higher energy efficiency, but also, through kernel self-reconfiguration, addresses the shortage of FPGA hardware resources.

Description

Self-reconfiguring K-means clustering implementation method based on SoC-FPGA
Technical field
The present invention relates to the field of data mining, and in particular to a self-reconfiguring K-means clustering implementation method based on an SoC-FPGA.
Background
As one of the most common clustering algorithms, K-means is widely used in pattern recognition, machine learning, data mining, and related fields because it is simple and effective; specific applications include automatic document classification, determining the basis-function centers of neural networks, and segmentation of magnetic resonance images. K-means clustering is a form of unsupervised learning; its flow is shown in Fig. 1. The basic idea is to take K centroids in the space as references, assign each sample to a class according to its distance to each centroid, and then compute the new centroid of each class. The cluster centroids are updated over multiple iterations until they converge. A schematic of the K-means clustering process is shown in Fig. 2. The algorithm involves a large amount of computation: the sample-to-centroid distance calculations and the centroid updates in particular carry a heavy computational cost. In real data mining tasks the sample size is often very large, so the consumption of computing and storage resources is correspondingly high. Because the internal resources of an FPGA are limited, how to optimize the K-means clustering process in both time and space is an open problem in this field.
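Stated compactly, one iteration of the standard K-means procedure described above consists of an assignment step and an update step. The notation is introduced here for illustration and does not appear in the original text: $x_i$ are the $N$ samples, $c_k$ the $K$ centroids, $a_i$ the class assigned to sample $i$, and $S_k$ the set of samples assigned to centroid $k$.

```latex
\begin{aligned}
a_i &= \arg\min_{1 \le k \le K} \lVert x_i - c_k \rVert_2, && i = 1, \dots, N, \\
S_k &= \{\, i : a_i = k \,\}, && k = 1, \dots, K, \\
c_k^{\text{new}} &= \frac{1}{\lvert S_k \rvert} \sum_{i \in S_k} x_i, && k = 1, \dots, K.
\end{aligned}
```

For samples of dimension $d$, each iteration evaluates $N \cdot K$ distances, that is, on the order of $N \cdot K \cdot d$ multiply-accumulate operations, which is why the distance calculation dominates the cost for large sample sets.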
A search of the existing literature shows that work on optimizing K-means clustering with FPGAs concentrates mainly on time optimization, and that the implementations adopted suffer from long development cycles, poor cross-platform portability, and unsuitability for cooperative acceleration on multiprocessor heterogeneous platforms. In 2013, Kutty, Boussaid et al. published the article "A high speed configurable FPGA architecture for k-mean clustering" at the International Symposium on Circuits and Systems (ISCAS), which uses a globally configurable approach to accelerate the K-means clustering algorithm on an FPGA. Developing with a traditional hardware description language (HDL) in this way forces developers to spend a great deal of effort on hard-to-develop modules such as timing logic, so they cannot focus on the algorithm itself, and development efficiency is low. In addition, because HDL development targets only the FPGA platform, the resulting system has poor portability and compatibility.
In 2008, Apple first proposed an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems, the Open Computing Language (OpenCL). OpenCL suits cooperative parallel computing across different processors; the heterogeneous coprocessors it supports include CPUs, GPUs, and DSPs, and its code is highly portable and can easily be moved between different devices. In 2011, Altera announced its OpenCL development program for FPGAs, and in 2013 it launched FPGA products based on the OpenCL framework, extending the cross-platform parallel programming standard OpenCL to the FPGA domain.
In terms of programming model, the FPGA programs are developed entirely in OpenCL, a C/C++-style language, which makes development simple and modification flexible, greatly shortens the development cycle, and reduces the cost of maintaining and upgrading the product. In addition, because the new method is built on the OpenCL framework, the code can be ported quickly across platforms and is suited to extension to, and cooperative acceleration on, multiprocessor heterogeneous platforms. Moreover, because the FPGA device introduces a kernel self-reconfiguration mechanism, compilation can make full use of the FPGA-OpenCL optimization techniques provided by Altera, so that the FPGA hardware resources are fully exploited.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a self-reconfiguring K-means clustering implementation method based on an SoC-FPGA that speeds up execution of the K-means clustering algorithm and achieves higher energy efficiency, solving the problems of existing K-means implementations: heavy computation, high hardware resource usage, high power consumption, and long system latency.
The object of the present invention is achieved through the following technical solution: a self-reconfiguring K-means clustering implementation method based on an SoC-FPGA, comprising the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected by an on-chip AXI bus;
S2: The ARM host builds the OpenCL host program. The host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions for the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the on-chip AXI bus; the sample set data is stored in global memory and the centroid data is stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device reconfigures itself with the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program, processes the rows of the distance matrix in parallel using a merge search on each row, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device reconfigures itself with the third OpenCL kernel program, which, in parallel, accumulates the sample point distances within each centroid's cluster and counts the sample points in each cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated distance sum of each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, continue the clustering iteration and jump to step S2;
(2) If the result is less than the given criterion, the centroids have converged; no further clustering iterations are performed and the whole K-means clustering task is complete.
The self-reconfiguring K-means clustering implementation method based on an SoC-FPGA further includes a step S9 for releasing kernel and memory resources: after step S8 is complete, release all kernel and memory resources.
The comparison of the new centroids with the original centroids in step S8 characterizes the degree of difference by the variance between the new centroid data and the original centroid data; the given criterion is a known threshold.
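The host-side steps S7 and S8 can be sketched as follows. This is a minimal illustration in plain Python rather than the OpenCL host program of the invention. The function names `update_centroids` and `has_converged`, the per-dimension sums, the handling of empty clusters, and the use of the mean squared difference as the variance measure are all assumptions made for the example, since the text specifies only that a variance is compared with a known threshold.

```python
def update_centroids(sums, counts, old_centroids):
    """S7 (sketch): divide each cluster's accumulated sum by its sample count.

    sums[k] is the per-dimension accumulation for cluster k and counts[k] the
    number of samples assigned to it. An empty cluster keeps its old centroid
    (an assumption; the source does not say how empty clusters are handled).
    """
    new_centroids = []
    for total, count, old in zip(sums, counts, old_centroids):
        if count == 0:
            new_centroids.append(list(old))
        else:
            new_centroids.append([value / count for value in total])
    return new_centroids


def has_converged(new_centroids, old_centroids, threshold):
    """S8 (sketch): mean squared difference between new and old centroids,
    compared with a known threshold (assumed form of the variance measure)."""
    squared = [
        (n - o) ** 2
        for new, old in zip(new_centroids, old_centroids)
        for n, o in zip(new, old)
    ]
    variance = sum(squared) / len(squared)
    return variance < threshold


# Two clusters in 2-D: accumulated sums and counts as returned by the device.
old = [[0.0, 0.0], [10.0, 10.0]]
new = update_centroids([[2.0, 4.0], [30.0, 27.0]], [2, 3], old)
print(new)                            # [[1.0, 2.0], [10.0, 9.0]]
print(has_converged(new, old, 0.01))  # False: keep iterating
print(has_converged(new, new, 0.01))  # True: centroids no longer move
```

Keeping these two functions on the host matches the division of labour described in the text: they touch only K centroids per iteration, so they are cheap compared with the per-sample work done on the device.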
The beneficial effects of the invention are as follows:
(1)The present invention is directed to the problem of FPGA internal resources are insufficient in calculating process, devises the interior of FPGA device end Core via Self-reconfiguration mechanism, makes kernel module according to current task progress timesharing dynamic importing FPGA, so as to optimize FPGA hardware money The utilization ratio in source;Solving the optimization method for FPGA-OpenCL exploitations that the prior art provides includes vectorization and flowing water The technologies such as line duplication, although system can be made to obtain more powerful computing capability, the rise of hardware resource occupancy, can not even match somebody with somebody The problems such as putting.
(2)The SoC-FPGA systems that the present invention uses are made of two parts subsystem, be respectively ARM frameworks subsystem and FPGA architecture subsystem, since two systems are integrated on same chip, AXI on-chip bus high bandwidth characteristics will greatly shorten The communication delay of host and equipment, improves data throughout.
(3)The present invention realizes data calculation optimization by reasonable disposition calculation position:According to K-means clustering algorithms Characteristic, computational intensity is high and is adapted to parallel distance matrix to calculate, sample is sorted out, apart from moulds such as cumulative and sample size statistics Block is performed with kernel program form at FPGA ends, and barycenter renewal and the light calculation amount such as iteration control and is not easy parallel module and is existed ARM ends perform.
(4)Due to the fine granulation architecture of FPGA device, compiling only generates required logical construction, reduces system Energy consumption, has achieveed the purpose that high-performance low-power-consumption calculates.
(5)The present invention realizes that data memory access optimizes by way of reasonable disposition data are stored:OpenCL standards are provided Memory model include global memory, local memory and privately owned memory etc., since global memory possesses more than resource but accesses speed Degree is slow, and local memory access speed is fast but resource is less, and the relatively small number of data to be sorted of data volume are stored to local Deposit, the larger training set data of data volume is stored to global memory.
(6)Using OpenCL standard developments, system portability is strong, and compatibility is strong.
(7)For FPGA executive programs all using the OpenCL language developments of class C/C++ styles, exploitation is easy, and modification is flexible, The R&D cycle can be greatly shortened, reduces the R&D costs of product maintenance and upgrading.
Brief description of the drawings
Fig. 1 is a flow chart of the prior-art K-means clustering algorithm;
Fig. 2 is a schematic diagram of the prior-art K-means clustering process;
Fig. 3 is a flow chart of the method of the present invention;
Fig. 4 is the top-level module architecture diagram of the SoC-FPGA;
Fig. 5 is a schematic diagram of the merge search operation;
Fig. 6 is a flow chart of the iteration control on the ARM host.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
The system architecture of the invention is shown in Fig. 4. The ARM serves as the host and is connected to the FPGA device through the AXI bus; the high bandwidth of the on-chip AXI bus greatly shortens the communication latency between host and device and increases system throughput. Given the characteristics of the K-means clustering algorithm, the computationally intensive and parallel-friendly modules (distance matrix calculation, sample classification, distance accumulation, and sample counting) run as kernel programs on the FPGA, while the lightweight and hard-to-parallelize modules (centroid update and iteration control) run on the ARM.
The memory model defined by the OpenCL standard includes global memory, local memory, and private memory. Global memory is plentiful but slow to access, whereas local memory is fast but scarce, so the relatively small centroid data is stored in local memory and the larger sample set data is stored in global memory. By placing data appropriately in this way, the design optimizes memory access.
Throughout the system, the ARM host manages all computing resources on the platform; the host program manages the kernel programs by defining a context and command queues. Unlike traditional heterogeneous computing systems that achieve parallelism through SIMD (Single Instruction Multiple Data) execution, the SoC-FPGA in this design achieves parallelism through pipelining, which better handles modules with many branch and jump instructions, such as the merge search.
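The text names the row-wise minimum search only as a merge search (Fig. 5) and does not give its details. One plausible reading is a pairwise, tournament-style reduction, and the sketch below illustrates that reading in plain Python. The function name `merge_search_min` and the tie-breaking rule are assumptions, and the code models the comparison pattern only, not the pipelined hardware.

```python
def merge_search_min(row):
    """Return (min_value, index) of `row` by pairwise (tournament) reduction.

    Each pass compares neighbouring candidates and keeps the smaller one, so a
    row of K distances is reduced in about log2(K) passes. In hardware, the
    comparisons within a pass could run in parallel; here the passes are
    simply executed one after another.
    """
    if not row:
        raise ValueError("row must not be empty")
    # Pair every distance with the index of the centroid it belongs to.
    candidates = [(value, index) for index, value in enumerate(row)]
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates) - 1, 2):
            left, right = candidates[i], candidates[i + 1]
            # Tuples compare by value first, then index, so ties keep the lower index.
            next_round.append(min(left, right))
        if len(candidates) % 2 == 1:
            # An unpaired last candidate advances unchanged.
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]


distance_matrix = [
    [4.0, 1.5, 3.0],  # sample 0 is closest to centroid 1
    [0.5, 2.0, 2.5],  # sample 1 is closest to centroid 0
    [3.0, 3.0, 0.2],  # sample 2 is closest to centroid 2
]
labels = [merge_search_min(row)[1] for row in distance_matrix]
print(labels)  # [1, 0, 2]
```

Because every row is reduced independently, the rows of the distance matrix can be processed in parallel, which is consistent with the row-parallel processing the method describes.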
As shown in Fig. 3, the self-reconfiguring K-means clustering implementation method based on an SoC-FPGA comprises the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected by an on-chip AXI bus;
S2: The ARM host builds the OpenCL host program. The host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions for the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the on-chip AXI bus; the sample set data is stored in global memory and the centroid data is stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device reconfigures itself with the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program and, as shown in Fig. 5, processes the rows of the distance matrix in parallel using a merge search on each row, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device reconfigures itself with the third OpenCL kernel program, which, in parallel, accumulates the sample point distances within each centroid's cluster and counts the sample points in each cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated distance sum of each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The iteration control flow on the ARM host is shown in Fig. 6. The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, continue the clustering iteration and jump to step S2;
(2) If the result is less than the given criterion, the centroids have converged; no further clustering iterations are performed and the whole K-means clustering task is complete.
The self-reconfiguring K-means clustering implementation method based on an SoC-FPGA further includes a step S9 for releasing kernel and memory resources: after step S8 is complete, release all kernel and memory resources.
The comparison of the new centroids with the original centroids in step S8 characterizes the degree of difference by the variance between the new centroid data and the original centroid data; the given criterion is a known threshold.
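The device-side stages S4 to S6 can be sketched as three functions that run one after another, which mirrors the time-shared loading of the three kernels. This is a plain Python illustration under stated assumptions, not OpenCL kernel code: the names `distance_kernel`, `classify_kernel`, and `accumulate_kernel` are hypothetical, the built-in `min` stands in for the merge search, and the accumulation is taken to be the per-dimension sum of sample coordinates used by standard K-means, which is one reading of the distance accumulation described in S6.

```python
import math


def distance_kernel(samples, centroids):
    """S4 (sketch): Euclidean distance from every sample to every centroid.

    Returns a matrix with one row per sample and one column per centroid.
    """
    return [
        [math.sqrt(sum((s - c) ** 2 for s, c in zip(sample, centroid)))
         for centroid in centroids]
        for sample in samples
    ]


def classify_kernel(distance_matrix):
    """S5 (sketch): index of the smallest element in each row."""
    return [min(range(len(row)), key=row.__getitem__) for row in distance_matrix]


def accumulate_kernel(samples, labels, num_clusters):
    """S6 (sketch): per-cluster coordinate sums and sample counts."""
    dims = len(samples[0])
    sums = [[0.0] * dims for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for sample, label in zip(samples, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += sample[d]
    return sums, counts


samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
centroids = [[1.0, 1.0], [9.0, 9.0]]

# The three stages run strictly in sequence; each consumes the previous result.
distances = distance_kernel(samples, centroids)
labels = classify_kernel(distances)
sums, counts = accumulate_kernel(samples, labels, len(centroids))

print(labels)  # [0, 0, 1, 1]
print(sums)    # [[0.0, 2.0], [22.0, 20.0]]
print(counts)  # [2, 2]
```

Since only one stage is active at a time and each stage needs just the output of the one before it, the stages never have to be resident together, which is the property the self-reconfiguration mechanism relies on.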

Claims (3)

1. A self-reconfiguring K-means clustering implementation method based on an SoC-FPGA, characterized in that it comprises the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected by an on-chip AXI bus;
S2: The ARM host builds the OpenCL host program. The host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions for the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the on-chip AXI bus; the sample set data is stored in global memory and the centroid data is stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device reconfigures itself with the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program, processes the rows of the distance matrix in parallel using a merge search on each row, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device reconfigures itself with the third OpenCL kernel program, which, in parallel, accumulates the sample point distances within each centroid's cluster and counts the sample points in each cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated sum of distances to the centroid within each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, continue the clustering iteration and jump to step S2;
(2) If the result is less than the given criterion, the centroids have converged; no further clustering iterations are performed and the whole K-means clustering task is complete.
2. The self-reconfiguring K-means clustering implementation method based on an SoC-FPGA according to claim 1, characterized in that it further comprises a step S9 for releasing kernel and memory resources: after step S8 is complete, release all kernel and memory resources.
3. The self-reconfiguring K-means clustering implementation method based on an SoC-FPGA according to claim 1, characterized in that the comparison of the new centroids with the original centroids in step S8 characterizes the degree of difference by the variance between the new centroid data and the original centroid data; the given criterion is a known threshold.
CN201510308043.0A 2015-06-08 2015-06-08 Self-reconfiguring K-means clustering implementation method based on SoC-FPGA Expired - Fee Related CN104850866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510308043.0A CN104850866B (en) 2015-06-08 2015-06-08 Self-reconfiguring K-means clustering implementation method based on SoC-FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510308043.0A CN104850866B (en) 2015-06-08 2015-06-08 Self-reconfiguring K-means clustering implementation method based on SoC-FPGA

Publications (2)

Publication Number Publication Date
CN104850866A CN104850866A (en) 2015-08-19
CN104850866B true CN104850866B (en) 2018-05-01

Family

ID=53850501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510308043.0A Expired - Fee Related CN104850866B (en) 2015-06-08 2015-06-08 Self-reconfiguring K-means clustering implementation method based on SoC-FPGA

Country Status (1)

Country Link
CN (1) CN104850866B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631866B (en) * 2015-12-24 2019-04-05 武汉鸿瑞达信息技术有限公司 A kind of extraction calculation optimization method of the foreground target method based on heterogeneous platform
CN106354574A (en) * 2016-08-30 2017-01-25 浪潮(北京)电子信息产业有限公司 Acceleration system and method used for big data K-Mean clustering algorithm
CN106383695B (en) * 2016-09-14 2019-01-25 中国科学技术大学苏州研究院 The acceleration system and its design method of clustering algorithm based on FPGA
CN107703507B (en) * 2017-08-31 2020-04-10 西安空间无线电技术研究所 Target clustering implementation method and device based on FPGA
CN108280461B (en) * 2017-12-08 2020-04-14 西安电子科技大学 Rapid global K-means clustering method accelerated by OpenCL
CN108958852A (en) * 2018-07-16 2018-12-07 济南浪潮高新科技投资发展有限公司 A kind of system optimization method based on FPGA heterogeneous platform
CN111490946B (en) * 2019-01-28 2023-08-11 阿里巴巴集团控股有限公司 FPGA connection realization method and device based on OpenCL framework
CN113326479A (en) * 2021-05-28 2021-08-31 哈尔滨理工大学 FPGA-based K-means algorithm implementation method
CN114756880B (en) * 2022-04-14 2023-03-14 电子科技大学 Information hiding method and system based on FPGA

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2626801A3 (en) * 2012-02-09 2013-10-16 Altera Corporation Configuring a programmable device using high-level language
US8806403B1 (en) * 2013-06-21 2014-08-12 Altera Corporation Efficient configuration of an integrated circuit device using high-level language
CN104020983A (en) * 2014-06-16 2014-09-03 上海大学 KNN-GPU acceleration method based on OpenCL
CN104142845A (en) * 2014-07-21 2014-11-12 中国人民解放军信息工程大学 CT image reconstruction back projection acceleration method based on OpenCL-To-FPGA


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Fractal video compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms"; Doris Chen et al.; 《Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific》; 20130429; pp. 297-304 *
"Research on FPGA design optimization methods based on OpenCL" (基于OpenCL的FPGA设计优化方法研究); 范兴山 et al.; 《电子技术应用》; 20140116; Vol. 40, No. 1; pp. 16-19 *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180501

Termination date: 20190608