CN104850866B - Self-reconfigurable K-means clustering implementation method based on SoC-FPGA - Google Patents
- Publication number
- CN104850866B (application CN201510308043.0A)
- Authority
- CN
- China
- Prior art keywords
- barycenter
- fpga
- opencl
- data
- kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for implementing self-reconfigurable K-means clustering on an SoC-FPGA, comprising the following steps: S1: build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; S2: the ARM host builds the OpenCL host program, creates the kernels, and completes memory allocation and mapping; S3: the host program invokes the kernel program on the FPGA device and sends the data to the FPGA device; S4: a first OpenCL kernel program computes the Euclidean distances in a parallel, pipelined manner and produces a distance matrix; S5: the device self-reconfigures into a second OpenCL kernel program, which selects the minimum element of each row and records its corresponding centroid; S6: the device self-reconfigures into a third OpenCL kernel program, which accumulates the distances and counts the sample points in each centroid's cluster; S7: the host program computes the new centroid data; S8: the host program performs the iteration check. The invention not only increases the execution speed of the K-means clustering algorithm and achieves higher energy efficiency, but also, through kernel self-reconfiguration, addresses the shortage of FPGA hardware resources.
Description
Technical field
The present invention relates to the field of data mining, and more particularly to a method for implementing self-reconfigurable K-means clustering on an SoC-FPGA.
Background art
K-means, the most commonly used clustering algorithm, is simple and effective and is therefore widely used in pattern recognition, machine learning, data mining, and related fields. Specific applications include automatic document classification, determining the basis-function centers of neural networks, and segmentation of magnetic resonance images. K-means clustering is a form of unsupervised learning, and its flow is shown in Fig. 1. The basic idea is to take K centroids in the space as references, assign each sample to a class according to its distance from each centroid, and then compute a new centroid for each class. The cluster centroids are updated over multiple iterations until they converge. A schematic of the K-means clustering process is shown in Fig. 2.
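In symbols, one iteration alternates an assignment step and an update step. The notation is introduced here only for illustration and does not appear in the original: $x_i$ is a sample, $\mu_j$ is the centroid of cluster $j$, $c_i$ is the cluster index assigned to $x_i$, and $C_j$ is the set of samples currently assigned to cluster $j$.

$$
c_i = \arg\min_{j \in \{1,\dots,K\}} \lVert x_i - \mu_j \rVert_2,
\qquad
C_j = \{\, x_i : c_i = j \,\},
\qquad
\mu_j \leftarrow \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
$$

The assignment step needs one distance per sample and centroid pair, which is why the distance computation dominates the cost when the sample set is large.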
The algorithm shows that K-means involves a large amount of computation; the sample-to-centroid distance calculations and the centroid updates in particular incur a very large computational cost. In practical data mining tasks the sample size is often huge, so the consumption of computing and storage resources is correspondingly large. Because FPGA internal resources are limited, optimizing the K-means clustering process in both time and space is a current research problem in this field.
A search of the existing literature shows that articles on optimizing K-means clustering with FPGAs concentrate mainly on time optimization, and the implementations adopted suffer from long development cycles, poor cross-platform portability, and unsuitability for cooperative acceleration on multiprocessor heterogeneous platforms. In 2013, Kutty, Boussaid, et al. published the article "A high speed configurable FPGA architecture for k-mean clustering" at the International Symposium on Circuits and Systems (ISCAS), which accelerates the K-means clustering algorithm on an FPGA using a globally configurable approach. Developing in a traditional HDL hardware description language in this way forces developers to spend a great deal of effort on difficult modules such as timing logic, so they cannot concentrate on the algorithm itself, and development efficiency is low. In addition, because HDL code can only target FPGA platforms, the portability and compatibility of the system are poor.
In 2008, Apple first proposed a royalty-free open standard for general-purpose parallel programming of heterogeneous systems, named Open Computing Language (OpenCL). OpenCL is suited to cooperative parallel computing across different processors; the heterogeneous coprocessors it supports include CPUs, GPUs, and DSPs, and its code is highly general and can easily be ported between devices. In 2011, Altera announced its OpenCL development program for FPGAs, and in 2013 it launched FPGA products based on the OpenCL framework, extending the cross-platform parallel programming standard OpenCL to the FPGA domain.
In terms of programming model, the programs executed on the FPGA are all developed in the C/C++-style OpenCL language, which is easy to develop in and flexible to modify; this can greatly shorten the development cycle and reduce the cost of product maintenance and upgrades. Moreover, because the new method is based on the OpenCL framework, the code can be ported quickly across platforms and is suitable for extension to, and cooperative acceleration on, multiprocessor heterogeneous platforms. At the same time, because a kernel self-reconfiguration mechanism is introduced on the FPGA device, compilation can make full use of the optimization techniques that Altera provides for FPGA-OpenCL development, so that the FPGA hardware resources are fully exploited.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a method for implementing self-reconfigurable K-means clustering on an SoC-FPGA that increases the execution speed of the K-means clustering algorithm and achieves higher energy efficiency, solving the problems of existing K-means implementations, such as heavy computation, high hardware resource usage, high power consumption, and long system latency.
The object of the present invention is achieved through the following technical solution: a method for implementing self-reconfigurable K-means clustering on an SoC-FPGA, comprising the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected through the AXI on-chip bus;
S2: The ARM host builds the OpenCL host program; the host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions of the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the AXI on-chip bus, where the sample set data are stored in global memory and the centroid data are stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device self-reconfigures into the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program, processes the rows of the matrix in parallel using a merge search, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device self-reconfigures into the third OpenCL kernel program, which, working in parallel, accumulates the distances of the sample points in each centroid's cluster, counts the sample points in each centroid's cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated distance sum of each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, clustering continues and the method jumps to step S2;
(2) If the result is less than the given criterion, the centroids have converged, no further clustering iterations are performed, and the whole K-means clustering task is complete.
The method for implementing self-reconfigurable K-means clustering on an SoC-FPGA further comprises a step S9 of releasing kernel and memory resources: after step S8 is completed, all kernel and memory resources are released.
The difference comparison between the new and original centroids in step S8 uses the variance between the new centroid data and the original centroid data to characterize the degree of difference; the given criterion is a known threshold.
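One possible reading of the variance-based comparison in step S8 is sketched below in Python. The function names, the use of the mean squared coordinate difference as the "variance", and the handling of a result exactly equal to the threshold are all assumptions made for illustration; the text specifies only that a result above the threshold continues the iteration and a result below it ends it.

```python
def mean_squared_shift(old_centroids, new_centroids):
    """Mean squared coordinate difference between old and new centroids.

    Assumption: this stands in for the 'variance' used to characterize
    how far the centroids moved; the source does not give the formula.
    """
    squared = [
        (new - old) ** 2
        for old_c, new_c in zip(old_centroids, new_centroids)
        for old, new in zip(old_c, new_c)
    ]
    return sum(squared) / len(squared)


def has_converged(old_centroids, new_centroids, threshold):
    """True when the shift is below the threshold (S8, case 2).

    Assumption: a shift exactly equal to the threshold counts as
    'not converged', since the source leaves that case open.
    """
    return mean_squared_shift(old_centroids, new_centroids) < threshold


# Illustrative data only: one coordinate of one centroid moves by 2.
old = [[0.0, 0.0], [1.0, 1.0]]
new = [[0.0, 0.0], [1.0, 3.0]]
shift = mean_squared_shift(old, new)
```

With a threshold of 2.0 this example would stop iterating, and with a threshold of 0.5 it would continue.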
The beneficial effects of the present invention are:
(1) To address the shortage of FPGA internal resources during computation, the invention provides a kernel self-reconfiguration mechanism on the FPGA device, under which kernel modules are loaded into the FPGA dynamically and in a time-shared manner according to the current task, optimizing the utilization of FPGA hardware resources. This solves the problem that the optimization methods the prior art provides for FPGA-OpenCL development, such as vectorization and pipeline replication, give the system more computing power but also raise hardware resource usage, in some cases to the point that the design cannot be configured.
(2) The SoC-FPGA system used by the invention consists of two subsystems, an ARM-architecture subsystem and an FPGA-architecture subsystem. Because both are integrated on the same chip, the high bandwidth of the AXI on-chip bus greatly shortens the communication latency between host and device and improves data throughput.
(3) The invention optimizes computation by placing each calculation appropriately: in keeping with the characteristics of the K-means clustering algorithm, the computationally intensive modules that parallelize well, namely distance matrix calculation, sample classification, distance accumulation, and sample counting, are executed as kernel programs on the FPGA, while the lightweight modules that do not parallelize easily, such as centroid update and iteration control, are executed on the ARM.
(4) Owing to the fine-grained architecture of the FPGA device, compilation generates only the logic structures that are required, which reduces system energy consumption and achieves high-performance, low-power computing.
(5) The invention optimizes memory access by placing data appropriately: the memory model provided by the OpenCL standard includes global memory, local memory, and private memory. Because global memory is plentiful but slow to access, while local memory is fast but scarce, the data to be classified, which are relatively small in volume, are stored in local memory, and the training set data, which are larger in volume, are stored in global memory.
(6) Because the system is developed with the OpenCL standard, it is highly portable and compatible.
(7) The programs executed on the FPGA are all developed in the C/C++-style OpenCL language, which is easy to develop in and flexible to modify; this can greatly shorten the development cycle and reduce the cost of product maintenance and upgrades.
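The resource argument in effect (1) can be illustrated with a small piece of arithmetic in Python. The kernel names and the utilization percentages below are hypothetical values chosen for the example, not measurements from the patent: when all three kernels must be resident at once their requirements add up, whereas time-shared loading only requires the largest single kernel to fit.

```python
# Hypothetical logic utilization of each kernel, in percent of the FPGA fabric.
KERNEL_USAGE_PERCENT = {
    "distance_matrix": 55,  # first kernel (S4)
    "min_search": 40,       # second kernel (S5)
    "accumulate": 35,       # third kernel (S6)
}


def static_usage(usage):
    """All kernels configured simultaneously: requirements are summed."""
    return sum(usage.values())


def time_shared_usage(usage):
    """One kernel loaded at a time: only the largest one must fit."""
    return max(usage.values())


static_total = static_usage(KERNEL_USAGE_PERCENT)
shared_peak = time_shared_usage(KERNEL_USAGE_PERCENT)
fits_statically = static_total <= 100
fits_time_shared = shared_peak <= 100
```

Under these assumed figures the static design would not fit on the device, while the time-shared design would, at the cost of a reconfiguration between stages.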
Brief description of the drawings
Fig. 1 is a flow chart of the prior-art K-means clustering algorithm;
Fig. 2 is a schematic diagram of the prior-art K-means clustering process;
Fig. 3 is a flow chart of the method of the present invention;
Fig. 4 is a diagram of the SoC-FPGA top-level module architecture;
Fig. 5 is a schematic diagram of the merge search operation;
Fig. 6 is a flow chart of the iteration control on the ARM host.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
The system architecture of the present invention is shown in Fig. 4. The ARM is the host and is connected to the FPGA device through the AXI bus; the high bandwidth of the AXI on-chip bus greatly shortens the communication latency between host and device and improves system throughput. In keeping with the characteristics of the K-means clustering algorithm, the computationally intensive modules that parallelize well, namely distance matrix calculation, sample classification, distance accumulation, and sample counting, are executed as kernel programs on the FPGA, while the lightweight modules that do not parallelize easily, such as centroid update and iteration control, are executed on the ARM.
The memory model provided by the OpenCL standard includes global memory, local memory, and private memory. Because global memory is plentiful but slow to access, while local memory is fast but scarce, the centroid data, which are relatively small in volume, are stored in local memory, and the sample set data, which are larger in volume, are stored in global memory. By placing data appropriately in this way, the design optimizes memory access.
Throughout the system, the ARM host manages all computing resources on the platform, and the host program manages the kernel programs by defining a context and a queue. Unlike traditional heterogeneous computing systems whose parallelism is based on SIMD (Single Instruction Multiple Data), the SoC-FPGA used in this design achieves parallelism through pipelining, which handles modules with many branch and jump instructions, such as the merge search, more effectively.
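The search for the minimum of one distance-matrix row can be sketched in Python as a pairwise tournament reduction. This assumes that the search shown in Fig. 5 is a merge-style reduction in which candidates are compared two at a time, level by level; the function name and the tie-breaking rule are illustrative choices, and the sketch runs sequentially, whereas on the FPGA the comparisons within a level would run in parallel.

```python
def merge_min(row):
    """Return (min_value, index) of a row using a pairwise reduction."""
    if not row:
        raise ValueError("row must not be empty")
    # Each candidate carries its distance and the centroid index it came from.
    level = [(value, index) for index, value in enumerate(row)]
    while len(level) > 1:
        next_level = []
        # Compare neighbours two at a time; every comparison involves a branch.
        for i in range(0, len(level) - 1, 2):
            # Tuples compare by value first, then index, so ties keep the lower index.
            next_level.append(min(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired candidate advances unchanged
        level = next_level
    return level[0]


# Illustrative row of distances from one sample to five centroids.
nearest = merge_min([4.0, 1.5, 3.0, 1.5, 9.0])
```

A row of K distances is reduced in about log2(K) levels, which maps naturally onto a pipeline of comparator stages.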
As shown in Fig. 3, the method for implementing self-reconfigurable K-means clustering on an SoC-FPGA comprises the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected through the AXI on-chip bus;
S2: The ARM host builds the OpenCL host program; the host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions of the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the AXI on-chip bus, where the sample set data are stored in global memory and the centroid data are stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device self-reconfigures into the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program; as shown in Fig. 5, it processes the rows of the matrix in parallel using a merge search, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device self-reconfigures into the third OpenCL kernel program, which, working in parallel, accumulates the distances of the sample points in each centroid's cluster, counts the sample points in each centroid's cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated distance sum of each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The iteration control flow on the ARM host is shown in Fig. 6. The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, clustering continues and the method jumps to step S2;
(2) If the result is less than the given criterion, the centroids have converged, no further clustering iterations are performed, and the whole K-means clustering task is complete.
The method for implementing self-reconfigurable K-means clustering on an SoC-FPGA further comprises a step S9 of releasing kernel and memory resources: after step S8 is completed, all kernel and memory resources are released.
The difference comparison between the new and original centroids in step S8 uses the variance between the new centroid data and the original centroid data to characterize the degree of difference; the given criterion is a known threshold.
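The data flow of steps S4 to S7 can be sketched as four plain Python functions, one per stage. This is a sequential software model under stated assumptions, not the OpenCL kernels themselves: all function names are hypothetical, the accumulation stage sums sample coordinates (the standard K-means update, whereas the text describes it as distance accumulation), and an empty cluster simply keeps its previous centroid, a case the text does not address.

```python
import math


def distance_matrix(samples, centroids):
    """S4: Euclidean distance from every sample (row) to every centroid (column)."""
    return [[math.dist(s, c) for c in centroids] for s in samples]


def classify(dist):
    """S5: for each row, the column index of its smallest distance."""
    return [min(range(len(row)), key=row.__getitem__) for row in dist]


def accumulate(samples, labels, k):
    """S6: per-cluster coordinate sums and sample counts.

    Assumption: coordinates are summed, as in standard K-means.
    """
    dim = len(samples[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for sample, j in zip(samples, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += sample[d]
    return sums, counts


def update_centroids(sums, counts, old_centroids):
    """S7 (host side): divide each sum by its count.

    Assumption: an empty cluster keeps its previous centroid.
    """
    return [
        [v / n for v in s] if n else list(old)
        for s, n, old in zip(sums, counts, old_centroids)
    ]


def kmeans_iteration(samples, centroids):
    """Run S4 to S7 once and return (new_centroids, labels)."""
    dist = distance_matrix(samples, centroids)
    labels = classify(dist)
    sums, counts = accumulate(samples, labels, len(centroids))
    return update_centroids(sums, counts, centroids), labels


# Illustrative data: two well-separated groups of two points each.
samples = [[0, 0], [0, 2], [10, 0], [10, 2]]
new_centroids, labels = kmeans_iteration(samples, [[0, 0], [10, 0]])
```

Keeping the division in a separate host-side function mirrors the split described in the text, where only sums and counts cross the bus back to the ARM host.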
Claims (3)
1. A method for implementing self-reconfigurable K-means clustering on an SoC-FPGA, characterized in that it comprises the following steps:
S1: Under the OpenCL programming framework, build an SoC-FPGA heterogeneous platform model in which an ARM host and an FPGA device cooperate; the ARM host configures the environment parameters and completes initialization. The ARM host and the FPGA device are connected through the AXI on-chip bus;
S2: The ARM host builds the OpenCL host program; the host program creates the kernels, allocates memory on the ARM host and on the OpenCL device, and writes the data into memory; the memory mapping between the ARM host and the OpenCL device is completed by passing parameters;
S3: The host program on the ARM host configures the number of work-groups, the work-group size, and the compute-unit dimensions of the FPGA device, invokes the kernel program on the FPGA, and transfers the sample set data and the initial centroid data to the FPGA device over the AXI on-chip bus, where the sample set data are stored in global memory and the centroid data are stored in local memory;
S4: The FPGA device builds the first OpenCL kernel program, which computes, in a parallel and pipelined manner, the Euclidean distance from each sample in the sample set to each centroid and produces a distance matrix;
S5: The FPGA device self-reconfigures into the second OpenCL kernel program, which receives the distance matrix produced by the first OpenCL kernel program, processes the rows of the matrix in parallel using a merge search, selects the minimum element of each row and records its corresponding centroid, and thereby completes the classification of the sample set;
S6: The FPGA device self-reconfigures into the third OpenCL kernel program, which, working in parallel, accumulates the distances of the sample points in each centroid's cluster, counts the sample points in each centroid's cluster, and returns the results to the ARM host over the AXI bus;
S7: The host program on the ARM host divides the accumulated sum of the distances to the centroid within each centroid's cluster by the corresponding number of sample points to compute the new centroid data;
S8: The host program on the ARM host compares the new centroids with the original centroids by taking their difference:
(1) If the result is greater than the given criterion, clustering continues and the method jumps to step S2;
(2) If the result is less than the given criterion, the centroids have converged, no further clustering iterations are performed, and the whole K-means clustering task is complete.
2. The method for implementing self-reconfigurable K-means clustering on an SoC-FPGA according to claim 1, characterized in that it further comprises a step S9 of releasing kernel and memory resources: after step S8 is completed, all kernel and memory resources are released.
3. The method for implementing self-reconfigurable K-means clustering on an SoC-FPGA according to claim 1, characterized in that the difference comparison between the new and original centroids in step S8 uses the variance between the new centroid data and the original centroid data to characterize the degree of difference; the given criterion is a known threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510308043.0A CN104850866B (en) | 2015-06-08 | 2015-06-08 | Self-reconfigurable K-means clustering implementation method based on SoC-FPGA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510308043.0A CN104850866B (en) | 2015-06-08 | 2015-06-08 | Self-reconfigurable K-means clustering implementation method based on SoC-FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104850866A CN104850866A (en) | 2015-08-19 |
CN104850866B true CN104850866B (en) | 2018-05-01 |
Family ID=53850501
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510308043.0A Expired - Fee Related CN104850866B (en) | 2015-06-08 | 2015-06-08 | Self-reconfigurable K-means clustering implementation method based on SoC-FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104850866B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631866B (en) * | 2015-12-24 | 2019-04-05 | 武汉鸿瑞达信息技术有限公司 | A kind of extraction calculation optimization method of the foreground target method based on heterogeneous platform |
CN106354574A (en) * | 2016-08-30 | 2017-01-25 | 浪潮(北京)电子信息产业有限公司 | Acceleration system and method used for big data K-Mean clustering algorithm |
CN106383695B (en) * | 2016-09-14 | 2019-01-25 | 中国科学技术大学苏州研究院 | The acceleration system and its design method of clustering algorithm based on FPGA |
CN107703507B (en) * | 2017-08-31 | 2020-04-10 | 西安空间无线电技术研究所 | Target clustering implementation method and device based on FPGA |
CN108280461B (en) * | 2017-12-08 | 2020-04-14 | 西安电子科技大学 | Rapid global K-means clustering method accelerated by OpenCL |
CN108958852A (en) * | 2018-07-16 | 2018-12-07 | 济南浪潮高新科技投资发展有限公司 | A kind of system optimization method based on FPGA heterogeneous platform |
CN111490946B (en) * | 2019-01-28 | 2023-08-11 | 阿里巴巴集团控股有限公司 | FPGA connection realization method and device based on OpenCL framework |
CN113326479A (en) * | 2021-05-28 | 2021-08-31 | 哈尔滨理工大学 | FPGA-based K-means algorithm implementation method |
CN114756880B (en) * | 2022-04-14 | 2023-03-14 | 电子科技大学 | Information hiding method and system based on FPGA |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2626801A3 (en) * | 2012-02-09 | 2013-10-16 | Altera Corporation | Configuring a programmable device using high-level language |
US8806403B1 (en) * | 2013-06-21 | 2014-08-12 | Altera Corporation | Efficient configuration of an integrated circuit device using high-level language |
CN104020983A (en) * | 2014-06-16 | 2014-09-03 | 上海大学 | KNN-GPU acceleration method based on OpenCL |
CN104142845A (en) * | 2014-07-21 | 2014-11-12 | 中国人民解放军信息工程大学 | CT image reconstruction back projection acceleration method based on OpenCL-To-FPGA |
Non-Patent Citations (2)
Title |
---|
"Fractal video compression in OpenCL:An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms";Doris Chen et al.;《Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific》;20130429;pp. 297-304 * |
"基于OpenCL的FPGA设计优化方法研究";范兴山 et al.;《电子技术应用》;20140116;vol. 40, no. 1;pp. 16-19 * |
Also Published As
Publication number | Publication date |
---|---|
CN104850866A (en) | 2015-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104850866B (en) | Self-reconfigurable K-means clustering implementation method based on SoC-FPGA | |
Meng et al. | Training deeper models by GPU memory optimization on TensorFlow | |
CN109740747B (en) | Operation method, device and Related product | |
Kim et al. | FPGA-based CNN inference accelerator synthesized from multi-threaded C software | |
Shao et al. | Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures | |
CN104142845B (en) | CT image reconstructions back projection accelerated method based on OpenCL-To-FPGA | |
CN106383695B (en) | The acceleration system and its design method of clustering algorithm based on FPGA | |
CN104866286B (en) | A kind of k nearest neighbor classification accelerated method based on OpenCL and SoC-FPGA | |
CN109934339A (en) | A kind of general convolutional neural networks accelerator based on a dimension systolic array | |
CN109711539A (en) | Operation method, device and Related product | |
EP2585950B1 (en) | Apparatus and method for data stream processing using massively parallel processors | |
CN112580792B (en) | Neural network multi-core tensor processor | |
CN109740725A (en) | Neural network model operation method and device and storage medium | |
CN105447285B (en) | A method of improving OpenCL hardware execution efficiency | |
Kim et al. | The implementation of a power efficient bcnn-based object detection acceleration on a xilinx FPGA-SOC | |
Xu et al. | Domino: Graph processing services on energy-efficient hardware accelerator | |
CN108804710A (en) | Method and device for refining label through model tool based on business rule | |
CN117009038A (en) | Graph computing platform based on cloud native technology | |
Lin et al. | swFLOW: A dataflow deep learning framework on sunway taihulight supercomputer | |
Wang et al. | Hardware-accelerated hypergraph processing with chain-driven scheduling | |
US20230065842A1 (en) | Prediction and optimization of multi-kernel circuit design performance using a programmable overlay | |
CN109597619A (en) | A kind of adaptive compiled frame towards heterogeneous polynuclear framework | |
Yang | An efficient dispatcher for large scale graphprocessing on opencl-based fpgas | |
Li et al. | Liquid state machine applications mapping for noc-based neuromorphic platforms | |
Segura et al. | Energy-efficient stream compaction through filtering and coalescing accesses in gpgpu memory partitions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180501 Termination date: 20190608 |