CN108763653B - Reconfigurable linear equation set solving accelerator based on FPGA - Google Patents

Reconfigurable linear equation set solving accelerator based on FPGA

Info

Publication number
CN108763653B
Authority
CN
China
Prior art keywords
data
module
calculation
linear equation
coefficient matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810412917.0A
Other languages
Chinese (zh)
Other versions
CN108763653A (en
Inventor
潘红兵
苏岩
秦子迪
何书专
李丽
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201810412917.0A
Publication of CN108763653A
Application granted
Publication of CN108763653B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/32Circuit design at the digital level
    • G06F30/33Design verification, e.g. functional simulation or model checking
    • G06F30/3308Design verification, e.g. functional simulation or model checking using simulation
    • G06F30/331Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation

Abstract

The invention provides a reconfigurable linear equation set solving accelerator based on an FPGA, which comprises: the data distribution module is used for distributing the data in the internal memory to the calculation array module and adjusting the data distribution mode under the control of the main control module according to the scale and the type of the input coefficient matrix; the main program control module is used for controlling the operation of the data distribution module, the reconstruction control module and the calculation array module and the communication among the modules; the reconstruction control module is used for resetting the calculation mode according to the scale and the type of the coefficient matrix; an internal memory module for storing the coefficient matrix and the vector data; and the calculation array module is used for calculating the solution of the linear equation system. The reconstruction method designed by the invention can simultaneously adjust the storage and transmission modes of data, can adopt different operation modes under the scene of different requirements on operation resources and operation precision, and has better universality compared with the existing accelerator for solving a linear equation system.

Description

Reconfigurable linear equation set solving accelerator based on FPGA
Technical Field
The invention belongs to the technical field of data processing, and relates to a linear equation system solving technology.
Background
Solving systems of linear equations is a part of matrix computation and lies at the core of scientific and engineering computing. It is a computation-intensive task that arises widely in fields such as data mining, signal processing, and numerical approximation.
Large-scale linear equation systems generally fall into two types: those whose coefficient matrix is sparse and those whose coefficient matrix is dense. For a sparse system, because the matrix contains a large number of zero elements, an iterative method saves considerable computing resources, and the solution vector approaches the exact solution over multiple iterations. For a dense system, LU decomposition is generally used to obtain an exact solution. Because the two algorithms differ, large-scale linear equation systems are usually solved in software on general-purpose processors in high-performance servers.
An FPGA has a large number of computing components and can apply parallel, pipelined processing to large-scale repetitive computation, which makes it a good choice for accelerating the solution of large-scale linear equation systems. By combining FPGA reconfiguration with designs tailored to the different algorithms, a single accelerator can support both solution methods.
Disclosure of Invention
To solve the above problems and make the solver general, the invention provides an FPGA-based reconfigurable solving accelerator for both sparse and dense linear equation systems, together with a reconfiguration method for the two types of systems. It is achieved by the following technical scheme.
The reconfigurable linear equation system solving accelerator based on the FPGA comprises:
the data distribution module is used for distributing the data in the internal memory to the calculation array module and adjusting the data distribution mode under the control of the main control module according to the scale and the type of the input coefficient matrix;
the main program control module is used for controlling the operation of the data distribution module, the reconstruction control module and the calculation array module and the communication among the modules;
the reconstruction control module is used for resetting the calculation mode according to the scale and the type of the coefficient matrix;
an internal memory module for storing the coefficient matrix and the vector data;
and the calculation array module is used for calculating the solution of the linear equation system.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the data distribution module controls, at each moment, the data path from the internal storage module to the caches of the calculation array module, carries out data exchanges within the internal storage module, and adds a head label to each distributed column of data so that the data is delivered to the cache of the matching computing unit in the calculation array module.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the coefficient matrix types handled by the data distribution module are divided into sparse coefficient matrices and dense coefficient matrices.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the main control module has bidirectional communication links with the data distribution module, the reconfiguration control module, and the calculation array module. It thereby controls the operation of these modules and the communication among them, and forms the top-level controller of the accelerator.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the reconfiguration control module sets the operation mode of the calculation array module according to the type of the coefficient matrix: the iterative method when the matrix is a sparse coefficient matrix, and the direct method when the matrix is a dense coefficient matrix.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the iterative method follows the Jacobi iteration and handles linear equation systems whose coefficient matrix is a large-scale sparse matrix, yielding an approximate solution of the required accuracy; the direct method follows LU decomposition with column pivoting and handles linear equation systems whose coefficient matrix is a dense matrix, yielding an exact solution.
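As an illustration of the iterative mode, the following Python sketch runs a plain Jacobi iteration in software. It is not the hardware design described here; the function name `jacobi`, the tolerance `tol`, the iteration cap, and the small diagonally dominant test system are all assumptions chosen for the example.

```python
def jacobi(a, b, tol=1e-10, max_iter=500):
    """Solve a x = b by Jacobi iteration; `a` is a list of rows."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = []
        for i in range(n):
            # Only the previous iterate is read, so the n updates are
            # independent of each other and could be computed in parallel.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - s) / a[i][i])
        # Stopping test: largest change between successive iterates.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


# Hypothetical diagonally dominant system whose exact solution is (1, 2, 3).
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [6.0, 17.0, 22.0]
solution, iterations = jacobi(A, b)
```

The result is only approximate, which matches the description of the iterative mode as producing a solution of the required accuracy rather than an exact one.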
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that the internal storage module is configured with RAMs of different depths according to the scale of the coefficient matrix. The RAMs store the data of each column and row of the coefficient matrix and communicate with the caches in the calculation array module through a data bus to read and write data.
The reconfigurable linear equation system solution accelerator based on the FPGA is further designed in that the calculation array module comprises:
the preprocessing unit is used for completing preprocessing before calculation and distribution work of specific data;
12 × 12 computing unit array for performing parallelization large-scale data computation, and executing LU decomposition process of direct method and iterative computation process of iterative method;
the back substitution unit is used for calculating the back substitution process of the linear equation set solution after the LU decomposition is finished;
and the iteration judging unit is used for calculating the vector x after the single iteration is finished and judging whether the iteration is finished or not according to the accuracy.
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that, in the direct-method calculation mode, the preprocessing unit selects the column pivot, communicates the row information of the obtained pivot to the data distribution module, sequentially calculates $a_{mn}/a_{piv}$, where $a_{mn}$ is an element of the pivot column other than the pivot and $a_{piv}$ is the column pivot, and communicates with the 12 × 12 computing unit array; in the iterative calculation mode, the preprocessing unit communicates with the 12 × 12 computing unit array and distributes each parameter $x_n$ of the vector x in turn to the computing units that store the nth data.
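The direct-mode preprocessing step can be sketched in Python as follows. The sketch assumes that the quantity computed after pivot selection is the ratio of each non-pivot element of the column to the pivot, as the symbol definitions suggest; the function name `select_pivot_and_scale` and the sample column are hypothetical.

```python
def select_pivot_and_scale(column):
    """Return (pivot_row, ratios) for one column of the active submatrix."""
    # Comparator stage: row index of the entry with the largest magnitude.
    pivot_row = max(range(len(column)), key=lambda m: abs(column[m]))
    a_piv = column[pivot_row]
    if a_piv == 0.0:
        raise ZeroDivisionError("column is entirely zero; matrix is singular")
    # Divider stage: ratio of every non-pivot entry to the pivot.
    ratios = {m: column[m] / a_piv
              for m in range(len(column)) if m != pivot_row}
    return pivot_row, ratios


pivot_row, ratios = select_pivot_and_scale([2.0, -8.0, 4.0])
```

Choosing the entry of largest magnitude keeps every ratio at or below 1 in absolute value, which is the usual reason for pivoting in LU decomposition.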
The FPGA-based reconfigurable linear equation system solving accelerator is further designed such that each computing unit in the array comprises a head-label matching unit and a multiply-add unit.
Advantages of the Invention
The linear equation system solving accelerator adopts a proper and efficient solving method according to the type of the equation required to be solved. The linear equation system solving accelerator can complete the linear equation system solving of LU decomposition and can also complete the linear equation system solving of a Jacobi iteration method. Different operation modes can be adopted under the scene of different requirements on operation resources and operation precision, and compared with the existing linear equation system solution accelerator, the solution accelerator has better universality.
The invention analyzes the procedures of the two algorithms, extracts the operations they have in common, and accelerates them with parallel pipelining, which effectively improves operation efficiency compared with software solutions on a general-purpose processor.
The reconstruction control module can simultaneously adjust the storage and transmission modes of data, and has good acceleration effect on a large-scale coefficient matrix linear equation set.
Drawings
FIG. 1 is an overall architecture diagram of an accelerator for solving a system of linear equations.
FIG. 2 is a block diagram of a compute array module.
Fig. 3 is a diagram of a computing unit structure.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
As shown in Fig. 1, the invention is a linear equation system solving accelerator based on the reconfigurable concept. The accelerator consists of a main control module, an internal storage module, a reconfiguration control module, a data distribution module, and a calculation array module, where:
The main control module is the top-level control module of the system and controls the entire operation of the solving accelerator. When the accelerator starts, the main control module obtains the parameters (scale and type) of the coefficient matrix stored in the off-chip DRAM and, based on them, communicates with the reconfiguration control module to complete the initialization for solving. The main control module also communicates with the data distribution module, the internal storage module, and the calculation array module to carry out data distribution, data exchange and preprocessing, and data calculation.
The internal storage module stores the data required for calculation and the results of calculation. It is controlled by the reconfiguration control module, which reconfigures the memory during reconfiguration to suit the scale and type of the coefficient matrix. The data distribution module also communicates with the internal storage module to carry out data processing such as exchanging parts of the data and adding head labels. The internal memory is divided into 12 groups, each of which is connected to 12 computing units via a data bus.
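The mapping of matrix columns onto the 12 memory groups can be sketched in Python. The sketch follows the round-robin placement given for dense matrices in the worked example of this description (column 1 in group 1, column 13 back in group 1); the function names are hypothetical and the 1-based numbering is an assumption.

```python
NUM_GROUPS = 12  # the description divides the internal memory into 12 groups


def group_for_column(column_number):
    """Map a 1-based column number to a 1-based memory group, round-robin."""
    return (column_number - 1) % NUM_GROUPS + 1


def columns_in_group(group, num_columns):
    """List the columns of a num_columns-wide matrix stored in one group."""
    return [c for c in range(1, num_columns + 1)
            if group_for_column(c) == group]


# Columns of a 200-column matrix that land in the first group.
first_group = columns_in_group(1, 200)
```

For a 200-column matrix this placement gives each group 16 or 17 columns, so the groups stay evenly loaded.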
The reconfiguration control module performs the reconfiguration at the initial stage of solving, according to the scale and type of the coefficient matrix provided by the main control module. When reconfiguring the internal storage module, it adjusts the depth and size of the internal storage according to the scale of the coefficient matrix, and sets the storage mode to row storage or column storage according to the coefficient matrix type. When reconfiguring the calculation array module, it sets the operation mode, either the direct method or the iterative method, according to the coefficient matrix type or a specified calculation requirement. In the direct-method mode, the preprocessing unit in the calculation array module is set to column selection mode and the back substitution unit is enabled; in the iterative-method mode, the preprocessing unit is set to distribution mode and the iteration judgment unit is enabled.
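The reconfiguration decisions above can be summarized as a small lookup in Python. This is only a sketch: the class `ReconfigSettings`, the function `reconfigure`, and the field names are hypothetical; the choice of row storage for sparse matrices is an assumption (the description states column storage only for the dense case); and `lines_per_group` is an illustrative stand-in for the adjusted memory depth.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ReconfigSettings:
    storage_order: str               # "column" or "row"
    compute_mode: str                # "direct" or "iterative"
    preprocess_mode: str             # "column_select" or "distribute"
    back_substitution_enabled: bool
    iteration_judgment_enabled: bool
    lines_per_group: int             # illustrative proxy for RAM depth


def reconfigure(matrix_type, n, num_groups=12):
    """Pick settings from the matrix type and scale n (an n x n matrix)."""
    lines_per_group = ceil(n / num_groups)
    if matrix_type == "dense":
        return ReconfigSettings("column", "direct", "column_select",
                                True, False, lines_per_group)
    if matrix_type == "sparse":
        # Assumption: sparse matrices use row storage.
        return ReconfigSettings("row", "iterative", "distribute",
                                False, True, lines_per_group)
    raise ValueError(f"unknown matrix type: {matrix_type!r}")


dense_settings = reconfigure("dense", 200)
sparse_settings = reconfigure("sparse", 200)
```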
The data distribution module carries out the distribution of data. Its main task is to add a head label to the data so that the data is sent to the computing unit corresponding to that head label; it also communicates with the preprocessing unit in the calculation array module to obtain the column pivot parameters and directs the internal storage module to exchange the pivot column data.
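Head-label routing can be modeled in a few lines of Python. The sketch assumes that a head label is simply the column number and that every computing unit sees every labeled column and keeps only the one matching its own label; the names `tag_columns`, `ComputeUnit`, and `distribute` are hypothetical.

```python
def tag_columns(columns, first_column_number=1):
    """Attach a head label (here, the column number) to each column."""
    return [(first_column_number + k, col) for k, col in enumerate(columns)]


class ComputeUnit:
    def __init__(self, head_label):
        self.head_label = head_label
        self.cache = None

    def offer(self, label, data):
        """Accept the data only if its head label matches this unit."""
        if label == self.head_label:
            self.cache = list(data)
            return True
        return False


def distribute(tagged_columns, units):
    """Broadcast each labeled column; matching units store it."""
    for label, data in tagged_columns:
        for unit in units:
            unit.offer(label, data)


units = [ComputeUnit(label) for label in (3, 1, 2)]
tagged = tag_columns([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
distribute(tagged, units)
```

Because routing depends only on the label, the order in which units are wired to the bus does not matter, which is what lets the labels be reassigned during reconfiguration.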
As shown in Fig. 2, the calculation array module consists of a preprocessing unit, a 12 × 12 computing unit array, a back substitution unit, and an iteration judgment unit. The preprocessing unit consists of a multi-input comparator, a buffer, and a divider. Data is stored in the buffer, the multi-input comparator selects the column pivot, and the divider then evaluates $a_{mn}/a_{piv}$ (where $a_{mn}$ is an element of the pivot column other than the pivot and $a_{piv}$ is the pivot of that column).
The preprocessing unit has two modes, and the mode selection is controlled by the reconfiguration control module. For the direct method it operates in column selection mode, in which it selects the column pivot, calculates $a_{mn}/a_{piv}$, and distributes the data; for the iterative method it operates in distribution mode, in which it only distributes data. The 12 × 12 computing unit array performs the multiply-add operations on the data. As shown in Fig. 3, once the head-label matching module of a computing unit matches its head label, the data is sent to the cache, and the subsequent stage performs multiplication and accumulation. After the LU decomposition of the direct method is complete, the back substitution unit performs the back substitution to obtain the solution of the equation system. The iteration judgment unit calculates the vector x of one iteration and judges whether it has reached the set accuracy; if so, the solution of the equation system has been obtained; if not, the next iteration is carried out.
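The roles of the back substitution unit and the iteration judgment unit can be sketched in Python. Both functions are illustrative assumptions: `back_substitute` solves an upper-triangular system of the kind left after elimination, and `iteration_finished` uses the largest change between successive iterates as the accuracy test, which is one common choice and not necessarily the criterion used by the hardware.

```python
def back_substitute(u, y):
    """Solve u x = y for an upper-triangular matrix u (list of rows)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the unknowns already solved.
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / u[i][i]
    return x


def iteration_finished(x_old, x_new, tol):
    """Return True when no component changed by tol or more."""
    return max(abs(n - o) for o, n in zip(x_old, x_new)) < tol


# Hypothetical upper-triangular system with solution (1, 2, 3).
U = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
y = [1.0, 12.0, 12.0]
x = back_substitute(U, y)
```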
The working principle of the linear equation system solving accelerator is described below, taking a 200 × 200 dense coefficient matrix as an example. In the initial stage, after the main control module obtains the scale and type of the matrix, it communicates with the reconfiguration control module, which sets the storage mode of the internal storage module. For a dense coefficient matrix, the matrix is stored by columns, and each group of the internal memory stores whole columns: the first column is stored in the first group, the second column in the second group, and so on, and the cycle restarts at the thirteenth column, which is stored in the first group again. The reconfiguration control module also configures the calculation array module: it sets the preprocessing unit to column selection mode, sets the head label of each computing unit, enables the back substitution unit, and disables the iteration judgment unit. After the reconfiguration is completed, the off-chip DRAM transmits the data to the internal storage module through the data bus. After the data is stored, the data distribution module adds a head label before each column of data; the head label of a column is its column number (the vector x receives a special label). The main control module sends the first column of data into the preprocessing unit of the calculation array module, which, after selecting the column pivot, calculates $a_{mn}/a_{piv}$ and sends the results to the respective computing units.
Each column of data is then transmitted from its group of the internal memory onto the corresponding data bus. There are 12 data buses connected to the 12 × 12 computing units, and at any one time 12 computing units match the column data carrying their head labels. In this way the data transmission of 144 columns (143 columns of data and the vector x) is completed. The main control module then starts the calculation, and all computing units begin their multiply-add operations simultaneously. According to the data each one stores, the 1st computing unit calculates (formula image BDA0001646872160000052 in the original publication), the 2nd computing unit calculates (formula image BDA0001646872160000053), and so on; the 143rd computing unit calculates (formula image BDA0001646872160000054) and the 144th computing unit calculates (formula image BDA0001646872160000055). The first round of calculation thus produces the updated result. The preprocessing unit completes the calculation of (formula image BDA0001646872160000056), which is sent to each computing unit after the first round of calculation is completed. The second round of calculation then starts: the 1st computing unit calculates (formula image BDA0001646872160000057), the 2nd computing unit calculates (formula image BDA0001646872160000058), and so on; the 143rd computing unit calculates (formula image BDA0001646872160000059) and the 144th computing unit calculates (formula image BDA00016468721600000510).
Because the cache of a computing unit cannot hold a whole column of data, once the stored portion of the column has been processed, the main control module directs the internal storage module to send the remaining column data; this data overwrites the 2nd through last entries of the computing unit's cache, while the first-row entry is retained. The calculation continues until the LU decomposition step for columns 1 to 143 is complete. The remaining columns are then loaded into the computing units, with the vector x kept in place, and the same calculation is performed. This yields the coefficient matrix after one LU decomposition step, and the result overwrites the original data in the internal memory. The main control module then starts the second LU decomposition step: the new coefficient matrix is the original matrix with its first row and first column removed, and the vector x is truncated by $x_1$; the other operations are the same as in the first step. Each time a new LU decomposition step starts, one row and one column of coefficients are removed, along with one parameter of the vector x. When the LU decomposition is finally complete, the back substitution unit starts the back substitution and obtains the solution of the linear equation system.
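The overall direct-mode flow (pivot selection, elimination on a submatrix that shrinks by one row and one column per step, then back substitution) can be sketched in software. The Python below is Gaussian elimination with partial pivoting on an augmented matrix, a standard textbook formulation that performs the same arithmetic as LU decomposition with column pivoting; it does not model the 12 × 12 array, the caches, or the column-wise data transfers, and the function name `solve_dense` and the test system are assumptions.

```python
def solve_dense(a, b):
    """Solve a x = b by elimination with partial pivoting (a: list of rows)."""
    n = len(b)
    # Work on a copy; the right-hand side rides along as an extra column.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Pivot selection: largest magnitude in column k of the active part.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        # Eliminate column k below the pivot; the active submatrix then
        # shrinks by one row and one column for the next step.
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical 3 x 3 system whose exact solution is (2, 3, -1).
A = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = solve_dense(A, b)
```

The inner update `m[r][c] -= factor * m[k][c]` is a multiply-add, which is the operation the computing unit array performs in parallel across columns.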
The invention is a linear equation system solving accelerator designed on the reconfigurable concept; it applies a suitable and efficient solution method according to the type of equation system to be solved. The accelerator can solve linear equation systems by LU decomposition and also by the Jacobi iteration method. Different operation modes can be adopted in scenarios with different requirements on computing resources and accuracy, which gives the accelerator better generality than existing linear equation system solving accelerators. The invention analyzes the procedures of the two algorithms, extracts the operations they have in common, and accelerates them with parallel pipelining, which effectively improves operation efficiency. The reconfiguration control module of the accelerator can adjust the storage and transmission modes of the data at the same time, and it accelerates the solution of systems with large-scale coefficient matrices well.
The reconfigurable linear equation set solving accelerator provided by the invention is described in detail above, so as to facilitate understanding of the invention and the core idea thereof. For a person skilled in the art, many modifications and deductions can be made in the concrete implementation according to the core idea of the invention. In view of the above, this description should not be taken in a limiting sense.

Claims (8)

1. A reconfigurable linear equation set solution accelerator based on an FPGA is characterized by comprising:
the data distribution module is used for distributing the data in the internal memory to the calculation array module and adjusting the data distribution mode under the control of the main control module according to the scale and the type of the input coefficient matrix;
the main program control module is used for controlling the operation of the data distribution module, the reconstruction control module and the calculation array module and the communication among the modules;
the reconstruction control module is used for resetting the calculation mode according to the scale and the type of the coefficient matrix;
an internal memory module for storing the coefficient matrix and the vector data;
the calculation array module is used for calculating the solution of the linear equation set;
the reconstruction control module resets the operation mode of the calculation array module as an iteration method for the matrix type as a sparse coefficient matrix according to the type of the coefficient matrix; for the matrix type being a dense coefficient matrix, resetting the operation mode of the calculation array module as a direct method;
the iterative method adopts the concept of a Jacobi iterative method, and a processing coefficient matrix is solved for a linear equation set of a large sparse matrix to obtain an approximate solution of required accuracy; the direct method adopts the principle of column-selected principal component LU decomposition, and solves a linear equation set with a processing coefficient matrix being a dense matrix to obtain an accurate solution.
2. The accelerator according to claim 1, wherein the data distribution module controls data paths of data in the internal memory module to the cache of the computing array module at each time, operates data exchange in the internal memory module, and adds a header mark to each distributed column data to distribute the data to the cache of the computing unit of the matching computing array module.
3. The accelerator according to claim 1, wherein the coefficient matrix types processed by the data distribution module are classified into sparse coefficient matrices and dense coefficient matrices.
4. The accelerator for solving the reconfigurable linear equation set based on the FPGA as claimed in claim 1, wherein the main program module is respectively connected with the data distribution module, the reconfiguration control module and the calculation array module in a bidirectional communication manner, so that the operation of the data distribution module, the reconfiguration control module and the calculation array module and the communication among the modules are controlled, and an uppermost controller of the accelerator for solving the linear equation set is formed.
5. The accelerator according to claim 1, wherein the internal storage module configures RAMs with different depths according to the scale of the coefficient matrix, and is configured to store data of each column and row of the coefficient matrix, and communicate with the cache in the calculation array module through a data bus to complete reading and writing of the data.
6. The accelerator according to claim 1, wherein the computational array module comprises:
the preprocessing unit is used for completing preprocessing before calculation and distribution work of specific data;
12 × 12 computing unit array for performing parallelization large-scale data computation, and executing LU decomposition process of direct method and iterative computation process of iterative method;
the back substitution unit is used for calculating the back substitution process of the linear equation set solution after the LU decomposition is finished;
and the iteration judging unit is used for calculating the vector x after the single iteration is finished and judging whether the iteration is finished or not according to the accuracy.
7. The accelerator according to claim 6, wherein in the direct calculation mode, the preprocessing unit completes column selection of the pivot, communicates the row information of the obtained pivot to the data distribution module, sequentially calculates $a_{mn}/a_{piv}$, where $a_{mn}$ is an element of the pivot column other than the pivot and $a_{piv}$ is the column pivot, and communicates with the 12 × 12 computing unit array; in the iterative calculation mode, the preprocessing unit communicates with the 12 × 12 computing unit array and distributes each parameter $x_n$ of the vector x in turn to the computing unit that stores the nth data.
8. The FPGA-based reconfigurable linear equation set solution accelerator of claim 7, wherein each computing element array comprises a header tag matching element and a multiply-add computing element.
CN201810412917.0A 2018-04-30 2018-04-30 Reconfigurable linear equation set solving accelerator based on FPGA Active CN108763653B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810412917.0A CN108763653B (en) 2018-04-30 2018-04-30 Reconfigurable linear equation set solving accelerator based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810412917.0A CN108763653B (en) 2018-04-30 2018-04-30 Reconfigurable linear equation set solving accelerator based on FPGA

Publications (2)

Publication Number Publication Date
CN108763653A CN108763653A (en) 2018-11-06
CN108763653B true CN108763653B (en) 2022-04-22

Family

ID=64009432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810412917.0A Active CN108763653B (en) 2018-04-30 2018-04-30 Reconfigurable linear equation set solving accelerator based on FPGA

Country Status (1)

Country Link
CN (1) CN108763653B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070685B (en) * 2023-03-27 2023-07-21 南京大学 Memory computing unit, memory computing array and memory computing chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604306A (en) * 2009-06-03 2009-12-16 中国人民解放军国防科学技术大学 Method of column pivoting LU decomposition based on FPGA
CN103975302A (en) * 2011-12-22 2014-08-06 英特尔公司 Matrix multiply accumulate instruction
CN106126481A (en) * 2016-06-29 2016-11-16 华为技术有限公司 A kind of computing engines and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9317482B2 (en) * 2012-10-14 2016-04-19 Microsoft Technology Licensing, Llc Universal FPGA/ASIC matrix-vector multiplication architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FPGA矩阵计算并行算法与结构;邬贵明;《中国博士学位论文全文数据库 信息科技辑》;20120415;I137-2 *
Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network;Yiwei Zhang 等;《2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)》;20171215;第614-621页 *

Also Published As

Publication number Publication date
CN108763653A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108416436B (en) Method and system for neural network partitioning using multi-core processing module
CN110998570B (en) Hardware node with matrix vector unit with block floating point processing
CN111684473B (en) Improving performance of neural network arrays
CN104915322B (en) A kind of hardware-accelerated method of convolutional neural networks
CN110390385B (en) BNRP-based configurable parallel general convolutional neural network accelerator
CN111626414B (en) Dynamic multi-precision neural network acceleration unit
JP2021144750A (en) Accelerator for deep neural networks
CN104899182A (en) Matrix multiplication acceleration method for supporting variable blocks
EP3584719A1 (en) Method and device for multiplying matrices with vectors
CN104317768A (en) Matrix multiplication accelerating method for CPU+DSP (Central Processing Unit + Digital Signal Processor) heterogeneous system
CN108304925B (en) Pooling computing device and method
CN101086699A (en) Matrix multiplier device based on single FPGA
CN105373517A (en) Spark-based distributed matrix inversion parallel operation method
CN107341133A (en) The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition
CN111738433A (en) Reconfigurable convolution hardware accelerator
CN112784973A (en) Convolution operation circuit, device and method
US20200104669A1 (en) Methods and Apparatus for Constructing Digital Circuits for Performing Matrix Operations
CN114503126A (en) Matrix operation circuit, device and method
CN108763653B (en) Reconfigurable linear equation set solving accelerator based on FPGA
Cho et al. FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks
Bonny et al. Time efficient segmented technique for dynamic programming based algorithms with FPGA implementation
EP4318275A1 (en) Matrix multiplier and method for controlling matrix multiplier
CN116185937B (en) Binary operation memory access optimization method and device based on multi-layer interconnection architecture of many-core processor
CN113298241B (en) Deep separable convolutional neural network acceleration method and accelerator
Wu et al. Skeletongcn: a simple yet effective accelerator for gcn training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant