CN112992248A - PE (processing element) calculation unit structure of FIFO (first in, first out)-based variable-length cyclic shift register - Google Patents
- Publication number
- CN112992248A (application CN202110269554.1A)
- Authority
- CN
- China
- Prior art keywords
- shift register
- fifo
- unit
- based variable
- variable length
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C19/00—Digital stores in which the information is moved stepwise, e.g. shift registers
- G11C19/28—Digital stores in which the information is moved stepwise, e.g. shift registers using semiconductor elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention relates to a PE calculation unit structure built on a FIFO-based variable-length cyclic shift register. The structure comprises: a FIFO-based variable-length circular shift register, which is divided into three states and, depending on the state of its state machine, realizes either the FIFO function or the cyclic shift register function; and a PE unit, which is instantiated, connected, and integrated with other small modules to generate a calculation array of a specified scale, that is, a PE array. The invention reduces the number of accesses that convolution operations make to global storage and improves the computing efficiency of the network.
Description
Technical Field
The invention belongs to the field of artificial intelligence and integrated circuits, and particularly relates to a PE (processing element) calculation unit structure of a FIFO (first in, first out)-based variable-length cyclic shift register.
Background
With the rapid development of artificial intelligence, almost all industries have begun to apply it to practical problems. The technology is widely used in fields such as image recognition, speech recognition, medical health and autonomous driving, and it is expected to cover more application areas in the future. This rapid development benefits from research breakthroughs in deep learning algorithms. Among them, the deep convolutional neural network (CNN) is a classic deep learning algorithm that completes tasks such as recognition, detection and segmentation of a target object by extracting and computing features from an input image.
At present, many CNNs are implemented in software on a general-purpose processor (CPU), but the serial computing model of the CPU limits their computing efficiency. The present invention therefore provides a PE calculation unit structure of a FIFO-based variable-length circular shift register.
Disclosure of Invention
The invention aims to provide a PE (processing element) calculation unit structure of a FIFO (first in, first out)-based variable-length cyclic shift register, which realizes FPGA (field programmable gate array)-side acceleration of CNNs (convolutional neural networks) around key modules such as the FIFO-based variable-length cyclic shift register, the PE unit, the PE array and max pooling.
To achieve this purpose, the specific technical scheme of the invention is as follows: a PE calculation unit structure of a FIFO-based variable-length cyclic shift register comprises:
a FIFO-based variable-length circular shift register: the shift register is divided into three states, and depending on the state of the state machine it realizes the FIFO function or the cyclic shift register function;
a PE unit: the PE units are instantiated and connected, and are integrated with other small modules to generate a calculation array of a specified scale, that is, a PE array.
Further, the cyclic shift function is used for reusing convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
Further, the small module is a ReLU.
Further, the first-in first-out characteristic of the FIFO is used in combination with an external control signal.
Further, the PE unit realizes general one-dimensional convolution and, by means of a row-stationary ("row fixed flow") dataflow, further realizes convolution operations supporting multiple features, multiple convolution kernels and multiple channels.
Further, before the PE unit works, the ordered Filter data and Feature data are written into the shift register.
The beneficial effects of the invention are as follows. In a conventional shift register structure, if the input data arrive discontinuously, invalid data are inserted into the middle of the register chain, which complicates control of the calculation. The variable-length shift register is therefore designed on the basis of a FIFO (first in, first out) structure.
The shift register is divided into three states, and the solid lines in the figure represent the data flow of the current state. Depending on the state of the state machine, the module provides a FIFO function or a circular shift register function. The cyclic shift function is used for reusing convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
The PE unit uses the FIFO-based cyclic shift register together with a multiplier to realize a more complex multiplication function. The cyclic shift register stores the Feature values and Filter values used in the current round of calculation. The Feature values and Filter values are arranged in a defined order and placed into the circular shift register.
The PE units are instantiated and connected, and are integrated with other small modules such as a ReLU to generate a calculation array of a specified scale, that is, a PE array. The relevant PE array parameters are configured according to the structure and number of layers of the target network, so that the array can be used to compute the convolutional neural network and realize FPGA-based acceleration of the convolutional neural network.
Drawings
FIG. 1 is a diagram of a FIFO based variable length shift register architecture designed in the present invention; FIG. 2 is a diagram of the PE unit architecture designed in the present invention; FIG. 3 is a diagram of the PE array architecture designed in the present invention.
Detailed Description
The FIFO-based variable-length circular shift register is a structure designed to support both the reuse of convolution kernels and the streaming input of data in the convolution operation. In a conventional shift register structure, if the input data arrive discontinuously, invalid data are inserted into the middle of the register chain, which complicates control of the calculation. The variable-length shift register is therefore designed on the basis of a FIFO (first in, first out) structure.
The FIFO is a first-in first-out queue whose basic control signals comprise read/write clocks, read/write enables, empty/full signals and an output-data-valid signal. The FIFO contains a read pointer and a write pointer, both of which are set to 0 on reset. The write pointer points to the next address to be written and is incremented by 1 after each write operation; the read pointer points to the address of the next data to be read and is incremented by 1 after each read operation. When the read pointer and the write pointer are exactly equal, the FIFO is empty; when the write pointer is one full lap (the depth of the FIFO) ahead of the read pointer, the FIFO is full.
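The pointer rules above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit of the invention: the class name `PointerFifo` and its methods are hypothetical, and the pointers are assumed to count modulo twice the depth so that "one full lap ahead" (full) can be told apart from "equal" (empty).

```python
class PointerFifo:
    """Behavioral model of a FIFO with a read pointer and a write pointer.

    Assumption: pointers count modulo 2 * depth, a common encoding that
    distinguishes the full condition from the empty condition.
    """

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0  # next address to be written; 0 after reset
        self.rd_ptr = 0  # address of the next data to be read; 0 after reset

    def count(self):
        # Amount of stored data is the pointer difference.
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth)

    def empty(self):
        return self.wr_ptr == self.rd_ptr

    def full(self):
        # Write pointer is exactly one lap (depth) ahead of the read pointer.
        return self.count() == self.depth

    def write(self, value):
        if self.full():
            raise OverflowError("write to a full FIFO")
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)

    def read(self):
        if self.empty():
            raise IndexError("read from an empty FIFO")
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value


fifo = PointerFifo(depth=4)
for word in range(4):
    fifo.write(word)
print(fifo.full(), fifo.count())  # full after `depth` writes
```

In hardware the same comparison is usually made with pointers that carry one extra wrap bit; the modulo arithmetic here plays that role.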
This module uses the first-in first-out characteristic of the FIFO, combined with external control signals, to realize a variable-length shift register whose maximum length is the FIFO depth. As shown in FIG. 1, the module has two working states. When it works as a circular shift register, the read data of the FIFO is used directly as the write data of the FIFO; because the data read out is written straight back, the total amount of data in the FIFO remains unchanged and equals the difference between the write pointer and the read pointer.
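A minimal software sketch of this behavior is given below. It is an illustration only: the class `FifoShiftRegister` and the method names `load`, `rotate` and `drain` are hypothetical labels for the FIFO-write, circular-shift and FIFO-read behaviors, and the mapping of these behaviors to the states of the actual state machine is an assumption.

```python
from collections import deque


class FifoShiftRegister:
    """Variable-length circular shift register modeled on top of a FIFO."""

    def __init__(self, depth):
        self.depth = depth   # maximum register length equals the FIFO depth
        self.fifo = deque()

    def load(self, value):
        # FIFO behavior: sequential input; only valid words are stored,
        # so gaps in the input stream never enter the register chain.
        if len(self.fifo) >= self.depth:
            raise OverflowError("shift register is at maximum length")
        self.fifo.append(value)

    def rotate(self):
        # Circular shift behavior: the word read out is written straight
        # back, so the amount of stored data does not change.
        value = self.fifo.popleft()
        self.fifo.append(value)
        return value

    def drain(self):
        # FIFO behavior: read a word out without writing it back.
        return self.fifo.popleft()

    def __len__(self):
        return len(self.fifo)


reg = FifoShiftRegister(depth=8)
for weight in (10, 20, 30):          # effective length is 3, not 8
    reg.load(weight)
first_pass = [reg.rotate() for _ in range(3)]
second_pass = [reg.rotate() for _ in range(3)]
print(first_pass, second_pass, len(reg))
```

The two passes return the same three words in the same order, which is the property that lets a convolution kernel be reused without being fetched again.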
The PE unit realizes general one-dimensional convolution and, by means of a row-stationary ("row fixed flow") dataflow, further realizes convolution operations supporting multiple features, multiple convolution kernels and multiple channels. The PE unit uses the FIFO-based cyclic shift register together with a multiplier to realize a more complex multiplication function.
Because a PE unit in this design has only one multiplier (its structure is shown in FIG. 2), a FIFO-based cyclic shift register is used together with the multiplier to realize the more complex multiplication function needed for general convolution with multiple convolution kernels, multiple input feature maps and multiple channels. The cyclic shift register stores the Feature values and Filter values used in the current round of calculation.
By carefully arranging the order of the Feature and Filter data, the data of the multiple Features and Filters involved in the current round of one-dimensional convolution are sorted before being read into the PE unit. This avoids reading the data of the same Filter or Feature from global storage more than once, and thereby maximizes data reuse during the convolution calculation.
Before the PE unit works, the ordered Filter data and Feature data are first written into the shift register. The number of Filter data is fLen × fNum × nchannel, and the number of Feature data is iLen × nchannel.
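The word counts above, and the way a single multiplier can serve several filters and channels, can be sketched as follows. The function names are hypothetical, and the sketch assumes stride 1, no padding, and that the products of all channels are summed into one output per filter; the source does not state these details.

```python
def shift_register_word_counts(f_len, f_num, n_channel, i_len):
    """Number of Filter and Feature words written before the PE starts."""
    filter_words = f_len * f_num * n_channel
    feature_words = i_len * n_channel
    return filter_words, feature_words


def pe_conv1d(feature, filters):
    """One-dimensional convolution using one multiplication per step.

    feature[c][x]    : one feature row per channel (iLen values each)
    filters[k][c][t] : fNum filters, each with one fLen-tap row per channel
    Returns (outputs[k][x], number of multiplications performed).
    """
    n_channel = len(feature)
    i_len = len(feature[0])
    outputs = []
    multiplies = 0
    for filt in filters:
        f_len = len(filt[0])
        row = []
        for x in range(i_len - f_len + 1):
            acc = 0
            for c in range(n_channel):
                for t in range(f_len):
                    acc += feature[c][x + t] * filt[c][t]  # single multiplier
                    multiplies += 1
            row.append(acc)
        outputs.append(row)
    return outputs, multiplies


feature = [[1, 2, 3, 4], [0, 1, 0, 1]]                   # nchannel = 2, iLen = 4
filters = [[[1, 0], [1, 1]], [[0, 1], [1, 0]]]           # fNum = 2, fLen = 2
print(shift_register_word_counts(2, 2, 2, 4))            # (8, 8)
print(pe_conv1d(feature, filters))                       # ([[2, 3, 4], [2, 4, 4]], 24)
```

In this example 8 Filter words and 8 Feature words are loaded once, yet 24 multiplications are performed, so each loaded word is reused several times without another access to global storage.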
The PL side designs the PE array data flow using the row-stationary ("row fixed flow") idea, and the description is organized around the key modules: the FIFO-based variable-length cyclic shift register, the PE unit, the PE array and max pooling. The result is a convolutional neural network circuit generator with configurable parameters, and the system provides general convolution with multiple channels, multiple filters and multiple features.
Based on the PE unit described above, the PE array module can directly reuse the PE unit to realize a PE array, whose structure is shown in FIG. 3. Taking into account the DSP resources, memory resources and accelerator performance of the VCU118 FPGA development board, the PE array is configured with a size of 3 × 14 × 64 and a computational bit width of 8 bits.
Filter data are transmitted to the PE array through one channel, 512 bits (8 bits × 64) at a time, that is, the data of all channels at one two-dimensional point. Feature data are transmitted 8192 bits (8 bits × 64 × 16) at a time, that is, all the image data needed by the PE array for one convolution calculation.
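The two transfer widths can be checked with a few lines of arithmetic. The function name `bus_widths` is hypothetical, and reading the factor 16 as 3 + 14 − 1 (filter rows plus output rows minus one, the number of distinct input rows a 3 × 14 array touches) is an assumption; the source only gives the product 8 bits × 64 × 16.

```python
def bus_widths(data_bits, channels, filter_rows, output_rows):
    """Return (filter transfer bits, feature transfer bits) per transfer."""
    # One Filter transfer carries all channels of one two-dimensional point.
    filter_bits = data_bits * channels
    # Assumed: the array needs filter_rows + output_rows - 1 feature rows.
    feature_rows = filter_rows + output_rows - 1
    feature_bits = data_bits * channels * feature_rows
    return filter_bits, feature_bits


print(bus_widths(data_bits=8, channels=64, filter_rows=3, output_rows=14))
# (512, 8192)
```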
The data flow of the PE array follows the row-stationary ("row fixed flow") idea: Filter data are distributed in the horizontal direction, Feature data are distributed in the diagonal direction, and each column computes one row of the output feature map. When the PE units finish computing, the results are first accumulated along the PE channel direction, and the accumulated results of the PEs in the same row are then accumulated. When a complete convolution calculation is finished, the final accumulated convolution result is passed to the ReLU module, which applies the ReLU operation to obtain the result.
The CNN accelerator system designed here adopts a multistage pipeline structure to raise the operating frequency of the circuit. By arranging the input data, it accelerates general convolution that computes multiple feature maps, multiple channels and multiple convolution kernels simultaneously. The maximum clock frequency finally reached is 140 MHz. When synthesizing the multipliers, the synthesizer mostly selects LUT resources and rarely uses DSP resources.
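The data flow described above can be modeled in software as a rough sketch. All names are hypothetical. The sketch assumes that PE (i, j) convolves filter row i with feature row i + j, which gives the horizontal filter reuse and diagonal feature reuse described in the text, and that the second accumulation combines the PEs contributing to one output row; the source words that grouping differently, so this mapping is an assumption. Stride 1 and no padding are also assumed.

```python
def conv1d(row, taps):
    """Valid one-dimensional convolution (CNN style, no kernel flip)."""
    n = len(row) - len(taps) + 1
    return [sum(row[x + t] * taps[t] for t in range(len(taps))) for x in range(n)]


def add_rows(a, b):
    return [u + v for u, v in zip(a, b)]


def relu(value):
    return value if value > 0 else 0


def pe_array_conv2d(feature, filt):
    """feature[c][r][x], filt[c][i][t] -> ReLU(output)[j][x]."""
    n_channel = len(feature)
    filter_rows = len(filt[0])
    output_rows = len(feature[0]) - filter_rows + 1
    width = len(feature[0][0]) - len(filt[0][0]) + 1
    output = []
    for j in range(output_rows):           # one group of PEs per output row
        total = [0] * width
        for i in range(filter_rows):       # PE (i, j)
            pe_sum = [0] * width
            for c in range(n_channel):     # accumulate along the channel direction
                pe_sum = add_rows(pe_sum, conv1d(feature[c][i + j], filt[c][i]))
            total = add_rows(total, pe_sum)  # accumulate across the PEs
        output.append([relu(v) for v in total])
    return output


feature = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],     # channel 0
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],     # channel 1
]
filt = [
    [[1, 0], [0, 1]],                      # channel 0
    [[-10, 0], [0, 0]],                    # channel 1
]
print(pe_array_conv2d(feature, filt))      # [[0, 0], [2, 4]]
```

The first output row is negative before the ReLU step (6 − 10 and 8 − 10), so it is clamped to zero, while the second row (12 − 10 and 14 − 10) passes through unchanged.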
Claims (6)
1. A PE calculation unit structure of a FIFO-based variable-length cyclic shift register, characterized by comprising:
a FIFO-based variable-length circular shift register: the shift register is divided into three states, and depending on the state of the state machine it realizes the FIFO function or the cyclic shift register function;
a PE unit: the PE units are instantiated and connected, and are integrated with other small modules to generate a calculation array of a specified scale, that is, a PE array.
2. The PE calculation unit structure of a FIFO-based variable-length circular shift register according to claim 1, wherein: the cyclic shift function is used for reusing convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
3. The PE calculation unit structure of a FIFO-based variable-length circular shift register according to claim 1, wherein: the small module is a ReLU.
4. The PE calculation unit structure of a FIFO-based variable-length circular shift register according to claim 1, wherein: the first-in first-out characteristic of the FIFO is used in combination with an external control signal.
5. The PE calculation unit structure of a FIFO-based variable-length circular shift register according to claim 1, wherein: the PE unit realizes general one-dimensional convolution and, by means of a row-stationary ("row fixed flow") dataflow, further realizes convolution operations supporting multiple features, multiple convolution kernels and multiple channels.
6. The PE calculation unit structure of a FIFO-based variable-length circular shift register according to claim 1, wherein: before the PE unit works, the ordered Filter data and Feature data are written into the shift register.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110269554.1A CN112992248A (en) | 2021-03-12 | 2021-03-12 | PE (processing element) calculation unit structure of FIFO (first in, first out)-based variable-length cyclic shift register |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112992248A (en) | 2021-06-18 |
Family
ID=76334584
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202110269554.1A | Pending | CN112992248A (en) | 2021-03-12 | 2021-03-12 | PE (processing element) calculation unit structure of FIFO (first in, first out)-based variable-length cyclic shift register |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112992248A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113609042A (en) * | 2021-07-20 | 2021-11-05 | 天津七所精密机电技术有限公司 | System for improving data interaction speed |
CN113609042B (en) * | 2021-07-20 | 2024-04-26 | 天津七所精密机电技术有限公司 | System for improving data interaction speed |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4672646A (en) * | 1986-09-16 | 1987-06-09 | Hewlett-Packard Company | Direct-injection FIFO shift register |
JPS62145599A (en) * | 1985-12-20 | 1987-06-29 | Hitachi Ltd | Variable step shift register |
JPH05217392A (en) * | 1991-12-10 | 1993-08-27 | Kawasaki Steel Corp | Variable length shift register and image processing device using it |
US6192498B1 (en) * | 1997-10-01 | 2001-02-20 | Globepan, Inc. | System and method for generating error checking data in a communications system |
US20010025228A1 (en) * | 2000-03-21 | 2001-09-27 | Anton Prantl | Method for evaluating measured data |
US20070157009A1 (en) * | 2006-01-03 | 2007-07-05 | Samsung Electronics Co., Ltd. | Loop accelerator and data processing system having the same |
CN101089840A (en) * | 2007-07-12 | 2007-12-19 | 浙江大学 | Matrix multiplication parallel computing system based on multi-FPGA |
US20100146373A1 (en) * | 2008-12-05 | 2010-06-10 | Yuan-Sun Chu | Configurable hierarchical comma-free reed-solomon decoding circuit and method thereof |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
CN106533869A (en) * | 2016-11-08 | 2017-03-22 | 北京飞利信电子技术有限公司 | Data forwarding method and device and electronic device |
CN107533667A (en) * | 2015-05-21 | 2018-01-02 | 谷歌公司 | Vector calculation unit in neural network processor |
US20180218760A1 (en) * | 2017-01-31 | 2018-08-02 | Intel Corporation | Configurable storage blocks having simple first-in first-out enabling circuitry |
CN110705687A (en) * | 2019-09-05 | 2020-01-17 | 北京三快在线科技有限公司 | Convolution neural network hardware computing device and method |
Non-Patent Citations (1)
Title |
---|
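Ou Chunxiang, Yang Jiawei, Ren Xiaosong: "Implementation method of a FIFO-based cyclic shift register" (基于FIFO的循环移位寄存器实现方法), Modern Electronics Technique (现代电子技术), vol. 37, no. 19, pages 60-62 * |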
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109409511B (en) | Convolution operation data flow scheduling method for dynamic reconfigurable array | |
CN111414994B (en) | FPGA-based Yolov3 network computing acceleration system and acceleration method thereof | |
CN110705703B (en) | Sparse neural network processor based on systolic array | |
WO2018113597A1 (en) | Multiplication and addition device for matrices, neural network computing device, and method | |
CN104899182A (en) | Matrix multiplication acceleration method for supporting variable blocks | |
CN111445012A (en) | FPGA-based packet convolution hardware accelerator and method thereof | |
CN110807522B (en) | General calculation circuit of neural network accelerator | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
CN110851779B (en) | Systolic array architecture for sparse matrix operations | |
CN109284475B (en) | Matrix convolution calculating device and matrix convolution calculating method | |
CN110543936B (en) | Multi-parallel acceleration method for CNN full-connection layer operation | |
CN110543939A (en) | hardware acceleration implementation framework for convolutional neural network backward training based on FPGA | |
CN113033794B (en) | Light weight neural network hardware accelerator based on deep separable convolution | |
CN109284824A (en) | A kind of device for being used to accelerate the operation of convolution sum pond based on Reconfiguration Technologies | |
CN115238863A (en) | Hardware acceleration method, system and application of convolutional neural network convolutional layer | |
CN113485750B (en) | Data processing method and data processing device | |
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination | |
CN116167425B (en) | Neural network acceleration method, device, equipment and medium | |
CN111222090B (en) | Convolution calculation module, neural network processor, chip and electronic equipment | |
CN110766136B (en) | Compression method of sparse matrix and vector | |
CN116888591A (en) | Matrix multiplier, matrix calculation method and related equipment | |
CN112992248A (en) | PE (processing element) calculation unit structure of FIFO (first in, first out)-based variable-length cyclic shift register | |
CN115310037A (en) | Matrix multiplication computing unit, acceleration unit, computing system and related method | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
CN112836793B (en) | Floating point separable convolution calculation accelerating device, system and image processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||