CN112992248A - PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register - Google Patents

Info

Publication number
CN112992248A
Authority
CN
China
Prior art keywords
shift register
fifo
unit
based variable
variable length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110269554.1A
Other languages
Chinese (zh)
Inventor
张国和
刘嘉奇
陈琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Shenzhen Research Institute Of Xi'an Jiaotong University
Original Assignee
Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Shenzhen Research Institute Of Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd, Shenzhen Research Institute Of Xi'an Jiaotong University filed Critical Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Priority to CN202110269554.1A priority Critical patent/CN112992248A/en
Publication of CN112992248A publication Critical patent/CN112992248A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C19/00Digital stores in which the information is moved stepwise, e.g. shift registers
    • G11C19/28Digital stores in which the information is moved stepwise, e.g. shift registers using semiconductor elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/523Multiplying only
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention relates to a PE (processing element) calculation unit structure based on a FIFO-based variable-length cyclic shift register. It comprises a FIFO-based variable-length cyclic shift register, which has three states and provides either the FIFO function or the cyclic shift register function depending on the state of its state machine; and a PE unit, which is instantiated, interconnected, and integrated with other small modules to generate a calculation array of a specified scale, i.e., a PE array. The invention reduces the number of global-storage accesses required by convolution operations and improves the computing efficiency of the network.

Description

PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register
Technical Field
The invention belongs to the fields of artificial intelligence and integrated circuits, and particularly relates to a PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register.
Background
With the rapid development of artificial intelligence, almost all industries have begun to apply it to practical problems. The technology is already widely used in image recognition, speech recognition, healthcare, and autonomous driving, and is expected to reach more applications in the future. This progress stems from research breakthroughs in deep learning algorithms. The deep convolutional neural network (CNN) is a classic deep learning algorithm that performs tasks such as recognition, detection, and segmentation of target objects by extracting and computing features from an input image.
At present, many CNNs are implemented in software on a general-purpose processor (CPU), where the serial computation adopted by the CPU limits computing efficiency. The present invention therefore provides a PE calculation unit structure based on a FIFO-based variable-length cyclic shift register.
Disclosure of Invention
The invention aims to provide a PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register, which accelerates CNNs (convolutional neural networks) on an FPGA (field-programmable gate array) around key modules such as the FIFO-based variable-length cyclic shift register, the PE unit, the PE array, and max pooling.
To achieve this purpose, the specific technical scheme of the invention is as follows: a PE calculation unit structure based on a FIFO-based variable-length cyclic shift register comprises
a FIFO-based variable-length cyclic shift register: the shift register has three states, and it implements the FIFO function or the cyclic shift register function depending on the state of its state machine;
a PE unit: the PE unit is instantiated, interconnected, and integrated with other small modules to generate a calculation array of a specified scale, i.e., a PE array.
Further, the cyclic shift function is used to reuse convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
Further, the small module is a ReLU.
Further, the first-in, first-out behavior of the FIFO is combined with external control signals.
Further, the PE unit implements general one-dimensional convolution and, using a row-stationary dataflow, further implements convolution that supports multiple features, multiple convolution kernels, and multiple channels.
Further, before the PE unit starts working, the ordered Filter data and Feature data are written into the shift register.
The beneficial effects of the invention are as follows. In a conventional shift register structure, if the input data arrives discontinuously, invalid data is inserted into the middle of the register chain, which complicates control of the calculation. The variable-length shift register is therefore designed on the basis of a FIFO (first-in, first-out) structure.
The shift register has three states, and the solid lines in the figure represent the data flow of the current state. Depending on the state of the state machine, the module provides the FIFO function or the cyclic shift register function. The cyclic shift function is used to reuse convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
The PE unit uses the FIFO-based cyclic shift register together with a multiplier to implement the complex multiplication function. The cyclic shift register stores the Feature values and Filter values used in the current round of calculation. The Feature values and Filter values are arranged in a specific order and placed in the cyclic shift register.
The PE units are instantiated, interconnected, and integrated with other small modules such as a ReLU to generate a calculation array of a specified scale, i.e., a PE array. The relevant PE array coefficients are configured according to the structure and number of layers of the target network, so that the array can compute the convolutional neural network and thereby provide FPGA-based acceleration of the convolutional neural network.
Drawings
FIG. 1 is a diagram of the FIFO-based variable-length shift register architecture designed in the present invention;
FIG. 2 is a diagram of the PE unit architecture designed in the present invention;
FIG. 3 is a diagram of the PE array architecture designed in the present invention.
Detailed Description
The FIFO-based variable-length cyclic shift register is a structure designed to support the reuse of convolution kernels and the streaming input of data in convolution operations. In a conventional shift register structure, if the input data arrives discontinuously, invalid data is inserted into the middle of the register chain, which complicates control of the calculation. The variable-length shift register is therefore designed on the basis of a FIFO (first-in, first-out) structure.
The FIFO is a first-in, first-out queue whose basic control signals comprise a read/write clock, a read/write enable, empty/full signals, and an output-data-valid signal. The FIFO contains a read pointer and a write pointer, both of which are set to 0 on reset. The write pointer points to the next address to be written and is incremented by 1 after each write operation; the read pointer points to the address of the next data to be read and is incremented by 1 after each read operation. The FIFO is empty when the read pointer and the write pointer are identical; it is full when the write pointer has advanced exactly one full lap (the FIFO depth) ahead of the read pointer.
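The pointer rules above can be sketched as a small software model. This is only an illustration, not the patent's circuit: the class name `PointerFifo` and the choice to count pointers modulo twice the depth (so that "one full lap ahead" can be told apart from "equal") are assumptions made for the example.

```python
class PointerFifo:
    """Illustrative software model of a FIFO with read/write pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        # Pointers count modulo 2*depth; the extra range acts as a wrap bit,
        # so "full" (one lap ahead) differs from "empty" (pointers equal).
        self.wr_ptr = 0
        self.rd_ptr = 0

    def count(self):
        """Number of stored words: write pointer minus read pointer."""
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth)

    def empty(self):
        return self.wr_ptr == self.rd_ptr

    def full(self):
        return self.count() == self.depth

    def write(self, value):
        if self.full():
            raise OverflowError("write to a full FIFO")
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)  # +1 per write

    def read(self):
        if self.empty():
            raise IndexError("read from an empty FIFO")
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)  # +1 per read
        return value
```

Hardware FIFOs commonly use one extra pointer bit for the same purpose; whether the patent's design does so is not stated in the text.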
The module exploits the first-in, first-out behavior of the FIFO, combined with external control signals, to implement a variable-length shift register whose maximum length is the FIFO depth. As shown in FIG. 1, the module has two working states, and the functions it implements are: (1) the data read from the FIFO is used directly as the FIFO's write data, so the module acts as a cyclic shift register; and (2) the read data is written straight back into the FIFO, so the total amount of data in the FIFO stays unchanged and equals the difference between the write pointer and the read pointer.
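A minimal behavioral sketch of this idea follows, assuming a Python `deque` stands in for the FIFO storage. The class and method names (`FifoShiftRegister`, `push`, `rotate`, `pop`) are hypothetical and do not come from the patent; they only illustrate how feeding the read data back to the write port turns a FIFO into a cyclic shift register whose length equals the number of words currently stored.

```python
from collections import deque


class FifoShiftRegister:
    """Illustrative model: a FIFO that can also rotate its contents."""

    def __init__(self, depth):
        self.depth = depth      # maximum length of the shift register
        self.q = deque()        # stored words, oldest on the left

    def push(self, value):
        """FIFO input: store one new word; length grows by one."""
        if len(self.q) == self.depth:
            raise OverflowError("FIFO full")
        self.q.append(value)

    def rotate(self):
        """Cyclic shift: the word read out is written straight back in,
        so the number of stored words does not change."""
        value = self.q.popleft()
        self.q.append(value)
        return value

    def pop(self):
        """FIFO output: read one word without writing it back."""
        return self.q.popleft()

    def __len__(self):
        return len(self.q)
```

Because only valid words are ever pushed, gaps in the input stream do not leave invalid entries in the chain, and the effective register length follows the amount of data loaded rather than a fixed number of stages.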
The PE unit implements general one-dimensional convolution and, using a row-stationary dataflow, further implements convolution that supports multiple features, multiple convolution kernels, and multiple channels. The PE unit uses the FIFO-based cyclic shift register together with a multiplier to implement the complex multiplication function.
Each PE unit in the design contains only one multiplier; its structure is shown in FIG. 2. To support general convolution with multiple convolution kernels, multiple input feature maps, and multiple channels, the FIFO-based cyclic shift register works together with the multiplier to implement the complex multiplication function. The cyclic shift register stores the Feature values and Filter values used in the current round of calculation.
By carefully arranging the order of the Feature and Filter data, the data of multiple Features and Filters needed in the current round of one-dimensional convolution is sorted and then read into the PE unit. This avoids reading the same Filter or Feature data from global storage multiple times, and thus maximizes data reuse during the convolution calculation.
Before the PE unit starts working, the ordered Filter data and Feature data are first written into the shift register. The number of Filter data words is fLen × fNum × nchannel, and the number of Feature data words is iLen × nchannel.
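The following sketch illustrates these data counts with a software model of one PE performing a one-dimensional convolution with a single multiplier. It is an illustration under stated assumptions, not the patent's implementation: the function name `pe_conv1d`, the storage order of the words (filter-major, then channel, then tap), and the use of cross-correlation without kernel flipping (as is usual in CNNs) are all choices made for the example; the source only says the data is arranged "in a specific order".

```python
def pe_conv1d(filters, feature):
    """Model one PE: filters[f][c][k] has shape fNum x nchannel x fLen,
    feature[c][i] has shape nchannel x iLen.
    Returns (out, n_filter_words, n_feature_words, n_multiplies)."""
    f_num = len(filters)
    n_channel = len(feature)
    f_len = len(filters[0][0])
    i_len = len(feature[0])
    o_len = i_len - f_len + 1

    # Words written into the shift registers before the PE starts.
    # The ordering below is an assumption made for this sketch.
    filter_words = [filters[f][c][k]
                    for f in range(f_num)
                    for c in range(n_channel)
                    for k in range(f_len)]
    feature_words = [feature[c][i]
                     for c in range(n_channel)
                     for i in range(i_len)]

    out = [[0] * o_len for _ in range(f_num)]
    multiplies = 0
    for f in range(f_num):
        for o in range(o_len):
            acc = 0
            for c in range(n_channel):
                for k in range(f_len):
                    w = filter_words[(f * n_channel + c) * f_len + k]
                    x = feature_words[c * i_len + o + k]
                    acc += w * x          # one product per step (single multiplier)
                    multiplies += 1
            out[f][o] = acc
    return out, len(filter_words), len(feature_words), multiplies
```

Every product is formed from words already held locally, so each Filter and Feature word needs to be fetched from global storage only once per round, even though it takes part in many multiplications.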
On the PL (programmable logic) side, the PE array dataflow is designed using the row-stationary idea, and the description is organized around the key modules: the FIFO-based variable-length cyclic shift register, the PE unit, the PE array, max pooling, and so on. The result is a convolutional neural network circuit generator with configurable parameters, and the system provides general convolution with multiple channels, multiple Filters, and multiple Features.
Based on the PE unit described above, the PE array module can directly reuse the PE unit module to build a PE array, whose structure is shown in FIG. 3. Taking into account the DSP resources, memory resources, and accelerator performance of the VCU118 FPGA development board, the PE array is configured to a size of 3 × 14 × 64, with the calculation bit width set to 8 bits.
The Filter data is transmitted to the PE array through one channel, 512 bits (8 bits × 64) at a time, i.e., the data of all channels at one two-dimensional point. The Feature data is transmitted 8192 bits (8 bits × 64 × 16) at a time, i.e., all the image data that the PE array needs for one convolution calculation.
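The bus widths quoted above can be checked with a few lines of arithmetic. The constant names are hypothetical. The last check reflects one possible reading that the source does not state explicitly: if the 3 × 14 × 64 array is interpreted as 3 filter rows × 14 output rows × 64 channels, then the factor 16 matches the 3 + 14 − 1 input rows such an arrangement would need.

```python
DATA_BITS = 8        # calculation bit width given in the text
CHANNELS = 64        # third dimension of the 3 x 14 x 64 array
FILTER_ROWS = 3      # assumed meaning of the first array dimension
OUTPUT_ROWS = 14     # assumed meaning of the second array dimension
FEATURE_ROWS = 16    # factor quoted for the Feature transfer

# All channels of one two-dimensional Filter point.
filter_bus_bits = DATA_BITS * CHANNELS

# All image data the array needs for one convolution calculation.
feature_bus_bits = DATA_BITS * CHANNELS * FEATURE_ROWS

print(filter_bus_bits, feature_bus_bits)
print(FILTER_ROWS + OUTPUT_ROWS - 1 == FEATURE_ROWS)
```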
The dataflow of the PE array follows the row-stationary idea: Filter data is distributed in the horizontal direction, Feature data is distributed in the diagonal direction, and each column calculates the data of one row of the output feature map. After the PE units complete their calculations, the results are first accumulated along the channel direction, and then the accumulated results of the PEs in the same row are accumulated. When a complete convolution calculation is finished, the final accumulated convolution result is passed to the ReLU module, which applies its operation to produce the result.
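The mapping described above can be sketched in software as follows. This is a simplified model under assumptions, not the patent's circuit: the function names are hypothetical, each PE is modeled as a one-dimensional convolution of one filter row with one input row, the PE at array row r and column e is assumed to receive input row r + e (which is what a diagonal distribution of Feature data implies), and the sketch simply sums all partial sums that contribute to one output row.

```python
def conv1d(w_row, x_row):
    """One PE: 1-D convolution (no kernel flip) of a filter row with an input row."""
    n = len(x_row) - len(w_row) + 1
    return [sum(w_row[k] * x_row[o + k] for k in range(len(w_row)))
            for o in range(n)]


def relu(v):
    return v if v > 0 else 0


def row_stationary_conv2d(filt, feat):
    """filt[c][r][k]: channels x filter rows x filter width.
    feat[c][y][x]: channels x input rows x input width.
    Returns the output feature map out[e][o] after ReLU."""
    n_channel = len(filt)
    f_rows = len(filt[0])
    out_rows = len(feat[0]) - f_rows + 1
    out_cols = len(feat[0][0]) - len(filt[0][0]) + 1
    out = []
    for e in range(out_rows):              # array column e -> output row e
        acc = [0] * out_cols
        for r in range(f_rows):            # filter row r is shared across columns
            for c in range(n_channel):     # accumulate along the channel direction
                psum = conv1d(filt[c][r], feat[c][r + e])  # diagonal: input row r+e
                acc = [a + p for a, p in zip(acc, psum)]
        out.append([relu(a) for a in acc])  # final result goes through ReLU
    return out
```

For example, a single-channel 2 × 2 filter `[[1, 0], [0, 1]]` applied to the 3 × 3 input `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` gives `[[6, 8], [12, 14]]`.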
The CNN accelerator system designed here adopts a multistage pipeline structure to raise the operating frequency of the circuit. By arranging the input data appropriately, it accelerates general convolution that computes multiple feature maps, multiple channels, and multiple convolution kernels simultaneously. The maximum clock frequency finally reached is 140 MHz. When synthesizing the multipliers, the synthesizer chose LUT resources and rarely used DSP resources.

Claims (6)

1. A PE calculation unit structure based on a FIFO-based variable-length cyclic shift register, characterized by comprising:
a FIFO-based variable-length cyclic shift register: the shift register has three states, and it implements the FIFO function or the cyclic shift register function depending on the state of its state machine;
a PE unit: the PE unit is instantiated, interconnected, and integrated with other small modules to generate a calculation array of a specified scale, i.e., a PE array.
2. The PE calculation unit structure based on a FIFO-based variable-length cyclic shift register according to claim 1, wherein: the cyclic shift function is used to reuse convolution kernels in the convolution operation, and the FIFO function is used for the sequential input of data in the convolution operation.
3. The PE calculation unit structure based on a FIFO-based variable-length cyclic shift register according to claim 1, wherein: the small module is a ReLU.
4. The PE calculation unit structure based on a FIFO-based variable-length cyclic shift register according to claim 1, wherein: the first-in, first-out behavior of the FIFO is combined with external control signals.
5. The PE calculation unit structure based on a FIFO-based variable-length cyclic shift register according to claim 1, wherein: the PE unit implements general one-dimensional convolution and, using a row-stationary dataflow, further implements convolution that supports multiple features, multiple convolution kernels, and multiple channels.
6. The PE calculation unit structure based on a FIFO-based variable-length cyclic shift register according to claim 1, wherein: before the PE unit starts working, the ordered Filter data and Feature data are written into the shift register.
CN202110269554.1A 2021-03-12 2021-03-12 PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register Pending CN112992248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110269554.1A CN112992248A (en) 2021-03-12 2021-03-12 PE (processing element) calculation unit structure based on a FIFO (first-in, first-out) variable-length cyclic shift register


Publications (1)

Publication Number Publication Date
CN112992248A true CN112992248A (en) 2021-06-18

Family

ID=76334584


Country Status (1)

Country Link
CN (1) CN112992248A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609042A (en) * 2021-07-20 2021-11-05 天津七所精密机电技术有限公司 System for improving data interaction speed
CN113609042B (en) * 2021-07-20 2024-04-26 天津七所精密机电技术有限公司 System for improving data interaction speed

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4672646A (en) * 1986-09-16 1987-06-09 Hewlett-Packard Company Direct-injection FIFO shift register
JPS62145599A (en) * 1985-12-20 1987-06-29 Hitachi Ltd Variable step shift register
JPH05217392A (en) * 1991-12-10 1993-08-27 Kawasaki Steel Corp Variable length shift register and image processing device using it
US6192498B1 (en) * 1997-10-01 2001-02-20 Globepan, Inc. System and method for generating error checking data in a communications system
US20010025228A1 (en) * 2000-03-21 2001-09-27 Anton Prantl Method for evaluating measured data
US20070157009A1 (en) * 2006-01-03 2007-07-05 Samsung Electronics Co., Ltd. Loop accelerator and data processing system having the same
CN101089840A (en) * 2007-07-12 2007-12-19 浙江大学 Matrix multiplication parallel computing system based on multi-FPGA
US20100146373A1 (en) * 2008-12-05 2010-06-10 Yuan-Sun Chu Configurable hierarchical comma-free reed-solomon decoding circuit and method thereof
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
CN106533869A (en) * 2016-11-08 2017-03-22 北京飞利信电子技术有限公司 Data forwarding method and device and electronic device
CN107533667A (en) * 2015-05-21 2018-01-02 谷歌公司 Vector calculation unit in neural network processor
US20180218760A1 (en) * 2017-01-31 2018-08-02 Intel Corporation Configurable storage blocks having simple first-in first-out enabling circuitry
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
欧春湘、杨嘉伟、任晓松: "基于FIFO的循环移位寄存器实现方法", 现代电子技术, vol. 37, no. 19, pages 60 - 62 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination