CN112992248A

CN112992248A - PE (processing element) calculation unit structure of a FIFO (first-in, first-out)-based variable-length cyclic shift register

Info

Publication number: CN112992248A
Application number: CN202110269554.1A
Authority: CN
Inventors: 张国和; 刘嘉奇; 陈琳
Original assignee: Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd; Shenzhen Research Institute Of Xi'an Jiaotong University
Current assignee: Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd; Shenzhen Research Institute Of Xi'an Jiaotong University
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2021-06-18

Abstract

The invention relates to a PE calculation unit structure of a FIFO-based variable-length cyclic shift register, comprising: a FIFO-based variable-length cyclic shift register, which has three states and, depending on the state of the state machine, implements either a FIFO function or a cyclic shift register function; and PE units, which are instantiated, interconnected and integrated with other small modules to form a computing array of a specified size, i.e., a PE array. The invention reduces the number of accesses that convolution operations make to global storage and improves the computational efficiency of the network.

Description

PE (processing element) calculation unit structure of a FIFO (first-in, first-out)-based variable-length cyclic shift register

Technical Field

The invention belongs to the fields of artificial intelligence and integrated circuits, and particularly relates to a PE (processing element) calculation unit structure of a FIFO-based variable-length cyclic shift register.

Background

With the rapid development of artificial intelligence, almost every industry has begun applying it to practical problems. The technology is widely used in image recognition, speech recognition, medical health, autonomous driving and other fields, and is expected to reach still more applications in the future. This progress stems from research breakthroughs in deep learning algorithms. The deep convolutional neural network (CNN) is a classic deep learning algorithm: by performing feature extraction and related computations on an input image, it accomplishes tasks such as recognition, detection and segmentation of target objects.

At present, many CNNs are implemented in software on general-purpose processors (CPUs), but the serial computing mode of the CPU limits their computational efficiency. The present invention therefore provides a PE calculation unit structure of a FIFO-based variable-length cyclic shift register.

Disclosure of Invention

The invention aims to provide a PE calculation unit structure of a FIFO-based variable-length cyclic shift register that accelerates CNNs (convolutional neural networks) on an FPGA (field programmable gate array), built around key modules such as the FIFO-based variable-length cyclic shift register, the PE unit, the PE array and max pooling.

To achieve this purpose, the specific technical scheme of the invention is as follows: a PE calculation unit structure of a FIFO-based variable-length cyclic shift register comprises:

a FIFO-based variable-length cyclic shift register: the shift register has three states and, depending on the state of the state machine, implements either the FIFO function or the cyclic shift register function;

a PE unit: PE units are instantiated, interconnected and integrated with other small modules to form a computing array of a specified size, i.e., a PE array.

Further, the cyclic shift function is used to reuse convolution kernels in the convolution operation, and the FIFO function is used to input data sequentially in the convolution operation.

Further, the small module is a ReLU.

Further, the first-in, first-out characteristic of the FIFO is combined with an external control signal.

Further, the PE unit implements a general one-dimensional convolution and, using a row-stationary dataflow, further supports convolution over multiple features, multiple convolution kernels and multiple channels.

Further, before the PE unit works, the sequenced Filter data and Feature data are written into the shift register.

Beneficial effects of the invention: in a conventional shift register structure, if input data enter the shift register discontinuously, invalid data are inserted into the middle of the register chain, which complicates control of the computation. Therefore, the variable-length shift register herein is designed on the basis of a FIFO (first-in, first-out) structure.

The shift register has three states, and the solid lines in the figure show the data flow of the current state. Depending on the state of the state machine, the module provides either a FIFO function or a cyclic shift register function. The cyclic shift function is used to reuse convolution kernels in the convolution operation, and the FIFO function is used to input data sequentially in the convolution operation.

The PE unit uses the FIFO-based cyclic shift register together with a multiplier to implement a more complex multiplication function. The cyclic shift register stores the Feature values and Filter values used in the current round of calculation; these values are arranged in a specific order and then loaded into the cyclic shift register.

PE units are instantiated, interconnected and integrated with other small modules such as a ReLU to form a computing array of a specified size, i.e., a PE array. By configuring the relevant PE array coefficients according to the structure and number of layers of the target network, the array can compute the convolutional neural network, achieving FPGA-based CNN acceleration.

Drawings

FIG. 1 is a diagram of a FIFO based variable length shift register architecture designed in the present invention; FIG. 2 is a diagram of the PE unit architecture designed in the present invention; FIG. 3 is a diagram of the PE array architecture designed in the present invention.

Detailed Description

The FIFO-based variable-length circular shift register is designed to support both convolution-kernel reuse and data-stream input in the convolution operation. In a conventional shift register structure, if input data enter the shift register discontinuously, invalid data are inserted into the middle of the register chain, which complicates control of the computation. Therefore, the variable-length shift register herein is designed on the basis of a FIFO (first-in, first-out) structure.

The FIFO is a first-in, first-out queue whose basic control signals include read/write clocks, read/write enables, empty/full flags and an output-data-valid signal. The FIFO contains a read pointer and a write pointer, both reset to 0. The write pointer points to the next address to be written and is incremented by 1 after each write; the read pointer points to the address of the next data to be read and is incremented by 1 after each read. The FIFO is empty when the read and write pointers are equal, and full when the write pointer has advanced exactly one lap (the FIFO depth) beyond the read pointer.
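The pointer rules above can be checked with a small software model. This is a sketch, not the patent's RTL: it assumes each pointer carries one extra wrap bit (counting modulo 2 × depth), a common way to distinguish "full" from "empty" when both pointers address the same slot. The class name `PointerFifo` is hypothetical.

```python
class PointerFifo:
    """Software model of a FIFO with read/write pointers.

    Assumption: pointers count modulo 2 * depth (one extra wrap bit),
    so 'equal pointers' means empty and 'one lap apart' means full.
    """

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0  # next address to write, reset to 0
        self.rd_ptr = 0  # next address to read, reset to 0

    def count(self):
        # Stored words = write pointer minus read pointer.
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth)

    def empty(self):
        # Pointers identical, wrap bit included.
        return self.wr_ptr == self.rd_ptr

    def full(self):
        # Write pointer is exactly one lap (depth) ahead of the read pointer.
        return self.count() == self.depth

    def write(self, value):
        if self.full():
            raise OverflowError("write to full FIFO")
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)

    def read(self):
        if self.empty():
            raise IndexError("read from empty FIFO")
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

After four writes into a depth-4 FIFO both pointers address slot 0, yet the wrap bit differs, so `full()` is true and `empty()` is false.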

The module combines the first-in, first-out behavior of the FIFO with external control signals to realize a variable-length shift register whose maximum length is the FIFO depth. As shown in FIG. 1, the module has two working states, in which it implements the following functions: the read data of the FIFO is used directly as its write data, so the module acts as a circular shift register; and, because each word read is written straight back into the FIFO, the total amount of data in the FIFO stays constant and equals the difference between the write pointer and the read pointer.
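A minimal behavioral sketch of this idea follows, assuming two hypothetical modes named `"fifo"` (plain push/pop) and `"cycle"` (each word read is written back). The patent mentions three states but does not spell them all out, so the mode names and interface here are illustrative only. A `deque` stands in for the pointer-based FIFO; its length plays the role of the write-pointer minus read-pointer difference.

```python
from collections import deque


class VarLenCircularShiftRegister:
    """Sketch of a variable-length shift register built on a FIFO.

    Hypothetical interface: in 'fifo' mode words are pushed and popped
    in order; in 'cycle' mode every word read is written straight back,
    so the contents rotate and the length stays constant.
    """

    def __init__(self, depth):
        self.depth = depth      # maximum length = FIFO depth
        self.fifo = deque()
        self.mode = "fifo"

    def __len__(self):
        # Equivalent to write pointer minus read pointer in hardware.
        return len(self.fifo)

    def push(self, value):
        # FIFO write, only allowed in 'fifo' mode.
        if self.mode != "fifo":
            raise RuntimeError("push only allowed in fifo mode")
        if len(self.fifo) >= self.depth:
            raise OverflowError("register is at maximum length")
        self.fifo.append(value)

    def step(self):
        # One FIFO read; in 'cycle' mode the word is re-written.
        value = self.fifo.popleft()
        if self.mode == "cycle":
            self.fifo.append(value)
        return value
```

For example, loading three weights into a depth-8 register gives a 3-long circular register with no empty slots in the chain:

```python
reg = VarLenCircularShiftRegister(depth=8)
for w in (5, 6, 7):
    reg.push(w)
reg.mode = "cycle"
out = [reg.step() for _ in range(7)]   # [5, 6, 7, 5, 6, 7, 5]; len(reg) stays 3
```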

The PE unit implements a general one-dimensional convolution and, using a row-stationary dataflow, further supports convolution over multiple features, multiple convolution kernels and multiple channels. It uses a FIFO-based cyclic shift register together with a multiplier to implement a more complex multiplication function.

Each PE unit in this design has only one multiplier; the PE structure is shown in FIG. 2. To realize general convolution over multiple convolution kernels, multiple input feature maps and multiple channels, a FIFO-based cyclic shift register works with the multiplier to implement this more complex multiplication function. The cyclic shift register holds the Feature values and Filter values used in the current round of calculation.
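The sketch below illustrates how a single multiplier can compute a whole 1-D convolution when the Filter row sits in a cyclic shift register and the Feature window advances like a FIFO. It assumes stride 1, no padding and one multiply-accumulate per simulated cycle; the function name `pe_conv1d` is hypothetical and the cycle count reflects this model, not measured hardware.

```python
from collections import deque


def pe_conv1d(feature_row, filter_row):
    """Single-multiplier PE sketch for one 1-D convolution.

    Assumptions: stride 1, no padding, one MAC per cycle. The filter is
    held in a cyclic shift register and rotates once per output, so each
    weight is loaded from global storage only once.
    """
    f_len = len(filter_row)
    filt = deque(filter_row)                # cyclic shift register
    window = deque(feature_row[:f_len])     # FIFO holding the sliding window
    pending = deque(feature_row[f_len:])    # features not yet loaded
    outputs, cycles = [], 0

    while True:
        acc = 0
        for i in range(f_len):
            w = filt.popleft()
            acc += w * window[i]            # the PE's only multiplier
            filt.append(w)                  # recirculate the weight
            cycles += 1
        outputs.append(acc)
        if not pending:
            break
        window.popleft()                    # FIFO: drop the oldest feature
        window.append(pending.popleft())    # FIFO: take the next feature
    return outputs, cycles
```

With `feature_row = [1, 2, 3, 4, 5]` and `filter_row = [1, 0, -1]`, the PE produces three outputs of `-2` in nine multiply cycles, and the filter ends each pass back in its original order, ready for reuse.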

By carefully arranging the order of the Feature and Filter data, the data of the multiple Features and Filters needed for the current round of one-dimensional convolution are sorted and then read into the PE unit. This avoids reading the same Filter or Feature data from global storage multiple times and thus maximizes data reuse during the convolution calculation.

Before the PE unit starts working, the sorted Filter data and Feature data are first written into the shift register. The number of Filter data items is fLen × fNum × nchannel, and the number of Feature data items is iLen × nchannel.
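To make the data counts concrete, here is a sketch that flattens one round of PE inputs and computes the multi-filter, multi-channel 1-D result the PE must reproduce. We read fLen as filter length, fNum as number of filters, nchannel as channel count and iLen as input row length; these readings, the channel-innermost ordering and the function names `pack_pe_inputs` and `pe_multi_conv1d` are all assumptions, since the patent does not give its exact ordering.

```python
def pack_pe_inputs(features, filters):
    """Flatten one round of PE inputs into shift-register order.

    features: [nchannel][iLen]; filters: [fNum][nchannel][fLen].
    Channel-innermost interleaving is a hypothetical ordering.
    """
    n_channel, i_len = len(features), len(features[0])
    f_num, f_len = len(filters), len(filters[0][0])
    feature_words = [features[c][x]
                     for x in range(i_len) for c in range(n_channel)]
    filter_words = [filters[f][c][k]
                    for f in range(f_num) for k in range(f_len)
                    for c in range(n_channel)]
    # Counts match the formulas in the text.
    assert len(feature_words) == i_len * n_channel
    assert len(filter_words) == f_len * f_num * n_channel
    return feature_words, filter_words


def pe_multi_conv1d(features, filters):
    """Reference result: one output row per filter, summed over all
    channels (stride 1, no padding)."""
    n_channel, i_len = len(features), len(features[0])
    f_len = len(filters[0][0])
    return [[sum(filt[c][k] * features[c][o + k]
                 for c in range(n_channel) for k in range(f_len))
             for o in range(i_len - f_len + 1)]
            for filt in filters]
```

Because all fNum filters are loaded once and rotated against the same Feature words, each Feature value is fetched from global storage a single time per round.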

The PL (programmable logic) side designs the PE array dataflow using the row-stationary idea, and the description centers on key modules such as the FIFO-based variable-length cyclic shift register, the PE unit, the PE array and max pooling. Finally, a parameter-configurable convolutional neural network circuit generator is completed, and the system supports general multi-channel, multi-filter and multi-feature convolution.

Based on the PE unit described above, the PE array module instantiates the PE unit directly to build the PE array, whose structure is shown in FIG. 3. Taking into account the DSP resources, memory resources and accelerator performance of the VCU118 FPGA development board, the PE array is configured to a size of 3 × 14 × 64, with the computational bit width set to 8 bits.

Filter data are transmitted to the PE array over one channel, 512 bits (8 bits × 64) at a time, i.e., the data of all channels at one two-dimensional point. Feature data are transmitted 8192 bits (8 bits × 64 × 16) at a time, i.e., all the image data that the PE computing array needs for one convolution.
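The bus widths quoted above follow directly from the array parameters, and the short calculation below reproduces them. Reading 3 × 14 × 64 as 3 PE rows (filter rows) × 14 PE columns (output rows) × 64 channels, and the factor 16 as the 14 + 3 - 1 input rows that a diagonal Feature distribution needs, are our assumptions rather than statements from the patent.

```python
# Parameters stated in the text.
BIT_WIDTH = 8
N_CHANNEL = 64
# Assumed interpretation of the 3 x 14 x 64 array size.
PE_ROWS = 3    # filter rows held by the array (assumption)
PE_COLS = 14   # output rows computed in parallel (assumption)


def bus_widths(bit_width=BIT_WIDTH, n_channel=N_CHANNEL,
               pe_rows=PE_ROWS, pe_cols=PE_COLS):
    filter_bits = bit_width * n_channel            # one 2-D point, all channels
    feature_rows = pe_cols + pe_rows - 1           # rows needed by diagonal mapping
    feature_bits = bit_width * n_channel * feature_rows
    total_pes = pe_rows * pe_cols * n_channel
    return filter_bits, feature_rows, feature_bits, total_pes


print(bus_widths())  # (512, 16, 8192, 2688)
```

Under this reading the array holds 2688 PEs, which helps explain why DSP resources on the board were a sizing constraint.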

The data flow of the PE array follows the row-stationary idea: Filter data are distributed horizontally, Feature data are distributed diagonally, and each column computes one row of the output feature map. When the PE units complete their computation, their results are first accumulated along the channel direction of the PEs, and the accumulated results of the PEs in the same row are then summed. When a complete convolution has been computed, the final accumulated result is passed to the ReLU module, which applies the activation to produce the output.
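A behavioral sketch of the row-stationary mapping is shown below for a single channel (the channel-direction accumulation is omitted), stride 1 and no padding. We assume, consistent with each column producing one output row, that PE (r, c) holds Filter row r and Feature row r + c, and that the partial sums of one output row are those of the PEs sharing that column. All function names are hypothetical.

```python
def conv1d(row, taps):
    """1-D valid correlation performed by one PE."""
    n = len(taps)
    return [sum(taps[k] * row[o + k] for k in range(n))
            for o in range(len(row) - n + 1)]


def row_stationary_conv2d(image, kernel, n_cols):
    """Row-stationary mapping on an R x n_cols PE grid (sketch).

    Assumptions: single channel, stride 1, no padding. PE (r, c) gets
    filter row r (horizontal distribution) and image row r + c
    (diagonal distribution); column c yields output row c.
    """
    n_rows = len(kernel)
    assert len(image) == n_rows + n_cols - 1
    out = []
    for c in range(n_cols):
        partials = [conv1d(image[r + c], kernel[r]) for r in range(n_rows)]
        # Accumulate the partial sums of the PEs that produce this row.
        out.append([sum(vals) for vals in zip(*partials)])
    return out


def relu(rows):
    """Element-wise ReLU applied to the accumulated result."""
    return [[max(0, v) for v in row] for row in rows]
```

With a 3-row kernel and 14 columns the function consumes exactly 16 image rows, matching the Feature transfer size discussed above.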

The CNN accelerator system designed herein adopts a multistage pipeline to raise the operating frequency of the circuit. Through the designed arrangement of the input data, it accelerates general convolution that computes multiple feature maps, multiple channels and multiple convolution kernels simultaneously. The maximum clock frequency finally achieved is 140 MHz; when synthesizing the multipliers, the synthesis tool chose LUT resources and used few DSP resources.

Claims

1. A PE calculation unit structure of a FIFO-based variable-length cyclic shift register is characterized in that: comprises that

2. The structure of PE compute unit of FIFO based variable length circular shift register according to claim 1, wherein: the cyclic shift function is used for reuse of convolution kernels in the convolution operation, and the FIFO function is used for sequential input of data in the convolution operation.

3. The structure of PE compute unit of FIFO based variable length circular shift register according to claim 1, wherein: the small module is a ReLU.

4. The structure of PE compute unit of FIFO based variable length circular shift register according to claim 1, wherein: the FIFO has the characteristic of first-in first-out and is combined with an external control signal.

5. The structure of PE compute unit of FIFO based variable length circular shift register according to claim 1, wherein: the PE unit implements a general one-dimensional convolution and, using a row-stationary dataflow, further supports convolution over multiple features, multiple convolution kernels and multiple channels.

6. The structure of PE compute unit of FIFO based variable length circular shift register according to claim 1, wherein: before the PE unit works, the sequenced Filter data and Feature data are written into a shift register.