CN112597079B - Data write-back system of convolutional neural network accelerator - Google Patents


Info

Publication number
CN112597079B
CN112597079B (application CN202011527851.3A)
Authority
CN
China
Prior art keywords
unit
write
buffer
data
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011527851.3A
Other languages
Chinese (zh)
Other versions
CN112597079A (en)
Inventor
王天一 (Wang Tianyi)
边立剑 (Bian Lijian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Anlu Information Technology Co ltd
Original Assignee
Shanghai Anlu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Anlu Information Technology Co ltd filed Critical Shanghai Anlu Information Technology Co ltd
Priority to CN202011527851.3A priority Critical patent/CN112597079B/en
Publication of CN112597079A publication Critical patent/CN112597079A/en
Application granted granted Critical
Publication of CN112597079B publication Critical patent/CN112597079B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F13/1605: Handling requests for interconnection or transfer for access to memory bus, based on arbitration (G: Physics; G06: Computing; G06F: Electric digital data processing)
    • G06F13/1668: Details of memory controller
    • G06F13/4063: Device-to-bus coupling
    • G06N3/063: Physical realisation, i.e. hardware implementation, of neural networks using electronic means (G06N: Computing arrangements based on specific computational models)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Neurology (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a data write-back system of a convolutional neural network accelerator, comprising an input buffer module, N levels of write-back nodes, and a write-back control module. The input buffer module is connected with the computing units to receive data; the uppermost-level write-back nodes are connected with the input buffer module; each next-level write-back node is connected with at least two previous-level write-back nodes, N being a natural number greater than 1; and the write-back control module is connected with the lowest-level write-back node to receive data from it and transmit the data to the bus. Because this tree structure organizes the write-back nodes hierarchically, the system improves the transmission efficiency of data write-back.

Description

Data write-back system of convolutional neural network accelerator
Technical Field
The application relates to the technical field of deep learning, in particular to a data write-back system of a convolutional neural network accelerator.
Background
In the prior art, a cloud field programmable gate array (Field Programmable Gate Array, FPGA) provides far more logic and memory resources than an edge device. However, the neural network models run in the cloud are often huge and generate a large number of intermediate results during operation, and the on-chip random access memory (Random Access Memory, RAM) resources of the FPGA platform usually cannot buffer all of this data, so the data must be transmitted to off-chip memory. The prior art cannot meet the transmission requirements of high-throughput concurrent data, and its data transmission efficiency is low.
Therefore, there is a need for a new data write-back system for a convolutional neural network accelerator that solves the above-mentioned problems in the prior art.
Disclosure of Invention
The application aims to provide a data write-back system of a convolutional neural network accelerator, which improves the transmission efficiency of the data write-back of the convolutional neural network accelerator.
In order to achieve the above object, the data write-back system of the convolutional neural network accelerator of the present application includes:
the input buffer module is used for being connected with the computing unit to receive data;
N-level write-back nodes, wherein the uppermost-level write-back nodes are connected with the input buffer module, each next-level write-back node is connected with at least two previous-level write-back nodes, and N is a natural number greater than 1;
and the write-back control module is connected with the write-back node of the lowest stage so as to receive data from the write-back node of the lowest stage and transmit the data to the bus.
The data write-back system of the convolutional neural network accelerator has the following beneficial effects: the system comprises N levels of write-back nodes, the uppermost-level write-back nodes are connected with the input buffer module, each next-level write-back node is connected with at least two previous-level write-back nodes, and N is a natural number greater than 1. Because this tree structure organizes the write-back nodes hierarchically, the transmission efficiency of data write-back can be improved.
Preferably, the write-back node includes a first output buffer unit, a selection unit, and at least two receiving buffer units, wherein the output ends of the receiving buffer units are connected with the input ends of the selection unit, and the output end of the selection unit is connected with the input end of the first output buffer unit. The beneficial effects are that: the standardized design of the write-back node is simple, easy to use, and easy to port.
Further preferably, the number of write-back nodes at the previous level matches the number of receiving cache units of the write-back node at the next level. The beneficial effects are that: no receiving cache unit of the next-level write-back node is wasted.
Further preferably, the write-back control module includes an address mapping unit; the data received by the write-back control module from the lowest-level write-back node includes computing unit address information and calculation result data, and the address mapping unit calculates the write-back address from the computing unit address information and the start address information.
Further preferably, the write-back node further includes an arbitration unit and a buffer management unit, where the arbitration unit is connected to the selection unit, and the buffer management unit is connected to the receiving buffer unit and the first output buffer unit respectively. The beneficial effects are that: the collision in the data transmission process can be effectively avoided.
Further preferably, the receiving buffer unit includes a first buffer status unit and a first data buffer unit that are connected to each other, and the first buffer status unit is connected to the buffer management unit. The beneficial effects are that: it is easy to determine whether the first data cache unit holds data.
Further preferably, the first output buffer unit includes a second buffer status unit and a second data buffer unit that are connected to each other, and the second buffer status unit is connected to the buffer management unit. The beneficial effects are that: it is easy to determine whether the second data cache unit holds data.
Further preferably, the cache management units of the interconnected write-back nodes are interconnected. The beneficial effects are that: avoiding data collision.
Further preferably, the input buffer module includes input buffer units, and the number of the input buffer units is adapted to the number of the receiving buffer units of the write-back node at the uppermost level. The beneficial effects are that: and avoiding the waste of the receiving buffer unit of the top-level write-back node.
Further preferably, the input buffer unit includes a buffer control unit, a third data buffer unit, and a second output buffer unit, where the buffer control unit is connected to the calculation unit, the third data buffer unit, and the buffer management unit of the corresponding write-back node, and the third data buffer unit is connected to the second output buffer unit.
Preferably, the number of write-back nodes at the lowest level is 1. The beneficial effects are that: only one piece of data is transmitted to the bus at a time, which avoids data transmission collisions.
Drawings
FIG. 1 is a block diagram illustrating an arbitration unit according to some embodiments of the present application;
FIG. 2 is a block diagram illustrating a receive buffer unit according to some embodiments of the application;
FIG. 3 is a block diagram illustrating a first output buffer unit according to some embodiments of the present application;
FIG. 4 is a block diagram illustrating an input buffer unit according to some embodiments of the present application;
FIG. 5 is a block diagram of a convolutional neural network accelerator data write-back system in accordance with some embodiments of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Unless otherwise defined, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. As used herein, the word "comprising" and the like means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof without precluding other elements or items.
Aiming at the problems in the prior art, an embodiment of the application provides a data write-back system of a convolutional neural network accelerator based on a cloud field programmable gate array (Field Programmable Gate Array, FPGA). The system includes an input buffer module, N levels of write-back nodes, and a write-back control module. The input buffer module is connected with the computing units to receive data; the uppermost-level write-back nodes are connected with the input buffer module, each next-level write-back node is connected with at least two previous-level write-back nodes, and N is a natural number greater than 1; the write-back control module is connected with the lowest-level write-back node to receive data from it and transmit the data to the bus. Preferably, the number of write-back nodes at the lowest level is 1.
In some embodiments, the write-back node includes a first output buffer unit, a selection unit, an arbitration unit, a buffer management unit, and at least two receiving buffer units, where the output ends of the receiving buffer units are connected to the input ends of the selection unit, the output end of the selection unit is connected to the input end of the first output buffer unit, the arbitration unit is connected to the selection unit, and the buffer management unit is connected to the receiving buffer units and the first output buffer unit respectively. Specifically, the arbitration unit is a shift register with at least 2 bits.
FIG. 1 is a block diagram illustrating an arbitration unit according to some embodiments of the present application. Referring to fig. 1, the arbitration unit 212 includes a shift register whose number of bits equals the number of receiving buffer units connected to it. For example, when 4 receiving buffer units are connected to the arbitration unit 212, the shift register includes 4 bits: a first bit 2121, a second bit 2122, a third bit 2123, and a fourth bit 2124. Taking a right shift as an example: in the first clock cycle the first bit 2121 is 1 and the second bit 2122, third bit 2123, and fourth bit 2124 are 0; in the second clock cycle the second bit 2122 is 1 and the other bits are 0; in the third clock cycle the third bit 2123 is 1 and the other bits are 0; in the fourth clock cycle the fourth bit 2124 is 1 and the other bits are 0; four clock cycles form one period. The principle of a left shift is the same as that of a right shift and is not described in detail.
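As a behavioral sketch of the right-shift sequence just described (a Python model, not the RTL; the names are illustrative), the 4-bit one-hot rotation can be written as:

```python
def rotate_right(bits):
    """Rotate a one-hot arbitration vector one position to the right per clock cycle."""
    return bits[-1:] + bits[:-1]

# first clock cycle: the first bit 2121 is 1, the others are 0
state = [1, 0, 0, 0]
history = [state]
for _ in range(3):                       # second, third, and fourth clock cycles
    state = rotate_right(state)
    history.append(state)
# rotating once more returns to [1, 0, 0, 0]: four clock cycles form one period
```

The bit that is 1 indicates which receiving buffer unit the selection unit services in that clock cycle.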
Fig. 2 is a block diagram illustrating the structure of a receiving buffer unit according to some embodiments of the present application. Referring to fig. 2, the receiving buffer unit 211 includes a first buffer status unit 2111 and a first data buffer unit 2112 connected to each other, and the first buffer status unit 2111 is connected to the buffer management unit (not shown). When the first buffer status unit 2111 detects that no data is stored in the first data buffer unit 2112, it feeds this back to the buffer management unit, which marks the first data buffer unit 2112 as 1; when the first buffer status unit 2111 detects that data is stored in the first data buffer unit 2112, it feeds this back to the buffer management unit, which marks the first data buffer unit 2112 as 0.
Fig. 3 is a block diagram illustrating a first output buffer unit according to some embodiments of the present application. Referring to fig. 3, the first output buffer unit 215 includes a second buffer status unit 2151 and a second data buffer unit 2152 connected to each other. The second buffer status unit 2151 is connected to the buffer management unit (not shown), the input end of the second data buffer unit 2152 is connected to the output end of the selection unit (not shown), and the output end of the second data buffer unit 2152 is connected to the input end of a first data buffer unit (not shown) in a receiving buffer unit of the next-level write-back node, or to the input end of the write-back control module (not shown). When the second buffer status unit 2151 detects that no data is stored in the second data buffer unit 2152, it feeds this back to the buffer management unit, which marks the second data buffer unit 2152 as 1; when the second buffer status unit 2151 detects that data is stored in the second data buffer unit 2152, it feeds this back to the buffer management unit, which marks the second data buffer unit 2152 as 0.
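The empty/occupied marking shared by the receiving and output buffer units (mark 1 when the data buffer holds nothing, 0 when it holds data) can be sketched as follows; `BufferUnit` is a hypothetical name introduced here, not a term from the patent:

```python
class BufferUnit:
    """Behavioral model of a data buffer plus its status unit.
    The cache management unit's mark is 1 when empty, 0 when occupied."""
    def __init__(self):
        self.data = None

    @property
    def mark(self):
        return 1 if self.data is None else 0

    def push(self, value):
        assert self.mark == 1, "may only write into an empty buffer"
        self.data = value

    def pop(self):
        assert self.mark == 0, "may only read from an occupied buffer"
        value, self.data = self.data, None
        return value
```

Transfers in the node tree then reduce to checking `mark` before each `push`, which is the collision-avoidance role the cache management unit plays in the patent.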
In some embodiments, the cache management units of the interconnected write-back nodes are interconnected.
Specifically, when the receiving buffer unit in the next-level write-back node is marked as 1 by its cache management unit, that is, no data is stored in that receiving buffer unit, and the output buffer unit in the previous-level write-back node is also marked as 1, the output buffer unit may receive data from the receiving buffer unit selected by the bit whose value is 1 in the arbitration unit.
For example, let the previous-level write-back node be a first write-back node and the next-level write-back node be a second write-back node. The first write-back node includes a first receiving buffer unit, a second receiving buffer unit, a third receiving buffer unit, a fourth receiving buffer unit, a first selection unit, a first arbitration unit, a first cache management unit, and a third output buffer unit. The output ends of the first to fourth receiving buffer units are respectively connected with the four input ends of the first selection unit, the first arbitration unit is connected with the control end of the first selection unit, and the output end of the first selection unit is connected with the third output buffer unit. The first cache management unit is respectively connected with the first to fourth receiving buffer units and the third output buffer unit so as to mark each of them as 1 or 0;
the second write-back node comprises a fifth receiving buffer unit, a sixth receiving buffer unit, a seventh receiving buffer unit, an eighth receiving buffer unit, a second selection unit, a second arbitration unit, a second cache management unit, and a fourth output buffer unit. The output ends of the fifth to eighth receiving buffer units are respectively connected with the four input ends of the second selection unit, the second arbitration unit is connected with the control end of the second selection unit, and the output end of the second selection unit is connected with the fourth output buffer unit. The second cache management unit is respectively connected with the fifth to eighth receiving buffer units and the fourth output buffer unit so as to mark each of them as 1 or 0;
the first write-back node and the second write-back node are connected with each other: specifically, the output end of the third output buffer unit is connected with the input end of the fifth receiving buffer unit, and the first cache management unit is connected with the second cache management unit. When no data is stored in the fifth receiving buffer unit, the second cache management unit feeds back to the first cache management unit that the fifth receiving buffer unit is marked as 1; when no data is stored in the third output buffer unit, the first cache management unit marks the third output buffer unit as 1. At this time, if the first bit of the first arbitration unit is 1, the first receiving buffer unit transmits its stored data to the third output buffer unit through the first selection unit, and the third output buffer unit then transmits the data to the fifth receiving buffer unit.
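The hand-off in this example can be simulated in a few lines; the dict-based buffers and the `step` function are illustrative assumptions, with `None` standing for an empty (mark 1) buffer:

```python
def step(arbiter, recv_bufs, out_buf, next_recv):
    """One clock cycle of a write-back node pair (behavioral sketch)."""
    # forward from the output buffer to the next-level receiving buffer when it is empty
    if next_recv["data"] is None and out_buf["data"] is not None:
        next_recv["data"], out_buf["data"] = out_buf["data"], None
    # load the output buffer from the receiving buffer picked by the arbiter's hot bit
    if out_buf["data"] is None:
        i = arbiter.index(1)
        if recv_bufs[i]["data"] is not None:
            out_buf["data"], recv_bufs[i]["data"] = recv_bufs[i]["data"], None

recv = [{"data": "r0"}, {"data": None}, {"data": None}, {"data": None}]
out = {"data": None}    # third output buffer unit
nxt = {"data": None}    # fifth receiving buffer unit of the second node
step([1, 0, 0, 0], recv, out, nxt)   # r0 moves into the output buffer
step([0, 1, 0, 0], recv, out, nxt)   # r0 is forwarded to the next level
```

Because each move is gated on the destination being empty, no datum is ever overwritten, which is the collision avoidance the patent attributes to the interconnected cache management units.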
Fig. 4 is a block diagram illustrating an input buffer unit according to some embodiments of the application. Referring to fig. 4, the input buffer module includes input buffer units 11, and the number of input buffer units 11 matches the number of receiving buffer units of the uppermost-level write-back nodes. Each input buffer unit 11 includes a buffer control unit 111, a third data buffer unit 112, and a second output buffer unit 113. The buffer control unit 111 is respectively connected to the control end of the computing unit (not shown), the third data buffer unit 112, and the cache management unit (not shown) of the corresponding write-back node; the input end of the third data buffer unit 112 is connected to the data output end of the computing unit; the third data buffer unit 112 is connected to the second output buffer unit 113; and the output end of the second output buffer unit 113 is connected to a first data buffer unit (not shown) in a receiving buffer unit of the uppermost-level write-back node. Specifically, the third data buffer unit 112 is a first-in first-out (First Input First Output, FIFO) memory.
In some embodiments, when the cache management unit of the uppermost-level write-back node feeds back 0 to the cache control unit, that is, the first data cache unit stores data, and the third data cache unit also stores data at this time, the cache control unit sends a non-empty signal to the computing unit so that the computing unit stops working. When the third data cache unit stores no data, the cache control unit takes no action, or the cache control unit sends an empty signal to the computing unit so that the computing unit immediately enters the working state. When the cache management unit of the uppermost-level write-back node feeds back 1 to the cache control unit, that is, no data is stored in the first data cache unit, and the third data cache unit stores data at this time, the second output cache unit reads the data from the third data cache unit.
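The three flow-control cases above can be sketched as one function; the string return values and container types are illustrative assumptions, not part of the patent:

```python
from collections import deque

def cache_control(downstream_mark, fifo, out_buf):
    """Decide what the cache control unit does in one cycle (behavioral sketch).
    downstream_mark: 1 if the downstream first data cache unit is empty, 0 if occupied."""
    if downstream_mark == 0 and fifo:
        return "stall"                    # non-empty signal: the computing unit stops
    if not fifo:
        return "run"                      # nothing buffered: the computing unit keeps working
    out_buf.append(fifo.popleft())        # downstream empty: drain the FIFO into the output buffer
    return "run"
```

This is classic backpressure: the computing unit is throttled exactly when both its local FIFO and the downstream receive buffer are occupied.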
In some embodiments, the number of write-back nodes at the previous level matches the number of receiving cache units of the write-back node at the next level.
In some embodiments, the write-back control module includes an address mapping unit. The data received by the write-back control module from the lowest-level write-back node includes computing unit address information and calculation result data; the address mapping unit calculates the write-back address from the computing unit address information and the start address information by address mapping, and transmits the write-back address together with the calculation result data over the bus to the block random access memory (Block Random Access Memory, BRAM) of the neural network accelerator.
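The patent states only that the write-back address is computed from the computing unit address information and a start address; a linear layout such as the one below is one plausible mapping, offered purely as an illustrative assumption:

```python
def map_write_back_address(start_addr, unit_addr, result_offset, unit_stride):
    """Hypothetical address mapping: each computing unit owns a fixed-size
    region of unit_stride words beginning at start_addr."""
    return start_addr + unit_addr * unit_stride + result_offset
```

For example, with a start address of 0x1000 and a stride of 256 words per computing unit, the result at offset 4 from computing unit 2 would land at 0x1204.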
In some embodiments, the second output buffer unit, the first data buffer unit, and the second data buffer unit in the present application are random access memories (Random Access Memory, RAM).
FIG. 5 is a block diagram of a convolutional neural network accelerator data write-back system in accordance with some embodiments of the present application. Referring to fig. 5, the data write-back system 100 of the convolutional neural network accelerator includes an input buffer module 10, two levels of write-back nodes 20, and a write-back control module 30. The two levels of write-back nodes 20 include first-level write-back nodes 21 and a second-level write-back node 22, where the first-level write-back nodes 21 are the level above the second-level write-back node 22. The input buffer module 10 is connected with the first-level write-back nodes 21, the first-level write-back nodes 21 are connected with the second-level write-back node 22, the second-level write-back node 22 is connected with the write-back control module 30, and the write-back control module 30 is connected with a bus (not labeled in the figure).
Referring to fig. 5, the input buffer module 10 includes 16 input buffer units 11, and the 16 input buffer units 11 are connected to 16 computing units (not labeled in the figure) in a one-to-one correspondence manner, so as to receive data from the corresponding computing units.
Referring to fig. 5, the first-level write-back nodes 21 comprise 4 write-back nodes and the second-level write-back node 22 comprises 1 write-back node; each of these write-back nodes includes 4 receiving buffer units 211, 1 arbitration unit 212, 1 selection unit 213, 1 cache management unit 214, and 1 first output buffer unit 215. In the first-level write-back nodes 21, the input ends of the receiving buffer units 211 are connected with the input buffer units 11 in one-to-one correspondence. Within the same write-back node, the output ends of the 4 receiving buffer units 211 are respectively connected with the 4 input ends of the selection unit 213, the output end of the arbitration unit 212 is connected with the control end of the selection unit 213, and the cache management unit 214 is connected with the 4 receiving buffer units 211 and the first output buffer unit 215.
Referring to fig. 5, the output ends of the first output buffer units 215 of the 4 write-back nodes in the first-level write-back nodes 21 are respectively connected with the input ends of the 4 receiving buffer units 211 of the second-level write-back node 22; the cache management units 214 of the 4 first-level write-back nodes are all connected with the cache management unit 214 of the second-level write-back node 22.
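The Fig. 5 topology (16 input buffer units, 4 first-level nodes with 4 receiving buffers each, and 1 second-level node) can be checked with a small constructor; the function name and the list encoding of the tree are assumptions made for illustration:

```python
def build_tree(n_inputs=16, fan_in=4):
    """Group input buffer units into first-level write-back nodes, then feed
    every first-level output into a single second-level node."""
    level1 = [list(range(i, i + fan_in)) for i in range(0, n_inputs, fan_in)]
    level2 = [list(range(len(level1)))]   # one node, one receive buffer per level-1 node
    return level1, level2

level1, level2 = build_tree()
# 4 first-level nodes of 4 inputs each, funneled into 1 second-level node
```

The same constructor generalizes to deeper trees: repeatedly grouping nodes by `fan_in` yields the N-level structure claimed in the patent, with exactly one node at the lowest level driving the bus.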
While embodiments of the present application have been described in detail hereinabove, it will be apparent to those skilled in the art that various modifications and variations can be made to these embodiments. It is to be understood that such modifications and variations are within the scope and spirit of the present application as set forth in the following claims. Moreover, the application described herein is capable of other embodiments and of being practiced or of being carried out in various ways.

Claims (7)

1. A data write-back system of a convolutional neural network accelerator, comprising:
the input buffer module is used for being connected with the computing unit to receive data;
N-level write-back nodes, wherein the uppermost-level write-back nodes are connected with the input buffer module, each next-level write-back node is connected with at least two previous-level write-back nodes, and N is a natural number greater than 1;
the write-back control module is connected with the write-back node of the lowest stage, and is used for receiving data from the write-back node of the lowest stage and transmitting the data to the bus;
the write-back node comprises a first output buffer unit, a selection unit, at least two receiving buffer units, an arbitration unit and a buffer management unit, wherein the output end of the receiving buffer unit is connected with the input end of the selection unit, the output end of the selection unit is connected with the input end of the first output buffer unit, the arbitration unit is connected with the selection unit, and the buffer management unit is respectively connected with the receiving buffer units and the first output buffer units;
the number of the write-back nodes at the upper stage is matched with the number of the receiving cache units of the write-back nodes at the lower stage;
the write-back control module comprises an address mapping unit, the data received by the write-back control module from the lowest-level write-back node comprises computing unit address information and calculation result data, and the address mapping unit calculates the write-back address from the computing unit address information and the start address information.
2. The data write-back system of a convolutional neural network accelerator of claim 1, wherein the receive buffer unit comprises a first buffer status unit and a first data buffer unit that are connected to each other, the first buffer status unit being connected to the buffer management unit.
3. The data write-back system of a convolutional neural network accelerator of claim 1, wherein the first output buffer unit comprises a second buffer status unit and a second data buffer unit that are connected to each other, the second buffer status unit being connected to the buffer management unit.
4. A data write-back system of a convolutional neural network accelerator according to any one of claims 1, 2 or 3, wherein the cache management units of the write-back nodes that are connected to each other are themselves connected to each other.
5. The data write-back system of a convolutional neural network accelerator of claim 1, wherein the input buffer module comprises input buffer cells, the number of input buffer cells being adapted to the number of receive buffer cells of the write-back node of the uppermost level.
6. The data write-back system of a convolutional neural network accelerator of claim 5, wherein the input buffer unit comprises a buffer control unit, a third data buffer unit, and a second output buffer unit, the buffer control unit is respectively connected with the computing unit, the third data buffer unit, and the buffer management unit of the corresponding write-back node, and the third data buffer unit is connected with the second output buffer unit.
7. The data write-back system of a convolutional neural network accelerator of claim 1, wherein the number of write-back nodes at the lowest level is 1.
CN202011527851.3A 2020-12-22 2020-12-22 Data write-back system of convolutional neural network accelerator Active CN112597079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011527851.3A CN112597079B (en) 2020-12-22 2020-12-22 Data write-back system of convolutional neural network accelerator


Publications (2)

Publication Number Publication Date
CN112597079A CN112597079A (en) 2021-04-02
CN112597079B true CN112597079B (en) 2023-10-17

Family

ID=75199931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011527851.3A Active CN112597079B (en) 2020-12-22 2020-12-22 Data write-back system of convolutional neural network accelerator

Country Status (1)

Country Link
CN (1) CN112597079B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08123725A (en) * 1994-10-20 1996-05-17 Hitachi Ltd Write-back type cache system
US5924115A (en) * 1996-03-29 1999-07-13 Interval Research Corporation Hierarchical memory architecture for a programmable integrated circuit having an interconnect structure connected in a tree configuration
JP2006072832A (en) * 2004-09-03 2006-03-16 Nec Access Technica Ltd Image processing system
CN101430664A * 2008-09-12 2009-05-13 Institute of Computing Technology, Chinese Academy of Sciences Multiprocessor system and cache coherence message transmission method
CN107329734A * 2016-04-29 2017-11-07 Beijing Zhongke Cambricon Technology Co., Ltd. Apparatus and method for performing a convolutional neural network forward operation
CN109739696A * 2018-12-13 2019-05-10 Beijing Institute of Computer Technology and Applications Cache acceleration method for solid state disks in a dual-controller storage array
CN110516801A * 2019-08-05 2019-11-29 Xi'an Jiaotong University High-throughput dynamically reconfigurable convolutional neural network accelerator architecture
CN111126584A * 2019-12-25 2020-05-08 Shanghai Anlu Information Technology Co.,Ltd. Data write-back system
CN112100097A * 2020-11-17 2020-12-18 Hangzhou Changchuan Technology Co., Ltd. Multi-test-channel priority adaptive arbitration method and memory access controller

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412881B2 (en) * 2009-12-22 2013-04-02 Intel Corporation Modified B+ tree to store NAND memory indirection maps
US10725934B2 (en) * 2015-10-08 2020-07-28 Shanghai Zhaoxin Semiconductor Co., Ltd. Processor with selective data storage (of accelerator) operable as either victim cache data storage or accelerator memory and having victim cache tags in lower level cache wherein evicted cache line is stored in said data storage when said data storage is in a first mode and said cache line is stored in system memory rather then said data store when said data storage is in a second mode
US10430706B2 (en) * 2016-12-01 2019-10-01 Via Alliance Semiconductor Co., Ltd. Processor with memory array operable as either last level cache slice or neural network unit memory


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Write-back design for distributed registers of a digital signal processor; Shao Zheng; Xie Jing; Wang Qin; Mao Zhigang; Microelectronics & Computer (No. 7); pp. 30-33 *

Also Published As

Publication number Publication date
CN112597079A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN101667451B (en) Data buffer of high-speed data exchange interface and data buffer control method thereof
CN102402611B (en) Method for quickly searching keywords and reading lists by using ternary content addressable memory (TCAM)
CN1997987A (en) An apparatus and method for packet coalescing within interconnection network routers
CN101692651A (en) Method and device for Hash lookup table
CN102130833A Memory management method and system for traffic-management-chip linked lists in a high-speed router
CN106534368B Packet transmission and reception method and system for an automobile CAN-bus gateway
CN104065588B Device and method for data packet scheduling and caching
CN101771598A (en) Communication dispatching method of real-time Ethernet
CN112597079B (en) Data write-back system of convolutional neural network accelerator
CN102736888B Data retrieval circuit with data stream synchronization
CN102622323A (en) Data transmission management method based on switch matrix in dynamic configurable serial bus
CN1568464A (en) Tagging and arbitration mechanism in an input/output node of a computer system
CN102999443A (en) Management method of computer cache system
CN100512218C (en) Transmitting method for data message
CN103077198A (en) Operation system and file cache positioning method thereof
CN100387027C Packet preprocessing circuit assembly of an interface card for high-speed network traffic diversion equipment
CN111126584B (en) Data write-back system
CN112242963A (en) Rapid high-concurrency neural pulse data packet distribution and transmission method
CN100396044C ATM switching device with dynamic buffer memory management and switching method thereof
CN105049377B (en) AFDX exchange datas bus structures and method for interchanging data based on Crossbar frameworks
CN103501251A (en) Method and device for processing data packet under offline condition
CN105162725B Method and device for preprocessing message addresses in a protocol processing pipeline
CN110764733B (en) Multi-distribution random number generation device based on FPGA
CN111857817B (en) Data reading method, data reading device and data reading system
CN101185056B (en) Data pipeline management system and method for using the system

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 200434 Room 202, building 5, No. 500, Memorial Road, Hongkou District, Shanghai

Applicant after: Shanghai Anlu Information Technology Co.,Ltd.

Address before: Room 501-504, building 9, Pudong Software Park, 498 GuoShouJing Road, Pudong New Area, Shanghai, 201203

Applicant before: ANLOGIC INFOTECH Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant