WO2021168944A1 - Data caching circuit and method - Google Patents
Data caching circuit and method
- Publication number
- WO2021168944A1 (PCT/CN2020/080318)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- row
- register
- rows
- buffer
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- The present disclosure relates to data caching, and in particular to data caching used for neural network calculations.
- Neural networks are at the core of artificial intelligence technology. They have received extensive research attention and are used in many artificial intelligence applications, including computer vision, speech recognition, robotics, and autonomous driving.
- The number of layers in a neural network is often very large, sometimes reaching thousands, so the amount of input data and intermediate data of the neural network is also very large. Data caching therefore becomes a bottleneck for the speed and energy efficiency of neural network computation.
- According to one aspect of the present disclosure, a data caching circuit is configured to cache data in a feature map used for calculation by a neural network, wherein the size of the convolution kernel of the neural network is K*K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, K is a positive integer, and S is a positive integer. The circuit includes a buffer comprising K cache units, wherein each cache unit is configured to store a plurality of rows of the feature map, the plurality of rows including a corresponding row in every K rows of the feature map.
- According to another aspect of the present disclosure, a data caching method stores data in a feature map used for calculation by a neural network in a buffer, wherein the size of the convolution kernel of the neural network is K*K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, the buffer includes K cache units, K is a positive integer, and S is a positive integer. The method includes: in each cache unit, storing a plurality of rows of the feature map, the plurality of rows including a corresponding row in every K rows of the feature map.
- FIG. 1 is a schematic diagram showing the calculation of a convolutional layer in a convolutional neural network according to an exemplary embodiment
- FIGS. 2a and 2b are schematic diagrams showing that the window corresponding to the convolution kernel slides in the feature map according to an exemplary embodiment
- Fig. 3 is a structural block diagram showing a system for calculation of a neural network according to an exemplary embodiment
- FIG. 4 is a block diagram showing the structure of a data caching circuit according to the first exemplary embodiment of the present disclosure
- FIG. 5 is a block diagram showing the structure of a data caching circuit according to a second exemplary embodiment of the present disclosure
- FIG. 6 is a schematic diagram showing a buffer according to a second exemplary embodiment of the present disclosure.
- FIGS. 7a and 7b are schematic diagrams showing the data read mode and the data shift mode of a register group according to the second exemplary embodiment of the present disclosure;
- FIGS. 8a-8e are schematic diagrams showing example operations of the data caching circuit when the window of the convolution kernel of the neural network slides within a row according to the second exemplary embodiment of the present disclosure;
- FIGS. 9a-9e are schematic diagrams showing example operations of the data caching circuit when the window of the convolution kernel of the neural network slides between rows according to the second exemplary embodiment of the present disclosure;
- FIG. 10 is a flowchart showing a data caching method according to an exemplary embodiment
- FIG. 11 is a flowchart showing a data caching method according to an exemplary embodiment
- FIG. 12 is a flowchart illustrating a data caching method according to an exemplary embodiment.
- The use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of these elements; such terms are only used to distinguish one element from another.
- In some examples, the first element and the second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
- According to some embodiments, the neural network used may be a deep neural network (DNN).
- A deep neural network includes an input layer, several hidden layers (intermediate layers), and an output layer.
- The input layer receives input data (for example, image pixel data, audio amplitude data, etc.), preprocesses the input data (for example, by de-meaning, normalization, or principal component analysis (PCA) dimensionality reduction), and passes the preprocessed data to a hidden layer.
- Each of the hidden layers receives data from the previous layer, performs calculations on the received data, and then passes the result to the next layer. A hidden layer may be, for example, a convolutional layer or a pooling layer.
- the output layer receives data from the last hidden layer, performs calculations on the received data, and then outputs the calculation results.
- the output layer may be, for example, a fully connected layer.
- A convolutional neural network (CNN) is a deep neural network in which the hidden layers include at least one convolutional layer.
- FIG. 1 is a schematic diagram illustrating calculation of a convolutional layer in a convolutional neural network according to an exemplary embodiment. As shown in FIG. 1, the feature map 101 and the convolution kernel 102 are subjected to convolution calculation to obtain an output matrix 103.
- the feature map 101 is a three-dimensional matrix with a height H, a width W, and the number of channels InCh.
- the three-dimensional matrix is composed of InCh layers with a height H and a width W.
- H, W and InCh are positive integers respectively, and H and W may be the same or different.
- the feature map in Fig. 1 is a three-dimensional matrix with a height of 5, a width of 5, and a number of channels of 3.
- FIG. 1 is only exemplary, and the height, width, and number of channels of the feature map are not limited thereto.
- the feature map is data input to the convolutional layer by the input layer or the previous hidden layer.
- In the present disclosure, each group of data along the width direction of the three-dimensional matrix is called a row of the three-dimensional matrix, and an address along the width direction is called a column address; each group of data along the height direction of the three-dimensional matrix is called a column of the three-dimensional matrix, and an address along the height direction is called a row address.
- each group of data along the height direction in the three-dimensional matrix may also be referred to as a row of the three-dimensional matrix, and each group of data along the width direction in the three-dimensional matrix may be referred to as a column of the three-dimensional matrix.
- The row addresses and column addresses in the three-dimensional matrix start from address 0: row address i denotes the i-th row, and column address j denotes the j-th column.
- A two-dimensional address in the three-dimensional matrix is expressed as (row address, column address); for example, the two-dimensional address of the data whose row address is i and column address is j is (i, j).
- the convolution kernel 102 is a three-dimensional matrix with a height of K, a width of K, and a channel number of InCh.
- the number of channels of the convolution kernel 102 should be the same as the number of channels of the feature map 101 .
- the convolution kernel in Fig. 1 is a three-dimensional matrix with a height of 3, a width of 3, and a number of channels of 3.
- FIG. 1 is only exemplary, and the height, width, and number of channels of the convolution kernel are not limited thereto.
- FIG. 1 only shows one convolution kernel, it should be understood that FIG. 1 is only exemplary, and the number of convolution kernels in the convolutional neural network is not limited to this.
- The present disclosure uses (height × width) to describe the size of the feature map and the convolution kernel.
- The size of the feature map in FIG. 1 is 5×5 data.
- The size of the convolution kernel is 3×3 data.
- the window corresponding to the convolution kernel slides along the height or width direction with a step size S in the feature map, where the step size S is a positive integer, and S is less than K. In some embodiments, S may be 1. In other embodiments, S may be greater than one.
- the three-dimensional matrix of the data in the feature map corresponding to the window is convolved with the convolution kernel 102 to obtain each element in the output matrix 103.
- The convolution of the matrix corresponding to the window, i.e., the window-corresponding matrix 101a, with the convolution kernel 102 is performed as follows: each element of the window-corresponding matrix 101a is multiplied by the element at the corresponding position in the convolution kernel 102, and all the products are added together to obtain the calculation result 103a in the output matrix 103.
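- As an illustration of the calculation described above, the following is a minimal software sketch (not the hardware circuit of the present disclosure) that computes each output element as the sum of element-wise products between the window-corresponding matrix and the convolution kernel; the array contents and sizes are hypothetical examples.

```python
import numpy as np

# Hypothetical sizes matching FIG. 1: 5x5 feature map, 3 channels, 3x3 kernel, stride 1.
H, W, InCh = 5, 5, 3
K, S = 3, 1

feature_map = np.random.rand(H, W, InCh)   # illustrative input data
kernel = np.random.rand(K, K, InCh)        # one convolution kernel

def output_element(row, col):
    """Convolve the window whose top-left corner is at (row, col) with the kernel."""
    window = feature_map[row:row + K, col:col + K, :]   # window-corresponding matrix
    return np.sum(window * kernel)                      # multiply-accumulate

out_h = (H - K) // S + 1
out_w = (W - K) // S + 1
output = np.array([[output_element(i * S, j * S) for j in range(out_w)]
                   for i in range(out_h)])
print(output.shape)   # (3, 3) output matrix, as in FIG. 1
```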
- K rows of the feature map are selected, and the window slides in the row direction (i.e., along the width) within the K rows.
- Figure 2a shows a schematic diagram of the window sliding within the K rows.
- the window corresponding matrix is a three-dimensional matrix composed of data at window positions on all layers of the feature map.
- After the window has slid to the end of the current K rows, it ends sliding in the current K rows and starts to slide in reselected K rows.
- “the window has been slid to the end of the K rows” means that if the window continues to slide by the step size S, it will exceed the range of the feature map. In some cases, when the window has been slid such that the last column of the window corresponding matrix overlaps the last column of the feature map, the window has slid to the end of the K rows.
- Figure 2b shows a schematic diagram of the window sliding between rows. Similar to Figure 2a, Figure 2b is also a two-dimensional plane corresponding to height and width.
- Although FIGS. 2a and 2b show a window sliding step size of 1, it should be understood that they are only examples, and the step size of window sliding in the convolutional neural network is not limited thereto.
- FIG. 3 is a structural block diagram showing a system 300 for calculation of a neural network according to an exemplary embodiment.
- the computing system 300 includes a data buffer circuit 301 and a computing circuit 302.
- the data buffer circuit 301 buffers input data used for neural network calculations, and outputs the buffered data to the calculation circuit 302.
- The data cache circuit 301 caches the data of the feature map used for calculation by the neural network, and the calculation circuit 302 loads the data of the convolution kernel of the neural network.
- The data buffer circuit 301 sequentially outputs the data of the window-corresponding matrices to the calculation circuit 302, and the calculation circuit 302 performs the calculation on each received window-corresponding matrix and the loaded convolution kernel to obtain each calculation result in the output matrix.
- Since the data cache circuit 301 caches all the data of the feature map, it is desirable to reduce the storage space occupied by the feature map.
- The present disclosure reduces the storage space occupied by the feature map while simplifying the cache addressing logic in the data cache circuit 301.
- FIG. 4 is a block diagram showing the structure of a data caching circuit 400 according to the first exemplary embodiment of the present disclosure. As shown in FIG. 4, the circuit 400 includes a buffer 401 and a buffer controller 402.
- According to some embodiments, the window-corresponding matrices of all window positions are respectively stored in the buffer 401, and the buffer controller 402 controls the buffer 401 to output the current window-corresponding matrix.
- For example, the window-corresponding matrix at window position 1 and the window-corresponding matrix at window position 2 in FIG. 2a are both stored in the buffer 401.
- When the window is located at window position 1, the buffer 401 outputs the window-corresponding matrix at window position 1.
- When the window is located at window position 2, the buffer outputs the window-corresponding matrix at window position 2. Since the window-corresponding matrix at window position 1 partially overlaps the window-corresponding matrix at window position 2, the overlapping part is stored twice in the buffer 401. Therefore, although the addressing logic of the buffer 401 is relatively simple in this scheme, a large amount of data in the feature map is stored repeatedly, resulting in wasted storage space.
- According to other embodiments, the three-dimensional matrix corresponding to the feature map is stored in the buffer 401, and the buffer controller 402 controls the buffer 401 to sequentially output the data corresponding to each two-dimensional address within the current window position.
- For example, the data at addresses (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3) of the feature map are output sequentially.
- In this scheme, no data is stored repeatedly in the buffer 401.
- However, the addressing logic of the buffer 401 is more complicated in this scheme.
- FIG. 5 is a block diagram showing the structure of a data caching circuit 500 according to the second exemplary embodiment of the present disclosure.
- The circuit 500 is configured to cache data in a feature map used for calculation by the neural network, where the size of the convolution kernel of the neural network is K×K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, K is a positive integer, and S is a positive integer.
- The circuit 500 includes a buffer 501 comprising K cache units 5010-501(K-1), wherein each cache unit is configured to store a plurality of rows of the feature map, the plurality of rows including a corresponding row in every K rows of the feature map.
- Here, every K rows of the feature map means K consecutive rows of the feature map, for example, rows 0 through (K-1) of the feature map, or rows 1 through K of the feature map.
- The window-corresponding matrix consists of K consecutive columns within K consecutive rows of the feature map.
- Every K rows of the feature map are stored in K different cache units 5010-501(K-1), respectively. Therefore, only a column address needs to be provided to the cache units 5010-501(K-1) for them to output the data in one column of the K rows, without addressing every data in that column one by one, which simplifies the addressing logic.
- According to some embodiments, the rows of the feature map are the groups of data along the width direction of the feature map, and the columns of the feature map are the groups of data along the height direction of the feature map. According to other embodiments, the rows of the feature map are the groups of data along the height direction of the feature map, and the columns of the feature map are the groups of data along the width direction of the feature map.
- For each row of the feature map, the remainder obtained by dividing its row address by K corresponds to the sequence number of the cache unit that stores that row.
- FIG. 6 shows data stored in the buffer unit 5010-501 (K-1) in the buffer 501 according to some embodiments.
- For the 0th row of the feature map, since the remainder obtained by dividing 0 by K is 0, the 0th row is stored in cache unit 5010; for the Kth row of the feature map, since the remainder obtained by dividing K by K is also 0, the Kth row is likewise stored in cache unit 5010.
- Similarly, the 1st row and the (K+1)th row of the feature map are stored in cache unit 5011, ..., and the (K-1)th row and the (2K-1)th row of the feature map are stored in cache unit 501(K-1).
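- The row-to-cache-unit mapping described above can be sketched in software as follows; this is a minimal illustration under the assumption that a row with row address i is assigned to cache unit number (i mod K), and the class and variable names are hypothetical rather than taken from the disclosure.

```python
class FeatureMapBuffer:
    """Illustrative model of buffer 501: cache unit k holds rows k, k+K, k+2K, ..."""
    def __init__(self, num_units):
        self.K = num_units
        self.units = [[] for _ in range(num_units)]

    def store_row(self, row_address, row_data):
        self.units[row_address % self.K].append(row_data)

    def unit_for_row(self, row_address):
        return row_address % self.K

buf = FeatureMapBuffer(num_units=3)            # K = 3
for i in range(5):                             # a feature map with 5 rows
    buf.store_row(i, row_data=f"row{i}")
print([buf.unit_for_row(i) for i in range(5)]) # [0, 1, 2, 0, 1]
```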
- the window corresponding to the convolution kernel slides in the width direction in the continuous K rows of the feature map.
- If the row address of the first of the K consecutive rows is i, the row addresses of the K consecutive rows are i, i+1, ..., i+(K-1). The remainders obtained by dividing these row addresses by K are therefore, for example, 0, 1, ..., (K-1), or, for example, q, q+1, ..., (K-1), 0, 1, ..., (q-1), where q is the remainder of i divided by K. Since the remainder of dividing a row address by K corresponds to the sequence number of the cache unit storing that row, the K consecutive rows of the feature map are stored in the K different cache units, respectively.
- the capacity of the cache unit is designed according to the size of the feature map.
- A feature map with height H, width W, and channel number InCh has H rows, and each row contains W data, i.e., (W*InCh) values. Suppose the integer quotient of H divided by K is M; then M rows of the feature map, i.e., (M*W*InCh) data values, are stored in cache unit 5010.
- Accordingly, the capacity of each cache unit should be designed to be sufficient to store (M*W*InCh) data values.
- The circuit 500 further includes K register groups 5020-502(K-1), each register group being configured to receive data from a corresponding cache unit and to output the stored data to the calculation circuit.
- The register groups 5020-502(K-1) output the data of the current window-corresponding matrix to the calculation circuit, wherein each register group outputs the data of the corresponding row of the window-corresponding matrix.
- For example, register group 5020 outputs row 0 of the window-corresponding matrix, register group 5021 outputs row 1, ..., and register group 502(K-1) outputs row (K-1).
- The circuit 500 further includes a cache controller 503 configured to: select K consecutive rows in the feature map; control the buffer 501 to output the data in the matrix corresponding to the window; slide the window in the row direction within the K rows; and after each slide of the window, control the buffer 501 to output the last S columns of the matrix corresponding to the window.
- When the window starts to slide within the selected K rows, the window is located at columns 0 through (K-1) of the K rows, and the cache controller 503 controls the buffer 501 to output the data in columns 0 through (K-1) of the K rows.
- After each slide of the window, as described above with reference to FIG. 2a, the last (K-S) columns of the window-corresponding matrix before the slide overlap the first (K-S) columns of the window-corresponding matrix after the slide. Therefore, the cache controller 503 only needs to control the buffer 501 to output the data in the non-overlapping last S columns of the window-corresponding matrix.
- The cache controller 503 is further configured to: for each cache unit, control the cache unit to output the data in the corresponding row column by column according to the column address.
- The K consecutive rows of the feature map are stored in the K cache units, respectively. Therefore, for each cache unit, the cache controller 503 selects one row stored therein.
- The cache controller 503 controls all the cache units 5010-501(K-1) to output data of the same column address at the same time. Since each cache unit outputs one row of the selected K rows, when all the cache units 5010-501(K-1) output data of the same column address at the same time, the buffer 501 outputs one column of the selected K rows.
- When the window slides within the selected K rows, the buffer 501 outputs only the data of the non-overlapping last S columns of the window-corresponding matrices before and after the slide. Therefore, after each slide of the window, each cache unit continues from its current column address and outputs the last S columns of data of the window-corresponding matrix, without going back to the first column of the window-corresponding matrix. In other words, while the window slides within the selected K rows, each cache unit outputs the data of its selected row column by column in order of increasing column address, never returning to an address before the current column address.
- For a memory, when the data stored in it is read out of order, the addressing logic is complicated and the data reading speed is slow; when the data stored in it is output sequentially, the addressing logic is simple and the data reading speed is higher. Since each cache unit outputs the data of the selected row sequentially, the addressing logic is simplified and the data reading speed is improved.
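- The column-by-column, never-rewinding read order described above can be sketched as follows; this is a minimal illustration (names are hypothetical) showing that the first window of a band reads K full columns and every later slide reads only the S newly exposed columns, so the column addresses requested from the cache units are strictly increasing.

```python
def column_read_sequence(width_w, kernel_k, stride_s):
    """Column addresses the cache units are asked to output within one band of K rows."""
    for col in range(kernel_k):            # first window position: columns 0 .. K-1
        yield col
    left = stride_s
    while left + kernel_k <= width_w:      # each slide exposes S new columns on the right
        for col in range(left + kernel_k - stride_s, left + kernel_k):
            yield col
        left += stride_s

print(list(column_read_sequence(width_w=5, kernel_k=3, stride_s=1)))
# [0, 1, 2, 3, 4] -- strictly increasing, so the addressing never goes back
```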
- After the window slides to the end of the currently selected K rows, it ends sliding in those K rows and starts to slide in reselected K rows of the feature map, where the last (K-S) rows of the original K rows overlap the first (K-S) rows of the reselected K rows.
- The cache controller 503 is further configured to: select K consecutive rows starting from the first row address of the feature map; after the window slides to the end of the K rows, reselect the (S+1)th through Kth rows of the K rows together with the S rows following the K rows; and control the buffer to output the reselected K rows starting from the first column address.
- The cache controller 503 is further configured to: after the window slides to the end of the K rows, for each cache unit that outputs one of the 1st through Sth rows of the K rows, select the next row stored therein, and for each cache unit that outputs one of the (S+1)th through Kth rows of the K rows, keep the currently selected row; and control each cache unit to output its selected row starting from the first column address of that row.
- For example, when the window slides within rows 0 through (K-1) of the feature map, the cache units 5010-501(S-1) output the 1st through Sth rows of the K rows, i.e., rows 0 through (S-1) of the feature map, and the cache units 501S-501(K-1) output the (S+1)th through Kth rows of the K rows, i.e., rows S through (K-1) of the feature map.
- After the window slides to the end of the K rows, for cache unit 5010 the next row stored therein is selected, i.e., the Kth row of the feature map; for cache unit 5011 the (K+1)th row of the feature map is selected; ...; and for cache unit 501(S-1) the (K+S-1)th row of the feature map is selected.
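- The row-reselection rule in this example can be sketched as follows; it is a minimal illustration (identifiers are hypothetical) showing that when the band of K rows moves down by S, only the cache units that held the top S rows advance to new rows, while the others keep their currently selected rows.

```python
def selected_rows(band_start, kernel_k):
    """Rows selected in cache units 0..K-1 for the band [band_start, band_start + K)."""
    rows = {}
    for r in range(band_start, band_start + kernel_k):
        rows[r % kernel_k] = r              # cache unit (r mod K) holds row r
    return rows

K, S = 3, 1
print(selected_rows(0, K))   # {0: 0, 1: 1, 2: 2} -> band of rows 0..2
print(selected_rows(S, K))   # {1: 1, 2: 2, 0: 3} -> only unit 0 advances, to row 3
```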
- the circuit 500 further includes a multiplexer 504 configured to transfer data from each cache unit to a corresponding register group.
- each register group receives the data in the corresponding row in the corresponding matrix of the window and outputs it to the calculation circuit.
- Since the row selected by a cache unit may now correspond to a different row of the window-corresponding matrix, the data from that cache unit should be transferred to a different register group.
- The multiplexer 504 changes the correspondence between the cache unit that outputs data and the register group that receives data accordingly. For example, when the window slides within rows 0 through (K-1) of the feature map, cache unit 5010 outputs row 0 of the feature map, i.e., row 0 of the window-corresponding matrix; at this time, the multiplexer 504 transfers the data from cache unit 5010 to register group 5020. When the window slides within rows S through (K+S-1) of the feature map, cache unit 5010 outputs row K of the feature map, i.e., row (K-S) of the window-corresponding matrix; at this time, the multiplexer 504 transfers the data from cache unit 5010 to register group 502(K-S).
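- The multiplexer routing can be sketched as follows; this is a minimal illustration (function and variable names are hypothetical) of the rule that the cache unit holding feature-map row r must feed the register group for row (r - band_start) of the window-corresponding matrix.

```python
def register_group_for_unit(unit_index, band_start, kernel_k):
    """Register group that receives data from a cache unit while the window
    slides in rows [band_start, band_start + K) of the feature map."""
    for r in range(band_start, band_start + kernel_k):
        if r % kernel_k == unit_index:      # cache unit (r mod K) holds row r of the band
            return r - band_start           # its position in the band selects the register group

K, S = 3, 1
print([register_group_for_unit(u, 0, K) for u in range(K)])  # [0, 1, 2]
print([register_group_for_unit(u, S, K) for u in range(K)])  # [2, 0, 1] -> unit 0 now feeds group K-S
```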
- The buffer 501 includes a random access memory (RAM).
- Each register group 700 includes a write register 701 configured to receive data from the corresponding cache unit, and a calculation register 702 configured to receive data from the write register 701 and to output the registered data to the calculation circuit.
- When the register group 700 is in the data read mode, as shown in FIG. 7a, the write register 701 receives data from the corresponding cache unit, and the calculation register 702 outputs the registered data to the calculation circuit.
- When the register group 700 is in the data shift mode, the write register 701 shifts the registered data into the calculation register 702.
- the register group 700 is alternately in a data read mode and a data shift mode.
- the calculation register 702 outputs the corresponding row of the matrix corresponding to the current window to the calculation circuit.
- In the data read mode, the data registered in the calculation register 702 remains unchanged, and the write register 701 receives the non-overlapping part of the corresponding row between the window-corresponding matrices before and after the slide, i.e., the last S columns of the corresponding row of the window-corresponding matrix after the slide.
- In the data shift mode, the write register 701 shifts the data received in the data read mode into the calculation register 702, and the data in the calculation register 702 is updated to the data of the corresponding row of the window-corresponding matrix after the slide.
- When receiving data from the cache unit, the write register 701 only needs to receive the non-overlapping part of the corresponding row between the window-corresponding matrices before and after the slide, which reduces the accesses to the buffer and thus the access latency.
- The calculation register 702 includes K register units 7021-702K, and the last register unit 702K is configured to receive data from the write register 701, wherein, in response to receiving data from the write register 701, each of the last (K-1) register units 7022-702K shifts the data registered therein to the previous register unit.
- When the window starts to slide within K rows of the feature map, the write register 701 sequentially shifts the data of the corresponding row of the window-corresponding matrix into the register units 7021-702K in order of column address. Specifically, at the first moment, the write register 701 shifts the data of column 0 of the corresponding row into register unit 702K; at the second moment, the write register 701 shifts the data of column 1 of the corresponding row into register unit 702K, and register unit 702K shifts the data of column 0 into register unit 702(K-1); ...; at the Kth moment, the write register 701 shifts the data of column (K-1) of the corresponding row into register unit 702K, and the register units 7022-702K likewise shift the data registered therein to the previous register unit. At this point, register unit 7021 registers column 0 of the corresponding row, register unit 7022 registers column 1 of the corresponding row, ..., and register unit 702K registers column (K-1) of the corresponding row.
- After the window slides, the write register 701 shifts the data of the last S columns of the corresponding row of the window-corresponding matrix after the slide into register units 702(K-S+1)-702K, and the data originally registered in register units 702(S+1)-702K is shifted into register units 7021-702(K-S).
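- The write register / calculation register behaviour described above can be sketched as a software shift register; this is a minimal illustration under stated assumptions (class and method names are hypothetical), where the last register unit receives data from the write register and older data moves one unit toward the front on every shift.

```python
from collections import deque

class RegisterGroup:
    """One register group: a write register plus a K-deep calculation shift register."""
    def __init__(self, kernel_k):
        self.write_reg = None
        self.calc_reg = deque(maxlen=kernel_k)   # left end = unit 7021, right end = unit 702K

    def read_mode(self, value_from_cache_unit):
        # data read mode: latch new data while the calculation register keeps its contents
        self.write_reg = value_from_cache_unit

    def shift_mode(self):
        # data shift mode: push into the last unit; the oldest element falls out on the left
        self.calc_reg.append(self.write_reg)

    def output_row(self):
        return list(self.calc_reg)               # the row driven to the calculation circuit

rg = RegisterGroup(kernel_k=3)
for value in ["c0", "c1", "c2", "c3"]:           # column data arriving from the cache unit
    rg.read_mode(value)
    rg.shift_mode()
print(rg.output_row())   # ['c1', 'c2', 'c3'] -- the row after one slide with S = 1
```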
- In the case where the row output by any cache unit changes, the calculation register in each register group is cleared.
- When the window starts to slide in reselected K rows, S cache units change the rows they output, and the row output by each register group also changes. Therefore, the data of the rows output before the slide is cleared from the calculation registers so that the data of the rows output after the slide can be registered.
- each data in the feature map includes data with the same two-dimensional address on all channels, where the two-dimensional address includes the row address and column address of each data.
- According to some embodiments, the calculation circuit is a vector-matrix multiplication calculation circuit or a compute-in-memory (storage-calculation integrated) circuit.
- FIGS. 8a-8e are schematic diagrams showing example operations of the data caching circuit when the window of the convolution kernel of the neural network slides within a row, according to the second exemplary embodiment of the present disclosure.
- The size of the feature map is 5×5 data.
- The size of the convolution kernel is 3×3 data.
- the step size of window sliding is 1.
- Cache unit 8010 outputs row 0 of the feature map, cache unit 8011 outputs row 1 of the feature map, and cache unit 8012 outputs row 2 of the feature map.
- Register group 8020 outputs row 0 of the window-corresponding matrix and therefore receives data from cache unit 8010; register group 8021 outputs row 1 of the window-corresponding matrix and therefore receives data from cache unit 8011; and register group 8022 outputs row 2 of the window-corresponding matrix and therefore receives data from cache unit 8012.
- the register groups 8020-8022 are in data shift mode.
- The write register registers the data received from the corresponding cache unit in the previous data read mode, i.e., the 0th column of the corresponding row of the feature map. At this time, the write register shifts the 0th column of the corresponding row of the feature map registered therein into the calculation register.
- The window has slid from window position 1 to window position 2, and window position 2 corresponds to columns 1 to 3 of rows 0-2 of the feature map.
- The register groups 8020-8022 are in the data read mode.
- The calculation register outputs the corresponding row of the window-corresponding matrix at window position 1, and the write register receives the data in the 3rd column of the corresponding row of the feature map.
- the window is located at window position 2.
- the register groups 8020-8022 are all in the data shift mode.
- The write register shifts the data in the 3rd column of the corresponding row of the feature map into the calculation register, and the register units in the calculation register also sequentially shift the data registered therein to the previous register unit.
- The data registered in register groups 8020-8022 is thereby updated to the data of the window-corresponding matrix at window position 2.
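- The in-row sliding example of FIGS. 8a-8e can be checked end to end with the following minimal sketch (single channel, hypothetical values): the buffer stores rows in K cache units, and after the initial fill only S new columns per slide are fetched, yet the full K*K window is always available.

```python
import numpy as np

H, W, K, S = 5, 5, 3, 1
fmap = np.arange(H * W).reshape(H, W)            # hypothetical 5x5 feature map, one channel
# cache unit u holds rows r with r % K == u, stored here as plain lists
cache_units = {u: [fmap[r].tolist() for r in range(H) if r % K == u] for u in range(K)}

# the window slides within the band of rows 0..K-1
rows = [cache_units[r % K][r // K] for r in range(K)]   # selected rows 0, 1, 2
window = [row[0:K] for row in rows]                     # window position 1
print(window)   # columns 0..2 of rows 0..2

# one slide to the right: fetch only the S new columns from each cache unit
for i, row in enumerate(rows):
    window[i] = window[i][S:] + row[K:K + S]
print(window)   # columns 1..3 of rows 0..2, i.e. window position 2
```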
- FIGS. 9a-9e are schematic diagrams showing example operations of the data caching circuit when the window of the convolution kernel of the neural network slides between rows according to the second exemplary embodiment of the present disclosure.
- At window position 3, the window ends sliding in rows 0-2 of the feature map; at window position 4, the window starts sliding in rows 1-3.
- The window is located at window position 3, which corresponds to columns 2 to 4 of rows 0-2 of the feature map.
- the register group 8020 receives data from the buffer unit 8010; the register group 8021 receives data from the buffer unit 8011; and the register group 8022 receives data from the buffer unit 8012.
- the window has been slid from window position 3 to window position 4.
- Window position 4 corresponds to columns 0-2 of rows 1-3 of the feature map.
- At this time, cache unit 8010 outputs row 3 of the feature map, cache unit 8011 outputs row 1, and cache unit 8012 outputs row 2.
- Register group 8020 outputs row 0 of the window-corresponding matrix and therefore receives data from cache unit 8011; register group 8021 outputs row 1 of the window-corresponding matrix and therefore receives data from cache unit 8012; and register group 8022 outputs row 2 of the window-corresponding matrix and therefore receives data from cache unit 8010.
- In FIG. 9b, since the row output by cache unit 8010 changes from row 0 of the feature map to row 3 of the feature map, the data registered in the calculation registers of register groups 8020-8022 is cleared.
- FIG. 9b shows the 0th column of rows 1-3 of the feature map being shifted into the calculation registers of register groups 8020-8022; FIG. 9c shows the 1st column of rows 1-3 of the feature map being shifted into the calculation registers; and FIG. 9d shows the 2nd column of rows 1-3 of the feature map being shifted into the calculation registers.
- The calculation registers in register groups 8020-8022 then output the window-corresponding matrix at window position 4.
- FIG. 10 is a flowchart showing a data caching method of an exemplary embodiment of the present disclosure.
- This method stores the data of a feature map used for calculation by a neural network in a buffer, where the size of the convolution kernel of the neural network is K*K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, the buffer includes K cache units, K is a positive integer, and S is a positive integer.
- In each cache unit, a plurality of rows of the feature map are stored, the plurality of rows including a corresponding row in every K rows of the feature map.
- For each row of the feature map, the remainder obtained by dividing the row address by K corresponds to the sequence number of the cache unit storing that row of the feature map.
- FIG. 11 is a flowchart showing a data caching method of an exemplary embodiment of the present disclosure.
- This method stores the data of a feature map used for calculation by a neural network in a buffer, where the size of the convolution kernel of the neural network is K*K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, the buffer includes K cache units, K is a positive integer, and S is a positive integer.
- In each cache unit, a plurality of rows of the feature map are stored, the plurality of rows including a corresponding row in every K rows of the feature map.
- In step S1103, for each of the K register groups, data from the corresponding cache unit is received, and the stored data is output to the calculation circuit.
- FIG. 12 is a flowchart showing a data caching method of an exemplary embodiment of the present disclosure.
- This method stores the data of a feature map used for calculation by a neural network in a buffer, where the size of the convolution kernel of the neural network is K*K data, the window corresponding to the convolution kernel slides in the feature map with a step size S, the buffer includes K cache units, K is a positive integer, and S is a positive integer.
- In each cache unit, a plurality of rows of the feature map are stored, the plurality of rows including a corresponding row in every K rows of the feature map.
- The cache controller selects K consecutive rows in the feature map.
- The cache controller controls the buffer to output the data in the matrix corresponding to the window.
- The cache controller slides the window in the row direction within the K rows.
- After each slide of the window, the cache controller controls the buffer to output the last S columns of the matrix corresponding to the window.
- Each of the K rows corresponds to a respective row in a respective one of the K cache units, and for each cache unit, the cache controller controls the cache unit to output the data in the corresponding row column by column according to the column address.
- The cache controller selects K consecutive rows starting from the first row address of the feature map. After the window slides to the end of the K rows, the cache controller reselects the (S+1)th through Kth rows of the K rows together with the S rows following the K rows; for each cache unit that outputs one of the 1st through Sth rows of the K rows, the cache controller selects the next row stored therein, and for each cache unit that outputs one of the (S+1)th through Kth rows, the cache controller keeps the currently selected row. The cache controller then controls the buffer to output the reselected K rows starting from the first column address, i.e., controls each cache unit to output its selected row starting from the first column address of that row.
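- The overall control flow of the method can be sketched as follows; this is a minimal, purely illustrative generator (names are hypothetical) that emits, for each window position, the selected band of K rows and the column addresses the buffer is asked to output, reflecting that only the last S columns are read after each in-row slide and that the band advances by S rows between bands.

```python
def cache_control_steps(height_h, width_w, kernel_k, stride_s):
    """Yield (selected rows, column addresses) for each window position."""
    band_start = 0
    while band_start + kernel_k <= height_h:
        rows = list(range(band_start, band_start + kernel_k))
        col = 0
        yield rows, list(range(col, col + kernel_k))       # first window of the band: K full columns
        while col + stride_s + kernel_k <= width_w:
            col += stride_s
            yield rows, list(range(col + kernel_k - stride_s, col + kernel_k))  # only S new columns
        band_start += stride_s                              # drop the top S rows, take the next S rows

for rows, cols in cache_control_steps(5, 5, 3, 1):
    print(rows, cols)
# rows [0, 1, 2]: cols [0, 1, 2], [3], [4]; then rows [1, 2, 3] and [2, 3, 4] similarly
```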
Claims (28)
- A data caching circuit, the circuit being configured to cache data in a feature map used for calculation by a neural network, wherein the size of the convolution kernel of the neural network is K*K data, a window corresponding to the convolution kernel slides in the feature map with a step size S, K is a positive integer, and S is a positive integer, the circuit comprising: a buffer, the buffer comprising K cache units, wherein each cache unit is configured to respectively store a plurality of rows of the feature map, the plurality of rows including a corresponding row in every K rows of the feature map.
- The circuit of claim 1, wherein, for each row of the feature map, the remainder obtained by dividing the row address by K corresponds to the sequence number of the cache unit storing that row of the feature map.
- The circuit of claim 1, further comprising: K register groups, wherein each register group is configured to receive data from the corresponding cache unit and to output the stored data to a calculation circuit.
- The circuit of claim 1, further comprising a cache controller configured to: select K consecutive rows in the feature map; control the buffer to output the data in the matrix corresponding to the window; slide the window in the row direction within the K rows; and after the window slides, control the buffer to output the last S columns of the matrix corresponding to the window.
- The circuit of claim 4, wherein each of the K rows corresponds to a respective row in a respective one of the K cache units, and the cache controller is further configured to: for each cache unit, control the cache unit to output the data in the respective row column by column according to the column address.
- The circuit of claim 5, wherein the cache controller is further configured to: select K consecutive rows starting from the first row address of the feature map; after the window slides to the end of the K rows, reselect the (S+1)th through Kth rows of the K rows and the S rows following the K rows; and control the buffer to output the reselected K rows starting from the first column address.
- The circuit of claim 5, wherein the cache controller is further configured to: after the window slides to the end of the K rows, for each cache unit that outputs one of the 1st through Sth rows of the K rows, select the next row stored therein, and for each cache unit that outputs one of the (S+1)th through Kth rows of the K rows, keep the currently selected row; and control each cache unit to output the selected row starting from the first column address of the selected row.
- The circuit of claim 3, further comprising: a multiplexer configured to transfer the data from each cache unit to the corresponding register group.
- The circuit of claim 3, wherein each register group comprises: a write register configured to receive data from the corresponding cache unit; and a calculation register configured to receive data from the write register and to output the registered data to the calculation circuit.
- The circuit of claim 9, wherein each register group is configured such that: when the register group is in a data read mode, the write register receives data from the corresponding cache unit, and the calculation register outputs the registered data to the calculation circuit; and when the register group is in a data shift mode, the write register shifts the registered data into the calculation register.
- The circuit of claim 9, wherein the calculation register comprises: K register units, the last of the K register units being configured to receive data from the write register, wherein, in response to receiving the data from the write register, each of the last (K-1) of the K register units shifts the data registered therein to the previous register unit.
- The circuit of claim 9, wherein, in the case where the row output by any cache unit changes, the calculation register in each register group is cleared.
- The circuit of claim 1, wherein each data in the feature map includes the data having the same two-dimensional address on all channels, wherein the two-dimensional address of each data includes the row address and the column address of that data.
- The circuit of claim 1, wherein the buffer comprises a random access memory (RAM).
- The circuit of claim 3, wherein the calculation circuit is a vector-matrix multiplication calculation circuit or a compute-in-memory (storage-calculation integrated) circuit.
- A data caching method, the method storing data in a feature map used for calculation by a neural network in a buffer, wherein the size of the convolution kernel of the neural network is K*K data, a window corresponding to the convolution kernel slides in the feature map with a step size S, the buffer comprises K cache units, K is a positive integer, and S is a positive integer, the method comprising: in each cache unit, storing a plurality of rows of the feature map, the plurality of rows including a corresponding row in every K rows of the feature map.
- The method of claim 16, wherein, for each row of the feature map, the remainder obtained by dividing the row address by K corresponds to the sequence number of the cache unit storing that row of the feature map.
- The method of claim 16, further comprising: for each of K register groups, receiving data from the corresponding cache unit, and outputting the stored data to a calculation circuit.
- The method of claim 16, further comprising: selecting, by a cache controller, K consecutive rows in the feature map; controlling, by the cache controller, the buffer to output the data in the matrix corresponding to the window; sliding, by the cache controller, the window in the row direction within the K rows; and after each slide of the window, controlling, by the cache controller, the buffer to output the last S columns of the matrix corresponding to the window.
- The method of claim 19, wherein each of the K rows corresponds to a respective row in a respective one of the K cache units, the method further comprising: for each cache unit, controlling, by the cache controller, the cache unit to output the data in the respective row column by column according to the column address.
- The method of claim 20, further comprising: selecting, by the cache controller, K consecutive rows starting from the first row address of the feature map; after the window slides to the end of the K rows, reselecting, by the cache controller, the (S+1)th through Kth rows of the K rows and the S rows following the K rows; and controlling, by the cache controller, the buffer to output the reselected K rows starting from the first column address.
- The method of claim 20, further comprising: after the window slides to the end of the K rows, for each cache unit that outputs one of the 1st through Sth rows of the K rows, selecting, by the cache controller, the next row stored therein, and for each cache unit that outputs one of the (S+1)th through Kth rows of the K rows, keeping, by the cache controller, the currently selected row; and controlling, by the cache controller, each cache unit to output the selected row starting from the first column address of the selected row.
- The method of claim 18, further comprising: transferring, by a multiplexer, the data from each cache unit to the corresponding register group.
- The method of claim 18, wherein each register group comprises a write register and a calculation register, the method further comprising: receiving, by the write register, data from the corresponding cache unit; and receiving, by the calculation register, data from the write register, and outputting the registered data to the calculation circuit.
- The method of claim 24, further comprising: when the register group is in a data read mode, receiving, by the write register, data from the corresponding cache unit, and outputting, by the calculation register, the registered data to the calculation circuit; and when the register group is in a data shift mode, shifting, by the write register, the registered data into the calculation register.
- The method of claim 24, wherein the calculation register comprises K register units, the method further comprising: receiving, by the last of the K register units, data from the write register, wherein, in response to receiving the data from the write register, each of the last (K-1) of the K register units shifts the data registered therein to the previous register unit.
- The method of claim 24, further comprising: in the case where the row output by any cache unit changes, clearing the calculation register in each register group.
- The method of claim 16, wherein each data in the feature map includes the data having the same two-dimensional address on all channels, wherein the two-dimensional address of each data includes the row address and the column address of that data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/849,913 US11216375B2 (en) | 2020-02-26 | 2020-04-15 | Data caching |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010118620.0A CN113313228B (zh) | 2020-02-26 | 2020-02-26 | Data caching circuit and method |
CN202010118620.0 | 2020-02-26 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/849,913 Continuation US11216375B2 (en) | 2020-02-26 | 2020-04-15 | Data caching |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021168944A1 true WO2021168944A1 (zh) | 2021-09-02 |
Family
ID=77370142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/080318 WO2021168944A1 (zh) | 2020-02-26 | 2020-03-20 | Data caching circuit and method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113313228B (zh) |
WO (1) | WO2021168944A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629406A (zh) * | 2017-03-24 | 2018-10-09 | 展讯通信(上海)有限公司 | Computing device for convolutional neural networks |
CN108805266A (zh) * | 2018-05-21 | 2018-11-13 | 南京大学 | Reconfigurable high-concurrency convolution accelerator for CNNs |
CN109214506A (zh) * | 2018-09-13 | 2019-01-15 | 深思考人工智能机器人科技(北京)有限公司 | Device and method for building a convolutional neural network |
US20190205735A1 (en) * | 2017-12-29 | 2019-07-04 | Facebook, Inc. | Lowering hardware for neural networks |
CN110390384A (zh) * | 2019-06-25 | 2019-10-29 | 东南大学 | Configurable general-purpose convolutional neural network accelerator |
CN110705687A (zh) * | 2019-09-05 | 2020-01-17 | 北京三快在线科技有限公司 | Convolutional neural network hardware computing device and method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6771018B2 (ja) * | 2015-07-23 | 2020-10-21 | マイヤプリカ テクノロジー エルエルシー | Performance improvement of a two-dimensional array processor |
US10417560B2 (en) * | 2016-12-01 | 2019-09-17 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs efficient 3-dimensional convolutions |
EP3557484B1 (en) * | 2016-12-14 | 2021-11-17 | Shanghai Cambricon Information Technology Co., Ltd | Neural network convolution operation device and method |
US10990648B2 (en) * | 2017-08-07 | 2021-04-27 | Intel Corporation | System and method for an optimized winograd convolution accelerator |
CN108182471B (zh) * | 2018-01-24 | 2022-02-15 | 上海岳芯电子科技有限公司 | Convolutional neural network inference accelerator and method |
CN108681984B (zh) * | 2018-07-26 | 2023-08-15 | 珠海一微半导体股份有限公司 | Acceleration circuit for a 3*3 convolution algorithm |
CN109934339B (zh) * | 2019-03-06 | 2023-05-16 | 东南大学 | General-purpose convolutional neural network accelerator based on a one-dimensional systolic array |
-
2020
- 2020-02-26 CN CN202010118620.0A patent/CN113313228B/zh active Active
- 2020-03-20 WO PCT/CN2020/080318 patent/WO2021168944A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113313228B (zh) | 2022-10-14 |
CN113313228A (zh) | 2021-08-27 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20921169; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20921169; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 19/04/2023) |