TWI828185B - Three-dimensional convolution device and three-dimensional convolution method - Google Patents

Three-dimensional convolution device and three-dimensional convolution method

Info

Publication number
TWI828185B
Authority
TW
Taiwan
Prior art keywords
data
convolution operation
dimension
generate
dimensional
Prior art date
2022-06-09
Application number
TW111121509A
Other languages
Chinese (zh)
Other versions
TW202349278A (en)
Inventor
陳永勝
Original Assignee
大陸商星宸科技股份有限公司
Priority date
2022-06-09
Filing date
2022-06-09
Publication date
2024-01-01
Application filed by 大陸商星宸科技股份有限公司
Priority to TW111121509A
Publication of TW202349278A
Application granted
Publication of TWI828185B


Landscapes

  • Complex Calculations (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

A three-dimensional convolution method includes the following operations: performing a dimension-transposing operation on input data to arrange the elements of the input data along the depth and channel dimensions contiguously, in order to generate first data; performing a convolution, block by block, on the first data and second data that correspond to first weight data, in order to generate arithmetic data; and rearranging the arithmetic data according to an original dimensional format of the input data to generate output data.

Description

Three-dimensional convolution operation device and three-dimensional convolution operation method

The present disclosure relates to convolution devices, and in particular to a three-dimensional convolution device and method that perform three-dimensional convolution by reordering the dimension layout of data.

Convolution operations are commonly used in artificial neural network models to determine whether multiple pieces of data share similar features. In the prior art, computing a three-dimensional convolution requires accumulating data values along both the depth dimension and the channel dimension. In existing data formats, the data values along the depth dimension and those along the channel dimension are stored in memory in a scattered fashion, which raises the complexity of the three-dimensional convolution. Moreover, the convolution device must spend extra time reading out these scattered values, lowering its data-access efficiency and thus degrading the processing efficiency of the three-dimensional convolution.

In some aspects, one purpose of the present disclosure is to provide a three-dimensional convolution device and method that improve the processing efficiency of convolution operations, so as to address the shortcomings of the prior art.

In some aspects, a three-dimensional convolution method includes the following operations: performing a dimension exchange on input data to arrange multiple elements of the input data along the depth dimension and the channel dimension contiguously, thereby generating first data; performing a convolution operation, block by block, on the first data and second data corresponding to first weight data, to generate operation data; and reordering the operation data according to the original dimension layout of the input data to generate output data.

In some aspects, a three-dimensional convolution device includes a buffer, a direct memory access circuit, a dimension-exchange circuit, and a convolution circuit. The direct memory access circuit reads input data from an external memory and stores the input data in the buffer. The dimension-exchange circuit reads the input data from the buffer and performs a dimension exchange on it, arranging multiple elements of the input data along the depth and channel dimensions contiguously to generate first data. The convolution circuit performs a convolution operation, block by block, on the first data and second data corresponding to first weight data, to generate operation data. The dimension-exchange circuit further reorders the operation data according to an original dimension layout of the input data to generate output data.

The features, implementations, and effects of the present disclosure are described in detail below through preferred embodiments with reference to the drawings.

All terms used herein carry their ordinary meanings. Definitions of the above terms in commonly used dictionaries, and any examples of usage of the terms discussed herein within the content of the present disclosure, are illustrative only and shall not limit the scope or meaning of the present disclosure. Likewise, the present disclosure is not limited to the embodiments shown in this specification.

As used herein, "coupled" or "connected" may refer to two or more elements being in direct physical or electrical contact with each other, being in indirect physical or electrical contact with each other, or operating or acting on one another. As used herein, the term "circuit" may refer to a device in which at least one transistor and/or at least one active or passive component are connected in a certain manner to process signals.

In some embodiments, one goal of the present disclosure is to reorder the dimension layout of data so that the direct memory access circuit can read the input data and the weight data more efficiently, thereby improving the overall efficiency of the convolution operation.

Fig. 1 is a schematic diagram of a three-dimensional convolution device 100 according to some embodiments of the present disclosure. In some embodiments, the device 100 may be controlled by a computing platform (which runs on at least one host computer). In some embodiments, the device 100 may include a processor (not shown), and the other circuits in the device 100 may be controlled by that processor.

The three-dimensional convolution device 100 includes a direct memory access (DMA) circuit 110, a buffer 120, a dimension-exchange circuit 130, and a convolution circuit 140. The direct memory access circuit 110 may read the input data DIN and the weight data DW1 from the external memory 100A and store both in the buffer 120. In some embodiments, the external memory 100A may be, but is not limited to, a dynamic random access memory. In some embodiments, the buffer 120 may be implemented with, but is not limited to, a static random access memory.

In some embodiments, the convolution performed by the device 100 is a three-dimensional convolution. Accordingly, the input data DIN may be a five-dimensional tensor whose original dimension layout is five-dimensional. For example, the ordering of the original layout may be written as (N, Di, Hi, Wi, Ci), where N is the batch (the value of the highest dimension of the input data DIN), Di is the depth dimension, Hi is the height dimension, Wi is the width dimension, and Ci is the channel dimension. For example, in the Fig. 3 example described below, the original dimension layout (N, Di, Hi, Wi, Ci) of the input data DIN is (1, 2, 3, 2, 5), meaning that the input data DIN has 2 elements (also called data values) along the depth dimension, 3 elements along the height dimension, 2 elements along the width dimension, and 5 elements along the channel dimension. Similarly, the original dimension layout of the weight data DW1 may be written as (Dk, Hk, Wk, Ck, Co), where Dk is the depth dimension, Hk is the height dimension, Wk is the width dimension, Ck is the channel dimension, and Co is the value of the highest dimension of the weight data DW1 (equal to the value of the channel dimension of the output data DO).
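By way of illustration only (this sketch is not part of the patent), the two layouts can be pictured as NumPy arrays. The shapes follow the Fig. 3 example; the value Co = 4 is a hypothetical choice:

```python
import numpy as np

# Input tensor DIN in its original layout (N, Di, Hi, Wi, Ci), following the
# Fig. 3 example, and weight tensor DW1 in layout (Dk, Hk, Wk, Ck, Co).
# Co = 4 is a hypothetical value; the document does not fix it.
din = np.arange(1 * 2 * 3 * 2 * 5, dtype=np.float32).reshape(1, 2, 3, 2, 5)
dw1 = np.ones((2, 1, 1, 5, 4), dtype=np.float32)

print(din.shape)  # (1, 2, 3, 2, 5): N=1, Di=2, Hi=3, Wi=2, Ci=5
print(dw1.shape)  # (2, 1, 1, 5, 4): Dk=2, Hk=1, Wk=1, Ck=5, Co=4
```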

The dimension-exchange circuit 130 reads the input data DIN and the weight data DW1 from the buffer 120 and performs a dimension exchange on DIN according to a preset dimension layout (which may be specified by the computing platform), arranging the elements of DIN along the depth and channel dimensions contiguously to generate data D1, which it stores back into the buffer 120. In some embodiments, the circuit 130 likewise exchanges the dimensions of the weight data DW1 according to the preset layout, arranging the elements of DW1 along the depth and channel dimensions contiguously to generate data D2, which is also stored in the buffer 120. The direct memory access circuit 110 may read data D1 and data D2 from the buffer 120 and transfer them to the external memory 100A. Through these operations, the input data DIN and the weight data DW1, both in the original layout, are reordered into data D1 and data D2 in the preset layout, which improves the efficiency of the convolution operation. The layout conversion is described in detail below with reference to Fig. 3. In some embodiments, the dimension-exchange circuit 130 may be implemented by a data-processing circuit executing a particular flow or software. In some embodiments, if the weight data DW1 is constant, the computing platform may store the corresponding data D2 in the external memory 100A in advance to further accelerate the convolution operation.
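As a minimal NumPy sketch of the dimension exchange (an illustration, not the circuit itself): moving the depth axis next to the channel axis and making the result contiguous mirrors what the dimension-exchange circuit does when producing D1 and D2.

```python
import numpy as np

def to_preset_layout(x):
    """Permute (N, D, H, W, C) -> (N, H, W, D, C) and make the result
    contiguous, so the (depth, channel) group of each (H, W) position
    occupies one uninterrupted run of memory."""
    return np.ascontiguousarray(np.transpose(x, (0, 2, 3, 1, 4)))

din = np.arange(1 * 2 * 3 * 2 * 5, dtype=np.float32).reshape(1, 2, 3, 2, 5)
d1 = to_preset_layout(din)
print(d1.shape)  # (1, 3, 2, 2, 5), i.e. (N, H, W, D, C)
```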

The direct memory access circuit 110 may read data D1 and data D2 from the external memory 100A into the buffer 120 block by block. The convolution circuit 140 may read D1 and D2 from the buffer 120 and perform the convolution on them block by block to generate operation data DC. In some embodiments, the computing platform (or the processor of the device 100) divides D1 and D2 into multiple data blocks according to the system access bandwidth, the capacity of the buffer 120, and the dimension sizes of D1 and D2. The platform (or the processor) may thus control the DMA circuit 110 to sequentially read the blocks of D1 and D2 into the buffer 120, and control the convolution circuit 140 to sequentially read those blocks from the buffer 120 and perform the convolution block by block. After the convolution circuit 140 finishes the convolution over all blocks, it produces the operation data DC, which is stored to the external memory 100A via the buffer 120 and the DMA circuit 110. In some embodiments, the convolution circuit 140 may be implemented by a digital signal processing circuit.
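The block-by-block schedule can be pictured as a host-side loop. This is a sketch under the assumption that partial results are simply collected; every callable is a hypothetical stand-in for a circuit of Fig. 1, and accumulation and write-back details are omitted:

```python
def blocked_convolution(d1_blocks, d2_blocks, dma_read, conv_engine):
    """Stream block pairs through the buffer and collect partial results
    of the operation data DC (hypothetical interfaces)."""
    dc_parts = []
    for blk_in in d1_blocks:            # blocks of the permuted input D1
        for blk_w in d2_blocks:         # blocks of the permuted weights D2
            tile_in = dma_read(blk_in)  # external memory -> buffer 120
            tile_w = dma_read(blk_w)    # external memory -> buffer 120
            dc_parts.append(conv_engine(tile_in, tile_w))  # one block conv
    return dc_parts                     # later written back to memory 100A
```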

The dimension-exchange circuit 130 may read the operation data DC via the direct memory access circuit 110 and the buffer 120, and reorder DC according to the original dimension layout of the input data DIN to generate the output data DO. The circuit 130 may then transfer DO to the external memory 100A via the buffer 120 and the DMA circuit 110, so that other devices on the computing platform can correctly access the output data DO for subsequent applications.
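Restoring the original layout is simply the inverse permutation; a NumPy round-trip check (illustration only):

```python
import numpy as np

def to_original_layout(y):
    """Inverse permutation: (N, H, W, D, C) -> (N, D, H, W, C)."""
    return np.ascontiguousarray(np.transpose(y, (0, 3, 1, 2, 4)))

x = np.random.rand(1, 2, 3, 2, 5).astype(np.float32)  # (N, D, H, W, C)
d1 = np.transpose(x, (0, 2, 3, 1, 4))                 # forward exchange
assert np.array_equal(to_original_layout(d1), x)      # round trip restores x
```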

Fig. 2 is a flowchart of a three-dimensional convolution method 200 according to some embodiments of the present disclosure. In some embodiments, the method 200 may be performed by, but is not limited to, the three-dimensional convolution device 100 of Fig. 1. For ease of explanation, refer to Fig. 1 and Fig. 2 together.

In operation S205, a dimension exchange is performed on the input data to arrange its elements along the depth and channel dimensions contiguously, generating first data. In operation S210, a dimension exchange is performed on the weight data to arrange its elements along the depth and channel dimensions contiguously, generating second data. As described above, the direct memory access circuit 110 may read the input data DIN and the weight data DW1 from the external memory 100A and store them in the buffer 120. The dimension-exchange circuit 130 may read DIN and DW1 from the buffer 120, arrange the elements of DIN along the depth and channel dimensions contiguously to generate data D1, and arrange the elements of DW1 along the depth and channel dimensions contiguously to generate data D2. The circuit 130 then transfers D1 and D2 to the external memory 100A via the buffer 120 and the DMA circuit 110.

Fig. 3 is a schematic diagram of performing the dimension exchange on the input data DIN of Fig. 1 to generate data D1, according to some embodiments. As described above, the original dimension layout of DIN may be written as (N, Di, Hi, Wi, Ci); in the Fig. 3 example it is (1, 2, 3, 2, 5). In other words, DIN can be divided along the height dimension Hi into three data groups (corresponding to Hi = 0, 1, 2). Each group can be further divided along the width dimension Wi into two sub-groups (corresponding to Wi = 0, 1), and each sub-group into two pieces of data along the depth dimension Di (corresponding to Di = 0, 1), with each piece containing 5 elements (data values, corresponding to Ci = 5). In detail, DIN includes the pieces D000, D001, D010, D011, ..., D210, and D211, where D000 denotes the piece whose height index Hi, width index Wi, and depth index Di are all 0, and D001 denotes the piece whose Hi, Wi, and Di are 0, 0, and 1, respectively. The correspondence between the remaining pieces and the original layout follows by analogy.

As shown in Fig. 3, data D1 can be generated by arranging the elements of DIN along the depth dimension Di and the channel dimension Ci contiguously, where the preset dimension layout of D1 consists, in order, of the batch, height, width, depth, and channel dimensions, which may be noted as (N, H, W, D, C). Unlike the original layout of DIN, in the preset layout the depth dimension D and the channel dimension C are adjacent. After the dimension exchange, the pieces of D1 (namely D000, D001, D010, D011, ..., D210, and D211) are arranged consecutively; in other words, they can be stored contiguously in the external memory 100A (and/or the buffer 120). As a result, during the convolution the direct memory access circuit 110 can read multiple consecutive pieces of D1 from the external memory 100A for the convolution.

Put another way, in a two-dimensional convolution the kernel (corresponding to the weight data DW1) slides along the width and height dimensions of the input while performing multiply-accumulate operations with the corresponding elements along the channel dimension, producing the convolution result. In a three-dimensional convolution, the kernel additionally performs multiply-accumulate operations with the corresponding elements along the depth dimension of the input. Since a three-dimensional convolution accumulates along both the depth and channel dimensions, the elements of the input data DIN along those two dimensions can be arranged contiguously to generate data D1. In this way, the three-dimensional convolution can be reduced to an operation similar to a two-dimensional convolution, lowering its complexity and increasing processing efficiency.

In detail, during the convolution the convolution circuit 140 may, via the direct memory access circuit 110 and the buffer 120, consecutively read two pieces of data D1 for one convolution operation. For example, the two pieces may be D000 and D001, which together contain multiple (here, 10) elements corresponding to different depths (Di of 0 or 1) but the same width (Wi of 0) and the same height (Hi of 0). Through the dimension exchange and consecutive reads, multiple pieces of data are not only laid out contiguously, but the depth dimension is also effectively collapsed. For example, the layout (N, H, W, D, C) that D1 presents during consecutive reads is equivalent to (1, 3, 2, 1, 10), where the size of the depth dimension D effectively drops to 1 and the size of the channel dimension grows to 10. This increases the number of elements the DMA circuit 110 reads per access, improving its efficiency and thus the computational efficiency of the convolution. Fig. 3 uses only the input data DIN and data D1 as an example; it should be understood that the same operations also apply to the weight data DW1 and data D2 (or to the weight data DW2 and data D3 mentioned later), so the details are not repeated here.
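To see why merging depth into the channel axis reduces the work to a two-dimensional-style convolution, consider the special case in which the kernel spans the full input depth (Dk = Di, so there is no sliding along depth). The NumPy sketch below, an illustration under that stated assumption rather than the patented circuit, checks that the 3D convolution then equals a 2D convolution over a merged (D, C) axis of size Di*Ci, matching the (1, 3, 2, 1, 10) view mentioned above:

```python
import numpy as np

N, Di, Hi, Wi, Ci = 1, 2, 3, 2, 5
Dk, Hk, Wk, Co = Di, 2, 2, 4          # Dk == Di: kernel spans the full depth

x = np.random.rand(N, Hi, Wi, Di, Ci).astype(np.float32)  # preset (N,H,W,D,C)
k = np.random.rand(Hk, Wk, Dk, Ci, Co).astype(np.float32)

# Direct 3D convolution ("valid" padding; output depth is 1 since Dk == Di).
Ho, Wo = Hi - Hk + 1, Wi - Wk + 1
out3d = np.zeros((N, Ho, Wo, Co), dtype=np.float32)
for h in range(Ho):
    for w in range(Wo):
        win = x[:, h:h + Hk, w:w + Wk]                # (N, Hk, Wk, Di, Ci)
        out3d[:, h, w] = np.einsum('nijdc,ijdco->no', win, k)

# The same result as a 2D convolution over merged channels of size Di * Ci.
x2 = x.reshape(N, Hi, Wi, Di * Ci)                    # (1, 3, 2, 10)
k2 = k.reshape(Hk, Wk, Dk * Ci, Co)
out2d = np.zeros_like(out3d)
for h in range(Ho):
    for w in range(Wo):
        win2 = x2[:, h:h + Hk, w:w + Wk]              # (N, Hk, Wk, Di*Ci)
        out2d[:, h, w] = np.einsum('nijm,ijmo->no', win2, k2)

assert np.allclose(out3d, out2d, atol=1e-5)
```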

Referring again to Figs. 1 and 2, in operation S215 the first data and the second data are each divided into multiple data blocks according to the capacity of the buffer. In operation S220, one of the blocks of the first data is read into the buffer. In operation S225, one of the blocks of the second data is read into the buffer. In operation S230, a convolution is performed on the blocks stored in the buffer to generate part of the operation data. In operation S235, that partial result is stored to the external memory.

As described above, the computing platform (or the processor of the device 100) may divide data D1 and data D2 into blocks according to the access bandwidth, the capacity of the buffer 120, and the sizes of D1 and D2. In some embodiments, the resulting blocks satisfy the following conditions: the channel-dimension size of a block of D1 equals the channel-dimension size of a block of D2; and the sliding (or offset) amount of D2 along the channel dimension equals the channel-dimension size of the output data DO; however, the present disclosure is not limited to these conditions. After D1 and D2 are divided into blocks, the DMA circuit 110 may read D1 and D2 into the buffer 120 block by block (i.e., one block of D1 and one block of D2 per read) and provide those blocks to the convolution circuit 140, which performs one convolution and generates a partial result of the operation data DC (the result of that convolution). The DMA circuit 110 may read this partial result from the buffer 120 and transfer it to the external memory 100A. In some embodiments, D1 and D2 may be divided into blocks by an existing scheduling algorithm or a block convolution algorithm.

Fig. 4A is a schematic diagram of the blocked data D1 according to some embodiments. In Fig. 4A, each square along the channel dimension Ci represents one tensor element of D1. Since the channel dimension Ci and the depth dimension Di are merged into a single dimension (here of size 8), D1 is split along this dimension at the boundary line BL1 (shown dashed), and along the height dimension Hi and the width dimension Wi at the boundary lines BL2 and BL3 (shown dashed), respectively. D1 is thus divided into 16 blocks. For readability, Fig. 4A marks four of the blocks with dot and hatch patterns; the positions of the remaining blocks follow by analogy. In practice, the size of D1 is usually larger than the capacity of the buffer 120, so the DMA circuit 110 reads one block of D1 at a time into the buffer 120 for the convolution circuit 140 to operate on.
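A toy splitter echoing Fig. 4A (illustration only; the tensor shape and step sizes below are hypothetical, and a real schedule would also honor the access bandwidth and buffer capacity, as noted above):

```python
import numpy as np

def split_blocks(d1, h_step, w_step, m_step):
    """Split D1, viewed as (N, H, W, M) with M the merged (D, C) axis,
    into tiles no larger than (h_step, w_step, m_step)."""
    _, H, W, M = d1.shape
    return [d1[:, h:h + h_step, w:w + w_step, m:m + m_step]
            for h in range(0, H, h_step)
            for w in range(0, W, w_step)
            for m in range(0, M, m_step)]

d1 = np.zeros((1, 4, 4, 8), dtype=np.float32)
tiles = split_blocks(d1, h_step=2, w_step=2, m_step=2)
print(len(tiles))  # 16 blocks, echoing the 16-block split of Fig. 4A
```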

Fig. 4B is a schematic diagram of the blocked data D2 according to some embodiments. In this example, D2 is divided into blocks along the channel dimension Ck (merged with the depth dimension) at the boundary line BL4 (drawn dashed). Since D2 is usually small, it is not further divided along the height dimension Hk or the width dimension Wk in this example, although the present disclosure is not limited in this respect. For readability, Fig. 4B marks the blocks with dot and hatch patterns. The DMA circuit 110 reads one block of D2 at a time into the buffer 120 for the convolution circuit 140 to operate on.

Referring again to Fig. 2, in operation S240 it is checked whether the convolution is complete. If so (i.e., all blocks have been processed), operation S245 is performed. Otherwise, operation S215 is performed again to read the next blocks of D1 and D2 and continue the convolution. Repeating these steps yields the complete operation data DC. In operation S245, it is checked whether the next network layer is also a convolution. If it is, operation S210 is performed again and the preceding operations are repeated for the next layer's convolution; operation S245 is explained later with reference to Figs. 5A and 5B. Otherwise, operation S250 is performed. In operation S250, the operation data is reordered according to the original dimension layout to generate the output data.

For example, if the next layer is not a convolution, the direct memory access circuit 110 may read the operation data DC from the external memory 100A and transfer it to the buffer 120. The dimension-exchange circuit 130 may read DC from the buffer 120, reorder it according to the original dimension layout of the input data DIN to generate the output data DO, and store DO in the buffer 120. The DMA circuit 110 may then read DO from the buffer 120 and transfer it to the external memory 100A, so that the computing platform or other devices in the system can use DO for subsequent data processing. In other words, operation S250 restores the layout of the output data DO to the original layout used by the computing platform, so that the other networks in the neural network model can use DO correctly.

The operations of the three-dimensional convolution method 200 above are merely an example and need not be performed in the order shown. Without departing from the modes of operation and scope of the embodiments of the present disclosure, operations of the method 200 may be suitably added, replaced, omitted, or performed in a different order (for example, simultaneously or partially simultaneously).

Fig. 5A is a data flow diagram of performing the operations of a single convolution layer, according to some embodiments. In this example, the neural network model run by the device 100 contains a single convolution layer (i.e., the convolution described above comprises one convolution layer). In operation S501, a dimension exchange is performed on the input data DIN (arranging its elements along the depth and channel dimensions contiguously) to generate data D1. In operation S502, a dimension exchange is performed on the weight data DW1 (arranging its elements along the depth and channel dimensions contiguously) to generate data D2. In operation S503, the single convolution layer is applied block by block to D1 and D2 to generate the operation data DC (corresponding to operations S215 through S240 of Fig. 2). In operation S504, DC is reordered according to the original dimension layout to generate the output data DO.

For the operations of Fig. 5A, refer to the description of Fig. 2 above; the details are not repeated here. As noted, in this example the convolution comprises only one convolution layer, so after operation S503 the dimensions of the operation data DC can be restored to the original layout to produce the output data DO.

Fig. 5B is a data flow diagram of performing the operations of multiple convolution layers, according to some embodiments. Compared with Fig. 5A, in the Fig. 5B example the neural network model run by the device 100 contains a multi-layer convolutional network; for example, the convolution comprises a first convolution layer and a second convolution layer.

In operation S511, a dimension exchange is performed on the input data DIN to generate data D1. In operation S512, a dimension exchange is performed on the weight data DW1 to generate data D2. In operation S513, the first convolution layer is applied block by block to D1 and D2 to generate temporary data DC' (which may be stored in the buffer 120 of Fig. 1). In operation S514, a dimension exchange is performed on the weight data DW2 (arranging its elements along the depth and channel dimensions contiguously, DW2 being the kernel of the second convolution layer) to generate data D3 (which may be stored in the external memory 100A of Fig. 1 and transferred to the buffer 120 via the DMA circuit 110). In operation S515, the second convolution layer is applied block by block to DC' and D3 to generate the operation data DC. In operation S516, DC is reordered according to the original dimension layout to generate the output data DO. For the operations of Fig. 5B, refer to the description of Fig. 2 above; for example, operations S513 and S515 correspond to operations S215 through S240 of Fig. 2. In some other embodiments, if the weight data DW2 is constant, the computing platform may store the corresponding data D3 in the external memory 100A in advance.

As noted, in this example the convolution comprises two convolution layers. Therefore, the result of the first convolution layer (the temporary data DC') can be fed directly into the second convolution layer without reordering its dimensions. In other words, in a neural network model with multiple convolution layers, only the result of the last convolution layer (here, the second one), i.e., the operation data DC, needs to be reordered into the original layout to obtain the output data DO; the result of each intermediate convolution layer need not be restored to the original layout. This further improves the overall processing efficiency of the convolution.
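The two-layer flow of Fig. 5B can be summarized in a few lines (illustration only; every callable is a hypothetical stand-in for a circuit of Fig. 1):

```python
def run_two_conv_layers(din, dw1, dw2, exchange, restore, conv_blocked):
    """Keep intermediates in the preset layout; reorder only once at the end."""
    d1 = exchange(din)              # S511: permute the input once
    d2 = exchange(dw1)              # S512: permute layer-1 weights
    dc_tmp = conv_blocked(d1, d2)   # S513: layer 1; DC' stays in preset layout
    d3 = exchange(dw2)              # S514: permute layer-2 weights
    dc = conv_blocked(dc_tmp, d3)   # S515: layer 2 consumes DC' directly
    return restore(dc)              # S516: single reorder back to original
```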

The embodiments above are described in terms of three-dimensional convolution, but the present disclosure is not limited thereto. It should be understood that reordering the dimensions of data can be generalized to higher-dimensional convolutions.

In summary, the three-dimensional convolution device and method in some embodiments of the present disclosure reorder the dimension layout of data to improve the access efficiency of the direct memory access circuit. Further, this reordering reduces the complexity of the three-dimensional convolution, allowing the device to implement it with operations similar or identical to a two-dimensional convolution, thereby improving the processing efficiency of the three-dimensional convolution.

Although embodiments of the present disclosure are described above, they are not intended to limit the present disclosure. Those of ordinary skill in the art may vary the technical features of the present disclosure according to its explicit or implicit content, and all such variations may fall within the scope of patent protection sought herein. In other words, the scope of patent protection of the present disclosure shall be defined by the claims of this specification.

100: three-dimensional convolution device
100A: external memory
110: direct memory access circuit
120: buffer
130: dimension-exchange circuit
140: convolution circuit
200: three-dimensional convolution method
BL1~BL4: boundary lines
C, Ci, Ck, Co: channel dimensions
D000, D001, D010, D011, D100, D101: data
D110, D111, D200, D201, D210, D211: data
D1~D3: data
DC: operation data
DC': temporary data
DIN: input data
DO: output data
DW1, DW2: weight data
D, Di, Dk: depth dimensions
H, Hi, Hk: height dimensions
S205, S210, S215, S220, S225, S230, S235, S240, S245, S250: operations
S501~S504, S511~S516: operations
W, Wi, Wk: width dimensions

[Fig. 1] is a schematic diagram of a three-dimensional convolution device according to some embodiments of the present disclosure;
[Fig. 2] is a flowchart of a three-dimensional convolution method according to some embodiments;
[Fig. 3] is a schematic diagram of performing a dimension exchange on the input data of Fig. 1 to generate first data, according to some embodiments;
[Fig. 4A] is a schematic diagram of the blocked first data, according to some embodiments;
[Fig. 4B] is a schematic diagram of the blocked second data, according to some embodiments;
[Fig. 5A] is a data flow diagram of performing the operations of a single convolution layer, according to some embodiments; and
[Fig. 5B] is a data flow diagram of performing the operations of multiple convolution layers, according to some embodiments.

200: three-dimensional convolution method
S205, S210, S215, S220, S225, S230, S235, S240, S245, S250: operations

Claims (11)

1. A three-dimensional convolution method, comprising:
performing a dimension exchange on input data to arrange a plurality of elements of the input data along a depth dimension and a channel dimension contiguously, thereby generating first data;
performing a convolution operation, block by block, on the first data and second data corresponding to first weight data, to generate operation data; and
reordering the operation data according to an original dimension layout of the input data to generate output data.

2. The three-dimensional convolution method of claim 1, wherein performing the convolution operation block by block on the first data and the second data to generate the operation data comprises:
consecutively reading a plurality of elements of the first data corresponding to different depths to perform the convolution operation, wherein those elements correspond to a same width and a same height.

3. The three-dimensional convolution method of claim 1, further comprising:
performing a dimension exchange on the first weight data to arrange a plurality of elements of the first weight data along the depth dimension and the channel dimension contiguously, thereby generating the second data.

4. The three-dimensional convolution method of claim 1, wherein the convolution operation comprises a first convolution layer and a second convolution layer, and reading the first data and the second data corresponding to the first weight data block by block to perform the convolution operation and generate the operation data comprises:
performing, block by block, the operation of the first convolution layer on the first data and the second data to generate temporary data;
performing a dimension exchange on second weight data to arrange a plurality of elements of the second weight data along the depth dimension and the channel dimension contiguously, thereby generating third data; and
performing, block by block, the operation of the second convolution layer on the temporary data and the third data to generate the operation data.
5. A three-dimensional convolution device, comprising:
a buffer;
a direct memory access circuit configured to read input data from an external memory and store the input data in the buffer;
a dimension-exchange circuit configured to read the input data from the buffer and perform a dimension exchange on the input data to arrange a plurality of elements of the input data along a depth dimension and a channel dimension contiguously, generating first data; and
a convolution circuit configured to perform a convolution operation, block by block, on the first data and second data corresponding to first weight data, to generate operation data;
wherein the dimension-exchange circuit further reorders the operation data according to an original dimension layout of the input data to generate output data.

6. The three-dimensional convolution device of claim 5, wherein the external memory further stores the second data, and a plurality of elements of the second data along the depth dimension and the width dimension are arranged contiguously.

7. The three-dimensional convolution device of claim 5, wherein the direct memory access circuit further reads the first weight data from the external memory into the buffer, and the dimension-exchange circuit further reads the first weight data from the buffer and performs a dimension exchange on the first weight data to arrange a plurality of its elements along the depth dimension and the channel dimension contiguously, generating the second data.

8. The three-dimensional convolution device of claim 5, wherein the convolution circuit consecutively reads a plurality of elements of the first data corresponding to different depths to perform the convolution operation, wherein those elements correspond to a same width and a same height.

9. The three-dimensional convolution device of claim 5, wherein the convolution operation comprises a first convolution layer and a second convolution layer, and the convolution circuit performs, block by block, the operation of the first convolution layer on the first data and the second data to generate temporary data, and performs, block by block, the operation of the second convolution layer on the temporary data and third data corresponding to second weight data to generate the operation data.
10. The three-dimensional convolution device of claim 9, wherein the direct memory access circuit further reads the second weight data from the external memory into the buffer, and the dimension-exchange circuit further reads the second weight data from the buffer and performs a dimension exchange on the second weight data, arranging a plurality of its elements along the depth dimension and the channel dimension contiguously to generate the third data.

11. The three-dimensional convolution device of claim 5, wherein the first data generated by the dimension-exchange circuit is stored to the external memory via the buffer and the direct memory access circuit, and the convolution circuit reads out the first data block by block via the direct memory access circuit and the buffer.
TW111121509A 2022-06-09 2022-06-09 Three-dimensional convolution device and three-dimensional convolution method TWI828185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111121509A TWI828185B (en) 2022-06-09 2022-06-09 Three-dimensional convolution device and three-dimensional convolution method


Publications (2)

Publication Number Publication Date
TW202349278A TW202349278A (en) 2023-12-16
TWI828185B true TWI828185B (en) 2024-01-01

Family

ID=90039152


Country Status (1)

Country Link
TW (1) TWI828185B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509910A (en) * 2018-04-02 2018-09-07 重庆邮电大学 Deep learning gesture identification method based on fmcw radar signal
US20190197083A1 (en) * 2017-12-18 2019-06-27 Nanjing Horizon Robotics Technology Co., Ltd. Method and electronic device for convolution calculation in neutral network
CN112419191A (en) * 2020-11-24 2021-02-26 复旦大学 Image motion blur removing method based on convolution neural network
WO2021164737A1 (en) * 2020-02-20 2021-08-26 华为技术有限公司 Neural network compression method, data processing method, and related apparatuses
TW202133053A (en) * 2020-02-15 2021-09-01 財團法人工業技術研究院 Convolutional neural-network calculating apparatus and operation methods thereof


Also Published As

Publication number Publication date
TW202349278A (en) 2023-12-16
