TWI775296B - Deep neural network accelerating method using ring tensors and system thereof - Google Patents


Info

Publication number: TWI775296B
Application number: TW110103724A
Authority: TW (Taiwan)
Prior art keywords: ring, tensor, loop, linear, elements
Other languages: Chinese (zh)
Other versions: TW202147147A (en)
Inventor: 黃朝宗
Original assignee: 國立清華大學
Application filed by 國立清華大學
Related application: US 17/230,958 (published as US20210383198A1)
Publication of TW202147147A; application granted; publication of TWI775296B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

A deep neural network accelerating method using a plurality of ring tensors includes a ring tensor setting step, a ring tensor convolution calculating step and a non-linear activation function calculating step. The ring tensor setting step sets an input feature tensor and a weight tensor of a convolutional network; the input feature tensor includes a plurality of input feature ring elements, and the weight tensor includes a plurality of weight ring elements. The ring tensor convolution calculating step operates on the input feature ring elements of the input feature tensor and the weight ring elements of the weight tensor according to a ring multiplication calculating step and a ring addition calculating step to generate a plurality of convolution feature ring elements of a convolution feature tensor. The non-linear activation function calculating step executes a directional rectified linear unit (ReLU) on the convolution feature ring elements of the convolution feature tensor to generate output feature ring elements. The ring multiplication calculating step performs a ring multiplication on one of the input feature ring elements and one of the weight ring elements to generate a multiplication output ring element; each multiplication output component of the multiplication output ring element is obtained by a component-wise product of an input feature component and a weight component. The ring addition calculating step performs a ring addition on a plurality of the multiplication output ring elements to generate one of the convolution feature ring elements; each convolution feature component is obtained by a component-wise addition of the corresponding multiplication output components. The present disclosure thus provides a novel ring algebra that adopts the component-wise product for the ring multiplication to minimize complexity and a directional ReLU for the ring non-linearity to achieve high quality.

Description

Deep neural network acceleration method and system using ring tensors

The present invention relates to a deep neural network acceleration method and system, and more particularly to a deep neural network acceleration method and system using ring tensors.

When convolutional neural networks are used in image-processing applications, their computational demand is very high. The mainstream approach in related research is weight pruning, which removes small-valued or unnecessary network parameters to reduce both the amount of computation and the required parameter storage. However, the computation flow of a pruned network becomes irregular, which burdens hardware acceleration, so pruning is unsuitable for accelerators that rely on massive parallelism.

On the other hand, some studies explore how to reduce computation while preserving the regularity of the network, including the quaternion network, the circulant convolutional neural network (CirCNN), ShuffleNet, and the convolutional neural network using the Hadamard transform (HadaNet). What these approaches have in common is that they exploit the sparsity of feature vectors along the channel direction to reduce computation and parameter counts in a highly regular way. To remedy the quality loss caused by using fewer multiplications, they must additionally allow information to be exchanged between different channels. Conceptually this falls into two categories: extra linear transformations between channels, and channel shuffling. The linear-transformation approach achieves better quality, but under common fixed-point arithmetic these transformations increase the bit widths of feature values and parameters, which in turn raises the complexity of the multipliers.

The market therefore lacks a deep neural network acceleration method and system using ring tensors that can effectively exchange information between channels and maintain image quality without increasing the multiplier bit width, and practitioners in the field are seeking a solution.

Accordingly, an object of the present invention is to provide a deep neural network acceleration method and system using ring tensors that employs a directional nonlinear activation function to maintain image quality without increasing the bit width of the multiplier in the ring-element multiplication step, thereby avoiding the complexity increase that the prior art incurs by widening the multiplier.

According to one embodiment of the method aspect of the present invention, a deep neural network acceleration method using ring tensors includes the following steps: a ring tensor setting step, a ring tensor convolution step, and a tensor nonlinear activation function step. The ring tensor setting step sets an input feature ring tensor and a parameter ring tensor of a convolutional network; the input feature ring tensor includes a plurality of input feature ring elements, and the parameter ring tensor includes a plurality of parameter ring elements. The ring tensor convolution step operates on the input feature ring elements of the input feature ring tensor and the parameter ring elements of the parameter ring tensor according to a ring-element multiplication step and a ring-element addition step to produce a plurality of convolution feature ring elements of a convolution feature ring tensor. The tensor nonlinear activation function step executes a directional nonlinear activation function on one of the convolution feature ring elements of the convolution feature ring tensor to produce an output feature ring element. The ring-element multiplication step performs a ring multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element; each multiplication output component of the multiplication output ring element is obtained by a component-wise product of an input feature component and a parameter component. The ring-element addition step performs a ring addition on a plurality of multiplication output ring elements to produce one of the convolution feature ring elements; each convolution feature component is obtained by a component-wise addition of the corresponding multiplication output components.
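As an informal illustration of the two ring operations just described, the following NumPy sketch treats each ring element as a length-n real vector (here n = 2); the function names are illustrative and not part of the patent:

```python
import numpy as np

def ring_mul(x, g):
    # Ring multiplication: the component-wise product z_i = g_i * x_i.
    return g * x

def ring_add(elems):
    # Ring addition: the component-wise sum of several ring elements.
    return np.sum(elems, axis=0)

# One convolution feature ring element is the ring addition of the
# ring multiplications over a receptive field:
x0, x1 = np.array([1.0, 2.0]), np.array([3.0, -1.0])
g0, g1 = np.array([0.5, 0.5]), np.array([1.0, 2.0])
cz = ring_add([ring_mul(x0, g0), ring_mul(x1, g1)])
print(cz)  # [ 3.5 -1. ]
```

Because the product is purely component-wise, the multiplier operand width is just the component width; no cross-component arithmetic enters the multiply path.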

In this way, the deep neural network acceleration method using ring tensors of the present invention folds the extra linear transformations into the directional nonlinear activation function, so the bit width of the multiplier in the ring-element multiplication step does not increase. Moreover, the present invention can effectively exchange information between channels, achieving the highest image quality among ring algebras.

A further example of the foregoing embodiment is as follows: the directional nonlinear activation function may include a first transformation step, a ReLU operation step, and a second transformation step. The first transformation step performs a first linear transformation on the convolution feature ring element to produce a linear output ring element. The ReLU operation step applies a component-wise ReLU to the linear output ring element to produce a rectified output ring element. The second transformation step performs a second linear transformation on the rectified output ring element to produce the output feature ring element.
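A minimal sketch of this three-step activation, assuming NumPy arrays and illustrative names (the Hadamard matrix shown is one possible choice for V and U, as the embodiments below confirm):

```python
import numpy as np

def directional_relu(cz, V, U):
    # First transformation step: first linear transformation V.
    linear_out = V @ cz
    # ReLU operation step: component-wise rectification.
    rectified = np.maximum(0.0, linear_out)
    # Second transformation step: second linear transformation U.
    return U @ rectified

H = np.array([[1.0, 1.0], [1.0, -1.0]])  # Hadamard transform H_2
print(directional_relu(np.array([2.0, -1.0]), H, H))  # [ 4. -2.]
```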

A further example of the foregoing embodiment is as follows: either of the first linear transformation and the second linear transformation may be a Hadamard transform.

A further example of the foregoing embodiment is as follows: the directional nonlinear activation function may include a first transformation step, a normalizing linear transformation step, a ReLU operation step, and a second transformation step. The first transformation step performs a first linear transformation on the convolution feature ring element to produce a first linear output ring element. The normalizing linear transformation step performs a ring multiplication of the first linear output ring element with a normalizing linear transformation to produce a second linear output ring element. The ReLU operation step applies a component-wise ReLU to the second linear output ring element to produce a rectified output ring element. The second transformation step performs a second linear transformation on the rectified output ring element to produce the output feature ring element. Either of the first and second linear transformations is a Hadamard transform.

A further example of the foregoing embodiment is as follows: the deep neural network acceleration method using ring tensors may further include a tensor quantization step that performs a quantization operation on the output feature ring elements to produce quantized output ring elements. The ring tensor setting step, the ring tensor convolution step, the tensor nonlinear activation function step, and the tensor quantization step are executed in sequence.
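The patent does not fix a particular quantization operation Q; as one hedged example, a uniform signed fixed-point quantizer could look like the sketch below (the bit width and scale are illustrative assumptions):

```python
import numpy as np

def quantize_ring_element(y, bits=8, scale=1.0 / 64):
    # Clamp to the signed fixed-point range, then round to the grid,
    # producing an output feature with the target bit width.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(y / scale), qmin, qmax) * scale
```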

According to one embodiment of the structural aspect of the present invention, a deep neural network acceleration system using ring tensors includes a first memory, a second memory, and an arithmetic processing unit. The first memory stores the input feature ring tensor of a convolutional network; the input feature ring tensor includes a plurality of input feature ring elements. The second memory stores the parameter ring tensor of the convolutional network; the parameter ring tensor includes a plurality of parameter ring elements. The arithmetic processing unit is electrically connected to the first memory and the second memory; it receives the input feature ring tensor and the parameter ring tensor and is configured to carry out operations including the following steps: a ring tensor setting step, a ring tensor convolution step, and a tensor nonlinear activation function step. The ring tensor setting step sets the input feature ring tensor and the parameter ring tensor of the convolutional network. The ring tensor convolution step operates on the input feature ring elements and the parameter ring elements according to a ring-element multiplication step and a ring-element addition step to produce a plurality of convolution feature ring elements of a convolution feature ring tensor. The tensor nonlinear activation function step executes a directional nonlinear activation function on one of the convolution feature ring elements to produce an output feature ring element. The ring-element multiplication step performs a ring multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element; each multiplication output component is obtained by a component-wise product of an input feature component and a parameter component. The ring-element addition step performs a ring addition on a plurality of multiplication output ring elements to produce one of the convolution feature ring elements; each convolution feature component is obtained by a component-wise addition of the corresponding multiplication output components.

In this way, the deep neural network acceleration system using ring tensors of the present invention uses a directional nonlinear activation function and places the extra linear transformations inside it, so the bit width of the multiplier in the ring-element multiplication step does not increase. Furthermore, the present invention can effectively exchange information between channels, achieving the highest image quality among ring algebras.

A further example of the foregoing embodiment is as follows: the directional nonlinear activation function may include a first transformation step, a ReLU operation step, and a second transformation step. The first transformation step performs a first linear transformation on the convolution feature ring element to produce a linear output ring element. The ReLU operation step applies a component-wise ReLU to the linear output ring element to produce a rectified output ring element. The second transformation step performs a second linear transformation on the rectified output ring element to produce the output feature ring element.

A further example of the foregoing embodiment is as follows: either of the first linear transformation and the second linear transformation may be a Hadamard transform.

A further example of the foregoing embodiment is as follows: the directional nonlinear activation function may include a first transformation step, a normalizing linear transformation step, a ReLU operation step, and a second transformation step. The first transformation step performs a first linear transformation on the convolution feature ring element to produce a first linear output ring element. The normalizing linear transformation step performs a ring multiplication of the first linear output ring element with a normalizing linear transformation to produce a second linear output ring element. The ReLU operation step applies a component-wise ReLU to the second linear output ring element to produce a rectified output ring element. The second transformation step performs a second linear transformation on the rectified output ring element to produce the output feature ring element. Either of the first and second linear transformations is a Hadamard transform.

A further example of the foregoing embodiment is as follows: the arithmetic processing unit may be configured to carry out a tensor quantization step that performs a quantization operation on the output feature ring elements to produce quantized output ring elements. The ring tensor setting step, the ring tensor convolution step, the tensor nonlinear activation function step, and the tensor quantization step are executed in sequence.

Several embodiments of the present invention will be described below with reference to the drawings. For clarity, many practical details are explained in the following description. It should be understood, however, that these practical details are not intended to limit the present invention; in some embodiments they are unnecessary. In addition, to simplify the drawings, some well-known structures and elements are shown schematically, and repeated elements may be denoted by the same reference numerals.

In this document, when an element (or unit, module, etc.) is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or indirectly connected, that is, with other elements interposed between them. Only when an element is expressly said to be "directly connected" or "directly coupled" to another element is nothing interposed between them. The terms first, second, third, and so on are used only to distinguish elements and place no limit on the elements themselves; a first element could equally be renamed a second element. Moreover, the combinations of elements, units, and circuits herein are not generally known, conventional, or customary combinations in the field; whether the elements, units, or circuits themselves are known cannot determine whether their combination would be easily accomplished by a person of ordinary skill in the art.

Please refer to Fig. 1, which is a flow diagram of a deep neural network acceleration method 100 using ring tensors according to a first embodiment of the present invention. The method 100 includes a ring tensor setting step S02, a ring tensor convolution step S04, and a tensor nonlinear activation function step S06.

The ring tensor setting step S02 sets an input feature ring tensor 110 (input feature tensor) and a parameter ring tensor 120 (weight tensor) of the convolutional network; the input feature ring tensor 110 includes a plurality of input feature ring elements, and the parameter ring tensor 120 includes a plurality of parameter ring elements. The ring tensor convolution step S04 operates on the input feature ring elements of tensor 110 and the parameter ring elements of tensor 120 according to a ring-element multiplication step S042 and a ring-element addition step S044 to produce a plurality of convolution feature ring elements of a convolution feature ring tensor 130. The ring-element multiplication step S042 performs a ring multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element; each multiplication output component is obtained by a component-wise product of an input feature component and a parameter component. The ring-element addition step S044 performs a ring addition on a plurality of multiplication output ring elements to produce one of the convolution feature ring elements; each convolution feature component is obtained by a component-wise addition of the corresponding multiplication output components. The tensor nonlinear activation function step S06 executes a directional ReLU S062 (directional Rectified Linear Unit) on one of the convolution feature ring elements of tensor 130 to produce an output feature ring element. In this way, the method 100 uses a directional nonlinear activation function to maintain image quality without increasing the multiplier bit width in step S042, avoiding the complexity increase of the prior art. The details of each step are described below through more detailed embodiments.

Please refer to Figs. 1 to 4. Fig. 2 illustrates the ring-element multiplication step S042 of the method 100 of Fig. 1; Fig. 3 illustrates the ring-element addition step S044 of the method 100 of Fig. 1; and Fig. 4 is a flow diagram of the directional nonlinear activation function S062 of the method 100 of Fig. 1. As shown, the method 100 includes the ring tensor setting step S02, the ring tensor convolution step S04, and the tensor nonlinear activation function step S06.

The ring tensor setting step S02 sets the input feature ring tensor 110 and the parameter ring tensor 120 of the convolutional network; the input feature ring tensor 110 includes a plurality of input feature ring elements x, and the parameter ring tensor 120 includes a plurality of parameter ring elements g. A ring tensor is a multi-dimensional array, which may be a ring vector or a ring matrix. A ring element is an n-dimensional real vector, where n is a positive integer greater than or equal to 2. An input feature ring element x includes a plurality of input feature components x_i, where i is an integer greater than or equal to 0; a parameter ring element g includes a plurality of parameter components g_i. Both x_i and g_i are real numbers.
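One possible in-memory layout for these objects, sketched in NumPy (the convention of storing the n ring components on the last axis is an assumption for illustration, not mandated by the patent):

```python
import numpy as np

H, W, C, n = 8, 8, 4, 2                    # spatial size, ring channels, components
x_tensor = np.random.randn(H, W, C, n)     # input feature ring tensor (110)
g_tensor = np.random.randn(3, 3, C, C, n)  # parameter ring tensor (120)

x = x_tensor[0, 0, 0]     # one input feature ring element: n real components x_i
g = g_tensor[0, 0, 0, 0]  # one parameter ring element: n real components g_i
```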

The ring tensor convolution step S04 operates on the input feature ring elements x of tensor 110 and the parameter ring elements g of tensor 120 according to the ring-element multiplication step S042 and the ring-element addition step S044, producing a plurality of convolution feature ring elements cz of the convolution feature ring tensor 130. A convolution feature ring element cz includes a plurality of convolution feature components cz_j.

The ring-element multiplication step S042 performs a ring multiplication on one of the input feature ring elements x and one of the parameter ring elements g to produce a multiplication output ring element z. Each multiplication output component z_i of z is obtained by a component-wise product S0422 of an input feature component x_i and a parameter component g_i, as shown in Fig. 2. In detail, step S042 first transforms the input feature ring element and the parameter ring element through an input feature transformation matrix T_x and a parameter transformation matrix T_g to produce a transformed input feature x~ = T_x x and a transformed parameter g~ = T_g g; it then performs the component-wise product S0422 on x~ and g~ to produce the multiplication output z~ = g~ ⊙ x~; finally, the multiplication output z~ is transformed through a multiplication output transformation matrix T_z to produce the multiplication output ring element z = T_z z~. In this embodiment, the input feature transformation matrix T_x, the parameter transformation matrix T_g, and the multiplication output transformation matrix T_z are all identity matrices. The relationship between the multiplication output ring element z and the input feature ring element x can be written as equation (1):

z = G x (1),

where G denotes an isomorphism matrix associated with the parameter components g_i.

The ring-element addition step S044 performs a ring addition on a plurality of multiplication output ring elements z to produce one of the convolution feature ring elements cz; each convolution feature component cz_j is obtained by a component-wise addition S0442 of the corresponding multiplication output components z_i, as shown in Fig. 3.
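Combining steps S042 and S044, one convolution feature ring element can be sketched as a ring multiply-accumulate over a kernel window (shapes and names are illustrative assumptions):

```python
import numpy as np

def ring_conv_element(x_patch, g_kernel):
    # Component-wise ring products over a k x k window and C ring
    # channels, then ring addition (component-wise sum) over all of them.
    products = g_kernel * x_patch            # shape (k, k, C, n)
    return products.sum(axis=(0, 1, 2))      # one ring element, shape (n,)

k, C, n = 3, 4, 2
cz = ring_conv_element(np.random.randn(k, k, C, n),
                       np.random.randn(k, k, C, n))
print(cz.shape)  # (2,)
```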

The tensor nonlinear activation function step S06 executes the directional nonlinear activation function S062 on one of the convolution feature ring elements cz of tensor 130 to produce an output feature ring element f_dir(cz). In detail, the directional nonlinear activation function S062 includes a first transformation step S0622, a ReLU operation step S0624, and a second transformation step S0626. The first transformation step S0622 performs a first linear transformation V on cz to produce a linear output ring element Vcz. The ReLU operation step S0624 applies the component-wise ReLU f_cw to Vcz, producing the rectified output ring element f_cw(Vcz). The second transformation step S0626 performs a second linear transformation U on f_cw(Vcz) to produce the output feature ring element f_dir(cz). The steps S02, S04, and S06 are executed in sequence, as are the steps S0622, S0624, and S0626. The component-wise ReLU f_cw and the output feature ring element f_dir(cz) satisfy equations (2) and (3):

f_cw(y) = (max(0, y_0), ..., max(0, y_(n-1)))^t (2);
f_dir(cz) = U f_cw(V cz) (3),

where y denotes a ring element with components y_0, ..., y_(n-1). In this embodiment, either of the first linear transformation V and the second linear transformation U may be a Hadamard transform H, in which case the corresponding output feature ring element is written f_H(cz) and satisfies equation (4):

f_H(cz) = H f_cw(H cz) (4).
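A worked numeric sketch of equation (4) with the order-2 Hadamard transform (assuming the standard H_2; input values chosen for illustration):

```python
import numpy as np

H2 = np.array([[1.0, 1.0],
               [1.0, -1.0]])

def f_cw(y):
    # Equation (2): component-wise ReLU.
    return np.maximum(0.0, y)

def f_H(cz):
    # Equation (4): f_H(cz) = H f_cw(H cz).
    return H2 @ f_cw(H2 @ cz)

cz = np.array([1.0, -3.0])
# H2 @ cz = [-2, 4]; f_cw -> [0, 4]; H2 @ [0, 4] = [4, -4].
print(f_H(cz))   # [ 4. -4.]
```

A plain component-wise ReLU would map (1, -3)^t to (1, 0)^t; the directional version instead mixes the two components before and after rectification, which is how channel information is exchanged without widening the multiplier.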

In this way, the deep neural network acceleration method 100 using ring tensors places the extra first linear transformation V and second linear transformation U inside the directional nonlinear activation function S062, so the multiplier bit width in the ring-element multiplication step S042 does not increase. Moreover, the present invention can effectively exchange information between channels, achieving the highest image quality among ring algebras.

Please refer to Figs. 1 to 5. Fig. 5 is a flow diagram of a deep neural network acceleration method 100a using ring tensors according to a second embodiment of the present invention. As shown, the method 100a includes a ring tensor setting step S12, a ring tensor convolution step S14, and a tensor nonlinear activation function step S16. The ring tensor setting step S12 sets the input feature ring tensor 110 and the parameter ring tensor 120 of the convolutional network. The ring tensor convolution step S14 includes a ring-element multiplication step S142 and a ring-element addition step S144. Steps S12 and S14 act the same as steps S02 and S04 of Fig. 1, so their details are omitted. Notably, the tensor nonlinear activation function step S16 of Fig. 5 executes a directional nonlinear activation function S162 on one of the convolution feature ring elements cz of the convolution feature ring tensor 130 to produce an output feature ring element f_dir(cz). The directional nonlinear activation function S162 includes a first transformation step S1622, a normalizing linear transformation step S1624, a ReLU operation step S1626, and a second transformation step S1628. The first transformation step S1622 performs a first linear transformation V on cz to produce a first linear output ring element Vcz. The normalizing linear transformation step S1624 performs a ring multiplication (the same ring multiplication as in step S042 of Fig. 2) of Vcz with a normalizing linear transformation T to produce a second linear output ring element TVcz. The ReLU operation step S1626 applies the component-wise ReLU f_cw to TVcz, producing the rectified output ring element f_cw(TVcz). The second transformation step S1628 performs a second linear transformation U on f_cw(TVcz) to produce the output feature ring element f_dir(cz). In this embodiment, either of the first linear transformation V and the second linear transformation U may be a Hadamard transform H, and the normalizing linear transformation T may be a normalizing diagonal matrix used to normalize the components of the ring elements, although the present invention is not limited to this. In this way, the method 100a can realize the required transformation with the normalizing linear transformation T; combined with the directional nonlinear activation function, the extra transformations V and U live inside S162, so the multiplier bit width in the ring-element multiplication step S142 does not increase. Moreover, the present invention can effectively exchange information between channels, achieving the highest image quality among ring algebras.
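A minimal sketch of this second-embodiment activation, assuming a diagonal normalizing transformation T so that the ring multiplication in step S1624 reduces to a component-wise scaling (all values illustrative):

```python
import numpy as np

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

def f_dir_normalized(cz, V, t, U):
    # S1622: first linear transformation V.
    first = V @ cz
    # S1624: ring multiplication with the normalizing transformation T,
    # here a component-wise product with a diagonal scaling t.
    second = t * first
    # S1626: component-wise ReLU; S1628: second linear transformation U.
    return U @ np.maximum(0.0, second)

t = np.array([0.5, 0.25])   # illustrative normalizing diagonal
print(f_dir_normalized(np.array([1.0, -3.0]), H2, t, H2))  # [ 1. -1.]
```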

Please refer to Figs. 1, 2, 3, 4, and 6. Fig. 6 is a flow diagram of a deep neural network acceleration method 100b using ring tensors according to a third embodiment of the present invention. As shown, the method 100b includes a ring tensor setting step S02, a ring tensor convolution step S04, a tensor nonlinear activation function step S06, and a tensor quantization step S08; steps S02, S04, and S06 act the same as the corresponding steps of Fig. 1, so their details are omitted. Notably, the method 100b of Fig. 6 further includes the tensor quantization step S08, which includes a ring-element quantization step S082; step S082 performs a quantization operation Q on the output feature ring element f_dir(cz) to produce a quantized output ring element Q(cz). Steps S02, S04, S06, and S08 are executed in sequence. In this way, the method 100b quantizes the output feature ring elements f_dir(cz) produced by the tensor nonlinear activation function step S06, obtaining output features with the target bit width.

Please refer to Figs. 1, 6, and 7. Fig. 7 is a schematic diagram of a convolutional layer of a deep neural network acceleration method 100c using ring tensors according to a fourth embodiment of the present invention. As shown, the convolutional layer of the method 100c includes a ring tensor convolution step S24, a bias tensor addition step S25, a tensor nonlinear activation function step S26, and a tensor quantization step S28. The ring tensor convolution step S24 operates on an input feature ring tensor x^(l-1) and a parameter ring tensor g^(l) according to the ring-element multiplication step (as in S042 of Fig. 1) and the ring-element addition step (as in S044 of Fig. 1) to produce a plurality of convolution feature ring tensors cz^(l), where l denotes the l-th layer. The bias tensor addition step S25 adds a bias ring tensor b^(l) to the convolution feature ring tensor cz^(l) to produce the updated convolution feature ring tensor cz^(l), which is then fed to the tensor nonlinear activation function step S26. After the tensor nonlinear activation function step S26 and the tensor quantization step S28, the quantized output ring tensor Q(cz) produced by step S28 serves as the input feature ring tensor x^(l) of the next layer. It is worth noting that in other embodiments the number of updated convolution feature ring tensors cz^(l) fed to step S26 may be determined by the model structure; in other words, not every updated cz^(l) necessarily passes through step S26. In this way, the convolutional layer of the method 100c can adjust the convolution feature ring tensor cz^(l) of step S24 through the bias ring tensor b^(l) to obtain the desired output features.
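Putting the fourth-embodiment layer together, a hedged end-to-end sketch is given below ("same" padding, stride 1, and the toy quantizer are assumptions for illustration; the einsum calls apply V and U to the component axis of every ring element):

```python
import numpy as np

def ring_conv_layer(x, g, b, V, U, quantize):
    # x: (H, W, C, n) input feature ring tensor x^(l-1)
    # g: (k, k, C, Cout, n) parameter ring tensor g^(l)
    # b: (Cout, n) bias ring tensor b^(l)
    H, W, C, n = x.shape
    k = g.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0), (0, 0)))
    cz = np.empty((H, W, g.shape[3], n))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]                    # (k, k, C, n)
            # S24: ring multiply-accumulate; S25: add the bias ring tensor.
            cz[i, j] = (patch[..., None, :] * g).sum(axis=(0, 1, 2)) + b
    # S26: directional ReLU on every ring element; S28: quantization.
    act = np.einsum('pq,hwcq->hwcp', U,
                    np.maximum(0.0, np.einsum('pq,hwcq->hwcp', V, cz)))
    return quantize(act)

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
q = lambda y: np.round(y * 16) / 16          # illustrative quantizer Q
out = ring_conv_layer(np.random.randn(4, 4, 3, 2),
                      np.random.randn(3, 3, 3, 5, 2),
                      np.zeros((5, 2)), H2, H2, q)
print(out.shape)   # (4, 4, 5, 2)
```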

Please refer to Figs. 1 to 8. Fig. 8 is a block diagram of a deep neural network acceleration system 200 using ring tensors according to a fifth embodiment of the present invention. As shown, the system 200 includes a first memory 210, a second memory 220, and an arithmetic processing unit 230.

The first memory 210 stores the input feature ring tensor 110 of the convolutional network, which includes a plurality of input feature ring elements x. The second memory 220 stores the parameter ring tensor 120 of the convolutional network, which includes a plurality of parameter ring elements g.

The arithmetic processing unit 230 is electrically connected to the first memory 210 and the second memory 220. It receives the input feature ring tensor 110 and the parameter ring tensor 120 and is configured to carry out the ring tensor setting step S02, the ring tensor convolution step S04, and the tensor nonlinear activation function step S06 shown in Fig. 1, whose details are not repeated. The arithmetic processing unit 230 may be a microprocessor, a central processing unit, or an image processor, but the present invention is not limited to these. In other embodiments, the arithmetic processing unit 230 may be configured to carry out the deep neural network acceleration method 100a of Fig. 5, the method 100b of Fig. 6, or the method 100c of Fig. 7.

In this way, the deep neural network acceleration system 200 using ring tensors places the extra linear transformations inside the directional nonlinear activation functions S062 and S162, so the multiplier bit width in the ring-element multiplication step S042 does not increase. Furthermore, the present invention can effectively exchange information between channels, achieving the highest image quality among ring algebras.

Please refer to Tables 1, 2, and 3 and Fig. 9. Table 1 shows the ring-algebra properties for dimension D = 2; Tables 2 and 3 show the ring-algebra properties for dimension D = 4; and Fig. 9 shows the comparison results between the present invention and the prior art. The symbol Real denotes the original real-valued convolution and activation-function operations. The symbols R_H2 and R_H4 denote that the input feature transformation matrix T_x and the parameter transformation matrix T_g are the Hadamard transforms H_2 and H_4, which satisfy equations (5) and (6):

H_2 = [1 1; 1 -1] (5);

H_4 = H_2 ⊗ H_2 = [1 1 1 1; 1 -1 1 -1; 1 1 -1 -1; 1 -1 -1 1] (6).

The symbol R_C denotes that the isomorphism matrix G is a 2×2 rotation matrix and that T_x and T_g are derived from the complex multiplication algorithm. The symbols R_I2 and R_I4 denote that the isomorphism matrix G uses the component-wise product and that T_x and T_g are identity matrices. The symbol R_F4 denotes that the isomorphism matrix G uses circular convolution and that T_x and T_g are derived from the Fourier transform. The symbol R_Q denotes that the isomorphism matrix G uses Hamilton's quaternions. R_H2, R_C, R_I2, R_H4, R_F4, R_Q, and R_I4 are prior art, and their nonlinear function f is the component-wise ReLU f_cw; (R_I2, f_H2) and (R_I4, f_H4) are the present invention, whose nonlinear function f is the directional nonlinear activation function f_H2(x) or f_H4(x). The comparison results of Fig. 9 were obtained by simulating the four-times super-resolution (SR×4) network SR4ERNet-B17R3N1. As Tables 1, 2, and 3 and Fig. 9 show, the (R_I2, f_H2) and (R_I4, f_H4) of the present invention use the directional nonlinear activation function and, compared with the known ring algebras, require the least hardware (the lowest hardware-resource and area ratios) while providing the highest quality (the highest peak signal-to-noise ratio, PSNR), without increasing the multiplier bit width.

Table 1: Ring algebras of dimension D = 2 (matrix cells rendered as images in the original are marked "(image)")

| | R_H2 | R_C | R_I2 | (R_I2, f_H2) |
|---|---|---|---|---|
| Multiplicative identity | (1 0)^t | (1 0)^t | (1 1)^t | (1 1)^t |
| Isomorphism matrix G | (image) | 2×2 rotation matrix (image) | diag(g_0, g_1) (image) | diag(g_0, g_1) (image) |
| Nonlinear function f | f_cw | f_cw | f_cw | f_H2(x) |
| T_x, T_g | H_2 | (image) | I_2 | I_2 |
| T_z | (image) | (image) | I_2 | I_2 |
| Multiplier bit width | 2 | 3 | 2 | 2 |
| Parameter count (resource ratio) | 50% | 50% | 50% | 50% |
| Multiplication count (resource ratio) | 50% | 75% | 50% | 50% |
| Complexity | 63% | 88% | 50% | 50% |

Table 2: Ring algebras of dimension D = 4

| | R_H4 | R_F4 |
|---|---|---|
| Multiplicative identity | (1 0 0 0)^t | (1 0 0 0)^t |
| Isomorphism matrix G | (image) | (image) |
| Nonlinear function f | f_cw | f_cw |
| T_x, T_g | H_4 | (image) |
| T_z | (image) | (image) |
| Multiplier bit width | 4 | 5 |
| Parameter count (resource ratio) | 25% | 25% |
| Multiplication count (resource ratio) | 25% | 31% |
| Complexity | 39% | 47% |

Table 3: Ring algebras of dimension D = 4 (continued)

| | R_Q | R_I4 | (R_I4, f_H4) |
|---|---|---|---|
| Multiplicative identity | (1 0 0 0)^t | (1 1 1 1)^t | (1 1 1 1)^t |
| Isomorphism matrix G | Hamilton quaternion matrix (image) | diag(g_0, ..., g_3) (image) | diag(g_0, ..., g_3) (image) |
| Nonlinear function f | f_cw | f_cw | f_H4(x) |
| T_x, T_g | (image) | I_4 | I_4 |
| T_z | (image) | I_4 | I_4 |
| Multiplier bit width | 8 | 4 | 4 |
| Parameter count (resource ratio) | 25% | 25% | 25% |
| Multiplication count (resource ratio) | 50% | 25% | 25% |
| Complexity | 64% | 25% | 25% |
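Since equations (5) and (6) use the standard Sylvester-ordered Hadamard matrices (an assumption, as the original equation images are not recoverable here), they can be constructed and sanity-checked as follows:

```python
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])          # equation (5)
H4 = np.kron(H2, H2)              # equation (6): H_4 = H_2 ⊗ H_2

# Hadamard matrices satisfy H H^t = n I, so H / n inverts H.
assert np.array_equal(H2 @ H2.T, 2 * np.eye(2, dtype=int))
assert np.array_equal(H4 @ H4.T, 4 * np.eye(4, dtype=int))
print(H4)
```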

As the above embodiments show, the present invention has the following advantages. First, by using a directional nonlinear activation function it maintains image quality without increasing the multiplier bit width in the ring-element multiplication step, avoiding the complexity increase that the prior art incurs by widening the multiplier. Second, the extra linear transformations are placed inside the directional nonlinear activation function, so the multiplier bit width in the ring-element multiplication step does not increase. Third, the present invention effectively exchanges information between channels, achieving the highest image quality among ring algebras.

Although the present invention has been disclosed in the above embodiments, they are not intended to limit it. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be determined by the appended claims.

100, 100a, 100b, 100c: deep neural network accelerating method using ring tensors
S02, S12: ring tensor setting step
S04, S14, S24: ring tensor convolution operation step
S042, S142: ring element multiplication step
S0422: component-wise product
S044, S144: ring element addition step
S0442: component-wise sum
S06, S16, S26: tensor nonlinear activation function operation step
S062, S162: directional nonlinear activation function
S0622, S1622: first conversion step
S0624, S1626: rectified linear function operation step
S0626, S1628: second conversion step
S08, S28: tensor quantization step
S082: ring element quantization step
S1624: normalized linear conversion step
S25: bias tensor superposition step
110: input feature ring tensor
120: parameter ring tensor
130: convolution feature ring tensor
x: input feature ring element
x^(l-1), x^(l): input feature ring tensor
g: parameter ring element
g^(l): parameter ring tensor
cz: convolution feature ring element
cz^(l): convolution feature ring tensor
z: multiplication output ring element
T_x: input feature transformation matrix
T_g: parameter transformation matrix
T_z: multiplication output transformation matrix
[symbol, Figure 02_image002]: input feature transformation output
[symbol, Figure 02_image004]: parameter transformation output
[symbol, Figure 02_image006]: multiplication output
b^(l): bias ring tensor
200: deep neural network accelerating system using ring tensors
210: first memory
220: second memory
230: arithmetic processing unit
Real, R_H2, R_C, R_H4, R_F4, R_Q, (R_I2, f_H2), (R_I4, f_H4): symbols
PSNR: peak signal-to-noise ratio

Fig. 1 is a flow chart of the deep neural network accelerating method using ring tensors according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the ring element multiplication step of the method of Fig. 1;
Fig. 3 is a schematic diagram of the ring element addition step of the method of Fig. 1;
Fig. 4 is a flow chart of the directional nonlinear activation function of the method of Fig. 1;
Fig. 5 is a flow chart of the deep neural network accelerating method using ring tensors according to the second embodiment of the present invention;
Fig. 6 is a flow chart of the deep neural network accelerating method using ring tensors according to the third embodiment of the present invention;
Fig. 7 is a schematic diagram of a convolutional layer of the deep neural network accelerating method using ring tensors according to the fourth embodiment of the present invention;
Fig. 8 is a block diagram of the deep neural network accelerating system using ring tensors according to the fifth embodiment of the present invention; and
Fig. 9 is a schematic diagram of the comparison results between the present invention and the prior art.


Claims (10)

1. A deep neural network accelerating method using ring tensors, comprising the following steps:
a ring tensor setting step of setting an input feature ring tensor and a parameter ring tensor of a convolutional network, the input feature ring tensor comprising a plurality of input feature ring elements and the parameter ring tensor comprising a plurality of parameter ring elements;
a ring tensor convolution operation step of operating on the input feature ring elements of the input feature ring tensor and the parameter ring elements of the parameter ring tensor according to a ring element multiplication step and a ring element addition step, so as to produce a plurality of convolution feature ring elements of a convolution feature ring tensor; and
a tensor nonlinear activation function operation step of executing a directional nonlinear activation function on one of the convolution feature ring elements of the convolution feature ring tensor to generate an output feature ring element;
wherein the ring element multiplication step comprises performing a ring element multiplication on one of the input feature ring elements and one of the parameter ring elements to generate a multiplication output ring element, a multiplication output component of the multiplication output ring element being obtained by performing a component-wise product on an input feature component of the one of the input feature ring elements and a parameter component of the one of the parameter ring elements;
wherein the ring element addition step comprises performing a ring element addition on a plurality of the multiplication output ring elements to generate the one of the convolution feature ring elements, a convolution feature component of each of the convolution feature ring elements being obtained by performing a component-wise sum on a plurality of the multiplication output components of the multiplication output ring elements.

2. The deep neural network accelerating method using ring tensors of claim 1, wherein the directional nonlinear activation function comprises:
a first conversion step of performing a first linear transformation on the one of the convolution feature ring elements to generate a linear output ring element;
a rectified linear function operation step of performing a component-wise rectified linear function on the linear output ring element to generate a rectified linear output ring element; and
a second conversion step of performing a second linear transformation on the rectified linear output ring element to generate the output feature ring element.
3. The deep neural network accelerating method using ring tensors of claim 2, wherein any one of the first linear transformation and the second linear transformation is a Hadamard transform.

4. The deep neural network accelerating method using ring tensors of claim 1, wherein the directional nonlinear activation function comprises:
a first conversion step of performing a first linear transformation on the one of the convolution feature ring elements to generate a first linear output ring element;
a normalized linear conversion step of performing the ring element multiplication on the first linear output ring element and a normalized linear conversion to generate a second linear output ring element;
a rectified linear function operation step of performing a component-wise rectified linear function on the second linear output ring element to generate a rectified linear output ring element; and
a second conversion step of performing a second linear transformation on the rectified linear output ring element to generate the output feature ring element;
wherein any one of the first linear transformation and the second linear transformation is a Hadamard transform.

5. The deep neural network accelerating method using ring tensors of claim 1, further comprising:
a tensor quantization step of performing a quantization operation on the output feature ring element to generate a quantized output ring element;
wherein the ring tensor setting step, the ring tensor convolution operation step, the tensor nonlinear activation function operation step and the tensor quantization step are executed in sequence.
6. A deep neural network accelerating system using ring tensors, comprising:
a first memory for accessing an input feature ring tensor of a convolutional network, the input feature ring tensor comprising a plurality of input feature ring elements;
a second memory for accessing a parameter ring tensor of the convolutional network, the parameter ring tensor comprising a plurality of parameter ring elements; and
an arithmetic processing unit electrically connected to the first memory and the second memory, the arithmetic processing unit receiving the input feature ring tensor and the parameter ring tensor and being configured to implement operations comprising:
a ring tensor setting step of setting the input feature ring tensor and the parameter ring tensor of the convolutional network;
a ring tensor convolution operation step of operating on the input feature ring elements of the input feature ring tensor and the parameter ring elements of the parameter ring tensor according to a ring element multiplication step and a ring element addition step, so as to produce a plurality of convolution feature ring elements of a convolution feature ring tensor; and
a tensor nonlinear activation function operation step of executing a directional nonlinear activation function on one of the convolution feature ring elements of the convolution feature ring tensor to generate an output feature ring element;
wherein the ring element multiplication step comprises performing a ring element multiplication on one of the input feature ring elements and one of the parameter ring elements to generate a multiplication output ring element, a multiplication output component of the multiplication output ring element being obtained by performing a component-wise product on an input feature component of the one of the input feature ring elements and a parameter component of the one of the parameter ring elements;
wherein the ring element addition step comprises performing a ring element addition on a plurality of the multiplication output ring elements to generate the one of the convolution feature ring elements, a convolution feature component of each of the convolution feature ring elements being obtained by performing a component-wise sum on a plurality of the multiplication output components of the multiplication output ring elements.
7. The deep neural network accelerating system using ring tensors of claim 6, wherein the directional nonlinear activation function comprises:
a first conversion step of performing a first linear transformation on the one of the convolution feature ring elements to generate a linear output ring element;
a rectified linear function operation step of performing a component-wise rectified linear function on the linear output ring element to generate a rectified linear output ring element; and
a second conversion step of performing a second linear transformation on the rectified linear output ring element to generate the output feature ring element.

8. The deep neural network accelerating system using ring tensors of claim 7, wherein any one of the first linear transformation and the second linear transformation is a Hadamard transform.

9. The deep neural network accelerating system using ring tensors of claim 6, wherein the directional nonlinear activation function comprises:
a first conversion step of performing a first linear transformation on the one of the convolution feature ring elements to generate a first linear output ring element;
a normalized linear conversion step of performing the ring element multiplication on the first linear output ring element and a normalized linear conversion to generate a second linear output ring element;
a rectified linear function operation step of performing a component-wise rectified linear function on the second linear output ring element to generate a rectified linear output ring element; and
a second conversion step of performing a second linear transformation on the rectified linear output ring element to generate the output feature ring element;
wherein any one of the first linear transformation and the second linear transformation is a Hadamard transform.

10. The deep neural network accelerating system using ring tensors of claim 6, wherein the arithmetic processing unit is configured to implement a tensor quantization step of performing a quantization operation on the output feature ring element to generate a quantized output ring element;
wherein the ring tensor setting step, the ring tensor convolution operation step, the tensor nonlinear activation function operation step and the tensor quantization step are executed in sequence.
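Read as software, claims 1, 4 and 5 describe the following pipeline. This is a minimal NumPy sketch under illustrative assumptions the claims do not fix: (Cin, H, W, D) and (Cout, Cin, K, K, D) tensor layouts, valid padding, a hypothetical per-component normalization scale s, and an 8-bit symmetric uniform quantizer.

```python
import numpy as np

H2 = np.array([[1., 1.], [1., -1.]])  # Hadamard pair of claim 3

def ring_conv2d(x, g):
    # Claim 1: each ring element multiplication is a component-wise
    # product over the ring dimension D; ring element addition is a
    # component-wise sum over the convolution window and channels.
    Cout, Cin, K, _, D = g.shape
    _, H, W, _ = x.shape
    cz = np.zeros((Cout, H - K + 1, W - K + 1, D))
    for co in range(Cout):
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                prods = x[:, i:i+K, j:j+K, :] * g[co]      # ring mults
                cz[co, i, j] = prods.sum(axis=(0, 1, 2))   # ring addition
    return cz

def directional_activation(cz, s):
    # Claim 4: first Hadamard transform, a (hypothetical, per-component)
    # normalized linear conversion s applied as a ring multiplication,
    # component-wise ReLU, then the second Hadamard-type transform.
    y = np.einsum('de,...e->...d', H2, cz)
    y = np.maximum(y * s, 0.0)
    return np.einsum('de,...e->...d', H2 / 2.0, y)

def quantize(y, bits=8):
    # Claim 5: tensor quantization of the output feature ring elements,
    # sketched as symmetric uniform rounding to `bits` bits.
    scale = (2 ** (bits - 1) - 1) / max(float(np.abs(y).max()), 1e-12)
    return np.round(y * scale) / scale

# Usage with D = 2 ring elements:
x = np.random.randn(8, 16, 16, 2)    # input feature ring tensor
g = np.random.randn(4, 8, 3, 3, 2)   # parameter ring tensor
out = quantize(directional_activation(ring_conv2d(x, g), s=np.ones(2)))
```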
TW110103724A 2020-06-09 2021-02-01 Deep neural network accelerating method using ring tensors and system thereof TWI775296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/230,958 US20210383198A1 (en) 2020-06-09 2021-04-14 Deep neural network accelerating method using ring tensors and system thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063036976P 2020-06-09 2020-06-09
US63/036,976 2020-06-09

Publications (2)

Publication Number Publication Date
TW202147147A TW202147147A (en) 2021-12-16
TWI775296B true TWI775296B (en) 2022-08-21

Family

ID=78835704

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110103724A TWI775296B (en) 2020-06-09 2021-02-01 Deep neural network accelerating method using ring tensors and system thereof

Country Status (2)

Country Link
CN (1) CN113780538A (en)
TW (1) TWI775296B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3557425B1 (en) * 2018-04-19 2024-05-15 Aimotive Kft. Accelerator and system for accelerating operations
CN109344966A (en) 2019-02-15 An efficient tensorized fully-connected neural network method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201935329A (en) * 2018-02-05 2019-09-01 聯發科技股份有限公司 Asymmetric quantization of multiple-and-accumulate operations in deep learning processing
CN110399978A (en) 2019-11-01 Machine learning acceleration architecture
US20200090049A1 (en) * 2018-09-18 2020-03-19 Insilico Medicine, Inc. Subset conditioning using variational autoencoder with a learnable tensor train induced prior
WO2020060603A1 (en) * 2018-09-18 2020-03-26 Insilico Medicine, Inc. Subset conditioning using variational autoencoder with a learnable tensor train induced prior
US20200167637A1 (en) * 2018-11-26 2020-05-28 Samsung Electronics Co., Ltd. Neural network processor using dyadic weight matrix and operation method thereof

Also Published As

Publication number Publication date
CN113780538A (en) 2021-12-10
TW202147147A (en) 2021-12-16

Similar Documents

Publication Publication Date Title
Zhou Deep distributed convolutional neural networks: Universality
CN110119809B (en) Apparatus and method for performing MAC operations on asymmetrically quantized data in neural networks
Zhang et al. Adam-admm: A unified, systematic framework of structured weight pruning for dnns
Hwang Minimum uncorrelated unit noise in state-space digital filtering
WO2020048354A1 (en) Neural network model compression method and apparatus, and computer device
US6961742B2 (en) Signal adaptive filter bank optimization
Zhang et al. A unified dnn weight pruning framework using reweighted optimization methods
Yazdanpanah et al. New trinion and quaternion set-membership affine projection algorithms
TWI775296B (en) Deep neural network accelerating method using ring tensors and system thereof
CN108566237B (en) Low-complexity geometric mean decomposition precoding implementation method based on double diagonalization
Alabdulmohsin et al. A generalized lottery ticket hypothesis
WO2023045516A1 (en) Fft execution method, apparatus and device
US20210383198A1 (en) Deep neural network accelerating method using ring tensors and system thereof
Zhang et al. A unified dnn weight compression framework using reweighted optimization methods
CN114444690B (en) Migration attack method based on task augmentation
CN113838104B (en) Registration method based on multispectral and multimodal image consistency enhancement network
CN116047753A (en) Construction and optimization method of orthogonal optimization model of optical system
Triantafyllou et al. On rank and null space computation of the generalized Sylvester matrix
CN114237548A (en) Method and system for complex dot product operation based on nonvolatile memory array
CN112511170A (en) Parallel implementation method for polynomial compression in lattice code
CN114118343A (en) Layer normalization processing hardware accelerator and method applied to Transformer neural network
He et al. Efficient FPGA design for Convolutions in CNN based on FFT-pruning
CN117478278B (en) Method, device, terminal and storage medium for realizing zero-error communication
Lee et al. Block tensor train decomposition for missing data estimation
Kalantzis et al. On Quantum Algorithms for Efficient Solutions of General Classes of Structured Markov Processes

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent