TW202147147A - Deep neural network accelerating method using ring tensors and system thereof - Google Patents
- Publication number: TW202147147A
- Application number: TW110103724A
- Authority: TW (Taiwan)
- Prior art keywords: ring, tensor, loop, linear, elements
Classifications
- G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Description
The present invention relates to a deep neural network acceleration method and system, and more particularly to a deep neural network acceleration method and system using ring tensors.
When convolutional neural networks are used in image-processing applications, their computational demand is considerable. Mainstream research adopts weight pruning to remove small-valued or unneeded network parameters, reducing both the amount of computation and the required parameter storage. However, the computation flow of a pruned network becomes irregular, which burdens hardware acceleration, so pruning is ill-suited to accelerators that rely on massive parallelism.
On the other hand, some studies explore how to reduce the amount of computation while preserving the regularity of the network, including the Quaternion network, the Circulant Convolutional Neural Network (CirCNN), ShuffleNet, and the convolutional neural network using the Hadamard transform (HadaNet). These approaches share the use of feature-vector sparsity along the channel direction, reducing computation and parameter counts in a highly regular way. To remedy the quality loss caused by using fewer multiplications, information must additionally be exchanged between different channels. Conceptually, the remedies fall into two categories: extra linear transformations between channels, and permutation of channel positions. The linear-transformation approach yields better quality, but for common fixed-point arithmetic these transformations increase the bit widths of feature values and parameters, which in turn raises the complexity of the multipliers.
Hence, the market currently lacks a deep neural network acceleration method and system using ring tensors that can effectively exchange information between channels and maintain image quality without increasing the bit width of the multipliers, and practitioners in the field are seeking such a solution.
Therefore, an object of the present invention is to provide a deep neural network acceleration method and system using ring tensors which, by employing a directional nonlinear activation function, maintain image quality without increasing the bit width of the multipliers in the ring element multiplication step, thereby avoiding the complexity increase that the prior art incurs by widening the multipliers.
According to one embodiment of the method aspect of the present invention, a deep neural network acceleration method using ring tensors comprises the following steps: a ring tensor setting step, a ring tensor convolution operation step, and a tensor nonlinear activation function operation step. The ring tensor setting step sets an input feature ring tensor and a parameter ring tensor of a convolutional network, the input feature ring tensor comprising a plurality of input feature ring elements and the parameter ring tensor comprising a plurality of parameter ring elements. The ring tensor convolution operation step operates on the input feature ring elements and the parameter ring elements according to a ring element multiplication step and a ring element addition step to produce a plurality of convolutional feature ring elements of a convolutional feature ring tensor. The tensor nonlinear activation function operation step applies a directional nonlinear activation function to one of the convolutional feature ring elements to produce an output feature ring element. The ring element multiplication step comprises performing a ring element multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element, whose multiplication output components are obtained by a component-wise product of the input feature components of that input feature ring element and the parameter components of that parameter ring element. The ring element addition step comprises performing a ring element addition on a plurality of multiplication output ring elements to produce one of the convolutional feature ring elements, whose convolutional feature components are obtained by a component-wise sum of the multiplication output components of those multiplication output ring elements.
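The convolution described in the claim above can be sketched numerically. The following is a minimal illustration, not the patent's implementation: a 1-D ring tensor is modeled as an array of n-dimensional ring elements, the ring element multiplication is taken to be the component-wise product (the identity-transform case described later), and taps are combined by ring addition. The function names and shapes are assumptions for illustration.

```python
import numpy as np

def ring_mul(x, g):
    """Ring element multiplication: component-wise product of two
    n-dimensional real vectors (identity transforms assumed)."""
    return x * g

def ring_conv1d(x_ring, g_ring):
    """1-D ring tensor convolution (valid mode). Each tap is a ring
    element; taps are combined by ring addition (component-wise sum).
    x_ring: (L, n) input feature ring tensor; g_ring: (K, n) kernel."""
    L, n = x_ring.shape
    K, _ = g_ring.shape
    out = np.empty((L - K + 1, n))
    for t in range(L - K + 1):
        acc = np.zeros(n)                     # ring addition accumulator
        for k in range(K):
            acc += ring_mul(x_ring[t + k], g_ring[k])
        out[t] = acc
    return out

x = np.arange(8.0).reshape(4, 2)              # 4 ring elements, n = 2
g = np.ones((2, 2))                           # 2-tap kernel of ring elements
y = ring_conv1d(x, g)
print(y.shape)   # (3, 2)
```

Because the multiplication is component-wise, each component channel of the output is an ordinary real convolution of the corresponding component channels of the input and kernel.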
Thereby, the deep neural network acceleration method using ring tensors of the present invention can, through the directional nonlinear activation function, fold the extra linear transformations into the activation function itself, so the bit width of the multipliers in the ring element multiplication step does not increase. In addition, the present invention effectively exchanges information between channels and achieves the highest image quality among ring algebras.
Other examples of the foregoing embodiment are as follows: the directional nonlinear activation function may comprise a first transformation step, a linear rectification function (ReLU) step, and a second transformation step. The first transformation step applies a first linear transformation to the convolutional feature ring element to produce a linear output ring element. The ReLU step applies a component-wise ReLU to the linear output ring element to produce a rectified output ring element. The second transformation step applies a second linear transformation to the rectified output ring element to produce the output feature ring element.
Other examples of the foregoing embodiment are as follows: either of the first linear transformation and the second linear transformation may be a Hadamard transform.
Other examples of the foregoing embodiment are as follows: the directional nonlinear activation function may comprise a first transformation step, a normalized linear transformation step, a ReLU step, and a second transformation step. The first transformation step applies a first linear transformation to the convolutional feature ring element to produce a first linear output ring element. The normalized linear transformation step performs a ring element multiplication of the first linear output ring element with a normalized linear transformation to produce a second linear output ring element. The ReLU step applies a component-wise ReLU to the second linear output ring element to produce a rectified output ring element. The second transformation step applies a second linear transformation to the rectified output ring element to produce the output feature ring element. Either of the first and second linear transformations is a Hadamard transform.
Other examples of the foregoing embodiment are as follows: the method may further comprise a tensor quantization step, which applies a quantization operation to the output feature ring element to produce a quantized output ring element. The ring tensor setting step, the ring tensor convolution operation step, the tensor nonlinear activation function operation step, and the tensor quantization step are executed in sequence.
According to one embodiment of the structural aspect of the present invention, a deep neural network acceleration system using ring tensors comprises a first memory, a second memory, and an arithmetic processing unit. The first memory is used to access an input feature ring tensor of a convolutional network, the input feature ring tensor comprising a plurality of input feature ring elements. The second memory is used to access a parameter ring tensor of the convolutional network, the parameter ring tensor comprising a plurality of parameter ring elements. The arithmetic processing unit is electrically connected to the first memory and the second memory; it receives the input feature ring tensor and the parameter ring tensor and is configured to perform operations comprising a ring tensor setting step, a ring tensor convolution operation step, and a tensor nonlinear activation function operation step. The ring tensor setting step sets the input feature ring tensor and the parameter ring tensor of the convolutional network. The ring tensor convolution operation step operates on the input feature ring elements and the parameter ring elements according to a ring element multiplication step and a ring element addition step to produce a plurality of convolutional feature ring elements of a convolutional feature ring tensor. The tensor nonlinear activation function operation step applies a directional nonlinear activation function to one of the convolutional feature ring elements to produce an output feature ring element. The ring element multiplication step comprises performing a ring element multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element, whose multiplication output components are obtained by a component-wise product of the corresponding input feature components and parameter components. The ring element addition step comprises performing a ring element addition on a plurality of multiplication output ring elements to produce one of the convolutional feature ring elements, whose convolutional feature components are obtained by a component-wise sum of the multiplication output components of those multiplication output ring elements.
Thereby, the deep neural network acceleration system using ring tensors of the present invention employs a directional nonlinear activation function and can place the extra linear transformations inside that activation function, so the bit width of the multipliers in the ring element multiplication step does not increase. Furthermore, the present invention effectively exchanges information between channels and achieves the highest image quality among ring algebras.
Other examples of the foregoing embodiment are as follows: the directional nonlinear activation function may comprise a first transformation step, a ReLU step, and a second transformation step. The first transformation step applies a first linear transformation to the convolutional feature ring element to produce a linear output ring element. The ReLU step applies a component-wise ReLU to the linear output ring element to produce a rectified output ring element. The second transformation step applies a second linear transformation to the rectified output ring element to produce the output feature ring element.
Other examples of the foregoing embodiment are as follows: either of the first linear transformation and the second linear transformation may be a Hadamard transform.
Other examples of the foregoing embodiment are as follows: the directional nonlinear activation function may comprise a first transformation step, a normalized linear transformation step, a ReLU step, and a second transformation step. The first transformation step applies a first linear transformation to the convolutional feature ring element to produce a first linear output ring element. The normalized linear transformation step performs a ring element multiplication of the first linear output ring element with a normalized linear transformation to produce a second linear output ring element. The ReLU step applies a component-wise ReLU to the second linear output ring element to produce a rectified output ring element. The second transformation step applies a second linear transformation to the rectified output ring element to produce the output feature ring element. Either of the first and second linear transformations is a Hadamard transform.
Other examples of the foregoing embodiment are as follows: the arithmetic processing unit is configured to perform a tensor quantization step, which applies a quantization operation to the output feature ring element to produce a quantized output ring element. The ring tensor setting step, the ring tensor convolution operation step, the tensor nonlinear activation function operation step, and the tensor quantization step are executed in sequence.
Several embodiments of the present invention will be described below with reference to the drawings. For the sake of clarity, many practical details are set forth in the following description. It should be understood, however, that these practical details are not intended to limit the invention; in some embodiments of the present invention they are unnecessary. In addition, to simplify the drawings, some well-known structures and elements are shown schematically, and repeated elements may be denoted by the same reference numerals.
Furthermore, when an element (or unit, module, etc.) is said herein to be "connected" to another element, it may be directly connected to the other element, or indirectly connected, meaning that other elements are interposed between the two. Only when an element is expressly said to be "directly connected" to another is no element interposed between them. The terms first, second, third, and so on merely distinguish different elements and impose no limitation on the elements themselves; a first element may therefore also be renamed a second element. Moreover, the combinations of elements/units/circuits herein are not combinations that are generally known, conventional, or customary in this field, and whether an element/unit/circuit itself is known does not determine whether its combination could easily be accomplished by a person of ordinary skill in the art.
Please refer to Fig. 1, which is a flowchart of a deep neural network acceleration method 100 using ring tensors according to a first embodiment of the present invention. The method 100 comprises a ring tensor setting step S02, a ring tensor convolution operation step S04, and a tensor nonlinear activation function operation step S06.
The ring tensor setting step S02 sets an input feature ring tensor 110 (input feature tensor) and a parameter ring tensor 120 (weight tensor) of the convolutional network; the input feature ring tensor 110 comprises a plurality of input feature ring elements and the parameter ring tensor 120 comprises a plurality of parameter ring elements. The ring tensor convolution operation step S04 operates on these input feature ring elements and parameter ring elements according to a ring element multiplication step S042 and a ring element addition step S044 to produce a plurality of convolutional feature ring elements of a convolutional feature ring tensor 130. The ring element multiplication step S042 comprises performing a ring element multiplication on one of the input feature ring elements and one of the parameter ring elements to produce a multiplication output ring element; a multiplication output component of the multiplication output ring element is obtained by a component-wise product of an input feature component of the input feature ring element and a parameter component of the parameter ring element. The ring element addition step S044 comprises performing a ring element addition on a plurality of multiplication output ring elements to produce one of the convolutional feature ring elements; each convolutional feature component is obtained by component-wise addition of the multiplication output components of those multiplication output ring elements. In addition, the tensor nonlinear activation function operation step S06 applies a directional nonlinear activation function S062 (directional Rectified Linear Unit; directional ReLU) to one of the convolutional feature ring elements of the convolutional feature ring tensor 130 to produce an output feature ring element. Thereby, the method 100 uses a directional nonlinear activation function to maintain image quality without increasing the bit width of the multipliers in the ring element multiplication step S042, avoiding the complexity increase that the prior art incurs by widening the multipliers. The details of each step are described below through more detailed embodiments.
Please refer to Figs. 1 to 4 together, in which Fig. 2 illustrates the ring element multiplication step S042 of the method 100 of Fig. 1; Fig. 3 illustrates the ring element addition step S044 of the method 100 of Fig. 1; and Fig. 4 is a flowchart of the directional nonlinear activation function S062 of the method 100 of Fig. 1. As shown, the method 100 comprises the ring tensor setting step S02, the ring tensor convolution operation step S04, and the tensor nonlinear activation function operation step S06.
The ring tensor setting step S02 sets the input feature ring tensor 110 and the parameter ring tensor 120 of the convolutional network; the input feature ring tensor 110 comprises a plurality of input feature ring elements x, and the parameter ring tensor 120 comprises a plurality of parameter ring elements g. A ring tensor is a multi-dimensional array, which may be a ring vector or a ring matrix. A ring element is an n-dimensional real vector, where n is a positive integer greater than or equal to 2. The input feature ring element x comprises a plurality of input feature components x_i, where i is an integer greater than or equal to 0. The parameter ring element g comprises a plurality of parameter components g_i. Both the input feature components x_i and the parameter components g_i are real numbers.
The ring tensor convolution operation step S04 operates on the input feature ring elements x of the input feature ring tensor 110 and the parameter ring elements g of the parameter ring tensor 120 according to the ring element multiplication step S042 and the ring element addition step S044, producing a plurality of convolutional feature ring elements cz of the convolutional feature ring tensor 130. Each convolutional feature ring element cz comprises a plurality of convolutional feature components cz_j.
The ring element multiplication step S042 comprises performing a ring multiplication on one of the input feature ring elements x and one of the parameter ring elements g to produce a multiplication output ring element z. A multiplication output component z_i of the multiplication output ring element z is obtained by performing a component-wise product S0422 of an input feature component x_i of that input feature ring element x and a parameter component g_i of that parameter ring element g, as shown in Fig. 2. In detail, the ring element multiplication step S042 first transforms the input feature ring element x and the parameter ring element g through an input feature transformation matrix T_x and a parameter transformation matrix T_g, respectively, to produce a transformed input feature output and a transformed parameter output; it then performs the component-wise product S0422 on these two transformed outputs to produce a multiplication output; finally, it transforms the multiplication output through a multiplication output transformation matrix T_z to produce the multiplication output ring element z. In this embodiment, the input feature transformation matrix T_x, the parameter transformation matrix T_g, and the multiplication output transformation matrix T_z are all identity matrices. The relationship between the multiplication output ring element z and the input feature ring element x can be expressed by the following equation (1): z = Gx (1), where G denotes an isomorphism matrix associated with the parameter components g_i.
The ring element addition step S044 comprises performing a ring addition on a plurality of multiplication output ring elements z to produce one of the convolutional feature ring elements cz; each convolutional feature component cz_j of a convolutional feature ring element cz is obtained by performing a component-wise sum S0442 of the multiplication output components z_i of those multiplication output ring elements z, as shown in Fig. 3.
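As a hedged sketch of step S042 in its general form, the ring element multiplication can be written as z = T_z((T_x x) ∘ (T_g g)), where ∘ is the component-wise product S0422; with identity transforms, as in this embodiment, it reduces to z = x ∘ g, which matches equation (1) with G = diag(g). The function name and the diagonal form of G shown here are illustrative.

```python
import numpy as np

def ring_mul(x, g, Tx=None, Tg=None, Tz=None):
    """General ring element multiplication: transform both operands,
    take the component-wise product S0422, transform the result.
    With identity transforms this reduces to z = x * g."""
    n = x.shape[0]
    I = np.eye(n)
    Tx = I if Tx is None else Tx
    Tg = I if Tg is None else Tg
    Tz = I if Tz is None else Tz
    return Tz @ ((Tx @ x) * (Tg @ g))

x = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0, 0.0])
z = ring_mul(x, g)            # identity-transform embodiment
G = np.diag(g)                # isomorphism matrix of equation (1)
print(np.allclose(z, G @ x))  # True
```

The check at the end confirms that the component-wise product is exactly the linear map z = Gx of equation (1) in the identity case.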
The tensor nonlinear activation function operation step S06 applies the directional nonlinear activation function S062 to one of the convolutional feature ring elements cz of the convolutional feature ring tensor 130 to produce an output feature ring element f_dir(cz). In detail, the directional nonlinear activation function S062 comprises a first transformation step S0622, a linear rectification function step S0624, and a second transformation step S0626. The first transformation step S0622 applies a first linear transformation V to the convolutional feature ring element cz to produce a linear output ring element Vcz. The linear rectification function step S0624 applies a component-wise ReLU f_cw to the linear output ring element Vcz to produce a rectified output ring element f_cw(Vcz). The second transformation step S0626 applies a second linear transformation U to f_cw(Vcz) to produce the output feature ring element f_dir(cz). Furthermore, the ring tensor setting step S02, the ring tensor convolution operation step S04, and the tensor nonlinear activation function operation step S06 are executed in sequence, as are the first transformation step S0622, the linear rectification function step S0624, and the second transformation step S0626. The component-wise ReLU f_cw and the output feature ring element f_dir(cz) satisfy the following equations (2) and (3): f_cw(y) = (max(0, y_0), ..., max(0, y_(n-1)))^t (2); f_dir(cz) = U f_cw(V cz) (3), where y denotes a ring element with components y_i, i = 0 to n-1. In this embodiment, either of the first linear transformation V and the second linear transformation U may be a Hadamard transform H, in which case the corresponding output feature ring element, denoted f_H(cz), satisfies equation (4): f_H(cz) = H f_cw(H cz) (4).
Thereby, the deep neural network acceleration method 100 using ring tensors of the present invention can, through the directional nonlinear activation function, place the extra first linear transformation V and second linear transformation U inside the directional nonlinear activation function S062, so the bit width of the multipliers in the ring element multiplication step S042 does not increase. In addition, the present invention effectively exchanges information between channels and achieves the highest image quality among ring algebras.
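Equation (4) can be sketched as follows. Unnormalized Sylvester-construction Hadamard matrices are assumed here, since the patent's scaling convention is not reproduced in this text; the function names are illustrative.

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def f_cw(y):
    """Component-wise ReLU, equation (2)."""
    return np.maximum(0.0, y)

def f_dir(cz, U, V):
    """Directional nonlinear activation, equation (3): U f_cw(V cz)."""
    return U @ f_cw(V @ cz)

H = hadamard(4)
cz = np.array([1.0, -2.0, 3.0, -4.0])
out = f_dir(cz, H, H)          # equation (4): H f_cw(H cz)
print(out)   # [ 10. -10.  10. -10.]
```

Note that both transforms stay outside the multiply-accumulate datapath, which is why the multiplier bit width in step S042 is unaffected.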
Please refer to Figs. 1 to 5 together, in which Fig. 5 is a flowchart of a deep neural network acceleration method 100a using ring tensors according to a second embodiment of the present invention. As shown, the method 100a comprises a ring tensor setting step S12, a ring tensor convolution operation step S14, and a tensor nonlinear activation function operation step S16. The ring tensor setting step S12 comprises setting the input feature ring tensor 110 and the parameter ring tensor 120 of the convolutional network. The ring tensor convolution operation step S14 comprises a ring element multiplication step S142 and a ring element addition step S144. The steps S12 and S14 operate in the same manner as the steps S02 and S04 of Fig. 1, respectively, so the details are omitted. Notably, the tensor nonlinear activation function operation step S16 of Fig. 5 applies a directional nonlinear activation function S162 to one of the convolutional feature ring elements cz of the convolutional feature ring tensor 130 to produce an output feature ring element f_dir(cz). The directional nonlinear activation function S162 comprises a first transformation step S1622, a normalized linear transformation step S1624, a linear rectification function step S1626, and a second transformation step S1628. The first transformation step S1622 applies a first linear transformation V to the convolutional feature ring element cz to produce a first linear output ring element Vcz. The normalized linear transformation step S1624 performs a ring element multiplication (the same ring element multiplication as in step S042 of Fig. 2) of the first linear output ring element Vcz with a normalized linear transformation T to produce a second linear output ring element VczT. The linear rectification function step S1626 applies the component-wise ReLU f_cw to the second linear output ring element VczT to produce a rectified output ring element f_cw(VczT). The second transformation step S1628 applies a second linear transformation U to f_cw(VczT) to produce the output feature ring element f_dir(cz). In this embodiment, either of the first linear transformation V and the second linear transformation U may be a Hadamard transform H, and the normalized linear transformation T may be a normalizing diagonal matrix used to normalize the components of the ring element, although the invention is not limited thereto. Thereby, the deep neural network acceleration method 100a using ring tensors of the present invention can use the normalized linear transformation T to realize the desired transformation; combined with the directional nonlinear activation function, the extra first linear transformation V and second linear transformation U are placed inside the directional nonlinear activation function S162, so the bit width of the multipliers in the ring element multiplication step S142 does not increase. In addition, the present invention effectively exchanges information between channels and achieves the highest image quality among ring algebras.
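A minimal sketch of the second-embodiment activation S162, assuming the normalized linear transformation T is a diagonal matrix applied by ring element multiplication, i.e. a component-wise scaling, between the first linear transform and the component-wise ReLU. The 2×2 Hadamard choice for V and U and the scaling values are illustrative.

```python
import numpy as np

def f_dir_norm(cz, U, V, t):
    """Second-embodiment directional activation S162: a normalizing
    diagonal transform (given here as its diagonal vector t) is applied
    as a component-wise scaling of V @ cz before the ReLU."""
    y = (V @ cz) * t            # S1622 + S1624: V cz ring-multiplied by T
    y = np.maximum(0.0, y)      # S1626: component-wise ReLU f_cw
    return U @ y                # S1628: second linear transformation U

V = U = np.array([[1.0, 1.0], [1.0, -1.0]])   # 2x2 Hadamard (illustrative)
t = np.array([0.5, 0.5])                      # illustrative normalization
out = f_dir_norm(np.array([3.0, -1.0]), U, V, t)
print(out)   # [ 3. -1.]
```

Because T is applied as a component-wise scaling inside the activation, it also stays out of the convolution datapath and does not widen the multipliers of step S142.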
Please refer to Figs. 1, 2, 3, 4, and 6 together, in which Fig. 6 is a flowchart of a deep neural network acceleration method 100b using ring tensors according to a third embodiment of the present invention. As shown, the method 100b comprises a ring tensor setting step S02, a ring tensor convolution operation step S04, a tensor nonlinear activation function operation step S06, and a tensor quantization step S08, in which the steps S02, S04, and S06 operate in the same manner as the corresponding steps of Fig. 1, so the details are omitted. Notably, the method 100b of Fig. 6 further comprises the tensor quantization step S08, which comprises a ring element quantization step S082: a quantization operation Q is applied to the output feature ring element f_dir(cz) to produce a quantized output ring element Q(cz). The ring tensor setting step S02, the ring tensor convolution operation step S04, the tensor nonlinear activation function operation step S06, and the tensor quantization step S08 are executed in sequence. Thereby, the method 100b quantizes the output feature ring elements f_dir(cz) produced by step S06, obtaining output features with a target bit width.
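The patent does not specify the form of the quantization operation Q; the following sketch assumes a plain uniform signed fixed-point quantizer purely for illustration, with the bit-width parameters chosen arbitrarily.

```python
import numpy as np

def quantize(x, frac_bits=4, total_bits=8):
    """Illustrative uniform fixed-point quantizer Q: round each
    component of the output feature ring element to a signed
    fixed-point grid with the given fractional and total bit widths."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

f_out = np.array([0.131, -2.07, 30.0])
print(quantize(f_out))   # [ 0.125  -2.0625  7.9375]
```

The clipping bound shows how the target bit width mentioned in the text constrains the representable output features.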
Please refer to Figs. 1, 6, and 7 together, in which Fig. 7 is a schematic diagram of a convolutional layer of a deep neural network acceleration method 100c using ring tensors according to a fourth embodiment of the present invention. As shown, the convolutional layer of the method 100c comprises a ring tensor convolution operation step S24, a bias tensor superposition step S25, a tensor nonlinear activation function operation step S26, and a tensor quantization step S28. The ring tensor convolution operation step S24 operates on the input feature ring tensor x^(l-1) and the parameter ring tensor g^(l) according to the ring element multiplication step (the same as step S042 of Fig. 1) and the ring element addition step (the same as step S044 of Fig. 1), producing a plurality of convolutional feature ring tensors cz^(l), where l denotes the l-th layer. The bias tensor superposition step S25 superposes a bias ring tensor b^(l) on the convolutional feature ring tensor cz^(l) to produce an updated convolutional feature ring tensor cz^(l), which is then fed to the tensor nonlinear activation function operation step S26. After the tensor nonlinear activation function operation step S26 and the tensor quantization step S28, the quantized output ring tensor Q(cz) produced by step S28 serves as the input feature ring tensor x^(l) of the next layer. It is also worth mentioning that in other embodiments the number of updated convolutional feature ring tensors cz^(l) fed to step S26 may be determined by the model structure; in other words, not every updated convolutional feature ring tensor cz^(l) necessarily passes through step S26. Thereby, the convolutional layer of the method 100c can adjust the convolutional feature ring tensor cz^(l) of step S24 through the bias ring tensor b^(l), obtaining the desired output features.
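The full layer of Fig. 7 (S24 → S25 → S26 → S28) can be sketched end to end under the same illustrative assumptions used above: identity T_x/T_g so the ring multiplication is a component-wise product, a Hadamard directional activation, and a toy fixed-point quantizer. Shapes and the 1-D setting are assumptions, not the patent's architecture.

```python
import numpy as np

def ring_conv_layer(x, g, b, H):
    """One convolutional layer of the fourth embodiment, sketched for a
    1-D input: ring tensor convolution (S24), bias ring tensor
    superposition (S25), directional activation H f_cw(H cz) (S26),
    then a toy quantizer Q (S28).
    x: (L, n) input, g: (K, n) kernel, b: (n,) bias, H: (n, n)."""
    L, n = x.shape
    K = g.shape[0]
    cz = np.stack([(x[t:t + K] * g).sum(axis=0)      # S24: ring conv
                   for t in range(L - K + 1)])
    cz = cz + b                                      # S25: bias superposition
    act = (H @ np.maximum(0.0, cz @ H.T).T).T        # S26: H f_cw(H cz) per element
    return np.round(act * 16) / 16                   # S28: toy quantizer Q

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
x = np.ones((3, 2)); g = np.ones((2, 2)); b = np.zeros(2)
y = ring_conv_layer(x, g, b, H2)                     # becomes x^(l) of the next layer
print(y.shape)   # (2, 2)
```

The returned tensor plays the role of Q(cz), i.e. the input feature ring tensor x^(l) of the following layer.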
Please refer to Figs. 1 to 8 together, in which Fig. 8 is a block diagram of a deep neural network acceleration system 200 using ring tensors according to a fifth embodiment of the present invention. As shown, the system 200 comprises a first memory 210, a second memory 220, and an arithmetic processing unit 230. The first memory 210 is used to access the input feature ring tensor 110 of the convolutional network, which comprises a plurality of input feature ring elements x. The second memory 220 is used to access the parameter ring tensor 120 of the convolutional network, which comprises a plurality of parameter ring elements g.
The arithmetic processing unit 230 is electrically connected to the first memory 210 and the second memory 220. It receives the input feature ring tensor 110 and the parameter ring tensor 120 and is configured to carry out the ring tensor setting step S02, the ring tensor convolution operation step S04, and the tensor nonlinear activation function operation step S06, as shown in Fig. 1; the details are not repeated here. The arithmetic processing unit 230 may be a microprocessor, a central processing unit, or an image processor, but the invention is not limited thereto. In other embodiments, the arithmetic processing unit 230 may be configured to carry out the deep neural network acceleration method 100a of Fig. 5, the method 100b of Fig. 6, and the method 100c of Fig. 7.
Thereby, the deep neural network acceleration system 200 using ring tensors of the present invention employs a directional nonlinear activation function and can place the extra linear transformations inside the directional nonlinear activation functions S062 and S162, so the bit width of the multipliers in the ring element multiplication step S042 does not increase. Furthermore, the present invention effectively exchanges information between channels and achieves the highest image quality among ring algebras.
請一併參閱表一、表二、表三及第9圖,其中表一顯示維度D(Dimension)為2之環代數特性;表二顯示維度D為4之環代數特性之一部分;表三顯示維度D為4之環代數特性之另一部分;及第9圖係繪示本發明與習知技術之比較結果的示意圖。其中符號Real代表原始實數之卷積運算與激活函數運算。符號RH2 、RH4 代表輸入特徵轉換矩陣Tx 與參數轉換矩陣Tg 為哈達瑪轉換H 2 、H 4 ,哈達瑪轉換H 2 、H 4 分別符合下列式子(5)、(6):H 2 =(5);H 4 =(6)。Please refer to Table 1, Table 2, Table 3 and Figure 9 together. Table 1 shows the cyclic algebraic properties of dimension D (Dimension) of 2; Table 2 shows part of the cyclic algebraic properties of dimension D of 4; Table 3 shows The dimension D is another part of the ring algebraic property of 4; and FIG. 9 is a schematic diagram showing the comparison result between the present invention and the prior art. The symbol Real represents the convolution operation and activation function operation of the original real number. The symbols R H2 and R H4 represent the input feature transformation matrix T x and the parameter transformation matrix T g are Hadamard transformations H 2 , H 4 , and Hadamard transformations H 2 , H 4 conform to the following equations (5) and (6) respectively: H 2 = (5); H 4 = (6).
The symbol RC denotes that the isomorphism matrix G is a 2×2 rotation matrix, and the input feature transform matrix Tx and the parameter transform matrix Tg are derived from the complex multiplication algorithm. The symbols RI2 and RI4 denote that the isomorphism matrix G uses the component product, and Tx and Tg are identity matrices. The symbol RF4 denotes that the isomorphism matrix G uses circular convolution, and Tx and Tg are derived from the Fourier transform. The symbol RQ denotes that the isomorphism matrix G uses Hamilton's quaternions. RH2, RC, RI2, RH4, RF4, RQ, and RI4 are prior art, whose nonlinear function f is in every case the component-wise rectified linear function fcw; the symbols (RI2, fH2) and (RI4, fH4) denote the present invention, whose nonlinear function f is the directional nonlinear activation function fH2(x) or fH4(x). The comparison results in FIG. 9 were obtained by simulation with the four-times super-resolution (SR×4) network model SR4ERNet-B17R3N1. As can be seen from Table 1, Table 2, Table 3, and FIG. 9, the (RI2, fH2) and (RI4, fH4) configurations of the present invention use directional nonlinear activation functions; compared with the known ring algebras, the present invention requires the least hardware (i.e., the lowest hardware resource ratio and area ratio) and provides the highest quality (i.e., the highest peak signal-to-noise ratio, PSNR), without increasing the bit width of the multipliers.
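Based on the symbols introduced above (Tx, Tg, Tz and the component product), a hedged sketch of ring element multiplication might look like the following; the exact composition of transforms in the patent may differ. For the ring RI2 of the invention, Tx and Tg are identity matrices and the isomorphism uses the component product, so ring multiplication reduces to an element-wise product.

```python
import numpy as np

def ring_multiply(x, g, Tx, Tg, Tz):
    """Sketch of ring element multiplication (step S042).

    Transform both operands, take the component product, then apply
    the multiplication output transform. This is an illustrative
    composition; the patent's exact formulation may differ.
    """
    x_t = Tx @ x          # input feature conversion output
    g_t = Tg @ g          # parameter conversion output
    z = x_t * g_t         # component product (step S0422)
    return Tz @ z         # multiplication output

# For RI2, all three transform matrices are identity, so the ring
# product is just the element-wise product of the two ring elements.
I2 = np.eye(2)
x = np.array([2.0, 3.0])
g = np.array([4.0, -1.0])
print(ring_multiply(x, g, I2, I2, I2))  # [ 8. -3.]
```

With identity transforms the multipliers operate directly on the untransformed operands, which is why this configuration keeps the multiplier bit width minimal.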
Table 1 (table contents not reproduced in this text)
As can be seen from the above embodiments, the present invention has the following advantages. First, by using a directional nonlinear activation function, image quality is maintained without increasing the bit width of the multipliers in the ring element multiplication step, avoiding the increased complexity caused in the prior art by widening the multipliers. Second, the directional nonlinear activation function allows the extra linear transforms to be folded into the activation itself, so the bit width of the multipliers in the ring element multiplication step is not increased. Third, the present invention effectively exchanges information between channels and achieves the highest image quality among the ring algebras.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the present invention shall therefore be defined by the appended claims.
100, 100a, 100b, 100c: deep neural network accelerating method using ring tensors
S02, S12: ring tensor setting step
S04, S14, S24: ring tensor convolution step
S042, S142: ring element multiplication step
S0422: component product
S044, S144: ring element addition step
S0442: component sum
S06, S16, S26: tensor nonlinear activation function step
S062, S162: directional nonlinear activation function
S0622, S1622: first conversion step
S0624, S1626: linear rectification function step
S0626, S1628: second conversion step
S08, S28: tensor quantization step
S082: ring element quantization step
S1624: normalized linear conversion step
S25: bias tensor superposition step
110: input feature ring tensor
120: parameter ring tensor
130: convolution feature ring tensor
x: input feature ring element
x(l-1), x(l): input feature ring tensors
g: parameter ring element
g(l): parameter ring tensor
cz: convolution feature ring element
cz(l): convolution feature ring tensor
z: multiplication output ring element
Tx: input feature transform matrix
Tg: parameter transform matrix
Tz: multiplication output transform matrix
(symbols not reproduced in this text): input feature conversion output, parameter conversion output, multiplication output
b(l): bias ring tensor
200: deep neural network accelerating system using ring tensors
210: first memory
220: second memory
230: arithmetic processing unit
Real, RH2, RC, RH4, RF4, RQ, (RI2, fH2), (RI4, fH4): symbols
PSNR: peak signal-to-noise ratio
FIG. 1 is a schematic flowchart of a deep neural network accelerating method using ring tensors according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the ring element multiplication step of the deep neural network accelerating method using ring tensors of FIG. 1;
FIG. 3 is a schematic diagram of the ring element addition step of the deep neural network accelerating method using ring tensors of FIG. 1;
FIG. 4 is a schematic flowchart of the directional nonlinear activation function of the deep neural network accelerating method using ring tensors of FIG. 1;
FIG. 5 is a schematic flowchart of a deep neural network accelerating method using ring tensors according to a second embodiment of the present invention;
FIG. 6 is a schematic flowchart of a deep neural network accelerating method using ring tensors according to a third embodiment of the present invention;
FIG. 7 is a schematic diagram of a convolutional layer of a deep neural network accelerating method using ring tensors according to a fourth embodiment of the present invention;
FIG. 8 is a block diagram of a deep neural network accelerating system using ring tensors according to a fifth embodiment of the present invention; and
FIG. 9 is a schematic diagram of comparison results between the present invention and the prior art.
100: deep neural network accelerating method using ring tensors
S02: ring tensor setting step
S04: ring tensor convolution step
S042: ring element multiplication step
S044: ring element addition step
S06: tensor nonlinear activation function step
S062: directional nonlinear activation function
110: input feature ring tensor
120: parameter ring tensor
130: convolution feature ring tensor
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/230,958 US20210383198A1 (en) | 2020-06-09 | 2021-04-14 | Deep neural network accelerating method using ring tensors and system thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063036976P | 2020-06-09 | 2020-06-09 | |
US63/036,976 | 2020-06-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202147147A true TW202147147A (en) | 2021-12-16 |
TWI775296B TWI775296B (en) | 2022-08-21 |
Family
ID=78835704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110103724A TWI775296B (en) | 2020-06-09 | 2021-02-01 | Deep neural network accelerating method using ring tensors and system thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113780538A (en) |
TW (1) | TWI775296B (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10977001B2 (en) * | 2018-02-05 | 2021-04-13 | Mediatek Inc. | Asymmetric quantization of multiple-and-accumulate operations in deep learning processing |
EP3557425B1 (en) * | 2018-04-19 | 2024-05-15 | Aimotive Kft. | Accelerator and system for accelerating operations |
US10769526B2 (en) * | 2018-04-24 | 2020-09-08 | Intel Corporation | Machine learning accelerator architecture |
CN109344966A (en) * | 2018-07-26 | 2019-02-15 | 广东工业大学 | A kind of method of the full Connection Neural Network of efficient tensorization |
US11593660B2 (en) * | 2018-09-18 | 2023-02-28 | Insilico Medicine Ip Limited | Subset conditioning using variational autoencoder with a learnable tensor train induced prior |
US11562046B2 (en) * | 2018-11-26 | 2023-01-24 | Samsung Electronics Co., Ltd. | Neural network processor using dyadic weight matrix and operation method thereof |
-
2021
- 2021-02-01 TW TW110103724A patent/TWI775296B/en active
- 2021-02-01 CN CN202110134690.XA patent/CN113780538A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
TWI775296B (en) | 2022-08-21 |
CN113780538A (en) | 2021-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhou | Deep distributed convolutional neural networks: Universality | |
Xie et al. | Fast and efficient second-order method for training radial basis function networks | |
CN110119809B (en) | Apparatus and method for performing MAC operations on asymmetrically quantized data in neural networks | |
Dunne | Functional determinants in quantum field theory | |
WO2020044527A1 (en) | Information processing device | |
CN109767000A (en) | Neural network convolution method and device based on Winograd algorithm | |
Sun et al. | Weighted local linear composite quantile estimation for the case of general error distributions | |
Zhang et al. | A unified dnn weight pruning framework using reweighted optimization methods | |
CN110874636A (en) | Neural network model compression method and device and computer equipment | |
US20030005007A1 (en) | Signal adaptive filter bank optimization | |
WO2023065983A1 (en) | Computing apparatus, neural network processing device, chip, and data processing method | |
CN111796797A (en) | Method and device for realizing multiplication acceleration of polynomial on ring by using AI accelerator | |
Rusu et al. | Learning fast sparsifying transforms | |
CN114118344A (en) | Hardware accelerator applied to Transformer neural network and calculation method thereof | |
TWI775296B (en) | Deep neural network accelerating method using ring tensors and system thereof | |
CN113838104B (en) | Registration method based on multispectral and multimodal image consistency enhancement network | |
CN108566237B (en) | Low-complexity geometric mean decomposition precoding implementation method based on double diagonalization | |
US20220036190A1 (en) | Neural network compression device | |
WO2023045516A1 (en) | Fft execution method, apparatus and device | |
Zhang et al. | A unified dnn weight compression framework using reweighted optimization methods | |
CN116047753A (en) | Construction and optimization method of orthogonal optimization model of optical system | |
CN114237548A (en) | Method and system for complex dot product operation based on nonvolatile memory array | |
Boyd | A lag-averaged generalization of Euler's method for accelerating series | |
He et al. | Efficient FPGA design for Convolutions in CNN based on FFT-pruning | |
CN114118343A (en) | Layer normalization processing hardware accelerator and method applied to Transformer neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent |