TW201232429A - High-speed hardware back-propagation and recurrent type artificial neural network with flexible architecture

Info

Publication number
TW201232429A
Authority
TW
Taiwan
Prior art keywords
layer
output
receives
multiplexer
neural network
Application number
TW100101585A
Other languages
Chinese (zh)
Other versions
TWI525558B (en)
Inventor
Meng-Shen Cai
Yan-Zhi Ye
Ya-Yu Zhan
Original Assignee
Univ Nat Taipei Technology
Application filed by Univ Nat Taipei Technology
Priority to TW100101585A
Publication of TW201232429A
Application granted
Publication of TWI525558B


Landscapes

  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention uses a field-programmable gate array (FPGA) to implement a hardware artificial neural network, exploiting the parallelism and high speed of hardware circuits to perform both the learning and recall operations of an artificial neural network. Building on a serial ring hardware architecture, the invention computes three-layer and four-layer back-propagation networks as well as recurrent networks, and realizes piecewise-linear activation functions directly in hardware. The activation functions include the sigmoid function and the hyperbolic tangent function, selectable by the user. A pipelined design accelerates the output-layer error calculation in the backward pass, while the hidden-layer error calculation reuses the serial ring architecture. Because the development is completed entirely in hardware, the result is more portable and faster than software implementations, making it suitable for low-end embedded systems.

Description

VI. Description of the Invention

[Technical field to which the invention pertains]

The present invention relates to an artificial neural network, a network of interconnected neurons that imitates the characteristics of biological neural networks. Being inherently a parallel computation, it is well suited to the parallel processing and high speed of hardware circuits, which the invention uses to complete the learning and recall operations of the network.

[Prior art]

Prior art 1: D. Hammerstrom proposed the X1 architecture, also known as CNAPS (Connected Network of Adapted Processors System), a single-instruction multiple-data (SIMD) structure in which all nodes execute the same instruction in the same clock cycle, as shown in the figure. Its advantage is that the accumulator and multiplier designed into each processing node can be driven over a shared bus, making the overall computation more efficient. However, this architecture only completes the neural-network portion of the computation: it provides no activation-function block of any kind, nor a complete learning function. Moreover, CNAPS lacks precision in many applications, and since the inputs, weights, and transfer functions of a typical neural network usually use a floating-point format, converting the computation format introduces many errors.

Prior art 2 is a two-dimensional systolic array architecture (2D Systolic Array, 2D SA), as shown in Figure 1. Each link between neurons in this architecture has a multiplier that forms the product of the data and the weight value; data flow from top to bottom through the nodes, each combining with its weight, to yield the neuron output. In this architecture an entire column of nodes is needed to form the function of one neuron, and since every node contains a multiplier, a large amount of chip area is consumed, adding to the hardware cost.

A neural network was also developed on a VLSI platform at National Chiao Tung University. That system links several arithmetic units to a controller, which distributes data to the units over a parallel bus. A wide bus requires a large amount of chip area, greatly raising cost, and the number of arithmetic units simultaneously constrains the controller architecture, which is not only inconvenient for porting but even more problematic for real-time use. VLSI technology has likewise been used as a hardware platform to design neural-network chips with a time-sharing computation architecture, allowing one chip to process several times the amount of data and reducing the cost of the hardware architecture.

Prior art 3 proposed the ring serial architecture, which views the network computation at the level of layers and realizes the activation function in hardware through a look-up table; with this development style only the number of processing units needs to be modified. Prior art 4 constructed a generalized ring serial hardware architecture and improved the parts of the learning algorithm that many earlier architectures could not handle in hardware. Prior art 5 used a fully parallel design, but it simply builds a large number of arithmetic units in hardware; although the system computation time is comparatively fast, a great deal of hardware is consumed. Prior art 6 constructed a hardware architecture with segmented computation, but it can only be achieved with software support: once the application problem differs and the network architecture changes, the hardware must be re-synthesized.

It can thus be seen that the above conventional designs still have many shortcomings and urgently need to be improved.

In view of the shortcomings of the above prior art, the inventors of the present case, after years of painstaking and dedicated research, finally succeeded in developing the present high-speed hardware back-propagation and recurrent type artificial neural network with a flexible architecture.

[Summary of the invention]

The focus of the present invention is to realize many different kinds of neural networks within a single hardware architecture, through the controller and careful use of memory, thereby increasing the practicality of hardware neural networks in their applications.

1.1 Back-propagation neural network

The back-propagation neural network is currently the most representative artificial neural network architecture; its structure is shown in Figure 3. It is the most widely used network today and belongs to the multi-layer perceptron family. Compared with the classical perceptron network, it adds the mechanism of hidden layers and switches to smooth, differentiable activation functions. The basic principle of the multi-layer perceptron learning algorithm is the gradient steepest descent method: during learning, the network weights are adjusted to minimize the error function. A multi-layer perceptron comprises an input layer, hidden layers, and an output layer, whose roles are as follows:

1. Input layer: represents the input variables the network receives from the outside for a given problem; each neuron in the input layer represents one input variable. Because of the limited range of the activation function, differences between inputs affect the learning result, so inputs must be suitably transformed so that their values lie between 0 and 1.

2. Hidden layer: handles the interactions among the input units. There is no fixed rule for the number of its neurons, which usually must be decided by rules of thumb. Hidden layers normally use nonlinear transfer functions. In general there is no limit on the number of hidden layers, but one or two hidden layers can handle most problems.

3. Output layer: represents the output variables of the network; the number of its units depends on the problem.

1.2 Activation function

The activation function, also called the transfer function, is a mechanism that simulates the threshold of a biological neuron and is usually a nonlinear function. The weighted sum of the input data and the weight values is called the summation function. The main job of the activation function is to transform the summation function so that the network can exert the effect of the corrected weight values. The characteristics of the activation function affect the network's ability to learn nonlinearity; a nonlinear function prevents neurons from losing nonlinear characteristics when processing input data, and these characteristics affect the network's ability to learn. Common activation functions include the following:

1. Symmetrical hard limit function, shown in Figure 4:

    y(x) = -1  if x < 0
    y(x) = +1  if x >= 0                                       (1.1)

2. Sigmoid function, shown in Figure 5:

    y(x) = 1 / (1 + e^(-x))                                    (1.2)

3. Hyperbolic tangent function, shown in Figure 6:

    y(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))           (1.3)

4. Linear function, shown in Figure 7:

    y(x) = k * x                                               (1.4)
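For reference, the four activation functions above can be expressed in a few lines of software; the following Python sketch is illustrative only and is not part of the hardware described later:

    import math

    def hard_limit(x):
        # Symmetrical hard limit, eq. (1.1): outputs -1 or +1.
        return -1.0 if x < 0 else 1.0

    def sigmoid(x):
        # Sigmoid, eq. (1.2): output range (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Hyperbolic tangent, eq. (1.3): output range (-1, 1).
        return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

    def linear(x, k=1.0):
        # Linear function, eq. (1.4): y = k * x.
        return k * x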

1.3 Time-delay neural network

The time-delay neural network (TDNN), like the back-propagation network, is a multi-layer feed-forward architecture (multi-layer perceptron, MLP), and its learning algorithm is likewise the back-propagation algorithm. It differs from the back-propagation network in that all the factors from previous periods that may influence the present state are also taken as input signals, so that the temporal order is expressed through the arrangement of the data structure.

When the back-propagation architecture handles problems involving temporal order, the preceding items of past information are added, converting the time sequence into a spatial sequence fed into the feed-forward network so that the network can learn the temporal relations within the information; this is the time-delay neural network. As shown in Figure 8, the network has only one stream of input data, but each presentation includes the current item together with the previous one and the one before that; that is, the items from the different time segments related to the network's current output are fed to the network at the same time. By changing the structure of the input data, the network can learn problems involving temporal order.

1.4 Recurrent neural network

The recurrent neural network (RNN) is a dynamic network that can handle temporal processes. Its architecture comprises a concatenated input layer, a recurrent processing layer, and an output layer, as shown in Figure 9. The units in the context layer represent memory cells; the input of the hidden layer is the output of the input layer together with that of the context layer, and everything else is the same as in the back-propagation network.
The term dynamic refers to neurons whose behavior feeds signals back around the network. Unlike the back-propagation network, or the static networks formed by adding time delays at the inputs, the recurrent network feeds the current output values of its neurons back (feedback) with a time delay, and is therefore closely related to time-dependent problems.

A neuron with a feedback link in a recurrent network is called a dynamic neuron. A recurrent network differs from a static back-propagation network in that the relation between its inputs and outputs is not merely a mapping: by feeding results back, the outcome of processing at the current stage is retained in the network structure as reference information for processing the next stage. These characteristics make the recurrent network very suitable for modeling dynamic, real-time systems. Besides fast learning, high network plasticity, and fast convergence, its most important advantage is that not all examples need to be loaded into the network at once; data can be sent one item at a time for immediate online learning.

Both the time-delay network and the recurrent network have the ability to solve time-related problems. The number of delays in a time-delay network, however, is difficult to determine. If the number of delays is too large, the input vector easily becomes so big that the network is hard to train: for example, with five input items, each delayed twice, the time-delay network must have fifteen input neurons. Although the time-delay network is simple and clear, its relatively large size demands a large amount of training data, making training time lengthy and convergence slow, which compresses the room for practical application.
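The fifteen-input example above (five signals, each presented with its current value and two delayed values) can be made concrete with a small sketch; the function name and the data are hypothetical:

    def tdnn_input(history, delays=2):
        # Build one time-delay input vector from a history of samples
        # (newest last). With 5 signals and delays=2, the network sees
        # the current sample plus the two previous ones: 5 * 3 = 15 inputs.
        window = history[-(delays + 1):]
        return [value for sample in window for value in sample]

    samples = [[0.1, 0.2, 0.3, 0.4, 0.5],   # values at t-2
               [0.2, 0.3, 0.4, 0.5, 0.6],   # values at t-1
               [0.3, 0.4, 0.5, 0.6, 0.7]]   # values at t
    assert len(tdnn_input(samples)) == 15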

1.5 Error learning algorithm

The error learning algorithm compares the result of the forward pass with the target value to obtain the error, and corrects the weights according to the size of the error, forming the backward pass. In the forward pass, the data are weighted from the input layer through the hidden layer to form the summation function, converted through the activation function, and then passed to the output layer to compute the network's output value. When the output fails to reach the target value, the backward pass is performed: the error value is propagated back, in the expectation that modifying the weights of the neurons in each layer will bring the error to the target value or within a tolerable range. Making such adjustments immediately, through the feedback of the network, is called online learning, which gives the network the ability to adjust its parameters dynamically.

1.5.1 Forward pass

The forward pass forms the weighted product of the input values and the weights to obtain the summation function, then converts that value through the activation function into the output of the processing unit, as in equations (1.5) and (1.6) of the forward processing-unit model:

    v_j = sum_{i=1..n} ( w_ij * x_i ) + b_j                    (1.5)

    y_j = f(v_j) = f( sum_{i=1..n} ( w_ij * x_i ) + b_j )      (1.6)

b_j: a bias value acting as a threshold.
n: the number of input-layer neurons.
w_ij: the connection weight.
f(.): the transfer function of the processing unit, converting the weighted sum of the inputs into the unit's output value; usually an exponential function with a double bend, whose limit as the argument tends to plus or minus infinity [-inf, +inf] depends on the particular function.
y_j: the output signal.
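A minimal software rendering of equations (1.5) and (1.6) for one layer may help fix the notation; the code is a sketch assuming a sigmoid transfer function, not the hardware datapath:

    import math

    def forward_layer(inputs, weights, biases, f):
        # Eq. (1.5): v_j = sum_i(w_ij * x_i) + b_j
        # Eq. (1.6): y_j = f(v_j)
        outputs = []
        for j in range(len(biases)):
            v = sum(w * x for w, x in zip(weights[j], inputs)) + biases[j]
            outputs.append(f(v))
        return outputs

    # Two inputs feeding one neuron.
    y = forward_layer([0.5, -0.2], weights=[[0.3, 0.8]], biases=[0.1],
                      f=lambda v: 1.0 / (1.0 + math.exp(-v)))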

1.5.2 Backward pass

Among the learning algorithms for multi-layer perceptrons, the back-propagation algorithm (BP), derived from the steepest descent method, is the most representative. The aim of the learning algorithm is to reduce the gap between the network's output and the target value. The gap between the network's output and the output of the training target is the network error, expressed by an error function or cost function; the network learns by minimizing the error function, defined as follows:

    E = (1/2) * sum_j ( d_j - y_j )^2                          (1.7)

d_j: the ideal target output value of the j-th output unit.
y_j: the actual output value of the j-th output unit.
j: the number of output-layer units.

The training process of the neural network minimizes the error function by correcting the connection weights; the magnitude of each adjustment is proportional to the partial derivative of the error function with respect to that weight, and proportional to the learning rate eta:

    delta_w = -eta * dE/dw                                     (1.8)

eta: the step size of each weight modification when minimizing the energy function, called the learning rate.

When the network corrects the weights related to the output layer, the partial derivative of the error with respect to the weight between the j-th hidden neuron and the k-th output neuron follows the chain rule of partial differentiation:

    dE/dw_jk = (dE/dy_k) * (dy_k/dv_k) * (dv_k/dw_jk)
             = -( d_k - y_k ) * f'(v_k) * y_j

k: the k-th neuron of the network's output layer.
delta_k = ( d_k - y_k ) * f'(v_k): the error term of the k-th output neuron.
j: the j-th neuron of the network's hidden layer.

The correction of the weight linking the output layer and the hidden layer is therefore:

    delta_w_jk = eta * delta_k * y_j

When a weight is not related to the output layer, the partial derivative of the error function with respect to the weight between the i-th neuron of the preceding layer and the j-th hidden neuron is:

    dE/dw_ij = -( sum_k ( d_k - y_k ) * f'(v_k) * w_jk ) * f'(v_j) * y_i

    delta_j = ( sum_k delta_k * w_jk ) * f'(v_j)

delta_j: the error term of the j-th hidden neuron.

When the layer is not the output layer, the correction of the weight connecting the preceding layer to the following layer is:

    delta_w_ij = eta * delta_j * y_i

Similarly, the threshold correction of a hidden-layer neuron is:

    delta_theta_j = -eta * dE/dtheta_j                         (1.9)
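The error terms and weight corrections derived above translate directly into code. The sketch below follows the formulas (delta_k for the output layer, delta_j for a hidden layer, then the update of eq. (1.8)); all names are chosen for illustration:

    def output_deltas(targets, outputs, dv):
        # delta_k = (d_k - y_k) * f'(v_k)
        return [(d - y) * g for d, y, g in zip(targets, outputs, dv)]

    def hidden_deltas(next_deltas, weights, dv):
        # delta_j = (sum_k delta_k * w_jk) * f'(v_j);
        # weights[k][j] is the weight from hidden neuron j to output k.
        return [sum(dk * w[j] for dk, w in zip(next_deltas, weights)) * dv[j]
                for j in range(len(dv))]

    def apply_updates(weights, deltas, prev_outputs, eta):
        # delta_w_jk = eta * delta_k * y_j, applied to every weight.
        for k, dk in enumerate(deltas):
            for j, yj in enumerate(prev_outputs):
                weights[k][j] += eta * dk * yj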

1.6 Network parameters

A neural network involves several important parameters; the following parameters affect the results and efficiency of the learning process:

1. Number of hidden layers: without a hidden layer, the nonlinear relation between the inputs and outputs of the problem cannot be constructed, while too many hidden layers make the network overly complex and slow down convergence. One or two hidden layers usually give better convergence and are sufficient for almost all problems.

2. Number of hidden-layer neurons: there is no fixed rule for how many processing units a hidden layer needs. Generally, the more hidden neurons, the slower the convergence, but the smaller the error that can be reached. Too few hidden neurons leave the network without enough parameters to describe the nonlinear relation between inputs and outputs; too many make the network over-describe and over-learn, so that it learns the noise in the inputs. The number of hidden layers and hidden neurons is usually decided experimentally, by rule of thumb, or by choosing one of equations (1.10) and (1.11) (see [31]) to compute the number of hidden neurons:

    (1) number of hidden units = (units in preceding layer + units in following layer) / 2     (1.10)

    (2) number of hidden units = (input-layer units * output-layer units)^(1/2)               (1.11)

3. Activation function: the activation function commonly used in back-propagation networks is the sigmoid function, while dynamic networks more often use the hyperbolic tangent function. Different activation functions can be used according to the nature of the problem and the design of the network.

4. Error function: besides the sum of squared errors, the error function may use the cube sum or even other functions to express the gap between the network output and the target output and improve learning quality.

5. Learning rate: a learning rate that is too large or too small harms the convergence of the network. A larger learning rate gives a more pronounced weight correction and approaches the minimum of the error function faster, but easily causes the error to oscillate; a rate that is too small makes convergence too slow. A variable learning rate can therefore be adopted: start with a larger rate when the network is initialized and reduce it gradually during training. A generally suitable learning rate is about 0.6 to 0.8. With a variable rate, after each group of training examples the rate is multiplied by a coefficient smaller than 1.0, gradually shrinking it, but a lower bound can be set below which it may not fall. For some problems, however, the appropriate learning rate may be as low as 0.1 or less.
6. Momentum: adding a momentum term pushes each generation's weight update toward the direction of the previous generation, which damps the oscillation caused by noisy training examples and, for smooth target values, amplifies the search direction to reach convergence. A momentum factor that is too large or too small harms convergence, so the momentum factor can be adjusted in the same way as the learning rate.

7. Batch learning: the standard back-propagation learning algorithm updates the weights once for every training example loaded, which is called per-example learning. Batch learning instead loads all training samples, sums and processes the corrections from each one, and only then applies the weight change.

Random perturbation: neural-network learning minimizes the error, but steepest descent can become trapped in a local minimum, leaving the convergence result imperfect. To solve this, a Cauchy machine can be combined to increase the ability to search for solutions that escape the trapped region, as in equation (1.12):

    w_{n+1} = w_n + delta_w + Omega                            (1.12)

Omega: a random perturbation in the range [-0.001, +0.001].

8. Weight initialization: random initial weights in the range [-2^-1, +2^-1] give better convergence. When the weights lie in [-P, P], the weight range of the network is set to [-d, d], with d given by (1.13):

    d = 2P / n                                                 (1.13)

n: the number of input-layer inputs.
P: the weight range, which must be an integer.

9. Normalization: since different activation functions have different value ranges, an input value that is too large drives every neuron output to the saturation value of the activation function, so the differences between data items cannot be felt and learning cannot proceed properly. The inputs and outputs can therefore be normalized to match the chosen activation function, so that values lie within the sensitive region of the activation function.

10. Processing-unit saturation: whether the activation function is the sigmoid or the hyperbolic tangent, when the argument approaches a sufficiently large number the derivative of the function approaches zero, so the network cannot exert the effect of correcting the weights. To avoid processing-unit saturation, a small value can be added to the derivative of the activation function.

The present invention uses the piecewise-linear method to realize, in hardware, the two activation functions most commonly used in neural networks, the sigmoid and the hyperbolic tangent, increasing the variety of networks the hardware supports. Combined with a high-speed pipelined design and segmented computation, the hardware array is shared, producing a design that lets the user change the kind and architecture of the network as the problem changes, replacing software that requires lengthy computation.
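Items 5 and 8 above suggest simple recipes, sketched below; the decay factor and the floor are example values within the ranges the text mentions, not values taken from the patent:

    import random

    def decayed_learning_rate(eta, factor=0.95, floor=0.1):
        # Item 5: multiply the rate by a coefficient below 1.0 after each
        # group of examples, but never let it fall below a preset floor.
        return max(eta * factor, floor)

    def initial_weights(n_out, n_in, d=0.5):
        # Item 8: random initial weights in [-d, +d]; d = 0.5 corresponds
        # to the suggested starting range [-2^-1, +2^-1].
        return [[random.uniform(-d, d) for _ in range(n_in)]
                for _ in range(n_out)]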
2.1 Hardware architecture and principles

2.1.1 Forward hardware architecture

The back-propagation network and the recurrent network are both multi-layer perceptron architectures, so the computation of every layer depends on the levels before and after it and on their order. If the hardware were also designed layer by layer, the logic elements would grow with the network architecture and the cost would grow with them. Because of this ordering in the neural network, the following layer can only be computed after the values of the preceding layer are ready. Even if every layer were given its own independent arithmetic units, the gain in overall computation speed would not repay the multiplied logic elements. Instead, the data of the preceding layer can be sent over an input bus, the computed results stored in memory, and the stored values placed back on the input bus for the computation of the next layer, reducing the amount of hardware used.

The present invention is based on the ring serial architecture of the prior art, shown in Figure 11. This architecture reduces the number of logic elements and lowers cost. The computation is carried out by many arithmetic units; the more units, the more neurons can be computed at once, but the number of units drives up the cost and size of the hardware, reducing the possibilities for practical application. The invention therefore adds a segmented-computation scheme: when the network would require too many hardware elements or too high a cost to synthesize, the number of logic elements is fixed and the capacity of the memory is exploited, completing the computation of a larger neural network with a limited number of neurons.

After the hardware neural network is initialized, the input-layer data are fed from the input bus into each arithmetic unit and multiplied and accumulated with the weight values stored inside the unit; the result of the summation is passed to the activation-function hardware block, which completes the calculation of the activation function and its derivative. When the computation is finished, the shift signal (shift) is asserted and the values computed by the arithmetic units are transferred one by one into memory outside the units for storage. The results just produced by the activation function then become the inputs of the next level and are fed back through the bus for the same sequence of actions. By repeating this procedure, three-layer and four-layer networks can be completed; Figure 12 shows the forward computation architecture.

Since the number of logic elements actually available in the ring serial architecture limits the size of the network that can be built, the invention realizes the whole network with segmented computation. Figure 13 shows the segmented-computation flow: supposing that under practical constraints the hardware can synthesize at most 2 processing units while the layer to be computed has 5 neurons, the computation first completes the former units and stores their results in memory, then continues with the remaining neurons.
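The segmented-computation flow of Figure 13 (two physical processing units serving a five-neuron layer) behaves like the following sketch; the hardware buffers each segment's results in external memory, a role the list plays here:

    def segmented_layer(inputs, neurons, num_pe=2):
        # neurons: list of (weights, bias, activation) triples; with 5
        # neurons and num_pe=2 the loop runs 3 passes (2 + 2 + 1).
        results = []
        for start in range(0, len(neurons), num_pe):
            segment = neurons[start:start + num_pe]   # one hardware pass
            for weights, bias, f in segment:          # parallel PEs in hardware
                v = sum(w * x for w, x in zip(weights, inputs)) + bias
                results.append(f(v))                  # stored to external memory
        return results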
2.1.2 Backward hardware architecture

The backward hardware architecture is divided into four main blocks: the delta block of the output layer, the delta block of the hidden layers, the delta-w block, and the weight-update hardware block. The forward pass stores the computed activation values and their derivatives separately in memory. For the hidden-layer error computation, since the process is a multiply-accumulate operation, the concept is the same as the forward pass except that the activation-function hardware block is not needed; the invention therefore reuses the forward hardware architecture and the concept of parallel processing, with the activation-function portion removed, to complete the hidden-layer error hardware.

When the backward pass begins, the target values and the output-layer values obtained from the forward pass, together with their derivatives, are first sent to the delta block of the output layer. In this block the target value and the computed output are subtracted and the difference is multiplied by the derivative, yielding the error term delta of the output layer.

After the output-layer deltas are computed, the hidden-layer deltas follow. Since the hidden-layer computation must multiply the output-layer deltas by the weights of the processing units and accumulate them, the multiply-accumulate is completed through the ring serial architecture, after which the result and the derivative of each hidden neuron from the forward pass are sent to the delta block of the hidden layer; this block needs only one multiplier to form the product, giving the hidden-layer error terms. If the network is a four-layer back-propagation network, the error terms are computed the same way, so the same architecture and method complete the deltas between the second and the first hidden layer without spending extra logic elements on a hardware block for that layer.

Once the error terms of all the arithmetic units are computed, the delta-w block computes the correction of each weight value. The output-layer deltas and the hidden-layer deltas are stored in the same delta memory. Because the output-layer deltas are produced first, they are read out first, so the weight corrections between the output layer and the hidden layer are also computed first. The inputs of the delta-w block comprise the output of the delta memory, the learning rate, and the input bus. Since every neuron in the network architecture has a bias, each time the value of the delta memory is read, the bias value is placed on the input bus of the delta-w block, and then the outputs of the preceding layer are sent to the same bus, computing the correction for each neuron's bias and for each weight. At the same time, coordinated by the timing control, the previous generation's weight value and the delta-w value are passed to the weight-update hardware block, where the previous weight is added to delta-w to obtain the updated value, completing one backward pass. Figure 14 shows the backward computation architecture.
2.1.3 Arithmetic unit

Both the forward and the backward pass of the present invention use the ring serial architecture, in which the results of the individual arithmetic units are chained together. The internal architecture of an arithmetic unit consists of four parts: memory, a multiply-accumulator, a shift register, and the activation function. These four parts are independent hardware blocks; a high-speed pipelined design decomposes the computation into independent operations, which greatly raises performance.

The memory inside an arithmetic unit stores the weight values needed during computation. Since the network must perform a weighted operation for every input value, a multiply-accumulator must be designed into the unit. Both the forward and the backward pass involve the notion of order, which matters greatly for reading the weights; to simplify the controller design and ensure the corresponding weights are read out in the correct order during learning, the memory inside the unit has a FIFO (first-in, first-out) architecture, so that only the writing order needs to be controlled and the reading order will match it. Each arithmetic unit has an additional addressing mechanism used to write the data on the weight bus into the individual units: the write side of the unit memory is paired with unit addressing, while the read side can be shared because the data are processed in parallel. Since the ring is formed by chaining the arithmetic units, all units perform their accumulation at the same time. In the forward pass the network must transform the input through an activation function; in this design each arithmetic unit represents one neuron, so each unit contains its own independent activation-function blocks. The activation-function hardware consists of two main blocks: the sigmoid function and its derivative, and the hyperbolic tangent function and its derivative. Data arrive over the input bus, are multiplied and accumulated with the memory contents in the unit, and the accumulated value is sent to the two activation-function blocks for computation. Each arithmetic unit also has a shift control signal that determines whether the unit's current value is passed to the next unit; after the data have been fed in and the units have finished computing, the ring serial structure shifts the data from unit to unit. Figure 15 shows the signal timing of the arithmetic unit.
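A behavioral model of one arithmetic unit, with a FIFO of weights, a multiply-accumulator, and a shift output toward the next unit in the ring, might look as follows; the class and method names are hypothetical, and the real unit is an FPGA circuit:

    from collections import deque

    class ArithmeticUnit:
        def __init__(self, weights):
            self.weights = deque(weights)  # FIFO: reads follow the write order
            self.acc = 0.0

        def mac(self, x):
            # Multiply the bus value by the next weight and accumulate.
            w = self.weights.popleft()
            self.weights.append(w)         # recycle for the next pattern
            self.acc += w * x

        def shift(self, value_from_previous_unit):
            # On the shift signal, pass the accumulated value down the ring.
            out, self.acc = self.acc, value_from_previous_unit
            return out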

面’即運算單元中記憶體的寫人訊麵鱗算單元定址搭配,而其男 取城則因平行處理運算資料,故可以朗。由於環狀串列的架鮮 由運算單元串接而成,所以各個運算單元計算累加運算岐同時軸 的。類神經網路在前向運算時,須透過—活化函數轉換輸人訊息,名 設計上,每-個運算單元即代表_個神經元,因此運算料内部各具 有獨立義化函數區塊。活化函數硬寵塊主要料兩大區塊, 包含雙”函數及其微分的區塊以及雙f曲正切函數及其微分的區 塊。資料透過輸人匯流娜輸人值傳絲法累加器與運算單元中啦 憶體做累加’ _加後的值傳送至兩個活化函數_區塊做運 算。而每個運算單柄部具有—條移位的控制訊號,此訊號控制是否 將目前運料元物絲輸㈣下—靖料。«料輸入且在 運算单兀mtM畢後’各麵算單元中的賴雜串列架構, 透過移位的方式,將資料傳到下—個運算單元。針五為運算單元訊 18 201232429 2.1. 4堆疊、佇列與隨位置存取記憶體 倒傳遞類神經網路與回饋式神經網路在運算的過程中需要存取資 料以進行運算,且回饋式神經網路的回饋架構也必須透過將儲存在記 憶體中的資料傳送到運算單元中進行運算以達成回饋的行為,因此記 憶體的配置相當重要。系統的記憶體存取方式—般都搭配記憶體位址 以進行儲存或是讀取,但為了簡化控制器的設計與資料的存放順序, 本論文在權重資料财方技_堆疊(Stad〇、侧(FIF⑴以及 參 隨機存取記紐等三種不同的硬齡構來儲存。而堆疊與仵列硬體架 構訊號功能說明如下: (1) clear ·在堆疊架構中表示寫入位置回到最低位置,讀取位 置回到最尚位置;在佇列架構中則表示寫入與讀取位置都回到最低位 置。 (2) hold :在堆疊與佇列架構中表示記錄目前記憶體位置。 (3) restart:此訊號需搭配h〇id訊號,在堆疊與佇列架構中表 示將讀取位置回到先前hold所設定的位置β (4) rd_addr :讀取資料的記憶體位址 表·ι記憶體硬體架構存放資料内容 名稱 | 存议貧料内容 Weight_RAM 存放權重資4 LearningAddr_FIFO 存放逆^運算辱權重值ϋ的記憶體位置 UpdatingAddr_FIFO 存放丨i正權重時-權重;出的記憶體位鲞 target_RAM ^的記憶體 targetAddr_FIFO 存放目標值輸出的記憶體位置 input_FIFO 存放輸入資料 PE_Weight一FIFO 存放單一神經元内的權重 ' ---—….丨.. 201232429The face is the address of the memory of the computing unit in the computing unit, and the male is taken from the city because it processes the data in parallel. Since the frame of the ring string is formed by connecting the arithmetic units in series, each arithmetic unit calculates the cumulative operation 岐 simultaneous axis. In the forward-looking operation, the neural network must convert the input message through the activation function. In the name design, each operation unit represents _ neurons, so the computational materials each have independent functionalization blocks. The activation function hard pet block is mainly composed of two large blocks, including the double "function and its differential block and the double f-curve function and its differential block. The data is transmitted through the input and flow transfer accumulators. In the arithmetic unit, the memory is added to the accumulated value of _, and the value is transferred to the two activation functions _ block for operation. Each operation single handle has a control signal for shifting, and this signal controls whether or not the current material will be transported. Yuan Shisi loses (4) under - Jing material. «Material input and after the operation of the single 兀MM, the sub-parallel architecture in each surface calculation unit, through the shift method, the data is transferred to the next operation unit. Pin 5 is the arithmetic unit signal 18 201232429 2.1. 4 stacking, arranging and random access memory memory transfer neural network and feedback neural network in the process of operation need to access data for calculation, and feedback The feedback structure of the neural network must also transfer the data stored in the memory to the arithmetic unit to perform the feedback to achieve the feedback behavior. Therefore, the memory configuration is very important. The memory access mode of the system is generally matched with the memory. The physical address is stored or read, but in order to simplify the design of the controller and the order in which the data is stored, this paper is divided into three different types: weight data (Stad〇, side (FIF (1), and random access memory). The hard and thin structure is stored. 
The functions of the stack and the hardware structure signal are as follows: (1) clear · In the stack architecture, the write position is returned to the lowest position, and the read position is returned to the most position; In the architecture, both the write and read positions are returned to the lowest position. (2) hold: indicates the current memory location in the stack and queue structure. (3) restart: This signal needs to be matched with the h〇id signal. The stacking and array architecture indicates that the reading position is returned to the position set by the previous hold. β (4) rd_addr: memory address table for reading data. ι memory hardware architecture data content name | Weight_RAM Storage weight 4 LearningAddr_FIFO Stores the inverse ^ 辱 权 weight value ϋ memory location UpgradeAddr_FIFO Store 丨 i positive weight - weight; out memory location 鲞 target_RAM ^ memory targetAd dr_FIFO stores the memory location of the target value output input_FIFO stores the input data PE_Weight-FIFO stores the weight within a single neuron ' ----....丨.. 201232429
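The clear/hold/restart semantics described above can be captured in a small behavioral model of the queue structure; this is a sketch under the assumption that the read and write pointers are independent:

    class Queue:
        def __init__(self, depth):
            self.mem = [0] * depth
            self.wr = self.rd = self.held = 0

        def clear(self):
            # (1) queue form of clear: both positions back to the lowest.
            self.wr = self.rd = 0

        def hold(self):
            # (2) record the current read position.
            self.held = self.rd

        def restart(self):
            # (3) return the read position to the one saved by hold.
            self.rd = self.held

        def push(self, value):
            self.mem[self.wr] = value
            self.wr += 1

        def pop(self):
            value = self.mem[self.rd]
            self.rd += 1
            return value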

2.1.5 Number system

Digital systems operate on binary number representations, and the transport and processing of data must be realized with two-valued-signal logic elements. The present invention uses a 32-bit binary fixed-point code as its numerical encoding, while the Nios II microprocessor uses the IEEE 754 floating-point format; therefore, when values are passed down to the hardware and when parameters are returned, a floating-point to fixed-point converter performs the numerical conversion. The numbers of integer digits and fractional digits in this design can be changed by the user.
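The float-to-fixed conversion mentioned above can be sketched as follows, assuming for illustration a 32-bit word with 16 fractional bits; the actual integer/fraction split is user-configurable:

    def float_to_fixed(x, frac_bits=16):
        # Quantize an IEEE 754 float into a 32-bit two's-complement
        # fixed-point word with frac_bits fractional bits.
        return int(round(x * (1 << frac_bits))) & 0xFFFFFFFF

    def fixed_to_float(word, frac_bits=16):
        # Reinterpret the 32-bit word as signed and scale back down.
        if word & 0x80000000:
            word -= 1 << 32
        return word / (1 << frac_bits)

    assert fixed_to_float(float_to_fixed(-1.25)) == -1.25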

3.1 Activation-function hardware architecture

3.1.1 Properties of the activation functions

The architecture provides two activation functions, the sigmoid function and the hyperbolic tangent function. Both functions have symmetry, as shown in Figures 17 and 18. Once the hardware design yields the right-half function, a subtractor and the decision of a multiplexer obtain the left half according to equations (1.14) and (1.15); this approach cuts the logic elements used by nearly half, as shown in Figures 19 and 20.

    y(x) = f(x)       for x >= 0
    y(x) = -f(-x)     for x < 0                                (1.14)

    y(x) = f(x)       for x >= 0
    y(x) = 1 - f(-x)  for x < 0                                (1.15)
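Equations (1.14) and (1.15) amount to the following mirroring rule, shown here in software form; the hardware uses a subtractor and a multiplexer instead:

    def mirror(f_right, x, odd=True):
        # f_right evaluates the function for x >= 0 only.
        # odd=True applies eq. (1.14) (hyperbolic tangent);
        # odd=False applies eq. (1.15) (sigmoid).
        if x >= 0:
            return f_right(x)
        return -f_right(-x) if odd else 1.0 - f_right(-x)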

3.1.2 Piecewise-linear method

The piecewise-linear method uses the centred recursive interpolation (CRI) algorithm to compute an approximation of the sigmoid curve, as shown in Table 2, where x denotes the input value and q the number of iterations, i.e., the interpolation depth. Testing showed that q = 2 with delta set to 0.28094 gives the best solution. The part the algorithm cannot cover is realized through the symmetry property, reducing the number of logic elements used; the hardware architecture is shown in Figure 21. Figure 22 is a schematic of the piecewise-linear method: as the figure shows, the method realizes the nonlinear sigmoid function by interpolation. The principle is to search for a turning point; past it the error slowly grows, and when the error reaches an upper bound a new turning point is sought. This hardware requires 817 logic elements in total and needs 12 clock cycles to produce a result.

The pseudocode of the piecewise-linear method is as follows:

    g(x) = y1(x) = 0
    h(x) = y2(x) = (1/2) * (1 + x/2)
    for (i = 0; i < q; i++) {
        g'(x) = max[ g(x), h(x) ]
        h(x)  = (1/2) * ( g(x) + h(x) + delta_i )
        g(x)  = g'(x)
    }
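Read as ordinary code, the recursion above behaves as in the following sketch, which approximates the sigmoid for x <= 0 and mirrors the result through eq. (1.15) otherwise; q = 2 and delta = 0.28094 are the values the text reports as best, and dividing delta by 4 each pass corresponds to the two-bit right shift described in section 3.1.5:

    def cri_sigmoid(x, q=2, delta=0.28094):
        if x > 0:
            return 1.0 - cri_sigmoid(-x, q, delta)   # eq. (1.15)
        g = 0.0                       # g(x) = y1(x) = 0, the saturation line
        h = 0.5 * (1.0 + x / 2.0)     # h(x) = y2(x), the tangent at the origin
        for _ in range(q):
            g, h = max(g, h), 0.5 * (g + h + delta)  # old g, h on the right
            delta /= 4.0              # interpolation depth shrinks each pass
        return max(g, h)

At x = -2, for example, this sketch returns about 0.140 against a true value of about 0.119, an error on the order of the 0.02 maximum reported in Table 3.2.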

3.1.3 Modification of the piecewise-linear architecture

With q fixed at 2, the repetition of the hardware blocks is exploited to simplify the architecture. The piecewise-linear algorithm of Table 3.2 using equation (1.16) needs one adder and two shift registers; after switching to equation (1.17), only one shift register and one adder are needed, and the result is obtained one clock cycle earlier than before.

    h(x) = (1/2) * (1 + x/2)                                   (1.16)

    h(x) = 1/2 + x/4                                           (1.17)

In the first iteration, with q = 1, equation (1.18) needs two adders and one shift register; since the initial value of g(x) is 0, substituting (1.17) into (1.18) gives equation (1.19), which not only costs just one adder and one shift register but also merges the variable terms.

    h(x) = (1/2) * ( g(x) + h(x) + delta_1 )                   (1.18)

    h(x) = (1/2) * ( 0 + 1/2 + x/4 + delta_1 )
         = 1/4 + x/8 + delta_1/2                               (1.19)

The architecture of Table 3.2 can raise accuracy by increasing the number of iterations q, but more iterations require more logic and more computation time; experience shows that two iterations is a comparatively balanced choice. The present invention fixes the iteration count at q = 2 and replaces equation (1.20), the right shift that derives the second interpolation depth from the first,

    delta_2 = delta_1 / 4                                      (1.20)

with a register storing a constant. This not only removes the logic elements that equation would need; with q = 2, the formula of the second iteration becomes equation (1.21), and the constant can be adjusted to raise accuracy.

    h(x) = (1/2) * ( g(x) + h(x) + delta_2' )                  (1.21)

This hardware architecture needs only 333 logic elements, and a result is obtained in 8 clock cycles.

3.1.4 Hyperbolic tangent architecture

The graph of the hyperbolic tangent function is similar to that of the sigmoid; the difference lies in the range of values between the two. Figure 24 shows the graphs of g(x) and h(x) during the iterations of the piecewise-linear method, and Figure 25 shows, in red, the curve traced by the larger of g(x) and h(x). The graphs show that the piecewise-linear method produces a nonlinear curve interleaved by interpolation; the sizes of the constants in the hardware architecture and the shift registers determine the angles of the triangles in Figure 24 and thus shape the curve the piecewise-linear method synthesizes.

The modified piecewise-linear architecture of the previous subsection is therefore adapted, by adjusting the value of the constant register and the shift registers, to realize the hyperbolic tangent function. The range of the hyperbolic tangent is [-1, +1] and its center of symmetry is 0, whereas the constant 0.5 represents the center of symmetry of the sigmoid on the y axis. After tuning the constants empirically, the hardware block of the hyperbolic tangent function is realized as shown in Figure 26. This hardware requires 317 logic elements in total and needs 9 clock cycles to produce a result.
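That the same piecewise block can serve both activations after retuning its constants reflects a standard identity between the two curves; in software form:

    import math

    def tanh_from_sigmoid(x):
        # tanh(x) = 2 * sigmoid(2x) - 1: rescaling the input, output,
        # and offset of the sigmoid yields the hyperbolic tangent,
        # which is why only the constant and shift registers change.
        return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0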
3.1.5 Parameters of the piecewise-linear method

The variable delta of the piecewise-linear method represents the interpolation depth in the algorithm. On each iteration a shift register shifts delta right by two bits, performing the division that gradually reduces the value. Having modified the piecewise-linear architecture for q = 2 in the preceding subsection, and since delta governs the slope of the middle portion of the line segments in the algorithm, the modified architecture's delta_1', delta_2', and shift registers are adjusted to make the activation function more accurate. Figure 27 shows the piecewise-linear realization of the prior art, and Figure 28 the completed realization after the architecture modification. As Figure 29 shows, in the modified architecture the error is smaller when the two iterations have their own independent delta_1' and delta_2'. Figure 30 shows the hyperbolic tangent function realized with the parameters of Table 3.2 after the modification, and Figure 31 its error.

Table 3.2. Piecewise-linear parameters and results

    Network architecture    Sigmoid,        Sigmoid,       Hyperbolic tangent,
                            before change   after change   after change
    q                       2               2              2
    delta_1                 0.28094         0.476807       0.979721
    delta_2                 0.070235        0.197876       0.188034
    Maximum error           0.024458        0.020027       0.051239
    Logic elements needed   817 LE          333 LE         317 LE
    Computation time        12 clk          8 clk          9 clk

3.2 Control unit

The control unit commands the computation of the whole system and governs its execution sequence. It is divided into four main parts: control of the weight ordering, control of the arithmetic-unit addresses, control of the target values, and control of the overall learning flow. The controller is designed as a finite-state machine. Besides controlling the segmented computation, the system controller also controls the activation-function blocks in each arithmetic unit, the accumulators, the parallel learning architecture, the output-layer error computation, and the hidden-layer error computation.

3.2.1 Weight-memory management unit

Over one complete training pass of the neural network, the weight values are read in three different orders according to the step the network is performing. The first is the order in which the weights are used in the forward pass; the second is the order of the weights in the backward pass; the third is the order used when the weights are updated. The present invention therefore designs a weight-management controller that stores the weight addresses for the different orders in memory. The memory holding the weight values reads different storage addresses according to the situation and thus yields the different orders.
The right-handling unit is based on the order of the three __, the address of the control clerk's memory, which changes the order of the weight value output, and simplifies the complexity of the system. If the network architecture is a four-layer inverted-transition-like neural network, the number of input layer neurons is two, the number of layers in the first-layer hidden layer is two, and the number of neurons in the second layer is three, and the output layer The number of neurons is four, and two of the green domain units that can be combined with the hardware are shown in FIG. Since the ring-and-column architecture has a parallel characteristic relationship, when the controller reads the instructions read under the memory within 7L, all the arithmetic units will simultaneously transfer the first -> in the memory. As shown in Figure 32 and Figure 33, when the neurons in the input layer transmit the data to the ring structure, they will simultaneously calculate the w01 in the memory unit and the memory in the memory unit. 122 reads out and operates with the input. When the data of the hidden layer is calculated by the hardware block of the unified function, the two neurons of the hidden layer are then input. At this time, since there are only two processing units, the segmentation calculation must be performed. Therefore, wl3, w23, w33 in the arithmetic unit-memory and wl4, w24, and w44 in the arithmetic unit two memory are read out and operated. Then repeatedly read the output value of the first hidden layer, and read the wl5, w25, and which in the arithmetic unit-memory, since the operation signal 25 201232429 yuan reads the same signal, it must be as shown in Figure 33, Figure 30 As shown in FIG. 4, the memory of the arithmetic unit 2 must have the same number of weight values as the arithmetic unit, so that the W37, W47, and W57 behind the second memory of the arithmetic unit are not read, resulting in disorder of the overall system. The first sequence uses the characteristics of the queue memory to output the weights in order. When the network starts to carry out the hardware primitive, the weight control sequence starts to work, and the address sequence of the second and third cases is sequentially written into the queue memory. The second order is the case in the inverse operation of the neural network. The hidden layer error of the inverse operation is computed through the architecture of the ring string. At this point, the output layer can be regarded as the input layer of the instrument—the layer hidden layer is regarded as the first layer of hidden layer, that is, the entire network is reversed. The weight value must be in the order of 8, Place the memory in the operation unit -quot, then place w49, w48, w4?, _ in the order of the operation order. The weight value has a corresponding storage address according to the input order of the network initially as shown in Fig. 24. Therefore, the weight value is greeted by the memory address 25 21, 17 5 as the second order. Since the reverse operation will adopt the architecture of the clothing string J, as the arithmetic unit may read the weight value required for the next layer operation, it must also be considered as shown in Figure 33 in each arithmetic unit. The memory is filled with the case where the correlation weights are required for each layer calculation. The first-type If% reading is based on the update weight value. 
Since the update method of the reverse transfer algorithm is to update from the weight of the last level of the network architecture, and update the weight value "to __ shape (four) pro, (4) consider the operation unit record (four) to fill the same two times 6 In the case of the weighting of the weights, the memory addresses n 2 11 . are sequentially stored in the order of the Suining, so that the weight values are read in the reverse order of 26 201232429 as the third order. The following 疋 weights S memory management unit important signal description: 1 · ULFIFO-clear: This message shows the memory reading and writing address of the second-order order of the age back to the lowest position. 2. UA-FIF0_clear: This signal indicates that the third-order memory read and write address will be returned to the lowest value. 3 Initialize: This signal indicates that when all network parameters are initialized, they will be based on Na. In the case of flipping, sizing, and the most local unit, start computing the address of the management weight placement order. 4. Done: This signal will generate a high potential when the weight management unit is initialized, which is convenient for the controller to control. .Rd_opcode. When the signal is 〇〇, the order of the right value is output from the memory, and the second order is "01". The third order when updating weights. The weight management sheet is the program fragment when it is processed in the second case as shown in Table 3. 3, and the calculations of Zhong Jianjian are in the same layer, but each slave is based on the front of each layer of the network (4) (4), and the F (4) anUayei_jQde, _____ ,

Next—Layer—McxUx及Address的暫存器’即可計算各層權重記憶 第二種情形時所需輸出的位址順序。 -~~---表3·3權重管理單元進行第二種順序程式片段Next—Layer—McxUx and Address's Register' can calculate the order of the addresses required for each layer of weight memory. -~~---Table 3·3 weight management unit performs the second sequence program fragment

    If (Neural_Network_Kind = Recurrent) then
        First_Layer_Node = Input_Node + Hidden1_Layer_Node
    Else
        First_Layer_Node = Input_Node
    End if
    /* For a four-layer network the second hidden layer's neuron count is compared as well */
    MaxLayerNode = MAX[Input_Layer_Node, Hidden_Layer_Node, Output_Layer_Node]
    if (MaxLayerNode > MaxPE) then   /* MaxPE = largest number of PEs the hardware can synthesize */
        MaxNode = MaxPE
    else
        MaxNode = MaxLayerNode
    end if
    /* quotient and remainder of each layer's neuron count divided by MaxNode */
    if (Remainder /= 0) then
        Mod = Mod + 1
    end if
    Forward_Layer_Node = Output_Layer_Node
    Next_Layer_Node    = Hidden_Layer_Node
    Next_Layer_Mod     = Hidden_Layer_Mod
    Address            = Max_Memory_Address - 1
    Original_Address   = Address
    Total_Counter = Forward_Layer_Node * MaxNode * Next_Layer_Mod
    Used_Counter  = Forward_Layer_Node * Next_Layer_Node
    Counter       = Next_Layer_Node
    If (Total_Counter = 0) then
        Return
    Elsif (Used_Counter = 0) then
        Save Address
        Total_Counter--
    Elsif (Counter = 0) then
        Save Address
        Address = Original_Address - 1
        Original_Address--
        Counter = Next_Layer_Node
        Used_Counter--
        Total_Counter--
    Else
        Save Address
        Address = Address - Forward_Layer_Node
        Counter--
        Used_Counter--
        Total_Counter--
    End if
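A direct software transcription of the fragment above makes the roles of the counters easier to follow. The sketch below (Python, behavioral only; the concrete argument values are illustrative assumptions) walks the same branches and collects the addresses that "Save Address" would emit:

    def second_order_addresses(forward_nodes, next_nodes, next_mod,
                               max_node, max_memory_address):
        """Transcription of Table 3.3: the weight-memory address stream
        emitted for the backward pass."""
        address = max_memory_address - 1
        original_address = address
        total_counter = forward_nodes * max_node * next_mod
        used_counter = forward_nodes * next_nodes
        counter = next_nodes
        saved = []
        while total_counter > 0:
            saved.append(address)              # "Save Address"
            if used_counter == 0:              # padding reads keep PEs in step
                total_counter -= 1
            elif counter == 0:                 # one column finished
                address = original_address - 1
                original_address -= 1
                counter = next_nodes
                used_counter -= 1
                total_counter -= 1
            else:                              # walk down the column
                address -= forward_nodes
                counter -= 1
                used_counter -= 1
                total_counter -= 1
        return saved

    # Output layer (4 nodes) feeding errors back to the second hidden layer
    # (3 nodes, Mod = 2) with 2 PEs; a 26-word memory is assumed here so that
    # the stream begins 25, 21, 17, ..., the read order quoted in the text:
    print(second_order_addresses(4, 3, 2, max_node=2, max_memory_address=26))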
3.2.2 Address management unit
The ring-serial architecture is built from many processing elements connected in series; each processing element is one neuron and has its own independent address. Since the weight bus and the input bus connect to every processing element, when a weight value on the weight bus is to be written into a particular processing element, the address on the address bus decides which processing element's memory receives it. The address management unit is the controller of the processing-element addresses: according to the kind of network and the largest synthesizable number of processing elements, it controls the numbering of the processing-element addresses and so relieves the system controller.
When the network computes with the ring-serial structure, the kind of network, the number of neurons in each layer and the maximum number of processing elements together determine which weight values must be placed into which processing element. To write w01 and w11 into a processing element's memory, one proceeds as in Figure 35: when w01 and w11 are placed on the weight bus and the signal on the address bus is 1, they are written into the first processing element. The weights w02 and w22 are then placed on the weight bus with the address bus set to 2, as in Figure 36. Arranging all the weight values as in Figure 37 and applying the weight controller of the previous subsection stores every weight value inside the correct processing element. To simplify the controller, the present invention separates the address-bus controller out on its own.
The program fragment of the address management unit is shown in Table 3.4. The address computation is the same for the processing elements of every layer; on each run, updating the registers Forward_Layer_Node, Next_Layer_Node and Next_Layer_Mod according to the relationship between consecutive layers yields the order of the processing-element addresses for every layer.

Table 3.4 - program fragment of the address management unit

    If (Neural_Network_Type = Recurrent) then
        First_Layer_Node = Input_Node + Hidden1_Layer_Node
    Else
        First_Layer_Node = Input_Node
    End if
    /* For a four-layer network the second hidden layer's neuron count is compared as well */
    MaxLayerNode = MAX[Input_Layer_Node, Hidden_Layer_Node, Output_Layer_Node]
    if (MaxLayerNode > MaxPE) then   /* MaxPE = largest number of PEs the hardware can synthesize */
        MaxNode = MaxPE
    else
        MaxNode = MaxLayerNode
    end if
    /* quotient and remainder of each layer's neuron count divided by MaxNode */
    if (Remainder /= 0) then
        Mod = Mod + 1
    end if
    Forward_Layer_Node = Output_Layer_Node
    Next_Layer_Node    = Hidden_Layer_Node
    Next_Layer_Mod     = Hidden_Layer_Mod
    Address = 1
    Total_Counter = (Forward_Layer_Node + 1) * MaxNode * Next_Layer_Mod
    Counter       = Forward_Layer_Node
    If (Total_Counter = 0) then
        Return
    Elsif (Counter = 0 AND Address = MaxNode) then
        Save Address
        Counter = Forward_Layer_Node
        Address = 1
        Total_Counter--
    Elsif (Counter = 0) then
        Save Address
        Counter = Forward_Layer_Node
        Address = Address + 1
        Total_Counter--
    Else
        Save Address
        Counter--
        Total_Counter--
    End if
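As with Table 3.3, a software transcription shows what the fragment actually drives onto the address bus. In the sketch below (Python, behavioral model; argument values follow the 1-2-3-4 example) every loaded weight gets one address, and the stream comes out as runs of Forward_Layer_Node + 1 identical addresses, consistent with one weight per source node plus one bias entry per destination processing element:

    def pe_address_stream(forward_nodes, max_node, next_mod):
        """Transcription of Table 3.4: the processing-element address
        driven onto the address bus for each weight value loaded."""
        address, counter = 1, forward_nodes
        total_counter = (forward_nodes + 1) * max_node * next_mod
        saved = []
        while total_counter > 0:
            saved.append(address)                     # "Save Address"
            total_counter -= 1
            if counter == 0 and address == max_node:  # wrap back to PE 1
                counter, address = forward_nodes, 1
            elif counter == 0:                        # move to the next PE
                counter, address = forward_nodes, address + 1
            else:
                counter -= 1
        return saved

    # 4 source nodes, 2 PEs, Mod = 2:
    print(pe_address_stream(4, 2, 2))
    # -> [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]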
3.2.3 Target-value management unit
In the backward computation, the back-propagation learning algorithm starts from the last neuron of the output layer. If the target values are stored in a queue memory, they must therefore be entered starting from the target of the output layer's last neuron. Because the targets sit in a queue, a network with several groups of training data would have to receive its targets starting from the last datum of the first group, yet the training data themselves are entered starting from the first datum of the first group, so the target values would arrive in the order opposite to the training data. To resolve this, the present invention designs a target-value controller that lets the user enter the targets naturally, first group first neuron, first group second neuron, and so on, while the controller itself decides how to write them into the target-value queue memory.
Suppose the output layer has four neurons and there are three groups of training data; the target-value input order is then as shown in Figure 38. Following the back-propagation learning algorithm, computation starts from t14, stored at the last address 3 of the first group, then address 2, address 1, and finally the target at address 0. Once the first group has been trained, it switches to t24 at the last address 7 of the second group to begin the second group's training. The target-value algorithm works as follows: the total number of target entries is first stored in a register; the write group then starts from the first group, and the written-neuron count likewise starts from the first neuron. The memory address of the last neuron of the first group is written to WriteData; once the addresses of all neurons of a group have been written, the write group moves to the next group, until every address has been written into the queue memory in back-to-front order within each group.
The pseudocode of the target-value management unit is as follows:

    Write_Group  = 1
    Write_Neuron = 1
    if (Total_Inputs = 0) then
        exit
    elsif (Write_Neuron = Output_Layer_Node) then
        move the write-group number to the next group
        restart the written-neuron count from the first neuron
    else
        write the address (Write_Group * Output_Layer_Node - Write_Neuron)
    end if
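The write-address rule in the pseudocode can be exercised directly. A behavioral Python sketch (the function name is ours; only the address formula comes from the fragment above):

    def target_write_addresses(groups, output_nodes):
        """Addresses at which naturally ordered targets (group 1 neuron 1,
        neuron 2, ...) are written so each group reads back last-neuron-first."""
        addresses = []
        for group in range(1, groups + 1):
            for neuron in range(1, output_nodes + 1):
                # the fragment's rule: group * output-layer nodes - neuron index
                addresses.append(group * output_nodes - neuron)
        return addresses

    # Three groups of four targets, as in the example above:
    print(target_write_addresses(3, 4))
    # -> [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8]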
3.2.4 System control unit
The computation of the whole neural system is governed by the system's control unit. All values are kept in memory, and the controller's task is to write or read the data held in memory with the correct timing. The controller is designed as a finite-state machine, as shown in Figure 39. Besides accessing the stored memories, the controller also controls the whole ring-serial architecture, the segmented computation, the accumulators and multiplexer select lines, and the control units of the three preceding subsections.
The main flow and states of the controller are shown in Figure 40. The flowchart divides into three large blocks. Initialization produces a start signal; upon receiving it, the weight, address and target-value controllers carry out their computations according to the kind and size of the network. Once the system receives the completion signals of these management units, the management units place the data in order into the queue memories, finishing the preparations before the system runs.
When preparation is complete, the forward computation begins. The controller first sets the multiplexer select lines according to the type of the first hidden layer's activation function, then transfers the training data from the input queue memory into the ring structure for the processing-element array computation. For a recurrent network, after the training data are written, the context layer's values are also sent into the ring. After the first hidden layer's neuron values have been computed and stored, the controller decides from the network size whether segmented computation is needed and completes the first hidden layer's computation. According to the number of hidden layers it adjusts the multiplexer select lines. For the hidden and output layers alike, the previous layer's results, computed by the activation-function block and held in memory, are fed back into the ring-serial structure, completing the whole forward computation.
After the forward computation, the system decides its next action according to whether the network is being trained or tested. For training, the controller proceeds to the backward computation; for recall, the values produced by the output layer are sent to the result port as output data. When the backward computation starts, the controller combines the values emitted by the target-value control unit with the values obtained at the output layer of the forward pass to compute the output-layer error, and then computes the hidden-layer error through the ring-serial architecture together with the hidden-layer-error block. Once all error quantities are computed, the controller feeds the errors, the learning rate and each neuron's previous-layer input values, through the weight-management unit in step with the Δw block, to obtain a new set of weights. Finally it checks whether all training data have been trained and the configured number of iterations reached, completing the controller's command of the system.
3.3 Software planning
On the software side of this system, the microprocessor issues instructions onto the bus: over the Avalon bus, following the design of the register block, parameter commands are sent to the hardware, such as the kind of network, its size, the choice of activation functions, the learning rate, weight values, target values, and so on. The benefit of this design is that when the type or size of the network is to be changed, it can be set directly from the software interface, without reconfiguring and re-synthesizing the hardware. Figure 41 shows the architecture of the system and of the Nios II used by the present invention. The hardware computes with fixed-point fractions, while the software uses floating-point fractions in IEEE 754 format. To lower the processor's burden, the floating-to-fixed-point converter is placed on the hardware side. When the hardware finishes computing, the contents of the fixed-point data addresses are converted in signed-fraction format: shifting the binary representation by the number of fractional bits recovers the decimal floating-point value. This removes the fixed-to-floating-point converters and raises the system clock rate.
Figure 42 is the software-planning flowchart. Beyond the hardware design, normal operation of the system depends on software commands. The software must first issue a hardware reset, then set the kind of network, the activation function of each layer, the network architecture, testing or training, the weight data, the target data, the input data, the learning rate and the number of training loops. Once the network is configured and the data entered, and the interface has replied that the computation is finished, the instruction to read the results can be executed. The table below lists the signals used for communication between the controlled-side hardware and the host-side software.

    name           address  access  type         description
    rst            -        write   bit          resets all parameters
    start          -        write   bit          starts the computation
    Learning_Rate  -        write   fixed-point  learning rate
    Epoch          -        write   integer      number of training iterations
    Kind           4        write   vector       0x00: three-layer back-propagation network;
                                                 0x01: four-layer back-propagation network;
                                                 0x10: standard recurrent network;
                                                 0x11: recurrent network whose feedback
                                                 weights are corrected
    Af_Hidden1     5        write   vector       activation function of the first hidden layer
    Af_Hidden2     6        write   vector       activation function of the second hidden layer
    Af_Output      7        write   vector       activation function of the output layer
    Input_num      8        write   integer      number of input-layer neurons
    Hidden1_num    9        write   integer      number of first-hidden-layer neurons
    Hidden2_num    10       write   integer      number of second-hidden-layer neurons
    Output_num     11       write   integer      number of output-layer neurons
    Train_or_Test  12       write   bit          0: network in training state;
                                                 1: network in test state
    Weight         13       write   fixed-point  weight data to be written
    Target         14       write   fixed-point  target values to be written
    Input_data     15       write   fixed-point  input data to be written
    done           0        read    bit          computation finished
    Result         1        read    fixed-point  weights after training; in the test state,
                                                 the computation results of the output layer

[Embodiments]
The present invention performs sine-function curve fitting with three-layer and four-layer back-propagation networks, predicts the residual capacity of a battery with two different recurrent networks, and compares the results and errors of hardware training with and without segmented computation. The training time is measured from the Nios II command that starts training until the training-complete reply is received, and is compared against pure software (MATLAB). The test environment runs Windows XP SP3 on an Intel Core(TM)2 Duo Processor E8400 (6M cache, 3.00 GHz, 1333 MHz FSB) with 2 GB of DDRII system memory. The hardware tools are Quartus II 9.0 (32-bit) and the Nios II 9.0 IDE, and signal simulation uses ModelSim-Altera 6.4a (Quartus II 9.0) Starter Edition. The training-sample and test-sample errors defined in the present invention are computed as the root-mean-square error, as in equation (4.1):

    RMSE = sqrt( Σ (Destination - Output)² / OutputNode )      (4.1)
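Equation (4.1) in executable form (Python, as a reference check; the sequence-argument convention is ours):

    import math

    def rmse(destination, output):
        """Root-mean-square error over the output nodes, per equation (4.1)."""
        assert len(destination) == len(output) and output
        return math.sqrt(sum((d - o) ** 2 for d, o in zip(destination, output))
                         / len(output))

    print(rmse([1.0, 0.0], [0.9, 0.1]))   # 0.1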

4.1 Development system
4.1.1 Development software
The present invention is written in the Very High Speed Integrated Circuit Hardware Description Language (VHDL).
4.1.2 Development hardware
A field programmable gate array (FPGA) is used, and the hardware is developed on Altera's FPGA development platform Stratix II EP2S60F1020C4. The system's working clock is 100 MHz, and the Avalon bus connects it to the Nios II embedded processor, the user logic and the other peripherals (I/O) for system verification and analysis of the experimental data.
4.1.3 Nios II embedded processor

The Nios II embedded processor is the second-generation processor developed by Altera: a soft-core 32-bit reduced-instruction-set-computing (RISC) processor. Through the SOPC (System On a Programmable Chip) Builder development system in the Quartus software, the Nios II processor synthesized from the FPGA's internal logic elements (LEs) is combined with user-designed hardware components, memory units, device interfaces and various IP to form an SOC (System On Chip) that can be programmed into the device. Nios II's Eclipse-based integrated development environment (IDE) supports custom instructions and has separate program and data buses, which gives great flexibility; the user can choose among many system configurations to balance performance and cost.
In the present invention the Nios II is responsible for setting the parameters of the neural network and the training samples, while the computation itself is handled by the hardware controller; hardware and software are linked through the Avalon bus built into Nios II. The Nios II hardware architecture is shown in Figure 43.
4.1.4 Avalon bus

The Nios II uses the Avalon bus, whose main function is to connect the system processor with the peripheral interfaces; it describes the connection of ports in a master-slave architecture and the timing of communication between components, as shown in Figure 44. The Avalon bus has the following characteristics:
1. Clock synchronization: all signals on the Avalon bus are synchronized to the Avalon clock, which simplifies timing control, requires no handshaking or acknowledge mechanism, and thereby avoids timing restrictions, easing high-speed transfers.
2. Separated signals: the control, data and address signals on the Avalon bus use separate ports, which simplifies the design of the transfer interface.
3. Dynamic width: master and slave can exchange 8-, 16- or 32-bit data over the Avalon bus, which gives considerable design flexibility.
4. Arbitration: masters and slaves can use the bus for transfers simultaneously, which reduces bandwidth limits and allows several signals to be handled.
5. Memory-mapped: the slave is controlled over the Avalon bus in memory-mapped fashion; the Chipselect signal lets the slave ignore all other external signals.
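Memory-mapped control means that configuring the network is nothing more than a sequence of register writes. A hypothetical host-side sketch follows (Python standing in for the Nios II software; the write_reg helper and the dictionary bus are assumptions, while the register names and addresses follow the signal table of section 3.3):

    REGS = {"Kind": 4, "Af_Hidden1": 5, "Af_Hidden2": 6, "Af_Output": 7,
            "Input_num": 8, "Hidden1_num": 9, "Hidden2_num": 10,
            "Output_num": 11, "Train_or_Test": 12}

    bus = {}  # stand-in for the Avalon-mapped register file

    def write_reg(name, value):
        bus[REGS[name]] = value    # on real hardware: a bus write cycle

    # Configure the 1-2-3-4 four-layer back-propagation example for training:
    write_reg("Kind", 0x01)        # four-layer back-propagation network
    write_reg("Input_num", 1)
    write_reg("Hidden1_num", 2)
    write_reg("Hidden2_num", 3)
    write_reg("Output_num", 4)
    write_reg("Train_or_Test", 0)  # 0 = training state
    print(bus)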

The master's write and read signals toward the slave on the Avalon bus are shown in Figures 45 and 46. Chipselect enables the slave that is to be accessed; since Avalon accesses the slave's data in memory-mapped fashion, besides the Write and Read signals that control the writing and reading of data, the address bus must still be used to access the data inside the slave.
4.2 Experimental results
4.2.1 Curve fitting
To verify that the system architecture proposed by the present invention is correct and its precision adequate, this experiment uses a sine-function curve-fitting problem; the experimental parameters are listed in Table 4.1. A 1-5-1 network architecture is trained to fit a varying sine function, given by equation (4.1) below. The weights of this experiment are random numbers drawn from [-0.5, 0.5]; the results are shown in Figures 47 to 49. In Figure 47, blue is the hardware training result under identical weights, green the software training result, and red the actual curve. The analysis of the results is given in Table 4.2.

    y(x) = (1/2)(sin(x) + 1)      (4.1)

Table 4.1 - parameters for the sine-function curve-fitting experiment
    working clock                100 MHz
    integer bits                 16 bits
    fractional bits              16 bits
    kind of network              three-layer back-propagation network
    input neurons                1
    hidden-layer neurons         5
    output neurons               1
    hidden-layer activation      sigmoid function
    output-layer activation      sigmoid function
    training samples             628
    test samples                 314
    learning rate                0.6
    sigmoid coefficient λ        -
    weight range                 [-0.5, 0.5]
    training iterations          1000

Table 4.2 - error analysis of the sine-function curve-fitting experiment
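The 16.16 fixed-point format of Table 4.1 and the shift-based conversion described in section 3.3 can be modeled in a few lines (Python sketch; the 32-bit word width and masking are the standard two's-complement treatment, assumed rather than quoted from the VHDL):

    FRAC_BITS = 16                  # 16 integer bits + 16 fractional bits

    def float_to_fixed(x):
        """Float -> 32-bit two's-complement 16.16 word."""
        return int(round(x * (1 << FRAC_BITS))) & 0xFFFFFFFF

    def fixed_to_float(word):
        """16.16 word -> float: reinterpret as signed, then move the
        binary point back by the number of fractional bits."""
        if word & 0x80000000:
            word -= 1 << 32
        return word / (1 << FRAC_BITS)

    w = float_to_fixed(-0.4375)
    assert abs(fixed_to_float(w) + 0.4375) < 2.0 ** -FRAC_BITS
    print(hex(w), fixed_to_float(w))    # 0xffff9000 -0.4375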

4.2.2 Function approximation
This experiment uses the four-layer back-propagation network of the present architecture to fit several different functions, given by equations (4.2) to (4.5). The training data are distributed over [1, 6.28]; with identical initial values for software and hardware, the remaining parameters are set as in Table 4.3. The results are shown in Figures 50 to 52, where blue denotes formula one, red formula two, green formula three and purple formula four. Table 4.4 analyzes the performance and precision of the different systems after the experiments.

    y(x) = (1/2)(sin(x) + 1)       (4.2)
    y(x) = (1/2)(cos(x) + 1)       (4.3)
    y(x) = (1/4)(log(x) + 3)       (4.4)
    y(x) = exp(x) / 600            (4.5)

Table 4.4 - analysis of the function-approximation results
    network architecture      1-2-3-4 (hardware)  1-2-3-4 (hardware)  1-2-3-4 (software)
    segmented computation     yes                 no                  no
    synthesized PEs           2                   10                  -
    training time             62.298002 s         54.472789 s         435.489235 s
    sine min error            1.2991e-5           7.0661e-5           3.2435e-4
    sine max error            0.0288              0.4950              0.2287
    sine mean error           0.0085              0.0742              0.0602
    cosine min error          1.2991e-5           2.7915e-5           1.6775e-4
    cosine max error          0.0288              0.1170              0.1204
    cosine mean error         0.0085              0.0296              0.0521
    logarithm min error       1.2991e-5           1.2991e-5           3.9936e-6
    logarithm max error       0.0288              0.5149              0.5421
    logarithm mean error      0.0085              0.0238              0.0198
    exponential min error     1.2991e-5           2.0928e-5           1.8030e-6
    exponential max error     0.0288              0.2238              0.2026
    exponential mean error    0.0085              0.0309              0.0288

4.2.3 Prediction of residual battery capacity
This experiment uses a time-delay neural network and a recurrent neural network to predict residual capacity from battery-discharge data actually measured by 加百裕. The data were recorded while the battery discharged continuously at 2.4 A at 40 °C. The measured voltage, current and battery temperature are each transformed as in equations (4.6) to (4.8), a normalization that brings the input values into [0, 1]; in Figures 54 to 56 the left side shows the raw data and the right side the data after transformation. The remaining parameters are set as in Tables 4.5 and 4.6. The results are shown in Figures 57 to 60: Figures 57 and 59 compare the two networks' predictions with the actual data, where the blue curve is the prediction and the red curve the actual discharge curve. Table 4.7 analyzes the performance and precision of the two networks after the experiments.

    v' = v / 13              (4.6)
    i' = (i + 2.5) / 3       (4.7)
    t' = t / 56              (4.8)

Table 4.5 - time-delay-network parameters for the residual-capacity experiment
    working clock                100 MHz
    integer bits                 16 bits
    fractional bits              16 bits
    kind of network              time-delay neural network
    input neurons                3
    number of delays             3
    hidden-layer neurons         5
    output neurons               1
    hidden-layer activation      -
    output-layer activation      -
    training samples             629
    prediction samples           629
    learning rate                0.6
    tanh coefficient λ           -
    weight range                 [-0.5, 0.5]
    training iterations          1000

Table 4.6 - recurrent-network parameters for the residual-capacity experiment
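The input preprocessing of equations (4.6) to (4.8) and the assembly of the nine TDNN inputs can be sketched as follows (Python; the constants follow the readings given above and the windowing convention is an assumption inferred from the 9-5-1 topology):

    def normalize_sample(v, i, t):
        """Map measured voltage, current and temperature into [0, 1],
        per equations (4.6)-(4.8)."""
        return (v / 13.0, (i + 2.5) / 3.0, t / 56.0)

    def tdnn_inputs(history):
        """Nine inputs for the 9-5-1 time-delay network: three normalized
        signals over the three most recent time steps."""
        assert len(history) >= 3
        return [x for sample in history[-3:] for x in normalize_sample(*sample)]

    # Three (voltage, current, temperature) samples, most recent last:
    print(tdnn_inputs([(12.3, -2.4, 41.0),
                       (12.2, -2.4, 41.5),
                       (12.1, -2.4, 42.0)]))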

Table 4.7 - analysis of the residual-capacity experiments
    kind of network        time-delay      time-delay      recurrent       recurrent
    network architecture   9-5-1 (hw)      9-5-1 (hw)      3-2-1 (hw)      3-2-1 (hw)
    segmented computation  no              -               no              -
    synthesized PEs        10              10              2               2
    time per training pass 0.542483 s      0.676606 s      0.429014 s      0.452101 s
    min error              2.7325e-5       2.7148e-4       2.685e-4        6.9390e-5
    max error              0.0412          -               0.041           0.0376
    mean error             -               -               0.001           0.0004

The detailed description above is a concrete explanation of one feasible embodiment of the present invention; the embodiment is not intended to limit the patent scope of the invention, and any equivalent implementation or modification that does not depart from the spirit of the art of the invention shall be included in the patent scope of this case.
In summary, this case is genuinely innovative in its spatial form and improves on conventional articles in the several respects above; it should fully satisfy the statutory requirements of novelty and inventive step for an invention patent, for which application is hereby made in accordance with the law, with the respectful request that the Office grant this invention patent application, to the encouragement of invention.
[Brief description of the drawings]
Figure 1: CNAPS architecture;
Figure 2: operation of the two-dimensional systolic-array architecture;
Figure 3: back-propagation neural network architecture;
Figure 4: symmetrical hard-limit function;
Figure 5: sigmoid function;
Figure 6: hyperbolic tangent function;
Figure 7: linear function;
Figure 8: time-delay neural network;
Figure 9: recurrent neural network architecture;
Figure 10: forward processing-element model;
Figure 11: ring architecture;
Figure 12: forward-computation architecture;
Figure 13: segmented-computation flowchart;
Figure 14: backward-computation architecture;
Figure 15: processing-element signal timing;
Figure 16: fixed-point fraction representation;
Figure 17: symmetry values of the sigmoid function;
Figure 18: symmetry values of the hyperbolic tangent function;
Figure 19: hardware architecture for the sigmoid symmetry values;
Figure 20: hardware architecture for the hyperbolic-tangent symmetry values;
Figure 21: piecewise-linear-method hardware architecture;
Figure 22: illustration of the piecewise-linear method;
Figure 23: modified piecewise-linear architecture;
Figure 24: piecewise-linear iteration (1);
Figure 25: piecewise-linear iteration (2);
Figure 26: hyperbolic-tangent architecture;
Figure 27: original piecewise-linear implementation;
Figure 28: modified piecewise-linear implementation;
Figure 29: piecewise-linear error comparison;
Figure 30: PWL hyperbolic-tangent plot;
Figure 31: PWL hyperbolic-tangent error plot;
Figure 32: 1-2-3-4 back-propagation network architecture;
Figure 33: processing-element memory padding;
Figure 34: weight-memory data layout;
Figure 35: weight-value storage (1);
Figure 36: weight-value storage (2);
Figure 37: processing elements with their addresses;
Figure 38: target values;
Figure 39: finite-state machine of the control unit;
Figure 40: control-unit flowchart;
Figure 41: hardware and Nios II architecture;
Figure 42: software-planning flowchart;
Figure 43: Nios II embedded-processor hardware architecture;
Figure 44: Avalon architecture;
Figure 45: Avalon write signals;
Figure 46: Avalon read signals;
Figure 47: training results of the sine curve-fitting experiment output by the network;
Figure 48: actual sine curve versus the software and hardware fitted curves;
Figure 49: sine curve error analysis;
Figure 50: function-approximation test results output by the hardware network without segmentation;
Figure 51: error analysis of the function-approximation hardware test data;
Figure 52: function-approximation test results output by the hardware network with segmentation;
Figure 53: function-approximation error results for software and hardware under identical weights;
Figure 54: raw and transformed battery-voltage input;
Figure 55: raw and transformed battery-current input;
Figure 56: raw and transformed battery-temperature input;
Figure 1 is a CNAPS architecture diagram; Figure 2 is a schematic diagram of a two-dimensional pulsation array architecture; Figure 3 is a reverse-transfer-like neural network architecture diagram; Figure 4 is a symmetric hard-limit function diagram; Figure 6 is a double-bend tangent function diagram; Figure 7 is a linear function diagram; Figure 8 is a time-delay-like neural network diagram; Figure 9 is a feedback-like neural network architecture diagram; Figure 10 is a forward-operation unit model Figure 11 is a schematic diagram of the ring structure; Figure 12 is the forward operation architecture diagram; Figure 13 is the segmentation calculation flow chart; Figure 14 is the reverse operation architecture diagram; Figure 15 is the operation unit signal timing diagram; Figure 16 is a fixed-point decimal representation; Figure 17 is a hyperbolic function symmetry value diagram; Figure 18 is a double-bend tangent function symmetry value map; Figure 19 is a double-bend function symmetry value hardware architecture diagram; Twenty is a double-curved tangent function symmetry value hardware architecture diagram; Figure 21 is a fragment linear method hardware architecture diagram; Figure 22 is a fragment linear method diagram; Figure 23 is a fragment linear method modified architecture Fig. 24 is a schematic diagram of the piecewise linear method (1); 201232429. Fig. 25 is a schematic diagram of the piecewise linear method (2); Fig. 26 is a double bending tangent function structure diagram; Fig. 28 is the modified linear graph of the fragment; Fig. 29 is the error comparison graph of the fragment linear method; Fig. 30 is the PWL double bending tangent function graph; Fig. 31 is the PWL double bending Tangent function error graph; Figure 32 is the back-transfer neural network 1-2-3-4 architecture diagram; Figure 33 is the computational unit memory complement map; Figure 34 is the weight memory data configuration diagram ; _ stomach thirty-five is the weight _ deposit map (1); Figure 36 is the weight value storage map (2); Figure 37 is the operation unit collocation address diagram; Figure 38 is the target value diagram; Figure 39 Figure 490 is the control unit flow chart; Figure 40 is the hardware and Nios II architecture diagram; Figure 42 is the software planning flow chart; Figure 43 is the Ni〇s π embedded Processor hardware architecture diagram; • Figure 44 is the Avalon architecture diagram; Forty-five is the signal map for Avalon; Figure 46 is the Avalon read signal map; Figure forty-seven is the sine function curve fitting experiment through the neural network to output the training results. Figure 48 is the actual curve of the sine function Software and hardware fitting curve comparison chart; ·, Figure 49 is the sine function curve error analysis chart; = fifty is the function of the approximation experimental test data through the segmentation of the hardware network without the need for segmentation One is the approximation of the experimental hardware test data error analysis chart; ®52 is the function approximation of the experimental face through the tilt __ road map fifty-three for the approximation of the experimental software and hardware related heavy 峨 峨Fig. 45 201232429 Fig. Fig. 54 shows the original graph of the battery voltage and the input graph after the change; Fig. 55 shows the original graph of the battery current and the input graph after the change; Fig. 56 shows the original graph of the battery temperature and the input after the change. 
Figure 57 shows the comparison of the results of the battery discharge curve predicted by TDNN; Figure 58 shows the error results of the battery discharge curve predicted by TD 丽; Figure 59 shows the battery discharge curve predicted by the feedback neural network. FIG comparison result; FIG sixty is a feedback type neural network to predict the results of the battery discharge curve in FIG error; Main reference numerals DESCRIPTION



VII. Claims:
1. A high-speed hardware back-propagation and recurrent artificial neural network with a flexible architecture, the operating modes of the network comprising:
a three-layer back-propagation neural network mode;
a four-layer back-propagation neural network mode;
a standard recurrent neural network mode; and
a recurrent neural network mode whose feedback weights can be corrected.
2. The operating modes as recited in claim 1, wherein the contents of registers are updated by software to determine the operating mode of the network.
3. A high-speed hardware back-propagation and recurrent artificial neural network with a flexible architecture, the hardware architecture comprising:
an input device, the input device being a device or microprocessor capable of producing digital signals;
programmable hardware, wherein the programmable hardware interfaces with the input device, receives the input device's data and performs the computation, the programmable hardware being a field programmable gate array; and
an output device, the output device interfacing with the programmable hardware to receive the completed data, the output device being a device or microprocessor capable of receiving digital signals.
4. The operating mode as recited in claim 1, wherein the standard recurrent neural network mode comprises:
a connected input layer;
a feedback processing layer;
a hidden layer; and
an output layer;
the feedback processing layer receives the hidden-layer output of the previous iteration as its input data and passes the received data to the hidden layer of the current iteration; the weights between the feedback processing layer and the hidden layer are fixed; the input data of the hidden layer come from the feedback processing layer and the connected input layer and, after multiply-accumulation with the weights, pass through the hidden layer's activation function, whose result is output to the output layer; the output layer multiply-accumulates the hidden layer's output values with the weights and obtains the output-layer output through the output layer's activation function.
5. The operating mode as recited in claim 1, wherein the recurrent neural network mode with correctable feedback weights comprises:
a connected input layer;
a feedback processing layer;
a hidden layer; and
an output layer;
the feedback processing layer receives the hidden-layer output of the previous iteration as its input data and passes the received data to the hidden layer of the current iteration; the weights between the feedback processing layer and the hidden layer can be corrected by the backward computation; the input data of the hidden layer come from the feedback processing layer and the connected input layer and, after multiply-accumulation with the weights, pass through the hidden layer's activation function, whose result is output to the output layer; the output layer multiply-accumulates the hidden layer's output values with the weights and obtains the output-layer output through the output layer's activation function.
6. The operating mode as recited in claim 1, wherein the three-layer back-propagation neural network mode comprises:
an input layer;
a hidden layer; and
an output layer;
the input layer receives external data as input and passes the received data to the hidden layer; the hidden layer's input data are multiply-accumulated with the weights and passed through the hidden layer's activation function, whose result is output to the output layer; the output layer multiply-accumulates the hidden layer's output values with the weights and obtains the output-layer output through the output layer's activation function.
7. The operating mode as recited in claim 1, wherein the four-layer back-propagation neural network mode comprises:
an input layer;
a first hidden layer;
a second hidden layer; and
an output layer;
the input layer receives external data as input and passes the received data to the first hidden layer; the first hidden layer's input data are multiply-accumulated with the weights and passed through the first hidden layer's activation function, whose result is output to the second hidden layer; the second hidden layer multiply-accumulates the first hidden layer's output values with the weights and passes them through the second hidden layer's activation function, sending the result to the output layer; after multiply-accumulation with the output layer's weights, the output-layer output is obtained through the output layer's activation function.
8. The standard recurrent neural network mode as recited in claim 4, wherein the contents of registers are updated by control software to determine, respectively, the activation functions used by the hidden layer and the output layer.
9. The correctable-feedback recurrent neural network mode as recited in claim 5, wherein the contents of registers are updated by control software to determine, respectively, the activation functions used by the hidden layer and the output layer.
10. The correctable-feedback recurrent neural network mode as recited in claim 5, wherein the contents of registers are updated by control software to determine the initial weight values used between the feedback processing layer and the hidden layer.
11. The three-layer back-propagation neural network mode as recited in claim 6, wherein the contents of registers are updated by control software to determine, respectively, the activation functions used by the hidden layer and the output layer.
12. The four-layer back-propagation neural network mode as recited in claim 7, wherein the contents of registers are updated by control software to determine, respectively, the activation functions used by the first hidden layer, the second hidden layer and the output layer.
13. The standard recurrent neural network mode as recited in claim 8, wherein the activation function is a sigmoid function or a hyperbolic tangent function.
14. The correctable-feedback recurrent neural network mode as recited in claim 9, wherein the activation function is a sigmoid function or a hyperbolic tangent function.
15. The three-layer back-propagation neural network mode as recited in claim 11, wherein the activation function is a sigmoid function or a hyperbolic tangent function.
16. The four-layer back-propagation neural network mode as recited in claim 12, wherein the activation function is a sigmoid function or a hyperbolic tangent function.
17. The activation function as recited in claim 13, wherein the sigmoid function comprises:
a multiplier, which receives an input signal and multiplies it by a fixed value -1;
a first multiplexer, which receives the multiplier's output and the input signal and selects between them;
a first adder, which receives the first multiplexer's output and adds a fixed value 2;
a first two-bit shift register, which receives the first adder's output and shifts it by two bits;
a second adder, which receives the first adder's output and adds a first interpolation-depth value;
a second multiplexer, which receives the first two-bit shift register's output and a fixed value 0 and selects between them;
a three-bit shift register, which receives the second adder's output and shifts it by three bits;
a third adder, which receives the second multiplexer's output and adds a second interpolation-depth value and the three-bit shift register's output;
a third multiplexer, which receives the second multiplexer's output and the three-bit shift register's output and selects between them;
a second two-bit shift register, which receives the third adder's output and shifts it by two bits;
a fourth multiplexer, which receives the third multiplexer's output and the second two-bit shift register's output and selects between them;
a subtractor, which receives the fourth multiplexer's output and subtracts it from a fixed value 1;
a fifth multiplexer, which receives the subtractor's output and the fourth multiplexer's output and selects between them;
the output of the fifth multiplexer being the sigmoid function.
如申清專利範圍第13項所述之活化函數,其中該雙f曲正切函數, 包括: 一乘法器’該乘法器接收-輸入訊號並與-固定數值-i作相乘; °〇該第一多工器接收s亥乘法器之輸出及該輸入訊號並 作一選擇;a second adder that receives the round of the first adder and adds it to a first interpolated depth value; a second multiplexer 'the second multiplexer receives the second Red shift register output and a fixed value 〇 and make a choice; - three-bit shift register, the three-bit shift register receives the output of the second adder and makes three bits a third adder, the third adder receives the rotation of the second multiplexer and adds a second interpolation depth value and an output of the three-bit shift register; a third multiplexer that receives the output of the second multiplexer and the output of the three-bit shift register and makes a selection; - the second two-bit shift temporary storage The second two-bit shift register receives the output of the third adder and shifts the two bits; - the fourth multiplexer receives the third multiplexer The output and the output of the second bit shift register are selected as a selection; a subtractor that receives the output of the fourth multiplexer and subtracts from a fixed value 51 201232429 Dingkou, the fifth multiplexer receives the rounding of the subtractor and the output of the fourth multiplexer and makes a selection. The rounding of the fifth multiplexer is a double bending function. 18. The activation function as recited in claim 13 wherein the double f-curve function comprises: a multiplier 'the multiplier receives the input signal and multiplies the fixed value -i; The first multiplexer receives the output of the s-hai multiplier and the input signal and makes a selection; 一第一加法器’該第—加法器接收該第—多卫器之輸出並與-固定 數值-卜-第—内插深度值作相加; 第夕器,該第二多工器接收該第一多工器之輸出及一固定數 值-1並作一選擇; 第位元移位暫存器,該第--位元移位暫存器接收該第-力口 法器之輸出並作一位元移位;a first adder 'the first adder receives the output of the first multi-guard and adds the value to the - fixed value - the first - the interpolated depth value; the first multiplexer receives the The output of the first multiplexer and a fixed value of -1 are selected as a selection; the first bit shift register, the first bit shift register receives the output of the first force port and performs One-bit shift; 一第二加法器,該第二加法器接收該第二多工器之輸出並與該第一 一位兀移位暫存器之輸出、一第二内插深度值作相加; 一第三多工器,該第三多工器接收該第一一位元移位暫存器之輸出 及該第二多工器之輸出; 一第二一位7L移位暫存器,該第二一位元移位暫存器接收該第二加 法器之輸出並作一位元之移位; 一第四多工器,該第四多工器接收該第二一位元移位暫存器之輸出 及該第二多工之輸出並作一選擇; 52 201232429 減法器,該減㈣接收該第四多王器之輪出 -第五多玉||,該第五多工器接收 之輸出並作-選擇; U益之輪出及該滅法器 該第五多工器之輪出即為—雙彎曲正切函數。 19·如申賴_第14 述之活罐,其中物曲函數,包括: -乘法器,該乘法器接收一輸入訊號並與—固定數值4相乘;a second adder, the second adder receives the output of the second multiplexer and adds the output of the first bit 兀 shift register and a second interpolation depth value; a multiplexer, the third multiplexer receives an output of the first bit shift register and an output of the second multiplexer; a second bit 7L shift register, the second one The bit shift register receives the output of the second adder and shifts by one bit; a fourth multiplexer receives the second bit shift register Output and the output of the second multiplexer and make a selection; 52 201232429 subtractor, the subtraction (four) receives the fourth multi-master round-out-fifth jade||, the fifth multiplexer receives the output and Making-selection; U-Yuan's turn-out and the killer The fifth multiplexer's turn is the double-bend tangent function. 19. 
The living tank according to claim 14, wherein the volume function comprises: - a multiplier, the multiplier receives an input signal and multiplies with a fixed value of 4; 一第一多卫器’該第-多工器接收該乘法器之輪出及該輸入訊號並 作一選擇; 一第一加法器,該第一加法器接收該第一多工器 數值2相加; -第--位7G移靖存器,該第—二位元移位暫存_收該第一加 法器之輸出並作二位元之移位;a first multi-guard: the first multiplexer receives the multiplier and the input signal and makes a selection; a first adder, the first adder receives the first multiplexer value 2 Adding a first shift to the first adder and taking the shift of the two bits; 並與一固定值〇作相 之輪出並與一固定 -第二加法器,該第二加法!I接收該第_加法器之輪出並與一第一 内插深度值相加; -第二多4 ’該第二#器接收該第—二就移位暫存器之輸出 及一固定數值0並作一選擇; 一二位元移位暫存器,該三位元移位暫存器接收該第二加法器之輸 出並作三位元之移位; 一第三加法器,該第三加法器接收該第二多工器之輸出並與—第二 内插深度值、該三位元移位暫存器之輪出作相加; -第三多工器,該第三多卫器接收該第二多工器之輪出及該三位元 53 201232429 移位暫存器之輸出並作一選擇; -第二二位元移位暫存n,該第二二位元移位暫存器接收該第三加 法器之輸出並作二位元之移位; -第四多工器,該第四多丄器接收該第三多卫器之輸出及該第二二 位元移位暫存器之輸出並作一選擇; -減法器,賊法器接收該第四多工器之輸出並與—固定數值】作 相減; -第五多工器,該第五多卫器接收該減法器之輪出及該第四多工器 之輸出並作一選擇; ° 該第五多工器之輸出即為一雙彎曲函數。 20.如申請專利範圍帛14項所述之活化函數,其中該雙彎曲正切函數, 包括: 一乘法器,該乘法器接收一輸入訊號並與一固定數值_1作相乘· -第-多工II,該第-多工器接收該乘法器之輪出及該輸入訊號並 作一選擇; -第一加法器,該第-加法器接收該第一多工器之輸出並與—固定 數值-1、一第一内插深度值作相加; -第二多工器,該第二多工器接收該第—多卫器之輸出及一固定數 值-1並作一選擇: -第--位元雜暫補,鱗—位元移㈣料触該第一加 法|§之輸出並作一位元移位; -第二加法器’該第二加法II接收該第二多1之輪出並與該第一 54 201232429 一位兀移位暫存器之輸出、一第二内插深度值作相加; 之輪出 -第三多工器’該第三多工器接收該第_____位元移位暫存器 及5亥第·一多工器之輸出; 器接收該第二加 -第二-位元移位暫存器,該第二一位元移位暫存 法器之輸出並作一位元之移位·, -第四多工器’該第四多工器接收該第二—位元移位暫存器之輪出 及該第三多工器之輪出並作一選擇; 四多工器之輸出並與-固定值〇作相 減法器,s亥減法器接收該第 減; 器 :°亥第五夕工器接收該第四多工器之輸出及該減法 之輪出並作一選擇; 幻③第五多工器之輪出即為一雙彎曲正切函數。 ’如申請專概圍㈣項所述之回 根今^ 垔值了修正之回饋式類神經網路 、式,,、中S亥權重值亦可經訓練、Μ J J,、工丨深學習後計算而修正。 55And rotating with a fixed value and a fixed-second adder, the second addition! I receives the round of the first adder and adds a value of a first interpolation depth; Two more 4's the second #1 receives the second-second shift register output and a fixed value of 0 and makes a selection; a two-bit shift register, the three-bit shift temporary storage Receiving the output of the second adder and shifting the three bits; a third adder, the third adder receiving the output of the second multiplexer and - the second interpolated depth value, the third The bit shift register is added by the wheel; the third multiplexer receives the wheel of the second multiplexer and the three bits 53 201232429 shift register Outputting and making a selection; - the second two-bit shift temporary storage n, the second two-bit shift register receiving the output of the third adder and shifting by two bits; - fourth a fourth multi-tuner receives the output of the third multi-guard and the output of the second two-bit shift register and makes a selection; - a subtractor, the thief receives the fourth The output of the multiplexer is subtracted from the - fixed value; - a fifth multiplexer that receives the wheel of the subtractor and the output of the fourth multiplexer and makes a selection; The output of the fifth multiplexer is a double bending function. 20. The activation function of claim 14, wherein the double bending tangent function comprises: a multiplier that receives an input signal and multiplies it by a fixed value _1. 
Working II, the first multiplexer receives the rounding of the multiplier and the input signal and makes a selection; - a first adder, the first adder receives the output of the first multiplexer and a fixed value -1, a first interpolation depth value is added; - a second multiplexer, the second multiplexer receives the output of the first multi-guard and a fixed value -1 and makes a selection: - - - bit miscellaneous, scale-bit shift (four) to touch the first addition | § output and make a one-bit shift; - second adder 'the second adder II receives the second more than 1 round And outputting the output of the first 54 201232429 one-bit shift register and a second interpolation depth value; the round-out third multiplexer receives the first _ ____ bit shift register and the output of the 5th multiplexer; the device receives the second plus-second-bit shift register, the second bit The output of the temporary storage device is shifted by one bit. - The fourth multiplexer receives the second-bit shift register and the third The wheel of the tool is turned out and made a choice; the output of the four multiplexer is combined with the - fixed value as a subtractor, and the s-hai subtractor receives the first subtraction; The output of the tool and the rounding of the subtraction are made as a choice; the turn of the fifth multiplexer of the magic 3 is a pair of bending tangent functions. 'If the application of the general section (4) mentioned in the article back to the roots ^ 垔 了 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 修正 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回 回Corrected by calculation. 55
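Read from the software side, the adders, shift registers, and interpolation depth values recited in claims 17 through 20 describe a piecewise-linear activation approximation in which every segment slope is a power of two, so that each multiplication collapses to an arithmetic shift. The C sketch below illustrates that shift-and-add idea using a PLAN-style segmentation of the sigmoid; the Q12 fixed-point format, the segment breakpoints, and all identifiers are illustrative assumptions, not the datapath literally recited in the claims.

#include <stdio.h>

#define Q   12                /* assumed fractional bits (Q12 fixed point) */
#define ONE (1 << Q)          /* 1.0 in Q12 */

/* Piecewise-linear sigmoid: every segment slope is a power of two, so each
   multiply is an arithmetic shift; the additive constants play the role of
   the claimed interpolation depth values. */
static int sigmoid_pl(int x) {
    int neg = (x < 0);
    if (neg) x = -x;          /* claimed multiplier by -1 plus first multiplexer */
    int y;
    if (x >= 5 * ONE)
        y = ONE;                                  /* saturation           */
    else if (x >= 2 * ONE + (3 * ONE) / 8)        /* 2.375 <= |x| < 5     */
        y = (x >> 5) + (27 * ONE) / 32;           /* slope 1/32           */
    else if (x >= ONE)                            /* 1 <= |x| < 2.375     */
        y = (x >> 3) + (5 * ONE) / 8;             /* slope 1/8            */
    else                                          /* 0 <= |x| < 1         */
        y = (x >> 2) + ONE / 2;                   /* slope 1/4            */
    return neg ? ONE - y : y; /* claimed subtraction from 1 plus final multiplexer */
}

/* Hyperbolic tangent from the sigmoid via tanh(x) = 2*sigmoid(2x) - 1. */
static int tanh_pl(int x) {
    return 2 * sigmoid_pl(2 * x) - ONE;
}

int main(void) {
    for (int i = -6; i <= 6; i += 2)
        printf("x=%+d  sigmoid~%.4f  tanh~%.4f\n", i,
               sigmoid_pl(i * ONE) / (double)ONE,
               tanh_pl(i * ONE) / (double)ONE);
    return 0;
}

The final reflection step (returning 1 - y for negative inputs) corresponds to the claimed subtractor and fifth multiplexer, exploiting the symmetry sigmoid(-x) = 1 - sigmoid(x); the hyperbolic tangent then follows from the identity tanh(x) = 2*sigmoid(2x) - 1, a doubling and halving that the one-bit shift registers of claims 18 and 20 could plausibly realize.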
TW100101585A 2011-01-17 2011-01-17 Resilient high-speed hardware reverse transfer and feedback type neural network system TWI525558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW100101585A TWI525558B (en) 2011-01-17 2011-01-17 Resilient high-speed hardware reverse transfer and feedback type neural network system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW100101585A TWI525558B (en) 2011-01-17 2011-01-17 Resilient high-speed hardware reverse transfer and feedback type neural network system

Publications (2)

Publication Number Publication Date
TW201232429A true TW201232429A (en) 2012-08-01
TWI525558B TWI525558B (en) 2016-03-11

Family

ID=47069602

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100101585A TWI525558B (en) 2011-01-17 2011-01-17 Resilient high-speed hardware reverse transfer and feedback type neural network system

Country Status (1)

Country Link
TW (1) TWI525558B (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878316B2 (en) 2015-05-21 2020-12-29 Google Llc Prefetching weights for use in a neural network processor
US11755895B2 (en) 2015-05-21 2023-09-12 Google Llc Rotating data for neural network computations
US9747546B2 (en) 2015-05-21 2017-08-29 Google Inc. Neural network processor
US9747548B2 (en) 2015-05-21 2017-08-29 Google Inc. Rotating data for neural network computations
US9805303B2 (en) 2015-05-21 2017-10-31 Google Inc. Rotating data for neural network computations
US9805304B2 (en) 2015-05-21 2017-10-31 Google Inc. Prefetching weights for use in a neural network processor
US12014272B2 (en) 2015-05-21 2024-06-18 Google Llc Vector computation unit in a neural network processor
TWI622939B (en) * 2015-05-21 2018-05-01 谷歌有限責任公司 Method,system and computer-readable medium for performing neural network computations for a neural network
US10049322B2 (en) 2015-05-21 2018-08-14 Google Llc Prefetching weights for use in a neural network processor
US10074051B2 (en) 2015-05-21 2018-09-11 Google Llc Vector computation unit in a neural network processor
US10083395B2 (en) 2015-05-21 2018-09-25 Google Llc Batch processing in a neural network processor
US10192162B2 (en) 2015-05-21 2019-01-29 Google Llc Vector computation unit in a neural network processor
US11853865B2 (en) 2015-05-21 2023-12-26 Google Llc Prefetching weights for use in a neural network processor
US10438117B1 (en) 2015-05-21 2019-10-08 Google Llc Computing convolutions using a neural network processor
TWI825596B (en) * 2015-05-21 2023-12-11 美商谷歌有限責任公司 Circuit, method and non-transitory machine-readable storage devices for performing neural network computations
US10699188B2 (en) 2015-05-21 2020-06-30 Google Llc Neural network processor
US11620513B2 (en) 2015-05-21 2023-04-04 Google Llc Computing convolutions using a neural network processor
US9697463B2 (en) 2015-05-21 2017-07-04 Google Inc. Computing convolutions using a neural network processor
US9842293B2 (en) 2015-05-21 2017-12-12 Google Inc. Batch processing in a neural network processor
US9710748B2 (en) 2015-05-21 2017-07-18 Google Inc. Neural network processor
US11620508B2 (en) 2015-05-21 2023-04-04 Google Llc Vector computation unit in a neural network processor
US11586920B2 (en) 2015-05-21 2023-02-21 Google Llc Neural network processor
US11049016B2 (en) 2015-05-21 2021-06-29 Google Llc Neural network processor
US11281966B2 (en) 2015-05-21 2022-03-22 Google Llc Prefetching weights for use in a neural network processor
US11170291B2 (en) 2015-05-21 2021-11-09 Google Llc Rotating data for neural network computations
US11210580B2 (en) 2015-05-21 2021-12-28 Google Llc Rotating data for neural network computations
US11216726B2 (en) 2015-05-21 2022-01-04 Google Llc Batch processing in a neural network processor
US11227216B2 (en) 2015-05-21 2022-01-18 Google Llc Batch processing in a neural network processor
TWI787803B (en) * 2017-02-14 2022-12-21 美商谷歌有限責任公司 Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems
TWI728230B (en) * 2017-02-14 2021-05-21 美商谷歌有限責任公司 Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems
TWI823571B (en) * 2017-02-14 2023-11-21 美商谷歌有限責任公司 Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems
US11868864B2 (en) 2017-02-14 2024-01-09 Google Llc Implementing neural networks in fixed point arithmetic computing systems
US10650303B2 (en) 2017-02-14 2020-05-12 Google Llc Implementing neural networks in fixed point arithmetic computing systems
TWI684140B (en) * 2017-03-29 2020-02-01 英屬開曼群島商意騰科技股份有限公司 Processing apparatus and method for artificial neuron
US11157794B2 (en) 2017-05-19 2021-10-26 Google Llc Scheduling neural network processing
TWI664587B (en) * 2017-05-19 2019-07-01 美商谷歌有限責任公司 Scheduling neural network processing
TWI688838B (en) * 2017-10-06 2020-03-21 日商佳能股份有限公司 Control device, lithography device, measuring device, processing device, flattening device, and article manufacturing method
TWI684141B (en) * 2017-10-12 2020-02-01 英屬開曼群島商意騰科技股份有限公司 Apparatus and method for accelerating multiplication with none-zero packets in artificial neuron
US11399079B2 (en) 2018-02-14 2022-07-26 Eingot Llc Zero-knowledge environment based networking engine
CN110673824A (en) * 2018-07-03 2020-01-10 赛灵思公司 Matrix vector multiplication circuit and circular neural network hardware accelerator

Also Published As

Publication number Publication date
TWI525558B (en) 2016-03-11

Similar Documents

Publication Publication Date Title
TW201232429A (en) High-speed hardware back-propagation and recurrent type artificial neural network with flexible architecture
JP6865847B2 (en) Processing equipment, chips, electronic equipment and methods
CN107316078B (en) Apparatus and method for performing artificial neural network self-learning operation
CN109117948A (en) Painting style conversion method and Related product
CN109062611A (en) Processing with Neural Network device and its method for executing vector scaling instruction
CN108345935A (en) Product and arithmetic unit, network element and network equipment
CN109697510A (en) Method and apparatus with neural network
CN114127680B (en) System and method for supporting alternative digital formats for efficient multiplication
CN107506828A (en) Computing device and method
CN110163363A (en) A kind of computing device and method
CN104238993A (en) Vector matrix product accelerator for microprocessor integration
CN107423816A (en) A kind of more computational accuracy Processing with Neural Network method and systems
WO2022111002A1 (en) Method and apparatus for training neural network, and computer readable storage medium
US20210397596A1 (en) Lookup table activation functions for neural networks
JP2022539495A (en) Systems and methods supporting asymmetric scaling factors for negative and positive values
Mukhopadhyay et al. Systematic realization of a fully connected deep and convolutional neural network architecture on a field programmable gate array
Liu et al. An energy-efficient mixed-bit CNN accelerator with column parallel readout for ReRAM-based in-memory computing
Guckert Memristor-based arithmetic units
Lu et al. An RRAM-Based Computing-in-Memory Architecture and Its Application in Accelerating Transformer Inference
Prasad et al. Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality With At-MRAM Neural Engine
TW202311890A (en) Configurable nonlinear activation function circuits
Yi et al. RDCIM: RISC-V Supported Full-Digital Computing-in-Memory Processor With High Energy Efficiency and Low Area Overhead
Kao et al. A Behavior-Level Simulation Framework for RRAM-Based Deep Learning Accelerators with Flexible Architecture Configurations
Huang et al. Teaching hardware implementation of neural networks using high-level synthesis in less than four hours for engineering education of intelligent embedded computing
Lee et al. ReQUSA: a novel ReRAM-based hardware accelerator architecture for high-speed quantum computer simulation

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees